Thursday, August 16, 2012

Big Data and the Goldilocks Principle

I was inspired to write this post (I can hear all of you sighing 'Yet another one on Big Data') by another 'Big' reason. I listened to a TED (www.ted.com) talk by David Christian titled 'The History of the World in 18 Minutes', in which he narrates a complete history of the universe, from the Big Bang to the Internet, in a riveting 18 minutes. This is "Big History": an enlightening, wide-angle look at complexity, life and humanity, set against our slim share of the cosmic timeline. Check out his website – www.bighistoryproject.com – and I promise you that this 'Big' has nothing to do with Big Data as we know it. What got me interested in his talk is his reference to the 'Goldilocks moment' – a moment so precisely right that certain thresholds are reached which enable higher forms of complexity (life) in the universe.

That got me thinking – Is Big Data the ‘Goldilocks moment’ for organizations with respect to analytics helping them towards achieving better business outcomes?

I think the answer is 'Yes', and this stems from the following hypothesis – an organization can utilize analytics for better business outcomes if:

a) it has more data points to analyze (volume)

b) it can perform sophisticated analysis on large and diverse datasets (variety)

c) it can do so at a much faster rate than before (velocity)

In that context, I really liked a picture from one of the IBM articles, which illustrates how Big Data, when synthesized properly along with standard transactional data, can help in better business decision making (in this case, fraud detection).

Source: IBM – Understanding Big Data by Paul Zikopoulos

At the same time, the exponential increase in CPU processing power, the steep fall in memory prices and the availability of high bandwidth have made Big Data techniques practical to use. From the human angle, people are creating digital data – social media chatter, video sharing, blogs, mobility data, etc. – at such a rapid pace that organizations (with the help of Big Data techniques, of course) can potentially solve the 'Innovator's Dilemma' by providing new products and services that consumers did not ask for, simply because they couldn't articulate what they actually want.

All in all, I think we are at a precise moment in history (the Goldilocks moment) where organizations can greatly increase their ability to provide better products & services for their consumers using Big Data techniques.

Source: http://blogs.hexaware.com/business-intelligence/big-data-and-the-goldilocks-principle/

Monday, August 6, 2012

Business Focused Analytics – The Starting Point

Having been a Business Intelligence practitioner for the last 13 years, I can say there has never been a more exciting time to practice this art, as organizations increasingly realize that a well implemented BI & Analytics system can provide great competitive advantage. This leads us to the question – 'What is a well implemented BI system?' Let us follow the Q&A below.

Q: What is a well implemented BI system?

A: A well implemented BI system is one that is completely business focused.

Q: Well, that doesn’t make it any easier. How can we have BI that is completely business focused?

A: BI & Analytics becomes completely business focused when they have ‘business decisions’ as the cornerstone of their implementation. The starting point to build / re-engineer a BI system is to identify the business decisions taken by business stakeholders in their sphere of operations. Business decisions can be operational in nature (taken on a daily basis) and/or strategic (taken more infrequently but they tend to have a longer term impact). To reiterate, the starting point for BI is to catalog the business decisions taken by business stakeholders and collect the artifacts that are currently used to take those decisions.

Q: The starting point is fine – What are the other pieces?

A: The next step is to identify the metrics and key performance indicators that support decision making. In other words, any metric identified should be unambiguously correlated to the decision taken with its help and to the stakeholder who takes it. Next, we need to identify the core datasets in the organization. Please refer to my earlier blog post titled 'Thinking by Datasets' on this subject.

Q: What about the operational systems in the landscape? Aren’t they important?

A: Once we have documented the relationship between Business Decisions to Metrics to Datasets, we need to focus on the transactional applications. The key focus items are:

  • Inventory all Transactional Applications
  • Identify the business processes catered to by these applications
  • Identify the datasets generated as part of each business process
  • Drill down into the individual entities that make up each dataset
  • Once the Facts & Dimensions are identified from the entities, sketch out the classic 'Bus Matrix', which forms the basis for dimensional data modeling
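The last step above can be sketched in Python: a bus matrix is just a mapping from business processes (candidate fact tables) to the dimensions their datasets share. The process and dimension names here are purely illustrative:

```python
# Hypothetical business processes mapped to the dimensions their
# datasets share -- the classic Kimball bus matrix in dict form.
bus_matrix = {
    "Order Fulfillment": {"Date", "Customer", "Product", "Warehouse"},
    "Invoicing": {"Date", "Customer", "Product"},
    "Inventory Snapshot": {"Date", "Product", "Warehouse"},
}

def conformed_dimensions(matrix):
    """Dimensions used by more than one business process -- the ones
    that must be conformed across the warehouse."""
    usage = {}
    for dims in matrix.values():
        for dim in dims:
            usage[dim] = usage.get(dim, 0) + 1
    return {dim for dim, count in usage.items() if count > 1}
```

Reading across a row tells you which dimensions a fact table needs; reading down a column tells you which processes share a dimension and must agree on its definition.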


Q: All this is good if we are building a BI system from scratch – How about existing BI systems?

A: For existing BI applications, the process mentioned above can be carried out as a health check on the BI landscape. The bottom line is that every single report / dashboard / other analytical component should have traceability to the metrics shown, which should in turn link to the decisions taken by business users. BI & Analytics exist to help organizations take better business decisions, and that defines their purpose & role in an enterprise IT landscape.

The answers mentioned above provide the high-level view of Hexaware’s approach to Business Intelligence projects. We have worked with many organizations across industries and a business focused analytical approach has provided good value for our customers.

Thanks for reading. Please do share your thoughts.

Sunday, July 31, 2011

Better Understanding Link-based Spam Analysis Techniques

One frustrating aspect of link building is not knowing the value of a link. Although experience, and some data, can make you better at link valuation, it is impossible to know to what degree a link may be helping you. It's hard to know if a link is even helping at all. Search engines do not count all links; they reduce the value of many that they do count, and they use factors related to your links to further suppress the value that's left over. This is all done to improve relevancy and spam detection.

Understanding the basics of link-based spam detection can improve your understanding of link valuation and help you understand how search engines approach the problem of spam detection, which can lead to better link building practices.
I’d like to talk about a few interesting link spam analysis concepts that search engines may use to evaluate your backlink profile.
Disclaimer:
I don’t work at a search engine, so I can make no concrete claims about how search engines evaluate links. Engines may use some, or none, of the techniques in this post. They also certainly use more (and more sophisticated) techniques than I can cover in this post. However, I spend a lot of time reading through papers and patents, so I thought it'd be worth sharing some of the interesting techniques.

#1 Truncated PageRank

The basics of Truncated PageRank are covered in the paper Link-Based Characterization and Detection of Web Spam. Truncated PageRank is a calculation that removes the direct "link juice" contribution provided by the first level(s) of links. A page boosted by naïve methods (such as article marketing) receives a large portion of its PageRank value directly from that first layer. A link from a well-linked-to page, however, receives "link juice" contributions from additional levels. Spam pages will likely show a Truncated PageRank that is significantly lower than their PageRank, so the ratio of Truncated PageRank to PageRank can be a signal of the spamminess of a link profile.
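As a rough illustration of the idea (not any engine's actual implementation), Truncated PageRank can be approximated from the path-sum view of PageRank: accumulate the damped contribution of walks of each length and simply discard the first few levels. A minimal sketch, ignoring dangling-node redistribution:

```python
def truncated_pagerank(links, d=0.85, truncate=2, iters=50):
    """Approximate PageRank and Truncated PageRank via the path-sum
    view of PageRank: each node starts with (1-d)/N of teleport value,
    which is pushed along out-links, damped by d at every step.
    PageRank sums every level; Truncated PageRank skips the first
    `truncate` levels of contribution."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    contrib = {u: (1 - d) / n for u in nodes}  # level-0 contribution
    pr = {u: 0.0 for u in nodes}
    tpr = {u: 0.0 for u in nodes}
    for level in range(iters):
        for u in nodes:
            pr[u] += contrib[u]
            if level >= truncate:
                tpr[u] += contrib[u]
        nxt = {u: 0.0 for u in nodes}
        for u in nodes:
            out = links.get(u, [])
            if out:
                share = d * contrib[u] / len(out)
                for v in out:
                    nxt[v] += share
        contrib = nxt
    return pr, tpr
```

The ratio `tpr[p] / pr[p]` is then the signal: a page propped up only by a shallow first layer of links scores near zero, while a page with deep support keeps most of its value.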

#2 Owned / Accessible Contributions

Links can be grouped into three general buckets.
  1. Links from owned content – links from pages for which search engines have determined some level of ownership (well-connected co-citation, IP, whois, etc.)
  2. Links from accessible content – links from non-owned content where it is easy to add links (blogs, forums, article directories, guest books, etc.)
  3. Links from inaccessible content – links from independent sources.
A link from any one of these sources is neither good nor bad. Links from owned content, via networks and relationships, are perfectly natural. However, a link from inaccessible content could be a paid link, so that bucket doesn't mean it's inherently good. Knowing which bucket a link falls into, though, can change its valuation.
[Image: owned contribution comparison of two link profiles]
This type of analysis on two sites can show a distinct difference in link profiles, all other factors being equal. The first site is primarily supported by links from content it directly controls or can gain access to, while the second site has earned links from a substantially larger percentage of unique, independent sources. All things being equal, the second site is less likely to be spam.
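A toy illustration of this kind of profile comparison, assuming we already have each backlink labeled with its source domain and page type (the bucket names, the `(domain, page_type)` schema and the type labels are all hypothetical):

```python
def link_profile_distribution(backlinks, owned_domains,
                              accessible_types=frozenset(
                                  {"blog_comment", "forum",
                                   "article_directory", "guestbook"})):
    """Bucket each backlink as owned / accessible / inaccessible and
    return the percentage distribution. Input is a list of
    (source_domain, page_type) pairs -- a made-up schema."""
    counts = {"owned": 0, "accessible": 0, "inaccessible": 0}
    for domain, page_type in backlinks:
        if domain in owned_domains:
            counts["owned"] += 1
        elif page_type in accessible_types:
            counts["accessible"] += 1
        else:
            counts["inaccessible"] += 1
    total = sum(counts.values()) or 1
    return {k: round(100.0 * v / total, 1) for k, v in counts.items()}
```

A profile dominated by the owned and accessible buckets is the first site above; one dominated by the inaccessible bucket is the second.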

#3 Relative Mass

Relative Mass accounts for the percentage distribution of certain types of links within a profile. The pie charts in the image demonstrate the concept of relative mass.
[Image: relative mass pie charts]
Relative Mass is discussed more broadly in the paper Link Spam Detection Based on Mass Estimation. Relative Mass analysis can define a threshold at which a page is determined “spam”. In the image above, the red circles have been identified as spam. The target page now has a portion of value attributed to it via “spam” sites. If this value of contribution exceeds a potential threshold, this page could have its rankings suppressed or the value passed through these links minimized. The example above is fairly binary, but there is often a large gradient between not spam and spam.
This type of analysis can be applied to tactics as well, such as distribution of links from comments, directories, articles, hijacked sources, owned pages, paid links, etc. The algorithm may provide a certain degree of “forgiveness” before its relative mass contribution exceeds an acceptable level.
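The core calculation is simple to sketch: given an estimate of how much value each linking node contributes and a set of nodes labeled as spam, compute the fraction of value arriving from spam and compare it to a threshold. The contribution values and the threshold below are illustrative only:

```python
def spam_mass_ratio(contributions, spam_nodes):
    """Fraction of a page's inbound value contributed by known-spam
    nodes (the 'relative mass' of spam in the profile)."""
    total = sum(contributions.values())
    spam = sum(v for node, v in contributions.items() if node in spam_nodes)
    return spam / total if total else 0.0

def looks_spammy(contributions, spam_nodes, threshold=0.5):
    """Binary call against an illustrative threshold; as noted above,
    real systems likely act on a gradient rather than a hard cutoff."""
    return spam_mass_ratio(contributions, spam_nodes) > threshold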

#4 Counting Supporters / Speeds to Nodes

Another method of valuing links is by counting supporters and the speed of discovery of those nodes (and the point at which this discovery peaks).
[Image: counting supporters]
A histogram distribution of supporting nodes by hops can demonstrate the differences between spam and high quality sites.
[Image: supporters histogram]
Well-connected sites will grow in supporters more rapidly than spam sites and spam sites are likely to peak earlier. Spam sites will grow rapidly and decay quickly as you move away from the target node. This distribution can help signify that a site is using spammy link building practices. Because spam networks have higher degrees of clustering, domains will repeat upon hops, which makes spam profiles bottleneck faster than non-spam profiles.
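A supporter histogram like the one described can be built with a breadth-first walk backwards along in-links, counting how many new nodes are discovered at each hop. A sketch (a real system would deduplicate by domain rather than by page):

```python
def supporters_by_hop(in_links, target, max_hops=4):
    """Breadth-first walk backwards along in-links from `target`,
    counting newly discovered supporters at each hop. `in_links`
    maps a node to the list of nodes linking to it."""
    seen = {target}
    frontier = [target]
    histogram = []
    for _ in range(max_hops):
        nxt = []
        for node in frontier:
            for source in in_links.get(node, []):
                if source not in seen:   # each supporter counted once
                    seen.add(source)
                    nxt.append(source)
        histogram.append(len(nxt))
        frontier = nxt
    return histogram
```

A profile that bottlenecks (e.g. `[3, 0, 0, 0]`) peaks immediately, while a well-connected one keeps discovering new supporters for several hops.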
Protip: I think this is one reason that domain diversity and unique linking root domains are well correlated with rankings. I don't think the relationship is as naïve as counting linking domains, but analyses like supporter counting, as well as Truncated PageRank, would make receiving links from a larger set of diverse domains better correlated with rankings.

#5 TrustRank, Anti-TrustRank, SpamRank, etc.

The model of TrustRank has been written about several times before and is the basis of metrics like mozTrust. The basic premise is that seed nodes can have both trust and spam scores, which can be passed through links. The closer you are to the seed set, the more likely you are to be what that seed set was defined as: being close to spam makes you more likely to be spam, and being close to trust makes you more likely to be trusted. These values can be judged both inbound and outbound.
I won’t go into much more detail than that, because you can read about it in previous posts, but it comes down to four simple rules.
  • Get links from trusted content.
  • Don’t get links from spam content.
  • Link to trusted content.
  • Don’t link to spam content.
This type of analysis can also be used to turn SEO forums against spammers. A search engine can crawl links from top SEO forums to create a seed set of domains to perform analysis on. Tinfoil hat time....
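The propagation itself can be sketched as a PageRank-style iteration whose teleport vector is concentrated on the trusted seed set, so trust decays with link distance. This is a simplification of the algorithm in the TrustRank paper; an Anti-TrustRank variant would seed with known-spam pages instead:

```python
def trustrank(links, seeds, d=0.85, iters=50):
    """PageRank-style iteration that teleports only to trusted seeds,
    so trust decays with distance from the seed set. `links` maps a
    node to its out-links; `seeds` is the hand-picked trusted set."""
    nodes = set(links) | set(seeds)
    for targets in links.values():
        nodes.update(targets)
    trust = {u: (1.0 / len(seeds) if u in seeds else 0.0) for u in nodes}
    for _ in range(iters):
        # teleport mass goes only to seeds, not uniformly to all nodes
        nxt = {u: ((1 - d) / len(seeds) if u in seeds else 0.0)
               for u in nodes}
        for u in nodes:
            out = links.get(u, [])
            if out:
                share = d * trust[u] / len(out)
                for v in out:
                    nxt[v] += share
        trust = nxt
    return trust
```

Pages unreachable from the seed set end up with zero trust, which is exactly the "get links from trusted content" rule in algorithmic form.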

#6 Anchor Text vs. Time

Monitoring anchor text over time can give interesting insights that could detect potential manipulation. Let’s look at an example of how a preowned domain that was purchased for link value (and spam) might appear with this type of analysis.
[Image: anchor text over time]
This domain has a historical record of acquiring anchor text, including both branded and non-branded targeted terms. Then suddenly that rate drops, and after a time a sudden influx of anchor text never seen before starts to come in. This type of anchor text analysis, in combination with orthogonal spam detection approaches, can help detect the point at which ownership changed. Links prior to this point can then be evaluated differently.
This type of analysis, plus some other very interesting stuff, is discussed in the Google paper Document Scoring Based on Link-Based Criteria.
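One simple, hypothetical way to surface such a break is to track, per time period, what share of a domain's incoming anchor-text phrases have never been seen in any earlier period:

```python
def new_anchor_share(history):
    """For each period, the share of incoming anchor-text phrases never
    seen in any earlier period. `history` is a list of per-period lists
    of anchor phrases. A late spike in this share, after a quiet gap,
    is the ownership-change fingerprint described above."""
    seen = set()
    shares = []
    for period in history:
        new = sum(1 for anchor in period if anchor not in seen)
        shares.append(new / len(period) if period else 0.0)
        seen.update(period)
    return shares
```

The first period is trivially all-new; what matters is a share that drops toward zero as the brand profile matures and then jumps back toward one.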

#7 Link Growth Thresholds

Sites with rapid link growth could have the impact dampened by applying a threshold of value that can be gained within a unit time. Corroborating signals can help determine if a spike is from a real event or viral content, as opposed to link manipulation.
[Image: link growth thresholds]
Links acquired beyond the assigned threshold can have their value discounted. A more paced, natural growth profile is less likely to break a threshold. You can find more information about historical analysis in the paper Information Retrieval Based on Historical Data.
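The dampening idea can be sketched as a per-period cap on how many new links are credited; the corroborating-signals check (real event versus manipulation) is deliberately not modeled here, and the cap of 100 is arbitrary:

```python
def credited_link_growth(new_links_per_period, cap=100):
    """Credit at most `cap` new links' worth of value per period, so a
    sudden influx contributes little beyond the cap. A real system
    would also check corroborating signals (news coverage, social
    spikes) before dampening a genuine viral event."""
    return [min(n, cap) for n in new_links_per_period]
```

A burst of 500 links in one period is credited the same as 100, while steady growth under the cap passes through untouched.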

#8 Robust PageRank

Robust PageRank works by calculating PageRank without the highest contributing nodes.
[Image: robust pagerank]
In the image above, the two strongest links were turned off, effectively reducing the PageRank of the node. Strong sites often have robust profiles and do not depend heavily on a few strong sources (such as links from link farms) to maintain a high PageRank. Calculating Robust PageRank is one way the impact of over-influential nodes can be reduced. You can read more about it in the paper Robust PageRank and Locally Computable Spam Detection Features.
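A back-of-the-envelope version, assuming we already know each inbound link's PageRank contribution: drop the k strongest contributions and see how much score survives.

```python
def robust_score(contributions, k=2):
    """Score with the k strongest inbound contributions removed.
    A node propped up by a few heavy links loses most of its value;
    a broad profile barely moves."""
    values = sorted(contributions.values(), reverse=True)
    return sum(values[k:])
```

Comparing `robust_score` to the full score gives a robustness ratio: near 1 for broad profiles, near 0 for profiles dependent on a couple of strong sources.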

#9 PageRank Variance

The uniformity of PageRank contribution to a node can be used to evaluate spam. Natural link profiles are likely to have a stronger variance in PageRank contribution. Spam profiles tend to be more uniform.
[Image: pagerank variance]
So if you use a tool, marketplace, or service to order 15 PR 4 links for a specific anchor text, it will have a low variance in PR. This is an easy way to detect these sorts of practices.
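Computing this signal is trivial once per-link PageRank values are known, which is exactly why uniform purchased-link footprints are easy to spot:

```python
from statistics import pvariance

def inbound_pr_variance(link_pr_values):
    """Variance of the toolbar-style PageRank of a page's inbound
    links. A batch of purchased 'PR 4' links has variance near zero;
    a natural profile spreads across the scale."""
    return pvariance(link_pr_values) if len(link_pr_values) > 1 else 0.0
```

Fifteen identical PR 4 links give a variance of exactly zero; a natural mix of weak and strong links does not.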

#10 Diminishing Returns

One way to minimize the value of a tactic is to create diminishing marginal returns on specific types of links. This is easiest to see with sitewide links, such as blogroll links or footer paid links. At one time, link popularity, in volume, was a strong factor, which led to sitewides carrying a disproportionate amount of value.
[Image: link building diminishing returns]
The first link from a domain carries the first vote, and additional links from that domain continue to increase the total value, but only to a point. Eventually, inbound links from the same domain experience diminishing returns. Going from 1 link to 3 links from a domain will have more of an effect than going from 101 links to 103.
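One hypothetical curve with this property credits the k-th link from a domain at 1/k of the first link's value, so total value grows but each additional link matters less. The harmonic decay is illustrative, not a known engine formula; any concave curve behaves similarly:

```python
def domain_link_value(n_links, first_link_value=1.0):
    """Total value of n links from one domain when the k-th link is
    worth first_link_value / k (a hypothetical diminishing-returns
    curve, not a known engine formula)."""
    return sum(first_link_value / k for k in range(1, n_links + 1))
```

Under this curve, links 2 and 3 together add about 0.83 of a first-link vote, while links 102 and 103 together add about 0.02.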
Protip: Although it's easy to see this with sitewide links, I think of most link building tactics in this fashion. In addition to ideas like relative mass, where you don't want one thing to dominate, I feel tactics lose traction over time. It is not likely you can earn strong rankings on a limited number of tactics, because many manual tactics tend to hit a point of diminishing returns (sometimes it may be algorithmic; other times it may be due to diminishing returns in the competitive advantage). It's best to avoid one-dimensional link building.

Link Spam Algorithms

All spam analysis algorithms have some percentage of accuracy and some level of false positives. Through the combination of these detection methods, search engines can maximize the accuracy and minimize the false positives.
Web spam analysis can tolerate more false positives than email spam detection, because there are often multiple alternatives to replace a pushed-down result, and because its outcome is not binary the way email filtering is (inbox or spam box). Search engines don't have to label pages "spam" or "not spam" to effectively improve search results; using analyses such as those discussed in this post, they can simply dampen rankings and minimize effects.
These analysis techniques are also designed to decrease the ROI of specific tactics, which makes spamming harder and more expensive. The goal of this post is not to stress about what links work, and which don’t, because it’s hard to know. The goal is to demonstrate some of the problem solving tactics used by search engines and how this impacts your tactics.

Thursday, July 28, 2011

5 Tips for Meeting Online Friends IRL

[Image: Dr. Pete and Gianluca]

Social media is a bit of a paradox – we have more "friends" than ever, but our relationships feel more and more superficial. When we retreat to the comfort of the internet, we introverts have even less incentive to get to know people IRL (In Real Life, for those who don't spend all day on the internet). If you know me online, it may surprise you to hear that I consider myself a recovering introvert. I’m also a work-at-home father of a 1-year-old, so I’m lucky to hit one SEO conference a year.

In honor of being in Seattle for Mozcon this week, I’d like to share 5 tips for how I’ve managed to make social media count and turn online relationships into real, offline friendships and business partnerships. Just to illustrate the point, that’s a picture of me with SEOmoz enthusiast and fellow proud dad Gianluca Fiorelli, who I finally got to meet in person today (thanks to Rudy Lopez for snapping the picture).

1. Get to Know People

If you only see your online friends as a way to get more Likes and +1s or water your Farmville crops when you’re out of town, you’ll never develop a real-life connection. Building any lasting relationship starts with sincerity. I think that 80% of my own success comes from the fact that I genuinely like people. Social media blurs the lines between work and personal life, and it’s a tremendous opportunity to get to know more about people’s lives outside of work.

2. Be a White-hat Stalker

Social media is also an amazing way to keep track of people, especially with real-time information like Twitter and FourSquare. Sometimes, all it takes is paying attention and knowing when you and your online friends will be in the same place at the same time. A couple of years ago, I was on Twitter and noticed that an industry friend was visiting the Google office in Chicago, just a few blocks from my condo. I pinged him, and two hours later we were having a beer together.

I’m not suggesting that you actually stalk people and show up uninvited to wherever they check in. White-hat stalking is about finding opportunity in the fact that many people in our industry spend a lot of time on the road. Sometimes, an online friend from across the country or even the other side of the globe just happens to be in town. Sometimes, you’re going to the same event, and may not even realize it. It’s all about paying attention.

3. Pre-arrange a Meetup

If you are going to an event, especially a large conference, it’s easy to assume that meeting people will just naturally happen. Conferences are big events and 2-4 days can go by in a flash. If you’re going to be at an event, let people know. It may feel self-indulgent, but announce online that you’re going. If you leave meeting up to chance, you’re going to miss a lot of people. Arrange a meetup – it could be dinner the night before the event, or it could just be making sure you find each other at the after-party. Don’t overthink it – a simple “Hey, I’m in Session A3 – where are you?” on Twitter works wonders.

4. Don’t Miss a Chance

When an opportunity does come along to meet someone IRL, don’t pass it up. Not to keep picking on Gianluca, but when he arrived at the hotel yesterday he tweeted that he was down in the lobby. At a relatively small, 3-day conference, it’s easy to assume that we’d have plenty of chances to meet up, but instead I told him to wait a minute, grabbed my room key, and jumped in the elevator. I can’t count the number of times I saw someone I wanted to meet, thought “They look busy, I’m sure I’ll see them later” and then didn’t. Don’t miss your chance.

5. Act Like an Extrovert

I hate the phrase “Fake it ‘til you make it” because of that one word – fake. It’s taken me a long time to accept that there’s a huge difference between deliberately being fake and acting the way you’d like to act, even if it’s a bit out of character. If you’re outgoing online, you’d probably like to be a little more outgoing IRL. So, why not try it on for size? No one online knows that you’re secretly terrified of your own shadow. These days, when I recognize an online friend, I approach them like we’ve known each other forever. It’s amazing what a difference that makes.

To the introverts out there, I’d just like to end by saying that many of the people in this industry that you think are social animals are closet introverts themselves. One of my favorite industry posts of all time is Lisa Barone’s introvert confession back in 2008. Even social media professionals struggle with actually being social IRL. If you're at Mozcon, don't be afraid to say “hi” – I only bite when I haven't been fed.

Replicate Google's Panda Questionnaire - Whiteboard Friday

Want to avoid the next Panda update and improve your website's quality? This week Will Critchlow from Distilled joins Rand to discuss an amazing idea of Will's to help those who are having problems with Panda and others who want to avoid future updates. Feel free to leave your thoughts on his idea and anything you might do to avoid Panda.

Video Transcription

Rand: Howdy, SEOmoz fans. Welcome to a very special edition of Whiteboard Friday. I am joined today by Will Critchlow, founder and Director of Distilled, now in three cities - New York, Seattle, London. My God, 36 or 37 people at Distilled?

Will: That's right. Yeah, it's very exciting.

Rand: Absolutely amazing. Congratulations on all the success.

Will: Thank you.

Rand: Will, despite the success that Distilled is having, there are a lot of people on the Web who have been suffering lately.

Will: It's been painful.

Rand: Yeah. What we're talking about today is this brilliant idea that you came up with, which is essentially to replicate Google's Panda questionnaire, send it out to people, and help them essentially improve their sites – make suggestions for management, for content producers, content creators, for people on the Web to improve their sites through the same sort of search signals that Panda's getting.

Will: That's right. I would say actually the core thing of this, what I was trying to do, is persuade management. This isn't necessarily about things that we as Internet marketers don't know. We could just look at the site and tell people this, but that doesn't persuade a boss or a client necessarily. So a big part of this was about persuasion as well.

So, background, I guess, people probably know, but Google gave this questionnaire to a bunch – I think they used students mainly – to assess a bunch of websites, then ran machine learning algorithms over the top of that so that they could algorithmically determine the answer.

Rand: Take a bunch of metrics from maybe user and usage data, from possibly linked data, although it doesn't feel like linked data, but certainly onsite analysis, social signals, whatever they've got. Run these over these pages that had been marked as good or bad, classified in some way by Panda questionnaire takers, and then produce results that would push down the bad ones, push up the good ones, and we have Panda, which changed 12% of search results in the U.S.

Will: Yeah, something like that.

Rand: And possibly more.

Will: And repeatedly now, right? Panda two point whatever and so forth. So, yeah, and of course, we don't know exactly what questions Google asked, but . . .

Rand: Did you try to find out?

Will: Obviously. No luck yet. I'll let you know if I do. But there's a load of hints. In fact, Google themselves have released a lot of these questions.

Rand: That's true. They talked about it in the Wired article.

Will: They did. There have been some that have come out on Search Engine Land I think as well. There have been some that have come out on Twitter. People have referred to different kinds of questions.

Rand: Interesting. So you took these and aggregated them.

Will: Yeah. So I just tried to pull . . . I actually ignored quite a chunk that I found because they were hard to turn into questions that I could phrase well for the kinds of people I knew I was going to be sending this questionnaire to. Maybe I'll write some more about that in the accompanying notes.

Rand: Okay.

Will: I basically ended up with some of these questions that were easy to have yes/no answers for anybody. I could just send it to a URL and say, "Yes or no?"

Rand: Huh, interesting. So, basically, I have a list of page level and domain level questions that I ask my survey takers here. I put this into a survey, and I send people through some sort of system. We'll talk about Mechanical Turk in a second. Then, essentially, they'll grade my pages for me. I can have dozens of people do this, and then I can show it to management and say, "See, people don't think this is high enough quality. This isn't going to get past the Panda filter. You're in jeopardy."

Will: That's right. The first time I actually did this, because I wasn't really sure whether this was going to be persuasive or useful even, so I did it through a questionnaire I got together and sent it to a small number of people and got really high agreement. Out of the 20 people I sent the questionnaire to, for most questions you'd either see complete disagreement, complete disarray, basically people saying don't know, or you'd see 18 out of 20 saying yes or 18 out of 20 saying no.

Rand: Wow.

Will: With those kind of numbers, you don't need to ask 100 people or 1,000 people.

Rand: Right. That's statistically valid.

Will: This is looking like people think this.

Rand: People think this article contains obvious errors.

Will: Right. Exactly. So I felt like straight away that was quite compelling to me. So I just put it into a couple of charts in a deck, took it into the client meeting, and they practically redesigned that "catch me" page in that meeting because the head of marketing and the CEO were like okay, yeah.

Rand: That's fantastic. So let's share with people some of these questions.

Will: And they're simple, right, dead simple.

Rand: So what are the page level ones?

Will: Page level, what I would do is typically find a page of content, a decent, good page of content on the site, and Google may well have done this differently, but all I did was say find a recent, good, well presented, nothing desperately wrong with it versus the rest of the content on the site. So I'm not trying to find a broken page. I'm just trying to say here's a page.

Rand: Give me something average and representative.

Will: Right. So, from SEOmoz, I would pick a recent blog post, for example.

Rand: Okay, great.

Will: Then I would ask these questions. The answers were: yes, no, don't know.

Rand: Gotcha.

Will: That's what I gave people. Would you trust the information presented here?

Rand: Makes tons of sense.

Will: It's straightforward.

Rand: Easy.

Will: Is this article written by an expert? That is deliberately vaguely worded, I think, because it's not saying are you certain this article's written by an expert? But equally, it doesn't say do you think this article . . . people can interpret that in different ways, but what was interesting was, again, high agreement.

Rand: Wow.

Will: So people would either say yes, I think it is. Or if there's no avatar, there's no name, there's no . . . they're like I don't know.

Rand: I don't know.

Will: And we'd see that a lot.

Rand: Interesting.

Will: Does this article have obvious errors? And I actually haven't found very many things where people say yes to this.

Rand: Gotcha. And this doesn't necessarily mean grammatical errors, logical errors.

Will: Again, it's open to interpretation. As I understand it, so was Google's. There are some of these that could be very easily detected algorithmically. If you're talking spelling mistakes, obviously, they can catch those. But here, where we're talking about they're going to run machine learning, it could be much broader. It could be formatting mistakes. It could be . . .

Rand: Or this could be used in concert with other questions where they say, boy, it's on the verge and they said obvious errors. It's a bad one.

Will: Exactly.

Rand: Okay.

Will: Does the article provide original content or information? A very similar one. Now, as SEOs, we might interpret this as content, right?

Rand: But a normal survey taker is probably going to think to themselves, are they saying something that no one has said before on this topic?

Will: Yeah, or even just, "Do I get the sense that this has been written for this site rather than just cribbed from somewhere?"

Rand: Right.

Will: And that may just be a gut feel.

Rand: So this is really going to hurt the Mahalos out there who just aggregate information.

Will: You would hope so, yeah. Does this article contain insightful analysis? Again, quite vague, quite open, but quite a lot of agreement on it. Would you consider bookmarking this page? I think this is a fascinating question.

Rand: That's a beautiful one.

Will: Obviously, again, here I was sending these to a random set of people, again which, as I understand it, is very similar to what Google did. They didn't take domain experts.

Rand: Ah, okay.

Will: As I understand it. They took students, so smart people, I guess.

Rand: Right, right.

Will: But if it's a medical site, these weren't doctors. They weren't whatever. I guess some people would answer no to this question because they're just not interested in it.

Rand: Sure.

Will: You send an SEOmoz page to somebody who's just not . . .

Rand: But if no one considers bookmarking a page, not even consider it, that's . . .

Will: Again, I think the consider phrasing is quite useful here, and people did seem to get the gist, because they've answered all of the questions by this point. I would send the whole set to one person as well. They kind of get what we're asking. Are there excessive adverts on this page? I love this question.

Tom actually was one of the guys, he was speculating early on that this was one of the factors. He built a custom search engine, I think, of domains that had been hit by the first Panda update, and then was like, "These guys are all loaded with adverts. Is that maybe a signal?" We believe it is, and this is one of the ones that management just . . . so this was the one where I presented a thing that said 90% of people who see your site trust it. They believe that it's written by experts, it's quality content, but then I showed 75% of people who hit your category pages think there are too many adverts, too much advertising.

Rand: It's a phenomenal way to get someone to buy in when they say, "Hey, our site is just fine. It's not excessive. There's tons of websites on the Internet that do this."

Will: Yeah.

Rand: And you can say, "Let's not argue about opinions."

Will: Yes.

Rand: "Let's look at the data."

Will: Exactly. And finally, would you expect to see this article in print?

Rand: This is my absolute favorite question, I've got to say, on this list. Just brilliant. I wish everyone would ask that of everything that they put on the Internet.

Will: So you have a chart that you published recently that was the excessive returns from exceptional content.

Rand: Yeah, yeah.

Will: Good content is . . .

Rand: Mediocre at this point in terms of value.

Will: And good is good, but exceptional actually sees exponential returns. I think that's a question that really gets at it.

Rand: What's great about this is that all of the things that Google hates about content farms, all of the things that users hate about not just content farms but content producers who are low quality, who are thin, who aren't adding value, you would never say yes to that.

Will: What magazine is going to go through this effort?

Rand: Forget it. Yeah. But you can also imagine that lots of great pieces, lots of authentic, good blog posts, good visuals, yeah, that could totally be in a magazine.

Will: Absolutely. I should mention that I think there's some caveats in here. You shouldn't just take this blindly and say, "I want to score 8 out of 8 on this." There's no reason to think that a category page should necessarily be capable of appearing in print.

Rand: Or bookmarked where the . . .

Will: Yes, exactly. Understand what you're trying to get out of this, which is data to persuade people with, typically, I think.

Rand: Love it, love it. So, last set of questions here. We've got some at the domain level, just a few.

Will: Which are similar and again, so the process, sometimes I would send people to the home page and ask them these questions. Sometimes I would send them to the same page as here. Sometimes it would be a category page or just kind of a normal page on the site.

Rand: Right, to give them a sense of the site.

Will: Yeah. Obviously, they can browse around. So the instructions for this are answer if you have an immediate impression or if you need to take some time and look around the site.

Rand: Go do that.

Will: Yeah. Would you give this site your credit card details? Obviously, there are some kinds of sites this doesn't apply to, but if you're trying to take payment, then it's kind of important.

Rand: A little bit, a little bit, just a touch.

Will: There's obvious overlaps with all of this, with conversion rate optimization, right? This specific example, "Would you trust medical information from this site," is one that I've seen Google refer to.

Rand: Yeah, I saw that.

Will: They talk about it a lot because I think it's the classic rebuttal to bad content. Would you want bad medical content around you? Yeah, okay. Obviously, again only applies if you're . . .

Rand: You can swap out medical information with whatever type is . . .

Will: Actually, I would just say, "Would you trust information from this site?" And just say, "Would you trust it?"

Rand: If we were using it on moz, we might say, "Would you trust web marketing information? Would you trust SEO information? Would you trust analytics information?"

Will: Are these guys domain experts in your opinion? This is almost the same thing. Would you recognize this site as an authority? This again has so much in it, because if you send somebody to Nike.com, no matter what the website is, they're probably going to say yes because of the brand.

Rand: Right.

Will: If you send somebody to a website they've never heard of, a lot of this comes down to design.

Rand: Yes. Well, I think this one comes down to . . .

Will: I think an awful lot of it does.

Rand: A lot of this comes down to design, and authority is really branding familiarity. Have I heard of this site? Does it seem legitimate? So I might get to a great blog like StuntDouble.com, and I might think to myself, I'm not very familiar with the world of web marketing. I haven't heard of StuntDouble, so I don't recognize him as an authority, but yeah, I would probably trust SEO information from this site. It looks good, seems authentic, the provider's decent.

Will: Yeah.

Rand: So there's kind of that balance.

Will: Again, it's very hard to know what people are thinking when they're answering these questions, but the degree of agreement is . . .

Rand: Is where you get something. So let's talk about Mechanical Turk, just to end this up. You take these questions and put them through a process using Mechanical Turk.

Will: So I actually used something called SmartSheet.com, which is essentially a little bit like Google Doc spreadsheets. It's very similar to Google Doc spreadsheets, but it has an interface with Mechanical Turk. So you can just literally put the column headings as the questions. Then, each row you have the page that you want somebody to go to, the input, if you like.

Rand: The URL field.

Will: So SEOmoz.org/blog/whatever, and then you select how many rows you want, click submit to Mechanical Turk, and it creates a task on Mechanical Turk for each row independently.

Rand: Wow. So it's just easy as pie.

Will: Yeah, it's dead simple. This whole thing, putting together the questionnaire and gathering it the first time, took me 20 minutes.

Rand: Wow.

Will: I paid $0.50 an answer, which is probably slightly more than I would have had to, but I wanted answers quickly. I said, "I need them returned in an hour," and I said, "I want you to maybe have a quick look around the website, not just gut feel. Have a quick look around." I did it for 20, got it back in an hour, cost me 10 bucks.
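The pipeline Will describes (one Mechanical Turk task per URL, one question per spreadsheet column) can be sketched directly in code. Here's a minimal, hypothetical Python version against Amazon's Mechanical Turk API; note that Will used SmartSheet's built-in integration, and the question wording, reward, and use of boto3 here are illustrative assumptions:

```python
# Hypothetical sketch of the survey pipeline described above:
# one Mechanical Turk HIT per URL, one question per column.
# (Will used SmartSheet's MTurk integration; this scripts the same idea.)

QUESTIONS = [
    "Would you consider bookmarking this page?",
    "Are there excessive adverts on this page?",
    "Would you expect to see this article in print?",
]

def build_question_xml(url, questions):
    """Build a minimal MTurk QuestionForm asking each question about one URL."""
    items = []
    for i, q in enumerate(questions):
        items.append(
            "<Question>"
            f"<QuestionIdentifier>q{i}</QuestionIdentifier>"
            f"<QuestionContent><Text>{q} ({url})</Text></QuestionContent>"
            "<AnswerSpecification><FreeTextAnswer/></AnswerSpecification>"
            "</Question>"
        )
    return (
        '<QuestionForm xmlns="http://mechanicalturk.amazonaws.com/'
        'AWSMechanicalTurkDataSchemas/2005-10-01/QuestionForm.xsd">'
        + "".join(items) + "</QuestionForm>"
    )

xml = build_question_xml("https://www.seomoz.org/blog/example-post", QUESTIONS)

# Submitting would then be one create_hit call per URL, e.g.:
# import boto3
# mturk = boto3.client("mturk")
# mturk.create_hit(Title="Rate this web page", Description="Quick quality survey",
#                  Reward="0.50", MaxAssignments=20, LifetimeInSeconds=3600,
#                  AssignmentDurationInSeconds=600, Question=xml)
```

At $0.50 an answer and 20 answers per page, that's the same $10-per-page, answers-within-the-hour experiment Will ran.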

Rand: My God, this is the most dirt cheap form of market research for improving your website that I can think of.

Will: It's simple but it's effective.

Rand: It's amazing, absolutely amazing. Wow. I hope lots of people adopt this philosophy. I hope, Will, you'll jump into the Q&A if people have questions about this process.

Will: I will. I will post some extra information, yeah, definitely.

Rand: Excellent. And thank you so much for joining us.

Will: Anytime.

Rand: And thanks to all of you. We'll see you again next week for another edition of Whiteboard Friday. Take care.

Will: Bye.

Wednesday, July 27, 2011

Brand New Open Site Explorer is Here (and Linkscape's Updated, too)

This morning at Mozcon, I announced the launch of Open Site Explorer v3, a long-awaited upgrade to one of the most popular marketing tools on the web. I'm more than a little excited about all the progress, hard work and remarkable features that are included in this upgrade, so let's get right to them.

The first thing you'll notice is the new design (of which I'm a huge fan):

Open Site Explorer Homepage

This continues into the top view of link data and now, social metrics. I've always wanted these to be side-by-side, and it's great to finally be able to see both at the same time.

Open Site Explorer Social + Link Metrics

The menus of filters have improved, and there's now a new visualization to show links as groups in domains or as separate links (like the classic Yahoo! Site Explorer view).

Open Site Explorer Filters

Social metrics are also included in the Top Pages reports, so you can see how the most-linked-to content has performed on the social web. This is particularly cool for popular blogs.

Open Site Explorer Top Pages

The anchor text and linking domains tabs have a new feature that lets you see a sample of the links that come from that domain (or with that anchor text). Beware that right now, there's a small bug where we're sorting those links we do show in some odd ways. This should be fixed in the next Linkscape update.

Open Site Explorer Anchor Text Drilldown

Comparison reports have also taken a nice step forward, and feature the ability to side-by-side compare metrics for pages, subdomains and root domains on up to 5 sites simultaneously. They match the metrics you can get in the PRO web app, as well, which is very cool.

Open Site Explorer Site/Page Comparison

And last, but not least, the new advanced reports tab lets you query like a SQL master! Without having to write any complex logic against our API (though you can still do lots of awesome stuff with that), you can grab any combination of link sorts, filters and keywords you'd like (and exclude data you don't want). This is particularly excellent for link builders looking at competitive or industry-related sites' link profiles, and I expect we'll see a number of blog posts in the near future with strategies on how to employ this tool.

Open Site Explorer Advanced Reports

In addition to all the amazing new features in Open Site Explorer, Linkscape's index just updated using a new infrastructure that's allowed us to crawl much deeper on large, important sites. For many pages/domains, this will mean an increase in the total number of links we report, but likely a lower count of linking domains (unless you've gained a lot of links in late June/July) since we're excluding many domains that are low-quality/not-well-linked-to. We'd love your feedback on this index, as it's the first one of its kind, and will continue to see tweaks/improvements over the next few updates.

  • 58,273,105,508 (58.2 billion) URLs +47% from June (our largest index growth ever from one month to another!)
  • 637,828,397 (637 million) Subdomains +71% (it appears the domains we're crawling have more subdomains)
  • 91,013,438 (91 million) Root Domains -23% (due to the depth vs. breadth focus of this crawl)
  • 456,474,577,597 (456 billion) Links +14%
  • Followed vs. Nofollowed
    • 2.28% of all links found were nofollowed +5%
    • 60.44% of nofollowed links are internal, 39.56% are external
  • Rel Canonical - 9.50% of all pages now employ a rel=canonical tag +20% (my guess is higher quality domains are more likely to employ rel=canonical)
  • The average page has 78.64 links on it (+30% from 60.67 last index)
    • 65.33 internal links on average
    • 13.32 external links on average
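As a quick sanity check, the per-page averages in the list above hang together; the tiny discrepancies are rounding in the published figures:

```python
# Sanity-check the published Linkscape per-page link averages.
internal_avg = 65.33   # average internal links per page
external_avg = 13.32   # average external links per page
total_avg = 78.64      # published average links per page
previous_avg = 60.67   # previous index's average

# Internal + external should equal the total (within rounding).
assert abs((internal_avg + external_avg) - total_avg) < 0.02

# The published "+30%" growth figure checks out.
growth_pct = round((total_avg / previous_avg - 1) * 100, 1)
print(growth_pct)  # 29.6, i.e. roughly +30%
```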

We're looking forward to your feedback on the new features and the new index (which we plan to continue iterating upon). There are actually even more new features coming in September, so stay tuned, and thanks so much for all the support and use of OSE; it's run more than a million reports, and we hope the next million are just around the corner.

Tuesday, July 26, 2011

Leveraging your SEO for Search Retargeting

Here at Moz we work hard to break down those silly silo things (frankly they scare us). We believe that the different pieces of marketing should constantly be communicating with each other. Cyrus (our SEO lead) and I try to communicate on what we are seeing, where we might be overlapping, dropping the proverbial ball and so on and so forth. We know that leveraging each person's daily activities for maximum impact is the key to any company's success.

In the past, on the blog, we've talked about ways to leverage one of your tasks for related gains. Just a few days ago we talked about utilizing your analysis in GA for eCommerce SEO, and a few months ago we talked about how you can repurpose your on page SEO techniques for off page SEO success. These are great examples of how we should all be looking at the work we do daily and ask ourselves, "who else could use this?" and "how can I leverage this information for more gains?" I've never liked the phrase "kill two birds with one stone" (cuz why are we all so cool with killing birds?) so instead I'm coining the phrase "eating two cupcakes with one fork" (cuz we all love cupcakes). Working off that approach, today I'm going to talk about another way you can leverage your SEO duties for marketing success.

cupcakes and a fork
"It's like eating two cupcakes with one fork"
(just go with it)
angry fork photo credit

SEO & Search Retargeting: A Perfect Pair

Specifically, I want to outline a few ways we can take all of the data mining and reports we work on and extend their value by using them for search retargeting. But wait, what the hell is search retargeting? Good question, my friend. To understand search retargeting, we need to first understand retargeting. I wrote a post a while back that defined "retargeting" as "a form of marketing in which you target users who have previously visited your website with banner ads on display networks across the web."

Search retargeting is a subset of retargeting, and takes it one step further. It is a paid acquisition channel that allows advertisers to reach back out to users who have previously searched for their brand name or target keywords.

The difference between the two makes for a huge opportunity. The visitor doesn't have to have visited your site to be added to the audience you target with ads. For those of us who aren't ranking #1 for every word we want to, and are possibly losing visits to our competitors, you can target those lost visitors simply by going after people who searched for words in a category, industry, service, etc. You can quickly see why it would be beneficial for me to know (when setting up these campaigns and my targeting) what Cyrus is up to, and what he has been working on in regards to keyword targeting, our rankings, and more.

In fact, let me show some fun stats to really sell you on the value of search retargeting. Did you know that "retargeted consumers are nearly 70% more likely to complete a purchase as compared to non-retargeted customers"? Couple that with the fact that a number of reports have come out saying that retargeted customers also spend close to 50% more than those who weren't retargeted, and you've got yourself a hot little thing happening.
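To see what those two figures add up to together, here's a quick back-of-envelope sketch. The baseline conversion rate and order value are purely illustrative numbers, and it assumes (simplistically) that the two reported uplifts compound independently:

```python
# Back-of-envelope: combined revenue uplift from retargeting,
# assuming the two reported effects compound independently.
# Baseline numbers are hypothetical, chosen only for illustration.

baseline_conversion = 0.02    # hypothetical 2% conversion rate
baseline_order_value = 100.0  # hypothetical $100 average order

conversion_uplift = 1.70      # "nearly 70% more likely to complete a purchase"
spend_uplift = 1.50           # "spend close to 50% more"

revenue_per_visitor = baseline_conversion * baseline_order_value
retargeted_revenue = (baseline_conversion * conversion_uplift) * \
                     (baseline_order_value * spend_uplift)

print(round(revenue_per_visitor, 2))  # 2.0
print(round(retargeted_revenue, 2))   # 5.1, i.e. roughly 2.55x baseline
```

Under those assumptions, a retargeted visitor is worth roughly two and a half times a non-retargeted one.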

Ways to Recycle Those Hours of SEO Work

Okay, now that we have shown off just how effective search retargeting can be, let's talk about how we can repurpose some of that hard work us SEOs do to help our search retargeting efforts succeed.

#1 Ranking Reports (the "obvious" candidate)

How much time do you spend looking at ranking tools (possibly even ours) to gauge the performance of your target keywords? Hours upon hours are spent by SEOs looking at their rankings, or lack thereof. This information helps us all understand where the actual visits to our site are coming from, and subsequently which keywords are driving conversions. But what about the rankings you can't seem to conquer? For a second, let's focus on the words you simply haven't been able to make any headway on. Those are prime candidates for a search retargeting campaign.

What if you passed that list of words off to your paid marketer counterpart and told them to focus their energy (and budget) on targeting those people with highly targeted ads? That would not only supplement your SEO efforts nicely, but you would also be spending your retargeting budget on a prequalified audience. Paid marketers often spend a great deal of budget trying to isolate a solid audience to go after; you'd be saving them time and money. Much like passing those words off to your PPC manager, you can quickly gain visits from these high-converting, targeted keywords you are having a hard time ranking for.

Example time: The phrase "free seo tools" results in a lot of conversions for us, but as you can see below, we don't rank in the top five for it.

free seo tools query
Our efforts to increase our rankings here have been slow to pay off. While we continue to work on the SEO front, we can supplement with highly targeted ads.

Target: People who search for "free seo tools," "seo tools," "cheap seo tools," etc. with ads like this:

They speak directly to the searcher's intent, "free seo tools," and would likely produce both a high CTR for us and increased conversions. This can help us grow our free trial numbers while we figure some things out on the SEO front and get our rankings up for "free seo tools."

#2 Second Tier Keywords (a "little less obvious" candidate)

Oh keyword research, how we love thee. Okay, maybe some of us don't loveeee it, but it's a huge part of the process. SEOs spend hours pulling likely keyword targets, traffic data, and competitive data to help them decide what to go after next.

In that gold mine of keyword data are dozens of likely search retargeting candidates. SEOs know that not every word they deem valuable can be a priority right now for their company or their clients. These get pushed into some second-tier keyword bucket that often doesn't get as much content, link building, or other resources allocated to it. My advice? Send that list on over to your paid marketer. Ask them to target these topics, site categories, etc. with their search retargeting ads until you have some more time and resources available to go after them.

Example time: Let's say we are ranking well for "seo tools" and "seo software" but we don't have the time to build an SEO campaign around the idea of "SEO resources." We know there are a number of people searching for this niche (SEO newbies, SEO students, etc.), and we know we have a ton of valuable content around the topic. So how can we help people find our resources, and associate us with being an SEO resource, if they have never heard of SEOmoz or visited us before?

Target: users that visit {seo blogs, seo training sites, and seo tool providers} with the below ad:

By using the word "resources" we are speaking directly to this user's need for more SEO learning material. We also get the added benefit of lining our logo up with this type of value-add, which, hopefully, down the road could result in a visit to our site and possibly a free trial signup.

#3 Competitive Research (a "no one out there is really doing this, so go kick some butt" candidate)

My favorite part. I don't know what it is about competitive research that has us all thinking the information we gather is context-specific, but it's true. I remember the first time I mentioned to a PPC colleague that she should look at SEMrush's SEO results for one of our competitors to build out our PPC campaigns; you would have thought I had just smacked a puppy. She was in shock.

The truth is, we are all playing on the same field here, guys. A lot can be gained by studying your competitors' total efforts, not just their paid or organic ones. Next time you spend time spying on your competitors' organic efforts, pass that information off to your paid marketer and ask them to build a search retargeting campaign around it. Because let's be real: we all have limited time and resources, and some of their targets will never make it onto your prioritization list. Plus, you will often see brand association start to shift through retargeting, which will reciprocally help your SEO efforts. Whoa, cool, huh?

Example time: Look at the results below from when I used SEMrush to view some of our competitors' top keyword rankings. While we perform well organically for "seo software," "best seo software," etc., we don't necessarily have many SEO campaigns around the concepts of "powerful" and "easy/simple."

rankings for SEMrush

This tool is showing us that our competitors are cleaning up here, and we know we need to at least be building some brand sentiment around these adjectives. Search retargeting can help.

Target: We can set up search retargeting ads for people that are searching online for these terms, and then target them with ads that directly speak to this. Below you can see we have incorporated these words to help build our brand association with them.