Biznology Blog: April 2006




April 28, 2006

Federated search always seems just around the corner

Federated search, sometimes called metasearch, has been filling up conference agendas for years, and The Search Engine Meeting this week was no exception, as Abe Lederman, the President of Deep Web Technologies, explained his company's approach. Federated search has many appealing qualities, but is rife with the kind of technical hurdles that make a researcher's head spin. Is federated search finally on its way? Should you be considering that kind of approach for your Web site?

In some sense, federated search has been with us for years: Dogpile, Metacrawler, and others have long provided an ability to search across multiple search engines, and similar approaches are available today for your Web site. The problem is that federated results just aren't as good as the "one-big-index" search engines, such as Google. At first glance, that seems odd. After all, if one search engine can do a good job, wouldn't searching across ten search engines do an even better job?

So far, the answer is no.

Federated search engines are limited in what they know about the documents they find, because they don't actually crawl and index those documents—the underlying one-index search engines do. So, while Google's spider looks at billions of documents across the Internet, Dogpile does not look at any—it merely gets the list of results from Google (and other search engines) and stitches together a list of search results.

Because Dogpile doesn't actually examine the documents, it suffers from limitations that degrade its results. Relevance ranking, while difficult in a single-index search engine, is excruciating for a federated search engine. Google can rank documents based on where the words appear in the documents, which documents get links to them, and dozens of other factors. Dogpile can't. Dogpile can only take a guess at which documents are better by examining the titles, snippets, and URLs that Google returns to display on its search results screen. That's why most people prefer Google, or Yahoo!, or another one-index search engine to Dogpile and Metacrawler.
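To make that limitation concrete, here is a minimal sketch of how a federated engine might merge lists when all it has are the positions each underlying engine assigned. It uses reciprocal rank fusion, a well-known rank-merging technique chosen here for illustration; nothing in this post says Dogpile or Metacrawler work this way, and the engine lists and URLs are hypothetical.

```python
from collections import defaultdict

def fuse_rankings(result_lists, k=60):
    """Merge ranked URL lists using reciprocal rank fusion (RRF).

    Each engine contributes 1 / (k + rank) for every URL it returns,
    so only the position in each list matters, never the document text.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical result lists from three underlying engines.
engine_a = ["example.com/a", "example.com/b", "example.com/c"]
engine_b = ["example.com/b", "example.com/a", "example.com/d"]
engine_c = ["example.com/b", "example.com/c", "example.com/a"]

merged = fuse_rankings([engine_a, engine_b, engine_c])
print(merged)
```

Notice that the function never opens a document. It can reward agreement between engines, but it has no access to word positions, link data, or any of the other signals a one-index engine sees.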

Similarly, one-index search engines have clever "de-duplication" algorithms to make sure that they are not storing multiple copies (or near-copies) of documents in their indexes. If they did, searchers would see several documents in their list of search results that are essentially the same. One-index search engines can de-duplicate because they see the actual documents as part of their indexing process. Federated search engines can't easily "de-dup" documents when they are returned from multiple search engines. They can perform simple operations such as eliminating identical URLs, but they can't eliminate near-duplicates very easily.
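As a rough illustration of the difference, the sketch below removes identical URLs (the easy part) and then guesses at near-duplicates by comparing three-word shingles of the snippets with Jaccard similarity. The normalization rules, the 0.5 threshold, and the sample results are all assumptions made for the example, not any vendor's algorithm.

```python
from urllib.parse import urlsplit

def normalize_url(url):
    """Reduce a URL to a comparable key (illustrative rules, not a standard)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")

def shingles(text, size=3):
    """Return the set of overlapping word n-grams in a snippet."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def jaccard(a, b):
    """Share of shingles two snippets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def dedupe(results, threshold=0.5):
    """Keep the first of any results with the same URL or very similar snippets."""
    kept, seen_urls = [], set()
    for url, snippet in results:
        key = normalize_url(url)
        if key in seen_urls:
            continue  # identical URL: the simple case
        current = shingles(snippet)
        if any(jaccard(current, shingles(s)) >= threshold for _, s in kept):
            continue  # snippet-level guess at a near-duplicate
        seen_urls.add(key)
        kept.append((url, snippet))
    return kept

# Hypothetical merged results from several engines.
results = [
    ("http://www.example.com/widgets/", "Acme widgets are the best widgets for every home and office"),
    ("http://example.com/widgets", "Acme widgets are the best widgets for every home and office"),
    ("http://mirror.example.org/w", "Acme widgets are the best widgets for every home and garden"),
    ("http://other.example.net/gadgets", "Gadgets and gizmos reviewed by independent testers each month"),
]
unique = dedupe(results)
print([url for url, _ in unique])
```

The snippet comparison is only a guess: two different pages can share a snippet, and two copies of the same page can be excerpted differently by different engines. That is the weakness of working without the documents themselves.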

So, federated search engines have long suffered with lower-quality results than one-index search engines, but they make up for it with slower performance. OK, it's no laughing matter to the federated search engines, but the truth is that a federated search can only be as fast as the slowest one-index search engine it uses. Some fancy tricks can be played to begin showing results before all the one-index search engines have returned their results to the federated search facility, but the final result is always slower than the slowest one-index engine.
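The arithmetic is simple enough to sketch. Assuming the underlying engines are queried in parallel, the complete merged list cannot be ready before the slowest engine answers, and any merging work is added on top. The engine names and millisecond figures below are made up for illustration.

```python
def federated_latency(engine_latencies_ms, merge_overhead_ms=0):
    """Time until the complete merged list is ready, with engines queried in parallel."""
    return max(engine_latencies_ms.values()) + merge_overhead_ms

def engines_ready_by(engine_latencies_ms, deadline_ms):
    """Engines whose results could be shown early, before the slow ones respond."""
    return sorted(name for name, ms in engine_latencies_ms.items() if ms <= deadline_ms)

# Hypothetical response times for three underlying engines.
latencies = {"engine_a": 120, "engine_b": 300, "engine_c": 950}

total = federated_latency(latencies, merge_overhead_ms=40)
early = engines_ready_by(latencies, deadline_ms=400)
print(total, early)
```

Showing partial results at a deadline is the "fancy trick" mentioned above: it improves the perceived speed, but the final list still waits on the slowest engine.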

So, with all these technical hurdles, why are folks still working on federated search?

  • scale. Some federated search enthusiasts believe that one-index search engines will eventually run out of steam because the number of documents will exceed their capacity. Thus far, those predictions have not panned out as one-index search engines have applied parallel processing techniques to essentially create federated search inside themselves, so they gain all the scaling advantages of federated search while retaining all the information about the documents themselves.

  • latency. Google's spider can't crawl the Internet constantly, so there is always a time lag between when a document is changed and when the search engine knows that it changed. In theory, if a document need only be indexed in one small search index, that search engine could keep up with those fewer changes more rapidly—perhaps each document that changed could be sent to the search engine rather than waiting passively to be crawled. Federated search, in theory, would search all these small search engines and provide more up-to-date results than the one-index search engines.

  • reach. Despite Google's goal of making the world's information accessible, it has a long way to go. Treasure troves of information are squirrelled away in private (fee-based) databases beyond the spider's prying eyes. Publishers may be willing to allow a metasearch engine to query their private search engine when they are not willing to have their contents crawled. This sets up an odd dynamic for one-index search engines where the same problems are rehashed to be solved over and over again. Amazon offers search-within-the-book, but Google can't use Amazon's data, so it set up its own Google Print program. Yahoo! has also struck out on its own path to digitize books, but none of these efforts work together. Federated search engines might be able to search any of these printed book indexes to find a searcher's answer. Object-oriented theorists believe that objects should be "findable" so that a search engine can literally query each document and have the documents respond if they should be found. Many problems exist with this theory, ranging from spam to performance, but who knows what the future will hold?
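On the scale point in the list above, here is a toy sketch of what "federated search inside themselves" could look like: one index split into shards, each searched separately, with the partial results merged. The scoring function and documents are invented for the example; the point is only that every shard sees the full text and uses the same scoring, so the merged scores are directly comparable.

```python
import heapq

def score(query_terms, doc_text):
    """Toy relevance score: count of query-term occurrences in the full text."""
    words = doc_text.lower().split()
    return sum(words.count(term) for term in query_terms)

def search_shard(shard, query_terms, k):
    """Score every document in one shard and return its top k (score, doc_id) pairs."""
    scored = [(score(query_terms, text), doc_id) for doc_id, text in shard.items()]
    return heapq.nlargest(k, [hit for hit in scored if hit[0] > 0])

def search_all(shards, query, k=3):
    """Scatter the query to every shard, then gather and merge by score."""
    terms = query.lower().split()
    partials = [hit for shard in shards for hit in search_shard(shard, terms, k)]
    return [doc_id for _, doc_id in heapq.nlargest(k, partials)]

# Hypothetical documents split across two shards of the same index.
shards = [
    {"d1": "federated search merges results", "d2": "cooking with gas"},
    {"d3": "search search search engines", "d4": "gardening tips"},
]
top = search_all(shards, "search engines", k=3)
print(top)
```

A true federated engine has no such luxury: each outside engine scores with its own undisclosed formula, so the numbers (if it gets any at all) cannot be compared this way.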

Relational database vendors have used federated approaches for years, so that each database is able to be searched in massive data warehouse applications. Federated relational databases are simpler than federated search, however, because each database uses the same SQL language to be searched, whereas there isn't any real equivalent language that works across search engines. Moreover, with relational searches ("Show all the payroll records where the salary is higher than $60,000") there is a single right answer. With search queries, every search engine would provide different relevance-ranked lists of documents even if they had exactly the same documents in their search indexes.
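The payroll example can be sketched with two in-memory SQLite databases standing in for the federated sources. The table layout, names, and salaries are hypothetical; the point is that the same SQL text runs unchanged against each database and the combined answer needs no ranking.

```python
import sqlite3

def make_db(rows):
    """Create an in-memory database holding one payroll table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE payroll (employee TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO payroll VALUES (?, ?)", rows)
    return conn

# One query, in one language, understood by every source.
QUERY = "SELECT employee, salary FROM payroll WHERE salary > ?"

def federated_query(connections, threshold):
    """Run the same query against each database and combine the rows."""
    rows = []
    for conn in connections:
        rows.extend(conn.execute(QUERY, (threshold,)).fetchall())
    return sorted(rows)

# Hypothetical regional payroll databases.
east = make_db([("Ana", 72000), ("Ben", 58000)])
west = make_db([("Cho", 61000), ("Dev", 60000)])

high_earners = federated_query([east, west], 60000)
print(high_earners)
```

Every row either satisfies the condition or it does not, so merging is just concatenation. There is no equivalent of two engines disagreeing about which document deserves to be first.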

Abe Lederman, in his talk this week, described how his company is working to improve their de-duplication algorithm and how they are using several approaches to relevance ranking (including retrieving entire documents at search time to decide which ones should be ranked higher). The work is fascinating to search technology buffs like me, but it's not clear how important it is to a business trying to improve search on its Web site.

If you already have multiple search engines on your site, federated search is an appealing way to build on top of that existing investment, but current federated search facilities don't easily provide the kind of search experience that most one-index search engines do. So far, you would still be better off wiping out those smaller search engines and putting in one big one. But with all this research activity going on, someday the answer may be different.

Posted by MikeMoran at 8:35 PM | Comments (0) | TrackBack

April 26, 2006

Are all search marketers spammers?

Steve Arnold says yes. Who's Steve Arnold? He's both the president of Arnold Information Technology and an author who spoke at The Search Engine Meeting yesterday, the same conference I spoke at on Monday. Apparently my talk, which I considered rather uncontroversial, piqued Steve's interest, because he referred to it a half dozen times during his own talk, describing the techniques that I advocated as "spamming." I don't know anyone that agrees with him, but he says his charge speaks to the generational divide in search technologists.

I began working in publishing and search in the late '70s, so I should clearly be part of that older generation. I clearly recall search engines such as IBM's STAIRS, which, while interesting at the time, pale in comparison to what we expect of search engines today. A large database back in the 1970s was a few hundred thousand documents, which is the number of documents returned by an average Google search today. Searchers had to understand what was in the database, what nomenclature was used, and the Boolean syntax to enter a search query. I was part of the team that developed the first commercial search engine that used linguistics (so that the word "mice" matches "mouse"), which is part of every search facility in use today. While we were all pioneers back then, I don't think even Steve wants to return to those days.

But Steve does seem to pine for the days when all documents were manually classified by trained librarians and researchers. He rails against folksonomies, such as del.icio.us, and bemoans the "manipulation" that content providers can exercise over search results, highlighting an article in his talk where digg was exposed as susceptible to trickery. In response to a question from the audience after his talk, he agreed that Google's results are "corrupt." And, honestly, he has a point. There's nothing magical about folksonomies giving you the right answer—for popular subjects, they may do just fine, but a researcher-tagged database might do a lot better for other subjects. And neither Google nor any other search engine gives you objectively relevant results, but that is because there is no such thing as objective relevance—relevance is in the eye of the beholder. The bottom line is that any system undoubtedly has its Achilles' heel, where someone out to manipulate results can do so. A good system is hard to manipulate and strives to improve every day in that respect. In my opinion, Google, Yahoo!, MSN Search, and many other systems qualify as good systems in that respect. So Steve has a point, but only because objective relevance does not exist.

But Steve goes too far when he describes every technique that makes content attractive to a search engine as "spam," as he did in his talk. In fact, he made it clear that he considered the advice I gave to folks to optimize their content to be advising them to spam the search engines. Now, I suppose folks can disagree on the meaning of the word "spam," but to me, the search engines' terms of service define spam. The search engines describe many techniques that go beyond that ethical line, techniques that they consider to be spam, everything from keyword stuffing to link farms to cloaking. Bill Hunt and I spend lots of space in our book, Search Engine Marketing, Inc., explaining what these techniques are, why they are bad for searchers, search engines, and even search marketers. These techniques are unethical and we don't advocate them in any way.

But Steve doesn't accept the standard definition of spam, expanding "spam" (in his mind) to include anything that makes your content more attractive to search engines—including all the legitimate techniques of search engine optimization. By doing so, Steve impugns the ethics of any search marketer pursuing efforts that are perfectly acceptable to the search engines. He dilutes the term "spam" and dissipates the outrage possible for real spam techniques. He makes a mockery of the work that Google's Matt Cutts and other search engine employees do to combat genuine spam, by extension calling into question why their terms of service are so lenient. If any organic search marketing technique can be called "spam," the term ceases to have any meaning.

Steve especially took issue with my advice that people "add keywords to their content" so that search engines would find their pages. (I advised this in the context of Web site search, for which there is no spam issue, but I also mentioned that this technique helps your rankings with Internet search engines such as Google.) I showed that IBM's page on Product Lifecycle Management went from #175 in Google to #1 when we did a few simple things such as adding those keywords to the page. (The keywords were missing completely originally, so adding them to the page hardly constitutes the spam technique of keyword stuffing.) Steve stated that merely adding keywords to a page is spamming. I don't know why IBM's leading offering for Product Lifecycle Management ought to be relegated to #175 just because we didn't understand what search engines are looking for, but that appears to be what Steve is advocating. Steve went so far as to say that search engine optimization techniques are leading to the "end of relevance." (This will come as a shock to the hordes of searchers that rely on these results each day because they are more convenient and often more comprehensive than the libraries and manually-tagged databases of yore.) Andrew McKay of FAST commented later in the day that Steve's outlook was the "most defeatist thing he had heard in a long time: just because not all problems have been solved is there a reason to give up."

In a private discussion after his talk, I challenged Steve on his definition of spam, pointing out that his definition is one that even the search engines don't share. Steve did not defend his view in any way, merely saying, "I apologize," but he also went on to say that he will continue to use the word "spam" the same way in the future. As I continued to ask him why he would keep using the term "spam" to include ethical behavior, he continued to remind me that he had apologized, but offered no insight into his reasoning. Honestly, when someone apologizes for doing something but says that he is going to continue to do it, it's not clear how heartfelt that apology is.

Steve also seems nostalgic about the days of manual tagging by experts, but those systems seem every bit as susceptible to manipulation as any other. There are scattered reports of payola to Open Directory editors for listing Web sites in that directory. Whether they are true or not, it's clear that having human beings involved in the tagging process does not eliminate the possibility of manipulation.

Steve would have you believe that we are somehow worse off with the search we have today than the researcher's walled garden of yesteryear, but I don't see how. Clearly, it would be troubling if tobacco companies dominated the search results for "smoking," but they don't. Pharmaceutical companies don't dominate the results for diseases, and political parties don't hold sway on public issues. In fact, the Web, and search on the Web especially, has given voice to folks without big money behind them in a way that no other media has.

When pressed on his worldview that demonizes ethical search optimization behavior, Steve described himself as coming from the camp of manually indexing content, saying, "I am 64 years old and not about to change" and, "If I was 30, maybe I would have to adapt to these new things." Perhaps. But he certainly has adopted that new-fangled terminology of "spam," so it would be nice if he used the word properly instead of getting attention by creating artificial controversy. As someone old enough to remember the old days, I'm here to tell you they weren't all that good. I'd also like to think that we could discuss the changes that have occurred in search over the years without questioning people's character, but to each his own. I hope that my experience that goes back over 25 years helps inform me about the new things you see today, rather than causing me to wax nostalgic. I also hope I never feel I am so old that I can't change my opinion in the light of a changing world.

Posted by MikeMoran at 10:38 AM | Comments (9) | TrackBack

Web 2.0 Makes Marketing a Conversation

Everyone is talking about blogs, wikis, podcasts, and more. Are you puzzled by Web 2.0? Or are you unsure of what effect it has on marketing? Find out what you need to know in my April Biznology newsletter, Web 2.0 Makes Marketing a Conversation.

Posted by MikeMoran at 8:47 AM | Comments (0) | TrackBack

April 24, 2006

Don't Just Change the Search Engine

150 folks turned out at the Fairmont Copley Plaza in Boston for The Search Engine Meeting. I was pleased to be able to present about how to improve your Web Site Search facility in ways beyond the search engine itself. You can download my slides for Don't Just Change the Search Engine.

Posted by MikeMoran at 12:19 PM | Comments (0) | TrackBack

April 22, 2006

Appearance in Boston on Monday

If you are in the Boston area, I will be appearing at the Search Engine Meeting, speaking at around 10 am Monday on the subject, Don't Just Change the Search Engine. You'll learn how many different ways there are to improve your Web site search facility besides dragging in new technology. See how improvements in content, indexing, user interface, and other areas can give your searchers the answers they need.

Posted by MikeMoran at 12:01 AM | Comments (0) | TrackBack

April 21, 2006

Your Interactive Marketing Future

I spent a very interesting day yesterday at the Reaching Your Global Customers conference in Lancaster, Pennsylvania, hosted by the World Trade Center. I did not know that the World Trade Center is an international organization with more than 300 locations that promote trade between nations; I knew only about its eponymous, ill-fated headquarters building in New York. More than 100 people gathered in central Pennsylvania to receive a copy of our book, Search Engine Marketing, Inc., and to hear several speakers discuss world trade with business and national leaders from dozens of countries. You can look at the slides from my talk on Your Interactive Marketing Future.

Posted by MikeMoran at 3:27 PM | Comments (0) | TrackBack

April 18, 2006

Recording of Driving Demand from Site Search

Thanks to those of you who tuned into Friday's Webinar. You can listen to the recording for Driving Demand from Site Search to see how your Web site's search engine can increase traffic to your site as well as your conversion rate for existing traffic.

Posted by MikeMoran at 10:04 AM | Comments (0) | TrackBack

April 17, 2006

Two speeches in Pennsylvania

I'll be in Lancaster, Pennsylvania on Thursday at the Reaching Your Global Customers conference giving the keynote speech called Your Internet Marketing Future and a breakout session on Step-by-Step Search Marketing Success. If you spring for a full-day registration, you'll get a free copy of our book, Search Engine Marketing, Inc. I hope to see you there.

Posted by MikeMoran at 11:04 PM | Comments (0) | TrackBack

April 11, 2006

Book review by Raymond Sonoff

Some more kind words about our book, Search Engine Marketing, Inc., have been posted by Raymond Sonoff of Sonoff Consulting. Raymond writes frequently about Web site issues and he included some feedback on our book in a recent roundup of Web site productivity resources.

Posted by MikeMoran at 10:25 AM | Comments (0) | TrackBack

April 10, 2006

Driving Demand Through Advanced Site Search

I am appearing at an e-Commerce Leadership Webinar this Friday at 2 pm US eastern time to explain how you can use your site search engine to drive higher conversion rates and increased traffic to your site. You may have overlooked some simple ways to get more value out of your search engine to drive your site's navigation, to create landing pages for search marketing and e-mail campaigns, and more. Join us so we can unlock a few site search secrets to improving your sales.

Over 250 people have registered for this free event, but you can still take advantage if you act quickly. Check out the event registration page for the details.

Posted by MikeMoran at 11:50 AM | Comments (0) | TrackBack

April 4, 2006

Reduce e-Commerce Pogosticking

Way back in 2001, noted usability expert Jared Spool offered a key insight into shopper behavior. Shoppers buy more when they have no need for backtracking, which Spool calls pogosticking—choosing a link and then hitting the back button to return to the previous page to choose another link. Spool recommends that product list pages show all the information needed to allow shoppers to choose the right link to follow, which makes perfect sense. Unfortunately, it's easier to say than to do. What do you do when your page would be so filled with information that shoppers could be overwhelmed with data?

Until recently, there wasn't much you could do. You could undertake user testing to see what product information was needed on the product list page. You could try several different sample layouts to see which version provides the best clickthrough and conversion rates. But recently, one more choice has been added.

A new programming technique, known as AJAX, combines the power of JavaScript and XML to refresh parts of Web pages without redrawing the entire page. Netflix shows a great example of how AJAX can be used for product lists.

At first glance, the Netflix page seems ripe for pogosticking. Look at those pictures of movies for rent—how can shoppers decide which movies they want from that? Aren't they sure to click several different movies to see details on each one before finally deciding which ones to rent?

Not with AJAX they don't. Try moving your mouse over one of the movies as if you were going to click it. Watch what happens. You are immediately shown a cartoon bubble that contains more information about the movie—maybe enough to help you decide which one to actually click.

This mouseover experience offers the best of both worlds. Page designers can show many movies in a very small space, allowing shoppers to scan for what is desired. But as the shopper considers each one, mousing over each one reveals all the information needed. The result? A neat, elegant look for a page loaded with information.
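For the curious, here is a minimal sketch of the server half of such a preview, under stated assumptions. I have no knowledge of how Netflix actually implements its bubbles; the catalog, the field names, and the preview_xml function are all hypothetical. The idea is that a script in the browser requests this small XML fragment when the mouse hovers over a movie, then fills the bubble with it instead of loading a whole new page.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog; a real site would look this up in its product database.
MOVIES = {
    "m1": {
        "title": "Example Movie",
        "year": "1999",
        "summary": "A short synopsis shown in the bubble.",
    },
}

def preview_xml(movie_id):
    """Build the small XML fragment a mouseover request would receive."""
    movie = MOVIES.get(movie_id)
    if movie is None:
        return '<preview found="false" />'
    root = ET.Element("preview", {"found": "true", "id": movie_id})
    for field in ("title", "year", "summary"):
        ET.SubElement(root, field).text = movie[field]
    return ET.tostring(root, encoding="unicode")

print(preview_xml("m1"))
```

The fragment carries only what the bubble needs, which is what keeps the mouseover fast: the product list page stays put while a few hundred bytes travel back and forth.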

AJAX has many other uses, but previewing links is one of the most powerful. If your e-Commerce site does not preview links, check out AJAX today.

Posted by MikeMoran at 10:21 PM | Comments (0) | TrackBack