In the first part, I wrote about the fallacy of using people with thousands of followers to illustrate how you can get great results if you ask questions on Twitter.

In this part, I’ll focus on why the conversational nature of Twitter makes searching it effectively a hard problem.

Consider this exchange:

@CherylHaas: Celebrating my newly purchased iPhone. w00t!!! No longer a Luddite. App suggestions, please?
@rakeshlobster: yelp and shazam and Facebook

This is how people interact on Twitter: partly because we’re lazy, partly because a lot of the interaction is done from mobile devices where typing is hard, and partly because of the 140-character limit on tweets.

Between these two tweets, we have an answer to the query “iPhone app”. But Twitter Search treats these tweets independently. As a result, if you search for “iPhone app”, you’d get Cheryl’s question. Not very helpful.

If you search for “shazam,” you’ll get back my response. But there’s no context for it. The meaning of my response is lost without the context of Cheryl’s question. The question could have been “what apps are causing your iPhone to crash?” This happens in ordinary conversation on Twitter; when people are slow at responding and I get a “@rakeshlobster yes,” I’ll sometimes have forgotten the context.
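To make the point concrete, here is a minimal sketch of the difference between indexing tweets independently and indexing a reply together with the tweet it answers. It is not how Twitter Search works internally; the record fields (`id`, `reply_to`), the naive tokenizer, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical tweet records; the field names are assumptions for illustration.
tweets = [
    {"id": 1, "user": "CherylHaas", "reply_to": None,
     "text": "Celebrating my newly purchased iPhone. w00t!!! "
             "No longer a Luddite. App suggestions, please?"},
    {"id": 2, "user": "rakeshlobster", "reply_to": 1,
     "text": "yelp and shazam and Facebook"},
]

def tokenize(text):
    # Lowercase and strip surrounding punctuation; deliberately naive.
    return {w.strip(".,!?").lower() for w in text.split()} - {""}

def build_index(tweets, with_context):
    """Map each word to the ids of tweets containing it.

    With with_context=True, a reply is indexed under its parent's words too.
    """
    by_id = {t["id"]: t for t in tweets}
    index = defaultdict(set)
    for t in tweets:
        text = t["text"]
        parent = by_id.get(t["reply_to"])
        if with_context and parent is not None:
            text = parent["text"] + " " + text
        for word in tokenize(text):
            index[word].add(t["id"])
    return index

def search(index, query):
    # Return ids of tweets that contain every word in the query.
    matches = [index.get(w, set()) for w in tokenize(query)]
    return set.intersection(*matches) if matches else set()

independent = search(build_index(tweets, with_context=False), "iPhone app")
threaded = search(build_index(tweets, with_context=True), "iPhone app")
```

With independent indexing, the query matches only Cheryl’s question (tweet 1). With the parent’s words attached, my reply (tweet 2) matches as well, which is the tweet that actually contains the answer.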

This problem could be alleviated if Twitter presented threaded conversations. But then Google could just as easily index the conversation, as it does with Yahoo! Answers.

Another issue is that people don’t write for Twitter the way they write for search engines. Compare my tweet above with this post I wrote on my favorite iPhone applications. That was written with searchability in mind. There’s also a lot of shorthand on Twitter. @maryvale shortened “Nikon D80” to “D80” in her tweet discussing my last blog post.

That may change if searching Twitter takes off, but it would also change the nature of Twitter. I’ve been experimenting with adding more keywords in my tweets. For example, when I dropped my laptop, I originally wrote:

“laptop hinge broken. argh. it’s pretty, sleek and light. and extremely delicate.”

But then I added in “toshiba portege r500 is”. It’s more searchable, but it makes the conversation sound stilted and robotic.

Another challenge with searching Twitter for information is that a lot of the value in Twitter is not in the tweets, but in what the tweets point to. With the extensive use of URL shorteners like TinyURL and bit.ly, even the minimal keywords that a URL might carry are lost.
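A small sketch shows what gets lost. Both URLs below are invented, and `url_keywords` is a hypothetical helper, not part of any search engine: it simply pulls word-like pieces out of a URL path.

```python
import re
from urllib.parse import urlparse

def url_keywords(url):
    """Extract word-like tokens from a URL path, skipping numbers and very short pieces."""
    path = urlparse(url).path.lower()
    pieces = re.split(r"[^a-z0-9]+", path)
    return [p for p in pieces if len(p) > 2 and not p.isdigit()]

# Invented example URLs; the short code is a made-up placeholder.
long_url = "http://example.com/2009/03/favorite-iphone-applications"
short_url = "http://bit.ly/abc123"

long_words = url_keywords(long_url)
short_words = url_keywords(short_url)
```

The long URL yields “favorite”, “iphone” and “applications”. The shortened one yields only an opaque code that tells a search engine nothing about the destination.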

Beyond the content difficulties in search, there are the related issues of search order and authority.

The results that you get back are sorted chronologically and are highly dependent on when you search. Although the “best” answer for a search can fluctuate over time (one of my criticisms of Google is that its algorithms don’t do enough to counter the effects of Web rot), for most searches it doesn’t vary dramatically over the course of a day or a week. A notable exception would be queries like “what’s a good party at SXSW right now?”

As with asking questions of the Twitterverse, searching Twitter doesn’t provide any guidance as to whose answers are better than others. Searching Twitter is in some ways like stepping back 15 years in search technology, before search engines widely used off-page clues and link authority to rank results.

Some suggestions have revolved around developing authority rankings based on number of followers, number of tweets, etc. The problem with that is that no one person is an authority on everything. A search result from Om Malik (@Om) on telecom should be ranked much higher than a result from Om on migration patterns of birds in Africa. Review sites like Amazon and Yelp have devoted a lot of energy to helping people determine which results are valuable. Twitter will have to develop something similar.
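Here is a rough sketch of why a single global score falls short. The handles, follower counts, and per-topic scores are all invented, and how one would actually compute topic-level authority is left open; the point is only that the two rankings disagree.

```python
# Invented follower counts for two hypothetical accounts.
followers = {"telecom_writer": 500_000, "bird_watcher": 300}

# Invented per-topic authority scores in [0, 1]; deriving these is the hard part.
topic_authority = {
    ("telecom_writer", "telecom"): 0.9,
    ("telecom_writer", "birds"): 0.1,
    ("bird_watcher", "birds"): 0.8,
}

def rank(users, topic, use_topic):
    """Order users by per-topic authority, or by raw follower count."""
    def score(user):
        if use_topic:
            return topic_authority.get((user, topic), 0.0)
        return followers.get(user, 0)
    return sorted(users, key=score, reverse=True)

authors = ["telecom_writer", "bird_watcher"]
by_followers = rank(authors, "birds", use_topic=False)
by_topic = rank(authors, "birds", use_topic=True)
```

Ranked by followers, the telecom writer wins a query about birds. Ranked by topic, the bird watcher comes first, which is what a searcher would want.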

Despite today’s issues, the immense amount of data that Twitter and Facebook are collecting could be used to build a better, more spam-resistant search engine. The marriage of search and social networks has the potential to get us better and more credible answers, while also increasing our connections to our friends.


Disclosure: I worked with several members of Twitter’s search team at AOL Search. While I don’t buy the current blogosphere hype about Twitter being a Google killer with today’s technology, the guys I know are very smart and I look forward to seeing what they do next.

3 responses

Realtime Twitter search is not a Google killer « reDesign

March 13, 2009

[…] Part 2: Challenges of searching Twitter […]

Adam

March 14, 2009

Very thoughtful note, Rocky. I like how you recognize that there’s indeed value in micro-blog entries (like on Twitter)… but that the value is mixed and could be quite difficult to mine.

One nitpick: I believe many tweets show up with an “in reply to [username, linked to their original tweet]”. So this can help users (and potentially search engines) better understand the context of @replies.

Still, though, you’ve rightly highlighted how fractured conversations can be on Twitter. This indeed is why I like Twitter as a broadcast medium, but shun it as a way to converse with others.

Rocky Agrawal

March 14, 2009

Adam, absolutely, people use @replies. But stitching them together into something meaningful is not an easy task. I tried doing that manually with Om’s query on Indian restaurants that I referenced in the previous post.

You can have multiple conversations with someone at the same time. Because the @reply is directed at a person, there isn’t an easy way to pick out which are part of a conversation and which are not. In a short timeframe, you could have 5 @replies which are related to the question and another 3 that aren’t. You can also have @replies that come in outside the time window.

Multiple people can engage in the conversation. Trying to figure out which of these @replies are part of the conversation and which are not is hard.

The easiest way I can think of to do this is to use a time window, but then you’ll certainly get extraneous tweets that aren’t part of the conversation. To algorithmically stitch this together would be extremely difficult.
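A rough sketch of the time-window approach shows both failure modes. The users, timestamps, and tweet text are invented, and `replies_in_window` is a hypothetical helper written only for this illustration.

```python
from datetime import datetime, timedelta

# Hypothetical tweets; users, times, and text are invented for illustration.
question = {"user": "asker", "time": datetime(2009, 3, 14, 12, 0),
            "text": "Any good Indian restaurants nearby?"}
stream = [
    {"user": "friend_a", "time": datetime(2009, 3, 14, 12, 5),
     "text": "@asker try the dosa place on Main"},
    {"user": "friend_b", "time": datetime(2009, 3, 14, 12, 10),
     "text": "@asker are we still on for Friday?"},  # unrelated, inside the window
    {"user": "friend_c", "time": datetime(2009, 3, 14, 18, 0),
     "text": "@asker the tandoori spot downtown"},   # related, outside the window
]

def replies_in_window(question, stream, window):
    """Return tweets that mention the asker within `window` after the question."""
    mention = "@" + question["user"]
    start = question["time"]
    return [t for t in stream
            if mention in t["text"].split()
            and start <= t["time"] <= start + window]

stitched = replies_in_window(question, stream, timedelta(hours=1))
stitched_users = [t["user"] for t in stitched]
```

With a one-hour window, the unrelated Friday tweet gets pulled into the conversation and the late restaurant recommendation is dropped. Widening the window recovers the late reply but admits even more noise.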

Google has a similar issue when a document is about multiple topics. (A document with several news briefs is a classic example.) It doesn’t know what’s what and treats the whole thing as one. I think the industry needs better ways to deal with this. Another blog post for another time.

It would be far easier and more accurate for Twitter to expose the tweet that each @reply responds to, so you know what is being replied to. It’s not really exposed in the UX, but they have the code in place to track it. It’d be interesting to know how many @replies have these data.
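If each @reply did carry the id of the tweet it answers, stitching would be a simple tree walk rather than guesswork. A minimal sketch, assuming a `reply_to` field (a stand-in name for whatever Twitter stores) and invented tweets:

```python
from collections import defaultdict

# Hypothetical records; "reply_to" stands in for the stored parent tweet id.
tweets = [
    {"id": 10, "reply_to": None, "text": "Any good Indian restaurants nearby?"},
    {"id": 11, "reply_to": 10, "text": "@asker try the dosa place on Main"},
    {"id": 12, "reply_to": None, "text": "@asker are we still on for Friday?"},
    {"id": 13, "reply_to": 11, "text": "@friend_a thanks, booking it"},
]

def thread(root_id, tweets):
    """Return ids in the conversation rooted at root_id, depth-first."""
    children = defaultdict(list)
    for t in tweets:
        if t["reply_to"] is not None:
            children[t["reply_to"]].append(t["id"])
    ordered, stack = [], [root_id]
    while stack:
        current = stack.pop()
        ordered.append(current)
        # Reverse so that earlier replies are visited first.
        stack.extend(reversed(children[current]))
    return ordered

conversation = thread(10, tweets)
```

Tweet 12 mentions the same person but has no recorded parent, so it stays out of the thread. That only works for @replies that actually have the parent id recorded, which is why the coverage question matters.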

Realtime Twitter search is not a Google killer, part 2
