Wednesday, November 19, 2014

Twitter updates search index, will now let you search tweets going all the way back to 2006

By Nimish Sawant /  19 Nov 2014 , 13:23
Twitter made a big announcement on its blog last night: you can now search public tweets dating back to the very first tweet, sent in 2006. Twitter has always stored tweets permanently, but it used to purge older ones from the search index from time to time so that results stayed recent. With the new index, a search on Twitter will surface far more tweets. According to the blog, this has been a long-standing goal for the company.
According to the blog, Twitter already had a real-time index that held the most recent tweets from the past couple of weeks, which kept results fresh. The full index, on the other hand, is around 100 times larger than the real-time index and required additional capacity, re-partitioning and operational overhead. The real-time index is stored in RAM to keep updates fast and latency to a minimum, but keeping the full index in RAM would be too costly. The search index has also been a work in progress: the blog mentions a 2012 historical index that held 2 billion top tweets of the time. To find tweets that did not show up in Twitter's search results, one had to use third-party tools such as Topsy and TwimeMachine.
It may seem like a fully indexed search engine should have been a Twitter feature for ages, but Twitter did not have its own search engine even for recent tweets until 2011, five years after it was founded. Building the fully indexed search engine came with its own set of challenges. Apart from indexing older tweets, Twitter also had to find a way to index incoming tweets. Twitter employs hundreds of machines running Hadoop MapReduce, a popular open-source data-crunching tool, to collect and arrange the data for its entire search index. This parallelises the process: one group of machines can index older tweets while another indexes new tweets using the same software. Twitter stores the indexed data on solid-state drives rather than in RAM, which is used for the real-time index. For a more detailed look at how Twitter implemented the new search feature, see this blog post by Yi Zhuang.
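To make the map-and-reduce idea concrete, here is a minimal single-machine sketch in Python. It is not Twitter's code and does not use Hadoop; the sample tweets and the names `map_tweets`, `reduce_pairs` and `merge_indexes` are all made up for illustration. It only shows how the same indexing steps can run separately over an "old" batch and a "new" batch of tweets and then be merged into one index.

```python
from collections import defaultdict

# Hypothetical sample data: (tweet_id, text) pairs standing in for two batches.
OLD_TWEETS = [(1, "just setting up my twttr"), (2, "setting up search")]
NEW_TWEETS = [(3, "search every tweet"), (4, "my first tweet")]


def map_tweets(tweets):
    """Map step: emit one (term, tweet_id) pair per distinct word in each tweet."""
    for tweet_id, text in tweets:
        for term in set(text.lower().split()):
            yield term, tweet_id


def reduce_pairs(pairs):
    """Reduce step: group tweet ids by term into sorted posting lists."""
    index = defaultdict(set)
    for term, tweet_id in pairs:
        index[term].add(tweet_id)
    return {term: sorted(ids) for term, ids in index.items()}


def merge_indexes(*indexes):
    """Combine independently built indexes into a single index."""
    merged = defaultdict(set)
    for index in indexes:
        for term, ids in index.items():
            merged[term].update(ids)
    return {term: sorted(ids) for term, ids in merged.items()}


# Each batch goes through the same code; in a real cluster these two jobs
# could run on different groups of machines at the same time.
old_index = reduce_pairs(map_tweets(OLD_TWEETS))
new_index = reduce_pairs(map_tweets(NEW_TWEETS))
full_index = merge_indexes(old_index, new_index)

print(full_index["search"])   # [2, 3]
print(full_index["setting"])  # [1, 2]
```

Because the map and reduce steps never look outside their own batch, either batch can be rebuilt or extended without touching the other, which is the property the paragraph above describes.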
You can check out Twitter's Advanced Search page, which lets you refine your search by words, people, places and dates. This is much better than the regular search box on Twitter's site, which lacks the granularity of advanced search. It also helps you stay focussed when searching for major events or hashtags.
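The same kind of refinement can also be typed straight into the search box using search operators. The sketch below is an illustration rather than anything from Twitter's announcement: it assumes the `from:`, `since:` and `until:` operators and the `https://twitter.com/search?q=` URL form, and `build_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def build_search_url(words, from_user=None, since=None, until=None):
    """Build a Twitter search URL from words plus optional operators.

    Dates are expected as YYYY-MM-DD strings.
    """
    parts = [words]
    if from_user:
        parts.append(f"from:{from_user}")
    if since:
        parts.append(f"since:{since}")
    if until:
        parts.append(f"until:{until}")
    # urlencode escapes spaces and colons so the query is safe to put in a URL.
    return "https://twitter.com/search?" + urlencode({"q": " ".join(parts)})


url = build_search_url("world cup", since="2010-06-11", until="2010-07-12")
print(url)
# https://twitter.com/search?q=world+cup+since%3A2010-06-11+until%3A2010-07-12
```

Narrowing by date this way is handy now that results can reach back to 2006, since a broad query over the full index could otherwise return a very long list.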
While this makes research easier, it is also a reminder of the embarrassing tweets or Twitter conversations you may have had in the past. Twitter will roll out the feature gradually on its Android and iOS apps. You will be able to see search results going all the way back to 2006 when you select the All tab under search results.


source
