Micro-blogging site Twitter recently announced that it has launched a new, searchable index of every public tweet posted since the service’s inception in 2006.
The company’s new, fully searchable and complete “Tweet Index” can now be accessed through its Android and iOS apps and its standard Web client, according to a blog post on Tuesday. The newly completed project had been long in the works, made more difficult by the sheer number of tweets the site contains. By Twitter’s own estimation, around “half a trillion documents” have been generated by Twitter users since 2006, making it a priority to give users an efficient way to search through that volume of tweets.
Yi Zhuang, co-leader of the Twitter project, said that the search engine in its original form did an excellent job of bringing breaking news to the forefront – something that coincided with Twitter’s strong emphasis on providing breaking news. However, Zhuang added that the micro-blogging site’s ultimate goal has been to provide users with the ability to comb through every public tweet ever made.
Challenges faced by the team included creating a modular system that could scale over time, yet still give users a quick, simple and cost-effective way to search for tweets. Twitter says that Tweet Index searches return results in under 100 milliseconds on average. More recent tweets are stored in RAM to provide fast updates and low latency, while older indexed tweets are kept on less expensive storage to prevent the system’s maintenance costs from spiraling out of control.
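The split between recent tweets in RAM and older tweets on cheaper storage can be illustrated with a toy two-tier index. The sketch below is only an illustration, not Twitter’s implementation: the class name `TieredTweetIndex`, the `hot_capacity` parameter and the use of plain dictionaries for both tiers are assumptions made for the example, and it assumes tweet IDs increase over time.

```python
from collections import defaultdict


class TieredTweetIndex:
    """Toy two-tier inverted index (hypothetical; not Twitter's design)."""

    def __init__(self, hot_capacity):
        self.hot_capacity = hot_capacity  # max tweets kept in the hot tier
        self.hot = defaultdict(list)      # term -> tweet ids; stands in for RAM
        self.cold = defaultdict(list)     # term -> tweet ids; stands in for cheaper storage
        self.hot_order = []               # (tweet_id, terms), oldest first

    def add(self, tweet_id, text):
        """Index a new tweet in the hot tier, demoting the oldest if full."""
        terms = set(text.lower().split())
        for term in terms:
            self.hot[term].append(tweet_id)
        self.hot_order.append((tweet_id, terms))
        if len(self.hot_order) > self.hot_capacity:
            self._demote_oldest()

    def _demote_oldest(self):
        """Move the oldest hot tweet's postings into the cold tier."""
        tweet_id, terms = self.hot_order.pop(0)
        for term in terms:
            self.hot[term].remove(tweet_id)
            if not self.hot[term]:
                del self.hot[term]
            self.cold[term].append(tweet_id)

    def search(self, term):
        """Return matching tweet ids from both tiers, newest first."""
        term = term.lower()
        hits = self.hot.get(term, []) + self.cold.get(term, [])
        return sorted(hits, reverse=True)


index = TieredTweetIndex(hot_capacity=2)
index.add(1, "Breaking news")
index.add(2, "News today")
index.add(3, "Hello world")   # pushes tweet 1 into the cold tier
print(index.search("news"))   # [2, 1]
```

A search consults both tiers and merges the results, so the caller never needs to know where a given tweet is stored. In a real system the cold tier would sit on slower, cheaper media rather than in a second in-memory dictionary.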
The index was built out in increments, according to Zhuang. First, a foundational project in 2012 created a small index of about 2 billion historical tweets, which the team used to develop a pipeline for the offline aggregation and preprocessing of data. Meanwhile, Twitter decided to increase the size of that original index to about 20 billion tweets, with an eye towards tweaking the solid-state storage solution that would hold the index. Finally, the full index was added this year, said Zhuang.
This archive content was originally published November 18, 2014 (www.betawired.com)