July 18, 2008

Gnip CEO's Goal: Make Twitter's Data Flow Suck Less

Data publishers and data consumers can be both friends and enemies when it comes to the seemingly infinite demand for, and growth of, real-time data from Web services. Services like Digg, Flickr, Del.icio.us and Twitter are happy to see their user bases expand and to see developer communities form around their products. But each new user, and each new application hitting their API, brings new demand that can strain their infrastructure, even if the outside application is just checking for updates that aren't there. Gnip is looking to act as a go-between for data publishers and data consumers, delivering updates from the services to the applications and reducing the queries that can slow popular sites to a crawl.

Today, Gnip made headlines with an announcement that Twitter notifications would be sent to the service via XMPP, letting outside developers tap into Gnip instead of adding more strain to the embattled microblogging giant. And while this won't solve all of Twitter's issues, it does offer developers an alternative, taking some of the power out of Twitter's hands. No money changed hands as part of the announcement; the arrangement was quid pro quo.

After the morning's hubbub, I talked with Eric Marcoullier, CEO of Gnip, to better understand how adding Twitter to the team's growing array of partners would help users and developers, and whether it would resolve the growing concerns around Twitter's API limits, which have frustrated application authors. The answer so far is that Gnip can solve some problems today and is preparing to solve more soon. But it won't make Twitter's problems disappear.

"All these data protocols can be an exponentially scaling hassle. I like how people thought Gnip would single-handedly fix Twitter's problems, but that minimizes how big Twitter really is," Eric said. "Some developers don't just want the user stream, but the reply stream as well, and others want Track, which bangs against the Summize (now Twitter) API, to find if there is a new tweet that has a followed word. We might not ever solve that. It's a big scaling problem of reading the content, and it doesn't fix all of Twitter's problems."

Gnip's Data Flow Architecture

What Gnip is looking to do is help both publishers that want to syndicate their data and consumers that are building businesses on user-generated data. By simplifying the complex back-end work, it gives entrepreneurs more time to work on the front end of their products, where they deliver tangible benefits.

"We're able to go to them and say, all of the effort you are doing to aggregate that data, stop now," Eric said. "Tell us what services you like, what protocol you like, and the data magically appears in real time."

While Twitter has been the most visible client so far, it's by no means the first for Gnip, which launched with two partners out of the gate: Plaxo and MyBlogLog, the latter of which Eric co-founded. Since launch, Gnip has also signed partnerships with additional services, including Lijit and Iminta.

Gnip's Growing Partner Roster

Today, application developers are building products that query popular sites like Twitter, Digg and Del.icio.us thousands of times a day, even though the overwhelming majority of the time there are no updates. Where Gnip works well is for centralized services like Plaxo, which can dramatically cut back on the number of requests they need to make. "It doesn't matter how many people are following an individual on Plaxo Pulse. They just have to ask once," Eric said.
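To make the "ask once" point concrete, here is a minimal Python sketch that counts requests under both models. Everything in it is an illustrative assumption rather than Gnip's actual design: the `Publisher` and `Aggregator` classes, the `simulate` function and the numbers are hypothetical, and the aggregator polls here for simplicity even though the announcement describes Twitter pushing notifications over XMPP.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


class Publisher:
    """Hypothetical stand-in for a service such as Twitter; counts requests."""

    def __init__(self) -> None:
        self.requests = 0
        self._updates: Dict[str, List[str]] = defaultdict(list)

    def post(self, user: str, text: str) -> None:
        self._updates[user].append(text)

    def fetch(self, user: str, since: int) -> List[str]:
        """Return updates after position `since`; every call costs one request."""
        self.requests += 1
        return self._updates[user][since:]


class Aggregator:
    """Hypothetical go-between: asks once per user, then fans out locally."""

    def __init__(self, publisher: Publisher) -> None:
        self.publisher = publisher
        self._seen: Dict[str, int] = defaultdict(int)
        self._subs: Dict[str, List[Callable[[str, str], None]]] = defaultdict(list)

    def subscribe(self, user: str, callback: Callable[[str, str], None]) -> None:
        self._subs[user].append(callback)

    def poll_once(self) -> None:
        for user, callbacks in self._subs.items():
            new = self.publisher.fetch(user, self._seen[user])
            self._seen[user] += len(new)
            for text in new:
                for callback in callbacks:
                    callback(user, text)


def simulate(followers: int, rounds: int) -> Tuple[int, int, int]:
    """Return (direct requests, aggregated requests, deliveries made)."""
    # Model 1: every follower polls the publisher on every round.
    direct = Publisher()
    direct.post("alice", "hello")
    cursors = [0] * followers
    for _ in range(rounds):
        for i in range(followers):
            cursors[i] += len(direct.fetch("alice", cursors[i]))

    # Model 2: one aggregator polls and pushes to every follower.
    shared = Publisher()
    shared.post("alice", "hello")
    aggregator = Aggregator(shared)
    delivered: List[Tuple[int, str]] = []
    for i in range(followers):
        aggregator.subscribe("alice", lambda user, text, i=i: delivered.append((i, text)))
    for _ in range(rounds):
        aggregator.poll_once()

    return direct.requests, shared.requests, len(delivered)


if __name__ == "__main__":
    print(simulate(followers=1000, rounds=10))
```

In this toy setup, the publisher's load in the second model depends on the number of rounds, not on the number of followers, which is the property Eric is describing.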

But the way centralized services query Twitter differs from the situation faced by the many apps that are struggling against the limit of 100 API calls per hour per IP address, discussed yesterday. For that, more work is needed.

"For the average user, 100 queries per hour is fine, as long as you're only querying the API when there is new data," Eric said. "But for, say, Twhirl, where they each have their own user connections, they would have to query maybe 50 times, and that's half the load. We're looking for a simple way of creating anonymous buckets, so somebody like Twitter Karma can say 'we have 10,000 users with this collection', and we can centralize it. We're still a ways away from helping folks with distributed clients."
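The arithmetic behind "only querying the API when there is new data" can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: the `HourlyBudget` class is hypothetical, it models the 100-calls-per-hour limit as a simple fixed window (how Twitter actually counts the hour is not covered here), and the 30-second polling interval and the 12 updates are made-up figures.

```python
class HourlyBudget:
    """Hypothetical fixed-window counter for a per-hour API call limit."""

    def __init__(self, limit: int = 100) -> None:
        self.limit = limit
        self.window_start = None  # timestamp (seconds) when the window opened
        self.used = 0

    def try_spend(self, now: float) -> bool:
        """Spend one call if budget remains in the current hour."""
        if self.window_start is None or now - self.window_start >= 3600:
            self.window_start = now
            self.used = 0
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


def count_calls(times, limit: int = 100):
    """Return (calls allowed, calls refused) for queries at the given times."""
    budget = HourlyBudget(limit)
    allowed = sum(1 for t in times if budget.try_spend(t))
    return allowed, len(times) - allowed


# Blind polling: one query every 30 seconds for an hour (assumed interval).
blind = count_calls(range(0, 3600, 30))

# Notified querying: one query per update, assuming 12 updates in the hour.
notified = count_calls(range(0, 3600, 300))

if __name__ == "__main__":
    print("blind polling (allowed, refused):", blind)
    print("notified querying (allowed, refused):", notified)
```

Under these assumptions the blind poller runs out of budget before the hour ends, while the notified client uses only as many calls as there are updates.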

Gnip's initial efforts and partnerships have been built on the first version of its product. In a month or two, the company expects to send partners not just service notifications but also the full metadata, bringing richer information from its many supported services, including Del.icio.us, Disqus, Flickr and others. At that point, you may also see Twitter passing on reply streams, but that's not set in stone.

"Working with people like Twitter, we want to be sure we are serving their best interests and the developer community," Eric said. "It's a huge win for us."