July 19, 2011

Google+ & Other Social Networks Need Algorithmic Filters

Google+'s entry into the social networking market presents a new slate of opportunity for tech geeks who have been unsatisfied with leading offerings from Facebook and Twitter. The network's initial launch has been intriguing for two major reasons: first, the need to recreate one's social graph from scratch, including manual sorting, and second, as noted before, its not starting with aggregation of third-party content. The clean slate approach presents optimistic participants with a hope to "do things right" this time, and not fall into the limitations of networks past. Given Google's science-driven history, smart folks have also cautioned the company against leveraging algorithmic filters that might surface some content in place of other content.

The most visible argument was that from Tom Anderson, MySpace cofounder, who said, "Can a company so enamored with the power of algorithms and machine learning, let the user take control?", adding "... I'm worried that Google is going to make a misstep and ruin the service," through leveraging algorithms to cut signal from noise. While I have enjoyed Tom's resurgence to visibility and insights into early use of the network, I think the conclusion he reaches needs some work. Where there is signal, there is noise, and what's been missing in all networks to date is the right approach to surface quality content, which no doubt feeds into Tom's comments.

The vast majority of social networking content is consumed in reverse chronological format, with the most recent content being at the top. This is true for Twitter and all of Twitter's clients, it is true for Facebook's "Most Recent" feed, it's true for LinkedIn's news feed, and is mostly true with others like FriendFeed and Google+, which for the most part, sort content by the most recent activity - meaning older posts can be "bumped up" with additional comments. FriendFeed fought this intelligently over time, letting older posts eventually float downstream, while Google Buzz fought a similar challenge as the most visible posters' active threads initially took too much screen real estate.
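To make the difference between these orderings concrete, here is a minimal Python sketch. The `Post` fields, the function names, and the six-hour cap are all my own assumptions for illustration; I don't know how FriendFeed or Google+ actually implement their sorting.

```python
from dataclasses import dataclass


@dataclass
class Post:
    title: str
    created: float        # hours on an arbitrary clock (hypothetical field)
    last_activity: float  # time of the newest comment, same clock


def by_created(posts):
    """Pure reverse-chronological order, newest post first."""
    return sorted(posts, key=lambda p: p.created, reverse=True)


def by_last_activity(posts):
    """Any new comment bumps a post back to the top."""
    return sorted(posts, key=lambda p: p.last_activity, reverse=True)


def by_capped_activity(posts, max_bump_hours=6.0):
    """Comments can lift a post, but only so far past its creation time."""
    def key(p):
        return min(p.last_activity, p.created + max_bump_hours)
    return sorted(posts, key=key, reverse=True)


posts = [
    Post("old but busy thread", created=0.0, last_activity=48.0),
    Post("yesterday's post", created=24.0, last_activity=30.0),
    Post("fresh post", created=47.0, last_activity=47.0),
]

for order in (by_created, by_last_activity, by_capped_activity):
    print(order.__name__, [p.title for p in order(posts)])
```

With the cap in place, the old thread can no longer sit above the fresh post just because people keep commenting on it, which is one simple way to let older posts float downstream.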

In contrast to the chronological view, one can find intelligently filtered streams on Facebook, with the service's news feed, and in Twitter's search results, which try to show you "Top" content, and not just "All" content, dependent on the user sharing updates. But Facebook's approach, from my own understanding, relies heavily on your previous interaction with a person, augmented by that post's activity, which can bubble it to the top - independent of context. This means that if, for example, I share two posts, one on Apple's blowout quarterly earnings, and the second showing a cute picture of Braden at the supermarket, the posts may carry equal weight, assuming we are BFF. You can see this all the time in your own news feed on Facebook, as seemingly "random" posts from your friends surface to the top, while friends outside your top two dozen interactions almost disappear.
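As a rough illustration of why that approach is blind to context, here is a toy Python scoring function. It is a sketch of my own understanding as described above, not Facebook's actual formula, and the function name, the affinity values, and the comment and like counts are made up.

```python
def context_blind_score(affinity, comments, likes):
    """Affinity with the poster, boosted by activity on the post.

    Note what is missing: nothing about the topic is an input.
    """
    return affinity * (1 + comments + likes)


best_friend = 0.9   # hypothetical affinity for a close friend
acquaintance = 0.1  # hypothetical affinity for someone you rarely interact with

earnings_post = {"topic": "Apple quarterly earnings", "comments": 4, "likes": 10}
photo_post = {"topic": "Braden at the supermarket", "comments": 4, "likes": 10}

earnings_score = context_blind_score(
    best_friend, earnings_post["comments"], earnings_post["likes"])
photo_score = context_blind_score(
    best_friend, photo_post["comments"], photo_post["likes"])
stranger_score = context_blind_score(
    acquaintance, earnings_post["comments"], earnings_post["likes"])

# Same friend and same activity give the same score, whatever the topic.
print(earnings_score == photo_score)
# The identical post from a weaker tie ranks far lower.
print(stranger_score < earnings_score)
```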

An algorithm that surfaces personalized content simply isn't enough if it does not take into account the many factors that indicate interest: the person sharing the content, the content source, keywords, headline, author, time of day, time since publishing, the individual(s) commenting on that message, the keywords in the headline in combination with the author and/or the source, and so on. The truth is that each of us does this automatically, and what the world needs is social networking that thinks like we do. For example, if you like financial news from the Wall Street Journal more than you like it from GigaOM, then similar stories from both should be weighted this way. But if you prefer articles on stocks from Om Malik more than you do from Mathew Ingram, that too should be determined. The human brain is a very complex object indeed, but just because something is hard doesn't make it something you don't want to try.
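One way to picture a multi-factor filter is as a per-person table of weights over single factors and combinations of factors. The Python below is only a sketch under my own assumptions: the `Item` fields, the feature names, the weights, and the 12-hour half-life are all invented for illustration, and a real system would learn the weights rather than hard-code them.

```python
from dataclasses import dataclass


@dataclass
class Item:
    source: str
    author: str
    topic: str
    age_hours: float


def features(item):
    """Single factors plus the combinations that matter (source+topic, author+topic)."""
    return [
        ("source", item.source),
        ("author", item.author),
        ("topic", item.topic),
        ("source+topic", item.source, item.topic),
        ("author+topic", item.author, item.topic),
    ]


def score(item, weights, half_life_hours=12.0):
    """Sum this reader's weights for the item's features, then fade with age."""
    interest = sum(weights.get(f, 0.0) for f in features(item))
    freshness = 0.5 ** (item.age_hours / half_life_hours)
    return max(interest, 0.0) * freshness


# Hypothetical weights for one reader.
my_weights = {
    ("topic", "finance"): 1.0,
    ("source+topic", "Wall Street Journal", "finance"): 2.0,
    ("source+topic", "GigaOM", "finance"): 0.5,
    ("author+topic", "Om Malik", "stocks"): 1.5,
    ("author+topic", "Mathew Ingram", "stocks"): 0.5,
}

wsj = Item("Wall Street Journal", "staff", "finance", age_hours=2.0)
gigaom = Item("GigaOM", "staff", "finance", age_hours=2.0)
om = Item("GigaOM", "Om Malik", "stocks", age_hours=1.0)
mathew = Item("GigaOM", "Mathew Ingram", "stocks", age_hours=1.0)

print(score(wsj, my_weights) > score(gigaom, my_weights))
print(score(om, my_weights) > score(mathew, my_weights))
```

The combined features are the point: the same source or author can rank high on one topic and low on another, which a single per-person affinity number cannot express.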

Which brings us back to Google+. Initial content in any network excitedly rallies around itself. Soon following, one finds a backlash against meta posts, a call for the mainstream to enter the site, a fear for what happens when they do, a backlash against top users and so on. But once the fun of that is done, people behave like people and want to see interesting stuff. One person's noise is another person's signal, and unfortunately, very few people can be constantly logged in to a service. This means that when they log back in to a service, they shouldn't be forced to see just the most recent things that have happened, but instead, the best things that have happened - the content that is most important to them as an individual, the pieces of content that they absolutely did not want to miss. Because if something is especially important and relevant to you as an individual, that it came out two hours ago does not render it useless.
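A "best of what you missed" view could be sketched like this in Python. I am assuming each item already carries a personal relevance score between 0 and 1 from some other process; the dictionary keys, the function name, and the 24-hour half-life are hypothetical choices, not a description of any shipping product.

```python
import heapq


def catch_up(items, last_seen, now, k=3, half_life_hours=24.0):
    """Return the k best items published since the reader was last logged in.

    Relevance dominates; age only fades an item gently, so something
    important from two hours ago still beats fresh chatter.
    """
    def rank(item):
        age = now - item["published"]
        return item["relevance"] * 0.5 ** (age / half_life_hours)

    unseen = [item for item in items if item["published"] > last_seen]
    return heapq.nlargest(k, unseen, key=rank)


# Times are hours on an arbitrary clock; relevance values are made up.
stream = [
    {"title": "already seen", "published": 1.0, "relevance": 1.0},
    {"title": "must-read", "published": 8.0, "relevance": 0.9},
    {"title": "decent", "published": 9.0, "relevance": 0.5},
    {"title": "chatter", "published": 9.9, "relevance": 0.1},
]

for item in catch_up(stream, last_seen=2.0, now=10.0, k=2):
    print(item["title"])
```

Here the two-hour-old "must-read" comes first even though "chatter" is only minutes old, while anything published before the last visit is left out.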

The advent of Google's much-discussed Circles delivers bidirectional manual filtering of people. It's bidirectional in that you are consuming from a unique list which you created, and you are sharing to a unique list of people which you created. If you create a list of "My Poker Buddies" and another for "College friends" and another for "Tech News People", the truth is that your poker buddies are going to talk about things other than poker, your college friends are going to talk about new stuff, and your tech news friends are going to talk about whatever they want... all day long. So the circles are porous. Simply naming one "Baseball" won't force people to talk about baseball, and until search is fully implemented beyond Sparks, there is no great way to find that on the site.

Another common fear about filters (which we discussed when responding to the filter bubble) was that preferences are reinforcing, and that you see only what you want to see, at the exclusion of all else, and that this leads to a dangerous space where you don't get access to alternate opinions. Again, I argue that we are very early in the game of finding high quality algorithm-driven personal filters for news, for social networks, or anything else we use that could benefit from personalized ranking. A smart algorithm that learns your preferences implicitly, rather than polling you explicitly for what you say you like, will mean that you don't have to sift through the dozens or hundreds of posts that you deem off-topic, but instead that you get the best delivered right to you.

Unfortunately, while many companies have talked about this possible panacea, almost all of them are cheating through collaborative filtering, and assuming that your social graph is smart enough to determine what is the best content for you. It's simply not true. Your interests are not my interests, and just because I like a specific topic on one day doesn't make it the most important thing the next. What is needed is a strong body of record that is tied to you as an individual, applied to your stream in real time, helping you avoid the mess and find the best.
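To show what a record tied to one individual might look like, here is a small Python sketch of a profile that learns from implicit signals only (whether you engaged with an item or skipped it). This is an illustration under my own assumptions, not my6sense's method or anyone else's; the class name, the feature strings, and the learning rate are invented.

```python
class InterestProfile:
    """Per-person weights, nudged toward 1 on engagement and toward 0 on a skip."""

    def __init__(self, learning_rate=0.2):
        self.learning_rate = learning_rate
        self.weights = {}

    def observe(self, features, engaged):
        """Update from one implicit signal; no one is asked what they like."""
        target = 1.0 if engaged else 0.0
        for f in features:
            old = self.weights.get(f, 0.0)
            self.weights[f] = old + self.learning_rate * (target - old)

    def score(self, features):
        """Average learned weight across an item's features."""
        if not features:
            return 0.0
        return sum(self.weights.get(f, 0.0) for f in features) / len(features)


me = InterestProfile()
for _ in range(3):
    me.observe(["topic:baseball"], engaged=True)
me.observe(["topic:poker"], engaged=False)

print(me.score(["topic:baseball"]) > me.score(["topic:poker"]))

# Interest is not permanent: repeated skips pull the weight back down.
before = me.score(["topic:baseball"])
for _ in range(3):
    me.observe(["topic:baseball"], engaged=False)
print(me.score(["topic:baseball"]) < before)
```

Because the profile belongs to one reader and keeps moving with that reader's behavior, a topic you cared about last week does not stay at the top forever, and nobody else's tastes are mixed in.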

Google has traditionally been very cautious, going against conventional wisdom by not leveraging behavioral targeting as much as they could, and by going out of their way to not overuse your Web history or your email activity, or in any other way abuse the relationship you have with them and your content. With Google+, on both mobile and desktop, they have a new opportunity to do this correctly, continually learning more about your interests and activity to serve you the most relevant updates while avoiding much of the cruft that has plagued other networks, specifically Facebook. Having worked closely with my6sense for the better part of two years, I've seen directly how smart algorithms based on implicit feedback can make useful high quality streams out of what would more commonly be considered noise - and I've seen many people on Google+ and Twitter call for the same kind of filtering engine to be applied.

As Google+ gains visibility with the service opening up to more users, and less geeky users, it is inevitable that the stream content will be diversified and become more "noisy". Initial users will no longer be accepting of the content, excited just to use something new, but they will want the network to provide better quality and connections than the status quo. While it's expected Google will eventually do a tie-in with casual gaming on a dedicated games site, the general hope from the community is that game-related info will not pollute Google+. It's well-known that game info is a common polluter of Facebook streams, even if you've done your darnedest to block all the services that hit your feed.

Tom's approach is laudable. He sees a new network with great promise, and is scared that Google will fumble it away. But I believe his conclusion is not perfect, for if done correctly, intelligent algorithms can make the network the most personal and most relevant one on the market. What Google+ needs to deliver is not just that it exists, but that it is differentiated and better. Why not use the smarts and information the company has to achieve just that?

Disclosure (as always): I am the VP of Marketing at my6sense, which provides personalization of news and social streams (but not yet Google+).