Tuesday, Jun 14, 2011

Social content discovery is broken but Instapaper could fix it

Instapaper is a web service that allows you to save articles you find on the web for reading later. It stores these in a list maintained on its servers, and allows you to access it from your browser, or to sync the list with an iPhone or iPad app. Part of its magic is that it strips out the text and images of the article, ignoring ads and irrelevant page navigation, and presents the article in an optimized reading format. A browser bookmarklet allows you to save an article with one click, even directly from an RSS reader like Google Reader. It’s a great experience, and if you have an iPhone or iPad and read even a small amount of material online, I suggest you give it a go.

At their recent developer conference, Apple announced that a similar feature will be built into their browser, Safari, allowing users to save articles to read later and strip out the relevant material from the cruft. Online speculation immediately decreed that this competing service spells the end of Instapaper.

Its creator, Marco Arment, has a different take, which he has described on his blog and discussed on his popular podcast. He argues that, if anything, Apple’s feature is likely to increase demand for his product. Currently many people are unaware of Instapaper, and of how great an optimized, save-it-for-later reading experience can be. Being exposed to Apple’s version will create awareness and whet their appetite for this type of service, and then they will easily be able to find his “premium” service in the App Store.

Obviously this argument comes down to him actually offering a better experience than Apple. He gives some examples of why Instapaper is better, such as the “social” features he has recently introduced. But while watching Apple’s demo I was struck that Apple will be ahead with at least one feature: they automatically extract all the pages of a multi-page article, and join these into a single scrollable page. Instapaper doesn’t do this yet, and I’ve been frustrated several times by having to stop partway through a previously-synced article while on a flight.

It occurs to me that one way he could differentiate his product is by doing something clever with content discovery. The current “social” features are better than nothing, but don’t fully utilize the unique nature of Instapaper’s data.

Currently there is the option to link your Instapaper account to some of the popular social networks: Facebook, Twitter, Tumblr, Pinboard and Evernote. This allows you to share “liked” items with your friends on these networks. Pretty standard so far; if I think all of my friends on a network will find something interesting (a suspect assumption, but a common one) I can post it there. This doesn’t help me find new content on Instapaper though. Those services are too noisy with other information to be a reliable source of interesting articles.

As far as content discovery goes there are two options. I can see articles that have been “liked” by my Instapaper friends, or I can look at a list of articles that have been curated by some other editor. Neither of these is very useful. I don’t have any Instapaper friends. And none of my actual friends or contacts have similar interests to me, so I don’t really care what they’ve liked. I already follow a number of people I find interesting via RSS and Twitter, and some of their links I save to read later, but Instapaper is offering nothing new here. The “curated” lists are even less likely to deliver me things I’ll find interesting.

The current model for content discovery on the internet is broken. It is based on the premise that I will be interested in the same articles that are “liked” by my “friends”, but I’m not. Sure, sometimes I’ll enjoy something a friend shares, but there is so much noise, I don’t usually bother looking. I’m also a bit of a news-junkie so I’ve usually already seen it, seen the discussion that sprang up around it, and sometimes seen it discredited.

I suggest that Instapaper has the best data anywhere on what I actually find interesting, and is in a unique position to suggest new content to me. Consider that Instapaper’s reading list is already a filtered list of all my other sources. I subscribe to over 200 RSS feeds that I read in Google Reader, both in my browser and on my iPhone. Most of the items that flow through this are noise that I’m not interested in, and a very small percentage I save to Instapaper because I think I’ll actually want to read them. Very short posts I read in Google Reader; the rest go to Instapaper. Now, in Instapaper, articles that I find interesting I read all the way to the bottom, and either simply archive (delete) or move to a folder if I may want to find them again later. Articles that I don’t find relevant I’ll possibly read all the way to the bottom and archive, or, more likely, archive without reading all the way. Instapaper may not collect all of this information, but it could, and it should.

Imagine that there is another person out there who has the same interests as me. In fact they save most of the same articles to their Instapaper reading list as I do. But every now and then they find an article that I don’t, an article that I would be interested in. Why doesn’t Instapaper suggest this article to me?

A naïve solution might be to suggest this user to me to add as a “friend”, but this model is broken too, and reveals the flaws in my scenario above: people rarely have a single field of interest. Suppose this hypothetical person shares my interest in Apple news and James Bond trivia, but he also likes baseball, which I couldn’t care less about. If I started following him then yes, I’d get some relevant articles, but I’d also get the baseball noise. I’d prefer if some smart service could filter out the articles I’ll find relevant.

The fact that people have partially overlapping spheres of interest is what makes this a hard problem. But not impossible.

Let’s suppose that instead of one person who shares my interests there are three. Person A likes Apple news, James Bond trivia, and baseball. Person B likes Apple news, James Bond trivia, and taxidermy. Person C likes James Bond trivia and stamp collecting. If the same article is saved by all three people, it’s got a high chance of being about James Bond trivia and I will probably find it interesting. If an article is saved by only one of the people, the chance is lower that it will be about one of my interests. If it’s saved by Person A and Person B then the chance is good that it will be about James Bond trivia or Apple news, and interesting to me. This is a simplified example, but the point is that the more people who have saved something, and have also saved the same articles as me in the past, the higher the probability that I will find it interesting.
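To make the three-person example concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not anything Instapaper is known to do. The function names, the article identifiers, and the choice of Jaccard overlap as the measure of “has saved the same articles as me” are all my own assumptions.

```python
def jaccard(a, b):
    """Overlap between two sets of saved articles, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest(my_saves, other_saves):
    """Rank articles I have not saved by the summed similarity of their savers.

    Returns a list of (article, score) pairs, best first.
    """
    scores = {}
    for saves in other_saves.values():
        weight = jaccard(my_saves, saves)
        if weight == 0.0:
            continue  # no shared history, so this user tells us nothing
        for article in saves - my_saves:
            scores[article] = scores.get(article, 0.0) + weight
    # Sort by score descending; break ties by name so the order is stable.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical reading lists for me and for Persons A, B and C.
me = {"apple-1", "apple-2", "bond-1", "bond-2"}
others = {
    "A": {"apple-1", "bond-1", "apple-3", "bond-3", "baseball-1"},
    "B": {"apple-2", "bond-2", "apple-3", "bond-3", "taxidermy-1"},
    "C": {"bond-1", "bond-3", "stamps-1"},
}

for article, score in suggest(me, others):
    print(f"{article}: {score:.3f}")
```

With this made-up data the Bond article saved by all three people ranks first, the Apple article saved by A and B ranks second, and the baseball, taxidermy and stamp articles, each saved by only one person, trail behind.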

This is a difficult “big data” problem, but there are lots of potential ways it could be tackled and improved over time. Perhaps a simple first effort could use cluster analysis on articles to identify distinct “topics” that people tend to be interested in. Each user could be associated with each topic, with a weight based on the number of articles of that topic they’ve saved. Each article that is saved could be classified as one or more topics based on the known topic-interests of the users that have saved it. Or perhaps users could be clustered into groups based on the similarity of articles they’ve saved in the past, and I could be suggested new articles based on the number of “similar” users that have saved them. I should have a way to mark that an article is irrelevant to me, and the algorithm can use this to adjust its weights. And my interests may change over time, so there should be a temporal component to the weighting.
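As a rough sketch of the weighting part of this idea, the Python below keeps a per-topic interest weight for one user, lets an “irrelevant” mark push a weight down, and makes older saves count for less. It skips the cluster analysis entirely and assumes each article already carries topic labels. The 90-day half-life, the function names and the event format are all assumptions chosen for illustration.

```python
from collections import defaultdict

HALF_LIFE_DAYS = 90.0  # assumed: a save counts half as much after 90 days


def decay(age_days, half_life=HALF_LIFE_DAYS):
    """Exponential decay: 1.0 for a save made today, 0.5 after one half-life."""
    return 0.5 ** (age_days / half_life)


def topic_weights(events):
    """Build per-topic weights from (topic, age_days, relevant) events.

    A save adds its decayed weight to the topic; an article marked
    irrelevant subtracts it instead.
    """
    weights = defaultdict(float)
    for topic, age_days, relevant in events:
        sign = 1.0 if relevant else -1.0
        weights[topic] += sign * decay(age_days)
    return dict(weights)


def article_score(article_topics, weights):
    """Score an article as the mean weight of its topics (unknown topics count as 0)."""
    if not article_topics:
        return 0.0
    return sum(weights.get(t, 0.0) for t in article_topics) / len(article_topics)


# Hypothetical history: two Apple saves, an old Bond save, and a baseball
# article that was marked irrelevant today.
history = [
    ("apple", 0, True),
    ("apple", 90, True),
    ("bond", 180, True),
    ("baseball", 0, False),
]
weights = topic_weights(history)
print(weights)
print(article_score(["apple", "bond"], weights))
print(article_score(["baseball"], weights))
```

Using a half-life means a burst of interest fades gradually instead of being dropped at an arbitrary cutoff, and the negative weight for baseball gives the service a reason to stop suggesting it.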

The great thing is that the longer I use the service the smarter it should get, and I don’t need to make an effort to do anything “social”; it is all based on my reading habits.

I don’t know any of the details, but I believe that StumbleUpon must do something like this to suggest content (hopefully in a much more sophisticated way). They’ve also managed to figure out how to monetize it, by taking “sponsored” content and figuring out who is most likely to find it interesting. However, I use StumbleUpon when I want some mindless distraction; I use Instapaper for the opposite kind of reading, where I’m much more engaged with the content.

I love Instapaper as a service, and I’m a fan of Marco’s blog and podcast. I’d love it even more if it could intelligently suggest content to match my interests.