This has been bothering me for a while now, dating back to last year, when I first heard Clay Shirky’s very pithy statement that information overload isn’t a real problem; the real problem is a failure to build effective filters. It’s a catchy little phrase, and like most theories from Web 2.0 gurus, it seems reasonable on the surface, but when applied to the world of scientists, it’s less than useful.

The O’Reilly Radar blog had a link this week to an interview with Shirky where he discusses the concept in detail, which was helpful to finally get a handle on what he means and why it’s irrelevant in my world:

“…the information overload people are the most narcissistic because information overload started in Alexandria, in the library of Alexandria, right? That was the first example where we have concrete archaeological evidence that there was more information in one place than one human being could deal with in one lifetime, which is almost the definition of information overload. And the first deep attempt to categorize knowledge so that you could subset; the first take on the information filtering problem appears in the library of Alexandria.

By the time that the publishing industries spun up in Venice in the early- to mid-1500s, the ability to have access to more reading material than you could finish in a lifetime is now starting to become a general problem of the educated classes. And by the 1800s, it’s a general problem of the middle class. So there is no such thing as information overload, there’s only filter failure, right? Which is to say the normal case of modern life is information overload for all educated members of society.

If you took the contents of an average Barnes and Noble, and you dumped it into the streets and said to someone, “You know what’s in there? There’s some works of Auden in there, there’s some Plato in there. Wade on in and you’ll find what you like.” And if you wade on in, you know what you’d get? You’d get Chicken Soup for the Soul. Or, you’d get Love’s Tender Fear. You’d get all this junk. The reason we think that there’s not an information overload problem in a Barnes and Noble or a library is that we’re actually used to the cataloging system. On the Web, we’re just not used to the filters yet, and so it seems like “Oh, there’s so much more information.” But, in fact, from the 1500s on, that’s been the normal case.”

Okay, so if by “information overload”, you mean that there’s more interesting stuff out there than I could ever handle if I tried to read all of it, fine, Shirky’s comments make sense. But that’s not what the scientists I talk to on a daily basis mean by “information overload”. What they mean is that we’re seeing huge increases in both the numbers of people doing scientific research, and the numbers of scientific papers being published. While I hate to quote Wikipedia, the numbers listed there (take these with a grain of salt as one should all Wikipedia content) show an estimate of 11,500 total scientific journals in 1981, and over 40,000 listed in 2008 in PubMed in fields related to medical science alone.

Now, most scientists are familiar with the “cataloging system” of scientific journals; they’ve been reading them their entire careers. Everyone has their own filters, their own rankings of which journals are more interesting or publish better work than others. And all kinds of tools are available for filtering things down to just the relevant essentials for keeping up with your own field. But even so, most people I talk to are left with more useful, relevant articles to read than they have time to get to. These are not articles that should be filtered out. These are important, quality findings of direct relevance to their own work. And there are too many of them, even without factoring in the need to keep up with science in general and see what developments in other fields can be applied to one’s own.

So no, it’s not a filter failure. It’s a genuine overload. A “filter failure” implies that scientists are just not tossing out the less relevant material, but that’s not what’s happening (as an example, almost no scientists I know read science blogs; those are filtered out as being of less value than the primary literature). Is it so hard to believe that as science and technology move forward, more and more research is being done, and more knowledge is being generated that one should take in? Is it wrong to want to be as informed as possible about one’s own field, and to seek ways of assimilating more research, rather than ways of discarding valuable information?

Shirky’s suggested solution is of no help here:

“So, the real question is, how do we design filters that let us find our way through this particular abundance of information? And, you know, my answer to that question has been: the only group that can catalog everything is everybody. One of the reasons you see this enormous move towards social filters, as with Digg, as with, as with Google Reader, in a way, is simply that the scale of the problem has exceeded what professional catalogers can do.”

I don’t know about you, but I’m not sure how much I’m willing to trust a random group of strangers to tell me how relevant a particular paper is to my own research. Sure, you can get some sense of the quality of the work, perhaps even a decent summary. But no one knows your work as well as you do, and no one is going to be able to tell you what tiny details in a paper will or won’t act as a springboard for new avenues of research. I’d also argue that the top researchers are probably better at discerning those details, and if they leave the paper-reading to others, they’re going to miss out on much of what makes them better than their peers, and science is going to suffer.

So while social filtering like that described does have its uses, it’s not the solution here. Social filtering is nice for discovery, for finding papers you might not have read on your own, but that’s not the problem I hear from most scientists. Most aren’t looking for more to read.

Shirky’s point may be relevant in some situations (certainly anyone looking to read every book in the Library of Alexandria will learn a valuable lesson from him), but like most Web 2.0 wisdom, it fails when applied to the particular needs of scientists. As the old phrase goes, “To a hammer the world looks like nails,” and Shirky often strikes me as yet another Web 2.0 evangelist trying to convince us that our individual needs are all the same easily hammered nails.

Update: in response to some of the comments, I’ve tried to clarify things with a further posting on this subject, part 2, available here.