Social Software

Given some of the comment reactions from my last posting, perhaps I wasn’t clear enough in what I was trying to say, so a bit more here. As many have pointed out, scientists have long been bombarded with large amounts of potentially useful information, and have developed a sophisticated set of filters to deal with it, both on and offline. That’s not the issue. The issue is that, due to the exponential growth in the amount of research being done and published, even with highly effective filters that eliminate everything extraneous, one is often still left with more information than can be dealt with in a reasonable amount of time. Let me try to explain with a hypothetical example:

I’m a professor at University X. I have a busy schedule, between doing my own bench research, writing grants, managing my students/postdocs and my faculty duties. I have time in my schedule to read (choosing this number randomly) 10 papers a week in depth for a full understanding. 25 years ago, this was fine. The filters I had built pointed me toward 4 quality papers a week directly relevant to my research, and this allowed me to read 6 other papers in other fields. I had complete knowledge of the important work in my own field, plus a good working knowledge of many other fields that could be applied to my own. Fast forward to today: using even better filters, including Connotea, Digg, Science Blogs, what-have-you, I am now pointed toward 12 quality papers a week directly relevant to my research. This is not a filter failure–my filters are better than ever. They’re discarding more than ever before. But the quantity of research published has increased so much that even with more powerful filters, there’s more directly relevant information out there that I need to take in. With room for only 10 papers a week, I have no time for papers outside of my own field, and not even enough time for the papers within it.

That’s what most scientists I know mean by “information overload”. They’re filtering like crazy, but due to the exponential growth in research and journals, there’s more knowledge to assimilate. The solutions available seem to be:

1) Specialization–this is basically the answer I’m being given by those who just say that we merely need to improve our filters and eliminate more material. Doing so means a shallower knowledge of our own field, and a much shallower knowledge of other fields. This is not good for science, and seems contradictory to the cross-disciplinary world that science has become, where the skill set required is much bigger than ever. The more one filters, the more one narrows one’s focus.

2) Spend more time with the literature–this seems to be the approach most scientists are taking, and other parts of their careers and lives are suffering for it. Either their students, their universities or their families end up neglected.

Yes, it’s true, as AJ Cann notes, “every scientist since Aristotle has suffered from information overload,” but the quantity of that overload has grown exponentially. It’s one thing to follow the dozens of labs doing molecular biology in the late 1950s; it’s another to follow the tens (if not hundreds) of thousands of molecular biology labs today. At some point, even the most sophisticated filters become overwhelmed, or at least they return more information than one can read without sacrificing elsewhere. And many are finding this frustrating, finding that it takes away from some other part of their research/lives. Solving the problem with more filters just means more specialization, which is also a sacrifice, and a way toward doing less important, less interesting science.

This has been bothering me for a while now, dating back to last year, when I first heard Clay Shirky’s very pithy statement that information overload isn’t a real problem, the real problem is a failure to build effective filters. It’s a catchy little phrase, and like most theories from Web 2.0 gurus, it seems reasonable on the surface, but when applied to the world of scientists, it’s less than useful.

The O’Reilly Radar blog had a link this week to an interview with Shirky where he discusses the concept in detail, which was helpful to finally get a handle on what he means and why it’s irrelevant in my world:

“…the information overload people are the most narcissistic because information overload started in Alexandria, in the library of Alexandria, right? That was the first example where we have concrete archaeological evidence that there was more information in one place than one human being could deal with in one lifetime, which is almost the definition of information overload. And the first deep attempt to categorize knowledge so that you could subset; the first take on the information filtering problem appears in the library of Alexandria.

By the time that the publishing industries spun up in Venice in the early- to mid-1500s, the ability to have access to more reading material than you could finish in a lifetime is now starting to become a general problem of the educated classes. And by the 1800s, it’s a general problem of the middle class. So there is no such thing as information overload, there’s only filter failure, right? Which is to say the normal case of modern life is information overload for all educated members of society.

If you took the contents of an average Barnes and Noble, and you dumped it into the streets and said to someone, “You know what’s in there? There’s some works of Auden in there, there’s some Plato in there. Wade on in and you’ll find what you like.” And if you wade on in, you know what you’d get? You’d get Chicken Soup for the Soul. Or, you’d get Love’s Tender Fear. You’d get all this junk. The reason we think that there’s not an information overload problem in a Barnes and Noble or a library is that we’re actually used to the cataloging system. On the Web, we’re just not used to the filters yet, and so it seems like “Oh, there’s so much more information.” But, in fact, from the 1500s on, that’s been the normal case.”

Okay, so if by “information overload”, you mean that there’s more interesting stuff out there than I could ever handle if I tried to read all of it, fine, Shirky’s comments make sense. But that’s not what the scientists I talk to on a daily basis mean by “information overload”. What they mean is that we’re seeing huge increases in both the numbers of people doing scientific research, and the numbers of scientific papers being published. While I hate to quote Wikipedia, the numbers listed there (take these with a grain of salt, as one should with all Wikipedia content) show an estimate of 11,500 total scientific journals in 1981, and over 40,000 listed in 2008 in PubMed in fields related to medical science alone.

Now, most scientists are familiar with the “cataloging system” of scientific journals; they’ve been reading them their entire careers. Everyone has their own filters, their own rankings of which journals are more interesting, or publish better work than others. And all kinds of tools are available for filtering things down to just the relevant essentials for keeping up with your own field. But even so, most people that I talk to are left with more useful, relevant articles that they need to read than they have time to get to. These are not articles that should be filtered out. These are important, quality findings of direct relevance to their own work. And there are too many of them, even without factoring in a need to keep up with science in general and see what developments in other fields can be applied to one’s own.

So no, it’s not a filter failure. It’s a genuine overload. A “filter failure” implies that scientists are just not tossing out the less relevant material, but that’s not what’s happening (as an example, almost no scientists I know read science blogs–those are something filtered out as being of less value than the primary literature). Is it so hard to believe that as science and technology move forward, more and more research is being done, and that there’s more knowledge generated that one should take in? Is it wrong to want to be as informed as possible of one’s own field, and to seek ways of assimilating more research, rather than ways of discarding valuable information?

Shirky’s suggested solution is of no help here:

“So, the real question is, how do we design filters that let us find our way through this particular abundance of information? And, you know, my answer to that question has been: the only group that can catalog everything is everybody. One of the reasons you see this enormous move towards social filters, as with Digg, as with, as with Google Reader, in a way, is simply that the scale of the problem has exceeded what professional catalogers can do.”

I don’t know about you, but I’m not sure how much I’m willing to trust a random group of strangers to tell me how relevant a particular paper is to my own research. Sure, you can get some sense of the quality of the work, perhaps even a decent summary. But no one knows your work as well as you do, and no one is going to be able to tell you what tiny details in a paper will or won’t act as a springboard for new avenues of research. I’d also argue that the top researchers are probably better at discerning those details, and if they leave the paper-reading to others, they’re going to miss out on much of what makes them better than their peers and science is going to suffer.

So while social filtering like that described does have its uses, it’s not the solution here. Social filtering is nice for discovery, for finding papers you might not have read on your own, but that’s not the problem I hear from most scientists. Most aren’t looking for more to read.

Shirky’s point may be relevant in some situations (certainly anyone looking to read every book in the Library of Alexandria will learn a valuable lesson from him), but like most Web 2.0 wisdom, it fails when applied to the particular needs of scientists. As the old phrase goes, “To a hammer the world looks like nails” and Shirky often strikes me as yet another Web 2.0 evangelist trying to convince us that our individual needs are all the same easily hammered nails.

Update: in response to some of the comments, I’ve tried to clarify things with a further posting on this subject, part 2, available here.

Haven’t done one of these for a while, so time to clear out some bookmarks to interesting stories:

David Byrne gets evolution
As a Talking Heads fan since, well, ever (yes, I am old), I’ve very much enjoyed reading David Byrne’s blog, particularly his recent travelogue posts from his current tour. Reading this post, his musings on bringing back extinct species, I was pleased to see him eloquently explain one of the more misunderstood concepts about evolution:

“We wrongly, I think, persist in believing that evolution is some kind of “progress” — a series of more or less linear improvements in each species — and that animals alive today, including us, are therefore “better” than what came before. Xenophobic thinking, seems to me. Critters that came before, and stayed around way longer than we did, were extremely evolutionarily successful in that they had adapted beautifully to the environment that existed around them. For example, if present-day animals were somehow transported back millions of years, we might find ourselves less suited for survival than our hairy pals. We’d be the ones that would go extinct. Evolution is not absolute.”

Byrne and Brian Eno’s new album, by the way, is definitely in my top 5 for the year.

Open Access and Citations
Another study asking whether open access provides an advantage to getting citations for your articles. This one says no.

New Kindle leaked
Photos of the next-gen Kindle have leaked; it looks like they’ve caught up from the ’80s to the late ’90s/early ’00s in their design sense. One wonders what this leak will do to holiday sales, and whether there will be something of an “Osborne Effect” (which is always an interesting misnomer). Also, there’s this competing reader.

Business Book Breakdown
I think this commentary should be extended to books about Web 2.0, or the onrushing digital revolution. I keep buying them, reading the first 50 pages or so, then putting them down and never returning. Most would make great magazine articles or blog entries, but stretching a few ideas out over several hundred pages never seems to work.

….or has that boat already sailed?

I’ve read many a blog posting or magazine article declaring that scientists are behind the curve, and we biologists have been slow to pick up the new online tools that are available. I’ve repeatedly asked for examples of other professions that are ahead of the curve that we can use as models (are there social networks of bakers sharing recipes and discussing ovens?), but haven’t seen much offered in response. I tend to think that it’s not a question of scientists being slow, it’s that the tools being offered aren’t very appealing. Note how quickly scientists moved from paper journals to online versions, which only took as long as it did because of the slow progress on the part of journal publishers getting their articles up on the web. The advantages of online journals were obvious, and in comparison, the advantages of joining “Myspace for scientists” are less evident.

Are social networks (“Meet collaborators! Discuss papers!”) ever going to see heavy use from the biology community? Or are we starting to see that they’ve run their course in general, and scientists were prescient in not wasting their time?

I’ve written about Zotero before; it’s an intriguing tool, essentially a Firefox plug-in for managing your reference list and other pieces of information. It’s a bit of a hybrid between online management tools like Connotea and tools like Papers, which you store on your own computer.

The bad news is that Thomson Reuters, the manufacturers of EndNote, are suing George Mason University and the Commonwealth of Virginia because a new version of Zotero lets you take your EndNote reference lists and convert them for use in Zotero. Yes, this is the same Thomson of Thomson ISI, secret gatekeepers of journal impact factors. They really seem to be going out of their way to lose what little goodwill they have left with the scientific community. It will be interesting to see if this reverse engineering for interoperability holds up in court as something that should be prevented.

Some interesting recent articles on Web 2.0 and Publishing:

EmTech inanity
Ever since Dan Lyons abandoned his Fake Steve Jobs persona, his blog has gone way downhill (and his Newsweek articles have been generally lame as well). But when he fires on all cylinders, he can still put out some of the funniest, most scathing commentary you’ll find on the tech industry. Here he reviews a conference panel of some of the biggest names in Web 2.0 and really nails the failings of so many of these tools, particularly those launched for scientists: they’re solutions in search of problems:

“If I were funding these guys I might go home scratching my head about what those kids are doing with all of my millions. Maybe there is a point to what they’re doing, but honestly, what great problem are these companies trying to solve? Sitting there watching this spectacle — watching these guys unable to simply explain what they do and how they are going to make a business out of it – it was staggering to think that someone has entrusted these people with very large sums of money.”

Lyons further hammers home his message by noting that the participants all spoke about “how they had been trying to find a good restaurant in Boston and how their cool social networking tools and collaborative filters had enabled them to do such a great job of this restaurant hunting task.” The restaurant they found? The Union Oyster House, a dreadful tourist trap that anyone who has lived in Boston knows to avoid. Also, the quote of the week can be found in the article’s comment section:

“…the unspoken agreement of Web 2.0 seems to be that there is nothing more terrible than having to spend even a second alone with one’s own thoughts.”


Recently, the NY Times had an article discussing the concept of “ambient awareness”, or as the article puts it, “incessant online contact”. Now, first off, I have to admit that I’m one of the over-30-year-olds the article mentions, who finds the concept of subjecting others to (and being subjected to) a stream of trivial details about one’s day completely unappealing. The proponents of Twitter and FriendFeed and the like feel that they’re getting a more intimate understanding of people, “something raw about my friends,” as one user puts it. I’m more in line with the critics quoted in the article, who argue that the end result is more “parasocial” than social, and that it ends up an extension of reading gossip magazines and following celebrities from afar.

So how do these new practices apply to the world of science research?

So many interesting articles, so little time to blog…
Here’s a quick roundup of some items of interest, before I forget them:

The Many Challenges of the Social Media Industry
Great article that really sums up the issues facing Web 2.0. Most are directly applicable to science on the web, particularly the lack of revenue generated, the low barrier to entry causing multiple entrants for every niche, excessive noise, the difficulties in spotting expertise, and the influx of marketers and spammers.

The Importance of Being First
The Scholarly Kitchen looks at the ways scientists are gaming the arXiv system, submitting their papers at specific times to ensure a higher listing in e-mailed announcements, which results in more citations. This is a very worrying aspect of switching from our current editorially-supervised system of publishing papers to an open system. Sure, the current system isn’t perfect, but things like arXiv and social networks are very open to manipulation. A switch from one to the other may just be a lateral move in terms of bias and favoritism. Note that most of the proponents of Web 2.0 for science are well-networked and well-versed in how things work, so adoption of these technologies would give the evangelists a distinct advantage over everyone else.


Google has officially announced that their Knol product is now open to the public. Over at the Science of the Invisible blog, AJ Cann asks whether it’s worthwhile and really anything “more than extra eyeballs for AdSense.” My response is that of course the whole driving force behind Knol is extra eyeballs for AdSense. That’s what Google does. That’s their MO. To paraphrase the now defunct Fake Steve Jobs, Google’s business model is to drive the price of everything on earth to “free”, everything except one thing, that is: small, ugly text-based ads, which, conveniently enough, they’ll be the ones selling. So you should never have to ask whether this is just a ploy to sell more ads, because with Google, the answer is always going to be “yes”.

That said, there is some merit to the project, and it will be interesting to see if they can get buy-in.

Michael Nielsen has written a thoughtful essay over on his blog asking why scientists have been so slow to pick up on new web 2.0 technologies (found via Bora’s blog). It’s good to see that many of his conclusions echo my own (here too), that the big problems are a lack of time and incentive. He offers some potential solutions, and reasons why people should be using these new tools. A few responses, as always attempting to cut through the evangelism, cross-posted over in his comments thread:

