Science Publishing

Given that today is April 1, which is always a good day to avoid the internet (so many attempts to be funny and clever, so few successes), I was surprised to come across two of the more perceptive articles on science publishing that I’ve seen for quite a while. If you got all your information from the science blogosphere, you’d be under the impression that all publishers are inherently evil, that they add no value whatsoever to anything, that scientists have an infinite supply of free time and that they’ll happily pitch in and do all of your work for you. So it was nice to see Larry Moran’s latest blog entry where he talks about the really hard (and expensive) work that goes into creating a textbook.

Moran cites a recent PLOS book review, where Sean Eddy takes the “information wants to be free” crowd to task for what he calls “magical thinking”:

“A more utopian “open” advocacy simply denies this real-world tension. Information wants to be free; corporations are evil; people will make great stuff for love not money; free stuff will save the developing world; we’ll pay for it with taxes and charity. You don’t have to subscribe to Ayn Rand’s brand of laissez-faire capitalism to have serious problems with this. It amounts to claiming that intellectual work doesn’t take time, or that time isn’t worth money—that intellectual property protections exist only to create profit for unnecessary middlemen, not to enable the work of talented professionals who create works that can be readily copied.”

The review discusses a new book on open education resources, and Eddy notes how woefully inadequate so many of them are, primarily because it takes time, effort and money to turn rough class notes into a truly useful educational tool:

“Nonetheless, when I actually went to these sites, it became clear how far they have to go before they can compete with a good book. Too many resources I saw were sketchy, incomplete, and unsatisfying…Distributing open-source software or open-access literature is only a matter of attaching an open license to a finished product, but most of an educator’s course materials are rarely a finished, free-standing work. Course materials are more usually fragmentary, cobbled-together aide-mémoires that only make sense in the context of face time in the course. A lot of work must go into each piece of content to raise it to the quality of textbook material, and yet more work is required to have the material best use the interactive capabilities of the Web.”

Moran takes up this theme to discuss his own textbook:

“Life is never as simple as the Web 2.0 fans make out. Somebody is going to have to do a lot of work before the quality of a website matches what’s in the best introductory textbooks. And it’s extremely naive to think that all that work is just going to be given away for free.

I’m not just talking about authors. There’s a whole team of people involved in publishing my textbooks. This includes editors who correct my spelling and grammar—an onerous task in my case. It includes artists who make the figures and editors who obtain permissions and copyrights for photographs. Then there’s the staff at the publishers who receive and mail out manuscripts for review and editing and who handle all the paperwork/electrons associated with a major project.

Are we going to ask all of them to work for free by putting everything on the web? Of course not.”

Of all the books I’ve worked on at CSHL Press, the textbooks are by far the most time-consuming, and the hardest to do well. A really well-done textbook takes an inordinate amount of editorial oversight. For example, taking chapters written by multiple authors and editing them so there’s a consistent voice throughout the entire book is no easy task. You want all of the illustrations to be done in the same style, again for consistency so a student can extrapolate between chapters, and that means hiring an illustrator for the book. These are just a few tiny factors in the big picture–a lot of hard work goes into creating a good textbook, and the people involved should see some recompense for that work.

Now, you can argue that the textbook market is a strange place, to be sure, and that big corporate publishers often do shady things in that market in the name of increased profit. And that there are some awful textbooks out there. No argument here. But expecting to replace all the hard work done on a textbook with some fuzzy entity called “the crowd” who will be doing the work out of the kindness of its heart, and thinking you’ll end up with as high quality a textbook as one put together by talented professionals, is ludicrous. More from Eddy:

“Many technologists today are infected with an idea that “community is king,” that high-quality content will rain down freely merely because we connect digital communities openly. This confuses ways of sharing ideas with ways of creating ideas. It is a kind of magical thinking that has much in common with the cargo cults that cut landing strips in the jungle and carved radios from sticks in hope that more sophisticated beings would parachute technological artifacts down upon them.”

Talented people who work hard deserve to get paid for that work.

Addendum--note the comment on Moran’s blog from “anonymous” who suggests Moran just get a government grant to make his textbook free. Now, I’m sure all you scientists out there know how easy it is to procure government grants these days, right? And where does that government grant money come from anyway?

Addendum 2–how’s that textbook crowdsourcing effort in Texas working out?

As I explained last year, Cold Spring Harbor Protocols is something of an experiment as a publishing business model. Because some of our articles come from our lab manuals, where we owe authors and editors royalties, we chose to extend those royalty payments to authors of new, original articles. Writing up methods is usually not a priority for most scientists; they’re more focused on data-driven papers. We wanted to provide a nice incentive for authors to 1) write up their methods and 2) publish them with us, rather than with other journals that don’t offer such incentives. We’re not talking about huge amounts of money, but as I recall from my graduate student days, every little bit helps. If I could have published a paper AND gotten some cash for a night on the town, I would have been thrilled.

The way it works is that each year we set aside a percentage of our subscription revenue for the journal. This total amount continues to grow as the journal’s subscription base continues to grow–and we’re happy to report that CSH Protocols is seeing a lot of uptake by the scientific community. That sum is then divided among all authors based on the usage of individual articles. Payments for original articles range from around $3 (for an article published right at the end of the year, with little time to accumulate readership) to $367 for one of our most-read articles (our top original paper author wrote a set of two papers and will receive just over $600).
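For the curious, here is a minimal sketch of how a usage-based split like this could be computed. It is an illustration only, not our actual accounting: the function name, the pool size and the usage counts are all made up, and it assumes a strictly proportional split rounded to the nearest cent.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def split_royalty_pool(pool, usage_by_article):
    """Divide a royalty pool among articles in proportion to their usage.

    `pool` is the amount set aside for the year; `usage_by_article` maps an
    article label to its usage count. Both are hypothetical inputs here.
    """
    total_usage = sum(usage_by_article.values())
    if total_usage == 0:
        # No recorded usage at all: nothing to distribute.
        return {article: Decimal("0.00") for article in usage_by_article}
    pool = Decimal(pool)
    return {
        article: (pool * usage / total_usage).quantize(CENT, rounding=ROUND_HALF_UP)
        for article, usage in usage_by_article.items()
    }


# Invented numbers, purely for illustration.
example_usage = {
    "published_in_late_december": 50,
    "steady_reader": 150,
    "most_read": 800,
}
example_payouts = split_royalty_pool("1000.00", example_usage)
for article, amount in example_payouts.items():
    print(f"{article}: ${amount}")
```

The point of the sketch is simply that an article published late in the year accumulates little usage and so earns a small share, while a heavily read article earns a large one.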

So, if you wrote an article for us that was published in 2008, you should expect to see a check in the mail in the next few weeks. We hope that this revenue sharing is a nice bonus for the hard work you put into your article and that this serves as an incentive to write up more methods for publication. And if you haven’t yet published with us, what are you waiting for?

There’s been a rash lately of articles and blog entries pleading with scientists to enter the blogosphere. One disturbing aspect of this has been how many of them have been written by various arms of the Nature Publishing Group. Three recent articles (here, here and here) all make the case that scientists should start writing blogs because science journalism is on the wane, and that science blogs can fill the void left behind for educating the general public about science. Coincidentally, Nature just happens to run one of the biggest centers for science blogging. Does their desire to have this venture grow and succeed have any influence whatsoever on their opinions about the need for scientists to take on this extra workload? From Nature’s own ethical guidelines:

“In the interests of transparency and to help readers to form their own judgements of potential bias, Nature journals require the authors of most articles to declare any competing financial interests in relation to the work described…”

Interesting how that applies to authors but apparently not to their own editorials.

Now, as to the meat of the subject matter presented–are science blogs going to replace science journalism? I have my doubts, which I’ll explain below. The whole thing reminded me of Clay Shirky’s recent article, Newspapers and Thinking the Unthinkable. While I’ve strongly disagreed with Shirky in the past, I thought this was a perceptive piece, and I particularly liked the open-endedness of his argument. Essentially what Shirky says is that things break quickly, then it takes a while for something new to develop to replace those things. There’s not an immediate fix on the horizon for our disappearing newspapers. I like that instead of the usual vague cliches most Web 2.0-proponents spout for suggestions on how to proceed, Shirky leaves the question up in the air and doesn’t try to pretend there’s an obvious answer:

“No one experiment is going to replace what we are now losing with the demise of news on paper, but over time, the collection of new experiments that do work might give us the journalism we need.”

The one caveat I’d add is that the article assumes that good journalism is something our society values enough to preserve, which may be more of an open question. Sometimes things don’t get fixed after a revolution, they get worse. Time will tell.

That said, some thoughts on why science blogging is a poor substitute for science journalism:

1) Journalism is a real profession that requires training and a difficult-to-master skill set when done properly, as I discussed in this posting. Scientists and science bloggers are not trained in that skill set. One can certainly make the argument that what passes for science journalism these days is far from ideal, but replacing it with something equally flawed does not strike me as an improvement.

2) Aside from the obvious problems with newspapers’ economic models, the reason journalism is on the wane is the dropping quality. Newspapers have systematically tried to cut costs over recent years, placing economic pressures on reporters. This has resulted in much of what passes for journalism becoming regurgitation of press releases (see Churnalism). Given that many science blog entries are just links to other articles, isn’t this much the same thing? Furthermore, if the nature of so many blogs (often including this one) is to provide links and commentary on original published works, what are bloggers going to write about if those original stories no longer exist? Do away with published news articles about science and you do away with a huge chunk of the subject matter of the science blogosphere.

3) The other big problem with the current state of journalism is the substitution of opinion for factual reporting. As noted here:

“Journalists report much less than they used to, and much less than they should, as the papers have switched over to a reliance on columnists and opinion.”

I can’t think of a single blog that I’ve ever read that wasn’t opinionated. Blogs are more like the editorial page of a newspaper than the front page.

4) As Larry Moran recently pointed out, most scientists are never going to blog. Reading and writing blogs appeals to a limited percentage of people in general, scientists being no different. Start with the subset of scientists deeply interested in communication, education and outreach, and then remove those who don’t enjoy the blogging process and you’re left with science bloggers. Factor in Jakob Nielsen’s 90-9-1 rule (online content is created by 1% of users, 9% occasionally contribute a little, 90% never contribute) and you’re talking about a tiny fraction of scientists. Does this give a balanced view of science? Anyone who regularly reads science blogs can quickly point out some of the general biases and viewpoints held by most of the blogosphere. Remember also that those doing really interesting research, the people you’d most like to hear from, are the least likely to blog. They’re too busy doing that research.

5) The world of science blogging is filled with navel-gazing. I think this is one of the main reasons you don’t see the mainstream of scientists writing or reading science blogs. The vast majority of blog articles I see are either about blogging (or other online communication tools) or about what other bloggers are doing/blogging about. Another big chunk is about life as a scientist. Then there’s a small percentage of posts about actual science. All this is great for building community and feeling a part of a connected group, but I’m not sure how interested the general science reading public is going to be in these cliques.

Phew. I seem to have quite a few rants in me as of late. Bottom line, let’s all keep blogging. It’s fun (at least for those of us who are into it) and no doubt it serves a solid educational purpose and opens lines of communication between scientists and between scientists and non-scientists. But I don’t expect it to become a required activity for most scientists. And let’s be honest about what it really is. The majority of these enjoyable personal diaries and spaces for voicing our opinions are a far cry from well-researched, well-written professional journalism. And I’m with Shirky on this one. New business models and new forms of communication will emerge to continue the process of journalism. We just haven’t seen them yet.

Edited to add–the one point I forgot to add. It’s interesting that the tools that were originally being sold to us as a means for scientists to interact, to troubleshoot techniques and experiments, to find collaborations, are now being pitched as a means for scientists to educate the general public. I always thought the original plans were a bit far-fetched (if every graduate student starts posting daily blog entries about their experiments, who’s going to read them all, let alone offer advice?), and it goes to show you that no matter what your intentions when you create a tool, users often find it better suited for something else. Which I guess explains why a tool created to help college students know their fellow students is now used by grandparents to show pictures of children to their former high school classmates.

Reading William Gunn’s recent blog posting, Could this be the Science Social Networking killer app? got me thinking more about the many online scientific reference list repositories like Connotea, CiteULike and 2Collab, and why they are failing to catch on. William is suggesting a Pandora-like system of expert reviewers tagging papers to set up a recommendation system. I’m not sure this would be really helpful–what you get from a scientific paper is very different from what you get from listening to a song, and their interconnectedness works in very different ways. And it brings to mind the failings of organizing your references by tags.

If you’ve ever dealt with any of these social bookmarking sites, you know how incredibly tedious they are to use. Even for journals like CSH Protocols, where we have buttons on every article to add it directly to these sites, you still end up jumping through hoops, filling out forms, writing summaries, adding tags. You’re on the spot at that moment to come up with a list of tags that will remind you about the content of that paper. As your worldview changes over time, and with it your research priorities, you’re probably going to want to revisit many papers and add additional tags. Even with all this time-consuming work, you still may not have added an appropriate tag to let you find what you want to find at a given moment. Did you add a tag for every method used in the paper? Every conclusion, every subject referenced? That band on the gel in figure 3 that you’re ignoring today might be very important to you tomorrow. How are you going to tag the paper in case you need to find it again?

It’s more work than it’s worth, particularly given the ability to do full-text searches on your collection of articles through programs like Papers (out for the iPhone this week, by the way). Why tag every single aspect of a paper when you can just do a quick search? Even Google Scholar and PubMed strike me as much easier tools to use for these purposes than social bookmarking sites. It’s why Apple, Microsoft and Google have spent so much time and money on desktop search applications over the last few years. Investing efforts in organizational schemes is pointless when you can call up the file you need via a quick search. Sure, with Papers or the search engines, you lose the social aspects of things, the use of the network as a discovery tool. Then again, your hard work adding tags isn’t helping you discover new papers, it’s helping other people. You have to hope that others are tagging papers as relentlessly as you and that they’re tagging the aspects of papers that fit your interests.

The tedium of tagging versus the relative ease of searching brought to mind this recent article from John Gruber on the Daring Fireball site, Untitled Document Syndrome. Gruber talks about friction, the number of tedious steps involved in so many programs:

“There’s the stuff you want to do, and there’s the stuff you have to do before you can do what you want to do. People have a natural tendency to skip the have to do stuff to get right to the want to do stuff if they can get away with it. Friction is resistance.”

His example is writing a Word document. How often do you find yourself starting a new document, and writing for a long while without actually saving it? Saving changes to an already saved document is trivial, a keystroke away, but a new document means you have to go through that dialogue box, figure out where you’re saving it, come up with a title, etc.

“The obvious problem with Untitled Document Syndrome is in the rare cases where you lose data because you never saved it. The non-obvious problem is that the mental friction posed by the Save dialog often keeps you from ever even creating or saving small items of data in the first place.”

Gruber talks about the different approach taken by programs like Apple’s iLife suite, where you just dump in music, video or photos, and you don’t have to worry about naming them, or deciding where to store them. The program does it for you. It’s no surprise then, that programs like Papers or Yojimbo, which are based on the same iLife-style interface, are so much easier to use for organizing your scientific research list. Given the time demands faced by scientists, it’s no wonder I’ve heard rave after rave about Papers, but never really receive much more than a shrug when discussing online reference sites with folks at the bench.

One other caveat–if you are going to invest your time in tagging, be sure to regularly extract your account’s information and back it up. Sites may disappear at the drop of a hat. Some of the science paper bookmarking sites are clearly unaware of the Napster and Grokster court decisions and their willingness to become redistributors of copyrighted material places them only a lawsuit away from the abyss. And even with those backups, uploading them to the next site is never as clean as you’d like it to be. Be prepared to repeat a lot of your efforts.

So much fodder, so little time:

Ma.gnolia suffers catastrophic data loss
Further evidence that “cloud” computing may not be the best approach for storing your precious research data. Remember, if you’re keeping any information in an online repository, it’s not enough just knowing that you can get your information out, you actually have to regularly do so and back it up.

False Fact On Wikipedia Proves Itself
Slashdot thread on the circular nature of Wikipedia. Someone posts something, another source sees it on Wikipedia and repeats it, Wikipedia confirms the fact by citing that source.

Twitter? It’s What You Make It
David Pogue weighs in on Twitter. His basic point is that while yes, it often is “a teenage time-killer”, there are useful things you can do with it. His suggestions for what’s useful though, are:

“I pass on jokes. I share little thoughts that don’t merit a full blog or article post. I follow links and track buddies….And I query the multitudes. Last week, I was writing a script for a TV segment, and needed a great example of “an arty movie that a teenage baby sitter wouldn’t be caught dead watching.” My followers instantly shot back a huge assortment of hilarious responses. (“Gandhi.” “My Dinner with André.” “The Red Balloon.”). Other people plug their blogs, or commiserate, or break news…”

So, he’s basically using it as a teenage time-killer, as a “lazy-web” way to get others to do his work and thinking for him, and for self-promotion. I can see the second use here as being a time-saver. But the question is, can you get anyone to follow you and respond to your queries if you don’t engage in the teenage jibber-jabber types of activities? Do you have to do the time-wasters to build an audience who might help you save time?

Time Demonstrates Non-Understanding of Social Media
Speaking of the sometimes overwhelming self-promotion that goes on under the guise of Web 2.0, here’s an amusing blog posting with the typical defensiveness aimed at anyone who questions the incredible value of jibber-jabbering away all day on a social network. As a colleague pointed out upon reading this:

“I think what people fail to realize is that no-one reads anybody else’s lists – like most blogs. The popularity of this sort of thing is ‘the doing it’. There is of course a whole ‘nother discussion as to why one would want to do something like that.”

Which makes me ask, if everyone is using these sorts of tools to “promote their personal brand”, is anyone actually reading anyone else’s promotion? Is Web 2.0 a room full of people with megaphones, each shouting, “Look at me!”?

Time to Hang Up the Pajamas
Fake Steve Jobs (Real Dan Lyons) notes that no, you’re not going to get rich blogging.

who is on twitter
Very amusing, my favorites being:

people who are involved in “social networking” and optimizing the power of re-Tweeting and “computers”
people who are concerned about the collapse of the publishing industry

Why aren’t we on Facebook?
The Onion, as usual, nails it. Count me in as part of that 22%.

Finally, someone else has come right out and said it–the general expectation that e-books should only cost “a dollar or two” is unrealistic, and will be a major barrier to their adoption. Bob Miller from Harper Studio weighs in here:

“Whether a book is printed on paper and bound or formatted for download as an e-book, publishers still have all the costs leading up to that stage. We still pay for the author advance, the editing, the copyediting, the proofreading, the cover and interior design, the illustrations, the sales kit, the marketing efforts, the publicity, and the staff that needs to coordinate all of the details that make books possible in these stages. The costs are primarily in these previous stages; the difference between physical and electronic production is minimal. In fact, the paper/printing/binding of most books costs about $2.00…so if we were to follow the actual costs in establishing pricing, a $26.00 “physical” book would translate to a $24.00 e-book”

And in the comments, he discusses Amazon’s pricing scheme (currently selling e-books at a loss), and the actual costs of shipping:

“the cost of shipping a physical book is usually about 20-25 cents per copy”

The common mistake appears to be, at least in my experience, that people start with the assumption that an e-book costs nothing to make–you’ve already paid for everything with the print version, and converting those files to an e-book costs nothing or very little. But every e-book copy you sell means one less print copy you’re going to sell, so the total cost of production has to be amortized out over both the e-book and the print version. It’s a big mistake John Siracusa makes here, which puts a big hole in his argument.

The book-buying public does not seem willing to accept that e-books cost a few dollars less than a regular book to produce, and does not seem willing to buy them at that price. Which is yet another reason we’re seeing slow uptake of this type of technology.
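Miller’s arithmetic is simple enough to sketch out. The $26.00 price, the $2.00 for paper/printing/binding and the 20-25 cents for shipping come from his post and comments; the function name is mine, and subtracting shipping as well is my own assumption (his $24.00 figure only removes the manufacturing cost). Amounts are in cents to keep the arithmetic exact.

```python
def cost_based_ebook_price_cents(print_price_cents, physical_costs_cents):
    """Price an e-book by removing only the costs that exist for paper copies."""
    return print_price_cents - sum(physical_costs_cents)


PRINT_PRICE = 2600          # Miller's $26.00 "physical" book
PAPER_PRINT_BIND = 200      # about $2.00, per Miller
SHIPPING_UPPER = 25         # upper end of the 20-25 cents he cites

# Miller's own calculation: remove manufacturing only.
miller_price = cost_based_ebook_price_cents(PRINT_PRICE, [PAPER_PRINT_BIND])

# My extension (an assumption): remove shipping too.
with_shipping = cost_based_ebook_price_cents(
    PRINT_PRICE, [PAPER_PRINT_BIND, SHIPPING_UPPER]
)

print(f"Manufacturing removed: ${miller_price / 100:.2f}")
print(f"Manufacturing and shipping removed: ${with_shipping / 100:.2f}")
```

Either way, the cost-based price lands a couple of dollars below the print price, nowhere near “a dollar or two”.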

As a quick follow-up to this posting, a colleague sent along a review of a new book called Flat Earth News by Nick Davies, which rightly points out that the “death of journalism” isn’t a murder, it’s a murder-suicide. Yes, readers are abandoning print newspapers and magazines, preferring to get their information online (with an assumption that such things are free). At the same time, this abandonment is being driven by a decline in quality of the old media, as the owners seek to cut costs and increase profits. From the review:

“The most basic function of journalism, in Davies’s view, is to check facts. Journalists don’t just pass on what they’re told without making an effort to check it first. At least, in theory they don’t. In practice, contemporary journalism has been corrupted by an endemic failure to verify facts and stories in a manner so fundamental that it almost defies belief. The consequences of that are pervasive and systemic…Journalists report much less than they used to, and much less than they should, as the papers have switched over to a reliance on columnists and opinion…Stories need to be cheap, meaning ‘quick to cover’, ‘safe to publish’; they need to ‘select safe facts’ preferably from official sources; they need to ‘avoid the electric fence’, sources of guaranteed trouble such as the libel laws and the Israel lobby; to be based on ‘safe ideas’ and contradict no loved prevailing wisdoms; to avoid complicated or context-rich problems; and always to ‘give both sides of the story’ (‘balance means never having to say you’re sorry – because you haven’t said anything’). And conversely, there are active pressures to pursue stories that tell people what they want to hear, to give them lots of celebrity and TV-based coverage, and to subscribe to every moral panic.”

I do strongly believe that people are still willing to pay for quality, but as this review points out, that’s not what’s being offered by most of our media outlets. The book looks interesting, definitely worth a read.

Haven’t done one of these for a bit, so let’s clear out some useful bookmarks:
Another really nice improvement on PubMed searches. Like GoPubMed, ClusterMed provides a variety of categories to narrow down your searches to find the paper you’re seeking. I found this site through Bitesize Bio, which is still consistently one of the best biology blogs out there. Instead of the usual opinion pieces or off-topic rants, Bitesize Bio publishes a constant stream of really useful information and tips for the bench.

A sea of digital cameras
This photo made me feel old, and at the same time reminded me of hiring a wedding photographer, because if you don’t have pictures of an event, did it really happen?

Online Lab Notebooks

Good post by Cameron Neylon looking at the requirements for keeping your lab notebook online. As you can tell from the comment I left, I worry about either the IT overhead this is going to cause, or that we’d be placing our data in the very shaky hands of “the cloud”. Great article on how much you should trust cloud computing here.

Social Networks for Scientists
That post and this one from Richard Grant on the failure of “Myspace for scientists” got me thinking–are there any features unique to the myriad social networks for scientists sites that are useful? Are they offering any tools beyond what you could get on Facebook or LinkedIn that you find valuable?

Costs for e-Books
I think this points out what’s going to be a major problem for the e-book market–price. For us, paper, printing and binding are not the biggest expense when producing a book. The heavy level of editorial input, rewriting, development, design, indexing, etc., are the biggest costs. And those don’t go away when you’re doing an e-book instead of a print one. Will consumers be satisfied with e-books that cost 10% less than paper ones, if that’s truly reflective of the costs of production?

The death of journalism

Lots of recent articles have come out on the death of newspapers, particularly Seth Godin’s one about the real loss, quality journalism. The usually right-on-the-money Scholarly Kitchen responded with this article, which I think is way off base. Blogs don’t come close to replicating real, quality journalism. It reminded me of a recent piece by Warren Ellis, in which he discusses recent events in Mumbai and a talk by David Simon, co-creator of The Wire:

“His argument is that journalism is an honest-to-god job, with skills, that you have to learn in order to do it right. Citizen journalism just doesn’t cut it….Citizen journalism ate it in the US. Dan Gillmor, who had been talking of nothing else for years, launched Bayosphere–because what the world needed, see, was another website about people talking about the San Francisco Bay Area–which fell apart five minutes later. Citizen journalism looks like sites like, whose above-the-fold right now blazes with the hottest news story in town–local church members knitted some woolen caps for charity… The metroblogging sites…are great fun, but at best they’re arts journalism and in general they’re a listings magazine and linkbloggers. They’re very rarely working their own sources, doing original reporting or in broad terms, doing the work of journalists. The five rules of journalism–who, what, where, when and why–aren’t there because people like pissing you off with rules. They’re there because that’s how you learn things and that’s how you explain things, and that, eventually, is how you see that events and people are connected…and that’s how we build up a picture of the world and begin to understand where we are today and what it really looks like.”

Linkblogs and Wikis are great for pointing you to original source material, but what purpose will they serve without that source material? A citizen journalist in the early 1970’s at the Watergate hotel might have sent out a tweet that the police were arresting someone for breaking and entering, but would that have led to the downfall of a president? I think good investigative journalism is something of value. But then again, what do I know, I’m a luddite, I still pay for music.

Given some of the comment reactions from my last posting, perhaps I wasn’t clear enough in what I was trying to say, so a bit more here. As many have pointed out, scientists have long been bombarded with large amounts of potentially useful information, and have developed a sophisticated set of filters to deal with it, both on and offline. That’s not the issue. The issue is that, due to the exponential growth in the amount of research being done and published, even with highly effective filters that eliminate everything extraneous, one is often still left with more information than can be dealt with in a reasonable amount of time. Let me try to explain with a hypothetical example:

I’m a professor at University X. I have a busy schedule, between doing my own bench research, writing grants, managing my students/postdocs and my faculty duties. I have time in my schedule to read (choosing this number randomly) 10 papers a week in depth for a full understanding. 25 years ago, this was fine. The filters I had built pointed me toward 4 quality papers a week directly relevant to my research, and this allowed me to read 6 other papers in other fields. I had complete knowledge of the important work in my own field, plus a good working knowledge of many other fields that could be applied to my own. Fast forward to today, using even better filters, including Connotea, Digg, Science Blogs, what-have-you, I am now pointed toward 12 quality papers a week directly relevant to my research. This is not a filter failure–my filters are better than ever. They’re discarding more than ever before. But the quantity of research published has increased so much that even with more powerful filters, there’s more directly relevant information out there that I need to take in. I have no time for papers outside of my own field, not even enough time for the papers within my field.
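To put the hypothetical in concrete terms, here is a small sketch using the same made-up numbers (10 papers a week of reading capacity, 4 relevant papers then, 12 now). The function and key names are mine, purely for illustration.

```python
def weekly_reading_budget(capacity, relevant_papers):
    """Split a fixed weekly reading capacity between in-field and other papers."""
    read_in_field = min(capacity, relevant_papers)
    return {
        "read_in_field": read_in_field,
        "left_for_other_fields": capacity - read_in_field,
        "unread_in_field": relevant_papers - read_in_field,
    }


# The numbers from the hypothetical above.
then = weekly_reading_budget(capacity=10, relevant_papers=4)
now = weekly_reading_budget(capacity=10, relevant_papers=12)

print("25 years ago:", then)
print("Today:", now)
```

With the same capacity, the room for other fields drops from 6 papers to none, and 2 directly relevant papers go unread each week, even though the filters are doing their job.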

That’s what most scientists I know mean by “information overload”. They’re filtering like crazy, but due to the exponential growth in research and journals, there’s more knowledge to assimilate. The solutions available seem to be:

1) Specialization–this is basically the answer I’m being given by those who just say that we merely need to improve our filters and eliminate more material. Doing so means a shallower knowledge of our own field, and a much shallower knowledge of other fields. This is not good for science, and seems contradictory to the cross-disciplinary world that science has become, where the skill set required is much bigger than ever. The more one filters, the more one narrows one’s focus.

2) Spend more time with the literature–this seems to be the approach most scientists are taking, and other parts of their careers and lives are suffering for it. Either their students, their universities or their families end up neglected.

Yes, it’s true, as AJ Cann notes, “every scientist since Aristotle has suffered from information overload,” but the quantity of that overload has grown exponentially. It’s one thing to follow the dozens of labs doing molecular biology in the late 1950s; it’s another to follow the tens (if not hundreds) of thousands of molecular biology labs today. At some point, even the most sophisticated filters become overwhelmed, or at least they return more information than one can read without sacrificing elsewhere. And many are finding this frustrating, finding that it takes away from some other part of their research/lives. Solving the problem with more filters just means more specialization, which is also a sacrifice, and a way toward doing less important, less interesting science.

This has been bothering me for a while now, dating back to last year, when I first heard Clay Shirky’s very pithy statement that information overload isn’t a real problem; the real problem is a failure to build effective filters. It’s a catchy little phrase, and like most theories from Web 2.0 gurus, it seems reasonable on the surface, but when applied to the world of scientists, it’s less than useful.

The O’Reilly Radar blog had a link this week to an interview with Shirky where he discusses the concept in detail, which was helpful to finally get a handle on what he means and why it’s irrelevant in my world:

“…the information overload people are the most narcissistic because information overload started in Alexandria, in the library of Alexandria, right? That was the first example where we have concrete archaeological evidence that there was more information in one place than one human being could deal with in one lifetime, which is almost the definition of information overload. And the first deep attempt to categorize knowledge so that you could subset; the first take on the information filtering problem appears in the library of Alexandria.

By the time that the publishing industries spun up in Venice in the early- to mid-1500s, the ability to have access to more reading material than you could finish in a lifetime is now starting to become a general problem of the educated classes. And by the 1800s, it’s a general problem of the middle class. So there is no such thing as information overload, there’s only filter failure, right? Which is to say the normal case of modern life is information overload for all educated members of society.

If you took the contents of an average Barnes and Noble, and you dumped it into the streets and said to someone, “You know what’s in there? There’s some works of Auden in there, there’s some Plato in there. Wade on in and you’ll find what you like.” And if you wade on in, you know what you’d get? You’d get Chicken Soup for the Soul. Or, you’d get Love’s Tender Fear. You’d get all this junk. The reason we think that there’s not an information overload problem in a Barnes and Noble or a library is that we’re actually used to the cataloging system. On the Web, we’re just not used to the filters yet, and so it seems like “Oh, there’s so much more information.” But, in fact, from the 1500s on, that’s been the normal case.”

Okay, so if by “information overload”, you mean that there’s more interesting stuff out there than I could ever handle if I tried to read all of it, fine, Shirky’s comments make sense. But that’s not what the scientists I talk to on a daily basis mean by “information overload”. What they mean is that we’re seeing huge increases in both the numbers of people doing scientific research, and the numbers of scientific papers being published. While I hate to quote Wikipedia, the numbers listed there (take these with a grain of salt as one should all Wikipedia content) show an estimate of 11,500 total scientific journals in 1981, and over 40,000 listed in 2008 in PubMed in fields related to medical science alone.

Now, most scientists are familiar with the “cataloging system” of scientific journals; they’ve been reading them their entire careers. Everyone has their own filters, their own rankings of which journals are more interesting, or publish better work than others. And all kinds of tools are available for filtering things down to just the relevant essentials for keeping up with your own field. But even so, most people that I talk to are left with more useful, relevant articles that they need to read than they have time to get to. These are not articles that should be filtered out. These are important, quality findings of direct relevance to their own work. And there are too many of them without even factoring in a need to keep up with science in general and see what developments in other fields can be applied to one’s own.

So no, it’s not a filter failure. It’s a genuine overload. A “filter failure” implies that scientists are just not tossing out the less relevant material, but that’s not what’s happening (as an example, almost no scientists I know read science blogs; those are filtered out as being of less value than the primary literature). Is it so hard to believe that as science and technology move forward, more and more research is being done, and that there’s more knowledge generated that one should take in? Is it wrong to want to be as informed as possible of one’s own field, and to seek ways of assimilating more research, rather than ways of discarding valuable information?

Shirky’s suggested solution is of no help here:

“So, the real question is, how do we design filters that let us find our way through this particular abundance of information? And, you know, my answer to that question has been: the only group that can catalog everything is everybody. One of the reasons you see this enormous move towards social filters, as with Digg, as with, as with Google Reader, in a way, is simply that the scale of the problem has exceeded what professional catalogers can do.”

I don’t know about you, but I’m not sure how much I’m willing to trust a random group of strangers to tell me how relevant a particular paper is to my own research. Sure, you can get some sense of the quality of the work, perhaps even a decent summary. But no one knows your work as well as you do, and no one is going to be able to tell you what tiny details in a paper will or won’t act as a springboard for new avenues of research. I’d also argue that the top researchers are probably better at discerning those details, and if they leave the paper-reading to others, they’re going to miss out on much of what makes them better than their peers and science is going to suffer.

So while social filtering like that described does have its uses, it’s not the solution here. Social filtering is nice for discovery, for finding papers you might not have read on your own, but that’s not the problem I hear from most scientists. Most aren’t looking for more to read.

Shirky’s point may be relevant in some situations (certainly anyone looking to read every book in the Library of Alexandria will learn a valuable lesson from him), but like most Web 2.0 wisdom, it fails when applied to the particular needs of scientists. As the old phrase goes, “To a hammer the world looks like nails,” and Shirky often strikes me as yet another Web 2.0 evangelist trying to convince us that our individual needs are all the same easily hammered nails.

Update: in response to some of the comments, I’ve tried to clarify things with a further posting on this subject, part 2, available here.
