Reading William Gunn’s recent blog posting, “Could this be the Science Social Networking killer app?”, got me thinking more about the many online scientific reference list repositories like Connotea, CiteULike and 2Collab, and why they are failing to catch on. William is suggesting a Pandora-like system of expert reviewers tagging papers to set up a recommendation system. I’m not sure this would be really helpful: what you get from a scientific paper is very different from what you get from listening to a song, and their interconnectedness works in very different ways. And it brings to mind the failings of organizing your references by tags.
If you’ve ever dealt with any of these social bookmarking sites, you know how incredibly tedious they are to use. Even for journals like CSH Protocols, where we have buttons on every article to add it directly to these sites, you still end up jumping through hoops, filling out forms, writing summaries, adding tags. You’re on the spot at that moment to come up with a list of tags that will remind you about the content of that paper. As your worldview changes over time, and with it your research priorities, you’re probably going to want to revisit many papers and add additional tags. Even with all this time-consuming work, you still may not have added an appropriate tag to let you find what you want to find at a given moment. Did you add a tag for every method used in the paper? Every conclusion, every subject referenced? That band on the gel in figure 3 that you’re ignoring today might be very important to you tomorrow. How are you going to tag the paper in case you need to find it again?
It’s more work than it’s worth, particularly given the ability to do full-text searches on your collection of articles through programs like Papers (out for the iPhone this week, by the way). Why tag every single aspect of a paper when you can just do a quick search? Even Google Scholar and PubMed strike me as much easier tools to use for these purposes than social bookmarking sites. It’s why Apple, Microsoft and Google have spent so much time and money on desktop search applications over the last few years. Investing effort in organizational schemes is pointless when you can call up the file you need via a quick search. Sure, with Papers or the search engines, you lose the social aspects of things, the use of the network as a discovery tool. Then again, your hard work adding tags isn’t helping you discover new papers; it’s helping other people. You have to hope that others are tagging papers as relentlessly as you and that they’re tagging the aspects of papers that fit your interests.
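To make the tagging-versus-searching point concrete, here is a minimal Python sketch. Everything in it is hypothetical: the `Paper` class, the two helper functions and the sample library are invented for illustration and do not reflect how Papers, Connotea or any other tool actually works. It only shows how a tag lookup misses a paper you never tagged for a topic, while a plain text search still finds it.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    """Hypothetical record for one article in a personal library."""
    title: str
    full_text: str
    tags: set = field(default_factory=set)


def find_by_tag(papers, tag):
    """Return titles of papers whose hand-assigned tags include `tag`."""
    wanted = tag.lower()
    return [p.title for p in papers if wanted in {t.lower() for t in p.tags}]


def find_by_text(papers, query):
    """Return titles of papers whose title or body text mentions `query`."""
    wanted = query.lower()
    return [
        p.title
        for p in papers
        if wanted in p.title.lower() or wanted in p.full_text.lower()
    ]


# Invented sample data: the first paper mentions phosphorylation in its text,
# but it was never tagged that way when it was saved.
library = [
    Paper(
        title="Kinase signaling in yeast",
        full_text="The band on the gel in figure 3 suggests phosphorylation.",
        tags={"kinase", "yeast"},
    ),
    Paper(
        title="Phosphorylation methods review",
        full_text="A survey of assays for detecting phosphorylation.",
        tags={"phosphorylation", "methods"},
    ),
]

tag_hits = find_by_tag(library, "phosphorylation")
text_hits = find_by_text(library, "phosphorylation")
print(tag_hits)
print(text_hits)
```

In this toy library the tag lookup returns only the paper that was explicitly tagged, while the text search returns both. A real full-text tool would use a proper index rather than scanning every document, but the trade-off is the same.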
The tedium of tagging versus the relative ease of searching brought to mind this recent article from John Gruber on the Daring Fireball site, “Untitled Document Syndrome”. Gruber talks about friction, the number of tedious steps involved in so many programs:
“There’s the stuff you want to do, and there’s the stuff you have to do before you can do what you want to do. People have a natural tendency to skip the have to do stuff to get right to the want to do stuff if they can get away with it. Friction is resistance.”
His example is writing a Word document. How often do you find yourself starting a new document, and writing for a long while without actually saving it? Saving changes to an already saved document is trivial, a keystroke away, but a new document means you have to go through that dialogue box, figure out where you’re saving it, come up with a title, etc.
“The obvious problem with Untitled Document Syndrome is in the rare cases where you lose data because you never saved it. The non-obvious problem is that the mental friction posed by the Save dialog often keeps you from ever even creating or saving small items of data in the first place.”
Gruber talks about the different approach taken by programs like Apple’s iLife suite, where you just dump in music, video or photos, and you don’t have to worry about naming them or deciding where to store them. The program does it for you. It’s no surprise, then, that programs like Papers or Yojimbo, which are based on the same iLife-style interface, are so much easier to use for organizing your scientific research list. Given the time demands faced by scientists, it’s no wonder I’ve heard rave after rave about Papers, but have never received much more than a shrug when discussing online reference sites with folks at the bench.
One other caveat: if you are going to invest your time in tagging, be sure to regularly extract your account’s information and back it up. Sites may disappear at the drop of a hat. Some of the science paper bookmarking sites are clearly unaware of the Napster and Grokster court decisions, and their willingness to become redistributors of copyrighted material places them only a lawsuit away from the abyss. And even with those backups, uploading them to the next site is never as clean as you’d like it to be. Be prepared to repeat a lot of your efforts.
February 23, 2009 at 2:30 pm
David, how do you think Mendeley fits into this picture? It’s primarily a tool for your own references and those of your research group, but it also has social features (that probably become more important once the site is more popular).
February 23, 2009 at 2:47 pm
I think Mendeley has a really clever approach, in that they’re trying to take advantage of the best of both worlds. They’re emulating the organizational programs you keep on your own computer, which allow full-text searching of PDFs you’ve downloaded, and then syncing that data to an online repository where you lose the searching but gain the social interaction. At this point, they still need to work out some of the kinks, and Papers is still the gold standard for functionality. They’re really the first ones to put the two worlds together, though. But there are some other obvious issues with what they’re doing that worry me about their longevity should their product catch on.
February 24, 2009 at 8:16 am
Hi David,
I disagree that plain search is a sufficient organization tool. Tagging can help cull the number of results for a search query, particularly on a large collection of papers.
I do agree that there is too much friction in how tagging is currently implemented.
I see you are also a fan of John Gruber’s blog. In reference to Yojimbo, he recently linked to an interesting discussion on everything buckets (http://daringfireball.net/linked/2009/02/09/everything-buckets), pointing out some of the pain inherent in such systems.
It seems likely that there will not be a one-size-fits-all approach. Perhaps the all-singing, all-dancing semantic web will take care of things for us (I never managed to get Nepomuk installed correctly, so my vote on this is on hold at the moment).
February 24, 2009 at 8:43 am
Ian, I agree that there are advantages to tagging. For example, GoPubMed is a very helpful tool, with its hierarchy for narrowing down results. Then again, if you’re talking about your own collection of frequently consulted resources, you’ve already done a lot of that narrowing down. I do have hopes for better semantic tools, but like you, I’m looking forward to seeing them actually exist.
February 24, 2009 at 5:12 pm
Hi David,
thanks for sharing your thoughts on Mendeley. I just wanted to add that we’ll also be adding full-text search to our Online Library, so you can choose to do both – rely on tagging or on full-text search.
I’d also be interested to learn what you think the issues might be regarding our longevity when Mendeley catches on – we’re always trying to improve!
Best wishes,
Victor
February 24, 2009 at 5:17 pm
Ah – sorry, I’m a bit slow tonight. You were talking about the copyright issue (referring to Napster/Grokster), I’m assuming. I thought the discussion we had when I visited you at CSH was quite illuminating – so at least you know we’re not unaware of these issues!
February 24, 2009 at 6:54 pm
That’s the gist of it, Victor. I think you guys are doing interesting things and making good tools, and I’ll be saddened if you run into trouble over these issues. Then again, you’re not alone, which surprises me. I’d think there would be a better understanding of previous cases and the reasons everyone switched to P2P systems rather than centralized servers. I’m also surprised not to see more care given to statements that may be interpreted as encouraging infringement, à la Grokster. The other case to keep an eye on is the lawsuit against MP3tunes.com, where EMI is alleging copyright violations for merely storing (and not distributing) copyrighted materials.
February 25, 2009 at 2:04 pm
I wrote a somewhat similar blog post several months ago, about ease of use (and lack thereof) in reference managers.
Nobody wants to do any *more* work than they’re currently doing, and a lot of the social sharing tools are based on making an effort.
February 25, 2009 at 3:15 pm
Some of these problems are not limited to science publications. Almost any database dealing with cutting-edge science has problems applying proper semantic filters. These include choosing the proper tag from the list (which may not really include a tag for the protein you are working on), adding a tag that turns out not to be useful enough for anyone else to find, and proper curation so that papers get retagged after new information alters the paper’s thrust.
Scientists will move on to this when it really provides a novel purpose for the individual scientist and their work. Few are going to take the time to tag an article or recommend it purely because it is useful for the community.
The pluralistic network effects of Web 2.0 approaches often have to grow out of purely selfish motives.
February 25, 2009 at 6:14 pm
Yes, it’s a very tricky issue – and you wouldn’t guess what happened today: Today, the MPAA asked me to be an expert witness in their trial against BitTorrent because of a couple of filesharing papers I published a while ago 🙂 Can’t make these things up.
Also, to add a further twist to this story, you may recall that I mentioned the background of our new investor when we met at CSH – today I can reveal that he was head of Digital Strategy at Warner Music: http://www.techcrunch.com/2009/02/25/mendeley-snags-2-million-in-early-stage-funding-for-research-paper-management-tool.
But I still agree that you have a very good point, which we’ll continue to ponder as we chart these murky waters.
March 6, 2009 at 3:26 pm
[…] Why article tagging doesn’t work | Bench Marks […]