Sounds like Google needs to hire some good, professional catalogers.
In "Google Books: A Metadata Train Wreck," Geoff Nunberg enumerates various inaccuracies in Google Books metadata. He argues that the poor quality of the metadata seriously impedes scholars' use of these digitized books.
Why would they want to reinvent the wheel the way silo-based libraries do? They will likely license (or capture) library metadata, refine full-text search, particularly in faceting, and most importantly (in the long run) develop and aggregate a social network discovery layer. A professional cataloger or library metadata specialist or two on staff to lend that perspective to a team *might* be helpful, but is not necessary.
Posted by: Allen Mullen | Thursday, September 03, 2009 at 10:44 AM
I have a couple of problems with Nunberg's article.
Nunberg says, "It might seem easy to cherry-pick howlers from a corpus as extensive as this one, but these errors are endemic." I don't see how you can conclude that errors in Google's metadata are endemic unless you know the error rate. Nunberg presents all sorts of examples of errors, but his examples *are* cherry-picking.
One way to figure out the error rate in Google's metadata would be to pull, say, 1000 books at random and see if the metadata is correct. Even 100 samples would give you a rough estimate. Nunberg might well have an estimate of Google's error rate, but I don't see it in the article.
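The sampling idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `record_ids` and `has_error` are hypothetical stand-ins for a list of book identifiers and a manual (or scripted) check of one record, and the Wilson score interval is one common choice for putting a margin of error on a sampled proportion, not something taken from Nunberg's article.

```python
import math
import random


def wilson_interval(errors, n, z=1.96):
    """Approximate 95% Wilson score interval for a sampled error proportion."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = errors / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def estimate_error_rate(record_ids, has_error, sample_size, seed=None):
    """Draw a simple random sample and estimate the metadata error rate.

    record_ids: hypothetical collection of book identifiers.
    has_error:  hypothetical callable returning True if a record's
                metadata is wrong (in practice, a human check).
    """
    rng = random.Random(seed)
    sample = rng.sample(list(record_ids), sample_size)
    errors = sum(1 for record in sample if has_error(record))
    low, high = wilson_interval(errors, sample_size)
    return errors / sample_size, low, high
```

With 100 sampled books the interval is wide, which is why that sample size gives only a rough estimate; 1000 books narrows it considerably.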
The other problem is the implicit message that while Google's error rate is unacceptable, library catalogs' error rates are better. Well, I don't know of any studies of the error rate in catalog metadata, and Nunberg hasn't cited any.
Certainly there are plenty of errors in my local public library catalog. Duplicate name authority records, for example, are "endemic".
Nunberg hasn't convinced me that Google Books is any worse than any other catalog.
Posted by: Graeme Williams | Thursday, September 03, 2009 at 08:10 PM
Hi Allen,
Sorry for such a late reply. I'm still reading and thinking about these issues surrounding Google Book Search.
I was actually trying to be facetious when I first wrote this post about Google needing to hire a few good catalogers. But the more I read, the more I think this is not such a bad idea. I don't think they will be cataloging in the traditional sense (one book after another). What they could be doing is helping with the complex decisions that will come up when dealing with bibliographic control. I envision this would be done after automated processing.
Catalogers could lend a hand in deciding how best to automate the process of aggregating and capturing good metadata. After all, we've been in this library organization business a lot longer than Google has!
Posted by: Christine Schwartz | Tuesday, September 22, 2009 at 10:23 AM