Thursday, September 03, 2009


Allen Mullen

Why would they want to reinvent the wheel in the same way silo-based libraries do? They will likely license (or capture) library metadata, refine full-text search, particularly in faceting, and most importantly (in the long run) develop and aggregate a social network discovery layer. A professional cataloger or library metadata specialist or two on staff to lend that perspective to a team *might* be helpful, but is not necessary.

Graeme Williams

I have a couple of problems with Nunberg's article.

Nunberg says, "It might seem easy to cherry-pick howlers from a corpus as extensive as this one, but these errors are endemic." I don't see how you can conclude that errors in Google's metadata are endemic unless you know the error rate. Nunberg presents all sorts of examples of errors, but his examples *are* cherry-picking.

One way to figure out the error rate in Google's metadata would be to pull, say, 1000 books at random and see if the metadata is correct. Even 100 samples would give you a rough estimate. Nunberg might well have an estimate of Google's error rate, but I don't see it in the article.
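To make the sampling idea concrete, here is a rough sketch in Python. It assumes someone has already hand-checked each sampled record and flagged it as wrong (True) or right (False). The function name `estimate_error_rate` and the simulated 10% error rate are made up for illustration; they are not figures from Nunberg's article or from Google. The sketch uses the Wilson score interval, one common way to put a 95% range around a sampled proportion.

```python
import math
import random


def estimate_error_rate(flags, z=1.96):
    """Estimate an error rate from hand-checked sample flags.

    flags: list of booleans, True meaning the record's metadata was wrong.
    z: normal quantile for the interval (1.96 is roughly 95% confidence).
    Returns (point_estimate, lower_bound, upper_bound) using the
    Wilson score interval.
    """
    n = len(flags)
    if n == 0:
        raise ValueError("need at least one sampled record")
    p = sum(1 for f in flags if f) / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    # Hypothetical catalog: 100,000 records, 10% with bad metadata.
    rng = random.Random(2009)
    catalog = [rng.random() < 0.10 for _ in range(100_000)]

    # Compare a 100-record sample with a 1000-record sample.
    for k in (100, 1000):
        sample = rng.sample(catalog, k)
        p, low, high = estimate_error_rate(sample)
        print(f"n={k}: estimate {p:.3f}, 95% interval {low:.3f} to {high:.3f}")
```

With 100 samples the interval is wide but still usable as a rough estimate; with 1000 it narrows considerably, which is the trade-off between the two sample sizes mentioned above.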

The other problem is the implicit message that while Google's error rate is unacceptable, library catalogs' error rates are better. Well, I don't know of any studies of the error rate in catalog metadata, and Nunberg hasn't cited any.

Certainly there are plenty of errors in my local public library catalog. Duplicate name authority records, for example, are "endemic".

Nunberg hasn't convinced me that Google Books is any worse than any other catalog.

Christine Schwartz

Hi Allen,

Sorry for such a late reply. I'm still reading and thinking about these issues surrounding Google Book Search.

I was actually trying to be facetious when I first wrote this post about Google needing to hire a few good catalogers. But the more I read, the more I think this is not such a bad idea. I don't think they will be cataloging in the traditional sense (one book after another). What they could be doing is helping with the complex decisions that will come up when dealing with bibliographic control. I envision this would be done after automated processing.

Catalogers could lend a hand with the decision-making on how best to automate the process of aggregating and capturing good metadata. After all, we've been in this library organization business a lot longer than Google has!

