
Thursday, September 16, 2010



Jonathan Rochkind

Don't get too self-congratulatory though.

"Woody Allen is mentioned in 325 books ostensibly published before he was born.

Other errors include misattributed authors -- Sigmund Freud is listed as a co-author of a book on the Mosaic Web browser and Henry James is credited with writing 'Madame Bovary.'"

My local cataloging corpus has many errors of this sort too, and I bet most readers' corpora do as well. You're probably going to see different types of errors from the different approaches. Library cataloging is less likely than Google to misattribute an author or creator, although it is more likely to leave one out altogether if that person wasn't the AACR/AACR2 'main entry'. It is probably nearly as likely as Google, though, to have the wrong dates on items in actual search filters: dates in the MARC 260 field are transcribed from the item and aren't suitable for machine processing, while the fixed-field dates, which most OPACs of the 1980s through 2000s didn't use, are often neglected. There are probably other sorts of errors that are even more likely in library-cataloged corpora than in Google's database.

Metadata is hard.

Jonathan Rochkind

PS: "Although Google representatives did respond to Nunberg's article, blaming the bulk of the errors on outside contractors,"

Much (I have no idea how much, but I know some) of Google's metadata actually comes, believe it or not, from trying to make sense of library MARC records, both from scanning partners and from OCLC. We're lucky Google's representatives didn't publicly blame libraries.

Christine Schwartz

Hi Jonathan,

Thanks for the comments. I'll reply to the second one first.

I'm familiar with the Internet Archive's workflow, and they make good use of MARC data. I think Google's problems come from capturing metadata after the fact rather than at the point of scanning. My impression from reading the articles and blog posts last year was that initially Google didn't think about metadata at all. But I'll have to go back and read everything again.

carol seiler

Yes, yes and yes. I love this!
@Jonathan, I agree there are problems in all sorts of metadata, but these mistakes are glaring and seem to be machine-driven. It is easy to misattribute something, but when you then mass-update that misattribution... well, that is how you get such a widespread mass of errors.
@Christine, I agree. I think trying to link the data after the fact contributed. I'd add that making it a mostly automated process without understanding MARC contributed greatly to this.
I'd still love to get involved in the clean-up if Google is hiring such [grin]

The comments to this entry are closed.

Scope of blog

  • The focus of this blog is the future of cataloging and metadata in libraries.

    Future of Cataloging: Key Resources (to May 2008)
