Making the Hidden Web Searchable

The D-Lib article “Google Still Not Indexing Hidden Web URLs” complains that Google does not support OAIster & Co.

We […] conclude that Google has not endeavoured to increase their support and access to OAI materials. Even taking into account the caveats, we would also conclude that aggregations of OAI records are as valuable for user research purposes as they were at least two years ago.

From our own experience, we know that providing the OAIster records in bulk to Google proved problematic for them, and eventually they requested only the OAIster URLs instead of the complete metadata. We are not, at this point, certain that Google is using these URLs (crawling them) for addition to their search index.

It is also interesting to note that Google has recently dropped support of OAI for website indexing [6]. Given the resulting numbers from our investigation, it seems that Google needs to do much more to gather hidden resources, not less. (Granted, the OAI for Sitemaps feature may not have been an appropriate approach for Google.)

John Wilkin responds to this article:

As much as I like Kat’s and Josh’s analysis, I draw a different conclusion from the data. They write that, “[g]iven the resulting numbers from our investigation, it seems that Google needs to do much more to gather hidden resources.” This perspective is one many of us share. We’re inclined to point a finger at Google (or other search engines) and wish they tried harder to look into our arcane systems. We believe that if only Google and others had a deeper appreciation of our content or tried harder, this problem would go away. I’ve been fortunate enough to be able to try to advance this argument one-on-one with the heads of Google and Google Scholar, and their responses are similar–too much trouble for the value of the content. As time has passed, I’ve come to agree.

I can only agree. We need to make our catalogs and databases more accessible to search engines. If we don't, Amazon & Co. will put even more pressure on libraries' role as the first port of call for bibliographic research.

Another of Wilkin's remarks is of particular interest to catalog developers:

We often go wrong, however, when we try to share our love of complexity with the consumers. We’ve come to understand that success in building our systems involves making complicated uses possible without at the same time requiring the user to have a complicated understanding of the resource. What we must also learn is that a simplified rendering of the content, so that it can be easily found by the search engines, is not an unfortunate compromise, but rather a necessary part of our work.