Tim Spalding on Recommender Systems

LibraryThing's Tim Spalding responds via NGC4LIB to a BibTip article in D-Lib:

1. You can’t draw conclusions based upon a small number of overlapping “trips.” If one trip were enough and you knew I’d looked at something super-obscure, you could probably figure out the other pages I’d looked at too. Just go to the page you saw me browsing and see what appears in the “people who looked at this also looked at…” box. If my obscure book of Hellenistic poetry overlaps with “Having an Affair for Dummies,” I’m in trouble with the missus.

Correct. To guard against this, however, BibTip displays only recommendations that are backed by multiple overlaps. But:

2. But if you need multiple overlaps, the amount of usable data goes way down. This is, I submit, what Ann Arbor’s recommendation system showed. You need a lot of data in a recommendation system for it to work. The worse the data, the more you need. (On LibraryThing, we do not generally even *try* to make a recommendation when there are fewer than 15 copies of a book in the system, and those aren’t books you casually looked at, those are books in people’s personal collection.)

“A lot of data” can also be generated over time. I suspect that this is exactly what BibTip's long “incubation period” between installation and the first display of results is for.
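The trade-off in points 1 and 2 can be made concrete with a small sketch. This is not BibTip's or LibraryThing's actual algorithm; it is a minimal co-occurrence counter under the assumption that each "trip" is a list of titles viewed in one session. The function names, the `min_overlap` parameter and the sample titles are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_occurrences(sessions):
    """Count, for each unordered pair of titles, how many sessions contain both."""
    counts = Counter()
    for session in sessions:
        # De-duplicate and sort so each pair is counted once per session
        # and always stored in the same order.
        for a, b in combinations(sorted(set(session)), 2):
            counts[(a, b)] += 1
    return counts


def recommend(title, counts, min_overlap=2):
    """Return titles seen together with `title` in at least `min_overlap`
    sessions, strongest overlap first (ties broken alphabetically)."""
    scored = []
    for (a, b), n in counts.items():
        if n < min_overlap:
            continue  # too few overlapping trips: suppress the pair
        if a == title:
            scored.append((n, b))
        elif b == title:
            scored.append((n, a))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored]


# Hypothetical sample data: four browsing sessions.
sessions = [
    ["hellenistic_poetry", "affair_for_dummies"],
    ["hellenistic_poetry", "greek_anthology"],
    ["hellenistic_poetry", "greek_anthology", "callimachus"],
    ["greek_anthology", "callimachus"],
]
counts = co_occurrences(sessions)

# With a threshold of 1, a single trip leaks into the recommendations.
print(recommend("hellenistic_poetry", counts, min_overlap=1))
# With a threshold of 2, the single-trip pairs disappear, and so does
# most of the usable data.
print(recommend("hellenistic_poetry", counts, min_overlap=2))
```

With a threshold of 1, the single session that paired the poetry book with “Having an Affair for Dummies” shows up in the list. Raising the threshold to 2 removes it, but it also removes every other pair that was seen only once. That is the shrinking of usable data that Spalding describes.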

3. Like other systems that follow where users go, not whether they liked it there and what they did there, BibTip is susceptible to “ant navigation” problems. You know how ants find their way about? They follow the trail put down by other ants. This works well in general, but it can also go bad. An ant gets lost. Another ant happens on the trail, and gets lost too, a third and sees a really strong trail, so three are lost, etc. At its worst you have the famous phenomenon of ants going round and round in a circle, following other ants and their ever-stronger trail, until all the ants die of exhaustion!

This shortcoming could only be countered by including circulation figures in the analysis. In practice, however, that raises considerable problems, notably data privacy and the lack of suitable interfaces in library systems.

I ask you: Do we want library patrons dying of exhaustion?

At least not often.
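The ant-trail feedback loop, and the effect of adding circulation figures, can be illustrated with a deliberately simple toy model. It rests on strong assumptions that are labeled in the code: there are only two titles, every visitor follows the top recommendation, and only the relevant title is ever borrowed. The function `simulate` and all names in it are hypothetical; this is not how BibTip works.

```python
def simulate(visitors, count_loans):
    """Return the title recommended to each successive visitor.

    count_loans=False: rank titles by clicks only (pure trail following).
    count_loans=True:  rank titles by loans, with clicks as a tie-breaker.
    """
    # Assumption: one early visitor got lost and left a click on the wrong title.
    clicks = {"relevant": 0, "irrelevant": 1}
    loans = {"relevant": 0, "irrelevant": 0}
    history = []
    for _ in range(visitors):
        score = loans if count_loans else clicks
        # Recommend the highest-scoring title; break ties by click count.
        top = max(score, key=lambda title: (score[title], clicks[title]))
        history.append(top)
        clicks[top] += 1  # assumption: the visitor follows the recommendation
        if top != "relevant":
            clicks["relevant"] += 1  # ...and then finds the right title anyway
        loans["relevant"] += 1  # assumption: only the relevant title is borrowed
    return history


print(simulate(5, count_loans=False))
print(simulate(5, count_loans=True))
```

In the click-only run, the one accidental click is never overtaken. Each visitor reinforces the wrong trail before finding the right title, so the wrong title stays on top. Once loans are counted, the first real loan corrects the ranking. The sketch says nothing about the privacy and interface problems mentioned above, which are the practical obstacle.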

4. In all seriousness, the ant problem is real. Every time the catalog sends you somewhere you don’t want to go, you’ve made a trail telling the next guy to go there too. If library catalogs worked, ant-tracking would too. But when I type “Harry Potter” into the search box of a large public library I use all the time, I don’t get a real-live English-language Harry Potter book until item number nine!

In principle Spalding is right, but I do not see the connection to catalog rankings. Catalogs usually have a specific sort order preset, often by year of publication or by cataloging date, and that really has nothing to do with recommender systems or user recommendations. Perhaps, though, Spalding is thinking of search-engine-based catalogs?