February 06, 2007
Conference Call
Members attending:
Southern
Consortium Staff: John Helmer
Data Harvesting
Dahl reported that the group has created a web page
summarizing
Kiegel noted that these issues
overlap with the work of the Next Generation Summit Interface group. The
inability to export data from Innovative limits our ability to consider
products such as Endeca that will only work if data
can be exported from the
Dahl said they plan to ask about what current reports are
used and what is inadequate. Then they'll analyze those responses to
identify raw data that's needed. He thinks that the Google Scholar and
Next Generation Summit projects will be the most critical uses for the data
initially. Boock noted that if we had access to
the data in
Dahl said the group suggested they could offer all the data as XML files or
provide an API for data queries. The API approach could
open up possibilities for on-the-fly uses of system data. III staff said
they hope to have an API as part of Encore, but Dahl noted that it sounded like
a limited API, not necessarily useful for our projects. He suggested we
might ask for a "mashup API" to allow us to
do some web 2.0 projects using
Boock asked if III still offers the XML Server
product. He thinks they aren't building on it anymore. Dahl said
that this product can only access data from the web OPAC;
for example, you can't request a list of records changed in the last week. Boock said that III sells a scheduler that will generate
these lists and export the data automatically. Does it work with
INN-Reach systems? We don't know.
Dahl reported that Mike Spalti has found a way to harvest data from the existing system, based on screen scraping, which he shared with III. Helmer commented that it’s laborious, a workaround, not optimal.
Dahl reported that he has been in touch with Marita Kunkel of the Accreditation Task Force. She will provide information on what data they would like to access.
The Data Harvesting Group is scheduled to produce a report by March 1.
Short Term Changes to
Von Seggern reported that the group just completed a document with their recommendations and findings. The document will be available very soon. She summarized the report as follows:
Helmer asked when the work could actually be done. Von Seggern said that they intend to send the report to the Catalog Committee for review, but she doesn’t know when the work could be done. Helmer and others agreed that some of the work can be done soon. Kiegel suggested they structure the report based on what can be done when, e.g. what can be done now and what will require additional testing, money, etc. Dahl thinks the existing structure is fine and hopes they can implement the WebPacPro recommendations right away. Dahl asked about the recommendation to do further usability testing: should the group do that this year, or wait, since we may implement a next-generation interface? We can decide that after the entire Catalog Committee sees the report, but the Steering Team should consider this question.
Dahl doesn’t want the group to spend our one face-to-face meeting of the whole committee talking about minor catalog changes. He recommends the discussion take place over e-mail. Von Seggern suggested that the questions could be handled via a survey.
The report will be posted soon, after a few minor corrections.
Von Seggern noted that the usability testing was helpful.
Next Generation
Kiegel reported that the group created subgroups to investigate several products via phone interviews and visits at ALA. The inability to export data from INN-Reach is a significant limitation, as many products require that data be exported and refreshed regularly. Without that capability, many of the leading contenders (e.g. Endeca, Aquabrowser) are not viable options. Without the ability to export adds, deletes, and changes from INN-Reach, we would have to export the entire database every night to use tools like Aquabrowser or Endeca.
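The difference between exporting only adds, deletes, and changes and exporting the entire database can be sketched as follows. This is a minimal illustration, assuming records can be represented as a mapping from record ID to record content; the record IDs, the sample data, and the function name compute_delta are hypothetical and do not reflect any INN-Reach or vendor interface.

```python
from typing import Dict, List, NamedTuple


class Delta(NamedTuple):
    adds: List[str]
    deletes: List[str]
    changes: List[str]


def compute_delta(previous: Dict[str, str], current: Dict[str, str]) -> Delta:
    """Compare two snapshots keyed by record ID and list what differs."""
    # Records present today but not yesterday.
    adds = sorted(rid for rid in current if rid not in previous)
    # Records present yesterday but gone today.
    deletes = sorted(rid for rid in previous if rid not in current)
    # Records present in both snapshots whose content differs.
    changes = sorted(
        rid for rid in current
        if rid in previous and current[rid] != previous[rid]
    )
    return Delta(adds, deletes, changes)


# Hypothetical snapshots: record ID -> record content (assumed data).
yesterday = {"b1001": "Title A", "b1002": "Title B", "b1003": "Title C"}
today = {"b1001": "Title A", "b1002": "Title B (rev.)", "b1004": "Title D"}

delta = compute_delta(yesterday, today)
print(delta)
```

With a change list like this, only the records named in the delta would need to be sent to an external index each night. Without it, the external tool has no way to know what changed, which is why the whole database would have to be exported.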
For now, the group is recommending one of two paths:
The group has started a draft report. They plan to finish by March 1, on time. Kiegel commented that a next generation interface will be expensive, no matter which product or direction is chosen.
Dahl suggested we not remove anything from consideration because of data harvesting concerns. Crum indicated that the report will include write-ups on each product.
Kiegel noted that OCLC’s WorldCat.org product is a possibility. There were questions about which holdings display in WorldCat.org, including whether a FirstSearch subscription will be required for holdings to show in the new premium product. A FirstSearch subscription is currently required for holdings to show in WorldCat.org.
Helmer noted that Council expects that data export likely will be required for a next generation interface.
Helmer commented that OhioLINK is reaching the size limits of INN-Reach, around 11 million records. They will use Lucene for indexing once they reach the limit. III will install it for them and reindex using that product. Dahl explained that Lucene is an open source search engine. Could we use that product to index other things?
Duplicate Record Reduction
Boock reported that the report
will be done by February 15. The report identifies two principal causes
of duplicates in
The report will provide several recommendations:
The report will also include some guidelines for handling non-OCLC record sets, along with the text of an enhancement that Crum wrote.
Helmer asked if staff with load
profile training could edit profiles for other libraries. Crum said that
libraries are only given the ability to edit load tables when a member of their
staff attends load profile training. She said we would likely have to
negotiate with III to allow someone from the
Dahl noted that we will always have a usability problem with multiple records for the same serial title. Solving that problem requires a FRBR-like solution.
Dahl asked if anyone had comments on how to gather information on data harvesting needs. No one had anything to add to the discussion under Data Harvesting, above.
Dahl asked Susie to survey committee members to find a date in March for a meeting. They looked at March 12-16 and March 19-23. No dates were especially good, with at least 8 people unavailable for any given date. He would prefer to go ahead with a March meeting even though not everyone can make it. We could conference people in by phone/video or allow people to send substitutes if needed. Helmer commented that scheduling needs to be done well in advance for groups of this size. Dahl said they would try to schedule the meeting for the week of the 12th; the best day that week is Tuesday, March 13.
Minutes prepared by: Janet Crum
Approved: Mark Dahl, February 12, 2007