Summit Catalog Committee Steering Team

February 06, 2007

Conference Call


Members attending:

Lewis & Clark College: Mark Dahl

Oregon Health & Science University: Janet Crum

Oregon State University: Michael Boock

Reed College: Marcia Bianchi

Southern Oregon University: Kate Cleland-Sipfle

University of Washington: Joe Kiegel

Washington State University: Marilyn Von Seggern

 

Consortium Staff: John Helmer

 

 


 

1. Progress reports from working groups

 

Data Harvesting

Dahl reported that the group has created a web page summarizing Summit’s current capabilities for reporting and data harvesting.  The group will also survey the Catalog Committee and the Collection Development Committee re: their data harvesting needs.  They are also continuing work on the Google Scholar initiative.  Several members of the group had a conference call with Innovative staff last Friday regarding this initiative and data harvesting.  The group asked Innovative what they could do for us re: getting data to support integration with Google Scholar.  They asked for a generic approach to harvesting data, both for the Google project and other projects.  It sounded like they were already working with Google on ways that INN-Reach systems could work with Google Scholar, but they are in the early stages of this work.  The group will provide more specifics about what kinds of data are needed.  Helmer added that he has corresponded with other INN-Reach consortia about data harvesting, and they are very interested in our discussions with Innovative. 

 

Kiegel noted that these issues overlap with the work of the Next Generation Summit Interface group.  The inability to export data from Innovative limits our ability to consider products such as Endeca that will only work if data can be exported from the Summit catalog.  He asked that the data harvesting group send the survey to the members of the Next Generation working group also, as that group may wish to add some data elements to the list.

 

Dahl said they plan to ask about what current reports are used and what is inadequate.  Then they'll analyze those responses to identify raw data that's needed.  He thinks that the Google Scholar and Next Generation Summit projects will be the most critical uses for the data initially.  Boock noted that if we had access to the data in Summit, we could do many things.  He said that OSU  met with Innovative staff (Mary Chevreau, Ted Fons, Betsy Graham, and Maruta Skujina) at ALA Midwinter, asking for similar access.  Helmer commented that they seemed to understand that we are not asking for more canned reports but for access to the data itself.  They think this access may be useful for the company also, that they may be able to use it as the basis for some new products or services.

Dahl said the group suggested they could offer all the data as XML files or provide an API for data queries.  Regarding the API approach, that could open possibilities for on-the-fly uses for system data.  III staff said they hope to have an API as part of Encore, but Dahl noted that it sounded like a limited API, not necessarily useful for our projects.  He suggested we might ask for a "mashup API" to allow us to do some web 2.0 projects using Summit data.

Boock asked if III still offers the XML Server product.  He thinks they aren't building on it anymore.  Dahl said that this product can only access data from the web opac, e.g. you can't request a list of records changed in the last week.  Boock said that III sells a scheduler that will generate these lists and export the data automatically.  Does it work with INN-Reach systems?  We don't know.

 

Dahl reported that Mike Spalti has found a way to harvest data from the existing system, based on screen scraping, which he shared with III.  Helmer commented that it’s laborious, a workaround, not optimal. 

 

Dahl reported that he has been in touch with Marita Kunkel of the Accreditation Task Force.  She will provide information on what data they would like to access.

 

The Data Harvesting group is scheduled to produce a report by March 1.

 

 

Short Term Changes to Summit

Von Seggern reported that the group just completed a document with their recommendations and findings.  The document will be available very soon.  She summarized the report as follows:

 

  • Members of the group did quick and dirty usability testing the week of January 15
  • The report recommends turning on RightResult, increasing results display from 12 to 50 results, and purchasing Spell Check.
  • Short-term recommendations include
    • Moving Submit button on keyword search page and allowing people to press ENTER instead of clicking on Submit
    • Clean up and prioritize pull-down menus for limits, e.g. material type
    • Use a consistent look and color for navigation buttons
    • Use more user-friendly language on buttons, e.g. Staff View instead of MARC
    • Remove Extended Display button
    • They found that users are confused about the types of subject searches, so they recommend providing instructions and examples
  • Long-term recommendations
    • Users shouldn’t have to re-enter their information for multiple requests; either implement batch requesting or set a cookie
    • Consider a tabbed approach on main page
    • Consider removing Another Search button, since pull-down menus can be used
    • Is pull-down menu of scopes used?
    • Consider offering an advanced keyword page that will add t:, etc., for qualified keyword searches
    • Consider linking to other resources, e.g. Google Scholar
    • Consider enhancing Extended Display instead of removing, e.g. adding edition information
  • Usability findings: related to RightResult, keyword search box on first page, subject searching, ratings, reviews, style of catalog compared to others (how much information displays, book jackets)

 

Helmer asked about when work could actually be done.  Von Seggern said that they intend to send report to Catalog Committee for review, but she doesn’t know when the work could actually be done.  Helmer and others agreed that some of the work can be done soon.  Kiegel suggested they structure the report based on what can be done when, e.g. what can be done now, what will require additional testing, money, etc.  Dahl thinks the existing structure is fine and hopes they can implement the WebPacPro recommendations right away.  Dahl asked about recommendation to do further usability testing—should the group do that this year?  Or wait, since we may implement a next-generation interface?  We can decide that after entire Catalog Committee sees the report, but the Steering Team should consider this question. 

 

Dahl doesn’t want the group to spend our one face-to-face meeting of the whole committee talking about minor catalog changes.  He recommends the discussion take place over e-mail.  Von Seggern suggested that the questions could be handled via a survey. 

 

The report will be posted soon, after a few minor corrections. 

 

Von Seggern noted that the usability testing was helpful. 

 

 

Next Generation Summit Interface

Kiegel reported that the group created subgroups to investigate several products via phone interviews and visits at ALA.  The inability to export data from INN-Reach is a significant limitation, as many products require that data be exported and refreshed regularly.  Without that capability, many of the leading contenders (e.g. Endeca, Aquabrowser) are not viable options.  Without the ability to export adds, deletes, and changes from INN-Reach, we would have to export the entire database every night to use tools like Aquabrowser or Endeca.
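
To illustrate what an export of adds, deletes, and changes would contain, the sketch below compares two snapshots of the catalog keyed by record ID.  It is not an INN-Reach or Innovative feature; the function name, the record IDs, and the idea of holding each record as a string are all assumptions made for the example.

```python
from typing import Dict, List


def diff_snapshots(previous: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two catalog snapshots (record ID -> record content).

    Hypothetical helper: returns the record IDs that were added,
    deleted, or changed between the previous and current snapshot.
    """
    adds = sorted(rid for rid in current if rid not in previous)
    deletes = sorted(rid for rid in previous if rid not in current)
    changes = sorted(
        rid for rid in current
        if rid in previous and current[rid] != previous[rid]
    )
    return {"adds": adds, "deletes": deletes, "changes": changes}


if __name__ == "__main__":
    # Made-up record IDs and contents, for illustration only.
    yesterday = {"b1": "title A", "b2": "title B", "b3": "title C"}
    today = {"b1": "title A", "b2": "title B (rev.)", "b4": "title D"}
    print(diff_snapshots(yesterday, today))
```

With a delta like this, an external index would only need to process the listed records each night rather than reload the entire database.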

 

For now, the group is recommending one of two paths:

  • Become a development partner for developing Encore on INN-Reach.  Innovative is interested in such a partnership.  The group recommends requiring III to provide access to an API or web services layer as part of the development partnership agreement.
  • Purchase XML Server, use it to harvest data, and invest in local development to develop enhancements using that data. 

 

The group has started a draft report.  They plan to finish by March 1, on time.  Kiegel commented that a next generation interface will be expensive, no matter which product or direction is chosen.  

 

Dahl suggested we not remove anything from consideration because of data harvesting concerns.  Crum indicated that the report will include write-ups on each product. 

 

Kiegel noted that OCLC’s Worldcat.org product is a possibility.  There were questions about which holdings display in worldcat.org, including whether or not a FirstSearch subscription will be required for holdings to show in the new premium product.  A FirstSearch subscription is currently required for holdings to show in worldcat.org. 

 

Helmer noted that Council expects that data export likely will be required for a next generation interface. 

 

Helmer commented that OhioLINK is reaching the size limits of INN-Reach, around 11 million records.  They will use Lucene for indexing once they reach the limit.  III will install it for them and reindex using that product.  Dahl explained that Lucene is an open source search engine.  Could we use that product to index other things? 

 

 

Duplicate Record Reduction

Boock reported that the report will be done by February 15.  The report identifies two principal causes of duplicates in Summit:

  • Brief records generated by ERM, which have no overlay field
  • Record sets that lack OCLC control numbers

 

The report will provide several recommendations:

  • Proposed enhancement for ERM to generate an 001 with a consistent alphabetic prefix + the ISSN
  • Create a knowledge base maintained by Alliance with information about sets of non-OCLC records, how an 001 should be constructed for each of these record sets.  When a library wants to load a set of non-OCLC records, staff could consult the knowledge base and/or Alliance staff.  Alliance staff could coordinate the load, edit load tables, and provide assistance using MARCEdit.
  • Summit Catalog Committee should review current contribution policy with respect to brief records and non-OCLC record sets.  If we cannot resolve these issues that create duplicates, we should consider not contributing these records to Summit.
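
As a rough illustration of the first recommendation, the sketch below builds an 001 value from an alphabetic prefix plus the ISSN.  The prefix "erm", the function names, and the choice to drop the hyphen are assumptions made for the example; the report may specify something different.  The check-digit test uses the standard ISSN modulus 11 calculation.

```python
import re

# Hypothetical prefix; the recommendation only calls for
# "a consistent alphabetic prefix".
DEFAULT_PREFIX = "erm"


def issn_check_digit(first_seven: str) -> str:
    """Compute the ISSN check character from the first seven digits."""
    # Weights run 8, 7, 6, 5, 4, 3, 2 across the seven digits.
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    value = (11 - total % 11) % 11
    return "X" if value == 10 else str(value)


def build_control_number(issn: str, prefix: str = DEFAULT_PREFIX) -> str:
    """Return prefix + ISSN (hyphen removed) for use as an 001 value."""
    if not prefix.isalpha():
        raise ValueError("prefix must be alphabetic")
    normalized = issn.strip().replace("-", "").upper()
    if not re.fullmatch(r"\d{7}[\dX]", normalized):
        raise ValueError(f"not an ISSN: {issn!r}")
    if issn_check_digit(normalized[:7]) != normalized[7]:
        raise ValueError(f"ISSN check digit does not match: {issn!r}")
    return f"{prefix}{normalized}"


if __name__ == "__main__":
    print(build_control_number("0028-0836"))
```

Because every library would derive the same 001 from the same ISSN, brief records for the same title would have a common match point for overlay.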

 

The report will also include some guidelines for handling non-OCLC record sets, along with the text of an enhancement that Crum wrote. 

 

Helmer asked if staff with load profile training could edit profiles for other libraries.  Crum said that libraries are only given the ability to edit load tables when a member of their staff attends load profile training.  She said we would likely have to negotiate with III to allow someone from the Alliance to edit load profiles for member libraries. 

 

Dahl noted that we will always have a usability problem with multiple records for the same serial title.  Solving that problem requires a FRBR-like solution. 

 

 

2. Discussion of Data Harvesting Needs 

Dahl asked if anyone had any comments re: how to gather information on data harvesting needs.  No one had any comments to add to the discussion under Data Harvesting, above.

 

3. Planning for Full Summit Catalog Committee Meeting

Dahl asked Susie to survey committee members to find a date in March for a meeting.  They looked at March 12-16 and March 19-23.  No dates were especially good, with at least 8 people unavailable for any given date.  He would prefer to go ahead with a March meeting even though not everyone can make it.  We could conference people in by phone/video or allow people to send substitutes if needed.  Helmer commented that scheduling needs to be done way in advance for groups of this size.  Dahl said they would try to schedule the meeting for the week of the 12th; the best day that week is Tuesday, March 13. 

 

 


 

Minutes prepared by:  Janet Crum

approved: Mark Dahl, February 12, 2007