There are thousands of bib records in the Alliance Network Zone (NZ) that contain two or more different OCLC numbers in separate 035$a ‘Other System Number’ subfields. This causes problems when a record needs to be overlaid (either by the automated daily load of OCLC updated bib records or a manual download using Connexion), or manually edited.
Why does this problem exist?
We have so far identified two separate causes for the presence of bib records in the NZ with multiple OCLC numbers. First, ever since we began automatically loading updated OCLC master records, we have used a merge rule named ‘OCA Bib Overlay (Keep 035s)’. It controls the overlay both when OCLC master records are loaded using an import profile and when a record is downloaded using Connexion, and it is also used in the majority of institutional import profiles. This merge rule retains all 035 fields from the existing Alma bib record when an NZ bib record is overlaid. This was necessary so that vendor record numbers such as Marcive brief shipping record numbers and YBP title record numbers, which are not present in the OCLC master records, would not be lost when an Alliance NZ bib record was overlaid with an updated record from OCLC. From the beginning it was recognized that this approach had a negative side effect: when two records were merged in WorldCat, the corresponding record, once overlaid in the Alma NZ, would have both the old and new OCLC record numbers in separate 035$a subfields. At the time, it was felt that the benefit of retaining the vendor record numbers after records are overlaid was worth this side effect.
The second known cause for the presence of these records is when an Alliance library loads vendor records (such as YBP records) and uses a match method (in the import profile) which allows matching on a data point other than OCLC number (such as the ISBN). When using a match point other than the OCLC number, an existing NZ bib record (for example OCLC #123) can potentially be overlaid with a different OCLC master record (for example OCLC #345), creating an NZ bib record with two OCLC numbers – OCLC #123 (the correct OCLC number) and OCLC #345 (an inappropriate OCLC number).
Why talk about this problem now?
We have been working with Ex Libris for quite a while to develop a merge rule that retains the vendor record numbers without retaining out-of-date OCLC 035 fields. After quite a bit of investigation, Ex Libris decided the best approach would be to add the capability for a merge rule to retain 035 (or any) fields except when the fields contain a specific string (such as ‘OCoLC’). This new merge rule functionality is being deployed as part of the Alma March 2016 Release.
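The difference between the existing and the new merge behavior can be illustrated with a short Python sketch. This is not Alma merge rule syntax. The function names, the representation of 035 fields as plain strings, and the ‘(VENDOR)’ prefix are all assumptions made only for illustration.

```python
def overlay_keep_all_035s(existing_035s, incoming_035s):
    """Existing behavior: keep every 035 from the existing record,
    then add any incoming 035 that is not already present."""
    merged = list(existing_035s)
    for field in incoming_035s:
        if field not in merged:
            merged.append(field)
    return merged


def overlay_drop_matching_035s(existing_035s, incoming_035s, marker="OCoLC"):
    """New behavior: keep existing 035s unless they contain the marker
    string, then add the incoming 035s."""
    kept = [field for field in existing_035s if marker not in field]
    return overlay_keep_all_035s(kept, incoming_035s)


# Hypothetical record: OCLC #123 was merged into OCLC #345 in WorldCat,
# and the existing NZ record also carries a vendor record number.
existing = ["(OCoLC)123", "(VENDOR)99887766"]
incoming = ["(OCoLC)345"]

old_result = overlay_keep_all_035s(existing, incoming)
new_result = overlay_drop_matching_035s(existing, incoming)
```

In this sketch, `old_result` ends up with both OCLC numbers plus the vendor number, while `new_result` keeps the vendor number and only the incoming OCLC number.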
What will happen after the Alma March 2016 Release?
After the March Release is deployed on the production environment, I will create a test merge rule in the Alliance NZ account and make sure the new merge rule functionality works as expected. If it does, I will modify the ‘OCA Bib Overlay (Keep 035s)’ merge rule in the Alliance NZ account to use the new functionality. From that moment on, any import profiles (and the Connexion download configuration) which use this Alliance NZ merge rule will stop retaining OCLC 035 fields from existing bib records when an overlay occurs.
What happens to all the NZ bib records that already contain multiple, different OCLC numbers?
As soon as possible, we will begin a multi-step cleanup project to fix the records that have multiple, different OCLC numbers. The first challenge is to identify the records with different OCLC numbers in 035$a subfields. We should be able to run a report in Alma Analytics that lists each bib record MMS ID along with the 035 fields. While creating this type of report works in the institutional IZs, it does not work in the NZ. Ex Libris staff are currently trying to figure out why Analytics in the NZ account cannot create this report. If Ex Libris is unable to get this report to run correctly, we have asked them to either extract the data for us, or develop a method for extracting the data using an Alma API (but that approach is probably too slow to handle a database as large as ours).
After we obtain the list of MMS IDs and 035 fields, Kyle Banerjee (OHSU) will create a script that will find the records with different OCLC numbers in 035$a subfields. Using that list of MMS IDs and OCLC numbers, I will download the current version of all affected bib records from WorldCat using Connexion. Once I have that set of bib records, I will use a combination of MarcEdit and Excel to add a 9XX marker to each NZ bib record with multiple OCLC numbers, indicating whether the OCLC numbers in that record are contained in the same bib record in WorldCat or in different WorldCat records.
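As a rough idea of what the identification step could look like (this is only a sketch, not the script that will actually be written), the following Python assumes the extracted data arrives as pairs of MMS ID and 035$a value. It also assumes that OCLC numbers appear as ‘(OCoLC)’ followed by an optional ‘ocm’, ‘ocn’, or ‘on’ prefix and optional leading zeros. The function names and sample MMS IDs are hypothetical.

```python
import re
from collections import defaultdict

# Assumed 035$a shape: "(OCoLC)" + optional ocm/ocn/on prefix + digits.
OCLC_PATTERN = re.compile(r"^\(OCoLC\)\s*(?:ocm|ocn|on)?0*(\d+)\s*$", re.IGNORECASE)


def normalize_oclc(value):
    """Return the bare OCLC number as a string, or None for non-OCLC 035s."""
    match = OCLC_PATTERN.match(value.strip())
    return match.group(1) if match else None


def records_with_multiple_oclc_numbers(rows):
    """Map each MMS ID to its distinct OCLC numbers, keeping only
    records that carry more than one distinct number."""
    numbers_by_mms = defaultdict(set)
    for mms_id, value in rows:
        number = normalize_oclc(value)
        if number is not None:
            numbers_by_mms[mms_id].add(number)
    return {
        mms_id: sorted(numbers, key=int)
        for mms_id, numbers in numbers_by_mms.items()
        if len(numbers) > 1
    }


# Hypothetical sample data: (MMS ID, 035$a value)
rows = [
    ("991", "(OCoLC)123"),
    ("991", "(OCoLC)ocm00000345"),
    ("991", "(VENDOR)555"),
    ("992", "(OCoLC)777"),
    ("992", "(OCoLC)ocn000000777"),
]
flagged = records_with_multiple_oclc_numbers(rows)
```

Normalizing before comparing matters here: the second sample record carries the same OCLC number written two ways, so it should not be flagged.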
For the set of records in which all the 035$a OCLC numbers are in the same WorldCat bib record (i.e., either in the OCLC 001 field or 019 field), I will overlay these records with their current version in WorldCat. Because this import will be using the new merge method, the overlaid records will contain only one 035$a subfield and that subfield will contain the current OCLC number. One problem I expect to encounter is that some percentage of these records will fail to load due to the multi-match problem, i.e., there will be multiple records in our NZ with the same OCLC number. These will have to be downloaded manually, after the other matching records have been modified so that the download can succeed.
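The test that separates the two groups of records can be sketched in Python as follows. The actual work will be done with MarcEdit and Excel, so this is only an illustration of the logic. It assumes each downloaded WorldCat record is reduced to its current OCLC number (001) and the set of former numbers in its 019 field. The function name and the "same"/"different" labels are hypothetical.

```python
def classify_record(nz_oclc_numbers, worldcat_records):
    """Decide whether all OCLC numbers on an NZ record belong to one
    WorldCat record.

    worldcat_records maps a current OCLC number (001) to the set of
    former OCLC numbers recorded in that record's 019 field.
    Returns ("same", current_number) or ("different", None).
    """
    wanted = set(nz_oclc_numbers)
    for current, former in worldcat_records.items():
        if wanted <= {current} | set(former):
            return "same", current
    return "different", None


# Hypothetical data: OCLC #123 was merged into #345; #900 is unrelated.
worldcat = {"345": {"123"}, "900": set()}

merged_case = classify_record(["123", "345"], worldcat)
mismatch_case = classify_record(["345", "900"], worldcat)
```

Records classified as "same" are the candidates for the automated overlay, while "different" records would go to manual review.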
The records that have OCLC numbers from multiple WorldCat bib records will have to be fixed manually. A group of volunteers will need to examine each of these records and decide which WorldCat record is the best match for the attached inventory (for one sample library). They will then notify all other institutions with inventory attached to the record, so that those institutions can also examine the record and decide whether the selected bib record is appropriate for their inventory or whether their inventory needs to be moved to a different bib record. If you have read this far, you may be a good candidate for this group of volunteers. Let me know (email firstname.lastname@example.org) if you would be available to help with this cleanup project.
Will these changes and cleanup projects fix all the problems?
No. We already know of two problems that will require additional exploration and development of separate solutions.
First, when OCLC merges records (whether through the OCLC Quality Control team or the automated DDR software), a small number of the merges are later found to be inappropriate (e.g., different editions or formats) and OCLC “unmerges” the records. I have communicated with OCLC, and as a percentage the number of records “unmerged” is fairly small. For example, ‘in January of this year DDR merged 75,013 duplicate records while [OCLC] recovered 61 merges (.08%).’ However, ‘there is no way to tell from looking at a bib record that is was involved in a record recovery’ and OCLC does not maintain a publicly accessible list of records which have been “unmerged.” Some Alliance libraries have defined logical sets which look for a format mismatch between a bib record and the attached inventory. This may identify some of the “unmerged” records, but we need to continue looking for a better method of identifying them.
The second remaining issue that we know of so far arises when libraries need to use a match method other than ‘Unique OCLC Identifier Match Method.’ These imports will continue to run the risk of overlaying one WorldCat bib record with a different WorldCat record. The TSWG YBP EOCR Load Group is currently looking at other approaches that could be used to load records where a non-OCLC number match point is needed. Stay tuned for their report and recommendations. Another possible approach might be the creation of a new merge method which uses the existing logic, i.e., it would retain ALL 035 fields, so that we could look for records with multiple 035$a OCLC numbers to find instances where one bib record was overlaid by a different record. The viability of this approach depends on our ability to easily extract a new list of MMS IDs and 035$a subfields on a regular, ongoing basis, and to identify within that list which records have multiple OCLC numbers.
So what can we expect to see?
Here is the summary of what I expect will happen:
- Early March 2016 – The ‘OCA Bib Overlay (Keep 035s)’ merge rule will be modified so that out-of-date OCLC numbers are no longer retained when records are imported into Alma;
- Date unknown – Ex Libris either provides the list of MMS IDs and 035$a subfields we need, or offers an approach that allows us to extract that data. Only after this happens can we begin the cleanup process.
|Current phase: NA|Written by: Bob Thomas|
|Approved by: NA|Last updated: 2/18/2016|
|Staff Contact: Cassie Schmitt|Nature of last update: NA|