March 17, 2016

Import Records into Primo from OAI Repositories


This is a step-by-step guide to importing records into Primo from external repositories that offer OAI feeds. It gives specific instructions for Digital Commons, DSpace, and CONTENTdm, and covers how to import the records, how to get them to display in Primo, and the basics of using normalization rules to customize the imported data.

Created by: Nathan Mealey, Portland State University

System components: PBO (Primo Back Office)

Skill set requirements: PBO, XML (if you want to use norm rules and customize incoming data)

Accessibility: n/a

Browser support: n/a

Mobile support: n/a

Implementation steps: 

Pipes are made up of three parts:
  1. Data source: the specific source for the records (e.g. your institution's Digital Commons instance)
  2. Scope: the search scope that these records will be included in (not to be confused with the scopes that are set up in your views, which are actually collections of individual scopes)
  3. Pipe: the specific collection of records that will be imported, for instance a specific set of records in your Digital Commons instance, or all records from it
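
As a rough illustration of how the three parts relate, here is a minimal Python sketch. It is not anything Primo runs: the class names, field names, and the `check_setup` helper are hypothetical. It only encodes two rules stated in the steps that follow, namely that the scope value code must match the data source code exactly (including case) and that a pipe name may contain only letters, numbers, and underscores.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataSource:
    name: str  # "Source name" in the PBO: free text
    code: str  # "Source code": the scope value code must match this


@dataclass
class ScopeValue:
    code: str  # "Scope value code"
    name: str  # "Scope value name": free text


@dataclass
class Pipe:
    name: str                      # "Pipe Name"
    data_source: DataSource        # the data source this pipe harvests for
    oai_set: Optional[str] = None  # None means harvest every record


def check_setup(source: DataSource, scope: ScopeValue, pipe: Pipe) -> List[str]:
    """Return a list of problems; an empty list means the parts line up."""
    problems = []
    if scope.code != source.code:  # exact, case-sensitive comparison
        problems.append("scope value code does not match data source code")
    if not re.fullmatch(r"[A-Za-z0-9_]+", pipe.name):
        problems.append("pipe name may only use letters, numbers and underscores")
    if pipe.data_source.code != source.code:
        problems.append("pipe points at a different data source")
    return problems


# Hypothetical values, for illustration only.
source = DataSource(name="My repository", code="MYREPO")
print(check_setup(source, ScopeValue("MYREPO", "My repository"),
                  Pipe("myrepo_theses", source, "hdl_2246_6145")))
print(check_setup(source, ScopeValue("myrepo", "My repository"),
                  Pipe("myrepo theses", source)))
```

The second call reports two problems: the scope code differs in case, and the pipe name contains a space.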
Steps
  1. Find the OAI URL for the repository and collection that you are importing.
    1. Example repository URL: http://scholarsbank.uoregon.edu/oai/request?verb=Identify
    2. Example collection URL: http://scholarsbank.uoregon.edu/oai/request?verb=ListRecords&metadataPrefix=oai_dc&set=com_1794_16
  2. Create a new data source in the PBO for the repository
    1. Navigate to Primo Home > Ongoing Configuration Wizards > Pipe Configuration Wizard > Data Sources Configuration
    2. Under “Add a New Data Source”, select the following:
      1. Source System: other
      2. File splitter: OAI splitter
      3. Source name: whatever you want 
      4. Source code: whatever you want - note that the scope value code you create in step 3 must match this value exactly
      5. Source format: DC
      6. Input record path: oai_dc:dc
    3. Click Add
  3. Create a new scope for the data source to use
    1. Navigate to Primo Home > Ongoing Configuration Wizards > Pipe Configuration Wizard > Scope Values Configuration
    2. Under “Create a new Scope Value”, enter the following:
      1. Scope value code: enter the exact same value that you included for the Data Source-Source Code field above, being sure to match the case that you used.
      2. Scope value name: whatever you want
      3. Type: collection
      4. Use scope for: search
      5. Click Create
      6. Click Deploy
    3. Add the scope to an appropriate Primo view
  4. Create a new normalization rule
    1. Navigate to Primo Home > Ongoing Configuration Wizards > Pipe Configuration Wizard > Normalization rules configuration
    2. At the bottom of the page, under “Create a new normalization rules set”:
      1. From the drop-down, select the “Generic Dublin Core - template” option
      2. In the Name field, enter the norm rule name (no spaces allowed)
      3. Click Create
    3. Under the Section drop-down, select the “Search” option
      1. At the bottom of the page, click edit next to the “search:searchscope” fields
      2. Confirm that rule 1 includes the following values:
        1. Type: PNX
        2. XPath: control/sourceid
        3. Transformation: copy as is
        4. This ensures that the imported records will include a scope value that matches the data source code
        5. Click Save
    4. Under the Section drop-down, select the “Display” option
      1. Edit the “display:creationdate” field
      2. Change section 1
        1. Type: XML
        2. Path: dc:date (note: you should check the XML of the incoming records to be sure the date info is held in this field. If not, be sure to change this to the correct incoming field value.)
        3. Transformation: Take substring
        4. Parameter: 0@@4
        5. Click Save
      3. If you want to change the material type from “book” (the default) to “article” (or something else), edit the “display:type” field
        1. Type: constant
        2. Value: article
        3. Click Save
    5. Under the Section drop-down, select the “Links” option
      1. Edit the “links:linktosrc” field
        1. In the Conditions section, confirm that the Parameter value matches the start of the URL that is in the dc:identifier field in your records
          By default, the Parameter value will be “http”, so if your records’ URLs begin with “https”, change the value accordingly
        2. If you made a change in step 1, click Save
    6. Navigate to the list of normalization rules
      1. Click Deploy for the norm rule you just created
  5. Create a new pipe
    1. Navigate to Primo Home > Ongoing Configuration Wizards > Pipe Configuration Wizard > Pipe Configuration > Define Pipe
    2. Enter the following info for the new pipe:
      1. Pipe Name: whatever you want - note that the name cannot include any spaces (use letters, numbers and underscores only), and that if you’re importing multiple collections from the same repository, you should probably include the collection name in the pipe name
      2. Pipe type
        1. Regular = incoming new records will be added, existing matching records (in Primo) will be updated, existing matching records marked as "delete" in the data source will be deleted from Primo  [RECOMMENDED]
        2. Delete and reload = all existing records (in Primo) imported using the named data source will be deleted and all incoming records will be added
        3. Delete data source = all existing records (in Primo) imported using the named data source will be deleted 
        4. No harvesting - update data source = all existing records (in Primo) will be updated using previously harvested data; new records will not be pulled from the repository 
      3. Data source: the data source you created in step 2
      4. Harvesting method: OAI
      5. Server: the OAI feed base URL for your repository server
        1. Example for Digital Commons: http://pdxscholar.library.pdx.edu/do/oai/
        2. Example for CONTENTdm: https://cdm.reed.edu/cgi-bin/oai.exe
        3. Another example for CONTENTdm: http://content.libraries.wsu.edu/oai/oai.php
        4. Example for DSpace: http://digitallibrary.amnh.org/oai/request
      6. Metadata format: oai_dc
      7. Set: name of the specific collection that you are importing
        1. This field is optional - use it only if you are not importing all records from your repository
        2. When setting this value you only need to include the value that follows “set=” in the OAI URL
        3. Example for Digital Commons: publication:lse_comp
        4. Example for CONTENTdm: engberg
        5. Example for DSpace: hdl_2246_6145
      8. Start harvesting records from: whatever date you want - this is based on the creation date of the records in the repository, so only records created on or after the date that you specify will be imported.
        1. After you run a pipe for the first time, this value will automatically set itself to the date/time that you ran it.
      9. Normalization mapping set: the norm rule you created in step 4 above
      10. Priority: medium is a good value for this
      11. Enrichment set: No Enrichments - Template
      12. Harvested file format: *.tar.gz
      13. Check the boxes for “Include DEDUP” and “Include FRBR”
      14. Maximum error threshold: 5% is a good value for this
      15. More info on these fields: http://alliance-primo-bo.hosted.exlibrisgroup.com:1601/primo_publishing/admin/help/wwhelp/wwhimpl/js/html/wwhelp.htm?context=try&topic=Pipe_Monitoring
      16. Click Save
  6. Execute the pipe
    1. Navigate to Primo Home > Monitor Primo Status > Pipe Monitoring
    2. Click Execute next to the pipe you want to run
  7. Schedule the pipe (PBO > Schedule > Tasks)
  8. Add repository search scope value to a search scope in Primo
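
The Server and Set values in step 5 can both be read off the collection URL from step 1. The following Python sketch shows one way to split such a URL and rebuild it. The function names are hypothetical, and the code does not contact the repository. It relies only on the standard OAI-PMH arguments that already appear in the example URLs (`verb`, `metadataPrefix`, `set`).

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlencode, urlsplit


def split_oai_url(url: str) -> Tuple[str, Optional[str]]:
    """Return (base URL for the Server field, set value or None)."""
    parts = urlsplit(url)
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    args = parse_qs(parts.query)
    # The Set field only needs whatever follows "set=" in the URL.
    set_spec = args.get("set", [None])[0]
    return base, set_spec


def list_records_url(base_url: str, set_spec: Optional[str] = None,
                     metadata_prefix: str = "oai_dc") -> str:
    """Build a ListRecords URL to preview what the pipe would harvest."""
    query = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:  # leave the set out to request the whole repository
        query["set"] = set_spec
    return base_url + "?" + urlencode(query)


collection_url = ("http://scholarsbank.uoregon.edu/oai/request"
                  "?verb=ListRecords&metadataPrefix=oai_dc&set=com_1794_16")
base, set_spec = split_oai_url(collection_url)
print(base)       # value for the Server field
print(set_spec)   # value for the Set field
print(list_records_url(base, set_spec))
```

Opening the rebuilt URL in a browser before running the pipe is a quick way to confirm that the repository returns records for that set.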

Viewing PNX Records Imported Via Your Pipe
  1. Click the PNX Viewer link from the PBO main menu
  2. In the “record ID” field, enter the scope value that you used when creating your scope above, followed by an asterisk, and then click “Go”. For example, if your scope value was 'MYDATASOURCE_SCOPE1', search for 'MYDATASOURCE_SCOPE1*'.
  3. You should see a list of all of the records imported using your data source.
    Note: you can also search by the data source code, to see all of the records piped in using that data source (as opposed to a specific scope). To do this, search for the data source code, followed by an asterisk. Using the example above, you would search for 'MYDATASOURCE*'.
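
When checking records in the PNX Viewer, it helps to know what the rules from step 4 should have produced. The Python sketch below approximates the display:creationdate rule outside of Primo. It assumes that the “Take substring” parameter 0@@4 keeps the first four characters of dc:date (which is how the rule yields a year), and the sample record and function name are made up for illustration. It also lists the dc:identifier values so you can compare how they start against the Parameter value in the links:linktosrc condition.

```python
import xml.etree.ElementTree as ET

# Standard namespaces used by oai_dc records.
NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical record; real records come from your repository's OAI feed.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Hypothetical thesis title</dc:title>
  <dc:date>2015-06-01T00:00:00Z</dc:date>
  <dc:identifier>https://example.org/handle/123/456</dc:identifier>
</oai_dc:dc>"""


def expected_fields(record_xml: str) -> dict:
    """Approximate what the step 4 rules would read from one record."""
    root = ET.fromstring(record_xml)
    date = root.findtext("dc:date", default="", namespaces=NS).strip()
    identifiers = [(el.text or "").strip()
                   for el in root.findall("dc:identifier", NS)]
    return {
        # Assumed meaning of 0@@4: first four characters of the date.
        "creationdate": date[:4] or None,
        "identifiers": identifiers,
    }


print(expected_fields(SAMPLE))
```

If `creationdate` comes back as `None` for your own records, the date is probably held in a different incoming field, and the Path in the rule needs to change to match.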

To Reimport Records After Changing Your Norm Rule
If you want to reimport your records after making changes to your norm rule, you can just rerun the pipe. First, though, you’ll need to change the date in the “Start harvesting files/records from” field, since it resets to the current date/time each time you run the pipe. Set it to a date in the past that is earlier than the creation dates of the records you want to reimport.
*This wouldn’t apply if you set your pipe up as a Delete and Reload pipe.
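
To see why the date matters, here is a small Python sketch of the selection rule described in the guide, where only records created on or after the start date are picked up. The record IDs, the dates, and the function name are hypothetical, and the sketch models the described behavior rather than Primo's actual harvesting code.

```python
from datetime import date
from typing import Dict, List


def would_be_harvested(created: Dict[str, date], start_from: date) -> List[str]:
    """Return the IDs of records created on or after the start date."""
    return [rec_id for rec_id, day in created.items() if day >= start_from]


# Hypothetical records and their creation dates in the repository.
records = {
    "rec1": date(2016, 1, 10),
    "rec2": date(2016, 2, 20),
    "rec3": date(2016, 3, 15),
}

# Start date left at the last run date: nothing is selected.
print(would_be_harvested(records, date(2016, 3, 17)))

# Start date moved back before the records were created: all are selected.
print(would_be_harvested(records, date(2016, 1, 1)))
```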


Importing via pipes vs. registering with the PCI

In 2013, Ex Libris rolled out an initiative to enable libraries to register their repositories in the PCI (Primo Central Index). When you do this, you can specify which collections are harvested, how frequently, etc. Once you’ve registered, your repository will show up as one of the open access collections that can be individually activated in the PCI.

There are a few significant differences between going this route and importing records via pipes that should be considered:
  • Discoverability
    • Records imported via a pipe have their own scope, and so can be added to any other aggregated search scope that you want. For example, you may add them to your library-specific search scope, so that they are discoverable as part of your library’s holdings.
    • Records activated via the PCI are part of the PCI scope only, so they are not discoverable within your library-specific search scope, only within scopes that include the PCI.
  • How records are normalized
    • When importing records via a pipe you have the option of applying your own normalization rules to the incoming records, enabling you to customize their display in Primo. In contrast, when records are added to the PCI via the registration process, they will be normalized according to whatever rules are in place for such content. No local customization will be possible.
  • Who can see/find your records
    • Registering your repository with the PCI enables other libraries using Primo to activate your repository as a searchable collection within Primo. 




