NUSL Technical Solution
The NUSL software solution consists of two basic components: Invenio, which runs the NUSL digital repository, and the Elasticsearch indexing and search system, which powers the NUSL central search interface. The same solution architecture has been in successful operation at CERN in Switzerland for several years. The individual activities of the Invenio digital repository and the Elasticsearch indexing and search system, and the cooperation between them, are depicted in the figure.
NUSL central search interface in the Elasticsearch system
The NUSL central search interface is designed as an integrated search platform for grey literature repositories. This integrating function was originally provided by the FAST ESP indexing and search system, which was replaced with Elasticsearch in 2016. Elasticsearch enables secure, relevant and scalable searching across the linked repositories. The solution is intended to let users access data from both the digital repository and the selected grey literature repositories in a single interactive environment. Searching relies primarily on navigation facets such as document type, author, keyword, linked database and timeline.
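Faceted navigation of this kind is typically expressed in Elasticsearch as a full-text query combined with aggregations, one per facet. The following is a minimal sketch, assuming hypothetical keyword-type index fields (`doc_type`, `author`, `keyword`, `source`, `date`, `title`, `abstract`, `fulltext`); the actual NUSL index mapping is not documented here. It only builds the Query DSL request body and does not call a cluster. The `calendar_interval` parameter assumes Elasticsearch 7.2 or later.

```python
import json

# Hypothetical facet name -> index field mapping (not the real NUSL mapping).
# Terms aggregations require keyword-type (not analysed text) fields.
FACET_FIELDS = {
    "document_type": "doc_type",
    "author": "author",
    "keyword": "keyword",
    "source_base": "source",
}


def build_faceted_query(text, filters=None):
    """Build an Elasticsearch Query DSL body: full-text match plus facet aggregations.

    `filters` maps facet names (keys of FACET_FIELDS) to a selected value,
    narrowing the result set the way clicking a navigation facet would.
    """
    filters = filters or {}
    # Each selected facet value becomes a non-scoring term filter.
    filter_clauses = [
        {"term": {FACET_FIELDS[name]: value}} for name, value in filters.items()
    ]
    # One terms aggregation per facet returns the top values and their counts.
    aggs = {
        name: {"terms": {"field": field, "size": 10}}
        for name, field in FACET_FIELDS.items()
    }
    # Timeline facet: number of matching records per year.
    aggs["timeline"] = {"date_histogram": {"field": "date", "calendar_interval": "year"}}
    return {
        "query": {
            "bool": {
                "must": [
                    {"multi_match": {"query": text, "fields": ["title", "abstract", "fulltext"]}}
                ],
                "filter": filter_clauses,
            }
        },
        "aggs": aggs,
    }


if __name__ == "__main__":
    body = build_faceted_query("grey literature", {"document_type": "report"})
    print(json.dumps(body, indent=2))
```

The resulting body would be sent to the `_search` endpoint of whichever index holds the records; the facet counts come back under `aggregations` in the response and can be rendered as the navigation menus.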
NUSL digital repository in the Invenio system
Invenio is open-source software: it may be freely installed, used and modified, which made it possible to configure it for storing grey literature and distributing it among partner organisations. In 2010, the system was debugged through continuous operational testing, data entry and harvesting of data from partner repositories. All parts of the Invenio system were modified, from the format structure and templates to the collection setup and search configuration. At the same time, the digital repository was graphically redesigned, fully localized into Czech, and record searching was adjusted.
The software solution for the NUSL project was selected through a public tender held in 2009. The software functionality requirements were defined to cover everything needed for the pilot system implementation and to help choose a modern, well-supported technology with good development prospects. The software functionality requirements may be found in the
The preparation for selecting the software solution included an analysis of open-source software for digital libraries. The following systems were analysed: DSpace, Fedora, CDS Invenio, EPrints and Greenstone. The results of the analysis may be found in the repository.
The format for storing metadata is an essential part of building a repository. A custom metadata format was defined for the needs of the Czech National Repository of Grey Literature (NUSL), designed specifically for processing records of digital grey literature documents. The basic requirements for the NUSL format are maximum simplicity and compatibility with the Dublin Core standard. The NUSL metadata format uses elements of Dublin Core, Dublin Core Terms, EVSKP-MS and ETD-MS, plus some NUSL-specific elements.
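To illustrate how a single record can mix elements from several vocabularies, here is a minimal sketch that serializes a record using the standard Dublin Core (`http://purl.org/dc/elements/1.1/`) and Dublin Core Terms (`http://purl.org/dc/terms/`) namespaces. The `nusl` namespace URI, the `record` root element and the `sourceRepository` element are hypothetical placeholders; the real NUSL 1.0 schema, including its EVSKP-MS and ETD-MS elements, is not reproduced here.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
NUSL = "http://example.org/nusl-local/"  # hypothetical namespace for NUSL-specific elements

# Register prefixes so the output uses dc:, dcterms:, nusl: instead of ns0:, ns1:, ...
ET.register_namespace("dc", DC)
ET.register_namespace("dcterms", DCTERMS)
ET.register_namespace("nusl", NUSL)


def build_record(fields):
    """Serialize (namespace, element, value) triples into one XML record string."""
    root = ET.Element("record")  # hypothetical root element name
    for ns, name, value in fields:
        child = ET.SubElement(root, f"{{{ns}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_record([
        (DC, "title", "Example technical report"),
        (DC, "creator", "Doe, Jane"),
        (DCTERMS, "issued", "2010"),
        (DC, "language", "cze"),
        (NUSL, "sourceRepository", "Example partner repository"),  # hypothetical element
    ]))
```

Keeping each element tied to its namespace preserves Dublin Core compatibility: a consumer that only understands `dc:` elements can ignore the rest.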
The first draft of the custom NUSL metadata format (version 0.1) was defined in 2008 and tested in 2009 on data from the National Technical Library (NTK) and the University of Economics, Prague. The results of the testing and expert review were incorporated into the beta version (0.2) of the NUSL metadata format. In 2010, the format was optimized based on practical experience with entering metadata and full texts into the repository, with harvesting metadata and full-text files from partner organisations, and with requirements for compliance with the OpenGrey system. This resulted in the verified version (1.0) of the NUSL metadata format, which may be found in the repository (in Czech only).
Implementing the NUSL metadata format in the selected Invenio software, which uses MARC 21 as its native format, required creating a conversion table between the two formats.
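A conversion table of this kind is essentially a crosswalk from source elements to MARC 21 tags and subfield codes. The sketch below uses an illustrative subset loosely following the Library of Congress Dublin Core-to-MARC crosswalk; it is not the actual NUSL conversion table, and it omits indicators and the relator subfield ($e) that the crosswalk adds for creators. Unmapped elements are reported rather than silently dropped.

```python
# Illustrative subset only, loosely based on the Library of Congress
# Dublin Core-to-MARC crosswalk. NOT the real NUSL conversion table.
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
    "language": ("546", "a"),
}


def convert(dc_record):
    """Convert {dc_element: [values]} into ({marc_tag: [(subfield, value), ...]}, unmapped).

    Subfields that share a tag (e.g. publisher 260 $b and date 260 $c) are
    collected into the same field, in input order.
    """
    marc = {}
    unmapped = []
    for element, values in dc_record.items():
        target = DC_TO_MARC.get(element)
        if target is None:
            # Gaps in the conversion table are surfaced to the caller.
            unmapped.append(element)
            continue
        tag, code = target
        for value in values:
            marc.setdefault(tag, []).append((code, value))
    return marc, unmapped


if __name__ == "__main__":
    record = {"title": ["Example report"], "publisher": ["NTK"], "date": ["2010"], "rights": ["CC BY"]}
    print(convert(record))
```

Returning the unmapped elements makes it easy to check a sample of harvested records against the table while the table is still being refined.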
The primary purposes of digital archives are to store digital information and make it accessible. Persistent identifiers ensure permanent access to digital documents. In this context, persistence means that the identifier remains valid regardless of whether the identified document stays in the same location. It is therefore important that a resource marked with a persistent identifier is never relocated or deleted unless its location is updated in the persistent identifier registry. The solution adopted for the use of persistent identifiers is described here.
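The registry principle can be shown with a small in-memory model: the identifier is fixed once assigned, and only the location it resolves to may change. This is a minimal sketch under that assumption; the class name and methods are hypothetical and do not correspond to any real resolver service (URN:NBN, Handle or otherwise).

```python
class ResolverRegistry:
    """Hypothetical in-memory persistent-identifier registry: identifier -> current location."""

    def __init__(self):
        self._locations = {}

    def register(self, identifier, location):
        # An identifier is assigned once and never reused for another resource.
        if identifier in self._locations:
            raise ValueError(f"identifier already registered: {identifier}")
        self._locations[identifier] = location

    def relocate(self, identifier, new_location):
        # Moving a resource must be recorded here; the identifier itself never changes.
        if identifier not in self._locations:
            raise KeyError(identifier)
        self._locations[identifier] = new_location

    def resolve(self, identifier):
        # Raises KeyError for unknown identifiers.
        return self._locations[identifier]


if __name__ == "__main__":
    reg = ResolverRegistry()
    reg.register("urn:example:0001", "https://old.example.org/doc/1")
    reg.relocate("urn:example:0001", "https://new.example.org/doc/1")
    print(reg.resolve("urn:example:0001"))
```

If a document were moved without calling `relocate`, the identifier would still resolve, but to a dead location; that is exactly the failure the registry update requirement above guards against.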
The intention was to use a persistent identifier such as URN:NBN or Handle. Unfortunately, there is currently no working URN:NBN resolver for grey literature in the Czech Republic. As a workaround, the Invenio system generates a URI identifier in the format
www.nusl.cz/ntk/nusl-ID, where ID is the number assigned to the record by Invenio.
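Building and parsing identifiers of this form is straightforward. The following minimal sketch assumes that the Invenio record ID is a positive integer and that an optional `http://` or `https://` scheme may precede the URI; the helper names are hypothetical, not part of Invenio.

```python
import re

URI_PREFIX = "www.nusl.cz/ntk/nusl-"
# Assumption: an optional http(s) scheme, followed by the fixed path and a numeric record ID.
_URI_PATTERN = re.compile(r"^(?:https?://)?www\.nusl\.cz/ntk/nusl-(\d+)$")


def make_nusl_uri(record_id: int) -> str:
    """Build the record URI from an Invenio record ID (assumed positive integer)."""
    if record_id <= 0:
        raise ValueError(f"record ID must be positive: {record_id}")
    return f"{URI_PREFIX}{record_id}"


def parse_nusl_uri(uri: str) -> int:
    """Extract the record ID from a URI of the form www.nusl.cz/ntk/nusl-ID."""
    match = _URI_PATTERN.match(uri)
    if match is None:
        raise ValueError(f"not a NUSL record URI: {uri}")
    return int(match.group(1))


if __name__ == "__main__":
    uri = make_nusl_uri(123)
    print(uri, parse_nusl_uri("http://" + uri))
```

Because the ID is embedded in the path, the URI stays stable only as long as the domain and path prefix do; unlike a URN:NBN or Handle, there is no separate resolver that can redirect it.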
The criteria for selecting the persistent identifier for NUSL are described in the repository. Resources used in this work are cited in a separate document attached to the same record.