PIDs and data citation
Data citation is the practice of providing a reference to data in the same way as researchers routinely provide a bibliographic reference to other scholarly resources. | ANDS, 2017
The correct citation of research data - data citation - is seen as one of the most important ways in which research data can be counted as 'first-class research output'. In this section, we will show what other advantages data citation offers, what role persistent identifiers (PIDs) play and what a data citation looks like.
Working on a culture of data citation
The publication of datasets increasingly counts as a citable contribution to the scholarly record. DataCite (n.d.a.) is an important player in building the technical infrastructure to enable data citation. In addition, the research community itself has published two manifestos to point the way: one with a number of data citation principles (FORCE 11, 2014) and one with software citation principles (Smith, 2016). These initiatives form the basis for building a culture of data citation (ANDS, n.d.).
Citing research data is part of the Altmetrics (2010) - alternative metrics - movement that states that the impact of your research is determined by (the references to) a wide range of research output such as datasets, software, blog posts, presentations, etc.
Data citation:
- Makes data easier to find;
- Promotes reproducibility;
- Promotes reuse of data;
- Makes it possible to track the impact of the research data;
- Creates a publication structure that enables long-term availability of data;
- Provides a structure in which the impact of the data can be traced back to the researchers who created the data.
Persistent identifiers and data citation
To be citable, a dataset needs a persistent identifier (PID): a unique label that is linked to a digital object. This means that the object can always be found, even if its name or location changes. A PID thus prevents broken links and 'page not found' errors.
When publishing data in a data archive, a PID is automatically assigned to the data. A PID is a precondition for the F (Findable) of FAIR data: without a PID, a dataset cannot be found in a sustainable way. A PID is therefore necessary, but not sufficient, for FAIRness. If a dataset is assigned only a PID and no machine-readable metadata, it will still be difficult to find unless the PID is already known. It is via the metadata that a dataset is found and via the PID that the dataset is then located.
In the video below we explain the role of a PID - in this case the DOI (n.d.) - in data citation.
[Video: RDNL video on data citation]
Connecting PIDs
Persistent identifiers describe a kind of endpoint. To be really useful, these endpoints must be connected to each other (Haak et al., 2018). To create a so-called 'research graph', in which the relationships between data, researchers, publications, research funders, organisational resources, etc. can be seen at a glance, more PIDs are needed than those for the research data alone. A well-known PID for an individual researcher is the ORCID iD (n.d.).
PIDs act as both unique identifiers and, critically, as connectors. By unambiguously identifying and connecting an individual researcher with their research organisations, professional activities and other contributions, we can be confident that we understand – and can assert – the relationships between each of them. And, by doing so using resolvable PIDs that incorporate FAIR metadata, we also make researchers, their affiliations and their contributions more easily discoverable. | Meadows, 2019
In the spotlight
Different persistent identifier systems exist (DPC, 2017), for example URN, Handle, PURL, ARK and DOI. Depending on the purpose, an object can be assigned one of these persistent identifiers. The PID guide (Netwerk Digitaal Erfgoed, n.d.) takes you through about 25 questions, after which it suggests the PID system that seems best for your organisation and goals.
DOIs are increasingly accepted as the persistent identifier of choice when it comes to data citation. This is noticeable, among other things, because systems that offer other persistent identifiers have also started to offer DOIs. Dataverse Network at first offered only Handle and then switched to DOIs. In addition to URNs, DANS now also offers DOIs. We will therefore zoom in on the DOI below.
DANS, 4TU.ResearchData and SURF handle persistent identifiers differently. 4TU.ResearchData uses DOIs, DANS uses DOIs and URN:NBNs, and SURF uses the Handle system. DataCite DOIs are suitable and intended for citing data. The URN:NBN is primarily aimed at identification and is used less as a citation tool. Handle is an 'all purpose' PID system and is especially useful for assigning PIDs to large quantities of objects (Netwerk Digitaal Erfgoed, n.d.).
PIDs at 4TU.Centre for Research Data
4TU.ResearchData registers DOIs via DataCite Netherlands. Within 4TU.ResearchData, all datasets that are provided with the required metadata have a DOI. They all have a UUID (Universally Unique IDentifier). A UUID consists of 36 characters (32 hexadecimal digits and 4 hyphens) in the form 8-4-4-4-12. For example: uuid:32c53005-a4f2-447c-b231-6cdb7dcdd17f. The total number of possible UUIDs is so large that it is extremely unlikely that two identical UUIDs will ever be created.
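The 8-4-4-4-12 form described above can be checked mechanically. The following is a minimal sketch in Python using only the standard library. The helper names `strip_uuid_label` and `is_canonical_uuid` are hypothetical and chosen for illustration; they are not part of any 4TU.ResearchData tooling.

```python
import re
import uuid

# Canonical textual form: 8-4-4-4-12 hexadecimal digits, 36 characters in total.
UUID_PATTERN = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)


def strip_uuid_label(identifier: str) -> str:
    """Remove the 'uuid:' label used in the example above (hypothetical helper)."""
    if identifier.startswith("uuid:"):
        return identifier[len("uuid:"):]
    return identifier


def is_canonical_uuid(identifier: str) -> bool:
    """Return True if the identifier has the 8-4-4-4-12 hexadecimal form."""
    return UUID_PATTERN.match(strip_uuid_label(identifier)) is not None


example = "uuid:32c53005-a4f2-447c-b231-6cdb7dcdd17f"
bare = strip_uuid_label(example)

print(len(bare))                                # 36
print([len(part) for part in bare.split("-")])  # [8, 4, 4, 4, 12]
print(is_canonical_uuid(example))               # True

# Cross-check: the standard library parses the same string without error.
parsed = uuid.UUID(bare)
print(str(parsed) == bare)                      # True
```

The regular expression only checks the shape of the string; it says nothing about whether a dataset with that UUID actually exists.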
The DOIs of 4TU.ResearchData have the UUID as a suffix, and the landing page of a dataset combines the URL of the data centre with the same UUID, for example https://data.4tu.nl/repository/uuid:32c53005-a4f2-447c-b231-6cdb7dcdd17f. On the landing page of the dataset it says: 'please cite/link this dataset as doi:10.4121/uuid:32c53005-a4f2-447c-b231-6cdb7dcdd17f'. In the DOI prefix 10.4121, the code 4121 stands for 4TU.Centre for Research Data.
If you want to look up a DOI, put doi.org or dx.doi.org in front of it. Then you will always arrive at the right place. You can also enter the DOI in a DOI resolver. The resolver itself must, of course, also be maintained for the long term. This is done by the International DOI Foundation. There are no concerns about maintaining the resolver: "It's too big to fail".
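Turning a DOI into a resolvable link, as described above, amounts to simple string handling. The sketch below assumes Python and the standard library only. The function names `normalise_doi`, `split_doi` and `doi_to_url` are hypothetical, and the code only builds the URL; it does not contact the resolver.

```python
from urllib.parse import quote

RESOLVER = "https://doi.org/"

# Prefixes that are commonly written in front of a DOI name.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(raw: str) -> str:
    """Strip a leading 'doi:' label or resolver URL and return the bare DOI."""
    doi = raw.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI name: {raw!r}")
    return doi


def split_doi(raw: str) -> tuple[str, str]:
    """Split a DOI into its prefix (e.g. '10.4121') and its suffix."""
    prefix, suffix = normalise_doi(raw).split("/", 1)
    return prefix, suffix


def doi_to_url(raw: str) -> str:
    """Put the resolver address in front of the DOI."""
    return RESOLVER + quote(normalise_doi(raw), safe="/:")


cited_as = "doi:10.4121/uuid:32c53005-a4f2-447c-b231-6cdb7dcdd17f"
print(split_doi(cited_as))
print(doi_to_url(cited_as))
```

For the example above, the prefix is 10.4121 and the suffix is the UUID of the dataset. Percent-encoding via `quote` is a precaution for suffixes that contain characters that are not safe in a URL.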
PIDs at DANS
At DANS, all datasets have two persistent identifiers: a DOI and a URN:NBN. Both are automatically assigned when the data manager approves and publishes a deposited dataset. Researchers can use the DOI to refer permanently to the dataset. DANS has long used the URN:NBN persistent identifier for sustainable access to all the material in the archive. DANS manages the Dutch resolver for the URN:NBN (DANS, n.d.).
A DANS dataset gets two PIDs. This looks as follows:
Schöpfel, Dr. J. (University of Lille, GERiiCO laboratory) (2019): Data Papers as a New Form of Knowledge Organization in the Field of Research Data. DANS. https://doi.org/10.17026/dans-zk3-jkyb
DOI: 10.17026/dans-zk3-jkyb
URN: urn:nbn:nl:ui:13-iy-02u8
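The citation above follows the pattern creator (year): title. publisher. identifier. As a hedged illustration, such a citation could be assembled from metadata as sketched below. The `DatasetMetadata` class and the `format_citation` function are hypothetical names, and the field set is an assumption based only on the example shown, not on the DANS system itself.

```python
from dataclasses import dataclass


@dataclass
class DatasetMetadata:
    """Minimal, assumed set of fields needed for the citation pattern above."""
    creator: str
    year: int
    title: str
    publisher: str
    doi: str  # bare DOI, without 'doi:' or a resolver URL


def format_citation(meta: DatasetMetadata) -> str:
    """Build 'Creator (Year): Title. Publisher. https://doi.org/DOI'."""
    return (
        f"{meta.creator} ({meta.year}): {meta.title}. "
        f"{meta.publisher}. https://doi.org/{meta.doi}"
    )


example = DatasetMetadata(
    creator="Schöpfel, Dr. J. (University of Lille, GERiiCO laboratory)",
    year=2019,
    title=(
        "Data Papers as a New Form of Knowledge Organization "
        "in the Field of Research Data"
    ),
    publisher="DANS",
    doi="10.17026/dans-zk3-jkyb",
)

print(format_citation(example))
```

Writing the DOI as a full https://doi.org/ link keeps the citation directly resolvable for the reader.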
A URN:NBN is structured as follows:
- URN as the identifier scheme;
- NBN as namespace for so-called National Bibliographic Numbers;
- NL:UI to indicate that these are identifiers that have been assigned within the Netherlands;
- A unique code for the dataset within DANS.
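The four parts listed above can be read off an identifier by splitting it on colons. The following Python sketch assumes that the identifier always has the shape urn:nbn:nl:ui:code, as in the DANS example. The `UrnNbn` type and the `parse_urn_nbn` function are hypothetical names for illustration.

```python
from typing import NamedTuple


class UrnNbn(NamedTuple):
    scheme: str     # 'urn', the identifier scheme
    namespace: str  # 'nbn', National Bibliographic Numbers
    assigner: str   # 'nl:ui', identifiers assigned within the Netherlands
    code: str       # unique code for the dataset


def parse_urn_nbn(urn: str) -> UrnNbn:
    """Split a URN:NBN such as 'urn:nbn:nl:ui:13-iy-02u8' into its parts."""
    parts = urn.strip().split(":", 4)
    if len(parts) != 5 or parts[0].lower() != "urn" or parts[1].lower() != "nbn":
        raise ValueError(f"not a URN:NBN of the expected shape: {urn!r}")
    scheme, namespace, country, sub_namespace, code = parts
    return UrnNbn(
        scheme.lower(),
        namespace.lower(),
        f"{country}:{sub_namespace}".lower(),
        code,
    )


parsed = parse_urn_nbn("urn:nbn:nl:ui:13-iy-02u8")
print(parsed.assigner)  # nl:ui
print(parsed.code)      # 13-iy-02u8
```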
PIDs at SURF
SURF has two flavours of PIDs:
- For the SURF Data Archive (SURF, n.d.a.) SURF uses the Handle system (SURF, n.d.b.). The SURF Data Archive is suitable for storing larger quantities of data for a longer period of time.
- For the SURF Data Repository (SURF, n.d.c.), SURF offers DOIs in addition to Handle. To see what this looks like, take a look at the metadata of this dataset by Ishiyama (2011).
Researchers and research institutions can also register their data collections via SURF and make them accessible with the aid of PIDs (SURF, n.d.b.).
It makes sense to make not only research data available, but also software code, posters and other research output. Some examples:
- Software code: you can make software code citable by publishing the code from GitHub to Zenodo. GitHub has a DIY guide available (GitHub, 2016).
- Posters and other research output: posters and presentations are often shared on Figshare (n.d.) or Zenodo (n.d.a.). Within Zenodo you can also create a community (Zenodo, n.d.b.) where you curate the collection of output with a group of people.
Each upload gets its own PID in this way (both Figshare and Zenodo have the DOI as PID). For research output that does not automatically get a PID, this is an easy way to make that output findable, citable and more visible.

Altmetrics (2010). Altmetrics: a manifesto. http://altmetrics.org/manifesto/
ANDS (n.d.). Building a culture of data citation. https://www.ands.org.au/__data/assets/pdf_file/0003/383025/data_citation_poster.pdf
ANDS (2017). Data citation. ANDS Guide, awareness level. https://www.ands.org.au/__data/assets/pdf_file/0005/724334/Data-citation.pdf
DANS (n.d.). Resolve identifier. http://www.persistent-identifier.nl/
DataCite (n.d.a.). https://datacite.org/
DataCite (n.d.b.). DataCite MDS API. https://mds.datacite.org/
DataCite (n.d.c.). DataCite - Cite Your Data. http://www.datacite.org.s3-website-eu-west-1.amazonaws.com/cite-your-data.html
DataCite (2019, August 16). DataCite Metadata Schema. Metadata Schema 4.4. https://schema.datacite.org/
DPC (n.d.). Persistent identifiers. https://dpconline.org/handbook/technical-solutions-and-tools/persistent-identifiers
Delft University of Technology (n.d.). DataCite Netherlands. https://www.tudelft.nl/en/library/support/datacite-netherlands/
DOI (n.d.). https://www.doi.org/
Figshare (n.d.). https://figshare.com/
FORCE 11 (2014). Joint Declaration of Data Citation Principles - Final. https://www.force11.org/datacitationprinciples
FREYA (n.d.). The FREYA project. https://www.project-freya.eu/en/about/mission
GitHub (2016). Making your code citable. https://guides.github.com/activities/citable-code/
Haak, L., Meadows, A., Brown, J. (2018). Using ORCID, DOI, and Other Open Identifiers in Research Evaluation. Front. Res. Metr. Anal, vol 3, p28. https://doi.org/10.3389/frma.2018.00028
Ishiyama, T., Rieder, S., Makino, J., Zwart, S.P., Groen, D., Nitadori, K., Laat, C. de, McMillan, S., Hiraki, K., Harfst, S. (2011). The Cosmogrid Simulation: Statistical Properties of Small Dark Matter Halos (2048-103). Leiden University. 10.25606/SURF.578c6039-0bf84511
Keen, A.S (2011): Erosive Bar Migration Using Density and Diameter Scaled Sediment Erosive Profile Set-Prototype Scale (Actual Scal 1:10). TU Delft. doi:10.4121/uuid:32c53005-a4f2-447c-b231-6cdb7dcdd17f.
Meadows, A., Haak, L.L., Brown, J. (2019). Persistent Identifiers: The Building Blocks of the Research Information Infrastructure. Insights 32 (1): 9. http://doi.org/10.1629/uksg.457
Netwerk Digitaal Erfgoed (n.d.). PID wijzer. https://www.pidwijzer.nl/pid_results/new
ORCID (n.d.). Register for an ORCID iD. https://orcid.org/register
PID Forum (n.d.). https://www.pidforum.org/
Smith, A.M., Katz, D.S., Niemeyer, K.E., FORCE11 Software Citation Working Group. (2016) Software Citation Principles. PeerJ Computer Science 2:e86. https://doi.org/10.7717/peerj-cs.86
SURF (n.d.a.). SURF Data Archive. https://www.surf.nl/langdurig-data-opslaan-met-data-archive
SURF (n.d.b.). Data Persistent Identifier: data altijd vindbaar door permanente verwijzingen. https://www.surf.nl/data-persistent-identifier-data-altijd-vindbaar-door-permanente-verwijzingen
SURF (n.d.c.). SURF Data Repository. https://repository.surfsara.nl/
Zenodo (n.d.a.). https://zenodo.org/
Zenodo (n.d.b.). Zenodo Communities. https://zenodo.org/communities/