A data set is a recognized research output with a specific identity: a unique, persistent web address where information about the data set can be found and, in some instances, where the data set itself can be accessed.
An example of such a persistent identifier, a digital object identifier (DOI), is http://dx.doi.org/doi:10.14749/1494330158, which references version 1 of the SANHANES 2011-12 Adult data set, made available in 2017.
A data set used in research must always be cited, just as the journal articles, books and other sources that contributed to the research are cited. This applies to data
- owned by external parties,
- generated by other researchers in the HSRC, and
- that you yourself generated as a researcher and on which you base your publication or report.
A reader of your output must be able to determine exactly which data set you used and, if it is accessible, retrieve that particular data set.
All data sets must be citable, whether or not they are available for secondary use. Not all data sets can be shared, but their existence must at least be known; they may be made available for re-use at a later stage, or to a limited internal audience.

How should a data set be referenced?
In-text reference and citation in bibliography
Human Sciences Research Council. South African Agricultural Business Innovation Survey (AgriBIS) 2016-18 Aggregated: All provinces. [Data set]. AgriBIS 2016-18 Aggregated. Version 1.0. Pretoria South Africa: Human Sciences Research Council [producer] 2019, Human Sciences Research Council [distributor] 2022. http://dx.doi.org/doi:10.14749/1657460803
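The citation above always lists its parts in the same order. If you keep track of many data sets, for example in a small reference-management script, a minimal Python sketch like the one below can assemble citations in this pattern. The class and field names are hypothetical and chosen for illustration only; this is not an HSRC or eRKC tool, and the suggested citation you receive when the data set is submitted should take precedence.

```python
from dataclasses import dataclass


@dataclass
class DataSetCitation:
    """Parts of a data set citation, in the order used by the example above.

    Hypothetical field names for illustration; not an official HSRC schema.
    """
    author: str             # institution or person responsible for the data set
    title: str              # full title of the data set
    short_title: str        # short title of the data set
    version: str            # version number, e.g. "1.0"
    place: str              # place of production
    producer: str
    production_year: int
    distributor: str
    distribution_year: int
    identifier_url: str     # persistent identifier (DOI) as a web address

    def to_citation(self) -> str:
        # Join the parts with the same order and punctuation as the example.
        return (
            f"{self.author}. {self.title}. [Data set]. {self.short_title}. "
            f"Version {self.version}. {self.place}: "
            f"{self.producer} [producer] {self.production_year}, "
            f"{self.distributor} [distributor] {self.distribution_year}. "
            f"{self.identifier_url}"
        )


agribis = DataSetCitation(
    author="Human Sciences Research Council",
    title="South African Agricultural Business Innovation Survey (AgriBIS) "
          "2016-18 Aggregated: All provinces",
    short_title="AgriBIS 2016-18 Aggregated",
    version="1.0",
    place="Pretoria South Africa",
    producer="Human Sciences Research Council",
    production_year=2019,
    distributor="Human Sciences Research Council",
    distribution_year=2022,
    identifier_url="http://dx.doi.org/doi:10.14749/1657460803",
)

print(agribis.to_citation())
```

Keeping the parts in separate fields, rather than in one free-text string, makes it easy to update a single element such as the version number when a new version of the data set receives its own identifier.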
How do you obtain the identifier for a data set?
When any output based on a data set is produced, the underlying data set must be submitted to the Digital Scholarship Services unit of the eRKC. Both the output and the data set will receive a persistent identifier, which must be included in the citation. The unit will also provide a suggested citation that you can add directly to your bibliography.
Why is it important to cite a data set?
- It supports the reproducibility of your research and demonstrates sound research practice by being transparent about the evidence on which the research is based.
- It gives credit to those who provided the data, including yourself where you created the data set. It also makes the authors and their affiliated institutions more discoverable, which contributes to their visibility and standing in the academic community.
- It helps avoid plagiarism, because the authors of the data set are clearly identified.
- Citations, especially those that include DOIs, allow re-use to be tracked and can serve as a measure of research impact.
- Some funders and publishers require data sets to be curated. Making a data set citable in the way described above allows this requirement to be met.