HSRC

Work with data

Why documentation?

Documenting your data means providing sufficient descriptive information about your data so that it can be used properly by you, your colleagues, and other researchers in the future. Well-documented data is identifiable, understandable, and usable over the long term. You should document your data at each stage of the research process, rather than attempting to recreate information at a later stage. This makes documentation easier and makes it less likely that you will forget the details of each process. Data documentation also ensures that you and others will be able to interpret, assess, and repeat your work.

Describing data for secondary use

The value of data for secondary research use is determined by various factors, such as the relevance of the data to the HSRC’s research objectives; whether the data are scientifically significant to research into a particular domain; the extent to which the data are unique and perhaps not replicable; and the reliability, integrity and usability of the data.

Metadata is created in order:

  • to make the data discoverable;
  • to determine if the data would be useful for secondary use;
  • to understand the type, purpose and scope of the study or project.

For other people to fully understand and interpret your data correctly, it is important that your data is accompanied by the right documentation:

  • Metadata record:

This record is created by data curators within eRKC. They have the knowledge and skills to make sure that the metadata describes your data in the best possible way, so that the data can be understood and discovered.

  • Contextual documents

Contextual documents are files that explain the context behind the dataset and contain information on how the research was done. Generally, these are version logs, notebooks such as lab notebooks, or documents setting out the methodologies used. They may also come in the form of standardised protocols, equipment or software manuals, field notes on paper and so on. These answer the “who, what, why, where and how” of the data. Examples of contextual documentation include the background to data collection (project history, objectives and hypotheses), data-collection methods (sampling, the data-collection process, measuring instruments, etc.) and information on access, conditions of use and data confidentiality.

  • Structural documents

Structural documents are files that describe the structure of the dataset. These are often readme.txt files or other documents that contain an overview of the various folders and files that make up the dataset. The more elaborate your dataset is, the more important a structural document becomes. A structural document answers questions such as: Which folder contains what? Which files must be opened first?

  • Content documents

Content documents are files that describe the content of the dataset at the data level. These are often codebooks that explain the concepts and/or variables in question, as well as their meaning and the numerical or other values they represent. A content document can also include an explanation or definition of the codes and classification schemes used, the coding of data, reasons for missing values, etc. A user guide is an example of such a content document.
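As a rough illustration of what a codebook records, the Python sketch below stores each variable's label, its value codes and its missing-value codes, and renders them as plain text. The variable names (`q1_sex`, `q2_age`), the codes and the layout are hypothetical and are not taken from any HSRC data set.

```python
# Hypothetical codebook entries: variable name -> label, value labels, missing codes.
CODEBOOK = {
    "q1_sex": {
        "label": "Sex of respondent",
        "values": {1: "Male", 2: "Female"},
        "missing": {9: "Refused"},
    },
    "q2_age": {
        "label": "Age in completed years",
        "values": {},  # continuous variable, so no value codes
        "missing": {-1: "Not stated"},
    },
}


def render_codebook(codebook):
    """Return the codebook as plain text, one block of lines per variable."""
    lines = []
    for name, entry in codebook.items():
        lines.append(f"Variable: {name}")
        lines.append(f"  Label: {entry['label']}")
        for code, text in entry["values"].items():
            lines.append(f"  Value {code} = {text}")
        for code, text in entry["missing"].items():
            lines.append(f"  Missing {code} = {text}")
        lines.append("")  # blank line between variables
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_codebook(CODEBOOK))
```

Keeping the reasons for missing values next to the value codes means a secondary user can tell a refusal from a question that was never asked.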

Metadata

The term metadata refers to information that describes significant aspects of a resource. It may exist at various levels, typically from that of the data collection through to the individual variables of each data file in that collection.

Metadata are completed in terms of the DDI (Data Documentation Initiative) and Dublin Core metadata standards. Metadata is captured at project level and at data set level. The information provided in the Data Deposit Form is used to populate the fields in the curation platform.
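To make the idea of a standards-based record more concrete, here is a minimal sketch of a simple Dublin Core description built with Python's standard library. The field values are invented placeholders, the wrapping `metadata` element is an arbitrary choice, and nothing here is meant to reflect how the curation platform actually stores its records.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Build a <metadata> element with one dc:* child per (name, value) pair."""
    root = ET.Element("metadata")
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Placeholder values for illustration only.
record = build_dc_record([
    ("title", "Example household survey 2020"),
    ("creator", "Example Research Unit"),
    ("subject", "household income"),
    ("description", "Illustrative data set description."),
    ("date", "2020-06-30"),
    ("type", "Dataset"),
    ("identifier", "EXAMPLE-0001"),
    ("rights", "Access conditions stated by the depositor."),
])

xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Because the element names come from a shared standard, a record like this can be read by other catalogues and harvesters without knowledge of the original project.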

At project level, the metadata should document the following:

Data set ID: Number used to identify the data, even if it is just an internal project reference number.
Title: The data set title should give a comprehensive description of the data set and can be the same as the study title if there is only one data set in the collection.

At data set level, the metadata should document the following:

Data set details: Data set ID, data set title, data set description, data set abstract, time method, time period, origin, granularity, sources, data type, kind of data, production date, version, identifier, and resource type.
Subject description: Keywords/topics.
Scope: Geographic coverage, geographic unit, unit of analysis, universe (included and excluded).
Data collection: Date that data collection took place, mode of data collection, sampling procedure, weighting.
Funding / Authoring: Names and addresses of the organisations or people who created the data (Producers); organisations or agencies who funded the research (Funders); others who contributed to the project (Other Identification or Acknowledgements, inclusive of HSRC staff, external individuals, and organisations); Distributors; Authors/Principal investigators (HSRC staff and external individuals).
Access and copyright: Any known intellectual property rights held for the data (copyright holders); where and how your data can be accessed by other researchers (access conditions).
Related documents: List of all documents associated with the project, with their names and file and metadata status.
Data files: List of all data files associated with the project, with their names and file extensions.
Research outputs: List of research outputs associated with the project/data set.
Data set metadata status: Status (Dissemination, Preservation only, Reviewed), date, target audience, data file permission, live, data portal display date, retraction (Yes/No), curation status (Completed, Awaiting sign off, Curation in process, Deposited- curation not started).
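The "Data files" entry above asks for the names and file extensions of all data files. One way to assemble such a list is sketched below in Python, under the assumption that the files sit together in a single project folder. The folder name `project_data` is hypothetical.

```python
from pathlib import Path


def list_data_files(folder):
    """Return (relative path, lower-cased extension) pairs for every file under folder."""
    root = Path(folder)
    files = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            files.append((path.relative_to(root).as_posix(), path.suffix.lower()))
    return files


if __name__ == "__main__":
    # "project_data" is a hypothetical folder name; adjust it to your own project.
    if Path("project_data").is_dir():
        for name, ext in list_data_files("project_data"):
            print(f"{name}\t{ext}")
```

The same listing can double as the starting point for a readme file that explains which folder contains what.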

Data documentation during research

How to document?

  • Embed annotations (variable and value labels) in data files. This simply means each variable needs to have a related variable label, and codes must have value labels in a data file.
  • Use syntax files
  • Develop a narrative (update the data management plan)
  • Provide structured metadata (complete the Data Deposit Form)
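The first point in the list above, giving every variable a variable label and every code a value label, can be illustrated with a small Python sketch. Statistical packages store such labels inside the data file itself; the dictionaries below only mimic that idea, and the variable names, codes and the `unlabelled` helper are hypothetical.

```python
# Hypothetical labels; in practice these would be embedded in the data file.
VARIABLE_LABELS = {"prov": "Province of residence", "emp": "Employment status"}
VALUE_LABELS = {
    "prov": {1: "Western Cape", 2: "Eastern Cape"},
    "emp": {0: "Not employed", 1: "Employed", 9: "Missing"},
}


def unlabelled(variables, variable_labels, value_labels, rows):
    """List variables that lack a variable label and codes that lack a value label."""
    problems = []
    for var in variables:
        if var not in variable_labels:
            problems.append(f"{var}: no variable label")
        known = value_labels.get(var, {})
        # Codes that occur in the data but have no value label.
        for code in sorted({row[var] for row in rows} - set(known)):
            problems.append(f"{var}: code {code} has no value label")
    return problems


rows = [{"prov": 1, "emp": 1}, {"prov": 3, "emp": 9}]
print(unlabelled(["prov", "emp"], VARIABLE_LABELS, VALUE_LABELS, rows))
```

A check like this is only meaningful for coded (categorical) variables; continuous variables such as age would need a variable label but no value labels.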

Principles

  • Document data while doing research - not retrospectively
  • Provide meaningful information (titles, descriptions, abstract, keywords)
  • Develop detailed, comprehensive documentation about essential content
  • The quality of the documentation provided by the researcher is vital to making data reusable

Acknowledgements

https://guides.library.illinois.edu/introdata/documentation

https://libguides.mst.edu/c.php?g=335446&p=2257031