HSRC

Work with data

Data quality - Data ‘fitness for use’ based on data quality dimensions

As data becomes a core part of every business operation, the quality of the data that is gathered, stored and consumed during business processes will determine the success achieved in doing business today and tomorrow.

Quality dimensions make up data quality - these factors will be discussed below.

Data is of high quality ...
  • When the data is fit for its intended purpose
  • When the data correctly represents the real-world construct it describes
  • When all of the planned, extensive actions needed to ensure that a data product meets a set of quality criteria have been taken

Poor data quality results in ...
  • Poor decision-making
  • Inability to react timeously to new market opportunities, thereby hindering achievement of profit and growth
  • Deficiencies in meeting ever-increasing compliance standards
  • Time invested in resolving duplicated tasks

Data quality is the degree to which data is error-free and able to serve its intended purpose. Certain properties of data contribute to its quality.

These are known as data quality dimensions.

The most important data quality dimensions and how they are measured are highlighted below.

Characteristic: How it is measured
  • Accuracy: Is every detail of the information correct?
  • Completeness: How comprehensive is the information?
  • Reliability: Does the information contradict trusted resources?
  • Relevance: Do you really need this information?
  • Timeliness: How up-to-date is the information? Can it be used for real-time reporting?

Data quality is assessed against quality measurements in the form of best practices, guidelines, and standards. Which measurements apply depends on the method used to measure or improve data quality.
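As an illustration only, two of the dimensions above can be turned into simple numbers. The sketch below assumes completeness is measured as the share of expected values that are filled in, and timeliness as the number of days since the information was last updated. Both formulas, the function names and the sample records are assumptions made for this example, not HSRC definitions.

```python
from datetime import date


def completeness(records, fields):
    """Share of expected cells that hold a non-empty value (0.0 to 1.0)."""
    total = len(records) * len(fields)
    if total == 0:
        return 0.0
    filled = sum(
        1
        for record in records
        for field in fields
        if record.get(field) not in (None, "")
    )
    return filled / total


def age_in_days(last_updated, today):
    """Timeliness proxy: days elapsed since the information was last updated."""
    return (today - last_updated).days


# Hypothetical sample records, for illustration only.
sample = [
    {"id": "r1", "age": 34, "province": "Gauteng"},
    {"id": "r2", "age": None, "province": "Limpopo"},
    {"id": "r3", "age": 51, "province": ""},
]

print(f"Completeness: {completeness(sample, ['id', 'age', 'province']):.2f}")
print(f"Age in days: {age_in_days(date(2024, 1, 1), date(2024, 3, 1))}")
```

Accuracy, reliability and relevance usually need a comparison against a trusted source or a judgement about purpose, so they are harder to reduce to a single formula.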

Data Quality - using checklists to ensure quality data

At the HSRC, the quantitative and qualitative data checklists facilitate the process of ensuring data quality.

  • As part of the data deposit procedure at the HSRC, researchers have to complete and submit a data checklist. The document is used by data curators to appraise the quality of data and verify that all fields that impact on data quality have been addressed.
  • If certain fields are not completed, curators return the checklist to the researcher and ask them to complete these fields. This is done to mitigate the risk of curating unusable data files.
The key areas data curators look to in ensuring data quality
  • The participant consent form does not preclude secondary data use, e.g. by stating that data will be available only to the project team.
  • Data is to be shared - externally, or at least with HSRC researchers who are not part of the original project team. If the HSRC does not own the data, written permission for the sharing of the data is provided.
  • The data is anonymised; OR permission has been obtained from participants for the secondary use of the data; OR data can be shared within an enclave, i.e. only at the HSRC, with supervision and regulated in terms of a special usage agreement.
  • Data is shared timeously, i.e. within a maximum of 24 months after completion of the project.
  • The data is the correct and final version of the data set. The data and documents are in an acceptable software format as listed in the Data Deposit Definition Document.
  • The number of items, e.g. interviews, corresponds with the number of participants, or the number of records/cases corresponds with the number of respondents, i.e. the number of questionnaires. The number of items corresponds with that specified in the Data Deposit Form.
  • Transcriptions have a uniform layout with clear fonts and speaker tags. For both types of data, a spell check is necessary.
  • Each record has a unique identifier/record number, with no duplicate records captured.
  • Missing values have been coded and defined in the quantitative data file, e.g. System missing, Not applicable, Refused, Don’t know, Not answered.
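Some of the checks in the list above lend themselves to automation for quantitative files. The following is a minimal sketch, assuming records are held as a list of dictionaries. It covers the record count, unique identifiers, blank values that should carry a missing-value code, and the 24-month sharing window. The function names, field names and the code -7 for "Refused" are hypothetical, and the sketch is not the HSRC curators' actual procedure.

```python
from collections import Counter
from datetime import date


def check_records(records, id_field, expected_count):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []

    # The number of records should match the number stated on deposit.
    if len(records) != expected_count:
        problems.append(f"expected {expected_count} records, found {len(records)}")

    # Each record needs a unique identifier.
    id_counts = Counter(record.get(id_field) for record in records)
    for record_id, count in id_counts.items():
        if record_id in (None, ""):
            problems.append(f"{count} record(s) have no identifier")
        elif count > 1:
            problems.append(f"identifier {record_id!r} appears {count} times")

    # Blank cells should have been replaced by a defined missing-value code.
    for record in records:
        for field, value in record.items():
            if field != id_field and value in (None, ""):
                problems.append(
                    f"record {record.get(id_field)!r}: field {field!r} is blank, not coded"
                )
    return problems


def shared_on_time(completed, shared, max_months=24):
    """True if the data was shared within max_months of project completion."""
    months = (shared.year - completed.year) * 12 + (shared.month - completed.month)
    if shared.day > completed.day:
        months += 1  # a started month counts as a full month
    return months <= max_months


# Hypothetical data file; -7 stands in for an assumed "Refused" code.
data = [
    {"id": 1, "q1": 3},
    {"id": 2, "q1": None},
    {"id": 2, "q1": -7},
]
for problem in check_records(data, "id", expected_count=4):
    print(problem)
print(shared_on_time(date(2022, 3, 15), date(2024, 3, 15)))
```

Checks such as consent wording, anonymisation and transcription layout still need a curator's judgement and are not covered here.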