HSRC

Work with data

Validate, clean and process data

Data processing is the stage at which collected data is translated into usable information: raw data is checked for errors, cleaned, and organised for analysis. It might also be necessary to combine different data sets to enrich a data file, to compute or impute values, or to anonymise (de-identify) data. Although good questionnaire design and well-configured data capture software can limit the need for data cleaning, the following tasks generally form part of this process:

  • Removing extraneous data and outliers.
  • Providing missing values.
  • Conforming data to a standardised pattern.
  • Masking private or sensitive data entries.
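
The four tasks above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names, the plausible age range, the use of the median for imputation, and the salted hash used for masking are all invented for the example and are not prescribed by this guide. A salted hash is pseudonymisation rather than full anonymisation, so a real project would follow its own de-identification protocol.

```python
import hashlib
import statistics

# Hypothetical raw records; every name and value here is invented.
RAW_RECORDS = [
    {"id": "001", "name": "A. Dlamini", "age": 34, "province": " gauteng "},
    {"id": "002", "name": "B. Naidoo", "age": None, "province": "GAUTENG"},
    {"id": "003", "name": "C. Smith", "age": 29, "province": "Western cape"},
    {"id": "004", "name": "D. Mokoena", "age": 340, "province": "Limpopo"},
]

AGE_MIN, AGE_MAX = 0, 120          # assumed plausible range for this example
SALT = "project-specific-secret"   # assumed secret, stored apart from the data


def mask(value, salt=SALT):
    """Replace a sensitive value with a short salted hash (a pseudonym)."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:8]


def clean(records):
    """Return cleaned copies of the records; the originals are left untouched."""
    # 1. Remove outliers: drop records whose age is outside the plausible range.
    kept = [
        r for r in records
        if r["age"] is None or AGE_MIN <= r["age"] <= AGE_MAX
    ]

    # 2. Provide missing values: impute the median of the observed ages.
    median_age = statistics.median(r["age"] for r in kept if r["age"] is not None)

    cleaned = []
    for r in kept:
        cleaned.append({
            "id": r["id"],
            # 4. Mask private data: the name is replaced by a pseudonym code.
            "respondent_code": mask(r["name"]),
            "age": r["age"] if r["age"] is not None else median_age,
            "age_imputed": r["age"] is None,  # flag imputed values for analysts
            # 3. Conform to a standard pattern: trimmed, title-cased text.
            "province": r["province"].strip().title(),
        })
    return cleaned


cleaned = clean(RAW_RECORDS)
for row in cleaned:
    print(row)
```

Flagging imputed values in a separate field (here the hypothetical `age_imputed`) keeps the change visible to whoever analyses the file later.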

Data validation refers to checking the procedures that were followed, making sure that respondents were chosen according to the research criteria (for example, random selection using the Kish method), and checking the data for completeness.

The process of managing quantitative data includes:
  • Using syntax
  • Validation – error detection
    • Checking procedures
      • Number of variables and cases
      • Unique identifiers
      • Formats (numeric, string, length)
      • Completeness
      • Reasonableness (is the data believable)
      • Out-of-range / invalid values
      • Coding / capturing errors
      • Skip patterns
    • Creating and printing error lists
  • Cleaning and preparing for analysis
    • Retain original data
    • Record changes (what, how and why)
    • Numeric variables (variable and value labels)
    • String variables
    • Anonymisation
  • Process should be planned, coordinated and systematic
  • Version control
  • Converting data files
  • Data coding during analysis
    • Categorising variables, computations
      • Record changes (what, how and why) and provide the syntax used for recoded variables
      • Verify
    • Version control
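
Several of the validation checks listed above (number of variables and cases, unique identifiers, completeness, out-of-range or invalid values, skip patterns) can be scripted so that they produce an error list. The sketch below is a minimal illustration, not a prescribed procedure: the captured file, the variable names, the valid codes, the age range and the skip rule (occupation is only asked when `employed` is `1`) are all hypothetical.

```python
import csv
import io

# Hypothetical captured data; in practice this would be read from a file.
CAPTURED = """id,sex,age,employed,occupation
001,1,34,1,teacher
002,2,17,2,
002,1,45,1,nurse
004,3,29,2,driver
005,2,,1,
"""

EXPECTED_COLUMNS = ["id", "sex", "age", "employed", "occupation"]
VALID_CODES = {"sex": {"1", "2"}, "employed": {"1", "2"}}  # assumed code lists
AGE_MIN, AGE_MAX = 15, 99                                  # assumed valid range


def validate(rows):
    """Return a list of (line, id, variable, message) tuples describing errors."""
    errors = []
    seen_ids = set()
    # Line 1 of the file is the header, so data rows start at line 2.
    for line_no, row in enumerate(rows, start=2):
        rid = row["id"]

        # Unique identifiers
        if rid in seen_ids:
            errors.append((line_no, rid, "id", "duplicate identifier"))
        seen_ids.add(rid)

        # Invalid codes
        for var, codes in VALID_CODES.items():
            if row[var] not in codes:
                errors.append((line_no, rid, var, f"invalid code {row[var]!r}"))

        # Completeness and out-of-range values
        age = row["age"]
        if age == "":
            errors.append((line_no, rid, "age", "missing value"))
        elif not age.isdigit() or not AGE_MIN <= int(age) <= AGE_MAX:
            errors.append((line_no, rid, "age", f"out of range {age!r}"))

        # Skip pattern: occupation applies only to employed respondents
        if row["employed"] == "1" and row["occupation"] == "":
            errors.append((line_no, rid, "occupation", "missing after employed=1"))
        if row["employed"] == "2" and row["occupation"] != "":
            errors.append((line_no, rid, "occupation", "answered despite skip"))
    return errors


reader = csv.DictReader(io.StringIO(CAPTURED))
rows = list(reader)

# Number of variables and cases
columns_ok = reader.fieldnames == EXPECTED_COLUMNS
print(f"variables as expected: {columns_ok}, cases: {len(rows)}")

# Create and print the error list
errors = validate(rows)
for line_no, rid, var, message in errors:
    print(f"line {line_no}, id {rid}, {var}: {message}")
```

The script only reports problems and never alters the captured file, which is in keeping with retaining the original data and recording any later change separately.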

The process of managing qualitative data includes:
  • Sources of qualitative data: born digital (collected digitally) or digitised (collected manually and then digitised)
  • Data types
    • Transcriptions
    • Audio recordings
    • Video recordings
    • Images
    • Photographs
  • Transcriptions: Validation, cleaning and preparing for analysis
    • Items (Data listing)
    • Internal metadata (Cover table)
    • Structure and layout
    • Spelling and typing errors
    • Anonymisation (Anonymisation log)
  • Version control
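
Anonymising a transcription while keeping an anonymisation log can be sketched as follows. The transcript excerpt, the names and the replacement table are invented for illustration, and the log layout is only one possible design. Because the log links placeholders back to the original text, it would need to be stored securely and separately from the anonymised transcript.

```python
import re

# Hypothetical transcript excerpt; all names and places are invented.
TRANSCRIPT = (
    "Interviewer: Where do you work?\n"
    "Thandi: I work at Riverside Clinic in Mamelodi. "
    "Thandi's manager is Mr Khumalo."
)

# Hypothetical replacement table agreed on by the research team.
REPLACEMENTS = {
    "Thandi": "[Participant 1]",
    "Riverside Clinic": "[Clinic A]",
    "Mr Khumalo": "[Manager]",
    "Mamelodi": "[Township]",
}


def anonymise(text, replacements):
    """Replace identifying terms and return (anonymised text, log entries)."""
    log = []
    for original, placeholder in replacements.items():
        # Whole-word match so that parts of longer words are not replaced.
        pattern = re.compile(r"\b" + re.escape(original) + r"\b")
        text, count = pattern.subn(placeholder, text)
        if count:
            log.append({
                "original": original,
                "replaced_with": placeholder,
                "occurrences": count,
            })
    return text, log


anonymised, anonymisation_log = anonymise(TRANSCRIPT, REPLACEMENTS)

print(anonymised)
for entry in anonymisation_log:
    print(entry)
```

A simple replacement table only catches the terms that someone has listed, so the anonymised transcript still needs to be read through for indirect identifiers.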