HSRC

Work with data

Data processing

In many ways, working with data is like interviewing a live source. You ask questions of the data and get it to reveal the answers. But just as a source can only give answers about which he or she has information, a data set can only answer questions for which it has the right records and the proper variables (https://www.rasmussen.edu/degrees/technology/blog/what-does-a-data-analyst-do/). Data processing is the translation of collected data into usable information. This is the stage at which raw data is checked for errors, cleaned, and organized for analysis. It might also be necessary to combine different data sets to enrich a data file, to compute or impute values, or to anonymize (de-identify) data. Although good questionnaire design and well-configured data-capturing software can limit the need for data cleaning, the following tasks generally form part of this process:

  • Removing extraneous data and outliers.
  • Filling in missing values.
  • Conforming data to a standardized pattern.
  • Masking private or sensitive data entries.
  • Using syntax (scripted commands) so that processing steps are documented and repeatable
  • Validation – error detection
  • Checking procedures
    • Number of variables and cases
    • Unique identifiers
    • Formats (numeric, string, length)
    • Completeness
    • Reasonableness
    • Out-of-range / invalid values
    • Coding / capturing errors
    • Skip patterns
    • Creating and printing error lists
  • Cleaning and preparing for analysis
    • Retain original data
    • Record changes (what, how and why)
    • Check
    • Numeric variables (variable and value labels)
    • String variables
    • Anonymization
  • The process should be planned, coordinated, and systematic
  • Version control
  • Converting data files
  • Data processing during analysis
    • Categorizing variables, computations
      • Record changes (what, how and why)
      • Verify
    • Version control
    • Data processing record
    • Updating a master data set (multi-author management)
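
Several of the checking and cleaning steps listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the field names (id, age, employed, hours_worked), the valid age range of 0 to 120, and the helper names build_error_list and set_value are all hypothetical and do not come from any particular data set or tool. The sketch checks unique identifiers, completeness, out-of-range values, and one skip pattern, builds an error list, and then makes a correction on a copy while recording what was changed and why.

```python
import copy

# Hypothetical survey records; field names and valid ranges are assumptions.
raw_records = [
    {"id": 1, "age": 34, "employed": "yes", "hours_worked": 40},
    {"id": 2, "age": 212, "employed": "no", "hours_worked": None},
    {"id": 2, "age": 51, "employed": "no", "hours_worked": 15},
    {"id": 4, "age": None, "employed": "yes", "hours_worked": 38},
]

def build_error_list(records):
    """Return a list of (record position, id, problem) tuples."""
    errors = []
    seen_ids = set()
    for pos, rec in enumerate(records):
        rid = rec["id"]
        # Unique identifiers: each id should occur only once.
        if rid in seen_ids:
            errors.append((pos, rid, "duplicate id"))
        seen_ids.add(rid)
        # Completeness: age must be present.
        if rec["age"] is None:
            errors.append((pos, rid, "age missing"))
        # Out-of-range values: assumed valid range is 0 to 120.
        elif not 0 <= rec["age"] <= 120:
            errors.append((pos, rid, "age out of range"))
        # Skip pattern: hours_worked applies only when employed is "yes".
        if rec["employed"] == "no" and rec["hours_worked"] is not None:
            errors.append((pos, rid, "hours_worked should be skipped"))
    return errors

error_list = build_error_list(raw_records)

# Retain original data: all cleaning happens on a deep copy.
cleaned = copy.deepcopy(raw_records)
change_log = []

def set_value(records, pos, field, new_value, why):
    """Change one value and record what was changed, how, and why."""
    old = records[pos][field]
    records[pos][field] = new_value
    change_log.append(
        {"position": pos, "field": field, "old": old, "new": new_value, "why": why}
    )

# Example correction: treat the impossible age as missing.
set_value(cleaned, 1, "age", None, "212 is outside the assumed 0-120 range")

# Print the error list and the change log for review.
for entry in error_list:
    print(entry)
for change in change_log:
    print(change)
```

Keeping raw_records untouched and writing every correction through one logging function is one simple way to follow the "retain original data" and "record changes" points; in practice the same idea is usually implemented with the syntax files and version control of whatever statistical package is in use.
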
Work in Excel

Excel workbooks are designed to store a lot of information. Whether you're working with 20 cells or 20,000, Excel has several features to help you organize your data and find what you need.