#6 What should I look out for when assessing and improving my data quality?

The issue under analysis here is unclean data and template validation.

This is one of a series of articles on solving common data issues, derived from in-house experience of Consulting projects using OrgVue.

Completeness

How many nodes actually contain values?

  1. Use the Data Types and Patterns dashboard to view overall dataset quality
  2. Use the Hierarchy Validator custom dimension to identify reporting line issues
  3. Clean the data centrally or return the template to the business for correction
  4. Prioritise key fields (e.g. parent ID), fields everyone should have (e.g. start date), fields with a clear mapping (e.g. grade to salary) and key nodes (e.g. heads of orphan groups)
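
As a rough illustration of these completeness checks outside OrgVue, the Python sketch below computes the share of populated values per field and lists nodes whose parent ID does not match any known ID. The field names (employee_id, parent_id, start_date) and the sample rows are hypothetical, and the code does not reflect how the Data Types and Patterns dashboard or the Hierarchy Validator work internally.

```python
def is_filled(value):
    """Treat None and whitespace-only strings as missing."""
    return value is not None and str(value).strip() != ""


def completeness(rows, fields):
    """Return the fraction of rows with a populated value, per field."""
    total = len(rows)
    return {
        field: (sum(is_filled(row.get(field)) for row in rows) / total if total else 0.0)
        for field in fields
    }


def find_orphans(rows, id_field="employee_id", parent_field="parent_id"):
    """Return IDs whose parent is populated but not present in the dataset."""
    known_ids = {row[id_field] for row in rows}
    return [
        row[id_field]
        for row in rows
        if is_filled(row.get(parent_field)) and row[parent_field] not in known_ids
    ]


# Hypothetical sample data: E1 is the root, E4 points at a missing parent.
rows = [
    {"employee_id": "E1", "parent_id": "", "start_date": "2015-01-05"},
    {"employee_id": "E2", "parent_id": "E1", "start_date": ""},
    {"employee_id": "E3", "parent_id": "E1", "start_date": "2017-03-20"},
    {"employee_id": "E4", "parent_id": "E9", "start_date": None},
]

report = completeness(rows, ["employee_id", "parent_id", "start_date"])
orphans = find_orphans(rows)
print(report)
print(orphans)
```

Fields with a low fill rate, and the heads of any orphan groups, are natural candidates to prioritise when returning a template to the business.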

Validity

Are values of the appropriate type?

  1. Use Data Types and Patterns and the filter control to identify values that deviate from the expected type (especially dates, NaNs in measure fields, and IDs set as numbers)
  2. Export values to Excel, correct them and paste them back
  3. Use a replace macro (e.g. for currency symbols)
  4. Fix in OrgVue by painting with data, in Pivot View, Filter, Parking Lot, Color, or splash commands
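
The type checks described above can also be sketched in Python, for example when preparing data before it is loaded. This is a minimal sketch under stated assumptions: the ISO date format, the currency symbols and the treatment of ',' as a thousands separator are illustrative choices, and all function names are hypothetical rather than part of OrgVue.

```python
import math
from datetime import datetime


def is_valid_date(value, fmt="%Y-%m-%d"):
    """True if the value parses as a date in the expected format (assumed ISO)."""
    try:
        datetime.strptime(str(value).strip(), fmt)
        return True
    except ValueError:
        return False


def is_valid_measure(value):
    """True if the value is a finite number (rejects NaN, infinities and text)."""
    try:
        return math.isfinite(float(value))
    except (TypeError, ValueError):
        return False


def strip_currency(value, symbols="£$€"):
    """Remove currency symbols and ',' (assumed to be a thousands separator)."""
    cleaned = str(value)
    for symbol in symbols:
        cleaned = cleaned.replace(symbol, "")
    return cleaned.replace(",", "").strip()


def ids_stored_as_numbers(values):
    """Return IDs held as numbers rather than text."""
    return [v for v in values
            if isinstance(v, (int, float)) and not isinstance(v, bool)]


# Hypothetical sample values.
cleaned = [strip_currency(v) for v in ["£50,000", "42000", "nan", "n/a"]]
invalid_measures = [v for v in cleaned if not is_valid_measure(v)]
bad_dates = [v for v in ["2015-01-05", "05/01/2015", ""] if not is_valid_date(v)]
numeric_ids = ids_stored_as_numbers([101, "E102", 103.0])
print(cleaned, invalid_measures, bad_dates, numeric_ids)
```

Stripping currency symbols before the measure check mirrors the replace-macro step: values that still fail afterwards need a manual fix.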

Consistency

Are values consistently formatted as expected?

  1. Inconsistencies in case (mixed upper and lower), style (e.g. John Doe vs Doe, John), locale (e.g. ‘,’ vs ‘.’), rounding and percentage formats cause reporting issues and can point to deeper data problems
  2. Use clear, concise advice and validation in templates
  3. Ensure a single Excel workbook locale
  4. Use Regex to confirm values meet the required pattern
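
To illustrate the regex check, the Python sketch below tests each value against a pattern for its field and reports the ones that do not match. The field names and patterns are assumptions chosen for the example (an ID of 'E' plus four digits, ISO dates, salaries using '.' as the decimal separator); substitute the patterns your own template requires.

```python
import re

# Hypothetical patterns: each value must match its field's pattern in full.
PATTERNS = {
    "employee_id": re.compile(r"E\d{4}"),
    "start_date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "salary": re.compile(r"\d+(\.\d{1,2})?"),
}


def find_mismatches(rows, patterns):
    """Return (row index, field, value) for every value that breaks its pattern."""
    mismatches = []
    for index, row in enumerate(rows):
        for field, pattern in patterns.items():
            value = str(row.get(field, ""))
            if not pattern.fullmatch(value):
                mismatches.append((index, field, value))
    return mismatches


# Hypothetical sample data: the second row mixes case, date style and locale.
rows = [
    {"employee_id": "E0001", "start_date": "2015-01-05", "salary": "50000.50"},
    {"employee_id": "e0002", "start_date": "05/01/2015", "salary": "50000,50"},
]

mismatches = find_mismatches(rows, PATTERNS)
for index, field, value in mismatches:
    print(f"row {index}: {field} = {value!r} does not match the expected pattern")
```

Note that a pattern only confirms the format: the date pattern above would still accept an impossible date such as 2015-13-45, so it complements rather than replaces a type check.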

Correctness

Is the information actually correct?

  1. Even when complete, valid and consistent, the data may be wrong – e.g. incorrect line manager, salary etc.
  2. Validate with the business early, checking high-level salary totals against Finance, reporting lines with HRBPs, preferred names against Active Directory, etc.
  3. Use calculated fields to check and augment provided data
  4. Document and convey data assumptions
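
One way to picture the high-level salary check against Finance is a simple reconciliation of totals per department. The Python sketch below is an illustration only: the department and salary field names, the finance figures and the 2% tolerance are all assumptions, and in OrgVue the same idea would typically be expressed with calculated fields.

```python
from collections import defaultdict


def salary_totals(rows):
    """Sum salaries per department from the HR dataset."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["department"]] += float(row["salary"])
    return dict(totals)


def reconcile(hr_totals, finance_totals, tolerance=0.02):
    """Return departments whose HR total differs from Finance by more than
    the relative tolerance, as {department: (hr_total, finance_total)}."""
    issues = {}
    for dept in sorted(set(hr_totals) | set(finance_totals)):
        hr = hr_totals.get(dept, 0.0)
        fin = finance_totals.get(dept, 0.0)
        baseline = max(abs(fin), 1e-9)  # avoid dividing by zero
        if abs(hr - fin) / baseline > tolerance:
            issues[dept] = (hr, fin)
    return issues


# Hypothetical sample data.
rows = [
    {"department": "Sales", "salary": "50000"},
    {"department": "Sales", "salary": "60000"},
    {"department": "IT", "salary": "70000"},
]
finance_totals = {"Sales": 110000.0, "IT": 80000.0}

issues = reconcile(salary_totals(rows), finance_totals)
print(issues)
```

The tolerance is itself a data assumption, so it belongs in whatever document records the assumptions conveyed to the business.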

Supplementary Material

Please note that the Hierarchy Validator is for validation purposes only and should not be run in a Production environment, as it affects performance. Remove it once the data has been suitably cleansed.

If you have any additional queries arising from the above, please select the Submit A Request link at the top right of this screen to contact OrgVue Support.

This article was authored by Ben Marshall from the OrgVue Consulting team.

 

 
