This article outlines a six-step process to avoid The impact of common data quality issues, and Break out of the data quality trap.
- Step 1: Create a Draft to get a record of your changes
- Step 2: Fix the hierarchy to analyze the structure
- Step 3: Ensure Organization Dimension properties provide the required subgroups
- Step 4: Check the values of properties used to calculate measures
- Step 5: Export a Changes List to get corrections made at source
- Step 6: Prevent issues recurring with Lookups
If you don’t have immediate access to the information needed to make corrections, problematic cases can be "parked" in the Pinboard. Separate Collections for different data issues are helpful for tracking issues and the affected cases.
Step 1: Create a Draft to get a record of your changes
When data changes (corrections) are saved in a Draft:
- You get a list of all changes which can be used to get issues fixed at source (see Step 5).
- You can avoid having to repeatedly clean your data, for example if it gets refreshed with source data which hasn’t been fixed.
- Select ‘Data’.
- Select ‘Save As’.
- Select ‘Draft’.
- Enter a name for the Draft (‘Corrections’ in this example).
- Select ‘Save’.
Step 2: Fix the hierarchy
When nodes are disconnected from the reporting structure, rolled-up measures will not be accurate, and visualizations will be presented poorly.
Opening The Hierarchy Cleaning Pack will reveal any structural issues in your data and the reason for each issue. Alternatively, ‘broken hierarchies’ will be seen in a Tree slide.
-
Drag and drop nodes to their correct location in the organization structure. (you may need to drag and drop across a large area of the canvas.
If you are not able to immediately 'reparent' disconnected nodes, you can:
- See more information about a node to help identify its correct location in the structure.
- Proceed without fixing every issue by excluding disconnected nodes from visualizations. (This is only recommended when there is a small number of problematic cases).
- Generate and export a list of positions requiring correct IDs, source the required data then paste merge the updated values.
A. See more information about disconnected positions
- Select the 'Data' tab.
- Select a position.
- Review the information in the Data Pane.
B. Exclude disconnected positions from visualizations
Filtering by ‘Level 1’ with the top leadership position selected will exclude cases which are disconnected from the reporting structure.
- Select the ‘Filter’ icon.
- Select ‘Level 1’.
- Select the top leadership position.
Note:
- Whilst this will improve visualizations, unfiltered positions will be excluded from rolled-up measures.
- The ‘is orphan’ generated property can be used to exclude orphans, but this will not capture disconnected nodes which have direct reports (an ‘Orphan’ is defined as a node with no parent and no children).
C. Generate and export a list of disconnected positions, then paste merge updated IDs
- A worksheet with additional information to identify positions is recommended. You do not need to include the Position ID property as this is automatically included in the export file.
- Filter by ‘Level 1’ and exclude the top leadership position.
- Export the worksheet, selecting the ‘Current’ slides option.
When the correct information has been sourced, it can be paste merged to update your Orgvue data.
If The Hierarchy Cleaning Pack is used, exporting slide 2 will detail the type of issue and the reason for the position being disconnected:
Step 3: Ensure Organization Dimension properties provide the required subgroups
Organization Dimension properties are used to ‘slice and dice’ key measures; values for these properties define the subgroups used to compare different parts of the organization. The example below shows two typical data quality issues which would lead to subgroup measures being reported inaccurately:
- Inconsistent property values;
- Positions with incorrect property values.
“Paint with Data” using the color legend to make bulk updates
- Drag and drop categories in the color legend (the example below shows 43 positions being updated by this action).
- Alternatively, drag positions from the structure to the required legend category. (Multiple positions can be selected by holding down the keyboard Shift key).
Recommended steps:
- Preview each Dimension property to check the consistency of subgroup categories.
- Check values are correct for positions: An Icicle Tree slide with the same Dimension property used for both Pages and Color can be used to cycle through each subgroup to identify any outliers or anomalies:
Best practice:
- The subgroups used for analyzing and reporting organizational effectiveness insights should be consistent with the subgroups used for financial reporting and other management information.
- Use Expression Lookups to prevent inconsistent values being applied. (See step 6, below).
“Paint-with-data” using the Filter Control to make bulk updates
Categories in the Filter Control can be dragged-and-dropped in the same way as the Color Legend. In the example below, the ‘North America’ category can be dropped on the ‘NORAM’ category to resolve the inconsistency in the Region Property.
Recommended steps:
- When there is a logical relationship between Dimension properties – for example across geographic levels in the above example, use the Filter Control to check the expected logic: The example below reveals 12 positions in 'New York', 'USA' included when the 'APAC' region is selected:
- Drag-and-drop inconsistent cases to the correct category. (In this example, the ‘Country’ and ‘Location’ values are known to be correct).
Best practice:
- Use Dependent Lookups when there is a logical relationship between properties to maintain data quality. (See step 6, below.)
Step 4: Check the values of properties used to calculate measures
Preliminary review of the properties to be used in your analysis is recommended to avoid misreporting calculated measures. In the example below,
- The ‘FTE’ property is used to calculate the size of the workforce;
- ‘Salary’ will be used to calculate the cost of the workforce;
- ‘Span’ will be used to analyze the organization structure.
With these properties added as worksheet columns you can:
- Sort rows (by clicking on the column heading) to identify whether there are any extreme or unexpected minimum and maximum values. Below we see the CFO has a 'Span' value of 21 (i.e. 21 direct reports) which is unexpected.
Adding properties to the Filter Control reveals:
- Any unexpected values. In this example there are 12 positions with an ‘FTE’ value of ‘0’.
- The number of cases with ‘Blank’ (missing) property values.
- Any extreme values, and the positions these are observed for. In this example, the HR Administrator is recorded as having a salary of almost one million.
If you have ready access to the correct data, values can be updated in the worksheet. Alternatively, you can export a list of problematic cases to source the required values.
Equipped with transparency on data quality you can take informed decisions. If there are only a small number of problematic cases, you may wish to exclude these and proceed with a caveat to accompany your analysis ‘e.g., “Preliminary observations based on XXX cases.”’
Step 5: Export a Changes List to get corrections made at source
Adding a Changes List slide provides a record of corrections which can be shared to get issues fixed at source.
- If changes are saved in a Draft, the Changes List will include every change since the Draft was created. If changes are saved in a Dataset, the Changes List will only include changes made since the last save. [See Reverting data changes (and appreciating Drafts)].
- Add a Slide and select the Changes List slide type.
- Select the ‘Export’ icon, specify CSV (or TSV).
- Select ‘Current’ Slides.
Step 6: Prevent issues recurring with Lookups
The data quality issues shown above at Step 3 could have been prevented with Lookups:
- Expression Lookups define a set of values which can be entered for a property. When editing data, users are presented with a drop-down list of values to select from. This avoids the risk of synonymous property values emerging, or typos compromising data quality.
- Dependent Lookups can be used to populate property values when there is a logical dependency across different properties. In addition to the geographic example shown at Step 3 (where there was a relationship across the ‘Region’, ‘Country’ and ‘Location’ properties), there is often a logic across functional properties, for example, ‘Function’, ‘Subfunction’ and ‘Department’.
Note: Both Lookup types should be added after data has been cleaned:
- When an Expression Lookup is applied, all pre-existing property values will remain and can be selected.
- When a Dependent Lookup is used, the defined logic does not get applied retrospectively. Any cases which do not conform to the logic before the Lookup was applied (such as the positions seen at Step 3 which had ‘USA’ as the ‘Country’ value, and ‘APAC’ as ‘Region’ value) will remain unchanged.
Related articles:
Comments
0 comments
Please sign in to leave a comment.