What is a duplicate id?
A node is a duplicate if it has the same primary identifier as one or more other nodes in the same dataset. For example, two employees with the same employee ID in an employees dataset, or two positions with the same ID in a positions dataset, would be considered duplicates.
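To make the definition concrete, here is a minimal Python sketch that flags duplicates in a list of records before loading. It is an illustration only and not part of Orgvue: the `find_duplicates` helper, the `employee_id` field name and the sample records are all hypothetical.

```python
from collections import Counter


def find_duplicates(records, id_field="employee_id"):
    """Return every record whose primary identifier appears more than once."""
    # Count how many times each identifier occurs in the dataset.
    counts = Counter(record[id_field] for record in records)
    # A record is a duplicate if its identifier is shared with another record.
    return [record for record in records if counts[record[id_field]] > 1]


# Hypothetical sample data: E002 appears twice.
employees = [
    {"employee_id": "E001", "name": "Ana"},
    {"employee_id": "E002", "name": "Ben"},
    {"employee_id": "E002", "name": "Benjamin"},
]

duplicates = find_duplicates(employees)
for record in duplicates:
    print(record["employee_id"], record["name"])
```

Note that both records sharing the ID are returned, not just the second one, since either could be the incorrect entry.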
Why are duplicates bad?
Duplicates undermine the integrity of your data because the primary identifier is used to build hierarchies, manage updates and link datasets.
For example, suppose I have two positions with the same id and I want to make one of them the manager of another position. I would add the position id to the Position Manager id of the child node, but Orgvue has no way of knowing which of the duplicate positions I actually meant.
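The ambiguity can be shown with a short Python sketch. This is an assumption-laden illustration rather than how Orgvue resolves hierarchies internally: the `resolve_manager` helper, the field names `position_id` and `manager_id`, and the sample positions are hypothetical.

```python
def resolve_manager(positions, manager_id):
    """Return every position whose id matches the given manager id."""
    return [p for p in positions if p["position_id"] == manager_id]


# Hypothetical sample data: two different positions share the id P10.
positions = [
    {"position_id": "P10", "title": "Head of Sales", "manager_id": None},
    {"position_id": "P10", "title": "Head of Marketing", "manager_id": None},
    {"position_id": "P11", "title": "Sales Analyst", "manager_id": "P10"},
]

child = positions[2]
candidates = resolve_manager(positions, child["manager_id"])

# More than one candidate means the parent cannot be chosen reliably.
is_ambiguous = len(candidates) > 1
print(f"{child['title']} has {len(candidates)} possible managers")
```

With unique ids the lookup would return exactly one position, so the hierarchy could be built without guessing.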
How can you tell if you have added duplicates?
The answer depends on how the data was loaded.
If you loaded the data through Settings or via Paste Merge in Workspace then you can use the filter control. Search for "Is Duplicate" which is a Generated property and select "Yes". This will return all duplicate nodes within your dataset.
If you have loaded the data via the Implementation Hub then the load process will either:
1. Strip out all duplicates and load them into a Duplicates dataset for you to review.
2. Load one of the duplicates into the main Enterprise dataset and load all duplicates into the Duplicates dataset for you to review.
This Duplicates dataset will be called something like Duplicates for Enterprise People.
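The two load behaviours can be sketched in Python as follows. This is a rough model for reasoning about the outcome, not the Implementation Hub's actual logic: the `split_duplicates` function and its `keep_one` flag are hypothetical, and keeping the first occurrence is an assumption, since the article does not say which duplicate is retained.

```python
from collections import Counter


def split_duplicates(records, id_field, keep_one=False):
    """Split records into (main dataset, duplicates dataset).

    keep_one=False models option 1: all duplicates are stripped out.
    keep_one=True models option 2: one duplicate per id stays in main.
    """
    counts = Counter(record[id_field] for record in records)
    main, duplicates, kept_ids = [], [], set()
    for record in records:
        key = record[id_field]
        if counts[key] == 1:
            main.append(record)  # unique id: always loaded
            continue
        duplicates.append(record)  # every duplicate goes to review
        # Assumption: the first occurrence is the one that is kept.
        if keep_one and key not in kept_ids:
            main.append(record)
            kept_ids.add(key)
    return main, duplicates


# Hypothetical sample data: B appears twice, C three times.
rows = [{"id": i} for i in ["A", "B", "B", "C", "C", "C"]]

main_stripped, dups_stripped = split_duplicates(rows, "id")
main_kept, dups_kept = split_duplicates(rows, "id", keep_one=True)
print(len(main_stripped), len(dups_stripped))
print(len(main_kept), len(dups_kept))
```

In both modes the duplicates dataset holds every duplicate record, so it is the place to review them regardless of which behaviour applied.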
Discrepancies in the source file count and dataset count due to duplicates
As highlighted above, when you load a dataset through the Implementation Hub, duplicate IDs will be stripped out of the dataset. As a result, the node count of the dataset will differ from the record count of the source file.
If you want to check the number of duplicates that were identified, you can check in two places:
Job History page in the hub
1. Click on the blue hyperlink to access the details of the job
2. The number of duplicates and the number of duplicate groups are listed. The number of duplicates, together with the nodes loaded, should equal the total records
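If you want to sanity-check these figures against your source file, the reconciliation can be sketched in Python as below. It assumes the case where all duplicates are stripped out of the main dataset, and it assumes a "duplicate group" means one id shared by several records; the `reconcile` function and its output keys are hypothetical and do not mirror the Job History page exactly.

```python
from collections import Counter


def reconcile(source_ids):
    """Summarise a source file's ids, assuming all duplicates are stripped."""
    counts = Counter(source_ids)
    # Assumption: a duplicate group is one id that occurs more than once.
    duplicate_groups = sum(1 for n in counts.values() if n > 1)
    # Every record belonging to a duplicate group counts as a duplicate.
    duplicates = sum(n for n in counts.values() if n > 1)
    # Only records with a unique id are loaded into the main dataset.
    nodes_loaded = sum(1 for n in counts.values() if n == 1)
    return {
        "total_records": len(source_ids),
        "duplicates": duplicates,
        "duplicate_groups": duplicate_groups,
        "nodes_loaded": nodes_loaded,
    }


# Hypothetical source file ids: B appears twice, C three times.
summary = reconcile(["A", "B", "B", "C", "C", "C"])
print(summary)

# The check described above: duplicates plus nodes loaded equals the total.
balanced = summary["duplicates"] + summary["nodes_loaded"] == summary["total_records"]
```

If the figures do not balance for your load, compare them with the node count in the Duplicates dataset before assuming records were lost.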
Duplicates dataset
The Duplicates dataset shows the number of duplicates in the upload as the node count in the bottom left-hand corner of the screen. It also lets you investigate these duplicates by bringing in any of the properties from the dataset, such as name or position title.
What next?
Once you have identified your duplicates, you can delete or update them as appropriate in Orgvue. However, we recommend that you always try to resolve the issue at source, as this type of issue is likely to affect your source systems as well.