Deduping Data Matters: The Hidden Costs of Duplicate Data
For those unfamiliar with the term, “data deduplication” is the elimination of redundant information, or data, within a system. With the rising volume of data flowing into ATS, CRM, and other people databases, it is increasingly common for the same person to appear in multiple duplicate profiles. In fact, roughly 18% of the data in people databases, such as ATSs and CRMs, is duplicate, resulting in unnecessary clutter and reduced efficiency.
Data deduplication solutions have been around since the late 1970s, reducing storage needs by eliminating redundant data. These solutions rely on algorithms that achieve compression by replacing duplicated data with a reference to a single copy of that data appearing earlier in the uncompressed data stream.
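The core idea can be sketched in a few lines of Python: split the stream into chunks, fingerprint each chunk, store each unique chunk once, and keep only references for the repeats. This is a deliberately minimal illustration (fixed-size chunks, SHA-256 fingerprints), not any particular vendor's implementation:

```python
import hashlib

def dedupe(data: bytes, chunk_size: int = 4):
    """Split data into fixed-size chunks; store each unique chunk once
    and represent the stream as a list of references (fingerprints)."""
    store = {}   # fingerprint -> chunk bytes, stored only once
    refs = []    # ordered fingerprints that reconstruct the stream
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        fp = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fp, chunk)  # keep only the first copy
        refs.append(fp)
    return store, refs

def restore(store, refs) -> bytes:
    """Rebuild the original stream from stored chunks and references."""
    return b"".join(store[fp] for fp in refs)

store, refs = dedupe(b"ABCDABCDABCDXYZW")
print(len(refs), len(store))  # 4 references, but only 2 stored chunks
assert restore(store, refs) == b"ABCDABCDABCDXYZW"
```

Here 16 bytes of input contain only two distinct 4-byte chunks, so the store holds half the data; production systems use content-defined chunking and much larger blocks, but the replace-with-reference principle is the same.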
Since the 1970s, these algorithms have grown in both ability and complexity, making data compression and deduplication ubiquitous across enterprise companies with massive data storage needs.
The benefits of data deduplication are widespread, and include capacity and management optimization, improved scalability, improved performance, and reduced expenditure. Data takes up less space once it is deduped, and therefore it’s simpler to navigate and manage. This reduction in space also reduces associated investment and overhead costs.
Despite the versatility and availability of deduping algorithms, they remain limited in one specific way: they require that the data they dedupe be frozen. Stagnant. Stored. There are no options for fluid data upkeep. And that’s where the friction starts.
Corporate data is growing rapidly, expanding roughly 40% each year. 20% of this data is considered “dirty,” meaning it is duplicate, incorrect altogether, or a combination of the two. Let’s break this down into dollar valuation. Assuming each “dirty” record costs a company around $20–$100 to fix in terms of time spent on upkeep and storage thereafter, an ATS containing 10 million profiles, roughly 2 million of which contain some variation of faulty data, would carry an unnecessary financial burden of $40M–$200M.
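The back-of-envelope math works out as follows, using the figures above (note that 2 million records at $100 each puts the top of the range at $200M):

```python
profiles = 10_000_000
dirty_rate = 0.20                # 20% of records are "dirty"
cost_per_record = (20, 100)      # $20-$100 to fix each dirty record

dirty = int(profiles * dirty_rate)          # 2,000,000 dirty profiles
low, high = (dirty * c for c in cost_per_record)
print(f"{dirty:,} dirty records -> ${low // 10**6}M-${high // 10**6}M")
# 2,000,000 dirty records -> $40M-$200M
```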
Companies spend 25% of their IT budget on data storage, and duplicate data takes up an incredible amount of unnecessary server space. Consequently, companies are overbuying storage systems, which cost anywhere from $2,000 to $10,000 per terabyte. That figure does not account for the $5,000–$7,000 per terabyte cost of data transfer, which carries a high risk of further duplicating or dirtying the data since each system is segmented.
This server segmentation also wastes IT and administrative time on management and upkeep, since more time is required to sift through the data and pick out the key elements in each search.
Money aside, duplicate data damages company reputation and customer relationships. It slows internal productivity, for instance when multiple salespeople end up calling on the same customer. That is not only irritating; it gives an impression of sloppiness, undermining confidence in your company. As you can imagine, this bodes poorly for customer satisfaction and retention.
As recruiters, we source and meet many candidates we believe are new, when in reality the team may have already added them to the ATS, leading to duplicate profiles. From an internal perspective, this duplicate data leads to incorrect marketing segmentation and personalization, as well as unnecessary marketing automation and CRM costs.
Applicant TRACKING (not storage) systems demand a steady flow of data and data upkeep, since they track candidates. Currently there is no official system, algorithm, or product in place that eliminates duplicate profiles, apart from People Data Labs.
Because this problem is so prevalent, we at People Data Labs have invested significant time and energy in building the database and technology required to enrich and update candidate profiles, while also enabling the consolidation of duplicate profiles.
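To give a sense of what consolidation involves, here is a deliberately naive sketch that merges candidate profiles sharing a normalized email address, letting later records fill in fields the first occurrence left blank. Real entity resolution is far more involved (fuzzy name matching, multiple emails, phone normalization), and the field names here are hypothetical, not People Data Labs' schema:

```python
def consolidate(profiles):
    """Merge profiles that share a normalized email address.
    Later duplicates fill in fields the first occurrence left empty."""
    merged = {}
    for p in profiles:
        key = p["email"].strip().lower()  # normalize the match key
        if key not in merged:
            merged[key] = dict(p)
        else:
            for field, value in p.items():
                # only fill fields that are missing or empty
                if not merged[key].get(field):
                    merged[key][field] = value
    return list(merged.values())

profiles = [
    {"name": "Ada Lovelace", "email": "ada@example.com", "phone": ""},
    {"name": "Ada Lovelace", "email": "Ada@Example.com ", "phone": "555-0100"},
    {"name": "Grace Hopper", "email": "grace@example.com", "phone": "555-0199"},
]
print(len(consolidate(profiles)))  # 3 raw profiles collapse to 2
```

Even this toy version shows why dedup on live people data cannot treat records as frozen blocks: the duplicates disagree on formatting and completeness, so merging is a data-quality decision, not just a compression step.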
Applicant tracking systems are limited when it comes to data tracking, and one of the necessary steps in making them useful and optimizing their capabilities is eliminating all duplicates.