
What is data profiling?

Data profiling is a process for assessing current data conditions or monitoring data quality over time. It begins by collecting measurements about your data, then examining the results individually and in combination to find where anomalies exist.
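To make "collecting measurements" concrete, here is a minimal Python sketch of per-column attribute metrics: row count, nulls, distinct values, value lengths, most frequent values, and value "patterns" (digits mapped to 9, letters to A). The function names, the chosen metrics, and the sample rows are illustrative assumptions, not a description of how any KnowledgeDriver product works.

```python
from collections import Counter

def value_pattern(value):
    """Reduce a value to its shape: digits become '9', letters become 'A'."""
    return "".join("9" if c.isdigit() else "A" if c.isalpha() else c for c in str(value))

def profile_column(values):
    """Collect basic attribute metrics for one column.

    None and blank strings are counted as nulls.
    """
    present = [v for v in values if v is not None and str(v).strip() != ""]
    lengths = [len(str(v)) for v in present]
    return {
        "count": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        "min_length": min(lengths, default=None),
        "max_length": max(lengths, default=None),
        # Frequent values and value shapes often expose hidden codes or bad entries
        "top_values": Counter(present).most_common(3),
        "patterns": Counter(value_pattern(v) for v in present).most_common(3),
    }

def profile_table(rows):
    """Profile every column of a table represented as a list of dicts."""
    columns = sorted({key for row in rows for key in row})
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}

# Hypothetical sample rows
customers = [
    {"id": 1, "zip": "80236", "state": "CO"},
    {"id": 2, "zip": "8023", "state": "co"},
    {"id": 3, "zip": None, "state": "CO"},
]
profile = profile_table(customers)
print(profile["zip"]["nulls"], profile["state"]["distinct"])  # prints 1 2
```

Reading the metrics together is what reveals anomalies: a ZIP column whose minimum length is 4 and whose patterns include `9999` points to truncated or mis-keyed values, and two distinct spellings of the same state code point to inconsistent casing.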

Data anomalies are the "needle in the haystack" for technology projects. Even the best systems have them, but they may not cause pain until a data migration or integration project comes along. Once the "needles" are identified, the extract, transform, and load (ETL) process or tools can remove them.
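The removal step can be sketched the same way. In this minimal Python sketch, rules discovered during profiling are expressed as named checks, and a transform step routes each row to a clean list or a reject list. The rule names and customer fields are hypothetical; real ETL tools express such rules in their own ways.

```python
def split_anomalies(rows, rules):
    """Route rows to a clean list or a reject list.

    `rules` maps a rule name to a predicate that returns True when a row passes.
    Rejected rows are kept together with the names of the rules they failed.
    """
    clean, rejected = [], []
    for row in rows:
        failed = [name for name, passes in rules.items() if not passes(row)]
        if failed:
            rejected.append((row, failed))
        else:
            clean.append(row)
    return clean, rejected

# Hypothetical rules that profiling might have surfaced for a customer table
def zip_is_5_digits(row):
    z = row.get("zip")
    return isinstance(z, str) and len(z) == 5 and z.isdigit()

def state_is_uppercase(row):
    return str(row.get("state") or "").isupper()

rules = {"zip_is_5_digits": zip_is_5_digits, "state_is_uppercase": state_is_uppercase}

rows = [
    {"id": 1, "zip": "80236", "state": "CO"},
    {"id": 2, "zip": "8023", "state": "co"},
    {"id": 3, "zip": None, "state": "CO"},
]
clean, rejected = split_anomalies(rows, rules)
print([r["id"] for r in clean])  # prints [1]
```

Keeping rejected rows with the rules they failed, rather than silently dropping them, lets someone correct the source data or decide whether a rule is too strict.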

Data profiling consists of attribute, redundancy, and dependency analysis. Attribute analysis yields a set of metrics that can be interpreted to reveal inherent business rules as well as anomalies embedded in source-system data. Redundancy analysis helps determine the source of record and reduces primary-key violations during integration. Dependency analysis identifies orphan records and validates a normalized model. Together, these analyses make it possible to interpret what the data means and to take a structured approach to resolving data-related migration and integration issues.
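A minimal Python sketch of the redundancy and dependency checks described above, assuming each table is a list of dicts and that `cust_id` is the customer key. The function names, table names, and sample data are hypothetical.

```python
from collections import Counter

def duplicate_keys(rows, key):
    """Redundancy within one source: key values that occur more than once."""
    counts = Counter(row[key] for row in rows)
    return [k for k, n in counts.items() if n > 1]

def overlapping_keys(rows_a, rows_b, key):
    """Redundancy across sources: keys present in both. These collide on a
    shared primary key unless one source is chosen as the source of record."""
    return {r[key] for r in rows_a} & {r[key] for r in rows_b}

def orphan_records(child_rows, fk, parent_rows, pk):
    """Dependency: child rows whose foreign key matches no parent key."""
    parent_keys = {r[pk] for r in parent_rows}
    return [r for r in child_rows if r[fk] not in parent_keys]

# Hypothetical sample tables
billing_customers = [{"cust_id": 1}, {"cust_id": 2}, {"cust_id": 2}]
crm_customers = [{"cust_id": 1}, {"cust_id": 4}]
orders = [{"order_id": 10, "cust_id": 1}, {"order_id": 11, "cust_id": 3}]

print(duplicate_keys(billing_customers, "cust_id"))  # prints [2]
print(overlapping_keys(billing_customers, crm_customers, "cust_id"))  # prints {1}
orphans = orphan_records(orders, "cust_id", billing_customers, "cust_id")
print([o["order_id"] for o in orphans])  # prints [11]
```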

Data profiling has a parallel in a common business practice. How does your company prove the integrity of its financial position to its owners? "Auditing" is sometimes viewed as a dirty word, but it assures the owners that their decisions are based on reliable financial information. Data profiling plays the same role for your data: it gives you evidence that your business decisions are based on reliable information.

What business problem does profiling address?

Business information comes from manipulating huge amounts of data, usually from multiple sources residing on different technical platforms. To turn data into information, companies need the equivalent of a decoder ring to synchronize these diverse data sources.

Profiling creates a map of your diverse data landscape.  This map identifies the roadblocks between data sources.  Once you know where the roadblocks are, you can create a route to reach synchronized data.  Synchronized data forms the basis of your competitive business knowledge.    


What data management problem does profiling address? 

Migrating or integrating data from multiple source systems requires answers to the following questions:

Q:  Will data from the source system fit in the target system?

Q:  Does the content match?

Q:  Do the structures match?

Q:  What is the right data to move?
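Two of these questions lend themselves to simple automated checks. The minimal Python sketch below compares the longest observed source value with a target column width ("Will it fit?") and compares column sets ("Do the structures match?"). The target widths and column names are assumptions for illustration; in practice they would come from the target system's schema. Whether the content matches, and which data is the right data to move, still require business judgment.

```python
def fit_problems(source_rows, target_widths):
    """Columns whose longest source value exceeds the target column width.

    Width is treated as a character count; some targets count bytes instead.
    """
    problems = {}
    for col, width in target_widths.items():
        longest = max(
            (len(str(r[col])) for r in source_rows if r.get(col) is not None),
            default=0,
        )
        if longest > width:
            problems[col] = (longest, width)
    return problems

def structure_diff(source_columns, target_columns):
    """Columns that exist on only one side of a source-to-target mapping."""
    src, tgt = set(source_columns), set(target_columns)
    return {
        "missing_in_target": sorted(src - tgt),
        "missing_in_source": sorted(tgt - src),
    }

# Hypothetical source rows and target column widths
source = [
    {"name": "Acme Corporation", "zip": "80236", "fax": None},
    {"name": "Bo", "zip": None, "fax": "303-555-0100"},
]
target_widths = {"name": 10, "zip": 5}

print(fit_problems(source, target_widths))  # prints {'name': (16, 10)}
print(structure_diff(source[0].keys(), ["name", "zip", "email"]))
# prints {'missing_in_target': ['fax'], 'missing_in_source': ['email']}
```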

Your project is at risk if you answered "Don't know" to any of these questions. Projects are often run on the strategy of "Let's start, and we'll fix any problems as we go along." Occasionally that works. More often, the "80/20" rule applies: you spend 80% of your time and budget fixing the 20% of the data that contains anomalies.

For IT, these "needles" are single-source and cross-platform deviations in:

Known and discovered business rules

Data standards         

Naming standards         

Data types   

Entity type definitions 

Attribute type definitions        

Domain values  

Embedded codes and flags

Inconsistencies between matching attributes

Domain type consistency

Data redundancy

Information architecture

Orphan record control
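Several items in this list, such as domain values and inconsistencies between matching attributes, can be checked mechanically. This minimal Python sketch flags values outside a documented domain and compares one attribute for records that share a key across two systems. The status codes, field names, and sample records are hypothetical.

```python
def out_of_domain(values, allowed):
    """Distinct non-null values that fall outside the documented domain."""
    allowed = set(allowed)
    return sorted({v for v in values if v is not None and v not in allowed})

def mismatched_attributes(rows_a, rows_b, key, attr):
    """For records sharing a key across two sources, list differing values
    as (key, value_in_a, value_in_b)."""
    b_values = {r[key]: r.get(attr) for r in rows_b}
    return [
        (r[key], r.get(attr), b_values[r[key]])
        for r in rows_a
        if r[key] in b_values and r.get(attr) != b_values[r[key]]
    ]

# Hypothetical account status codes, documented as A (active), I (inactive), C (closed)
print(out_of_domain(["A", "I", "X", "a", "C"], {"A", "I", "C"}))  # prints ['X', 'a']

# Hypothetical state attribute held in two systems
billing = [{"cust_id": 1, "state": "CO"}, {"cust_id": 2, "state": "WY"}]
crm = [
    {"cust_id": 1, "state": "CO"},
    {"cust_id": 2, "state": "Wyoming"},
    {"cust_id": 3, "state": "UT"},
]
print(mismatched_attributes(billing, crm, "cust_id", "state"))  # prints [(2, 'WY', 'Wyoming')]
```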




KnowledgeDriver, Inc.  *  4720 W. Princeton Ave. *  Denver, CO 80236  
phone:  303-707-0505  *  fax: 303-707-0606
Revised 18 Sep 2002   © Copyright KnowledgeDriver, Inc. 2000    
 a Brainstorm website