|
What is data profiling?
Data profiling is a process to assess current data
conditions, or to monitor data quality over time. It begins with collecting measurements about your data, and
then looking at the results individually and in various combinations to see
where anomalies exist. Data anomalies are the “needle in the haystack” for
technology projects. Even the best
systems have them, but they may not cause pain until a data migration or
integration project comes along. Once
the “needles” are identified, the extract, transform and load (ETL) process
or tools can remove them.

Data profiling comprises attribute, redundancy and dependency analysis.
Attribute analysis yields a set of metrics, which can be interpreted to reveal
inherent business rules, as well as anomalies embedded in source system data.
Redundancy analysis assists in determining the source of record, and reduces
the occurrences of violated primary keys during integration. Dependency
analysis identifies orphan records, and validates a normalized model.
Together, these analyses make it possible to interpret data meanings, and
implement a structured approach to address and resolve data-related migration
and integration issues.

Data profiling has a parallel in common business practices. How does your
company prove the integrity of its financial position to its owners?
“Auditing” is sometimes viewed as a dirty word, but it assures the owners that
their decisions are based on reliable financial information. Data profiling
ensures that all of your business decisions are based on reliable information.

What business problem does profiling address?
Business information comes from manipulating huge amounts of data, usually
from multiple sources, residing on different technical platforms. To be able
to transform data into information, companies need the equivalent of a decoder
ring to synchronize their diverse data sources. Profiling creates a map of
your diverse data landscape.
This map identifies the roadblocks between data sources.
Once you know where the roadblocks are, you can create a route to reach
synchronized data. Synchronized
data forms the basis of your competitive business knowledge.

What data management problem does profiling address?
Migrating and/or integrating multiple-source system data
requires answers to the following:
Q: Will data from the source system fit in the target system?
Q: Does the content match?
Q: Do the structures match?
Q: What is the right data to move?
Your project is at risk if you answered “Don’t Know”
to any of these questions. Frequently,
projects are based on the strategy of “Let’s start, and we’ll fix any
problems as we go along.” Occasionally, that works. More often, the “80/20”
rule applies: you spend 80% of your time and budget fixing the 20% of data
with anomalies. Data anomalies are the “needle in the haystack” for IT.
These “needles” cover single-source and cross-platform deviations in:
|
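To make the attribute, redundancy and dependency analyses described under "What is data profiling?" more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method of any particular profiling tool: the sample tables, the column names and the helper functions `attribute_profile`, `overlapping_keys` and `orphan_rows` are all hypothetical, and a real project would run comparable checks against the actual source systems.

```python
from collections import Counter

# Hypothetical sample data: two source systems holding customers, plus orders.
crm_customers = [
    {"id": 1, "name": "Acme", "state": "CO"},
    {"id": 2, "name": "Birch", "state": None},
    {"id": 3, "name": "Cedar", "state": "Colorado"},
]
billing_customers = [
    {"id": 3, "name": "Cedar Ltd", "state": "CO"},
    {"id": 4, "name": "Dune", "state": "UT"},
]
orders = [
    {"order_id": 10, "customer_id": 1},
    {"order_id": 11, "customer_id": 9},  # no such customer in either source
]


def attribute_profile(rows, column):
    """Attribute analysis: basic metrics for one column."""
    values = [row[column] for row in rows]
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        "max_length": max((len(str(v)) for v in present), default=0),
        "frequencies": Counter(present),
    }


def overlapping_keys(rows_a, rows_b, key):
    """Redundancy analysis: key values that appear in both sources."""
    return {r[key] for r in rows_a} & {r[key] for r in rows_b}


def orphan_rows(children, parents, foreign_key, parent_key):
    """Dependency analysis: child rows whose foreign key has no parent row."""
    parent_ids = {p[parent_key] for p in parents}
    return [c for c in children if c[foreign_key] not in parent_ids]


state_profile = attribute_profile(crm_customers, "state")
shared_ids = overlapping_keys(crm_customers, billing_customers, "id")
orphans = orphan_rows(orders, crm_customers + billing_customers,
                      "customer_id", "id")

print(state_profile)
print(shared_ids)
print(orphans)
```

In this made-up sample, the attribute metrics show one missing state and two different spellings in the same column, and the maximum length hints at whether the values would fit a target column. The overlapping key (id 3) is a primary key collision to resolve before integration, and order 11 is an orphan record because its customer exists in neither source.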
|
KnowledgeDriver, Inc.
* 4720 W. Princeton Ave. * Denver, CO 80236 |