Data transformation process
Data Extraction
Data analysis starts
with a company's raw transactional and operational data. Data Sources are
single or many data storage locations like Cloud, ERP, Datawarehouse, flat
files, etc. Collecting and extracting data is the initial stage in an ETL
process (extract, transform, and load). The goal of ETL is to prepare data for
analysis or business intelligence (BI). There are 2 types of data extraction
methods:
·
Logical
Extraction:. It is the first step in creating a physical data
extraction plan.
o Full Extraction : Data copied from the source system in
its entirety, even if untimestamped.
o
Incremental Extraction : This method
extracts data in increments. Timestamps can monitor new or changed data.
·
Physical Extraction Extract data
from the source system logically or physically.
o Online Extraction : The staging area processes data
directly from the source system.
o
Offline Extraction : In lieu of directly
extracting data from the source, it is obtained from an external location (flat
files, or some dump files in a specified format).
Data Consolidation
After operational data is created and gathered from various sources and
formats, it must be combined, cleaned, and checked for faults before being
stored in a data warehouse or data lake. Business executives prefer “Hand
coding” by data engineers for small datasets from few sources or “ETL Tools”
for huge datasets from many sources. In both cases, data tables from different
sources are connected by a schema, which helps standardise and address meta
data problems of data consolidation like:
·
data may not have all the required
columns.
·
data has more columns than required
·
data types of columns may not match
across datasets.
·
columns may not be in the same order
across datasets.
·
data rows to be removed as its not
relevant to the data analysis.
Taxonomy Consultancy
Transformed data
should be classified into purchased goods/products or business services for
better insights. Organizations also have general ledger (GL) codes that finance
uses but aren't always the greatest fit for procurement. As a result, spend
taxonomy is a hierarchical structure document that aids in the logical
categorization of comparable spending items or services.
·
Taxonomy can be universal &
standard like UNSPSC, SIC or NAICS or Customized
taxonomy built by procurement and sourcing teams according to their
expenses.
·
The hierarchy ranges from 3 to 5 levels
of categories (generally used 4 Level), from general to specific
o Level 1 (Group), Level 2 (Family), Level 3(Category) & Level 4
(Commodity)
o
E.g., Professional Services (L1) ←
Marketing Services (L2) ← Advertising (L3) ← Radio & TV
·
Standard taxonomies are a wonderful
place to start, but large organisations prefer custom taxonomies since they
know their business domain and products/services.
Comments
Post a Comment