What is ETL? Extract, Transform and Load

ETL

ETL is the set of processes by which data is extracted from various sources, transformed, and loaded into target systems.
ETL stands for Extract, Transform and Load.
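
The three stages can be sketched end to end in a few lines. This is only a minimal illustration, not a production design: the CSV text, the `sales` table and the function names `extract`, `transform` and `load` are all assumptions made up for the example, and an in-memory SQLite database stands in for the target system.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real source would be a file, an API or a database.
SOURCE_CSV = """id,name,amount
1,alice,10.50
2,bob,3.25
"""

def extract(text):
    """Extract: read raw rows (all values are strings) from the source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: convert types and normalise values for the target schema."""
    return [(int(r["id"]), r["name"].title(), float(r["amount"])) for r in rows]

def load(conn, records):
    """Load: write the transformed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for the target system
load(conn, transform(extract(SOURCE_CSV)))
print(conn.execute("SELECT * FROM sales ORDER BY id").fetchall())
# → [(1, 'Alice', 10.5), (2, 'Bob', 3.25)]
```

Keeping each stage in its own function means a source or target can be swapped without touching the other two stages.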


Importance of ETL

  • ETL technology is an important component of a complete enterprise data integration solution and the cornerstone of many strategic technology initiatives that benefit greatly from data profiling, data quality and metadata management functionality.
  • Supports data extraction, cleansing, aggregation, reorganisation, transformation and load operations.
  • Generates and maintains centralised metadata.
  • Integrates closely with the RDBMS.
  • Filters data, converts codes, calculates derived values and maps many source data fields to one target data field.
  • Generates data extract programs automatically.
  • Loads target data warehouses at high speed.
  • Employs middleware for near-real-time ETL.
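
As a rough illustration of the row-level operations listed above (filtering, code conversion, derived values, and mapping many source fields to one target field), a single transformation function might look like the sketch below. The field names, the `REGION_CODES` table and the `transform_row` function are hypothetical and chosen only for the example.

```python
# Hypothetical code table: source-system codes -> target descriptions.
REGION_CODES = {"N": "North", "S": "South"}

def transform_row(row):
    """Map one source record to one target record, or None if it is filtered out."""
    # Filter data: skip records that are not active.
    if row["status"] != "active":
        return None
    return {
        # Map many source fields to one target field.
        "full_name": f'{row["first_name"]} {row["last_name"]}',
        # Convert codes, falling back to "Unknown" for unmapped values.
        "region": REGION_CODES.get(row["region"], "Unknown"),
        # Calculate a derived value.
        "total": round(row["quantity"] * row["unit_price"], 2),
    }

source = [
    {"first_name": "Ada", "last_name": "Lovelace", "region": "N",
     "status": "active", "quantity": 3, "unit_price": 2.5},
    {"first_name": "Alan", "last_name": "Turing", "region": "S",
     "status": "inactive", "quantity": 1, "unit_price": 9.0},
]

target = [t for t in map(transform_row, source) if t is not None]
print(target)
# → [{'full_name': 'Ada Lovelace', 'region': 'North', 'total': 7.5}]
```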



Common ETL Activities

  • Data Extraction
    • Searches through a file or database.
    • Applies selection criteria.
    • Identifies the qualifying data.
    • Transports that data to another file or database.
  • Data Extraction – Cleanup
    • Restructuring of records or fields.
    • Removal of operational-only data.
    • Supply of missing field values.
    • Data integrity checks.
    • Data consistency and range checks.
  • Data transformation
    • Integrating dissimilar data types.
    • Changing codes.
    • Adding a time attribute.
    • Summarising data.
    • Calculating derived values.
    • Renormalising data.
  • Data loading
    • Initial and incremental loading.
    • Updating metadata.
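
Several of the activities above (range checks, supplying missing values, adding a time attribute, initial and incremental loading, updating metadata, and summarising) can be sketched together. This is a simplified example under stated assumptions: the table names `fact_sales` and `etl_metadata`, the `last_id` watermark and the functions `clean` and `incremental_load` are invented for illustration, and SQLite stands in for the target warehouse.

```python
import sqlite3
from datetime import datetime, timezone

def clean(rows):
    """Cleanup: apply a range check and supply missing field values."""
    cleaned = []
    for r in rows:
        if not (0 <= r["amount"] <= 10_000):  # range check: drop out-of-range amounts
            continue
        cleaned.append({**r, "region": r.get("region") or "UNKNOWN"})
    return cleaned

def incremental_load(conn, rows, loaded_at):
    """Load only rows newer than the stored watermark, then update the metadata."""
    conn.execute("CREATE TABLE IF NOT EXISTS fact_sales "
                 "(id INTEGER PRIMARY KEY, region TEXT, amount REAL, loaded_at TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS etl_metadata "
                 "(key TEXT PRIMARY KEY, value INTEGER)")
    row = conn.execute("SELECT value FROM etl_metadata WHERE key = 'last_id'").fetchone()
    last_id = row[0] if row else 0  # no watermark yet means this is the initial load
    new_rows = [r for r in rows if r["id"] > last_id]
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        # loaded_at is the added time attribute.
        [(r["id"], r["region"], r["amount"], loaded_at) for r in new_rows],
    )
    if new_rows:
        conn.execute("INSERT OR REPLACE INTO etl_metadata VALUES ('last_id', ?)",
                     (max(r["id"] for r in new_rows),))
    conn.commit()
    return len(new_rows)

conn = sqlite3.connect(":memory:")
stamp = datetime.now(timezone.utc).isoformat()

batch1 = [
    {"id": 1, "region": "EU", "amount": 120.0},
    {"id": 2, "region": None, "amount": 80.0},   # missing region
    {"id": 3, "region": "US", "amount": -5.0},   # fails the range check
]
first = incremental_load(conn, clean(batch1), stamp)    # initial load
batch2 = batch1 + [{"id": 4, "region": "EU", "amount": 30.0}]
second = incremental_load(conn, clean(batch2), stamp)   # incremental load

# Summarising data: total amount per region.
summary = conn.execute(
    "SELECT region, SUM(amount) FROM fact_sales GROUP BY region ORDER BY region"
).fetchall()
print(first, second, summary)
# → 2 1 [('EU', 150.0), ('UNKNOWN', 80.0)]
```

A single highest-id watermark is the simplest possible load metadata. It would miss a rejected record that is corrected later under its old id, so real pipelines often track modification timestamps or change logs instead.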