Table of Contents
Data Modeling
A Data model is a conceptual representation of data structures (tables) required for a database
A data model visually represents the nature of data, business rules governing the data, and how it will be organized in the database.
Data modelers are responsible for designing the data model and they communicate with functional team to get the business requirements and technical teams to implement the database.
Various Data Modeling Tools
Data modeling tools are the only way through which we can create powerful data models. Following are the list of popular data modeling tools.
Popular Data Modeling Tools
Tool Name Company Name
Erwin Computer Associates
Embarcadero Embarcadero Technologies
Rational Rose IBM Corporation
Power Designer Sybase Corporation
Oracle Designer Oracle Corporation
Data Modeling Development Cycle
- Gathering Business Requirements – First Phase:
Data Modelers have to interact with business analysts to get the functional requirements and with end users to find out the reporting needs. - Conceptual Data Modeling(CDM) – Second Phase:
This data model includes all major entities, relationships and it will not contain much detail about attributes and is often used in the INITIAL PLANNING PHASE. - Logical Data Modeling(LDM) – Third Phase:
A logical data model is the version of the model that represents all of the business requirements of an organization. - Physical Data Modeling(PDM) – Fourth Phase:
This is a complete model that includes all required tables, columns, relationship, database properties for the physical implementation of the database. - Database – Fifth Phase:
DBAs instruct the data modeling tool to create SQL code from physical data model. Then the SQL code is executed in server to create databases.
DATA MODELING STANDARDS
- Several data modelers may work on the different subject areas of a data model and all data modelers should use the same naming convention, writing definitions and business rules.
- For example, when a data warehouse is designed, it may get data from several source systems and each source may have its own names, data types etc. These anomalies can be eliminated if a proper standardization is maintained across the organization.
Some general guidelines are listed below that may be used as a prefix or suffix for a table.
- Lookup – LKP – Used for Code, Type tables by which a fact table can be directly accessed.
e.g. Credit Card Type Lookup – CREDIT_CARD_TYPE_LKP - Fact – FCT – Used for transaction tables:
e.g. Credit Card Fact – CREDIT_CARD_FCT - History – HIST – Tables the stores history.
e.g. Credit Card Retired History – CREDIT_CARD_RETIRED_HIST - Statistics – STAT – Tables that store statistical information.
e.g. Credit Card Web Statistics – CREDIT_CARD_WEB_STAT - Key – Key System generated surrogate key.
e.g. Credit Card Key – CRDT_CARD_KEY - Identifier – ID – Character column that is used as an identifier.
e.g. Credit Card Identifier – CRDT_CARD_ID - Code – CD – Numeric or alphanumeric column that is used as an identifying attribute.
e.g. State Code – ST_CD - Description – DESC – Description for a code, identifier or a key.
e.g. State Description – ST_DESC - Indicator – IND – to denote indicator columns.
e.g. Gender Indicator – GNDR_IND - Index – Index – IDX – for index names.
e.g. Credit Card Fact IDX01 – CRDT_CARD_FCT_IDX01 - Primary Key – PK – for Primary key constraint names.
e.g. CREDIT Card Fact PK01- CRDT-CARD_FCT_PK01 - Alternate Keys – AK – for Alternate key names.
e.g. Credit Card Fact AK01 – CRDT_CARD_FCT_AK01 - Foreign Keys – FK – for Foreign key constraint names.
e.g. Credit Card Fact FK01 – CRDT_CARD_FCT_FK01
Steps to create a Data Model
- Collect Business Requirement
- Create domain.
- Create Conceptual Model.
- Create Entities and add definitions.
- Create Logical Model.
- Create attribute and add definitions.
- Create Physical Model
- Assign data type to attribute.
- Create primary or unique keys to attribute.
- Create check constraint or default to attribute.
- Create unique index or bitmap index to attribute.
- Create foreign key relationship between entities.
- Add database properties to physical data model.
- Create SQL Scripts from Physical Data Model and forward that to DBA.
- Maintain Logical & Physical Data Model.
- Create a change log document for differences between the current version and previous version of the data model.
Role of a Data Modeler
- Business Requirement Analysis.
- Development of data model.
- Review.
- Creation of database.
- Support & Maintenance
Conceptual Data Modeling
- Conceptual data model includes all major entities and relationships and does not contain much detailed level of information about attributes and is often used in the INITIAL PLANNING PHASE
- Conceptual data model is created by gathering business requirements from various sources like business documents, discussion with functional teams, business analysts, smart management experts and end users who do the reporting on the database. Data modelers create conceptual data model and forward that model to functional team for their review.
Logical Data Modeling
- This is the actual implementation and extension of a conceptual data model. A Logical data model is the version of a data model that represents the business requirements(entire or part) of an organization and is developed before the physical data model.
- As soon as the conceptual data model is accepted by the functional team, development of logical data model gets started. Once logical data model is completed, it is then forwarded to functional teams for review. A good data model is created by clearly thinking about the current and future business requirements. Logical data model includes all required entities, attributes, key groups, and relationships that represent business information and define business rules.
Physical Data Modeling
- Physical data model includes all required tables, columns, relationships, database properties for the physical implementation of databases. Database performance, indexing strategy, physical storage and denormalization are important parameters of a physical model.
- Logical data model is approved by functional team and there-after development of physical data model work gets started. Once physical data model is completed, it is then forwarded to technical teams(developer, group lead, DBA) for review. The transformations from logical model to physical model include imposing database rules, implementation of referential integrity, super types and sub types etc.
Relational (OLTP) Data Modeling
- Relational Data Model is a data model that views the real world as entities and relationships. Entities are concepts, real or abstract about which information is collected. Entities are associated with each other by relationship and attributes are properties of entities. Business rules would determine the relationship between each of entities in a data model.
- The goal of relational data model is to normalize (avoid redundancy)data and to present it in a good normal form. While working with relational data modeling, a data modeler has to understand 1st normal form thru 5th normal form to design a good data model.
Dimensional Data Modeling
- Dimensional Data Modeling comprises of one or more dimension tables and fact tables. Good examples of dimensions are location, product, time, promotion, organization etc. Dimension tables store records related to that particular dimension and no facts(measures) are stored in these tables.
- For example, Product dimension table will store information about products(Product Category, Product Sub Category, Product and Product Features) and location dimension table will store information about location( country, state, county, city, zip. A fact(measure) table contains measures(sales gross value, total units sold) and dimension columns. These dimension columns are actually foreign keys from the respective dimension tables.
Difference between Relational and Dimensional Modeling
Primary level of information | Secondary level of information |
Tables are units of storage | Cubes are units of storage |
Data is normalized and used for OLTP. Optimized for OLTP processing | Data is denormalized and used in data warehouse and data mart. Optimized for OLAP |
Several tables and chains of relationships among them | Few tables and fact tables are connected to dimensional tables |
Volatile(several updates) and time variant | Non volatile and time invariant |
Detailed level of transactional data | Summary of bulky transactional data(Aggregates and Measures) used in business decisions |
Some Important terms and abbrevations
Attribute An attribute is a part of the description of the entity.
Entity Any kind of information of importance to the business.
Relationship A relationship (link) between two entities.
Cube Collection of Dimension and Facts
Dimension Textual description of the business
Facts Numerical units of the business
Normalization Process of removing the redundancies
OLTP Online Transaction Processing
OLAP Online Analytical Processing