SQL Data Warehousing | SQL Data Modelling
Welcome to the SQL series. In today's article we'll be learning about SQL data warehousing. Here is the agenda for the day: first we will begin with an introduction to data marts, after that we will introduce data modeling, and then we'll look at the different types of data modeling: conceptual, logical, and physical.
A data mart is nothing but a subset of a data warehouse. Consider the transactional databases we run in real time; these are our operational (OLTP) systems. We do not load their data into the data warehouse in a single step; there are several stages.
Moving data from OLTP to OLAP involves multiple stages. The staging area is one of the stages in the ETL process: we land the data from the different OLTP systems in staging, and from staging we move it into the data warehouse. From the data warehouse we then create a subset, either logically or as a separate database. A data mart is still a database; it is just a logically separate division.
Think of how a one-terabyte hard disk is divided into C, D, and E drives. In the same way, we logically divide the data warehouse, or one particular database, into multiple data marts. Each data mart holds subject-oriented information for one part of the business. Now let's look at the difference between a data warehouse and a data mart.
A data warehouse holds all of the enterprise data, whereas a data mart holds the data for a particular subject or department. A data warehouse covers multiple subject areas; each data mart covers a single subject area.
A data warehouse accumulates and integrates data from multiple data sources. A data mart has limited data sources; the data warehouse itself may be the source, and we again need an ETL process to fetch the data from it.
Moving data from the data warehouse to a data mart is another ETL process, so an ETL tool is used here as well. A data warehouse occupies a large amount of storage and takes a long time to implement; a data mart occupies limited storage and takes less time to implement.
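To make the idea of a data mart as a single-subject subset more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (warehouse_transactions, mart_sales) and the sample rows are made up for illustration; a real warehouse and mart would normally sit in separate schemas or databases and be loaded by an ETL tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical warehouse table: enterprise-wide rows covering several subject areas.
cur.execute("""
    CREATE TABLE warehouse_transactions (
        txn_id       INTEGER PRIMARY KEY,
        subject_area TEXT NOT NULL,   -- e.g. 'sales', 'hr', 'finance'
        amount       REAL NOT NULL
    )
""")
cur.executemany(
    "INSERT INTO warehouse_transactions (txn_id, subject_area, amount) VALUES (?, ?, ?)",
    [(1, "sales", 100.0), (2, "hr", 40.0), (3, "sales", 250.0), (4, "finance", 75.0)],
)

# Hypothetical data mart: a smaller table that holds one subject area only.
cur.execute("CREATE TABLE mart_sales (txn_id INTEGER PRIMARY KEY, amount REAL NOT NULL)")

# The second ETL step: extract only the sales rows from the warehouse and load the mart.
cur.execute("""
    INSERT INTO mart_sales (txn_id, amount)
    SELECT txn_id, amount
    FROM warehouse_transactions
    WHERE subject_area = 'sales'
""")
conn.commit()

mart_rows = cur.execute("SELECT txn_id, amount FROM mart_sales ORDER BY txn_id").fetchall()
print(mart_rows)  # [(1, 100.0), (3, 250.0)]
```

The mart ends up with fewer rows and a single subject area, which is why it needs less storage and is quicker to build than the warehouse it is loaded from.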
Those are the major differences between a data mart and a data warehouse. In practice there are three types of data mart: dependent, independent, and hybrid.
In a dependent data mart, we move the data from the OLTP systems into the data warehouse, and the data mart fetches its data from the warehouse. The data is first extracted from the OLTP systems, populated into the central data warehouse, and then travels from the warehouse to the data mart. It is called dependent because the data mart depends on the data warehouse. In an independent data mart, there is no data warehouse; the data mart is loaded directly from the OLTP sources.
A hybrid data mart gets its data from either the OLTP systems or the data warehouse. Those are the three types of data mart. Next, what is data modeling?
Business users give us business-related terminology. They know the business, and they want it converted into a data model. Data modeling is nothing but that conversion, and there are different methods we can use to create the data model.
Three Different Methods
There are three different methods: the conceptual model, the logical model, and the physical data model.
We will see one by one what each model means and how to create it. Assume we are creating the data model for an online product-selling portal. Such a portal needs a number of tables.
To hold customer information we need a customer table. To maintain employee details we need an employees table. To maintain customer accounts we need an accounts table, and to maintain product information we need a product table. A region table holds the different sales regions. With these we can analyze which region is performing well, which product is performing well, which account was used for a transaction, and how each employee is performing with customers.
So a lot of tables are involved in data modeling: countries, regions, accounts, merchants, and so on.
In the customer table, each customer is identified by a customer ID. In a database we call this ID a primary key.
The table then holds all the other information about the customer. The customer name may be duplicated: two customers can have the same name, but their customer IDs will be different. We can store the customer's mobile number so we can send them updates, and the date of birth so we can analyze the age of our customers.
With the date of birth we can see which age group orders the most. We also store the customer's address, city, and country, so we can see which locations order the most. All of these are called the different dimensions of the customer.
So customer is one of the tables, and customer ID is its primary key, which means the column does not allow duplicate values. The customer name, mobile number, date of birth, and so on are the dimensions of the customer, and this is a dimension table. The employee table works the same way: employee ID, employee name, mobile number, skill set, years of experience, city, and address. In the accounts table, the account number is the primary key.
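The primary key behaviour described above can be sketched with Python's built-in sqlite3 module. The column names and sample customers below are assumptions chosen for illustration; the point is only that a duplicate name is accepted while a duplicate customer ID is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical customer dimension table; customer_id is the primary key.
conn.execute("""
    CREATE TABLE customer (
        customer_id   INTEGER PRIMARY KEY,
        customer_name TEXT NOT NULL,
        mobile_number TEXT,
        date_of_birth TEXT,
        address       TEXT,
        city          TEXT,
        country       TEXT
    )
""")

# Two customers may share the same name as long as their IDs differ.
conn.execute("INSERT INTO customer (customer_id, customer_name, city) VALUES (1, 'Asha', 'Chennai')")
conn.execute("INSERT INTO customer (customer_id, customer_name, city) VALUES (2, 'Asha', 'Pune')")

# Reusing an existing customer_id violates the primary key and is rejected.
try:
    conn.execute("INSERT INTO customer (customer_id, customer_name) VALUES (1, 'Ravi')")
    duplicate_id_rejected = False
except sqlite3.IntegrityError:
    duplicate_id_rejected = True

same_name_count = conn.execute(
    "SELECT COUNT(*) FROM customer WHERE customer_name = 'Asha'"
).fetchone()[0]

print(duplicate_id_rejected, same_name_count)  # True 2
```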
The accounts table also records details such as the type of account and whether the account is active or inactive. All of these tables are dimension tables: they hold the different dimensions of the data. If I now create a sales table using these dimensions, that table is called a fact table.
The sales fact table holds the key values for each sale: which customer ordered the product (customer ID), which employee helped with the purchase (employee ID), which product was purchased (product ID), which account was used (account number), and which region the purchase came from (region ID).
At the end of the day I want to analyze the summarized data: which region is performing well and which merchant is performing well, so the fact table also carries the merchant ID. These key values taken from the dimension tables are called foreign keys, and they are there to maintain integrity. We will see more on primary and foreign keys when we cover normalization; for now, just understand the relationship: each key column in the fact table is a foreign key that references the primary key of its base table, the dimension table.
So the fact table has all the keys, and then it has the quantity: how much of the product was purchased. It also has the unit price, which is the price of one unit, and the total price. For example, if two units are purchased and one unit costs one hundred dollars, the total price is two hundred dollars.
These three columns (quantity, unit price, and total price) are called measures, because they are measurable values. The key columns are the dimensions, and they are maintained in the fact table as foreign keys. f_sales is the fact table. It is derived from the dimension tables, but it does not keep all of the customer or employee information itself.
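Here is a minimal sketch of a fact table with foreign keys and measures, again using Python's sqlite3 module. Only f_sales is a name taken from the text; the two cut-down dimension tables and the sample rows are assumptions, and the real fact table would carry more keys (employee, account, region, merchant, date). Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default

conn.executescript("""
    -- Hypothetical, simplified dimension tables.
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, product_name  TEXT);

    -- Fact table: foreign keys to the dimensions plus the three measures.
    CREATE TABLE f_sales (
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        product_id  INTEGER NOT NULL REFERENCES product(product_id),
        quantity    INTEGER NOT NULL,
        unit_price  REAL    NOT NULL,
        total_price REAL    NOT NULL
    );

    INSERT INTO customer VALUES (1, 'Asha');
    INSERT INTO product  VALUES (10, 'Headphones');
""")

# Two units at 100 dollars each give a total price of 200 dollars.
quantity, unit_price = 2, 100.0
conn.execute(
    "INSERT INTO f_sales VALUES (?, ?, ?, ?, ?)",
    (1, 10, quantity, unit_price, quantity * unit_price),
)

# A sale that points at a customer_id missing from the dimension table is rejected.
try:
    conn.execute("INSERT INTO f_sales VALUES (?, ?, ?, ?, ?)", (99, 10, 1, 100.0, 100.0))
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

total_price = conn.execute("SELECT total_price FROM f_sales").fetchone()[0]
print(total_price, orphan_rejected)  # 200.0 True
```

Storing total_price alongside quantity and unit_price follows the text; some designs compute it at query time instead.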
Instead of all the customer information, the fact table keeps just the customer ID of the customer who made the purchase, along with which employee helped, which product was purchased, which account was used, which region ID, and which merchant ID. If I want the total sales for a particular day, I also use a date key (date ID) that references the date dimension.
I then sum the total price for that day per merchant, join to the merchant table on merchant ID, and rank the merchants to find which merchant has the highest sales today. This is how the fact and dimension tables are related.
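The daily merchant ranking described above can be sketched as a single query. This is a minimal example with Python's sqlite3 module; the merchant names, dates, amounts, and the choice to store date_id as text are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical merchant dimension and a cut-down sales fact table.
    CREATE TABLE merchant (merchant_id INTEGER PRIMARY KEY, merchant_name TEXT);
    CREATE TABLE f_sales (
        date_id     TEXT    NOT NULL,
        merchant_id INTEGER NOT NULL REFERENCES merchant(merchant_id),
        total_price REAL    NOT NULL
    );

    INSERT INTO merchant VALUES (1, 'Merchant A'), (2, 'Merchant B');
    INSERT INTO f_sales VALUES
        ('2024-01-01', 1, 200.0),
        ('2024-01-01', 2, 150.0),
        ('2024-01-01', 2, 120.0),
        ('2024-01-02', 1, 900.0);
""")

# Sum total_price per merchant for one day, join to the dimension for the name,
# and order by the daily total so the best merchant comes first.
ranking = conn.execute("""
    SELECT m.merchant_name, SUM(f.total_price) AS day_total
    FROM f_sales AS f
    JOIN merchant AS m ON m.merchant_id = f.merchant_id
    WHERE f.date_id = ?
    GROUP BY m.merchant_id, m.merchant_name
    ORDER BY day_total DESC
""", ("2024-01-01",)).fetchall()

top_merchant = ranking[0][0]
print(ranking)       # [('Merchant B', 270.0), ('Merchant A', 200.0)]
print(top_merchant)  # Merchant B
```

The fact table supplies the keys and the measure, and the dimension table supplies the readable name, which is the relationship the paragraph above describes.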
Again, data modeling has three different types: the conceptual model, the logical model, and the physical model.
What is the conceptual model? You have a business concept, and you want to create a data model for that particular business.
Take an online food delivery business as the concept. For this I want to draw a data model. What tables does a food delivery business need?
I will have time (the date and time dimension), product, store (the different stores the products are purchased from), and so on. Whatever dimension tables this business needs, such as time, product, store, region, merchant, and account, you draw them all.
You draw the dimensions by name only; there is no need to include any detailed information about them. These are called entities. An entity is nothing but a table: the technical word is table, but in the conceptual data model we call it an entity. The conceptual model includes the important entities and the relationships among them.
Then what relationships are we going to create? In a data model diagram, each table is drawn as a box divided into two parts, an upper part and a lower part. The upper part holds the keys: for example, the product table has the product ID, the store table has the store ID, and the sales table has all the foreign keys.
In the conceptual diagram, however, we do not write any attributes (attributes are nothing but column names). Since it is a conceptual data model, we do not write any primary keys or foreign keys either. We just mention the entity names and how the entities are related to each other. This is called a conceptual model.
You can draw this with a simple paper and pen, or in a drawing program such as Paint, where you just drag and drop the shapes. The conceptual data model is highly abstract: it has no column names, so it is easy to understand.
Every conceptual data model is highly abstract in this way. It has no attribute names (that is, field or column names), it is easily understandable, and it is easy to enhance: you can add more entities to the model, because only the entities are visible.
You should not write anything else. It is the first step of data modeling: we draw only the entities and the relationships between them, so no software tool is required to define a conceptual data model.
Since it is only the concept, we do not need any software to draw it. So this is a conceptual data model: time, product, sales, store, region, account, merchant. Any number of tables can be connected, and a fact table may be connected to several different dimension tables.
Next we have the logical data model. The logical model is an enhancement of the conceptual data model.
The first step is to draw the conceptual data model. After that, we decide which primary key each entity will use, because every entity needs a primary key. The time dimension has a date key as its primary key, and product has a product ID. At this level we write a description of the column, not the exact column name.
We can simply write "product ID", with a space. If it were an actual column name, we would write it with an underscore. Since it is a description, we can write it this way: date description, month description, and so on. Whatever attributes you want the entity to have, you list them here.
The product entity can have product ID, product description, category, category description, unit price, created time, and updated time. The store entity can have store ID, store description, the region the store belongs to, the region name, and an updated date. Then comes the sales fact table, whose upper part, as I said, holds the keys.
All the foreign keys are there: from the store table the store ID, from the product table the product ID, and from the time table the date key. After those come items sold and sales amount.
As I said, those are the measures. Only the foreign keys and the measures are kept in the fact table. Items sold is the quantity, and sales amount is the price; these two are the measures.
So this is what we maintain in the logical data model. We do not write exact column names; we simply give the column descriptions. The logical model includes all the entities and the relationships among them, all the attributes of each entity are specified, and the primary key of each entity is specified.
The foreign keys are also identified and specified in the fact table, and normalization occurs at this level. What does normalization mean? For example, suppose the store table holds the region and the region name. I can split the store table into one more table, the region table. We will look at normalization in detail in the next session; for now, consider this example.
We create the region table, remove the region fields from the store table, and leave only the region ID column there. The region ID in the store table is connected to the region ID in the region table, which holds the region ID and the region name. So a single dimension table is split into another dimension table.
We do this to avoid data redundancy. Suppose I have different regions, and a single region has multiple stores, say 100 stores.
Without the split, that region's name is repeated 100 times, once for every store. To avoid this, the store table keeps only the region ID, and the region name is stored just once in the region table. This is how normalization avoids data redundancy.
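The store and region split can be sketched as follows with Python's sqlite3 module. The table name store_flat, the column names, and the three sample stores are assumptions for illustration; the idea is that the region name moves into its own table and the store table keeps only the region ID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Before normalization: the region name is repeated on every store row.
    CREATE TABLE store_flat (
        store_id    INTEGER PRIMARY KEY,
        store_name  TEXT,
        region_id   INTEGER,
        region_name TEXT
    );
    INSERT INTO store_flat VALUES
        (1, 'Store 1', 10, 'North'),
        (2, 'Store 2', 10, 'North'),
        (3, 'Store 3', 20, 'South');

    -- After normalization: region names live once in their own table.
    CREATE TABLE region (
        region_id   INTEGER PRIMARY KEY,
        region_name TEXT NOT NULL
    );
    CREATE TABLE store (
        store_id   INTEGER PRIMARY KEY,
        store_name TEXT,
        region_id  INTEGER NOT NULL REFERENCES region(region_id)
    );

    INSERT INTO region (region_id, region_name)
        SELECT DISTINCT region_id, region_name FROM store_flat;
    INSERT INTO store (store_id, store_name, region_id)
        SELECT store_id, store_name, region_id FROM store_flat;
""")

region_rows = conn.execute(
    "SELECT region_id, region_name FROM region ORDER BY region_id"
).fetchall()

# The region name is still available to queries through a join on region_id.
joined = conn.execute("""
    SELECT s.store_name, r.region_name
    FROM store AS s
    JOIN region AS r ON r.region_id = s.region_id
    ORDER BY s.store_id
""").fetchall()

print(region_rows)  # [(10, 'North'), (20, 'South')]
print(joined)       # [('Store 1', 'North'), ('Store 2', 'North'), ('Store 3', 'South')]
```

'North' is now stored once instead of once per store, and renaming a region means updating a single row.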
Normalization occurs at this level: wherever a table can be normalized, we split it into multiple tables. To summarize the logical data model: the attributes of each entity are present, divided into key attributes (the primary and foreign keys) and non-key attributes. The descriptions are non-key attributes, while the date key and the product ID are key attributes.
The primary key and foreign key relationships are present. The attribute names are user-friendly: you can use descriptions instead of the exact column names. And the model is more detailed than the conceptual model.
Data modeling tools such as erwin or PowerDesigner can be used to create logical data models. Once we create the logical data model in such a tool, the tool can automatically convert it into a physical data model, creating the tables with their names and columns. That result is the physical data model.
In the physical data model, each table has its column attributes as well as the data type and the data length of each attribute. Consider the time table: its table name is dim_time, where dim stands for dimension. So we have dim_time, dim_product, and dim_store as dimension tables, and fact_sales, because the sales table is a fact table. This is the actual naming format we use.
The dimension tables are connected to the fact table. Here we have the exact column names, written with underscores. The data type (for example, integer or varchar) and the data length are also provided. The physical model is generated automatically from the detailed logical model by a data modeling tool.
If you give the logical model to a tool such as erwin, it automatically converts it into this physical model. In fact_sales, the columns are integer, integer, integer, integer, and float: mostly ID columns, that is, the key columns that point to the dimensions.
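As a rough picture of what the generated physical model might look like, here is a sketch that creates the four tables with Python's sqlite3 module and reads back the column names and declared types. The table names come from the text; the column lists and lengths are assumptions, and a modeling tool targeting Oracle or Teradata would emit that database's own types rather than these.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: exact column names with underscores, types, and lengths.
    CREATE TABLE dim_time (
        date_key          INTEGER PRIMARY KEY,
        date_description  VARCHAR(30),
        month_description VARCHAR(30)
    );
    CREATE TABLE dim_product (
        product_id          INTEGER PRIMARY KEY,
        product_description VARCHAR(50),
        unit_price          FLOAT
    );
    CREATE TABLE dim_store (
        store_id          INTEGER PRIMARY KEY,
        store_description VARCHAR(50),
        region_name       VARCHAR(50)
    );

    -- Fact table: three foreign keys followed by the two measures.
    CREATE TABLE fact_sales (
        date_key     INTEGER REFERENCES dim_time(date_key),
        product_id   INTEGER REFERENCES dim_product(product_id),
        store_id     INTEGER REFERENCES dim_store(store_id),
        items_sold   INTEGER,
        sales_amount FLOAT
    );
""")

# PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk) per column.
fact_columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(fact_sales)")]
for name, declared_type in fact_columns:
    print(name, declared_type)
```

The fact table comes out as four integer columns and one float, matching the description above.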
The specification of all tables and columns is maintained in the physical data model. Foreign keys are used to identify the relationships between the tables, and denormalization may occur. Denormalization means that the two tables we split earlier may be merged back into a single table. We do this only if necessary, based on the requirement.
Physical considerations may cause the physical data model to be quite different from the logical data model. The physical data model also differs from database to database. In Teradata the types might be VARCHAR, INTEGER, and BIGINT, but in Oracle the integer might be NUMBER and the varchar might be VARCHAR2.
So when we create the logical data model in a tool such as erwin, we select the database we are going to apply it to. If you select Oracle, the tool converts the logical model into a physical data model for that particular database. In the physical data model, entities are referred to as tables and attributes as columns, since these are now physical tables.
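To illustrate the idea of choosing a target database, here is a small Python sketch that turns logical attribute descriptions into CREATE TABLE text for two targets. It is not how erwin or PowerDesigner work internally; the function to_physical_ddl, the TYPE_MAP dictionary, and the simplified type choices (for example NUMBER(10) for an integer in Oracle) are all assumptions made for this example.

```python
# Hypothetical, simplified mapping from logical types to database-specific types.
TYPE_MAP = {
    "teradata": {"integer": "INTEGER", "string": "VARCHAR({n})"},
    "oracle": {"integer": "NUMBER(10)", "string": "VARCHAR2({n})"},
}


def to_physical_ddl(table_name, attributes, target):
    """Build CREATE TABLE text from (description, logical_type, length) tuples."""
    type_map = TYPE_MAP[target]
    columns = []
    for description, logical_type, length in attributes:
        # "Product ID" in the logical model becomes product_id in the physical model.
        column_name = description.lower().replace(" ", "_")
        physical_type = type_map[logical_type].format(n=length)
        columns.append(f"    {column_name} {physical_type}")
    return f"CREATE TABLE {table_name} (\n" + ",\n".join(columns) + "\n)"


# Logical attributes use friendly descriptions, not exact column names.
logical_product = [
    ("Product ID", "integer", None),
    ("Product Description", "string", 50),
]

oracle_ddl = to_physical_ddl("dim_product", logical_product, "oracle")
teradata_ddl = to_physical_ddl("dim_product", logical_product, "teradata")

print(oracle_ddl)
print(teradata_ddl)
```

The same logical entity yields different column types depending on the selected database, which is why the target has to be chosen before the physical model is generated.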
We need to provide database-compatible table names and column names, and the database-specific data types are also specified. This makes the physical model harder for users to understand. Those without technical knowledge will find it somewhat difficult, though by looking at it they can still understand what the column names mean.
So overall they can follow it, but not at the detail level. We can also create indexes, constraints, triggers, and all the other DB objects with the physical data model, and the CREATE TABLE statements are generated automatically from it.
Finally, the tool automatically converts the logical data model to a physical data model for different databases and different database server versions. That is the concept of the three different data models.