Clear and Load the ODS Tables

Idea: Clear and load the staging area before DWH integration
Level: Basic
Project: Star Schema Integration Job
Reading Order: 4/7
Project Date: August 10, 2022

Contents
Job 1: ODSDimClear (Purpose, Requirements, Job overview, Components)
  Components: Connect ODS; Select NbRows Account; Set NbRows; Truncate table Account; Get metadata; Write metadata; Select NbRows Entity; Set NbRows; Truncate table Entity; Get metadata; Write metadata
Job 2: ODSFactClear (Purpose, Requirements, Job overview, Components)
  Components: Connect ODS; Select NbRows Ledger; Set NbRows; Truncate table Ledger; Get metadata; Write metadata
Job 3: ODSDimLoad (Purpose, Requirements, Job overview, Components)
  Components: Connect ODS; Get CSV file Account; Insert records Account; Get metadata; Write metadata; Get CSV file Entity; Insert records Entity; Get metadata; Write metadata
Job 4: ODSFactGetLatestFile (Purpose, Requirements, Job overview, Components)
  Components: Iterate all files; List all files; Select latest file; Buffer result
Job 5: ODSFactLoad (Purpose, Requirements, Job overview, Components)
  Components: Connect ODS; Get CSV file Ledger; Set file date; Insert records Ledger; Get metadata; Write metadata

Job 1: ODSDimClear

Purpose
This job clears the staging area that hosts the raw dimension data. It therefore focuses on the two tables containing dimension information: Account and Entity.

Requirements
We clear the staging area based on the defined requirement:
Full data loads on the staging area are sufficient, as historical staging information is irrelevant for the client.
As usual, we update the audit table based on the defined requirements:
Each data ETL load/reload batch should be traceable through a uniquely generated UID.
All DML changes should be tracked in a metadata table. This metadata table should contain the number of rows deleted, inserted, updated and rejected, as well as the batch UID.
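The audit requirement boils down to one UID per batch plus one metadata row per DML operation. The sketch below illustrates that shape with SQLite so it runs standalone; the table name `etl_metadata`, its columns and the helper functions are hypothetical, since the project's actual audit schema is not shown here.

```python
import sqlite3
import uuid

# Hypothetical audit table: one row per DML operation on a staging table.
# Table and column names are illustrative assumptions, not the project's schema.
METADATA_DDL = """
CREATE TABLE IF NOT EXISTS etl_metadata (
    batch_uid     TEXT NOT NULL,
    table_name    TEXT NOT NULL,
    rows_deleted  INTEGER NOT NULL DEFAULT 0,
    rows_inserted INTEGER NOT NULL DEFAULT 0,
    rows_updated  INTEGER NOT NULL DEFAULT 0,
    rows_rejected INTEGER NOT NULL DEFAULT 0
)
"""


def new_batch_uid() -> str:
    """Generate one unique identifier for a whole load/reload batch."""
    return str(uuid.uuid4())


def write_metadata(conn, batch_uid, table_name,
                   deleted=0, inserted=0, updated=0, rejected=0):
    """Append one audit row describing a DML operation in this batch."""
    conn.execute(
        "INSERT INTO etl_metadata (batch_uid, table_name, rows_deleted, "
        "rows_inserted, rows_updated, rows_rejected) VALUES (?, ?, ?, ?, ?, ?)",
        (batch_uid, table_name, deleted, inserted, updated, rejected),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(METADATA_DDL)
    uid = new_batch_uid()  # every job step in the batch reuses this UID
    write_metadata(conn, uid, "Account", deleted=120)
    write_metadata(conn, uid, "Entity", deleted=15)
    print(conn.execute("SELECT * FROM etl_metadata").fetchall())
```

Generating the UID once and passing it to every step is what makes all rows of a batch traceable together, whichever job wrote them.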
Job overview

Components

Connect ODS

Select NbRows Account
Track the number of rows that will be deleted from the staging area.

Set NbRows

Truncate table Account

Get metadata

Write metadata

Select NbRows Entity
Track the number of rows that will be deleted from the staging area.

Set NbRows

Truncate table Entity

Get metadata

Write metadata

Job 2: ODSFactClear

Purpose
As explained previously, there is no need to keep a record of all the data loaded into the staging area. This job therefore clears the staging area hosting the raw fact data.

Requirements
We clear the staging area based on the defined requirement:
Full data loads on the staging area are sufficient, as historical staging information is irrelevant for the client.
Note that facts and dimensions are cleared by two separate jobs, since they are two different processes with different timelines and different potential reload requirements. Recall from the requirements:
The client will ask for file reloads under two scenarios:
1. The rejections may reveal that the Entity/Account tables are incomplete and should be reloaded.
2. The client may realize that some records are missing from the Ledger information and should be loaded.
As usual, we update the audit table based on the defined requirements:
Each data ETL load/reload batch should be traceable through a uniquely generated UID.
All DML changes should be tracked in a metadata table. This metadata table should contain the number of rows deleted, inserted, updated and rejected, as well as the batch UID.
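The Select NbRows, Set NbRows, Truncate table and Write metadata sequence of ODSDimClear, which ODSFactClear repeats for Ledger, can be sketched as one function. This is a minimal illustration under stated assumptions: it uses SQLite, which has no TRUNCATE statement, so `DELETE FROM` stands in for it, and the audit table `etl_metadata` and the function `clear_table` are hypothetical names.

```python
import sqlite3
import uuid


def clear_table(conn, table_name, batch_uid):
    """Count the rows about to be removed, empty the table, and log the count.

    table_name must come from a fixed list of staging tables (never user
    input), because it is interpolated directly into the SQL text.
    SQLite has no TRUNCATE, so DELETE FROM is used as a stand-in.
    """
    # Select NbRows / Set NbRows: capture the row count before deleting.
    nb_rows = conn.execute(f"SELECT COUNT(*) FROM {table_name}").fetchone()[0]
    # Truncate table: full reloads mean no staging history has to survive.
    conn.execute(f"DELETE FROM {table_name}")
    # Write metadata: record the deleted-row count against the batch UID.
    conn.execute(
        "INSERT INTO etl_metadata (batch_uid, table_name, rows_deleted) "
        "VALUES (?, ?, ?)",
        (batch_uid, table_name, nb_rows),
    )
    conn.commit()
    return nb_rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical audit table, reduced to the columns this step writes.
    conn.execute("CREATE TABLE etl_metadata "
                 "(batch_uid TEXT, table_name TEXT, rows_deleted INTEGER)")
    conn.execute("CREATE TABLE Account (code TEXT)")
    conn.execute("CREATE TABLE Entity (code TEXT)")
    conn.executemany("INSERT INTO Account VALUES (?)", [("A1",), ("A2",)])
    batch_uid = str(uuid.uuid4())
    for table in ("Account", "Entity"):
        print(table, clear_table(conn, table, batch_uid))
```

Counting before deleting matters because, once the table is emptied, the number of removed rows can no longer be queried; on a database with a real TRUNCATE, the count may also not be reported back by the statement itself.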
Job overview

Components

Connect ODS

Select NbRows Ledger

Set NbRows

Truncate table Ledger

Get metadata

Write metadata

Job 3: ODSDimLoad

Purpose
Once the staging data is cleared, this job loads the staging tables holding the dimension information from the up-to-date .csv files made available. It therefore focuses on the two tables containing dimension information: Account and Entity.

Requirements
As usual, we update the audit table based on the defined requirements:
Each data ETL load/reload batch should be traceable through a uniquely generated UID.
All DML changes should be tracked in a metadata table. This metadata table should contain the number of rows deleted, inserted, updated and rejected, as well as the batch UID.

Job overview

Components

Connect ODS

Get CSV file Account

Insert records Account

Get metadata

Write metadata

Get CSV file Entity

Insert records Entity

Get metadata

Write metadata

Job 4: ODSFactGetLatestFile

Purpose
This job outputs the name of the latest Ledger file to be integrated.

Requirements
Recall from the requirements:
If the Ledger extracts of multiple months are made available at once, only the latest extract should be loaded (highest date, derived from the file naming convention “GENERAL_LEDGER_YYYYMM”).
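Because the date is embedded in the file name as a fixed-width YYYYMM string, sorting names in descending order of that suffix and keeping the first one is enough to apply the rule above. The sketch below assumes names of the exact form `GENERAL_LEDGER_YYYYMM.csv`; the pattern and the function names are illustrative, not taken from the job.

```python
import re
from pathlib import Path

# Assumed naming convention: GENERAL_LEDGER_YYYYMM.csv (e.g. GENERAL_LEDGER_202207.csv).
LEDGER_PATTERN = re.compile(r"^GENERAL_LEDGER_(\d{6})\.csv$")


def latest_ledger_file(file_names):
    """Return the name with the highest YYYYMM period, or None if none match."""
    candidates = []
    for name in file_names:
        match = LEDGER_PATTERN.match(name)
        if match:  # ignore files that do not follow the naming convention
            candidates.append((match.group(1), name))
    if not candidates:
        return None
    # Descending sort on the YYYYMM string: for fixed-width YYYYMM values,
    # lexicographic order is chronological order. Keep the first entry.
    candidates.sort(reverse=True)
    return candidates[0][1]


def latest_ledger_in_dir(directory):
    """List the regular files of a directory and apply the same selection."""
    return latest_ledger_file(
        p.name for p in Path(directory).iterdir() if p.is_file()
    )


if __name__ == "__main__":
    names = ["GENERAL_LEDGER_202206.csv", "GENERAL_LEDGER_202207.csv", "notes.txt"]
    print(latest_ledger_file(names))
```

Returning `None` when nothing matches leaves the decision of whether an empty drop folder is an error to the calling job.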
Job overview

Components

Iterate all files
Iterate over all files containing “GENERAL_LEDGER” … “.csv”, in descending order.

List all files

Select latest file

Buffer result
Save the result in a buffer to pass the information to the parent job in a row context (see Project Master Execution).

Job 5: ODSFactLoad

Purpose
Once the staging data is cleared, this job loads the staging table holding the fact information from the latest up-to-date .csv file identified previously.

Requirements
As usual, we update the audit table based on the defined requirements:
Each data ETL load/reload batch should be traceable through a uniquely generated UID.
All DML changes should be tracked in a metadata table. This metadata table should contain the number of rows deleted, inserted, updated and rejected, as well as the batch UID.

Job overview

Components

Connect ODS

Get CSV file Ledger
From the context variable CSVFileNameLedger made available earlier, get the .csv ledger file to be integrated.

Set file date
From the same context variable CSVFileNameLedger, extract the YYYYMM date.

Insert records Ledger

Get metadata

Write metadata
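The Set file date and Insert records Ledger steps can be sketched together: the YYYYMM period is parsed from the file name, stamped on every inserted row, and the inserted-row count is returned for the metadata step. This is a hedged illustration only; the staging columns (`account`, `entity`, `amount`, `file_date`), the CSV header and the function names are hypothetical, since the Ledger schema is not shown here.

```python
import csv
import io
import re
import sqlite3

# Hypothetical staging schema: the real Ledger columns are not documented here.
LEDGER_DDL = ("CREATE TABLE Ledger (account TEXT, entity TEXT, "
              "amount REAL, file_date TEXT)")

PERIOD_PATTERN = re.compile(r"GENERAL_LEDGER_(\d{6})")


def file_date_from_name(file_name):
    """Set file date: extract YYYYMM from a name like GENERAL_LEDGER_202207.csv."""
    match = PERIOD_PATTERN.search(file_name)
    if match is None:
        raise ValueError(f"No YYYYMM period in file name: {file_name}")
    return match.group(1)


def load_ledger(conn, file_name, stream):
    """Insert records Ledger: load every CSV row, stamped with the file date.

    `stream` is any text stream with a header row (an open file or StringIO).
    Returns the number of inserted rows, to be written to the metadata table.
    """
    file_date = file_date_from_name(file_name)
    reader = csv.DictReader(stream)
    rows = [(r["account"], r["entity"], float(r["amount"]), file_date)
            for r in reader]
    conn.executemany("INSERT INTO Ledger VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(LEDGER_DDL)
    sample = io.StringIO("account,entity,amount\n6000,E01,125.5\n7000,E02,-40\n")
    print(load_ledger(conn, "GENERAL_LEDGER_202207.csv", sample))
```

Against a real file, the stream would come from opening the path held in CSVFileNameLedger with `open(path, newline="")`, passing `os.path.basename(path)` as `file_name`.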