Fact Table Update

Idea: manage the fact table surrogate keys (SK) and track metadata using the SCD component
Job Level: Intermediate
Project: Star Schema with SCD
Reading Order: 3/3

Requirements
  Summary
Job overview
Components
  Connect to DB
  Create fact table
  Iterate all files
  List all files
  Select latest file
  Set var filename
  Delete facts
  Get metadata
  Write metadata
  Load latest file
  Get Dim
  Lookup dimensions
  Write facts
  Write rejections
  Die IF
Discussion
  Scalability

Requirements

Summary
- Load the latest file located in a folder (based on its file name). The files to be considered are the .csv files containing "GENERAL_LEDGER" in their name.
- Delete the rows in the fact table [FactLedger] if this file was previously loaded.
- Replace the natural keys of the fact data with surrogate keys from the dimension tables [DimEntity] and [DimAccount].
- Return a file with all rows of the fact data that did not match the dimension tables. Specify the dimension tables with which the facts did not match.
- Track data updates using a generated UID.
- Register the changes in a log table [DimLog].
- Return an error if there is any mismatch between [FactLedger] and ([DimEntity] or [DimAccount]).
- Roll back if any error occurs.

Job overview
1. Connect to the database and create the fact table.
2. Get the latest file and delete the facts associated with this file if they were previously loaded.
3. Record the changes made.
4. Replace the natural keys of the fact data with surrogate keys from the dimension tables [DimEntity] and [DimAccount]. Keep the rejections in memory and write the facts.
5. Record the changes made.
6. Merge all rejections and write the details to a file, then return an error to kill the job.
If no rejection occurred, do not return any error.

Components

Connect to DB
Connect to the shared connection on the local host database.

Create fact table
Create a fact table based on a defined schema.

Iterate all files
Iterate through all .csv files containing "GENERAL_LEDGER" in descending order.

List all files
Transform the iterate relationship to a row data flow.

Select latest file
Retrieve the top row from the list of files in descending order.

Set var filename
Create a global variable from the previously selected row.

Delete facts
Delete rows containing the selected file name.

Get metadata
Retrieve the information from the DB component.

Write metadata
Write the metadata input to the [DimLog] table.

Load latest file
Load the latest file based on the global variable previously set up.

Get Dim
Get the dimension tables.

[Figure: example with "Delete facts"]

Lookup dimensions
Replace the natural keys in the facts with the surrogate keys from the dimension tables. Further, create one output for the rejections of each dimension table.
Note: we select only the active dimensions in the dimension tables, e.g. DimEntity.scd_active == 1
Note: we use a Unique match, which returns the last value in the dimension table.

[Figure: example with DimEntity]

Note: we select only the rejections in each output, e.g. DimEntity.JOURNAL == null, and Catch lookup inner join reject = true

Write facts
Write the facts to the fact table.

Write rejections
Write the rejections to a .csv file.
Overwrite the file if it already exists.

Die IF
Kill the job if there is at least one rejection:
((Integer)globalMap.get("tUnite_1_NB_LINE")) > 0

Discussion

Scalability
The job has been designed to be scalable and to make new dimension tables easy to integrate:
- New dimension tables can simply be added to the tMap "Lookup dimensions" to add their surrogate keys.
- The rejections always have the same table structure regardless of the dimension table, so they can be consolidated under the same tUnite.
- The tDie component is triggered dynamically from tUnite_1_NB_LINE; there is no hard-coded rejection rule based on the dimension tables.
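
The three scalability points can be illustrated outside Talend with a minimal plain-Java sketch. This is not the code generated by the job: the class LookupDimensionsSketch, its nested types and its method names are hypothetical, and the dimension tables are assumed to be already filtered to active rows and reduced to a natural key to surrogate key map. It also assumes that a fact row is written only when every dimension matched, as with an inner join on each lookup.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from the Talend job itself.
final class LookupDimensionsSketch {

    // One rejection row. The structure is the same for every dimension table,
    // which is what allows all rejections to be merged into a single list.
    static final class Rejection {
        final String dimension;   // e.g. "DimEntity"
        final String naturalKey;  // natural key with no match in that dimension

        Rejection(String dimension, String naturalKey) {
            this.dimension = dimension;
            this.naturalKey = naturalKey;
        }
    }

    // Lookup output: facts carrying surrogate keys, plus the merged rejections.
    static final class Result {
        final List<Map<String, Integer>> facts = new ArrayList<>();
        final List<Rejection> rejections = new ArrayList<>();
    }

    /**
     * rows: each fact row maps a dimension name to its natural key.
     * dims: dimension name -> (natural key -> surrogate key), active rows only (assumed).
     * Adding a dimension only means adding one more entry to dims.
     */
    static Result lookup(List<Map<String, String>> rows,
                         Map<String, Map<String, Integer>> dims) {
        Result result = new Result();
        for (Map<String, String> row : rows) {
            Map<String, Integer> fact = new HashMap<>();
            boolean matched = true;
            for (Map.Entry<String, Map<String, Integer>> dim : dims.entrySet()) {
                String naturalKey = row.get(dim.getKey());
                Integer surrogateKey = dim.getValue().get(naturalKey);
                if (surrogateKey == null) {
                    // No match: record which dimension rejected which key.
                    result.rejections.add(new Rejection(dim.getKey(), naturalKey));
                    matched = false;
                } else {
                    fact.put(dim.getKey(), surrogateKey);
                }
            }
            if (matched) {
                result.facts.add(fact);
            }
        }
        return result;
    }

    // Same idea as the Die IF condition: fail as soon as the merged
    // rejection count is greater than zero, whatever the dimension.
    static void dieIfRejected(Result result) {
        if (result.rejections.size() > 0) {
            throw new IllegalStateException(
                    result.rejections.size() + " rejected key(s)");
        }
    }
}
```

Calling lookup(...) and then dieIfRejected(...) mirrors the order of the job: the facts and rejections are produced first, and the failure is raised afterwards from the rejection count alone, so no rule has to change when a dimension is added.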