Solved Class Imbalance in Glass Dataset using IRUS

Presented By:
Muhammad Ahmed - K180231
Muhammad Taha Haider - K180341
Muhammad Usama - K180200

Research Goal
The goal is to solve the class imbalance problem on a data set with an extremely high class imbalance ratio by using a distinctive subsampling technique, inverse random undersampling (IRUS), and then to compare it with other methods.

Data Retrieval
● The data set we retrieved is the Glass Identification data set. It is a multi-class data set that provides an analysis of 7 different types of glass.
● The data set was retrieved from GitHub. It did not include attribute names, so these were fetched separately from the official UCI website.

Data Exploration
After the data was extracted, it was examined thoroughly. There are 10 attributes in our data set and 7 distinct types of glass. The sample size and share of each class show the imbalance:
• 1st class, building window processed: 70 samples (32.71%)
• 2nd class, building window non processed: 76 samples (35.51%)
• 3rd class, vehicle window processed: 17 samples (7.94%)
• 4th class, vehicle window non processed: 0 samples (0%)
• 5th class, containers: 13 samples (6.07%)
• 6th class, tableware: 9 samples (4.21%)
• 7th class, headlamps: 29 samples (13.55%)

Data Exploration 01
The features show a good relationship with each other and with the target variable as well.
Distribution of features

Data Preparation
A code snippet was used to check the columns for null values; it shows that there are no null values in the data table.
On further inspection, we observed that the data is not on the same scale and is also skewed, so we first normalized it and then brought it to the same scale using Standard Scaler.
After standardizing the data to a similar scale, we noticed that the distribution was still very skewed because of outliers, so we removed them.

Data Exploration 02
The distribution of the data is very skewed because of outliers.

Data Exploration 03
Distribution of data after normalizing, scaling and removing outliers.

Data Modeling
We used a One v/s All approach (the same approach is used in the IRUS paper): we iterate over all minority classes and treat each of them as a binary problem.
● Higher accuracy
● Applicable to both classification and regression tasks
● Can handle large data sets
This chart compares the individual and average ROC AUC scores obtained from the IRUS method on the Glass dataset for each fold.

Data Presentation & Results 01
This chart compares the individual and average F1 scores obtained from the IRUS method on the Glass dataset for each fold.

Data Presentation & Results 02
Comparison w/other Methods

Automation and presentation
Along with this report, we have attached:
● a PowerPoint presentation with the main points of our project;
● a notebook with interactive graphs containing useful insights about the data, in which the results are also presented interactively;
● a separate dashboard in HTML format with all the important details about the data, such as missing values, distributions and correlations.
Deployment was out of scope for this project, so we did not cover it. As future work, we will take this project to the end user by deploying it on Flask with a separate web page.

Thank you for your time!
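
Appendix: sketch of the One v/s All and IRUS sampling step
The Data Modeling slide describes treating each minority class as a binary problem and applying IRUS. The block below is a minimal sketch of that idea, not the code used in the project. It assumes that IRUS means drawing many small majority-class subsets so that, within each subset, the majority class has fewer samples than the minority class. The helper names (one_vs_all, irus_subsets), the maj_size and n_subsets parameters and their defaults are all hypothetical. Only the class sizes come from the slides; features are left out.

```python
import math
import random

# Class sizes as reported on the Data Exploration slide (class 4 is empty).
CLASS_COUNTS = {1: 70, 2: 76, 3: 17, 5: 13, 6: 9, 7: 29}


def one_vs_all(labels, positive_class):
    """Relabel a multi-class target as 1 (chosen class) vs 0 (all others)."""
    return [1 if y == positive_class else 0 for y in labels]


def irus_subsets(binary_labels, maj_size=None, n_subsets=None, seed=0):
    """Build index subsets in which the majority class is the smaller one.

    maj_size and n_subsets are hypothetical knobs; the defaults below are
    assumptions for illustration, not values taken from the project.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(binary_labels) if y == 1]
    majority = [i for i, y in enumerate(binary_labels) if y == 0]
    if not minority or not majority:
        raise ValueError("both classes must be present")
    if maj_size is None:
        # Assumed default: half as many majority as minority samples.
        maj_size = max(1, len(minority) // 2)
    maj_size = min(maj_size, len(majority))
    if n_subsets is None:
        # Assumed default: draw about as many majority samples in total
        # as the majority class contains.
        n_subsets = math.ceil(len(majority) / maj_size)
    subsets = []
    for _ in range(n_subsets):
        drawn = rng.sample(majority, maj_size)  # sampled without replacement
        subsets.append(minority + drawn)        # every minority sample is kept
    return subsets


# Rebuild a label vector with the reported class sizes (features omitted).
labels = [c for c, n in CLASS_COUNTS.items() for _ in range(n)]
binary = one_vs_all(labels, positive_class=6)  # tableware vs the rest
subsets = irus_subsets(binary)
print(len(labels), len(subsets), len(subsets[0]))
```

Under these assumptions, each index subset would be used to train one base classifier, and the scores of all base classifiers would be combined (for example by averaging) to give the prediction for that class. The same loop would then be repeated for every minority class.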