MATRUSRI ENGINEERING COLLEGE
(An Autonomous Institution)
(Sponsored by: Matrusri Education Society, Estd: 1980)
(Approved by AICTE and Affiliated to Osmania University)
Saidabad, Hyderabad
# 16-1-486, Saidabad, Hyderabad - 500059. Ph: 040-24072764

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING (DATA SCIENCE)
(Accredited by NBA and NAAC)

LAB MANUAL
Fundamentals of Data Analytics and Data Visualization Lab (PC551DSU23)
III B.E SEMESTER - I (2025-2026)

DEPARTMENT VISION
The Computer Science and Engineering Department aims to produce competent professionals with strong analytical skills, technical skills, research aptitude and ethical values.

MISSION
M1: To provide hands-on experience and problem-solving skills by imparting quality education.
M2: To conduct skill-development programs in emerging technologies to serve the needs of industry, society and the scientific community.
M3: To promote comprehensive education and professional development for effective teaching-learning processes.
M4: To impart project management skills with an attitude for life-long learning with ethical values.

PROGRAMME EDUCATIONAL OBJECTIVES (PEOs)
1. To learn engineering knowledge and problem analysis skills to design and develop solutions for computer science and engineering problems.
2. To address feature engineering with the usage of modern IT and software tools.
3. To acquire and practice the profession with due consideration to environmental issues in conformance with societal needs and ethical values.
4. To manage projects in multidisciplinary environments as a member and as a leader with effective communications.
5. To engage in life-long learning in the context of ever-changing technology.

PROGRAMME OUTCOMES (POs)
Upon the completion of the programme, the student will be able to:
1. Engineering knowledge: Apply and integrate the knowledge of computing to computer science and engineering problems.
2. Problem analysis: Identify, formulate and analyze complex engineering problems using computer science and engineering knowledge.
3. Design/Development of solutions: Design and develop components or processes for engineering problems as per specification with environmental consideration.
4. Conduct investigations of complex problems: Interpret and integrate information to provide solutions to real-world problems.
5. Modern tool usage: Select and apply modern engineering and information technology tools for complex engineering problems.
6. The engineer and society: Assess and be responsible for societal, health, safety, legal and cultural issues in professional practice.
7. Environment and sustainability: Understand the impact of computing solutions in the context of societal, environmental and economic development.
8. Ethics: Commit to professional ethics, responsibilities and norms of engineering practice.
9. Individual and team work: Function as an individual, and as a member or leader in a multidisciplinary environment.
10. Communication: Acquire effective written and oral communication skills on technical and general aspects.
11. Project management and finance: Apply engineering and management principles to manage projects in multidisciplinary environments.
12. Life-long learning: Identify the need for self-learning and life-long learning in the broad context of technological evolution.

PROGRAM SPECIFIC OUTCOMES (PSOs)
Upon the completion of the programme, the student will be able to:
1. Be familiar with open-ended programming environments to develop software applications.
2. Apply the knowledge of Computer System Design, Principles of Algorithms and Computer Communications to manage projects in multidisciplinary environments.

COURSE OBJECTIVES AND COURSE OUTCOMES

Course Objectives:
1. Learn how to create, inspect, and manipulate data frames and structured datasets in R.
2. Develop data pre-processing skills.
3. Connect to an SQLite database, retrieve data, and export processed results.
4. Apply descriptive and inferential statistical techniques.
5. Enhance data visualization and interpretation, and explore advanced data visualization techniques.

Course Outcomes:
Upon the completion of the lab course, the student will be able to:
1. Create, manipulate, and pre-process structured datasets in R.
2. Import, export, and manage diverse data formats (CSV, Excel, JSON, SQLite, text) for efficient data analysis workflows.
3. Apply descriptive statistical methods and compute summary statistics.
4. Analyze relationships between variables using correlation, covariance, and frequency tables.
5. Create and interpret various data visualizations, including histograms, scatter plots, box plots, and 3D plots.

SYLLABUS

List of Programs

1. Creating and Inspecting Data Frames
   A. Create a data frame students with the following columns:
      ID: Numeric student IDs
      Name: Character student names
      Age: Numeric age
      Grade: Character grades (e.g., A, B, C, etc.)
   B. View the structure of the data frame using str() and summary().
   C. Display the first few rows using head() and the last few rows using tail().
   D. Subset the students data frame to display only the students who are older than 21.
   E. Extract the names of students who have an "A" grade.
   F. Add a new column Passed to the students data frame, indicating whether the student has passed (Grade "A" or "B").
   G. Create another data frame marks with the columns:
      ID: Student IDs (same as in students)
      Subject: Names of subjects
      Marks: Marks scored by the students
   H. Merge the students and marks data frames on the ID column.
   I. Display the merged data frame.
2. Using pre-processing data set.
   A. Write an R script to detect, analyze and handle missing values.
   B. Identify and remove outliers from the data set using statistical techniques.
   C. Apply Normalization and Standardization techniques to transform data.
3. Working with nominal, ordinal, relative, and absolute data types.
   A. Convert categorical (nominal) data into relative frequencies.
   B. Convert ordinal data into absolute or relative frequencies.
   C. Convert relative or absolute frequencies back to ordinal or nominal form.
4. Import and Export data
   A. Write an R script to read data from a CSV file and write data to a new CSV file.
   B. Write an R script to read data from an Excel file and write data to a new Excel file.
   C. Write an R script to read data from a JSON file and write data to a new JSON file.
   D. Write an R script to connect to an SQLite database, read data from a table, and export it.
   E. Write an R script to read a text file and write data to a new text file.
5. Descriptive Statistics in R
   A. Write an R script to compute mean, median, mode, variance, standard deviation, range, and summary statistics.
   B. Calculate covariance and correlation between two variables of a dataset.
   C. Create a frequency table for categorical data.
6. Visualize single-variable distributions using histograms, box plots, and density plots.
7. Visualize relationships between two variables using scatter plots, line graphs, and bar charts using R.
8. Visualize three-dimensional (3D) data using R for better understanding of relationships between three variables.
9. Create Interactive 3D Plot using plotly.
10. Visualize data across geographical regions.
11. Display multiple visualization types together.
12. Represent hierarchical relationships.

List of Programs

S.No  Lab Programs
1     Creating and Inspecting Data Frames
      A. Create a data frame students with the following columns:
         ID: Numeric student IDs
         Name: Character student names
         Age: Numeric age
         Grade: Character grades (e.g., A, B, C, etc.)
      B. View the structure of the data frame using str() and summary().
      C. Display the first few rows using head() and the last few rows using tail().
      D. Subset the students data frame to display only the students who are older than 21.
      E. Extract the names of students who have an "A" grade.
      F. Add a new column Passed to the students data frame, indicating whether the student has passed (Grade "A" or "B").
      G. Create another data frame marks with the columns:
         ID: Student IDs (same as in students)
         Subject: Names of subjects
         Marks: Marks scored by the students
      H. Merge the students and marks data frames on the ID column.
      I. Display the merged data frame.
2     Using pre-processing data set.
      A. Write an R script to detect, analyze and handle missing values.
      B. Identify and remove outliers from the data set using statistical techniques.
      C. Apply Normalization and Standardization techniques to transform data.
3     Working with nominal, ordinal, relative, and absolute data types.
      A. Convert categorical (nominal) data into relative frequencies.
      B. Convert ordinal data into absolute or relative frequencies.
      C. Convert relative or absolute frequencies back to ordinal or nominal form.
4     Import and Export data
      A. Write an R script to read data from a CSV file and write data to a new CSV file.
      B. Write an R script to read data from an Excel file and write data to a new Excel file.
      C. Write an R script to read data from a JSON file and write data to a new JSON file.
      D. Write an R script to connect to an SQLite database, read data from a table, and export it.
      E. Write an R script to read a text file and write data to a new text file.
5     Descriptive Statistics in R
      A. Write an R script to compute mean, median, mode, variance, standard deviation, range, and summary statistics.
      B. Calculate covariance and correlation between two variables of a dataset.
      C. Create a frequency table for categorical data.
6     Visualize single-variable distributions using histograms, box plots, and density plots.
7     Visualize relationships between two variables using scatter plots, line graphs, and bar charts using R.
8     Visualize three-dimensional (3D) data using R for better understanding of relationships between three variables.
9     Create Interactive 3D Plot using plotly.
10    Visualize data across geographical regions.
11    Display multiple visualization types together.
12    Represent hierarchical relationships.

1 Introduction to R and RStudio

R is a widely used programming language for statistical analysis, visualization, and data science. RStudio is an Integrated Development Environment (IDE) for R, which provides an easy-to-use interface for writing, executing, and managing R code.

1.1 Install R and RStudio

To begin, you need to install both R and RStudio. Follow the steps below:

• Step 1: Install R
  – Go to the official R website: https://cran.r-project.org/
  – Choose your operating system (Windows, macOS, or Linux).
  – Download the appropriate R installer.
  – Run the installer and follow the instructions.
• Step 2: Install RStudio
  – Visit the RStudio website: https://rstudio.com/
  – Download the RStudio installer for your operating system.
  – Run the installer and follow the prompts to complete the installation.

Sample program: Write and execute your first R script that includes basic arithmetic operations, variable assignments, and printing results.

Once R and RStudio are installed, you can write your first R script.
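Before writing that script, it can help to confirm that the installation works by running a few lines in the RStudio console. The following is a minimal sketch, not part of the prescribed programs. The package names in lab_packages (readxl, jsonlite, RSQLite, plotly) are an assumption about what the later import/export and plotting exercises might use; this manual does not specify them, so adjust the list as needed.

```r
# Print the version of R that is currently running.
print(R.version.string)

# Hypothetical list of add-on packages; this manual does not prescribe them.
lab_packages <- c("readxl", "jsonlite", "RSQLite", "plotly")

# requireNamespace() returns TRUE if a package is installed,
# without attaching it to the search path.
pkg_status <- vapply(
  lab_packages,
  function(pkg) requireNamespace(pkg, quietly = TRUE),
  logical(1)
)
print(pkg_status)

# Names of the packages that are not yet installed, if any.
missing_pkgs <- names(pkg_status)[!pkg_status]
print(missing_pkgs)

# To install them, run: install.packages(missing_pkgs)
```

If R.version.string prints without an error, R itself is working; any names listed in missing_pkgs can be installed later, when the corresponding exercise is reached.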
Below is a simple R script that includes basic arithmetic operations, variable assignments, and printing results:

    # Basic arithmetic operations
    total <- 10 + 5        # named total to avoid reusing the name of the built-in sum()
    difference <- 10 - 5
    product <- 10 * 5
    quotient <- 10 / 5

    # Variable assignments
    x <- 25
    y <- 5

    # Performing operations on variables
    z <- x + y
    result <- x / y

    # Printing results
    print(total)
    print(difference)
    print(product)
    print(quotient)
    print(z)
    print(result)

To create and inspect data frames in R, use the built-in data.frame() function for creation, and a selection of core functions for inspection.

Creating a Data Frame

A data frame is constructed from vectors (each vector becomes a column, and all vectors must have the same length). This results in a table-like structure where each column can have a different data type.

Inspecting a Data Frame

Once created, several functions help you inspect and explore its structure and contents:

• str(data_frame): Shows the structure, column types, and a preview of values.
• summary(data_frame): Provides summary statistics for numeric columns and frequencies for factor columns (character columns report only length, class, and mode).
• head(data_frame, n): Displays the first n rows.
• dim(data_frame): Returns the number of rows and columns as a vector.
• names(data_frame) or colnames(data_frame): Lists column names.
• nrow(data_frame), ncol(data_frame): Returns the number of rows or columns, respectively.
• data_frame$column_name: Accesses a single column.
• data_frame[row, col]: Accesses data by row and column.

1. Creating and Inspecting Data Frames

A. Create a data frame students with the following columns:
   ID: Numeric student IDs
   Name: Character student names
   Age: Numeric age
   Grade: Character grades (e.g., A, B, C, etc.)

Here is the step-by-step algorithm in pseudocode:
1. Start
2. Create a vector of student IDs (numeric)
3. Create a vector of student names (character)
4. Create a vector of student ages (numeric)
5. Create a vector of student grades (character)
6. Combine the vectors into a data frame using data.frame()
7. Assign the resulting data frame to the variable students
8. Print the data frame (print(students))
9. Display the structure of the data frame (str(students))
10. End

B. View the Structure of a Data Frame

Input: A data frame (e.g., students)
Steps:
1. Start
2. Use the str() function: call str(students). This displays the structure: the type of each column and example values.
3. Use the summary() function: call summary(students). This provides a summary: minimum, maximum, mean, and quartiles for numeric columns, and value frequencies for factor columns (character columns report only length, class, and mode).
4. End

C. Display First and Last Few Rows of a Data Frame

Input: A data frame (e.g., students)
Steps:
1. Start
2. Display the first few rows using the head() function. Syntax: head(data_frame). This shows the first 6 rows by default.
3. Display the last few rows using the tail() function. Syntax: tail(data_frame). This shows the last 6 rows by default.
4. End

D. Algorithm: Subset Data Frame by Age Condition

To subset a data frame by an age condition in R (such as selecting students above a certain age), use either the subset() function or logical indexing. The steps below assume the data frame is named students and has an Age column.
1. Start
2. Identify the relevant column: The column used for filtering is Age.
3. Formulate the condition: Check if Age > 21.
4. Apply the condition to the data frame: Use either base R subsetting with brackets [ ], for example students[students$Age > 21, ], or the subset() function, for example subset(students, Age > 21).
5. Store or display the result: The filtered data frame contains only rows where students are older than 21.
6. End

E. Extract Names of Students with Grade "A"
1. Start
2. Identify the relevant columns: the Grade column to check for "A", and the Name column to extract.
3. Apply a condition to filter rows where Grade == "A".
4. Subset the Name column from the filtered rows, for example students$Name[students$Grade == "A"].
5. Store or display the resulting vector of names.
6. End

F. Add a new column Passed to the students data frame, indicating whether the student has passed (Grade "A" or "B").
1. Start
2. Identify the criterion for passing: if a student's Grade is "A" or "B", then Passed is TRUE; otherwise, Passed is FALSE.
3. Create a logical vector called Passed using the condition on the Grade column, for example students$Grade %in% c("A", "B").
4. Add this Passed vector as a new column in the students data frame.
5. Inspect the updated data frame to confirm the new column is added correctly.
6. End

G. Create another data frame marks with the columns:
   ID: Student IDs (same as in students)
   Subject: Names of subjects
   Marks: Marks scored by the students (numeric)

To create a new data frame named marks with the specified columns in R, you can use the data.frame() function.
1. Start
2. Obtain ID values: Use the student IDs from the existing students data frame for consistency.
3. Define vectors for Subject and Marks: Create a vector of subject names (character). Create a vector of marks (numeric), ensuring the length matches the ID vector.
4. Create the data frame: Combine the ID, Subject, and Marks vectors into a new data frame using data.frame().
5. Name the columns explicitly: Use ID, Subject, and Marks as the column names to match requirements.
6. Inspect the marks data frame: Use print() or other functions like str() to view the new data frame.
7. End

H. Merge the students and marks data frames on the ID column

Algorithm to merge students and marks by ID:
1. Start
2. Identify the common key column: Both data frames have the ID column with the same type.
3. Call the merge() function with the following parameters:
   x: first data frame (students)
   y: second data frame (marks)
   by: "ID" (the column to join on)
   Optional: specify all = FALSE for an inner join (default) or all = TRUE for a full outer join.
4. Store the result in a new data frame, e.g., merged_data <- merge(students, marks, by = "ID").
5. Inspect the merged data frame (e.g., with print() or str()).
6. End

2. Using pre-processing data set.
A. Write an R script to detect, analyze and handle missing values.
B. Identify and remove outliers from the data set using statistical techniques.
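As a starting point for tasks A and B above, the following is a minimal sketch under stated assumptions, not a prescribed solution. The scores data frame and its values are hypothetical, replacing missing values with the median is only one of several reasonable strategies, and the 1.5 * IQR rule is one common statistical technique for flagging outliers.

```r
# Hypothetical sample data with two missing values and one extreme value.
scores <- data.frame(
  ID    = 1:8,
  Marks = c(72, 85, NA, 90, 68, 300, NA, 77)
)

# A. Detect and analyze missing values.
print(is.na(scores$Marks))      # logical flag for each row
print(colSums(is.na(scores)))   # number of missing values per column

# Handle missing values: here, replace them with the median of the observed marks.
marks_median <- median(scores$Marks, na.rm = TRUE)
scores$Marks[is.na(scores$Marks)] <- marks_median

# B. Identify outliers with the 1.5 * IQR rule.
q1  <- quantile(scores$Marks, 0.25)
q3  <- quantile(scores$Marks, 0.75)
iqr <- q3 - q1
lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr
is_outlier <- scores$Marks < lower | scores$Marks > upper
print(scores[is_outlier, ])

# Remove the flagged rows and keep the rest.
scores_clean <- scores[!is_outlier, ]
print(scores_clean)
```

Other choices are equally valid depending on the data: rows with missing values can be dropped with na.omit(), and outliers can instead be flagged by z-scores when the variable is roughly normally distributed.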