VISVESVARAYA TECHNOLOGICAL UNIVERSITY
JNANA SANGAMA, BELAGAVI - 590018, KARNATAKA

A J INSTITUTE OF ENGINEERING & TECHNOLOGY
(A unit of Laxmi Memorial Education Trust (R))
NH-66, Kottara Chowki, Kodical Cross - 575006

DEPARTMENT OF INFORMATION SCIENCE AND ENGINEERING
(Accredited by NBA & NAAC)

MASTER MANUAL
Course: BIG DATA ANALYTICS
Course Code: BIS701
VII SEMESTER

Prepared by:
Dr. Krishna Prasad K
Associate Professor, Department of Information Science & Engineering
AJIET, Mangaluru

ACADEMIC YEAR: 2025-26

VISION OF THE INSTITUTION
To produce top-quality engineers who are groomed for attaining excellence in their profession and are competitive enough to contribute to the growth of the nation and global society.

MISSION OF THE INSTITUTION
M1: To offer affordable, high-quality graduate programmes in engineering with value education and make the students socially responsible.
M2: To support and enhance the institutional environment to attain research excellence among both faculty and students and to inspire them to push the boundaries of the knowledge base.
M3: To identify common areas of interest among individuals for effective, sustainable industry-institute partnerships by systematically working together.
M4: To promote an entrepreneurial attitude and inculcate innovative ideas among engineering professionals.

VISION OF THE DEPARTMENT
To be a center of excellence in Information Science & Engineering education, research and training to meet the growing needs of the industry and society.

MISSION OF THE DEPARTMENT
M1: To impart theoretical and practical knowledge through the concepts and technologies in Information Science and Engineering.
M2: To foster research, collaboration and higher education with premier institutions and industries.
M3: To promote innovation and entrepreneurship to fulfil the needs of the society and industry.

PROGRAM OUTCOMES (POs)
PO1: Engineering Knowledge: Apply the knowledge of mathematics, science, engineering fundamentals, and an engineering specialization to the solution of complex engineering problems.
PO2: Problem Analysis: Identify, formulate, review research literature, and analyze complex engineering problems, reaching substantiated conclusions using first principles of mathematics, natural sciences, and engineering sciences.
PO3: Design/Development of Solutions: Design solutions for complex engineering problems and design system components or processes that meet the specified needs with appropriate consideration for public health and safety, and cultural, societal, and environmental considerations.
PO4: Conduct Investigations of Complex Problems: Use research-based knowledge and research methods, including design of experiments, analysis and interpretation of data, and synthesis of information, to provide valid conclusions.
PO5: Modern Tool Usage: Create, select, and apply appropriate techniques, resources, and modern engineering and IT tools, including prediction and modelling, to complex engineering activities with an understanding of the limitations.
PO6: The Engineer and Society: Apply reasoning informed by contextual knowledge to assess societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to professional engineering practice.
PO7: Environment and Sustainability: Understand the impact of professional engineering solutions in societal and environmental contexts, and demonstrate the knowledge of, and need for, sustainable development.
PO8: Ethics: Apply ethical principles and commit to professional ethics, responsibilities and the norms of engineering practice.
PO9: Individual and Team Work: Function effectively as an individual, and as a member or leader in diverse teams and in multidisciplinary settings.
PO10: Communication: Communicate effectively on complex engineering activities with the engineering community and with society at large, such as being able to comprehend and write effective reports and design documentation, make effective presentations, and give and receive clear instructions.
PO11: Project Management and Finance: Demonstrate knowledge and understanding of engineering and management principles and apply these to one's own work, as a member and leader in a team, to manage projects in multidisciplinary environments.
PO12: Life-Long Learning: Recognize the need for, and have the preparation and ability to engage in, independent and life-long learning in the broadest context of technological change.

PROGRAM SPECIFIC OUTCOMES (PSOs)
1. Design, implement and maintain information systems that fulfil the current needs of the industry.
2. Apply computational theory, storage, and networking concepts to address societal problems.

PROGRAM EDUCATIONAL OBJECTIVES (PEOs)
1. Analyse, design and implement solutions to real-world problems in the field of Information Science and Engineering in a multidisciplinary setup.
2. Keep abreast of technology and innovation, and pursue higher education with high standards of social and professional ethics.
3. Develop professional and entrepreneurship skills to work effectively as an individual and in a team to meet the ever-changing goals of the organization.

COURSE OUTCOMES (COs)
At the end of the course, students will be able to:
1. Identify and list various Big Data concepts, tools and applications.
2. Develop programs using the Hadoop framework.
3. Use a Hadoop cluster to deploy MapReduce jobs and Pig, Hive and Spark programs.
4. Analyze a given data set and identify deep insights from it.

GENERAL LAB GUIDELINES

Do's
1. Maintain discipline in the laboratory.
2. Before entering the laboratory, keep your footwear on the shoe rack.
3. A proper dress code has to be maintained while entering the laboratory.
4. Students should carry a lab observation book, student manual and record book completed in all aspects.
5. Read and understand the logic of the program thoroughly before coming to the laboratory.
6. Enter the login book before switching on the computer.
7. Enter your batch member names and other details in the slips for hardware kits.
8. Students should stay at their assigned places; unnecessary movement is restricted.
9. Students should use the same computer until the end of the semester.
10. Report any problems in computers/hardware kits to the faculty member in-charge/laboratory technician immediately.
11. The practical result should be noted down in the observation book and shown to the faculty member in-charge for verification.
12. After completing the experiments, students should switch off the computers, enter the logout time, return the hardware kits and arrange the chairs properly.
Don'ts
1. Do not come late to the laboratory.
2. Do not enter the laboratory without an ID card, lab dress code, observation book and record.
3. Do not leave the laboratory without the permission of the faculty in-charge.
4. Never eat or drink while working in the laboratory.
5. Do not handle any equipment before reading the instructions/instruction manuals.
6. Do not exchange computers or hardware kits with others.
7. Do not misbehave in the laboratory.
8. Do not alter computer settings/software settings.
9. External disks/drives should not be connected to computers without permission; doing so will attract fines.
10. Do not remove anything from the kits/experimental set-up without permission; doing so will attract fines.
11. Do not mishandle the equipment/computers.
12. Do not leave the laboratory without verification of the hardware kits by the lab instructor.
13. Usage of mobile phones, tablets and other portable devices is not allowed in restricted places.

INSTRUCTIONS TO STUDENTS
• Students must bring the observation book, record and manual along with a pen, pencil, eraser, etc.; no borrowing from others.
• Save your work in a designated folder (e.g., C:\Users\Student\BigData_Lab or a cloud storage platform).
• Always name your files and folders properly (exp1_hdfs_file_management, exp2_matrix_multiply, etc.).
• Regularly back up your work to avoid data loss.
• Before leaving the lab, check whether you have switched off the power supply and arranged your chair properly.
• Avoid unnecessary talking while doing the experiment.

SYLLABUS
BIG DATA ANALYTICS - PRACTICAL COMPONENT OF IPCC
[As per Choice Based Credit System (CBCS) scheme]
SEMESTER - VII
Subject Code: BIS701    CIE Marks: 25
Hours/Week: 02 (02 Hours Laboratory)    Test Hours: 03

Using suitable software, demonstrate the operation of the following programs:

LAB TASKS
1. Install Hadoop and implement the following file management tasks in Hadoop: adding files and directories, retrieving files, deleting files and directories. Hint: A typical Hadoop workflow creates data files (such as log files) elsewhere and copies them into HDFS using one of the command-line utilities.
2. Develop a MapReduce program to implement matrix multiplication.
3. Develop a MapReduce program that mines weather data and displays appropriate messages indicating the weather conditions of the day.
4. Develop a MapReduce program to find the tags associated with each movie by analyzing MovieLens data.
5. Implement the functions Count, Sort, Limit, Skip and Aggregate using MongoDB.
6. Write Pig Latin scripts to sort, group, join, project, and filter data.
7. Use Hive to create, alter, and drop databases, tables, views, functions, and indexes.
8. Implement a word count program in Hadoop and Spark.
9. Use CDH (Cloudera Distribution for Hadoop) and HUE (Hadoop User Experience) to analyze data and generate reports for sample datasets.

CIE for the practical component of the IPCC
• 15 marks for the conduct of the experiments and preparation of the laboratory record, and 10 marks for the test to be conducted after the completion of all the laboratory sessions.
• On completion of every experiment/program in the laboratory, the students shall be evaluated, including viva-voce, and marks shall be awarded on the same day.
• The CIE marks awarded for the practical component shall be based on the continuous evaluation of the laboratory report. Each experiment report can be evaluated for 10 marks. The marks of all experiment write-ups are added and scaled down to 15 marks.
• The laboratory test (duration 02/03 hours) after completion of all the experiments shall be conducted for 50 marks and scaled down to 10 marks.
• The scaled-down marks of the write-up evaluations and the test, added together, form the CIE marks for the laboratory component of the IPCC (25 marks).
• The student has to secure 40% of 25 marks to qualify in the CIE of the practical component of the IPCC.

1. Install Hadoop and implement the following file management tasks in Hadoop:
a) Adding files and directories
b) Retrieving files
c) Deleting files and directories
Hint: A typical Hadoop workflow creates data files (such as log files) elsewhere and copies them into HDFS using one of the command-line utilities.

Pre-requisites:
• Basic knowledge of Linux shell commands
• Java JDK installed
• Internet connection (for downloading packages)

Lab Objective:
To understand and perform the basic file management operations in the Hadoop Distributed File System (HDFS) by:
1. Installing Hadoop on a single-node Ubuntu system
2. Performing file operations:
   o Adding files and directories
   o Retrieving files
   o Deleting files and directories

Lab Requirements:
Hardware Requirements:
• 4 GB RAM (8 GB recommended)
• 40 GB hard disk
• Dual-core processor
Software Requirements:
• Ubuntu OS
• Java JDK (this manual uses OpenJDK 17; see the Java compatibility note at the end of Experiment 2)
• Hadoop 3.x (single-node, pseudo-distributed mode)
• Terminal access
• Internet for downloading packages

PART A: Hadoop Installation on Ubuntu (Single Node - Pseudo-Distributed Mode)

Step 1: Install Java
sudo apt update
sudo apt install openjdk-17-jdk -y
java -version

Step 2: Create a Hadoop User
sudo addgroup hadoop
sudo adduser --ingroup hadoop hduser
sudo usermod -aG sudo hduser

Step 3: Switch to the Hadoop User
su - hduser

Step 4: Download Hadoop
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
tar -xzvf hadoop-3.3.6.tar.gz
mv hadoop-3.3.6 hadoop

Step 5: Configure Environment Variables
nano ~/.bashrc
Add the following at the end:
export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64
export HADOOP_HOME=/home/hduser/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Apply the changes:
source ~/.bashrc

Step 6: Configure Hadoop Files
Navigate to the Hadoop configuration directory:
cd $HADOOP_HOME/etc/hadoop

nano hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64

nano core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

nano hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

Format the NameNode:
hdfs namenode -format

Step 7: Set Up Passwordless SSH
Install the OpenSSH server and client:
sudo apt install openssh-server openssh-client -y
ssh-keygen -t rsa -b 4096
ls -1 ~/.ssh/
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh localhost
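Before moving on to Part B, it can help to see how Hadoop client code picks up the configuration you just wrote. The following minimal Java sketch is our own illustration, not part of the prescribed tasks; the class name ConfigCheck is arbitrary. It assumes $HADOOP_CONF_DIR is on the classpath, which is the case when you compile and run it against $(hadoop classpath) as shown.

// ConfigCheck.java - prints the default file system that Hadoop clients
// will use, as resolved from core-site.xml in $HADOOP_CONF_DIR.
// (Our own sketch for verification; not one of the prescribed experiments.)
import org.apache.hadoop.conf.Configuration;

public class ConfigCheck {
    public static void main(String[] args) {
        Configuration conf = new Configuration(); // loads core-site.xml and hdfs-site.xml
        // Should print hdfs://localhost:9000 if Part A was configured correctly
        System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS"));
    }
}

Compile and run it with:
javac -classpath "$(hadoop classpath)" ConfigCheck.java
java -classpath "$(hadoop classpath):." ConfigCheck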
PART B: Start the Hadoop Daemons (Single Node)

start-dfs.sh

If successful, verify with:
jps
Expected output:
➢ NameNode
➢ DataNode
➢ SecondaryNameNode

PART C: HDFS File Management Tasks

1. Adding Files and Directories
Create a local file:
echo "This is a sample file for HDFS" > myfile.txt
Create a directory in HDFS:
hdfs dfs -mkdir /mydir
Copy the local file to HDFS:
hdfs dfs -put myfile.txt /mydir/
Verify:
hdfs dfs -ls /mydir

2. Retrieving Files
Copy a file from HDFS to the local file system:
hdfs dfs -get /mydir/myfile.txt myfile_from_hdfs.txt
Or simply display its content:
hdfs dfs -cat /mydir/myfile.txt

3. Deleting Files and Directories
Delete a file:
hdfs dfs -rm /mydir/myfile.txt
Delete a directory:
hdfs dfs -rm -r /mydir

Sample Output
hduser@ubuntu:~$ hdfs dfs -ls /
Found 1 items
drwxr-xr-x   - hduser supergroup          0 2025-08-07 10:30 /mydir

Full Uninstallation Steps for Hadoop on Ubuntu

Step 1: Remove the Hadoop Directory
If you installed Hadoop in the home directory (e.g., /home/hduser/hadoop), delete it:
rm -rf /home/hduser/hadoop
If it is in another directory, modify the path accordingly.

Step 2: Delete the Hadoop Tar File (Optional)
If the original .tar.gz file still exists:
rm -f /home/hduser/hadoop-3.3.6.tar.gz

Step 3: Remove the Hadoop User and Group (if created)
Only do this if the user hduser was created solely for Hadoop.
sudo deluser --remove-home hduser
sudo delgroup hadoop
This deletes the user and their home directory (including .bashrc and config files).

Step 4: Clean Up Environment Variables
If you edited the environment variables for hduser in ~/.bashrc, they will be removed along with the user. If you added them system-wide (e.g., in /etc/environment or /etc/profile), open the file and manually delete the Hadoop-related lines:
sudo nano /etc/environment
Remove lines like:
HADOOP_HOME=/home/hduser/hadoop
PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Step 5: (Optional) Remove Java (if installed only for Hadoop)
If you installed Java specifically for Hadoop and no other software depends on it:
sudo apt remove openjdk-17-jdk -y
sudo apt autoremove

Step 6: Verify Hadoop Is Fully Removed
Try running:
which hadoop
If you see no output, Hadoop is no longer on the PATH. You can also check:
hadoop version
If Hadoop was successfully removed, this should return:
Command 'hadoop' not found

Optional Cleanup Commands
sudo updatedb
locate hadoop
This helps you find any remaining Hadoop-related files if you installed them elsewhere.

Viva Questions
1. What is HDFS and why is it important?
2. How does HDFS differ from traditional file systems?
3. What is the role of the NameNode and the DataNode?
4. What happens when you run hdfs dfs -put?
5. How can you verify whether a file was successfully added to HDFS?
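As an optional extension of this experiment, the same add/retrieve/delete operations from Part C can be performed from Java through Hadoop's FileSystem API instead of the hdfs dfs shell. The sketch below is our own illustration (class and path names are arbitrary), assuming the single-node cluster from Part A is running at hdfs://localhost:9000.

// HdfsFileOps.java - a minimal sketch (our own illustration, not a prescribed
// task) mirroring the Part C shell commands with Hadoop's FileSystem API.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsFileOps {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // reads core-site.xml
        FileSystem fs = FileSystem.get(conf);       // connects to fs.defaultFS

        Path dir = new Path("/mydir");
        Path local = new Path("myfile.txt");
        Path remote = new Path("/mydir/myfile.txt");

        fs.mkdirs(dir);                             // hdfs dfs -mkdir /mydir
        fs.copyFromLocalFile(local, remote);        // hdfs dfs -put
        fs.copyToLocalFile(remote, new Path("myfile_from_hdfs.txt")); // hdfs dfs -get
        fs.delete(remote, false);                   // hdfs dfs -rm (non-recursive)
        fs.delete(dir, true);                       // hdfs dfs -rm -r (recursive)
        fs.close();
    }
}

Compile and run it the same way as the ConfigCheck sketch:
javac -classpath "$(hadoop classpath)" HdfsFileOps.java
java -classpath "$(hadoop classpath):." HdfsFileOps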
2. Lab Manual: Matrix Multiplication using Hadoop MapReduce

1) Setup and Preparation
Before proceeding, ensure Hadoop is installed and configured correctly, and that the daemons are running (start-dfs.sh, and start-yarn.sh if you are running on YARN). For this lab we use Hadoop 3.x or later.

2) Create the Source File
On your Ubuntu machine (as the Hadoop user), create a project directory:
mkdir -p ~/matrix-mul
cd ~/matrix-mul
Create the MatrixMultiply.java file:
nano MatrixMultiply.java
Paste and save the following code:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MatrixMultiply {

    public static class MatrixMapper extends Mapper<Object, Text, Text, Text> {
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Expecting lines like: A,i,k,value or B,k,j,value
            String[] parts = value.toString().trim().split(",");
            if (parts.length != 4) return;

            String matrixName = parts[0];
            int row = Integer.parseInt(parts[1]);
            int col = Integer.parseInt(parts[2]);
            int val = Integer.parseInt(parts[3]);

            Configuration conf = context.getConfiguration();
            int m = Integer.parseInt(conf.get("m")); // rows of A / rows of C
            int p = Integer.parseInt(conf.get("p")); // cols of B / cols of C

            if (matrixName.equals("A")) {
                // A(i,k) contributes to every output cell (i,j), j = 0..p-1
                for (int j = 0; j < p; j++) {
                    context.write(new Text(row + "," + j), new Text("A," + col + "," + val));
                }
            } else if (matrixName.equals("B")) {
                // B(k,j) contributes to every output cell (i,j), i = 0..m-1
                for (int i = 0; i < m; i++) {
                    context.write(new Text(i + "," + col), new Text("B," + row + "," + val));
                }
            }
        }
    }

    public static class MatrixReducer extends Reducer<Text, Text, Text, IntWritable> {
        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            int n = Integer.parseInt(context.getConfiguration().get("n")); // shared dimension
            int[] aVals = new int[n];
            int[] bVals = new int[n];

            for (Text t : values) {
                String[] parts = t.toString().split(",");
                if (parts[0].equals("A")) {
                    int k = Integer.parseInt(parts[1]);
                    aVals[k] = Integer.parseInt(parts[2]);
                } else if (parts[0].equals("B")) {
                    int k = Integer.parseInt(parts[1]);
                    bVals[k] = Integer.parseInt(parts[2]);
                }
            }

            int sum = 0;
            for (int k = 0; k < n; k++) {
                sum += aVals[k] * bVals[k];
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 5) {
            System.err.println("Usage: MatrixMultiply <input> <output> <m> <n> <p>");
            System.exit(2);
        }

        Configuration conf = new Configuration();
        conf.set("m", args[2]); // rows of A
        conf.set("n", args[3]); // shared dimension
        conf.set("p", args[4]); // cols of B

        Job job = Job.getInstance(conf, "Matrix Multiply");
        job.setJarByClass(MatrixMultiply.class);
        job.setMapperClass(MatrixMapper.class);
        job.setReducerClass(MatrixReducer.class);

        // Mapper emits Text,Text. Final output is Text,IntWritable.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
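To see why the mapper replicates each matrix entry, trace one output cell using the 2x2 example from step 4 below (A = [[1,2],[3,4]], B = [[5,6],[7,8]]):

For output cell (0,0), the mapper emits:
  from A,0,0,1 -> key (0,0), value "A,0,1"   (k = 0)
  from A,0,1,2 -> key (0,0), value "A,1,2"   (k = 1)
  from B,0,0,5 -> key (0,0), value "B,0,5"   (k = 0)
  from B,1,0,7 -> key (0,0), value "B,1,7"   (k = 1)
The reducer for key (0,0) then computes
  sum = aVals[0]*bVals[0] + aVals[1]*bVals[1] = 1*5 + 2*7 = 19,
which matches the first line of the expected output in step 8.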
3) Compile and Create the JAR
1. Make a directory for the compiled classes:
mkdir -p build/classes
2. Compile the Java file using the Hadoop classpath:
javac -classpath "$(hadoop classpath)" -d build/classes MatrixMultiply.java
3. Create the JAR file:
jar -cvf MatrixMultiply.jar -C build/classes/ .
If you face any compilation issues, ensure your $HADOOP_HOME is correctly set and the hadoop command is functioning properly.

4) Prepare the Input Data
Create the input file (input.txt) in your project folder (~/matrix-mul). Example for 2x2 matrices:
nano input.txt
Add the following content:
A,0,0,1
A,0,1,2
A,1,0,3
A,1,1,4
B,0,0,5
B,0,1,6
B,1,0,7
B,1,1,8

5) Put the Input Data into HDFS
1. Create the HDFS input directory:
hdfs dfs -mkdir -p /user/$(whoami)/matrix/input
2. Copy the input file to HDFS:
hdfs dfs -put -f input.txt /user/$(whoami)/matrix/input/
3. Verify that the file is in HDFS:
hdfs dfs -ls /user/$(whoami)/matrix/input
hdfs dfs -cat /user/$(whoami)/matrix/input/input.txt

6) Remove Previous Output (if it exists)
Before running a new job, ensure the output path does not exist:
hdfs dfs -rm -r -f /user/$(whoami)/matrix/output

7) Run the MapReduce Job
For a 2x2 matrix multiplication, use m=2, n=2 and p=2:
hadoop jar MatrixMultiply.jar MatrixMultiply \
  /user/$(whoami)/matrix/input \
  /user/$(whoami)/matrix/output \
  2 2 2
Wait for the job to finish; you will see progress information in the terminal.

8) View the Results
1. List the output directory in HDFS:
hdfs dfs -ls /user/$(whoami)/matrix/output
2. View the output file:
hdfs dfs -cat /user/$(whoami)/matrix/output/part-r-00000
Expected output (for the given example matrices):
0,0	19
0,1	22
1,0	43
1,1	50

9) Check the Job & Logs (if something fails)
1. Check YARN applications:
yarn application -list
yarn application -status <applicationId>
2. Check the job logs:
yarn logs -applicationId <applicationId> | less
3. Check the Hadoop logs:
ls -l $HADOOP_HOME/logs
tail -n 200 $HADOOP_HOME/logs/hadoop-$(whoami)-namenode-*.log
tail -n 200 $HADOOP_HOME/logs/yarn-*.log
4. Make sure the daemons are running:
jps
The following services should be running: NameNode, DataNode, ResourceManager, NodeManager, SecondaryNameNode.

10) Tips & Common Gotchas
1. The output path must NOT exist before running the job. Remove it with:
hdfs dfs -rm -r -f /path/to/output
2. Classpath: use $(hadoop classpath) when compiling. If you get a NoClassDefFoundError, your classpath is wrong.
3. JAVA_HOME: ensure JAVA_HOME is correctly set in $HADOOP_HOME/etc/hadoop/hadoop-env.sh.
4. Passwordless SSH: ensure passwordless SSH to localhost is configured for start-dfs.sh and start-yarn.sh.
5. Large matrices: for larger matrices, consider blocked matrix multiplication to reduce shuffle volume.
6. Map/Reduce parallelism: you can tune the number of reducers using job.setNumReduceTasks(k) in the driver, or -D mapreduce.job.reduces=k on the command line, as sketched after this list.
7. Java compatibility: ensure your Hadoop version supports Java 11/17 (Hadoop 3.3.5+). If you face issues, install OpenJDK 8 and set JAVA_HOME accordingly.
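A minimal sketch of tip 6, assuming you are editing the main() method of the MatrixMultiply driver shown earlier; the reducer count of 2 is an arbitrary example value:

// In main(), after Job.getInstance(...): request two reduce tasks so that
// the output cells are split across two part-r-* files (example value only).
job.setNumReduceTasks(2);

Note that the -D mapreduce.job.reduces=k form is parsed only when the driver implements Hadoop's Tool interface and is launched through ToolRunner; the driver above reads its arguments directly, so the setNumReduceTasks call is the simpler route here.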
3. Lab Manual: Weather Data Mining using MapReduce

Objective
To develop a Hadoop MapReduce program that processes weather data and displays messages indicating the weather conditions of the day (e.g., "Hot Day", "Cold Day", "Pleasant Day").

Step 1: Create a Working Directory
1. Open your terminal.
2. Create a directory for your project (example: weather-mr):
mkdir weather-mr
cd weather-mr

Step 2: Create the Input Data
1. Create an input directory:
mkdir input
2. Use the nano editor to create a sample weather dataset:
nano input/weather.txt
3. Enter the following sample data (year, month, day, temperature):
2025,08,15,35
2025,08,16,28
2025,08,17,18
2025,08,18,10
2025,08,19,22
4. Save and exit nano:
   o Press CTRL+O, then Enter (to save).
   o Press CTRL+X (to exit).

Step 3: Write the Mapper Class
1. Create a source folder:
mkdir -p src
2. Open nano to create WeatherMapper.java:
nano src/WeatherMapper.java
3. Paste the following code:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WeatherMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private Text day = new Text();
    private IntWritable temperature = new IntWritable();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Input format: year,month,day,temp
        String[] fields = value.toString().split(",");
        if (fields.length == 4) {
            String date = fields[0] + "-" + fields[1] + "-" + fields[2]; // yyyy-mm-dd
            int temp = Integer.parseInt(fields[3]);
            day.set(date);
            temperature.set(temp);
            context.write(day, temperature);
        }
    }
}

Save and exit (CTRL+O, Enter, CTRL+X).

Step 4: Write the Reducer Class
nano src/WeatherReducer.java
Paste the following code:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WeatherReducer extends Reducer<Text, IntWritable, Text, Text> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        for (IntWritable val : values) {
            int temp = val.get();
            String condition;
            if (temp > 30) {
                condition = "Hot Day";
            } else if (temp < 15) {
                condition = "Cold Day";
            } else {
                condition = "Pleasant Day";
            }
            context.write(key, new Text(condition + " (" + temp + "\u00B0C)"));
        }
    }
}

Step 5: Write the Driver Class
nano src/WeatherDriver.java
Paste the following code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WeatherDriver {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: WeatherDriver <input path> <output path>");