SRM Institute of Science and Technology, Ramapuram, Chennai
Faculty of Science & Humanities (A Place for Transformation)
Department of Computer Applications (BCA & BCA GenAI)

PRACTICAL RECORD

NAME            :
REGISTER NUMBER :
COURSE          : BCA
SEMESTER / YEAR : VI / III
SUBJECT CODE    : UCA23G04J
SUBJECT NAME    : Introduction to Machine Learning

April 2026

REGISTER NUMBER:

BONAFIDE CERTIFICATE

This is to certify that this is the bonafide work done by ____________ in the subject INTRODUCTION TO MACHINE LEARNING [UCA23G04J] at SRM Institute of Science and Technology, Ramapuram, Chennai in April 2026.

STAFF IN-CHARGE                         HEAD OF THE DEPARTMENT

Submitted for the University Practical Examination held at SRM Institute of Science and Technology, Ramapuram, Chennai on ____________.

INTERNAL EXAMINER 1                     INTERNAL EXAMINER 2

INDEX

Ex No.  Date        Title                                                             Page No  Signature
1       03/12/2025  Import the dataset                                                1
2       03/12/2025  Handling Data                                                     5
3       10/12/2025  Splitting up the dataset                                          7
4       10/12/2025  House price prediction using linear regression algorithm          9
5       17/12/2025  Diabetic prediction using Logistic Regression                     12
6       07/01/2026  Feature Extraction Program                                        14
7       07/01/2026  K-Means clustering algorithm                                      16
8       21/01/2026  KNN Classification algorithm                                      18
9       28/01/2026  Bayesian Network Program                                          21
10      28/01/2026  Text Classification using Logistic Regression                     24
11      04/02/2026  Mental Health Prediction using SVM classifier                     27
12      04/02/2026  Random Forest Classifier                                          29
13      11/02/2026  Evaluation Measures                                               32
14      18/02/2026  Clustering the face dataset using K-Means model                   34
15      25/02/2026  Fuzzy C-Means (Soft Clustering)                                   36
16      04/03/2026  Gaussian Mixture Model (GMM) – Soft Clustering                    38
17      04/03/2026  Data visualizations                                               42
18      11/03/2026  Keyword classification / Multi-keyword classification             44
19      18/03/2026  Develop Decision Tree Classification Model for a Given Dataset    46
20      25/03/2026  Implement Naïve Bayes Classification in Python                    48

Exercise 1                              Register Number:
Date: 03/12/2025                        Name:

Import the dataset

Aim:
To download the dataset from an online repository and import it into a Jupyter notebook.
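As a supplementary sketch of the import step, the same pd.read_csv call can first be tried on an in-memory sample so it runs without any file in place. The two-row CSV text below is made up for illustration; only a few column names from the record's dataset are echoed:

```python
# Minimal sketch: pd.read_csv accepts any file-like source, so an in-memory
# string stands in for the real CSV file here. The rows are invented.
import io
import pandas as pd

csv_text = """person_name,age,mental_state
Alice,21,Calm
Bob,23,Stressed
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)           # (2, 3)
print(list(df.columns))   # ['person_name', 'age', 'mental_state']
```

The same shape and column checks used in the program below (df.shape, df.columns) apply unchanged once the path points at the real dataset.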
Program:

Code:
import pandas as pd

df = pd.read_csv('/content/drive/MyDrive/nlp/mental_health_social_media_dataset.csv')
df

Output:
[DataFrame displayed]

Code:
df.shape

Output:
(5000, 15)

Code:
df.head()

Output:
[first five rows displayed]

Code:
df.tail()

Output:
[last five rows displayed]

Code:
df.columns

Output:
Index(['person_name', 'age', 'date', 'gender', 'platform',
       'daily_screen_time_min', 'social_media_time_min',
       'negative_interactions_count', 'positive_interactions_count',
       'sleep_hours', 'physical_activity_min', 'anxiety_level',
       'stress_level', 'mood_level', 'mental_state'],
      dtype='object')

Code:
df.info()

Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 15 columns):
 #   Column                       Non-Null Count  Dtype
---  ------                       --------------  -----
 0   person_name                  5000 non-null   object
 1   age                          5000 non-null   int64
 2   date                         5000 non-null   object
 3   gender                       5000 non-null   object
 4   platform                     5000 non-null   object
 5   daily_screen_time_min        5000 non-null   int64
 6   social_media_time_min        5000 non-null   int64
 7   negative_interactions_count  5000 non-null   int64
 8   positive_interactions_count  5000 non-null   int64
 9   sleep_hours                  5000 non-null   float64
 10  physical_activity_min        5000 non-null   int64
 11  anxiety_level                5000 non-null   int64
 12  stress_level                 5000 non-null   int64
 13  mood_level                   5000 non-null   int64
 14  mental_state                 5000 non-null   object
dtypes: float64(1), int64(9), object(5)

Code:
df['age']

Output:
[the 'age' column displayed as a Series]

Result:
Thus the dataset was imported successfully.

Exercise 2                              Register Number:
Date: 03/12/2025                        Name:

Handling Data

Aim:
To apply data handling methods to the imported dataset.

Program:

Code:
import pandas as pd

df = pd.read_csv('/content/drive/MyDrive/nlp/mental_health_social_media_dataset.csv')
df

Output:
[DataFrame displayed]

Code:
df.isnull()

Output:
[Boolean DataFrame marking null entries]

Code:
df.isna()

Output:
[Boolean DataFrame marking missing entries]

Code:
df.nunique()

Output:
[count of unique values per column]

Result:
Some of the data handling methods were executed successfully.

Exercise 3                              Register Number:
Date: 10/12/2025                        Name:

Splitting up the dataset

Aim:
To split the dataset into a training set and a testing set for the purpose of model training.
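The split itself can be illustrated on a small synthetic array before running it on the real dataset. This is a minimal sketch; the 50-sample array is invented, but the test_size=0.2 and random_state=42 arguments match those used in the program:

```python
# Minimal sketch of an 80/20 train-test split on synthetic data.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)   # 50 samples, 2 features (invented)
y = np.arange(50)                   # 50 matching targets

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 20% of 50 samples -> 10 rows held out for testing
print(X_train.shape, X_test.shape)  # (40, 2) (10, 2)
```

Note that train_test_split returns the pieces in the fixed order x_train, x_test, y_train, y_test; fixing random_state makes the shuffle reproducible across runs.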
Program:

Code:
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv('/content/drive/MyDrive/nlp/mental_health_social_media_dataset.csv')

x = df.drop('mood_level', axis=1)
y = df['mood_level']

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
print(x_train.shape)
print(x_test.shape)

Output:
(4000, 14)
(1000, 14)

Code:
x_train.info()

Output:
[summary of the 4000-row training frame]

Code:
x_test.info()

Output:
[summary of the 1000-row testing frame]

Code:
x_train.head()

Output:
[first five rows of the training frame]

Result:
The dataset was successfully split into a training dataset and a testing dataset.

Exercise 4                              Register Number:
Date: 10/12/2025                        Name:

House price prediction using linear regression algorithm

Aim:
To predict the house price for a given area using the linear regression algorithm.

Program:

Code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.impute import SimpleImputer

# 1. Load dataset
df = pd.read_excel('/content/drive/MyDrive/nlp/ml/house_data.xlsx')

# 2. Convert to numeric (safety)
df['Area'] = pd.to_numeric(df['Area'], errors='coerce')
df['Price'] = pd.to_numeric(df['Price'], errors='coerce')

# 3. Handle missing values
imputer = SimpleImputer(strategy='mean')
df[['Area', 'Price']] = imputer.fit_transform(df[['Area', 'Price']])

# 4. Features and target
X = df[['Area']]
y = df['Price']

# 5. Train-test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 6. Train Linear Regression model
model = LinearRegression()
model.fit(X_train, y_train)

# 7. Predict
y_pred = model.predict(X_test)

# 8. Evaluation
print("MAE:", mean_absolute_error(y_test, y_pred))
print("MSE:", mean_squared_error(y_test, y_pred))
print("R² Score:", r2_score(y_test, y_pred))

Output:
MAE: 671531.3333320208
MSE: 588320429231.7675
R² Score: 0.22863799033010923

Code:
new_house = [[1800]]
predicted_price = model.predict(new_house)
print("Predicted House Price:", predicted_price[0])

Output:
Predicted House Price: 4793622.555968599

Code:
value = model.predict([[2345]])
print(value)

Output:
[5241819.1753847]

Result:
Thus the dataset was imported and house prices were predicted using the linear regression algorithm.

Exercise 5                              Register Number:
Date: 17/12/2025                        Name:

Diabetic prediction using Logistic Regression

Aim:
To predict whether a patient has diabetes (Yes/No) using Logistic Regression based on medical attributes.

Program:

Code:
import pandas as pd
from matplotlib import pyplot as plt
%matplotlib inline

df = pd.read_csv("/content/drive/MyDrive/Colab Notebooks/Dataset/Diabetic.csv")
df

Output:
[DataFrame displayed]

Code:
from sklearn.model_selection import train_test_split

train_test_split(df[['Age']], df.Diabetic, test_size=0.1)

Output:
[the four split pieces displayed]

Code:
# train_test_split returns x_train, x_test, y_train, y_test, in that order
x_train, x_test, y_train, y_test = train_test_split(df[['Age']], df.Diabetic, test_size=0.5)

from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(x_train, y_train)
model.predict(x_test)
model.predict_proba(x_test)

Result:
The Logistic Regression model was trained on the Age feature of the diabetes dataset using a 50:50 train-test split, and class labels and class probabilities were predicted for the test set.

Exercise 6                              Register Number:
Date: 07/01/2026                        Name:

Feature Extraction Program

Aim:
To develop a simple Python program that performs basic feature extraction from a given text by calculating the number of characters, words, and sentences, and the word frequency.

Program:

Code:
# Simple Feature Extraction Program
text = input("Enter a sentence or paragraph: ")

# 1. Number of characters
num_characters = len(text)

# 2. Number of words
words = text.split()
num_words = len(words)

# 3. Number of sentences
sentences = text.split('.')
num_sentences = len([s for s in sentences if s.strip() != ""])

# 4. Word frequency
word_freq = {}
for word in words:
    word = word.lower()
    if word in word_freq:
        word_freq[word] += 1
    else:
        word_freq[word] = 1

# Output
print("\n--- Extracted Features ---")
print("Number of Characters:", num_characters)
print("Number of Words:", num_words)
print("Number of Sentences:", num_sentences)
print("Word Frequency:", word_freq)

Output:
[extracted feature counts for the entered text]

Result:
The program successfully extracted basic text features such as character count, word count, sentence count, and word frequency from the given input.

Exercise 7                              Register Number:
Date: 07/01/2026                        Name:

K-Means clustering algorithm

Aim:
To implement the K-Means clustering algorithm in Python to group similar data points into clusters.

Program:

Code:
from sklearn.cluster import KMeans
import numpy as np

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# Create KMeans model with 2 clusters
kmeans = KMeans(n_clusters=2, random_state=0)

# Fit the model
kmeans.fit(X)

# Get cluster labels
labels = kmeans.labels_

# Get cluster centers
centers = kmeans.cluster_centers_

# Output
print("Cluster Labels:", labels)
print("Cluster Centers:\n", centers)

Output:
[cluster labels and the two cluster centers]

Result:
The program successfully clustered the given data points into the specified number of clusters using the K-Means algorithm.
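As a follow-up to the clustering exercise, a fitted K-Means model can also assign unseen points to clusters via predict, and its inertia_ (within-cluster sum of squares) is the quantity tracked by the elbow method for choosing k. A minimal sketch on the same toy array; the two "new" points [0, 3] and [12, 3] are made up for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])

# n_init given explicitly so the sketch behaves the same on older and
# newer scikit-learn versions (the default changed to 'auto')
kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(X)

# Assign two unseen points to their nearest cluster centre
new_points = np.array([[0, 3], [12, 3]])
print(kmeans.predict(new_points))

# Inertia shrinks as k grows; the "elbow" where it stops dropping
# sharply suggests a good k (here, k = 2)
inertias = [KMeans(n_clusters=k, random_state=0, n_init=10).fit(X).inertia_
            for k in (1, 2, 3)]
print([round(i, 2) for i in inertias])
```

On this toy data the two new points fall into different clusters, since each lies close to one of the two centres near x = 1 and x = 10.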