Contents

1 Introduction
2 Scientific Process
3 Methods
  3.1 Step 1: Generating Data for Object Detection
  3.2 Step 2: Model Building
    3.2.1 Deep Learning Model
    3.2.2 Computer Vision Model
    3.2.3 Mixed Vision Model
    3.2.4 Step 3: Insights from the Models and Comparing Model Results
4 Materials
5 Sample Model Predictions
6 Results
7 Discussion
8 Real-Life Applications
9 Future Plans
10 Algorithms: A Deep Dive into Our Model Development
  10.1 Computer Vision Model
  10.2 Deep Learning Model
  10.3 Deep Learning Model Iterations
  10.4 Mixed Model: Computer Vision and Deep Learning
  10.5 Model Flowcharts
11 Conclusion
12 Appendix A: Our Installation Process
13 Appendix B: The Code
14 References
15 About the Authors

Weapon Detection using Deep Learning and OpenCV

Brandon Vu
Yash Prabhu

March 17, 2020

Abstract

Security and surveillance systems in schools can record crimes but not actively detect them. The objective of this study is to use deep learning and traditional computer vision techniques to detect a visible toy firearm in a controlled environment using an average gaming laptop's GPU and webcam. We produced and annotated six hundred images of our toy firearm, which we split into training, validation, and testing sets. We then trained our model using a MobileNet Single Shot Detector (SSD) framework and the deep learning library TensorFlow. In parallel, we developed a traditional computer vision firearm detection model using OpenCV's contour, noise-filtering, and color segmentation tools. We then developed a mixed model that uses the computer vision algorithm to increase the reliability of the deep learning algorithm. Our results showed that all of our models detected the firearm at least 70% of the time, but the mixed model detected the most firearms. The development of our toy weapon detector revealed the scalability and effectiveness of our mixed model in its ability to be optimized for different use cases, and our hope is to fine-tune this model so it can eventually be integrated into schools' or businesses' security systems.

1 Introduction

There have been over 180 school shootings in the last decade, with over 365 victims [1] and more shootings occurring in 2019 than days in the year. In 2017, the CDC reported over 14,000 firearm deaths in the United States alone [2]. The CDC has a four-step process for dealing with violence, and one of its steps is monitoring the problem. Early detection and awareness of active shooting incidents are crucial: they give emergency personnel time to arrive at the scene and give schools time to perform evacuation or lock-down procedures. In these situations, every second counts, and swift action can reduce the duration and risk of the violence.
As citizens, parents, and students, we share a universal responsibility to use our knowledge and skills to improve the world and keep it safe for others. Our team decided to combine our interests in public safety, artificial intelligence, and Python by creating solutions to detect firearms early. Artificial intelligence, especially machine learning, has become extremely popular due to advances in computing power and the expanding resources of the internet and the open-source community. It has shown promising results in a wide array of fields, especially object detection, a subset of computer science that builds on computer vision and image processing. Object detection can identify instances of semantic objects of a certain class, such as humans, buildings, and cars, in digital images and videos. Although there are surveillance cameras in almost every public building, at present we are unable to use these cameras autonomously to reduce the danger of shootings. With object detection, there is potential to analyze existing video feeds and accurately detect where firearms appear, at any location and in a short period of time, without upgrading expensive security camera systems.

Currently, there are security systems that identify weapons, like Athena Security [3], but they can be expensive due to the computing hardware required and the upgrades needed to existing infrastructure. Other security systems use acoustic sensors, but these can be triggered too late, and they do not use existing infrastructure. Most small businesses do not have access to top-of-the-line Graphics Processing Units (GPUs) or the means to afford other ways of processing large amounts of live video. Thus, we wanted to design a program that could use the infrastructure most places already have: continuous video monitoring.

2 Scientific Process

Objective: The goal of this study was to use a mix of deep learning and traditional computer vision techniques to detect a single weapon, and to demonstrate a proof of concept for how our program could be scaled up to provide a reliable security system for businesses and educational institutions using their existing infrastructure. To design a reliable security system, our objective also included determining the best algorithm for detecting a weapon.

Question: Does a mixed algorithm of deep learning and computer vision perform better, in terms of speed, accuracy, and detection rate, than either algorithm alone in detecting a weapon?

Hypothesis: We believe our mixed algorithm will be better suited to detecting firearms than the computer vision or deep learning algorithms alone because of the two-step verification process it uses and our ability to easily optimize the model for speed or accuracy.

3 Methods

3.1 Step 1: Generating Data for Object Detection

The first step was to generate images of guns for our study. While we initially considered annotating live feeds from movies as data for our model, we rejected that approach because of the lack of environmental control we had over the data; instead, we decided to generate our own images. Since we did not have access to real firearms, we chose a Nerf gun as our "toy weapon." We started by using our own frame-extractor script to sample our live webcam feed into around 600 images of our toy weapon. Using an online image annotator (VGG Image Annotator), we compiled a .csv file with the annotation metadata (filename and bounding box coordinates).
In addition, since the format of the .csv was not correct, we used Vim, a command-line text editor, to find and replace the data fields that we needed. After that, we generated a .records file using a pre-built TensorFlow script.

3.2 Step 2: Model Building

We built three different models: a deep learning model, a computer vision model, and a mixed model.

3.2.1 Deep Learning Model

We chose the MobileNet SSD architecture for our deep learning model because it is commonly used for object detection. Using the .config file associated with the MobileNet SSD model, we tuned it in the ways explained in Section 10.2. After this setup and after linking the paths to all of the relevant data files, we ran the training script that came with the TensorFlow Object Detection API, and we tracked the training process through a TensorBoard output that showed a live update of the model's loss function. After stopping the training once the loss function stabilized, we extracted the .ckpt file and used a TensorFlow script to convert the checkpoint file into a frozen inference graph. We trained a total of five models, each training for two to three hours; our iterations can be found in Section 10.3. From here, we took a template script from a tutorial and modified it to use the frozen inference graph we had just trained to generate predictions on new images from our webcam.

3.2.2 Computer Vision Model

We developed our own computer vision algorithm to detect firearms using OpenCV's image-filtering tools. We took inspiration from tutorials on shape detection, and we used a combination of shape detection with noise filtering, HSV segmentation, and RGB segmentation. A further explanation is in Section 10.1.

3.2.3 Mixed Vision Model

After developing these two models, we developed a mixed model that used both the traditional computer vision model and the deep learning model. We utilized a two-step verification process that required both the CV and deep learning models to confirm a detection. If the CV model failed to detect a weapon, we defaulted to running the deep learning model every 300 ms. This reduced the computational load while also increasing the detection rate.

3.2.4 Step 3: Insights from the Models and Comparing Model Results

After developing and tuning these three algorithms, we ran them on a test set of images to quantify each algorithm's speed, accuracy, and detection rate. We timed each algorithm (speed) and printed the distance between the center of the true weapon and the center of the detected weapon to calculate the error in the detection's location (accuracy). We also recorded the number of correct detections for each algorithm to calculate the detection rate. It was important that we were consistent in our procedure, since we were comparing algorithms that take 30-100 milliseconds to run. For the test set, we sampled our live webcam footage every 250 ms; the data consisted of our weapon being held, translated, and rotated. We used an annotation tool to locate the true center of the weapon in each image of the test set, so that when we evaluated each algorithm, we could measure the accuracy of the detection box against the ground-truth box that we labeled. We also experimented with non-ideal cases for our detection model, such as a differently colored weapon and other backgrounds. After recording these attributes, we analyzed the means and compared them using a two-sample t-test, as sketched below.
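The comparison itself is straightforward. Below is a minimal sketch of how two sets of per-frame errors can be compared with SciPy; the .csv file names and the loading step are illustrative assumptions, not our actual evaluation harness.

# Two-sample t-test over per-frame detection errors (in pixels).
# The .csv file names here are hypothetical placeholders.
import numpy as np
from scipy import stats

cv_errors = np.loadtxt("cv_errors.csv", delimiter=",")
dl_errors = np.loadtxt("dl_errors.csv", delimiter=",")

# Welch's variant avoids assuming equal variances between algorithms.
t_stat, p_value = stats.ttest_ind(cv_errors, dl_errors, equal_var=False)
print("t = %.2f, p = %.4f" % (t_stat, p_value))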
4 Materials

We wanted to simulate the environment of a security camera, so we purposely used lower-quality, compressed images for data. We also used darker lighting in the background. Instead of a real firearm, we used a plastic Nerf gun covered in black tape to make it look more like a real firearm. The backdrop for the images we collected had three different appearances, which we switched throughout the data collection process so that the learned features of the weapon would be independent of the background. Our overall goal with this setup was to simplify our environment and data in order to provide a proof of concept for the tools and programs we used to develop a basic weapon detector.

We used an HP OMEN gaming laptop (Intel Core i7, 16GB RAM, 512GB SSD, GeForce RTX 2060) and the laptop's webcam to collect data. Our primary software tools were Python, the TensorFlow Object Detection API, and OpenCV.

Figure 1: Nerf Gun Used in Experiment

5 Sample Model Predictions

Figure 2: Computer Vision Model Result (Test Set 1)

Figure 2 is an image from testing the computer vision algorithm. In the picture, the detection is the polygon drawn around the weapon.

Figure 3: Deep Learning Model Result (Test Set 1)

Figure 3 shows the deep learning model using the trained weights to predict where the weapon is and how confident it is in that detection.

Figure 4: Mixed Algorithm Result (2 Frames from Test Set 1)

Figure 4 shows two outcomes of the mixed algorithm. On the left, the mixed model was able to verify the initial computer vision detection, so the deep learning detection is shown. On the right, only the deep learning model was used to detect the weapon, because the computer vision model was unable to detect it. Both frames look similar, but the image on the left passed the two-step verification whereas the image on the right did not.

6 Results

                 Computer Vision   Deep Learning   Mixed
Detection rate         74%              81%         91%

Table 1: Percent of Test Set Detected as a Weapon (Detection Rate)

The results in Table 1 were calculated by counting the number of times a weapon was detected in a test set consisting entirely of weapon images.

                 Computer Vision   Deep Learning   Mixed
Mean                  121               46          188

Table 2: Detection Error in Pixels (n = 54)

The detection error is the distance between the detected box and the ground-truth box of the weapon.

Computer Vision vs Deep Learning: p < 0.001
Computer Vision vs Mixed: p < 0.001
Deep Learning vs Mixed: p < 0.001

Table 3: Two-Sample t-Test P-Values of Detection Error

All three p-values reject the null hypothesis: there is convincing evidence that the detection errors differ significantly between each pair of algorithms.

                 Computer Vision   Deep Learning   Mixed
Mean                   32               47           49

Table 4: Detection Time in Milliseconds (n = 600)

The detection time is the time it takes each algorithm to produce a detection on a given frame.

Figure 5: Bar Graph of Error
Figure 6: Bar Graph of Average Speed

The results show that the weapon detection rate for the mixed model was the highest (91%); however, the accuracy of the mixed model's predicted detection box was lower than the deep learning model's. This means we detected more guns with the mixed model, but the predictions were less precise. The two-sample t-test rejects the null hypothesis that there are no differences between the errors of the predicted detection boxes, so we conclude that the detection errors differ significantly between the models.
The detection times for the deep learning model and the mixed model were within a couple of milliseconds of each other. Using the two-sample t-test again, we found that the deep learning and mixed models were significantly slower than the computer vision algorithm; however, in practical use cases, a difference of a few milliseconds will not result in major differences in detection.

7 Discussion

Overall, all of our models had at least a 70% detection rate. Even at a 70% detection rate, when a detection attempt happens every tenth of a second on a live webcam stream, missed detections are barely noticeable. Since the target environment of our product is a live setting with 20-30 frames per second, we are confident that our models have practical use as they are.

We were surprised that the detection time of the deep learning model was similar to that of the computer vision model. This is most likely because the deep learning detections ran on the GPU, whereas the computer vision algorithm ran on the CPU. As modern object detection algorithms grow more robust, they will likely require more computational power, which will increase detection time; for our purposes, though, our deep learning model was fast and consistent.

We expected the deep learning model to be more accurate and precise in its detections, and the data confirmed this: the deep learning model was significantly more accurate than the standard computer vision algorithm. Our definition of accuracy is based on the error between the detection's location and the true location; however, this does not mean the deep learning model is always practically better. For example, the error of a detection can be high (accuracy low) while the gun is still included in the bounding box, in which case the detection remains useful. For practical purposes, accuracy is not the highest priority when choosing an algorithm. Instead, the detection rate is most useful, because the goal is to detect all of the firearms present at a given time. Although the mixed model was slightly less accurate in its predictions and slightly slower, its detection rate was 10 percentage points higher than the deep learning model's; this means our mixed model detected firearms that the other algorithms alone could not. For this reason, the mixed model is what we would recommend to potential clients. In addition, we could remove the two-step verification to opt for higher speed, or add stricter detection parameters to reduce false detections. Such changes are easy to make in the mixed model, unlike in the other models.

8 Real-Life Applications

Helping businesses, schools, and institutions handle armed intruders is the most direct application of weapon detection, but other applications could also benefit from a mixed CV and deep learning algorithm. In the Air Force, Unmanned Aerial Vehicles (UAVs) run vision, path-finding, and other algorithms for localization and obstacle avoidance. Where computing power is limited, as on a small robot, our mixed approach is well suited to optimizing the performance of the vision model.

9 Future Plans

There are still improvements we can make to both the deep learning and computer vision models.
For example, for the deep learning model, we trained on only 600 images from a particular environment. To make our model stronger and more resilient to changing environments, we would need to collect and train on more data, preferably at least one thousand images. For the computer vision model, we could experiment with stricter RGB segmentation. Segmenting RGB values is hard because most objects have components in all three color channels, but using a multivariate model, we could segment out color ranges that do not include weapons. In addition, we could try skeleton detection, where we condense the weapon into line segments and use a pattern-matching algorithm to determine whether the shape is a weapon. As the computer vision and deep learning models improve, our mixed model will improve. We could also run our experiment on different hardware to see the advantages and disadvantages of each algorithm more clearly.

10 Algorithms: A Deep Dive into Our Model Development

10.1 Computer Vision Model

Goal: Quickly detect weapons with OpenCV using minimal computational power.

How it Works: We run each webcam frame through a series of filters: HSV (hue, saturation, value) segmentation, RGB segmentation, blur, dilation, a morphology transformation, a Gaussian blur, and contour detection. The HSV and RGB segmentations are both done on the original image, and their masks are layered on each other to accumulate the pixels we think belong to the weapon. The blur helps reduce the noise that the RGB and HSV masks produce. After the blur, we apply a dilation and a morphology transformation to fill in the areas of the weapon so there are no holes in the shape. We then use a Gaussian blur to soften the sharp edges that can throw off the detection. After all these filters, we run contour detection to put a shape around the gun.

We initially noticed that this system produced many errors, as there were many other small objects in our frames that shared characteristics with our gun in color and brightness. So we created a method that eliminates anomalous detections through an area threshold: if a detection's area is far too large or far too small, it is ignored. Practically speaking, the area changes with the object's distance from the camera, but we only excluded the extremes of the areas being detected.

Figure 7: Left to Right: Original, RGB, Dilation, HSV, Blur, Closing
Figure 8: Left to Right: Gaussian Blur, Contour

10.2 Deep Learning Model

Goal: Utilize the computer's GPU and the TensorFlow library to accurately identify and detect the toy firearm.

Process: We began by collecting 600 images of our gun against different backgrounds for training our model. We used VGG Image Annotator to mark boxes where the gun was present. We then stored these annotations in a .csv file, which recorded the minimum x-coordinate of each box, the minimum y-coordinate, and the box's width and height. The .csv file had some extra unnecessary characters, which we filtered out in the Vim editor in a MinGW64 command prompt.

In Excel, we opened our .csv file and, using the minimum x- and y-coordinates of each annotation, added the box's width and height to compute the remaining corner coordinates of each box; a scripted equivalent of this step is sketched below.
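For reference, the same corner computation can be done in a few lines of pandas instead of Excel. This is a minimal sketch under the assumption that the annotations have already been flattened into filename, x, y, width, and height columns; the column and file names are illustrative, since VGG Image Annotator's raw export is shaped differently.

import pandas as pd

# Hypothetical flattened annotation file with columns: filename, x, y, width, height.
df = pd.read_csv("annotations.csv")

# Convert top-left corner plus width/height into the (xmin, ymin, xmax, ymax)
# corner format expected downstream by the TFRecord generation script.
df["xmin"] = df["x"]
df["ymin"] = df["y"]
df["xmax"] = df["x"] + df["width"]
df["ymax"] = df["y"] + df["height"]
df["class"] = "weapon"  # our detector has a single class

df[["filename", "xmin", "ymin", "xmax", "ymax", "class"]].to_csv(
    "train_labels.csv", index=False)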
Because TensorFlow does not read .csv files, but rather .records files (a format for storing a sequence of binary records), we used a Python script built on the pandas package to scan the .csv file and write the annotations into a new file. This file contained the .csv data stored as a .records file. These records were used to adjust the weights and biases of our model, the 'ssd_mobilenet_v1_coco' model. A Python script in the TensorFlow library, train.py, accessed the model and the .records file, and used the annotations to repeatedly run data through the model, adjusting the weights and biases to decrease the model's loss. The longer we ran train.py, the lower the loss function became. The weights and biases found by training were stored as a checkpoint in our model directory. The checkpoints were then used by export_inference_graph.py, which created a frozen graph that froze the weights and biases in place. Our detection script, deep_learning_detection.py, uses the frozen graph from export_inference_graph.py to analyze new images and determine where a weapon may be located.

Figure 9: Training Workflow of Deep Learning and the Scripts that were Used

Model Choice: We chose the MobileNet Single Shot Detector v1 framework from the TensorFlow Model Zoo, an open-source library by Google that stores models for object detection. MobileNet is an architecture created by Google to run on low-power hardware. A Single Shot Detector detects objects in a single pass over an image, instead of earlier approaches, like the sliding-window approach, that evaluate many regions per image. This model was already pre-trained on the Common Objects in Context (COCO) dataset. We can build on this model because, when creating a custom object detector, the training process only changes the last few layers of the model. The model already has an understanding of basic features like edges, contours, and shapes in its early layers; the training we do allows the model to differentiate our weapon from every other object.

Results: Our deep learning model was very effective at identifying our experimental weapon. The most impressive aspect of our model, however, was that it could accurately detect new guns it had never been trained on. This ability to generalize is an advantage of deep learning in general, and we were able to see it for ourselves.

Figure 10: Detection on New Weapon

10.3 Deep Learning Model Iterations

Model 1 - Agent Carter image extraction. We wanted to use movie weapons because there are a lot of them, but there were too many environmental factors in the images.
Model 2 - 200 images of one weapon. The detections were erratic because there was not enough data.
Model 3 - 300 images, two backdrops (batch size = 20, 300x300 input). The results were better, but faces were being detected, so we added more data with faces.
Model 4 - 400 images, three backdrops (batch size = 22, 480x260 input to maintain aspect ratio). Faces were still being detected.
Model 5 - 650 images with shifted perspective (batch size = 23, anchor aspect ratios chosen to fit the ratio of a weapon). Faces were detected less often, and the detections were good overall.

Table 5: Iterations of Deep Learning Model

Figure 11: Model 1 Output

This model was overfit to our data, so it was predicting more than it was supposed to. We fixed this later by including more data in training.
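The batch sizes and input dimensions listed above live in the model's pipeline .config file. The excerpt below is an illustrative sketch using the TensorFlow Object Detection API's config format, with values mirroring iteration 4; it is heavily trimmed, and the checkpoint path is a placeholder rather than our actual directory layout.

model {
  ssd {
    num_classes: 1
    image_resizer {
      fixed_shape_resizer {
        height: 260
        width: 480
      }
    }
  }
}
train_config {
  batch_size: 22
  fine_tune_checkpoint: "ssd_mobilenet_v1_coco/model.ckpt"  # placeholder path
}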
Figure 12: Example TensorBoard Output

10.4 Mixed Model: Computer Vision and Deep Learning

Goal: Incorporate our CV model into our deep learning model to make it faster and less computationally demanding while still maintaining a high detection rate.

Process: To mix our OpenCV algorithm and our deep learning model, we decided to begin detection with OpenCV and back it up with deep learning. We began by running our OpenCV algorithm as described earlier. Any detections with very small or extremely large areas were ignored; the computer vision algorithm was the first and broadest layer of detection.

If a detection's area was in the correct range, our algorithm would initiate the deep learning model described previously in our Algorithms section to verify it. Once deep learning detected an object, its predicted annotation was compared with the OpenCV detection. If these detections were close to each other (within 400 pixels), the algorithm would show the detection; this is how deep learning was combined with OpenCV to detect objects. If CV did not detect an object in a viable area, the deep learning model alone was used every 300 ms to detect the object.

To reduce this model's computational load, we could easily remove the two-step verification process or run the deep learning model more sparsely, relying instead on just the computer vision algorithm or just the deep learning algorithm. This flexibility is why we believe the mixed algorithm is a better fit for consumers and clients.

Results: The mixed model detected more guns than our deep learning model. We noticed far fewer false detections thanks to the two-step verification process. The deep learning model functioned as a default case when the computer vision algorithm was not detecting the weapon; we saw this happen when we tested the mixed model on a new weapon.

10.5 Model Flowcharts

Figure 13: Flowchart of Mixed Model

11 Conclusion

Our hypothesis was correct: our mixed model is better suited for object detection than the deep learning model or the OpenCV object detection algorithm alone. This is because, in the mixed model, OpenCV and the deep learning model make up for each other's deficiencies. Our analysis shows that our mixed model is the best way to detect weapons reliably and inexpensively using existing video-capturing infrastructure, due to its high detection rate and the flexibility to optimize its parameters. The interplay between the deep learning and computer vision detection algorithms made object detection much more reliable than either of the individual algorithms by itself.

We believe we could improve our mixed model in many ways. For example, the size of the training set may have kept our model from reaching its maximum potential. We currently have a training set of 600 images, but we plan to increase this number to thousands in the future, which would allow our model to become much stronger and more accurate. We also plan to extend the model to detect objects other than firearms, such as knives and other weapons. Overall, we would recommend our mixed algorithm over either of the other models alone.
In the future, we plan to increase our training set size, improve our computer vision and deep learning models (and consequently our mixed model), and test our models on different computing setups to see the advantages and disadvantages of each algorithm. Developing accessible security technology can help minimize the potential loss in an active shooting incident, and we hope to continue our research in the future.

12 Appendix A: Our Installation Process

1. Setting up Python and PyCharm
To use TensorFlow or OpenCV, we installed Python. On Windows 10, TensorFlow is only compatible with Python versions 3.5-3.7. We installed Python 3.6.4 from python.org/downloads/. In addition to Python, we installed an IDE to make developing Python scripts easier. We chose PyCharm, a professional IDE from JetBrains, using student licenses that we obtained for free. We installed it from https://www.jetbrains.com/pycharm/download/.

2. Setting up a Python Virtual Environment
To use TensorFlow and OpenCV in our Python project, we installed many dependencies that we configured to specific paths. These paths and libraries are stored in our environment variables as well as in a virtual environment (venv).

3. Setting up TensorFlow
TensorFlow is a library provided by Google for evaluating large amounts of data stored in multi-dimensional arrays to make a decision, or in our case, a detection. We started by installing the TensorFlow Object Detection Models repository from GitHub, and we installed the TensorFlow library with GPU support in the Python virtual environment using the command pip install tensorflow-gpu==<version> (we used version 1.8.0 after rigorous testing). We also downloaded the SSD MobileNet v1 framework for our model from the TensorFlow Model Zoo repository.

4. Installing Nvidia Drivers and GPU Libraries
We used an HP OMEN 15 laptop with an RTX 2060 and an Intel i7. To use the GPU in TensorFlow, we had to install specific Nvidia drivers: the drivers themselves, the CUDA Toolkit, and cuDNN. All of the installers can be found on the Nvidia website. We originally installed CUDA Toolkit 10.0 with cuDNN 7.4, but we constantly had compatibility errors in our environment, so after extensive research we finally settled on CUDA Toolkit 9.0 with cuDNN 7.0.5.

5. Setting up Path Variables
When we ran our program, Python needed access to the CUDA drivers. To provide it, we created variables for the drivers and packages in the 'Edit environment variables' setting on Windows. There, we created system variables with the dedicated path for each file we wanted our program to recognize.

6. Installing the TensorFlow Object Detection API, a Model Trained on the COCO Dataset, and the COCO API
To run object detection scripts in Python, we needed the TensorFlow Object Detection API library on GitHub. This library is an open-source framework that makes it easier to construct and train models. We cloned it using Git and often worked in its research folder. The framework for our model came from another GitHub repository, the TensorFlow Model Zoo. This repository contains many different models, and we chose the Single Shot Detection framework model. For this model to run, we also had to pip install the COCO API (pycocotools) in our Python virtual environment.
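As a quick sanity check after these steps, a short script can confirm that TensorFlow actually sees the GPU. This is a sketch against the TensorFlow 1.x API (matching the 1.8.0 version we installed); newer TensorFlow 2.x releases replace these calls.

import tensorflow as tf

# True only if the CUDA/cuDNN stack is installed and visible (TF 1.x API).
print(tf.test.is_gpu_available())

# The device list should include an entry such as "/device:GPU:0".
from tensorflow.python.client import device_lib
print([d.name for d in device_lib.list_local_devices()])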
13 Appendix B: The Code

Sampling Images from Live Webcam Script

### Author: Brandon Vu
### Sampling Images from Live Webcam Script
import cv2
import time

cap = cv2.VideoCapture(0)
name = 515
time0 = int(round(time.time() * 1000))
while True:
    _, frame = cap.read()
    cv2.imshow("Frame", frame)
    org = frame.copy()
    key = cv2.waitKey(1)
    difference = int(round(time.time() * 1000)) - time0
    if difference > 250:
        time0 = int(round(time.time() * 1000))
        # save picture
        print("saved to: " + "F:\\ims3\\" + "color_" + str(name) + '.jpg')
        cv2.imwrite("F:\\ims3\\" + "color_" + str(name) + '.jpg', org)
        name = name + 1
    if key & 0xFF == ord('q'):  # press q to stop sampling
        break
cap.release()
cv2.destroyAllWindows()

Computer Vision Algorithm

### Author: Brandon Vu
import numpy as np
import os, os.path
import cv2
import time
import numpy

test_metric = False

def greatest_contour(contours):
    # Return the contour with the largest area.
    max_area = cv2.contourArea(contours[0])
    index = 0
    max_index = 0
    for c in contours:
        if cv2.contourArea(c) > max_area:
            max_area = cv2.contourArea(c)
            max_index = index
        index = index + 1
    return contours[max_index]

cap = cv2.VideoCapture(0)
time1 = round(time.time() * 1000)
start = round(time.time() * 1000)
deltas = []
font = cv2.FONT_HERSHEY_COMPLEX
image_list = os.listdir('C:\\Users\\purva\\Documents\\model\\metric_test_set_1\\')
h = []
# for i in image_list:   # iterate the saved test set instead of the webcam
while True:  # len(deltas) < 700
    delta = round(time.time() * 1000) - time1
    deltas.append(delta)
    time1 = round(time.time() * 1000)
    # if test_metric:
    #     image_np = cv2.imread("C:\\Users\\purva\\Documents\\model\\metric_test_set_1\\" + i)
    # else:
    ret, image_np = cap.read()

    # HSV and RGB color segmentation, layered together.
    hsv = cv2.cvtColor(image_np, cv2.COLOR_BGR2HSV)
    rgb = cv2.cvtColor(image_np, cv2.COLOR_BGR2RGB)
    lower_hsv = np.array([21, 113, 0])
    upper_hsv = np.array([183, 118, 77])
    mask = cv2.inRange(hsv, lower_hsv, upper_hsv)
    mask_black = cv2.inRange(rgb, np.array([0, 0, 0]), np.array([30, 50, 35]))
    new = mask + mask_black
    # Noise filtering: blur, dilate, close, then smooth the edges.
    mask2 = cv2.blur(new, (4, 4))
    kernel = np.ones((5, 5), np.uint8)
    kernel_dilate = np.ones((2, 2), np.uint8)
    dilation = cv2.dilate(mask2, kernel_dilate, iterations=1)
    closing = cv2.morphologyEx(dilation, cv2.MORPH_CLOSE, kernel)
    mask1 = cv2.GaussianBlur(closing, (9, 9), 0)
    contours, _ = cv2.findContours(mask1, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    for cnt in contours:
        area = cv2.contourArea(cnt)
        approx = cv2.approxPolyDP(cnt, 0.05 * cv2.arcLength(cnt, True), True)
        x = approx.ravel()[0]
        y = approx.ravel()[1]
        if area > 2000 and area < 12000:  # area threshold rejects anomalies
            if 3 <= len(approx) <= 10:
                con = greatest_contour(contours)
                cv2.drawContours(image_np, [approx], 0, (0, 0, 0), 5)
                cv2.putText(image_np, "Weapon", (x, y), font, 0.5, (0, 0, 0))
                if test_metric:
                    # Record the detection center for the accuracy metric.
                    M = cv2.moments(greatest_contour(contours))
                    center_contour = [int(M["m10"] / M["m00"]), int(M["m01"] / M["m00"])]
                    h.append(center_contour)
    cv2.imshow('df', mask2)
    cv2.imshow('Computer Vision Detection Only', cv2.resize(image_np, (640, 480)))
    if cv2.waitKey(1) & 0xFF == ord('q'):
        cap.release()
        cv2.destroyAllWindows()
        break

total = 0
for i in deltas:
    total = total + i
average = total / len(deltas)
print("Average detection time: " + str(average))
# print("CONTOURS " + str(h))
# a = numpy.asarray(deltas)
# numpy.savetxt('C:\\Users\\purva\\Documents\\model\\CVTIME.csv', a, delimiter=",")

Deep Learning Model
import numpy as np
import os, os.path
import tensorflow as tf
import cv2
import time
import numpy

# IMPORTANT: this file was found on https://pythonprogramming.net/video-tensorflow-ob...
# and updated by Lambert Rosique for PenseeArtificielle.fr for the sole purpose of a tutorial.
# Modified and adapted by Brandon Vu.
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util

cap = cv2.VideoCapture(0)
# Paths to the exported frozen graph and the label map.
PATH_TO_CKPT = "C:\\Users\\purva\\Documents\\model\\model_dir_5\\exported\\frozen_inference_graph.pb"
PATH_TO_LABELS = "C:\\Users\\purva\\Documents\\model\\map.pbtxt"
NUM_CLASSES = 1

detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')

label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(
    label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)

def load_image_into_numpy_array(image):
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape(
        (im_height, im_width, 3)).astype(np.uint8)

def getSmallestBox(boxes):
    # Find the box with the smallest normalized area and zero out the rest.
    min = boxes[0][0][3] * boxes[0][0][2]
    index = 0
    for i in range(len(boxes[0])):
        if (boxes[0][i][3] * boxes[0][i][2]) < min:
            min = (boxes[0][i][3] * boxes[0][i][2])
            index = i
    second = 0
    for i in range(len(boxes[0])):
        if i != index:
            if (boxes[0][i][3] * boxes[0][i][2]) < min:
                min = (boxes[0][i][3] * boxes[0][i][2])
                second = i
    for j in range(len(boxes[0])):
        if j != index:
            boxes[0][j][0] = 0
            boxes[0][j][1] = 0
            boxes[0][j][2] = 0
            boxes[0][j][3] = 0
    return boxes[0][second]

def getSecondBig(boxes):
    # Find the box with the second-largest normalized area.
    max = boxes[0][0][3] * boxes[0][0][2]
    index = 0
    for i in range(len(boxes[0])):
        if (boxes[0][i][3] * boxes[0][i][2]) > max:
            max = (boxes[0][i][3] * boxes[0][i][2])
            index = i
    second = 0
    for i in range(len(boxes[0])):
        if i != index:
            if (boxes[0][i][3] * boxes[0][i][2]) > max:
                max = (boxes[0][i][3] * boxes[0][i][2])
                second = i
    return boxes[0][second]

def clearBoxes(boxes):
    # Keep only the two largest boxes; zero out everything else.
    max = boxes[0][0][3] * boxes[0][0][2]
    index = 0
    for i in range(len(boxes[0])):
        if (boxes[0][i][3] * boxes[0][i][2]) > max:
            max = (boxes[0][i][3] * boxes[0][i][2])
            index = i
    second = 0
    for i in range(len(boxes[0])):
        if i != index:
            if (boxes[0][i][3] * boxes[0][i][2]) > max:
                max = (boxes[0][i][3] * boxes[0][i][2])
                second = i
    for j in range(len(boxes[0])):
        if j != index and j != second:
            boxes[0][j][0] = 0
            boxes[0][j][1] = 0
            boxes[0][j][2] = 0
            boxes[0][j][3] = 0
    return boxes

with detection_graph.as_default():
    session_config = tf.ConfigProto()
    session_config.gpu_options.allow_growth = True
    # or use session_config.gpu_options.per_process_gpu_memory_fraction = 0.7
    time1 = round(time.time() * 1000)
    start = round(time.time() * 1000)
    deltas = []
    font = cv2.FONT_HERSHEY_COMPLEX
    image_list = os.listdir('C:\\Users\\purva\\Documents\\model\\metric_test_set_1\\')
    h = []
    with tf.Session(graph=detection_graph, config=session_config) as sess:
        while True:  # len(deltas) < 700
            delta = round(time.time() * 1000) - time1
            deltas.append(delta)
            time1 = round(time.time() * 1000)
            ret, image_np = cap.read()
            # image_np = cv2.imread("C:\\Users\\purva\\Documents\\model\\metric_test_set_1\\" + i)
            image_np_expanded = np.expand_dims(image_np, axis=0)
            image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
            boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
            scores = detection_graph.get_tensor_by_name('detection_scores:0')
            classes = detection_graph.get_tensor_by_name('detection_classes:0')
            num_detections = detection_graph.get_tensor_by_name('num_detections:0')
            (boxes, scores, classes, num_detections) = sess.run(
                [boxes, scores, classes, num_detections],
                feed_dict={image_tensor: image_np_expanded})
            # if num_detections >= 1:
            #     playsound('C:\\Users\\purva\\Documents\\model\\siren.wav')
            # Visualization of the results of a detection.
            h.append(getSecondBig(boxes))
            vis_util.visualize_boxes_and_labels_on_image_array(
                image_np,
                np.squeeze(boxes),
                np.squeeze(classes).astype(np.int32),
                np.squeeze(scores),
                category_index,
                use_normalized_coordinates=True,
                line_thickness=12)
            # cv2.imwrite('C:\\Users\\purva\\Documents\\model\\metric_test_set_3\\results_deep
            cv2.imshow('Deep Learning Object Detection', cv2.resize(image_np, (640, 480)))
            if cv2.waitKey(1) & 0xFF == ord('q'):
                cap.release()
                cv2.destroyAllWindows()
                break

total = 0
for i in deltas:
    total = total + i
average = total / len(deltas)
print(str(average))
# print("CONTOURS " + str(h))
a = numpy.asarray(deltas)
# numpy.savetxt('C:\\Users\\purva\\Documents\\model\\DL_Time.csv', a, delimiter=",")

Mixed Algorithm Code

import numpy as np
import os, os.path
import tensorflow as tf
import cv2
import time
import numpy

### Adapted from the deep learning script.
### All of the integration of computer vision is developed by Brandon Vu.

from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util

cap = cv2.VideoCapture(0)
PATH_TO_CKPT = "C:\\Users\\purva\\Documents\\model\\model_dir_5\\exported\\frozen_inference_graph.pb"
PATH_TO_LABELS = "C:\\Users\\purva\\Documents\\model\\map.pbtxt"
NUM_CLASSES = 1

detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')

label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(
    label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)

def CVErrorDL(CVcontours, Dlcontours):
    # Two-step verification: compare every CV contour center against every
    # deep learning box center and accept the first pair that is close enough.
    err = []
    returnArray = []
    for i in range(len(CVcontours)):
        M = cv2.moments(CVcontours[i])
        cv = [int(M["m10"] / M["m00"]), int(M["m01"] / M["m00"])]
        for j in range(len(Dlcontours[0])):
            dl = [((Dlcontours[0][j][0] + Dlcontours[0][j][3]) / 2),
                  ((Dlcontours[0][j][1] + Dlcontours[0][j][2]) / 2)]
            err.append(getDistance(cv, dl))
            returnArray.append([i, j])  # remember which (contour, box) pair
    underFifty = 0
    found = False
    for k in range(len(err)):
        if err[k] < 180:  # distance threshold for agreement
            found = True
            underFifty = k
            break
    if found is False:
        return False, CVcontours[0], Dlcontours[0]
    else:
        returnCV = returnArray[underFifty][0]
        returnDL = returnArray[underFifty][1]
        return True, CVcontours[returnCV], Dlcontours[0][returnDL]

def greatest_contour(contours):
    # Return the contour with the largest area.
    max_area = cv2.contourArea(contours[0])
    index = 0
    max_index = 0
    for c in contours:
        if cv2.contourArea(c) > max_area:
            max_area = cv2.contourArea(c)
            max_index = index
        index = index + 1
    return contours[max_index]

def load_image_into_numpy_array(image):
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape(
        (im_height, im_width, 3)).astype(np.uint8)

def getDistance(center1, center2):
    # Euclidean distance between two (x, y) centers.
    x = (center1[0] - center2[0]) * (center1[0] - center2[0])
    y = (center1[1] - center2[1]) * (center1[1] - center2[1])
    return (x + y) ** 0.5

def getDistanceBoxClosestToPoint(boxes, center):
    # Distance from `center` to the nearest deep learning box center.
    min = getDistance([((boxes[0][0][0] + boxes[0][0][3]) / 2),
                       ((boxes[0][0][1] + boxes[0][0][2]) / 2)], center)
    for i in range(len(boxes[0])):
        d = getDistance([((boxes[0][i][0] + boxes[0][i][3]) / 2),
                         ((boxes[0][i][1] + boxes[0][i][2]) / 2)], center)
        if d < min:
            min = d
    return min

def getBoxClosestToPoint(boxes, center):
    # Return the deep learning box whose center is nearest to `center`.
    min = getDistance([int((boxes[0][0][0] + boxes[0][0][3]) / 2),
                       int((boxes[0][0][1] + boxes[0][0][2]) / 2)], center)
    index = 0
    for i in range(len(boxes[0])):
        d = getDistance([int((boxes[0][i][0] + boxes[0][i][3]) / 2),
                         int((boxes[0][i][1] + boxes[0][i][2]) / 2)], center)
        if d < min:
            min = d
            index = i
    return boxes[0][index]

with detection_graph.as_default():
    session_config = tf.ConfigProto()
    session_config.gpu_options.allow_growth = True
    # or use session_config.gpu_options.per_process_gpu_memory_fraction = 0.7
    time1 = round(time.time() * 1000)
    start = round(time.time() * 1000)
    deltas = []
    timeDeep = time.time() * 1000
    first = True
    font = cv2.FONT_HERSHEY_COMPLEX
    cv_check = False
    text = "0"
    center_contour = 0
    image_list = os.listdir('C:\\Users\\purva\\Documents\\model\\metric_test_set_1\\Data\\')
    with tf.Session(graph=detection_graph, config=session_config) as sess:
        while True:  # len(deltas) <= 700
            delta = round(time.time() * 1000) - time1
            deltas.append(delta)
            time1 = round(time.time() * 1000)
            ret, image_np = cap.read()
            write_cv = True
            # raw = image_np.copy()
            # image_np = cv2.imread("C:\\Users\\purva\\Documents\\model\\metric_test_set_1\\Data\\" + i)

            # Stage 1: the computer vision pipeline (same filters as the CV script).
            hsv = cv2.cvtColor(image_np, cv2.COLOR_BGR2HSV)
            rgb = cv2.cvtColor(image_np, cv2.COLOR_BGR2RGB)
            lower_hsv = np.array([21, 113, 0])
            upper_hsv = np.array([183, 118, 77])
            mask = cv2.inRange(hsv, lower_hsv, upper_hsv)
            mask_black = cv2.inRange(rgb, np.array([0, 0, 0]), np.array([30, 50, 35]))
            new = mask + mask_black
            mask2 = cv2.blur(new, (4, 4))
            kernel = np.ones((5, 5), np.uint8)
            kernel_dilate = np.ones((2, 2), np.uint8)
            dilation = cv2.dilate(mask2, kernel_dilate, iterations=1)
            closing = cv2.morphologyEx(dilation, cv2.MORPH_CLOSE, kernel)
            mask1 = cv2.GaussianBlur(closing, (9, 9), 0)
            contours, _ = cv2.findContours(mask1, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
            approx = None
            for cnt in contours:
                area = cv2.contourArea(cnt)
                M = cv2.moments(greatest_contour(contours))
                center_contour = [int(M["m10"] / M["m00"]), int(M["m01"] / M["m00"])]
                approx = cv2.approxPolyDP(cnt, 0.05 * cv2.arcLength(cnt, True), True)
                x = approx.ravel()[0]
                y = approx.ravel()[1]
                if area > 2000 and area < 10000:
                    if 3 <= len(approx) <= 6:
                        cv_check = True
                        # cv2.drawContours(image_np, [approx], 0, (0, 0, 0), 5)
                        break
                    else:
                        cv_check = False
                else:
                    cv_check = False

            # Stage 2: verify the CV detection with the deep learning model.
            if cv_check is True or first is True:
                first = False
                image_np_expanded = np.expand_dims(image_np, axis=0)
                image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
                boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
                scores = detection_graph.get_tensor_by_name('detection_scores:0')
                classes = detection_graph.get_tensor_by_name('detection_classes:0')
                num_detections = detection_graph.get_tensor_by_name('num_detections:0')
                (boxes, scores, classes, num_detections) = sess.run(
                    [boxes, scores, classes, num_detections],
                    feed_dict={image_tensor: image_np_expanded})
                # if num_detections >= 1:
                #     playsound('C:\\Users\\purva\\Documents\\model\\siren.wav')
                found, contour, box = CVErrorDL(contours, boxes)
                box1 = [[box]]
                print(found)
                if found:
                    write_cv = False
                    vis_util.visualize_boxes_and_labels_on_image_array(
                        image_np,
                        np.squeeze(box1),
                        np.squeeze(classes).astype(np.int32),
                        np.squeeze(scores),
                        category_index,
                        use_normalized_coordinates=True,
                        line_thickness=12)
                    text = "Mixed Algorithm"
                else:
                    cv_check = False

            # Fallback: if CV found nothing viable, run deep learning alone every 300 ms.
            if cv_check is False and (time.time() * 1000 - timeDeep) > 300:
                text = "Deep Learning"
                timeDeep = time.time() * 1000
                image_np_expanded = np.expand_dims(image_np, axis=0)
                image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
                boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
                scores = detection_graph.get_tensor_by_name('detection_scores:0')
                classes = detection_graph.get_tensor_by_name('detection_classes:0')
                num_detections = detection_graph.get_tensor_by_name('num_detections:0')
                (boxes, scores, classes, num_detections) = sess.run(
                    [boxes, scores, classes, num_detections],
                    feed_dict={image_tensor: image_np_expanded})
                # if num_detections >= 1:
                #     playsound('C:\\Users\\purva\\Documents\\model\\siren.wav')
                vis_util.visualize_boxes_and_labels_on_image_array(
                    image_np,
                    np.squeeze(boxes),
                    np.squeeze(classes).astype(np.int32),
                    np.squeeze(scores),
                    category_index,
                    use_normalized_coordinates=True,
                    line_thickness=12)

            cv2.putText(image_np, text, (0, 350), font, 1, (0, 0, 255))
            # cv2.imwrite('C:\\Users\\purva\\Documents\\model\\mixed_4_results\\' + '_' + str(
            cv2.imshow('Mixed Detection', cv2.resize(image_np, (640, 480)))
            if cv2.waitKey(1) & 0xFF == ord('q'):
                cap.release()
                cv2.destroyAllWindows()
                break
total = 0
for i in deltas:
    total = total + i
# a = numpy.asarray(deltas)
# numpy.savetxt('C:\\Users\\purva\\Documents\\model\\deltas_mixed_detection_v1
# average = total / len(deltas)
# print(str(average))

14 References

[1] https://www.cnn.com/interactive/2019/07/us/ten-years-of-school-shootings-trnd/
[2] https://www.cdc.gov/nchs/fastats/homicide.htm
[3] https://athena-security.com/

15 About the Authors

Brandon and Yash were both lead programmers on their FIRST Tech Challenge robotics team. They both found a passion for artificial intelligence and public safety and decided to complete a research project on the topic. Brandon Vu, a senior, interned at the Johns Hopkins Applied Physics Lab last summer, where he worked on autonomous boat object detection and learned the process of training deep learning models. Although all of the code and documentation he produced during the summer were proprietary, he was able to share the skills he learned with Yash and apply them to this new project. Yash Prabhu, a ninth grader, was responsible for the graphical user interface, annotations, and hardware used in this experiment.