Variable Quantization of Ensemble Networks for Semantic Segmentation

Introduction

Advances in the field of Artificial Intelligence, especially Deep Learning, have positively impacted countless industries, from chatbots to autonomous driving, and this wave shows no signs of stopping. As deep learning applications penetrate further into everyday life, there is a growing need to make them mobile: devices with limited processing power, storage and memory will receive increased focus in the coming years. Accordingly, more effort is being directed at efficient, cost-effective techniques for both the training and inference stages of deep learning applications. In high-risk tasks such as autonomous driving, extremely high accuracy is essential, but it is equally necessary that the systems involved make predictions quickly, without compromising that accuracy.

Our work proposes a set of techniques to make the task of semantic segmentation more efficient. We propose a qualitative, value-based ranking scheme for the classes in popular autonomous driving datasets, which enables us to segregate classes into groups of not more than 5 classes each. By aggregating the networks trained on these groups into an ensemble, we build a system of efficient, low-power networks that delivers a substantial speedup in inference time while maintaining accuracy comparable to existing state-of-the-art semantic segmentation models such as PSPNet or DeepLabv3+.

Proposed Method

Semantic segmentation is a computer vision problem that requires classifying each pixel of a given test image, as opposed to assigning a single label to the entire image as in image classification. It is a critical component of autonomous navigation and robotic systems. Compared to tasks like object detection or face recognition, semantic segmentation is much more computationally expensive, given the requirement to classify every pixel of an image. Training on higher-resolution images naturally gives better accuracy due to the sharper definition of features, but it also dramatically increases computation time. To achieve an accuracy acceptable in high-risk scenarios, extremely large models must be used. This comes at the cost of very high training cost and time, in computation as well as storage, and is accompanied by a similarly intensive cost during inference.

The proposed method combines the popular techniques of quantization and ensemble models to build a system that achieves high accuracy at a significantly lower cost in memory as well as time. Traditional networks use full-precision (32-bit) floating point parameters, which can represent a large range of values but at the cost of high storage and expensive calculations. However, it has been shown, at least in the case of object detection, that 8-bit networks achieve similar accuracies at much lower computational and storage cost than their 32-bit counterparts. Our method aims to maximize the time and memory savings by employing variable quantization of networks: quantizing networks to varying degrees depending on the importance of the data each network is trained on.
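To make "varying degrees of quantization" concrete, the sketch below simulates uniform symmetric quantization of a weight tensor at several bit-widths. It is a minimal illustration under our own assumptions (a random tensor, a single per-tensor scale, example bit-widths); a real deployment would use per-channel scales and integer kernels rather than dequantized floats.

```python
import torch

def quantize_uniform(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Uniform symmetric quantization of a weight tensor to `bits` bits.

    Minimal sketch: round to the nearest representable level, then
    dequantize so the error at each bit-width can be inspected in fp32.
    """
    qmax = 2 ** (bits - 1) - 1                     # e.g. 127 for 8 bits, 7 for 4 bits
    scale = w.abs().max() / qmax                   # single per-tensor scale (assumption)
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q * scale                               # dequantize for simulation

w = torch.randn(64, 64)
for bits in (8, 4, 2):
    err = (w - quantize_uniform(w, bits)).abs().mean()
    print(f"{bits}-bit mean abs error: {err:.4f}")
```

Running this shows the error growing as the bit-width shrinks, which is exactly the accuracy-versus-resources trade-off our class ranking is designed to manage.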
The reasoning behind this idea is that in certain high-risk tasks such as autonomous driving or robotic navigation, some classes in a dataset can be assigned more value than others. Our approach focuses on prioritising the safety of living individuals and animals in an autonomous driving scenario. When prioritising lives, it makes sense to obtain the predictions for the classes that represent living things earlier than other predictions, and with higher accuracy. Following this line of thought, we group classes based on their ranking and then train a single quantized network on each group. The degree of quantization of a network depends on the position of its classes in the ranking, with lower-ranked classes used to train networks with fewer bits. The idea is that we can afford to lose some accuracy when segmenting less important classes, while assigning more important classes to slightly heavier networks to ensure higher accuracy. To improve per-class accuracy, each network is trained on a limited number of classes, and multiple such networks are used in an ensemble for predictions over all classes. Additionally, depending on the level of quantization used, the loss in accuracy can be very high. For such cases, we propose training two or more models for the same group, using the popular technique of boosting.

Class Ranking

Since quantizing a single large network that runs on the entire dataset would not produce any noticeable benefit, we shifted our focus to the dataset itself. In other words, we partition the dataset into groups of not more than 5 classes each. This ensures that each quantized network only needs sufficient representational capability for 5 or fewer classes, so high accuracy remains achievable. The selection of classes is an important aspect of this entire process. Our selection is based on a qualitative analysis of each class: the more valuable a class is for ensuring the safety of living things, the more importance we attribute to it. Consequently, classes that directly represent living things are given the highest priority, so that the networks trained on these classes receive the greatest representational capability. Likewise, classes that do not directly represent living things but are nevertheless closely tied to them, such as roads and motorcycles, are also given higher importance. Although our ranking mechanism is admittedly subjective, and can be modified based on other qualitative parameters, we believe our approach represents a step forward in the selective training of classes for inference devices with strict limits on resources such as memory or battery life.

Variable bit-length representations

While dividing the dataset into several groups certainly helps improve accuracy on quantized networks, not all classes are equally important or need an equal amount of representational power. For example, among the Cityscapes dataset classes, sky and vegetation are not nearly as important as road and car. Hence, we assign each class a value based on how important it is (on a qualitative basis). It then makes sense to group classes with similar values, since this allows us to assign groups to networks based on their overall value.
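As a hedged illustration of this grouping step, the snippet below ranks a handful of Cityscapes classes by a hypothetical importance value, splits them into groups of at most five, and maps each group to a candidate bit-width. The specific values and widths here are placeholders for exposition, not the ones used in our experiments.

```python
# Hypothetical importance values for a subset of Cityscapes classes;
# the actual values come from our qualitative ranking.
class_value = {
    "person": 10, "rider": 10, "car": 9, "road": 9, "motorcycle": 8,
    "bicycle": 8, "traffic light": 6, "sidewalk": 6, "building": 4,
    "pole": 3, "vegetation": 2, "sky": 1,
}

def build_groups(values, group_size=5):
    """Sort classes by importance and split into groups of <= group_size."""
    ranked = sorted(values, key=values.get, reverse=True)
    return [ranked[i:i + group_size] for i in range(0, len(ranked), group_size)]

def assign_bits(groups, widths=(8, 4, 2)):
    """Give higher-value groups wider networks; clamp at the narrowest width."""
    return {tuple(g): widths[min(i, len(widths) - 1)] for i, g in enumerate(groups)}

groups = build_groups(class_value)
for group, bits in assign_bits(groups).items():
    print(f"{bits}-bit network for classes: {', '.join(group)}")
```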
Elaborating on the above, we decide the level of quantization of a network based on the overall value of the group of classes it will be trained on. This way, we save memory and computation on low-value groups and redirect those resources to higher-value groups, which need better representational power. [Table omitted: the importance values assigned to each class of the Cityscapes dataset.]

Ensemble Networks

Ensemble-based techniques can greatly improve accuracy when one has a collection of weak classifiers. Two common methods for creating accurate ensemble systems, proposed several years ago, are bagging and boosting. Bagging helps reduce variance, whereas boosting can reduce both bias and variance. While ensemble methods were very popular and useful in improving the performance of systems such as random forests and decision trees, the advent of CNNs led to a decline in their use: since CNNs are already very strong classifiers, ensembling them yields little benefit while increasing computational complexity. Our idea, however, is based on quantized networks, which are weaker than full-precision networks. In this situation, we have observed that applying ensemble techniques to a collection of variably quantized networks has a positive effect on accuracy, compared to training a single quantized network. There has been relatively little effort to combine the merits of neural network quantization and ensemble techniques to build strong classifiers. One prior work proposed binarizing networks and then using multiple such binary nets in an ensemble for the task of image classification. To the best of our knowledge, no other works explore this specific combination on other computer vision tasks.

Quantization of Neural Networks

Deep Neural Networks (DNNs) place a huge load on memory and battery resources during training and inference. Quantization is a highly effective method for combating these high memory and power requirements. Several papers have demonstrated that quantized neural networks can perform comparably to their full-precision counterparts. While this has been demonstrated in practice for simpler tasks like object detection and image classification by a large number of research works, far fewer works apply quantization to semantic segmentation. This is because of the higher representational ability required of a network trained for semantic segmentation: a single low-bit network would not be capable of accurately segmenting images containing objects of many classes. This is where our idea of attributing importance to classes comes in. By prioritising important classes, training them on higher-bit networks, the resultant ensemble of low-bit networks segments the classes of higher importance with better accuracy. This method enables a good balance between accuracy and resource use.

1. 8-bit networks

Past experiments have shown that directly quantizing a model trained in floating point to 8 bits, without any re-training, causes only a relatively small loss in accuracy, which can be further reduced with some fine-tuning. However, there exist works showing that in certain cases this approach (post-training quantization) does not preserve accuracy. This has been noticed in the case of smaller models such as MobileNet, which are already designed with efficiency in mind and hence possess a smaller representational capacity. In such cases, instead of post-training quantization, it is better to perform quantization-aware training.
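To illustrate the direct 8-bit path described at the start of this subsection, the sketch below runs PyTorch's eager-mode post-training static quantization on a toy convolutional head. The model, calibration data, and backend choice are our own assumptions; it stands in for quantizing one per-group segmentation network, not for our full pipeline.

```python
import torch
import torch.nn as nn

class TinySegHead(nn.Module):
    """Toy stand-in for one per-group segmentation network (hypothetical)."""
    def __init__(self, num_classes=5):
        super().__init__()
        self.quant = torch.ao.quantization.QuantStub()     # fp32 -> int8 boundary
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(16, num_classes, 1)
        self.dequant = torch.ao.quantization.DeQuantStub() # int8 -> fp32 logits

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.conv1(x))
        x = self.conv2(x)
        return self.dequant(x)

model = TinySegHead().eval()
model.qconfig = torch.ao.quantization.get_default_qconfig("fbgemm")
prepared = torch.ao.quantization.prepare(model)

# Calibrate the observers with a few representative images (random here).
with torch.no_grad():
    for _ in range(8):
        prepared(torch.randn(1, 3, 128, 128))

int8_model = torch.ao.quantization.convert(prepared)  # weights are now int8
```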
2. 4-bit and lower

As opposed to 8-bit networks, quantizing to 4 bits or lower causes a much greater degradation in accuracy, given the large drop in representational capacity. In such situations it is especially helpful to use techniques beyond post-training quantization. We list the techniques we used below.

a. Skip quantization of the first and last layers
It has been shown that the first convolutional layer is more sensitive to quantization. Additionally, the first and last layers usually contribute relatively little to the overall computation, which justifies skipping their quantization.

b. Re-training on pre-trained weights
Experiments have shown that initializing the quantized model from the existing pre-trained weights of its full-precision counterpart leads to higher accuracy.

c. Quantization-aware training
This approach simulates the effects of quantization during the forward pass of the training phase. Backpropagation proceeds as usual, and the weights and biases are stored in full precision so that even minimal changes through gradients can be recorded. [Figure omitted: the quantization-aware training flow in the Neural Network Distiller framework.]

Results

1. Cityscapes
   a. PSPNet
   b. DeepLabv3+
2. Indian Driving Dataset (PSPNet)

Summary

Our proposed method achieves the following:
1. Comparable or higher per-class accuracy relative to a single trained model.
2. Although we use multiple models, they are quantized to varying degrees, so their combined size is still smaller than that of a single trained model.
3. Quantization greatly improves the inference time of each model, with less time taken by models of lower bit-length.

Future Improvements

FPGA devices can be custom-built to maximize the performance of deep learning models. It would be useful to test our system of quantized models on low-power FPGA devices to measure accuracy and inference time, which we expect to be significantly better than those of a single quantized model covering all classes. Additionally, while the qualitative ranking of the classes is a fundamental aspect of this paper, it may help to incorporate a data-driven approach to augment the rankings in a positive manner, as sketched below.
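As a hedged sketch of what such a data-driven augmentation might look like, the function below blends the hand-assigned class values with an inverse-frequency term computed from per-class pixel counts. The blending scheme, the `alpha` weight, and the example numbers are purely hypothetical illustrations of this future-work idea, not part of the current method.

```python
import numpy as np

def augmented_value(qualitative, pixel_freq, alpha=0.5):
    """Blend hand-assigned class values with an inverse-frequency signal.

    qualitative: per-class importance from the qualitative ranking.
    pixel_freq:  fraction of dataset pixels belonging to each class.
    Rare classes tend to be harder to segment, so the data-driven term
    nudges their rank upward. Hypothetical sketch only.
    """
    qual = np.asarray(qualitative, dtype=float)
    qual = qual / qual.max()                      # normalize to [0, 1]
    rarity = -np.log(np.asarray(pixel_freq))      # rarer class -> larger term
    rarity = rarity / rarity.max()                # normalize to [0, 1]
    return alpha * qual + (1.0 - alpha) * rarity

# Example: three classes with importance 10, 8, 2 and pixel shares 1%, 20%, 30%.
print(augmented_value([10, 8, 2], [0.01, 0.20, 0.30]))
```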