CSIE7435 Project 1 Report: SimpleNN Environment Setup and Execution

Hee Young Jung (t08303135)

1 Running Environments and Problems

Table 1: Hardware Settings

  Number of Servers | CPU                            | RAM
  3                 | Two 3.3 GHz Intel processors   | 4 GB
  2                 | Two 3.0 GHz Intel processors   | 8 GB
  (All servers were launched on Amazon Web Services with Ubuntu 16.04.)

Training on the MNIST dataset at the default settings required more than 1 GB of RAM, and training on the CIFAR10 dataset required more than 4 GB. I ended up using a server with 4 GB of RAM for the former and one with 8 GB for the latter, and this level of computing resources could be demanding for some students. Training also takes a long time (see Table 2), which makes it hard to train the neural network from multiple initial weights.

The SimpleNN package is overall comfortable to use and well explained. However, it would be more convenient if the package were also distributed as a Jupyter Notebook, for easy visualization and integration with platforms such as Google Colab.

Table 2: Training Results

  Data (Seed) | Time (Epochs) | Test Accuracy
  MNIST (0)   | 16912s (100)  | 98.610%
  MNIST (3)   | 14484s (100)  | 98.230%
  MNIST (2)   | 10597s (100)  | 98.560%
  MNIST (0)   | 45609s (264)  | 98.850%
  CIFAR10 (0) | 39831s (300)  | 73.440%
  CIFAR10 (0) | 38345s (300)  | 73.860%

2 Training and Test Results

Due to hardware and time constraints, this experiment used fewer epochs than the default setting (500). However, the final results should vary little for the MNIST dataset: from a certain point onward, the training and validation loss decrease very slowly (see Figure 1). We can infer that the model has reached its limit and is hard to improve through further training. The fact that training on MNIST for 264 epochs with the same seed made little difference compared with training for 100 epochs supports this (see Table 2).

[Figure 1: Train and Validation Loss for MNIST]

In the case of the CIFAR10 dataset, there is a possibility that the test accuracy could end up higher with more training epochs.
The training and validation loss were still decreasing at the end of training (see Figure 2).

[Figure 2: Train and Validation Loss for CIFAR10]

One more thing we should note is that the initial weights of the network can make a significant difference in the total training time. When we trained the 4-layer CNN on MNIST with only the initialization changed, the training time varied from 10597s (2.94 hours) to 16912s (4.69 hours). More experiments are needed to determine the specific reason.

In contrast, weight initialization might not be a decisive factor for accuracy. Validation accuracy varied to some extent during the early part of training, but it converged as training progressed. In the later part of training there was no significant difference, and for the MNIST dataset the final accuracies fell within a 0.62% interval (see Table 2).

We should also note that even with the same initial weights, the results can still differ (see Table 2). This variance likely comes from the randomization of batch inputs: the SimpleNN package draws batches randomly, so a trained network cannot be reproduced just by setting the same seed.
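The reproducibility issue above can be illustrated with a minimal sketch. The code below is hypothetical and not SimpleNN's actual implementation; it only assumes the common pattern in which the initialization seed and the batch sampler use separate random number generators. Two runs then start from identical weights but see the training examples in different orders, so their SGD trajectories diverge.

```python
import numpy as np

def init_weights(seed):
    # Deterministic: the same seed always yields identical initial weights.
    rng = np.random.default_rng(seed)
    return rng.normal(size=(4, 4))

def draw_batches(n_samples, batch_size, rng):
    # The batch order depends on this sampler RNG, not on the init seed.
    order = rng.permutation(n_samples)
    return [order[i:i + batch_size] for i in range(0, n_samples, batch_size)]

# Same init seed -> identical starting point for both runs.
w_a = init_weights(0)
w_b = init_weights(0)
assert np.array_equal(w_a, w_b)

# But if the batch sampler is left unseeded, each run shuffles the
# data differently, so the models trained from w_a and w_b diverge.
batches_a = draw_batches(1000, 128, np.random.default_rng())
batches_b = draw_batches(1000, 128, np.random.default_rng())
```

Full reproducibility would require seeding every source of randomness, including the batch shuffling, not just the weight initialization.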