
ShuffleNet to be far behind. Additionally, each model was miscalibrated in the sense of being
overconfident in the majority of its predictions. We confirmed these findings via the sampling
distribution of the ECE, confidence intervals, and hypothesis tests.
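To make this concrete, the sketch below shows one way the ECE and its bootstrap sampling distribution could be computed from a model's per-example confidences and correctness indicators. The 15 equal-width bins, function names, and bootstrap settings are illustrative assumptions on our part, not the exact procedure used in this project.

    import numpy as np

    def ece(confidences, correct, n_bins=15):
        # Expected Calibration Error: bin predictions by confidence and take
        # the sample-weighted mean of |accuracy - confidence| over the bins.
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        total = len(confidences)
        err = 0.0
        for lo, hi in zip(edges[:-1], edges[1:]):
            in_bin = (confidences > lo) & (confidences <= hi)
            if in_bin.any():
                gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
                err += (in_bin.sum() / total) * gap
        return err

    def bootstrap_ci(confidences, correct, n_boot=1000, alpha=0.05, seed=0):
        # Approximate the sampling distribution of the ECE by resampling the
        # evaluation set with replacement, then report a (1 - alpha) interval.
        rng = np.random.default_rng(seed)
        n = len(confidences)
        samples = np.empty(n_boot)
        for b in range(n_boot):
            idx = rng.integers(0, n, size=n)
            samples[b] = ece(confidences[idx], correct[idx])
        return np.quantile(samples, [alpha / 2, 1 - alpha / 2])

Applying bootstrap_ci to a model's held-out predictions yields an approximate 95% interval around its ECE, which can then be compared across models or used to test whether two models' calibration differs.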
This project was limited in breadth: we studied only three models on a single data set. As a
result, we cannot draw general conclusions about the calibration of specific neural network
architectures across data sets. We were also unable to test calibration techniques, since we did
not train any of the evaluated models ourselves.
Going forward, it would be valuable to explore the calibration of more models on more data
sets, which would support more general conclusions about neural network calibration. It would
also be interesting to test which training techniques yield well-calibrated models; [1] reports a
variety of interesting results on how particular loss functions and regularization techniques
impact model calibration. Finally, it would be worthwhile to investigate the calibration of neural
networks in other domains, such as natural language processing and audio.
Although our scope was limited, we can conclude that DenseNet and ResNet are better calibrated
than ShuffleNet on the ImageNet data set. We can also confirm that DNNs remain limited in their
ability to provide reliable confidence scores for their predictions. This is important to consider
as DNNs are introduced to new fields where misclassification can be both costly and dangerous.
Going forward, researchers training and deploying these models should treat calibration as a key
component of both a model's efficacy and its safety.
References
[1] Nikita Vemuri. Scoring confidence in neural networks. 2020.
[2] Anh Nguyen, Jason Yosinski, and Jeff Clune. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 427–436, 2015.
[3] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015.
[4] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
[5] Ningning Ma, Xiangyu Zhang, Hai-Tao Zheng, and Jian Sun. ShuffleNet V2: Practical guidelines for efficient CNN architecture design. In Proceedings of the European Conference on Computer Vision (ECCV), pages 116–131, 2018.