Comparative Analysis of Face Detection Models: MTCNN, YOLO, and a Hybrid Approach

Authors

  • Arun Sharma Dept. of computer science and informatics/Central University of Himachal Pradesh, Shahpur, India
  • Kunika Dept. of computer science and informatics/Central University of Himachal Pradesh, Shahpur, India
  • Riya Dept. of computer science and informatics/Central University of Himachal Pradesh, Shahpur, India
  • Mayank Chopra Dept. of computer science and informatics/Central University of Himachal Pradesh, Shahpur, India
  • Pradeep Chouksey Dept. of computer science and informatics/Central University of Himachal Pradesh, Shahpur, India
  • Parveen Sadotra Dept. of computer science and informatics/Central University of Himachal Pradesh, Shahpur, India

DOI:

https://doi.org/10.26438/ijsrcse.v13i3.668

Keywords:

Face Detection, YOLO, MTCNN, Hybrid Model, Comparison, Algorithm Optimization

Abstract

Face detection is a vital subroutine in many computer vision systems, such as facial recognition systems, emotion detection systems and surveillance systems. Two of the many such algorithms that have emerged powerful in terms of accuracy as well as computation efficiency are Multi-task Cascaded Convolutional Networks (MTCNN) and You Only Look Once (YOLO). This paper involves a comparative review of MTCNN, YOLO and a Hybrid Model that fuses the two methods. The models are trained on the fareselmenshawii/face-detection-dataset and the performance is measured by accuracy, running time and the stability of detecting the faces. The experimental findings show that there are apparent trade-offs that exist between speed and detection accuracy across the models, but the hybrid strategy shows a balanced execution, as it efficiently takes this advantage of the strengths of both MTCNN and YOLO.

References

K. Zhang, Z. Zhang, Z. Li and Y. Qiao, “Joint face detection and alignment using multitask cascaded convolutional networks,” IEEE signal processing letters, vol. 23, p. 1499–1503, 2016.

J. Redmon and A. Farhadi, “Yolov3: An incremental improvement,” arXiv preprint arXiv:1804.02767, 2018.

B. Xu, W. Wang, L. Guo, G. Chen, Y. Wang, W. Zhang and Y. Li, “Evaluation of deep learning for automatic multi-view face detection in cattle,” Agriculture, vol. 11, p. 1062, 2021.

E. M. F. Caliwag, A. Caliwag, M. E. Morocho-Cayamcela and W. Lim, “Thermal Camera Face Detection and Alignment using MTCNN,” 한국통신학회 학술대회논문집, p. 319–321, 2020.

P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple features,” in Proceedings of the 2001 IEEE computer society conference on computer vision and pattern recognition. CVPR 2001, 2001.

H. A. Rowley, S. Baluja and T. Kanade, “Neural network-based face detection,” IEEE Transactions on pattern analysis and machine intelligence, vol. 20, p. 23–38, 1998.

T.-Y. Lin, P. Goyal, R. Girshick, K. He and P. Dollár, “Focal loss for dense object detection,” in

Proceedings of the IEEE international conference on computer vision, 2017.

Z. Lu, C. Zhou, X. Xuyang and W. Zhang, “Face detection and recognition method based on improved convolutional neural network,” International Journal of Circuits, Systems and Signal Processing, vol. 15, p. 774–781, 2021.

M. Tan and Q. Le, “Efficientnet: Rethinking model scaling for convolutional neural networks,” in International conference on machine learning, 2019.

K. He, X. Zhang, S. Ren and J. Sun, “Deep residual learning for image recognition,” in

Proceedings of the IEEE conference on computer vision and pattern recognition, 2016.

N. C. Basjaruddin, E. Rakhman, Y. Sudarsa, M. B. Z. Asyikin and S. Permana, “Attendance system with face recognition, body temperature, and use of mask using multi-task cascaded convolutional neural network (MTCNN) Method,” Green Intelligent Systems and Applications, vol. 2, p. 71–83, 2022.

M. Ali, A. Diwan and D. Kumar, “Attendance system optimization through deep learning face recognition,” International Journal of Computing and Digital Systems, vol. 15, p. 1527–1540, 2024.

A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani,

M. Minderer, G. Heigold, S. Gelly and others, “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020.

Downloads

Published

2025-06-30

How to Cite

[1]
A. Sharma, Kunika, Riya, M. Chopra, P. Chouksey, and P. Sadotra, “Comparative Analysis of Face Detection Models: MTCNN, YOLO, and a Hybrid Approach”, Int. J. Sci. Res. Comp. Sci. Eng., vol. 13, no. 3, pp. 64–68, Jun. 2025.

Issue

Section

Research Article