Towards real-time detection of squamous pre-cancers from oesophageal endoscopic videos

Gao, Xiaohong W., Braden, Barbara, Taylor, Stephen and Pang, Wei (2019) Towards real-time detection of squamous pre-cancers from oesophageal endoscopic videos. Proceedings: 2019 18th IEEE International Conference on Machine Learning and Applications. In: ICMLA 2019, 16-19 Dec 2019, Boca Raton, Florida, USA. e-ISBN 9781728145501. [Conference or Workshop Item] (doi:10.1109/ICMLA.2019.00264)

PDF - Final accepted version (with author's formatting)


This study investigates the feasibility of applying state-of-the-art deep learning techniques to detect precancerous stages of squamous cell carcinoma (SCC) in real time, addressing two challenges: diagnosing SCC from subtle changes in appearance, and video processing speed. Two deep learning models are implemented: the first determines whether a video frame contains artefacts, and the second detects, segments and classifies the artefact-free frames. For detection of SCC, both mask-RCNN and YOLOv3 architectures are implemented. In addition, to ensure that a single bounding box is detected for each region of interest rather than multiple duplicated boxes, a faster non-maxima suppression (NMS) technique is applied on top of the predictions. As a result, the developed system can process videos at 16-20 frames per second. Three classes of SCC are classified: ‘suspicious’, ‘high grade’ and ‘cancer’. For videos with a resolution of 1920x1080 pixels, the average processing time when applying YOLOv3 is in the range of 0.064-0.101 seconds per frame, i.e. 10-15 frames per second, running under the Windows 10 operating system with one GPU (GeForce GTX 1060). The averaged accuracies for classification and detection are 85% and 74% respectively. Since YOLOv3 only provides bounding boxes, mask-RCNN is also evaluated in order to delineate lesioned regions. While it achieves a better detection result, with 77% accuracy, its classification accuracy of 84% is similar to that of YOLOv3. However, its processing speed is more than 10 times slower, with an average of 1.2 seconds per frame, due to the creation of masks. The accuracy of segmentation by mask-RCNN is 63%. These results are based on datasets of 350 images. Further improvement is therefore needed in the future by collecting, annotating or augmenting more datasets.
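To illustrate the duplicate-box suppression step mentioned in the abstract, the following is a minimal sketch of plain greedy non-maxima suppression in Python. It is not the authors' "faster" NMS variant, whose details are not given here. The function names `iou` and `nms`, the corner-format boxes `(x1, y1, x2, y2)` and the overlap threshold of 0.5 are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) in pixels, with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return indices of kept boxes, highest score first.

    A box is kept only if its overlap with every already-kept box
    does not exceed iou_threshold (a hypothetical default).
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two near-duplicate detections of one region plus one separate region.
example_boxes = [(100, 100, 200, 200), (105, 105, 205, 205), (400, 400, 500, 500)]
example_scores = [0.9, 0.8, 0.7]
kept = nms(example_boxes, example_scores)
```

In this example the second box overlaps the first heavily and has a lower score, so only the first and third boxes are kept, giving one box per region of interest. In practice a vectorised or library implementation would normally be used for speed.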

Item Type: Conference or Workshop Item (Paper)
Keywords (uncontrolled): Videos, cancer, machine learning, image color analysis, real-time systems, Proposals, image segmentation, oesophagus endoscopy, pre-cancer detection, deep learning, real-time video processing, segmentation
Research Areas: A. > School of Science and Technology > Computer Science
Item ID: 27906
Notes on copyright: © 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Depositing User: Xiaohong Gao
Date Deposited: 18 Oct 2019 13:03
Last Modified: 29 Nov 2022 18:41
