Abstract
Lung cancer mortality is rising rapidly, largely because the disease is often not detected at an early stage. Despite advances in technology, radiologists are few in number and overworked. Various methods based on deep learning, in particular Convolutional Neural Networks (CNNs), have been developed to detect lung cancer automatically from medical images. This paper presents a CNN system that analyzes patient imagery captured by Computed Tomography (CT) scans, drawing on knowledge from both nuclear medicine and neural networks. The implementation of the CNN system for detecting lung cancer is provided, and the layers that help the CNN identify lung cancer are explained, along with the reasons for their suitability in medical image analysis. A brief description of the medical image dataset used, as well as the working environment required for lung nodule analysis using a CNN, is also given. Advances in CNN technology have made it possible to assess the likelihood of lung cancer earlier and hence to begin treatment sooner, helping to reduce the mortality rate.
INTRODUCTION
Cancer can be defined as the growth of abnormal cells that divide uncontrollably and destroy body tissue. The major fatal cancers include lung cancer, breast cancer, brain cancer, mouth cancer, and blood cancer. Lung cancer begins in the lungs and may spread to the lymph nodes or to other organs in the body. Its main causes are smoking, exposure to toxins, and, in some cases, heredity. The two broad types of lung cancer are:
- A. Small Cell Lung Cancer (SCLC): About 12 of every 100 lung cancers diagnosed are of this type. It is usually caused by smoking. This type of cancer tends to spread early, affecting other organs.
- B. Non-small Cell Lung Cancer (NSCLC): About 87 of every 100 lung cancers diagnosed are of this type.
The advancement in CT (Computed Tomography) and PET scan techniques has been rapid and remarkable, but it has also led to the production of image data in huge volumes. This increases the workload on radiologists, which makes erroneous diagnoses more likely and ultimately affects the end result. Recently, Convolutional Neural Networks have been used with notable success in medicine for the diagnosis and analysis of medical image datasets. Several reviews have been published on the use of CNNs in applications such as the analysis of lung, brain, prostate, and breast cancers [1][2][3].
CT SCAN
CT stands for Computed Tomography. It is a medical imaging procedure that uses x-ray measurements taken from different angles to produce cross-sectional images of an object, allowing the viewer to see inside the object without cutting it. CT scans are mostly used for diagnostic and therapeutic purposes. CT is better than conventional x-rays in that x-rays do not show acute and chronic changes in the lung tissue, whereas CT scans can detect both.
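The idea of building a cross-sectional image from measurements taken at different angles can be illustrated with a toy example. The sketch below is only an assumption-laden illustration, not the reconstruction method of any real scanner: it uses a hypothetical 4x4 grid, just two viewing angles, and plain (unfiltered) back-projection, whereas clinical systems use many angles and more sophisticated algorithms.

```python
# Toy illustration of reconstructing a slice from two projection angles.
# Assumptions (not from the paper): a 4x4 grid, parallel rays, two views,
# and simple unfiltered back-projection.

def project_rows(image):
    """Sum along each row: rays travelling horizontally through the slice."""
    return [sum(row) for row in image]

def project_cols(image):
    """Sum along each column: rays travelling vertically through the slice."""
    return [sum(col) for col in zip(*image)]

def back_project(row_sums, col_sums):
    """Smear each projection back across the grid and average the two views."""
    n_rows, n_cols = len(row_sums), len(col_sums)
    return [
        [(row_sums[r] / n_cols + col_sums[c] / n_rows) / 2 for c in range(n_cols)]
        for r in range(n_rows)
    ]

# Hypothetical slice with one dense spot at row 1, column 2.
phantom = [
    [0, 0, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]

rows = project_rows(phantom)
cols = project_cols(phantom)
recon = back_project(rows, cols)

# Location of the brightest reconstructed cell: (value, row, column).
peak = max((v, r, c) for r, row in enumerate(recon) for c, v in enumerate(row))
print(peak)
```

Even with only two views, the brightest cell of the reconstruction coincides with the dense spot, although it is blurred along its row and column. Adding more angles reduces this blurring.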
An important advantage of CT is that it excludes structures outside the region of interest. CT is better than a barium enema for detecting tumors, and it also uses a lower radiation dose. CT scans can diagnose life-threatening conditions such as hemorrhage, blood clots, or cancer. CT helps in diagnosing lung cancer at earlier stages and is highly accurate in locating a cancerous mass, if present. It also detects early changes in cells better than Magnetic Resonance Imaging (MRI).
DATASETS
Implementing a CNN requires an extensive setup, including specific hardware and software. The datasets used for training and testing the proposed system are:
A. Datasets of lung cancer CT images:
LIDC/IDRI Lung CT dataset:
To identify, address, and resolve challenging organizational, technical, and clinical issues, eight medical imaging companies and seven academic centers collaborated to provide a robust database. The LIDC/IDRI Database contains 1018 cases in total, each of which includes CT scan images and an associated XML file that records the results of a two-phase annotation process performed by four experienced radiologists. In other words, the XML file contains the labelled data for the corresponding CT images. In the initial blinded-read phase, each radiologist independently reviewed each CT scan and marked lesions as belonging to one of three categories (‘nodule ≥ 3 mm’, ‘nodule < 3 mm’, and ‘non-nodule ≥ 3 mm’). In the subsequent unblinded-read phase, each radiologist independently reviewed their own marks along with the anonymized marks of the three other radiologists to reach a final opinion. The database contains 7371 lesions marked ‘nodule’ by at least one radiologist. Of these, 2669 were marked ‘nodule ≥ 3 mm’ by at least one radiologist, and 928 of those (34.7%) received such marks from all four radiologists. These 2669 lesions include nodule outlines and subjective nodule characteristic ratings. The LIDC/IDRI Database is expected to provide an essential medical imaging research resource for training, validating, and testing the model [4].
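The agreement figures quoted for the database (for example, 928 of 2669 lesions marked by all four radiologists) come from tallying how many readers marked each lesion. A minimal sketch of such a tally follows. The `marks` dictionary and the lesion names are hypothetical stand-ins for illustration and do not reflect the actual LIDC/IDRI XML schema, which would need to be parsed separately.

```python
from collections import Counter

# Hypothetical data: lesion id -> set of radiologists (numbered 1-4) who
# marked it as 'nodule >= 3 mm'. Not the real LIDC/IDRI file format.
marks = {
    "lesion_a": {1, 2, 3, 4},
    "lesion_b": {1, 3},
    "lesion_c": {2},
    "lesion_d": {1, 2, 3, 4},
    "lesion_e": {2, 3, 4},
}

def agreement_histogram(marks):
    """Count lesions by the number of radiologists who marked them."""
    return Counter(len(readers) for readers in marks.values())

hist = agreement_histogram(marks)
unanimous = hist[4]                 # lesions marked by all four readers
fraction = unanimous / len(marks)   # share of lesions with full agreement

# The same ratio computed from the counts reported in the text.
reported_fraction = 928 / 2669

print(dict(hist), fraction, round(reported_fraction, 4))
```

A training pipeline might use such a tally to keep only lesions with a chosen level of reader agreement, though whether the proposed system does so is not stated here.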
B. Datasets used for the proposed system:
The dataset described above is used to train the system to identify the presence of cancer. Each identified nodule is classified into one of three classes:
i. Benign:
A benign tumor is a tumor that does not invade its surrounding tissue or spread around the body. Hence, it can be stated that if the tumor is found to be benign then it is classified as non-cancerous.
ii. Malignant:
A malignant tumor is a tumor that may invade its surrounding tissue or spread around the body. Hence, it can be stated that if the tumor is found to be malignant then it is classified as cancerous.
iii. Malignant and Metastatic:
Metastasis is the process in which cells break away from a malignant tumor and invade other tissues of the body. Tumors formed by cancer cells of the primary tumor that travel to other organs, such as the lungs, bones, liver, or brain, are called metastatic tumors. These metastatic tumors are called secondary cancers because they are not primary cancers but arise from one. Some of them are curable, but many are not.
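The three classes above can be represented as integer labels for a classifier. The sketch below is one possible encoding and is an assumption for illustration: the names `NoduleClass`, `is_cancerous`, and `one_hot`, the integer values, and the use of one-hot targets are not taken from the proposed system.

```python
from enum import Enum

class NoduleClass(Enum):
    """Hypothetical integer labels for the three classes described above."""
    BENIGN = 0
    MALIGNANT = 1
    MALIGNANT_METASTATIC = 2

def is_cancerous(label):
    """Benign nodules are non-cancerous; both malignant classes are cancerous."""
    return label is not NoduleClass.BENIGN

def one_hot(label):
    """Encode a class as a one-hot vector, a common target format for a CNN."""
    return [1 if member is label else 0 for member in NoduleClass]

for cls in NoduleClass:
    print(cls.name, one_hot(cls), is_cancerous(cls))
```

Collapsing the two malignant classes with `is_cancerous` gives a two-way benign versus malignant decision.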
The proposed dataset consists of actual PET images of patients at a multispecialty hospital who were evaluated for lung cancer. This dataset is used as testing data for the system, which classifies each patient's data as benign or malignant.