Research Datasets
Comprehensive collection of lung cancer histopathological and whole slide imaging datasets used in our ICMR-funded research project
Project Datasets
Primary datasets used in developing our deep learning ensemble model for lung cancer detection
LC25000 (lung subset): a curated dataset developed by expert pathologists from the University of South Florida. The original dataset contained 750 images in three classes: 250 benign lung tissue (LBT), 250 lung adenocarcinoma (LUAD), and 250 lung squamous cell carcinoma (LSCC). Through augmentation, the dataset was expanded to 15,000 images, with 5,000 images per class.
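As a quick sanity check on the figures above, the following minimal sketch computes the augmentation factor implied by the stated class counts. The function and variable names are illustrative assumptions, not part of the project's code, and the sketch says nothing about which augmentation transforms were actually applied.

```python
# Class counts as stated in the dataset description (original images per class).
ORIGINAL_PER_CLASS = {"LBT": 250, "LUAD": 250, "LSCC": 250}
AUGMENTED_PER_CLASS = 5_000  # images per class after augmentation, per the text


def augmentation_factor(original: int, augmented: int) -> int:
    """Return the average number of augmented images per original image."""
    if original <= 0 or augmented % original != 0:
        raise ValueError("augmented count must be a positive multiple of the original count")
    return augmented // original


# Hypothetical helper usage: one factor per class, plus dataset-wide totals.
factors = {
    name: augmentation_factor(count, AUGMENTED_PER_CLASS)
    for name, count in ORIGINAL_PER_CLASS.items()
}
total_original = sum(ORIGINAL_PER_CLASS.values())
total_augmented = AUGMENTED_PER_CLASS * len(ORIGINAL_PER_CLASS)

print(factors)          # {'LBT': 20, 'LUAD': 20, 'LSCC': 20}
print(total_original)   # 750
print(total_augmented)  # 15000
```

Each class is therefore expanded by a factor of 20 on average, which keeps the three classes balanced after augmentation.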
A large collection of LUAD and LSCC whole slide images (WSIs) from the CPTAC collections hosted on The Cancer Imaging Archive (TCIA). Approximately 300 WSIs were downloaded and annotated by certified medical consultants. Expert pathologists marked the regions of interest (ROIs), and image tiles were manually extracted from them using Aperio ImageScope. All tiles were standardized to 512 × 512 pixels.
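The tiles described above were extracted manually in Aperio ImageScope. Purely as an illustration of what a fixed 512 × 512 tile grid over a rectangular ROI looks like, here is a hypothetical programmatic analogue. The ROI coordinates, the non-overlapping layout, and the function name are all assumptions made for the example; they do not describe the project's actual workflow.

```python
from typing import Iterator, Tuple

TILE_SIZE = 512  # tile edge length in pixels, as stated in the text


def tile_origins(
    roi_x: int, roi_y: int, roi_w: int, roi_h: int, tile: int = TILE_SIZE
) -> Iterator[Tuple[int, int]]:
    """Yield top-left (x, y) corners of non-overlapping tiles lying fully inside the ROI.

    Partial tiles at the right and bottom edges are skipped (an assumption of this sketch).
    """
    for y in range(roi_y, roi_y + roi_h - tile + 1, tile):
        for x in range(roi_x, roi_x + roi_w - tile + 1, tile):
            yield (x, y)


# Made-up ROI bounding box in slide pixel coordinates: origin (1000, 2000), 1600 x 1100 px.
origins = list(tile_origins(1000, 2000, 1600, 1100))

print(len(origins))  # 6 (3 columns x 2 rows)
print(origins[0])    # (1000, 2000)
print(origins[-1])   # (2024, 2512)
```

Skipping partial edge tiles keeps every tile at exactly 512 × 512 pixels without resampling. Padding or resizing edge tiles would be an equally valid choice under different assumptions.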
Local histopathology data collected from the Gujarat Cancer Research Institute (GCRI). The cohort includes approximately 1,500 lung carcinoma cases, imaged at 10× and 40× magnification by trained onco-pathologists. The study is conducted under approved ethical guidelines.
Dataset Usage Guidelines
These datasets support our ICMR-funded research on deep learning models for lung cancer detection. Each dataset is used in accordance with its own license and usage terms.
For academic and research use, proper citation of the original dataset creators is required. The LC25000 dataset is available under the CC BY 4.0 license. The CPTAC WSI collections are available for public research use. The GCRI data is used under institutional ethical approval.
Research Ethics & Privacy
All medical imaging data used in this project complies with ethical guidelines, patient privacy regulations, and institutional review board (IRB) requirements. The GCRI local cohort data is collected under approved ethical clearance (EC/BHR/14/2024).
Citation for LC25000 Dataset
Borkowski AA, Bui MM, Thomas LB, Wilson CP, DeLand LA, Mastorides SM. Lung and Colon Cancer Histopathological Image Dataset (LC25000). arXiv:1912.12142v1 (2019).
Need Access to Our Datasets?
For research collaboration or dataset access requests, please get in touch with our team. We welcome partnerships with academic institutions and healthcare organizations.