A five-year (2015 to 2019) analysis of studies focused on breast cancer prediction using machine learning: A systematic review and bibliometric analysis

Journal article

Zakia Salod, Y. Singh
Journal of Public Health Research, 2020


Cite

APA
Salod, Z., & Singh, Y. (2020). A five-year (2015 to 2019) analysis of studies focused on breast cancer prediction using machine learning: A systematic review and bibliometric analysis. Journal of Public Health Research.

Chicago/Turabian
Salod, Zakia, and Y. Singh. “A Five-Year (2015 to 2019) Analysis of Studies Focused on Breast Cancer Prediction Using Machine Learning: A Systematic Review and Bibliometric Analysis.” Journal of Public Health Research (2020).

MLA
Salod, Zakia, and Y. Singh. “A Five-Year (2015 to 2019) Analysis of Studies Focused on Breast Cancer Prediction Using Machine Learning: A Systematic Review and Bibliometric Analysis.” Journal of Public Health Research, 2020.

BibTeX

@article{zakia2020a,
  title = {A five-year (2015 to 2019) analysis of studies focused on breast cancer prediction using machine learning: A systematic review and bibliometric analysis},
  year = {2020},
  journal = {Journal of Public Health Research},
  author = {Salod, Zakia and Singh, Y.}
}

Abstract

Objective 1 of this study was to investigate trends in publications on breast cancer (BC) prediction using machine learning (ML) by analysing country, first author, journal, institutional collaborations and co-occurrence of author keywords. Objective 2 was to review studies on BC prediction using ML and a blood analysis dataset (the Breast Cancer Coimbra Dataset [BCCD]), and objective 3 was to provide a brief review of studies on BC prediction using ML and patients’ fine needle aspirate cytology data (the Wisconsin Breast Cancer Dataset [WBCD]). The study design was as follows: for objective 1, a bibliometric analysis (data source: PubMed, 2015-2019); for objective 2, a systematic review (data sources: Google and Google Scholar, 2018-2019); for objective 3, a systematic review (data source: Google Scholar, 2016-2019). For objective 1, all publications returned by the searches (n=2928) were included. For objective 2, all English-language papers with a ‘PDF’ option in the search results were included. For objective 3, a sample of the English-language papers with a ‘PDF’ option was included. All 116 female patients in the BCCD (64 BC patients and 52 controls) were included for objective 2. For the WBCD, all 699 female patients (458 with a benign breast tumour and 241 with a malignant breast tumour) were included for objective 3. The results showed that the United States of America (USA) produced the highest number of publications (n=803). In total, 2419 first authors contributed to the publications. Breast Cancer Research and Treatment was the highest-ranked journal. Institutional collaborations occurred mainly within the USA. The use of ML for BC screening and detection was the most researched topic. A total of 19 distinct papers were included for objectives 2 and 3. The findings from these studies were never presented to clinicians for validation.
In conclusion, the use of ML for BC screening and detection is promising.

Significance for public health

This is the first study to perform a snapshot bibliometric analysis on the topic of breast cancer (BC) prediction using machine learning (ML) by analysing publications from an online electronic database. It is also the first systematic review of studies that have focused on BC prediction using ML and a blood analysis dataset, specifically the new, publicly available Breast Cancer Coimbra Dataset (BCCD), which has the potential to identify more efficient and cheaper BC biomarkers and ML models. Additionally, we conducted a brief systematic review of studies focused on BC prediction using ML and a publicly available fine needle aspirate cytology dataset, the Wisconsin Breast Cancer Dataset (WBCD), which may also help discover BC biomarkers. The use of ML for BC screening, detection and identification of potential BC biomarkers is evidently promising; however, these results need to be presented to clinicians for validation.
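The co-occurrence of author keywords analysed for objective 1 can be illustrated with a minimal sketch. It assumes each publication is represented as a list of author keywords (a hypothetical input format, not the study's actual data or tooling) and counts how many publications list each unordered keyword pair together; such pair counts are the raw material for a keyword co-occurrence map. The keyword lists and the function name `keyword_cooccurrence` are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(papers):
    """Count how many papers list each unordered pair of author keywords together.

    `papers` is a list of keyword lists, one per publication
    (a hypothetical input format used only for this sketch).
    """
    counts = Counter()
    for keywords in papers:
        # Normalise case/whitespace and drop duplicates within a paper,
        # so each pair is counted at most once per publication.
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        # Sorting makes each pair a canonical (a, b) tuple with a < b.
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts


# Hypothetical author-keyword lists (not data from the study).
papers = [
    ["Breast cancer", "Machine learning", "Screening"],
    ["breast cancer", "machine learning", "Detection"],
    ["Breast cancer", "Screening", "Mammography"],
]

cooc = keyword_cooccurrence(papers)
for (a, b), n in cooc.most_common(3):
    print(f"{a} -- {b}: {n}")
```

Normalising case before pairing ensures that "Breast cancer" and "breast cancer" are treated as the same keyword. Dedicated bibliometric tools often also merge synonyms (for example, via thesaurus files), which this sketch does not attempt.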