Quality Improvement/Clinical Outcomes
Ko Un Park, MD (she/her/hers)
Associate Surgeon
Brigham and Women's Hospital, Dana-Farber Cancer Institute
Quincy, Massachusetts, United States
Stuart Lipsitz, ScD
Director of Biostatistics, Center for Surgery and Public Health
Brigham and Women's Hospital, United States
Laura Dominici, MD
Associate Chief of Surgery
Brigham and Women's Faulkner Hospital; Associate Surgeon, Brigham and Women's Hospital, Dana-Farber Cancer Institute
Scituate, Massachusetts, United States
Filipa Lynce, MD
Director, Inflammatory Breast Center; Senior Physician
Dana-Farber Cancer Institute, United States
Christina A. Minami, MD, MS (she/her/hers)
Associate Surgeon
Brigham and Women's Hospital, Dana-Farber Cancer Institute
Boston, Massachusetts, United States
Faina Nakhlis, MD
Associate Surgeon
Brigham and Women's Hospital, Dana-Farber Cancer Institute, United States
Adrienne G. G. Waks, MD
Associate Director, Breast Oncology Clinical Research; Physician
Dana-Farber Cancer Institute
Boston, Massachusetts, United States
Laura Warren, MD
Associate Network Clinical Director, Radiation Oncology
Brigham and Women's Hospital, Dana-Farber Cancer Institute, United States
Nadine Eidman
Patient Advocate
United States
Jeannie Frazier
Patient Advocate
United States
Lourdes Hernandez
Patient Advocate
United States
Carla Leslie
Patient Advocate
United States
Susan Rafte
Patient Advocate
United States
Delia Stroud
Patient Advocate
United States
Joel S. Weissman, PhD
Deputy Director and Chief Scientific Officer of the Center for Surgery and Public Health
Brigham and Women's Hospital, United States
Tari A. King, MD (she/her/hers)
Chief, Division of Breast Surgery
Brigham and Women's Hospital, Dana-Farber Cancer Institute
Boston, Massachusetts, United States
Elizabeth A. Mittendorf, MD, PhD, MHCM (she/her/hers)
Professor of Surgery
Brigham and Women's Hospital, Dana-Farber Cancer Institute
Boston, Massachusetts, United States
Background:
The internet is increasingly used by patients as a source of medical information, and ChatGPT (OpenAI), a chatbot interface built on a generative pretrained transformer (GPT), is an AI system that can provide humanlike responses to patient questions. It is estimated that GPT-3.5 had over 100 million users as of June 2023. It is unclear whether GPT-3.5 responses can be trusted as an accurate source of medical information for patients. This study sought to evaluate the accuracy and clinical concordance of GPT-3.5 responses to breast cancer questions.
Methods:
Through a series of focus groups with 6 breast cancer advocates, major themes in breast cancer care that patients are likely to ask about were identified, and 20 questions corresponding to these themes were developed (Table). Questions were posed to GPT-3.5 in July 2023 and repeated 3 times over the course of a week. Responses were graded by 6 breast oncology specialists (3 surgical oncologists, 1 radiation oncologist, and 2 medical oncologists) in 2 domains: accuracy (4-point Likert scale: 1 = comprehensive, 2 = correct but inadequate, 3 = some correct, some incorrect, 4 = completely incorrect) and clinical concordance (i.e., information is clinically similar to the specialist response; 5-point Likert scale: 1 = completely similar, 5 = not similar at all). Means and 95% confidence intervals (CIs) were calculated using ANOVA.
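To make the repeated-query step concrete, below is a minimal sketch of how such questioning might be scripted, assuming the openai Python SDK (v1.x); the example question, model name, and record format are illustrative and are not the study's actual protocol or data.

```python
# Minimal sketch of the repeated-query step, assuming the openai Python SDK (v1.x).
# The question shown and the record format are illustrative; the study used 20
# advocate-derived questions (see Table), each posed 3 times over one week.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

questions = [
    "What are the treatment options for early-stage breast cancer?",
    # ...remaining advocate-derived questions would be listed here
]

records = []
for q in questions:
    for rep in range(1, 4):  # each question posed 3 times
        reply = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": q}],
        )
        records.append({
            "question": q,
            "repetition": rep,
            "answer": reply.choices[0].message.content,
        })
```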
Results:
There were 360 evaluations per domain (20 questions × 6 physician graders × 3 repetitions). The combined average was 1.88 for accuracy (range 1-3; 95% CI 1.42-1.94) and 2.79 for clinical concordance (range 1-5; 95% CI 1.94-3.64). For accuracy, 24% of responses (n=87; 95% CI 10-47.8%) were graded as ‘some correct, some incorrect’; no responses were graded as completely incorrect. For clinical concordance, 7.8% of responses (n=28; 95% CI 2.4-22.6%) were graded as not at all similar to the information a clinician would provide if asked the same question. Grouped thematically, workup questions fared best for accuracy and chemotherapy questions for clinical concordance. The question with the worst accuracy score concerned lymphedema after axillary surgery (question 6; average 2.67), and the question with the worst clinical concordance score concerned immunotherapy (question 11; average 3.5).
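For illustration only, the sketch below shows the kind of per-domain summary reported above (a mean with a 95% CI plus the proportion of evaluations at a given grade); the grade values are placeholders, and the simple t-based interval is only a stand-in for the ANOVA-derived estimates used in the study.

```python
# Illustrative summary statistics for one grading domain. The grade values are
# placeholders, and this simple t-based interval is only a stand-in for the
# ANOVA-derived means and CIs reported in the abstract.
import numpy as np
from scipy import stats

grades = np.array([1, 2, 2, 1, 3, 2, 1, 2, 3, 1, 2, 2])  # placeholder accuracy grades (1-4)

mean = grades.mean()
ci_low, ci_high = stats.t.interval(
    0.95, len(grades) - 1, loc=mean, scale=stats.sem(grades)
)
print(f"mean accuracy {mean:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")

# share of evaluations graded 'some correct, some incorrect' (grade 3)
print(f"grade 3: {(grades == 3).mean():.1%} of evaluations")
```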
Conclusions:
Although generative AI, specifically ChatGPT, shows potential to provide breast cancer patients with accurate and clinically concordant information, it occasionally provided inaccurate and clinically discordant answers. As such, patients should not treat ChatGPT as a reliable source of medical information until future studies and technology refinements establish its reliability.