Large language models in global health
Tarabanis, C. et al. Performance of publicly available large language models on internal medicine board-style questions. PLOS Digit. Health 3, e0000604 (2024).
Tierney, A. A. et al. Ambient artificial intelligence scribes: learnings after 1 year and over 2.5 million uses. NEJM Catal. Innov. Care Deliv. 6, CAT.25.0040 (2025).
Rao, A. S. et al. Synthetic medical education in dermatology leveraging generative artificial intelligence. NPJ Digit. Med. 8, 247 (2025).
Yang, R. et al. Retrieval-augmented generation for generative artificial intelligence in health care. NPJ Health Syst. 2, 2 (2025).
Omiye, J. A., Gui, H., Rezaei, S. J., Zou, J. & Daneshjou, R. Large language models in medicine: the potentials and pitfalls: a narrative review. Ann. Intern. Med. 177, 210–220 (2024).
Khan, M. S., Umer, H. & Faruqe, F. Artificial intelligence for low income countries. Humanit. Soc. Sci. Commun. 11, 1422 (2024).
Ong, J. C. L. et al. Artificial intelligence, ChatGPT, and other large language models for social determinants of health: current state and future directions. Cell Rep. Med. 5, 101356 (2024).
Akbarialiabad, H. et al. The utility of generative AI in advancing global health. NEJM AI 2, AIp2400875 (2025).
Vaswani, A. et al. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (eds von Luxburg, U. et al.) 6000–6010 (Curran Associates, 2017).
Wei, J. et al. Chain-of-thought prompting elicits reasoning in large language models. In Proc. 36th International Conf. on Neural Information Processing Systems 1800, 24824–24837 (Curran Associates, 2022).
Raiaan, M. A. K. et al. A review on large language models: architectures, applications, taxonomies, open issues and challenges. IEEE Access 12, 26839–26874 (2024).
Wu, K. et al. An automated framework for assessing how well LLMs cite relevant medical references. Nat. Commun. 16, 3615 (2025).
Liu, Y. & Wang, H. Who on Earth is using generative AI? World Dev. 199, 107260 (2026).
Gibney, E. Scientists flock to DeepSeek: how they’re using the blockbuster AI model. Nature (2025).
Sandmann, S. et al. Benchmark evaluation of DeepSeek large language models in clinical decision-making. Nat. Med. (2025).
Gibney, E. China’s cheap, open AI model DeepSeek thrills scientists. Nature 638, 13–14 (2025).
Ritoré, Á. et al. The role of open access data in democratizing healthcare AI: a pathway to research enhancement, patient well-being and treatment equity in Andalusia, Spain. PLOS Digit. Health 3, e0000599 (2024).
Maffulli, S. ‘Open source’ AI isn’t truly open—here’s how researchers can reclaim the term. Nature 640, 9 (2025).
Groeneveld, D. et al. OLMo: accelerating the science of language models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (eds Ku, L.-W. et al.) 15789–15809 (Association for Computational Linguistics, 2024).
Smithwick, J. et al. “Community health workers bring value and deserve to be valued too:” key considerations in improving CHW career advancement opportunities. Front. Public Health 11, 1036481 (2023).
Stanford Center for Digital Health. Generative AI for Health in Low & Middle Income Countries (Stanford Center for Digital Health, 2025).
Ochieng, S. et al. Exploring the implementation of an SMS-based digital health tool on maternal and infant health in informal settlements. BMC Pregnancy Childbirth 24, 222 (2024).
Liu, R. et al. AIDMAN: an AI-based object detection system for malaria diagnosis from smartphone thin-blood-smear image. Patterns 4, 100806 (2023).
Li, J. et al. Integrated image-based deep learning and language models for primary diabetes care. Nat. Med. 30, 2886–2896 (2024).
Huang, M. et al. Primary care quality and provider disparities in China: a standardized-patient-based study. Lancet Reg. Health West. Pac. 50, 101161 (2024).
Yang, J. et al. Generalizability assessment of AI models across hospitals in a low–middle and high income country. Nat. Commun. 15, 8270 (2024).
Liu, X., Alderman, J. & Laws, E. A global health data divide. NEJM AI 1, AIe2400388 (2024).
Alderman, J. E. et al. Tackling algorithmic bias and promoting transparency in health datasets: the STANDING Together consensus recommendations. Lancet Digit. Health 7, e64–e88 (2025).
Olatunji, T. et al. AfriMed-QA: a pan-African, multi-specialty, medical question-answering benchmark dataset. In Proc. 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) 1948–1973 (Association for Computational Linguistics, 2025).
Adams, R. et al. Mapping the potentials and limitations of using generative AI technologies to address socio-economic challenges in LMICs. Preprint at VeriXiv (2025).
Vinuesa, R. et al. The role of artificial intelligence in achieving the Sustainable Development Goals. Nat. Commun. 11, 233 (2020).
Gunasekeran, D. V. et al. National use of artificial intelligence for eye screening in Singapore. NEJM AI 1, AIcs2400404 (2024).
Thakur, R. Unraveling the brain drain dilemma: analysis among skilled information technology professionals of Nepal. Preprint at SSRN (2024).
Ahmed, M. I. et al. A systematic review of the barriers to the implementation of artificial intelligence in healthcare. Cureus 15, e46454 (2023).
Eisinger-Mathason, T. S. K. et al. Data linkage multiplies research insights across diverse healthcare sectors. Commun. Med. 5, 58 (2025).
Woldemariam, M. T. & Jimma, W. Adoption of electronic health record systems to enhance the quality of healthcare in low-income countries: a systematic review. BMJ Health Care Inform. 30, e100704 (2023).
Ullah, E., Parwani, A., Baig, M. M. & Singh, R. Challenges and barriers of using large language models (LLM) such as ChatGPT for diagnostic medicine with a focus on digital pathology—a recent scoping review. Diagn. Pathol. 19, 43 (2024).
Park, P. S., Schoenegger, P. & Zhu, C. Diminished diversity-of-thought in a standard large language model. Behav. Res. Methods 56, 5754–5770 (2024).
Yang, Y., Liu, X., Jin, Q., Huang, F. & Lu, Z. Unmasking and quantifying racial bias of large language models in medical report generation. Commun. Med. 4, 176 (2024).
Ahia, O. et al. Do all languages cost the same? Tokenization in the era of commercial language models. Preprint at arXiv (2023).
Alhanai, T. et al. Bridging the gap: enhancing LLM performance for low-resource African languages with new benchmarks, fine-tuning, and cultural adjustments. In The Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI-25) 27802–27812 (AAAI, 2025).
Han, T. et al. Medical large language models are susceptible to targeted misinformation attacks. NPJ Digit. Med. 7, 288 (2024).
Dong, Y. et al. Position: building guardrails for large language models requires systematic design. In Proc. 41st International Conf. on Machine Learning 451 (JMLR.org, 2024).
Modi, N. D. et al. Assessing the system-instruction vulnerabilities of large language models to malicious conversion into health disinformation chatbots. Ann. Intern. Med. 178, 1172–1180 (2025).
Hartman, V. et al. Developing and evaluating large language model-generated emergency medicine handoff notes. JAMA Netw. Open 7, e2448723 (2024).
Peng, Y. et al. From GPT to DeepSeek: significant gaps remain in realizing AI in healthcare. J. Biomed. Inform. 163, 104791 (2025).
Zeng, D., Qin, Y., Sheng, B. & Wong, T. Y. DeepSeek’s “low-cost” adoption across China’s hospital systems: too fast, too soon? JAMA (2025).
Vered, M., Livni, T., Howe, P. D. L., Miller, T. & Sonenberg, L. The effects of explanations on automation bias. Artif. Intell. 322, 103952 (2023).
WHO. Leading the future of global health with responsible artificial intelligence. World Health Organization (2024).
McKinsey & Company. The economic potential of generative AI: the next productivity frontier. McKinsey & Company (2023).
Demombynes, G., Langbein, J. & Weber, M. The Exposure of Workers to Artificial Intelligence in Low- and Middle-Income Countries. Policy Research Working Paper (World Bank Group, 2025).
Ernst, E., Berg, J. & Moore, P. V. Editorial: Artificial intelligence and the future of work: humans in control. Front. Artif. Intell. 7, 1378893 (2024).
Gage, A. D. et al. Disparities in telemedicine use and payment policies in the United States between 2019 and 2023. Commun. Med. 5, 52 (2025).
Mahmoud, K., Jaramillo, C. & Barteit, S. Telemedicine in low- and middle-income countries during the COVID-19 pandemic: a scoping review. Front. Public Health 10, 914423 (2022).
Ye, J., He, L. & Beestrum, M. Implications for implementation and adoption of telehealth in developing countries: a systematic review of China’s practices and experiences. NPJ Digit. Med. 6, 174 (2023).
Kleinig, O. et al. Environmental impact of large language models in medicine. Intern. Med. J. 54, 2083–2086 (2024).
Luccioni, S., Jernite, Y. & Strubell, E. Power hungry processing: watts driving the cost of AI deployment? In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency 85–99 (Association for Computing Machinery, 2024).
Chauhan, D., Bahad, P. & Jain, J. K. Sustainable AI: environmental implications, challenges, and opportunities. In Explainable AI (XAI) for Sustainable Development (eds Lakshmi, D. et al.) 1–15 (Association for Computing Machinery, 2024).
Alami, H. et al. Artificial intelligence in health care: laying the foundation for responsible, sustainable, and inclusive innovation in low- and middle-income countries. Global Health 16, 52 (2020).
Jonnagaddala, J. & Wong, Z. S.-Y. Privacy preserving strategies for electronic health records in the era of large language models. NPJ Digit. Med. 8, 34 (2025).
Greshake, K. et al. Not what you’ve signed up for: compromising real-world LLM-integrated applications with indirect prompt injection. In Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security 79–90 (Association for Computing Machinery, 2023).
Normile, D. Chinese firm’s large language model makes a splash. Science 387, 238 (2025).
Moor, M. et al. Foundation models for generalist medical artificial intelligence. Nature 616, 259–265 (2023).
Tordjman, M. et al. Comparative benchmarking of the DeepSeek large language model on medical tasks and clinical reasoning. Nat. Med. 31, 2550–2555 (2025).
Mikhail, D. et al. Performance of DeepSeek-R1 in ophthalmology: an evaluation of clinical decision-making and cost-effectiveness. Br. J. Ophthalmol. 109, 976–981 (2025).
Kim, S. H. et al. Benchmarking the diagnostic performance of open source LLMs in 1933 Eurorad case reports. NPJ Digit. Med. 8, 97 (2025).
Wu, Y. et al. An eyecare foundation model for clinical assistance: a randomized controlled trial. Nat. Med. (2025).
Yuan, M. et al. Large-scale local deployment of DeepSeek-R1 in pilot hospitals in China: a nationwide cross-sectional survey. Preprint at medRxiv (2025).
Lin, L., Zhou, X., Yang, K. & Chen, X. DeepSeek powered solid dosage formulation design and development. Preprint at arXiv (2025).
Bordukova, M., Makarov, N., Rodriguez-Esteban, R., Schmich, F. & Menden, M. P. Generative artificial intelligence empowers digital twins in drug discovery and clinical trials. Expert Opin. Drug Discov. 19, 33–42 (2024).
Gangwal, A. & Lavecchia, A. Unleashing the power of generative AI in drug discovery. Drug Discov. Today 29, 103992 (2024).
Namba-Nzanguim, C. T. Artificial intelligence for antiviral drug discovery in low resourced settings: a perspective. Front. Drug Discov. 2, 1013285 (2022).
Nievas, M., Basu, A., Wang, Y. & Singh, H. Distilling large language models for matching patients to clinical trials. J. Am. Med. Inform. Assoc. 31, 1953–1963 (2024).
Chakraborty, C., Bhattacharya, M., Lee, S.-S., Wen, Z.-H. & Lo, Y.-H. The changing scenario of drug discovery using AI to deep learning: recent advancement, success stories, collaborations, and challenges. Mol. Ther. Nucleic Acids 35, 102295 (2024).
Nishan, M. D. N. H. AI-powered drug discovery for neglected diseases: accelerating public health solutions in the developing world. J. Glob. Health 15, 03002 (2025).
Eisenstein, M. Overlooked and underfunded: neglected diseases exert a toll. Nature 598, S20–S22 (2021).
Omar, M., Nadkarni, G. N., Klang, E. & Glicksberg, B. S. Large language models in medicine: a review of current clinical trials across healthcare applications. PLOS Digit. Health 3, e0000662 (2024).
PATH. PATH launches clinical trial on the use of artificial intelligence in primary health care. PATH (2025).
Agweyu, A. et al. Large language model-assisted clinicians versus unassisted clinicians in clinical decision making: a multi-centre randomized controlled trial in Nairobi, Kenya. Preprint at Zenodo (2025).
Omar, M. et al. Evaluating and addressing demographic disparities in medical large language models: a systematic review. Int. J. Equity Health 24, 57 (2025).
Beste, J. et al. Working towards a decolonized, longitudinal, and equitable global health training and partnerships program. J. Med. Educ. Curric. Dev. 12, 23821205251324297 (2025).
Longhurst, C. A., Singh, K., Chopra, A., Atreja, A. & Brownstein, J. S. A call for artificial intelligence implementation science centers to evaluate clinical effectiveness. NEJM AI 1, AIp2400223 (2024).
Akbarialiabad, H. & Sewankambo, N. K. Centres of excellence in AI for global health equity—a strategic vision for LMICs. Nature 625, 450 (2024).
WHO. Global Initiative on AI for Health. World Health Organization (2025).
Reid, M. J. A. et al. Announcing the Lancet Global Health Commission on artificial intelligence (AI) and HIV: leveraging AI for equitable and sustainable impact. Lancet Glob. Health 13, e611–e612 (2025).
Cheney, C. Exclusive: donors commit $10M to include African languages in AI models. Devex (2025).
Gates Foundation. AI equity: ensuring access to AI for all. Gates Foundation (2025).
Ong, Q. C., Ang, C.-S., Lai, N. M., Atun, R. & Car, J. Differences in expert perspectives on AI training in medical education: secondary analysis of a multinational Delphi study. J. Med. Internet Res. 27, e72186 (2025).
Ministry of Health of Lao People’s Democratic Republic. Lao People’s Democratic Republic: Digital Health Strategy, 2023–2027 (Ministry of Health of Lao People’s Democratic Republic, 2023).
Fondation Pierre Fabre. Training digital healthcare professionals in Africa: 6th class of graduates for the eHealth inter-university diploma. Fondation Pierre Fabre (2024).
Edzie, E. K. M. et al. Perspectives of radiologists in Ghana about the emerging role of artificial intelligence in radiology. Heliyon 9, e15558 (2023).
Stewart, J. Tesla’s autopilot was involved in another deadly car crash. Wired (2018).
Adler-Milstein, J., Redelmeier, D. A. & Wachter, R. M. The limits of clinician vigilance as an AI safety bulwark. JAMA 331, 1173–1174 (2024).
Long, D. & Magerko, B. What is AI literacy? Competencies and design considerations. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems 1–16 (Association for Computing Machinery, 2020); https://doi.org/10.1145/3313831.3376727
Wang, L. et al. Prompt engineering in consistency and reliability with the evidence-based guideline for LLMs. NPJ Digit. Med. 7, 41 (2024).
Celi, L. A. et al. Sources of bias in artificial intelligence that perpetuate healthcare disparities—a global review. PLOS Digit. Health 1, e0000022 (2022).
Yang, R. et al. Disparities in clinical studies of AI enabled applications from a global perspective. NPJ Digit. Med. 7, 209 (2024).
Gehrman, E. How generative AI is transforming medical education. Harvard Medicine Magazine (2024).
College of Medicine Rockford. UICOMR students participate in AI curriculum. College of Medicine Rockford, University of Illinois College of Medicine (2024).
SUSA. Sustainable healthcare with digital health data competence. University of Oulu (2025).
Zhou, K. & Gattinger, G. The evolving regulatory paradigm of AI in MedTech: a review of perspectives and where we are today. Ther. Innov. Regul. Sci. 58, 456–464 (2024).
WHO. Ethics and governance of artificial intelligence for health: WHO guidance. Executive summary. World Health Organization (2021).
Digital Watch Observatory. Kenya launches project to develop National AI Strategy in collaboration with German and EU partners. Digital Watch Observatory (2024).
Luminate. Partnerships will ensure inclusivity for Nigeria’s AI strategy. Luminate (2024).
Wairagkar, N. et al. The African Medicines Agency—a potential gamechanger that requires strategic focus. PLOS Glob. Public Health 5, e0004276 (2025).
WHO. WHO Global Benchmarking Tool + Medical Devices (GBT + medical devices) for evaluation of national regulatory systems of medical devices including in-vitro diagnostics. World Health Organization (2024).
Martinson, S., Kong, L., Kim, C. W., Taneja, A. & Tambe, M. LLM-based agent simulation for maternal health interventions: uncertainty estimation and decision-focused evaluation. Preprint at arXiv (2025).
Gates Foundation. Large language model (LLM)-based conversational agent for women from prenatal to postnatal care. Gates Foundation: Global Grand Challenges (2024).
Gumilar, K. E. et al. Artificial intelligence–large language models (AI–LLMs) for reliable and accurate cardiotocography (CTG) interpretation in obstetric practice. Comput. Struct. Biotechnol. J. 27, 1140–1147 (2025).
Broad, A. et al. Factors associated with abusive head trauma in young children presenting to emergency medical services using a large language model. Prehosp. Emerg. Care 29, 227–237 (2025).
Liu, W. et al. Bridging the gap in neonatal care: evaluating AI chatbots for chronic neonatal lung disease and home oxygen therapy management. Pediatr. Pulmonol. 60, e71020 (2025).
Levin, C., Kagan, T., Rosen, S. & Saban, M. An evaluation of the capabilities of language models and nurses in providing neonatal clinical decision support. Int. J. Nurs. Stud. 155, 104771 (2024).
Yang, J. et al. RDmaster: a novel phenotype-oriented dialogue system supporting differential diagnosis of rare disease. Comput. Biol. Med. 169, 107924 (2024).
Beam, K. et al. Performance of a large language model on practice questions for the neonatal board examination. JAMA Pediatr. 177, 977–979 (2023).
Li, Y. et al. Exploring the performance of large language models on hepatitis B infection-related questions: a comparative study. World J. Gastroenterol. 31, 101092 (2025).
Wang, Y., Chen, Y. & Sheng, J. Assessing ChatGPT as a medical consultation assistant for chronic hepatitis B: cross-language study of English and Chinese. JMIR Med. Inform. 12, e56426 (2024).
Wu, C. et al. The large language model diagnoses tuberculous pleural effusion in pleural effusion patients through clinical feature landscapes. Respir. Res. 26, 52 (2025).
Busch, D. et al. A blueprint for large language model-augmented telehealth for HIV mitigation in Indonesia: a scoping review of a novel therapeutic modality. Health Informatics J. 31, 14604582251315595 (2025).
De Vito, A. et al. Assessing ChatGPT’s potential in HIV prevention communication: a comprehensive evaluation of accuracy, completeness, and inclusivity. AIDS Behav. 28, 2746–2754 (2024).
Hua, Y. et al. A scoping review of large language models for generative tasks in mental health care. NPJ Digit. Med. 8, 230 (2025).
Akdogan, O. et al. Effect of a ChatGPT-based digital counseling intervention on anxiety and depression in patients with cancer: a prospective, randomized trial. Eur. J. Cancer 221, 115408 (2025).
Lauderdale, S. A. et al. Effectiveness of generative AI-large language models’ recognition of veteran suicide risk: a comparison with human mental health providers using a risk stratification model. Front. Psychiatry 16, 1544951 (2025).
Lara-Abelenda, F. J. et al. Personalized glucose forecasting for people with type 1 diabetes using large language models. Comput. Methods Programs Biomed. 265, 108737 (2025).
Giorgi, S. et al. Evaluating generative AI responses to real-world drug-related questions. Psychiatry Res. 339, 116058 (2024).
Russell, A. M., Acuff, S. F., Kelly, J. F., Allem, J.-P. & Bergman, B. G. ChatGPT-4: alcohol use disorder responses. Addiction 119, 2205–2210 (2024).
Gabriel, R. A., Park, B. H., Hsu, C.-N. & Macias, A. A. A review of leveraging artificial intelligence to predict persistent postoperative opioid use and opioid use disorder and its ethical considerations. Curr. Pain Headache Rep. 29, 30 (2025).
Zhang, K. et al. Integrating visual large language model and reasoning chain for driver behavior analysis and risk assessment. Accid. Anal. Prev. 198, 107497 (2024).
Burns, C. et al. Use of generative AI for improving health literacy in reproductive health: case study. JMIR Form. Res. 8, e59434 (2024).
Swisher, A. R. et al. Enhancing health literacy: evaluating the readability of patient handouts revised by ChatGPT’s large language model. Otolaryngol. Head Neck Surg. (2024).
Oniani, D. et al. Emerging opportunities of using large language models for translation between drug molecules and indications. Sci. Rep. 14, 10738 (2024).
Li, S. et al. CodonBERT large language model for mRNA vaccines. Genome Res. 34, 1027–1035 (2024).
Consens, M. E., Li, B., Poetsch, A. R. & Gilbert, S. Genomic language models could transform medicine but not yet. NPJ Digit. Med. 8, 212 (2025).
Ng, F. Y. C. et al. Artificial intelligence education: an evidence-based medicine approach for consumers, translators, and developers. Cell Rep. Med. 4, 101230 (2023).
Wang, X. et al. ChatGPT: promise and challenges for deployment in low- and middle-income countries. Lancet Reg. Health West. Pac. 41, 100905 (2023).
van Hoek, A. J. et al. Importance of investing time and money in integrating large language model-based agents into outbreak analytics pipelines. Lancet Microbe 5, 100881 (2024).
Zhu, K. et al. Evaluating the accuracy of responses by large language models for information on disease epidemiology. Pharmacoepidemiol. Drug Saf. 34, e70111 (2025).
