Publications

Journal Articles


Critique of Impure Reason: Unveiling the reasoning behaviour of medical Large Language Models

Published in arXiv, 2024

Background: Despite the current ubiquity of Large Language Models (LLMs) across the medical domain, there is a surprising lack of studies that address their reasoning behaviour. We emphasise the importance of understanding reasoning behaviour, as opposed to high-level prediction accuracies, since it is equivalent to explainable AI (XAI) in this context. In particular, achieving XAI in medical LLMs used in the clinical domain will have a significant impact across the healthcare sector. Results: We therefore define the concept of reasoning behaviour in the specific context of medical LLMs. We then categorise and discuss the current state of the art of methods that evaluate reasoning behaviour in medical LLMs. Finally, we propose theoretical frameworks which can empower medical professionals or machine learning engineers to gain insight into the low-level reasoning operations of these previously opaque models. Conclusion: The resulting increase in transparency and trust in medical machine learning models among clinicians and patients will accelerate the integration, application, and further development of medical AI across the healthcare system as a whole.

Download Paper

Enhancing Medical Summarization with Parameter Efficient Fine Tuning on Local CPUs

Published in ICECCE (an IEEE-approved conference, record #63537), 2024

Documenting and summarizing patient symptoms and medical history for each visit can significantly burden clinicians' time management. Large Language Models (LLMs) have demonstrated great potential in natural language processing (NLP) tasks; however, their effectiveness in clinical summarization has not yet been rigorously validated. While much research has focused on leveraging closed LLMs such as GPT-4, Claude, and Gemini for clinical applications, privacy concerns hinder their deployment in real clinical settings. On-premises deployment offers a potential solution. This study examines domain adaptation techniques on the open-source LLM Llama 3 8B Instruct to improve clinical summarization. Our approach emphasizes fine-tuning on CPUs instead of the more commonly used GPUs, aiming for greater cost savings in practical applications. We apply Quantized Low-Rank Adaptation (QLoRA) for efficient task-specific adaptation and introduce CPU optimization techniques such as IPEX-LLM and Intel® AMX to enhance performance. Our results show that CPU fine-tuning, while less conventional than GPU-based methods, provides a practical, cost-effective, and privacy-aware solution for on-premises deployment, supporting accurate medical summarization and enabling customization to unique clinical requirements.
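As a rough illustration of the low-rank update that QLoRA builds on: instead of updating a full weight matrix W, LoRA-style adaptation trains two small matrices B and A of rank r and merges them as W + (alpha / r) · B·A, so only a fraction of the parameters are touched. The sketch below is a toy, pure-Python illustration of that arithmetic — the matrices, dimensions, and function names are all hypothetical and not taken from the paper's implementation, which uses quantized 4-bit weights via standard QLoRA tooling.

```python
# Toy sketch of the LoRA merge rule: W' = W + (alpha / r) * B @ A.
# B is (d_out x r) and A is (r x d_in) with rank r much smaller than d,
# so only 2 * r * d adapter parameters are trained instead of d * d.

def matmul(X, Y):
    """Plain-Python matrix multiply for the small demo matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def lora_merge(W, A, B, alpha, r):
    """Return W + (alpha / r) * B @ A, the merged adapter weight."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[W[i][j] + scale * delta[i][j]
             for j in range(len(W[0]))] for i in range(len(W))]

# Toy example: 2x2 base weight with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]   # d_out x r
A = [[0.5, 0.5]]     # r x d_in
merged = lora_merge(W, A, B, alpha=2.0, r=1)
# merged is [[2.0, 1.0], [2.0, 3.0]]: the base weight plus the scaled
# rank-1 update, while only 4 adapter values were "trained".
```

In the quantized (QLoRA) variant, W is stored in 4-bit precision and only A and B are kept in higher precision during fine-tuning, which is what makes CPU-only adaptation of an 8B-parameter model tractable.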

Download Paper