Assessing the Potential of an LLM-Powered System for Enhancing FHIR Resource Validation

Tabari P.;Piscitelli A.;Costagliola G.;de Rosa M.
2025

Abstract

Large Language Models (LLMs) have gained significant popularity among healthcare professionals as tools for AI-driven interactions. These models can analyze large volumes of clinical data, including patient narratives, to support efficient decision-making. In this paper, we evaluate how combining a syntactic validator with an LLM can improve the accuracy of generating FHIR resources from natural language sentences. We compared zero-shot, one-shot, and few-shot prompting methods. The process involves a validator component that iteratively checks whether the LLM output adheres to the syntactic requirements specified by FHIR before the final response is returned to the user. The one-shot and few-shot prompting methods achieved 96% syntactic validity, compared with 90% for the zero-shot strategy. In the semantic analysis, one-shot prompting achieved the highest number of correct comparisons between the generated JSON and the ground-truth resource, with 25.82 correct comparisons per FHIR resource. It was followed by the few-shot strategy, with 25.50 correct comparisons per FHIR resource, and the zero-shot approach, with 15.21 correct comparisons per FHIR resource.
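
The iterative validation step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `check_syntax`, `generate_with_validation`, `stub_llm`, and the `max_rounds` limit are hypothetical, the LLM is replaced by a stub, and the check only confirms that the output is a JSON object with a `resourceType` string, which is far weaker than a real FHIR validator.

```python
import json
from typing import Callable, List, Optional, Tuple


def check_syntax(raw: str) -> List[str]:
    """Toy stand-in for a FHIR syntactic validator (assumption, not the paper's).

    Returns a list of problems; an empty list means the toy check passed.
    """
    try:
        resource = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(resource, dict):
        return ["top-level value must be a JSON object"]
    if not isinstance(resource.get("resourceType"), str):
        return ["missing string field 'resourceType'"]
    return []


def generate_with_validation(
    sentence: str,
    llm: Callable[[str, List[str]], str],
    max_rounds: int = 3,  # hypothetical retry limit
) -> Tuple[Optional[str], int]:
    """Ask the model, validate, and retry with the validator's feedback.

    Returns (resource_json, rounds_used); resource_json is None if no
    candidate passed the check within max_rounds.
    """
    feedback: List[str] = []
    for round_no in range(1, max_rounds + 1):
        candidate = llm(sentence, feedback)
        feedback = check_syntax(candidate)
        if not feedback:
            return candidate, round_no
    return None, max_rounds


def stub_llm(sentence: str, feedback: List[str]) -> str:
    """Hypothetical model: truncated JSON first, a fixed answer after feedback."""
    if not feedback:
        return '{"resourceType": '
    return '{"resourceType": "Patient"}'


if __name__ == "__main__":
    resource, rounds = generate_with_validation("A patient record.", stub_llm)
    print(resource, rounds)
```

In a real pipeline the stub would be replaced by a call to an actual model with a zero-shot, one-shot, or few-shot prompt, and the toy check by a full FHIR validator; only the generate, validate, retry structure is meant to carry over.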

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4952135