Advanced analysis components to support the SNOMED PaLM mapping project
The goal of this guide is to outline advanced analytics components and key considerations that may be relevant to the SNOMED PaLM mapping project. These insights will support the development of best practice guidance and an options paper for the programme.
Author: Jonny Pearson – Lead Data Scientist, Data Science Team, NHS England
Acknowledgment: Special thanks to Avish Vijayaraghavan for providing reference materials from the internal project 'Resource-Constrained Annotation Workflows for Paediatric Histopathology Reports using LLMs.'
A literature review was conducted to investigate current practices in ontology mapping, combined with an existing review on using large language models (LLMs) for histopathology reports. Much of the broader literature was excluded because it covered:
- associated tasks using ontology mapping to improve classification performance
- ill-formed ontologies or those affected by missing data in a data set (missingness)
- studies focused solely on extensional matchers or matching via upper ontologies
Structure
The role of advanced analytics in the SNOMED PaLM mapping project can be considered across 3 areas: 2 are in the current scope and 1 is future-facing.
- Current scope: Parsing semi-structured data to structured
Involves converting a semi-structured lab Read code into identifiable entities (for example, component, property, inheres to and direct site). This includes handling many-to-many mappings.
- Current scope: Matching using terminological methods
Uses the extracted categories to compute matching scores against a target ontology. Includes pre-processing of acronyms and the use of referential maps (such as existing PBCL mappings).
- Future scope: Parsing unstructured data to structured
Primarily focuses on extracting mappable content from full pathology reports in free-text form. This process precedes or supplements the first 2 steps but is beyond the current scope.
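
To make the second area (matching using terminological methods) more concrete, the following is a minimal Python sketch rather than the project's actual method. It assumes a small hand-written acronym map, uses the standard library's `difflib.SequenceMatcher` as a stand-in for whichever string similarity measure is eventually chosen, and treats a referential map as a simple dictionary lookup that takes priority over scoring. The names `ACRONYMS`, `normalise`, `match_score` and `rank_candidates`, and all example terms, are hypothetical.

```python
import re
from difflib import SequenceMatcher

# Hypothetical acronym map for illustration only; a real project would
# curate this list with domain experts.
ACRONYMS = {
    "hb": "haemoglobin",
    "wbc": "white blood cell",
    "rbc": "red blood cell",
}


def normalise(text: str) -> str:
    """Lower-case, tokenise on alphanumerics and expand known acronyms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(ACRONYMS.get(token, token) for token in tokens)


def match_score(source: str, target: str) -> float:
    """Return a similarity score between 0 and 1 for two terms."""
    return SequenceMatcher(None, normalise(source), normalise(target)).ratio()


def rank_candidates(source, targets, reference_map=None):
    """Rank target terms for a source term, best match first.

    If the source term already appears in the (assumed) referential map,
    that existing mapping is returned with a score of 1.0 and no
    similarity scoring is carried out.
    """
    reference_map = reference_map or {}
    if source in reference_map:
        return [(reference_map[source], 1.0)]
    scored = [(target, match_score(source, target)) for target in targets]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    candidates = ["Red blood cell count", "White blood cell count"]
    for term, score in rank_candidates("WBC count", candidates):
        print(f"{score:.2f}  {term}")
```

In this sketch the acronym expansion happens before scoring, so a source term written with an acronym can still match a fully spelled-out target term. A character-level ratio is only one possible terminological measure; token-based or edit-distance measures could be swapped into `match_score` without changing the rest of the structure.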
Last edited: 23 May 2025 9:20 am