Part of Advanced analysis components to support SNOMED PaLM mapping project

Matching using terminological methods

Terminological (or linguistic) matchers work best when labels and metadata are clear and informative. They handle abbreviations and linguistic variations effectively but do not consider hierarchical ontology structures. This can lead to false positives if unrelated terms share similar strings (common in medical domains). They also allow the use of external resources (such as lookup tables) to support the matching.

Terminological matchers can be split into string-based and language-based approaches.


String-based approaches

  1. Levenshtein distance [1] - measures the minimum number of edits (insertions, deletions, substitutions) needed to transform one string into another.
  2. Jaro [2] / Jaro-Winkler [3] distance - compares common characters and their order, performing well on spelling errors.
  3. ISUB [4] - designed for ontology alignment, incorporating substring matching and a Jaro-Winkler component.
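
As a minimal sketch of the first measure, the edit distance can be computed with a standard dynamic-programming table. The function names and the length-normalised similarity are illustrative choices, not taken from any particular matching tool.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def normalised_similarity(a: str, b: str) -> float:
    """Scale the distance into [0, 1]; 1.0 means identical (an assumed convention)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```

For example, levenshtein("hypotension", "hypertension") is 2, so two clinically opposite terms score as highly similar. This is the kind of false positive that string-based matching alone cannot rule out.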

In practice, multiple attributes must often be compared, not just single strings. Offering users several candidate matches derived from various string-based algorithms (and attribute combinations) can improve accuracy.
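
One way to sketch this is to compare every attribute of a source record with every attribute of each target record and return a short ranked list for a user to review. The example below uses Python's standard-library difflib.SequenceMatcher as a stand-in similarity (it is not one of the measures listed above), and the attribute names, records and ids are hypothetical.

```python
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] from the standard library."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rank_candidates(source, targets, attributes=("label", "synonym"), top_k=3):
    """Return up to top_k (score, target id) pairs, best first."""
    scored = []
    for target in targets:
        # Compare every source attribute with every target attribute and
        # keep the best score, so that a label can match a synonym.
        scores = [
            string_similarity(source[s], target[t])
            for s in attributes
            for t in attributes
            if source.get(s) and target.get(t)
        ]
        scored.append((max(scores, default=0.0), target["id"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Hypothetical records; the ids are placeholders, not real codes.
source = {"label": "Serum sodium", "synonym": "Sodium level"}
targets = [
    {"id": "T1", "label": "Sodium measurement", "synonym": "Serum sodium"},
    {"id": "T2", "label": "Serum potassium", "synonym": "Potassium level"},
    {"id": "T3", "label": "Urine culture", "synonym": ""},
]
candidates = rank_candidates(source, targets)
```

Here T1 is ranked first because the source label exactly matches one of its synonyms. Returning several candidates rather than a single best match leaves the final decision with the reviewer.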


Language-based approaches

Intrinsic methods - these include the NLP steps above (stemming, lemmatization, stop-word removal, POS tagging).
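
A minimal sketch of intrinsic normalisation follows, assuming a tiny illustrative stop-word list and a deliberately naive plural-stripping rule in place of a real stemmer or lemmatizer. A production pipeline would normally use an established NLP library for these steps.

```python
import re

# Illustrative stop-word list only; real lists are much longer.
STOP_WORDS = frozenset({"of", "the", "in", "a", "an", "and", "for", "to"})


def naive_stem(token: str) -> str:
    """Strip a trailing plural 's' (a toy stand-in for a real stemmer)."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def normalise(label: str) -> frozenset:
    """Lower-case, tokenise, drop stop words, then stem what remains."""
    tokens = re.findall(r"[a-z0-9]+", label.lower())
    return frozenset(naive_stem(t) for t in tokens if t not in STOP_WORDS)


def jaccard(a: frozenset, b: frozenset) -> float:
    """Share of tokens the two labels have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

With these rules, "Measurement of the glucose levels" and "Glucose level measurements" normalise to the same token set, so word order, stop words and plurals no longer prevent a match.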

Extrinsic methods - use external dictionaries or resources (such as WordNet [5] and domain-specific thesauri).

Path-based measures - such as radaDistance [6], LCSim [7] and WuPalmerSim [8].
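
To illustrate the path-based idea, the sketch below computes an edge-counting distance and a Wu-Palmer style similarity over a toy hierarchy. The concepts are invented, each has a single parent (real terminologies such as SNOMED allow several), and the root is assumed to have depth 1.

```python
# Hypothetical single-parent hierarchy: child -> parent (None marks the root).
PARENT = {
    "finding": None,
    "disorder": "finding",
    "heart disease": "disorder",
    "lung disease": "disorder",
    "myocarditis": "heart disease",
    "pneumonia": "lung disease",
}


def path_to_root(concept):
    """The concept followed by each of its ancestors, ending at the root."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    return len(path_to_root(concept))  # root has depth 1 (assumed convention)


def lowest_common_subsumer(a, b):
    """Deepest concept that is an ancestor of (or equal to) both a and b."""
    ancestors_of_a = set(path_to_root(a))
    for node in path_to_root(b):  # walks upward, so the first hit is the deepest
        if node in ancestors_of_a:
            return node
    raise ValueError("concepts do not share a root")


def edge_distance(a, b):
    """Number of is-a edges on the path between a and b through their subsumer."""
    common = depth(lowest_common_subsumer(a, b))
    return (depth(a) - common) + (depth(b) - common)


def wu_palmer(a, b):
    """2 * depth(subsumer) / (depth(a) + depth(b)), in (0, 1]."""
    return 2 * depth(lowest_common_subsumer(a, b)) / (depth(a) + depth(b))
```

In this toy hierarchy, "myocarditis" and "pneumonia" meet at "disorder", giving an edge distance of 4 and a Wu-Palmer similarity of 0.5.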

Information-based measures - such as ResnikSim [9], LinSim [10] and JianDistance [11], which use probability or 'collision frequency'.
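
The information-based measures can be sketched in the same way. The example below assumes a toy hierarchy with made-up usage counts and defines the information content of a concept as the log of the inverse probability of seeing that concept or any of its descendants. The function names are illustrative, and the formulas follow the usual textbook statements of the Resnik, Lin and Jiang-Conrath measures.

```python
import math

# Hypothetical single-parent hierarchy and made-up usage counts.
PARENT = {
    "finding": None,
    "disorder": "finding",
    "heart disease": "disorder",
    "lung disease": "disorder",
    "myocarditis": "heart disease",
    "pneumonia": "lung disease",
}
OWN_COUNT = {
    "finding": 1, "disorder": 1, "heart disease": 2,
    "lung disease": 2, "myocarditis": 1, "pneumonia": 1,
}
TOTAL = sum(OWN_COUNT.values())


def path_to_root(concept):
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def lowest_common_subsumer(a, b):
    ancestors_of_a = set(path_to_root(a))
    return next(node for node in path_to_root(b) if node in ancestors_of_a)


def information_content(concept):
    """log(1 / p), where p counts the concept and all of its descendants."""
    cumulative = sum(n for c, n in OWN_COUNT.items() if concept in path_to_root(c))
    return math.log(TOTAL / cumulative)


def resnik_sim(a, b):
    """Information content of the most specific shared ancestor."""
    return information_content(lowest_common_subsumer(a, b))


def lin_sim(a, b):
    """Shared information relative to the information in both concepts."""
    total = information_content(a) + information_content(b)
    return 2 * resnik_sim(a, b) / total if total else 0.0  # root vs root: undefined, 0.0 here


def jiang_conrath_distance(a, b):
    """Information in a and b that is not shared; 0 for identical concepts."""
    return information_content(a) + information_content(b) - 2 * resnik_sim(a, b)
```

Rare, specific shared ancestors carry more information than common, general ones, so "myocarditis" scores closer to "heart disease" than to "pneumonia" under all three measures.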

Combined measures - such as PirroSim [12] and TverskySim [13].

Most ontology matching systems combine these techniques to maximise accuracy.


The role of semantic matching as verification

Because SNOMED is an ontology based on OWL (Web Ontology Language), semantic matching can validate the initial matches from terminological methods. For example, an approach like Ontology Matching with Semantic Verification [14] proposes matches via string-based or language-based methods, then checks those matches against the description logic axioms in the ontology.

This helps:

  • filter out homonyms or misleadingly similar terms
  • confirm equivalences where labels differ but meanings align (such as synonyms in a hierarchical structure)
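
The sketch below illustrates the first of these points with one simple kind of check. It is a simplified illustration under stated assumptions, not the procedure from the cited paper: the two hierarchies, the trusted anchor mapping and the disjointness axiom are all hypothetical. A candidate match is rejected when an ancestor of the source concept is already mapped to a class that is declared disjoint with an ancestor of the proposed target.

```python
# Hypothetical source and target hierarchies: child -> parent (None marks a root).
SOURCE_PARENT = {
    "cold": "respiratory infection",
    "respiratory infection": "disorder",
    "disorder": None,
}
TARGET_PARENT = {
    "common cold": "infectious disease",
    "infectious disease": "clinical finding",
    "clinical finding": None,
    "cold sensation": "sensation",
    "sensation": "observable",
    "observable": None,
}
# Assumed axioms: one trusted equivalence and one disjointness statement.
ANCHORS = {"disorder": "clinical finding"}
TARGET_DISJOINT = {frozenset({"clinical finding", "observable"})}


def ancestors(concept, parent):
    """The concept itself followed by all of its ancestors."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = parent[concept]
    return chain


def is_consistent(source_concept, target_concept):
    """False if accepting the match would place one concept in two disjoint classes."""
    target_chain = ancestors(target_concept, TARGET_PARENT)
    for source_ancestor in ancestors(source_concept, SOURCE_PARENT):
        anchored = ANCHORS.get(source_ancestor)
        if anchored is None:
            continue
        for target_ancestor in target_chain:
            if frozenset({anchored, target_ancestor}) in TARGET_DISJOINT:
                return False
    return True


# Both candidates look plausible to a string matcher for the label "cold".
candidates = [("cold", "common cold"), ("cold", "cold sensation")]
verified = [pair for pair in candidates if is_consistent(*pair)]
```

Only ("cold", "common cold") survives: the homonym "cold sensation" sits under a class that is disjoint with the one the source hierarchy is anchored to, so the match is filtered out even though the labels are similar.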

[1] V. I. Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady, 10(8):707–710, 1966. https://nymity.ch/sybilhunting/pdf/Levenshtein1966a.pdf

[2] M. A. Jaro. Probabilistic Linkage of Large Public Health Data Files. Statistics in Medicine, 14:491–498, 1995.

[3] William E. Winkler. The state of record linkage and current research problems. Technical report, Statistical Research Division, U.S. Bureau of the Census, 1999.

[4] Giorgos Stoilos, Giorgos Stamou, and Stefanos Kollias. A String Metric for Ontology Alignment. In Yolanda Gil, Enrico Motta, V. Richard Benjamins, and Mark A. Musen, editors, The Semantic Web - ISWC 2005, volume 3729 of Lecture Notes in Computer Science, chapter 45, pages 624–637. Springer-Verlag, Berlin/Heidelberg, 2005.

[5] Christiane Fellbaum, editor. WordNet: An Electronic Lexical Database. MIT Press, Cambridge, MA, 1998.

[6] R. Rada, H. Mili, E. Bicknell, and M. Blettner. Development and application of a metric on semantic nets. IEEE Transactions on Systems, Man, and Cybernetics, 19(1):17–30, January 1989.

[7] C. Leacock and M. Chodorow. Combining local context and WordNet similarity for word sense identification. In Christiane Fellbaum, editor, WordNet: An Electronic Lexical Database, pages 265–283. MIT Press, Cambridge, Massachusetts, 1998.

[8] Zhibiao Wu and Martha Palmer. Verb semantics and lexical selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, pages 133–138, 1994.

[9] Philip Resnik. Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 448–453, 1995.

[10] Dekang Lin. An Information-Theoretic Definition of Similarity. In Jude W. Shavlik, editor, ICML, pages 296–304. Morgan Kaufmann, 1998.

[11] J. Jiang and D. Conrath. Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of the International Conference on Research in Computational Linguistics (ROCLING), Taiwan, 1997.

[12] Giuseppe Pirró. A semantic similarity metric combining features and intrinsic information content. Data Knowl. Eng., 68:1289–1308, November 2009.

[13] Amos Tversky. Features of Similarity. Psychological Review, 84:327–352, 1977.

[14] Jean-Mary YR, Shironoshita EP, Kabuka MR. Ontology Matching with Semantic Verification. Web Semant. 2009 Sep 1;7(3):235-251. doi: 10.1016/j.websem.2009.04.001. PMID: 20186256; PMCID: PMC2825706.


Last edited: 22 May 2025 5:18 pm