Part of Advanced analysis components to support SNOMED PaLM mapping project

Matching using terminological methods

Terminological (or linguistic) matchers work best when labels and metadata are clear and informative. They handle abbreviations and linguistic variations effectively but do not consider hierarchical ontology structures. This can lead to false positives if unrelated terms share similar strings (common in medical domains). They also allow the use of external resources (such as lookup tables) to support the matching.

Terminological matchers can be split into string-based and language-based.

String-based approaches

Levenshtein distance¹ - measures the minimum edits (insertions, deletions, substitutions) needed to transform one string into another.
Jaro²/Jaro-Winkler³ distance - compares common characters and their order, performing well on spelling errors.
ISUB⁴ - designed for ontology alignment, incorporating substring matching and a Jaro-Winkler component.
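
As a minimal sketch of the first two measures, the functions below implement Levenshtein distance and Jaro-Winkler in plain Python. The function names, the normalised similarity helper and the prefix scale of 0.1 (capped at four characters) are assumptions made for illustration, and the Jaro-Winkler function returns a similarity in the range 0 to 1 rather than a distance. ISUB is not covered here.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Scale the edit distance to 0..1, where 1 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: shared characters within a window, penalised for order."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i in range(len1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s1[i] == s2[j]:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order / 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Boost the Jaro score for strings sharing a prefix of up to four characters."""
    score = jaro(s1, s2)
    prefix = 0
    for c1, c2 in zip(s1[:4], s2[:4]):
        if c1 != c2:
            break
        prefix += 1
    return score + prefix * prefix_scale * (1.0 - score)
```

For example, levenshtein("kitten", "sitting") is 3, and jaro_winkler("MARTHA", "MARHTA") is approximately 0.961.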

In practice, multiple attributes must often be compared, not just single strings. Offering users several candidate matches derived from various string-based algorithms (and attribute combinations) can improve accuracy.
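
One way this could look is sketched below: each metric scores every target over a chosen set of attributes, and the top candidates per metric are returned so that a user can review them side by side. The record layout (an "id" plus free-text attributes such as "name" and "specimen"), the two metrics and the simple mean over attributes are all hypothetical choices for illustration. The only library call used is difflib.SequenceMatcher from the Python standard library.

```python
from difflib import SequenceMatcher


def sequence_ratio(a: str, b: str) -> float:
    """Character-level similarity from the standard library's SequenceMatcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_jaccard(a: str, b: str) -> float:
    """Word-level overlap, insensitive to word order."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 1.0


# Hypothetical registry of metrics to offer to the user.
METRICS = {"sequence_ratio": sequence_ratio, "token_jaccard": token_jaccard}


def candidate_matches(source, targets, attributes, top_k=3):
    """For each metric, rank targets by their mean score over the given attributes."""
    if not attributes:
        raise ValueError("at least one attribute is required")
    results = {}
    for metric_name, metric in METRICS.items():
        scored = []
        for target in targets:
            scores = [metric(source.get(attr, ""), target.get(attr, ""))
                      for attr in attributes]
            scored.append((sum(scores) / len(scores), target["id"]))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        results[metric_name] = scored[:top_k]
    return results


# Invented example records, not real SNOMED or PaLM content.
source = {"name": "Serum sodium level", "specimen": "serum"}
targets = [
    {"id": "T1", "name": "Sodium level in serum", "specimen": "serum"},
    {"id": "T2", "name": "Serum potassium level", "specimen": "serum"},
    {"id": "T3", "name": "Urine sodium level", "specimen": "urine"},
]
results = candidate_matches(source, targets, ["name", "specimen"], top_k=2)
```

Different metrics can rank the same targets differently, which is the reason for presenting several candidate lists rather than a single answer.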

Language-based approaches

Intrinsic methods - these include the NLP steps above (stemming, lemmatization, stop-word removal, POS tagging).

Extrinsic methods - use external dictionaries or resources (such as WordNet⁵ and domain-specific thesauri).

Path-based measures - such as radaDistance⁶, LCSim⁷ and WuPalmerSim⁸.

Information-based measures - such as ResnikSim⁹, LinSim¹⁰ and JianDistance¹¹, which use probability or 'collision frequency'.

Combined measures - such as PirroSim¹² and TverskySim¹³.
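
The path-based and information-based measures can be illustrated on a toy hierarchy. The sketch below assumes a single-parent is-a tree, invented concept names and invented usage counts (real SNOMED content allows multiple parents, so this is a simplification). It also assumes the root has depth 1 and that information content is the negative log of the share of counts falling on a concept or its descendants. The function named jiang_conrath_distance corresponds to the measure listed above as JianDistance.

```python
import math

# Toy is-a hierarchy (child -> parent). Names and counts are invented.
PARENT = {
    "clinical finding": None,
    "disorder of bone": "clinical finding",
    "fracture of bone": "disorder of bone",
    "fracture of femur": "fracture of bone",
    "fracture of tibia": "fracture of bone",
    "osteoporosis": "disorder of bone",
}
COUNTS = {
    "clinical finding": 100, "disorder of bone": 10, "fracture of bone": 20,
    "fracture of femur": 30, "fracture of tibia": 20, "osteoporosis": 20,
}
TOTAL = sum(COUNTS.values())


def path_to_root(concept):
    """The concept followed by each of its ancestors, ending at the root."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    return len(path_to_root(concept))  # assumption: the root has depth 1


def lowest_common_subsumer(a, b):
    ancestors_of_a = set(path_to_root(a))
    for candidate in path_to_root(b):
        if candidate in ancestors_of_a:
            return candidate
    raise ValueError("concepts do not share a root")


def rada_distance(a, b):
    """Number of is-a edges on the path between a and b."""
    lcs = lowest_common_subsumer(a, b)
    return (depth(a) - depth(lcs)) + (depth(b) - depth(lcs))


def wu_palmer(a, b):
    lcs = lowest_common_subsumer(a, b)
    return 2 * depth(lcs) / (depth(a) + depth(b))


def information_content(concept):
    """-log of the probability of meeting the concept or any descendant."""
    subsumed = sum(n for c, n in COUNTS.items() if concept in path_to_root(c))
    return -math.log(subsumed / TOTAL)


def resnik(a, b):
    return information_content(lowest_common_subsumer(a, b))


def lin(a, b):
    denominator = information_content(a) + information_content(b)
    return 2 * resnik(a, b) / denominator if denominator else 1.0


def jiang_conrath_distance(a, b):
    return information_content(a) + information_content(b) - 2 * resnik(a, b)
```

With this toy data, "fracture of femur" and "fracture of tibia" are 2 edges apart and have a Wu-Palmer similarity of 0.75, because their lowest common subsumer, "fracture of bone", sits at depth 3 and both concepts sit at depth 4.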

Most ontology matching systems combine these techniques to maximise accuracy.
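
A minimal sketch of such a combination is shown below. It chains an intrinsic step (lower-casing, stop-word removal and a deliberately naive plural stripper standing in for a real stemmer), an extrinsic step (a two-entry synonym table standing in for a thesaurus) and a string-based score. The stop-word list, the synonym table, the equal weighting and all function names are assumptions for illustration only.

```python
import re
from difflib import SequenceMatcher

STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}
SYNONYMS = {"renal": "kidney", "hepatic": "liver"}  # hypothetical thesaurus


def naive_stem(token: str) -> str:
    """Strip a trailing plural 's'. A crude stand-in for a real stemmer."""
    if len(token) > 3 and token.endswith("s") and not token.endswith(("ss", "us", "is")):
        return token[:-1]
    return token


def normalise(label: str) -> list:
    """Intrinsic and extrinsic clean-up: tokens, stop words, synonyms, stems."""
    tokens = re.findall(r"[a-z0-9]+", label.lower())
    tokens = [SYNONYMS.get(t, t) for t in tokens if t not in STOP_WORDS]
    return [naive_stem(t) for t in tokens]


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of the normalised token sets."""
    tokens_a, tokens_b = set(normalise(a)), set(normalise(b))
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 1.0


def string_score(a: str, b: str) -> float:
    """Character-level similarity of the normalised, order-independent labels."""
    joined_a = " ".join(sorted(normalise(a)))
    joined_b = " ".join(sorted(normalise(b)))
    return SequenceMatcher(None, joined_a, joined_b).ratio()


def combined_score(a: str, b: str, weight: float = 0.5) -> float:
    """Weighted mix of the token-level and character-level scores."""
    return weight * token_overlap(a, b) + (1 - weight) * string_score(a, b)
```

Here "Diseases of the kidney" and "Renal disease" both normalise to the tokens "disease" and "kidney", so their combined score is 1.0 even though the raw strings differ considerably.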

The role of semantic matching as verification

Because SNOMED is an OWL-based ontology (OWL is the Web Ontology Language), semantic matching can validate the initial matches from terminological methods. For example, an approach like Ontology Matching with Semantic Verification¹⁴ proposes matches via string-based or language-based methods, then checks those matches against description logic axioms in the ontology.

This helps:

filter out homonyms or misleadingly similar terms
confirm equivalences where labels differ but meanings align (such as synonyms in a hierarchical structure)
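
The filtering idea can be sketched with a toy example. The code below is not the algorithm from the cited paper and does not use a description logic reasoner. It assumes a hypothetical subclass table, a single hand-written disjointness axiom and a known top-level category for each source term, and it rejects any candidate whose target falls under a class declared disjoint with that category.

```python
# Toy ontology (class -> superclass). All names are invented for illustration.
SUBCLASS_OF = {
    "disorder": None,
    "qualifier value": None,
    "common cold": "disorder",
    "cold": "qualifier value",
}

# Hand-written disjointness axioms between top-level classes.
DISJOINT = {frozenset({"disorder", "qualifier value"})}


def superclasses(concept):
    """The concept itself followed by all of its superclasses."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = SUBCLASS_OF[concept]
    return chain


def is_consistent(source_category, target):
    """False if the target sits under a class disjoint with the source's category."""
    return not any(frozenset({source_category, ancestor}) in DISJOINT
                   for ancestor in superclasses(target))


def verify(candidates, source_categories):
    """Keep only the candidate pairs that do not violate a disjointness axiom."""
    return [(source, target) for source, target in candidates
            if is_consistent(source_categories[source], target)]


# A terminological matcher proposes both targets for the local term "cold".
candidates = [("local:cold", "common cold"), ("local:cold", "cold")]
source_categories = {"local:cold": "disorder"}  # assumed known from local metadata
verified = verify(candidates, source_categories)
```

In this example the homonym "cold" (a qualifier value) is filtered out, leaving only the pair that maps the local term to "common cold".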

¹https://nymity.ch/sybilhunting/pdf/Levenshtein1966a.pdf

² M. A. Jaro. Probabilistic Linkage of Large Public Health Data Files. Statistics in Medicine, 14:491–498, 1995.

³William E. Winkler. The state of record linkage and current research problems. Technical report, Statistical Research Division, U.S. Bureau of the Census, 1999.

⁴Giorgos Stoilos, Giorgos Stamou, and Stefanos Kollias. A String Metric for Ontology Alignment. In Yolanda Gil, Enrico Motta, V. Richard Benjamins, and Mark A. Musen, editors, The Semantic Web ISWC 2005, volume 3729 of Lecture Notes in Computer Science, chapter 45, pages 624–637. Springer-Verlag, Berlin/Heidelberg, 2005.

⁵Christiane Fellbaum, editor. WordNet: An Electronic Lexical Database. MIT Press, Cambridge, MA, 1998.

⁶ R. Rada, H. Mili, E. Bicknell, and M. Blettner. Development and application of a metric on semantic nets. IEEE Transactions on Systems, Man, and Cybernetics, 19(1):17–30, January 1989.

⁷ C. Leacock and M. Chodorow. Combining local context and WordNet similarity for word sense identification. In Christiane Fellbaum, editor, WordNet: An Electronic Lexical Database, pages 265–283. MIT Press, Cambridge, Massachusetts, 1998.

⁸ Zhibiao Wu and Martha Palmer. Verb semantics and lexical selection. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, pages 133–138, 1994.

⁹ Philip Resnik. Using information content to evaluate semantic similarity in a taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 448–453, 1995.

¹⁰ Dekang Lin. An Information-Theoretic Definition of Similarity. In Jude W. Shavlik, editor, ICML, pages 296–304. Morgan Kaufmann, 1998.

¹¹ J. Jiang and D. Conrath. Semantic similarity based on corpus statistics and lexical taxonomy. In Proceedings of the International Conference Research on Computational Linguistics (ROCLING), Taiwan, 1997.

¹² Giuseppe Pirró. A semantic similarity metric combining features and intrinsic information content. Data Knowl. Eng., 68:1289–1308, November 2009.

¹³ Amos Tversky. Features of Similarity. Psychological Review, 84:327–352, 1977.

¹⁴ Jean-Mary YR, Shironoshita EP, Kabuka MR. Ontology Matching with Semantic Verification. Web Semant. 2009 Sep 1;7(3):235-251. doi: 10.1016/j.websem.2009.04.001. PMID: 20186256; PMCID: PMC2825706.

Last edited: 22 May 2025 5:18 pm
