SEMILAR: A Semantic Similarity Toolkit

Introducing SEMILAR

The SEMILAR software environment offers users, researchers, and developers easy access to fully implemented semantic similarity methods in one place, through both a GUI and a library. Besides productivity advantages, SEMILAR provides a framework for the systematic comparison of various semantic similarity methods.

The automated methods offered by SEMILAR range from simple lexical overlap methods, to methods that rely on word-to-word similarity metrics, to more sophisticated methods that derive the meaning of words and sentences in a fully unsupervised way, such as Latent Semantic Analysis and Latent Dirichlet Allocation, to kernel-based methods for assessing similarity.
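As an illustration of the simplest end of that range, the sketch below computes a lexical overlap score between two texts. It is not SEMILAR's implementation or API: the function names `tokenize` and `lexical_overlap`, the regex-based tokenizer, and the choice of Jaccard overlap are all assumptions made for this example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lexical_overlap(text_a, text_b):
    """Jaccard overlap: shared word types divided by all word types."""
    words_a, words_b = set(tokenize(text_a)), set(tokenize(text_b))
    if not words_a and not words_b:
        return 0.0  # convention chosen for this sketch: two empty texts score zero
    return len(words_a & words_b) / len(words_a | words_b)


# Four of the six distinct words are shared by both sentences.
score = lexical_overlap("The cat sat on the mat.", "A cat sat on a mat.")
print(round(score, 3))
```

A measure like this ignores synonyms and word order, which is one reason the more sophisticated methods listed above exist.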

Besides automated ways of assessing the semantic similarity of texts, the toolkit offers facilities for manual assessment by experts. The manual assessment and annotation component, called SEMILAT (the SEMantic simILarity Annotation Tool), provides GUI-based facilities for experts to assess and annotate the semantic similarity of texts. SEMILAT is available for download, as is the SEMILAR corpus built by our research group. The corpus offers qualitative word-level similarity judgments by human experts, which can be used to further the understanding of the various word-to-word semantic similarity methods and their impact on the similarity of larger texts, e.g. sentences or paragraphs.

Some of the most important features of SEMILAR are listed below:

Easy-to-use GUI
Data management
Preprocessing
Lexical and syntactic feature extraction
Visualization
GUI-based data assessment and annotation (SEMILAT: the SEMantic simILarity Annotation Tool)
Performance reports (if data is accompanied by expert judgments)

Why The SEMILAR Project?

The goal of the SEMantic simILARity software toolkit (SEMILAR; pronounced the same way as the word ‘similar’) is to promote productive, fair, and rigorous research advancements in the area of semantic similarity.

Semantic similarity is the practical, widely used approach to the natural language understanding problem in many core NLP tasks, such as Paraphrase Identification, Question Answering, Natural Language Generation, and Intelligent Tutoring Systems. The other approach to language understanding, full understanding, is desirable. However, because full language understanding requires world knowledge, it is more challenging and presently less practical for large-scale use and real-world applications.

In the semantic similarity approach, the meaning of a target text is inferred by assessing how similar it is to another text, called the benchmark text, whose meaning is known. If the two texts are similar enough, according to some measure of semantic similarity, the meaning of the target text is deemed similar to the meaning of the benchmark text. For instance, in dialogue-based Intelligent Tutoring Systems in which learners interact with a tutoring system through dialogue, students’ natural language answers to, say, science problems are assessed by comparing them to ideal responses provided by experts. The students’ answers are deemed correct if they are similar enough to experts’ responses, which are deemed correct.
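The tutoring scenario above can be sketched as a threshold test against expert responses. This is a minimal illustration under stated assumptions, not how SEMILAR or any particular tutoring system works: the bag-of-words cosine measure, the helper names (`term_counts`, `cosine_similarity`, `is_answer_correct`), and the threshold of 0.5 are all hypothetical choices.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Count lowercase alphanumeric word tokens in the text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the two bag-of-words count vectors."""
    a, b = term_counts(text_a), term_counts(text_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(
        sum(c * c for c in b.values())
    )
    return dot / norm if norm else 0.0


def is_answer_correct(student_answer, ideal_answers, threshold=0.5):
    """Deem the answer correct if it is similar enough to any benchmark text.

    Returns (decision, best_score). The threshold is an assumed value that
    would need tuning against expert judgments in practice.
    """
    best = max(
        (cosine_similarity(student_answer, ideal) for ideal in ideal_answers),
        default=0.0,
    )
    return best >= threshold, best


ideal = ["The net force on the object is zero"]
print(is_answer_correct("the net force is zero", ideal))
print(is_answer_correct("gravity pulls it down", ideal))
```

Any similarity measure returning a score in a fixed range could be substituted for `cosine_similarity`; only the comparison against the benchmark and the threshold are essential to the approach.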

The development of SEMILAR has been motivated by the lack of an integrated environment that would provide:

Easy access to the various implementations of the semantic similarity approach from the same interface and/or library
Easy access to semantic similarity methods that work at different levels of text granularity: word-to-word, sentence-to-sentence, paragraph-to-paragraph, document-to-document, or a combination of the various granularities such as word-to-sentence, sentence-to-paragraph, etc.
A common environment for the systematic comparison of the various semantic similarity methods

SEMILAR Application Screenshots

In the project folder you can load datasets or previously saved projects.

The DataView tab allows you to view the data and the similarity scores produced by the metrics selected in other tabs.

From the Similarity Methods tab, you can choose similarity methods and their configuration parameters.
