An Extensive Comparison of Feature Extraction Methods for Paraphrase Detection

Hassan Shahmohammadi, MH Dezfoulian, M Mansoorizadeh

October, 2018

Abstract

Paraphrase detection is one of the fundamental tasks in natural language processing. Designing a system to detect paraphrase pairs requires a good understanding of different feature extraction methods. To tackle this challenge, much work has been done to extract various types of features. Knowing which types of features are discriminative for paraphrase identification saves researchers considerable time and helps them obtain better results. In this paper, we compare various feature extraction methods that require neither prior knowledge nor external resources, so they can be applied to any language. Our experiments show that methods which specify the importance of each word in a document, or break the document down into specific parts, perform better than methods that try to capture the meaning of a given document as a whole and treat it as a single component.

Type

Conference paper

Publication

8th International Conference on Computer and Knowledge Engineering

Paraphrase detection

Hassan Shahmohammadi

Senior Research Scientist