An Extensive Comparison of Feature Extraction Methods for Paraphrase Detection


Paraphrase detection is one of the fundamental tasks in natural language processing. Designing a system that detects paraphrase pairs requires a good understanding of different feature extraction methods. To tackle this challenge, much work has been done on extracting various types of features. Knowing which types of features are discriminative for paraphrase identification saves researchers a lot of time and helps them obtain better results in their work. In this paper we compare various feature extraction methods that require neither prior knowledge nor external resources, so they can be applied to any language. Our experiments show that methods which specify the importance of each word in a document, or which break the document down into specific parts, perform better than methods that try to capture the meaning of a given document as a whole and treat the document as a single component.
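
As a rough illustration of what a word-importance feature can look like, the sketch below scores a sentence pair by the cosine similarity of TF-IDF weighted word vectors. TF-IDF is assumed here purely as one familiar example of such a weighting; the abstract does not state which methods the paper evaluates, and the function names (`idf_weights`, `weighted_vector`, `paraphrase_similarity`) are hypothetical. It uses only the standard library and whitespace tokenization, in keeping with the constraint of needing no external resources.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption; no language resources needed).
    return text.lower().split()


def idf_weights(documents):
    # Smoothed inverse document frequency: words that occur in fewer documents weigh more.
    n = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in doc_freq.items()}


def weighted_vector(text, idf):
    # Sparse vector mapping each word to term frequency times its IDF weight.
    # Words unseen in the corpus get weight 0.0.
    tf = Counter(tokenize(text))
    return {w: tf[w] * idf.get(w, 0.0) for w in tf}


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def paraphrase_similarity(a, b, corpus):
    # Hypothetical helper: similarity of two sentences under corpus-level word weights.
    idf = idf_weights(corpus)
    return cosine(weighted_vector(a, idf), weighted_vector(b, idf))
```

In a detection pipeline, a score like this would typically be thresholded or passed to a classifier alongside other features; how the paper combines its features is not specified in this abstract.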

8th International Conference on Computer and Knowledge Engineering
Hassan Shahmohammadi
PhD candidate at University of Tuebingen & IMPRS-IS

My research interests include multimodal learning with deep learning, bridging NLP and CV.