Clustering Improvement in Homonym Detection using Concept Based Document Similarity with Conceptual Term Frequency Analysis

Sunil Kumar; Rajendra Gupta

Authors

Sunil Kumar Department of Computer Science, Rabindranath Tagore University, Bhopal, India
Rajendra Gupta Department of Computer Science, Rabindranath Tagore University, Bhopal, India

Keywords:

Concept based Document Similarity, Homonym Words, Clustering, Entropy

Abstract

Homonyms are words that share the same spelling but have different meanings, and they occur in almost every language. They are a source of noise in most text analysis and are difficult to detect. Word relations of this kind are conventionally classified by whether sound, spelling, and meaning are the same or different; homonyms are the combination of same sound, same spelling, and distinct meaning. This paper presents a clustering improvement analysis for homonym recognition using a concept-based document similarity method, which allows a word to be interpreted from its context. The results show that the proposed method performs better in terms of clustering improvement and entropy.
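The abstract does not say how entropy is computed. As a minimal sketch, assuming the common size-weighted cluster entropy over known sense labels (an assumption, not necessarily the authors' formulation), the measure could look as follows. The function names and the "bank" example data are hypothetical.

```python
import math
from collections import Counter


def cluster_entropy(labels):
    """Shannon entropy (base 2) of the sense labels inside one cluster."""
    total = len(labels)
    counts = Counter(labels)
    # p * log2(1/p) summed over the senses present in the cluster
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def weighted_entropy(clusters):
    """Size-weighted average entropy over all clusters (lower is better)."""
    n = sum(len(cluster) for cluster in clusters)
    return sum(len(cluster) / n * cluster_entropy(cluster) for cluster in clusters)


# Hypothetical data: occurrences of the homonym "bank", each tagged with its
# true sense and grouped by the cluster it was assigned to.
mixed = [["finance", "river", "finance", "river"], ["finance", "river"]]
clean = [["finance", "finance", "finance"], ["river", "river", "river"]]

print(weighted_entropy(mixed))
print(weighted_entropy(clean))
```

Under this definition, a clustering that separates the senses of a homonym scores lower than one that mixes them, which is the sense in which a lower entropy indicates a better clustering.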

