Clustering Improvement in Homonym Detection using Concept Based Document Similarity with Conceptual Term Frequency Analysis
Keywords:
Concept based Document Similarity, Homonym Words, Clustering, Entropy
Abstract
Homonyms are words that share the same spelling but have different meanings, and they occur in almost every language. They are a source of noise in most text analysis and are difficult to detect. Lexical relations of this kind are traditionally classified by combinations of identity or difference in sound, spelling, and meaning; homonyms combine the same sound and the same spelling with distinct meanings. This paper presents a clustering improvement analysis for homonym recognition using a concept based document similarity method, which allows a word to be interpreted from its context. The results show that the proposed method performs better in terms of clustering improvement and entropy.
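The abstract does not say how entropy is computed. A common choice when evaluating a clustering against known labels is the size-weighted entropy of the labels inside each cluster, where lower values mean purer clusters. The sketch below assumes that definition; the function name `clustering_entropy` and the sample "bank" data are hypothetical illustrations, not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def clustering_entropy(cluster_ids, sense_labels):
    """Size-weighted average entropy (in bits) of sense labels per cluster.

    Assumed definition (not confirmed by the paper):
      E = sum_j (n_j / n) * ( - sum_i p_ij * log2(p_ij) )
    where p_ij is the share of cluster j's documents that carry sense i.
    """
    if len(cluster_ids) != len(sense_labels):
        raise ValueError("cluster_ids and sense_labels must have equal length")

    # Group the known sense labels by the cluster each document was assigned to.
    members = defaultdict(list)
    for cluster, sense in zip(cluster_ids, sense_labels):
        members[cluster].append(sense)

    total = len(cluster_ids)
    entropy = 0.0
    for labels in members.values():
        size = len(labels)
        # Entropy of the sense distribution inside this cluster.
        h = -sum((n / size) * math.log2(n / size)
                 for n in Counter(labels).values())
        # Weight by the fraction of all documents that fall in this cluster.
        entropy += (size / total) * h
    return entropy


# Hypothetical example: four documents containing the homonym "bank".
senses = ["finance", "finance", "river", "river"]
separated = clustering_entropy([0, 0, 1, 1], senses)  # each cluster holds one sense
mixed = clustering_entropy([0, 1, 0, 1], senses)      # each cluster mixes both senses
```

Under this definition, a clustering that separates the two senses scores lower than one that mixes them, which is the direction of improvement an entropy comparison would look for.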
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.