Semantics Based Document Clustering
Keywords: demo
Abstract
Document clustering organizes large collections of documents into meaningful groups, with each group described by the relevant words that serve as its cluster label. The traditional approach represents documents as a bag of words, which often ignores the semantic relations between words. Ontology-based document clustering is proposed to address this. In e-learning, appropriate ontologies are one way to support the reuse and remixing of learning objects: the more appropriately an ontology is used, the better the annotation of the learning material. Coupling document clustering with an ontology helps produce clusters that take the semantic relations between words into account. The proposed system uses an ontology-based document clustering approach built on a two-step clustering algorithm that combines partitioning and hierarchical clustering. The ontology is introduced through a weighting scheme that integrates the traditional co-occurrence of words with the weights of the relations between words in the ontology. K-means is used as the partitioning algorithm and hierarchical agglomerative clustering as the hierarchical algorithm. A clustering approach that uses document semantics for term weighting thus produces better results than an approach without semantics.
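The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no formulas, so the weighting rule (term frequency plus relation-weighted frequency of related terms), the toy `ONTOLOGY` table with its 0.8 weights, the average linkage, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter

# Hypothetical ontology: symmetric relation weights between terms (assumed values).
ONTOLOGY = {
    ("car", "vehicle"): 0.8,
    ("truck", "vehicle"): 0.8,
    ("apple", "fruit"): 0.8,
    ("banana", "fruit"): 0.8,
}

def relation(a, b):
    """Weight of the ontology relation between two terms, 0.0 if unrelated."""
    return ONTOLOGY.get((a, b)) or ONTOLOGY.get((b, a)) or 0.0

def semantic_vector(text, vocab):
    """Assumed weighting: term frequency plus relation-weighted frequency of related terms."""
    tf = Counter(text.lower().split())
    vec = [tf[t] + sum(relation(t, u) * n for u, n in tf.items()) for t in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def kmeans(vectors, k, iterations=20):
    """Step 1 (partitioning): plain K-means, seeded with the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: distance(v, centroids[c])) for v in vectors]
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = mean(members)
    return labels, centroids

def agglomerate(centroids, target):
    """Step 2 (hierarchical): repeatedly merge the two closest groups of K-means clusters."""
    groups = [[i] for i in range(len(centroids))]

    def linkage(g, h):  # average linkage over the centroids in each group
        total = sum(distance(centroids[i], centroids[j]) for i in g for j in h)
        return total / (len(g) * len(h))

    while len(groups) > target:
        pairs = [(i, j) for i in range(len(groups)) for j in range(i + 1, len(groups))]
        i, j = min(pairs, key=lambda p: linkage(groups[p[0]], groups[p[1]]))
        groups[i] += groups.pop(j)
    return groups

def two_step_cluster(docs, k, target):
    """Return one final cluster id per document."""
    terms = {t for d in docs for t in d.lower().split()}
    terms |= {t for pair in ONTOLOGY for t in pair}  # ontology terms get their own dimensions
    vocab = sorted(terms)
    vectors = [semantic_vector(d, vocab) for d in docs]
    labels, centroids = kmeans(vectors, k)
    groups = agglomerate(centroids, target)
    final = {c: g for g, members in enumerate(groups) for c in members}
    return [final[label] for label in labels]

docs = ["car engine", "truck wheel", "apple juice", "banana peel"]
print(two_step_cluster(docs, k=3, target=2))
```

In this toy input no two documents share a word, so a pure bag-of-words representation would treat all four as equally distant. Under the assumed weighting, "car" and "truck" both contribute to the "vehicle" dimension, and "apple" and "banana" to "fruit", which is what lets the two steps group the documents by topic.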
License
Copyright (c) 2017 Apurva Dube, Pradnya Gotmare

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.