Semantics Based Document Clustering

Authors

  • Apurva Dube, Dept. of Computer Engineering, K. J. Somaiya College of Engineering, Mumbai, India
  • Pradnya Gotmare, Dept. of Computer Engineering, K. J. Somaiya College of Engineering, Mumbai, India

Abstract

Document clustering is a technique for organizing large collections of documents into meaningful groups, where each group is described by relevant words that serve as its cluster label. The traditional approach to document clustering uses a bag-of-words representation, which ignores the semantic relations between words; ontology-based document clustering is therefore proposed. In e-learning, one way to support the reuse and remixing of learning objects is through appropriate ontologies: the more appropriately an ontology is used, the better the annotation of learning material. Coupling document clustering with an ontology helps produce better clusters, because the semantic relations between words are no longer ignored. The proposed system uses an ontology-based document clustering approach built on a two-step clustering algorithm, so it combines a partitioning algorithm (K-means) with a hierarchical one (hierarchical agglomerative clustering). The ontology is introduced through a weighting scheme that combines the traditional word co-occurrence weights with the weights of the relations between words in the ontology. Thus, the clustering approach that uses document semantics for term weighting produces better results than the approach without semantics.
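The pipeline described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The toy ONTOLOGY and its relation weights are hypothetical. The smoothed TF-IDF base weights, the additive rule that boosts each term by the weights of its ontology-related terms, the farthest-first seeding, and the ordering (K-means first, then average-linkage agglomerative merging of the K-means groups) are also assumptions, since the abstract does not specify them. All function names are illustrative.

```python
import math
from collections import Counter

# Hypothetical toy ontology: relation weights in [0, 1] between term pairs.
# The values are illustrative assumptions, not taken from the paper.
ONTOLOGY = {
    ("car", "vehicle"): 0.8,
    ("car", "engine"): 0.5,
    ("apple", "fruit"): 0.8,
    ("banana", "fruit"): 0.8,
}

def relation(a, b):
    """Ontology relation weight for an unordered term pair (0.0 if unrelated)."""
    return ONTOLOGY.get((a, b), ONTOLOGY.get((b, a), 0.0))

def tfidf_vectors(docs):
    """Traditional co-occurrence weighting: smoothed TF-IDF over tokenized documents."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    vecs = []
    for d in docs:
        tf = Counter(d)
        vecs.append([tf[t] * (math.log((1 + n) / (1 + df[t])) + 1) for t in vocab])
    return vecs, vocab

def semantic_vectors(vecs, vocab):
    """Assumed rule: w'(t) = w(t) + sum over t' != t of rel(t, t') * w(t')."""
    size = len(vocab)
    return [[v[i] + sum(relation(vocab[i], vocab[j]) * v[j]
                        for j in range(size) if j != i)
             for i in range(size)]
            for v in vecs]

def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def kmeans(vecs, k, iters=20):
    """Step 1 (partitioning): K-means with cosine assignment; requires k <= len(vecs)."""
    seeds = [0]
    while len(seeds) < k:
        # Farthest-first seeding: add the vector whose best match among the seeds is weakest.
        seeds.append(min((i for i in range(len(vecs)) if i not in seeds),
                         key=lambda i: max(cosine(vecs[i], vecs[s]) for s in seeds)))
    centroids = [list(vecs[s]) for s in seeds]
    labels = []
    for _ in range(iters):
        new = [max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vecs]
        if new == labels:
            break  # assignments unchanged: converged
        labels = new
        for c in range(k):
            members = [v for v, lab in zip(vecs, labels) if lab == c]
            if members:  # recompute the centroid as the mean of its members
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def agglomerate(vecs, labels, k_final):
    """Step 2 (hierarchical): average-linkage merging of the K-means groups."""
    by_label = {}
    for idx, lab in enumerate(labels):
        by_label.setdefault(lab, []).append(idx)
    groups = list(by_label.values())

    def avg_sim(g1, g2):
        return sum(cosine(vecs[i], vecs[j]) for i in g1 for j in g2) / (len(g1) * len(g2))

    while len(groups) > k_final:
        # Merge the most similar pair; a < b, so deleting b leaves index a valid.
        a, b = max(((i, j) for i in range(len(groups)) for j in range(i + 1, len(groups))),
                   key=lambda p: avg_sim(groups[p[0]], groups[p[1]]))
        groups[a] += groups[b]
        del groups[b]
    return sorted(sorted(g) for g in groups)

def two_step_cluster(docs, k_initial, k_final):
    """Ontology-weighted vectors -> K-means (k_initial groups) -> merge to k_final."""
    vecs, vocab = tfidf_vectors(docs)
    sem = semantic_vectors(vecs, vocab)
    return agglomerate(sem, kmeans(sem, k_initial), k_final)

if __name__ == "__main__":
    docs = [["car", "engine"], ["vehicle", "engine"], ["car", "vehicle"],
            ["apple", "fruit"], ["banana", "fruit"], ["apple", "banana"]]
    print(two_step_cluster(docs, k_initial=3, k_final=2))  # [[0, 1, 2], [3, 4, 5]]
```

In this toy run the ontology gives document 1 ("vehicle", "engine") a nonzero weight for "car" even though the word does not occur in it. With k_initial=3, K-means splits the vehicle-related documents into two sub-clusters and the fruit-related documents into one, and the agglomerative step then merges the two vehicle sub-clusters.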

 

Published

2017-08-30

How to Cite

A. Dube and P. Gotmare, “Semantics Based Document Clustering”, Int. J. Sci. Res. Comp. Sci. Eng., vol. 5, no. 4, Aug. 2017.

Section

Research Article