Precision Improvement in Information Storage and Retrieval System by Document Length Normalization

D. Sharma; H. Nagar Nagar

Authors

D. Sharma Department of Computer Science and Engineering, Mewar University, Chittorgarh, India
H. Nagar Nagar Department of Computer Science and Engineering, Mewar University, Chittorgarh, India

Keywords:

Document length, Normalization, Rank, Storage system, Precision, Information Storage, Retrieval System

Abstract

A huge amount of information is available on the internet in electronic document format, but retrieving the correct document for a user's information need is a critical task. The relevance of a document can vary with its length. Automatic information storage and retrieval systems have to deal with documents of varying length in a text collection. In this paper we present a document term weighting scheme based on document length. Our method raises the rank of relevant documents in the retrieved ordered document set. The results show that our method increases precision from 0.16 to 0.83.
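The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea of length-normalized term weighting, not the authors' scheme. It assumes tf-idf weights with a smoothed idf and cosine (Euclidean length) normalization; the function names `length_normalized_weights`, `rank`, and `precision_at_k` are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def length_normalized_weights(docs):
    """tf-idf weights per document, divided by the vector's Euclidean length."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed idf (an assumption) keeps every weight strictly positive.
        raw = {t: f * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
               for t, f in tf.items()}
        # Dividing by the vector length removes the advantage of long documents.
        norm = math.sqrt(sum(w * w for w in raw.values()))
        vectors.append({t: w / norm for t, w in raw.items()} if norm else {})
    return vectors


def rank(query, vectors):
    """Document indices sorted by descending sum of the query terms' weights."""
    terms = query.lower().split()
    scores = [sum(vec.get(t, 0.0) for t in terms) for vec in vectors]
    return sorted(range(len(vectors)), key=lambda i: scores[i], reverse=True)


def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for i in ranking[:k] if i in relevant) / k
```

Under this normalization, a document that simply repeats its own text receives the same weights as the shorter original, so raw length alone no longer pushes it up the ranking. `precision_at_k` can then be used to compare rankings produced with and without normalization on a collection with known relevance judgments.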
