Precision Improvement in Information Storage and Retrieval System by Document Length Normalization
Keywords:
Document length, Normalization, Rank, Storage system, Precision, Information Storage, Retrieval System
Abstract
A huge amount of information is available on the internet in electronic document format, but retrieving the correct documents for a user's information need remains a critical task. The relevance of a document can vary with its length, and an automatic information storage and retrieval system has to deal with documents of varying length in a text collection. In this paper we present a document term weighting scheme based on document length. Our method raises the rank of relevant documents in the retrieved, ordered document set. In our results, the method increases precision from 0.16 to 0.83.
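The abstract does not spell out the weighting formula, so the following is only a minimal sketch of the general idea of document length normalization, not the scheme proposed in the paper. It assumes a plain TF-IDF score in which each term frequency is divided by the document length in tokens; the function names, the whitespace tokenizer, the IDF variant, and the toy documents are all illustrative assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an illustrative simplification)."""
    return text.lower().split()


def rank(query, docs, normalize=True):
    """Return document indices ordered by descending TF-IDF score.

    With normalize=True each term frequency is divided by the document
    length in tokens. This is a generic normalization assumed for
    illustration, not the weighting scheme from the paper.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        # Fall back to 1 when normalization is off or the document is empty.
        length = len(toks) if normalize and toks else 1
        score = 0.0
        for term in tokenize(query):
            if term in tf:
                idf = math.log(1 + n_docs / df[term])
                score += (tf[term] / length) * idf
        scores.append(score)
    return sorted(range(n_docs), key=lambda i: scores[i], reverse=True)


# Toy collection: document 1 is long and repeats one query term,
# document 0 is short and matches both query terms.
docs = [
    "length normalization improves retrieval precision",
    "retrieval " * 3 + "filler " * 40,
    "web caching architecture",
]
unnormalized = rank("retrieval precision", docs, normalize=False)
normalized = rank("retrieval precision", docs, normalize=True)
print(unnormalized)
print(normalized)
```

In this toy collection the long document ranks first under raw term frequency only because it repeats a single query term, while the normalized score places the short document that matches both query terms at the top. Practical systems usually prefer softer normalizations than a plain division by length, since that can over-penalize long documents.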
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.