Text Summarization using Latent Semantic Analysis
Keywords:
Information retrieval, Latent Semantic Analysis (LSA), Singular Value Decomposition (SVD), text summarization component
Abstract
Text summarization uses natural language processing to create a compressed version of a given document so that users can quickly understand its main points. It aims to present the most informative content of an input document in condensed form while retaining most of the semantic information of the text. It is considered an effective way of handling information overload because it presents information in a more compact form. The amount of information available on any topic has grown enormously in recent years, so readers need a quick view of an article to decide whether it is relevant to their needs. There are various approaches to producing well-formed summaries, and one of the newer methods is Latent Semantic Analysis (LSA). In this paper, an efficient way of summarizing a text document with a combination of these techniques was implemented and evaluated with ROUGE scores. Singular Value Decomposition (SVD) plays a major role in extracting important sentences from the input document. Every sentence is assigned a rank based on its importance in the original document; sentences are then selected according to rank, and the summary is generated. ROUGE produces three scores: recall, precision, and F-score. The F-score is used to evaluate the correctness of a summary. Three summaries, produced by reducing the input document to 1/2, 1/3, and 1/4 of its length, are compared on their ROUGE scores, and the F-scores indicate that the approach is effective for summarizing text documents.
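The pipeline outlined in the abstract (SVD-based sentence ranking, rank-based selection, and recall/precision/F-score evaluation) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state the term weighting, the number of latent dimensions, or the sentence selection rule, so the raw term counts, the parameter `k`, and the scoring of each sentence by the singular-value-weighted length of its latent vector are all assumptions made here for illustration. The function names are hypothetical, and `rouge1` is a simplified unigram-overlap measure in the spirit of ROUGE-1, not the ROUGE package itself.

```python
import re
from collections import Counter

import numpy as np


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lsa_rank(sentences, k=2):
    """Score each sentence by its weighted length in a k-dimensional latent space."""
    vocab = sorted({w for s in sentences for w in tokenize(s)})
    index = {w: i for i, w in enumerate(vocab)}
    # Term-by-sentence matrix of raw term counts (an assumed weighting scheme).
    A = np.zeros((len(vocab), len(sentences)))
    for j, s in enumerate(sentences):
        for w in tokenize(s):
            A[index[w], j] += 1.0
    # A = U @ diag(sigma) @ Vt; column j of Vt describes sentence j.
    _, sigma, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, len(sigma))
    # Assumed scoring rule: norm of the sentence's latent vector,
    # with each dimension scaled by its singular value.
    return np.sqrt(((sigma[:k, None] * Vt[:k, :]) ** 2).sum(axis=0))


def summarize(sentences, ratio=0.5, k=2):
    """Keep the top-ranked fraction of sentences, in their original order."""
    n_keep = max(1, int(len(sentences) * ratio))
    scores = lsa_rank(sentences, k)
    top = sorted(np.argsort(-scores)[:n_keep])
    return [sentences[i] for i in top]


def rouge1(candidate, reference):
    """Unigram-overlap recall, precision, and F-score (ROUGE-1 style)."""
    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    overlap = sum((cand & ref).values())  # clipped count of shared unigrams
    recall = overlap / max(1, sum(ref.values()))
    precision = overlap / max(1, sum(cand.values()))
    f_score = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return recall, precision, f_score
```

Calling `summarize` with `ratio` set to 1/2, 1/3, and 1/4 mirrors the three compression levels compared in the abstract, and passing each result (joined into one string) to `rouge1` together with a reference summary yields the three scores for that level.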
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.