Web Text Content Extraction and Classification using Naïve Bayes Classifier Algorithm

Authors

  • Sanjay S Bhadoria, Department of Computer Science, PCST, Bhopal
  • Rajendra Kumar Patel, Department of Computer Science, PCST, Bhopal

Keywords:

Classification, Text Extraction, Link Crawler, Data Mining

Abstract

The Web today contains a large amount of information about subjects such as people, companies, organizations, and products that may be of wide interest. Text mining is a technique that helps users find useful information in large collections of digital text documents on the Web or in databases. This paper discusses how the naïve Bayes classifier algorithm can be used to follow a designated website or web page according to a user's request and to extract and classify its text content through web mining.
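As an illustration of the kind of classifier the abstract refers to, the following is a minimal sketch of a multinomial naïve Bayes text classifier with add-one (Laplace) smoothing. It is not the authors' implementation: the class name `NaiveBayesTextClassifier`, the `tokenize` helper, the category labels, and the training sentences are all hypothetical, and the sketch assumes the page text has already been extracted from its HTML.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.class_doc_counts = Counter()        # documents seen per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocabulary = set()                  # all words seen in training

    def fit(self, documents, labels):
        for text, label in zip(documents, labels):
            self.class_doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)
        return self

    def predict(self, text):
        total_docs = sum(self.class_doc_counts.values())
        vocab_size = len(self.vocabulary)
        words = [w for w in tokenize(text) if w in self.vocabulary]
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_doc_counts.items():
            # Log prior: fraction of training documents in this class.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                # Log likelihood with add-one smoothing for unseen pairs.
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data standing in for extracted web page text.
training_pages = [
    "stock market shares profit company",
    "company revenue quarterly profit growth",
    "football match goal team league",
    "team wins championship match score",
]
training_labels = ["business", "business", "sports", "sports"]

classifier = NaiveBayesTextClassifier().fit(training_pages, training_labels)
print(classifier.predict("the team scored a late goal in the match"))
```

Words that never appeared in training are skipped at prediction time, and scores are summed in log space to avoid numerical underflow on longer pages. A real pipeline would likely add stop-word removal and stemming before counting.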

 

Published

2014-10-31

How to Cite

[1] S. S. Bhadoria and R. K. Patel, “Web Text Content Extraction and Classification using Naïve Bayes Classifier Algorithm”, Int. J. Sci. Res. Comp. Sci. Eng., vol. 2, no. 5, pp. 1–4, Oct. 2014.

Section

Research Article