Web Text Content Extraction and Classification using Naïve Bayes Classifier Algorithm

Sanjay S Bhadoria; Rajendra Kumar Patel

Authors

Sanjay S Bhadoria, Department of Computer Science, PCST, Bhopal
Rajendra Kumar Patel, Department of Computer Science, PCST, Bhopal

Keywords:

Classification, Text Extraction, Link Crawler, Data Mining

Abstract

The Web today contains a large amount of information about subjects such as people, companies, organizations, and products that may be of wide interest. Text mining is a technique that helps users find useful information in large collections of digital text documents on the Web or in databases. This paper discusses how the naive Bayes classifier algorithm can be applied to a website or web page specified by the user, extracting its text content from the Internet through web mining and classifying it.
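
As an illustration of the kind of classifier the abstract refers to, the following is a minimal sketch of a multinomial naive Bayes text classifier with add-one (Laplace) smoothing. It is not taken from the paper: the class name, the tokenizer, the category labels, and the training snippets are all hypothetical, and the sketch assumes the page text has already been extracted from its HTML.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing (illustrative only)."""

    def __init__(self):
        self.doc_counts = Counter()               # documents seen per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        self.vocabulary = set()                   # all words seen in training

    def train(self, text, label):
        self.doc_counts[label] += 1
        for word in tokenize(text):
            self.word_counts[label][word] += 1
            self.vocabulary.add(word)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        words = tokenize(text)
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.doc_counts.items():
            # Score = log P(class) + sum over words of log P(word | class).
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training snippets standing in for extracted web page text.
clf = NaiveBayesTextClassifier()
clf.train("company shares profit market quarterly revenue", "business")
clf.train("stock market investors company earnings", "business")
clf.train("team match goal player league season", "sports")
clf.train("player scored goal in the final match", "sports")

predicted = clf.classify("the company reported higher revenue and profit")
print(predicted)  # business
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and the add-one smoothing keeps a word that never appeared in a class during training from forcing that class's probability to zero.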

