Web Text Content Extraction and Classification using Naïve Bayes Classifier Algorithm
Keywords:
Classification, Text Extraction, Link Crawler, Data Mining
Abstract
The Web today contains a large amount of information about subjects such as people, companies, organizations, and products that may be of wide interest. Text mining is a technique that helps users find useful information in large collections of digital text documents on the Web or in databases. This paper discusses how the naive Bayes classifier algorithm can be used to follow a designated website or web page according to a user's request, extracting and classifying its text content through web mining.
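The abstract names the naive Bayes classifier without spelling out how it assigns a category to a page. The following is a minimal sketch, assuming a multinomial model with add-one (Laplace) smoothing; the paper's exact variant is not stated here. The class name, the tokenizer, and the toy training pages are hypothetical and stand in for text already extracted from crawled pages.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, documents, labels):
        # Count documents per class (for priors) and words per class (for likelihoods).
        self.class_doc_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocabulary = set()
        for text, label in zip(documents, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_doc_counts.items():
            # Work in log space: log P(class) + sum of log P(word | class).
            score = math.log(doc_count / self.total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data standing in for text extracted from web pages.
training_pages = [
    "The company reported strong quarterly profit and revenue",
    "Shares of the organization rose after the product launch",
    "The team won the match with a late goal",
    "The player scored twice in the final game",
]
training_labels = ["business", "business", "sports", "sports"]

classifier = NaiveBayesTextClassifier().fit(training_pages, training_labels)
prediction = classifier.predict("The company launched a new product")
```

With this toy data, `prediction` is `"business"`, because the known words "company" and "product" occur only in the business pages. A real pipeline would first strip markup from each fetched page and would likely apply stop-word removal and stemming before counting words.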
License

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors contributing to this journal agree to publish their articles under the Creative Commons Attribution 4.0 International License, allowing third parties to share their work (copy, distribute, transmit) and to adapt it, under the condition that the authors are given credit and that in the event of reuse or distribution, the terms of this license are made clear.