An Efficient Context-dependent Lexical Information Detection using Word Embeddings and Deep Machine Learning Classifiers for Unstructured Textual Contents

Amit Shukla; Rajendra Gupta

Authors

Amit Shukla, Dept. of Computer Science, Rabindranath Tagore University, Bhopal, India
Rajendra Gupta, Dept. of Computer Science, Rabindranath Tagore University, Bhopal, India

Keywords:

Context-dependent Lexical Information, Word Embeddings, Deep ML Classifier, Unstructured Textual Contents

Abstract

A context-dependent word representation (word embedding) enables machine learning algorithms to identify words that have similar meanings. Word embedding is a feature learning technique that maps words to real-valued vectors by applying probabilistic models, dimensionality reduction, or neural networks to a word co-occurrence matrix. In this study, we address the problem of recognizing context-dependent lexical information in unstructured data containers. We investigate a method that uses word embeddings to detect context and relevant features automatically, together with a deep neural network for classification. Using publicly accessible tweet and image datasets, we also present alternative models based on Conventional Machine Learning (CML) classifiers and a rule-based approach. The proposed method outperforms these alternatives from earlier research. Context-dependent Lexical Information Detection (CLID) is analysed in terms of four aspects on the basis of Context-Centred Extraction of Concepts (CCEC). The proposed word embedding method, CCEC, benefits from the ability of neural network methods to encode textual information by converting meaningful text into numeric values.
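The abstract describes word embeddings as mapping words to real-valued vectors by applying probabilistic models, dimensionality reduction, or neural networks to a word co-occurrence matrix. As a minimal sketch of the count-based variant only (this is not the authors' CCEC method, whose details are not given here), the Python below builds a windowed co-occurrence matrix, re-weights it with positive pointwise mutual information (PPMI), and compares words by cosine similarity. The function names, the window size, and the toy corpus are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_counts(sentences, window=2):
    """Count how often each (word, context word) pair occurs within `window` tokens."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts


def ppmi_vectors(counts):
    """Map each word to a vector of positive PMI scores over a fixed, sorted vocabulary."""
    vocab = sorted(set(counts) | {c for ctx in counts.values() for c in ctx})
    total = sum(sum(ctx.values()) for ctx in counts.values())
    word_tot = {w: sum(counts[w].values()) for w in counts}
    ctx_tot = Counter()
    for ctx in counts.values():
        ctx_tot.update(ctx)
    vectors = {}
    for w in counts:
        row = []
        for c in vocab:
            n = counts[w][c]  # Counter returns 0 for unseen pairs
            if n == 0:
                row.append(0.0)
                continue
            # PMI = log( P(w, c) / (P(w) * P(c)) ), clipped at zero
            pmi = math.log((n * total) / (word_tot[w] * ctx_tot[c]))
            row.append(max(pmi, 0.0))
        vectors[w] = row
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def document_vector(text, vectors, dim):
    """Hypothetical helper: average the vectors of known words to turn a text into numbers."""
    known = [vectors[t] for t in text.lower().split() if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


# Toy corpus (illustrative only): two "sensitive" and two "neutral" sentences.
corpus = [
    "my password is secret",
    "my pin is secret",
    "the weather is sunny",
    "the sky is sunny",
]
counts = cooccurrence_counts(corpus, window=2)
vocab, vecs = ppmi_vectors(counts)
print(round(cosine(vecs["password"], vecs["pin"]), 3))
print(round(cosine(vecs["password"], vecs["weather"]), 3))
```

Words that appear in the same contexts (here "password" and "pin") end up with similar vectors, while words from unrelated contexts score close to zero. Averaging word vectors, as the hypothetical `document_vector` helper does, is one simple way to turn a text into a fixed-length numeric feature vector that a downstream classifier could consume; a neural embedding model would replace the PPMI step rather than change this overall flow.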
