This book presents recent advances (from 2008 to 2012) in the use of the Naïve Bayes model in unsupervised word sense disambiguation (WSD).
While WSD in general has a number of important applications in various fields of artificial intelligence (information retrieval, text processing, machine translation, message understanding, man-machine communication, etc.), unsupervised WSD is considered important because it is language-independent and does not require previously annotated corpora. The Naïve Bayes model has been widely used in supervised WSD, but its use in unsupervised WSD has been less frequent and has led to more modest disambiguation results. The potential of this statistical model for unsupervised WSD appears to remain insufficiently explored.
The present book contends that the Naïve Bayes model needs to be fed knowledge in order to perform well as a clustering technique for unsupervised WSD, and it examines three entirely different sources of such knowledge for feature selection: WordNet, dependency relations and web N-grams. WSD with an underlying Naïve Bayes model is ultimately positioned on the border between unsupervised and knowledge-based techniques. The book clearly highlights the benefits of feeding knowledge (of various natures) to a knowledge-lean algorithm for unsupervised WSD that uses the Naïve Bayes model as a clustering technique. The discussion shows that the Naïve Bayes model still holds promise for the open problem of unsupervised WSD.

About the author
Florentina T. Hristea graduated from the Faculty of Mathematics and Computer Science of the University of Bucharest in 1984. She received her Ph.D. in Mathematics from the same university in 1996. She is currently Associate Professor at the Faculty of Mathematics and Computer Science, University of Bucharest. Her current research field is artificial intelligence, with specialization in natural language processing (NLP), as well as computational statistics and data analysis with applications in NLP. She has been Principal Investigator in several national and international research and development projects in the field of statistical NLP. Dr. Hristea is author or co-author of 8 books and of various scientific papers in the fields of computational statistics and natural language processing, of which 28 are papers in refereed journals. Dr. Hristea is an elected member of ISI (International Statistical Institute) and of IRF (Information Retrieval Facility; member of the expert pool). She is a member of GWA (Global WordNet Association). Dr. Hristea is Co-Editor of Central European Journal of Computer Science (published by Versita and Springer Verlag). She is also a member of the Editorial Review Board of Artificial Intelligence Research (Sciedu Press, Canada). Dr. Hristea was a Fulbright Research Fellow at Princeton University, U.S.A., in 2004.

Contents
1. Preliminaries
2. The Naïve Bayes Model in the Context of Word Sense Disambiguation
3. Semantic WordNet-based Feature Selection
4. Syntactic Dependency-based Feature Selection
5. N-Gram Features for Unsupervised WSD with an Underlying Naïve Bayes Model
References
Index