Natural Language Processing and Computational Linguistics

By: Bhargav Srinivasa-Desikan

Overview of this book

Modern text analysis is now highly accessible with Python and open source tools; discover how you can perform it in this era of textual data. This book shows you how to use natural language processing and computational linguistics algorithms to make inferences and gain insights from your data. These algorithms are based on statistical machine learning and artificial intelligence techniques. The tools to work with these algorithms are available to you right now: Python, along with libraries such as Gensim and spaCy. You'll start by learning about data cleaning, and then learn how to perform computational linguistics from first principles. You'll then be ready to explore the more sophisticated areas of statistical NLP and deep learning using Python, with realistic language and text samples. You'll learn to tag, parse, and model text using the best tools. You'll gain hands-on knowledge of the best frameworks to use, and you'll know when to choose a tool like Gensim for topic models and when to work with Keras for deep learning. This book balances theory with practical hands-on examples, so you can learn about natural language processing and computational linguistics and conduct your own projects. You'll discover the rich ecosystem of Python tools available for NLP and enter the interesting world of modern text analysis.
Table of Contents (22 chapters)
Title Page
Copyright and Credits
Packt Upsell
Contributors
Preface
Index
