Book Image

Artificial Intelligence with Python - Second Edition

By : Alberto Artasanchez, Prateek Joshi
Book Image

Artificial Intelligence with Python - Second Edition

By: Alberto Artasanchez, Prateek Joshi

Overview of this book

Artificial Intelligence with Python, Second Edition is an updated and expanded version of the bestselling guide to artificial intelligence using the latest version of Python 3.x. Not only does it provide you an introduction to artificial intelligence, this new edition goes further by giving you the tools you need to explore the amazing world of intelligent apps and create your own applications. This edition also includes seven new chapters on more advanced concepts of Artificial Intelligence, including fundamental use cases of AI; machine learning data pipelines; feature selection and feature engineering; AI on the cloud; the basics of chatbots; RNNs and DL models; and AI and Big Data. Finally, this new edition explores various real-world scenarios and teaches you how to apply relevant AI algorithms to a wide swath of problems, starting with the most basic AI concepts and progressively building from there to solve more difficult challenges so that by the end, you will have gained a solid understanding of, and when best to use, these many artificial intelligence techniques.
Table of Contents (26 chapters)
24
Other Books You May Enjoy
25
Index

Data cleansing and transformation

Just as gas powers a car, data is the lifeblood of AI. The age-old adage of "garbage in, garbage out" remains painfully true. For this reason, having clean and accurate data is paramount to producing consistent, reproducible, and accurate AI models. Some of this data cleansing has required painstaking human involvement. By some measures, it is said that a data scientist spends about 80% of their time cleaning, preparing, and transforming their input data and 20% of the time running and optimizing their models. Examples of this are the ImageNet and MS-COCO image datasets. Both contain over a million labeled images of various objects and categories. These datasets are used to train models that can distinguish between different categories and object types. Initially, these datasets were painstakingly and patiently labeled by humans. As these systems become more prevalent, we can use AI to perform the labeling. Furthermore, there is a plethora of AI-enabled tools that help with the cleansing and deduplication process.

One good example is Amazon Lake Formation. In August 2019, Amazon made its service Lake Formation generally available. Amazon Lake Formation automates some of the steps typically involved in the creation of a data lake including the collection, cleansing, deduplication, cataloging, and publication of data. The data then can be made available for analytics and to build machine models. To use Lake Formation, a user can bring data into the lake from a range of sources using predefined templates. They can then define policies that govern data access depending on the level of access that groups across the organization require.

Some automatic preparation, cleansing, and classification that the data undergoes uses machine learning to automatically perform these tasks.

Lake Formation also provides a centralized dashboard where administrators can manage and monitor data access policies, governance, and auditing across multiple analytics engines. Users can also search for datasets in the resulting catalog. As the tool evolves in the next few months and years, it will facilitate the analysis of data using their favorite analytics and machine learning services, including:

  • Databricks
  • Tableau
  • Amazon Redshift
  • Amazon Athena
  • AWS Glue
  • Amazon EMR
  • Amazon QuickSight
  • Amazon SageMaker