Sign In Start Free Trial
Account

Add to playlist

Create a Playlist

Modal Close icon
You need to login to use this feature.
  • Book Overview & Buying Data Augmentation with Python
  • Table Of Contents Toc
Data Augmentation with Python

Data Augmentation with Python

By : Duc Haba
5 (10)
close
close
Data Augmentation with Python

Data Augmentation with Python

5 (10)
By: Duc Haba

Overview of this book

Data is paramount in AI projects, especially for deep learning and generative AI, as forecasting accuracy relies on input datasets being robust. Acquiring additional data through traditional methods can be challenging, expensive, and impractical, and data augmentation offers an economical option to extend the dataset. The book teaches you over 20 geometric, photometric, and random erasing augmentation methods using seven real-world datasets for image classification and segmentation. You’ll also review eight image augmentation open source libraries, write object-oriented programming (OOP) wrapper functions in Python Notebooks, view color image augmentation effects, analyze safe levels and biases, as well as explore fun facts and take on fun challenges. As you advance, you’ll discover over 20 character and word techniques for text augmentation using two real-world datasets and excerpts from four classic books. The chapter on advanced text augmentation uses machine learning to extend the text dataset, such as Transformer, Word2vec, BERT, GPT-2, and others. While chapters on audio and tabular data have real-world data, open source libraries, amazing custom plots, and Python Notebook, along with fun facts and challenges. By the end of this book, you will be proficient in image, text, audio, and tabular data augmentation techniques.
Table of Contents (17 chapters)
close
close
1
Part 1: Data Augmentation
4
Part 2: Image Augmentation
7
Part 3: Text Augmentation
10
Part 4: Audio Data Augmentation
13
Part 5: Tabular Data Augmentation

Data augmentation role

Data is paramount in any AI project. This is especially true when using the artificial neural network (ANN) algorithm, also known as DL. The success or failure of a DL project is primarily due to the input data quality.

One primary reason for the significance of data augmentation is that it is relatively too easy to develop an AI for prediction and forecasting, and those models require robust data input. With the remarkable advancement in developing, training, and deploying a DL project, such as using the FastAI framework, you can create a world-class DL model in a handful of Python code lines. Thus, expanding the dataset is an effective option to improve the DL model’s accuracy over your competitor.

The traditional method of acquiring additional data is difficult, expensive, and impractical. Sometimes, the only available option is to use data augmentation techniques to extend the dataset.

Fun fact

Data augmentation methods can increase the data’s size tenfold. For example, it is relatively challenging to acquire additional skin cancer images. Thus, using a random combination of image transformations, such as vertical flip, horizontal flip, rotating, and skewing, is a practical technique that can expand the skin cancer photo data.

Without data augmentation, sourcing new skin cancer photos and labeling them is expensive and time-consuming. The International Skin Imaging Collaboration (ISIC) is the authoritative data source for skin diseases, where a team of dermatologists verified and classified the images. ISIC made the datasets available to the public to download for free. If you can’t find a particular dataset from ISIC, it is difficult to find other means, as accessing hospital or university labs to acquire skin disease images is laced with legal and logistic blockers. After obtaining the photos, hiring a team of dermatologists to classify the pictures to correct diseases would be costly.

Another example of the impracticality of attaining additional images instead of augmentation is when you download photos from social media or online search engines. Social media is a rich source of image, text, audio, and video data. Search engines, such as Google or Bing, make it relatively easy to download additional data for a project, but copyrights and legal usage are a quagmire. Most images, texts, audio, and videos on social media, such as YouTube, Facebook, TikTok, and Twitter, are not clearly labeled as copyrights or public domain material.

Furthermore, social media promotes popular content, not unfavorable or obscure material. For example, let’s say you want to add more images of parrots to your parrot classification AI system. Online searches will return a lot of blue-and-yellow macaws, red-and-green macaws, or sulfur-crested cockatoos, but not as many Galah, Kea, or the mythical Norwegian-blue parrot – a fake parrot from the Monty Python comedy skit.

Insufficient data for AI training is exacerbated for text, audio, and tabular data types. Generally, obtaining additional text, audio, and tabular data is expensive and time-consuming. There are strong copyright laws protecting text data. Audio files are less common online, and tabular data is primarily from private company databases.

The following section will define the four commonly used data types.

CONTINUE READING
83
Tech Concepts
36
Programming languages
73
Tech Tools
Icon Unlimited access to the largest independent learning library in tech of over 8,000 expert-authored tech books and videos.
Icon Innovative learning tools, including AI book assistants, code context explainers, and text-to-speech.
Icon 50+ new titles added per month and exclusive early access to books as they are being written.
Data Augmentation with Python
notes
bookmark Notes and Bookmarks search Search in title playlist Add to playlist font-size Font size

Change the font size

margin-width Margin width

Change margin width

day-mode Day/Sepia/Night Modes

Change background colour

Close icon Search
Country selected

Close icon Your notes and bookmarks

Confirmation

Modal Close icon
claim successful

Buy this book with your credits?

Modal Close icon
Are you sure you want to buy this book with one of your credits?
Close
YES, BUY

Submit Your Feedback

Modal Close icon
Modal Close icon
Modal Close icon