Data input types
- Image definition
- Text definition
- Audio definition
- Tabular data definition
Figure 1.1 – Image, text, tabular, and audio augmentation
Figure 1.1 provides a sneak peek at image, text, tabular, and audio augmentation. Later in this book, you will learn how to implement these augmentation methods.
Let’s get started with images.
Image definition
Images are a large category because you can represent almost anything as an image, such as people, landscapes, animals, plants, and various objects around us. Pictures can also represent actions, such as sports, sign language, yoga poses, and many more. One particularly creative use of images is capturing a computer mouse’s movement over time to predict whether a user is a hacker.
The techniques for increasing the number of pictures include horizontal and vertical flips, enlarging, zooming in and out, skewing, warping, and lighting changes. Humans are experts at processing images. Thus, if a picture is slightly distorted or darkened, you can still tell that it is the same image. This is not the case for a computer. An AI system represents a color picture as a three-dimensional array of floating-point numbers: the width, the height, and the RGB channels as depth. Any image distortion will yield an array with different values.
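The point about arrays can be illustrated with a minimal sketch. It assumes NumPy is installed and uses a tiny synthetic array in place of a real photo; the array size, the seed, and the helper name `darken` are illustrative assumptions, not code from this book. A flip or a lighting change leaves the picture recognizable to a person, yet the numbers the model sees are different:

```python
import numpy as np

# A tiny synthetic "color picture" stored as floats in the range [0, 1].
# NumPy code commonly orders the axes as (height, width, RGB channels);
# here that is 2 x 3 x 3. A real photo would be loaded from disk instead.
rng = np.random.default_rng(seed=42)
image = rng.random((2, 3, 3)).astype(np.float32)

# Horizontal flip: reverse the width axis (axis 1).
flipped_h = image[:, ::-1, :]

# Vertical flip: reverse the height axis (axis 0).
flipped_v = image[::-1, :, :]


def darken(img, factor=0.5):
    """Hypothetical lighting helper: scale every pixel value by `factor`."""
    return np.clip(img * factor, 0.0, 1.0)


darker = darken(image)

# The shapes are unchanged, but the values at each position are not.
print(image.shape, flipped_h.shape, flipped_v.shape, darker.shape)
print("same values after horizontal flip?", np.array_equal(image, flipped_h))
```

Each transformed array can be fed to the model as an additional training example with the same label as the original.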
Graphs, such as time series data charts, and mathematical equation plots, such as 3D topology plots, are outside the scope of image augmentation.
You can reduce the overfitting problem in DL image classification training by creatively using data augmentation methods.
Text augmentation has different concerns than image augmentation. Let’s take a look.
Text definition
The text input data in this book is primarily English, but the same text augmentation techniques can be applied to other West Germanic languages. The Python lessons use English text as input data.
The techniques for supplementing the text input include back translation and easy data augmentation. A few methods might be counterintuitive at first glance, such as deleting or swapping words in a sentence. However, this is an acceptable practice because, in the real world, not everyone writes perfect English.
For example, movie reviewers on the American Multi-Cinema (AMC) website often write incomplete or grammatically incorrect sentences. They omit verbs or use inappropriate words. As a rule of thumb, you should not expect perfect English for text input data in many NLP projects.
If an NLP model is trained only on grammatically perfect English text, it could be biased against typical online reviewers. In other words, the NLP model will predict inaccurately when deployed to a real-world audience. For example, in sentiment analysis, the AI system predicts whether a movie review has a positive or negative sentiment. Suppose you trained the system using a perfect English dataset. In that case, the AI system might produce a false positive or false negative when people write a short line with misspelled words and grammatical errors.
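The word deletion and word swapping mentioned above can be sketched in a few lines using only the Python standard library. This is a simplified illustration, not this book's implementation: the function names `random_deletion` and `random_swap`, the probability `p`, and the sample review are all assumptions made for the example.

```python
import random


def random_deletion(words, p, rng):
    """Drop each word with probability p, but always keep at least one word."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(words)]


def random_swap(words, n_swaps, rng):
    """Swap the positions of two randomly chosen words, n_swaps times."""
    out = list(words)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


rng = random.Random(0)  # seeded so repeated runs give the same result
review = "the movie was surprisingly good and the ending was great".split()

print(" ".join(random_deletion(review, p=0.2, rng=rng)))
print(" ".join(random_swap(review, n_swaps=1, rng=rng)))
```

Both outputs keep the positive sentiment of the original review, so they can reasonably share its label while looking more like the imperfect sentences real reviewers write.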
Language translation, ideograms, and hieroglyphs are outside the scope of this book. Now, let’s look at audio augmentation.
Audio definition
Audio input data can be any sound wave recording, such as music, speech, and natural sounds. Sound wave attributes such as amplitude and frequency are often plotted as graphs, which are technically images, but you can’t use image augmentation methods for audio input data.
The techniques for expanding audio input are split into two types: waveform and spectrogram. For raw audio, the transformation methods range from time-shifting and pitch scaling to random gain, while for spectrograms, the functions are time masking, time stretching, pitch scaling, and many others.
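Two of the waveform methods named above, time-shifting and random gain, can be sketched directly on an array of samples. The sketch assumes NumPy is installed and uses a synthetic sine tone instead of a recording; the sample rate, the gain range, and the function names `time_shift` and `random_gain` are illustrative assumptions. The time shift here wraps samples around the end of the clip, which is one simple choice among several (padding with silence is another).

```python
import numpy as np

sample_rate = 8000                               # samples per second (assumed)
t = np.arange(sample_rate) / sample_rate         # one second of timestamps
waveform = 0.5 * np.sin(2 * np.pi * 440.0 * t)   # a 440 Hz sine tone


def time_shift(signal, shift_samples):
    """Shift the signal in time; samples pushed past one end wrap to the other."""
    return np.roll(signal, shift_samples)


def random_gain(signal, rng, low=0.5, high=1.5):
    """Multiply the amplitude by a factor drawn uniformly from [low, high)."""
    gain = rng.uniform(low, high)
    return signal * gain, gain


rng = np.random.default_rng(seed=0)
shifted = time_shift(waveform, shift_samples=800)   # shift by 0.1 seconds
rescaled, gain = random_gain(waveform, rng)

print("peak before:", float(np.abs(waveform).max()))
print("peak after gain:", float(np.abs(rescaled).max()))
```

Neither operation changes what a listener would recognize in the clip, but both change the sample values the model receives.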
Speech in a language other than English is outside the scope of this book. This is not due to technical difficulties but rather because this book is written in English. Covering the downstream effects of switching to a different language would be problematic.
Audio augmentation is demanding, but tabular data is even more challenging to expand.
Tabular data definition
Tabular data is information stored in a relational database, a spreadsheet, or a text file in comma-separated values (CSV) format. Tabular data augmentation is a fast-growing field in ML and DL. The tabular data augmentation techniques are transformation, interaction, mapping, and extraction.
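The text above names the four techniques without defining them, so the following sketch rests on one plausible reading, stated here as an assumption: transformation rescales a single column, interaction combines two columns, mapping converts categories to numbers, and extraction pulls a component out of a richer value. The rows, column names, and the `SIZE_CODES` table are made up for illustration, and only the Python standard library is used.

```python
import math
from datetime import date

# Three made-up rows standing in for a spreadsheet or CSV file.
rows = [
    {"price": 100.0, "quantity": 3, "size": "small", "sold_on": date(2022, 1, 15)},
    {"price": 250.0, "quantity": 1, "size": "large", "sold_on": date(2022, 6, 30)},
    {"price": 40.0, "quantity": 10, "size": "medium", "sold_on": date(2023, 3, 2)},
]

SIZE_CODES = {"small": 0, "medium": 1, "large": 2}  # hypothetical mapping table


def add_features(row):
    """Return a copy of the row with four derived columns (assumed meanings)."""
    out = dict(row)
    out["log_price"] = math.log(row["price"])         # transformation
    out["revenue"] = row["price"] * row["quantity"]   # interaction
    out["size_code"] = SIZE_CODES[row["size"]]        # mapping
    out["sold_month"] = row["sold_on"].month          # extraction
    return out


augmented = [add_features(r) for r in rows]
for r in augmented:
    print(r["revenue"], r["size_code"], r["sold_month"])
```

In a real project the same steps would more likely be written against a DataFrame library, but the idea of deriving new columns from existing ones is the same.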
Here is a thought experiment. Can you think of data types other than image, text, audio, and tabular? A hint is Casablanca and Blade Runner.
This chapter has two parts. The first half discussed the various concepts and techniques; the second half is hands-on Python coding in a Python Notebook. The book uses this learn-then-code pattern in every chapter. It is time to get your hands dirty and write Python code.