Data Augmentation Techniques in Deep Learning with Keras

Data augmentation is a widely used deep learning technique that increases the effective size and diversity of a training set by applying label-preserving transformations to existing samples. This extra variation typically reduces overfitting and improves how well a model generalizes, which is especially valuable when labelled data is scarce. Keras, a popular deep learning framework, has built-in support for image augmentation. In this article, we explore some of the most commonly used augmentation techniques for images in Keras, along with two simple techniques for text data.

Image Data Augmentation

In image recognition tasks, data augmentation applies random transformations to the input images, so the model sees a slightly different version of each image every time it is drawn during training. Keras exposes these transformations as constructor arguments of the ImageDataGenerator class.
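Recent TensorFlow documentation marks ImageDataGenerator as not recommended for new code and suggests Keras preprocessing layers instead. The sketch below, assuming TensorFlow 2.6 or later, stacks layers that roughly correspond to the rotation, shift, zoom, and flip settings discussed in this section. The parameter mapping is approximate rather than an exact equivalence; for example, RandomRotation expresses its range as a fraction of a full turn instead of in degrees.

```python
import numpy as np
from tensorflow import keras

# Layers roughly matching the ImageDataGenerator settings in this section.
# fill_mode="nearest" mirrors ImageDataGenerator's default; these layers
# default to "reflect".
augment = keras.Sequential([
    # factor is a fraction of a full turn: 30/360 allows about +/-30 degrees.
    keras.layers.RandomRotation(30 / 360, fill_mode="nearest"),
    # Shift by up to 20% of the height and of the width.
    keras.layers.RandomTranslation(0.2, 0.2, fill_mode="nearest"),
    # Zoom in or out by up to 20%.
    keras.layers.RandomZoom(0.2, fill_mode="nearest"),
    keras.layers.RandomFlip("horizontal_and_vertical"),
])

# A batch of 4 random 32x32 RGB images, standing in for real training data.
images = np.random.default_rng(0).random((4, 32, 32, 3)).astype("float32")

# training=True enables the random transforms; with training=False the
# layers pass their inputs through unchanged.
augmented = augment(images, training=True)
print(tuple(augmented.shape))  # (4, 32, 32, 3)
```

Because these are ordinary Keras layers, they can also be placed at the start of a model, so augmentation runs during model.fit and is skipped automatically at inference.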

Rotation

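Rotation turns the input image by a random angle. The rotation_range parameter of the ImageDataGenerator class sets the range in degrees: with rotation_range=30, each image is rotated by an angle drawn uniformly from [-30, 30].

from tensorflow.keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(rotation_range=30)  # random angle in [-30, 30] degrees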
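To make rotation_range concrete, here is a simplified NumPy sketch of a random rotation. It is not Keras's implementation, which builds an affine transform and supports several interpolation and fill modes. The helpers rotate_image and random_rotate are hypothetical; they use nearest-neighbour sampling and fill out-of-bounds pixels from the nearest edge, similar in spirit to fill_mode="nearest".

```python
import numpy as np

def rotate_image(image, degrees):
    """Rotate an (H, W) or (H, W, C) array about its centre (hypothetical helper).

    Nearest-neighbour sampling; source coordinates that fall outside the
    image are clipped to the nearest edge pixel. With row 0 displayed at
    the top, positive angles turn the picture clockwise.
    """
    theta = np.deg2rad(degrees)
    h, w = image.shape[:2]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    ys, xs = np.indices((h, w))
    # Inverse mapping: for each output pixel, find the source pixel it samples.
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    src_x = cos_t * (xs - cx) + sin_t * (ys - cy) + cx
    src_y = -sin_t * (xs - cx) + cos_t * (ys - cy) + cy
    src_x = np.clip(np.rint(src_x), 0, w - 1).astype(int)
    src_y = np.clip(np.rint(src_y), 0, h - 1).astype(int)
    return image[src_y, src_x]

def random_rotate(image, rotation_range=30, rng=None):
    """Rotate by an angle drawn uniformly from [-rotation_range, rotation_range]."""
    rng = np.random.default_rng() if rng is None else rng
    return rotate_image(image, rng.uniform(-rotation_range, rotation_range))

# Usage: rotate a random 32x32 RGB image by up to +/-30 degrees.
image = np.random.default_rng(1).random((32, 32, 3))
rotated = random_rotate(image, rotation_range=30, rng=np.random.default_rng(0))
```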

Width and Height Shifts

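Width and height shifts translate the input image horizontally or vertically by a random amount. The width_shift_range and height_shift_range parameters set the maximum shift; a float below 1 is interpreted as a fraction of the image width or height, so 0.2 allows shifts of up to 20% in either direction. Pixels uncovered by the shift are filled according to the fill_mode parameter, which defaults to "nearest".

datagen = ImageDataGenerator(width_shift_range=0.2, height_shift_range=0.2)  # shift by up to 20% of width/height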
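The following NumPy sketch shows what a fractional shift range means in pixels. The helpers shift_image and random_shift are hypothetical; unlike Keras, which applies sub-pixel affine shifts, this sketch rounds the shift to whole pixels and repeats the nearest edge pixel into the vacated area.

```python
import numpy as np

def shift_image(image, dx, dy):
    """Translate an (H, W) or (H, W, C) array by whole pixels (hypothetical helper).

    Positive dx moves content right, positive dy moves it down. Vacated
    pixels repeat the nearest edge pixel.
    """
    h, w = image.shape[:2]
    src_y = np.clip(np.arange(h) - dy, 0, h - 1)
    src_x = np.clip(np.arange(w) - dx, 0, w - 1)
    # np.ix_ builds an open mesh so rows and columns are indexed independently.
    return image[np.ix_(src_y, src_x)]

def random_shift(image, width_shift_range=0.2, height_shift_range=0.2, rng=None):
    """Shift by a random fraction of the width and height, rounded to pixels."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    dx = int(round(rng.uniform(-width_shift_range, width_shift_range) * w))
    dy = int(round(rng.uniform(-height_shift_range, height_shift_range) * h))
    return shift_image(image, dx, dy)

# Usage: a one-row "image" shifted two pixels to the right.
row = np.arange(5).reshape(1, 5)
print(shift_image(row, 2, 0))  # [[0 0 0 1 2]]
```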

Zoom

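Zooming randomly zooms in or out of the input image. When zoom_range is a single float, the zoom factor is drawn from [1 - zoom_range, 1 + zoom_range], so zoom_range=0.2 gives factors between 0.8 and 1.2.

datagen = ImageDataGenerator(zoom_range=0.2)  # zoom factor drawn from [0.8, 1.2]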
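A simplified NumPy sketch of random zooming about the image centre follows. The helpers zoom_image and random_zoom are hypothetical, and the direction convention (factor above 1 zooms in) belongs to this sketch only; it is not a statement about how Keras interprets its zoom factor internally.

```python
import numpy as np

def zoom_image(image, factor):
    """Zoom an (H, W) or (H, W, C) array about its centre (hypothetical helper).

    In this sketch factor > 1 magnifies the content (zoom in) and
    factor < 1 shrinks it (zoom out); uncovered pixels repeat the
    nearest edge pixel.
    """
    h, w = image.shape[:2]
    cy, cx = (h - 1) / 2, (w - 1) / 2
    # Inverse mapping: an output pixel at offset d from the centre samples
    # the source pixel at offset d / factor.
    src_y = np.clip(np.rint((np.arange(h) - cy) / factor + cy), 0, h - 1).astype(int)
    src_x = np.clip(np.rint((np.arange(w) - cx) / factor + cx), 0, w - 1).astype(int)
    return image[np.ix_(src_y, src_x)]

def random_zoom(image, zoom_range=0.2, rng=None):
    """Zoom by a factor drawn uniformly from [1 - zoom_range, 1 + zoom_range]."""
    rng = np.random.default_rng() if rng is None else rng
    return zoom_image(image, rng.uniform(1 - zoom_range, 1 + zoom_range))

# Usage: doubling the zoom on a 1x4 row repeats the two central pixels.
row = np.arange(4).reshape(1, 4)
print(zoom_image(row, 2.0))  # [[1 1 2 2]]
```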

Flipping

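Flipping mirrors the input image horizontally (left to right) or vertically (top to bottom). Setting horizontal_flip or vertical_flip to True enables the corresponding flip, which is then applied at random to individual images. Vertical flips only make sense when an upside-down image is still a valid example of its class, which is often not the case for natural photos.

datagen = ImageDataGenerator(horizontal_flip=True, vertical_flip=True)  # each flip applied at random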
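Random flipping is simple enough to sketch directly in NumPy. In the hypothetical random_flip helper below, each enabled flip is applied independently with probability 0.5, the same coin flip ImageDataGenerator uses for horizontal_flip and vertical_flip, so every image comes out in one of four orientations.

```python
import numpy as np

def random_flip(image, horizontal_flip=True, vertical_flip=True, rng=None):
    """Flip an (H, W) or (H, W, C) array at random (hypothetical helper).

    Each enabled flip is applied independently with probability 0.5.
    """
    rng = np.random.default_rng() if rng is None else rng
    if horizontal_flip and rng.random() < 0.5:
        image = image[:, ::-1]  # mirror left-right (reverse the columns)
    if vertical_flip and rng.random() < 0.5:
        image = image[::-1]  # mirror top-bottom (reverse the rows)
    return image

# Usage: with both flips enabled, repeated calls produce all four orientations.
rng = np.random.default_rng(0)
img = np.arange(6).reshape(2, 3)
outputs = [random_flip(img, rng=rng) for _ in range(200)]
```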

Text Data Augmentation

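In text classification tasks, data augmentation creates new training sentences by perturbing existing ones. Core Keras does not ship text augmentation utilities comparable to ImageDataGenerator, so the techniques below are written in plain Python, with NLTK for synonym lookup; the augmented text can then be tokenized and passed to a Keras model like any other training data.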

Synonym Replacement

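Synonym replacement swaps some words in the input text for their synonyms. The version below looks up synonyms in WordNet through NLTK, replaces up to n randomly chosen words, and skips synonyms identical to the original word. The WordNet corpus and the tokenizer models must be downloaded once with nltk.download.

import random

import nltk
from nltk.corpus import wordnet

# One-time downloads (recent NLTK releases tokenize with "punkt_tab"):
# nltk.download("wordnet"); nltk.download("punkt"); nltk.download("punkt_tab")

def get_synonyms(word):
    """Return the WordNet synonyms of `word`, excluding the word itself."""
    synonyms = set()
    for syn in wordnet.synsets(word):
        for lemma in syn.lemmas():
            # Multi-word lemmas such as "ice_cream" use underscores.
            name = lemma.name().replace("_", " ")
            if name.lower() != word.lower():
                synonyms.add(name)
    return sorted(synonyms)

def synonym_replacement(text, n=1):
    """Replace up to n randomly chosen words that have WordNet synonyms."""
    words = nltk.word_tokenize(text)
    # Only alphabetic tokens are candidates; punctuation has no synonyms.
    candidates = [i for i, w in enumerate(words) if w.isalpha() and get_synonyms(w)]
    random.shuffle(candidates)
    for i in candidates[:n]:
        words[i] = random.choice(get_synonyms(words[i]))
    # Re-joining tokens puts a space before punctuation ("cat ." rather than "cat.").
    return " ".join(words)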

Random Insertion

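Random insertion adds new words at random positions in the input text. The insert_random_words function below draws n words from a vocabulary supplied by the caller; a common choice is synonyms of words already in the sentence (for example, from get_synonyms above), which keeps the inserted words related to the original text.

import random

def insert_random_words(text, vocabulary, n=2):
    """Insert n words drawn from `vocabulary` at random positions in `text`."""
    words = text.split()
    for _ in range(n):
        # randint is inclusive, so a word may also be appended at the end.
        position = random.randint(0, len(words))
        words.insert(position, random.choice(vocabulary))
    return " ".join(words)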

Conclusion

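Data augmentation can reduce overfitting and improve generalization, particularly when labelled data is limited. We have seen how Keras's ImageDataGenerator applies random rotations, shifts, zooms, and flips to images, and how synonym replacement and random insertion create variants of text samples in plain Python. Because each augmented sample is a plausible variation of a real one, these techniques expose the model to more diversity than the original training set contains, which encourages it to learn features that carry over to unseen data.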