Data Augmentation Techniques in Deep Learning with Keras
Data augmentation is a widely used technique in deep learning for enlarging the training set with modified copies of existing samples. Training on these additional samples tends to reduce overfitting and can improve accuracy on unseen data. Keras, a popular deep learning framework, has built-in support for image augmentation. In this article, we explore the most commonly used image augmentation options in Keras, followed by two simple text augmentation techniques that can be used alongside a Keras model.
Image Data Augmentation
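In image recognition tasks, data augmentation is done by applying random transformations to the input images. Keras exposes a range of built-in transformations as parameters of the ImageDataGenerator class, which can be imported from tensorflow.keras.preprocessing.image. Recent TensorFlow releases mark this class as deprecated in favor of preprocessing layers, but its parameters still illustrate the underlying transformations well.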
Rotation
Rotation turns the input image by a random angle. It is controlled by the rotation_range parameter of the ImageDataGenerator class, given in degrees: each image is rotated by an angle drawn from [-rotation_range, rotation_range].
ImageDataGenerator(rotation_range=30)
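To make the effect of rotation_range concrete, the following NumPy sketch draws an angle uniformly from [-rotation_range, rotation_range] and rotates a 2-D array about its centre. It is an illustration only, not Keras's implementation: the helper names rotate_nearest and random_rotation are hypothetical, and the sketch uses nearest-neighbour sampling with zero fill, whereas ImageDataGenerator fills the empty corners according to its fill_mode parameter.

```python
import numpy as np


def rotate_nearest(image, angle_deg):
    """Rotate a 2-D array about its centre (nearest-neighbour, zero fill)."""
    h, w = image.shape
    theta = np.deg2rad(angle_deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.indices((h, w))
    y0, x0 = ys - cy, xs - cx
    # Inverse mapping: for every output pixel, find the source pixel it comes from.
    src_x = np.rint(np.cos(theta) * x0 + np.sin(theta) * y0 + cx).astype(int)
    src_y = np.rint(-np.sin(theta) * x0 + np.cos(theta) * y0 + cy).astype(int)
    # Output pixels whose source falls outside the image stay at zero.
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(image)
    out[valid] = image[src_y[valid], src_x[valid]]
    return out


def random_rotation(image, rotation_range, rng):
    """Rotate by an angle drawn uniformly from [-rotation_range, rotation_range]."""
    angle = rng.uniform(-rotation_range, rotation_range)
    return rotate_nearest(image, angle), angle


rng = np.random.default_rng(0)
image = np.arange(25).reshape(5, 5)
rotated, angle = random_rotation(image, rotation_range=30, rng=rng)
print(f"rotated by {angle:.1f} degrees")
print(rotated)
```

The output keeps the shape of the input; pixels rotated out of the frame are lost, which is why moderate rotation ranges are usually preferred.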
Width and Height Shifts
Width and height shifts translate the input image horizontally or vertically by a random amount. They are controlled by the width_shift_range and height_shift_range parameters; a float below 1 is interpreted as a fraction of the total width or height.
ImageDataGenerator(width_shift_range=0.2, height_shift_range=0.2)
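As an illustration of how a fractional shift maps to pixels, the NumPy sketch below moves a small array sideways by a fraction of its width. The helper shift_horizontal is hypothetical and fills the vacated columns with zeros for simplicity; Keras draws the shift at random within the configured range and fills the gap according to fill_mode.

```python
import numpy as np


def shift_horizontal(image, fraction):
    """Shift a 2-D array right (fraction > 0) or left (fraction < 0), zero fill."""
    width = image.shape[1]
    dx = int(round(fraction * width))   # fraction of the width, in whole pixels
    dx = max(-width, min(width, dx))    # a shift cannot exceed the image width
    out = np.zeros_like(image)
    if dx > 0:
        out[:, dx:] = image[:, :width - dx]
    elif dx < 0:
        out[:, :width + dx] = image[:, -dx:]
    else:
        out[:] = image
    return out


image = np.arange(10).reshape(2, 5)
rng = np.random.default_rng(0)
# Draw one shift from [-0.2, 0.2], mirroring width_shift_range=0.2.
fraction = rng.uniform(-0.2, 0.2)
shifted = shift_horizontal(image, fraction)
print(shift_horizontal(image, 0.2))
```

A vertical shift works the same way along the first axis.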
Zoom
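Zooming randomly zooms in or out of the input image. It is controlled by the zoom_range parameter; a single float z is interpreted as the range [1 - z, 1 + z], and a pair [lower, upper] is used as given.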
ImageDataGenerator(zoom_range=0.2)
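The sketch below shows how a zoom_range value can be turned into a per-image zoom factor. It follows the documented rule that a single float z stands for the interval [1 - z, 1 + z], while a pair is used as given. The function names zoom_interval and sample_zoom are hypothetical, and the uniform sampling is a simplified stand-in for what the generator does internally.

```python
import random


def zoom_interval(zoom_range):
    """Return (lower, upper) for a float or a [lower, upper] pair."""
    if isinstance(zoom_range, (int, float)):
        return 1 - zoom_range, 1 + zoom_range
    lower, upper = zoom_range
    return lower, upper


def sample_zoom(zoom_range, rng=random):
    """Draw one zoom factor uniformly from the interval."""
    lower, upper = zoom_interval(zoom_range)
    return rng.uniform(lower, upper)


rng = random.Random(0)
# zoom_range=0.2 yields factors between 0.8 and 1.2.
factors = [sample_zoom(0.2, rng) for _ in range(5)]
print([round(f, 3) for f in factors])
```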
Flipping
Flipping mirrors the input image horizontally or vertically. It is enabled with the boolean horizontal_flip and vertical_flip parameters; each enabled flip is applied at random, so only some of the generated images are mirrored.
ImageDataGenerator(horizontal_flip=True, vertical_flip=True)
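What the two flips do to the pixel grid can be sketched in a few lines of NumPy. The random_flip helper below is hypothetical; it applies each enabled flip with probability 0.5, which is an assumption chosen to mimic random flipping rather than a copy of the Keras code.

```python
import numpy as np


def random_flip(image, rng, horizontal=True, vertical=True):
    """Flip left-right and/or up-down, each with probability 0.5."""
    if horizontal and rng.random() < 0.5:
        image = image[:, ::-1]  # mirror the columns (left-right)
    if vertical and rng.random() < 0.5:
        image = image[::-1, :]  # mirror the rows (up-down)
    return image


image = np.array([[1, 2, 3],
                  [4, 5, 6]])
rng = np.random.default_rng(0)
augmented = [random_flip(image, rng) for _ in range(4)]
for sample in augmented:
    print(sample.tolist())
```

Vertical flips are worth enabling only when upside-down images are plausible for the task, for example aerial imagery but usually not handwritten digits.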
Text Data Augmentation
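In text classification tasks, data augmentation can be done by generating new text samples from existing ones. The examples in this section do not rely on Keras itself; they are written in Python with the help of NLTK and scikit-learn, and their output can be fed to a Keras model like any other text.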
Synonym Replacement
Synonym replacement swaps some words in the input text for their synonyms. The synonyms themselves come from WordNet, accessed through NLTK; TfidfVectorizer from the scikit-learn library is used only to build the vocabulary of the input text.
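import re

import nltk
from nltk.corpus import wordnet
from sklearn.feature_extraction.text import TfidfVectorizer

# Requires the NLTK tokenizer and WordNet data (see nltk.download).

def get_synonyms(word):
    """Return WordNet synonyms of `word`, excluding the word itself."""
    synonyms = []
    for syn in wordnet.synsets(word):
        for lemma in syn.lemmas():
            name = lemma.name().replace("_", " ")  # WordNet joins multi-word lemmas with "_"
            if name.lower() != word.lower() and name not in synonyms:
                synonyms.append(name)
    return synonyms

def synonym_replacement(text):
    """Replace each word that has a WordNet synonym with its first synonym."""
    # The vectorizer serves only to collect the lowercased vocabulary of the text.
    tfidf_vect = TfidfVectorizer(tokenizer=nltk.word_tokenize, token_pattern=None)
    tfidf_vect.fit_transform([text])
    # get_feature_names() was removed in scikit-learn 1.2.
    feature_names = set(tfidf_vect.get_feature_names_out())
    for word in nltk.word_tokenize(text):
        if word.lower() in feature_names:
            synonyms = get_synonyms(word)
            if synonyms:
                # Match whole words only, so substrings of other words stay intact.
                text = re.sub(r"\b%s\b" % re.escape(word), synonyms[0], text)
    return text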
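For experimenting without downloading WordNet, the same idea can be sketched with a small hand-written synonym table. Everything here is an assumption made for illustration: the SYNONYMS dictionary, the replace_synonyms function, and its n and rng parameters do not come from any library. Instead of replacing every eligible word, the sketch replaces at most n randomly chosen ones, which keeps the augmented sentence closer to the original.

```python
import random

# Hypothetical toy synonym table; a real setup would query WordNet instead.
SYNONYMS = {
    "quick": ["fast", "speedy"],
    "happy": ["glad", "cheerful"],
    "big": ["large", "huge"],
}


def replace_synonyms(text, n=1, rng=random):
    """Replace up to n randomly chosen words that have an entry in SYNONYMS."""
    words = text.split()
    candidates = [i for i, w in enumerate(words) if w.lower() in SYNONYMS]
    for i in rng.sample(candidates, min(n, len(candidates))):
        words[i] = rng.choice(SYNONYMS[words[i].lower()])
    return " ".join(words)


rng = random.Random(0)
print(replace_synonyms("the quick dog is happy", n=1, rng=rng))
```

Passing a seeded random.Random instance makes the augmentation reproducible, which helps when comparing training runs.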
Random Insertion
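Random insertion adds new words at random positions in the input text. The insert_random_words helper defined here does this by inserting a placeholder word n times; in practice the placeholder would be replaced by words drawn from a vocabulary or a synonym list.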
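import random

def insert_random_words(text, n=2):
    """Insert the placeholder "newword" at n random positions in text."""
    words = text.split()
    for _ in range(n):
        # randint is inclusive at both ends, so the word can also be appended.
        words.insert(random.randint(0, len(words)), "newword")
    return " ".join(words)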
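A variant worth sketching draws the inserted words from a vocabulary rather than using a fixed placeholder. The function insert_from_vocab and its vocab and rng parameters are hypothetical; reusing the sentence's own words as the vocabulary is just one simple choice for demonstration.

```python
import random


def insert_from_vocab(text, vocab, n=2, rng=random):
    """Insert n words drawn from vocab at random positions in text."""
    words = text.split()
    for _ in range(n):
        position = rng.randint(0, len(words))  # inclusive, so appending is possible
        words.insert(position, rng.choice(vocab))
    return " ".join(words)


rng = random.Random(42)
sentence = "data augmentation creates new samples"
augmented = insert_from_vocab(sentence, vocab=sentence.split(), n=2, rng=rng)
print(augmented)
```

The original words keep their relative order, so the augmented sentence stays recognizable while its length and word positions change.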
Conclusion
Data augmentation is an important technique for improving the accuracy of deep learning models. We have covered the most commonly used image augmentation options of the Keras ImageDataGenerator class, as well as two simple text augmentation techniques written in Python. Used with care, these techniques generate additional training data, which helps prevent overfitting and improves the generalization of the model.