ESRGAN Tutorial: Improve AI Image Resolution with Ease

What is ESRGAN?

Enhanced Super-Resolution Generative Adversarial Network (ESRGAN) is a Generative Adversarial Network (GAN) designed for image super-resolution, that is, turning low-resolution images into higher-resolution ones. Like every GAN, it trains two neural networks that compete against each other: the generator and the discriminator. In ESRGAN, the generator takes a low-resolution image and produces a higher-resolution version of it, while the discriminator evaluates whether a high-resolution image is real or generated.

In each training step, the generator produces an image, and the discriminator then assesses whether it is real or generated. The GAN computes two loss values, one for the generator and one for the discriminator. This feedback loop lets the generator learn from its mistakes and produce more convincing images over time, while the discriminator gets better at telling real images from fake ones.
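To make the two losses concrete, here is a minimal sketch of the standard GAN objective for a single real/generated pair, using binary cross-entropy on the discriminator's output probabilities. This is the plain (non-saturating) formulation, shown only for illustration: ESRGAN itself uses a relativistic discriminator, which estimates whether a real image is more realistic than a generated one. The function names are illustrative.

```python
import math

# Minimal sketch of the standard (non-saturating) GAN losses.
# d_real: discriminator's probability that a real image is real.
# d_fake: discriminator's probability that a generated image is real.
# ESRGAN uses a relativistic variant; this is the plain GAN version.

EPS = 1e-12  # avoids log(0)


def discriminator_loss(d_real: float, d_fake: float) -> float:
    """Binary cross-entropy with real images labelled 1 and generated images labelled 0."""
    return -(math.log(d_real + EPS) + math.log(1.0 - d_fake + EPS))


def generator_loss(d_fake: float) -> float:
    """The generator's loss is low when the discriminator calls its image real."""
    return -math.log(d_fake + EPS)


# A confident, correct discriminator: low discriminator loss, high generator loss.
print(round(discriminator_loss(0.9, 0.1), 3))  # 0.211
print(round(generator_loss(0.1), 3))  # 2.303
```

In real training these values are averaged over a batch and each network is updated only with its own loss, which is what drives the competition described above.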

ESRGAN also reuses an existing model: its perceptual loss compares generated and real images using features from a VGG19 network pre-trained on ImageNet. This follows the philosophy of “Do Not Reinvent The Wheel”: much of what you need is often already available.
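For reference, the ESRGAN paper combines this perceptual term with an adversarial term and a pixel-wise term to train the generator. In the sketch below, \phi denotes VGG19 features taken before activation, I_{SR} is the generated image, I_{HR} the real high-resolution image, and \mathcal{L}_G^{Ra} the generator's relativistic adversarial loss. The perceptual distance is written as an L1 norm here, which is an assumption; implementations differ on this detail. The weights \lambda and \eta are the values reported in the paper.

```latex
\begin{aligned}
\mathcal{L}_{\text{percep}} &= \lVert \phi(I_{SR}) - \phi(I_{HR}) \rVert_1 \\
\mathcal{L}_{1} &= \lVert I_{SR} - I_{HR} \rVert_1 \\
\mathcal{L}_{G} &= \mathcal{L}_{\text{percep}} + \lambda\,\mathcal{L}_{G}^{Ra} + \eta\,\mathcal{L}_{1},
\qquad \lambda = 5\times10^{-3},\ \eta = 1\times10^{-2}
\end{aligned}
```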

Preparing ESRGAN for Your Purpose

To use ESRGAN effectively, you first need to prepare your dataset. This tutorial uses the CelebA dataset from Kaggle, which contains over 200,000 images of celebrity faces, each 218x178 pixels (height x width) with three color channels. You could use the entire dataset, but uploading a subset of about 10,000 images keeps upload and training times manageable.
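If you would rather pick the 10,000 images programmatically than by hand, a small helper can copy a random but reproducible subset into a separate folder before you upload or archive it. The function name copy_subset and the folder names in the usage line are hypothetical.

```python
import random
import shutil
from pathlib import Path

# Hypothetical helper: copy a random subset of images into a new folder
# so that only that subset needs to be uploaded.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def copy_subset(src_dir: str, dst_dir: str, n: int, seed: int = 0) -> int:
    """Copy up to n randomly chosen images from src_dir to dst_dir; return the count."""
    images = sorted(
        p for p in Path(src_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    random.Random(seed).shuffle(images)  # fixed seed -> same selection every run
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    chosen = images[:n]
    for path in chosen:
        shutil.copy2(path, dst / path.name)
    return len(chosen)
```

For example, copy_subset("img_align_celeba", "celeba_10k", 10_000) copies 10,000 randomly selected faces, assuming the Kaggle download was extracted to a folder named img_align_celeba.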

Notebook Setup

Training ESRGAN requires a GPU with substantial memory, so we recommend Google Colab, which offers GPU runtimes. Begin by changing the runtime type: select Runtime > Change runtime type and choose GPU as the hardware accelerator.
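To confirm that the runtime change took effect, you can ask PyTorch (preinstalled on Colab) whether it sees a CUDA device. gpu_available is a hypothetical helper; it returns False instead of raising an error when PyTorch is not installed.

```python
import importlib.util


def gpu_available() -> bool:
    """Return True if PyTorch is installed and can see a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return False  # PyTorch is missing, so there is nothing to check
    import torch

    return torch.cuda.is_available()


print("GPU available:", gpu_available())
```

On a GPU runtime this should print GPU available: True.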

Cloning the Repository

Next, clone the repository containing the ESRGAN implementation into /content, so that it ends up in /content/PyTorch-GAN (the path used throughout this tutorial), and install its required libraries.
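The clone and install step can be sketched in Python as follows. It assumes the repository is eriklindernoren/PyTorch-GAN on GitHub (inferred from the /content/PyTorch-GAN paths, not stated in the original) and that its dependencies are listed in a requirements.txt at the repository root. In a Colab cell you would more commonly run the equivalent shell commands prefixed with !; subprocess keeps everything in plain Python.

```python
import subprocess
import sys

# Assumption: the ESRGAN implementation lives in eriklindernoren/PyTorch-GAN,
# inferred from the /content/PyTorch-GAN paths used in this tutorial.
REPO_URL = "https://github.com/eriklindernoren/PyTorch-GAN"
DEST = "/content/PyTorch-GAN"


def setup_commands(repo_url: str = REPO_URL, dest: str = DEST) -> list[list[str]]:
    """Return the clone and dependency-install commands as argument lists."""
    return [
        ["git", "clone", repo_url, dest],
        # Install into the same Python interpreter the notebook is using.
        [sys.executable, "-m", "pip", "install", "-r", f"{dest}/requirements.txt"],
    ]


def run_setup() -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in setup_commands():
        subprocess.run(cmd, check=True)
```

Calling run_setup() once in a Colab cell performs both steps; check=True makes it stop with an error if cloning or installation fails.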

Loading Data

To load your data, connect your Google Drive to Google Colab by mounting it with drive.mount() from the google.colab package.

If your images are stored in a RAR archive, extract them with the patool library, setting the output directory to /content/PyTorch-GAN/data.
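Both steps can be wrapped in two small functions. This is a sketch: google.colab exists only inside a Colab runtime, patool (imported as patoolib) relies on an external RAR-capable program such as unrar or 7z, and the function names mount_drive and extract_rar are hypothetical.

```python
import os

DRIVE_MOUNT = "/content/drive"
DATA_DIR = "/content/PyTorch-GAN/data"


def mount_drive(mount_point: str = DRIVE_MOUNT) -> None:
    """Connect Google Drive to the Colab runtime (prompts for authorization)."""
    from google.colab import drive  # only importable inside Colab

    drive.mount(mount_point)


def extract_rar(archive_path: str, outdir: str = DATA_DIR) -> str:
    """Extract a RAR archive into outdir using patool; return outdir."""
    import patoolib  # installed with: pip install patool

    os.makedirs(outdir, exist_ok=True)  # make sure the target folder exists
    patoolib.extract_archive(archive_path, outdir=outdir)
    return outdir
```

After mount_drive(), your Drive files appear under /content/drive/MyDrive, so a call such as extract_rar("/content/drive/MyDrive/celeba_10k.rar") (hypothetical archive name) unpacks the images into /content/PyTorch-GAN/data.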

Creating a Testing Dataset

Keep a separate test set of images that the model never sees during training. Create a new folder named test (for example, /content/PyTorch-GAN/data/test) and move a small portion of your images from the main dataset folder into it.

If there are too many images to move by hand, move them programmatically, in batches if necessary.
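Creating the test folder and moving a random sample into it takes only a short script. This sketch moves the files in fixed-size chunks and prints progress after each chunk, one simple way to apply the batching idea; move_to_test, its default chunk size of 500 and the celeba folder name in the usage line are hypothetical.

```python
import random
import shutil
from pathlib import Path


def move_to_test(dataset_dir: str, test_dir: str, n_test: int,
                 batch_size: int = 500, seed: int = 0) -> int:
    """Move n_test randomly chosen files from dataset_dir into test_dir.

    Files are moved in chunks of batch_size, with a progress line after each
    chunk. Moved files leave the training folder, so training never sees them.
    Returns the number of files moved.
    """
    images = sorted(p for p in Path(dataset_dir).iterdir() if p.is_file())
    random.Random(seed).shuffle(images)  # reproducible random sample
    selected = images[:n_test]
    dest = Path(test_dir)
    dest.mkdir(parents=True, exist_ok=True)
    moved = 0
    for start in range(0, len(selected), batch_size):
        batch = selected[start:start + batch_size]
        for path in batch:
            shutil.move(str(path), str(dest / path.name))
        moved += len(batch)
        print(f"moved {moved}/{len(selected)} images")
    return moved
```

For example, move_to_test("/content/PyTorch-GAN/data/celeba", "/content/PyTorch-GAN/data/test", 100) sets aside 100 images. Keeping test as a sibling of the training folder, rather than inside it, avoids mixing the two sets.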

Training the Model

Now it’s time to train the network with ESRGAN! To start training, run the ESRGAN training script from the /content/PyTorch-GAN/implementations/esrgan folder, passing any of the arguments below.

Available training arguments:

--dataset_name: name of your folder located in /content/PyTorch-GAN/data
--n_epochs: number of training epochs (default is 200)
--hr_height: height of the high-resolution output images (default is 256)
--hr_width: width of the high-resolution output images (default is 256)
--channels: number of channels in the input images (default is 3)
--checkpoint_interval: how often model checkpoints are saved; 250 is recommended for smaller datasets (default is 5000)
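Assembling the training command from these arguments can be sketched as below. The script name esrgan.py is an assumption about the repository layout, as is running it from /content/PyTorch-GAN/implementations/esrgan; train_command is a hypothetical helper that uses only the flags listed above.

```python
import sys

# Assumption: the training script is esrgan.py in this folder, and it is run
# from here because outputs go to relative paths such as images/training.
ESRGAN_DIR = "/content/PyTorch-GAN/implementations/esrgan"


def train_command(dataset_name: str, n_epochs: int = 200, hr_height: int = 256,
                  hr_width: int = 256, channels: int = 3,
                  checkpoint_interval: int = 5000) -> list[str]:
    """Build the training command line; defaults mirror the documented defaults."""
    return [
        sys.executable, "esrgan.py",
        "--dataset_name", dataset_name,
        "--n_epochs", str(n_epochs),
        "--hr_height", str(hr_height),
        "--hr_width", str(hr_width),
        "--channels", str(channels),
        "--checkpoint_interval", str(checkpoint_interval),
    ]
```

With subprocess imported, subprocess.run(train_command("celeba", n_epochs=5, checkpoint_interval=250), cwd=ESRGAN_DIR, check=True) would start a short run on a hypothetical data/celeba folder.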

Further arguments are available; see the repository for details.

Sample images generated during training are saved to /content/PyTorch-GAN/implementations/esrgan/images/training.

Testing the Model

To evaluate the model, pick an image from your test set and run the ESRGAN testing script from the /content/PyTorch-GAN/implementations/esrgan folder, passing the arguments below.

Available testing arguments:

--image_path: path to your image (e.g., /content/PyTorch-GAN/data/test/0.jpg)
--checkpoint_model: path to your trained generator (e.g., /content/PyTorch-GAN/implementations/esrgan/saved_models/generator_X.pth; replace X with the epoch number of the checkpoint you want to use, typically the most recent one).
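Finding the checkpoint to pass as --checkpoint_model can be automated by picking the generator file with the highest epoch number. This sketch assumes the testing script is test_on_image.py in the same folder as the training script and that checkpoints follow the generator_X.pth naming shown above; latest_generator and build_test_command are hypothetical helpers.

```python
import re
import sys
from pathlib import Path
from typing import Optional

# Assumptions: checkpoints are named generator_<epoch>.pth and the testing
# script is test_on_image.py next to the training script.
SAVED_MODELS = "/content/PyTorch-GAN/implementations/esrgan/saved_models"
CHECKPOINT_RE = re.compile(r"generator_(\d+)\.pth$")


def latest_generator(saved_dir: str = SAVED_MODELS) -> Optional[Path]:
    """Return the generator checkpoint with the highest epoch number, or None."""
    best: Optional[tuple[int, Path]] = None
    for path in Path(saved_dir).glob("generator_*.pth"):
        match = CHECKPOINT_RE.search(path.name)
        if match:
            epoch = int(match.group(1))  # numeric, so 10 ranks above 9
            if best is None or epoch > best[0]:
                best = (epoch, path)
    return best[1] if best else None


def build_test_command(image_path: str, checkpoint: Path) -> list[str]:
    """Build the testing command line from the two documented arguments."""
    return [sys.executable, "test_on_image.py",
            "--image_path", image_path,
            "--checkpoint_model", str(checkpoint)]
```

Comparing epoch numbers as integers matters: sorting file names as plain strings would rank generator_9.pth above generator_10.pth.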

The generated image will be stored in /content/PyTorch-GAN/implementations/esrgan/images/outputs/.

To keep the generated images, copy them from this folder to your mounted Google Drive, since files under /content are lost when the Colab runtime is recycled.
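Once Drive is mounted at /content/drive, copying the outputs takes a few lines of Python. The destination folder esrgan_outputs and the helper name copy_outputs are hypothetical.

```python
import shutil
from pathlib import Path

OUTPUTS = "/content/PyTorch-GAN/implementations/esrgan/images/outputs"
# Hypothetical destination inside the mounted Drive; change as needed.
DRIVE_DEST = "/content/drive/MyDrive/esrgan_outputs"


def copy_outputs(src: str = OUTPUTS, dst: str = DRIVE_DEST) -> int:
    """Copy every generated image from src to dst; return how many were copied."""
    dst_path = Path(dst)
    dst_path.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in Path(src).iterdir():
        if path.is_file():
            shutil.copy2(path, dst_path / path.name)  # keeps file timestamps
            count += 1
    return count
```

Calling copy_outputs() after each test run keeps a copy of every result on Drive.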

Wrapping Up

The power of GANs lies in two neural networks that push each other to improve. ESRGAN, as shown in this tutorial, applies this idea to increasing image resolution. The model is powerful, but it demands substantial computing resources. Results after 5 epochs on 10,000 training images may not be flawless; however, as you train for more epochs, you should see clear improvements!

Explore the intriguing world of AI applications emerging from hackathons for inspiration, and enjoy this captivating journey with ESRGAN!

Stay tuned for more thrilling AI tutorials!

Thank you! - Adrian Banachowicz, Data Science Intern at New Native.