AI Goes Mobile

< Perceptual Image Enhancement on Smartphones >

In conjunction with ECCV 2018, PIRM 2018, September 14, Munich, Germany

More layers, more filters, deeper architectures...

Sounds like a standard recipe for achieving top results in various AI competitions, doesn't it? Deep ResNets and VGGs, Pix2pix-style CNNs half a gigabyte in size... But do we really need a cluster of GPUs to process even small HD-resolution images? How about light, fast and efficient intelligent solutions? Maybe it's time to come up with something more sophisticated that can run on our everyday hardware?

This challenge is aimed exactly at the above problem: all submitted solutions will be tested on smartphones to ensure their efficiency. In this competition, we target two conventional Computer Vision tasks that are tightly bound to these devices: Image Super-Resolution and Image Enhancement. While there exist many works and papers dealing with these problems, they generally propose algorithms whose runtime and computational requirements are enormous even for high-end desktops, not to mention mobile phones. Therefore, here we use a different metric: instead of assessing a solution's performance solely by PSNR/SSIM scores, the general rule in this challenge is to maximize accuracy per unit of runtime while meeting some additional requirements. More information about the tasks and the validation process is provided below.


9th May: Training and validation data released
17th May: Validation phase starts
3rd August: Test phase starts
10th August: Final models submission deadline
12th August: Factsheets submission deadline
15th August: Challenge results released
29th August: Paper submission deadline
5th September: Notification of accepted papers
14th September: PIRM 2018 Workshop


Track A:   Image Super-Resolution

[Sample results: bicubic upscaling vs. network output]

In this track, we consider the conventional Super-Resolution problem, where the goal is to reconstruct the original image from its downscaled version. To make the task more practical, we use a 4x downscaling factor; sample results obtained with the SRGAN network are shown above.

Dataset for training:   DIV2K (training part)
Dataset for validation:   DIV2K (validation part)

Track B:   Image Enhancement

[Sample results: original vs. enhanced photos]

This track addresses the Image Enhancement problem introduced in the DPED paper, where the goal is to map photos from a particular smartphone to the same photos obtained with a DSLR camera. Here we consider only the subtask of improving images from a very low-quality iPhone 3GS device.

Dataset for training:   DPED (training part), iPhone 3GS
Dataset for validation:   DPED (test part), included above
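To make the Track A input/output relationship concrete, here is a minimal sketch of 4x downscaling in NumPy. Note that the challenge itself uses bicubic downscaling; simple average pooling is used here only as an easy stand-in for illustration:

```python
import numpy as np

def downscale_4x(img):
    """4x downscaling by average pooling (illustrative stand-in for the
    bicubic downscaling actually used to produce the challenge data)."""
    h, w, c = img.shape
    h4, w4 = h - h % 4, w - w % 4          # crop to a multiple of 4
    img = img[:h4, :w4].astype(np.float64)
    return img.reshape(h4 // 4, 4, w4 // 4, 4, c).mean(axis=(1, 3))
```

A 1280x720 RGB frame thus becomes a 320x180 input that the super-resolution network must map back to full resolution.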

Don't know how to start?   →   Start with this or this code, which already satisfies all the requirements but is quite slow, and try to optimize it!

Hint: you can try playing with the number of layers, the number and size of filters, or simply try out other CNN architectures...
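To get a feel for how much headroom the 100MB model-size budget leaves, here is a rough parameter count for a hypothetical SRCNN-style 9-5-5 convolution stack (the layer shapes are illustrative, not a prescribed architecture):

```python
def conv_params(in_ch, out_ch, k):
    """Number of parameters in one k x k convolution (weights + biases)."""
    return in_ch * out_ch * k * k + out_ch

# hypothetical SRCNN-style stack: RGB in/out, 64 and 32 filters
layers = [(3, 64, 9), (64, 32, 5), (32, 3, 5)]
total = sum(conv_params(i, o, k) for i, o, k in layers)
size_mb = total * 4 / 2**20  # float32 weights, in MB
print(total, round(size_mb, 3))  # ~69k parameters, well under 1MB
```

Adding layers or filters grows this quadratically in the channel counts, so runtime and RAM, not model size, are usually the binding constraints here.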


Read carefully before starting the development!   —   The following requirements apply to both tasks:

Delivered model: TensorFlow, saved as a .pb graph
Max. model size: 100MB
Target image resolution: 1280 x 720 px
Possible image resolutions: any arbitrary size
Max. RAM consumption (inference, 1280 x 720 px image): 3.5GB

Code for converting and testing your TensorFlow model is available in our GitHub repository

Rephrasing the above requirements: your solution should be based on the TensorFlow machine learning framework, and your saved pre-trained model should not exceed 100MB. Your solution should be capable of processing images of arbitrary size, and for our target images of resolution 1280x720px it should require no more than 3.5GB of RAM. Note that the SRCNN, DPED and SRGAN networks all already satisfy these requirements!
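The model-size limit is easy to verify locally before submitting; a minimal check (the constant mirrors the 100MB requirement, and the path argument is up to you):

```python
import os

MAX_MODEL_BYTES = 100 * 1024 * 1024  # 100MB challenge limit

def model_within_limit(pb_path):
    """Return True if the frozen .pb graph fits the model-size budget."""
    return os.path.getsize(pb_path) <= MAX_MODEL_BYTES
```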

The performance of your solution will be assessed based on three metrics: its speed compared to a baseline network, its fidelity score measured by PSNR, and its perceptual score computed with the MS-SSIM metric. Since PSNR and SSIM scores do not always objectively reflect image quality, during the test phase we will conduct a user study in which your final submissions will be evaluated by a large number of people, and the resulting MOS scores will replace the MS-SSIM results. The total score of your solution is calculated as a weighted average of these scores.
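For reference, the fidelity metric can be sketched in a few lines of NumPy (this is the standard PSNR definition, not the official evaluation script):

```python
import numpy as np

def psnr(img, ref, max_val=255.0):
    """Peak signal-to-noise ratio between two images of equal shape."""
    diff = np.asarray(img, np.float64) - np.asarray(ref, np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)
```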

We will use three different validation tracks to evaluate your results. Score A gives preference to the solution with the highest fidelity (PSNR) score, Score B is aimed at the solution providing the best visual results (MS-SSIM/MOS scores), and Score C targets the best balance between speed and perceptual/quantitative performance. Each track uses the same weighted-average score but with different coefficients. The coefficients, along with the baseline scores, are available in the GitHub repository.
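As a sketch of how such a per-track weighted combination could look (the exact functional form and the coefficient values are defined in the repository; the numbers below are made up purely for illustration):

```python
def track_score(psnr_gain, ssim_gain, speedup, coeffs):
    """Hypothetical weighted combination of improvements over the baseline:
    fidelity gain (PSNR), perceptual gain (MS-SSIM/MOS) and speed-up."""
    a, b, c = coeffs
    return a * psnr_gain + b * ssim_gain + c * speedup

# made-up coefficient sets for the three validation tracks
COEFFS = {"A": (4, 100, 1), "B": (1, 400, 1), "C": (2, 200, 1.5)}
```

The point of the three coefficient sets is that the same submission can rank very differently depending on whether fidelity, perception, or speed is emphasized.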


Track A, Final

Team | PSNR | MS-SSIM | CPU, ms | GPU, ms | Razer Phone, ms | Huawei P20, ms | RAM | Score A | Score B | Score C
TEAM_ALEX (1st) | 28.21 | 0.9636 | 701 | 48 | 936 | 1335 | 1.5GB | 13.21 | 15.15 | 14.14
KAIST-VICLAB (1st) | 28.14 | 0.9630 | 343 | 34 | 812 | 985 | 1.5GB | 12.86 | 14.83 | 13.87
IV SR+ | 28.13 | 0.9636 | 767 | 70 | 1198 | 1776 | 1.6GB | 12.88 | 15.05 | 13.97

Track B, Final

Team | PSNR | MS-SSIM | MOS | CPU, ms | GPU, ms | Razer Phone, ms | Huawei P20, ms | RAM | Score A | Score B | Score C
Mt.Phoenix (1st) | 21.99 | 0.9125 | 2.6804 | 682 | 64 | 1472 | 2187 | 1.4GB | 14.72 | 20.06 | 19.11
EdS | 21.65 | 0.9048 | 2.6523 | 3241 | 253 | 5153 | Out of memory | 2.3GB | 7.18 | 12.94 | 9.36
Rainbow | 21.85 | 0.9067 | 2.5583 | 828 | 111 | - * | - * | 1.6GB | 13.19 | 16.31 | 16.93
DPED-Baseline | 21.38 | 0.9034 | 2.4411 | 20462 | 1517 | 37003 | Out of memory | 3.7GB | 2.89 | 4.9 | 3.32
IV SR+ | 21.6 | 0.8957 | 2.4309 | 1375 | 125 | 1812 | 2508 | 1.6GB | 8.13 | 9.26 | 10.05

*  -  This solution uses the tf.image.adjust_contrast operation, which is not yet available in TensorFlow Mobile


Andrey Ignatov

Computer Vision Lab

ETH Zurich, Switzerland

Radu Timofte

Computer Vision Lab

ETH Zurich, Switzerland

Computer Vision Laboratory, ETH Zurich

Switzerland, 2019-2021