AI Goes Mobile
< Perceptual Image Enhancement on Smartphones >
In conjunction with ECCV 2018 — PIRM 2018 — September 14, Munich, Germany
More layers, more filters, deeper architectures...
Sounds like the standard recipe for achieving top results in various AI competitions, doesn't it? Deep ResNets and VGGs, Pix2pix-style CNNs half a gigabyte in size... But do we really need a cluster of GPUs to process even small HD-resolution images? How about light, fast and efficient intelligent solutions? Maybe it's time to come up with something more sophisticated that can run on our everyday hardware?
This challenge targets exactly this problem, so all submitted solutions will be tested on smartphones to ensure their efficiency. The competition covers two conventional computer vision tasks that are tightly bound to these devices: Image Super-Resolution and Image Enhancement. While plenty of works and papers deal with these problems, they generally propose algorithms whose runtime and computational requirements are enormous even for high-end desktops, not to mention mobile phones. Therefore, we use a different metric here: instead of assessing a solution's performance solely by its PSNR/SSIM scores, the general rule in this challenge is to maximize accuracy per runtime while meeting some additional requirements. More information about the tasks and the validation process is provided below.
TIMELINE
| Date | Milestone |
|---|---|
| 9th May | Training and validation data released |
| 17th May | Validation phase starts |
| 3rd August | Test phase starts |
| 10th August | Final models submission deadline |
| 12th August | Factsheets submission deadline |
| 15th August | Challenge results released |
| 29th August | Paper submission deadline |
| 5th September | Notification of accepted papers |
| 14th September | PIRM 2018 Workshop |
TASKS
| Track A: Image Super-Resolution | Track B: Image Enhancement |
|---|---|
| In this track, we consider the conventional super-resolution problem, where the goal is to reconstruct the original image from its downscaled version. To make the task more practical, we use a 4x downscaling factor; sample results obtained with the SRGAN network are shown above. | The original image enhancement problem introduced in the DPED paper, where the goal is to map photos from a particular smartphone to the same photos captured with a DSLR camera. Here we consider only the subtask of improving images from a very low-quality iPhone 3GS device. |
| Dataset for training: DIV2K (training part) | Dataset for training: DPED (training part), iPhone 3GS |
| Dataset for validation: DIV2K (validation part) | Dataset for validation: DPED (test part), included above |
Don't know how to start? Start with this or this code, which already satisfies all requirements but is quite slow, and try to optimize it!
Hint: you can try playing with the number of layers, the number and size of the filters, or simply try out other CNN architectures...
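As a rough illustration of the kind of lightweight model you could start from, here is a minimal sketch of a small SRCNN-style network in TensorFlow 1.x; the function name `srcnn_like`, the layer counts and the filter sizes are illustrative placeholders, not part of the provided baseline code.

```python
import tensorflow as tf

def srcnn_like(inputs):
    # `inputs`: float32 tensor [batch, height, width, 3] holding the
    # (bicubically upscaled) low-quality image; the network outputs an
    # enhanced RGB image of the same spatial size.
    x = tf.layers.conv2d(inputs, filters=64, kernel_size=9,
                         padding='same', activation=tf.nn.relu)
    x = tf.layers.conv2d(x, filters=32, kernel_size=5,
                         padding='same', activation=tf.nn.relu)
    # Linear output layer; adjusting the layers / filters above is the
    # simplest way to trade accuracy for runtime.
    return tf.layers.conv2d(x, filters=3, kernel_size=5,
                            padding='same', activation=None, name='output')
```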
VALIDATION
Read carefully before starting development! The following requirements apply to both tasks:
| Requirement | Value |
|---|---|
| Delivered model | TensorFlow, saved as a .pb graph |
| Max. model size | 100MB |
| Target image resolution | 1280x720px |
| Possible image resolutions | any arbitrary size |
| Max. RAM consumption (inference, 1280x720px image) | 3.5GB |
The code for converting and testing your TensorFlow model is available in our GitHub repository.
Rephrasing the above requirements: your solution should be based on the TensorFlow machine learning framework, and your saved pre-trained model should not exceed 100MB. The solution should be capable of processing images of arbitrary size, and for our target resolution of 1280x720px it should require no more than 3.5GB of RAM. Note that the SRCNN, DPED and SRGAN networks already satisfy all of these requirements!
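For reference, a minimal sketch of how a TensorFlow 1.x model is typically frozen into a single .pb graph is shown below; the stand-in network and the node names 'input' and 'output' are assumptions, and you should substitute your own graph and node names.

```python
import tensorflow as tf

# Input placeholder: height and width are left as None so that images of
# arbitrary size (including 1280x720px) can be processed.
image = tf.placeholder(tf.float32, [None, None, None, 3], name='input')

# A single convolution stands in for your actual model here.
net = tf.layers.conv2d(image, filters=3, kernel_size=3, padding='same')
output = tf.identity(net, name='output')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    # In practice, restore your trained weights here instead, e.g. with
    # tf.train.Saver().restore(sess, checkpoint_path).

    # Replace all variables with constants so the graph is self-contained.
    frozen_graph = tf.graph_util.convert_variables_to_constants(
        sess, sess.graph_def, ['output'])

    # Serialize the frozen graph; the resulting file must not exceed 100MB.
    with tf.gfile.GFile('model.pb', 'wb') as f:
        f.write(frozen_graph.SerializeToString())
```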
The performance of your solution will be assessed using three metrics: its speed relative to a baseline network, its fidelity score measured by PSNR, and its perceptual score computed with the MS-SSIM metric. Since PSNR and SSIM scores do not always objectively reflect image quality, during the test phase we will conduct a user study in which your final submissions are evaluated by a large number of people, and the resulting MOS scores will replace the MS-SSIM results. The total score of your solution will be calculated as a weighted average of these scores:
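As a hedged illustration of the general form of this weighted score (the exact formula, its per-track coefficients and the baseline values are given in the GitHub repository), it can be sketched as

$$\text{Score} = \alpha\,\bigl(\text{PSNR} - \text{PSNR}_{\text{baseline}}\bigr) + \beta\,\bigl(\text{MS-SSIM} - \text{MS-SSIM}_{\text{baseline}}\bigr) + \gamma\,\frac{\text{Time}_{\text{baseline}}}{\text{Time}}$$

with MS-SSIM replaced by MOS during the test phase.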
We will use three different validation tracks to evaluate your results. Score A gives preference to the solution with the highest fidelity (PSNR) score, score B is aimed at the solution providing the best visual results (MS-SSIM/MOS scores), and score C targets the best balance between speed and perceptual/quantitative performance. For each track, we use the above score formula with different coefficients. The coefficients, along with the baseline scores, are available in the GitHub repository.
SUBMISSION
To register your team, send an email to ai.mobile.challenge@gmail.com with the following information:
| Email Subject: | AI Mobile Challenge Registration |
|---|---|
| Email Text: | Team Name |
| | Team Member 1 (Name, Surname, Affiliation) |
| | Team Member 2 (Name, Surname, Affiliation) |
| | ... |
To validate your model, send an email indicating the track, your team ID, and a link to the corresponding model.pb file:
| Email Subject: | [Track X] [Team ID] [Team Name] Submission |
|---|---|
| Email Text: | Link to the model.pb file |
You are allowed to send up to 2 submissions per day for each track. The leaderboard will show the results of your last successful submission. Please make sure that the results provided by our validation scripts are meaningful before sending your submission files.
LEADERBOARD
Track A, Final
Team | PSNR | MS-SSIM | CPU, ms | GPU, ms | Razer Phone, ms | Huawei P20, ms | RAM | Score A | Score B | Score C |
---|---|---|---|---|---|---|---|---|---|---|
TEAM_ALEX | 28.21 | 0.9636 | 701 | 48 | 936 | 1335 | 1.5GB | 13.21 | 15.15 | 14.14 |
KAIST-VICLAB | 28.14 | 0.9630 | 343 | 34 | 812 | 985 | 1.5GB | 12.86 | 14.83 | 13.87 |
CARN_CVL | 28.19 | 0.9633 | 773 | 112 | 1101 | 1537 | 1.5GB | 13.08 | 15.02 | 14.04 |
IV SR+ | 28.13 | 0.9636 | 767 | 70 | 1198 | 1776 | 1.6GB | 12.88 | 15.05 | 13.97 |
Rainbow | 28.13 | 0.9632 | 654 | 56 | 1414 | 1749 | 1.5GB | 12.84 | 14.92 | 13.91 |
Mt.Phoenix | 28.14 | 0.9630 | 793 | 90 | 1492 | 1994 | 1.5GB | 12.86 | 14.83 | 13.87 |
SuperSR | 28.18 | 0.9629 | 969 | 98 | 1731 | 2408 | 1.5GB | 12.35 | 14.17 | 12.94 |
BOE-SBG | 27.79 | 0.9602 | 1231 | 88 | 1773 | 2420 | 1.5GB | 9.79 | 11.98 | 10.55 |
SRCNN-Baseline | 27.21 | 0.9552 | 3239 | 205 | 7801 | 11566 | 2.6GB | 5.33 | 7.77 | 5.93 |
Track B, Final
Team | PSNR | MS-SSIM | MOS | CPU, ms | GPU, ms | Razer Phone, ms | Huawei P20, ms | RAM | Score A | Score B | Score C |
---|---|---|---|---|---|---|---|---|---|---|---|
Mt.Phoenix | 21.99 | 0.9125 | 2.6804 | 682 | 64 | 1472 | 2187 | 1.4GB | 14.72 | 20.06 | 19.11 |
EdS | 21.65 | 0.9048 | 2.6523 | 3241 | 253 | 5153 | Out of memory | 2.3GB | 7.18 | 12.94 | 9.36 |
BOE-SBG | 21.99 | 0.9079 | 2.6283 | 1620 | 111 | 1802 | 2321 | 1.6GB | 10.39 | 14.61 | 12.62 |
MENet | 22.22 | 0.9086 | 2.6108 | 1461 | 138 | 2279 | 3459 | 1.8GB | 11.62 | 14.77 | 13.47 |
Rainbow | 21.85 | 0.9067 | 2.5583 | 828 | 111 | - * | - * | 1.6GB | 13.19 | 16.31 | 16.93 |
KAIST-VICLAB | 21.56 | 0.8948 | 2.5123 | 2153 | 181 | 3200 | 4701 | 2.3GB | 6.84 | 9.84 | 8.65 |
SNPR | 22.03 | 0.9042 | 2.465 | 1448 | 81 | 1987 | 3061 | 1.6GB | 9.86 | 10.43 | 11.05 |
DPED-Baseline | 21.38 | 0.9034 | 2.4411 | 20462 | 1517 | 37003 | Out of memory | 3.7GB | 2.89 | 4.9 | 3.32 |
Geometry | 21.79 | 0.9068 | 2.4324 | 833 | 83 | 1209 | 1843 | 1.6GB | 12.0 | 12.59 | 14.95 |
IV SR+ | 21.6 | 0.8957 | 2.4309 | 1375 | 125 | 1812 | 2508 | 1.6GB | 8.13 | 9.26 | 10.05 |
SRCNN-Baseline | 21.31 | 0.8929 | 2.295 | 3274 | 204 | 6890 | 11593 | 2.6GB | 3.22 | 2.29 | 3.49 |
TEAM_ALEX | 21.87 | 0.9036 | 2.1196 | 781 | 70 | 962 | 1436 | 1.6GB | 10.21 | 3.82 | 10.81 |
* This solution uses the tf.image.adjust_contrast operation, which is not yet available in TensorFlow Mobile.
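If you hit the same limitation, one possible workaround (just a sketch, not part of the official validation code) is to express the same contrast adjustment with basic ops that TensorFlow Mobile supports, since tf.image.adjust_contrast scales each channel around its per-image spatial mean:

```python
import tensorflow as tf

def adjust_contrast_basic(images, contrast_factor):
    # Equivalent of tf.image.adjust_contrast built from basic ops:
    # compute the per-channel spatial mean and scale deviations around it.
    mean = tf.reduce_mean(images, axis=[-3, -2], keepdims=True)
    return (images - mean) * contrast_factor + mean
```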
ORGANIZERS
Andrey Ignatov, Computer Vision Lab, ETH Zurich, Switzerland (andrey@vision.ee.ethz.ch)
Radu Timofte, Computer Vision Lab, ETH Zurich, Switzerland (timofter@vision.ee.ethz.ch)