Overview of VISCERAL Benchmarks in Intelligent Medical Data Analysis
The VISCERAL project addressed three main questions:
1. How can a scientific challenge on organ segmentation be organized at a large scale?
2. How can research infrastructures be built so that data no longer need to be moved?
3. How can research infrastructures help with reproducibility and the semi-automatic generation of annotations?
VISCERAL also introduced a novel Evaluation-as-a-Service (EaaS) architecture to compare algorithm performance. The data remained in a fixed, inaccessible location in the cloud. Each participant obtained a virtual machine (VM) with access to the training data, on which they could install all necessary tools and test their algorithms. When the participants were finished, the organizers took control of the VMs and ran the algorithms on the test data. In this way, no manual optimization on the test data is possible, the test data are never released to the researchers, and, with both executables and data available, full reproducibility is ensured.
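To make the test protocol concrete, the sketch below shows what the organizer-side step could look like: the participant's executable is run, unmodified, on each withheld test image, and each output is scored against the manual annotation. All paths, file names, the mask file format, and the `segment` executable are illustrative assumptions rather than the actual VISCERAL setup; the Dice coefficient is one common overlap measure in segmentation evaluation.

```python
import subprocess
from pathlib import Path

import numpy as np

# Hypothetical layout; the real VISCERAL cloud paths and formats differ.
TEST_IMAGES = Path("/data/test/images")       # withheld from participants
GOLD_MASKS = Path("/data/test/gold")          # manual annotations
SUBMISSION = Path("/vm/participant/segment")  # participant's sealed executable
OUT_DIR = Path("/tmp/out")


def dice(a: np.ndarray, b: np.ndarray) -> float:
    """Dice overlap between two binary masks (1.0 = perfect agreement)."""
    total = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / total if total else 1.0


def evaluate() -> dict[str, float]:
    """Organizer-side loop: the algorithm runs on the test data, but the
    participant never sees the images, the gold masks, or these scores."""
    OUT_DIR.mkdir(parents=True, exist_ok=True)
    scores = {}
    for image in sorted(TEST_IMAGES.glob("*.npy")):
        output = OUT_DIR / image.name
        subprocess.run([str(SUBMISSION), str(image), str(output)], check=True)
        gold = np.load(GOLD_MASKS / image.name).astype(bool)
        pred = np.load(output).astype(bool)
        scores[image.name] = dice(pred, gold)
    return scores
```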
The results of the benchmarks compared many algorithms on all four imaging modalities. CT image quality was often much better than MRI quality. Results on some organs, such as the lungs and liver, were sometimes better than the inter-rater agreement between human annotators. The availability of executable code also made it possible to run the segmentation algorithms on new images and to fuse the resulting algorithmic segmentations into what we call a silver standard, which can be used to generate large-scale training data.
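As a minimal illustration of such a fusion, the sketch below builds a silver-standard mask by majority voting over several algorithmic segmentations. Majority voting is only one possible fusion scheme (performance-weighted variants are a natural refinement), so this should be read as a plausible sketch rather than the exact VISCERAL procedure; the `silver_standard` function and the toy masks are hypothetical.

```python
import numpy as np


def silver_standard(masks: list[np.ndarray], threshold: float = 0.5) -> np.ndarray:
    """Fuse binary segmentations from several algorithms by majority voting.

    A voxel enters the fused mask when more than `threshold` of the
    algorithms marked it as foreground.
    """
    stack = np.stack([m.astype(bool) for m in masks], axis=0)
    votes = stack.mean(axis=0)  # fraction of algorithms voting foreground
    return votes > threshold


# Example: three toy 2x3 masks standing in for different algorithms' output.
a = np.array([[1, 1, 0], [0, 1, 0]])
b = np.array([[1, 0, 0], [0, 1, 1]])
c = np.array([[1, 1, 0], [0, 0, 1]])
fused = silver_standard([a, b, c])
print(fused.astype(int))  # keeps voxels where at least 2 of 3 algorithms agree
```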
Copyright © 2019, Author(s). This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 International License (http://creativecommons.org/licenses/by-nc/4.0/), which permits copying and redistributing the material for noncommercial use only, provided the original work is properly cited.