A complete list of our publications is available on Google Scholar. Below is our highlighted work (all Open Access).


Groh, R., Goes, N. & Kist, A. M. (2024). SpokeN-100: A Cross-Lingual Benchmarking Dataset for The Classification of Spoken Numbers in Different Languages. TinyML Research Symposium.
Article / Code / Data

This study introduces a novel, entirely artificially generated benchmarking dataset tailored for speech recognition, a core challenge in the field of tiny deep learning. SpokeN-100 consists of the numbers 0 to 99 spoken by 32 different speakers in four different languages, namely English, Mandarin, German and French, resulting in 12,800 audio samples.
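The sample count follows directly from the figures in the summary. The short check below is only an illustration; it assumes that all 32 speakers contribute every number in each of the four languages, which is what the stated total of 12,800 implies. The variable names are made up for this example.

```python
# Figures taken from the summary above; variable names are illustrative.
numbers = range(0, 100)          # spoken numbers 0 to 99
speakers = 32                    # assumed to cover every language
languages = ["English", "Mandarin", "German", "French"]

# Assumption: one audio sample per (number, speaker, language) combination.
total_samples = len(numbers) * speakers * len(languages)
samples_per_language = total_samples // len(languages)

print(f"{total_samples} samples in total, {samples_per_language} per language")
```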


Neubig, L. & Kist, A. M. (2023). Dataset Pruning using Evolutionary Optimization. Bildverarbeitung für die Medizin 2023.

The right number of data points to solve an image processing task is crucial. To analyze how many data points in a dataset are important and unique enough to support the learning process of a neural network, we used an evolutionary approach in a constrained mode (fixed dataset size) and an unconstrained mode (flexible dataset size).
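One way to picture the difference between the two modes is through the mutation step of an evolutionary search over dataset subsets. The sketch below is not taken from the paper. It assumes a subset is encoded as a boolean mask over the dataset, and the function names `mutate_constrained` and `mutate_unconstrained` are hypothetical.

```python
import random


def mutate_constrained(mask, rng):
    """Swap one kept sample for one dropped sample, so the subset size stays fixed."""
    kept = [i for i, keep in enumerate(mask) if keep]
    dropped = [i for i, keep in enumerate(mask) if not keep]
    child = list(mask)
    if kept and dropped:
        child[rng.choice(kept)] = False
        child[rng.choice(dropped)] = True
    return child


def mutate_unconstrained(mask, rng):
    """Flip one randomly chosen entry, so the subset grows or shrinks by one sample."""
    child = list(mask)
    i = rng.randrange(len(child))
    child[i] = not child[i]
    return child


rng = random.Random(0)                    # fixed seed for reproducibility
parent = [True] * 5 + [False] * 5         # hypothetical: keep 5 of 10 samples
fixed = mutate_constrained(parent, rng)
flexible = mutate_unconstrained(parent, rng)
print(sum(parent), sum(fixed), sum(flexible))
```

In a full search, each mask would be scored by training a network on the selected samples; that step is omitted here.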

Neubig, L. & Kist, A. M. (2023). Evolutionary Normalization Optimization Boosts Semantic Segmentation Network Performance. Medical Image Computing and Computer Assisted Intervention – MICCAI 2023.

Is Batch Normalization always the best normalization method for a medical image segmentation task? In a systematic study, we analyzed the layer-wise influence of normalization methods in a U-Net using evolutionary optimization and compared the results to state-of-the-art models.

Groh, R. & Kist, A. M. (2023). End-to-end evolutionary neural architecture search for microcontroller units. IEEE COINS.

Here, we introduce our end-to-end evolutionary NAS (EvoNAS) for microcontroller units, which optimizes both pre-processing and neural network architectures. Each neural network architecture is assessed on multiple objectives (accuracy, memory footprint, inference time, and energy consumption) to derive a common performance measure to be maximized. To ensure that all potential solutions can be used immediately on the microcontroller, we create a software-hardware chain in which each neural network is deployed to measure inference time and power consumption directly.
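To illustrate how several objectives can be folded into one score to be maximized, here is a minimal sketch of a weighted scalarization. It is not the measure used in the paper: the `Measurement` class, the `fitness` function, the budget values and the weights are all assumptions chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    accuracy: float      # fraction in [0, 1], higher is better
    memory_kb: float     # memory footprint, lower is better
    inference_ms: float  # inference time, lower is better
    energy_mj: float     # energy per inference, lower is better


def saving(value, limit):
    """Fraction of the budget left unused, clipped at zero when over budget."""
    return max(0.0, 1.0 - value / limit)


def fitness(m, budget, weights=(0.4, 0.2, 0.2, 0.2)):
    """Hypothetical weighted sum of accuracy and resource savings (to be maximized)."""
    w_acc, w_mem, w_time, w_energy = weights
    return (
        w_acc * m.accuracy
        + w_mem * saving(m.memory_kb, budget.memory_kb)
        + w_time * saving(m.inference_ms, budget.inference_ms)
        + w_energy * saving(m.energy_mj, budget.energy_mj)
    )


# Illustrative numbers only; the accuracy field of the budget is unused.
budget = Measurement(accuracy=1.0, memory_kb=256, inference_ms=100, energy_mj=10)
large = Measurement(accuracy=0.9, memory_kb=128, inference_ms=50, energy_mj=5)
small = Measurement(accuracy=0.9, memory_kb=64, inference_ms=50, energy_mj=5)
print(fitness(large, budget), fitness(small, budget))
```

With equal accuracy, the candidate with the smaller memory footprint scores higher, which is the behaviour one would want from a combined measure on a memory-limited device.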

Dörrich, M., Fan, M., & Kist, A. M. (2023). Impact of Mixed Precision Techniques on Training and Inference Efficiency of Deep Neural Networks. IEEE Access.

Our brain works with fuzzy and low-precision logic. In this study, we investigate how mixed-precision techniques affect the training and inference efficiency of encoder-decoder deep neural networks in a biomedical image segmentation task.

Kruse, E., Döllinger, M., Schützenberger, A., & Kist, A. M. (2023). GlottisNetV2: Temporal Glottal Midline Detection using Deep Convolutional Neural Networks. IEEE Journal of Translational Engineering in Health and Medicine.
Article / Code

Detecting the glottal midline accurately is crucial to assess quantitative parameters related to the symmetrical oscillation of the vocal folds. Here, we show how to use engineered neural networks to allow accurate midline detection simultaneously with glottal area segmentation.


Kist, A. M., Breininger, K., Dörrich, M., Dürr, S., Schützenberger, A., & Semmler, M. (2022). A single latent channel is sufficient for biomedical glottis segmentation. Scientific Reports, 12(1), 14292.
Article / Code / Data

In this paper, we show by mining an encoder-decoder deep neural network that a single latent channel image is sufficient for glottis segmentation. Further, we describe the function of the latent channel and how this affects downstream glottis segmentation.

Groh, R., Lei, Z., Martignetti, L., Li-Jessen, N. Y., & Kist, A. M. (2022). Efficient and Explainable Deep Neural Networks for Airway Symptom Detection in Support of Wearable Health Technology. Advanced Intelligent Systems, 2100284.
Article / Code

Airway symptom detection is crucial for monitoring chronic airway-related diseases. In this work, René shows how data from neck surface accelerometers can be analyzed using a deep neural network in a computationally constrained environment. He uses evolutionary neural architecture search to find an accurate, yet fast and deployable deep neural network for wearables.

Neubig, L., Groh, R., Kunduk, M., Larsen, D., Leonard, R., & Kist, A. M. (2022). Efficient Patient Orientation Detection in Videofluoroscopy Swallowing Studies. In Bildverarbeitung für die Medizin 2022 (pp. 129-134). Springer Vieweg, Wiesbaden.

Swallowing disorders are commonly examined using videofluoroscopy swallowing studies (VFSS). To comprehensively evaluate the swallowing process, a typical VFSS contains different patient orientations. Here, we present a systematic architectural scaling approach and find that an efficient ResNet18 variant is sufficient to classify a full VFSS recording of about 1800 frames in less than 14 s on conventional CPUs.


Kist, A. M., Dürr, S., Schützenberger, A., & Döllinger, M. (2021). OpenHSV: An open platform for laryngeal high-speed videoendoscopy. Scientific Reports, 11, 13760.
Article / Code / Docs / Award

Commercially available systems for laryngeal high-speed videoendoscopy have not been further developed lately, are closed-source, and have only very limited analysis capabilities. With OpenHSV, we provide a novel, award-winning, open hard- and software platform with DNN-powered online analysis.

Kist, A. M., & Döllinger, M. (2021). Efficient biomedical image segmentation on Edge TPUs. Accepted as a short paper at Medical Imaging with Deep Learning (MIDL).

At MIDL, we highlight our work on semantic segmentation using Edge TPUs.

Kist, A. M., Zilker, J., Döllinger, M., & Semmler, M. (2021). Feature-based image registration in structured light endoscopy. Accepted as a full paper at Medical Imaging with Deep Learning (MIDL).
Article / Code

Structured light endoscopy is a 3D-imaging method. However, the assignment of a projected laser grid to its reference is still tricky. We propose a Deep Learning-based image registration approach that achieves 91% accuracy on an ex vivo dataset.

Kist, A. M., Gómez P., Dubrovskiy D., Schlegel O., Kunduk M., Echternach M., Patel RR., Semmler M., Bohr C., Dürr S., Schützenberger A., & Döllinger M. A Deep Learning Enhanced Novel Software Tool for Laryngeal Dynamics Analysis. J Speech Lang Hear R, 64 (6), 1889-1903.

The analysis of high-speed videoendoscopy data is crucial for voice quantification. In this paper, we describe the Glottis Analysis Tools (GAT). GAT has been actively developed in C# since 2010 and is used by dozens of labs worldwide.


Kist, A. M., & Döllinger, M. (2020). Efficient Biomedical Image Segmentation on EdgeTPUs at Point of Care. IEEE Access, 8, 139356-139366.

Deep neural networks are changing biomedical diagnosis. For image segmentation, they can be very large and slow, especially on CPUs. In our recent study, we show that we can speed up glottis segmentation inference by more than 79-fold by optimizing a popular biomedical segmentation network (U-Net) and porting it to the inexpensive EdgeTPU hardware accelerator.

Kist, A. M., Zilker, J., Gómez, P., Schützenberger, A., & Döllinger, M. (2020). Rethinking glottal midline detection. Scientific Reports, 10(1), 1-15.
Article / Code

Symmetry is important in vocal fold motion. The identification of the glottal midline is crucial to deriving symmetry from the glottal area. Here, we evaluate different approaches to determine the glottal midline and suggest a multi-task architecture, GlottisNet, that predicts both the glottis segmentation and the glottal midline simultaneously.

Gómez, P.*, Kist, A. M.*, Schlegel, P., Berry, D. A., Chhetri, D. K., Dürr, S., … & Döllinger, M. (2020). BAGLS, a multihospital benchmark for automatic glottis segmentation. Scientific Data, 7(1), 1-12.
Article / Code / Dataset

Glottis segmentation is a key component for analyzing vocal fold vibrations. With BAGLS, we provide the first open, multihospital dataset for training and evaluating deep neural networks.


Kist, A. M., & Portugues, R. (2019). Optomotor swimming in larval zebrafish is driven by global whole-field visual motion and local light-dark transitions. Cell Reports, 29(3), 659-670.

Larval zebrafish swim when they perceive a whole-field moving stimulus. However, the underlying features that drive optomotor swimming remain elusive. Here, we show that larval zebrafish are predominantly driven by local light-dark transitions.
