A complete list of our publications is available on Google Scholar. Below is our highlighted work (we strongly support Open Access).


Fernholz, M.H.P., Guggiana Nilo, D. A., Bonhoeffer, T., Kist, A. M.  (2024). DeepD3, an open framework for automated quantification of dendritic spines – PLOS Computational Biology
Article / Code / Datasets / Model Zoo / Contribute

This work tackles the automated quantification of a crucial structure in the brain: dendritic spines, critical to many functions of the brain such as learning and memory. We introduce DeepD3, an open-source tool that uses deep learning to automatically and accurately quantify dendritic spines. Trained on a diverse range of data, annotated by multiple experts under various experimental conditions, DeepD3 is not only efficient and reliable but also transparent and adaptable to different research needs.

Darvish, M. & Kist, A. M.  (2024). A Generative Method for a Laryngeal Biosignal – Journal of Voice
Article / GIF

We introduce a new method that uses Variational Autoencoders (VAEs) to create synthetic endoscopic footage that closely resembles real footage, providing a useful resource for researchers and clinicians. We further show that, using a vector describing the glottis opening (glottal opening vector, GIOVe), we can arbitrarily close and open the glottis while maintaining the overall laryngeal anatomy. With that, we show that we can artificially create glottal area waveforms of any given frequency.

Hauser, S.L., Brosig, J., Murthy, B., Attardo, A., Kist, A. M.  (2024). Implicit neural representations in light microscopy – Biomedical Optics Express.

In this work, we explore the use of implicit neural representations in light microscopy using the SIREN networks introduced by Sitzmann et al. Here, we use SIRENs (MLPs with sine activation functions) to predict intermediate planes across multiple micrometers and automatically correct motion artifacts while denoising images. Although SIRENs can affect noise statistics, this issue is mitigated using a denoising neural network, as demonstrated by the recovery of dendritic spines.
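
As a rough illustration of what "MLPs with sine activation functions" means, the sketch below runs one forward pass of a tiny, untrained SIREN-style network that maps an (x, y, z) coordinate to an intensity value. The layer sizes, the frequency scale `OMEGA_0`, the initialization bounds, the zero biases, and all function names are assumptions made for this example; it is not the network used in the paper.

```python
import math
import random

OMEGA_0 = 30.0  # frequency scale of the sine activation (assumed default)


def init_layer(n_in, n_out, rng, first=False):
    """Return (weights, biases) for one fully connected layer."""
    # Assumed bounds: wider for the first layer, scaled by 1/OMEGA_0 afterwards.
    bound = 1.0 / n_in if first else math.sqrt(6.0 / n_in) / OMEGA_0
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out  # zero biases are a simplification for this sketch
    return weights, biases


def linear(x, weights, biases):
    """Plain affine map W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def siren_layer(x, weights, biases):
    """Hidden layer: sin(OMEGA_0 * (W x + b))."""
    return [math.sin(OMEGA_0 * z) for z in linear(x, weights, biases)]


def build_siren(sizes, rng):
    """Create all layers for the given list of layer widths."""
    return [init_layer(n_in, n_out, rng, first=(i == 0))
            for i, (n_in, n_out) in enumerate(zip(sizes, sizes[1:]))]


def siren_forward(coord, layers):
    """Sine-activated hidden layers followed by a linear output layer."""
    h = list(coord)
    for weights, biases in layers[:-1]:
        h = siren_layer(h, weights, biases)
    weights, biases = layers[-1]
    return linear(h, weights, biases)[0]


rng = random.Random(0)
net = build_siren([3, 16, 16, 1], rng)

# Query a z position halfway between two hypothetical acquired planes.
z_between = 0.5 * (1.0 + 2.0)
intensity = siren_forward((0.25, -0.40, z_between), net)
print(intensity)
```

Because the network represents the volume as a continuous function of the coordinates, any z value can be queried once it is trained, which is what makes predicting intermediate planes possible.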

Groh, R., Goes, N. & Kist, A. M.  (2024). SpokeN-100: A Cross-Lingual Benchmarking Dataset for The Classification of Spoken Numbers in Different Languages. TinyML Research Symposium.
Article / Code / Data

This study introduces a novel, entirely artificially generated benchmarking dataset tailored for speech recognition, representing a core challenge in the field of tiny deep learning. SpokeN-100 consists of the numbers from 0 to 99 spoken by 32 different speakers in four different languages, namely English, Mandarin, German and French, resulting in 12,800 audio samples.
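
The dataset size follows directly from the figures above. The short check below only reproduces that arithmetic; the variable names are chosen for this example.

```python
numbers = range(0, 100)  # spoken numbers 0 to 99
speakers = 32
languages = ["English", "Mandarin", "German", "French"]

# One sample per (number, speaker, language) combination.
total_samples = len(numbers) * speakers * len(languages)
samples_per_language = len(numbers) * speakers

print(total_samples, samples_per_language)
```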


Dörrich, M., Hecht, M., Fietkau, R., Hartmann, A., Iro, H., Gostian, A. O., … & Kist, A. M. (2023). Explainable convolutional neural networks for assessing head and neck cancer histopathology. Diagnostic Pathology, 18(1), 121.

In this paper, we investigate Convolutional Neural Networks for classification and semantic segmentation of head and neck cancer histopathology. Using Explainable AI, we show that the networks not only achieve high accuracy in predicting tumor tissue but also rely on features that align with the expert opinion of pathologists.

Wölfl, A. M., Schützenberger, A., Breininger, K., & Kist, A. M. (2023). Towards image-based laryngeal videostroboscopy using deep learning-enabled compressed sensing. Biomedical Signal Processing and Control, 86, 105335.
Article / PDF (avoid the paywall) / German Blog post

Laryngeal videostroboscopy is an essential diagnostic tool. However, it relies on an audio signal to determine the fundamental frequency F0. Here, we show that we are able to use deep neural networks to predict the relative glottal opening and compressed sensing to reconstruct the glottal area waveform (GAW), even when not fulfilling the Shannon-Nyquist criterion. The reconstructed GAW allows us to compute F0 and apply the video stroboscopic effect without any audio signal.
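
To illustrate the last step only (deriving F0 from a glottal area waveform), the sketch below estimates F0 by plain autocorrelation on a fully sampled, synthetic waveform. This is a deliberately simpler stand-in: it does not implement the compressed sensing reconstruction described above, and the waveform shape, sampling rate, search range, and function names are assumptions made for the example.

```python
import math


def synthetic_gaw(f0_hz, sample_rate_hz, n_samples):
    """Toy glottal area waveform: a half-wave rectified sine (open/closed phases)."""
    return [max(0.0, math.sin(2.0 * math.pi * f0_hz * n / sample_rate_hz))
            for n in range(n_samples)]


def estimate_f0(signal, sample_rate_hz, f_min_hz=50.0, f_max_hz=1000.0):
    """Return the frequency whose period maximizes the raw autocorrelation."""
    min_lag = int(sample_rate_hz / f_max_hz)
    max_lag = int(sample_rate_hz / f_min_hz)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself shifted by `lag` samples.
        score = sum(a * b for a, b in zip(signal, signal[lag:]))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate_hz / best_lag


sample_rate_hz = 4000.0
gaw = synthetic_gaw(f0_hz=200.0, sample_rate_hz=sample_rate_hz, n_samples=400)
f0 = estimate_f0(gaw, sample_rate_hz)
print(f0)
```

Once F0 is known, the stroboscopic effect can be produced by sampling the video at a rate slightly offset from F0, so that successive frames show slowly advancing phases of the oscillation.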

Neubig, L., Kist, A.M. (2023). Dataset Pruning using Evolutionary Optimization. Bildverarbeitung für die Medizin 2023.

The right number of data points to solve an image processing task is crucial. To analyze how many data points in a dataset are important and unique enough to support the learning process of a neural network, we used an evolutionary approach in a constrained (fixed dataset size) and an unconstrained (flexible dataset size) mode.

Neubig, L., Kist, A.M. (2023). Evolutionary Normalization Optimization Boosts Semantic Segmentation Network Performance. Medical Image Computing and Computer Assisted Intervention – MICCAI 2023.
Article / Code

Is Batch Normalization always the best normalization method for a medical image segmentation task? In a systematic study, we used evolutionary optimization to analyze the layer-wise influence of normalization methods in a U-Net and compared the results to state-of-the-art models.

Groh, R. & Kist, A. M. (2023). End-to-end evolutionary neural architecture search for microcontroller units. IEEE COINS.

Here, we introduce our end-to-end evolutionary NAS (EvoNAS) for microcontroller units that optimizes both pre-processing and neural network architectures. Each neural network architecture is assessed on multiple objectives (accuracy, memory footprint, inference time, and energy consumption), which are combined into a common performance measure to be maximized. To ensure immediate use of all potential solutions on the microcontroller environment, we create a software-hardware chain in which each neural network is deployed to measure the inference time and power consumption directly.
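
One way to picture how several objectives can be folded into a single score is the weighted scalarization sketched below. The formula, the weights, the budget values, and the names `Candidate`, `Budget`, and `fitness` are hypothetical choices for this example; the performance measure actually used in EvoNAS may differ.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    accuracy: float      # fraction in [0, 1], higher is better
    memory_kb: float     # measured on the device, lower is better
    inference_ms: float  # measured on the device, lower is better
    energy_mj: float     # measured on the device, lower is better


@dataclass
class Budget:
    memory_kb: float
    inference_ms: float
    energy_mj: float


def fitness(c, budget, weights=(1.0, 0.25, 0.25, 0.25)):
    """Hypothetical score: reward accuracy, penalize cost relative to a budget."""
    w_acc, w_mem, w_time, w_energy = weights
    penalty = (w_mem * c.memory_kb / budget.memory_kb
               + w_time * c.inference_ms / budget.inference_ms
               + w_energy * c.energy_mj / budget.energy_mj)
    return w_acc * c.accuracy - penalty


budget = Budget(memory_kb=256.0, inference_ms=100.0, energy_mj=10.0)
population = [
    Candidate("large", accuracy=0.95, memory_kb=512.0, inference_ms=200.0, energy_mj=20.0),
    Candidate("small", accuracy=0.90, memory_kb=128.0, inference_ms=50.0, energy_mj=5.0),
]

# The evolutionary loop would keep the candidates with the highest score.
best = max(population, key=lambda c: fitness(c, budget))
print(best.name, round(fitness(best, budget), 3))
```

In this toy population, the slightly less accurate network wins because it stays well within the resource budget, which is the kind of trade-off a multi-objective search on microcontrollers has to make.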

Dörrich, M., Fan, M., & Kist, A. M. (2023). Impact of Mixed Precision Techniques on Training and Inference Efficiency of Deep Neural Networks. IEEE Access.

Our brain works with fuzzy and low-precision logic. In this study, we investigate how mixed-precision techniques affect the training and inference efficiency of encoder-decoder deep neural networks in a biomedical image segmentation task.

Kruse, E., Döllinger, M., Schützenberger, A., & Kist, A. M. (2023). GlottisNetV2: Temporal Glottal Midline Detection using Deep Convolutional Neural Networks. IEEE Journal of Translational Engineering in Health and Medicine.
Article / Code

Detecting the glottal midline accurately is crucial to assess quantitative parameters related to the symmetrical oscillation of the vocal folds. Here, we show how to use engineered neural networks to allow accurate midline detection simultaneously with glottal area segmentation.


Kist, A. M., Breininger, K., Dörrich, M., Dürr, S., Schützenberger, A., & Semmler, M. (2022). A single latent channel is sufficient for biomedical glottis segmentation. Scientific Reports, 12(1), 14292.
Article / Code / Data

In this paper, we show by mining an encoder-decoder deep neural network that a single latent channel image is sufficient for glottis segmentation. Further, we describe the function of the latent channel and how this affects downstream glottis segmentation.

Groh, R., Lei, Z., Martignetti, L., Li-Jessen, N. Y., & Kist, A. M. (2022). Efficient and Explainable Deep Neural Networks for Airway Symptom Detection in Support of Wearable Health Technology. Advanced Intelligent Systems, 2100284.
Article / Code

Airway symptom detection is crucial for monitoring chronic airway-related diseases. In this work, René shows how data from neck surface accelerometers can be analyzed using a deep neural network in a computationally constrained environment. He uses evolutionary neural architecture search to find an accurate, yet fast and deployable deep neural network for wearables.

Neubig, L., Groh, R., Kunduk, M., Larsen, D., Leonard, R., & Kist, A. M. (2022). Efficient Patient Orientation Detection in Videofluoroscopy Swallowing Studies. In Bildverarbeitung für die Medizin 2022 (pp. 129-134). Springer Vieweg, Wiesbaden.

Swallowing disorders are commonly examined using videofluoroscopy swallowing studies (VFSS). To comprehensively evaluate the swallowing process, a typical VFSS contains different patient orientations. Here, we present a systematic architectural scaling approach and find that an efficient ResNet18 variant is sufficient to classify a full VFSS recording of about 1800 frames in less than 14 s on conventional CPUs.
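
The reported figures imply a lower bound on throughput, which the short calculation below makes explicit. It uses only the approximate numbers quoted above (about 1800 frames, less than 14 s); the variable names are chosen for this example.

```python
frames = 1800       # approximate length of a full VFSS recording
max_seconds = 14.0  # upper bound on the total classification time

# Bounds implied by classifying all frames within the time limit.
min_frames_per_second = frames / max_seconds
max_ms_per_frame = 1000.0 * max_seconds / frames

print(f"at least {min_frames_per_second:.1f} frames/s")
print(f"at most {max_ms_per_frame:.2f} ms per frame")
```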


Kist, A. M., Dürr S, Schützenberger A, and Döllinger M. OpenHSV: An open platform for laryngeal high-speed videoendoscopy. Scientific reports, 11 (2021), 13760.
Article / Code / Docs / Award

Commercially available systems for laryngeal high-speed videoendoscopy have not been further developed lately, are closed-source, and have only very limited analysis capacities. With OpenHSV, we provide a novel, award-winning, open hard- and software platform with DNN-powered online analysis.

Kist, A. M., and Michael Döllinger. Efficient biomedical image segmentation on Edge TPUs. Accepted as a short paper at Medical Imaging with Deep Learning (MIDL), 2021

We highlight at MIDL our work on semantic segmentation using Edge TPUs.

Kist, A. M., Zilker J., Döllinger M., & Semmler M. Feature-based image registration in structured light endoscopy. Accepted as full paper at Medical Imaging with Deep Learning (MIDL), 2021
Article / Code

Structured light endoscopy is a 3D-imaging method. However, the assignment of a projected laser grid to its reference is still tricky. We propose a Deep Learning-based image registration approach that achieves 91% accuracy on an ex vivo dataset.

Kist, A. M., Gómez P., Dubrovskiy D., Schlegel O., Kunduk M., Echternach M., Patel RR., Semmler M., Bohr C., Dürr S., Schützenberger A., & Döllinger M. A Deep Learning Enhanced Novel Software Tool for Laryngeal Dynamics Analysis. J Speech Lang Hear R, 64 (6), 1889-1903.

The analysis of high-speed videoendoscopy data is crucial for voice quantification. In this paper, we describe the Glottis Analysis Tools (GAT). GAT has been actively developed in C# since 2010 and is used by dozens of labs worldwide.


Kist, A. M., and Michael Döllinger. Efficient Biomedical Image Segmentation on EdgeTPUs at Point of Care. IEEE Access 8 (2020): 139356-139366.

Deep neural networks are changing biomedical diagnosis. For image segmentation, they can be very large and slow, especially on CPUs. In our recent study, we show that we can improve the glottis segmentation inference speed more than 79-fold by optimizing a popular biomedical segmentation network (U-Net) and porting it to the inexpensive EdgeTPU Hardware Accelerator.

Kist, A. M., Zilker, J., Gómez, P., Schützenberger, A., & Döllinger, M. (2020). Rethinking glottal midline detection. Scientific Reports, 10(1), 1-15.
Article / Code

Symmetry is important in vocal fold motion. The identification of the glottal midline is crucial to deriving symmetry from the glottal area. Here, we evaluate different approaches to determine the glottal midline and suggest a multi-task architecture, GlottisNet, that predicts both the glottis segmentation and the glottal midline simultaneously.

Gómez, P.*, Kist, A. M.*, Schlegel, P., Berry, D. A., Chhetri, D. K., Dürr, S., … & Döllinger, M. (2020). BAGLS, a multihospital benchmark for automatic glottis segmentation. Scientific Data, 7(1), 1-12.
Article / Code / Dataset

Glottis segmentation is a key component for analyzing the vocal fold vibrations. With BAGLS, we provide the first open, multihospital dataset for training and evaluating deep neural networks.


Kist, A. M., & Portugues, R. (2019). Optomotor swimming in larval zebrafish is driven by global whole-field visual motion and local light-dark transitions. Cell Reports, 29(3), 659-670.

Larval zebrafish swim when they perceive a whole-field moving stimulus. However, the underlying features that drive optomotor swimming remain elusive. Here, we show that larval zebrafish are predominantly driven by local light-dark transitions.
