Bytes & Bites - LIVE: Updates and info from the Datahack

11-17 @ MF building, room A311
Published

April 24, 2026

Theme: Live, info

Today we are hosting our first ever datahack!

There are currently two datasets available and both have some additional contextual information on the project and topic.

Dataset 1 - Videos from the Tweede Kamer (the Dutch House of Representatives)

This dataset is from Antonis Koutsoumpis from Management, Organisation and SBE. Download the data here or check out a shorter dataset here! The following is context from the researcher:

I would like to create a function that extracts jitter, shimmer, and harmonicity from audio files of continuous speech. These measures should be computed on sustained vowels rather than on continuous speech as a whole. To get them from continuous speech, we therefore need to identify the parts of the speech where participants sustained a vowel sound for some time (e.g., an ‘uh’, ‘ee’, ‘aa’, etc. held for at least 80 ms). In summary, the script should perform the following:

1) process an audio file and identify speech;
2) identify parts of the speech where participants sustained a vowel sound (e.g., sustained the vowel ‘aa’ for at least 80 milliseconds);
3) extract jitter, shimmer, and harmonicity from those identified parts of the audio file (e.g., using open source voice analysis software such as openSMILE, Praat, etc.);
4) average those values across the entire audio file per participant;
5) store the output in a csv file.

A similar procedure is described in this paper: Nathan, V., Rahman, M. M., Vatanparvar, K., Nemati, E., Blackstock, E., & Kuang, J. (2019, November). Extraction of voice parameters from continuous running speech for pulmonary disease monitoring. In 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (pp. 859-864). IEEE. link
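As a starting point for participants, here is a minimal Python sketch of steps 2, 4 and 5 from the description above. It is not the researcher's method, only one possible skeleton. It assumes that an earlier step has already produced one vowel label per fixed-length frame (with None for non-vowel frames), and that step 3 is supplied as a function `extract(start_ms, end_ms)` wrapping a tool such as Praat or openSMILE. The 10 ms frame length, the function names and the column names are all hypothetical.

```python
import csv
from statistics import mean

FRAME_MS = 10       # assumed frame length in milliseconds (hypothetical)
MIN_VOWEL_MS = 80   # minimum sustained duration, taken from the description
MEASURES = ("jitter", "shimmer", "hnr")  # hypothetical column names


def sustained_segments(frame_labels, frame_ms=FRAME_MS, min_ms=MIN_VOWEL_MS):
    """Step 2: return (start_ms, end_ms, label) for each run of one vowel
    label that lasts at least min_ms. None marks non-vowel frames."""
    segments = []
    run_start = 0
    for i in range(1, len(frame_labels) + 1):
        # A run ends at the end of the list or when the label changes.
        if i == len(frame_labels) or frame_labels[i] != frame_labels[run_start]:
            label = frame_labels[run_start]
            duration = (i - run_start) * frame_ms
            if label is not None and duration >= min_ms:
                segments.append((run_start * frame_ms, i * frame_ms, label))
            run_start = i
    return segments


def average_measures(per_segment):
    """Step 4: average each measure over all segments of one recording."""
    if not per_segment:
        # No sustained vowel found: leave the values empty.
        return {name: None for name in MEASURES}
    return {name: mean(seg[name] for seg in per_segment) for name in MEASURES}


def summarise(participant, frame_labels, extract):
    """Steps 2-4 for one recording. `extract(start_ms, end_ms)` is the
    caller-supplied step 3 and must return a dict with the MEASURES keys."""
    segments = sustained_segments(frame_labels)
    measures = [extract(start, end) for start, end, _ in segments]
    row = {"participant": participant}
    row.update(average_measures(measures))
    return row


def write_summary(rows, path):
    """Step 5: store one row per participant in a csv file."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["participant", *MEASURES])
        writer.writeheader()
        writer.writerows(rows)
```

Calling `summarise("p01", labels, extract)` for each recording and passing the resulting rows to `write_summary(rows, "voice_features.csv")` would produce the requested file. Keeping the acoustic extraction behind a single `extract` function means the vowel detection and the averaging can be tested without audio, and the voice analysis tool can be swapped later.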

Dataset 2 - Crystal materials data

This dataset is from Senja Barthel et al. from the Maths department and is about materials science. The main topic of the data is crystalline structures. She gave this description of the project:

The idea of this research project was to use machine learning to investigate to what extent the performance of metal-organic frameworks (e.g. gas adsorption, where the standard gases are nitrogen, carbon dioxide, and methane, or heat capacity) is determined by the atomic composition of the materials (made in a Lego-style fashion using organic linkers that are attached to metal centers), and to what extent it is determined by the underlying crystallographic net.
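Before training any model, a quick baseline for this question is to ask how much of a property's variance is already explained by grouping the materials by net versus by composition. The sketch below computes that fraction (eta squared from a one-way variance decomposition) in plain Python. It is a simple statistic swapped in for illustration, not the machine learning approach the researchers describe, and the field names `net`, `metal` and `uptake` as well as the toy values are hypothetical, not columns of the actual dataset.

```python
from collections import defaultdict
from statistics import mean


def explained_fraction(records, group_key, target_key):
    """Fraction of the target's variance explained by the group means
    (eta squared): between-group sum of squares / total sum of squares."""
    values = [r[target_key] for r in records]
    grand_mean = mean(values)
    total = sum((v - grand_mean) ** 2 for v in values)
    if total == 0:
        return 0.0  # the target is constant, nothing to explain

    groups = defaultdict(list)
    for r in records:
        groups[r[group_key]].append(r[target_key])

    between = sum(
        len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values()
    )
    return between / total


# Toy records (made up): here the uptake depends only on the net.
toy = [
    {"net": "pcu", "metal": "Zn", "uptake": 1.0},
    {"net": "pcu", "metal": "Cu", "uptake": 1.0},
    {"net": "dia", "metal": "Zn", "uptake": 3.0},
    {"net": "dia", "metal": "Cu", "uptake": 3.0},
]
by_net = explained_fraction(toy, "net", "uptake")
by_metal = explained_fraction(toy, "metal", "uptake")
```

On the toy records the net explains all of the variance and the metal none of it. On real data both fractions will usually lie in between, and the two groupings can overlap, so this is only a first look before fitting a model.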

Download the data here and check out the fuller documentation here!

Links for additional context:
- Chemistry and Applications of Metal-Organic Frameworks
- Crystallographic nets and their quotient graphs
- wiki for metal-organic frameworks
- book for Introduction to Metal-Organic Frameworks
- github link for the team

Updates

More info will be added as it comes in.

Pizza

Pizza will be served as always! We expect the delivery around 13:00-13:30 :)