Software

Licensing

All project software is released as open source. Most tools are distributed under the permissive Apache 2.0 and MIT licenses; some components use GPL-3.0 or a public-domain dedication. See the individual repositories for detailed licensing information.

Linux file tracer
Filip Krikava, Pierre Donat-Bouillud, Petr Adámek, Sebastián Krynski, Jan Vitek

The software is a tracer based on ptracer that generates a log of all the files accessed by a program and then creates a Docker image containing the Ubuntu packages (and possibly R packages) required to run the program again.

Diaphanous: Transparency Disclosures About the Sexual Exploitation of Minors
Robert Grimm

This repository curates quantitative transparency disclosures about the online sexual exploitation of minors, i.e., people under the age of eighteen, in machine-readable form. It also includes a 4,400-line Python library for validating and tidying the data and Python as well as R notebooks with the analysis for the corresponding report "Putting the Count Back Into Accountability: An Analysis of Transparency Data About the Sexual Exploitation of Minors".

Dataset of Czech religious texts
Alexander Kovalenko, Daniil Pastukhov, Jan Vitek

This dataset contains a collection of religious texts in Czech, from any religion, that are publicly available on the Internet. It has been curated from the SlimPajama-627B dataset using a classifier. The purposes of this dataset include exploring sentiments in religious texts and studying the evolution of religious sayings over time.

SSTT: a modular set-theoretic types library
Mickael Laurent

A library implementing operations on set-theoretic types (set connectives, subtyping, constraint solving, etc.). It is used in several research projects, in particular a type-checker for the language R developed at Czech Technical University, and Pysem, a Python type-checker under development at Université Paris-Saclay (France).
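
To illustrate the idea behind set-theoretic subtyping, here is a minimal sketch that is not taken from SSTT and does not reflect its API. It assumes a tiny, hypothetical finite universe of tagged values, so that a type is literally a set of values and "S is a subtype of T" can be decided by checking that S intersected with the complement of T is empty. All names (UNIVERSE, base, is_subtype, and so on) are invented for this example; a real library works on symbolic representations of infinite types.

```python
# Toy model: a type is a frozenset of values drawn from a small finite universe.
# Values are tagged tuples so that ("int", 1) and ("bool", True) stay distinct.
UNIVERSE = frozenset({
    ("int", 0), ("int", 1),
    ("bool", True), ("bool", False),
    ("str", "a"),
})

def base(tag):
    """Hypothetical base type: all values in the universe carrying this tag."""
    return frozenset(v for v in UNIVERSE if v[0] == tag)

def union(s, t):
    return s | t

def inter(s, t):
    return s & t

def neg(t):
    """Complement relative to the toy universe."""
    return UNIVERSE - t

def is_subtype(s, t):
    """S <= T holds exactly when S and (not T) have no value in common."""
    return len(inter(s, neg(t))) == 0

INT, BOOL, STR = base("int"), base("bool"), base("str")

if __name__ == "__main__":
    print(is_subtype(INT, union(INT, BOOL)))
    print(is_subtype(union(INT, STR), INT))
```

In this model, narrowing a union by a negation behaves as expected: inter(union(INT, BOOL), neg(BOOL)) is a subtype of INT.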

MLsem: a type-checker for dynamic languages
Mickael Laurent

A type-checking library for dynamic languages, implementing advanced set-theoretic typing techniques (type inference, type narrowing, ad-hoc and parametric polymorphism, etc.). It aims to demonstrate the effectiveness and usability of set-theoretic types for typing dynamic languages such as JavaScript, Python, or R.

Compile Server for R

A prototype for a new compiler infrastructure for the R programming language. Instead of embedding the compiler in the language runtime, we have developed a compiler-as-a-service, a client-server system that offloads client compilation requests to a server with feedback-driven optimizations for R.

Feedback-driven optimizations for R

A proof-of-concept implementation that keeps multiple feedback vectors, one per call context, and uses the newly available information to drive optimizations. It is built on top of RIR, a just-in-time compiler for the language R. The new branch also includes an in-house recording tool that enables fine-grained debugging of events in the virtual machine, such as function invocations, compilation, and deoptimization.
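
The notion of one feedback vector per call context can be sketched as follows. This is a simplified illustration under stated assumptions, not the RIR implementation: the class ContextualFeedback, its methods, and the use of a string as a call context are all hypothetical, and the "feedback" recorded here is only the Python type name of each observed value.

```python
from collections import Counter, defaultdict

class ContextualFeedback:
    """Hypothetical store of type feedback, kept separately per call context."""

    def __init__(self):
        # Maps a call context to a Counter of observed type names.
        self._vectors = defaultdict(Counter)

    def record(self, context, value):
        """Record the type of a value observed under the given call context."""
        self._vectors[context][type(value).__name__] += 1

    def observed_types(self, context):
        """Type names seen under one context (empty set if never recorded)."""
        return set(self._vectors.get(context, ()))

    def merged_types(self):
        """Type names seen across all contexts, as a single shared vector would see them."""
        merged = set()
        for counts in self._vectors.values():
            merged.update(counts)
        return merged

    def is_monomorphic(self, context):
        """True when exactly one type was observed under this context."""
        return len(self.observed_types(context)) == 1

if __name__ == "__main__":
    fb = ContextualFeedback()
    fb.record("caller_a", 1)
    fb.record("caller_a", 2)
    fb.record("caller_b", "x")
    print(fb.is_monomorphic("caller_a"), sorted(fb.merged_types()))
```

In the usage above, the merged feedback contains two types while each individual context contains only one. That is the kind of extra precision a per-context vector could offer an optimizer.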

Shantay: EU DSA Transparency Database Processing Tool
Robert Grimm

Software tool that processes the EU's DSA Transparency Database, producing comprehensive reports with timelines.

Denicek
Tomáš Petříček

Computational substrate for end-user document-oriented programming implementing collaborative editing and programming by demonstration.

Smalltix
Joel Jakubovic

Smalltalk VM implementation via the filesystem, based on identifying Unix executables with Smalltalk methods.