Engineering of Data Analysis Pipelines

ChRiGiD

The RiGiD project aims to develop a practical methodology for improving the reliability of data analysis pipelines written in R. Starting from the observation that computer science has failed to produce tools usable by working data scientists, the project combines large-scale empirical study of real-world code with the development of lightweight specification techniques — notably gradual typing — automated testing, and reproducibility support. The goal is not formal correctness but a pragmatic reduction in the incidence of errors across all stages of data analysis, from data acquisition and cleaning to statistical modeling and reporting.

An R&D project funded by the Ministry of Education, Youth and Sports of the Czech Republic under the ERC CZ programme (grant LL2325) and conducted at the Faculty of Mathematics and Physics, Charles University.

Research Directions

Practical Data Science

Understanding how data scientists work and where things go wrong. Through user studies, interviews, and large-scale analysis of real-world R code, we catalog error patterns across all stages of the data analysis process — from data acquisition and cleaning to modeling and reporting.

Type Systems

Developing type systems that catch errors in dynamic languages without getting in the way. We explore gradual typing, set-theoretic types, and type inference techniques that accommodate the idioms data scientists actually use — data frames, implicit conversions, and multiple object systems.

Tools and Languages

Building the compilers, runtime systems, and development tools that make it all work. This includes JIT compilation for R, automated test generation, reproducibility infrastructure, and data lineage tracking for auditing results.

Project at a Glance

  • International research team (CZ, CH, PL, US, FR)
  • 9 peer-reviewed publications in 2025
  • Open-source software released under Apache 2.0, MIT, GPL-3.0, and public-domain licenses
  • Selected to host SPLASH 2027

Collaboration & International Engagement

The project maintains strong international collaboration across Europe, North America, and Asia. Nearly all publications are the result of cross-institutional research efforts.

We co-organize the International Summer School on Programming Languages (PLISS), an influential training event for early-career researchers. In 2027, we will host SPLASH, marking the first time this premier conference will take place in Central Europe.

We also actively host visiting researchers and organize joint research meetings between Charles University and Czech Technical University, strengthening the Czech programming languages research ecosystem.

News & Events

May 2026
PLISS 2026 International Summer School: 25–30 May, Bertinoro, Italy. Upcoming edition of the international summer school dedicated to programming languages and software systems, offering in-depth lectures and close interaction with leading researchers in the field.
Autumn 2025
SPLASH Conference Organization: Selected to host the ACM SIGPLAN International Conference on Systems, Programming, Languages, and Applications (SPLASH 2027) in Prague. This will be the first time the conference is held in Central Europe.
May 2025
PLISS 2025 International Summer School: 26–31 May, Bertinoro, Italy. Annual international event providing advanced training in programming languages and implementation. The programme brings together researchers and students for intensive lectures, interactive sessions, and exchange of current research ideas.
September 2023
PLISS 2023 International Summer School: 3–9 September, Bertinoro, Italy. An international summer school focused on programming language implementation, combining expert lectures, research-oriented discussions, and mentoring for students and early-career researchers. Organized in collaboration with King's College London, Czech Technical University, and Northeastern University.

Join

We are constantly looking for smart and enthusiastic people to work with. You can find more details regarding interesting project ideas here.

Openings may be available for Bachelor, Master, and PhD students as well as postdoctoral researchers. If you are interested in joining us, feel free to contact us via email at lucie.lerch@matfyz.cuni.cz.

In your email, briefly describe your past experience, publication track record, and past supervisors, and attach your CV. Please also answer the following questions: What aspect of research appeals to you? What is your career goal?