Publications
2026
We revisit polymorphic records in the set-theoretic type algebra, allowing the fields and tail to feature Boolean combinations of row variables and types.
This paper presents key design decisions for a statically typed, high-level intermediate representation that directly supports optimizations such as specialization, devirtualization, inlining, and copy elimination.
This paper serves as a celebration of the twenty-fifth anniversary of Kuru Kuru Kururin. Although this video game is presented as a collection of two-dimensional puzzles based on rotation, it naturally invites players to complete its levels as quickly as possible. This has led to a surprisingly rich and challenging playing field for finding foremost temporal walks. In this work, we tackle this problem both in theory and in practice. First, we introduce a model for the game and provide an in-depth complexity analysis. Most notably, we show how each gameplay mechanic independently brings a layer of NP-hardness and/or co-NP-hardness. We also provide a pseudo-polynomial time algorithm for the general problem and identify several cases which can be solved in polynomial time. Along the way, we discuss connections to the more established framework of temporal graphs, both in the point model and the interval model. Then, we propose simple and flexible algorithmic techniques to reduce state space and guide the search, offering trade-offs between precision and computation speed in practice. These techniques were implemented and tested using a full recreation of the game physics and the levels from the original game. We demonstrate the efficiency of our framework in several settings - with or without taking damage, with or without unintended game mechanics - and relate empirical struggles which we encountered in practice to our complexity analysis. Our implementation is open source and fully available online, offering a novel and amusing setting to benchmark shortest path algorithms.
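As a minimal sketch of the foremost-temporal-walk problem mentioned above (not the paper's algorithm): with timestamped edges `(u, v, dep, travel)` in the point model, a single pass over edges in departure order computes the earliest possible arrival at every node, since along any walk departure times are nondecreasing. All names here are illustrative.

```python
def foremost_arrival(edges, source, start=0):
    """Earliest arrival time at every node reachable from `source`.

    edges: iterable of (u, v, dep, travel); a walk may take an edge
    only if it has already reached u by the departure time `dep`.
    """
    best = {source: start}
    # Scanning edges by departure time suffices in the point model:
    # once an edge has departed, no later-processed edge can precede it.
    for u, v, dep, travel in sorted(edges, key=lambda e: e[2]):
        if u in best and best[u] <= dep:
            arrival = dep + travel
            if arrival < best.get(v, float("inf")):
                best[v] = arrival
    return best
```

A real speedrun search must additionally handle the game-specific mechanics (rotation state, damage) that the paper shows each add their own layer of hardness.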
Set-theoretic types provide a rich type algebra that supports unrestricted unions, intersections, and negations, together with a decidable type constraint-solving algorithm known as tallying. These types are particularly well suited for typing dynamic languages, where functions often exhibit both generic and overloaded behavior. However, the complexity of their implementation has hindered their widespread adoption. In this paper, we introduce a modular representation for set-theoretic types and revisit the algorithms for subtyping and tallying. We compare our approach with the historical CDuce implementation and evaluate the performance impact of some optimizations and design choices.
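To illustrate the set-theoretic reading of types in miniature (a toy model, far simpler than the full algebra with arrows and products): if each type is denoted by the set of basic types it covers, unions, intersections, and negations are plain set operations, and subtyping reduces to emptiness of s ∧ ¬t. The basic-type universe below is an assumption for the example.

```python
# Illustrative universe of basic types (assumed for this sketch).
BASIC = frozenset({"int", "str", "bool", "nil"})

def union(s, t): return s | t
def inter(s, t): return s & t
def neg(s):      return BASIC - s

def subtype(s, t):
    """s <= t  iff  s ∧ ¬t denotes no value."""
    return not inter(s, neg(t))
```

The hard part, which the paper addresses, is doing the same over infinite denotations with function and product types, and solving constraints (tallying) efficiently.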
With peak content moderation seemingly behind us, this paper revisits its punitive side. But instead of focusing on who is being (disproportionately) moderated, it focuses on the punishment itself and makes three contributions. First, it develops a novel methodology that combines auto-ethnography for collecting experiences and artifacts with procedural justice for analyzing them. Second, it reworks Foucault's model of the penal system for the algorithmic age, restoring the penal colony as the historically liminal practice between punishment as performance and punishment as discipline, i.e., the stochastic penal colony. Finally, it applies this methodological and conceptual framing to three case studies, one on the gallingly performative moderation by pre-Musk Twitter, one on the exhaustively punitive content moderation for OpenAI's DALL·E 2, and one on the relatively light touch but still rather precious moderation by Pinterest. While substantially different, all three feature the pervasive threat of account suspension, thereby banishing users to the stochastic penal colony.
A lot of programming research shares the same basic motivation: how can we make programming easier? Alas, this problem is difficult to tackle directly. Programming is a tangle of conceptual models, programming languages, user interfaces and more, and we cannot advance all of these at the same time. Moreover, we have no good metric for measuring whether programming is easy. As a result, we usually give up on the original motivation and pursue narrow tractable research for which there is a rigorous methodology. In this paper, we investigate the limits of making programming easy. We use a dialectic method to circumscribe the design space within which easier programming systems may exist. In doing so, we not only bring together ideas on open-source software, self-sustainable systems, and visual programming languages, but also the analysis of limits by Fred Brooks in his classic "No Silver Bullet" essay. We sketch a possible path towards easier programming of the future, but more importantly, we argue for the importance of proto-theories as a method for tackling the original motivating basic research question.
In this paper, we formalize a type system based on set-theoretic types for dynamic languages that support both functional and imperative programming paradigms. We adapt prior work in the typing of overloaded and generic functions to support an impure λ-calculus, focusing on imperative features commonly found in dynamic languages such as JavaScript, Python, and Julia. We introduce a general notion of parametric opaque data types using set-theoretic types, enabling precise modeling of mutable data structures while promoting modularity, clarity, and readability. Finally, we compare our approach to existing work and evaluate our prototype implementation on a range of examples.
Large language models (LLMs) excel at writing code in high-resource languages such as Python and JavaScript, yet stumble on low-resource languages that remain essential to science and engineering. We introduce Agnostics, a language-agnostic post-training pipeline that eliminates per-language engineering. The key idea is to judge code solely by its externally observable behavior, so a single verifier can test solutions written in any language. Applied to five low-resource languages — Lua, Julia, R, OCaml, and Fortran — Agnostics improves Qwen-3 4B to performance rivaling other 16B–70B open-weight models, scales to larger and diverse model families, and for ≤16B parameter models, sets new state-of-the-art pass@1 results on MultiPL-E and a new multi-language version of LiveCodeBench.
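The key idea above, judging code only by externally observable behavior, can be sketched as a verifier that runs a candidate solution in any language as a subprocess and compares its stdout to the expected output. The runner table and test format below are illustrative assumptions, not the Agnostics harness.

```python
import subprocess, sys

RUNNERS = {                       # how to execute a source file, per language (assumed)
    "python": [sys.executable],   # current interpreter, for portability
    "lua": ["lua"],
    "r": ["Rscript"],
}

def passes(lang, src_path, stdin_text, expected_stdout, timeout=10):
    """Language-agnostic check: the verifier observes only stdin/stdout."""
    result = subprocess.run(
        RUNNERS[lang] + [src_path],
        input=stdin_text, capture_output=True, text=True, timeout=timeout,
    )
    return result.returncode == 0 and result.stdout.strip() == expected_stdout.strip()
```

Because nothing here inspects the source, adding a language costs only a runner entry, which is what eliminates per-language engineering.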
2025
While program comprehension tools often use static program analysis techniques to obtain useful information, they usually work only with sufficiently scalable techniques with limited precision. A possible improvement of this approach is to let the developer interactively reduce the scope of the code being analyzed and then apply a more precise analysis technique to the reduced scope. This paper presents a new version of the tool Slicito that allows developers to perform this kind of exploration on C# code in Visual Studio. A common usage of Slicito is to use interprocedural data-flow analysis to identify the parts of the code most relevant for the given task and then apply symbolic execution to reason about the precise behavior of these parts. Inspired by Moldable Development, Slicito provides a set of program analysis and visualization building blocks that can be used to create specialized program comprehension tools directly in Visual Studio.
Copy-and-patch is a technique for building baseline just-in-time compilers from existing interpreters. It has been successfully applied to languages such as Lua and Python. This paper reports on our experience using this technique to implement a compiler for the R programming language. We describe how this new compiler integrates with the GNU R virtual machine, present the key optimizations we implemented, and evaluate the feasibility of this approach for R. Copy-and-patch also allows extensions such as integration of the feedback recording required by multi-tier compilation. Our evaluation on 57 programs demonstrates very fast compilation times (980 bytecode instructions per millisecond), reasonable performance gains (1.15x–1.91x speedup over GNU R), and manageable implementation complexity.
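Copy-and-patch generates machine code by copying pre-compiled binary stencils, one per bytecode instruction, and patching their operand holes. As a loose Python analogy (no real machine code involved, and not the paper's implementation), we can "patch" per-opcode closures with their operands at compile time and then run the chained closures without interpreter dispatch:

```python
def compile_bc(bytecode):
    """Toy analogy of copy-and-patch: pre-built 'stencils' per opcode,
    specialized ('patched') with operands when the bytecode is compiled."""
    def push(v):  return lambda stack: stack.append(v)
    def add():    return lambda stack: stack.append(stack.pop() + stack.pop())
    stencils = {"PUSH": push, "ADD": add}
    patched = [stencils[op](*args) for op, *args in bytecode]
    def run():
        stack = []
        for instr in patched:   # no opcode decoding at run time
            instr(stack)
        return stack[-1]
    return run
```

The real technique pays no closure-call overhead: the patched stencils are contiguous native code, which is why compilation is so fast relative to a full optimizing backend.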
User-centric programming research gave rise to a variety of compelling programming experiences, including collaborative source code editing, programming by demonstration, incremental recomputation, schema change control, end-user debugging and concrete programming. Those experiences advance the state of the art of end-user programming, but they are hard to implement on the basis of established programming languages and systems. We contribute Denicek, a computational substrate that simplifies the implementation of the above programming experiences. Denicek represents a program as a series of edits that construct and transform a document consisting of data and formulas. Denicek provides three operations on edit histories: edit application, merging of histories and conflict resolution. Many programming experiences can be easily implemented by composing these three operations. We present the architecture of Denicek, discuss key design considerations and elaborate the implementation of a variety of programming experiences. To evaluate the proposed substrate, we use Denicek to develop an innovative interactive data science notebook system. The case study shows that the Denicek computational substrate provides a suitable basis for the design of rich, interactive end-user programming systems.
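The edit-history representation can be sketched as follows (an illustrative model, not Denicek's API): a program is a list of edits, the document is obtained by replaying them, and a naive merge keeps the shared prefix and appends each history's unique suffix. Real conflict resolution is considerably more involved.

```python
def apply_history(edits, doc=None):
    """Replay edits (each a function doc -> doc) from an empty document."""
    doc = {} if doc is None else doc
    for edit in edits:
        doc = edit(doc)
    return doc

def merge_histories(h1, h2):
    """Shared prefix, then h1's extra edits, then h2's (conflict-free case)."""
    prefix = 0
    while prefix < min(len(h1), len(h2)) and h1[prefix] is h2[prefix]:
        prefix += 1
    return h1[:prefix] + h1[prefix:] + h2[prefix:]
```

Collaboration, undo, and provenance all become operations over the same history structure, which is the substrate's central design bet.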
Experimental theological exegesis connecting Gregory of Nyssa's interpretation of an Old Testament text with concepts from machine learning.
We present a new fuzzing technique for multithreaded C# programs running on the .NET platform. It is built upon the .NET Profiling library, supported by CLR (Common Language Runtime) on Windows, and uses configurable strategies for the fuzzing process. During execution of the subject program, the fuzzing algorithm controls thread scheduling and preemption through suspending and resuming threads at specific code locations that we call stop points. For the purpose of driving the fuzzing process, we have designed a hybrid systematic-random strategy that gradually finds yet unexplored thread schedules. Results of experiments with programs from the SCT benchmark collection show that our tool is able to find errors triggered by specific thread interleavings within practical time limits.
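The stop-point idea can be illustrated with cooperative "threads" modeled as Python generators (real .NET threads are preemptive; this sketch only shows the search over interleavings): each `yield` is a stop point, and a seeded scheduler decides which thread advances next, making any discovered bad schedule replayable.

```python
import random

def run_schedule(make_threads, seed):
    """Drive generator 'threads' with a seeded random scheduler;
    each `yield` inside a thread is a stop point."""
    rng = random.Random(seed)
    threads = list(make_threads())
    while threads:
        t = rng.choice(threads)
        try:
            next(t)                 # run the chosen thread to its next stop point
        except StopIteration:
            threads.remove(t)

def find_buggy_schedule(make_threads, check, tries=100):
    """Random half of a hybrid strategy: try seeds until `check` fails."""
    for seed in range(tries):
        state = {}
        run_schedule(lambda: make_threads(state), seed)
        if not check(state):
            return seed             # this seed replays the bad interleaving
    return None
```

For example, two workers performing an unsynchronized read, stop point, then write of a shared counter lose an update under some schedules, and the returned seed reproduces exactly that interleaving.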
Ensuring reproducibility is a fundamental challenge in computational research. Reproducing results often requires reconstructing complex software environments involving data files, external tools, system libraries, and language-specific packages. While various tools aim to simplify this process, they often rely on user-provided metadata, overlook system dependencies, or produce unnecessarily large environments. We present r4r, a tool that automates the creation of minimal, user-inspectable, self-contained execution environments through dynamic program analysis techniques. r4r captures all runtime dependencies of a data analysis pipeline and produces a Docker image capable of reproducing the original execution. Although designed with first-class support for the R programming language, r4r also includes a generic fallback mechanism applicable to other languages. We evaluate r4r on a collection of R Markdown notebooks from Kaggle and find that it achieves exact reproducibility for 97.5% of deterministic notebooks.
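The dynamic-analysis principle behind dependency capture can be sketched at a much higher level than r4r operates (r4r traces below the language and targets R; this toy intercepts Python file opens only): run the workload, record every path it reads, and treat that set as the environment to package.

```python
import builtins

def trace_file_opens(workload):
    """Run `workload()` and return the set of paths it opened for reading."""
    opened = set()
    real_open = builtins.open
    def spy(file, mode="r", *args, **kwargs):
        if "r" in mode:
            opened.add(str(file))
        return real_open(file, mode, *args, **kwargs)
    builtins.open = spy             # intercept opens for the duration of the run
    try:
        workload()
    finally:
        builtins.open = real_open   # always restore the real open
    return opened
```

Tracing at run time, rather than parsing manifests, is what lets the tool catch system libraries and data files that user-provided metadata typically omits.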
Unix and Smalltalk are very different in the details, but bear curious similarities in their broad outlines. Prior work has made these comparisons at a high level and sketched a path for retrofitting Smalltalk's advantages onto Unix (without compromising the advantages of the latter). Everybody seems to agree on identifying the Unix file with the Smalltalk object, but this still leaves much unspecified. I argue that we should identify the Unix executable with the Smalltalk method. A Smalltalk VM implementation via the filesystem falls out quite easily from this premise; however, the severe overhead associated with Unix processes casts doubt on its practical realisation. Nevertheless, we can see several ways around this problem. The connection shows promise for realising the benefits of Smalltalk within Unix without sequestering the former in a hermetically sealed image and VM.
Compilers for dynamic languages often rely on intermediate representations with explicit type annotations to facilitate writing program transformations. This paper documents the design of a new typed intermediate representation for a just-in-time compiler for the R programming language called FIŘ. Type annotations, in FIŘ, capture properties such as sharing, the potential for effects, and compiler speculations. In this extended abstract, we focus on the sharing properties that may be used to optimize away some copies of values.
2024
Data lineage is a view over the whole data environment of a business company or government institution, which represents the flow of data values through the system. It helps people to navigate through all the data stores and data transformations, find the origin of a specific data value, or to ensure data consistency after updates. Manta Flow is an automated data lineage platform that supports many different technologies, including dialects of SQL and program code written in general-purpose languages. In this paper, we focus on scanners that analyze programs in Java or C# and generate data flow graphs as output. We describe the process of their development and present the main concepts of the modular symbolic data flow analysis that we designed for this purpose. Then we also discuss technical challenges related to static analysis of real-world enterprise applications that we have faced, explain the key ideas of our current solutions, and share the main lessons learned within this project.
Julia is a modern scientific-computing language that relies on multiple dispatch to implement generic libraries. While the language does not have a static type system, method declarations are decorated with expressive type annotations to determine when they are applicable. To find applicable methods, the implementation uses subtyping at run-time. We show that Julia's subtyping is undecidable, and we propose a restriction on types to recover decidability by stratifying types into method signatures over value types, where the former can freely use bounded existential types but the latter are restricted to use-site variance. A corpus analysis suggests that nearly all Julia programs written in practice already conform to this restriction.
Modern software needs fine-grained compartmentalization, i.e., intra-process isolation. A particularly important reason is the threat of supply-chain attacks, which is aggravated by modern applications depending on hundreds or even thousands of libraries. Object capabilities are a particularly salient approach to compartmentalization, but they require the entire program to assume a lack of ambient authority. Most existing code was written under no such assumption; effectively, existing applications need to undergo a rewrite-the-world migration to reap the advantages of ocap. We propose gradual compartmentalization, an approach which allows gradually migrating an application to object capabilities, component by component in arbitrary order, all the while continuously enjoying security guarantees. The approach relies on runtime authority enforcement and tracking the authority of objects in the type system. We present Gradient, a proof-of-concept gradual compartmentalization extension to Scala which uses Enclosures and Capture Tracking as its key components. We evaluate our proposal by migrating the standard XML library of Scala to Gradient.
Object-oriented programming languages typically allow mutation of objects, but pure methods are common too. There is great interest in recognizing which methods are pure, because it eases analysis of program behavior and allows modifying the program without changing its behavior. The roDOT calculus is a formal calculus extending DOT with reference mutability. In this paper, we explore purity conditions in roDOT and pose a SEF guarantee, by which the type system guarantees that methods of certain types are side-effect free. We use the idea from ReIm to detect pure methods by argument types. Applying this idea to roDOT required just a few changes to the type system, but necessitated re-working a significant part of the soundness proof. In addition, we state a transformation guarantee, which states that in a roDOT program, calls to SEF methods can be safely reordered without changing the outcome of the program. We proved type soundness of the updated roDOT calculus, using multiple layers of typing judgments. We proved the SEF guarantee by applying the Immutability guarantee, and the transformation guarantee by applying the SEF guarantee within a framework for reasoning about safe transformations of roDOT programs. All proofs are mechanized in Coq.
Alarmist and sensationalist statements about the "explosion" of online child sexual exploitation or CSE dominate much of the public discourse about the topic. Based on a new dataset collecting the transparency disclosures for 16 US-based internet platforms and the national clearinghouse collecting legally mandated reports about CSE, this study seeks answers to two research questions: First, what does the data tell us about the growth of online CSE? Second, how reliable and trustworthy is that data? To answer the two questions, this study proceeds in three parts. First, we leverage a critical literature review to synthesize a granular model for CSE reporting. Second, we analyze the growth in CSE reports over the last 25 years and correlate it with the growth of social media user accounts. Third, we use two comparative audits to assess the quality of transparency data. Critical findings include: First, US law increasingly threatens the very population it claims to protect, i.e., children and adolescents. Second, the rapid growth of CSE reports over the last decade is linear and largely driven by an equivalent growth in social media user accounts. Third, the Covid-19 pandemic had no statistically significant impact on report volume. Fourth, while half of surveyed organizations release meaningful and reasonably accurate transparency data, the other half either fail to make disclosures or release data with severe quality issues.
Just-in-time compilers enhance the performance of future invocations of a function by generating code tailored to past behavior. To achieve this, compilers use a data structure, often referred to as a feedback vector, to record information about each function's invocations. However, over time, feedback vectors tend to become less precise, leading to lower-quality code – a phenomenon known as feedback vector pollution. This paper examines feedback vector pollution within the context of a compiler for the R language. We provide data, discuss an approach to reduce pollution in practice, and provide a proof-of-concept implementation of this approach. The preliminary results of the implementation indicate a ∼30% decrease in polluted compilations and a ∼37% decrease in function pollution throughout our corpus.
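What a feedback vector records, and how it becomes polluted, can be sketched as follows (an illustrative model, not Ř's data structure; the megamorphic threshold is an assumption): each call site accumulates the value types it observes, a site with a single observed type supports specialization, and a site that has seen too many is polluted and forces generic code.

```python
class FeedbackVector:
    MEGAMORPHIC = 4                         # illustrative pollution threshold

    def __init__(self):
        self.seen = {}                      # call-site id -> set of type names

    def record(self, site, value):
        self.seen.setdefault(site, set()).add(type(value).__name__)

    def polluted(self, site):
        return len(self.seen.get(site, ())) >= self.MEGAMORPHIC

    def specialize_for(self, site):
        """A monomorphic site yields the single type to compile against."""
        types = self.seen.get(site, set())
        return next(iter(types)) if len(types) == 1 else None
```

Pollution arises because the set only grows: a site that was briefly polymorphic during warmup stays imprecise forever unless the vector is reset or partitioned, which is the kind of remedy the paper explores.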
Large-scale software repositories are a source of insights for software engineering. They offer an unmatched window into the software development process at scale. Their sheer number and size hold the promise of broadly applicable results. At the same time, that very size presents practical challenges for scaling tools and algorithms to millions of projects. A reasonable approach is to limit studies to representative samples of the population of interest. Broadly applicable conclusions can then be obtained by generalizing to the entire population. The contribution of this paper is a standardized experimental design methodology for choosing the inputs of studies working with large-scale repositories. We advocate for a methodology that clearly lays out what the population of interest is, how to sample it, and that fosters reproducibility. Along the way, we discourage researchers from using extrinsic attributes of projects such as stars, which measure some unclear notion of popularity.
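A reproducible sampling step in the spirit of the methodology above might look like this sketch (the stratification attribute and field names are illustrative): stratify the population by an intrinsic attribute such as project size rather than stars, draw uniformly within each stratum, and fix the seed so the sample can be regenerated.

```python
import random

def stratified_sample(projects, key, strata=10, per_stratum=3, seed=0):
    """Seeded stratified sample: sort by an intrinsic attribute,
    split into equal strata, draw uniformly within each."""
    rng = random.Random(seed)
    ordered = sorted(projects, key=key)
    n = len(ordered)
    sample = []
    for i in range(strata):
        stratum = ordered[i * n // strata : (i + 1) * n // strata]
        if stratum:
            sample.extend(rng.sample(stratum, min(per_stratum, len(stratum))))
    return sample
```

Publishing the seed, the stratification key, and the population snapshot is what makes such a study reproducible by others.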
2023
Most code is executed more than once. If not entire programs then libraries remain unchanged from one run to the next. Just-in-time compilers expend considerable effort gathering insights about code they compiled many times, and often end up generating the same binary over and over again. We explore how to reuse compiled code across runs of different programs to reduce warm-up costs of dynamic languages. We propose to use speculative contextual dispatch to select versions of functions from an off-line curated code repository. That repository is a persistent database of previously compiled functions indexed by the context under which they were compiled. The repository is curated to remove redundant code and to optimize dispatch. We assess practicality by extending Ř, a compiler for the R language, and evaluating its performance. Our results suggest that the approach improves warmup times while preserving peak performance.
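The repository-plus-dispatch idea can be sketched by reducing the compilation context to the tuple of argument types (a drastic simplification of the contexts the paper dispatches on; all names are illustrative): look up a previously compiled specialization for the observed context and fall back to the generic version on a miss.

```python
class CodeRepository:
    """Toy stand-in for a persistent store of compiled functions,
    indexed by the context under which they were compiled."""
    def __init__(self):
        self.store = {}                       # (name, context) -> compiled fn

    def context(self, args):
        return tuple(type(a).__name__ for a in args)

    def lookup(self, name, args):
        return self.store.get((name, self.context(args)))

    def publish(self, name, context, compiled):
        self.store[(name, context)] = compiled

repo = CodeRepository()

def dispatch(name, generic, *args):
    """Speculative contextual dispatch: prefer a matching specialization."""
    compiled = repo.lookup(name, args)
    return (compiled or generic)(*args)
```

Because the repository persists across runs, a program can start executing specialized code immediately, which is where the warmup savings come from.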