I’ve been diving back into static analysis and programming language theory after a four-year hiatus. Getting re-acquainted with papers in the field has been humbling; it can feel like solving a puzzle with infinite pieces. Writing and summarizing helps me make sense of it all, so I wanted to share my thoughts on a recent paper I just read: “Abstracting Denotational Interpreters” by Graf, Peyton Jones, and Keidel. Here’s a summary of what I’ve been working through.

The problem: is one framework enough?

If you’ve built or used programming tools, you know there are two ways to understand program behavior:

  1. Dynamic semantics: how code actually runs step-by-step, like watching it play out in a debugger. This is what happens when you run the program.
  2. Static analysis: what we can infer about code without running it, like catching bugs or flagging unused variables. This is what a program could do if it ran.

These two approaches live in different worlds and are usually treated separately. On top of that, proving soundness (i.e. that a static analysis matches how programs really behave) can be a nightmare, simply because of how complex an undertaking it is.

The answer: a unified framework

The paper proposes a clever idea: a denotational interpreter (i.e. a mathematical model of what a program means) that brings these two worlds together. Instead of writing separate tools for execution and analysis, they designed a single, flexible framework that (1) simulates how code runs (dynamic semantics) and (2) generates static analyses (e.g., type-checking, or usage analysis that determines things like how often variables are used). This works because the framework builds traces: records of the steps a program takes. These traces are detailed enough to describe both execution and analysis, all on a single foundation.
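
To make that concrete, here is a minimal sketch in Haskell of what such an interpreter can look like. To be clear, this is not the paper’s code: the expression type, the Event constructors, and the Domain class below are my own cut-down stand-ins for the richer interfaces in the paper, and the let here is non-recursive to keep things short. The point is just that eval is written once, against an abstract domain d, and never commits to whether d represents a concrete execution or an abstract analysis.

    import           Data.Map (Map)
    import qualified Data.Map as Map

    -- A tiny object language: lambda calculus with non-recursive let.
    -- (The paper's language is richer, and its let is recursive.)
    data Expr
      = Var String
      | Lam String Expr
      | App Expr Expr
      | Let String Expr Expr

    -- Events that make up a trace.  Constructor names are mine.
    data Event = Look String | App1 | Let1
      deriving (Eq, Show)

    -- The semantic domain is kept abstract: anything that can denote stuck
    -- terms, functions, applications, single steps, and let bindings.
    class Domain d where
      stuck :: d                    -- e.g. an unbound variable
      fun   :: (d -> d) -> d        -- denote a lambda
      apply :: d -> d -> d          -- denote an application
      step  :: Event -> d -> d      -- record one step of the trace
      bind  :: d -> (d -> d) -> d   -- how a let treats its right-hand side

    -- One interpreter.  The choice of `d` decides whether this runs the
    -- program or analyses it.
    eval :: Domain d => Map String d -> Expr -> d
    eval env (Var x)       = step (Look x) (Map.findWithDefault stuck x env)
    eval env (Lam x body)  = fun (\d -> eval (Map.insert x d env) body)
    eval env (App f a)     = step App1 (apply (eval env f) (eval env a))
    eval env (Let x e1 e2) = bind (step Let1 (eval env e1))
                                  (\d -> eval (Map.insert x d env) e2)

Everything interesting lives in the choice of d, and the two instances sketched later in this post pick it very differently.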

Why this is interesting

  1. Flexibility: By tweaking the underlying “semantic domain” (how the framework interprets code), you can parameterize the interpreter, allowing you to switch between evaluation strategies like (1) eager evaluation (call-by-value), where arguments are computed immediately, and (2) lazy evaluation (call-by-need), where computations are deferred until they are actually needed. You can even reuse the same framework for entirely different static analyses, like tracking unused variables or type inference. (A concrete trace-building instance of this idea appears in the first sketch after this list.)

  2. Simpler proofs, especially for lazy evaluation: One of the hardest parts of building static analyses is proving they’re sound (i.e., their results never contradict what the program can actually do). This framework simplifies those proofs because both the dynamic semantics and the static analyses share the same underlying structure, so you only need to prove small, reusable properties about the analysis domain, one interpreter operation at a time. This is especially valuable for lazy (call-by-need) evaluation, where nothing is computed until it is actually demanded (and the result is then memoized), which makes the semantics notoriously hard to model and reason about. (The second sketch after this list shows a toy abstract domain implementing the same interface as the concrete one.)
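
Here is the concrete half promised in point 1, continuing the earlier sketch (still my own simplification, not the paper’s code): a domain whose denotations are an event trace ending in a value, so instantiating eval at this type effectively runs the program. I’ve written bind in call-by-name style for brevity; the paper’s concrete domains also cover call-by-value and call-by-need (the latter by threading a heap through the trace), and that choice of domain is exactly the knob you turn to change evaluation strategy.

    -- A concrete domain: a denotation is a list of trace events ending in
    -- a value.
    data Value = VFun (Conc -> Conc) | VStuck

    newtype Conc = Conc ([Event], Value)

    instance Domain Conc where
      stuck = Conc ([], VStuck)
      fun f = Conc ([], VFun f)
      step e (Conc (es, v)) = Conc (e : es, v)
      apply (Conc (es, VFun f)) a = let Conc (es', v) = f a
                                    in  Conc (es ++ es', v)
      apply (Conc (es, _))      _ = Conc (es, VStuck)
      bind rhs body = body rhs    -- call-by-name: hand the rhs to the body as-is

    -- The trace of (\x. x) (\y. y): one application step, one lookup of x.
    -- traceOf (App (Lam "x" (Var "x")) (Lam "y" (Var "y")))
    --   == [App1, Look "x"]
    traceOf :: Expr -> [Event]
    traceOf e = let Conc (es, _) = eval Map.empty e in es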
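
And here is the abstract half, for point 2: a toy usage analysis that forgets the order of events and only remembers how often each variable might be looked up. It is deliberately crude and much weaker than the paper’s usage analysis (which summarizes functions and handles recursion), but it implements the same Domain class as the concrete instance above. That shared structure is what makes the soundness story modular: you relate the two domains one class method at a time instead of re-proving an entire semantics.

    -- An abstract domain, same class: a denotation is no longer a trace
    -- but a finite map saying how often each variable might be looked up:
    -- U0 (never), U1 (at most once), UMany (no bound).
    data U = U0 | U1 | UMany deriving (Eq, Ord, Show)

    addU :: U -> U -> U
    addU U0 u  = u
    addU u  U0 = u
    addU _  _  = UMany

    newtype Usage = Usage (Map String U) deriving Show

    -- Combine the usage of two sub-computations.
    both :: Usage -> Usage -> Usage
    both (Usage a) (Usage b) = Usage (Map.unionWith addU a b)

    -- Pessimistically assume something may happen any number of times.
    manify :: Usage -> Usage
    manify (Usage m) = Usage (Map.map (const UMany) m)

    instance Domain Usage where
      stuck = Usage Map.empty
      fun f = f (Usage Map.empty)     -- analyse the body once, on an empty placeholder
      apply f a = both f (manify a)   -- the argument may be demanded many times
      step (Look x) (Usage m) = Usage (Map.insertWith addU x U1 m)
      step _        u         = u     -- other events don't affect usage counts
      bind rhs body = body rhs        -- every lookup of the binding re-adds its usage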

Why this matters

This approach has interesting implications for anyone working on compilers, interpreters, or static analysis tools:

  • Efficiency: Summaries of traces make it easier to analyze large programs modularly, without having to re-analyze everything.
  • Scalability: By varying the “semantic domain,” you can get many different analyses or evaluation strategies without redesigning the interpreter. The same framework can support dynamic execution and multiple kinds of analysis, saving time and effort.
  • Better tools: Imagine compilers that optimize your code while also catching subtle bugs or suggesting refactors, all with minimal overhead. The traces generated by this approach can tell you practical things about program performance, like how often variables are evaluated (the short example after this list runs the earlier sketch in both modes).
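
To close the loop on that last bullet, here is the earlier sketch run in both modes on the same tiny program, let i = \y. y in i i (my own example; the expected results in the comments are worked out by hand):

    -- The same tiny program through both instances.
    usageOf :: Expr -> Map String U
    usageOf e = let Usage m = eval Map.empty e in m

    -- let i = \y. y in i i
    example :: Expr
    example = Let "i" (Lam "y" (Var "y")) (App (Var "i") (Var "i"))

    -- Concrete run: the trace shows `i` looked up twice and its right-hand
    -- side replayed each time (call-by-name).
    --   traceOf example == [App1, Look "i", Let1, Look "y", Look "i", Let1]
    -- Abstract run: the analysis reports `i` as possibly used many times
    -- (this toy version also counts the lambda-bound "y").
    --   usageOf example == fromList [("i",UMany),("y",UMany)]

The concrete trace shows i being looked up twice and its binding replayed on each lookup, and the abstract run reports the same shape in one line: i may be used many times.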