#
Embedding Imperative Programs in LTL
(Nov 28, 2015)

In this post I’ll describe how we embed certain classes of imperative programs into LTL, the logic we use to model quadcopter behavior. Doing so is key to being able to reason about how concrete quadcopter-controller implementations will behave in the context of the copter’s physical dynamics.

### Introduction - The Problem

Our VeriDrone model of hybrid systems is built on linear temporal logic (LTL). In LTL, we express valid transitions of the system using relations between a pre-state (before the transition occurs) and a post-state (after the transition occurs). This description of transitions is very expressive; for example, it allows us to state interesting transitions such as evolution of the system by differential equations without extending our formalism. However, this power also introduces a mismatch between the expressiveness of the model and the actual software that controls the system: there are many logical expressions in LTL that don’t obviously correspond to valid programs, and we must also account for the difference between the floating-point computations done in the controller software and the physical behavior of the drone, which is modeled in real numbers. In this post, we will discuss how we address this mismatch by demonstrating a general technique for embedding programs (expressed using standard programming-language techniques) in LTL formulas.

### Our Language

In our case, we’re interested in a simple imperative language. It can be thought of as a small subset of C, without function-calls, pointers, or loops: we can manipulate floating-point values, assign them to variables, and branch on their values. The exclusion of loops and recursion is crucial: the floating-point analysis techniques we use here to understand the errors induced by the programs’ floating-point computation cannot be easily applied to loops (doing so naively would produce infinitely large error estimates, which is unhelpful.)

#### Syntax

We define the syntax of the programming language as the following type:

Here, `Seq`

is sequencing operator (“semicolon”); `Skip`

is an
operation with no effect (“no-op”); `Asn`

assigns a value to a variable
(where the variable is referenced by a `string`

name).
`Ite`

implements a particular type of branching (“if-then-else”)
behavior: if the given expression evaluates to a value less than zero, the
first of the two given commands is run; otherwise the second is run.

`cexpr`

is a simple expression language
with floating-point variables and constants, as well as floating-point addition,
subtraction, and multiplication.

The function `cexprD`

(the “denotation function”) runs an expression
and returns the result. That is to say, we give a
*denotational semantics* for our expression language. We omit the definition of
`cexprD`

for brevity, but here its type:

`cexprD`

returns `None`

if evaluating the expression “crashes”.
In the language described here, this can only happen in the case
of attempting to look up an undefined variable.

#### Program State

Shortly, we’ll define the semantics of programs (inhabitants of the `cmd`

type). To do so, however, we must first formalize a model of program state.
For simplicity, we work in a language where all variables have the
same type. Because we care about modeling the real-time computations
performed by VeriDrone’s controller, the variables are all floating-point
numbers (we build on top of the Flocq
formalization of floating-point numbers); specifically, the model of
IEEE-754 double-precision floating-point contained therein.

Programs have finite state (in the sense of having finitely many variables), so we use an association list:

This definition is relatively easy to reason about on its own; however, the
representation is not canonical.
This lack of canonicity means that we need to define our own
notion of what it means for `fstate`

s to be *semantically* equal.
For example the following two `fstate`

s are semantically equal,
but not syntactically equal (Coq’s default definition):
`[("a",1.0);("b",2.0)]`

and `[("b",2.0);("a",1.0)]`

.

The equivalence we care about is:

That is, two states `st`

and `st'`

are equivalent in the necessary
sense if (and only if), for each variable, either

- Looking up the variable in both states fails, or
- Looking up the variable in both states succeeds, and the values read are equal as PL variables.

In order to embed a language, we need to show (among other things) that this notion of equality is respected by that particular programming-language semantics.

The type of states used in the implementation of LTL that VeriDrone runs on
is somewhat different from the imperative language’s `fstate`

. Here it is:

This is not a finite map, since all states must assign values to all
variables. In practice, we only care about a subset of the variables;
the rest are essentially populated with default, dummy values. A result
of this type mismatch is that many functions and theorems need to
carry around a `list`

of `string`

s describing what variables
are considered relevant to the program at hand.

#### Semantics

To embed programs in LTL, we need a semantics for the language
we’re embedding. In this case we follow the standard approach of giving
the language a *big-step operational semantics*
in terms of an inductively-defined relation that describes the transitions
of entire programs in terms of their starting and ending states.
(Note that, because the semantics is total and deterministic.
we could have just given it as a function. However, leaving it as a relation
makes it easier to extend the system to nondeterministic or nontotal programs,
though doing so in our case is future work.)

We encode the semantics of our language as an inductive relation over
`(fstate, cmd, fstate)`

triples.
`evals st p st'`

states that when the program `p`

is run
from the state `st`

it will terminate in the state `st'`

.

Formally, here is the semantics for our command language. This is a relatively
standard semantics, which I’ve already described at a high level above, so I
won’t go into further detail here. One noteworthy thing about this semantics
is that it can “crash” if the evaluation of an expression fails
(`cexprD`

returns `None`

). Crashing is represented by a
lack of transition in the `eval`

relation.

(`fzero`

is shorthand for the floating-point constant 0, whose full name
is annoying to type out in Coq. `<`

and `gt`

are used to abbreviate
floating-point comparison functions.)

### Embedding

Embedding our simple language inside of LTL requires us to turn a
program - expressed as a syntactic object, with semantics provided by
an inductive relation, as above - into a formula describing the
*entire execution of the program* as a single state-transition; that is,
essentially, a predicate over `(s1 : state, s2 : state)`

pairs, which is
satisfied for a given program `p`

if and only if `eval s1 p s2`

is satisfied.

We use the following module to build embeddings:

Embeddings will have (essentially) the following data type:

The first argument is a list of variables to consider when embedding the
program, needed for the technical reason mentioned above. The second
argument is the program itself.
I call the type of programs `ast`

rather than `cmd`

here,
to emphasize the fact that we seek to be parametric in the
particular syntax and semantics that the program is expressed in
(recall that the goal is a general solution, not a solution tailored to a
particular language or semantics). In practice this means that we can’t
ever do an explicit pattern-match on the constructors of `ast`

.

However, there are some wrinkles: first, as mentioned, LTL expresses
its states as functions (`string -> R`

) giving a value to every variable;
our imperative language uses a finite map that only gives values to variables
used in the program; we deal with this by passing around variable-lists
as described above.

Second, and more importantly, our implementation of LTL is a logic over real numbers, in order to make it easier to reason about the physical behaviors of our system. Thus we’ll need to make a formal connection between the floating-point computations of our langauge and the real-valued computations expressed in the logic. The gory details of this connection are beyond the scope of this post, but I’ll touch on this topic again a bit later on.

#### Correctness of Embedding

What does it mean for our definition of embedding to be correct?
Obviously there are many functions that satisfy the `embedding`

type above. In order to express what we mean by a *correct* embedding
function, we first need to define a predicate relating floating-point
program states to real-valued LTL states. Doing so will allow us to
make a formal connection between evaluations of concrete programs
and the evaluation of LTL logical formulas.

In our case, `pl_data`

is an alias for the type of floating-point numbers
(`float`

); `asReal f r`

expresses the fact that, when the
value `f`

is converted to a real number, the conversion succeeds
(in our case, this means that the floating-point value
retrieved was not Inf or NaN, so-called “exceptional values”
in floating-point), and yields the number `r`

.

`istate`

- “implementation state” - and `ls`

- “logical state”,
are respectively `fstate`

and `state`

in our case.
`fm_lookup`

simply looks up a variable in an
`istate`

, returning `None`

if the lookup fails.

Informally, this definition of `models`

imposes
the following requirement on all variables:

- For each variable
`v`

that we care about (i.e., variables in the list`vars`

) the variable should be present in`ist`

, and the result of looking up that variable should be a real number`d`

that equals the result of looking up the same variable`v`

in the real-valued state. That is, the floating-point and real-valued states should “agree” on these variables. - For all other variables, we require that they not be present
in the state
`ist`

(looking them up in the state should fail).

Armed with this definition, we can express the correctness property that we want to hold over our candidate function for embedding programs into LTL. What we’d like to have is, essentially, the following diagram “commute”, in that the result of taking the two paths shown (labeled “Path 1” and “Path 2”) should be the same.

Informally, suppose we have a program `p`

and we know that
`eval fs1 p fs2`

. Then the embedding of `p`

(informally, `embed p`

, omitting the variable-list argument
for brevity) should be able to take a step from `rs1`

to
`rs2`

(according to LTL evaluation semantics, written as
`eval_formula`

in the diagram), where `rs1`

is a real-valued
LTL state modeling the floating-point state `fs1`

according to the `models`

relation; and likewise for
`rs2`

and `fs2`

.

Writing this out formally gives us:

It’s worth noting here that some of the primitives (`LTL.*`

in particular)
have slightly different names in our Coq development; I’ve renamed them
here for clarity. `stream`

is an infinite (co-inductive) list;
`LTL.eval_formula`

uses this type in order to be able to express
temporal properties about potentially infinite runs of systems.

We also want to make sure that crashing behavior is preserved: if the source program crashed (i.e., can’t take a step), its corresponding LTL formula should not be able to take a step. (We say that the LTL formula is not “enabled”, to borrow a term often used in the TLA variant of LTL).

Written out formally, we have:

A correct embedding function must meet *both* of these conditions.

#### Our Embedding Function

We use the following embedding function, which we’ve proven correct (in the sense of proving that it meets both of the correctness conditions spelled out above). For details of the proof, the interested reader can consult the following excerpt from our Coq development (the full development is not yet publicly available).

Here we use the `LTL.Embed`

construct to
embed a step-predicate expressed in Coq’s logic (that is, of type
`TLA.state -> TLA.state -> Prop`

). In this case, we embed a
formula that requires the existence of appropriate “concrete states”
or “implementation states” (which have type `istate`

in general;
or `fstate`

in our case), such that the first one models the
initial TLA state, the second models the second, and the program
steps from the initial to the final concrete state according to the
evaluation semantics for the language.

However, it is worth noting that this definition of embedding is only correct if the evaluation relation giving semantics to the embedded programming language is deterministic (i.e., the language is a deterministic language). The problem becomes more complicated in the face of nondeterminism: consider, for instance, a program that nondeterministically fails. It is less obvious how to construct an embedding function that will preserve this behavior in LTL. The implementation given above would simply throw away failures/”crashes” and only preserve successful runs, but in practice we care greatly about cases where the system crashes (because we must prevent them). Since we do not need to support writing quadcopter controllers in a nondeterministic source language, the definition given here suffices.

### Floating-Point Reasoning

Of course, there’s still one missing piece here: making a formal connection
between the real-valued arithmetic used in our LTL model and the
floating-point arithmetic that controller programs (`cmd`

s) run on.
We use a sound, interval-based overapproximation to safely capture
floating-point error. However, that’s a story for another day.
The interested reader can consult the
implementation.

### Source

The full source code for this project isn’t yet available, but I’ve created two Gists that contain files relevant to this post.

- Embed.v has general definitions related to embedding.
- FloatEmbed.v implements the Embed abstraction for our floating-point language

### Acknowledgements

I’d like to acknowledge Gregory Malecha, with whom I worked on this project, and who helped to refine this blog post, and Sorin Lerner, my advisor.