Certification of a Garbage Collector for a Purely Functional

Preparing to load PDF file. please wait...

0 of 0
100%
Certification of a Garbage Collector for a Purely Functional

Transcript Of Certification of a Garbage Collector for a Purely Functional

Certification of a Garbage Collector for a Purely Functional Language
Anshuman Mohan
Capstone Final Report for BSc (Honours) in Mathematical, Computational, and Statistical Sciences
2017

Acknowledgements
I would like to thank my supervisor, Dr Aquinas Hobor, for guiding me through what proved to be a steep learning curve. He was very generous with his time and his patience, and made this project a lot of fun.
I also owe a debt of gratitude to Shengyi Wang, who helped me on various occasions with the technical aspects of the VST and CompCert environments.
Finally, I’m grateful to my family, friends, and classmates, who tolerated animated and unasked-for explanations of garbage collection, often over otherwise pleasant meals.
1

Summary
In this project, we take steps towards the certification of a garbage collector that services the Gallina programming language. While a working knowledge of the C programming language and the Coq proof assistant would be helpful when reading this report, this is not essential.
In the first section, we introduce the concepts of certification and garbage collection at a high level. In the next section, we go into the specifics of our project, explaining the motivation and context of our work. We also briefly discuss some related research. In section 3, we provide an overview of the most common garbage collection algorithms, and then study the key functions of our actual garbage collector using helpful diagrams. At this point, the reader is expected to be well-versed with the problem at hand.
In section 4, we explain the steps we have taken towards certifying our garbage collector. We study key definitions from our certification of the garbage collector, and justify some of our decisions. We end with a substantial sketch of the next steps in our certification. Section 5 is where we discuss the challenges we faced in our certification, and how we resolved some of them.
2

Author’s Contributions
This project constituted two broad parts. To begin with, the author needed to gain an in-depth understanding of the garbage collector he was certifying, gc.c. Authored by members of the CertiCoq project, this garbage collector is an intricate piece of code, and the author needed to reason about its various moving parts from a formal methods perspective. This discussion, spread out over Sections 3.1, 3.2, and 4.1, is the author’s work.
The next step was the certification of the garbage collector. The discussion in Section 4.2 relies heavily on a Coq file, gc_mathgraph.v. While this file was written as part of this project, most of the code in it was the work of Dr Aquinas Hobor, the author’s supervisor. The author has spent considerable time developing a strong grasp of this code, and the explanations presented in Section 4.2 and 4.3 are his work. The simple example presented in Section 2.4 is also by the author, and was adapted from other simple examples available in the source distribution of VST.
Finally, the various challenges and workarounds presented in Section 5 raise important questions about the garbage collector code itself, and about the CompCert translation pipeline. The issues were discovered by the author and his supervisor, and the explanations presented here are by the author.
3

Contents

1 Introduction

6

1.1 Certification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

1.2 Garbage Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2 Our Working Environment

8

2.1 CertiCoq . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2.2 CompCert . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2.3 Verified Software Toolchain . . . . . . . . . . . . . . . . . . . . . . . . 10

2.4 A Simple Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2.5 Our Project, and Related Works . . . . . . . . . . . . . . . . . . . . . . 18

3 Our Garbage Collector

21

3.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

3.2 Finer Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3.2.1 objects, spaces, heaps, and args . . . . . . . . . . . . . . . . . . 25

3.2.2 forward . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

3.2.3 forward_roots . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

3.2.4 do_scan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

3.2.5 do_generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

3.2.6 garbage_collect . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

4 Steps Towards Certification

35

4.1 A Move Towards Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . 35

4.2 Finer Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

4.2.1 vertices, spaces, and graphs . . . . . . . . . . . . . . . . . . . . 40

4.2.2 the class GC Graph . . . . . . . . . . . . . . . . . . . . . . . . . 41

4

4.2.3 guarded Cheney morphism . . . . . . . . . . . . . . . . . . . . . 42 4.2.4 stepish . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 4.2.5 permissible pointers . . . . . . . . . . . . . . . . . . . . . . . . 46 4.3 Our Overall Goal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

5 Challenges of Certification

49

5.1 Bugs in the Code . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

5.2 C Semantics Undefined in CompCert C Light . . . . . . . . . . . . . . . 52

5.3 Making Accommodations for the Code . . . . . . . . . . . . . . . . . . . 54

6 Conclusion and Further Steps

57

5

1 Introduction
1.1 Certification
The field of formal methods aims to look at computer programs through a mathematical lens. While the casual programmer may be confused at the need to ruin computer science with mathematics, the benefits of introducing formal mathematical techniques are great and many. As in other scientific fields such as Physics and Chemistry, mathematical techniques o er computer programs rigor, reliability, and correctness. Formal techniques include verification, model checking, and type systems.
A specification is a mathematically precise claim about what a computer program does. It can be thought of as a bespoke set of expectations we lay out for the program. A certification, then, is a machine-checked proof that a program actually behaves according to its specification.
As we will show in a later section, certifying a piece of code can be labor-intensive. Because of this, we usually only certify those portions of our code which are either most likely to fail, or are most prone to attack.
1.2 Garbage Collection
During its runtime, a computer program repeatedly requests the underlying operating system for memory. The OS fulfils these requests if it can, and then the program uses the blocks of memory as needed. However, at a given point in time, not all memory blocks are important to the program. In fact, the program frequently divests itself from those sections of memory that it no longer needs, actually making it impossible for itself to access them at a later time. These demonstrably unreachable sections of memory are called garbage.
A garbage collector is a lightweight program which can be thought of as a service
6

o ered by a programming language’s runtime environment. It runs in the background of the main program and, when invoked, “collects” the garbage produced by the program. The specific actions undertaken at collection vary by algorithm, and we will look at these in a later section, but the basic idea is the same: when the OS next receives a request for memory, it can satisfy the request by supplying a “garbage” block and encouraging overwriting or reuse.
Garbage collectors are intricate and complicated pieces of code. This is why computer scientists who design compilers and runtime systems typically take care to certify their garbage collectors.
7

2 Our Working Environment
Our project sits quite snugly between a group of ongoing research projects. In this section, we provide an overview of the environment which we are entering, and then present an illustrative example. We end by discussing some research projects that do related work.
2.1 CertiCoq
The CertiCoq project is building a certified compiler and runtime system for “dependentlytyped, functional languages”. [9] Compilation can be thought of as a complex translation of a program in the source language into an equivalent program in the target language. This translation is often achieved via a series of intermediate translations. For a compiler to be certified means that every transformation or translation it makes during the journey from the given source code to the destination code needs to be proved correct using formal methods. That is to say, the program produced at the end must “behave exactly as prescribed by the semantics of the source program.” [10]
The CertiCoq project is most interested in building a certified compiler for Gallina, which is the specification language at the heart of the Coq Proof Assistant. They have built a compiler which accepts Gallina code, and, via translations through five intermediate languages, produces code in the language C Light. The task of translating the C Light program to the eventual target language, Assembly, is left to another project. We will discuss this next step in the section that follows.
Coming back to CertiCoq, there is a problem: Gallina assumes that it has infinite memory, and has undefined behavior in case it runs out of memory. Of course, we know that there is no such thing as infinite memory. When CertiCoq compiles Gallina code to C Light, the equivalent C Light does not, and should not, also assume that it has infinite memory. Gallina lives in a fool’s paradise, and it falls on the C Light code to support
8

the fiction of infinite memory. This is where garbage collection comes in. As should be apparent from our earlier discussion, a good garbage collector reuses memory, and therefore sustains the illusion of infinite memory for longer.
To this end, the CertiCoq project has written and tested a garbage collector, written in C Light, which can service the C Light programs produced by CertiCoq. It is called gc.c and is available digitally as an appendix. Further, the C Light program that CertiCoq produces after compilation contains explicit calls to this garbage collector at opportune moments. By garbage-collecting the C Light code that it produces, CertiCoq can sustain the illusion of infinite memory, and thus try to sustain the notion that the C Light code it produces on compilation behaves exactly as the Gallina code it accepts. In this project, we have undertaken the hefty task of certifying CertiCoq’s garbage collector.
2.2 CompCert
CompCert [10] is a compiler for the C programming language. Not all the functionality of C99 is supported, but a large subset is. As with CertiCoq, the compiler comes with a formal verification proving that it does not introduce any compilation-level bugs in the process of producing Assembly code. This means that if we supply CompCert with a source program with behavior that is well-defined in C, CompCert will produce a program in Assembly that “behaves exactly as prescribed by the semantics of the source program.” [10]
Importantly, CompCert does not claim to perform formal verification of the C source code itself. When given a program written in C, CompCert first translates the program into a smaller subset of C, called C Light, and then creates an abstract syntax tree (AST) of the C Light code. It is this AST that CompCert then goes on the certify formally. After this, the route from the C Light AST to the Assembly code is long and arduous – it involves
9
ProgramGarbage CollectorProjectCodeAuthor