Why OCaml? - Steve Zdancewic

by Nihar Patil, Prateek Agarwal

It’s somewhat romantic to think that the best scientists, athletes, and performers were born with an innate talent for what they do. Now a top researcher in the field of computer security, Steve Zdancewic’s journey started with him asking his mom to type game programs from Compute!’s Gazette into his computer. He wasn’t sure what the code meant, but he was fascinated enough to start hacking at his computer.

 

He first learned to program with video games in TRS-80s (a dinosaur in the computer world) using the BASIC programming language. In high school, with hours spent behind the grayed computer screen, he built his own version of Q*bert for his computer science club.

My junior high school had computer literacy class, as computers were pretty new, but then they started a computer programming club. We made and played games - so that was fun. Like most other people who start to do a little bit of programming, it's really rewarding when you write some code and then it does something cool.

In a way, he started his career as a professor then, too. It was also one of his first opportunities to collaborate with others in the field of computer science.

 

Other than the peers in his club, Dr. Zdancewic attributes much of his initial interest in engineering to two exceptional mathematics teachers in high school. Together, the husband and wife taught him 5 years of mathematics from 7th through 11th grade, responsible for everything from pre-algebra to statistics. “I had this sort of mathematical bent,” he says, “and I knew I liked computers. I originally planned on doing computer engineering or chemical engineering because I had a great chemistry teacher in high school.” Watching his roommate solve interesting computer science problems brewed his interest. Dr. Zdancewic then took a discrete math course in the following semester, which all but sealed the deal.

 

Throughout his time as an undergraduate, he dipped his hands in the three areas that CS undergrads are drawn to - corporate internships, startups, and research experiences. He spent one summer programming at Merck, a pharmaceutical company headquartered in New Jersey, another at a start-up programming databases for web servers, and a third conducting research. After working in all three spheres, Dr. Zdancewic found research to be the most rewarding and thus went on to complete a Ph.D. in Computer Science at Cornell University. Now, Dr. Zdancewic is an incredible professor and lead researcher at the University of Pennsylvania.

 

Dr. Zdancewic currently conducts research in a complicated, unique part of the Deep Specifications project. The Science of Deep Specifications is a collaborative project with researchers from Penn, Princeton, Yale, and MIT that strives to formalize mathematical models to verify the correctness of key pieces of software.

“Today, software is becoming more and more outward facing, such as in critical situations like autonomous vehicles, flight control software, and medical devices. In each of these situations, you really want to be sure that certain code is absolutely correct.”

Dr. Zdancewic’s portion of the project works closely with LLVM (Low-Level Virtual Machine), which is an open source compiler for C programs started a while back at UIUC. It helps build the middle layers of a complete compiler system. Many years ago, Apple and other tech giants adopted LLVM into key pieces of their software, which accentuated the need for LLVM to function perfectly as intended. Dr. Zdancewic’s research group is building a mathematical model behind LLVM to make sure that it essentially does what it’s supposed to do - to take high-level C code, transform into LLVM, optimize, and then feed translated machine code to the computer. His research has the potential to use LLVM as a prevention mechanism for buffer-overflow and stack-smash attacks. We won’t get too deep into the specifics of LLVM, C, and compilers (though, if you’re interested, you can take CIS 341 - Dr. Z’s course on compilers).

 

The mathematical models that run LLVM are a core part of the Science of Deep Specifications. It’s a tough question of how to build mathematical models around programs and systems that have millions of moving parts. Dr. Zdancewic uses something that discrete math students would go crazy for - Coq’s theorem prover.

“When you build a mathematical model in the theorem prover, the structure of the proofs you develop in discrete math classes more or less corresponds to programs you can write in the style you would in programming classes.”

This theorem prover allows Dr. Zdancewic and his team to mathematically prove that software will perform as it’s intended to. He mentioned that “The proofs that we’re talking about are literally megabytes and megabytes long. And, they’re machine checked - so we know they’re correct.”

 

This doesn’t imply that we can prove that code is unbreakable - the theorem prover functions under very specific and well-defined assumptions.

“We make assumptions about the functionality of our hardware. For example, our proofs wouldn't help prevent an attack that arises because the processor heats up and cause a few bits of memory to flip. Our proofs give very high confidence and tell you which assumptions you are making very explicitly. Nevertheless, there are always going to be things that lie outside of that realm of assumptions.”

Dr. Zdancewic closed this topic with a high-level vision for the Deep Specifications project.

“The whole point of this deep specifications project is that we are proving properties of actual running software. We’d like to combine findings from groups at MIT, Yale, Princeton, and Penn to make some very secure and reliable verification tools hosted some sort of web server or key piece of infrastructure.”

Dr. Zdancewic translates his deep knowledge of compilers into his class, CIS 341, which revolves around how abstract aspects of programming languages like functions or variables are translated into executable assembly language. As Dr. Zdancewic likes to put it, “It's a great class because, in some ways, it removes what feels like magic. 341 peels back the layers and allow students to understand how code is compiled and executed.” In addition to teaching graduate 500-level courses, Dr. Zdancewic is very well known for teaching CIS 120 - Programming Languages and Technology. The first half of the course teaches computer science topics through OCaml, a fairly underground functional programming language, while the second half is taught through Java, one of the most popular object-oriented programming languages. So, why OCaml?

“Though OCaml is far from perfect, it has a lot of the properties that people who think about language appreciate. There's much more of a logical and mathematical structure to OCaml. For example, pattern matching corresponds case analysis in proofs, recursion in functional programming corresponds to induction, and generics really correspond to quantifiers you see in discrete math. For pedagogical purposes, I think OCaml has a lot of good benefits. Also, nobody knows it so we can start from a level playing field. It’s good to twist people’s minds and show them that they can do a lot of things in functional programming without having to think about comparative programming all the time.”

On top of its apparent value, Mr. Z loves teaching in general. To close out the interview, we asked what we had been wondering the most - “Why are you a professor?”


"Believe it or not, you really can tell when the lightbulb goes off. If you’re giving a lecture about recursion and somebody has that ‘oh, I finally get it’ moment, you can actually tell. It’s just really rewarding.”


Nihar Patil

Nihar is a sophomore majoring in Networks and Social Systems Engineering, and is the founder of PTR. You’ll either find him dancing with his friends in a giant carrot costume or playing guitar and writing code.

Prateek Agarwal

Prateek is a sophomore at Penn studying Computer Science, and is a writer and editor for PTR. He loves to play soccer, drums, and write. He also enjoys eating Nutella, going crazy in the woods, and the color red.