1 00:00:09,519 --> 00:00:14,479 *applause* 2 00:00:14,479 --> 00:00:16,930 Peter Sewell: Thank you for that introduction and to the organisers for 3 00:00:16,930 --> 00:00:23,459 this opportunity to speak to you guys. So my name is Peter Sewell. I'm an academic 4 00:00:23,459 --> 00:00:30,609 computer scientist and as such, I bear a heavy burden of guilt for what we have to 5 00:00:30,609 --> 00:00:38,040 work with. Shared guilt, to be fair, but guilt all the same. In many, many years of 6 00:00:38,040 --> 00:00:44,670 higher education, I know two things about computers. Maybe you've figured them out 7 00:00:44,670 --> 00:00:55,000 slightly faster. The first thing, there's an awful lot of them about. Shocking. And 8 00:00:55,000 --> 00:01:03,300 the second is that they go wrong. They go wrong a lot. Really a lot. A lot. A lot! 9 00:01:03,300 --> 00:01:09,009 And they go wrong in many interestingly different ways. So sometimes they go wrong 10 00:01:09,009 --> 00:01:17,850 in spectacular ways, right? With fireworks. And here, we see... 11 00:01:17,850 --> 00:01:24,030 *laughter**applause* 12 00:01:24,030 --> 00:01:31,260 Here we see half a billion dollars worth of sparkle. You don't get that every day. 13 00:01:31,260 --> 00:01:34,970 And so what was going on? So as I understand it, what was going on was that 14 00:01:34,970 --> 00:01:40,600 the - this was the first flight of the Ariane 5 launcher and it reused some of 15 00:01:40,600 --> 00:01:45,851 the guidance software from Ariane 4. That's fine. Good, tested code. Should 16 00:01:45,851 --> 00:01:53,030 reuse it. But the launch profile for Ariane 5 went sideways a bit more - on 17 00:01:53,030 --> 00:01:58,830 purpose - than the Ariane 5 launch. And as a result, one of the sideways values in 18 00:01:58,830 --> 00:02:04,740 the guidance control software overflowed. Bad. And hence, the guidance software 19 00:02:04,740 --> 00:02:08,449 decided that something bad was going wrong and in order to not land rockets on 20 00:02:08,449 --> 00:02:15,841 people's heads, it kindly blew itself up. Okay. So that's a spectacular. Sometimes 21 00:02:15,841 --> 00:02:21,200 they go wrong in insidious ways. And you people probably know much more about that 22 00:02:21,200 --> 00:02:29,370 than me. Very insidious ways. Exploited by bad people to our detriment. And they go 23 00:02:29,370 --> 00:02:36,990 wrong in insidious ways, exploited by our kindly governments for our protection. 24 00:02:36,990 --> 00:02:44,959 Right? And these - there are many, many, many different causes of these things but 25 00:02:44,959 --> 00:02:49,420 many of these causes are completely trivial. You know, you have one line of 26 00:02:49,420 --> 00:02:56,150 code in the wrong place - something like that, right? So that's very, very bad and 27 00:02:56,150 --> 00:03:01,040 you're all conscious of how bad that is. But from my point of view, this is as 28 00:03:01,040 --> 00:03:06,959 nothing with the real disaster that we have been facing and continue to face. 29 00:03:06,959 --> 00:03:14,950 Right? The real disaster is the enormous waste of human life and energy and effort 30 00:03:14,950 --> 00:03:21,360 and emotion dealing with these crap machines that we've built. Right? And it's 31 00:03:21,360 --> 00:03:25,489 hard to quantify that. I did a little experiment a couple of months ago. I 32 00:03:25,489 --> 00:03:34,030 Googled up "Android problem" and "Windows problem" and got about 300 million hits 33 00:03:34,030 --> 00:03:43,840 for each, right? Somehow indicative of the amount of wrongness. So this should you 34 00:03:43,840 --> 00:03:49,790 seem kind of familiar to you. If you think back - think back to when you were young. 35 00:03:49,790 --> 00:03:55,700 Maybe back in the 19th century when you were young, and what were you doing then 36 00:03:55,700 --> 00:04:00,940 as a community of hackers and makers and builders? Well, this was the full-on 37 00:04:00,940 --> 00:04:07,130 industrial revolution. You were building stuff. You were building beautiful bridges 38 00:04:07,130 --> 00:04:14,739 of cast-iron and stone with numerous arches and pillars. And every so often 39 00:04:14,739 --> 00:04:22,300 they would fall down. I believe someone miscalculated the wind loading. And we 40 00:04:22,300 --> 00:04:30,630 would build magical steam engines to whisk us away. And every so often, they would 41 00:04:30,630 --> 00:04:35,770 blow up. And that kind of thing, as you may have noticed in the last hundred years 42 00:04:35,770 --> 00:04:41,810 - sorry - that kind of thing doesn't happen very often any more, right? We have 43 00:04:41,810 --> 00:04:46,990 pretty good civil and mechanical engineering. And what does it mean to have 44 00:04:46,990 --> 00:04:52,620 good civil and mechanical engineering? It means that we know enough thermodynamics 45 00:04:52,620 --> 00:04:56,750 and materials science and quality control management and all that kind of thing to 46 00:04:56,750 --> 00:05:01,830 model our designs before we build them and predict how they're going to behave pretty 47 00:05:01,830 --> 00:05:10,750 well and with pretty high confidence, right? I see an optimisation in this talk. 48 00:05:10,750 --> 00:05:15,340 For computer science, for computing systems we can predict with 100 per cent 49 00:05:15,340 --> 00:05:20,450 certainty that they will not work. Ah, time for lunch! 50 00:05:20,450 --> 00:05:26,509 *applause* 51 00:05:26,509 --> 00:05:33,330 But it's not a very satisfying answer, really. So why is it so hard? Let me 52 00:05:33,330 --> 00:05:38,500 discuss a couple of the reasons why computing is tricky. In some sense, 53 00:05:38,500 --> 00:05:45,430 trickier than steam engines. The first reason is that there is just too much 54 00:05:45,430 --> 00:05:50,360 code. Right? So this is one of the more scary graphs I've found on the internet 55 00:05:50,360 --> 00:05:55,470 recently. This is a measure of code in hundreds of millions of lines. Each of 56 00:05:55,470 --> 00:06:00,250 these blocks I think is 10 million lines of code. And in the little letters - you 57 00:06:00,250 --> 00:06:04,860 might not be able to see we have Android and the Large Hadron Collider and Windows 58 00:06:04,860 --> 00:06:12,820 Vista and Facebook, US Army Future Combat System, Mac OS and software in a high-end 59 00:06:12,820 --> 00:06:15,010 car. Right? 60 00:06:15,010 --> 00:06:23,230 *laughter**applause* 61 00:06:23,230 --> 00:06:25,320 I'm flying home. 62 00:06:25,320 --> 00:06:27,810 *laughter* 63 00:06:27,810 --> 00:06:33,520 So we that's a bit of a problem, really. We're never going to get that right. Then 64 00:06:33,520 --> 00:06:37,410 there's too much legacy complexity there, right? Really too much. So as 65 00:06:37,410 --> 00:06:41,840 right-minded, pure-minded people you might think that software and hardware should be 66 00:06:41,840 --> 00:06:46,490 architected beautifully with regular structure a bit like a Mies van der Rohe 67 00:06:46,490 --> 00:06:53,660 skyscraper or something. Clean. Okay. You know that's not realistic, right? You 68 00:06:53,660 --> 00:06:59,470 might expect a bit of baroqueness. You might expect maybe something more like 69 00:06:59,470 --> 00:07:02,680 this. You know, it's a bit twiddly but it still hangs together. It's still got 70 00:07:02,680 --> 00:07:08,160 structure. It still works. But then you stop and you think and you realise that 71 00:07:08,160 --> 00:07:15,770 software and hardware is built by very smart people and usually in big groups and 72 00:07:15,770 --> 00:07:20,300 subject to a whole assortment of rather intense commercial pressure and related 73 00:07:20,300 --> 00:07:25,210 open-source pressure and using the best components and tools that they know and 74 00:07:25,210 --> 00:07:27,169 they like. 75 00:07:27,169 --> 00:07:34,520 *laughter**applause* 76 00:07:34,520 --> 00:07:38,990 Someone from the room: (Indistinct) You forgot management! 77 00:07:38,990 --> 00:07:42,480 And the best management that human beings can arrange. 78 00:07:42,480 --> 00:07:49,680 *laughter**applause* 79 00:07:49,680 --> 00:07:54,910 So we end up with something spectacularly ingenious. 80 00:07:54,910 --> 00:08:05,490 *laughter**applause* 81 00:08:05,490 --> 00:08:09,730 You can see here, if you look closely, you can see all the parts. 82 00:08:09,730 --> 00:08:11,000 *laughter* 83 00:08:11,000 --> 00:08:15,770 This was just a found picture from the internet. But you can see there's a bunch 84 00:08:15,770 --> 00:08:19,060 of, you know, transparent sort of silicon-like stuff down here. Let's say 85 00:08:19,060 --> 00:08:25,079 that's our hardware. And over here, I see a C compiler with all of its phases. 86 00:08:25,079 --> 00:08:28,599 *laughter* 87 00:08:28,599 --> 00:08:36,930 And these bottles, that's the file system, I reckon. And there's a TCP stack in here 88 00:08:36,930 --> 00:08:43,690 somewhere. And there's a piece that belongs to the NSA, obviously. And up 89 00:08:43,690 --> 00:08:50,259 here, I think up here - yeah, there's a JavaScript engine and a TLS stack and a 90 00:08:50,259 --> 00:08:56,050 web browser. And then perched at the top is the software that most people use to 91 00:08:56,050 --> 00:08:57,910 talk to their banks. 92 00:08:57,910 --> 00:09:01,489 *laughter* 93 00:09:01,489 --> 00:09:09,489 It's just - it's just insane, right? And then we have to look at how we build these 94 00:09:09,489 --> 00:09:14,089 pieces, right? So we build them fundamentally by a process of trial and 95 00:09:14,089 --> 00:09:20,649 error, right? Just occasionally we write some specifications in text. And then we 96 00:09:20,649 --> 00:09:26,029 write some code and maybe we write a few ad hoc texts and then we test and fix 97 00:09:26,029 --> 00:09:30,610 until the thing is marketable and then we test and fix and extend until the thing is 98 00:09:30,610 --> 00:09:36,069 no longer marketable and then we keep on using it until we can't anymore. Right? So 99 00:09:36,069 --> 00:09:40,259 the fundamental process that we're using to develop these systems is trial and 100 00:09:40,259 --> 00:09:47,930 error by exploring particular execution paths of the code on particular inputs. 101 00:09:47,930 --> 00:09:54,459 But - you know this, but I want to highlight - there are too many. The number 102 00:09:54,459 --> 00:09:59,649 of execution paths scales typically at least exponentially with the size of the 103 00:09:59,649 --> 00:10:04,490 code and the number of states scales exponentially with the size of the data 104 00:10:04,490 --> 00:10:08,790 that you're dealing with and the number of possible inputs is also shockingly large. 105 00:10:08,790 --> 00:10:14,920 There is no way that we can do this and we should perhaps stop deluding ourselves 106 00:10:14,920 --> 00:10:21,129 that there is. And then the most fundamental reason that computers are hard 107 00:10:21,129 --> 00:10:27,899 is that they are too discrete, right? If you take a bridge or, I don't know, a 108 00:10:27,899 --> 00:10:36,519 chair or something - where's something slightly bendable? You take a chair and 109 00:10:36,519 --> 00:10:44,199 you exert a bit more force on it than it's used to - I'm not very strong - and it 110 00:10:44,199 --> 00:10:51,299 bends a bit. It deforms continuously. If you take a bridge and you tighten up one 111 00:10:51,299 --> 00:10:55,239 of the bolts a bit too much or you underspecify one of the girders by 10 per 112 00:10:55,239 --> 00:11:01,019 cent, occasionally you'll reach a point of catastrophic failure but usually it will 113 00:11:01,019 --> 00:11:04,089 just be a little bit weaker or a little bit stronger or a little bit wibbly. 114 00:11:04,089 --> 00:11:08,960 Right? But computer systems, you change a bit - sometimes it doesn't make a 115 00:11:08,960 --> 00:11:13,179 difference but disturbingly often you change a line of code - as in system of 116 00:11:13,179 --> 00:11:19,600 those bugs that we were talking about - the whole thing becomes broken, right? So 117 00:11:19,600 --> 00:11:26,929 it's quite different from that traditional engineering. It's really different. Okay. 118 00:11:26,929 --> 00:11:34,850 So, "A New Dawn", this thing is titled. What can we do? What might we do? So I'm 119 00:11:34,850 --> 00:11:40,879 going to talk about several different possibilities, all right? One thing, we 120 00:11:40,879 --> 00:11:46,490 can do more of the stuff which is traditionally called software engineering. 121 00:11:46,490 --> 00:11:50,910 It's not really engineering in my book, but some of it's good. I mean, it's 122 00:11:50,910 --> 00:11:55,119 useful, you know? We can do more testing and have more assertions in our code and 123 00:11:55,119 --> 00:11:58,920 better management maybe and all that kind of stuff. We can do all that - we should 124 00:11:58,920 --> 00:12:04,339 do all that, but it's not going to make a very big change, all right? It's not 125 00:12:04,339 --> 00:12:13,989 addressing any of those root causes of the problem. What else could we do? So as an 126 00:12:13,989 --> 00:12:18,570 academic computer scientist, I've devoted several years of my life to working in the 127 00:12:18,570 --> 00:12:24,509 area of programming languages and I look at the programming languages used out 128 00:12:24,509 --> 00:12:36,919 there in the world today. Oh, it's just disgusting! It's shocking. 129 00:12:36,919 --> 00:12:43,399 *applause* 130 00:12:43,399 --> 00:12:50,420 The reasons that languages become popular and built into our shared infrastructure, 131 00:12:50,420 --> 00:12:54,790 the reasons for that have almost nothing to do with how well those languages are 132 00:12:54,790 --> 00:13:00,600 designed and in fact whether they are designed at all, right? 133 00:13:00,600 --> 00:13:04,110 *applause* 134 00:13:04,110 --> 00:13:09,040 So I didn't really want to get too much hate mail after this talk so I'm going to 135 00:13:09,040 --> 00:13:13,999 try and avoid naming particular languages as much as I can, but I encourage you to 136 00:13:13,999 --> 00:13:20,279 think away as I speak and imagine the examples. No, that was - I didn't want to 137 00:13:20,279 --> 00:13:23,730 get much hate mail. Sorry. 138 00:13:23,730 --> 00:13:25,160 *laughter* 139 00:13:25,160 --> 00:13:26,579 So... 140 00:13:26,579 --> 00:13:32,069 *laughter**applause* 141 00:13:32,069 --> 00:13:36,540 So at the very least, we could use programming languages based on ideas from 142 00:13:36,540 --> 00:13:47,389 the 1970s instead of the 1960s. I mean, really, come on. Right? You know, back in 143 00:13:47,389 --> 00:13:54,120 '67 or '69 or whatever we had ECPL and C and only a couple of years later we had 144 00:13:54,120 --> 00:14:00,069 languages that guaranteed type and memory safety and enforcement of abstraction 145 00:14:00,069 --> 00:14:05,089 boundaries and gave us rich mechanisms for programming so we could say what we mean 146 00:14:05,089 --> 00:14:11,350 and not futz about with 23 million pointers, about 1 per cent of which were 147 00:14:11,350 --> 00:14:16,629 completely wrong, right? So for any particular - in the context of any 148 00:14:16,629 --> 00:14:20,459 particular project or any particular existing legacy infrastructure there may 149 00:14:20,459 --> 00:14:25,789 be very good reasons why one just can't make a switch. Right? But for us 150 00:14:25,789 --> 00:14:31,860 collectively, there is no excuse. It's just wretched, right? 151 00:14:31,860 --> 00:14:37,040 *applause* 152 00:14:37,040 --> 00:14:40,969 It's completely bonkers. And then the other thing that you see as a programming 153 00:14:40,969 --> 00:14:49,239 language researcher is that - er, well, to make another analogy, it looks as if 154 00:14:49,239 --> 00:14:55,529 anyone who was capable of driving a car considered themselves an expert on car 155 00:14:55,529 --> 00:15:03,559 design, right? There is in fact a rather well-established subject of, you know, 156 00:15:03,559 --> 00:15:08,239 programming language design and we know how to specify completely precisely, 157 00:15:08,239 --> 00:15:13,019 mathematically precisely not just the syntax like we had in these BNF grammars 158 00:15:13,019 --> 00:15:18,160 from the 1960s and still have, but also the behaviour of these things. We know how 159 00:15:18,160 --> 00:15:22,699 to specify a few operational semantics and the type systems. And we know how to do 160 00:15:22,699 --> 00:15:27,360 mathematical proofs that our language designs are self-consistent, that they do 161 00:15:27,360 --> 00:15:35,059 guarantee good properties of arbitrary well-typed programs that you write, right? 162 00:15:35,059 --> 00:15:38,759 We can do this now. We can do this, if we have to, post hoc. And people are doing 163 00:15:38,759 --> 00:15:45,669 this for JavaScript and C and god help them, PHP and other such languages. But 164 00:15:45,669 --> 00:15:50,829 even better, we can do it at design time. And if you do this at design time, you see 165 00:15:50,829 --> 00:15:55,509 the structure. You have to - the act of writing that specification forces you to 166 00:15:55,509 --> 00:15:59,939 consider all the cases and all the interactions between features. You can 167 00:15:59,939 --> 00:16:07,861 still get it wrong, but you get it wrong in more interesting ways. Right? So people 168 00:16:07,861 --> 00:16:12,119 are starting to do this now. So I think there was a talk on the first day of this 169 00:16:12,119 --> 00:16:17,429 by Hannes and David who are building system software in maybe not the best sys 170 00:16:17,429 --> 00:16:21,800 language you could possibly imagine, but something which compared with C is as 171 00:16:21,800 --> 00:16:27,759 manna from heaven. Anil Madhavapeddy and his colleagues in Cambridge have been 172 00:16:27,759 --> 00:16:32,759 working hard to build system software in at least moderately modern languages. And 173 00:16:32,759 --> 00:16:40,780 it works. Sometimes you have to use C but surprisingly often you really don't. Yeah, 174 00:16:40,780 --> 00:16:47,049 so I teach, as my introducer said, in the University of Cambridge. And I teach 175 00:16:47,049 --> 00:16:51,269 semantics. And one of my fears is that I will accidentally let someone out in the 176 00:16:51,269 --> 00:16:54,970 world who will accidentally find themselves over a weekend inventing a 177 00:16:54,970 --> 00:16:58,149 little scripting language which will accidentally take over the world and 178 00:16:58,149 --> 00:17:05,359 become popular and I won't have explained to them what they have to do or even that 179 00:17:05,359 --> 00:17:08,189 there is a subject they have to understand. And if you want to understand 180 00:17:08,189 --> 00:17:18,740 it, there's lots of stuff you can look at. Okay. Let's go to the uppermost extreme. 181 00:17:18,740 --> 00:17:23,829 If we want software and hardware that actually works, we should prove it 182 00:17:23,829 --> 00:17:30,700 correct. Right? So this is something which academics, especially in the domain of 183 00:17:30,700 --> 00:17:36,830 labelled "formal methods" in the '70s and '80s, they've been pushing this, promoting 184 00:17:36,830 --> 00:17:44,789 this as an idea and a promise and an exhortation for decades, right? Why don't 185 00:17:44,789 --> 00:17:50,169 we just prove our programs correct? And for some of those decades it wasn't really 186 00:17:50,169 --> 00:17:54,570 plausible. I remember back when I was considerably younger, it was a big thing 187 00:17:54,570 --> 00:17:59,540 to proof that if you took a linked list and reversed it twice, you got back the 188 00:17:59,540 --> 00:18:08,059 same thing. Right? That was hard, then. But now, well, we can't do this for the 189 00:18:08,059 --> 00:18:14,419 Linux kernel, let's say, but we can surprising often do this. And I just give 190 00:18:14,419 --> 00:18:19,590 you a couple of the examples of modern day, state-of-the-art academic 191 00:18:19,590 --> 00:18:25,720 verification projects. People have verified compilers all the way from C-like 192 00:18:25,720 --> 00:18:31,769 languages down to assembly or from ML-like languages. You know, higher-order 193 00:18:31,769 --> 00:18:36,959 functional imperative languages all the way down to binary models of how x86 194 00:18:36,959 --> 00:18:44,470 behaves. And people have verified LLVM optimisation paths. There's verified 195 00:18:44,470 --> 00:18:49,880 software fault isolation which I believe goes faster than the original non-verified 196 00:18:49,880 --> 00:18:56,130 form. Win for them. There's a verified hypervisor from Nectar group in Australia. 197 00:18:56,130 --> 00:19:00,570 Right? Verified secure hypervisor with proofs of security properties. There's any 198 00:19:00,570 --> 00:19:06,690 amount of work verifying crypto protocols, sometimes with respect to assumptions - I 199 00:19:06,690 --> 00:19:12,510 think reasonable assumption on what the underlying mathematics is giving you. So 200 00:19:12,510 --> 00:19:22,919 we can do this. I maybe have to explain a bit what I mean by "prove". To some 201 00:19:22,919 --> 00:19:30,990 computery people, "I proved it works" means "I tested it a little bit". 202 00:19:30,990 --> 00:19:32,510 *laughter* 203 00:19:32,510 --> 00:19:34,960 In extremis, "it compiled". 204 00:19:34,960 --> 00:19:44,120 *laughter**applause* 205 00:19:44,120 --> 00:19:50,770 To a program analysis person, you might run some sophisticated tool but written in 206 00:19:50,770 --> 00:19:54,649 untrusted code and possibly doing an approximate and maybe unsound analysis and 207 00:19:54,649 --> 00:19:59,720 you might find an absence of certain kinds of bugs. That's tremendously useful; it's 208 00:19:59,720 --> 00:20:04,640 not what we would really call "proved". I mean, nor would they, to be fair. 209 00:20:04,640 --> 00:20:08,970 Scientists generally don't use the word because they know they're not doing it, 210 00:20:08,970 --> 00:20:13,559 right? They're doing controlled experiments, which gives them, you know, 211 00:20:13,559 --> 00:20:20,070 information for or against some hypothesis. Mathematician write proofs. 212 00:20:20,070 --> 00:20:25,139 You probably did that when you were young. And those proofs are careful mathematical 213 00:20:25,139 --> 00:20:30,139 arguments usually written on paper with pencils and their aim is to convince a 214 00:20:30,139 --> 00:20:33,389 human being that if that mathematician was really nailed up against the wall and 215 00:20:33,389 --> 00:20:38,980 pushed, they could expand that into a completely detailed proof. But what we 216 00:20:38,980 --> 00:20:47,750 have in computing, we don't have the rich mathematical structure that these people 217 00:20:47,750 --> 00:20:51,950 are working with, we have a tremendous amount of ad hoc and stupid detail mixed 218 00:20:51,950 --> 00:20:57,769 in which a smidgin of rich mathematical structure. So - and we have hundreds of 219 00:20:57,769 --> 00:21:03,399 millions of lines of code. So this is just not going to cut it. And if we want say 220 00:21:03,399 --> 00:21:08,440 the word "prove" ever, then the only thing which is legitimate is to do honest 221 00:21:08,440 --> 00:21:13,230 program proof and that means you have to have some system that machine-checks that 222 00:21:13,230 --> 00:21:17,540 your proof is actually a proof. And sometime we'll also have machines that 223 00:21:17,540 --> 00:21:22,250 will sort of help and sort of hinder that process, right? And there's a variety of 224 00:21:22,250 --> 00:21:27,340 these systems. Coq and HOL4 and Isabelle4 - Isabelle/HOL and what have you that we 225 00:21:27,340 --> 00:21:36,799 can look up. So using these systems we can prove nontrivial facts about these. Not 226 00:21:36,799 --> 00:21:41,530 necessarily that they do what you want them to do, but that they meet the precise 227 00:21:41,530 --> 00:21:48,519 specification that we've written down. We can do that. But it's tricky, right? So 228 00:21:48,519 --> 00:21:52,930 these projects, these are by academic standards all quite big, you know? This is 229 00:21:52,930 --> 00:21:58,009 something like, I don't know, 10 people for a few years or something. By industry 230 00:21:58,009 --> 00:22:03,399 scale, maybe not that big. By your scale, maybe not that big. But still a lot of 231 00:22:03,399 --> 00:22:08,149 work for a research group to do. And quite high-end work. It can take, you know, 232 00:22:08,149 --> 00:22:13,730 literally a few years to become really fluent in using these tools, right? And 233 00:22:13,730 --> 00:22:20,139 there isn't as yet sort of an open-source, collaborative movement doing this kind of 234 00:22:20,139 --> 00:22:26,710 stuff. Maybe it's the time to start. Maybe in five years, ten years. I don't know. If 235 00:22:26,710 --> 00:22:35,680 you want a challenge, there's a challenge. But it's really quite hard, right? It's 236 00:22:35,680 --> 00:22:40,830 not something that one can quite put forward credibly as a solution for all 237 00:22:40,830 --> 00:22:46,529 software and hardware development, right? So that leads me to, like an intermediate 238 00:22:46,529 --> 00:22:53,399 point. That should have been a four there. What can we do which improves the quality 239 00:22:53,399 --> 00:22:58,340 of our system, which injects some mathematical rigour but which involves 240 00:22:58,340 --> 00:23:06,370 less work and is kind of easy? And what we can do is take some of these interfaces 241 00:23:06,370 --> 00:23:11,679 and be precise about what they are. And initially, we have to do that after the 242 00:23:11,679 --> 00:23:16,941 fact. We have to reverse-engineer good specs of existing stuff. But then we can 243 00:23:16,941 --> 00:23:23,470 use the same techniques to make much cleaner and better-tested and things which 244 00:23:23,470 --> 00:23:30,670 are easier to test in the future. That's the idea. So my colleagues and I have been 245 00:23:30,670 --> 00:23:35,360 doing this kind of stuff for the last quite a few years and I just want to give 246 00:23:35,360 --> 00:23:41,570 you just a - two little vignettes just to sort of show you how it goes. So I can't 247 00:23:41,570 --> 00:23:46,759 give you much detail. And this is, obviously, joint work with a whole bunch 248 00:23:46,759 --> 00:23:55,399 of very smart and good people, some of which I name here. So this is not me, this 249 00:23:55,399 --> 00:24:00,840 is a whole community effort. So I'm going to talk for a little bit about the 250 00:24:00,840 --> 00:24:05,799 behaviour of multiprocessors, so at that hardware interface. And I'm going to talk 251 00:24:05,799 --> 00:24:11,090 a tiny little bit about the behaviour of the TCP and socket API, right? Network 252 00:24:11,090 --> 00:24:17,169 protocols. So - and there'll be some similarities and some differences between 253 00:24:17,169 --> 00:24:24,990 these two things. So multiprocessors. You probably want your computers to go fast. 254 00:24:24,990 --> 00:24:31,090 Most people do. And it's an obvious idea to glom together, because processors don't 255 00:24:31,090 --> 00:24:35,529 go that fast, let's glom together a whole bunch of processors and make them all talk 256 00:24:35,529 --> 00:24:46,150 to the same memory. This is not a new idea. It goes back at least to 1962 when 257 00:24:46,150 --> 00:24:51,380 the Burroughs D825 had, I think, two processors talking to a single shared 258 00:24:51,380 --> 00:24:55,850 memory. It had outstanding features including truly modular hardware with 259 00:24:55,850 --> 00:25:02,649 parallel processing throughout, and - some things do not change very much - the 260 00:25:02,649 --> 00:25:11,640 complement of compiling languages was to be expanded, right? 1962, so that'll be 52 261 00:25:11,640 --> 00:25:18,820 years ago. Deary me. Okay. Now, how do these things behave? So let me show you 262 00:25:18,820 --> 00:25:22,899 some code, right? It's a hacker conference. There should be code. Let me 263 00:25:22,899 --> 00:25:29,030 show you two example programs and these will both be programs with two threads and 264 00:25:29,030 --> 00:25:34,570 they will both be acting on two shared variables, X and Y. And in the initial 265 00:25:34,570 --> 00:25:42,899 state, X and Y are both zero. So, the first program. On this thread we write X 266 00:25:42,899 --> 00:25:48,970 and then we read Y, and over here we write Y and then we read X, right? Now, these 267 00:25:48,970 --> 00:25:53,629 are operating, you know, in an interleaving concurrency, you might think, 268 00:25:53,629 --> 00:25:58,030 so there's no synchronisation between those two threads so we might interleave 269 00:25:58,030 --> 00:26:03,269 them. We might run all of thread 0 before all of thread 1 or all of thread 1 before 270 00:26:03,269 --> 00:26:10,340 all of thread 0 or they might be mixed in like that or like that or like that or 271 00:26:10,340 --> 00:26:16,940 like that. There's six possible ways of interleaving two lists of two things. And 272 00:26:16,940 --> 00:26:22,980 in all of those ways you never see in the same execution that this reads from the 273 00:26:22,980 --> 00:26:30,169 initial state AND this reads from the initial state. You never see that. So you 274 00:26:30,169 --> 00:26:35,479 might expect and you might assume in your code that these are the possible outcomes. 275 00:26:35,479 --> 00:26:40,490 So what happens if you were to actually run that code on my laptop in a particular 276 00:26:40,490 --> 00:26:46,970 test harness? Well, okay, you see occasionally they're quite well 277 00:26:46,970 --> 00:26:52,499 synchronised and they both read the other guy's right. Sorry, big call. They both 278 00:26:52,499 --> 00:26:57,510 read the other guy's right. And quite often one thread comes completely before 279 00:26:57,510 --> 00:27:09,769 the other. But sometimes, sometimes they both see 0. Huh. Rats. Well, that's 280 00:27:09,769 --> 00:27:18,840 interesting. Let me show you another program. So now, thread 0 writes some 281 00:27:18,840 --> 00:27:23,440 stuff. Maybe it writes a big bunch of data, maybe multi-word data. And then it 282 00:27:23,440 --> 00:27:28,249 writes, let's say that's a flag announcing to some another thread that that data is 283 00:27:28,249 --> 00:27:33,249 ready to go now. And the other thread busy-waits reading that value until it 284 00:27:33,249 --> 00:27:38,710 gets 1, until it sees the flag. And then it reads the data. So you might want, as a 285 00:27:38,710 --> 00:27:43,610 programmer, you might want a guarantee that this read will always see the data 286 00:27:43,610 --> 00:27:48,759 that you have initialised. Hey? That would be like a message-passing kind of an 287 00:27:48,759 --> 00:27:56,229 idiom. So what happens if you run that? Well, on the x86 processors that we've 288 00:27:56,229 --> 00:28:03,929 tested, you always see that value. Good. This is a descent coding idiom on x86. All 289 00:28:03,929 --> 00:28:12,230 right. On Arma IBM Power processors, however, you see sometimes you just ignore 290 00:28:12,230 --> 00:28:16,850 the value written and you see the initial state. So this is not a good 291 00:28:16,850 --> 00:28:25,539 message-passing idiom, right? So what's going on? Well, these behaviours, it might 292 00:28:25,539 --> 00:28:31,260 be surprising, eh? And when you see surprising behaviour in hardware, you have 293 00:28:31,260 --> 00:28:36,100 two choices. Either it's a bug in the hardware, and we have found a number of 294 00:28:36,100 --> 00:28:45,259 bugs in hardware. It's always - it's very rewarding. Or it's a bug in your 295 00:28:45,259 --> 00:28:51,350 expectation. Or it's a bug in your test harness. Right? So what you do is you walk 296 00:28:51,350 --> 00:28:58,050 along and you walk up to your friendly processor architect in IBM or ARM or x86 297 00:28:58,050 --> 00:29:04,770 land, and we have worked with all of these people. And you ask them, "Is this a bug?" 298 00:29:04,770 --> 00:29:12,360 And a processor architect is a person who is - has the authority to decide whether 299 00:29:12,360 --> 00:29:16,240 some behaviour is intended or is a bug. 300 00:29:16,240 --> 00:29:17,750 *laughter* 301 00:29:17,750 --> 00:29:24,669 That's what they do, essentially. And for these, they say, "Oh, we meant it to be 302 00:29:24,669 --> 00:29:29,239 like that". Right? No question. "We meant it to be like that. That's perfectly 303 00:29:29,239 --> 00:29:36,559 proper. We have, because you and everybody else wanted their computers to go fast, we 304 00:29:36,559 --> 00:29:41,690 as processor designers have introduced all manner of sophisticated optimizations, 305 00:29:41,690 --> 00:29:45,200 which if you were running sequential code, you would never notice, but if you start 306 00:29:45,200 --> 00:29:49,289 running fancy concurrent code like this, fancy concurrent code that isn't just 307 00:29:49,289 --> 00:29:56,049 trivially locked, you can notice", so this is intentional behaviour. And if you want 308 00:29:56,049 --> 00:30:00,129 to write that code, like a mutual exclusion algorithm or a message-passing 309 00:30:00,129 --> 00:30:06,460 algorithm or something, then you need to insert special instructions, memory 310 00:30:06,460 --> 00:30:11,180 barrier instructions. And so you go away and you say, "What do they do?" And you 311 00:30:11,180 --> 00:30:18,949 look up in the vendor documentation. And you get stuff like this. I'll read it out: 312 00:30:18,949 --> 00:30:22,190 (Reads quickly) "For any applicable pair AB, the memory barrier ensures that a will 313 00:30:22,190 --> 00:30:24,740 be performed with respect to any processor or mechanism, to the extent required by 314 00:30:24,740 --> 00:30:27,590 the associated memory coherence required at beauties, before b is performed with 315 00:30:27,590 --> 00:30:30,360 respect to that prior or mechanism. A includes all applicable storage accesses 316 00:30:30,360 --> 00:30:33,090 by any such processor or mechanism that have been performed with respect to P1 317 00:30:33,090 --> 00:30:36,210 before the memory barrier is created. B includes all applicable storage accesses 318 00:30:36,210 --> 00:30:38,670 by any such processor or mechanism that are performed after a Load instruction 319 00:30:38,670 --> 00:30:41,290 executed by that processor or mechanism has returned the value stored by a store 320 00:30:41,290 --> 00:30:42,700 that is in B." 321 00:30:42,700 --> 00:30:49,850 *applause* 322 00:30:49,850 --> 00:30:58,630 Hands up if you understand what that means? Hands up if you think that if you 323 00:30:58,630 --> 00:31:02,139 thought about it and read the rest of the book, you would understand or you could 324 00:31:02,139 --> 00:31:09,730 understand what that means? Hands up the people who think that we're doomed forever? 325 00:31:09,730 --> 00:31:12,920 *laughter* 326 00:31:12,920 --> 00:31:16,979 So I'm sorry to the first few groups, but the last ones are right. 327 00:31:16,979 --> 00:31:22,299 *laughter**applause* 328 00:31:22,299 --> 00:31:28,850 This looks like it's really intricate and really carefully thought out stuff. And we 329 00:31:28,850 --> 00:31:32,220 thought that for a while and we spent literally years trying to make precise 330 00:31:32,220 --> 00:31:37,159 mathematical models that give precise meaning to these words, but actually it 331 00:31:37,159 --> 00:31:45,460 can't be done. So what do you have to do in that circumstance? You have to go away 332 00:31:45,460 --> 00:31:52,120 and you have to invent some kind of a model. And there's a - this is a really 333 00:31:52,120 --> 00:31:56,190 fundamental problem, that we - on the one hand, we develop our software by this 334 00:31:56,190 --> 00:32:01,960 trial and error process, but on the other hand are points like this. We have these 335 00:32:01,960 --> 00:32:06,470 loose specifications, supposedly, that have to cover all manner of behaviour of 336 00:32:06,470 --> 00:32:10,730 many generations of processors that might all behave the same way, written just in 337 00:32:10,730 --> 00:32:16,769 text. And we tell people they should write "to the spec", but the only way they have 338 00:32:16,769 --> 00:32:22,450 of developing their code is to run it and run particular executions on the 339 00:32:22,450 --> 00:32:26,960 particular hardware implementations that they have, whose relationship to the spec 340 00:32:26,960 --> 00:32:34,740 is very elusive. We can't use those thick books that you get or those quite weighty 341 00:32:34,740 --> 00:32:39,260 PDFs that you get these days from the processor vendors to test programs. You 342 00:32:39,260 --> 00:32:43,970 can't feed a program into that thick book and get output out. And you can't test 343 00:32:43,970 --> 00:32:47,429 processor implementations and you can't prove anything and you can't even 344 00:32:47,429 --> 00:32:53,430 communicate between human beings, right? So these things, really, they don't exist. 345 00:32:53,430 --> 00:32:58,500 So what can we do? Well, the best we can do at this point in time is a bit of 346 00:32:58,500 --> 00:33:04,950 empirical science, right? So we can invent some model off the top of our heads, 347 00:33:04,950 --> 00:33:09,649 trying to describe just the programmer-visible behaviour, not the 348 00:33:09,649 --> 00:33:15,529 internal structure. And we can make a tool. Because that's not prose now. Now we 349 00:33:15,529 --> 00:33:18,950 can - now it's stuff. It's real mathematics and we can turn that into code 350 00:33:18,950 --> 00:33:23,200 and execute it. We can make tools that take programs and execute it and - small 351 00:33:23,200 --> 00:33:28,280 programs - tell us the set of all behaviours allowed by that model. And then 352 00:33:28,280 --> 00:33:33,759 we can compare those sets of behaviours against the experimental observations. And 353 00:33:33,759 --> 00:33:38,360 you might have to fix the model and you might find hard bugs. And then the model 354 00:33:38,360 --> 00:33:42,869 also has to make sense to the architect so you can discuss with the architects what 355 00:33:42,869 --> 00:33:47,100 they intend and what their institution is, and then you can also prove some other 356 00:33:47,100 --> 00:33:50,980 facts about it to give a bit more assurance and then because you probably 357 00:33:50,980 --> 00:33:57,249 got this wrong the first couple of times, you can go back. And this results or has 358 00:33:57,249 --> 00:34:04,509 resulted in models which are not guaranteed to be correct, but they are 359 00:34:04,509 --> 00:34:08,639 effectively the de facto standard for understanding the concurrency behaviour of 360 00:34:08,639 --> 00:34:14,300 these things, you know? And various people use them. And just to give you a tiny 361 00:34:14,300 --> 00:34:18,570 snippet - I'll have to speed up a bit - a tiny snippet of what the model looks like, 362 00:34:18,570 --> 00:34:22,969 it's basically just an abstract machine. It's got some stuff for the threads that 363 00:34:22,969 --> 00:34:26,949 handles the programmer-visible speculative execution, and some stuff for a storage 364 00:34:26,949 --> 00:34:32,739 subsystem that handles the fact that in these architectures, memory rights can be 365 00:34:32,739 --> 00:34:38,270 propagated to different threads at different times, right? And there's a - 366 00:34:38,270 --> 00:34:42,540 that model has a state which is just a collection of some sets and lists and 367 00:34:42,540 --> 00:34:47,230 partial orders and a few other things which I won't talk about. And it has some 368 00:34:47,230 --> 00:34:51,030 transitions, right? In any state, there might be several possible transitions. 369 00:34:51,030 --> 00:34:54,699 This is just a sample. I'm not going to explain all of this. But when you want to 370 00:34:54,699 --> 00:34:58,570 propagate a write to another thread, there have to be some preconditions that you have to 371 00:34:58,570 --> 00:35:07,140 satisfy. And than there is an action. It's not amazingly complicated, really. So this is text, 372 00:35:07,140 --> 00:35:13,319 but it's very precisely written text and it has the great advantage that you can go through 373 00:35:13,319 --> 00:35:16,570 these bullet points with your friendly architect and say, "Is this what you 374 00:35:16,570 --> 00:35:22,440 mean?" And they can think about it and say yes or no. But for the actual definition, 375 00:35:22,440 --> 00:35:27,309 that's in mathematics but it's mathematics that's quite close to code. I mean, it's 376 00:35:27,309 --> 00:35:32,380 terribly close to pure, functional code with outside effects. And just to give you 377 00:35:32,380 --> 00:35:37,230 an idea, this is some of it and those are three of those conditions in the real, 378 00:35:37,230 --> 00:35:42,660 honest, true version. Then we can take this mathematics and because it's in a 379 00:35:42,660 --> 00:35:49,270 particular style, we can compile this into actually OCaml and then OCaml byte code 380 00:35:49,270 --> 00:35:55,770 and then we can run it for batch jobs. But then you can compile that OCaml byte code 381 00:35:55,770 --> 00:36:02,579 to JavaScript and glue on an user interface and stick it into a web browser and then you 382 00:36:02,579 --> 00:36:07,760 have a web interface that lets people explore this model. And that's also a 383 00:36:07,760 --> 00:36:16,190 necessary part of validating that it makes sense. Okay. This is not rocket science. 384 00:36:16,190 --> 00:36:21,710 You've missed the talk for rocket science, I'm sorry. All we're doing here is 385 00:36:21,710 --> 00:36:28,630 specifying the intended behaviour of a system. Okay? That's - it's not very 386 00:36:28,630 --> 00:36:32,490 technical but it's unusual. And we happen to be doing it in this mathematical 387 00:36:32,490 --> 00:36:37,049 language, but you could use in fact almost any language so long as you understand 388 00:36:37,049 --> 00:36:40,170 what you're doing. What you have to understand is that you're writing 389 00:36:40,170 --> 00:36:45,230 something which is not an implementation. It is something which, given some observed 390 00:36:45,230 --> 00:36:49,810 behaviour, some arbitrary observed behaviour, will be able to decide if it's 391 00:36:49,810 --> 00:36:54,580 allowed, if you want it to be allowed or not, right? It has to be executed as a 392 00:36:54,580 --> 00:37:02,059 test oracle. The key question for getting this to work is to understand how much 393 00:37:02,059 --> 00:37:05,730 non-determinism or loose specification there is in the system that you're working 394 00:37:05,730 --> 00:37:08,890 with, right? So if everything is completely deterministic, you can write a 395 00:37:08,890 --> 00:37:13,839 reference implementation in the cleanest language of your choice and just compare 396 00:37:13,839 --> 00:37:19,270 the output of that and the output of the real thing, right? But if there's more 397 00:37:19,270 --> 00:37:24,000 non-determinism, then you can't do that. I'm going to have to abbreviate this a 398 00:37:24,000 --> 00:37:31,369 little tiny bit. So for the TCP network protocols, there is more non-determinism. 399 00:37:31,369 --> 00:37:35,220 You know what the TCP is, yes? A protocol that gives you sort of reliable 400 00:37:35,220 --> 00:37:41,720 connections, assuming that the world is made of good people, right? You look at 401 00:37:41,720 --> 00:37:47,640 the standards for TCP, and they're the same kind of wibbly text from the 1980s 402 00:37:47,640 --> 00:37:52,400 and you look at the implementations and they are a ghastly mess. And you try and 403 00:37:52,400 --> 00:37:55,690 understand the relationship between the two of them and you can't because the 404 00:37:55,690 --> 00:38:01,740 standard, again, is not the definition. It doesn't define in all circumstances what 405 00:38:01,740 --> 00:38:07,010 behaviour is allowed and what isn't. So again, we had to make up a specification 406 00:38:07,010 --> 00:38:14,310 in the first instance just of this host, a single endpoint, and observing its 407 00:38:14,310 --> 00:38:18,510 sockets, API, calls and returns and its behaviour on the wire, right? And when 408 00:38:18,510 --> 00:38:21,969 you've decided - and a few internal debug events. And when you've decided what to 409 00:38:21,969 --> 00:38:26,619 observe, then observation is just a trace, it's just a list of those events. And you 410 00:38:26,619 --> 00:38:34,750 have to be able to ask your spec "Is that list admissible or not"? But there's a lot 411 00:38:34,750 --> 00:38:38,820 of non-determinism. Variation between the implementations, variations within the 412 00:38:38,820 --> 00:38:45,650 implementations. And that's internal. It's not announced in the API or on the wire 413 00:38:45,650 --> 00:38:50,980 maybe until much later when the implementation picks a new window size or 414 00:38:50,980 --> 00:38:56,890 something. You can't tell until quite a lot later in the trace what it's picked. 415 00:38:56,890 --> 00:39:02,740 And that makes the job of checking whether a trace is admissible by the spec much 416 00:39:02,740 --> 00:39:08,559 harder than it has to be. Right? And if you had designed the TCP protocol and 417 00:39:08,559 --> 00:39:14,470 those implementations for to be testable in this very discriminating way, back in 418 00:39:14,470 --> 00:39:20,470 the 1980s when you were designing TCP protocol, it would be much easier. And for 419 00:39:20,470 --> 00:39:25,650 new protocols, one should make sure that you're doing this in this particular way. 420 00:39:25,650 --> 00:39:30,930 I don't think I want to talk about that slide. That's just one of the rules of 421 00:39:30,930 --> 00:39:35,480 that specification. But the key fact about that spec is the - well, we handled the 422 00:39:35,480 --> 00:39:40,210 real protocols for arbitrary inputs. But it's structured not just for this 423 00:39:40,210 --> 00:39:46,240 executability as a test oracle, but it's structured for clarity, not performance, 424 00:39:46,240 --> 00:39:49,750 which is scarcely ever what anyone ever does, right? So it's a whole new kind of 425 00:39:49,750 --> 00:39:54,480 thing. And the testing is very discriminating. So we found all manner of 426 00:39:54,480 --> 00:40:03,839 amusing and bizarre bugs which I think I don't have time to talk about. Okay. So 427 00:40:03,839 --> 00:40:12,610 I've described three kinds of things that we might do. The first thing - for 428 00:40:12,610 --> 00:40:18,800 goodness sake - I mean, it's just a no-brainer. Just do it already. Everybody. 429 00:40:18,800 --> 00:40:29,670 All of you. All of you. Right? This middle zone is a very interesting zone as far as 430 00:40:29,670 --> 00:40:33,920 I'm concerned, right? It's - out of what - a place where we could get a very good 431 00:40:33,920 --> 00:40:39,220 benefit for not that much effort, right? We can do it, if necessary, post hoc. We 432 00:40:39,220 --> 00:40:44,490 can also do it and even better, at design time. We have to do this in a way that 433 00:40:44,490 --> 00:40:48,240 makes our executions testable as test oracles, and another thing that that 434 00:40:48,240 --> 00:40:52,119 enables is completely random testing. When you've got a test oracle, you can just 435 00:40:52,119 --> 00:40:58,210 feed in random goop. This is fuzzing, but better fuzzing - feed in random goop and 436 00:40:58,210 --> 00:41:02,859 check at every point whether what the behaviour that you're getting is allowable 437 00:41:02,859 --> 00:41:09,271 or not. And then they're written for clarity, not for performance, and that 438 00:41:09,271 --> 00:41:13,619 means you can see what you're doing, right? You can see the complexity. If you 439 00:41:13,619 --> 00:41:17,140 go ahead and you add some feature to your protocol or your programming language or 440 00:41:17,140 --> 00:41:22,420 whatever and you're working just with text specifications, then you can't see the 441 00:41:22,420 --> 00:41:24,759 interactions. You just chuck in a couple of paragraphs and you think for a few 442 00:41:24,759 --> 00:41:30,210 minutes, right? And if you're lucky, you make an implementation. But if you're 443 00:41:30,210 --> 00:41:35,309 writing a spec that has to cover all the cases like this, the act of doing that 444 00:41:35,309 --> 00:41:41,150 encourages you to think or helps you think about the excess complexity. And you might 445 00:41:41,150 --> 00:41:46,690 think that's too complex, I'm being silly, I'll make it simpler. And it has to be 446 00:41:46,690 --> 00:41:51,630 usable for proof. So this, I think also is pretty much a no-brainer. And it doesn't 447 00:41:51,630 --> 00:41:56,650 require great technical, you know, mathematical skill either, right? Lots of 448 00:41:56,650 --> 00:42:01,760 people can do this. And then there's option 3, best option, full-on formal 449 00:42:01,760 --> 00:42:07,930 verification of the key components. And that is now also in reach. You can imagine 450 00:42:07,930 --> 00:42:16,309 secure systems made using actual verified compilers and verified hypervisors with 451 00:42:16,309 --> 00:42:22,540 verified TLS stacks and you can imagine making things out of those or making those 452 00:42:22,540 --> 00:42:27,870 and then making things out of those. And maybe one should be thinking about that. 453 00:42:27,870 --> 00:42:34,070 So, maybe not appropriate for everything. If you were writing Word, you would not 454 00:42:34,070 --> 00:42:38,599 wish to do these things. Probably, you're not. But for this key infrastructure that 455 00:42:38,599 --> 00:42:46,420 we really depend on, we trust even though it's not trustworthy, we have to do this. 456 00:42:46,420 --> 00:42:52,450 Is this a new dawn? I wonder if there's anyone standing next to a light switch 457 00:42:52,450 --> 00:42:58,920 that can dim them for me. I didn't ask them beforehand, so... If you think back 458 00:42:58,920 --> 00:43:05,540 the last 70-odd years, we started building computers in around 1940. It's been pretty 459 00:43:05,540 --> 00:43:12,760 dark from the point of view of rigorous, solid, reliable engineering, stuff that is 460 00:43:12,760 --> 00:43:21,760 actually trustworthy. Pretty dark, I would say. Maybe, just maybe there's a tiny 461 00:43:21,760 --> 00:43:26,339 smidgen of light coming through the doors. And we start to have a little bit of a 462 00:43:26,339 --> 00:43:31,109 clue and we start to get reusable models of processor architectures and programming 463 00:43:31,109 --> 00:43:35,540 languages and network protocols and what have you. It's just the beginnings of the 464 00:43:35,540 --> 00:43:39,790 analogue of that thermodynamics and materials science and quality control 465 00:43:39,790 --> 00:43:47,819 management and what have you that we have from mechanical engineering. And we've got 466 00:43:47,819 --> 00:43:57,200 no choice. If we don't, the next 75 years is going to be a lot worse. You can just 467 00:43:57,200 --> 00:44:02,400 imagine, right? So I went to this - as I'm sure many of you do - this great talk on 468 00:44:02,400 --> 00:44:08,370 some Apple boot loader exploit yesterday which was relying on some feature that had 469 00:44:08,370 --> 00:44:14,900 been introduced in the early '80s and was still present. And you can imagine in 100 470 00:44:14,900 --> 00:44:22,800 years from now, you can imagine as long as human civilisation endures, the x86 471 00:44:22,800 --> 00:44:28,869 architecture and the socket API and all of this stuff, it will be embedded in some 472 00:44:28,869 --> 00:44:34,099 monumental ghastly stack of virtualisation layers forever. 473 00:44:34,099 --> 00:44:45,760 *laughter**applause* 474 00:44:45,760 --> 00:44:50,720 So I'd like to thank especially all of my colleagues that have been working with me 475 00:44:50,720 --> 00:44:56,550 or not with me in these directions. I'd like to thank our generous funders who 476 00:44:56,550 --> 00:45:01,389 support this stuff. I'd like to thank you for your attention and I exhort you, 477 00:45:01,389 --> 00:45:04,850 sort it out. 478 00:45:04,850 --> 00:45:20,800 *applause* 479 00:45:20,800 --> 00:45:25,300 Herald: Thank you very much, Peter, for those interesting insights to our 480 00:45:25,300 --> 00:45:31,269 programming. We have now time for a Q & A, so those people who have to leave the 481 00:45:31,269 --> 00:45:39,970 room, please do so quietly and as fast as possible, so we can go on and hear what 482 00:45:39,970 --> 00:45:51,270 the questions are. So we start with microphone 4, please. 483 00:45:51,270 --> 00:45:58,790 Mic 4: Hello. Thanks for the talk and my question is if you got an oracle of the 484 00:45:58,790 --> 00:46:04,270 software behaviour which can translate every possible input to a correct output, 485 00:46:04,270 --> 00:46:08,410 how can this be less complex and error-prone than the reference 486 00:46:08,410 --> 00:46:10,510 implementation? 487 00:46:10,510 --> 00:46:15,620 Peter: Good question. So the first point is that this oracle doesn't have to 488 00:46:15,620 --> 00:46:22,010 produce the outputs. It only has to check that the inputs and the outputs match up. 489 00:46:22,010 --> 00:46:26,560 And then the second point is that in general it might have to have much of the 490 00:46:26,560 --> 00:46:33,221 same complexity, but by writing it to be clear as opposed to to be fast, you may 491 00:46:33,221 --> 00:46:37,730 adopt a completely different structure, right? So the structure of our TCP spec is 492 00:46:37,730 --> 00:46:43,540 organised by the behaviour that you see for particular transitions. The structure 493 00:46:43,540 --> 00:46:49,160 of a real implementation has fast-path code and lots of complicated intertwined 494 00:46:49,160 --> 00:46:55,309 branching and all manner of excess complexity, of implementation complexity, 495 00:46:55,309 --> 00:46:56,879 if you will. 496 00:46:56,879 --> 00:47:01,450 Mic 4: Thanks. Herald: So microphone 3, please? 497 00:47:01,450 --> 00:47:06,710 Mic 3: I wanted to ask you what you thought about programming by manipulating 498 00:47:06,710 --> 00:47:13,859 the abstract syntax tree directly so as to not allow incompilable programs because of 499 00:47:13,859 --> 00:47:18,410 some, like, you're missing a semicolon or something like that. What's your thoughts 500 00:47:18,410 --> 00:47:20,220 on that? 501 00:47:20,220 --> 00:47:23,700 Peter: So that's - in the grand scheme of things, I think that's not a very big 502 00:47:23,700 --> 00:47:28,890 deal, right? So there's - people have worked on structured editors for lordy 503 00:47:28,890 --> 00:47:33,309 knows how many years. It's not a big deal because it's very easy for a compiler to 504 00:47:33,309 --> 00:47:38,569 detect that kind of thing. And even more, it's very easy for a compiler of a 505 00:47:38,569 --> 00:47:43,140 sensibly designed language to detect type errors, even for a very sophisticated type 506 00:47:43,140 --> 00:47:49,589 system. So I don't think that - that's not - I mean, some people might find it 507 00:47:49,589 --> 00:47:52,930 helpful, but I don't think it's getting to the heart of the matter. 508 00:47:52,930 --> 00:47:54,250 Mic 3: Thanks. 509 00:47:54,250 --> 00:47:59,850 Herald: So we've got some questions from our signal angels from the IRC. 510 00:47:59,850 --> 00:48:06,150 Signal angel: Yes. There's one question. It's about the repository for Isabelle and 511 00:48:06,150 --> 00:48:11,300 HOL proofs. It should be on source forge, and you said there are no open source 512 00:48:11,300 --> 00:48:15,480 repositories. What about them? 513 00:48:15,480 --> 00:48:20,589 Peter: I'm not quite sure what the question is, really. So there's a 514 00:48:20,589 --> 00:48:28,230 repository of Isabelle formal proofs which is the archival form of proofs, it's 515 00:48:28,230 --> 00:48:34,790 called. I didn't really mean to say there are no open source repositories and in 516 00:48:34,790 --> 00:48:38,500 fact I don't know under what conditions most of those proofs are licensed. They 517 00:48:38,500 --> 00:48:43,210 probably are open. But there isn't an open source community building these things, 518 00:48:43,210 --> 00:48:47,710 right? It's still pretty much an academic pursuit. 519 00:48:47,710 --> 00:48:50,329 Herald: Microphone 2, please. 520 00:48:50,329 --> 00:48:55,530 Mic 2: Hello. Thanks again for your talk. I just want to add something that you 521 00:48:55,530 --> 00:49:01,970 didn't address, and that's that we actually need more good studies in 522 00:49:01,970 --> 00:49:07,970 software engineering. We often hear a lot of experts and also very well-known 523 00:49:07,970 --> 00:49:11,740 computer scientists say things like, "Well, we just need to do functional 524 00:49:11,740 --> 00:49:17,839 programming and OOP is bad and stuff like that", which I think there's a lot of 525 00:49:17,839 --> 00:49:24,109 truth to it, but we actually need studies where we test these kinds of claims that 526 00:49:24,109 --> 00:49:29,530 we make, because when you look at other fields which it also did compare to, like, 527 00:49:29,530 --> 00:49:34,250 medicine, if somebody comes around and is well-known and says things like, 528 00:49:34,250 --> 00:49:42,230 "homeopathy works", we don't believe them. We do trials, we do good trials. And 529 00:49:42,230 --> 00:49:50,430 there's a lot of myths out there, like, or not well tested things, like "hire the 530 00:49:50,430 --> 00:49:54,739 best programmers, they are 100 times more productive", "do steady type 531 00:49:54,739 --> 00:50:00,430 systems", and stuff like that. And we need to verify that this is true, that this 532 00:50:00,430 --> 00:50:02,210 helps. And it's... 533 00:50:02,210 --> 00:50:07,490 Peter: Okay. So in the grand scheme of things, arguably you're right. In the 534 00:50:07,490 --> 00:50:12,739 grandest scheme of things, computer science is actually a branch of psychology 535 00:50:12,739 --> 00:50:18,609 or really sociology. We are trying to understand how large groups of people can 536 00:50:18,609 --> 00:50:25,839 combine to make things which are insanely complicated. Now, for - would it be good 537 00:50:25,839 --> 00:50:31,329 if we had, you know, solid evidence that programming in this language was better 538 00:50:31,329 --> 00:50:38,450 than programming in that language? Well, yes, but it's staggeringly difficult and 539 00:50:38,450 --> 00:50:45,079 expensive to do honest experiments of that nature and they're scarcely ever done, 540 00:50:45,079 --> 00:50:49,040 right? You might do little experiments on little groups of students but it's really 541 00:50:49,040 --> 00:50:55,930 difficult. And then some of these things which I'm saying, when one is familiar 542 00:50:55,930 --> 00:51:01,740 with the different possible options, are just blindingly obvious. I mean, there are 543 00:51:01,740 --> 00:51:09,161 reasons why these people are using OCaml for their system programming, right? It's 544 00:51:09,161 --> 00:51:16,609 not - you know, it's not - "Homeopathy is right", but "Homeopathy is wrong" which is 545 00:51:16,609 --> 00:51:21,230 the kind of argument being put forward. 546 00:51:21,230 --> 00:51:24,260 Herald: Question from microphone 5, please. 547 00:51:24,260 --> 00:51:29,609 Mic 5: So where are you using ECC memory for your testing - up here. 548 00:51:29,609 --> 00:51:33,510 Peter: When you say up here, there's a bit of ambiguity. 549 00:51:33,510 --> 00:51:34,819 Mic 5: Here. 550 00:51:34,819 --> 00:51:37,860 Peter: Thank you. Say again? 551 00:51:37,860 --> 00:51:43,119 Mic 5: So where are you using ECC memory for the testing you showed about the 552 00:51:43,119 --> 00:51:51,070 results of the multithreaded results due to memory barriers and memory reorderings? 553 00:51:51,070 --> 00:51:55,450 Peter: Well... 554 00:51:55,450 --> 00:52:00,130 Mic 5: Okay, but even if you were or you were not, the point I want to make is that 555 00:52:00,130 --> 00:52:05,950 you can expect about 1 bit flip each minute in a modern computer system that 556 00:52:05,950 --> 00:52:13,670 may completely change what your software is doing, so perhaps we also have to look 557 00:52:13,670 --> 00:52:19,180 in ways how we can work if something goes wrong at the very low level so that we 558 00:52:19,180 --> 00:52:25,010 kind of have a check against our specification on a more higher level of 559 00:52:25,010 --> 00:52:31,940 our software. So it is valuable to do good engineering on the low levels, but still I 560 00:52:31,940 --> 00:52:37,670 think we will not solve the problems of computation and computer engineering just 561 00:52:37,670 --> 00:52:42,520 by proving things in the mathematical domain because computers are physical 562 00:52:42,520 --> 00:52:47,739 entities. They have errors and so on and we really have to deal with them as well. 563 00:52:47,739 --> 00:52:54,340 Peter: So it's certainly true that there are such random errors. In fact, I would 564 00:52:54,340 --> 00:52:59,480 put the point that I would argue that you have to be able to model the physics well 565 00:52:59,480 --> 00:53:03,680 enough and you have to be able to model the statistics of those errors well 566 00:53:03,680 --> 00:53:07,750 enough, so that's a different kind of mathematics. And it's certainly valuable 567 00:53:07,750 --> 00:53:13,290 and necessary. But if you look at the statistic you just quoted, is that the 568 00:53:13,290 --> 00:53:22,109 main cause of why systems go wrong? Except possibly for space hardware, no. So 569 00:53:22,109 --> 00:53:26,499 important, yes. The most important thing that we should be paying attention to? I 570 00:53:26,499 --> 00:53:29,940 don't really think so. 571 00:53:29,940 --> 00:53:32,980 Herald: Microphone 6, please? 572 00:53:32,980 --> 00:53:41,200 Mic 6: Hi. I really think that what you're proposing to specify and verify more key 573 00:53:41,200 --> 00:53:47,160 components is - would be a valuable addition, but how do you make sure that 574 00:53:47,160 --> 00:53:51,880 your specification actually matches the behaviour that you want to do or to have? 575 00:53:51,880 --> 00:53:58,300 Peter: So as I described, we do as partial validation of those specifications, we do 576 00:53:58,300 --> 00:54:03,050 a lot of testing against the implementations, against a range of 577 00:54:03,050 --> 00:54:07,060 existing implementations. That's one source of validation. Another source of 578 00:54:07,060 --> 00:54:11,520 validation is that you talk to the architects and the designers. You want the 579 00:54:11,520 --> 00:54:16,929 internal structure to match their intent. You want it to be comprehensible to them. 580 00:54:16,929 --> 00:54:21,510 Another kind of validation that we do is prove properties about them. So we proved 581 00:54:21,510 --> 00:54:28,750 that you can correctly compile from a C11 concurrency, a mathematical model of that, 582 00:54:28,750 --> 00:54:35,432 to a IBM Power concurrency. And that kind of proof gives you a bit more level of 583 00:54:35,432 --> 00:54:39,450 assurance in both. So none of this is giving you a total guarantee. You 584 00:54:39,450 --> 00:54:43,980 certainly don't claim a total guarantee. But it gives you pretty good levels of 585 00:54:43,980 --> 00:54:47,880 assurance by normal standards. 586 00:54:47,880 --> 00:54:49,580 Herald: Mic 4? 587 00:54:49,580 --> 00:54:56,309 Mic 4: Yes. Thanks again. You proposed random tests, and with my expertise, it's 588 00:54:56,309 --> 00:55:02,859 very annoying if you have tests where the outcome is sometimes a failure and you 589 00:55:02,859 --> 00:55:10,499 want to have reproducible tests to fix a bug. So how do we bring this aspects to 590 00:55:10,499 --> 00:55:17,850 test more variety in data and to have it reproducible together? 591 00:55:17,850 --> 00:55:24,800 Peter: Well, if - as I was mentioning, for the TCP thing, one of the - so the problem 592 00:55:24,800 --> 00:55:29,150 is reproducibility is exactly this internal non-determinism, right? The 593 00:55:29,150 --> 00:55:33,389 system makes this scheduling choice or what have you and you can't see what it 594 00:55:33,389 --> 00:55:39,599 is, or the checker can't see what it is. So one way to do that is to really design 595 00:55:39,599 --> 00:55:45,369 the protocols in a different way to expose that non-determinism. The other fact about 596 00:55:45,369 --> 00:55:53,421 this kind of general purpose specification as test oracle idea is that in some sense, 597 00:55:53,421 --> 00:55:59,240 it doesn't have to be reproducible. Right? The specification is giving a yes-no 598 00:55:59,240 --> 00:56:05,510 answer for an arbitrary test. And that means that you can use arbitrary tests. 599 00:56:05,510 --> 00:56:09,730 Figuring out the root causes of the differences afterwards may be tricky, but 600 00:56:09,730 --> 00:56:13,200 you can use them for assurance. 601 00:56:13,200 --> 00:56:18,489 Herald: So we now got time for two more questions. Microphone 1, please? 602 00:56:18,489 --> 00:56:23,069 Mic 1: Hiya. Thanks for a great talk. So what you've described seems to be a middle 603 00:56:23,069 --> 00:56:28,259 ground between a full-on, formal verification and more practical testing, 604 00:56:28,259 --> 00:56:34,309 like, in between. So where do you see in the future where formal verifications will 605 00:56:34,309 --> 00:56:39,831 go? Will they converge to the middle or whether it will still be there just to 606 00:56:39,831 --> 00:56:44,299 verify something that's more well-specified? 607 00:56:44,299 --> 00:56:49,940 Peter: So the progress of sort of the whole subject of formal verification in 608 00:56:49,940 --> 00:56:56,930 general, if you look at that over the last 10 years or so, it's been very - it's been 609 00:56:56,930 --> 00:57:01,349 really amazing compared with what we could do in the 80s and 90s and early 2000s, 610 00:57:01,349 --> 00:57:06,480 right? So the scope of things, the scale of things which are - for which it is 611 00:57:06,480 --> 00:57:11,180 becoming feasible to do verification is getting bigger. And I think that will 612 00:57:11,180 --> 00:57:16,339 continue. You know, I don't know when you might see a completely verified TOR 613 00:57:16,339 --> 00:57:23,980 client, let's say, but that's not inconceivable now. And 20 years ago, that 614 00:57:23,980 --> 00:57:29,470 would have been completely inconceivable. So that is expanding and at the same time, 615 00:57:29,470 --> 00:57:36,829 I hope we see more of this interface, text based oracle specification - and these - 616 00:57:36,829 --> 00:57:41,000 when you're doing formal verification of a piece of the system, you have to have 617 00:57:41,000 --> 00:57:47,810 these indicators already defined, all right? So these two fit very well together. 618 00:57:47,810 --> 00:57:51,489 Herald: So the last question from microphone 2, please. 619 00:57:51,489 --> 00:57:56,040 Mic 2: Hi. You mentioned that you often find bugs in hardware. Is there any effort 620 00:57:56,040 --> 00:58:00,310 to verify the transistors on chips themselves as well? 621 00:58:00,310 --> 00:58:03,440 Peter: So there's a whole world of hardware verification. That wasn't my 622 00:58:03,440 --> 00:58:10,640 topic today. And most of the big processor vendors have teams working on this. 623 00:58:10,640 --> 00:58:14,440 Unsurprisingly, if you remember your history, many of them have teams focusing 624 00:58:14,440 --> 00:58:21,059 on the floating-point behaviour of their processors. And they're doing - they're 625 00:58:21,059 --> 00:58:26,981 also doing by the standards of ten years ago, pretty spectacularly well. So there 626 00:58:26,981 --> 00:58:33,181 are companies that have the whole of some execution unit formally verified. There's 627 00:58:33,181 --> 00:58:36,979 been a lot of work over the years on verification of cache protocols and such 628 00:58:36,979 --> 00:58:43,100 like. Right? There's a lot of work on not verifying the hardware as a whole, but 629 00:58:43,100 --> 00:58:47,519 verifying that, you know, the RTL-level description of the hardware matches some 630 00:58:47,519 --> 00:58:54,060 lower level description. So there is a lot of that going on. 631 00:58:54,060 --> 00:58:57,619 Herald: Thank you very much again for this great talk. 632 00:58:57,619 --> 00:59:00,480 Give him another applause. Thank you, Peter. 633 00:59:00,480 --> 00:59:01,790 Peter: Thank you. 634 00:59:01,790 --> 00:59:03,680 *applause*