1 00:00:00,000 --> 00:00:14,710 *34c3 intro* 2 00:00:14,710 --> 00:00:16,290 Herald: He's been publishing in academia 3 00:00:16,290 --> 00:00:22,520 for almost 30 years. Please join me in giving him a warm welcome to 34c3. 4 00:00:22,520 --> 00:00:30,367 *Applause* Alastiar Reid: Thank you very much for 5 00:00:30,367 --> 00:00:35,797 your introduction. So I'm going to be talking about the ARM processors and 6 00:00:35,797 --> 00:00:41,300 they're incredibly widely used. You find them in phones, tablets, IOT devices, 7 00:00:41,300 --> 00:00:46,449 hard-disk drives; they're all over. And if you think about it, what I'm saying is: 8 00:00:46,449 --> 00:00:50,500 They're in all the things, which contain your most private and personal data, so 9 00:00:50,500 --> 00:00:54,819 it's really, really important that they do exactly what they should be doing and 10 00:00:54,819 --> 00:00:59,469 nothing else. We need to make sure we really understand them and that means it's 11 00:00:59,469 --> 00:01:05,309 important that we can analyze them for any malware, look for vulnerabilities and so 12 00:01:05,309 --> 00:01:11,990 on. So I'm going to be talking about some work I started about six years ago, 13 00:01:11,990 --> 00:01:19,560 creating a very precise specification of what an ARM processor does and I'm going 14 00:01:19,560 --> 00:01:26,670 to be talking also about back in April I'm released this their specification in a 15 00:01:26,670 --> 00:01:34,189 machine readable form. And I should say that I'm working with Kimbridge University 16 00:01:34,189 --> 00:01:42,539 to turn that in something you can use. So so this talk I'm going to mostly actually 17 00:01:42,539 --> 00:01:47,560 talk about this executable processor specification, that's going to be the bulk 18 00:01:47,560 --> 00:01:53,249 of the talk but at the end that sets me up very nicely to talk about a formally 19 00:01:53,249 --> 00:01:57,979 verified software. So I thought, given the theme of the Congress, it would be more 20 00:01:57,979 --> 00:02:05,539 useful to emphasize things you could actually do. So the specification that ARM 21 00:02:05,539 --> 00:02:10,660 released, what's in it? Well, there's lots of instruction descriptions of course but 22 00:02:10,660 --> 00:02:14,690 there's also lots of really interesting security features; things to do with 23 00:02:14,690 --> 00:02:20,380 memory protection, exceptions, privilege checks and so on. 24 00:02:20,380 --> 00:02:24,440 So there's lots of really interesting stuff ... of your interest in their how 25 00:02:24,440 --> 00:02:30,690 secure your devices and how to make sure it really is secure. Throughout the talk 26 00:02:30,690 --> 00:02:36,080 I'll be giving a bunch of links; you can go and download the specs right now from 27 00:02:36,080 --> 00:02:42,580 the first link but please wait to the end of the talk and there's also the 28 00:02:42,580 --> 00:02:47,790 specification rendered as HTML, as tools that can take the specification release 29 00:02:47,790 --> 00:02:52,560 apart and find useful information in it; I've written blogs and papers about it as 30 00:02:52,560 --> 00:03:00,910 well. And so let's just dive into, look at the bits of the actual specification. So 31 00:03:00,910 --> 00:03:07,150 the first thing is all the really important security features in the specif- 32 00:03:07,150 --> 00:03:13,070 in our processor are controlled by, what I call, the system control registers and the 33 00:03:13,070 --> 00:03:19,570 top dog among all those control registers is this one here called SCTLR. Just trips 34 00:03:19,570 --> 00:03:26,370 off the tongue, doesn't it? So SCTLR is where - it's full of lots of individual 35 00:03:26,370 --> 00:03:32,120 control bits, which affect either optimizations, the processor's doing, or 36 00:03:32,120 --> 00:03:36,950 also security features. And so let's just zoom in and one of them, to give you an 37 00:03:36,950 --> 00:03:40,210 idea of what kind of information the spec contains. 38 00:03:40,210 --> 00:03:46,770 So here's some documentation, telling you about a WXN bit. What does that do? It 39 00:03:46,770 --> 00:03:53,190 makes sure that any code, any, that your stack cannot contain code. you can't boot 40 00:03:53,190 --> 00:03:58,250 instructions on the code, and on the stack, because they're proce- if you set 41 00:03:58,250 --> 00:04:04,180 this bit the processor won't execute them. In other words: This is the bit that 42 00:04:04,180 --> 00:04:12,220 triggered the requirement for things like return-oriented programming. Okay, so what 43 00:04:12,220 --> 00:04:18,380 can you do with this fact? Well, suppose you're in the habit of reverse engineering 44 00:04:18,380 --> 00:04:24,310 some code. You might, your disassemble may show you something like this. And you're 45 00:04:24,310 --> 00:04:28,120 probably all staring at this going: "What on earth does that do?", because it's 46 00:04:28,120 --> 00:04:33,131 extremely cryptic. But using the information that's in the XML version of 47 00:04:33,131 --> 00:04:40,710 the release you could easily figure out how to build, how to decode some of the 48 00:04:40,710 --> 00:04:45,710 more cryptic parts and go "Oh actually, it's that SCTLR register", right, so you 49 00:04:45,710 --> 00:04:50,490 could decode the cryptic name for the number for the register into that. But you 50 00:04:50,490 --> 00:04:54,720 could do a bit more than that. You can see it's setting one of the bits in the 51 00:04:54,720 --> 00:05:00,990 register so - it is of course the bit I just told you about WXN, so we could dig 52 00:05:00,990 --> 00:05:08,740 into that and, so we could kind of use the information from the XML to render it, 53 00:05:08,740 --> 00:05:13,560 perhaps, as like this. So you can make things a bit more useful 54 00:05:13,560 --> 00:05:17,550 and you can also take that documentation that was there, that told you what the WXN 55 00:05:17,550 --> 00:05:23,470 bit does, and maybe, if you're a feeling really energetic, you could, when you 56 00:05:23,470 --> 00:05:27,320 click on it, mouse over or whatever, it could bring up some of that documentation. 57 00:05:27,320 --> 00:05:31,870 And and that makes specf- that makes it much easier to understand what the cryptic 58 00:05:31,870 --> 00:05:36,240 piece of code at the top is doing. Okay, so that's just a very shallow thing you 59 00:05:36,240 --> 00:05:43,720 can get from the specification. If we dig into the instruction descriptions there's 60 00:05:43,720 --> 00:05:52,570 also things like the assembly language of - the specification of the assembly syntax 61 00:05:52,570 --> 00:05:59,030 and. So, something I did a few years ago and which I just wrote a blog article 62 00:05:59,030 --> 00:06:05,190 about over the weekend was was it's possible to actually take that 63 00:06:05,190 --> 00:06:09,900 specification, turn it into a disassembler. So I first of all 64 00:06:09,900 --> 00:06:16,170 transformed it into the code I'm showing at the bottom. What this is showing is how 65 00:06:16,170 --> 00:06:18,250 to take a binary description of an 66 00:06:18,250 --> 00:06:25,710 instruction so that's the the top line of typewriter font and and then turn that 67 00:06:25,710 --> 00:06:31,470 into strings, which is what this the code at the bottom is describing how to do. So 68 00:06:31,470 --> 00:06:35,242 so you can use this as a disassembler. It's actually possible to run it in 69 00:06:35,242 --> 00:06:41,940 reverse and use it as an assembler; you basically run the code from bottom to top 70 00:06:41,940 --> 00:06:47,080 and you can turn strings into binary instructions. And we'll see more of this 71 00:06:47,080 --> 00:06:57,430 running things in reverse in a few slides time. So the main thing about instructions 72 00:06:57,430 --> 00:07:03,330 is of course they do something. So the specification contains a description of 73 00:07:03,330 --> 00:07:08,650 exactly what an instruction does and that description is as code, which, as a 74 00:07:08,650 --> 00:07:13,150 programmer, makes me very happy, right. I don't understand the English words in the 75 00:07:13,150 --> 00:07:17,900 specification but I do understand what the code does. So one thing you can do with 76 00:07:17,900 --> 00:07:22,221 code is execute it, so let's just walk through - let's suppose ... take an 77 00:07:22,221 --> 00:07:29,300 instruction and I match against that diagram there. I might get some values for 78 00:07:29,300 --> 00:07:35,680 for some of the variables from that and then I can walk through, step by step, 79 00:07:35,680 --> 00:07:40,380 evaluating this code, until I eventually realize that register five is having some 80 00:07:40,380 --> 00:07:43,889 value assigned to it. Okay, so that's a fairly basic thing you 81 00:07:43,889 --> 00:07:47,380 can do with the specification; interpreters are fairly easy thing to 82 00:07:47,380 --> 00:07:53,210 implement but once you have it there's a lot you can build on top of it. And one 83 00:07:53,210 --> 00:07:57,820 thing that's surprisingly easy to implement is to extract a symbolic 84 00:07:57,820 --> 00:08:01,680 representation of what the instruction does. So I'll just show you quickly how 85 00:08:01,680 --> 00:08:07,400 you do that, using the interpreter. So let's take those same concrete values but 86 00:08:07,400 --> 00:08:11,600 I'm also, I've added three other variables at the side there, which I'm going to 87 00:08:11,600 --> 00:08:16,620 treat as symbolic variables. And as I walk through the code I won't just calculate 88 00:08:16,620 --> 00:08:24,230 some concrete values, like the value five or 32, I'll also build up a graph, 89 00:08:24,230 --> 00:08:29,710 representing exactly how I came up with these values at five and so on. So as I'm 90 00:08:29,710 --> 00:08:34,830 executing the code I can build a graph representing exactly what this code is 91 00:08:34,830 --> 00:08:41,188 doing. And what I can do now is just throw away the code and focus on what that graph 92 00:08:41,188 --> 00:08:46,259 is telling me. So I now have a symbolic representation of 93 00:08:46,259 --> 00:08:51,540 one slice through that, through that instruction and I can feed that to a 94 00:08:51,540 --> 00:08:56,369 constraint solver, so if any of you have heard of Z3 or SMT solvers, that's what 95 00:08:56,369 --> 00:09:05,129 I'm talking about here. And a constraint solver is a really useful tool, because 96 00:09:05,129 --> 00:09:09,940 you can run code forwards through it, so given some input values you can say what 97 00:09:09,940 --> 00:09:16,930 are the outputs from this function or from this instruction but you can also run 98 00:09:16,930 --> 00:09:22,279 them backwards, given some output value, some final result you want 99 00:09:22,279 --> 00:09:26,790 to see, you can ask what inputs would cause this to happen. And this is just 100 00:09:26,790 --> 00:09:31,070 tremendously useful if you're trying to figure out what instructions you want to 101 00:09:31,070 --> 00:09:36,389 use to generate some particular effect. All right, so if you're trying to figure 102 00:09:36,389 --> 00:09:41,800 out how some particular register could be accessed, how some particular thing could 103 00:09:41,800 --> 00:09:47,449 be turned on or off, being able to ask what inputs will cause this to happen is 104 00:09:47,449 --> 00:09:53,569 incredibly useful. And you can also use the constraint solver to ask for 105 00:09:53,569 --> 00:09:58,069 intermediate values, in the middle of the calculation. You know if you know some 106 00:09:58,069 --> 00:10:02,859 value you want to see there you can ask what the inputs that would cause that to 107 00:10:02,859 --> 00:10:07,260 happen. So, if any of you are familiar with tools 108 00:10:07,260 --> 00:10:13,949 like KLEE, which is a symbolic execution tool for based on LLVM, then they use 109 00:10:13,949 --> 00:10:19,999 something similar to this. So, I've shown you are fairly simple draft, something I 110 00:10:19,999 --> 00:10:26,300 could show you how you build it, kind of on the fly. This is the actual graph for 111 00:10:26,300 --> 00:10:31,259 that same instruction. Here I've added in a lot more nodes to do with some of the 112 00:10:31,259 --> 00:10:35,980 functions that were being called and to do with the calculating, whether to take a 113 00:10:35,980 --> 00:10:46,279 branch or not. So this is about 80 or 90 nodes. And I've been experimenting with 114 00:10:46,279 --> 00:10:52,360 extending this in two ways. So this is just one path through the execution of an 115 00:10:52,360 --> 00:10:56,779 instruction, so one way to improve on that is to build a graph that represents all 116 00:10:56,779 --> 00:11:00,499 possible paths through the instruction. And that's much more useful, because you 117 00:11:00,499 --> 00:11:04,990 can you then can point at something and say "how can I make that happen?" and it 118 00:11:04,990 --> 00:11:11,920 will tell you exactly what inputs will cause the path that will make that happen. 119 00:11:11,920 --> 00:11:16,351 I've also been experimenting with taking the entire specification, right, so that 120 00:11:16,351 --> 00:11:20,509 stuff that handles exceptions, that fetches instructions or execute 121 00:11:20,509 --> 00:11:24,699 instructions. It contains all instructions. And I've been experimenting 122 00:11:24,699 --> 00:11:29,790 with building a graph representing the entire specification all at once. And 123 00:11:29,790 --> 00:11:33,470 that's even more useful, because now pretty much any question you have about 124 00:11:33,470 --> 00:11:36,720 the specification, you can ask the constraint solver and it will come back 125 00:11:36,720 --> 00:11:43,490 and give you an answer. And this graph for the full specification is quite large. 126 00:11:43,490 --> 00:11:48,540 It's about half a million nodes and that's for the smallest specification that our 127 00:11:48,540 --> 00:11:53,050 major uses. So it's about half a million nodes. But the great thing is modern 128 00:11:53,050 --> 00:11:58,009 constraint solvers can read in that half million nodes and still give you answers. 129 00:11:58,009 --> 00:12:04,040 Typically in about 1 to 10 seconds for most of the questions posed to them. So, 130 00:12:04,040 --> 00:12:08,259 these are just tremendously useful tools, if you're wanting to be able to understand 131 00:12:08,259 --> 00:12:15,670 exactly what the specification does, and exactly how some program is going to 132 00:12:15,670 --> 00:12:20,689 behave or figure out what program you want to write to make it behave some particular 133 00:12:20,689 --> 00:12:28,850 way. Okay so I've talked a lot about instructions, but most of us actually run 134 00:12:28,850 --> 00:12:33,990 programs, right? So to turn this the specification into something in execute 135 00:12:33,990 --> 00:12:38,260 programs, in other words turn it into a simulator for the ARM architecture, you 136 00:12:38,260 --> 00:12:43,839 need to add a little bit of a loop that will handle interrupts, fake instructions 137 00:12:43,839 --> 00:12:49,980 and then it can execute them and handle any exceptions. So, I... So I did this. I 138 00:12:49,980 --> 00:12:55,860 added this loop to this specification. And then, I thought I'd better try testing the 139 00:12:55,860 --> 00:13:03,610 specification. And, so the good news for me, because I work for ARM I have access 140 00:13:03,610 --> 00:13:08,259 to ARM's internal test suite. Which is something that ARM has been working on 141 00:13:08,259 --> 00:13:12,720 basically since the company started 25, 30 years ago. So it's quite a large test 142 00:13:12,720 --> 00:13:17,579 suite. It is extremely thorough, has tens of thousands of test programs in it, runs 143 00:13:17,579 --> 00:13:25,140 billions of instructions. And so I set about testing the, testing the 144 00:13:25,140 --> 00:13:30,119 specification using this test suite. And if any of you have tested software you'll 145 00:13:30,119 --> 00:13:33,620 be familiar with a graph that looks like this, right? The start things don't work 146 00:13:33,620 --> 00:13:39,529 all that well. Gradually you get things working pretty well but then there's a 147 00:13:39,529 --> 00:13:45,899 heavy tail of difficult to find bugs. Okay, so, and that's basically what I 148 00:13:45,899 --> 00:13:49,869 found when I was testing the specification. But you should all be 149 00:13:49,869 --> 00:13:56,379 shocked by what I just said. Because what I'm seeing is ARM's official specification 150 00:13:56,379 --> 00:14:03,290 could not pass ARM's official test suite when I started, right. I mean that's 151 00:14:03,290 --> 00:14:10,269 that's pretty shocking. And I'm telling you this and emphasizing it, not because 152 00:14:10,269 --> 00:14:16,850 I think ARM's specification was especially bad. I think it was just typically bad. I 153 00:14:16,850 --> 00:14:22,589 think if you were to take any specification for, you know, any real- 154 00:14:22,589 --> 00:14:27,819 world system and actually test it against the test suite, well first of all you'd 155 00:14:27,819 --> 00:14:32,389 find it's not an executable specification most the time and secondly you'd find it 156 00:14:32,389 --> 00:14:38,499 wouldn't work. And you'd probably find it would work as badly as ARM's did. So it's 157 00:14:38,499 --> 00:14:42,850 not just that it didn't pass all the tests. It didn't pass any tests. In fact 158 00:14:42,850 --> 00:14:47,809 it took me weeks to get the processor or the specification to come out of reset. 159 00:14:47,809 --> 00:14:54,009 Right? Just to get the starting fix to get the first instruction took weeks. So and I 160 00:14:54,009 --> 00:15:00,379 think so I think you'd find this if you were to try any other specification. 161 00:15:00,379 --> 00:15:11,129 What's my next slide? So, moving on to useful thing you can do with the 162 00:15:11,129 --> 00:15:16,959 specification, something we tried last summer was using it for fuzz testing. So, 163 00:15:16,959 --> 00:15:21,490 fuzz testing consists of taking a program and throwing random inputs at the 164 00:15:21,490 --> 00:15:27,959 program and just seeing what breaks and it pretty much always breaks and a modern 165 00:15:27,959 --> 00:15:34,329 fuzztester like maybe AFL to make it more effective. It monitors something 166 00:15:34,329 --> 00:15:39,351 about how the program is executing and uses that to guide its choice of what to 167 00:15:39,351 --> 00:15:45,181 change next. So, if it's finding...in particular monitor whether an 168 00:15:45,181 --> 00:15:50,179 instruction... whether the program is taking a branch or not and if it sees it's 169 00:15:50,179 --> 00:15:54,240 taking lots of new branches then it goes: "Ok I'll keep trying more of what I'm 170 00:15:54,240 --> 00:15:58,379 doing at the moment because it seems to be effective, and if it's and not finding any 171 00:15:58,379 --> 00:16:03,910 new branches, then it will look for something else to try changing. So, that's 172 00:16:03,910 --> 00:16:06,970 kind of a normal fuzzing. What you're doing is basically trying to kind of 173 00:16:06,970 --> 00:16:15,230 maximize your branch coverage. So, what we did though, when we did this with the 174 00:16:15,230 --> 00:16:20,389 specification, was, we actually monitored branches not just in the in the binary 175 00:16:20,389 --> 00:16:28,399 that we were analyzing but also in the specification. And what this gave us was 176 00:16:28,399 --> 00:16:33,040 basically: if you got.. Suppose, your the binary you're analyzing is just straight 177 00:16:33,040 --> 00:16:37,459 line code. There's no branches in it at all. Then there's nothing really for your 178 00:16:37,459 --> 00:16:41,519 fuzzer to work with right. So it doesn't see that the code is interesting, it 179 00:16:41,519 --> 00:16:47,199 quickly moves on to something else. But maybe your straight line cord is doing 180 00:16:47,199 --> 00:16:50,940 something really interesting, like accessing a privileged register. Well, 181 00:16:50,940 --> 00:16:54,899 there will be a branch around that to check that you do have the privilege you 182 00:16:54,899 --> 00:16:58,250 require. Or maybe it's accessing a memory in which 183 00:16:58,250 --> 00:17:03,649 cases memory protection checks. There's also checks for: Is this a device 184 00:17:03,649 --> 00:17:09,750 register? Or is this a kind of RAM or ROM? So, there's a number of different branches 185 00:17:09,750 --> 00:17:13,429 there and that gives the fuzzer interesting things to... interesting 186 00:17:13,429 --> 00:17:20,320 feedback to play with. So, when we set this going on one of our microkernel, 187 00:17:22,770 --> 00:17:30,130 we analyzed the system call interface for that microkernel. And one of the things 188 00:17:30,130 --> 00:17:35,330 the fuzzer surprised us with was it said: Suppose I take the stack pointer and point 189 00:17:35,330 --> 00:17:39,940 it into the middle of this device space and then take an exception immediately, 190 00:17:39,940 --> 00:17:45,530 what happens? And the answer was: there was a bug in the microkernel and what it 191 00:17:45,530 --> 00:17:48,679 did was: it first thing it would do is read from the stack, where the stack 192 00:17:48,679 --> 00:17:55,320 pointer was pointing, so we do a read from one of the devices, which doesn't wasn't 193 00:17:55,320 --> 00:18:00,060 really what was intended. In fact it completely breaks a security model. So, 194 00:18:00,060 --> 00:18:06,610 fuzztesting using not just coverage of the monitoring branches in the binary but 195 00:18:06,610 --> 00:18:10,940 also the specification can find you a bunch of really interesting things. And I 196 00:18:10,940 --> 00:18:20,110 hope some of you will I pick this idea up and run with it. So, the reason that I was 197 00:18:20,110 --> 00:18:29,290 doing all this work was I wanted to be able to formally verify ARM processors and 198 00:18:29,290 --> 00:18:34,769 so I needed to create a specification before I could do that. So, sorry, let me 199 00:18:34,769 --> 00:18:42,690 just take a drink here... So, this is an overly simplified picture of a processor, 200 00:18:42,690 --> 00:18:48,159 it's more or less what processors looked like 25, 30 years ago, in fact. 201 00:18:48,159 --> 00:18:53,170 But it's good enough for the example. So, if you want to check a processor, is 202 00:18:53,170 --> 00:18:59,770 correct, then what you can do is: add an extra logic, which will monitor the values 203 00:18:59,770 --> 00:19:03,140 at the start of an instruction executing the values that are finally produced at 204 00:19:03,140 --> 00:19:07,700 the end of an instruction executing, and then if you feed those the specification 205 00:19:07,700 --> 00:19:11,830 you can see what the right answer should have been. You can compare that with what 206 00:19:11,830 --> 00:19:17,620 the processor actually did. So, you can do this using a test-based approach, right: 207 00:19:17,620 --> 00:19:21,200 just feed in inputs and check that everything matches. 208 00:19:21,200 --> 00:19:25,570 But you can also do it using a formal verification tool called a "bounded model 209 00:19:25,570 --> 00:19:31,250 checker". And a bounded model checker is like one of those constraint solvers I 210 00:19:31,250 --> 00:19:36,760 mentioned earlier on crack cocaine. So, what it will do is: it won't just try kind 211 00:19:36,760 --> 00:19:41,799 of one step for what can happen but will actually try multiple steps: all possible 212 00:19:41,799 --> 00:19:46,130 combinations of inputs for one instruction, two instructions, three 213 00:19:46,130 --> 00:19:49,809 instructions, longer and longer sequences of instructions, looking for ways they can 214 00:19:49,809 --> 00:19:56,279 fail the property. So, and this turned out to be an incredibly effective way of 215 00:19:56,279 --> 00:20:00,530 finding bugs in our processors. We've actually, we're going to be using 216 00:20:00,530 --> 00:20:07,720 this on all future processor developments. So, there's a paper that we wrote about 217 00:20:07,720 --> 00:20:12,299 this but, I recommend that you go find the video for the top by Clifford 218 00:20:12,299 --> 00:20:20,620 Wolf from a couple of hours ago, which describes a very similar process. Okay, so 219 00:20:20,620 --> 00:20:25,779 I'm encouraging you to see the specification and find something awesome 220 00:20:25,779 --> 00:20:30,070 to do with it. There's a bunch of ideas I have suggested there, and there's a few 221 00:20:30,070 --> 00:20:36,850 extras which I didn't have time to describe here. But now, let me turn to 222 00:20:36,850 --> 00:20:43,350 this title of the talk: How can you trust formally verified software? So, what I'm 223 00:20:43,350 --> 00:20:50,200 talking about here is: verifying a program against some specification. But, of 224 00:20:50,200 --> 00:20:54,510 course, programs don't just run in a vacuum. They run into some operating 225 00:20:54,510 --> 00:21:01,559 system that, they use some libraries and they're also written in some programming 226 00:21:01,559 --> 00:21:08,210 language. And, so, when you verify your program against your specification, you're 227 00:21:08,210 --> 00:21:15,309 also verifying them against the specifications of Linux, glibc and ISO-C, 228 00:21:15,309 --> 00:21:21,940 none of which have good specifications. In fact, they're just terrible specifications 229 00:21:21,940 --> 00:21:27,550 which I bear little resemblance to what compilers and operating systems actually 230 00:21:27,550 --> 00:21:34,559 do in practice. So, if you... the current state of things is: if you do verify your 231 00:21:34,559 --> 00:21:39,580 program against these specifications, you will find lots of bugs. You will make your 232 00:21:39,580 --> 00:21:47,961 software a lot more reliable, but you'll be doing it against specifications, which 233 00:21:47,961 --> 00:21:51,639 are probably not very good. Just as ARM's specification was not 234 00:21:51,639 --> 00:21:57,850 very good before I tested it really thoroughly. And so: Do I trust formally 235 00:21:57,850 --> 00:22:03,350 verified software? No, not really. It's a lot better for being formally verified, 236 00:22:03,350 --> 00:22:06,591 but I'd also want to test it quite thoroughly and maybe to use a bit of fuzz 237 00:22:06,591 --> 00:22:15,549 testing as well. Okay, so this is my final slide, by the way. So, I'm encouraging you 238 00:22:15,549 --> 00:22:19,500 to go out and do something with the specification. If you're interested in 239 00:22:19,500 --> 00:22:25,779 this, then do contact me! Do ask me questions, if you have trouble. I'm also 240 00:22:25,779 --> 00:22:31,759 going to mention: if there's any white hat hackers out there in the... white hat 241 00:22:31,759 --> 00:22:36,639 hackers in the audience, then do please talk to me or Milisch Meriac who's here in 242 00:22:36,639 --> 00:22:44,750 the front row, if you're interested in working at ARM and I like to thank a whole 243 00:22:44,750 --> 00:22:50,150 lot of people at ARM and elsewhere, who've helped me in this work and also I'd like 244 00:22:50,150 --> 00:22:54,149 to thank you for giving me your attention for the last half hour. 245 00:22:54,149 --> 00:22:59,789 *Applause* 246 00:22:59,789 --> 00:23:03,330 Herald: So, we have time for some 247 00:23:03,330 --> 00:23:06,340 questions. I would ask anyone that has a question to line up at one of the 248 00:23:06,340 --> 00:23:10,559 microphones that are in the aisles here one through eight. Questions for Alastair 249 00:23:10,559 --> 00:23:14,019 there about formal verification, about working at ARM, how is the weather in 250 00:23:14,019 --> 00:23:19,539 Cambridge. Try to keep it on topic. And signal angel: do we have already questions 251 00:23:19,539 --> 00:23:22,590 from online? Signal Angel: No questions yet. 252 00:23:22,590 --> 00:23:25,409 Herald: Okay, then let's go to microphone number one. 253 00:23:25,409 --> 00:23:33,150 Microphone 1: Hi... I was just AR: Maybe tiptoes. 254 00:23:33,150 --> 00:23:38,009 MIC 1: Yes, I was just curious how are you actually using the machine specification 255 00:23:38,009 --> 00:23:42,360 in order to generate VCs for the SMG solver. Because you didn't really get a 256 00:23:42,360 --> 00:23:47,759 chance to talk about that I guess. AR: Trying to think how I can describe 257 00:23:47,759 --> 00:23:56,639 that briefly... My basic idea is to take the specification, which basically... the 258 00:23:56,639 --> 00:24:00,630 specification is describing a piece of hardware. And so, I try to do what a 259 00:24:00,630 --> 00:24:04,610 hardware engineer would do, if they were actually building a machine that 260 00:24:04,610 --> 00:24:09,360 implemented it. So I end up with something that's essentially a giant circuit. So, 261 00:24:09,360 --> 00:24:14,289 that was the graph I was describing. So, whenever this control flow, whenever the 262 00:24:14,289 --> 00:24:18,960 control flow joins back up, I have to introduce things to slide between the 263 00:24:18,960 --> 00:24:22,259 value of what was calculated in the left or the right. And so on. 264 00:24:22,259 --> 00:24:27,270 MIC 1: Yeah, I guess I'm mostly curious about, like, what logics you're using for, 265 00:24:27,270 --> 00:24:31,240 like, the solvers and stuff like that. AR: I see. Just very basic solvers because 266 00:24:31,240 --> 00:24:34,440 they run fastest. MIC 1: Then ugh 267 00:24:34,440 --> 00:24:40,539 H: Thank you. So, microphone 6 please. MIC 6: I was just wondering, if you could 268 00:24:40,539 --> 00:24:46,730 talk some little bit about the language you used to write your specification. 269 00:24:46,730 --> 00:24:54,309 AR: So this is a language which basically I inherited from ARM's documentation. So, 270 00:24:54,309 --> 00:24:58,440 all processors are described using pseudocode and what I realized was that 271 00:24:58,440 --> 00:25:02,789 the pseudocode the ARM had started writing was actually very close to being a 272 00:25:02,789 --> 00:25:06,879 language. And so, I sort of reverse engineered the language hiding inside the 273 00:25:06,879 --> 00:25:14,200 pseudocode, built some tools for it, and just kind of figured out what the language 274 00:25:14,200 --> 00:25:20,370 could possibly mean, given what I thought processors actually did. 275 00:25:20,370 --> 00:25:27,850 And the language itself is... it's just a very simple imperative language. It's 276 00:25:27,850 --> 00:25:33,740 actually got inherits from BBC basic for anyone who's about the same age as me and 277 00:25:33,740 --> 00:25:39,620 remembers BBC basic or it's a bit like Pascal... It's it's not a complicated 278 00:25:39,620 --> 00:25:44,059 language. It's actually designed so that as many people as possible can read it and 279 00:25:44,059 --> 00:25:47,540 know exactly what it says without any doubt, without having to consult a 280 00:25:47,540 --> 00:25:53,239 language lawyer. H: Signal angel? Yet anything? 281 00:25:53,239 --> 00:25:58,990 Signal Angel: Yes, now we've got a question: "Has ARM its own form of the 282 00:25:58,999 --> 00:26:01,099 Intel Management engine?" 283 00:26:08,950 --> 00:26:12,220 AR: No is the short answer. Yeah, I don't 284 00:26:12,220 --> 00:26:20,160 think we have anything quite like that. If you... yeah, I'll just say no. 285 00:26:20,160 --> 00:26:22,929 *Laughter* H: Microphone one. 286 00:26:22,929 --> 00:26:27,970 MIC 1: Hi! On that question that we had before about the specification language: 287 00:26:27,970 --> 00:26:33,779 Have you considered using Z3's own language for expressing the instructions? 288 00:26:33,779 --> 00:26:41,380 AR: So, Z3's own language is basically write kind of s-expressions, which, if you 289 00:26:41,380 --> 00:26:46,690 like lisp is wonderful but for anybody who doesn't like Lisp it's a bit of a turn-off 290 00:26:46,690 --> 00:26:50,169 and a bit of a barrier to understanding it. So, again the specification is 291 00:26:50,169 --> 00:26:54,410 designed so that people can look at it and go: "Oh, I see what's going on here", and 292 00:26:54,410 --> 00:26:57,259 not have... and I just try to minimize the barriers. 293 00:26:57,259 --> 00:27:03,230 MIC 1: Fair enough! H: Last call, signal angel! 294 00:27:04,684 --> 00:27:07,923 SA: How long does the complete test of the arms specification take? 295 00:27:09,540 --> 00:27:20,929 AR: About two years. So, yeah, so a modern processor team about 80% of that team will 296 00:27:20,929 --> 00:27:26,679 be verification engineers. And, so, they're basically writing new tests, 297 00:27:26,679 --> 00:27:30,150 running old tests, diagnosing them, just doing that continuously for the entire 298 00:27:30,150 --> 00:27:34,960 life of a project, which is probably about three years. And after about the first 299 00:27:34,960 --> 00:27:38,860 year, you basically have a processor that it mostly works, and after that it's kind of 300 00:27:38,860 --> 00:27:47,590 debugging it and it's, you know, kind of fine-tuning the performance. So, yeah, it 301 00:27:47,590 --> 00:27:53,929 takes a really long time. To run the actual tests, I don't actually know 302 00:27:53,929 --> 00:28:00,400 because actually one of my colleagues in the audience, they've actually run the 303 00:28:00,400 --> 00:28:07,960 tests. But, yeah, I don't know... and we use a lot of processors in parallel, so I 304 00:28:07,960 --> 00:28:12,620 really don't have an idea. H: Thank you so much, Alastair! Let's give 305 00:28:12,620 --> 00:28:18,476 him another warm round of applause! *Applause* 306 00:28:18,476 --> 00:28:24,017 *34c3 outro* 307 00:28:24,017 --> 00:28:40,000 subtitles created by c3subtitles.de in the year 2018. Join, and help us!