0 00:00:00,000 --> 00:00:30,000 Dear viewer, these subtitles were generated by a machine via the service Trint and therefore are (very) buggy. If you are capable, please help us to create good quality subtitles: https://c3subtitles.de/talk/57 Thanks! 1 00:00:08,950 --> 00:00:11,049 Mathias is a student 2 00:00:11,050 --> 00:00:13,149 at UC Berkeley 3 00:00:13,150 --> 00:00:15,849 and Boston postdoc 4 00:00:17,440 --> 00:00:18,440 Worries. 5 00:00:18,940 --> 00:00:21,189 And he's really interested 6 00:00:21,190 --> 00:00:22,479 in security, virtualization, 7 00:00:22,480 --> 00:00:23,589 exploitation. 8 00:00:23,590 --> 00:00:25,809 And he spoke with me about using, 9 00:00:25,810 --> 00:00:27,969 um, um, 10 00:00:27,970 --> 00:00:30,039 symbolic execution as a powerful way to 11 00:00:30,040 --> 00:00:31,119 analyze programs. 12 00:00:31,120 --> 00:00:32,039 Thank you very much. 13 00:00:32,040 --> 00:00:34,219 Hey, thanks for the introduction. 14 00:00:34,220 --> 00:00:35,220 Welcome. 15 00:00:38,240 --> 00:00:39,529 No worries, no worries. 16 00:00:39,530 --> 00:00:41,929 I was perfectly fine, 17 00:00:41,930 --> 00:00:43,639 good to be here. Good to be back at the 18 00:00:43,640 --> 00:00:45,739 CCC last 19 00:00:45,740 --> 00:00:47,779 year. Unfortunately, I didn't make it, 20 00:00:47,780 --> 00:00:50,299 although I made it here to 21 00:00:50,300 --> 00:00:52,069 the tenth time now. 22 00:00:52,070 --> 00:00:53,779 So 10 times in 11 years. 23 00:00:53,780 --> 00:00:55,489 And I really missed it last year and I'm 24 00:00:55,490 --> 00:00:56,419 so happy to be back. 25 00:00:56,420 --> 00:00:58,519 It's awesome. Even though the location 26 00:00:58,520 --> 00:01:00,499 is a bit different from Berlin, it's 27 00:01:00,500 --> 00:01:01,999 really nice to be to be back 28 00:01:03,170 --> 00:01:04,669 for this talk. 29 00:01:04,670 --> 00:01:06,829 If the lights work at one point in time, 30 00:01:06,830 --> 00:01:08,899 I hope the debacle in the room 31 00:01:08,900 --> 00:01:10,700 one doesn't repeat itself. 32 00:01:12,950 --> 00:01:14,539 Can you get. OK, awesome. 33 00:01:14,540 --> 00:01:15,540 Awesome. 34 00:01:16,320 --> 00:01:18,529 Let me take you on a journey down 35 00:01:18,530 --> 00:01:20,699 to a rabbit hole, that 36 00:01:20,700 --> 00:01:21,849 symbolic execution. 37 00:01:23,120 --> 00:01:25,309 So and let's follow Alice 38 00:01:25,310 --> 00:01:27,799 down that down hole and 39 00:01:27,800 --> 00:01:29,869 see how far we can go. 40 00:01:29,870 --> 00:01:31,819 Symbolic execution has had a bad 41 00:01:31,820 --> 00:01:34,429 reputation for quite a while now. 42 00:01:34,430 --> 00:01:36,649 Many say it's it's overhyped, 43 00:01:36,650 --> 00:01:38,180 it's oversold and 44 00:01:39,500 --> 00:01:41,359 people are promising too much. 45 00:01:41,360 --> 00:01:43,789 And in this talk, I want to discuss 46 00:01:43,790 --> 00:01:45,919 with you what it can do and also 47 00:01:45,920 --> 00:01:47,149 what it can do. 48 00:01:47,150 --> 00:01:49,249 So what are the limitations of 49 00:01:49,250 --> 00:01:50,269 symbolic executions? 50 00:01:50,270 --> 00:01:51,589 How far can we go? 51 00:01:51,590 --> 00:01:53,779 What kind of things can we explore and so 52 00:01:53,780 --> 00:01:54,679 on? 53 00:01:54,680 --> 00:01:56,509 What are the limitations and how can we 54 00:01:56,510 --> 00:01:57,590 get around them? 55 00:01:59,810 --> 00:02:02,209 This is joint work with Stephen McCammon, 56 00:02:02,210 --> 00:02:04,519 who started all the amazing 57 00:02:04,520 --> 00:02:06,649 Fuzzball stuff and who is the main 58 00:02:06,650 --> 00:02:09,049 contributor of the symbolic execution 59 00:02:09,050 --> 00:02:11,179 engine and maintainer 60 00:02:11,180 --> 00:02:13,219 of the engine as well then. 61 00:02:13,220 --> 00:02:15,439 And Alex, who did amazing reverse 62 00:02:15,440 --> 00:02:17,479 engineering work while they were at UC 63 00:02:17,480 --> 00:02:19,609 Berkeley and a whole bunch of 64 00:02:19,610 --> 00:02:21,739 other students, interns and so on, that 65 00:02:21,740 --> 00:02:23,329 really helped us get along and get us 66 00:02:23,330 --> 00:02:23,989 that far. 67 00:02:23,990 --> 00:02:26,179 So almost 68 00:02:26,180 --> 00:02:27,589 all of the praise goes to them. 69 00:02:27,590 --> 00:02:29,899 I'm just delivering the message 70 00:02:29,900 --> 00:02:31,129 to you. 71 00:02:31,130 --> 00:02:33,649 So let's start the some of the 72 00:02:33,650 --> 00:02:35,269 of the preconditions when been following 73 00:02:35,270 --> 00:02:36,270 down at whole. 74 00:02:36,980 --> 00:02:39,529 So there are actually a large 75 00:02:39,530 --> 00:02:41,809 set of tools 76 00:02:41,810 --> 00:02:44,059 to help you find books and 77 00:02:44,060 --> 00:02:45,979 discover books and so on. 78 00:02:45,980 --> 00:02:48,379 And we all know that finding books 79 00:02:48,380 --> 00:02:50,089 and creches, that's the easy part. 80 00:02:50,090 --> 00:02:52,729 Right. So getting something to crash, 81 00:02:52,730 --> 00:02:55,129 getting a six-fold, that's easy. 82 00:02:55,130 --> 00:02:57,229 There's fuzzing, there's bound up 83 00:02:57,230 --> 00:02:59,299 model checking and 84 00:02:59,300 --> 00:03:01,069 test cases and a whole bunch of other 85 00:03:01,070 --> 00:03:03,379 things that can help you to to get 86 00:03:03,380 --> 00:03:05,569 to these books or to to actually 87 00:03:05,570 --> 00:03:07,189 trigger the books. 88 00:03:07,190 --> 00:03:09,379 But the hard part then comes 89 00:03:09,380 --> 00:03:11,599 how can we generate an exploit 90 00:03:11,600 --> 00:03:13,999 or how can we find a specific 91 00:03:14,000 --> 00:03:16,069 form of input's that travels along 92 00:03:16,070 --> 00:03:18,319 the path, inside the application 93 00:03:18,320 --> 00:03:20,389 that will then trigger this vulnerability 94 00:03:20,390 --> 00:03:22,519 conditions deep, maybe deep inside the 95 00:03:22,520 --> 00:03:23,689 program? 96 00:03:23,690 --> 00:03:25,849 How can we actually define all 97 00:03:25,850 --> 00:03:28,099 these little things that flow through 98 00:03:28,100 --> 00:03:30,559 the application that end up 99 00:03:30,560 --> 00:03:32,629 and come together at the vulnerability 100 00:03:32,630 --> 00:03:34,309 point where the bug is to actually 101 00:03:34,310 --> 00:03:36,049 trigger the vulnerability? 102 00:03:36,050 --> 00:03:38,299 Also, how can we cope with all 103 00:03:38,300 --> 00:03:40,609 the transformations that the input 104 00:03:40,610 --> 00:03:42,769 undergoes until it reaches the final 105 00:03:42,770 --> 00:03:43,939 point? 106 00:03:43,940 --> 00:03:46,039 And we'll look into how we can 107 00:03:46,040 --> 00:03:48,109 use symbolic execution to 108 00:03:48,110 --> 00:03:50,239 kind of trigger this point and 109 00:03:50,240 --> 00:03:52,579 follow down the chain deeper and deeper 110 00:03:52,580 --> 00:03:53,580 into the rabbit hole. 111 00:03:55,460 --> 00:03:57,979 So on a very high level, what we do, 112 00:03:57,980 --> 00:04:00,229 we assume that we have found some 113 00:04:00,230 --> 00:04:02,389 vulnerability condition in inside 114 00:04:02,390 --> 00:04:04,399 the application, maybe deep inside the 115 00:04:04,400 --> 00:04:06,619 application, hidden behind some 116 00:04:06,620 --> 00:04:08,719 form of transformation's and so on. 117 00:04:08,720 --> 00:04:10,309 And what we do is starting from that 118 00:04:10,310 --> 00:04:13,159 transformation. We walk back step 119 00:04:13,160 --> 00:04:15,409 by step by step by 120 00:04:15,410 --> 00:04:17,569 step until we 121 00:04:17,570 --> 00:04:18,889 hit the input of the program. 122 00:04:18,890 --> 00:04:21,169 Some input that we can control, actually, 123 00:04:21,170 --> 00:04:22,909 and we will then generate a proof of 124 00:04:22,910 --> 00:04:25,519 concept input that follows down the 125 00:04:25,520 --> 00:04:27,709 chain of transformations until it hits 126 00:04:27,710 --> 00:04:30,139 the vulnerability and for all that 127 00:04:30,140 --> 00:04:30,979 we use. 128 00:04:30,980 --> 00:04:33,289 So until now, abstract 129 00:04:33,290 --> 00:04:35,419 form of symbolic execution, our 130 00:04:35,420 --> 00:04:38,149 magic box that reverts 131 00:04:38,150 --> 00:04:40,249 and inverses all these transformations 132 00:04:40,250 --> 00:04:42,469 to give us the inputs that we 133 00:04:42,470 --> 00:04:44,659 want to look at and 134 00:04:44,660 --> 00:04:46,759 we'll discuss what kind of tools 135 00:04:46,760 --> 00:04:49,219 for mystic's and optimizations we can do 136 00:04:49,220 --> 00:04:51,319 to make it actually feasible and scaled 137 00:04:51,320 --> 00:04:53,419 up to two larger systems so that we 138 00:04:53,420 --> 00:04:55,459 can actually work with real programs and 139 00:04:55,460 --> 00:04:57,679 not just little 20 line toyi programs 140 00:04:57,680 --> 00:05:00,019 that some of the 141 00:05:00,020 --> 00:05:01,239 papers are talking about. 142 00:05:02,240 --> 00:05:03,499 So and 143 00:05:04,610 --> 00:05:06,679 after Demotivation as a next step, I 144 00:05:06,680 --> 00:05:09,199 will quickly introduce symbolic execution 145 00:05:09,200 --> 00:05:11,299 as a technique, how it works, 146 00:05:11,300 --> 00:05:13,579 what kind of tool exists, and 147 00:05:13,580 --> 00:05:15,969 maybe a little bit at the 148 00:05:15,970 --> 00:05:18,199 at the limitations of these tools. 149 00:05:19,400 --> 00:05:21,829 So what actually 150 00:05:21,830 --> 00:05:23,480 is symbolic execution? 151 00:05:24,590 --> 00:05:27,199 So on one hand, 152 00:05:27,200 --> 00:05:29,569 it's an abstract interpretation 153 00:05:29,570 --> 00:05:30,570 of code 154 00:05:32,240 --> 00:05:35,089 instead of using concrete values 155 00:05:35,090 --> 00:05:37,379 when we execute individual and. 156 00:05:37,380 --> 00:05:39,749 We use symbolic values and carry 157 00:05:39,750 --> 00:05:42,539 these symbolic values along the execution 158 00:05:42,540 --> 00:05:44,879 of the program, so we kind of virtualize 159 00:05:44,880 --> 00:05:46,919 it and instead of concrete values, we use 160 00:05:46,920 --> 00:05:48,989 symbolic formulas that get longer 161 00:05:48,990 --> 00:05:51,869 and longer the more instructions that we 162 00:05:51,870 --> 00:05:53,939 execute and with all the additional site 163 00:05:53,940 --> 00:05:55,730 conditions and so on that we carry along. 164 00:05:57,000 --> 00:05:59,069 Also, it's agnostic 165 00:05:59,070 --> 00:06:01,229 to concrete values as every value that 166 00:06:01,230 --> 00:06:03,359 we have, every concrete value turns into 167 00:06:03,360 --> 00:06:04,679 formula. 168 00:06:04,680 --> 00:06:06,419 And then if we have additional 169 00:06:06,420 --> 00:06:08,789 constraints like an end to some values, 170 00:06:08,790 --> 00:06:10,889 some static value or some 171 00:06:12,000 --> 00:06:14,189 some Solectron or anything like that, 172 00:06:14,190 --> 00:06:17,399 then the concretization helps to reduce 173 00:06:17,400 --> 00:06:19,139 the complexity of the formulas that we 174 00:06:19,140 --> 00:06:20,429 have to carry all along. 175 00:06:23,250 --> 00:06:24,600 And the goal is 176 00:06:25,920 --> 00:06:28,949 if we finish the transformation 177 00:06:28,950 --> 00:06:31,680 and can produce a specific 178 00:06:32,730 --> 00:06:34,989 condition that holds to all the 179 00:06:34,990 --> 00:06:37,239 the past. Along the way, we can find 180 00:06:37,240 --> 00:06:39,449 the set of concrete inputs that 181 00:06:39,450 --> 00:06:41,579 will actually fulfill the condition that 182 00:06:41,580 --> 00:06:43,119 we have in the end. 183 00:06:43,120 --> 00:06:45,479 And we can then use this one 184 00:06:45,480 --> 00:06:48,149 to trigger some interesting condition. 185 00:06:48,150 --> 00:06:50,519 So let's drink the Kool-Aid 186 00:06:50,520 --> 00:06:52,919 and look what we can do with it. 187 00:06:52,920 --> 00:06:55,529 So for our vulnerability, 188 00:06:55,530 --> 00:06:57,779 for our example, we define a 189 00:06:57,780 --> 00:07:00,089 set of abstract 190 00:07:00,090 --> 00:07:02,819 conditions at specific locations. 191 00:07:02,820 --> 00:07:05,009 And symbolic execution is then used 192 00:07:05,010 --> 00:07:07,109 to trigger an input 193 00:07:07,110 --> 00:07:09,239 to generate an input that we can use 194 00:07:09,240 --> 00:07:11,339 to actually trigger that's that 195 00:07:11,340 --> 00:07:13,409 vulnerability at this other location. 196 00:07:13,410 --> 00:07:15,419 And there are two kind of ways that 197 00:07:15,420 --> 00:07:17,189 symbolic execution is used for. 198 00:07:17,190 --> 00:07:18,659 And one of them is testing. 199 00:07:18,660 --> 00:07:21,029 So for finding bugs in applications. 200 00:07:21,030 --> 00:07:23,279 And the conditions that we encode are 201 00:07:23,280 --> 00:07:25,409 a set of preconditions and post 202 00:07:25,410 --> 00:07:27,539 conditions that we want 203 00:07:27,540 --> 00:07:30,419 you to hold at specific locations. 204 00:07:30,420 --> 00:07:32,639 So maybe a specific condition 205 00:07:32,640 --> 00:07:34,479 would be that the return instruction 206 00:07:34,480 --> 00:07:37,349 pointer on the stack is never overwritten 207 00:07:37,350 --> 00:07:39,539 with some other values, or 208 00:07:39,540 --> 00:07:41,549 we use the assertions that are already in 209 00:07:41,550 --> 00:07:43,619 the code and we use 210 00:07:43,620 --> 00:07:45,749 symbolic execution to ensure that 211 00:07:45,750 --> 00:07:47,519 there's no inputs that can trigger these 212 00:07:47,520 --> 00:07:49,649 assertions under this specific set 213 00:07:49,650 --> 00:07:50,650 of conditions. 214 00:07:52,140 --> 00:07:54,599 Or an other option is 215 00:07:54,600 --> 00:07:56,699 to use symbolic execution 216 00:07:56,700 --> 00:07:58,709 for exploit generation. 217 00:07:58,710 --> 00:08:00,419 So we have a set of conditions. 218 00:08:00,420 --> 00:08:03,179 If these conditions would hold, then 219 00:08:03,180 --> 00:08:05,429 we would be able to exploit the 220 00:08:05,430 --> 00:08:06,899 program in one way or another. 221 00:08:06,900 --> 00:08:09,659 And we then use symbolic execution to 222 00:08:09,660 --> 00:08:12,479 generate the inputs that triggers 223 00:08:12,480 --> 00:08:14,730 this safety violation on the 224 00:08:15,870 --> 00:08:16,870 on the higher level. 225 00:08:17,970 --> 00:08:18,970 So. 226 00:08:20,540 --> 00:08:22,939 Our goal is to kind 227 00:08:22,940 --> 00:08:25,129 of revert or let the machine do 228 00:08:25,130 --> 00:08:27,589 most of the work, but we only define 229 00:08:27,590 --> 00:08:29,689 a set of conditions we are interested in 230 00:08:29,690 --> 00:08:31,759 and then let the machine do all 231 00:08:31,760 --> 00:08:33,769 the work and let us just look at the 232 00:08:33,770 --> 00:08:35,269 final result. 233 00:08:35,270 --> 00:08:36,678 What kind of tools are out there? 234 00:08:36,679 --> 00:08:38,839 What what what kind of how does the 235 00:08:38,840 --> 00:08:40,158 gang look like that? We have. 236 00:08:40,159 --> 00:08:42,439 So there are three major symbolic 237 00:08:42,440 --> 00:08:44,599 execution tools that all cater to 238 00:08:44,600 --> 00:08:46,039 different needs. 239 00:08:46,040 --> 00:08:49,099 One of them is Fuzzball, the one that 240 00:08:49,100 --> 00:08:52,039 we developed at UC Berkeley, and 241 00:08:52,040 --> 00:08:54,139 we are selling it as a tool for 242 00:08:55,400 --> 00:08:57,289 proof of concept exploit generation. 243 00:08:57,290 --> 00:08:59,689 A proof of concept exploit is not 244 00:08:59,690 --> 00:09:01,519 something that is 245 00:09:02,780 --> 00:09:04,369 exploited. You can shoot out and execute 246 00:09:04,370 --> 00:09:05,929 your code right away, but it just 247 00:09:05,930 --> 00:09:07,639 triggers a condition. 248 00:09:07,640 --> 00:09:09,739 Maybe it overwrites the IP and 249 00:09:09,740 --> 00:09:11,539 redirect it redirected to somewhere else 250 00:09:11,540 --> 00:09:13,339 and you can then use it, use this one to 251 00:09:13,340 --> 00:09:14,479 bootstrap it further on. 252 00:09:15,530 --> 00:09:16,909 All these tools are open source, of 253 00:09:16,910 --> 00:09:18,409 course, and you can download them and 254 00:09:18,410 --> 00:09:21,229 play around and do your own stuff. 255 00:09:21,230 --> 00:09:23,659 S3 and other awesome tool developed 256 00:09:23,660 --> 00:09:26,389 at CPFL in Switzerland, 257 00:09:26,390 --> 00:09:28,399 which offers you selective symbolic 258 00:09:28,400 --> 00:09:30,769 execution on a large, large scale 259 00:09:30,770 --> 00:09:32,599 system. And you can trigger these 260 00:09:32,600 --> 00:09:35,389 symbolic values along the execution of 261 00:09:35,390 --> 00:09:37,490 of larger applications and systems 262 00:09:39,710 --> 00:09:41,809 and Clie and other 263 00:09:41,810 --> 00:09:44,149 symbolic execution tool that's been sold 264 00:09:44,150 --> 00:09:46,219 to or developed to find 265 00:09:46,220 --> 00:09:48,649 bugs in and applications 266 00:09:48,650 --> 00:09:50,330 and in source code and so on. 267 00:09:52,190 --> 00:09:55,249 All of those, as I said, are open source 268 00:09:55,250 --> 00:09:56,599 and many of them have 269 00:09:58,460 --> 00:10:00,439 helpful information on how you can 270 00:10:00,440 --> 00:10:02,169 bootstrap and and get along. 271 00:10:03,560 --> 00:10:06,889 So now that you know that the concept 272 00:10:06,890 --> 00:10:09,379 of symbolic execution a little bit, 273 00:10:09,380 --> 00:10:11,209 I want to give you a concrete example so 274 00:10:11,210 --> 00:10:13,369 that you can actually see what the what 275 00:10:13,370 --> 00:10:15,559 the power is and 276 00:10:15,560 --> 00:10:17,929 what better example than one of the 277 00:10:17,930 --> 00:10:20,119 vortex war games. 278 00:10:20,120 --> 00:10:21,120 These are 279 00:10:22,400 --> 00:10:24,589 hacker games where you have to try 280 00:10:24,590 --> 00:10:26,599 to exploit specific conditions and 281 00:10:26,600 --> 00:10:28,669 programs and they are really good 282 00:10:28,670 --> 00:10:30,889 to develop your skills and get 283 00:10:30,890 --> 00:10:33,319 used to to coding and vulnerabilities 284 00:10:33,320 --> 00:10:35,689 and so on. And I thought, well, 285 00:10:35,690 --> 00:10:37,129 I don't like to think that much. 286 00:10:37,130 --> 00:10:39,199 So let's see if we 287 00:10:39,200 --> 00:10:41,599 can use symbolic execution to actually 288 00:10:41,600 --> 00:10:43,819 help us solve one of these 289 00:10:43,820 --> 00:10:45,229 deals for games. 290 00:10:45,230 --> 00:10:47,269 And this one is one of the early ones. 291 00:10:47,270 --> 00:10:49,069 I guess it's number one or number two or 292 00:10:49,070 --> 00:10:51,769 something like that out of thirtyish. 293 00:10:51,770 --> 00:10:54,109 And it's a program 294 00:10:54,110 --> 00:10:55,729 that reads and input's transfers. 295 00:10:55,730 --> 00:10:57,829 It does some stuff and 296 00:10:58,940 --> 00:11:01,099 then triggers a shell and 297 00:11:01,100 --> 00:11:02,689 drops you into a shell. 298 00:11:02,690 --> 00:11:04,789 And if you look at a code a little bit, 299 00:11:04,790 --> 00:11:07,039 we see that somehow we have to 300 00:11:07,040 --> 00:11:09,169 overwrite the pointer with, 301 00:11:09,170 --> 00:11:11,299 okay, and then we'll trigger 302 00:11:11,300 --> 00:11:12,589 our win. 303 00:11:12,590 --> 00:11:14,929 So how can we trigger that? 304 00:11:14,930 --> 00:11:17,059 It's down at the default case 305 00:11:17,060 --> 00:11:19,279 in the big switch where the input is loop 306 00:11:19,280 --> 00:11:21,559 through and execute it and 307 00:11:21,560 --> 00:11:22,519 oh let's see. 308 00:11:22,520 --> 00:11:23,859 Oh, we actually have a 309 00:11:25,130 --> 00:11:26,689 right winnability, but we can actually 310 00:11:26,690 --> 00:11:28,789 write some 311 00:11:28,790 --> 00:11:30,889 values into memory 312 00:11:30,890 --> 00:11:33,020 that are protected by a guard 313 00:11:34,040 --> 00:11:36,199 if he said if he cannot write after 314 00:11:36,200 --> 00:11:37,429 the end of the buffer. 315 00:11:37,430 --> 00:11:38,929 But luckily we can write before the 316 00:11:38,930 --> 00:11:41,089 buffer and we see 317 00:11:41,090 --> 00:11:43,039 that pointer is initialized as a static. 318 00:11:43,040 --> 00:11:44,809 All these you know, we know the address 319 00:11:44,810 --> 00:11:48,229 relative to the to that other buffer 320 00:11:48,230 --> 00:11:50,719 we can using a special character, 321 00:11:50,720 --> 00:11:52,999 we can decrements the pointer and 322 00:11:53,000 --> 00:11:55,339 move it back and forth along 323 00:11:55,340 --> 00:11:56,340 the 324 00:11:57,680 --> 00:12:00,619 along to execute a code or a long memory. 325 00:12:00,620 --> 00:12:03,079 And in the end, it looks like this 326 00:12:03,080 --> 00:12:05,299 pointer points to Buffer and our switch 327 00:12:05,300 --> 00:12:07,309 case. We have three different versions of 328 00:12:07,310 --> 00:12:08,310 ternary 329 00:12:09,620 --> 00:12:12,469 selection between a debug output 330 00:12:12,470 --> 00:12:15,139 that commanding the pointer and 331 00:12:15,140 --> 00:12:17,569 either triggering our win 332 00:12:17,570 --> 00:12:19,729 or writing 333 00:12:19,730 --> 00:12:20,730 a value to memory. 334 00:12:21,710 --> 00:12:23,959 This sounds still very abstract and 335 00:12:23,960 --> 00:12:26,119 the problem size is three to the power 336 00:12:26,120 --> 00:12:28,279 of NP that we have to walk through 337 00:12:28,280 --> 00:12:30,469 and we'll just look at the at a quick 338 00:12:30,470 --> 00:12:32,720 demo and I hope that I can switch over. 339 00:12:35,060 --> 00:12:36,789 Yeah, looks good. 340 00:12:46,460 --> 00:12:47,869 So we do have 341 00:12:49,580 --> 00:12:52,009 our code here instead of 342 00:12:53,180 --> 00:12:54,859 dropping us into a shell. 343 00:12:54,860 --> 00:12:56,929 I changed changed it into a clean 344 00:12:56,930 --> 00:12:59,059 exit with an exit number. 345 00:12:59,060 --> 00:13:01,189 Twenty three just makes it a little bit 346 00:13:01,190 --> 00:13:04,219 easier. And I help 347 00:13:04,220 --> 00:13:05,929 symbolic execution a bit 348 00:13:08,330 --> 00:13:10,399 by allocating 349 00:13:10,400 --> 00:13:12,769 the buffer at a specific 350 00:13:12,770 --> 00:13:14,989 offset instead 351 00:13:14,990 --> 00:13:16,639 of at a random address. 352 00:13:17,960 --> 00:13:20,119 And in the end, we'll 353 00:13:20,120 --> 00:13:22,519 just walk through the 354 00:13:22,520 --> 00:13:23,809 through the buffer and let symbolic 355 00:13:23,810 --> 00:13:25,579 execution handle it. 356 00:13:25,580 --> 00:13:27,589 I remove the debug output. 357 00:13:27,590 --> 00:13:29,539 So if Subotic execution engine would 358 00:13:29,540 --> 00:13:30,859 trigger the debug, I don't want to 359 00:13:30,860 --> 00:13:33,049 clutter to the 360 00:13:33,050 --> 00:13:34,050 screen all the time. 361 00:13:35,630 --> 00:13:38,029 So let's compile it, 362 00:13:38,030 --> 00:13:40,159 compiled it and let's 363 00:13:40,160 --> 00:13:42,349 see what kind of stuff it generated 364 00:13:42,350 --> 00:13:44,389 a large set of conditions. 365 00:13:44,390 --> 00:13:46,729 We started in our symbolic execution 366 00:13:46,730 --> 00:13:49,129 engine. We assume Linux system 367 00:13:49,130 --> 00:13:50,130 called model. 368 00:13:53,220 --> 00:13:55,849 We virtualize the different components. 369 00:13:55,850 --> 00:13:56,850 We set 370 00:13:58,190 --> 00:14:00,469 six symbolic input bytes 371 00:14:00,470 --> 00:14:02,569 at our input buffer that we want 372 00:14:02,570 --> 00:14:04,189 our symbolic execution engine to 373 00:14:04,190 --> 00:14:05,190 generate. 374 00:14:06,200 --> 00:14:08,479 And we give it a start address 375 00:14:08,480 --> 00:14:10,669 where we start the symbolic execution and 376 00:14:10,670 --> 00:14:12,619 address where we end symbolic execution 377 00:14:12,620 --> 00:14:14,659 and let's just run it so the system 378 00:14:14,660 --> 00:14:16,009 initializes. 379 00:14:16,010 --> 00:14:18,559 We have a set of system calls 380 00:14:18,560 --> 00:14:20,059 and now it's trying different inputs. 381 00:14:20,060 --> 00:14:22,039 It tries to follow all the different 382 00:14:22,040 --> 00:14:24,079 states that are possible and tries to 383 00:14:24,080 --> 00:14:25,909 generate the inputs that will trigger our 384 00:14:25,910 --> 00:14:27,499 win at one point in time. 385 00:14:27,500 --> 00:14:29,659 And you see that all these ones 386 00:14:29,660 --> 00:14:31,699 that you see scrolling by are generic 387 00:14:31,700 --> 00:14:34,189 inputs and it works through 388 00:14:34,190 --> 00:14:37,219 the backslash that triggers the decrement 389 00:14:37,220 --> 00:14:39,679 right values and 390 00:14:39,680 --> 00:14:40,680 specific 391 00:14:42,140 --> 00:14:43,909 print debug prints. 392 00:14:43,910 --> 00:14:45,830 It removes the debug prints pretty soon. 393 00:14:46,970 --> 00:14:48,589 And now we see that it's already finished 394 00:14:48,590 --> 00:14:50,149 in roughly twenty two seconds on this 395 00:14:50,150 --> 00:14:52,639 poor little netbook that I have here. 396 00:14:52,640 --> 00:14:54,379 And it's running on the netbook, not on 397 00:14:54,380 --> 00:14:56,030 the on the cluster. 398 00:14:57,500 --> 00:15:00,469 And we see that it generated a set of 399 00:15:00,470 --> 00:15:03,079 back slashes to decrement the pointer 400 00:15:03,080 --> 00:15:05,329 it writes and see 401 00:15:05,330 --> 00:15:07,129 which is two hundred and two in decimal 402 00:15:07,130 --> 00:15:09,319 and then you can write any value to 403 00:15:09,320 --> 00:15:11,719 actually trigger the win condition. 404 00:15:11,720 --> 00:15:13,849 And we just use symbolic execution 405 00:15:13,850 --> 00:15:16,339 to solve a vortex war game in 406 00:15:16,340 --> 00:15:19,399 three minutes, which is kind of nice. 407 00:15:19,400 --> 00:15:21,139 When I solved it the first time, it took 408 00:15:21,140 --> 00:15:22,610 me like three hours or so. 409 00:15:23,630 --> 00:15:25,039 So that makes it a bit easier. 410 00:15:27,020 --> 00:15:28,159 But let's continue 411 00:15:30,410 --> 00:15:32,929 so we now know what symbolic execution 412 00:15:32,930 --> 00:15:34,339 is capable of. 413 00:15:34,340 --> 00:15:36,409 But how far can 414 00:15:36,410 --> 00:15:37,999 we stretch symbolic execution? 415 00:15:38,000 --> 00:15:39,559 What are the limitations that will run 416 00:15:39,560 --> 00:15:41,089 into it? 417 00:15:41,090 --> 00:15:42,529 Looks kind of a nice tool, right? 418 00:15:42,530 --> 00:15:44,809 So if we could just use it to write our 419 00:15:44,810 --> 00:15:46,579 up exploits or upside for us, that would 420 00:15:46,580 --> 00:15:47,779 be perfect. Right. 421 00:15:47,780 --> 00:15:49,399 But unfortunately, that's not going to 422 00:15:49,400 --> 00:15:51,169 work. So we're going to look at the 423 00:15:51,170 --> 00:15:53,569 different limitations regarding the input 424 00:15:53,570 --> 00:15:55,729 size, the amount of symbolic bytes 425 00:15:55,730 --> 00:15:58,249 that we have and the amount of code 426 00:15:58,250 --> 00:16:00,259 that we can cover symbolically because we 427 00:16:00,260 --> 00:16:02,659 always have to execute it symbolically 428 00:16:02,660 --> 00:16:05,089 using some execution 429 00:16:05,090 --> 00:16:07,609 engine. It has a high amount of overhead. 430 00:16:07,610 --> 00:16:09,859 So how far or 431 00:16:09,860 --> 00:16:12,319 does symbolic execution, symbolic 432 00:16:12,320 --> 00:16:14,509 execution scale and if so, how far 433 00:16:14,510 --> 00:16:15,510 does it scale? 434 00:16:16,970 --> 00:16:19,159 We start using a simple 435 00:16:19,160 --> 00:16:21,319 example and we use a run length 436 00:16:21,320 --> 00:16:23,539 encoding, which is one of the simplest 437 00:16:24,560 --> 00:16:26,299 compression formats that you could think 438 00:16:26,300 --> 00:16:28,699 of. It was invented in the 70s 439 00:16:28,700 --> 00:16:29,700 for 440 00:16:32,950 --> 00:16:35,329 two to encode fax images 441 00:16:35,330 --> 00:16:37,249 and stuff like that, but it's still a 442 00:16:37,250 --> 00:16:39,349 valid encoding in all the PDF formats 443 00:16:39,350 --> 00:16:41,539 in any PDF reader has to 444 00:16:41,540 --> 00:16:43,099 has to actually implement it. 445 00:16:43,100 --> 00:16:44,100 In this in this back, 446 00:16:45,860 --> 00:16:48,049 we assume that 447 00:16:48,050 --> 00:16:50,509 the output is given and we have a decoder 448 00:16:50,510 --> 00:16:52,639 function, but we want to 449 00:16:52,640 --> 00:16:55,219 have a specific input that then decodes 450 00:16:55,220 --> 00:16:57,349 to our given output if 451 00:16:57,350 --> 00:16:59,389 we only have the function that decodes. 452 00:16:59,390 --> 00:17:01,609 So we use symbolic execution to compute 453 00:17:01,610 --> 00:17:04,009 the pre image to a given 454 00:17:04,010 --> 00:17:04,969 given output. 455 00:17:04,970 --> 00:17:07,068 And we evaluate the performance of 456 00:17:07,069 --> 00:17:09,289 Cleon Fuzzball, two of the the bigger 457 00:17:09,290 --> 00:17:10,939 symbolic execution's engines that we 458 00:17:10,940 --> 00:17:13,039 could just run on top of it. 459 00:17:13,040 --> 00:17:15,078 And we implemented our lead the same way 460 00:17:15,079 --> 00:17:16,819 as it is implemented in PDF. 461 00:17:17,930 --> 00:17:20,269 So let's look at how far 462 00:17:20,270 --> 00:17:22,729 symbolic execution actually scales 463 00:17:22,730 --> 00:17:24,679 on the X axis. 464 00:17:24,680 --> 00:17:27,348 We see different configurations 465 00:17:27,349 --> 00:17:29,569 for this, for the simple benchmark with 466 00:17:29,570 --> 00:17:31,669 input and output length. 467 00:17:31,670 --> 00:17:33,769 So three input bytes, three 468 00:17:33,770 --> 00:17:35,929 output bytes, up to five input 469 00:17:35,930 --> 00:17:38,329 bytes and three output bytes. 470 00:17:38,330 --> 00:17:40,399 And on the Y axis, you see the runtime in 471 00:17:40,400 --> 00:17:42,739 seconds. And remember, it's a logarithmic 472 00:17:42,740 --> 00:17:44,809 runtime and we set a ten 473 00:17:44,810 --> 00:17:45,929 hour. 474 00:17:45,930 --> 00:17:48,029 Ten hour limit, so 475 00:17:48,030 --> 00:17:50,549 as soon as we scale 476 00:17:50,550 --> 00:17:51,900 up the input sizes, 477 00:17:53,010 --> 00:17:55,109 we see that the output 478 00:17:55,110 --> 00:17:57,239 of the time we need 479 00:17:57,240 --> 00:17:59,609 to run, the experiment goes up. 480 00:17:59,610 --> 00:18:01,739 And if we are using only 481 00:18:01,740 --> 00:18:04,259 seven input lights that produce 482 00:18:04,260 --> 00:18:07,199 nine output bytes, 483 00:18:07,200 --> 00:18:09,269 we already run a couple 484 00:18:09,270 --> 00:18:10,199 of hours. 485 00:18:10,200 --> 00:18:12,629 So vanilla symbolic 486 00:18:12,630 --> 00:18:14,579 execution does not scale, if you use it 487 00:18:14,580 --> 00:18:16,739 that way, will run into the state 488 00:18:16,740 --> 00:18:18,779 explosion and you cannot invert these 489 00:18:18,780 --> 00:18:20,459 processes. So we have to get around that. 490 00:18:20,460 --> 00:18:21,460 Somehow 491 00:18:23,760 --> 00:18:25,889 we just run into the state 492 00:18:25,890 --> 00:18:28,380 explosion that we follow 493 00:18:29,520 --> 00:18:31,139 at each decision point in a controlled 494 00:18:31,140 --> 00:18:33,569 flow graph. We follow 495 00:18:33,570 --> 00:18:35,669 all the policies that follow this 496 00:18:35,670 --> 00:18:36,670 this decision point. 497 00:18:39,290 --> 00:18:40,909 So if you have a simple control flow 498 00:18:40,910 --> 00:18:43,279 graph here and we start executing 499 00:18:43,280 --> 00:18:44,280 from the top, 500 00:18:45,560 --> 00:18:47,209 first of all, the symbolic execution 501 00:18:47,210 --> 00:18:49,549 engine follows one single condition 502 00:18:49,550 --> 00:18:51,049 and updates all the passes and 503 00:18:51,050 --> 00:18:52,729 constraints along the way. 504 00:18:52,730 --> 00:18:55,549 Then we follow we have to follow both 505 00:18:55,550 --> 00:18:57,169 the second and the sixth 506 00:18:59,540 --> 00:19:01,789 point. So we already have to 507 00:19:01,790 --> 00:19:02,899 pass. 508 00:19:02,900 --> 00:19:05,239 We continue test 509 00:19:05,240 --> 00:19:07,889 on the on the right hand side. 510 00:19:07,890 --> 00:19:09,829 We have more and more passes that are 511 00:19:09,830 --> 00:19:11,599 trying to get her to continue. 512 00:19:11,600 --> 00:19:13,879 We continue and we see that 513 00:19:13,880 --> 00:19:15,949 the longer we executed symbolically, we 514 00:19:15,950 --> 00:19:17,959 have to carry on all these constraints 515 00:19:17,960 --> 00:19:19,579 and the formulas get longer and longer 516 00:19:19,580 --> 00:19:20,239 and longer. 517 00:19:20,240 --> 00:19:21,949 And that's how we how we run into the 518 00:19:21,950 --> 00:19:23,149 state explosion. 519 00:19:23,150 --> 00:19:25,309 And the problem is that we cannot 520 00:19:25,310 --> 00:19:27,499 decide which ones are more interesting 521 00:19:27,500 --> 00:19:28,969 to follow and which one or not. 522 00:19:30,890 --> 00:19:32,959 So we're basically drowning 523 00:19:32,960 --> 00:19:35,149 into the sea of tears, 524 00:19:35,150 --> 00:19:36,439 just like Ellis did. 525 00:19:36,440 --> 00:19:38,509 And we want to know what the reasons are 526 00:19:38,510 --> 00:19:40,400 and how we can get around those. 527 00:19:41,600 --> 00:19:43,819 So one of the reasons 528 00:19:43,820 --> 00:19:46,119 for the state explosion is 529 00:19:46,120 --> 00:19:48,079 there's too much input or output the 530 00:19:48,080 --> 00:19:50,149 output data that we're interested in and 531 00:19:50,150 --> 00:19:51,679 there's not much we can do about that. 532 00:19:51,680 --> 00:19:53,809 That's that's what we wanted to scale in 533 00:19:53,810 --> 00:19:55,219 the first place. 534 00:19:55,220 --> 00:19:57,379 The other one the other reason is 535 00:19:57,380 --> 00:20:00,049 that there's too much included state. 536 00:20:00,050 --> 00:20:02,209 So we have to limit the 537 00:20:02,210 --> 00:20:04,549 amount of symbolic state that we carry or 538 00:20:04,550 --> 00:20:06,289 carry around. 539 00:20:06,290 --> 00:20:08,899 And there's too much execute a code 540 00:20:08,900 --> 00:20:11,389 in one go. So it gets too complex 541 00:20:11,390 --> 00:20:13,069 for the symbolic execution engine. 542 00:20:13,070 --> 00:20:15,319 And we have to use a divide and conquer 543 00:20:15,320 --> 00:20:17,419 strategy strategy to split 544 00:20:17,420 --> 00:20:19,309 it up into into multiple steps. 545 00:20:19,310 --> 00:20:20,329 And we're going to see how we're going to 546 00:20:20,330 --> 00:20:21,330 do that later on. 547 00:20:22,250 --> 00:20:25,129 So let's scale it up 548 00:20:25,130 --> 00:20:26,929 instead of just seven symbolic input 549 00:20:26,930 --> 00:20:28,580 bytes. We want to have more. 550 00:20:29,630 --> 00:20:31,669 We want to solve the problem in the first 551 00:20:31,670 --> 00:20:33,989 step using some set of patristic. 552 00:20:33,990 --> 00:20:36,349 So if we can encode somehow 553 00:20:36,350 --> 00:20:38,599 specific conditions that 554 00:20:38,600 --> 00:20:40,519 help us in the symbolic execution 555 00:20:40,520 --> 00:20:43,159 process, how far can we go then? 556 00:20:43,160 --> 00:20:45,289 How far can we can we scale it 557 00:20:45,290 --> 00:20:47,419 up if Chote behaves according 558 00:20:47,420 --> 00:20:49,729 to a set of properties that we define 559 00:20:49,730 --> 00:20:51,499 in the next step? 560 00:20:51,500 --> 00:20:53,479 So what are interesting input size is for 561 00:20:53,480 --> 00:20:55,909 us, how far can be scale it up 562 00:20:55,910 --> 00:20:58,079 in this limited space of computation that 563 00:20:58,080 --> 00:20:59,080 we have available 564 00:21:00,710 --> 00:21:02,719 10 symbolic input bytes? 565 00:21:02,720 --> 00:21:04,789 Well, this is kind of an address, 566 00:21:04,790 --> 00:21:07,279 an offset or a pointer or two, 567 00:21:07,280 --> 00:21:08,509 but not much more. 568 00:21:08,510 --> 00:21:10,549 We can already do that with the existing 569 00:21:10,550 --> 00:21:12,140 symbolic execution engines. 570 00:21:13,180 --> 00:21:14,189 Other options. 571 00:21:14,190 --> 00:21:16,399 If you have 20 to 80 symbolic 572 00:21:16,400 --> 00:21:18,649 bytes, we can actually invert 573 00:21:18,650 --> 00:21:20,839 some shell code, a smaller chain 574 00:21:20,840 --> 00:21:23,089 or something like that, but 575 00:21:23,090 --> 00:21:25,279 unfortunately not much more. 576 00:21:25,280 --> 00:21:26,989 And that's already the limit. 577 00:21:26,990 --> 00:21:28,189 If you have a lot of computation 578 00:21:28,190 --> 00:21:29,869 available that you can do with with 579 00:21:29,870 --> 00:21:30,870 current engines. 580 00:21:31,850 --> 00:21:33,079 And if you want to have more than two 581 00:21:33,080 --> 00:21:35,479 hundred symbolic input bytes, we can 582 00:21:35,480 --> 00:21:37,609 encode longer options at our 583 00:21:37,610 --> 00:21:40,009 vulnerability condition or 584 00:21:40,010 --> 00:21:42,019 even complete data structures that are 585 00:21:42,020 --> 00:21:44,239 present at our vulnerability condition 586 00:21:44,240 --> 00:21:45,679 and flow through to through the 587 00:21:45,680 --> 00:21:47,809 application until we hit that stack 588 00:21:47,810 --> 00:21:48,810 condition. 589 00:21:49,460 --> 00:21:51,559 So first of all Ristic to 590 00:21:51,560 --> 00:21:53,720 the rescue, let's use, 591 00:21:55,070 --> 00:21:57,649 let's assume some properties on the code 592 00:21:57,650 --> 00:21:59,479 or let's assume that the code follows 593 00:21:59,480 --> 00:22:01,819 some well-defined properties for 594 00:22:01,820 --> 00:22:03,409 specific transformations. 595 00:22:03,410 --> 00:22:05,380 And we looked at a whole bunch of 596 00:22:07,340 --> 00:22:09,859 of code blocks that transfer 597 00:22:09,860 --> 00:22:12,349 input data in one way or another 598 00:22:12,350 --> 00:22:15,079 and most of the transformations 599 00:22:15,080 --> 00:22:16,429 are subjective. 600 00:22:16,430 --> 00:22:18,489 So there exists 601 00:22:18,490 --> 00:22:20,659 an transformation. 602 00:22:20,660 --> 00:22:22,849 So for a compress function, 603 00:22:22,850 --> 00:22:25,159 there exists a decompressed function 604 00:22:25,160 --> 00:22:27,619 for encode function is the decode 605 00:22:27,620 --> 00:22:29,809 function for image 606 00:22:29,810 --> 00:22:31,879 decoding, there's an image encoding and 607 00:22:31,880 --> 00:22:32,869 so on. 608 00:22:32,870 --> 00:22:34,939 So we assume that 609 00:22:36,020 --> 00:22:38,719 this holds we also assume sequential 610 00:22:38,720 --> 00:22:40,999 sequentially. So as soon as that our 611 00:22:41,000 --> 00:22:42,979 code transformation writes a specific 612 00:22:42,980 --> 00:22:45,139 input byte, we assume that this output 613 00:22:45,140 --> 00:22:48,229 byte will never be revoked. 614 00:22:48,230 --> 00:22:50,449 And we also assume that transformations 615 00:22:50,450 --> 00:22:52,729 are streaming. So the amount of 616 00:22:52,730 --> 00:22:55,339 symbolic state inside the execution 617 00:22:55,340 --> 00:22:57,499 engine is bound and does not rely 618 00:22:57,500 --> 00:22:59,449 on the complete input data that's 619 00:22:59,450 --> 00:23:01,489 available. And if these transformations 620 00:23:01,490 --> 00:23:02,779 hold, then we can 621 00:23:04,940 --> 00:23:06,109 we can do some 622 00:23:07,490 --> 00:23:09,809 tricks and we can use some specific 623 00:23:09,810 --> 00:23:12,469 Ristic to cut some corners. 624 00:23:12,470 --> 00:23:14,719 And some of the risks that we use is 625 00:23:14,720 --> 00:23:16,909 to prune early and prune often as soon 626 00:23:16,910 --> 00:23:18,769 as the target becomes unreachable and 627 00:23:18,770 --> 00:23:20,899 just drop this data that will never reach 628 00:23:20,900 --> 00:23:23,119 our our goal. 629 00:23:23,120 --> 00:23:25,459 Another one is to be greedy. 630 00:23:25,460 --> 00:23:27,679 So if we see that one specific 631 00:23:27,680 --> 00:23:30,319 state or one specific path that we follow 632 00:23:30,320 --> 00:23:31,969 and the symbolic execution engine 633 00:23:31,970 --> 00:23:34,369 produces a lot of symbolic output bytes 634 00:23:34,370 --> 00:23:36,589 or a lot of output bytes that we desire, 635 00:23:36,590 --> 00:23:38,409 we'll actually prioritize this. 636 00:23:38,410 --> 00:23:40,389 In our search and try to follow that one 637 00:23:40,390 --> 00:23:42,699 deeper into the into the rabbit 638 00:23:42,700 --> 00:23:44,529 hole to get as much output data as we 639 00:23:44,530 --> 00:23:47,349 can, and we do some specific 640 00:23:47,350 --> 00:23:49,989 optimizations for error excesses 641 00:23:49,990 --> 00:23:52,209 by combining formulas or 642 00:23:52,210 --> 00:23:54,159 splitting formulas depending on the set 643 00:23:54,160 --> 00:23:55,160 of conditions. 644 00:23:56,480 --> 00:23:57,849 But you can read about that in the paper 645 00:23:57,850 --> 00:23:59,529 if you're interested. 646 00:23:59,530 --> 00:24:02,439 So if encode this restricts, 647 00:24:02,440 --> 00:24:03,639 how far does it scale? 648 00:24:03,640 --> 00:24:05,139 So we use the same transformation as 649 00:24:05,140 --> 00:24:06,140 before. 650 00:24:06,940 --> 00:24:09,189 This time we use three configurations 651 00:24:09,190 --> 00:24:11,379 CLI, Fuzzball and Fuzzball 652 00:24:11,380 --> 00:24:14,499 with our newly encoded heuristics 653 00:24:14,500 --> 00:24:16,719 and we see, well, 654 00:24:16,720 --> 00:24:18,399 it's not too bad. 655 00:24:18,400 --> 00:24:21,529 We are doing much better then 656 00:24:21,530 --> 00:24:22,950 than an optimized version. 657 00:24:24,190 --> 00:24:26,349 So we still see some 658 00:24:26,350 --> 00:24:28,509 kind of explosion, but 659 00:24:28,510 --> 00:24:30,399 it's not as bad as before. 660 00:24:30,400 --> 00:24:31,779 So heretics help. 661 00:24:33,140 --> 00:24:34,689 Well, a little, 662 00:24:36,370 --> 00:24:39,159 but the state explosion still remains 663 00:24:39,160 --> 00:24:41,319 and we have to cope with that much 664 00:24:41,320 --> 00:24:42,320 further. 665 00:24:42,880 --> 00:24:43,880 So 666 00:24:45,040 --> 00:24:47,349 let's see if we can use divide 667 00:24:47,350 --> 00:24:49,449 and conquer approach to break it 668 00:24:49,450 --> 00:24:50,949 up into smaller steps. 669 00:24:52,330 --> 00:24:54,819 Juristic got us so far or that far. 670 00:24:54,820 --> 00:24:56,769 Now let's take a data structure or data 671 00:24:56,770 --> 00:24:59,169 centric approach and splits the process 672 00:24:59,170 --> 00:25:01,299 of computation of computing into 673 00:25:01,300 --> 00:25:03,489 pretty much into several smaller step. 674 00:25:03,490 --> 00:25:05,170 And we tackle each step on its own. 675 00:25:07,300 --> 00:25:09,489 So we have the same program, 676 00:25:09,490 --> 00:25:11,769 abstract program, some vulnerability 677 00:25:11,770 --> 00:25:14,079 condition as we had in our motivation. 678 00:25:14,080 --> 00:25:15,909 And we know that the input flows through 679 00:25:15,910 --> 00:25:18,249 that application to reach to 680 00:25:18,250 --> 00:25:20,619 the vulnerability point in the end. 681 00:25:20,620 --> 00:25:21,730 But this time 682 00:25:23,140 --> 00:25:25,329 we know that just using 683 00:25:25,330 --> 00:25:27,279 one symbolic execution process is not 684 00:25:27,280 --> 00:25:29,349 going to work and it's going 685 00:25:29,350 --> 00:25:31,269 to run into the state explosion and we're 686 00:25:31,270 --> 00:25:33,159 not going to be able to produce anything 687 00:25:33,160 --> 00:25:34,599 that's interesting. 688 00:25:34,600 --> 00:25:36,279 So we actually have to take a different 689 00:25:36,280 --> 00:25:37,269 approach. 690 00:25:37,270 --> 00:25:39,459 And by following a data centric approach, 691 00:25:40,750 --> 00:25:43,089 we know that input is processed 692 00:25:43,090 --> 00:25:45,189 in each program in several steps. 693 00:25:45,190 --> 00:25:47,079 So first, the input is copied into one 694 00:25:47,080 --> 00:25:48,249 input buffer. 695 00:25:48,250 --> 00:25:49,869 Then there's some transformation onto the 696 00:25:49,870 --> 00:25:51,549 data before it is copied into the next 697 00:25:51,550 --> 00:25:52,719 buffer. 698 00:25:52,720 --> 00:25:54,099 After that, there's another 699 00:25:54,100 --> 00:25:55,719 transformation and it's copied into the 700 00:25:55,720 --> 00:25:56,649 next buffer. 701 00:25:56,650 --> 00:25:58,059 And at one point in time they'll reach 702 00:25:58,060 --> 00:25:59,559 the vulnerability condition. 703 00:25:59,560 --> 00:26:01,899 So now, instead of using one big chunk 704 00:26:01,900 --> 00:26:04,019 of symbolic execution to compute the 705 00:26:04,020 --> 00:26:06,189 premise in one go, we use 706 00:26:06,190 --> 00:26:08,409 several chunks of symbolic execution 707 00:26:08,410 --> 00:26:10,329 and concrete this size to values that 708 00:26:10,330 --> 00:26:12,399 each at each decision point that we 709 00:26:12,400 --> 00:26:13,599 figure out. 710 00:26:13,600 --> 00:26:15,549 So we use several steps of symbolic 711 00:26:15,550 --> 00:26:17,829 execution to compute the 712 00:26:17,830 --> 00:26:19,720 image in the end and trigger 713 00:26:22,150 --> 00:26:23,859 and solve to solve this problem. 714 00:26:25,060 --> 00:26:27,509 So again, that's 715 00:26:27,510 --> 00:26:29,079 symbolic execution scale. 716 00:26:29,080 --> 00:26:31,539 If you use this this approach 717 00:26:31,540 --> 00:26:34,299 and this time we use two transformations 718 00:26:34,300 --> 00:26:37,119 which are both valid for PDF files. 719 00:26:37,120 --> 00:26:39,519 First, we use a hex 720 00:26:39,520 --> 00:26:41,979 decoding, so we have a long hex 721 00:26:41,980 --> 00:26:44,079 encoded string that is first text 722 00:26:44,080 --> 00:26:46,149 decoded by the PDF library 723 00:26:46,150 --> 00:26:48,549 and then our lead code to 724 00:26:48,550 --> 00:26:49,899 display an image. 725 00:26:49,900 --> 00:26:52,029 Imagine we have a ticket 726 00:26:52,030 --> 00:26:54,339 digitalized text message somewhere 727 00:26:54,340 --> 00:26:56,859 embedded in a PDF and both 728 00:26:56,860 --> 00:26:59,139 hex encoding and decoding are 729 00:26:59,140 --> 00:27:01,209 valid PDF transformations, according to 730 00:27:01,210 --> 00:27:02,210 the PDF specs. 731 00:27:03,880 --> 00:27:06,159 And we start 732 00:27:06,160 --> 00:27:08,169 with the second location, so it's the 733 00:27:08,170 --> 00:27:10,389 second transformation refers to the early 734 00:27:10,390 --> 00:27:12,729 decoding and then the hex decoding 735 00:27:14,950 --> 00:27:16,239 and the look at the different 736 00:27:16,240 --> 00:27:18,039 configurations of these these three 737 00:27:18,040 --> 00:27:19,040 applications again. 738 00:27:20,800 --> 00:27:23,979 So let's look at the code itself 739 00:27:23,980 --> 00:27:26,319 in a simple, simplified way. 740 00:27:26,320 --> 00:27:28,539 We see that input is copied into 741 00:27:28,540 --> 00:27:30,669 the first buffer buffer zero 742 00:27:30,670 --> 00:27:32,889 and then the ASCII 743 00:27:32,890 --> 00:27:34,989 code, the 744 00:27:34,990 --> 00:27:37,749 decoding right into the buffer one 745 00:27:37,750 --> 00:27:40,119 and then the buffer one is decoded using 746 00:27:40,120 --> 00:27:42,429 RNA, decoding into buffer to where 747 00:27:42,430 --> 00:27:44,379 we might trigger a vulnerability at one 748 00:27:44,380 --> 00:27:45,819 point in time. 749 00:27:45,820 --> 00:27:47,769 And we constrain symbolic execution to 750 00:27:47,770 --> 00:27:49,779 one transformation at a time. 751 00:27:49,780 --> 00:27:51,849 And the symbolic input bytes 752 00:27:51,850 --> 00:27:53,470 only have to flow through one 753 00:27:54,670 --> 00:27:56,409 through one transformation. 754 00:27:56,410 --> 00:27:58,359 So we have to buffer zero, which goes 755 00:27:58,360 --> 00:28:00,489 through hex code with buffer one, we 756 00:28:00,490 --> 00:28:02,619 goes through our old code and 757 00:28:02,620 --> 00:28:04,689 then we end up with our 758 00:28:04,690 --> 00:28:06,639 win condition and we use two steps of 759 00:28:06,640 --> 00:28:07,720 symbolic execution. 760 00:28:10,330 --> 00:28:12,859 So let's try a quick 761 00:28:12,860 --> 00:28:13,860 other TMO, 762 00:28:15,170 --> 00:28:17,199 I guess I have a couple more minutes, 763 00:28:17,200 --> 00:28:18,200 hopefully, 764 00:28:19,330 --> 00:28:20,859 which should be enough for that. 765 00:28:26,780 --> 00:28:27,790 Oh, that's wrong one. 766 00:28:32,800 --> 00:28:33,800 Here we go. 767 00:28:40,920 --> 00:28:43,019 So we have our 768 00:28:43,020 --> 00:28:45,569 decoding and. 769 00:28:54,340 --> 00:28:57,949 Let's just copy that part, 770 00:28:57,950 --> 00:28:58,950 not all of it. 771 00:29:08,150 --> 00:29:09,150 No. 772 00:29:10,210 --> 00:29:11,590 I'm doing that on purpose, you know. 773 00:29:16,180 --> 00:29:17,709 Yes, here we go. 774 00:29:19,940 --> 00:29:23,179 And we're going to first decode the 775 00:29:23,180 --> 00:29:26,329 Aurélie buffer and we want to produce 776 00:29:26,330 --> 00:29:28,709 a string, Serguei 777 00:29:28,710 --> 00:29:30,889 3C is awesome and 778 00:29:30,890 --> 00:29:33,229 we let the symbolic execution find 779 00:29:33,230 --> 00:29:35,269 out. So the symbolic execution engine 780 00:29:35,270 --> 00:29:37,909 knows about the Aurélie decoding 781 00:29:37,910 --> 00:29:40,129 and it's trying different, different kind 782 00:29:40,130 --> 00:29:41,119 of states. 783 00:29:41,120 --> 00:29:43,519 And this, Aurélie, you see, if there's 784 00:29:43,520 --> 00:29:45,739 no repeat characters, 785 00:29:45,740 --> 00:29:48,259 then the first bite 786 00:29:48,260 --> 00:29:50,839 encodes the length of the string, 787 00:29:50,840 --> 00:29:53,029 then the string follows and 788 00:29:53,030 --> 00:29:55,339 then there's the specific end fight. 789 00:29:55,340 --> 00:29:58,039 And in this type, the configuration 790 00:29:58,040 --> 00:30:00,169 that fuzes, it's one twenty eight, which 791 00:30:00,170 --> 00:30:02,479 is Oxlade and 792 00:30:02,480 --> 00:30:04,129 our symbolic execution engine 793 00:30:04,130 --> 00:30:06,259 successfully reverse engineered 794 00:30:06,260 --> 00:30:08,539 the RLC encoding 795 00:30:08,540 --> 00:30:11,479 process using the Aurélie decoding. 796 00:30:11,480 --> 00:30:13,219 And we can do the same thing for the for 797 00:30:13,220 --> 00:30:14,239 the hex decoding. 798 00:30:14,240 --> 00:30:16,429 But I'm not going to try to copy the 799 00:30:16,430 --> 00:30:18,169 other part again, otherwise I'm going to 800 00:30:18,170 --> 00:30:19,170 run out of time. 801 00:30:21,110 --> 00:30:23,569 So I'll actually upload 802 00:30:23,570 --> 00:30:24,949 all these examples so that you can you 803 00:30:24,950 --> 00:30:26,419 can try them out later on. 804 00:30:26,420 --> 00:30:28,189 So let's see how far we can we can 805 00:30:28,190 --> 00:30:29,190 actually go with 806 00:30:30,420 --> 00:30:32,089 this this encoding. 807 00:30:32,090 --> 00:30:34,369 And this time we have hex 808 00:30:34,370 --> 00:30:35,749 encoding and decoding. 809 00:30:35,750 --> 00:30:37,879 So two steps and we compare it again 810 00:30:37,880 --> 00:30:38,880 against 811 00:30:40,100 --> 00:30:42,289 Fuzzball without the configuration 812 00:30:42,290 --> 00:30:44,869 as a monolithic approach and KELY 813 00:30:44,870 --> 00:30:46,189 as well. 814 00:30:46,190 --> 00:30:48,379 So it gets a bit more complicated 815 00:30:48,380 --> 00:30:50,359 this time. So we have input bytes, 816 00:30:50,360 --> 00:30:52,309 intermediate bytes and output bytes. 817 00:30:52,310 --> 00:30:53,659 So there's a first transformation from 818 00:30:53,660 --> 00:30:56,029 output bytes to intermediate bytes 819 00:30:56,030 --> 00:30:57,319 and then a second transformation from 820 00:30:57,320 --> 00:31:00,199 intermediate bytes to input bytes. 821 00:31:00,200 --> 00:31:01,519 And you have a set of different 822 00:31:01,520 --> 00:31:03,589 conditions that we try out on the Y 823 00:31:03,590 --> 00:31:05,689 axis. Again, we have to logarithmic 824 00:31:05,690 --> 00:31:07,969 scale and 825 00:31:07,970 --> 00:31:10,789 again, a 10 hour time limit on 826 00:31:10,790 --> 00:31:12,649 on our cluster for single core 827 00:31:12,650 --> 00:31:13,650 computation. 828 00:31:14,450 --> 00:31:16,669 So what we see 829 00:31:16,670 --> 00:31:19,009 is, Mel, monolithic, symbolic 830 00:31:19,010 --> 00:31:21,109 execution engine as used by clear 831 00:31:21,110 --> 00:31:23,239 fastball, runs into the 10 832 00:31:23,240 --> 00:31:25,489 hour time limit every single time 833 00:31:25,490 --> 00:31:27,769 for this for these configurations. 834 00:31:27,770 --> 00:31:29,959 If we use our jurisdiction 835 00:31:29,960 --> 00:31:32,059 now, we're doing a little bit better. 836 00:31:32,060 --> 00:31:34,339 We scale up to 20 ish 837 00:31:34,340 --> 00:31:36,949 bytes, input bytes that we can cope 838 00:31:36,950 --> 00:31:38,150 if we use our 839 00:31:41,890 --> 00:31:43,999 our two step approach where we split 840 00:31:44,000 --> 00:31:44,959 according to the different 841 00:31:44,960 --> 00:31:47,149 transformations, we can scale up to 842 00:31:47,150 --> 00:31:49,249 two hundred and fifty hex bytes that 843 00:31:49,250 --> 00:31:51,319 we that we encode and 844 00:31:51,320 --> 00:31:52,519 one hundred and twenty intermediate 845 00:31:52,520 --> 00:31:54,199 bytes, which is good enough for 846 00:31:54,200 --> 00:31:55,999 reasonably sized data structure that we 847 00:31:56,000 --> 00:31:57,000 can carry along. 848 00:31:58,970 --> 00:32:01,939 And we see that this works fairly well. 849 00:32:01,940 --> 00:32:04,070 So we have one problem solved. 850 00:32:05,810 --> 00:32:07,939 Divide and conquer mitigates the scaling 851 00:32:07,940 --> 00:32:10,099 issues, but we now have 852 00:32:10,100 --> 00:32:11,100 two problems 853 00:32:12,230 --> 00:32:13,819 finding the transformation boundaries. 854 00:32:13,820 --> 00:32:16,639 How do we do that and finding 855 00:32:16,640 --> 00:32:18,469 the buffer locations? 856 00:32:18,470 --> 00:32:19,429 How do we get that? 857 00:32:19,430 --> 00:32:21,409 How do we know where they are, especially 858 00:32:21,410 --> 00:32:23,839 if we only have binary code or some 859 00:32:23,840 --> 00:32:25,759 some executable? 860 00:32:25,760 --> 00:32:28,819 So let's see how what we can do 861 00:32:28,820 --> 00:32:30,949 and let's use some binary analysis 862 00:32:30,950 --> 00:32:31,950 technique for that. 863 00:32:33,820 --> 00:32:35,919 So there should be splits, the symbolic 864 00:32:35,920 --> 00:32:37,989 execution units, and we use a 865 00:32:37,990 --> 00:32:40,839 graph based binary analysis approach 866 00:32:40,840 --> 00:32:43,089 to find minimal points along 867 00:32:43,090 --> 00:32:45,369 the execution graph of the application 868 00:32:45,370 --> 00:32:47,439 where we have little interference 869 00:32:47,440 --> 00:32:50,409 between individual transformations 870 00:32:50,410 --> 00:32:51,689 that's transformed. 871 00:32:51,690 --> 00:32:53,030 The data is available. 872 00:32:54,340 --> 00:32:56,469 So on one hand, we have a 873 00:32:56,470 --> 00:32:58,869 control flow graph, but 874 00:32:58,870 --> 00:33:01,539 we combine this control flow graph with 875 00:33:01,540 --> 00:33:03,429 information or data flow graph, on the 876 00:33:03,430 --> 00:33:05,709 other hand, and information 877 00:33:05,710 --> 00:33:07,989 flow graph encodes 878 00:33:07,990 --> 00:33:10,089 how information flows from one 879 00:33:10,090 --> 00:33:12,669 buffer to another buffer. 880 00:33:12,670 --> 00:33:14,409 And these two data structures already 881 00:33:14,410 --> 00:33:16,749 exist. But we add 882 00:33:16,750 --> 00:33:18,879 additional producer and consumer access 883 00:33:18,880 --> 00:33:20,829 along these these two graphs so that we 884 00:33:20,830 --> 00:33:22,959 can use the combined graph to 885 00:33:22,960 --> 00:33:25,059 decide what the best points are 886 00:33:25,060 --> 00:33:27,369 to split individual transformations. 887 00:33:27,370 --> 00:33:29,709 So we know that buffer to 888 00:33:29,710 --> 00:33:30,710 buffer AVL 889 00:33:32,650 --> 00:33:34,749 basic block number two will read 890 00:33:34,750 --> 00:33:36,999 from buffer from buffer and 891 00:33:37,000 --> 00:33:39,399 symbolic of basic 892 00:33:39,400 --> 00:33:41,709 block number for Will right to buffer 893 00:33:41,710 --> 00:33:43,149 seat. So we know that there's an 894 00:33:43,150 --> 00:33:45,579 information flow from Buffer A, buffer B, 895 00:33:45,580 --> 00:33:47,859 and we can use 896 00:33:47,860 --> 00:33:50,079 these informations to know 897 00:33:50,080 --> 00:33:52,929 or infer that there's a transformation 898 00:33:52,930 --> 00:33:54,969 from Buffer a two buffer C, and we can 899 00:33:54,970 --> 00:33:55,970 follow that one. 900 00:33:56,890 --> 00:33:57,890 So 901 00:33:59,320 --> 00:34:01,689 how can we recover this graph? 902 00:34:01,690 --> 00:34:04,179 And we just use a simple trace 903 00:34:04,180 --> 00:34:05,469 based binary analysis. 904 00:34:05,470 --> 00:34:07,959 On top of that, we collect a detailed 905 00:34:07,960 --> 00:34:10,209 instruction level trace of 906 00:34:10,210 --> 00:34:13,238 a concrete execution that executes 907 00:34:13,239 --> 00:34:15,939 functionality that is close 908 00:34:15,940 --> 00:34:18,129 to the vulnerability but does 909 00:34:18,130 --> 00:34:19,729 not trigger the vulnerability itself. 910 00:34:19,730 --> 00:34:20,948 Otherwise, we would already have solved 911 00:34:20,949 --> 00:34:22,149 the problem. Right. 912 00:34:22,150 --> 00:34:24,698 So we execute 913 00:34:24,699 --> 00:34:26,829 a simple version of like just a 914 00:34:26,830 --> 00:34:28,899 PDF viewer and encode the execution 915 00:34:28,900 --> 00:34:30,968 trace and use that execution trace 916 00:34:30,969 --> 00:34:33,399 to see how the input data flows 917 00:34:33,400 --> 00:34:34,539 through the application. 918 00:34:34,540 --> 00:34:36,249 And using that execution trace, we can 919 00:34:36,250 --> 00:34:38,319 follow all the individual buffer 920 00:34:38,320 --> 00:34:41,109 exercice and concretized dose. 921 00:34:41,110 --> 00:34:43,149 So we start with a concrete input. 922 00:34:43,150 --> 00:34:45,669 We collect the instruction level trace, 923 00:34:45,670 --> 00:34:47,859 we process the trace off-line to 924 00:34:47,860 --> 00:34:49,779 discover all these the individual buffers 925 00:34:49,780 --> 00:34:51,488 and set the combinations between the 926 00:34:51,489 --> 00:34:52,489 buffers. 927 00:34:53,830 --> 00:34:55,988 So problem 928 00:34:55,989 --> 00:34:58,929 is, if we are working this 929 00:34:58,930 --> 00:35:01,599 machine code, we have a huge 930 00:35:01,600 --> 00:35:03,759 a huge amount 931 00:35:03,760 --> 00:35:05,889 of memory accesses and 932 00:35:05,890 --> 00:35:07,989 we need to organize them somehow. 933 00:35:07,990 --> 00:35:09,969 And we do this using a hierarchical 934 00:35:09,970 --> 00:35:12,039 concept where we have different 935 00:35:12,040 --> 00:35:14,889 levels and categories of buffers. 936 00:35:14,890 --> 00:35:17,259 So related memory 937 00:35:17,260 --> 00:35:19,209 accesses target the same buffer. 938 00:35:19,210 --> 00:35:21,219 There was maybe a temporal or a spatial 939 00:35:21,220 --> 00:35:23,469 location temporal, 940 00:35:23,470 --> 00:35:25,719 according to the access log of memory 941 00:35:25,720 --> 00:35:28,089 access that we have or spatial according 942 00:35:28,090 --> 00:35:29,379 to the codes that we have. 943 00:35:29,380 --> 00:35:31,599 And in our analysis, we can actually use 944 00:35:31,600 --> 00:35:33,879 both these kind of accesses to group 945 00:35:33,880 --> 00:35:35,949 related memory accesses and 946 00:35:35,950 --> 00:35:37,539 we assume this buffer hierarchy as the 947 00:35:37,540 --> 00:35:38,799 level of buffers. 948 00:35:38,800 --> 00:35:40,509 So there's a complete memory image. 949 00:35:40,510 --> 00:35:42,429 There's the stack, there's an individual 950 00:35:42,430 --> 00:35:44,109 stack frame, there's this struct in a 951 00:35:44,110 --> 00:35:45,819 stack and there's a specific buffer in 952 00:35:45,820 --> 00:35:47,289 that struct on the stack. 953 00:35:48,910 --> 00:35:50,979 And we find natural boundaries 954 00:35:50,980 --> 00:35:53,049 between transformation's to split up our 955 00:35:53,050 --> 00:35:54,100 symbolic execution. 956 00:35:56,740 --> 00:35:59,239 As a last example of 957 00:35:59,240 --> 00:36:01,669 we're going to walk through the CVI, 958 00:36:01,670 --> 00:36:03,729 a real vulnerability, where we have 959 00:36:03,730 --> 00:36:06,909 a fun parsing book in a popular 960 00:36:06,910 --> 00:36:09,339 PDF viewer. That's a fairly common PDF 961 00:36:09,340 --> 00:36:10,299 library. 962 00:36:10,300 --> 00:36:12,429 And if you look at the memory 963 00:36:12,430 --> 00:36:15,039 axis, it's insanely complex. 964 00:36:15,040 --> 00:36:17,589 It will be very hard from a vulnerability 965 00:36:17,590 --> 00:36:19,869 point of view if you only have to binary 966 00:36:19,870 --> 00:36:22,029 to actually follow these individual steps 967 00:36:22,030 --> 00:36:23,799 until you hit the condition. 968 00:36:23,800 --> 00:36:25,839 So there's IO processing, there's 969 00:36:25,840 --> 00:36:28,819 deflated code that's happening, 970 00:36:28,820 --> 00:36:31,299 the fun reading part 971 00:36:31,300 --> 00:36:33,579 and to parse font parsing parts 972 00:36:33,580 --> 00:36:35,169 and debug actually happens in a fun 973 00:36:35,170 --> 00:36:37,359 parsing. But how do you generate such 974 00:36:37,360 --> 00:36:39,039 an input? How can you actually produce 975 00:36:39,040 --> 00:36:39,939 that? 976 00:36:39,940 --> 00:36:42,069 And we ran our 977 00:36:42,070 --> 00:36:44,169 magic python script on the trace 978 00:36:44,170 --> 00:36:46,479 we collected and splits the symbolic 979 00:36:46,480 --> 00:36:48,909 execution along these these paths 980 00:36:48,910 --> 00:36:51,099 and came up with a set of several 981 00:36:51,100 --> 00:36:53,469 symbolic execution processing 982 00:36:53,470 --> 00:36:56,619 steps to produce that that image. 983 00:36:56,620 --> 00:36:58,749 First, it's the meme copy that copies 984 00:36:58,750 --> 00:37:00,639 from for a buffer, B from the input 985 00:37:00,640 --> 00:37:03,279 buffer to the parsing buffer. 986 00:37:03,280 --> 00:37:05,409 Then we are able to decode 987 00:37:05,410 --> 00:37:07,479 the flat 988 00:37:07,480 --> 00:37:08,480 encoding. 989 00:37:11,150 --> 00:37:13,160 We read the front into a buffer 990 00:37:14,270 --> 00:37:16,579 and in the end in the front parsing 991 00:37:16,580 --> 00:37:18,290 part, their woman ability, 992 00:37:19,520 --> 00:37:21,589 and we're actually automatically 993 00:37:21,590 --> 00:37:23,779 able to produce input that triggers that 994 00:37:23,780 --> 00:37:25,849 vulnerability, that 995 00:37:25,850 --> 00:37:28,219 follows along these different steps 996 00:37:28,220 --> 00:37:30,199 according to the split's that we have in 997 00:37:30,200 --> 00:37:32,449 our in 998 00:37:32,450 --> 00:37:33,409 our graph. 999 00:37:33,410 --> 00:37:35,779 So selecting the individual parts 1000 00:37:35,780 --> 00:37:37,999 from the graph that we have is 1001 00:37:38,000 --> 00:37:40,129 still still need some manual 1002 00:37:40,130 --> 00:37:42,799 work, but it works fairly well 1003 00:37:42,800 --> 00:37:44,149 and you can actually try it out for 1004 00:37:44,150 --> 00:37:45,150 yourself. 1005 00:37:46,160 --> 00:37:48,349 So what did 1006 00:37:48,350 --> 00:37:49,350 I show you? 1007 00:37:51,320 --> 00:37:54,049 First of all, symbolic execution 1008 00:37:54,050 --> 00:37:56,089 is no pennachio for every problem that 1009 00:37:56,090 --> 00:37:58,159 you can just throw stuff at it and let it 1010 00:37:58,160 --> 00:38:00,289 solve it. You have to be aware of 1011 00:38:00,290 --> 00:38:01,969 all the structural differences. 1012 00:38:01,970 --> 00:38:04,279 And due to all these additional overhead, 1013 00:38:04,280 --> 00:38:06,109 it doesn't scale that far. 1014 00:38:06,110 --> 00:38:08,119 But if you look into the individual 1015 00:38:08,120 --> 00:38:09,979 transformation, the kind of code you're 1016 00:38:09,980 --> 00:38:12,079 trying to reverse and you're aware 1017 00:38:12,080 --> 00:38:13,999 of some graph based analysis on top of 1018 00:38:14,000 --> 00:38:16,159 it, then you can actually scale it up 1019 00:38:16,160 --> 00:38:18,229 to real data structures and do real 1020 00:38:18,230 --> 00:38:19,639 things. 1021 00:38:19,640 --> 00:38:22,049 You can trigger, as 1022 00:38:22,050 --> 00:38:24,649 shown in this example, you can trigger 1023 00:38:24,650 --> 00:38:26,449 vulnerability conditions that you find 1024 00:38:26,450 --> 00:38:28,159 that are deep inside the program and 1025 00:38:28,160 --> 00:38:30,259 hidden. And you can actually construct 1026 00:38:30,260 --> 00:38:32,119 some proof of concept input that will 1027 00:38:32,120 --> 00:38:33,559 trigger this ability in the end. 1028 00:38:34,760 --> 00:38:36,799 And you can explore how deep the rabbit 1029 00:38:36,800 --> 00:38:38,149 hole goes for yourself. 1030 00:38:38,150 --> 00:38:40,189 You can download all the software from 1031 00:38:40,190 --> 00:38:41,809 the website. 1032 00:38:41,810 --> 00:38:43,159 I'm around on a C.C.C.. 1033 00:38:43,160 --> 00:38:45,079 Just talk to me. 1034 00:38:45,080 --> 00:38:46,699 I'm happy to discuss anything in more 1035 00:38:46,700 --> 00:38:47,779 detail. 1036 00:38:47,780 --> 00:38:49,039 Thank you for your attention and I'm 1037 00:38:49,040 --> 00:38:50,040 happy to take questions. 1038 00:38:59,580 --> 00:39:00,579 Thanks. 1039 00:39:00,580 --> 00:39:02,109 We have around 10 minutes for a question 1040 00:39:02,110 --> 00:39:04,659 and answer if there is any 1041 00:39:04,660 --> 00:39:05,589 gap. 1042 00:39:05,590 --> 00:39:06,590 Microphone one 1043 00:39:08,800 --> 00:39:09,849 high. 1044 00:39:09,850 --> 00:39:10,850 Excellent talk. 1045 00:39:12,010 --> 00:39:14,199 So my question is 1046 00:39:14,200 --> 00:39:15,340 in relation to 1047 00:39:17,230 --> 00:39:19,419 the ease at 1048 00:39:19,420 --> 00:39:21,729 which one can recover these opaque 1049 00:39:21,730 --> 00:39:24,459 predicates that you 1050 00:39:24,460 --> 00:39:26,229 seemingly recover automatically. 1051 00:39:26,230 --> 00:39:28,629 So another developer related project, 1052 00:39:28,630 --> 00:39:30,849 Bapi, came 1053 00:39:30,850 --> 00:39:33,270 by Noisebridge San Francisco hackerspace, 1054 00:39:34,300 --> 00:39:36,490 and he showed 1055 00:39:37,570 --> 00:39:39,429 a demonstration of VIP, which is very 1056 00:39:39,430 --> 00:39:40,500 similar to bikeways 1057 00:39:41,510 --> 00:39:43,179 recovering the Aurora vulnerability, 1058 00:39:43,180 --> 00:39:45,729 which is I think, like get Phonte Icon. 1059 00:39:45,730 --> 00:39:48,009 And it seemed that a considerable 1060 00:39:48,010 --> 00:39:50,079 amount of the work that he did 1061 00:39:50,080 --> 00:39:52,299 in Bapi 1062 00:39:52,300 --> 00:39:54,489 and Microsoft Internet 1063 00:39:54,490 --> 00:39:56,619 Explorer was actually like writing 1064 00:39:56,620 --> 00:39:58,809 some Tlaib code, which, 1065 00:39:58,810 --> 00:40:00,849 you know, like you can feed into Z3. 1066 00:40:00,850 --> 00:40:02,529 And it seemed that a considerable amount 1067 00:40:02,530 --> 00:40:04,599 of the work that was performed actually 1068 00:40:04,600 --> 00:40:05,739 had to do with 1069 00:40:07,450 --> 00:40:09,519 a lot of knowledge that he had 1070 00:40:09,520 --> 00:40:10,520 about the vulnerability. 1071 00:40:11,560 --> 00:40:12,969 Well, well, after the vulnerability of 1072 00:40:12,970 --> 00:40:13,929 being published. 1073 00:40:13,930 --> 00:40:15,279 So it seems that the majority of the work 1074 00:40:15,280 --> 00:40:17,439 was done by like Z3 or 1075 00:40:17,440 --> 00:40:19,809 some other equivalent SMPTE solver. 1076 00:40:19,810 --> 00:40:22,149 And then Bapi kind of had this ancillary 1077 00:40:22,150 --> 00:40:24,849 role that, 1078 00:40:24,850 --> 00:40:26,349 you know, didn't 1079 00:40:27,430 --> 00:40:29,799 seem to be recovering these these 1080 00:40:29,800 --> 00:40:32,109 neat things like, you know, tracing a 1081 00:40:32,110 --> 00:40:34,299 return code as it gets 1082 00:40:34,300 --> 00:40:35,979 passed through the program. 1083 00:40:35,980 --> 00:40:38,079 So I guess what is your opinion? 1084 00:40:38,080 --> 00:40:39,639 If I were to ask a concrete question, 1085 00:40:39,640 --> 00:40:41,469 what is your opinion on the limits of, 1086 00:40:41,470 --> 00:40:43,750 like execution or Bitlis 1087 00:40:44,770 --> 00:40:47,049 on truly generating 1088 00:40:47,050 --> 00:40:49,329 exploits automatically without a great 1089 00:40:49,330 --> 00:40:51,459 deal of knowledge about the nature 1090 00:40:51,460 --> 00:40:53,389 of a vulnerability or the program that 1091 00:40:53,390 --> 00:40:55,479 runs? First of all, I'll be very happy 1092 00:40:55,480 --> 00:40:57,159 to talk with you about this offline for 1093 00:40:57,160 --> 00:40:59,499 hours. But let me give a very short 1094 00:40:59,500 --> 00:41:01,029 and brief answer. 1095 00:41:01,030 --> 00:41:03,219 So in this talk, 1096 00:41:03,220 --> 00:41:04,839 I gave the motivation that we already 1097 00:41:04,840 --> 00:41:07,419 know the vulnerability. 1098 00:41:07,420 --> 00:41:09,789 So we know at which instruction 1099 00:41:09,790 --> 00:41:11,499 that vulnerability is cost. 1100 00:41:11,500 --> 00:41:13,629 And we know to the code we 1101 00:41:13,630 --> 00:41:16,029 can look at the instructions prior 1102 00:41:16,030 --> 00:41:18,369 to that access and we can actually encode 1103 00:41:18,370 --> 00:41:20,439 these instructions so we know what kind 1104 00:41:20,440 --> 00:41:23,079 of conditions happen at this 1105 00:41:23,080 --> 00:41:24,459 this vulnerability. We just want to 1106 00:41:24,460 --> 00:41:25,460 produce 1107 00:41:26,950 --> 00:41:29,439 reimage that flow through the application 1108 00:41:29,440 --> 00:41:31,509 that executes that that 1109 00:41:31,510 --> 00:41:32,469 one ability. 1110 00:41:32,470 --> 00:41:33,909 And you are perfectly right. 1111 00:41:33,910 --> 00:41:36,399 Symbolic execution has a hard time 1112 00:41:36,400 --> 00:41:38,649 encoding these properties 1113 00:41:38,650 --> 00:41:40,779 and it needs a huge amount of manual 1114 00:41:40,780 --> 00:41:42,969 effort and work if you use 1115 00:41:42,970 --> 00:41:44,949 it to find the vulnerabilities. 1116 00:41:44,950 --> 00:41:47,019 So I'm not 100 percent sure if 1117 00:41:47,020 --> 00:41:49,089 symbolic execution is a perfect 1118 00:41:49,090 --> 00:41:50,769 tool to find the vulnerabilities. 1119 00:41:52,150 --> 00:41:54,369 You can use it to generate images. 1120 00:41:54,370 --> 00:41:56,079 That's what it's here for. 1121 00:41:56,080 --> 00:41:58,179 But you will have you will need 1122 00:41:58,180 --> 00:42:00,849 additional logic and manpower 1123 00:42:00,850 --> 00:42:01,989 to encode it for 1124 00:42:04,060 --> 00:42:05,649 to find the vulnerability conditions 1125 00:42:05,650 --> 00:42:07,329 itself. Just a quick follow up. 1126 00:42:07,330 --> 00:42:08,529 I'm sorry. 1127 00:42:08,530 --> 00:42:11,289 Let me give a quick, blatant 1128 00:42:11,290 --> 00:42:12,639 commercial interruption. 1129 00:42:12,640 --> 00:42:14,020 So there's bound a model checking 1130 00:42:15,460 --> 00:42:17,439 and you can use Bonnett model checking to 1131 00:42:17,440 --> 00:42:20,439 infer a nice set of 1132 00:42:20,440 --> 00:42:21,999 vulnerability conditions. 1133 00:42:22,000 --> 00:42:23,799 And we're using we are moving more and 1134 00:42:23,800 --> 00:42:25,869 more to t to use bounden model checking 1135 00:42:25,870 --> 00:42:28,419 to actually come up with the actually 1136 00:42:28,420 --> 00:42:30,129 vulnerability conditions that it can then 1137 00:42:30,130 --> 00:42:31,809 feed into the symbolic execution engine. 1138 00:42:31,810 --> 00:42:32,810 On the other hand, 1139 00:42:34,460 --> 00:42:35,460 you have 1140 00:42:36,670 --> 00:42:37,670 other questions. 1141 00:42:40,070 --> 00:42:42,179 Yes, here, please go to 1142 00:42:42,180 --> 00:42:43,180 microphone to 1143 00:42:45,510 --> 00:42:47,629 just I was just trying to 1144 00:42:47,630 --> 00:42:50,719 refer to the guy who is talking about 1145 00:42:50,720 --> 00:42:52,859 and that is very interesting 1146 00:42:52,860 --> 00:42:55,099 and he's very, very 1147 00:42:55,100 --> 00:42:56,989 intelligent programing in a sense. 1148 00:42:56,990 --> 00:42:58,699 But to be honest, I'm not really sure 1149 00:42:58,700 --> 00:43:00,319 it's the best topic to talk about in a 1150 00:43:00,320 --> 00:43:01,639 conference like this, because it's called 1151 00:43:01,640 --> 00:43:04,099 source development Microsoft. 1152 00:43:04,100 --> 00:43:06,379 And they really are ranting about 1153 00:43:06,380 --> 00:43:07,669 said, sorry, sorry, 1154 00:43:08,720 --> 00:43:10,479 what are you ranting about that. 1155 00:43:10,480 --> 00:43:11,689 Yeah. Yeah. 1156 00:43:11,690 --> 00:43:13,849 So, I mean, this report said three 1157 00:43:13,850 --> 00:43:15,829 and stop so you can use to any sulpher 1158 00:43:15,830 --> 00:43:18,379 you want. Yeah. I suggest you use S2P. 1159 00:43:18,380 --> 00:43:19,369 Yeah. 1160 00:43:19,370 --> 00:43:21,439 The drawback is STP is half as fast. 1161 00:43:21,440 --> 00:43:22,669 Right. So yeah. 1162 00:43:26,390 --> 00:43:29,369 Uh, could you handle, uh, 1163 00:43:29,370 --> 00:43:32,099 it's, uh, arbitrary 1164 00:43:32,100 --> 00:43:34,259 state explosion added by, 1165 00:43:34,260 --> 00:43:36,419 uh, obfuscated code or something 1166 00:43:36,420 --> 00:43:37,589 like that? 1167 00:43:37,590 --> 00:43:39,779 Do you have some way to detect that some 1168 00:43:39,780 --> 00:43:42,029 court has no influence in the result 1169 00:43:42,030 --> 00:43:43,030 of. 1170 00:43:45,850 --> 00:43:48,669 You're talking about the obfuscation 1171 00:43:48,670 --> 00:43:50,859 of so let me let 1172 00:43:50,860 --> 00:43:52,929 me rephrase your question, if I if I got 1173 00:43:52,930 --> 00:43:54,879 it correctly. So you assume that you have 1174 00:43:54,880 --> 00:43:57,039 some obfuscated executable 1175 00:43:57,040 --> 00:43:58,899 and you want the symbolic execution 1176 00:43:58,900 --> 00:44:00,909 engine to get rid of all the superfluous 1177 00:44:00,910 --> 00:44:03,189 computation not 1178 00:44:03,190 --> 00:44:05,529 to get to it, because sometimes 1179 00:44:05,530 --> 00:44:07,839 you just don't, 1180 00:44:07,840 --> 00:44:09,909 uh. But is there some 1181 00:44:09,910 --> 00:44:10,910 way to 1182 00:44:12,490 --> 00:44:14,559 crash this space to, 1183 00:44:14,560 --> 00:44:16,749 uh, to 1184 00:44:16,750 --> 00:44:17,979 eliminate some part of 1185 00:44:19,320 --> 00:44:21,549 it, cost us all the 1186 00:44:21,550 --> 00:44:23,649 heavy or a lot of the heavy lifting is 1187 00:44:23,650 --> 00:44:25,299 actually done by the constraints. 1188 00:44:25,300 --> 00:44:26,439 All right. 1189 00:44:26,440 --> 00:44:28,899 So if 1190 00:44:28,900 --> 00:44:30,969 under specific conditions, 1191 00:44:30,970 --> 00:44:33,189 no computation happens to 1192 00:44:33,190 --> 00:44:35,849 the constraints, should simplify 1193 00:44:35,850 --> 00:44:37,989 the formulas and throw away those 1194 00:44:37,990 --> 00:44:40,479 so that happens intrinsically 1195 00:44:40,480 --> 00:44:42,969 in the in the hole for itself. 1196 00:44:42,970 --> 00:44:45,069 So we don't have any additional checks 1197 00:44:45,070 --> 00:44:46,599 that help in that regard. 1198 00:44:48,370 --> 00:44:50,919 And we focused mostly on 1199 00:44:50,920 --> 00:44:53,349 binary only code, but not heavily 1200 00:44:53,350 --> 00:44:54,730 obfuscated or packed code. 1201 00:44:56,550 --> 00:44:58,629 You can try out and tell me about 1202 00:44:58,630 --> 00:44:59,630 it afterwards. 1203 00:45:01,180 --> 00:45:03,219 And just a quick question. 1204 00:45:03,220 --> 00:45:05,499 How much memory does it take 1205 00:45:05,500 --> 00:45:06,820 to keep track of all the 1206 00:45:07,840 --> 00:45:08,949 parts you went through 1207 00:45:11,740 --> 00:45:14,409 for popular? 1208 00:45:14,410 --> 00:45:16,929 I guess we used something like 1209 00:45:16,930 --> 00:45:19,629 12 or 16 gigabytes, 1210 00:45:19,630 --> 00:45:20,679 so not too big. 1211 00:45:24,510 --> 00:45:26,669 Well, rent is cheap, right? 1212 00:45:30,690 --> 00:45:33,119 Oh, OK, I think we're on 1213 00:45:33,120 --> 00:45:35,309 the rise. So if you have other 1214 00:45:35,310 --> 00:45:38,129 questions, just catch me somewhere on the 1215 00:45:38,130 --> 00:45:39,059 CCC. 1216 00:45:39,060 --> 00:45:40,060 Thanks. Thank you.