1 00:00:00,110 --> 00:00:14,320 *music* 2 00:00:14,320 --> 00:00:18,810 *applause* 3 00:00:18,810 --> 00:00:23,560 Raichoo: Yeah sorry about that - beamers or projectors, I don't like them. They 4 00:00:23,560 --> 00:00:27,210 don't like me either. So this is a little heads up - this is going to be the only 5 00:00:27,210 --> 00:00:32,049 slide I'm going to show you today so, "slide", because I think doing stuff like 6 00:00:32,049 --> 00:00:35,940 that in a terminal might be a little bit more interesting for you. But sadly 7 00:00:35,940 --> 00:00:40,020 something is getting cut off so we have to improvise a little bit. But anyway, so 8 00:00:40,020 --> 00:00:43,960 today I will be able to talk about two of my favorite things right now which are 9 00:00:43,960 --> 00:00:47,820 FreeBSD and DTrace. But this talk has been capped down to 30 minutes so we'll be 10 00:00:47,820 --> 00:00:53,190 focusing a little more on the DTrace part. So there will be a little bit less BSD 11 00:00:53,190 --> 00:00:57,560 than I anticipated. And I also adjusted everything a little bit to fit better into 12 00:00:57,560 --> 00:01:03,130 the resilience track so hopefully you'll enjoy that. So before we begin, who here 13 00:01:03,130 --> 00:01:08,640 is actually using DTrace? Okay more than I expected but still not as many as I 14 00:01:08,640 --> 00:01:12,610 would like to see. So hopefully after this talk you will think, "oh, this is a really 15 00:01:12,610 --> 00:01:17,170 awesome tool, I gotta learn it." Because I totally love it - it changed the way I do 16 00:01:17,170 --> 00:01:22,110 a lot of stuff. So for those of you who do not know what DTrace is, first, let me 17 00:01:22,110 --> 00:01:27,260 fill you in on this stuff. So it's open source, it originated on Solaris, and it is 18 00:01:27,260 --> 00:01:31,640 currently developed on illumos, which is a fork of OpenSolaris. 
It has been ported 19 00:01:31,640 --> 00:01:37,930 to FreeBSD, NetBSD, OS X, there's also a port for Linux called DTrace 20 00:01:37,930 --> 00:01:43,689 for Linux. I think it's done by a person called Paul Fox. It's been ported to QNX 21 00:01:43,689 --> 00:01:49,810 and the OpenBSD folks are currently doing some work to get a technology like 22 00:01:49,810 --> 00:01:54,040 DTrace on their system. And I think there's a port for Windows? I don't know 23 00:01:54,040 --> 00:01:57,869 if this is actually true, but if it is, it's kind of cool because then that means it's 24 00:01:57,869 --> 00:02:04,650 basically everywhere. So, most of you would probably know static tools like 25 00:02:04,650 --> 00:02:09,470 strace. We have a very similar tool on FreeBSD that is called truss, and what 26 00:02:09,470 --> 00:02:14,500 truss and strace are doing is - you can attach them to a process and look at the 27 00:02:14,500 --> 00:02:18,650 system calls that this process is emitting. So in case something is going 28 00:02:18,650 --> 00:02:23,319 wrong you can, well, look inside of the program, which can be kind of useful when 29 00:02:23,319 --> 00:02:28,870 you're trying to find a problem. It's kind of handy but it's also pretty 30 00:02:28,870 --> 00:02:32,890 limited. Because first of all it really really slows down the process that you're 31 00:02:32,890 --> 00:02:37,250 currently looking at. So if you want to debug a performance issue, you're pretty 32 00:02:37,250 --> 00:02:42,170 much out of luck there. And also it's kind of narrowed down - you can just look 33 00:02:42,170 --> 00:02:47,940 at one process. Which is also a bad thing because the systems that we currently 34 00:02:47,940 --> 00:02:52,660 have - all these systems are very complex: we have a lot of layers. 
You have 35 00:02:52,660 --> 00:02:56,300 virtual file systems, you have virtual memory, you have network, you have 36 00:02:56,300 --> 00:03:00,500 databases, processes communicating with each other. And in case you are using a 37 00:03:00,500 --> 00:03:04,710 high-level programming language, you might also have a runtime system. So it's a 38 00:03:04,710 --> 00:03:09,519 little operating system on top of your operating system. So when something goes 39 00:03:09,519 --> 00:03:15,000 wrong in a system that has such large complexity, something happens that we call 40 00:03:15,000 --> 00:03:19,850 the blame game. And the blame game - it's never your fault, it's always someone 41 00:03:19,850 --> 00:03:25,710 else's. So what we want to be able to do is we want to look at the system as a 42 00:03:25,710 --> 00:03:30,349 whole, so we can correlate all the data and come up with some meaningful answers 43 00:03:30,349 --> 00:03:34,506 when something is really going wrong in there. And also, we don't want to 44 00:03:34,506 --> 00:03:39,260 switch out all the processes for debug processes to make that happen, 45 00:03:39,260 --> 00:03:44,969 because as these things are all -- every problem happens in production. It never 46 00:03:44,969 --> 00:03:48,470 happens on the development box. So like, switching out all the processes - that's 47 00:03:48,470 --> 00:03:55,030 totally out of the picture. So to do that in an arbitrary way, to like, instrument 48 00:03:55,030 --> 00:03:59,910 the system in an arbitrary way, we sort of need like a programming language. So, we 49 00:03:59,910 --> 00:04:03,640 need to describe - when that happens, please submit data so I can see what's 50 00:04:03,640 --> 00:04:09,489 going on. So this kind of implies a programming language. And DTrace comes 51 00:04:09,489 --> 00:04:13,670 with such a programming language - it's a little bit reminiscent of awk crossed with 52 00:04:13,670 --> 00:04:18,798 C. 
It's pretty simple to learn - you can pick it up in 20 53 00:04:18,798 --> 00:04:25,199 minutes and you can start churning out your first DTrace scripts. So like awk, if 54 00:04:25,199 --> 00:04:30,559 you know awk, awk can be used to analyze large bodies of text. DTrace is pretty 55 00:04:30,559 --> 00:04:34,749 much the same, but for system behavior - so a little bit mind boggling, but 56 00:04:34,749 --> 00:04:40,069 probably I can show you what I mean by that. And also, as a bonus we don't want 57 00:04:40,069 --> 00:04:43,860 to slow down the system, so we want to be able to do things like performance 58 00:04:43,860 --> 00:04:52,300 debugging, performance tests like that. So I've prepared this little demo here. 59 00:04:52,300 --> 00:04:58,780 So since we had some issues here probably this is not -- I have to play around a 60 00:04:58,780 --> 00:05:04,249 little bit. So what I'm going to do is I'm going to look at a very very naive way 61 00:05:04,249 --> 00:05:18,009 to -- excuse me for a second -- very naive way to -- give me a second -- so very 62 00:05:18,009 --> 00:05:21,960 naive way to authenticate a user. And there's a lot of stuff wrong with this 63 00:05:21,960 --> 00:05:26,030 code, but like what we're going to do is we're going to take a user string as 64 00:05:26,030 --> 00:05:32,740 input, and then we're going to just compare it to another, to a secret. So I 65 00:05:32,740 --> 00:05:36,420 know, the secret in here is in plain text, I know this is a problem, but 66 00:05:36,420 --> 00:05:41,639 this is a little bit artificial. But I just want to get my point across. So from 67 00:05:41,639 --> 00:05:47,159 an algorithmic perspective, this check function is correct: so we take a string 68 00:05:47,159 --> 00:05:52,449 we take another string and we compare them. So everything's fine and easy. 
So if 69 00:05:52,449 --> 00:05:58,599 you look at the way string compare works and what it does, it's essentially 70 00:05:58,599 --> 00:06:04,449 taking these two strings and it's comparing them character by character. So 71 00:06:04,449 --> 00:06:10,729 when it finds the first pair of characters that do not match up, it's going to stop. 72 00:06:10,729 --> 00:06:17,879 So we can conclude something from that - if 73 00:06:17,879 --> 00:06:23,399 this check function takes a very short amount of time, then what 74 00:06:23,399 --> 00:06:29,129 happened is that it terminated early. And if our password guess is better, it 75 00:06:29,129 --> 00:06:34,479 will take, well, it will take longer. And if we can measure that we can basically 76 00:06:34,479 --> 00:06:40,809 extract information from that running algorithm. So I wrote a little driver 77 00:06:40,809 --> 00:06:47,449 program in Haskell that basically just iterates over an alphabet and just feeds 78 00:06:47,449 --> 00:06:53,379 this one letter into that program. And I'm going to use DTrace to get some 79 00:06:53,379 --> 00:06:59,020 timing information. So let me start the driver. So this is now just running in the 80 00:06:59,020 --> 00:07:04,919 background. And you cannot see what I'm typing there, but don't worry - these 81 00:07:04,919 --> 00:07:12,240 scripts, I will push them all to my GitHub. So DTrace now produces this 82 00:07:12,240 --> 00:07:17,240 nice little distribution. So if you were able to see the entire 83 00:07:17,240 --> 00:07:22,949 alphabet, you would see that "D" behaves differently from everything else. So if you 84 00:07:22,949 --> 00:07:29,399 squint a little, what you see there is the D letter takes a couple of 85 00:07:29,399 --> 00:07:32,949 nanoseconds longer. This is the precision that I'm measuring here - ten to minus 86 00:07:32,949 --> 00:07:39,219 nine seconds. Like really precise. 
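As a rough illustration of the early-exit behaviour described here, the following Python sketch models a naive string compare and counts character comparisons as a stand-in for elapsed time. It is not the code from the talk: the names `early_exit_compare` and `SECRET`, and the secret value itself, are hypothetical, since the actual check function and secret are not part of the transcript.

```python
import string
from typing import Tuple

# Hypothetical secret, chosen only so that it starts with "D" as in the demo.
SECRET = "DTRACE"


def early_exit_compare(guess: str, secret: str) -> Tuple[bool, int]:
    """Compare two strings naively and stop at the first mismatch.

    Returns (equal, steps), where steps is the number of character
    comparisons performed. The step count stands in for running time.
    """
    # A terminator makes a length difference show up as a mismatch,
    # similar to the NUL byte at the end of a C string.
    a, b = guess + "\0", secret + "\0"
    steps = 0
    for x, y in zip(a, b):
        steps += 1
        if x != y:
            return False, steps
    return True, steps


# Feed every single letter into the check, like the driver program does.
steps_per_letter = {
    letter: early_exit_compare(letter, SECRET)[1]
    for letter in string.ascii_uppercase
}
slowest = max(steps_per_letter, key=steps_per_letter.get)
print(slowest, steps_per_letter[slowest])
```

In this model the correct first letter needs one comparison more than every wrong letter. That is the kind of difference that shows up in the measured distribution as a few extra nanoseconds.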
And D takes longer than everything else, so it's 87 00:07:39,219 --> 00:07:43,929 a little bit cut off there, but trust me. I know I sound like Donald Trump 88 00:07:43,929 --> 00:07:52,759 saying that. So yeah, and from that let's just enter a letter. And now the password 89 00:07:52,759 --> 00:07:56,799 and now the script clears everything and it's going to guess the next letter. So 90 00:07:56,799 --> 00:08:02,020 sadly this is cut off, because you would see that this distribution radically 91 00:08:02,020 --> 00:08:08,830 changed. It looks completely different, and so we can play that game a little bit. 92 00:08:08,830 --> 00:08:13,419 So let's just roll with that. And like every three seconds the script is 93 00:08:13,419 --> 00:08:19,159 going to recompute looking at the new distribution. And you can probably see 94 00:08:19,159 --> 00:08:26,849 where this is going. So here you can see, okay, and now it just - it just takes 95 00:08:26,849 --> 00:08:34,559 about like three seconds for me to guess the next letter. And this is not a 96 00:08:34,559 --> 00:08:39,809 problem that only happens when you do string 97 00:08:39,809 --> 00:08:44,139 compares. This can happen with basically everything - especially 98 00:08:44,139 --> 00:08:48,029 in things like cryptographic stuff, where you don't want to have some information 99 00:08:48,029 --> 00:08:56,620 leaked out. So this is what we call a timing side channel attack. So I could 100 00:08:56,620 --> 00:09:02,959 essentially use DTrace to analyze the real binary. So I didn't change the 101 00:09:02,959 --> 00:09:07,040 binary - I didn't have some debug code there. This is like the actual binary 102 00:09:07,040 --> 00:09:12,500 that I would put into production. 
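The letter-by-letter guessing game from the demo can be sketched in Python as follows. This is a simplified model under stated assumptions, not the talk's Haskell driver or DTrace script: `comparison_steps`, `recover_secret`, and `HYPOTHETICAL_SECRET` are made-up names, and the cost oracle counts comparisons exactly, whereas real timings are noisy and have to be read off a distribution over many runs.

```python
import string
from typing import Callable


def comparison_steps(guess: str, secret: str) -> int:
    """Count the comparisons of a compare that stops at the first mismatch."""
    steps = 0
    for x, y in zip(guess + "\0", secret + "\0"):
        steps += 1
        if x != y:
            break
    return steps


def recover_secret(cost: Callable[[str], int], alphabet: str, max_len: int = 64) -> str:
    """Recover a secret one letter at a time from a cost oracle.

    cost(guess) is assumed to grow with the length of the matching prefix.
    """
    known = ""
    for _ in range(max_len):
        # Try every candidate for the next position and keep the slowest one.
        costs = {letter: cost(known + letter) for letter in alphabet}
        best = max(costs, key=costs.get)
        if costs[best] == min(costs.values()):
            break  # No letter stands out any more: nothing left to learn.
        known += best
    return known


HYPOTHETICAL_SECRET = "DTRACE"
recovered = recover_secret(
    lambda guess: comparison_steps(guess, HYPOTHETICAL_SECRET),
    string.ascii_uppercase,
)
print(recovered)
```

The point of the sketch is the search cost: each position needs one pass over the alphabet instead of a search over all possible strings, which is why guessing a letter takes only a few seconds in the demo.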
So what's important about that, about 103 00:09:12,500 --> 00:09:16,500 taking the actual binary, is that some of these timing side channels might be 104 00:09:16,500 --> 00:09:21,620 introduced by a compiler optimization. And when you insert debug code into that code, 105 00:09:21,620 --> 00:09:26,920 then it might actually go away. So, you want to look at the real code that you're 106 00:09:26,920 --> 00:09:34,420 putting into production. Let me show you the script that I came up with to write 107 00:09:34,420 --> 00:09:40,779 that. So there are three interesting things in this script. And don't 108 00:09:40,779 --> 00:09:44,180 worry - this is the more complicated example, I just want to like 109 00:09:44,180 --> 00:09:48,839 inspire your ideas. Because the things that you can do with DTrace - pretty 110 00:09:48,839 --> 00:09:54,600 much the sky's the limit. You can come up with the weirdest ideas, and so 111 00:09:54,600 --> 00:09:59,420 this is the more complicated example. I'm going to show you simpler ones to 112 00:09:59,420 --> 00:10:04,440 demonstrate how we got here. So there are three interesting things in this code. The 113 00:10:04,440 --> 00:10:09,509 first one is something that we call a probe. So a probe is a point of 114 00:10:09,509 --> 00:10:15,019 instrumentation in the system. So whenever a certain event happens in the system this 115 00:10:15,019 --> 00:10:21,269 probe is going to fire. And in this case, the BEGIN probe marks 116 00:10:21,269 --> 00:10:27,379 the moment when the script starts. So the second interesting thing is this clause. 117 00:10:27,379 --> 00:10:31,680 So this clause is basically what this probe is going to execute - what's going 118 00:10:31,680 --> 00:10:37,780 to be executed once that probe fires. So it's a little block of code. 
119 00:10:37,780 --> 00:10:42,370 And this probe is a little bit more interesting, because it tells us 120 00:10:42,370 --> 00:10:48,270 something about the structure of what such a probe looks like. Because every 121 00:10:48,270 --> 00:10:54,100 probe is uniquely identified by a four tuple. So it's like four components that 122 00:10:54,100 --> 00:10:59,079 uniquely identify a probe. And the first part of this 123 00:10:59,079 --> 00:11:03,269 tuple is called the provider, and I'm going to talk about providers in a couple 124 00:11:03,269 --> 00:11:07,160 of seconds and what they are. The second one is called the module. Third one is 125 00:11:07,160 --> 00:11:13,449 called the function. And the last one is called the name. So these four pieces of 126 00:11:13,449 --> 00:11:21,079 data, like, they identify a probe uniquely. So the third thing that is 127 00:11:21,079 --> 00:11:25,440 interesting here is, sadly something that I don't have time to talk about today, 128 00:11:25,440 --> 00:11:31,139 this is called an aggregation. And this single line that you see here is 129 00:11:31,139 --> 00:11:35,889 essentially responsible for accumulating all this data to print out this 130 00:11:35,889 --> 00:11:39,949 distribution stuff - to generate this distribution. So this is built 131 00:11:39,949 --> 00:11:44,629 into DTrace. You don't have to do that yourself. When you look at this 132 00:11:44,629 --> 00:11:50,189 script, it's like 42 lines of code. And I came up with the first prototype 133 00:11:50,189 --> 00:11:55,279 after five minutes. So it's not a lot of stuff to do to get something out of 134 00:11:55,279 --> 00:12:00,360 that. So it's very useful to have things - if you use DTrace you 135 00:12:00,360 --> 00:12:05,060 will use this a lot for performance debugging so it's kind of neat that we 136 00:12:05,060 --> 00:12:11,410 have that. 
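To give a feel for what an aggregation accumulates, here is a small Python sketch that groups measurements into power-of-two buckets and prints a text histogram. The talk does not say which aggregating function the script uses, so treating it as a power-of-two distribution in the style of DTrace's `quantize()` is an assumption, and the function names and sample values below are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def power_of_two_buckets(values: Iterable[int]) -> Dict[int, int]:
    """Count non-negative integers per power-of-two bucket.

    Each key is the lower bound of its bucket: the largest power of two
    that is less than or equal to the value. Zero gets a bucket of its own.
    """
    buckets: Counter = Counter()
    for value in values:
        if value < 0:
            raise ValueError("this sketch only handles non-negative values")
        low = 0 if value == 0 else 1 << (value.bit_length() - 1)
        buckets[low] += 1
    return dict(sorted(buckets.items()))


def render(buckets: Dict[int, int]) -> str:
    """Render the buckets as one line per bucket with a bar of '@' signs."""
    return "\n".join(
        f"{low:>8} | {'@' * count} {count}" for low, count in buckets.items()
    )


# Made-up durations in nanoseconds, only to have something to plot.
samples = [1, 2, 3, 4, 5, 900, 1000, 1024]
histogram = power_of_two_buckets(samples)
print(render(histogram))
```

In DTrace itself this bookkeeping is a single aggregation statement, which is why the whole script stays so short.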
So yeah, let's talk a little bit about providers, and this will 137 00:12:11,410 --> 00:12:18,300 probably also be cut off. So I'm going to cheat a little bit here - I'm 138 00:12:18,300 --> 00:12:27,649 just going to double that. So let's talk about providers -- oh that's handy -- 139 00:12:27,649 --> 00:12:32,339 so I got 27 providers here and the number of providers varies from operating system to 140 00:12:32,339 --> 00:12:38,339 operating system. But these are the ones that I can see right now. There are 141 00:12:38,339 --> 00:12:44,499 other providers that can come into existence when you demand them. So I have 142 00:12:44,499 --> 00:12:49,370 these 27 providers, and we're going to look at the syscall provider and the FBT 143 00:12:49,370 --> 00:12:55,129 provider first. So, every provider knows how to instrument a specific part of the 144 00:12:55,129 --> 00:13:01,410 system. So the syscall provider knows how to instrument the syscall table. That's not 145 00:13:01,410 --> 00:13:08,699 very surprising. So we can look at the syscall provider, and here you can see 146 00:13:08,699 --> 00:13:16,720 essentially every system call entry and return that FreeBSD offers. So 147 00:13:16,720 --> 00:13:20,120 here you can see this four tuple, like, the provider syscall, FreeBSD is the 148 00:13:20,120 --> 00:13:28,189 module, and so on. So these are all the system calls that I have in my system. And 149 00:13:28,189 --> 00:13:32,910 the other provider that I want to look at is the so called FBT provider, and that is 150 00:13:32,910 --> 00:13:38,810 pretty astonishing. The FBT provider, FBT stands for "function boundary tracer" and 151 00:13:38,810 --> 00:13:45,160 what it allows us to do, it allows us to trace every single function in the kernel. 152 00:13:45,160 --> 00:13:50,850 So I can look at the entire kernel at functions, as they are being called. 
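The four tuple can be made concrete with a short Python sketch that splits a probe description of the form provider:module:function:name and matches it against a pattern. This is a simplification for illustration: the `Probe` class and both helper functions are hypothetical, the sketch insists on exactly four fields, and it approximates the pattern syntax with Python's `fnmatch`, treating an empty field as "match anything".

```python
from fnmatch import fnmatchcase
from typing import NamedTuple


class Probe(NamedTuple):
    provider: str
    module: str
    function: str
    name: str


def parse_probe(description: str) -> Probe:
    """Split 'provider:module:function:name' into its four components."""
    parts = description.split(":")
    if len(parts) != 4:
        # Simplification: this sketch requires all four fields to be present.
        raise ValueError(f"expected four fields, got {len(parts)}")
    return Probe(*parts)


def matches(pattern: str, probe: Probe) -> bool:
    """Check a probe against a pattern; empty pattern fields match anything."""
    wanted = parse_probe(pattern)
    return all(
        field_pattern == "" or fnmatchcase(field, field_pattern)
        for field_pattern, field in zip(wanted, probe)
    )


mmap_entry = parse_probe("syscall:freebsd:mmap:entry")
print(mmap_entry)
print(matches("syscall::mmap:entry", mmap_entry))  # module left empty
print(matches("fbt:::entry", mmap_entry))          # different provider
```

Leaving fields empty is what makes a description such as "every function entry of one provider" so compact to write.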
So to 153 00:13:50,850 --> 00:13:57,660 illustrate that I wrote a little, very simple DTrace script and this is probably, 154 00:13:57,660 --> 00:14:01,399 look at the upper half please, so this is probably one of the first DTrace scripts 155 00:14:01,399 --> 00:14:05,529 that you will come up with, it's a fairly simple example, so let's break it 156 00:14:05,529 --> 00:14:09,680 down. So I'm going to instrument the mmap system call. For those of you who do not 157 00:14:09,680 --> 00:14:13,720 know what the mmap system call is, what you can do with it is you can 158 00:14:13,720 --> 00:14:20,970 take a file and map that into the address space of your process, that's the very dumbed-down 159 00:14:20,970 --> 00:14:27,449 version. So whenever we enter the mmap system call we are going to set the 160 00:14:27,449 --> 00:14:32,810 variable "follow" to one, and what this "self->" means: this is essentially a 161 00:14:32,810 --> 00:14:37,970 thread local variable and we're going to associate that variable with the thread 162 00:14:37,970 --> 00:14:45,230 that we're currently inspecting. Then I'm going to do something that sounds pretty 163 00:14:45,230 --> 00:14:49,149 scary: I'm going to instrument the entire kernel. Every function entry and 164 00:14:49,149 --> 00:14:53,009 every function return, I'm going to instrument that and say "please emit data 165 00:14:53,009 --> 00:14:57,189 when you do that". And this is what we call a predicate, so this is where the 166 00:14:57,189 --> 00:15:02,009 awkiness of the DTrace programming language comes in. So this is a predicate 167 00:15:02,009 --> 00:15:07,059 and whenever that evaluates to true then the probe is going to fire, so in 168 00:15:07,059 --> 00:15:11,139 this case when we are in the thread that we're currently tracing we're going to 169 00:15:11,139 --> 00:15:16,329 emit data. And this is just an empty clause, we just want to know "hey we got 170 00:15:16,329 --> 00:15:23,480 here". 
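The interplay of the thread-local flag and the predicate can be modelled in a few lines of Python. This is only a simulation of the logic, not DTrace code and not the script from the talk: the event tuples, the function names such as `func_a`, and `trace_mmap` itself are made up, and the handling of the return probe follows the description that comes right after this point in the talk.

```python
from typing import Dict, Iterable, List, Tuple

# One simulated probe firing: (thread id, provider, function, name).
Event = Tuple[int, str, str, str]


def trace_mmap(events: Iterable[Event]) -> List[Tuple[int, str, str]]:
    """Record kernel function entries/returns that happen inside mmap."""
    follow: Dict[int, int] = {}  # per-thread flag; a missing entry reads as 0
    output: List[Tuple[int, str, str]] = []
    for tid, provider, function, name in events:
        if provider == "syscall" and function == "mmap":
            if name == "entry":
                follow[tid] = 1          # start following this thread
            elif name == "return" and follow.get(tid, 0):
                follow[tid] = 0          # reset the flag ...
                break                    # ... and stop tracing
        elif provider == "fbt" and follow.get(tid, 0):
            # The predicate holds, so the otherwise empty clause "fires".
            output.append((tid, function, name))
    return output


# Hypothetical event stream with two threads.
events = [
    (1, "fbt", "func_a", "entry"),      # before mmap: ignored
    (1, "syscall", "mmap", "entry"),
    (2, "fbt", "func_b", "entry"),      # other thread: ignored
    (1, "fbt", "func_a", "entry"),
    (1, "fbt", "func_a", "return"),
    (1, "syscall", "mmap", "return"),
    (1, "fbt", "func_c", "entry"),      # after the script exited
]
traced = trace_mmap(events)
print(traced)
```

Only the two `func_a` events of thread 1 end up in the output, because they are the only kernel function events that occur while that thread's flag is set.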
So when we exit the mmap system call and the predicate is set we're 171 00:15:23,480 --> 00:15:27,660 going to set the variable "follow" to zero, because every uninitialized variable 172 00:15:27,660 --> 00:15:33,860 in DTrace is set to zero, so this pretty much amounts to deallocating that variable 173 00:15:33,860 --> 00:15:41,279 and then we're going to exit cleanly. So let me run that. So it takes a couple of 174 00:15:41,279 --> 00:15:48,480 seconds and boom. So you saw a little pause here, that was when the DTrace guard 175 00:15:48,480 --> 00:15:55,009 reverted the driver, the kernel. So now you can see every function call that 176 00:15:55,009 --> 00:15:59,480 happens inside the mmap system call. And this is a little bit hard on the eyes, so 177 00:15:59,480 --> 00:16:08,379 let me pass this flag here and now you have nice, readable indentation. So 178 00:16:08,379 --> 00:16:12,629 now you might say "I don't like that. You are injecting code into the kernel. 179 00:16:12,629 --> 00:16:17,880 That sounds dangerous" and yeah, but let me show you something that I find 180 00:16:17,880 --> 00:16:23,980 really interesting. So I'm not going too much into depth here, but this 181 00:16:23,980 --> 00:16:28,750 is a bytecode, so every DTrace script gets compiled to bytecode and this 182 00:16:28,750 --> 00:16:34,499 bytecode gets sent to the kernel and in the kernel you have a virtual machine that 183 00:16:34,499 --> 00:16:39,059 interprets that bytecode. So in case you write a script that for some reason might 184 00:16:39,059 --> 00:16:44,550 go rogue on your kernel, it like allocates too much memory, takes too much time, this 185 00:16:44,550 --> 00:16:49,279 virtual machine can just say "okay, stop it" and it's just going to revert all the 186 00:16:49,279 --> 00:16:53,890 changes that happened to your kernel, and that's kinda handy. And it's not a new 187 00:16:53,890 --> 00:17:01,199 idea, so if you're using tcpdump it's basically the same approach. 
They also 188 00:17:01,199 --> 00:17:04,832 have this kind of bytecode, so that's just a little excursion here. This is called 189 00:17:04,832 --> 00:17:13,250 BPF, Berkeley Packet Filter, so it's not an entirely new idea. So everything I 190 00:17:13,250 --> 00:17:19,470 showed you until now was "hey, I can look when function calls happen". That's not 191 00:17:19,470 --> 00:17:22,519 very much information, so we're going to increase the amount of information that we 192 00:17:22,519 --> 00:17:35,080 get out of the system with every example. So let me look at the actual kernel. So I 193 00:17:35,080 --> 00:17:39,980 had to restart my machine, so my setup is basically gone now. So let's look at this 194 00:17:39,980 --> 00:17:45,309 vm_fault function. So this is the source code of the operating system that 195 00:17:45,309 --> 00:17:52,900 I'm running right now. This is FreeBSD 12-CURRENT and the vm_fault function; 196 00:17:52,900 --> 00:17:57,539 remember the mmap system call that I told you about? So the mmap system call 197 00:17:57,539 --> 00:18:03,899 can, like I told you, map a file into your address space. And it doesn't 198 00:18:03,899 --> 00:18:10,320 necessarily have to load the entire file, so whenever we are touching a page in the 199 00:18:10,320 --> 00:18:15,780 system, like a memory page, on this machine that is four kilobytes, there are no superpages 200 00:18:15,780 --> 00:18:21,429 here, so whenever it touches a piece of memory that you didn't bring into memory 201 00:18:21,429 --> 00:18:25,309 yet, we're generating something that's called a page fault, and then this 202 00:18:25,309 --> 00:18:31,180 function gets called. So here let's look at the arguments, and I'm going to skip 203 00:18:31,180 --> 00:18:36,990 the zeroth argument, to look at the first argument. 
So this is the address that 204 00:18:36,990 --> 00:18:44,160 provoked that page fault, this is the type and these are the flags and I'm going 205 00:18:44,160 --> 00:18:48,780 to show you something to make that a little bit more readable. So what about 206 00:18:48,780 --> 00:18:58,960 this one? So you see it's a pointer and this is a big structure, so we want 207 00:18:58,960 --> 00:19:09,961 to be able to look at that structure. And just probably should do this here, so 208 00:19:09,961 --> 00:19:17,090 let's look at this vm_fault script here. So this is, make this a little bit more, 209 00:19:17,090 --> 00:19:20,950 so this is, don't pay too much attention to this code, this is basically just 210 00:19:20,950 --> 00:19:26,049 boilerplate to make stuff readable and this is where the actual action is 211 00:19:26,049 --> 00:19:31,690 happening. So what I'm doing there is I'm instrumenting the vm_fault 212 00:19:31,690 --> 00:19:36,350 function and whenever we enter it then we're going to use some information 213 00:19:36,350 --> 00:19:40,720 that DTrace gives us for free. So this is execname, this is the name of the 214 00:19:40,720 --> 00:19:45,909 currently running executable that provoked the page fault, this is the process ID and 215 00:19:45,909 --> 00:19:53,250 here we have a bunch of argument variables. So these arg1, arg2, arg3 216 00:19:53,250 --> 00:19:57,964 are essentially just integers, so nothing too fancy there. But we wanna 217 00:19:57,964 --> 00:20:02,380 look, wanna be able to look at that struct. And here I'm going to use this 218 00:20:02,380 --> 00:20:08,140 args array, and this args array is kind of special, because it has typing 219 00:20:08,140 --> 00:20:15,870 information about the arguments. So when you run that, so you're dereferencing that 220 00:20:15,870 --> 00:20:26,570 pointer there with the star, excuse me, and let's just run that and maybe, that's 221 00:20:26,570 --> 00:20:32,899 a start yeah. 
So this is an in-kernel data structure that we can now look 222 00:20:32,899 --> 00:20:40,010 at. So DTrace enabled us to look at in- memory data structures as the system runs. 223 00:20:40,010 --> 00:20:44,330 And this is really really powerful. In the DTrace script I could use all 224 00:20:44,330 --> 00:20:50,490 these fields like I can manipulate this args array, this value in there, 225 00:20:50,490 --> 00:20:57,010 just like every other variable; I can pretty much work like I was in C. So 226 00:20:57,010 --> 00:21:02,659 how is it doing that? There is something that's called CTF, that's not capture the 227 00:21:02,659 --> 00:21:10,120 flag, this is the Compact C Type Format, you can see 228 00:21:10,120 --> 00:21:14,320 there is a man page for it in FreeBSD, and there's a little segment in the kernel 229 00:21:14,320 --> 00:21:19,190 binary, where all this typing information is stored. I don't know how that compares 230 00:21:19,190 --> 00:21:24,320 to modern DWARF but yeah this is what DTrace is working with. So now you might 231 00:21:24,320 --> 00:21:28,549 ask yourself "Why on earth would I do that? Why on earth would I look at virtual 232 00:21:28,549 --> 00:21:33,590 memory, because, yeah, um, this stuff is safe isn't it? I mean there's no bugs in 233 00:21:33,590 --> 00:21:42,820 there." Except when there are. Anyone remember "Dirty COW"? So this 234 00:21:42,820 --> 00:21:48,510 was a very nasty vulnerability in the Linux kernel and that was a problem 235 00:21:48,510 --> 00:21:52,399 in the virtual memory management. So it allowed you to write to a file that you 236 00:21:52,399 --> 00:21:56,679 didn't own as a regular user. So you could essentially just write to a binary that 237 00:21:56,679 --> 00:22:01,789 had setuid set. Very unpleasant, but I'm not going to bash the Linux folks 238 00:22:01,789 --> 00:22:08,030 here, I just want to show you these things are hard. 
And the first 239 00:22:08,030 --> 00:22:15,440 fix for this problem was in 2005 and then it came back in 2016. So now that's fixed 240 00:22:15,440 --> 00:22:21,080 and then it came back with "Huge Dirty COW" in 2017, so this is, I mean this 241 00:22:21,080 --> 00:22:27,580 was there for way over a decade. These things are hard to debug. And this 242 00:22:27,580 --> 00:22:33,110 is what I like about these systems, so not having tools like DTrace to 243 00:22:33,110 --> 00:22:37,640 figure out what's going on inside of the system somehow, to me, amounts to security 244 00:22:37,640 --> 00:22:42,360 by obscurity. And I've heard that some people who are developing exploits for 245 00:22:42,360 --> 00:22:46,100 systems that have DTrace, they say "Oh, I really like developing exploits on these 246 00:22:46,100 --> 00:22:53,230 systems, because the tooling is so great!" Yeah, but, to be honest this is cool, 247 00:22:53,230 --> 00:22:58,899 because an exploit is a proof of concept and coming up with these exploits quickly 248 00:22:58,899 --> 00:23:03,440 is very useful, because you know what's going on, you can show "Hey, this is going 249 00:23:03,440 --> 00:23:07,279 wrong". I had situations, where people were telling me "Oh, this 250 00:23:07,279 --> 00:23:11,020 is not a problem with our program, this is this weird operating system that you're 251 00:23:11,020 --> 00:23:18,100 using. Like Solaris, weird operating system." And, yeah, and then I churned out 252 00:23:18,100 --> 00:23:22,059 some DTrace scripts and "No, it's actually your problem". "Oh, now I can see 253 00:23:22,059 --> 00:23:31,419 that on my Linux box!" Magic. 
So, everything I showed you until now was 254 00:23:31,419 --> 00:23:38,179 very, very much related to function calls and we want to have a little bit more 255 00:23:38,179 --> 00:23:44,720 semantics here, because you might want to write a script that inspects protocols, 256 00:23:44,720 --> 00:23:48,760 stuff like TCP, UDP stuff like that. So, you don't want to know which function 257 00:23:48,760 --> 00:23:54,320 inside of the kernel is responsible for handling your TCP/IP stuff, so DTrace 258 00:23:54,320 --> 00:24:00,549 comes with something that's called static providers and I'm just going to show the 259 00:24:00,549 --> 00:24:04,769 apropos here. So every static provider has a man page which is 260 00:24:04,769 --> 00:24:10,950 kind of handy - documentation whoo - and you can see there is an I/O provider if 261 00:24:10,950 --> 00:24:17,539 you are interested in looking at disk I/O, IP for looking at IPv4 and IPv6, 262 00:24:17,539 --> 00:24:23,570 TCP... This one is pretty cool, it's about scheduling behavior. So, "what does my 263 00:24:23,570 --> 00:24:29,010 scheduler do?" And if you look at that, you can see some interesting stuff like lend 264 00:24:29,010 --> 00:24:33,150 priority, if you ever saw things like priority inversion, stuff like that, now 265 00:24:33,150 --> 00:24:36,970 you can see that happen. I'm a nerd, I find this interesting for some reason, I 266 00:24:36,970 --> 00:24:43,230 don't know. And it's also pretty interesting to figure out what's going on, 267 00:24:43,230 --> 00:24:48,279 "why is this getting de-scheduled all the time?" So, some interesting things going 268 00:24:48,279 --> 00:24:55,809 on there. So, I'm running a little bit short on time here, but I just quickly 269 00:24:55,809 --> 00:24:59,340 want to show you something - this is all kernel stuff right now - can we do that 270 00:24:59,340 --> 00:25:05,380 with userspace? Of course. 
So, there was one provider that didn't show up when I 271 00:25:05,380 --> 00:25:09,590 had my provider listing, but was in the DTrace script where I did this timing 272 00:25:09,590 --> 00:25:16,230 attack stuff. And that's called the PID provider. And the PID provider generates 273 00:25:16,230 --> 00:25:21,080 probes on demand, because a process might have a lot of probes and you will shortly 274 00:25:21,080 --> 00:25:25,190 see why and this is why I'm going to use a very small program which is called "true", 275 00:25:25,190 --> 00:25:31,560 and true just exits with exit code zero. So, nothing too exciting going on here, 276 00:25:31,560 --> 00:25:37,810 and this $target gets substituted, so we get the process ID there. And this 277 00:25:37,810 --> 00:25:44,640 is everything that happens when I'm executing this program. You see this is a 278 00:25:44,640 --> 00:25:48,679 little bit more fine-grained than the FBT provider, because now we can trace every 279 00:25:48,679 --> 00:25:53,520 single instruction inside of that function, which is kind of handy. It's a 280 00:25:53,520 --> 00:25:58,090 scriptable debugger. So, these numbers are the instruction offsets inside of that 281 00:25:58,090 --> 00:26:03,360 function. We can also look at - so this is everything in the true segment - we can 282 00:26:03,360 --> 00:26:09,899 also look at libraries that got linked in and there's a lot of stuff happening in 283 00:26:09,899 --> 00:26:15,780 libc for example when you run true. So, one last thing that I wanted to show 284 00:26:15,780 --> 00:26:22,340 you because it consumed a week of my life: I'm using a lot of Haskell and the Mac OS 285 00:26:22,340 --> 00:26:29,419 people, they also have DTrace and they have GHC Haskell DTrace support - so the 286 00:26:29,419 --> 00:26:38,380 Glasgow Haskell compiler - and glorious... they have probes to analyze what's going 287 00:26:38,380 --> 00:26:41,620 on inside of the runtime system. 
So, I thought "I want to have that, I have 288 00:26:41,620 --> 00:26:47,019 DTrace, why doesn't it work on FreeBSD?" So, after a week of fighting with 289 00:26:47,019 --> 00:26:55,100 Makefiles and linkers, that works: If you check out the recent GHC repository and 290 00:26:55,100 --> 00:27:00,260 build it on FreeBSD, you get all the nice stuff that I'm going to show you now. So, 291 00:27:00,260 --> 00:27:05,909 this is a very boring program - it just starts 32 green threads and schedules them 292 00:27:05,909 --> 00:27:10,470 all over the place - and now I can do something like this: *phone rings* I can 293 00:27:10,470 --> 00:27:13,934 ring a telephone. *laughter* 294 00:27:13,934 --> 00:27:18,750 No, that would be interesting... So, you can also use 295 00:27:18,750 --> 00:27:26,970 wildcards - and not as name of the probe - and this is what's going on inside, like 296 00:27:26,970 --> 00:27:31,580 GC, garbage collection and all this stuff. Now you can look at this and write useful 297 00:27:31,580 --> 00:27:37,509 DTrace scripts that also take my runtime system into account. So, stuff like that 298 00:27:37,509 --> 00:27:41,810 exists for I think Python - I'm not entirely sure because I don't use it - 299 00:27:41,810 --> 00:27:49,120 Node.js, same, Postgres - I used it but not with DTrace right now - and what I find 300 00:27:49,120 --> 00:27:55,210 interesting: Firefox. When you run JavaScript in your Firefox, it actually 301 00:27:55,210 --> 00:27:59,360 has a provider, so you can trace JavaScript running in your browser with 302 00:27:59,360 --> 00:28:05,130 DTrace. So after everything I just showed you, there might be some stuff going on 303 00:28:05,130 --> 00:28:10,700 there. So yeah, this is basically everything I wanted to show you and I 304 00:28:10,700 --> 00:28:13,759 think I'm going to wrap up, because otherwise we're not going to have a lot of 305 00:28:13,759 --> 00:28:19,001 time for questions and maybe you have some. So yeah, thanks.
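The two userspace demos from this part of the talk can be approximated as follows. This is a sketch under assumptions, not the commands from the terminal session: it assumes `true` lives at `/usr/bin/true` and exposes a `main` symbol, that the Haskell binary was built with a DTrace-enabled GHC runtime (whose provider is named `HaskellEvent`), and the function names and the `./threads` path are hypothetical.

```shell
#!/bin/sh
# Sketch only: function names and the ./threads binary are hypothetical.

# dtrace replaces $target with the PID of the command passed to -c.
# Single quotes keep the shell from expanding $target itself.

# Empty probe name: entry, return and every instruction offset of main
# in the "true" module.
PID_SPEC='pid$target:true:main:'

# Empty module, function and name: every probe of the GHC runtime provider.
GHC_SPEC='HaskellEvent$target:::'

trace_true() {
    dtrace -n "$PID_SPEC" -c /usr/bin/true
}

trace_haskell_rts() {
    # $1 is the path to a Haskell binary, for example ./threads
    dtrace -n "$GHC_SPEC" -c "$1"
}

# Only run when dtrace is installed and we are root.
if command -v dtrace >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    trace_true
else
    echo "dtrace not available or not root; skipping"
fi
```

Replacing the module field `true` in `PID_SPEC` with a library name is how the linked-in libraries mentioned in the talk, such as libc, would be inspected; expect a very large number of probes when the function field is left empty as well.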
306 00:28:19,001 --> 00:28:29,610 *applause* Herald: Thank you very much Raichoo. We 307 00:28:29,610 --> 00:28:34,257 are actually over time already, but we have two more minutes because we started 308 00:28:34,257 --> 00:28:38,817 three minutes late, so if there are any really quick questions, possibly from the 309 00:28:38,817 --> 00:28:43,030 internet... There is one, the signal angel says, let's hear it. 310 00:28:43,030 --> 00:28:48,013 Question: Yeah, hi, okay. So, the question is, "which changes are actually necessary 311 00:28:48,013 --> 00:28:51,809 to do in the kernel of an operating system to support DTrace?" 312 00:28:51,809 --> 00:28:56,370 Answer: That's a lot of work. So, it's not something you do in a weekend. This 313 00:28:56,370 --> 00:29:03,062 is... So, the person who started the work on FreeBSD has sadly passed away now, but 314 00:29:03,062 --> 00:29:09,559 I think they took a couple of years to have everything in place, so you have to 315 00:29:09,559 --> 00:29:13,730 have stuff like the CTF thing that I showed you, which is what OpenBSD is 316 00:29:13,730 --> 00:29:19,890 currently working on. And then you need all those magic gizmos, like kernel 317 00:29:19,890 --> 00:29:25,660 modules and stuff like that. So, it takes a lot of time, but it's been ported to 318 00:29:25,660 --> 00:29:30,889 most operating systems that are available and in use right now. So yeah, hope this 319 00:29:30,889 --> 00:29:34,239 answers the question. Herald: Excellent, and there are no more 320 00:29:34,239 --> 00:29:38,839 questions here in the room. I will thank Raichoo and you can find him outside of 321 00:29:38,839 --> 00:29:46,590 the room and also on Twitter at "raichoo" if you have any further questions. 322 00:29:46,590 --> 00:29:51,405 *postroll music* 323 00:29:51,405 --> 00:30:08,000 subtitles created by c3subtitles.de in the year 2020. Join, and help us!