0 00:00:00,000 --> 00:00:30,000 Dear viewer, these subtitles were generated by a machine via the service Trint and therefore are (very) buggy. If you are capable, please help us to create good quality subtitles: https://c3subtitles.de/talk/90 Thanks! 1 00:00:10,660 --> 00:00:12,039 Well, good morning, everyone, 2 00:00:13,630 --> 00:00:15,519 so thank you for getting up early 3 00:00:15,520 --> 00:00:18,249 to come to my talk today. 4 00:00:18,250 --> 00:00:20,559 I barely made it out of bed myself. 5 00:00:24,500 --> 00:00:26,089 So I hope we have a good time. It's a 6 00:00:26,090 --> 00:00:27,679 really exciting time 7 00:00:27,680 --> 00:00:29,779 in the history of 8 00:00:29,780 --> 00:00:31,969 networking when it comes to 9 00:00:31,970 --> 00:00:34,069 visibility into the internet, and 10 00:00:34,070 --> 00:00:35,689 that's what I'm going to be talking about 11 00:00:35,690 --> 00:00:37,369 today: new 12 00:00:37,370 --> 00:00:39,889 advances, new techniques 13 00:00:39,890 --> 00:00:41,389 that are coming out of the evolution of 14 00:00:41,390 --> 00:00:43,729 technology that can enable 15 00:00:43,730 --> 00:00:45,889 all of us, each and every one of us, 16 00:00:45,890 --> 00:00:48,199 you and me, to individually 17 00:00:48,200 --> 00:00:51,019 get visibility into 18 00:00:51,020 --> 00:00:52,489 what is happening throughout the 19 00:00:52,490 --> 00:00:53,659 internet in a way that 20 00:00:53,660 --> 00:00:56,149 has never really been possible before. 21 00:00:57,380 --> 00:00:58,880 Now, 22 00:01:00,440 --> 00:01:00,679 just 23 00:01:00,680 --> 00:01:02,089 a little bit about myself. 24 00:01:02,090 --> 00:01:03,949 I'm a professor at the University of 25 00:01:03,950 --> 00:01:06,409 Michigan, where I work in 26 00:01:06,410 --> 00:01:08,689 security and privacy with 27 00:01:08,690 --> 00:01:10,759 an emphasis on problems that affect the 28 00:01:10,760 --> 00:01:11,760 real world.
29 00:01:12,830 --> 00:01:13,249 I am, 30 00:01:13,250 --> 00:01:13,789 however, an 31 00:01:13,790 --> 00:01:16,759 armchair professor, and it's always 32 00:01:16,760 --> 00:01:18,829 a really great privilege to be speaking 33 00:01:18,830 --> 00:01:20,839 to people who are actually in the world 34 00:01:20,840 --> 00:01:23,179 doing things instead of up in the ivory 35 00:01:23,180 --> 00:01:24,589 tower or somewhere sitting on our 36 00:01:24,590 --> 00:01:25,590 asses. 37 00:01:26,330 --> 00:01:27,920 So please, 38 00:01:28,940 --> 00:01:30,709 please bear with me if anything 39 00:01:31,760 --> 00:01:33,650 is not clear or 40 00:01:34,880 --> 00:01:37,069 is not at the level of detail you'd like. 41 00:01:37,070 --> 00:01:39,229 If you'd like more detail, please do 42 00:01:39,230 --> 00:01:41,839 ask in Q&A or even interrupt 43 00:01:41,840 --> 00:01:43,909 me, as is the style in talks 44 00:01:43,910 --> 00:01:44,910 where I come from. 45 00:01:47,270 --> 00:01:48,919 So just before I begin: what I'm going to 46 00:01:48,920 --> 00:01:49,669 be talking about 47 00:01:49,670 --> 00:01:51,739 today is not only 48 00:01:51,740 --> 00:01:52,339 my work, 49 00:01:52,340 --> 00:01:53,899 but collaborative work with 50 00:01:53,900 --> 00:01:55,579 a great many people, 51 00:01:55,580 --> 00:01:56,989 whom I 52 00:01:56,990 --> 00:01:59,299 will list up here so you can go and 53 00:01:59,300 --> 00:02:01,159 take a look at our papers. 54 00:02:01,160 --> 00:02:03,439 They're all available on my website. 55 00:02:03,440 --> 00:02:05,539 I've taken an open access pledge, so 56 00:02:05,540 --> 00:02:07,699 everything that I write 57 00:02:07,700 --> 00:02:08,960 is available for free. 58 00:02:10,130 --> 00:02:12,199 In particular, I need to 59 00:02:12,200 --> 00:02:13,459 acknowledge my student, 60 00:02:13,460 --> 00:02:14,439 Zakir 61 00:02:14,440 --> 00:02:16,519 Durumeric.
62 00:02:16,520 --> 00:02:18,199 Most of this work I'm going to be talking 63 00:02:18,200 --> 00:02:20,269 about today is going to form a large part 64 00:02:20,270 --> 00:02:21,739 of his dissertation, and he's been 65 00:02:21,740 --> 00:02:23,899 working on this extremely well 66 00:02:23,900 --> 00:02:25,249 over the past couple of years. 67 00:02:28,370 --> 00:02:30,439 So the internet, right, this is where we 68 00:02:30,440 --> 00:02:31,440 live, 69 00:02:32,240 --> 00:02:32,599 and it 70 00:02:32,600 --> 00:02:34,369 looks something like this, according 71 00:02:34,370 --> 00:02:36,739 to the kinds of maps that 72 00:02:36,740 --> 00:02:38,179 people have been making since the 73 00:02:38,180 --> 00:02:39,530 late '90s. This is 74 00:02:40,700 --> 00:02:43,459 a visualization of 75 00:02:43,460 --> 00:02:45,199 routing interconnections on the internet, 76 00:02:45,200 --> 00:02:47,599 sort of the topology of the network. 77 00:02:48,830 --> 00:02:51,559 But just seeing the shape of the land 78 00:02:51,560 --> 00:02:53,239 in a picture like this doesn't really 79 00:02:53,240 --> 00:02:53,539 tell 80 00:02:53,540 --> 00:02:55,729 us that much about what's there. 81 00:02:55,730 --> 00:02:57,649 Are there trees or are there people? 82 00:02:57,650 --> 00:03:00,379 Are there flowers or are there mountains? 83 00:03:00,380 --> 00:03:02,659 We need to know a different 84 00:03:02,660 --> 00:03:02,959 level 85 00:03:02,960 --> 00:03:04,279 of detail and a different kind of 86 00:03:04,280 --> 00:03:06,859 information in order to make sense 87 00:03:06,860 --> 00:03:09,049 of the devices, 88 00:03:09,050 --> 00:03:10,579 the information that makes up the 89 00:03:10,580 --> 00:03:11,580 internet. 90 00:03:11,990 --> 00:03:13,489 And one place we can 91 00:03:13,490 --> 00:03:15,769 get some insight into that is from search 92 00:03:15,770 --> 00:03:17,329 engines like Google. 93 00:03:17,330 --> 00:03:19,309 But Google indexes the web.
94 00:03:20,420 --> 00:03:22,609 And so when their crawlers are 95 00:03:22,610 --> 00:03:23,929 navigating the internet, they're 96 00:03:23,930 --> 00:03:26,149 concentrating on web sites that are 97 00:03:26,150 --> 00:03:28,579 making up this interconnected 98 00:03:28,580 --> 00:03:29,569 graph of links. 99 00:03:29,570 --> 00:03:31,429 That is where most of us spend most of 100 00:03:31,430 --> 00:03:32,929 our time. 101 00:03:32,930 --> 00:03:34,519 But if you think about the internet in a 102 00:03:34,520 --> 00:03:36,229 different way, in terms of what's plugged 103 00:03:36,230 --> 00:03:37,489 into it, what makes up 104 00:03:37,490 --> 00:03:39,259 the public address space, 105 00:03:39,260 --> 00:03:40,939 you end up getting a very, very different 106 00:03:40,940 --> 00:03:42,799 picture from what you see in a search 107 00:03:42,800 --> 00:03:43,669 engine. 108 00:03:43,670 --> 00:03:45,739 And what I'm going to be talking about 109 00:03:45,740 --> 00:03:48,199 today is methodology, 110 00:03:48,200 --> 00:03:48,919 techniques that 111 00:03:48,920 --> 00:03:50,989 we can use to get that kind of view 112 00:03:50,990 --> 00:03:52,249 of the internet, a kind of view of the 113 00:03:52,250 --> 00:03:54,319 internet that is about the devices, 114 00:03:54,320 --> 00:03:56,329 the computers, the embedded systems that 115 00:03:56,330 --> 00:03:57,330 are plugged into it.
116 00:04:00,770 --> 00:04:03,859 So, internet-wide scanning, 117 00:04:03,860 --> 00:04:04,429 the topic of 118 00:04:04,430 --> 00:04:05,869 my talk: I just find this 119 00:04:05,870 --> 00:04:08,419 so exciting. I've been fascinated by 120 00:04:08,420 --> 00:04:10,879 the idea that it's possible, 121 00:04:10,880 --> 00:04:12,949 it's tractable, to connect to every 122 00:04:12,950 --> 00:04:13,489 computer 123 00:04:13,490 --> 00:04:15,169 on the internet and have a conversation 124 00:04:15,170 --> 00:04:15,349 with 125 00:04:15,350 --> 00:04:16,299 it, 126 00:04:16,300 --> 00:04:17,629 for many, many years. 127 00:04:17,630 --> 00:04:19,999 And some of the first inspiration for me 128 00:04:20,000 --> 00:04:21,919 came out of this project pioneered by the 129 00:04:21,920 --> 00:04:22,579 EFF, 130 00:04:22,580 --> 00:04:24,979 called the SSL Observatory. 131 00:04:24,980 --> 00:04:26,479 What the SSL Observatory 132 00:04:26,480 --> 00:04:28,579 did is they had a small cluster 133 00:04:28,580 --> 00:04:30,589 of machines that they used over a period 134 00:04:30,590 --> 00:04:30,709 of 135 00:04:30,710 --> 00:04:32,839 months to try to make a connection 136 00:04:32,840 --> 00:04:33,589 to every 137 00:04:33,590 --> 00:04:35,629 HTTPS web site. 138 00:04:35,630 --> 00:04:37,579 And they did this not with a list of web 139 00:04:37,580 --> 00:04:39,619 sites, but by exhaustively 140 00:04:39,620 --> 00:04:42,259 enumerating the IP address space. 141 00:04:42,260 --> 00:04:43,789 So if you think about it, 142 00:04:43,790 --> 00:04:44,479 IPv4 143 00:04:44,480 --> 00:04:46,939 addresses are 32 bits. 144 00:04:46,940 --> 00:04:48,950 There are about four billion of them. 145 00:04:49,970 --> 00:04:52,129 Maybe, I don't know, 15 or 20 146 00:04:52,130 --> 00:04:52,969 percent of them 147 00:04:52,970 --> 00:04:55,399 are not publicly routable.
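As a rough check on those figures, the sketch below counts the 32-bit IPv4 space and subtracts a handful of well-known reserved blocks (private, loopback, multicast and so on) using Python's standard `ipaddress` module. The list of excluded ranges is an illustrative assumption, not a complete registry, and it says nothing about which of the remaining addresses are actually routed.

```python
import ipaddress

# Total IPv4 address space: 32-bit addresses.
TOTAL = 2 ** 32

# Illustrative (not exhaustive) set of large reserved / non-public blocks.
# These ranges do not overlap, so their sizes can simply be summed.
RESERVED = [
    "0.0.0.0/8",       # "this network"
    "10.0.0.0/8",      # private (RFC 1918)
    "100.64.0.0/10",   # shared address space for carrier-grade NAT
    "127.0.0.0/8",     # loopback
    "169.254.0.0/16",  # link-local
    "172.16.0.0/12",   # private (RFC 1918)
    "192.168.0.0/16",  # private (RFC 1918)
    "224.0.0.0/4",     # multicast
    "240.0.0.0/4",     # reserved for future use (includes broadcast)
]

excluded = sum(ipaddress.ip_network(block).num_addresses for block in RESERVED)
fraction = excluded / TOTAL

print(f"total addresses:    {TOTAL:,}")
print(f"excluded (approx.): {excluded:,} ({fraction:.1%})")
print(f"remaining:          {TOTAL - excluded:,}")
```

With only these blocks removed, roughly 14% of the space is excluded, leaving about 3.7 billion candidate addresses. A fuller list of reserved ranges, plus space that is allocated but not routed, would push the excluded fraction higher.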
148 00:04:55,400 --> 00:04:56,239 So you 149 00:04:56,240 --> 00:04:57,240 have 150 00:04:57,890 --> 00:04:58,279 a 151 00:04:58,280 --> 00:04:59,989 substantial fraction of four billion 152 00:04:59,990 --> 00:05:01,579 addresses. 153 00:05:01,580 --> 00:05:02,809 That seems like a lot 154 00:05:02,810 --> 00:05:04,729 on one hand, but on the other hand, 155 00:05:04,730 --> 00:05:05,989 networks and computers 156 00:05:05,990 --> 00:05:08,239 have gotten so much faster in recent 157 00:05:08,240 --> 00:05:08,989 years 158 00:05:08,990 --> 00:05:10,939 that suddenly just making 159 00:05:10,940 --> 00:05:12,349 a TCP connection to all 160 00:05:12,350 --> 00:05:14,569 of these machines is something that 161 00:05:14,570 --> 00:05:15,709 you can do, that we can write 162 00:05:15,710 --> 00:05:17,359 programs to do. 163 00:05:17,360 --> 00:05:19,489 It's not easy, but it's tractable. 164 00:05:20,960 --> 00:05:22,399 Further inspiration 165 00:05:22,400 --> 00:05:25,099 came about a year ago 166 00:05:25,100 --> 00:05:25,459 from 167 00:05:25,460 --> 00:05:27,559 this research from the Carna 168 00:05:27,560 --> 00:05:29,209 botnet. The Carna botnet 169 00:05:30,890 --> 00:05:31,579 was 170 00:05:31,580 --> 00:05:33,349 an experiment done by 171 00:05:33,350 --> 00:05:35,419 some anonymous people, who may or may 172 00:05:35,420 --> 00:05:36,420 not be here, 173 00:05:37,190 --> 00:05:38,190 who 174 00:05:39,560 --> 00:05:40,399 discovered 175 00:05:40,400 --> 00:05:42,499 that if they did this kind 176 00:05:42,500 --> 00:05:44,989 of internet-wide probing, 177 00:05:44,990 --> 00:05:46,969 if they probed embedded 178 00:05:46,970 --> 00:05:48,109 systems, they found many, 179 00:05:48,110 --> 00:05:49,279 many systems that 180 00:05:49,280 --> 00:05:51,499 would allow them root access 181 00:05:51,500 --> 00:05:53,119 with just a short list of default 182 00:05:53,120 --> 00:05:54,229 passwords.
That's my 183 00:05:54,230 --> 00:05:55,189 understanding. 184 00:05:55,190 --> 00:05:56,209 And using this 185 00:05:56,210 --> 00:05:58,489 technique, they compromised about 186 00:05:58,490 --> 00:06:01,069 420,000 187 00:06:01,070 --> 00:06:01,759 boxes. 188 00:06:01,760 --> 00:06:03,859 I presume that 189 00:06:03,860 --> 00:06:05,989 they were things like home 190 00:06:05,990 --> 00:06:07,189 routers and so forth, 191 00:06:07,190 --> 00:06:08,629 many of them, 192 00:06:08,630 --> 00:06:08,869 and 193 00:06:08,870 --> 00:06:10,759 they used these boxes to do 194 00:06:10,760 --> 00:06:12,409 probing of 195 00:06:12,410 --> 00:06:14,479 the rest of the internet on a variety 196 00:06:14,480 --> 00:06:15,919 of really interesting ports. 197 00:06:17,060 --> 00:06:19,159 So this is a map of the 198 00:06:19,160 --> 00:06:19,849 hosts that were 199 00:06:19,850 --> 00:06:21,739 compromised as part of the Carna botnet 200 00:06:21,740 --> 00:06:22,740 to do this study. 201 00:06:23,750 --> 00:06:25,489 So they got 202 00:06:25,490 --> 00:06:27,319 a huge volume of really, really 203 00:06:27,320 --> 00:06:29,869 interesting data from that, as the EFF 204 00:06:29,870 --> 00:06:32,869 did when they were probing HTTPS. 205 00:06:32,870 --> 00:06:35,089 But what I think is true about both 206 00:06:35,090 --> 00:06:35,329 of these 207 00:06:35,330 --> 00:06:37,789 studies is that they're both 208 00:06:37,790 --> 00:06:39,649 really impressive in terms of what they 209 00:06:39,650 --> 00:06:42,199 got, but they also took a tremendous 210 00:06:42,200 --> 00:06:42,469 amount 211 00:06:42,470 --> 00:06:44,599 of effort and time for the people 212 00:06:44,600 --> 00:06:44,989 who did 213 00:06:44,990 --> 00:06:46,039 them.
214 00:06:46,040 --> 00:06:46,399 I don't 215 00:06:46,400 --> 00:06:48,499 think every one of us 216 00:06:48,500 --> 00:06:51,049 is able or willing to 217 00:06:51,050 --> 00:06:53,329 create a botnet, at least I hope not, 218 00:06:53,330 --> 00:06:55,370 for purposes of internet-wide research. 219 00:06:58,040 --> 00:06:59,869 So there have been a couple of other 220 00:06:59,870 --> 00:07:01,369 studies I should mention, too. 221 00:07:01,370 --> 00:07:03,589 And these are, I think, at 222 00:07:03,590 --> 00:07:05,149 least from my perspective, some 223 00:07:05,150 --> 00:07:06,079 of the more interesting 224 00:07:06,080 --> 00:07:07,699 works that have come out over the past 225 00:07:07,700 --> 00:07:08,659 five years. The first 226 00:07:08,660 --> 00:07:10,669 one on this list, "Census and 227 00:07:10,670 --> 00:07:12,799 Survey of the Visible Internet," in 228 00:07:12,800 --> 00:07:14,689 2008, was 229 00:07:14,690 --> 00:07:15,889 a study of 230 00:07:16,940 --> 00:07:17,940 basically 231 00:07:19,490 --> 00:07:20,359 what 232 00:07:20,360 --> 00:07:22,519 addresses were connected 233 00:07:22,520 --> 00:07:25,159 and routable in the IPv4 space. 234 00:07:25,160 --> 00:07:26,029 Not a security 235 00:07:26,030 --> 00:07:27,139 study, 236 00:07:27,140 --> 00:07:29,929 but it claimed to be the first exhaustive 237 00:07:29,930 --> 00:07:30,859 survey 238 00:07:30,860 --> 00:07:31,759 of IP 239 00:07:31,760 --> 00:07:34,669 addresses plugged into the internet 240 00:07:34,670 --> 00:07:34,909 in 241 00:07:34,910 --> 00:07:36,169 20 years. 242 00:07:36,170 --> 00:07:38,179 So from the time the internet was very 243 00:07:38,180 --> 00:07:40,789 small until just five years ago, 244 00:07:40,790 --> 00:07:42,019 it has been a dark age.
245 00:07:42,020 --> 00:07:43,879 If you think about the evolution of the 246 00:07:43,880 --> 00:07:45,769 universe, we have these periods 247 00:07:45,770 --> 00:07:47,839 when we have more visibility and 248 00:07:47,840 --> 00:07:48,739 less visibility. 249 00:07:48,740 --> 00:07:51,079 Well, that was a period of very little 250 00:07:51,080 --> 00:07:52,309 visibility. 251 00:07:52,310 --> 00:07:54,449 Since then, we've had studies like the 252 00:07:54,450 --> 00:07:56,629 SSL Observatory, like 253 00:07:56,630 --> 00:07:58,579 one I was involved in two years ago, which 254 00:07:58,580 --> 00:08:01,219 I'll tell you a little bit about, 255 00:08:01,220 --> 00:08:03,079 where we were able to compromise a 256 00:08:03,080 --> 00:08:04,609 substantial fraction of 257 00:08:04,610 --> 00:08:06,709 the private 258 00:08:06,710 --> 00:08:09,319 keys used for internet cryptography 259 00:08:09,320 --> 00:08:11,569 using internet-wide scanning techniques, 260 00:08:11,570 --> 00:08:13,309 and then the Carna botnet just last 261 00:08:13,310 --> 00:08:14,359 year. 262 00:08:14,360 --> 00:08:15,289 All exciting work. 263 00:08:15,290 --> 00:08:17,179 But what unifies these is they 264 00:08:17,180 --> 00:08:19,220 take so much effort to do. 265 00:08:20,510 --> 00:08:22,999 We're talking about thousands of CPU 266 00:08:23,000 --> 00:08:24,529 hours, months of waiting. 267 00:08:26,120 --> 00:08:29,329 The first three studies on this list 268 00:08:29,330 --> 00:08:30,330 all involved 269 00:08:31,460 --> 00:08:33,769 the use of off-the-shelf network 270 00:08:33,770 --> 00:08:36,048 probing tools, things like 271 00:08:36,049 --> 00:08:37,099 Nmap. 272 00:08:37,100 --> 00:08:39,288
Although a wonderful 273 00:08:39,289 --> 00:08:40,158 security tool, 274 00:08:40,159 --> 00:08:41,689 something I use all the time, and I have 275 00:08:41,690 --> 00:08:43,908 great admiration for Fyodor and his 276 00:08:43,909 --> 00:08:45,139 work, 277 00:08:45,140 --> 00:08:46,939 Nmap is not a tool that's 278 00:08:46,940 --> 00:08:48,919 designed or optimized for internet-wide 279 00:08:48,920 --> 00:08:51,289 scanning, and that turns 280 00:08:51,290 --> 00:08:53,569 out to be a large part of the problem. 281 00:08:53,570 --> 00:08:55,189 So in each of these cases, 282 00:08:55,190 --> 00:08:57,019 either tremendous effort by the 283 00:08:57,020 --> 00:08:59,089 researchers, tremendous time, 284 00:08:59,090 --> 00:09:01,250 or a large number of hosts was required. 285 00:09:02,360 --> 00:09:04,339 So the problem, from my perspective, is 286 00:09:04,340 --> 00:09:06,649 we've been using tools like this 287 00:09:06,650 --> 00:09:09,529 to try to get visibility into 288 00:09:09,530 --> 00:09:11,089 the devices plugged into the internet, 289 00:09:11,090 --> 00:09:13,189 when we really want something like this: 290 00:09:13,190 --> 00:09:13,429 we 291 00:09:13,430 --> 00:09:16,039 want something that's designed and optimized 292 00:09:16,040 --> 00:09:18,109 for the purpose of internet 293 00:09:18,110 --> 00:09:19,110 wide scanning. 294 00:09:21,000 --> 00:09:23,609 So in my research group last year, 295 00:09:23,610 --> 00:09:25,379 we asked, well, 296 00:09:25,380 --> 00:09:26,429 what if 297 00:09:27,570 --> 00:09:28,109 this kind 298 00:09:28,110 --> 00:09:30,089 of work didn't require heroic effort from 299 00:09:30,090 --> 00:09:32,669 researchers? What if we could democratize 300 00:09:32,670 --> 00:09:33,899 internet-wide measurement? 301 00:09:35,010 --> 00:09:35,489 What if 302 00:09:35,490 --> 00:09:37,709 we could do a scan like the SSL 303 00:09:37,710 --> 00:09:39,299 Observatory that the EFF
304 00:09:39,300 --> 00:09:40,109 did, 305 00:09:40,110 --> 00:09:41,519 every day? 306 00:09:41,520 --> 00:09:42,869 How much more detail, 307 00:09:42,870 --> 00:09:44,639 what new things could we learn with 308 00:09:44,640 --> 00:09:46,799 that degree of visibility? 309 00:09:46,800 --> 00:09:48,659 And in particular, what kinds of 310 00:09:48,660 --> 00:09:49,529 optimization could 311 00:09:49,530 --> 00:09:51,869 we do if we built a scanner from 312 00:09:51,870 --> 00:09:54,209 scratch to do internet-wide measurement? 313 00:09:56,200 --> 00:09:58,449 So to answer these questions, we went 314 00:09:58,450 --> 00:10:00,099 and built that scanner, a 315 00:10:00,100 --> 00:10:02,679 tool called ZMap, 316 00:10:02,680 --> 00:10:04,750 that you can download right now. 317 00:10:05,920 --> 00:10:08,109 ZMap is an open source 318 00:10:08,110 --> 00:10:10,209 tool that will allow you 319 00:10:10,210 --> 00:10:12,099 to do a horizontal 320 00:10:12,100 --> 00:10:13,389 scan of the whole internet, 321 00:10:13,390 --> 00:10:16,209 so looking at one port across 322 00:10:16,210 --> 00:10:18,999 every routable IPv4 323 00:10:19,000 --> 00:10:20,169 address, 324 00:10:20,170 --> 00:10:22,389 from just one machine, in 325 00:10:22,390 --> 00:10:23,409 about 45 326 00:10:23,410 --> 00:10:25,749 minutes if you have enough upstream 327 00:10:25,750 --> 00:10:26,619 bandwidth, 328 00:10:26,620 --> 00:10:27,609 and 329 00:10:27,610 --> 00:10:28,719 get 330 00:10:28,720 --> 00:10:31,329 to see almost everything 331 00:10:31,330 --> 00:10:34,059 that you would see by using more 332 00:10:34,060 --> 00:10:35,979 precise and thorough methods. 333 00:10:35,980 --> 00:10:37,899 So this is something we call shotgun 334 00:10:37,900 --> 00:10:38,529 scanning, 335 00:10:38,530 --> 00:10:40,539 just in the same way as shotgun DNA 336 00:10:40,540 --> 00:10:41,529 sequencing.
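To see why "about 45 minutes" is plausible, here is a back-of-the-envelope sketch. It assumes roughly 3.7 billion target addresses (what remains after excluding the common reserved blocks) and a sustained rate of 1.4 million probes per second, a little under gigabit line rate. Both inputs are illustrative assumptions, not ZMap's measured numbers.

```python
def scan_duration_seconds(num_targets: int, probes_per_second: float) -> float:
    """Time needed to send one probe to each target at a constant rate."""
    return num_targets / probes_per_second


# Assumed inputs (illustrative only).
targets = 3_700_000_000  # approx. publicly routable IPv4 addresses
rate = 1_400_000         # probes per second, a bit below 1 GbE line rate

seconds = scan_duration_seconds(targets, rate)
print(f"{seconds:,.0f} s = {seconds / 60:.1f} minutes")
```

The time is simply targets divided by rate, so halving the probe rate doubles the scan time. That is why upstream bandwidth, not CPU, is the limiting factor. If the rate scaled tenfold on a 10 Gbps link, the same arithmetic would give under five minutes.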
337 00:10:41,530 --> 00:10:43,569 A rapid, relatively dirty, but 338 00:10:43,570 --> 00:10:45,639 very effective technique 339 00:10:45,640 --> 00:10:46,959 that 340 00:10:46,960 --> 00:10:49,329 drastically reduces the cost 341 00:10:49,330 --> 00:10:50,979 of doing scanning. 342 00:10:50,980 --> 00:10:53,529 So if you have, say, a gigabit 343 00:10:53,530 --> 00:10:54,099 upstream, 344 00:10:54,100 --> 00:10:55,839 like you could get if you plug into a 345 00:10:55,840 --> 00:10:56,199 wall 346 00:10:56,200 --> 00:10:57,200 here. 347 00:10:59,980 --> 00:11:00,849 I hear that they 348 00:11:00,850 --> 00:11:02,989 have much more bandwidth available, 349 00:11:02,990 --> 00:11:05,079 but my understanding, 350 00:11:05,080 --> 00:11:07,149 and maybe someone involved in the network 351 00:11:07,150 --> 00:11:08,349 here can correct me if I'm wrong, 352 00:11:08,350 --> 00:11:09,669 but my understanding is that the 353 00:11:09,670 --> 00:11:11,569 distribution layer that takes it down to 354 00:11:11,570 --> 00:11:13,719 the ports you can get access to is 355 00:11:13,720 --> 00:11:14,919 only running at a gigabit. 356 00:11:14,920 --> 00:11:16,569 So you're going to get performance like 357 00:11:16,570 --> 00:11:18,879 this instead of 10-gigabit or more 358 00:11:18,880 --> 00:11:19,880 performance. 359 00:11:22,570 --> 00:11:24,789 But you can complete a scan 360 00:11:24,790 --> 00:11:27,039 in less than an hour and contact every 361 00:11:27,040 --> 00:11:28,179 routable IP address. 362 00:11:29,950 --> 00:11:31,629 Now, it really 363 00:11:31,630 --> 00:11:32,019 is as 364 00:11:32,020 --> 00:11:33,020 simple as this. 365 00:11:34,090 --> 00:11:35,589 I'm going to give you a demo in just a 366 00:11:35,590 --> 00:11:36,590 second. 367 00:11:37,540 --> 00:11:39,609 However, I recommend not doing it 368 00:11:39,610 --> 00:11:41,559 from the Wi-Fi, at least not 369 00:11:41,560 --> 00:11:42,669 during my talk.
370 00:11:44,430 --> 00:11:45,629 Because that might interfere 371 00:11:45,630 --> 00:11:47,789 with the ability to 372 00:11:47,790 --> 00:11:49,949 demonstrate what I have to show 373 00:11:49,950 --> 00:11:52,079 you. So let me quickly start 374 00:11:52,080 --> 00:11:53,249 something up here so that it 375 00:11:53,250 --> 00:11:55,349 finishes before our hour 376 00:11:55,350 --> 00:11:57,569 is up. So I'm going to do this in 377 00:11:57,570 --> 00:11:59,699 a VM, and I'm going to try to 378 00:11:59,700 --> 00:12:01,259 make it a little bit bigger. 379 00:12:01,260 --> 00:12:02,969 So what am I going to do? 380 00:12:02,970 --> 00:12:04,289 Let me show you 381 00:12:04,290 --> 00:12:05,549 ZMap in operation. 382 00:12:05,550 --> 00:12:06,550 So, 383 00:12:07,830 --> 00:12:08,309 if this 384 00:12:08,310 --> 00:12:09,629 is still communicating 385 00:12:09,630 --> 00:12:10,630 with the internet... 386 00:12:11,480 --> 00:12:12,769 Come on, too sleepy. 387 00:12:16,080 --> 00:12:17,789 Let's try that again. 388 00:12:20,190 --> 00:12:21,330 Boo, boo, boo. 389 00:12:32,590 --> 00:12:34,449 Well, maybe the internet is no longer 390 00:12:34,450 --> 00:12:35,450 working. 391 00:12:37,960 --> 00:12:39,340 Internet, are you still there? 392 00:12:41,900 --> 00:12:44,059 Someone broke the internet. 393 00:12:44,060 --> 00:12:46,129 If anyone in this room is running an 394 00:12:46,130 --> 00:12:48,079 internet-wide scan from the Wi-Fi, 395 00:12:48,080 --> 00:12:50,359 can you please... ah, 396 00:12:50,360 --> 00:12:52,549 can you please pause it? 397 00:12:52,550 --> 00:12:54,709 All right, so I 398 00:12:54,710 --> 00:12:55,069 should be 399 00:12:55,070 --> 00:12:56,539 able to get a few packets through 400 00:12:56,540 --> 00:12:58,639 here. This is one of those cases where 401 00:12:58,640 --> 00:12:59,599 it works before 402 00:12:59,600 --> 00:13:01,819 the audience is all on their laptops.
403 00:13:01,820 --> 00:13:03,469 But as soon as you start speaking and 404 00:13:03,470 --> 00:13:05,840 people get bored and start reading email, 405 00:13:06,950 --> 00:13:07,950 it stops. 406 00:13:08,800 --> 00:13:10,600 OK. This is not being 407 00:13:11,830 --> 00:13:13,270 terribly responsive. 408 00:13:16,690 --> 00:13:18,779 OK, well, maybe I can run something 409 00:13:18,780 --> 00:13:21,330 locally. Um. 410 00:13:26,910 --> 00:13:29,039 Let me see if this works; I haven't 411 00:13:29,040 --> 00:13:30,900 tested this from the local machine yet. 412 00:13:37,350 --> 00:13:39,149 Not working from the local machine, 413 00:13:39,150 --> 00:13:40,739 either, because all the packets are 414 00:13:40,740 --> 00:13:42,449 dropping, for the same reason that 415 00:13:42,450 --> 00:13:43,649 it doesn't work remotely. 416 00:13:45,760 --> 00:13:47,619 All right. Well, if I were able to give 417 00:13:47,620 --> 00:13:49,869 you a demo, and I promise I will start 418 00:13:49,870 --> 00:13:51,939 a demo as 419 00:13:51,940 --> 00:13:54,369 soon as this internet 420 00:13:54,370 --> 00:13:56,679 comes back, I could demonstrate a couple 421 00:13:56,680 --> 00:13:57,680 of things. 422 00:13:58,420 --> 00:14:00,489 One thing is, ZMap 423 00:14:00,490 --> 00:14:02,619 is good for some fun tricks. 424 00:14:02,620 --> 00:14:04,689 Let's say you want to find some open 425 00:14:04,690 --> 00:14:06,549 proxies, some machines that are listening 426 00:14:06,550 --> 00:14:08,289 on 8080. 427 00:14:08,290 --> 00:14:10,419 Well, you can type a command, 428 00:14:10,420 --> 00:14:11,109 zmap 429 00:14:11,110 --> 00:14:13,509 -p 8080 -N 1000, 430 00:14:13,510 --> 00:14:15,879 saying you want a thousand addresses 431 00:14:15,880 --> 00:14:17,679 that are listening on that port.
432 00:14:17,680 --> 00:14:19,719 And in substantially 433 00:14:19,720 --> 00:14:20,979 less than a second, you're going to get 434 00:14:20,980 --> 00:14:23,799 back a list of 1000 IP addresses that 435 00:14:23,800 --> 00:14:25,389 are responsive on that port. 436 00:14:25,390 --> 00:14:26,799 A large fraction of them are going to be 437 00:14:26,800 --> 00:14:27,909 open proxies. 438 00:14:27,910 --> 00:14:29,619 If you want to find some random web servers, 439 00:14:29,620 --> 00:14:30,969 same thing. 440 00:14:30,970 --> 00:14:32,949 So that's if you want to just do a small 441 00:14:32,950 --> 00:14:34,269 sample. 442 00:14:34,270 --> 00:14:36,579 If you want to scan everything, 443 00:14:36,580 --> 00:14:36,789 you 444 00:14:36,790 --> 00:14:38,199 could type a command like the one I 445 00:14:38,200 --> 00:14:39,429 was planning to run. 446 00:14:41,710 --> 00:14:43,209 Still not working. Like the 447 00:14:43,210 --> 00:14:45,699 one I was planning to run, 448 00:14:45,700 --> 00:14:48,279 which would be something 449 00:14:48,280 --> 00:14:49,280 like 450 00:14:49,960 --> 00:14:50,960 this. 451 00:14:53,150 --> 00:14:54,559 So what I'm going to do in the hour 452 00:14:54,560 --> 00:14:56,509 after this talk, assuming that I can get 453 00:14:56,510 --> 00:14:58,639 on the internet and talk to a machine at 454 00:14:58,640 --> 00:15:01,729 Michigan that has a fast uplink, 455 00:15:01,730 --> 00:15:02,179 is I'm going 456 00:15:02,180 --> 00:15:04,159 to do a SYN scan 457 00:15:04,160 --> 00:15:06,169 on port 458 00:15:06,170 --> 00:15:08,089 0x30C3. 459 00:15:08,090 --> 00:15:09,769 So let's just let the world 460 00:15:09,770 --> 00:15:12,319 know that we're here today 461 00:15:12,320 --> 00:15:13,129 by 462 00:15:13,130 --> 00:15:15,019 making a connection to every public IP 463 00:15:15,020 --> 00:15:17,299 address on that port.
464 00:15:17,300 --> 00:15:19,849 So if you are on your machine anywhere 465 00:15:19,850 --> 00:15:20,659 in the world 466 00:15:20,660 --> 00:15:22,909 over the hour after the talk, and want to, 467 00:15:22,910 --> 00:15:25,249 say, run tcpdump and look for incoming 468 00:15:25,250 --> 00:15:26,659 SYN packets on 469 00:15:26,660 --> 00:15:28,339 that port, 470 00:15:28,340 --> 00:15:30,709 I think you'll get one from us. 471 00:15:30,710 --> 00:15:33,019 The Michigan subnet is one four one 472 00:15:33,020 --> 00:15:34,509 two one two six. 473 00:15:34,510 --> 00:15:36,169 So look for an incoming packet from 474 00:15:36,170 --> 00:15:38,689 there. If you get one, you'll know 475 00:15:38,690 --> 00:15:40,309 it's part of the scan, and 476 00:15:40,310 --> 00:15:42,589 I'd appreciate it if people who are looking 477 00:15:42,590 --> 00:15:43,309 for that 478 00:15:43,310 --> 00:15:45,019 let me know whether they've gotten one or 479 00:15:45,020 --> 00:15:46,429 not, because that'll give us a nice 480 00:15:46,430 --> 00:15:48,499 informal experiment, for some scientific 481 00:15:48,500 --> 00:15:50,779 purpose, 482 00:15:50,780 --> 00:15:51,139 about 483 00:15:51,140 --> 00:15:54,259 reachability and coverage for the scanner. 484 00:15:54,260 --> 00:15:54,619 So I'm 485 00:15:54,620 --> 00:15:56,329 sorry; I was planning to do this live 486 00:15:56,330 --> 00:15:58,279 during the talk. It'll only take 45 487 00:15:58,280 --> 00:15:59,839 minutes, but it involves sending about 488 00:15:59,840 --> 00:16:02,389 100 bytes to 489 00:16:02,390 --> 00:16:03,919 another machine in order to get it 490 00:16:03,920 --> 00:16:06,109 started, which we don't have the ability 491 00:16:06,110 --> 00:16:07,279 to do, apparently. 492 00:16:08,690 --> 00:16:11,179 All right. I am not here, however, 493 00:16:11,180 --> 00:16:12,769 just to push ZMap.
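For anyone who would rather script the check than read tcpdump output, the sketch below inspects a raw IPv4 packet and reports whether it is a TCP SYN (ACK clear) addressed to a given destination port. It assumes the port in question is 0x30C3 (12483 in decimal) and that you already have the packet bytes, without the Ethernet header, from some capture mechanism. `is_syn_probe` is a hypothetical helper, not part of ZMap or tcpdump. A comparable tcpdump filter expression would be `tcp[tcpflags] = tcp-syn and dst port 12483`.

```python
import struct

SCAN_PORT = 0x30C3  # assumed port from the talk; equals 12483


def is_syn_probe(packet: bytes, port: int = SCAN_PORT) -> bool:
    """Return True if `packet` (a raw IPv4 packet, no Ethernet header)
    is a TCP segment with SYN set and ACK clear, sent to `port`."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        return False  # too short, or not IPv4
    ihl = (packet[0] & 0x0F) * 4  # IPv4 header length in bytes
    if ihl < 20 or packet[9] != 6 or len(packet) < ihl + 14:
        return False  # malformed header, not TCP, or truncated TCP header
    dst_port = struct.unpack_from("!H", packet, ihl + 2)[0]
    flags = packet[ihl + 13]  # TCP flags byte
    syn, ack = flags & 0x02, flags & 0x10
    return dst_port == port and bool(syn) and not ack
```

Matching on SYN without ACK separates an incoming scan probe from the SYN-ACK your own machine would send back if something were listening on that port.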
494 00:16:12,770 --> 00:16:14,329 And there are other tools that have been 495 00:16:14,330 --> 00:16:16,369 developed in about the same time frame 496 00:16:16,370 --> 00:16:18,619 that I'd also like to lend my endorsement 497 00:16:18,620 --> 00:16:19,579 to. 498 00:16:19,580 --> 00:16:21,469 In particular, there's an excellent 499 00:16:21,470 --> 00:16:21,769 tool 500 00:16:21,770 --> 00:16:24,259 called Masscan that's 501 00:16:24,260 --> 00:16:25,260 been developed 502 00:16:26,390 --> 00:16:27,710 by Robert Graham, 503 00:16:29,060 --> 00:16:31,399 which is aimed at very similar 504 00:16:31,400 --> 00:16:33,379 purposes. These are parallel efforts. 505 00:16:33,380 --> 00:16:35,509 They're both 506 00:16:35,510 --> 00:16:37,579 actively developed open source projects. 507 00:16:37,580 --> 00:16:39,019 They're both high-performance scanning 508 00:16:39,020 --> 00:16:40,819 tools. You should try both of them and 509 00:16:40,820 --> 00:16:41,239 see which 510 00:16:41,240 --> 00:16:43,369 one better suits your needs. 511 00:16:43,370 --> 00:16:43,669 I don't 512 00:16:43,670 --> 00:16:44,209 think either of 513 00:16:44,210 --> 00:16:45,829 us knew about the other effort when we 514 00:16:45,830 --> 00:16:47,959 were building our tools, but now 515 00:16:47,960 --> 00:16:49,879 you have two very good options to choose 516 00:16:49,880 --> 00:16:50,880 from. 517 00:16:52,700 --> 00:16:54,289 Masscan, 518 00:16:54,290 --> 00:16:56,479 if you are willing to install some 519 00:16:56,480 --> 00:16:57,019 commercial 520 00:16:57,020 --> 00:16:59,149 drivers, can use this 521 00:16:59,150 --> 00:17:01,129 thing called PF_RING 522 00:17:01,130 --> 00:17:03,079 DNA to 523 00:17:03,080 --> 00:17:05,929 drive a 10 gig NIC at line speed. 524 00:17:05,930 --> 00:17:08,059 ZMap is entirely in user space 525 00:17:08,060 --> 00:17:10,309 right now. We can drive, from user space, 526 00:17:10,310 --> 00:17:11,929 a one gig NIC at line speed.
527 00:17:11,930 --> 00:17:12,769 We can drive a 10 528 00:17:12,770 --> 00:17:15,739 gig NIC at some multiple of one gig. 529 00:17:15,740 --> 00:17:17,449 We're not quite up to 10 gig yet, but 530 00:17:17,450 --> 00:17:19,519 we're planning to port over PF_RING 531 00:17:19,520 --> 00:17:21,588 DNA support very soon, so you'll be 532 00:17:21,589 --> 00:17:24,348 able to use your whole very fat pipe. 533 00:17:24,349 --> 00:17:26,390 And we do have a 10 gig uplink waiting. 534 00:17:28,430 --> 00:17:29,430 All right, so 535 00:17:30,530 --> 00:17:32,599 let me talk a little bit about how 536 00:17:32,600 --> 00:17:33,709 high-speed scanning works, 537 00:17:33,710 --> 00:17:35,359 some of the engineering tricks we've used 538 00:17:35,360 --> 00:17:37,339 to do it. I'll give you a little bit of 539 00:17:37,340 --> 00:17:39,589 evidence that this actually works, because 540 00:17:39,590 --> 00:17:40,129 I, too, was 541 00:17:40,130 --> 00:17:42,379 skeptical until we ran 542 00:17:42,380 --> 00:17:44,359 the experiments, and then we'll talk about 543 00:17:44,360 --> 00:17:44,539 some 544 00:17:44,540 --> 00:17:46,669 fun things that we 545 00:17:46,670 --> 00:17:47,670 all can do with it. 546 00:17:49,830 --> 00:17:52,139 So in developing ZMap, 547 00:17:52,140 --> 00:17:53,879 we realized we needed to do a bunch 548 00:17:53,880 --> 00:17:55,829 of things differently from existing 549 00:17:55,830 --> 00:17:58,859 general-purpose scanners like Nmap 550 00:17:58,860 --> 00:17:59,279 in order 551 00:17:59,280 --> 00:18:01,679 to get really high 552 00:18:01,680 --> 00:18:03,180 bandwidth performance out of it.
553 00:18:04,200 --> 00:18:06,149 So one thing we decided to 554 00:18:06,150 --> 00:18:08,729 do was 555 00:18:08,730 --> 00:18:10,859 attempt to eliminate as much state 556 00:18:10,860 --> 00:18:11,009 as 557 00:18:11,010 --> 00:18:13,529 possible from each of the connection 558 00:18:13,530 --> 00:18:15,509 attempts we were trying to 559 00:18:15,510 --> 00:18:16,319 make. 560 00:18:16,320 --> 00:18:17,320 So 561 00:18:18,480 --> 00:18:19,469 the way Nmap 562 00:18:19,470 --> 00:18:21,749 works, it maintains some 563 00:18:21,750 --> 00:18:23,429 state to track each connection it's 564 00:18:23,430 --> 00:18:25,589 trying to open. It has timeouts for each 565 00:18:25,590 --> 00:18:26,909 of those connections. 566 00:18:26,910 --> 00:18:29,099 It uses batching 567 00:18:29,100 --> 00:18:31,439 in order to do rate limiting and 568 00:18:31,440 --> 00:18:34,529 avoid oversaturating the network. 569 00:18:34,530 --> 00:18:36,869 We can't do any of these things really 570 00:18:36,870 --> 00:18:38,969 efficiently, because in order to scan 571 00:18:38,970 --> 00:18:41,069 at even one gigabit per second 572 00:18:41,070 --> 00:18:43,289 line speed, you have to send 573 00:18:43,290 --> 00:18:45,479 about 1.5 million packets per 574 00:18:45,480 --> 00:18:46,480 second. 575 00:18:46,980 --> 00:18:49,049 So managing state for that many 576 00:18:49,050 --> 00:18:49,949 connections 577 00:18:49,950 --> 00:18:51,420 is not something that 578 00:18:52,770 --> 00:18:55,469 we felt was a good idea. 579 00:18:55,470 --> 00:18:57,269 We're able to do it because of the way we 580 00:18:57,270 --> 00:18:59,009 architected ZMap.
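The 1.5-million-packets-per-second figure falls out of Ethernet framing. A minimal arithmetic sketch, assuming each SYN probe travels as a minimum-size 64-byte Ethernet frame (a 54-byte Ethernet/IPv4/TCP SYN without options is padded up to that) plus the 8-byte preamble and 12-byte inter-frame gap that every frame costs on the wire. The per-/24 figure further assumes targets are visited in a uniformly random order over all 2^32 addresses; the function names are illustrative.

```python
# Back-of-the-envelope line-rate arithmetic for a SYN scan.
# Assumption: every probe is a minimum-size Ethernet frame.
MIN_FRAME_BYTES = 64       # minimum Ethernet frame, including the 4-byte FCS
PREAMBLE_SFD_BYTES = 8     # preamble + start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12  # mandatory idle time between frames
ADDRESS_SPACE = 2 ** 32    # every IPv4 address


def packets_per_second(link_bps: float) -> float:
    """Maximum number of minimum-size frames the link can carry per second."""
    bits_per_frame = (MIN_FRAME_BYTES + PREAMBLE_SFD_BYTES + INTERFRAME_GAP_BYTES) * 8
    return link_bps / bits_per_frame


def full_sweep_seconds(link_bps: float) -> float:
    """Time to send one probe to every IPv4 address at line rate."""
    return ADDRESS_SPACE / packets_per_second(link_bps)


def probes_per_block_per_second(link_bps: float, prefix_len: int = 24) -> float:
    """Average probe rate landing in one /prefix_len block under uniformly random target order."""
    block_size = 2 ** (32 - prefix_len)
    return packets_per_second(link_bps) * block_size / ADDRESS_SPACE


if __name__ == "__main__":
    gigabit = 1e9
    print(f"{packets_per_second(gigabit):,.0f} probes/s")
    print(f"{full_sweep_seconds(gigabit) / 60:.1f} minutes per full sweep")
    print(f"{probes_per_block_per_second(gigabit):.3f} probes/s into any one /24")
```

At one gigabit this gives about 1,488,095 probes per second, roughly 48.1 minutes to touch all 2^32 addresses once, and, with random ordering, under 0.09 probes per second arriving at any single /24.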
581 00:18:59,010 --> 00:19:01,019 We have separate processes for sending, 582 00:19:01,020 --> 00:19:02,699 separate threads for sending and 583 00:19:02,700 --> 00:19:05,039 receiving packets. The sending threads, 584 00:19:05,040 --> 00:19:05,639 when sending 585 00:19:05,640 --> 00:19:06,959 in, say, a SYN 586 00:19:06,960 --> 00:19:09,149 scan, are sending SYNs as quickly 587 00:19:09,150 --> 00:19:10,949 as they possibly can. 588 00:19:10,950 --> 00:19:12,209 They're not communicating with the 589 00:19:12,210 --> 00:19:13,109 receiving threads. 590 00:19:13,110 --> 00:19:14,549 The receiving threads are looking at 591 00:19:14,550 --> 00:19:16,259 what's coming in 592 00:19:16,260 --> 00:19:16,919 and 593 00:19:16,920 --> 00:19:17,939 processing it 594 00:19:17,940 --> 00:19:19,589 asynchronously from the sending. 595 00:19:21,360 --> 00:19:23,519 Another thing we did differently 596 00:19:23,520 --> 00:19:24,900 was the way that 597 00:19:26,280 --> 00:19:26,579 we 598 00:19:26,580 --> 00:19:29,099 track which hosts are responding. 599 00:19:29,100 --> 00:19:31,109 Let me get to that in one more slide. 600 00:19:33,360 --> 00:19:35,429 In order to avoid flooding the network, 601 00:19:35,430 --> 00:19:37,259 and this is a really important thing, 602 00:19:37,260 --> 00:19:39,329 too, because if 603 00:19:39,330 --> 00:19:41,549 we overwhelm distant networks, 604 00:19:41,550 --> 00:19:41,699 they're 605 00:19:41,700 --> 00:19:43,919 just going to drop our probes and 606 00:19:43,920 --> 00:19:46,589 we're not going to get accurate data, 607 00:19:46,590 --> 00:19:48,749 one past 608 00:19:48,750 --> 00:19:50,789 approach to this has been just to 609 00:19:50,790 --> 00:19:51,329 severely 610 00:19:51,330 --> 00:19:52,769 limit the rate at which you scan.
611 00:19:52,770 --> 00:19:53,519 But the thing 612 00:19:53,520 --> 00:19:54,509 that makes more sense 613 00:19:54,510 --> 00:19:56,699 is to use randomization to 614 00:19:56,700 --> 00:19:58,919 basically get statistical 615 00:19:58,920 --> 00:20:00,629 guarantees that you're unlikely to 616 00:20:00,630 --> 00:20:02,459 overwhelm any distant network with 617 00:20:02,460 --> 00:20:04,229 traffic, even if you're scanning as fast 618 00:20:04,230 --> 00:20:05,969 as your upstream will allow. 619 00:20:05,970 --> 00:20:07,799 And I'll talk about how we do that, too. 620 00:20:09,780 --> 00:20:12,089 Finally, a lot of the credit 621 00:20:12,090 --> 00:20:14,489 for the ability to do scanning 622 00:20:14,490 --> 00:20:16,949 at gigabit and 10-gigabit Ethernet 623 00:20:16,950 --> 00:20:19,049 line speed ought to go to 624 00:20:19,050 --> 00:20:20,999 the operating systems and to the 625 00:20:21,000 --> 00:20:22,769 underlying architecture, because there 626 00:20:22,770 --> 00:20:25,079 have been vast improvements to 627 00:20:25,080 --> 00:20:27,209 things like the Linux kernel, 628 00:20:27,210 --> 00:20:28,949 to the way 629 00:20:28,950 --> 00:20:31,049 processor-to-network-card 630 00:20:31,050 --> 00:20:32,639 communication happens in Intel 631 00:20:32,640 --> 00:20:34,079 architecture, 632 00:20:34,080 --> 00:20:36,239 that underlie all of this. 633 00:20:36,240 --> 00:20:38,279 But still, there are inefficiencies there, 634 00:20:38,280 --> 00:20:40,769 things we don't need in the kernel, 635 00:20:40,770 --> 00:20:42,000 like the TCP stack. 636 00:20:43,110 --> 00:20:45,239 So one thing we do is 637 00:20:45,240 --> 00:20:47,229 write out Ethernet frames directly. 638 00:20:47,230 --> 00:20:49,559 We're skipping everything that the kernel 639 00:20:49,560 --> 00:20:51,869 is trying to do to help us.
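Writing Ethernet frames directly lets a scanner build one probe template up front and then overwrite only the per-target fields. A minimal Python sketch, assuming an Ethernet/IPv4/TCP SYN with no IP or TCP options; the function names, MAC addresses, and documentation-range IPs are illustrative placeholders, and actually putting the bytes on the wire (for example through a Linux `AF_PACKET` raw socket, which needs root) is left out.

```python
import socket
import struct

ETH_LEN, IP_LEN, TCP_LEN = 14, 20, 20
IP_OFF, TCP_OFF = ETH_LEN, ETH_LEN + IP_LEN  # byte offsets of the IP and TCP headers


def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement checksum over 16-bit words."""
    if len(data) % 2:
        data = bytes(data) + b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_template(src_mac: bytes, dst_mac: bytes, src_ip: str) -> bytearray:
    """Build a SYN frame once; per-probe fields start out zeroed."""
    eth = dst_mac + src_mac + struct.pack("!H", 0x0800)  # EtherType: IPv4
    ip = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, IP_LEN + TCP_LEN,              # version/IHL, TOS, total length
        0, 0,                                   # identification, flags/fragment offset
        64, socket.IPPROTO_TCP, 0,              # TTL, protocol, checksum (set per probe)
        socket.inet_aton(src_ip), b"\x00" * 4,  # source, destination (set per probe)
    )
    tcp = struct.pack(
        "!HHIIBBHHH",
        0, 0, 0, 0,    # source port, destination port, seq, ack (set per probe)
        5 << 4, 0x02,  # data offset = 5 words, flags = SYN
        65535, 0, 0,   # window, checksum (set per probe), urgent pointer
    )
    return bytearray(eth + ip + tcp)


def patch_probe(frame: bytearray, dst_ip: str, src_port: int, dst_port: int, seq: int) -> bytearray:
    """Overwrite only the per-target fields in place, then refresh both checksums."""
    frame[IP_OFF + 16:IP_OFF + 20] = socket.inet_aton(dst_ip)
    frame[IP_OFF + 10:IP_OFF + 12] = b"\x00\x00"
    frame[IP_OFF + 10:IP_OFF + 12] = struct.pack("!H", internet_checksum(frame[IP_OFF:TCP_OFF]))

    struct.pack_into("!HHI", frame, TCP_OFF, src_port, dst_port, seq)
    frame[TCP_OFF + 16:TCP_OFF + 18] = b"\x00\x00"
    # TCP checksum covers a pseudo-header: src IP, dst IP, zero, protocol, TCP length.
    pseudo = frame[IP_OFF + 12:IP_OFF + 20] + struct.pack("!BBH", 0, socket.IPPROTO_TCP, TCP_LEN)
    frame[TCP_OFF + 16:TCP_OFF + 18] = struct.pack("!H", internet_checksum(pseudo + frame[TCP_OFF:]))
    return frame
```

Only 12 bytes of payload fields and two checksums change per probe; everything else in the 54-byte frame is reused as-is.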
640 00:20:51,870 --> 00:20:53,249 This lets us just change the 641 00:20:53,250 --> 00:20:54,689 parts of the packet that need to be 642 00:20:54,690 --> 00:20:56,519 changed for each additional probe we're 643 00:20:56,520 --> 00:20:57,809 sending. 644 00:20:57,810 --> 00:20:59,579 It helps us get much, much better 645 00:20:59,580 --> 00:21:00,580 efficiency. 646 00:21:03,170 --> 00:21:05,089 So, a couple of the more interesting 647 00:21:05,090 --> 00:21:06,169 challenges we faced. 648 00:21:06,170 --> 00:21:08,119 One is how to do addressing. 649 00:21:08,120 --> 00:21:10,189 So the challenge here is 650 00:21:10,190 --> 00:21:12,949 that we want to be able to address, 651 00:21:12,950 --> 00:21:14,299 that is, enumerate, the 652 00:21:14,300 --> 00:21:16,639 IPv4 address space 653 00:21:16,640 --> 00:21:17,899 in a random order 654 00:21:17,900 --> 00:21:20,299 every time we scan, so that 655 00:21:20,300 --> 00:21:21,979 we're not overwhelming distant networks, 656 00:21:21,980 --> 00:21:23,689 so that if we want to scan less than the 657 00:21:23,690 --> 00:21:25,399 full internet, we can get a statistically 658 00:21:25,400 --> 00:21:26,990 representative sample, and so forth. 659 00:21:28,070 --> 00:21:29,209 The way we chose to 660 00:21:29,210 --> 00:21:30,079 do this 661 00:21:30,080 --> 00:21:31,069 is... 662 00:21:31,070 --> 00:21:33,259 well, first, we want to do this without 663 00:21:33,260 --> 00:21:34,429 storing a lot of state. 664 00:21:34,430 --> 00:21:36,109 We don't want to have to make up this 665 00:21:36,110 --> 00:21:38,689 list of all the addresses in advance 666 00:21:38,690 --> 00:21:39,019 and 667 00:21:39,020 --> 00:21:40,020 then 668 00:21:40,700 --> 00:21:42,409 use it, because that's going to 669 00:21:42,410 --> 00:21:44,689 be four billion addresses. 670 00:21:44,690 --> 00:21:46,459 This is going to be a multi-gigabyte 671 00:21:46,460 --> 00:21:47,749 list.
672 00:21:47,750 --> 00:21:49,759 So what we do instead is use this 673 00:21:49,760 --> 00:21:51,169 trick from 674 00:21:51,170 --> 00:21:52,969 the world of math. 675 00:21:52,970 --> 00:21:55,369 Now, this is a trick that the crypto people 676 00:21:55,370 --> 00:21:57,049 would think is trivial, 677 00:21:57,050 --> 00:21:57,319 where 678 00:21:57,320 --> 00:21:59,509 we iterate over a multiplicative group 679 00:21:59,510 --> 00:22:01,309 of integers modulo p for 680 00:22:01,310 --> 00:22:03,379 some prime p that's about the size 681 00:22:03,380 --> 00:22:05,239 of the address space. 682 00:22:05,240 --> 00:22:07,699 So what this allows us to do is, 683 00:22:07,700 --> 00:22:10,009 with just a negligible, 684 00:22:10,010 --> 00:22:12,349 constant amount of space, 685 00:22:12,350 --> 00:22:14,569 that is, the primitive root 686 00:22:14,570 --> 00:22:15,649 that generates 687 00:22:15,650 --> 00:22:17,689 this multiplicative group, our current 688 00:22:17,690 --> 00:22:19,759 location within the group, and the 689 00:22:19,760 --> 00:22:21,169 starting address, 690 00:22:21,170 --> 00:22:23,449 we can enumerate the IP address 691 00:22:23,450 --> 00:22:24,680 space in random order. 692 00:22:25,820 --> 00:22:28,189 And if we want a different permutation, 693 00:22:28,190 --> 00:22:29,749 a different ordering, 694 00:22:29,750 --> 00:22:32,209 we can just vary 695 00:22:32,210 --> 00:22:33,919 the generator that we're using 696 00:22:35,300 --> 00:22:37,309 to define this. 697 00:22:37,310 --> 00:22:38,689 So with just a very, very 698 00:22:38,690 --> 00:22:40,819 small amount of state, we 699 00:22:40,820 --> 00:22:42,949 can get a nice random ordering of the IP 700 00:22:42,950 --> 00:22:43,950 address space.
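The cyclic-group trick is easy to try at toy scale. A minimal sketch, assuming p = 257 and a 256-address space so the output can be checked by eye; scanning IPv4 would need a prime just above 2^32 and a primitive root of it, which this sketch does not choose. Repeatedly multiplying by a primitive root g modulo p walks through every element of {1, ..., p - 1} exactly once before returning to the starting element; elements larger than the address-space size are skipped, and element x stands for address x - 1. The resulting order is pseudorandom-looking, not cryptographically random, and the function names are illustrative rather than ZMap's.

```python
from typing import Iterator, Set


def prime_factors(n: int) -> Set[int]:
    """Distinct prime factors of n by trial division (fine at toy sizes)."""
    factors: Set[int] = set()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors


def is_primitive_root(g: int, p: int) -> bool:
    """For prime p, g generates (Z/pZ)* iff g^((p-1)/q) != 1 mod p for every prime q dividing p-1."""
    return all(pow(g, (p - 1) // q, p) != 1 for q in prime_factors(p - 1))


def permuted_addresses(n: int, p: int, g: int, start: int) -> Iterator[int]:
    """Yield each of 0..n-1 exactly once, in an order fixed by (p, g, start).

    State carried between steps: p, g, start and the current group element.
    Assumes p is prime; checks the other preconditions.
    """
    if n > p - 1 or not 1 <= start <= p - 1:
        raise ValueError("need n <= p - 1 and 1 <= start <= p - 1")
    if not is_primitive_root(g, p):
        raise ValueError(f"{g} is not a primitive root mod {p}")
    x = start
    while True:
        if x <= n:        # group elements above n fall outside the space: skip them
            yield x - 1   # map element 1..n onto address 0..n-1
        x = (x * g) % p   # next element of the cyclic group
        if x == start:    # back at the start: all p - 1 elements have been visited
            return
```

Both 3 and 5 are primitive roots of 257, so `permuted_addresses(256, 257, 3, 1)` and `permuted_addresses(256, 257, 5, 1)` each cover all 256 addresses, in different orders, while carrying only a few integers of state.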
701 00:22:45,110 --> 00:22:46,039 Another thing that we 702 00:22:46,040 --> 00:22:48,109 do is 703 00:22:48,110 --> 00:22:50,179 validate responses 704 00:22:50,180 --> 00:22:51,199 in an almost totally 705 00:22:51,200 --> 00:22:51,709 stateless 706 00:22:51,710 --> 00:22:52,999 way, 707 00:22:53,000 --> 00:22:55,219 by taking 708 00:22:55,220 --> 00:22:57,349 some secret and basically 709 00:22:57,350 --> 00:22:59,569 encoding it in fields of the outgoing 710 00:22:59,570 --> 00:23:01,729 probe, such that we can 711 00:23:01,730 --> 00:23:03,829 recognize whether a packet coming back 712 00:23:03,830 --> 00:23:06,229 in is a result of our scanning, 713 00:23:06,230 --> 00:23:07,999 or whether it's background traffic, 714 00:23:08,000 --> 00:23:10,129 or just someone trying 715 00:23:10,130 --> 00:23:11,899 to attack us. 716 00:23:11,900 --> 00:23:13,639 Whatever it is, we need to differentiate 717 00:23:13,640 --> 00:23:14,449 responses that 718 00:23:14,450 --> 00:23:15,619 are a 719 00:23:15,620 --> 00:23:17,479 result of our probes from any other 720 00:23:17,480 --> 00:23:19,699 traffic we might be receiving, including 721 00:23:19,700 --> 00:23:21,439 the noise that happens 722 00:23:21,440 --> 00:23:23,749 after we run a big internet-wide scan. 723 00:23:23,750 --> 00:23:25,669 Some people are going to scan us back. 724 00:23:25,670 --> 00:23:27,649 Some people are going to be trying to 725 00:23:27,650 --> 00:23:28,759 DoS us. 726 00:23:28,760 --> 00:23:30,319 Some people will be connecting just to 727 00:23:30,320 --> 00:23:32,479 see why they got a probe packet, and 728 00:23:32,480 --> 00:23:33,529 so forth. 729 00:23:33,530 --> 00:23:35,299 So the technique we use for this is very 730 00:23:35,300 --> 00:23:35,599 much 731 00:23:35,600 --> 00:23:38,119 like the idea behind SYN cookies.
732 00:23:38,120 --> 00:23:40,219 So the idea is, if we were 733 00:23:40,220 --> 00:23:40,639 to look 734 00:23:40,640 --> 00:23:42,919 into the headers for, say, a SYN 735 00:23:42,920 --> 00:23:44,419 probe here, 736 00:23:44,420 --> 00:23:44,629 there 737 00:23:44,630 --> 00:23:45,049 are a bunch of 738 00:23:45,050 --> 00:23:47,119 fields that we can use to encode 739 00:23:49,280 --> 00:23:51,139 distinguishable data. 740 00:23:51,140 --> 00:23:53,749 So based on some per-scan 741 00:23:53,750 --> 00:23:54,259 random 742 00:23:54,260 --> 00:23:55,669 secret, 743 00:23:55,670 --> 00:23:55,999 we 744 00:23:56,000 --> 00:23:58,069 encode bits into the 745 00:23:58,070 --> 00:24:00,139 sending IP address, if we have a range 746 00:24:00,140 --> 00:24:01,849 of addresses to send from, into the 747 00:24:01,850 --> 00:24:04,479 sending port, and into 748 00:24:04,480 --> 00:24:06,289 the initial sequence number. 749 00:24:06,290 --> 00:24:08,899 And so when a response comes back, 750 00:24:08,900 --> 00:24:11,029 the receiving IP address, receiving port, 751 00:24:11,030 --> 00:24:13,159 and acknowledgment number are going to 752 00:24:13,160 --> 00:24:13,939 reflect 753 00:24:13,940 --> 00:24:16,489 that information we put into those fields, 754 00:24:16,490 --> 00:24:18,829 so we can then look at them 755 00:24:18,830 --> 00:24:20,659 and learn with high statistical 756 00:24:20,660 --> 00:24:22,279 certainty 757 00:24:22,280 --> 00:24:24,079 whether this 758 00:24:24,080 --> 00:24:26,179 response was coming back as a result 759 00:24:26,180 --> 00:24:27,469 of a probe we sent or not.
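The SYN-cookie-style check can be sketched as a keyed hash of the target, split across the source port and the initial sequence number. This is an illustrative scheme only: HMAC-SHA256, the 32768-65535 source-port range, and all function names are assumptions of this sketch, not ZMap's actual encoding. The check relies on standard TCP behavior: a reply to a SYN comes from the target, is addressed to our source port, and acknowledges our initial sequence number plus one.

```python
import hashlib
import hmac
import os
import socket
import struct
from typing import Tuple

PORT_BASE, PORT_SPAN = 32768, 32768  # assumed source-port range: 32768-65535


def new_scan_secret() -> bytes:
    """A fresh random key per scan; it is the only validation state kept."""
    return os.urandom(16)


def _tag(secret: bytes, target_ip: str, target_port: int) -> bytes:
    """Keyed hash of the probe's destination."""
    msg = socket.inet_aton(target_ip) + struct.pack("!H", target_port)
    return hmac.new(secret, msg, hashlib.sha256).digest()


def probe_fields(secret: bytes, target_ip: str, target_port: int) -> Tuple[int, int]:
    """Derive (source port, initial sequence number) for a SYN to target_ip:target_port."""
    tag = _tag(secret, target_ip, target_port)
    src_port = PORT_BASE + int.from_bytes(tag[0:2], "big") % PORT_SPAN
    isn = int.from_bytes(tag[2:6], "big")
    return src_port, isn


def is_our_response(secret: bytes, reply_src_ip: str, reply_src_port: int,
                    reply_dst_port: int, reply_ack: int) -> bool:
    """Recompute the expected fields from the reply's own headers; no per-probe table needed."""
    src_port, isn = probe_fields(secret, reply_src_ip, reply_src_port)
    return reply_dst_port == src_port and reply_ack == (isn + 1) % 2 ** 32
```

Because the expected port and acknowledgment number are recomputed from the reply alone, a receive thread can discard backscatter and unrelated traffic without consulting anything the send thread recorded.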
760 00:24:30,780 --> 00:24:33,149 All right, finally, the overall 761 00:24:33,150 --> 00:24:35,219 architecture of the tool is 762 00:24:35,220 --> 00:24:36,569 optimized for 763 00:24:38,040 --> 00:24:38,579 extensibility, 764 00:24:38,580 --> 00:24:40,769 for the ability to build new 765 00:24:40,770 --> 00:24:42,629 things on top of it as easily as 766 00:24:42,630 --> 00:24:43,919 possible. 767 00:24:43,920 --> 00:24:45,659 So we tried to do the hard work in the 768 00:24:45,660 --> 00:24:46,919 main body of the tool. 769 00:24:46,920 --> 00:24:48,389 This is all written in C. 770 00:24:49,710 --> 00:24:51,929 We provide some probe modules 771 00:24:51,930 --> 00:24:54,539 and response interpretation modules 772 00:24:54,540 --> 00:24:56,729 that you can take and modify and build 773 00:24:56,730 --> 00:24:58,799 on to do all sorts 774 00:24:58,800 --> 00:25:01,139 of different scans. So we ship 775 00:25:01,140 --> 00:25:03,029 with modules for doing SYN scans, we 776 00:25:03,030 --> 00:25:04,259 ship with modules for doing 777 00:25:04,260 --> 00:25:06,599 certain kinds of UDP probing. 778 00:25:06,600 --> 00:25:08,489 If you want to extend this to do other 779 00:25:08,490 --> 00:25:10,619 kinds of probing, it's a simple matter of 780 00:25:10,620 --> 00:25:12,659 modifying these modules. 781 00:25:12,660 --> 00:25:14,849 Also, because we wanted 782 00:25:14,850 --> 00:25:16,919 to use this for research, we tried to 783 00:25:16,920 --> 00:25:17,849 make the output 784 00:25:17,850 --> 00:25:20,579 system very flexible, so that we could 785 00:25:20,580 --> 00:25:22,409 take data coming back from the scan and 786 00:25:22,410 --> 00:25:23,819 throw it into a database, 787 00:25:23,820 --> 00:25:24,989 throw it into Redis. 788 00:25:26,250 --> 00:25:28,049 We have a variety of output handling 789 00:25:28,050 --> 00:25:29,759 modules that you can use for different 790 00:25:29,760 --> 00:25:30,869 purposes.
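The division of labor between a fixed core and swappable probe and output modules can be sketched as two small interfaces. This is a hypothetical Python rendering of the idea, not ZMap's C module API: the class and method names, the toy echo probe, and the synchronous `transport` callable are all invented for illustration, whereas the real core sends and receives asynchronously.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, str]


class ProbeModule(ABC):
    """Builds probes and interprets replies for one kind of scan (hypothetical interface)."""

    @abstractmethod
    def make_probe(self, target: str) -> bytes: ...

    @abstractmethod
    def classify(self, target: str, reply: Optional[bytes]) -> Optional[Record]:
        """Turn a raw reply into a result record, or None if it is not interesting."""


class OutputModule(ABC):
    """Receives result records; an implementation could write CSV, a database, Redis, etc."""

    @abstractmethod
    def emit(self, record: Record) -> None: ...


class ListOutput(OutputModule):
    """Simplest possible output module: keep records in memory."""

    def __init__(self) -> None:
        self.records: List[Record] = []

    def emit(self, record: Record) -> None:
        self.records.append(record)


class EchoProbe(ProbeModule):
    """Toy probe: a target counts as 'open' if its reply echoes the probe payload."""

    def make_probe(self, target: str) -> bytes:
        return f"hello {target}".encode()

    def classify(self, target: str, reply: Optional[bytes]) -> Optional[Record]:
        if reply == self.make_probe(target):
            return {"target": target, "status": "open"}
        return None


def scan(targets: Iterable[str], probe: ProbeModule, output: OutputModule,
         transport: Callable[[str, bytes], Optional[bytes]]) -> None:
    """Core loop: knows nothing about packet formats or where results are stored."""
    for target in targets:
        reply = transport(target, probe.make_probe(target))
        record = probe.classify(target, reply)
        if record is not None:
            output.emit(record)
```

Adding a new kind of scan or a new sink means writing one subclass; the core loop stays untouched.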
791 00:25:30,870 --> 00:25:32,939 We also have, because 792 00:25:32,940 --> 00:25:33,779 initially what 793 00:25:33,780 --> 00:25:36,179 this is providing is basically just 794 00:25:36,180 --> 00:25:38,099 a port scan... 795 00:25:38,100 --> 00:25:39,659 You probably want to do more. 796 00:25:39,660 --> 00:25:41,459 You probably want to find hosts that 797 00:25:41,460 --> 00:25:43,919 have that port open and make a real TCP 798 00:25:43,920 --> 00:25:45,239 connection to them. 799 00:25:45,240 --> 00:25:47,009 So to allow you to do that, there are a 800 00:25:47,010 --> 00:25:48,899 couple of ways to plug this together. 801 00:25:48,900 --> 00:25:50,609 We ship the tool with a 802 00:25:50,610 --> 00:25:52,409 banner-grab program that can very 803 00:25:52,410 --> 00:25:54,869 efficiently make many thousands of 804 00:25:54,870 --> 00:25:57,029 connections in parallel to 805 00:25:57,030 --> 00:25:59,159 download, say, the root of 806 00:25:59,160 --> 00:26:01,439 a website or the banner from 807 00:26:01,440 --> 00:26:02,639 a Telnet server 808 00:26:02,640 --> 00:26:04,649 and put that in a database for you. 809 00:26:04,650 --> 00:26:06,929 Or, if that's going to be too much work, 810 00:26:06,930 --> 00:26:08,279 to use a separate program 811 00:26:08,280 --> 00:26:10,289 for that, you can integrate that into 812 00:26:10,290 --> 00:26:12,359 your output handler in ZMap. And we 813 00:26:12,360 --> 00:26:13,859 do provide a kernel module 814 00:26:13,860 --> 00:26:15,269 if you want to do this 815 00:26:15,270 --> 00:26:16,559 with 816 00:26:16,560 --> 00:26:18,629 the TCP connection you've 817 00:26:18,630 --> 00:26:21,449 half-opened with that initial SYN scan. 818 00:26:21,450 --> 00:26:23,290 So we have a TCP 819 00:26:24,330 --> 00:26:25,919 kernel module that will complete the 820 00:26:25,920 --> 00:26:28,739 handshake and give you back a socket.
821 00:26:28,740 --> 00:26:30,419 That's not something that Linux can do 822 00:26:30,420 --> 00:26:31,420 out of the box. 823 00:26:32,760 --> 00:26:34,949 All right, so really quickly: if 824 00:26:34,950 --> 00:26:36,629 this is how it works, does it 825 00:26:36,630 --> 00:26:38,009 actually work? 826 00:26:38,010 --> 00:26:39,929 So you might think that if you're doing a 827 00:26:39,930 --> 00:26:42,119 scan even at a gigabit, packets 828 00:26:42,120 --> 00:26:42,899 are going to get dropped. 829 00:26:42,900 --> 00:26:45,029 And 830 00:26:45,030 --> 00:26:46,679 if you're doing what I'm proposing, which 831 00:26:46,680 --> 00:26:47,549 is just sending one 832 00:26:47,550 --> 00:26:49,139 SYN packet to everyone and seeing 833 00:26:49,140 --> 00:26:50,249 what comes back, 834 00:26:50,250 --> 00:26:51,929 there's no way you're going to get very 835 00:26:51,930 --> 00:26:53,549 good coverage. Surely enough will be 836 00:26:53,550 --> 00:26:55,349 dropped that you have to send multiple 837 00:26:55,350 --> 00:26:55,799 SYNs, 838 00:26:55,800 --> 00:26:57,180 you have to retry, right? 839 00:26:58,720 --> 00:27:00,849 Well, the data says otherwise. 840 00:27:00,850 --> 00:27:03,189 So this is one experiment we ran 841 00:27:03,190 --> 00:27:05,049 where we tried scanning. 842 00:27:05,050 --> 00:27:06,489 This is, I think, on port 443, 843 00:27:06,490 --> 00:27:08,109 multiple trials 844 00:27:08,110 --> 00:27:10,719 of scanning, 845 00:27:10,720 --> 00:27:12,190 I think with 846 00:27:13,240 --> 00:27:15,129 one percent samples of the IP address 847 00:27:15,130 --> 00:27:16,239 space, 848 00:27:16,240 --> 00:27:18,399 at different rates 849 00:27:18,400 --> 00:27:19,179 of 850 00:27:19,180 --> 00:27:20,829 outgoing probe packets. 851 00:27:20,830 --> 00:27:22,389 And what we're really looking for here 852 00:27:22,390 --> 00:27:24,549 is how fast is too fast.
853 00:27:24,550 --> 00:27:27,219 We're looking for the point when we start 854 00:27:27,220 --> 00:27:29,199 seeing fewer responsive hosts 855 00:27:29,200 --> 00:27:31,449 because we're overwhelming the network. 856 00:27:31,450 --> 00:27:33,819 Well, in the experiments we did, 857 00:27:33,820 --> 00:27:36,249 and this is with a gigabit uplink, 858 00:27:36,250 --> 00:27:37,239 up to a gigabit, 859 00:27:37,240 --> 00:27:39,099 we were not able on our network to 860 00:27:39,100 --> 00:27:41,559 observe any meaningful drop-off. 861 00:27:41,560 --> 00:27:43,029 There's some 862 00:27:43,030 --> 00:27:43,929 variation, some 863 00:27:43,930 --> 00:27:45,879 fluctuation based on other things I'll 864 00:27:45,880 --> 00:27:47,289 talk about in a second. 865 00:27:47,290 --> 00:27:48,999 But basically, we cannot find the 866 00:27:49,000 --> 00:27:51,099 place where we start seeing 867 00:27:51,100 --> 00:27:52,509 drop-off because we're overwhelming the 868 00:27:52,510 --> 00:27:53,289 network. 869 00:27:53,290 --> 00:27:55,479 So based on this, 870 00:27:55,480 --> 00:27:55,689 we 871 00:27:55,690 --> 00:27:58,269 conclude that gigabit scanning, 872 00:27:59,680 --> 00:28:00,159 at least 873 00:28:00,160 --> 00:28:01,749 up to a gigabit, is not enough to 874 00:28:01,750 --> 00:28:02,649 overwhelm 875 00:28:02,650 --> 00:28:03,939 upstream. 876 00:28:03,940 --> 00:28:05,409 However, your mileage may vary. 877 00:28:05,410 --> 00:28:07,629 This depends a lot on what the network 878 00:28:07,630 --> 00:28:08,169 topology 879 00:28:08,170 --> 00:28:09,369 around you looks like. 880 00:28:09,370 --> 00:28:10,959 We happen to have a very, very 881 00:28:10,960 --> 00:28:13,209 well-provisioned uplink 882 00:28:13,210 --> 00:28:15,609 that connects in Chicago 883 00:28:15,610 --> 00:28:17,709 to several different 884 00:28:17,710 --> 00:28:19,959 major backbone providers.
885 00:28:19,960 --> 00:28:21,849 Your mileage may vary if your 886 00:28:21,850 --> 00:28:24,009 network upstream is not 887 00:28:24,010 --> 00:28:25,010 well provisioned. 888 00:28:25,990 --> 00:28:27,849 Part of the reason for that variation you 889 00:28:27,850 --> 00:28:29,919 see, though, is that the number 890 00:28:29,920 --> 00:28:30,579 of devices 891 00:28:30,580 --> 00:28:32,409 that are responsive on the internet 892 00:28:32,410 --> 00:28:33,279 varies quite 893 00:28:33,280 --> 00:28:35,349 a bit over the course of a day, 894 00:28:35,350 --> 00:28:37,509 over the course of a week, just 895 00:28:37,510 --> 00:28:39,279 as a result of people, 896 00:28:39,280 --> 00:28:41,349 in aggregate, turning things on and off. 897 00:28:41,350 --> 00:28:42,589 That's our main theory. 898 00:28:42,590 --> 00:28:44,829 You turn off your phone 899 00:28:44,830 --> 00:28:46,389 or your cable modem at night. 900 00:28:46,390 --> 00:28:48,609 We see a variation of about plus or minus 901 00:28:48,610 --> 00:28:50,859 three percent on a typical day. 902 00:28:50,860 --> 00:28:53,109 So this is based on doing 903 00:28:53,110 --> 00:28:55,569 continuous scans of a one percent 904 00:28:55,570 --> 00:28:57,789 sample of hosts on port 443 905 00:28:57,790 --> 00:28:59,829 over a 24-hour period. 906 00:28:59,830 --> 00:29:02,079 And you can see that very lovely diurnal 907 00:29:02,080 --> 00:29:03,080 pattern. 908 00:29:04,520 --> 00:29:05,810 Another question is, 909 00:29:07,160 --> 00:29:09,649 even if we're not overwhelming 910 00:29:09,650 --> 00:29:10,759 the network, 911 00:29:10,760 --> 00:29:12,499 are we getting back responses 912 00:29:12,500 --> 00:29:14,779 from everything that 913 00:29:14,780 --> 00:29:16,309 should be responding, even though we're 914 00:29:16,310 --> 00:29:18,829 not sending more than one SYN 915 00:29:18,830 --> 00:29:19,729 packet?
916 00:29:19,730 --> 00:29:22,129 So one way we can see this is 917 00:29:22,130 --> 00:29:23,130 by 918 00:29:24,140 --> 00:29:25,279 trying to send multiple 919 00:29:25,280 --> 00:29:27,709 SYN packets to each address. 920 00:29:27,710 --> 00:29:30,049 And what we'd expect to see is 921 00:29:30,050 --> 00:29:32,359 that as we send more and more, eventually 922 00:29:32,360 --> 00:29:33,349 we reach 923 00:29:33,350 --> 00:29:35,479 some plateau, some 924 00:29:35,480 --> 00:29:38,359 number of SYN packets at which 925 00:29:38,360 --> 00:29:40,759 only a vanishingly small number 926 00:29:40,760 --> 00:29:40,879 of 927 00:29:40,880 --> 00:29:42,589 hosts will have 928 00:29:42,590 --> 00:29:43,219 lost them 929 00:29:43,220 --> 00:29:45,379 all. So basically, we're trying to reach 930 00:29:45,380 --> 00:29:48,289 ground truth here, some denominator 931 00:29:48,290 --> 00:29:49,189 against which to 932 00:29:49,190 --> 00:29:51,139 compare the numerator of how many results 933 00:29:51,140 --> 00:29:52,489 we've gotten. 934 00:29:52,490 --> 00:29:54,559 Well, what we find, indeed, 935 00:29:54,560 --> 00:29:56,749 is that there is a plateau, 936 00:29:56,750 --> 00:29:59,029 and really it starts to level off 937 00:29:59,030 --> 00:30:01,879 at about eight SYN probes 938 00:30:01,880 --> 00:30:04,039 per host. But the difference for fewer 939 00:30:04,040 --> 00:30:05,809 than that is actually quite small. 940 00:30:05,810 --> 00:30:08,179 You can see the scale on the left. 941 00:30:08,180 --> 00:30:11,329 So based on this, if we use this point, 942 00:30:11,330 --> 00:30:13,609 this plateau, as a kind of ground truth, 943 00:30:13,610 --> 00:30:13,819 this 944 00:30:13,820 --> 00:30:15,829 is where we get this number: about 945 00:30:15,830 --> 00:30:18,049 97 to 98 percent of 946 00:30:18,050 --> 00:30:20,329 responsive hosts are going to be reached 947 00:30:20,330 --> 00:30:21,800 with just one SYN packet.
948 00:30:24,100 --> 00:30:26,439 All right, so it's a little bit unfair 949 00:30:26,440 --> 00:30:28,659 to compare scanners like ZMap to 950 00:30:28,660 --> 00:30:30,159 Nmap, which was not designed 951 00:30:30,160 --> 00:30:32,589 for internet-wide scanning. 952 00:30:32,590 --> 00:30:34,659 But it's also a natural question to ask. 953 00:30:34,660 --> 00:30:36,759 So just really quickly, what do we get 954 00:30:36,760 --> 00:30:39,189 if we do this comparison? 955 00:30:39,190 --> 00:30:41,739 So let me give you some estimated times 956 00:30:41,740 --> 00:30:43,869 for using ZMap and Nmap to 957 00:30:43,870 --> 00:30:45,609 scan a random sample 958 00:30:45,610 --> 00:30:47,259 of a million hosts, and this is all 959 00:30:47,260 --> 00:30:50,079 scanning the same sample. 960 00:30:50,080 --> 00:30:52,599 So with Nmap, this is using 961 00:30:52,600 --> 00:30:55,359 Nmap's most aggressive default, 962 00:30:55,360 --> 00:30:57,819 Insane is what it's called, 963 00:30:57,820 --> 00:30:58,179 plus a 964 00:30:58,180 --> 00:31:00,069 few flags that make it a bit faster than 965 00:31:00,070 --> 00:31:03,009 that. This is basically the configuration 966 00:31:03,010 --> 00:31:05,169 we used in 967 00:31:05,170 --> 00:31:06,730 the weak keys work I mentioned. 968 00:31:07,900 --> 00:31:10,119 So with this configuration, in Nmap's 969 00:31:10,120 --> 00:31:11,979 default 970 00:31:11,980 --> 00:31:14,139 configuration here, what I'm talking 971 00:31:14,140 --> 00:31:16,359 about, it 972 00:31:16,360 --> 00:31:18,759 tries to send up to two SYN packets: 973 00:31:18,760 --> 00:31:20,499 it waits for a timeout, then it 974 00:31:20,500 --> 00:31:21,500 sends another one. 975 00:31:22,630 --> 00:31:25,359 It will take about 116 976 00:31:25,360 --> 00:31:27,429 days to probe the 977 00:31:27,430 --> 00:31:29,529 entire internet using 978 00:31:29,530 --> 00:31:31,089 that configuration.
979 00:31:31,090 --> 00:31:33,279 If we reduce Nmap to just sending 980 00:31:33,280 --> 00:31:35,769 one probe, to just waiting one timeout 981 00:31:35,770 --> 00:31:37,149 interval, it 982 00:31:37,150 --> 00:31:39,789 will only take about 62 days. 983 00:31:39,790 --> 00:31:41,079 ZMap in its default 984 00:31:41,080 --> 00:31:43,299 configuration in this experimental 985 00:31:43,300 --> 00:31:45,519 setup would have taken about an hour 986 00:31:45,520 --> 00:31:46,569 and 10 minutes. 987 00:31:46,570 --> 00:31:48,639 We normally do faster, but 988 00:31:48,640 --> 00:31:50,469 these tests were done with a NIC 989 00:31:50,470 --> 00:31:52,179 that was not quite as good as the one we 990 00:31:52,180 --> 00:31:54,549 used for the 45-minute trials. 991 00:31:55,750 --> 00:31:57,429 ZMap, in a maybe more 992 00:31:57,430 --> 00:32:00,189 comparable two-probe configuration, 993 00:32:00,190 --> 00:32:02,409 would take about 994 00:32:02,410 --> 00:32:04,389 twice that, about two hours and 12 995 00:32:04,390 --> 00:32:05,709 minutes. 996 00:32:05,710 --> 00:32:08,079 So, quite a bit faster. 997 00:32:08,080 --> 00:32:10,449 In fact, this is 998 00:32:10,450 --> 00:32:13,209 about 1,300 times faster 999 00:32:13,210 --> 00:32:14,380 than Nmap's most 1000 00:32:16,180 --> 00:32:18,459 aggressive default configuration. 1001 00:32:18,460 --> 00:32:20,379 And I'm particularly proud of that, 1002 00:32:20,380 --> 00:32:22,089 because it's not often that you get three 1003 00:32:22,090 --> 00:32:23,889 orders of magnitude through aggressive 1004 00:32:23,890 --> 00:32:26,170 optimization, but when you do, oh man. 1005 00:32:28,660 --> 00:32:30,759 Surprisingly, though, ZMap 1006 00:32:30,760 --> 00:32:32,829 also finds more hosts that are 1007 00:32:32,830 --> 00:32:35,199 responsive than Nmap does.
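The speedup can be checked against the estimated Internet-wide scan times quoted in this part of the talk: 116 days and 62 days for Nmap, about 1 h 10 min and 2 h 12 min for ZMap. A minimal arithmetic sketch; which pairs count as comparable is a judgment call, and the labels are illustrative.

```python
from datetime import timedelta

# Estimated Internet-wide scan times as quoted in the talk.
nmap_two_probes = timedelta(days=116)
nmap_one_probe = timedelta(days=62)
zmap_one_probe = timedelta(hours=1, minutes=10)
zmap_two_probes = timedelta(hours=2, minutes=12)


def speedup(slow: timedelta, fast: timedelta) -> float:
    """How many times faster `fast` is than `slow` (timedelta / timedelta is a float)."""
    return slow / fast


if __name__ == "__main__":
    for label, slow, fast in [
        ("two probes each", nmap_two_probes, zmap_two_probes),
        ("one probe each", nmap_one_probe, zmap_one_probe),
        ("Nmap two probes vs ZMap one probe", nmap_two_probes, zmap_one_probe),
    ]:
        print(f"{label}: {speedup(slow, fast):,.0f}x")
```

Pairing configurations that send the same number of probes gives about 1,265x and 1,275x, roughly three orders of magnitude; mixing configurations (two Nmap probes against one ZMap probe) gives about 2,386x.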
1008 00:32:35,200 --> 00:32:37,179 And we were really scratching our heads 1009 00:32:37,180 --> 00:32:39,609 about that, because even with two probes, 1010 00:32:39,610 --> 00:32:41,559 Nmap finds fewer responsive hosts 1011 00:32:41,560 --> 00:32:43,659 than ZMap does with one probe in these 1012 00:32:43,660 --> 00:32:44,859 tests. 1013 00:32:44,860 --> 00:32:47,079 So why is that? 1014 00:32:47,080 --> 00:32:48,789 Well, that turns out to have to do with 1015 00:32:48,790 --> 00:32:50,319 response time. 1016 00:32:50,320 --> 00:32:52,659 So we tried 1017 00:32:52,660 --> 00:32:54,429 to do an experiment where we measure how 1018 00:32:54,430 --> 00:32:56,289 long it takes for a 1019 00:32:56,290 --> 00:32:59,169 SYN-ACK to come back after we send a SYN. 1020 00:32:59,170 --> 00:33:01,029 So with a huge number of hosts, we can 1021 00:33:01,030 --> 00:33:02,829 plot this as a cumulative 1022 00:33:04,720 --> 00:33:06,669 fraction here. 1023 00:33:06,670 --> 00:33:08,799 What we find is that if you just wait 1024 00:33:08,800 --> 00:33:10,059 250 milliseconds 1025 00:33:10,060 --> 00:33:11,619 for a SYN-ACK to come back, 1026 00:33:11,620 --> 00:33:13,359 you're only going to see about 85 1027 00:33:13,360 --> 00:33:15,759 percent of responsive hosts. 1028 00:33:15,760 --> 00:33:17,619 If you wait 500 milliseconds, you're 1029 00:33:17,620 --> 00:33:19,929 going to see about 98 percent. 1030 00:33:19,930 --> 00:33:22,450 And it takes 1031 00:33:23,620 --> 00:33:25,909 almost eight seconds, or 1032 00:33:25,910 --> 00:33:27,129 a little more than eight seconds, 1033 00:33:27,130 --> 00:33:29,049 actually, before things level out and you 1034 00:33:29,050 --> 00:33:30,219 see the last 1035 00:33:30,220 --> 00:33:32,109 responses trickling in. 1036 00:33:32,110 --> 00:33:33,939 So things rattle around in the network 1037 00:33:33,940 --> 00:33:35,919 for a while and then come back.
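Read as an empirical CDF, this response-time measurement says how much of the eventual responder population a fixed per-probe timeout can see. A minimal sketch; the function names are illustrative, and the delay values in the usage lines are made-up placeholders, not the measurements described in the talk.

```python
import math
from bisect import bisect_right
from typing import Sequence


def coverage_within(delays: Sequence[float], timeout: float) -> float:
    """Fraction of eventual responders whose reply arrives within `timeout` seconds."""
    if not delays:
        raise ValueError("need at least one observed delay")
    ordered = sorted(delays)
    return bisect_right(ordered, timeout) / len(ordered)


def timeout_for_coverage(delays: Sequence[float], target: float) -> float:
    """Smallest observed delay that, used as a timeout, captures at least `target` of responders."""
    if not delays or not 0 < target <= 1:
        raise ValueError("need observed delays and 0 < target <= 1")
    ordered = sorted(delays)
    k = math.ceil(target * len(ordered))
    return ordered[k - 1]


if __name__ == "__main__":
    sample = [0.05, 0.08, 0.1, 0.12, 0.2, 0.3, 0.45, 0.9, 3.0, 8.2]  # placeholder values
    print(coverage_within(sample, 0.25))      # 0.5
    print(timeout_for_coverage(sample, 1.0))  # 8.2
```

A scanner without per-probe timeouts effectively uses the time until the scan run ends as its timeout, so even replies in the slow tail still count.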
1038 00:33:35,920 --> 00:33:37,749 It turns out that Nmap, because it has 1039 00:33:37,750 --> 00:33:40,209 to maintain state for each connection, 1040 00:33:40,210 --> 00:33:42,879 including a timeout, in its aggressive 1041 00:33:42,880 --> 00:33:45,009 modes has relatively 1042 00:33:45,010 --> 00:33:47,469 conservative, relatively short timeouts. 1043 00:33:47,470 --> 00:33:49,059 Actually, the timeouts are something 1044 00:33:49,060 --> 00:33:49,269 like 1045 00:33:49,270 --> 00:33:52,569 250 milliseconds and 500 milliseconds. 1046 00:33:52,570 --> 00:33:54,759 So Nmap is going to just give 1047 00:33:54,760 --> 00:33:57,669 up and move on after 250 milliseconds. 1048 00:33:57,670 --> 00:33:59,289 ZMap, because we're using this 1049 00:33:59,290 --> 00:34:01,389 asynchronous design where we 1050 00:34:01,390 --> 00:34:03,219 don't have any per-connection timeout, 1051 00:34:03,220 --> 00:34:05,499 will just accept the response whenever 1052 00:34:05,500 --> 00:34:07,299 it comes back, as long as it comes 1053 00:34:07,300 --> 00:34:09,279 back before the entire scan run 1054 00:34:09,280 --> 00:34:11,408 terminates. That allows us 1055 00:34:11,409 --> 00:34:13,869 to see these very long-delayed 1056 00:34:13,870 --> 00:34:14,979 responses. 1057 00:34:14,980 --> 00:34:17,198 This is why we're able to get even higher 1058 00:34:17,199 --> 00:34:19,569 coverage, even though we're using 1059 00:34:19,570 --> 00:34:21,669 what's probably a much more aggressive 1060 00:34:21,670 --> 00:34:23,919 means of doing an 1061 00:34:23,920 --> 00:34:24,920 internet-wide scan. 1062 00:34:26,920 --> 00:34:29,049 All right, so let me talk 1063 00:34:29,050 --> 00:34:29,349 about a 1064 00:34:29,350 --> 00:34:31,299 few applications, a few of the fun things 1065 00:34:31,300 --> 00:34:32,649 that we've done. 1066 00:34:32,650 --> 00:34:34,449 So let me start with a really fun one.
1067 00:34:36,639 --> 00:34:38,109 So one thing that you can use 1068 00:34:38,110 --> 00:34:39,939 internet-wide scanning for is finding 1069 00:34:39,940 --> 00:34:42,399 things that are vulnerable, and 1070 00:34:42,400 --> 00:34:44,408 at really, really big scale, at 1071 00:34:44,409 --> 00:34:45,609 internet-wide scale. 1072 00:34:45,610 --> 00:34:47,799 So this is important from a 1073 00:34:47,800 --> 00:34:49,909 white-hat perspective, because we want to know... 1074 00:34:51,130 --> 00:34:51,609 We want to 1075 00:34:51,610 --> 00:34:53,019 know the impact of a 1076 00:34:53,020 --> 00:34:54,789 problem that's been announced. 1077 00:34:54,790 --> 00:34:57,069 Or maybe we want to know, before we do 1078 00:34:57,070 --> 00:34:59,259 disclosure, before we 1079 00:34:59,260 --> 00:34:59,799 go public 1080 00:34:59,800 --> 00:35:01,629 with a vulnerability, how big the impact 1081 00:35:01,630 --> 00:35:02,559 is going to be. 1082 00:35:02,560 --> 00:35:05,079 Maybe this will help make a case to the 1083 00:35:05,080 --> 00:35:07,299 manufacturer of whatever technology it is 1084 00:35:07,300 --> 00:35:08,799 that they need to get their patch out 1085 00:35:08,800 --> 00:35:09,800 now. 1086 00:35:11,290 --> 00:35:13,209 In any case, we experimented with this a 1087 00:35:13,210 --> 00:35:14,319 few times.
1088 00:35:14,320 --> 00:35:16,809 The first time had to do with 1089 00:35:16,810 --> 00:35:18,699 these vulnerabilities in UPnP 1090 00:35:18,700 --> 00:35:20,949 that HD Moore announced 1091 00:35:20,950 --> 00:35:21,279 about 1092 00:35:21,280 --> 00:35:23,349 a year ago, where several common 1093 00:35:23,350 --> 00:35:23,979 UPnP 1094 00:35:23,980 --> 00:35:25,959 implementations had just terrible, 1095 00:35:25,960 --> 00:35:28,149 exploitable problems. They were used 1096 00:35:28,150 --> 00:35:30,340 in many millions of devices and 1097 00:35:31,360 --> 00:35:32,529 could give a remote 1098 00:35:32,530 --> 00:35:34,719 attacker root 1099 00:35:34,720 --> 00:35:36,219 with 1100 00:35:36,220 --> 00:35:38,140 just a single UDP packet. 1101 00:35:39,310 --> 00:35:42,009 Well, it took less than six hours 1102 00:35:42,010 --> 00:35:43,419 for 1103 00:35:43,420 --> 00:35:44,319 one of my Ph.D. 1104 00:35:44,320 --> 00:35:45,639 students 1105 00:35:45,640 --> 00:35:45,939 to 1106 00:35:45,940 --> 00:35:46,689 code, 1107 00:35:46,690 --> 00:35:49,239 using ZMap, a scanner 1108 00:35:49,240 --> 00:35:50,859 that would perform a UPnP 1109 00:35:50,860 --> 00:35:52,959 discovery probe across all of 1110 00:35:52,960 --> 00:35:55,390 the internet, and also to run the probe. 1111 00:35:57,020 --> 00:35:58,020 So, 1112 00:35:59,950 --> 00:36:02,049 by doing that, we found about 1113 00:36:02,050 --> 00:36:02,529 six 1114 00:36:02,530 --> 00:36:03,530 million 1115 00:36:04,150 --> 00:36:05,919 responsive UPnP devices. 1116 00:36:05,920 --> 00:36:08,139 About three million of them would 1117 00:36:08,140 --> 00:36:09,309 have been vulnerable. 1118 00:36:09,310 --> 00:36:11,109 We did this scan 1119 00:36:11,110 --> 00:36:12,999 maybe a month after the vulnerabilities 1120 00:36:13,000 --> 00:36:14,049 were disclosed.
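A UPnP discovery probe of the kind described is, at its core, the standard SSDP M-SEARCH request that UPnP devices answer over UDP port 1900. A minimal sketch, assuming nothing about the students' actual ZMap module: it only builds the request and parses the headers of a reply (sending it to every address is the scanner's job). Judging vulnerability from the software version in a reply's SERVER header is an assumption for illustration, and the sample reply and `ExampleSDK` string are made up.

```python
def build_msearch(mx: int = 1, st: str = "upnp:rootdevice") -> bytes:
    """SSDP M-SEARCH discovery request, to be sent over UDP to port 1900."""
    lines = [
        "M-SEARCH * HTTP/1.1",
        "HOST: 239.255.255.250:1900",
        'MAN: "ssdp:discover"',
        f"MX: {mx}",
        f"ST: {st}",
        "",
        "",  # the two empty entries produce the terminating blank line
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_ssdp_response(data: bytes) -> dict:
    """Split an HTTP-over-UDP SSDP reply into its status line and lower-cased headers."""
    status, *header_lines = data.decode("latin-1").split("\r\n")
    headers = {}
    for line in header_lines:
        if ":" in line:
            name, value = line.split(":", 1)
            headers[name.strip().lower()] = value.strip()
    return {"status": status, "headers": headers}


reply = (b"HTTP/1.1 200 OK\r\n"
         b"ST: upnp:rootdevice\r\n"
         b"SERVER: Linux/2.6 UPnP/1.0 ExampleSDK/1.0\r\n\r\n")
print(parse_ssdp_response(reply)["headers"]["server"])  # Linux/2.6 UPnP/1.0 ExampleSDK/1.0
```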
1121 00:36:14,050 --> 00:36:15,339 Even then, they were still vulnerable 1122 00:36:15,340 --> 00:36:17,469 to the vulnerabilities that 1123 00:36:17,470 --> 00:36:19,119 HD Moore went public 1124 00:36:19,120 --> 00:36:21,489 with. So in the same 1125 00:36:21,490 --> 00:36:23,289 amount of time, if you had an exploit 1126 00:36:23,290 --> 00:36:25,059 ready, you could have exploited all of 1127 00:36:25,060 --> 00:36:26,349 these machines. 1128 00:36:26,350 --> 00:36:28,239 So it's both really valuable from a 1129 00:36:28,240 --> 00:36:30,429 defensive standpoint and really, really 1130 00:36:30,430 --> 00:36:31,899 scary 1131 00:36:31,900 --> 00:36:32,859 from 1132 00:36:32,860 --> 00:36:34,929 a vulnerability standpoint 1133 00:36:34,930 --> 00:36:36,219 that this kind of very high-speed 1134 00:36:36,220 --> 00:36:38,769 probing is possible. 1135 00:36:38,770 --> 00:36:40,449 Of course, you always could have done 1136 00:36:40,450 --> 00:36:42,469 this with a botnet, right? 1137 00:36:42,470 --> 00:36:44,409 So in one sense, it doesn't make anything 1138 00:36:44,410 --> 00:36:46,029 possible for the attacker 1139 00:36:46,030 --> 00:36:48,429 that wasn't possible before. 1140 00:36:48,430 --> 00:36:50,649 But this does democratize it. 1141 00:36:50,650 --> 00:36:52,779 This puts the same technique, the same 1142 00:36:52,780 --> 00:36:54,849 capability, into all of our 1143 00:36:54,850 --> 00:36:56,949 hands, where previously maybe just 1144 00:36:56,950 --> 00:36:57,819 a few of us who had 1145 00:36:57,820 --> 00:37:00,039 botnets, a few of you, maybe, who had 1146 00:37:00,040 --> 00:37:01,360 botnets (I don't have a botnet), 1147 00:37:02,590 --> 00:37:03,850 had the ability to do it. 1148 00:37:05,290 --> 00:37:07,389 Oh, I should add, we did another study 1149 00:37:07,390 --> 00:37:09,489 over the summer looking at IPMI 1150 00:37:09,490 --> 00:37:10,479 devices.
1151 00:37:10,480 --> 00:37:12,759 So IPMI is this remote 1152 00:37:12,760 --> 00:37:14,619 management interface that's built into 1153 00:37:14,620 --> 00:37:16,779 almost every server motherboard, 1154 00:37:16,780 --> 00:37:19,599 where the motherboard actually has, 1155 00:37:19,600 --> 00:37:21,279 in addition to the CPU, 1156 00:37:21,280 --> 00:37:23,619 an embedded ARM 1157 00:37:23,620 --> 00:37:24,369 system on the 1158 00:37:24,370 --> 00:37:25,329 motherboard 1159 00:37:25,330 --> 00:37:27,639 that's on even when the server is powered 1160 00:37:27,640 --> 00:37:28,299 off. 1161 00:37:28,300 --> 00:37:30,399 And you can get 1162 00:37:30,400 --> 00:37:32,319 into it through a web interface to power-cycle 1163 00:37:32,320 --> 00:37:34,569 the server, monitor it, get 1164 00:37:34,570 --> 00:37:36,939 a remote KVM, you know, basically 1165 00:37:36,940 --> 00:37:38,589 anything you could do if you had physical 1166 00:37:38,590 --> 00:37:40,329 access to the box. 1167 00:37:40,330 --> 00:37:41,619 Well, you'd think that they would have 1168 00:37:41,620 --> 00:37:42,549 gotten IPMI 1169 00:37:42,550 --> 00:37:43,839 security right, 1170 00:37:43,840 --> 00:37:46,239 since this is protecting really valuable 1171 00:37:46,240 --> 00:37:48,279 assets and giving you an incredible 1172 00:37:48,280 --> 00:37:49,749 amount of power; it's basically a 1173 00:37:49,750 --> 00:37:51,699 backdoor that comes with your 1174 00:37:51,700 --> 00:37:52,700 server. 1175 00:37:53,650 --> 00:37:55,929 But we looked into a 1176 00:37:55,930 --> 00:37:56,379 leading 1177 00:37:56,380 --> 00:37:59,619 IPMI manufacturer's implementation, 1178 00:37:59,620 --> 00:38:01,419 and it was just riddled with 1179 00:38:01,420 --> 00:38:02,349 vulnerabilities.
1180 00:38:02,350 --> 00:38:04,359 I mean, I'm talking about 1181 00:38:04,360 --> 00:38:05,360 a 1182 00:38:07,420 --> 00:38:09,789 root buffer overflow in 1183 00:38:09,790 --> 00:38:12,309 the web interface's username and password 1184 00:38:12,310 --> 00:38:13,310 fields. 1185 00:38:21,350 --> 00:38:23,539 Anyway, using ZMap, we 1186 00:38:23,540 --> 00:38:25,789 were able to do a scan and figure out how 1187 00:38:25,790 --> 00:38:27,979 many systems could be exploited 1188 00:38:27,980 --> 00:38:29,059 immediately using 1189 00:38:29,060 --> 00:38:30,889 this problem, and we were able to find 1190 00:38:32,360 --> 00:38:33,829 (I forget the exact number; 1191 00:38:33,830 --> 00:38:35,539 I don't have it on the slide) 50 to 100 1192 00:38:35,540 --> 00:38:36,529 thousand servers 1193 00:38:36,530 --> 00:38:37,969 that were plugged into public 1194 00:38:37,970 --> 00:38:39,739 IP addresses, even though these are 1195 00:38:39,740 --> 00:38:41,749 server-grade machines, 1196 00:38:41,750 --> 00:38:43,219 that had this 1197 00:38:43,220 --> 00:38:45,050 remotely exploitable IPMI. 1198 00:38:47,060 --> 00:38:49,279 So there are sort of more depressing things 1199 00:38:49,280 --> 00:38:49,579 you can 1200 00:38:49,580 --> 00:38:50,580 do with this, 1201 00:38:51,590 --> 00:38:52,219 like 1202 00:38:52,220 --> 00:38:54,259 enumerating hidden services. 1203 00:38:54,260 --> 00:38:56,359 So if anyone here has built a service 1204 00:38:56,360 --> 00:38:57,919 where you rely on the 1205 00:38:57,920 --> 00:38:59,239 internet being large 1206 00:38:59,240 --> 00:39:01,519 and your things being hard to find, 1207 00:39:01,520 --> 00:39:03,259 you might want to rethink your threat 1208 00:39:03,260 --> 00:39:04,260 model. 1209 00:39:06,470 --> 00:39:08,749 We can easily uncover things 1210 00:39:08,750 --> 00:39:09,079 using 1211 00:39:09,080 --> 00:39:11,239 tools like ZMap and 1212 00:39:11,240 --> 00:39:12,529 high-speed scanning.
1213 00:39:12,530 --> 00:39:15,589 So one experiment we did was looking for 1214 00:39:15,590 --> 00:39:17,840 Tor bridges, and these are 1215 00:39:18,920 --> 00:39:20,689 hidden Tor nodes that aren't publicly 1216 00:39:20,690 --> 00:39:23,149 advertised, that are used to let people in 1217 00:39:23,150 --> 00:39:25,669 countries that practice censorship 1218 00:39:25,670 --> 00:39:28,189 access Tor without the censor 1219 00:39:28,190 --> 00:39:30,109 just blocking access to these IP 1220 00:39:30,110 --> 00:39:31,399 addresses. 1221 00:39:31,400 --> 00:39:32,449 And, 1222 00:39:32,450 --> 00:39:34,759 using a very simple application 1223 00:39:34,760 --> 00:39:34,879 of 1224 00:39:34,880 --> 00:39:38,389 ZMap, we identified 1225 00:39:38,390 --> 00:39:39,379 more than 86 1226 00:39:39,380 --> 00:39:41,539 percent of the Tor 1227 00:39:41,540 --> 00:39:43,909 bridges that were live and allocated 1228 00:39:43,910 --> 00:39:45,349 at the time in the Tor directory 1229 00:39:45,350 --> 00:39:46,350 service. 1230 00:39:48,110 --> 00:39:49,159 This is with 1231 00:39:49,160 --> 00:39:51,079 a single scan, just looking at 1232 00:39:51,080 --> 00:39:52,639 two ports. 1233 00:39:52,640 --> 00:39:54,649 Now, this is not state-of-the-art 1234 00:39:54,650 --> 00:39:56,779 Tor. Luckily, Tor has 1235 00:39:56,780 --> 00:39:58,069 already moved on. 1236 00:39:58,070 --> 00:40:00,619 They've developed other techniques 1237 00:40:00,620 --> 00:40:02,329 that are much more resistant to this 1238 00:40:02,330 --> 00:40:04,579 attack, including obfsproxy, 1239 00:40:04,580 --> 00:40:06,949 which randomizes the port it listens 1240 00:40:06,950 --> 00:40:09,259 on, which makes it thousands 1241 00:40:09,260 --> 00:40:11,689 of times harder to exhaustively enumerate 1242 00:40:11,690 --> 00:40:13,969 everything, and also has maybe 1243 00:40:13,970 --> 00:40:15,859 a more difficult-to-spot signature.
1244 00:40:16,880 --> 00:40:19,009 However, this is something 1245 00:40:19,010 --> 00:40:20,839 that other services ought to be worried 1246 00:40:20,840 --> 00:40:21,840 about. 1247 00:40:22,370 --> 00:40:24,139 Here's one more thing that's kind of fun 1248 00:40:24,140 --> 00:40:25,519 we can do with this: 1249 00:40:25,520 --> 00:40:27,979 detecting service disruptions. 1250 00:40:27,980 --> 00:40:29,149 So let's say there's a 1251 00:40:29,150 --> 00:40:31,249 hurricane that happens, like what 1252 00:40:31,250 --> 00:40:33,709 happened in my country last 1253 00:40:33,710 --> 00:40:35,509 year. We had this thing called 1254 00:40:35,510 --> 00:40:37,609 Superstorm Sandy 1255 00:40:37,610 --> 00:40:39,439 that hit the Northeast and flooded New 1256 00:40:39,440 --> 00:40:40,440 Jersey, 1257 00:40:41,240 --> 00:40:43,609 and Sandy was a terrible natural 1258 00:40:43,610 --> 00:40:44,779 disaster. 1259 00:40:44,780 --> 00:40:47,119 But one thing that we could 1260 00:40:47,120 --> 00:40:49,999 do using internet-wide scanning 1261 00:40:50,000 --> 00:40:52,189 is track exactly what 1262 00:40:52,190 --> 00:40:54,229 was going offline and when. 1263 00:40:54,230 --> 00:40:56,389 So, looking at power outages 1264 00:40:56,390 --> 00:40:58,129 using internet-wide scanning: 1265 00:40:58,130 --> 00:41:01,039 what we did is, 1266 00:41:01,040 --> 00:41:03,409 my student Zakir and I just used ZMap 1267 00:41:03,410 --> 00:41:05,569 continuously to scan as fast as 1268 00:41:05,570 --> 00:41:06,499 we could, on
1269 00:41:06,500 --> 00:41:07,399 I think this was port 1270 00:41:07,400 --> 00:41:10,399 443, again HTTPS, 1271 00:41:10,400 --> 00:41:12,619 while the storm was going on, and 1272 00:41:12,620 --> 00:41:14,899 we were able to look for locations, 1273 00:41:14,900 --> 00:41:17,149 for areas in the network, where there was 1274 00:41:17,150 --> 00:41:19,219 a greater-than-normal decrease 1275 00:41:19,220 --> 00:41:20,989 in the number of listening hosts. 1276 00:41:20,990 --> 00:41:23,029 So, places where the power went out and 1277 00:41:23,030 --> 00:41:24,949 suddenly people's Netgear routers, 1278 00:41:24,950 --> 00:41:27,379 for instance, were no longer exposing 1279 00:41:27,380 --> 00:41:29,059 their web interface to the 1280 00:41:29,060 --> 00:41:30,019 world. 1281 00:41:30,020 --> 00:41:32,269 So by looking at the whole internet, 1282 00:41:32,270 --> 00:41:34,459 we could spot evidence of this localized 1283 00:41:34,460 --> 00:41:36,559 disaster, and we can do 1284 00:41:36,560 --> 00:41:38,659 something like this on a continuous basis, 1285 00:41:38,660 --> 00:41:40,699 all the time if we want to, to spot 1286 00:41:40,700 --> 00:41:41,689 disruptions 1287 00:41:41,690 --> 00:41:44,119 in critical infrastructure. 1288 00:41:46,980 --> 00:41:48,479 All right, so let me tell you about one 1289 00:41:48,480 --> 00:41:51,299 other thing, a pretty interesting 1290 00:41:51,300 --> 00:41:53,519 (very interesting to me, anyway) 1291 00:41:53,520 --> 00:41:54,479 security-related 1292 00:41:54,480 --> 00:41:56,399 application of this. 1293 00:41:56,400 --> 00:41:58,619 And this has to do with cryptography, 1294 00:41:58,620 --> 00:42:00,299 and some of you have probably heard of 1295 00:42:00,300 --> 00:42:03,389 this work.
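The outage analysis described earlier, comparing how many hosts answer on port 443 in each part of the network before and during an event, can be sketched as a per-network comparison of two scans. Everything beyond that idea is an assumption for illustration: grouping by /16, treating the median per-network drop as "normal", and the 3x factor, 10% floor, and 10-host minimum are arbitrary parameters, and the addresses in the usage lines are made up.

```python
import ipaddress
from collections import Counter
from statistics import median


def count_by_prefix(hosts, prefix_len=16):
    """Count responsive hosts per IPv4 network of the given prefix length."""
    return Counter(ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False) for ip in hosts)


def flag_outages(baseline, current, prefix_len=16, factor=3.0, floor=0.1, min_hosts=10):
    """Networks whose drop in responsive hosts is well above the typical (median) drop."""
    before = count_by_prefix(baseline, prefix_len)
    after = count_by_prefix(current, prefix_len)
    # Fractional drop per network; skip networks too small to say anything about.
    drops = {net: 1 - after[net] / n for net, n in before.items() if n >= min_hosts}
    if not drops:
        return {}
    threshold = max(factor * median(drops.values()), floor)
    return {net: round(d, 3) for net, d in drops.items() if d > threshold}


baseline = [f"10.{net}.0.{i}" for net in (1, 2, 3) for i in range(100)]
current = ([f"10.1.0.{i}" for i in range(98)] + [f"10.2.0.{i}" for i in range(97)]
           + [f"10.3.0.{i}" for i in range(40)])
print(flag_outages(baseline, current))  # {IPv4Network('10.3.0.0/16'): 0.6}
```

Using the median rather than the mean keeps one badly hit region from raising the bar for everyone else; a real deployment would also need geolocation to turn flagged networks into places.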
This is work that we did 1296 00:42:03,390 --> 00:42:04,349 using 1297 00:42:04,350 --> 00:42:07,379 basically the prototype of ZMap 1298 00:42:07,380 --> 00:42:07,589 in 1299 00:42:07,590 --> 00:42:09,779 2012, where 1300 00:42:09,780 --> 00:42:12,149 we did a scan of everything 1301 00:42:12,150 --> 00:42:14,339 that was using HTTPS and everything 1302 00:42:14,340 --> 00:42:16,409 that was using SSH in order to 1303 00:42:16,410 --> 00:42:17,369 collect the public 1304 00:42:17,370 --> 00:42:19,709 keys that are 1305 00:42:19,710 --> 00:42:20,309 used 1306 00:42:20,310 --> 00:42:21,419 for 1307 00:42:21,420 --> 00:42:23,519 crypto on the internet, for the most 1308 00:42:23,520 --> 00:42:24,659 important protocols. 1309 00:42:25,890 --> 00:42:28,199 So we found some very interesting things. 1310 00:42:28,200 --> 00:42:30,299 Basically, we wanted to 1311 00:42:30,300 --> 00:42:32,519 do a health check 1312 00:42:32,520 --> 00:42:35,549 for the state of crypto online. 1313 00:42:35,550 --> 00:42:37,340 And this was one of 1314 00:42:38,620 --> 00:42:38,999 the 1315 00:42:39,000 --> 00:42:41,819 easiest ways to get broad visibility 1316 00:42:41,820 --> 00:42:43,860 into the global use of cryptography. 1317 00:42:45,060 --> 00:42:47,429 One really interesting thing we found 1318 00:42:47,430 --> 00:42:49,589 is that even though we found, 1319 00:42:49,590 --> 00:42:52,079 for HTTPS, almost 13 1320 00:42:52,080 --> 00:42:54,209 million listening hosts, and for SSH 1321 00:42:54,210 --> 00:42:56,339 about 10 million hosts, the number of 1322 00:42:56,340 --> 00:42:58,499 actual distinct 1323 00:42:58,500 --> 00:43:00,809 public keys is much, much lower 1324 00:43:00,810 --> 00:43:01,709 than that.
1325 00:43:01,710 --> 00:43:03,779 So we have less than 1326 00:43:03,780 --> 00:43:05,879 half as many distinct public 1327 00:43:05,880 --> 00:43:07,529 keys for HTTPS 1328 00:43:07,530 --> 00:43:09,119 as we have hosts. 1329 00:43:09,120 --> 00:43:11,279 This means there is a very large amount 1330 00:43:11,280 --> 00:43:12,389 of duplication 1331 00:43:12,390 --> 00:43:14,549 of keys, the same key used 1332 00:43:14,550 --> 00:43:16,709 on multiple systems. 1333 00:43:16,710 --> 00:43:18,719 Now, there are some legitimate reasons why 1334 00:43:18,720 --> 00:43:21,059 hosts might share keys; maybe 1335 00:43:21,060 --> 00:43:22,889 these are all different IP addresses 1336 00:43:22,890 --> 00:43:24,089 that belong to Google. 1337 00:43:25,200 --> 00:43:26,849 But there are also reasons that we 1338 00:43:26,850 --> 00:43:29,279 detected which are distinct security 1339 00:43:29,280 --> 00:43:30,329 vulnerabilities: 1340 00:43:31,920 --> 00:43:34,199 things like default certificates 1341 00:43:34,200 --> 00:43:36,659 that are in use across many, 1342 00:43:36,660 --> 00:43:38,699 many different devices, and 1343 00:43:39,960 --> 00:43:42,419 also evidence that many systems 1344 00:43:42,420 --> 00:43:43,679 have problems with their random 1345 00:43:43,680 --> 00:43:44,909 number generators, 1346 00:43:44,910 --> 00:43:46,649 and that this resulted in distinct 1347 00:43:46,650 --> 00:43:48,749 systems, completely distinct, on 1348 00:43:48,750 --> 00:43:50,309 opposite sides of the world, with different 1349 00:43:50,310 --> 00:43:50,969 owners, 1350 00:43:50,970 --> 00:43:53,159 generating the same keys. 1351 00:43:53,160 --> 00:43:54,959 Now, this is a problem, of course, because 1352 00:43:54,960 --> 00:43:57,359 if you and I share the same public key, 1353 00:43:57,360 --> 00:43:59,610 then we can each impersonate the other.
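Counting distinct keys against hosts, and finding which hosts share a key, comes down to grouping scan results by a key fingerprint. A minimal sketch, assuming the scan output has already been reduced to a mapping from host address to the encoded bytes of its public key (a hypothetical input format; the byte strings below are placeholders, not real DER). The SHA-256 fingerprint is an illustrative choice. Telling legitimate sharing, such as one operator's many addresses, apart from default or badly generated keys still requires looking at who the hosts are, which this does not do.

```python
import hashlib
from collections import defaultdict


def key_fingerprint(public_key_der: bytes) -> str:
    """SHA-256 over the encoded public key; equal keys give equal fingerprints."""
    return hashlib.sha256(public_key_der).hexdigest()


def shared_keys(host_keys: dict) -> dict:
    """Map each fingerprint served by more than one host to the sorted list of those hosts."""
    by_key = defaultdict(list)
    for host, der in host_keys.items():
        by_key[key_fingerprint(der)].append(host)
    return {fp: sorted(hosts) for fp, hosts in by_key.items() if len(hosts) > 1}


def distinct_key_ratio(host_keys: dict) -> float:
    """Distinct public keys divided by hosts; "less than half" means a value below 0.5."""
    return len({key_fingerprint(der) for der in host_keys.values()}) / len(host_keys)


scan = {  # made-up placeholders standing in for encoded keys
    "198.51.100.1": b"key-A",
    "198.51.100.2": b"key-A",
    "203.0.113.7": b"key-B",
    "203.0.113.9": b"key-A",
}
print(distinct_key_ratio(scan))          # 0.5
print(list(shared_keys(scan).values()))  # [['198.51.100.1', '198.51.100.2', '203.0.113.9']]
```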
1354 00:44:00,960 --> 00:44:02,699 And when you talk about this happening at 1355 00:44:02,700 --> 00:44:04,829 global scale, with 1356 00:44:06,240 --> 00:44:08,459 only half as many distinct 1357 00:44:08,460 --> 00:44:10,859 keys as hosts, it's evidence 1358 00:44:10,860 --> 00:44:12,840 of a really huge problem. 1359 00:44:14,220 --> 00:44:15,220 But there is... 1360 00:44:16,410 --> 00:44:17,410 Let me see here. 1361 00:44:18,750 --> 00:44:19,589 But there's another 1362 00:44:19,590 --> 00:44:21,659 problem that we could spot using this 1363 00:44:21,660 --> 00:44:24,749 kind of global data from scans, 1364 00:44:24,750 --> 00:44:27,479 which is that some keys 1365 00:44:27,480 --> 00:44:29,639 are not only duplicated in ways 1366 00:44:29,640 --> 00:44:31,199 that would let 1367 00:44:31,200 --> 00:44:31,889 you and me 1368 00:44:31,890 --> 00:44:33,719 impersonate each other, 1369 00:44:33,720 --> 00:44:35,699 but are duplicated in ways that would let 1370 00:44:35,700 --> 00:44:37,949 me, with only the data we collected 1371 00:44:37,950 --> 00:44:39,959 with scanning, actually figure out what 1372 00:44:39,960 --> 00:44:41,190 your private key is. 1373 00:44:42,540 --> 00:44:45,179 So one example of this 1374 00:44:45,180 --> 00:44:48,569 has to do with RSA. In RSA 1375 00:44:48,570 --> 00:44:50,729 (and you should come to the crypto talk 1376 00:44:50,730 --> 00:44:51,389 tonight if you 1377 00:44:51,390 --> 00:44:53,849 want to learn much more detail 1378 00:44:53,850 --> 00:44:55,889 about how these cryptosystems actually 1379 00:44:55,890 --> 00:44:58,379 work, from real cryptographers), 1380 00:44:58,380 --> 00:45:00,419 in RSA you have this modulus, N, 1381 00:45:00,420 --> 00:45:02,519 and that is the product of two large 1382 00:45:02,520 --> 00:45:03,760 primes, p and q.
1383 00:45:04,860 --> 00:45:07,109 Normally, we don't know 1384 00:45:07,110 --> 00:45:09,569 any efficient way to go backwards 1385 00:45:09,570 --> 00:45:11,879 to get p and q, your private key, 1386 00:45:11,880 --> 00:45:14,189 from N, which is part of your public 1387 00:45:14,190 --> 00:45:15,190 key. 1388 00:45:15,980 --> 00:45:18,349 However, if your key wasn't quite 1389 00:45:18,350 --> 00:45:20,449 properly generated 1390 00:45:20,450 --> 00:45:20,989 because, 1391 00:45:20,990 --> 00:45:22,609 say, there was a problem with your random 1392 00:45:22,610 --> 00:45:22,849 number 1393 00:45:22,850 --> 00:45:25,069 generator, and you 1394 00:45:25,070 --> 00:45:27,259 and I end up having different keys, but 1395 00:45:27,260 --> 00:45:29,689 we share one of those two 1396 00:45:29,690 --> 00:45:31,909 primes (let's say we share p and 1397 00:45:31,910 --> 00:45:32,059 have 1398 00:45:32,060 --> 00:45:33,499 different q's), 1399 00:45:33,500 --> 00:45:35,389 our public keys will look completely 1400 00:45:35,390 --> 00:45:37,939 different. But if, say, we examine 1401 00:45:37,940 --> 00:45:40,159 them together, we 1402 00:45:40,160 --> 00:45:42,499 can very trivially factor both 1403 00:45:42,500 --> 00:45:44,359 of them just by taking the greatest 1404 00:45:44,360 --> 00:45:47,869 common divisor, which takes milliseconds. 1405 00:45:47,870 --> 00:45:49,699 So we came up with a way, 1406 00:45:49,700 --> 00:45:50,929 based on a technique by 1407 00:45:50,930 --> 00:45:53,269 Dan Bernstein, 1408 00:45:53,270 --> 00:45:55,609 that lets us take the GCD 1409 00:45:55,610 --> 00:45:55,759 of 1410 00:45:55,760 --> 00:45:58,879 all pairs of keys on the internet. 1411 00:45:58,880 --> 00:46:01,249 And what this let us do is 1412 00:46:02,570 --> 00:46:04,879 get the private keys 1413 00:46:04,880 --> 00:46:07,009 to about half a percent of all
1414 00:46:07,010 --> 00:46:07,669 the TLS 1415 00:46:07,670 --> 00:46:09,739 hosts, all 1416 00:46:09,740 --> 00:46:09,859 the 1417 00:46:09,860 --> 00:46:11,989 HTTPS hosts on the internet, 1418 00:46:11,990 --> 00:46:14,329 because they have improperly generated 1419 00:46:14,330 --> 00:46:15,799 keys of that form. 1420 00:46:15,800 --> 00:46:16,800 Similarly, with... 1421 00:46:24,620 --> 00:46:26,959 Similarly, with DSA, which is 1422 00:46:26,960 --> 00:46:28,459 another public-key 1423 00:46:28,460 --> 00:46:29,750 algorithm that's used 1424 00:46:30,800 --> 00:46:32,449 even more so for SSH 1425 00:46:32,450 --> 00:46:32,779 than for 1426 00:46:32,780 --> 00:46:34,369 TLS, we got more than 1427 00:46:34,370 --> 00:46:36,319 one percent of all the 1428 00:46:36,320 --> 00:46:38,479 hosts' private keys, 1429 00:46:38,480 --> 00:46:39,769 and this is all based on 1430 00:46:39,770 --> 00:46:41,689 visibility that we can get with internet-wide 1431 00:46:41,690 --> 00:46:42,690 scanning. 1432 00:46:43,640 --> 00:46:45,289 But there's a question of why these 1433 00:46:45,290 --> 00:46:46,219 systems are vulnerable. 1434 00:46:46,220 --> 00:46:47,179 Well, what... 1435 00:46:47,180 --> 00:46:49,819 What are they? Why are they vulnerable? 1436 00:46:49,820 --> 00:46:51,409 Well, if we actually look at the 1437 00:46:51,410 --> 00:46:53,479 certificates, say, that are 1438 00:46:53,480 --> 00:46:53,899 served 1439 00:46:53,900 --> 00:46:54,949 by 1440 00:46:54,950 --> 00:46:56,539 HTTPS hosts with these 1441 00:46:56,540 --> 00:46:58,099 vulnerabilities, here's what we 1442 00:46:58,100 --> 00:46:59,329 see. 1443 00:46:59,330 --> 00:47:01,669 These are not web servers. 1444 00:47:01,670 --> 00:47:03,759 They're not normal web servers, anyway. 1445 00:47:03,760 --> 00:47:04,760 This is not... 1446 00:47:05,600 --> 00:47:06,289 This is not 1447 00:47:06,290 --> 00:47:08,749 Google or PayPal. 1448 00:47:10,100 --> 00:47:11,899 Let's see.
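The shared-prime attack rests on one fact: if N1 = p·q1 and N2 = p·q2, then gcd(N1, N2) = p, and dividing gives q1 and q2. Computing that gcd for every pair directly is quadratic in the number of keys, so the sketch below uses the textbook product-tree/remainder-tree formulation of batch GCD, in the spirit of the Bernstein technique mentioned; it is not the code used in the study, and the primes are toy-sized. For each modulus N_i it returns gcd(N_i, product of all the other moduli).

```python
from math import gcd, prod


def product_tree(xs):
    """Levels of pairwise products: level 0 is the input, the last level is [product of all]."""
    tree = [list(xs)]
    while len(tree[-1]) > 1:
        level = tree[-1]
        tree.append([prod(level[i:i + 2]) for i in range(0, len(level), 2)])
    return tree


def batch_gcd(moduli):
    """For each N_i, return gcd(N_i, product of all other moduli)."""
    tree = product_tree(moduli)
    remainders = tree.pop()  # [P], the product of every modulus
    while tree:
        level = tree.pop()
        # Each node's parent sits at index i // 2; reducing mod n^2 keeps P mod n^2 exact.
        remainders = [remainders[i // 2] % (n * n) for i, n in enumerate(level)]
    # P mod N^2 == N * ((P / N) mod N), so r // N is the product of the others, mod N.
    return [gcd(r // n, n) for r, n in zip(remainders, moduli)]


# Toy primes; real RSA primes are hundreds of digits long.
p, q1, q2 = 1009, 1013, 1019
moduli = [p * q1, p * q2, 1021 * 1031]  # the first two keys share the prime p
print(gcd(moduli[0], moduli[1]))        # 1009, the pairwise case
shared = batch_gcd(moduli)
print(shared)                           # [1009, 1009, 1]
print([(g, n // g) for g, n in zip(shared, moduli) if 1 < g < n])  # [(1009, 1013), (1009, 1019)]
```

A result equal to N_i itself means the whole modulus also appears elsewhere (the duplicated-key case), which the GCD cannot split; a result of 1 means no shared factor was found.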
The systems say things like, 1449 00:47:11,900 --> 00:47:13,999 oh, "system generated," 1450 00:47:14,000 --> 00:47:16,609 "TP-Link Technologies," from TP-Link 1451 00:47:16,610 --> 00:47:18,079 software. You can learn that 1452 00:47:18,080 --> 00:47:19,999 this is some company that makes some 1453 00:47:20,000 --> 00:47:22,219 kind of embedded system gateway 1454 00:47:22,220 --> 00:47:23,659 device. 1455 00:47:23,660 --> 00:47:25,729 What's this? A Xerox office printer 1456 00:47:25,730 --> 00:47:27,679 at the University of Michigan CS 1457 00:47:27,680 --> 00:47:28,680 Department. 1458 00:47:29,780 --> 00:47:31,520 And I swear I didn't pick this 1459 00:47:33,020 --> 00:47:34,759 specifically for that. 1460 00:47:34,760 --> 00:47:36,679 It turns out that almost all of these 1461 00:47:36,680 --> 00:47:38,839 devices are different kinds of embedded 1462 00:47:38,840 --> 00:47:39,799 systems. 1463 00:47:39,800 --> 00:47:42,799 The Internet of Things is already here. 1464 00:47:42,800 --> 00:47:44,209 This is the world we're living in. 1465 00:47:44,210 --> 00:47:45,829 In fact, if you look at just random 1466 00:47:45,830 --> 00:47:48,379 samples of IP addresses, 1467 00:47:48,380 --> 00:47:49,909 almost everything you're going to be 1468 00:47:49,910 --> 00:47:51,949 seeing is some kind of 1469 00:47:51,950 --> 00:47:53,509 embedded or headless device. 1470 00:47:53,510 --> 00:47:55,280 It's not a normal public server. 1471 00:47:56,630 --> 00:47:59,269 So these are things like home routers, 1472 00:47:59,270 --> 00:48:01,339 professional routers, server management 1473 00:48:01,340 --> 00:48:03,949 devices, printers, et cetera.
1474 00:48:03,950 --> 00:48:06,139 So in the end, in this weak-key 1475 00:48:06,140 --> 00:48:08,359 research, we identified devices from more 1476 00:48:08,360 --> 00:48:10,639 than 40 different manufacturers and got 1477 00:48:10,640 --> 00:48:12,469 in touch with those manufacturers to try 1478 00:48:12,470 --> 00:48:13,849 to fix the problem. 1479 00:48:13,850 --> 00:48:16,339 And we also traced the problem back 1480 00:48:16,340 --> 00:48:18,379 to a fundamental problem in the 1481 00:48:18,380 --> 00:48:19,699 Linux kernel. 1482 00:48:19,700 --> 00:48:21,499 So one of the things that many of 1483 00:48:21,500 --> 00:48:22,159 these devices 1484 00:48:22,160 --> 00:48:23,479 shared was that they were running 1485 00:48:23,480 --> 00:48:24,769 Linux. Well, on Linux, 1486 00:48:24,770 --> 00:48:26,509 where do you get random numbers for key 1487 00:48:26,510 --> 00:48:27,199 generation? 1488 00:48:27,200 --> 00:48:29,269 You probably get them, in practice, 1489 00:48:29,270 --> 00:48:31,219 from /dev/urandom. 1490 00:48:31,220 --> 00:48:33,079 They say you should be using /dev/random, 1491 00:48:33,080 --> 00:48:34,640 but nobody does, trust me. 1492 00:48:36,320 --> 00:48:38,059 So the way Linux generates 1493 00:48:38,060 --> 00:48:39,859 random numbers is, it looks at things 1494 00:48:39,860 --> 00:48:40,639 like the time of 1495 00:48:40,640 --> 00:48:42,799 boot, like keyboard and mouse movement, 1496 00:48:42,800 --> 00:48:45,259 like disk access timing, and it 1497 00:48:45,260 --> 00:48:46,369 mixes 1498 00:48:46,370 --> 00:48:46,819 all of these 1499 00:48:46,820 --> 00:48:48,529 together in something called the input 1500 00:48:48,530 --> 00:48:49,939 entropy pool.
1501 00:48:49,940 --> 00:48:52,009 Then periodically, when things read from 1502 00:48:52,010 --> 00:48:54,229 /dev/urandom, it mixes in 1503 00:48:55,400 --> 00:48:57,409 the time the system turned on 1504 00:48:57,410 --> 00:48:58,410 and 1505 00:48:59,270 --> 00:49:01,909 mixes in stuff from the input pool 1506 00:49:01,910 --> 00:49:02,719 and gives you back 1507 00:49:02,720 --> 00:49:04,639 some bits. 1508 00:49:04,640 --> 00:49:06,529 Well, on a headless device (let's think 1509 00:49:06,530 --> 00:49:08,869 about your Netgear router at home), 1510 00:49:08,870 --> 00:49:10,639 you probably don't have a keyboard and 1511 00:49:10,640 --> 00:49:12,049 mouse plugged into it. 1512 00:49:12,050 --> 00:49:13,699 It doesn't have a traditional disk. 1513 00:49:13,700 --> 00:49:15,320 It might not even have a clock. 1514 00:49:19,450 --> 00:49:20,450 Oops. 1515 00:49:27,810 --> 00:49:29,789 And there's another problem, which is 1516 00:49:29,790 --> 00:49:32,309 that, because of another security 1517 00:49:32,310 --> 00:49:33,929 vulnerability that might happen if they 1518 00:49:33,930 --> 00:49:36,029 didn't do this, the entropy only gets copied 1519 00:49:36,030 --> 00:49:37,859 from that input pool into the nonblocking 1520 00:49:37,860 --> 00:49:40,319 pool after you've accumulated 1521 00:49:40,320 --> 00:49:40,379 a 1522 00:49:40,380 --> 00:49:42,689 certain amount of data, 192 1523 00:49:42,690 --> 00:49:45,029 bits in the kernels we looked at. 1524 00:49:45,030 --> 00:49:47,309 That means /dev/urandom can take a long 1525 00:49:47,310 --> 00:49:49,469 time to warm up, and before 1526 00:49:49,470 --> 00:49:51,089 it gets this much data, it may be 1527 00:49:51,090 --> 00:49:52,589 entirely predictable.
1528 00:49:52,590 --> 00:49:54,659 And this creates a period of time between 1529 00:49:54,660 --> 00:49:57,449 when the system first boots and 1530 00:49:57,450 --> 00:49:59,609 when it's warmed up, which 1531 00:49:59,610 --> 00:50:01,799 is a potential window of vulnerability. 1532 00:50:01,800 --> 00:50:04,619 So in one test system, here is when 1533 00:50:04,620 --> 00:50:06,869 /dev/urandom warmed up and actually became 1534 00:50:06,870 --> 00:50:07,919 randomized. 1535 00:50:07,920 --> 00:50:09,809 Here is when OpenSSH seeded its 1536 00:50:09,810 --> 00:50:11,699 internal random state. 1537 00:50:13,200 --> 00:50:15,509 This is what we call a boot-time entropy 1538 00:50:15,510 --> 00:50:17,909 hole. And our hypothesis 1539 00:50:17,910 --> 00:50:19,979 is that many of these vulnerable devices 1540 00:50:19,980 --> 00:50:22,079 generated, on first boot, 1541 00:50:22,080 --> 00:50:23,189 their private 1542 00:50:23,190 --> 00:50:24,569 key 1543 00:50:24,570 --> 00:50:26,699 during this entropy hole, when 1544 00:50:26,700 --> 00:50:27,329 they had not 1545 00:50:27,330 --> 00:50:30,269 yet collected any form of randomness. 1546 00:50:30,270 --> 00:50:32,489 So there's a fix in the kernel now, 1547 00:50:32,490 --> 00:50:34,289 as a result of all of this; it may or may 1548 00:50:34,290 --> 00:50:36,149 not have rolled out to your distribution 1549 00:50:36,150 --> 00:50:37,229 yet. 1550 00:50:37,230 --> 00:50:39,419 Linux has now added other sources 1551 00:50:39,420 --> 00:50:41,579 of entropy and is trying, at least, to 1552 00:50:41,580 --> 00:50:43,439 make everyone's system distinct by 1553 00:50:43,440 --> 00:50:43,979 rolling in 1554 00:50:43,980 --> 00:50:46,109 the MAC address.
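The boot-time entropy hole can be illustrated with a toy model. Everything here is an assumption for illustration except the 192-bit transfer threshold quoted in the talk: SHA-256 stands in for the kernel's mixing functions, and the class and method names are hypothetical, not Linux's `random.c`. The model shows why two identical headless devices can read identical "random" bytes on first boot, even after each has seen a unique event, and why mixing in a MAC address makes their outputs distinct without making them secret.

```python
import hashlib

TRANSFER_THRESHOLD_BITS = 192  # the threshold quoted in the talk; everything else is a toy


class ToyEntropyPools:
    """Hypothetical two-pool model of the boot-time entropy hole (not Linux's random.c)."""

    def __init__(self, firmware_image: bytes, device_id: bytes = b""):
        # State known at first boot: identical on every unit of a product unless a
        # per-device value (such as a MAC address) is mixed in.
        seed = hashlib.sha256(firmware_image + b"|" + device_id).digest()
        self.input_pool = seed
        self.output_pool = hashlib.sha256(b"output|" + seed).digest()
        self.credited_bits = 0
        self.counter = 0

    def add_event(self, data: bytes, credited_bits: int) -> None:
        """Mix an event (e.g. interrupt timing) into the input pool only."""
        self.input_pool = hashlib.sha256(self.input_pool + data).digest()
        self.credited_bits += credited_bits
        if self.credited_bits >= TRANSFER_THRESHOLD_BITS:
            # Only once enough entropy is credited does input-pool state reach readers.
            self.output_pool = hashlib.sha256(self.output_pool + self.input_pool).digest()
            self.credited_bits = 0

    def read(self, n: int) -> bytes:
        """Never blocks, like /dev/urandom, even before any transfer has happened."""
        out = b""
        while len(out) < n:
            self.counter += 1
            out += hashlib.sha256(self.output_pool + self.counter.to_bytes(8, "big")).digest()
        return out[:n]


# Two identical headless routers, each seeing one unique event below the threshold.
router_a, router_b = ToyEntropyPools(b"firmware v1.0"), ToyEntropyPools(b"firmware v1.0")
router_a.add_event(b"irq-timing-a", 8)
router_b.add_event(b"irq-timing-b", 8)
print(router_a.read(16) == router_b.read(16))  # True: identical key material

# Mixing in per-device MAC addresses makes outputs differ (though a MAC is not secret).
patched_a = ToyEntropyPools(b"firmware v1.0", device_id=b"00:11:22:33:44:55")
patched_b = ToyEntropyPools(b"firmware v1.0", device_id=b"00:11:22:33:44:66")
print(patched_a.read(16) == patched_b.read(16))  # False
```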
1555 00:50:46,110 --> 00:50:46,409 If you 1556 00:50:46,410 --> 00:50:48,299 want to learn more about this part of the 1557 00:50:48,300 --> 00:50:50,459 work or to check your own keys online, 1558 00:50:50,460 --> 00:50:52,289 we have a key-check service at 1559 00:50:52,290 --> 00:50:53,429 factorable.net that will 1560 00:50:53,430 --> 00:50:55,139 tell you whether we have your private 1561 00:50:55,140 --> 00:50:56,140 key. 1562 00:51:05,410 --> 00:51:07,539 So, just one more 1563 00:51:07,540 --> 00:51:08,499 thing I want to tell you 1564 00:51:08,500 --> 00:51:10,569 about, which is more of what we learned 1565 00:51:10,570 --> 00:51:12,429 about HTTPS. We have another 1566 00:51:12,430 --> 00:51:14,499 paper about this, actually looking at the 1567 00:51:14,500 --> 00:51:15,669 ecosystem 1568 00:51:15,670 --> 00:51:16,959 surrounding HTTPS. 1569 00:51:16,960 --> 00:51:18,219 So we all know there are too many 1570 00:51:18,220 --> 00:51:18,819 certificate 1571 00:51:18,820 --> 00:51:20,499 authorities that aren't secure or 1572 00:51:20,500 --> 00:51:21,909 trustworthy. 1573 00:51:21,910 --> 00:51:24,189 But what we don't all know is who those 1574 00:51:24,190 --> 00:51:25,749 certificate authorities are. 1575 00:51:25,750 --> 00:51:27,819 There is no single list of 1576 00:51:27,820 --> 00:51:29,739 all of the entities that your browser 1577 00:51:29,740 --> 00:51:31,869 will trust to 1578 00:51:31,870 --> 00:51:33,279 sign a certificate 1579 00:51:33,280 --> 00:51:35,379 vouching for the identity of a website 1580 00:51:35,380 --> 00:51:36,609 you're connecting to.
1581 00:51:36,610 --> 00:51:38,589 The reason for this is that those root 1582 00:51:38,590 --> 00:51:40,419 certificate authorities that are listed 1583 00:51:40,420 --> 00:51:42,399 in your browser have the ability to 1584 00:51:42,400 --> 00:51:44,499 delegate their authority to other 1585 00:51:44,500 --> 00:51:46,329 parties, to what are called intermediate 1586 00:51:46,330 --> 00:51:47,769 certificate authorities. 1587 00:51:47,770 --> 00:51:49,389 And those intermediate certificate 1588 00:51:49,390 --> 00:51:51,699 authorities aren't on any list anywhere. 1589 00:51:51,700 --> 00:51:53,469 You don't know about them until you see a 1590 00:51:53,470 --> 00:51:55,329 certificate from them. 1591 00:51:55,330 --> 00:51:57,549 So what we did was we used ZMap 1592 00:51:57,550 --> 00:51:59,709 to crawl 1593 00:51:59,710 --> 00:52:01,509 the whole public address space, 1594 00:52:01,510 --> 00:52:03,669 find out what certificate chains it 1595 00:52:03,670 --> 00:52:05,859 saw, and compile a list of all 1596 00:52:05,860 --> 00:52:07,509 of those intermediate certificate 1597 00:52:07,510 --> 00:52:08,439 authorities. 1598 00:52:08,440 --> 00:52:10,149 We compiled a lot of information about 1599 00:52:10,150 --> 00:52:12,489 the servers we connected to. 1600 00:52:12,490 --> 00:52:14,139 So this is an example of how you might 1601 00:52:14,140 --> 00:52:16,449 chain ZMap together with other kinds 1602 00:52:16,450 --> 00:52:17,739 of connection and processing 1603 00:52:17,740 --> 00:52:19,689 libraries in order to do a more 1604 00:52:19,690 --> 00:52:22,359 sophisticated kind of scan like this. 1605 00:52:22,360 --> 00:52:24,519 In the end, we did about 200 1606 00:52:24,520 --> 00:52:25,299 internet-wide 1607 00:52:25,300 --> 00:52:28,539 probes to get this data.
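Compiling the list of intermediate authorities from crawled chains comes down to keeping every CA certificate that a server presents above its leaf, provided the chain leads up to a browser-trusted root, and leaving out the roots themselves. A minimal sketch under simplifying assumptions: certificates are reduced to subject and issuer names plus a CA flag (a hypothetical `Cert` record), chains are linked by name only, and no signatures are checked. A real analysis must verify signatures, because names alone can be forged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    """Hypothetical, stripped-down certificate record."""
    subject: str
    issuer: str
    is_ca: bool  # stands in for the basicConstraints CA flag


def chain_is_anchored(chain, trusted_roots):
    """Names link each cert to the next, and the top cert was issued by a trusted root."""
    if not chain:
        return False
    for child, parent in zip(chain, chain[1:]):
        if child.issuer != parent.subject:
            return False
    return chain[-1].issuer in trusted_roots


def intermediates_from_chains(chains, trusted_roots):
    """Collect CA certificates above the leaf that chain up to a trusted root."""
    found = set()
    for chain in chains:  # each chain is leaf-first, as a server presents it
        if not chain_is_anchored(chain, trusted_roots):
            continue
        for cert in chain[1:]:
            if cert.is_ca and cert.subject not in trusted_roots:
                found.add(cert.subject)
    return found


roots = {"Root CA A"}
chains = [
    [Cert("www.example.com", "Intermediate X", False), Cert("Intermediate X", "Root CA A", True)],
    [Cert("shop.example.org", "Sub-CA Y", False), Cert("Sub-CA Y", "Intermediate X", True),
     Cert("Intermediate X", "Root CA A", True)],
    [Cert("router.local", "router.local", False)],  # self-signed device cert: not anchored
]
print(sorted(intermediates_from_chains(chains, roots)))  # ['Intermediate X', 'Sub-CA Y']
```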
1608 00:52:28,540 --> 00:52:29,619 Those are 1609 00:52:29,620 --> 00:52:31,959 internet-wide scans, each of them 1610 00:52:31,960 --> 00:52:32,919 more comprehensive 1611 00:52:32,920 --> 00:52:34,209 than the EFF SSL 1612 00:52:34,210 --> 00:52:35,229 Observatory. 1613 00:52:35,230 --> 00:52:38,079 More than a trillion probe packets sent 1614 00:52:38,080 --> 00:52:38,469 over the 1615 00:52:38,470 --> 00:52:40,809 course of a bit more than a year. 1616 00:52:42,880 --> 00:52:45,219 So in addition to finding, 1617 00:52:45,220 --> 00:52:47,829 I think, thirteen hundred 1618 00:52:47,830 --> 00:52:49,899 entities that can sign a 1619 00:52:49,900 --> 00:52:50,949 certificate for any 1620 00:52:50,950 --> 00:52:53,319 website you visit, because there's no kind 1621 00:52:53,320 --> 00:52:55,389 of limitation on the scope 1622 00:52:55,390 --> 00:52:56,889 of authority and trust that's 1623 00:52:56,890 --> 00:52:58,989 put into SSL certificate 1624 00:52:58,990 --> 00:53:00,369 authorities, 1625 00:53:00,370 --> 00:53:03,249 we find interesting things like this. 1626 00:53:03,250 --> 00:53:04,250 There are 1627 00:53:05,410 --> 00:53:07,569 still 50 percent of certificates that are 1628 00:53:07,570 --> 00:53:09,699 eventually rooted in a 1024-bit 1629 00:53:09,700 --> 00:53:10,749 key. 1630 00:53:10,750 --> 00:53:12,459 This is something on the verge of being 1631 00:53:12,460 --> 00:53:13,479 factorable. 1632 00:53:13,480 --> 00:53:15,519 More than 70 percent of these roots will 1633 00:53:15,520 --> 00:53:16,719 expire 1634 00:53:17,740 --> 00:53:19,539 more than two years from now. 1635 00:53:19,540 --> 00:53:22,779 So this is bad. 1636 00:53:22,780 --> 00:53:24,009 On the other hand, there are some good 1637 00:53:24,010 --> 00:53:26,079 things. During the life of this scan, we 1638 00:53:26,080 --> 00:53:28,299 see SSL deployment, 1639 00:53:28,300 --> 00:53:31,179 I mean, SSL/TLS deployment, 1640 00:53:31,180 --> 00:53:32,709 growing dramatically.
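Once each chain is reduced to its root, the root-key statistics quoted above are simple aggregations. This is a sketch that assumes hypothetical per-chain records of (root key size in bits, root expiry year). The two-year threshold matches the figure in the talk, while the year-level granularity and per-chain counting (instead of per distinct root) are simplifications.

```python
from typing import List, Tuple


def weak_root_stats(chains: List[Tuple[int, int]], current_year: int) -> Tuple[float, float]:
    """Summarize hypothetical (root_key_bits, root_expiry_year) records, one per chain.

    Returns (share of chains rooted in a key of 1024 bits or fewer,
             share of those weak roots expiring more than two years after current_year).
    """
    weak_expiries = [year for bits, year in chains if bits <= 1024]
    share_weak = len(weak_expiries) / len(chains) if chains else 0.0
    long_lived = [year for year in weak_expiries if year - current_year > 2]
    share_long_lived = len(long_lived) / len(weak_expiries) if weak_expiries else 0.0
    return share_weak, share_long_lived
```

Both ratios guard against empty inputs, so an empty scan or a scan with no weak roots reports zero rather than raising an error.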
1641 00:53:32,710 --> 00:53:34,959 We see a 10 percent uptick in the number 1642 00:53:34,960 --> 00:53:36,939 of servers, a 23 percent 1643 00:53:36,940 --> 00:53:39,819 uptick among the Alexa top one million. 1644 00:53:39,820 --> 00:53:41,709 We need to encrypt the web. 1645 00:53:41,710 --> 00:53:42,999 Progress is happening. 1646 00:53:43,000 --> 00:53:44,890 I hope it will happen more quickly. 1647 00:53:46,120 --> 00:53:47,799 All right. So just a couple of concluding 1648 00:53:47,800 --> 00:53:49,119 remarks. 1649 00:53:49,120 --> 00:53:50,120 One. 1650 00:53:51,010 --> 00:53:52,779 One of my goals from all of this is to 1651 00:53:52,780 --> 00:53:55,149 encourage everyone here to be doing 1652 00:53:55,150 --> 00:53:55,779 more 1653 00:53:55,780 --> 00:53:57,219 work 1654 00:53:57,220 --> 00:53:59,169 involving internet-wide scanning, 1655 00:53:59,170 --> 00:54:01,269 involving measurement, involving trying 1656 00:54:01,270 --> 00:54:03,579 to understand the internet 1657 00:54:03,580 --> 00:54:05,589 from this perspective of global 1658 00:54:05,590 --> 00:54:06,549 measurement. 1659 00:54:06,550 --> 00:54:08,689 And we've shown with work like the weak 1660 00:54:08,690 --> 00:54:11,139 key work that this can give you really 1661 00:54:11,140 --> 00:54:12,279 new and different kinds of 1662 00:54:12,280 --> 00:54:13,959 security insight. 1663 00:54:13,960 --> 00:54:15,669 In order to try to make this possible, 1664 00:54:15,670 --> 00:54:17,589 since not everyone has bandwidth to do 1665 00:54:17,590 --> 00:54:19,089 these scans themselves, 1666 00:54:19,090 --> 00:54:21,249 we're taking the data that we collect in 1667 00:54:21,250 --> 00:54:23,409 our scans and throwing it up on the web, 1668 00:54:23,410 --> 00:54:25,479 and we'd be happy to host data from other 1669 00:54:25,480 --> 00:54:27,069 people too. We're already getting data from 1670 00:54:27,070 --> 00:54:28,989 Rapid7 now.
1671 00:54:28,990 --> 00:54:31,119 So if you go to scans.io, you 1672 00:54:31,120 --> 00:54:33,399 can download the data sets, 1673 00:54:33,400 --> 00:54:34,329 do your own data 1674 00:54:34,330 --> 00:54:36,349 mining on them yourselves, 1675 00:54:36,350 --> 00:54:38,199 and 1676 00:54:38,200 --> 00:54:40,779 ask probing questions. 1677 00:54:40,780 --> 00:54:43,209 ZMap is also available 1678 00:54:43,210 --> 00:54:44,469 online. You can get it. 1679 00:54:44,470 --> 00:54:46,539 Our demo is not 1680 00:54:46,540 --> 00:54:48,280 actually running, but I'll run it soon. 1681 00:54:49,840 --> 00:54:52,449 Finally, if you do decide to 1682 00:54:52,450 --> 00:54:54,519 do scanning yourself, just 1683 00:54:54,520 --> 00:54:56,589 a few reminders about 1684 00:54:56,590 --> 00:54:59,529 ethics. It is possible 1685 00:54:59,530 --> 00:55:01,359 to cause damage by overwhelming 1686 00:55:01,360 --> 00:55:02,439 other people's networks, 1687 00:55:02,440 --> 00:55:04,539 and we don't want to be DoSing people. 1688 00:55:04,540 --> 00:55:06,789 We want to be measuring what's publicly 1689 00:55:06,790 --> 00:55:07,790 visible, 1690 00:55:08,620 --> 00:55:10,239 but there's no way to request 1691 00:55:10,240 --> 00:55:12,489 permission from everyone involved, because 1692 00:55:12,490 --> 00:55:14,769 there's no equivalent to robots.txt 1693 00:55:14,770 --> 00:55:17,199 for IP. 1694 00:55:17,200 --> 00:55:19,269 So for now, in order 1695 00:55:19,270 --> 00:55:21,399 to avoid alarming 1696 00:55:21,400 --> 00:55:24,010 administrators and in order to avoid 1697 00:55:25,240 --> 00:55:28,779 knocking networks offline inadvertently, 1698 00:55:28,780 --> 00:55:29,889 there are a bunch of steps you 1699 00:55:29,890 --> 00:55:31,689 can take to try to reduce impact. 1700 00:55:31,690 --> 00:55:31,989 One.
1701 00:55:31,990 --> 00:55:33,219 Just using the random 1702 00:55:33,220 --> 00:55:35,169 ordering that things like ZMap provide 1703 00:55:35,170 --> 00:55:37,659 should help with avoiding 1704 00:55:37,660 --> 00:55:39,429 DoSing people. 1705 00:55:39,430 --> 00:55:41,529 It's also a good idea to try to signal 1706 00:55:41,530 --> 00:55:43,299 that you're not acting maliciously, 1707 00:55:43,300 --> 00:55:44,859 that you don't have malicious intent. 1708 00:55:44,860 --> 00:55:47,139 We do things like put up a web page 1709 00:55:47,140 --> 00:55:49,239 on port 80 of the scan source 1710 00:55:49,240 --> 00:55:52,449 IP addresses to let people know, 1711 00:55:52,450 --> 00:55:54,069 and, of course, honor requests to be 1712 00:55:54,070 --> 00:55:55,070 removed. 1713 00:55:55,690 --> 00:55:57,159 So the bottom line is, be a good 1714 00:55:57,160 --> 00:55:57,549 neighbor, 1715 00:55:57,550 --> 00:55:59,499 don't be a jerk, and everyone can get 1716 00:55:59,500 --> 00:56:00,500 along. 1717 00:56:01,090 --> 00:56:03,339 So over 200 internet-wide 1718 00:56:03,340 --> 00:56:04,249 scans, we only 1719 00:56:04,250 --> 00:56:05,619 got about 145 1720 00:56:05,620 --> 00:56:07,239 requests from people who wanted to be 1721 00:56:07,240 --> 00:56:08,739 blocked from further scanning. 1722 00:56:08,740 --> 00:56:10,599 Certainly some people blocked us without 1723 00:56:10,600 --> 00:56:11,649 requesting, 1724 00:56:11,650 --> 00:56:13,840 but very few. 1725 00:56:14,890 --> 00:56:15,999 Just one more story.
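According to the ZMap paper, the random ordering mentioned above comes from iterating over a cyclic multiplicative group of integers modulo a prime (2^32 + 15 for the full IPv4 space). Multiplying repeatedly by a primitive root visits every nonzero residue exactly once, so consecutive probes land far apart rather than sweeping one network at a time. The small-scale sketch below assumes a toy /28 network and the prime 17. It also skips a blocklist, which is one way to honor removal requests like the ones described above. The function names are illustrative and are not taken from ZMap, which is written in C.

```python
import ipaddress
from typing import Iterator, List


def is_primitive_root(g: int, p: int) -> bool:
    """True if g generates the multiplicative group of integers mod the prime p."""
    # g is a generator iff g**((p-1)/q) != 1 (mod p) for every prime q dividing p-1.
    n, q, prime_factors = p - 1, 2, set()
    while q * q <= n:
        while n % q == 0:
            prime_factors.add(q)
            n //= q
        q += 1
    if n > 1:
        prime_factors.add(n)
    return all(pow(g, (p - 1) // f, p) != 1 for f in prime_factors)


def permuted_indices(space_size: int, p: int, g: int, start: int) -> Iterator[int]:
    """Yield each index in [0, space_size) exactly once, in a scattered order.

    Requires p prime with p > space_size, g a primitive root mod p, 1 <= start < p.
    """
    x = start
    for _ in range(p - 1):  # the cycle x, x*g, x*g**2, ... covers all of 1..p-1
        if x <= space_size:  # group elements beyond the address space are skipped
            yield x - 1
        x = (x * g) % p


def scan_order(network: str, blocklist: List[str], p: int, g: int, start: int) -> List[str]:
    """Targets in `network`, randomly ordered, minus anything inside a blocklisted prefix."""
    net = ipaddress.ip_network(network)
    blocked = [ipaddress.ip_network(b) for b in blocklist]
    order = []
    for i in permuted_indices(net.num_addresses, p, g, start):
        addr = net[i]
        if not any(addr in b for b in blocked):
            order.append(str(addr))
    return order


# Toy example: 16 addresses, prime 17, primitive root 3, one opted-out /30.
print(scan_order("192.0.2.0/28", ["192.0.2.8/30"], p=17, g=3, start=5))
```

The scanner only has to store the current element, the generator, and the starting point, so it needs no per-address state to remember what it has already probed.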
1726 00:56:16,000 --> 00:56:18,189 One thing went wrong, which 1727 00:56:18,190 --> 00:56:21,249 is that at one point during our scanning, 1728 00:56:21,250 --> 00:56:23,079 this article appeared in a certain 1729 00:56:23,080 --> 00:56:24,939 publication, linked to from the Drudge 1730 00:56:24,940 --> 00:56:27,189 Report, about 1731 00:56:27,190 --> 00:56:29,529 how Iranian attackers had taken over 1732 00:56:29,530 --> 00:56:31,659 a large part or all of the University of 1733 00:56:31,660 --> 00:56:33,969 Michigan College of Engineering 1734 00:56:33,970 --> 00:56:36,009 network in order to attack U.S. 1735 00:56:36,010 --> 00:56:37,010 banks. 1736 00:56:39,350 --> 00:56:40,819 There are several things wrong with this 1737 00:56:40,820 --> 00:56:41,820 story. 1738 00:56:43,490 --> 00:56:45,229 It wasn't Iranian hackers. 1739 00:56:45,230 --> 00:56:46,909 It was Zakir, my student. 1740 00:56:57,280 --> 00:56:59,409 So we got them to run a little bit of a 1741 00:56:59,410 --> 00:57:01,539 statement from us saying, no, this isn't 1742 00:57:01,540 --> 00:57:03,569 accurate, but they say their expert 1743 00:57:03,570 --> 00:57:05,709 stands behind their assertion that this 1744 00:57:05,710 --> 00:57:06,999 was Iranian attackers. 1745 00:57:09,280 --> 00:57:10,899 All right, so there's much more to be 1746 00:57:10,900 --> 00:57:13,569 done, but I hope everyone will use this 1747 00:57:13,570 --> 00:57:16,179 technique to do great research. 1748 00:57:16,180 --> 00:57:18,399 We're really living in a golden age now. 1749 00:57:18,400 --> 00:57:20,409 Networks and computers are fast enough to 1750 00:57:20,410 --> 00:57:22,899 exhaustively scan IPv4, 1751 00:57:22,900 --> 00:57:25,089 and IPv6, which has a much larger 1752 00:57:25,090 --> 00:57:26,589 address space that's going to be hard to 1753 00:57:26,590 --> 00:57:29,019 scan, isn't here yet.
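To quantify the IPv4 versus IPv6 contrast above, here is a back-of-the-envelope calculation of how long it takes to send one probe to every address at a fixed rate. The rate is an assumption: about 1.4 million probes per second, close to what a saturated gigabit link can carry with minimum-size Ethernet frames.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def exhaustive_scan_seconds(address_bits: int, probes_per_second: float) -> float:
    """Seconds needed to send one probe to every address in a 2**address_bits space."""
    return (2 ** address_bits) / probes_per_second


# Assumed send rate: roughly a saturated gigabit link with minimum-size frames.
RATE = 1.4e6

ipv4_minutes = exhaustive_scan_seconds(32, RATE) / 60
ipv6_years = exhaustive_scan_seconds(128, RATE) / SECONDS_PER_YEAR

print(f"IPv4 (2**32 addresses): about {ipv4_minutes:.0f} minutes")
print(f"IPv6 (2**128 addresses): about {ipv6_years:.1e} years")
```

At the same rate, even a single IPv6 /64 subnet (2**64 addresses) would take hundreds of thousands of years to enumerate. That is why exhaustive IPv6 scanning is not realistic.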
1754 00:57:29,020 --> 00:57:31,539 So this is an opportunity 1755 00:57:31,540 --> 00:57:33,699 for all of us who are interested 1756 00:57:33,700 --> 00:57:35,529 in what's on the internet 1757 00:57:35,530 --> 00:57:36,519 to go ahead 1758 00:57:36,520 --> 00:57:37,389 and find out. 1759 00:57:37,390 --> 00:57:39,309 So again, here are some new URLs for 1760 00:57:39,310 --> 00:57:40,659 things you can try out. 1761 00:57:40,660 --> 00:57:41,679 Thank you very much. 1762 00:57:59,740 --> 00:58:00,969 Thank you, Alex, certainly a very 1763 00:58:00,970 --> 00:58:01,869 interesting talk. 1764 00:58:01,870 --> 00:58:03,639 Now we've got time for about one or two 1765 00:58:03,640 --> 00:58:06,039 questions maximum because we're close to 1766 00:58:06,040 --> 00:58:07,359 wrapping the session up, so if there are 1767 00:58:07,360 --> 00:58:09,549 any questions, please just quickly line 1768 00:58:09,550 --> 00:58:11,319 up behind one of those eight 1769 00:58:11,320 --> 00:58:13,029 microphones. 1770 00:58:13,030 --> 00:58:14,979 OK, I've got three questions and then I 1771 00:58:14,980 --> 00:58:16,959 think we'll have to finish up. Go ahead 1772 00:58:16,960 --> 00:58:18,189 at number four, please. 1773 00:58:18,190 --> 00:58:20,139 Yeah, thanks for the talk. 1774 00:58:20,140 --> 00:58:22,299 What kind of hardware do you need 1775 00:58:22,300 --> 00:58:23,739 for the scans? 1776 00:58:23,740 --> 00:58:25,929 You mentioned a one-gig uplink 1777 00:58:25,930 --> 00:58:28,299 pipe, but the server doing 1778 00:58:28,300 --> 00:58:30,219 the scanning, what do you need? 1779 00:58:30,220 --> 00:58:32,499 So for most of these experiments 1780 00:58:32,500 --> 00:58:34,629 we used either, I think, about 1781 00:58:34,630 --> 00:58:37,119 a five-year-old Dell workstation 1782 00:58:37,120 --> 00:58:39,309 or a really cheap 1783 00:58:39,310 --> 00:58:40,809 $800 1784 00:58:40,810 --> 00:58:43,479 HP one.
So low-end, 1785 00:58:43,480 --> 00:58:44,889 low-end but solid 1786 00:58:44,890 --> 00:58:47,229 grade hardware, I think, is probably 1787 00:58:47,230 --> 00:58:48,279 the 1788 00:58:48,280 --> 00:58:49,719 ideal. 1789 00:58:49,720 --> 00:58:51,819 You can do pretty well just from 1790 00:58:51,820 --> 00:58:53,419 cheap commodity stuff. 1791 00:58:53,420 --> 00:58:54,759 OK. Thanks. 1792 00:58:54,760 --> 00:58:56,320 All right. And no place. 1793 00:58:57,670 --> 00:58:59,889 Yes. Since you mentioned 1794 00:58:59,890 --> 00:59:02,049 IPv6 at the end, I was about 1795 00:59:02,050 --> 00:59:03,100 to ask about that. 1796 00:59:04,480 --> 00:59:05,799 The other point. 1797 00:59:05,800 --> 00:59:08,079 Are you aware of the 1798 00:59:08,080 --> 00:59:10,659 RIPE Atlas project concerning 1799 00:59:10,660 --> 00:59:12,699 scanning the whole internet in a 1800 00:59:12,700 --> 00:59:13,700 distributed way? 1801 00:59:14,650 --> 00:59:16,689 So I have heard of it, but I'm not 1802 00:59:16,690 --> 00:59:19,119 well-versed in that project, 1803 00:59:19,120 --> 00:59:20,949 and I'd love to know more if anyone 1804 00:59:20,950 --> 00:59:23,019 involved in it is here. 1805 00:59:23,020 --> 00:59:25,419 As far as IPv6, IPv6 1806 00:59:25,420 --> 00:59:27,219 raises a bunch of interesting new 1807 00:59:27,220 --> 00:59:28,839 challenges. Of course, the address space 1808 00:59:28,840 --> 00:59:30,459 is much too big to exhaustively 1809 00:59:30,460 --> 00:59:31,719 enumerate. 1810 00:59:31,720 --> 00:59:33,670 My hypothesis is that 1811 00:59:34,690 --> 00:59:35,139 the main 1812 00:59:35,140 --> 00:59:37,239 way that we'll scan it 1813 00:59:37,240 --> 00:59:39,429 is based on 1814 00:59:39,430 --> 00:59:41,499 lists of addresses that are in use, 1815 00:59:41,500 --> 00:59:42,249 collected 1816 00:59:42,250 --> 00:59:45,159 through passive measurement, probably 1817 00:59:45,160 --> 00:59:46,329 by people in industry.
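The hypothesis above is that IPv6 scanning will work from lists of in-use addresses gathered through passive measurement. That approach needs a preprocessing step that converts raw observations into a target list. The minimal sketch below assumes a hypothetical input of address strings, such as entries pulled from logs or flow records. It drops malformed, loopback, link-local, and multicast entries, removes duplicates, and limits how many targets a single /64 can contribute. That limit is an illustrative policy choice and is not described in the talk.

```python
import ipaddress
from collections import Counter
from typing import Iterable, List


def build_hitlist(observed: Iterable[str], max_per_64: int = 4) -> List[str]:
    """Deduplicated IPv6 scan targets from passively observed address strings.

    build_hitlist and its per-/64 cap are illustrative assumptions.
    """
    per_prefix: Counter = Counter()
    seen = set()
    hitlist = []
    for raw in observed:
        try:
            addr = ipaddress.IPv6Address(raw.strip())
        except ipaddress.AddressValueError:
            continue  # malformed or non-IPv6 entry
        if addr in seen or addr.is_loopback or addr.is_link_local or addr.is_multicast:
            continue  # duplicate, or not usable as a remote scan target
        prefix = ipaddress.IPv6Network(f"{addr}/64", strict=False)
        if per_prefix[prefix] >= max_per_64:
            continue  # keep one busy subnet from dominating the list
        per_prefix[prefix] += 1
        seen.add(addr)
        hitlist.append(str(addr))
    return hitlist
```

The output keeps the order in which addresses were first observed. A scanner could then shuffle it before probing, in the same spirit as the randomized ordering used for IPv4.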
1818 00:59:47,770 --> 00:59:49,029 And one question from the internet, 1819 00:59:49,030 --> 00:59:50,030 please. 1820 00:59:50,620 --> 00:59:52,989 OK, so how do you think 1821 00:59:52,990 --> 00:59:54,909 that sysadmins should be dealing with 1822 00:59:54,910 --> 00:59:56,029 port scans in the future? 1823 00:59:56,030 --> 00:59:58,089 Do you think it's really worthwhile to 1824 00:59:58,090 --> 01:00:00,129 send abuse emails and phone calls for 1825 01:00:00,130 --> 01:00:01,130 each and every port scan? 1826 01:00:02,140 --> 01:00:04,029 I think sysadmins have to... 1827 01:00:04,030 --> 01:00:05,739 Well, it's difficult for 1828 01:00:05,740 --> 01:00:08,439 sysadmins. I think they should recognize 1829 01:00:08,440 --> 01:00:10,089 that not all probe traffic they're 1830 01:00:10,090 --> 01:00:12,159 receiving is malicious, and 1831 01:00:12,160 --> 01:00:14,859 that if it's malicious, 1832 01:00:14,860 --> 01:00:16,629 people aren't going to stop it just 1833 01:00:16,630 --> 01:00:19,569 because you ask them to. 1834 01:00:19,570 --> 01:00:21,729 If it's malicious, the 1835 01:00:21,730 --> 01:00:23,559 bad actors are probably just going to 1836 01:00:23,560 --> 01:00:25,059 be probing you from somewhere else. 1837 01:00:25,060 --> 01:00:26,349 They're probably probing you from a 1838 01:00:26,350 --> 01:00:27,279 botnet.
1839 01:00:27,280 --> 01:00:28,929 So I think the gains from 1840 01:00:31,000 --> 01:00:33,009 spending time 1841 01:00:33,010 --> 01:00:36,309 trying to stop probe traffic 1842 01:00:36,310 --> 01:00:38,499 are relatively small compared 1843 01:00:38,500 --> 01:00:40,779 to following my advice to sysadmins, 1844 01:00:40,780 --> 01:00:42,999 which is to make 1845 01:00:43,000 --> 01:00:44,439 sure you're scanning your network 1846 01:00:44,440 --> 01:00:47,469 yourself and locking it down 1847 01:00:47,470 --> 01:00:47,739 so 1848 01:00:47,740 --> 01:00:49,959 that there isn't anything that 1849 01:00:49,960 --> 01:00:52,629 a bad actor can gain by 1850 01:00:52,630 --> 01:00:54,879 getting visibility into what's publicly 1851 01:00:54,880 --> 01:00:56,469 addressable in your network. 1852 01:00:57,550 --> 01:00:59,139 OK. And the very last question, because 1853 01:00:59,140 --> 01:01:00,609 we're running out of time: microphone 1854 01:01:00,610 --> 01:01:01,929 number one, please. 1855 01:01:01,930 --> 01:01:03,009 All right, thanks. Yeah. 1856 01:01:03,010 --> 01:01:05,229 Just going back to 1857 01:01:05,230 --> 01:01:07,389 when you used the time of day 1858 01:01:07,390 --> 01:01:10,599 to explain the difference in responses, 1859 01:01:10,600 --> 01:01:11,529 were you using... 1860 01:01:11,530 --> 01:01:13,479 Does that mean that the scan sample used 1861 01:01:13,480 --> 01:01:15,369 in that test was based in the same time 1862 01:01:15,370 --> 01:01:17,689 zone? Or how do you explain that? 1863 01:01:17,690 --> 01:01:19,809 I didn't quite get that, because 1864 01:01:19,810 --> 01:01:20,739 it shouldn't... 1865 01:01:20,740 --> 01:01:22,809 It shouldn't be explainable that way.
1866 01:01:22,810 --> 01:01:25,149 So, no, we picked a random sample of IP 1867 01:01:25,150 --> 01:01:26,739 addresses, about, I think, a million 1868 01:01:26,740 --> 01:01:27,669 addresses in those 1869 01:01:27,670 --> 01:01:29,229 tests, and 1870 01:01:29,230 --> 01:01:30,940 just probed them every few minutes 1871 01:01:32,650 --> 01:01:34,059 throughout a day. 1872 01:01:34,060 --> 01:01:35,979 And what we're seeing, yeah, is that 1873 01:01:35,980 --> 01:01:37,449 there were actually fewer responsive 1874 01:01:37,450 --> 01:01:39,159 hosts at certain times of day. 1875 01:01:39,160 --> 01:01:41,439 So there are reasons you might expect 1876 01:01:41,440 --> 01:01:41,769 that to 1877 01:01:41,770 --> 01:01:42,849 happen: some 1878 01:01:42,850 --> 01:01:44,679 of these devices are things that people 1879 01:01:44,680 --> 01:01:46,779 are powering off at home while 1880 01:01:46,780 --> 01:01:48,129 they're asleep, they're things in 1881 01:01:48,130 --> 01:01:50,559 offices 1882 01:01:50,560 --> 01:01:52,119 that are going into hibernation mode, 1883 01:01:52,120 --> 01:01:53,139 they're phones. 1884 01:01:53,140 --> 01:01:55,359 So I think it's a reflection of just the 1885 01:01:55,360 --> 01:01:58,029 aggregate diurnal pattern 1886 01:01:58,030 --> 01:01:59,409 of everyone 1887 01:01:59,410 --> 01:02:01,959 who's plugged into the address space. 1888 01:02:01,960 --> 01:02:02,769 I'd be happy to 1889 01:02:02,770 --> 01:02:04,869 talk about it more, but that's the 1890 01:02:04,870 --> 01:02:05,949 working hypothesis, 1891 01:02:05,950 --> 01:02:07,899 because we can't actually find any 1892 01:02:07,900 --> 01:02:08,950 difference in... 1893 01:02:10,480 --> 01:02:13,209 It's not at all correlated 1894 01:02:13,210 --> 01:02:13,779 with any 1895 01:02:13,780 --> 01:02:15,879 local variation, with 1896 01:02:15,880 --> 01:02:17,919 the cycle of local variation in network 1897 01:02:17,920 --> 01:02:19,899 traffic.
In fact, we're seeing more 1898 01:02:19,900 --> 01:02:21,459 responsive stuff at times when 1899 01:02:21,460 --> 01:02:22,839 we'd expect our network to be more 1900 01:02:22,840 --> 01:02:24,419 heavily loaded. 1901 01:02:24,420 --> 01:02:26,589 OK, before we wrap up, 1902 01:02:26,590 --> 01:02:28,149 I would like to remind everyone to please 1903 01:02:28,150 --> 01:02:29,739 provide us with some feedback, because 1904 01:02:29,740 --> 01:02:31,689 that helps the speakers and their 1905 01:02:31,690 --> 01:02:33,339 presentations, and it also helps to have 1906 01:02:33,340 --> 01:02:35,499 a contrast. If you want to do that, you 1907 01:02:35,500 --> 01:02:36,949 hop online at the top line. 1908 01:02:36,950 --> 01:02:38,439 There are some feedback forms; it would be 1909 01:02:38,440 --> 01:02:40,419 really great if you could fill them out. 1910 01:02:40,420 --> 01:02:42,639 Last thing, don't forget to collect 1911 01:02:42,640 --> 01:02:44,139 your rubbish. It would be really great if you 1912 01:02:44,140 --> 01:02:46,089 take it out, and we do recycle the bottles, 1913 01:02:46,090 --> 01:02:48,069 so there should be crates outside to put 1914 01:02:48,070 --> 01:02:49,809 them in. And lastly, 1915 01:02:49,810 --> 01:02:51,249 one more round of applause for Alex. 1916 01:02:51,250 --> 01:02:52,250 Thank you very much.
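The time-of-day measurement in the last answer probed a fixed random sample every few minutes over a day, and analyzing it comes down to grouping probe results by hour. This sketch assumes hypothetical (Unix timestamp, responded) records and groups them by UTC hour. Because the sample is drawn from the entire address space, each hourly bucket combines many local time zones. That fits the interpretation of the curve as an aggregate diurnal pattern.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def hourly_response_rate(probes: Iterable[Tuple[float, bool]]) -> Dict[int, float]:
    """Fraction of probes answered in each UTC hour of day.

    probes: hypothetical (unix_timestamp, responded) records from repeatedly
    probing the same fixed sample of addresses.
    """
    sent: Dict[int, int] = defaultdict(int)
    answered: Dict[int, int] = defaultdict(int)
    for ts, responded in probes:
        hour = int(ts // 3600) % 24  # hour of day in UTC
        sent[hour] += 1
        answered[hour] += int(responded)
    return {hour: answered[hour] / sent[hour] for hour in sorted(sent)}
```

Hours in which no probes were sent are left out of the result, so they are not reported as a false response rate of zero.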