1 00:00:00,000 --> 00:00:18,230 *35C3 preroll music* 2 00:00:18,230 --> 00:00:23,140 Herald-Angel: The next talk is by Hannes Mehnert, you can see him here already. 3 00:00:23,140 --> 00:00:30,689 It's called Transmission Control Protocol, also known as TCP and Hannes Mehnert works 4 00:00:30,689 --> 00:00:35,910 at a non-profit organization in Berlin. It's called Center for the cultivation of 5 00:00:35,910 --> 00:00:45,530 technology. And he also works on an open ... no, Mirage OS. If you don't know it, 6 00:00:45,530 --> 00:00:51,870 maybe you can find out what it is. And he researches in several engineering areas 7 00:00:51,870 --> 00:00:58,180 such as: programming languages, network protocols, security protocols and many 8 00:00:58,180 --> 00:01:02,370 many more. So give him a warm applause for his talk. 9 00:01:02,370 --> 00:01:10,770 *Applause* 10 00:01:10,770 --> 00:01:12,390 Hannes Mehnert: Thank you. Yes, today I 11 00:01:12,390 --> 00:01:17,070 want to talk a bit about the Transmission Control Protocol and the Internet Protocol 12 00:01:17,070 --> 00:01:23,940 suite. So, what is it all about? It's a foundation talk here, so if you already 13 00:01:23,940 --> 00:01:30,020 know TCP/IP by heart then maybe only the last five minutes will be of interest for 14 00:01:30,020 --> 00:01:36,250 you, otherwise. So, if you want to connect your laptop or if you want to browse to a 15 00:01:36,250 --> 00:01:42,540 Web site somewhere, you want to read that Web site, it is that, the client on your 16 00:01:42,540 --> 00:01:49,680 laptop or the browser that sends an HTTP request to the web server host. So it sends 17 00:01:49,680 --> 00:01:59,190 an HTTP request which is specified by the HTTP protocol. It's maybe GET /. It's a 18 00:01:59,190 --> 00:02:05,830 common method of getting the main page of a website, but how is this information 19 00:02:05,830 --> 00:02:10,789 actually transmitted to the server? 
That is the question and the motivation for 20 00:02:10,789 --> 00:02:18,690 this talk. So I want to go deep into the answer to that question. 21 00:02:18,690 --> 00:02:24,040 So let's look a bit at the network topology. So, on the left hand side we 22 00:02:24,040 --> 00:02:29,700 have the laptop which sends to some server a GET request. You can see that by the 23 00:02:29,700 --> 00:02:35,520 dashed arrow and the laptop itself is connected, likely via a wireless network 24 00:02:35,520 --> 00:02:41,670 to the Internet. But what is actually the Internet? Well, the Internet is a 25 00:02:41,670 --> 00:02:46,950 collection of computers and your laptop or anyone's mobile phone is likely connected 26 00:02:46,950 --> 00:02:53,959 to a router. A router is just a normal computer which has some knowledge about 27 00:02:53,959 --> 00:03:00,220 the network and that router is likely connected via fiber or a satellite or any 28 00:03:00,220 --> 00:03:13,400 other link, which can also be an Ethernet cable, to another router or to several routers. 29 00:03:13,400 --> 00:03:19,000 And in this picture you can only see two routers, the router A and router B, but 30 00:03:19,000 --> 00:03:23,819 there may be any number of routers or nearly any number of routers in between 31 00:03:23,819 --> 00:03:29,340 you and the server. So here the router B is connected via Ethernet, which is just a 32 00:03:29,340 --> 00:03:36,650 physical cable, to the server, and Ethernet is a protocol which is spoken over the 33 00:03:36,650 --> 00:03:43,740 cable. So I won't go into the physical network connectivity like fibers and 34 00:03:43,740 --> 00:03:53,379 satellite and cables and copper cables in this talk at all, but I will start with the 35 00:03:53,379 --> 00:04:01,550 layer which is on top of the physical medium. So the first one is the data link 36 00:04:01,550 --> 00:04:08,510 layer. And, well, what is the data link layer? What is its task? 
It has the scope 37 00:04:08,510 --> 00:04:15,240 of a network and it only spans over the local network to which a host is 38 00:04:15,240 --> 00:04:20,440 connected. So in this picture, the laptop and the router A share the same 39 00:04:20,440 --> 00:04:27,320 data link layer, and the router B and the router A also share the same data 40 00:04:27,320 --> 00:04:32,020 link layer. It's also the case that router B and the server share the same data link 41 00:04:32,020 --> 00:04:37,610 layer. What is the task of the data link layer? Well it's pretty easy, it just 42 00:04:37,610 --> 00:04:43,000 moves Internet layer packets between two different hosts that sit on the same link. 43 00:04:43,000 --> 00:04:50,280 So, the data link layer's only real purpose is to provide an abstraction over 44 00:04:50,280 --> 00:04:56,770 the physical thing and how many bytes you can transport on the physical media over 45 00:04:56,770 --> 00:05:04,759 the link. So the next layer is already the Internet layer. The Internet layer's 46 00:05:04,759 --> 00:05:09,960 task is to transport packets across multiple networks. So as you have seen in 47 00:05:09,960 --> 00:05:16,419 the diagram there are router A and router B, they are both connected to several data 48 00:05:16,419 --> 00:05:21,360 link layers and they use the Internet layer in order to transport packets 49 00:05:21,360 --> 00:05:30,440 across. The Internet layer already solves the issue of addressing by providing 50 00:05:30,440 --> 00:05:37,050 every host with an IP address. IP address is actually the Internet Protocol address and 51 00:05:37,050 --> 00:05:42,220 the Internet layer solves another task which is routing, so 52 00:05:42,220 --> 00:05:48,120 it forwards packets to the next router which is hopefully closer to the final 53 00:05:48,120 --> 00:05:53,940 destination. That is the task. The Internet layer also has support for 54 00:05:53,940 --> 00:06:00,729 fragmentation. 
So if your higher layer sends something which is way too big for 55 00:06:00,729 --> 00:06:06,199 the data link layer, then the Internet layer can fragment that and the other side 56 00:06:06,199 --> 00:06:14,100 has to reassemble it. What is on top of the Internet layer is the transport layer. 57 00:06:14,100 --> 00:06:19,979 So the transport layer establishes host to host connectivity. It does multiplexing 58 00:06:19,979 --> 00:06:26,890 usually using source and destination ports and there are two widely used transport 59 00:06:26,890 --> 00:06:33,320 layer protocols, which I will go into in more detail in this talk: the User 60 00:06:33,320 --> 00:06:38,350 Datagram Protocol and the Transmission Control Protocol, that's UDP and TCP, and 61 00:06:38,350 --> 00:06:48,960 they have different properties. So UDP is unreliable and it is not ordered and it is 62 00:06:48,960 --> 00:06:55,130 only an abstraction over datagrams and it has, on the advantage side, a very low 63 00:06:55,130 --> 00:07:05,539 overhead. Whereas TCP is a reliable and ordered byte stream, so you have a 64 00:07:05,539 --> 00:07:10,509 reliable byte stream which you can work on. The downside of TCP is that its 65 00:07:10,509 --> 00:07:16,690 connection establishment and teardown is slightly more complex. In UDP, you just 66 00:07:16,690 --> 00:07:22,160 don't have to establish a connection and tear it down. But in TCP you have 67 00:07:22,160 --> 00:07:30,700 to synchronize the two hosts. Then on top of the transport layer we have the 68 00:07:30,700 --> 00:07:35,770 application layer and the application layer just exchanges application data over 69 00:07:35,770 --> 00:07:43,569 the transport layer. So some examples for application layers are HTTP or TLS or DNS. 70 00:07:43,569 --> 00:07:51,930 So in the first example we saw there was HTTP and HTTP was used to send the GET 71 00:07:51,930 --> 00:07:57,050 request. 
So that is all application layer, which I won't focus on in this talk at 72 00:07:57,050 --> 00:08:03,580 all. For the lower layers, the application layer is just payload, so it's just some 73 00:08:03,580 --> 00:08:10,600 arbitrary data. So if we look again at that picture and we draw the different 74 00:08:10,600 --> 00:08:16,729 layers which are supported or which are used by the different devices, we end up 75 00:08:16,729 --> 00:08:22,800 with a diagram similar to that. So here on the left we have the laptop again which 76 00:08:22,800 --> 00:08:30,490 has all four layers and then we have the routers in the middle which are only using 77 00:08:30,490 --> 00:08:34,969 the data link and the Internet layer and then on the right hand side we have the 78 00:08:34,969 --> 00:08:41,779 server which also has all four layers. So the transport layer is really host to 79 00:08:41,779 --> 00:08:49,189 host, so the TCP we saw earlier, the TCP is establishing a connection from the 80 00:08:49,189 --> 00:08:54,910 laptop to the server. And on top of TCP, so on top of the transport layer, there is the 81 00:08:54,910 --> 00:09:00,880 process to process communication, so the application layer, which is the web browser 82 00:09:00,880 --> 00:09:06,639 talking to the web server. So only on the highest layer here, we have the GET 83 00:09:06,639 --> 00:09:13,850 request. And the routers in the middle, they don't have to inspect or they don't 84 00:09:13,850 --> 00:09:20,779 have to use information of the transport or application layer from the laptop or 85 00:09:20,779 --> 00:09:29,959 the server. So the routers, just using the Internet layer, forward packets to 86 00:09:29,959 --> 00:09:35,239 the next router or to the final destination. 
So the laptop first sends the 87 00:09:35,239 --> 00:09:43,370 whole TCP segment or a TCP packet to router A and router A decides: oh yeah, 88 00:09:43,370 --> 00:09:48,939 I'll forward it to router B because router B is closer to the final 89 00:09:48,939 --> 00:09:54,230 destination than myself, and router B says: oh yeah, well, I actually know and I'm 90 00:09:54,230 --> 00:09:58,519 connected via Ethernet to the final destination so I will just forward it to 91 00:09:58,519 --> 00:10:07,819 the server. That's what the data flow of such a connection would look like. What 92 00:10:07,819 --> 00:10:13,320 does a packet actually look like? So we have seen that at the application layer we 93 00:10:13,320 --> 00:10:19,100 have the application data, which is here in blue and that one is just the GET 94 00:10:19,100 --> 00:10:23,829 request. Then the transport layer actually prefixes the application data with a 95 00:10:23,829 --> 00:10:30,139 header which is a common header that encodes some data, we will look into the 96 00:10:30,139 --> 00:10:38,631 TCP header in more detail soon, then the Internet layer also adds a header, a 97 00:10:38,631 --> 00:10:45,999 prefix: the IP header, which is just put in front of the TCP header. And then the 98 00:10:45,999 --> 00:10:51,749 data link layer, well that is the lowest layer we actually care about and that one 99 00:10:51,749 --> 00:10:58,179 will likely prepend a header and append a footer in order to synchronize or to make 100 00:10:58,179 --> 00:11:06,310 sure that the physical wire only sees a single packet at a time. 
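Editorial sketch, not from the talk: the nesting described above can be pictured as each layer wrapping the bytes handed down by the layer above. The function name `encapsulate` and the bracketed header strings are placeholders invented for illustration; real headers are binary structures, not text labels.

```python
# Illustrative only: the "headers" are placeholder byte strings, not real
# protocol headers. The point is the order in which the layers wrap the data.

def encapsulate(app_data: bytes) -> bytes:
    """Wrap application data the way the four layers nest it."""
    tcp_segment = b"[TCP hdr]" + app_data                     # transport layer prefix
    ip_packet = b"[IP hdr]" + tcp_segment                     # Internet layer prefix
    frame = b"[link hdr]" + ip_packet + b"[link ftr]"         # data link header and footer
    return frame


if __name__ == "__main__":
    # The application data is the HTTP request from the running example.
    print(encapsulate(b"GET / HTTP/1.1\r\n\r\n"))
```

Each layer treats everything handed to it as opaque payload, which is why the Internet layer would wrap a UDP datagram in exactly the same way.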
So as you 101 00:11:06,310 --> 00:11:11,749 can see from the layering in those two pictures, on the one side you have the 102 00:11:11,749 --> 00:11:17,309 bottom-up layering, and every layer, if you go down from the application to the 103 00:11:17,309 --> 00:11:22,769 transport, to the Internet, to data link, basically adds some header information 104 00:11:22,769 --> 00:11:28,360 and the Internet layer for example takes the TCP header, so the transport 105 00:11:28,360 --> 00:11:33,559 layer and the application layer, as payload, so it doesn't care that it is 106 00:11:33,559 --> 00:11:42,320 TCP, it could as well be UDP in this case. So what is actually in the... So I will not 107 00:11:42,320 --> 00:11:49,769 go into the data link layer details at all, but here is the header of an IP 108 00:11:49,769 --> 00:11:59,809 version 4 frame or packet and that one is at least 20 bytes, it contains various 109 00:11:59,809 --> 00:12:06,589 fields. The first one is a four bit version, which usually is version 4 in our 110 00:12:06,589 --> 00:12:14,870 current world. Then it has a 4 bit header length, which is a header length in words, 111 00:12:14,870 --> 00:12:23,061 so in multiples of 32 bits. Then it has some not really used stuff I won't deal 112 00:12:23,061 --> 00:12:28,600 with in this talk. It has the total length field with just 16 bits and it describes 113 00:12:28,600 --> 00:12:36,339 how long the entire IP frame is. Then it has an identification, which is also a 16 114 00:12:36,339 --> 00:12:43,429 bit unique number and the 16 bits for fragmentation flags and offset. And that 115 00:12:43,429 --> 00:12:48,579 is crucial. 
So, if the IP layer decides: oh yeah, well, the packet, the application 116 00:12:48,579 --> 00:12:53,269 data you sent me is way too big for this data link, I need to fragment it, then it 117 00:12:53,269 --> 00:12:57,829 will just reuse the very same identification number and then use here 118 00:12:57,829 --> 00:13:04,859 the 16 bits in the fragmentation flags and offset in order to split that 119 00:13:04,859 --> 00:13:10,779 application data into multiple IP fragments. Then it has a field which is 8 120 00:13:10,779 --> 00:13:19,079 bit, so 1 entire byte, it's the time to live and it's actually not a timestamp but 121 00:13:19,079 --> 00:13:26,989 it's only a count. So for how many routers should this packet live, how long should 122 00:13:26,989 --> 00:13:33,709 this packet live, and every router decreases that time to live by 1. Then it 123 00:13:33,709 --> 00:13:41,350 has a 1 byte protocol field which specifies what is the type of the payload 124 00:13:41,350 --> 00:13:47,109 carried by this IP version 4 packet, then it has a 16 bit header checksum which is 125 00:13:47,109 --> 00:13:56,739 a checksum to detect bits that got flipped in transit. Then we can 126 00:13:56,739 --> 00:14:01,850 see the source IP address and the destination IP address which is, yeah I 127 00:14:01,850 --> 00:14:06,589 mean, the source IP address is the IP address of my laptop and the destination 128 00:14:06,589 --> 00:14:13,690 IP address is the IP address of the server. And then after those 20 bytes you 129 00:14:13,690 --> 00:14:22,819 have either IP options, if the header length was more than 20 bytes, or you have 130 00:14:22,819 --> 00:14:30,859 directly the payload. Now for the protocol field here there are various types and 131 00:14:30,859 --> 00:14:35,350 various types are predefined. 
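Editorial sketch, not from the talk: the 20-byte IPv4 header walked through above can be packed and parsed with Python's `struct` module. The function names, the default field values and the example addresses (taken from the documentation address ranges) are assumptions made for illustration. The header checksum is computed as the 16-bit ones' complement sum that IPv4 specifies.

```python
import socket
import struct

# 20 bytes: version/IHL, ToS, total length, identification, flags+offset,
# TTL, protocol, header checksum, source address, destination address.
IPV4_FORMAT = "!BBHHHBBH4s4s"


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"                       # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:                        # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ipv4_header(src: str, dst: str, payload_len: int,
                      proto: int = 6, ttl: int = 64, ident: int = 0x1234) -> bytes:
    """Build an option-less header (hypothetical helper, default values are arbitrary)."""
    version_ihl = (4 << 4) | 5                # version 4, header length 5 x 32 bits
    fields = [version_ihl, 0, 20 + payload_len, ident, 0, ttl, proto, 0,
              socket.inet_aton(src), socket.inet_aton(dst)]
    fields[7] = internet_checksum(struct.pack(IPV4_FORMAT, *fields))
    return struct.pack(IPV4_FORMAT, *fields)


def parse_ipv4_header(header: bytes) -> dict:
    (version_ihl, _tos, total_length, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack(IPV4_FORMAT, header[:20])
    return {
        "version": version_ihl >> 4,
        "header_len": (version_ihl & 0x0F) * 4,   # stored in 32-bit words
        "total_length": total_length,
        "identification": ident,
        "flags": flags_frag >> 13,
        "fragment_offset": flags_frag & 0x1FFF,
        "ttl": ttl,
        "protocol": proto,
        "checksum": checksum,
        "src": socket.inet_ntoa(src),
        "dst": socket.inet_ntoa(dst),
    }


if __name__ == "__main__":
    print(parse_ipv4_header(build_ipv4_header("192.0.2.1", "198.51.100.7", 18)))
```

A receiver can verify a header by running the same checksum over all 20 bytes, including the checksum field: an intact header yields zero.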
1 is ICMP which is the Internet Control Message 132 00:14:35,350 --> 00:14:41,959 Protocol, I will talk a bit about that; in the protocol field, there the 133 00:14:41,959 --> 00:14:47,911 number is set to 1, then for TCP it's set to 6 and for UDP it's 17. We have other 134 00:14:47,911 --> 00:14:54,850 protocols which can be carried over an IP frame or an IP packet but I won't go into 135 00:14:54,850 --> 00:15:00,850 the details here. As you can see there are at least 255 numbers here in the protocol 136 00:15:00,850 --> 00:15:08,399 field. So because it's 8 bits long, you can store up to 256 different numbers in 137 00:15:08,399 --> 00:15:17,809 there. So ICMP is a protocol I haven't talked about at all but it is the Internet 138 00:15:17,809 --> 00:15:23,269 Control Message Protocol. So it sits on top of IP and its purpose is, on the one 139 00:15:23,269 --> 00:15:29,579 side, to deliver error messages such as destination host unreachable or time to 140 00:15:29,579 --> 00:15:34,229 live exceeded and on the other side it can also carry operational information like 141 00:15:34,229 --> 00:15:43,429 diagnostics. There's one program which you may know which is called ping. The purpose 142 00:15:43,429 --> 00:15:48,809 of ping is to send an ICMP echo request to a remote host and the remote host is then 143 00:15:48,809 --> 00:15:56,569 supposed to send the very same packet with only 1 single bit flipped and send that back 144 00:15:56,569 --> 00:16:04,229 to you and that is an ICMP echo reply. And if you can successfully ping another host 145 00:16:04,229 --> 00:16:14,619 you can verify that the other host has at least IP connectivity up and online. OK, 146 00:16:14,619 --> 00:16:21,249 let's look into the next layer which is the transport layer. And at first we will 147 00:16:21,249 --> 00:16:26,839 look into a UDP header. A UDP header has only eight bytes. It consists 148 00:16:26,839 --> 00:16:31,899 of a source port and a destination port. 
Then the length of the entire UDP frame 149 00:16:31,899 --> 00:16:41,209 and the checksum. The checksum is again a 16 bit field computed over the entire 150 00:16:41,209 --> 00:16:49,000 payload and the header plus some IP pseudo header. So the checksum actually covers the 151 00:16:49,000 --> 00:16:54,779 information of the source and destination IP address as well. UDP as I 152 00:16:54,779 --> 00:17:01,899 mentioned is unreliable, unordered and its advantage is the low overhead of 153 00:17:01,899 --> 00:17:11,220 datagrams: as you can see it adds 8 bytes to the payload whereas IP already added 20 154 00:17:11,220 --> 00:17:19,579 bytes to the payload. Here's a simple Unix program which is a UDP client. This 155 00:17:19,579 --> 00:17:23,900 program does not compile because I left out some bits, but it is there in order to see how 156 00:17:23,900 --> 00:17:30,230 you actually use this whole IP stack. So the IP stack, the TCP/IP stack, is 157 00:17:30,230 --> 00:17:37,320 usually embedded in the kernel and as a programmer, as an API programmer, you have the 158 00:17:37,320 --> 00:17:44,940 API provided by the Unix sockets API. And that one usually consists of the very same 159 00:17:44,940 --> 00:17:52,090 five or seven functions. The first one is socket: socket opens or creates a 160 00:17:52,090 --> 00:17:58,889 file descriptor and you specify the address family and the socket type. So 161 00:17:58,889 --> 00:18:05,520 this is the address family Internet and the socket is a datagram socket. It's called 162 00:18:05,520 --> 00:18:14,330 DGRAM in Unix. Once that is created, then for a UDP client you just say: oh, I 163 00:18:14,330 --> 00:18:20,220 will use the function sendto, which takes a socket file descriptor. So just 164 00:18:20,220 --> 00:18:25,240 the file descriptor and then some data, and it will just send it to the other side. Since 165 00:18:25,240 --> 00:18:33,039 it's unreliable, it's just fire and forget. 
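Editorial sketch, not the program from the speaker's slides: Python's `socket` module exposes the same calls under the same names, so the UDP client steps above (`socket` with the Internet address family and a datagram type, then `sendto`, then close) can be tried on the loopback interface. The function name `udp_roundtrip`, the loopback address, the 2 second timeout and the buffer size are assumptions for illustration. A small receiver using `bind` and `recvfrom` is included only so the example is self-contained.

```python
import socket


def udp_roundtrip(message: bytes) -> bytes:
    """Send one datagram over loopback and return what the receiver got."""
    # Receiving side: bind to loopback; port 0 lets the kernel pick a free port.
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver:
        receiver.bind(("127.0.0.1", 0))
        receiver.settimeout(2.0)              # do not block forever if the datagram is lost
        address = receiver.getsockname()

        # Client side: create a datagram socket and fire and forget.
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
            client.sendto(message, address)   # no connection setup is needed
        # Leaving the inner "with" block closes the client socket.

        data, _peer = receiver.recvfrom(2048)  # returns once a datagram arrives
        return data


if __name__ == "__main__":
    print(udp_roundtrip(b"hello over UDP"))
```

On a real network the datagram may simply never arrive, which is why the receiver sets a timeout instead of waiting indefinitely.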
Then afterwards we close the socket file 166 00:18:33,039 --> 00:18:39,250 descriptor because we are nice here and we try to be nice to the other side. So if you 167 00:18:39,250 --> 00:18:44,309 don't have a UDP client but you want to implement a UDP server or a UDP listener, 168 00:18:44,309 --> 00:18:49,429 what you do is you again create a socket, then you have the function which is 169 00:18:49,429 --> 00:18:57,250 called bind. Bind binds it to a specific IP address on your server 170 00:18:57,250 --> 00:19:03,279 or on your network stack. Then you say recvfrom: recvfrom takes the socket 171 00:19:03,279 --> 00:19:10,179 file descriptor and a buffer and some maximum size and an offset. And yeah, 172 00:19:10,179 --> 00:19:18,409 recvfrom will only return once you actually receive a UDP frame on that 173 00:19:18,409 --> 00:19:26,690 IP address and port, and then we print out that we received some packets 174 00:19:26,690 --> 00:19:35,420 and we close the socket file descriptor. So that's UDP. UDP is used for a variety 175 00:19:35,420 --> 00:19:42,200 of protocols and it's crucial to have it. TCP on the other hand is a bit bigger. So 176 00:19:42,200 --> 00:19:47,970 instead of an eight byte header, TCP adds another 20 bytes of header. What does the 177 00:19:47,970 --> 00:19:53,559 TCP header contain? 
Well, similar to UDP it contains a source port and destination port, 178 00:19:53,559 --> 00:19:59,610 both again 16 bits. Then it contains two sequence numbers: one is the sequence 179 00:19:59,610 --> 00:20:05,549 number itself, it's a 32 bit number, and one is the acknowledgement number which is the 180 00:20:05,549 --> 00:20:13,159 last sequence number we have seen from the other side. Then TCP contains a data offset. 181 00:20:13,159 --> 00:20:20,850 Data offset is similar to the header length field, so a TCP segment may also 182 00:20:20,850 --> 00:20:26,980 contain some options, so the header may contain options before the payload. That's 183 00:20:26,980 --> 00:20:31,519 why we need a data offset field in order to be able to find out where the actual 184 00:20:31,519 --> 00:20:38,611 payload starts. Then TCP has certain flags and some of these 185 00:20:38,611 --> 00:20:43,610 flags are just single bit values, and some of them I mentioned down here, 186 00:20:43,610 --> 00:20:48,440 which I will go into in more detail later, which is acknowledgement or ACK, 187 00:20:48,440 --> 00:20:54,490 synchronize or SYN, and finished or FIN. There is also reset and some urgent stuff, 188 00:20:54,490 --> 00:20:59,779 I will not go into detail of that. Then we have a sixteen bit field which is the 189 00:20:59,779 --> 00:21:08,909 window size, which is the size of the receive buffer. Then we have again a sixteen 190 00:21:08,909 --> 00:21:13,980 bit field, the checksum, and then we have some space for the urgent stuff I will not go 191 00:21:13,980 --> 00:21:22,120 into detail on. A TCP client, if you program it in a Unix way: you have a very similar 192 00:21:22,120 --> 00:21:28,740 API as we have seen for UDP. So we first create a file 193 00:21:28,740 --> 00:21:35,799 descriptor using the socket system call, which we give again the address family INET 194 00:21:35,799 --> 00:21:44,080 and SOCK_STREAM, since we are stream oriented. 
It's the name of 195 00:21:44,080 --> 00:21:54,549 the TCP. It's the name for a TCP socket. Then as a TCP client we connect using the 196 00:21:54,549 --> 00:22:00,889 socket file descriptor to a remote host. And then once we are connected, so connect 197 00:22:00,889 --> 00:22:09,760 will only return once a TCP session has been established, then we say here receive. So 198 00:22:09,760 --> 00:22:13,909 we receive on the socket file descriptor into the specified buffer, then we print 199 00:22:13,909 --> 00:22:24,700 it and then we close the socket file descriptor again. The TCP listener is very 200 00:22:24,700 --> 00:22:31,881 similar. So, well, first we create a socket. Then we bind it, and bind specifies the IP 201 00:22:31,881 --> 00:22:38,480 address and also the port number. Then we use a function called listen on the socket 202 00:22:38,480 --> 00:22:47,920 file descriptor and then we enter a loop. And so now we wait for client connections 203 00:22:47,920 --> 00:22:53,610 which appear at some point. And for every client connection we call accept 204 00:22:53,610 --> 00:22:59,030 and accept returns whenever there was a client which successfully established a 205 00:22:59,030 --> 00:23:08,580 TCP connection. What accept returns is a new file descriptor, so another file 206 00:23:08,580 --> 00:23:13,080 descriptor, not the same as the socket file descriptor. So on the socket file descriptor 207 00:23:13,080 --> 00:23:20,299 we call accept again at a later point. Usually you then handle any work on 208 00:23:20,299 --> 00:23:26,409 the client connection on this new_fd. You handle that in a separate process or a 209 00:23:26,409 --> 00:23:31,399 separate thread or a separate task in order to enable the server to accept 210 00:23:31,399 --> 00:23:39,269 another connection while you are handling the one client connection. 
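Editorial sketch, not the programs from the speaker's slides: the client calls (`socket`, `connect`, `recv`, close) and the listener calls (`socket`, `bind`, `listen`, `accept`) walked through here map one to one onto Python's `socket` module. The function name `tcp_hello`, the loopback address, the backlog of 1, the timeouts and the greeting text are assumptions for illustration. For brevity the listener accepts a single connection instead of looping, and both ends run in one thread, which works on loopback because the kernel completes the handshake before `accept` is called.

```python
import socket


def tcp_hello() -> bytes:
    """Run a one-shot TCP listener and client over loopback."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))       # port 0: the kernel picks a free port
        listener.listen(1)                    # the socket is now in the listen state
        listener.settimeout(2.0)
        address = listener.getsockname()

        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            client.settimeout(2.0)
            client.connect(address)           # returns once the session is established

            # accept returns a new descriptor for this one client connection;
            # the listening socket stays available for further accept calls.
            new_fd, _peer = listener.accept()
            with new_fd:
                new_fd.sendall(b"hello world")
            # new_fd is closed here, so the client will see end of stream.

            chunks = []
            while True:
                chunk = client.recv(1024)     # a byte stream: may arrive in pieces
                if not chunk:                 # empty read means the peer closed
                    break
                chunks.append(chunk)
            return b"".join(chunks)


if __name__ == "__main__":
    print(tcp_hello())
```

A real server would call `accept` in a loop and hand each new descriptor to a separate process, thread or task, as described in the talk.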
Then we 211 00:23:39,269 --> 00:23:47,289 just do some printf output and we send hello world to the client, to the client 212 00:23:47,289 --> 00:23:52,720 connection, so to this new file descriptor, then we close it and we start again from the 213 00:23:52,720 --> 00:23:59,389 while(1) and we accept a new client socket. So that is a TCP listener as we have 214 00:23:59,389 --> 00:24:11,580 seen it, as you will see it in any network program. Now TCP, as I mentioned, 215 00:24:11,580 --> 00:24:17,870 has to do some work in order to establish a session and to tear it down. The 216 00:24:17,870 --> 00:24:24,419 main work which needs to be done is to synchronize the initial sequence numbers, 217 00:24:24,419 --> 00:24:29,759 because we have seen in the header that we have this sequence number and somehow we 218 00:24:29,759 --> 00:24:35,639 need to transfer that information to the other side. So here's the TCP state machine 219 00:24:35,639 --> 00:24:40,610 which has initially been part of the RFC, which is the specification 220 00:24:40,610 --> 00:24:47,690 for TCP, and also duplicated in books like Stevens' design and implementation 221 00:24:47,690 --> 00:24:55,440 of TCP/IP and TCP/IP Illustrated and so on. So you can see it has here one 222 00:24:55,440 --> 00:25:00,030 specific state which is listen, and listen is as we've seen in the server 223 00:25:00,030 --> 00:25:05,009 implementation: if you call listen then you are in the listen 224 00:25:05,009 --> 00:25:12,190 state, and you always start, well, you always end up in the closed state after 225 00:25:12,190 --> 00:25:19,270 you've called close. Basically I will go into more detail of connection 226 00:25:19,270 --> 00:25:23,889 establishment and teardown right now. So on the connection establishment, we have 227 00:25:23,889 --> 00:25:32,080 seen on the client side we start with the socket in the closed state. 
Then we say 228 00:25:32,080 --> 00:25:41,860 the Unix call connect on that socket, and that connect does send an initial TCP 229 00:25:41,860 --> 00:25:48,990 segment to the server side which has the synchronize flag set to true or set to 230 00:25:48,990 --> 00:25:57,999 one and the sequence number is some artificial number, some random 32 bit 231 00:25:57,999 --> 00:26:07,649 integer number. So just call it "a" here. The state of the file descriptor goes from 232 00:26:07,649 --> 00:26:16,419 closed to SYN_SENT, and SYN_SENT, yeah well, we just have sent out the synchronize 233 00:26:16,419 --> 00:26:23,259 segment, so a TCP segment which doesn't carry any data but only the TCP header. On the 234 00:26:23,259 --> 00:26:30,490 server side, we had prepared previously: we started in a closed state, then we 235 00:26:30,490 --> 00:26:36,730 called listen, then we end up in a listen state. Now in the listen state we call 236 00:26:36,730 --> 00:26:43,039 accept and accept blocks until the SYN is received, and once the SYN is received, a 237 00:26:43,039 --> 00:26:48,580 new socket, a new file descriptor, is spawned and that one ends up in the 238 00:26:48,580 --> 00:26:55,850 SYN_RECEIVED state. The server sends out a TCP segment, again without any data, but 239 00:26:55,850 --> 00:26:59,840 the SYN and acknowledgement flags are set and the sequence number is set to some b 240 00:26:59,840 --> 00:27:05,649 and the acknowledgement number is set to a+1. So the acknowledgement number 241 00:27:05,649 --> 00:27:13,830 acknowledges, with a+1, that the SYN with the sequence number a was received. Upon 242 00:27:13,830 --> 00:27:19,320 the client receiving that SYN and ACK, it is in the established state and it will send out 243 00:27:19,320 --> 00:27:26,580 an acknowledgement segment so that the other side, the server, knows: oh yeah. 
My 244 00:27:26,580 --> 00:27:31,669 segment has been received, and that one is sent with the sequence number of a+1 245 00:27:31,669 --> 00:27:40,710 because a was already used here and the SYN flag consumes one byte, or 1 in the 246 00:27:40,710 --> 00:27:45,330 sequence number range, and the acknowledgement number is set to 247 00:27:45,330 --> 00:27:51,230 b + 1. So that is the sequence number from here plus 1. Once that is received 248 00:27:51,230 --> 00:27:57,639 the server ends up in the established state. Sequence numbers. Yeah well, it's a 249 00:27:57,639 --> 00:28:02,019 good idea for both to pick a random initial sequence number for each connection, 250 00:28:02,019 --> 00:28:08,990 otherwise we can get into some nasty attacks. The acknowledgement number is the 251 00:28:08,990 --> 00:28:14,700 next sequence number expected from the other host, and the sequence number is always increased 252 00:28:14,700 --> 00:28:20,740 for each byte of data, and for the SYN and FIN flags, which are only a single bit each. Each 253 00:28:20,740 --> 00:28:25,529 sequence number must be acknowledged, and each sent packet is retransmitted unless 254 00:28:25,529 --> 00:28:33,269 it is acknowledged within a certain timeout, and after trying 255 00:28:33,269 --> 00:28:39,600 it several times at some point the TCP stack gives up. The teardown: since I am a 256 00:28:39,600 --> 00:28:45,590 bit short on time I will skip that. TCP provides us with flow control. What 257 00:28:45,590 --> 00:28:51,610 does that mean? 
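Editorial sketch, not from the talk: stepping back to the three-way handshake just described, the sequence and acknowledgement numbers of the three segments can be written down as a tiny model. The `Segment` class and the function name are invented for illustration, and the model ignores everything else a real TCP stack does (timers, retransmission, options).

```python
import secrets
from dataclasses import dataclass

SEQ_SPACE = 2 ** 32          # sequence numbers are 32 bit and wrap around


@dataclass
class Segment:
    seq: int
    ack: int = 0
    syn: bool = False
    ack_flag: bool = False


def three_way_handshake(a: int, b: int):
    """Return the SYN, SYN+ACK and ACK segments for initial sequence numbers a and b."""
    # Client: closed -> SYN_SENT, sends SYN with its initial sequence number a.
    syn = Segment(seq=a, syn=True)
    # Server: listen -> SYN_RECEIVED, picks b and acknowledges a + 1,
    # because the SYN flag consumes one sequence number.
    syn_ack = Segment(seq=b, ack=(syn.seq + 1) % SEQ_SPACE, syn=True, ack_flag=True)
    # Client: SYN_SENT -> established, sends seq a + 1 and acknowledges b + 1.
    ack = Segment(seq=(a + 1) % SEQ_SPACE, ack=(syn_ack.seq + 1) % SEQ_SPACE,
                  ack_flag=True)
    return syn, syn_ack, ack


if __name__ == "__main__":
    # Both sides pick unpredictable initial sequence numbers.
    for segment in three_way_handshake(secrets.randbits(32), secrets.randbits(32)):
        print(segment)
```

The modulo keeps the arithmetic correct when an initial sequence number sits at the top of the 32 bit range.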
Well, every network stack, so the kernel, has a receive 258 00:28:51,610 --> 00:28:57,399 buffer for each TCP connection and that buffer is size limited to avoid kernel 259 00:28:57,399 --> 00:29:02,960 memory exhaustion, which means that whenever the application, so the web server 260 00:29:02,960 --> 00:29:10,960 or the web browser, is reading data, some buffer space is reclaimed and when TCP 261 00:29:10,960 --> 00:29:17,930 segments are arriving, some of that buffer is consumed. It's a sliding window and we 262 00:29:17,930 --> 00:29:22,309 have seen that in the TCP header there is a window size, so there is a 16 bit field 263 00:29:22,309 --> 00:29:29,679 called window size which specifies how many more bytes my TCP stack has for 264 00:29:29,679 --> 00:29:42,169 receiving data from the other side. To avoid deadlocks there is also a 265 00:29:42,169 --> 00:29:47,159 timer called the persist timer, which is started when the window 266 00:29:47,159 --> 00:29:54,460 size is zero, and that then at a timeout retransmits a TCP segment in order to 267 00:29:54,460 --> 00:30:00,219 get information about the new window size from the other side. Congestion control 268 00:30:00,219 --> 00:30:07,749 I will also skip a bit, but the main idea is to control the rate of data entering the 269 00:30:07,749 --> 00:30:17,019 network, because if you're using multiple routers at some point you may saturate 270 00:30:17,019 --> 00:30:23,110 some of the network links, and that is avoided in TCP by applying 271 00:30:23,110 --> 00:30:28,820 congestion control, which measures for example the time between segment sent and 272 00:30:28,820 --> 00:30:34,869 acknowledgement received. It also has to do with slow start and how your window size, 273 00:30:34,869 --> 00:30:41,100 your window buffer, grows. Acknowledgments. 
Well, there are some 274 00:30:41,100 --> 00:30:47,809 strategies: the basic one is every segment is acknowledged individually, there's 275 00:30:47,809 --> 00:30:54,059 delayed ACK where you collect multiple segments to acknowledge them at a certain 276 00:30:54,059 --> 00:30:59,289 time, then you have also selective acknowledgements where you can acknowledge 277 00:30:59,289 --> 00:31:08,489 discontinuous segments, which helps for lowering the amount of retransmissions. TCP 278 00:31:08,489 --> 00:31:14,320 also carries some maximum segment size to avoid fragmentation, actually, on the IP 279 00:31:14,320 --> 00:31:18,739 layer. Then with partially open connections there's some struggle, because you have 280 00:31:18,739 --> 00:31:25,749 simultaneous open, so what if both parties want to open a connection at the very same 281 00:31:25,749 --> 00:31:30,710 time? Then you have a flag which is called reset in order to terminate a 282 00:31:30,710 --> 00:31:37,110 connection. There are some extensions like window scaling and fast open to improve 283 00:31:37,110 --> 00:31:42,489 the throughput and also to lower the delay. There are some attacks like denial 284 00:31:42,489 --> 00:31:47,759 of service. So if your server implementation accepts something and 285 00:31:47,759 --> 00:31:52,669 allocates a lot of memory for a client which doesn't do a lot but just sends a 286 00:31:52,669 --> 00:31:58,629 SYN frame, that is bad and leads to denial of service. Connection hijacking: if you can 287 00:31:58,629 --> 00:32:05,710 predict the sequence numbers then you can hijack and inject data into an established 288 00:32:05,710 --> 00:32:10,049 connection. There have been some blind in-window attacks. 
What does that mean? That 289 00:32:10,049 --> 00:32:17,330 even without knowing the sequence number you can do something on an established TCP 290 00:32:17,330 --> 00:32:25,710 connection, such as, yeah, sending a reset or sending a FIN frame and tearing that 291 00:32:25,710 --> 00:32:32,760 connection down. The specification for TCP is written in English prose in a 292 00:32:32,760 --> 00:32:38,440 collection of RFCs, and there are some widely deployed implementations. During 293 00:32:38,440 --> 00:32:45,929 some research work in Cambridge over the last years, me and various colleagues 294 00:32:45,929 --> 00:32:50,940 implemented a formal model, developed in the interactive theorem prover HOL4, which 295 00:32:50,940 --> 00:32:56,640 is a precise specification with implementation looseness. And we really use 296 00:32:56,640 --> 00:33:03,390 these as inputs: the sockets API, an interface for getting the TCP control 297 00:33:03,390 --> 00:33:08,739 block, which is the host-internal state of TCP, and then the wire interface, which 298 00:33:08,739 --> 00:33:16,779 is data received and sent on the wire. And we validated that formal model itself, 299 00:33:16,779 --> 00:33:23,320 so we used actual implementations to do that. We use it to draw some diagrams 300 00:33:23,320 --> 00:33:29,100 where you can see the rules which fire on the left-hand side when something 301 00:33:29,100 --> 00:33:36,200 happened, like there was a CONNECT called and then the logical rule connect_1 was 302 00:33:36,200 --> 00:33:42,880 used in the labelled transition system. Then we see here as well some TCP segments 303 00:33:42,880 --> 00:33:51,779 which are going out. And then, what are the contributions of the network semantics? We 304 00:33:51,779 --> 00:34:00,130 checked the model, we validated the model by recording traces and executing them.
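The idea of checking a recorded trace against a labelled transition system with implementation looseness can be sketched in a few lines. This is only a toy illustration in Python, not the HOL4 model from the talk: the states, labels, and the rule table `RULES` are assumptions made up for this sketch. Looseness is modelled by allowing more than one successor state for the same label, so the checker tracks the set of states the model may be in.

```python
# Toy labelled transition system (LTS). The states, labels and rules are
# illustrative assumptions, not the rules of the HOL4 specification.
from typing import FrozenSet, Iterable

# (state, label) -> set of allowed successor states.
# More than one successor models "implementation looseness".
RULES = {
    ("CLOSED", "connect()"): {"SYN_SENT"},
    ("SYN_SENT", "recv SYN+ACK"): {"ESTABLISHED"},
    ("SYN_SENT", "timeout"): {"SYN_SENT", "CLOSED"},  # retry or give up
    ("ESTABLISHED", "close()"): {"FIN_WAIT_1"},
}


def accepts(trace: Iterable[str], start: str = "CLOSED") -> FrozenSet[str]:
    """Return the set of states the model may be in after the trace.

    An empty result means no sequence of rules explains the trace.
    """
    possible = frozenset({start})
    for label in trace:
        # Follow every rule that matches any currently possible state.
        possible = frozenset(
            nxt
            for state in possible
            for nxt in RULES.get((state, label), ())
        )
        if not possible:
            break
    return possible


if __name__ == "__main__":
    print(accepts(["connect()", "timeout", "recv SYN+ACK"]))
    print(accepts(["connect()", "close()"]))
```

In this sketch the first trace is explained by the model (only the branch that stayed in `SYN_SENT` after the timeout survives), while the second trace yields the empty set because the toy table has no rule for `close()` in `SYN_SENT`.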
We 305 00:34:00,130 --> 00:34:03,419 published a paper called Engineering with Logic: Rigorous Test-Oracle Specification 306 00:34:03,419 --> 00:34:09,280 and Validation for TCP/IP and the Unix Sockets API. The specification itself is 307 00:34:09,280 --> 00:34:16,280 typeset in 384 pages, that's all the transitions you basically need. So roughly 308 00:34:16,280 --> 00:34:22,359 10000 lines of HOL4 code and a lot of comments, where we embedded a lot of 309 00:34:22,359 --> 00:34:27,580 LaTeX code. And a Unix TCP/IP stack usually has around 15000 lines of code. The 310 00:34:27,580 --> 00:34:37,219 TCP state machine we saw earlier is here in this diagram, and we 311 00:34:37,219 --> 00:34:43,199 tried to draw a more correct TCP state machine, which led us to this picture, 312 00:34:43,199 --> 00:34:48,750 which is a bit more complicated. We have this state NONEXIST up here and we have 313 00:34:48,750 --> 00:34:55,159 many more transitions due to timers and so on. So the state machine used in common 314 00:34:55,159 --> 00:35:02,660 literature is actually not complete or not precise, and we have a revision for that. 315 00:35:02,660 --> 00:35:10,380 The conclusion is that TCP/IP is widely deployed. I hope I managed to give 316 00:35:10,380 --> 00:35:15,869 you some insight into how TCP/IP actually works. It's a layered architecture which 317 00:35:15,869 --> 00:35:23,710 is agnostic of underlying layers, and in the network semantics work we had an 318 00:35:23,710 --> 00:35:29,170 executable specification. That's all I have to say and I welcome you to ask any 319 00:35:29,170 --> 00:35:33,090 questions, either now or offline. *Applause* 320 00:35:33,090 --> 00:35:41,090 *Applause* 321 00:35:41,090 --> 00:35:44,790 Herald-Angel: Thank you. So, if you have any questions just go to the microphones. 322 00:35:44,790 --> 00:35:51,420 We have two here and two on the right side.
And do we have some questions from 323 00:35:51,420 --> 00:36:05,160 the Internet? No questions? No questions? Yeah, one question, come on, don't be shy. 324 00:36:05,160 --> 00:36:11,901 Microphone 1: Right. Hi, thanks, that was a very interesting talk. So your model, does 325 00:36:11,901 --> 00:36:16,950 it allow synthesizing an implementation from the specification, or is it used 326 00:36:16,950 --> 00:36:21,830 mostly for validating? HM: It's at the moment used for validation 327 00:36:21,830 --> 00:36:25,541 because we have the specification looseness. So we have an implementation 328 00:36:25,541 --> 00:36:30,750 looseness. So at some point in an implementation you have to choose whether you take one 329 00:36:30,750 --> 00:36:36,090 transition or the other one. So if you go into a failure state or if you go into a 330 00:36:36,090 --> 00:36:41,830 success state or if you transmit some piece of data and go into a success state. So we 331 00:36:41,830 --> 00:36:47,940 haven't synthesized any implementation, but there is ongoing work to use it as the 332 00:36:47,940 --> 00:36:50,349 implementation, as a base for an implementation. 333 00:36:50,349 --> 00:36:55,980 Mic 1: Okay. And do you think that such an implementation can be made, can it be made 334 00:36:55,980 --> 00:37:00,290 efficient as well, once synthesized? HM: Yes. 335 00:37:00,290 --> 00:37:05,969 Mic 1: Okay, thanks. *Applause* 336 00:37:05,969 --> 00:37:08,949 Herald-Angel: Yeah, your question please. 337 00:37:08,949 --> 00:37:13,859 Microphone 2: Yeah, thank you. How independent is TCP from IP? I mean, can 338 00:37:13,859 --> 00:37:21,730 you integrate TCP over different protocols like Bluetooth or something like that? 339 00:37:21,730 --> 00:37:26,319 HM: *Sighs* Since TCP requires, for error messages, a 340 00:37:26,319 --> 00:37:32,619 bit of ICMP, I haven't seen any TCP implementation on top of any other medium 341 00:37:32,619 --> 00:37:34,809 than IP. Mic 2: Okay.
342 00:37:34,809 --> 00:37:41,980 HM: So, I don't know, but I can imagine it. Could work. 343 00:37:41,980 --> 00:37:48,140 Herald-Angel: Okay, your question please. Microphone 3: Thank you. Hello. So you 344 00:37:48,140 --> 00:37:55,270 used HOL4 for the specification part. Did you actually need the higher-order logic 345 00:37:55,270 --> 00:37:59,230 part of HOL, or would it be possible to just use predicate logic? 346 00:37:59,230 --> 00:38:10,040 HM: *Sighs* I, I will have to reread. I think we 347 00:38:10,040 --> 00:38:14,819 actually need some higher-order logic for it, for the whole state and the transitions. 348 00:38:14,819 --> 00:38:20,170 Mic 3: It would be interesting to meet and... 349 00:38:20,170 --> 00:38:24,440 HM: Yes, well, the paper has been published in the Journal of the ACM and 350 00:38:24,440 --> 00:38:30,710 luckily scihub.is is available and you can download it for free from there. 351 00:38:30,710 --> 00:38:34,250 *Laughs* Mic 3: Okay, thanks. 352 00:38:34,250 --> 00:38:40,670 Herald-Angel: Any more questions? No? Then, thank you Hannes. A warm applause 353 00:38:40,670 --> 00:38:49,800 for Hannes please. *Applause* 354 00:38:49,800 --> 00:38:55,530 *postroll music* 355 00:38:55,530 --> 00:39:13,000 subtitles created by c3subtitles.de in the year 2018. Join, and help us!