*34c3 intro*

Herald: The next talk will be about embedded systems security, and Pascal, the speaker, will explain how you can hijack debug components for embedded security in ARM processors. Pascal is not only an embedded software security engineer but also a researcher in his spare time. Please give a very, very warm welcoming good-morning applause to Pascal.

*applause*

Pascal: OK, thanks for the introduction. As it was said, I'm an engineer by day in a French company, where I work as an embedded system security engineer. But this talk is mainly about my spare-time activity, which is researcher, hacker or whatever you call it. This is because I work with a PhD student called Muhammad Abdul Wahab. He's a third-year PhD student in a French lab. So, this talk will mainly be a presentation of his work on embedded systems security and especially the debug components available in ARM processors. Don't worry about the link: at the end, there will also be a link with all the slides, documentation and everything. So, before the congress, I didn't know what kind of background you would need for my talk, so I put there some links, I mean some references to talks where you will find all the vocabulary needed to understand at least some parts of my talk.
About computer architecture and embedded system security, I hope you attended the talk by Alastair about the formal verification of software and also the talk by Keegan about Trusted Execution Environments (TEEs such as TrustZone). In this talk, I will also talk about FPGA stuff. About FPGAs, there was a talk on day 2 about FPGA reverse engineering. And, if you don't know about FPGAs, I hope that you had some time to go to the OpenFPGA assembly, because these guys are doing a great job on open-source FPGA tools. When you see this slide, the first question is: why did I put "TrustZone is not enough"? Just a quick reminder about what TrustZone is. TrustZone is about separating a system between a non-secure world, in red, and a secure world, in green. When we want to use the TrustZone framework, we have lots of hardware components and lots of software components allowing us to, let's say, run a secure OS and a non-secure OS separately. In our case, what we wanted to do is use the debug components (you can see them on the left side of the picture) to see if we can do some security with them. Furthermore, we wanted to use something other than TrustZone because, if you attended the talk about the security of the Nintendo Switch, you saw that the TrustZone framework can be bypassed in specific cases.
Furthermore, this talk is quite complementary, because we will do something at a lower level, at the processor architecture level. I will talk in a later part of my talk about what we can do between TrustZone and the approach developed in this work. So, basically, the presentation will be a quick introduction; I will talk about some works aiming to use debug components for security; then I will talk about ARMHEx, which is the name of the system we developed to use the debug components of a hardcore processor; and, finally, some results and a conclusion. In the context of our project, we are working with System-on-Chips. System-on-Chips are a kind of device where we have, in the green part, a processor. It can be a single-core, dual-core or even quad-core processor. Another interesting part, in yellow in the image, is the programmable logic, which is also called an FPGA in this case. In this kind of System-on-Chip, you have the hardcore processor, the FPGA and some links between those two units. You can see here, in the red rectangle, one of the two processors. This picture is an image of a System-on-Chip called Zynq, provided by Xilinx, which is also an FPGA vendor. In this kind of chip, we usually have two Cortex-A9 processors and some FPGA logic to work with.
What we want to do with the debug components is to work on Dynamic Information Flow Tracking. Basically, what is information flow? Information flow is the transfer of information from an information container C1 to C2 given a process P. In other words, if we take this simple code over there: if you have 4 variables (for instance, a, b, w and x), the idea is that if you have some metadata in a, the metadata will be transmitted to w. In other words, what kind of information will we transmit in the code? Basically, the information I'm talking about in the first block is "OK, this data is private, this data is public", and we should not mix public and private data together. Basically, the information can be binary ("public or private"), but of course we will be able to have several levels of information. In the following parts, this information will be called taint or tags and, to keep things simple, we will use colors to say "OK, my tag is red or green", just to say whether it's private or public data. As I said, if the tag contained in a is red, the data contained in w will be red as well. Same thing for b and x. Let's take a quick example and look at a buffer overflow. In the upper part of the slide you have the assembly code and, in the lower part, the green columns are the colors of the tags.
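The tag-propagation idea described above can be sketched in a few lines of Python. This is a minimal model of my own, not the speaker's code: the variable names a, b, w and x come from the slide, while the `combine` helper and the red/green labels are illustrative assumptions.

```python
# Minimal model of DIFT tag propagation: every value carries a tag
# ("red" = private, "green" = public) alongside its data.
RED, GREEN = "red", "green"

def combine(*tags):
    # A result is private if any of its inputs is private.
    return RED if RED in tags else GREEN

# (value, tag) pairs for the variables from the slide.
a = (42, RED)     # a holds private data
b = (7,  GREEN)   # b holds public data

# w = a -> w inherits a's tag; x = b -> x inherits b's tag.
w = (a[0], combine(a[1]))
x = (b[0], combine(b[1]))

# Mixing a private and a public value yields a private result.
y = (w[0] + x[0], combine(w[1], x[1]))

print(w[1], x[1], y[1])  # red green red
```

The only design decision here is the join rule in `combine`: any red input makes the output red, which is the binary public/private policy the talk describes; a multi-level policy would replace it with a lattice join.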
On the right side of these columns you have the status of the different registers. This code is basically: OK, my input is tainted (red) at the beginning, and we use the tainted input as the index variable. Register r2, which contains the idx variable, will be red as well. Then, when we access buffer[idx], which is the second line in the C code at the beginning, the information we have there will be red as well. And, of course, the result of the operation, which is x, will be red as well. Basically, that means that if there is a tainted input at the beginning, we must be able to transmit this information up to the return address of this code, just to say "OK, if this tainted input is private, the return address at the end of the code should be private as well". What can we do with that? There is a simple code over there. It says: if you are a normal user, your code just opens the welcome file. Otherwise, if you are a root user, it opens the password file. That is to say: the welcome file is public information, you can do whatever you want with it. Otherwise, for a root user, the password file may contain, for instance, a cryptographic key, and we should not reach the printf function at the end of this code. The idea behind that is to check whether the fs variable containing the data of the file is private or public.
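The welcome/password example can be mocked up the same way. This is a sketch under stated assumptions: the file paths, the `read_file` mock and the `printf_sink` policy function are my own illustration — in the talk, the actual check on the fs variable happens at the processor architecture level, not in application code.

```python
RED, GREEN = "red", "green"

def read_file(path):
    # Model: the password file is tainted private, the welcome file public.
    if path == "/etc/passwd":
        return ("s3cret-key", RED)
    return ("welcome!", GREEN)

def printf_sink(value):
    # Security policy: private data must never reach a public output sink.
    data, tag = value
    if tag == RED:
        raise PermissionError("private data reached a public sink")
    return data

fs = read_file("welcome.txt")
print(printf_sink(fs))   # fine: public data flows to the public sink

fs = read_file("/etc/passwd")
# printf_sink(fs) would now raise PermissionError: the tag followed
# the data from the file read all the way to the sink.
```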
There are mainly three steps for that. First of all, the compilation will give us the assembly code. Then, we must modify system calls to send the tags. The tags will be, as I said before, the private or public information about my fs variable. I will talk a bit about that later: maybe, in future works, the idea is to make, or at least compile, an operating system with integrated support for DIFT. There were already some works on Dynamic Information Flow Tracking. We can do this kind of information flow tracking in two manners. The first one is at the application level, working at the Java or Android level. Some works also propose solutions at the OS level: for instance, KBlare. But what we wanted to do here is work at a lower level: not at the application or OS level but at the hardware level or, at least, at the processor architecture level. If you want some information about the OS-level implementations of information flow tracking, you can go to blare-ids.org, where you have an Android port and a Java port of intrusion detection systems. In the rest of my talk, I will just go through the existing works and see what we can do about that. When we talk about dynamic information flow tracking at a low level, there are mainly three approaches. The first one is on the left side of this slide.
The idea is that, in the upper part of this figure, we have the normal processor pipeline: basically, decode stage, register file and Arithmetic & Logic Unit. The basic idea is that when we want to process tags or taints, we just duplicate the processor pipeline (the grey pipeline under the normal one) to process the tag data. This implies two things. First of all, we must have the source code of the processor itself, just to duplicate the processor pipeline and build the DIFT pipeline. This is quite inconvenient, because getting the source code of the processor is not really easy sometimes. On the other hand, the main advantage of this approach is that we can do nearly anything we want, because we have access to all the code: we can pull out all the wires we need from the processor to get the information we need. The second approach (right side of the picture) is a bit different: instead of having a single processor doing the normal application flow plus the information flow tracking, we separate the normal execution and the information flow tracking. This approach is not satisfying either, because you will have one core running the normal application while core #2 will just be doing DIFT controls. Basically, it's a shame to use a whole processor just for DIFT controls.
The best compromise we can make is a dedicated coprocessor just for the information flow tracking processing. Basically, the most interesting works on this topic have a main core running the normal application and a dedicated coprocessor doing the IFT controls, with some communication between those two cores. Let's make a quick comparison between different works. If you run dynamic information flow control in pure software (I will talk about that in the next slide), it is really painful in terms of time overhead: you will see that the time to do information flow tracking in pure software is really unacceptable. Regarding hardware-assisted approaches, the main advantage in all cases is a low overhead in terms of silicon area: it means that, on this slide, the overhead between the main core alone and the main core plus the coprocessor is not so important. We will see that, in the case of my talk, the dedicated DIFT coprocessor also makes it easier to implement different security policies. As I said, the pure software solution (the first line of this table) basically relies on instrumentation. If you were there on day 2: instrumentation is the transformation of a program into its own measurement tool.
It means that we will put some sensors in all parts of my code, just to monitor its activity and gather some information from it. If we want to measure the impact of instrumentation on the execution time of an application, you can see in this diagram the normal application execution time, normalized to 1. When we use instrumentation, the minimal overhead we get is about 75%, and the time with instrumentation will most of the time be twice the normal execution time. This is completely unacceptable, because it just makes your application run slower. Basically, as I said, the main concern of my talk is reducing the overhead of software instrumentation. I will also talk a bit about the security of the DIFT coprocessor, because we can't include a DIFT coprocessor without taking care of its security. To my knowledge, this is the first work about DIFT in ARM-based system-on-chips. In the talk about the security of the Nintendo Switch, the speaker said that black-box testing is fun... except that it isn't. In our case, we only have a black box, because we can't modify the structure of the processor; we must do our job without, let's say, decapping the processor and so on. This is an overall schematic of our architecture. On the left side, in light green, you have the ARM processor. In this case, this is a simplified version with only one core.
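The "program as its own measurement tool" idea — and why it costs so much — can be sketched with Python's tracing hook. This is an illustrative analogy of my own, not the actual DIFT instrumentation from the talk: the monitoring callback runs on the same core as the application, so every executed line pays for it, which is the kind of overhead the slide's 75%+ figure refers to.

```python
import sys

def trace_lines(func):
    """Instrument a function: record every line number it executes.

    Software-only instrumentation, as criticized in the talk: the
    monitor shares the core with the application, so each executed
    line triggers an extra callback.
    """
    events = []
    def tracer(frame, event, arg):
        if event == "line":
            events.append(frame.f_lineno)
        return tracer  # keep tracing inside this frame
    def wrapper(*args, **kwargs):
        sys.settrace(tracer)
        try:
            return func(*args, **kwargs)
        finally:
            sys.settrace(None)
    wrapper.events = events
    return wrapper

@trace_lines
def loop(n):
    total = 0
    for i in range(n):
        total += i
    return total

result = loop(10)
print(result, len(loop.events))  # every executed line was observed
```

The hardware approach in the talk avoids exactly this: the PTM emits the trace as a side effect of execution, so the application itself runs (nearly) unmodified.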
On the right side, you have the structure of the coprocessor we implemented in the FPGA. For the moment, you can notice two things. The first is that you have some links between the FPGA and the CPU; these links already exist in the system-on-chip. The second thing concerns the memory: you have separate memories for the processor and for the FPGA. We will see later that we can use TrustZone to add a layer of security, just to be sure that we won't mix the memory between the CPU and the FPGA. Basically, when we want to work with ARM processors, we must use ARM datasheets — we must read ARM datasheets. First of all, don't be afraid of the length of ARM datasheets: in my case, I used to work with the ARMv7 technical manual, which is already 2000 pages; the ARMv8 manual is about 6000 pages. Anyway. What is also difficult is that the information is split between different documents. When we want to use debug components in the case of ARM, we have this register over there, which is called DBGOSLAR. We can see that writing the key value 0xC5A-blabla to this field locks the debug registers, and if you write any other value, it will just unlock those debug registers.
So that was basically the first step to enable the debug components: just writing any other value to this register to unlock my debug components. Here is again a schematic of the overall system-on-chip. As you see, you have the two processors and, in the top part, you have what are called CoreSight components. These are the famous debug components I will talk about in the second part of my talk. Here is a simplified view of the debug components we have in Zynq SoCs. On the left side, we have the two processors (CPU0 and CPU1), and the CoreSight components are: the PTM, the one in the red rectangle; the ECT, which is the Embedded Cross Trigger; and the ITM, which is the Instrumentation Trace Macrocell. Basically, when we want to extract some data from the CoreSight components, the basic path is to use the PTM, go through the Funnel and, at this step, we have two choices to store the information taken from the debug components. The first one is the Embedded Trace Buffer, which is a small memory embedded in the processor. Unfortunately, this memory is really small: it's only about 4 KBytes, as far as I remember. The other possibility is to export the data to the Trace Packet Output, and this is what we will use to export data to the coprocessor implemented in the FPGA. Basically, what is the PTM able to do?
The 189 00:22:26,309 --> 00:22:34,149 first thing that PTM can do is to trace whatever in your memory. For instance, you 190 00:22:34,149 --> 00:22:41,880 can trace all your code. Basically, all the blue sections. But, you can also let's 191 00:22:41,880 --> 00:22:47,890 say trace specific regions of the code: You can say OK I just want to trace the 192 00:22:47,890 --> 00:22:55,519 code in my section 1 or section 2 or section N. Then the PTM is also able to 193 00:22:55,519 --> 00:23:00,100 make some Branch Broadcasting. That is something that was not present in the 194 00:23:00,100 --> 00:23:06,919 Linux kernel. So, we already submitted a patch that was accepted to manage the 195 00:23:06,919 --> 00:23:14,309 Branch Broadcasting into the PTM. And we can do some timestamping and other things 196 00:23:14,309 --> 00:23:22,250 just to be able to store the information in the traces. Basically, what a trace 197 00:23:22,250 --> 00:23:27,340 looks like? Here is the most simple code we could had: it's just a for loop 198 00:23:27,340 --> 00:23:35,570 doing nothing. The assembly code over there. And the trace will look like this. 199 00:23:35,570 --> 00:23:45,070 In the first 5 bytes, some kind of start packet which is called the A-sync packet 200 00:23:45,070 --> 00:23:50,390 just to say "OK, this is the beginning of the trace". In the green part, we'll have 201 00:23:50,390 --> 00:23:56,460 the address which corresponds to the beginning of the loop. And, in the orange 202 00:23:56,460 --> 00:24:02,700 part, we will have the Branch Address Packet. You can see that you have 10 203 00:24:02,700 --> 00:24:08,299 iterations of this Branch Address Packet because we have 10 iterations of the for 204 00:24:08,299 --> 00:24:18,679 loop. This is just to show what is the general structure of a trace. This is just 205 00:24:18,679 --> 00:24:22,720 a control flow graph just to say what we could have about this. 
Of course, if we have another loop at the end of this control flow graph, the trace just gets a bit longer, to include the information about the second loop, and so on. Once we have all these traces, the next step is: I have my tags, but how do I define the rules to propagate them? This is where we use static analysis. Basically, in this example, if we have the instruction "add register1 + register2 and put the result in register0", static analysis allows us to say that the tag associated with register0 will be the tag of register1 OR the tag of register2. Static analysis is done before running my code, so that I have the rules for all the lines of my code. Now that we have the trace and we know how to propagate the tags all over my code, we do the static analysis in the LLVM backend. The final step is about instrumentation. As I said before, we can recover all the memory addresses we need through instrumentation. Alternatively, we can get only the register-relative memory addresses through instrumentation. In the first case, on this simple code, we can instrument all the code, but the main drawback of this solution is that it drastically increases the execution time.
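The per-instruction rule generation can be sketched as below. This is a simplification of mine of what a static-analysis pass (in the talk, an LLVM backend pass) would emit; the textual assembly syntax, the opcode list and the rule format are illustrative assumptions, not the project's actual output.

```python
# Sketch of the static-analysis step: derive a tag-propagation rule for
# each instruction *before* the program runs. At runtime, the DIFT
# coprocessor only has to apply these precomputed rules as the trace
# tells it which instructions executed.
def rule_for(instruction):
    op, dst, *srcs = instruction.replace(",", "").split()
    if op in ("add", "sub", "orr", "eor", "mul"):
        # tag(dst) = tag(src1) | tag(src2): the result is tainted
        # if any source operand is tainted.
        return f"tag({dst}) = " + " | ".join(f"tag({s})" for s in srcs)
    if op == "mov":
        return f"tag({dst}) = tag({srcs[0]})"
    raise NotImplementedError(op)

program = ["add r0, r1, r2", "mov r3, r0"]
for insn in program:
    print(rule_for(insn))
# tag(r0) = tag(r1) | tag(r2)
# tag(r3) = tag(r0)
```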
Otherwise, what we can do, with the store instruction over there, is get data from the trace: basically, we will use the Program Counter from the trace. Then, for the Stack Pointer, we will use static analysis to get its information. And, finally, we need only one instrumented instruction at the end. If I go back to this system, the communication overhead will be the main drawback, as I said before: with the processor and the FPGA running as different parts, the main problem is how we can transmit data in real time or, at least, at the highest speed we can, between the processor and the FPGA. This is the time overhead with CoreSight components enabled or not. In blue, we have the baseline execution time with traces disabled, and we can see that, when we enable traces, the time overhead is nearly negligible. Regarding instrumentation time, we can see that with strategy 2, which uses the CoreSight components, static analysis and instrumentation, we can lower the instrumentation overhead from 53% down to 5%. We still have some overhead due to instrumentation, but it's really low compared to the related works where all the code was instrumented.
This is an overview showing, in the grey lines, the overhead of related works with full instrumentation; we can see that, with our approach (the green lines over there), the time overhead of our code is much, much smaller. Basically, how can we use TrustZone with this? This is just an overview of our system, and we can use TrustZone to separate the CPU from the FPGA coprocessor. If we make a comparison with related works, we can see that, compared to the first works, we are able to do information flow control with a hardcore processor, which was not the case with the first two works in this table. It means you can use a standard ARM processor to do the information flow tracking, instead of needing a specific processor. And, of course, the area overhead, which is another important topic, is much, much smaller compared to the existing works. It's time for the conclusion. As I presented in this talk, we are able to use the PTM component to obtain runtime information about my application. This is non-intrusive tracing, because we have negligible performance overhead. And we also improve software security, because we were able to add some security to the coprocessor.
The future perspective of this work is mainly to work with multicore processors, and to see if we can use the same approach for Intel and maybe ST microcontrollers, to see if we can also do information flow tracking in those cases. That was my talk. Thanks for listening.

*applause*

Herald: Thank you very much for this talk. Unfortunately, we don't have time for Q&A, so please, if you leave the room, take your trash with you; that makes the angels happy.

Pascal: I was a bit long, sorry.

Herald: Another round of applause for Pascal.

*applause*

*34c3 outro*

subtitles created by c3subtitles.de in the year 2020. Join, and help us!