Dear viewer, these subtitles were generated by a machine via the service Trint and therefore are (very) buggy. If you are capable, please help us to create good quality subtitles: https://c3subtitles.de/talk/1388 Thanks!

So, welcome here on the Chaos West Stage, to the very first talk of the morning, at 11. I am very happy to introduce to you Lars. He previously worked at the data specialists QuantCo and is now working for the health think tank of the German Ministry of Health. He will give you a talk about access to health care and how to improve life with health data. Welcome, Lars.

Thank you very much. Good morning, everybody. Thanks for making it out so early. It's my first Congress and I'm a little overwhelmed with all the things that are happening, especially at night. So I really appreciate you all being here so early. My name is Lars. I'm a data scientist by training, and now I have switched to bureaucracy, to policy, doing a policy stint.
My life kind of looks like this right now: I work in the Federal Ministry of Health, in a think tank that is advising policy, bringing in new ideas from the outside and informing the rest of the world about what's going on inside the ministry. And I lead the efforts on artificial intelligence.

Today I'm here to talk to you about what to do with health care data, because I think there is a lot of talent missing in health care that technical people, that hackers, have. And I think there's a lot of knowledge missing in the hacker community about what to do with this kind of data. So I'm trying to address five points today, to get you acquainted with some of the stuff that is happening and to get you started.

Is anyone here a doctor? Can you raise your hand? Excellent, zero.

This is a pathologist. What pathologists do is not what you see in crime shows. What pathologists typically do is look for cancer.
So whenever a doctor suspects cancer in a patient, at some point in their journey they will cut a piece of tissue from the patient and send it to a pathologist. The pathologist will harden it with special chemicals, cut a thin slice of it, and look at it under a microscope like that one. And they will basically look for cancer cells in the tissue.

The whole process seems very scientific: it smells like chemicals, there are microscopes involved. It's fairly opaque; even the doctor sending in the original tissue doesn't normally quite understand what's happening. And in the end, the pathologist will answer one question: is it cancer? And if yes, how much? Then the doctor who sent the original tissue will work with that information.

As data people, we're pretty familiar with this kind of thing, because it's a black box, right?
Input data comes in, black-box processing happens, here in the form of the pathologist, and then some output comes back. So I thought it's pretty natural, as data people, to ask: hey, what is the accuracy of this black-box algorithm in the form of this pathologist?

Turns out that's kind of a wild question in health care. And I found the answer to that question fairly surprising, and I want to share it with you.

This is a somewhat extreme example, and very illustrative, but it is in no way out of the ordinary. It's a study done at a German university hospital in Hamburg, where they compared the diagnoses that came in from the outside world against an expert panel of several high-ranking pathologists looking at the same tissue: what did the experts say versus what did the original pathologists say? So we're basically comparing this kind of confusion matrix,
if you've heard that term before: we're comparing the original prediction of a real pathologist with the closest we can get to ground truth, which is a consensus opinion of experts.

And the results, I found, were relatively shocking. Most medical diagnoses fall on a spectrum from, sort of, "oh good" to "oh god". This example is about prostate cancer. If it's anything worse than "oh good", so if it's to the right of "oh good", then your prostate will probably be removed to keep the cancer in check. And in the event that you have a prostate, about half of us do, you kind of want to keep it. Having your prostate removed is not a very pleasant experience. It's a very unpleasant surgery to have, and it leaves a lot of damage in the body. So having a wrong diagnosis here is important.
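The study's comparison can be pictured as a small confusion matrix: original calls tallied against the panel consensus. Here is a minimal sketch in Python; the grade names and the sample data are entirely invented for illustration:

```python
# Toy sketch (invented data): tally each original pathologist grade
# against the expert-panel consensus grade for the same tissue sample.
# Grades run from "oh good" (benign) to "oh god" (high grade).
GRADES = ["benign", "low", "intermediate", "high"]

original  = ["benign", "low", "low", "high", "intermediate", "benign"]
consensus = ["benign", "low", "intermediate", "high", "low", "low"]

# Confusion matrix: rows = consensus (the closest thing to ground truth),
# columns = the original call.
matrix = {(c, o): 0 for c in GRADES for o in GRADES}
for o, c in zip(original, consensus):
    matrix[(c, o)] += 1

# Exact agreement is the mass on the diagonal (the "green bars").
agreement = sum(matrix[(g, g)] for g in GRADES) / len(original)
print(f"exact agreement: {agreement:.2f}")  # exact agreement: 0.50
```

The interesting cells are the off-diagonal ones above the treatment threshold: those are the cases where the original call and the consensus disagree about whether surgery is needed.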
The first thing you see is that about two thirds of all diagnoses, the green bars in the middle, are exactly correct, which means that one third of diagnoses are actually not exactly correct. And you can say, hey, maybe that doesn't really matter. But what does matter are the cases from "oh good" upwards, because that's where the difference between getting surgery and not getting surgery lies. And what this data shows is that among these 5,400 patients, of those who by consensus opinion did not have prostate cancer, or did not have an immediate need to treat it, one third had their prostate removed without the need to do that.

So that's a pretty drastic outcome for patients, one that you would really like to avoid. Now, the question I asked myself is: if this were the result of an algorithm, a machine learning algorithm or some other algorithm, would this be acceptable performance?
Maybe not. But for humans, it's the best we can do. Still, I think being more transparent about how humans are just humans, and maybe not particularly well suited to looking at pictures all day, would be a good thing: having more clarity in the health care system about what diagnoses actually mean. So this is my starting example.

In the medical community, this is called inter-observer variability: different observers look at the same tissue, and their diagnoses vary depending on who looks at it. Intra-observer variability, where you show the same picture to the same doctor at a different time of day, also exists, and it is not much better.

The example I showed you is very illustrative, but as I said, this happens in a lot of medical fields, not only for prostate cancer but for all kinds of cancer, for all kinds of illnesses. And again, it shouldn't be surprising. Doctors are not bad people. Doctors are just humans, like us all. And so mistakes happen.
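Observer variability like this is usually quantified with chance-corrected agreement statistics. As a hedged illustration (the labels below are invented, and real studies on ordered cancer grades would more likely use a weighted variant), here is Cohen's kappa for two raters in plain Python:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance if both raters labelled independently
    # at their own marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[k] * freq_b.get(k, 0) for k in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Invented labels for two pathologists reading the same six slides.
a = ["cancer", "benign", "cancer", "benign", "cancer", "benign"]
b = ["cancer", "benign", "benign", "benign", "cancer", "cancer"]
print(round(cohens_kappa(a, b), 3))  # 0.333
```

A kappa of 1 means perfect agreement, 0 means no better than chance; raw percent agreement alone overstates how well two readers actually concur.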
This is one example of a question that I think can be answered with data and that is currently maybe not explored enough. Another example would be: who gets treatment, and when? Who has access to doctors? How long do they have to wait for appointments, that kind of thing.

How unhealthy are waiting rooms? People wait at the local doctor's office to be seen, often for hours, together with many sick people. How unhealthy is that, really?

Does specialization into more and more specialized doctors' practices actually improve quality of care? Or does it maybe detract from it, because people lose oversight?

Can we predict the best course of treatment for a given individual, that is, personalized medicine?

Or: does the probability that a given patient receives a hip surgery, so a new hip, depend on the reimbursement the doctor gets, i.e. the price tag associated with that surgery?
These are all questions that you would like to answer with data. And the problem in health care is what I call the great irony of health care data, which is roughly this: the more legitimate the use would be, the harder it is for those actors to get access to the data. And, you know, the converse holds as well.

As a case in point: in Germany, in the year 2020, most patients still do not have an official plan of their medication, of what meds they are taking. This is important because different medications have cross effects and side effects; when patients are hospitalized, you want to know what they're taking, and when you prescribe a new medication, you want to compare it to what they're taking already. The really active, really engaged patients in Germany write these little papers by hand, where they write down what has been prescribed to them at some point, and just carry that piece of paper with them.
And then, if they go into hospital and come out of hospital again, they would have to update that piece of paper. Maybe they do, maybe they don't. It's just not a good system. So as a doctor working with patients, every time I see a new patient, I have a really hard time finding out what they're actually taking already and what I can prescribe to them.

Then we have this phenomenon of consumer electronics companies increasingly going into the health data space, in a very smart way, and I think there are a lot of good things happening here. But they are basically declaring their health care products to be lifestyle products, thereby circumventing a lot of those pesky health data privacy problems: they are saying this is not actually health data, it's lifestyle data, and they are getting more access. And you can sort of think: why does Apple need to have health data? At the same time, I think a lot of good things are happening here.
Patients are becoming more and more empowered to make their own decisions. They're learning more and more, and I think overall it's a good thing. The worrying part about this is that traditional health care providers might actually lose track of, or sort of lose the connection to, the consumer electronics competition.

And then finally, as we heard yesterday, health data does get lost, and it's a terrible thing. This is probably the worst story I've ever heard about: Singapore lost a whole database of people with HIV, and those people were extorted over it. You really don't want that to happen.

So what can be learned from the hacker community, and what are the things I want to talk about here? First of all, I want to say very clearly: keep hacking. You don't want to lose this data.
And I think having people like this community being white-hat hackers and making sure that people stay on their toes is very, very important for the safety and security of our health care data, because I don't think we will go back to a world where this is all on paper. This will be digitalized, and we will need to make sure that it's safe.

The second part I want to talk about is a little more subtle: data privacy under the GDPR. You have this idea of consent, so patients can donate their data; patients can say, you can use my data to do something. And this conversation currently, especially in Germany but all over Europe, is very much driven by the distinction between narrow consent and broad consent. Narrow consent is this idea that, as a data subject, I can say very specifically what can and cannot be done with my data.
And so I need to allow every single step of processing my data, whereas broad consent would say: hey, my data can be used for all kinds of cancer research. And the issue that I see is that the idea of narrow consent is beautiful in theory, it's a great legal idea, but in reality it leads to situations where, for example, a patient is asked to sign the data usage form in the waiting room of a hospital, or by a study nurse. I don't think that realizes the idea of freedom we had in mind when we first said we want their consent, because you have this power dynamic: the patient wants treatment, so this is not a free choice of what to do with their data.

And I think there's a tradeoff here between narrow consent, being very specific about what can be done with the data, and the user experience of how you can give that consent. And I think we should be clearer about the chances we would have with a broader idea of consent.
But having a better user experience is something that would be extremely helpful in health data and in health care. It's somewhat similar to the cookie banners you get everywhere: having one switch per tracking cookie, at least for me, doesn't really help me much. I want to say cookies yes or cookies no, with a very simple one-click solution.

And then finally, I want to give you some ideas of legitimate uses, of things that you can do with health care data, maybe as a kind of open-data-style project that you can work on to answer, for example, the questions that I started with. So, to get you started on something, I thought I would tell you about a few data sets that I think are cool and allow some hacking. The first ones are these two. On the left, you see a famous dataset for pathology.
So, back to what we started with: this is what human tissue looks like when you cut it, dry it, cut thin slices of it, stain it and look at it under a microscope; those are the pictures on the left. You get that together with a diagnosis, and you can train computer vision algorithms on it, to hopefully improve on the quality of diagnosis that we currently have. On the other side, just to show you another example, these are pictures of birthmarks, and the question is: is that melanoma, that is, skin cancer, or not? Both of these datasets are relatively well studied and great places to start if you're into computer vision.

Then, this is a bit of a Germany-specific thing, because health care is organized nationally and different countries will have different data sets, but here are two data sets that I think are very interesting and probably underutilized.
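To make the "train a classifier on labelled images" idea concrete, here is a deliberately tiny stand-in: a nearest-centroid classifier on fake four-pixel grayscale patches. Real pathology models are deep networks trained on gigapixel slides; this only shows the supervised-learning shape of the problem, and all data is invented:

```python
# Nearest-centroid toy classifier on flattened 2x2 "images".
def centroid(images):
    n = len(images)
    return [sum(img[i] for img in images) / n for i in range(len(images[0]))]

def predict(img, centroids):
    # Choose the label whose class centroid is closest (squared distance).
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist(img, centroids[label]))

# Invented toy data: "tumor" patches are darker (lower pixel values).
train = {
    "healthy": [[0.9, 0.8, 0.9, 0.9], [0.8, 0.9, 0.8, 0.9]],
    "tumor":   [[0.1, 0.2, 0.1, 0.2], [0.2, 0.1, 0.2, 0.1]],
}
centroids = {label: centroid(imgs) for label, imgs in train.items()}
print(predict([0.15, 0.1, 0.2, 0.15], centroids))  # tumor
```

The pathology and melanoma datasets mentioned above pair each image with a diagnosis label in exactly this way; only the model in between gets more sophisticated.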
The first is the so-called DRG browser, where you are actually intended to download software which then allows you to see certain aggregate statistics about which billing codes and which procedures are used. So it tells you a lot about what health care is rendered in Germany, and in which numbers. The example here is actually birth: under the billing code for birth, what procedures were done, what specific procedures were happening? You can download that software online. It turns out that all the data the software displays is just plain files sitting in the install directory; you don't even need to install it. But it's very interesting data to get your fingers dirty on.
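Aggregate billing-code tables like these are easy to explore with nothing but the standard library. The column names and numbers below are invented for illustration, assuming a CSV-like layout; they are not the real file format:

```python
import csv
import io

# Hypothetical extract: procedures recorded under one billing code.
raw = """drg_code,procedure,count
O60A,spontaneous delivery,1200
O60A,epidural anaesthesia,700
O60A,caesarean section,450
"""

rows = list(csv.DictReader(io.StringIO(raw)))
total = sum(int(r["count"]) for r in rows)
for r in rows:
    share = int(r["count"]) / total
    print(f'{r["procedure"]:24s} {share:.1%}')
```

In practice you would point `csv.DictReader` at the files found in the install directory instead of an inline string.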
And the other thing is the so-called quality reports (Qualitätsberichte), something that all hospitals have to publish, and somehow I don't think this has gotten much attention in the open data community yet. It gives you a lot of data about what types of procedures and what types of diagnoses hospitals are seeing, and then some. I don't think it is necessarily super helpful for the quality metrics that are published there, but it tells you a lot about what kind of patients hospitals are seeing, and if you pool this with other data, like census data, I think there are a lot of interesting things you can do here.

This is another example, to get you thinking a little bit out of the ordinary. On the left, you can see that certain cities have published the availability of their emergency rooms in different hospitals. So basically, for a given hospital in a row, per time of day, you can see how much room they have in their ER for specific diagnoses.
So you can say: hospital A is getting flooded with gynecology cases, but they're doing okay for bone fractures. And you can do all kinds of things with this, like trying to build predictive models on it. Maybe somebody wants to archive the data somewhere. I think there are a lot of interesting things you could do with this type of data; of course, originally it's intended to be used by ambulances to see which hospital to drive to.

And then finally, on the right, is a very well-studied example from the US, probably one of the richest data sets we currently have in health care: MIMIC is a data set of patients in a hospital system on the East Coast that contains almost everything that happens in a hospital, from lab results to doctors' notes to medication. All of these things that in Germany we don't really have access to are in this dataset for research, in a de-identified manner.
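The ER availability feed mentioned above invites simple predictive models. As a hedged sketch with invented records, here is the most naive predictor imaginable: free capacity for a given hour of day estimated as the historical mean for that hour:

```python
from collections import defaultdict

# Invented observations: (hour of day, free ER beds seen at that time).
history = [
    (8, 5), (8, 7), (8, 6),
    (20, 1), (20, 2), (20, 0),
]

by_hour = defaultdict(list)
for hour, free in history:
    by_hour[hour].append(free)

def predict_free_beds(hour):
    """Predict free beds as the mean of past observations for that hour."""
    observations = by_hour[hour]
    return sum(observations) / len(observations)

print(predict_free_beds(8), predict_free_beds(20))  # 6.0 1.0
```

Anything smarter, say, features for weekday and diagnosis category, would already be a useful baseline for ambulance routing questions.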
This dataset is not fully public, but it is extremely easy to get access to, so it's another interesting case to dig into and see what's possible. There are some privacy concerns, but given that this dataset is already widely used in research and there are a lot of papers about it, I think this is a relatively innocent example to start with.

Now, with all health care data there are a lot of issues, and I want to talk about them briefly. I see three main issues. The first is that ground truth is often lacking; it's unclear what ground truth actually is. If you remember the example I started with: you have the original doctor's diagnosis, and now you have a consensus diagnosis. Is that consensus necessarily better? Why is it better if five doctors agreed on it? Maybe they were wrong and the original doctor was right. It's very hard to really find out what the right diagnosis would have been.
And I think a lot of research is lacking in that regard. This carries over to a lot of different data points, where it's extremely hard in health care to just trust the data that you have.

Second, semantics are surprisingly difficult in health care. What this means is: somewhere in a hospital there is a database of lab results, for example, and there's just no standard for whether they use milligrams per liter or grams per cubic centimeter; there are all these different options. This one example sounds trivial, but this is all over the place. How do you store a certain diagnosis? Is it "the flu", or is it "influenza"? Is it "influenza A"? There are just many different ways to code this stuff, and the semantics, which means the mapping from whatever people use to something you, as a data person, can actually work with, are still, I would say, in their infancy. And then finally, and this leads me to my last two points:
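The semantics problem just described can be made concrete with a tiny normalization layer. The unit factors below follow from the metric definitions (1 g/cm³ = 1,000,000 mg/L); the synonym table is a deliberately coarse illustration, not a real terminology standard like ICD-10.

```python
# Sketch of a "semantics" layer: normalize lab units and diagnosis labels
# to one canonical form before analysis. Mappings are illustrative only.
TO_MG_PER_L = {
    "mg/L": 1.0,
    "g/L": 1_000.0,
    "g/cm3": 1_000_000.0,  # 1 g/cm^3 = 1000 g/L = 1,000,000 mg/L
}

DIAGNOSIS_SYNONYMS = {
    "flu": "influenza",
    "influenza": "influenza",
    "influenza a": "influenza",  # coarse; real code systems are far finer
}

def normalize_lab(value, unit):
    """Convert a lab value into canonical mg/L."""
    return value * TO_MG_PER_L[unit]

def normalize_diagnosis(label):
    """Map a free-text diagnosis label to a canonical term."""
    return DIAGNOSIS_SYNONYMS[label.strip().lower()]

print(normalize_lab(0.002, "g/cm3"))       # 0.002 g/cm^3 -> 2000.0 mg/L
print(normalize_diagnosis("Influenza A"))  # -> "influenza"
```

In practice this mapping work is where much of the effort in health data projects goes, which is exactly the "infancy" the talk points at.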
there's a lot of sampling bias, and resulting issues of representativeness. Deep down, I think this comes from the fact that health care data is personal data. You can't just go and get access to all health care data for all patients. So often you have these cases where some dataset surfaces, like the MIMIC dataset, and then that's all you have and you work with that. That leads to a lot of representativeness problems in medical research, but I think in a lot of health care data projects in general, too.

Here is one example: a paper published in Nature this year, which is very impressive. They predicted some really cool things, but the only data they had was from the Veterans Affairs system in the US, a separate hospital system for US veterans. And as you can predict, it's ninety-four percent male.
So if you learn algorithms on data that is 94 percent male, generalization error will be an issue. And you would probably like these algorithms to work as well on women as on men. At the same time, what else are you going to do? This is the only data that's available, so you can't really blame them, but it's just an issue that exists in health care and that you need to take into account.

There's one more thing this implies: it's very hard to certify that a certain medical device works as intended, because the regulatory bodies also don't have fully representative test data. You basically have to believe the manufacturer or vendor of a medical device, such as an algorithm, that they used appropriate test data and representative studies. Now, traditionally in pharmaceutical research, you run these randomized controlled trials, i.e.
you have a status quo and you have a new drug. You randomly give the new drug to 50 percent of the patients, you use the status quo on the remaining half, and then you compare the two groups. It's still the gold standard in evidence, but extremely hard to do in a representative way, because you need to recruit patients to voluntarily take part in a trial. It's expensive, and sometimes it might not be possible at all: for example, you might be interested in how cancer drugs work in children, but we think that running medical trials on children is unethical. So how are we ever going to get to a point where we can learn about efficacy in children?

And I think a really cool opportunity for evidence that we have with algorithmic medical devices is that we can collect test data as a regulatory body.
And this is something that is currently being discussed internationally, by the World Health Organization and on the European level with the new Commission, and that we are doing a lot of work on: the idea that governments and regulatory bodies should collect test data in a representative manner. So what would happen is: in the top row you see the normal workflow, here for deep learning, but it could really be anything. There's public data, there's private data; people train their algorithms on it, and in the end they come up with a model, on the far right. And then the regulatory bodies, say on the European level, would have a test dataset that is actually secret, that is not to be shared with anyone, and that meets high quality standards. This can be achieved, for example, by mandating that you collect this data from hospitals all over Europe: you just say hospitals have to submit, say, one percent of the images they have to the secret body.
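One way the "submit one percent" rule could work mechanically is deterministic sampling: hash a stable record ID so that every hospital reproduces the same selection without coordination. This is a sketch of that idea under my own assumptions; the talk does not prescribe any particular sampling mechanism, and the ID format here is invented.

```python
import hashlib

def selected_for_test_set(record_id: str, fraction: float = 0.01) -> bool:
    """Deterministically select ~`fraction` of records by hashing a stable ID.

    Hashing maps each ID to a pseudo-uniform value in [0, 1); comparing it
    against `fraction` yields a reproducible ~1% sample.
    """
    digest = hashlib.sha256(record_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < fraction

ids = [f"hospital-A/image-{i}" for i in range(100_000)]
sample = [i for i in ids if selected_for_test_set(i)]
print(len(sample))  # close to 1% of 100,000
```

A hash-based rule also supports the recertification idea mentioned later: rerunning it on next year's records keeps extending the same held-out set over time.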
And now I have representative data that I'm not sharing, and they can use this as test data. I can also keep doing this over time, so that I can account for population shift. For example, in skin cancer, you could say that in Germany the average patient in the next 10 years might have a slightly darker skin tone than the average patient in the last 10 years, because of migration patterns. So if you're working with algorithms, you want to account for this population shift, and you want to make sure that the algorithm you certified looking back on test data from the last 10 years will still work on the new patients of the next 10 years. And so this would be a way to keep collecting this data and recertifying algorithms.

Which finally leads me to my last point before we open to questions. What you have here is just accuracy in the end, but this doesn't really answer the question of what fairness means.
I mentioned earlier that you want an algorithm that was trained on 94 percent men to work as well on women. But what does "as well" actually mean? Is a difference of two percentage points good enough? Is four percent good enough? It's not super clear from the outset what fairness means in these settings.

And so I think this is a broader issue that we should discuss as a community, beyond health care, which is quantitative fairness: the idea that more and more decisions are made by algorithms instead of people. Traditionally, when you look at discrimination and fairness, you tried to empathize with the actor behind the judgment. In the end, a judge would ask: did he mean to discriminate, or did he not mean to discriminate? With machines, that doesn't really work anymore. So we need new ideas to describe what fairness looks like and what discrimination looks like.
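One family of such ideas makes "as well on women as on men" measurable by comparing error rates across groups, in the spirit of equalized odds. The sketch below computes the gap in true-positive rate between two groups on invented toy labels; it illustrates the style of metric, not any specific regulatory standard.

```python
# Quantitative fairness sketch: compare true-positive rates across groups.
# All data below is invented for illustration.
def true_positive_rate(y_true, y_pred):
    """Fraction of actual positives that the model correctly flags."""
    positives = [(t, p) for t, p in zip(y_true, y_pred) if t == 1]
    return sum(p for _, p in positives) / len(positives)

def tpr_gap(y_true, y_pred, group):
    """Absolute difference in true-positive rate between groups 'm' and 'f'."""
    def for_group(g):
        yt = [t for t, gr in zip(y_true, group) if gr == g]
        yp = [p for p, gr in zip(y_pred, group) if gr == g]
        return true_positive_rate(yt, yp)
    return abs(for_group("m") - for_group("f"))

# Hypothetical model: catches 3 of 4 sick men but only 1 of 2 sick women.
y_true = [1, 1, 1, 1, 0, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0]
group  = ["m", "m", "m", "m", "m", "f", "f", "f"]
print(tpr_gap(y_true, y_pred, group))  # 0.25
```

Once the gap is a number, the policy question from the talk becomes explicit: is a gap of 0.02 acceptable, or 0.04, and who decides?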
And approaches to quantify the fairness of a decision exist; the literature is relatively mature, and this is one overview. But somehow in Europe we don't really have that debate yet, and I think it's long overdue that we start having it, for health care and beyond.

So, my talk in a nutshell. First, keep hacking. Second, think about what informed consent really means for data. Third, contribute to legitimate uses of the data; the questions I showed you are a starting point, and reach out if I can help with anything, to connect you with data or ideas. Fourth, demand evidence for medical devices; I think trying to seize the opportunity we have to get better evidence is a really big chance. And fifth, promote quantitative fairness. Thank you very much.

Thank you, Lars, for your talk.
So we actually do have a lot of time for questions. We have two microphones in the hall and the signal angels, so please line up at the microphones, and we'll start off with the questions from the internet. Signal angel, please.

The internet wants to know whether you or your employer, the ministry you mentioned, is involved in any kind of work on checklists. There was a book by Atul Gawande, "The Checklist Manifesto", and he argues quite convincingly for improving certain procedures by introducing checklists. Do you know anything about that?

Yeah, I'm familiar with the literature. It's not one of our main topics, because we focus on digitalization. The checklist idea is pretty, pretty convincing.
Actually, the idea was that in aviation, pilots were at some point required to fill out these checklists before takeoff and during flight and say: yes, I did check this, I did check this, I did check this, and sign it. Pilots hated it, because it makes you do things that you don't really want to do. They were forced to do it, and it dramatically reduced incidents in aviation. And now Atul Gawande argues that the same thing should hold for doctors, because doctors also hate the idea of being subjected to processes, using checklists to reduce the number of incidents that happen. The idea is relatively old and I think has been taken up in medical guidelines a lot, but we don't specifically work on that, because we focus on digital aspects.

OK, microphone one, please.

How would you legislate, or decide, what kind of consent would be useful for medical data?

Currently? I mean, there is legislation around.
It basically comes down all the way from the GDPR to the German legislative bodies. Currently, health data is deemed a special category of personal data, and so you need very specific, narrow consent. One option that you have for health data in particular is that the GDPR actually allows for exceptions to this idea of consent: you can say that if an important societal interest stands against your private interest in keeping your data personal, this data can be made available for research use. So specifically for research, this is something that can be done. But I think, because it's so sensitive, you want to give people the opportunity to still opt in, maybe in a different, broader way, or to opt out.
And I think that's why I'm talking about broad consent: the consent mechanism we currently have is too narrow, there would be ways to go broader, and that would significantly facilitate access to this data for health care.

OK, microphone two.

So, you mentioned at the beginning this patient who was diagnosed with cancer, but didn't have cancer, and was operated on. I don't think that was the fault of the doctor, because this reduces to statistics. You have the precision-recall trade-off: if you want only the patients that truly have cancer, you have high precision, but you miss a lot of patients that do have cancer. Or, if you want to catch all patients with cancer, you will also get a lot of people that don't have cancer. And the same problem you also have with machine learning.
So you have the precision-recall trade-off, and this reduces to actually two problems: you need better measuring devices, devices that reduce the error and the variance, and of course more data as well. So this is not the problem of the doctors; it's statistics.

First of all, I agree with you: it's not the problem of the doctor. They're doing a fine job, the best job they can. But as data people, if you look at this naively, it's kind of weird that you have humans looking at pictures all day who are expected to diagnose 100 percent correctly. All day, every day, they just look at pictures. That's not what humans are good at. So it shouldn't be surprising that mistakes happen, but it's not their fault; they're just human. The second part: yeah, you're right, there's precision and recall.
But what I'm arguing is that humans have a poor AUC in their predictive qualities, and we would want better performance from this type of algorithm, which would help people.

Is there a follow-up question to that?

Yeah. Where do you set the threshold then? Do we want more precision, or do we want more recall?

You want a higher AUC.

OK. Is there a question from the internet, signal angel?

Yes, Twitter wants to know: do you think that using algorithms on the existing health data can prevent gender bias in diagnosis? Or isn't there enough knowledge yet about how diseases show up in women?

Sorry, could you repeat the second half of the question?
To repeat: given that we already know that certain diseases show up differently in women than in men, do you think that this data and these algorithms can help discover more such things, and change how diagnoses are provided, or how accurate they are?

Yes, absolutely. I think, first and foremost, medical practice often does not yet live in the year 2020. A lot of gender bias exists implicitly in how treatment is still delivered. So first of all, I think using existing health care data is helpful to show these biases and to demand that they be reduced. And second of all, I mean, I can't predict the future, but I could imagine that you find all types of understudied groups of patients who, maybe because historically they didn't sign up to be part of RCTs, randomized controlled trials, were understudied, and you're going to find gaps in the care that these people receive.
Women are one example, but I think ethnic minorities will be another, and age groups will be another. I would expect a lot of these types of insights to surface.

OK, microphone one.

A really short question: you mentioned this German quality assurance dataset. What does it contain? Besides admission rates, does it contain outcomes, or risk adjustment, etc.?

You mean the DRG dataset? That dataset actually only contains aggregate data on the total episode, the total episode as billed: there's a billing code, and then there are procedures linked to that. Other datasets exist, some of them private, some of them with public access. But I think if you want more detail on all the admission data, you're moving into a world of currently private data or protected research data, because it's very hard to anonymize this data, and so there's always a risk of re-identifying it.

OK.
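The re-identification risk just mentioned is commonly reasoned about with k-anonymity: how small is the smallest group of records sharing the same quasi-identifiers (age band, postcode, diagnosis)? A minimal sketch, with invented records:

```python
from collections import Counter

# If a combination of quasi-identifiers is unique (k = 1), that record can
# often be linked back to a person. All records here are invented.
records = [
    ("40-49", "10115", "influenza"),
    ("40-49", "10115", "influenza"),
    ("70-79", "10115", "rare_cancer"),  # unique combination: k = 1
]

def k_anonymity(rows):
    """Size of the smallest equivalence class over the quasi-identifiers."""
    return min(Counter(rows).values())

print(k_anonymity(records))  # 1 -> at least one record stands out
```

This is why even "aggregate" episode data is kept as protected research data: rare diagnosis codes routinely produce k = 1 cells.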
So, microphone one again.

Thanks. Thanks for your talk. I like the idea quite a lot of building up representative datasets, obviously. But the question in my mind is: what you are achieving is a representative dataset, but not necessarily also a high-quality dataset. Especially if you force this data collection upon hospitals, you could end up with a really messy dataset. What are your thoughts about this?

Very good point. I think it's not enough to just collect this data; you would also need to actually invest in the quality of this data. So you would need to hire an expert panel to improve the data that you're collecting, which means it's a very expensive overall process. You will not be able to do this for every type of diagnosis that you're interested in, but only for the bigger fields that are becoming more and more mature.
1015 00:35:50,570 --> 00:35:52,010 But one example, I think: 1016 00:35:53,030 --> 00:35:54,949 mammography for breast cancer 1017 00:35:54,950 --> 00:35:57,289 screening is 1018 00:35:57,290 --> 00:35:58,319 relatively mature. 1019 00:35:58,320 --> 00:36:00,409 The technology is there, several companies 1020 00:36:00,410 --> 00:36:02,629 are going on the market now, and I think 1021 00:36:02,630 --> 00:36:04,019 that will be one field where 1022 00:36:04,020 --> 00:36:06,319 starting to collect international 1023 00:36:06,320 --> 00:36:08,389 test data will be very feasible and 1024 00:36:08,390 --> 00:36:09,390 probably worth it. 1025 00:36:11,690 --> 00:36:13,879 OK, I'm looking at the 1026 00:36:13,880 --> 00:36:14,809 signal angels. 1027 00:36:14,810 --> 00:36:16,729 Are there questions from the internet? 1028 00:36:16,730 --> 00:36:18,899 That doesn't seem to be the case, you 1029 00:36:18,900 --> 00:36:20,989 know. OK, and microphone 1030 00:36:20,990 --> 00:36:21,990 two, please. 1031 00:36:23,570 --> 00:36:25,519 So, looking at this from the perspective 1032 00:36:25,520 --> 00:36:27,679 of the individual whose data is supposed 1033 00:36:27,680 --> 00:36:29,300 to be part of these datasets in the end: 1034 00:36:30,860 --> 00:36:32,509 is there any work being done, or do you 1035 00:36:32,510 --> 00:36:34,609 have ideas on how to, sort of, 1036 00:36:34,610 --> 00:36:36,679 say, soften the blow if your data 1037 00:36:36,680 --> 00:36:38,179 becomes public? 1038 00:36:38,180 --> 00:36:40,539 This could be either unintentionally, 1039 00:36:40,540 --> 00:36:42,409 for example through data leaks, or 1040 00:36:43,490 --> 00:36:45,739 also if I just agreed to have my data 1041 00:36:45,740 --> 00:36:48,199 be part of a public dataset. 1042 00:36:48,200 --> 00:36:50,089 What kind of protections do I have as 1043 00:36:50,090 --> 00:36:52,249 an individual in 1044 00:36:52,250 --> 00:36:53,599 that situation?
1045 00:36:53,600 --> 00:36:55,699 Because I think that 1046 00:36:55,700 --> 00:36:57,829 is obviously one big reason why 1047 00:36:57,830 --> 00:36:59,629 you don't have access to sensitive data: 1048 00:36:59,630 --> 00:37:01,249 because it is sensitive, and it is 1049 00:37:01,250 --> 00:37:03,349 often sensitive for 1050 00:37:03,350 --> 00:37:05,629 reasons that are maybe 1051 00:37:05,630 --> 00:37:07,130 something you could address with policy. 1052 00:37:08,760 --> 00:37:10,819 Um, yes and no. 1053 00:37:10,820 --> 00:37:12,589 So first of all, I think what we're 1054 00:37:12,590 --> 00:37:14,689 actually lacking 1055 00:37:16,850 --> 00:37:18,829 in Germany is data misuse legislation. 1056 00:37:18,830 --> 00:37:21,149 So penalized protection 1057 00:37:21,150 --> 00:37:22,909 of data: even if you have certain data, 1058 00:37:22,910 --> 00:37:23,910 you cannot use it. 1059 00:37:24,830 --> 00:37:26,959 So, you know, if certain types of data 1060 00:37:26,960 --> 00:37:29,359 fall into your lap, or if you happen to 1061 00:37:29,360 --> 00:37:31,819 accidentally re-identify anonymized 1062 00:37:31,820 --> 00:37:34,369 data, you're obliged 1063 00:37:34,370 --> 00:37:35,659 to delete that data, 1064 00:37:36,680 --> 00:37:38,329 which does two things: it makes the data 1065 00:37:38,330 --> 00:37:40,429 less valuable on the market, 1066 00:37:40,430 --> 00:37:42,199 so hacking for it becomes less 1067 00:37:42,200 --> 00:37:43,200 interesting. 1068 00:37:43,910 --> 00:37:46,039 And second of all, those 1069 00:37:46,040 --> 00:37:47,389 who act in good 1070 00:37:47,390 --> 00:37:49,519 faith will help protect against this 1071 00:37:49,520 --> 00:37:51,649 data, you know, moving through 1072 00:37:51,650 --> 00:37:52,579 the world. 1073 00:37:52,580 --> 00:37:54,649 But I think that 1074 00:37:54,650 --> 00:37:56,989 kind of falls short of really protecting 1075 00:37:56,990 --> 00:37:57,990 you.
1076 00:37:58,850 --> 00:38:00,949 I would say one part is: 1077 00:38:00,950 --> 00:38:03,049 people tend to be, I personally think, a little too 1078 00:38:03,050 --> 00:38:05,599 scared about 1079 00:38:05,600 --> 00:38:07,129 what can be done with their health care 1080 00:38:07,130 --> 00:38:09,259 data. So I would be pretty 1081 00:38:09,260 --> 00:38:11,239 careful with genetic data, because we 1082 00:38:11,240 --> 00:38:12,289 know there's a lot of information in 1083 00:38:12,290 --> 00:38:14,329 there, and we don't yet know what is in your 1084 00:38:14,330 --> 00:38:16,459 genetic data. So yeah, 1085 00:38:16,460 --> 00:38:17,460 maybe not that. 1086 00:38:18,080 --> 00:38:20,239 And there are certain stigmatized data 1087 00:38:20,240 --> 00:38:21,800 points, like sexual health 1088 00:38:23,210 --> 00:38:25,349 or psychological health, 1089 00:38:25,350 --> 00:38:27,259 that you maybe don't want out in the open. 1090 00:38:27,260 --> 00:38:28,850 But besides that, I think, 1091 00:38:30,440 --> 00:38:31,909 yes, there are certain issues, 1092 00:38:32,930 --> 00:38:34,669 and these issues we can address with 1093 00:38:34,670 --> 00:38:36,829 legislation, maybe better than we are. 1094 00:38:36,830 --> 00:38:38,959 So legislation on fairness can 1095 00:38:38,960 --> 00:38:41,989 also extend to health conditions 1096 00:38:41,990 --> 00:38:43,039 and can say, you know, you cannot 1097 00:38:43,040 --> 00:38:44,149 discriminate against certain health 1098 00:38:44,150 --> 00:38:45,889 conditions as an insurance company; 1099 00:38:47,510 --> 00:38:49,639 that would take away a lot of the fears that 1100 00:38:49,640 --> 00:38:50,640 people currently have. 1101 00:38:51,650 --> 00:38:54,679 And then finally, I think the 1102 00:38:54,680 --> 00:38:56,869 idea — my idea — is: 1103 00:38:56,870 --> 00:38:58,399 I'm willing to share my health care data 1104 00:38:58,400 --> 00:38:59,669 if you're also sharing your health 1105 00:38:59,670 --> 00:39:01,709 care data, because I think in a world where 1106 00:39:01,710 --> 00:39:04,019 everybody's data is open, a lot of the 1107 00:39:04,020 --> 00:39:05,730 risks are already mitigated. 1108 00:39:06,900 --> 00:39:09,179 So being conscious about 1109 00:39:09,180 --> 00:39:10,180 how to do this, 1110 00:39:11,310 --> 00:39:12,989 and maybe not starting with sexual health 1111 00:39:12,990 --> 00:39:14,609 data, not starting with psychological 1112 00:39:14,610 --> 00:39:16,949 health data, would be a 1113 00:39:16,950 --> 00:39:17,950 way forward. 1114 00:39:19,830 --> 00:39:20,819 OK. 1115 00:39:20,820 --> 00:39:22,439 Are you queuing for the microphone? 1116 00:39:22,440 --> 00:39:23,639 Which microphone? 1117 00:39:23,640 --> 00:39:24,659 Yeah. 1118 00:39:24,660 --> 00:39:25,649 Hi. 1119 00:39:25,650 --> 00:39:28,049 My question is basically: 1120 00:39:28,050 --> 00:39:30,119 given all the concerns that 1121 00:39:30,120 --> 00:39:32,459 people have with disclosure 1122 00:39:32,460 --> 00:39:34,559 of data being detrimental to 1123 00:39:34,560 --> 00:39:36,869 their personal privacy and such issues, 1124 00:39:36,870 --> 00:39:38,669 are we already at that point where we 1125 00:39:38,670 --> 00:39:41,399 really need that, considering health? 1126 00:39:41,400 --> 00:39:43,439 If there's a toxic substance — 1127 00:39:43,440 --> 00:39:46,199 a lot of health issues are caused like that — 1128 00:39:46,200 --> 00:39:48,329 we would need to 1129 00:39:48,330 --> 00:39:50,669 analyze data on how toxic stuff 1130 00:39:50,670 --> 00:39:52,859 would be in certain circumstances, 1131 00:39:52,860 --> 00:39:54,509 in the public sphere.
1132 00:39:54,510 --> 00:39:56,729 Like, have we done all the 1133 00:39:56,730 --> 00:39:59,069 data disclosure that is not private, 1134 00:39:59,070 --> 00:40:00,929 that is not linked to an individual but 1135 00:40:00,930 --> 00:40:03,479 is linked to, say, companies 1136 00:40:03,480 --> 00:40:05,699 having a business model that generates 1137 00:40:05,700 --> 00:40:07,409 health issues for the public? 1138 00:40:07,410 --> 00:40:09,569 Are we already disclosing 1139 00:40:09,570 --> 00:40:12,179 this data first, before we harvest 1140 00:40:12,180 --> 00:40:13,619 my data? 1141 00:40:13,620 --> 00:40:15,719 Like, how does 1142 00:40:15,720 --> 00:40:17,189 this balance out? 1143 00:40:17,190 --> 00:40:19,319 Because in Germany, if I 1144 00:40:19,320 --> 00:40:21,749 recall right, there was this issue with 1145 00:40:21,750 --> 00:40:23,879 coffee makers and coffee 1146 00:40:23,880 --> 00:40:26,339 brewing, and they investigated 1147 00:40:26,340 --> 00:40:29,069 that the cleaning of these machines 1148 00:40:29,070 --> 00:40:31,739 left some toxic substances 1149 00:40:31,740 --> 00:40:33,899 in the first brews of the coffee, 1150 00:40:33,900 --> 00:40:35,819 but they wouldn't disclose this data, to 1151 00:40:35,820 --> 00:40:37,949 save the companies from 1152 00:40:37,950 --> 00:40:40,439 being seen as having a bad business model. 1153 00:40:40,440 --> 00:40:42,719 So the question is: is my 1154 00:40:42,720 --> 00:40:44,969 private data the last resort here, or 1155 00:40:44,970 --> 00:40:47,429 is there headway and wiggle room 1156 00:40:47,430 --> 00:40:49,499 for improving our health 1157 00:40:49,500 --> 00:40:51,689 without harvesting my 1158 00:40:51,690 --> 00:40:52,690 data first? 1159 00:40:53,520 --> 00:40:55,229 Why — I mean, why does it have to be either 1160 00:40:55,230 --> 00:40:56,609 or?
1161 00:40:56,610 --> 00:40:58,919 I think asking for more 1162 00:40:58,920 --> 00:41:01,139 aggregate data to be publicly released is 1163 00:41:01,140 --> 00:41:02,459 a good idea. 1164 00:41:02,460 --> 00:41:04,559 I showed you some available aggregate 1165 00:41:04,560 --> 00:41:05,560 datasets. 1166 00:41:06,330 --> 00:41:08,459 You can probably think of more. 1167 00:41:09,600 --> 00:41:11,039 But in the end — I mean, make no mistake — 1168 00:41:11,040 --> 00:41:12,809 these aggregate datasets come from 1169 00:41:12,810 --> 00:41:14,880 pooling individual people's data. 1170 00:41:16,080 --> 00:41:17,579 So you need to collect it at some point 1171 00:41:17,580 --> 00:41:19,859 to then be able to do the analysis 1172 00:41:19,860 --> 00:41:20,860 on it. 1173 00:41:21,690 --> 00:41:23,969 So I think even current ideas say, 1174 00:41:23,970 --> 00:41:25,709 you know, maybe you cannot actually, even 1175 00:41:25,710 --> 00:41:27,209 as a researcher, get access to one 1176 00:41:27,210 --> 00:41:28,739 individual patient's data, 1177 00:41:28,740 --> 00:41:30,929 but you can run queries on 1178 00:41:30,930 --> 00:41:32,099 a lot of patients' data. And since you 1179 00:41:32,100 --> 00:41:33,269 can't predict the queries that are 1180 00:41:33,270 --> 00:41:35,429 interesting, you need to collect 1181 00:41:35,430 --> 00:41:36,810 individual data at some point. 1182 00:41:39,340 --> 00:41:40,869 OK. Microphone 1183 00:41:40,870 --> 00:41:41,779 two. 1184 00:41:41,780 --> 00:41:43,369 Uh, thank you for the talk. 1185 00:41:43,370 --> 00:41:46,449 But you mentioned mammography 1186 00:41:46,450 --> 00:41:48,759 as a good example for 1187 00:41:48,760 --> 00:41:49,760 using 1188 00:41:50,980 --> 00:41:53,089 computerized detection 1189 00:41:53,090 --> 00:41:55,419 as an 1190 00:41:55,420 --> 00:41:56,589 improvement.
1191 00:41:56,590 --> 00:41:58,959 So we had two waves 1192 00:41:58,960 --> 00:42:01,389 of automated mammography reading 1193 00:42:01,390 --> 00:42:03,159 in the last 30 years. 1194 00:42:03,160 --> 00:42:05,559 The first one was a total mess, because 1195 00:42:06,580 --> 00:42:08,799 the techniques were not safe 1196 00:42:08,800 --> 00:42:10,929 enough, not good enough, and 1197 00:42:10,930 --> 00:42:13,419 they differed from site to site. 1198 00:42:13,420 --> 00:42:15,429 They used analog techniques and things 1199 00:42:15,430 --> 00:42:17,679 like that. And the next wave was 1200 00:42:17,680 --> 00:42:19,869 about 10 years ago, where 1201 00:42:19,870 --> 00:42:22,299 we used automated data 1202 00:42:22,300 --> 00:42:25,059 sets and all the things that we had. 1203 00:42:25,060 --> 00:42:26,060 And 1204 00:42:27,190 --> 00:42:29,439 it ended in a total mess, too. 1205 00:42:29,440 --> 00:42:31,629 Because in 2015, when we 1206 00:42:31,630 --> 00:42:33,909 did the first revision of this 1207 00:42:33,910 --> 00:42:34,839 idea, 1208 00:42:34,840 --> 00:42:36,999 we saw that the people 1209 00:42:37,000 --> 00:42:39,309 that used it, either in US 1210 00:42:39,310 --> 00:42:41,769 or in European hospitals, 1211 00:42:41,770 --> 00:42:43,959 had the problem that 1212 00:42:43,960 --> 00:42:46,689 there were too many false positives. 1213 00:42:46,690 --> 00:42:49,119 So we had far 1214 00:42:49,120 --> 00:42:51,939 too many recalls. 1215 00:42:51,940 --> 00:42:54,189 And I think we lost a 1216 00:42:54,190 --> 00:42:56,019 lot of trust. 1217 00:42:56,020 --> 00:42:58,000 So how 1218 00:42:59,080 --> 00:43:01,449 can we avoid this the next 1219 00:43:01,450 --> 00:43:03,219 time we use it?
1220 00:43:03,220 --> 00:43:05,259 I think what you're saying is exactly why 1221 00:43:05,260 --> 00:43:08,079 I'm suggesting that we need 1222 00:43:08,080 --> 00:43:10,269 regulated test data, 1223 00:43:10,270 --> 00:43:12,309 to avoid making these mistakes on 1224 00:43:12,310 --> 00:43:13,809 patients. 1225 00:43:13,810 --> 00:43:15,969 I would personally think 1226 00:43:15,970 --> 00:43:17,529 that the technology currently is relatively 1227 00:43:17,530 --> 00:43:19,609 mature, and there 1228 00:43:19,610 --> 00:43:22,119 are several companies going on the market 1229 00:43:22,120 --> 00:43:25,479 again with these types of products. 1230 00:43:25,480 --> 00:43:28,389 And precisely because 1231 00:43:28,390 --> 00:43:30,159 in the past it didn't always work as 1232 00:43:30,160 --> 00:43:32,109 intended, you might be interested, as a regulator, in 1233 00:43:32,110 --> 00:43:34,239 ascertaining that 1234 00:43:34,240 --> 00:43:35,889 you have good test data. 1235 00:43:35,890 --> 00:43:38,499 Yeah. And I think the 1236 00:43:38,500 --> 00:43:40,539 improvement of the test data is a 1237 00:43:40,540 --> 00:43:42,250 crucial point, as you mentioned. 1238 00:43:43,690 --> 00:43:45,549 OK, microphone one, please. 1239 00:43:45,550 --> 00:43:47,979 Some important 1240 00:43:47,980 --> 00:43:50,229 health data we have in Germany are the 1241 00:43:50,230 --> 00:43:53,079 registers, like the Cancer Register, 1242 00:43:53,080 --> 00:43:55,269 but they are organized on the federal-state 1243 00:43:55,270 --> 00:43:57,369 level, and we 1244 00:43:57,370 --> 00:43:59,619 had a lot of problems getting them 1245 00:43:59,620 --> 00:44:01,419 running in a good way.
1246 00:44:01,420 --> 00:44:03,699 And now we have the 1247 00:44:03,700 --> 00:44:06,639 new possibilities of the electronic 1248 00:44:06,640 --> 00:44:09,339 patient file 1249 00:44:09,340 --> 00:44:11,589 coming up, and also, 1250 00:44:11,590 --> 00:44:14,379 when we will have a change 1251 00:44:14,380 --> 00:44:17,379 in our regulation on organ 1252 00:44:17,380 --> 00:44:19,629 transplants, we are going 1253 00:44:19,630 --> 00:44:21,699 to have some sort of 1254 00:44:21,700 --> 00:44:24,309 organ donation register — like, 1255 00:44:24,310 --> 00:44:26,439 am I willing to donate or 1256 00:44:26,440 --> 00:44:27,399 not? 1257 00:44:27,400 --> 00:44:29,709 And are you thinking about whether 1258 00:44:29,710 --> 00:44:32,619 this register is going to be 1259 00:44:32,620 --> 00:44:34,959 organized on a national level or 1260 00:44:34,960 --> 00:44:36,309 on a federal-state level? 1261 00:44:36,310 --> 00:44:38,559 And do you think that 1262 00:44:38,560 --> 00:44:41,019 the future of the electronic 1263 00:44:41,020 --> 00:44:42,020 patient — 1264 00:44:43,810 --> 00:44:46,269 yeah, the ePA-type patient 1265 00:44:46,270 --> 00:44:48,339 file — will change 1266 00:44:48,340 --> 00:44:50,649 the way the registries, 1267 00:44:50,650 --> 00:44:52,899 like the Cancer Register, are 1268 00:44:52,900 --> 00:44:55,419 organized? Do you think that there 1269 00:44:55,420 --> 00:44:57,819 will be some new interaction 1270 00:44:57,820 --> 00:45:00,099 in how we use this data? 1271 00:45:00,100 --> 00:45:01,149 Yeah. 1272 00:45:01,150 --> 00:45:02,769 Personally, I really hope so. 1273 00:45:02,770 --> 00:45:04,750 So maybe for those that are not familiar: 1274 00:45:05,920 --> 00:45:07,699 Germany, I think like other countries, 1275 00:45:07,700 --> 00:45:09,999 has these registries where 1276 00:45:10,000 --> 00:45:13,419 we collect data on individual diagnoses 1277 00:45:13,420 --> 00:45:14,439 to be used for research.
1278 00:45:14,440 --> 00:45:15,939 So we say, hey, we have a lack of 1279 00:45:15,940 --> 00:45:18,759 understanding of a certain cancer type 1280 00:45:18,760 --> 00:45:20,589 or of organ transplants. 1281 00:45:20,590 --> 00:45:22,269 And so we collect data specifically for 1282 00:45:22,270 --> 00:45:24,429 this purpose under a sort of regulated 1283 00:45:24,430 --> 00:45:25,430 exception. 1284 00:45:26,680 --> 00:45:27,639 And yeah, absolutely. 1285 00:45:27,640 --> 00:45:30,099 I mean, in a world where we have 1286 00:45:30,100 --> 00:45:32,259 a national electronic medical record, 1287 00:45:32,260 --> 00:45:34,089 you would hope that this registry data 1288 00:45:34,090 --> 00:45:36,159 can be included there, 1289 00:45:36,160 --> 00:45:38,559 partially also because the electronic 1290 00:45:38,560 --> 00:45:40,689 medical record could offer consent 1291 00:45:40,690 --> 00:45:42,879 management, where patients could 1292 00:45:42,880 --> 00:45:45,219 be asked to give their consent 1293 00:45:45,220 --> 00:45:47,529 for certain uses, and having 1294 00:45:47,530 --> 00:45:49,269 that on national infrastructure in a 1295 00:45:49,270 --> 00:45:51,189 secure environment would be very 1296 00:45:51,190 --> 00:45:52,190 desirable. 1297 00:45:53,860 --> 00:45:55,179 The second part of your question? 1298 00:45:58,180 --> 00:45:59,180 Who knows — 1299 00:46:00,700 --> 00:46:02,589 you know, who knows what's going to 1300 00:46:02,590 --> 00:46:04,809 happen with the national 1301 00:46:04,810 --> 00:46:06,399 electronic medical records there. 1302 00:46:06,400 --> 00:46:07,719 They're not really available yet, and we 1303 00:46:07,720 --> 00:46:09,189 don't know how people will use them.
1304 00:46:09,190 --> 00:46:11,469 But in an ideal world, I think they 1305 00:46:11,470 --> 00:46:13,599 could be used as a central 1306 00:46:13,600 --> 00:46:15,189 information storage and sharing 1307 00:46:15,190 --> 00:46:17,349 opportunity, not only with doctors, 1308 00:46:17,350 --> 00:46:19,629 but also — if the patient wants — 1309 00:46:19,630 --> 00:46:21,819 with registries and with 1310 00:46:21,820 --> 00:46:23,919 researchers. Does that answer your 1311 00:46:23,920 --> 00:46:24,920 question? 1312 00:46:26,050 --> 00:46:28,389 Yeah, thank you. Just on the organ thing: 1313 00:46:28,390 --> 00:46:30,699 whether you think this is going to be national 1314 00:46:30,700 --> 00:46:31,239 or not. 1315 00:46:31,240 --> 00:46:32,919 That's not my field of expertise, 1316 00:46:32,920 --> 00:46:34,169 unfortunately, so I can't tell you. 1317 00:46:35,560 --> 00:46:37,300 OK, microphone one again. 1318 00:46:40,780 --> 00:46:42,939 Uh, thank you for the talk. 1319 00:46:42,940 --> 00:46:44,589 I'm — 1320 00:46:44,590 --> 00:46:47,229 I attended the presentation 1321 00:46:47,230 --> 00:46:49,629 yesterday about the ePA, 1322 00:46:49,630 --> 00:46:52,029 so the patient file, the case file. 1323 00:46:52,030 --> 00:46:54,249 And what was 1324 00:46:54,250 --> 00:46:56,649 explained there is that they 1325 00:46:56,650 --> 00:46:59,829 don't want to make the effort of 1326 00:46:59,830 --> 00:47:02,239 actually having the option 1327 00:47:02,240 --> 00:47:04,329 of opening the file 1328 00:47:04,330 --> 00:47:06,879 only for certain doctors, or just, like, 1329 00:47:06,880 --> 00:47:09,189 certain parts of your own 1330 00:47:09,190 --> 00:47:10,959 file for doctors. 1331 00:47:10,960 --> 00:47:13,929 It's more all or nothing, 1332 00:47:13,930 --> 00:47:15,639 which you have to choose as a patient 1333 00:47:15,640 --> 00:47:18,009 at first, when they want to implement it.
1334 00:47:18,010 --> 00:47:20,829 And I'm considering that private 1335 00:47:20,830 --> 00:47:23,439 health companies already 1336 00:47:23,440 --> 00:47:25,569 implement systems 1337 00:47:25,570 --> 00:47:27,879 for data collection with 1338 00:47:27,880 --> 00:47:30,849 automated 1339 00:47:30,850 --> 00:47:33,729 pseudonymization of 1340 00:47:33,730 --> 00:47:36,069 the data, which they send to 1341 00:47:36,070 --> 00:47:37,070 research 1342 00:47:38,320 --> 00:47:40,689 centers. So to say: is there 1343 00:47:40,690 --> 00:47:43,149 any effort by now 1344 00:47:43,150 --> 00:47:45,249 from the German government 1345 00:47:45,250 --> 00:47:47,379 or certain 1346 00:47:47,380 --> 00:47:49,689 institutions to 1347 00:47:49,690 --> 00:47:53,199 follow such an idea of systematic 1348 00:47:53,200 --> 00:47:56,589 pseudonymization to collect the data 1349 00:47:56,590 --> 00:47:58,839 and keep the individual data 1350 00:47:58,840 --> 00:48:00,129 with the doctors? 1351 00:48:00,130 --> 00:48:02,319 And if not, 1352 00:48:02,320 --> 00:48:05,139 is there — 1353 00:48:05,140 --> 00:48:07,359 or where's the best place 1354 00:48:07,360 --> 00:48:09,909 to start for 1355 00:48:09,910 --> 00:48:11,920 promoting such ideas? 1356 00:48:13,120 --> 00:48:15,189 So the idea of 1357 00:48:15,190 --> 00:48:17,289 pseudonymizing 1358 00:48:17,290 --> 00:48:19,569 and anonymizing 1359 00:48:19,570 --> 00:48:21,909 health care data, of course, is widely 1360 00:48:21,910 --> 00:48:23,169 spread and is well understood in the 1361 00:48:23,170 --> 00:48:24,170 government. 1362 00:48:25,540 --> 00:48:27,339 I think for the German patient record, 1363 00:48:27,340 --> 00:48:28,360 the national EMR, 1364 00:48:29,680 --> 00:48:31,539 that isn't really the model.
1365 00:48:31,540 --> 00:48:33,189 It could have worked, but we decided to 1366 00:48:33,190 --> 00:48:35,079 actually have central storage and 1367 00:48:35,080 --> 00:48:37,269 guarantee privacy by encryption, 1368 00:48:39,580 --> 00:48:41,679 which has certain advantages 1369 00:48:41,680 --> 00:48:42,909 in particular, I think. 1370 00:48:44,110 --> 00:48:46,299 Me personally, I'm much more comfortable 1371 00:48:46,300 --> 00:48:48,489 with my data being in sort of 1372 00:48:48,490 --> 00:48:50,649 national infrastructure than 1373 00:48:50,650 --> 00:48:53,199 being in doctors' offices — seeing 1374 00:48:53,200 --> 00:48:54,549 the IT 1375 00:48:54,550 --> 00:48:56,829 security of the typical doctor's office 1376 00:48:56,830 --> 00:48:57,830 in Germany. 1377 00:49:00,100 --> 00:49:01,839 So I think there's a lot to be said for 1378 00:49:01,840 --> 00:49:03,609 having that in a national- 1379 00:49:03,610 --> 00:49:05,529 infrastructure type of place. 1380 00:49:05,530 --> 00:49:06,530 The second part: 1381 00:49:08,110 --> 00:49:10,369 I think the option — 1382 00:49:10,370 --> 00:49:12,429 so what you currently can do is 1383 00:49:12,430 --> 00:49:14,709 you can share all your data 1384 00:49:14,710 --> 00:49:15,710 with one doctor. 1385 00:49:16,990 --> 00:49:19,449 You cannot choose what parts of your data, 1386 00:49:19,450 --> 00:49:20,709 in the current specification. 1387 00:49:20,710 --> 00:49:22,899 I think the criticism 1388 00:49:22,900 --> 00:49:25,359 has been heard, and it will be possible 1389 00:49:25,360 --> 00:49:27,579 to share only selected parts 1390 00:49:27,580 --> 00:49:28,580 of your data. 1391 00:49:29,350 --> 00:49:31,299 But again, I think what doctors are 1392 00:49:31,300 --> 00:49:32,839 saying on the other side of this 1393 00:49:32,840 --> 00:49:35,139 discussion is: what 1394 00:49:35,140 --> 00:49:37,209 good is a subset 1395 00:49:37,210 --> 00:49:38,769 of a patient's data?
The patient might 1396 00:49:38,770 --> 00:49:41,079 not know what parts of their 1397 00:49:41,080 --> 00:49:42,849 previous diagnoses I actually need for my 1398 00:49:42,850 --> 00:49:46,419 work. So I think the idea of withholding 1399 00:49:46,420 --> 00:49:48,519 data from doctors is very unpopular with 1400 00:49:48,520 --> 00:49:49,520 doctors. 1401 00:49:50,610 --> 00:49:52,899 And I think 1402 00:49:52,900 --> 00:49:55,359 it might be much more of an opt-out 1403 00:49:55,360 --> 00:49:58,029 type situation — where I don't want my 1404 00:49:58,030 --> 00:50:00,339 sexual health information shared 1405 00:50:00,340 --> 00:50:02,079 unless it's explicitly needed, 1406 00:50:02,080 --> 00:50:03,789 but everything else is OK — 1407 00:50:03,790 --> 00:50:05,469 than an opt-in situation where 1408 00:50:05,470 --> 00:50:07,269 every single piece of data that I 1409 00:50:07,270 --> 00:50:08,469 have in my patient record has to be released. 1410 00:50:10,690 --> 00:50:12,909 OK, one very 1411 00:50:12,910 --> 00:50:13,939 concise question. 1412 00:50:13,940 --> 00:50:15,249 Yes. OK, so — 1413 00:50:17,620 --> 00:50:18,639 well, 1414 00:50:18,640 --> 00:50:20,799 it doesn't 1415 00:50:20,800 --> 00:50:22,959 really answer for me the question 1416 00:50:22,960 --> 00:50:25,119 about the pseudonymization, because I 1417 00:50:25,120 --> 00:50:27,759 think there's a lot of, 1418 00:50:27,760 --> 00:50:30,159 um, interest in, 1419 00:50:30,160 --> 00:50:32,709 yeah, gathering the data, um, 1420 00:50:32,710 --> 00:50:34,869 pseudonymized, to do 1421 00:50:34,870 --> 00:50:36,939 research. So why couldn't 1422 00:50:36,940 --> 00:50:39,909 this be a first step?
1423 00:50:39,910 --> 00:50:42,429 Um, because with 1424 00:50:42,430 --> 00:50:45,219 the new ePA coming, 1425 00:50:45,220 --> 00:50:47,409 if you have the access, or if 1426 00:50:47,410 --> 00:50:49,569 you have signed up for this 1427 00:50:49,570 --> 00:50:51,969 electronic case 1428 00:50:53,110 --> 00:50:54,819 file — then let me interrupt you for a 1429 00:50:54,820 --> 00:50:56,109 second. Yeah. What you're saying about 1430 00:50:56,110 --> 00:50:58,300 pseudonymization is totally right. 1431 00:51:00,070 --> 00:51:02,649 In the law that went into effect, 1432 00:51:02,650 --> 00:51:04,849 I think, two weeks ago — the DVG, 1433 00:51:04,850 --> 00:51:06,939 the Digitale-Versorgung-Gesetz — 1434 00:51:06,940 --> 00:51:09,009 we actually installed a center 1435 00:51:09,010 --> 00:51:11,409 to collect research data from 1436 00:51:11,410 --> 00:51:13,689 health insurance companies, and we 1437 00:51:13,690 --> 00:51:15,349 have actually written into law a 1438 00:51:15,350 --> 00:51:17,559 mechanism by which this data is 1439 00:51:17,560 --> 00:51:18,560 pseudonymized. 1440 00:51:19,720 --> 00:51:21,879 So the pseudonymization is in 1441 00:51:21,880 --> 00:51:24,399 the law and it is being used. 1442 00:51:24,400 --> 00:51:27,279 But from a data privacy standpoint, 1443 00:51:27,280 --> 00:51:29,349 that is just de-identifying; that is not 1444 00:51:29,350 --> 00:51:31,479 anonymizing, because using external 1445 00:51:31,480 --> 00:51:33,129 data, you can still re-identify the 1446 00:51:33,130 --> 00:51:34,059 data. 1447 00:51:34,060 --> 00:51:36,999 And so 1448 00:51:37,000 --> 00:51:39,329 it's a hygiene factor, 1449 00:51:39,330 --> 00:51:40,919 but it doesn't solve all these privacy 1450 00:51:40,920 --> 00:51:42,629 issues that we have. 1451 00:51:42,630 --> 00:51:44,879 And so I think, yes, we're doing it.
1452 00:51:44,880 --> 00:51:47,669 But at the same time, you need consent, 1453 00:51:47,670 --> 00:51:49,589 and you maybe need, in certain 1454 00:51:49,590 --> 00:51:51,449 circumstances, effective anonymization 1455 00:51:51,450 --> 00:51:53,489 techniques to really solve this bigger 1456 00:51:53,490 --> 00:51:54,490 issue. 1457 00:51:54,930 --> 00:51:57,179 OK. OK, so this concludes 1458 00:51:57,180 --> 00:51:59,939 our Q&A. Thanks 1459 00:51:59,940 --> 00:52:01,829 very much for the talk and for the 1460 00:52:01,830 --> 00:52:03,959 extensive Q&A — a round of applause 1461 00:52:03,960 --> 00:52:04,960 for him. Thank you.
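The speaker's closing point — pseudonymization is de-identification, not anonymization, because external data can still re-identify records — can be illustrated with a small sketch. This is a toy example, not the mechanism written into the law; all names, values, and the salt are invented for illustration:

```python
# Toy illustration: replacing the direct identifier with a pseudonym
# does not stop linkage through quasi-identifiers (ZIP code, birth year).
# All names and values here are invented for illustration.
import hashlib

def pseudonymize(insurance_id: str, salt: str) -> str:
    """Stable pseudonym: the same input always maps to the same token."""
    return hashlib.sha256((salt + insurance_id).encode()).hexdigest()[:16]

record = {"insurance_id": "A123456789", "zip": "10115",
          "birth_year": 1954, "diagnosis": "C50"}

# What a research dataset might release: a pseudonym instead of the real ID.
released = dict(record, insurance_id=pseudonymize(record["insurance_id"], "s3cr3t"))

# An external dataset (e.g. a public directory) with names attached:
external = [
    {"name": "J. Example", "zip": "10115", "birth_year": 1954},
    {"name": "M. Example", "zip": "20095", "birth_year": 1980},
]

# Linking on the quasi-identifiers re-identifies the "anonymous" record:
matches = [p["name"] for p in external
           if (p["zip"], p["birth_year"]) == (released["zip"], released["birth_year"])]
```

If the quasi-identifier combination is unique in the external data, `matches` contains exactly one name and the pseudonymized record is re-identified — which is why pseudonymization is called a hygiene factor above, rather than a full solution.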