1 00:00:00,000 --> 00:00:18,890 *Music* 2 00:00:18,890 --> 00:00:24,350 Herald: Hello everybody, we are ready to get started. We have Lucas and Amir here 3 00:00:24,350 --> 00:00:29,170 and they want to give us a quick introduction to a project from the 4 00:00:29,170 --> 00:00:33,540 Wikimedia Foundation called "Cloud Services" and how it might be 5 00:00:33,540 --> 00:00:39,110 useful to all of us. So let's give a round of welcoming applause to Lucas and Amir. 6 00:00:39,110 --> 00:00:42,850 *Applause* 7 00:00:42,850 --> 00:00:49,490 Lucas: Thanks! Yeah, hello. So "Wikimedia Cloud Services" is basically this big 8 00:00:49,490 --> 00:00:55,230 collection of all kinds of different things which are useful if you want to do 9 00:00:55,230 --> 00:00:58,780 technical things in the Wikimedia universe, like with Wikipedia or other 10 00:00:58,780 --> 00:01:05,920 projects, and you get them free of charge, so you can just use them, and the only 11 00:01:05,920 --> 00:01:09,880 requirement is that you use them for something that's kind of relevant to the 12 00:01:09,880 --> 00:01:14,400 mission of Wikimedia, of promoting free knowledge and that kind of stuff. And it's 13 00:01:14,400 --> 00:01:18,360 kind of split into the things that you can do with your regular Wikimedia account, 14 00:01:18,360 --> 00:01:22,750 which any registered user can do, and then there are also things you need a special 15 00:01:22,750 --> 00:01:26,250 account for, on a different system called Wikitech, and Amir is going to talk more 16 00:01:26,250 --> 00:01:30,030 about those later. But first let's just look into some of the things you can do 17 00:01:30,030 --> 00:01:34,420 with your regular Wikimedia account. And if you want to follow any of these links 18 00:01:34,420 --> 00:01:39,220 there's a shortcut here. I was about to switch to the next tab, so let's just stay 19 00:01:39,220 --> 00:01:45,211 here for a few seconds, yeah.
So the first thing is the API sandbox, which is useful if you 20 00:01:45,211 --> 00:01:51,590 want to use the MediaWiki API to figure out what you have on a page or to make 21 00:01:51,590 --> 00:01:55,820 edits or any kind of stuff. The API sandbox is a special page that's really 22 00:01:55,820 --> 00:02:00,630 useful to find out how to use the API. For example, here are all the different actions I 23 00:02:00,630 --> 00:02:08,179 can use. Let's say query, which is the kind of general catch-all action that's here, and 24 00:02:08,179 --> 00:02:12,700 then I get down here a list of all the parameters I can use with query, such as: 25 00:02:12,700 --> 00:02:20,160 I want to have all the user info, and what kind of user info do I want? I want 26 00:02:20,160 --> 00:02:24,280 options, blablabla. I would like to have a different format version. So it 27 00:02:24,280 --> 00:02:29,070 gives you all these nice inputs for figuring out exactly how to use the API, 28 00:02:29,070 --> 00:02:33,000 what's valid, what's not valid, and then you can make the API request and there you get 29 00:02:33,000 --> 00:02:38,440 a response, and we can't read anything because it's zoomed in way too much. But 30 00:02:38,440 --> 00:02:42,880 it's very helpful when trying to use the API, and then at the end here you can see 31 00:02:42,880 --> 00:02:49,280 what you need to do in your own code to make the same API request.
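[Editor's sketch] The request assembled in the sandbox above could look roughly like this in your own code. It is a minimal sketch, not the exact request shown on screen: the host (Wikidata) is an assumption, since the transcript does not say which wiki was used, and the helper name `build_userinfo_request` is hypothetical. The parameters `action=query`, `meta=userinfo`, `uiprop=options` and `formatversion` correspond to the choices Lucas describes.

```python
from urllib.parse import urlencode

# Assumption: Wikidata is used only as an example host; other MediaWiki
# wikis expose an api.php entry point in the same way.
API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def build_userinfo_request(endpoint: str = API_ENDPOINT) -> str:
    """Build a URL like the one the API sandbox displays for this query."""
    params = {
        "action": "query",     # the general catch-all action
        "meta": "userinfo",    # information about the current user
        "uiprop": "options",   # which kind of user info to include
        "format": "json",
        "formatversion": "2",  # the newer JSON output layout
    }
    return f"{endpoint}?{urlencode(params)}"


url = build_userinfo_request()
print(url)
```

To actually send it, the URL can be passed to `urllib.request.urlopen` or any HTTP client, ideally with a descriptive User-Agent header; the sketch stops at building the URL so that it runs without network access.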
And for 32 00:02:49,280 --> 00:02:54,949 anything that you can't do with the normal API - so if you want to do some kind of 33 00:02:54,949 --> 00:02:59,570 more expensive analysis - you can often do that with Quarry, which is a tool that 34 00:02:59,570 --> 00:03:04,910 lets you write SQL queries against databases that are almost like the ones in 35 00:03:04,910 --> 00:03:09,310 production. Like, you don't have user passwords and stuff, but you'll have all 36 00:03:09,310 --> 00:03:16,100 the database tables with page metadata and connections between them and the logs and 37 00:03:16,100 --> 00:03:20,370 all kinds of stuff, and you can just write your SQL here, send it, and you get the 38 00:03:20,370 --> 00:03:25,310 results. For example, here's the number of lexemes published per day, so it's some kind 39 00:03:25,310 --> 00:03:30,790 of selecting from the page table where the namespace is the lexeme namespace and 40 00:03:30,790 --> 00:03:41,260 grouping that by the date, and then we get something like, all the way down to 41 00:03:41,260 --> 00:03:46,260 September, which is apparently when I ran this query, here there were 116 42 00:03:46,260 --> 00:03:52,630 lexemes created on this day. Or here someone had a list of edits to JavaScript 43 00:03:52,630 --> 00:03:57,810 and CSS pages on Hungarian Wikipedia, so you can run these queries against any wiki 44 00:03:57,810 --> 00:04:07,090 you like, like this Hungarian Wikipedia one. And if you can't get by with just SQL, what 45 00:04:07,090 --> 00:04:13,340 you also have is this thing called PAWS, which gives you a Jupyter instance, if 46 00:04:13,340 --> 00:04:18,970 you've heard of that. You can basically write your own Python code here and do it 47 00:04:18,970 --> 00:04:23,900 in a very convenient way because there's all kinds of auto-completion and helpful 48 00:04:23,900 --> 00:04:36,780 things.
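[Editor's sketch] The "lexemes per day" idea from the Quarry example (filter by namespace, group by date) can be tried out locally in Python. This is only an illustration under stated assumptions: it uses an in-memory SQLite database with a hypothetical, simplified table `created_pages`, not the real replica schema, and it assumes that 146 is the Lexeme namespace number and that timestamps are 14-digit `YYYYMMDDHHMMSS` strings. The actual query shown in the talk is not recoverable from the transcript.

```python
import sqlite3

# Assumption: 146 is the Lexeme namespace number on Wikidata.
LEXEME_NAMESPACE = 146

# Hypothetical toy table; the real wiki replicas have a different schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE created_pages (namespace INTEGER, created TEXT)")
conn.executemany(
    "INSERT INTO created_pages VALUES (?, ?)",
    [
        (146, "20190901101500"),
        (146, "20190901235959"),
        (146, "20190902000001"),
        (0, "20190901120000"),  # not a lexeme, must be filtered out
    ],
)


def pages_per_day(connection, namespace):
    """Count created pages per day in one namespace, newest day first."""
    query = """
        SELECT substr(created, 1, 8) AS day, COUNT(*) AS pages
        FROM created_pages
        WHERE namespace = ?
        GROUP BY day
        ORDER BY day DESC
    """
    return connection.execute(query, (namespace,)).fetchall()


for day, count in pages_per_day(conn, LEXEME_NAMESPACE):
    print(day, count)
```

Taking the first eight characters of the timestamp is what turns per-page rows into per-day buckets; on Quarry the same shape of query would be written directly in SQL against the real tables.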
So I can just try to copy this and run the code (then I need a new cell 49 00:04:36,780 --> 00:04:43,000 below it… there we go, thanks!) and if I type item I should get helpful hints on what 50 00:04:43,000 --> 00:04:50,350 I can do with the item (if it's not hanging or something, or is it tab, control, 51 00:04:50,350 --> 00:04:58,650 space? No. Oh, there we go, yeah), and it's also a very useful way to work with 52 00:04:58,650 --> 00:05:07,310 Pywikibot, or you can also directly get a normal shell here. And one thing (oops, did 53 00:05:07,310 --> 00:05:11,890 I click the wrong thing? I would like to have... oh no, I don't want a bash notebook, I 54 00:05:11,890 --> 00:05:20,750 want a new terminal, that's what I want). And here you have for example database 55 00:05:20,750 --> 00:05:33,010 dumps in (where was it?) public/dumps/ something public again… So if you want to 56 00:05:33,010 --> 00:05:40,660 do some kind of analysis here on the data dumps, you can get them here and then have 57 00:05:40,660 --> 00:05:47,530 all the computing that you want, I guess, to analyze the wiki more thoroughly, and all 58 00:05:47,530 --> 00:05:51,720 of this is hosted in the Wikimedia Cloud for you and you don't need your own server 59 00:05:51,720 --> 00:05:56,950 or anything. Oh yeah, I had two more examples of that. For example here: I used 60 00:05:56,950 --> 00:06:01,750 that too, so there were a lot of items on Wikipedia where there was some encoding 61 00:06:01,750 --> 00:06:06,360 error, this should be an apostrophe like down here and instead it was this kind of 62 00:06:06,360 --> 00:06:12,680 I with an accent, and I hacked together some ugly Java/Python code to make all of 63 00:06:12,680 --> 00:06:16,609 these edits, and it was already logged in as well, I didn't need to worry about 64 00:06:16,609 --> 00:06:21,060 logging in or having a password or anything. So it's a very convenient way to 65 00:06:21,060 --> 00:06:29,280 make edits as well.
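[Editor's sketch] The encoding error mentioned above, where an apostrophe shows up as accented characters, is typically text whose UTF-8 bytes were decoded with the wrong encoding. The following is a minimal sketch of how such a string could be repaired in a notebook. It is not the code from the talk: the exact garbled characters are not recoverable from the transcript, so the Windows-1252 case and the function name `repair_mojibake` are assumptions, and the step that saves the corrected text back to the wiki is left out.

```python
def repair_mojibake(text: str, wrong_encoding: str = "cp1252") -> str:
    """Undo one round of 'UTF-8 bytes decoded with the wrong encoding'.

    If the text does not round-trip cleanly, it is returned unchanged.
    """
    try:
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# Simulate the damage: a typographic apostrophe (U+2019) whose UTF-8 bytes
# were misread as Windows-1252 (assumed here, see the note above).
broken = "L\u2019Aquila".encode("utf-8").decode("cp1252")

print(broken)
print(repair_mojibake(broken))
```

Strings that were never damaged, such as a correctly encoded "café", fail the round trip and come back unchanged, which makes the function reasonably safe to apply over a whole list of labels before reviewing the results.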
Or you can build something nicer. Here you can insert like 66 00:06:29,280 --> 00:06:36,950 Markdown cells to explain what you're doing and how the code works, and build 67 00:06:36,950 --> 00:06:41,700 nice notebooks like that, which are almost self-explanatory. And those are some of 68 00:06:41,700 --> 00:06:44,880 the things you can do just with your Wikimedia account, and now Amir is going to 69 00:06:44,880 --> 00:06:49,180 talk about some other things. Amir: Thanks Lucas! So the thing that we 70 00:06:49,180 --> 00:06:55,010 can do is that maybe some of you, like me, think that doing things in the browser is for 71 00:06:55,010 --> 00:07:00,130 kids, I need to do things in a terminal, I need to be connected to a system, and then you 72 00:07:00,130 --> 00:07:05,340 can ask for a Wikitech account, which you can just make in 73 00:07:05,340 --> 00:07:11,980 this place called Wikitech. (where is the li… no no but I do'… the main thing, the 74 00:07:11,980 --> 00:07:20,520 main list. yeah okay) And so in here, you make a Wikitech account and 75 00:07:20,520 --> 00:07:24,650 it gets approved quickly and then you get the shell and then you can just quickly go 76 00:07:24,650 --> 00:07:30,440 there (where is yer…) and you can go to this shell and just log in, and then you 77 00:07:30,440 --> 00:07:35,260 have access to a big set of nodes in the cloud and you can just do whatever you 78 00:07:35,260 --> 00:07:40,520 want. Also you have access to the dumps and you have access to the replica 79 00:07:40,520 --> 00:07:58,670 database. Let me show it to you.
[mumbling] So for example you can run ls 80 00:07:58,670 --> 00:08:13,750 /public/dumps/public/wikidatawiki/ and then you get - oh, there's like all sorts 81 00:08:13,750 --> 00:08:18,790 of dates and everything that you want, but also… something else you can do 82 00:08:18,790 --> 00:08:32,780 is that you can just run sql wikidatawiki and then you go inside Wikidata's 83 00:08:32,780 --> 00:08:36,150 database. I mean, you don't have the rights, you cannot write to 84 00:08:36,150 --> 00:08:40,329 the replica because it's a replica, and also it's sanitized so it doesn't have 85 00:08:40,329 --> 00:08:49,740 the hashes of user passwords and stuff like that, but still you can do just select 86 00:08:49,740 --> 00:09:09,130 star from recentchanges limit five, and yeah, and then you get all of the things 87 00:09:09,130 --> 00:09:15,140 that you want. You can even describe anything you want directly in their 88 00:09:15,140 --> 00:09:20,171 system. And then there is also, we have something called the job grid, so you can 89 00:09:20,171 --> 00:09:25,310 just put a cron job or anything that you want, or just run 90 00:09:25,310 --> 00:09:30,690 something directly, and it goes to a big node of the cloud Kubernetes and then just 91 00:09:30,690 --> 00:09:35,820 runs everything that you want it to. Here there's more information about it, 92 00:09:35,820 --> 00:09:42,690 in here there's like a long help page that says like, oh, how to run this job and 93 00:09:42,690 --> 00:09:48,260 then what the job does, and you can get this. So it's just a bash 94 00:09:48,260 --> 00:09:54,780 command, you can run any bash command and tell it, okay, return me the output to this 95 00:09:54,780 --> 00:10:00,140 place or the other place. One thing that you can do is also, there's a web server 96 00:10:00,140 --> 00:10:05,380 that you can access directly, so you can just put a PHP file there and into 97
00:10:05,380 --> 00:10:12,720 the Apache, and then, for example, this is an example that we built 98 00:10:12,720 --> 00:10:19,380 together I think two Christmases ago. But this was like, you can just see, this is 99 00:10:19,380 --> 00:10:23,710 a PHP file, the source code is available, and we just copy-pasted that 100 00:10:23,710 --> 00:10:28,350 source code into like a directory and it was there, and every time you click on it 101 00:10:28,350 --> 00:10:32,210 you get most of the edits that happen on descriptions on Wikidata that might be 102 00:10:32,210 --> 00:10:38,040 vandalism and we can fix it. Also, this is not the only thing that you can do 103 00:10:38,040 --> 00:10:45,050 with this, you can also put a Python Flask application, is this the file 104 00:10:45,050 --> 00:10:50,714 implants, and then this can be just a Python application and you can just have 105 00:10:50,714 --> 00:10:57,830 the file there, and also Node.js, Java, there's so many of them. Also you can have your 106 00:10:57,830 --> 00:11:01,230 own database, like I have something that has its own database, for example QuickCategories 107 00:11:01,230 --> 00:11:09,520 in here has jobs that are here, this tool has its own built-in 108 00:11:09,520 --> 00:11:15,480 database inside Cloud Services and it uses it just fine, you can do that 109 00:11:15,480 --> 00:11:21,930 as well. And also there's Cloud VPS, which doesn't do any Kubernetes, you just 110 00:11:21,930 --> 00:11:27,070 can make a VPS of your own and then do whatever you want with it. So for example, 111 00:11:27,070 --> 00:11:31,770 you get a project and you get the quota, it's slightly more limited, but 112 00:11:31,770 --> 00:11:35,799 also you have access to the whole VPS, you have sudo rights on it, you can do whatever 113 00:11:35,799 --> 00:11:40,400 you feel like with it. So we have like for example this project in here and it's 114 00:11:40,400 --> 00:11:47,050 called tools and then
there's proxies and you can for example go into that instance 115 00:11:47,050 --> 00:11:52,170 and reboot it and do whatever you want, and you can make a new instance and look at your 116 00:11:52,170 --> 00:11:58,770 quota and look at everything else there. And you can also even make a wiki 117 00:11:58,770 --> 00:12:05,740 on one of those Cloud VPS systems, which for example we did in here. If you 118 00:12:05,740 --> 00:12:10,750 look at it, it's just a wiki, and the difference is that for other ones, for 119 00:12:10,750 --> 00:12:14,590 example for the vandalism dashboard, you have tools.wmflabs.org and then 120 00:12:14,590 --> 00:12:22,149 slash wdvd, which is the tool itself, but in here we get our own subdomain, which 121 00:12:22,149 --> 00:12:28,870 will be wikidata-lexeme.wmflabs.org, and you can even put 122 00:12:28,870 --> 00:12:35,900 all sorts of subdomains for wmflabs.org as long as it's not taken. So you can 123 00:12:35,900 --> 00:12:42,260 build a MediaWiki instance or you can just run completely new software, 124 00:12:42,260 --> 00:12:47,810 anything, you can put a word processor, who cares, and then you can use it. It's very 125 00:12:47,810 --> 00:12:58,970 simple, your own thing, and you can have lots of experiments. Anything else? 126 00:12:58,970 --> 00:13:00,250 Lucas: I don't think so. Most 127 00:13:00,250 --> 00:13:06,430 important I would say is Toolforge to run your websites, or if that's not enough for 128 00:13:06,430 --> 00:13:10,550 you, Cloud VPS, and then you get your own VM where you can do absolutely anything you 129 00:13:10,550 --> 00:13:19,970 want as long as it matches those rules and stuff. And I think that's it. Are there any questions? 130 00:13:19,970 --> 00:13:24,890 Herald: Hello, thank you very much for the talk. That was very quick, so maybe 131 00:13:24,890 --> 00:13:34,600 anybody has a question here? I'll give you my microphone to ask it.
I don't see any 132 00:13:34,600 --> 00:13:41,630 hands. Nope, okay, I don't think we have questions, but if you're just too shy to 133 00:13:41,630 --> 00:13:47,019 ask, I think these guys are always hanging around here, around the wikipaka wiki, so 134 00:13:47,019 --> 00:13:52,510 if you have anything you want to talk about you'll find them later. Okay, then 135 00:13:52,510 --> 00:13:56,130 give a round of applause again for Lucas and Amir. 136 00:13:56,130 --> 00:13:58,830 *Applause* 137 00:13:58,830 --> 00:14:26,000 *Music*