Hi, I'm Andrew Duchowski, and I'll be presenting our recent work on eye tracking in 360° video, with a focus on accessibility, standing in for Chris Hughes. He's really our lead developer in our LEAD-ME COST Action workgroup. Here in Vienna, I'll talk a little bit about the background and motivation for accessible, immersive video, and then about what we do: eye tracking in 360° video to test various usability applications and accessibility developments.

Okay, background and motivation. Here is what standard 2D subtitles look like, or what we are used to seeing when they are presented: in this case on TV, on a 2D display, shown in color, in a box, with a specific kind of font. You've seen all that before, and this kind of thing can show up on most devices these days. This slide is actually taken from, I think, the opera house in Barcelona, where you even have something called surtitles, with the subtitles above the stage, or even in the seat back in front of you. The point is that we need this kind of accessibility in modern extended reality applications, VR being one of them, and augmented reality as well. Yet another form is sign language, which I will quickly skip, because we're not really implementing or looking at it in our particular workgroup.

Instead, we're doing immersive video, which is 360°. This goes back to a BBC project that Chris was involved in, building the ImAc immersive accessibility player, looking at the media there and trying to get subtitles to show up in VR. The ImAc player and recorder were part of that design. It was user-centered back then, with user requirements, and they eventually built the platform, but the trouble is that the development was short-lived, or maybe limited in its extent, because too much was uncovered: too many variables, too many possibilities, and it was really hard to pin down what worked.

Anyway, what we want to do is do this for VR glasses specifically, and one of the concerns is limited resolution. So where do you put the subtitles? Do you put them at the bottom, as on a regular 2D display, or what? You have to remember that the user's head moves around here, so we get a 360° view, which is much different from the standard 2D TV display. Resolution is not so much of an issue, except that you don't want to put the subtitles just anywhere, because the projection would distort the text.
The other issue that showed up, or was uncovered in the ImAc player, is that the person shown here is not the one speaking. In fact, he's just standing there; the actual speaker is off to the side somewhere, where we don't see them at the moment. So how do you indicate where they are? They used an arrow to test this: he's not speaking, the speaker is over on the left, go look there!

Meanwhile, the other technical challenge is how you present video in 3D. You can use various geometric mappings, such as a cube map or an equirectangular projection. The difference between them is the distortion, and the plot below shows how much distortion there is and where. Most of it happens at the poles, as it were; that's why you often see the globe looking stretched. There's a video here, which doesn't want to play in the player, that just shows you what the distortion looks like. If it's done properly, the distortion isn't there: lines look like straight lines, rather than being bent into curves, as you would expect from the image here.

Other mappings include the standard cube map, where the distortions sit around the corners of the cube, and the equi-angular cube map, which seems like one of the better options but is a little more exotic and not quite as prevalent as the equirectangular projection. A lot of 360° cameras use equirectangular, so they give you that pretty much for free. Here you can see what the mapping looks like and where the distortions are; if you stay mostly in the center of the field of view, you're okay, and so it seems to work very well.

Now, user testing for accessibility. The trouble here is, first of all, that people might not be used to the technology. In fact, in some of our testing, which Marta may tell you about, when asked which type of subtitle they wanted, head-locked or fixed, people said, "What's VR?" So that's an interesting technological challenge in itself. Content is super important, because how do you actually get objective results, as opposed to subjective impressions? This is what they learned in the W3C group: they looked at various VR approaches, and here's what they came up with. Lots of things! Are the subtitles fixed in the scene, or head-locked, that is, linked to where your head is pointing?
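As a rough, illustrative sketch of the equirectangular mapping described above (assumed geometry, not code from ImAc or from the speaker's tools), the following Python snippet converts a viewing direction into equirectangular texture coordinates. Directions near the poles collapse onto entire rows of the image, which is exactly where the stretching, and the distorted text, comes from.

```python
import math

def direction_to_equirect(x, y, z):
    """Map a viewing direction to equirectangular (u, v) in [0, 1].

    Longitude (azimuth) spans the image width, latitude its height.
    Near the poles (y close to +1 or -1) one direction corresponds to an
    entire row of texels, which is why text placed there looks stretched.
    """
    r = math.sqrt(x * x + y * y + z * z)
    lon = math.atan2(x, -z)          # azimuth, in (-pi, pi]
    lat = math.asin(y / r)           # elevation, in [-pi/2, pi/2]
    u = lon / (2.0 * math.pi) + 0.5
    v = 0.5 - lat / math.pi
    return u, v

# Straight ahead lands in the middle of the frame; a near-vertical direction
# lands at the top edge, where the projection is most distorted.
print(direction_to_equirect(0.0, 0.0, -1.0))     # approximately (0.5, 0.5)
print(direction_to_equirect(0.0, 0.99, -0.14))   # v close to 0 (the pole)
```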
It turns out they didn't get a very good answer to all of this, even though they were able to implement it, and Chris was instrumental in that implementation. So this is a technical challenge. On the right, the little cone is the head direction and head position, and the plane is the part of the video where we can draw subtitles.

In the VR world, in 360° video, we locate elements using polar coordinates: a polar angle and an azimuth. Then, as you do in computer graphics, you build a scene graph. The scene means the world; the world includes a camera, the captions, the video, and the sphere, and the fake camera is where the head is. So how does each element move? Is it fixed, and so forth?

Chris came to this because, I think, they asked him to do it, and he said, "Oh yeah, sure, I'll be happy to do that, it's pretty easy." Then they gave him a video where the speakers were off to the side, in an airplane seat, and right away you could see that things didn't quite work. It wasn't obvious who was speaking when, and so on, so for fixed subtitles he basically had to reinvent them, using something based on particle systems. This is a screenshot from Star Trek II: The Wrath of Khan, the 1982 movie, I think, where each particle has a position inside this flame. Each of these strands in the video is a particle, and each has a position, a velocity (a speed and a direction), a color, an age, and a lifetime. Eventually the particles die and are removed from the scene. There's also transparency and things like that. So Chris thought: why not use the same computer graphics technique to represent subtitles as particles?

And that's what he did. With that you've got an emitter, so you can throw these captions, as he calls them, or subtitles, anywhere you want. Each one has a certain lifetime or lifespan, and each can have its own color, font, shape, box, transparency, whatever. And so here he is with all these options; there's a ton of things to pick from. On the right you can see the menu, which is quite involved: guide modes, responsive captions, time codes, all sorts of things. The trouble is, how do you actually test this? How do you establish what is useful out of this myriad of choices?

Here's what the captions look like, and some of the things that particle systems give you. You can use a physics engine to prevent collisions; you can stack captions so they don't overlap each other; you can use different colors; you can make sure they stay inside the scene. This is Chris's living room, by the way.
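To make the particle idea a bit more concrete, here is a minimal Python sketch (a simplified illustration, not Chris's actual implementation; all names are hypothetical) in which each caption behaves like a particle: it is emitted at a spherical position on the video sphere, given by an azimuth and a polar angle, it ages over time, and it is removed from the scene once its lifetime expires.

```python
import math
from dataclasses import dataclass

@dataclass
class CaptionParticle:
    text: str
    azimuth: float       # radians around the viewer
    polar: float         # radians down from the zenith
    lifetime: float      # seconds before the caption is retired
    color: str = "white"
    age: float = 0.0

    def position(self, radius: float = 10.0):
        """Spherical to Cartesian position on the video sphere."""
        x = radius * math.sin(self.polar) * math.cos(self.azimuth)
        y = radius * math.cos(self.polar)
        z = radius * math.sin(self.polar) * math.sin(self.azimuth)
        return x, y, z

    @property
    def alive(self) -> bool:
        return self.age < self.lifetime

class CaptionEmitter:
    """Spawns caption 'particles' and retires the ones whose lifetime is up."""

    def __init__(self):
        self.captions = []

    def emit(self, text, azimuth, polar, lifetime, color="white"):
        self.captions.append(CaptionParticle(text, azimuth, polar, lifetime, color))

    def update(self, dt: float):
        for caption in self.captions:
            caption.age += dt
        # As in any particle system, dead particles are dropped from the scene.
        self.captions = [c for c in self.captions if c.alive]

emitter = CaptionEmitter()
emitter.emit("Speaker is over on the left", azimuth=-math.pi / 2,
             polar=math.pi / 2, lifetime=3.0)
emitter.update(dt=1.0)    # after one second the caption is still alive
print([c.text for c in emitter.captions])
```

A physics pass over the emitter's live captions could then nudge overlapping ones apart, which is essentially the stacking and collision avoidance mentioned above.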
There he is; you can basically see him in the middle picture, standing up and talking to his kids, along with all the various things you can do to make these captions display in VR. We still don't know, though, which of all these is best. What's the most effective? What's the most accessible?

And so, like Henry Ford's customers, if people were asked what they wanted, they would have said, "Just give me the old-style 2D subtitles, so that I can see them." They would not want all these newfangled things. So here's how we test them in 360°, with the technology that we've conceived of here and that Chris implemented: we use an eye tracker to find out what you're looking at.

The HTC Vive Pro Eye is such an eye-tracking VR headset, and it's the one we use. We've got several of them: Dr. Krejtz has used one in Warsaw, Marta has taken one to Slovenia and Spain, Chris has one in the UK, and I've got one here in the States. We all share the source code, and we all test this and run this, when it works. It does 120 Hz eye tracking; that's the sampling rate. It comes with a software development kit, the SRanipal SDK, which gives us the gaze information as vectors. Here's what it looks like: you get a position for where the gaze ray emanates at each eyeball, the direction, and then the intersection on the screen, if you want it, or you can calculate it. So the geometry is there, and it works fairly well. We get the gaze origin, direction, pupil diameter, eye openness, and whether the eye is visible to the sensor.

We get all of this in Unity; we can output these files and read them back in. Chris has created a system architecture that relies on a threaded recorder: we record the data, both the screen and the eye movement data, save it to a file, and later read it all back in and play it back via a player that he has also developed.

So that is the technology. I think Marta will come on and show you an example from this video that we have, or from several videos. The key concern is that we've got all these options and we don't quite know which is best, so how do you test this? Part of the challenge is the technology itself: there is a learning step, where users have to put the helmet on and learn how to use it. And then there's the eye tracker, which gives us objective evidence of what people are actually doing. Are they actually reading the subtitles or not? This was not known before: people would say, "Oh, sure, I read the subtitles," but we actually have data that shows quite the contrary.
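For the "or you can calculate it" part, here is a minimal sketch, under the assumption that the gaze is reported as an origin and a direction in the headset's coordinate frame, of how the intersection with the video sphere can be computed and converted back to an azimuth and an elevation, i.e. which part of the 360° frame, and therefore which caption, was being looked at. The function and parameter names are illustrative, not the SDK's.

```python
import math

def gaze_on_sphere(origin, direction, radius=10.0):
    """Intersect a gaze ray with a viewer-centred video sphere.

    origin and direction are 3-tuples in headset coordinates; direction need
    not be normalised. Returns (azimuth, elevation) in radians, or None if
    the ray misses the sphere entirely.
    """
    ox, oy, oz = origin
    dx, dy, dz = direction
    norm = math.sqrt(dx * dx + dy * dy + dz * dz)
    dx, dy, dz = dx / norm, dy / norm, dz / norm

    # Solve |origin + t * direction|^2 = radius^2 for the positive root t.
    b = 2.0 * (ox * dx + oy * dy + oz * dz)
    c = ox * ox + oy * oy + oz * oz - radius * radius
    disc = b * b - 4.0 * c
    if disc < 0.0:
        return None
    t = (-b + math.sqrt(disc)) / 2.0
    px, py, pz = ox + t * dx, oy + t * dy, oz + t * dz

    azimuth = math.atan2(px, -pz)        # left/right within the 360 frame
    elevation = math.asin(py / radius)   # up/down
    return azimuth, elevation

# A gaze sample looking slightly to the left of straight ahead:
print(gaze_on_sphere(origin=(0.03, 0.0, 0.0), direction=(-0.2, 0.0, -1.0)))
```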
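And, sketching the threaded recorder idea in the same spirit (a generic illustration rather than the actual Unity architecture; the field names and the CSV format are assumptions), gaze samples are pushed onto a queue by the sampling loop and written to disk by a background thread, so logging at 120 Hz never stalls rendering, and the resulting file can later be read back for playback and analysis.

```python
import csv
import queue
import threading

class GazeRecorder:
    """Background-threaded recorder: the sampling loop only enqueues samples;
    a worker thread drains the queue and writes them out as CSV rows."""

    FIELDS = ["timestamp", "origin_x", "origin_y", "origin_z",
              "dir_x", "dir_y", "dir_z", "pupil_diameter", "eye_openness"]

    def __init__(self, path):
        self._queue = queue.Queue()
        self._file = open(path, "w", newline="")
        self._writer = csv.writer(self._file)
        self._writer.writerow(self.FIELDS)
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def record(self, sample):
        """Called once per gaze sample; never blocks on disk I/O."""
        self._queue.put(sample)

    def _drain(self):
        while True:
            sample = self._queue.get()
            if sample is None:                  # sentinel: stop the worker
                return
            self._writer.writerow([sample[name] for name in self.FIELDS])

    def close(self):
        self._queue.put(None)
        self._worker.join()
        self._file.close()

recorder = GazeRecorder("gaze_log.csv")
recorder.record({"timestamp": 0.0, "origin_x": 0.03, "origin_y": 0.0, "origin_z": 0.0,
                 "dir_x": -0.2, "dir_y": 0.0, "dir_z": -1.0,
                 "pupil_diameter": 3.1, "eye_openness": 1.0})
recorder.close()
```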
So I will leave you at that. That is the technology that we've built up, Chris mainly. Marta and Dr. Krejtz have gone out and obtained really interesting data, and I think you'll see more of it during this LEAD-ME COST Action workshop. Thank you so much on behalf of myself, Dr. Krejtz, Marta, and of course the indubitable Chris Hughes in Salford. Thank you, and see you later.