Stereo 3D Vision (How to avoid being dinner for Wolves) - Computerphile

Views: 142,344

8 years ago

If you've wondered how computer scientists use pairs of cameras to reconstruct a 3D scene, Image Analyst & Lecturer Dr Mike Pound explains.
EXTRA BITS: • EXTRA BITS: More on 3D...
Industrial Light-Field Magic: • Industrial Light-field...
Brain Scanner: • Brain Scanner - Comput...
3D Rock Art Scanner: • 3D Rock Art Scanner - ...
CPU vs GPU: • CPU vs GPU (What's the...
This video was filmed and edited by Sean Riley.
Computer Science at the University of Nottingham: bit.ly/nottscomputer
Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharan.com

Comments: 130

@Jader7777 8 years ago

Best computerphile video: - Clean desk - Tidy shelf - Nice hair - Classic perforated printing paper - Popped collar

@DeJayHank 8 years ago

Since pictures can be a bit noisy due to sensor imperfections, and the information gained from a single pixel isn't that much, Stereo vision algorithms often utilize Block Matching. It means that instead of finding a single pixel in the other image, you look at a "block" of pixels around it (a 5x5 block for example) and see if a very similar block can be found in the other picture. It is much more robust to single pixels being very distorted due to noise etc. but of course takes more computing time since now you have to work with 25 pixels for each matching test, instead of just 1.
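The block matching described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the method used in the video: it assumes rectified greyscale images stored as lists of rows (so a match lies on the same row), uses the sum of absolute differences as the similarity score, and the names `sad`, `block_match`, `half` and `max_disp` are made up for this example.

```python
import random


def sad(left, right, row, col_l, col_r, half):
    """Sum of absolute differences between two (2*half+1) x (2*half+1) blocks."""
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += abs(left[row + dy][col_l + dx] - right[row + dy][col_r + dx])
    return total


def block_match(left, right, row, col, half=2, max_disp=16):
    """Return the disparity d in 0..max_disp whose block in the right image,
    centred at (row, col - d), best matches the block centred at (row, col)
    in the left image. Assumes rectified images, so only one row is searched."""
    best_d, best_cost = 0, float("inf")
    for d in range(max_disp + 1):
        if col - d - half < 0:
            break  # the block would fall off the left edge of the right image
        cost = sad(left, right, row, col, col - d, half)
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


# Synthetic check: the right image is the left image shifted by 3 pixels.
rng = random.Random(0)
left = [[rng.randrange(256) for _ in range(32)] for _ in range(9)]
right = [row[3:] + [0, 0, 0] for row in left]
disparity = block_match(left, right, row=4, col=16)  # 5x5 block, as in the comment
```

With `half=2` each comparison touches 25 pixels, which is the extra cost the comment mentions. Real implementations add refinements such as sub-pixel interpolation and left-right consistency checks.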

@giphe 1 year ago

That makes a lot of sense, thanks for the additional info!

@giphe 1 year ago

So is it kind of like performing a convolution from the sets of pixels from one image onto the other image and finding the closest match from that?

@dvoraj20 8 years ago

Frankly... this ended where I hoped it would start.

@unvergebeneid 8 years ago

+Jan Dvořák Well, there _are_ extra bits. Don't know if they contain what you wanted to see but I wanted to make sure you checked.

@hoopshank 8 years ago

+Jan Dvořák Watch it backwards?

@simoncarlile5190 8 years ago

+hoopshank lololol awesome response

@nickwoodward819 6 years ago

whereas your comment started where I hoped it would end.

@unvergebeneid 8 years ago

You can also move one eye/camera to get "actual" 3D because it's mathematically the same as two eyes for static scenes and under certain circumstances it even resembles the 3D "qualia" if you want to call it that.

@BGBTech 8 years ago

I once did it with low enough CPU use to run (sort of) on a Raspberry Pi at ~ 10 fps, but it was pretty crude (output depth map was 80x60, and a bit noisy/glitchy. input was a pair of 320x240 images from parallel webcams). partial optimization was making use of blocky-VQ so that for a block of pixels (4x4), you can determine early if they are out of range (or are a nearly exact match). it was based on a trick I had used for doing motion compensation in video compression during capture (motion compensation helps somewhat with compression). the trick greatly reduced the amount of pixel-by-pixel checks. also it worked internally using a dichromatic colorspace, partly to save space and also because it was cheaper to only compare two axes (whereas a single Y axis loses the ability for it to distinguish things based on color, reducing accuracy somewhat). I had also tried unsuccessfully to use some designs based around Haar wavelets.

@pirouettenerd2675 8 years ago

Needs a part 2.

@DeJayHank 8 years ago

Would have liked if you showed an example of disparity map obtained from some stereo vision algorithm. It can help show what the result would look like.

@Kruglord 8 years ago

+DeJayHank Google "disparity map" and look at the image results, you'll see a bunch of examples of what that might look like.

@DeJayHank 8 years ago

+Kruglord Well I know what it looks like because I've been working with Stereo matching algorithms with and without OpenCV, but I thought it would have been a good addition in the video for people who haven't seen.

@LyCaNid 5 years ago

Great explanation!

@DaanLuttik 8 years ago

Could you do one about how you account for reflective objects or objects in a space where lighting isn't ambient? I have some ideas on how one could do this, but I'd like to hear the smart tricks surrounding this topic.

@abdulrahmanalmoamaralmadan7843 6 years ago

Thank you for your great explanation. Amazing!

@FabrizioBianchi 8 years ago

Mike is the best speaker in the channel. AI guy comes a close second.

@TheDrawdex 8 years ago

I remember doing this for a final project. :D This would've been awesome.

@Kruglord 8 years ago

Great video, I think a good follow up video might be how people have approached the correspondence problem, such as using SIFT or SURF points.

@LiborTinka 8 years ago

+Kruglord Don't forget MSER, these are especially useful (area) features for wide-baseline stereo. Feature detectors and descriptors alone are an interesting topic.

@sungjinchun1094 3 years ago

Wow nice explanation! thanks

@Peepnbrick 8 years ago

I know you've already talked about color spaces, which was very interesting, but could you get Mike to do an episode on Gamma / Gamma Correction?

@AhpgZfoc4s 8 years ago

Looks like the cameraman didn't have his customary dozen shots of espresso.

@WaqarRashid 8 years ago

It's a really helpful channel and has lots of interesting videos, but I can't find some videos in order; some videos are hidden from the channel and there is only a small number of playlists. Is there any website where I can access these videos in some order? Thanks.

@Pr0toc01 6 years ago

At 6:12 he says this is only possible because they know the positions of the two cameras, and that if they don't know the camera positions they have to search through the whole image. Question: once you have done this one, two, ..., ten times, can you compute the cameras' relative position? Once you have calculated the cameras' relative position, could you then use that to make any future searches easier?

@TenSeiKenZX 8 years ago

I wish I had Dr Mike Pound as a lecturer

@jll8520 8 years ago

1:42 Shots fired at the Fine Bros

@jesper86broberg 3 months ago

I'm considering using TOF or stereo 3D for a QA vision project inspecting small details (O-rings) in production. What are the pros and cons? To me, just from reading, TOF seems the better option in most aspects, but you seem more geared towards stereo 3D. Thank you for the nice videos!

@chrisradford1157 6 years ago

Does anyone know if I can use this method if I have the cameras' GPS coordinates from when the images were taken? Can I calibrate the cameras using that data and follow the same method?

@Spongman 8 years ago

Monocular vision gets accurate depth from micro focus changes. Otherwise how does your eye know how to focus when you close one eye?

@pratherat 8 years ago

If I were to attempt this, I would create a coincidence map from the left and right image, representing how much each left pixel matched the corresponding right pixel, using an offset of a series of intervals from -somevalue to +somevalue. Somevalue weighed against each coincidence pixel should yield a kind of edge map depicting the differential offsets of similar pixels, interpreted as distance. I dream about developing some kind of inverse GPU card. Instead of taking a set of 3-D polygons and rendering them onto a 2-D screen, this would take a set of 2 or more camera inputs and render them into a set of polygons in 3-D. Given my history of "invention", this has already been done.

@LiborTinka 8 years ago

Reminds me of the "Fundamental Matrix Song", which is about the matrix connecting the two images. For more mathematically oriented people :-) Multiple-view geometry can be fun.
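For readers who want the equation behind that matrix, here is a minimal statement of the epipolar constraint, assuming pinhole cameras and homogeneous pixel coordinates. The symbols are the conventional ones from multiple-view geometry, not notation taken from the video.

```latex
\mathbf{x}'^{\top} F \, \mathbf{x} = 0, \qquad \mathbf{l}' = F \, \mathbf{x}
```

Here $\mathbf{x}$ is a point in the first image, $\mathbf{x}'$ its match in the second image, $F$ the $3 \times 3$ fundamental matrix, and $\mathbf{l}'$ the epipolar line in the second image on which $\mathbf{x}'$ must lie. This is what lets the search for a match be restricted to a line instead of the whole image.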

@TheAprone 8 years ago

When the 3D picture was zoomed into the screen at 1:32, I just barely was able to make out the image before they started messing with it. It doesn't match the simulated "answer" they displayed a few seconds later.

@steve1978ger 8 years ago

I played around with stereo vision using OpenCV, and found it really hard. Computers are still years away from having 3d vision like even a rather simple animal.

@BriMR 8 years ago

+steve1978ger That's why stereo vision is no longer used in most cases, the current laser and RGBD cameras are easier and more effective in depth feature recognition.

@gigige5928 2 years ago

....'like even rather a simple animal' - only predators have binocular vision which in turn are the most advanced biological life on this planet, like humans. overlapping field of vision might appear in some avians, but because totally different reasons (flight) and only for a few degrees, unlike predators who have true stereoscopic vision, there is no simple animal with that kind of trait

@steve1978ger 2 years ago

@@gigige5928 - okay, very simple animals may not have binocular vision, but "only predators" is an overgeneralization.

@LLHLMHfilms 8 years ago

The Rubik's cube on the sled isn't solved!!!! It's driving me crazy!!!!!!

@trunc8 4 years ago

So finally, how is the occlusion problem solved (a feature hidden in one view but visible in the other)?

@MrBeanbones 8 years ago

Oh god, it's a headache to calculate all the points in analytic geometry, but is possible to use the focus of the camera to create a useful constant.

@peterjamesfinn 8 years ago

Instead of keeping the cameras in a fixed alignment, why not have some permanent feature in between the cameras of a known size (like our nose for our eyes) that allows you to calibrate each frame separately? Sure you will get a bit of obscuring, but it makes the 3D position calculation less reliant on keeping the relative position and angle of the two cameras absolutely fixed. You would need to use a bit of inference (or possibly something like staccato) to fill in the positions of the blank bit.

@Kruglord 8 years ago

+peterjamesfinn The video glosses over this fact (for good reason) but the camera calibration step they mentioned is actually a very important and relatively complex step in the whole system. Because the relative position of the two cameras determines the measurement of the rest of the system, even very small errors in their estimated position can have huge impacts on the precision of everything else. For this reason, in stereo-vision systems the cameras tend to be rigidly mounted together, and their position determined in a separate step before they're used in any other measurements. Now, it's common to have a system that only has a single camera, that takes lots of pictures and moves around the scene alone rather than having a pair. This method relies on a simultaneous calculation of the camera's position at each photo location, and the location of the features in each photo. This can be fairly accurate as well, but it also has its limitations. Specifically, the scale is indeterminate without additional observations, so you can only record the shape of the scene, not the size. Also, as mentioned in the video, the camera's locations can only be determined through features common to different images, which leads us into the correspondence problem. There are methods available to solve this problem, but they're complex and take a lot of time to calculate, even with today's powerful computers.
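To give a feel for how sensitive the result is, here is a small sketch of the standard depth-from-disparity relation. It assumes two parallel, rectified pinhole cameras with the focal length expressed in pixels; the rig numbers (700 px, 10 cm) and the function name are invented for illustration and do not come from the video.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in metres of a point seen by two parallel, rectified pinhole cameras."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px


# Hypothetical rig: 700 px focal length, cameras 10 cm apart.
z_true = depth_from_disparity(700.0, 0.10, 7.0)  # disparity measured correctly
z_off = depth_from_disparity(700.0, 0.10, 6.0)   # same point, disparity off by one pixel
error_m = z_off - z_true
```

With these made-up numbers, a miscalibration that shifts the measured disparity by a single pixel moves the estimate from 10 m to roughly 11.7 m, and the error grows approximately with the square of the distance.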

@tombombadillo1 8 years ago

Just thought I'd make the point that I find I judge the distance to an object with one eye primarily by focusing: refocusing on an object until it's the least blurry, and having your brain estimate how far away it is from memory.

@DeJayHank 8 years ago

+Hayden Muscat Yeah there is a lot more going on in the brain for distance estimation, like just knowing what size cars usually are or a mug or whatever is enough to decently judge its distance from you even with a single image. There are algorithms that judge depth by taking several pictures from one camera with different focus for every image. This wouldn't quite work on moving objects though since it takes time to get enough pictures for it to be useful.
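The focus-based approach mentioned above can be sketched roughly as follows. This is a toy version under stated assumptions: each image in the focus stack is a greyscale list of rows tagged with the distance the lens was focused at, sharpness is scored with a simple 4-neighbour Laplacian, and the names `sharpness` and `depth_from_focus` are hypothetical. Practical systems score small patches rather than whole images to get a depth per region.

```python
def sharpness(img):
    """Mean squared response of a 4-neighbour Laplacian; larger means sharper."""
    h, w = len(img), len(img[0])
    total = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1]
                   - 4 * img[y][x])
            total += lap * lap
    return total / ((h - 2) * (w - 2))


def depth_from_focus(stack):
    """stack: list of (focus_distance, image) pairs.
    Returns the focus distance of the sharpest image."""
    return max(stack, key=lambda item: sharpness(item[1]))[0]


# Toy focus stack: a flat grey image stands in for a fully blurred shot,
# a checkerboard for the in-focus shot.
flat = [[128] * 8 for _ in range(8)]
checker = [[255 if (x + y) % 2 else 0 for x in range(8)] for y in range(8)]
best_distance = depth_from_focus([(0.5, flat), (2.0, checker), (5.0, flat)])
```

As the comment notes, the scene has to stay still while the stack is captured.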

@raj61091 3 years ago

I learnt how to explain something from your video, and also a bit about stereo vision.

@highwayrunner9771 3 years ago

part II please

@tedchirvasiu 8 years ago

Where has the object-oriented programming video gone?

@JegErHolyNoah 8 years ago

+Ted Chirvasiu Yeah I'd like to know too

@RomainQ 8 years ago

That wolf sound effect at the beginning... I'm sure I've heard it in many different game but I can't find a source for it!

@ancientapparition1638 8 years ago

Dota 2 when the clock hits night time.

@RomainQ 8 years ago

+Ancient Apparition I also heard it in Dofus, WoW and HoTS

@demetriuspsf 8 years ago

I work with this technique to reconstruct faces from photos.

@unveil7762 3 months ago

@piwithatsme 8 years ago

Moving your head also works to help see depth with one eye

@OsamaRana 8 years ago

+WiWiPiWiWi Essentially you're using one eye to gather information that you'd normally get with two eyes

@DeJayHank 8 years ago

+WiWiPiWiWi That would work with a camera as well, but only if the object(s) you are taking pictures of is stationary during the whole process, and you know how far you moved the camera.

@GISP 8 years ago

Can it be used in real time? E.g. in a car, to give 100% accuracy on distance to stuff? Can it be used in augmented reality applications and games in real time?

@TestDrivenUK 8 years ago

+GISP Subaru have a system (called 'EyeSight' strangely enough) that uses 'stereo' cameras mounted high in the windscreen to detect obstacles and warn the driver if they get too close, applying the brakes if necessary.

@olatunjifelix2102 4 years ago

great

@camius1 7 years ago

I'm trying to implement this using IR cameras in real time without any luck haha

@harshitkhandelwal1243 5 years ago

Why is he not looking at the camera?

@OVBLANA 8 years ago

Could it be possible to recreate this with 2 mobile phones? You could calculate the distance between the two phones using gps location, but the precision could possibly be too bad.

@SyukriLajin 8 years ago

No need for gps, wifi/bluetooth triangulation could do it more precisely. Of course we would need 3 or more phones.

@tetradb_ 8 years ago

+Purple Blaze Google (not sure of the project name, but it's to do with Google Earth/Maps) and Microsoft (Photo Tourism) already do this using people's images; they build up a 3D map of an area from those individual images. I don't think that accurate GPS data, or any GPS data, is a requirement. I think the algorithms used are clever enough to extrapolate the position.

@memorablename5187 7 years ago

I had a question on an exam: "Is stereo vision possible with only 1 camera? If so, what ancillary data is needed?" How would you guys answer this?

@mitigatekeeps1371 7 years ago

No.

@rufioh 8 years ago

Would this be easier with three cameras instead of 2?

@ahmetmelihafsar2352 4 years ago

I don't think so. For example, if you use 3 cameras named a, b, c, you have to draw the triangles for a-b, b-c, a-c. It would be more accurate, but it would cost 3 times the processing power.

@MrAlbedo39 8 years ago

So how do you determine the depth of a point in one view if it's occluded in the other view? Can you?

@Kruglord 8 years ago

+MrAlbedo39 You can't really determine the coordinates of a point that only appears in one image. All you can say is that it exists and that it falls on the epipolar line outside the bounds of the second image, so it is either too close or too far to be seen by both cameras.

@MrAlbedo39 8 years ago

+Kruglord I'm more interested in how that occluded point ends up being represented in the 3-D result. Does it appear as a flaw that must be manually corrected?

@elerosvecchio 7 years ago

MIKE! Finish the rubiks cube damn it

@MrRJReynolds 8 years ago

How do the cameras function when there is specularity?

@Kruglord 8 years ago

+R.J. Reynolds Specularities (highlights on highly reflective surfaces) generally cause the correspondence to break down, resulting in what effectively appears to be occlusion in the depth map.

@titaniumdiveknife 8 years ago

I understood about a quarter of that. that's enough for today.

@99Davidcool 3 years ago

This is easier with a plenoptic camera

@remybrandt8347 8 years ago

use filters on the receiver. That kills sunshine.

@noahwilliams8996 8 years ago

But how do we know what direction that line is going in?

@ACDCBoy62 8 years ago

+Noah Williams It's a straight line between the camera and the object.

@noahwilliams8996 8 years ago

Elias Simon No I mean the one that the other camera has that can be quickly checked to see if it has the same value.

@AnimeReference 8 years ago

If somehow my eyes were moved further apart would I have distorted vision? Is the distance between the eyes of a human even constant? or are they in the same spot since birth? I wouldn't bother with a system for eye distance calibration if I were a ... god?

@Tfin 8 years ago

+Jake Surname You might have trouble for a while, but you'd adjust, because your pattern matching ability is still much better than a computer's.

@liquidminds 8 years ago

+Jake Surname since you won't wake up one morning with your eye-distance being completely different, it doesn't matter. even if you grow a little, the changes are so small, that you can easily adapt. If you woke up and your vision was impaired by

@Tfin 8 years ago

You can simulate moving your eyes with a series of mirrors. Take one of those toy periscopes, and turn it sideways. Now your eyes are suddenly very far apart.

@Frrk 8 years ago

+R3Testa Suddenly, superhuman depth perception :)

@MadMonkey126 8 years ago

Your hair is on fleek! Wow did I just say fleek?

@Nulono 8 years ago

1:49 *farther

@ITR 8 years ago

Wouldn't ultrasound be easier or cheaper?

@Thomcat 8 years ago

+MMMIK13 Accuracy, lack of colour information, limited depth... At my work we use tens of fixed IR cameras to do motion capture, there is no way it could be reasonably done with ultrasound.

@ITR 8 years ago

***** Hmm, but in the example at the beginning he talked about using Stereo 3D Vision as an alternative to lasers, and in that case, do you think ultrasound would work better?

@SyukriLajin 8 years ago

Or radar. Google's project soli is doing exactly this. Pretty interesting

@aka5 8 years ago

+Syukri Lajin Radar uses ultrasound dunnit, isn't that what the previous guy was referring to?

@SyukriLajin 8 years ago

Akașșș ultrasound uses.. sound. radar uses electromagnetic waves. if i'm not wrong

@tankmohit 8 years ago

am i the only one who found Dr Mike Pound speaks like christian bale.

@TrollingAround 8 years ago

Who in their right mind leaves an unsolved Rubik's cube on a shelf in the background of a video? No idea what the video was about as I was totally distracted. :-(

@MarcelRobitaille 8 years ago

I love how he says "free d"

@OH5EDP 8 years ago

Ye fock'n non brit :D

@MarcelRobitaille 8 years ago

+Jimi Leander I'm Canadian eh. The Queen is on my money. Pretty British if you ask me.

@avro549B 8 years ago

+Marcel Robitaille He's close to having a speech impediment, (e.g. notice "ovver"). It may be an English public (expensive private) school accent/affectation.

@rich1051414 8 years ago

+avro549B "Peasant accent" :D

@turbotrading7910 6 years ago

liked bcoz of wolf story

@canguar 8 years ago

does the brain work similarly, i wonder

@calfischer1149 8 years ago

in what way?

@Germanywithtripti101 3 years ago

Correspondence problem

@RedSquirrelVanguard 5 years ago

It's always triangles!

@afroninjadeluxe 8 years ago

So the human brain knows the distance between the eyes?

@j7ndominica051 8 years ago

I can't see anything special in the "magic eyes" picture. There are 10 by 8 repeated patterns of noise.

@Sazoji 8 years ago

+j7ndominica0 i see two cube popping towards me diagonal to each other and a square and a circle with a circle inside popping in

@jadoo16815125390625 8 years ago

This started very abruptly. It would have been nicer to have a gooder introduction.

@leestons 5 years ago

"gooder"

@hanniffydinn6019 8 years ago

Try driving a car with one eye closed....!

@joelproko 8 years ago

What a boring magic image :(

@TheMasonX23 8 years ago

+joelproko Stereograms can be so cool, with detailed shapes and customized backgrounds, and the example they go with are some simple shapes over static...

@oldcowbb 4 years ago

i was expecting more nerdy stuff :(

@fivforfivfor 1 year ago

I already know the answer Because I have already solved this issue (light cones) ...🧐🧐🧐...

@calt03 7 years ago

I wanna pound dr pound😇🤔😅 he is so cute

@melihaslan9509 2 years ago

What I understand? Nothing

@realcygnus 8 years ago

he should have explained some of the maths

@IonoTheFanatics 8 years ago

+realcygnus ??? but the math there is really secondary to the principal core of the mechanism ie: how to simplify the problem to make solving it actually feasible instead of matching pixel by pixel on the entire image

@Kruglord 8 years ago

+realcygnus The maths tends to use a lot of linear algebra, which is probably beyond the scope of these videos.

@jebus6kryst 8 years ago

Everyone's a LOBO!. You do realize that there has never been a reported wolf attack in the Americas.

@PwnUIDo 8 years ago

You do realize saying something doesnt make it true.

@jebus6kryst 8 years ago

I Am The Way You do realize you are communicating with me on a device that can show if I am right or wrong. I am sorry if I do not cite my work in the YouTube comment section.

@sam08g16 8 years ago

computerphile is way too nerd for the normal human being

@bcn1gh7h4wk 8 years ago

basically, "how to solve a problem with 3 variables" fix one, know another, and math the result out. that's a thousand-year-old principle.... and people still fail to apply it to daily situations.