Stereo Vision on a Home PC

 

Abstract. This article explains the physical and psychophysical basics of stereo vision, describes certain hardware and software implementations of stereo vision on a home PC, and provides a Demo displaying fascinating animated 3D stereo trajectories for several important physical problems. It also implements a 3D cursor (controlled with a conventional mouse), making possible “tactile” exploration of 3D curves with audible feedback.

 

Disclaimer: To watch the effects described in the article and displayed by the software, first of all you need a pair of Red/Blue glasses, obtainable for example at [1] for 50 cents. Also, you must have relatively healthy eyes capable of stereopsis: the ability to naturally fuse the two plane images comprising a stereo pair into a 3D scene. Almost everybody does (not to be confused with fusing the so-called auto-stereograms). If you have ever watched stereo movies at a modern theater through polarized or electronic shutter glasses, tried a stereoscope, looked at Japanese stereo postcards covered with a micro raster, or explored a hologram – and were stunned by the scene popping out into space – your stereopsis is OK.

 

1. A gift we are not aware of: the basics and the terminology.

 

Stereo vision is a wonderful capability of our brain to recreate an image of a 3D scene by “fusing” the two plane images received by the eyes. It is a very special feeling of 3D space, belonging completely to the realm of imaginary sensations produced by our brain: it is much more “imaginary” than the sensations of plane pictures. After all, there is no place inside our body where a 3D scene of reality is somehow physically projected, because the only projections we do have are those onto the retinas of our eyes, i.e. two plane images. (Actually, the retinal surface is nearly spherical, but topologically it is a plane.)

 

Yet this powerful sensation of stereo vision usually remains unnoticed, being a wonderful gift we are hardly aware of. That is because most of the time we are in a real 3D environment inhabited by objects of regular shapes, or of irregular but very familiar shapes. Our brain so routinely “maintains” this 3D scene that even if the regular video input is interrupted (say, one eye is closed), we usually do not feel as if anything in our perception were lost. Therefore, while we look at the real 3D world, our brain does not bother to always keep us “high” with that special feeling of 3-dimensionality. The real excitement of 3D vision happens when we look at special objects expected to be physically flat (a screen, a hologram, a pair of slides in a stereoscope), but the brain suddenly discovers there a 3D scene popping out of the plane.

 

This is 3D Stereo Vision, or the real 3D – not to be confused with the loosely used term 3D Graphics (applied to video cards and video libraries). This confusion emerged when game and video technology switched from very simplistic projections (known as front, side or top views in drawing) to more advanced perspective (or isometric) projections of 3D objects and scenes. Thus, 3D Graphics actually renders 3D scenes into plane projections. Photos, isometric drawings, and artistic paintings (if their perspective is correct) are all examples of such projections. When we look at them, we perceive them exactly as plane 2D projections of 3D reality. With a little training, we all are good at “reading” these projections and understanding which 3D reality they represent. We are so used to it (due to photos, movies, and paintings) that we are hardly aware of the convention, namely that we are looking at projections of the 3D world – which was the only known way of reproducing images (not to mention sculptures) until the invention of the stereoscope in the 19th century in Great Britain.

 

The two eyepieces of a stereoscope deliver the two plane images of a stereo pair directly to the corresponding eyes. And then a miracle occurs: viewers begin to perceive the 3-dimensionality of the scene so vividly that they wish to touch the objects hanging in the air – something that never happens when we look at any plane projection of the 3D world.

 

After the stereoscope, many other techniques were developed for displaying 3D stereo images. In all of them (except holography) the basic idea remains the same as in the stereoscope: to deliver each of the two images of a stereo pair to the proper eye. On the contrary, holographic equipment completely reproduces the 3D light front, i.e. the 3D vector field of those very electromagnetic waves which would be reflected by the 3D scene if it were really there. Thus a hologram creates a sculptural “ghost” of the 3D scene in the real world as a real physical phenomenon, while a stereo pair creates this scene in our consciousness only – which is the ultimate addressee anyway. Another fundamental difference is in the quantity of information: a stereo pair is just two plane images, while a hologram encodes an infinity of them.

 

Now, putting aside the awe and emotional excitement of stereo vision (as well as the obvious evolutionary advantages of perceiving 3-dimensionality for the survival of the species), let us ask ourselves how stereo vision is better than a plane perspective projection of a 3D scene such as a photo, painting, or drawing (also called an isometric drawing). Isometrics (or photo imaging) is good for displaying objects like scenes with rectangular shapes, bodies with edges, skeletal structures, and scenes with good perspective or reflection hints. Yet isometrics works poorly for smooth surfaces without edges (smooth 2D manifolds). For example, to perceive a picture of a sphere or a torus, we additionally draw several curves on their surfaces, or special shadows and reflections. The 2D projections become especially inefficient in the case of visualizing non-planar curves (1D manifolds). The attached DEMO software is intended exactly for that type of curves, taking full advantage of 3D stereo vision.

 

2. Mathematical model of stereo vision.

 

The direct problem is formulated as follows: given a viewing point and a 3D scene, obtain two plane images, called a stereo pair, which “fuse” into the given 3D scene. Note that, strictly speaking, a stereo pair depends on the viewing point: it encodes the 3D scene only as viewed from one specific point. For example, all viewers in a movie theater perceive a stereo scene as though they all watched it from the same point – the one it was shot from.

 

Obtaining a stereo pair is a straightforward geometrical problem, as simple as constructing two perspective projections of the 3D scene onto a plane with the centers of projection in either eye (Fig. 1). The technical tools and schemes for obtaining stereo images (such as double-lens cameras) have been known since the time of the invention of the stereoscope: they may be found, for example, at [1]. (Stereo photos of still scenes may also be shot by a conventional camera sequentially – a technique that will be discussed in part 2.)

 

On the contrary, the inverse problem, i.e. the reconstruction of a 3D scene from the two plane images of a stereo pair, is geometrically an ill-defined problem, requiring heuristic assumptions and a priori “knowledge” about reality. Our brain somehow solves this problem quite successfully, not only decoding 3D scenes of the environment where we physically live, but also fusing 3D scenes encoded in artificially created stereo pairs and displayed with various technical tools. Not much is known about how the brain does it. It must start with matching the identical points, edges, curves and other topological elements in the two images of a stereo pair (Fig. 1). Then the disparities (also called parallaxes, the angular differences between the corresponding elements) should be translated into 3D coordinates. Mathematicians do that employing trigonometry and a lot of computation, but we do not know how it happens in the brain.
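To give a taste of the trigonometry involved (a toy sketch only – this is neither the brain's algorithm nor code from the package, and the function name is made up), the depth of a matched point can be recovered from its horizontal disparity by similar triangles in the classic pinhole-stereo model:

```python
def depth_from_disparity(baseline_cm, focal_cm, disparity_cm):
    """Classic pinhole-stereo triangulation: Z = f*B/d.
    baseline_cm:  distance between the two centers of projection (the eyes),
    focal_cm:     distance from the eyes to the projection plane (the screen),
    disparity_cm: horizontal offset between the matched left/right points."""
    if disparity_cm == 0:
        return float('inf')   # zero disparity: the point is at infinity
    return focal_cm * baseline_cm / disparity_cm

# A point 100 cm away, eyes 6.5 cm apart, screen at 50 cm: its two images
# on the screen are separated by d = f*B/Z = 50*6.5/100 = 3.25 cm
print(depth_from_disparity(6.5, 50, 3.25))   # recovers 100.0
```

The hard part of the inverse problem is not this formula but the matching step that produces the disparities in the first place.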

 

The best stereo vision (the easiest to fuse) is achievable with tools that deliver each of the two high-quality color images of a stereo pair to the proper (and only the proper) eye. For example, in stereo movies the two images are projected onto a conventional reflective screen. The trick is that each of the images is polarized, and their planes of polarization are perpendicular to each other (say, horizontal and vertical). The audience watches through polarizing filters so that each of the two eyes receives only the proper image in full color (human eyes ignore the polarization of light). So simple and efficient at movie theaters, this idea is not applicable to computer monitors. However, another scheme, based on high-frequency alternating display of the left/right images, does work for them. In this scheme the viewer has to wear goggles or liquid crystal glasses functioning as an electronic shutter, synchronized by the stereo projector. It is implemented both for movie theaters and for personal computers. In the case of computers, a special video card with proper software is required: the card must be capable of controlling the electronic shutter. As to the monitor, it may be conventional but must be high-end (a frame rate of at least 120 Hz and a phosphor with the shortest decay rate, to avoid “inertial” overlapping of the images).

 

Recently Sharp introduced a 3D stereo notebook [2] not requiring any glasses at all. They implemented an idea similar to that of raster stereo postcards: the left/right images of a stereo pair are represented as alternating very narrow vertical stripes (which halves the horizontal resolution). Unlike viewers in stereo theaters, who wear special glasses or goggles and perceive the effect in a wide variety of positions, the 3D stereo notebook requires the viewer to maintain his position strictly on the axis of the screen (plus/minus an inch).

 

Another old idea – the so-called anaglyphic (Red/Blue) glasses described in this article and implemented in the software – is nice in its simplicity and in that it requires nothing more than a conventional PC plus cardboard glasses [1], the cheapest gear possible. The two images of an anaglyphic stereo pair must be monochromatic, each in a different color of the basic Red/Green/Blue set. The Red/Blue pair is better because their spectra are farther away from each other and better separable. When you look through the Red/Blue glasses at the Red/Blue images of a stereo pair simultaneously displayed (and overlapped) on the screen, the undesired second image is filtered away for either eye, so that each eye receives only the proper image of the stereo pair addressed exactly to this eye.

 

Sounds simple, but you may rightfully ask: what about the fact that the two images of the stereo pair appear in different colors? How does the brain fuse them if it is trained to match the corresponding elements, which indeed must be of the same color?

 

Well, the brain follows not a “rigid” but rather a “flexible” algorithm. Certain neurons fire if they find recognizable elements in the image disregarding their colors, while others fire if these elements happen to comprise a “meaningful” stereo pair. When that occurs, the signals of those “stereo-specialized” neurons dominate over everything else. As a result we fuse the 3D scene despite the fact that each eye sees it in a different color. Various people may perceive an anaglyphic scene either as monochromatic Red only, or Blue only, or Gray, or in “competing colors”. In any case, fusing an anaglyphic stereo pair is just as easy as fusing “uncompromised” stereo pairs in stereoscopes or movie theaters. This is the case when the natural stereopsis works as is, and no extra effort is required to fuse the stereo pair. (On the contrary, for some other types of stereo effects, like the so-called auto-stereograms, or for stereo pairs placed next to each other without a separator, the eyes must be crossed in a certain way, and the brain needs more effort and some training to fuse such a stereo pair successfully.)

 

3. Programming the anaglyphic stereo

 

There is a principal feature of PC monitors which is particularly beneficial for anaglyphic stereo. The crucial fact in the implementation of color control in PCs is that both the hardware and the software actually support not one but three completely independent monochromatic screens simultaneously: the Red, the Green and the Blue. True, we usually need all three working together in order to continuously display the three color components of a color (or black and white) stream of images. However, the anaglyphic stereo, requiring two separate monochromatic streams of images, takes full advantage of this architecture.
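This bit-level independence is easy to check outside Delphi. The sketch below (plain Python, mimicking the 24-bit TColor layout with Red in the low byte) shows that OR-combining a pure Red layer with a pure Blue layer, as the raster modes described next do per pixel, leaves each channel extractable intact:

```python
# TColor-style 24-bit layout: $00BBGGRR (Red in the low byte, Blue in the high one)
def rgb(r, g, b):
    return r | (g << 8) | (b << 16)

pure_red  = rgb(255, 0, 0)     # one monochromatic "screen"
pure_blue = rgb(0, 0, 255)     # a completely independent one

merged = pure_red | pure_blue  # what the OR raster mode does per pixel

assert merged & 0xFF == 255          # the Red layer survived untouched
assert (merged >> 16) & 0xFF == 255  # so did the Blue layer
print(hex(merged))                   # 0xff00ff: Magenta, as seen without glasses
```

The Red and Blue channels never share bits, so the two monochromatic images coexist in one bitmap without corrupting each other.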

 

In Delphi graphics, in order to display three overlapping monochromatic bitmaps in Red, Green and Blue without corrupting each other (when viewed through the filters), the first layer must be drawn with

 

Canvas.CopyMode := cmSrcCopy,

 

while the second and the third layers with

 

Canvas.CopyMode := cmSrcPaint {OR mode for bitmaps and brushes}.

 

This is because the three color components occupy different bits in TColor values and never interfere with each other. Similarly, when doing Pen operations, all three layers must be drawn on the black background with

 

Pen.Mode := pmMerge {OR mode for pen}.

 

 

Indeed, we need only two layers and two colors (Red and Blue) for anaglyphic stereo. If viewed without the anaglyphic glasses, the overlapped Red/Blue images do interfere, appearing Magenta in the overlapped pixels. However, the filters of the glasses separate the images so that each looks intact, as though the other did not exist.

 

Although the sample software associated with the article displays only curves, and therefore uses only the pure Red and Blue pixel colors of the maximum intensity 255, similar techniques may be employed for half-tone black and white stereo images as well. Having a source black and white stereo pair, just erase the Green and Red bits in all pixels of the first image of the stereo pair (hence converting it into half-tone Blue), and erase the Green and Blue bits in all pixels of the second one (making it half-tone Red). Then display the images overlapped as explained above, and view the stunning stereo picture through your anaglyphic glasses. (A program helping to do that will appear in part 2.) Find two such anaglyphic photo bitmaps in the downloaded files. Another fascinating example, an anaglyphic map of hilly San Francisco as a hard copy, is available at [3].
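The recipe above can be sketched in a few lines. This is a hedged Python illustration on TColor-style integers (the one-pixel lists stand in for real bitmaps; no imaging library is assumed):

```python
def to_half_tone_blue(pixel):
    """Erase the Red and Green bytes of a $00BBGGRR pixel (first image)."""
    return pixel & 0xFF0000

def to_half_tone_red(pixel):
    """Erase the Green and Blue bytes (second image)."""
    return pixel & 0x0000FF

def anaglyph(first_pixels, second_pixels):
    """Overlap the two monochromatic layers with bitwise OR (the pmMerge idea)."""
    return [to_half_tone_blue(a) | to_half_tone_red(b)
            for a, b in zip(first_pixels, second_pixels)]

# a mid-gray pixel 0x7F7F7F in both images combines into 0x7F007F,
# i.e. half-tone Blue plus half-tone Red: half-tone Magenta on screen
print([hex(p) for p in anaglyph([0x7F7F7F], [0x7F7F7F])])
```

Through the glasses each eye then sees only its own half-tone channel, exactly as with the pure-color curves.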

 

The method also applies to mathematical surfaces whose images are created with half-tone shadows, reflection hints or texture.

 

To obtain a printed copy of an anaglyphic stereo image, we have to convert the source screen colors Red and Blue (plus Magenta at the places of their overlapping) into the more appropriate printer colors (Cyan, Magenta, Yellow and Black). However, color pigments on paper do not reflect light the way the phosphors of a CRT screen emit it. Usually pure Cyan is better in the role of the screen Blue, composite Red in the role of the screen Red, and some mixture of Gray in the role of the screen Magenta, while the black background remains black.

 

In the simplest case, when only the full Red and Blue colors are used, the table below summarizes the requirements for the printed colors for specific Red/Blue glasses.

 

 

                    Viewed through Red filter    Viewed through Blue filter

Composite Red       Light Gray [RR]              Very dark Gray [RB]

Composite Blue      Very dark Gray [BR]          Light Gray [BB]

Their overlapping   Some Gray [OR]               Some Gray [OB]

 

The ideal printer color adjustment should satisfy the following conditions:

 

Light Gray [RR] = Light Gray [BB] = Some Gray [OR] = Some Gray [OB]

 

Very dark Gray [BR] = Very dark Gray [RB] = Black.

 

Printing half-tone Red/Blue anaglyphic images would require much finer printer color adjustment.

 

 

4. Programming the geometry of the stereo vision.

 

Let us assume that the 3D scene we are going to render is located in space relative to the screen as shown in Fig. 1. To represent a point in 3D space, we declare an object containing both the 3D coordinates of the point and the two 2D coordinates of its left-eye and right-eye projections:

 

T2DPoint = record

   x, y : extended;

   end;

 

T3DPoint = object

x, y, z : extended;   {3D coordinates to be mapped into the two 2D coordinates RightEyeProj and LeftEyeProj}

RightEyeProj,            {right eye perspective projection}

LeftEyeProj : T2DPoint;  {left eye perspective projection}

procedure SetRightLeft;

function Init(const x1, y1, z1 : extended) : T3DPoint; 

         {with rotation}

function InitStill(const x1, y1, z1 : extended) : T3DPoint; 

         {without rotation}

end;

 

We have to deal with the following three different coordinate systems and scales:

 

1.     The parallelepiped [Xmin..Xmax, Ymin..Ymax, Zmin..Zmax] of the 3D scene where the trajectories of the physical motion take place. These values may be in any units and any scale, from the size of an atom to that of the Universe. The values x, y, z in the type T3DPoint represent these coordinates.

2.     The proportional parallelepiped of the 3D scene corresponding to the physical sizes of your local macro-world: your 2D screen, the distance from you to that screen, and the distance between your eyes (say, in cm). In this scale the 3D scene is to be projected onto the 2D plane of your screen. The values x, y in the type T2DPoint represent the coordinates of this screen.

3.     The proportional logical screen – the Canvas – with sizes and points in pixels, corresponding to that physical 2D screen.

 

The simplest projection method for points (x,y,z) in 3D space onto the 2D coordinate plane XOY is the one where all projecting rays are parallel to the OZ axis (as if the viewing point were at an infinite distance). Then the projection process simply reduces to omitting the coordinate z, so that (x,y,z) → (x,y). However, in a perspective projection the rays from different points of the scene are not parallel: they converge at the center of an eye lens (or the lens of a camera), mapping the 3D coordinates onto the 2D screen (film, retina). These 2D coordinates may be obtained by solving similar triangles. (Note: it is in the projection process that the 3D information is lost; all points on each projecting ray in 3D space are mapped into one point only.)
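The similar-triangles computation for a single eye can be sketched as follows (a simplified Python illustration, not the package's exact formulas: the eye sits at the origin looking along OZ toward a screen at distance D):

```python
def project(x, y, z, D):
    """Perspective projection of (x, y, z) onto the screen plane z = D,
    with the center of projection (the eye) at the origin.
    Similar triangles give x'/D = x/z, hence x' = D*x/z (same for y)."""
    return (D * x / z, D * y / z)

print(project(2.0, 3.0, 10.0, 10.0))  # a point in the screen plane maps to itself
print(project(2.0, 3.0, 20.0, 10.0))  # twice as far: appears at half size
```

Note that all points on the ray through the eye and (x, y, z) project to the same (x', y'), which is exactly where the depth information is lost.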

 

Here are important variables specifying the 3D scene:

 

cmImageWidth,   {width of the screen}

Base, cmBase,   {half-distance between the eyes}

Dist, cmDist,   {distance from eyes to the screen}

Front, cmFront, {distance from eyes to zMax i.e. to the Front of the 3D scene}

Cx, Cy, Cz,     {coordinates of the point of fixation}

 

procedure Set3DParams(const xMin, xMax, yMin, yMax, zMin, zMax : extended);
                      {boundaries of the 3D scene in source units}

var k : extended;

begin

k := (yMax - yMin)/cmImageWidth; {viewer-screen scale to source units factor}

Base := k*cmBase;

Dist := k*cmDist;

Front := k*cmFront;

Cx := (xMin + xMax)*0.5;

Cy := (yMin + yMax)*0.5;

Cz := zMax + Front;

p3Min.InitStill(xMin, yMin, zMax); {opposite corners of parallelepiped}

p3Max.InitStill(xMax, yMax, zMax); {defining the 3D scene}

xMinSter := p3Min.LeftEyeProj.x;

xMaxSter := p3Max.RightEyeProj.x; xSpan := xMaxSter - xMinSter;

yMinSter := p3Min.LeftEyeProj.y;

yMaxSter := p3Max.RightEyeProj.y; ySpan := yMaxSter - yMinSter;

end;

 

The procedure below computes the perspective projections of a given point (x,y,z) onto each of the two plane images designated for the corresponding eyes.

 

procedure T3DPoint.SetRightLeft;

var Kz : extended;

begin

Kz := Max( - z + Cz, Front);

RightEyeProj.x := Cx - Base + Dist*(x - Cx + Base)/Kz;

LeftEyeProj.x  := Cx + Base + Dist*(x - Cx - Base)/Kz;

RightEyeProj.y := Cy + Dist*(y - Cy)/Kz;

LeftEyeProj.y  := RightEyeProj.y {y values always equal for both projections}

end;
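A direct Python transcription of SetRightLeft makes its two key properties easy to verify: the y projections coincide for both eyes, and the horizontal disparity changes sign as the point crosses the apparent screen plane. (The scene parameters below are made-up samples, not values from the package.)

```python
def set_right_left(x, y, z, Cx, Cy, Cz, Base, Dist, Front):
    """Python port of T3DPoint.SetRightLeft from the article."""
    Kz = max(-z + Cz, Front)
    right = (Cx - Base + Dist * (x - Cx + Base) / Kz,
             Cy + Dist * (y - Cy) / Kz)
    left  = (Cx + Base + Dist * (x - Cx - Base) / Kz,
             right[1])          # y values always equal for both projections
    return left, right

# sample (made-up) scene parameters
params = dict(Cx=0.0, Cy=0.0, Cz=100.0, Base=3.0, Dist=50.0, Front=40.0)

lf, rf = set_right_left(0.0, 0.0, 60.0, **params)  # nearer point (Kz = 40)
lb, rb = set_right_left(0.0, 0.0, 0.0,  **params)  # farther point (Kz = 100)

assert lf[1] == rf[1]            # no vertical disparity
# horizontal disparity left.x - right.x = 2*Base*(1 - Dist/Kz):
# zero when the point lies in the screen plane (Kz = Dist),
# negative ("crossed") in front of it, positive behind it
print(lf[0] - rf[0], lb[0] - rb[0])
```

The sign of the disparity is what makes a point appear to hang in front of the screen or to recede behind it.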

 

Points of the real type T2DPoint are intended for mapping onto the Canvas of a target object, hence they must be scaled to the sizes of that Canvas. The scaling is performed by the function

 

function RealToIntPoint(const x1, y1 : extended; const Rct : TRect) : TPoint;

begin {xMinSter,... yMaxSter represent overlapped stereo pair screen}

with Rct do

result := Point(Round((x1 - xMinSter)/xSpan*(Right-Left)) + Left, Round((yMaxSter - y1)/ySpan*(Bottom-Top)) + Top) {OY axis up}

end;
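The same linear mapping, sketched in Python for a quick check (the rectangle and the stereo-screen bounds below are made-up sample values):

```python
def real_to_int_point(x1, y1, rect, x_min, x_span, y_max, y_span):
    """Linear map from real stereo-screen coordinates to Canvas pixels,
    with OY flipped so that mathematical 'up' becomes screen 'up'."""
    left, top, right, bottom = rect
    px = round((x1 - x_min) / x_span * (right - left)) + left
    py = round((y_max - y1) / y_span * (bottom - top)) + top
    return px, py

# a 640x480 pixel rectangle over the real square [0..10, 0..10]
rect = (0, 0, 640, 480)
print(real_to_int_point(0.0, 10.0, rect, 0.0, 10.0, 10.0, 10.0))  # top-left corner
print(real_to_int_point(10.0, 0.0, rect, 0.0, 10.0, 10.0, 10.0))  # bottom-right corner
print(real_to_int_point(5.0, 5.0, rect, 0.0, 10.0, 10.0, 10.0))   # center of the area
```

Both images of the stereo pair pass through this one mapping, so their mutual alignment is preserved on the Canvas.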

 

Here Rct defines the rectangular area of the screen (Fig 2), proportional to the real type rectangular area [xMinSter..xMaxSter, yMinSter..yMaxSter] containing the two overlapped images of the stereo pair, where 

 

xSpan = xMaxSter - xMinSter;

ySpan = yMaxSter - yMinSter;

 

Thus the function linearly transforms the real type coordinates into values applicable to Canvas.Pixels, with the OY axis directed upward in the traditional mathematical way. For example, here is a procedure drawing a stereo line between points p and q:

 

procedure StereoLine(const p, q : T3DPoint; const Cnv : TCanvas;

              const Rct : TRect);

var a, b : TPoint;

begin

with Cnv do            

  begin {First x,y belong to RightEyeProj, second - to RealToIntPoint}

  with p, RightEyeProj, RealToIntPoint(x, y, Rct) do a := Point(x,y);

  with q, RightEyeProj, RealToIntPoint(x, y, Rct) do b := Point(x,y);

  Pen.Mode := pmMerge; { OR mode for pen}

  Pen.Color :=  RightColor; PolyLine([a, b]);

  with p, LeftEyeProj, RealToIntPoint(x, y, Rct) do a := Point(x,y);

  with q, LeftEyeProj, RealToIntPoint(x, y, Rct) do b := Point(x,y);

  Pen.Color :=  LeftColor; PolyLine([a, b])

  end

end;

 

Having this, we can write a procedure drawing a triangular facet (of a polyhedron) with a certain gray color (i.e. a certain level of half-tone Red and Blue). Then we would be ready for stereo triangulation of a given surface in 3D space (but surfaces are beyond the scope of this article). Here we are going to deal with special curves only: solutions of ordinary differential equations, integrated in the frame of an all-in-one package called the Taylor Center [4]. The points of the curves are computed via the Taylor expansion, as explained in that article. The next chapter will guide you through the DEMO, displaying fascinating examples of 3D stereo motion.

 

5. Playing with the DEMO

 

The DEMO is a limited version of the Taylor Center, intended for the integration of ordinary differential equations. It includes a variety of examples, several of them in 3D stereo, illustrating the principles of the anaglyphic stereo considered above.

 

To install the DEMO, just unzip it into an empty folder of your choice, designated for the executable module and the associated files. Then run TCenter.exe and follow the steps conducting you through several examples producing 3D trajectories.

 

In the menu select Demo/Three Bodies/Disturbed/3D: it opens the corresponding script and compiles the problem. After clicking OK in response to the message “Compilation successful”, the results appear as knotty Red and Blue curves. Now put on your anaglyphic glasses (over those you usually wear, if any) and get ready for fun. (It is recommended to maximize the Graph window.)

 

What you will hopefully perceive looks like a “fishing string” hanging in thin air between the monitor and your face. These are the trajectories of three bodies moving under gravitational pull. More specifically, this is the so-called disturbed Lagrange case. (The Lagrange case proper is when three equal masses are placed at the vertices of an equilateral triangle with initial velocity vectors also comprising a co-planar equilateral triangle – Demo/Three Bodies/Symmetrical.) This “fishing string” is the result of small disturbances applied perpendicularly to the initial plane (the plane of your screen).

 

However, the program is capable of producing something more than a “still life”: click the “Play” button. It initiates real-time animated 3D stereo motion of the bullets representing the three bodies, with all their accelerations, decelerations and couplings.

 

When they come to a standstill, you may try to explore the elements of the trajectories with a “tactile” 3D cursor. Move it into the scene, where it transforms into a small cross. The mouse by itself always moves the cursor in a plane parallel to the screen. In order to control its depth, move the mouse while holding down either the Ctrl key (to bring the cursor closer to your eyes) or the Shift key (to move it away from you). The current 3D coordinates always appear on the top panel of the window.

 

Now, moving the mouse and helping yourself with the Ctrl or Shift keys, try to touch one of the trajectories in space with the 3D cursor. If your speakers are on, you will hear a clicking sound when the touch occurs: this is the so-called audio feedback, helping to explore points of interest on the curves.

 

You can turn the curves in space with the Turns controls. However, with the given sizes of the parallelepiped, you may notice that the front side (controlled by the zMax value) prevents the curves from escaping and “flattens” them; therefore increase zMax.

 

Now that you are familiar with the 3D stereo features of the package, you can try several other problems. Click Main Panel in the menu to re-display the main form, and go to Demo/Four Bodies. The two pairs of bodies with equal masses are all initially placed in a horizontal plane parallel to your desk (perpendicular to the screen). The horizontal components of the velocities provide near-circular motion for each coupled pair, while the small vertical components push the two pairs into a large circular motion around the center of the masses (you can see the initial data in the Main window). At the beginning the trajectories spin into a braid looking like a torus (and like the newly discovered rings of Saturn shot by the Voyager probe), but the braid actually is not a torus: you can notice that the velocities in both coupled pairs preserve their initial horizontal orientation. Indeed, this fact may be obtained mathematically, but here you just watch and see it.

 

Finally, you can explore a few more 3D stereo examples by opening them as scripts. Click the Main Panel and go to the File/Open script menu item. Here is the list of files producing 3D stereo images:

 

PendulumApple.scr (a model of spherical pendulum)

PendulumFlower.scr

KnotChain3D.scr

TrefoilKnot3D.scr

Mobius.scr

MobiusLarge.scr

 

6. Conclusions.

 

The anaglyphic display is the simplest, yet efficient, approach to implementing stereo vision on conventional PCs. It produces an almost uncompromised stereo effect, requiring nothing more than the viewers' natural ability of stereopsis. For displaying monochromatic stereo images it is the best choice.

 

In this article we considered 3D stereo anaglyphic display of points, non-planar curves and straight lines only. However, these basic elements also make it possible to construct surfaces, whose elements may be random dots, random texture, parametric curvilinear coordinate grids, or the triangles of a triangulation grid.

 

Don't throw away your anaglyphic glasses: more applications will follow.

 

[1] A source of stereo glasses and other stereo gear is www.reel3d.com. The recommended Red/Blue glasses are of type 7001: they are the best match to the screen Red/Blue colors.

 

[2] Sharp 3D Stereo Notebook: www.sharpsystems.com/news/press_releases/release.asp?press=45

 

[3] Satellite stereo maps

 

[4] Differential Equations: Delphi Informant 2002/3.

 

Illustrating graphs: