How Vision Works

Solving the problem of converting light into ideas, of visually understanding features and objects in the world, is a complex task far beyond the abilities of the world's most powerful computers. Vision requires distilling foreground from background, recognizing objects presented in a wide range of orientations, and accurately interpreting spatial cues. The neural mechanisms of visual perception offer rich insight into how the brain handles such computationally complex situations.

Visual perception begins as soon as the eye focuses light onto the retina, where it is absorbed by a layer of photoreceptor cells. These cells convert light into electrochemical signals, and are divided into two types, rods and cones, named for their shape. Rod cells are responsible for our night vision, and respond well to dim light. Rods are found mostly in the peripheral regions of the retina, so most people will find that they can see better at night if they focus their gaze just off to the side of whatever they are observing.

Cone cells are concentrated in a central region of the retina called the fovea; they are responsible for high acuity tasks like reading, and also for color vision. Cones can be subcategorized into three types, depending on whether they respond best to long (red), medium (green), or short (blue) wavelengths of light. In combination, these three cone types enable us to perceive color.

Signals from the photoreceptor cells pass through a network of interneurons in the second layer of the retina to ganglion cells in the third layer. The neurons in these two retinal layers exhibit complex receptive fields that enable them to detect contrast changes within an image; these changes might indicate edges or shadows. Ganglion cells gather this information along with other information about color, and send their output into the brain through the optic nerve.
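
One way to see how a receptive field can signal contrast is a toy "center minus surround" calculation on a single row of brightness values. The sketch below is only an illustration, not a model of real retinal circuitry: the function names, the three-sample field, and the brightness numbers are all assumptions made for the example.

```python
def center_surround_response(signal, index):
    """Response of a hypothetical ON-center unit at one position.

    The unit is excited by the sample at `index` (its center) and
    inhibited by the average of its two neighbors (its surround).
    """
    center = signal[index]
    surround = (signal[index - 1] + signal[index + 1]) / 2.0
    return center - surround


def contrast_map(signal):
    """Apply the unit at every interior position of a 1-D brightness row."""
    return [center_surround_response(signal, i)
            for i in range(1, len(signal) - 1)]


# Illustrative brightness values: a dark region followed by a bright one.
luminance = [10, 10, 10, 10, 50, 50, 50, 50]
responses = contrast_map(luminance)
print(responses)  # [0.0, 0.0, -20.0, 20.0, 0.0, 0.0]
```

Uniform regions produce no response at all, while the two units straddling the dark-to-bright boundary respond strongly and with opposite signs. That is the sense in which such a unit reports a change in contrast rather than absolute brightness.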


The optic nerve primarily routes information via the thalamus to the cerebral cortex, where visual perception occurs, but the nerve also carries information required for the mechanics of vision to two sites in the brainstem. The first of these sites is a group of cells (a nucleus) called the pretectum, which controls pupillary size in response to light intensity. Information about moving targets and information governing scanning of the eyes travel to a second site in the brainstem, a nucleus called the superior colliculus. The superior colliculus is responsible for moving the eyes in short jumps, called saccades. Saccades allow the brain to perceive a smooth scan by stitching together a series of relatively still images. Saccadic eye movement solves the problem of extreme blurring that would result if the eyes could pan smoothly across a visual landscape. Saccades can be readily observed if you watch someone's eyes as they attempt to pan their gaze across a room.

Most projections from the retina travel via the optic nerve to a part of the thalamus called the lateral geniculate nucleus (LGN), deep in the center of the brain. The LGN separates retinal inputs into parallel streams, one containing color and fine structure, and the other containing contrast and motion. Cells that process color and fine structure make up the top four of the six layers of the LGN; those four are called the parvocellular layers, because the cells are small. Cells processing contrast and motion make up the bottom two layers of the LGN, called the magnocellular layers because the cells are large.

The cells of the magnocellular and parvocellular layers project all the way to the back of the brain to primary visual cortex (V1). Cells in V1 are arranged in several ways that allow the visual system to calculate where objects are in space. First, V1 cells are organized retinotopically, which means that a point-to-point map exists between the retina and primary visual cortex, and neighboring areas in the retina correspond to neighboring areas in V1. This allows V1 to position objects in two dimensions of the visual world, horizontal and vertical. The third dimension, depth, is mapped in V1 by comparing the signals from the two eyes. Those signals are processed in stacks of cells called ocular dominance columns, a checkerboard pattern of connections alternating between the left and right eye. A slight discrepancy in the position of an object relative to each eye allows depth to be calculated by triangulation.
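
The geometry behind depth by triangulation can be sketched in a few lines. This is a simplified geometric illustration, not a description of how V1 computes depth. It assumes two eyes on a horizontal line, both facing straight ahead, with each eye reporting the angle at which it sees the object (positive to the right). The function name, the 6.5 cm eye separation, and the target position are illustrative values chosen for the example.

```python
import math


def depth_from_angles(baseline, angle_left, angle_right):
    """Estimate distance to an object from the two viewing angles.

    With the eyes `baseline` apart, tan(angle_left) - tan(angle_right)
    equals baseline / depth, so depth follows by division.
    """
    disparity = math.tan(angle_left) - math.tan(angle_right)
    if disparity <= 0:
        raise ValueError("angles do not converge on a point in front of the eyes")
    return baseline / disparity


# Illustrative setup: eyes 6.5 cm apart, object 2 m ahead and 10 cm to the right.
baseline = 0.065
target_x, target_z = 0.10, 2.0

# Angle of the object as seen from each eye.
angle_left = math.atan2(target_x + baseline / 2, target_z)
angle_right = math.atan2(target_x - baseline / 2, target_z)

estimated_depth = depth_from_angles(baseline, angle_left, angle_right)
print(round(estimated_depth, 6))  # 2.0
```

The difference between the two angles shrinks as the object moves farther away, which is why this kind of depth estimate is most precise for nearby objects.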

Finally, V1 is organized into orientation columns, stacks of cells that are strongly activated by lines of a given orientation. Orientation columns allow V1 to detect the edges of objects in the visual world, and so they begin the complex task of visual recognition. The columnar organization of primary visual cortex was first described by David Hubel and Torsten Wiesel, work for which they shared the 1981 Nobel Prize in Physiology or Medicine.
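
Orientation selectivity can be illustrated with a pair of tiny filters, each of which responds most strongly to a bar at its own preferred orientation. The sketch below is a rough analogy only: the 3x3 weights, the two orientations, and the function names are assumptions made for the example, and real V1 cells cover many more orientations with far richer receptive fields.

```python
# Hypothetical filter weights: positive along the preferred bar, negative beside it.
VERTICAL = [[-1, 2, -1],
            [-1, 2, -1],
            [-1, 2, -1]]
HORIZONTAL = [[-1, -1, -1],
              [2, 2, 2],
              [-1, -1, -1]]


def filter_response(patch, kernel):
    """Sum of the patch weighted by the kernel (both are 3x3 grids)."""
    return sum(patch[r][c] * kernel[r][c]
               for r in range(3) for c in range(3))


def preferred_orientation(patch):
    """Name of the filter that responds most strongly to the patch."""
    scores = {
        "vertical": filter_response(patch, VERTICAL),
        "horizontal": filter_response(patch, HORIZONTAL),
    }
    return max(scores, key=scores.get)


vertical_bar = [[0, 1, 0],
                [0, 1, 0],
                [0, 1, 0]]
horizontal_bar = [[0, 0, 0],
                  [1, 1, 1],
                  [0, 0, 0]]

print(preferred_orientation(vertical_bar))    # vertical
print(preferred_orientation(horizontal_bar))  # horizontal
```

A uniformly lit patch drives both filters to zero, so, like the cells described above, these filters respond to oriented structure rather than to overall brightness.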


Interestingly, this checkerboarded, columnar organization of V1 is extremely fuzzy at birth. The visual cortex of a newborn baby has a hypertrophy, or overgrowth, of haphazard connections which must be carefully pruned, based on visual experience, into crisply defined columns. It is actually a reduction in the number of connections, not an increase, that improves the infant's ability to see fine detail and to recognize shapes and patterns.

This type of activity-dependent refinement is not limited to V1, but occurs in many areas throughout the cerebral cortex. At the same time that the ability to discriminate lines and edges is improving in primary visual cortex, cells in secondary visual cortex, V2, are refining their ability to interpret colors. V2 is largely responsible for the phenomenon of color constancy, in which a red rose still looks red to us under many different colors of illumination. Color constancy is thought to occur because V2 can compare an object with the ambient illumination and subtract out the estimated illumination color; however, this process is strongly influenced by what color the viewer expects the object to be.
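
The idea of discounting an estimated illumination color has a simple computational analogue known as the gray-world heuristic. It guesses the color of the light from the average color of the whole scene and then rescales each channel to cancel that guess. Note that it divides the illuminant out rather than subtracting it. The sketch below is offered as an analogy under those assumptions, not as a claim about what V2 does; the scene colors, the reddish tint, and the function names are all invented for the example.

```python
def estimate_illuminant(pixels):
    """Guess the light's color as the average (R, G, B) of the scene."""
    n = len(pixels)
    return tuple(sum(p[ch] for p in pixels) / n for ch in range(3))


def gray_world_correct(pixels):
    """Rescale each channel so the scene average becomes neutral gray."""
    illuminant = estimate_illuminant(pixels)
    if min(illuminant) <= 0:
        raise ValueError("cannot correct a scene with an empty color channel")
    gray = sum(illuminant) / 3.0
    return [tuple(p[ch] * gray / illuminant[ch] for ch in range(3))
            for p in pixels]


# Illustrative scene under white light: a red, a green, a blue, and a gray patch.
white_scene = [(0.8, 0.2, 0.2), (0.2, 0.8, 0.2), (0.2, 0.2, 0.8), (0.4, 0.4, 0.4)]

# The same scene under a reddish light that halves green and blue.
tint = (1.0, 0.5, 0.5)
reddish_scene = [tuple(p[ch] * tint[ch] for ch in range(3)) for p in white_scene]

corrected = gray_world_correct(reddish_scene)

# Red-to-green ratio of the red patch: as lit, and after correction.
print(reddish_scene[0][0] / reddish_scene[0][1])
print(corrected[0][0] / corrected[0][1])
```

Under the reddish light the red patch looks about eight times as red as it is green; after correction the ratio returns to roughly four to one, the same as under white light. The heuristic fails when the scene's true average is not gray, which loosely parallels the observation that color constancy depends on what the viewer expects to see.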

In fact, almost all higher order features of vision are influenced by expectations based on past experience. This characteristic extends to color and form perception in V3 and V4, to face and object recognition in the inferior temporal lobe, and to motion and spatial awareness in the parietal lobe. Although such influences occasionally allow the brain to be fooled into misperception, as is the case with optical illusions, they also give us the ability to see and respond to the visual world very quickly. From the detection of light and dark in the retina, to the abstraction of lines and edges in V1, to the interpretation of objects and their spatial relationships in higher visual areas, each task in visual perception illustrates the efficiency and strength of the human visual system.