5.2 Images: Information without Words or Numbers

Images play a fundamental role in the representation, storage, and transmission of important information throughout our professional and personal lives. In many professions, including publishing, art, film making, architecture, and medicine, it is crucial to be able to represent and manipulate information in image form. Furthermore, with the development of multimedia technology and virtual reality, many other professions are beginning to explore the power of representing information in visual form. In Chapter 3, we introduced the ideas behind binary representation of information, and in particular showed how integers and text can be converted into binary form. We also mentioned that other types of information can be represented by bits, and briefly described the process one might use to convert an image into binary digits. We then suggested how this would extend to representation of time-varying imagery, or video.

5.3 Cameras and Image Formation

As mentioned in the introduction to this book, the film-based camera is over 150 years old. Recent advances have provided a variety of alternatives to the use of conventional film, but the basic image formation process has not changed. This process may be familiar to you from experience with basic optics, and is illustrated in Figure 5.1.
The essential components of this system are: the object or scene to be imaged, the lens, and the image recording medium (retina of the eye, film, or other device).
The image recording medium is usually located in a plane parallel to the lens, known as the image plane. Note that the image that is formed is inverted; this is usually of no consequence because the display device may easily correct this condition. The resulting image represents a projection from the three-dimensional object world to the two-dimensional image world. The focal length specifies the distance from the lens to the image plane.
More useful to us, it also indicates the degree of magnification of the lens. From 35 mm photography, we know that a lens of 50 mm focal length is considered “normal” (in the sense that the resulting photo will contain the same expanse of image that a human would see from the same point as the camera); one of 28 mm focal length is “wide angle,” and one of 135 mm focal length is “telephoto.” For a different film (image) size, those focal lengths would change, but the principle remains the same.

Figure 5.1: The operation of an imaging system based upon projection by a lens of a scene onto an imaging plane.
While we will not delve into the details here, it is important to understand that there is a precise mathematical relationship between the location of each point in the image and the corresponding points in the real world. So for example, if we have an aerial photograph of farmland and we know the altitude of the camera, we can calculate the area of each field or other object in the image. Similarly, if we have X-ray images of the heart as it pumps, we can determine the cross-sectional areas of the ventricles, and hence their pumping efficiencies. In robotics, a video camera may be used to determine precisely the location of a robot arm with respect to the work, and hence provide motion guidance.
This process of reducing the dimensionality of the information (from three dimensions to two in photography) is referred to as projection and is fundamentally a mathematical concept. Inevitably, information is irretrievably discarded when a projection is made, as is reflected in the fact that we are losing a dimension, in this case depth. Also, there are many different types of projections that may be made in each situation. For cameras (including video and film cameras and our eyes), the image formation system is referred to as “perspective projection,” the best-known characteristic of which is that images of objects become smaller as the objects become farther from the camera. This is the effect that causes railroad tracks to appear to converge as they become farther from the camera.
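The shrinking of distant objects follows from similar triangles. The sketch below assumes an idealized pinhole model with the image plane one focal length behind the lens, as in the description above. The function names and the sample numbers (a 2 m object, a 50 mm lens) are illustrative assumptions, not values from the text.

```python
def project_point(x, y, z, focal_length):
    """Project a 3-D point onto the image plane of an idealized pinhole camera.

    Coordinates are in the camera frame, with z the distance along the optical
    axis. All lengths must use the same unit.
    """
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    # Similar triangles: image coordinate = focal_length * world coordinate / depth.
    # The minus signs reproduce the inversion of the image.
    return (-focal_length * x / z, -focal_length * y / z)


def image_height(object_height, distance, focal_length):
    """Size of the image of an object that is `distance` away from the lens."""
    return focal_length * object_height / distance


# A 2 m tall object seen through a 50 mm (0.05 m) lens at increasing distances.
for distance_m in (5.0, 10.0, 20.0):
    print(distance_m, image_height(2.0, distance_m, 0.05))
```

Each doubling of the distance halves the image size, which is the behavior that makes parallel railroad tracks appear to converge. Depth itself does not appear in the result, so it cannot be recovered from one image.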
From geography, you may be familiar with the “Mercator projection” for transforming the three-dimensional earth onto a two-dimensional map, and the resulting distortions. Each of our eyes can be thought of as a camera that records a two-dimensional view of the three-dimensional world in front of us. Because our two eyes are some distance apart, the views from each eye are slightly different. Our brains merge these two images to recreate a three-dimensional view of the scene.
(Try closing one eye and see how the world seems to flatten!) Similarly, two different camera-acquired images of the same scene can be used to reconstruct the third dimension, which would be lost if only one image were used.

5.4 Human Visual Discrimination and Acuity

What are the limitations of the human eye’s ability to discern gray levels and spatial resolution? Such limitations have long been the subject of multidisciplinary studies by physicists, physiologists, and psychologists. It has been determined experimentally that the minimum discernible difference in gray level, called contrast sensitivity, is about 2% of full brightness [5.1]. Hence, a gray scale image need only have about 50 levels of gray (approximately what you saw in the 6-bit gray scale images above) to meet the needs of apparently continuous gray scale representation. However, if a small section of that image were to be cut out and brightly illuminated, we might again see edges between the pixels because the definition of full scale for the new image has been changed by removing a portion of the image.

Figure 5.2: Depiction of the geometry of the viewing angle of the eye.

In describing the ability of the eye to resolve fine detail, we generally speak in terms of “lines per degree of visual arc.” This complex statement simply refers to the fact that as an image is brought closer to our eye, we can resolve more detail, until it becomes too close for our eye to focus clearly. In bright illumination, the adult, visually unimpaired human can resolve approximately 60 lines per degree of visual arc [5.2]. By visual arc, we mean the angle covered by the area being viewed at the apparent focal point of the eye, as shown in Figure 5.2.
Thus, if more than 60 lines are crowded side-by-side (with white space between them with the same width as the lines) into a single subtended degree of viewed space, they will appear to merge into a single gray mass to the human viewer. This is a good point at which to clarify potential confusion in the use of the terms “lines” and “pixels.” Note that to form a line out of pixels, we need to arrange a string of black pixels parallel to a string of white pixels. The eye will discern this black-white transition as a line. Hence, one line requires two strings of pixels. Therefore it is equivalent to say that the human eye can individually discern 120 pixels per subtended degree of visual arc, or 60 lines per degree. This spatial resolution limit of human visual acuity derives from the fact that the color-sensing cone cells that are concentrated at the center of your retina are packed approximately 120 across a distance of 290 micrometers of your retina.
Also, one degree of a scene is displayed across 290 micrometers of the retina by the eye’s lens. Hence, human visual acuity is determined by the digitization of the analog image on your retina! To make this more concrete, consider the act of holding a common 8.5 by 11 inch piece of paper one foot in front of your face with the longer dimension held horizontally (this is often called “landscape orientation”). We can apply some geometry and determine the subtended angle in both directions. These results in the horizontal and vertical directions are the specific values for the arc marked “angle” in Figure 5.2. The horizontal angle turns out to be 49.25 degrees, and the vertical angle is 39 degrees. Thus, if the paper were crowded with $39 \times 60 = 2340$ horizontal lines, a person with normal (20-20) vision would just be able to discern the individual lines on the paper. Similarly, if the paper were crowded with $49.25 \times 60 = 2955$ vertical lines, they would be just discernible.
Recalling that two pixels are required to make the black-white transition of a line, these resolutions correspond to $4680 \times 5910$, or 27,658,800 pixels per page. This number of pixels would be sufficient to represent any image on the $8.5 \times 11$ inch page with no visible degradation compared to a perfect image at a distance of one foot. When dealing with printers we often quote the resolution in terms of dots per inch, which in our example would correspond to about 537 to 551 pixels per inch ($5910 / 11$ and $4680 / 8.5$). Using the numbers above, we have seen that 550 dots per inch would be sufficient to fool any eye into thinking a picture was rendered without visible pixels if the paper were held at distances of a foot or greater.
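The geometry behind these figures can be checked with a few lines of Python. This is a minimal sketch that assumes the page is centered on the line of sight and uses the 60 lines per degree acuity figure quoted above; the function and variable names are illustrative.

```python
import math

LINES_PER_DEGREE = 60   # acuity figure quoted in the text
PIXELS_PER_LINE = 2     # one string of black pixels plus one string of white pixels


def subtended_angle_deg(extent, distance):
    """Angle in degrees subtended by a centered object of size `extent` at `distance`."""
    return math.degrees(2.0 * math.atan((extent / 2.0) / distance))


def pixels_needed(extent, distance):
    """Number of pixels along one dimension that the eye could just resolve."""
    return subtended_angle_deg(extent, distance) * LINES_PER_DEGREE * PIXELS_PER_LINE


# An 8.5 by 11 inch page in landscape orientation, held 12 inches away.
width_in, height_in, distance_in = 11.0, 8.5, 12.0
h_angle = subtended_angle_deg(width_in, distance_in)
v_angle = subtended_angle_deg(height_in, distance_in)
h_pixels = pixels_needed(width_in, distance_in)
v_pixels = pixels_needed(height_in, distance_in)

print("angles (degrees):", h_angle, v_angle)
print("pixels:", h_pixels, v_pixels, "total:", h_pixels * v_pixels)
print("pixels per inch:", h_pixels / width_in, v_pixels / height_in)
```

The computed angles agree with the 49.25 and 39 degrees given above to within rounding, and the pixel densities land in the mid-500s per inch.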
This explains the long-term popularity of 600 dots per inch (dpi) laser and ink jet printers. Of course, by holding the paper closer, we can get greater perception of resolution; this explains the utility of printers with even higher resolutions than 600 dpi. How close we will hold a picture for viewing is, of course, a consideration in judgments about sufficient resolution. We call the shortest distance at which a person can focus on an object that person’s accommodation distance. This distance typically varies with age from about 7 cm (2.75 inches) at age 10 to about 200 cm (78 inches) at age 65. At age 47, the accommodation is approximately the 12 inches used in the above example, and anything better than about 600 dpi resolution is wasted on the unaided eye. The student at 17 years of age with an accommodation of 9 cm (3.5 inches), on the other hand, feels cheated with anything less than a printer capable of 1200 dpi, motivating the use of 1200 dpi and greater printing resolutions in the magazine industry.

5.5 Other Types of Image Formation

Lens-based cameras are not the only means by which images may be formed from the real world.
Several other examples of image formation systems include radar, sonar, X-rays, and tomography (“CAT scans”).
These systems differ from traditional cameras in two ways: (1) the type of energy used to form the image (instead of visible light, they use radio waves, sound waves, X-rays, or radio emissions of nuclei under the influence of a magnetic field); and (2) the geometry of the system that relates the locations of the objects in the real world (three-dimensional) to the image world (two-dimensional).
A type of image that we are all at least somewhat familiar with (from TV weather forecasts) is the radar image. “Radar” stands for “Radio Detection and Ranging,” and is an excellent example of an imaging system that is fundamentally different in several ways from normal photography. These differences include:

• The type of energy used to form the image (radio waves vs. light waves);
• The fact that the illumination must be supplied by the imaging system, rather than the surrounding ambient conditions. That is, cameras and human eyes operate with visible light; radar, however, must supply its own “illumination” using radio waves;
• The geometry of the image (based on polar coordinates rather than rectangular coordinates). The image is formed by rays emanating from the center of the image, corresponding to the radar location; and
• The fact that the radar site (camera) is located in the image plane rather than perpendicular to the image plane and some distance away. This is convenient because it means, for example, that to get a radar image of several hundred square miles of the earth’s surface, we don’t have to take a camera an equivalent distance above the earth; we just position the radar on the surface at the center of the area.

Radar operates by sending a narrow beam of radio waves in a particular direction (like a searchlight) and waiting for reception of some reflected energy (from rain in the weather radar example).
This is illustrated in Figure 5.3. Based on the speed of light, the distance to the reflector can be calculated by measuring how long it takes for the energy to bounce off of the imaged object and return to the sensor. The other dimension is the known angle of transmission. This gives us the two dimensions of a polar coordinate system (each point can be associated with a particular distance and set of angles).
A dot is placed on the radar image at the point corresponding to that distance and angle from the radar site.
The intensity of the dot corresponds to the intensity of the received reflection (the intensity of the rain in our example).
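The placement of one dot on the radar image can be sketched as follows. The round-trip time is halved because the pulse travels to the reflector and back. The bearing convention (degrees clockwise from north) and all function names are assumptions made for illustration.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # meters per second


def echo_range(round_trip_seconds):
    """Distance to the reflector; the pulse travels out and back, so halve the path."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0


def polar_to_image(range_m, bearing_deg):
    """Convert (range, bearing) to (east, north) offsets from the radar site.

    Assumes the bearing is measured clockwise from north, like a compass.
    """
    theta = math.radians(bearing_deg)
    return (range_m * math.sin(theta), range_m * math.cos(theta))


def plot_echo(round_trip_seconds, bearing_deg, intensity):
    """Return the (east, north, intensity) dot for one received reflection."""
    east, north = polar_to_image(echo_range(round_trip_seconds), bearing_deg)
    return (east, north, intensity)


# An echo received 1 millisecond after transmission, with the antenna pointing east.
print(plot_echo(1e-3, 90.0, 0.7))
```

Sweeping the bearing through a full circle and plotting every echo this way builds up the image, with the radar site at the origin.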
Medical ultrasound systems operate very similarly to radar, but they use sound waves rather than radio waves.

Figure 5.3: Diagram of a radar system. The antenna is like a rotating searchlight, sending rays of radio energy at successive angles around the compass.
Wherever the searchlight beam strikes an object, energy bounces back to the radar, and is plotted on the image at the appropriate angle and distance from the antenna. The antenna is located at the center of the radar image.

One more type of imaging system that will be mentioned briefly is holography. This system is unique in that it captures three-dimensional information from the original scene. It requires illumination from a special (laser) source, and a rather complex optical setup. The viewing requirements for a single holographic image are modest; recent advances in viewing technologies allow video holograms, color holograms, and presentation of holograms to groups of people.
5.6 Converting Images to Bits

We live in an analog world, but almost all image processing is now performed digitally. Hence, we need to understand the methods for converting images to digital format, and back again. We also need to understand the implications of the various ways in which these conversions may be done, which inevitably involve approximations. To develop a binary representation of an image, we will want to determine what approximations we will be making. This depends on the nature of the image we are starting with, on our capability for converting, storing, and transmitting the image, and on the use for which the image is intended.
Consideration of these issues requires making decisions, or trade-offs, based on our desire for precision, our need to stay within the limitations of our equipment or budget, and the practical matter of how precisely we need to represent the image. First, we will discuss the general issue of digitization of information, and then we’ll specifically consider the digitization of images of different kinds.

5.6.1 From Continuous Information to a Discrete Representation

In Chapter 3, we discussed how different types of information could be represented by binary digits.
The process of representing information in binary form is sometimes known as digitization. In its most general form, this means the process by which anything and everything in our analog world can be converted into numbers for computer processing and storage. By continuous information, we mean a quantity that can take on the infinite number of possible values that belong to a continuum. Discrete information, on the other hand, implies that the quantity can assume only a finite number of values (at finite instances in time or finite locations in space).
Much of what we measure in the world is continuous, whether or not our measurements are capable of representing continuous information. By definition, binary digits are discrete, because they can take on only two values.
Furthermore, only discrete information can be perfectly represented using binary digits. This means that whenever we use bits to represent some continuous quantity, we are making an approximation and introducing some error by ”throwing away” information. We must convert information from continuous form to discrete form as the first step in digitizing the information. Such a conversion, also known as an analog to digital conversion, necessarily requires some loss of information and precision.
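The loss of precision in an analog to digital conversion can be made concrete with a small sketch. It assumes evenly spaced levels over a known range and rounding to the nearest level; the function name, the 0 to 100 range, and the sample reading are hypothetical.

```python
def quantize(value, low, high, bits):
    """Map a continuous value in [low, high] to the nearest of 2**bits evenly spaced levels.

    Returns (code, reconstructed_value), where code is the integer that would be
    stored and reconstructed_value is the approximation it stands for.
    """
    levels = 2 ** bits
    step = (high - low) / (levels - 1)
    clipped = min(max(value, low), high)   # values outside the range saturate
    code = round((clipped - low) / step)
    return code, low + code * step


temperature = 21.37  # hypothetical reading in degrees Celsius
for bits in (2, 4, 8, 16):
    code, approx = quantize(temperature, 0.0, 100.0, bits)
    print(bits, "bits -> code", code, "value", approx, "error", abs(approx - temperature))
```

The error is never more than half the spacing between levels, so each additional bit roughly halves the worst-case error. It reaches zero only when the value happens to fall exactly on a level.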
Examples of continuous information include measurements of temperature, distance, voltage, pressure, speed, volume, and other quantities that take on a continuum of values, or an infinite number of possible values within some finite range. As we have discussed, we would require an infinite number of bits to perfectly represent any such quantity, because representation of the precise value would require infinite precision. In practice, however, we can satisfy our need for any level of precision by using enough bits to provide a sufficiently precise approximation of the continuous quantity. As we have mentioned, real-world applications do not require infinite precision; in practice, we have no use for it. Another example of continuous information would be an image of a scene. For an example, let us consider a so-called black-and-white photograph of a scene, which we wish to digitize and store.
(Note that by a black-and-white photograph, we usually mean a photo with more than just black and white in it, and are really referring to the fact that shades of gray from black to white appear.) This example is actually continuous in two senses: the brightness may take on an infinite range of values at each point in the image, and the image contains an infinite (continuous) number of points. By brightness, we refer to the fact that at every point in the photo, there is some value that could describe the shade of gray at that point. These shades of gray, known as gray levels, range from black to white and are continuous: there are an infinite number of them. The actual value that we could associate with each gray level would be a measurement of how much light that shade reflects. Further, this photo is also continuous in space, because there is a different gray level at each point and, assuming a perfect camera, an infinite number of such distinguishable points distributed in the photo.
So, we have an infinite number of locations, each of which can take on one of an infinite number of gray levels. How can we develop a binary representation of such a photo? The answer is that we must perform two processes, each of which provides an approximation to transform a continuous quantity into a discrete quantity. First, we must reduce the spatial resolution of the image from a continuous area representation into a finite number of small picture elements, or pixels, each representing small areas rather than infinitesimal points within a spatial continuum. Then, we must convert the brightness level corresponding to each pixel into a code representing the approximate gray level.

5.6.2 Pixels: A Matter of Spatial Resolution

Each pixel in an image corresponds to a small area (usually, but not always, square) of that image. Each pixel represents a single intensity (brightness) level. Ideally, we would choose this pixel area to be small enough so that when these pixels are put next to each other on a display, they present a pleasing representation of the original analog image to the viewer. This process of breaking a continuous image into a grid of pixels is sometimes called sampling, scanning, or spatial quantization. Figure 5.4 shows a pixelized image and an inset that allows us to see the pixels.

Figure 5.4: A picture of a flower bed, represented as a collection of tiny pixels, and a close-up view of a small section of that flower. The close-up view reveals the pixels, which were small enough not to be noticed before.
The definition of “pleasing” is driven by the use to which the picture will be put. For an artist or an intelligence analyst, the sampled picture would have to have lost no content so far as the human eye with its limitations can detect. To the Internet user, the idea of pleasing often centers around conveying the content with a minimum of download time, which means using as few pixels as will keep the picture recognizable. From the point of view of this book, the key is to determine the desired information that the digitized image is to convey, and then to choose the appropriate digitization procedure. Pixels are usually arranged in a rectangular grid, as shown in Figure 5.5. In this figure we have artificially introduced a black border around the pixels so the individual pixels stand out. This picture consists of 169 pixels arranged in a $13 \times 13$ grid. Other spatial arrangements and shapes of pixels are sometimes used as well, such as concentric circles of truncated wedge-shaped pixels in radar imaging, but this grid arrangement of square pixels is by far the most common. One way to create a pixelized image is to photograph a scene with a digitizing camera. There are many different types of these cameras; most of them have an array of light-sensitive devices, each of which is responsible for measuring the brightness at a single pixel’s location.
Another way to create a pixelized image is to put an image through a digitizing device, such as a scanner.

Figure 5.5: A rectangular grid of pixels with black borders clearly delineating pixel boundaries.

Let’s return now to our example of the black-and-white photograph we wish to digitize for storage. How do we determine the number of pixels to use? Given an image of fixed size, the spatial resolution of the image, which affects our perception of the quality or fidelity of the image, is dependent on the number of pixels in the image. If we use too few pixels, then the image appears “coarse” or “blocky,” and the effects of pixelization are apparent. Figures 5.6 through 5.9 show the same image for four different resolutions. The image in Figure 5.6 uses a $256 \times 256$ grid of pixels, for a total of 65,536 pixels. Figures 5.7, 5.8, and 5.9 show the same image with resolutions of $128 \times 128$ (16,384 pixels), $64 \times 64$ (4096 pixels), and $16 \times 16$ (256 pixels), respectively.

Figure 5.6: A 256 by 256 pixel image.
Figure 5.7: A 128 by 128 pixel image.
Figure 5.8: A 64 by 64 pixel image.
Figure 5.9: A 16 by 16 pixel image.

Each image in this sequence uses significantly fewer pixels than the previous version. This, of course, directly determines how much data storage or transmission will be required for the digitized versions of these images. If we know our capacity for storage and have decided what degree of image quality we wish to retain, it would seem that we might be ready to make a decision on how to digitize and store the photo. But the amount of retained information is also dependent on how the brightness level at each pixel is digitized into a gray level value.
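The effect of coarser spatial sampling on the pixel count can be sketched in code. Averaging blocks of an already-digital image is used here as a stand-in for sampling the original scene with larger pixels; that choice, the synthetic gradient image, and the function name are assumptions for illustration only.

```python
def downsample(image, factor):
    """Reduce spatial resolution by averaging non-overlapping factor x factor blocks.

    `image` is a list of equal-length rows of brightness values. Both dimensions
    must be divisible by `factor`.
    """
    rows, cols = len(image), len(image[0])
    if rows % factor or cols % factor:
        raise ValueError("image dimensions must be divisible by factor")
    result = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            # Each output pixel takes the mean brightness of the area it covers.
            block = [image[r + i][c + j] for i in range(factor) for j in range(factor)]
            out_row.append(sum(block) / len(block))
        result.append(out_row)
    return result


# A synthetic 256 by 256 image: a diagonal gradient from black (0.0) to white (1.0).
image = [[(r + c) / 510.0 for c in range(256)] for r in range(256)]

for factor in (1, 2, 4, 16):
    small = downsample(image, factor)
    print(len(small), "by", len(small[0]), "=", len(small) * len(small[0]), "pixels")
```

The four grids produced have the same pixel counts as the sequence of figures: 65,536, then 16,384, 4096, and 256.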
5.6.3 Shades of Gray

Let us continue with our example. Each pixel must be converted into binary data. To do this, we first determine the number of brightness levels that we wish to represent. For example, if we wish to use 8 bits to represent the brightness at each pixel, we would have 256 brightness levels (recall that a binary number with 8 bit positions may take on 256 different values, or equivalently that 2 raised to the power 8 equals 256), which we generally evenly distribute between pure black and brightest white.
Then, each pixel would be associated with an 8-bit number corresponding to whichever of the 256 brightness levels is closest to the actual analog image brightness at that location in the image. This process is known as quantization; in fact, any time a continuous quantity is “rounded off” for purposes of digitization, the term quantization can be used to describe the process. Then, we can use binary digits to represent the quantized gray levels, and store each pixel as a collection of bits. As usual, we will want to have a number of gray levels that is a power of two, so that a given word length can be used to represent as many different gray levels as possible. Recall the images in Figures 5.6 through 5.9. Although you were not told this, each of the pixels in these images had gray levels represented by a 6-bit word. This means that a total of $2^6 = 64$ possible gray levels were used to generate all four images. Each pixel was quantized and stored as six binary digits of memory, using codes ranging from black (000000) to white (111111). In Figure 5.10, this photograph is shown, again using a 6-bit gray-scale resolution. In Figures 5.10 through 5.12, the same photo is shown at the same spatial resolution, but with 6 bits (for 64 gray levels), 3 bits (8 gray levels), and 1 bit (2 gray levels) used to represent and store each pixel. You can see that the effect of differing gray-scale resolutions on image quality is clearly discernible, but is of a different nature than the effect of differing spatial resolutions.

Figure 5.10: A 6-bit (64 gray levels) image.
Figure 5.11: A 3-bit (8 gray levels) image.
Figure 5.12: A 1-bit (black and white only) image.
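A minimal sketch of gray-level coding at the three bit depths shown in the figures follows. It assumes brightness is expressed as a fraction from 0.0 (black) to 1.0 (white) with evenly spaced levels; the function names and the sample row of pixels are illustrative.

```python
def quantize_gray(brightness, bits):
    """Return the integer gray-level code for a brightness in [0.0, 1.0]."""
    levels = 2 ** bits
    clipped = min(max(brightness, 0.0), 1.0)
    return round(clipped * (levels - 1))


def gray_code(brightness, bits):
    """Return the stored binary word, for example '000000' for black at 6 bits."""
    return format(quantize_gray(brightness, bits), "0{}b".format(bits))


# One row of pixels forming a smooth ramp from black to white.
row = [i / 15 for i in range(16)]

for bits in (6, 3, 1):
    codes = [gray_code(b, bits) for b in row]
    print(bits, "bits:", len(set(codes)), "distinct levels used;", codes[0], "...", codes[-1])
```

With fewer bits, neighboring pixels of the ramp collapse onto the same code. That collapse is the banding visible in the lower bit-depth figures, and it is distinct from the blockiness caused by using fewer pixels.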
Therefore, to determine how much storage will be required for our photo, we must choose both a spatial resolution, which sets the number of pixels, and a brightness resolution, which sets the number of bits used to represent each pixel. The total number of bits required to store this image (directly) is just the product of the total number of pixels and the number of bits used per pixel. For our example, if we choose to represent the photo by an array of $64 \times 64$ pixels, and decide that 32 gray levels (5 bits) is enough, then our image can be stored using $64 \times 64 \times 5 = 20{,}480$ bits, or 2560 bytes, or 2.5 kB. Note, however, that depending on how the storage system we use represents integers, we may wind up using a full byte, or even two bytes, for each pixel, even if we are only using 5 or fewer bits per pixel.
This is due to the standard formats for data representation as described in Chapter 4. In later chapters, we will discuss the implications of this for different applications, and present some techniques for reducing the amount of storage required without adversely affecting the image quality. It is important to note again that the resolution appropriate for a digitized image depends on the use for which it is intended. Images that are meant to be viewed by human beings do not require more resolution than the limits of human visual acuity can appreciate. Or, in some cases, the means of reproducing the image sets the underlying limit of picture fidelity. We encounter such reproduction limitations, for example, in the presentation of an image on a television or by printing on paper using a given laser printer.
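The storage arithmetic from the 32-gray-level example can be written out as a short sketch. A 64 by 64 pixel array is assumed here because it is the size consistent with the 2560-byte total quoted above; the function names are illustrative.

```python
def bits_for_levels(gray_levels):
    """Smallest number of bits that can distinguish `gray_levels` different values."""
    return (gray_levels - 1).bit_length()


def packed_storage_bits(width, height, bits_per_pixel):
    """Bits needed when pixels are stored back to back with no padding."""
    return width * height * bits_per_pixel


def padded_storage_bytes(width, height, bits_per_pixel):
    """Bytes needed when every pixel is rounded up to a whole number of bytes."""
    bytes_per_pixel = (bits_per_pixel + 7) // 8
    return width * height * bytes_per_pixel


bits = bits_for_levels(32)                    # 5 bits for 32 gray levels
packed = packed_storage_bits(64, 64, bits)    # pixels times bits per pixel
print(bits, packed, packed // 8, packed / 8 / 1024)
print(padded_storage_bytes(64, 64, bits))     # one full byte per pixel
```

Storing each 5-bit pixel in a full byte raises the total from 2560 to 4096 bytes. This is the overhead the text attributes to standard data formats.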
For example, the image in Figure 5.6, with $256 \times 256$ pixels and six bits per pixel for a total of 64 gray levels, was determined to be good enough for an image of this size in an application such as this textbook. Higher-quality representation might require more storage, yet would appear no different to an observer. On the other hand, if this image were to be stored for use by a computerized image analysis routine, there might be advantages to using more pixels, more bits per pixel, or both.

5.6.4 Color Representation

So far, we have considered only black-and-white images. How can we represent color images in binary form? Students of art may recall that any color can be created by adding the right proportions of red, green, and blue light. These should not be confused with the “subtractive primary colors” (magenta, yellow, and cyan) that are used when combining pigments. While other colors may be used, red, green, and blue are the standard colors which are mixed to form other colors of light. Each color can be created by these colors in the appropriate combination; thus, we can represent a color with three numbers indicating the amounts of red, green, and blue light that combine to produce that color. This system for specifying colors is known as the RGB system.
For example, 10 units of red, green, and blue will form white of a certain intensity. If we increase this to 20 units of red, green, and blue, we will still have a white light, but it will be more intense. Why does the eye perceive combinations of red, green, and blue as a full spectrum of colors? The unimpaired human (that is, someone who possesses “trichromatic vision” and does not suffer from a variety of color blindness ailments such as monochromatism or dichromatism) has three kinds of cells in the eye that are sensitive to different ranges of wavelengths of light and are used to distinguish color. The three values for red, green, and blue content in an RGB representation can produce a response by the eye like that of any other color because our eyes can only interpret a color from the three responses of the respective cells. Standard color televisions use tight clusters of red, green, and blue color sources to create the illusion of other colors.
Because these sources are small and closely spaced, the human eye cannot discern the individual components, and we just see the color combination as a single shade. However, standard color television systems do not use a completely digitized version of the image; each row of the display, or raster, is transmitted as continuous information for each of the color components. When we wish to digitize a color image, we must first spatially quantize the image into pixels, as we did for black-and-white imagery. Then, we must determine the RGB representation for each pixel. That is, we must determine the amount of red, green, and blue needed to represent the color at the pixel’s location. Finally, we must digitize these three numbers, to represent each value by a binary number of a predefined length.
Consider digitizing a particular pixel. Note that because we are representing these color components in binary form, we must approximate their contributions by a finite number of bits. If, for example, we use 3 bits for each color value, we would be able to represent 23 = 8 different intensity levels of red, of green, and of blue. This representation would require a total storage of 9 bits per pixel — three bits for each of the three colors. This would give us different possible color combinations.
In more mathematical terms, because we are using 9 bits, we should be able to represent 2^9, or 512, different colors. Another system used to represent color imagery is called HLS (hue, luminance, and saturation).
This system does not represent colors by combinations of other colors, but it still uses three numerical values. The hue of a pixel represents where its pure color component falls on a scale that extends across the visible light spectrum, from red to violet. The luminance of a pixel represents how bright or dark the pixel is. The saturation represents how “pure” the color is; that is, how much it is or is not diluted by the addition of white, with 100% indicating no dilution with white.
Thus, for example, pastel colors have saturation levels well below 100%. This set of three numbers, like RGB, is sufficient to represent any color. So, in an HLS system, the color of each pixel is again represented by three binary words, each of which represents a digitized value. The effects of variations in hue and saturation are depicted in Figure 5.13.

Figure 5.13: Diagram showing the colors corresponding to the full range of hue (along horizontal axis) and saturation values (0% saturation at the bottom and 100% saturation at the top).
Yet one more approach needs to be described to convey the way in which colors in pictures are coded in real systems. Suppose that only a few bits were going to be used to represent our images; we might choose to use only a small number of bits so as to speed transmission over a network, or to allow more images to be stored on a computer disk of a given size. If we decided to use nine bits as in our example above, we would be allowed a total of 512 colors. But, using exactly the scheme described above, this would be 512 specific colors that may not well represent the range of colors in our picture. For example, if our picture happened to be a reproduction of a “black and white” photograph, we might be quite disappointed with the result: the above scheme has a total of eight shades of gray available among its 512 colors.
Thus our picture would have a very coarse representation of the shades in our photograph. The representation of our photograph would, of course, be greatly enhanced if we had 512 shades of gray available to us. This kind of optimization of the use of the colors used to represent an image is actually used. Prior to representing an image in discrete colors, a software procedure can be used to scan the entire image and determine the most useful set of colors to have if we are limited to a specific number. Then, carrying our example forward for the 9-bit case, the total picture representation will consist of a list of 512 colors, each specified in perhaps 24 bits, 8 for each RGB color component.
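The following Python sketch illustrates the idea under simplifying assumptions. The helper names are hypothetical, and the "keep the most common colors" rule is only one simple stand-in for the scanning procedure mentioned above, which the text does not specify. The sketch also shows each pixel being stored as an index into the table.

```python
from collections import Counter


def fixed_scheme_grays(bits=3):
    """Grays in the fixed scheme: colors whose three channels are equal."""
    return [(v, v, v) for v in range(2 ** bits)]


def build_palette(pixels, max_colors=512):
    """Assumed selection rule: keep the most common colors in the image."""
    return [color for color, _ in Counter(pixels).most_common(max_colors)]


def nearest_index(color, palette):
    """Index of the palette entry closest to color (squared RGB distance)."""
    return min(
        range(len(palette)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(color, palette[i])),
    )


def storage_bits(num_pixels, palette_size=512, index_bits=9, entry_bits=24):
    """Bits for the color table plus one index per pixel."""
    return palette_size * entry_bits + num_pixels * index_bits


print(len(fixed_scheme_grays()))  # 8 shades of gray in the fixed scheme

# A tiny made-up "photograph" containing only grays (8 bits per component).
pixels = [(10, 10, 10), (10, 10, 10), (200, 200, 200), (90, 90, 90)]
palette = build_palette(pixels)
indexed = [nearest_index(p, palette) for p in pixels]
decoded = [palette[i] for i in indexed]
print(indexed, decoded == pixels)
```

Because the table is built from the picture itself, every gray in this small example is reproduced exactly, whereas the fixed scheme would have to round each one to the nearest of its eight grays.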
Then, the picture will be represented with each pixel given a 9-bit number, which is simply an index that may be used to look up the 24 bits of color information from the complete table. We call this a palette color representation, because the table describes a palette of colors from which the final picture is rendered.

5.6.5 Color Discrimination

Studies such as those described above for human spatial and luminance discrimination have also been conducted for color discrimination. The average person can discern about 100 saturated (that is, pure) colors from each other. When both luminance and hue are varied, one can discern about 6,000 variations of color intensity. Finally, about another 60 levels of saturation are discernible, for a grand total of approximately 360,000 recognizable colors. With appropriate encoding, we would expect to need 19 bits per pixel for full color representation, with the same caveats as before about the effects of examining cutouts of such an image in isolation.
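The 19-bit figure follows from finding the smallest number of bits whose codes cover every recognizable color. A minimal check in Python, with a helper name chosen here purely for illustration:

```python
def bits_needed(num_values):
    """Smallest number of bits b such that 2**b >= num_values."""
    return max(1, (num_values - 1).bit_length())


hue_luminance_variations = 6_000
saturation_levels = 60
recognizable_colors = hue_luminance_variations * saturation_levels

print(recognizable_colors)               # 360000
print(bits_needed(recognizable_colors))  # 19
```

Eighteen bits give only 262,144 codes, which is too few, while nineteen bits give 524,288.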
In practice, though, 8 bits per color, or 24 bits per pixel, are generally used for full color representation, for the sake of convenience. In Figures 5.14-5.17, we see a sequence in which the number of bits used to represent the color of each pixel is decreased progressively from 24 bits down to 4 bits.
For many natural scenes in which we have a good intuitive feel for the true appearance and abundance of color variations, the first reduction from 24 to 16 bits is noticeable in a comparison. As we drop to 8 bits, even without a comparison it is easy to discern the fact that the picture is no longer true in the sense of an accurately rendered scene.

Figure 5.14: An image with 24 bit color (8 bits each for red, green, and blue).
Figure 5.15: An image with 16 bit color resolution.
Figure 5.16: An image with 8 bit color resolution.
Figure 5.17: An image with 4 bit color resolution.

In Figure 5.18 we see the application of yet another means to better represent a picture with a limited number of colors. Here a process known as dithering has been applied. As is done with the color images in a newspaper, color dots from the available palette have been arranged in clusters to approximate a new color when the result is viewed from a sufficient distance. The dithering process produces a rather coarse-looking picture when viewed too closely, but the results are quite appealing from the appropriate distance. In particular, compare this picture at a good distance with the previous one, which used the same number of colors.
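To give a feel for how clusters of dots can stand in for a shade that is not in the palette, here is a minimal Python sketch. It is a simplification, not the method used to produce the figure: it works on gray values rather than color, uses a palette of only black and white, and applies ordered dithering with a 2 × 2 threshold pattern, which is one of several dithering techniques.

```python
# 2x2 threshold pattern; entry m becomes the threshold (m + 0.5) / 4.
BAYER_2X2 = [[0, 2],
             [3, 1]]


def ordered_dither(image):
    """Turn rows of gray values in 0.0..1.0 into rows of 0/1 pixels."""
    out = []
    for y, row in enumerate(image):
        out_row = []
        for x, value in enumerate(row):
            threshold = (BAYER_2X2[y % 2][x % 2] + 0.5) / 4
            out_row.append(1 if value > threshold else 0)
        out.append(out_row)
    return out


def mean(image):
    """Average pixel value, a stand-in for viewing from a distance."""
    return sum(sum(row) for row in image) / sum(len(row) for row in image)


flat_gray = [[0.5] * 4 for _ in range(4)]  # a uniform mid-gray patch
dithered = ordered_dither(flat_gray)
for row in dithered:
    print(row)
print(mean(dithered))  # 0.5
```

Every output pixel is pure black or pure white, yet the patch averages to the original mid-gray, which is what the eye perceives from far enough away.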
Figure 5.18: An image with 4 bit color resolution in which a dithering process has been used to obtain a better representation on average.

5.7 Binocular Vision and 3D displays

As humans, most of us see the world through two eyes. This binocular (or stereo, or 3D) vision provides us with additional visual information referred to as depth perception.
It is in fact a misnomer to refer to this type of vision as 3D, because only very limited information in the third dimension is available. This limited information is sufficient to do two things: it makes the scene look more “real” to the viewer, and it provides some specific information as to the location of objects in the front-to-back direction. Figure 5.19 illustrates the two images forming a binocular vision pair. Of course, in normal vision your eyes see not two separate images, but the fused image with the depth dimension added. We are used to having this additional dimension in our vision, and two-dimensional displays eliminate this additional information.
Hence, development of three-dimensional displays has been an active research area for many years. The principle behind stereo vision displays is simple: different images must be presented to each eye, corresponding to the two images that the eyes would form from their disparate positions in the human head. This is easy to accomplish for one viewer with still images, as illustrated by the old-fashioned stereoscope in which one photograph was simply held in place in front of each eye. For multiple viewers and moving images, the situation becomes more complicated.

Figure 5.19: Example of images produced by binocular (stereo) vision with camera separation equivalent to the spacing of the human eyes. Some people can cross their eyes in such a way as to fuse these two pictures in their minds into a single stereoscopic scene. The role of stereo displays is to produce this without effort, talent, or eye strain.

The original 3D movies that were shown in theaters made use of color to isolate the images. The images that one eye was meant to see were presented in blue, while those for the other eye were presented in red. The audience wore plastic glasses that placed a blue filter over one eye and a red filter over the other. While a true sensation of 3D images was produced, the color of these images was something like black-and-white with disturbing tinges of blue and red at the edges.
That is, all color realism was abandoned to achieve depth realism. New systems for 3D display make use of optical polarization to separate the images. Polarization is a property of light that may go unnoticed in your day-to-day experience. Briefly, in addition to being able to break up white light into colors, we can also break light up into two components known as polarizations. There are actually two ways to do that, but we will dwell upon the one that is easier to explain: linear polarization components. Special filters can be used to produce light that has only a vertical component, and other filters can produce light with only a horizontal component.
Each of these streams of light would then pass again through a filter of the same type or be totally blocked by a filter of the opposite type. Because any color of light has these two possible components, we can use such filters to selectively pass or block entire color images. Thus, if we project the two stereo images in color, but with each projected by a different polarization of light, then the audience wearing polarized glasses (opposite filter types over each of their eyes) would perceive a 3D display. Yet another approach makes use of what are known as shutter glasses, such as those shown in Figure 5.20. In this case, the two images that we wish to present to the viewer’s eyes are presented alternately.
The viewer wears a pair of glasses that have electronic shutters in front of each eye that allow each image in the sequence to be seen only by the appropriate eye. The states of the shutters in the glasses are of course synchronized with the presentation sequence; this is usually done via an additional signal from the display or a wired connection to the display electronics. The mechanism behind the operation of the electronic shutters is typically the same liquid crystal technology that is used to draw numbers on the face of a calculator display.

Figure 5.20: Stereoscopic shutter glasses are shown being used to view a 3D display of terrain. The operation of the electronic shutters in this case is synchronized with images shown on the computer screen via an infra-red link (note the small box on top of the computer monitor) between the computer and the glasses.
5.8 From Images to Video

Most people have seen old films of live or animated sequences, in which the choppy motion makes clear that the viewer is actually looking at a sequence of still images. Indeed, everything which we call “video,” including television, movies, and computer graphics, consists of a series of still images that are displayed so as to appear in continuous motion to the human eye. The only difference between your latest laser disk movie and “The Birth of a Nation” is how well the eye is fooled into thinking it is viewing actual continuous motion.

5.8.1 Human Visual Persistence

If you look at a well-lit scene and then close your eyes, you will notice that the image can still be sensed for some time after the eyes close. This is due to the amount of time that the retina retains some of the information with which it has been stimulated. This phenomenon, which places limits on how fast our visual system can react to changes, is known as visual persistence or visual latency. Simply put, our visual system has a slow response to change in stimulus. We can take advantage of this to develop techniques for digital video systems.
Although an image on the retina decays gradually, rather than lasting a specific amount of time, there is a critical period during which the stimulus changes so little that the visual system cannot take in any new information even if the eyes are open. This period, on average, is about 50 milliseconds, or one twentieth of a second. Thus, the average human visual system can only take in about 20 different images per second before they begin to blur together. If these images are sufficiently similar, then the blurring which takes place appears to the eye to resemble motion, in the same way we discern it when an object moves smoothly in the real world. Another way to measure visual persistence is to determine the number of flashes of light per second that would appear to be a continuous, flicker-free illumination.
Studies show that this varies as a function of the intensity of the flashes, but that almost no flashing is evident above 50 flashes per second, and perception even for the brightest of lights disappears for rates above 80 flashes per second. In fact, if your eyes had a faster response than this, you might find it quite annoying, since the 60 Hz electrical system used in the United States causes electric lights to flicker at a rate of 120 times per second. Because of human visual latency, we only see continuous light.
In Europe, the power fluctuates at 50 Hz and hence illumination pulses at 100 flashes per second; they are living closer to the edge of perception there. The above phenomenon has been used since the beginning of the 20th century to produce “moving pictures,” or movies. Thomas Edison, the inventor of the motion picture camera, needed to balance the needs of human perception with the desire to minimize the amount (hence cost) of film that needed to be taken. He determined experimentally that 10 frames per second sufficed to provide the illusion of continuous motion (just barely).
He also determined that viewers were quite annoyed by 10 flashes per second caused by the shutter opening and closing to accommodate motion of the film between looks.
This phenomenon was addressed simply by having the shutter open and close three times for each single motion of the film, producing a 60-flashes-per-second presentation. This higher flash rate was tolerated nicely by humans. Soon after, the frame rate was increased by the motion picture industry to provide a more pleasing 24 frames per second to provide an improved illusion of continuous motion. Again the phenomenon was addressed by having the shutter open and close, this time twice for each single motion of the film, producing a 48-flashes-per-second presentation. This rate is still used for motion pictures. Television, interestingly enough, displays 30 new images per second, but suffers from the same flash phenomenon if simply presented.
This phenomenon is also addressed by presenting the images twice per frame, in a sense. The way this is accomplished is that 60 times per second, every other line or raster is changed. Each new image is painted onto the screen in a two-step process — first the odd rows, then the even ones — so that at every point on the screen, things are locally changing at a rate of 60 times per second. In this way we do not discern the choppiness we would see if the image were refreshed all at once 30 times per second. This same phenomenon can be used to create digitized video — a video signal stored in binary form. We have already discussed how individual images are digitized; digital video simply consists of a sequence of digitized still images, displayed at a rate sufficiently high to appear as continuous motion to the human visual system.
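The two-step painting of a television image described above can be sketched as follows. The function names and the representation of a frame as a list of rows are assumptions made for illustration only.

```python
def split_fields(frame):
    """Split a frame (a list of rows) into odd and even fields.

    Rows are counted from 1, so the odd field holds rows 1, 3, 5, ...
    """
    odd_field = frame[0::2]
    even_field = frame[1::2]
    return odd_field, even_field


def paint_field(screen, field, first_row):
    """Overwrite every other row of the screen, starting at first_row."""
    for i, row in enumerate(field):
        screen[first_row + 2 * i] = row
    return screen


frames_per_second = 30
fields_per_second = frames_per_second * 2  # 60 partial updates per second

old_frame = ["a1", "a2", "a3", "a4"]
new_frame = ["b1", "b2", "b3", "b4"]
screen = list(old_frame)

odd, even = split_fields(new_frame)
paint_field(screen, odd, 0)
print(screen)  # ['b1', 'a2', 'b3', 'a4']
paint_field(screen, even, 1)
print(screen)  # ['b1', 'b2', 'b3', 'b4']
```

After the first step the screen holds a mixture of the old and new images, and only after the second step is the new image complete, so some part of every neighborhood on the screen changes 60 times per second.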
The individual images are obtained by a digital camera that acquires a new image at a fast enough rate (say, 60 times per second) to create a time-sampled version of the scene in motion. Because of human visual latency, these samples at certain instants in time are sufficient to capture all of the information that we are capable of taking in! When we discuss digitization of audio signals, we’ll go into more depth about the idea of sampling a quantity at different times.

5.8.2 Adding Up the Bits

In Chapter 3, we calculated that one hour’s worth of music stored on a compact disc would require storage of over 5 billion bits (608 MB) of information. How does this compare to one hour’s worth of digital video? Let’s make some simple assumptions to get a rough idea: let’s assume a screen that is 512 × 512 pixels, about the same resolution you can get on a good TV set.
Of course we want color; let’s say we’ll use 3 bits per color per pixel, for a total of 9 bits per pixel; that seems pretty modest. Now, let’s say we want the scene to change 60 times per second, so that we don’t see any flicker or choppiness. This means we will need (512 × 512) pixels × 9 bits per pixel × 60 frames per second × 3,600 seconds ≈ 500 billion bits per hour, just for the video. Francis Ford Coppola’s The Godfather, at over 3 hours, would require about 191 GB (over 191 billion bytes) of memory using this approach. This almost sounds like an offer we can refuse. But, do films actually require this much storage? Fortunately, the answer is no.
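The estimate can be reproduced with a few lines of Python. The 512 × 512 screen size is an assumption used here because it reproduces the totals quoted in the text; the function name is illustrative, and the compact disc figure is the rounded 5 billion bits from Chapter 3.

```python
def video_bits(width, height, bits_per_pixel, frames_per_second, seconds):
    """Uncompressed video size in bits: every pixel of every frame is stored."""
    return width * height * bits_per_pixel * frames_per_second * seconds


one_hour_bits = video_bits(512, 512, 9, 60, 3600)
print(one_hour_bits)  # 509607936000, roughly 500 billion bits

# A three-hour film, converted from bits to bytes.
three_hour_bytes = 3 * one_hour_bits // 8
print(three_hour_bytes)  # 191102976000, about 191 billion bytes

# Ratio to one hour of compact disc audio (rounded figure from the text).
cd_hour_bits = 5_000_000_000
print(one_hour_bits / cd_hour_bits)
```

Even with only 9 bits per pixel, an hour of uncompressed video needs on the order of a hundred times the storage of an hour of compact disc audio.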
The reason we can represent video with significantly fewer bits than in this example is compression: techniques that take advantage of certain predictabilities and redundancies in video information to reduce the amount of information to be stored. In the following chapters, we’ll discuss some of these techniques and the data storage requirements for video that they help us achieve.

Summary

So, is a picture worth ten thousand words? That depends on the picture and the words, of course! But 10,000 words, at an average of 6 characters per word, and 8 bits per character for the ASCII representation, would require 480,000 bits (approximately 60 KB) of storage. And an image which is 256 × 256 pixels, with 8 bits used for the gray level of each pixel, comes to a comparable 524,288 bits, or approximately 64 KB. So, it seems as if the writers of those old adages might have had more insight into information representation than one might first suspect. In this chapter, we have discussed how images, both still and moving, can be represented in binary form for purposes of storage, processing, or transmission.
We’ve talked about the two processes required to form a binary representation of an image, sampling and quantization, and described how these processes affect both image quality and storage size. We’ve discussed a few ways of representing color information, and talked about how video standards are based on the performance of the human visual system. In the next chapter, we will go into some specifics regarding how digital imagery is typically acquired, represented, stored, and transmitted. We’ll talk about standard formats for common image-rendering devices, such as fax machines, printers, and PC monitors. And, we’ll discuss some more implications of the large amounts of storage needed for video, which will lead to the need for schemes to reduce the storage without compromising the information.