Saturday, September 28, 2013

What is a histogram?

I reproduce here the opening paragraphs of Yannis Ioannidis's 2003 The History of Histograms (abridged) for their useful information on etymology, early usage, and definition.  The full article is available as a PDF download, but anything beyond the paragraphs quoted here may be of little interest to photographers or a nonspecialist audience.
The word `histogram' is of Greek origin, as it is a composite of the words `isto-s'  (= `mast', also means `web' but this is not relevant to this discussion) and `gram-ma'  (= `something written'). Hence, it should be interpreted as a form of writing consisting of `masts', i.e., long shapes vertically standing, or something similar. It is not, however, a word that was originally used in the Greek language.  The term `histogram' was coined by the famous statistician Karl Pearson to refer to a "common form of graphical representation". In the Oxford English Dictionary quotes from "Philosophical Transactions of the Royal Society of London" Series A, Vol. CLXXXVI, (1895) p. 399, it is mentioned that "[The word `histogram' was] introduced by the writer in his lectures on statistics as a term for a common form of graphical representation, i.e., by columns marking as areas the frequency corresponding to the range of their base."  Stigler identifies the lectures as the 1892 lectures on the geometry of statistics [69].
The above quote suggests that histograms were used long before they received their name, but their birth date is unclear. Bar charts (i.e., histograms with an individual `base' element associated with each column) most likely predate histograms and this helps us put a lower bound on the timing of their first appearance. The oldest known bar chart appeared in a book by the Scottish political economist William Playfair titled "The Commercial and Political Atlas (London 1786)" and shows the imports and exports of Scotland to and from seventeen countries in 1781 [74]. Although Playfair was skeptical of the usefulness of his invention, it was adopted by many in the following years, including for example, Florence Nightingale, who used them in 1859 to compare mortality in the peacetime army to that of civilians and through those convinced the government to improve army hygiene. 
From all the above, it is clear that histograms were first conceived as a visual aid to statistical approximations. Even today this point is still emphasized in the common conception of histograms: Webster's defines a histogram as "a bar graph of a frequency distribution in which the widths of the bars are proportional to the classes into which the variable has been divided and the heights of the bars are proportional to the class frequencies". Histograms, however, are extremely useful even when disassociated from their canonical visual representation and treated as purely mathematical objects capturing data distribution approximations. This is precisely how we approach them in this paper. 
In the past few decades, histograms have been used in several fields of informatics. Besides databases, histograms have played a very important role primarily in image processing and computer vision. Given an image (or a video) and a visual pixel parameter, a histogram captures for each possible value of the parameter (Webster's "classes") the number of pixels that have this value (Webster's "frequencies"). Such a histogram is a summary that is characteristic of the image and can be very useful in several tasks: identifying similar images, compressing the image, and others. Color histograms are the most common in the literature, e.g., in the QBIC system [21], but several other parameters have been proposed as well, e.g., edge density, texturedness, intensity gradient, etc. [61]. In general, histograms used in image processing and computer vision are accurate. For example, a color histogram contains a separate and precise count of pixels for each possible distinct color in the image. The only element of approximation might be in the number of bits used to represent different colors: fewer bits imply that several actual colors are represented by one, which will be associated with the number of pixels that have any of the colors that are grouped together. Even this kind of approximation is not common, however. 
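
To make the counting in Ioannidis's last paragraph concrete, here is a minimal Python sketch of my own (it is not from his article). It assumes a single 8-bit channel rather than full color, and the function name quantized_histogram and the sample values are hypothetical. With all 8 bits every distinct value gets its own exact count; with fewer bits, neighboring values are grouped and share one count, which is the only approximation he describes.

```python
def quantized_histogram(pixels, bits):
    """Count pixels per level after keeping only the top `bits` bits.

    Assumes every pixel is an integer on the 8-bit scale 0-255.
    bits=8 gives one exact count per distinct value; fewer bits
    merge neighboring values into a single shared count.
    """
    if not 1 <= bits <= 8:
        raise ValueError("bits must be between 1 and 8")
    shift = 8 - bits                 # number of low-order bits discarded
    counts = [0] * (1 << bits)       # one counter per representable level
    for value in pixels:
        if not 0 <= value <= 255:
            raise ValueError(f"not an 8-bit value: {value}")
        counts[value >> shift] += 1
    return counts


# Hypothetical sample: eight pixel values from a single channel.
sample = [0, 0, 12, 128, 128, 128, 200, 255]

exact = quantized_histogram(sample, 8)    # 256 levels, exact counts
coarse = quantized_histogram(sample, 2)   # 4 levels, values grouped
```

In the coarse version the values 0 and 12 land in the same group, as do 200 and 255, so the total number of pixels is preserved but the distinction between those values is lost.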

Michael Freeman in his Digital SLR Handbook (2011, 3rd ed) breaks it down quite simply:
A histogram is simply a column graph.  In digital photography, a standard 8-bit scale of 0-255 shows 256 columns from pure black at left (0) to pure white at right (255), and normally, in camera displays and Photoshop, they are packed together so that they join and there are no gaps.  Pixel brightness is plotted across the bottom on the X axis while the number of pixels that contain a particular tone is plotted up the vertical Y axis.  At all stages of the photography workflow, this is the single most useful representation of the tonal qualities of an image.  (p150)
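
Freeman's description translates almost directly into code. The following is a rough Python sketch under my own assumptions: the image is a grid of 8-bit brightness values, and the names tonal_histogram and image are made up for illustration. Camera firmware and Photoshop will of course do this differently and far faster.

```python
def tonal_histogram(rows):
    """Return 256 column heights for a grid of 8-bit brightness values.

    Index 0 is pure black (left edge of the graph), index 255 is pure
    white (right edge); each entry is the number of pixels on that tone.
    """
    columns = [0] * 256
    for row in rows:
        for brightness in row:
            if not 0 <= brightness <= 255:
                raise ValueError(f"not an 8-bit value: {brightness}")
            columns[brightness] += 1
    return columns


# Hypothetical 2 x 4 pixel image, brightness only.
image = [
    [0, 0, 12, 128],
    [128, 128, 200, 255],
]

hist = tonal_histogram(image)
tallest = max(range(256), key=lambda tone: hist[tone])  # most common tone
```

The list index plays the role of the X axis (tone) and the stored count the Y axis (number of pixels), so plotting hist as adjacent columns gives the familiar display.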

Rick Nunn (Histograms, 2010) has a short and simple blog post about histograms featuring two helpful illustrations.  First, there is the anatomy of a histogram:




Then there is the collection of archetypes:




Histograms can be used in camera as aids to set exposure in one of two ways.  Many DSLR cameras now feature a live histogram on the LCD, so the photographer can watch the histogram change as exposure settings are adjusted.  Nearly all DSLRs also include histograms as part of their playback views; given the opportunity to reshoot a scene, the photographer can use the playback histogram to make adjustments for additional exposures.  Histograms can also be used as aids to adjust exposure in software such as Photoshop and Lightroom.


Ken Rockwell (How to Use Histograms, 2006) wants you to know that histograms cannot replace the photographer's vision in setting or correcting exposure:
The best way to evaluate exposure is to look at the picture, not a histogram. 
Histograms are a way to measure exposure more objectively for those who can't see very well. Histograms don't replace your eyes and experience. Histograms are helpful in sunlight where it's hard to see an LCD, or in the shop if setting something exactly. Your eyes are always the final judge. 
A histogram is just a guide. Worry about your image more than the histogram.

To illustrate the pitfalls of over-reliance on histograms to set exposure, Jeff Wignall (Exposure, 2011) gives this example:
And that’s exactly where the histogram can get you into trouble: Imagine that you shoot a photograph of a nice country scene with a dirt road, a dark-red barn, and a white sky, and assume that the barn is the largest subject in the scene. When you look at the histogram, you see a lot of concentrated hills at the left edge of the scene (they represent the dark road and the big barn), and you might be tempted to add some exposure to shift them a bit to the right. The trouble is that the barn might have looked perfectly good as a dark tone, and by adding exposure, you’re taking away from the drama of your photograph. You are letting the graph dictate “correctness” instead of drama. Don’t. (p231)
Finally, Brian Auer (How to Read Image Histograms, 2010) has produced a series of monochrome images, compiled into a collage, demonstrating different exposure settings with accompanying histograms.  It shows quite clearly that acceptable images will not always produce an even distribution of data within the histogram.  Visit Mr Auer's post at the link above to see fuller sized versions of all these images.




This, then, constitutes an overview of the basic properties and uses of exposure histograms.  I was going to include all this in my exercise report, but given the length I have broken it off into a separate post.
