Cafezika: Reading Comprehension

Given the high price of wine and the enormous number of choices, a system in which industry experts comb through the forest of wines, judge them, and offer consumers the meaningful shortcut of medals and ratings makes sense.

But what if the successive judgments of the same wine, by the same wine expert, vary so widely that the ratings and medals on which wines base their reputations are merely a powerful illusion? That is the conclusion reached in two recent papers in the Journal of Wine Economics.

Both articles were authored by the same man, a unique blend of winemaker, scientist and statistician. The unlikely revolutionary is a soft-spoken fellow named Robert Hodgson, a retired professor who taught statistics at Humboldt State University. Since 1976, Mr. Hodgson has also been the proprietor of Fieldbrook Winery, a small operation that puts out about 10 wines each year, selling 1,500 cases.

A few years ago, Mr. Hodgson began wondering how wines, such as his own, can win a gold medal at one competition, and "end up in the pooper" at others. He decided to take a course in wine judging, and met G.M. "Pooch" Pucilowski, chief judge at the California State Fair wine competition, North America's oldest and most prestigious. Mr. Hodgson joined the Wine Competition's advisory board, and eventually "begged" to run a controlled scientific study of the tastings, conducted in the same manner as the real-world tastings. The board agreed, but expected the results to be kept confidential.

There is a rich history of scientific research questioning whether wine experts can really make the fine taste distinctions they claim. For example, a 1996 study in the Journal of Experimental Psychology showed that even flavor-trained professionals cannot reliably identify more than three or four components in a mixture, although wine critics regularly report tasting six or more. There are eight in this description, from The Wine News, as quoted on wine.com, of a Silverado Limited Reserve Cabernet Sauvignon 2005 that sells for more than $100 a bottle: "Dusty, chalky scents followed by mint, plum, tobacco and leather. Tasty cherry with smoky oak accents…" Another publication, The Wine Advocate, describes a wine as having "promising aromas of lavender, roasted herbs, blueberries, and black currants." What is striking about this pair of descriptions is that, although they are very different, they are descriptions of the same Cabernet. One taster lists eight flavors and scents, the other four, and not one of them coincides.

That wine critiques are peppered with such inconsistencies is exactly what the laboratory experiments would lead you to expect. In fact, about 20 years ago, when a Harvard psychologist asked an ensemble of experts to rank five wines on each of 12 characteristics—such as tannins, sweetness, and fruitiness—the experts agreed at a level significantly better than chance on only three of the 12.

Psychologists have also been skeptical of wine judgments because context and expectation influence the perception of taste. In a 1963 study at the University of California at Davis, researchers secretly added color to a dry white wine to simulate a sauterne, sherry, rosé, Bordeaux and burgundy, and then asked experts to rate the sweetness of the various wines. Their sweetness judgments reflected the type of wine they thought they were drinking. In France, a decade ago, a wine researcher named Frédéric Brochet served 57 French wine experts two identical midrange Bordeaux wines, one in an expensive Grand Cru bottle, the other in the bottle of a cheap table wine. The gurus showed a significant preference for the Grand Cru bottle, employing adjectives like "excellent" more often for the Grand Cru, and "unbalanced" and "flat" more often for the table wine.

Provocative as they are, such studies have been easy for wine critics to dismiss. Some were small-scale and theoretical. Many were performed in artificial laboratory conditions, or failed to control important environmental factors. And none of the rigorous studies tested the actual wine experts whose judgments you see in magazines and marketing materials. But Mr. Hodgson's research was different.

In his first study, each year, for four years, Mr. Hodgson served actual panels of California State Fair Wine Competition judges—some 70 judges each year—about 100 wines over a two-day period. He employed the same blind tasting process as the actual competition. In Mr. Hodgson's study, however, every wine was presented to each judge three different times, each time drawn from the same bottle.

The results astonished Mr. Hodgson. The judges' wine ratings typically varied by ±4 points on a standard ratings scale running from 80 to 100. A wine rated 91 on one tasting would often be rated an 87 or 95 on the next. Some of the judges did much worse, and only about one in 10 regularly rated the same wine within a range of ±2 points.

Mr. Hodgson also found that the judges whose ratings were most consistent in any given year landed in the middle of the pack in other years, suggesting that their consistent performance that year had simply been due to chance.

Mr. Hodgson said he wrote up his findings each year and asked the board for permission to publish the results; each year, they said no. Finally, the board relented—according to Mr. Hodgson, on a close vote—and the study appeared in January in the Journal of Wine Economics.

This September, Mr. Hodgson dropped his other bombshell. This time, from a private newsletter called The California Grapevine, he obtained the complete records of wine competitions, listing not only which wines won medals, but which did not. Mr. Hodgson told me that when he started playing with the data he "noticed that the probability that a wine which won a gold medal in one competition would win nothing in others was high." The medals seemed to be spread around at random, with each wine having about a 9% chance of winning a gold medal in any given competition.

To test that idea, Mr. Hodgson restricted his attention to wines entering a certain number of competitions, say five. Then he made a bar graph of the number of wines winning 0, 1, 2, etc. gold medals in those competitions. The graph was nearly identical to the one you'd get if you simply made five flips of a coin weighted to land on heads with a probability of 9%. The distribution of medals, he wrote, "mirrors what might be expected should a gold medal be awarded by chance alone."
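The coin-flip comparison can be sketched numerically. The snippet below is only an illustration, not Mr. Hodgson's actual analysis: it assumes each competition is an independent trial with the same 9% chance of a gold medal, and the function name gold_medal_distribution is made up for this example.

```python
from math import comb


def gold_medal_distribution(n_competitions: int, p_gold: float) -> list[float]:
    """Chance of winning k = 0..n gold medals if medals were awarded at random.

    Assumes independent competitions with an identical gold-medal probability,
    which gives the binomial distribution.
    """
    return [
        comb(n_competitions, k) * p_gold**k * (1 - p_gold) ** (n_competitions - k)
        for k in range(n_competitions + 1)
    ]


# Five competitions, 9% chance of gold in each (the figures quoted in the article).
dist = gold_medal_distribution(5, 0.09)
for k, prob in enumerate(dist):
    print(f"{k} gold medals: {prob:.4f}")
```

Under these assumptions, a wine entered in five competitions would win no gold medal about 62% of the time and exactly one about 31% of the time. A bar graph of real medal counts that tracks these figures is what "awarded by chance alone" would look like.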

Mr. Hodgson's work was publicly dismissed as an absurdity by one wine expert, and as "hogwash" by another. But among winemakers, the reaction was different. "I'm not surprised," said Bob Cabral, winemaker at critically acclaimed Williams-Selyem Winery in Sonoma County. In Mr. Cabral's view, wine ratings are influenced by uncontrolled factors such as the time of day, the number of hours since the taster last ate and the other wines in the lineup. He also says critics taste too many wines in too short a time. As a result, he says, "I would expect a taster's rating of the same wine to vary by at least three, four, five points from tasting to tasting."

One critic who recognizes that variation is an issue is Joshua Greene, editor and publisher of Wine and Spirits, who told me, "It is absurd for people to expect consistency in a taster's ratings. We're not robots." In the Cruse trial, the company appealed to the idea that even experienced tasters could err. Cruse claimed that it had bought the cheap Languedoc believing it was the kingly Bordeaux, and that the company's highly-trained and well-paid wine tasters had failed to perceive that it wasn't. The French rejected that possibility, and 35 years ago this December, eight wine dealers were convicted and given prison terms and fines totaling $8 million.

Despite his studies, Mr. Hodgson is betting that, like the French, American consumers won't be easily converted to the idea that wine experts are fallible. His winery's Web site still boasts of his own many dozens of medals.

"Even though ratings of individual wines are meaningless, people think they are useful," Mr. Greene says. He adds, however, that one can look at the average ratings of a spectrum of wines from a certain producer, region or year to identify useful trends.

As a consumer, it is a challenge to accept that one taster's tobacco and leather is another's blueberries and currants, that a 91 and a 96 rating are interchangeable, or that a wine winning a gold medal in one competition is likely to be thrown in the pooper in others. If you ignore the web of medals and ratings, how do you decide where to spend your money?

One answer would be to do more experimenting, and to be more price-sensitive, refusing to pay for medals and ratings points. Another tack is to continue to rely on the medals and ratings, adopting an approach often attributed to physicist Niels Bohr, who was said to have had a horseshoe hanging over his office door for good luck. When asked how a physicist could believe in such things, he said, "I am told it works even if you don't believe in it." Or you could just shrug and embrace the attitude of Julia Child, who, when asked what her favorite wine was, replied "gin."

Wednesday, November 18, 2009
