A coauthor and I just recently submitted a revision of our manuscript to a journal. If we'd known it was going to be so much work, we probably never would've written the paper in the first place. . . . It's a surprising amount of work between idea and execution (even forgetting about issues such as writing the letter in response to the referee reports). And, actually, this particular review process was very easy, as such things go. Still a lot of effort, though. It reminds me that being able to do something once is a lot less than describing a method clearly and in appropriate generality.

Get off that goddam cell phone!

Mark Glaser writes an interesting but confusing article about a journalism class at NYU where students aren't allowed to blog or twitter about the class content:

After New York University journalism student Alana Taylor wrote her first embed report for MediaShift on September 5, it didn't take long for her scathing criticism of NYU to spread around the web and stir conversations. . . . By Taylor's account, [journalism professor Mary] Quigley had a one-on-one meeting with Taylor to discuss the article, and Quigley made it clear that Taylor was not to blog, Twitter or write about the class again.

Glaser then corresponds with Prof. Quigley, who emails:

I [Quigley] will confirm that I asked the class not to text, email or make cell phone calls during class. It's distracting to both me and other students, especially in a small class seated around a conference table. This has always been my policy, and I would hazard a guess that it's the policy of many professors no matter the discipline.

However, I did say after the class session they were free to text, Twitter, blog, email, post on Facebook or whatever outlet they wanted about the course, my teaching, the content, etc.

Seems clear enough: Keep your thumbs to yourself during the class period then write it all down later. Makes sense to me. But then Glaser reports:

When I [Glaser] followed up and asked her whether that meant students still needed to get permission before writing about class, she said: "Yes, I would certainly require a student to ask permission to use direct quotes from the class on a blog written after class."

Huh? Didn't she just say "they were free to text, Twitter, blog, email, . . . whatever they wanted about the course"? At this point, I wish Glaser had gone back to Quigley one more time for a clarification.

P.S. I looked up Mary Quigley on the web and found this list of articles by her students--judging from the quick summaries, apparently Quigley teaches a class on feature writing--and this homepage, which to me was surprisingly brief, but I suppose that journalists have a tradition of not giving out their work for free.

P.P.S. Without knowing more details than what is in the links above, I'm 100% in support of Taylor, the student who was told not to blog. But I can definitely sympathize with Quigley: I can well imagine a student in one of my classes blogging something like this:

At the halfway point in the class, Quigley lets us go on a break. In the bathroom I run into an old classmate who asks me if I am going to stay in the class. I ask her if she doesn't like it and she responds that she is worried of it being too "all-over the place" or "disorganized" or "confusing."

Ouch!

P.P.P.S. I was amused that Taylor wrote that "I like to think that having a blog is as normal as having a car." Where exactly does she park?

After writing this, I scrolled down Ben Casnocha's blog and read a few more entries and came to this discussion of Nassim Taleb. Casnocha writes:

At the bottom of Taleb's homepage he posts his email address and invites readers to contact him. With some qualifications:
Concise messages are much preferable (say a maximum < 40 words) as I will not be able to read long letters. Please do not 1) send me your papers or other "interesting material" to read, 2) ask finance questions (not my specialty, 3) make me to rewrite sections of my books (I write books, not emails), 4) ask for a list of "other interesting books to read", 5) ask me to provide career or educational advice, 6) send me passages from Tolstoy or the Ecclesiast on luck and randomness, 7) send me the list of typos in my drafts. Note that I almost always reply (but ONLY to short messages), time permitting (but once) -even to nasty emails. Finally, note that, thanks to my new keyboard, I sometimes reply in Arabic, particularly to academics. [Also please please refrain from offering to "improve" my web site].

He opens his piece on walking by noting that thanks to the "exposure" of his books he came onto theories about fitness by two authors. I imagine this happened by a reader writing in and sharing "interesting material" of the sort he says he does not want. I have never emailed Taleb, but I [Casnocha] don't take his qualifications seriously. It is, in fact, a very naked way to signal busyness and importance.

I think there's something important that Casnocha (and his blog commenter) are not understanding here, and that is the interaction between the linear scaling of a person's time and the exponential scaling of fame.

Here's the deal. Taleb is one person. I'm sure he can answer emails faster than most of us--and he might even have a secretary to filter out the spam--but, still, he's responding to these on human scale. Similarly, he writes just like the rest of us (James Patterson and Doris Kearns Goodwin excepted), putting one word after the other. Even if he writes 10 times faster than a less practiced writer, he still has to do the work.

But . . . he's really famous. OK, not famous like Elvis or even Bob Dylan, but he could very well be receiving a zillion emails per day. Taleb doesn't need to signal busyness and importance. He's certainly important, and if he tries to answer all his emails, he's gonna be busy also. Lots of famous people don't have email at all.

I do, however, think it's a bit silly for Taleb to ask people not to send him things to read. I like when people send me things to read. I can look at a couple paragraphs and decide if I want to read more. Sometimes people send interesting things. Also, I'd recommend that Taleb get rid of his "almost always reply" rule. I almost always reply to emails, but Taleb must receive many many more than I do.

P.S. Heinlein's solution.

Last week, Christian Robert and I separately reviewed Krzysztof Burdzy's book, The Search for Certainty, which I characterized as a harmless if misleading discussion of the philosophy of probability. Burdzy sent us his reply, which I will post below, followed by my comments. I am omitting some parts of Burdzy's comments that are specific to Christian's review and not of general interest.

Blog style

I followed this link from Tyler Cowen to "Ben Casnocha on Chile" and found . . . a long blog entry that was exactly in the style of Tyler Cowen! I wonder if Cowen realized this when he linked to it. Probably not: just as we don't notice our own strong smells (or so I've been told), it's probably also hard for anyone to notice an imitation of one's own style. I do wonder whether Casnocha was imitating Cowen on purpose--not such a bad idea when blogging to imitate a master, just as short-story writers continue to imitate John Updike. Personally, I'm sick and tired of book and movie reviewers imitating Pauline Kael--I didn't even like her own writing and I don't enjoy seeing her stylistic tics repeated by others--but, hey, that's their choice.

P.S. In case you're wondering, here are a few Cowenisms in Casnocha's blog:

Annie Lowrey speculates:

Based on Census Bureau data, five senators would represent Americans earning between $100,000 and $1 million individually per year, with [2/10 of a senator] working on behalf of the millionaires. Eight senators would represent Americans with no income. Sixteen would represent Americans who make less than $10,000 a year, an amount well below the federal poverty line for families. The bulk of the senators would work on behalf of the middle class, with 34 representing Americans making $30,000 to $80,000 per year. . . . Or how about if senators represented particular demographic groups, based on gender and race? White women would elect the biggest group of senators -- 37 of them, though only 38 women have ever served in the Senate.

I don't know how well all of this would work in practice--for one thing, I wouldn't want the senator who represents two-year-olds to be anywhere near the nuclear button--but I agree that ideas of fairness and political representation are subtle.

Along similar lines, here is my response to economists who complained that there were not enough economists in elective office:

My talk at Ensae

Here.

A matter of perspective?

An article in The Guardian says:


David Champion, director of automobile testing for Consumer Reports magazine, said the core problem of faulty Toyota accelerators had been linked to 19 deaths in a decade, amounting to two a year of the 40,000 people killed annually on American roads.

"I find it a little odd that we're going to have a Congressional hearing to look at those two deaths out of 40,000," said Champion.

Eric Bettinger, Bridget Terry Long, Philip Oreopoulos, and Lisa Sanbonmatsu write:

Growing concerns about low awareness and take-up rates for government support programs like college financial aid have spurred calls to simplify the application process and enhance visibility.

Here's the study:

H&R Block tax professionals helped low- to moderate-income families complete the FAFSA, the federal application for financial aid. Families were then given an estimate of their eligibility for government aid as well as information about local postsecondary options. A second randomly-chosen group of individuals received only personalized aid eligibility information but did not receive help completing the FAFSA.

And the results:

Comparing the outcomes of participants in the treatment groups to a control group . . . individuals who received assistance with the FAFSA and information about aid were substantially more likely to submit the aid application, enroll in college the following fall, and receive more financial aid. . . . However, only providing aid eligibility information without also giving assistance with the form had no significant effect on FAFSA submission rates.

The treatment raised the proportion of applicants in this group who attended college from 27% (or, as they quaintly put it, "26.8%") to 35%. Pretty impressive. Overall, it appears to be a clean study. And they estimate interactions (that is, varying treatment effects), which is always, always, always a good idea.

Here are my recommendations for improving the article (and thus, I hope, increasing the influence of this study):

Update on the coffee experiment

It's working, so far.

This program introduces students to three modern, applied statistics research problems, and gives them a sense of how statisticians approach large, complex problems, with the aim of encouraging them to pursue advanced degrees in statistics.

The program takes place at the National Center for Atmospheric Research, Boulder, Colorado. According to the website, the summer 2011 program will be at Columbia.

I emailed David Runciman my discussion of his BBC broadcast (in which he wrote: "It is striking that the people who most dislike the whole idea of healthcare reform - the ones who think it is socialist, godless, a step on the road to a police state - are often the ones it seems designed to help" and "many of America's poorest citizens have a deep emotional attachment to a party that serves the interests of its richest").

Runciman responded with some comments which made me feel that I was being unfair in my original description of his statements as "the usual errors."

Below is my dialogue with Runciman and also my response to a related comment by Megan Pledger.

Runciman replied to my original blog, reasonably enough, as follows:

I [Runciman] don't think I say at any point (either in the radio program, or the article which is a shortened version of the script) that there is more opposition among the poor than among the rich, or among the young than among the old. I don't say that more people vote against their own interests than vote in their own interests - obviously not true. Maybe it reads like that's implied. But many also implies more than you would expect and I still believe that's true.

To which I replied:

A propensity for bias?

Teryn Mattox writes:

Matt Stephenson points me to this BBC article, "Why do people vote against their own interests?", that seems to me to be a bit misleading. This would seem to fall into the dog-bites-man category of "This is important. Someone is wrong on the internet"--but it is the fabled BBC, and it is written by a political scientist at fabled Cambridge University--so maybe it's going through some problems.

It is striking [says David Runciman, speaking on the BBC] that the people who most dislike the whole idea of healthcare reform - the ones who think it is socialist, godless, a step on the road to a police state - are often the ones it seems designed to help.

B-b-b-but . . . what about this?

[Figure: mapsnyt.jpg]

The people who dislike healthcare are primarily those over 65 (who already have free medical care in America) and people with above-average income. No, these are not really the ones the new bill is most designed to help.

To be fair, though, my maps are based on survey data from 2004. I haven't been able to grab more recent individual-level data to replicate our analysis with current public opinion. Still, my guess is that it is the older and richer who most strongly oppose changing the health-care system.

Next:

If people vote against their own interests, it is not because they do not understand what is in their interest or have not yet had it properly explained to them. They do it because they resent having their interests decided for them by politicians who think they know best. There is nothing voters hate more than having things explained to them as though they were idiots.

Hey, I didn't know that! Maybe it's true. I thought that in a relatively peaceful and prosperous country such as the United States, there's nothing voters hate more than an economic downturn.

Beyond this, there's little evidence that people vote based on their individual interest or even that they should vote based on their interest; rather, survey data and theory both suggest that people vote based on what they think is best for the country. (See here and here.) This is not to say that the psychological models of Drew Westen, which are touched upon in this article, are wrong or irrelevant, but merely to point out that "people voting against their interests" is not such a surprise or paradox.

And then there's this:

It was Oscar Wilde, was it not, who said he would sooner believe a falsehood told well than a truth told falsely? And George Orwell who wrote that good prose is like a windowpane, but sometimes it needs a bit of Windex and a clean rag to fully do its job.

Along those lines, Don Rubin has long ago convinced me of the importance of clean statistical notation. One example that's been important to me is model checking--residual plots, p-values, and all the rest. The key, to me, is the Tukeyesque idea of comparing observed data to what could've occurred if the model were true. The usual way this used to be done in statistics books was to talk about data y and a random variable Y. If the test statistic is T(y), then the p-value is Pr (T(Y)>T(y)) or, more generally, Pr (T(Y)>T(y) | theta). (I'm assuming continuous models here so as to avoid having to use the "greater than or equal" symbol.)
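Here is a minimal sketch of that plug-in calculation, estimating Pr (T(Y)>T(y) | theta) by simulating replicated datasets Y from the model at a fixed theta. Everything specific in it is an assumption for illustration: the function name `plug_in_p_value`, the normal(0, 1) model, the choice of T as the maximum, and the made-up data.

```python
import numpy as np

def plug_in_p_value(y, test_stat, simulate, n_sims=10_000, seed=0):
    """Monte Carlo estimate of Pr(T(Y) > T(y) | theta) for a fixed theta.

    `simulate(rng, n)` must draw one replicated dataset Y of size n
    from the model with theta held fixed (hypothetical interface).
    """
    rng = np.random.default_rng(seed)
    t_obs = test_stat(y)
    # T(Y) for each replicated dataset drawn from the model
    t_rep = np.array([test_stat(simulate(rng, len(y))) for _ in range(n_sims)])
    # Fraction of replications whose test statistic exceeds the observed one
    return float(np.mean(t_rep > t_obs))

# Assumed model for illustration: y_i ~ Normal(0, 1), with T(y) = max(y).
def simulate_normal(rng, n):
    return rng.normal(loc=0.0, scale=1.0, size=n)

# Made-up data with one value far outside what the model would produce
y_obs = np.array([-0.3, 0.8, 1.1, -1.2, 0.4, 6.0])
p = plug_in_p_value(y_obs, np.max, simulate_normal)
print(p)
```

A p-value near zero here says that the observed maximum is larger than nearly anything the assumed model generates. Note that theta is treated as known throughout, which is exactly the restriction at issue.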

But this notation starts to break down once you start thinking about uncertainty in theta. If theta can be well estimated from data, then maybe you're ok with Pr (T(Y)>T(y) | theta.hat). But once we go beyond point estimation, we're in trouble, and the trouble is that y is said to be a "realization" of Y. Just as Clark Kent is a particular realization of Superman.

Ouch.

Here's the story (which Kaiser forwarded to me). The English medical journal The Lancet (according to its publisher, "the world's leading independent general medical journal") published an article in 1998 in support of the much-derided fringe theory that MMR vaccination causes autism. From the BBC report:

The Lancet said it now accepted claims made by the researchers were "false".

It comes after Dr Andrew Wakefield, the lead researcher in the 1998 paper, was ruled last week to have broken research rules by the General Medical Council. . . . Dr Wakefield was in the pay of solicitors who were acting for parents who believed their children had been harmed by MMR. . . .

[The Lancet is now] accepting the research was fundamentally flawed because of a lack of ethical approval and the way the children's illnesses were presented.

The statement added: "We fully retract this paper from the published record." Last week, the GMC ruled that Dr Wakefield had shown a "callous disregard" for children and acted "dishonestly" while he carried out his research. It will decide later whether to strike him off the medical register.

The regulator only looked at how he acted during the research, not whether the findings were right or wrong - although they have been widely discredited by medical experts across the world in the years since publication.

They also write:

The publication caused vaccination rates to plummet, resulting in a rise in measles.

An interesting question, no? What's the causal effect of a single published article?

P.S. I love it how they refer to the vaccine as a "three-in-one jab." So English! They would never call it a "jab" in America. So much more evocative than "shot," in my opinion.

Problems with Census data

Following this link from John Sides, I read this blog by Justin Wolfers on a problem with U.S. Census data discovered by Trent Alexander, Michael Davern and Betsey Stevenson:

The authors compare the official census count (based on the tallying up of all Census forms) with their own calculations, based on the sub-sample released for researchers (the "public use micro sample," available through IPUMS). If all is well, then the authors' estimates should be very close to 100% of the official population count. But they aren't:

[Figure: blogSpan.jpg]

Nick Allum writes:

I heard a rumour that Doris Kearns Goodwin is still being interviewed on TV, and . . . yes, it's true!

My first thought was: What, they couldn't find an equally appealing talking head who wasn't also a plagiarist? I'm sure there are lots of well-spoken historians who'd love the chance to go on Johnny Carson or whatever it's called nowadays.

But then I looked around on her website, and now I'm not sure. Her books have received all sorts of praise as exemplary popular history, and that sounds like as good a qualification as any for explaining history on TV. Who cares if she's a plagiarist? She's not on the tube for her creative writing talent or, for that matter, for her ability to learn from the primary sources.

The other dimension is that plagiarism is a moral offense. At the very least, I think it might help if Goodwin's TV interviewers every once in a while brought up the plagiarism issue in some relevant way. For example, "Since we're on the topic of authenticity in political candidates, what do you think of the accusation that candidate X is ripping off the ideas of politician Y? As a plagiarist yourself, you must have some thoughts on this?" Or, "The relations between senators and their staff are complicated, no? You must have some insights into this, having delegated the writing of your book to research assistants who copied whole chunks from others' work. How many of 100 members of the U.S. Senate do you think actually read more of the health care bill than you've read of your own publications?"

Really??? He's almost 80 years old! Yeah, I know, U.S. senator is a pretty cushy job, not much heavy lifting involved, but still . . .

P.S. If I'm still blogging when I'm 80, please don't throw this one back at me.

Tyler Cowen quotes Barbara Demick as writing, "North Koreans have multiple words for prison in much the same way that the Inuit do for snow." So do we, no? But in our case, they seem to come from 1930s B-movies.

I wonder if there are almost as many words for prison in Russia, Turkmenistan, and the other leaders on the list. Apparently North Korea is off the charts, so perhaps they have ten times as many words for prison/jail as we do.

P.S. America includes a bunch of Inuits, so I guess we have multiple words for snow also!

Stop me before I rant again

David Shor writes:

I just read an idea for a pollster that crowd-sources statistical work, and was curious what you thought about the idea.

Here's the idea:

Today, there is a new polling method available: IVR, or 'Interactive Voice Response' polling. Basically, the pollster records several questions, a computer auto-dials hundreds of landlines, and with the people who are willing to participate in the survey, they go through the script automatically.

Even though the old media pollsters and traditional polling organisations like AAPOR are busy discrediting those polls that they condescendingly call 'robopolls', there is not much evidence that they do any worse than live-interviewer polls- but they are much, much cheaper. . . .

Now, the next step to make polls even easier to access for everyone is there- with the mid-January start-up of the IVR pollster Precision Polling.

From Precision Polling's website:

Automated Phone Surveys are phone calls where a recorded voice asks you questions and you type in responses on your keypad (e.g. "Who will get your vote for mayor? Press 1 for Joe..."). This provides a fast and affordable way to get answers from real people.

What do I think? I think it's evil. These robopolls are "fast and affordable" for the pollster but not for the person being hassled by the phone call. I think these machine phone calls should be illegal--yes, I would eagerly support a law making it illegal to call someone if there's no human making the call (fax and data transmission excepted, of course). This would have the side benefit of making all those pre-election endorsement auto-calls illegal, as well as various obnoxious calls used by collection agencies.

It's simply an abuse of the phone system, just as it would be an abuse of the electrical system to sneak into your neighbor's house one night, plug in a really long extension cord, and run it out their window to your house to power your appliances.

"Fast and affordable," indeed! Fast, affordable, and abusive is more like it.

P.S. I feel bad even giving these dudes publicity, but I figure, once it's on Daily Kos it's already been read by a million people, so I hope the good I'm doing by disparaging this idea outweighs the harm I'm doing by publicizing it.

P.P.S. I'm not saying the Daily Kos diarist ("twohundertseventy") is evil, or even that the people at Precision Polling are bad guys. I just don't know if they've thought through the ethical implications of their suggestion, which amounts to bombarding millions of people with irritating calls at dinnertime. Or perhaps they have a retort to my ethical argument, something like: Lots of people enjoy answering polls, or Polls are essential to democracy. OK, if they're so damn essential, try paying people to participate in your poll. You're making money off of them, why not give something back to the people you're hassling? Grrrr.

P.P.P.S. I agree with commenter Tom that robocalls should be legal if the person being called agrees to it ahead of time.

Hal Daume pointed me to this plan that some marathon-running dude named Matt has to quit drinking caffeine. Here's Matt's motivation:

I [Matt] try hard to stay away from acid-forming foods and to eat by the principles of Thrive, where energy comes not from stimulation but from nourishment. I want to maximize the energy I have available to create an exciting life, and coffee, in the long-term, only robs me of this energy.

I've tried hard to quit coffee in the past--I even went a month without coffee a while back. But I keep coming back to it. I come back to it because I have this idea that it helps me think better. I enjoy reading books and doing math more when I drink coffee, and I think I come up with better ideas when I'm caffeinated. But I know that's not true. The type of thinking coffee helps me with is a very linear kind, a proficiency at checking items off a list or even of recombining old ideas in a new way. This isn't real creativity. Real creativity is nonlinear, the creation of truly new ideas that haven't yet been conceived, not simply the reordering of old ones.

What's cool about Matt's project is that he's randomizing: some days he'll drink decaf, some days regular coffee, and other days a mix. (To be precise, his wife is doing the randomizing, and she gets to choose the mix.) Each week, he alters the proportions to have more and more decaf--that way he can transition to fully-decaffeinated coffee, but in a way that is slightly unpredictable, so that he's never quite sure what he's getting in any day.
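A rough sketch of what such a tapering randomization could look like, in case anyone wants to try it at home. This is my own simplification, not Matt's actual protocol: I assume a six-week taper, a linear drop in the probability of regular coffee, and only two options per day (no mixed cups). The function name `taper_schedule` is made up.

```python
import random

def taper_schedule(n_weeks=6, days_per_week=7, seed=None):
    """Return a list of weekly plans, each a list of 'regular' or 'decaf'.

    Assumption: the chance of regular coffee falls linearly from 1 in the
    first week to 0 in the last, so requires n_weeks >= 2.
    """
    rng = random.Random(seed)
    schedule = []
    for week in range(n_weeks):
        p_regular = 1 - week / (n_weeks - 1)
        # Each day is an independent draw, so the drinker cannot be sure
        # what is in the cup on any given day of the middle weeks.
        week_plan = [
            "regular" if rng.random() < p_regular else "decaf"
            for _ in range(days_per_week)
        ]
        schedule.append(week_plan)
    return schedule

# The person pouring the coffee keeps this list; the drinker never sees it.
schedule = taper_schedule(n_weeks=6, seed=1)
for week_number, plan in enumerate(schedule, start=1):
    print(week_number, plan)
```

The point of handing the schedule to someone else is the same as in the post: the assignment has to be hidden from the person whose expectations you are trying to neutralize.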

Also, of course, he's making all this public, which I guess will make it tougher for him to break his self-imposed rules.

This is an interesting example in which randomization is used for something other than the typical statistical reason of allowing unbiased comparisons of treatment groups.

I was also amused by his method of having his wife randomize. I remember thinking about this when Seth was telling me about one of his self-experiments, where I worried that expectation effects could be large--Seth knows what he's doing to himself (in this case, I believe it was some choice of which oil he was drinking every day) and I was thinking that this could have a huge effect on his self-measurements. I spent awhile trying to think of a way that Seth could randomize his treatment, but it wasn't easy--Seth was living alone at the time, and there wasn't anyone who could conveniently do it for him--and for reasons having to do with the effects that Seth was expecting to see, a simple randomization wouldn't work. (Seth was expecting results to last over several days, so a randomization by day wouldn't do the trick. But randomizing weeks wouldn't do either, because then you're losing independence of the daily measurements, if Seth guesses (or thinks he can guess) the new treatment on the day of the switch.) It would've been so so easy to do it using a friend, but not at all easy to do alone.

A prediction

What it takes

From a recent email exchange with a collaborator on a paper that a bunch of us are working on:

Yes, it's definitely a methodology paper. But, given that we don't have any theorems or simulation studies, the motivation for the methodology has to come from the application, no?

A few days ago, I suggested that we could invert the usual forecast-the-election-from-the-economy rule and instead use historical election returns to make inferences about past economic trends.

Bob Erikson is skeptical. He writes:

It is an interesting idea but I don't think the economics-vote connection is strong enough to make it work. At best economics explains no more than "half the variance" and often less. Like I [Bob] am on record as saying the economy has little to do with midterm elections (AJPS 1990) unlike prez elections.

Damn. It's such a cute idea, though, I still want to give it a try.

Some thoughts on final exams

I just finished grading my final exams--see here for the problems and the solutions--and it got me thinking about a few things.

#1 is that I really really really should be writing the exams before the course begins. Here's the plan (as it should be):
- Write the exam
- Write a practice exam
- Give the students the practice exam on day 1, so they know what they're expected to be able to do, once the semester is over.
- If necessary, write two practice exams so that you have more flexibility in what should be on the final.

The students didn't do so well on my exam, and I totally blame myself, that they didn't have a sense of what to expect. I'd given them weekly homework, but these were a bit different than the exam questions.

My other thought on exams is that I like to follow the principles of psychometrics and have many short questions testing different concepts, rather than a few long, multipart essay questions. When a question has several parts, the scores on these parts will be positively correlated, thus increasing the variance of the total.
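The variance claim is just the formula for the variance of a sum. Here is a small sketch under simplifying assumptions that are mine, not anything measured from the exam: every part has the same variance and every pair of parts has the same correlation. The function name `variance_of_total` is made up.

```python
def variance_of_total(n_parts, part_variance, correlation):
    """Variance of a sum of n_parts scores with equal variance and
    equal pairwise correlation:
        n * s2 + n * (n - 1) * rho * s2
    """
    independent_term = n_parts * part_variance
    covariance_term = n_parts * (n_parts - 1) * correlation * part_variance
    return independent_term + covariance_term

# Five independent short questions vs. one five-part question whose
# parts are correlated at 0.5 (both numbers are illustrative).
print(variance_of_total(5, 1.0, 0.0))  # 5.0
print(variance_of_total(5, 1.0, 0.5))  # 15.0
```

With the same number of points at stake, the correlated version has three times the variance in this example, which is why a student who stumbles on part (a) of a long question can lose the whole thing.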

More generally, I think there's a tradeoff in effort. Multi-part essay questions are easier to write but harder to grade. We tend to find ourselves in a hurry when it's time to write an exam, but we end up increasing our total workload by writing these essay questions. Better, I think, to put in the effort early to write short-answer questions that are easier to grade and, I believe, provide a better evaluation of what the students can do. (Not that I've evaluated that last claim; it's my impression based on personal experience and my casual reading of the education research literature. I hope to do more systematic work in this area in the future.)

I just graded the final exams for my first-semester graduate statistics course that I taught in the economics department at Sciences Po.

I posted the exam itself here last week; you might want to take a look at it and try some of it yourself before coming back here for the solutions.

And see here for my thoughts about this particular exam, this course, and final exams in general.

Now on to the exam solutions, which I will intersperse with the exam questions themselves:

Kevin Spacey famously said that the greatest trick the Devil ever pulled was convincing the world he didn't exist. When it comes to The Search for Certainty, a new book on the philosophy of statistics by mathematician Krzysztof Burdzy, the greatest trick involved was getting a copy into the hands of Christian Robert, who trashed it on his blog and then passed it on to me.

The flavor of the book is given by this quotation from the back cover: "Similarly, the 'Bayesian statistics' shares nothing in common with the 'subjective philosophy of probability.'" We actually go on and on in our book about how Bayesian data analysis does not rely on subjective probability, but . . . "nothing in common," huh? That's pretty strong.

Rather than attempt to address the book's arguments in general, I will simply do two things. First, I will do a "Washington read" (as Yair calls it) and see what Burdzy says about my own writings. Second, I will address the question of whether Burdzy's arguments will have any effect on statistical practice. If the answer to the latter question is no, we can safely leave the book under review to the mathematicians and philosophers, secure in the belief that it will do little mischief.

This is pretty funny. And, to think that I used to work there. This guy definitely needs a P.R. consultant. I've seen dozens of these NYT mini-interviews, and I don't think I've ever seen someone come off so badly. The high point for me was his answering a question about pay cuts by saying that he's from Philadelphia. I don't know how much of this is sheer incompetence and how much is coming from the interviewer (Deborah Solomon) trying to string him up. Usually she seems pretty gentle to her interview subjects. My guess is what happened is her easygoing questions lulled Yudof into a false sense of security, he got too relaxed, and he started saying stupid things. Solomon must have been amazed by what was coming out of his mouth.

P.S. The bit about the salary was entertaining too. I wonder if he has some sort of deal like sports coaches do, so that even if they fire him, they have to pay out X years on his contract.

Ban Chuan Cheah writes:

I'm trying to learn propensity score matching and used your text as a guide (pg 208-209). After creating the propensity scores, the data is matched and after achieving covariate balance the treatment effect is estimated by running a regression on the treatment variable and some other covariates. The standard error of the treatment effect is also reported - in the book it is 10.2 (1.6).

We all know, following the research of Rosenstone, Hibbs, Erikson, and others, that economic conditions can predict vote swings at state and national levels.

But, what about the reverse? Could we deduce historical economic conditions from election returns? Instead of forecasting elections from the economy, we could hindcast the economy from elections.

Would this make sense as a way of studying local and regional economic conditions in the U.S. in the 1800s, for example? I could imagine that election data are a lot easier to come by than economic data.

P.S. Don't forget that there have been big changes over time in our impressions of the ability of presidents to intervene successfully in the economy.

Patterson update

I went to the library and took a look at a book by James Patterson. It was pretty much the literary equivalent of a TV cop show. I couldn't really see myself reading it all the way through, but it was better-written than I'd expected. It's hard for me to see why Patterson wants to keep doing it (even if his coauthors are doing most of the work at this point). But I suppose that, once you're on the bestseller list, it's a bit addictive and you want to stay up there.

Today I faced some tedious work on a project that must be finished by the end of the week, so my procrastination methods reached new heights of creativity. For the first time, I clicked on the "Most Popular" tab at the top of the NY Times website. This gives me another opportunity for procrastination, by typing this blog post, because I noticed something surprising: There's not much overlap between the 10 "most e-mailed" and the 10 "most blogged" recent stories. Only 3 stories are on both "top 10" lists...which is to say, 7 of the most e-mailed stories are not among those that drew the attention of the most bloggers, and 7 of the most-blogged stories didn't make the cut for most emailers. I don't know if this is typical -- maybe this is an unusual week -- but I find it surprising. If a story seems like the kind of thing that would interest your friends, wouldn't it also be a good one to blog about? Does the difference simply reflect demographics? Perhaps bloggers are younger, and are interested in different stories than non-bloggers?

It's not 1933, it's 1930

A major storyline of the 2008 election was that it was the Great Depression all over again: George W. Bush was the hapless Herbert Hoover and Barack Obama was the FDR figure, coming in on a wave of popular resentment to clean things up. The stock market crash made the parallels pretty direct. One could continue the analogy, with Bill Clinton playing the Calvin Coolidge role, mindlessly stoking the paper economy and complicit in the rise of the stock market as a national sport. Public fascination with various richies seemed very 1920s-ish, and we had lots of candidates for the "Andrew Mellon" of the 2000s. Obama's decisive victory echoed Roosevelt's in 1932.

But history doesn't really repeat itself--or if it does, it's not always quite the repetition that was expected. With his latest plan of a spending freeze (on the 17% of the federal budget that is not committed to the military, veterans, homeland security and international affairs, Social Security, or Medicare), Obama is being labeled by many liberals as the second coming of Herbert Hoover--another well-meaning technocrat who can't put together a political coalition to do anything to stop the slide. Conservatives, too, may have switched from thinking of Obama as a scary realigning Roosevelt to viewing him as a Hoover from their own perspective--as a well-meaning fellow who took a stock market crash and made it worse through a series of ill-timed government interventions.

I can see the future debates already: was Obama a Hoover who dithered while the economy burned, too little and too late (the Krugman version) or a Hoover who hindered the ability of the economy to recover on its own by pushing every button he could find on the national console (the Chicago-school version)?

In either storyline, it's 1930, not 1932: rather than being three years into a depression, we're still just getting started and we're still in the Hoover-era position of seeing things fall apart but not quite being ready to take the next step.

Anyway, I'm not claiming to offer any serious political or economic analysis here, just pointing out that the 1932 election was a full three years after the 1929 stock market crash, so Obama's stepping into the story at a different point than when Roosevelt stepped in to his.

Or maybe we're still on track for Obama to "do a Reagan," ride out the recession in the off-year election and sit tight as the economy returns in years 3 and 4.

Tufte recommendation

A former student writes:

I'm going to get a Tufte book. Do you recommend "The Visual Display of Quantitative Information" or "Envisioning Information?"

My reply: My favorite is his second book, Envisioning Information. His first book was his breakthrough but the second book is the one that I learned the most from, myself.

P.S. I don't know if this counts as a 3-star thread.

What can search predict?

You've all heard about how you can predict all sorts of things, from movie grosses to flu trends, using search results. I earlier blogged about the research of Yahoo's Sharad Goel, Jake Hofman, Sebastien Lahaie, David Pennock, and Duncan Watts in this area. Since then, they've written a research article.

Here's a picture:

sharadsearch.png

And here's their story:

We [Goel et al.] investigate the degree to which search behavior predicts the commercial success of cultural products, namely movies, video games, and songs. In contrast with previous work that has focused on realtime reporting of current trends, we emphasize that here our objective is to predict future activity, typically days to weeks in advance. Specifically, we use query volume to forecast opening weekend box-office revenue for feature films, first month sales of video games, and the rank of songs on the Billboard Hot 100 chart. In all cases that we consider, we find that search counts are indicative of future outcomes, but when compared with baseline models trained on publicly available data, the performance boost associated with search counts is generally modest--a pattern that, as we show, also applies to previous work on tracking flu trends.

The punchline:

We [Goel et al.] conclude that in the absence of other data sources, or where small improvements in predictive performance are material, search queries may provide a useful guide to the near future.

I like how they put this. My first reaction upon seeing the paper (having flipped through the graphs and not read the abstract in detail) was that it was somewhat of a debunking exercise: Search volume has been hyped as the greatest thing since sliced bread, but really it's no big whoop, it adds almost no information beyond a simple forecast. But then my thought was that, no, this is a big whoop, because, in an automatic computing environment, it could be a lot easier to gather/analyze search volume than to build those baseline models.

Sharad's paper is cool. My only suggestion is that, in addition to fitting the separate models and comparing, they do the comparison on a case-by-case basis. That is, what percentage of the individual cases are predicted better by model 1, model 2, or model 3, and what is the distribution of the difference in performance. I think they're losing something by only doing the comparisons in aggregate.
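
Here is a rough sketch of what I mean by a case-by-case comparison. It is only an illustration, not anything from the paper: the function name `compare_case_by_case`, the model labels, and the error numbers are all made up, and it assumes each model's performance on each case has been summarized as an absolute prediction error.

```python
from collections import Counter


def compare_case_by_case(errors):
    """errors maps a (hypothetical) model name to a list of absolute
    prediction errors, one per case, in the same case order."""
    names = list(errors)
    n_cases = len(errors[names[0]])
    if any(len(errors[m]) != n_cases for m in names):
        raise ValueError("every model needs one error per case")

    # For each case, find the model with the smallest error
    # (ties go to the model listed first).
    wins = Counter()
    for i in range(n_cases):
        best = min(names, key=lambda m: errors[m][i])
        wins[best] += 1
    share_best = {m: wins[m] / n_cases for m in names}

    # Per-case difference in error for each pair of models.
    diffs = {}
    for a_idx, a in enumerate(names):
        for b in names[a_idx + 1:]:
            diffs[(a, b)] = [ea - eb for ea, eb in zip(errors[a], errors[b])]
    return share_best, diffs


# Invented numbers, purely for illustration.
example_errors = {
    "baseline": [1.0, 2.0, 0.5, 3.0],
    "search": [1.5, 1.0, 0.7, 2.0],
    "combined": [0.9, 1.2, 0.6, 2.5],
}
share_best, diffs = compare_case_by_case(example_errors)
```

The first result answers "what percentage of cases does each model predict best"; the second holds the per-case differences, which could then be plotted as a histogram for each pair of models.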

It also might be good if they could set up some sort of dynamic tracker that could perform the analysis in this paper automatically, for thousands of outcomes. Then in a year or so they'd have tons and tons of data. That would take this from an interesting project to something really cool.

Alex Lundry sent along this presentation. As some of you know, I hate videos, so I didn't actually look at this, but it seems to combine two of my main interests, so I thought it might interest some of you too. If you like it (or you don't), feel free to say so in the comments.

The man with the golden gut

Seth links to this fascinating article by Jonathan Mahler about the popular novelist James Patterson:

Last year, an estimated 14 million copies of his books in 38 different languages found their way onto beach blankets, airplanes and nightstands around the world. Patterson may lack the name recognition of a Stephen King, a John Grisham or a Dan Brown, but he outsells them all. Really, it's not even close. (According to Nielsen BookScan, Grisham's, King's and Brown's combined U.S. sales in recent years still don't match Patterson's.) This is partly because Patterson is so prolific: with the help of his stable of co-authors, he published nine original hardcover books in 2009 and will publish at least nine more in 2010.

Patterson has written in just about every genre -- science fiction, fantasy, romance, "women's weepies," graphic novels, Christmas-themed books. He dabbles in nonfiction as well. In 2008, he published "Against Medical Advice," a book written from the perspective of the son of a friend who suffers from Tourette's syndrome.

More than Grisham, King, and Brown combined: that really is pretty impressive. The sixty-something Patterson has written 35 New York Times #1 best sellers but doesn't seem to have too much of a swelled head:

A new kind of spam

As a way of avoiding work, I check the comments on this blog and decide which to approve and which to send to the spam folder. (Lots of stuff gets sent directly to spam; these are almost 100% classified correctly and I basically never need to check there.)

There are different kinds of spam, but I can typically spot it because it's close to content-free and comes with a link to a site that is selling something. I don't mind if you're a statistical consultant and you link to your consulting site, but, no, if you submit a comment with a link to some discount DVD site or whatever, yes, you're going straight to the spam filter.

Today, though, I got a new kind of spam: it looked just like the usual stuff but there was no URL, either in the message or in the regular URL field. I can't figure out why somebody would bother to do this.

Following up on our recent discussion (see also here) about estimates of war deaths, Megan Price pointed me to this report, where she, Anita Gohdes, and Patrick Ball write:

Several media organizations including Reuters, Foreign Policy and New Scientist covered the January 21 release of the 2009 Human Security Report (HSR) entitled, "The Shrinking Cost of War." The main thesis of the HSR authors, Andrew Mack et al, is that "nationwide mortality rates actually fall during most wars" and that "today's wars rarely kill enough people to reverse the decline in peacetime mortality that has been underway in the developing world for more than 30 years." . . . We are deeply skeptical of the methods and data that the authors use to conclude that conflict-related deaths are decreasing. We are equally concerned about the implications of the authors' conclusions and recommendations with respect to the current academic discussion on how to count deaths in conflict situations. . . .

The central evidence that the authors provide for "The Shrinking Cost of War" is delivered as a series of graphs. There are two problems with the authors' reasoning.

From blogging legend Phil Nugent:

capt.macs10404302112.fells_acres_macs104.jpg

If Scott Brown wins, I [Nugent] suspect that it will have less to do with a massive swing to the right in the bosom of liberalism than with a tendency there to vote against the repulsive and inept candidate in favor of the one who seems Kennedyesque, no matter whether he belongs to the Kennedys' party or not. On the other hand, if the election comes down to a squeaker that finds Coakley victorious, it'll probably be because the last minute media explosion, complete with the sight of all those gleeful Republicans turning cartwheels in the end zone, alerted voters to the strategic importance of holding their noses and voting for the monster over the centerfold.

Andrew Sullivan links to this amusing study [link fixed]. The whole blog is lots of fun--I've linked to it before--and it illustrates an important point in statistics, which I've given as the title of this blog entry.

P.S. I'm not trying to say that statistical methodology is a waste of time. Good methods--and I include good graphical methods in this category--allow us to make use of more data. If all you can do is pie charts and chi-squared tests (for example), you won't be able to do much.

Alan Turing is said to have invented a game that combines chess and middle-distance running. It goes like this: You make your move, then you run around the house, and the other player has to make his or her move before you return to your seat. I've never played the game but it sounds like fun. I've always thought, though, that the chess part has got to be much more important than the running part: the difference in time between a sprint and a slow jog is small enough that I'd think it would always make sense just to do the jog and save one's energy for the chess game.

But when I was speaking last week at the University of London, Turing's chess/running game came up somehow in conversation, and somebody made a point which I'd never thought of before, that I think completely destroys the game. I'd always assumed that it makes sense to run as fast as possible, but what if you want the time to think about a move? Then you can just run halfway around the house and sit for as long as you want.

It goes like this. You're in a tough spot and want some time to think. So you make a move where the opponent's move is pretty much obvious, then you go outside and sit on the stoop for an hour or two to ponder. Your opponent makes the obvious move and then has to sit and wait for you to come back in. Sure, he or she can plan ahead, but with less effectiveness than you because of not knowing what you're going to do when you come back in.

So . . . I don't know if anyone has actually played Turing's running chess game, but I think it would need another rule or two to really work.

This looks interesting; too bad I'm not around to hear it:

Book titles

My collaborators and I have had some successes and some failures; here are some stories, with the benefit of (varying degrees of) hindsight.

"Bayesian Data Analysis." We thought a lot about this one. It was my idea to use the phrase "data analysis": the idea was that "inference" is too narrow (being only one of the three data analysis steps of model-building, inference, and model checking) and "statistics" is too broad (seeing as it also includes design and decision making as well as data analysis). I hadn't thought of the way that BDA sounds like EDA but that came out well, even though the first edition of BDA was pretty weak on the EDA stuff--we fit more of that into the second edition (in chapter 6 and even in the cover). Beyond this, I was never satisfied with "Bayes" in the title--it seemed, and still seems, too jargony and not descriptive enough for me. I'd prefer something like "Data Analysis Using Probability Models" or even "Data Analysis Using Generative Models" (to use a current buzzword that, yes, may be jargon but is also descriptive). But we eventually decided (correctly, I think) that we had to go with Bayes because it's such a powerful brand name. Every once in a while I see the phrase "Bayesian data analysis" used generically, not in reference to our book, and when this happens it always makes me happy; I think the statistical world is richer to have this phrase rather than the formerly-standard "Bayesian inference" (which, as noted above, misses some big issues).

"Teaching Statistics: A Bag of Tricks." Should've been called "Learning Statistics: A Bag of Tricks." Only a few people want to teach statistics; lots of people want to learn it. And, ultimately, a book of teaching methods is really a book of learning methods. Also, many people have told me that they've bought the book and read it. I actually think it's had more effect from people reading it than from people using it in their classes. Sort of like one of those golf books that people put by their bedside and read even if they don't get around to practicing and following all the instructions.

"Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives." The title seems fine, but something went wrong in the promotion of this book. Xiao-Li and I collected some excellent articles and put a huge amount of effort into editing them. I think the book is great but it hasn't sold a lot. Perhaps we should've structured it slightly differently so it could've been used as a course book? And of course we shouldn't have published with Wiley, who are notorious for pricing their books too high. (I notice they now charge $132 (!) for Feller's famous book on probability theory.) Why did we go with Wiley? At the time, Xiao-Li and I thought it would be difficult to find a publisher so we didn't really try shopping it around. In retrospect, we didn't fully realize how great our book was; we were satisfied just to get it out there without thinking clearly about what would happen next.

"Data Analysis Using Regression and Multilevel/Hierarchical Models." The awkward "Multilevel/Hierarchical" thing is Phil's fault: I wanted to go with "multilevel" (because I felt, and still feel, that "hierarchical" can be seen as implying nested models, and it was very important for me in this book to go beyond the simple identification of multilevel models with simple hierarchical designs and data structures), but Phil pointed out that "hierarchical" is a much more standard word than "multilevel" (for example, "hierarchical model" gets four times as many Google hits as "multilevel model"). So I did the awkward thing and kept both words. (And Jennifer was fine with this too.) Also we needed to put Regression in there because a multilevel model is really just regression with a discrete predictor. And Data Analysis for the reasons described above. The book has sold well so the title doesn't seem to have hurt it any.

"Red State, Blue State, Rich State, Poor State: Why Americans Vote the Way They Do." I think this was a mistake. First, as some people have pointed out and as we realized even at the time, we don't actually say why Americans vote the way they do. I really wish we had chosen our other candidate subtitle, "How Americans are Polarized and How They're Not." Beyond this, I'm actually down on the whole Red State, Blue State thing. Sure, it's grabby, but I fear it makes the book seem less serious. Given that we didn't become the next Freakonomics and we didn't sell a zillion copies, if I could go back in time I'd give it a more serious title, such as, hmmm..., "Geographic and Demographic Polarization in American Politics"--no, that's too serious-sounding. Maybe "Democrats and Republicans: Who They Are, Where They Live, and Where They Stand on the Issues." Or "American Voters, Red and Blue: Who They Are, Where They Live, and Where They Stand on the Issues." Something that is a bit grabby but conveys more of our research content. (Many people were misled by our title into thinking the book was merely a retread of our Red State, Blue State article, but really it was full of original research that, to this date, has still only appeared in the book.)

"A Quantitative Tour of the Social Sciences." I can't imagine a better title for this one. And I love the book, too. In addition to having wonderful content, it has a great cover that was contributed by a blog commenter (who I still have to send a free book to; sorry!). We've gotta do a better job of promoting it, but I'm not quite sure how. Here's a nice review.

I have a few more books in (various stages of) the pipeline, but I'll hold off telling you their titles until they're closer to done.

I remember many years ago being told that political ideologies fall not along a line but on a circle: if you go far enough to the extremes, left-wing communists and right-wing fascists end up looking pretty similar.

I was reminded of this idea when reading Christian Robert and George Casella's fun new book, "Introducing Monte Carlo Methods with R."

I do most of my work in statistical methodology and applied statistics, but sometimes I back up my methodology with theory or I have to develop computational tools for my applications. I tend to think of this sort of ordering:

Probability theory - Theoretical statistics - Statistical methodology - Applications - Computation

Seeing this book, in which two mathematical theorists write all about computation, makes me want to loop this line in a circle. I knew this already--my own single true published theorem is about computation, after all--but I tend to forget. In some way, I think that computation--more generally, numerical analysis--has taken some of the place in academic statistics that was formerly occupied by theorem-proving. I think it's great that many of our more mathematical-minded probabilists and statisticians can follow their theoretical physicist colleagues and work on computational methods. I suspect that applied researchers such as myself will get much more use out of theory as applied to computation, as compared to traditionally more prestigious work on asymptotic inference, uniform convergence, mapping the rejection regions of hypothesis tests, M-estimation, three-armed bandits, and the like.

Don't get me wrong--I'm not saying that computation is the only useful domain for statistical theory, or anything close to that. There are lots of new models to be built and lots of limits to be understood. Just, for example, consider the challenges of using sample data to estimate properties of a network. Lots of good stuff to do all around.

Anyway, back to the book by Robert and Casella. It's a fun book, partly because they resist the impulse to explain everything or to try to be comprehensive. As a result, reading the book requires the continual solution of little puzzles (as befits a book that introduces its chapters with quotations from detective novels). I'm not sure if this was intended, but it makes it a much more participatory experience, and I think for that reason it would also be an excellent book for a course on statistical computing.

Charles Warne writes:

A colleague of mine is running logistic regression models and wants to know if there's any sort of a test that can be used to assess whether a coefficient of a key predictor in one model is significantly different to that same predictor's coefficient in another model that adjusts for two other variables (which are significantly related to the outcome). Essentially she's wanting to statistically test for confounding, and while my initial advice was that a single statistical test isn't really appropriate since confounding is something that we make an educated judgement about given a range of factors, she is still keen to see if this can be done. I read your 2006 article with Hal Stern "The difference between 'significant' and 'not significant' is not itself statistically significant" which included the example (p. 328) where evidence for a difference between the results of two independent studies was assessed by summing the squares of the standard errors of each and taking the square root to give the standard error of the difference (se=14). My question is whether this approach can be applied to my colleague's situation, given that both logistic regression models are based on the same sample of individuals and therefore are not independent? Is there an adjustment that can be used to produce more accurate standard errors for non-independent samples or should I not be applying this approach at all? Is there a better way this problem could be tackled?

My reply: No, you wouldn't want to take the two estimates and treat them as if they were independent. My real question, though, is why your colleague wants to do this in the first place. It's not at all clear what question such an analysis would be answering.
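
To make the arithmetic concrete, here is a small sketch of the standard error of a difference between two estimates. The function name `se_of_difference` and the individual standard errors of 10 are assumptions chosen for illustration (they reproduce the se of about 14 mentioned in the question). The covariance term is the standard textbook adjustment for dependent estimates; the correlation of 0.8 is made up, and how one would actually obtain that covariance for two models fit to the same sample is left open here.

```python
import math


def se_of_difference(se1, se2, cov=0.0):
    """Standard error of (estimate1 - estimate2).

    With cov=0 this is the independent-studies formula,
    sqrt(se1^2 + se2^2). A nonzero cov applies the general identity
    Var(a - b) = Var(a) + Var(b) - 2*Cov(a, b).
    """
    variance = se1 ** 2 + se2 ** 2 - 2.0 * cov
    if variance < 0:
        raise ValueError("covariance is inconsistent with the standard errors")
    return math.sqrt(variance)


# Two independent estimates, each with an assumed standard error of 10.
independent = se_of_difference(10.0, 10.0)

# Same standard errors, but with an assumed (made-up) correlation of 0.8,
# so cov = 0.8 * 10 * 10 = 80.
dependent = se_of_difference(10.0, 10.0, cov=80.0)
```

When the two estimates are positively correlated, as they typically would be when both come from the same sample, treating them as independent overstates the standard error of the difference.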

P.S. Warne adds:

My final exam

I'm not particularly proud of this one, but I thought it might interest some of you in any case. It's the final exam for the course I taught this fall to the economics students at Sciences Po. Students were given two hours.

Overexposure

Thinking about Erma Bombeck, I'm reminded of the whole "overexposure" phenomenon. Some people get overexposed but it's still ok. The classic example is Michael Jackson: no matter what, people still think Billie Jean and the rest are cool. And somehow Dave Barry managed to hit the stratosphere without getting that "overexposed" vibe. But Bombeck had more of the classic pattern: at first, she was this exciting new thing--I remember when we got The Grass Is Always Greener Over The Septic Tank out of the library--then, somewhere along the way, she became tacky. I guess it would make sense to go reread The Grass is Always Greener and see if it's still funny. I think I'd still think Art Buchwald's old columns are funny, but who knows.

And then there's Erle Stanley Gardner. I have no sense whether he was "overexposed" or just had his deserved period of popularity which naturally ended.

Boris writes, regarding the recent U.S. Senate election (in which moderate Republican Scott Brown narrowly beat liberal Democrat Martha Coakley in usually reliably-Democratic Massachusetts):

I [Boris] disagree with Josh Tucker that the election isn't that consequential. First, the pivotal Senator will now be a Republican, not a Democrat. The parties put a lot of pressure on moderate members of Congress to vote one way or the other; it's often unsuccessful, but it's a pretty powerful source of influence. Second, that pivotal Senator will be Brown, not Snowe (if my prediction proves accurate). Finally, this pivotality will exist on every issue, not just health care reform, which probably just expired in its current form. Not too shabby as a consequential election, right?

Based upon his voting record in the Massachusetts State Senate as well as the Votesmart surveys of MA state legislators (including his own from 2002), I [Boris] estimate that Brown is to the left of the leftmost Republican in the Senate, Olympia Snowe of Maine, and to the right of the rightmost Democrat in the Senate, Ben Nelson of Nebraska. Just as important, Brown stands to become the pivotal member of the Senate--that is, the 60th least liberal (equivalently, the 40th most conservative)--a distinction previously held by Nelson.

More here.

I posted a note on the other blog about the difference between internal and external coherence of political ideology. The basic idea is that a particular person or small group can have an ideology (supporting positions A, B, C, and D, for example) that is perfectly internally coherent--that is, all these positions make sense given the underlying ideology--while being incoherent with other ideologies (for example, those people who support positions A, B, not-C, and not-D). What's striking to me is how strongly people can feel that their beliefs on a particular issue flow from their being a liberal, or a conservative, or whatever, even though others with similar opinions will completely disagree with them on that issue.

Stephen Dubner reports on an observational study of bike helmet laws, a study by Christopher Carpenter and Mark Stehr that compares bicycling and accident rates among children across states that did and did not have helmet laws. In reading the data analysis, I'm reminded of the many discussions Bob Erikson and I have had about the importance, when fitting time-series cross-sectional models, of figuring out where your identification is coming from (this is an issue that's come up several times on this blog)--but I have no particular reason to doubt the estimates, which seem plausible enough. The analysis is clear enough, so I guess it would be easy enough to get the data, fit a hierarchical model, and, most importantly, make some graphs of what's happening before and after the laws, to see what's going on in the data.

Beyond this, I had one more comment, which is that I'm surprised that Dubner found it surprising that helmet laws seem to lead to a decrease in actual bike riding. My impression is that when helmet laws are proposed, this always comes up: the concern that if people are required to wear helmets, they'll just bike less. Hats off to Carpenter and Stehr for estimating this effect in this clever way, but it's certainly an idea that's been discussed before. In this context, I think it would be useful to think in terms of sociology-style models of default behaviors as well as economics-style models of incentives.

I read this report by Matthew Yglesias that Blue Cross/Blue Shield is "covertly backing far-right efforts to get health reform declared unconstitutional." I don't want to get into a discussion about whether these efforts are really "far-right"--I know next to nothing about the politics of the health reform battle.

What I really wanted to convey here was my first reaction upon seeing this, which was: Blue Cross/Blue Shield?? I remember this organization from the 70s, when it was my vague impression that Blue Cross was synonymous with "health insurance." I've always thought of it as a quasi-public organization, a sort of default health plan. I mean, sure, they're a private organization, so I assume that, just like the gas company and the electric company and the phone company, they're probably top-heavy with overpaid executives who don't do anything while earning ten times what they'd get on the federal scale. Whatever. That's the system we have here: people who work for quasi-public companies get a soft deal.

I was surprised, though, to hear about Blue Cross doing such strong lobbying. Sort of similar to the reaction I had seeing the percentage of political contributions from employees at Harvard etc. that went to the Democrats. I mean, sure, employees of Harvard have the right to give to whoever they want, but, still, there's something funny about a quasi-public institution such as Harvard (or Blue Cross) leaning so strongly on one side of the debate.

I don't really know if I should think of any of this as a problem; it just seems strange to think of Blue Cross as sponsoring a covert political agenda. It almost sounds like something from one of those '60s parody spy movies, where the bad guys aren't the Russians or ex-Nazis or whatever, but . . . Blue Cross!

I like paperback books that fit in my pocket. Unfortunately, about 25 years ago they pretty much stopped printing books in that size. Usually the closest you can get are those big floppy "trade paperbacks" or, in the case of the occasional Stephen King-type bestseller, a thick-as-a-brick paperback with big printing and fat pages.

It's not my place to question book marketers. My best theory is that book prices went up, for whatever reason, and then people wanted to feel like they're getting their money's worth: instead of a little pocket book for $2.95, you get the trade paperback for $16.95. Personally, I'd prefer the little book--whether or not I'm paying $16.95--but probably others feel differently. It's sort of like the way they'll sell you 50 aspirins in a bottle that would hold 200, and so forth.

Anyway, I pretty much have to get my pocket books used. I was in a used bookstore the other day and bought Killing Time (1961) by Donald E. Westlake, an author whom I've referred to before as the master of the no-redeeming-social-value thriller. This book was pretty good, and, on top of that, it actually had some redeeming social value.

I'll get back to this point in a moment, but first I wanted to say that one of the funnest things about reading a book from fifty years ago is to get a sense of how things used to be. Killing Time takes place in a small East Coast town which is dominated by a few local bigwigs. I imagine there used to be a lot of places like this in the old days but not so much any more, now that not so many people work in factories, and local ties are weaker. It reminded me of when I watched a bunch of Speed Racer cartoons with Phil in a movie theater in the early 90s. These were low-budget Japanese cartoons from the 60s that we loved as kids. From my adult perspective, the best parts were during the characters' long drives, where you could see Japanese industrial scenes in the background.

OK, now back to the "redeeming social value" thing. In Killing Time, Westlake takes the traditional Philip Marlowe private eye scenario and turns it inside out. The main character of the book (named Smith--make of that what you will) follows the standard pattern: he's outwardly cynical, just wanting to live his life and get by, but underlying this he has a philosophy of government that you might call "realistic idealism" or "idealistic realism." In the book, some reformers from the state capital come to town with the goal of exposing corruption, but private eye Smith doesn't want to go along with this: in his view, the reformers are naive, society has a balance, and it's best to keep things on an even keel. There's a crucial scene about two-thirds of the way through the book, though, where I suddenly realized (through the words of another character) how Smith's apparent cynicism is an extreme form of idealism. And then when I got to the end of the book, I had a sense of the explosive internal contradictions inherent in the standard "private eye" view of the world.

What I can't figure out is how anybody could write a private eye story with a straight face after reading the Westlake book. To me, it really closes the door on the genre. It's the Watchmen of private eye novels.

P.S. An interesting thing about Westlake is that he has not, I believe, ever had a breakout bestseller. I don't know what it takes to get such success, but I don't think it ever happened to him. He had many books made into movies, though, so I'm sure he did just fine financially.

P.P.S. Don't get me wrong, it's not like I'm saying Westlake is some sort of unrecognized literary master. He has great plots and settings and charming characters, but nothing I've ever read of his has the emotional punch of, say, Scott Smith's A Simple Plan (to choose a book whose plot would fit well into the Westlake canon).

It's the Gatsby seminar in the Computational Neuroscience Unit at University College London, Mon 18 Jan at 4pm:

Creating structured and flexible models: some open problems

A challenge in statistics is to construct models that are structured enough to be able to learn from data but not be so strong as to overwhelm the data. We introduce the concept of "weakly informative priors" which contain important information but less than may be available for the given problem at hand. We also discuss some related problems in developing general models for taxonomies and deep interactions. We consider how these ideas apply to problems in social science and public health. If you don't walk out of this talk a Bayesian, I'll eat my hat.

P.S. Link updated.

Nate does Bayes

The classical statisticians among you can call it a measurement-error model. Whatever.

Bayesian statistics then and now

The following is a discussion of articles by Brad Efron and Rob Kass, to appear in the journal Statistical Science. I don't really have permission to upload their articles, but I think (hope?) this discussion will be of general interest and will motivate some of you to read the others' articles when they come out. (And thanks to Jimmy and others for pointing out typos in my original version!)

It is always a pleasure to hear Brad Efron's thoughts on the next century of statistics, especially considering the huge influence he's had on the field's present state and future directions, both in model-based and nonparametric inference.

Three meta-principles of statistics

Before going on, I'd like to state three meta-principles of statistics which I think are relevant to the current discussion.

First, the information principle, which is that the key to a good statistical method is not its underlying philosophy or mathematical reasoning, but rather what information the method allows us to use. Good methods make use of more information. This can come in different ways: in my own experience (following the lead of Efron and Morris, 1973, among others), hierarchical Bayes allows us to combine different data sources and weight them appropriately using partial pooling. Other statisticians find parametric Bayes too restrictive: in practice, parametric modeling typically comes down to conventional models such as the normal and gamma distributions, and the resulting inference does not take advantage of distributional information beyond the first two moments of the data. Such problems motivate more elaborate models, which raise new concerns about overfitting, and so on.
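To make "partial pooling" concrete, here is a minimal sketch in Python. It assumes the simplest normal-normal setup, with the group-level mean `mu` and standard deviation `tau` treated as known; in a real hierarchical analysis these would be estimated from the data. The function name and the example numbers are hypothetical, chosen only for illustration.

```python
def partial_pool(y, sigma, mu, tau):
    """Posterior mean of each group's effect in a normal-normal model.

    Assumed model (for illustration only):
        y[j]     ~ Normal(theta[j], sigma[j]^2)
        theta[j] ~ Normal(mu, tau^2)
    with mu and tau treated as known.
    """
    estimates = []
    for y_j, sigma_j in zip(y, sigma):
        w_data = 1.0 / sigma_j ** 2   # precision of the group's own data
        w_group = 1.0 / tau ** 2      # precision of the group-level distribution
        # Precision-weighted average of the raw estimate and the group mean.
        estimates.append((w_data * y_j + w_group * mu) / (w_data + w_group))
    return estimates


# Hypothetical numbers: one precisely measured group, one noisy group.
print(partial_pool(y=[10.0, -4.0], sigma=[2.0, 6.0], mu=0.0, tau=2.0))
```

The noisy group is pulled much closer to the group-level mean than the precise one, which is the sense in which the different data sources are "weighted appropriately."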

As in many areas of mathematics, theory and practice leapfrog each other: as Efron notes, empirical Bayes methods have made great practical advances but "have yet to form into a coherent theory." In the past few decades, however, with the work of Lindley and Smith (1972) and many others, empirical Bayes has been folded into hierarchical Bayes, which is part of a coherent theory that includes inference, model checking, and data collection (at least in my own view, as represented in chapters 6 and 7 of Gelman et al, 2003). Other times, theoretical and even computational advances lead to practical breakthroughs, as Efron illustrates in his discussion of the progress made in genetic analysis following the Benjamini and Hochberg paper on false discovery rates.

My second meta-principle of statistics is the methodological attribution problem, which is that the many useful contributions of a good statistical consultant, or collaborator, will often be attributed to the statistician's methods or philosophy rather than to the artful efforts of the statistician himself or herself. Don Rubin has told me that scientists are fundamentally Bayesian (even if they don't realize it), in that they interpret uncertainty intervals Bayesianly. Brad Efron has talked vividly about how his scientific collaborators find permutation tests and p-values to be the most convincing form of evidence. Judea Pearl assures me that graphical models describe how people really think about causality. And so on. I'm sure that all these accomplished researchers, and many more, are describing their experiences accurately. Rubin wielding a posterior distribution is a powerful thing, as is Efron with a permutation test or Pearl with a graphical model, and I believe that (a) all three can be helping people solve real scientific problems, and (b) it is natural for their collaborators to attribute some of these researchers' creativity to their methods.

The result is that each of us tends to come away from a collaboration or consulting experience with the warm feeling that our methods really work, and that they represent how scientists really think. In stating this, I'm not trying to espouse some sort of empty pluralism--the claim that, for example, we'd be doing just as well if we were all using fuzzy sets, or correspondence analysis, or some other obscure statistical method. There's certainly a reason that methodological advances are made, and this reason is typically that existing methods have their failings. Nonetheless, I think we all have to be careful about inferring too much from our collaborators' and clients' satisfaction with our methods.

My third meta-principle is that different applications demand different philosophies. This principle comes up for me in Efron's discussion of hypothesis testing and the so-called false discovery rate, which I label as "so-called" for the following reason. In Efron's formulation (which follows the classical multiple comparisons literature), a "false discovery" is a zero effect that is identified as nonzero, whereas, in my own work, I never study zero effects. The effects I study are sometimes small but it would be silly, for example, to suppose that the difference in voting patterns of men and women (after controlling for some other variables) could be exactly zero. My problems with the "false discovery" formulation are partly a matter of taste, I'm sure, but I believe they also arise from the difference between problems in genetics (in which some genes really have essentially zero effects on some traits, so that the classical hypothesis-testing model is plausible) and in social science and environmental health (where essentially everything is connected to everything else, and effect sizes follow a continuous distribution rather than a mix of large effects and near-exact zeroes).

To me, the false discovery rate is the latest flavor-of-the-month attempt to make the Bayesian omelette without breaking the eggs. As such, it can work fine if the implicit prior is ok, it can be a great method, but I really don't like it as an underlying principle, as it's all formally based on a hypothesis testing framework that, to me, is more trouble than it's worth. In thinking about multiple comparisons in my own research, I prefer to discuss errors of Type S and Type M rather than Type 1 and Type 2 (Gelman and Tuerlinckx, 2000, Gelman and Weakliem, 2009, Gelman, Hill, and Yajima, 2009). My point here, though, is simply that any given statistical concept will make more sense in some settings than others.
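For readers unfamiliar with the terms, a rough simulation can show what Type S (sign) and Type M (magnitude) errors look like. The sketch below is an illustration under stated assumptions, not the procedure of the cited papers: it assumes a normally distributed estimate with known standard error, a conventional 1.96 threshold for "statistical significance," and a nonzero true effect. The function name, defaults, and example numbers are hypothetical.

```python
import random


def type_s_and_m(true_effect, std_error, n_sims=100_000, z_crit=1.96, seed=0):
    """Simulate replications and summarize the 'significant' ones.

    Returns (type_s_rate, exaggeration), where
      type_s_rate  = share of significant estimates with the wrong sign,
      exaggeration = mean |estimate| among significant results,
                     divided by |true_effect|.
    Assumes true_effect != 0 and estimate ~ Normal(true_effect, std_error^2).
    """
    rng = random.Random(seed)
    significant = []
    for _ in range(n_sims):
        estimate = rng.gauss(true_effect, std_error)
        if abs(estimate) > z_crit * std_error:
            significant.append(estimate)
    if not significant:
        raise ValueError("no significant estimates; increase n_sims")
    wrong_sign = sum(1 for e in significant if e * true_effect < 0)
    type_s_rate = wrong_sign / len(significant)
    mean_abs = sum(abs(e) for e in significant) / len(significant)
    return type_s_rate, mean_abs / abs(true_effect)


# Hypothetical small effect measured noisily: effect 1, standard error 2.
s_rate, exaggeration = type_s_and_m(true_effect=1.0, std_error=2.0)
print(s_rate, exaggeration)
```

When the true effect is small relative to the noise, the estimates that clear the significance threshold are a selected set: some have the wrong sign, and on average they overstate the magnitude several times over.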

For another example of how different areas of application merit different sorts of statistical thinking, consider Rob Kass's remark: "I tell my students in neurobiology that in claiming statistical significance I get nervous unless the p-value is much smaller than .01." In political science, we're typically not aiming for that level of uncertainty. (Just to get a sense of the scale of things, there have been barely 100 national elections in all of U.S. history, and political scientists studying the modern era typically start in 1946.)

Progress in parametric Bayesian inference

I also think that Efron is doing parametric Bayesian inference a disservice by focusing on a fun little baseball example that he and Morris worked on 35 years ago. If he would look at what's being done now, he'd see all the good statistical practice that, in his section 10, he naively (I think) attributes to "frequentism." Figure 1 illustrates with a grid of maps of public opinion by state, estimated from national survey data. Fitting this model took a lot of effort which was made possible by working within a hierarchical regression framework--"a good set of work rules," to use Efron's expression. Similar models have been used recently to study opinion trends in other areas such as gay rights in which policy is made at the state level, and so we want to understand opinions by state as well (Lax and Phillips, 2009).

I also completely disagree with Efron's claim that frequentism (whatever that is) is "fundamentally conservative." One thing that "frequentism" absolutely encourages is for people to use horrible, noisy estimates out of a fear of "bias." More generally, as discussed by Gelman and Jakulin (2007), Bayesian inference is conservative in that it goes with what is already known, unless the new data force a change. In contrast, unbiased estimates and other unregularized classical procedures are noisy and get jerked around by whatever data happen to come by--not really a conservative thing at all. To make this argument more formal, consider the multiple comparisons problem. Classical unbiased comparisons are noisy and must be adjusted to avoid overinterpretation; in contrast, hierarchical Bayes estimates of comparisons are conservative (when two parameters are pulled toward a common mean, their difference is pulled toward zero) and less likely to appear to be statistically significant (Gelman and Tuerlinckx, 2000).
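The parenthetical about differences being pulled toward zero can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming two groups with the same known sampling standard deviation `sigma` and a known group-level standard deviation `tau` (both hypothetical simplifications; the function name is also made up for this example). Under those assumptions the common mean cancels out of the comparison.

```python
import math


def compare_two(y1, y2, sigma, tau):
    """Return (classical_z, pooled_z) for the comparison of two groups.

    Assumed model: y[j] ~ Normal(theta[j], sigma^2),
                   theta[j] ~ Normal(mu, tau^2), with sigma and tau known.
    """
    # Classical comparison: raw difference over its standard error.
    raw_diff = y1 - y2
    classical_z = raw_diff / (sigma * math.sqrt(2))

    # Each estimate is shrunk toward mu by the same factor, so mu cancels
    # in the difference and the difference is shrunk by that factor too.
    shrink = tau ** 2 / (tau ** 2 + sigma ** 2)
    pooled_diff = shrink * raw_diff
    # Posterior variance of each theta is shrink * sigma^2; the two are
    # independent given the hyperparameters, so the variances add.
    pooled_sd = sigma * math.sqrt(2 * shrink)
    pooled_z = pooled_diff / pooled_sd
    return classical_z, pooled_z


# Hypothetical example: raw estimates 3 and 0, with sigma = tau = 1.
print(compare_two(3.0, 0.0, sigma=1.0, tau=1.0))
```

In this setup the pooled z-score equals the classical one multiplied by the square root of the shrinkage factor, so it is always smaller in absolute value. A comparison that clears the usual threshold classically can fall short of it after partial pooling.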

Another way to understand this is to consider the "machine learning" problem of estimating the probability of an event on which we have very little direct data. The most conservative stance is to assign a probability of ½; the next-conservative approach might be to use some highly smoothed estimate based on averaging a large amount of data; and the unbiased estimate based on the local data is hardly conservative at all! Figure 1 illustrates our conservative estimate of public opinion on school vouchers. We prefer this to a noisy, implausible map of unbiased estimators.
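The ordering of estimates described here can be laid out side by side. This is a sketch only: the function name, the `prior_weight` parameter, and the counts are hypothetical, and the "shrunk" estimate is just one simple way to compromise between the pooled rate and the local data, not a method taken from the text.

```python
def probability_estimates(local_successes, local_trials,
                          pooled_successes, pooled_trials,
                          prior_weight=10.0):
    """Return several estimates of an event probability, most to least conservative.

    prior_weight is a hypothetical tuning constant: the number of
    pseudo-observations at the pooled rate that are added to the local data.
    """
    if local_trials <= 0 or pooled_trials <= 0:
        raise ValueError("trial counts must be positive")
    raw = local_successes / local_trials          # unbiased, but noisy
    pooled = pooled_successes / pooled_trials     # heavily smoothed
    shrunk = (local_successes + prior_weight * pooled) / (local_trials + prior_weight)
    return {"one_half": 0.5, "pooled": pooled, "shrunk": shrunk, "raw": raw}


# Hypothetical data: 0 events in 3 local trials; 400 in 1000 overall.
print(probability_estimates(0, 3, 400, 1000))
```

With only three local observations the raw estimate is an implausible zero, while the smoothed versions stay near what the larger dataset suggests.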

Of course, frequentism is a big tent and can be interpreted to include all sorts of estimates, up to and including whatever Bayesian thing I happen to be doing this week--to make any estimate "frequentist," one just needs to do whatever combination of theory and simulation is necessary to get a sense of my method's performance under repeated sampling. So maybe Efron and I are in agreement in practice, that any method is worth considering if it works, but it might take some work to see if something really does indeed work.

Comments on Kass's comments

Before writing this discussion, I also had the opportunity to read Rob Kass's comments on Efron's article.

I pretty much agree with Kass's points, except for his claim that most of Bayes is essentially maximum likelihood estimation. Multilevel modeling is only approximately maximum likelihood if you follow Efron and Morris's empirical Bayesian formulation in which you average over intermediate parameters and maximize over hyperparameters, as I gather Kass has in mind. But then this makes "maximum likelihood" a matter of judgment: what exactly is a hyperparameter? Things get tricky with mixture models and the like. I guess what I'm saying is that maximum likelihood, like many classical methods, works pretty well in practice only because practitioners interpret the methods flexibly and don't do the really stupid versions (such as joint maximization of parameters and hyperparameters) that are allowed by the theory.
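One commonly cited reason that joint maximization misbehaves, which the paragraph above does not spell out, is that the joint density of parameters and hyperparameters can be unbounded. The sketch below illustrates this under stated assumptions: a normal hierarchical model with known data standard deviation `sigma` and flat priors on the hyperparameters `mu` and `tau`. The function name and the data values are hypothetical.

```python
import math


def log_joint(y, theta, mu, tau, sigma=1.0):
    """Log joint density of data and group effects, up to flat hyperpriors.

    Assumed model: y[j] ~ Normal(theta[j], sigma^2),
                   theta[j] ~ Normal(mu, tau^2).
    """
    def log_normal(x, mean, sd):
        return (-math.log(sd) - 0.5 * math.log(2 * math.pi)
                - 0.5 * ((x - mean) / sd) ** 2)

    data_term = sum(log_normal(y_j, t_j, sigma) for y_j, t_j in zip(y, theta))
    group_term = sum(log_normal(t_j, mu, tau) for t_j in theta)
    return data_term + group_term


# Hypothetical data. Collapse every group effect onto the grand mean,
# then let the group-level standard deviation shrink toward zero.
y = [2.0, -1.0, 0.5, 1.5]
grand_mean = sum(y) / len(y)
collapsed = [grand_mean] * len(y)
for tau in (1.0, 1e-3, 1e-6):
    print(tau, log_joint(y, collapsed, grand_mean, tau))
```

The log density keeps growing as `tau` shrinks, so the joint "maximum" sits at complete pooling with zero group-level variance, whatever the data say. Averaging over the group effects before maximizing over the hyperparameters avoids this.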

Regarding the difficulties of combining evidence across species (in Kass's discussion of the DuMouchel and Harris paper), one point here is that this works best when the parameters have a real-world meaning. This is a point that became clear to me in my work in toxicology (Gelman, Bois, and Jiang, 1996): when you have a model whose parameters have numerical interpretations ("mean," "scale," "curvature," and so forth), it can be hard to get useful priors for them, but when the parameters have substantive interpretations ("blood flow," "equilibrium concentration," etc.), then this opens the door for real prior information. And, in a hierarchical context, "real prior information" doesn't have to mean a specific, pre-assigned prior; rather, it can refer to a model in which the parameters have a group-level distribution. The more real-worldy the parameters are, the more likely this group-level distribution can be modeled accurately. And the smaller the group-level error, the more partial pooling you'll get and the more effective your Bayesian inference is. To me, this is the real connection between scientific modeling and the mechanics of Bayesian smoothing, and Kass alludes to some of this in the final paragraph of his comment.

Hal Stern once said that the big divide in statistics is not between Bayesians and non-Bayesians but rather between modelers and non-modelers. And, indeed, in many of my Bayesian applications, the big benefit has come from the likelihood. But sometimes that is because we are careful in deciding what part of the model is "the likelihood." Nowadays, this is starting to have real practical consequences even in Bayesian inference, with methods such as DIC, Bayes factors, and posterior predictive checks, all of whose definitions depend crucially on how the model is partitioned into likelihood, prior, and hyperprior distributions.

On one hand, I'm impressed by modern machine-learning methods that process huge datasets and I agree with Kass's concluding remarks that emphasize how important it can be that the statistical methods be connected with minimal assumptions; on the other hand, I appreciate Kass's concluding point that statistical methods are most powerful when they are connected to the particular substantive question being studied. I agree that statistical theory is far from settled, and I agree with Kass that developments in Bayesian modeling are a promising way to move forward.

This story is pretty funny. "Distractions in the classroom," indeed. They take nursery school pretty seriously down there in Texas, huh?
