I like graphs — they're a wonderful way of conveying information. I love to make elegant, communicative data graphics.
So when I see a bad one, it ticks me off even more than most poorly-done things do.
Apologies in advance to the Center for Public Integrity, therefore, who put together a lot of data on the mortgage crisis in a somewhat clumsy and confusing manner. Most of what they've done is imperfect, but serviceable. (In particular, the heat maps they rely on are not especially communicative.)
But I got to one graph that was particularly poorly done, and frankly misleading.
That's this graph:
Let's start by looking at what this graph actually shows. There are two lines here. The top line (in red) is the median value of a mortgage originated in that year (in inflation-adjusted dollars). The bottom line (in yellow) is the median income of the people who took those loans. Fair enough as a basic, first-order approximation of what's going on: it shows that loan size grew out of proportion with income.
It's as soon as we get to the framing that we run into trouble. The graph says "In 1994, $73,000 was the annual median income for a loan of $120,000." But that's not at all what the graph is showing: it plots the median income against the median loan amount, each computed separately. This data doesn't (and can't) give any indication of the distribution of these mortgages. Loans of $120,000 might have been going to people with higher than median incomes, while more expensive loans went to people who were less risk-averse and had lower incomes. They stretch this a little further with their next sentence, but it's the same basic objection: they've not presented data about the distribution.
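To make that objection concrete, here's a toy sketch. The numbers are invented for illustration (they are not the Center's data), and the function names are my own. Two sets of borrowers have exactly the same median income and median loan, yet the income of the person holding the $120K loan differs between them.

```python
from statistics import median

# Hypothetical (income, loan) pairs in thousands of dollars, invented for illustration.
borrowers_a = [(50, 80), (73, 120), (100, 200)]   # loan size rises with income
borrowers_b = [(50, 200), (73, 80), (100, 120)]   # same incomes and loans, paired differently

def marginal_medians(borrowers):
    """Median income and median loan, each computed on its own."""
    incomes = [income for income, _ in borrowers]
    loans = [loan for _, loan in borrowers]
    return median(incomes), median(loans)

def incomes_for_loan(borrowers, loan_size):
    """Incomes of the borrowers who actually hold a loan of the given size."""
    return [income for income, loan in borrowers if loan == loan_size]

print(marginal_medians(borrowers_a), incomes_for_loan(borrowers_a, 120))
print(marginal_medians(borrowers_b), incomes_for_loan(borrowers_b, 120))
```

Both sets would draw the same point on the Center's graph, which is why the graph alone can't support a statement about the income behind "a loan of $120,000."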
Where this graph truly loses it is the title, which is entirely misleading. The graph is entitled: "Increasing Percentage of Income Goes to Housing." But that's not at all what this graph shows, and moreover this obfuscates a critical point about the way the mortgage crisis played out. Mortgage brokers regularly sold graduated payment or balloon mortgages, which start off with much smaller monthly payments and incur much more expensive ones or require refinancing later on; in many cases, when the monthly payments went up at the end of the initial grace period, homeowners bailed on their mortgages, contributing to the crisis.
As a result, the size of the principal (the total amount borrowed) is not a good indicator of the amount of income actually being spent on housing. It's a virtual certainty that monthly payments track income more closely than loan size does, and it's possible they track it much more closely. The key point is that a lot of these loans defaulted because people were signed on to loans they never could have afforded, but at first were not required to bear the real cost of those loans.
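A rough sketch of why principal and payment can diverge, using the standard level-payment formula for a fully amortizing loan and a simple interest-only teaser period. The loan sizes, rates, and terms here are assumptions picked for illustration, not figures from the Center's data, and real graduated payment or balloon mortgages have more varied terms than this.

```python
def amortized_payment(principal, annual_rate, years):
    """Level monthly payment on a fully amortizing fixed-rate loan."""
    r = annual_rate / 12          # monthly interest rate
    n = years * 12                # number of monthly payments
    if r == 0:
        return principal / n
    return principal * r / (1 - (1 + r) ** -n)

def interest_only_payment(principal, annual_rate):
    """Monthly payment during an interest-only teaser period."""
    return principal * annual_rate / 12

# Hypothetical loans (assumed numbers, for illustration only).
conventional = amortized_payment(120_000, 0.07, 30)   # smaller loan, ordinary terms
teaser = interest_only_payment(200_000, 0.02)         # larger loan, teaser rate
# Simplification: after the teaser, amortize the full principal over a fresh 30 years.
after_reset = amortized_payment(200_000, 0.07, 30)

print(f"conventional: {conventional:.2f}")
print(f"teaser:       {teaser:.2f}")
print(f"after reset:  {after_reset:.2f}")
```

Under these assumptions the larger loan starts with the smaller monthly payment, and only after the reset does the payment reflect the size of the principal.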
Let's make it better
There are three ways they could've made this graph better.
First, they might've supported the point they make in the explanatory text, and shown the median loan amount for a band of income (or particular bands of income) over time. Seeing the average loans granted to people making $65-70K, $70-75K, $75-80K, etc. (or using quintiles) would make the distribution clear and let us know if loan size really does track income. It might be that (and it would be very interesting to see if) low income households increased their loans much more than median income families, for example. This would be a much more informative graph, though a little more complex.
Second, they might've just substituted monthly mortgage payments (and monthly median income) into the graph in place of the variables they have now, which fits the title.
Third, they could have shown three things: monthly median income, monthly median payment, and monthly payment as a proportion of principal. This conveys both the point in the title (people are spending more on housing) and the point that people are taking out loans far beyond their means. That's a graph worth looking at.
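For the first suggestion above (median loan per income band, per year), the grouping itself is simple once loan-level records are available. Here's a minimal sketch, assuming records shaped as (year, income, loan) tuples. That record layout, the $5,000 band width, and the function names are all my own assumptions, and the sample rows are made up.

```python
from collections import defaultdict
from statistics import median

def income_band(income, width=5_000):
    """Return the (low, high) bounds of the band containing this income."""
    low = (income // width) * width
    return (low, low + width)

def median_loan_by_band(records, width=5_000):
    """Map (year, income band) to the median loan amount in that group."""
    groups = defaultdict(list)
    for year, income, loan in records:
        groups[(year, income_band(income, width))].append(loan)
    return {key: median(loans) for key, loans in groups.items()}

# Made-up sample rows: (year, annual income, loan amount).
sample = [
    (1994, 66_000, 100_000),
    (1994, 68_000, 120_000),
    (1994, 72_000, 130_000),
]

for (year, band), loan in sorted(median_loan_by_band(sample).items()):
    print(year, band, loan)
```

Plotting one line per band over the years would then show whether loan size really does track income within each band.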
If I can get my hands on some data I may try making one of those this weekend.