Bad Data

By James Kwak

To make a vast generalization, we live in a society where quantitative data are becoming more and more important. Some of this is because of the vast increase in the availability of data, which is itself largely due to computers. Some is because of the vast increase in the capacity to process data, which is also largely due to computers. Think about Hans Rosling’s TED Talks, or the rise of sabermetrics (the “Moneyball” phenomenon) not only in baseball but in many other sports, or the importance of standardized testing scores in K-12 education, or Karl Rove’s usage of data mining to identify likely supporters, or the FiveThirtyEight revolution in electoral forecasting, or the quantification of the financial markets, or zillions of other examples. I believe one of my professors has written a book about this phenomenon.

But this comes with a problem. The problem is that we do not currently collect and scrub good enough data to support this recent fascination with numbers, and on top of that our brains are not wired to understand data. And if you have a lot riding on bad data that is poorly understood, then people will distort the data or find other ways to game the system to their advantage.

Readers of this blog will all be familiar with the phenomenon of rating subprime mortgage-backed securities and their structured offspring using data exclusively from a period of rising house prices — because those were the only data that were available. But the same issue crops up in many different stories covering different aspects of society.

CompStat, an approach to policing that focuses on tracking detailed crime metrics, was widely credited with helping New York and other cities reduce crime in the 1990s. Last year, This American Life ran a story, based on a police officer’s secret recordings, detailing how in at least one precinct officers were pressured to boost their numbers through dubious arrests and citations. They also found another precinct where serious crimes were reported as less serious crimes in order to make their numbers look better than they really were.

In a recent New York Times story, David Segal describes how law schools massage their metrics to score higher in the US News and World Report rankings. Segal focuses on the tricks that some schools seem to use to boost the number of graduates employed nine months after graduation; for example, some schools apparently hire their own graduates to temporary positions that happen to span the date on which employment rates are measured. The rankings are based on statistics that are defined by the American Bar Association but are self-reported by the schools and not audited by anyone.

The big, well-known example of how the importance of data breeds data manipulation is standardized testing. In the early days of the standardized testing boom, the key statistic was the percentage of students at or above grade level, defined as the fiftieth percentile on some standardized test. (For those wondering if this is circular, the scaled score required to be at the fiftieth percentile is set before the test based on the attributes of the questions included in the test; it is not set after the test based on students’ actual performances.) So one obvious tactic would be to focus on students in roughly the thirtieth to sixtieth percentiles while ignoring the others. Another, more problematic tactic would be to classify as many low-performing students as possible into special education so that they would not be in the denominator. (Then there is blatant cheating, like giving your students more time to take the test or simply correcting their answers afterward — Freakonomics has a chapter on this — since few if any school districts have the capacity or the motivation to oversee the tests rigorously.) Even leaving aside data manipulation issues, there is also the basic problem that test difficulty varies from year to year. The test in year N + 1 is calibrated to be the same difficulty as the test in year N, but this is all based on statistics, and there is this thing called random variation to deal with.
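
To make the denominator effect concrete, here is a minimal sketch in Python. The scores and the cutoff of 50 are invented for illustration and do not come from any real test, and `proficiency_rate` is a hypothetical helper, not anything a school district actually uses.

```python
def proficiency_rate(scores, cutoff):
    """Share of tested students scoring at or above the cutoff."""
    return sum(s >= cutoff for s in scores) / len(scores)

# Hypothetical scaled scores for ten students; the cutoff is made up.
scores = [22, 31, 38, 44, 47, 52, 58, 63, 71, 85]
cutoff = 50

before = proficiency_rate(scores, cutoff)

# Reclassify the three lowest scorers so they are no longer counted.
# Nobody has learned anything new; only the denominator has changed.
tested = sorted(scores)[3:]
after = proficiency_rate(tested, cutoff)

print(f"before: {before:.0%}, after: {after:.0%}")
```

In this toy case the same five students clear the bar both times, but the reported rate rises from five in ten to five in seven.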

And I recently read Natalie Obiko Pearson’s story in Bloomberg on the problems with greenhouse gas emissions data. Most of the numbers we read are self-reported by countries and the companies in those countries, and even if they are honest (a big if) they are “bottom up” estimates — based on how much fossil fuel is being consumed. But when scientists actually measure changes in greenhouse gases in the atmosphere, they get different results than predicted by the bottom-up estimates. And in all the examples cited in Bloomberg, actual atmospheric measurements are higher than bottom-up estimates. This could be because the article didn’t mention atmospheric measurements that were lower than predicted by official data. But it could also be because both the companies burning the fossil fuels and the countries aggregating the data have the same incentive to underreport: companies because it means they don’t have to buy as many carbon permits and countries because it means they can claim to be under their Kyoto Protocol targets.

Greenhouse gases are a good example of how we think data will help save us — if we can track how much carbon dioxide each company is producing, we can make it pay for that carbon — but we may just not have good enough data. In general, I think the current trend toward using more and more data is a good thing. I mean, what’s the alternative: gut intuition? But this only increases the importance of having good data to begin with. And when some parties benefit from bad data, this can be a big challenge with no easy solution.

Commentary

41 thoughts on “Bad Data”

Nemo says:

February 13, 2011 at 8:54 pm

What we need are people who are incentivized to obtain and report accurate data.

We could call them “regulators” or something.

…

Nah, that would never fly. Forget it.
Evan H says:

February 13, 2011 at 9:03 pm

Nemo, there are systematic problems with keeping regulators honest and properly incentivized. Regulatory capture is a real thing. That doesn’t mean that regulation is inherently a bad idea, but it does mean that problems can’t be solved just with “more regulation” in an unfocused, generalized sense.
Ted Auch says:

February 13, 2011 at 9:15 pm

For a detailed look at our CO2 footprint check out a posting by yours truly at my blog from a “back of the envelope” point of view.
http://www.tedauch.com/2010/08/31/botes-and-co2/
Also keep in mind that when you look at CO2 on the Y- and GDP on the X-axis globally and regionally you get different linear fits. The reason? Because as you get more granularity GDP doesn’t explain emissions as much given that livestock flows of CO2, CH4, and N2O play a larger and larger role. Whenever you see x-country emits x-tons of CO2 per capita you think that that is all inclusive. Well it isn’t AND it is only from a purely supply-side perspective. For example here in my home state of Vermont we have a substantial dairy component to our agricultural landscape. If you were to simply add the cow’s myriad emissions sources to our average per capita emissions of CO2 the answer would be frightening, BUT if you weight the former with respect to demand-side forces (i.e., population density, obesity, GDP per capita, etc.) then states like New Jersey soar past everyone else. I will be posting on my blog these numbers very soon so stay posted.
Remember Dan Daggett’s “Beyond the Rangeland Conflict” which discusses the willing ignorance of urban and exurban consumers of meat and dairy. Sure Colorado and Nebraska and Iowa emit tons of agricultural related CO2, CH4, and N2O but it is only because demand-side pressures are so astronomically high.
Sam A says:

February 13, 2011 at 9:18 pm

What’s the point of getting better climate data? The climate deniers will call it a hoax anyway. When the Atlantic coast is somewhere around Oklahoma City, maybe Jim Inhofe will concede this is a man made problem. Then again, he may not.
dunkelblau says:

February 13, 2011 at 9:37 pm

And now that brokerages are reporting share cost basis directly to the IRS, will we find that income inequality is larger than we thought?
Per Kurowski says:

February 13, 2011 at 10:38 pm

@ James Kwak “And when some parties benefit from bad data, this can be a big challenge with no easy solution”

Think of perfect data… which leads to perfect transactions… swaps that generate no profits for anyone. That is not the way the world works. Quite often it is only the profit opportunities created by using wrong data that drives the world forward. But, don’t worry it will be very long before we have that sort of perfect data… probably never. The data will surely contain faults… but more than that it is our perception of their validity that counts.

The real problem with the securities collateralized with mortgages to the subprime sector, much much more than the faulty triple-A ratings awarded, was the credibility the regulators awarded those credit ratings. Just think of allowing a bank to leverage over 60 times in an uncertain world, just because of some ratings. That surely points to extremely innocent and gullible regulators who have never walked down the main and back-streets of this world.
Garrett Wollman says:

February 13, 2011 at 11:58 pm

For the bookshelf: Charles Seife, /Proofiness/, published last year in hardcover. Seife is rather opinionated (and savages Nino Scalia in particular) but has a good taxonomy of real-world “statistics fail”.
Six Ounces says:

February 14, 2011 at 1:05 am

Yeah, regulators will get it right. ;)

How many revisions must the BLS make each year? Are their revisions more likely to be positive or negative? Why must they recalibrate their figures every March? Is their X-12-ARIMA model for seasonal adjustment valid when there are structural breaks?

The BLS has been attempting to seasonally adjust metro area data for years, and they keep postponing the release because of data problems.

If the nation’s best data collectors can’t get it right, what makes anyone think anybody else can do it any better?

In grad school I learned that the best research begins with a good theoretical model. Then you collect data for your variables and conduct empirical research. The reality is that the greatest limitation on empirical research is data availability, so researchers cheat – they start with the data that’s available and work backward. The biggest sin is using the wrong data for your dependent variable – the inferences are invalid by design.

And because getting good data is expensive, only the researchers endowed with grants get it, and they seldom share it with others to reproduce their work. Sometimes the data is proprietary, and they are prohibited from sharing it even if they wanted to.

As Deirdre McCloskey has warned us, even when we get significant statistical results, the results might not have any practical significance.

If data is measured with error, then all the estimators are biased and inconsistent. All inferences are thus faulty. And practically all economic data is measured with error.
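
One standard case behind this claim is classical measurement error in an explanatory variable, which pulls an ordinary least squares slope toward zero no matter how much data you collect. The following is a minimal simulation sketch, not an analysis of any real dataset: the true slope of 2.0, the noise levels, the sample size, and the helper `ols_slope` are all assumptions chosen for illustration.

```python
import random

def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(0)  # fixed seed so the run is repeatable
true_slope = 2.0        # assumed for illustration
n = 20_000

# The variable we care about, and an outcome that depends on it.
x_true = [rng.gauss(0, 1) for _ in range(n)]
y = [true_slope * x + rng.gauss(0, 1) for x in x_true]

# What we actually get to observe: the variable plus independent noise
# with the same variance as the variable itself.
x_observed = [x + rng.gauss(0, 1) for x in x_true]

slope_clean = ols_slope(x_true, y)
slope_noisy = ols_slope(x_observed, y)

print(f"slope using true x:     {slope_clean:.2f}")
print(f"slope using observed x: {slope_noisy:.2f}")
```

With noise variance equal to the variance of the true variable, theory says the estimated slope converges to about half the true one, and a larger sample does not repair it. Noise that affects only the outcome variable is a milder problem: under the same assumptions it widens the uncertainty without shifting the slope.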

So most of our economic analysis and forecasting has been reduced to tossing chicken bones into a tin pot based on faulty data alone. Worse, researchers are just making stuff up. Many authors have preconceived conclusions, and torture data until it confesses.

“87.6 percent of statistics are entirely made up.”

“You can use all the sophisticated techniques you want, but always remember that the data was collected by the village watchman who writes down whatever the Hell he wants.”

“Oh Ron, there are literally thousands of other men that I should be with instead, but I am 72 percent sure that I love you.”
Alice Cook says:

February 14, 2011 at 1:25 am

Superb post – really enjoyed reading it.
Anonymous says:

February 14, 2011 at 1:30 am

Experiments never lie (by definition.) People lie. So, start scrubbing some people… and good luck with that… oh, and give yourself a nice scrub too… personally, I find it impossible to remain pristine…but I do take my share of showers
saeedbabar says:

February 14, 2011 at 1:55 am

Good common sense and less reliance on too much data is all we need.
Bayard Waterbury says:

February 14, 2011 at 3:15 am

James, your title Bad Data may be a bit off. If I were to name this article, it would be Why Data Shouldn’t Matter.

I will give you a prime example, from a personal perspective. I suffer from dyslexia. It’s been a lifelong struggle, but has been substantially ameliorated by great and continuous effort. I am quite bright (testing around 140 on every IQ test ever taken, and scoring above the 90th percentile or better on literally every kind of test ever administered to me by an independent tester). I grew up in a middle class family of people who had a great respect for knowledge and learning. I began reading before I was in kindergarten because my parents worked with me diligently. I am 65 years old with a brother who is 62. We both have had very respectable professional lives, I in law and he in physics. Even now, we are knowledge junkies and widely read in a great number of areas of knowledge.

We took our SAT’s long before there were study guides and classes. We both went to public school. I was a “B” average student, but an athlete who lettered in three sports, and highly socially active. He was a good athlete, but really a book worm. I took the SAT’s “cold” and scored a 1492 out of 1600. He took them “cold” also and scored perfect. There was standardized testing when I was in high school, but teachers didn’t teach “to” the tests. In fact, there weren’t even special courses for those who were brighter. But teachers in our school treated everyone the same and expected each of us to learn what was required. I had a couple of truly great teachers, but most were fairly average.

Today, I think that in education, the focus on standardized testing is completely inappropriate to the task of educating students, especially if the measure of performance matters on a statistical basis. What I mean is that it can be useful, especially in helping students on an individual basis to find areas of their education that need boosting, but certainly not to determine if students are really learning the subject matter and not just the test answers. My view has always been, if one learns everything then the testing only determines if they can test well (able to remember and execute things under pressure).

Beyond education, it is very easy to see how statistics can be seriously misleading to the point of absurdity. Take the 9% unemployment rate, for instance, or the measure of inflation, or…. We need to be very careful about the data analysis, and the use of it to make arguments.

Suffice it to say, I love your article.
Dan Palanza says:

February 14, 2011 at 4:35 am

Nemo Said: “What we need are people who are incentivized to obtain and report accurate data.
We could call them “regulators” or something.”

The keeper of accurate financial data is the traditional double-entry book-keeper. We are living through a cycle of financial leaders that see the book-keeper as a bean counter. I would bet that all of the numbers guys who read this blog would be surprised to know that a proper double-entry book-keeping framework of rules relies on the balancing of four categories of arithmetic. If you know of an economist that is aware of a proper book-keeping’s four arithmetic, I would love to meet him or her.

There can be no regulation so long as there is no control language to enforce that regulation.

A proper book-keeping uses a double-entry pattern to distinguish between numbers that measure value, set isomorphic and equivalent to, numbers that express rights. Think of measured value as cash and expressed rights as capital. Capital is a potential, until a marketplace trade turns it into cash. The mortgage scam turned mortgages into paper that was sold at face value. By the time the actual value could be determined the sellers of paper-value had put the cash into other, safer assets.

A proper book-keeping also uses a simultaneous double-entry pattern to distinguish between numbers that tally each entity’s asset-value, set isomorphically equivalent to numbers that tally rights to ownership’s expression. Between these two system|language relationships the book-keeper uses the two categories of measured-value to produce a Statement of Profit [loss]. Simultaneously, the book-keeper uses the two categories of expressed rights to produce a Balance Sheet.

Where a proper Statement of Profit [loss] tells where an entity’s value is being created, a proper Balance Sheet tells who owns the rights to that created value, which the business entity has worked to create. Such book-keeping standards began to disappear with the proliferation of the microprocessor in the late 1970s. By 1992 the proper book-keeping had largely disappeared. Today, it appears to be completely extinct, except for a computer driven prototype of the book-keeper’s art, which I have been building over the past 30 years.

The art of double-entry book-keeping’s earliest records date from 1340 A.D. It was first documented about 150 years later for the Catholic Church by Luca Pacioli. It was further refined and automated in the nineteenth century, along with the development of thermodynamics — which is a member science of the book-keeper’s art. I have never run into a physics or math person who knows of double-entry book-keeping’s relationship to thermodynamics.

Nothing will be regulated until we know how to model a proper double-entry book-keeping into a formal computer science. It is tricky, but not that difficult to do.
…
Rien Huizer says:

February 14, 2011 at 6:21 am

James,

It could be very simple:

We need honest people who do not act out of self-interest all the time. What about taking some DNA from departed saints and cloning them, in order to have stock to replace this current, rotten population with.

One problem: most of these saints were celibate. How do we make them reproduce and not fall into the traps that threaten mammal life with extinction in about 200 years? Their clones may be good, but once these former nuns and priests start to breed, they may get offspring and interests that celibacy was supposed to prevent.

Maybe we should simply stop worrying. Nobody goes before their time. Carbon or no carbon.

No economics subjects today? Data etc. is not economics. Economics is a purely axiomatic discipline within applied mathematics (i.e., it would be applicable to reality, if there was such a reality as assumed by most of economic theory, but there cannot be, at least not completely). And that is pretty good. Because in the past policymakers who were following their own interests (they were usually not (yet) saints) would use whatever ideology that would lend legitimacy to their policies, and that would often mean religion, nationalism, class, etc. Much better to have families of economics that we believe we should pay attention to. At least the stuff lends itself to rational debate. And for that you may need data, but because someone else’s data are unreliable, they do not defeat your hypothesis. Beautiful, isn’t it?
Rien Huizer says:

February 14, 2011 at 6:39 am

Plato would call them guardians, and his guardians were supposed to protect the people against democracy. Would you sacrifice your freedom for accurate data? And once these data were accurate, what would happen? The world’s problems would be solved by intelligent guardians? What about the very practical system of self-interested exchange that has given our species such a dominant position (virtually all mammals on earth depend on humans for being fed, medicated, even reproduction, or (like the rodents) our generosity in producing rubbish)? The question that will arise if there is abnormal temporal climate variation (for the time being the jury is still out, although variation in atmospheric CO2 may well play a role in this utterly complex process) is what will be sacrificed first when the climate shrinks our habitat: our livestock and rodents, or our fellow man? I guess Plato’s guardians would resign if they were to oversee such a mess. And I would not be surprised if lots of people would find this a difficult question, if they had to make such a choice explicitly and consciously. Give up beef to allow a Bangladeshi family to survive or at least not have to become a climate refugee? I guess most of us would prefer the Army Corps of Engineers to build huge levees there rather than that. And let’s face it, once we can grow bananas in Alaska, maybe we need immigrants there.
Jeffery Ewener says:

February 14, 2011 at 8:48 am

I think your conclusion reveals in a way the limitation of this kind of thinking. Quantitative analysis must be good because none at all would be bad. But it’s just this single-dimensionality that is the real danger of this kind of thinking.

Your study of the quanting of K-12 education was very good, but it didn’t address the more basic question — what exactly is the change (or outcome?) we are trying to effect in our young people through the process of education … and can it be quantified? No doubt certain parts or aspects of it can be — reading, mathematical, musical skills are either present or not and to some extent can be graded on a line. But not all of them can — things like a kid’s desire to read, the joy they take in music, their creativity in math, things that make the difference between mediocrity and excellence (even leaving aside such radically non-quantifiable outcomes like human fulfillment) already start to slip away from our measurements. And the preconditions for learning anything, even the most knuckleheaded number-crunching — things like desire, attention, excitement, a willingness to collaborate and so on — it’s hard to see how these can be usefully quantified at all, though educational outcomes by any measure are certainly highly dependent on them.

The trouble with quantification is that it is at best a tertiary operation, but it thinks it’s primary. There are simply aspects of human life (if I had to quantify it, I’d say north of 99%) that cannot be usefully, or consistently, or meaningfully quantified. This is not to deny the power of quantification, the necessity for it, in certain areas. The trick is to distinguish between these two realms. The danger is in the clear, unambiguous, satisfying quality of a number, a quality which seduces people into thinking that, because they have stuck a number on something, they understand it, though in truth they have no idea what their number measures or what it means. This is now tending to make quantitative methods not merely inapplicable, but wrong, and not merely wrong, but destructive.
Freedom to... says:

February 14, 2011 at 8:56 am

Interesting questions. According to Einstein, knowledge comes from experience; everything else is information (data). Maybe data is significant only in so far as it is interpreted by human values, i.e. becomes knowledge. CO2 levels in themselves wouldn’t mean much, except they are related to life as we know it. How many more people are aware of CO2 post the US’s winter storms or Australia’s flood/cyclone? Making the connection and doing something based on the data, that’s human too.
Herbert Wetherby says:

February 14, 2011 at 9:18 am

Yes, the unemployment rate is a two-edged sword: as more are forced off the books, the rate drops, making it look better, but the incomeless ones are also worse off and fall into the hands of the welfare state. And if people believed everything they were taught and/or told, they would be in quite a quandary today.
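
The arithmetic behind that point can be sketched in a few lines of Python. The headline unemployment rate is the number of people without jobs who are actively looking, divided by the labor force (employed plus unemployed). The figures below are hypothetical, chosen only so that the starting rate matches the 9% mentioned elsewhere in this thread.

```python
def unemployment_rate(employed, unemployed):
    """Unemployed as a share of the labor force (employed + unemployed)."""
    return unemployed / (employed + unemployed)

# Hypothetical labor force of 10,000 people.
employed = 9_100
unemployed = 900

before = unemployment_rate(employed, unemployed)

# Suppose 200 jobless people give up looking. They are no longer counted
# as unemployed or as part of the labor force, yet nobody found a job.
discouraged = 200
after = unemployment_rate(employed, unemployed - discouraged)

print(f"before: {before:.1%}, after: {after:.1%}")
```

The number of people with jobs is identical in both cases, but the reported rate falls by almost two percentage points.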
Ian says:

February 14, 2011 at 9:52 am

The best book on this subject that I’ve read is “The Logic of Failure” by Dorner. It’s a fast read and very applicable to anybody who deals with complex decisions or statistics. Basically, decision making processes for most people are emotional, based on recent experience, and discount real, but unrealized, risks. There are reasons why we spend more money researching cancer than heart disease, even though heart disease kills more people per year.

The same logic (or lack thereof) is why there is a gap between what people want to have in terms of government services and what they are willing to pay in taxes.

Basically, more data is only as good as the people making the decisions. Frankly, most people suck at making good decisions.
jhm says:

February 14, 2011 at 9:59 am

I think you assume that the problem behind unsatisfactory outcomes is the quality of the data rather than the reliance on quantitative analyses.

My favorite quote illustrating my point:

DAVID SIMON: You show me anything that depicts institutional progress in America, school test scores, crime stats, arrest reports, arrest stats, anything that a politician can run on, anything that somebody can get a promotion on. And as soon as you invent that statistical category, 50 people in that institution will be at work trying to figure out a way to make it look as if progress is actually occurring when actually no progress is. And this comes down to Wall Street. I mean, our entire economic structure fell behind the idea that these mortgage-based securities were actually valuable. And they had absolutely no value. They were toxic. And yet, they were being traded and being hurled about, because somebody could make some short-term profit. In the same way that a police commissioner or a deputy commissioner can get promoted, and a major can become a colonel, and an assistant school superintendent can become a school superintendent, if they make it look like the kids are learning, and that they’re solving crime. And that was a front row seat for me as a reporter. Getting to figure out how the crime stats actually didn’t represent anything, once they got done with them.
Rich S. says:

February 14, 2011 at 11:32 am

I would guess, after 15 years in this Kansas company, that somewhere around 75% of the engineers sitting around me within a 100 foot radius would take the unsubstantiated opinion of Rush Limbaugh over any quantitative data that could be presented. It wouldn’t matter how credible the data might be, the source of the data, how well the data was presented.
As long as ostensibly educated people are willing to put politics or religion over science, no matter the subject, trying to argue a point using data is a waste of time. In order to have a meaningful debate, you have to have some agreement regarding the basic facts of a matter. When you have Rush telling my colleagues that they are entitled to their own facts, the debate never gets to that point. Using more data in such an environment just adds to the noise.
Does anyone have the Oppenheimer quote that goes something like, “You will never convince a man using the facts. Eventually, the science passes these people by, and they just die off.” I read it somewhere a few years ago and can’t find the correct quote. I suspect someone else here knows it.
Stephen A. Boyko says:

February 14, 2011 at 11:44 am

Mr. Kwak:

A well written and timely post.

“Readers of this blog will all be familiar with the phenomenon of rating subprime mortgage-backed securities …”

From the military to medicine, all suffer from a similar structural problem. Monolithic governance (one-size-fits-all) of a multi-faceted domain creates errors of conflation that produce unsatisfactory results by not differentiating determinate from indeterminate sources of data. To illustrate, MBSes composed of no-money-down NINJA loans that are marked to model are uncertain. How did uncertain investments receive AAA ratings?
Mark K says:

February 14, 2011 at 12:16 pm

There was a big controversy, climategate, in December 2009. This actually gave the sceptics a bit of ammo, coming out at the same time as the Copenhagen conference.

There are deniers, but some have legitimate claims to deny. A lot of the climate change researchers don’t give out their data, or explain their methods for standardizing data captured over decades in spots all over the globe. If these climate scientists want to be taken seriously, they should allow either their data to be examined by outsiders, or at least develop the best possible methods for getting new data.
Dan Palanza says:

February 14, 2011 at 12:22 pm

“Does anyone have the Oppenheimer quote that goes something like, “You will never convince a man using the facts. Eventually, the science passes these people by, and they just die off.” I read it somewhere a few years ago and can’t find the correct quote. I suspect someone else here knows it.”

Max Planck said it this way: “It is one of the most painful experiences of my entire scientific life that I have but seldom — in fact, I might say, never — succeeded in gaining universal recognition for a new result, the truth of which I could only demonstrate by a conclusive, albeit only theoretical proof.”

He goes on later in the Chapter: “A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it.”

Social science is way tougher than the physical sciences were in Planck’s time. Hopefully the episode in Egypt will become an example of Planck’s faith in the new generation of scientists.
Rich S. says:

February 14, 2011 at 12:23 pm

Climate scientist Judith Curry from Georgia Tech has expressed a similar opinion. She has published articles on this recently.
The Pensum (Mark K) says:

February 14, 2011 at 1:38 pm

Well it kind of makes sense. I can understand keeping data classified when it comes from proprietary sources, or where disclosure is paramount for want of keeping personal information private. Unfortunately a lot of government/financial data is of this nature. But when it’s data of a natural effect, say temperature, secrecy is not necessary.
The Pensum (Mark K) says:

February 14, 2011 at 1:44 pm

When I was learning econometrics, it looked to me like half the techniques we were learning were developed to overcome shortcomings in data. Perhaps in the future, these will all be redundant when more action takes place on the web/electronically, and thus quantifiable.

Incidentally, http://bpp.mit.edu/ — billion prices project at MIT. A way of overcoming faulty CPI data?
Tom Drolshagen says:

February 14, 2011 at 2:02 pm

This kind of problem shows up on the internet by the gaming of Google’s search ratings. There have been a few articles on how the Huffington Post website has been doing this very successfully. They’re going to get bought by AOL for over $300 million.
Anonymous says:

February 14, 2011 at 2:17 pm

One note on the standards-setting process in testing: it’s quite a bit more “closed-loop” than you may think. Questions are “field-tested” on actual students; sometimes as non-credit questions mixed in, sometimes as whole sections (the GRE does this), and sometimes as explicit “practice tests”.

In any case, questions which don’t end up with the “expected” distribution are typically discarded. Sometimes this is due to actual problems (unclear wording, sometimes outright factual errors in the question), sometimes it’s to counter accusations of bias (if the distribution shows variation across demographics) and sometimes it’s simply a matter of the students lacking the required knowledge, particularly on math and science questions.

For reference, I worked in the testing industry for over 5 years both as a scorer and as a trainer / developer. Can’t be more specific, as everything’s under NDA.
Rich S. says:

February 14, 2011 at 6:24 pm

I think very little climate data is proprietary. One reason some people are reluctant to provide their raw data is that the deniers, being mostly republicans, will twist, cherry-pick, and blatantly lie about what the data mean. Scientists are understandably afraid of having their names dragged through the mud by dishonest right-wing demagogues.
I understand that democrats are no angels, but they pale in comparison when it comes to lying.
buskerbayarea says:

February 14, 2011 at 9:44 pm

Right – we don’t need more data – we need honest data.

I’m sick of the dishonest data running rampant in our global economy. I’ve started the non-violent revolt against the rich to get some honest data, the eradication of global debt, and the immediate rebuilding of our infrastructure to be sustainable.

learn more: buskerbayarea.wordpress.com
mollyrose says:

February 14, 2011 at 10:12 pm

James, this is a wee bit off the subject but is about the interpretation of data, and I would like to draw your attention to it and ask for your take on it:

Have you heard of the odd and unexplained trading pattern which showed up in Wall Street’s superfast computer trading system? According to my source, it was identified by Jeffrey Donovan and named “mountain range.”
priscianus jr says:

February 14, 2011 at 11:26 pm

molly,
http://sleaff.wordpress.com/2010/12/13/cause-of-may-6th-stock-market-flash-crash/
alecpatton says:

February 15, 2011 at 6:08 am

Great comment – made me laugh, then depressed me.
alecpatton says:

February 15, 2011 at 6:10 am

Should have clarified – Nemo, my ‘great comment’ remark was aimed at your opening one-liner about ‘people who are incentivized to obtain and report accurate data.’
progressus says:

February 15, 2011 at 7:48 am

We are a society obsessed with results. We conduct life as if results are the only things that matter. To most, results by any means are results just the same. We manage by results, we define problems by results, we define our jobs by results, we make individuals accountable for results, we cause harm to others in the pursuit of results, we cheat and lie to show results, and we even define ourselves by the results we get. Only results matter!

The system is never the focus of serious attention for improvement. Why do we do this? In general there is a lack of understanding that results are the effects of a process/system. Moreover this lack of understanding is associated with an absence of systems thinking, statistical thinking and critical thinking. The examples presented in this article are evidence of this. (http://www.forprogressnotgrowth.com/2010/11/30/a-matter-of-results/)
mondo says:

February 15, 2011 at 8:23 am

Thanks for the link, priscianus. My source was Fortean Times and I was hesitant to post it because it sounded too dee-doo dee-doo. But this is real enough: a genuine unexplained (so far) anomaly right in the blood-flow of our financial system.
RA says:

February 15, 2011 at 7:53 pm

You’re repeating nonsense that was refuted a long time ago.
Search Google Scholar with “climategate refuted” and take a look at the results and the credibility of the sources that turn up.
Annie says:

February 15, 2011 at 9:37 pm

Too crowded inside the chip, eh? So “stuff” starts to rub elbows and grab onto each other – basically, you’re out of space – not “computing” space, but space-space – cosmic distance between planets and suns, for example. Too much fraud, too little time and space…

And the Mandelbrot set is how gravity distributes itself throughout space, like duh.

And if anyone is still conversing with the carbon “scientists” bought and paid for by oil and coal “subsidies”, then you are a retard :-)

Of what scientific value is that nanosecond transaction “data”? Or “derivative” data? Or health insurance billing data?

Caught General Mc Caffrey (sp?) throw out “data” on TV – he said 80% of the WORLD’s wealth goes through the Suez Canal – and the genius strategy for establishing cultural excellence as the “power” to keep the area safe was torture, tyranny and mutilation of women? Wow – impressive. Monkey brains transcend “selfishness”?

Let’s get some major phone lines into “libertarian” Somalia and distribute the names and phone numbers of the families of USA soldiers who are overseas so that the unemployed boyz have a real job – start the foreclosure scam rolling over the families and get “rich” quick, plus a whole lotta sweet revenge – that’s “democracy” you can bribe your pirate enemies with and keep them out of the canal at the same time. How’s that for keeping the “strategy” going in the “Middle East”?
sleaf says:

February 18, 2011 at 8:16 pm

Thanks for the bump, priscianus. ;)
pulse oximeters says:

February 22, 2011 at 11:19 am

Great article. It made me think, that’s for sure. There is so much quantitative data everywhere these days, especially with internet usage, and everyone is eager to get their hands on it for various purposes. The problem is that the quantitative stats you get off the internet are, more often than not, wildly wrong and extrapolated from data mined from servers, and this can cause problems for businesses and people relying on this information. Qualitative information, whilst hard to come by, is a better bet!

Comments are closed.
