
Data Science Has Become About Lending False Credibility To Decisions We've Already Made

  
Via:  Nerm_L  •  5 years ago  •  8 comments


It is truly remarkable that our era of searching data for answers has devolved into searching data until we find support for the answer we've already decided upon.

Sponsored by group News Viners

SEEDED CONTENT



One of the greatest failures of data science has been the way in which it has devolved from the genuine search for answers into just another tool to lend credibility to the answers we want. It no longer matters what our data actually says or whether the data we are using is in any way relevant to the questions we ask of it. All that matters is that we can justify our preordained decisions with the certainty of “data.” As we rapidly undermine the promise of data science, will our trust in data fade with it?

The misuse of data and statistics to support preordained decisions has become such a cultural touchpoint that even Scott Adams’ Dilbert cartoon has lampooned the practice, with the boss asking, “Does it matter [if my spreadsheet is wrong], as long as it gives me the answer I want?”

It is truly remarkable that our era of searching data for answers has devolved into searching data until we find support for the answer we've already decided upon.

Today’s data science is less and less about the genuine search for answers. We no longer embark upon an analysis with a hypothesis in hand, open to whatever answer our data ultimately yields. Instead, like doctor shopping, we “data shop” until we find a dataset and methodology that give us the answer we want.

We live in a world in which our preeminent scientific institutions convene the nation’s most respected researchers to advise our government on misinformation, and the resulting report centers on Twitter not because those researchers believe it plays the most important role in the spread of misinformation or yields the most accurate results, but because it was the easiest data for them to get their hands on. It seems even academia has been led astray by the siren song of data hype.

Indeed, many areas of data science like “social media analytics” are not actually based on methodologically or statistically rigorous data analysis at all.

Social media analysts focus nearly exclusively on Twitter because it is the easiest dataset for them to get their hands on, not because it is the most relevant or accurate dataset for the phenomena they hope to measure.

Nearly the entire historical output of social media assessments over the last decade and a half has reported absolute counts rather than normalized trends, calling into question or even completely invalidating a large fraction of the research drawn from social media.
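To make the distinction between absolute counts and normalized trends concrete, here is a minimal Python sketch. Every figure in it is invented for illustration, and the names `matching_posts`, `total_posts` and `normalized_share` are hypothetical rather than taken from any real platform or study. It shows how a raw count of matching posts can rise every year while the topic's share of all posts falls.

```python
# Hypothetical yearly figures, invented purely for illustration.
matching_posts = {2012: 1_000, 2014: 2_000, 2016: 3_000}   # posts mentioning a topic
total_posts = {2012: 100_000, 2014: 400_000, 2016: 900_000}  # all posts collected


def normalized_share(matches, totals):
    """Return, for each year, matching posts as a fraction of all posts."""
    return {year: matches[year] / totals[year] for year in matches}


shares = normalized_share(matching_posts, total_posts)

# Print the raw count next to the normalized share for each year.
for year in sorted(shares):
    print(year, matching_posts[year], f"{shares[year]:.2%}")
```

Under these assumed numbers the raw count triples while the share shrinks, so a report based only on the counts would describe growing attention when relative attention is actually declining.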

Researchers blindly report trends from datasets they have no understanding of, running basic searches and reporting results without any idea of how their datasets are changing out from under their analyses.

Yet, none of this matters because we no longer see data as yielding answers, but rather as a veneer of credibility to wrap around the answers we want.

Data scientists no longer turn to statistics, rigorous methodologies and the scientific method to interrogate large datasets they understand deeply and yield findings that have been carefully normalized, scrutinized and verified.

Instead, data science has become two things: hyperbole and lending false credibility to decisions that have already been made.

Hype has become synonymous with how the research community increasingly views data science. Researchers sprinkle data science buzzwords over their proposals, publications and grant submissions like some sort of magical fairy dust, confident in the unfortunate truth that the mere presence of phrases like “big data,” “social media analytics” or “deep learning” will massively improve their odds of success, regardless of the actual question being asked or the accuracy of their results.

Yet, beyond the hype, the analyses that are actually performed have unfortunately become about searching for tenuous or even entirely false findings that can lend some air of credibility to decisions that have already been made. Any decision, no matter how incorrect, can find conclusive data-driven support merely by searching until some method applied to some dataset is sufficiently adjusted to yield a supportive finding.
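The "search until something supports it" pattern can be illustrated with a small simulation. The sketch below is an assumption-laden toy, not a description of any real analysis: the outcome and every candidate dataset are pure random noise, and the names `pearson` and `shop_for_support` are hypothetical. It compares the correlation found on the first dataset tried with the strongest correlation found after trying many.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def shop_for_support(n_obs=20, n_candidates=200, seed=0):
    """Correlate one noise outcome with many unrelated noise datasets."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_obs)]
    correlations = []
    for _ in range(n_candidates):
        # Each candidate is generated independently of the outcome.
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
        correlations.append(pearson(candidate, outcome))
    return correlations


correlations = shop_for_support()
first_try = abs(correlations[0])
best_found = max(abs(r) for r in correlations)

print(f"first dataset tried: |r| = {first_try:.2f}")
print(f"best of {len(correlations)} tried: |r| = {best_found:.2f}")
```

Because nothing in the simulation is related to anything else, any sizable correlation reported as "best" is an artifact of the search itself. Reporting only that best result, without saying how many datasets were tried, is the practice being criticized.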

Putting this all together, data science is no longer about analyzing data or giving our data the opportunity to speak to us.

Instead, data science has become about hype-fueled fairy dust that can boost the prospects of a resume or report with its trendy buzzwords.

Most dangerously, it has become about the misuse of statistics, data, research methodologies and the scientific method to lend false credibility to decisions that have already been made.

We no longer devise a hypothesis and test it using data. We start with the conclusion we want and find the data and methods to support it.

As data science becomes about false hype and conscripting data in the service of preordained conclusions, we risk undermining the public’s trust in data and halting the data revolution just as it has begun.

------------------------------

(Seeder's note:  The seeded article was published by Forbes.  The complete text of the article has been posted for convenience.)


Nerm_L
Professor Expert
1  seeder  Nerm_L    5 years ago

Carefully selected facts have become the means for affirming biases.  

 
 
 
mocowgirl
Professor Quiet
2  mocowgirl    5 years ago
We no longer devise a hypothesis and test it using data. We start with the conclusion we want and find the data and methods to support it.

I recently watched a video about the anti-vaxxers using correlations to justify their war on vaccines.  The video host directed her viewers to the site below to see for themselves just how "reliable" correlations are.  

For example: 

There is a 99.79% correlation between US spending on science, space and technology, and Suicides by hanging, strangulation, and suffocation.   

According to this correlation, the US should eliminate spending on science, space and technology if the goal is to eliminate suicide by hanging, strangulation and suffocation.

Enjoy! (or not)
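As a rough illustration of why two unrelated yearly series can correlate so strongly, here is a minimal Python sketch. It uses invented numbers only (not the actual spending or mortality figures), and the names `pearson`, `changes`, `series_a` and `series_b` are hypothetical. The two series share nothing except an upward trend over time, and the sketch compares the correlation of their levels with the correlation of their year-to-year changes.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def changes(series):
    """Year-to-year differences, which remove a steady linear trend."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


rng = random.Random(1)
years = range(10)

# Two invented series whose noise is independent; only the trend is shared.
series_a = [20 + 1.0 * t + rng.gauss(0, 0.3) for t in years]
series_b = [5000 + 300 * t + rng.gauss(0, 90) for t in years]

r_levels = pearson(series_a, series_b)
r_changes = pearson(changes(series_a), changes(series_b))

print(f"correlation of levels:  {r_levels:.2f}")
print(f"correlation of changes: {r_changes:.2f}")
```

In a setup like this the levels correlate very strongly simply because both series rise over time, while the year-to-year changes are driven only by independent noise. A high correlation between trending series is therefore weak evidence of any link between them.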

 
 
 
Nerm_L
Professor Expert
2.1  seeder  Nerm_L  replied to  mocowgirl @2    5 years ago
I recently watched a video about the anti-vaxxers using correlations to justify their war on vaccines.  The video host directed her viewers to the site below to see for themselves just how "reliable" correlations are. 

And yet the proponents of vaccination utilize the same sorts of correlations to support their position.  The same methods are being used to arrive at diametrically opposed conclusions.  

What is the validity of a statistical outlier?  The outlier data represents real world observations.  

 
 
 
TᵢG
Professor Principal
3  TᵢG    5 years ago

Not the fault of data science.   Human nature has done this forever - use the available resources to support that which one wishes to believe true or wishes others to believe true.

Some people use deception to their advantage.

 
 
 
Nerm_L
Professor Expert
3.1  seeder  Nerm_L  replied to  TᵢG @3    5 years ago
Not the fault of data science.   Human nature has done this forever - use the available resources to support that which one wishes to believe true or wishes others to believe true. Some people use deception to their advantage.

That's like saying that climate change isn't the climate's fault; an example of flawed logic.

Data science didn't create big data.  Big data is an agglomeration of information generated outside of data science.  The Library of Congress is a big data silo and that collection of information was established over centuries.

 
 
 
TᵢG
Professor Principal
3.1.1  TᵢG  replied to  Nerm_L @3.1    5 years ago
That's like saying that climate change isn't the climate's fault; an example of flawed logic.

I do not see any flaw in my logic.   Data science is not a force of nature; it is a set of methods, techniques and tools.   It can be used by human beings for good and for bad.   How do you find a way to blame a set of methods, techniques and tools?    That is like blaming a hammer for hitting your thumb.

Data science didn't create big data.

With this straw man you are arguing against your point.   Yes, I agree, data science did not create big data.

 
 
 
Nerm_L
Professor Expert
3.1.2  seeder  Nerm_L  replied to  TᵢG @3.1.1    5 years ago
I do not see any flaw in my logic.   Data science is not a force of nature; it is a set of methods, techniques and tools.   It can be used by human beings for good and for bad.   How do you find a way to blame a set of methods, techniques and tools?    That is like blaming a hammer for hitting your thumb.

Yes, big data is similar to climate.  The question is whether or not the methods and techniques used to observe and describe 'forces of nature' are used objectively or subjectively.

The point being made by the seeded opinion is that the methods and techniques of data science are being used subjectively.  Climate science has a system of peer review and testing/refutation to limit subjective use of methods and techniques.  Data science does not have a similar system for regulating subjectivity and biases.  Data science does not function the same way as other sciences, suggesting that data science is not really a science.

Data science did not create big data just as climate science did not create climate.  Climate science regulates itself through peer review and testing/refutation to minimize biases and subjectivity.  Data science does not.

 
 
 
TᵢG
Professor Principal
3.1.3  TᵢG  replied to  Nerm_L @3.1.2    5 years ago
Climate science regulates itself through peer review and testing/refutation to minimize biases and subjectivity.  Data science does not.

Data science is not a branch of science.   You are reading far too much into the name.   Further, you cannot seriously be arguing that climate science is immune from malicious interpretation.   Human beings manipulate facts to form the reality they choose.   It is malicious human intent that is the problem.

Data Science is a set of methods, techniques and tools.   Data science is used by human beings.   Blame the human beings who abuse the tools, not the tools themselves.

 
 
