【090】Don't blindly trust big data | The era of blind faith in big data must end

TedDigger
2017.10.02 12:25

Speaker: Cathy O'Neil

Key words: data, algorithms, discrimination

Abstract: Algorithms matter in everyone's life. Mathematician and big-data scientist Cathy O'Neil tells us: 1. Beware of letting algorithms become mathematical weapons that rule makers create to exploit others. 2. The data used to build an algorithm is often itself flawed, which makes the algorithm inaccurate and unfair; we should take measures against this.

@TED: Algorithms decide who gets a loan, who gets a job interview, who gets insurance and much more -- but they don't automatically make things fair, and they're often far from scientific. Mathematician and data scientist Cathy O'Neil coined a term for algorithms that are secret, important and harmful: "weapons of math destruction." Learn more about the hidden agendas behind these supposedly objective formulas and why we need to start building better ones.

Content:

Fact:

  • Algorithms are everywhere and are used to sort and separate the winners from the losers
  • Algorithms are opinions embedded in code
  • Algorithms are not always objective, true, or scientific

Question: What if the algorithms are wrong?

Two elements of an algorithm:

  • Data: a record of what happened in the past
  • A definition of success: the thing you're looking for and often hoping for
  • You train an algorithm by looking at past data and figuring out what is associated with success
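The two ingredients above can be sketched in a few lines of code. This is a hypothetical illustration, not any real hiring system: the records, the `success` definition, and the threshold rule are all invented to show how a chosen definition of success becomes an opinion embedded in code.

```python
# Minimal sketch of the two elements of an algorithm:
# (1) past data, and (2) a chosen definition of "success".
# All records and field names here are invented for illustration.

past_hires = [
    {"years_exp": 5, "stayed_4_years": True,  "promoted": True},
    {"years_exp": 1, "stayed_4_years": False, "promoted": False},
    {"years_exp": 3, "stayed_4_years": True,  "promoted": False},
]

# The modeler's choice of what counts as a "successful" hire.
# This definition is an opinion embedded in code.
def success(record):
    return record["stayed_4_years"] and record["promoted"]

# "Training" here is simply finding a threshold on experience
# that separates past successes from past failures.
successes = [r for r in past_hires if success(r)]
failures = [r for r in past_hires if not success(r)]

threshold = (min(r["years_exp"] for r in successes) +
             max(r["years_exp"] for r in failures)) / 2

def predict(candidate):
    """Predict whether a new candidate will be a 'success'."""
    return candidate["years_exp"] >= threshold

print(threshold)
print(predict({"years_exp": 5}))
print(predict({"years_exp": 2}))
```

Note that nothing in this sketch is objective: change the `success` function, and the same data produces a different "winner".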

Bias affects algorithms:

Eg:

  1. A hiring algorithm at Fox News trained to select candidates likely to succeed would usually filter out women, because they do not look like the people who were successful there in the past.

  2. When we send the police only to minority neighborhoods to look for crime, the arrest data will be heavily biased, and an algorithm trained on it to predict individual criminality will go wrong.
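The second example can be made concrete with a toy simulation. This is an invented sketch, not real crime data: two neighborhoods are given the *same* true crime rate, but only one is patrolled, so only one generates arrest records.

```python
import random

random.seed(0)

# Hypothetical simulation: neighborhoods "A" and "B" have the SAME
# true crime rate, but police patrol only neighborhood "A".
TRUE_CRIME_RATE = 0.05
PATROL_COVERAGE = {"A": 1.0, "B": 0.0}  # share of crimes actually observed

arrests = {"A": 0, "B": 0}
for neighborhood in ("A", "B"):
    for _ in range(10_000):
        crime_occurred = random.random() < TRUE_CRIME_RATE
        observed = (crime_occurred and
                    random.random() < PATROL_COVERAGE[neighborhood])
        if observed:
            arrests[neighborhood] += 1

# An algorithm trained on these arrest counts would "learn" that A
# is far more criminal than B, even though the true rates are equal:
print(arrests)
```

The bias is entirely in the data collection, not in the neighborhoods; a model built on `arrests` inherits it automatically.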

The news organization ProPublica recently looked into one of those "recidivism risk" algorithms, as they're called, being used in Florida during sentencing by judges. Bernard, on the left, the black man, was scored a 10 out of 10. Dylan, on the right, 3 out of 10. 10 out of 10, high risk. 3 out of 10, low risk. They were both brought in for drug possession. They both had records, but Dylan had a felony and Bernard didn't. This matters, because the higher your score, the more likely you are to be given a longer sentence.

Solution: algorithmic audit

  • Data integrity check
  • Think about the definition of success
  • Consider accuracy
  • Consider the long-term effects of algorithms: the feedback loops
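One audit step from the checklist above can be sketched in code: checking accuracy *per group* rather than overall, since a model can look accurate in aggregate while being unfair to a subgroup. The groups, predictions, and outcomes below are invented for illustration; they are not O'Neil's audit procedure.

```python
# Sketch of one algorithmic-audit step: compare the model's
# false-positive rate across groups. All records are invented.

records = [
    # (group, model_prediction, actual_outcome)
    ("group_1", True,  True),
    ("group_1", False, False),
    ("group_1", True,  True),
    ("group_2", True,  False),   # false positive
    ("group_2", True,  False),   # false positive
    ("group_2", False, False),
]

def false_positive_rate(group):
    """Share of actual negatives in `group` the model flagged anyway."""
    negatives = [(pred, actual) for g, pred, actual in records
                 if g == group and not actual]
    if not negatives:
        return 0.0
    return sum(1 for pred, _ in negatives if pred) / len(negatives)

for group in ("group_1", "group_2"):
    print(group, false_positive_rate(group))
# group_1 has FPR 0.0 while group_2 has FPR ~0.67:
# a gap like this is exactly what an audit should flag.
```

The same pattern extends to false-negative rates and calibration; the point of the audit is to make these per-group numbers visible instead of hiding them inside one overall accuracy figure.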

Suggestion:

  1. For data scientists: we should not be the arbiters of truth. We should be translators of the ethical discussions that happen in larger society.
  2. For non-data scientists: this is not a math test. This is a political fight. We need to demand accountability from our algorithmic overlords.

Hope: The era of blind faith in big data must end.


Link: TED

Come join the #1000 TED Learning Plan# and share your best and most practical TED study notes on the "journey of exploring one thousand TED videos".
