There was some Amazon-bashing going on on Hacker News lately, and one of the commenters told a story about a guy whose business relied on Amazon for sales and advertising. One day Amazon permanently revoked his account, no explanation given. The comment describes the consequences: "This person lost everything they built over four years of hard work and had to file for bankruptcy when the commitments made by the business to support a multi-million-dollar-per-year supply pipeline caught-up with what had happened."
This particular account may or may not be true, but if you read the tech press, you come across a similar horror story every now and then. A big internet company that acts as a gatekeeper to some commercial or social sphere screws something up, and a little person bears the consequences.
The interesting part is that it's never an act of bad will or an attempt to harm someone. It's just that the company's software has erred, and at the scale of millions of users there's no way (or at least no cheap and scalable way) to deal personally with each individual user, answer the complaints, or even check what has actually happened.
You may have various opinions on the topic. Many people would reason that if the service you are relying on is free there's no obligation for the company to provide it for you. They can stop at any time they wish.
Later that day, however, I listened to a talk by Tim O'Reilly. Among other things he mentioned that the bail system in the US is broken in that it keeps the rich out of pre-trial incarceration while keeping the poor in. He went on to explain that they've built a data model that predicts who is safe to let out and who is not. That way, money is kept out of the process.
Now, that may sound weird. So you are going to rot in jail while your buddy is released just because some software said so? But you can still make an argument for the system: First, even if it performs poorly, it's still better than bail. Second, you can think of it as a lottery. The state is entitled to keep you in pre-trial incarceration. If they let you go, it's a privilege, not an obligation. And such a privilege can be administered by means of a lottery. Denmark, after all, used to rely on a lottery for military conscription, and I never heard any complaints.
To take the argument further, let's assume that the software, both the one revoking accounts at Amazon and the one used by the local judge, is based on machine learning. Specifically, on neural networks.
A neural network is a beast that looks like it has escaped from a gedankenexperiment: it takes inputs and generates more or less reasonable outputs, but nobody has any idea what's going on in the middle. In our case the inmate's data go in and the machine says either 'yes' or 'no'. Trying to find out why it decided one way or the other is a futile exercise.
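To make the picture concrete, here is a minimal sketch of such a box in Python. Nothing in it comes from any real system - the features are hypothetical and the weights are random rather than trained - but the shape of the situation is the same: numbers go in, a verdict comes out, and the values in the middle mean nothing to a human reader.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical inmate features: age, number of prior arrests, employed (1/0).
    features = np.array([34.0, 2.0, 1.0])

    # Two layers of randomly initialised weights stand in for a trained network.
    W1, b1 = rng.normal(size=(3, 8)), rng.normal(size=8)
    W2, b2 = rng.normal(size=(8, 1)), rng.normal(size=1)

    hidden = np.tanh(features @ W1 + b1)           # nobody can say what this "means"
    score = 1 / (1 + np.exp(-(hidden @ W2 + b2)))  # a probability-like number

    print("release" if score[0] > 0.5 else "keep in jail")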
So far, so good. The inscrutability of the process plays nicely with the perception of the whole thing as a lottery.
However, imagine someone collects the outputs from the machine and after a while realises that most of the people ordered to stay in jail are black. The white people, on the other hand, are mostly released. The damned box is a racist!
But the designers of the neural network certainly haven't put any prejudice against black people into the system! They've just assembled some neuron-like components into a mesh and that's it. Also, we have no way to find out what the neural network "thinks". There's no way to say whether it's driven by the colour of the skin or by some other, unknown, factors.
What now?
Are we going to give up on data-driven methodology? If so, the decisions we make will likely be much worse.
Or are we going to give up on justice, crushing random small people under the wheels of the data-driven approach?
And whatever your answer is, keep in mind that it applies both to the jail case and the Amazon case.
Martin Sústrik, August 19th, 2015
The question is twofold:
a) Reliability: Can we tolerate errors when the cost of an error is high? If there is no other technical way, then we simply accept them, but there might be solutions that are costly yet error-free.
Would we allow errors if the program decided on the death sentence of prisoners?
b) Externalities: When we decide to maximize a welfare function, we need to check whether that function maximizes a quantity that is good for everyone.
Did Amazon's equation maximize value for its customers, or the reduction of its own infrastructure costs?
If Amazon were liable for any damages due to errors, that would push them to design a data-driven system that is good for all.
a) Sometimes errors are inevitable, at least as long as programs are written by people. Consider self-driving cars. They have to be programmed for killing, e.g. for the case when the car has to choose between killing one person inside and five people outside. It's quite a challenge to cover every possible real-life situation, so errors are inevitable. On the other hand, even error-prone (to some reasonable extent) self-driving cars can save lots of lives while the system is under constant improvement. Would we allow errors here? I hope so.
b) It can still be a question of cost. If errors are infrequent enough, compensating the losses can be cheaper than preventing the errors. Consider credit card fraud: banks usually just compensate the losses without even investigating most cases, simply because investigation is too expensive and not very effective.
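A back-of-the-envelope sketch of that trade-off, with every figure invented purely for illustration:

    # Invented numbers; only the shape of the comparison matters.
    fraud_rate = 0.001          # fraction of transactions that are fraudulent
    avg_loss = 80.0             # average loss per fraudulent transaction
    investigation_cost = 120.0  # cost of manually investigating one case

    cost_per_txn_compensate = fraud_rate * avg_loss
    cost_per_txn_investigate = fraud_rate * investigation_cost

    print("compensate: ", cost_per_txn_compensate)   # 0.08 per transaction
    print("investigate:", cost_per_txn_investigate)  # 0.12 per transaction

With numbers like these, quietly refunding the customer is cheaper than investigating, even before counting the cases where the investigation finds nothing.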
I don't really agree with your characterization of "data-driven -> machine learning". I think that's a significant step, and I would argue that in fact machine learning algorithms will be prone to being less purely data-driven than more primitive algorithms.
In the end, it seems that using data-driven algorithms with human moderation is the best solution. That would remove some of the prejudice of human beings and combine the huge volume that machines can handle with the higher-level reasoning that humans can apply to the hard-to-decide edge cases.
I'm not so sure I share your sentiments about the neutrality of a neural network. You seem to take it for granted that an algorithm must be neutral if we don't understand how it works. I don't see the line of reasoning there. It seems perfectly possible that it might produce unjust results, despite that (or because!) we do not understand how it works.
To state the obvious: a neural net is just a function. It has inputs and outputs. And even though its parameter space is too large for us to reason about, we can still observe its performance and make human (and humane) judgements about how well it is doing. It's not necessarily neutral, and I'd be perfectly comfortable calling an assemblage of bits "unjust", especially if it were keeping all the black folks in jail, but letting the white folks out.
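For instance, a toy audit of the box's outputs might look like the sketch below. The records and group labels are fabricated; the point is only that outcome rates per group can be measured without ever opening the box.

    from collections import defaultdict

    # Invented (group, decision) pairs collected from the machine over time.
    decisions = [
        ("black", "jail"), ("black", "jail"), ("black", "release"),
        ("white", "release"), ("white", "release"), ("white", "jail"),
    ]

    jailed = defaultdict(int)
    total = defaultdict(int)
    for group, decision in decisions:
        total[group] += 1
        jailed[group] += decision == "jail"

    for group in total:
        print(group, "kept in jail:", jailed[group] / total[group])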
And the obvious question I think you passed over: neural networks are not really a product of the assembly of neurons as you say, but rather the weights assigned to those neurons during training. And if your training data are biased directly (perhaps they include race?) or indirectly (perhaps they include data which correlates to race, e.g. poverty?) you could well end up with a network which is making decisions based on human failings, rather than any sort of wisdom. As we know, police are far more willing to arrest and charge minorities than they are rich white folks, so I am quite comfortable claiming that data aggregating crimes will be biased in favor of keeping minorities in jail.
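To illustrate the indirect route, here is a sketch on entirely synthetic data. The protected group is never given to the model; a correlated "income" feature is, and the historical labels already encode biased record-keeping. A plain logistic regression - far simpler than a neural network, but enough to show the effect - picks the bias up anyway.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 2000

    group = rng.integers(0, 2, n)                         # two synthetic groups, 0 and 1
    income = rng.normal(np.where(group == 1, 20, 40), 5)  # group 1 is poorer on average
    priors = rng.poisson(2, n)

    # Biased historical labels: the same behaviour gets recorded as a crime
    # more often for group 1.
    label = (priors + np.where(group == 1, 1.5, 0.0) + rng.normal(0, 1, n) > 3).astype(float)

    # Group membership is deliberately NOT a feature; income acts as its proxy.
    def standardize(x):
        return (x - x.mean()) / x.std()

    X = np.column_stack([standardize(income), standardize(priors), np.ones(n)])

    # Plain logistic regression fitted by gradient descent.
    w = np.zeros(X.shape[1])
    for _ in range(2000):
        p = 1 / (1 + np.exp(-X @ w))
        w -= 0.1 * X.T @ (p - label) / n

    flagged = 1 / (1 + np.exp(-X @ w)) > 0.5
    print("group 1 flagged as high-risk:", flagged[group == 1].mean())
    print("group 0 flagged as high-risk:", flagged[group == 0].mean())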
I know you're not arguing directly for such a system for handling bail. However, since you used it as an example to support a "data-driven" approach, I felt the need to respond. I want society to be more data-driven in general. However, when we use data to decide things, we can only be as smart as the algorithms we use when we process those data into decisions. When we use the wrong algorithm, and it produces bad results, I would argue that *yes*, the people who chose the algorithm are to blame.
One bit Martin (perhaps intentionally, to spark the discussion) left out: the machine can most certainly become racist, because our current ML techniques pretty much amount to a "generalizations box" - a magnifying glass for our collective data, prejudices included - not a true consciousness making its own decisions.
In practice, because of institutional racism, black people are overrepresented in crime statistics in the US. We feed this overrepresented past data in as the training dataset, and the machine creates neuron-weight circuitry for "aha! black person, most likely a criminal!". Incidentally, it's the exact same mechanism by which prejudice self-amplifies in the general population, too.
One possible approach could be "opinion affirmative action": basically, introduce a reverse bias in the input data against known prejudices - even if it counters past statistics - and hope that it will break the previous feedback loop of self-fulfilling prophecy (e.g. the assumption that black people are criminals may be what actually sets them on a criminal path).
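For what it's worth, one rough sketch of what such a counter-bias could look like: reweight the historical records so that each (group, label) combination carries equal total weight in training, so the over-policed group no longer dominates the positive labels. This is just one possible reading of the idea, not an established recipe.

    import numpy as np

    def counter_bias_weights(group, label):
        # Give each (group, label) combination equal total weight in training.
        weights = np.ones(len(label), dtype=float)
        for g in np.unique(group):
            for y in (0, 1):
                cell = (group == g) & (label == y)
                if cell.any():
                    weights[cell] = 1.0 / cell.sum()
        return weights / weights.sum()

    # Tiny synthetic example: group 1 dominates the positive labels.
    group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
    label = np.array([0, 0, 0, 1, 1, 1, 1, 0])

    w = counter_bias_weights(group, label)
    for g in (0, 1):
        share = w[(group == g) & (label == 1)].sum()
        print("group", g, "share of positive-label weight:", round(share, 3))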
The same goes for a lot of ML systems. You won't get justice, only a coarse statistical estimate based on previous observations - not true rational reasoning - unless you give the system far, far more data (as well as advanced training techniques such as generating hypotheses from the corpus and testing them). This is far more difficult than just dumping in some narrow database of past data, yet it's what most commercial "screening" ought to do; otherwise they have merely automated the coarsest of human prejudice.