In may of this year, shortly before giving an acclaimed ted talk elaborating on many of these points, zeynep tufekci wrote the following in the new york times.
If a bridge sways and falls, we can diagnose that as a failure, fault the engineering, and try to do better next time. If Google shows you these 11 results instead of those 11, or if a hiring algorithm puts this person’s résumé at the top of a file and not that one, who is to definitively say what is correct, and what is wrong? Without laws of nature to anchor them, algorithms used in such subjective decision making can never be truly neutral, objective or scientific.
Programmers do not, and often cannot, predict what their complex programs will do. Google’s Internet services are billions of lines of code. Once these algorithms with an enormous number of moving parts are set loose, they then interact with the world, and learn and react. The consequences aren’t easily predictable.
First, i want to break down two issues here:
- One is the unintended consequences of designing an algorithm.
- The second is the unexpected consequences of performing a complex task atop a set of opaque abstractions.
The two seem similar, and they're definitely related, but they are different. I'm going to talk about the second because it's more insidious than the first. It is easier to look at an algorithm and see outright bias; it is harder when the behavior is a side-effect of the unexamined limitations of the underlying system.
I want to focus on zeynep's google example first because when i was in college, my system design class devoted a section of reading to looking at large scale systems. One of the fun ones at the time (2005) was google's mapreduce: designed both to make parallelism accessible to lowly mortals and simultaneously show off their clever use of commodity hardware, as you can see in the mapreduce paper's abstract:
Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.
Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.
Emphasis mine[1], because the whole crux is that mapreduce allows for the peasant ails of commodity machines while doing the best it can to insulate you from catastrophic failure and returning some usable results (and performantly!). Plus, you don't even really need to understand parallelism to make good use of it!
There was a moment while we discussed the paper where we had this cute illustration, i'll do my best to explain it since it's basically a funnel: there was The Cloud™, it represented all possible data on the internet. A subsection of that cloud was the amount of the public-facing internet that google had indexed. A subsection of that cloud was the entirety of the results google knew about for your given query. And finally, a subset of that cloud represented the results of your query that had survived the search job* despite timeouts/latency/commodity hardware/solar flares/etc.[2]
The point being that for almost all queries, there were an impossibly large number of possible results for the specific thing you were looking for and (improbably) the least novel thing was getting your search results within some window of time[3]. In a mapreduce world, the system 'deals' with the issue of a result not showing up by returning whatever it has at the end of some timeout. This works because pretty much any internet user has absolutely no clue how many of the results could be of any actual value (to say nothing of how many results there were to begin with). So what if pages 4-6 of the Total Results don't show up? Do you even go that far? Would you notice? (At the time, and still today, people tend to not look past the first page of results.) Even if one or two links are missing, you still get most of the relevant data back (better than nothing).
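To make the "return whatever you have when the timer runs out" behavior concrete, here is a minimal sketch. It is not how google's search (or mapreduce) is actually implemented; the `Shard` class, the `search` function, the latencies and the 200ms deadline are all made up for illustration, and latency is simulated with a number rather than real waiting.

```python
from dataclasses import dataclass


@dataclass
class Shard:
    """Hypothetical slice of the index living on one commodity machine."""
    name: str
    latency_ms: int      # simulated time this shard takes to answer
    results: list


def search(shards, deadline_ms):
    """Collect results only from shards that answer before the deadline.

    Returns (results, dropped): the partial answer, plus the names of the
    shards whose results were silently left out.
    """
    results, dropped = [], []
    for shard in shards:
        if shard.latency_ms <= deadline_ms:
            results.extend(shard.results)
        else:
            dropped.append(shard.name)
    return results, dropped


shards = [
    Shard("a", 40, ["result-1", "result-2"]),
    Shard("b", 900, ["result-3"]),   # slow disk, flaky network, solar flare...
    Shard("c", 75, ["result-4"]),
]

results, dropped = search(shards, deadline_ms=200)
print(results)   # ['result-1', 'result-2', 'result-4']
print(dropped)   # ['b']
```

The caller gets a perfectly plausible page of results and, unless the service chooses to expose `dropped`, no hint that `result-3` ever existed.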
slowly learning a few things about distributed systems pic.twitter.com/C2gRnHgau4
— Julia Evans (@b0rk) November 1, 2016
When the number of results for your query is in the tens of thousands, does it matter if 4500-4520 are missing (or even just slightly misordered)? You may scoff, but this is a legitimate question! If it matters, much like a tree in the woods, does it matter if you never look at the results (or are statistically unlikely to look at them)? Does it matter if the results are correctly ordered the next time you reload the page?
This is not to say that there isn't bias in google's search result algorithm (it thankfully is biased against some kinds of harmful content), but that there is a belief that any result is better than no result. (for something not mission critical, you can argue that focusing on Availability is not at all unreasonable, at least that's my claim)
Facebook's lee byron (in a very interesting post about building a feed) makes the following observation about the benefits of lazy-loading a dataset presented to somebody browsing a feed.
How many followers will the most popular content sources have?
Twitter used to call this the "Bieber problem": when he tweeted, a fan-out operation occurred, placing this new tweet into the feeds of every one of his followers which was a massive job that locked up whole machines. In Twitter's earlier years, a couple quick @justinbieber tweets in a row could "fail whale" the whole service.
The "Bieber problem" is an illustration of a critical decision of the architecture of a feed service: do you build a user's feed at write time or at read time? Twitter's architecture placed tweets into feeds as soon as they were written into the service. By contrast, Facebook's architecture waits for a user to read their feed before actually looking up the latest posts from each followed account.
...
Should the story at the very top of your activity feed be the very most latest thing that happened, or should it be the very most interesting thing that happened? This is more of a product decision than an architectural one, but it has serious implications on your feed architecture.
I would argue that he understates how much architectural choices can retcon their way into product decisions. If a tweet is deleted from my feed as i'm reading, can i really be all that angry? "Who's right" in that case aligns with the moral quandary of the probabilistically failing google mapreduce server: if @justinbieber deletes a tweet but you're not finished reading it, should you have been entitled to finish it?
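The write-time vs read-time split from the excerpt, and what it does to a deleted post, can be sketched roughly like this. This is a toy under stated assumptions, not Twitter's or Facebook's actual design: the `followers`/`following` tables, the function names and the in-memory dictionaries are all hypothetical stand-ins.

```python
from collections import defaultdict

# Hypothetical social graph: who follows whom.
followers = {"bieber": ["ann", "bob", "cat"], "dan": ["ann"]}
following = {"ann": ["bieber", "dan"], "bob": ["bieber"], "cat": ["bieber"]}

feeds = defaultdict(list)   # write-time model: one materialized feed per user
posts = defaultdict(list)   # read-time model: each post stored once, per author


def post_fanout_on_write(author, text):
    """Copy the post into every follower's feed; cost grows with follower count."""
    for user in followers.get(author, []):
        feeds[user].append((author, text))
    return len(followers.get(author, []))


def post_stored_once(author, text):
    """Store the post a single time; feeds get assembled later, on read."""
    posts[author].append(text)
    return 1


def read_feed(user):
    """Build a feed on demand from everyone the user follows."""
    return [(a, t) for a in following.get(user, []) for t in posts[a]]


writes = post_fanout_on_write("bieber", "hello")   # one write per follower
post_stored_once("bieber", "hello")
ann_feed = read_feed("ann")

# The author deletes the post from the single stored copy...
posts["bieber"].remove("hello")
after_delete = read_feed("ann")   # gone on the next read
stale_copy = feeds["ann"]         # still sitting in the materialized feed

print(writes, ann_feed, after_delete, stale_copy)
```

In the read-time model the deletion takes effect the next time anybody loads their feed; in the write-time model every copy has to be chased down separately, so whether you get to "finish reading" is decided by the architecture long before anyone frames it as a product question.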
A similar bit of architectural-influenced thought recently surfaced in a post from tristan harris, a google design ethicist:
For example, imagine you’re out with friends on a Tuesday night and want to keep the conversation going. You open Yelp to find nearby recommendations and see a list of bars. The group turns into a huddle of faces staring down at their phones comparing bars. They scrutinize the photos of each, comparing cocktail drinks. Is this menu still relevant to the original desire of the group?
It’s not that bars aren’t a good choice, it’s that Yelp substituted the group’s original question (“where can we go to keep talking?”) with a different question (“what’s a bar with good photos of cocktails?”) all by shaping the menu.
Moreover, the group falls for the illusion that Yelp’s menu represents a complete set of choices for where to go.
which, emphasis mine. and while i personally feel less morally pigeonholed by my phone (or yelp) (and i believe that tristan believes that people ought to be empowered by theirs) i really don't know that i am legitimately perturbed by yelp's incomplete view of the universe (to be clear, i don't want yelp asking me why i feel compelled to visit a bar right now (and self-improvement aside, i don't know that others would either)).
I'm reminded of this classic 'quote' from magician Teller about magic:
If you are given a choice, you believe you have acted freely. This is one of the darkest of all psychological secrets.
and again from tristan
We don’t miss what we don’t see.
I don't believe that any programmer (or ethicist) at google (and, from the way lee byron writes, those at facebook) would pretend that their algorithms behave 100% predictably (as we nerds would say, deterministically) each time they're run. The behavior of a feed or a search result page is, by necessity, the hybrid product of an architectural performance optimization, the whims of the hardware on a given day, solar flares, and hosts of other reasons, many of which are product/business-oriented (or, ways for you to make the service profitable).
Which, again, i want to return to that first excerpt from zeynep's piece:
If Google shows you these 11 results instead of those 11, or if a hiring algorithm puts this person’s résumé at the top of a file and not that one, who is to definitively say what is correct, and what is wrong? Without laws of nature to anchor them, algorithms used in such subjective decision making can never be truly neutral, objective or scientific.
Programmers do not, and often cannot, predict what their complex programs will do.
While i go along with much that zeynep argues for, trying to compare physical structures to the inner-workings of a database strikes me as completely out of whack. Similar to people who trust that "digital" provides some kind of archival panacea but don't realize that digital data (unless redundantly archived on resilient physical media) is subject to corruption, bit flips and other sundry damage[4]. Once your datastore spans multiple "availability zones" (continents) and needs to make interesting guarantees about atomicity, consistency and partition tolerance, i argue that you've gone way beyond the complexity of a bridge.
To claim that a system that gracefully tolerates hardware failure exhibits a bias is 100% true, and there are other biasing factors at work in any of these systems (which, see the depressing end of the spectrum). But it also gives the laws of nature an easy pass: what does it mean to have a law of nature anchor a search result? Do we choose a system that behaves predictably at the expense of convenience? Certainly the decision to "show the truth" vs "show a good approximation" is an ethical one, but how big of one is it? When does data-center latency become ethically fraught? What sorts of disclaimers do services need to furnish us with? What about our friends?
Julia Evans has written a fair bit about distributed systems and what people mean when they talk about things like consistency vs. availability (http://jvns.ca/blog/2016/10/21/consistency-vs-availability/).
... I know what availability is! It means I don’t get paged because the system is down. Basically it means if you ask, you can always get an answer! I like that. What’s wrong with that?
So! let’s say we have 100 computers in a “cluster”. I think “consistency” kinda means “if you query any computer in the cluster at the same time it will tell you the same thing”. It doesn’t really mean that (that is actually legit impossible because of the speed of light) but let’s go with it.
There was a time when we were lucky enough to interact with services that could plausibly run without concern for CAP and our conception of data-morality was simpler: either the server was down, or it wasn't. A successful query represented the only truth at a single point in time. The massively redundant world we live in now is much, much more complex. But developers, designers, and project leads are often insulated from this sort of infrastructure and what it means for data being returned from a query. If you lead or design a project, when was the last time you asked about whether a backing api request favored being "consistent" or "available" and what that meant for your product's consumers?
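For anyone who hasn't had to ask that question, here is a toy sketch of what the two answers look like from the caller's side. It is a deliberate oversimplification (real datastores lean on quorums, versioning and a great deal more), and `Replica`, `Unavailable`, `read_available` and `read_consistent` are names invented for this example rather than any real library's api.

```python
class Unavailable(Exception):
    """Raised when the store refuses to answer rather than risk being wrong."""


class Replica:
    def __init__(self, value, reachable=True):
        self.value = value
        self.reachable = reachable


def read_available(replicas):
    """Favor availability: answer from the first reachable replica, stale or not."""
    for replica in replicas:
        if replica.reachable:
            return replica.value
    raise Unavailable("no replica reachable")


def read_consistent(replicas):
    """Favor consistency: answer only if every replica is reachable and agrees."""
    if not all(replica.reachable for replica in replicas):
        raise Unavailable("cannot confirm that all replicas agree")
    values = {replica.value for replica in replicas}
    if len(values) != 1:
        raise Unavailable("replicas disagree")
    return values.pop()


# The newest write ("v2") only landed on a replica that is now cut off.
replicas = [Replica("v1"), Replica("v2", reachable=False), Replica("v1")]

print(read_available(replicas))   # v1
try:
    read_consistent(replicas)
except Unavailable as err:
    print("refused:", err)        # refused: cannot confirm that all replicas agree
```

Same cluster, same failure, and the product either shows you a slightly old answer or an error page. Neither outcome is a bug; it's a choice somebody made, probably without the designer in the room.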
When you ask yelp for a bar and it suggests a park, what sort of moralistic puritanic editorializing is that? also, how do you know it's not? when a hardware failure prevents you from seeing a retweet for a minute (but lets you see the rest of your feed), what kind of bias is that? Why are we so offended (and scandalized!) when we're presented with the equivalent of a gap in memory (or best-available information)? Is it because we don't know the reason behind it? Is it because it exposes the nature of our relationships with these services? When you're shopping around for services do you factor in Consistency vs Availability in your decision?
Some other unstructured thoughts:
My system design class also introduced another article about the Therac 25 that left many in the class somewhat scarred. For many of us, the short-term takeaway was "thank god i don't have to write medical firmware," but you see the moral fallout of the therac if you peek at any code or system for any appreciable length of time.
I see this argument quite often that engineers have it easy because physics and statics can be modeled in a predictable way. When bridges fall due to hardware failure or poor design (or freak accidents!) people can die. When your CRUD application gracefully handles a backend failure and rolls back an optimistic ui update, i don't really know what to tell you. They're not on-par.
I also think it's strange that the counterpoint to dumb ml-algorithms is a set of suggestions which seem to defeat the purpose of the relevant platform (and often with a very WASP-ey bent!): i want a popular bar (have you tried a park?). I want to see what my friends posted (are you being challenged enough?). I want some search results (who is john galt?).
I'm not going to defend google or facebook, both of whom have had significant issues due to their scale and the tragic effects of systems applied to a heterogeneous world. But it goes without saying that it's problematic to treat their unintended consequences like a therac-25's (which, again to be clear, i am not giving them a pass, please read this troubling article for a counterpoint to my claim) when they are much more like a highly opinionated video store employee that reads you (without asking) and then suggests a movie they're not really that into, but think you'd probably like. Even when google and co. get it wrong, we aren't often talking about damage on par with the effects of shoddy bridge (or car) engineering (although, again, that sentencing algorithm and minority-excluding ad dashboard are the absurdly dark counterpoint).
Facebook influences more people than any country, it has a moral responsibility to be factual and educational. Hasn't risen to it. https://t.co/pMdbLbxhqy
— Stephan Ango (@kepano) October 21, 2016
When the stakes are legitimately low, do we need transparency in how google or facebook make decisions about our newsfeed? that's a very good question! should we have transparency as to why my server is upselling today's special appetizer? should the server tell me that today's french dip soup is the result of some roast beef's last day of usable shelf life? should i ask the record store employee what cultural basis grounds all the recommendations they give me? Should a sales person be required to tell me up-front they work on commission? Or do we regulate some of these things instead: what if in addition to a privacy policy, each website advertised its Consistency or Availability infrastructural choices?
Here's a funny story: the other day i came down with a terrible head cold leaving me comically congested. So i walked to the pharmacy and began to buy a pack of sudafed. However, the Methamphetamine Menace of the early aughts and War on Drugs require that I supply my driver's license when purchasing pseudoephedrine to ensure that i haven't been purchasing untold quantities of pseudoephedrine for my purported walter white side-gig. But i say “began” because the electronic logbook that particular safeway used was offline. Or maybe it wasn't the website, but the network of the safeway pharmacy. Or who knows. At a very tangible level, my sinuses were the victims of a law that, for liability reasons, was implemented by everyone assuming a stable link-layer and an Available web service. Laws are meant to be general-purpose but when they imply a hard dependency on things like a database, what sorts of laws govern its audits and security? Should safeway be accountable to the question of security of their logbooks? (the violations are steep and result in a federal misdemeanor). Given the risks, would you buy sudafed from a pharmacy or state that had a poorly implemented logbook? We talk about transactions being problematic because you could double-charge a consumer, but what about an unthrottled double-entry of a controlled pharmaceutical that triggers an immediate police response?
I'll close this out with a thought on google glass. I still believe that google glass' failure wasn't a foregone conclusion, but it was poorly socialized. There were absolutely no norms around it and so it became odd to see people taking it to the extreme as users and other folks assuming the worst when going about their day. It was too many new things at once. But if you limit the scope, maybe you can have a better go at things.
Spectacles are going to up many people's Snapchat game, but I can't get past the power of capturing moments I never had a chance to capture pic.twitter.com/htjmMAEn8Z
— Sean O'Kane (@sokane1) November 18, 2016
w/r/t services, i am not going to cop-out and say "you don't have to use it," because the choice isn't always ours if we intersect with somebody else's use of some bias-ridden service. How do we resolve being tagged in a service we're not members of? Do we pass it off on the person doing the tagging or do we enforce policies on services? What about when the result isn't just annoyance but actively harmful?
For many of us that have used some form of irc/icq/aim/zephyr/jabber/etc for upwards of 20 years, the norms around slack communication feel oddly formal compared to prior systems which felt truly asynchronous (often responding to a message 12 or 24 hours later). The pitch "slack is the email killer" subconsciously migrates social expectations from email (attentiveness, timely replies) to the norms of irc or im (not all messages need an ACK). That's also under-selling the issue, but it's definitely part of it. Do workplaces that caveat 'important things need to be emailed' experience the same level of slack-anxiety? most workplaces eventually establish norms around communication, but it feels like slack gets left out, which has caused its own set of problems.
One of the hallmarks of bad software is it believes it is something more than a tool you use to accomplish a task. pic.twitter.com/YRarXyNdq1
— Steven Frank (@stevenf) July 28, 2016
i talk about norms because when we go to a restaurant, we know (or have learned) why our server is upselling us. When our friends tell us to skip the bar, we understand the reason behind the diversion. We have built up norms around these sorts of social situations: servers want to sell specials and perform well (and depending on the establishment, pad out their tip), our friends may no longer be drinking, or maybe their throat hurts from shouting at last night's bar or whatever.
So what of the societal norms around the apps we use? We've been around long enough now to see at least one and a half dot-com busts and we've been witnessing the effect of companies who put off monetization for easily five to ten years now (or end their incredible journeys as the result of selling their company to somebody else). Still, each time we act offended when we discover that these free apps are also working to earn their investors and employees a buck. Next time it will be different, though.
As a field, we designers spend a lot of time talking about how to design with empathy and grace. Still, i suspect the problem is mostly that people expect apps to behave like utilities, deterministic and honest in their focused execution. Now, we're discovering there's something strange about our landlord or our electric company talking to (and empathizing with) us like our best friend and then shaking us down for extra money when the conversation takes a turn.
latest @magic growth hack starts by acting like an old friend, ends by asking to "hop on a call for 10 minutes" 😒😬😡 pic.twitter.com/PyQedn5yeh
— Michael Grinich (@grinich) July 18, 2016
[1] although the meaning is now coded, at the time, the phrase "commodity machine" implied "this computer can barely guarantee an SLA" and "lol @ the lavish costs others spend for sparc/dec/ibm/etc servers running commercial bsd/vax/irix/etc." which made the abstract a sort of boastful triumph of architectural cleverness over cost and software over the unreliable nature of cheap hardware.
[2] (*map-reduce was, at least then, not actively used for search, but it was a good enough proxy for the behavior of a large fault-tolerant system that favored partial output to no-output)
[3] In the world of Databases and Computer Science, this "you get what you get and don't be upset" behavior is usually referred to as Availability and is usually pitted against Consistency, whose behavior is more like "i'll give you the most honest answer i have, but it may take a while (or you may never really get it)". The sophie's choice between the two of them is summed up in the CAP theorem.
[4] and for a multitude of reasons, the physical copy has its own problems.