5by5: Access and Algorithms with Lisa Nash

A 5by5 Conversation with Lisa Nash, Designer + Data Scientist at IDEO, about how access is changing based on algorithms and what we can do to get it right.

 

Interview by Twisha Shah-Brandenburg and Thomas Brandenburg

“The role that diversity plays in data science is similar to the role that it plays in other areas of design. Without diversity on teams you’re a lot more likely to overlook some of the people you’re designing for or design something with unintended consequences for a specific group of people.”

 

Question 1
Is it possible to create a completely unbiased algorithm?

The term “algorithm” can be used to describe a lot of different things. In general terms, an algorithm is a process that you use during problem solving. Usually it’s used in the context of solving problems with a computer.

For the context of this question, I’ll discuss predictive algorithms. Predictive algorithms involve analyzing existing data to infer outcomes for the future. For example, we could be talking about analyzing data about demographics, attendance, grades, discipline record, and so on for students with the intent of predicting whether the students will finish high school. This would involve “training” a model (algorithm) with a large-ish data set with “features” (such as grades, discipline record, and attendance record) and known outcomes for a group of students. Prediction is performed when we have the features for a student but don’t yet know the outcome.
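To make the training-and-prediction distinction concrete, here is a minimal sketch in Python using scikit-learn. The feature names, values, and outcomes are invented for illustration; they don’t come from any real data set.

```python
# A minimal sketch of training a predictive model on hypothetical student
# records; the features and outcomes below are invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Historical records: [GPA, attendance rate, number of discipline incidents]
X_train = np.array([
    [3.4, 0.95, 0],
    [2.1, 0.80, 3],
    [3.8, 0.98, 0],
    [1.9, 0.70, 5],
    [2.8, 0.90, 1],
    [2.3, 0.75, 4],
])
# Known outcomes: 1 = finished high school, 0 = did not
y_train = np.array([1, 0, 1, 0, 1, 0])

# "Training": fit the model to historical features and known outcomes
model = LogisticRegression()
model.fit(X_train, y_train)

# Prediction: a new student whose outcome we don't yet know
new_student = np.array([[2.5, 0.85, 2]])
prob_finish = model.predict_proba(new_student)[0, 1]
print(f"Predicted probability of finishing: {prob_finish:.2f}")
```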

Our algorithms are trained on data that may have bias, so bias may be inherent in the predictions.  I think it’s important to make a distinction here that the algorithm isn’t the thing that’s biased per se. The point of algorithms or models is to pick up on patterns in data—that’s exactly what it’s intended to do. But that can have bad consequences when the algorithm picks up on and reinforces prejudices that exist in our society and are reflected in our data.  There are ways to mitigate those effects, which I think is what we’ll get into with some of the other questions.

It’s also important to remember that human beings are the ones who decide what to do with the output. Your algorithm may tell you that a student is 80% likely to drop out of school, or that someone is 90% likely to default on a loan—but it’s up to humans to decide how much to trust and what subsequent actions to take with those predictions.
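As a purely hypothetical illustration of that point: the model only hands back a probability, while the cutoff and the follow-up action are human choices.

```python
# The model outputs a probability; people choose the threshold and the action.
prob_dropout = 0.80  # hypothetical prediction for one student

INTERVENTION_THRESHOLD = 0.70  # set by humans, not by the algorithm

if prob_dropout >= INTERVENTION_THRESHOLD:
    action = "flag for counselor outreach"
else:
    action = "no intervention"
print(action)
```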

 

Question 2
What is the role of diversity in the planning and creation process of algorithms?

The role that diversity plays in data science is similar to the role that it plays in other areas of design.  Without diversity on teams you’re a lot more likely to overlook some of the people you’re designing for or design something with unintended consequences for a specific group of people.

An illustrative example of this is the “racist soap dispenser.” Automatic soap dispensers transmit infrared light and measure the amount reflected back to determine when to dispense soap: soap is dispensed when the measurement surpasses a certain value. The designers of some soap dispensers didn’t account for the fact that darker skin reflects less infrared light when designing their dispensing algorithm. This led to a case where soap dispensers installed in a hotel did not dispense soap for darker-skinned individuals. This kind of oversight is much less likely to happen when a design team is diverse.
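A simplified sketch of that dispensing logic, with invented sensor values, shows how a threshold calibrated only against lighter skin can fail silently for darker skin.

```python
# Simplified threshold-based dispensing logic; the sensor values and the
# calibration constant are invented for illustration.
DISPENSE_THRESHOLD = 0.6  # hypothetical value, tuned only on lighter skin

def should_dispense(reflected_ir: float) -> bool:
    """Dispense when the reflected infrared reading exceeds the threshold."""
    return reflected_ir > DISPENSE_THRESHOLD

print(should_dispense(0.8))  # stronger reflection -> True, soap dispensed
print(should_dispense(0.4))  # weaker reflection -> False, no soap
```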

In my work at IDEO, data science components (predictive algorithms or otherwise) don’t exist in a vacuum; they’re usually in service of some user. Diversity of thought helps us consider not only what will be helpful, but also what kind of data we’re collecting, how we use the data, and potential repercussions.

 

Question 3
Blind orchestra auditions were introduced to keep biases in check so that the focus is on the music and not the demographic information that can make the decision-making process subjective. What might we learn from this as we design algorithms that make decisions?

In designing any kind of algorithm, we can choose to leave out anything that we think may explicitly introduce bias (e.g. demographic data). However, that doesn’t necessarily mean that we’re excluding bias. Variables that aren’t explicitly demographic can be highly correlated with the excluded demographic variables, like race or gender.
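One quick, hypothetical way to check for this kind of proxy effect is to measure how strongly a retained feature correlates with the demographic variable that was dropped. The data and column names below are invented for the sake of the sketch.

```python
# A minimal check (with invented data) of whether a "neutral" feature is
# acting as a proxy for an excluded demographic variable.
import pandas as pd

df = pd.DataFrame({
    "zip_code_income": [32, 35, 90, 95, 40, 88],  # feature kept in the model
    "group":           [0,  0,  1,  1,  0,  1],   # demographic column dropped
})

# A high correlation means the retained feature can stand in for the
# excluded demographic variable, reintroducing the bias we tried to remove.
print(df["zip_code_income"].corr(df["group"]))
```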

An example of this that I find interesting comes from my days as a physics PhD student. The physics Graduate Record Examination (GRE) is a multiple-choice test that is supposed to assess an applicant’s mastery of undergraduate physics material (and therefore be a predictor of success in PhD programs). However, studies have shown that women and underrepresented minorities perform significantly worse than white males, and high scores are not indicative of measurable aspects of success in grad school, like publications or even completion (link, link). Even if demographic data is not considered, including the physics GRE score still skews the demographics of grad school programs. This realization has led some schools to exclude the GRE score and consider other factors with more weight. They’re seeing an increase in diversity of students without sacrificing the quality of their programs (link).

In the design of predictive models and interventions, we need human beings who are able to think through the connections between different variables and the implications of algorithm design. This necessarily means that algorithms aren’t a “black box” and that whoever is designing the intervention understands the meaning of what is going into and coming out of the model.

 

Question 4
What are the long-term effects of feedback loops? How might data scientists / designers and engineers think about and monitor their data?

In data science, feedback loops are sometimes essential because they can be used to improve the predictions of the model. Like anything, though, these feedback loops can be harmful when they reinforce biases in data.

For a concrete example, let’s examine the use of data science and predictive algorithms in policing. In these cases, law enforcement agencies predict where crime might occur and who might commit it. People and places predicted by the model can then be monitored more closely. There is reasonable concern that these practices will lead police to target minority communities and individuals. Because predicted people and locations are monitored more closely, they may see higher arrest rates for the same level of crime. Arrest data is dependent on both the level of crime and enforcement for a community. A related use of predictive models is to assign risk scores to criminal offenders. Evidence suggests that black offenders are falsely flagged more often than white offenders as being high risk.
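A toy simulation, with made-up numbers, shows how this kind of loop can lock in an initial difference: two areas with identical underlying crime, patrols allocated according to past arrest counts, and recorded arrests that depend on both crime and scrutiny.

```python
# Toy feedback-loop simulation; all rates and counts are invented.
true_crime_rate = {"area_A": 0.10, "area_B": 0.10}  # identical by construction
arrests = {"area_A": 12, "area_B": 10}              # small initial difference

for year in range(5):
    total = arrests["area_A"] + arrests["area_B"]
    for area in arrests:
        patrol_share = arrests[area] / total  # patrols follow past arrests
        # Recorded arrests depend on crime *and* on how closely an area is watched
        arrests[area] += int(1000 * true_crime_rate[area] * patrol_share)
    print(year, arrests)

# The arrest gap between the two areas persists and widens in absolute terms,
# even though the underlying crime rate never differed.
```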

To deal with situations like this, data scientists and designers can think through the implications during initial algorithm development. Algorithm developers have to consider the true meaning of their data (e.g. in policing we only have data about arrests or convictions, not actual crime). If data scientists don’t have the expertise to understand the context, they can get input from experts in fields related to the model’s predictions and devise plans for monitoring results later on.

An effort must also be made to educate those who will be using the model’s output to make decisions about the actual meaning of the results. As I think I’ve said before, algorithms that are used to make major decisions about individuals shouldn’t be black boxes to people with the power to make decisions.

(By the way, there are lots of interesting articles to read about predictive policing and bias: here, here, here, here)

 

Question 5
What is the future of data science? What signals are you looking at that are making you excited and worried?

I am excited about the push for human-centered data science and the realization that data science should involve creative design. At IDEO, we design based on user needs instead of merely looking for all the possibilities of what we can do with data. Designing in this way gives us some natural guardrails because data science is always in service of a user.

This perspective isn’t unique to IDEO, and many people are discussing the merits of human-centered data science. It’s great that we have venues for discussing these ideas, like IIT’s Design Intersections conference and this year’s EPIC conference.

 

Interested in this topic? Register to be part of a larger community at the Design Intersections conference in Chicago May 24-25, 2018.