
Big Data And The Problem Of Bias In Higher Education


The use of big data, predictive analytics and other modeling techniques to understand and drive outcomes has grown explosively across all types of organizations over the past decade.

Advocates of artificial intelligence enthusiastically tout the benefits of data to predict and, in some cases, alter key processes and outcomes. Higher education institutions are no different. They are increasingly turning to predictive analytics to help understand and improve student success.

While it is true that there is power in predictive analytics, they are no panacea — especially not within the context of diversity and inclusion. Concepts such as “AI” and “machine learning” are assumed to be neutral by definition, yet all predictive models are shaded by human judgment, which we know falls far short of being error-free.

Decades of research on implicit bias show the limitations of human decision making across a number of settings. A recent report from the Ohio State University’s Kirwan Institute for the Study of Race and Ethnicity specifically cautions against the use of predictive analytics. The report asserts that there are potential cognitive and systemic racial biases that impact both the design of data models and the interpretation of their findings.

Thus, it’s fair to ask: Given the human element, can big data ever be bias-free in the context of diversity?

Probably not. But even with these risks, the need for predictive analytics and data-based models within higher education is clear. Many institutions use big data models in an attempt to improve student outcomes in retention, graduation, engagement and career placement. Data use is becoming a competitive advantage, helping institutions meet annual enrollment, retention and revenue goals. Student data is gathered to build models that predict students' choices, actions and outcomes, and decisions about student support services, programming and resources are made on the basis of that data.

While some cheer this development as much-needed progress that helps higher education to become more data-driven or evidence-based, others are raising a red caution flag. They point to a range of issues such as the accuracy, security and privacy of the data and the potential for a diversity bias against minoritized and underrepresented student groups.

The increased focus on student success is important given the rising cost of college and student debt. This has led some colleges and universities to use data analytics to predict the types of students who are more likely to need support from academic advising in the form of early intervention. Other institutions are using predictive models to offer adaptive learning tools that faculty can use to help identify and assist students who may need additional support in the classroom. Admissions managers are increasingly relying on predictive analytics to improve enrollment plans, target marketing efforts by student segments and provide customized scholarships and financial-need awards.

What could be wrong with these examples?

Some argue that predictive models must avoid over-reliance on key student demographics such as financial background, family economic status, race, gender or cultural background. When fed into predictive models, these data points can easily perpetuate the structural and historical inequities that persist in access to education.

Furthermore, relying on this demographic data can mean overlooking factors that are absent from big data sets but are nonetheless powerful predictors of key outcomes for diverse student populations. And because data is seen as unbiased, predictive models tend to be perceived as a true picture of fixed outcomes rather than what they truly are: educated guesses that use the past to predict the future.
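
This dynamic is easy to demonstrate. The sketch below is a toy illustration, not any institution's actual model: it uses synthetic data, hypothetical field names and scikit-learn (all assumptions of mine) to train a retention model on records in which a historical access gap has been deliberately built in. The model dutifully relabels that gap as "risk."

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

# Hypothetical records: 'group' stands in for any demographic field, and a
# historical access gap is deliberately baked into the synthetic data.
group = rng.integers(0, 2, n)             # 0 = majority, 1 = minoritized
prep = rng.normal(0, 1, n) - 0.5 * group  # preparation shaped by unequal access
retained = (prep + rng.normal(0, 1, n) > 0).astype(int)

# Train a retention model on the historical outcomes, demographics included.
X = np.column_stack([prep, group])
model = LogisticRegression().fit(X, retained)

# The model flags minoritized students as higher risk on average. That is not
# a fact about any individual student: the training data encodes a structural
# gap, and the model faithfully reproduces it.
risk = 1 - model.predict_proba(X)[:, 1]
print("mean predicted risk, group 0:", round(float(risk[group == 0].mean()), 3))
print("mean predicted risk, group 1:", round(float(risk[group == 1].mean()), 3))
```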

Consider another example. Using zip codes to predict outcomes, preferences or other key factors ignores the historical impact of redlining, which has had a demonstrated negative impact on African American and Latino families. This is an example of association bias, which occurs when the data fed into a predictive algorithm or model carries inherent biases associated with gender, race, ethnicity or culture.
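
One practical check for this kind of bias is a proxy audit: test how well the supposedly neutral feature predicts the protected attribute. The sketch below is a minimal illustration under assumptions of mine (synthetic data, a two-cluster stand-in for zip codes, scikit-learn); the point is the method, not the numbers.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 5000

# Synthetic segregation: group membership largely determines the zip-code
# cluster, mirroring the residential patterns that redlining produced.
group = rng.integers(0, 2, n)
zip_cluster = np.where(rng.random(n) < 0.85, group, 1 - group)

# Proxy audit: how well does the "neutral" zip-code feature recover group
# membership? A high AUC means the feature is effectively a demographic input.
proxy = LogisticRegression().fit(zip_cluster.reshape(-1, 1), group)
scores = proxy.predict_proba(zip_cluster.reshape(-1, 1))[:, 1]
print(f"zip code predicts group with AUC ~ {roc_auc_score(group, scores):.2f}")
# Any model given zip code effectively sees race, whether or not race
# appears as an explicit input.
```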

Another source of caution is confirmation bias, which arises when the people who design predictive models look for and use information that supports their preexisting ideas or beliefs, while information that fails to support their thinking is disregarded or discarded. Confirmation bias can be especially problematic in a diversity context because it tends to take hold precisely when we want certain ideas to be true. It can shape both the kind of data that is selected (or not selected) and the judgment of which data is relevant or appropriate for our so-called unbiased models.

One way to combat bias in our predictive models is to expect it. That begins with acknowledging that the selection of data, the definition of models, the interpretation of findings and the actions taken on those findings are inescapably influenced by the same implicit biases that shape everyday human behavior. We are human, after all.

Placing the label of predictive analytics on data does not ensure that errors in judgment will not take place. Nor does it change the fact that these biases are extremely problematic within a diversity context.

Therefore, we must constantly challenge the sources of our data, the assumptions behind it, the methods used to gather it, and the ways it is interpreted and used, especially when it comes to underrepresented or minoritized groups. Within higher education, we should focus our attention and resources not merely on modeling the trajectories of outcomes but also on innovative approaches to changing or disrupting them.
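
What might that constant challenge look like in practice? One concrete habit, sketched below with toy data and hypothetical field names (my assumptions, not a prescribed standard), is to disaggregate a model's error rates by group before acting on its predictions. A gap in false-negative rates, for instance, means the model is quietly failing to flag at-risk students in one group.

```python
import numpy as np

def error_rates_by_group(y_true, y_pred, groups):
    """Report false-positive and false-negative rates per group."""
    report = {}
    for g in np.unique(groups):
        m = groups == g
        t, p = y_true[m], y_pred[m]
        fpr = ((p == 1) & (t == 0)).sum() / max((t == 0).sum(), 1)
        fnr = ((p == 0) & (t == 1)).sum() / max((t == 1).sum(), 1)
        report[int(g)] = {"FPR": round(float(fpr), 3), "FNR": round(float(fnr), 3)}
    return report

# Toy scenario: a model that silently misses more at-risk students in group 1.
rng = np.random.default_rng(2)
y_true = rng.integers(0, 2, 1000)
groups = rng.integers(0, 2, 1000)
y_pred = y_true.copy()
missed = (groups == 1) & (y_true == 1) & (rng.random(1000) < 0.3)
y_pred[missed] = 0  # roughly 30% of group-1 positives go undetected

print(error_rates_by_group(y_true, y_pred, groups))
```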

Some people may continue to argue that big data is neutral and unprejudiced. They are wrong. We should continue to move forward with an embrace of data-driven decisions and predictive models. And while doing so, we must be forever mindful that while data may not discriminate, people still do.
