The Dark Side of Big Data and Predictive Analytics?

Jeremy Wu

Retired, Federal Government

Published May 31, 2016

Since 2000, data sources have greatly expanded to include those generated on the Internet and by sensors. These digitized data accumulate rapidly in massive amounts, and can be linked, tracked and stored over time. Known generally as Big Data, they promise to enable unprecedented understanding of our society and economy.

Predictive Analytics – the application of algorithms and methods to digest and make sense of data – offers to predict future events to help spur economic growth and improve our quality of life. By uncovering the useful patterns hidden in data, a predictive model, combined with useful predictors and insights, may produce a quantitative score or a descriptive category to assess the chance of an event occurring. Informed decisions and policies may then be made by individuals, corporations or governments.

A classic example of Predictive Analytics is the credit rating used in the financial sector. Based on predictors such as a person’s marital status, income and payment history, a model can produce a score to indicate the likelihood of the person making future payments on time. The credit score is then used to approve or reject the issuance of a credit card or a loan application.
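The scoring step can be sketched in a few lines of Python. This is a minimal illustration only, assuming a logistic model; the weights, the intercept, the approval threshold, and the function names are all hypothetical and are not taken from any real credit bureau. A real model would estimate its weights from historical repayment data.

```python
import math

# Hypothetical weights, chosen only for illustration.
WEIGHTS = {"married": 0.3, "income_10k": 0.15, "on_time_rate": 3.0}
INTERCEPT = -2.5


def repayment_probability(married: bool, income: float, on_time_rate: float) -> float:
    """Return a logistic score: the estimated chance of paying on time.

    income is annual income in dollars; on_time_rate is the share of
    past payments made on time, between 0 and 1.
    """
    z = (
        INTERCEPT
        + WEIGHTS["married"] * (1.0 if married else 0.0)
        + WEIGHTS["income_10k"] * (income / 10_000)
        + WEIGHTS["on_time_rate"] * on_time_rate
    )
    return 1.0 / (1.0 + math.exp(-z))


def decide(probability: float, threshold: float = 0.8) -> str:
    """Turn the score into a decision using an assumed cutoff."""
    return "approve" if probability >= threshold else "reject"
```

Under these assumed weights, an applicant's score rises with a better payment history, and the decision changes only when the score crosses the threshold. The choice of threshold is a policy decision, not a statistical one.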

Applicability of Predictive Analytics is flexible and broad. If a person is replaced by a firm, the approach can predict the chance of the firm going bankrupt in the near future or assess the financial conditions of an entire economic sector collectively.

The Target Story

Target's prediction that a customer still in high school was pregnant, before her father knew, has been widely cited by advocates as a success story that shows the new power of combining Big Data and Predictive Analytics.

By applying a model to every female shopper in its database of up to 70 million customers, the giant retailer was able to not only assign a “pregnancy prediction score” on the probability of her being pregnant, but also estimate her due date within a small window. Target sent coupons to the likely pregnant customers timed to the different stages of this habit-changing moment of their life to forge long-lasting purchasing relationships. The approach was reportedly so successful that the angry father of a customer near Minneapolis complained about Target sending the marketing materials to his daughter who was still in high school, only to have to apologize later after he learned that his daughter was indeed pregnant.

Target’s approach has scientific appeal. It collects massive amounts of data on each customer through individual purchases and “cookies” on its website, and links them with other data it acquires commercially. Its baby-shower registry records the shopping habits of pregnant customers and identifies specific products as part of the reliable predictors. Stages of pregnancy are validated by public birth records. The predictors and algorithms are tested and evaluated for consistent patterns. The empirical evidence is further supported by decades of research into the science of habit formation in neurology and psychology.

Based on what is known about the approach, Target’s predictions of pregnancy will perform much “better” than random guessing in terms of accuracy and reliability. The angry father was an exclamation mark to confirm this “success.” In fact, the Target predictions were so accurate that coupons for pregnancy products had to be mixed with those for irrelevant products as noise to reduce privacy suspicions and concerns.

Big Data Predictive Analytics is not limited to the marketing industry. Its extensions to economics, social science, and health care are well known and have exploded in number and scope since 2000.

National Security: Terrorism, Economic Espionage, and Insider Threat

In the aftermath of the 9/11 terrorist attacks, the U.S. greatly expanded its surveillance programs on foreign targets and its own citizens. The existence of some of these programs and the magnitude of their operations were not known to the public until 2013. Subsequent reports revealed that tens of millions of Americans have been and continue to be subject to secret, digital surveillance by the U.S. government. The captured individual data range from emails, files, chats, and photos to videos and phone calls.

For example, the top-secret PRISM program is now known to have been launched in 2007 to intercept Internet communications of foreign terrorist targets operating outside the U.S., but information about U.S. citizens or anyone residing in the U.S. may also be captured “incidentally” in the process.

The surveillance programs started in 2001 operated under questionable presidential authority. Congress enacted the Foreign Intelligence Surveillance Act (FISA) Amendments Act in 2008 to transition from the original clandestine programs and the expiring Protect America Act of 2007.

The amendments inexplicably expanded the surveillance targets from “persons linked to al Qaeda or related terrorist organizations” to all targets for foreign intelligence. As a result, the surveillance programs on suspected terrorists were extended to also cover potential spies. This legislation was reauthorized in 2012; it will be up for reauthorization in 2017.

The amount of data collected under these surveillance programs is enormous even by Big Data standards. The Washington Post reported that every day, collection systems at the National Security Agency (NSA) intercept and store 1.7 billion e-mails, phone calls and other types of communications. According to The Guardian, NSA collected 97 billion pieces of intelligence from computer networks worldwide in March 2013.

So how does the U.S. government collect these Big Data and what does it do with them after they are collected?

The volume is obviously well beyond what any number of human personnel can process and digest. A 2014 report by the Privacy and Civil Liberties Oversight Board (PCLOB) describes “702-tasked selectors” for collecting data and “minimization procedures” for conducting queries. Very little is known about the algorithms – the steps, rules, and mathematical formulas – that are embedded in the electronic “black boxes” from data collection to the production of analytics and profiles. Therefore, potential errors, biases, and prejudices may move from humans into machines and become embedded when transparency and accountability are absent.

While acknowledging its potential to “enhance accountability, privacy, and the rights of citizens,” a White House study warns that Big Data technologies and analytics “have the potential to eclipse longstanding civil rights protections” and “can cause societal harms beyond damages to privacy, such as discrimination against individuals and groups.” According to the report, Big Data “unquestionably increases the potential of government power to accrue unchecked.”

Nonetheless, Americans have been asked to trust the secret processes because they are “legal and efficient” to prevent terrorist attacks; there are “safeguards” against abuse and misuse of authority; and deviations from the Fourth Amendment and traditional American values of privacy and civil liberties are small but necessary sacrifices. Since 2013, many of these assurances have proved to be at least misleading if not simply false.

Under Executive Order 13587 issued in 2010, federal departments and agencies have also been directed to establish insider threat detection and prevention programs. The “insiders” cover current and former federal employees, contractors, and extended relations to an individual under investigation. The extensive surveillance may include “keystrokes, screen captures, and content transmitted via email, chat, and data import or export.” At the same time, the government has been making rules to exempt itself from the basic requirements of existing privacy laws under this program, forbidding individuals to access their own security files and allowing erroneous information to stay in their files.

Emerging Pattern of Wrongful Profiling and Prosecution

Even when the predictors are ideally chosen and the algorithms are perfectly developed in the Target example, the prediction of a hypothetical woman being pregnant with a probability of 87% may still be wrong 13% of the time. When applied to millions of women, a large number of the predictions will be wrong.
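The arithmetic behind this point can be sketched as follows. The sketch assumes the scores are perfectly calibrated, so that a score of 87% means exactly 87 of every 100 such women are pregnant; the function name and the figure of one million flagged shoppers are hypothetical and chosen only to show the scale.

```python
def expected_wrong(flagged: int, probability: float) -> float:
    """Expected number of wrong predictions among flagged customers.

    Assumes every flagged customer has the same, perfectly calibrated
    probability of being pregnant.
    """
    return flagged * (1.0 - probability)
```

Calling expected_wrong(1_000_000, 0.87) gives roughly 130,000 women who would be flagged as pregnant when they are not. The error rate stays at 13%, but the absolute number of people affected grows in proportion to the size of the population scored.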

Conditions are seldom perfect. Big Data Predictive Analytics can be wrong in many ways for many reasons, ranging from an unproven inferential foundation, biased data sources, missed safeguards, human mistakes and prejudices, to malicious intent.

While false positives are recognized to be a real risk, there are few known discussions about their consequences in the Target story. Conceivably, the marketing materials were treated as junk mail and discarded; some angry fathers may have received an apology from the customer service department.

The consequences can be more serious in other applications. A false positive in a credit rating can deny a person or a firm a much-needed loan. Being wrongfully accused of spying for another country can ruin an innocent person’s life and career, including financial status, employment security, professional reputation, and health of family members.

The expanded authorities from the FISA Amendments Act in the late 2000s coincided with the public campaigns by the Department of Justice (DOJ) and the Federal Bureau of Investigation (FBI) against economic espionage, with China identified as the primary culprit. The targets of these campaigns correlate highly with persons of Chinese origin.

A disturbing pattern of wrongful profiling against Chinese Americans has begun to emerge in recent years. Intercepted emails and similar Internet communications formed the critical evidence used in many of these prosecutions. Despite numerous calls and petitions for explanations and apologies by Congressional members, community and professional organizations, and major media such as the New York Times, the U.S. government has so far failed to provide a satisfactory response.

CBS’s 60 Minutes reported on the “collateral damage” of Americans wrongly accused of espionage-related crimes on May 15, 2016. According to the report, DOJ has won convictions in 14 cases related to Chinese economic espionage since 2012. It lost one case at trial. Charges were dropped against five Chinese-born scientists, all American citizens.

In another case, a white former Boeing manager, whose wife is a Chinese American, was accused of Chinese economic espionage. He was subsequently prosecuted for another crime unrelated to national security after the information collected under a FISA order was repurposed. There are open questions about whether evidence was planted to extract information about espionage activities that did not exist. In a classic catch-22 situation, the accused was not allowed access to the government’s arguments in the FISA warrant due to national security. In the meantime, his career and family life were totally ruined.

Another prosecution involving two computer engineers – a Chinese American and a Chinese national – was dismissed after partial acquittal and a hung jury. During the questioning, an FBI agent testified under oath to his personal belief that it is illegal or wrong for any U.S. citizen to seek funding from the “863 Project,” even if no trade secrets are involved.

The stated function of the Chinese “863 Project” is similar to that of the National Science Foundation in the U.S. Unlike terrorist groups, for which the U.S. government publicizes a list, there is no official list of “economic espionage organizations” that U.S. citizens are advised to avoid.

A substantial amount of empirical evidence has accumulated to form a pattern of wrongful profiling and prosecutions. In its zeal to catch spies and punish the guilty, the government appears to target ethnic groups and neglect to protect the innocent.

There are also questions of fairness in sentencing and retaliation. Subject to additional verification and research, available information shows that the average sentence length is 30 months for all convictions under the Economic Espionage Act since 1996. However, the average for those with apparent Chinese surnames is 40 months, and the average for those accused of benefitting China is 52 months.

Although the case against a Chinese American hydrologist was dropped, her federal employment was terminated for being “untrustworthy” under the same justifications as the criminal prosecution. In a federal workforce of 2 million employees, fewer than 0.5% are terminated each year, many of them for performance reasons. Even if there had been misconduct, the hydrologist, who received multiple awards for her work that saved lives, was not given the customary counseling, training, or improvement plans prior to the termination. The unusual government action appears to be retaliatory and vindictive in response to the embarrassment of a failed prosecution.

In another case that began with allegations of espionage, a 46-year-old Chinese American researcher at a national laboratory was sentenced to one year in prison for taking an unauthorized computer containing no classified information to China. He lost his job and career. At about the same time, the then-director of the Central Intelligence Agency shared seven laptops of top secrets with a mistress, but was given two years of probation and a fine of $100,000 after pleading guilty to one misdemeanor charge.

Together these observations suggest, at least at face value, a pattern of wrongful profiling against persons of Chinese origin, especially those who are, or are related to, naturalized U.S. citizens working in scientific or sensitive positions. The wrongful profiling may have taken place at the digital stage; existing safeguards failed to protect some of the innocent.

Summary

Big Data and Predictive Analytics can undoubtedly benefit our economy and society. However, there is also a potential dark side to them.

Balancing national security and civil liberties has been a perennial and intricate challenge. The 9/11 terrorist attacks and Big Data have tipped this balance. Big Data are already being collected extensively for national security, and the government continues to exempt itself from existing privacy laws. The extent, effectiveness, and fairness of the government's application of Big Data are basically unknown. We are only beginning to learn about the existence of these programs, with practically no understanding of their impact on economic espionage and insider threat.

An alarming pattern of wrongful profiling against American citizens, especially those of Chinese origin, has emerged in economic espionage allegations. Without appropriate public monitoring and participation, new forms of traditional and digital profiling will evolve and develop, intentionally or unintentionally.

Transparency, accountability, and balanced evaluations are especially critical in the formulation, implementation, and oversight of public policies involving the use of Big Data Predictive Analytics, in order to mitigate the risks of both traditional and digital discrimination.
