WASHINGTON — Even as the Pentagon makes big bets on big data and artificial intelligence, the Army’s software acquisition chief is raising a new warning that adversaries could “poison” the well of data from which AI drinks, subtly sabotaging algorithms the US will use in future conflicts.
“I don’t think our data is poisoned now,” deputy assistant secretary Jennifer Swanson emphasized Wednesday at a Potomac Officers Club conference, “but when we’re fighting a near-peer adversary, we have to know exactly what those threat vectors are.”
The fundamental problem is that every machine-learning algorithm has to be trained on data — lots and lots of data. The Pentagon is making a tremendous effort to collect, collate, curate, and clean its data so analytic algorithms and infant AIs can make sense of it. In particular, the prep team needs to throw out any erroneous datapoints before the algorithm can learn the wrong thing.
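To make that concrete, here is a minimal sketch of one such hygiene step, written in Python purely for illustration: a robust outlier filter that drops grossly anomalous records before they can reach a training job. The field name, threshold, and numbers are invented; this is not the Pentagon's actual pipeline.

```python
# Illustrative only: a minimal data-hygiene pass of the kind described above,
# using a median-absolute-deviation test to drop anomalous records before
# they reach a training job. Field names, thresholds and values are hypothetical.
from statistics import median

def filter_outliers(records, key, z_max=3.5):
    """Drop records whose `key` value is a gross outlier under a
    modified z-score (robust to the outliers themselves)."""
    values = [r[key] for r in records]
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return list(records)
    return [r for r in records
            if 0.6745 * abs(r[key] - med) / mad <= z_max]

# A single wildly wrong range estimate is excluded; the rest survive.
raw = [{"target_range_km": v} for v in (9.8, 10.1, 10.3, 9.9, 250.0)]
clean = filter_outliers(raw, "target_range_km")
print(len(raw), "->", len(clean))  # 5 -> 4
```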
Commercial chatbots, from Microsoft’s 2016 Tay to today’s ChatGPT, are notorious for sucking up misinformation and racism along with all the other internet content they consume. But what’s worse, Swanson argued, is that the military’s own training data might be deliberately targeted by an adversary – a technique known as “data poisoning.”
RELATED: Pentagon tested generative AI to draft supply plans in latest GIDE 9 wargame
“Any commercial LLM [Large Language Model] that is out there, that is learning from the internet, is poisoned today,” Swanson said bluntly. “[But] I am honestly more concerned about what you call, you know, the ‘regular’ AI, because those are the algorithms that are going to really be used by our soldiers to make decisions in the battlefield.”
Making better chatbots isn’t the big problem for the Pentagon, she argued. “I think [generative AI] is fixable,” she said. “It really is all about the data.” Instead of training an LLM on the open internet, as OpenAI et al have done, the military would train it on a trusted, verified military dataset inside a secure, firewalled environment. Specifically, she recommended a system at DoD Impact Level 5 or 6, suitable for sensitive (5) or classified (6) data.
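As a purely hypothetical sketch of what “trusted, verified” data handling can look like in code, the Python snippet below admits only local files whose SHA-256 digests match a pre-approved manifest, never pulling anything from the open internet. The file names and digests are placeholders, not any actual IL-5 or IL-6 system.

```python
# A minimal sketch of the "trusted, verified dataset" idea: a training job
# accepts only local files whose SHA-256 digests match a vetted manifest.
# Paths and digests below are placeholders, not a real DoD system.
import hashlib
from pathlib import Path

APPROVED = {
    "doctrine_corpus.jsonl": "PLACEHOLDER_SHA256_DIGEST",
    "logistics_reports.jsonl": "PLACEHOLDER_SHA256_DIGEST",
}

def load_vetted(data_dir: str) -> list[Path]:
    """Return only the files whose digest matches the approved manifest."""
    vetted = []
    for name, expected in APPROVED.items():
        path = Path(data_dir) / name
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest != expected:
            raise ValueError(f"{name}: digest mismatch, refusing to train")
        vetted.append(path)
    return vetted
```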
“Hopefully by this summer, we have an IL-5 LLM capability that will be available for us to use,” she said. That can help with all sorts of back-office functions, summarizing reams of information to make bureaucratic processes more efficient, she said. “But our main concern [is] those algorithms that are going to be informing battlefield decisions.”
“The consequences of bad data or bad algorithms or poison data or trojans or all of those things are much greater in those use cases,” Swanson said. “That’s really, for us, where we are spending the bulk of our time.”
CJADC2, AI Testing, And Defeating Poisoned Data
Getting the right military-specific training data is particularly critical for the Pentagon, which aims to use AI to coordinate future combat operations across land, air, sea, space, and cyberspace. The concept is called Combined Joint All-Domain Command and Control (CJADC2), and in February the Pentagon announced a functioning “minimum viability capability” was already being fielded to select headquarters around the world.
Future versions will add targeting data and strike planning, plugging into existing AI battle command projects at the service level: the Air Force’s Advanced Battle Management System (ABMS), the Navy’s Project Overmatch, and the Army’s Project Convergence.
RELATED: ‘The bad day’: DISA’s forthcoming strategy prepares for wartime comms
Project Convergence, in turn, will use technology developed by the newly created Project Linchpin, which Swanson described as the service’s “flagship AI program,” designed to be a “trusted and secure ML ops pipeline for our programs.”
In other words, the Army is trying to apply to machine learning the “agile” feedback loop between development, cybersecurity, and current operations (DevSecOps) used by leading software developers to roll out new tech fast and keep updating it.
The catch? “Right now, we don’t know 100 percent how to do that,” Swanson said. In fact, she argued, no one does: “It is concerning to me how all-in we are with AI and hardly anybody has those answers. We’ve asked probably a hundred different companies, ‘how do you do it?’ and they’re like, ‘umm.’”
Swanson is not some Luddite technophobe. She’s an old-school coder — “I learned FORTRAN in college,” she said — who spent three decades in government technology jobs and became the Army’s first-ever deputy assistant secretary for software, engineering, and data in 2022. She’s the kind of person who bought a Tesla and then turned off a free trial of its self-driving feature because she didn’t like how it handled a test she set for it (“It screwed up” on a merge, she said). One of her big concerns about AI, in fact, is how difficult it is to test.
“How do you actually test AI, because it is not deterministic like [other] software?” Swanson asked. Traditional code — from the ancient FORTRAN she learned in school to modern favorites like Python — uses rigid IF-THEN-ELSE algorithms that always give the same output for a given input. Modern machine learning, by contrast, relies on neural networks modeled loosely on the human brain, which work off complex statistical correlations and probabilities. That’s why you can ask ChatGPT the same question over and over and get a different answer every time — a dream come true for students faking their essays but a nightmare for reliability testers.
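The difference is easy to see in a toy Python example: the rule-based function below returns the same answer for the same inputs every time, while the sampled version draws an answer from a probability distribution, the way generative models do, so identical queries can produce different outputs. The decision names and weights are invented for illustration.

```python
# A toy contrast between the two behaviors described above. The deterministic
# branch always maps the same input to the same output; the "sampled" branch
# mimics how generative models draw from a probability distribution, so
# repeated identical queries can yield different answers.
import random

def deterministic_rule(threshold_m: float, range_m: float) -> str:
    # Classic IF-THEN-ELSE: same inputs, same answer, every time.
    return "ENGAGE" if range_m <= threshold_m else "HOLD"

def sampled_answer(weights: dict[str, float]) -> str:
    # Stochastic: the answer is drawn according to the model's probabilities.
    options = list(weights)
    return random.choices(options, weights=[weights[o] for o in options])[0]

print(deterministic_rule(2000, 1500))   # always ENGAGE
print([sampled_answer({"ENGAGE": 0.6, "HOLD": 0.4}) for _ in range(5)])  # varies run to run
```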
What’s more, machine learning algorithms keep learning as they’re exposed to new data, essentially reprogramming themselves. That can make them much more adaptable than traditional code, which requires a human to make changes manually, but it also means they can fixate on some unintended detail or outright misinformation and deviate dramatically from their creators’ intent.
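A bare-bones sketch of that drift problem, with invented numbers: the online estimator below keeps updating from whatever data arrives in the field, so a stream of subtly biased observations walks the learned value away from ground truth even though no single datapoint looks obviously wrong.

```python
# Illustration of the drift concern: an online estimator keeps updating from
# incoming data, so a stream of slightly biased ("poisoned") observations can
# shift the learned value without any single datapoint standing out.
# All numbers are invented.
def online_mean(stream, lr=0.1, start=10.0):
    estimate = start
    for x in stream:
        estimate += lr * (x - estimate)  # the model keeps "learning" in the field
    return estimate

clean_stream = [10.0 + 0.1 * ((-1) ** i) for i in range(200)]
poisoned_stream = [10.5] * 200  # each point is only slightly off

print(round(online_mean(clean_stream), 2))     # stays near 10.0
print(round(online_mean(poisoned_stream), 2))  # drifts toward 10.5
```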
“Because it continues learning, how do you manage that in the battlefield [to] make sure it’s not just going to completely go wild?” Swanson asked. “How do you know that your data has not been poisoned?”
“There’s a lot of work that’s going on right now,” she said. “My hope is that a lot of this work is going to produce some answers in the next year or two and enable us to set the foundation properly from the beginning.”