AI & Multi-Agent

Adversarial Prompting

Inputs designed to coerce a language model or agent into unsafe, unauthorized, or false behavior.

Definition

Adversarial Prompting is inputs designed to coerce a language model or agent into unsafe, unauthorized, or false behavior. In defense applications, it targets staff assistants, document analyzers, and tool-using agents through natural language. The hard part is hidden instructions inside apparently benign mission data, especially when systems are deployed across contested links, coalition boundaries, and mixed human-machine teams. KhanBMS treats it as a key reason KhanBMS separates untrusted content from command policy, tying the concept back to modular command, edge execution, and auditable authority.

Reference attributes

Layer
LLM attack method
Operational value
Targets staff assistants, document analyzers, and tool-using agents through natural language
Primary risk
Hidden instructions inside apparently benign mission data
KhanBMS role
A key reason KhanBMS separates untrusted content from command policy

Related terms

#security#llm#agents