▎AI & Multi-Agent
Model Extraction
Attack that recreates or approximates a model by querying it and observing outputs.
Definition
Model Extraction is an attack that recreates or approximates a model by querying it and observing its outputs. In defense applications, it can steal costly autonomy, targeting, or perception capabilities from exposed services. The hard part for defenders is that attackers can camouflage their queries as ordinary traffic, and even a partial reconstruction can expose sensitive behavior, especially when systems are deployed across contested links, coalition boundaries, and mixed human-machine teams. KhanBMS treats it as a threat mitigated by rate limits, watermarking, and confidential execution, tying the concept back to modular command, edge execution, and auditable authority.
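To make the query-and-observe mechanism concrete, the following is a minimal sketch that assumes the simplest possible victim: a linear scoring function exposed as a black box. The names `make_victim` and `extract_linear` are hypothetical and chosen for illustration; they are not part of KhanBMS or any real library. Real extraction attacks target far more complex models and usually train a surrogate on many query-response pairs instead of solving for parameters exactly.

```python
from typing import Callable, List, Tuple


def make_victim(weights: List[float], bias: float) -> Callable[[List[float]], float]:
    """Return a black-box scorer; the caller sees outputs only, never the parameters."""
    def predict(x: List[float]) -> float:
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return predict


def extract_linear(query: Callable[[List[float]], float], dim: int) -> Tuple[List[float], float]:
    """Recover the weights and bias of a linear model using dim + 1 queries."""
    # Querying the origin reveals the bias term directly.
    bias = query([0.0] * dim)
    weights = []
    for i in range(dim):
        # Each unit vector isolates one weight: f(e_i) - f(0) = w_i.
        probe = [0.0] * dim
        probe[i] = 1.0
        weights.append(query(probe) - bias)
    return weights, bias


# The attacker only ever calls victim(...); the parameters stay hidden inside it.
victim = make_victim([2.0, -1.0, 0.5], 3.0)
stolen_w, stolen_b = extract_linear(victim, dim=3)
print(stolen_w, stolen_b)
```

The sketch needs only four queries to copy a three-input model, which illustrates why per-client query volume is a useful signal for defenders.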
Reference attributes
- Layer: model theft attack
- Operational value: Can steal costly autonomy, targeting, or perception capabilities from exposed services
- Primary risk: Query camouflage and partial reconstruction of sensitive behavior
- KhanBMS role: A threat mitigated by rate limits, watermarking, and confidential execution in KhanBMS
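Of the mitigations listed, rate limiting is the easiest to illustrate. The sketch below shows one common approach, a per-client sliding-window query budget. The class name `QueryBudget`, its parameters, and the client identifier are assumptions made for this example; the source does not describe how KhanBMS implements rate limits.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class QueryBudget:
    """Sliding-window limiter: at most max_queries per client per window (hypothetical)."""

    def __init__(self, max_queries: int, window_seconds: float) -> None:
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        # Timestamps of accepted queries, oldest first, kept per client.
        self._history: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_id: str, now: float) -> bool:
        """Return True and record the query if the client is within budget."""
        history = self._history[client_id]
        # Discard timestamps that have aged out of the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) >= self.max_queries:
            return False
        history.append(now)
        return True


# Timestamps are passed in explicitly so the behavior is deterministic.
budget = QueryBudget(max_queries=3, window_seconds=60.0)
decisions = [budget.allow("client-a", t) for t in (0.0, 1.0, 2.0, 3.0, 61.0)]
print(decisions)
```

A budget like this raises the cost of extraction but does not stop an attacker who spreads queries across many identities, which is why the entry pairs it with watermarking and confidential execution.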
Related terms
- AI Watermarking: Embedding or detecting signals that identify AI-generated content or model ownership.
- Confidential AI Computing: Use of encryption, enclaves, and attestation to protect AI workloads while data is in use.
- AI Trusted Execution Environment (AI-TEE): Hardware-isolated environment for protecting model weights, inputs, and inference outputs from a compromised host.
- Adversarial Machine Learning (AML): Study and defense of attacks that manipulate AI through crafted inputs, poisoned data, or model theft.
#security #ip #model
