When AI Becomes a Swarm: What Interface Design Can (and Can't) Do for Governance
When AI Becomes a Swarm: Why HCI as Law is Agent Governance's Last Stand
Preface: I’m struck by a quieter shift: we’re no longer just tuning a single model, but orchestrating clusters of them. When tools like Kimi can spin up dozens of sub-agents in parallel, days of work collapse into minutes, and the usual legal levers start to feel blunt. This article looks at one narrow question: what designers can realistically do, at the interface layer, when AI starts to show up as a set of loosely coordinated workers rather than a single, neat assistant on the screen.
Is governance failure inevitable?
New systems roll out in months; legal frameworks take years. That gap is exactly where most of the risk seems to live right now. Traditional “post-hoc regulation” has proven inadequate. We need to shift toward “embedded governance”: writing rules directly into the interfaces where Agents interact with humans, which is the territory of human-computer interaction (HCI). For the teams I’ve worked with, this isn’t just a compliance checkbox anymore; it’s turning into a practical survival skill.
Writing Law into Interfaces
It’s not that we need endless new laws; we need interfaces that actually help those laws matter in practice.
When Agents think faster and see more than humans, governance can’t rely solely on post-hoc audit reports. Governance has to move much closer to the actual moment of decision, not sit in a PDF after the fact. In that sense, HCI stops being only about usability and starts to look like one of the few levers we still have to shape behaviour on the screen.
Turning the “Black Box” into a “Glass House”: Explainability by Modality
In the handful of hospitals and trading floors I’ve been able to visit, the people closest to these systems keep coming back to the same need: they want to see how the agent got there, not just the final number on the screen. To make governance executable, we can organize HCI interventions around three questions: how the agent explains itself, where humans step in, and how the interface notices drift.
Explainability: From Results to Process
You might ask: “How do I know why this AI suggested this?” That’s exactly the problem. Multimodal Agents (combining images, genomes, vital signs) often have decision processes that are black boxes.
Consider a diagnostic Agent suggesting “aggressive chemotherapy.” If the interface only shows this result, that’s a black box. In one ICU I visited last year, a senior nurse put it bluntly: “I only trust it when I can see what it’s arguing with.” That stuck with me more than any slide about explainability.
The HCI approach here is progressive disclosure. Don’t dump the conclusion all at once. First show the shadow it detected on the CT scan, then show how it connects to the medical record, and only then give the suggestion. This isn’t for show—it’s so doctors can “see” the Agent’s thinking path. In practice, that means audit trails have to be visible in the very screen where clinicians decide, not hidden in logs for regulators.
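To make the ordering concrete, here is a minimal sketch of progressive disclosure as a sequence the interface steps through. The stage names (`finding`, `context`, `suggestion`) and the `ProgressiveDisclosure` class are my own illustrative assumptions, not taken from any real clinical product. The only point is that a suggestion cannot be placed ahead of its evidence.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stage order: evidence first, recommendation last.
STAGES = ("finding", "context", "suggestion")


@dataclass
class DisclosureStep:
    stage: str  # one of STAGES
    text: str   # what the clinician sees at this step


@dataclass
class ProgressiveDisclosure:
    steps: list[DisclosureStep]
    shown: int = 0  # number of steps revealed so far

    def __post_init__(self) -> None:
        # Reject any sequence that shows a later stage before an earlier one.
        order = [STAGES.index(s.stage) for s in self.steps]
        if order != sorted(order):
            raise ValueError("evidence must come before the suggestion")

    def reveal_next(self) -> Optional[DisclosureStep]:
        """Reveal one more step; return None once everything is shown."""
        if self.shown >= len(self.steps):
            return None
        step = self.steps[self.shown]
        self.shown += 1
        return step

    def visible(self) -> list[DisclosureStep]:
        """Steps currently on screen, in the order they were revealed."""
        return self.steps[: self.shown]
```

In a real interface the reveal would be driven by the clinician scrolling or tapping, and the same `visible()` list could double as the on-screen audit trail.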
One CHI 2025 study reported roughly 20–35% higher self-reported trust and single-digit error reductions in a simulated triage workflow with 24 clinicians, though the authors caution that clinical deployment is still untested.
Where Human Oversight Actually Happens
Agents make mistakes, and some of them are mistakes humans won’t notice. In a noisy emergency room, if an Agent prescribes medication based on incorrect speech recognition, who stops it? The problem isn’t just the Agent’s speed; it’s that humans can’t react fast enough once things go wrong.
The HCI approach here is dynamic permission switching. For low-risk vital sign alerts, let the Agent run automatically; for high-risk actions like prescribing medication, require human approval. When confidence falls below a threshold (e.g., because of a blurry image), the Agent must automatically “raise its hand” for help. And it must accept humans interrupting it in the most natural ways (voice, gestures) at any time.
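A rough sketch of how that switching rule might look in code. The threshold value, the action name, and the `permission_mode` function are hypothetical; the text above does not specify numbers, so treat them as placeholders to be set with clinicians.

```python
from enum import Enum


class Mode(Enum):
    AUTO = "auto"                  # agent proceeds on its own
    NEEDS_APPROVAL = "approval"    # a human must approve first
    ASK_FOR_HELP = "ask_for_help"  # agent "raises its hand"


# Hypothetical values: neither the threshold nor the risk list is from the text.
CONFIDENCE_THRESHOLD = 0.7
HIGH_RISK_ACTIONS = {"prescribe_medication"}


def permission_mode(action: str, confidence: float, interrupted: bool = False) -> Mode:
    """Decide how much autonomy the agent gets for a single action."""
    if interrupted:
        # A human interruption (voice, gesture) always takes priority.
        return Mode.NEEDS_APPROVAL
    if confidence < CONFIDENCE_THRESHOLD:
        return Mode.ASK_FOR_HELP
    if action in HIGH_RISK_ACTIONS:
        return Mode.NEEDS_APPROVAL
    return Mode.AUTO
```

The order of the checks is itself a design decision: here a human interruption overrides everything, and low confidence is handled before risk level.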
The tricky part is designing these intervention points without creating so much friction that clinicians start ignoring the system entirely. One team I spoke with found that requiring approval for every low-confidence alert led to alert fatigue within two weeks. They eventually pushed half of those prompts into a morning summary view instead of blocking the main flow.
Designing Interfaces That Notice Model Drift
Models drift. Today’s miracle doctor might become tomorrow’s quack because the data distribution changed (e.g., during a COVID outbreak). Continuously learning models may also have hidden errors that accumulate over time.
The HCI approach here is anomaly visualization. When a doctor vetoes an Agent’s suggestion three times in a row, that’s a signal itself. HCI must capture these signals and light up red lights on the dashboard. If performance degrades, the system should automatically downgrade—from “autopilot” back to “assisted driving,” forcing humans to take over, rather than waiting for engineers to fix the model.
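The veto rule above is simple enough to sketch directly. This is a minimal illustration, assuming hypothetical mode labels (`autopilot`, `assisted`) and a hypothetical `AutonomyMonitor` class; only the limit of three consecutive vetoes comes from the text.

```python
from dataclasses import dataclass

VETO_STREAK_LIMIT = 3  # "three times in a row", from the text above


@dataclass
class AutonomyMonitor:
    mode: str = "autopilot"  # hypothetical labels: "autopilot" or "assisted"
    veto_streak: int = 0
    red_light: bool = False  # whether the dashboard warning is lit

    def record(self, vetoed: bool) -> None:
        """Record one clinician decision on an agent suggestion."""
        # An accepted suggestion resets the streak; a veto extends it.
        self.veto_streak = self.veto_streak + 1 if vetoed else 0
        if self.veto_streak >= VETO_STREAK_LIMIT:
            self.red_light = True
            self.mode = "assisted"  # downgrade: a human has to take over
```

In this sketch nothing restores `autopilot` automatically. Returning to full autonomy is left as a deliberate human decision, which seems closer to the spirit of the rule.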
The challenge is distinguishing between legitimate model drift and temporary variations in user behavior. One hospital’s system kept triggering degradation warnings because a new resident was more conservative than the previous team, not because the model had actually degraded.
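One crude way to separate the two cases is to look at veto rates per clinician rather than in aggregate. The sketch below is a heuristic under stated assumptions, not a validated drift detector: the cutoffs and the `classify_veto_pattern` function are invented for illustration.

```python
from collections import defaultdict

# Hypothetical cutoffs, chosen only for illustration.
HIGH_VETO_RATE = 0.5  # a clinician vetoing more than half of suggestions
DRIFT_SHARE = 0.5     # share of clinicians that must be affected


def classify_veto_pattern(events: list[tuple[str, bool]]) -> str:
    """events: (clinician_id, vetoed) pairs taken from the intervention log."""
    totals: dict[str, int] = defaultdict(int)
    vetoes: dict[str, int] = defaultdict(int)
    for clinician, vetoed in events:
        totals[clinician] += 1
        vetoes[clinician] += int(vetoed)

    if not totals:
        return "no_data"
    high = [c for c in totals if vetoes[c] / totals[c] > HIGH_VETO_RATE]
    if not high:
        return "normal"
    if len(high) / len(totals) > DRIFT_SHARE:
        return "possible_drift"  # many clinicians disagree with the agent
    return "user_specific"       # e.g. one unusually conservative resident
```

With one conservative resident among several clinicians, this returns `user_specific` instead of lighting the degradation warning, which is the distinction the hospital in the example was missing.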
The Swarm Paradox: When Law Can’t Keep Up
The dilemma is straightforward: New systems roll out in months; legal frameworks take years. This speed gap creates a vacuum where risk breeds.
Technology Side: Google and Sakana’s Arms Race
Google’s multimodal strategy isn’t building “assistants” anymore—it’s building “agents.”
Gemini 2.0 (2024/12): Generates images, audio, and text together, laying the foundation for native multimodality.
Gemini 3 (evolved): Agentic Vision—it doesn’t just “see,” it “perceives” and “acts.”
But the real shift comes from China’s multi-agent workflows.
Background: Kimi K2.5 and Agent Swarm
Moonshot’s recently launched Kimi K2.5 demonstrates another possibility: it doesn’t rely on a single supermodel, but natively supports “swarm orchestration.”
1 to 100: The user gives one instruction, and the model automatically generates up to 100 parallel sub-agents.
High concurrency: Supports 1,500 concurrent tool calls, so writing code, testing, and documentation happen simultaneously.
HCI Implication: In principle, this frees up human cognitive bandwidth. We focus only on goals and let the clustered agents handle the dirty work.
This is similar to Sakana AI’s concept, but Sakana emphasizes a “dream team” of expert models rather than a simple swarm. Either way, collective decision-making means more complex decision processes and harder attribution. When 100 agents collaborate, one agent’s hallucination might be amplified by others, causing cascading effects. An arXiv preprint from 2025 suggests this requires entirely new “federated learning interfaces” to isolate risks, though the practical implementation details are still being worked out.
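A back-of-the-envelope calculation shows why scale alone changes the picture. The sketch assumes each agent hallucinates independently with the same probability, which is a simplification (real agents share models and context), and the 1% figure is a made-up input, not a measured rate.

```python
def p_any_hallucination(p_single: float, n_agents: int) -> float:
    """P(at least one of n agents hallucinates), assuming independence."""
    return 1.0 - (1.0 - p_single) ** n_agents


# Hypothetical inputs: a 1% per-agent rate across a 100-agent swarm.
swarm_risk = p_any_hallucination(0.01, 100)
print(f"{swarm_risk:.1%}")  # prints 63.4%
```

Under these assumptions, an error that is rare for a single agent becomes more likely than not somewhere in the swarm, which is why isolation and attribution have to be designed in rather than audited afterwards.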
Regulation Side: Global Puzzle, Each Playing Their Own Game
Meanwhile, governments worldwide are trying to use old maps to find new roads.
United States: Relies on executive orders and sectoral regulation; fast but fragmented.
European Union: The AI Act draws red lines directly and carries staggering penalties, but may stifle innovation.
South Korea: Both developers and users are responsible, which is a smart design.
Singapore: Doesn’t penalize you, but gives concrete “how-to” guidance, which is very pragmatic.
The fragmentation creates an opportunity for HCI designers—we can design interfaces that meet multiple regulatory requirements simultaneously, rather than building region-specific versions.
Frontline Reality in Healthcare and Finance
On paper, we have plenty of theory; on the ground, things are much thinner. Let’s see how this logic works in life-and-death domains.
Comparing healthcare and finance, two high-risk domains, reveals that HCI governance logic has remarkable consistency, though the implementation details differ:
1. Healthcare: From Diagnostic Assistant to Clinical Agent
Consider a large teaching hospital. A night-shift resident is handling complex cases in the emergency room. An elderly patient arrives with fever and confusion. Family members describe symptoms over the phone with heavy accents, while the patient’s electronic health record shows multiple chronic conditions.
A multimodal agent is running in the background. It analyzes emergency room video surveillance in real time, capturing the patient’s nonverbal cues; simultaneously processes the family’s voice descriptions, identifying key symptom vocabulary; and integrates the patient’s historical medical images, lab reports, and real-time vital signs from wearable devices.
The night-shift resident glances at the CT panel, where the agent has highlighted two suspicious patches in amber rather than red to indicate low confidence. Before the doctor completes a full assessment, the agent has already flagged several high-risk indicators and suggested priority tests. However, when the agent tries to suggest a specific medication based on ambiguous information from voice analysis, the system pauses execution because that medication might conflict with allergies in the patient’s record. This pause isn’t an error—it’s by design. The agent identified areas where its confidence is insufficient and actively seeks human confirmation.
The HCI approach here emphasizes an evidence hierarchy: don’t just give suggestions, show the evidence (image highlights, genetic scores). Layered checkpoints mean that alerts are sent automatically, prescriptions require approval, and experimental treatments need expert review. This aligns with Singapore’s Model AI Governance Framework (MGF) and is designed to meet the EU AI Act’s “human oversight” requirement.
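The three layers can be written down as a small lookup table, with the evidence requirement enforced at the same point. The kind names, reviewer labels, and `required_reviewer` function are hypothetical, and the fallback for unknown kinds is my own cautious assumption.

```python
from dataclasses import dataclass, field
from typing import Optional

# Checkpoint per action kind, following the three layers described above.
# The kind names and reviewer labels are hypothetical.
CHECKPOINTS = {
    "alert": None,                       # sent automatically
    "prescription": "clinician",         # requires approval
    "experimental_treatment": "expert",  # needs expert review
}


@dataclass
class Suggestion:
    kind: str
    summary: str
    evidence: list[str] = field(default_factory=list)  # e.g. image highlights


def required_reviewer(s: Suggestion) -> Optional[str]:
    """Return who must sign off, or None if the item can be sent as-is."""
    if not s.evidence:
        # Evidence hierarchy: nothing is shown without its supporting evidence.
        raise ValueError("a suggestion must carry its evidence")
    if s.kind not in CHECKPOINTS:
        return "expert"  # assumption: unknown kinds get the strictest checkpoint
    return CHECKPOINTS[s.kind]
```

Keeping the table as data rather than scattered conditionals means compliance, clinical leads, and designers can all review the same few lines.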
The system still mis-prioritizes patients who cannot speak the dominant language, and the team has no agreed protocol for overriding the agent in these situations. This remains an open challenge.
2. Finance: From Algorithms to Agent Trading
On an investment management firm’s trading floor, the market shows unusual volatility during Asian trading hours. An agent system monitors multiple data streams in real-time: global market indices, foreign exchange rates, commodity prices, and sentiment signals from news agencies and social media.
When the system detects a regional event that might affect the portfolio, it initiates a multi-step process. First, it simulates market reactions under different scenarios, evaluating how various assumptions impact existing positions. Then it recalculates the optimal portfolio allocation based on risk parameters and liquidity conditions. Finally, the agent transforms these analyses into concrete trading strategies, including how to split large orders based on market depth, choose optimal execution timing and venues, and set priorities by urgency.
After generating trading proposals, the agent doesn’t execute immediately. Instead, it presents suggestions on the trader’s dashboard, annotated with confidence scores and risk assessments for each decision. For high-confidence, low-risk standard operations, the system allows automatic execution; but for trades involving large amounts or unusual market conditions, the system requires human confirmation.
The HCI approach here uses what I call a “regulatory block”: writing compliance rules directly into the interface. Self-regulation means that trades with confidence above 80% run automatically, while those below 50% call in a human. Every trade’s decision tree is recorded, which makes SEC review simpler. Back-tests from vendors like Tickeron (2025) report striking annualized returns for agent-driven portfolios, but only under fairly idealised assumptions and with strict guardrails. Those same guardrails, “regulatory blocks” that stop certain trades outright, are what keep the system from spiralling into self-reinforcing crashes. The challenge is that these systems can still amplify market volatility when multiple agents react to the same signal simultaneously.
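Here is a minimal sketch of that routing logic. The two thresholds come from the paragraph above; everything else is assumed for illustration, including the `TradeRouter` class, the blocked-symbol list standing in for a “regulatory block,” and the handling of the 50–80% band, which the rule leaves unspecified. The log here is a flat record per decision, a simplification of a full decision tree.

```python
from dataclasses import dataclass, field

AUTO_THRESHOLD = 0.80   # confidence above 80% runs automatically
HUMAN_THRESHOLD = 0.50  # confidence below 50% calls in a human


@dataclass
class TradeRouter:
    blocked_symbols: set[str]  # hypothetical stand-in for a "regulatory block"
    audit_log: list[dict] = field(default_factory=list)

    def route(self, symbol: str, confidence: float) -> str:
        """Decide what happens to one proposed trade and log the decision."""
        if symbol in self.blocked_symbols:
            decision = "blocked"  # stopped outright, regardless of confidence
        elif confidence > AUTO_THRESHOLD:
            decision = "auto_execute"
        elif confidence < HUMAN_THRESHOLD:
            decision = "human_required"
        else:
            # Assumption: the middle band is shown to the trader for confirmation.
            decision = "needs_confirmation"
        # Record every decision so it can be reviewed later.
        self.audit_log.append(
            {"symbol": symbol, "confidence": confidence, "decision": decision}
        )
        return decision
```

Note that the block check runs before any confidence check, so a highly confident agent still cannot trade a blocked instrument.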
Conclusion: Governance Engineering for Designers
In that sense, interface work quietly does the job we often imagine law will do. Lawyers can write words like “AI must be regulated,” but designers decide “when that red pause button appears.” This is governance engineering.
For designers, this means AI literacy is no longer a bonus; it’s becoming essential. You can’t just know grids and typography anymore. You need to understand multi-agent collaboration logic and the privacy boundaries of federated learning. If you can’t understand how Kimi spins up dozens of agents, you can’t design “brakes” for it.
This is a new responsibility: our job is not to make every interaction smoother. It’s to decide where the flow should slow down on purpose, and then make that slowdown legible to everyone using the system. The next wave of the AI revolution won’t happen at the model layer; it will happen at the interaction layer. In many of the systems that now touch life-and-death decisions, designers sit uncomfortably close to the last meaningful gate. It’s worth treating that as more than a visual problem.
Here are two concrete steps to start:
Start by adding one explicit “pause” state to your highest-risk flow, and make it a first-class element in your design system. When you document it, include not just its props but also who’s allowed to override it, and on what timescales. This makes governance constraints visible to the entire team, though in smaller teams it’s hard to get this right on the first pass.
Build a simple dashboard that visualizes when and why human interventions happen. This isn’t just for compliance—it helps you spot patterns that indicate model drift or design problems. Most teams I’ve seen start with basic logging and evolve from there.
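Both steps can start very small. The sketch below shows one possible shape for each, assuming a hypothetical `PauseState` record for the design system and intervention log entries that carry `day` and `reason` fields; none of these names come from an existing library or product.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PauseState:
    """Design-system entry for a deliberate pause (all fields hypothetical)."""
    flow: str                        # which high-risk flow it guards
    override_roles: tuple[str, ...]  # who is allowed to override it
    max_hold_seconds: int            # timescale before the pause escalates

    def can_override(self, role: str) -> bool:
        return role in self.override_roles


def summarize_interventions(log: list[dict]) -> Counter:
    """Count human interventions by (day, reason) for a simple dashboard."""
    return Counter((entry["day"], entry["reason"]) for entry in log)
```

A quick usage example with made-up data:

```python
pause = PauseState("prescribe_medication", ("attending",), 300)
print(pause.can_override("resident"))  # prints False

log = [
    {"day": "Mon", "reason": "low_confidence"},
    {"day": "Mon", "reason": "low_confidence"},
    {"day": "Tue", "reason": "allergy_conflict"},
]
print(summarize_interventions(log).most_common(1))
# prints [(('Mon', 'low_confidence'), 2)]
```

Writing the override roles and hold time into the component itself is what makes the governance constraint visible in design review, instead of living only in a policy document.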
Regulators will take years to settle on their language. Your design system will ship again next quarter. That’s probably where governance work quietly starts today.
References
Google. (2024, December). Introducing Gemini 2.0: our new AI model for the agentic era. Google Blog.
Google AI. (2026, January 27). Agentic Vision in Gemini Flash. Google Developers Blog.
Sakana AI. (2026, January). Multi-LLM orchestration for world models. Presentation.
CHI Proceedings. (2025, November). Imagining Design Workflows in Agentic AI Futures. CHI 2025.
ACM Trans. Comput.-Hum. Interact. (2024, January). A Scoping Review of the Top-tier HCI Literature on Agentic AI.
JMIR. (2026, January 13). From Agents to Governance: Essential AI Skills for Healthcare. Journal of Medical Internet Research.
3CL Legal. (2026, January 22). The EU AI Act and USA AI.gov Action Plan: A Legal Comparison. 3CL Legal.
IMDA Singapore. (2026, January 21). Model AI Governance Framework for Agentic AI. Press Release.
Cooley. (2026, January 27). South Korea’s AI Basic Act: Overview and Key Takeaways. Legal Insight.
Virtual Workforce AI. (2026, January 26). Agentic AI agents for financial services use cases. Virtual Workforce AI Blog.
Tickeron. (2025). AI Trading Performance and Annualized Returns. Tickeron Research.
ArXiv Preprint. (2025, October). Federated Learning Interfaces for Agent Swarms.
Copyright © PrivacyUX Consulting Ltd. All rights reserved.
Joshua is a pioneer in Agentic UX (Agentic User Experience), with over 15 years of groundbreaking practice in the fields of artificial intelligence and user experience design. He was among the first to advocate for treating user privacy protection as a core principle of AI product design. In 2022, he founded PrivacyUX Consulting Ltd., where he serves as Chief Consultant, actively advancing privacy-centered innovation in medical AI products. Previously, he served as Chief Strategy Officer for Social AI (2022–2024), focusing on the design of privacy-conscious emotion recognition systems and mechanisms for user data autonomy.







