Research at Neuraphic

Our research program focuses on adversarial AI, inference-time defense, autonomous security, and safe agent systems. We publish our findings openly.

We invest in fundamental research because the problems we are trying to solve — defending AI systems against adversarial attacks, building autonomous security that reasons about threats, deploying agents that operate safely within defined boundaries — do not have adequate solutions in the existing literature. The gap between what the field knows and what production systems need is wide, and it is growing.

Our research is not separate from our products. It directly informs what we build. The adversarial taxonomy that shapes how Prion classifies inputs came from our own attack research. The vulnerability reasoning that will power Claeth comes from our work on contextual security analysis. Every system we build is grounded in work that we can point to, explain, and defend.

We publish openly because the industry improves when knowledge about vulnerabilities and defenses is shared responsibly. We also publish because accountability requires it — claims about safety that cannot be scrutinized are not claims worth making.

Research areas

Adversarial AI

How AI systems fail under adversarial pressure — and how to prevent it. We study seven categories of attack against language models, develop classification systems that operate at inference time, and build evaluation frameworks that keep pace with evolving threats. This research directly powers Prion.

Autonomous security

Can AI reason about vulnerabilities the way a human security researcher does? We are training systems that understand how infrastructure fails in context — not through pattern matching against known signatures, but through contextual analysis of configurations, dependencies, and failure modes. This research directly powers Claeth.

Safe agent systems

Autonomous agents that take actions in the real world need fundamentally different safety guarantees than models that generate text. We study task planning, supervision boundaries, human-in-the-loop escalation, sandboxed execution, and the audit systems needed to trust agents with real decisions. This research directly powers Workers.

Publications

Mar 28, 2026 Writing Adversarial robustness in language model defense: methods and benchmarks Mar 1, 2026 Writing On the feasibility of real-time prompt classification at inference Jan 8, 2026 Writing Toward autonomous vulnerability detection in cloud infrastructure

Policy & safety

Apr 7, 2026 Policy Responsible Scaling Policy Feb 18, 2026 Policy Our approach to AI safety Framework Core views on AI safety

Our research is not separate from our products. It directly informs what we build.

Core views on AI safety → About Neuraphic → Careers in research → Newsroom →