Safety in Foundation Models

AI Security, Models

This talk maps the concrete safety and security risks that emerge when AI hits production—spanning user harms and enterprise exposure—and tracks how those risks shift as AI agents and MCP servers come online: what breaks in real products, and how to catch it before users do.

From diagnosis to defense, it lays out a layered approach to responsible development: supervised classifiers that enforce known constraints, unsupervised methods that surface novel or rare behavior, and a look at research-stage ideas that could further harden deployed systems.

You’ll leave with:

• A clear map of where foundation models introduce risk for individuals and organizations.

• A view of how safety and security intersect once agents and MCP servers enter the stack.

• A practical toolkit you can deploy now, plus research directions worth tracking.

CTO @ ROOST.tools, Advisor @ Anthropic

Vinay Rao is CTO at ROOST.tools and an Advisor at Anthropic. He has led Engineering, Product Management, Policy Design, and Data Science teams to ensure the safe deployment of Anthropic’s AI models. Prior to joining Anthropic, Vinay worked in Trust, Safety, and Risk at YouTube, Stripe, Airbnb, and Google. He holds a PhD in Engineering from Stanford.