AI Safety Research Intern (PhD)

Centific
Redmond, WA (Intern)
POSTED ON 12/23/2025
AVAILABLE BEFORE 1/24/2026

Job Title: AI Safety Research Intern (PhD)

Location: Seattle, WA (or Remote)

Type: Full-time Internship - 40 hours per week

Duration: 6 months


Job Description

Internship: AI Safety, Jailbreaking Attacks & Defense, Agentic AI, Human Behavior

(Ph.D. Research Intern)


Build the Future of Safe and Responsible AI

Are you advancing the frontiers of AI safety, LLM jailbreak detection and defense, and agentic AI—with publications to show for it? Join us to translate pioneering research into secure, trustworthy LLM systems that resist adversarial and behavioral exploits.


The Mission

We’re tackling cutting-edge AI safety across adversarial robustness, jailbreak defense, agentic workflows, and human-in-the-loop risk modeling. As a Ph.D. Research Intern, you’ll own high-impact experiments from concept to prototype to deployable modules, directly contributing to our platform’s security guarantees.


What You’ll Do

  • Advance AI Safety: Design, implement, and evaluate attack and defense strategies for LLM jailbreaks (prompt injection, obfuscation, narrative red teaming).
  • Evaluate AI Behavior: Analyze and simulate human-AI interaction patterns to uncover behavioral vulnerabilities, social engineering risks, and over-defensive vs. permissive response tradeoffs.
  • Agentic AI Security: Prototype workflows for multi-agent safety (e.g., agent self-checks, regulatory compliance, defense chains) that span perception, reasoning, and action.
  • Benchmark & Harden LLMs: Create reproducible evaluation protocols and KPIs for safety, over-defensiveness, adversarial resilience, and defense effectiveness across diverse models (including the latest benchmarks and real-world exploit scenarios).
  • Deploy and Monitor: Package research into robust, monitorable AI services using modern stacks (Kubernetes, Docker, Ray, FastAPI); integrate safety telemetry, anomaly detection, and continuous red-teaming.


Example Problems You Might Tackle

  • Jailbreaking Analysis: Systematically red-team advanced LLMs (GPT-4o, GPT-5, LLaMA, Mistral, Gemma, etc.), uncovering novel exploits and defense gaps.
  • Multi-turn Obfuscation Defense: Implement context-aware, multi-turn attack detection and guardrail mechanisms, including countermeasures for obfuscated prompts (e.g., StringJoin, narrative exploits).
  • Agent Self-Regulation: Develop agentic architectures for autonomous self-checking and self-correction, minimizing risk in complex, multi-agent environments.
  • Human-Centered Safety: Study human behavior models in adversarial contexts—how users probe, trick, or manipulate LLMs, and how defenses can adapt without excessive over-defensiveness.


Minimum Qualifications

  • Ph.D. student in CS/EE/ML/Security (or related); actively publishing in AI Safety, NLP robustness, or adversarial ML (ACL, NeurIPS, BlackHat, IEEE S&P, etc.).
  • Strong Python and PyTorch/JAX skills; comfort with toolkits for language models, benchmarking, and simulation.
  • Demonstrated research in at least one of: LLM jailbreak attacks/defense, agentic AI safety, human-AI interaction vulnerabilities.
  • Proven ability to go from concept → code → experiment → result, with rigorous tracking and ablation studies.


Preferred Qualifications

  • Experience in adversarial prompt engineering and jailbreak detection (narrative, obfuscated, and sequential attacks).
  • Prior work on multi-agent architectures or robust defense strategies for LLMs.
  • Familiarity with red-teaming, synthetic behavioral data, and regulatory safety standards.
  • Scalable training and deployment: Ray, distributed evaluation, CI/telemetry for defense protocols.
  • Public code artifacts (GitHub) and first-author publications or strong open-source impact.


Our Stack (you’ll touch a subset)

  • Modeling: PyTorch/JAX, Hugging Face, OpenMMLab, Mistral, LLaMA
  • Safety: Red-teaming frameworks, LLM benchmarking (SODE, ART), human behavior simulation
  • Systems: Python, Ray, Kubernetes, Docker, FastAPI, Triton, Weights & Biases
  • Defense Pipelines: Context-aware filtering, prompt manipulation detection, anomaly telemetry


Benefits

  • Comprehensive healthcare, dental, and vision coverage
  • 401k plan
  • Paid time off (PTO)
  • And more!


Learn more about us at centific.com.


Centific is an equal-opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, ancestry, citizenship status, age, mental or physical disability, medical condition, sex (including pregnancy), gender identity or expression, sexual orientation, marital status, familial status, veteran status, or any other characteristic protected by applicable law. We consider qualified applicants regardless of criminal histories, consistent with legal requirements.

Salary: $35 - $40
