WhiteBox aims to develop more AI interpretability and safety researchers in Asia.

Our flagship program

AI Interpretability Fellowship

Cohort 2 - January 22 to May 18, 2025

Learn how to train and interpret AI models. Accelerate your career in AI safety research.

In-person track: Openspace Katipunan (Quezon City), Philippines 🇵🇭
Virtual track: Open to those in South, Southeast, or East Asia
Cost: Free (a $3,000 USD value)
Completion reward: $200 USD
Minimum time commitment: 15 hours/week

Through the fellowship, you'll learn to train and interpret transformer models like GPT-2. Our training exercises are drawn heavily from ARENA, a renowned AI safety upskilling program. You'll also get hands-on AI safety research experience through two hackathons.
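To give a flavor of the material, here is a minimal sketch of the kind of interpretability exercise ARENA-style training involves. It assumes the open-source TransformerLens library that ARENA's exercises build on, and it is only an illustration, not an official fellowship exercise.

from transformer_lens import HookedTransformer

# Load GPT-2 small and run a prompt while caching all intermediate activations.
model = HookedTransformer.from_pretrained("gpt2")
logits, cache = model.run_with_cache("When Mary and John went to the store, John gave a drink to")

# Inspect the attention patterns of layer 0: shape [batch, n_heads, query_pos, key_pos].
attention = cache["pattern", 0]
print(attention.shape)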

Application deadline: Dec. 9, 2024, 11:59pm GMT+8
Applications are processed on a rolling basis.

Success stories
In Cohort 1, five of our fellows won 1st and 3rd place across two AI safety hackathons and got accepted into the Apart Lab fellowship.
Check out their hackathon outputs below:

Apart's AI and Democracy Hackathon

🥇 1st place ($1,000 prize)

Ivan Enclonar, Lexley Villasis, Kyle Reynoso

Apart's AI Security Evaluation Hackathon

🥉 3rd place ($300 prize)

Zmavli Caimle, Carl Viñas, JD Dantes, Alex Pino, Kyle Reynoso

Aside from the above, you can find presentations of our fellows’ research reports on our YouTube channel.
Testimonials

Hear what our 1st cohort has to say

Carl Viñas
"I highly recommend this fellowship to anyone looking to deepen their understanding of AI safety—why it's a critical issue in today’s world, how to approach research in this field effectively, and the incredible opportunities that await those who pursue it. Through this fellowship, I gained everything I hoped for and more."
Ivan Enclonar
"Although you can learn more about AI and mechanistic interpretability online, WhiteBox's fellowship offers so much value because it brings together smart and like-minded people. There is no other organization or event in the Philippines that gathers such talented people to tackle challenging problems. I highly recommend this fellowship to future participants. (I've already encouraged some of my friends to join the next cohort!)"
Zmavli Caimle
"The WhiteBox team was very flexible with regards to the varying needs of fellows...The program also revealed to me how great of a fit mechanistic interpretability research is to me, given my background in making logical languages ­­— so much so that I have decided to make pursuing mechanistic interpretability research my main career path.”
Clement Neo
Cohort 1 Mentor
"WhiteBox is doing important work in growing the field of AI safety in Southeast Asia, which has potential talent that is often overlooked."
Kat Compendio
"The people I've met during the fellowship have left a profound impact on me. I had doubts about staying in this field, in pursuing a degree in computer science, but the experiences and community that I gained through this inspired me. I've had an insane amount of growth both professionally and personally through the fellowship."
Cohort 1 Participant
"WhiteBox has been a really great experience - from the technical learning to meeting a community of similar but also diverse group of people. If you're even a bit interested in knowing how LLMs work, how you can contribute to AI Safety, or even just meeting and learning with a cohort, then you'd probably enjoy being part of WhiteBox!"

About us

WhiteBox Research is a nonprofit aiming to develop more AI interpretability and safety researchers, particularly in or near Southeast Asia.

We’re based in Manila 🇵🇭 and funded by the Long-Term Future Fund and Manifund.

Our team

Clark Urzo
Co-Founder and Strategy Director

Clark has experience in software engineering, independent AI safety research, and entrepreneurship. He participated in the ML Alignment Theory Scholars (MATS) program's virtual workshops under John Wentworth in late 2022.

He also facilitated a virtual AI Safety Fundamentals Alignment course, and he co-founded Veer, one of the first virtual reality companies in the Philippines, where he served as VP of Engineering.

Brian Tan
Co-Founder and Operations Director

Brian co-founded Effective Altruism Philippines (EA PH) in 2018 and worked full-time to grow EA PH in 2021. EA PH helps Filipino students and professionals figure out how to make a bigger positive impact.

He also supported effective altruism groups globally in 2022–2023 through his work at the Centre for Effective Altruism. Brian was a Merit Scholar at the Ateneo de Manila University and graduated with a degree in IT Entrepreneurship in 2019.

Kyle Reynoso
Learning Director

Kyle leads the design of our curriculum and helps fellows upskill in their research and engineering abilities. He graduated summa cum laude from the University of the Philippines with a degree in Computer Science.

His team won 1st place in Apart Research’s AI and Democracy Hackathon, and he was mentored for a SPAR research project by Nina Rimsky from Anthropic.

Our advisors

Callum McDougall
Member of Technical Staff, Anthropic

Callum works on the Interpretability team at Anthropic. He is the founder of the Alignment Research Engineer Accelerator (ARENA) and holds a Bachelor's and a Master's degree in Mathematics from the University of Cambridge.

He was mentored by Neel Nanda in the ML Alignment Theory Scholars (MATS) program in 2023. He also co-authored the mechanistic interpretability paper "Copy Suppression: Comprehensively Understanding an Attention Head".

Lee Sharkey
Co-Founder, Apollo Research

Lee is the Chief Strategy Officer at Apollo Research and a mentor for the MATS Program’s mechanistic interpretability stream. Previously, Lee was a Researcher at Conjecture, where he published an interim report on superposition.

Lee's past research includes "Sparse Autoencoders Find Highly Interpretable Features in Language Models" and "Goal Misgeneralization in Deep Reinforcement Learning".

Accelerate your journey into AI interpretability and safety.

Join Cohort 2 of our fellowship.