WhiteBox aims to develop more AI interpretability and safety researchers in Asia.

Our flagship program

AI Interpretability Fellowship

Cohort 2 - January 22 to May 18, 2025

Learn how to train and interpret AI models. Accelerate your career in AI safety research.

In-person track: Openspace Katipunan (Quezon City), Philippines 🇵🇭
Virtual track: Open to those in South, Southeast, or East Asia
Cost: Free (a $3,000 USD value)
Completion reward: $200 USD
Minimum time commitment: 15 hours/week

Through the fellowship, you'll learn to train and interpret transformer models like GPT-2. Our training exercises are drawn heavily from ARENA, a renowned AI safety upskilling program. You'll also get hands-on AI safety research experience through two hackathons.
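To give a flavor of the material, here is a minimal sketch of the kind of interpretability exercise ARENA-style training involves. It assumes the open-source TransformerLens library that ARENA's exercises build on, and it is only an illustration, not an official fellowship exercise.

from transformer_lens import HookedTransformer

# Load GPT-2 small and run a prompt while caching all intermediate activations.
model = HookedTransformer.from_pretrained("gpt2")
logits, cache = model.run_with_cache("When Mary and John went to the store, John gave a drink to")

# Inspect the attention patterns of layer 0: shape [batch, n_heads, query_pos, key_pos].
attention = cache["pattern", 0]
print(attention.shape)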

Application deadline: Dec. 9, 2024, 11:59pm GMT+8
Applications are processed on a rolling basis.

Success stories
In Cohort 1, five of our fellows won 1st and 3rd place across two AI safety hackathons and got accepted into the Apart Lab fellowship.
Check out their hackathon outputs below:

Apart's AI and Democracy Hackathon

🥇 1st place ($1,000 prize)

Ivan Enclonar, Lexley Villasis, Kyle Reynoso

Apart's AI Security Evaluation Hackathon

🥉 3rd place ($300 prize)

Zmavli Caimle, Carl Viñas, JD Dantes, Alex Pino, Kyle Reynoso

Aside from the above, you can find presentations of our fellows’ research reports on our YouTube channel.
Testimonials

Hear what our 1st cohort has to say

Carl Viñas
"I highly recommend this fellowship to anyone looking to deepen their understanding of AI safety—why it's a critical issue in today’s world, how to approach research in this field effectively, and the incredible opportunities that await those who pursue it. Through this fellowship, I gained everything I hoped for and more."
Ivan Enclonar
"Although you can learn more about AI and mechanistic interpretability online, WhiteBox's fellowship offers so much value because it brings together smart and like-minded people. There is no other organization or event in the Philippines that gathers such talented people to tackle challenging problems. I highly recommend this fellowship to future participants. (I've already encouraged some of my friends to join the next cohort!)"
Zmavli Caimle
"The WhiteBox team was very flexible with regards to the varying needs of fellows...The program also revealed to me how great of a fit mechanistic interpretability research is to me, given my background in making logical languages ­­— so much so that I have decided to make pursuing mechanistic interpretability research my main career path.”
Clement Neo
Cohort 1 Mentor
"WhiteBox is doing important work in growing the field of AI safety in Southeast Asia, which has potential talent that is often overlooked."
Kat Compendio
"The people I've met during the fellowship have left a profound impact on me. I had doubts about staying in this field, in pursuing a degree in computer science, but the experiences and community that I gained through this inspired me. I've had an insane amount of growth both professionally and personally through the fellowship."
Cohort 1 Participant
"WhiteBox has been a really great experience - from the technical learning to meeting a community of similar but also diverse group of people. If you're even a bit interested in knowing how LLMs work, how you can contribute to AI Safety, or even just meeting and learning with a cohort, then you'd probably enjoy being part of WhiteBox!"

About us

WhiteBox Research is a nonprofit aiming to develop more AI interpretability and safety researchers, particularly in or near Southeast Asia.

We’re based in Manila 🇵🇭 and funded by the Long-Term Future Fund and Manifund.

Our team

Clark Urzo
Co-Founder and Strategy Director

Clark has experience in software engineering, independent AI safety research, and entrepreneurship. He participated in the ML Alignment Theory Scholars (MATS) program's virtual workshops under John Wentworth in late 2022.

He also facilitated a virtual AI Safety Fundamentals Alignment course, and he co-founded Veer, one of the first virtual reality companies in the Philippines, where he served as VP of Engineering.

Brian Tan
Co-Founder and Operations Director

Brian co-founded Effective Altruism Philippines (EA PH) in 2018 and worked full-time to grow EA PH in 2021. EA PH helps Filipino students and professionals figure out how to make a bigger positive impact.

He also supported effective altruism groups globally in 2022–2023 through his work at the Centre for Effective Altruism. Brian was a Merit Scholar at the Ateneo de Manila University and graduated with a degree in IT Entrepreneurship in 2019.

Kyle Reynoso
Learning Director

Kyle leads the design of our curriculum and helps fellows upskill in their research and engineering abilities. He graduated summa cum laude from the University of the Philippines with a degree in Computer Science.

His team won 1st place in Apart Research’s AI and Democracy Hackathon, and he was mentored for a SPAR research project by Nina Rimsky from Anthropic.

Our advisors

Callum McDougall
Member of Technical Staff, Anthropic

Callum works on the Interpretability team at Anthropic. He is the founder of the Alignment Research Engineer Accelerator (ARENA) and holds a Bachelor's and a Master's degree in Mathematics from the University of Cambridge.

He was mentored by Neel Nanda in the ML Alignment Theory Scholars (MATS) program in 2023. He also co-authored the mechanistic interpretability paper "Copy Suppression: Comprehensively Understanding an Attention Head".

Lee Sharkey
Co-Founder, Apollo Research

Lee is the Chief Strategy Officer at Apollo Research and a mentor for the MATS Program’s mechanistic interpretability stream. Previously, Lee was a Researcher at Conjecture, where he published an interim report on superposition.

Lee's past research includes "Sparse Autoencoders Find Highly Interpretable Features in Language Models" and "Goal Misgeneralization in Deep Reinforcement Learning".

Accelerate your journey into AI interpretability and safety.

Join Cohort 2 of our fellowship.