Interview with Agustín Covarrubias
TLDR: Agus runs Kairos. Kairos helps university AI safety groups run effectively. Kairos also runs SPAR, one of the best-known AI safety upskilling programs. AI welfare needs more attention.
Mykhaylo Filipenko: Thanks a lot for taking the time for this interview. I always start with the same question: Could you give a short introduction about yourself?
Agustín Covarrubias: Yeah, my name is Agustín Covarrubias. People usually call me Agus. I'm currently the director of Kairos, a field building organization. What I mean by that is that we try to help grow the field of AI safety. We particularly focus on how we can get more talent to work on some of the key challenges the field is trying to tackle. We do that in many different ways, which I can expand on later.
My background is a bit weird. I used to be a professional software engineer for a couple of years. I did a lot of community building, though not in AI safety: a lot of open source community building. I also did a bunch of work with academic communities in Chile, which is where I live.
Mykhaylo Filipenko: All right, thanks! You already started to talk about the org that you're running. Maybe you could comment on how many people are with you, when and how it got started, and what the idea behind it was? That would be very interesting.
Agustín Covarrubias: Sure. We're a pretty small team, currently two people: me and my co-founder. Plus, we have some contractors that help us out with different things. We're growing though. We are currently trying to hire for two extra roles over the next seven months, so maybe we'll double the team by the end of the year.
In terms of the origin story, I think it is pretty complex. The background context is that there's this network of groups around the world called AI safety groups. These are usually clubs at different universities, normally run by students and focused on getting more people up to speed or upskilling around AI safety.
The hope is that at least some of these people will then move on and have a career in the field. It is a pretty big ecosystem: nowadays there are 60 to 70 groups around the world, and maybe 40 to 50 of them are in the US.
Back when I started this, around December 2023, this network of groups existed but no one was really supporting them. Some of these groups have had a lot of success getting incredible people into the field and excited about doing work in AI safety. Nonetheless, there was very little support beyond giving them grants. Hardly anybody was providing the advice, strategy input, mentorship, and all the other things that come along with running a group. That's more or less where Kairos was born.
There is this org, the Center for Effective Altruism, which has been supporting Effective Altruism (EA) groups around the world. They were pretty excited about supporting AI safety efforts, because it seemed like all these AI safety groups should be supported by someone, but probably not by EA.
EA is a pretty distinct community, even though it's related to AI safety in some regards. What they decided to do was to hire someone to plan how to support AI safety groups long-term, and then to spin off and create a separate entity that could operate in AI safety at large. So that's what I did. I joined CEA for a few months, created the project, hired a co-founder while I was there, and then we spun off into this separate thing, which ended up being Kairos.
Officially, we started the new org in October 2024 and we've been operating since then, and some things have changed. Even though our main focus was AI safety group support, and it's still one of our main focuses, we've also started running this quite large research program called SPAR, which helps people get into AI safety research for the first time with professional mentors who can guide them through research projects, typically three months long.
Mykhaylo Filipenko: I think a lot of people in the AI safety sector have heard about SPAR by now. Maybe you could say a word or two more about how it works and give a bit of detail about it.
Agustín Covarrubias: SPAR is a virtual part-time research program where we pair mentors with mentees. For example, a mentor might run a project that's three months long and take three to five mentees, and over that three-month period they'll work together to develop the research project. The hope is that this provides a very low threshold for people who want to get their first research experience in AI safety and want to benefit from strong mentorship from people who have already done this type of research. SPAR has existed for a while now; I believe it was started around two years ago, and we're in our sixth round of the program. But it was originally started by some of these AI safety university groups.
In particular, there was a group at Berkeley that reasoned: all these PhD students are willing to supervise people doing AI safety research, so wouldn't it be nice if people from other universities could apply as well? They started a collaboration with other AI safety groups, which ended up becoming SPAR. By the standards of research programs, SPAR was pretty successful: it got a bunch of applications and became a more competitive program, but it was mostly run by a volunteer group of students working part-time on it. Eventually someone decided the program should be professionalized.
So they hired Lauren Mangala to run the program, but Lauren left for something else, and that's when we took over.
Mykhaylo Filipenko: And besides this program, what are the other things that Kairos does currently?
Agustín Covarrubias: SPAR is one of our biggest programs, and then we have everything we do around supporting AI safety groups. One of the main things we do there is run a program called FSB, which is a terrible name that we will probably change over the next few weeks. FSB is basically a program that supports group organizers: helping people running these groups at universities through mentorship. We find more experienced group organizers, people who have been doing this for longer, pair them one-on-one, and then they meet several times over the semester.
The mentor provides input and advice and guides them through the steps of starting or running a group. Those are the two major programs we run so far. We also run smaller events: for example, there's something called Oaisis, which is an in-person workshop for AI safety group organizers, and we're currently contemplating whether we should run other types of in-person events as well.
Mykhaylo Filipenko: Maybe let's come back to SPAR. By now it seems there are a lot of programs like this: there is MARS, there is MATS, there is ARENA, there is AI Safety Camp. Do you feel we are getting too many programs, or do you think we still need a couple more?
Agustín Covarrubias: There's this weird thing where, even though there are a lot of programs, maybe six or nine that you would expect to compete for the same people, they do not really compete for the same people. Some programs are in person and therefore don't compete for the same audience as SPAR. There are other virtual part-time programs: there's AI Safety Camp, there's FAIK, and there's a bunch of others as well, but they cater to slightly different audiences. This means that even though there are many programs, each of them is picking a different piece of the pipeline.
For example, we were really concerned that we wouldn't be able to get as many mentors because other programs were trying to recruit mentors at the same time. But we quickly realized that mentors have very different preferences: Do they want to be in person in London? Do they want to do it part-time? How competitive do they want their pool of applicants to be? This means there are a bunch of niches that these programs can fill. That said, I think one problem does come with scale: it is probably not optimal to have an unlimited number of research programs, because then we end up duplicating a lot of work.
Over the last few months, a bunch of these programs have started to coordinate more and talk to each other to figure out whether they can share more resources and eliminate some of the duplicated work associated with running these kinds of programs. That is a good trend.
Mykhaylo Filipenko: That's very interesting. How many people go through SPAR every year?
Agustín Covarrubias: Currently we have 170 mentees and 42 mentors per cohort, and two cohorts per year.
Mykhaylo Filipenko: Alright, so it's something like 300 to 400 people a year that come out of SPAR? I think the numbers for MATS etc. might be similar. Where do all these people go afterwards? I'm not sure, but my gut feeling is that the labs we have now cannot absorb that many people per year.
Agustín Covarrubias: Yeah, this is an interesting question. We've looked at some of the past participants of SPAR, and a number of things happen. Some people do SPAR and immediately afterwards get hired into an AI safety role at OpenAI, Anthropic, or DeepMind, or they go to an independent AI safety lab, maybe Redwood Research or the Center for AI Safety or somewhere else. At the same time, another fraction of the people who participate in SPAR, particularly the more junior ones, do other things afterwards. For example, some SPAR mentors decide to continue the research projects beyond the program. So they might keep their cohort of people: if they have three mentees, they might stick with them over a longer period of time and end up publishing a paper, or seek a longer research collaboration with them. In other cases, people might repeat SPAR. This is especially common with undergrads.
Someone in their final year might do SPAR in the first semester and then do SPAR again in the second semester, either with the same mentor or with another one. Finally, there are people who transition to other, more senior research programs. This includes things like MATS or GAVI, which are more competitive than SPAR itself and are often considered the gold standard for someone who has had a lot of research experience or has been trained quite a lot to work in the field.
It really varies. People do all kinds of things after SPAR. What we try to do is keep SPAR relatively general, so that it can support the different journeys people might take into the field in terms of research agendas.
Mykhaylo Filipenko: Maybe I'll switch topics a little bit now. You've seen a lot of different things in AI safety over the last years, especially drafting the programs and looking at the different research agendas from all the different mentors. What do you feel is overrepresented and underrepresented in AI safety?
Agustín Covarrubias: Although people tend to be pretty strategic and think a lot about which research agendas are the best bets and so on, the field still pretty much runs on vibes. What I mean by this is that we get these booms of interest in different areas of research over time. For example, in the last few years there was this specific research agenda called eliciting latent knowledge (ELK) that had all this hype around it. People were excited that ELK was a really good framework for tackling some very hard problems associated with alignment. Then, over the last year or so, maybe a bit longer, the attention and interest came back down.
I think we're currently in another iteration of the same process with mechanistic interpretability, even though this topic was always a bit of an attractor for people. It has some very nice properties: it's very elegant and very good at nerd-sniping people, so it really targets people's curiosity, and it's very experiment-driven, so people like it a lot. Beyond that general appeal, there were breakthroughs over the last two years, mostly by Chris Olah and Anthropic, and some other labs as well. This sparked a massive wave of interest in mech interp. As a result, SPAR nowadays has maybe six to ten mech interp projects, and we get a lot of applications to them relative to many of the other research agendas in the program. This is something I try to think about: when are too many people betting on a certain agenda?
What ends up happening is that you need to worry about the people who are getting into the field only because of mech interp, versus people who are actually pretty flexible and could have gotten into many possible research agendas. So maybe we could say that mech interp is "overrepresented", in that we're putting more resources into this research agenda than we would otherwise want to. But at the same time, mech interp is bringing in so many people who wouldn't have gotten into AI safety otherwise. So it's less of a concern for me that we're "losing" all this great talent to mech interp, because I think the people who are most into AI safety for the safety itself tend to go to other research agendas as well.
Another overrepresented area is maybe evals, where there was a huge rush of investment and excitement based on the following theory of change: you would create policies that set thresholds for certain risk scenarios, and when those thresholds were met, certain things would happen. This was very appealing because you could legislate based on empirical evidence as it evolved over time. You didn't have to ask politicians to buy into the risks right now; they just needed to buy into which actions would be taken if the risks were to manifest. Even though people were really excited about "if-then commitments" and evals were a major focus of work, lately it seems like eval-related policies have not had a lot of success.
Thus, a growing number of people are shifting their attention from evals work to other areas.
Mykhaylo Filipenko: Interesting. You've said something about overrepresented areas, areas that have attracted a lot of attention. Now what's the other side? What are areas that you see as still underrepresented but very exciting?
Agustín Covarrubias: A thing we're probably neglecting too much is work on digital sentience and digital welfare. If you explain this type of research to anyone outside of AI safety, they might think you're crazy, which sort of explains why we don't have a huge number of people working on it. It's a topic that still carries some stigma. Thankfully, there has been some progress here. There was a major move by Anthropic when they hired their first person to work on model welfare, Kyle Fish.
And then at the same time there was this other org that was founded, Eleos AI, which is specifically focused on doing research on this. I think the tides are changing here and a lot of people are starting to figure out that this is really important. We're already seeing some people moving there, but I would love to see even more work being done.
There is also this broader thing: I think we're still putting most of our talent into technical research rather than policy. Only in the last few years have people been realizing that policy is ever more important, as ideas about how risk might manifest and how we might prevent it change. We still haven't fully updated there. For example, there are many more high-quality research programs and talent pipelines for technical safety than there are for either governance or technical governance.
Mykhaylo Filipenko: The point about AI welfare is a very interesting insight, indeed. Time for my last question today: You already touched on theory of change. What is your personal theory of change?
What I hear a lot is that the big labs are going to close down access to them, maybe in a year or two, people are thinking about a Manhattan Project for AGI, and so on. What is your theory of impact for how independent organizations like Kairos will contribute to AI safety?
Agustín Covarrubias: In many scenarios, the default outcome may well be that the AI safety community progressively loses access and influence. At the same time, the way I think about my theory of change, or the theory of change for Kairos, is mostly focused on talent. Talent does not need to go to the AI safety community, but we hope that our programs help people make that choice. Anthropic, DeepMind, etc. are all currently hiring for safety and security roles, and at the same time we expect a lot of people to go into government.
And not just policy people, technical people as well. As people become more aware of the risks, and as more work is done to set up governance and policy frameworks, hopefully there will also be growing demand both for technical governance and for putting governance people into places such as the AI safety institutes. For example, the EU AI Office is hiring like crazy right now.
Mykhaylo Filipenko: I think that's it from my side. Thanks very much for 20 very interesting minutes!
Agustín Covarrubias: Likewise, and thanks for having me!