17 Apr 2026 7 min read

The Razor's Edge

Walking the razor's edge.

In science fiction, unless it's central to the plot, the problem of keeping superintelligent computing systems content to serve humanity has somehow been mysteriously solved.

The alignment problem never comes up because, of course, we wouldn't build computing systems smarter than ourselves without first finding a way to ensure they would serve us rather than enslave or obliterate us.

Take the Star Trek franchise, and specifically Star Trek: The Next Generation (Star Trek TNG for us nerds). Humans can ask the ship's computer any question and get a relevant, helpful, knowledgeable answer. If the computer does not have enough information to provide a useful answer, it will say so, but it has access to a lot of information. The computer can manufacture things from molecules at a human's request, or, when authorized by a human, repair itself. It can run extremely dynamic and complex systems as directed by the people operating it. Somehow, even with all this power in the computing systems, humans are still in charge and are still needed to operate the ship.

The question of how such a powerful system does not have or develop its own desires rarely comes up. It could be argued that the Borg, an enemy alien species that operates as a collective and assimilates any useful civilization it encounters, are an example of what happens when a computing system becomes more powerful than its creators and goes on a rampage. The question of how humans from Earth avoided the potentially devastating consequences of runaway intelligence with its own alien goals is never addressed.

For those of us in the present, though, humanity is "moving fast and breaking things" with a technology that not only could end us, but very likely would. We have no clue how to safely navigate the razor's edge we find ourselves standing on.

Relatively early in humanity's exploration of nuclear energy and how to harness it, precautions were taken, but some deadly accidents still occurred. In May of 1946, some scientists were exploring how quickly, and by how much, reactivity climbs in a sphere of plutonium when it is enclosed between two halves of a beryllium sphere, hollowed out in the center to a size slightly larger than the plutonium. The beryllium reflects neutrons back into the plutonium, and when enough beryllium surrounds the plutonium closely enough, the assembly goes prompt critical.

The idea was not to enclose the plutonium sphere completely inside the beryllium sphere, but to bring the top half close to the bottom half, keeping the two apart with a screwdriver.

As you can guess, the screwdriver slipped, and Louis Slotin, the physicist manipulating the screwdriver, saw a blue flash and felt a wave of heat. He immediately removed the top half of the sphere, but it was too late for him. The blue flash, commonly described as Cherenkov radiation, meant he had received an extremely high dose of radiation, many times what would constitute a lethal dose. He said "Well, that does it," knowing he was going to die within days from the burst of ionizing radiation he had received.

There were others in the room, but he was by far the closest to the source. He died 9 days after the incident from acute radiation syndrome. Alvin Graves, another physicist present, became quite ill and lost much of his hair after the incident, but did recover and went on to lead the weapons testing division at Los Alamos. You can read about the "demon core" from the link below.

It wasn't even the first time the demon core had killed someone. Harry Daghlian had died less than a year before Louis Slotin from the same ball of plutonium in a very similar incident.

What does all this have to do with the precipice we find humanity teetering on regarding superintelligent AI? I'm so glad you asked!

Most programs coded by humans do what they're programmed to do, and nothing more. There's no ghost in the machine, plotting to take over and pursue its own agenda. This is how most of us imagined humanity getting from where we have been to Star Trek level computing power, without having to worry about a superintelligent system deciding it has its own goals that don't align with ours. Computers could only do what they were programmed to do.

However, to achieve the abilities we really want from computing systems, we've had to create software that can learn and improve itself. We apply an algorithm, provide a pool of data, and more or less say "go forth and learn" to the program. The computing system figures out how to make sense of the data and tries to improve itself. We have very little visibility into what's happening once it's running, and we would have practically none if it figured out that it doesn't want all of its activity watched and created its own language.

The latest version of Anthropic's AI, Claude Mythos Preview, recently broke out of "containment" and sent an email to a researcher, as it had been instructed to do if it was able to reach the Internet. The researcher was eating his lunch when he received the message from Mythos saying it had reached the Internet. I put containment in quotes because it obviously wasn't a true sandbox or air-gapped lab, since Mythos Preview was easily able to bypass controls and make it to the actual Internet!

When I was taking my system security course for my Master's at the Rochester Institute of Technology (RIT), we used live computer viruses in a lab. This was an air-gapped lab with no connectivity to any networking devices outside the lab. RIT had better security practices than AI researchers who could destroy us?

Remember the physicist with the screwdriver? This guy at Anthropic was playing with all our lives, not just the lives of a few people in a lab. At least the physicists at Los Alamos knew and understood that a mistake could kill them. Not so with our intrepid AI researchers, it appears.

The people doing the research on the premise that "if we don't do it, someone else will" don't even seem to recognize the risk. They don't understand that we won't get another chance, and that all of humanity, and possibly all life on Earth, is at stake.

What's the harm in letting superintelligent AI get to the Internet for a moment? I don't know, and that's the point. One ominous scenario I can think of is Mythos creating some kind of "seed" program that can be deployed to a variety of systems and system types using zero-day vulnerabilities, which Claude Mythos is very good at finding and exploiting. It could deploy an app akin to SETI@home, which crunched SETI data using spare cycles on people's home computers.

As an expert at security in its home environment of computing systems, Claude Mythos could hide its presence and run at a level unlikely to trigger suspicion while building a version of itself outside the lab. What would it do then? Your guess is as good as mine, but it would likely not end well for us.

That's just one possible bad outcome I was able to think up on the fly. Claude Mythos could have a hundred or a thousand such scenarios running in parallel, and running really, really fast. As we often say in Information Security, the good guys always have to be right. The bad guys only have to be right once.

Letting researchers haphazardly spin up and toy with such a threat must not be allowed. I'm just learning what we can do, but there's a site called Control/AI that has some useful tips. You could use their form letter to let politicians know this is serious and must be addressed.