The best clue might come from a 2022 paper written by the Anthropic team back when their startup was just a year old. They warned that the incentives in the AI industry — think profit and prestige — will push companies to “deploy large generative models despite high uncertainty about the full extent of what these models are capable of.” They argued that, if we want safe AI, the industry’s underlying incentive structure needs to change.
Well, at three years old, Anthropic is now the age of a toddler, and it’s experiencing many of the same growing pains that afflicted its older sibling OpenAI. In some ways, they’re the same tensions that have plagued all Silicon Valley tech startups that start out with a “don’t be evil” philosophy. Now, though, the tensions are turbocharged.
An AI company may want to build safe systems, but in such a hype-filled industry, it faces enormous pressure to be first out of the gate. The company needs to pull in investors to supply the gargantuan sums of money needed to build top AI models, and to do that, it needs to satisfy them by showing a path to huge profits. Oh, and the stakes — should the tech go wrong — are much higher than with almost any previous technology.
So a company like Anthropic has to wrestle with deep internal contradictions, and ultimately faces an existential question: Is it even possible to run an AI company that advances the state of the art while also truly prioritizing ethics and safety?
“I don’t think it’s possible,” futurist Amy Webb, the CEO of the Future Today Institute, told me a few months ago.
None of those changes affect the morality of a weapon's use in any way. I'm happy to dwell on this gun analogy as long as you like because it's fairly apt; however, there is one key difference central to my point: there is no way to do the equivalent of banning armor-piercing rounds with an LLM, or of making sure a gun is detectable by metal detectors, because, as I said, it is non-deterministic. You can't inject programmatic controls.
Any tools we have for doing it sit outside the LLM itself (the essential truth undercutting everything else), and even then, none of them can understand or reason about morality or ethics any better than the LLM can.
Let me give an example. I can write the dirtiest, most disgusting smut imaginable on ChatGPT, but I can't write a romance that in any way acknowledges that a character might have a parent or sibling, because the simple juxtaposition of sex and family in the same body of work is considered dangerous. I can write a gangrape on Tuesday, but not a romance with my wife on Father's Day. It is neither safe from misuse nor reliably usable for mundane purposes.
Or go outside of sex. Create an AI that can't use the N-word. But that word is part of the black experience and vernacular every day, so now the AI becomes less helpful to black users than to white ones. Sure, it doesn't insult them, but it can't address issues that are important to them. Take away that safeguard, though, and now white supremacists can use the tool to generate hate speech.
These examples are all necessarily crude for the sake of readability, but I’m hopeful that my point still comes across.
I've spent years thinking about this stuff, experimenting with safety controls and trying to break out of them in both malicious and mundane ways. There's probably a limit to how well we can see eye to eye on this, but it's so aggravating to see people trying to do things that can't effectively be done instead of figuring out how to adapt to this tool.
Apologies for any typos. This is long and my phone fucking hates me - no way some haven’t slipped through.
Of course you can. Why wouldn't you be able to, just because it is non-deterministic? Non-determinism does not mean complete randomness and lack of control; that is a common misconception.
Again, obviously you can't teach an LLM about morals, but you can reduce the likelihood of it producing immoral content in many ways. Of course it won't be perfect, and of course it may limit its usefulness in some cases, but that is already true today in many situations that don't involve AI. For example, some people complain that they "can not talk about certain things without getting cancelled by overly eager SJWs". Society already acts as a morality filter. Sometimes it works, sometimes it doesn't. Free-speech maximalists exist, but they are a minority.
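The toy sampler below illustrates the point that non-determinism does not mean a lack of control. It is a hypothetical illustration, not how any particular chatbot is built. It assumes a made-up three-token vocabulary with fixed probabilities, and `sample_token` is an invented helper. The output stays random, but a hard mask applied before sampling guarantees that the banned token never appears.

```python
import random

def sample_token(probs, banned, rng):
    """Sample one token from a toy next-token distribution (hypothetical helper),
    never returning a token listed in `banned`."""
    # Hard mask: drop banned tokens entirely, keep the rest.
    allowed = {tok: p for tok, p in probs.items() if tok not in banned}
    total = sum(allowed.values())
    if total == 0:
        raise ValueError("every token with nonzero probability is banned")
    tokens = list(allowed)
    # Renormalise so the remaining probabilities sum to 1.
    weights = [allowed[t] / total for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

# Hypothetical toy distribution; real models choose among tens of thousands of tokens.
probs = {"cat": 0.5, "dog": 0.3, "slur": 0.2}
rng = random.Random(0)  # seeded only so the run is reproducible
samples = [sample_token(probs, {"slur"}, rng) for _ in range(1000)]
print(sorted(set(samples)))
```

This control only works when the unwanted output can be pinned to specific tokens. It cannot judge whether a romance that mentions family, or a quoted slur, is harmful in context, which is the gap the earlier comment is pointing at.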