The best clue might come from a 2022 paper written by the Anthropic team back when their startup was just a year old. They warned that the incentives in the AI industry — think profit and prestige — will push companies to “deploy large generative models despite high uncertainty about the full extent of what these models are capable of.” They argued that, if we want safe AI, the industry’s underlying incentive structure needs to change.
Well, at three years old, Anthropic is now the age of a toddler, and it’s experiencing many of the same growing pains that afflicted its older sibling OpenAI. In some ways, they’re the same tensions that have plagued all Silicon Valley tech startups that start out with a “don’t be evil” philosophy. Now, though, the tensions are turbocharged.
An AI company may want to build safe systems, but in such a hype-filled industry, it faces enormous pressure to be first out of the gate. The company needs to pull in investors to supply the gargantuan sums of money needed to build top AI models, and to do that, it needs to satisfy them by showing a path to huge profits. Oh, and the stakes — should the tech go wrong — are much higher than with almost any previous technology.
So a company like Anthropic has to wrestle with deep internal contradictions, and ultimately faces an existential question: Is it even possible to run an AI company that advances the state of the art while also truly prioritizing ethics and safety?
“I don’t think it’s possible,” futurist Amy Webb, the CEO of the Future Today Institute, told me a few months ago.
“The government needs to stop people from doing a capitalism, but it had better not stop anyone from doing a capitalism, that would be tyranny.”
LLMs are non-deterministic. “What they are capable of” is stringing words together in a reasonable facsimile of knowledge. That’s it. The end.
Some might be better at it than others, but you can’t ever know the full breadth of words it might put together. It’s like worrying about what a million monkeys with a million typewriters might be capable of, or worrying about how to prevent them from typing certain things - you just can’t. There is no understanding about ethics or morality and there can’t possibly be.
What are people expecting here?
If those words are connected to some automated system that can accept them as commands…
For instance, some idiot entrepreneur was talking to me recently about whether it was feasible to put an LLM on an unmanned spacecraft in cislunar space (I consult with the space industry) and give it operational control of on-board systems based on real-time telemetry. I told him about hallucination and asked him what he thought he was going to do when the model registered some false positive in response to a system fault… Or even what happens to a model when you bombard its long-term storage with the kind of cosmic particles that cause random bit flips (this is a real problem for software in space), and how that might change its output?
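To make the bit-flip worry concrete, here is a toy sketch of what a single flipped bit can do to one stored weight. Assumptions, all illustrative: the weight is stored as IEEE 754 float32 (many models actually use float16, bfloat16, or quantized integers, where different bits matter), `flip_bit` and `neuron` are hypothetical helpers, and the sketch ignores hardware mitigations such as error-correcting memory.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = least significant) of a value stored as IEEE 754 float32."""
    # Reinterpret the 4 bytes of the float32 as an unsigned 32-bit integer.
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    as_int ^= 1 << bit  # simulate a single-event upset on that one bit
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int))
    return flipped


def neuron(weights: list[float], inputs: list[float]) -> float:
    """Toy weighted sum standing in for one unit of a network."""
    return sum(w * x for w, x in zip(weights, inputs))


weights = [0.5, 0.25]
inputs = [1.0, 1.0]
before = neuron(weights, inputs)  # 0.75
# Flip bit 30, the highest exponent bit, of the first weight.
corrupted = [flip_bit(weights[0], 30), weights[1]]
after = neuron(corrupted, inputs)  # roughly 1.7e38
```

Flipping the lowest mantissa bit barely changes the weight, while flipping the top exponent bit turns 0.5 into 2**127, so the effect on output depends heavily on which bit gets hit.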
Now, I don’t think anyone’s actually going to build something like that anytime soon (then again, the space industry is full of stupid money), but what about putting models in charge of semi-autonomous systems here on Earth? Or giving them access to APIs that let them spend money, trade stocks, or hire people on Mechanical Turk? Probably a bunch of stupid expensive bad decisions…
Speaking of stupid expensive bad decisions, has anyone embedded an LLM in the Ethereum blockchain and given it access to smart contracts yet? I bet investors would throw stupid money at that…
That’s hilarious. I love LLMs, but they’re a tool, not a product, and everyone trying to make them a standalone thing is going to be sorely disappointed.
I’m expecting that anything the statistical models reveal, or produce convincing results about, that benefits the owners of the models will be exploited. Anything that threatens power or the model owners will be largely ignored and dismissed.
They are deterministic, though, in a literal sense. Rather, their behavior is undefined. And yes, an LLM is not a person, and it’s not quite accurate to talk about them knowing or understanding things. So what, though? Why would that be any sort of evidence that research efforts into AI safety are futile? This is at least as much of an engineering problem as a philosophy problem.
The output for a given input cannot be independently calculated as far as I know, particularly when random seeds are part of the input. How is that deterministic?
The “so what” is that trying to prevent certain outputs based on moral judgements isn’t possible. It wouldn’t really be possible even if you could get in there with code and change things, unless you could write code for morality, but it’s doubly impossible given that you can’t.
The output for a given input cannot be independently calculated as far as I know, particularly when random seeds are part of the input.
The system gives a probability distribution for the next word based on the prompt, and that distribution will always be the same for a given input. That meets the definition of deterministic. You might choose to add non-deterministic RNG to the input or output, but that would be a choice and not something inherent to how LLMs work. Random ‘seeds’ are normally used to make RNG deterministically repeatable. I’m not sure what you mean by “independently” calculated: you can calculate the output if you have the model weights, and you likely can’t if you don’t, but that doesn’t affect how deterministic it is.
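The distinction between a fixed distribution and a random sampler can be shown with a toy sketch. Assumptions: `next_token_distribution` is a hypothetical stand-in for a model’s forward pass (a pure function of the prompt, with made-up logits and a four-word vocabulary), and sampling uses Python’s seeded `random.Random`. The distribution is identical on every call; randomness enters only through the sampler, and fixing the seed makes even that repeatable. One caveat the toy ignores: real inference on parallel hardware can show small run-to-run floating-point differences.

```python
import math
import random

# Hypothetical toy vocabulary; a real model has tens of thousands of tokens.
VOCAB = ["the", "cat", "sat", "down"]


def next_token_distribution(prompt: str) -> list[float]:
    """Toy stand-in for a model's forward pass: a pure function of the prompt."""
    # Hypothetical "logits"; a real LLM computes these from its weights.
    logits = [float((len(prompt) * (i + 1)) % 7) for i in range(len(VOCAB))]
    # Softmax: subtract the max for numerical stability, then normalize.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(prompt: str, seed: int, steps: int = 5) -> list[str]:
    """Sample tokens using a seeded pseudo-random generator."""
    rng = random.Random(seed)  # same seed -> same sequence of random draws
    text, out = prompt, []
    for _ in range(steps):
        probs = next_token_distribution(text)
        token = rng.choices(VOCAB, weights=probs, k=1)[0]
        out.append(token)
        text += " " + token
    return out


def greedy(prompt: str, steps: int = 5) -> list[str]:
    """Greedy decoding: always pick the most likely token, no RNG at all."""
    text, out = prompt, []
    for _ in range(steps):
        probs = next_token_distribution(text)
        token = VOCAB[probs.index(max(probs))]
        out.append(token)
        text += " " + token
    return out
```

Calling `sample("hello", seed=42)` twice returns the same list, and `greedy` needs no seed at all: the only source of variation is whatever randomness the sampler is deliberately given.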
The “so what” is that trying to prevent certain outputs based on moral judgements isn’t possible. It wouldn’t really be possible even if you could get in there with code and change things, unless you could write code for morality, but it’s doubly impossible given that you can’t.
The impossibility of defining morality in precise terms, or even coming to an agreement on what correct moral judgment even is, obviously doesn’t preclude all potentially useful efforts to apply it. For instance, since there is a general consensus that people being electrocuted is bad, electrical cables are normally made with their conductive parts encased in non-conductive material, a practice that is successful in reducing how often people get electrocuted. Why would that sort of thing be uniquely impossible for LLMs? Just because they are logic processing systems that are more grown than engineered? Because they are sort of anthropomorphic but aren’t really people? The reasoning doesn’t follow. What people are complaining about here is that AI companies are not making these efforts a priority, and it’s a valid complaint because it isn’t the case that these systems are going to be equally dangerous no matter how they are made or used.
While an LLM itself has no concept of morality, it’s certainly possible to at least partially inject/enforce some morality when working with them, just like any other tool. Why wouldn’t people expect that?
Consider guns: while they have no concept of morality, we still apply certain restrictions to them to make using them in an immoral way harder. Does it work perfectly? No. Should we abandon all rules and regulations because of that? Also no.
Yes. Let’s consider guns. Is there any objective way in which to measure the moral range of actions one can undertake with a gun? No. I can murder someone in cold blood or I can defend myself. I can use it to defend my nation or I can use it to attack another, both of which might be moral or immoral depending on the circumstances.
You might remove the trigger, but then it can’t be used to feed yourself, while it could still be used to rob someone.
So what possible morality can you build into the gun to prevent immoral use? None. It’s a tool. It’s the nature of a gun. LLMs are the same. You can write laws about what people can and can’t do with them, but you can’t bake them into the tool and expect the tool now to be safe or useful for any particular purpose.
You can write laws about what people can and can’t do with them, but you can’t bake them into the tool and expect the tool now to be safe or useful for any particular purpose.
Yes, and that’s why the decision making and responsibility (and accountability) must always rest with the human being imo, especially when we deal with guns. And in health care. And in social policy. And all the other crucial issues.
So what possible morality can you build into the gun to prevent immoral use?
You can’t build morality into it, as I said. You can build functionality into it that makes immoral use harder.
I can e.g.
- limit the rounds per minute that can be fired
- limit the type of ammunition that can be used
- make it easier to determine which weapon was used to fire a shot
- make it easier to detect the weapon before it is used
- etc. etc.
Society considers e.g. hunting a moral use of weapons, while killing people usually isn’t.
So banning ceramic, unmarked, silenced, full-automatic weapons firing armor-piercing bullets can certainly be an effective way of reducing the immoral use of a weapon.
None of those changes impact the morality of a weapon’s use in any way. I’m happy to dwell on this gun analogy all you like because it’s fairly apt; however, there is one key difference central to my point: there is no way to do the equivalent of banning armor-piercing rounds with an LLM or making sure a gun is detectable by metal detectors, because, as I said, it is non-deterministic. You can’t inject programmatic controls.
Any tools we have for doing it are outside the LLM itself (the essential truth undercutting everything else) and furthermore even then none of them can possibly understand or reason about morality or ethics any more than the LLM can.
Let me give an example. I can write the dirtiest most disgusting smut imaginable on ChatGPT, but I can’t write about a romance which in any way addresses the fact that a character might have a parent or sibling because the simple juxtaposition of sex and family in the same body of work is considered dangerous. I can write a gangrape on Tuesday, but not a romance with my wife on Father’s Day. It is neither safe from being used as not intended, nor is it capable of being used for a mundane purpose.
Or go outside of sex. Create an AI that can’t use the N-word. But that word is part of the black experience and vernacular every day, so now the AI becomes less helpful to black users than white ones. Sure, it doesn’t insult them, but it can’t address issues that are important to them. Take away that safety, though, and now white supremacists can use the tool to generate hate speech.
These examples are all necessarily crude for the sake of readability, but I’m hopeful that my point still comes across.
I’ve spent years thinking about this stuff and experimenting and trying to break out of any safety controls both in malicious and mundane ways. There’s probably a limit to how well we can see eye to eye on this, but it’s so aggravating to see people focusing on trying to do things that can’t effectively be done instead of figuring out how to adapt to this tool.
Apologies for any typos. This is long and my phone fucking hates me - no way some haven’t slipped through.
there is no way to do the equivalent of banning armor piercing rounds with an LLM or making sure a gun is detectable by metal detectors - because as I said it is non-deterministic. You can’t inject programmatic controls.
Of course you can. Why would you not, just because it is non-deterministic? Non-determinism does not mean complete randomness and lack of control, that is a common misconception.
Again, obviously you can’t teach an LLM about morals, but you can reduce the likelihood of producing immoral content in many ways. Of course it won’t be perfect, and of course it may limit the usefulness in some cases, but that is already the case in many situations that don’t involve AI, e.g. some people complain they “can not talk about certain things without getting cancelled by overly eager SJWs”. Society already acts as a morality filter. Sometimes it works, sometimes it doesn’t. Free-speech maximalists exist, but are a minority.
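One concrete example of a programmatic control that coexists with random sampling is masking at the sampling step: forcing some tokens’ logits to negative infinity gives them probability exactly zero, so no seed can ever produce them. A minimal sketch, assuming a hypothetical five-token vocabulary and block list; real systems combine several methods. Note that it blocks exact tokens, not ideas (a paraphrase or a different tokenization still gets through), so it illustrates “harder, not impossible” rather than encoded morality.

```python
import math
import random

# Hypothetical vocabulary and block list, purely for illustration.
VOCAB = ["hello", "friend", "blocked_a", "blocked_b", "goodbye"]
BLOCKED = {"blocked_a", "blocked_b"}


def masked_softmax(logits: list[float], vocab: list[str], blocked: set[str]) -> list[float]:
    """Softmax after forcing blocked tokens' logits to -inf (probability exactly 0)."""
    masked = [-math.inf if tok in blocked else x for x, tok in zip(logits, vocab)]
    m = max(masked)
    if m == -math.inf:
        raise ValueError("every token is blocked; nothing left to sample")
    exps = [math.exp(x - m) for x in masked]  # math.exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


def sample_many(logits: list[float], n: int, seed: int) -> list[str]:
    """Draw n tokens at random from the masked distribution."""
    rng = random.Random(seed)
    probs = masked_softmax(logits, VOCAB, BLOCKED)
    return rng.choices(VOCAB, weights=probs, k=n)
```

Even when the blocked tokens have the highest raw logits (for example `[2.0, 1.0, 5.0, 4.0, 0.5]`), thousands of random draws under any seed contain only the unblocked tokens.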
It’s impossible to run an AI company “ethically” because “ethics” are such a wibbly-wobbly and subjective thing, and because there are people who simply wish to use it as a weapon on one side of a debate or the other. I’ve seen goalposts shift around quite a lot in arguments over “ethical” AI.