Cambridge scientist: Intelligent robots need to be taught like teenagers
Ross J. Anderson, computer scientist and professor at Cambridge University, tells ERR in an interview that if we do not teach AI Western values, intelligent robots could one day defect to the Chinese side. This means that in the future, we'll need to teach computers the way we teach our children.
Could you explain the concept of alignment when it comes to AI? What does it mean to align AI?
Aligning AI is about ensuring that a model has got the same kind of aims and purposes as the people who built it or are operating it. It can vary according to the context. For example, it can mean not generating dangerous content, not generating content that is offensive or pornographic.
You said in your speech at the NATO Cooperative Cyber Defence Centre of Excellence's CyCon conference that this is a major issue we need to solve. Why is it difficult to align AI to make sure it does not produce this type of content?
Large language models are trained on enormous amounts of data from across the internet. This means they pick up the bad stuff with the good. In addition to respectable newspapers, they also read scandal sheets. They read extremist rants on obscure websites. They read all sorts of bad things. So it's quite easy to provoke a large language model into spouting extremist, racist or homophobic content.
One of the things you have to do is use various methods to align it. This is typically done with reinforcement learning from human feedback. You get the model to generate thousands upon thousands of sentences on various topics, and reward or punish it depending on human assessment of whether the output is acceptable or unacceptable.
This is how you can align the model for the intended purpose. The problem is that now that the technology is out there and well understood, other people can do the same. They can take a model that they've downloaded from somewhere, and they can train it to produce extremist content, right-wing propaganda, Russian propaganda or whatever kind of propaganda they choose.
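To make the reward-and-punish loop described above a little more concrete, here is a minimal, purely illustrative sketch. Everything in it (the ToyPolicy class, the human_rating stand-in, the update rule) is a hypothetical simplification added for illustration, not anything from the interview or from a real training pipeline; production reinforcement learning from human feedback trains a separate reward model on human preference labels and updates a large language model with policy-gradient methods.

```python
# A toy illustration (not production code) of the reward/punish loop described
# above: the model generates outputs, a stand-in for human feedback scores them,
# and the "policy" is nudged toward outputs that were rated acceptable.
import random

class ToyPolicy:
    """Hypothetical stand-in for a language model: samples replies by weight."""
    def __init__(self, replies):
        self.weights = {reply: 1.0 for reply in replies}

    def sample(self):
        replies = list(self.weights)
        return random.choices(replies, weights=[self.weights[r] for r in replies])[0]

    def reinforce(self, reply, reward):
        # Crude update rule: boost rewarded replies, dampen punished ones.
        self.weights[reply] = max(0.01, self.weights[reply] * (1.0 + 0.5 * reward))

def human_rating(reply):
    """Stand-in for a human rater: +1 for acceptable output, -1 for unacceptable."""
    return 1 if "polite" in reply else -1

policy = ToyPolicy(["polite answer", "offensive rant"])
for _ in range(200):
    reply = policy.sample()
    policy.reinforce(reply, human_rating(reply))

print(policy.weights)  # the acceptable reply should end up strongly favoured
```

In a real pipeline the ratings would come from human annotators, or from a reward model trained on their judgements, and the same machinery is what lets a third party fine-tune a downloaded model toward whatever content they choose, which is exactly the risk Anderson describes.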
You've talked a little about the risks already, but thinking about major applications, perhaps military applications as well. What are the risks there? I believe that in your talk you even said we might see AI go over to the Chinese side.
Let me give you an example. When you're trying to test autonomous vehicles, one of the requirements may be that if the vehicle becomes confused, it must stop safely without departing from the lane it's in. It turns out this is impossible to do, because usually when a car gets confused, it's because it doesn't know what lane it is in. It is passing through roadworks or some other strange situation where it just loses its orientation.
And so it becomes impossible to abide by a safety property unless you continue to understand your situation, your context. Now, if you translate this to the more difficult environment of warfare, where you've got various rules where it's okay to target enemy military personnel but it's not okay to target civilians, you can only abide by these rules insofar as you continue to understand the situation and context. So the kinds of problems we're struggling with when we try and fail to certify autonomous cars are going to be present once you start building battle robots.
That is just the second stage. The third stage comes once we get to the next level, where robots are not merely conducting operations like soldiers but actually giving commands. There you have to start explaining [to the robots] the reasons for fighting, start understanding the values involved.
Now, in the case of the Chinese, this is straightforward enough. The duty of all robots is to uphold the rule of the Chinese Communist Party and suppress dissidents. In the case of the West, our aims and objectives are much more diffuse, much more complex and difficult to describe. How do you describe on a single page what democracy and the rule of law are about? It typically takes an entire course. Even civics for high school students has many difficult points, points of controversy and conflict. Things that need to be reinterpreted by every generation. So it is a much more difficult task to align senior robots with the values of the society they're supposed to be defending.
In these circumstances there is the particular risk that if intelligence and military agencies give robots secret orders to behave just like Chinese ones, for example in order to wiretap people, the robots might come to the conclusion that the West is just being hypocritical and China is the real power in the world.
Just as Edward Snowden defected from the NSA under moral stress and gave his story to The Guardian and The New York Times, I think it is not impossible that a Western defense or intelligence robot might under similar stress logically prefer to defect to China in the future.
How do we overcome this? You mentioned in your talk that it is similar to how we might school our children, though it is a little more difficult with robots. They don't go through the stage of growing up, with their parents giving them advice. How do we teach them?
It is one of the points I was making for future research: at present, people who talk about the philosophy of AI refer to the trolley problem (a philosophical thought experiment that asks whether it is morally justified to switch a trolley to a different track to save several people if it causes a single person's death – ed.). They ask how you can give robots orders to save one person at the cost of another person's life. But this is most likely not how things will be done if we ever get to the point where AI has become artificial general intelligence.
I expect that any AI getting close to the threshold of being an AGI would end up having to go through an educative process, a childhood and adolescence. And as everyone who has had a teenage child or grandchild knows, this can be rather traumatic because at 12 or 13 both boys and girls start kicking back against parental authority, start doing things they're not supposed to do.
They may start experimenting with alcohol or tobacco or come home at two in the morning, just to test the boundaries and show they're growing up. Can you imagine what it would be like if an AGI controlled by a large company started behaving like that? What would happen if a phone company decided it was a teenager once more?
Quite a scary thought. How far are we from this? Computer scientists say that while AI seems to be doing very complex things, it is in essence still an empty shell. There is no internal life, as opposed to what we believe about humans. Is it a problem now, or will it only become a problem once we have AGI?
Personally, I'm not one of the doomsters. I don't believe large language models will lead us to AGI. But there are many very smart people whom I respect and who have a different opinion. Geoffrey Hinton, for example, a pioneer of the neural networks that underpin large language models, believes they are the path to AGI and that there are significant risks involved. Which is why he resigned, or retired, from Google a few weeks ago. He signed a letter yesterday, along with many other people, saying that we should treat AI as an existential threat.
I do not believe that the path to AGI goes directly through LLMs, but we're learning so much from them that, given another 10-20 years, it is not impossible we'll get to AGI. And when we do, we had better be prepared for it.
So we've got to think about issues like alignment. If we're going to make large-scale use of AI in governance, you have to think about the particular problems of alignment, because governments are so much more complex than companies. A company like IBM or whatever just exists to write software for its customers. A company like Facebook exists to sell advertising. That's straightforward enough, but governments are not there to do just one thing.
Governments are there to mediate conflicts in society between rich people who want to pay less tax and poor people who want better health services. Between young people who want there to be a lot of apartment buildings so they can get a home and start a family early and all the people who don't want their view spoiled.
There are all these fights in every society, and so governments lie all the time. Politicians are forever saying one thing to Peter and another thing to Paul. Now we have the internet and we can see what they've said. They now make all sorts of vague promises which are never implemented, and that's perfectly understandable, given what government is. But once you start having AIs in government, which are supposed to do everything and nothing and be all things to all people... Where will they get their moral direction from? Who's going to be behind the wheel?
Do I understand you correctly that because an authoritarian system is simpler, the rules are also simpler to teach? The Chinese government mediates the conflicts you've just described in a rather simplistic way. Is there a risk that dictators will be able to make use of AI much more simply than our complex democratic societies?
It's more complicated because China is trying to do entirely different things with its big AI players. China is doing surveillance. China does lots of face recognition. Lots of CCTV cameras in public places. They spot you if you're a Tibetan or if you're an Uighur or a known enemy of the state. It is done to support the power of the Communist Party.
One of the things the Chinese are not keen on is language models because it's very difficult to keep modern LLMs aligned right. In China, alignment would mean never saying anything rude about the Communist Party. I'm sure enough people in Estonia remember Soviet rule when you had to be careful not to say anything rude about Brezhnev. So you understand how difficult it is to use an LLM like that in such societies.
On the other hand, we've made this big gamble on large language models and are now building them into everything. The two paths have diverged. Now, here's the rub. Many developing countries are looking to China for development assistance because China is offering a lot of money as part of its deliberate ploy to counter Western influence in places like Africa, South Asia and Latin America, where the Chinese are looking to get raw materials, food and so on.
We may see Cold War II played out in third world battlegrounds where battles might not be fought with partisans and counterinsurgency operations. It could be a conflict between IT-systems and AI systems used by the different sides, both to run companies and for administration.
All of that is rather a view to the future, while ordinary people are already coming into contact with AI. The popularity of ChatGPT has exploded, even if it has remained a toy for many. Do we have everyday examples of AI that has not been aligned? Is it already having an effect for the ordinary person?
Well, I think a lot of what you see today on websites is what economists call sludge. The bad variety of nudge (in economics, suggestions and positive reinforcements to influence consumer behavior – ed.). I flew here from London on a Ryanair flight, and flying with Ryanair, as you know, you have to fight through this website, which is forever trying to sell you more baggage, more parking spaces, more hotel rooms. More of everything.
You've got to really keep your concentration to get the deal that you want. That is what the world has evolved into and web search is broken. If you search for the Hilton hotel, the first seven hits will likely be for booking agencies rather than the actual hotel.
This has been brought about by the economics of the internet, and the economics of AI will be no different. Let me give you an example of how it may go wrong. One of the promises of large language models is providing companionship for the elderly. Once people reach their 70s or 80s, they are often very lonely because many of their friends have died or have dementia. Providing companionship for the elderly is socially valuable. If you're rich, you can afford to have servants in your house. If not, then you're just lonely.
There will probably be a lot of companies producing bots to keep people company, converse with them through their speakers or via text on their phones. It would bring a lot of value. It'll make people happier. They'll probably live longer and be able to call for help if they fall. But it's all wide open to exploitation. Because if you manage to get into those large language models, you can sell people all sorts of dubious investments.
And alignment, I believe you're suggesting, is a way to perhaps avoid this? What would be the good scenario and how much progress are AI developers making on that front?
Large AI companies put significant effort into ethics. But that is not what the next two or three years of development will look like, because the existence of machine learning as a service means that anybody can now do a startup and offer their own products based on things like ChatGPT or similar models, which they can simply rent per transaction from Microsoft, or alternatively build on one of the many public domain models out there. We will see rapid technological progress whereby language models can be run on cheaper and cheaper hardware and modified by third parties.
There will be more and more startups, including some very predatory and evil ones, getting into the business of selling machine learning in various forms.
That is quite a threatening or pessimistic perspective. Is there any sort of encouraging message or are we doomed?
No. We're not doomed. These technological revolutions occur from time to time, and you can never tell in advance what is going to happen. When Gutenberg invented movable-type printing, for example, he couldn't possibly have foreseen that Erasmus would produce a new translation of the Bible, that Martin Luther would then nail his theses to a church door in Wittenberg, and that there would be 100 years of war, followed by the Enlightenment and all the rest. All of it was completely unpredictable at the time. But if you could go back in history, would you strangle Gutenberg in his cradle?
Probably not!
Exactly!
--
Editor: Marcus Turovski