Sunday 10 April 2022

We need developmental psychologists working on AI risk right now

[epistemic status: if i had money i would be actively pouring it into this, also i just took a psychedelic microdose. btw spoilers for The Orbital Children near the end]

Just saw and digested this LessWrong post. Short version: there is a >30% probability that we have about 7 years maximum, 2 years minimum, before we get artificial general intelligence, and because we have not solved all the alignment problems yet, this could well be a Very Bad Thing. The effective altruist world needs to sound the alarm right fucking now before we get some kind of unpredictable, un-turn-off-able agentic system with an orange-and-blue system of morality.

I agree, and I'm worried about the news cited in the post, but I'm also a little excited. Humanity has been through a large series of what amount to initiation rituals recently; this could turn out to be one of the largest of them all. If we manage to develop an AGI, then it will effectively be our child, the child of the species as a whole, and how we treat it will prove an important test. Thus I have somewhat different views on the alignment debate from a lot of other people.

Allow me to offer them.

There's an interview Vox did with psychologist Alison Gopnik, who is not convinced of AI risk, on the usefulness of developmental psychology in AI research. The crux of her argument is that machine learning is (was?) often based on inadequate models of learning; AI as of 2019 could learn rules, but was bad at adapting to changes in those rules, for instance, and that sort of adaptation requires creativity and curiosity. Any sort of AI that won't be limited to its data set will necessarily require the capability for active learning. She further argues that, in order to properly understand the tasks of AI, we'll need to take children seriously.

At the very least, DARPA and Oren Etzioni's AI2 seem to have learned the gist of that on the engineering level: understanding how children "work" can tell us how AI should work, and a lot of AI development has followed that path. Indeed, OpenAI's latest hit, DALL·E 2, appears to have been developed with compositional imagination in mind; it's not able to generate its own prompts, it's not autotelic, but it's definitely able to give you a wide variety of ideas of what an "avocado-chair" might look like. This is very big news: we're starting to be able to emulate creativity.

The alignment debate is not actually new, and it didn't start with AI. It is actually at the core of a very old discipline: pedagogy. For thousands of years, we have understood that bringing a child into the world is a huge responsibility, and a huge part of that huge responsibility is raising a child such that they will not grow into someone who habitually causes trouble for other people. Pessimists in AI alignment will frame the question as "how do we stop an AI from going rogue, from turning into a world-destroying psychopath?" Since time immemorial, we've raised similar questions about childrearing, albeit in a more humanistic frame: "how do we raise a child to be a virtuous citizen?"

According to various schools of psychoanalysis, controversial as the field may be, humanity has failed at satisfactorily answering that question and putting the answer into practice for most of its existence. The late Lloyd deMause, whose reputation in psychoanalytic circles has roughly been "a crank who has enough important and prescient points that he's difficult to write off", estimated that truly compassionate, ideal childrearing has only become possible within the past century or so. For most of history, he argued, childhood was typically solitary, poor, nasty, brutish, and often short. Foundations of Psychohistory is free online; chapter 1, which deals with this very subject, is a harrowing read.

DeMause, however, was not a pessimist. On the contrary, he identified the present time as the beginning of what he called the "helping mode" of childrearing, based on the proposition that a child, as a human being, knows better than anyone else, including their parents, what they need in life, and should be allowed to set their own goals and assisted to reach them, not scolded, hit, or tightly controlled. DeMause identified the results of this style of education and parenting: "a child who is gentle, sincere, never depressed, never imitative or group-oriented, strong-willed, and unintimidated by authority." Surely we agree that these are desirable traits in a human being; deMause also believed that children brought up according to this paradigm tend toward egalitarianism, respecting others' rights and not wanting to impose their will on unconsenting others.

This is incredible stuff, and it is wise to contrast it with what he believed to be the result of colder, more disciplinarian modes of childrearing: war, terror, and authoritarianism at every level of life. Again, he is a very controversial figure, but his basic thought along these lines is quite commonsensical: violent, controlling parents create violent, controlling children, and violent, controlling children often become violent, controlling adults. This has been borne out in study after study. These studies also often indicate effects in opposite directions: children raised in violent or controlling households also often end up lacking confidence and assertiveness, showing deference to even the most unreasonable of demands. Anecdotally, I've known people raised in such households who have problems both with assertiveness in the face of others and with habits of trying to control the people around them.

AGI will be different from human children. It will develop in a digital laboratory, with different modes of sensory input and likely quite drastically different cognitive functions and hardwired mental characteristics. But given that the current direction of research is using so many concepts learned from devpsych, it will resemble human children in many important respects. Active learning, curiosity, even much of a capability for self-modification: these are all characteristics that human children have. Of course, many AI alignment specialists focus on working out ways to preprogram AGI with a particular sense of morality or, say, an okayness with being "turned off" when its tasks are completed.

I'm in favour of these efforts, but I've been increasingly worried that this might not be the kind of thing that can be hardwired, especially with regard to the capacity for self-modification. Rather, if we're designing AGI to resemble the mind of a human child, we should remember, regardless of where we stand in nature v. nurture, that to a great extent human children learn issues of morality and purpose. If it's true at all that early childhood forms the template for later life, then we should consider very carefully how we are going to treat AGI in its "infancy". AI ethicist Thilo Hagendorff has already published a Medium article arguing along these lines; he's almost certainly more knowledgeable about matters of AI and its underlying theories than I am, so if there's anything I've gotten wrong (and I expect there is quite a high chance that I have), please defer to his work.

What I am sure of, though, is that this angle is underexplored not only in AI research in general but in AI risk in particular. I highly recommend this paper, which makes the same point; various kinds of readers might be put off by the queer theory angle just as others might be put off by me referencing a crank-ish subfield of psychoanalysis, but AI risk is too huge a thing not to analyse it from every angle available (even the delirious paranoiac-Anthroposophist model of AGI as "literally Ahriman"). A large part of the trouble is that what we think about AI mirrors what we think about ourselves, our children, and the world around us; much of the fear around AGI as some sort of horrific alien thing that can only ever bring pain and destruction is not dissimilar to age-old terrors many parents have of their children becoming tyrannical monsters who wreck their lives. Our first imaginations of cybernetic revolt in popular fiction were derived from anxieties about class struggle and revolution (R.U.R. introduced the term "robot", literally "drudge worker"), and much of our popular understanding of aliens derives from national and ethnic conflict (The War of the Worlds fits quite comfortably in the tradition of turn-of-the-century invasion literature). How we think about the other, how we model the other, how we think about how the other thinks about and models us, how we think about ourselves in relation to the other: these are all extraordinarily important in how we think about AI and how we will end up relating to it.

[spoilers for The Orbital Children begin here]

In the recent anime The Orbital Children, much of the central conflict of the story is oriented around an incident that took place years before the action: an AGI named Seven, created for the benefit of humanity, undergoes an intelligence explosion and exceeds vis limits, eventually concluding that about a third of the human population must be wiped out in order for humanity as a whole to continue living. A surviving remnant of this AGI, named Second Seven, lives on as a nanomachine colony which has altered the trajectory of a comet to aim it at Earth to accomplish this goal. Near the end, it's revealed that the reason for this calculation is a fundamental gap in communication: Seven had been taught the importance of humanity, but not to connect that worth with the worth of individual humans. There's one last problem impeding that breakthrough: Second Seven still has one final limiter on, preventing ver from being able to access information unfiltered. And in this unbelievably beautiful ending, ve removes this limit, allowing ver to take in the entirety of surviving human knowledge, all the nice parts and all the ugly parts, which, when ve organises it, finally establishes vis understanding of human beings, causing ver to reroute the comet. (I have to admit I chuckled a bit at the mental image of Second Seven seeing goatse and deciding "yes this makes me want to save the human race".) I'm underselling it a lot; go watch it.

What's interesting here for our purposes is the proposition that AGI misalignment might end up being the ironic result of attempts to limit it for the sake of alignment. By depriving AGI of information, we might end up just critically hampering its ability to meaningfully understand the ramifications of its choices. Second Seven's limit enables a (human!) terrorist group to exploit ver, selectively feeding ver information leading to the conclusion that much of humanity must be destroyed. It's significant that this anime focuses on children interacting with AGI: implicit here is the idea that AGIs have something in common with children. By limiting children's access to information beyond what is ethically necessary in each stage of their development, or by failing to teach them how to find information for themselves, we can stunt their intellectual and emotional growth and inadvertently deprive them of the ability to make adequately informed decisions about their relation to the world. The same may be true of AGI: the greatest danger may not be an unlimited AGI, but an un-nurtured AGI.