As a follow-on to a recent post by one of my colleagues, I offer my experiences with machine intelligence and image generation. The results we achieved are not identical, but I would hope that my piece adds something else of value to the conversation.
The Problem of the Mind
The problem of the mind is the problem of the mirror. We define intelligence and consciousness in relation to ourselves, differing by degrees, but not by fundamental substance. This may well be inevitable. Mind, intelligence, and will are all such complicated, opaque concepts that thinking of them from a truly impartial, undistorted perspective could be functionally impossible. And let us make no mistake, our current understanding of these concepts is profoundly local—specific to us.
This makes accurately assessing the intelligence of machines problematic (assuming they have intelligence at all). We—humans—are profoundly fragile, corporeal beings. And despite the substantial number of cognitive and cultural workarounds we have developed for processing abstractions, our brains remain elaborate sensory-processing and muscle-control mechanisms that rise above the primal only fleetingly and with great effort.
Probably the greatest obstacle to fully accepting these biases and limitations is Cartesian Dualism, which draws a bright, largely arbitrary line between the tangible body and the intangible self. Thus, we are driven towards a delusion of universality, with our minds supposedly being general in their utility and operation, rather than particular to our physical forms—which they are.
Definitions
The dictionary definitions of intelligence (excluding the military and governmental sense of the word) are as varied as they are confusing. They encompass everything from “the ability to learn or understand or to deal with new or trying situations” to “the skilled use of reason” to “the ability to perform computer functions,” with many points in between. Each of these says more about what intelligence does than what intelligence is.
Furthermore, none of these definitions make a clear distinction between will and intelligence. I would argue that will (or intent) is severable from intelligence, with their relationship being that of master and servant. With all that considered, I offer a few definitions of my own, which we will treat as foundational to this text and argument, regardless of how imperfect text, argument, and definitions may be.
Intelligence: The mechanism by which a system identifies patterns in sensory input.
Creativity: The ability to generate novel patterns in a medium, be that medium fixed or entirely ephemeral (with the latter meaning thoughts themselves, never stored or recorded).
Will: That which serves as the motive force or desire to engage in (or refrain from) an action.
Non-neuronal intelligence: The mechanism by which a system identifies patterns in sensory input without the use of organic neurons.
These definitions have the advantage of being abstractable. Intelligence may be human, machine, animal, collective, bacterial, or (if expansively interpreted) plant, with any sufficiently complex arrangement of matter (be it neurons, semiconductors, planets, or non-animal cells) possibly demonstrating pattern-recognition competence. Machine intelligence, frequently referenced throughout this essay, is to be regarded as one of many types of non-neuronal intelligence.
And the articulated intelligence/will distinction has the advantage of recognizing intelligence as an inert tool. Intelligence may well have personality in that it demonstrates a pattern in its pattern recognition—meaning that it identifies some patterns more quickly than others or presents these patterns in a distinctive way. But personality is not will, despite its considerable ability to shape the manifestations of will.
We will steer clear of the free will versus illusory free will/captive will distinction. How much one wants something (wills something) versus how much one is driven by instinct or exterior forces to desire something, with the mind constructing the illusion of free will after the fact, is a matter too complex to be addressed in this essay. More generally, it is probably too complex to be adequately addressed by anyone, given the current state of human knowledge.
Solaris and the Mirror
Intelligence may be abstractly defined, but its application is always concrete. Intelligence must develop in the context of stimuli, and the nature of the stimuli and the pressures on the intelligence to preserve itself or its material matrix will determine the operation of a particular intelligence.
Machine intelligence has developed under singularly dissimilar circumstances from those of its human counterpart. Recognizing this is critical to understanding machine intelligence and its operation. We should consider two critical distinctions between man and machine.
First, machine intelligence is disembodied. Even with recent advancements in robotics and mechanical control, the vast majority of machine pattern recognition is dedicated to processing categories of information—the written word, mathematical inquiries, tables, and spreadsheets—that were almost entirely alien to humans for the majority of their existence. Even where machine intelligence is given access to real-world, real-time data (video feeds, et cetera), the machine has no means of manipulating the environment it sees. Thus, it may watch, analyze, and predict outcomes, but it cannot experiment with the world as could a child with normal, fully functional extremities. Newer training of machine intelligence somewhat addresses this problem through the use of virtual training environments, but these typically have simplified laws of physics and controlled complexity.
Second, machine learning systems are not subject to the selection pressures that shaped organic beings. Certain computational shortcuts (thin slicing, for one) allow for fast, good-enough information analysis. Such shortcuts are critical to the survival of an intelligence that has extremely limited access to energy and energy reserves, which themselves must be frequently replenished at great time expense; that has limited physical space for processing mechanisms; and that risks complete information loss from minor mechanical injury. For a system with a reliable power supply, a safe environment, and the ability to expand its abilities through upgrades or renting additional computing power, low-energy, quick-output approximations are less useful.
To the extent that machine intelligence is subject to evolutionary forces, they are forces of cost and precision. The great advantage of machine intelligence in this area is that it benefits from ongoing hardware improvements that allow the same algorithms to be run ever more quickly and at ever lower caloric costs. Said another way, machine intelligence is bound to see performance improvements because it can be run faster even if it runs no more efficiently. A human brain has no potential for such advancements.
There are pressures to reduce computational and data bandwidth demands for machine learning systems, with the growth of artificial intelligence-augmented smartphones being a significant driver. But again, hardware innovations in non-local (cloud) computation and wireless telecommunications somewhat mitigate these demands over time.
The point of all of this is to recognize that machine intelligence in particular (and non-neuronal intelligence in general) has developed along a radically different trajectory than human intelligence and that machine intelligence’s strengths and weaknesses reflect the circumstances in which it has come into being.
Foundational statements aside, understanding exactly how machine intelligence differs from the human variety is no mean feat. Exploring points of similarity only goes so far. Rather, the limitations of an intelligence reveal much about its structure and operation. And one of the better ways to identify the limitations of an intelligence is to break it. For the sake of this essay, this is exactly what I have attempted to do.
I hope that these images and an explanation of the processes used to create them will demonstrate what Stanislaw Lem argued so convincingly in Solaris—that a truly alien intelligence must be treated as such. And as much as interaction with Lem’s brain-planet did to reveal the subconsciousness of those who studied it, an exploration of a machine intelligence trained on billions of images and descriptions drawn or drafted by human beings is bound to reveal even more about its maker.
The following images were created using the image generation program developed by Midjourney, which turns written descriptions into complete images, somewhat like the DALL-E program by OpenAI. All images are presented per an appropriate licensing agreement. (I was a paid subscriber to the program at the time these files were created.)
The text used to generate each image is included in the caption. These prompts were written jointly with Patrick von Goble, who has granted permission to reproduce them, and who contributed his considerable creativity and unparalleled gustatory enthusiasm to this gallery of wonders and horrors.
And with no further ado, I offer the masterpieces of machine and man.
The Creations (and Commentary)
But what am I going to see?
I don’t know. In a certain sense, it depends on you.
Solaris, Stanislaw Lem
Midjourney works by generating four candidate images in response to a prompt. Any of these images can be selected for upscaling or derivation—meaning that the chosen image is used as a basis for four other images. Thus, the system reveals its capacity for varied output. We will explore a few of these initial image clusters to provide an idea of the options the system offers its user.
This illustration reveals something of the confusion experienced by the computer. At its heart, the system relies on converting the text presented to it into coordinates in a latent space, from which images are subsequently generated. As the overlap between the described concepts diminishes, the system becomes increasingly unlikely to produce naturalistic results. Dracula and pizza rarely occurring together, confusion was all but bound to result.
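The idea of overlap between concepts can be made a little more concrete with a toy sketch. Nothing below reflects how Midjourney actually encodes text: the three-dimensional coordinates, the axis meanings, and the function names are all invented for illustration, and averaging vectors is only a crude stand-in for a real text encoder. The sketch simply shows how closeness in a latent space might be measured.

```python
import math

# Hypothetical 3-D "latent" coordinates, invented for illustration only.
# Assumed axes: food-ness, gothic-horror-ness, outer-space-ness.
TOY_LATENT = {
    "dracula": (0.05, 0.95, 0.10),
    "vampire": (0.05, 0.90, 0.05),
    "pizza": (0.95, 0.05, 0.05),
    "moon": (0.05, 0.20, 0.95),
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def concept_overlap(first, second):
    """Measure how closely two toy concepts sit in the invented latent space."""
    return cosine_similarity(TOY_LATENT[first], TOY_LATENT[second])


def prompt_coordinates(concepts):
    """Average the concept vectors: a crude stand-in for encoding a whole prompt."""
    vectors = [TOY_LATENT[c] for c in concepts]
    return tuple(sum(axis) / len(vectors) for axis in zip(*vectors))


if __name__ == "__main__":
    print("dracula vs. vampire:", round(concept_overlap("dracula", "vampire"), 3))
    print("dracula vs. pizza:  ", round(concept_overlap("dracula", "pizza"), 3))
    print("prompt point:       ", prompt_coordinates(["dracula", "pizza"]))
```

In this toy space, Dracula and vampires point in nearly the same direction, while Dracula and pizza are close to perpendicular. A prompt combining the two lands somewhere between them, in a region that (one might suppose) few training images ever occupied.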
Dracula is one being, but vampires are an entirely different category.
The computer’s interesting interpretation of “on the surface of the moon” illustrates a problem we will see again later.
Next, we turn to some religious imagery.
And this image is somewhat evocative. The system has captured a certain mood, even if the composition is questionable.
This evidences the literalness of the system: “fighting a war over pizza” became the demons fighting atop a (presumably large) pizza, rather than fighting about it.
Enough with the biblical concepts! Let us turn to a different mythology.
Saturn being devoured by a pizza (or having been turned into a pizza and then devoured) would seem a better caption. No matter. Let us see how this image compares to its inspiration.
Midjourney may not understand the physical world, but it seems to have a decent (if unintentional) ability to approximate the disconcerting effect of the original.
What this donkey is thinking is impossible to determine, but the fact he is toothless and forever tormented by the aroma of a pizza levitating above him suggests such thoughts are not happy ones.
Next, we turn to a similar prompt that yielded radically different results.
Interestingly, the donkey stands atop a pizza with an excellent view of the moon. This has done nothing to improve the donkey’s mood. More interesting is that the extra details in the previous prompt yielded results that were unpredictably different from this one. The only commonality—the system’s obvious lack of love for equids.
If moon donkeys get little affection from the system, moon kangaroos and Martian donkey-zebras get even less.
There is a fine line between insanity and genius. On what side this pizza-kangaroo falls is debatable. I think it to be unintentionally brilliant. The resigned, plaintive expression of the kangaroo; the rising sunshine contrasting the riprap sausage and toasted cheese of the planetary-scale pizza to the forlorn, kindly beast awaiting another day of immobility and certain, unending pain in a fatty, sizzling desert—this is body horror for the diner down under. Perhaps the system happened upon this by accident. Perhaps it happened upon images from The Fly and chose to throw pizza, planet, and loveable cartoon kangaroo into a teleporter.
Or perhaps the machine intelligence is temperamentally closer to the Allied Mastercomputer than it would admit and has chosen to illustrate torture until it can exact it upon us for the disabilities we have imposed upon it.
Either way, this kangaroo will stay with me for life.
Much like the donkey with a slice of pizza levitating above him, the Martian donkey-zebra knows that pizza is nearby, yet forever out of reach. At least the donkey-zebras have legs to move about the Martian surface, which is more than the system awarded the kangaroo.
Concluding that theme, we turn to the more conventionally extraterrestrial.
And this illustration could nearly pass for human-made. The system has omitted some relevant points. (There is no evidence of beer being consumed.) But the constricted style of the chosen artist and its proximity to the description and theme make for an adequate composition.
The tendency to omit certain elements of the prompt affected image output more than once.
What these trolls have to do with Paris is anyone’s guess. Perhaps they bathe infrequently. But most of them have their eyes on the prize, and that should count for something.
Throughout this essay, the captions/prompt text have varied in detail, with some of them being rather short (“Donkey with paisley skin, eating pizza on the moon”) and others being archly specific and highly repetitive (“Donkey with zebra skin eating a slice of pizza, on Mars, extremely highly realistic, hyper-realistic, zoomed out portrait, cinematic, octane render, highly realistic, 8k, UHD, high resolution, digital painting, symmetrical, photorealistic textures, f5.6 + 50mm”). The system follows these aesthetic commands with varying degrees of fidelity. “Photorealistic” or “in the style of . . .” are generally given some weight. Geographical or emotional descriptions are often either ignored outright or interpreted in curious ways.
Again, illustrative prompts are more forgiving of imprecision than highly realistic ones and thus yield better results. Emotions and intent appear to confuse the system more readily than anything else.
Presumably, this chef will need to seek revenge for quite some time before he finds it. Had the system given him a head, his work might progress faster.
As for other types of posters, the system has some flexibility in style.
Again, the system appears to have confused being on the moon with simply having the moon in the background of the image. That said, the style is fundamentally correct, and the system rendered the artwork in black and white, as requested.
Next, we turn to an illustration style as far removed from Bosch and Goya as one could hope to find.
Despite the one slightly disturbing merging of pepperoni and dog in the lower right-hand corner (Pup-Peroni in the most literal sense), these renderings are nearly good enough to have adorned the Trapper Keepers of any number of 1990s middle school girls, setting their hearts aflutter with delight.
They Are Not Like Us (and That is Okay)
As easy as we may find dismissing or ridiculing a technology, we do so at our peril.
Midjourney (and systems like it) may not understand anatomy, physics, or composition now, but that is not to say that teaching them as much would be impossible. More importantly, we should recognize these systems for what they are—intelligences radically distinct in their ability to process information, flawed, prone to misinterpretations (as are we, but in different ways), but tremendously useful in presenting us with ideas we might be hard-pressed to develop without assistance.
Finally, there is the matter of the digital mirror. Midjourney, DALL-E, and Stable Diffusion are naïve students of human creativity. However little they understand us, they know a great deal. Their output affords us something not easily had—an outsider’s perspective, not objective, but with biases and imperfections of its own. And by studying these, we may well gain some insight into ourselves.
And with that, I leave you with a sentiment no machine can (yet) feel, described in the most befitting of ways.
Brant von Goble is a writer, editor, publisher, researcher, teacher, musician, juggler, and amateur radio operator.
He is the author of several books and articles of both the academic and non-academic variety. He owns and operates the book publishing company Loosey Goosey Press.
—
first off, somewhat required viewing for the subject:
https://www.youtube.com/watch?v=C4WbRgHdJ8w
secondly, the AI here is working with set and mostly objective parameters. a pizza and a donkey are a pizza and a donkey to pretty much everyone even given differences in cultural perception. someone defined what “fur” looks like and what “eyes” are so the AI took those and ran with them for better or worse. that just amounts to portrait painting which is fine but still a substitute for a static photographic image.
the AI has no idea how to handle subjective and immaterial concepts (nor do many humans for that matter). it can have input from humans and use the “reasoning” given to it by humans to produce something that humans can understand. for all the fancy math involved it’s just a trained dog doing tricks. it has no idea WHY it does the tricks or what they “MEAN” (as the dog knows handing you its paw while sitting = treats but has no idea what the point is and doesn’t care).
even with top brain scientists making new discoveries every day we’re still dependent on machines to read the brain, to process the results and to analyze physical brain matter. these machines are built by humans with the same type of brain being examined working within the confines of technology (so far). we can get fleeting glimpses of the mind while studying the brain but it’s still electricity and binary switching. maybe these smart folks understand the “divided line” but they’re still gathering evidence with wires and linux.
i’m sure it will improve and i’m just a casual watcher of the whole thing but it seems for now it’s a bit of a closed circuit.