Shan Carter, a researcher at Google Brain, recently visited his daughter’s second-grade class with an unusual payload: an array of psychedelic pictures, filled with indistinct shapes and warped pinwheels of color. He passed them around the class, and was delighted when the students quickly deemed one of the blobs a dog ear. A group of 7-year-olds had just deciphered the inner visions of a neural network.
Carter is among the researchers trying to pierce the “black box” of deep learning. Neural networks have proven tremendously successful at tasks like identifying objects in images, but how they do so remains largely a mystery. Their inner workings are shielded from human eyes, buried in layers of computations, making it hard to diagnose errors or biases. On Wednesday, Carter’s team released a new paper that offers a peek inside, showing how a neural network builds and arranges visual concepts.
This particular line of research dates back to 2015, when Carter’s co-author, Chris Olah, helped design Deep Dream, a program that tried to interpret neural networks by reverse-engineering them. Olah’s team taught a neural network to recognize an array of objects with ImageNet, a massive database of images. Then they told the network to, say, generate a dog or a tree based on what it had “learned.” The results were hallucinogenic images that reflected, in a limited sense, how the model “saw” the inputs fed into it. (It later turned out the system could also produce rather pricey works of art.)
Since then, Olah, who now runs a team at research institute OpenAI devoted to interpreting AI, has worked to make those types of visualizations more useful. Neural networks are composed of layers of what researchers aptly call neurons, which fire in response to particular aspects of an image. For each level of the network, Carter and Olah grouped together pieces of images that caused roughly the same combination of neurons to fire. Then, as with Deep Dream, the researchers reconstructed an image that would have caused the neurons to fire in the way that they did: at lower levels, that might generate a vague arrangement of pixels; at higher levels, a warped image of a dog snout or a shark fin. They arranged similar groups near each other, calling the resulting map an “activation atlas.”
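For readers who want a concrete picture of the grouping step described above, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the researchers' code: the function name `build_atlas_cells` is hypothetical, a plain PCA projection stands in for whatever layout method the team actually used, and the final step of rendering each averaged activation as an image is left out.

```python
import numpy as np

def build_atlas_cells(activations, grid_size=4):
    """Group activation vectors into a 2D grid and average each group.

    Hypothetical helper: `activations` is an (n_samples, n_neurons) array
    recorded at one layer, one row per image piece.
    """
    acts = np.asarray(activations, dtype=float)
    centered = acts - acts.mean(axis=0)

    # Assumption: PCA (via SVD) as a simple stand-in for the 2D layout,
    # so that similar activation patterns land near each other.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ vt[:2].T  # shape (n_samples, 2)

    # Rescale each axis to [0, 1], then bin into grid cells.
    mins = coords.min(axis=0)
    spans = coords.max(axis=0) - mins
    spans[spans == 0] = 1.0  # avoid dividing by zero on a flat axis
    unit = (coords - mins) / spans
    cell_idx = np.minimum((unit * grid_size).astype(int), grid_size - 1)

    # Collect the activation vectors that fall in each cell.
    groups = {}
    for (row, col), vec in zip(cell_idx, acts):
        groups.setdefault((int(row), int(col)), []).append(vec)

    # One averaged activation vector per occupied cell.
    return {cell: np.mean(vecs, axis=0) for cell, vecs in groups.items()}
```

Each averaged vector would then be handed to a Deep Dream-style optimizer to produce the picture shown in that cell of the map.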
That lets researchers observe a few things about the network. By toggling between different layers, they can see how the network builds toward a final decision, from basic visual concepts like shape and texture to discrete objects. Olah has noticed, for example, that dog breeds (ImageNet includes more than 100) are largely distinguished by how floppy their ears are. The atlas also shows how the network relates different objects and ideas (say, by putting dog ears not too distant from cat ears) and how those distinctions become clearer as the layers progress.
Jeff Clune, a professor at the University of Wyoming who wasn’t involved in the study, says that the atlas is a useful step forward, but of somewhat limited utility for now. Researchers trying to understand how neural networks function have been fighting a losing battle, he points out, as networks grow more complex and rely on vaster sums of computing power. “That increase so far has far outstripped our ability to invent technologies that make them interpretable to us,” he says.
The research also unearthed some surprises. As an illustration, Olah pulls up an ominous photo of a fin slicing through turbid waters: Does it belong to a grey whale or a great white shark? As a human inexperienced in angling, I wouldn’t hazard a guess, but a neural network that’s seen plenty of shark and whale fins shouldn’t have a problem. Then he shows me the atlas images associated with the two animals at a particular level of the neural network: a rough map of the visual concepts it has learned to associate with them. One of the shark images is particularly strange. If you were to squint a bit, you might see rows of white teeth and gums or, perhaps, the seams of a baseball.
It turns out the neural network they studied also has a gift for such visual metaphors, which can be wielded as a cheap trick to fool the system. By manipulating the fin photo, say by adding a postage-stamp-sized image of a baseball in one corner, Carter and Olah found you could easily convince the neural network that a whale was, in fact, a shark.
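The image manipulation itself is simple. The sketch below, in Python, shows one way to paste a small picture into the corner of a larger one; the function name `paste_patch` and its parameters are hypothetical, and nothing here reproduces the researchers' actual experiment or the classifier they tested.

```python
import numpy as np

def paste_patch(image, patch, corner="bottom_right"):
    """Return a copy of `image` with `patch` pasted into one corner.

    Hypothetical helper: both inputs are (height, width, channels) arrays.
    `corner` is one of "top_left", "top_right", "bottom_left", "bottom_right".
    """
    img = np.array(image, copy=True)  # leave the original photo untouched
    patch = np.asarray(patch)
    ph, pw = patch.shape[:2]
    ih, iw = img.shape[:2]
    if ph > ih or pw > iw:
        raise ValueError("patch must fit inside the image")

    # Pick the top-left pixel of the region the patch will cover.
    top = 0 if corner.startswith("top") else ih - ph
    left = 0 if corner.endswith("left") else iw - pw

    img[top:top + ph, left:left + pw] = patch
    return img
```

Feeding the original and the patched photo to the same classifier and comparing the two predictions would show whether the small addition changes the network's answer.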
It’s true, Olah says, that the method is unlikely to be wielded by human saboteurs; there are easier and more subtle ways of causing such mayhem. So-called adversarial patches can be automatically generated to confuse a network into thinking a cat is a bowl of guacamole, or even cause self-driving cars to misread stop signs.
But he finds it exciting that humans can learn enough about a network’s inner depths to, in essence, screw with it. The hope, he says, is that peering into neural networks may eventually help us identify confusion or bias, and perhaps correct for it. Neural networks are generally excellent at classifying objects in static images, but slip-ups are common, such as labeling humans of different races as gorillas rather than as humans. With visualization tools like his, a researcher could peer in and look at what extraneous information, or visual similarities, caused the network to go wrong.
That said, there are risks to attempting to divine the entrails of a neural network. “With interpretability work, there’s often this worry that maybe you’re fooling yourself,” Olah says. The risk is that we might try to impose visual concepts that are familiar to us, or look for easy explanations that make sense.
That’s one reason some figures, including AI pioneer Geoff Hinton, have raised the alarm about relying too much on human interpretation to explain why AI does what it does. Humans can’t explain how their brains make decisions, and computers run into the same problem. As Hinton put it in a recent interview with WIRED, “If you ask them to explain their decision, you are forcing them to make up a story.”