An essay by John Zhou
On an overcast afternoon in November 2022, a teenage boy pages through a glossy video game manual in his bedroom in Saitama Prefecture, Japan. As the clock blinks towards 1:00 PM, he pulls on a dark blue helmet flashing with green lights, climbs into bed, and closes his eyes.
A rush of color and sound lifts him to his feet. His virtual feet, that is. The dull hum of a wall-mounted air conditioning unit crescendos into the racket of a bustling market square. The swirling darkness inside his eyelids brightens into a sunlit cityscape stretching into the distance.
A reality away, in his cramped, dark room, he grins.
The helmet, called a NerveGear, promises to immerse the user in a completely virtual sensory experience delivered directly to their brain. Built-in transceivers placed at strategic locations around the device send artificial sensory signals that are indistinguishable from the real thing. At the same time, the transceivers read out the user’s intended movements and faithfully reproduce them in a virtual body, allowing the user to interact with a simulated environment. This fictional technology, which dwarfs the cardboard headsets and comically large goggles at our current disposal, forms the basis for Sword Art Online, or SAO, a Japanese light novel series and subsequent anime television series that explores the fantastic possibilities unleashed by removing the constraints of reality. The debatable merits of an escapist “metaverse” notwithstanding, one can’t help but wonder: what would it take to make a full dive into another reality?
The key lies in the aforementioned transceivers, which are responsible for sending messages to specific sensory regions within the brain and receiving messages from regions associated with the planning and execution of movements. However, constructing this transcranial postal service is no easy task. The main obstacle is quite simple: living brains are usually situated inside skulls, and the owners of these brains generally prefer to keep them that way. Even putting this minor issue aside, there are plenty of additional obstacles. One is that neurons are really, really small. Another is that there are lots of them. Current estimates tell us that the two fistfuls’ worth of gray matter responsible for all of this contains approximately 86 billion neurons, along with a supporting cast of nearly equal size that includes glia, pericytes, and endothelial and epithelial cells. To identify and communicate with the areas that represent thoughts of interest, we need to dig through this delicate mass of tissue to find, listen in on, and stimulate specific neurons.
While interfacing with the brain at this level of precision is a formidable engineering challenge in itself, the next task, decoding meaningful information from the recorded activity, is no trivial matter either. Instead of scribbled crayon or printed type, our brains communicate with the world through a complex language of brief electrical impulses called “spikes.” As you read this paragraph, arrays of photons falling on your retina are encoded into a pattern of spikes distributed across millions of neurons, which your brain interprets as an expansive visual field containing a screen filled with lines of text. As you scroll down the page, your intended motions are encoded into more spiking patterns that drive the contraction and relaxation of muscles in your hand.
These two problems, building the hardware to send and receive neural messages and writing the software to translate messages into and out of spiking patterns, go hand in hand. The more neurons we can record from and stimulate, the better our decoding and encoding algorithms will get. While we have made significant progress in technologies for recording from and stimulating the brain, we are still far from the level of sophistication needed to build something like the NerveGear.
Over the past few decades, however, neuroscience, electrical engineering, and computer science have advanced enough to let us glimpse the ebb and flow of electrochemical signals within the brain on scales of milliseconds and millimeters. In sterile rooms and white coats, neuroscientists fix miniature microscopes over circular holes drilled into the skulls of sleeping mice. As the animals stir and wake, powerful lenses watch intently for transient flashes of neural activity within the gray matter, streaming the information up through a ribbon-like array of wires to recording devices above. Then, high-powered computational analyses distill the terabytes of measurements into hypotheses of How the Brain Works.
Suppose that we were able to record from every single neuron in the human brain and capture every action potential and subthreshold twitch through a live feed. With every millimeter of the brain laid bare before us, we would surely be able to discover and interpret patterns of neural activity associated with the movement of an arm, the colors of a painting, the sound of a violin. But that’s easy to say when we’re not the ones doing the heavy lifting. Even the most powerful computing clusters in the world would sweat at the idea of analyzing the massive amount of data coming from 86 billion neurons, every millisecond, in every subject. Luckily, we might not need all that information for our purposes. Because each neuron sends signals to and receives signals from many other neurons, the activity of one neuron tends to reflect the activity around it and can therefore paint a partial picture of what’s going on nearby.
If we think of brain activity as a movie, the activity of individual neurons would be the pixels, and decoded thoughts the plot line. When you sit down on a couch with a mixing bowl full of microwaved popcorn, you don’t need 8K Ultra HD resolution to understand what’s going on. For most casual viewers on a Saturday night, 1080p would be just fine. You could probably even get all the main plot points from a laggy 240p stream, albeit at the risk of high blood pressure and insanity. Why is this the case? When we take a frame from a movie, we know that many pixels in the image are related to each other. If the main character has a predilection for dark suits, a large number of pixels on the screen will often be black and move as a single group through consecutive frames. In other words, the motion of many pixels on the screen is highly correlated, and we can infer information about all of them, like position and movement, even if we can only see a few. Similarly, many neurons are locally interconnected by synapses, through which the activation of one neuron can quickly spread to other, nearby neurons. Even if we can only get a low-quality stream with a few neurons here and there, we can use this small sample to infer what’s going on with the rest of the neurons and construct a good summary of what the brain is doing at a given moment.
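To make the movie analogy a little more concrete, here is a minimal Python sketch under strong, hypothetical assumptions: a toy population of 1,000 simulated neurons whose activity is driven by just three shared signals, with each neuron’s sensitivity to those signals (its “loadings”) already known. In real recordings those loadings would have to be estimated from data, which is much harder. The sketch “records” 20 neurons, estimates the shared signals from them by least squares, and uses those estimates to predict the 980 neurons it never recorded.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy population: 1,000 "neurons" driven by 3 shared latent signals
n_neurons, n_latents, n_timepoints = 1000, 3, 500
t = np.linspace(0, 10, n_timepoints)
latents = np.stack([np.sin(t), np.cos(0.5 * t), np.sin(2 * t)])  # shape (3, T)
# Assumption: each neuron's sensitivity to each latent signal is known in advance
loadings = rng.normal(size=(n_neurons, n_latents))
activity = loadings @ latents + 0.1 * rng.normal(size=(n_neurons, n_timepoints))

# "Record" only a small random subset of neurons
observed = rng.choice(n_neurons, size=20, replace=False)
hidden = np.setdiff1d(np.arange(n_neurons), observed)

# Estimate the shared signals from the recorded neurons by least squares
latents_hat, *_ = np.linalg.lstsq(loadings[observed], activity[observed], rcond=None)

# Predict every unrecorded neuron from the estimated shared signals
predicted = loadings[hidden] @ latents_hat
relative_error = np.mean((predicted - activity[hidden]) ** 2) / np.mean(activity[hidden] ** 2)
print(f"relative error on unrecorded neurons: {relative_error:.3f}")
```

Because this simulated population has only three underlying signals, twenty neurons are plenty, and the prediction error on the unrecorded neurons stays small relative to their overall activity. The numbers are illustrative and are not drawn from any real recording.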
Now, there are a few important caveats to this idea. Many movies rely on rich attention to detail and subtle plot cues to present a coherent, nuanced work. If we only sample the activity of a few neurons and call it a day, we could very well be tossing out the few pixels containing vital plot devices: Cinderella’s glass slipper, Snow White’s poisoned apple, or the One Ring to rule them all. Then we might think that Prince Charming has a foot fetish, Snow White severe narcolepsy, and the Fellowship of the Ring a weird propensity for dangerous road trips. When applying these techniques to the brain, we would like to sample neural activity at a high enough resolution to avoid embarrassing misinterpretations like these. Less ideally, we have little idea what that resolution might be.
In 2016, a team of theoretical neuroscientists at Carnegie Mellon University and Columbia University, led by Byron Yu and his PhD student Ryan Williamson, attempted to determine just what recording resolution might be required for a good picture of neural activity. They simulated the activity of thousands of neurons, far more than it is currently possible to record at once, and compared the simulations to real neural recordings from macaque monkeys that captured the firing of just tens of neurons. After choosing a model that closely matched the real neural activity, they sampled small groups of neurons from it and found that just tens of neurons and hundreds of trials of an experiment were enough to capture the majority of the “plot line” (Williamson et al. 2016). But this particular study was conducted in just one brain area of macaque monkeys doing the simplest of tasks (staring at a blank gray screen), and the study’s authors dutifully caution us that these results cannot be verified without real recordings from many thousands of neurons instead of synthetic models. Later that same year, the band got back together, this time led by PhD student Benjamin Cowley. They showed their macaques stimuli that were a tad more complex than a blank gray screen and found that neuronal mileage may vary with the complexity of the stimuli (Cowley et al. 2016). In essence, while 80-odd neurons might be enough to capture Caillou in all its low-dimensional, cartoonish glory, they would be overloaded by the rich cinematic detail of a movie like Interstellar.
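The intuition behind these results can be sketched with a toy simulation. The Python below is not the method used in either study; it swaps in a simpler measure, counting how many principal components are needed to explain 90% of the variance, and applies it to a hypothetical population whose activity is driven by 2 shared signals (a “simple” stimulus) or 10 (a “complex” one). The population size, trial count, noise level, and 90% threshold are all made-up assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_population(n_neurons, n_latents, n_trials=500, noise_std=0.3):
    """Hypothetical population: each neuron mixes a few shared latent signals plus private noise."""
    latents = rng.normal(size=(n_latents, n_trials))
    loadings = rng.normal(size=(n_neurons, n_latents))
    return loadings @ latents + noise_std * rng.normal(size=(n_neurons, n_trials))

def dims_for_variance(activity, threshold=0.9):
    """Number of principal components needed to explain `threshold` of the total variance."""
    centered = activity - activity.mean(axis=1, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    explained = np.cumsum(singular_values**2) / np.sum(singular_values**2)
    return int(np.searchsorted(explained, threshold) + 1)

simple = simulate_population(n_neurons=80, n_latents=2)     # "Caillou"
complex_ = simulate_population(n_neurons=80, n_latents=10)  # "Interstellar"

d_simple = dims_for_variance(simple)
d_complex = dims_for_variance(complex_)
d_few = dims_for_variance(complex_[:5])  # "record" only 5 of the 80 neurons
print("simple stimulus, 80 neurons: ", d_simple)
print("complex stimulus, 80 neurons:", d_complex)
print("complex stimulus, 5 neurons: ", d_few)
```

In this toy setup the complex stimulus needs several times more dimensions than the simple one, and recording only five neurons can never reveal more than five dimensions, however rich the underlying activity is.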
Although we can extrapolate from the activity of a few neurons to many more, when it’s time to start translating this neural activity movie into something we can understand, we often need to move in the other direction. This idea of boiling down information from many sources into a summary of the relevant main points is known as dimensionality reduction. Dimensionality reduction techniques span what seems like the entire space of two- to five-letter acronyms, from FA to PCA, VAE to SLDS, GPFA to LFADS, and describing their various assumptions and modeling focuses would leave me with a dwindling supply of metaphors and the ramping fury of my editor. The unifying objective of these methods, however, is to find a way to summarize our data while preserving the features we care about.
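As a minimal example of one acronym on that list, here is principal component analysis (PCA) in a few lines of Python, computed from a singular value decomposition. The data are a hypothetical stand-in: 50 simulated neurons over 300 time points, driven by two shared signals plus noise. PCA boils the 50 traces down to two summary traces while keeping most of the variance.

```python
import numpy as np

def pca(activity, n_components):
    """Summarize a (neurons x time) array with its top principal components."""
    mean = activity.mean(axis=1, keepdims=True)
    centered = activity - mean
    # Columns of u are the principal axes in neuron space, ordered by variance explained
    u, _, _ = np.linalg.svd(centered, full_matrices=False)
    axes = u[:, :n_components]      # (neurons x k)
    scores = axes.T @ centered      # (k x time): the low-dimensional summary
    return axes, scores, mean

# Hypothetical data: 50 neurons driven by 2 shared signals plus private noise
rng = np.random.default_rng(1)
latents = rng.normal(size=(2, 300))
loadings = rng.normal(size=(50, 2))
activity = loadings @ latents + 0.2 * rng.normal(size=(50, 300))

axes, scores, mean = pca(activity, n_components=2)
reconstruction = axes @ scores + mean
variance_kept = 1 - np.sum((activity - reconstruction) ** 2) / np.sum((activity - mean) ** 2)
print(scores.shape, round(variance_kept, 3))
```

Other methods on the list, such as FA or GPFA, make different assumptions about noise and about how activity evolves over time, so this sketch should not be read as a stand-in for them.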
While it seems counterintuitive to throw away the precious brain data that was so difficult to gather, too much information can often be a bad thing. Suppose you are in the market for a dependable used car. Any vehicle for sale will come with a laundry list of features, detailing everything from heated leather seats to shiny new rims. While those things are certainly nice to have, you just want something to get you from point A to point B with minimal maintenance. All the other information, while potentially useful to other customers with different goals, is just noise to you. Not only does this kind of noise take computational time and power to sift through, but it can also distract and mislead our algorithms when the time comes to decide what that neural activity actually means.
And that brings us, finally, to the main goal of a neural decoder: connecting what’s going on inside our heads to what’s going on in the world outside. An intuitive approach to building one might be to Google Scholar “brain part responsible for X,” stick a probe in there, have the stuck person or animal perform an experimental task, claim that certain aspects of said task can be decoded from the probe readout, and call it a day. And that intuition would be almost exactly on the nose, with a little extra work baked in. If someone had stuck a neural probe in my brain during middle school soccer practice to learn what neural activity is associated with kicking a ball, they would have picked up on signals related to pass force and direction, the lyrics to Candy Shop, and a persistent butt itch. On another trial, they might find one or two really crushing comebacks for a playground argument from four days ago mixed in with the carefully calibrated sequence of motor commands. To pull out activity associated only with the kicking motion, we can repeat the trial many times and look for neural signatures that appear consistently with every kick, separating them from (hopefully) less common occurrences of itchy butts and musical quips.
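The simplest version of that repeat-and-compare strategy is trial averaging. The Python sketch below uses a made-up signal model: every simulated trial contains the same kick-locked response, plus a distractor bump (the itch, the song lyric) at a random moment, plus ongoing noise. Averaging many trials lets the consistent response stand out while the randomly timed distractors wash out. The response shape, trial count, and noise levels are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n_time = 200
t = np.arange(n_time)
# Hypothetical response that occurs at the same moment on every kick
kick_template = np.exp(-0.5 * ((t - 100) / 10) ** 2)

def record_trial():
    """One simulated trial: kick response + randomly timed distractor + noise."""
    onset = rng.integers(0, n_time)  # distractor timing varies from trial to trial
    distractor = 1.5 * np.exp(-0.5 * ((t - onset) / 5) ** 2)
    noise = 0.5 * rng.normal(size=n_time)
    return kick_template + distractor + noise

def similarity(trace):
    """Correlation between a recorded trace and the true kick response."""
    return np.corrcoef(trace, kick_template)[0, 1]

single_trial = record_trial()
trial_average = np.mean([record_trial() for _ in range(200)], axis=0)
print(f"single trial: {similarity(single_trial):.2f}")
print(f"200-trial average: {similarity(trial_average):.2f}")
```

Averaging assumes the signal of interest is locked to the same moment on every trial; responses whose timing jitters from kick to kick get smeared out.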
So far, we’ve relied on the highly correlated nature of neural activity to shore up our puny recording abilities when decoding. However, this quality can be a double-edged sword when it comes to the encoding problem, plaguing neuroscientists trying (and often failing) to separate causation from correlation. An example to illustrate the difference: while a rooster’s crow is highly correlated with the rising of the sun, we know that the rooster’s crow does not cause the sun to rise. When we send messages to the brain, we need to make sure we’re targeting the neurons that actually cause a particular behavior or percept, not the neurons that are just along for the ride. This requires carefully designed causal experiments, in which we intervene on specific neurons and measure the consequences rather than merely observing. But the brain is not quite as simple as a mapping from sunshine to crowing. The high density of recurrent and feedback connections between neurons, a common feature of brain circuits, makes it profoundly difficult to draw simple causal arrows from one cell to another. Interventions within one part of the brain may be thwarted by compensatory mechanisms in another region. As with any other modeling question, it will take careful development of new experiments, theories, and techniques to tease out the patterns of activity that are sufficient to project an artificial experience into someone’s mind.
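The rooster problem can be reproduced in a few lines of toy Python. In this hypothetical circuit, a hidden common input (playing the role of the sunrise) drives both neuron A and neuron B, while neuron C genuinely receives input from A. A and B end up strongly correlated, but only an intervention, clamping A to a fixed value and watching what happens downstream, reveals that A drives C and not B. The circuit, weights, and noise levels are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n_samples = 5000

def simulate(clamp_a=None):
    """Toy circuit: a hidden input drives A and B; A (but not B) drives C."""
    hidden_input = rng.normal(size=n_samples)
    a = hidden_input + 0.3 * rng.normal(size=n_samples)
    if clamp_a is not None:
        # Intervention: override A, cutting its tie to the hidden input
        a = np.full(n_samples, clamp_a)
    b = hidden_input + 0.3 * rng.normal(size=n_samples)  # correlated with A, not caused by it
    c = 0.8 * a + 0.3 * rng.normal(size=n_samples)       # genuinely driven by A
    return a, b, c

# Observation alone: A and B look tightly linked
a, b, c = simulate()
corr_ab = np.corrcoef(a, b)[0, 1]

# Intervention: set A low, then high, and measure the downstream changes
_, b_low, c_low = simulate(clamp_a=-2.0)
_, b_high, c_high = simulate(clamp_a=2.0)
effect_on_b = b_high.mean() - b_low.mean()
effect_on_c = c_high.mean() - c_low.mean()
print(f"corr(A, B) = {corr_ab:.2f}")
print(f"effect of clamping A on B = {effect_on_b:.2f}, on C = {effect_on_c:.2f}")
```

Real circuits are far messier: recurrent loops mean that clamping one neuron can ripple back and alter the very inputs you were trying to hold fixed.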
The past few years have seen a mutually reinforcing explosion of experimental and computational progress driving our understanding of the brain. As better neural recording technologies gain deeper access to the brain and pile up huge stores of experimental data, powerful computing resources and sophisticated models allow us to develop better theories of the mind, which in turn inform future data-generating experiments. The “experiment-analysis-theory” cycle, as my former professor and neural data scientist Dr. Liam Paninski calls it, is only accelerating as improvements are made at every stage of this feedback loop.
Biking along Riverside Park back to my apartment, I am struck by the sight of the setting sun glimmering across the gentle swells of the Hudson River, lighting the mirrored windows of the Manhattan skyline with a fiery red glow. It seems impossible to replicate the beauty of the natural world with just a chunky NerveGear helmet and a high-powered graphics engine, no matter how advanced the technology behind it. Then, an overhead seagull drops a hot one right in the middle of my reality-constrained head.
Written by John Zhou
Illustrated by Mary Cooper
Edited by James Cole and Lauren Wagner
Cowley, B. R., Smith, M. A., Kohn, A., & Yu, B. M. (2016). Stimulus-Driven Population Activity Patterns in Macaque Primary Visual Cortex. PLOS Computational Biology, 12(12), e1005185. https://doi.org/10.1371/journal.pcbi.1005185
Williamson, R. C., Cowley, B. R., Litwin-Kumar, A., Doiron, B., Kohn, A., Smith, M. A., & Yu, B. M. (2016). Scaling Properties of Dimensionality Reduction for Neural Populations and Network Models. PLOS Computational Biology, 12(12), e1005141. https://doi.org/10.1371/journal.pcbi.1005141