How Our Visual Neurons Relate to Deep Neural Networks

Our brain has been evolving for millions of years, constantly changing and adjusting to handle novel stimuli and conditions, even though it looks like a bag of slimy, gooey matter folded in various ways inside our head. This organ and the billions of neurons that it is composed of are our best bet for understanding what intelligence is and how it emerges. But another good bet is to create (artificial) intelligence ourselves and then explain our own creation.

In recent years, we’ve come one step closer to creating intelligent machines. Specifically in terms of vision, we now have systems that can accurately classify images from thousands of categories. This is thanks to convolutional neural networks (CNNs) and the efficiency of the graphics processing units (GPUs) used to train them. Given enough images to train on, CNNs can learn to adjust their artificial neurons’ connection strengths (“weights”) in such a way that, after training, they are able to classify images that they have never seen before.

The role of a CNN is to receive an input image and return a probability for each of the classes that it is trained to recognize. A term that comes up often when talking about CNNs is “features”, which are the outputs of the model’s neurons. CNNs work because each neuron in the model is assigned a mathematical operation to perform based on which layer it belongs to. A neuron can, for example, belong to an early, middle, or late layer in the processing stream. This operation involves the weights, or connection strengths, of the neuron and the small window it can “see”. You can also think of a neuron as a pattern detector, checking whether the area that it’s looking at contains something similar to the pattern of its weights and outputting a “score” for this similarity.
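To make the “pattern detector” idea concrete, here is a minimal sketch in plain Python. It is only an illustration: the tiny image, the weights, and the function names (`neuron_score`, `slide`) are made up for this example, and a real CNN layer has many such neurons with learned weights.

```python
# Toy illustration: one artificial "neuron" acting as a pattern detector.
# The image, the weights, and the function names are hypothetical.

def neuron_score(window, weights):
    """Weighted sum of the pixels in a window, followed by a ReLU."""
    total = sum(p * w
                for row_p, row_w in zip(window, weights)
                for p, w in zip(row_p, row_w))
    return max(0.0, float(total))  # ReLU: negative scores are clipped to zero


def slide(image, weights):
    """Apply the same neuron at every position where its window fits."""
    k = len(weights)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    return [[neuron_score([r[j:j + k] for r in image[i:i + k]], weights)
             for j in range(cols)]
            for i in range(rows)]


# A 4x4 image with a vertical edge: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Weights that "like" a dark-to-bright vertical edge inside a 2x2 window.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = slide(image, vertical_edge)
for row in feature_map:
    print(row)  # each row is [0.0, 2.0, 0.0]: the score peaks on the edge
```

The resulting grid of scores is what the text calls “features”: the neuron responds only where its window sits on the edge and stays silent over the uniform regions.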

Example CNN architecture: how a typical CNN looks (courtesy of Wikimedia user Aphex34, source).

Interestingly, CNNs have been loosely inspired by our brain’s visual system. Visual information enters our brain through our eyes, travels through neurons in brain regions, and is consumed in stages of increasing complexity, starting from simple visual representations such as edges, lines, and curves and continuing to more complex representations such as faces or full bodies. In fact, this deconstruction of information in stages of increasing complexity is what CNNs also seem to replicate. The concept is not foreign to older computational approaches to vision. It is also seen in the older and biologically inspired HMAX model, which tries to explain visual processing in the cortex as a “hierarchy of increasingly sophisticated representations”.

How can we create a model for the visual system’s neurons using artificial neurons in CNN layers? A “model” is a mathematical function that, given the same input as a biological neuron, can produce a similar output. In order to create it, one first needs to understand and define the input.

Let’s take a look at a study performed by scientists from the Neurophysiology Group of KU Leuven. This study looked specifically at the inferotemporal cortex, known to be one of the “late” processing stages of our visual system (ventral stream), and, more specifically, at a patch of cells that respond more strongly to images that contain a body than to images of other categories (e.g. faces). These cells get excited when a body is present in the image, even if it’s headless!

Example stimulus (without the green frame, source).
Example stimulus #2

Biological neurons respond to images like this. Scientists can then collect neuronal responses through electrophysiology experiments. In such experiments, a monkey is shown images on a monitor, and an electrode inserted in the monkey’s brain records the activity or response of neurons to the presented images.

An example neuron was presented with hundreds of images of black shapes on a white background. Some of the shapes were random, some resembled animal or human bodies, and others resembled silhouettes of other categories. The study’s goal? To find out whether artificial neurons from CNN layers can be used to predict the actual biological response of that neuron to silhouette images.

In the study, the CNNs were “pre-trained”, meaning that they had already been taught to classify images using a very large dataset. To get each layer’s activations, the scientists fed an input image to the pre-trained network and recorded the output of each layer. The mathematical modeling task in the study measured the relationship between the biological neuron’s response and the corresponding values of the artificial neurons’ activations. This task is what statisticians call a “regression analysis”, where one tries to predict the value of one variable (the biological neuron’s response) given the values of other variables (the artificial neurons’ activations).
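The idea of “taking the output of each layer” can be sketched with a toy network in plain Python. Everything here is an assumption made for illustration: the two small fully connected layers in `LAYERS`, their fixed weights, and the helper names stand in for a real pre-trained CNN, which has convolutional layers and millions of learned weights.

```python
# Toy stand-in for a pre-trained network: two fully connected layers with
# fixed (already "trained") weights. All numbers and names are hypothetical.

def relu(values):
    """Clip negative values to zero."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """One layer: each output unit is a weighted sum of all inputs plus a bias."""
    return [sum(x * w for x, w in zip(inputs, unit_w)) + b
            for unit_w, b in zip(weights, biases)]


# (layer name, one weight row per unit, one bias per unit)
LAYERS = [
    ("layer1", [[1.0, -1.0, 0.0], [0.5, 0.5, 0.5]], [0.0, -0.5]),
    ("layer2", [[1.0, 1.0]], [0.0]),
]


def layer_activations(image_pixels):
    """Run the image through the network and keep the output of every layer."""
    activations = {}
    current = image_pixels
    for name, weights, biases in LAYERS:
        current = relu(dense(current, weights, biases))
        activations[name] = current
    return activations


acts = layer_activations([1.0, 0.0, 1.0])
print(acts)  # {'layer1': [1.0, 0.5], 'layer2': [1.5]}
```

Repeating this for every image gives, for each layer, one row of activations per image. Those rows are the predictor values that a regression can relate to the biological neuron’s responses.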

Every regression analysis involved two numerical matrices: the X matrix of predictor variables and the Y matrix of the target variable(s). In the study, the scientists built a separate regression setup for each CNN layer. With these setups, they aimed to stand in for a monkey’s biological neuron with all the artificial neurons present in a CNN layer (X) and see how well those could reproduce its spiking response (Y).
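As a rough sketch of what one such regression setup looks like, the plain-Python example below fits ordinary least squares to made-up numbers. It is not the study’s actual pipeline: the data, the two-unit “layer”, and the helper names `fit_linear` and `predict` are assumptions for illustration. A real CNN layer has far more units than there are images, so some form of regularization or dimensionality reduction would typically be needed, and in practice one would use a numerical library rather than hand-written elimination.

```python
# Minimal least-squares sketch: predict one neuron's response (Y) from the
# activations of a CNN layer (X). All data and names are hypothetical.

def fit_linear(X, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    rows = [[1.0] + list(r) for r in X]  # prepend a constant column
    n = len(rows[0])
    # Build A = X^T X and b = X^T y.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(n)]
    # Solve A w = b by Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for k in range(col + 1, n):
            f = A[k][col] / A[col][col]
            A[k] = [a - f * c for a, c in zip(A[k], A[col])]
            b[k] -= f * b[col]
    # Back-substitution.
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, n))) / A[i][i]
    return w  # [intercept, weight_1, ..., weight_k]


def predict(X, w):
    """Predicted response for each row of activations."""
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], r)) for r in X]


# X: one row per image, one column per artificial neuron in the layer.
layer_X = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0], [0.0, 2.0]]
# Y: the biological neuron's response to the same five images (made-up values).
neuron_Y = [4.0, 5.0, 6.0, 8.0, 5.0]

weights = fit_linear(layer_X, neuron_Y)
predicted = predict(layer_X, weights)
print([round(v, 6) for v in weights])
```

In the setup described above, the same fit would be repeated once per CNN layer, each time with that layer’s activations as X and the same neuronal responses as Y.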

Ultimately, this study found that deeper CNN layers were strong models for body patch neurons, which also sit deeper in our brain’s visual processing stages. Although the networks were pre-trained on natural images, this did not hinder the results of the study, which used silhouette images. Thus, deep CNN layers could receive an image as input and produce a response very close to what an actual neuron would produce! More importantly, the best model from one of those deep layers could explain on average up to 80% of the variability in the responses of real neurons to silhouettes.
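One common way to put a number on “variability explained” is the coefficient of determination, R². The sketch below computes it in plain Python on made-up numbers. The function name `explained_variance` and the data are assumptions for illustration, and the study’s exact metric may differ from this definition.

```python
# Fraction of the variability in observed responses captured by a model.
# The responses and predictions below are invented for illustration.

def explained_variance(observed, predicted):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((o - mean_obs) ** 2 for o in observed)
    ss_resid = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_resid / ss_total


observed = [2.0, 4.0, 6.0, 8.0]    # hypothetical neuronal responses
predicted = [3.0, 4.0, 6.0, 7.0]   # hypothetical model predictions

r2 = explained_variance(observed, predicted)
print(round(r2, 3))  # 0.9, i.e. 90% of the variability is explained
```

A value of 1.0 would mean the model reproduces every response exactly, while a value near 0 would mean it does no better than always predicting the average response.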

Indeed, ensembles of artificial neurons performing mathematical operations come very close to matching how our visual neurons work! There’s something satisfying about the fact that CNNs engineered by humans are so similar to something that resulted from millions of years of evolution: the biological neuron. Our ability to simulate the brain using artificial neural networks has never been stronger. The key to reverse engineering the brain may be a strange marriage of AI and neuroscience.

NOTE: Ioannis Kalfas is the first author of the eNeuro publication here. All data presented here (example images) belong to KU Leuven. For more details on the subject, you can refer to the eNeuro journal or KU Leuven.

Illustration of deep neural networks by Michal Roessler.

Ioannis Kalfas

Ioannis Kalfas is a former PhD researcher in the Department of Neuroscience at KU Leuven. His research focused on primate vision, trying to explain the underlying mechanisms of shape understanding in the brain. His project involved the study of artificial neural networks, such as deep convolutional neural networks, and their representational similarities to biological neurons from regions of the brain that are responsible for object recognition. He received his MSc in Machine Learning from KTH University in Stockholm. Currently, after one year of working as a data scientist in industry, he is back at KU Leuven performing research on insect identification in fields using biosensors.