A team of Italian mathematicians, including one who is also a neuroscientist from the Champalimaud Centre for the Unknown (CCU), in Lisbon, Portugal, has shown that artificial vision machines can learn to recognize complex images spectacularly faster by using a mathematical theory that was developed 25 years ago by one of this new study’s co-authors. Their results have been published in the journal Nature Machine Intelligence.
Over the last few decades, machine vision performance has exploded.
For example, these artificial systems can now learn to recognize virtually any human face – or to identify any individual fish moving in a tank, in the midst of a large number of other almost identical fish which are also moving.
The machines we’re talking about are, in fact, electronic models of networks of biological neurons, and their aim is to simulate the functioning of our brain, which is as good as it gets at performing these visual tasks – and this, without any conscious effort on our part.
But how do these neural networks actually learn?
In the case of face recognition, for instance, they do it by acquiring experience about what human faces look like in the form of a series of portraits.
More specifically, after being digitized into a matrix of pixel values (think about your computer monitor’s RGB system), each image is “crunched” inside the neural network, which then manages to extract general, meaningful features, from the set of sample faces (such as the eyes, mouth, nose, etc).
This learning (deep learning, in its more modern development) then enables the machine to spit out another set of values, which will in turn enable it, for instance, to identify a face it has never seen before in a databank of faces (much like a fingerprint database), and therefore to predict who that face belongs to with great accuracy.
The story of Clever Hans
Before the neural network can begin to perform this well, though, it is typically necessary to present it with thousands of faces (i.e. matrices of numbers).
Moreover, much as these machines have been increasingly successful at pattern recognition, the fact is that nobody really knows what goes on inside them as they learn their task.
They are, basically, black boxes.
You feed them something, they spit out something, and if you designed your electronic circuits properly… you’ll get the correct answer.
What this means is that it is not possible to determine which or how many features the machine is actually extracting from the initial data – and not even how many of those features are really meaningful for face recognition.
“To illustrate this, consider the paradigm of the wise horse”, says first author of the study Mattia Bergomi, who works in the Systems Neuroscience Lab at the CCU.
The story dates from the early years of the 20th century.
It’s about a horse in Germany called Clever Hans that, so his master claimed, had learned to do arithmetic and announce the result of additions, subtractions, etc. by tapping one of its front hooves on the ground the right number of times.
Everyone who witnessed the horse’s performance was convinced he could count (the event was even reported by the New York Times). But then, in 1907, a German psychologist showed that the horse was in fact picking up unconscious cues in his master’s body language that were telling it when to stop tapping…
“It’s the same with machine learning; there is no control over how it works or what it has learned during training”, Bergomi explains.
Having no a priori knowledge of faces, the machine just somehow does its stuff – and it works.
This led the researchers to ask:
could there be a way to inject some knowledge of the real world (about faces or other objects) into the neural network, before training, so that it explores a more limited space of possible features instead of considering them all – including those that are impossible in the real world?
“We wanted to control the space of learned features”, Bergomi points out.
“It’s similar to the difference between a mediocre chess player and an expert: the first sees all possible moves, while the latter only sees the good ones”, he adds.
Another way of putting it, he says, is by saying that “our study addresses the following simple question: When we train a deep neural network to distinguish road signs, how can we tell the network that its job will be much easier if it only has to care about simple geometrical shapes such as circles and triangles?”.
The scientists reasoned that this approach would substantially reduce training time – and, no less importantly, give them a “whiff” of what the machine might be doing to obtain its results.
“Allowing humans to drive the learning process of learning machines is fundamental to move towards a more intelligible artificial intelligence and reduce the skyrocketing cost in time and resources that current neural networks require in order to be trained”, he remarks.
What’s in a shape?
Here’s where a very abstract and novel mathematical theory, called “topological data analysis” (TDA), enters the stage.
The first steps in the development of TDA were taken in 1992 by the Italian mathematician Patrizio Frosini, co-author of the new study and currently at the University of Bologna.
“Topology is one of the purest forms of math”, says Bergomi.
“And until recently, people thought that Topology would not be applied to anything concrete for a long time.
Until TDA became famous in the last few years.”
Topology is a sort of extended geometry that, instead of measuring lines and angles in rigid shapes (such as triangles, squares, cones, etc.), seeks to classify highly complex objects according to their shape.
For a topologist, for example, a donut and a mug are the same object: one can be deformed into the other by stretching or compression.
Now, the thing is, current neural networks are not good at topology.
For instance, they do not recognize rotated objects.
To them, the same object will look completely different every time it is rotated.
That is precisely why the only solution is to make these networks “memorise” each configuration separately – by the thousands.
And it is precisely what the authors were planning to avoid by using TDA.
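The rotation problem can be made concrete with a toy sketch. It is not a neural network and not the authors' method: the 5x5 image, the name `ring_sums` and the choice of descriptor are all assumptions made for illustration. Compared pixel by pixel, a shape and its rotated copy look different, while a descriptor built to ignore 90-degree rotations gives the same answer for both.

```python
import numpy as np

# Toy 5x5 "image": a bar along the top and one down the right side, like a 7.
image = np.zeros((5, 5))
image[0, :] = 1.0
image[:, 4] = 1.0
rotated = np.rot90(image)  # the same shape, turned by 90 degrees

# Compared pixel by pixel, the two arrays disagree in several places.
pixel_mismatch = int(np.abs(image - rotated).sum())


def ring_sums(img):
    """Sum pixel values over concentric square rings around the centre.

    Assumes a square image with an odd side length. A 90-degree rotation
    only moves pixels around within their ring, so this descriptor
    (chosen purely for illustration) does not change.
    """
    centre = img.shape[0] // 2
    rows, cols = np.indices(img.shape)
    ring = np.maximum(np.abs(rows - centre), np.abs(cols - centre))
    return np.bincount(ring.ravel(), weights=img.ravel(), minlength=centre + 1)


print("mismatching pixels:", pixel_mismatch)
print("ring sums equal:", np.array_equal(ring_sums(image), ring_sums(rotated)))
```

A network that only ever sees raw pixels has to be shown both versions; a descriptor with the symmetry built in does not.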
Think of TDA as being a mathematical tool for finding meaningful internal structure (topological features), in any complex “object” that can be represented as a huge set of numbers, by looking at the data through certain well-chosen “lenses” or filters.
The data itself can be about faces, financial transactions or cancer survival rates.
For faces in particular, by applying TDA, it becomes possible to teach a neural network to recognize faces without having to present it with each of the different orientations faces might assume in space.
The machine will now recognize all faces as being a face, even in different rotated positions.
It’s a 5! No, it’s a 7!
In their study, the scientists tested the benefits of combining machine learning and TDA by teaching a neural network to recognise hand-written digits.
The results speak for themselves.
As these networks are bad topologists and handwriting can be very ambiguous, two different hand-written digits may prove indistinguishable for current machines – and conversely, two instances of the same hand-written digit may be seen by them as different.
That is why, to be performed by today’s vision machines, this task requires presenting the network, which knows nothing about digits in the world, with thousands of images of each of the 10 digits, written with all sorts of slants, calligraphies, etc.
The new approach allows artificial intelligence to learn to recognize transformed images much faster. The image is credited to Diogo Matias.
To inject knowledge about digits, the team built a set of a priori features that they considered meaningful (in other words, a set of “lenses” through which the network would “see” the digits), and forced the machine to choose among these lenses to look at the images.
And what happened was that the number of images (that is, the time) needed for the TDA-enhanced neural network to learn to distinguish 5’s from 7’s, however badly written, while maintaining its predictive power, dropped to fewer than 50!
“What we mathematically describe in our study is how to enforce certain symmetries, and this provides a strategy to build machine learning agents that are able to learn salient features from a few examples, by taking advantage of the knowledge injected as constraints”, says Bergomi.
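The title of the study speaks of “group equivariant non-expansive operators”. As a rough sketch of what those two properties mean, and not the construction used in the paper, the snippet below takes a 3x3 mean filter with wrap-around borders as the operator and 90-degree rotations as the symmetry group; both choices, and the name `local_mean`, are assumptions made for illustration. Equivariant means that rotating and then filtering gives the same result as filtering and then rotating. Non-expansive means that the filter never pushes two images further apart than they already were.

```python
import numpy as np


def local_mean(img):
    """3x3 mean filter with wrap-around borders (a toy operator)."""
    total = np.zeros_like(img, dtype=float)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            total += np.roll(img, shift=(di, dj), axis=(0, 1))
    return total / 9.0


rng = np.random.default_rng(0)
x = rng.random((8, 8))
y = rng.random((8, 8))

# Equivariance: rotate-then-filter equals filter-then-rotate.
equivariant = np.allclose(local_mean(np.rot90(x)), np.rot90(local_mean(x)))

# Non-expansivity in the max norm: outputs are no further apart than inputs.
gap_in = np.abs(x - y).max()
gap_out = np.abs(local_mean(x) - local_mean(y)).max()
non_expansive = gap_out <= gap_in + 1e-12

print("equivariant:", equivariant)
print("non-expansive:", non_expansive)
```

Restricting a network to operators of this kind is one way to read “knowledge injected as constraints”: the symmetry is guaranteed by construction instead of being learned from thousands of examples.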
Does this mean that the inner workings of learning machines which mimic the brain will become more transparent in the future, enabling new insights on the inner workings of the brain itself?
In any case, this is one of Bergomi’s goals.
“The intelligibility of artificial intelligence is necessary for its interaction and integration with biological intelligence”, he says.
He is currently working, in collaboration with his colleague Pietro Vertechi, also from the Systems Neuroscience Lab at CCU, on developing a new kind of neural network architecture that will allow humans to swiftly inject high-level knowledge into these networks to control and speed up their training.
Topological Data Analysis (TDA) has been successfully applied to a range of problems in recent years: processing and segmenting digital images, gaining insights into patterns formed by biological systems such as flocks of birds or herds of buffaloes, positioning sensor networks, or simply detecting, from a set of discrete data-points, the underlying shape of the object they reside on.
Like machine learning, TDA belongs to a category of mathematical tools that aim to determine mathematical associations or patterns in data from complex systems, without claiming to understand their inner mechanisms.
These tools don’t need to understand the biology of birds to provide an interpretation of how they group together in a flock over time.
This is a bold approach; however, the age of big data has made it a viable, and sometimes preferred, one.
The difference between TDA and general ML is that TDA is specifically concerned with analyzing patterns or properties pertinent to the shape of the data.
What is topology?
Very simplistically, topology is the study of the shape of objects under continuous deformations such as stretching and twisting. Topology is not concerned with the exact geometric description of an object: it does not bother itself with how many edges an object has, or whether an object is round or oval. Instead, it tries to describe an object in more general terms: how many distinct components it has, how many holes, how many cavities. As an example, look at the two familiar objects below, a doughnut and a coffee cup. They are very different in their exact geometrical shape, but perform a few transformations on their mouldable clay models and your doughnut becomes a coffee cup. In that sense the two are equivalent: each is one connected object with a single hole (the doughnut’s hole, the cup’s handle). Topologically, they are the same object.
At the cost of being more formal, the topology of an object is described by a set of numbers called the Betti numbers, each number β(k) describing the number of k-dimensional holes the object contains: β(0) is the number of connected components, β(1) is the number of 1-dimensional holes, β(2) is the number of 2-dimensional holes, and so on.
“Mathematically, the Betti numbers of the above donut/coffee mug/torus are: β(0) = 1, β(1) = 2, β(2) = 1.”
For now, we don’t need to get into the details of these Betti numbers. Just let it sink in that β(0) counts the connected components of the object, β(1) counts its loops (holes you could thread a string through), and β(2) counts its enclosed cavities. (The values quoted above treat the doughnut as a hollow surface, a torus, which has two independent loops and one enclosed cavity.) We will discuss this in detail over the course of these articles.
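For readers who want to see the numbers come out of a computation, here is a minimal sketch, assuming the standard linear-algebra recipe β(k) = (number of k-simplices) − rank ∂(k) − rank ∂(k+1) with ranks taken over the real numbers. The function name `betti_numbers` is made up for this example, and the torus is described by the well-known 7-vertex triangulation (triangles {i, i+1, i+3} and {i, i+2, i+3}, indices mod 7); neither comes from the original text.

```python
from itertools import combinations

import numpy as np


def betti_numbers(top_simplices):
    """Betti numbers of the complex spanned by the given simplices.

    Each simplex is a tuple of vertex labels; all of its faces are
    added automatically. Ranks are computed over the reals.
    """
    simplices = set()
    for simplex in top_simplices:
        simplex = tuple(sorted(simplex))
        for size in range(1, len(simplex) + 1):
            simplices.update(combinations(simplex, size))
    max_dim = max(len(s) for s in simplices) - 1
    by_dim = [sorted(s for s in simplices if len(s) == d + 1)
              for d in range(max_dim + 1)]

    def boundary_rank(d):
        # Rank of the boundary map from d-simplices to (d-1)-simplices.
        if d == 0 or d > max_dim:
            return 0
        row_of = {face: i for i, face in enumerate(by_dim[d - 1])}
        matrix = np.zeros((len(row_of), len(by_dim[d])))
        for col, simplex in enumerate(by_dim[d]):
            for k in range(len(simplex)):
                face = simplex[:k] + simplex[k + 1:]
                matrix[row_of[face], col] = (-1) ** k
        return int(np.linalg.matrix_rank(matrix))

    return [len(by_dim[d]) - boundary_rank(d) - boundary_rank(d + 1)
            for d in range(max_dim + 1)]


hollow_triangle = [(0, 1), (1, 2), (0, 2)]   # a loop
filled_triangle = [(0, 1, 2)]                # the loop is filled in
torus = ([(i, (i + 1) % 7, (i + 3) % 7) for i in range(7)]
         + [(i, (i + 2) % 7, (i + 3) % 7) for i in range(7)])

print(betti_numbers(hollow_triangle))
print(betti_numbers(filled_triangle))
print(betti_numbers(torus))
```

The hollow triangle has one loop and the filled one has none, which is the smallest possible illustration of what a “1-dimensional hole” is.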
But then what is the topology of real-world data?
It is easy to define a topology on a well-defined surface. But what do you do with discrete data-points observed from a real-world complex system? How do you define a topology for a flock of birds or a digital image? The solution is Topological Data Analysis, using a tool called Persistent Homology. Persistent Homology builds relationships between data-points by connecting them together through some well-defined rules. We’ll see what those rules are soon, but for now assume that we have defined connections between data in a particular fashion. What are these connections? Are they simple line segments (1-dimensional objects) between points, so that we have generated a network from our 2-dimensional data? But then, one would argue, why stop at 0- and 1-dimensional connections? Why not define triangles, tetrahedrons and higher-dimensional connections?
A simplicial complex is a structure generated from such generalized connections between data-points. Figure 4 below shows a 2D projection of the Lorenz attractor in its full majesty alongside its discrete point data, and Figure 5 shows a simplicial complex generated from that data. The simplicial complex in the figure is generated from triangles, lines and points.
Figure 4: A 2D projection of the Lorenz system trajectory on the left and its point cloud data on the right
We can easily spot the commonality between the full dynamics in Figure 4 and the simplicial complex in Figure 5 — the presence of the two dominant holes. So what does this mean for us?
Essentially a simplicial complex is able to reconstruct the topological structure of an object from discrete data. And so, generally speaking,
“Determining a simplicial complex from point cloud data, if done right, can recover the same number of holes (in each dimension) as the original object”
“A simplicial complex built on discrete point cloud data can reconstruct the topology of the underlying object”
This is pretty powerful, because now we can use this tool to define a topology for systems that inherently don’t possess shape, and so, attempt to explain them better.
By defining a simplicial complex over the arrangement of a flock in the sky, we can make claims about the topology of this arrangement and associate it with the state of the underlying system. This is where we come back full circle: studying a system based on its geometric/topological manifestation and circumventing a detailed, complex analysis that requires domain knowledge.
There’s a potential challenge in this rosy picture we just presented. The simplicial complex and the associated connections are based on certain rules. How do you decide which two points should be joined by a line, and which shouldn’t? Which points should be connected together by a triangle, instead of pairwise segments? We won’t go through these rules in detail, but the crucial point is that a simplicial complex like the one shown in the above figure is valid only for a specific range of distances. Its structure changes as you change the distance. The figure below shows how the complex changes as you change a spatial scale parameter, Ɛ.
For a small Ɛ, we have a disconnected skeleton with hardly any points connected, whereas for a large Ɛ, the connections are so dense that the two holes of the Lorenz attractor are filled up. Neither of these situations represents the attractor; it is only within a certain range of the spatial scale parameter that the simplicial complex closely resembles the original attractor.
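The effect of the scale parameter can be reproduced on a much smaller example. The sketch below is an illustration under stated assumptions, not the construction behind the figures: it uses one common rule, the Vietoris-Rips complex (join two points when their distance is at most Ɛ, and fill in a triangle whenever all three of its edges are present), and it replaces the Lorenz data with 12 points on a circle, which has one hole instead of two. The name `rips_betti` is made up for this example.

```python
from itertools import combinations

import numpy as np


def rips_betti(points, eps):
    """Return (beta0, beta1) of the Vietoris-Rips complex at scale eps."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    edges = [(i, j) for i, j in combinations(range(n), 2) if dist[i, j] <= eps]
    edge_index = {edge: k for k, edge in enumerate(edges)}
    triangles = [t for t in combinations(range(n), 3)
                 if all(pair in edge_index for pair in combinations(t, 2))]

    # Boundary matrix: edges -> vertices.
    d1 = np.zeros((n, len(edges)))
    for col, (i, j) in enumerate(edges):
        d1[i, col], d1[j, col] = -1.0, 1.0

    # Boundary matrix: triangles -> edges.
    d2 = np.zeros((len(edges), len(triangles)))
    for col, (a, b, c) in enumerate(triangles):
        d2[edge_index[(b, c)], col] = 1.0
        d2[edge_index[(a, c)], col] = -1.0
        d2[edge_index[(a, b)], col] = 1.0

    def rank(matrix):
        return int(np.linalg.matrix_rank(matrix)) if matrix.size else 0

    r1, r2 = rank(d1), rank(d2)
    return n - r1, len(edges) - r1 - r2


angles = np.linspace(0.0, 2.0 * np.pi, 12, endpoint=False)
points = np.column_stack([np.cos(angles), np.sin(angles)])

for eps in (0.3, 0.6, 2.1):
    beta0, beta1 = rips_betti(points, eps)
    print(f"eps={eps}: components={beta0}, loops={beta1}")
```

At the smallest scale nothing is connected, at the middle scale the points form a single ring with one loop, and at the largest scale everything is joined to everything and the loop is filled in. Only the middle range reflects the circle the points were sampled from.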
Of course, there are ways to handle that, and in future articles we will discuss filtration techniques and complexes (the rules that define the connections), and barcodes/persistence diagrams (rules that define which persistent features are important).
TDA and ML
TDA provides a new approach to understanding patterns in your data that are associated with its shape. It’s hard to decide whether machine learning would do a better job of figuring out these patterns. One advantage that TDA definitely offers is interpretability, something that ML currently falls short on. TDA defines formalized methods and tools to explicitly define shape, whereas ML is restricted to making possible inferences between data and shape definitions. That is not to say that they cannot work together. Properties extracted by TDA can serve as effective features for ML algorithms, whereas the interpretation of algorithms can validate the claims by TDA about system behavior. But more on this later.
Topological Data Analysis (or TDA) is an exciting new tool that is being rapidly applied to a variety of complex systems by investigating their shape. Through the course of my yammering, I hope to have impressed on you that TDA doesn’t make any assumptions about the mechanism of the system while analyzing its shape.
There’s a lot of stuff I have skipped and would love to go into the details of:
- What are the rules for creating simplicial complexes?
- What is a “hole” in a simplicial complex?
- What tools do I use for doing TDA? Do I have to write the code for finding k-dimensional holes in my data from scratch?
- What is the mathematics underlying this fascinating set of tools? Can I develop an intuition without going through Introduction to Topology?
Through a series of articles, I will cover these topics in detail. I am an aspiring Computer Scientist who aims to make complex systems his bread and butter for life. As I am learning the tools and mathematics and applying them in my research, I thought it would be a good idea to share them with the community as I go along. As a CS student trying to understand mathematics, I will attempt to fill these articles with as much intuition and hands-on experience as possible.
I hope you find the information useful. Until next time — where I discuss the intuition behind simplicial complexes and Betti numbers!
References and Further Coding/Reading/Watching
- Chad Topaz’s video on applying TDA to biological aggregation data (one of my favorites): https://www.youtube.com/watch?v=mkvdUtZ79jk
- Chad Topaz’s tutorial on Homology: https://drive.google.com/file/d/0B3Www1z6Tm8xblBVRFBpeWpfREk/view
- Persistent Homology of Embedded Time-Series Data (PHETS): https://github.com/lizbradley/PHETS
- A Mathematician’s perspective on Topological Data Analysis: https://rviews.rstudio.com/2018/11/14/a-mathematician-s-perspective-on-topological-data-analysis-and-r/
Champalimaud Center for the Unknown
Maria Joao Soares – Champalimaud Center for the Unknown
Original Research: Closed access
“Towards a topological–geometrical theory of group equivariant non-expansive operators for data analysis and machine learning”. Mattia G. Bergomi, Patrizio Frosini, Daniela Giorgi & Nicola Quercioli.
Nature Machine Intelligence. doi:10.1038/s42256-019-0087-3