May 27, 2011

The Dark Side of Scientists: A Tale of Complex Networks

I'm not citing names: first, because they would be meaningless to most people; second, because I'm not stupid and my future job may one day depend, in one way or another, on those involved. Besides, it wouldn't be elegant or very professional. The tale I'm going to tell is very funny anyway and clarifies a bit of human nature.

This nice paper, which I wrote with Joerg Reichardt and David Saad, has just been accepted for publication by PLoS ONE. Joerg, by the way, deserves all the merit as the computer expert behind some of the best tricks in the algorithm which, as I'm going to explain, is the main point of the paper.

The paper is this:

The interplay of microscopic and mesoscopic structure in complex networks, J. Reichardt, R. Alamino, D. Saad - arXiv:1012.4524v1 [cond-mat.stat-mech]

The link above is to the preprint on the arXiv; the final version is a bit different in the end, but not very far from it. It's not traditional physics: it's a very interdisciplinary paper mixing ideas from physics and information theory, with applications to sociology and biology. Okay... It's fundamentally a paper on Bayesian inference and, being a mathematics paper, it's naturally interdisciplinary. But it's always nice to use that word. :)

Let me talk a bit about the paper. This may be a long talk, so if you are more interested in the tale itself, I suggest you scroll down. I will start by explaining what we mean by a complex network. A complex network is basically a bunch of things, which may be equal or different, interacting among themselves in any kind of way. Looks like everything is a complex network, right? Well, it's very much like that. But it is much easier to visualize with a picture, so I'm putting one I found on the internet here.


Each dot, or node, or vertex, in the above network is an author of a scientific paper. Each connection, or link, or edge, means that two authors shared a paper. And that is a complex network. Mathematically, it's a graph. Now, it's not easy to see in this graph, but if you pay attention you will notice that there are some structures in it. Some groups of nodes are more interconnected among themselves than with other nodes. Finding out the rules by which this happens is called community detection.
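To make the idea concrete, here is a toy version of such a collaboration network. The data is invented purely for illustration: six authors, where the first three wrote papers mostly among themselves and so did the last three.

```python
import numpy as np

# Toy collaboration network: 6 authors in two communities
# (authors 0-2 and 3-5). A 1 at (i, j) means i and j co-authored a paper.
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
])

groups = np.array([0, 0, 0, 1, 1, 1])  # the community of each node

# Count edges inside communities vs. between them
within = sum(A[i, j] for i in range(6) for j in range(i)
             if groups[i] == groups[j])
between = sum(A[i, j] for i in range(6) for j in range(i)
              if groups[i] != groups[j])
print(within, between)  # 6 within-group edges, only 1 bridge
```

That imbalance, many more links inside groups than between them, is exactly the kind of structure a community-detection algorithm is supposed to discover on its own.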

It's interesting to note that this group structure is what we call a mesoscopic characteristic of the network. It means that it happens at an intermediate level between the macroscopic and the microscopic phenomena in the graph. By macroscopic you can imagine things like paths, cycles and cliques. By microscopic, you can think about characteristics of individual nodes or of small groups of nodes, usually two or three.

The interesting thing is that community detection is usually done by trying to infer how one group connects to another, completely ignoring any node-specific characteristic. What we've done was to include this microscopic information in the inference and, voilà, our algorithm was capable of modelling the network structure better than the others!

Our algorithm is a Bayesian one, full of tricks I must admit, but it works anyway. What it effectively does is define a general model for a network that depends on two sets of hyperparameters: the group structure and the tendency of each node to link to another. Then we feed the algorithm the observed adjacency matrix of the graph representing the network, and the algorithm gives back a classification of each node into a group, how the groups link to each other and the propensity of each node to link to another!
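The paper has the real model; below is just a simplified sketch of mine, with made-up names, of what such a generative model looks like. `g` plays the role of the group structure, `B` the group-to-group connection strengths (mesoscopic), and `theta` the individual linking propensity of each node (microscopic). Inference is this run backwards: given the adjacency matrix, find the parameters that best explain it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters (names are mine, not the paper's):
# g[i]    -- group label of node i (mesoscopic structure)
# B[a, b] -- baseline connection strength between groups a and b
# theta[i]-- individual propensity of node i to form links (microscopic)
n = 8
g = np.array([0, 0, 0, 0, 1, 1, 1, 1])
B = np.array([[0.8, 0.05],
              [0.05, 0.8]])
theta = rng.uniform(0.5, 1.5, size=n)

# Edge probability combines both levels; clip keeps it a valid probability
P = np.clip(np.outer(theta, theta) * B[np.ix_(g, g)], 0.0, 1.0)

# Sample a symmetric adjacency matrix with no self-loops
upper = np.triu(rng.random((n, n)) < P, k=1)
A = (upper + upper.T).astype(int)

# Inference would take A as input and recover g, B and theta
# by maximizing their posterior probability.
print(A.sum() // 2, "edges sampled")
```

The point of the sketch is only the factorization: each possible edge depends both on which groups the two nodes belong to and on how "sociable" each node individually is.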

You may say: of course, give me enough parameters and I can fit an elephant! But first, we did not add as many parameters as we could; we chose them by thinking about the best structure, and each one of our parameters has a "physical" interpretation. Second, we compared it with an algorithm with more degrees of freedom (more parameters to adjust) and we still performed better.

How do you know you're better? Well, there are some networks that were studied by specialists in their areas, who inferred and studied the group structure themselves. In our paper we give one example from sociology and two from biology. In these cases, we had what we called the expert classification. So, we ran our algorithm and others on the networks and compared the results with this classification. As I said, our algorithm agreed much better in ALL three cases.
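Even the comparison itself is a small exercise, because the group labels an algorithm returns are arbitrary names: calling the groups "1" and "0" instead of "0" and "1" is the same answer. A hypothetical little scorer (mine, not the paper's; brute-force over relabelings, fine for a handful of groups):

```python
from itertools import permutations

def agreement(expert, inferred):
    """Fraction of nodes where the inferred label matches the expert's,
    maximized over relabelings, since group names are arbitrary."""
    labels = sorted(set(inferred))
    best = 0.0
    for perm in permutations(labels):
        mapping = dict(zip(labels, perm))
        score = sum(mapping[b] == a
                    for a, b in zip(expert, inferred)) / len(expert)
        best = max(best, score)
    return best

expert   = [0, 0, 0, 1, 1, 1]
inferred = [1, 1, 0, 0, 0, 0]  # same split, swapped names, one mistake
print(agreement(expert, inferred))  # 5 out of 6 nodes agree
```

With many groups one would use something smarter (the Hungarian algorithm, or an information-theoretic score), but the idea is the same: agreement up to a renaming of the groups.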

Now, I will ask you something. Isn't it clear that, in order to know if the algorithm is good, we had to compare it with a case where the classification is known? Isn't it obvious that there can be cases where the expert classification may be difficult, may not be available, or may take a long, long time to obtain? And after all, even if we always had the expert, isn't it interesting to have a program that is as good as the expert? That would certainly tell us something about how the expert works and, as I have been writing in this blog for a long time now, pure knowledge is also a good thing.

And finally we come to the climax of the tale. I explained all of that, except for the last paragraph above, simply because I thought it was too obvious for an audience of scientists. After I finished my talk, there were few questions from the audience, which is a sign that either no one understood what I said or that nobody liked it. The second turned out to be the case, as one of the members of the audience asked in a sarcastic tone of voice:

"Why don't you always ask the classification for the expert?"

The audience was pleased with the question and many smiled and nodded in agreement. I answered that it was a test, and the guy asked me for an example where the expert could not give the classification. I'm a terrible debater, so it took me some time to think of a specific example, and the one I came up with was not very convincing. But I guess it's clear that the more complex the network, the more difficult it is for a human expert to analyse it. Note that these people were not stupid. But they were nonetheless arrogant enough to think that if they could not see the importance of what you're doing, then it's not important at all. In fact, I could say that many of the talks from those who were smiling were themselves devoid of any short-term practical application, which for me is irrelevant because, I'm saying it again, knowledge for the sake of knowledge IS IMPORTANT no matter what politicians and businessmen say.

This kind of attitude is unfortunately very common in the scientific community. Not everyone is like that, but a lot are. Be it because we are competing for funding or for awards, or because we want to be the brilliant rising star, that's not what science is all about. I guess this kind of disunion just makes us more vulnerable to the attacks we have been suffering from governments around the world. The utilitarian philosophy, which is just a means of mass control, is already rooted in our community. On the other hand, maybe we were always like that. Newton seemed to be like that. Others as well. But it is, anyway, regrettable.

2 comments:

Marlo said...

It's strange, since I haven't been able to call myself a scientist for quite some time, that I agree with Dr. Alamino. At the very least, such a study would serve as the beginning of an algorithm that could evaluate networks too complex for human experts.

That's so clear to me that it makes me think I'm missing something.

Roberto C. Alamino said...

You're not missing anything, Marlo. It just goes to show that scientists are very little different from anyone else... for the good and for the bad...