Initialization of cluster_label_prop providing initial labels

Hi everyone,
I am using the function cluster_label_prop to assign labels to unlabelled nodes of a network. So I provide an input graph and I use both the initial and fixed arguments. In more detail, as initial I provide a vector with three possible entries (1,0,-1) the majority of labels are -1s which means that I don’t know their label. Furthermore, entries that are either 0 or 1 are fixed.
Since the label propagation works by “copying” the most likely label in the node’s neighbourhood, I was wondering how labels of nodes having -1 are initialised. This is not specified in the documentation and I think there are 3 options: 1) -1s are not considered when propagating labels; 2) every -1 is initialised randomly to be either 0 or 1; 3) every -1 is intialised to the most common fixed label.
Thanks for the reply.

It is this one.


Note that the behaviour of this function will change slightly in R/igraph 1.3.0 (when it’s released), according to the changelog item here: Release igraph 0.9.6 · igraph/igraph · GitHub I.e., no vertex will be left unlabelled, even if there are unlabelled disconnected components. You can try the 1.3 prerelease right now (instructions), but you need to compile it yourself.

Tip: when in doubt, check if there is additional information in the documentation of igraph’s C core. That is the ultimate source.

https://igraph.org/c/html/latest/igraph-Community.html#label-propagation

Thanks for the prompt reply! Since I am considered a connected undirected graph I won’t have problems in terms of missing labels.
I see how it works now so, for example, if I have 9 neighbours out of 10 labelled with -1 and 1 labelled with 1 I will copy the label of this node. Otherwise, I will take the more likely, and in case of parity I will make a random choice.
Nice, I think it would be useful to add this to the documentation.
Thanks for the tip, I was checking the C implementation however I was not able to understand extra things by checking the additional information and I couldn’t understand much of the source code.

The documentation does not explain what label propagation is, at this moment. It simply references the paper. If you can provide a short paragraph explaining label propagation, I will insert it. It should be the second paragraph here: igraph Reference Manual Later it will be transferred to the high-level interface docs as well (R, Python, Mathematica).

Well, the general description seems clear in the updating rule: “then updating the labels by majority voting in the neighborhood of the vertex” considering that the original paper in Section III goes like this: “We assume that each node in the network chooses to join the community to which the maximum number of its neighbors belong, with ties broken uniformly randomly.”. This is what the authors say and I can adapt it by providing a short addition to the general description.
If it helps (and I think it does), I can write a few lines on the use of initial according to this conversation :slight_smile:

1 Like

In R:

Description
This is a fast, nearly linear time algorithm for detecting community structure in networks. In works by labeling the vertices with unique labels and then updating the labels by majority voting in the neighborhood of the vertex.

Details
This function implements the community detection method described in: Raghavan, U.N. and Albert, R. and Kumara, S.: Near linear time algorithm to detect community structures in large-scale networks. Phys Rev E 76, 036106. (2007). This version extends the original method by the ability to take edge weights into consideration and also by allowing some labels to be fixed.

From the abstract of the paper: “In our algorithm every node is initialized with a unique label and at every step each node adopts the label that most of its neighbors currently have. In this iterative process densely connected groups of nodes form a consensus on a unique label to form communities.”

initial:
The initial state. If NULL, every vertex will have a different label at the beginning. Otherwise it must be a vector with an entry for each vertex. Non-negative values denote different labels, negative entries denote vertices without labels.

I suggest adding at the bottom of the description of initial:
If the initial state contains negative values they will not be considered in the propagation process until the corresponding nodes obtain a non-negative label.