Creating Preferential Attachment (PA) Complex Networks with Self Loops

I am trying to create a directed simulated preferential attachment complex network with self loops, similar to something like this:

(This is the oncology_network dataset available in the ig.degree.betweenness package. Plotted with the ig.degree.betweenness::plot_simplified_edgeplot() function. Numbers on edges/edge thickness correspond to number of in or out directed edges)

I’ve used the igraph::sample_pa() function but have found limited success as there is no easy way to incorperate self loops that also have a preferential attachment process.

Is there a way to do this presently with igraph::sample_pa or some other function?

How would the connection probabilities be computed? Can you give a concrete example?

I presently do not have any framework thus far. All I know is that for real collaboration networks, this structure is really common. PA seems to be the approach

I’ve done a brief search on Google scholar and I have only managed to find another self looping network which comes from a real world dataset (here)

igraph has a rather general implementation of preferential attachment that could be in principle extended. When a new vertex is added, each of its m edges will connect each existing vertex i with probability proportional to

d_i^\alpha + A

where \alpha and A are parameters.

d can be either the in-degree or total degree. Multi-edges can be allowed or disallowed. A>0 is meant to ensure that there is a non-zero connection probability to existing vertices of zero in-degree in the directed case.


If we are going to be naïve about this, this could be easily generalized to to allowing self-loops. We simply included the just-added vertex in the candidates to connect to. A ensures that self-connections are possible.

But am quite reluctant to do this as a naïve generalization of the algorithm, without a good scientific justification. If I were dealing with real-world scale-free networks that have self-loops, and tried to use such a model, one of the first things I’d likely end up needed to try is to tune the self-loop probability separately from the connection probabilities to other vertices, to be able to reproduce the number of observed loops.

That would be yet another parameter for an already complex function. At this point, making this generalization, and defining/justifying included parameters, essentially becomes a research project. We can’t take on such a research project as part of igraph development.


So here’s what I propose

If you have the time to look into this, and are able to find some uses of such models in the literature, we can consider including this feature. A very solid mathematical justification (perhaps based on some invariance) for why it’s interesting/useful to include this even WITHOUT separate control of self-loop probabilities might also do.

We are preparing for a large release within a month after which the API will be frozen for a while. Thus this should be done within two weeks at most to keep within time.

I did a quick literature search and was unable to find anything. Let me know if you do find something that we can use.

Note that I am not asking about network datasets that seem to follow a scale-free degree distribution while having self-loops. I’m asking for published articles that describe a preferential attachment process / model that allows loops (either to try to model such data, or for other purposes).

1 Like

@szhorvat not sure if I’ll be able to come up with something on such short notice. However I have found these papers which might be relevant:

https://arxiv.org/pdf/2301.13730

It appears to note the present limitation with igraph’s PA generation mechanism.

Box 5.1 here discusses a variant with self-loops:

This might work. Is there a way to do this with the igraph r package?

@szhorvat I believe I have something that works. I will do some more digging though.