algorithm in Barabasi Game partial sum tree


I wanted to know if there is any literature source for the psumtree-variant of the barabasi-game generator for random graphs? Specifically, I would like to use & modify it for a different graph model but for that purpose, I need to understand it first. Reading the C code is ofcourse an option, but I wanted to know if there is any publication etc of the partial sum tree approach?

Kind regards

Take a look at the psumtree.c file, and the comments/documentation within. You can basically treat it as a black-box data structure that allows for efficient sampling from a discrete probability distribution, as well as for efficient changes to that distribution.

When implementing preferential attachment, we need to select a vertex with probability proportional to its degree (i.e. sample from a discrete distribution).

Thank you! I have taken a look at it, but I was actually wondering about something like a more formal analysis of this approach (esp. regarding efficiency). I must be googling the wrong terms but I cant turn anything up. If you have any helpful pointers here, Id be very grateful.

I’d also like to know whether there is any published literature on this algorithm. I’m teaching my unit on preferential attachment tomorrow, and had designed this unit when it was the ‘bag’ implementation. Now I’m not sure how to explain it to students. The igraph sample_pa documentation explains the bag algorithm but only says this about psumtree: " psumtree uses a partial prefix-sum tree to generate the graph, this algorithm can handle any power and zero.appeal values and never generates multiple edges." The documentation page only cites one of the early Barabasi & Albert papers on scale free networks. Gabor Csardi, what source did you use in reimplementing this? Thanks – Dan

Have you looked at the documentation in psumtree.c? Most of is it actually included in the HTML documentation of the C library, and describes the key part of the method.

Ultimately, it is important to distinguish between what the function produces, and how it does it. One only needs to understand what it does in order to use it.

What does it do?

New vertices are added to the graph in discrete steps. In each step, we assign a weight w_i = (d_i+A)^\beta to each existing vertex i. d_i denotes its degree. We select m existing vertices according to probabilities proportional to the weights, p_i = w_i / \sum_i w_i, and connect the new vertex to them.

There are variations on the generation procedure depending on how d is interpreted (in-degee or total degree) and whether parallel edges are allowed (sampling m vertices with or without replacement).

How does it work?

To implement this, we simply need a method to sample from discrete distributions. There are many such methods, not all equally efficient. Notice that here the distribution changes in each step. Thus we need an algorithm that can (1) generate samples efficiently, but also (2) update the distribution efficiently. The data structure whose description I linked to provides this. Please check the description and let me know if it is insufficiently clear.

There’s one more trick to be mentioned: In order to select m distinct vertices, we can first select one, then set its probability (actually: weight) to zero before selecting the next. And so on, until m have been selected.

In comparison, the “bag” method only works for integer weights w and does not support sampling without replacement.

I don’t know if this method is published anywhere. It seems too simple to warrant publication. Effectively, we have a binary search tree where each node is annotated with the total weight of its subtree. To perform sampling, you descend from the root, choosing the left or right child node randomly, with probabilities proportional to their weights.

1 Like

As @szhorvat already explains, the use of the partial prefix sum tree is just an implementation detail of how to sample nodes based on degrees. Of course, it is a detail that will impact how fast it will run, and how flexible the implementation is. However, the end result does not depend on its implementation details.

For future reference, as far as I know, this concept is called a Fenwick Tree. Its publication lays out its purposes clearly:

A new method (the ‘binary indexed tree’) is presented for maintaining the cumulative frequencies which are needed to support dynamic arithmetic data compression. It is based on a decomposition of the cumulative frequencies into portions which parallel the binary representation of the index of the table element (or symbol). The operations to traverse the data structure are based on the binary coding of the index. In comparison with previous methods, the binary indexed tree is faster, using more compact data and simpler code. The access time for all operations is either constant or proportional to the logarithm of the table size. In conjunction with the compact data structure, this makes the new method particularly suitable for large symbol alphabets.

1 Like

You can find a short description of partial prefix sum trees in the PhD thesis of Gábor Csárdi.

1 Like

Thanks @szhorvat, that’s very helpful. The explanations in your post (both the what and the how parts) are more what I was looking for than the explanation of methods in the library of the psumtree methods.

To be sure I fully understand, am I correct that A in your formula is zero.appeal in sample_pa and beta in your formula is power in sample_pa, controlling linearity of attachment (what Barabasi’s textbook calls alpha)?

Yes, that’s correct.

Thanks a lot @tamas and @vtraag for the literature references, that was just what I was looking for! Now I have something to go from (and something to cite :slight_smile: ).

You can cite the igraph documentation as well. Each igraph version is archived on Zenodo to make sure that it will be indefinitely available.

1 Like