Generate subgraph of given size

peranti · 24 January 2022 17:23

I have a connected graph of 15000 nodes and would like to sample subgraphs of size 500. What options are available within the igraph system to generate subgraphs of a given size?

szhorvat · 24 January 2022 18:37

One technique that is sometimes used is to perform a random walk, and take the subgraph induced by the vertices traversed by the walk.

dan_suthers · 25 January 2022 02:30

Nice idea. But Peranti should clarify the requirements for the sample. Random walk would give samples that are connected and are “communities” under some definitions of communities, so it is not a random sample of nodes or edges. If it is more important to have a random sample of nodes regardless of connectivity, use “sample” on vertex (node) ids and then induce a graph of the edges that connect them with induced_subgraph. It may not be a connected graph. Or if random sample of edges is what matters, sample the edge ids and use the nodes they connect.

peranti · 25 January 2022 06:18

Yes, @dan_suthers. Indeed, I am looking for a subgraph of 500 that is connected or the only subgraph component. However, I understand what you have mentioned that the resultant subgraph is no more random.

If sample the vertex ids and then induce the graph, it necessarily need not result in a single component subgraph. Isn’t it?

dan_suthers · 25 January 2022 11:09

My main point is that one should first define the kind of sample one wants, and then identify a method that meets those requirements. If the most important criteria is a random sample of nodes, a random walk won’t do it. But if you want to randomly choose a connected component, then it seems to me that a random walk starting at a randomly chosen node would do it. However, I am not a statistician: there may be some bias in that method I am not aware of.
A further question: do you just want one randomly chosen connected component, or do you want to repeatedly sample the graph and get a distribution of components that is randomly distributed across all possible samples of components? Again, I’m not going to claim to have expertise about sampling, but it seems important to clarify your objectives.

peranti · 25 January 2022 12:24

@dan_suthers I am afraid, I do not have an answer to your question. I would think and come back to you.

szhorvat · 25 January 2022 14:09

Indeed.

@peranti Why do you want to take a sample? The usual motivation is that the original graph is too large to analyse in a reasonable amount of compute time. Thus we want a proxy which is small enough to work with, yet still representative of the original network. If this is what you are looking for, you need to consider whether the sampling method you chose preserves those properties of the larger network which you care about. Random walk based sampling is probably the simplest attempt at preserving local structure.

I am not deeply familiar with this topic, so I am reluctant to give more specific suggestions. However, the problem of taking a representative sample has been studied for a long time, thus you will find many papers on it. If I had to do this, I would look at a few recent papers and look for a review of existing methods, along with their tradeoffs (biases? what properties do they / do they not preserve?).

vtraag · 25 January 2022 14:35

I agree that the objective should guide what type of sampling strategy is being used.

For a starting point on some problems of estimation and sampling networks, perhaps this would be a useful reference:

Kolaczyk E.D. (2009) Sampling and Estimation in Network Graphs. In: Statistical Analysis of Network Data. Springer Series in Statistics. Springer, New York, NY. https://doi.org/10.1007/978-0-387-88146-1_5

Topic		Replies	Views
Subgraph or induce subgraph by sampling the random nodes of given degree sequence Usage R	4	494	24 May 2021
Graph from motif distribution Usage Python	4	447	25 November 2021
Count the size and amount of clusters/hubs/sub-networks Usage R	4	269	21 January 2022
How to subgraph a network based on sex? Usage R	1	352	12 July 2021
Sampling graphs with a given degree sequence Usage Mathematica	1	847	28 January 2020

Generate subgraph of given size

Related topics