Specifying multiple output nodes from 1 input node

sid · 8 April 2020 14:31

I have many large dataframes (from csv files) of the following structure with 1 input node in each row and multiple output nodes and edge weights.

   input_node  output_nodes             edge-weights              id-attr   attribute
1  11347-5     ['64837-1', '116228-0']  [0.01001617, 0.01778383]  82249852  372856
2  116228-0    ['14328-3']              [0.3505]                  82283186  372892
3  39644-0     ['116228-0']             [0.10184362]              82273700  372878
4  116228-0    ['116228-0']             [0.21326264]              82278451  372887
5  116228-0    ['64827-1', '116228-0']  [0.02947139, 0.08275262]  82249816  372855

For example, rows 1 and 5 each have 1 input node, 2 output nodes, the 2 corresponding edge weights (numeric), and a few attributes; rows 2 through 4 each have 1 input node and 1 output node.

How do I read this dataframe into igraph to build a graph while retaining the attributes? Typically igraph expects the first 2 columns of the dataframe to be the input and output nodes. This is a large dataframe, and the number of output nodes can be large in some rows.

I can imagine doing this with a “for” loop and regex, but that would be too slow and the new dataframe would require more memory. I would appreciate any suggestions.

Thank you. Sid

tamas · 17 April 2020 14:25

If memory is a concern, you could convert the data frame into a file in LGL format, which looks roughly like this in your case:

# 11347-5
64837-1 0.01001617
116228-0 0.01778383
# 116228-0
14328-3 0.3505
# 39644-0
116228-0 0.10184362
...

So, the idea is that for each input node, you start a new line with # followed by the name of the input node, and then list the edges incident on that node in the subsequent lines, one per line, giving the other endpoint of the edge and its weight. This is then repeated for each input node. You can do this row by row from the input dataframe and write the file to disk as you go, so no extra memory is consumed. Then, you can load the LGL file into igraph directly. You only need to make sure that none of the node identifiers contain whitespace characters.
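The conversion described above could be sketched in Python roughly as follows. This is only an illustration, not part of igraph: the helper name write_lgl is made up, and it assumes that the list columns may arrive as strings (as they do when read from a CSV file) and can be parsed with ast.literal_eval.

```python
import ast
import io


def write_lgl(rows, out):
    """Write (input_node, output_nodes, edge_weights) rows in LGL layout.

    Hypothetical helper for illustration. `rows` is any iterable of
    3-tuples; the two list fields may be real lists or their string
    form as read from a CSV file.
    """
    for source, targets, weights in rows:
        # CSV readers return list columns as text, so parse them if needed.
        if isinstance(targets, str):
            targets = ast.literal_eval(targets)
        if isinstance(weights, str):
            weights = ast.literal_eval(weights)
        if len(targets) != len(weights):
            raise ValueError(f"row for {source!r}: lengths differ")
        # One "# name" header line per input node ...
        out.write(f"# {source}\n")
        # ... followed by one "neighbour weight" line per edge.
        for target, weight in zip(targets, weights):
            out.write(f"{target} {weight}\n")


# Small demonstration on the first two rows of the example table.
rows = [
    ("11347-5", "['64837-1', '116228-0']", "[0.01001617, 0.01778383]"),
    ("116228-0", ["14328-3"], [0.3505]),
]
buf = io.StringIO()
write_lgl(rows, buf)
lgl_text = buf.getvalue()
```

In practice out would be a file opened for writing, and rows could be fed one at a time (for example from csv.reader or a chunked read), so only one row is held in memory. With python-igraph, the resulting file can then presumably be loaded with something like Graph.Read_Lgl(path, names=True, weights=True, directed=True); check the documentation of your igraph version for the exact arguments. Note that the id-attr and attribute columns are not carried over by this format.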

sid · 16 May 2020 21:03

If you have a dataframe with lists in one of its columns, pandas in Python has a method called “explode” that turns each list element into its own row, which gives the one-edge-per-row layout that igraph expects. As already pointed out, the way to do this in the tidyverse in R is to use “unnest_longer”. The Python way of doing it is extremely fast.
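As a minimal sketch of the “explode” approach on two rows of the example table: it assumes pandas 1.3 or later (needed to explode two columns together) and that the list columns were read from CSV as strings. The new column names output_node and weight are my own choice.

```python
import ast

import pandas as pd

# Two rows of the example table, with list columns as CSV-style strings.
df = pd.DataFrame({
    "input_node": ["11347-5", "116228-0"],
    "output_nodes": ["['64837-1', '116228-0']", "['14328-3']"],
    "edge-weights": ["[0.01001617, 0.01778383]", "[0.3505]"],
    "id-attr": [82249852, 82283186],
    "attribute": [372856, 372892],
})

# Turn the string columns into real Python lists.
for col in ["output_nodes", "edge-weights"]:
    df[col] = df[col].apply(ast.literal_eval)

# Explode both list columns in lockstep: one row per edge.
# Passing a list of columns assumes pandas >= 1.3.
edges = df.explode(["output_nodes", "edge-weights"], ignore_index=True)

# Rename (names chosen for illustration) and restore a numeric dtype,
# since exploded columns come back with object dtype.
edges = edges.rename(
    columns={"output_nodes": "output_node", "edge-weights": "weight"}
)
edges["weight"] = edges["weight"].astype(float)

# Source and target first, then the edge attributes.
edges = edges[["input_node", "output_node", "weight", "id-attr", "attribute"]]
```

The resulting edges table has the input and output node in its first two columns and the remaining columns as edge attributes, which is the layout the igraph data-frame constructors expect (in python-igraph, probably something like Graph.DataFrame(edges, directed=True, use_vids=False), depending on the version).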
