Specifying multiple output nodes from 1 input node

I have many large dataframes (from csv files) of the following structure with 1 input node in each row and multiple output nodes and edge weights.

   input_node  output_nodes             edge-weights              id-attr   attribute
1  11347-5     ['64837-1', '116228-0']  [0.01001617, 0.01778383]  82249852  372856
2  116228-0    ['14328-3']              [0.3505]                  82283186  372892
3  39644-0     ['116228-0']             [0.10184362]              82273700  372878
4  116228-0    ['116228-0']             [0.21326264]              82278451  372887
5  116228-0    ['64827-1', '116228-0']  [0.02947139, 0.08275262]  82249816  372855

For example, rows 1 and 5 each have 1 input node, 2 output nodes, the 2 corresponding edge weights (they are numbers), and a few attributes; rows 2 through 4 each have 1 input node and 1 output node.

How do I read this dataframe into igraph to make a graph while retaining the attributes? Typically igraph expects the first 2 columns of the dataframe to be the input and output node. This is a large dataframe, and the number of output nodes could be large in some rows.

I can imagine doing this with a “for” loop and regex, but that would be too slow and the new dataframe would require more memory. I would appreciate any suggestions.

Thank you. Sid

If memory is a concern, you could convert the data frame into a file in LGL format, which looks roughly like this in your case:

# 11347-5
64837-1 0.01001617
116228-0 0.01778383
# 116228-0
14328-3 0.3505
# 39644-0
116228-0 0.10184362
...

So, the idea is that for each input node, you start a new line with #, add the name of the input node, and then list the edges incident on that node in the subsequent lines, one per line, giving the other endpoint of the edge and its weight. This is then repeated for each input node. You can do this row by row from the input dataframe and write the file to disk as you go, so no extra memory is consumed. Then, you can load the LGL file into igraph directly. You only need to make sure that none of the node identifiers contain whitespace characters.
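
The conversion described above could be sketched roughly as follows in Python, using only the standard library so the csv file is streamed one row at a time. The function name dataframe_csv_to_lgl, the file names, and the assumption that the list columns are stored in the csv as text such as "['64837-1', '116228-0']" are mine, not from the original post; only the two rows shown in the LGL example are used as sample data.

```python
import ast
import csv
import os
import tempfile


def dataframe_csv_to_lgl(csv_path, lgl_path):
    """Stream a csv in the question's layout into an LGL file, row by row."""
    with open(csv_path, newline="") as src, open(lgl_path, "w") as dst:
        for row in csv.DictReader(src):
            # Assumption: list columns are stored as text like "['a', 'b']".
            targets = ast.literal_eval(row["output_nodes"])
            weights = ast.literal_eval(row["edge-weights"])
            # One "# name" line per input node, then one line per edge.
            dst.write(f"# {row['input_node']}\n")
            for target, weight in zip(targets, weights):
                dst.write(f"{target} {weight}\n")


# Small demonstration with two rows from the question (hypothetical file names).
workdir = tempfile.mkdtemp()
csv_path = os.path.join(workdir, "edges.csv")
lgl_path = os.path.join(workdir, "edges.lgl")

with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["input_node", "output_nodes", "edge-weights", "id-attr", "attribute"])
    writer.writerow(["11347-5", "['64837-1', '116228-0']", "[0.01001617, 0.01778383]", 82249852, 372856])
    writer.writerow(["116228-0", "['14328-3']", "[0.3505]", 82283186, 372892])

dataframe_csv_to_lgl(csv_path, lgl_path)

with open(lgl_path) as f:
    lgl_text = f.read()
print(lgl_text)
```

In python-igraph the resulting file can then be read with Graph.Read_Lgl, for example igraph.Graph.Read_Lgl(lgl_path, names=True, weights=True). Note that this sketch carries over only the node names and edge weights; the id-attr and attribute columns are not written to the LGL file. It also writes a new # line for every row, so an input node that occurs in several rows gets several # lines, and I have not checked how igraph handles that.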

If you have a dataframe with lists in one of the columns, then there is a function in Python pandas called “explode” which turns each element of a list into its own row, repeating the other columns. As already pointed out, in the tidyverse (tidyr) in R, the way to do this is to use “unnest_longer”. The Python way of doing it is extremely fast.
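
As a minimal sketch of the “explode” approach, assuming the list columns arrive from the csv as text and that output_nodes and edge-weights always have the same length within a row (the inline sample data and the variable names are only for illustration):

```python
import ast
import io

import pandas as pd

# Two sample rows in the question's layout; in practice this would be a csv path.
csv_text = """input_node,output_nodes,edge-weights,id-attr,attribute
11347-5,"['64837-1', '116228-0']","[0.01001617, 0.01778383]",82249852,372856
116228-0,"['14328-3']","[0.3505]",82283186,372892
"""

# List columns are stored as text in a csv, so parse them into real lists.
df = pd.read_csv(
    io.StringIO(csv_text),
    converters={"output_nodes": ast.literal_eval, "edge-weights": ast.literal_eval},
)

# Exploding both list columns together gives one row per edge and repeats
# the remaining attribute columns (multi-column explode needs pandas >= 1.3).
edges = df.explode(["output_nodes", "edge-weights"], ignore_index=True)
edges["edge-weights"] = edges["edge-weights"].astype(float)
print(edges)
```

The result has the input and output node in the first 2 columns, which is the layout igraph expects, with the weights and attributes kept as extra columns. In python-igraph it could then be passed to something like igraph.Graph.DataFrame(edges, directed=True, use_vids=False), where use_vids=False tells igraph to treat the first 2 columns as node names; whether the graph should be directed is an assumption on my part.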