Hi guys,
There are a few issues on the python-igraph repo that are basically asking for ports of functions already present in the R interface. That highlights the amazing work done by our folks on rigraph. Some examples:
Ideally, we would move those R functions back into C and then code thin wrappers in both languages. In practice, that might be a little hard at times, especially when working with attributes - C is notoriously iffy with strings.
How are you all feeling about this? I’m tempted to take the functions from R and basically transliterate them into Python, but I can see that going the C way is better in the long term. Any strong opinions either way?
Generally speaking, it’s good to have as much as possible in the C core. That has been igraph’s philosophy so far, much more so than for a typical C library. (See e.g. my request.)
But with attributes, things get a little more complicated.
Attribute storage is specific to each high-level interface. It has to be so in order to allow integrating igraph deeply into its host language (e.g. allow native datatypes to be used as attributes).
Graph union and difference by vertex names is the domain of high-level interfaces because it requires at least an equality comparison (==) for attribute values, whose storage is interface-specific. Other similar operations may require (or may benefit from) >, < or hashing too. This cannot be done for the host language’s datatypes from C.
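To make the point above concrete, here is a minimal pure-Python sketch of a union by vertex names. It is not how python-igraph or rigraph implement it; the function name union_by_name and the edge-list representation are made up for illustration. The thing to notice is that the dict lookup relies on hash() and == of whatever Python objects are used as names, which is exactly what the C core has no access to.

```python
def union_by_name(edges_a, edges_b):
    """Union of two undirected graphs given as lists of (name, name) pairs.

    Hypothetical helper for illustration only, not an igraph API.
    Names can be any hashable Python objects.
    """
    all_edges = list(edges_a) + list(edges_b)

    # Map each distinct name to a vertex index, in first-seen order.
    # The "name not in index" check uses the host language's hash() and ==.
    index = {}
    for u, v in all_edges:
        for name in (u, v):
            if name not in index:
                index[name] = len(index)

    # Deduplicate edges, treating (u, v) and (v, u) as the same edge.
    seen = set()
    edges = []
    for u, v in all_edges:
        key = tuple(sorted((index[u], index[v])))
        if key not in seen:
            seen.add(key)
            edges.append(key)

    return list(index), edges


names, edges = union_by_name(
    [("a", "b"), ("b", "c")],
    [("c", "b"), ("c", "d")],
)
```

Here the edge ("c", "b") from the second graph is recognised as the same edge as ("b", "c") from the first one only because the names compare equal on the Python side.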
There are two separate things discussed in that GitHub issue:
- Reading CSVs. @tamas makes a good point about why reading CSV files should be left to other libraries, not igraph.
- Creating a graph from a “dataframe”. What “dataframe” means, and what the most natural way to convert to a graph is, are both specific to the host language. An easy way to go from a Pandas dataframe to a graph would be very useful, but this is again the domain of the Python interface, not the C core.
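As a rough sketch of what the second item could look like on the Python side, the snippet below turns an edge table into vertex names, index pairs and edge attributes. Everything here is an assumption for illustration: a list of dicts stands in for a dataframe, the column names "source" and "target" are arbitrary, and graph_from_rows is not an existing igraph function. It also assumes every row has the same columns.

```python
def graph_from_rows(rows, source="source", target="target"):
    """Convert an edge table into (vertex names, edges, edge attributes).

    Hypothetical helper for illustration only. `rows` is a list of dicts,
    one per edge, standing in for the rows of a dataframe.
    """
    index = {}       # vertex name -> vertex index, in first-seen order
    edges = []       # list of (source index, target index) pairs
    edge_attrs = {}  # attribute name -> list of values, one per edge

    for row in rows:
        pair = []
        for col in (source, target):
            name = row[col]
            if name not in index:
                index[name] = len(index)
            pair.append(index[name])
        edges.append(tuple(pair))

        # Every remaining column becomes an edge attribute.
        for col, value in row.items():
            if col not in (source, target):
                edge_attrs.setdefault(col, []).append(value)

    return list(index), edges, edge_attrs


rows = [
    {"source": "alice", "target": "bob", "weight": 1.5},
    {"source": "bob", "target": "carol", "weight": 2.0},
]
names, edges, edge_attrs = graph_from_rows(rows)
```

The three return values are the pieces a high-level interface would then hand over to the core: a vertex count, a list of index pairs, and attributes stored in the host language's own datatypes.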
To sum up, personally I support moving as much as possible to the C core. But these two specific things cannot be moved to the C core in any reasonable way (except raw CSV reading, but there are good reasons not to do that in igraph).
There are, however, many other features that are present only in R or only in Mathematica, and can be moved to C with some effort.
If you’re looking for such projects @iosonofabio, one example is match_vertices, which is almost there (most features are included in C now), but is still R-only and is still partially implemented in R.
In the Mathematica interface, I have a large number of such features, mostly related to graph theory:
- exact graph colouring and clique cover (the methods I used here require a SAT solver, which would generally be useful for many other graph theory problems, so eventually we should link to or include one)
- is the graph perfect? (written in C++, very easy to port to C)
- testing regularity and symmetry properties, such as edge and vertex transitivity (written in C++, harder to port)
- Strahler index
- Contracting degree-2 vertices (extremely useful for spatial graph analysis) and checking homeomorphism
- probably some other things I don’t recall now
All of these would ideally be ported to the C core eventually.
There’s also the inverse problem: There are quite a few features that are present in C, but not exposed in Python (e.g. epidemics), or not fully exposed in Python (e.g. the new methods of degree_sequence_game). There are some which are not exposed either in Python or in R (realize_degree_sequence <- I’d appreciate it if someone would do this one!).
Unfortunately, some of the features that were contributed to the C core were not documented and explained well enough to make it easy for the maintainers of the high-level interfaces to include them. Here’s an example: https://igraph.org/c/doc/igraph-Spatial-Games.html
Thank you.
So +1 for copying from R to Python because of attributes. I’ll try to make PRs for those two.
Sorry, but I have no time to rewrite the Mathematica code in C. I think you’re the best person for that.
Actually I said that it makes no sense to do those two, nor is it possible …
Hmm, to me it reads like you don’t recommend doing it in C because of strings and other native types, right? That’s why I’m now doing it in Python. Or is that not what this is?
Correct. Sorry, I misunderstood.
Yep, no problem. I’m coding the dataframe one now.