"Best practices" for id, name and label, and reason for new warning message about id

A new warning has appeared with the most recent igraph. I’ve been loading this graph for years without this warning as part of my teaching. I give two examples before asking my questions.

This one is Newman’s graph of network science co-authorship:

> NS <- read_graph("Networks/netscience.graphml", format="graphml")
Warning message:
In read.graph.graphml(file, ...) :
  At vendor/cigraph/src/io/graphml.c:487 : Could not add vertex ids, there is already an 'id' vertex attribute.
> summary(NS)
IGRAPH 7fc5087 U-W- 1589 2742 -- Network Science Collaborations
+ attr: name (g/c), label (v/c), id (v/c), weight (e/n)
> head(V(NS)$id)
[1] "0" "1" "2" "3" "4" "5"

This one is data derived from a database of chat contributions, with relations between the contributions I computed elsewhere:

> TI <- read_graph("Networks/TappedInChatSampleAnnotated.graphml", 
+                  format="graphml")
Warning message:
In read.graph.graphml(file, ...) :
  At vendor/cigraph/src/io/graphml.c:487 : Could not add vertex ids, there is already an 'id' vertex attribute.

and indeed I am using $id for the id of the contribution in this chat data, to correspond to the id in the source database:

> summary(TI)
IGRAPH 1af7625 D-W- 171 195 -- 
+ attr: label (v/c), time (v/c), actor (v/c), contribution (v/c), id
| (v/c), charcount (v/n), Edge Label (e/c), weight (e/n), id (e/c)
> V(TI)$id
  [1] "177" "178" "179" "180" "181" "182" "183" "184" "185" "186" "187" "188"
... etc.

There have always been issues and inconsistencies in the use of Id, id, label, and name when moving between different packages and also when exporting to Gephi.

I am wondering:

  1. What are the “best practices” for use of id, label and name in igraph, if any? (Note: Gephi uses “label” for the default label in visualizations, and will create “Id” if a graph already has “id”. My workflow often moves between igraph and Gephi.)

  2. What is the reason for this warning now appearing? Has there been a change in policy with regard to the above?

Thanks in advance,
Dan

Can you post a link to that GraphML file? The original on Newman’s website is in the GML format, not GraphML.

Do you recall what was the last version that did not show this warning? I don’t recall recent changes that could have caused this, but there were older changes. All these are specific to storage in GraphML, not general usage.

I’ll get back to you with an answer to your actual questions later.

Here’s Newman’s collaboration network converted to graphml in 2018
https://www.hawaii.edu/filedrop/dl/rzNXe-RYqEe-iRRMA-KzAHc/

While we’re at it, here is another graphml giving the warning (written out during a class demo a few weeks ago):
https://www.hawaii.edu/filedrop/dl/knKHj-VmSRf-cfXWR-sEdTX/

I did not see this warning when teaching last fall. I think it arrived with the big release this summer; whether that was the R release or the igraph release, I am not sure.

As far as I can tell, this warning was added for the 0.10 series. Were you using the 0.9 series last year?

Here’s what’s happening:

In GraphML, all vertices must have a unique ID. This is not a GraphML vertex attribute, just a unique identifier. Nevertheless, igraph reads the value of this identifier and stores it as an igraph vertex attribute called id.

In this GraphML file, there is also a GraphML attribute called “id”. igraph chooses to store this GraphML attribute in the igraph vertex attribute id and skips storing the unique identifier. It shows a warning to let you know what happened.

There is no perfect solution here, unfortunately. Whatever igraph-side attribute name we choose to store the GraphML unique identifiers in, there could in principle be a GraphML attribute with a conflicting name.

What is not nice here is that if we export an igraph graph with an id vertex attribute to GraphML, it will create a GraphML attribute called “id”. It’s good to note that the file you shared was in fact written by igraph. This means that taking any GraphML file, reading it in with igraph, then re-exporting it, creates a GraphML file that has this issue (and that triggers this warning). Students being students, if anything can be even mildly confusing, some of them will be confused by it … so I can see how this is an annoyance during teaching, even though the warning can be safely ignored.

@tamas, what do you think about this situation? We could recommend using prefixattr=True when re-exporting GraphML files, but that changes attribute names, which is an even greater annoyance during round-tripping.

Should we treat the id vertex attribute specially? Should we add a new parameter to the GraphML writer function for igraph 1.0 that makes it possible to rename it or skip it?

It’d be good to check how other software (Gephi, NetworkX) handle this. Do they read in the unique ID at all? How do they store it? Does it get written back in an inconvenient way on re-export?

If you have any input on this @dan_suthers, that would be much appreciated.

I’ll have to stop looking at this for a while and prepare for my own teaching for tomorrow :slight_smile:

A final comment is that as I recall (and my memory may be failing me), the behaviour was no different in 0.9. It’s just that there was no warning issued at that time. You can safely ignore the warning, assuming that you are not trying to use this unique ID in any way.
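If you would rather not show the warning at all in class, here is a minimal sketch of one way to silence only this message while letting every other warning through. The helper name `muffle_warning` is made up for illustration, and the pattern string is simply copied from the warning text quoted earlier in this thread; it relies only on base R condition handling.

```r
# Hypothetical helper (not part of igraph): evaluate `expr`, silencing only
# warnings whose message contains `pattern`; other warnings pass through.
muffle_warning <- function(expr, pattern) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl(pattern, conditionMessage(w), fixed = TRUE)) {
        invokeRestart("muffleWarning")
      }
    }
  )
}

# Intended use, with the file name from the first post:
# NS <- muffle_warning(
#   igraph::read_graph("Networks/netscience.graphml", format = "graphml"),
#   "Could not add vertex ids"
# )
```

Matching on the message text is fragile if the wording changes in a later release, so treat this as a teaching convenience rather than a fix.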

Thanks for your clear explanation, which is along the lines of what I expected. Yes, I was using 0.9 the last time I taught and upgraded this summer to 0.10. No student expressed confusion but it’s good to have an explanation because I would rather say “this is what is going on and why you can ignore it” instead of “I don’t know why that is there”! :slight_smile:

I am working on the igraph-Gephi round trip with a trivial graph; will post soon.

Here’s my test, after which I need to get back to teaching! Script and graphmls are at https://www.hawaii.edu/filedrop/dl/nqBgt-LmHcN-EpbUI-Wdpvp/ (14 day limit). Conclusion is that id attribute comes in via the igraph/graphml write-read round trip, but the warning comes from the Gephi version.

Making a trivial example

> g1 <- graph_from_literal(A-B)
> summary(g1)
IGRAPH 8fbe2fe UN-- 2 1 -- 
+ attr: name (v/c)
> write_graph(g1, file="g1.graphml", format="graphml")

The resulting g1.graphml

<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
         http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<!-- Created by igraph -->
  <key id="v_name" for="node" attr.name="name" attr.type="string"/>
  <graph id="G" edgedefault="undirected">
    <node id="n0">
      <data key="v_name">A</data>
    </node>
    <node id="n1">
      <data key="v_name">B</data>
    </node>
    <edge source="n0" target="n1">
    </edge>
  </graph>
</graphml>

Read it back in without saving just to check representation

> read_graph(file="g1.graphml", format="graphml") 
IGRAPH c80ee54 UN-- 2 1 -- 
+ attr: name (v/c), id (v/c)
+ edge from c80ee54 (vertex names):
[1] A--B

So reading it in adds the id attribute that was not there: it is a feature of the igraph graphml write/read round trip but does not trigger the warning.

Now loading this into Gephi and looking in Data Laboratory I get (removing the unused Interval column):

Id	Label	name
n0	n0	A
n1	n1	B

Although the Id wasn’t replicated, this shows another issue with Gephi-igraph coordination: Gephi uses Label by default for the screen display, so we need to take an extra step to change it to name or copy the name column to Label.
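That extra step could also be done on the igraph side before exporting. A minimal sketch, assuming Gephi's GraphML import maps a node attribute named label to its Label column (I have not checked this across Gephi versions); the output file name is made up for the example:

```r
library(igraph)

g1 <- graph_from_literal(A-B)

# Copy the igraph vertex names into a `label` attribute so that a tool
# that displays `label` by default shows A and B rather than n0 and n1.
V(g1)$label <- V(g1)$name

out <- file.path(tempdir(), "g1_labelled.graphml")  # hypothetical file name
write_graph(g1, file = out, format = "graphml")
```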

I export this from Gephi as g2.graphml (being sure that layout parameters are not included):

<?xml version="1.0" encoding="UTF-8"?><graphml xmlns="http://graphml.graphdrawing.org/xmlns">
    <key attr.name="label" attr.type="string" for="node" id="label"/>
    <key attr.name="Edge Label" attr.type="string" for="edge" id="edgelabel"/>
    <key attr.name="weight" attr.type="double" for="edge" id="weight"/>
    <key attr.name="name" attr.type="string" for="node" id="v_name"/>
    <graph edgedefault="undirected">
        <node id="n0">
            <data key="label">n0</data>
            <data key="v_name">A</data>
        </node>
        <node id="n1">
            <data key="label">n1</data>
            <data key="v_name">B</data>
        </node>
        <edge id="2744" source="n0" target="n1">
            <data key="weight">1.0</data>
        </edge>
    </graph>
</graphml>

No problem with id here; the only change is addition of the label. Reading it back into igraph:

> g2 <- read_graph(file="g2.graphml", format="graphml")
> summary(g2) 
IGRAPH 881818e UNW- 2 1 -- 
+ attr: label (v/c), name (v/c), id (v/c), Edge Label (e/c), weight
| (e/n), id (e/c)

There is no warning but we now have label and id attributes on the nodes.

Writing the same graph back out and reading it again, without involving Gephi, we now get the warning:

> write_graph(g2, file="g3.graphml", format="graphml")
> read_graph(file="g3.graphml", format="graphml") 
IGRAPH e321c86 UNW- 2 1 -- 
+ attr: label (v/c), name (v/c), id (v/c), Edge Label (e/c), weight
| (e/n), id (e/c)
+ edge from e321c86 (vertex names):
[1] A--B
Warning message:
In read.graph.graphml(file, ...) :
  At vendor/cigraph/src/io/graphml.c:487 : Could not add vertex ids, there is already an 'id' vertex attribute.

Here is g3.graphml

<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
         http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<!-- Created by igraph -->
  <key id="v_label" for="node" attr.name="label" attr.type="string"/>
  <key id="v_name" for="node" attr.name="name" attr.type="string"/>
  <key id="v_id" for="node" attr.name="id" attr.type="string"/>
  <key id="e_Edge Label" for="edge" attr.name="Edge Label" attr.type="string"/>
  <key id="e_weight" for="edge" attr.name="weight" attr.type="double"/>
  <key id="e_id" for="edge" attr.name="id" attr.type="string"/>
  <graph id="G" edgedefault="undirected">
    <node id="n0">
      <data key="v_label">n0</data>
      <data key="v_name">A</data>
      <data key="v_id">n0</data>
    </node>
    <node id="n1">
      <data key="v_label">n1</data>
      <data key="v_name">B</data>
      <data key="v_id">n1</data>
    </node>
    <edge source="n0" target="n1">
      <data key="e_Edge Label"></data>
      <data key="e_weight">1</data>
      <data key="e_id">2744</data>
    </edge>
  </graph>
</graphml>

So now we have both id and v_id. I hope that helps, time to prepare a class!
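Given the round trip above, one possible workaround is to drop the id vertex attribute before writing, so that the exported file carries no GraphML attribute named "id". The sketch below is only that: `write_graphml_without_id` and its `keep_as` argument are hypothetical names, and discarding id is only appropriate when it holds nothing but the identifiers igraph picked up on import. When id is meaningful (as in the chat data), `keep_as` copies the values to another attribute first.

```r
library(igraph)

# Hypothetical helper: write GraphML without the vertex attribute `id`,
# optionally preserving its values under another attribute name first.
write_graphml_without_id <- function(g, file, keep_as = NULL) {
  if ("id" %in% vertex_attr_names(g)) {
    if (!is.null(keep_as)) {
      g <- set_vertex_attr(g, keep_as, value = vertex_attr(g, "id"))
    }
    g <- delete_vertex_attr(g, "id")
  }
  write_graph(g, file = file, format = "graphml")
}

# Usage on the trivial graph from this post
path <- file.path(tempdir(), "roundtrip.graphml")
g <- graph_from_literal(A-B)
write_graph(g, file = path, format = "graphml")
g2 <- read_graph(path, format = "graphml")  # picks up `id` from the node ids
write_graphml_without_id(g2, path)
g3 <- read_graph(path, format = "graphml")  # no "id" key in the file to conflict with
```

For data where id refers to a source database, something like `write_graphml_without_id(TI, path, keep_as = "db_id")` would keep the values under the (made-up) name db_id.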

We cannot really do anything about it without breaking someone’s code. IDs in the GraphML file are supposed to be unique, otherwise it’s not a valid GraphML file. They can be used to refer to the nodes uniquely, so it totally makes sense to import them. On the other hand, igraph does not guarantee that the values of a vertex attribute named id are unique, so we cannot use them directly in the generated GraphML file when exporting, unless we do a uniqueness check, which we could do in theory.

If we really want to get rid of the warning in the roundtripping case, we would need to add a uniqueness check in igraph_write_graph_graphml and use the IDs directly if we find them to be unique. More often than not this will be the case. Then we can fall back to the current ID generation scheme if we find that the IDs in the id vertex attribute are not unique.
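To make the proposed rule concrete, here is a rough sketch of the decision in plain R. It is not igraph's code (the real change would live in the C function named above), and `choose_node_ids` is a hypothetical name; it only illustrates "reuse the ids if they are unique, otherwise generate n0, n1, ...".

```r
# Hypothetical sketch of the proposed rule, not igraph's implementation:
# reuse the `id` values as GraphML node ids when they are present, non-empty
# and unique; otherwise fall back to the generated n0, n1, ... scheme.
choose_node_ids <- function(ids, n) {
  usable <- !is.null(ids) &&
    length(ids) == n &&
    !anyNA(ids) &&
    all(nzchar(as.character(ids))) &&
    anyDuplicated(ids) == 0
  if (usable) as.character(ids) else sprintf("n%d", seq_len(n) - 1L)
}

choose_node_ids(c("177", "178", "179"), 3)  # unique: reused as-is
choose_node_ids(c("a", "a", "b"), 3)        # duplicated: falls back to n0, n1, n2
choose_node_ids(NULL, 2)                    # no id attribute: n0, n1
```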

Personally I don’t have the resources to implement this in the near future, but if you want to do so, feel free to go ahead.
