The difference between Read_Ncol and Read_Edgelist

I use Read_Ncol and Read_Edgelist to read an edge list text file, and I get different results.
For example, this is the text file (tst):
1 10
1 9
1 8
1 2
2 4
2 3
2 1
2 5
2 6
3 9
3 4
3 6
4 3
4 5
4 6
5 4
5 7
6 2
6 3
6 7
6 4
7 5
7 6
7 8
8 9
8 1
8 10
9 10
9 1
10 1
10 9
When I use graph = Graph.Read_Edgelist('/home/yac/Desktop/tst/tst')
Output: 11
When I use graph = Graph.Read_Ncol('/home/yac/Desktop/tst/tst')
This difference affects the result of the algorithm; I do not get the same result.

Read_Edgelist assumes that the file has pairs of 0-based vertex indices. The number of vertices is the largest index + 1.

Read_Ncol uses vertex names, not indices. You can use any string to identify vertices. There is no guarantee about the ordering of vertices in the final graph.
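
To make the difference concrete, here is a minimal pure-Python sketch that mimics how each format interprets the same lines. It does not call igraph. The helper names are hypothetical, and numbering names in order of first appearance is only an assumption for illustration, since no ordering is guaranteed.

```python
def edgelist_vertex_count(text):
    """Vertex count when tokens are read as 0-based indices: largest index + 1."""
    indices = [int(tok) for tok in text.split()]
    return max(indices) + 1


def ncol_name_order(text):
    """Distinct vertex names in order of first appearance (assumed, not guaranteed)."""
    order = []
    seen = set()
    for tok in text.split():
        if tok not in seen:
            seen.add(tok)
            order.append(tok)
    return order


# First six lines of the file from the question.
sample = "1 10\n1 9\n1 8\n1 2\n2 4\n2 3\n"

print(edgelist_vertex_count(sample))  # 11: indices 0..10, index 0 has no edges
print(ncol_name_order(sample))        # ['1', '10', '9', '8', '2', '4', '3']
```

Under these assumptions, the edge list reading yields an extra, isolated vertex 0, while the name-based reading yields only the vertices that appear in the file, with the name 10 landing in the second position rather than the tenth.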

I generated an LFR benchmark with the software package from this site, which is the implementation of the algorithm described in the paper [Benchmark graphs for testing community detection algorithms]. The edge list starts from index 1.
I ran the program and got the benchmarks, but when I read the file with Read_Ncol or Read_Edgelist, I get irrational results with Read_Ncol.
So I used networkx to generate the LFR benchmark, where the edge list starts from index 0. With Read_Edgelist I get rational results, but with Read_Ncol I get irrational results! What is the issue with Read_Ncol?

It’s unclear what you mean by “irrational results”.

As I said above, using the NCOL format does not guarantee any vertex ordering, but you can of course reorder vertices based on their names.
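
As a rough sketch of such a reordering, the hypothetical helper below takes a list of vertex names, as they might come back from `g.vs["name"]`, and computes the new position of each vertex when vertices are sorted by name. It assumes every name parses as an integer.

```python
def permutation_by_numeric_name(names):
    """Return perm such that perm[old_id] is the new ID after sorting by numeric name."""
    # Old vertex IDs listed in the order we want them to appear.
    order = sorted(range(len(names)), key=lambda i: int(names[i]))
    perm = [0] * len(names)
    for new_id, old_id in enumerate(order):
        perm[old_id] = new_id
    return perm


# Hypothetical name order: vertex 0 is named "3", vertex 1 is "1", vertex 2 is "2".
names = ["3", "1", "2"]
print(permutation_by_numeric_name(names))  # [2, 0, 1]
```

A list in this old-to-new form is, as far as I know, what `Graph.permute_vertices` expects, but it is worth checking the direction convention against the documentation for your igraph version.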

The edge list format assumes 0-based indexing. If your file uses 1-based indices, you can simply delete vertex 0 from the result, and then it's the same as if the format used 1-based indexing.

Hi, thank you for your response.
In this code I use the Leiden algorithm to identify community structure, and I use NMI to evaluate the quality of the solution. But when I use Read_Ncol to read the graph, the NMI is 0.08.
This is an irrational result; it should be 1.
When I use Read_Edgelist, the NMI is 1, which is the right result. It is the same result as in the paper.

To understand what is happening, you need to first understand the difference between Read_Edgelist and Read_Ncol.

In igraph, vertices have a consistent ordering. The ID of a vertex is its position in this order, i.e. just an index. To be specific, it’s a 0-based index, as usual in Python. The first vertex has index 0, the second 1, and so on.

Read_Edgelist assumes that the file contains vertex IDs. This means that the ordering of vertices is defined by the file. However, the file you are dealing with uses 1-based indices, not 0-based. The first vertex is referred to as 1, the second as 2, and so on.

Read_Ncol assumes that the file contains arbitrary vertex names, which may be any string, not necessarily numbers. You can have vertices called Alice and Bob. The vertex name 1 will not in general be the first one in your graph.
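
If the ground-truth labels are listed in file order (vertex 1, 2, 3, and so on) while the graph stores its vertices in some other order, comparing the two position by position mixes up the labels, which would be consistent with a low NMI like the one you report. A minimal sketch of aligning the labels by name instead, using a hypothetical name order and made-up community values:

```python
def membership_in_graph_order(names, truth_by_name):
    """Return one community label per vertex, following the graph's own vertex order."""
    return [truth_by_name[name] for name in names]


# Hypothetical vertex order of a graph read by name; with igraph this list
# would come from the "name" vertex attribute.
names = ["1", "10", "9", "8", "2"]
# Made-up ground truth: vertices 1 and 2 in community 0, the rest in community 1.
truth_by_name = {"1": 0, "2": 0, "8": 1, "9": 1, "10": 1}

# Labels in numeric file order, which does NOT match the graph's order.
naive = [truth_by_name[str(k)] for k in sorted(int(n) for n in names)]
aligned = membership_in_graph_order(names, truth_by_name)

print(naive)    # [0, 0, 1, 1, 1]
print(aligned)  # [0, 1, 1, 1, 0]
```

Only the aligned list describes the same vertex at each position as the graph does, so that is the one to compare against the detected communities.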

I assume that the ordering is important for your application as the ground-truth communities are made up of vertices with adjacent indices. Thus, you should use Read_Edgelist. This will produce an additional vertex using index 0 that you don’t want. Simply delete this and you have your graph. Note that after deleting vertex 0, all indices shift down by one, so the old vertex 1 becomes 0, old vertex 2 becomes 1, etc. Take this into account when working with the ground-truth communities.
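
The index shift can be sketched as follows. This is plain Python with a hypothetical helper and made-up community values; it assumes the file labels run from 1 to n without gaps, and that vertex 0 has already been removed from the graph, for example with `graph.delete_vertices(0)`.

```python
def zero_based_membership(truth_by_label):
    """Convert {1-based file label: community} into a list indexed from 0."""
    n = max(truth_by_label)
    # Position i of the result belongs to the vertex labelled i + 1 in the file.
    return [truth_by_label[label] for label in range(1, n + 1)]


# Made-up ground truth keyed by the 1-based labels used in the file.
truth_by_label = {1: 0, 2: 0, 3: 1, 4: 1}
membership = zero_based_membership(truth_by_label)
print(membership)  # [0, 0, 1, 1]

# The vertex labelled 3 in the file is vertex 2 after the deletion.
old_label = 3
print(membership[old_label - 1])  # 1
```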

I hope this is clear.

Yes, thank you so much, it is clear now.