The difference between Read_Ncol and Read_Edgelist

salaheddine_taibi · 1 July 2022 17:17

I use Read_Ncol and Read_Edgelist to read an edge list text file, and I get different results.
For example, this is the text file (tst):
1 10
1 9
1 8
1 2
2 4
2 3
2 1
2 5
2 6
3 9
3 4
3 6
4 3
4 5
4 6
5 4
5 7
6 2
6 3
6 7
6 4
7 5
7 6
7 8
8 9
8 1
8 10
9 10
9 1
10 1
10 9
When I use:
graph = Graph.Read_Edgelist('/home/yac/Desktop/tst/tst')
print(graph.vcount())
Output: 11
When I use:
graph = Graph.Read_Ncol('/home/yac/Desktop/tst/tst')
print(graph.vcount())
Output: 10
This difference affects the result of the algorithm: I do not get the same result with the two readers.

szhorvat · 1 July 2022 17:37

Read_Edgelist assumes that the file has pairs of 0-based vertex indices. The number of vertices is the largest index + 1.

Read_Ncol uses vertex names, not indices. You can use any string to identify vertices. There is no guarantee about the ordering of vertices in the final graph.

salaheddine_taibi · 1 July 2022 21:55

I generated LFR benchmarks with the software package from this site: https://www.santofortunato.net/resources
It implements the algorithm described in the paper [Benchmark graphs for testing community detection algorithms]. The edge list starts from index 1.
I ran the program and got the benchmarks, but when I read the file with Read_Ncol, I get irrational results.
So I used networkx to generate the LFR benchmark instead; there the edge list starts from index 0. When I use Read_Edgelist I get rational results, and when I use Read_Ncol I get irrational results! What is the issue with Read_Ncol?

szhorvat · 2 July 2022 16:31

It’s unclear what you mean by “irrational results”.

As I said above, using the NCOL format does not guarantee any vertex ordering, but you can of course reorder vertices based on their names.

The edgelist format assumes 0-based indexing. You can simply delete vertex 0 from the result, and then it's the same as if it used 1-based indexing.

salaheddine_taibi · 2 July 2022 17:19

Hi, thank you for your response.
In my code I use the Leiden algorithm to identify the community structure, and I use NMI to evaluate the quality of the solution. When I use Read_Ncol to read the graph, the NMI is 0.08.
This is an irrational result; it should be 1.
When I use Read_Edgelist, the NMI is 1, which is the right result… it is the same result as in the paper…

szhorvat · 5 July 2022 09:43

To understand what is happening, you need to first understand the difference between Read_Edgelist and Read_Ncol.

In igraph, vertices have a consistent ordering. The ID of a vertex is its position in this order, i.e. just an index. To be specific, it’s a 0-based index, as usual in Python. The first vertex has index 0, the second 1, and so on.

Read_Edgelist assumes that the file contains vertex IDs. This means that the ordering of vertices is defined by the file. However, the file you are dealing with uses 1-based indices, not 0-based. The first vertex is referred to as 1, the second as 2, and so on.

Read_Ncol assumes that the file contains arbitrary vertex names, which may be any string, not necessarily numbers. You can have vertices called Alice and Bob. The vertex name 1 will not in general be the first one in your graph.

I assume that the ordering is important for your application as the ground-truth communities are made up of vertices with adjacent indices. Thus, you should use Read_Edgelist. This will produce an additional vertex using index 0 that you don’t want. Simply delete this and you have your graph. Note that after deleting vertex 0, all indices shift down by one, so the old vertex 1 becomes 0, old vertex 2 becomes 1, etc. Take this into account when working with the ground-truth communities.

I hope this is clear.

salaheddine_taibi · 7 July 2022 00:14

Yes, thank you so much, it is clear now.
