I am trying to find the largest set of points that are at least a distance of 75 from each other.
I have a TXT file with XY coordinates that the program reads in; pairs of points far enough apart become edges.

#!/usr/bin/python
# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt
import numpy as np
import igraph as ig

dist = 75
points = np.loadtxt("TEST.txt", dtype=int)
Npoints = len(points)
G = ig.Graph()
G.add_vertices(Npoints)
Newlist = []
for i in range(Npoints - 1):
    print(f"{100 * i / Npoints:.1f}%")
    for j in range(i + 1, Npoints):
        if np.hypot(points[i, 0] - points[j, 0], points[i, 1] - points[j, 1]) >= dist:
            # append all pairs whose distance is >= dist to a new list
            Newlist.append([i, j])
G.add_edges(Newlist)
# G now contains an edge between nodes i and j if the distance
# between those two points is >= 75
H = G.maximal_cliques()[0]
# H is the first maximal clique found in G
fig, ax = plt.subplots(1, 1, figsize=(12, 8))
ax.set_aspect('equal')
for i in range(Npoints):
    ax.scatter(points[i, 0], points[i, 1], c=('g' if i in H else 'r'),
               s=(25 if i in H else 16), marker='x')
    if i in H:
        c = plt.Circle((points[i, 0], points[i, 1]), dist, ec='k',
                       lw=1, fill=False)
        ax.add_patch(c)
plt.savefig('example.pdf', bbox_inches='tight')

But the program crashes when I add all the edges.
The TXT file contains around 9,500 XY coordinates, and the program crashes with a fatal error at G.add_edges(Newlist).

Is there some way to make the performance better? Am I doing something wrong?

I've got the python-igraph 0.9.9 wheel (igraph-0.9.9-cp310-cp310-win_amd64.whl) and installed it via pip install.
OS is Windows 10 Pro, Python 3.10.

The TEST.txt file has 9458 lines, each with an XY coordinate pair. The size seems to be part of the problem, because with smaller files the program works fine. I've uploaded the TXT file and made the code more compact.

How big is Newlist after you have constructed it? I haven't even finished constructing Newlist, and it's already up to more than 16 million entries. Are you sure it's only crashing upon calling G.add_edges and not before?

You are probably going to be better off using scipy or sklearn for the construction of the pairwise distances. You are currently building Newlist in pure Python, which most likely takes up quite some memory.

Edit: Indeed, constructing the list is much faster (and presumably a lot more memory efficient) by doing this:

from scipy.spatial import distance_matrix
D = distance_matrix(x=points, y=points)
edges = np.array(np.where(D > 75))

This results in 68,492,582 edges.

You can then easily construct the graph using G = ig.Graph(edges.T).
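One caveat worth noting: np.where on the full symmetric distance matrix returns every pair twice, once as (i, j) and once as (j, i), so the graph ends up with duplicate edges. A sketch of a deduplicated construction that only looks at the upper triangle (using a few toy points standing in for TEST.txt):

```python
import numpy as np
from scipy.spatial import distance_matrix

# Toy stand-in for the points loaded from TEST.txt.
points = np.array([[0, 0], [100, 0], [0, 100], [10, 10]])

D = distance_matrix(points, points)
# Take only the upper triangle (i < j), so each pair appears once
# instead of twice as (i, j) and (j, i).
i, j = np.triu_indices(len(points), k=1)
mask = D[i, j] > 75
edges = np.column_stack((i[mask], j[mask]))
print(len(edges))  # 5 qualifying pairs for these toy points
```

Passing edges.tolist() to ig.Graph (together with n=len(points) so isolated vertices are kept) should roughly halve the edge count compared to the full np.where result.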

Your solution is indeed way faster. Thanks a lot for that. It seems that creating the list was the problem.

Finding the maximal cliques (H = G.maximal_cliques()[0]) still takes a long while; I haven't seen it finish on my TEST.txt file.

My current code is as follows:

#!/usr/bin/python
# -*- coding: utf-8 -*-
import matplotlib.pyplot as plt
import numpy as np
import igraph as ig
from scipy.spatial import distance_matrix

points = np.loadtxt("TEST.txt", dtype=int)
Npoints = len(points)
D = distance_matrix(x=points, y=points)
edges = np.array(np.where(D > 75))
G = ig.Graph(edges.T)
H = G.maximal_cliques()[0]
fig, ax = plt.subplots(1, 1, figsize=(12, 8))
ax.set_aspect('equal')
for i in range(Npoints):
    ax.scatter(points[i, 0], points[i, 1], c=('g' if i in H else 'r'),
               s=(25 if i in H else 16), marker='x')
    if i in H:
        c = plt.Circle((points[i, 0], points[i, 1]), 75, ec='k',
                       lw=1, fill=False)
        ax.add_patch(c)
plt.savefig('example.pdf', bbox_inches='tight')

Is it impossible to find the maximal cliques with 68 million edges on 8 GB of RAM? Or is there even more to tidy up to make the process faster?

Thanks

Edit:

Unfortunately, a fatal error occurs when trying to find the maximal cliques: Fatal error at src/core/vector.c:378 : Assertion Failed: v!=0

Difficult to say. The memory is probably not the problem directly, but I'm not sure. The overall runtime might be a problem in this case, since this is quite a dense graph.

This is something that should not happen. Just to make sure, can you double check you are on python-igraph version 0.9.9? Because vector.c:378 seems to refer to this line of code:

which shouldn't trigger this assertion error.

Possibly, it is an out-of-memory error, in which case a more informative error should be thrown, but it is not immediately clear.

Do you think upgrading to a higher amount of RAM (like adding another 16 GB) will prevent the memory failure? I do think it is a memory issue, because the error happens when memory usage is peaking, and my screen freezes when the error occurs. I don't know whether there is a rule of thumb for the amount of RAM necessary in relation to the number of edges in a graph or something…
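For a rough, non-authoritative estimate (not an official igraph rule of thumb): the NumPy edge array alone for the 68,492,582 edges above is already on the order of a gigabyte, and igraph keeps its own copy of the edge list on top of that, so exhausting 8 GB is plausible:

```python
import numpy as np

# Back-of-envelope: memory for the edge array alone, before igraph
# copies it into its own internal structures.
n_edges = 68_492_582                              # from the np.where result above
bytes_per_edge = 2 * np.dtype(np.int64).itemsize  # two int64 endpoints
total_gb = n_edges * bytes_per_edge / 1e9
print(f"~{total_gb:.2f} GB")  # ~1.10 GB for the NumPy array alone
```

This only accounts for the raw edge array; the actual peak usage of maximal_cliques() on top of the graph itself is harder to predict.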

Can you provide a complete minimal example (without relying on any datafiles) that reproduces this issue?

Can you double-check that the error message is really this, character for character? Did you change anything in it? It could be:

Fatal error at src/core/vector.pmt:378 : Assertion Failed: v != 0

Notice that this is different: vector.pmt and v != 0 has spaces. Then it would come from here:

Or maybe MSVC handles filenames differently when #include is used…

This could be the result of a combination of an out-of-memory error and an error handling bug in igraph. It's hard to tell. If it is, that bug might have already gotten fixed on the develop branch during the transition from vector_ptr to vector_list.

However, finding maximal cliques in such a huge graph is likely to be extremely slow… I'm not sure that it would be feasible even with more memory.

Note that maximal_cliques() finds maximal cliques, not largest cliques. It does not seem to make sense to find only one.

@vtraag It would still be nice to be able to find only a single clique or only a few. This functionality already exists in the C library, in the form of the "callback" functions, but not in Python. It would be nice to expose this in Python as well.

I think I need to give some background information on the case that I am using this for.
I want to maximise the number of points with a certain diameter in a field.
The higher the number of green coordinates, the better the result is for me.

First of all, let me explain the difference between maximal cliques and largest cliques. Maximal cliques are cliques that cannot be extended further, but they can still be much smaller than the largest possible cliques in the graph. Largest cliques are cliques of size k such that there exists no clique of size k+1 or larger in the graph. Every largest clique is maximal, but not all maximal cliques are largest (in fact, most of them aren't).

To illustrate the difference, consider the union of a full graph of size 5 and 1 million additional isolated edges. This graph has 1 million plus one maximal cliques (the clique of size 5 is a maximal clique, and each additional isolated edge is a maximal clique of size 2), but it has only one largest clique (the clique of size 5).

I think that due to the triangle inequality there will be lots and lots of maximal cliques in your particular graph in 2D, most of which would be irrelevant to the problem, but I'm not sure if I understand the original problem correctly. Let me try to rephrase the problem a bit. You are given N points in two-dimensional space and a radius r. You need to select as few points as possible such that for every point there is at least one selected point within radius r. Is that correct, is this what you need?