Graph using high dimensional binary data

Hi,

I was wondering how to use igraph when I have a binary data matrix with many more columns than rows: 425 rows x 2998 columns. Each row of the matrix corresponds to a patient, and the columns correspond to the genetic mutations (SNPs) that were identified in that patient. For example:

                       SNP1   SNP2  SNP3  SNP4  .........
patient1                 0      1     1     1
patient2                 1      0     1     1
patient3                 0      1     0     1

A value of 1 means that the patient carries that specific mutation. For example, patient1 has the mutations SNP2, SNP3 and SNP4, and so on.

My main question is: how can I convert this information into a form that can be used to build a graph?

My suggestion:

1 - Obtain a correlation-like matrix from the matrix above. Because my data are binary, I would rather not use Pearson correlation; instead I was thinking of using the Jaccard metric, i.e. computing a Jaccard similarity matrix, which for binary data would play the role that the correlation matrix plays for continuous data.

2 - Create the adjacency matrix from the similarity matrix, using one of the igraph functions.

3 - Build the graph with igraph.
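A minimal sketch of steps 1 and 2, written in plain Python rather than R so that it has no dependencies. It computes the Jaccard similarity between every pair of patients (rows), J(A, B) = |A ∩ B| / |A ∪ B|, where A and B are the sets of SNPs each patient carries, and then keeps an edge between two patients when their similarity reaches a threshold. The function names and the threshold of 0.5 are hypothetical choices for illustration. One reason Jaccard suits this kind of data: it ignores positions where both patients have a 0, so if the mutation matrix is sparse, shared absences do not dominate the similarity.

```python
from itertools import combinations

# Rows = patients, columns = SNPs (the 3 x 4 excerpt shown in the post).
patients = ["patient1", "patient2", "patient3"]
X = [
    [0, 1, 1, 1],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
]


def jaccard(row_a, row_b):
    """Jaccard similarity of two binary vectors: |A and B| / |A or B|.

    Positions where both vectors are 0 are ignored. Returns 0.0 when
    neither vector has any 1s (an assumption; the index is undefined there).
    """
    both = sum(1 for a, b in zip(row_a, row_b) if a and b)
    either = sum(1 for a, b in zip(row_a, row_b) if a or b)
    return both / either if either else 0.0


def similarity_matrix(rows):
    """Symmetric patient-by-patient Jaccard matrix with 1.0 on the diagonal."""
    n = len(rows)
    S = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        S[i][j] = S[j][i] = jaccard(rows[i], rows[j])
    return S


def adjacency_from_similarity(S, threshold):
    """0/1 adjacency: edge when similarity >= threshold; no self-loops."""
    n = len(S)
    return [
        [1 if i != j and S[i][j] >= threshold else 0 for j in range(n)]
        for i in range(n)
    ]


S = similarity_matrix(X)
A = adjacency_from_similarity(S, threshold=0.5)  # hypothetical cut-off
for name, row in zip(patients, S):
    print(name, [round(v, 3) for v in row])
print(A)
```

With the three example patients this gives J(patient1, patient2) = 2/4 = 0.5, J(patient1, patient3) = 2/3 and J(patient2, patient3) = 1/4, so only patient2 and patient3 end up without an edge. In R, a matrix like `A` (or the similarity matrix itself, if you prefer weighted edges to a hard threshold) is what you would pass to igraph's `graph_from_adjacency_matrix`.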

Is there another way to do this?

Note: my main goal is to cluster the patients once the graph is built.
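The simplest graph-based clustering is to treat each connected component of the thresholded patient graph as one cluster. Below is a minimal sketch in plain Python (breadth-first search over a 0/1 adjacency matrix), run on a small hypothetical 5-patient similarity matrix that is made up for illustration, not taken from real data. On a dense weighted graph, community-detection methods such as igraph's `cluster_louvain` or `cluster_walktrap` are usually more informative, because with connected components a single link above the threshold is enough to merge two groups.

```python
from collections import deque


def connected_components(adj):
    """Label each node with a component id via breadth-first search.

    adj is a symmetric 0/1 adjacency matrix (list of lists).
    Returns a list where entry i is the cluster label of node i.
    """
    n = len(adj)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue  # already assigned to a component
        labels[start] = current
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if adj[u][v] and labels[v] == -1:
                    labels[v] = current
                    queue.append(v)
        current += 1
    return labels


# Hypothetical Jaccard similarities for 5 patients (illustration only).
S = [
    [1.0, 0.8, 0.6, 0.1, 0.0],
    [0.8, 1.0, 0.4, 0.0, 0.1],
    [0.6, 0.4, 1.0, 0.2, 0.0],
    [0.1, 0.0, 0.2, 1.0, 0.7],
    [0.0, 0.1, 0.0, 0.7, 1.0],
]
threshold = 0.5  # hypothetical cut-off
adj = [[1 if i != j and S[i][j] >= threshold else 0 for j in range(len(S))]
       for i in range(len(S))]

labels = connected_components(adj)
print(labels)  # [0, 0, 0, 1, 1]
```

Here the edges kept are 0-1, 0-2 and 3-4, so patients 0, 1 and 2 form one cluster and patients 3 and 4 another, even though the 1-2 similarity (0.4) is below the cut-off.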

Thank you

You did not explain what you are trying to do, other than the vague “how to use igraph”.

Thank you for your comments, szhorvat. I have clarified my question.

If you are asking how to construct a graph of patients, you can indeed calculate a similarity measure between the rows to obtain a matrix (e.g. with the dist function from the proxy package) and then use graph_from_adjacency_matrix to create a graph from it.
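One detail worth keeping in mind: `dist` returns distances, whereas `graph_from_adjacency_matrix` treats nonzero entries as edges (and, with `weighted = TRUE`, as edge weights), so larger values should mean more similar patients. As I understand it, proxy's default conversion for a similarity measure such as Jaccard is d = 1 - s, so s = 1 - d turns the distances back into weights. The plain-Python sketch below mimics that step: it turns a symmetric Jaccard distance matrix into an undirected weighted edge list, skipping the diagonal and zero-similarity pairs, which is roughly what `graph_from_adjacency_matrix(m, mode = "undirected", weighted = TRUE, diag = FALSE)` does with a symmetric similarity matrix `m`. The function name is hypothetical; the distances are 1 - Jaccard for the three example patients in the question.

```python
def edges_from_distance(D, names):
    """Undirected weighted edge list from a symmetric Jaccard distance matrix.

    weight = 1 - distance (the Jaccard similarity); the diagonal and pairs
    with zero similarity (distance 1) are skipped, so they produce no edge.
    """
    edges = []
    n = len(D)
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only: undirected graph
            weight = 1.0 - D[i][j]
            if weight > 0:
                edges.append((names[i], names[j], weight))
    return edges


# Jaccard distances (1 - similarity) for the three example patients.
names = ["patient1", "patient2", "patient3"]
D = [
    [0.0, 0.5, 1 / 3],
    [0.5, 0.0, 0.75],
    [1 / 3, 0.75, 0.0],
]

for a, b, w in edges_from_distance(D, names):
    print(a, b, round(w, 3))
```

This prints one line per patient pair with weights 0.5, 0.667 and 0.25. Keeping the weights, rather than thresholding them away, lets weighted community-detection algorithms use the full similarity information.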
