# Graph using high-dimensional binary data

Hi,

I was wondering how to use igraph when I have a matrix of binary data with many more columns than rows: 425 rows x 2998 columns. Each row of the matrix corresponds to a patient, and the columns correspond to the genetic mutations that were identified in that patient. So:

```
          SNP1  SNP2  SNP3  SNP4  ...
patient1     0     1     1     1
patient2     1     0     1     1
patient3     0     1     0     1
```

Basically, a value of 1 means that the patient has that specific mutation; for example, patient1 has the mutations SNP2, SNP3, SNP4, and so on.
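Because a 1 only marks presence, each row can equally be read as the *set* of mutations a patient carries, which is the view that set-based measures such as Jaccard work on. A minimal sketch on the toy rows above, in plain Python; the helper name `mutation_set` is hypothetical:

```python
# Toy version of the patient x SNP matrix above (the real one is 425 x 2998).
snps = ["SNP1", "SNP2", "SNP3", "SNP4"]
patients = {
    "patient1": [0, 1, 1, 1],
    "patient2": [1, 0, 1, 1],
    "patient3": [0, 1, 0, 1],
}

def mutation_set(row, names):
    """Return the names of the SNPs coded as 1 in one binary row."""
    return {name for name, value in zip(names, row) if value == 1}

# Hypothetical helper applied to every patient: binary row -> set of SNP names.
mutations = {p: mutation_set(row, snps) for p, row in patients.items()}
print(sorted(mutations["patient1"]))  # ['SNP2', 'SNP3', 'SNP4']
```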

My main question is: how can I convert this information so that it can be used in a graph?

My suggestion:

1 - Obtain a correlation-like matrix from the matrix above. Instead of Pearson correlation, which I don't think suits binary data, I was thinking of using the Jaccard metric, i.e. building a similarity matrix with Jaccard, since for binary data a similarity matrix can play the role of the correlation matrix.

2 - Create the adjacency matrix from the similarity matrix, using one of the functions in igraph.

3 - Build the graph with igraph.
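The three steps can be sketched end to end in plain Python (standard library only, not igraph), assuming the similarity is computed between patients (rows). The cutoff `threshold=0.3` and all function names are hypothetical; the cutoff only illustrates that a dense similarity matrix is usually sparsified before it becomes a graph.

```python
# Toy binary matrix: rows are patients, columns are SNPs (real data: 425 x 2998).
labels = ["patient1", "patient2", "patient3"]
rows = [
    [0, 1, 1, 1],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
]

def jaccard(a, b):
    """Jaccard similarity: shared 1s divided by positions where either row has a 1."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0  # assumption: two all-zero rows get 0

def similarity_matrix(rows):
    """Step 1: patient x patient Jaccard similarity matrix."""
    return [[jaccard(a, b) for b in rows] for a in rows]

def adjacency(sim, threshold):
    """Step 2: keep weights >= threshold and drop self-loops on the diagonal."""
    n = len(sim)
    return [[sim[i][j] if i != j and sim[i][j] >= threshold else 0.0
             for j in range(n)] for i in range(n)]

def edges(adj, labels):
    """Step 3: undirected weighted edge list (upper triangle only)."""
    n = len(adj)
    return [(labels[i], labels[j], adj[i][j])
            for i in range(n) for j in range(i + 1, n) if adj[i][j] > 0]

sim = similarity_matrix(rows)
adj = adjacency(sim, threshold=0.3)  # hypothetical cutoff; tune on real data
graph_edges = edges(adj, labels)
for u, v, w in graph_edges:
    print(u, v, round(w, 3))
```

In R, step 1 should roughly correspond to `proxy::simil(m, method = "Jaccard")` and step 3 to `igraph::graph_from_adjacency_matrix(adj, mode = "undirected", weighted = TRUE, diag = FALSE)`; the Python version is only meant to make each step explicit.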

Is there another way to do this?

Note: my main objective is to perform clustering once the graph is built.
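The crudest form of clustering on such a graph is to take its connected components after thresholding. A minimal sketch in plain Python, assuming a hypothetical edge list of patient pairs that passed a similarity cutoff. This is a swapped-in, deliberately simple technique; on real data one would more likely run a community-detection algorithm such as igraph's `cluster_louvain` on the weighted graph.

```python
from collections import defaultdict, deque

def connected_components(nodes, edges):
    """Group nodes into connected components of an undirected graph (BFS)."""
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(sorted(component))
    return components

# Hypothetical thresholded patient graph (edges = pairs above a similarity cutoff).
nodes = ["p1", "p2", "p3", "p4", "p5", "p6"]
edges = [("p1", "p2"), ("p2", "p3"), ("p4", "p5")]
print(connected_components(nodes, edges))
# [['p1', 'p2', 'p3'], ['p4', 'p5'], ['p6']]
```

Components are sensitive to the cutoff (one spurious edge merges two groups), which is why weighted community detection is usually preferred for the actual analysis.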

Thank you

You did not explain what you are trying to do, other than the vague “how to use igraph”.

If you are asking how to construct a graph of patients, you can indeed calculate a similarity measure between the rows to obtain a patient x patient matrix (e.g. with the `dist` function from the `proxy` package), then use `graph_from_adjacency_matrix` to create a graph from that matrix.
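One detail worth checking: a `dist`-style function returns *distances* (for Jaccard, 1 minus the similarity), so similar patients get small values, whereas weighted edges in an adjacency matrix are normally read as connection strength. A minimal sketch of the conversion in plain Python, assuming distances in [0, 1]; the function names are hypothetical. In R the analogous step would be `1 - as.matrix(d)` with the diagonal set to 0, or computing similarities directly with `proxy::simil`.

```python
# Toy binary rows (patients x SNPs), same layout as the matrix in the question.
rows = [[0, 1, 1, 1], [1, 0, 1, 1], [0, 1, 0, 1]]

def jaccard_distance(a, b):
    """Jaccard distance = 1 - (shared 1s / positions where either row has a 1)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    similarity = both / either if either else 0.0  # assumption for all-zero rows
    return 1.0 - similarity

def distances_to_weights(dist):
    """Turn distances in [0, 1] into similarity weights and remove self-loops."""
    n = len(dist)
    return [[0.0 if i == j else 1.0 - dist[i][j] for j in range(n)]
            for i in range(n)]

dist = [[jaccard_distance(a, b) for b in rows] for a in rows]
weights = distances_to_weights(dist)
print([[round(w, 3) for w in row] for row in weights])
# [[0.0, 0.5, 0.667], [0.5, 0.0, 0.25], [0.667, 0.25, 0.0]]
```

Passing the raw distances as weights would make the most dissimilar patients the most strongly connected, which inverts the meaning for any downstream clustering.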