Hi,
I was wondering how to use igraph when I have a matrix of binary data with many more columns than rows: 425 rows x 2998 columns. Each row of the matrix corresponds to a patient, and each column corresponds to a genetic mutation (SNP) that may have been identified in that patient. For example:
          SNP1  SNP2  SNP3  SNP4  ...
patient1     0     1     1     1
patient2     1     0     1     1
patient3     0     1     0     1
A value of 1 means that the patient has that specific mutation. For example, patient1 has the mutations SNP2, SNP3 and SNP4, and so on.
My main question is: how can I convert this information so that it can be used to build a graph?
My suggestion:
1 - Compute a similarity matrix from the matrix above. I cannot use the Pearson correlation because my data are binary, so I was thinking of using the Jaccard index instead. For binary data, the Jaccard similarity matrix would then play the role that the correlation matrix normally plays.
2 - Create the adjacency matrix from the similarity matrix, using one of the functions in igraph.
3 - Build the graph with igraph.
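A minimal sketch of the three steps above, assuming Python with NumPy and the python-igraph package. Several things here are assumptions for illustration only: the similarity is computed between patients (rows) rather than between SNPs, the helper names `jaccard_similarity` and `similarity_to_adjacency` are hypothetical, and the cutoff of 0.4 is an arbitrary value, not a recommendation.

```python
import numpy as np

try:
    import igraph as ig  # python-igraph, assumed to be installed
except ImportError:
    ig = None  # steps 1 and 2 below still run without igraph


def jaccard_similarity(X):
    """Pairwise Jaccard similarity between the rows of a binary matrix."""
    X = np.asarray(X, dtype=np.int64)
    intersection = X @ X.T                     # shared mutations per pair
    counts = X.sum(axis=1)                     # mutations per patient
    union = counts[:, None] + counts[None, :] - intersection
    sim = np.zeros(intersection.shape, dtype=float)
    # Pairs with an empty union (two all-zero rows) keep similarity 0.
    np.divide(intersection, union, out=sim, where=union > 0)
    return sim


def similarity_to_adjacency(sim, threshold):
    """Keep similarities >= threshold as edge weights; drop self-loops."""
    adj = np.where(sim >= threshold, sim, 0.0)
    np.fill_diagonal(adj, 0.0)
    return adj


# The three example patients from the post.
patients = ["patient1", "patient2", "patient3"]
X = np.array([
    [0, 1, 1, 1],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
])

sim = jaccard_similarity(X)                       # step 1
adj = similarity_to_adjacency(sim, threshold=0.4)  # step 2 (0.4 is arbitrary)

g = None
if ig is not None:                                # step 3
    # Nonzero entries of the symmetric matrix become weighted edges.
    g = ig.Graph.Weighted_Adjacency(adj.tolist(), mode="undirected")
    g.vs["name"] = patients
```

On the example data, patient1 and patient2 share 2 of 4 distinct mutations (similarity 0.5), while patient2 and patient3 share 1 of 4 (0.25), so that pair gets no edge at the 0.4 cutoff. The threshold strongly affects how dense the graph is, so it would need to be chosen with the real 425 x 2998 data in mind.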
Is there another way to do this?
Note: my main objective is to perform clustering once the graph is built.
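For the clustering step, one possible option (an assumption on my part, not the only choice) is the multilevel (Louvain) community detection available in python-igraph as `community_multilevel`. The sketch below uses a small made-up weighted adjacency matrix for six hypothetical patients, not real data.

```python
try:
    import igraph as ig  # python-igraph, assumed to be installed
except ImportError:
    ig = None

# Hypothetical weighted adjacency matrix: two tightly connected groups of
# three patients, joined by a single weak edge (patient3 - patient4).
names = ["patient1", "patient2", "patient3",
         "patient4", "patient5", "patient6"]
s, w = 0.9, 0.1  # illustrative strong and weak similarity values
adj = [
    [0.0, s,   s,   0.0, 0.0, 0.0],
    [s,   0.0, s,   0.0, 0.0, 0.0],
    [s,   s,   0.0, w,   0.0, 0.0],
    [0.0, 0.0, w,   0.0, s,   s],
    [0.0, 0.0, 0.0, s,   0.0, s],
    [0.0, 0.0, 0.0, s,   s,   0.0],
]

membership = None
if ig is not None:
    g = ig.Graph.Weighted_Adjacency(adj, mode="undirected")
    g.vs["name"] = names
    # Multilevel (Louvain) modularity optimisation using the edge weights.
    clusters = g.community_multilevel(weights="weight")
    membership = clusters.membership  # one cluster label per patient
    for name, label in zip(names, membership):
        print(name, "-> cluster", label)
```

Other community detection methods in igraph could be substituted for `community_multilevel` here; which one suits the real data is an open question.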
Thank you