My colleagues and I have sequenced cyanobacteria strains from numerous locations in a single system to understand how certain communities/locations may be connected. I am using R and this is my first time ever using igraph or doing any network analysis.
I would like to be able to group the samples by station name. I am having several problems with this. First, if I try replacing the sample name (taxa1 and taxa2 in code) with the station name (site1 and site2) in the csv file then it does not plot the network analysis correctly. This would only get me part way to my solution anyway. So I think I need to group by the unique sample identifiers (taxa1 and taxa2). I have over 15 groups…I’d like all of the Stockton groups to have a green color and then I would like Discovery Bay, Indian Slough, Old River @ Kings, Rancho del Rio all a purple color scheme (each site slightly different) hue. Then the remaining sites a different color. Then I want a legend with the names of sites and associated color of the node.
Is this possible? Is there some way to differentiate this many groups in color.vertex?
At the bottom of this question are two figures - one is the output from this code and the 2nd shows me manually manipulating the station names. Rainbow array was used for the second figure as I was trying different things with the code – I cannot show the rainbow array figure since I’m a new user. But I could put it in the comments if that would be helpful for someone to answer my question.
Example line of code from my csv file. Note that site 1 and site 2 are often different, but these two taxa samples are from the same location. I am not sure how to upload more of the data so just showing this line from the csv file as an example.
taxa1 taxa2 site1 site2 rho p.value rho_cut qval
WB1499 WB1500 Discovery Bay Discovery Bay 0.913176451 8.26E-15
0.75 0
####this first set of code was used to create the edges
dataset <- read.csv("ITS_edge_list_name.csv")
co_occur_pairs<-function(dataset){
final.results<-data.frame()
rhos<-c(-.75,-.5,.5,.75)
trts<-as.vector(unique(dataset$trt))
#trts
for(t in 1:length(trts)){
#t<-1
dataset_trt<-subset(dataset, trt==trts[t])
dataset_trt_no0<-subset(dataset_trt, ab1 > 0 & ab2 > 0)
dataset_trt_no0$pairs<-paste(dataset_trt_no0$taxa1,dataset_trt_no0$taxa2)
for(r in 1:4){
#r<-5
if(rhos[r] < 0){temp<-subset(dataset_trt_no0, rho <= rhos[r])}
if(rhos[r] > 0){temp<-subset(dataset_trt_no0, rho >= rhos[r])}
if(dim(temp)[1]>1){
temp.graph<-simplify(graph.edgelist(as.matrix(temp[,c(2,3)]),directed=FALSE))
edge_list<-data.frame(get.edgelist(temp.graph,names=TRUE))
edge_list$pairs<-paste(edge_list$X1,edge_list$X2)
edge_list_pvals<-merge(edge_list,dataset_trt_no0,by="pairs",sort=FALSE )
edge_list_pvals$rho_cut<-rhos[r]
edge_list_pvals$trt<-trts[t]
edge_list_pvals$qval<-fdrtool(edge_list_pvals$p.value, statistic="pvalue",plot=FALSE,verbose=FALSE)$qval
as.matrix(names(edge_list_pvals))
final.results<-rbind(final.results,edge_list_pvals[,-c(2:3)]) }
}
print(t/length(trts))
}
return(final.results)
}
####from the CSV file created in the above step I used igraph for the network analysis
library(igraph)
data <- read.csv("ITS_edge_list.csv")
grays<-gray.colors(3,start=0,end=.5)
graph1<-simplify(graph.edgelist(as.matrix(subset(data, rho > 0.75 & p.value < 0.05)
[1:2]),directed=FALSE))
plot.igraph(graph1, layout = layout.kamada.kawai, edge.arrow.size=0.75,
edge.curved=F, edge.label.color= NA, edge.label.font=5, vertex.shape="circle",
vertex.size=3, edge.color="black", vertex.color=rainbow(25), asp=0.5, main="Network
Co-occurrence All ITS Samples (rho > 0.75, p-value < 0.05)")
This results in the following figure
Here is the same figure, but I edited (in adobe) the station names that would be associated with each node - not allowed to show this since I am a new user (only can show one figure)