Grouping vertex colors by station - over 15 stations

My colleagues and I have sequenced cyanobacteria strains from numerous locations in a single system to understand how certain communities/locations may be connected. I am using R and this is my first time ever using igraph or doing any network analysis.

I would like to be able to group the samples by station name. I am having several problems with this. First, if I try replacing the sample name (taxa1 and taxa2 in code) with the station name (site1 and site2) in the csv file then it does not plot the network analysis correctly. This would only get me part way to my solution anyway. So I think I need to group by the unique sample identifiers (taxa1 and taxa2). I have over 15 groups…I’d like all of the Stockton groups to have a green color and then I would like Discovery Bay, Indian Slough, Old River @ Kings, Rancho del Rio all a purple color scheme (each site slightly different) hue. Then the remaining sites a different color. Then I want a legend with the names of sites and associated color of the node.

Is this possible? Is there some way to differentiate this many groups in color.vertex?

At the bottom of this question are two figures - one is the output from this code and the 2nd shows me manually manipulating the station names. Rainbow array was used for the second figure as I was trying different things with the code – I cannot show the rainbow array figure since I’m a new user. But I could put it in the comments if that would be helpful for someone to answer my question.

Example line of code from my csv file. Note that site 1 and site 2 are often different, but these two taxa samples are from the same location. I am not sure how to upload more of the data so just showing this line from the csv file as an example.

 taxa1       taxa2         site1                   site2                 rho                      p.value       rho_cut    qval
 WB1499   WB1500  Discovery Bay   Discovery Bay  0.913176451    8.26E-15

0.75 0

####this first set of code was used to create the edges

dataset <- read.csv("ITS_edge_list_name.csv")



 for(t in 1:length(trts)){

  dataset_trt<-subset(dataset, trt==trts[t])
  dataset_trt_no0<-subset(dataset_trt, ab1 > 0 & ab2 > 0)


 for(r in 1:4){
  if(rhos[r] < 0){temp<-subset(dataset_trt_no0, rho <= rhos[r])}
  if(rhos[r] > 0){temp<-subset(dataset_trt_no0, rho >= rhos[r])}
    edge_list_pvals<-merge(edge_list,dataset_trt_no0,by="pairs",sort=FALSE  )
    edge_list_pvals$qval<-fdrtool(edge_list_pvals$p.value, statistic="pvalue",plot=FALSE,verbose=FALSE)$qval
    final.results<-rbind(final.results,edge_list_pvals[,-c(2:3)]) }

####from the CSV file created in the above step I used igraph for the network analysis

data <- read.csv("ITS_edge_list.csv")

 graph1<-simplify(graph.edgelist(as.matrix(subset(data, rho > 0.75 & p.value < 0.05) 

plot.igraph(graph1, layout =  layout.kamada.kawai,  edge.arrow.size=0.75,  
edge.curved=F, edge.label.color= NA, edge.label.font=5, vertex.shape="circle", 
vertex.size=3, edge.color="black", vertex.color=rainbow(25), asp=0.5, main="Network 
Co-occurrence All ITS Samples (rho > 0.75, p-value < 0.05)")

This results in the following figure

Here is the same figure, but I edited (in adobe) the station names that would be associated with each node - not allowed to show this since I am a new user (only can show one figure)

Trying to post the rainbow array that I mentioned in my post here.

First, if I try replacing the sample name (taxa1 and taxa2 in code) with the station name (site1 and site2) in the csv file then it does not plot the network analysis correctly.

Can you show us the incorrect result? That seems like something that should be fixed first.