I’m trying to create a sociogram of 30 individuals (top row) based on which censuses they were present at (first column).
The majority of the observations in this dataset are presences (a 1 indicates presence, a 0 indicates absence).
It essentially looks like this:
1 1 0
1 0 1
1 1 1
Any help would be great.
Could you elaborate a bit more on the exact file format your data is in? If possible, could you post an (anonymized) version of your dataset?
In what language would you like to work? Python, R or Mathematica?
R (in RStudio). The data is essentially a matrix with individuals 1-30 along the top row and 21 censuses down the first column.
For each census, attendance was recorded for every individual: a 1 if present during that census, a 0 if absent.
Please let me know if that is helpful. Thanks!
If I understand correctly, you are trying to make a bipartite graph in which the first row holds the names or IDs of the persons (each becomes a vertex) and the first column holds the names or IDs of the censuses (each also becomes a vertex). Edges (undirected is sufficient, as direction is implied by the bipartition) then run only between vertices in the Person group and vertices in the Census group.
If you load the ‘tidyverse’ package, which includes the readr and tibble packages, you can read the data into a table using readr::read_delim or one of its specializations (read_csv etc.).
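A minimal sketch of that reading step, assuming the data is comma-separated with a header row. The column names (census, P1, P2, P3), the census labels (C1 to C3) and the file name mentioned in the comment are hypothetical stand-ins, since the real file was not posted:

```r
library(readr)

# Hypothetical stand-in for the real file: 3 censuses x 3 persons,
# using the same 0/1 values as the small example in the question.
csv_text <- "census,P1,P2,P3
C1,1,1,0
C2,1,0,1
C3,1,1,1"

# I() tells readr (version 2.0 or later) to treat the string as literal data.
# For a real file, pass its path instead, e.g. read_csv("attendance.csv")
# (a made-up file name).
attendance <- read_csv(I(csv_text), show_col_types = FALSE)

person_ids <- names(attendance)[-1]  # header row, minus the census column
census_ids <- attendance$census      # values of the first column
```

If the file is tab- or space-delimited, read_delim with a suitable delim argument would take the place of read_csv here.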
Then you can extract the column names (the header row) to be the person vertex names and the values in the first column to be the census vertex names.
Then transform the matrix into an edge list with one row for each ‘1’ in the matrix, holding the census in the first column and the person in the second. This part will take a little thought. I’ve done it with data that was structured differently, but found the ‘tidyr’ functions to be very useful. (Also ‘dplyr’ if joins are needed. Sorry I don’t have time to think through your particular case.)
Once you have this edge list, load the igraph package and call graph_from_data_frame on it. A vertex attribute data frame is optional, but if you supply one, its first column must hold the vertex names and it must include every vertex mentioned in the edges. I suggest you put a logical variable ‘type’ on the vertices and make it TRUE or FALSE depending on whether the vertex is a person or a census.
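As a rough illustration of that step, here is a sketch with a small hand-written edge list. The IDs (C1 to C3, P1 to P3) are hypothetical; note that person and census IDs must not collide, so if both are plain numbers in the real data, a prefix would need to be added first:

```r
library(igraph)

# Hypothetical edge list: one row per (census, person) presence.
edges <- data.frame(
  census = c("C1", "C1", "C2", "C2", "C3", "C3", "C3"),
  person = c("P1", "P2", "P1", "P3", "P1", "P2", "P3")
)

# Vertex table: the first column holds the vertex names used in `edges`.
vertices <- data.frame(name = c(unique(edges$census), unique(edges$person)))

# Logical 'type' attribute: TRUE for persons, FALSE for censuses.
vertices$type <- vertices$name %in% edges$person

# Build the undirected bipartite graph.
g <- graph_from_data_frame(edges, directed = FALSE, vertices = vertices)
```

Printing g, or calling V(g)$type, is a quick way to check that every vertex ended up in the intended group.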
Once the bipartite graph is made, if you need direct links between persons who participated in a shared census, use igraph’s bipartite_projection, which returns a list of two projection graphs, one for each value of the type attribute.
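A small sketch of the projection, again on made-up IDs (C1 to C3, P1 to P3) rather than the real data. It assumes the convention that type is TRUE for persons, in which case the person-to-person graph is the second element of the result:

```r
library(igraph)

# Hypothetical edge list: one row per (census, person) presence.
edges <- data.frame(
  census = c("C1", "C1", "C2", "C2", "C3", "C3", "C3"),
  person = c("P1", "P2", "P1", "P3", "P1", "P2", "P3")
)

g <- graph_from_data_frame(edges, directed = FALSE)
V(g)$type <- V(g)$name %in% edges$person  # TRUE = person, FALSE = census

# proj1 contains the type == FALSE vertices (censuses),
# proj2 contains the type == TRUE vertices (persons).
proj <- bipartite_projection(g)
people <- proj$proj2

# With the default multiplicity = TRUE, the edge attribute 'weight'
# counts how many censuses each pair of persons shared.
shared <- E(people)$weight
```

Something like plot(people, edge.width = E(people)$weight) would then draw the sociogram with thicker lines for pairs that co-occurred more often.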
P.S. The “Multiple observations per row” use case of tidyr::pivot_longer is probably the one you want: make a separate row for each census+person combination, then filter out those with 0.
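A minimal sketch of that reshaping, assuming the table has a first column named census and one 0/1 column per person. All names here (census, P1 to P3, present) are hypothetical, not taken from the real dataset:

```r
library(tibble)
library(tidyr)
library(dplyr)

# Hypothetical attendance table with the same values as the example above.
attendance <- tribble(
  ~census, ~P1, ~P2, ~P3,
  "C1",      1,   1,   0,
  "C2",      1,   0,   1,
  "C3",      1,   1,   1
)

edges <- attendance %>%
  # One row per census+person combination.
  pivot_longer(cols = -census, names_to = "person", values_to = "present") %>%
  # Keep only the presences.
  filter(present == 1) %>%
  # Census in the first column, person in the second.
  select(census, person)
```

The resulting two-column table is in the shape that graph_from_data_frame expects for its edge list.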
Thank you, very informative. I will check out transforming the matrix via tidyverse!