Hi, I am Khyati and this is my journey exploring graphs, maps and networks!
For one of the assignments we had to plot out names on graphs using the baby names data set.This Data is about Different Baby Names with their Gender, Name, Count and Overall Rank in proportion to the overall data.
glimpse(babynames) # dplyr
## Rows: 1,924,665
## Columns: 5
## $ year <dbl> 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880, 1880,…
## $ sex <chr> "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", "F", …
## $ name <chr> "Mary", "Anna", "Emma", "Elizabeth", "Minnie", "Margaret", "Ida",…
## $ n <int> 7065, 2604, 2003, 1939, 1746, 1578, 1472, 1414, 1320, 1288, 1258,…
## $ prop <dbl> 0.07238359, 0.02667896, 0.02052149, 0.01986579, 0.01788843, 0.016…
head(babynames) # base R
## # A tibble: 6 × 5
## year sex name n prop
## <dbl> <chr> <chr> <int> <dbl>
## 1 1880 F Mary 7065 0.0724
## 2 1880 F Anna 2604 0.0267
## 3 1880 F Emma 2003 0.0205
## 4 1880 F Elizabeth 1939 0.0199
## 5 1880 F Minnie 1746 0.0179
## 6 1880 F Margaret 1578 0.0162
tail(babynames) # same
## # A tibble: 6 × 5
## year sex name n prop
## <dbl> <chr> <chr> <int> <dbl>
## 1 2017 M Zyhier 5 0.00000255
## 2 2017 M Zykai 5 0.00000255
## 3 2017 M Zykeem 5 0.00000255
## 4 2017 M Zylin 5 0.00000255
## 5 2017 M Zylis 5 0.00000255
## 6 2017 M Zyrie 5 0.00000255
names(babynames) # same
## [1] "year" "sex" "name" "n" "prop"
Plotting my name on a graph -
library(babynames) # contains the actual data
library(dplyr) # for manipulating data
library(ggplot2) # for plotting data
# manipulate_name_data
mydata <- babynames %>%
filter( name == "Khyati" | name == "Khyathi") %>%
filter( sex == "F")
mydata
## # A tibble: 2 × 5
## year sex name n prop
## <dbl> <chr> <chr> <int> <dbl>
## 1 1992 F Khyati 5 0.00000249
## 2 2013 F Khyati 6 0.00000312
glimpse(mydata)
## Rows: 2
## Columns: 5
## $ year <dbl> 1992, 2013
## $ sex <chr> "F", "F"
## $ name <chr> "Khyati", "Khyati"
## $ n <int> 5, 6
## $ prop <dbl> 2.49e-06, 3.12e-06
plot <- ggplot(mydata, aes(x = year,
y = prop,
group = name,
color = name,
colorspaces = sex)) +
geom_line()
plot
It was interesting to find out how rare the name is in the US. The graph output was also different from the usual, indicating the gradual but constant increase in the usage of the name.
In this assignment, we had to map our home town. Initially, I started mapping Mumbai as a whole but later found out it was a large data set to handle on one map. I further scaled it to Malabar Hill and plotted the buildings, highways , famous restaurants and highways linked on the map.
## Using default API key for pickpoint.io. If batch geocoding, please get your own (free) API key at https://app.pickpoint.io/sign-up
## Using default API key for pickpoint.io. If batch geocoding, please get your own (free) API key at https://app.pickpoint.io/sign-up
## Issuing query to Overpass API ...
## Rate limit: 0
## Query complete!
## converting OSM data to sf format
## Issuing query to Overpass API ...
## Rate limit: 0
## Query complete!
## converting OSM data to sf format
## Issuing query to Overpass API ...
## Rate limit: 0
## Query complete!
## converting OSM data to sf format
## Issuing query to Overpass API ...
## Rate limit: 0
## Query complete!
## converting OSM data to sf format
#1
tm_shape(dat_malabarhill_B) + tm_fill(col="lightcoral") +
#2
tm_shape(dat_malabarhill_H) + tm_lines(col="steelblue3" , lwd = 1) +
#3
tm_shape(dat_malabarhill_R) + tm_dots(size = 0.7 , col="plum4" , shape = 21) + tm_text("name", auto.placement = TRUE, size = 0.5 , alpha = 1) +
#4
tm_shape(dat_malabarhill_T) + tm_dots(size = 0.1 , col="palegreen3" , shape = 21) +
#5
tm_compass(type = "rose" , position = c("right" , "bottom") , size = 1.5) +
tm_scale_bar(width = 4 , position = c("right" , "bottom" , text.size = 1)) +
tm_layout( title = "MALABAR HILL, MUMBAI" , title.size = 0.7 , title.color = "black" ,title.position = c("left" , "top"), frame = TRUE, frame.lwd = 3 , bg.color = "lightyellow")
## Scale bar width set to 0.25 of the map width
For this assignment, the task was to explore and analyse drama using Network Graphs. The TV show I chose was ‘This Is Us’. To analyse the show, I saw the first episode of season 1 and explored the connections between the main and guest characters. It was fun to see the connections evolve during the episode and it gave me a better understanding of the show.
## Rows: 19 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (5): name, sex, race, cast(season 1), occupation
## dbl (2): node id, birthyear
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## Rows: 21 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): type
## dbl (3): from, to, weight
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 19 × 7
## id name sex race `cast(season 1)` birthyear occupation
## <int> <chr> <chr> <chr> <chr> <dbl> <chr>
## 1 1 Jack Pearson M white main 1944 constructio…
## 2 2 Rebecca Pearson F white main 1950 housewife/s…
## 3 3 Randall Pearson M black main 1980 businessman
## 4 4 Kate Pearson F white main 1980 personal as…
## 5 5 Kevin Pearson M white main 1980 actor
## 6 6 Beth Pearson F black main NA lawyer
## 7 7 Toby Damon M white main NA IT employee
## 8 8 William Hill M black main NA musician
## 9 9 Miguel Rivas M other recurring NA constructio…
## 10 10 Sophie F white recurring NA nurse
## 11 11 Tess Pearson F black recurring 2008 student
## 12 12 Annie Pearson F black recurring NA student
## 13 13 Dr. Nathan Katowski M white guest NA doctor
## 14 14 Madison Simons F white recurring NA <NA>
## 15 15 Random Party Girls F other guest NA <NA>
## 16 16 Director manny M white guest NA director
## 17 17 Attendee fat group F white guest NA <NA>
## 18 18 Alan thicke M white guest NA actor
## 19 19 Firestation worker M white guest NA firestation…
## # A tibble: 21 × 4
## from to weight type
## <dbl> <dbl> <dbl> <chr>
## 1 1 2 31 family
## 2 4 4 1 self
## 3 3 12 4 family
## 4 5 15 9 friend
## 5 5 4 28 family
## 6 1 13 27 family
## 7 2 13 21 family
## 8 5 16 11 work
## 9 3 6 19 family
## 10 6 12 2 family
## # … with 11 more rows
## # A tbl_graph: 19 nodes and 21 edges
## #
## # An undirected multigraph with 6 components
## #
## # Node Data: 19 × 7 (active)
## id name sex race `cast(season 1)` birthyear occupation
## <int> <chr> <chr> <chr> <chr> <dbl> <chr>
## 1 1 Jack Pearson M white main 1944 construction wor…
## 2 2 Rebecca Pearson F white main 1950 housewife/singer
## 3 3 Randall Pearson M black main 1980 businessman
## 4 4 Kate Pearson F white main 1980 personal assista…
## 5 5 Kevin Pearson M white main 1980 actor
## 6 6 Beth Pearson F black main NA lawyer
## # … with 13 more rows
## #
## # Edge Data: 21 × 4
## from to weight type
## <int> <int> <dbl> <chr>
## 1 1 2 31 family
## 2 4 4 1 self
## 3 3 12 4 family
## # … with 18 more rows
# visualizing between-ness and degree of nodes
tiu %>%
activate(nodes) %>%
mutate(degree = centrality_degree()) %>%
activate(edges) %>%
mutate(betweenness = centrality_edge_betweenness()) %>%
ggraph(layout = "nicely") +
geom_edge_link(aes(alpha = betweenness)) +
geom_node_point(aes(size = degree, colour = degree)) +
scale_color_gradient(guide = "legend") +
geom_node_label(aes(label = name),
repel = TRUE, max.overlaps = 20,
alpha = 0.6,
size = 3)
# visualize communities of nodes
tiu %>%
activate(nodes) %>%
mutate(community = as.factor(group_louvain())) %>%
ggraph(layout = "graphopt") +
geom_edge_link() +
geom_node_point(aes(color = community), size = 5) +
geom_node_label(aes(label = name),
repel = TRUE, max.overlaps = 20,
alpha = 0.6,
size = 3) +
labs(title = "This Is Us (NBC Series)",
subtitle = "Season 1 Episode 1",
caption = "Duration 40 minutes")
tiu
## # A tbl_graph: 19 nodes and 21 edges
## #
## # An undirected multigraph with 6 components
## #
## # Node Data: 19 × 7 (active)
## id name sex race `cast(season 1)` birthyear occupation
## <int> <chr> <chr> <chr> <chr> <dbl> <chr>
## 1 1 Jack Pearson M white main 1944 construction wor…
## 2 2 Rebecca Pearson F white main 1950 housewife/singer
## 3 3 Randall Pearson M black main 1980 businessman
## 4 4 Kate Pearson F white main 1980 personal assista…
## 5 5 Kevin Pearson M white main 1980 actor
## 6 6 Beth Pearson F black main NA lawyer
## # … with 13 more rows
## #
## # Edge Data: 21 × 4
## from to weight type
## <int> <int> <dbl> <chr>
## 1 1 2 31 family
## 2 4 4 1 self
## 3 3 12 4 family
## # … with 18 more rows
From the above graphs, we come to know that there were three obvious connections (work,family,friends) with three parallel stories running side by side. the individual nodes indicate the characters which appear later in season 1 and have not yet engaged into conversation with the main characters.
Throughout the two weeks of this workshop, I realized that it’s important to remember that learning coding/programming language is very similar to learning a spoken language in the sense that you need to be using it regularly or it will not stick. Learning ‘R’ as a coding language was a trip of ups and downs. It did get overwhelming at times as we had to process a lot of information in such a short amount of time. However, the joy of your codes working and finally seeing the desired outcome surpassed it. The best parts of this workshop were when we listened to music every morning before class started and the part where we coded as a class and tried new things. I realized that you can use ‘R’ to clean, analyze, and graph my data. It can be used for research to estimate and display results which will be helpful for future projects. This course was really helpful and i wish to explore it more:)