diff --git a/_projects/2024/100452420/100452420.Rmd b/_projects/2024/100452420/100452420.Rmd new file mode 100644 index 00000000..7f6024d3 --- /dev/null +++ b/_projects/2024/100452420/100452420.Rmd @@ -0,0 +1,364 @@ +--- +title: "Serena Williams Career" +description: | + it shows the ranking of the famous tennis player. +categories: "2024" +author: Diego Fernández-Alvarez +date: "`r Sys.Date()`" +output: + distill::distill_article: + self_contained: false + toc: true +--- + + +```{r setup, include=FALSE} +knitr::opts_chunk$set(out.width="100%", fig.align="center", fig.showtext = TRUE) +``` + +This post details my process of replicating a chart that visualizes Serena Williams's world tennis ranking from 1998 to 2022. The original chart was published by The New York Times, and you can view it [here.](https://www.nytimes.com/2022/09/08/learning/whats-going-on-in-this-graph-sept-14-2022.html) + + +Below is the original chart for reference: + + +![Original Plot](../100452420/original_plot.png) + +## Original Chart + +The chart displays Serena Williams's fluctuating WTA (Women's Tennis Association) rankings throughout her career. +there are some relevant features such as a reverse y-axis that ranks her at the top with "No. 1" and descends to "20.". Also, the color orange in the line indicates periods when she held the No. 1 ranking. Finally there are some text annotations summarizing key moments in her career, such as her initial rise to No. 1, her return after childbirth, and her record-breaking 186 consecutive weeks at No. 1. + +I selected this graph for its explanatory power and simplicity, making it easy to interpret even for readers unfamiliar with tennis. Moreover, as a lifelong tennis fan, I have a personal connection to the topic. + +## Replication of the graph + +### Libraries and Data Preparation + +For this project, I primarily used the tidyverse package in R for data manipulation and visualization. Obtaining the dataset was challenging, as the official WTA website does not provide comprehensive rankings data. Fortunately, I discovered a GitHub repository by Jeff Sackmann that contained historical WTA rankings. However, the data was divided by decades (90s, 00s, 10s, and 20s) and included all players. + +To prepare the data, I joined the four tables into a single dataset (wta_data). After identifying Serena Williams's player ID (200033), I filtered the data to create the final dataset (serena_data) for the analysis. + +```{r} +library(tidyverse) + +# Reading the data +wta90s <- as_tibble(read.csv("C:/Users/diegu/Downloads/wta_rankings_90s.csv")) +wta00s <- as_tibble(read.csv("C:/Users/diegu/Downloads/wta_rankings_00s.csv")) +wta10s <- as_tibble(read.csv("C:/Users/diegu/Downloads/wta_rankings_10s.csv")) +wta20s <- as_tibble(read.csv("C:/Users/diegu/Downloads/wta_rankings_20s.csv")) + +wta_data <- rbind(wta90s, wta00s, wta10s, wta20s) + +serena_data <- wta_data |> + filter(player == 200033) |> + select(ranking_date, rank) |> + mutate(ranking_date = as.Date(as.character(ranking_date), "%Y%m%d")) + +``` + + +### Adding an extra color for the number one ranking + +To replicate the orange line segments indicating Serena's No. 1 status, I created a line_color column based on specific date ranges. I had to select the specific weeks when Serena was number 1 so when the line goes through those periods of time, the color turns orange and blue in any other cases. + +```{r, fig.dpi = 150} +serena_data <- serena_data |> + mutate(line_color = ifelse(ranking_date >= as.Date("2002-04-24") & + ranking_date <= as.Date("2003-05-15") | + ranking_date >= as.Date("2008-06-20") & + ranking_date <= as.Date("2008-07-13") | + ranking_date >= as.Date("2008-11-30") & + ranking_date <= as.Date("2009-01-20") | + ranking_date >= as.Date("2009-08-15") & + ranking_date <= as.Date("2010-07-15") | + ranking_date >= as.Date("2012-12-15") & + ranking_date <= as.Date("2016-06-15") | + ranking_date >= as.Date("2016-11-10") & + ranking_date <= as.Date("2016-12-25")| + ranking_date >= as.Date("2017-02-02") & + ranking_date <= as.Date("2017-02-15")| + ranking_date >= as.Date("2009-07-27") & + ranking_date <= as.Date("2009-08-01"), + "#FF851B", "#3093CF")) + +``` + + +## Creating the Plot + +### Setting the Aesthetics + +The plot uses ggplot2 to map the x-axis to ranking_date and the y-axis to rank. The y-axis is reversed to emphasize the "No. 1" rank, displaying values from 1 to 20 in intervals of 5, as shown in the original plot. The x-axis spans from 01/01/1998 to 31/12/2022, corresponding to the timeline of Serena’s career, ending in the year of her retirement. + + +```{r, layout = "1-body-outset", fig.width= 10, fig.height= 3.8} + +p <- ggplot(serena_data, aes(x = ranking_date, y = rank)) + + geom_line(color = serena_data$line_color, + size = ifelse(serena_data$line_color == "#FF851B", 1.5, 1)) + + scale_y_reverse(limits = c(20, 1), + breaks = c(1, 5, 10, 15, 20), + labels = c("No. 1" = expression(bold("No. 1")), "5", "10", "15", "20")) + + scale_x_date( + limits = c(as.Date("1998-01-01"), as.Date("2022-12-31")), + breaks = seq(as.Date("1998-01-01"), as.Date("2022-01-01"), by = "1 year"), + date_labels = " | '%y" + ) +p +``` + + +To match the style of the original chart, I used theme_minimal and customized grid lines and axis labels. + +```{r, layout = "1-body-outset", fig.width= 10, fig.height= 3.8} + +p <- p + + theme_minimal(base_size = 10) + + theme( + axis.text.x = element_text(color = "grey"), + plot.title = element_text(hjust = 0.47, face = "bold", size = 10), + axis.title.y = element_blank(), + axis.title.x = element_blank(), + axis.text.y = element_text(hjust = 1, color = "black"), + panel.grid.major.x = element_blank(), + panel.grid.minor = element_blank()) +p +``` + +### Annotations and Titles + +Annotations provide context to key moments in Serena's career. I included the four comments from the original graph. Additionally, I added the title to the plot. Here's how I implemented these changes: + + +```{r, layout = "1-body-outset", fig.width= 10, fig.height= 3.8} +p <- p + + labs( + title = "Serena Williams's world ranking since 1998", + x = "", + y = "" + ) + + annotate("text", x = as.Date("2001-10-01"), y = 1.8, + label = "Williams first rose to\nNo. 1 in July 2002, at\nage 20. She stayed\nNo. 1 for 49 weeks.", + size = 2.5, hjust = 1) + + annotate("text", x = as.Date("2005-02-01"), y = 18, + label = "Her ranking\nfell below\n20 in 2006.", + size = 2.5, hjust = 1) + + annotate("text", x = as.Date("2013-08-01"), y = 3.2, + label = "She stayed No.1 for\n186 consecutive weeks\nfrom Feb. 2013 to\nSept. 2016.", + size = 2.5, hjust = 0) + + annotate("text", x = as.Date("2017-02-01"), y = 18, + label = "She gave birth to\nher daughter in\n2017 and returned\nto tennis in 2018.", + size = 2.5, hjust = 1) +p + +``` + +### Last Adjustments + +Finally, the arrows, dotted lines, and source were added to the chart. Both the arrows and dotted lines were created using the annotate function. The source is placed at the bottom of the chart, as in the original. Although I did not obtain the data directly from the WTA, I aimed to replicate the original plot as closely as possible. + + +```{r, layout = "1-body-outset", fig.width= 10, fig.height= 3.9} +p <- p + + annotate("segment", x = as.Date("1999-03-25"), xend = as.Date("1999-03-25"), y = 15, yend = 20, + linetype = "dotted", color = "#3093CF", size = 0.7) + + annotate("segment", x = as.Date("1998-11-07"), xend = as.Date("1998-11-07"), y = 17, yend = 20, + linetype = "dotted", color = "#3093CF", size = 0.7) + + annotate("segment", x = as.Date("2006-01-25"), xend = as.Date("2006-01-25"), y = 15, yend = 20, + linetype = "dotted", color = "#3093CF", size = 0.7) + + annotate("segment", x = as.Date("2007-01-25"), xend = as.Date("2007-01-25"), y = 15, yend = 20, + linetype = "dotted", color = "#3093CF", size = 0.7) + + annotate("segment", x = as.Date("2011-05-20"), xend = as.Date("2011-05-20"), y = 17, yend = 20, + linetype = "dotted", color = "#3093CF", size = 0.7) + + annotate("segment", x = as.Date("2011-10-01"), xend = as.Date("2011-10-01"), y = 14, yend = 20, + linetype = "dotted", color = "#3093CF", size = 0.7) + + annotate("segment", x = as.Date("2017-09-01"), xend = as.Date("2017-09-01"), y = 15, yend = 20, + linetype = "dotted", color = "#3093CF", size = 0.7) + + annotate("segment", x = as.Date("2018-09-01"), xend = as.Date("2018-09-01"), y = 16, yend = 20, + linetype = "dotted", color = "#3093CF", size = 0.7) + + annotate("curve", x = as.Date("2002-01-01"), y = 1, xend = as.Date("2002-06-01"), yend = 1, + curvature = 0, + arrow = arrow(length = unit(0.1, "cm"), type = "closed"), + color = "black", + size = 0.4) + + + annotate("curve", x = as.Date("2005-05-01"), y = 18, xend = as.Date("2006-01-01"), yend = 20, + curvature = -0.3, + arrow = arrow(length = unit(0.1, "cm"), type = "closed"), + color = "black", + size = 0.4) + + + annotate("curve", x = as.Date("2017-07-01"), y = 18, xend = as.Date("2018-04-01"), yend = 20, + curvature = -0.3, + arrow = arrow(length = unit(0.1, "cm"), type = "closed"), + color = "black", + size = 0.4) + + + annotate("curve", x = as.Date("2013-07-01"), y = 2, xend = as.Date("2013-04-01"), yend = 1.2, + curvature = -0.3, + arrow = arrow(length = unit(0.1, "cm"), type = "closed"), + color = "black", + size = 0.4) + + labs(caption = "Source: Women's Tennis Association") + + theme( + plot.caption = element_text(hjust = 0.311, face = "bold", vjust = -1, size = 7, color = "grey"), + plot.caption.position = "plot" + ) +p + +``` + + +## Improvements + +The graph is straightforward and clearly illustrates the player’s career trajectory. While showing a player’s ranking is insightful, I thought it would be more interesting to include the titles she won throughout her career. This addition allows for a deeper analysis of why her ranking dropped on certain occasions and on which surface she played more comfortably, winning the most titles. + +I could not include all tournaments, as the final table would have been too large. Instead, I chose to focus on the four most important tennis tournaments: the Grand Slams. These events award the most points to players, making them a reliable indicator of a player’s ranking. + +I renamed the previous graph to "serena_ranking_plot" and removed the original source to attach the new graph at the bottom. Additionally, I used theme(plot.margin) to reduce the margins and improve the layout. + +```{r, layout = "1-body-outset", fig.width= 10, fig.height= 7} + +serena_ranking_plot <- ggplot(serena_data, aes(x = ranking_date, y = rank)) + + geom_line(color = serena_data$line_color, + size = ifelse(serena_data$line_color == "#EE6A24", 1.5, 1)) + + scale_y_reverse(limits = c(20, 1), + breaks = c(1, 5, 10, 15, 20), + labels = c("No. 1" = expression(bold("No. 1")), "5", "10", "15", "20"))+ + scale_x_date( + limits = c(as.Date("1998-01-01"), as.Date("2022-12-31")), + breaks = seq(as.Date("1998-01-01"), as.Date("2022-01-01"), by = "1 year"), + date_labels = " | '%y" + ) + + labs( + title = "Serena Williams's world ranking and Grand Slam tournaments", + x = "", + y = "" + ) + + theme_minimal(base_size = 12) + + theme( + axis.text.x = element_text(color = "black"), + plot.title = element_text(hjust = 0.5, face = "bold", size = 9), + axis.title.y = element_blank(), + axis.text.y = element_text(hjust = 1, color = "black"), + panel.grid.major.x = element_blank(), + panel.grid.minor.x = element_blank()) + + annotate("text", x = as.Date("2001-10-01"), y = 1.6, + label = "Williams first rose to\nNo. 1 in July 2002, at\nage 20. She stayed\nNo. 1 for 49 weeks.", + size = 3, hjust = 1) + + annotate("text", x = as.Date("2005-03-01"), y = 18, + label = "Her ranking\nfell below\n20 in 2006.", + size = 3, hjust = 1) + + annotate("text", x = as.Date("2013-08-01"), y = 3, + label = "She stayed No.1 for\n186 consecutive weeks\nfrom Feb. 2013 to\nSept. 2016.", + size = 3, hjust = 0) + + annotate("text", x = as.Date("2017-04-01"), y = 18, + label = "She gave birth to\nher daughter in\n2017 and returned\nto tennis in 2018.", + size = 3, hjust = 1) + + annotate("segment", x = as.Date("1999-03-25"), xend = as.Date("1999-03-25"), y = 15, yend = 20, + linetype = "dotted", color = "#3093CF", size = 0.7) + + annotate("segment", x = as.Date("1998-11-07"), xend = as.Date("1998-11-07"), y = 17, yend = 20, + linetype = "dotted", color = "#3093CF", size = 0.7) + + annotate("segment", x = as.Date("2006-01-25"), xend = as.Date("2006-01-25"), y = 15, yend = 20, + linetype = "dotted", color = "#3093CF", size = 0.7) + + annotate("segment", x = as.Date("2007-01-25"), xend = as.Date("2007-01-25"), y = 15, yend = 20, + linetype = "dotted", color = "#3093CF", size = 0.7) + + annotate("segment", x = as.Date("2011-05-20"), xend = as.Date("2011-05-20"), y = 17, yend = 20, + linetype = "dotted", color = "#3093CF", size = 0.7) + + annotate("segment", x = as.Date("2011-10-01"), xend = as.Date("2011-10-01"), y = 14, yend = 20, + linetype = "dotted", color = "#3093CF", size = 0.7) + + annotate("segment", x = as.Date("2017-09-01"), xend = as.Date("2017-09-01"), y = 15, yend = 20, + linetype = "dotted", color = "#3093CF", size = 0.7) + + annotate("segment", x = as.Date("2018-09-01"), xend = as.Date("2018-09-01"), y = 16, yend = 20, + linetype = "dotted", color = "#3093CF", size = 0.7) + + annotate("curve", x = as.Date("2002-01-01"), y = 1, xend = as.Date("2002-06-01"), yend = 1, + curvature = 0, + arrow = arrow(length = unit(0.1, "cm"), type = "closed"), + color = "black", + size = 0.4) + + + annotate("curve", x = as.Date("2005-05-01"), y = 18, xend = as.Date("2006-01-01"), yend = 20, + curvature = -0.3, + arrow = arrow(length = unit(0.1, "cm"), type = "closed"), + color = "black", + size = 0.4) + + + annotate("curve", x = as.Date("2017-07-01"), y = 18, xend = as.Date("2018-04-01"), yend = 20, + curvature = -0.3, + arrow = arrow(length = unit(0.1, "cm"), type = "closed"), + color = "black", + size = 0.4) + + + annotate("curve", x = as.Date("2013-07-01"), y = 2, xend = as.Date("2013-04-01"), yend = 1.2, + curvature = -0.3, + arrow = arrow(length = unit(0.1, "cm"), type = "closed"), + color = "black", + size = 0.4) + + theme(plot.margin = margin(5, 0, 0, 0)) + +``` + + +The data for the tournaments was collected from [this website](https://en.wikipedia.org/wiki/Serena_Williams_career_statistics), but it was incomplete. I had to create an Excel file and complete the table with all the years and tournaments she played in. + +The libraries used for the improvements were patchwork (to join the two tables) and readxl (to read the Excel file). + +After cleaning the data and selecting relevant columns, I created the plot, assigning the color orange to tournaments she won, grey to those she did not play, and blue to those she played but did not win. I also adjusted the margins using theme(plot.margin) and ordered the tournaments in chronological order. + +Lastly, I joined both graphs, adjusting the heights so that the years matched in both tables. + + +```{r, layout = "1-body-outset", fig.width= 12, fig.height= 7} + +library(patchwork) +library(readxl) +serena_slams <- read_xlsx("C:/Users/diegu/Desktop/Master in Computational Social Sciences/data visualization/replicate plot/dataviz/_projects/2024/100452420/SerenaWilliams_slams_data.xlsx") + +serena_slams <- serena_slams |> + mutate(tournament = + str_replace( + Tournament,"^(US Open|French Open|Wimbledon|Australian Open).*", "\\1")) |> + select(Result, Year, tournament) + +serena_slams$tournament <- factor(serena_slams$tournament, levels = c("US Open", "Wimbledon", "French Open", "Australian Open")) + +serena_slams_plot <- ggplot(serena_slams, aes(x = Year, y = tournament, fill = factor(Result))) + + geom_tile(color = "black")+ + scale_fill_manual(values = c("Won" ="orange", "Did not play" = "grey", "Played" = "#63B8FF"))+ + labs(fill = "Year") + + theme_minimal()+ + coord_fixed() + + theme(plot.margin = margin(0, 0, 5, 0), + axis.title.y = element_blank(), + axis.text.x = element_blank(), + axis.title.x = element_blank(), + axis.ticks = element_blank()) + + +serena_ranking_plot/serena_slams_plot + + plot_layout(heights = c(2, 1)) + +``` + + +## Difficulties encountered + + +The source data was not available due to WTA restrictions, so I had to search for it on GitHub. The main challenge was that the datasets were divided by decades, so I had to import four separate datasets (90s, 00s, 10s, and 20s) and then join the tables. + +Finding the player ID (200033) was also tricky, but thankfully it was mentioned by the GitHub user. + +Another challenge was determining the line colors (orange and blue). I had to create an additional column to specify the color: orange for when Serena was ranked number one, and blue for when she was not. + +Joining the two plots and ensuring they aligned closely was time-consuming. I spent considerable time adjusting the margins to ensure that both charts were joined seamlessly and that the years matched properly. + + + + + + + + + + diff --git a/_projects/2024/100452420/100452420.html b/_projects/2024/100452420/100452420.html new file mode 100644 index 00000000..3d3681a7 --- /dev/null +++ b/_projects/2024/100452420/100452420.html @@ -0,0 +1,1889 @@ + + + + +
+ + + + + + + + + + + + + + + +it shows the ranking of the famous tennis player.
+This post details my process of replicating a chart that visualizes Serena Williams’s world tennis ranking from 1998 to 2022. The original chart was published by The New York Times, and you can view it here.
+Below is the original chart for reference:
+ +The chart displays Serena Williams’s fluctuating WTA (Women’s Tennis Association) rankings throughout her career. +there are some relevant features such as a reverse y-axis that ranks her at the top with “No. 1” and descends to “20.”. Also, the color orange in the line indicates periods when she held the No. 1 ranking. Finally there are some text annotations summarizing key moments in her career, such as her initial rise to No. 1, her return after childbirth, and her record-breaking 186 consecutive weeks at No. 1.
+I selected this graph for its explanatory power and simplicity, making it easy to interpret even for readers unfamiliar with tennis. Moreover, as a lifelong tennis fan, I have a personal connection to the topic.
+For this project, I primarily used the tidyverse package in R for data manipulation and visualization. Obtaining the dataset was challenging, as the official WTA website does not provide comprehensive rankings data. Fortunately, I discovered a GitHub repository by Jeff Sackmann that contained historical WTA rankings. However, the data was divided by decades (90s, 00s, 10s, and 20s) and included all players.
+To prepare the data, I joined the four tables into a single dataset (wta_data). After identifying Serena Williams’s player ID (200033), I filtered the data to create the final dataset (serena_data) for the analysis.
+library(tidyverse)
+
+# Reading the data
+wta90s <- as_tibble(read.csv("C:/Users/diegu/Downloads/wta_rankings_90s.csv"))
+wta00s <- as_tibble(read.csv("C:/Users/diegu/Downloads/wta_rankings_00s.csv"))
+wta10s <- as_tibble(read.csv("C:/Users/diegu/Downloads/wta_rankings_10s.csv"))
+wta20s <- as_tibble(read.csv("C:/Users/diegu/Downloads/wta_rankings_20s.csv"))
+
+wta_data <- rbind(wta90s, wta00s, wta10s, wta20s)
+
+serena_data <- wta_data |>
+ filter(player == 200033) |>
+ select(ranking_date, rank) |>
+ mutate(ranking_date = as.Date(as.character(ranking_date), "%Y%m%d"))
+To replicate the orange line segments indicating Serena’s No. 1 status, I created a line_color column based on specific date ranges. I had to select the specific weeks when Serena was number 1 so when the line goes through those periods of time, the color turns orange and blue in any other cases.
+serena_data <- serena_data |>
+ mutate(line_color = ifelse(ranking_date >= as.Date("2002-04-24") &
+ ranking_date <= as.Date("2003-05-15") |
+ ranking_date >= as.Date("2008-06-20") &
+ ranking_date <= as.Date("2008-07-13") |
+ ranking_date >= as.Date("2008-11-30") &
+ ranking_date <= as.Date("2009-01-20") |
+ ranking_date >= as.Date("2009-08-15") &
+ ranking_date <= as.Date("2010-07-15") |
+ ranking_date >= as.Date("2012-12-15") &
+ ranking_date <= as.Date("2016-06-15") |
+ ranking_date >= as.Date("2016-11-10") &
+ ranking_date <= as.Date("2016-12-25")|
+ ranking_date >= as.Date("2017-02-02") &
+ ranking_date <= as.Date("2017-02-15")|
+ ranking_date >= as.Date("2009-07-27") &
+ ranking_date <= as.Date("2009-08-01"),
+ "#FF851B", "#3093CF"))
+The plot uses ggplot2 to map the x-axis to ranking_date and the y-axis to rank. The y-axis is reversed to emphasize the “No. 1” rank, displaying values from 1 to 20 in intervals of 5, as shown in the original plot. The x-axis spans from 01/01/1998 to 31/12/2022, corresponding to the timeline of Serena’s career, ending in the year of her retirement.
+p <- ggplot(serena_data, aes(x = ranking_date, y = rank)) +
+ geom_line(color = serena_data$line_color,
+ size = ifelse(serena_data$line_color == "#FF851B", 1.5, 1)) +
+ scale_y_reverse(limits = c(20, 1),
+ breaks = c(1, 5, 10, 15, 20),
+ labels = c("No. 1" = expression(bold("No. 1")), "5", "10", "15", "20")) +
+ scale_x_date(
+ limits = c(as.Date("1998-01-01"), as.Date("2022-12-31")),
+ breaks = seq(as.Date("1998-01-01"), as.Date("2022-01-01"), by = "1 year"),
+ date_labels = " | '%y"
+ )
+p
+To match the style of the original chart, I used theme_minimal and customized grid lines and axis labels.
+p <- p +
+ theme_minimal(base_size = 10) +
+ theme(
+ axis.text.x = element_text(color = "grey"),
+ plot.title = element_text(hjust = 0.47, face = "bold", size = 10),
+ axis.title.y = element_blank(),
+ axis.title.x = element_blank(),
+ axis.text.y = element_text(hjust = 1, color = "black"),
+ panel.grid.major.x = element_blank(),
+ panel.grid.minor = element_blank())
+p
+Annotations provide context to key moments in Serena’s career. I included the four comments from the original graph. Additionally, I added the title to the plot. Here’s how I implemented these changes:
+p <- p +
+ labs(
+ title = "Serena Williams's world ranking since 1998",
+ x = "",
+ y = ""
+ ) +
+ annotate("text", x = as.Date("2001-10-01"), y = 1.8,
+ label = "Williams first rose to\nNo. 1 in July 2002, at\nage 20. She stayed\nNo. 1 for 49 weeks.",
+ size = 2.5, hjust = 1) +
+ annotate("text", x = as.Date("2005-02-01"), y = 18,
+ label = "Her ranking\nfell below\n20 in 2006.",
+ size = 2.5, hjust = 1) +
+ annotate("text", x = as.Date("2013-08-01"), y = 3.2,
+ label = "She stayed No.1 for\n186 consecutive weeks\nfrom Feb. 2013 to\nSept. 2016.",
+ size = 2.5, hjust = 0) +
+ annotate("text", x = as.Date("2017-02-01"), y = 18,
+ label = "She gave birth to\nher daughter in\n2017 and returned\nto tennis in 2018.",
+ size = 2.5, hjust = 1)
+p
+Finally, the arrows, dotted lines, and source were added to the chart. Both the arrows and dotted lines were created using the annotate function. The source is placed at the bottom of the chart, as in the original. Although I did not obtain the data directly from the WTA, I aimed to replicate the original plot as closely as possible.
+p <- p +
+ annotate("segment", x = as.Date("1999-03-25"), xend = as.Date("1999-03-25"), y = 15, yend = 20,
+ linetype = "dotted", color = "#3093CF", size = 0.7) +
+ annotate("segment", x = as.Date("1998-11-07"), xend = as.Date("1998-11-07"), y = 17, yend = 20,
+ linetype = "dotted", color = "#3093CF", size = 0.7) +
+ annotate("segment", x = as.Date("2006-01-25"), xend = as.Date("2006-01-25"), y = 15, yend = 20,
+ linetype = "dotted", color = "#3093CF", size = 0.7) +
+ annotate("segment", x = as.Date("2007-01-25"), xend = as.Date("2007-01-25"), y = 15, yend = 20,
+ linetype = "dotted", color = "#3093CF", size = 0.7) +
+ annotate("segment", x = as.Date("2011-05-20"), xend = as.Date("2011-05-20"), y = 17, yend = 20,
+ linetype = "dotted", color = "#3093CF", size = 0.7) +
+ annotate("segment", x = as.Date("2011-10-01"), xend = as.Date("2011-10-01"), y = 14, yend = 20,
+ linetype = "dotted", color = "#3093CF", size = 0.7) +
+ annotate("segment", x = as.Date("2017-09-01"), xend = as.Date("2017-09-01"), y = 15, yend = 20,
+ linetype = "dotted", color = "#3093CF", size = 0.7) +
+ annotate("segment", x = as.Date("2018-09-01"), xend = as.Date("2018-09-01"), y = 16, yend = 20,
+ linetype = "dotted", color = "#3093CF", size = 0.7) +
+ annotate("curve", x = as.Date("2002-01-01"), y = 1, xend = as.Date("2002-06-01"), yend = 1,
+ curvature = 0,
+ arrow = arrow(length = unit(0.1, "cm"), type = "closed"),
+ color = "black",
+ size = 0.4) +
+
+ annotate("curve", x = as.Date("2005-05-01"), y = 18, xend = as.Date("2006-01-01"), yend = 20,
+ curvature = -0.3,
+ arrow = arrow(length = unit(0.1, "cm"), type = "closed"),
+ color = "black",
+ size = 0.4) +
+
+ annotate("curve", x = as.Date("2017-07-01"), y = 18, xend = as.Date("2018-04-01"), yend = 20,
+ curvature = -0.3,
+ arrow = arrow(length = unit(0.1, "cm"), type = "closed"),
+ color = "black",
+ size = 0.4) +
+
+ annotate("curve", x = as.Date("2013-07-01"), y = 2, xend = as.Date("2013-04-01"), yend = 1.2,
+ curvature = -0.3,
+ arrow = arrow(length = unit(0.1, "cm"), type = "closed"),
+ color = "black",
+ size = 0.4) +
+ labs(caption = "Source: Women's Tennis Association") +
+ theme(
+ plot.caption = element_text(hjust = 0.311, face = "bold", vjust = -1, size = 7, color = "grey"),
+ plot.caption.position = "plot"
+ )
+p
+The graph is straightforward and clearly illustrates the player’s career trajectory. While showing a player’s ranking is insightful, I thought it would be more interesting to include the titles she won throughout her career. This addition allows for a deeper analysis of why her ranking dropped on certain occasions and on which surface she played more comfortably, winning the most titles.
+I could not include all tournaments, as the final table would have been too large. Instead, I chose to focus on the four most important tennis tournaments: the Grand Slams. These events award the most points to players, making them a reliable indicator of a player’s ranking.
+I renamed the previous graph to “serena_ranking_plot” and removed the original source to attach the new graph at the bottom. Additionally, I used theme(plot.margin) to reduce the margins and improve the layout.
+serena_ranking_plot <- ggplot(serena_data, aes(x = ranking_date, y = rank)) +
+ geom_line(color = serena_data$line_color,
+ size = ifelse(serena_data$line_color == "#EE6A24", 1.5, 1)) +
+ scale_y_reverse(limits = c(20, 1),
+ breaks = c(1, 5, 10, 15, 20),
+ labels = c("No. 1" = expression(bold("No. 1")), "5", "10", "15", "20"))+
+ scale_x_date(
+ limits = c(as.Date("1998-01-01"), as.Date("2022-12-31")),
+ breaks = seq(as.Date("1998-01-01"), as.Date("2022-01-01"), by = "1 year"),
+ date_labels = " | '%y"
+ ) +
+ labs(
+ title = "Serena Williams's world ranking and Grand Slam tournaments",
+ x = "",
+ y = ""
+ ) +
+ theme_minimal(base_size = 12) +
+ theme(
+ axis.text.x = element_text(color = "black"),
+ plot.title = element_text(hjust = 0.5, face = "bold", size = 9),
+ axis.title.y = element_blank(),
+ axis.text.y = element_text(hjust = 1, color = "black"),
+ panel.grid.major.x = element_blank(),
+ panel.grid.minor.x = element_blank()) +
+ annotate("text", x = as.Date("2001-10-01"), y = 1.6,
+ label = "Williams first rose to\nNo. 1 in July 2002, at\nage 20. She stayed\nNo. 1 for 49 weeks.",
+ size = 3, hjust = 1) +
+ annotate("text", x = as.Date("2005-03-01"), y = 18,
+ label = "Her ranking\nfell below\n20 in 2006.",
+ size = 3, hjust = 1) +
+ annotate("text", x = as.Date("2013-08-01"), y = 3,
+ label = "She stayed No.1 for\n186 consecutive weeks\nfrom Feb. 2013 to\nSept. 2016.",
+ size = 3, hjust = 0) +
+ annotate("text", x = as.Date("2017-04-01"), y = 18,
+ label = "She gave birth to\nher daughter in\n2017 and returned\nto tennis in 2018.",
+ size = 3, hjust = 1) +
+ annotate("segment", x = as.Date("1999-03-25"), xend = as.Date("1999-03-25"), y = 15, yend = 20,
+ linetype = "dotted", color = "#3093CF", size = 0.7) +
+ annotate("segment", x = as.Date("1998-11-07"), xend = as.Date("1998-11-07"), y = 17, yend = 20,
+ linetype = "dotted", color = "#3093CF", size = 0.7) +
+ annotate("segment", x = as.Date("2006-01-25"), xend = as.Date("2006-01-25"), y = 15, yend = 20,
+ linetype = "dotted", color = "#3093CF", size = 0.7) +
+ annotate("segment", x = as.Date("2007-01-25"), xend = as.Date("2007-01-25"), y = 15, yend = 20,
+ linetype = "dotted", color = "#3093CF", size = 0.7) +
+ annotate("segment", x = as.Date("2011-05-20"), xend = as.Date("2011-05-20"), y = 17, yend = 20,
+ linetype = "dotted", color = "#3093CF", size = 0.7) +
+ annotate("segment", x = as.Date("2011-10-01"), xend = as.Date("2011-10-01"), y = 14, yend = 20,
+ linetype = "dotted", color = "#3093CF", size = 0.7) +
+ annotate("segment", x = as.Date("2017-09-01"), xend = as.Date("2017-09-01"), y = 15, yend = 20,
+ linetype = "dotted", color = "#3093CF", size = 0.7) +
+ annotate("segment", x = as.Date("2018-09-01"), xend = as.Date("2018-09-01"), y = 16, yend = 20,
+ linetype = "dotted", color = "#3093CF", size = 0.7) +
+ annotate("curve", x = as.Date("2002-01-01"), y = 1, xend = as.Date("2002-06-01"), yend = 1,
+ curvature = 0,
+ arrow = arrow(length = unit(0.1, "cm"), type = "closed"),
+ color = "black",
+ size = 0.4) +
+
+ annotate("curve", x = as.Date("2005-05-01"), y = 18, xend = as.Date("2006-01-01"), yend = 20,
+ curvature = -0.3,
+ arrow = arrow(length = unit(0.1, "cm"), type = "closed"),
+ color = "black",
+ size = 0.4) +
+
+ annotate("curve", x = as.Date("2017-07-01"), y = 18, xend = as.Date("2018-04-01"), yend = 20,
+ curvature = -0.3,
+ arrow = arrow(length = unit(0.1, "cm"), type = "closed"),
+ color = "black",
+ size = 0.4) +
+
+ annotate("curve", x = as.Date("2013-07-01"), y = 2, xend = as.Date("2013-04-01"), yend = 1.2,
+ curvature = -0.3,
+ arrow = arrow(length = unit(0.1, "cm"), type = "closed"),
+ color = "black",
+ size = 0.4) +
+ theme(plot.margin = margin(5, 0, 0, 0))
+The data for the tournaments was collected from this website, but it was incomplete. I had to create an Excel file and complete the table with all the years and tournaments she played in.
+The libraries used for the improvements were patchwork (to join the two tables) and readxl (to read the Excel file).
+After cleaning the data and selecting relevant columns, I created the plot, assigning the color orange to tournaments she won, grey to those she did not play, and blue to those she played but did not win. I also adjusted the margins using theme(plot.margin) and ordered the tournaments in chronological order.
+Lastly, I joined both graphs, adjusting the heights so that the years matched in both tables.
+library(patchwork)
+library(readxl)
+serena_slams <- read_xlsx("C:/Users/diegu/Desktop/Master in Computational Social Sciences/data visualization/replicate plot/dataviz/_projects/2024/100452420/SerenaWilliams_slams_data.xlsx")
+
+serena_slams <- serena_slams |>
+ mutate(tournament =
+ str_replace(
+ Tournament,"^(US Open|French Open|Wimbledon|Australian Open).*", "\\1")) |>
+ select(Result, Year, tournament)
+
+serena_slams$tournament <- factor(serena_slams$tournament, levels = c("US Open", "Wimbledon", "French Open", "Australian Open"))
+
+serena_slams_plot <- ggplot(serena_slams, aes(x = Year, y = tournament, fill = factor(Result))) +
+ geom_tile(color = "black")+
+ scale_fill_manual(values = c("Won" ="orange", "Did not play" = "grey", "Played" = "#63B8FF"))+
+ labs(fill = "Year") +
+ theme_minimal()+
+ coord_fixed() +
+ theme(plot.margin = margin(0, 0, 5, 0),
+ axis.title.y = element_blank(),
+ axis.text.x = element_blank(),
+ axis.title.x = element_blank(),
+ axis.ticks = element_blank())
+
+
+serena_ranking_plot/serena_slams_plot +
+ plot_layout(heights = c(2, 1))
+The source data was not available due to WTA restrictions, so I had to search for it on GitHub. The main challenge was that the datasets were divided by decades, so I had to import four separate datasets (90s, 00s, 10s, and 20s) and then join the tables.
+Finding the player ID (200033) was also tricky, but thankfully it was mentioned by the GitHub user.
+Another challenge was determining the line colors (orange and blue). I had to create an additional column to specify the color: orange for when Serena was ranked number one, and blue for when she was not.
+Joining the two plots and ensuring they aligned closely was time-consuming. I spent considerable time adjusting the margins to ensure that both charts were joined seamlessly and that the years matched properly.
+
`,e.githubCompareUpdatesUrl&&(t+=`View all changes to this article since it was first published.`),t+=` + If you see mistakes or want to suggest changes, please create an issue on GitHub.
+ `);const n=e.journal;return'undefined'!=typeof n&&'Distill'===n.title&&(t+=` +Diagrams and text are licensed under Creative Commons Attribution CC-BY 4.0 with the source available on GitHub, unless noted otherwise. The figures that have been reused from other sources don’t fall under this license and can be recognized by a note in their caption: “Figure from …”.
+ `),'undefined'!=typeof e.publishedDate&&(t+=` +For attribution in academic contexts, please cite this work as
+${e.concatenatedAuthors}, "${e.title}", Distill, ${e.publishedYear}.+
BibTeX citation
+${m(e)}+ `),t}var An=Math.sqrt,En=Math.atan2,Dn=Math.sin,Mn=Math.cos,On=Math.PI,Un=Math.abs,In=Math.pow,Nn=Math.LN10,jn=Math.log,Rn=Math.max,qn=Math.ceil,Fn=Math.floor,Pn=Math.round,Hn=Math.min;const zn=['Sunday','Monday','Tuesday','Wednesday','Thursday','Friday','Saturday'],Bn=['Jan.','Feb.','March','April','May','June','July','Aug.','Sept.','Oct.','Nov.','Dec.'],Wn=(e)=>10>e?'0'+e:e,Vn=function(e){const t=zn[e.getDay()].substring(0,3),n=Wn(e.getDate()),i=Bn[e.getMonth()].substring(0,3),a=e.getFullYear().toString(),d=e.getUTCHours().toString(),r=e.getUTCMinutes().toString(),o=e.getUTCSeconds().toString();return`${t}, ${n} ${i} ${a} ${d}:${r}:${o} Z`},$n=function(e){const t=Array.from(e).reduce((e,[t,n])=>Object.assign(e,{[t]:n}),{});return t},Jn=function(e){const t=new Map;for(var n in e)e.hasOwnProperty(n)&&t.set(n,e[n]);return t};class Qn{constructor(e){this.name=e.author,this.personalURL=e.authorURL,this.affiliation=e.affiliation,this.affiliationURL=e.affiliationURL,this.affiliations=e.affiliations||[]}get firstName(){const e=this.name.split(' ');return e.slice(0,e.length-1).join(' ')}get lastName(){const e=this.name.split(' ');return e[e.length-1]}}class Gn{constructor(){this.title='unnamed article',this.description='',this.authors=[],this.bibliography=new Map,this.bibliographyParsed=!1,this.citations=[],this.citationsCollected=!1,this.journal={},this.katex={},this.publishedDate=void 0}set url(e){this._url=e}get url(){if(this._url)return this._url;return this.distillPath&&this.journal.url?this.journal.url+'/'+this.distillPath:this.journal.url?this.journal.url:void 0}get githubUrl(){return this.githubPath?'https://github.com/'+this.githubPath:void 0}set previewURL(e){this._previewURL=e}get previewURL(){return this._previewURL?this._previewURL:this.url+'/thumbnail.jpg'}get publishedDateRFC(){return Vn(this.publishedDate)}get updatedDateRFC(){return Vn(this.updatedDate)}get publishedYear(){return this.publishedDate.getFullYear()}get publishedMonth(){return Bn[this.publishedDate.getMonth()]}get publishedDay(){return this.publishedDate.getDate()}get publishedMonthPadded(){return Wn(this.publishedDate.getMonth()+1)}get publishedDayPadded(){return Wn(this.publishedDate.getDate())}get publishedISODateOnly(){return this.publishedDate.toISOString().split('T')[0]}get volume(){const e=this.publishedYear-2015;if(1>e)throw new Error('Invalid publish date detected during computing volume');return e}get issue(){return this.publishedDate.getMonth()+1}get concatenatedAuthors(){if(2
tag. We found the following text: '+t);const n=document.createElement('span');n.innerHTML=e.nodeValue,e.parentNode.insertBefore(n,e),e.parentNode.removeChild(e)}}}}).observe(this,{childList:!0})}}var Ti='undefined'==typeof window?'undefined'==typeof global?'undefined'==typeof self?{}:self:global:window,_i=f(function(e,t){(function(e){function t(){this.months=['jan','feb','mar','apr','may','jun','jul','aug','sep','oct','nov','dec'],this.notKey=[',','{','}',' ','='],this.pos=0,this.input='',this.entries=[],this.currentEntry='',this.setInput=function(e){this.input=e},this.getEntries=function(){return this.entries},this.isWhitespace=function(e){return' '==e||'\r'==e||'\t'==e||'\n'==e},this.match=function(e,t){if((void 0==t||null==t)&&(t=!0),this.skipWhitespace(t),this.input.substring(this.pos,this.pos+e.length)==e)this.pos+=e.length;else throw'Token mismatch, expected '+e+', found '+this.input.substring(this.pos);this.skipWhitespace(t)},this.tryMatch=function(e,t){return(void 0==t||null==t)&&(t=!0),this.skipWhitespace(t),this.input.substring(this.pos,this.pos+e.length)==e},this.matchAt=function(){for(;this.input.length>this.pos&&'@'!=this.input[this.pos];)this.pos++;return!('@'!=this.input[this.pos])},this.skipWhitespace=function(e){for(;this.isWhitespace(this.input[this.pos]);)this.pos++;if('%'==this.input[this.pos]&&!0==e){for(;'\n'!=this.input[this.pos];)this.pos++;this.skipWhitespace(e)}},this.value_braces=function(){var e=0;this.match('{',!1);for(var t=this.pos,n=!1;;){if(!n)if('}'==this.input[this.pos]){if(0 =k&&(++x,i=k);if(d[x]instanceof n||d[T-1].greedy)continue;w=T-x,y=e.slice(i,k),v.index-=i}if(v){g&&(h=v[1].length);var S=v.index+h,v=v[0].slice(h),C=S+v.length,_=y.slice(0,S),L=y.slice(C),A=[x,w];_&&A.push(_);var E=new n(o,u?a.tokenize(v,u):v,b,v,f);A.push(E),L&&A.push(L),Array.prototype.splice.apply(d,A)}}}}}return d},hooks:{all:{},add:function(e,t){var n=a.hooks.all;n[e]=n[e]||[],n[e].push(t)},run:function(e,t){var n=a.hooks.all[e];if(n&&n.length)for(var d,r=0;d=n[r++];)d(t)}}},i=a.Token=function(e,t,n,i,a){this.type=e,this.content=t,this.alias=n,this.length=0|(i||'').length,this.greedy=!!a};if(i.stringify=function(e,t,n){if('string'==typeof e)return e;if('Array'===a.util.type(e))return e.map(function(n){return i.stringify(n,t,e)}).join('');var d={type:e.type,content:i.stringify(e.content,t,n),tag:'span',classes:['token',e.type],attributes:{},language:t,parent:n};if('comment'==d.type&&(d.attributes.spellcheck='true'),e.alias){var r='Array'===a.util.type(e.alias)?e.alias:[e.alias];Array.prototype.push.apply(d.classes,r)}a.hooks.run('wrap',d);var l=Object.keys(d.attributes).map(function(e){return e+'="'+(d.attributes[e]||'').replace(/"/g,'"')+'"'}).join(' ');return'<'+d.tag+' class="'+d.classes.join(' ')+'"'+(l?' '+l:'')+'>'+d.content+''+d.tag+'>'},!t.document)return t.addEventListener?(t.addEventListener('message',function(e){var n=JSON.parse(e.data),i=n.language,d=n.code,r=n.immediateClose;t.postMessage(a.highlight(d,a.languages[i],i)),r&&t.close()},!1),t.Prism):t.Prism;var d=document.currentScript||[].slice.call(document.getElementsByTagName('script')).pop();return d&&(a.filename=d.src,document.addEventListener&&!d.hasAttribute('data-manual')&&('loading'===document.readyState?document.addEventListener('DOMContentLoaded',a.highlightAll):window.requestAnimationFrame?window.requestAnimationFrame(a.highlightAll):window.setTimeout(a.highlightAll,16))),t.Prism}();e.exports&&(e.exports=n),'undefined'!=typeof Ti&&(Ti.Prism=n),n.languages.markup={comment://,prolog:/<\?[\w\W]+?\?>/,doctype://i,cdata://i,tag:{pattern:/<\/?(?!\d)[^\s>\/=$<]+(?:\s+[^\s>\/=]+(?:=(?:("|')(?:\\\1|\\?(?!\1)[\w\W])*\1|[^\s'">=]+))?)*\s*\/?>/i,inside:{tag:{pattern:/^<\/?[^\s>\/]+/i,inside:{punctuation:/^<\/?/,namespace:/^[^\s>\/:]+:/}},"attr-value":{pattern:/=(?:('|")[\w\W]*?(\1)|[^\s>]+)/i,inside:{punctuation:/[=>"']/}},punctuation:/\/?>/,"attr-name":{pattern:/[^\s>\/]+/,inside:{namespace:/^[^\s>\/:]+:/}}}},entity:/?[\da-z]{1,8};/i},n.hooks.add('wrap',function(e){'entity'===e.type&&(e.attributes.title=e.content.replace(/&/,'&'))}),n.languages.xml=n.languages.markup,n.languages.html=n.languages.markup,n.languages.mathml=n.languages.markup,n.languages.svg=n.languages.markup,n.languages.css={comment:/\/\*[\w\W]*?\*\//,atrule:{pattern:/@[\w-]+?.*?(;|(?=\s*\{))/i,inside:{rule:/@[\w-]+/}},url:/url\((?:(["'])(\\(?:\r\n|[\w\W])|(?!\1)[^\\\r\n])*\1|.*?)\)/i,selector:/[^\{\}\s][^\{\};]*?(?=\s*\{)/,string:{pattern:/("|')(\\(?:\r\n|[\w\W])|(?!\1)[^\\\r\n])*\1/,greedy:!0},property:/(\b|\B)[\w-]+(?=\s*:)/i,important:/\B!important\b/i,function:/[-a-z0-9]+(?=\()/i,punctuation:/[(){};:]/},n.languages.css.atrule.inside.rest=n.util.clone(n.languages.css),n.languages.markup&&(n.languages.insertBefore('markup','tag',{style:{pattern:/(
+
+
+ ${e.map(l).map((e)=>`
`)}}const Mi=`
+d-citation-list {
+ contain: layout style;
+}
+
+d-citation-list .references {
+ grid-column: text;
+}
+
+d-citation-list .references .title {
+ font-weight: 500;
+}
+`;class Oi extends HTMLElement{static get is(){return'd-citation-list'}connectedCallback(){this.hasAttribute('distill-prerendered')||(this.style.display='none')}set citations(e){x(this,e)}}var Ui=f(function(e){var t='undefined'==typeof window?'undefined'!=typeof WorkerGlobalScope&&self instanceof WorkerGlobalScope?self:{}:window,n=function(){var e=/\blang(?:uage)?-(\w+)\b/i,n=0,a=t.Prism={util:{encode:function(e){return e instanceof i?new i(e.type,a.util.encode(e.content),e.alias):'Array'===a.util.type(e)?e.map(a.util.encode):e.replace(/&/g,'&').replace(/e.length)break tokenloop;if(!(y instanceof n)){c.lastIndex=0;var v=c.exec(y),w=1;if(!v&&f&&x!=d.length-1){if(c.lastIndex=i,v=c.exec(e),!v)break;for(var S=v.index+(g?v[1].length:0),C=v.index+v[0].length,T=x,k=i,p=d.length;T
+
+`);class Ni extends ei(Ii(HTMLElement)){renderContent(){if(this.languageName=this.getAttribute('language'),!this.languageName)return void console.warn('You need to provide a language attribute to your
Footnotes
+
+`,!1);class Fi extends qi(HTMLElement){connectedCallback(){super.connectedCallback(),this.list=this.root.querySelector('ol'),this.root.style.display='none'}set footnotes(e){if(this.list.innerHTML='',e.length){this.root.style.display='';for(const t of e){const e=document.createElement('li');e.id=t.id+'-listing',e.innerHTML=t.innerHTML;const n=document.createElement('a');n.setAttribute('class','footnote-backlink'),n.textContent='[\u21A9]',n.href='#'+t.id,e.appendChild(n),this.list.appendChild(e)}}else this.root.style.display='none'}}const Pi=ti('d-hover-box',`
+
+
+