diff --git a/story/IMG/Surfer_position.png b/story/IMG/Surfer_position.png
index acf118b..ed33e35 100644
Binary files a/story/IMG/Surfer_position.png and b/story/IMG/Surfer_position.png differ
diff --git a/story/NestedValueForConnection.Rmd b/story/NestedValueForConnection.Rmd
index d7ab50d..84c9f47 100644
--- a/story/NestedValueForConnection.Rmd
+++ b/story/NestedValueForConnection.Rmd
@@ -83,7 +83,7 @@ data %>%
   plot.margin=grid::unit(c(0,0,0,0), "cm"),
   ) +
   ggplot2::annotate("text", x = -150, y = -45, hjust = 0, size = 11, label = paste("Where surfers live."), color = "Black") +
-  ggplot2::annotate("text", x = -150, y = -51, hjust = 0, size = 8, label = paste("data-to-viz.com | 200,000 #surf tweets recovered"), color = "black", alpha = 0.5) +
+  ggplot2::annotate("text", x = -150, y = -60, hjust = 0, size = 8, label = paste("data-to-viz.com | 200,000 #surf tweets recovered"), color = "black", alpha = 0.5) +
   xlim(-180,180) +
   ylim(-60,80) +
   scale_x_continuous(expand = c(0.006, 0.006)) +
diff --git a/story/NestedValueForConnection.html b/story/NestedValueForConnection.html
index 08e3dd3..8b391ad 100644
--- a/story/NestedValueForConnection.html
+++ b/story/NestedValueForConnection.html
@@ -376,7 +376,7 @@
data %>%
- mutate( highlight=ifelse(name=="Amanda", "Amanda", "Other")) %>%
- ggplot( aes(x=year, y=n, group=name, color=highlight, linewidth=highlight)) +
- geom_line() +
- scale_color_manual(values = c("#69b3a2", "lightgrey")) +
- scale_size_manual(values=c(1.5,0.2)) +
- theme(legend.position="none") +
- ggtitle("Popularity of American names in the previous 30 years") +
- theme_ipsum() +
- geom_label(x = 1990, y = 55000, label = "Amanda reached 3550\nbabies in 1970", size = 4, color = "#69b3a2")
This is a good way to describe the behavior of a specific group in the dataset.
@@ -567,8 +571,8 @@A variation of the stacked area graph is the percent
stacked area graph. It is the same thing, but the values of each group are
normalized at each time stamp. This makes it possible to study the percentage of
@@ -587,8 +591,8 @@
Visualizing a unique Numeric variable
-Visualizing a unique Numeric variable
+
This document gives a few suggestions to analyse a dataset composed by a unique numeric variable.
It considers the night price of about 10,000 Airbnb appartements on the French Riviera in France.
This example dataset has been downloaded from the Airbnb website and is available on this Github repository. Basically it looks like the table beside.
This document gives a few suggestions for analysing a
+dataset composed of a single numeric variable.
It considers the
+nightly price of about 10,000 Airbnb
+apartments on the French Riviera.
This example dataset
+was downloaded from the Airbnb website and
+is available in this GitHub
+repository. It looks like the table beside.
# Libraries
-library(tidyverse)
-library(hrbrthemes)
-library(kableExtra)
-options(knitr.table.format = "html")
-
-# Load dataset from github
-data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/1_OneNum.csv", header=TRUE)
-
-# show data
-data %>% head(6) %>% kable() %>%
- kable_styling(bootstrap_options = "striped", full_width = F)
# Libraries
+library(tidyverse)
+library(hrbrthemes)
+library(kableExtra)
+options(knitr.table.format = "html")
+
+# Load dataset from github
+data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/1_OneNum.csv", header=TRUE)
+
+# show data
+data %>% head(6) %>% kable() %>%
+ kable_styling(bootstrap_options = "striped", full_width = F)
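Before drawing anything, a quick numeric summary and a look at how the values would be binned can help you choose histogram settings. The sketch below uses base R only. It assumes simulated right-skewed prices instead of the downloaded Airbnb file, so it runs offline. With the real data, use the numeric column of data instead.

```r
# Hypothetical stand-in for the Airbnb night prices: simulated, right-skewed values
set.seed(1)
price <- round(rlnorm(10000, meanlog = 4.5, sdlog = 0.6))

# Minimum, quartiles, mean and maximum
print(summary(price))

# breaks = "FD" asks hist() for about as many bins as the Freedman-Diaconis rule
# (bin width 2 * IQR / n^(1/3)) suggests; pretty() then rounds the break points.
h <- hist(price, breaks = "FD", plot = FALSE)
print(length(h$counts))   # number of bins
print(head(h$counts))     # observations in the first bins
```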
#Histogram ***
The mos
-You can learn more about each type of graphic presented in this story in the dedicated sections. Click the icon below:
-Any thoughts on this? Found any mistake? Have another way to show the data? Please drop me a word on Twitter or in the comment section below:
-+ You can learn more about each type of graphic presented in this + story in the dedicated sections. Click the icon below: +
+ ++ Data To Viz is a + comprehensive classification of chart types organized by data + input format. Get a high-resolution version of our decision tree + delivered to your inbox now! +
+A work by Yan Holtz for data-to-viz.com
@@ -375,28 +428,28 @@Visualizing the world population
-Visualizing the world population
+
This document gives a few suggestions to analyse a nested
or hierarchical
dataset in which a numeric value is available for each leaf. This kind of data has an origine node that gives birth to subsequent nodes and so on until the final leaves.
This document gives a few suggestions for analysing a
+nested
or hierarchical
dataset in which a
+numeric value is available for each leaf. This kind of data has a
+root (origin) node that gives birth to subsequent nodes, and so on down to the
+final leaves.
Take the world population of 250 countries as an example. The world is divided in continent (group), continent are divided in regions (subgroup), and regions are divided in countries. In this tree structure, countries are considered as leaves: they are at the end of the branches.
+Take the world population of 250 countries as an example. The world
+is divided into continents (groups), continents are divided into regions
+(subgroups), and regions are divided into countries. In this tree
+structure, countries are the leaves: they sit at the end of the
+branches.
Data come from wikipedia, formatted thanks to these 2 pages. (1, 2). A clean .csv
file is available on github. It looks like that:
# Libraries
-library(tidyverse)
-library(hrbrthemes)
-library(kableExtra)
-options(knitr.table.format = "html")
-library(viridis)
-
-# Load dataset from github
-data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/11_SevCatOneNumNestedOneObsPerGroup.csv", header=T, sep=";")
-data[ which(data$value==-1),"value"] <- 1
-colnames(data) <- c("Continent", "Region", "Country", "Pop")
-
-# show data
-data %>% head(3) %>% kable() %>%
- kable_styling(bootstrap_options = "striped", full_width = F)
Data come from Wikipedia, formatted using these 2 pages (1, 2). A clean
+.csv
file is available on GitHub.
+It looks like this:
# Libraries
+library(tidyverse)
+library(hrbrthemes)
+library(kableExtra)
+options(knitr.table.format = "html")
+library(viridis)
+
+# Load dataset from github
+data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/11_SevCatOneNumNestedOneObsPerGroup.csv", header=T, sep=";")
+data[ which(data$value==-1),"value"] <- 1
+colnames(data) <- c("Continent", "Region", "Country", "Pop")
+
+# show data
+data %>% head(3) %>% kable() %>%
+ kable_styling(bootstrap_options = "striped", full_width = F)
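Values are stored only on the leaves, so the total for a region or a continent is the sum of its countries. Below is a minimal base R sketch of that roll-up. It uses a toy data frame with the same four columns; the group names and numbers are made up for illustration.

```r
# Toy hierarchy with the same columns as the real file: Continent > Region > Country
data <- data.frame(
  Continent = c("Continent A", "Continent A", "Continent A", "Continent B", "Continent B"),
  Region    = c("Region 1", "Region 1", "Region 2", "Region 3", "Region 3"),
  Country   = c("Country a", "Country b", "Country c", "Country d", "Country e"),
  Pop       = c(67, 11, 5, 100, 36)
)

# Leaf values summed at the region level...
by_region <- aggregate(Pop ~ Continent + Region, data = data, FUN = sum)
# ...then the region totals summed again at the continent level
by_continent <- aggregate(Pop ~ Continent, data = by_region, FUN = sum)

print(by_region)
print(by_continent)
```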
-qu’ma
+pâtes
-sparrow
+riche
-lyrics
+m’éteindre
-riche
+échec
-dé
+noir
-fasses
+complete
-j’suis
+temps
-souffres
+emprunt
-seum
+corps
-tuerait
+m’ont
-l’bois
+nekfeu
-d’un
+valeurs
-brassens
+demande
-d’avoir
+j’ai
-d’hélène
+above
-l’pornographe
+posthume
-georges
+valent
-sète
+vague
This kind of data can be represented using a heatmap (2d):
-don %>%
- na.omit() %>%
- ggplot(aes(x=as.numeric(long), y=lat, fill=altitude)) +
- geom_tile() +
- scale_fill_viridis() +
- theme_ipsum() +
- xlab("longitude") +
- ylab("latitude")
This kind of data can be represented using a heatmap
+(2d):
+don %>%
+ na.omit() %>%
+ ggplot(aes(x=as.numeric(long), y=lat, fill=altitude)) +
+ geom_tile() +
+ scale_fill_viridis() +
+ theme_ipsum() +
+ xlab("longitude") +
+ ylab("latitude")
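geom_tile() expects one row per cell, but volcano (the built-in elevation grid used by the surface plot) is a matrix. The code that builds don is not shown here. The base R sketch below is one possible way to reshape the matrix into long format; the column names row, col and altitude are hypothetical.

```r
# volcano: built-in 87 x 61 matrix of elevations (datasets package)
dims <- dim(volcano)

# One row per grid cell. expand.grid() varies its first argument fastest,
# which matches R's column-major storage, so as.vector(volcano) lines up.
long_volcano <- expand.grid(row = seq_len(dims[1]), col = seq_len(dims[2]))
long_volcano$altitude <- as.vector(volcano)

print(head(long_volcano, 3))
```

You can then map the three columns to x, y and fill in geom_tile().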
Another way is to build a surface plot. It really makes sense to use 3D in this special case since it allows to visualize the real shape of the volcano:
-plot_ly(z = volcano, type = "surface")
Another way is to build a surface plot. Using 3D really makes sense
+in this special case, since it shows the real shape of
+the volcano:
@@ -670,28 +767,28 @@Apartment price vs ground living area.
-Apartment price vs ground living area.
+
This document gives a few suggestions to analyse a dataset composed by two numeric variables. It considers the price of 1460 apartements (SalePrice
) and their ground living area (GrLivArea
).
This dataset comes from a kaggle machine learning competition. The two variables studied here are available on Github repository. Basically it looks like the table beside.
This document gives a few suggestions for analysing a
+dataset composed of two numeric variables. It considers the price of
+1460 apartments (SalePrice
) and their ground living area
+(GrLivArea
).
This dataset comes from a Kaggle
+machine learning competition. The two variables studied here are
+available in this GitHub
+repository. It looks like the table beside.
# Libraries
-library(tidyverse)
-library(hrbrthemes)
-library(kableExtra)
-options(knitr.table.format = "html")
-library(viridis)
-library(ggExtra)
-library(patchwork)
-
-# Load dataset from github
-data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/2_TwoNum.csv", header=T, sep=",") %>% select(GrLivArea, SalePrice)
-
-# show data
-data %>% head(6) %>% kable() %>%
- kable_styling(bootstrap_options = "striped", full_width = F)
# Libraries
+library(tidyverse)
+library(hrbrthemes)
+library(kableExtra)
+options(knitr.table.format = "html")
+library(viridis)
+library(ggExtra)
+library(patchwork)
+
+# Load dataset from github
+data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/2_TwoNum.csv", header=T, sep=",") %>% select(GrLivArea, SalePrice)
+
+# show data
+data %>% head(6) %>% kable() %>%
+ kable_styling(bootstrap_options = "striped", full_width = F)
As usual when working with numeric variables, it is always a good practice to check their distributions. Here Prices and Ground living areas are on two different scales so it makes sense to study them in two different graphics. This can be done using a histogram or a density plot.
-p1 <- data %>%
- ggplot( aes(x=GrLivArea)) +
- geom_histogram(fill="#69b3a2", color="#e9ecef", alpha=0.9, bins=50) +
- ggtitle("Ground living area distribution") +
- theme_ipsum() +
- theme(
- plot.title = element_text(size=12)
- ) +
- xlab('area')
-
-p2 <- data %>%
- ggplot( aes(x=SalePrice/1000)) +
- geom_histogram(fill="#69b3a2", color="#e9ecef", alpha=0.9, bins=50) +
- ggtitle("Sale price distribution") +
- theme_ipsum() +
- theme(
- plot.title = element_text(size=12)
- )+
- xlab('Sale price (k$)')
-
-p1 + p2
#Distribution *** As usual when working with numeric variables, it is
+always good practice to check their distributions. Here prices and
+ground living areas are on different scales, so it makes sense to
+study them in two separate graphics. This can be done using a histogram or a density plot.
+p1 <- data %>%
+ ggplot( aes(x=GrLivArea)) +
+ geom_histogram(fill="#69b3a2", color="#e9ecef", alpha=0.9, bins=50) +
+ ggtitle("Ground living area distribution") +
+ theme_ipsum() +
+ theme(
+ plot.title = element_text(size=12)
+ ) +
+ xlab('area')
+
+p2 <- data %>%
+ ggplot( aes(x=SalePrice/1000)) +
+ geom_histogram(fill="#69b3a2", color="#e9ecef", alpha=0.9, bins=50) +
+ ggtitle("Sale price distribution") +
+ theme_ipsum() +
+ theme(
+ plot.title = element_text(size=12)
+ )+
+ xlab('Sale price (k$)')
+
+p1 + p2
This allows to understand that most of the prices range between 100 and 300 k$ with extreme values reaching 750 k$.
-The next step is to study the relationship between the 2 variables. Basically to explore if there is a correlation between sale price and living area. The first chart type to try in this case is the scatterplot.
-data %>%
- ggplot( aes(x=GrLivArea, y=SalePrice/1000)) +
- geom_point(color="#69b3a2", alpha=0.8) +
- ggtitle("Ground living area partially explains sale price of apartments") +
- theme_ipsum() +
- theme(
- plot.title = element_text(size=12)
- ) +
- ylab('Sale price (k$)') +
- xlab('Ground living area')
It is quite obvious that there is a relationship between prices and ground living area.
The previous graphic convey most of the information efficiently. Still, there are a few customizations that can be done to make the chart even more insightful:
+This shows that most prices range between 100
+and 300 k$, with extreme values reaching 750 k$.
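You can check a statement like "most prices range between 100 and 300 k$" numerically. The sketch below uses base R. It assumes simulated right-skewed prices in k$ in place of SalePrice/1000, so the printed numbers are illustrative only.

```r
# Hypothetical stand-in for SalePrice / 1000 (k$): simulated, right-skewed values
set.seed(3)
price_k <- rlnorm(1460, meanlog = log(165), sdlog = 0.4)

# Share of sales between 100 and 300 k$
share_100_300 <- mean(price_k >= 100 & price_k <= 300)

# Range covering the central 90% of sales, and the most extreme value
central_90 <- quantile(price_k, probs = c(0.05, 0.95))

print(round(share_100_300, 2))
print(round(central_90))
print(round(max(price_k)))
```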
+#Scatterplot *** The next step is to study the relationship between
+the 2 variables, that is, to explore whether there is a correlation between
+sale price and living area. The first chart type to try in this case is
+the scatterplot.
+data %>%
+ ggplot( aes(x=GrLivArea, y=SalePrice/1000)) +
+ geom_point(color="#69b3a2", alpha=0.8) +
+ ggtitle("Ground living area partially explains sale price of apartments") +
+ theme_ipsum() +
+ theme(
+ plot.title = element_text(size=12)
+ ) +
+ ylab('Sale price (k$)') +
+ xlab('Ground living area')
+The scatterplot shows a clear positive relationship between price and
+ground living area: larger apartments tend to sell for more.
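A correlation coefficient puts a number on that impression. The base R sketch below assumes simulated area and price values with a built-in linear trend, rather than the Kaggle data.

```r
# Hypothetical stand-in for GrLivArea (area) and SalePrice / 1000 (price, k$):
# a linear trend plus random noise
set.seed(42)
area  <- runif(500, 500, 3000)
price <- 20 + 0.1 * area + rnorm(500, sd = 30)

# Pearson: strength of the linear relationship
pearson <- cor(area, price)
# Spearman: correlation of the ranks, less sensitive to a few extreme prices
spearman <- cor(area, price, method = "spearman")

print(round(c(pearson = pearson, spearman = spearman), 2))
```

Pearson's coefficient measures linear association. Spearman's works on ranks, so a few very expensive apartments weigh less on it.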
#Improving the scatter plot {.tabset} *** The previous graphic conveys
+most of the information efficiently. Still, a few
+customizations can make the chart even more
+insightful:
trend line
with confidence interval to illustrate and clarify the relationship
interactive
version to get more information concerning each data point.
trend line
with confidence interval to
+illustrate and clarify the relationship
interactive
version to get more information
+concerning each data point.
marginal distribution
Help the reader seing the trend on the chart by showing it explicitely. Several models exist to show a trend. A linear regression is used on the left plot, and a local regression is used on the right. Showing the confindence interval is a good practice as well.
-p <- data %>%
- ggplot( aes(x=GrLivArea, y=SalePrice/1000)) +
- geom_point(color="#69b3a2", alpha=0.8) +
- theme_ipsum() +
- theme(
- plot.title = element_text(size=12)
- ) +
- ylab('Sale price (k$)') +
- xlab('Ground living area')
-
-p1 <- p + ggtitle("Linear regression") +
- geom_smooth(method='lm', color="black", alpha=0.8, size=0.5, fill="skyblue", se=FALSE)
-
-p2 <- p + ggtitle("Loess") +
- geom_smooth(method='loess', color="black", alpha=0.8, size=0.5, fill="skyblue")
-
-p1 + p2
##Trend Help the reader see the trend on the chart by showing it
+explicitly. Several models exist to show a trend. A linear
+regression is used on the left plot, and a local
+regression is used on the right. Showing the confidence interval is
+good practice as well; here it is drawn for the local regression only,
+since se=FALSE hides it on the left plot.
+p <- data %>%
+ ggplot( aes(x=GrLivArea, y=SalePrice/1000)) +
+ geom_point(color="#69b3a2", alpha=0.8) +
+ theme_ipsum() +
+ theme(
+ plot.title = element_text(size=12)
+ ) +
+ ylab('Sale price (k$)') +
+ xlab('Ground living area')
+
+p1 <- p + ggtitle("Linear regression") +
+ geom_smooth(method='lm', color="black", alpha=0.8, linewidth=0.5, fill="skyblue", se=FALSE)
+
+p2 <- p + ggtitle("Loess") +
+ geom_smooth(method='loess', color="black", alpha=0.8, linewidth=0.5, fill="skyblue")
+
+p1 + p2
## `geom_smooth()` using formula = 'y ~ x'
+## `geom_smooth()` using formula = 'y ~ x'
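Under the hood, method='lm' and method='loess' fit ordinary models from the stats package. For loess, geom_smooth() uses span = 0.75 by default. The sketch below fits both models directly. It assumes simulated data in place of the apartments, and computes the 95% confidence band that geom_smooth() would draw with se = TRUE.

```r
# Hypothetical stand-in for the apartment data (area, price in k$)
set.seed(42)
d <- data.frame(area = runif(500, 500, 3000))
d$price <- 20 + 0.1 * d$area + rnorm(500, sd = 30)

# The two models behind method = 'lm' and method = 'loess'
fit_lm    <- lm(price ~ area, data = d)
fit_loess <- loess(price ~ area, data = d, span = 0.75)

# Predictions on a grid inside the data range
grid <- data.frame(area = seq(600, 2900, by = 100))
# 95% confidence band around the linear fit
band <- predict(fit_lm, newdata = grid, interval = "confidence", level = 0.95)
# Local regression curve
smooth <- predict(fit_loess, newdata = grid)

print(head(cbind(grid, band, loess = smooth)))
```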
Scatter plot is probably the chart type for which it makes the most sense to use interactivity. For instance, it allows to hover a dot to have more information about it. It allows to zoom on a specific part of the chart as well.
-# Plotly allows to turn any ggplot2 graphic interactive
-library(plotly)
-
-p <- data %>%
- mutate(text=paste("Apartment Number: ", seq(1:nrow(data)), "\nLocation: New York\nAny other information you need..", sep="")) %>%
- ggplot( aes(x=GrLivArea, y=SalePrice/1000, text=text)) +
- geom_point(color="#69b3a2", alpha=0.8) +
- ggtitle("Ground living area partially explains sale price of apartments") +
- theme_ipsum() +
- theme(
- plot.title = element_text(size=12)
- ) +
- ylab('Sale price (k$)') +
- xlab('Ground living area')
-
-ggplotly(p, tooltip="text")
If the number of data points on the scatterplot is high, it is a good practice to display the marginal distributions arount the graphic.
-data %>%
- ggplot( aes(x=GrLivArea, y=SalePrice)) +
- geom_point() #%>%
##Interactivity The scatter plot is probably the chart type for which
+interactivity makes the most sense. For instance, hovering over a dot can
+reveal more information about it, and zooming on a
+specific part of the chart is possible as well.
+# Plotly allows to turn any ggplot2 graphic interactive
+library(plotly)
+
+p <- data %>%
+ mutate(text=paste("Apartment Number: ", seq_len(nrow(data)), "\nLocation: New York\nAny other information you need..", sep="")) %>%
+ ggplot( aes(x=GrLivArea, y=SalePrice/1000, text=text)) +
+ geom_point(color="#69b3a2", alpha=0.8) +
+ ggtitle("Ground living area partially explains sale price of apartments") +
+ theme_ipsum() +
+ theme(
+ plot.title = element_text(size=12)
+ ) +
+ ylab('Sale price (k$)') +
+ xlab('Ground living area')
+
+ggplotly(p, tooltip="text")
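The tooltip is just a character column, built row by row before the plot is handed to ggplotly(). The base R sketch below uses two made-up apartments and builds a more informative label than the placeholder text above. The "sq ft" unit is an assumption based on the Kaggle data description.

```r
# Two made-up apartments, for illustration only
area  <- c(1500L, 2000L)      # ground living area
price <- c(200000, 250000)    # sale price in $

# One label per point; "\n" separates the lines, as in the code above
tooltip <- sprintf("Apartment %d\nArea: %d sq ft\nPrice: %.1f k$",
                   seq_along(area), area, price / 1000)
print(tooltip)
```

Mapping such a column to the text aesthetic and calling ggplotly(p, tooltip="text") shows it on hover, as in the example above.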
##Marginal distribution If the number of data points on the
+scatterplot is high, it is good practice to display the marginal
+distributions around the graphic.
+ #ggMarginal(type="histogram")
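In the code above, the ggMarginal() call is commented out, so no margins are drawn. A likely reason is operator precedence. %>% binds more tightly than +, so ggplot(...) + geom_point() %>% ggMarginal() would pipe only the geom_point() layer. The sketch below stores the plot first. It assumes ggplot2 and ggExtra are installed and uses simulated data in place of the apartments.

```r
library(ggplot2)
library(ggExtra)

# Hypothetical stand-in for the GrLivArea / SalePrice data
set.seed(42)
data <- data.frame(GrLivArea = runif(300, 500, 3000))
data$SalePrice <- 20000 + 100 * data$GrLivArea + rnorm(300, sd = 30000)

# 1. Build and store the scatterplot
p <- ggplot(data, aes(x = GrLivArea, y = SalePrice)) +
  geom_point()

# 2. Pass the finished plot to ggMarginal(): histograms on both margins
pm <- ggMarginal(p, type = "histogram")

# 3. Draw it
print(pm)
```

Wrapping the whole plot in parentheses before piping, (ggplot(...) + geom_point()) %>% ggMarginal(type="histogram"), should work too.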
The most common pitfall with scatterplot is overplotting: when the sample size gets big, dots are plotted on top of each other what makes the chart unreadable. There are several work around to avoid this issue as describe in this specific post. Here is a summary of the different offered techniques:
-# code for all graphics:
-p <- data %>%
- ggplot( aes(x=GrLivArea, y=SalePrice/1000)) +
- theme_ipsum() +
- theme(
- plot.title = element_text(size=12)
- ) +
- ylab('Sale price (k$)') +
- xlab('Ground living area')
-
-# Reduce dot size
-p1 <- p + geom_point(color="#69b3a2", alpha=0.8, size=0.2) + ggtitle("Dot size")
-
-# Use density estimate
-p2 <- p + geom_density2d(color="#69b3a2") + ggtitle("Density 2d: contour")
-
-# Use density estimate (area)
-p3 <- p + stat_density_2d(aes(fill = ..level..), geom = "polygon") + ggtitle("Density 2d: area") + theme(legend.position="none")
-
-# With raster
-p4 <- p +
- stat_density_2d(aes(fill = ..density..), geom = "raster", contour = FALSE) +
- scale_fill_distiller(palette=4, direction=1) +
- scale_x_continuous(expand = c(0, 0)) +
- scale_y_continuous(expand = c(0, 0)) +
- theme(
- legend.position='none'
- ) +
- ggtitle("Density 2d: raster") +
- xlim(0,2500) +
- ylim(0,400)
-
-# Hexbin
-p5 <- p + geom_hex() +
- scale_fill_viridis() +
- theme(legend.position="none") +
- ggtitle("Hexbin")
-
-# 2d histogram
-p6 <- p + geom_bin2d( ) +
- scale_fill_viridis( ) +
- theme(legend.position="none") +
- ggtitle("2d histogram")
-
-p1 + p2 + p3 + p4 + p5 + p6 + plot_layout(ncol = 2)
#Overplotting *** The most common pitfall with scatterplots is
+overplotting: when the sample size gets big, dots are plotted on top of
+each other, which makes the chart unreadable. There are several
+workarounds to avoid this issue, as described in this specific
+post. Here is a summary of the techniques it offers:
+# code for all graphics:
+p <- data %>%
+ ggplot( aes(x=GrLivArea, y=SalePrice/1000)) +
+ theme_ipsum() +
+ theme(
+ plot.title = element_text(size=12)
+ ) +
+ ylab('Sale price (k$)') +
+ xlab('Ground living area')
+
+# Reduce dot size
+p1 <- p + geom_point(color="#69b3a2", alpha=0.8, size=0.2) + ggtitle("Dot size")
+
+# Use density estimate
+p2 <- p + geom_density2d(color="#69b3a2") + ggtitle("Density 2d: contour")
+
+# Use density estimate (area)
+p3 <- p + stat_density_2d(aes(fill = after_stat(level)), geom = "polygon") + ggtitle("Density 2d: area") + theme(legend.position="none")
+
+# With raster
+p4 <- p +
+ stat_density_2d(aes(fill = after_stat(density)), geom = "raster", contour = FALSE) +
+ scale_fill_distiller(palette=4, direction=1) +
+ scale_x_continuous(expand = c(0, 0), limits = c(0, 2500)) +
+ scale_y_continuous(expand = c(0, 0), limits = c(0, 400)) +
+ theme(
+ legend.position='none'
+ ) +
+ ggtitle("Density 2d: raster")
+
+# Hexbin
+p5 <- p + geom_hex() +
+ scale_fill_viridis() +
+ theme(legend.position="none") +
+ ggtitle("Hexbin")
+
+# 2d histogram
+p6 <- p + geom_bin2d( ) +
+ scale_fill_viridis( ) +
+ theme(legend.position="none") +
+ ggtitle("2d histogram")
+
+p1 + p2 + p3 + p4 + p5 + p6 + plot_layout(ncol = 2)
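A 2d histogram simply counts how many points fall in each rectangular cell of a grid; geom_bin2d() uses 30 bins per axis by default. The base R sketch below does that counting with 10 bins per axis, using simulated points as a stand-in for the apartments.

```r
# Hypothetical stand-in for the apartments: 2000 points, area vs price (k$)
set.seed(7)
area  <- runif(2000, 0, 2500)
price <- pmin(pmax(0.1 * area + rnorm(2000, sd = 40), 0), 400)  # kept within 0-400

# Cut each axis into 10 equal-width bins...
x_bin <- cut(area,  breaks = seq(0, 2500, length.out = 11), include.lowest = TRUE)
y_bin <- cut(price, breaks = seq(0, 400,  length.out = 11), include.lowest = TRUE)

# ...and count the points falling in each cell of the 10 x 10 grid
counts <- table(x_bin, y_bin)

print(dim(counts))
print(max(counts))  # count in the busiest cell
```

Coloring each cell by its count gives the 2d histogram. The hexbin variant does the same with hexagonal cells.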
@@ -564,28 +644,28 @@Evolution of the bitcoin price
-Evolution of the bitcoin price
+
This document gives a few suggestions to analyse a dataset composed by two ordered numeric variables. It considers the evolution of the bitcoin price between April 2013 and April 2018.
This dataset has been built using the crypto R package that allows to access the CoinMarketCap website. The first column, date
, represents an ordered numeric variable. The second, value
gives the bitcoin price. This dataset is available on github.
This document gives a few suggestions for analysing a
+dataset composed of two ordered numeric variables. It considers the
+evolution of the bitcoin price between
+April 2013 and April 2018.
This dataset was built using
+the crypto R package,
+which gives access to the CoinMarketCap website. The first
+column, date
, represents an ordered numeric variable. The
+second, value
gives the bitcoin price. This dataset is
+available on GitHub.
# Libraries
-library(tidyverse)
-library(hrbrthemes)
-library(DT)
-library(plotly)
-library(kableExtra)
-options(knitr.table.format = "html")
-
-# Load dataset from github
-data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/3_TwoNumOrdered.csv", header=T)
-data$date <- as.Date(data$date)
-
-# Show long format
-data %>%
- head(5) %>%
- kable() %>%
- kable_styling(bootstrap_options = "striped", full_width = F)
# Libraries
+library(tidyverse)
+library(hrbrthemes)
+library(DT)
+library(plotly)
+library(kableExtra)
+options(knitr.table.format = "html")
+
+# Load dataset from github
+data <- read.table("https://raw.githubusercontent.com/holtzy/data_to_viz/master/Example_dataset/3_TwoNumOrdered.csv", header=T)
+data$date <- as.Date(data$date)
+
+# Show long format
+data %>%
+ head(5) %>%
+ kable() %>%
+ kable_styling(bootstrap_options = "striped", full_width = F)
The most comon way to represent that kind of dataset is probably to produce a line plot. It allows to give a good overview of the bitcoin price on the period, reaching a value of 20,000 $ in december 2017.
-data %>%
- ggplot( aes(x=date, y=value)) +
- geom_line(color="#69b3a2") +
- ggtitle("Evolution of Bitcoin price") +
- ylab("bitcoin price ($)") +
- theme_ipsum()
#Line plot *** The most common way to represent this kind of dataset
+is probably a line plot. It gives a
+good overview of the bitcoin price over the period, which peaked close to
+$20,000 in December 2017.
+data %>%
+ ggplot( aes(x=date, y=value)) +
+ geom_line(color="#69b3a2") +
+ ggtitle("Evolution of Bitcoin price") +
+ ylab("bitcoin price ($)") +
+ theme_ipsum()
Even if the line chart is a very good way to convey the information, when only one variable is represented like here, the plot can appear to be slightly empty. Thus, an interesting alternative is the area chart. It is basically the same thing, except that the area between the X axis and the line is filled.
-data %>%
- ggplot( aes(x=date, y=value)) +
- geom_area(fill="#69b3a2", alpha=0.5) +
- geom_line(color="#69b3a2") +
- ggtitle("Evolution of Bitcoin price") +
- ylab("bitcoin price ($)") +
- theme_ipsum()
#Area chart *** Even though the line chart is a very good way to convey
+the information, when only one variable is represented, as here, the
+plot can look slightly empty. An interesting alternative
+is the area chart. It is basically the same thing, except
+that the area between the X axis and the line is filled.
+data %>%
+ ggplot( aes(x=date, y=value)) +
+ geom_area(fill="#69b3a2", alpha=0.5) +
+ geom_line(color="#69b3a2") +
+ ggtitle("Evolution of Bitcoin price") +
+ ylab("bitcoin price ($)") +
+ theme_ipsum()
Note that using the same color for the filled area and the line is often a good looking choice, with a bit more transparency for the filled area.
-Using interactivity gives a great added value to your line or area chart. Indeed, it is very useful to ba able to zoom on a specific time slot of interest on the graphic. For instance, have a look to what happened in 2014: bitcoin already experienced a huge evolution, comparable to 2017 in terme of relative evolution, but not mediatised at all.
-p <- data %>%
- ggplot( aes(x=date, y=value)) +
- geom_area(fill="#69b3a2", alpha=0.5) +
- geom_line(color="#69b3a2") +
- ggtitle("Evolution of Bitcoin price") +
- ylab("bitcoin price ($)") +
- theme_ipsum()
-ggplotly(p)
If you have just a few point in your dataset, you probably want to use a connected scatterplot instead. it is basically the same thing where each individual point is represented. It greatly helps to understand when observation have been made. Let’s consider the last 10 observation of the bitcoin dataset:
-data %>%
- tail(10) %>%
- ggplot( aes(x=date, y=value)) +
- geom_area(fill="#69b3a2", alpha=0.5) +
- geom_line(color="#69b3a2") +
- geom_point() +
- ggtitle("Evolution of Bitcoin price") +
- ylab("bitcoin price ($)") +
- theme_ipsum()
Note that using the same color for the filled area and the line is
+often a good-looking choice, with a bit more transparency for the filled
+area.
+#Interactivity *** Interactivity adds great value to
+your line or area chart. Indeed, it is very useful to be able to zoom on
+a specific time slot of interest. For instance, have a
+look at what happened in 2014: bitcoin had already gone through a huge
+swing, comparable to 2017 in relative terms, but it received hardly
+any media coverage.
+p <- data %>%
+ ggplot( aes(x=date, y=value)) +
+ geom_area(fill="#69b3a2", alpha=0.5) +
+ geom_line(color="#69b3a2") +
+ ggtitle("Evolution of Bitcoin price") +
+ ylab("bitcoin price ($)") +
+ theme_ipsum()
+ggplotly(p)
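"Comparable in relative terms" refers to the ratio between the peak and the starting price, not the dollar difference. The toy base R sketch below uses made-up prices. It shows why two rallies of equal relative size look very different on a linear axis.

```r
# Two made-up rallies (USD), for illustration only
rally <- data.frame(
  period = c("earlier rally", "later rally"),
  start  = c(100, 1000),
  peak   = c(1000, 10000)
)

# Absolute gain in dollars versus relative gain (peak / start - 1)
rally$absolute_gain   <- rally$peak - rally$start
rally$relative_change <- rally$peak / rally$start - 1

print(rally)
```

Both rallies multiply the price by 10. On a linear Y axis, however, the later one looks ten times taller, which is why zooming on the earlier period is revealing.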
#Few points? Use connected scatter *** If you have just a few points
+in your dataset, you probably want to use a connected
+scatterplot instead. It is basically the same thing, except that each
+individual point is drawn. This makes it easy to see when the
+observations were made. Let’s consider the last 10 observations of
+the bitcoin dataset:
+data %>%
+ tail(10) %>%
+ ggplot( aes(x=date, y=value)) +
+ geom_area(fill="#69b3a2", alpha=0.5) +
+ geom_line(color="#69b3a2") +
+ geom_point() +
+ ggtitle("Evolution of Bitcoin price") +
+ ylab("bitcoin price ($)") +
+ theme_ipsum()
The previous chart can be a bit frustrating. It is indeed hard to study the evolution of the bitcoin in this graphic since the price ranges between 7,500 and 10,000 dollars in this period, when the Y axis ranges between 0 and 10,000. In this case, it is a good practice to cut the Y axis to zoom on the variation. This subject is subject to many debates in the dataviz community and you can read more about it in the dedicated page.
-data %>%
- tail(10) %>%
- ggplot( aes(x=date, y=value)) +
- geom_ribbon(aes(ymin=8000, ymax=value), fill="#69b3a2", color="transparent", alpha=0.5) +
- geom_line(color="#69b3a2") +
- geom_point() +
- ggtitle("Evolution of Bitcoin price") +
- ylab("bitcoin price ($)") +
- theme_ipsum()
#To cut or not to cut? *** The previous chart can be a bit
+frustrating. It is indeed hard to study the evolution of the bitcoin price in
+this graphic, since the price ranges between 7,500 and 10,000 dollars in
+this period while the Y axis ranges between 0 and 10,000. In this case,
+it is good practice to cut the Y axis to zoom on the variation. This
+topic is much debated in the dataviz community, and you can
+read more about it on the dedicated page.
+data %>%
+ tail(10) %>%
+ ggplot( aes(x=date, y=value)) +
+ geom_ribbon(aes(ymin=8000, ymax=value), fill="#69b3a2", color="transparent", alpha=0.5) +
+ geom_line(color="#69b3a2") +
+ geom_point() +
+ ggtitle("Evolution of Bitcoin price") +
+ ylab("bitcoin price ($)") +
+ theme_ipsum()
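The ribbon above starts at a hard-coded 8000, yet the text mentions prices down to about 7,500 in this period. Any value below the baseline would be drawn below it instead of above. The small base R sketch below uses made-up prices and derives the baseline from the data instead.

```r
# Made-up closing prices (USD) for 10 days, for illustration only
value <- c(9400, 9100, 8650, 8900, 9700, 9300, 8800, 8300, 8700, 9050)

# Round the minimum down to a multiple of 500 so every value sits above the baseline
step <- 500
baseline <- floor(min(value) / step) * step
print(baseline)
```

The computed value can then replace the constant, as in geom_ribbon(aes(ymin = baseline, ymax = value)).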
Sometimes one can be interested in comparing the value to a specific threshold. In this case, you can fill the area depending on this threshold, with 2 different colors depending if the value is over or below the threshold:
-data %>%
- tail(300) %>%
- mutate( mycolor=ifelse(value>7500, "yes", "no")) %>%
- ggplot( aes(x=date, y=value)) +
- geom_ribbon(aes(ymin=7500, ymax=value, fill=mycolor), color="black", alpha=0.5) +
- scale_fill_manual(values=c("#69b3a2","#271569")) +
- ggtitle("Evolution of Bitcoin price") +
- ylab("bitcoin price ($)") +
- theme_ipsum() +
- theme(legend.position="none")
#Comparing to a limit *** Sometimes you may want to compare
+the value to a specific threshold. In this case, you can fill
+the area with 2 different colors depending on whether
+the value is above or below the threshold:
+data %>%
+ tail(300) %>%
+ mutate( mycolor=ifelse(value>7500, "yes", "no")) %>%
+ ggplot( aes(x=date, y=value)) +
+ geom_ribbon(aes(ymin=7500, ymax=value, fill=mycolor), color="black", alpha=0.5) +
+ scale_fill_manual(values=c("#69b3a2","#271569")) +
+ ggtitle("Evolution of Bitcoin price") +
+ ylab("bitcoin price ($)") +
+ theme_ipsum() +
+ theme(legend.position="none")
Note: this graphic is imperfect and must be improved (don’t understand the behavior of geom_ribbon
)
-Note: this graphic is imperfect and must be improved (don’t
+understand the behavior of geom_ribbon
)
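A plausible explanation for the odd ribbon: mapping fill to mycolor splits the data into two groups, and geom_ribbon() draws one polygon per group. Each color's ribbon is therefore stretched across the dates that belong to the other color. One common workaround is to keep every date in both ribbons and clip each one at the threshold. The base R sketch below computes those bounds with made-up prices.

```r
# Made-up prices (USD) around a 7,500 threshold, for illustration only
value <- c(7200, 7400, 7800, 8100, 7900, 7300, 7100, 7600)
threshold <- 7500

# Upper ribbon: from the threshold up to the price; zero height on days below it
above <- pmax(value, threshold)
# Lower ribbon: from the price up to the threshold; zero height on days above it
below <- pmin(value, threshold)

print(data.frame(value, above, below))
```

These columns can feed two geom_ribbon() layers, aes(ymin = threshold, ymax = above) and aes(ymin = below, ymax = threshold), each with its own fill color. Segments that cross the threshold still show a small kink unless the exact crossing points are added as extra rows.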