---
output:
  bookdown::html_document2:
    fig_caption: yes
editor_options:
  chunk_output_type: console
---
```{r echo = FALSE, cache = FALSE}
# This block needs cache=FALSE to set fig.width and fig.height, and have those
# persist across cached builds.
source("utils.R", local = TRUE)
knitr::opts_chunk$set(fig.width = 3, fig.height = 3)
```

Axes {#CHAPTER-AXES}
====

The x- and y-axes provide context for interpreting the displayed data. ggplot will display the axes with defaults that look good in most cases, but you might want to control, for example, the axis labels, the number and placement of tick marks, the tick mark labels, and so on. In this chapter, I'll cover how to fine-tune the appearance of the axes.

Swapping X- and Y-Axes {#RECIPE-AXES-SWAP-AXES}
----------------------

### Problem

You want to swap the x- and y-axes on a graph.

### Solution

Use `coord_flip()` to flip the axes (Figure \@ref(fig:FIG-AXES-SWAP-AXES)):

```{r FIG-AXES-SWAP-AXES, fig.show="hold", fig.cap="A box plot with regular axes (left); With swapped axes (right)"}
ggplot(PlantGrowth, aes(x = group, y = weight)) +
  geom_boxplot()

ggplot(PlantGrowth, aes(x = group, y = weight)) +
  geom_boxplot() +
  coord_flip()
```

### Discussion

For a scatter plot, it is trivial to change what goes on the vertical axis and what goes on the horizontal axis: just exchange the variables mapped to x and y. But not all the geoms in ggplot treat the x- and y-axes equally. For example, box plots summarize the data along the y-axis, the lines in line graphs move in only one direction along the x-axis, error bars have a single *x* value and a range of *y* values, and so on. If you're using these geoms and want them to behave as though the axes are swapped, `coord_flip()` is what you need.

Sometimes when the axes are swapped, the order of items will be the reverse of what you want. On a graph with standard x- and y-axes, the *x* items start at the left and go to the right, which corresponds to the normal way of reading, from left to right. When you swap the axes, the items still go from the origin outward, which in this case will be from bottom to top -- but this conflicts with the normal way of reading, from top to bottom. Sometimes this is a problem, and sometimes it isn't. If the *x* variable is a factor, the order can be reversed by using `scale_x_discrete()` with `limits = rev(levels(...))`, as in Figure \@ref(fig:FIG-AXES-SWAP-AXES-REVLEVELS):

```{r FIG-AXES-SWAP-AXES-REVLEVELS, fig.cap="A box plot with swapped axes and x-axis order reversed"}
ggplot(PlantGrowth, aes(x = group, y = weight)) +
  geom_boxplot() +
  coord_flip() +
  scale_x_discrete(limits = rev(levels(PlantGrowth$group)))
```
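
If you'd rather change the data than the scale, another option is to reverse the factor levels themselves before plotting. The axis then follows the new level order, and no `scale_x_discrete()` call is needed. This is a minimal sketch; `pg_rev` is just an illustrative name for the modified copy of `PlantGrowth`:

```r
library(ggplot2)

# Copy the data and reverse the order of the factor levels
pg_rev <- PlantGrowth
pg_rev$group <- factor(pg_rev$group, levels = rev(levels(pg_rev$group)))

levels(pg_rev$group)

# The axis follows the new level order; with coord_flip(), ctrl ends up on top
ggplot(pg_rev, aes(x = group, y = weight)) +
  geom_boxplot() +
  coord_flip()
```

Changing the levels affects every graph (and legend) that uses `pg_rev`. Setting `limits` in the scale affects only the one graph.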

### See Also

If the variable is continuous, see Recipe \@ref(RECIPE-AXES-REVERSE) to reverse the direction.

Setting the Range of a Continuous Axis {#RECIPE-AXES-RANGE}
--------------------------------------

### Problem

You want to set the range (or limits) of an axis.

### Solution

You can use `xlim()` or `ylim()` to set the minimum and maximum values of a continuous axis. Figure \@ref(fig:FIG-AXES-RANGE) shows one graph with the default *y* limits, and one with manually set *y* limits:

```{r FIG-AXES-RANGE, fig.show="hold", fig.cap="Box plot with default range (left); With manually set range (right)"}
pg_plot <- ggplot(PlantGrowth, aes(x = group, y = weight)) +
  geom_boxplot()

# Display the basic graph
pg_plot

pg_plot +
  ylim(0, max(PlantGrowth$weight))
```

The latter example sets the *y* range from 0 to the maximum value of the `weight` column, though a constant value (like 10) could instead be used as the maximum.

### Discussion

`ylim()` is shorthand for setting the limits with `scale_y_continuous()`. (The same is true for `xlim()` and `scale_x_continuous()`.) The following are equivalent:

```{r eval=FALSE}
ylim(0, 10)
scale_y_continuous(limits = c(0, 10))
```

Sometimes you will need to set other properties of `scale_y_continuous()`, and in these cases using `ylim()` and `scale_y_continuous()` together may result in some unexpected behavior, because only the last of the directives will have an effect. In these two examples, `ylim(0, 10)` should set the *y* range from 0 to 10, and `scale_y_continuous(breaks = c(0, 5, 10))` should put tick marks at 0, 5, and 10. However, in both cases, only the second directive has any effect, and ggplot2 prints a message saying that the existing *y* scale will be replaced:

```{r eval=FALSE}
pg_plot +
  ylim(0, 10) +
  scale_y_continuous(breaks = c(0, 5, 10))

pg_plot +
  scale_y_continuous(breaks = c(0, 5, 10)) +
  ylim(0, 10)
```
To make both changes work, get rid of `ylim()` and set both limits and breaks in `scale_y_continuous()`:
```{r eval=FALSE}
pg_plot +
  scale_y_continuous(limits = c(0, 10), breaks = c(0, 5, 10))
```
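
To see which directive wins without drawing anything, you can inspect the trained *y* scale. `layer_scales()` returns the scales ggplot2 builds for a panel, and the scale's `get_limits()` method reports its limits. Note that `get_limits()` is a method of ggplot2's ggproto scale objects rather than a regular exported function, so treat this sketch as a diagnostic aid:

```r
library(ggplot2)

pg_plot <- ggplot(PlantGrowth, aes(x = group, y = weight)) +
  geom_boxplot()

# ylim() first, then a second y scale: the second scale replaces the first,
# so the limits fall back to the range of the data
p_last_wins <- suppressMessages(
  pg_plot + ylim(0, 10) + scale_y_continuous(breaks = c(0, 5, 10))
)

# Limits and breaks set together in a single scale
p_combined <- pg_plot +
  scale_y_continuous(limits = c(0, 10), breaks = c(0, 5, 10))

layer_scales(p_last_wins)$y$get_limits()  # range of weight: 3.59 6.31
layer_scales(p_combined)$y$get_limits()   # 0 10
```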

In ggplot, there are two ways of setting the range of the axes. The first way is to modify the *scale*, and the second is to apply a *coordinate transform*. When you modify the limits of the *x* or *y* scale, any data outside of the limits is removed -- that is, the out-of-range data is not only not displayed, it is removed from consideration entirely. (It will also print a warning when this happens.)

With the box plots in these examples, if you restrict the *y* range so that some of the original data is clipped, the box plot statistics will be computed based on clipped data, and the shape of the box plots will change.

With a coordinate transform, the data is not clipped; in essence, it zooms in or out to the specified range. Figure \@ref(fig:FIG-AXES-RANGE-SCALE-COORD) shows the difference between the two methods:

```{r FIG-AXES-RANGE-SCALE-COORD, fig.show="hold", fig.cap='Smaller y range using a scale (data has been dropped, so the box plots have changed shape; left); "Zooming in" using a coordinate transform (right)'}
pg_plot +
  scale_y_continuous(limits = c(5, 6.5)) # Same as using ylim()

pg_plot +
  coord_cartesian(ylim = c(5, 6.5))
```
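
One way to see this difference is to compare the box plot statistics that ggplot2 computes in each case. The sketch below uses `layer_data()`, which returns the computed data for a layer; its `middle` column holds each box's median. The variable names are illustrative:

```r
library(ggplot2)

pg_plot <- ggplot(PlantGrowth, aes(x = group, y = weight)) +
  geom_boxplot()

# Statistics computed from all of the data
full_stats <- layer_data(pg_plot)

# Scale limits: rows outside 5-6.5 are dropped *before* the statistics are
# computed (the warning about removed rows is suppressed here)
scale_stats <- suppressWarnings(
  layer_data(pg_plot + scale_y_continuous(limits = c(5, 6.5)))
)

# Coordinate limits: the statistics are untouched; only the view changes
coord_stats <- layer_data(pg_plot + coord_cartesian(ylim = c(5, 6.5)))

# Compare the medians of the three groups
data.frame(
  group = levels(PlantGrowth$group),
  all_data = full_stats$middle,
  scale_limits = scale_stats$middle,
  coord_limits = coord_stats$middle
)
```

With the scale limits, the medians shift because the points below 5 were dropped before the boxes were computed. With `coord_cartesian()`, the medians match those of the full data.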
Finally, it's also possible to *expand* the range in one direction, using `expand_limits()` (Figure \@ref(fig:FIG-AXES-RANGE-EXPAND)). You can't use this to shrink the range, however:
```{r FIG-AXES-RANGE-EXPAND, fig.cap="Box plot on which y range has been expanded to include 0"}
pg_plot +
  expand_limits(y = 0)
```
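
To check what `expand_limits()` does to the range, you can again inspect the trained *y* scale with `layer_scales()` and the scale object's `get_limits()` method (a ggproto method rather than a regular exported function). A sketch showing that the range grows to include 0 but does not shrink:

```r
library(ggplot2)

pg_plot <- ggplot(PlantGrowth, aes(x = group, y = weight)) +
  geom_boxplot()

# Default limits: the range of the data
layer_scales(pg_plot)$y$get_limits()                          # 3.59 6.31

# Expanding to include 0 extends the lower limit
layer_scales(pg_plot + expand_limits(y = 0))$y$get_limits()   # 0.00 6.31

# Values already inside the data range do not shrink it
layer_scales(pg_plot + expand_limits(y = c(5, 6)))$y$get_limits()  # 3.59 6.31
```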

Reversing a Continuous Axis {#RECIPE-AXES-REVERSE}
---------------------------

### Problem

You want to reverse the direction of a continuous axis.

### Solution

Use `scale_y_reverse()` or `scale_x_reverse()` (Figure \@ref(fig:FIG-AXES-REVERSE)). The direction of an axis can also be reversed by specifying the limits in reversed order, with the maximum first, then the minimum:

```{r FIG-AXES-REVERSE, fig.cap="Box plot with reversed y-axis"}
ggplot(PlantGrowth, aes(x = group, y = weight)) +
  geom_boxplot() +
  scale_y_reverse()

# Similar effect by specifying limits in reversed order
ggplot(PlantGrowth, aes(x = group, y = weight)) +
  geom_boxplot() +
  ylim(6.5, 3.5)
```
### Discussion
Like `scale_y_continuous()`, `scale_y_reverse()` does not work with `ylim()`. (The same is true for the x-axis properties.) If you want to reverse an axis *and* set its range, you must do it within the `scale_y_reverse()` statement, by setting the limits in reversed order (Figure \@ref(fig:FIG-AXES-REVERSE-LIMITS)):
```{r FIG-AXES-REVERSE-LIMITS, fig.cap="Box plot with reversed y-axis with manually set limits"}
ggplot(PlantGrowth, aes(x = group, y = weight)) +
  geom_boxplot() +
  scale_y_reverse(limits = c(8, 0))
```
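
Both approaches rely on the same mechanism. A reversed scale applies a "reverse" transformation, which turns each value x into -x before any statistics are computed, and `layer_data()` reports values in that transformed space. The sketch below checks this by comparing box plot medians; the variable names are illustrative:

```r
library(ggplot2)

pg_plot <- ggplot(PlantGrowth, aes(x = group, y = weight)) +
  geom_boxplot()

# Medians in the untransformed plot
plain_medians <- layer_data(pg_plot)$middle

# scale_y_reverse() and reversed ylim() both negate the data internally
reversed_medians <- layer_data(pg_plot + scale_y_reverse())$middle
ylim_medians <- layer_data(pg_plot + ylim(6.5, 3.5))$middle

rbind(plain_medians, reversed_medians, ylim_medians)
```

Because both versions set up the same transformation on the scale, adding a separate `ylim()` afterward would replace that scale, which is why the limits have to go inside `scale_y_reverse()`.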

### See Also

To reverse the order of items on a *discrete* axis, see Recipe \@ref(RECIPE-AXIS-ORDER).

Changing the Order of Items on a Categorical Axis {#RECIPE-AXIS-ORDER}
-------------------------------------------------

### Problem

You want to change the order of items on a categorical axis.

### Solution

For a categorical (or discrete) axis -- one with a factor mapped to it -- the order of items can be changed by setting limits in `scale_x_discrete()` or `scale_y_discrete()`.

To manually set the order of items on the axis, specify limits with a vector of the levels in the desired order. You can also omit items with this vector, as shown in Figure \@ref(fig:FIG-AXES-ORDER-MANUAL), left:

```{r FIG-AXES-ORDER-MANUAL-1, eval=FALSE}
pg_plot <- ggplot(PlantGrowth, aes(x = group, y = weight)) +
  geom_boxplot()

pg_plot +
  scale_x_discrete(limits = c("trt1", "ctrl", "trt2"))
```
### Discussion
You can also use this method to display a subset of the items on the axis. This will show only `ctrl` and `trt1` (Figure \@ref(fig:FIG-AXES-ORDER-MANUAL), right). Note that because data is removed, it will emit a warning when you do this.
```{r FIG-AXES-ORDER-MANUAL-2, eval=FALSE}
pg_plot +
  scale_x_discrete(limits = c("ctrl", "trt1"))
#> Warning: Removed 10 rows containing missing values (stat_boxplot).
```
```{r FIG-AXES-ORDER-MANUAL, ref.label=c("FIG-AXES-ORDER-MANUAL-1", "FIG-AXES-ORDER-MANUAL-2"), echo=FALSE, fig.cap="Box plot with manually specified items on the x-axis (left); With only two items (right)", warning=FALSE}
```
To reverse the order, set `limits = rev(levels(...))`, and put the factor inside. This will reverse the order of the `PlantGrowth$group` factor, as shown in Figure \@ref(fig:FIG-AXES-ORDER-REV):
```{r FIG-AXES-ORDER-REV, fig.cap="Box plot with order reversed on the x-axis"}
pg_plot +
  scale_x_discrete(limits = rev(levels(PlantGrowth$group)))
```
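
You can inspect the effect of `limits` on a discrete scale the same way as for continuous scales. In this sketch, `layer_scales()` shows the order the scale uses. `layer_data()` then shows that only two boxes are computed once `trt2` is left out of the limits:

```r
library(ggplot2)

pg_plot <- ggplot(PlantGrowth, aes(x = group, y = weight)) +
  geom_boxplot()

# Reversed order: the scale's limits are simply the reversed levels
p_rev <- pg_plot +
  scale_x_discrete(limits = rev(levels(PlantGrowth$group)))
layer_scales(p_rev)$x$get_limits()

# A subset of the levels: rows for "trt2" become missing and are dropped,
# so only two boxes are computed (the warning is suppressed here)
p_subset <- pg_plot +
  scale_x_discrete(limits = c("ctrl", "trt1"))
subset_stats <- suppressWarnings(layer_data(p_subset))
nrow(subset_stats)
```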

### See Also

To reorder factor levels based on data values from another column, see Recipe \@ref(RECIPE-DATAPREP-FACTOR-REORDER-VALUE).

Setting the Scaling Ratio of the X- and Y-Axes {#RECIPE-AXES-SCALE}
----------------------------------------------

### Problem

You want to set the ratio at which the x- and y-axes are scaled.

### Solution

Use `coord_fixed()`. This will result in a 1:1 scaling between the x- and y-axes, as shown in Figure \@ref(fig:FIG-AXES-SCALE-EQUAL):

```{r FIG-AXES-SCALE-EQUAL-1, eval=FALSE}
library(gcookbook) # Load gcookbook for the marathon data set

m_plot <- ggplot(marathon, aes(x = Half, y = Full)) +
  geom_point()

m_plot +
  coord_fixed()
```

### Discussion

The marathon data set contains runners' marathon and half-marathon times. In this case it might be useful to force the x- and y-axes to have the same scaling.

It's also helpful to set the tick spacing to be the same, by setting breaks in `scale_y_continuous()` and `scale_x_continuous()` (also in Figure \@ref(fig:FIG-AXES-SCALE-EQUAL)):

```{r FIG-AXES-SCALE-EQUAL-2, eval=FALSE}
m_plot +
  coord_fixed() +
  scale_y_continuous(breaks = seq(0, 420, 30)) +
  scale_x_continuous(breaks = seq(0, 420, 30))
```
```{r FIG-AXES-SCALE-EQUAL, ref.label=c("FIG-AXES-SCALE-EQUAL-1", "FIG-AXES-SCALE-EQUAL-2"), echo=FALSE, fig.show="hold", fig.cap="Scatter plot with equal scaling of axes (left); With tick marks at specified positions (right)", fig.width=3, fig.height=4}
```
If, instead of an equal ratio, you want some other fixed ratio between the axes, set the ratio parameter. With the marathon data, we might want the axis with half-marathon times stretched out to twice that of the axis with the marathon times (Figure \@ref(fig:FIG-AXES-SCALE-HALF)). We'll also add tick marks twice as often on the x-axis:
```{r FIG-AXES-SCALE-HALF, fig.cap="Scatter plot with a 1/2 scaling ratio for the axes", fig.width=4, fig.height=4}
m_plot +
  coord_fixed(ratio = 1/2) +
  scale_y_continuous(breaks = seq(0, 420, 30)) +
  scale_x_continuous(breaks = seq(0, 420, 15))
```
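
Here is what `ratio` means numerically. As far as I understand ggplot2's fixed coordinate system, the panel's height-to-width ratio equals `ratio` times the span of the displayed *y* range divided by the span of the displayed *x* range. The displayed ranges include ggplot2's default padding. The helper `panel_aspect()` below is a hypothetical name used only to illustrate that arithmetic; it is not part of ggplot2, and the ranges are illustrative:

```r
# Hypothetical helper (not part of ggplot2): height/width of the plotting
# panel under coord_fixed(ratio), given the displayed x and y ranges
panel_aspect <- function(x_range, y_range, ratio = 1) {
  ratio * diff(y_range) / diff(x_range)
}

# Equal scaling: a y range twice as long as the x range gives a panel
# twice as tall as it is wide
panel_aspect(c(0, 210), c(0, 420), ratio = 1)    # 2

# ratio = 1/2: each y unit is drawn half as long as each x unit,
# so the same ranges now give a square panel
panel_aspect(c(0, 210), c(0, 420), ratio = 1/2)  # 1
```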

Setting the Positions of Tick Marks {#RECIPE-AXES-SET-TICKS}
-----------------------------------

### Problem

You want to set where the tick marks appear on the axis.

### Solution

Usually ggplot does a good job of deciding where to put the tick marks, but if you want to change them, set `breaks` in the scale (Figure \@ref(fig:FIG-AXES-SET-TICKS)):

```{r FIG-AXES-SET-TICKS, fig.show="hold", fig.cap="Box plot with automatic tick marks (left); With manually set tick marks (right)"}
ggplot(PlantGrowth, aes(x = group, y = weight)) +
  geom_boxplot()

ggplot(PlantGrowth, aes(x = group, y = weight)) +
  geom_boxplot() +
  scale_y_continuous(breaks = c(4, 4.25, 4.5, 5, 6, 8))
```

### Discussion

The location of the tick marks defines where *major* grid lines are drawn. If the axis represents a continuous variable, *minor* grid lines, which are fainter and unlabeled, will by default be drawn halfway between each major grid line.

You can also use the `seq()` function or the `:` operator to generate vectors for tick marks:

```{r}
seq(4, 7, by = .5)
5:10
```
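
`breaks` also accepts a function. ggplot2 calls it with the axis limits, and it should return the break positions. This is handy when you want a fixed spacing without hardcoding the endpoints. The name `half_unit_breaks` is just an illustrative choice:

```r
library(ggplot2)

# Breaks every 0.5 units, covering whatever limits the scale ends up with
half_unit_breaks <- function(limits) {
  seq(floor(limits[1]), ceiling(limits[2]), by = 0.5)
}

# Calling it directly with the range of the PlantGrowth weights
half_unit_breaks(range(PlantGrowth$weight))

ggplot(PlantGrowth, aes(x = group, y = weight)) +
  geom_boxplot() +
  scale_y_continuous(breaks = half_unit_breaks)
```

Breaks that fall outside the axis range are simply not drawn, so rounding the endpoints outward is harmless.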
If the axis is discrete instead of continuous, then there is by default a tick mark for each item. For discrete axes, you can change the order of items or remove them by specifying the limits (see Recipe \@ref(RECIPE-AXIS-ORDER)). Setting breaks will change which of the levels are labeled, but will not remove them or change their order. Figure \@ref(fig:FIG-AXES-SET-TICKS-DISCRETE) shows what happens when you set limits and breaks (the warning is because we're using only two of the three levels for `group` and therefore are dropping some rows):
```{r FIG-AXES-SET-TICKS-DISCRETE, fig.cap="For a discrete axis, setting limits reorders and removes items, and setting breaks controls which items have labels"}
# Set both limits and breaks for a discrete axis
ggplot(PlantGrowth, aes(x = group, y = weight)) +
  geom_boxplot() +
  scale_x_discrete(limits = c("trt2", "ctrl"), breaks = "ctrl")
```

### See Also

To remove the tick marks and labels (but not the data) from the graph, see Recipe \@ref(RECIPE-AXIS-REMOVE-TICKS).

Removing Tick Marks and Labels {#RECIPE-AXIS-REMOVE-TICKS}
------------------------------

### Problem

You want to remove tick marks and labels.

### Solution

To remove just the tick labels, as in Figure \@ref(fig:FIG-AXES-SET-TICKS-NONE) (left), use `theme(axis.text.y = element_blank())` (or do the same for `axis.text.x`). This will work for both continuous and categorical axes:

```{r FIG-AXES-SET-TICKS-NONE-1, eval=FALSE}
pg_plot <- ggplot(PlantGrowth, aes(x = group, y = weight)) +
  geom_boxplot()

pg_plot +
  theme(axis.text.y = element_blank())
```
To remove the tick marks, use `theme(axis.ticks = element_blank())`. This will remove the tick marks on both axes; to hide them on just one axis, use `axis.ticks.x` or `axis.ticks.y` instead. In this example, we'll hide all tick marks as well as the *y* tick labels (Figure \@ref(fig:FIG-AXES-SET-TICKS-NONE), center):
```{r FIG-AXES-SET-TICKS-NONE-2, eval=FALSE}
pg_plot +
  theme(axis.ticks = element_blank(), axis.text.y = element_blank())
```
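
The theme also has per-axis tick elements, `axis.ticks.x` and `axis.ticks.y`, which inherit from `axis.ticks`. Here is a minimal sketch that hides the tick marks and labels on the *y*-axis only, leaving the *x*-axis ticks in place (`p_no_y_ticks` is an illustrative name):

```r
library(ggplot2)

pg_plot <- ggplot(PlantGrowth, aes(x = group, y = weight)) +
  geom_boxplot()

# Remove the tick marks and tick labels on the y-axis only;
# the x-axis keeps its tick marks and labels
p_no_y_ticks <- pg_plot +
  theme(axis.ticks.y = element_blank(), axis.text.y = element_blank())

p_no_y_ticks
```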
To remove the tick marks, the labels, and the grid lines, set breaks to
`NULL` (Figure \@ref(fig:FIG-AXES-SET-TICKS-NONE), right):
```{r FIG-AXES-SET-TICKS-NONE-3, eval=FALSE}
pg_plot +
  scale_y_continuous(breaks = NULL)
```
(ref:cap-FIG-AXES-SET-TICKS-NONE) No tick labels on y-axis (left); No tick marks and no tick labels on y-axis (middle); With `breaks=NULL` (right)
```{r FIG-AXES-SET-TICKS-NONE, ref.label=c("FIG-AXES-SET-TICKS-NONE-1", "FIG-AXES-SET-TICKS-NONE-2", "FIG-AXES-SET-TICKS-NONE-3"), echo=FALSE, fig.show="hold", fig.cap="(ref:cap-FIG-AXES-SET-TICKS-NONE)"}
```

This will work for continuous axes only; if you remove items from a categorical axis using limits, as in Recipe \@ref(RECIPE-AXIS-ORDER), the data with that value won't be shown at all.

### Discussion

There are actually three related items that can be controlled: tick labels, tick marks, and the grid lines. For continuous axes, `ggplot()` normally places a tick label, tick mark, and major grid line at each value of breaks. For categorical axes, these things go at each value of limits.

The tick labels on each axis can be controlled independently. However, the tick marks and grid lines are both drawn at the breaks, so setting the breaks controls them all together.

Changing the Text of Tick Labels {#RECIPE-AXES-TICK-LABEL}
--------------------------------

### Problem

You want to change the text of tick labels.

### Solution

Consider the scatter plot in Figure \@ref(fig:FIG-AXES-TICK-LABEL), where height is reported in inches:

```{r FIG-AXES-TICK-LABEL-1, eval=FALSE}
library(gcookbook) # Load gcookbook for the heightweight data set

hw_plot <- ggplot(heightweight, aes(x = ageYear, y = heightIn)) +
  geom_point()

hw_plot
```
To set arbitrary labels, as in Figure \@ref(fig:FIG-AXES-TICK-LABEL) (right), pass values to breaks and labels in the scale. One of the labels has a newline (`\n`) character, which tells ggplot to put a line break there:
```{r FIG-AXES-TICK-LABEL-2, eval=FALSE}
hw_plot +
  scale_y_continuous(
    breaks = c(50, 56, 60, 66, 72),
    labels = c("Tiny", "Really\nshort", "Short", "Medium", "Tallish")
  )
```
```{r FIG-AXES-TICK-LABEL, ref.label=c("FIG-AXES-TICK-LABEL-1", "FIG-AXES-TICK-LABEL-2"), echo=FALSE, fig.show="hold", fig.cap="Scatter plot with automatic tick labels (left); With manually specified labels on the y-axis (right)"}
```
### Discussion
Instead of setting completely arbitrary labels, it is more common to have your data stored in one format, while wanting the labels to be displayed in another. We might, for example, want heights to be displayed in feet and inches (like 5'6") instead of just inches. To do this, we can define a *formatter* function, which takes in a value and returns the corresponding string. For example, this function will convert inches to feet and inches:
```{r}
footinch_formatter <- function(x) {
  foot <- floor(x/12)
  inch <- x %% 12
  return(paste(foot, "'", inch, "\"", sep = ""))
}
```
Here's what it returns for values 56--64 (the backslashes are there as escape characters, to distinguish the quotes *in* a string from the quotes that *delimit* a string):
```{r}
footinch_formatter(56:64)
```
Now we can pass our function to the scale, using the labels parameter (Figure \@ref(fig:FIG-AXES-TICK-LABEL-FORMATTER), left):
```{r FIG-AXES-TICK-LABEL-FORMATTER-1, eval=FALSE}
hw_plot +
  scale_y_continuous(labels = footinch_formatter)
```
Here, the automatic tick marks were placed every five inches, but that looks a little off for this data. We can instead have ggplot set tick marks every four inches, by specifying breaks (Figure \@ref(fig:FIG-AXES-TICK-LABEL-FORMATTER), right):
```{r FIG-AXES-TICK-LABEL-FORMATTER-2, eval=FALSE}
hw_plot +
  scale_y_continuous(breaks = seq(48, 72, 4), labels = footinch_formatter)
```
```{r FIG-AXES-TICK-LABEL-FORMATTER, ref.label=c("FIG-AXES-TICK-LABEL-FORMATTER-1", "FIG-AXES-TICK-LABEL-FORMATTER-2"), echo=FALSE, fig.show="hold", fig.cap="Scatter plot with a formatter function (left); With manually specified breaks on the y-axis (right)"}
```
Another common task is to convert time measurements to HH:MM:SS format, or something similar. This function will take numeric minutes and convert them to this format, rounding to the nearest second (it can be customized for your particular needs):
```{r}
timeHMS_formatter <- function(x) {
  total_s <- round(x * 60)                   # Round to nearest second first,
  h <- total_s %/% 3600                      # so seconds never round up to 60
  m <- (total_s %/% 60) %% 60
  s <- total_s %% 60
  lab <- sprintf("%02d:%02d:%02d", h, m, s)  # Format the strings as HH:MM:SS
  lab <- gsub("^00:", "", lab)               # Remove leading 00: if present
  lab <- gsub("^0", "", lab)                 # Remove leading 0 if present
  return(lab)
}
```
Running it on some sample numbers yields:
```{r}
timeHMS_formatter(c(.33, 50, 51.25, 59.32, 60, 60.1, 130.23))
```

The scales package, which is installed with ggplot2, comes with some built-in formatting functions:

* `comma()` adds commas to numbers, in the thousand, million, billion, etc. places.
* `dollar()` adds a dollar sign and rounds to the nearest cent.
* `percent()` multiplies by 100, rounds to the nearest integer, and adds a percent sign.
* `scientific()` gives numbers in scientific notation, like `3.30e+05`, for large and small numbers.

If you want to use these functions, you must first load the scales package, with `library(scales)`, or call them with the package prefix, as in `scales::comma`.
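
These formatters are passed to the `labels` argument of a scale, just like a hand-written formatter function. Here is a minimal sketch with a small made-up data frame; `sales` and its columns are purely illustrative:

```r
library(ggplot2)
library(scales)

# Illustrative data (made up for this example): large values and proportions
sales <- data.frame(
  region = c("North", "South", "West"),
  revenue = c(12000, 450000, 1500000),
  share = c(0.05, 0.30, 0.65)
)

# Thousands separators on the y-axis tick labels
ggplot(sales, aes(x = region, y = revenue)) +
  geom_col() +
  scale_y_continuous(labels = comma)

# Proportions displayed as percentages
ggplot(sales, aes(x = region, y = share)) +
  geom_col() +
  scale_y_continuous(labels = percent)

# The formatters are ordinary functions, so you can preview their output
comma(c(12000, 1500000))
```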

Changing the Appearance of Tick Labels {#RECIPE-AXES-TICK-LABEL-APPEARANCE}
--------------------------------------

### Problem

You want to change the appearance of tick labels.

### Solution

In Figure \@ref(fig:FIG-AXES-TICK-LABEL-ROTATE) (left), we've manually set the labels to be long enough that they overlap:

```{r FIG-AXES-TICK-LABEL-ROTATE-1, eval=FALSE}
pg_plot <- ggplot(PlantGrowth, aes(x = group, y = weight)) +
  geom_boxplot() +
  scale_x_discrete(
    breaks = c("ctrl", "trt1", "trt2"),
    labels = c("Control", "Treatment 1", "Treatment 2")
  )

pg_plot
```
To rotate the text 90 degrees counterclockwise (Figure \@ref(fig:FIG-AXES-TICK-LABEL-ROTATE), middle), use:
```{r FIG-AXES-TICK-LABEL-ROTATE-2, eval=FALSE}
pg_plot +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = .5))
```
Rotating the text 30 degrees (Figure \@ref(fig:FIG-AXES-TICK-LABEL-ROTATE), right) uses less vertical space and makes the labels easier to read without tilting your head:
```{r FIG-AXES-TICK-LABEL-ROTATE-3, eval=FALSE}
pg_plot +
  theme(axis.text.x = element_text(angle = 30, hjust = 1, vjust = 1))
```
```{r FIG-AXES-TICK-LABEL-ROTATE, ref.label=c("FIG-AXES-TICK-LABEL-ROTATE-1", "FIG-AXES-TICK-LABEL-ROTATE-2", "FIG-AXES-TICK-LABEL-ROTATE-3"), echo=FALSE, fig.show="hold", fig.cap="X-axis tick labels rotated 0 (left), 90 (middle), and 30 degrees (right)", fig.width=3.5, fig.height=3.5}
```
The `hjust` and `vjust` settings specify the horizontal alignment (left/center/right) and vertical alignment (top/middle/bottom).
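
In ggplot2 3.3.0 and later, the axis guide can also do the rotation for you. `guide_axis(angle = ...)` rotates the labels and picks a justification suited to that angle. `guide_axis(n.dodge = 2)` instead staggers the labels over two rows. Here is a sketch using the same long labels; the object names are illustrative:

```r
library(ggplot2)

pg_plot <- ggplot(PlantGrowth, aes(x = group, y = weight)) +
  geom_boxplot()

long_labels <- c("Control", "Treatment 1", "Treatment 2")

# Rotated labels; the guide chooses hjust/vjust for the given angle
p_angled <- pg_plot +
  scale_x_discrete(labels = long_labels, guide = guide_axis(angle = 30))

# Alternatively, stagger the labels across two rows instead of rotating them
p_dodged <- pg_plot +
  scale_x_discrete(labels = long_labels, guide = guide_axis(n.dodge = 2))

p_angled
p_dodged
```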

### Discussion

Besides rotation, other text properties, such as size, style (bold/italic/normal), and the font family (such as Times or Helvetica) can be set with `element_text()`, as shown in Figure \@ref(fig:FIG-AXES-TICK-LABEL-FONT):

```{r FIG-AXES-TICK-LABEL-FONT, fig.cap="X-axis tick labels with manually specified appearance"}
pg_plot +
  theme(
    axis.text.x = element_text(family = "Times", face = "italic",
                               colour = "darkred", size = rel(0.9))
  )
```

In this example, the size is set to `rel(0.9)`, which means that it is 0.9 times the size of the base font size for the theme.

These commands control the appearance of only the tick labels, on only one axis. They don't affect the other axis, the axis label, the overall title, or the legend. To control all of these at once, you can use the theming system, as discussed in Recipe \@ref(RECIPE-APPEARANCE-THEME).

### See Also

See Recipe \@ref(RECIPE-APPEARANCE-TEXT-APPEARANCE) for more about controlling the appearance of the text.

Changing the Text of Axis Labels {#RECIPE-AXES-AXIS-LABEL}
--------------------------------

### Problem

You want to change the text of axis labels.

### Solution

Use `xlab()` or `ylab()` to change the text of the axis labels (Figure \@ref(fig:FIG-AXES-AXIS-LABEL)):

```{r FIG-AXES-AXIS-LABEL, fig.show="hold", fig.cap="Scatter plot with the default axis labels (left); Manually specified labels for the x- and y-axes (right)", fig.width=4, fig.height=3.5}
library(gcookbook) # Load gcookbook for the heightweight data set

hw_plot <- ggplot(heightweight, aes(x = ageYear, y = heightIn, colour = sex)) +
  geom_point()

# With default axis labels
hw_plot

# Set the axis labels
hw_plot +
  xlab("Age in years") +
  ylab("Height in inches")
```

### Discussion

By default the graphs will just use the column names from the data frame as axis labels. This might be fine for exploring data, but for presenting it, you may want more descriptive axis labels.

Instead of `xlab()` and `ylab()`, you can use `labs()`:

```{r, eval=FALSE}
hw_plot +
  labs(x = "Age in years", y = "Height in inches")
```
Another way of setting the axis labels is in the scale specification, like this:
```{r, eval=FALSE}
hw_plot +
  scale_x_continuous(name = "Age in years")
```

This may look a bit awkward, but it can be useful if you're also setting other properties of the scale, such as the tick mark placement, range, and so on.

This also applies, of course, to other axis scales, such as `scale_y_continuous()`, `scale_x_discrete()`, and so on.

You can also add line breaks with `\n`, as shown in Figure \@ref(fig:FIG-AXES-AXIS-LABEL-NEWLINE):

```{r FIG-AXES-AXIS-LABEL-NEWLINE, fig.cap="X-axis label with a line break", fig.width=4, fig.height=3.5}
hw_plot +
  scale_x_continuous(name = "Age\n(years)")
```

Removing Axis Labels {#RECIPE-AXES-AXIS-LABEL-REMOVE}
--------------------

### Problem

You want to remove the label on an axis.

### Solution

For the x-axis label, use `xlab(NULL)`. For the y-axis label, use `ylab(NULL)`.

We'll hide the x-axis in this example (Figure \@ref(fig:FIG-AXES-AXIS-LABEL-REMOVE)):

```{r FIG-AXES-AXIS-LABEL-REMOVE-1, eval=FALSE}
pg_plot <- ggplot(PlantGrowth, aes(x = group, y = weight)) +
  geom_boxplot()

pg_plot +
  xlab(NULL)
```

### Discussion

Sometimes axis labels are redundant or obvious from the context, and don't need to be displayed. In the example here, the x-axis represents group, but this should be obvious from the context. Similarly, if the *y* tick labels had *kg* or some other unit in each label, the axis label "weight" would be unnecessary.

Another way to remove the axis label is to set it to an empty string. However, if you do it this way, the resulting graph will still have space reserved for the text, as shown in the graph on the right in Figure \@ref(fig:FIG-AXES-AXIS-LABEL-REMOVE):

```{r FIG-AXES-AXIS-LABEL-REMOVE-2, eval=FALSE}
pg_plot +
  xlab("")
```
```{r FIG-AXES-AXIS-LABEL-REMOVE, ref.label=c("FIG-AXES-AXIS-LABEL-REMOVE-1", "FIG-AXES-AXIS-LABEL-REMOVE-2"), echo=FALSE, fig.show="hold", fig.cap='X-axis label with `NULL` (left); With the label set to `""` (right)', fig.width=3.5, fig.height=3.5}
```

A third option is to hide the label through the theme, with `theme(axis.title.x = element_blank())`. In that case the name of the *x* or *y* scale is unchanged, but the text is not displayed and no space is reserved for it. When you set the label to `""`, the name of the scale is changed and the (empty) text does display.

Changing the Appearance of Axis Labels {#RECIPE-AXES-AXIS-LABEL-APPEARANCE}
--------------------------------------

### Problem

You want to change the appearance of axis labels.

### Solution

To change the appearance of the x-axis label (Figure \@ref(fig:FIG-AXES-AXIS-LABEL-APPEARANCE)), use `axis.title.x`:

```{r FIG-AXES-AXIS-LABEL-APPEARANCE, fig.cap="X-axis label with customized appearance"}
library(gcookbook) # Load gcookbook for the heightweight data set

hw_plot <- ggplot(heightweight, aes(x = ageYear, y = heightIn)) +
  geom_point()

hw_plot +
  theme(axis.title.x = element_text(face = "italic", colour = "darkred", size = 14))
```
### Discussion
For the y-axis label, it might also be useful to display the text unrotated, as shown in Figure \@ref(fig:FIG-AXES-AXIS-LABEL-YROTATE) (left). The `\n` in the label represents a newline character:
```{r FIG-AXES-AXIS-LABEL-YROTATE-1, eval=FALSE}
hw_plot +
  ylab("Height\n(inches)") +
  theme(axis.title.y = element_text(angle = 0, face = "italic", size = 14))
```
When you call `element_text()`, the default `angle` is 0, so if you set `axis.title.y` but don't specify the `angle`, it will show in this orientation, with the top of the text pointing up. If you change any other properties of `axis.title.y` and want it to be displayed in its usual orientation, rotated 90 degrees, you must manually specify the `angle` (Figure \@ref(fig:FIG-AXES-AXIS-LABEL-YROTATE), right):
```{r FIG-AXES-AXIS-LABEL-YROTATE-2, eval=FALSE}
hw_plot +
  ylab("Height\n(inches)") +
  theme(axis.title.y = element_text(
    angle = 90,
    face = "italic",
    colour = "darkred",
    size = 14)
  )
```
```{r FIG-AXES-AXIS-LABEL-YROTATE, ref.label=c("FIG-AXES-AXIS-LABEL-YROTATE-1", "FIG-AXES-AXIS-LABEL-YROTATE-2"), echo=FALSE, fig.show="hold", fig.cap="Y-axis label with angle = 0 (left); With angle = 90 (right)"}
```

### See Also

See Recipe \@ref(RECIPE-APPEARANCE-TEXT-APPEARANCE) for more about controlling the appearance of the text.

Showing Lines Along the Axes {#RECIPE-AXES-AXIS-LINES}
----------------------------

### Problem

You want to display lines along the x- and y-axes, but not on the other sides of the graph.

### Solution

Use `axis.line` in the theme (Figure \@ref(fig:FIG-AXES-AXIS-LINE)):

```{r FIG-AXES-AXIS-LINE-1, eval=FALSE}
library(gcookbook) # Load gcookbook for the heightweight data set

hw_plot <- ggplot(heightweight, aes(x = ageYear, y = heightIn)) +
  geom_point()

hw_plot +
  theme(axis.line = element_line(colour = "black"))
```
### Discussion

If you are starting with a theme that has a border around the plotting
area, like `theme_bw()`, you will also need to unset `panel.border`
(Figure \@ref(fig:FIG-AXES-AXIS-LINE), right):
```{r FIG-AXES-AXIS-LINE-2, eval=FALSE}
hw_plot +
theme_bw() +
theme(panel.border = element_blank(), axis.line = element_line(colour = "black"))
```
(ref:cap-FIG-AXES-AXIS-LINE) Scatter plot with axis lines (left); With `theme_bw()`, `panel.border` must also be made blank (right)
```{r FIG-AXES-AXIS-LINE, ref.label=c("FIG-AXES-AXIS-LINE-1", "FIG-AXES-AXIS-LINE-2"), echo=FALSE, fig.show="hold", fig.cap="(ref:cap-FIG-AXES-AXIS-LINE)"}
```
If the lines are thick, the ends will only partially overlap (Figure \@ref(fig:FIG-AXES-AXIS-LINE-LINEEND), left). To make them fully overlap (Figure \@ref(fig:FIG-AXES-AXIS-LINE-LINEEND), right), set `lineend = "square"`:
```{r FIG-AXES-AXIS-LINE-LINEEND, fig.show="hold", fig.cap='With thick lines, the ends don\'t fully overlap (left); Full overlap with `lineend="square"` (right)'}
# With thick lines, only half overlaps
hw_plot +
theme_bw() +
theme(
panel.border = element_blank(),
axis.line = element_line(colour = "black", size = 4)
)
# Full overlap
hw_plot +
theme_bw() +
theme(
panel.border = element_blank(),
axis.line = element_line(colour = "black", size = 4, lineend = "square")
)
```
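If you use this combination often, you could wrap it in a small helper. The function below, `theme_axis_lines()`, is a hypothetical convenience wrapper (not part of ggplot2) that starts from `theme_bw()`, blanks `panel.border`, and draws square-ended axis lines. It assumes ggplot2 3.4.0 or later, where the line thickness argument is called `linewidth` rather than `size`:

```{r}
library(ggplot2)

# Hypothetical helper (not part of ggplot2): theme_bw() with axis lines
# in place of the panel border. `linewidth` requires ggplot2 >= 3.4.0;
# older versions call this argument `size`.
theme_axis_lines <- function(colour = "black", linewidth = 0.5, ...) {
  theme_bw(...) +
    theme(
      panel.border = element_blank(),
      # Square line ends make thick axis lines overlap fully at the corner
      axis.line = element_line(colour = colour, linewidth = linewidth,
        lineend = "square")
    )
}

thick_axes <- theme_axis_lines(linewidth = 4)
```

With the `hw_plot` object from earlier, `hw_plot + theme_axis_lines(linewidth = 4)` should look like the second plot above.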
### See Also
For more information about how the theming system works, see Recipe \@ref(RECIPE-APPEARANCE-THEME).
Using a Logarithmic Axis {#RECIPE-AXES-AXIS-LOG}
------------------------
### Problem
You want to use a logarithmic axis for a graph.
### Solution
Use `scale_x_log10()` and/or `scale_y_log10()` (Figure \@ref(fig:FIG-AXES-LOG)):
```{r FIG-AXES-LOG, fig.show="hold", fig.cap="Exponentially distributed data with linear-scaled axes (left); With logarithmic axes (right)", fig.width=5, fig.height=5}
library(MASS) # Load MASS for the Animals data set
# Create the base plot
animals_plot <- ggplot(Animals, aes(x = body, y = brain, label = rownames(Animals))) +
geom_text(size = 3)
animals_plot
# With logarithmic x and y scales
animals_plot +
scale_x_log10() +
scale_y_log10()
```
### Discussion
With a log axis, a given visual distance represents a constant *proportional* change; for example, each centimeter on the y-axis might represent a multiplication of the quantity by 10. In contrast, with a linear axis, a given visual distance represents a constant quantity change; each centimeter might represent adding 10 to the quantity.
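To make the distinction concrete, here is a small base R calculation. On a log~10~ axis, a value's position is proportional to its `log10()`, so successive powers of 10 are evenly spaced, while on a linear axis the gaps between the same values grow tenfold at each step:

```{r}
values <- c(1, 10, 100, 1000)

# Distance between neighbours on a linear axis grows with the values
linear_steps <- diff(values)
linear_steps
# Distance between neighbours on a log10 axis is constant
log_steps <- diff(log10(values))
log_steps
```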
Some data sets are exponentially distributed on the x-axis, and others on the y-axis (or both). For example, the `Animals` data set from the MASS package contains data on the average brain mass (in g) and body mass (in kg) of various mammals, with a few dinosaurs thrown in for comparison:
```{r}
Animals
```
As shown in Figure \@ref(fig:FIG-AXES-LOG), we can make a scatter plot to visualize the relationship between brain and body mass. With the default linearly scaled axes, it's hard to make much sense of this graph. Because of a few very large animals, the rest of the animals get squished into the lower-left corner: a mouse barely looks different from a triceratops! This is a case where the data is distributed exponentially on both axes.
ggplot will try to make good decisions about where to place the tick marks, but if you don't like them, you can change them by specifying `breaks` and, optionally, `labels`. In the example here, the automatically generated tick marks are spaced farther apart than is ideal. For the y-axis tick marks, we can get a vector of every power of 10 from 10^0^ to 10^3^ like this:
```{r}
10^(0:3)
```
The x-axis tick marks work the same way, but because the range is large, R decides to format the output with scientific notation:
```{r}
10^(-1:5)
```
And then we can use those values as the breaks, as in Figure \@ref(fig:FIG-AXES-LOG-BREAKS) (left):
```{r FIG-AXES-LOG-BREAKS-1, eval=FALSE}
animals_plot +
scale_x_log10(breaks = 10^(-1:5)) +
scale_y_log10(breaks = 10^(0:3))
```
To label the breaks with exponents instead (Figure \@ref(fig:FIG-AXES-LOG-BREAKS), right), use the `trans_format()` and `math_format()` functions from the scales package:
```{r FIG-AXES-LOG-BREAKS-2, eval=FALSE}
library(scales)
animals_plot +
scale_x_log10(breaks = 10^(-1:5), labels = trans_format("log10", math_format(10^.x))) +
scale_y_log10(breaks = 10^(0:3), labels = trans_format("log10", math_format(10^.x)))
```
```{r FIG-AXES-LOG-BREAKS, ref.label=c("FIG-AXES-LOG-BREAKS-1", "FIG-AXES-LOG-BREAKS-2"), echo=FALSE, message=FALSE, fig.show="hold", fig.cap="Scatter plot with log~10~ x- and y-axes, and with manually specified breaks (left); With exponents for the tick labels (right)", fig.width=5, fig.height=5}
```
Another way to use log axes is to transform the data before mapping it to the *x* and *y* coordinates (Figure \@ref(fig:FIG-AXES-LOG-BEFORE-MAPPING)). Technically, the axes are still linear -- it's the quantity that is log-transformed:
```{r FIG-AXES-LOG-BEFORE-MAPPING, fig.cap="Plot with log transform before mapping to x- and y-axes", fig.width=5, fig.height=5}
ggplot(Animals, aes(x = log10(body), y = log10(brain), label = rownames(Animals))) +
geom_text(size = 3)
```
The previous examples used a log~10~ transformation, but it is possible to use other transformations, such as log~2~ and natural log, as shown in Figure \@ref(fig:FIG-AXES-LOG-2-E). It's a bit more complicated to use these -- `scale_x_log10()` is shorthand for `scale_x_continuous()` with a log~10~ transformation, but for these other log scales, we need to spell out the transformation, breaks, and labels ourselves:
```{r FIG-AXES-LOG-2-E, fig.cap="Plot with exponents in tick labels. Notice that different bases are used for the x and y axes.", fig.width=5, fig.height=5}
library(scales)
# Use natural log on x, and log2 on y
animals_plot +
scale_x_continuous(
trans = log_trans(),
breaks = trans_breaks("log", function(x) exp(x)),
labels = trans_format("log", math_format(e^.x))
) +
scale_y_continuous(
trans = log2_trans(),
breaks = trans_breaks("log2", function(x) 2^x),
labels = trans_format("log2", math_format(2^.x))
)
```
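Each of these scales is driven by a transformation object from the scales package that pairs a forward transform with its inverse, and `trans_breaks()` and `trans_format()` rely on the same pair. A quick check of `log2_trans()`, as used for the y-axis above, shows what it does to a few values:

```{r}
library(scales)

log2_t <- log2_trans()
# Forward: data values -> positions on the axis
log2_t$transform(c(1, 2, 8))
# Inverse: axis positions -> data values (used to place and label breaks)
log2_t$inverse(c(0, 1, 3))
```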
It's also possible to log-transform just one axis. This is often useful for financial data, because it better represents proportional change. Figure \@ref(fig:FIG-AXES-LOG-Y) shows Apple's stock price with linear and log y-axes. The default tick marks might not be spaced well for your graph; you can set them with the `breaks` argument of the scale:
```{r FIG-AXES-LOG-Y, fig.show="hold", fig.cap="Top: a stock chart with linear axes; bottom: with a log y-axis and manual breaks", fig.width=7, fig.height=2}
library(gcookbook) # Load gcookbook for the aapl data set
# Linear y-axis
ggplot(aapl, aes(x = date, y = adj_price)) +
geom_line()
# Log y-axis with manually specified breaks
ggplot(aapl, aes(x = date, y = adj_price)) +
geom_line() +
scale_y_log10(breaks = c(2, 10, 50, 250))
```
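The manual breaks in this example, `c(2, 10, 50, 250)`, each differ from the previous one by a factor of 5, which is why they come out evenly spaced on the log y-axis. In the same way, a doubling of the price covers the same vertical distance whether it goes from 2 to 4 or from 125 to 250. A quick check in base R:

```{r}
stock_breaks <- c(2, 10, 50, 250)
# Ratio between consecutive breaks: a constant proportional change
break_ratios <- stock_breaks[-1] / stock_breaks[-length(stock_breaks)]
break_ratios
# Distance between consecutive breaks on a log10 axis: all equal to log10(5)
break_gaps <- diff(log10(stock_breaks))
break_gaps
```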
Adding Ticks for a Logarithmic Axis {#RECIPE-AXES-AXIS-LOG-TICKS}
-----------------------------------
### Problem
You want to add tick marks with diminishing spacing for a logarithmic axis.
### Solution
Use `annotation_logticks()` (Figure \@ref(fig:FIG-AXES-LOG-TICKS)):
```{r FIG-AXES-LOG-TICKS, fig.cap="Log axes with diminishing tick marks", fig.width=5, fig.height=5}
library(MASS) # Load MASS for the Animals data set
library(scales) # For the trans_format function
# Given a vector x, return a vector of powers of 10 that encompasses all values
# in x.
breaks_log10 <- function(x) {
low <- floor(log10(min(x)))
high <- ceiling(log10(max(x)))
10^(seq.int(low, high))
}
ggplot(Animals, aes(x = body, y = brain, label = rownames(Animals))) +
geom_text(size = 3) +
annotation_logticks() +
scale_x_log10(breaks = breaks_log10,
labels = trans_format(log10, math_format(10^.x))) +
scale_y_log10(breaks = breaks_log10,
labels = trans_format(log10, math_format(10^.x)))
```
We also defined a function, `breaks_log10()`, which returns all powers of 10 that encompass the range of values passed to it. Passing this function as `breaks` tells `scale_x_log10()` and `scale_y_log10()` where to put the breaks; each scale calls it with its limits. For example:
```{r}
breaks_log10(c(0.12, 6))
```
### Discussion
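The tick marks created by `annotation_logticks()` are actually geoms inside the plotting area. There is a long tick mark at each power of 10, a mid-length tick mark at each 5, and a short tick mark at each of the remaining integer multiples (2, 3, 4, 6, 7, 8, and 9) within a decade.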
To get the colors of the tick marks and the grid lines to match up a bit better, you can use `theme_bw()`.
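To see why the spacing of these tick marks diminishes, compute where the values 1 through 10 fall within one decade on a log~10~ axis. The gaps shrink as the values grow, and the "5" tick sits about 70% of the way through the decade rather than at the midpoint:

```{r}
# Position of 1, 2, ..., 10 within one decade, as a fraction of the decade
decade_pos <- log10(1:10)
round(decade_pos, 3)
# Gaps between neighbouring tick marks get smaller and smaller
round(diff(decade_pos), 3)
```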
By default, the minor grid lines appear visually halfway between the major grid lines, but this is not the same place as the "5" tick marks on a logarithmic scale. To make them line up, we can supply a function to the scale's `minor_breaks` argument.
We'll define `breaks_5log10()`, which returns 5 times each power of 10 needed to encompass the values passed to it:
```{r}
breaks_5log10 <- function(x) {
low <- floor(log10(min(x)/5))
high <- ceiling(log10(max(x)/5))
5 * 10^(seq.int(low, high))
}
breaks_5log10(c(0.12, 6))
```
Then we'll use that function for `minor_breaks` (Figure \@ref(fig:FIG-AXES-LOG-TICKS-CUSTOM)):
```{r FIG-AXES-LOG-TICKS-CUSTOM, fig.cap="Log axes with ticks at each 5, and fixed coordinate ratio", fig.width=6, fig.height=4}
ggplot(Animals, aes(x = body, y = brain, label = rownames(Animals))) +
geom_text(size = 3) +
annotation_logticks() +
scale_x_log10(breaks = breaks_log10,
minor_breaks = breaks_5log10,
labels = trans_format(log10, math_format(10^.x))) +
scale_y_log10(breaks = breaks_log10,
minor_breaks = breaks_5log10,
labels = trans_format(log10, math_format(10^.x))) +
coord_fixed() +
theme_bw()
```
Making a Circular Plot {#RECIPE-AXES-POLAR}
-----------------------
### Problem
You want to make a circular plot.
### Solution
Use `coord_polar()`. For this example we'll use the `wind` data set from gcookbook. It contains samples of wind speed and direction for every 5 minutes throughout a day. The direction of the wind is categorized into 15-degree bins, and the speed is categorized into 5 m/s increments:
```{r}
library(gcookbook) # Load gcookbook for the wind data set
wind
```
We'll plot a count of the number of samples at each `SpeedCat` and `DirCat` using `geom_histogram()` (Figure \@ref(fig:FIG-AXES-POLAR)). We'll set `binwidth` to 15 and `boundary` to -7.5, so that the bin edges fall at -7.5, 7.5, 22.5, and so on, and each bin is centered around 0, 15, 30, etc.:
```{r FIG-AXES-POLAR, fig.cap="Polar plot", fig.width=6, fig.height=5}
ggplot(wind, aes(x = DirCat, fill = SpeedCat)) +
geom_histogram(binwidth = 15, boundary = -7.5) +
coord_polar() +
scale_x_continuous(limits = c(0,360))
```
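The `boundary` argument fixes the position of one bin edge, and the remaining edges follow from `binwidth`. A quick base R check of the edges implied by `boundary = -7.5` and `binwidth = 15` confirms that the bins are centered on multiples of 15:

```{r}
bin_width <- 15
# Edges implied by boundary = -7.5 and binwidth = 15, spanning the compass
edges <- seq(-7.5, 352.5, by = bin_width)
# Each bin's center is halfway between its two edges
centers <- head(edges, -1) + bin_width / 2
centers
```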
### Discussion
Be cautious when using polar plots, since they can perceptually distort the data. In the example here, at 210 degrees there are 15 observations with a speed of 15–20 and 13 observations with a speed of >20, but a quick glance at the picture makes it appear that there are more observations at >20. There are also three observations with a speed of 10–15, but they're barely visible.
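Much of this distortion comes from area. In polar coordinates, each segment of a stacked bar is a slice of a ring, and for the same radial thickness its area grows with its distance from the center. The hypothetical helper below computes the area of such a slice; two segments with the same radial thickness and angular width differ threefold in area when one spans radius 0 to 10 and the other spans 10 to 20:

```{r}
# Hypothetical helper: area of a ring slice between radii r_inner and r_outer,
# spanning width_deg degrees (area = angle / 2 * (r_outer^2 - r_inner^2))
ring_slice_area <- function(r_inner, r_outer, width_deg) {
  angle <- width_deg * pi / 180
  angle / 2 * (r_outer^2 - r_inner^2)
}

inner_area <- ring_slice_area(0, 10, 15)
outer_area <- ring_slice_area(10, 20, 15)
outer_area / inner_area
```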
In this example we can make the plot a little prettier by reversing the legend, using a different palette, adding an outline, and setting the breaks to some more familiar numbers (Figure \@ref(fig:FIG-AXES-POLAR-CUSTOM)):
```{r FIG-AXES-POLAR-CUSTOM, fig.cap="Polar plot with different colors and breaks", fig.width=6, fig.height=5}
ggplot(wind, aes(x = DirCat, fill = SpeedCat)) +
geom_histogram(binwidth = 15, boundary = -7.5, colour = "black", size = .25) +
guides(fill = guide_legend(reverse = TRUE)) +
coord_polar() +
scale_x_continuous(limits = c(0,360),
breaks = seq(0, 360, by = 45),
minor_breaks = seq(0, 360, by = 15)) +
scale_fill_brewer()
```
It may also be useful to set the starting angle with the `start` argument, especially when using a discrete variable for *theta*. The starting angle is specified in radians, so if you know the adjustment in degrees, you'll have to convert it to radians:
```{r eval=FALSE}
coord_polar(start = -45 * pi / 180)
```
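If you adjust the start angle often, a tiny helper keeps the conversion readable. The `deg_to_rad()` function below is hypothetical (not part of ggplot2); with it, -45 degrees becomes `-pi / 4` radians:

```{r}
library(ggplot2)

# Hypothetical helper: convert degrees to radians for coord_polar(start = ...)
deg_to_rad <- function(degrees) degrees * pi / 180

deg_to_rad(-45)
polar_coord <- coord_polar(start = deg_to_rad(-45))
```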
Polar coordinates can be used with other geoms, including lines and points. There are a few important things to keep in mind when using these geoms. First, by default, the smallest data value of the variable mapped to *y* (or *r*) is placed at the center, at a visual radius of 0. You may expect a data value of 0 to be at the center instead, but to make sure this happens, you'll need to set the limits.
Next, when using a continuous *x* (or *theta*), the smallest and largest data values are mapped to the same angle, so they overlap. Sometimes this is desirable, sometimes not. To change this behavior, you'll need to set the limits.
Finally, the *theta* values of the polar coordinates do not wrap around: it is presently not possible to have a geom that crosses over the starting angle (usually vertical).
I'll illustrate these issues with an example. The following code creates a data frame from the `mdeaths` time series data set and produces the graph shown on the left in Figure \@ref(fig:FIG-AXES-POLAR-CONTINUOUS):
```{r FIG-AXES-POLAR-CONTINUOUS-1, eval=FALSE}
# Put mdeaths time series data into a data frame
mdeaths_mod <- data.frame(
deaths = as.numeric(mdeaths),
month = as.numeric(cycle(mdeaths))
)
# Calculate average number of deaths in each month
library(dplyr)
mdeaths_mod <- mdeaths_mod %>%
group_by(month) %>%
summarise(deaths = mean(deaths))
mdeaths_mod
#> # A tibble: 12 x 2
#> month deaths
#> <dbl> <dbl>
#> 1 1 2129.833
#> 2 2 2081.333
#> 3 3 1970.500
#> 4 4 1657.333
#> 5 5 1314.167
#> 6 6 1186.833
#> 7 7 1136.667
#> 8 8 1037.667
#> ... with 4 more rows