= A Brief Guide to Using ERDDAP for Serving CF-Compliant netCDF Files: A Limited Use Case
Steven K. Baum
v2.718, 2020-08-28
:doctype: book
:toc:
:icons:
:source-highlighter: coderay
:numbered!:
[preface]
== Overview
The basic steps for setting up ERDDAP and preparing and serving
CF-compliant netCDF files from it are described herein.
The netCDF files were needed to serve historical physical oceanographic
datasets for the GRIIDC project.
The separate official online documents for setting up ERDDAP and for working with datasets
are complete and invaluable, but somewhat daunting for anyone who wants to spin up from the beginning.
This document is meant to provide a brief introduction to ERDDAP and the
official documentation for a limited use case, and covers only a small fraction of
what is contained within the latter.
It can be consulted for getting a sense of the general procedures needed to install
and use ERDDAP, but the official documentation for both setting up an ERDDAP server:
https://coastwatch.pfeg.noaa.gov/erddap/download/setup.html[`https://coastwatch.pfeg.noaa.gov/erddap/download/setup.html`]
and for preparing datasets to be served by ERDDAP:
https://coastwatch.pfeg.noaa.gov/erddap/download/setupDatasetsXml.html[`https://coastwatch.pfeg.noaa.gov/erddap/download/setupDatasetsXml.html`]
should be consulted for the final word on everything you find in this document.
There are links to the sections in the official documentation that correspond to
the sections herein, and you should read them sooner rather than later.
There are also many, many sections in the official documentation that aren't covered here.
This brief introduction doesn't cover even 10% of the available details of how to set up
ERDDAP and use it to serve datasets.
It covers how to install and set up ERDDAP for a first time in a reasonably secure way,
and how to prepare CF-compliant netCDF files to serve in it.
The CF-compliant netCDF files covered here are just one example of one of the two general
types of data that ERDDAP can process and serve: gridded and tabular.
Gridded data is data that exists on regular grids such as numerical model output or satellite
data. Tabular data is data that isn't on grids and can be stored in tables.
Oceanographic examples of this include buoy, station and trajectory data that exist as points
that can vary in space and time, but do not exist as entire spatial fields that vary in time.
The types of gridded datasets ERDDAP can handle are described at:
https://coastwatch.pfeg.noaa.gov/erddap/download/setupDatasetsXml.html#EDDGrid[`https://coastwatch.pfeg.noaa.gov/erddap/download/setupDatasetsXml.html#EDDGrid`]
and the types of tabular datasets at:
https://coastwatch.pfeg.noaa.gov/erddap/download/setupDatasetsXml.html#EDDTable[`https://coastwatch.pfeg.noaa.gov/erddap/download/setupDatasetsXml.html#EDDTable`]
Reading through these sections will give you an idea of what kinds of data and files
ERDDAP can serve, and how to prepare them.
The subset of the table category that is covered here is the *EDDTableFromNcCFFiles* category
that aggregates data from netCDF files that follow the conventions of the CF Discrete Sampling
Geometries as described at:
http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.html#discrete-sampling-geometries[`http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.html#discrete-sampling-geometries`]
The details about the category are found in the official documentation at:
https://coastwatch.pfeg.noaa.gov/erddap/download/setupDatasetsXml.html#EDDTableFromNcCFFiles[`https://coastwatch.pfeg.noaa.gov/erddap/download/setupDatasetsXml.html#EDDTableFromNcCFFiles`]
The specific and limited use of ERDDAP described here consists of three major steps:
* Obtaining, installing and setting up Java, Tomcat and ERDDAP.
* Preparing the files - in this case netCDF files containing historical Gulf of Mexico datasets - such that
they can be properly processed and displayed by ERDDAP.
* Creating an XML configuration fragment for each netCDF file containing all the meta-data needed by
ERDDAP to properly serve it.
*Note:* This document explains how to do these things on a Linux operating system, in this case the
CentOS distribution. ERDDAP can be installed on OSX and Windows machines, but that is not covered here.
This document also assumes reasonable familiarity with the use of the command line on Linux
systems. While ERDDAP itself is a tremendously capable and useful GUI, all the things you will need
to do to prepare datasets for it will be from the command line.
== Initial Installation
The procedure for obtaining and installing ERDDAP is presented in great detail at:
https://coastwatch.pfeg.noaa.gov/erddap/download/setup.html[`https://coastwatch.pfeg.noaa.gov/erddap/download/setup.html`]
It's a good idea to at least skim through the first part of that document about how to install the chain of
Java, Tomcat and ERDDAP needed to achieve proper ERDDAP functionality.
Familiarization with the steps of the entire procedure can prevent annoying stumbles and backtracking.
Herein we'll focus on those things specific to the present GRIIDC ERDDAP server set-up.
It is popular and profitable to install such things using Docker these days.
In this matter we'll just quote the official document:
[quote]
If you already use Docker, you might prefer the Docker version.
If you don't already use Docker, we generally don't recommend this.
If you chose to install ERDDAP via Docker, we don't offer any support for the installation process.
We haven't worked with Docker yet. If you work with this, please send us your comments.
=== Java
This is officially covered at:
https://coastwatch.pfeg.noaa.gov/erddap/download/setup.html#java[`https://coastwatch.pfeg.noaa.gov/erddap/download/setup.html#java`]
The GRIIDC ERDDAP server is presently on a machine that runs the Linux CentOS operating system, which is a
freely available downstream variant of the Red Hat Enterprise Linux distribution.
CentOS contains its own Java and Tomcat packages, but the GRIIDC ERDDAP server is presently
run on Java and Tomcat packages that are external to the official CentOS distribution.
This is due to CentOS being historically a bit slow in updating the Java and Tomcat packages in
their official distribution, as well as more recently because the official document recommends a very specific
Java distribution on which to run Tomcat and ERDDAP.
The Java, Tomcat and ERDDAP packages were all installed in `/opt` in this project because they are all external to CentOS,
and because it's a useful thing to keep them all in one centralized
location. The official CentOS Java and Tomcat packages are
scattered all over the filesystem, and it is more trouble than it is worth to attempt to shoehorn non-standard
packages into the various niches into which the standard packages are presently stashed. There is a procedure for
taking source code packages and configuring and compiling them into standard CentOS binary packages, although one
might call it attempting to lower the river rather than raising the bridge. Steadily increasing security
demands will probably force future installations to use the standard Java and Tomcat packages in the CentOS
distribution, but that is beyond the scope of this document.
The Java distribution officially recommended can be found at:
https://adoptopenjdk.net/[`https://adoptopenjdk.net/`]
This is not the official distribution available from java.com nor the CentOS distribution package.
The AdoptOpenJDK project was started because of "the general lack of an open and reproducible build & test system for OpenJDK source across multiple platforms."
The general goals are to provide a reliable source of OpenJDK binaries for all platforms for the long run,
and to provide an open, common, audited, build infrastructure.
If the creator of ERDDAP recommends it - "it is the main, community-supported, free (as in beer and speech) version of Java 8 that offers Long Term Support (free upgrades until at least September 2023)" - then we'll use it.
An additional bonus to using AdoptOpenJDK is that it contains the DejaVu fonts that are
recommended to replace the standard Java fonts, so you can skip the step of downloading them
and installing them in another Java distribution.
There are Java versions - 9, 10 and 11 - beyond the recommended version 8, but ERDDAP has not yet been
tested with them. Check the official installation document for further developments in this area.
Installing the recommended Java is as easy as downloading it into `/opt` and unpacking it. Bob Simons, the creator of ERDDAP, recommends
installing it in `/usr/local`, but that is the standard place into which most open source packages are installed
by default and can get crowded. Using `/opt` allows you to limit the neighborhood to just the essentials of Java,
Tomcat and ERDDAP.
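The download-and-unpack step can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name `install_java_tarball` is made up, the archive name depends on whichever release you download, and the `jre` link name is chosen here simply because the `setenv.sh` example later in this document sets `JAVA_HOME=/opt/jre`.

```shell
#!/usr/bin/env bash
# Sketch: unpack a downloaded Java tarball and point a stable symlink at it.
# The function name and the link name are assumptions for illustration.

install_java_tarball() {
    local archive="$1"      # the tarball you downloaded
    local parent="$2"       # directory to unpack into, e.g. /opt
    local link_name="$3"    # stable name for the unpacked tree, e.g. jre

    # The first path component in the archive is the directory it unpacks into.
    local top
    top="$(tar tzf "$archive" | awk -F/ 'NR == 1 { print $1 }')" || return 1
    [ -n "$top" ] || return 1

    tar xzf "$archive" -C "$parent" || return 1

    # Replace any existing link so a later upgrade only needs a re-run.
    ln -sfn "$parent/$top" "$parent/$link_name"
}

# Usage (as root), with a hypothetical archive name:
#   install_java_tarball /opt/your-downloaded-jre.tar.gz /opt jre
```

Using a symlink means the versioned directory name stays visible in `/opt` while the configuration files can keep referring to one fixed path.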
=== Tomcat
This is officially covered at:
https://coastwatch.pfeg.noaa.gov/erddap/download/setup.html#tomcat[`https://coastwatch.pfeg.noaa.gov/erddap/download/setup.html#tomcat`]
==== Overview
While the Java installation is easy, Tomcat is more complicated, requiring additional steps beyond
downloading and unpacking.
Tomcat is a Java Application Server (JAS), a piece of software - sometimes called middleware - that exists
between the network services of the operating system and Java applications like ERDDAP.
There are other freely available and commercial JAS packages, but ERDDAP is only supported under Tomcat.
The recommended Tomcat distribution is the latest version of Tomcat 8, and can be found at:
https://tomcat.apache.org/download-80.cgi[`https://tomcat.apache.org/download-80.cgi`]
*Note:* If you are already running a Tomcat server it is recommended that you install a second
one that will run only ERDDAP.
The general procedure for installing and configuring Tomcat is:
* install Tomcat in the recommended location
* create a `tomcat` user for security
* make changes to configuration files
* make whatever changes are recommended or required for security
* set the environment variables appropriately for an ERDDAP server
* convince yourself that the Tomcat server is working correctly
Each of these steps is explained below. The official documentation on how to set up your
own ERDDAP goes into much greater detail about all of these steps.
https://coastwatch.pfeg.noaa.gov/erddap/download/setup.html[+https://coastwatch.pfeg.noaa.gov/erddap/download/setup.html+]
==== Installation
Download it and unpack it into `/opt`.
This will create a directory such as `apache-tomcat-8.5.57` (the latest release as of 2020-08-18).
It is useful to change the directory name to something like `tomcat8`, and also to create a file
in `/opt` called `000-tomcat-8.5.57` as a reminder.
Your Tomcat distribution is now installed in `/opt/tomcat8`.
The procedure for doing this as root after you have downloaded the Tomcat distribution into `/opt` is:
-----
tar xzvf apache-tomcat-8.5.57.tar.gz
mv apache-tomcat-8.5.57 tomcat8
-----
*Note:* In the remainder of this document, all commands run within
the Tomcat directory will be shown with their full path including
the base directory `/opt/tomcat8`. If you install Tomcat elsewhere, you
will of course need to change this to the new path. You might want to
grab a copy of the source code for this document (as detailed at the end
of this document), do a global search and replace on the Tomcat path
location, and recompile a custom version for your own use.
It is useful to keep the original tarred and compressed file in the directory to
remind you which version you're running. If you wish to remove it, then you might want to
create a reminder file, e.g.
-----
vi 000-tomcat-8.5.57
-----
==== Creating a `tomcat` User
It is strongly recommended to set up Tomcat (and ERDDAP) to belong to a `tomcat` user
that has no password and limited permissions.
This means that only the super user can switch to user `tomcat` and that nobody can
log in to your server as `tomcat`.
The general procedure for doing this, as root or via `sudo`, is:
-----
useradd tomcat -s /bin/bash -p '*'
chown -R tomcat /opt/tomcat8
chgrp -R tomcat /opt/tomcat8
chmod -R ug+rwx /opt/tomcat8
chmod -R o-rwx /opt/tomcat8
-----
After this has been set up, you can become user `tomcat` via:
`sudo su - tomcat`
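After running the `chown` and `chmod` commands it can be reassuring to confirm that nothing under the Tomcat tree is still accessible to other users. The following is a minimal sketch, assuming GNU `find` (as shipped with CentOS); the function name `list_world_accessible` is made up for illustration.

```shell
# List anything under a directory that is still accessible to "other" users.
# Prints the offending paths and returns 1 if any exist, 0 if the tree is clean.
# Symbolic links are skipped because their own mode bits are not meaningful.
list_world_accessible() {
    local dir="$1"
    local found
    found="$(find "$dir" ! -type l -perm /007 -print)"
    if [ -n "$found" ]; then
        printf '%s\n' "$found"
        return 1
    fi
    return 0
}

# Usage (as root):
#   list_world_accessible /opt/tomcat8 && echo "no world-accessible files"
```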
==== Configuration File Changes
At this point, the official documentation recommends making changes to three configuration files.
First, the file:
`/opt/tomcat8/conf/server.xml`
needs to be edited (as user `tomcat`). The value of `connectionTimeout` should be changed to `300000` and
the parameter `relaxedQueryChars` needs to be added as shown. Some additional parameters as suggested at:
https://www.upguard.com/blog/15-ways-to-secure-apache-tomcat-8[+https://www.upguard.com/blog/15-ways-to-secure-apache-tomcat-8+]
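have also been added in the example below; note that Tomcat documents `deployXML` as an attribute of `Host` rather than `Connector`, so it likely has no effect here. Just change that entire section to what you see below.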
-----
<Connector port="8080" protocol="HTTP/1.1"
connectionTimeout="300000"
relaxedQueryChars="[]|"
redirectPort="8443"
allowTrace="False"
xpoweredBy="False"
deployXML="False"/>
-----
This also needs to be done for the section:
-----
<Connector port="8443" ...
...
</Connector>
-----
if that section is uncommented and used to provide `https` connections.
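A quick way to confirm that the two ERDDAP-specific settings made it into `server.xml` is a plain text search. This is a rough sketch rather than a real XML check, and the function name `check_connector_settings` is made up for illustration.

```shell
# Return 0 only if the file mentions both Connector settings discussed above.
# This is a fixed-string search, not an XML parse, so an attribute inside a
# commented-out Connector would also count as a match.
check_connector_settings() {
    local server_xml="$1"
    grep -qF 'connectionTimeout="300000"' "$server_xml" || return 1
    grep -qF 'relaxedQueryChars="[]|"' "$server_xml" || return 1
}

# Usage:
#   check_connector_settings /opt/tomcat8/conf/server.xml || echo "settings missing"
```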
Next, edit the file:
`/opt/tomcat8/conf/context.xml`
(also as user `tomcat`) and add the `Resources` tag as shown below.
-----
<Context>
<WatchedResource>WEB-INF/web.xml</WatchedResource>
<WatchedResource>${catalina.base}/conf/web.xml</WatchedResource>
<Resources cachingAllowed="true" cacheMaxSize="80000" />
</Context>
-----
Finally, edit the file:
`/etc/httpd/conf/httpd.conf`
and add the following lines near the end of the file right before the
`Supplemental configuration` line. This must be done as root or via `sudo`.
-----
TimeOut 3600
ProxyTimeout 3600
-----
==== General Security Considerations
The official documentation has many suggestions about securing your Tomcat server, as
well as links to other resources.
The THREDDS documentation at:
https://www.unidata.ucar.edu/software/tds/current/tutorial/AdditionalSecurityConfiguration.html[+https://www.unidata.ucar.edu/software/tds/current/tutorial/AdditionalSecurityConfiguration.html+]
also contains many useful Tomcat security suggestions and links.
If you search for "tomcat security" you can find many, many more suggestions.
It would be a good idea to consult your local security people about this, or
just wait for them to contact you.
A very important recommendation is to run Tomcat under the user `tomcat` rather than as `root`.
There are instructions as to how to create this separate user with limited permissions
and with no password, thus enabling only the superuser or those with `sudo` privileges to
become user `tomcat` and make changes to the installation.
There are also recommendations on changing file permissions to increase security.
Follow them all.
==== Specific Security Measures
We'll now go over a few simple things that can be done to increase security.
First, we'll disable some HTTP methods that won't be needed.
Edit the file:
`/opt/tomcat8/conf/web.xml`
and add the following block of code. It can be added anywhere before the
`</web-app>` tag at the end of the file.
-----
<security-constraint>
<web-resource-collection>
<web-resource-name>restricted methods</web-resource-name>
<url-pattern>/*</url-pattern>
<http-method>TRACE</http-method>
<http-method>PUT</http-method>
<http-method>OPTIONS</http-method>
<http-method>DELETE</http-method>
</web-resource-collection>
<auth-constraint />
</security-constraint>
-----
There are also ways to implement fine-grained security measures at the file
and directory level. These are available at both the Java and OS level. The Java level
procedures can be found at:
http://tomcat.apache.org/tomcat-8.0-doc/security-manager-howto.html[+http://tomcat.apache.org/tomcat-8.0-doc/security-manager-howto.html+]
and the OS level procedures at:
https://wiki.centos.org/HowTos/SELinux[+https://wiki.centos.org/HowTos/SELinux+]
These are way, way beyond the scope of this document and most confusing to apply. If you really do want to give them a try
you might want to contact a security professional.
==== Environment Variables
This is officially covered at:
https://coastwatch.pfeg.noaa.gov/erddap/download/setup.html#WindowsMemory[`https://coastwatch.pfeg.noaa.gov/erddap/download/setup.html#WindowsMemory`]
Variables specifying the environment in which Tomcat will run need to be created in a file called:
`/opt/tomcat8/bin/setenv.sh`
An example of this for this specific set-up is:
-----
export JAVA_HOME=/opt/jre
export JAVA_OPTS='-server -Djava.awt.headless=true -Xmx8000M -Xms8000M'
export TOMCAT_HOME=/opt/tomcat8
export CATALINA_HOME=/opt/tomcat8
-----
Follow the instructions about how to set the `-Xmx` and `-Xms` memory
settings. It will depend on both your available RAM and the size and number
of files you're attempting to serve with ERDDAP.
The example contains the recommended minimum setting for a 64-bit machine,
although it is further recommended to set these to half of your available RAM.
These recommendations tell you that you should have at least 16GB of RAM to
run an ERDDAP server, although experience has shown that 32GB is a better choice for a minimum.
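The "half of available RAM" rule can be turned into `-Xmx` and `-Xms` flags with a few lines of shell. This is a minimal sketch, assuming a Linux-style `/proc/meminfo` that reports `MemTotal` in kB; the function name `suggest_heap_flags` is made up for illustration, and the result is only a starting point to check against the official instructions.

```shell
# Print JAVA_OPTS memory flags sized to half of total RAM, in megabytes.
# Reads MemTotal (in kB) from a meminfo-style file; defaults to /proc/meminfo.
suggest_heap_flags() {
    local meminfo="${1:-/proc/meminfo}"
    awk '/^MemTotal:/ {
        half_mb = int($2 / 1024 / 2)
        printf "-Xmx%dM -Xms%dM\n", half_mb, half_mb
    }' "$meminfo"
}

# Usage:
#   suggest_heap_flags
```

For a machine reporting `MemTotal: 16384000 kB` this prints `-Xmx8000M -Xms8000M`, which matches the values used in the `setenv.sh` example.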
==== Testing Tomcat
This is officially covered at:
https://coastwatch.pfeg.noaa.gov/erddap/download/setup.html#testTomcat[`https://coastwatch.pfeg.noaa.gov/erddap/download/setup.html#testTomcat`]
After all the configuring and securing, try to start Tomcat with:
`/opt/tomcat8/bin/startup.sh`
*Note:* If you're not going to attempt to install ERDDAP on a Windows machine, go ahead
and delete all the `*.bat` files in `/opt/tomcat8/bin`.
After you start it, use your browser to go to:
`http://yourmachine.org:8080`
If you don't get the Tomcat welcome page, then go through the debugging section in
the official documentation.
A very important log file for this is:
`/opt/tomcat8/logs/catalina.out`
which will typically supply details about the problem if it is a Tomcat rather than
an external network problem.
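Since `catalina.out` can grow large, it helps to pull out only the entries Tomcat logged at the `SEVERE` level. The following is a small sketch; the function name `recent_severe` and the default of 20 lines are arbitrary choices for illustration.

```shell
# Print the most recent SEVERE entries from a Tomcat log file.
# The second argument limits how many lines are shown (default: 20).
recent_severe() {
    local logfile="$1"
    local count="${2:-20}"
    grep 'SEVERE' "$logfile" | tail -n "$count"
}

# Usage:
#   recent_severe /opt/tomcat8/logs/catalina.out
#   recent_severe /opt/tomcat8/logs/catalina.out 5
```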
A good thing to do right after you've successfully tested Tomcat is to go into
the directory:
`/opt/tomcat8/webapps`
and delete everything therein, unless you're planning to manage the server remotely
rather than from the command line. Remote management can be and usually is a big security hole,
so you're better off just removing everything from that directory.
You might consider keeping the `ROOT` directory which supplies the initial Tomcat
splash page, and there are suggestions on the web as to how to edit the contents of `ROOT`
to remove the splash page and provide minimal information rather than the error message
you'll get in your browser if you remove it.
=== ERDDAP
==== Overview
ERDDAP is a Java web application that runs in the web application server Tomcat.
The general installation and configuration procedure is:
* install the ERDDAP configuration files
* make appropriate changes to the configuration file `setup.xml` that sets the general server environment
* modify the `datasets.xml` file to specify your datasets
* create a parent directory in which all the internal ERDDAP files are kept
* install the ERDDAP war archive file `erddap.war` in Tomcat
* restart Tomcat to see if ERDDAP is working and, if not, debug as necessary
==== Install Configuration Files
This is officially covered at:
https://coastwatch.pfeg.noaa.gov/erddap/download/setup.html#erddapContent[`https://coastwatch.pfeg.noaa.gov/erddap/download/setup.html#erddapContent`]
Download the configuration files from:
https://github.com/BobSimons/erddap/releases/download/v2.02/erddapContent.zip[`https://github.com/BobSimons/erddap/releases/download/v2.02/erddapContent.zip`]
This address will change whenever a new version becomes available, so check the official documentation to be sure.
The installation procedure - performed as user `tomcat` - is:
-----
cd /opt/tomcat8
wget https://github.com/BobSimons/erddap/releases/download/v2.02/erddapContent.zip
unzip erddapContent.zip
-----
which will create the directory:
`/opt/tomcat8/content/erddap`
in which the configuration files for both the server and the datasets it will serve reside.
==== Modifying `setup.xml`
Edit the file `/opt/tomcat8/content/erddap/setup.xml` and make all the suggested changes therein.
The changes that are *mandatory* are:
-----
<bigParentDirectory>
<emailEverythingTo>
<baseUrl>
<baseHttpsUrl>
<email.*>
<admin.*>
-----
Additionally, `<fontFamily>` will have to be changed if you're not using the DejaVu fonts
that come with the AdoptOpenJDK distribution.
There are many non-mandatory configuration options, which you may need to come back to
as you gain experience in running your server.
*Note:* It is easy enough to make changes to `setup.xml` that are not considered well-formed
by the very finicky XML parser. You can either use an XML validator to check the file
or check the contents of a log file that will soon enough become your intimate friend:
`/opt/erddap/logs/log.txt`
for information as to the nature of your XML problem.
Be warned, though, that error messages about XML problems provide information
no more specific than the line number of the end of the `<dataset>` block in which
the XML problem resides. A validator is probably your best option.
This file will also be the go-to
place to track down the inevitable problems your dataset configuration files will have,
and the error messages for dataset rather than XML syntax problems are generally much
more useful.
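A well-formedness check can be run from the command line before restarting anything. The sketch below assumes that either `xmllint` (from libxml2) or `python3` is installed, and the function name `xml_well_formed` is made up for illustration. It only checks that the XML parses; it does not know which tags ERDDAP expects.

```shell
# Return 0 if the file is well-formed XML, 1 if it is not,
# and 2 if neither checker is installed.
# Parser error messages (with line numbers) go to standard error.
xml_well_formed() {
    local file="$1"
    if command -v xmllint >/dev/null 2>&1; then
        xmllint --noout "$file" && return 0
        return 1
    elif command -v python3 >/dev/null 2>&1; then
        python3 -c 'import sys, xml.dom.minidom; xml.dom.minidom.parse(sys.argv[1])' \
            "$file" && return 0
        return 1
    fi
    echo "no XML checker found (tried xmllint, python3)" >&2
    return 2
}

# Usage:
#   xml_well_formed /opt/tomcat8/content/erddap/setup.xml
#   xml_well_formed /opt/tomcat8/content/erddap/datasets.xml
```

Both tools report the line at which parsing failed, which is usually more precise than the end-of-block line number mentioned above.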
==== Modifying `datasets.xml`
This file does not need to be modified to test if your ERDDAP server is working.
The distribution comes with a default `datasets.xml` file that is fully debugged
and contains examples of how to set up various types of datasets.
Just leave it as is to test your initial server set up.
See the remaining 90% of this document for how to move on to configuring `datasets.xml`
for your datasets.
==== Create a Parent Directory
The `<bigParentDirectory>` is a very important directory to create.
It will contain all the internal netCDF and Java database files ERDDAP creates from
the netCDF and other files you configure in `datasets.xml` for inclusion.
ERDDAP creates several new internal files for each file it ingests, including those
with summary information data for each dataset in a format more quickly and readily available
than the original file.
On the GRIIDC ERDDAP server, this directory is specified as `/opt/erddap`.
*DO NOT* in any circumstances put this directory inside the `/opt/tomcat8` directory hierarchy.
You might even want to put it on a separate data disk.
It is also very important to make user `tomcat` the owner of the `/opt/erddap` hierarchy and
change the permissions as indicated. The basic procedure to create and appropriately modify
this directory - assuming that it will be created as `/opt/erddap` - is:
-----
mkdir /opt/erddap
chown -R tomcat /opt/erddap
chgrp -R tomcat /opt/erddap
chmod -R ug+rwx /opt/erddap
chmod -R o-rwx /opt/erddap
-----
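The rule about keeping `<bigParentDirectory>` out of the Tomcat hierarchy is easy to check mechanically. This is a minimal sketch with a made-up function name, `is_inside`; it compares the paths as text, so it assumes absolute paths without symbolic links or `..` components.

```shell
# Return 0 if $2 is the same as, or nested under, directory $1; 1 otherwise.
# Trailing slashes are stripped before the comparison.
is_inside() {
    local parent="${1%/}"
    local child="${2%/}"
    case "$child/" in
        "$parent"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
#   if is_inside /opt/tomcat8 /opt/erddap; then
#       echo "bigParentDirectory must not live under the Tomcat directory" >&2
#   fi
```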
==== Installing the ERDDAP Server
This is officially covered at:
https://coastwatch.pfeg.noaa.gov/erddap/download/setup.html#erddap.war[`https://coastwatch.pfeg.noaa.gov/erddap/download/setup.html#erddap.war`]
The ERDDAP server can now be downloaded from:
https://github.com/BobSimons/erddap/releases/download/v2.02/erddap.war[`https://github.com/BobSimons/erddap/releases/download/v2.02/erddap.war`]
the location of which, as with the configuration files, will change with the version number.
*Note:* The full ERDDAP distribution is half a gigabyte in size, so it might take a while to download.
The procedure for doing this as user `tomcat` is:
-----
cd /opt/tomcat8
wget https://github.com/BobSimons/erddap/releases/download/v2.02/erddap.war
mv erddap.war webapps
-----
==== Shortening the URL
This is officially covered at:
https://coastwatch.pfeg.noaa.gov/erddap/download/setup.html#ProxyPass[`https://coastwatch.pfeg.noaa.gov/erddap/download/setup.html#ProxyPass`]
At this point you can edit the Apache/HTTPD configuration file to use ProxyPass so users won't have
to specify the port number `:8080` in the URL for the server. The official documentation provides the
details for this, and don't forget to restart the Apache server if you do this.
The procedure is to edit - as root or via `sudo` - the file:
`/etc/httpd/conf/httpd.conf`
and modify an existing `VirtualHost` tag - or create one - and add:
-----
<VirtualHost *:80>
ServerName YourDomain.org
ProxyRequests Off
ProxyPreserveHost On
ProxyPass /erddap http://localhost:8080/erddap
ProxyPassReverse /erddap http://localhost:8080/erddap
</VirtualHost>
-----
Then restart your Apache HTTP server via:
`/usr/sbin/apachectl -k graceful`
There are infinite possibilities and subtleties when it comes to modifying the `httpd.conf` file.
If you have a problem with getting this to work, search through the extensive web documentation
available for doing just this. An Apache HTTP configuration tutorial is beyond the scope of this document.
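Because a typo in `httpd.conf` can stop Apache from coming back up, it is worth running the syntax check before the graceful restart. The following is a sketch under stated assumptions: the function name `reload_apache_if_valid` is made up, and it relies on the `configtest` subcommand of `apachectl` together with the same `-k graceful` call used above.

```shell
# Run the configuration syntax check first and only restart if it passes.
# The apachectl path can be overridden as the first argument.
reload_apache_if_valid() {
    local apachectl="${1:-/usr/sbin/apachectl}"
    if "$apachectl" configtest; then
        "$apachectl" -k graceful
    else
        echo "httpd configuration has errors; not restarting" >&2
        return 1
    fi
}

# Usage (as root or via sudo):
#   reload_apache_if_valid
```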
==== Starting the ERDDAP Server
This is officially covered at:
https://coastwatch.pfeg.noaa.gov/erddap/download/setup.html#startTomcat[`https://coastwatch.pfeg.noaa.gov/erddap/download/setup.html#startTomcat`]
Now we start the ERDDAP server. If the Tomcat server is still running from your initial
test, shut it down via:
`/opt/tomcat8/bin/shutdown.sh`
and then start it up again via:
`/opt/tomcat8/bin/startup.sh`
*Note:* You can check the up or down status of your Tomcat server via the command:
`ps -ef | grep tomcat`
After you've started or restarted Tomcat, check the status of ERDDAP by browsing to:
`http://yourmachine.org:8080/erddap/status.html`
You can also check to see if your ProxyPass set-up is working by trying this URL
without the port number `:8080`.
If ERDDAP does not successfully start it could be a problem with the Apache server, the
Tomcat server, or ERDDAP itself. Check the appropriate logs for error messages. If your
Tomcat server started without any problems, you can probably skip checking Apache and just
look in either:
-----
/opt/tomcat8/logs/catalina.out
/opt/erddap/logs/log.txt
-----
The official documentation has much more information about this at:
https://coastwatch.pfeg.noaa.gov/erddap/download/setup.html#isErddapRunning[`https://coastwatch.pfeg.noaa.gov/erddap/download/setup.html#isErddapRunning`]
==== Upgrading ERDDAP
The official instructions are at:
https://coastwatch.pfeg.noaa.gov/erddap/download/setup.html#update[`https://coastwatch.pfeg.noaa.gov/erddap/download/setup.html#update`]
and you should read them before this.
You will eventually need to upgrade ERDDAP to a new version, and
to upgrade Tomcat significantly more often, since that package
seems to have updates every month or so.
If there is a new ERDDAP release, check to see if there's also
a more recent Tomcat version - there will be - and upgrade them
both at the same time.
The official documentation has a section on how to do this, but
there are some additional things you really should consider.
Before you upgrade both, though, there are a couple of directories
you really, really need to back up before you do anything.
The first directory to back up is:
`/opt/tomcat8/webapps/erddap/WEB-INF`
which contains the `GenerateDatasetsXml.sh` script you use to create
XML configuration chunks for the `datasets.xml` file. In the present
GRIIDC set-up, that is the directory in which the Python scripts used to
automatically create XML for the configuration file from all files
in a dataset reside. You can skip this if you set up those scripts
somewhere other than under `/opt/tomcat8`, although that will require
making some modifications to `GenerateDatasetsXml.sh` so it can find
the libraries it requires. It is presently written such that it looks
for them relative to the `WEB-INF` directory.
The second directory you'll need to back up is the directory holding
all your XML configuration files:
`/opt/tomcat8/content/erddap`
You do not strictly need to back this up if you are only dropping a new version
of ERDDAP into the `webapps` directory, but it is a good idea to do so anyway
(and often). You do not want to lose those configuration files after all you've
done to create them.
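
The two backups described above can be scripted. What follows is only a minimal sketch using the Python standard library; the function name `backup_directory` and the destination directory `/backup/erddap` are assumptions made for illustration, not part of the GRIIDC set-up.

```python
import tarfile
import time
from pathlib import Path


def backup_directory(source, dest_dir):
    """Write a timestamped .tar.gz copy of `source` into `dest_dir`.

    Returns the path of the archive that was created.
    """
    source = Path(source)
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = dest_dir / f"{source.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps only the last path component inside the archive,
        # e.g. WEB-INF/... rather than opt/tomcat8/webapps/erddap/WEB-INF/...
        tar.add(source, arcname=source.name)
    return archive
```

It would be called once per directory, for example `backup_directory("/opt/tomcat8/webapps/erddap/WEB-INF", "/backup/erddap")` and `backup_directory("/opt/tomcat8/content/erddap", "/backup/erddap")`, where `/backup/erddap` is a hypothetical location on a different disk.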
It is recommended that you do not simply swap out `erddap.war` files since
it is quite possible that the new ERDDAP version will throw an error on
file configurations the previous version successfully processed.
You should set up a temporary second Tomcat server as outlined in
the next section about setting up a development server.
That way your present production server can keep humming along while
you do any debugging you might have to do with the new server.
After you are satisfied that the new server is working as it should,
then simply shut down the old one and start up the new one.
The old one can then be deleted.
To summarize: Do *NOT* remove your present Tomcat/ERDDAP combination until
you have backed up what you need to and you are confident that the new
combination is working.
==== The Need for an Additional Development Server
If you have only a single production ERDDAP server running and
wish to put some new datasets on it you will almost certainly
have to go through the debugging cycle. This requires time to start
ERDDAP, peruse the log file to see what's not yet quite right, fix it,
restart ERDDAP, and repeat as necessary.
It can also take anywhere from minutes to hours for ERDDAP to start, even
with already debugged datasets.
For example, an ERDDAP with dozens of Florida visual census data files - some of which are
over a gigabyte in size - takes well over an hour to start and completely
process all the files.
The bullet point to take away from this part of the presentation is that
you're going to incur significant down time on your production server if
you also use it for development, and that just restarting it for any
purpose can cause significant down time.
A solution is to set up a second server for development purposes.
This is done by setting up a second Tomcat server - for example in `/opt/tomcat-dev`
rather than `/opt/tomcat8` - in the same way you set up the first, production server.
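The only significant difference is that you'll have to edit the file:
`/opt/tomcat-dev/conf/server.xml`
to change the port on which it is served from `8080` to another available port.
If both servers run on the same machine, the other ports declared in that file,
such as the shutdown port (`8005` by default), must also differ from those of the production server.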
You can use the same Java distribution but you'll have to have a second Tomcat
installation.
This is also covered in the official documentation:
https://coastwatch.pfeg.noaa.gov/erddap/download/setup.html#secondErddap[+https://coastwatch.pfeg.noaa.gov/erddap/download/setup.html#secondErddap+]
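
Before starting the second server it can be worth confirming that its ports do not collide with those of the production server. The following is a rough sketch, assuming both Tomcat installations keep their configuration in `conf/server.xml`; the function names are hypothetical and it only reads the files, it does not change them.

```python
import xml.etree.ElementTree as ET


def tomcat_ports(server_xml):
    """Return the set of ports declared in a Tomcat server.xml file.

    This covers the shutdown port on the <Server> element and the port
    of every <Connector>. Values that are not positive (a disabled
    shutdown port is written as -1) are ignored.
    """
    root = ET.parse(str(server_xml)).getroot()
    declared = [root.get("port")]
    declared += [conn.get("port") for conn in root.iter("Connector")]
    return {int(p) for p in declared if p is not None and int(p) > 0}


def port_conflicts(prod_xml, dev_xml):
    """Return the ports that both server.xml files try to use."""
    return tomcat_ports(prod_xml) & tomcat_ports(dev_xml)
```

An empty set returned from `port_conflicts("/opt/tomcat8/conf/server.xml", "/opt/tomcat-dev/conf/server.xml")` would suggest the two servers can run side by side.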
==== Customizing the Appearance of ERDDAP
This is covered in the official documentation at:
https://coastwatch.pfeg.noaa.gov/erddap/download/setup.html#customize[`https://coastwatch.pfeg.noaa.gov/erddap/download/setup.html#customize`]
The aesthetic appearance of the default ERDDAP distribution is based on
the needs of NOAA ERD. If you're not NOAA ERD you'll probably want to modify the aesthetics.
This is done by copying either or both of the tags shown below into the
`datasets.xml` file, wherein they will override the default versions.
Change whatever is needed to personalize the banners for your organization.
Look at the ERDDAP home page and decide what you want to change, find where it's
located in the files below, and edit them to reflect your desired changes.
The image `/images/noaab.png` specified in the `startBodyHtml5` tag is located
at:
`/opt/tomcat8/webapps/erddap/images/noaab.png`
which tells you that the `messages.xml` file and its contents are processed
relative to the directory:
`/opt/tomcat8/webapps/erddap`
and thus that this is where you'll need to put a replacement image.
The settings at the top of the home page are established in the file:
`/opt/tomcat8/webapps/erddap/WEB-INF/classes/gov/noaa/pfel/erddap/util/messages.xml`
in the `<startBodyHtml5>` tag. The default state of this tag is:
-----
<startBodyHtml5><![CDATA[
<body>
<table class="compact nowrap" style="width:100%; background-color:#128CB5;">
<tr>
<td style="text-align:center; width:80px;"><a rel="bookmark"
href="https://www.noaa.gov/"><img
title="National Oceanic and Atmospheric Administration"
src="&erddapUrl;/images/noaab.png" alt="NOAA"
style="vertical-align:middle;"></a></td>
<td style="text-align:left; font-size:x-large; color:#FFFFFF; ">
<strong>ERDDAP</strong>
<br><small><small><small>Easier access to scientific data</small></small></small>
</td>
<td style="text-align:right; font-size:small;">
&loginInfo;
<br>Brought to you by
<a title="National Oceanic and Atmospheric Administration" rel="bookmark"
href="https://www.noaa.gov">NOAA</a>
<a title="National Marine Fisheries Service" rel="bookmark"
href="https://www.fisheries.noaa.gov">NMFS</a>
<a title="Southwest Fisheries Science Center" rel="bookmark"
href="https://swfsc.noaa.gov">SWFSC</a>
<a title="Environmental Research Division" rel="bookmark"
href="https://swfsc.noaa.gov/textblock.aspx?Division=ERD&id=1315&ParentMenuId=200">ERD</a>
</td>
</tr>
</table>
]]></startBodyHtml5>
-----
The settings for the left side of the page are set in the `<theShortDescriptionHtml>` tag,
the default state of which is:
-----
<theShortDescriptionHtml><![CDATA[
<h1>ERDDAP</h1>
ERDDAP is a data server that gives you a simple, consistent way to download
subsets of scientific datasets in common file formats and make graphs and maps.
This particular ERDDAP installation has oceanographic data
(for example, data from satellites and buoys).
[standardShortDescriptionHtml]
]]></theShortDescriptionHtml>
-----
wherein the `standardShortDescriptionHtml` contents can also be found in the `messages.xml` file.
== How to Get Datasets into ERDDAP
=== Overview
The initial installation and configuration of ERDDAP is the easy part.
Configuring datasets such that they can be correctly ingested by ERDDAP
is both harder and more tedious, but is straightforward enough given some practice.
With many geoscience data servers such as THREDDS and OPeNDAP the configuration
process is to simply specify the directory path or the URL of the dataset and the
server does the rest. With ERDDAP you have to write a large chunk of XML that
describes the dataset.
The basic process by which this is done is:
* set up a standardized data directory structure
* prepare a dataset - in the case of GRIIDC netCDF files - for ingestion into ERDDAP
* run an ERDDAP script that reads the dataset and creates 99% of the XML boilerplate required
* place that chunk of XML into the `datasets.xml` file
* restart ERDDAP
* if the chunk works and ERDDAP has processed and is displaying your dataset, rejoice
* if your dataset doesn't appear in the ERDDAP listing, search the `/opt/erddap/logs/log.txt`
file for your dataset and hopefully find a message that tells you what you need to fix
* go back to the restart ERDDAP step, and cycle through as needed
The ERDDAP error messages are very useful and typically
tell you exactly what's missing or otherwise wrong. Occasionally you will get a less than helpful
Java error message like `EOFException`, but that is the exception rather than the rule.
As was mentioned earlier, you will be spending a lot of time perusing
the `/opt/erddap/logs/log.txt` file, so you might want to create a short macro that
will take your editor directly there.
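
As an alternative to an editor macro, the search can be scripted. This is a minimal sketch and not part of ERDDAP; the function name `dataset_log_lines` is hypothetical, and it assumes the log is small enough to read into memory.

```python
from pathlib import Path


def dataset_log_lines(log_path, dataset_id, context=2):
    """Return the log lines that mention `dataset_id`.

    Each match is returned together with `context` lines on either
    side, and every line is prefixed with its 1-based line number.
    """
    lines = Path(log_path).read_text(errors="replace").splitlines()
    keep = set()
    for i, line in enumerate(lines):
        if dataset_id in line:
            first = max(0, i - context)
            last = min(len(lines), i + context + 1)
            keep.update(range(first, last))
    return [f"{i + 1}: {lines[i]}" for i in sorted(keep)]
```

For example, `print("\n".join(dataset_log_lines("/opt/erddap/logs/log.txt", "myDatasetID")))`, where `myDatasetID` stands in for whatever ID you gave the dataset in `datasets.xml`.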
The two really big tasks are creating the datasets in the correct netCDF format and
creating the XML configuration file from these datasets.
We will approach these separately.
=== The Directory Structure
==== The Data Directory
ERDDAP must be able to read the files in all the datasets from somewhere in
order to process them via the information in the configuration files.
If you're running ERDDAP on more than one machine you soon discover that it is a good
idea to establish a standard location for the data across all machines.
The GRIIDC set-up sets this location as:
`/data`
This can be either an actual subdirectory at your root level, or
a symbolic link to somewhere else on a data disk.
Below this level are subdirectories for each different geometry. This
looks like:
-----
/data/profile
/data/trajectory
/data/ts
/data/trajprof
/data/tsprof
-----
There are subdirectories in each of these for separate projects.
-----
/data/profile/griidc
/data/profile/latex
/data/profile/negom
/data/profile/deepwater
/data/profile/ws
-----
And, finally, there are - if there are many different datasets within a project
such as with GRIIDC - subdirectories whose names reflect the UDI of each dataset.
-----
/data/profile/griidc/R1.x132.134.0001
/data/profile/griidc/R1.x132.134.0002
...
/data/profile/griidc/Y1.x030.000.0007
-----
The choice of the UDI as the finest-grained subdirectory name was obvious.
The overall structure was ultimately motivated by the existence of multiple
geometry types - e.g. profile and trajectory files -
in the same UDI dataset. Each geometry type must be processed in
a different way by different scripts, and it is less confusing to keep them
separate in the archive as well as in the processing stage.
You inform ERDDAP of the location of all the ERDDAP-ready netCDF-format
datasets via the scripts that use `GenerateDatasetsXml.sh` to create the XML
configuration chunks.
Each dataset is contained in a separate subdirectory under `/data`
that must be specified when running the scripts that automatically
create the XML for an entire dataset.
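
The geometry/project/UDI layout described above can be captured in a small helper, so that every script builds dataset paths the same way. This is only a sketch; the function name `dataset_dir` is hypothetical, and the geometry names are the ones listed earlier in this section.

```python
from pathlib import PurePosixPath

# Geometry subdirectories used under /data in the GRIIDC set-up.
GEOMETRIES = ("profile", "trajectory", "ts", "trajprof", "tsprof")


def dataset_dir(geometry, project, udi, root="/data"):
    """Return the directory holding one dataset's ERDDAP-ready files."""
    if geometry not in GEOMETRIES:
        raise ValueError(f"unknown geometry: {geometry!r}")
    return PurePosixPath(root) / geometry / project / udi
```

For example, `dataset_dir("profile", "griidc", "R1.x132.134.0001")` gives `/data/profile/griidc/R1.x132.134.0001`.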
==== The Working Directory
The various preparation steps described below will leave a lot of detritus
in the directories containing your completed and ERDDAP-ready netCDF files.
For example, it is a good idea to create an `ARCHIVE` subdirectory in which
to store any files you receive so you can copy the originals back into your
main working directory should you happen to munge them beyond repair.
It is thus good to have a separate working directory structure where the
sausages can be made and the finished product exported to the data archive
directory structure.
Also, during the preparation of many of the old datasets, different one-off
shell and Python scripts were required to even get them to the point where the
general attribute fixing scripts could be used.
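
The `ARCHIVE` subdirectory mentioned above can be filled automatically as soon as files arrive. The following is a minimal sketch under that assumption; the function name `archive_originals` is hypothetical.

```python
import shutil
from pathlib import Path


def archive_originals(work_dir, archive_name="ARCHIVE"):
    """Copy the plain files in `work_dir` into its ARCHIVE subdirectory.

    A file that already has a copy in the archive is skipped, so the
    first version received is the one that is preserved. Returns the
    names of the files copied by this call.
    """
    work_dir = Path(work_dir)
    archive = work_dir / archive_name
    archive.mkdir(exist_ok=True)
    copied = []
    for item in sorted(work_dir.iterdir()):
        target = archive / item.name
        if item.is_file() and not target.exists():
            shutil.copy2(item, target)  # copy2 also keeps timestamps
            copied.append(item.name)
    return copied
```

Running it again after you have modified the working copies does not overwrite the archived originals.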
=== Creating the netCDF Datasets
==== Overview
There are many different types of dataset formats that ERDDAP can ingest and display if they
are properly prepared. For the purposes of this document, we are only going to concern ourselves
with netCDF files that conform to the metadata and feature type conventions described in the
latest CF conventions document at:
https://cfconventions.org/latest.html[`https://cfconventions.org/latest.html`]
and the NOAA NCEI netCDF templates - a recommended superset of the CF standards - at:
https://www.nodc.noaa.gov/data/formats/netcdf/v2.0/[`https://www.nodc.noaa.gov/data/formats/netcdf/v2.0/`]
and the ACDD metadata conventions at:
https://wiki.esipfed.org/Attribute_Convention_for_Data_Discovery_1-3[`https://wiki.esipfed.org/Attribute_Convention_for_Data_Discovery_1-3`]
There are additional metadata conventions required by IOOS and described at:
https://ioos.noaa.gov/data/contribute-data/metadata-data-formats/[`https://ioos.noaa.gov/data/contribute-data/metadata-data-formats/`]
but these three sets of conventions are the bulk of that with which you must be concerned.
It is important to realize the difference between the two types of conventions, metadata and feature type, as well as the necessity of
conforming to both in order for a dataset to be successfully served by ERDDAP.
The metadata conventions concern descriptions of the data rather than the data itself, for instance
the standard names and units of the variables as well as the provenance and processing of the dataset.
The feature type conventions deal with the specifics of how the data is structured within the files
in the dataset. Each will now be explained more fully.
===== Metadata Conventions
The metadata in a dataset is everything that is not the data itself.
It provides information about the type of data as well as the provenance and processing of the data.
GRIIDC files need to conform to the IOOS metadata requirements for data providers. These requirements
are specified in the IOOS Metadata Profile, which is at version 1.2 as of this writing.
This contains dataset attribution guidelines and examples to help the US IOOS community publish datasets
in netCDF and other related data formats in an interoperable manner.
The Profile defines recommended and required global and variable attributes for IOOS data
providers to include when publishing their datasets and accompanying services.
https://ioos.github.io/ioos-metadata/[`https://ioos.github.io/ioos-metadata/`]
The Metadata Profile is based on the following standards, each of which is a superset of
the previous one.
* *netCDF Climate and Forecast Conventions (CF)*
** https://cfconventions.org/[`https://cfconventions.org/`]
* *Attribute Convention for Data Discovery (ACDD)*
** https://wiki.esipfed.org/Attribute_Convention_for_Data_Discovery_1-3[`https://wiki.esipfed.org/Attribute_Convention_for_Data_Discovery_1-3`]
* *NOAA NCEI NetCDF Templates*
** https://www.nodc.noaa.gov/data/formats/netcdf/v2.0/[`https://www.nodc.noaa.gov/data/formats/netcdf/v2.0/`]
As it is the highest level superset, the NCEI Standard Attributes and Guide Table at:
https://www.nodc.noaa.gov/data/formats/netcdf/v2.0/#guidancetable[`https://www.nodc.noaa.gov/data/formats/netcdf/v2.0/#guidancetable`]
contains all the metadata items with which you need to be concerned, at
least for the GRIIDC physical oceanography data. It contains both the CF and ACDD
metadata requirements as well as some additional NCEI metadata requirements.
IOOS additionally requires adherence to the following metadata standards:
* *ISO 19115-2* for ancillary dataset and collection level metadata for Catalog indexing
** https://www.iso.org/standard/39229.html[`https://www.iso.org/standard/39229.html`]
* *Darwin Core* for biological data
** https://dwc.tdwg.org/[`https://dwc.tdwg.org/`]
The Metadata Profile standards deal with what is called discovery and use metadata that help find a file and
use the data therein. The ISO 19115-2 standard deals with the details of the provenance and processing of the data.
The Darwin Core - still under early development for oceanographic data - is for biological datasets.
The ERDDAP program itself also has some additional metadata requirements that are needed to ensure
that datasets can be served with the full subsetting and graphical capabilities available.
These will be described in the course of this document.
While it won't help you to create your netCDF file with the various metadata requirements,
there is a useful tool to check the compliance of your file
after you have created it. It is the IOOS Compliance Checker, available at:
https://github.com/ioos/compliance-checker[`https://github.com/ioos/compliance-checker`]
It is a Python tool that checks local or remote netCDF files for completeness and
compliance with community standards. It can check against any
combination of the ACDD, CF, SOS, IOOS, Glider DAC and NCEI standards.
When you run it from the command line it tells you what your file is missing or has specified incorrectly.
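
If you want to run the checker over many files from your own scripts, it can be wrapped as shown below. This is a sketch under stated assumptions: it assumes the `compliance-checker` command is installed and on your `PATH`, that checks are selected with the `--test` option as shown in the project's README, and that `cf` and `acdd` are the checker names you want. The function names are hypothetical.

```python
import subprocess


def compliance_command(nc_file, tests=("cf", "acdd")):
    """Build the command line that checks one netCDF file."""
    cmd = ["compliance-checker"]
    for name in tests:
        cmd.append(f"--test={name}")
    cmd.append(str(nc_file))
    return cmd


def run_compliance_checker(nc_file, tests=("cf", "acdd")):
    """Run the checker and return its exit code and printed report."""
    result = subprocess.run(
        compliance_command(nc_file, tests),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout
```

Keeping the command construction separate from the call makes it easy to print the command and try it by hand first.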