TerraFusion
Terra Data Fusion Project - University of Illinois
Authors: Landon Clipp, MuQun Yang
This file outlines the structure of the combined TERRA Fusion code, how to compile it on a local machine, and how to add
additional code to the program. This file may not be up to date during development.
THE MASTER BRANCH OF THIS REPOSITORY REPRESENTS CURRENT WORKING CODE (COMPILABLE AND RUNNABLE ON BW)
The code is written in C using the following HDF libraries:
1. hdf.h -- For the HDF4 functions
2. hdf5.h -- For the HDF5 functions
3. mfhdf.h -- For the HDF4 scientific dataset (SD) interface. Also acts as a wrapper for hdf.h (it includes hdf.h by
default).
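For example, a minimal sketch of the includes a segment would typically need (since mfhdf.h already pulls in hdf.h, an
explicit hdf.h include is optional):

    #include "mfhdf.h"   /* HDF4 SD interface; also includes hdf.h */
    #include "hdf5.h"    /* HDF5 interface */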
***********************
***PROGRAM STRUCTURE***
***********************
This program will contain 5 basic segments. Each segment will handle reading the datasets from one single instrument and
writing the data into the output HDF5 file. Currently, main handles creating the new HDF5 output file, and a program-wide
global variable "hid_t outputFile" has been declared to hold the file identifier that HDF5 functions need in order to write
information into the file. This is an HDF5-only identifier; it is not valid for HDF4.

Each segment of the code is to be written as if it were its own standalone program. It has a main function (named by the
convention MOPITT(), CERES(), etc.) from which all of the function calls relevant to that instrument are made. Any function
that could conceivably be reused across instruments should be declared in "libTERRA.h" and defined in "libTERRA.c". This
keeps such functions available to any segment of the code that needs them.

Every instrument's function should receive the normal main arguments, int argc and char* argv[] (aka char** argv). The main
function (main.c) will pass the appropriate arguments to each instrument function.
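As a rough sketch of that calling pattern (the prototypes, names, and error handling here are illustrative only and may
not match the actual declarations in main.c or libTERRA.h):

    #include "hdf5.h"

    hid_t outputFile;   /* program-wide HDF5 file identifier (HDF5 only; not valid for HDF4) */

    /* Stub instrument handlers: in the real code, each one reads its instrument's
       files and writes the data into outputFile. */
    int MOPITT( int argc, char* argv[] ) { (void)argc; (void)argv; return 0; }
    int CERES ( int argc, char* argv[] ) { (void)argc; (void)argv; return 0; }
    /* ... MODIS(), ASTER(), and MISR() would follow the same pattern ... */

    int main( int argc, char* argv[] )
    {
        if ( argc < 2 ) return 1;

        /* main creates the new HDF5 output file and stores its identifier globally */
        outputFile = H5Fcreate( argv[1], H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT );
        if ( outputFile < 0 ) return 1;

        /* main passes the appropriate arguments down to each instrument function */
        MOPITT( argc, argv );
        CERES ( argc, argv );

        H5Fclose( outputFile );
        return 0;
    }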
*****************************************
*               COMPILING               *
*****************************************
Compiler used: cc OR h4cc OR h5cc OR gcc
cc -- Blue Waters' gcc wrapper. Must be used if you plan on using the Blue Waters modules
h4cc -- A gcc wrapper for HDF4. Created during the building of HDF4 libraries.
Provides automatic visibility to the HDF4 library.
h5cc -- A gcc wrapper for HDF5. Created during the building of HDF5 libraries.
Provides automatic visibility to the HDF5 library.
gcc -- Default C compiler. Use if you want maximum control over how libraries are linked.
A sample Makefile has been provided. Do NOT edit this Makefile in the GitHub repository. You can download it to your
machine and edit it for your own machine, but do not reupload the edited version.
---ROGER---
To build the code on ROGER (the NCSA cyberGIS cluster), do the following:
1) Copy Makefile.roger to Makefile
2) Load necessary modules
module load zlib libjpeg hdf4/4.2.12
module load zlib hdf5
3) Run make under the package root directory
make
4) Run the program for the sample orbit data at roger
(1) Copy the file that lists the input HDF4 files
Two such files are provided under /inputFileDebug.
roger_small_input.txt is a subset of the needed HDF4 files in one orbit.
It is useful for debugging.
roger_large_input.txt includes all the needed HDF4 files in one orbit.
Under the package root,
cd exe
cp ../inputFileDebug/roger_large_input.txt .
cp ../inputFileDebug/roger_small_input.txt .
(2) To run with the packed option, set the environment variable TERRA_DATA_PACK=1 first;
otherwise, just run the following:
./TERRArepackage <your_output_fused_HDF5_file> roger_large_input.txt
---BLUE WATERS---
DOWNLOADING AND SETTING UP PROGRAM
Download:
1. Change into the desired directory and type "git clone https://github.com/TerraFusion/basicFusion"
2. cd into the basicFusion directory
Setup:
3. cd into bin
4. Open jobSubmit with your favorite text editor (e.g. "vim jobSubmit")
a. Change the "inputFiles" variable at the top of the file to a directory containing
the 5 TERRA instrument directories (which themselves contain valid HDF files). The directory you give
MUST contain all of the subdirectories: MOPITT, CERES, MODIS, MISR, and ASTER. It must also be an absolute
path.
b. Change the "PROJDIR" variable to point to your basicFusion directory. Must be an absolute path.
c. Save this file and close it.
5. Open up the batchscript.pbs script
a. Change the PROJDIR to contain an absolute path to your basicFusion directory.
b. Change the EXENAME variable to be what you want the executable to be called.
c. Change the OUTFILENAME variable to be what you want the output HDF5 file to be called.
d. Change the FILELISTNAME variable to be what the generated file list (generated by generateInput.sh) will be named. This filename must be identical to the filename given by the generateInput.sh script.
e. Change the STD_OUTFILENAME variable to be what you want the standard stream output text file to be named. All program streams are redirected to this file.
6. cd into the root project directory (where Makefile is located)
a. Two different makefiles have been provided. One compiles with static HDF libraries, "staticMakefile". The other compiles with dynamic HDF libraries, "dynamicMakefile".
It goes without saying that if you want to use dynamic compilation, you must have your HDF libraries built with dynamic libraries enabled.
Dynamic compiling may be useful for debugging purposes, but otherwise it is fine to use static compilation. cp the desired version of the makefile to "Makefile".
b. Change the values of the "INCLUDE" and "LIB" variables to point to the proper HDF libraries. Depending on how you have compiled the HDF libraries (and also whether or not
you plan on using the Blue Waters HDF modules), you will likely need to update all the variables INCLUDE1, INCLUDE2, LIB1, and LIB2.
If all of your HDF4 and HDF5 library and include files are under the same directory, it is sufficient to point just the "1" variables to your libraries.
c. Because every setup is slightly different, the Makefiles provided may not work as-is! If they don't, make sure that all compiler lines have visibility to the proper
HDF4 and HDF5 include files, and that the linker has proper visibility to the HDF4 and HDF5 libraries.
---LOCAL MACHINE---
The steps for setting up on a local machine are identical to Blue Waters with the following exceptions:
1. You don't have the Blue Waters modules at your disposal, so you MUST install the HDF4 and HDF5 libraries on your machine.
2. You must change the "CC" option in your Makefile to gcc, h4cc, or h5cc.
*************************************
*              RUNNING              *
*************************************
BATCH SCRIPT
A script has been provided to make compiling and running the program easy. In the basicFusion directory, the script called
"jobSubmit" in the bin directory contains all of the necessary commands to run the program. All you need to do is to run
this script. Here are the steps the script takes (you can also do these steps manually):
Step 1: Load modules
a. cd into bin
Execute the command
. ./loadModules
Be sure not to leave out the lone period before ./loadModules. This ensures that the modules are loaded into your current bash process. Leaving out the period would load them into a child process (which then immediately exits, effectively doing nothing to your current process).
OR run the commands manually:
a. module --verbose swap PrgEnv-cray PrgEnv-intel
b. module --verbose load szip/2.1
c. module --verbose load cray-hdf5/1.8.16
d. module --verbose load hdf4/4.2.10
Step 2: Compile the program
a. cd into root project directory
b. make
Step 3: Generate the inputFiles.txt file using the generateInput.sh script
a. cd into bin
b. ./generateInput.sh [path to 5 instrument directories]
Step 4: Set an environment variable to tell the program whether or not to unpack the data (a short sketch of how
such a flag is typically read in C follows these steps)
a. export TERRA_DATA_PACK=1 (if you want to keep the data packed, like the original HDF4 files)
b. unset TERRA_DATA_PACK (if you don't want the data to be packed)
Step 5: Submit the executable to the BW job queue (qsub)
a. qsub -v TERRA_DATA_PACK [path to batchscript.pbs]
Check the status of the job by executing the command "qstat | grep [your BW username]". A flag of "Q" means the job
is enqueued. A flag of "R" means the job is currently executing. A flag of "C" means the job has completed.
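How the executable consumes TERRA_DATA_PACK is not spelled out in this file; as a sketch only, an on/off environment
flag like this is typically read in C with getenv, along these lines:

    /* Sketch only: a common way to read an on/off flag such as TERRA_DATA_PACK in C.
       The actual check inside the fusion code may differ. */
    #include <stdio.h>
    #include <stdlib.h>

    int main( void )
    {
        const char* pack = getenv( "TERRA_DATA_PACK" );

        if ( pack != NULL )
            printf( "TERRA_DATA_PACK is set: keep the data packed, as in the original HDF4 files.\n" );
        else
            printf( "TERRA_DATA_PACK is unset: unpack the data.\n" );

        return 0;
    }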
INTERACTIVE
To run the program in interactive mode, follow these steps:
Step 1: Enter interactive mode
qsub -I -l nodes=1 -l walltime=03:00:00
You may enter any value for walltime. This is the time you will be allotted in interactive mode. It is given
as HH:MM:SS. Larger wall times mean you will wait in the queue for a longer period.
Step 2: Load modules
Execute the first step of the batch script instructions.
Recall: cd into the bin directory and run (with the lone period): . ./loadModules
Step 3: Compile
Enter into the root program directory and enter: make
Step 4:
Generate the inputFiles.txt file using the generateInput.sh script located in bin.
Give the script the path to the 5 instrument directories.
Step 5:
Run the following command:
aprun -n 1 "$EXE" "$OUTFILEDIR" "$FILELISTDIR" &> "$STD_STREAM_OUTFILE"
Replace all of the variables ($EXE, $OUTFILEDIR, etc.) with the values you set in your batchscript.pbs file.
Explanation of the aprun command, argument by argument:
-n [number]: number of processing elements (PEs) to launch
$EXE: path to the executable program
$OUTFILEDIR: argument to the executable giving the name of the output HDF5 file
$FILELISTDIR: argument to the executable giving the path to the inputFiles.txt file
&> $STD_STREAM_OUTFILE: send all program output to this text file
The program is now being executed. Use:
qstat | grep [your BW username]
to check the status of your job.
*****************
***ADDING CODE***
*****************
I have yet to decide how I want to handle multiple collaborators on this project. We will discuss this more in the technical
meeting.
*****************
***FLOW CHARTS***
*****************
High level API flow chart
MOPITT-------------------------------------------------------------->--|
                                                                        |
MISR--------------->--|                                                 v
MODIS-------------->--+--hdf.h--->hdf5.h-->convert to hdf5--->hdf5.h-----> output file
CERES-------------->--|
ASTER-------------->--|
PROGRAM STRUCTURE:
Note: Directionality indicates dependency. For instance, main.c is dependent on fileList.txt and outputFileName.
MOPITTmain.c would be dependent on main.c, MOPITTfiles and libTERRA.c etc.
                                   |--------->------MOPITTmain.c------>------|
                                   |                ^^^MOPITTfiles^^^        |
                                   |                ^^^libTERRA.c^^^         |
                                   |-------->-------CERESmain.c------->------|
                                   |                ^^^CERESfiles^^^         |
                                   |                ^^^libTERRA.c^^^         |
fileList.txt-->main.c-------->-----|-------->-------MODISmain.c------->------|---->outputFile
outputFileName----^                |                ^^^MODISfiles^^^         |
                                   |                ^^^libTERRA.c^^^         |
                                   |-------->-------ASTERmain.c------->------|
                                   |                ^^^ASTERfiles^^^         |
                                   |                ^^^libTERRA.c^^^         |
                                   |-------->-------MISRmain.c-------->------|
                                                    ^^^MISRfiles^^^
                                                    ^^^libTERRA.c^^^

     libTERRA.c
      ^      ^
      |      |
      |      |
   hdf5.h  mfhdf.h