TerraFusion
Terra Data Fusion Project - University of Illinois
Authors: Landon Clipp, MuQun Yang
This file outlines the structure of the combined TERRA Fusion code, how to compile it on a local machine, and how to add
additional code to the program. This file may not be up to date during development.
THE MASTER BRANCH OF THIS REPOSITORY REPRESENTS CURRENT WORKING CODE (COMPILABLE AND RUNNABLE ON BW)
The code is written in C using the following HDF libraries:
1. hdf.h -- For the HDF4 functions
2. hdf5.h -- For the HDF5 functions
3. mfhdf.h -- For the HDF4 scientific dataset (SD) interface. Also acts as a wrapper for hdf.h (it includes hdf.h by
default).
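For example, a minimal sketch of the includes a segment would typically need (since mfhdf.h already pulls in hdf.h, an
explicit hdf.h include is optional):

    #include "mfhdf.h"   /* HDF4 SD interface; also includes hdf.h */
    #include "hdf5.h"    /* HDF5 interface */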
***********************
***PROGRAM STRUCTURE***
***********************
This program will contain 5 basic segments. Each segment will handle reading the datasets from one single instrument and
writing the data into the output HDF5 file. Currently, main handles creating the new HDF5 output file, and a program-wide
global variable "hid_t outputFile" has been declared to hold the file identifier that HDF5 functions need in order to write
information into the file. This is an HDF5-only identifier; it is not valid for HDF4.

Each segment of the code is to be written as if it were its own standalone program. It has a main function (named by the
convention MOPITT(), CERES(), etc.) from which all of the function calls relevant to that instrument are made. Any function
that could conceivably be reused across instruments should be declared in "libTERRA.h" and defined in "libTERRA.c". This
keeps such functions available to any segment of the code that needs them.

Every instrument's function should receive the normal main arguments, int argc and char* argv[] (aka char** argv). The main
function (main.c) will pass the appropriate arguments to each instrument function.
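As a rough sketch of that calling pattern (the prototypes, names, and error handling here are illustrative only and may
not match the actual declarations in main.c or libTERRA.h):

    #include "hdf5.h"

    hid_t outputFile;   /* program-wide HDF5 file identifier (HDF5 only; not valid for HDF4) */

    /* Stub instrument handlers: in the real code, each one reads its instrument's
       files and writes the data into outputFile. */
    int MOPITT( int argc, char* argv[] ) { (void)argc; (void)argv; return 0; }
    int CERES ( int argc, char* argv[] ) { (void)argc; (void)argv; return 0; }
    /* ... MODIS(), ASTER(), and MISR() would follow the same pattern ... */

    int main( int argc, char* argv[] )
    {
        if ( argc < 2 ) return 1;

        /* main creates the new HDF5 output file and stores its identifier globally */
        outputFile = H5Fcreate( argv[1], H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT );
        if ( outputFile < 0 ) return 1;

        /* main passes the appropriate arguments down to each instrument function */
        MOPITT( argc, argv );
        CERES ( argc, argv );

        H5Fclose( outputFile );
        return 0;
    }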
*****************************************
*               COMPILING               *
*****************************************
Compiler used: cc OR h4cc OR h5cc OR gcc
cc -- Blue Waters' gcc wrapper. Must be used if you plan on using the Blue Waters modules
h4cc -- A gcc wrapper for HDF4. Created during the building of HDF4 libraries.
Provides automatic visibility to the HDF4 library.
h5cc -- A gcc wrapper for HDF5. Created during the building of HDF5 libraries.
Provides automatic visibility to the HDF5 library.
gcc -- Default C compiler. Use if you want maximum control over how libraries are linked.
A sample Makefile has been provided. Do NOT edit this Makefile in the GitHub repository. You can download it to your
machine and edit it for your own machine, but do not reupload the edited version.
---ROGER---
To build the code on ROGER (the NCSA cyberGIS cluster), do the following:
1) Copy Makefile.roger to Makefile
2) Load necessary modules
module load zlib libjpeg hdf4/4.2.12
module load zlib hdf5
3) Run make under the package root directory
make
4) Run the program for the sample orbit data at roger
(1) Copy the file that lists the input HDF4 files
Two such files are provided under /inputFileDebug.
roger_small_input.txt is a subset of the needed HDF4 files in one orbit.
It is useful for debugging.
roger_large_input.txt includes all the needed HDF4 files in one orbit.
Under the package root,
cd exe
cp ../inputFileDebug/roger_large_input.txt .
cp ../inputFileDebug/roger_small_input.txt .
(2) To run with the packed option, set the environment variable TERRA_DATA_PACK=1 first;
otherwise, just run the following:
./TERRArepackage <your_output_fused_HDF5_file> roger_large_input.txt
---BLUE WATERS---
DOWNLOADING AND SETTING UP PROGRAM
Download:
1. Change into the desired directory and type "git clone https://github.com/TerraFusion/basicFusion"
2. cd into the basicFusion directory
Setup:
3. cd into bin
4. Open jobSubmit with your favorite text editor (e.g. "vim jobSubmit")
a. Change the "inputFiles" variable at the top of the file to a directory containing
the 5 TERRA instrument directories (which themselves contain valid HDF files). The directory you give
MUST contain all of the subdirectories: MOPITT, CERES, MODIS, MISR, and ASTER. It must also be an absolute
path.
b. Change the "PROJDIR" variable to point to your basicFusion directory. Must be an absolute path.
c. Save this file and close it.
5. Open up the batchscript.pbs script
a. Change the PROJDIR to contain an absolute path to your basicFusion directory.
b. Change the EXENAME variable to be what you want the executable to be called.
c. Change the OUTFILENAME variable to be what you want the output HDF5 file to be called.
d. Change the FILELISTNAME variable to be what the generated file list (generated by generateInput.sh) will be named. This filename must be identical to the filename given by the generateInput.sh script.
e. Change the STD_OUTFILENAME variable to be what you want the standard stream output text file to be named. All program streams are redirected to this file.
6. cd into the root project directory (where Makefile is located)
a. Two different makefiles have been provided. One compiles with static HDF libraries, "staticMakefile". The other compiles with dynamic HDF libraries, "dynamicMakefile".
It goes without saying that if you want to use dynamic compilation, you must have your HDF libraries built with dynamic libraries enabled.
Dynamic compiling may be useful for debugging purposes, but otherwise it is fine to use static compilation. cp the desired version of the makefile to "Makefile".
b. Change the values of the "INCLUDE" and "LIB" variables to point to the proper HDF libraries. Depending on how you have compiled the HDF libraries (and also whether or not
you plan on using the Blue Waters HDF modules), you will likely need to update all the variables INCLUDE1, INCLUDE2, LIB1, and LIB2.
If all of your HDF4 and HDF5 library and include files are under the same directory, it is sufficient to point just the "1" variables to your libraries.
c. Because every setup is slightly different, the Makefiles provided may not work as-is! If they don't, make sure that all compiler lines have visibility to the proper
HDF4 and HDF5 include files, and that the linker has proper visibility to the HDF4 and HDF5 libraries.
---LOCAL MACHINE---
The steps for setting up on a local machine are identical to Blue Waters with the following exceptions:
1. You don't have the Blue Waters modules at your disposal, so you MUST install the HDF4 and HDF5 libraries on your machine.
2. You must change the "CC" option in your Makefile to gcc, h4cc, or h5cc.
*************************************
*              RUNNING              *
*************************************
BATCH SCRIPT
A script has been provided to make compiling and running the program easy. In the basicFusion directory, the script called
"jobSubmit" in the bin directory contains all of the necessary commands to run the program. All you need to do is to run
this script. Here are the steps the script takes (you can also do these steps manually):
Step 1: Load modules
a. cd into bin
Execute the command
. ./loadModules
Be sure not to leave out the lone period before ./loadModules. This ensures that the modules are loaded into your current bash process. Leaving out the period would load them into a child process (which then immediately exits, effectively doing nothing to your current process).
OR run the commands manually:
a. module --verbose swap PrgEnv-cray PrgEnv-intel
b. module --verbose load szip/2.1
c. module --verbose load cray-hdf5/1.8.16
d. module --verbose load hdf4/4.2.10
Step 2: Compile the program
a. cd into root project directory
b. make
Step 3: Generate the inputFiles.txt file using the generateInput.sh script
a. cd into bin
b. ./generateInput.sh [path to 5 instrument directories]
Step 4: Set an environment variable to tell the program whether or not to unpack the data (a short sketch of how
such a flag is typically read in C follows these steps)
a. export TERRA_DATA_PACK=1 (if you want to keep the data packed, like the original HDF4 files)
b. unset TERRA_DATA_PACK (if you don't want the data to be packed)
Step 5: Submit the executable to the BW job queue (qsub)
a. qsub -v TERRA_DATA_PACK [path to batchscript.pbs]
Check the status of the job by executing the command "qstat | grep [your BW username]". A flag of "Q" means the job
is enqueued. A flag of "R" means the job is currently executing. A flag of "C" means the job has completed.
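How the executable consumes TERRA_DATA_PACK is not spelled out in this file; as a sketch only, an on/off environment
flag like this is typically read in C with getenv, along these lines:

    /* Sketch only: a common way to read an on/off flag such as TERRA_DATA_PACK in C.
       The actual check inside the fusion code may differ. */
    #include <stdio.h>
    #include <stdlib.h>

    int main( void )
    {
        const char* pack = getenv( "TERRA_DATA_PACK" );

        if ( pack != NULL )
            printf( "TERRA_DATA_PACK is set: keep the data packed, as in the original HDF4 files.\n" );
        else
            printf( "TERRA_DATA_PACK is unset: unpack the data.\n" );

        return 0;
    }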
INTERACTIVE
To run the program in interactive mode, follow these steps:
Step 1: Enter interactive mode
qsub -I -l nodes=1 -l walltime=03:00:00
You may enter any value for walltime. This is the time you will be allotted in interactive mode. It is given
as HH:MM:SS. Larger wall times mean you will wait in the queue for a longer period.
Step 2: Load modules
Execute the first step of the batch script instructions.
Recall: cd into the bin directory and run (with the lone period): . ./loadModules
Step 3: Compile
Enter into the root program directory and enter: make
Step 4:
Generate the inputFiles.txt file using the generateInput.sh script located in bin.
Give the script the path to the 5 instrument directories.
Step 5:
Run the following command:
aprun -n 1 "$EXE" "$OUTFILEDIR" "$FILELISTDIR" &> "$STD_STREAM_OUTFILE"
Replace all of the variables ($EXE, $OUTFILEDIR, etc.) with the values you set in your batchscript.pbs file.
Explanation of the aprun command, argument by argument:
-n [number]: number of processing elements (PEs) to launch
$EXE: path to the executable program
$OUTFILEDIR: argument to the executable giving the name of the output HDF5 file
$FILELISTDIR: argument to the executable giving the path to the inputFiles.txt file
&> $STD_STREAM_OUTFILE: send all program output to this text file
The program is now being executed. Use:
qstat | grep [your BW username]
to check the status of your job.
*****************
***ADDING CODE***
*****************
I have yet to decide how I want to handle multiple collaborators on this project. We will discuss this more in the technical
meeting.
*****************
***FLOW CHARTS***
*****************
High level API flow chart
MOPITT-------------------------------------------------------------->--|
                                                                        |
MISR--------------->--|                                                 v
MODIS-------------->--+--hdf.h--->hdf5.h-->convert to hdf5--->hdf5.h-----> output file
CERES-------------->--|
ASTER-------------->--|
PROGRAM STRUCTURE:
Note: Directionality indicates dependency. For instance, main.c is dependent on fileList.txt and outputFileName.
MOPITTmain.c would be dependent on main.c, MOPITTfiles and libTERRA.c etc.
                                   |--------->------MOPITTmain.c------>------|
                                   |                ^^^MOPITTfiles^^^        |
                                   |                ^^^libTERRA.c^^^         |
                                   |-------->-------CERESmain.c------->------|
                                   |                ^^^CERESfiles^^^         |
                                   |                ^^^libTERRA.c^^^         |
fileList.txt-->main.c-------->-----|-------->-------MODISmain.c------->------|---->outputFile
outputFileName----^                |                ^^^MODISfiles^^^         |
                                   |                ^^^libTERRA.c^^^         |
                                   |-------->-------ASTERmain.c------->------|
                                   |                ^^^ASTERfiles^^^         |
                                   |                ^^^libTERRA.c^^^         |
                                   |-------->-------MISRmain.c-------->------|
                                                    ^^^MISRfiles^^^
                                                    ^^^libTERRA.c^^^

     libTERRA.c
      ^      ^
      |      |
      |      |
   hdf5.h  mfhdf.h