ESA Cluster CLI moving from "CAIO" to "TAP" #84

Merged
merged 190 commits into from
Mar 21, 2022


thomas-nilsson-irfu
Member

Creating a PR here to keep track of the potentially breaking change of switching the Cluster download functions from using the old CAIO to the new TAP interface.

@JanKarlsson (or someone else) please have a look to see that this PR does not mess something up for the Cluster processing before I merge it to our devel (and include it in our next release of irfu-matlab).

Details are tracked in issue #75.

I have tried to verify that the download works using the simple examples:

irf;
tint = irf.tint('2005-01-01T05:00:00.000Z/2005-01-01T05:10:00.000Z');
dataSet = 'C3_CP_FGM_5VPS';
TT = caa_download(tint, 'fileinventory:C3_CP_FGM_5VPS');
TT = caa_download('list');
TT = caa_download('listdesc');
TT = caa_download('2005-01-01T05:00:00.000Z/2005-01-01T05:10:00.000Z', 'inventory');
caa_download('2005-01-01T05:00:00.000Z/2005-01-01T05:10:00.000Z', 'C3_CP_FGM_5VPS');
TT = caa_download(tint, 'list:C3_CP_FGM*');
caa_download(tint, 'C1_CP_EFW_L2_E', 'nowildcard');
caa_download(tint, 'C?_CP_FGM_5VPS');
TT = caa_download(['list:' dataSet]);
TTfileList = caa_download(['fileinventory:' dataSet]);
TT = caa_download(['inventory:' dataSet]);
caa_download(tint, dataSet, 'stream');
caa_download(tint, dataSet, 'schedule');
caa_meta('create');

But I have not been able to trigger one of the calls to caa_download() which we use, namely:

download_status=caa_download(TTRequest.UserData(iSubmitted).Downloadfile,'nolog',['downloadDirectory=' dataDir]);

This PR should solve #75 when merged.

thomas-nilsson-irfu and others added 30 commits June 2, 2021 16:09
IMPORTANT NOTE: The following CAIO features are not yet available in
TAP (according to https://www.cosmos.esa.int/web/csa/caiototap):
* Asynchronous product requests (data)
* Header requests (counts as data)
* Streaming data requests (data)

Note: The metadata and inventory requests appear to be drastically
changed, meaning this commit is broken (as it depends on listing the
available data). More commits will follow.

Related to issue #75, kept in a separate branch until ready for merge to
devel.
Note: Still not fixed, more commits to follow.
That is to say the following works:
`caa_download('2005-01-01T05:00:00.000Z/2005-01-01T05:10:00.000Z', 'C3_CP_FGM_5VPS')`

Some cleanup needed, and a lot more tests, particularly for the various
metadata downloads, as their interface has changed a bit more...
Test with download of simple "inventory".
`TT=caa_download('2005-01-01T05:00:00.000Z/2005-01-01T05:10:00.000Z','inventory')`

This results in a problem when parsing it, but at least it gets
downloaded now:
```matlab
Array indices must be positive integers or logical values.

Error in caa_download/construct_time_table (line 870)
        TT.UserData(numel(textLine{1})-1).dataset = [];
```

As before, more commits are to follow.
Now it breaks other textscans, as they do not have a header but use the
hard-coded index 2:end, making the time intervals different sizes... More
commits to follow...
Try to fix last commit (16b5464) for
other returnTimeTable values. Note this is untested.

CSA TAP metadata requests have changed a lot more, so now one must ask
for metadata from a specific (sql?) table. For the basic "inventory" it
is part of query "select ... FROM+csa.v_dataset_inventory+", but it is
not obvious what the new table names are...
For instance one of the examples on the transition website
https://www.cosmos.esa.int/web/csa/caiototap uses `FROM+csa.v_dataset`.

Related to issue #75.
Note: textscan for 'listdesc' with TAP does not work as some entries do
not have values for all requested parameters.
`TT = caa_download('listdesc');`
gives (after initial download):

`caalog(700:800)
ans =
    'es the telemetry mode: 0/1 stays for NM/BM telemetry."
     C3_CG_MULT_COMP_E_HIA_FGM_EFW_TS_PS,,,,
     C3_CG_'`

Here the first entry (ending with `telemetry."`) and the second
dataset_id (`C3_CG_MULT_COMP_E_HIA_FGM_EFW_TS_PS`) get properly
converted into textLine, but the non-existent "start_date" breaks
everything after this.
Decoding a CSV file response in which some entries contain "," is
not 100% self-evident (nor is it when entries contain CR and/or LF), making
some lines not match the expected pattern at all. Having some
entries quoted and some not is not optimal either...
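
One way to cope with quoted entries that contain commas is to split each record with a quote-aware pattern rather than on every ",". The sketch below only illustrates that idea and is not the code used in `caa_download`: the sample record is made up, and it assumes one complete record per string (a quoted CR/LF spanning several lines would have to be rejoined first).

```matlab
% Made-up record: the second entry is quoted and contains a comma,
% the third entry is empty and the last entry is a quoted time stamp.
record = 'C3_CP_X,"0/1 stays for NM/BM, telemetry.",,"2001-02-13T18:19:55Z"';

% Append a trailing comma so that every entry (also a final empty one)
% is terminated by a separator, then match either a quoted entry (with ""
% as an escaped quote) or an unquoted run of non-comma characters.
pattern = '("(?:[^"]|"")*"|[^,]*),';
tokens  = regexp([record ','], pattern, 'tokens');
entries = cellfun(@(t) t{1}, tokens, 'UniformOutput', false);

% Strip the surrounding quotes and un-escape doubled quotes.
entries = regexprep(entries, '^"(.*)"$', '$1');
entries = strrep(entries, '""', '"');

fprintf('%d entries\n', numel(entries));
```

Empty entries come back as empty strings, so a later time conversion can test `isempty` on them before converting.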
Solves part of previous commit, but tStart&tEnd conversion does not yet
work as some entries do not have any values for these.
deal with empty time strings (time stamps, if present, are all quoted).
With this commit, `TT = caa_download('list');` appears to work with the
new TAP interface. More commits are still to follow.
The Cluster website does not describe a "csa.v_file" table but it does
appear to exist (I got a response when asking for only
logical_file_id WHERE logical_file_id LIKE 'C3_CP_FGM_5VPS', but was
unsuccessful when asking for limited start_date and end_date)...
Some reverse engineering (aka 'trial and error testing') is most likely
needed in order to get this working as before.

Also remove a deprecated "finnished" which was deprecated in favour of
"finished" back in 2011 (in commit 7936f4a) and has been marked for
deletion ever since. I think just shy of a decade is sufficiently long
time.. :)
The following now appears to work
```
tint = irf.tint('2005-01-01T05:00:00.000Z/2005-01-01T05:10:00.000Z');
TT = caa_download(tint, 'fileinventory:C3_CP_FGM_5VPS');
```
or at least it gives the resulting
```
TT.UserData.filename =
'C3_CP_FGM_5VPS__20041230_023322_20050101_113835_V10'
```

More commits are still to follow (after some more tests to see if
things still work as before with the old interface).
The shorthand link http://goo.gl/VkkoI points to
http://caa.estec.esa.int/documents/CAA-EST-CMDLINE-0015.pdf but this
server no longer exists/replies. The documentation itself is perhaps a
bit outdated, but now the link goes to the intended document at least..
As stated in commit 4662daa4ead, the old flag 'CAA' is deprecated,
making the flag 'CSA' superfluous.
metadata
'list' & 'listdesc': csa.v_dataset with columns
      'start_date' and 'end_date'.
'inventory': csa.v_dataset_inventory with columns
      'start_time' and 'end_time'.
'fileinventory': csa.v_file with columns
      'file_start_date' and 'file_end_date'.

Related to issue 75.
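
The mapping above can be kept in one lookup structure, so that each request type picks its table and time columns from a single place. This is a minimal sketch under assumptions: the variable names are hypothetical, and the generated query text is illustrative only (the exact ADQL accepted by the CSA TAP server is not verified here).

```matlab
% Table and time-column names exactly as listed in the commit message.
dataset = struct('table', 'csa.v_dataset', ...
                 'tStart', 'start_date', 'tEnd', 'end_date');
tapTables = struct( ...
  'list',          dataset, ...
  'listdesc',      dataset, ...
  'inventory',     struct('table', 'csa.v_dataset_inventory', ...
                          'tStart', 'start_time', 'tEnd', 'end_time'), ...
  'fileinventory', struct('table', 'csa.v_file', ...
                          'tStart', 'file_start_date', 'tEnd', 'file_end_date'));

% Look up the entry for one request type via a dynamic field name.
requestType = 'inventory';
entry = tapTables.(requestType);

% Illustrative query text only, built from the looked-up names.
query = sprintf('SELECT %s,%s FROM %s', entry.tStart, entry.tEnd, entry.table);
disp(query);
```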
Fix `TT=caa_download('inventory:dataSet')` and
    `TT=caa_download('fileinventory:dataSet')`, as part of issue 75.
```matlab
>> if ~strfind('https://example.com/foo.bar.zip','.gz'), disp('NON .tar.gz'); else disp('.tar.gz'); end
.tar.gz
>> if ~strfind('https://example.com/foo.bar.tar.gz','.gz'), disp('NON .tar.gz'); else disp('.tar.gz'); end
.tar.gz
```
(tested on Matlab R2021a, as well as R2019b as strfind had changed some
in R2020b). It was originally testing for ".gz" or ".zip" (CSA or CAA)
but was changed in 4662daa.
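
The output above follows from what `strfind` returns: an index vector when the pattern is found and `[]` when it is not. Negating an index gives `false` and negating `[]` gives `[]`, and `if` treats both as false, so the first branch can never run. A minimal sketch of a test that does distinguish the two cases, using the same example URLs (on newer Matlab releases `contains` or `endsWith` would express the same thing more directly):

```matlab
urls = {'https://example.com/foo.bar.zip', ...
        'https://example.com/foo.bar.tar.gz'};
isGz = false(size(urls));
for k = 1:numel(urls)
  % isempty() maps "no match" ([]) to true and any index vector to false,
  % so the negation is a proper scalar logical.
  isGz(k) = ~isempty(strfind(urls{k}, '.gz'));
  if isGz(k)
    disp('.tar.gz');
  else
    disp('NON .tar.gz');
  end
end
```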
Similar to 1fa72e0, a corresponding
file appears to be found at https://caa.esac.esa.int instead.
as per CSA website support now also possible for asynchronous data
request (still to be tested).

Added here are only 'streaming' data requests, which appear to work,
at least on my laptop with Matlab R2021a.
at least a corresponding cdf file was downloaded.
TimeInterval should be returned as array (Nx2) not cells {Nx2}.

Bug was introduced as part of work moving to Cluster TAP (dealing with
sometimes empty responses) but impact was limited as it was only ever
present in special issue branch.
(It does appear to work regardless of upper/lower case, but the example
page uses only upper case for 'RETRIEVAL_TYPE' and lower case for 'product').
Such as:
```matlab
TT = caa_download(tint, 'fileinventory:C3_CP_FGM_5VPS');
```
Note: As pointed out on ESA Cluster page,
https://www.cosmos.esa.int/web/csa-guide/data-requests#HeaderRequests,
the new TAP interface takes a *long* time for header requests.. But it
appears to work at least..
```matlab
caa_meta('create');
```
succeeded in getting meta data and created a new indexCaaMeta_v3.mat
with 2545 entries.

Related to issue #75.
for the "streaming" requests, this is in stark contrast to the basic
gzipped files provided in the old CAIO interface.
One example request:
```matlab
tint = [982088395, 982088530];
datasetName = 'C1_CP_AUX_SPIN_TIME';
caa_download(tint, datasetName, 'stream');
```

which in TAP returns file:
```sh
$ file delme.cef.gz
delme.cef.gz: gzip compressed data, from FAT filesystem (MS-DOS, OS/2, NT), original size modulo 2^32 17920
```
which when gunzip'ed is:
```sh
$ file delme.cef
delme.cef: POSIX tar archive (GNU)
```
which contains:
```sh
$ tar -tf delme.cef
CSA_Download_20211201_1034/C1_CP_AUX_SPIN_TIME/C1_CP_AUX_SPIN_TIME__20010213_181955_20010213_182210_V141106.cef
```

The old CAIO interface however had:
```sh
$ file delme.cef.gz
delme.cef.gz: gzip compressed data, from FAT filesystem (MS-DOS, OS/2, NT), original size modulo 2^32 15164
```
and when gunzip'ed:
```sh
$ file delme.cef
delme.cef: ASCII text
```

Related to issue #75.
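
Since a TAP "stream" response can be a gzipped tar archive while CAIO returned gzipped plain text, the caller has to inspect the gunzip'ed file before deciding whether to `untar` it. One possible check (an assumption, not necessarily what `caa_download` does) is the `ustar` magic string that POSIX tar headers carry at byte offset 257. The two files below are synthetic stand-ins written to a temporary directory, not real CSA downloads.

```matlab
% Create two stand-in files: plain text, and a 512 byte block carrying
% the "ustar" magic at offset 257 like a POSIX tar header does.
tmpDir = tempname; mkdir(tmpDir);
plainFile   = fullfile(tmpDir, 'plain.cef');
tarLikeFile = fullfile(tmpDir, 'tarlike.cef');
fid = fopen(plainFile, 'w');
fprintf(fid, 'FILE_NAME = "example.cef"\n');
fclose(fid);
fid = fopen(tarLikeFile, 'w');
fwrite(fid, [zeros(1, 257), double('ustar'), zeros(1, 250)], 'uint8');
fclose(fid);

files = {plainFile, tarLikeFile};
isTar = false(size(files));
for k = 1:numel(files)
  fid = fopen(files{k}, 'r');
  magic = '';
  if fseek(fid, 257, 'bof') == 0     % fails if the file is too short
    magic = fread(fid, 5, '*char')'; % row vector of up to 5 characters
  end
  fclose(fid);
  isTar(k) = strcmp(magic, 'ustar');
end
disp(isTar);
rmdir(tmpDir, 's'); % remove the temporary stand-in files
```

A file flagged this way would then be passed to `untar` to reach the actual `.cef` file, while a plain-text file could be used directly.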
Re-apply commits in 'iss75_Cluster_TAP' branch on top of latest 'devel'
to ease evaluation of these changes and their impact on Cluster EFW
processing.

https://git-scm.com/docs/git-rebase
@thomas-nilsson-irfu
Member Author

To those of you who possibly just got an e-mail about this Pull request, feel free to simply disregard it!

The pull request here (#84) had gotten so old that the code related to Cluster processing had diverged a bit from the one in our devel branch, so I had to re-apply all the changes in the special iss75_Cluster_TAP branch on top of the latest devel branch changes in order for @JanKarlsson to properly test the code and its possible impact on Cluster processing.

But when re-applying the changes here on top of your old commits in the devel branch GitHub simply interprets your commits as part of the pull request here and adds all of you to the conversation...

1. `downloadedFile` did not have any default value. This caused trouble
because the new TAP interface returns a 404 error on some requests, which
is handled by the `try,catch`, but when this second output argument is
then missing it results in a new error that is not always handled
correctly upstream in the code.

2. The split `tmpGetRequest` cell does not always have 10 or 12
elements. It is better to simply print all of the existing elements to
the log than to use hard-coded positions (the main data product requested
can have different positions depending on exactly how it was requested).

These two bugs were found earlier this week by @JanKarlsson when
testing the new ESA Cluster TAP interface as part of solving issue #75,
which must now be completed within less than a month.
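
Both fixes can be sketched at script level as follows. Everything here is a hypothetical stand-in rather than the actual `caa_download` code: `failingRequest` simulates a request that ends in a 404, the URL and its parameters are made up, and splitting on `?` and `&` is an assumption about how the request string is cut up.

```matlab
% Hypothetical stand-ins: a request that always fails like an HTTP 404,
% and a made-up request URL.
failingRequest = @(url) error('demo:http404', 'HTTP 404 for %s', url);
requestUrl = 'https://example.com/data?RETRIEVAL_TYPE=product&DATASET_ID=C3_CP_FGM_5VPS';

% 1. Give every output a default before the try block, so that the
%    caller always gets a defined value back, also after a failure.
downloadStatus = 0;
downloadedFile = '';
try
  downloadedFile = failingRequest(requestUrl);
  downloadStatus = 1;
catch err
  fprintf('Request failed: %s\n', err.message);
end

% 2. Log all parts of the split request instead of fixed positions,
%    so the log line does not depend on how many parts there are.
tmpGetRequest = strsplit(requestUrl, {'?', '&'});
fprintf('Requested: %s\n', strjoin(tmpGetRequest, ' | '));
```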
Andris' old defunct password is now fully removed from irfu-matlab; also
add a status output check. Problem detected in tests by @JanKarlsson.
@thomas-nilsson-irfu thomas-nilsson-irfu merged commit 3fce852 into devel Mar 21, 2022
@thomas-nilsson-irfu thomas-nilsson-irfu deleted the iss75_Cluster_TAP branch March 21, 2022 14:33