ESA Cluster CLI moving from "CAIO" to "TAP" #84
IMPORTANT NOTE: The following CAIO features are not yet available in TAP (according to https://www.cosmos.esa.int/web/csa/caiototap):
* Asynchronous product requests (data)
* Header requests (counts as data)
* Streaming data requests (data)

Note: The metadata and inventory requests appear to have changed drastically, meaning this commit is broken (as it depends on listing the available data). More commits will follow. Related to issue #75, kept in a separate branch until ready for merge to devel.
Note: Still not fixed, more commits to follow.
That is to say, the following works: `caa_download('2005-01-01T05:00:00.000Z/2005-01-01T05:10:00.000Z', 'C3_CP_FGM_5VPS')` Some cleanup is needed, and a lot more tests, particularly for the various metadata downloads, as their interface has changed a bit more.
Test with download of a simple "inventory". `TT=caa_download('2005-01-01T05:00:00.000Z/2005-01-01T05:10:00.000Z','inventory')` This results in a problem when parsing the response, but at least it gets downloaded now.
```matlab
Array indices must be positive integers or logical values.
Error in caa_download/construct_time_table (line 870)
TT.UserData(numel(textLine{1})-1).dataset = [];
```
As before, more commits are to follow.
Now it breaks other textscans, as they do not have a header but use the hard-coded index 2:end, making the time intervals different sizes. More commits to follow.
Try to fix last commit (16b5464) for other returnTimeTable values. Note this is untested. CSA TAP metadata requests have changed a lot more, so now one must ask for metadata from a specific (sql?) table. For the basic "inventory" it is part of query "select ... FROM+csa.v_dataset_inventory+", but it is not obvious what the new table names are... For instance one of the examples on the transition website https://www.cosmos.esa.int/web/csa/caiototap uses `FROM+csa.v_dataset`. Related to issue #75.
Note: textscan for 'listdesc' with TAP does not work, as some entries do not have values for all requested parameters. `TT = caa_download('listdesc');` gives (after initial download): `caalog(700:800) ans = 'es the telemetry mode: 0/1 stays for NM/BM telemetry." C3_CG_MULT_COMP_E_HIA_FGM_EFW_TS_PS,,,, C3_CG_'` Here the first entry (ending with `telemetry."`) and the second dataset_id (`C3_CG_MULT_COMP_E_HIA_FGM_EFW_TS_PS`) get properly converted into textLine, but the non-existent "start_date" breaks everything after this.
Decoding a CSV file response in which some entries contain "," is not 100% self-evident (nor is it when entries contain CR and/or LF), so some lines do not match the expected pattern at all. Having some entries quoted and some not is not optimal either. This solves part of the previous commit, but the tStart & tEnd conversion does not yet work, as some entries do not have any values for these.
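To illustrate why a plain split on "," fails for such responses, here is a minimal sketch of a quote-aware field splitter. It is not the code used in `caa_download`; the record is made up for illustration, and the variable names are hypothetical.

```matlab
% Made-up CSV record: one quoted field contains a comma, one field is empty.
record = 'DATASET_A,"a title, with a comma",,"2005-01-01T05:00:00Z"';

fields = {};        % collected fields
current = '';       % field being built
inQuotes = false;   % true while inside a double-quoted field
k = 1;
n = numel(record);
while k <= n
  c = record(k);
  if inQuotes
    if c == '"'
      if k < n && record(k+1) == '"'
        current(end+1) = '"';   % doubled quote is an escaped quote
        k = k + 1;
      else
        inQuotes = false;       % closing quote
      end
    else
      current(end+1) = c;       % commas (and CR/LF) are kept inside quotes
    end
  elseif c == '"'
    inQuotes = true;            % opening quote
  elseif c == ','
    fields{end+1} = current;    % unquoted comma ends the field
    current = '';
  else
    current(end+1) = c;
  end
  k = k + 1;
end
fields{end+1} = current;        % last field has no trailing comma
```

With this approach the empty third field survives as an empty string, so a later time conversion can test `isempty` before parsing. Splitting the full response into records would need the same quote tracking, since a quoted entry may span several lines.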
deal with empty time strings (time stamps, if present, are all quoted).
With this commit, `TT = caa_download('list');` appears to work with the new TAP interface. More commits are still to follow.
The Cluster website does not describe a "csa.v_file" table but it does appear to exist (I got a response when asking for only logical_file_id WHERE logical_file_id LIKE 'C3_CP_FGM_5VPS', but was unsuccessful when asking for limited start_date and end_date)... Some reverse engineering (aka 'trial and error testing') is most likely needed in order to get this working as before. Also remove a deprecated "finnished", which was deprecated in favour of "finished" back in 2011 (in commit 7936f4a) and has been marked for deletion ever since. I think just shy of a decade is a sufficiently long time.. :)
The following now appears to work
```
tint = irf.tint('2005-01-01T05:00:00.000Z/2005-01-01T05:10:00.000Z');
TT = caa_download(tint, 'fileinventory:C3_CP_FGM_5VPS');
```
or at least it gives the resulting
```
TT.UserData.filename = 'C3_CP_FGM_5VPS__20041230_023322_20050101_113835_V10'
```
More commits are still to follow (after some more tests to see if things still work as before with the old interface).
The shorthand link http://goo.gl/VkkoI points to http://caa.estec.esa.int/documents/CAA-EST-CMDLINE-0015.pdf but this server no longer exists/replies. The documentation itself is perhaps a bit outdated, but now the link goes to the intended document at least.
As stated in commit 4662daa4ead, the old flag 'CAA' is deprecated, making the flag 'CSA' superfluous.
metadata
* 'list' & 'listdesc': csa.v_dataset with columns 'start_date' and 'end_date'.
* 'inventory': csa.v_dataset_inventory with columns 'start_time' and 'end_time'.
* 'fileinventory': csa.v_file with columns 'file_start_date' and 'file_end_date'.

Related to issue 75.
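The mapping above can be sketched as a small lookup structure. This is only an illustration of the table and column names listed in the commit message; the struct layout, the variable names and the shape of the query are assumptions, not the actual `caa_download` implementation.

```matlab
% Lookup of TAP table and time columns per metadata request type
% (table and column names taken from the commit message above).
tapTable = struct();
tapTable.list          = struct('table', 'csa.v_dataset', 'tStart', 'start_date', 'tEnd', 'end_date');
tapTable.listdesc      = tapTable.list;
tapTable.inventory     = struct('table', 'csa.v_dataset_inventory', 'tStart', 'start_time', 'tEnd', 'end_time');
tapTable.fileinventory = struct('table', 'csa.v_file', 'tStart', 'file_start_date', 'tEnd', 'file_end_date');

% Assemble an illustrative query for one request type.
request = 'fileinventory';
info = tapTable.(request);
query = sprintf('SELECT %s,%s FROM %s', info.tStart, info.tEnd, info.table);

% Spaces written as '+', as in the "FROM+csa.v_dataset" example quoted earlier.
queryEncoded = strrep(query, ' ', '+');
```

Keeping the names in one place means a future rename on the CSA side would only need a change in this table rather than in each request branch.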
Fix `TT=caa_download('inventory:dataSet')` and `TT=caa_download('fileinventory:dataSet')`, as part of issue 75.
```matlab
>> if ~strfind('https://example.com/foo.bar.zip','.gz'), disp('NON .tar.gz'); else disp('.tar.gz'); end
.tar.gz
>> if ~strfind('https://example.com/foo.bar.tar.gz','.gz'), disp('NON .tar.gz'); else disp('.tar.gz'); end
.tar.gz
```
(Tested on Matlab R2021a as well as R2019b, as strfind had changed somewhat in R2020b.) It was originally testing for ".gz" or ".zip" (CSA or CAA) but was changed in 4662daa.
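Both URLs in the transcript above end up in the `else` branch because `strfind` returns `[]` when nothing matches (and `~[]` is still empty, which `if` treats as false), while a match returns a positive index whose negation is false. A minimal sketch of a test that does distinguish the two cases is shown here; it is one possible form, not necessarily the fix applied in the commit.

```matlab
urls = {'https://example.com/foo.bar.zip', ...
        'https://example.com/foo.bar.tar.gz'};

isGz = false(size(urls));
for k = 1:numel(urls)
  % Test explicitly for an empty result instead of negating the index vector.
  isGz(k) = ~isempty(strfind(urls{k}, '.gz'));
end

for k = 1:numel(urls)
  if isGz(k)
    disp('.tar.gz');
  else
    disp('NON .tar.gz');
  end
end
```

This still matches ".gz" anywhere in the string; on R2016b or later, `endsWith(url, '.gz')` would be a stricter check.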
Similar to 1fa72e0, a corresponding file appears to be found at https://caa.esac.esa.int instead.
As per the CSA website, asynchronous data requests are now also supported (still to be tested). Added here are only 'streaming' data requests, which appear to work, at least on my laptop with Matlab R2021a.
at least a corresponding cdf file was downloaded.
TimeInterval should be returned as an array (Nx2), not cells {Nx2}. The bug was introduced as part of the work moving to Cluster TAP (dealing with sometimes empty responses), but its impact was limited as it was only ever present in the special issue branch.
(It does appear to work regardless of upper/lower case, but the example page uses only upper case for 'RETRIEVAL_TYPE' and lower case for 'product').
Such as: ```matlab TT = caa_download(tint, 'fileinventory:C3_CP_FGM_5VPS'); ```
Note: As pointed out on ESA Cluster page, https://www.cosmos.esa.int/web/csa-guide/data-requests#HeaderRequests, the new TAP interface takes a *long* time for header requests.. But it appears to work at least.. ```matlab caa_meta('create'); ``` succeeded in getting meta data and created a new indexCaaMeta_v3.mat with 2545 entries. Related to issue #75.
for the "streaming" requests, this is in stark contrast to the basic gzipped files provided in the old CAIO interface. One example request:
```matlab
tint = [982088395, 982088530];
datasetName = 'C1_CP_AUX_SPIN_TIME';
caa_download(tint, datasetName, 'stream');
```
which in TAP returns the file:
```sh
$ file delme.cef.gz
delme.cef.gz: gzip compressed data, from FAT filesystem (MS-DOS, OS/2, NT), original size modulo 2^32 17920
```
which when gunzip'ed is:
```sh
$ file delme.cef
delme.cef: POSIX tar archive (GNU)
```
which contains:
```sh
$ tar -tf delme.cef
CSA_Download_20211201_1034/C1_CP_AUX_SPIN_TIME/C1_CP_AUX_SPIN_TIME__20010213_181955_20010213_182210_V141106.cef
```
The old CAIO interface however had:
```sh
$ file delme.cef.gz
delme.cef.gz: gzip compressed data, from FAT filesystem (MS-DOS, OS/2, NT), original size modulo 2^32 15164
```
and when gunzip'ed:
```sh
$ file delme.cef
delme.cef: ASCII text
```
Related to issue #75.
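One way to tell the two cases apart after gunzip is to look for the tar magic string "ustar", which tar archives of the ustar/GNU kind carry at byte offset 257 of the first header block. The sketch below is an assumption about how such a check could look, not what `caa_download` does; it writes a small stand-in file so that it can be run without a real download.

```matlab
% Create a stand-in file with a tar-like 512-byte header (illustration only).
tmpFile = [tempname, '.cef'];
header = zeros(1, 512, 'uint8');
header(258:262) = uint8('ustar');   % magic at 0-based offset 257
fid = fopen(tmpFile, 'w');
fwrite(fid, header, 'uint8');
fclose(fid);

% Read the start of the file and test for the tar magic.
fid = fopen(tmpFile, 'r');
bytes = fread(fid, 262, '*uint8')';
fclose(fid);
isTar = numel(bytes) >= 262 && isequal(char(bytes(258:262)), 'ustar');

if isTar
  disp('tar archive: extract the .cef, for instance with untar()');
else
  disp('plain file: use as is');
end
delete(tmpFile);
```

For a real TAP stream response, `untar` would then unpack the `CSA_Download_...` directory containing the actual .cef file, whereas an old CAIO response could be read directly.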
Re-apply commits in 'iss75_Cluster_TAP' branch on top of latest 'devel' to ease evaluation of these changes and their impact on Cluster EFW processing. https://git-scm.com/docs/git-rebase
To those of you who possibly just got an e-mail about this pull request, feel free to simply disregard it! The pull request here (#84) had gotten so old that the code related to Cluster processing had diverged a bit from the one in our But when re-applying the changes here on top of your old commits in the |
1. `downloadedFile` did not have any default value. This caused trouble because the new TAP interface returns a 404 error on some requests, which is handled by the `try,catch`, but when this second output argument is then missing it results in a new error that is not always handled correctly upstream in the code.
2. The split `tmpGetRequest` cell does not always have 10 or 12 elements. It is better to simply print all of the existing elements to the log than to use hard-coded positions (the main data product requested can have different positions depending on exactly how it was requested).

These two bugs were found earlier this week by @JanKarlsson when testing the new ESA Cluster TAP interface as part of solving issue #75, which must now be completed within less than a month.
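The two fixes can be sketched roughly as follows. The request string, the way it is split and the simulated error are all made up for illustration (only `RETRIEVAL_TYPE=product` is taken from an earlier commit message); the real code in `caa_download` may look different.

```matlab
% 1. Give the second output a default so it always exists, even after a failure.
downloadedFile = '';
try
  % Simulated failure standing in for a 404 response from the server.
  error('demo:notFound', 'HTTP 404 (simulated)');
catch err
  disp(['Download failed: ', err.message]);
end

% 2. Log every element of the split request instead of hard-coded positions.
% Hypothetical request string; PARAM_X is a placeholder, not a real CSA parameter.
requestUrl = 'https://tap.example.invalid/data?RETRIEVAL_TYPE=product&PARAM_X=valueX';
tmpGetRequest = strsplit(requestUrl, {'?', '&'});
logLine = strjoin(tmpGetRequest, ' | ');
disp(logLine);
```

Callers can then test `isempty(downloadedFile)` rather than relying on the output having been assigned, and the log line stays valid however many elements the request happens to have.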
Andris' old defunct password is now fully removed from irfu-matlab; also add a status output check. Problem detected in tests by @JanKarlsson.
Creating a PR here to keep track of the potentially breaking change of switching the Cluster download functions from using the old `CAIO` to the new `TAP` interface.
@JanKarlsson (or someone else) please have a look to see that this PR does not mess something up for the Cluster processing before I merge it to our `devel` (and include it in our next release of irfu-matlab).
Details are tracked in issue #75.
I have tried to verify that the download works using the simple examples:
But I have not been able to trigger one of the calls to `caa_download()` which we use, namely: irfu-matlab/+local/caa_download.m, line 449 in a29753c
This PR should solve #75 when merged.