Determine memory requirements for download #1

johnbradley · 2022-09-15T15:32:03Z

The current download logic may be loading the entire file into memory here:

Lines 73 to 74 in 7a67502

    
    response = data_api.get_datafile(file_id)
    f.write(response.content)

Determine if there is a streaming approach to reduce requirements.

johnbradley · 2022-09-15T17:18:04Z

The data_api.get_datafile(file_id) call downloads the entire file and keeps it in memory. There is an issue requesting the ability to add a streaming flag to avoid this unnecessary memory allocation: gdcc/pyDataverse#49

Replaces the pyDataverse DataAccessApi.get_datafile call with equivalent code that streams the response instead of fetching the entire file into memory. The StreamingDataAccessApi object can be removed once this pyDataverse issue is resolved: gdcc/pyDataverse#49. Fixes #1.
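For illustration, here is a minimal sketch of what a streaming download could look like. It is not the actual StreamingDataAccessApi code; it assumes the `requests` library, the Dataverse `/api/access/datafile/{id}` endpoint, and the `X-Dataverse-key` header, and the names `save_chunks`, `download_datafile`, and `CHUNK_SIZE` are hypothetical.

```python
from pathlib import Path
from typing import Iterable, Optional

CHUNK_SIZE = 1024 * 1024  # 1 MiB per chunk; an arbitrary choice for this sketch


def save_chunks(chunks: Iterable[bytes], dest: Path) -> int:
    """Write an iterable of byte chunks to dest and return the bytes written.

    Only one chunk is held in memory at a time.
    """
    written = 0
    with open(dest, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
                written += len(chunk)
    return written


def download_datafile(base_url: str, file_id: int, dest: Path,
                      api_token: Optional[str] = None) -> int:
    """Stream a Dataverse file to disk (hypothetical helper, assumed endpoint)."""
    import requests  # third-party; imported here so save_chunks works without it

    # Assumption: the standard Dataverse data access endpoint and token header.
    url = f"{base_url}/api/access/datafile/{file_id}"
    headers = {"X-Dataverse-key": api_token} if api_token else {}
    # stream=True defers reading the body until iter_content is consumed.
    with requests.get(url, headers=headers, stream=True, timeout=60) as response:
        response.raise_for_status()
        return save_chunks(response.iter_content(chunk_size=CHUNK_SIZE), dest)
```

With this shape, peak memory use should be roughly one chunk rather than the full file size, since `response.content` is never accessed.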

johnbradley · 2023-05-09T14:43:33Z

I am looking into using libcurl, via pycurl, to handle streaming uploads and downloads.
I had trouble using pycurl installed with pip in multiple environments: it would install fine but fail at runtime.
So using pycurl would likely require conda.

Uses the `curl` command-line tool to stream files when uploading and downloading, since pyDataverse keeps the entire file in memory for both operations. Fixes #1. Upgrades the GitHub checkout action to v3 to fix the Node.js warning for v2.
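As a rough sketch of the `curl` approach (not the code from the actual change), the command can be assembled as an argument list and run with `subprocess`. The helper names below are hypothetical, the URLs are supplied by the caller, and the `X-Dataverse-key` header is an assumption about how the API token is passed.

```python
import subprocess
from typing import List, Optional


def _auth_args(api_token: Optional[str]) -> List[str]:
    # Assumption: the token is sent in the X-Dataverse-key header.
    return ["--header", f"X-Dataverse-key: {api_token}"] if api_token else []


def build_curl_download_cmd(url: str, dest: str,
                            api_token: Optional[str] = None) -> List[str]:
    """Build a curl command that writes the response body straight to dest."""
    return ["curl", "--fail", "--silent", "--show-error", "--location",
            "--output", dest, *_auth_args(api_token), url]


def build_curl_upload_cmd(url: str, path: str,
                          api_token: Optional[str] = None) -> List[str]:
    """Build a curl command that sends path as a multipart form field."""
    # curl reads the file from disk while sending, so Python never holds it.
    return ["curl", "--fail", "--silent", "--show-error",
            "--form", f"file=@{path}", *_auth_args(api_token), url]


def run_curl(cmd: List[str]) -> None:
    """Run curl and raise CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names. One trade-off of this sketch is that a token given on the command line may be visible to other users in the process list.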
