
fail to "vectorize" velocity_from_position #68

Open
selipot opened this issue Jan 7, 2023 · 22 comments
Open

fail to "vectorize" velocity_from_position #68

selipot opened this issue Jan 7, 2023 · 22 comments
Assignees
Labels
bug Something isn't working question Further information is requested

Comments

@selipot
Member

selipot commented Jan 7, 2023

I am attempting to apply velocity_from_position to xarray.DataArrays of lon, lat, and time. I have been following a tutorial for a similar situation. With the following ds Dataset:

ds.info()
xarray.Dataset {
dimensions:
	trajectory = 593297 ;
	obs = 1440 ;

variables:
	float64 time(trajectory, obs) ;
	float32 lat(trajectory, obs) ;
	float32 lon(trajectory, obs) ;
	int32 obs(obs) ;
	int64 trajectory(trajectory) ;
}

I can easily do:

u,v = velocity_from_position(ds.lon.isel(trajectory=0),ds.lat.isel(trajectory=0),ds.time.isel(trajectory=0))

or

u2,v2 = xr.apply_ufunc(
    velocity_from_position,
    ds.lon.isel(trajectory=0),
    ds.lat.isel(trajectory=0),
    ds.time.isel(trajectory=0),
    input_core_dims=[["obs"], ["obs"], ["obs"]],
    output_core_dims=[["obs"], ["obs"]],
    dask="allowed",
)

but the following fails:

u2,v2 = xr.apply_ufunc(
    velocity_from_position,  # first the function
    ds.lon.isel(trajectory=slice(0,10)),
    ds.lat.isel(trajectory=slice(0,10)),
    ds.time.isel(trajectory=slice(0,10)),
    input_core_dims=[["obs"], ["obs"], ["obs"]],
    output_core_dims=[["obs"],["obs"]],
    dask="allowed",
    vectorize=True,
)

and the bottom line of the error is

File ~/miniconda3/envs/research/lib/python3.10/site-packages/clouddrift/analysis.py:65, in velocity_from_position(x, y, time, coord_system, difference_scheme)
     57 # Compute dx, dy, and dt
     58 if difference_scheme == "forward":
     59 
     60     # All values except the ending boundary value are computed using the
   (...)
     63 
     64     # Time
---> 65     dt[:-1] = np.diff(time)
     66     dt[-1] = dt[-2]
     68     # Space

ValueError: could not broadcast input array from shape (10,1439) into shape (9,1440)

So yes, I see the error, but I don't understand whether the fix is to call apply_ufunc differently or to make velocity_from_position more flexible.

@milancurcic
Member

It should work if you apply the function to one trajectory at a time. I think the function could be made to correctly handle positions and times as 2-d arrays.
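For reference, a minimal per-trajectory loop (a sketch only, assuming the ds from the issue description and plain NumPy outputs) could look like:

import numpy as np

# preallocate one row of output per trajectory, then loop (hypothetical sketch)
u = np.empty((ds.sizes["trajectory"], ds.sizes["obs"]))
v = np.empty_like(u)
for i in range(ds.sizes["trajectory"]):
    u[i], v[i] = velocity_from_position(ds.lon[i], ds.lat[i], ds.time[i])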

@selipot
Member Author

selipot commented Jan 7, 2023

Indeed, it does work as shown above, but it should be able to be fed to xarray.apply_ufunc. So I do not know if it is an issue with the function or with my code. Maybe @philippemiron has some insight?

@milancurcic
Member

I haven't used apply_ufunc before, but your code may very well be correct. velocity_from_position definitely does not work on 2-d arrays in its current form. We could make it so; we just need to discuss and decide whether there's a good use case for that, considering that Lagrangian data will be ragged arrays most of the time.

@selipot
Member Author

selipot commented Jan 7, 2023

I am not sure that's the solution. The example linked above examines the case of a function that takes 1-d arrays as input, like our function.

@milancurcic
Member

     64     # Time
---> 65     dt[:-1] = np.diff(time)
     66     dt[-1] = dt[-2]
     68     # Space

ValueError: could not broadcast input array from shape (10,1439) into shape (9,1440)

The issue here is that np.diff() differentiates along the last (fastest-varying) axis, whereas the slice syntax (i.e. dt[:-1]) slices along the first (slowest-varying) axis. In the case of 1-d array inputs, the first and last axes are the same, but in the case of 2-d array inputs we get a shape mismatch.
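
A minimal sketch of a fix for the 2-d case (assuming time is a numeric array with the obs axis last) is to difference and slice along the same axis:

# sketch only: difference and slice along the last axis so that
# 1-d and 2-d inputs are handled consistently
dt = np.empty_like(time)
dt[..., :-1] = np.diff(time, axis=-1)
dt[..., -1] = dt[..., -2]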

@selipot
Member Author

selipot commented Jan 8, 2023

To be more useful and widely applicable, the function should take an argument indicating the axis along which to take the time derivative, so that it works on arrays of any dimension. In addition, it should also be able to handle ragged arrays of positions.
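
Something along these lines (a hypothetical sketch, not the current implementation) could handle the forward-difference step for an arbitrary axis:

import numpy as np

def forward_difference(a: np.ndarray, axis: int = -1) -> np.ndarray:
    # move the differencing axis last, difference there, then move it back
    a = np.moveaxis(a, axis, -1)
    d = np.empty_like(a)
    d[..., :-1] = np.diff(a, axis=-1)
    d[..., -1] = d[..., -2]   # repeat the last difference at the boundary
    return np.moveaxis(d, -1, axis)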

@philippemiron
Contributor

It's been a few months since I played with apply_ufunc. If I remember correctly from the EarthCube Notebook, the functions are applied by chunk; I'm not sure it would work as expected using slice...

I believe the easiest would be to use Awkward Array and perform a simple loop:

# with lon, lat, time as ak.Array; Awkward Arrays are immutable, so collect
# per-trajectory results in lists and build the output arrays at the end
results = [velocity_from_position(lon[i], lat[i], time[i]) for i in range(len(lon))]
u = ak.Array([r[0] for r in results])
v = ak.Array([r[1] for r in results])

@selipot
Member Author

selipot commented Jan 9, 2023

I'll try that. But will it be fast and use dask/parallel computing? As an example, I have 5.5M 60-day-long hourly trajectories ... so perhaps I should chunk per trajectory? In general, we need the analysis functions of clouddrift to be "vectorizable" in a seamless way for users of xarray and awkward.

@milancurcic
Member

I haven't used Dask, but this looks like it should do the job: https://examples.dask.org/applications/embarrassingly-parallel.html. On its own, I don't think Philippe's for-loop snippet will run in parallel.
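
Following that example, the loop could be wrapped in dask.delayed, roughly like this (an untested sketch, assuming lon, lat, and time are list-like collections of per-trajectory arrays):

import dask

# build one delayed task per trajectory, then compute them all at once
tasks = [
    dask.delayed(velocity_from_position)(lon[i], lat[i], time[i])
    for i in range(len(lon))
]
results = dask.compute(*tasks)   # tuple of (u, v) pairs, one per trajectory
u = [r[0] for r in results]
v = [r[1] for r in results]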

@selipot
Member Author

selipot commented Jan 10, 2023

More developments: I found that the following works if the input DataArrays are not chunked:

u,v = xr.apply_ufunc(
    velocity_from_position,  # first the function
    ds.lon.isel(trajectory=slice(0,10000)),  # now arguments in the order expected by 'velocity_from_position'
    ds.lat.isel(trajectory=slice(0,10000)),
    ds.time.isel(trajectory=slice(0,10000)),
    #input_core_dims=[[], ['time']],
    input_core_dims=[["obs"],["obs"], ["obs"]],
    output_core_dims=[["obs"], ["obs"]],
    vectorize=True,
)

but if the DataArrays are chunked, then we need the following (note the dask="allowed" option and the .load() call):

u2,v2 = xr.apply_ufunc(
    velocity_from_position,  # first the function
    dsc.lon.isel(trajectory=slice(0,10000)).load(),  # now arguments in the order expected by 'velocity_from_position'
    dsc.lat.isel(trajectory=slice(0,10000)).load(),
    dsc.time.isel(trajectory=slice(0,10000)).load(),
    #input_core_dims=[[], ['time']],
    input_core_dims=[["obs"],["obs"], ["obs"]],
    output_core_dims=[["obs"], ["obs"]],
    dask="allowed",
    vectorize=True,
)

Does anyone understand why the .load() call is needed?

Note that the above applies to a subset of my example dataset, 10000 out of 593297 trajectories. If I try to apply this to the entire dataset, the first option succeeds on my desktop, but the chunked option fails with a local cluster, which I find odd ...
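
One possibility that might avoid the .load() calls (untested here, and assuming the obs dimension is not split across chunks, since core dimensions cannot be chunked) is to let xarray handle the dask chunks directly with dask="parallelized", which also requires output_dtypes:

u3, v3 = xr.apply_ufunc(
    velocity_from_position,
    dsc.lon, dsc.lat, dsc.time,
    input_core_dims=[["obs"], ["obs"], ["obs"]],
    output_core_dims=[["obs"], ["obs"]],
    dask="parallelized",                 # apply the function chunk by chunk
    output_dtypes=[np.float64, np.float64],
    vectorize=True,
)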

@selipot
Member Author

selipot commented Jan 12, 2023

Note that the chunked case above works only if the size of the chunks in the trajectory dimension is 1, that is, if the function is fed 1-d array arguments, I think. So the way forward appears to be to modify our function to handle n-d arrays.

@milancurcic
Member

This is now implemented in the main branch. Give it a try. And I will try running directly with Dask (without the Xarray wrapper) and see if that produces the expected result.

@milancurcic
Member

FWIW, I ran velocity_from_position on inputs of shape (600, 1000) (traj, obs) using dask.delayed on my 6-core laptop. The result is the same (and correct) as if I run the function without Dask, which is a good outcome. I don't get any speed up, though, and actually the Dask variant is slightly slower. It's possible that the inputs are too small to yield any benefit from parallelism.

@selipot
Member Author

selipot commented Jan 13, 2023

@milancurcic Can you share that test code please?

@philippemiron
Contributor

philippemiron commented Jan 13, 2023

I think it should be more efficient to use dask.bag instead of delayed due to the high number of items. See "Avoid too many tasks" in the Dask best practices.
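
A rough dask.bag version (an untested sketch, assuming 2-d arrays lon, lat, and time with one trajectory per row) could look like:

import dask.bag as db
import numpy as np

# group trajectories into a modest number of partitions so there aren't too many tasks
bag = db.from_sequence(range(len(lon)), npartitions=32)
results = bag.map(lambda i: velocity_from_position(lon[i], lat[i], time[i])).compute()
u = np.stack([r[0] for r in results])
v = np.stack([r[1] for r in results])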

@milancurcic
Member

Here's my original snippet.

from clouddrift.analysis import velocity_from_position
import dask
import dask.distributed  # needed for dask.distributed.Client below
import numpy as np

num_obs = 1000
num_traj = 6000

lon = np.reshape(np.tile(np.linspace(-180, 180, num_obs), num_traj), (num_traj, num_obs))

lat = np.zeros((lon.shape))
for n in range(num_traj):
    lat[n] = n / num_traj * 60 # from 0 to 60N

time = np.reshape(np.tile(np.linspace(0, 1e7, num_obs), num_traj), (num_traj, num_obs))

client = dask.distributed.Client(threads_per_worker=6, n_workers=1)
velocity_from_position_parallel = dask.delayed(velocity_from_position)
res = velocity_from_position_parallel(lon, lat, time)
u, v = res.compute()

Too many tasks is not the problem here; actually the opposite. I make only one function call with dask.delayed, and that doesn't seem to be what it's made for: after reading more about dask.delayed, it seems to be meant for running multiple function calls concurrently (sigh). Despite this, the dask.delayed variant of velocity_from_position runs twice as fast on my laptop vs. the serial execution of the function if I increase num_traj from 600 to 6000. I don't understand why that is.

I'll experiment with other dask idioms as well as dask+xarray.

@philippemiron
Contributor

I initially thought you were calling the function num_traj times. I think the use case for dask would be when the number of trajectories is too high to be processed all at once.

So something like this could replace the calculation part:

n_chunks = 10
chunk_size = int(np.ceil(num_traj / n_chunks))  # ceil so the last chunk picks up any remainder

delayed_results = []
for i in range(n_chunks):
    sl = slice(i * chunk_size, min(num_traj, (i + 1) * chunk_size))
    res = dask.delayed(velocity_from_position)(lon[sl], lat[sl], time[sl])
    delayed_results.append(res)

# dask.compute returns a tuple; its first element is the list of (u, v) pairs
(results,) = dask.compute(delayed_results)

I don't have a super large dataset to test this with, but maybe @selipot could give it a try?

@milancurcic
Member

You're right. I naively assumed that Dask would chunk the data for me under the hood.

@philippemiron
Contributor

philippemiron commented Mar 25, 2023

With apply_ragged this should be possible, right? cc @selipot

@selipot
Member Author

selipot commented May 17, 2023

Yes, I think this issue is old; it predates apply_ragged. But I don't fully understand how apply_ragged is parallelized. No chunks or dask involved?

@philippemiron
Contributor

philippemiron commented May 17, 2023

The function is executed on a different thread for each trajectory and the results are reassembled afterwards. See this section:

# parallel execution
with futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
    res = executor.map(lambda x: func(*x, **kwargs), iter)
# concatenate the outputs
res = [item if isinstance(item, Iterable) else [item] for item in res]

max_workers is None by default, in which case concurrent.futures sets it to min(32, os.cpu_count() + 4) (see the concurrent.futures documentation). But you can also set the parameter yourself:

def apply_ragged(
    func: callable,
    arrays: list[np.ndarray],
    rowsize: list[int],
    *args: tuple,
    max_workers: int = None,
    **kwargs: dict,
)
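
For completeness, a usage sketch (assuming lon, lat, and time are 1-d ragged arrays, rowsize holds the number of observations per trajectory, and apply_ragged returns one concatenated array per output of the wrapped function):

# apply velocity_from_position trajectory by trajectory over the ragged arrays
u, v = apply_ragged(velocity_from_position, [lon, lat, time], rowsize)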

@milancurcic
Member

We can benchmark, but my understanding is that the concurrency (different from parallelism) used in apply_ragged will only sometimes speed things up. Functions that do slow stuff like I/O, downloading data (and perhaps even plotting) are likely to run faster. CPU-bound stuff (numerical calculations) is likely to run similarly or slower.

I think this issue intended to ask how to run things truly in parallel, i.e. distribute the computation on different CPUs (whether via Dask, MPI, multiprocessing, or otherwise). I don't think we have a solution yet, but getting it to run with Dask is probably the lowest-hanging fruit among the possible approaches.

(and vectorization is yet a different concept from concurrency and parallelization; it's about putting multiple array elements into a long register and running one operation on all of them.)

@kevinsantana11 added the bug and question labels on Apr 11, 2024