Pandas TypeErrors in hourly_timeseries #117

Closed
3 tasks done
mhidas opened this issue Mar 31, 2020 · 6 comments · Fixed by #151
Labels: bug (Something isn't working), hourly_timeseries (Issues relating to the hourly time series product)

Comments

@mhidas
Contributor

mhidas commented Mar 31, 2020

A couple of similar errors occur when creating the hourly products in the pipeline for some sites.

  • TypeError: Operation sub between float64 and Timedelta is invalid
  ...
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/aodntools/timeseries_products/hourly_timeseries.py", line 399, in PDresample_by_hour
    df.index = df.index - pd.Timedelta(30, units='m')

occurs for

  • ITFTIS (306 input files, task id e103219b-f5b1-4f4a-a7eb-0e4c4217a491)
  • PH100
  • SAM7DS
  • TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'
  ...
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/aodntools/timeseries_products/hourly_timeseries.py", line 405, in PDresample_by_hour
    ds_var_mean = ds_var.resample('1H').apply(function_dict[variable]).astype(np.float32)

for

  • SYD100
  • SYD140
  • While fixing this, should also apply the work-around for the pandas.Timedelta units issue, as done in the velocity hourly code (Velocity HOURLY #99 (comment))
@mhidas mhidas added bug Something isn't working hourly_timeseries Issues relating to the hourly time series product labels Mar 31, 2020
@mhidas mhidas self-assigned this Mar 31, 2020
@mhidas
Contributor Author

mhidas commented Mar 31, 2020

While fixing this, should also apply the work-around for the pandas.Timedelta units issue, as done in the velocity hourly code (#99 (comment))
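For reference, a minimal sketch of spelling the 30-minute offset unambiguously. The exact work-around used in #99 is not reproduced here; this only relies on the documented pandas keyword being `unit` (singular), whereas the traceback shows `units`, and how that misspelling is handled depends on the pandas version. The variable names and sample index are made up for illustration.

```python
import pandas as pd

# Three equivalent ways to build a 30-minute offset that do not depend
# on the misspelled `units` keyword seen in the traceback.
half_hour_kw = pd.Timedelta(minutes=30)
half_hour_unit = pd.Timedelta(30, unit="min")
half_hour_str = pd.Timedelta("30min")

# Illustrative index (not real mooring data): samples at half past each hour.
index = pd.date_range("2020-03-31 00:30", periods=3, freq=pd.Timedelta(hours=1))

# Shifting a DatetimeIndex by the offset moves every timestamp back 30 minutes.
shifted = index - half_hour_kw
```

The keyword form `pd.Timedelta(minutes=30)` is probably the easiest to read, since it cannot be confused with a positional value in some other unit.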

@mhidas mhidas changed the title TypeError in hourly_timeseries Pandas TypeError in hourly_timeseries May 20, 2020
@mhidas mhidas changed the title Pandas TypeError in hourly_timeseries Pandas TypeErrors in hourly_timeseries May 20, 2020
@mhidas
Contributor Author

mhidas commented May 20, 2020

The full stack traces are

TypeError: Operation sub between float64 and Timedelta is invalid
Traceback (most recent call last):
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/aodncore/pipeline/handlerbase.py", line 1052, in run
    self.trigger(transition['trigger'])
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/transitions/core.py", line 65, in _get_trigger
    return machine.events[trigger_name].trigger(model, *args, **kwargs)
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/transitions/core.py", line 405, in trigger
    return self.machine._process(func)
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/transitions/core.py", line 1073, in _process
    return trigger()
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/transitions/core.py", line 423, in _trigger
    return self._process(event_data)
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/transitions/core.py", line 433, in _process
    if trans.execute(event_data):
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/transitions/core.py", line 279, in execute
    machine.callback(func, event_data)
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/transitions/core.py", line 1031, in callback
    func(*event_data.args, **event_data.kwargs)
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/aodndata/moorings/products_handler.py", line 390, in preprocess
    self._make_hourly_timeseries()
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/aodndata/moorings/products_handler.py", line 300, in _make_hourly_timeseries
    **self.product_common_kwargs)
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/aodntools/timeseries_products/hourly_timeseries.py", line 507, in hourly_aggregator
    df_temp = PDresample_by_hour(df_temp, function_dict, function_stats)  # do the magic
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/aodntools/timeseries_products/hourly_timeseries.py", line 399, in PDresample_by_hour
    df.index = df.index - pd.Timedelta(30, units='m')
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/pandas/core/indexes/base.py", line 121, in index_arithmetic_method
    return self._evaluate_with_timedelta_like(other, op)
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/pandas/core/indexes/base.py", line 4980, in _evaluate_with_timedelta_like
    other=type(other).__name__))

and

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'
Traceback (most recent call last):
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/aodncore/pipeline/handlerbase.py", line 1052, in run
    self.trigger(transition['trigger'])
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/transitions/core.py", line 65, in _get_trigger
    return machine.events[trigger_name].trigger(model, *args, **kwargs)
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/transitions/core.py", line 405, in trigger
    return self.machine._process(func)
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/transitions/core.py", line 1073, in _process
    return trigger()
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/transitions/core.py", line 423, in _trigger
    return self._process(event_data)
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/transitions/core.py", line 433, in _process
    if trans.execute(event_data):
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/transitions/core.py", line 279, in execute
    machine.callback(func, event_data)
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/transitions/core.py", line 1031, in callback
    func(*event_data.args, **event_data.kwargs)
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/aodndata/moorings/products_handler.py", line 390, in preprocess
    self._make_hourly_timeseries()
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/aodndata/moorings/products_handler.py", line 300, in _make_hourly_timeseries
    **self.product_common_kwargs)
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/aodntools/timeseries_products/hourly_timeseries.py", line 507, in hourly_aggregator
    df_temp = PDresample_by_hour(df_temp, function_dict, function_stats)  # do the magic
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/aodntools/timeseries_products/hourly_timeseries.py", line 405, in PDresample_by_hour
    ds_var_mean = ds_var.resample('1H').apply(function_dict[variable]).astype(np.float32)
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/pandas/core/generic.py", line 8155, in resample
    base=base, key=on, level=level)
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/pandas/core/resample.py", line 1250, in resample
    return tg._get_resampler(obj, kind=kind)
  File "/mnt/ebs/pipeline/lib/python3.5/site-packages/pandas/core/resample.py", line 1380, in _get_resampler
    "but got an instance of %r" % type(ax).__name__)

@mphemming

mphemming commented Feb 19, 2021

Thought this might be a good place to mention that I ran into an error when creating hourly timeseries products locally. Using the latest code on GitHub, I had to change output_dir=args.output_path on line 579 in 'hourly_timeseries.py' to output_dir=args.output_dir for the code to work. Easy fix but worth mentioning.
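To illustrate why that attribute name matters, here is a small argparse sketch. The actual argument definitions in 'hourly_timeseries.py' are not shown in this thread, so the `--output-dir` flag and its default below are assumptions; the point is only that the parsed namespace exposes the name given by `dest`, and nothing else.

```python
import argparse

# Hypothetical stand-in for the script's command-line parser.
parser = argparse.ArgumentParser(description="illustrative parser, not the real one")
parser.add_argument("--output-dir", dest="output_dir", default="./",
                    help="directory to write the product to")

# Parse an explicit argument list so the example does not read sys.argv.
args = parser.parse_args(["--output-dir", "/tmp"])

# The namespace has `output_dir`; there is no `output_path` attribute,
# so reading args.output_path would raise AttributeError.
output_dir = args.output_dir
has_output_path = hasattr(args, "output_path")
```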

I also get warnings for function stringtochar() on lines 298 and 299 of 'aggregated_timeseries.py'. The warning suggests using function tobytes() instead.

@mhidas
Contributor Author

mhidas commented Feb 22, 2021

Thanks @mphemming - your feedback is welcome. However these are unrelated to this thread, so I've moved them to separate issues: #135 #136

@mhidas
Contributor Author

mhidas commented Apr 20, 2022

The original errors reported above occur under Python 3.5.
In Python 3.8, running the code on the same data raises different errors from different parts of the code.
E.g. for site SAM7DS:

test_aodntools/timeseries_products/test_hourly_timeseries.py:125 (TestHourlyTimeseriesDebugging.test_typeerror)
self = <xarray.backends.netCDF4_.NetCDF4ArrayWrapper object at 0x7f1ab30a1700>
key = (array([], dtype=int64), slice(None, None, None), slice(None, None, None))

    def _getitem(self, key):
        if self.datastore.is_remote:  # pragma: no cover
            getitem = functools.partial(robust_getitem, catch=RuntimeError)
        else:
            getitem = operator.getitem
    
        try:
            with self.datastore.lock:
                original_array = self.get_array(needs_lock=False)
>               array = getitem(original_array, key)

../../python-aodntools-py38/lib/python3.8/site-packages/xarray/backends/netCDF4_.py:106: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???

src/netCDF4/_netCDF4.pyx:4383: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

count = array([], shape=(0, 1, 1, 3), dtype=int64)

    def _out_array_shape(count):
        """Return the output array shape given the count array created by getStartCountStride"""
    
        s = list(count.shape[:-1])
        out = []
    
        for i, n in enumerate(s):
            if n == 1:
>               c = count[..., i].ravel()[0] # All elements should be identical.
E               IndexError: index 0 is out of bounds for axis 0 with size 0

../../python-aodntools-py38/lib/python3.8/site-packages/netCDF4/utils.py:458: IndexError

But also...

During handling of the above exception, another exception occurred:

self = <test_aodntools.timeseries_products.test_hourly_timeseries.TestHourlyTimeseriesDebugging testMethod=test_typeerror>

    def test_typeerror(self):
>       output_file, bad_files = hourly_aggregator(files_to_aggregate=SAM7_LIST,
                                                   site_code='SAM7DS',
                                                   qcflags=(1, 2),
                                                   input_dir=TEST_ROOT,
                                                   output_dir='/tmp'
                                                   )

test_hourly_timeseries.py:127: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../aodntools/timeseries_products/hourly_timeseries.py:413: in hourly_aggregator
    nc_clean = in_water(nc)  # in water only

../../aodntools/timeseries_products/hourly_timeseries.py:79: in in_water
    return nc.where((TIME >= time_deployment_start) & (TIME <= time_deployment_end), drop=True)

...

IndexError: The indexing operation you are attempting to perform is not valid on netCDF4.Variable object. Try loading your data into memory first by calling .load().

../../python-aodntools-py38/lib/python3.8/site-packages/xarray/backends/netCDF4_.py:116: IndexError

@mhidas
Contributor Author

mhidas commented Apr 21, 2022

The first error ("Operation sub between float64 and Timedelta is invalid" in Py3.5) only occurs for files where all the data are flagged as bad, which results in trying to process an empty array.
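One way to avoid handing an empty frame to the resampling step is to check for it first. The sketch below is not necessarily the change made in #151: `resample_hourly_or_none` is a hypothetical helper, the plain hourly mean stands in for the per-variable functions, and the 30-minute shift is only assumed from the traceback lines quoted earlier.

```python
import pandas as pd


def resample_hourly_or_none(df):
    """Return hourly means of df, or None when there is nothing to resample.

    Hypothetical guard: an empty frame, or one whose index is not a
    DatetimeIndex, is skipped instead of being passed to resample().
    """
    if df.empty or not isinstance(df.index, pd.DatetimeIndex):
        return None
    shifted = df.copy()
    # Shift timestamps back by half an hour before binning.
    shifted.index = shifted.index - pd.Timedelta(minutes=30)
    return shifted.resample(pd.Timedelta(hours=1)).mean()


# Illustrative data only: four samples, 30 minutes apart.
times = pd.date_range("2020-01-01 00:00", periods=4, freq="30min")
good = pd.DataFrame({"TEMP": [1.0, 2.0, 3.0, 4.0]}, index=times)

hourly = resample_hourly_or_none(good)
skipped_empty = resample_hourly_or_none(good.iloc[0:0])    # all rows removed
skipped_no_times = resample_hourly_or_none(pd.DataFrame({"TEMP": []}))
```

A caller would then treat a `None` result as "no usable data in this file" and move on to the next one.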
The error in Py3.8 happens for a similar reason: all the data are out-of-water, i.e. outside the range set by time_deployment_start and time_deployment_end.
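A similar guard could be applied to the in-water selection. This is a sketch under assumptions, not the code in `in_water`: `in_water_or_none` is a hypothetical name, the deployment times are passed in as plain arguments rather than read from file attributes, and the dataset is built in memory, so it does not reproduce the netCDF4 indexing error itself. It only shows checking the mask before calling `where(..., drop=True)`.

```python
import numpy as np
import pandas as pd
import xarray as xr


def in_water_or_none(ds, start, end):
    """Keep only samples with start <= TIME <= end, or return None if none match."""
    mask = (ds["TIME"] >= start) & (ds["TIME"] <= end)
    if not bool(mask.any()):
        # Nothing is in the water: skip the file instead of indexing with an empty key.
        return None
    return ds.where(mask, drop=True)


# Illustrative dataset: four hourly samples.
times = pd.date_range("2020-01-01", periods=4, freq=pd.Timedelta(hours=1))
ds = xr.Dataset({"TEMP": ("TIME", [10.0, 11.0, 12.0, 13.0])}, coords={"TIME": times})

kept = in_water_or_none(ds, np.datetime64("2020-01-01T01:00"), np.datetime64("2020-01-01T02:00"))
none_left = in_water_or_none(ds, np.datetime64("2021-01-01"), np.datetime64("2021-02-01"))
```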
