TypeError: 'float' object cannot be interpreted as an integer #2224

Open
OlegBEZb opened this issue Dec 10, 2024 · 7 comments

Comments

@OlegBEZb

Description

When I run kedro viz and click on one of the persisted datasets, the application fails with TypeError: 'float' object cannot be interpreted as an integer.

Context

This issue makes all the consecutive datasets not available for a preview. Apart from that, there is no direct way to tell which column of the Parquet-persisted dataset is causing the failure.

Steps to Reproduce

  1. Create a pipeline with a single node that produces a DataFrame. The DataFrame's columns may be of questionable purity, but it is definitely serialisable to Parquet: kedro run completes without errors, the dataset exists in the data folder, and it can easily be read with catalog.load.
  2. Run kedro viz.
  3. Click on the dataset in the application.

Expected Result

A preview of the dataset is displayed.

Actual Result

A long traceback, starting with:

ERROR    Exception in ASGI application                                                           httptools_impl.py:414
                                                                                                                                          
                             ╭───────────────────────── Traceback (most recent call last) ─────────────────────────╮                      
                             │ /Users/username/Library/Caches/pypoetry/virtualenvs/venv_name │                      
                             │ 3.11/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py:409 in   │                      
                             │ run_asgi                                

and ending with:

/Users/username/Library/Caches/pypoetry/virtualenvs/venv_name │                      
                             │ 3.11/lib/python3.11/site-packages/pydantic/type_adapter.py:527 in dump_python       │                      
                             │                                                                                     │                      
                             │   524 │   │   Returns:                                                              │                      
                             │   525 │   │   │   The serialized object.                                            │                      
                             │   526 │   │   """                                                                   │                      
                             │ ❱ 527 │   │   return self.serializer.to_python(                                     │                      
                             │   528 │   │   │   instance,                                                         │                      
                             │   529 │   │   │   mode=mode,                                                        │                      
                             │   530 │   │   │   by_alias=by_alias,                                                │                      
                             ╰─────────────────────────────────────────────────────────────────────────────────────╯                      
                             TypeError: 'float' object cannot be interpreted as an integer 

Your Environment

Include as many relevant details as possible about the environment you experienced the bug in:

  • Web browser system and version: Version 131.0.6778.86 (Official Build) (arm64)
  • Operating system and version: macOS 14.7.1 (23H222)
  • NodeJS version used (if relevant):
  • Kedro version used (if relevant): 0.19.10
  • Python version used (if relevant): 3.11.10

@datajoely
Contributor

I'm not sure whether this is the same issue, but I remember once finding that infinity is a valid float in Python but not in JSON. It's possible something related is going on here.
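To illustrate the point about infinity: Python's float type includes inf, but standard JSON has no literal for it. The standard-library json module writes the non-standard token Infinity by default and raises ValueError when allow_nan=False. The helper to_strict_json is hypothetical, for illustration only; whether Kedro-Viz goes through this exact path is not established in this thread.

```python
import json
import math


def to_strict_json(value) -> str:
    """Hypothetical helper: serialise value as standard JSON, rejecting NaN and infinities."""
    return json.dumps(value, allow_nan=False)


value = float("inf")
print(math.isinf(value))  # True: a valid Python float

# By default, Python's json module writes the non-standard token "Infinity"...
print(json.dumps(value))  # Infinity

# ...but strict JSON has no representation for it, so this raises ValueError.
try:
    to_strict_json(value)
except ValueError as exc:
    print("Not valid JSON:", exc)
```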

@OlegBEZb
Author

@datajoely, thank you very much for your answer! I investigated my dataset and found that the issue is caused by a timestamp column of dtype='datetime64[ns]' that contains NaT values.

Here is a minimal DataFrame that reproduces the error on my side:

import pandas as pd

# A datetime64[ns] column containing one missing value (NaT)
test_df = pd.DataFrame({
    'timestamp_col': [pd.Timestamp('2024-12-11 18:00:00'), pd.NaT]
})
test_df.to_parquet('data/01_raw/dataset_name.parquet')

It produces exactly the same error:

TypeError: 'float' object cannot be interpreted as an integer
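Because the error message does not name the offending column, one way to narrow it down is to list the datetime-like columns that contain NaT. The helper find_nat_columns below is a hypothetical sketch, not part of Kedro or Kedro-Viz. It only reports NaT in datetime columns and would not catch other problematic values such as inf.

```python
import pandas as pd


def find_nat_columns(df: pd.DataFrame) -> dict[str, int]:
    """Hypothetical helper: map each datetime-like column containing NaT to its NaT count."""
    # Select both naive and timezone-aware datetime columns
    datetime_cols = df.select_dtypes(include=["datetime", "datetimetz"]).columns
    nat_counts = df[datetime_cols].isna().sum()
    return {str(col): int(count) for col, count in nat_counts.items() if count > 0}


df = pd.DataFrame({
    "timestamp_col": [pd.Timestamp("2024-12-11 18:00:00"), pd.NaT],
    "value": [1.0, None],  # a float NaN is not a datetime column, so it is not reported
})
print(find_nat_columns(df))  # {'timestamp_col': 1}
```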

@datajoely
Contributor

That's interesting, but does that mean you were able to save the data that Kedro-Viz was trying to read?

@OlegBEZb
Author

@datajoely, yes. The Kedro pipeline saves it without any changes. I also tried saving such a DataFrame with pure pandas, as in the example above, and both approaches work. I haven't specified any Parquet configuration, but I have both fastparquet and pyarrow installed in the venv, so pandas should default to pyarrow with snappy compression. I haven't checked in depth whether this could cause any issues.

@rashidakanchwala
Contributor

rashidakanchwala commented Jan 9, 2025

Is this a ParquetDataset? If so, Kedro-Viz renders dataset previews using the preview function of ParquetDataset in the kedro-datasets package, which converts the dataset into a pandas DataFrame.
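As an illustration of how NaT can reach the serialisation layer: if a table preview ends up as something like DataFrame.to_dict(orient="split") (an assumption here, not a statement about the actual kedro-datasets implementation), the missing timestamp is passed along as a pd.NaT object rather than None. The function preview_payload is hypothetical.

```python
import pandas as pd


def preview_payload(df: pd.DataFrame, nrows: int = 5) -> dict:
    """Hypothetical stand-in for a table preview: first nrows rows in 'split' layout."""
    return df.head(nrows).to_dict(orient="split")


test_df = pd.DataFrame({"timestamp_col": [pd.Timestamp("2024-12-11 18:00:00"), pd.NaT]})
payload = preview_payload(test_df)

print(payload["columns"])  # ['timestamp_col']
# Each row is a list of Python objects; the missing value is still a NaT object
for row in payload["data"]:
    print(type(row[0]).__name__, row[0])
```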

To handle edge cases such as missing or problematic values (NaT, inf, etc.), you could clean the data before previewing it. As you can see, the preview function is simple and generic by design, so handling every possible edge case within it would complicate its purpose.
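One way to follow that suggestion is to convert datetime columns to ISO-8601 strings, with NaT mapped to None, before the data is previewed. clean_for_preview is a hypothetical helper, not part of Kedro or kedro-datasets.

```python
import pandas as pd


def clean_for_preview(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy with datetime columns as ISO strings and NaT as None."""
    cleaned = df.copy()
    for col in cleaned.select_dtypes(include=["datetime", "datetimetz"]).columns:
        # Iterating a datetime column yields Timestamp objects, and NaT for missing values
        cleaned[col] = pd.Series(
            [None if pd.isna(ts) else ts.isoformat() for ts in cleaned[col]],
            index=cleaned.index,
            dtype="object",
        )
    return cleaned


test_df = pd.DataFrame({"timestamp_col": [pd.Timestamp("2024-12-11 18:00:00"), pd.NaT]})
print(clean_for_preview(test_df)["timestamp_col"].tolist())  # ['2024-12-11T18:00:00', None]
```

Note the trade-off: applying this in the node that produces the dataset would change the persisted column from datetime64 to strings, so it may be preferable to apply it only to a copy used for inspection.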

@astrojuanlu
Copy link
Member

xref pandas-dev/pandas#59772 ?

@astrojuanlu
Copy link
Member

@OlegBEZb Looks like the serialisation issue comes from upstream.

About

"This issue makes all the consecutive datasets not available for a preview."

Could you send a reproducer for this situation too?
