Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Interoperability with pd.to_sql? #123

Open
MichaelTiemannOSC opened this issue Apr 28, 2022 · 3 comments
Open

Interoperability with pd.to_sql? #123

MichaelTiemannOSC opened this issue Apr 28, 2022 · 3 comments

Comments

@MichaelTiemannOSC
Copy link
Collaborator

What is the recommended way to write and read pint-pandas dataframes to and from SQL databases?

Is it immediately obvious that the answer should be to convert to/from JSON? In my case, I'm interested in writing to/from a TRINO database via SQLAlchemy, but happy to see answers that work for PostgreSQL, SQLite, etc. Here's a test case:

https://github.com/os-climate/data-platform-demo/blob/master/notebooks/pint-demo.ipynb

Relevant snippet:

ureg.define("CO2e = CO2 = CO2eq = CO2_eq")
co2_df = pd.DataFrame({'co2': pd.Series([Q_(1.0, 'Mt CO2'), Q_(0.9, 'Mt CO2'), Q_(0.8, 'Mt CO2')],
                                        dtype='pint[Mt CO2]'),
                       'year': [2016, 2017, 2018]})
co2_df.to_sql(ingest_table, con=engine, schema=ingest_schema, if_exists='append', index=False,
              method=osc.TrinoBatchInsert(batch_size = 100, verbose = True, optimize = True))
@MichaelTiemannOSC
Copy link
Collaborator Author

Answering my own question: need to change YEAR series to pint[dimensionless] and then write co2_df.pint.dequantify() to get a SQL table that works.

@MichaelTiemannOSC
Copy link
Collaborator Author

I have updated the demo notebook to show the hoops through which one must jump in order to unpack the results that come back from pd.read_sql. The relevant snippet is:

new_co2_df = pd.DataFrame({col:series.astype(f"pint[{unit.replace('co2', 'CO2')}]")
                          for col, unit, series in zip(list(map(lambda x: ast.literal_eval(x)[0], new_df.T.apply(lambda x: x.index, axis=1).index.values)),
                                                       list(map(lambda x: ast.literal_eval(x)[1], new_df.T.apply(lambda x: x.index, axis=1).index.values)),
                                                       [v for v in new_df.to_dict(orient='series').values()])})

Re-opening in case there's an obviously better way to do this.

@MichaelTiemannOSC
Copy link
Collaborator Author

I have updated the notebook to do a full round-trip of a dataframe that includes both non-quantified data (strings), as well as with both homogeneous units in a column as well as heterogeneous units in a column.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant