create rados-like API #1

Open
ivotron opened this issue Jun 16, 2020 · 4 comments

ivotron commented Jun 16, 2020

Expose an API similar to the python-rados API:

```python
import skyhookdm as sh
import pyarrow as pa
import pandas as pd

# connect to skyhook-capable cluster
cluster = sh.Skyhook(conffile='ceph.conf')
cluster.connect()

# create context
ioctx = cluster.open_ioctx('mypool')

# read/load/create arrow table
df = pd.read_csv('myfile.csv')
table = pa.Table.from_pandas(df)

# write table
ioctx.write_table('oid', 'arrow', table)
```

For querying:

```python
q = sh.Query(columns=['foo', 'bar'], predicates=['baz < 5'])

table = ioctx.read_table('oid', 'arrow', q)
```
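
To make the intended semantics of the proposed `Query` concrete, here is a minimal sketch that runs on plain Python dicts instead of Arrow tables. Everything in it is an assumption for illustration: the `apply` method, the `parse_predicate` helper, and the restriction to numeric `column op value` predicates are hypothetical and are not part of SkyhookDM.

```python
import operator
from dataclasses import dataclass, field

# Comparison operators the toy predicate parser understands.
# Two-character operators come first so "<=" is not mistaken for "<".
_OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    ">": operator.gt,
}


def parse_predicate(text):
    """Turn a string such as 'baz < 5' into (column, compare_fn, value)."""
    for symbol, fn in _OPS.items():
        if symbol in text:
            column, _, raw = text.partition(symbol)
            return column.strip(), fn, float(raw)
    raise ValueError(f"unsupported predicate: {text!r}")


@dataclass
class Query:
    """Hypothetical stand-in for the proposed sh.Query (numeric predicates only)."""

    columns: list
    predicates: list = field(default_factory=list)

    def apply(self, rows):
        """Keep rows matching every predicate, then project the columns."""
        checks = [parse_predicate(p) for p in self.predicates]
        return [
            {name: row[name] for name in self.columns}
            for row in rows
            if all(fn(row[col], value) for col, fn, value in checks)
        ]


rows = [
    {"foo": 1, "bar": "a", "baz": 3},
    {"foo": 2, "bar": "b", "baz": 7},
]
q = Query(columns=["foo", "bar"], predicates=["baz < 5"])
result = q.apply(rows)
```

In a real implementation the selection and projection would presumably be pushed down to the storage side rather than evaluated on the client; the sketch only pins down what result a caller should expect.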

xweichu commented Jun 17, 2020

The current API is already modeled on the python-rados API you mentioned. I just updated the code a bit to expose it. An example is below; some work is still ongoing.

```python
import pyarrow as pa
import pandas as pd
from skyhookdmclient import SkyhookDM

# create a new SkyhookDM object
sk = SkyhookDM()

# connect to Skyhook_driver and Ceph data pool, please replace the ip_address and the pool name.
sk.connect('ip_address','ceph_pool_name')

# read/load/create arrow table
df = pd.read_csv('myfile.csv')
table = pa.Table.from_pandas(df)

# write table
sk.write_full('oid', table)
```

xweichu commented Jun 17, 2020

I'm also working on fixing the query() function on weekends or whenever I have time, and will post an update to it soon.

ivotron commented Jun 18, 2020

Thanks @xweichu. One thing that might make the codebase more generic is to refactor it so that the Dask-related code (the driver) is independent of this API. That would give us a low-level (rados-like) API that only deals with the operations available in the skyhookdm class.

This low-level API could then be used to compose higher-level abstractions, like the Dask driver (which could live in a repo of its own). It would also allow alternative implementations (e.g. using Ray instead of Dask), or serve as a building block for the higher-level API that @drin is working on.

In short, the low-level API would only be in charge of writes and reads (queries), and higher-level abstractions would be built on top of it. What do you think?
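
The layering described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `ObjectStore`, `InMemoryStore`, and `PartitionedWriter` are hypothetical names, CSV stands in for Arrow so the example needs nothing beyond the standard library, and the in-memory backend replaces a real Ceph pool.

```python
import csv
import io
from typing import Dict, List, Protocol


class ObjectStore(Protocol):
    """The low-level, rados-like surface: whole-object writes and reads."""

    def write_full(self, oid: str, data: bytes) -> None: ...

    def read(self, oid: str) -> bytes: ...


class InMemoryStore:
    """Stand-in backend so the sketch runs without a Ceph cluster."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def write_full(self, oid: str, data: bytes) -> None:
        self._objects[oid] = data

    def read(self, oid: str) -> bytes:
        return self._objects[oid]


class PartitionedWriter:
    """Higher-level abstraction that relies only on ObjectStore calls.

    It splits rows into fixed-size partitions, one object per partition,
    which is the kind of work a Dask or Ray driver could run in parallel.
    """

    def __init__(self, store: ObjectStore, rows_per_object: int) -> None:
        self.store = store
        self.rows_per_object = rows_per_object

    def write(self, prefix: str, rows: List[Dict[str, str]]) -> List[str]:
        oids: List[str] = []
        for start in range(0, len(rows), self.rows_per_object):
            chunk = rows[start:start + self.rows_per_object]
            buf = io.StringIO()
            writer = csv.DictWriter(buf, fieldnames=list(chunk[0]))
            writer.writeheader()
            writer.writerows(chunk)
            oid = f"{prefix}.{len(oids)}"
            self.store.write_full(oid, buf.getvalue().encode("utf-8"))
            oids.append(oid)
        return oids

    def read(self, oids: List[str]) -> List[Dict[str, str]]:
        rows: List[Dict[str, str]] = []
        for oid in oids:
            text = self.store.read(oid).decode("utf-8")
            rows.extend(dict(r) for r in csv.DictReader(io.StringIO(text)))
        return rows


store = InMemoryStore()
driver = PartitionedWriter(store, rows_per_object=2)
rows = [{"foo": str(i), "bar": str(i * i)} for i in range(5)]
oids = driver.write("mytable", rows)
```

Because `PartitionedWriter` only touches `write_full` and `read`, swapping the scheduler or the backend would not require changes to the low-level layer, which is the point of the proposed split.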

ivotron transferred this issue from uccross/skyhookdm-pythonclient on Jul 15, 2020

ivotron commented Jul 15, 2020

Moving this issue to this repo. This codebase has everything needed to implement something like what is described in the OP. The only missing piece is reading, which could be implemented as a method of the RadosIOContext class.
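
As a rough idea of what the missing read path might involve, the sketch below reads a whole object in chunks from a rados-like context. It assumes the context exposes `read(oid, length, offset)` returning bytes, in the style of python-rados; `read_full` and `FakeIoctx` are hypothetical names, and nothing here reflects the actual RadosIOContext class in this repo.

```python
class FakeIoctx:
    """Test double mimicking read(oid, length, offset) on a rados-like context."""

    def __init__(self, objects):
        self._objects = objects

    def read(self, oid, length=8192, offset=0):
        # Returns at most `length` bytes, and b"" once offset is past the end.
        return self._objects[oid][offset:offset + length]


def read_full(ioctx, oid, chunk_size=4 * 1024 * 1024):
    """Read a whole object by requesting chunks until one comes back empty."""
    parts = []
    offset = 0
    while True:
        chunk = ioctx.read(oid, chunk_size, offset)
        if not chunk:
            break
        parts.append(chunk)
        offset += len(chunk)
    return b"".join(parts)


ioctx = FakeIoctx({"oid": bytes(range(10))})
data = read_full(ioctx, "oid", chunk_size=4)
```

The returned bytes would then be deserialized into an Arrow table by whatever format the write path used; that step is left out because the issue does not specify it.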

cc: @jlefevre @carlosmalt @drin
