Use SQLBackend: how to connect to an existing SQL DB file and read the graph into memory #19
If you have already created a graph on disk with SQLite, you can read it like this (assuming your database file is called `example-graph.sqlite`):

```python
import grand
from grand.backends import SQLBackend

G = grand.Graph(
    backend=SQLBackend(db_url="sqlite:///example-graph.sqlite", directed=True)
)
```

You can now interact with the graph through whatever dialect you like. If you want to use NetworkX-style APIs, you can list edges like this:

```python
G.nx.edges()
```

Unfortunately, there isn't currently a great way to export and import graphs between Backends yet, as this can be a super expensive operation. If your graph is relatively small, you can export nodes and edges individually into, say, a NetworkX object — like this:

```python
import networkx as nx

H = nx.DiGraph()
for node, attrs in G.nx.nodes(data=True):
    H.add_node(node, **attrs)
for u, v, attrs in G.nx.edges(data=True):
    H.add_edge(u, v, **attrs)
```

Then you'll have access to all the same nodes in an in-memory NetworkX object:

```python
H.nodes(data=True)
```

Does this answer your question?

To get you started if you don't have a SQLite graph file yet, you can create one like this:

```python
G = grand.Graph(
    backend=SQLBackend(db_url="sqlite:///example-graph.sqlite", directed=True)
)
G.nx.add_edge("X", "Y")
```
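If you just want to peek at the file without going through Grand at all, the `.sqlite` file is an ordinary SQLite database, so the standard library works too. Here is a minimal stdlib-only sketch; the `nodes`/`edges` table and column names below are made up for illustration and are not Grand's actual internal schema:

```python
import sqlite3

# Build a tiny stand-in database in memory; with a real file on disk you
# would pass its path instead of ":memory:".
# NOTE: the schema here is hypothetical, not what Grand generates.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE nodes (id TEXT PRIMARY KEY);
    CREATE TABLE edges (source TEXT, target TEXT);
    INSERT INTO nodes VALUES ('X'), ('Y');
    INSERT INTO edges VALUES ('X', 'Y');
    """
)

# Read the graph into plain in-memory structures:
nodes = [row[0] for row in conn.execute("SELECT id FROM nodes")]
edges = list(conn.execute("SELECT source, target FROM edges"))

print(nodes)  # ['X', 'Y']
print(edges)  # [('X', 'Y')]
```

This is mostly useful for sanity-checking what ended up on disk; for actual graph operations, the `SQLBackend` above is the supported route.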
Hey @yangyxt — does this answer your question?
I have a similar question I'm thinking through. Would I be able to use grand with grand-cypher for that? Or would it be inefficient due to the way RDBMS queries work?
Hi @AbdealiJK! That's an interesting use-case, and I think that Grand should work for you, though I'm not sure we can make any efficiency guarantees since every "edge" in your query will be converted to its own query.

One issue you'll likely encounter is that Grand currently assumes ONE table for nodes and ONE table for edges, but we might be able to get around that if you have a more-or-less consistent schema between the "nodelike" tables and we can form a virtual nodes table out of a join.

By way of excessive explanation: the reason we don't currently support this is that not all Backends support the idea of multiple node "types," and practically no Dialects do either. So there's nothing blocking this technically, it's just not the original use-case! :) If you're interested in playing around with it, I'd be happy to help debug with you!
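The "virtual nodes table out of a join" idea can be sketched in plain SQLite: given two "nodelike" tables with a more-or-less consistent schema, a view can present them as one unified nodes table. All table and column names below are hypothetical, not anything Grand generates:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Two hypothetical "nodelike" tables with a consistent schema:
    CREATE TABLE people (id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE places (id TEXT PRIMARY KEY, name TEXT);
    INSERT INTO people VALUES ('p1', 'Ada');
    INSERT INTO places VALUES ('q1', 'London');

    -- One virtual, unified nodes table, exposed as a view:
    CREATE VIEW all_nodes AS
        SELECT id, name, 'person' AS node_type FROM people
        UNION ALL
        SELECT id, name, 'place' AS node_type FROM places;
    """
)

rows = list(conn.execute("SELECT id, node_type FROM all_nodes ORDER BY id"))
print(rows)  # [('p1', 'person'), ('q1', 'place')]
```

A backend pointed at `all_nodes` would then see a single nodes table, which is the shape Grand currently expects.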
Yeah, I'm planning on trying it out over the weekend. I guess the 2 things I'm thinking about in this regard are:

My use cases are query-only, i.e. I will never modify my graph through NetworkX/other dialects. But I really like this idea. I was looking for a SQLAlchemy for graphs, and grand (while still young) seems to be moving in that direction. Will play around with it more and contribute if possible.
Yes, thank you! And sorry for the late notice and response! Regarding exporting graphs, I chose to convert the database to text-based file objects to store the graph.
Awesome @yangyxt — hope you're able to use Grand in the future :)

@AbdealiJK: Great, let me know what you discover! I haven't really kicked the tires extensively on the SQL Backend in conjunction with Grand-Cypher, but in a few benchmarks it seemed pretty performant even on relatively large networks. (Not comparable with a "true" neo4j cypher query, but... not embarrassing, at least :) )

When I say "for each edge," that's each edge in the query, so if your Cypher query looks like

```cypher
MATCH (A)-[]->(B)
RETURN A, B
```

this only runs one single table query. Still not ideal, but with some indexing, probably pretty snappy.

I've also just published one of my backlog to-dos to add a caching layer to the Backends (#20); I think that would probably dramatically increase the speed of query responses if you can make assumptions about a static dataset. I'd be super grateful and excited to see your contributions and evaluation! There's a lot of nuance to a generalized graph toolkit like Grand, but I'm hopeful we're converging on something Good :)
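The caching idea in #20 can be sketched generically: memoize a backend's read methods so that repeated traversals over a static dataset don't hit the database twice. The `FakeBackend` class and its `neighbors` method below are invented for illustration and are not Grand's API:

```python
from functools import lru_cache


class FakeBackend:
    """Invented stand-in for a slow, read-only backend (NOT Grand's API)."""

    def __init__(self):
        self.query_count = 0  # how many times we actually hit the "database"
        self._edges = {"X": ["Y"], "Y": []}

    @lru_cache(maxsize=None)  # only safe if the dataset is static
    def neighbors(self, node):
        self.query_count += 1  # simulate an expensive query
        return tuple(self._edges.get(node, ()))


backend = FakeBackend()
backend.neighbors("X")
backend.neighbors("X")  # served from the cache; no second query
print(backend.query_count)  # 1
```

A real caching layer would also need an invalidation story for mutating Backends, which is exactly why the static-dataset assumption matters here.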
Hey @AbdealiJK — curious to hear how your experiments went! Would be interested in hearing if you had to make any major changes to get things working. |
Hi @j6k4m8, apologies I missed replying to your previous question. The differences here are:
Aha, that's super interesting @AbdealiJK — thank you for the update! If you think there are any good lessons-learned that we could integrate into the package, I'd love to help!
I think the main one for me was that a single query which creates my entire graph was much faster than I expected. I would also suggest having some hooks in grand so I can customize the SQL structure I have in an existing DB. |
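To make the "hooks" suggestion concrete, one possible shape (entirely hypothetical — nothing like this exists in Grand today) is a small schema-mapping object that tells a SQL backend which existing tables and columns to treat as nodes and edges:

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class SchemaMap:
    """Hypothetical hook: point a backend at an existing DB's own names."""

    node_table: str
    node_id_col: str
    edge_table: str
    source_col: str
    target_col: str

    def edges_query(self) -> str:
        # Build a SELECT against the user's existing schema.
        return (
            f"SELECT {self.source_col}, {self.target_col} "
            f"FROM {self.edge_table}"
        )


# An existing database with its own naming conventions:
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE my_entities (entity_id TEXT PRIMARY KEY);
    CREATE TABLE my_links (src TEXT, dst TEXT);
    INSERT INTO my_links VALUES ('X', 'Y');
    """
)

mapping = SchemaMap("my_entities", "entity_id", "my_links", "src", "dst")
result = list(conn.execute(mapping.edges_query()))
print(result)  # [('X', 'Y')]
```

A backend constructor could accept such a mapping instead of always creating its own tables, which is roughly the customization being asked for here.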
As shown in the title.