Windows ODBC data source always uses :memory: no matter what you set it to in Control Panel - ODBC Data Sources #29
Comments
I have this issue as well on Windows 11 with the 64-bit drivers and Power BI/Excel. Also noting that a couple of the tests fail when I run the test_odbc.exe that comes with the drivers:
D:\a\duckdb\duckdb\tools\odbc\test\tests\connect.cpp(132): FAILED: Test SQLColAttribute for a query that returns an int
=============================================================================== |
I also find that this issue prevents using the Windows version of the DuckDB ODBC driver with a connection to any database other than :memory:. Symptoms include an empty list of tables displayed using
FWIW I note this issue is surfacing elsewhere
Might this help ??? ...
I too found some of the tests failing:
and
success here:
I also installed bleeding edge from https://duckdb.org/docs/installation/?version=main&environment=odbc&platform=win&download_method=direct&architecture=x86_64 and found this issue persists. |
Follow up with a hint and possible step toward a workaround...
I put the following in as the SQL statement:
and I got a step further as ...however when I click on "Load" I am advised: |
If you edit your query and click "Advanced Editor" I believe the following would give you what you expect:
It should not be necessary to specify the location of the file, of course. But right now that's what I do, under the circumstances. |
@shwivel thanks for that tip - i saw it before too in duckdb/duckdb#11380 (comment) - I just don't know how to apply it to my use of Excel, as opposed to your use of Power Query. Guidance much appreciated! |
You are in the correct place. On the very first screen you would separate the connection string and query as depicted below:
Later, it is helpful for formatting purposes to use the Advanced Editor: ribbon menu > Data > Queries & Connections (this opens a side panel on the right with all your queries), then right-click your query in that side panel, click Edit, and in the window that pops up, use the "Advanced Editor" button at the top left of the ribbon menu. Then you'll see what I pasted above and can more easily tweak the connection string and query. (It is still easier to copy and paste from a syntax editor, but at least you get the line breaks and such.) This looks like hell compared to the equivalent functionality on Google Sheets, but it is what it is and is mostly not terrible. Not as good as it could be, for sure. |
Thanks @shwivel - I'm so close! Using your suggestion, I'm now getting this error message: ... whether or not I qualify the table name with I'm pretty confident I am connecting to my database since if I mistype the database path the error message instead refers to :memory:. I am also confident my select statement is correct - it works when run connecting using CLI (linux side), with or without the Do you possibly see any other variation I should be trying? |
If you go to the advanced editor (where the first word should be "let") you see a connection string and your query in the format depicted here, correct? #29 (comment) Are you attaching databases (within SQL) or solely using the database you've specified in the connection string? What happens if you just run a query like:
To confirm for certain you are connecting to the correct .db file you could also run this query:
Should return a value of something like "c:\your_path\your_name.your_extension.tmp" (the db file you're connecting to, with .tmp added at the end). If you don't see that, and you see merely ".tmp" that means you've connected to :memory: |
By default the temp file will be in the same folder as the database file (that you're connected to), and the temp file will be named the same, but with .tmp at the end. If you've connected to :memory: then the result will just be .tmp.
You mention having run that test query via the CLI on a Linux machine, but that's not where you need to test, because it is not where you're having trouble connecting. I would run that query in Excel (rather than the query you're actually running) to make sure that you're actually connecting to the database you want to connect to, within Excel.
Attached is an Excel file named test_excel.xlsx with a query that accesses a database named testdb.db at path C:\data\testdb.db which has one table called poo. An image below depicts the setup. Excel file: Database file:
If you put the database file onto your Windows machine at path C:\data\ and this Excel file will not allow you to refresh the query result, then something is going on environment-wise, because I can do so, with these two files. |
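The temp-file check described above can be scripted. This is only a minimal sketch: it assumes the setting being described is DuckDB's `temp_directory` (the exact query is not shown in this thread, so that is my assumption), that `pyodbc` is installed, and the helper names `classify_temp_directory` and `query_temp_directory` are hypothetical.

```python
def classify_temp_directory(temp_dir: str) -> str:
    """Map a temp-directory value to the database it implies.

    Follows the rule described in the thread: a bare ".tmp" means the
    connection went to :memory:, while "<db path>.tmp" means the file
    at <db path> was opened.
    """
    value = temp_dir.strip()
    if value == ".tmp":
        return ":memory:"
    if value.endswith(".tmp"):
        return value[: -len(".tmp")]
    return "unknown"


def query_temp_directory(conn_str: str) -> str:
    """Ask the driver for temp_directory (assumed setting name)."""
    import pyodbc  # imported lazily so the helper above works without it

    cursor = pyodbc.connect(conn_str).cursor()
    row = cursor.execute("select current_setting('temp_directory')").fetchone()
    return row[0]


# Usage (needs a configured DSN, here assumed to be named "DuckDB"):
#   print(classify_temp_directory(query_temp_directory("DSN=DuckDB")))
```

Running the same check from Excel or Power BI still matters, since that is where the connection problem was reported; the script only confirms what the driver does when called from Python.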
FWIW: my workaround for now is to mirror the duckdb in sqlite for purposes of slapping an excel data dashboard in front of it. 😄 |
Alas: the issue persists for me on two other windows computers. re: environment-wise possibilities - I retested after installing latest supported Microsoft Visual C++ Redistributable (and restarting). No fix 😢 Still hoping this issue just gets fixed upstream... |
The error persists even with nightly build. https://artifacts.duckdb.org/duckdb-odbc/latest/odbc-windows-amd64.zip |
When the connection string includes a trailing semicolon, the existing impl in Connect::ParseInputStr was doing an early exit without setting the DSN value to dbc->dsn. Because of this the DSN was not used and a default :memory: instance was created. The problem is not OS specific, but was happening on Windows all the time because the Windows ODBC Driver Manager appends a semicolon to the passed connection string when calling SQLDriverConnect. test_connect_odbc was extended to cover the trailing semicolon and also parameters overriding in the connection string. Fixes: duckdb#29, duckdb#48
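To make the described failure mode concrete, here is a small Python illustration. It is not the driver's C++ code: `parse_buggy` and `parse_fixed` are hypothetical stand-ins that only mimic the behaviour in the commit message, where an early exit on the empty segment after a trailing delimiter (a semicolon, as in the `DSN=DuckDB;` examples in this thread) means the DSN is never recorded.

```python
def parse_buggy(conn_str: str) -> dict:
    """Mimics the bug: values are only committed if parsing reaches the end."""
    committed = {}
    pending = {}
    for segment in conn_str.split(";"):
        if "=" not in segment:
            # Early exit on the empty segment after a trailing ";":
            # pending values (including the DSN) are never committed.
            return committed
        key, _, value = segment.partition("=")
        pending[key.strip().lower()] = value.strip()
    committed.update(pending)
    return committed


def parse_fixed(conn_str: str) -> dict:
    """Skips empty segments instead of bailing out, so the DSN survives."""
    parsed = {}
    for segment in conn_str.split(";"):
        if not segment.strip():
            continue  # tolerate trailing or doubled delimiters
        key, sep, value = segment.partition("=")
        if sep:
            parsed[key.strip().lower()] = value.strip()
    return parsed


print(parse_buggy("DSN=DuckDB"))
print(parse_buggy("DSN=DuckDB;"))
print(parse_fixed("DSN=DuckDB;"))
```

With the buggy variant, the trailing delimiter yields an empty result, which corresponds to the driver falling back to the default :memory: instance.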
The problem is not Windows specific, it can be also reproduced on Linux when the connection string ends with a semicolon: On Linux, with the following
And the following
Running with PyODBC:
Python 3.12.8 (main, Dec 6 2024, 00:00:00) [GCC 14.2.1 20240912 (Red Hat 14.2.1-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyodbc
>>> print(pyodbc.connect("DSN=DuckDB").cursor().execute("select current_catalog()").fetchone())
DSN=DuckDB
('test',)
>>> print(pyodbc.connect("DSN=DuckDB;").cursor().execute("select current_catalog()").fetchone())
DSN=DuckDB;
('memory',)
However on Windows the problem is worse because Windows ODBC Driver Manager always appends a semicolon to the connection string before passing it to |
In my hands @Mytherin's changes do not resolve the initially reported issue by @shwivel. As @HildebKa previously reported "The issue still persists in nightly build" I think it should be re-opened, but I lack the privs to do so. I installed nightly bleeding edge with files dated 2/3. I find that as before, testing using Excel to make an ODBC connection ( As a possible workaround, I tried setting the 'Advanced Options > SQL Statement Note, I first tried just attaching to the database without the select, and got this possibly informative error.
Perhaps these test results are informative:
|
I just downloaded the nightly build and this issue is not resolved. Please test it on Windows so that you can see it does not work. |
Checking this on Windows using the latest nightly build with the following
No DuckDB driver registered before running
Change the
Check with PyODBC:
$ python
Python 3.9.2 (tags/v3.9.2:1a79785, Feb 19 2021, 13:44:55) [MSC v.1928 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyodbc
>>> print(pyodbc.connect("DSN=DuckDB").cursor().execute("select current_catalog()").fetchone())
('test1',)
>>> print(pyodbc.connect("DSN=DuckDB;").cursor().execute("select current_catalog()").fetchone())
('test1',)
Check that
So I believe the OP issue is indeed fixed in the latest nightly build. To troubleshoot the problem in your env, can you share the output of all of the following registry queries:
And also share the details (SHA-256 sum) of the |
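For anyone gathering the requested details with Python rather than by hand, here is a rough sketch. The registry queries asked for above are not shown in this thread, so the locations used here are an assumption: the standard ODBC.INI keys under HKCU and HKLM, with a DSN named "DuckDB". The file passed to `sha256_of` is whichever driver file was requested; its name is not filled in here. All helper names are hypothetical.

```python
import hashlib

# Standard per-user / system location for ODBC data sources (assumed here).
ODBC_INI_SUBKEY = r"SOFTWARE\ODBC\ODBC.INI"


def dsn_subkey(dsn: str) -> str:
    """Registry subkey that holds the values of one data source."""
    return ODBC_INI_SUBKEY + "\\" + dsn


def read_dsn_values(dsn: str = "DuckDB") -> dict:
    """Collect the DSN's values from the user and system hives (Windows only)."""
    import winreg  # Windows-only stdlib module, imported lazily

    found = {}
    hives = {"HKCU": winreg.HKEY_CURRENT_USER, "HKLM": winreg.HKEY_LOCAL_MACHINE}
    for label, hive in hives.items():
        try:
            with winreg.OpenKey(hive, dsn_subkey(dsn)) as key:
                count = winreg.QueryInfoKey(key)[1]  # number of values in the key
                entries = (winreg.EnumValue(key, i) for i in range(count))
                found[label] = {name: data for name, data, _type in entries}
        except FileNotFoundError:
            found[label] = None  # DSN is not registered in this hive
    return found


def sha256_of(path: str) -> str:
    """Hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

On Windows, `read_dsn_values()` returns one entry per hive, with `None` where the DSN is absent, which makes it easy to spot a DSN registered in only one of the two places.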
I can confirm that the nightly version you've used is passing all the test-suite on Windows including the This is an excerpt from this nightly run:
I can investigate/debug DuckDB ODBC usage from Excel, but I don't have the cycles to do that right now. Would you mind filing a separate issue with the details how exactly you are setting up the ODBC in Excel (or any other ODBC-capable Windows apps you may use)? |
@staticlibs I provided a complete example with screenshots in my comment. This is not Excel-specific. It does not matter what software you use to connect via ODBC, for example if you use Power BI the result is the same. It uses the transient memory database and not the database you specify in ODBC Data Sources. The instructions on the DuckDB website state that it will use the database specified in ODBC Data Sources. It does not. |
@staticlibs - thanks for educating me about configuring the environment for executing the tests in GitHub Actions, and that they should indeed pass. I am configuring ODBC in Excel exactly as in @shwivel's report. |
Would it be possible for you to install Python and PyODBC in your environment? I can guide you step by step if necessary. It is possible (though, I think, unlikely) that ODBC access from Excel uses a different code path in the ODBC driver than access from PyODBC. So let's first check that the PyODBC example works in your env, to narrow down the problem. |
The behavior is the same when using python. It uses the transient memory database (when a persistent database is not specified) rather than the one I've specified in ODBC data sources. For example, if you set the database to C:\data\testdb.db in ODBC data sources, using the test database linked in my previous comment, and run the following python script, you will get no rows returned:
Result:
This is despite the existence of a table named poo in that database. (the one I set in ODBC data sources) I can specify the database file in the Python script like this:
Then I get the results I expect:
But of course the whole point is to be able to specify the database in one place, and for it to default to that database, rather than the empty, transient, "memory" database. |
I believe there is some confusion about using DuckDB from Python. Accessing an existing DB file with the Python API of DuckDB (sources) is NOT the same as accessing it with PyODBC using the DuckDB ODBC driver (sources). The Python API ( The PyODBC ( So, to troubleshoot the problem in your env, let's get PyODBC working first. The detailed setup is below, checked on a clean Windows install:
import pyodbc
print(pyodbc.connect("DSN=DuckDB;").cursor().execute("select current_catalog()").fetchone())
If the DSN is read successfully from the registry, then the output will be:
Otherwise, when DSN is not found (that was always the case on Windows before the latest nightly build), the output will be:
And then check that the edit: redist link fix |
That works in python: But the same does not work in Excel: It does work in Power BI: |
Oddly, simply omitting the driver specification from the connection string solves the issue: However, there is a reason I did that. The reason is because setting access_mode to read_only does not otherwise seem possible. For example, this does not work (see error) (trailing semicolon does not matter whether included or not): Simply removing access_mode=read_only fixes the issue: But I want to connect in read only mode. It does not seem that we can connect in read only mode unless we specify the database file location (and the name of the driver). Setting the location in the data source name, and then using that named data source name (rather than the file location) in a connection string seems to prohibit the setting of the mode to read_only. Is that intentional? If that could be changed, it would be ideal. I cannot use it otherwise. Ultimately, I'd still need to set it manually in every file. |
By manually adding a string value named access_mode to the registry, subsequent use of the given DSN appears to obey that setting (see screenshot below). So I'm all set. It might be helpful to modify the UI for the ODBC configuration so that you can specify that setting (and perhaps others) rather than only the database file location, so that the user doesn't need to manually add string values to the registry afterward. There does not appear to be a way to set the setting in the connection string unless your connection string specifies the database file location (it won't work if just using a named DSN). |
Yes, this thing comes from ODBC itself (not specific to DuckDB - details). Basically either
Additional DuckDB options can be specified in the connection string itself. Though they currently only work properly when
print(pyodbc.connect("DSN=DuckDB;").cursor().execute("select current_setting('access_mode')").fetchone())
print(pyodbc.connect("DSN=DuckDB;access_mode=read_only").cursor().execute("select current_setting('access_mode')").fetchone())
print(pyodbc.connect("DSN=DuckDB;").cursor().execute("select current_catalog()").fetchone())
print(pyodbc.connect("DSN=DuckDB;database=another_test2.db").cursor().execute("select current_catalog()").fetchone())
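Connection strings like the ones in these examples can be assembled with a tiny helper instead of being written by hand. This is only a sketch: `build_conn_str` is a hypothetical name, and it simply joins the DSN and any extra options with semicolons, DSN first, in the same shape as the strings above.

```python
def build_conn_str(dsn: str, **options: str) -> str:
    """Join a DSN and optional key=value settings into one connection string."""
    parts = [f"DSN={dsn}"]
    parts.extend(f"{key}={value}" for key, value in options.items())
    return ";".join(parts)


print(build_conn_str("DuckDB"))
print(build_conn_str("DuckDB", access_mode="read_only"))
print(build_conn_str("DuckDB", database="another_test2.db"))
```

The resulting strings can be passed to `pyodbc.connect(...)` exactly as in the calls above; whether a given option takes effect still depends on the driver build in use.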
|
Got it, thanks @staticlibs |
@shwivel I'm confused - it appears to me that your original observation still holds
Why would you want to close this issue? What am I missing? |
@malcook Per my screenshot from my last comment, with the new ODBC driver, Excel is now using the database I have specified in Windows ODBC data sources. Previously, with the old ODBC driver, I had to specify the location of the database in Excel. Now I don't. Let me know if you have any questions. |
On a Windows system after installing the ODBC driver a data source name of "DuckDB" is created with the default database set to :memory: but of course we may wish to specify a specific database file.
When you change :memory: to some file (e.g. C:\some_file.db) and you try to use that data source, you'll get "table not found" errors (if running a query) or no tables will be listed (if listing tables), because it's actually using the transient :memory: database even though you've specified a persistent one, as depicted below:
It shows up in regedit fine:
But it doesn't actually use the given database. For example, if you open Excel or Power BI and click Data > Get Data > From other sources > From ODBC > DuckDB, no tables will be listed because none are there (it's using :memory: and not the one specified in Control Panel > ODBC data sources). You can manually set the database file location and get the tables to list, within every Excel or whatever file using the DSN; however, this is not only annoying to have to do everywhere and every time, but certain features become limited when you set a manual connection string. In any case, I figure this cannot be intended, because why would you allow the specification of a database (with a default of :memory:) if no matter what you change it to, it is still going to use :memory:?
Let me know if you have any questions. Thanks