Merge pull request duckdb#4780 from szarnyasg/api2clients
/docs/api to /docs/clients
szarnyasg authored Feb 12, 2025
2 parents a19e1b7 + 290b102 commit a9f2395
Showing 99 changed files with 281 additions and 185 deletions.
2 changes: 1 addition & 1 deletion _data/menu_docs_dev.json
@@ -170,7 +170,7 @@
},
{
"page": "Client APIs",
- "slug": "api",
+ "slug": "clients",
"subfolderitems": [
{
"page": "Overview",
2 changes: 1 addition & 1 deletion _posts/2022-09-30-postgres-scanner.md
@@ -25,7 +25,7 @@ But the design space is not as black and white as it seems. For example, the OLA

To allow for fast and consistent analytical reads of Postgres databases, we designed and implemented the "Postgres Scanner". This scanner leverages the *binary transfer mode* of the Postgres client-server protocol (See the [Implementation Section](#implementation) for more details.), allowing us to efficiently transform and use the data directly in DuckDB.

- Among other things, DuckDB's design is different from conventional data management systems because DuckDB's query processing engine can run on nearly arbitrary data sources without needing to copy the data into its own storage format. For example, DuckDB can currently directly run queries on [Parquet files]({% link docs/data/parquet/overview.md %}), [CSV files]({% link docs/data/csv/overview.md %}), [SQLite files](https://github.com/duckdb/duckdb-sqlite), [Pandas]({% link docs/guides/python/sql_on_pandas.md %}), [R]({% link docs/api/r.md %}#efficient-transfer) and [Julia]({% link docs/api/julia.md %}#scanning-dataframes) data frames as well as [Apache Arrow sources]({% link docs/guides/python/sql_on_arrow.md %}). This new extension adds the capability to directly query PostgreSQL tables from DuckDB.
+ Among other things, DuckDB's design is different from conventional data management systems because DuckDB's query processing engine can run on nearly arbitrary data sources without needing to copy the data into its own storage format. For example, DuckDB can currently directly run queries on [Parquet files]({% link docs/data/parquet/overview.md %}), [CSV files]({% link docs/data/csv/overview.md %}), [SQLite files](https://github.com/duckdb/duckdb-sqlite), [Pandas]({% link docs/guides/python/sql_on_pandas.md %}), [R]({% link docs/clients/r.md %}#efficient-transfer) and [Julia]({% link docs/clients/julia.md %}#scanning-dataframes) data frames as well as [Apache Arrow sources]({% link docs/guides/python/sql_on_arrow.md %}). This new extension adds the capability to directly query PostgreSQL tables from DuckDB.

## Usage

2 changes: 1 addition & 1 deletion _posts/2023-05-17-announcing-duckdb-080.md
@@ -172,7 +172,7 @@ print(res)
# [(datetime.date(2019, 5, 15),)]
```

- See the [documentation]({% link docs/api/python/function.md %}) for more information.
+ See the [documentation]({% link docs/clients/python/function.md %}) for more information.

[**Arrow Database Connectivity Support (ADBC)**](https://github.com/duckdb/duckdb/pull/7086). ADBC is a database API standard for database access libraries that uses Apache Arrow to transfer query result sets and to ingest data. Using Arrow for this is particularly beneficial for columnar data management systems which traditionally suffered a performance hit by emulating row-based APIs such as JDBC/ODBC. From this release, DuckDB natively supports ADBC. We’re happy to be one of the first systems to offer native support, and DuckDB’s in-process design fits nicely with ADBC.

2 changes: 1 addition & 1 deletion _posts/2023-07-07-python-udf.md
@@ -31,7 +31,7 @@ The two approaches exhibit two key differences:

2) **Vectorization.** PyArrow Table functions operate on a chunk level, processing chunks of data containing up to 2048 rows. This approach maximizes cache locality and leverages vectorization. On the other hand, the built-in types UDF implementation operates on a per-row basis.

- This blog post aims to demonstrate how you can extend DuckDB using Python UDFs, with a particular emphasis on PyArrow-powered UDFs. In our quick-tour section, we will provide examples using the PyArrow UDF types. For those interested in benchmarks, you can jump ahead to the [benchmark section below](#benchmarks). If you want to see a detailed description of the Python UDF API, please refer to our [documentation]({% link docs/api/python/function.md %}).
+ This blog post aims to demonstrate how you can extend DuckDB using Python UDFs, with a particular emphasis on PyArrow-powered UDFs. In our quick-tour section, we will provide examples using the PyArrow UDF types. For those interested in benchmarks, you can jump ahead to the [benchmark section below](#benchmarks). If you want to see a detailed description of the Python UDF API, please refer to our [documentation]({% link docs/clients/python/function.md %}).

## Python UDFs

2 changes: 1 addition & 1 deletion _posts/2023-08-04-adbc.md
@@ -35,7 +35,7 @@ The figure below depicts the query execution flow when using ADBC. Note that the

For our quick tour, we will illustrate an example of round-tripping data using DuckDB-ADBC via Python. Please note that DuckDB-ADBC can also be utilized with other programming languages. Specifically, you can find C++ DuckDB-ADBC examples and tests in the [DuckDB GitHub repository](https://github.com/duckdb/duckdb/blob/main/test/api/adbc/test_adbc.cpp) along with usage examples available in C++.
For convenience, you can also find a ready-to-run version of this tour in a [Colab notebook](https://colab.research.google.com/drive/11CEI62jRMHG5GtK0t_h6xSn6ne8W7dvS?usp=sharing).
- If you would like to see a more detailed explanation of the DuckDB-ADBC API or view a C++ example, please refer to our [documentation page]({% link docs/api/adbc.md %}).
+ If you would like to see a more detailed explanation of the DuckDB-ADBC API or view a C++ example, please refer to our [documentation page]({% link docs/clients/adbc.md %}).

### Setup

2 changes: 1 addition & 1 deletion _posts/2023-09-26-announcing-duckdb-090.md
@@ -209,7 +209,7 @@ SELECT * FROM 'azure://<my_container>/*.csv';

## Clients

- [**Experimental PySpark API**](https://github.com/duckdb/duckdb/pull/8083). This release features the addition of an experimental Spark API to the Python client. The API aims to be fully compatible with the PySpark API, allowing you to use the Spark API as you are familiar with but while utilizing the power of DuckDB. All statements are translated to DuckDB's internal plans using our [relational API]({% link docs/api/python/relational_api.md %}) and executed using DuckDB's query engine.
+ [**Experimental PySpark API**](https://github.com/duckdb/duckdb/pull/8083). This release features the addition of an experimental Spark API to the Python client. The API aims to be fully compatible with the PySpark API, allowing you to use the Spark API as you are familiar with but while utilizing the power of DuckDB. All statements are translated to DuckDB's internal plans using our [relational API]({% link docs/clients/python/relational_api.md %}) and executed using DuckDB's query engine.

```python
from duckdb.experimental.spark.sql import SparkSession as session
10 changes: 5 additions & 5 deletions _posts/2023-12-18-duckdb-extensions-in-wasm.md
@@ -18,7 +18,7 @@ To accommodate this, DuckDB has an extension mechanism for installing and loadin

### Running DuckDB Extensions Locally

- For DuckDB, here is a simple end-to-end example using the [command line interface]({% link docs/api/cli/overview.md %}):
+ For DuckDB, here is a simple end-to-end example using the [command line interface]({% link docs/clients/cli/overview.md %}):

```sql
INSTALL tpch;
@@ -37,15 +37,15 @@ Currently, DuckDB has [several extensions]({% link docs/extensions/core_extensio

In an effort spearheaded by André Kohn, [DuckDB was ported to the WebAssembly platform]({% post_url 2021-10-29-duckdb-wasm %}) in 2021. [WebAssembly](https://webassembly.org/), also known as Wasm, is a W3C standard language developed in recent years. Think of it as a machine-independent binary format that you can execute from within the sandbox of a web browser.

- Thanks to DuckDB-Wasm, anyone has access to a DuckDB instance only a browser tab away, with all computation being executed locally within your browser and no data leaving your device. DuckDB-Wasm is a library that can be used in various deployments (e.g., [notebooks that run inside your browser without a server](https://observablehq.com/@cmudig/duckdb)). In this post, we will use the Web shell, where SQL statements are entered by the user line by line, with the behavior modeled after the DuckDB [CLI shell]({% link docs/api/cli/overview.md %}).
+ Thanks to DuckDB-Wasm, anyone has access to a DuckDB instance only a browser tab away, with all computation being executed locally within your browser and no data leaving your device. DuckDB-Wasm is a library that can be used in various deployments (e.g., [notebooks that run inside your browser without a server](https://observablehq.com/@cmudig/duckdb)). In this post, we will use the Web shell, where SQL statements are entered by the user line by line, with the behavior modeled after the DuckDB [CLI shell]({% link docs/clients/cli/overview.md %}).

## DuckDB Extensions, in DuckDB-Wasm!

- DuckDB-Wasm [now supports DuckDB extensions]({% link docs/api/wasm/extensions.md %}). This support comes with four new key features.
+ DuckDB-Wasm [now supports DuckDB extensions]({% link docs/clients/wasm/extensions.md %}). This support comes with four new key features.
First, the DuckDB-Wasm library can be compiled with dynamic extension support.
Second, DuckDB extensions can be compiled to a single WebAssembly module.
Third, users and developers working with DuckDB-Wasm can now select the set of extensions they load.
- Finally, the DuckDB-Wasm shell's features are now much closer to the native [CLI functionality]({% link docs/api/cli/overview.md %}).
+ Finally, the DuckDB-Wasm shell's features are now much closer to the native [CLI functionality]({% link docs/clients/cli/overview.md %}).

### Using the TPC-H Extension in DuckDB-Wasm

@@ -176,7 +176,7 @@ We see two main groups of developers using extensions with DuckDB-Wasm.
## Limitations

DuckDB-Wasm extensions have a few inherent limitations. For example, it is not possible to communicate with native executables living on your machine, which is required by some extensions, such as the [`postgres` scanner extension]({% link docs/extensions/postgres.md %}).
- Moreover, compilation to Wasm may not be currently supported for some libraries you are relying on, or capabilities might not be one-to-one with local executables due to additional requirements imposed on the browser, in particular around [non-secure HTTP requests]({% link docs/api/wasm/extensions.md %}#httpfs).
+ Moreover, compilation to Wasm may not be currently supported for some libraries you are relying on, or capabilities might not be one-to-one with local executables due to additional requirements imposed on the browser, in particular around [non-secure HTTP requests]({% link docs/clients/wasm/extensions.md %}#httpfs).

## Conclusions

4 changes: 2 additions & 2 deletions _posts/2024-02-13-announcing-duckdb-0100.md
@@ -344,14 +344,14 @@ As a user, you don't have to do anything to make use of the new ALP compression

## CLI Improvements

- The command-line client has seen a lot of work this release. In particular, multi-line editing has been made the default mode, and has seen many improvements. The query history is now also multi-line. [Syntax highlighting has improved]({% link docs/api/cli/syntax_highlighting.md %}) – missing brackets and unclosed quotes are highlighted as errors, and matching brackets are highlighted when the cursor moves over them. Compatibility with read-line has also been [greatly extended]({% link docs/api/cli/editing.md %}).
+ The command-line client has seen a lot of work this release. In particular, multi-line editing has been made the default mode, and has seen many improvements. The query history is now also multi-line. [Syntax highlighting has improved]({% link docs/clients/cli/syntax_highlighting.md %}) – missing brackets and unclosed quotes are highlighted as errors, and matching brackets are highlighted when the cursor moves over them. Compatibility with read-line has also been [greatly extended]({% link docs/clients/cli/editing.md %}).

<img src="/images/syntax_highlighting_screenshot.png"
alt="Image showing syntax highlighting in the shell"
width="700px"
/>

- See the [extended CLI docs for more information]({% link docs/api/cli/overview.md %}).
+ See the [extended CLI docs for more information]({% link docs/clients/cli/overview.md %}).

## Final Thoughts

@@ -26,7 +26,7 @@ Some of the queries explained in this blog post are shown in simplified form on
For our initial queries, we'll use the 2023 [railway services dataset](https://www.rijdendetreinen.nl/en/open-data/train-archive).
To get this dataset, download the [`services-2023.csv.gz` file](https://blobs.duckdb.org/nl-railway/services-2023.csv.gz) (330 MB) and load it into DuckDB.

- First, start the [DuckDB command line client]({% link docs/api/cli/overview.md %}) on a persistent database:
+ First, start the [DuckDB command line client]({% link docs/clients/cli/overview.md %}) on a persistent database:

```bash
duckdb railway.db
@@ -277,7 +277,7 @@ CREATE TABLE distances AS
);
```

- To make the `NULL` values visible in the command line output, we set the [`.nullvalue` dot command]({% link docs/api/cli/dot_commands.md %}) to `NULL`:
+ To make the `NULL` values visible in the command line output, we set the [`.nullvalue` dot command]({% link docs/clients/cli/dot_commands.md %}) to `NULL`:

```sql
.nullvalue NULL
@@ -30,7 +30,7 @@ are ubiquitious and widely used in [shell scripts](https://en.wikipedia.org/wiki

As a purpose-built data processing tool, DuckDB fits the Unix philosophy quite well.
First, it was designed to be a fast in-process analytical SQL database system _(do one thing and do it well)._
- Second, it has a standalone [command line client]({% link docs/api/cli/overview.md %}), which can consume and produce CSV files _(work together),_
+ Second, it has a standalone [command line client]({% link docs/clients/cli/overview.md %}), which can consume and produce CSV files _(work together),_
and also supports reading and writing text streams _(handle text streams)_.
Thanks to these, DuckDB works well in the ecosystem of Unix CLI tools, as
shown
@@ -47,7 +47,7 @@ While there are shells specialized specifically for dataframe processing, such a

At the same time, we have DuckDB, an extremely portable database system which uses the same SQL syntax on all platforms.
With [version 1.0.0 released recently]({% post_url 2024-06-03-announcing-duckdb-100 %}), DuckDB's syntax – based on the proven and widely used PostgeSQL dialect – is now in a stable state.
- Another attractive feature of DuckDB is that it offers an interactive shell, which aids quick debugging. Moreover, DuckDB is available in [several host languages]({% link docs/api/overview.md %}) as well as in the browser [via WebAssembly](https://shell.duckdb.org/), so if you ever decide to use your SQL scripts outside of the shell, DuckDB SQL scripts can be ported to a wide variety of environments without any changes.
+ Another attractive feature of DuckDB is that it offers an interactive shell, which aids quick debugging. Moreover, DuckDB is available in [several host languages]({% link docs/clients/overview.md %}) as well as in the browser [via WebAssembly](https://shell.duckdb.org/), so if you ever decide to use your SQL scripts outside of the shell, DuckDB SQL scripts can be ported to a wide variety of environments without any changes.

## Data Processing with Unix Tools and DuckDB

2 changes: 1 addition & 1 deletion _posts/2024-07-05-community-extensions.md
@@ -12,7 +12,7 @@ tags: ["extensions"]

### Design Philosophy

- One of the main design goals of DuckDB is *simplicity*, which – to us – implies that the system should be rather nimble, very light on dependencies, and generally small enough to run on constrained platforms like [WebAssembly]({% link docs/api/wasm/overview.md %}). This goal is in direct conflict with very reasonable user requests to support advanced features like spatial data analysis, vector indexes, connectivity to various other databases, support for data formats, etc. Baking all those features into a monolithic binary is certainly possible and the route some systems take. But we want to preserve DuckDB’s simplicity. Also, shipping all possible features would be quite excessive for most users because no use cases require *all* extensions at the same time (the “Microsoft Word paradox”, where even power users only use a few features of the system, but the exact set of features vary between users).
+ One of the main design goals of DuckDB is *simplicity*, which – to us – implies that the system should be rather nimble, very light on dependencies, and generally small enough to run on constrained platforms like [WebAssembly]({% link docs/clients/wasm/overview.md %}). This goal is in direct conflict with very reasonable user requests to support advanced features like spatial data analysis, vector indexes, connectivity to various other databases, support for data formats, etc. Baking all those features into a monolithic binary is certainly possible and the route some systems take. But we want to preserve DuckDB’s simplicity. Also, shipping all possible features would be quite excessive for most users because no use cases require *all* extensions at the same time (the “Microsoft Word paradox”, where even power users only use a few features of the system, but the exact set of features vary between users).

To achieve this, DuckDB has a powerful extension mechanism, which allows users to add new functionalities to DuckDB. This mechanism allows for registering new functions, supporting new file formats and compression methods, handling new network protocols, etc. In fact, many of DuckDB’s popular features are implemented as extensions: the [Parquet reader]({% link docs/data/parquet/overview.md %}), the [JSON reader]({% link docs/data/json/overview.md %}), and the [HTTPS/S3 connector]({% link docs/extensions/httpfs/overview.md %}) all use the extension mechanism.

2 changes: 1 addition & 1 deletion _posts/2024-08-19-duckdb-tricks-part-1.md
@@ -127,7 +127,7 @@ DESCRIBE tbl;
└─────────────┴─────────────┴─────────┴─────────┴─────────┴─────────┘
```

- Alternatively, in the CLI client, we can run the `.schema` [dot command]({% link docs/api/cli/dot_commands.md %}):
+ Alternatively, in the CLI client, we can run the `.schema` [dot command]({% link docs/clients/cli/dot_commands.md %}):

```plsql
.schema
2 changes: 1 addition & 1 deletion _posts/2024-10-04-duckdb-user-survey-analysis.md
@@ -42,7 +42,7 @@ This is in line with the vision that originally drove the creation of DuckDB: cr

### Clients

- [Unsurprisingly](https://www.tiobe.com/tiobe-index/python/), DuckDB is most often used from Python (73%), followed by the [standalone command-line application]({% link docs/api/cli/overview.md %}) (47%).
+ [Unsurprisingly](https://www.tiobe.com/tiobe-index/python/), DuckDB is most often used from Python (73%), followed by the [standalone command-line application]({% link docs/clients/cli/overview.md %}) (47%).
The third spot is hotly contested with R, WebAssembly (!) and Java all achieving around 14%, followed by Node.js (Javascript) at 9%

![DuckDB clients](/images/blog/survey/clients.svg)
4 changes: 2 additions & 2 deletions _posts/2024-11-29-duckdb-tricks-part-3.md
@@ -180,7 +180,7 @@ We have now a table with all the data from January to October, amounting to almo

Suppose we want to analyze the average delay of the [Intercity Direct trains](https://en.wikipedia.org/wiki/Intercity_Direct) operated by the [Nederlandse Spoorwegen (NS)](https://en.wikipedia.org/wiki/Nederlandse_Spoorwegen), measured at the final destination of the train service.
While we can run this analysis directly on the `.csv` files, the lack of metadata (such as schema and min-max indexes) will limit the performance.
- Let's measure this in the CLI client by turning on the [timer]({% link docs/api/cli/dot_commands.md %}):
+ Let's measure this in the CLI client by turning on the [timer]({% link docs/clients/cli/dot_commands.md %}):

```plsql
.timer on
@@ -246,7 +246,7 @@ TO 'services-parquet-hive'
(FORMAT PARQUET, PARTITION_BY (Service_Company, Service_Type));
```

- Let's peek into the directory from DuckDB's CLI using the [`.sh` dot command]({% link docs/api/cli/dot_commands.md %}):
+ Let's peek into the directory from DuckDB's CLI using the [`.sh` dot command]({% link docs/clients/cli/dot_commands.md %}):

```plsql
.sh tree services-parquet-hive
