Querying with SQL

SQL queries are sent to Beacon using POST /api/query with a JSON body containing a sql string.

TIP

If your node has SQL disabled, enable it by setting BEACON_ENABLE_SQL=true.

TIP

If you are querying from Python (notebooks/pipelines), using the Beacon Python SDK is recommended: Beacon Python SDK docs

Minimal example

The most common pattern is to use a table function in the FROM clause. This lets you query one file, or many files at once, using glob patterns.

http

POST {beacon-host}/api/query
Content-Type: application/json

{
  "sql": "SELECT * FROM read_parquet(['**/*.parquet']) LIMIT 10"
}

If you omit the output field, Beacon streams results back as Arrow IPC (application/vnd.apache.arrow.stream).

Download as a file (recommended for most clients)

To receive a single downloadable file, set output.format.

http

POST {beacon-host}/api/query
Content-Type: application/json

{
  "sql": "SELECT time, latitude, longitude, temp FROM read_netcdf(['**/*.nc']) WHERE temp BETWEEN 2 AND 10 LIMIT 10000",
  "output": {
    "format": "parquet"
  }
}

Beacon will respond with:

Content-Disposition: attachment; filename="output.parquet"
x-beacon-query-id: <uuid>

Common patterns

Discover available table functions

Beacon registers a set of table-valued functions you can use in SQL (for example read_netcdf, read_zarr, read_parquet, ...).

List table functions: GET /api/table-functions

Inspect available tables and schema

List tables: GET /api/tables
Default table schema (if you use it): GET /api/default-table-schema
Table schema: GET /api/table-schema?table_name=...

Query a registered table

If your Beacon node has tables configured (collections), you can query them directly:

Use GET /api/tables to discover the table names available on your server.

http

POST {beacon-host}/api/query
Content-Type: application/json

{
  "sql": "SELECT time, latitude, longitude, temp FROM some_table WHERE temp IS NOT NULL LIMIT 1000",
  "output": { "format": "csv" }
}

Read from files using table functions

All file-reading functions take a list of glob paths as their first argument, for example ['**/*.nc'].

NetCDF (`read_netcdf`)

http

POST {beacon-host}/api/query
Content-Type: application/json

{
  "sql": "SELECT TEMP, PSAL, LONGITUDE, LATITUDE FROM read_netcdf(['**/*.nc']) WHERE TEMP IS NOT NULL LIMIT 100000",
  "output": { "format": "parquet" }
}

Zarr (`read_zarr`)

http

POST {beacon-host}/api/query
Content-Type: application/json

{
  "sql": "SELECT TEMP, PSAL, LONGITUDE, LATITUDE FROM read_zarr(['datasets.zarr/zarr.json']) LIMIT 100000",
  "output": { "format": "parquet" }
}

http

POST {beacon-host}/api/query
Content-Type: application/json

{
  "sql": "SELECT TEMP, PSAL, LONGITUDE, LATITUDE FROM read_zarr(['*/zarr.json']) LIMIT 100000",
  "output": { "format": "parquet" }
}

read_zarr also supports an optional second argument statistics_columns (a list of column names). Supplying these can improve performance for large Zarr datasets by enabling pushdown based on precomputed statistics.

http

POST {beacon-host}/api/query
Content-Type: application/json

{
  "sql": "SELECT TEMP, PSAL, LONGITUDE, LATITUDE FROM read_zarr(['datasets.zarr/zarr.json'], ['LONGITUDE', 'LATITUDE']) WHERE LONGITUDE > 10 AND LATITUDE < 50",
  "output": { "format": "parquet" }
}

Parquet (`read_parquet`)

http

POST {beacon-host}/api/query
Content-Type: application/json

{
  "sql": "SELECT * FROM read_parquet(['*.parquet']) WHERE quality_flag = 0 LIMIT 100000",
  "output": { "format": "parquet" }
}

CSV (`read_csv`)

http

POST {beacon-host}/api/query
Content-Type: application/json

{
  "sql": "SELECT * FROM read_csv(['1.csv', '2.csv']) LIMIT 10000",
  "output": { "format": "parquet" }
}

Arrow IPC files (`read_arrow`)

http

POST {beacon-host}/api/query
Content-Type: application/json

{
  "sql": "SELECT * FROM read_arrow(['*.arrow', '*.ipc']) LIMIT 10000",
  "output": { "format": "parquet" }
}

Beacon Binary Format (`read_bbf`)

http

POST {beacon-host}/api/query
Content-Type: application/json

{
  "sql": "SELECT * FROM read_bbf(['*.bbf']) WHERE time >= '2020-01-01' AND time < '2021-01-01'",
  "output": { "format": "parquet" }
}

Inspect schemas without reading data (`read_schema`)

If you want to quickly inspect schemas (especially for many files) without running a full scan, use read_schema:

http

POST {beacon-host}/api/query
Content-Type: application/json

{
  "sql": "SELECT * FROM read_schema(['**/*.nc'], 'netcdf')"
}

Explain a query plan

Use POST /api/explain-query with the same body you would send to /api/query.

Get query metrics

Use the x-beacon-query-id response header from /api/query, then request:

GET /api/query/metrics/{query_id}

Output formats

For file output, supported output.format values include:

csv
parquet
ipc (alias: arrow)
netcdf
nd_netcdf (requires dimension_columns)
geoparquet
odv

Examples of formats with options:

json

{
  "sql": "SELECT longitude, latitude, time, temp FROM read_parquet(['*.parquet']) LIMIT 10000",
  "output": {
    "format": { "geoparquet": { "longitude_column": "longitude", "latitude_column": "latitude" } }
  }
}

json

{
  "sql": "SELECT time, depth, temp FROM read_netcdf(['**/*.nc']) LIMIT 10000",
  "output": {
    "format": { "nd_netcdf": { "dimension_columns": ["time", "depth"] } }
  }
}

Installation

Getting started

Data Lake Setup

Datasets

File Formats

Datasets Harmonization

Data Tables

Data Lake Querying

Tuning for performance

Beacon Binary Format

Querying with SQL

Minimal example

Download as a file (recommended for most clients)

Common patterns

Discover available table functions

Inspect available tables and schema

Query a registered table

Read from files using table functions

NetCDF (`read_netcdf`)

Zarr (`read_zarr`)

Parquet (`read_parquet`)

CSV (`read_csv`)

Arrow IPC files (`read_arrow`)

Beacon Binary Format (`read_bbf`)

Inspect schemas without reading data (`read_schema`)

Explain a query plan

Get query metrics

Output formats

Datasets

File Formats

Datasets Harmonization

Data Tables

Querying with SQL ​

Minimal example ​

Download as a file (recommended for most clients) ​

Common patterns ​

Discover available table functions ​

Inspect available tables and schema ​

Query a registered table ​

Read from files using table functions ​

NetCDF (read_netcdf) ​

Zarr (read_zarr) ​

Parquet (read_parquet) ​

CSV (read_csv) ​

Arrow IPC files (read_arrow) ​

Beacon Binary Format (read_bbf) ​

Inspect schemas without reading data (read_schema) ​

Explain a query plan ​

Get query metrics ​

Output formats ​

Querying with SQL

Minimal example

Download as a file (recommended for most clients)

Common patterns

Discover available table functions

Inspect available tables and schema

Query a registered table

Read from files using table functions

NetCDF (`read_netcdf`)

Zarr (`read_zarr`)

Parquet (`read_parquet`)

CSV (`read_csv`)

Arrow IPC files (`read_arrow`)

Beacon Binary Format (`read_bbf`)

Inspect schemas without reading data (`read_schema`)

Explain a query plan

Get query metrics

Output formats