Querying with SQL
SQL queries are sent to Beacon using POST /api/query with a JSON body containing a sql string.
TIP
If your node has SQL disabled, enable it by setting BEACON_ENABLE_SQL=true.
TIP
If you are querying from Python (notebooks/pipelines), using the Beacon Python SDK is recommended: Beacon Python SDK docs
Minimal example
The most common pattern is to use a table function in the FROM clause. This lets you query one file, or many files at once, using glob patterns.
POST {beacon-host}/api/query
Content-Type: application/json
{
"sql": "SELECT * FROM read_parquet(['**/*.parquet']) LIMIT 10"
}If you omit the output field, Beacon streams results back as Arrow IPC (application/vnd.apache.arrow.stream).
Download as a file (recommended for most clients)
To receive a single downloadable file, set output.format.
POST {beacon-host}/api/query
Content-Type: application/json
{
"sql": "SELECT time, latitude, longitude, temp FROM read_netcdf(['**/*.nc']) WHERE temp BETWEEN 2 AND 10 LIMIT 10000",
"output": {
"format": "parquet"
}
}Beacon will respond with:
Content-Disposition: attachment; filename="output.parquet"x-beacon-query-id: <uuid>
Common patterns
Discover available table functions
Beacon registers a set of table-valued functions you can use in SQL (for example read_netcdf, read_zarr, read_parquet, ...).
- List table functions:
GET /api/table-functions
Inspect available tables and schema
- List tables:
GET /api/tables - Default table schema (if you use it):
GET /api/default-table-schema - Table schema:
GET /api/table-schema?table_name=...
Query a registered table
If your Beacon node has tables configured (collections), you can query them directly:
Use GET /api/tables to discover the table names available on your server.
POST {beacon-host}/api/query
Content-Type: application/json
{
"sql": "SELECT time, latitude, longitude, temp FROM some_table WHERE temp IS NOT NULL LIMIT 1000",
"output": { "format": "csv" }
}Read from files using table functions
All file-reading functions take a list of glob paths as their first argument, for example ['**/*.nc'].
NetCDF (read_netcdf)
POST {beacon-host}/api/query
Content-Type: application/json
{
"sql": "SELECT TEMP, PSAL, LONGITUDE, LATITUDE FROM read_netcdf(['**/*.nc']) WHERE TEMP IS NOT NULL LIMIT 100000",
"output": { "format": "parquet" }
}Zarr (read_zarr)
POST {beacon-host}/api/query
Content-Type: application/json
{
"sql": "SELECT TEMP, PSAL, LONGITUDE, LATITUDE FROM read_zarr(['datasets.zarr/zarr.json']) LIMIT 100000",
"output": { "format": "parquet" }
}POST {beacon-host}/api/query
Content-Type: application/json
{
"sql": "SELECT TEMP, PSAL, LONGITUDE, LATITUDE FROM read_zarr(['*/zarr.json']) LIMIT 100000",
"output": { "format": "parquet" }
}read_zarr also supports an optional second argument statistics_columns (a list of column names). Supplying these can improve performance for large Zarr datasets by enabling pushdown based on precomputed statistics.
POST {beacon-host}/api/query
Content-Type: application/json
{
"sql": "SELECT TEMP, PSAL, LONGITUDE, LATITUDE FROM read_zarr(['datasets.zarr/zarr.json'], ['LONGITUDE', 'LATITUDE']) WHERE LONGITUDE > 10 AND LATITUDE < 50",
"output": { "format": "parquet" }
}Parquet (read_parquet)
POST {beacon-host}/api/query
Content-Type: application/json
{
"sql": "SELECT * FROM read_parquet(['*.parquet']) WHERE quality_flag = 0 LIMIT 100000",
"output": { "format": "parquet" }
}CSV (read_csv)
POST {beacon-host}/api/query
Content-Type: application/json
{
"sql": "SELECT * FROM read_csv(['1.csv', '2.csv']) LIMIT 10000",
"output": { "format": "parquet" }
}Arrow IPC files (read_arrow)
POST {beacon-host}/api/query
Content-Type: application/json
{
"sql": "SELECT * FROM read_arrow(['*.arrow', '*.ipc']) LIMIT 10000",
"output": { "format": "parquet" }
}Beacon Binary Format (read_bbf)
POST {beacon-host}/api/query
Content-Type: application/json
{
"sql": "SELECT * FROM read_bbf(['*.bbf']) WHERE time >= '2020-01-01' AND time < '2021-01-01'",
"output": { "format": "parquet" }
}Inspect schemas without reading data (read_schema)
If you want to quickly inspect schemas (especially for many files) without running a full scan, use read_schema:
POST {beacon-host}/api/query
Content-Type: application/json
{
"sql": "SELECT * FROM read_schema(['**/*.nc'], 'netcdf')"
}Explain a query plan
Use POST /api/explain-query with the same body you would send to /api/query.
Get query metrics
Use the x-beacon-query-id response header from /api/query, then request:
GET /api/query/metrics/{query_id}
Output formats
For file output, supported output.format values include:
csvparquetipc(alias:arrow)netcdfnd_netcdf(requiresdimension_columns)geoparquetodv
Examples of formats with options:
{
"sql": "SELECT longitude, latitude, time, temp FROM read_parquet(['*.parquet']) LIMIT 10000",
"output": {
"format": { "geoparquet": { "longitude_column": "longitude", "latitude_column": "latitude" } }
}
}{
"sql": "SELECT time, depth, temp FROM read_netcdf(['**/*.nc']) LIMIT 10000",
"output": {
"format": { "nd_netcdf": { "dimension_columns": ["time", "depth"] } }
}
}