Exploring the Data Lake (API)
This chapter shows how to discover what data is available on a running Beacon instance using raw HTTP request templates.
All paths are shown as relative URLs (for example GET /api/info). Send them to your Beacon base URL.
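As a minimal sketch of turning a relative path into a full URL, the helper below joins a base URL and a path with exactly one slash between them. The function name `beacon_url` and the base URL `http://localhost:8080` are assumptions for illustration only; substitute the address of your own Beacon instance.

```python
def beacon_url(base_url: str, path: str) -> str:
    """Join a Beacon base URL and a relative API path with exactly one slash."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


# Hypothetical base URL (an assumption); replace with your own deployment.
BASE_URL = "http://localhost:8080"

print(beacon_url(BASE_URL, "/api/info"))
```

Plain string joining is used here rather than `urllib.parse.urljoin`, because `urljoin` would discard any path prefix in the base URL when the relative path starts with `/`.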
Concepts
- Datasets: individual assets (e.g. a single .nc file, a single .parquet file, a Zarr group).
- Tables (collections): named logical tables that Beacon registers (often spanning many datasets).
- Schemas/columns: returned as an Arrow schema (fields + types). Use schemas to discover which columns you can select and filter on.
System info
GET /api/info
Datasets
List datasets
Preferred endpoint:
GET /api/list-datasets
Optional query parameters:
- pattern: glob pattern to filter dataset paths (example: *.nc, **/*.parquet)
- offset: pagination offset
- limit: pagination limit
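The optional query parameters can be assembled programmatically. The sketch below builds the relative path with Python's standard `urllib.parse.urlencode`, leaving out any parameter that is not supplied. The helper name `list_datasets_path` is hypothetical, and the sketch assumes the server decodes percent-encoded query values in the usual way (so `*` may be sent as `%2A`).

```python
from urllib.parse import urlencode


def list_datasets_path(pattern=None, limit=None, offset=None):
    """Build the relative path for the list-datasets endpoint.

    Only parameters that are not None are added to the query string.
    """
    params = {}
    if pattern is not None:
        params["pattern"] = pattern
    if limit is not None:
        params["limit"] = limit
    if offset is not None:
        params["offset"] = offset
    # urlencode percent-encodes reserved characters such as '*' and '/'.
    query = urlencode(params)
    return "/api/list-datasets" + ("?" + query if query else "")


print(list_datasets_path(pattern="*.nc", limit=50, offset=0))
print(list_datasets_path())
```

To page through a large listing, one option is to call the helper repeatedly with a fixed `limit` and an `offset` that increases by `limit` each time.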
Examples:
GET /api/list-datasets?pattern=*.nc&limit=50&offset=0
Total dataset count
GET /api/total-datasets
Get dataset schema (columns)
To get the Arrow schema for a single dataset path:
GET /api/dataset-schema?file=test-files/gridded-example.nc
To infer a merged schema across multiple datasets using a glob:
GET /api/dataset-schema?file=**/*.nc
TIP
The response is an Arrow schema JSON. Column names are under .fields[].name.
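A minimal sketch of reading the column names out of a schema response follows. The sample payload is hypothetical and deliberately contains only the `fields[].name` keys mentioned above; a real response carries additional per-field information (such as types) whose exact layout is not shown here. The helper name `column_names` is likewise an assumption.

```python
import json


def column_names(schema_json: str) -> list:
    """Return the column names found under .fields[].name in a schema JSON string."""
    schema = json.loads(schema_json)
    return [field["name"] for field in schema.get("fields", [])]


# Hypothetical, trimmed-down response body used only for illustration.
SAMPLE = '{"fields": [{"name": "time"}, {"name": "latitude"}, {"name": "temperature"}]}'

print(column_names(SAMPLE))
```

The resulting names are the ones you can select and filter on in a query.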
Tables (collections)
List tables
GET /api/tables
Get default table name
If the query request does not specify from, Beacon uses this table.
GET /api/default-table
Get table schema (columns)
GET /api/table-schema?table_name=default
List all tables with schemas
This is convenient for UI discovery, but can be heavier on large installations.
GET /api/tables-with-schema
Get table configuration
Useful to see how a table was constructed (paths, file format, statistics settings, etc.).
GET /api/table-config?table_name=default
Functions
Beacon exposes DataFusion scalar functions and Beacon table functions.
GET /api/functions
GET /api/table-functions
Table functions are especially useful for SQL queries (e.g. read_netcdf([...]), read_parquet([...])).
Admin endpoints (tables/files)
Admin endpoints are under /api/admin/* (create table, delete table, upload/download/delete files) and are protected by HTTP Basic Auth.
See the Data Lake docs for deeper table concepts, or browse /swagger for the exact request/response shapes.
Create a table (collection)
Raw HTTP (as a template):
POST /api/admin/create-table
Authorization: Basic <base64(username:password)>
Content-Type: application/json

{
  "table_name": "argo",
  "table_type": {
    "logical": {
      "paths": [
        "argo/*.parquet"
      ],
      "file_format": "parquet"
    }
  }
}
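The template above can be prepared in Python with only the standard library. The sketch below builds the request object without sending it, so the headers and body can be inspected first. The base URL, the credentials, and the helper name `create_table_request` are assumptions for illustration; the endpoint path, headers, and JSON shape are taken from the template.

```python
import base64
import json
import urllib.request


def create_table_request(base_url, username, password, table_name, paths, file_format):
    """Build (but do not send) a POST request for the create-table admin endpoint."""
    # HTTP Basic Auth: base64 of "username:password", prefixed with "Basic ".
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    body = {
        "table_name": table_name,
        "table_type": {
            "logical": {
                "paths": list(paths),
                "file_format": file_format,
            }
        },
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/admin/create-table",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Basic " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical base URL and credentials (assumptions); replace with your own.
req = create_table_request(
    "http://localhost:8080", "admin", "secret", "argo", ["argo/*.parquet"], "parquet"
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would send it. Basic Auth credentials are only encoded, not encrypted, so this is best done over HTTPS.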