Data lake

Beacon provides a lightweight data lake model that makes scientific datasets easy to discover, query, and serve through a single API. It supports array and tabular formats (NetCDF, Zarr, Parquet, ODV, CSV) and exposes them through Apache Arrow and Apache DataFusion for fast columnar reads.

Core concepts

Datasets: Individual files or stores (for example, .nc, .zarr, or .parquet). A dataset is the smallest unit in Beacon and can be queried directly.
External tables: A registered name over one or more files (a folder or glob pattern) with a merged schema, queryable as one logical table. See External Tables.
Managed tables: Iceberg-backed tables Beacon owns and can mutate with INSERT / UPDATE / DELETE. See Managed Tables.
Views: A saved query exposed as a table. See Views.
Metadata & schema: Beacon inspects dataset metadata and builds schemas so you can discover available columns before running queries.
Pushdown & partitioning: Filters and projections are pushed down to the data source, so Beacon reads only the columns and data a query needs. This reduces I/O and speeds up queries over large datasets.
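The external-table and pushdown ideas above can be illustrated with a toy, in-memory model. This is not Beacon's implementation: `DataFile`, `merged_schema`, and `scan` are hypothetical names, the per-column min/max statistics are made up, and rejecting a column whose type differs between files is an assumed rule. The sketch shows how a merged schema unions the columns of several files (a file that lacks a column yields nulls for it) and how a pushed-down filter lets a reader skip whole files whose statistics rule out any match.

```python
from dataclasses import dataclass, field

@dataclass
class DataFile:
    """Hypothetical stand-in for one file behind an external table."""
    path: str
    columns: dict                                 # column name -> type name
    min_max: dict = field(default_factory=dict)   # column name -> (min, max)
    rows: list = field(default_factory=list)      # rows as dicts

def merged_schema(files):
    """Union of all file columns; a column must have the same type everywhere."""
    schema = {}
    for f in files:
        for name, typ in f.columns.items():
            if schema.setdefault(name, typ) != typ:
                raise TypeError(f"column {name!r}: {schema[name]} vs {typ}")
    return schema

def scan(files, projection, column, lo, hi):
    """Return (files actually read, projected rows with lo <= column <= hi)."""
    read, out = [], []
    for f in files:
        stats = f.min_max.get(column)
        if stats is not None and (stats[1] < lo or stats[0] > hi):
            continue  # the pushed-down filter plus min/max stats prove no row matches
        read.append(f.path)
        for row in f.rows:
            v = row.get(column)
            if v is not None and lo <= v <= hi:
                # keep only the projected columns; a column this file
                # lacks comes back as None (null), as in a merged schema
                out.append({c: row.get(c) for c in projection})
    return read, out

files = [
    DataFile("a.parquet", {"time": "int64", "temp": "float64"},
             {"time": (0, 9)}, [{"time": 1, "temp": 10.5}, {"time": 8, "temp": 11.0}]),
    DataFile("b.parquet", {"time": "int64", "temp": "float64", "sal": "float64"},
             {"time": (10, 19)}, [{"time": 12, "temp": 9.0, "sal": 35.1}]),
]

print(merged_schema(files))  # {'time': 'int64', 'temp': 'float64', 'sal': 'float64'}
print(scan(files, ["time", "sal"], "time", 10, 15))
# (['b.parquet'], [{'time': 12, 'sal': 35.1}])
```

In a real columnar engine, projection pushdown means the unrequested columns are never decoded at all. Here the rows are already in memory, so the sketch only mimics the effect on the result.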

How it works at a glance

Register or place datasets in the configured data directories or object store.
Inspect schemas through the API to understand available columns.
Query datasets or tables using SQL or the JSON query DSL.
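Here is a minimal client-side sketch of the last two steps. It assumes the schema has already been fetched from the API as a mapping of column names to types. The mapping shape, the table name `argo_profiles`, and the helpers `quote_ident` and `build_select` are all hypothetical. The code checks the requested columns against the inspected schema before building a SQL string. It double-quotes identifiers on the assumption that Beacon's SQL follows DataFusion's default rules, under which unquoted identifiers are lowercased, so an upper-case column such as `TEMP` would not resolve unquoted.

```python
def quote_ident(name: str) -> str:
    """Double-quote a SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'

def build_select(table, schema, columns, where=None):
    """Check requested columns against an inspected schema, then build SQL.

    `where` is inserted verbatim, so never pass untrusted input here.
    """
    missing = [c for c in columns if c not in schema]
    if missing:
        raise ValueError(f"unknown columns {missing}; available: {sorted(schema)}")
    cols = ", ".join(quote_ident(c) for c in columns)
    sql = f"SELECT {cols} FROM {quote_ident(table)}"
    if where:
        sql += f" WHERE {where}"
    return sql

# Hypothetical schema, as it might look after step 2
schema = {"TIME": "Timestamp", "TEMP": "Float64", "LATITUDE": "Float64"}
print(build_select("argo_profiles", schema, ["TIME", "TEMP"], where='"TEMP" > 10'))
# SELECT "TIME", "TEMP" FROM "argo_profiles" WHERE "TEMP" > 10
```

Validating against the schema first turns a typo into an immediate, readable error that lists the available columns, instead of a failed round trip to the server.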

For detailed guidance, see the SQL query docs and JSON query docs.
