Reading Files

Table functions let you query files directly in a FROM clause without creating a persistent External Table first. They are useful for ad-hoc exploration or when you want to embed the file path logic inside a View.

All functions accept one or more glob paths as a list. Globs are resolved relative to Beacon's configured dataset storage root.

sql

-- Single path
SELECT * FROM read_parquet(['profiles/2024.parquet'])

-- Folder glob
SELECT * FROM read_netcdf(['argo/**/*.nc'])

-- Multiple globs in one call
SELECT * FROM read_netcdf(['argo/**/*.nc', 'wod/**/*.nc'])

`read_netcdf`

text

read_netcdf(glob_paths)
read_netcdf(glob_paths, dimensions)

Reads NetCDF files matching one or more glob patterns.

The optional dimensions argument filters which variables are returned: a variable is included only if all of its dimensions are a subset of the provided list. Use it to exclude high-dimensional variables you don't need, or to resolve ambiguity when files contain variables with incompatible dimensionalities.

sql

SELECT time, latitude, longitude, temperature
FROM read_netcdf(['argo/**/*.nc'])

-- With explicit dimension columns
SELECT *
FROM read_netcdf(['argo/**/*.nc'], ['time', 'pressure'])

Variable attributes

NetCDF variable attributes (e.g. units, long_name) are exposed as additional columns using the pattern <variable>.<attribute>. Attribute columns preserve the original type (string, integer, float, …). File-level global attributes use a leading dot with no variable prefix: .<attribute>. Quote these column names because they contain a dot.

sql

-- Variable attribute
SELECT temperature, "temperature.units", "temperature.long_name"
FROM read_netcdf(['argo/**/*.nc'])
LIMIT 1

-- Global attribute
SELECT ".source", temperature
FROM read_netcdf(['argo/**/*.nc'])
LIMIT 1

`read_zarr`

text

read_zarr(glob_paths)
read_zarr(glob_paths, statistics_columns)

Reads Zarr stores matching one or more glob patterns. Each path should point at a zarr.json entry file.

The optional statistics_columns argument names coordinate columns for which Beacon has pre-computed chunk statistics. Supplying these enables 1D slice pushdown — Beacon skips chunks that cannot satisfy a WHERE predicate on those columns.

sql

SELECT * FROM read_zarr(['sst/*/zarr.json'])

-- With statistics columns for faster range queries
SELECT time, sst
FROM read_zarr(['sst/*/zarr.json'], ['time', 'latitude', 'longitude'])
WHERE time >= '2024-01-01'

Array attributes

Per-array attributes are exposed as additional columns using the pattern <array>.<attribute>. Attribute columns preserve the original type (string, integer, float, …). Root-level store attributes use a leading dot with no array prefix: .<attribute>. Quote these column names because they contain a dot.

sql

-- Array attribute
SELECT sst, "sst.units", "sst.long_name"
FROM read_zarr(['sst/*/zarr.json'])
LIMIT 1

-- Root-level global attribute
SELECT ".Conventions", sst
FROM read_zarr(['sst/*/zarr.json'])
LIMIT 1

`read_parquet`

text

read_parquet(glob_paths)

sql

SELECT * FROM read_parquet(['obs/**/*.parquet']) LIMIT 100

`read_arrow`

text

read_arrow(glob_paths)

Reads Arrow IPC stream files (.arrow, .ipc).

sql

SELECT * FROM read_arrow(['streams/*.arrow'])

`read_csv`

text

read_csv(glob_paths)
read_csv(glob_paths, delimiter)
read_csv(glob_paths, delimiter, infer_records)

Schema is inferred from the file contents. The first row must be a header row.

delimiter — single-character field separator (default: ,)
infer_records — number of rows to sample when inferring column types (default: 100)

sql

SELECT * FROM read_csv(['metadata/*.csv'])

-- Tab-separated, sample 500 rows for type inference
SELECT * FROM read_csv(['data/*.tsv'], '\t', 500)

`read_odv_ascii`

text

read_odv_ascii(glob_paths)

sql

SELECT * FROM read_odv_ascii(['odv/**/*.txt'])

`read_bbf`

text

read_bbf(glob_paths)

Reads Beacon Binary Format files.

sql

SELECT * FROM read_bbf(['bbf/**/*.bbf'])

`read_tiff`

text

read_tiff(glob_paths)

Reads GeoTIFF and Cloud-Optimized GeoTIFF files.

sql

SELECT * FROM read_tiff(['rasters/elevation.tif'])

Tag attributes

Per-band TIFF tags are exposed as additional columns using the pattern <band>.<attribute>. Attribute columns preserve the original type (string, integer, float, …). File-level tags not tied to a specific band use a leading dot with no band prefix: .<attribute>. Quote these column names because they contain a dot.

sql

-- Band attribute
SELECT band_1, "band_1.nodata", "band_1.scale"
FROM read_tiff(['rasters/elevation.tif'])
LIMIT 1

-- File-level global tag
SELECT ".crs", band_1
FROM read_tiff(['rasters/elevation.tif'])
LIMIT 1

Supported Formats

External Tables

Performance Tuning

WHERE

JOIN

Reading Files

CREATE EXTERNAL TABLE

Introspection

Function Reference

Beacon-specific

Geospatial

Domain Mapping

Exploring the Data Lake

Querying

JSON Query DSL

SQL

Reading Files

`read_netcdf`

Variable attributes

`read_zarr`

Array attributes

`read_parquet`

`read_arrow`

`read_csv`

`read_odv_ascii`

`read_bbf`

`read_tiff`

Tag attributes

External Tables

WHERE

JOIN

Reading Files

CREATE EXTERNAL TABLE

Beacon-specific

Geospatial

Domain Mapping

JSON Query DSL

SQL

Reading Files ​

read_netcdf ​

Variable attributes ​

read_zarr ​

Array attributes ​

read_parquet ​

read_arrow ​

read_csv ​

read_odv_ascii ​

read_bbf ​

read_tiff ​

Tag attributes ​

Reading Files

`read_netcdf`

Variable attributes

`read_zarr`

Array attributes

`read_parquet`

`read_arrow`

`read_csv`

`read_odv_ascii`

`read_bbf`

`read_tiff`

Tag attributes