Skip to content

Introspection

These table functions inspect schemas, file listings, and statistics without reading dataset rows. Use them to explore an unfamiliar data lake before writing queries.

read_schema

text
read_schema(glob_paths, file_format)

Returns the inferred schema (column names and types) for a set of files without reading any data.

file_format must be one of: netcdf, zarr, parquet, arrow, csv, odv_ascii, bbf, tiff.

sql
SELECT * FROM read_schema(['argo/**/*.nc'], 'netcdf')

SELECT * FROM read_schema(['obs/*.parquet'], 'parquet')

list_datasets

text
list_datasets()

Lists all files currently stored in Beacon's dataset storage root. Returns one row per file.

ColumnTypeDescription
file_nameTEXTPath relative to the storage root
file_formatTEXTDetected format
sql
SELECT * FROM list_datasets()

-- Find all NetCDF files
SELECT file_name FROM list_datasets() WHERE file_format = 'netcdf'

view_dataset_statistics

text
view_dataset_statistics(path)

Returns per-column min/max statistics for a single file. Statistics are read from the cache when available and computed on demand otherwise.

ColumnTypeDescription
column_nameTEXTColumn name
data_typeTEXTColumn data type
min_valueTEXTMinimum value (NULL if unknown)
max_valueTEXTMaximum value (NULL if unknown)
is_exactBOOLEANWhether the statistics are exact
sql
SELECT * FROM view_dataset_statistics('argo/2024/R6900001.nc')

view_external_table_statistics

text
view_external_table_statistics(table_name)

Returns per-file statistics for every file that backs an external table — useful for checking which files have cached statistics and what their value ranges are.

ColumnTypeDescription
pathTEXTFile path
file_sizeUINT64File size in bytes
cachedBOOLEANWhether statistics are cached for this file
column_nameTEXTColumn name (NULL if not cached)
data_typeTEXTColumn data type
min_valueTEXTMinimum value
max_valueTEXTMaximum value
is_exactBOOLEANWhether the statistics are exact
sql
SELECT * FROM view_external_table_statistics('ocean_profiles')

-- Find files with no cached statistics
SELECT path FROM view_external_table_statistics('ocean_profiles')
WHERE cached = false

view_statistics_cache

text
view_statistics_cache()

Streams all entries from the global file statistics cache. Each row is validated against the object store — the is_valid flag indicates whether the cached file still exists and its size matches.

ColumnTypeDescription
pathTEXTFile path
file_sizeUINT64File size in bytes
is_validBOOLEANWhether the cached entry is still valid
column_nameTEXTColumn name
data_typeTEXTColumn data type
min_valueTEXTMinimum value
max_valueTEXTMaximum value
is_exactBOOLEANWhether the statistics are exact
sql
SELECT * FROM view_statistics_cache()

-- Find stale cache entries
SELECT path FROM view_statistics_cache() WHERE is_valid = false