# Changelog

All notable changes to this project will be documented in this file.
## [1.2.0] - 2026-01-14

### Breaking changes
- Query streaming now returns a `pyarrow.RecordBatchStreamReader` from `Query.execute_streaming()` instead of yielding individual `RecordBatch` objects. This allows users to manage the stream lifecycle directly and integrate with Arrow's native reading/writing utilities. A consumption sketch follows below.
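A minimal sketch of consuming the new return type. Here `query` is a placeholder for an already-configured query object (how it is built is an assumption, not part of this entry); the reader handling itself uses standard `pyarrow` calls.

```python
import pyarrow as pa

# `query` is an already-configured query object (construction assumed).
reader = query.execute_streaming()  # now a pyarrow.RecordBatchStreamReader

print(reader.schema)  # the Arrow schema is available before any batch is read

# Option 1: iterate record batches without materializing the full result.
for batch in reader:
    print(batch.num_rows)

# Option 2: hand a fresh stream to Arrow's native IPC writer
# (the reader above is already consumed at this point).
# with pa.OSFile("result.arrows", "wb") as sink:
#     with pa.ipc.new_stream(sink, reader.schema) as writer:
#         for batch in reader:
#             writer.write_batch(batch)
```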
## [1.10.0] - 2025-12-07

### Breaking changes
- Raised the minimum supported Python version from 3.8 to 3.10 and promoted several previously optional dependencies (`fsspec`, `dask`, `zarr`, `networkx`, `matplotlib`, `numpy`, `geopandas`) to core requirements. Lightweight environments may need to be recreated before upgrading.
- Reworked the query entry points to be table/dataset-first. `Client.list_tables()` now returns `DataTable` helpers, `Client.list_datasets()` mirrors that experience for raw files, and the legacy `Client.query()`/`Client.available_columns*()` helpers have been deprecated in favor of the richer table and dataset APIs. A usage sketch follows below.
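A minimal sketch of the new entry points, assuming a `Client` built from a Beacon URL; the import path, constructor arguments, and anything not named above are assumptions.

```python
from beacon_api import Client  # import path assumed

client = Client("https://beacon.example.org")  # constructor arguments assumed

# Table-first: list_tables() now returns DataTable helpers rather than names.
for table in client.list_tables():
    print(table)

# Dataset-first: list_datasets() mirrors that experience for raw files.
for dataset in client.list_datasets():
    print(dataset)

# The legacy helpers are deprecated in favor of the table/dataset APIs:
# client.query(...)              # deprecated
# client.available_columns*()    # deprecated family of helpers
```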
### Added
- Dataset-aware workflows. `Client.list_datasets()` (Beacon ≥ 1.4.0) now surfaces every server-side dataset as a typed `Dataset` helper that can fetch a `pyarrow.Schema`, expose metadata (`get_file_name()`, `get_file_format()`), and produce a JSON query builder via `.query()`. CSV and Zarr datasets accept format-specific options such as custom delimiters or statistics columns directly on the query call (see the first sketch after this list).
- Beacon node management helpers. Administrative operations, including `upload_dataset()`, `download_dataset()`, `delete_dataset()`, `create_logical_table()` and `delete_table()`, were added to `Client`. Each helper enforces `BaseBeaconSession.is_admin()` and server version gates so automation scripts can manage Beacon nodes safely (see the second sketch after this list).
- Modular JSON query builder. The monolithic `beacon_api.query` module has been replaced by a node-based package (e.g. `_from`, `select`, `filter`, `distinct`, `sort`, `functions`). This unlocks fluent helpers such as `add_select_column`, `add_select_coalesced`, `add_polygon_filter`, `set_distinct`, `add_sort`, and new function nodes (`Functions.concat`, `Functions.coalesce`, `Functions.try_cast_to_type`, `Functions.map_pressure_to_depth`, etc.) for assembling complex projections (see the third sketch after this list).
- Geospatial and scientific outputs. `BaseQuery` can now stream Arrow record batches (`execute_streaming`) and materialize results as GeoParquet, GeoPandas, NdNetCDF, NetCDF, Arrow, CSV, Parquet, Zarr, Ocean Data View exports, or directly into an xarray dataset. The helpers write responses chunk-by-chunk to disk to avoid loading full payloads into memory.
- Documentation and site tooling. The MkDocs configuration now ships topical guides under “Using the Data Lake” (Exploring, Querying, Tables, Datasets), API references powered by `mkdocstrings`, versioned docs via `mike`, and an example gallery (e.g. the World Ocean Database walkthrough) that mirrors the new SDK surface area.
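First sketch: a dataset-aware workflow. The method names `list_datasets`, `get_file_name`, `get_file_format`, and `.query()` come from this changelog; the index-based lookup, the schema-fetch method name, and the delimiter keyword are assumptions.

```python
from beacon_api import Client  # import path assumed

client = Client("https://beacon.example.org")

# Pick a dataset helper; indexing into the listing is an assumed convenience.
dataset = client.list_datasets()[0]

print(dataset.get_file_name())    # e.g. "profiles.csv"  (illustrative value)
print(dataset.get_file_format())  # e.g. "csv"           (illustrative value)
schema = dataset.get_schema()     # assumed name for the pyarrow.Schema fetch

# Build a JSON query; CSV/Zarr options such as a custom delimiter are passed
# directly on the query call (the keyword name here is an assumption).
query = dataset.query(delimiter=";")
```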
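Second sketch: node management from an automation script. The helper names come from this changelog; the authentication details, argument shapes, and the way the session is reached are assumptions.

```python
from beacon_api import Client  # import path assumed

client = Client("https://beacon.example.org", token="...")  # auth details assumed

# Each admin helper enforces BaseBeaconSession.is_admin() and a server version
# gate internally; checking up front just gives a friendlier failure mode.
if client.session.is_admin():  # reaching the session this way is assumed
    client.upload_dataset("local/profiles.parquet")     # argument shape assumed
    client.create_logical_table("profiles")             # argument shape assumed
    client.delete_dataset("obsolete/old_export.zarr")   # argument shape assumed
```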
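Third sketch: assembling a query with the node-based builder and materializing it. The fluent helper and function-node names come from this changelog; all argument names and shapes below, the polygon layout, and the export path are assumptions.

```python
from beacon_api.query import Functions  # import path assumed

# `table` is a DataTable/Dataset helper obtained as in the sketches above.
query = table.query()

# Fluent projection/filter helpers; argument names and shapes are assumed.
query.add_select_column("TEMPERATURE")
query.add_select_coalesced(["PRESSURE", "DEPTH"], alias="depth_source")
query.add_polygon_filter("LONGITUDE", "LATITUDE",
                         [(-10.0, 50.0), (5.0, 50.0), (5.0, 60.0), (-10.0, 60.0)])
query.add_sort("TIME")
query.set_distinct()

# Function nodes (Functions.coalesce, Functions.map_pressure_to_depth, ...)
# can be used inside projections; their signatures are likewise assumed.

# Materialize; the export helpers stream the response chunk-by-chunk to disk.
query.to_geoparquet("north_sea_temperature.parquet")
```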
### Changed
- `BaseBeaconSession` now detects the Beacon server version on construction, exposes `version_at_least()`, and checks admin capabilities with `is_admin()`. Higher-level helpers automatically guard experimental endpoints (datasets, logical tables, streaming outputs) behind these checks (see the sketch after this list).
- `DataTable` introspection now fetches Arrow schemas through `/api/table-schema`, exposing precise field types for downstream tooling. The `subset()` helper applies the new dataclass-based filter nodes so you can reuse bounding-box, depth, and time filters elsewhere.
- Query materialization helpers such as `to_parquet`, `to_csv`, `to_arrow`, and `to_geoparquet` now stream response chunks to disk rather than buffering entire files in memory, improving stability on large exports.
- Documentation content was rewritten to align with the new APIs: `docs/getting_started.md`, `docs/using/*.md`, and the reference pages now showcase dataset-first queries, polygon filters, geospatial exports, and SQL parity.
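A hedged sketch of using the new session capabilities to guard optional code paths; `version_at_least()`, `is_admin()`, and `subset()` come from this changelog, while the session attribute, the version argument format, the schema method name, and the filter keywords are assumptions.

```python
# `client` as in the sketches above; reaching the session this way is assumed.
session = client.session

if session.version_at_least("1.4.0"):      # version argument format assumed
    datasets = client.list_datasets()      # experimental, version-gated endpoint
else:
    datasets = []

# DataTable schemas now come from /api/table-schema as precise Arrow types.
table = client.list_tables()[0]
schema = table.get_schema()                # method name assumed
print(schema)                              # a pyarrow.Schema

# subset() reuses the dataclass-based filter nodes; keyword names assumed.
sub = table.subset(min_depth=0, max_depth=200)
```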
### Fixed
- Eliminated runaway memory usage during large exports by switching every file writer to `response.iter_content()` streaming (see the sketch after this list).
- Hardened dataset/table schema parsing: unsupported Beacon field types now trigger explicit exceptions, while timestamp formats are automatically mapped to the correct `pyarrow` timestamp resolution.
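For context, the streaming writer pattern the fix adopts looks roughly like this; the chunk size, function name, and file handling are illustrative, not the SDK's exact implementation.

```python
import requests

def download_to_disk(url: str, path: str, chunk_size: int = 1 << 20) -> None:
    """Write a large HTTP response to disk chunk-by-chunk instead of buffering it."""
    with requests.get(url, stream=True) as response:
        response.raise_for_status()
        with open(path, "wb") as sink:
            for chunk in response.iter_content(chunk_size=chunk_size):
                if chunk:  # skip keep-alive chunks
                    sink.write(chunk)
```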