Exploring the Beacon Data Lake
This guide shows how to connect to a Beacon Node, discover the available data, and subset slices for local analysis. All snippets are built directly on top of the public SDK classes, so you can paste them into scripts or notebooks as-is.
Connect and verify
from beacon_api import Client
client = Client("https://beacon.example.com")
client.check_status() # probes /api/health and prints the version
info = client.get_server_info()
print(info["beacon_version"], info.get("extensions"))
Troubleshooting connectivity
The client automatically prefixes relative URLs with the base URL. If you see "Failed to connect to server", double-check the base URL and whether your token grants access to /api/health.
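A minimal sketch of a defensive startup check, assuming the SDK surfaces connection problems as ordinary Python exceptions (the exact exception class may vary between SDK versions):

from beacon_api import Client

client = Client("https://beacon.example.com")
try:
    client.check_status()  # probes /api/health
except Exception as exc:  # exact exception type depends on the SDK version
    # Re-check the base URL (scheme and host, no trailing path) and
    # that your token grants access to /api/health.
    raise SystemExit(f"Could not reach the Beacon Node: {exc}")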
Discover tables
list_tables() returns a mapping of table names to DataTable helpers with cached metadata. Iterate to learn what each collection represents:
tables = client.list_tables()
for name, table in tables.items():
    print(f"{name} → {table.get_table_type()} :: {table.get_table_description()}")
Grab one table (default exists on every Beacon deployment) and inspect its schema:
default_table = tables["default"]
schema_arrow = default_table.get_table_schema_arrow()
for field in schema_arrow:
    print(f"{field.name}: {field.type}")
Use get_table_schema() when you want a PyArrow Schema object you can re-use (for instance, to validate a DataFrame before upload).
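For example, a quick sketch of that pre-upload check, assuming pandas and pyarrow are installed (the TEMP/PSAL toy DataFrame is purely illustrative):

import pandas as pd
import pyarrow as pa

schema = default_table.get_table_schema()  # a reusable pyarrow.Schema

candidate = pd.DataFrame({"TEMP": [12.3, 13.1], "PSAL": [35.1, 35.4]})
try:
    # Casting through the schema fails fast on missing columns or bad types.
    pa.Table.from_pandas(candidate, schema=schema)
except (KeyError, pa.ArrowInvalid, pa.ArrowTypeError) as exc:
    print(f"DataFrame does not match the table schema: {exc}")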
Sample data quickly
Need a glance at the data without hand-writing longitude/latitude filters? DataTable.subset() configures a JSONQuery with a bounding box, depth range, and time window:
sample = default_table.subset(
    longitude_column="LONGITUDE",
    latitude_column="LATITUDE",
    time_column="JULD",
    depth_column="PRES",
    columns=["TEMP", "PSAL"],
    bbox=(-20, 40, -10, 50),
    depth_range=(0, 25),
)
preview = sample.to_pandas_dataframe().head()
Because subset() returns a normal JSONQuery, you can chain additional selects and filters onto it before executing, as in the sketch below.
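For example, reusing the sample query from above (the PLATFORM_NUMBER column and the JULD time bounds are assumptions; swap in names from your own schema):

yearly = (
    sample
    .add_select_column("PLATFORM_NUMBER")  # hypothetical extra column
    .add_range_filter("JULD", "2023-01-01T00:00:00", "2023-12-31T23:59:59")
    .to_pandas_dataframe()
)
print(yearly.shape)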
Browse individual datasets (Beacon ≥ 1.4)
When the Beacon Node exposes /api/list-datasets, you can work with files directly instead of logical tables:
datasets = client.list_datasets(pattern="**/*.parquet", limit=10)
first_file = next(iter(datasets.values()))
print(first_file.get_file_path(), first_file.get_file_format())
dataset_schema = first_file.get_schema()
Once you have a Dataset, spin up a query exactly the same way as with tables:
dataset_query = first_file.query()
dataset_df = (
    dataset_query
    .add_select_column("lon", alias="longitude")
    .add_select_column("lat", alias="latitude")
    .add_range_filter("time", "2023-01-01T00:00:00", "2023-12-31T23:59:59")
    .to_pandas_dataframe()
)
Go deeper with explain and export helpers
- query.explain() calls /api/explain-query so you can see exactly how the Beacon Node plans to execute the request.
- query.to_geo_pandas_dataframe(longitude, latitude) materializes a GeoDataFrame complete with CRS.
- query.to_xarray_dataset(["JULD", "PRES"]) yields an n-dimensional dataset perfect for scientific workflows.
- query.to_parquet("subset.parquet"), query.to_nd_netcdf("subset_nd.nc", ["JULD", "DEPTH"]), or query.to_zarr("subset.zarr") stream results directly to disk.
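Putting a few of these together, a short sketch that inspects the plan and then exports the subset built earlier (it assumes the coordinate columns LONGITUDE/LATITUDE from the example above are present in the result):

print(sample.explain())  # the node's execution plan for this request

# Assumes LONGITUDE/LATITUDE are available in the query result.
gdf = sample.to_geo_pandas_dataframe("LONGITUDE", "LATITUDE")
print(gdf.crs, len(gdf))

sample.to_parquet("subset.parquet")  # stream the same result straight to disk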
With these building blocks you can confidently explore any Beacon Data Lake, whether you prefer notebooks, scripts, or dashboards.