How To Use
The easiest way to use the Beacon Binary Format is to leverage the provided command-line tools. These tools allow you to easily convert your datasets into the Beacon Binary Format and perform various operations on them.
To get started, you can download the beacon-binary-format-toolbox from the official repository. This is a command-line tool that simplifies the process of working with the Beacon Binary Format and is supported on Windows and Linux.
Once you have created a .bbf (Beacon Binary Format) file, store it in the datasets directory (or the S3 bucket when using cloud storage) and Beacon can use it immediately.
BBF files can also be used when creating data collections, just like any other data format supported by Beacon. This means you can combine different .bbf files into one large data collection that can be queried seamlessly via a single table name.
Installation
To install the beacon-binary-format-toolbox, follow these steps:
Linux (Ubuntu)
- Download the latest release from the official repository.
- Extract the downloaded archive.
- Install the NetCDF binaries: apt install libnetcdf-dev and apt install netcdf-bin.
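A quick way to confirm the Linux steps worked is to check that the expected commands resolve on your PATH. The sketch below assumes the extracted binary is named beacon-binary-format-toolbox (as in the commands in this guide) and that you have placed it on your PATH; ncdump is one of the utilities shipped with netcdf-bin. The missing_tools helper is a hypothetical name used for illustration and is not part of the toolbox.

```shell
#!/bin/sh
# Report which of the given commands cannot be found on PATH.
# Prints one line per missing command and returns non-zero if any is missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            rc=1
        fi
    done
    return "$rc"
}

# ncdump comes from netcdf-bin; the toolbox binary name is an assumption
# based on the commands shown in this guide.
missing_tools ncdump beacon-binary-format-toolbox ||
    echo "install the items listed above, then re-run this check"
```

If nothing is printed, both commands were found.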
Windows
- Download the latest release from the official repository.
- Extract the downloaded archive.
- Add the toolbox to your system's PATH.
- Install the NetCDF and HDF5 binaries and add them to your system's PATH.
Creating a Beacon Binary Format File Collection
The following command allows you to create a Beacon Binary Format file collection:
beacon-binary-format-toolbox create --glob <GLOB> -o <OUTPUT>
beacon-binary-format-toolbox create --glob "data/*.nc" -o output.bbf
The following flags are available to customize the creation of the file:
- --compression <COMPRESSION>: Set the compression type and level (default: zstd:3). Options: zstd:[0-21], lz4, none
- --pruning: Enable the creation of a pruning index (default: true)
- --group-size <GROUP_SIZE>: Set the array group size before compression (default: 4MB)
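As an illustration of the --compression flag, the sketch below validates a compression value against the options listed above and prints the resulting create command instead of running it. The helper names valid_compression and bbf_create_cmd are hypothetical, and the accepted range simply mirrors the documented zstd:[0-21]; whether the toolbox accepts other spellings (for example a bare zstd) is not stated here, so the check rejects them.

```shell
#!/bin/sh
# Return 0 if the value is one of the documented options:
# zstd:0 .. zstd:21, lz4, or none.
valid_compression() {
    case "$1" in
        lz4|none) return 0 ;;
        zstd:[0-9]|zstd:1[0-9]|zstd:2[01]) return 0 ;;
        *) return 1 ;;
    esac
}

# Print (do not run) a create command with an explicit compression setting.
# Arguments: glob, output file, compression (defaults to the documented zstd:3).
bbf_create_cmd() {
    glob=$1
    output=$2
    compression=${3:-zstd:3}
    if ! valid_compression "$compression"; then
        echo "unsupported compression: $compression" >&2
        return 1
    fi
    printf 'beacon-binary-format-toolbox create --glob "%s" -o "%s" --compression %s\n' \
        "$glob" "$output" "$compression"
}

bbf_create_cmd "data/*.nc" output.bbf lz4
# prints: beacon-binary-format-toolbox create --glob "data/*.nc" -o "output.bbf" --compression lz4
```

Printing the command first makes it easy to review the quoting of the glob before committing to a long conversion run.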
Supported file formats are:
- NetCDF (.nc)
- Parquet (.parquet)
- CSV (.csv)
Inspecting
Listing the footer (file collection metadata)
This will return the super type schema of the file collection.
beacon-binary-format-toolbox list-footer --file-path output.bbf
Listing the stored datasets
Lists the datasets stored in the file collection. Results can be filtered with a regular expression applied to the file name (entry_key).
beacon-binary-format-toolbox list-datasets-regex --file-path output.bbf --pattern ".*"
To list only a single column, add --column:
beacon-binary-format-toolbox list-datasets-regex --file-path output.bbf --pattern ".*" --column <COLUMN_NAME>
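Before passing a more selective pattern to list-datasets-regex, it can help to preview what it matches on a few sample names. The sketch below uses grep -E for that preview. The entry keys are made up for illustration, and it assumes the toolbox's regex dialect agrees with POSIX extended regex for a simple pattern like this one. Whether the toolbox anchors the pattern to the whole key is not stated here, so the pattern starts with .* to behave the same either way.

```shell
#!/bin/sh
# Hypothetical entry keys, for illustration only.
entry_keys='data/cruise_2019.nc
data/cruise_2020.nc
data/buoy_2020.nc'

# Select only the cruise files.
pattern='.*cruise_.*\.nc$'

# Preview which keys the pattern selects.
printf '%s\n' "$entry_keys" | grep -E "$pattern"
# prints:
# data/cruise_2019.nc
# data/cruise_2020.nc

# The corresponding toolbox call, printed rather than executed.
printf 'beacon-binary-format-toolbox list-datasets-regex --file-path output.bbf --pattern "%s"\n' "$pattern"
```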
Listing the pruning index
beacon-binary-format-toolbox list-pruning-index --file-path output.bbf
To list the pruning index for a specific column:
beacon-binary-format-toolbox list-pruning-index --file-path output.bbf --column <COLUMN_NAME>
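The inspection commands in this section can be chained into one small wrapper when you want a quick overview of a file. This is only a convenience sketch: bbf_inspect and the BBF_TOOL variable are hypothetical names, and the wrapper uses only the subcommands and flags shown above.

```shell
#!/bin/sh
# Command used to invoke the toolbox; override BBF_TOOL to point at a
# different path (or at a stub when testing the wrapper).
BBF_TOOL=${BBF_TOOL:-beacon-binary-format-toolbox}

# Run the three inspection subcommands against one .bbf file,
# stopping at the first one that fails.
bbf_inspect() {
    file=$1
    "$BBF_TOOL" list-footer --file-path "$file" &&
    "$BBF_TOOL" list-datasets-regex --file-path "$file" --pattern ".*" &&
    "$BBF_TOOL" list-pruning-index --file-path "$file"
}
```

Calling bbf_inspect output.bbf prints the footer, the dataset listing, and the pruning index in that order.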