Getting Started
To quickly get started with querying data, we first need to know which content (datasets or data tables) the data lake contains. This will help us understand how to effectively query and utilize the available data resources.
Querying Datasets
To learn which datasets are available use the following http GET request:
GET /api/datasets
This will return a list of datasets available in the data lake.
Once we know which datasets are available, we can list the schema of the datasets using the following http GET request: The schema will provide information about the structure of the datasets, including the names and types of the columns.
GET /api/dataset-schema?file=example1.nc
Querying Datasets with SQL
Once we know which datasets are available, we can query the data using SQL and using the output format parquet. Beacon will then return the results in the specified format. The SQL interface is available at the /api/query
endpoint. You can use the sql
parameter to specify the SQL query that you want to execute.
The following example shows how to query a dataset using SQL:
POST /api/query
Content-Type: application/json
{
"sql": "SELECT TEMP,PSAL,LONGITUDE,LATITUDE FROM read_netcdf('*.nc')",
"output": {
"format": "parquet"
}
}
You can extend the SQL query to include more complex operations such as filtering, aggregating, and joining data from multiple datasets or data tables. The SQL syntax is similar to standard SQL, so you can use familiar SQL constructs to query the data.
SELECT TEMP, "TEMP.units", PSAL, DEPTH, LATITUDE, LONGITUDE
FROM read_netcdf('example1.nc')
WHERE DEPTH < 1000 AND LATITUDE > 30 AND LONGITUDE < -80
ORDER BY DEPTH DESC
LIMIT 10
Querying Data Tables
To learn which data tables are available use the following http GET request:
GET /api/tables
This will return a list of data tables available in the data lake. Once we know which data tables are available, we can list the schema of the data tables using the following http GET request:
The schema will provide information about the structure of the data tables, including the names and types of the columns.
GET /api/table-schema?table=example1
Querying Data Tables with SQL
Once we know which data tables are available, we can query the data using SQL and using the output format parquet. Beacon will then return the results in the specified format. The SQL interface is available at the /api/query
endpoint. You can use the sql
parameter to specify the SQL query that you want to execute.
The following example shows how to query a data table using SQL:
POST /api/query
Content-Type: application/json
{
"sql": "SELECT TEMP,PSAL,LONGITUDE,LATITUDE FROM example1",
"output": {
"format": "parquet"
}
}
You can extend the SQL query to include more complex operations such as filtering, aggregating, and joining data from multiple datasets or data tables. The SQL syntax is similar to standard SQL, so you can use familiar SQL constructs to query the data.
SELECT TEMP, "TEMP.units" PSAL, DEPTH, LATITUDE, LONGITUDE
FROM example1
WHERE DEPTH < 1000 AND LATITUDE > 30 AND LONGITUDE < -80
ORDER BY DEPTH DESC
LIMIT 10