FromDataLake

Retrieves rows from the configured data lake query and exposes each row as a generated JSON object.

What It Does

Runs a query against a Trino server and emits each returned row as a generated JSON-like object.

This generator is useful when test data already lives in a lakehouse or analytics system and you want the scenario to read it directly instead of exporting it first. The optional ColumnsToIgnore setting drops fields that are not relevant to the rest of the flow.
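
As a rough illustration of the row-to-object mapping described above, the following Python sketch pairs column names with row values to build one object per row. It is not the generator's implementation: the function name rows_to_items and the sample data are hypothetical, and no Trino connection is made.

```python
import json


def rows_to_items(columns, rows):
    """Pair each row's values with the column names to form one dict per row."""
    return [dict(zip(columns, row)) for row in rows]


# Hypothetical result set standing in for what a query might return.
columns = ["order_id", "customer_id"]
rows = [(1001, "C-17"), (1002, "C-42")]

items = rows_to_items(columns, rows)
for item in items:
    # Each item serializes to a single JSON object.
    print(json.dumps(item))
```

Each input row yields exactly one item, keyed by the column names in the order the query returned them.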

YAML Example

DataSources:
  - Name: OrdersFromDataLake
    Generator: FromDataLake
    GeneratorConfiguration:
      TrinoServerUri: http://trino.local:8080
      Catalog: lakehouse
      Query: SELECT order_id, customer_id, debug_flag FROM qaas.orders
      Username: docs
      Password: docs-password
      ClientTag: qaas-docs
      ColumnsToIgnore:
        - debug_flag

What This Configuration Does

This configuration connects to Trino, runs the provided query, and turns each result row into one generated item.

The debug_flag column is removed before the items are exposed to the rest of the configuration, so downstream consumers only see order_id and customer_id.
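
The effect of ColumnsToIgnore in the example above can be pictured with a small Python sketch. This is an illustration under stated assumptions, not the generator's code: the helper drop_ignored_columns and the sample row values are hypothetical.

```python
def drop_ignored_columns(item, columns_to_ignore):
    """Return a copy of item without the keys named in columns_to_ignore."""
    ignored = set(columns_to_ignore)
    return {key: value for key, value in item.items() if key not in ignored}


# Hypothetical row shaped like the query result in the YAML example.
row = {"order_id": 1001, "customer_id": "C-17", "debug_flag": True}

item = drop_ignored_columns(row, ["debug_flag"])
print(item)
```

The sketch returns a new dictionary rather than mutating the row, so the original query result stays intact while downstream consumers receive only order_id and customer_id.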