> ## Documentation Index
> Fetch the complete documentation index at: https://docs.trendteller.com.br/llms.txt
> Use this file to discover all available pages before exploring further.

# Data Pipeline

> Dataform transformations and medallion architecture on BigQuery

## Overview

The Data Pipeline is the backbone of Trendteller's analytics platform, transforming raw e-commerce data into analytics-ready tables using Dataform and Google BigQuery.

## Technology Stack

<CardGroup cols={2}>
  <Card title="Dataform" icon="diagram-project">
    Data transformation and orchestration platform
  </Card>

  <Card title="BigQuery" icon="google">
    Google's serverless data warehouse
  </Card>

  <Card title="SQLX" icon="code">
    SQL with templating and JavaScript
  </Card>

  <Card title="Git-based" icon="code-branch">
    Version-controlled transformations
  </Card>
</CardGroup>

## Medallion Architecture

The pipeline implements a three-layer medallion architecture:

<Steps>
  <Step title="Bronze Layer">
    **Raw data ingestion from Airbyte**

    * Tables prefixed with `bronze_`
    * Maintains source format and structure
    * Includes metadata (ingestion timestamp, source brand)
    * Full historical data retained

    ```sql theme={null}
    bronze_bling_orders
    bronze_vnda_products
    bronze_tiny_customers
    ```
  </Step>

  <Step title="Silver Layer">
    **Standardized and cleansed data**

    * Tables prefixed with `silver_`
    * Common schema across all brands
    * Data quality rules applied
    * Deduplication and validation

    ```sql theme={null}
    silver_orders      -- All brands unified
    silver_products    -- Standardized catalog
    silver_customers   -- Consolidated customers
    ```
  </Step>

  <Step title="Gold Layer">
    **Analytics-ready aggregations**

    * Tables prefixed with `gold_`
    * Pre-aggregated metrics
    * Optimized for query performance
    * Business logic applied

    ```sql theme={null}
    gold_daily_sales
    gold_product_performance
    gold_customer_lifetime_value
    ```
  </Step>
</Steps>

## Data Models

### Core Tables

<Tabs>
  <Tab title="Transactional">
    **Orders** - All order transactions across brands

    * `order_id`, `brand_id`, `customer_id`
    * `total_amount`, `status`, `created_at`
    * `items[]`, `payments[]`, `shipping`

    **Order Items** - Individual line items

    * `item_id`, `order_id`, `product_id`
    * `quantity`, `unit_price`, `discount`

    **Payments** - Payment transactions

    * `payment_id`, `order_id`, `method`
    * `amount`, `status`, `processed_at`
  </Tab>

  <Tab title="Master Data">
    **Products** - Product catalog

    * `product_id`, `brand_id`, `sku`
    * `name`, `description`, `category`
    * `price`, `cost`, `stock_quantity`

    **Customers** - Customer information

    * `customer_id`, `brand_id`
    * `name`, `email`, `phone`
    * `address`, `registration_date`

    **Brands** - Brand configuration

    * `brand_id`, `name`, `platform`
    * `settings`, `api_credentials`
  </Tab>

  <Tab title="Analytics">
    **Daily Sales** - Daily aggregated metrics

    * `date`, `brand_id`
    * `total_revenue`, `order_count`
    * `avg_order_value`, `unique_customers`

    **Product Performance** - Product-level metrics

    * `product_id`, `period`
    * `units_sold`, `revenue`
    * `margin`, `stock_turns`

    **Customer Segments** - Customer analytics

    * `segment_id`, `customer_count`
    * `avg_ltv`, `churn_rate`
    * `acquisition_cost`
  </Tab>
</Tabs>

## Brand Integrations

Trendteller supports **11 brands** across **9 e-commerce platforms**:

<AccordionGroup>
  <Accordion title="Bling Integrations">
    **Platform**: Bling ERP
    **Brands**: 3 brands
    **Data**: Orders, products, customers, inventory, invoices
    **Sync**: Incremental (every 6 hours)
  </Accordion>

  <Accordion title="VNDA Integrations">
    **Platform**: VNDA Fashion Platform
    **Brands**: 2 brands
    **Data**: Orders, products, categories, variants
    **Sync**: Real-time webhooks + daily batch
  </Accordion>

  <Accordion title="Other Platforms">
    * **Shoppub**: Marketplace integration (2 brands)
    * **Tiny**: ERP integration (1 brand)
    * **Microvix**: Retail management (1 brand)
    * **Braavo**: E-commerce (1 brand)
    * **JetERP**: Enterprise system (1 brand)
  </Accordion>
</AccordionGroup>

## Transformation Logic

### SQLX Templates

Dataform uses SQLX for SQL transformations with JavaScript templating:

```sql theme={null}
config {
  type: "table",
  schema: "silver",
  tags: ["orders", "daily"]
}

SELECT
  o.order_id,
  o.brand_id,
  b.brand_name,
  o.customer_id,
  o.total_amount,
  o.order_date,
  ${helpers.calculateDiscount('o.total_amount', 'o.discount_percent')} as discount_amount,
  CURRENT_TIMESTAMP() as processed_at
FROM ${ref("bronze_orders")} o
LEFT JOIN ${ref("dim_brands")} b
  ON o.brand_id = b.brand_id
WHERE o.order_date >= CURRENT_DATE() - 90
```

<Tip>
  The `${ref()}` function ensures table dependencies are tracked and executed in the correct order.
</Tip>

### Data Quality Checks

Built-in assertions ensure data quality:

```sql theme={null}
config {
  type: "assertion",
  tags: ["quality_check"]
}

-- Ensure no negative order amounts
SELECT *
FROM ${ref("silver_orders")}
WHERE total_amount < 0
```

## Pipeline Orchestration

### Execution Schedule

<Steps>
  <Step title="Incremental Updates">
    Every 6 hours, incremental transformations process new data:

    * Bronze → Silver (standardization)
    * Silver → Gold (aggregation)
  </Step>

  <Step title="Daily Full Refresh">
    Once daily at 2 AM (BRT), full refresh for:

    * Historical aggregations
    * Cross-brand metrics
    * Customer lifetime value
  </Step>

  <Step title="Manual Triggers">
    On-demand execution via:

    * Dataform CLI
    * GitHub Actions
    * Dataform Web UI
  </Step>
</Steps>

### Dependency Management

Dataform automatically resolves table dependencies:

```
bronze_orders → silver_orders → gold_daily_sales
                              ↘ gold_customer_segments
```

## Data Warehouse Configuration

### BigQuery Setup

<Info>
  **Project**: togo-425319
  **Location**: southamerica-east1 (São Paulo)
  **Environment**: Production
</Info>

**Datasets**:

* `bronze` - Raw data from sources
* `silver` - Standardized tables
* `gold` - Analytics-ready views
* `metadata` - Pipeline metadata

### Performance Optimization

<CardGroup cols={2}>
  <Card title="Partitioning" icon="layer-group">
    Tables partitioned by date for faster queries
  </Card>

  <Card title="Clustering" icon="sitemap">
    Key columns clustered for query optimization
  </Card>

  <Card title="Materialization" icon="database">
    Incremental models reduce compute costs
  </Card>

  <Card title="Caching" icon="bolt">
    Query results cached for repeated access
  </Card>
</CardGroup>

## Monitoring & Observability

### Pipeline Metrics

* Execution duration and success rate
* Data volume processed (rows/GB)
* Data freshness by table
* Error rates and failure reasons

### Alerting

Automated alerts for:

* Pipeline execution failures
* Data quality assertion failures
* Stale data (exceeds freshness threshold)
* Resource quota warnings

## Development Workflow

### Local Development

```bash theme={null}
cd dataform
npm install

# Compile SQLX to SQL
node compile.js

# Validate transformations
dataform compile

# Dry run (no data modification)
dataform run --dry-run

# Execute locally (requires credentials)
dataform run --tags daily
```

### CI/CD Pipeline

```bash theme={null}
# On pull request
- Compile and validate SQL
- Run data quality tests
- Generate documentation

# On merge to main
- Deploy to production
- Execute transformations
- Update documentation
```

## Next Steps

<CardGroup cols={2}>
  <Card title="Architecture" icon="sitemap" href="/architecture/dataform/architecture">
    Explore pipeline architecture in detail
  </Card>

  <Card title="Data Models" icon="table" href="/architecture/dataform/data-models">
    Browse complete schema documentation
  </Card>

  <Card title="Development Guide" icon="code" href="/development/dataform/guide">
    Learn how to build transformations
  </Card>

  <Card title="Brand Integrations" icon="plug" href="/architecture/dataform/brand-integrations">
    Understand brand-specific logic
  </Card>
</CardGroup>
