Overview
The Data Pipeline is the backbone of Trendteller’s analytics platform, transforming raw e-commerce data into analytics-ready tables using Dataform and Google BigQuery.

Technology Stack
- Dataform: Data transformation and orchestration platform
- BigQuery: Google’s serverless data warehouse
- SQLX: SQL with templating and JavaScript
- Git-based: Version-controlled transformations
Medallion Architecture
The pipeline implements a three-layer medallion architecture:

1. Bronze Layer - Raw data ingestion from Airbyte
   - Tables prefixed with bronze_
   - Maintains source format and structure
   - Includes metadata (ingestion timestamp, source brand)
   - Full historical data retained

2. Silver Layer - Standardized and cleansed data
   - Tables prefixed with silver_
   - Common schema across all brands
   - Data quality rules applied
   - Deduplication and validation

3. Gold Layer - Analytics-ready aggregations
   - Tables prefixed with gold_
   - Pre-aggregated metrics
   - Optimized for query performance
   - Business logic applied
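As a sketch of the bronze-to-silver step, a silver model might validate and deduplicate a bronze table. The table name bronze_orders and the ingestion-timestamp column _ingested_at are assumptions for illustration, not names taken from the project:

```sql
config {
  type: "table",
  schema: "silver",
  description: "Standardized orders; one row per order_id"
}

SELECT
  order_id,
  brand_id,
  customer_id,
  total_amount,
  status,
  created_at
FROM ${ref("bronze_orders")}  -- hypothetical bronze table name
WHERE order_id IS NOT NULL    -- basic validation rule
-- Deduplicate: keep the most recently ingested copy of each order.
-- _ingested_at is an assumed ingestion-timestamp metadata column.
QUALIFY ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY _ingested_at DESC) = 1
```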
Data Models
Core Tables
The core tables fall into three groups: Transactional, Master Data, and Analytics.

Orders - All order transactions across brands
- orders: order_id, brand_id, customer_id, total_amount, status, created_at, items[], payments[], shipping
- items[]: item_id, order_id, product_id, quantity, unit_price, discount
- payments[]: payment_id, order_id, method, amount, status, processed_at
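For example, assuming the nested items[] array lands in a gold-layer orders table (the table name gold.orders and the status value are hypothetical), line totals can be computed by unnesting:

```sql
-- Revenue per product line, unnesting the items[] array.
SELECT
  o.order_id,
  o.brand_id,
  i.product_id,
  i.quantity * i.unit_price - i.discount AS line_total
FROM gold.orders AS o
CROSS JOIN UNNEST(o.items) AS i
WHERE o.status = 'completed'  -- status value is an assumption
```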
Brand Integrations
Trendteller supports 11 brands across seven e-commerce platforms:

Bling Integrations
Platform: Bling ERP
Brands: 3 brands
Data: Orders, products, customers, inventory, invoices
Sync: Incremental (every 6 hours)
VNDA Integrations
Platform: VNDA Fashion Platform
Brands: 2 brands
Data: Orders, products, categories, variants
Sync: Real-time webhooks + daily batch
Other Platforms
- Shoppub: Marketplace integration (2 brands)
- Tiny: ERP integration (1 brand)
- Microvix: Retail management (1 brand)
- Braavo: E-commerce (1 brand)
- JetERP: Enterprise system (1 brand)
Transformation Logic
SQLX Templates
Dataform uses SQLX for SQL transformations with JavaScript templating. The ${ref()} function ensures table dependencies are tracked and executed in the correct order.
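A minimal SQLX sketch (the model and table names are illustrative, not from the project):

```sql
config {
  type: "view",
  schema: "gold",
  description: "Daily revenue per brand"
}

SELECT
  brand_id,
  DATE(created_at) AS order_date,
  SUM(total_amount) AS gross_revenue
FROM ${ref("silver_orders")}  -- ref() registers the dependency on silver_orders
GROUP BY brand_id, order_date
```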
Data Quality Checks
Built-in assertions ensure data quality, for example:
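A sketch using Dataform's built-in config assertions (table and column names assumed):

```sql
config {
  type: "table",
  schema: "silver",
  assertions: {
    uniqueKey: ["order_id"],                          // no duplicate orders
    nonNull: ["order_id", "brand_id", "created_at"],  // required fields
    rowConditions: ["total_amount >= 0"]              // no negative totals
  }
}

SELECT * FROM ${ref("bronze_orders")}
```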
Pipeline Orchestration

Execution Schedule
1. Incremental Updates
   Every 6 hours, incremental transformations process new data (see the sketch after this list):
   - Bronze → Silver (standardization)
   - Silver → Gold (aggregation)

2. Daily Full Refresh
   Once daily at 2 AM (BRT), full refresh for:
   - Historical aggregations
   - Cross-brand metrics
   - Customer lifetime value

3. Manual Triggers
   On-demand execution via:
   - Dataform CLI
   - GitHub Actions
   - Dataform Web UI
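A sketch of an incremental model. The ingestion-timestamp column _ingested_at and the table name are assumptions; ${when(incremental(), ...)} and ${self()} are Dataform built-ins:

```sql
config {
  type: "incremental",
  schema: "silver"
}

SELECT
  order_id,
  brand_id,
  total_amount,
  status,
  created_at,
  _ingested_at
FROM ${ref("bronze_orders")}
-- On incremental runs, only process rows newer than what the
-- target table (${self()}) already contains.
${when(incremental(), `WHERE _ingested_at > (SELECT MAX(_ingested_at) FROM ${self()})`)}
```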
Dependency Management
Dataform automatically resolves table dependencies: every ${ref()} call registers an edge in the execution graph, so a model runs only after the tables it references have been built.
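Edges can also be declared explicitly for upstream actions a model never references via ${ref()}; in this sketch the dependencies entry and all table names are hypothetical:

```sql
config {
  type: "table",
  schema: "gold",
  // Explicit edge for an upstream action this query never references;
  // the action name is hypothetical.
  dependencies: ["silver_customers"]
}

SELECT
  customer_id,
  COUNT(*) AS lifetime_orders,
  SUM(total_amount) AS lifetime_value
FROM ${ref("silver_orders")}  -- implicit dependency via ref()
GROUP BY customer_id
```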
Data Warehouse Configuration

BigQuery Setup
Project: togo-425319
Location: southamerica-east1 (São Paulo)
Environment: Production
Datasets:
- bronze: Raw data from sources
- silver: Standardized tables
- gold: Analytics-ready views
- metadata: Pipeline metadata
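Assuming the project uses Dataform's dataform.json settings file (the defaultSchema value here is a guess), the warehouse setup would look roughly like:

```json
{
  "warehouse": "bigquery",
  "defaultDatabase": "togo-425319",
  "defaultLocation": "southamerica-east1",
  "defaultSchema": "dataform"
}
```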
Performance Optimization
- Partitioning: Tables partitioned by date for faster queries
- Clustering: Key columns clustered for query optimization
- Materialization: Incremental models reduce compute costs
- Caching: Query results cached for repeated access
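In Dataform, partitioning and clustering are declared in a model's bigquery config block; a sketch with assumed column and table names:

```sql
config {
  type: "table",
  schema: "gold",
  bigquery: {
    partitionBy: "DATE(created_at)",   // date partitions prune scanned data
    clusterBy: ["brand_id", "status"]  // cluster keys speed filtered queries
  }
}

SELECT
  order_id,
  brand_id,
  status,
  total_amount,
  created_at
FROM ${ref("silver_orders")}
```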
Monitoring & Observability
Pipeline Metrics
- Execution duration and success rate
- Data volume processed (rows/GB)
- Data freshness by table
- Error rates and failure reasons
Alerting
Automated alerts for:
- Pipeline execution failures
- Data quality assertion failures
- Stale data (exceeds freshness threshold)
- Resource quota warnings
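One way to check freshness directly in BigQuery is the legacy __TABLES__ metadata view; this sketch flags gold tables not updated within the 6-hour incremental window:

```sql
-- Gold tables whose last modification is older than the 6-hour SLA.
SELECT
  table_id,
  TIMESTAMP_MILLIS(last_modified_time) AS last_modified
FROM `togo-425319.gold.__TABLES__`
WHERE TIMESTAMP_MILLIS(last_modified_time)
      < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 6 HOUR)
```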

