2026-03-12
Marketing Data Warehouse Guide: Unified Analytics for E-Commerce Growth

Your marketing data is scattered across 15+ platforms, each telling a different story about performance.
Facebook says one thing, Google says another, your email platform disagrees with both, and your attribution model conflicts with your accounting system. Meanwhile, you're trying to make million-dollar budget decisions based on fragmented, conflicting data.
A marketing data warehouse solves this chaos by creating a single source of truth that unifies all your customer data, marketing touchpoints, and business outcomes. Here's your complete guide to building data infrastructure that enables confident decision-making.
The Data Fragmentation Problem
The Modern Marketing Stack:
- 15+ marketing platforms on average
- 5-8 analytics and measurement tools
- 3-4 e-commerce and sales systems
- 2-3 customer service platforms
- Multiple internal databases and spreadsheets
The Cost of Fragmentation:
- Decision paralysis from conflicting metrics
- Wasted budget from incomplete attribution
- Missed opportunities from siloed insights
- Team inefficiency from manual data compilation
- Inability to measure true customer lifetime value
Real-World Impact Example: An ATTN client was seeing:
- Google Ads reporting 4.2x ROAS
- Facebook claiming 3.8x ROAS
- Shopify Analytics showing 2.1x blended ROAS
- Accounting showing 1.6x actual profit per marketing dollar
After implementing a data warehouse with proper attribution modeling, they discovered their true blended ROAS was 2.8x, leading to a complete reallocation of their $2M annual marketing budget.
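The arithmetic behind that discovery is simple: blended ROAS is order-system revenue divided by total spend across all channels, regardless of what each platform claims for itself. A toy illustration (all numbers hypothetical):

```python
# Blended ROAS ignores per-platform attribution claims and divides
# revenue from the order system by total spend (hypothetical numbers).
spend = {"google_ads": 80_000, "facebook": 70_000}
total_spend = sum(spend.values())
total_revenue = 420_000  # from the order management system, not ad platforms

blended_roas = total_revenue / total_spend
print(round(blended_roas, 2))  # 2.8
```

Platforms each claim credit for overlapping conversions, so platform-reported ROAS figures will usually sum to more revenue than actually exists.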
The DATA Framework for Warehouse Success
D - Data Source Integration and Collection
Core Data Sources to Integrate:
Advertising Platforms:
- Google Ads (Search, Shopping, YouTube, Display)
- Meta/Facebook (Facebook, Instagram, Audience Network)
- TikTok Ads Manager
- Pinterest Business
- LinkedIn Campaign Manager
- Snapchat Ads
- Amazon Advertising
- Microsoft Advertising
Analytics and Measurement:
- Google Analytics 4
- Adobe Analytics
- Mixpanel or Amplitude
- Hotjar or FullStory
- Survey platforms (Typeform, etc.)
- Attribution tools (Triple Whale, Northbeam)
E-Commerce and Sales:
- Shopify/WooCommerce/Magento
- Stripe/PayPal payment processors
- Subscription platforms (ReCharge)
- Customer service (Zendesk, Gorgias)
- Reviews (Okendo, Yotpo)
Customer Data Platforms:
- Email marketing (Klaviyo, Mailchimp)
- SMS platforms (Postscript, Attentive)
- Customer data platforms (Segment, mParticle)
- CRM systems (HubSpot, Salesforce)
- Loyalty programs
Data Collection Methods:
Integration Approach by Platform:
API-Based (Preferred):
- Real-time or near real-time data sync
- Automated schema detection
- Error handling and retry logic
- Rate limit management
- Authentication and security
Webhook-Based:
- Event-driven data collection
- Real-time transaction and interaction data
- Reduced API call volume
- Immediate data availability
File-Based (Last Resort):
- CSV/Excel exports for limited APIs
- Manual or scheduled imports
- Higher latency but sometimes necessary
- Legacy system integration
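Rate limits and transient server errors are the main operational hazards of API-based collection. A minimal retry sketch; the `APIError` class here is an illustrative stand-in for whichever platform client you actually use:

```python
import time

class APIError(Exception):
    """Stand-in for a platform client error carrying an HTTP status."""
    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status

def with_retry(fn, max_retries=5, base_delay=0.01):
    """Run fn() with exponential backoff on rate limits (429) and
    transient server errors (5xx); re-raise anything else."""
    for attempt in range(max_retries):
        try:
            return fn()
        except APIError as e:
            if e.status == 429 or 500 <= e.status < 600:
                time.sleep(base_delay * 2 ** attempt)  # exponential backoff
                continue
            raise  # non-retryable client error (4xx other than 429)
    raise RuntimeError(f"gave up after {max_retries} attempts")
```

In practice you would wrap each page of a paginated report pull in something like this; managed ETL tools (Fivetran, Airbyte) handle it for you.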
A - Architecture Design and Infrastructure
Modern Data Warehouse Architecture:
Cloud-First Approach:
- Snowflake: Performance and scalability leader
- BigQuery: Google ecosystem integration
- Amazon Redshift: AWS ecosystem benefits
- Azure Synapse: Microsoft ecosystem integration
ETL/ELT Platform Selection:
Platform Comparison:
Fivetran:
+ Automated connector maintenance
+ Pre-built marketing platform connectors
+ Reliable data sync and monitoring
- Higher cost for high-volume data
Stitch:
+ More affordable option
+ Good connector coverage
+ Backing from Talend (now Qlik)
- Less automated maintenance
Airbyte:
+ Open source flexibility
+ Custom connector development
+ Cost control for technical teams
- Requires more technical expertise
Custom ETL:
+ Complete control and customization
+ No recurring connector costs
+ Tailored to specific business needs
- High development and maintenance overhead
Data Modeling Architecture:
Recommended Layer Structure:
Raw Layer (Bronze):
- Exact copies of source system data
- No transformations applied
- Historical preservation
- Debugging and auditing capability
Staging Layer (Silver):
- Clean and standardized data
- Basic transformations applied
- Data quality checks implemented
- Deduplicated and validated
Analytics Layer (Gold):
- Business-ready data models
- Aggregated and enriched data
- Performance-optimized tables
- User-friendly naming conventions
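As a concrete illustration of the Bronze-to-Silver step, here is a minimal deduplicate-and-standardize sketch; field names and cleaning rules are hypothetical, and in practice this logic lives in dbt or your ELT tool rather than application code:

```python
from datetime import datetime

def to_staging(raw_rows):
    """Clean raw (Bronze) rows into staging (Silver) rows:
    standardize names and dates, drop rows missing the natural key,
    and deduplicate on (campaign_id, date), keeping the latest copy."""
    cleaned = {}
    for row in raw_rows:
        if not row.get("campaign_id"):
            continue  # data quality check: key must be present
        key = (row["campaign_id"], row["date"])
        cleaned[key] = {
            "campaign_id": row["campaign_id"],
            "campaign_name": row.get("campaign_name", "").strip().upper(),
            "date": datetime.strptime(row["date"], "%Y-%m-%d").date(),
            "spend": round(float(row.get("spend", 0)), 2),
        }
    return list(cleaned.values())
```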
T - Transformation and Data Modeling
Core Marketing Data Models:
Customer Dimension Table:
CREATE TABLE dim_customers (
    customer_id STRING PRIMARY KEY,
    email STRING,
    first_name STRING,
    last_name STRING,
    acquisition_date DATE,
    acquisition_channel STRING,
    acquisition_campaign STRING,
    lifetime_value DECIMAL(10,2),
    total_orders INTEGER,
    customer_segment STRING,
    created_at TIMESTAMP,
    updated_at TIMESTAMP
);
Attribution Touchpoints Fact Table:
CREATE TABLE fact_touchpoints (
    touchpoint_id STRING PRIMARY KEY,
    customer_id STRING,
    session_id STRING,
    timestamp TIMESTAMP,
    channel STRING,
    campaign STRING,
    ad_set STRING,
    creative STRING,
    landing_page STRING,
    device_type STRING,
    cost DECIMAL(10,2),
    conversion_value DECIMAL(10,2),
    conversion_type STRING
);
Revenue and Transaction Facts:
CREATE TABLE fact_transactions (
    transaction_id STRING PRIMARY KEY,
    customer_id STRING,
    order_date DATE,
    revenue DECIMAL(10,2),
    cost_of_goods DECIMAL(10,2),
    shipping_cost DECIMAL(10,2),
    tax_amount DECIMAL(10,2),
    discount_amount DECIMAL(10,2),
    payment_method STRING,
    fulfillment_status STRING,
    attribution_channel STRING,
    attribution_campaign STRING
);
Campaign Performance Aggregates:
CREATE TABLE agg_campaign_performance_daily (
    date DATE,
    channel STRING,
    campaign_id STRING,
    campaign_name STRING,
    spend DECIMAL(10,2),
    impressions INTEGER,
    clicks INTEGER,
    conversions INTEGER,
    revenue DECIMAL(10,2),
    new_customers INTEGER,
    return_customers INTEGER,
    calculated_at TIMESTAMP
);
Advanced Data Transformations:
Customer Lifetime Value Calculation:
WITH customer_metrics AS (
    SELECT
        customer_id,
        MIN(order_date) AS first_order_date,
        MAX(order_date) AS last_order_date,
        COUNT(*) AS total_orders,
        SUM(revenue) AS total_revenue,
        AVG(revenue) AS avg_order_value,
        DATEDIFF('day', MIN(order_date), MAX(order_date)) AS customer_lifespan_days
    FROM fact_transactions
    GROUP BY customer_id
)
SELECT
    customer_id,
    total_revenue AS historical_ltv,
    -- AOV x expected orders per year; NULL for one-time or same-day-only buyers
    avg_order_value * 365 * (total_orders - 1)
        / NULLIF(customer_lifespan_days, 0) AS predicted_annual_ltv,
    CASE
        WHEN total_orders = 1 THEN 'One-time'
        WHEN total_orders BETWEEN 2 AND 5 THEN 'Repeat'
        WHEN total_orders > 5 THEN 'Loyal'
    END AS customer_segment
FROM customer_metrics;
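The predicted-LTV arithmetic above is easier to sanity-check in plain code. A sketch under the same assumptions (function name hypothetical): average order value times expected orders per year, where purchase cadence is lifespan divided by the number of repeat intervals:

```python
def predicted_annual_ltv(total_revenue, total_orders, lifespan_days):
    """AOV x expected orders per year, where cadence is
    lifespan_days / (total_orders - 1). Returns None for one-time
    buyers or same-day repeat orders, where cadence is undefined."""
    if total_orders < 2 or lifespan_days <= 0:
        return None
    avg_order_value = total_revenue / total_orders
    days_between_orders = lifespan_days / (total_orders - 1)
    return avg_order_value * (365 / days_between_orders)
```

For example, a customer with $300 over 3 orders spanning 146 days buys every 73 days, so roughly 5 orders a year at a $100 AOV, i.e. $500 predicted annual LTV.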
A - Attribution Modeling and Analysis
Multi-Touch Attribution Implementation:
Position-Based (U-Shaped) Attribution Model:
-- Position-based (40/20/40) attribution calculation
WITH touchpoint_combinations AS (
    SELECT
        customer_id,
        conversion_id,
        touchpoint_id,
        channel,
        conversion_value,
        timestamp,
        ROW_NUMBER() OVER (
            PARTITION BY customer_id, conversion_id
            ORDER BY timestamp
        ) AS touch_sequence,
        COUNT(*) OVER (
            PARTITION BY customer_id, conversion_id
        ) AS touch_count
    FROM fact_touchpoints
    WHERE conversion_id IS NOT NULL
),
attribution_weights AS (
    SELECT
        customer_id,
        conversion_id,
        touchpoint_id,
        channel,
        conversion_value,
        CASE
            WHEN touch_count = 1 THEN 1.0
            WHEN touch_count = 2 THEN 0.5               -- no middle touch: split evenly
            WHEN touch_sequence = 1 THEN 0.4            -- first touch
            WHEN touch_sequence = touch_count THEN 0.4  -- last touch
            ELSE 0.2 / (touch_count - 2)                -- middle touches share 20%
        END AS attribution_weight
    FROM touchpoint_combinations
)
SELECT
    channel,
    SUM(attribution_weight * conversion_value) AS attributed_revenue,
    COUNT(DISTINCT conversion_id) AS attributed_conversions
FROM attribution_weights
GROUP BY channel;
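The 40/20/40 weighting above is easier to reason about as a small function; a sketch (function name hypothetical), including the two-touch edge case, where there is no middle touch to absorb the remaining 20%:

```python
def position_based_weights(n_touches):
    """U-shaped (40/20/40) attribution weights for a journey of
    n_touches: 40% each to first and last touch, with the remaining
    20% split evenly across the middle touches."""
    if n_touches == 1:
        return [1.0]
    if n_touches == 2:
        return [0.5, 0.5]  # no middle touch to take the 20%
    middle = 0.2 / (n_touches - 2)
    return [0.4] + [middle] * (n_touches - 2) + [0.4]
```

Whatever weighting you choose, the weights within one conversion should sum to 1 so that attributed revenue reconciles to actual revenue.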
Time-Decay Attribution:
WITH time_decay_attribution AS (
    SELECT
        touchpoint_id,
        customer_id,
        conversion_id,
        channel,
        conversion_value,
        -- conversion_timestamp must be joined onto each touchpoint row
        -- from the conversion event; it is not part of fact_touchpoints
        EXP(-0.1 * DATEDIFF('hour', timestamp, conversion_timestamp)) AS time_decay_weight
    FROM fact_touchpoints
    WHERE conversion_id IS NOT NULL
),
normalized_attribution AS (
    SELECT
        touchpoint_id,
        channel,
        conversion_value,
        time_decay_weight
            / SUM(time_decay_weight) OVER (PARTITION BY conversion_id) AS normalized_weight
    FROM time_decay_attribution
)
SELECT
    channel,
    SUM(conversion_value * normalized_weight) AS attributed_revenue
FROM normalized_attribution
GROUP BY channel;
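The decay-and-normalize step sketched outside SQL, assuming the same 0.1 per-hour decay rate used above (function name hypothetical):

```python
import math

def time_decay_weights(hours_before_conversion, decay_rate=0.1):
    """Exponential time-decay attribution: weight each touch by
    exp(-decay_rate * hours before conversion), then normalize so
    the weights within one conversion sum to 1."""
    raw = [math.exp(-decay_rate * h) for h in hours_before_conversion]
    total = sum(raw)
    return [w / total for w in raw]
```

Touches closer to the conversion get more credit; the decay rate controls how quickly older touches lose influence and should be tuned to your typical consideration window.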
Customer Journey Analysis:
-- Customer journey path analysis
WITH customer_journeys AS (
    SELECT
        customer_id,
        STRING_AGG(channel, ' > ' ORDER BY timestamp) AS journey_path,
        COUNT(*) AS touchpoint_count,
        DATEDIFF('day', MIN(timestamp), MAX(timestamp)) AS journey_length_days,
        SUM(CASE WHEN conversion_id IS NOT NULL THEN 1 ELSE 0 END) AS conversions
    FROM fact_touchpoints
    GROUP BY customer_id
)
SELECT
    journey_path,
    COUNT(*) AS customer_count,
    AVG(touchpoint_count) AS avg_touchpoints,
    AVG(journey_length_days) AS avg_journey_days,
    SUM(conversions) AS total_conversions,
    -- share of customers on this path with at least one conversion
    SUM(CASE WHEN conversions > 0 THEN 1 ELSE 0 END) * 1.0 / COUNT(*) AS conversion_rate
FROM customer_journeys
WHERE touchpoint_count <= 10  -- filter out extremely long journeys
GROUP BY journey_path
ORDER BY customer_count DESC;
Advanced Analytics Implementation
Cohort Analysis and Retention Modeling
Customer Cohort Analysis:
WITH monthly_cohorts AS (
    SELECT
        customer_id,
        DATE_TRUNC('month', MIN(order_date)) AS cohort_month
    FROM fact_transactions
    GROUP BY customer_id
),
cohort_table AS (
    SELECT
        c.cohort_month,
        DATE_TRUNC('month', t.order_date) AS transaction_month,
        COUNT(DISTINCT c.customer_id) AS customers
    FROM monthly_cohorts c
    LEFT JOIN fact_transactions t ON c.customer_id = t.customer_id
    GROUP BY c.cohort_month, DATE_TRUNC('month', t.order_date)
)
SELECT
    cohort_month,
    DATEDIFF('month', cohort_month, transaction_month) AS month_number,
    customers,
    FIRST_VALUE(customers) OVER (
        PARTITION BY cohort_month
        ORDER BY transaction_month
    ) AS cohort_size,
    customers * 100.0 / FIRST_VALUE(customers) OVER (
        PARTITION BY cohort_month
        ORDER BY transaction_month
    ) AS retention_rate
FROM cohort_table
ORDER BY cohort_month, month_number;
Advanced Segmentation Models
RFM (Recency, Frequency, Monetary) Segmentation:
WITH rfm_metrics AS (
    SELECT
        customer_id,
        DATEDIFF('day', MAX(order_date), CURRENT_DATE()) AS recency,
        COUNT(*) AS frequency,
        SUM(revenue) AS monetary_value
    FROM fact_transactions
    GROUP BY customer_id
),
rfm_scores AS (
    SELECT
        customer_id,
        recency,
        frequency,
        monetary_value,
        NTILE(5) OVER (ORDER BY recency DESC) AS recency_score,
        NTILE(5) OVER (ORDER BY frequency ASC) AS frequency_score,
        NTILE(5) OVER (ORDER BY monetary_value ASC) AS monetary_score
    FROM rfm_metrics
),
rfm_segments AS (
    SELECT
        customer_id,
        recency,
        frequency,
        monetary_value,
        recency_score || frequency_score || monetary_score AS rfm_score,
        CASE
            WHEN recency_score >= 4 AND frequency_score >= 4 THEN 'Champions'
            -- 'New Customers' must be tested before the broader buckets below
            WHEN recency_score >= 4 AND frequency_score = 1 THEN 'New Customers'
            WHEN recency_score >= 3 AND frequency_score >= 3 THEN 'Loyal Customers'
            WHEN recency_score >= 3 AND frequency_score <= 2 THEN 'Potential Loyalists'
            WHEN recency_score <= 2 AND frequency_score >= 3 THEN 'At Risk'
            WHEN recency_score <= 1 AND frequency_score <= 2 THEN 'Lost Customers'
            ELSE 'Others'
        END AS customer_segment
    FROM rfm_scores
)
SELECT
    customer_segment,
    COUNT(*) AS customer_count,
    AVG(recency) AS avg_recency,
    AVG(frequency) AS avg_frequency,
    AVG(monetary_value) AS avg_monetary_value
FROM rfm_segments
GROUP BY customer_segment;
Marketing Mix Modeling (MMM)
Media Mix Attribution:
-- Adstock transformation for MMM (7-day carryover window, 0.8 daily decay)
WITH channel_daily AS (
    SELECT
        date,
        channel,
        SUM(spend) AS spend
    FROM agg_campaign_performance_daily
    GROUP BY date, channel
),
adstock_transformed AS (
    -- self-join over a trailing window: today's media pressure is today's
    -- spend plus geometrically decayed spend from the prior six days
    SELECT
        a.date,
        a.channel,
        a.spend,
        SUM(b.spend * POWER(0.8, DATEDIFF('day', b.date, a.date))) AS adstock_spend
    FROM channel_daily a
    JOIN channel_daily b
        ON a.channel = b.channel
        AND b.date BETWEEN DATEADD('day', -6, a.date) AND a.date
    GROUP BY a.date, a.channel, a.spend
),
saturation_transformed AS (
    SELECT
        date,
        channel,
        spend,
        adstock_spend,
        -- Hill saturation transformation (1000 is the half-saturation point)
        adstock_spend / (adstock_spend + 1000) AS saturated_adstock_spend
    FROM adstock_transformed
)
-- This feeds into regression analysis to determine channel contribution
SELECT
    s.channel,
    SUM(s.spend) AS total_spend,
    SUM(s.saturated_adstock_spend) AS total_media_pressure,
    CORR(s.saturated_adstock_spend, r.daily_revenue) AS revenue_correlation
FROM saturation_transformed s
JOIN (
    SELECT order_date AS date, SUM(revenue) AS daily_revenue
    FROM fact_transactions
    GROUP BY order_date
) r ON s.date = r.date
GROUP BY s.channel;
Technology Stack and Implementation
Modern Data Stack Components
Core Infrastructure:
# Data Warehouse
warehouse: snowflake  # or bigquery, redshift
instance_size: medium
auto_scaling: enabled

# ETL/ELT Platform
etl_platform: fivetran  # or stitch, airbyte
connectors:
  - google_ads
  - facebook_ads
  - shopify
  - klaviyo
  - google_analytics_4

# Transformation Layer
dbt_core: true
models:
  - staging
  - intermediate
  - marts
  - metrics

# Visualization
bi_tool: looker  # or tableau, powerbi
dashboards:
  - executive_summary
  - campaign_performance
  - customer_analytics
  - attribution_analysis
dbt Project Structure:
dbt_project/
├── models/
│   ├── staging/
│   │   ├── _sources.yml
│   │   ├── stg_google_ads.sql
│   │   ├── stg_facebook_ads.sql
│   │   └── stg_shopify.sql
│   ├── intermediate/
│   │   ├── int_customer_journeys.sql
│   │   └── int_attribution_touchpoints.sql
│   └── marts/
│       ├── marketing/
│       │   ├── dim_campaigns.sql
│       │   ├── fact_touchpoints.sql
│       │   └── agg_campaign_performance.sql
│       └── customer/
│           ├── dim_customers.sql
│           └── fact_customer_ltv.sql
├── macros/
│   ├── attribution_models.sql
│   └── date_helpers.sql
└── tests/
    ├── generic/
    └── singular/
Real-Time vs. Batch Processing
Batch Processing (Most Common):
# Daily batch job schedule
schedule:
  - time: "02:00 UTC"
    jobs:
      - extract_advertising_data
      - extract_ecommerce_data
      - run_dbt_transformations
      - update_attribution_models
      - refresh_aggregates
      - update_dashboards
Real-Time Processing (Advanced):
# Real-time streaming for high-priority events
streaming_events:
  - purchase_conversions
  - email_opens
  - website_sessions
  - ad_clicks
processing_engine: apache_kafka
stream_processing: apache_spark
latency_target: "<5 minutes"
Data Quality and Governance
Data Quality Framework:
-- Data quality tests in dbt
-- Volume test: check for significant week-over-week data drops
WITH daily_counts AS (
    SELECT
        date,
        COUNT(*) AS row_count
    FROM {{ ref('stg_google_ads') }}
    WHERE date >= CURRENT_DATE - 14
    GROUP BY date
)
SELECT
    date AS test_date,
    'google_ads_spend' AS metric,
    row_count AS current_count,
    LAG(row_count, 7) OVER (ORDER BY date) AS previous_count,
    (row_count - LAG(row_count, 7) OVER (ORDER BY date)) * 100.0
        / NULLIF(LAG(row_count, 7) OVER (ORDER BY date), 0) AS percent_change
FROM daily_counts
QUALIFY ABS(percent_change) > 50;  -- alert if >50% change (QUALIFY: Snowflake/BigQuery)
-- Freshness test: data recency validation
SELECT
    'facebook_ads' AS source,
    MAX(date) AS latest_data_date,
    DATEDIFF('day', MAX(date), CURRENT_DATE()) AS days_behind
FROM {{ ref('stg_facebook_ads') }}
HAVING DATEDIFF('day', MAX(date), CURRENT_DATE()) > 2;  -- alert if data is >2 days old
Cost Optimization
Query Optimization Strategies:
-- Partitioned tables for cost control
CREATE TABLE fact_touchpoints (
    touchpoint_id STRING,
    customer_id STRING,
    date DATE,
    channel STRING,
    spend DECIMAL(10,2),
    revenue DECIMAL(10,2)
)
PARTITION BY date
CLUSTER BY (customer_id, channel);

-- Materialized views for common aggregates
CREATE MATERIALIZED VIEW mv_daily_channel_performance AS
SELECT
    date,
    channel,
    SUM(spend) AS total_spend,
    SUM(revenue) AS total_revenue,
    COUNT(DISTINCT customer_id) AS unique_customers
FROM fact_touchpoints
GROUP BY date, channel;
Business Intelligence and Visualization
Executive Dashboard Requirements
Key Performance Indicators:
-- Executive KPI dashboard query
WITH daily_metrics AS (
    SELECT
        date,
        SUM(spend) AS total_spend,
        SUM(revenue) AS total_revenue,
        SUM(new_customers) AS new_customers
    FROM agg_campaign_performance_daily
    WHERE date >= CURRENT_DATE() - 30
    GROUP BY date
)
SELECT
    'Last 30 Days' AS period,
    SUM(total_spend) AS spend,
    SUM(total_revenue) AS revenue,
    SUM(new_customers) AS customers_acquired,
    -- ratio of sums, not an average of daily ratios
    SUM(total_revenue) / NULLIF(SUM(total_spend), 0) AS blended_roas,
    (SUM(total_revenue) - SUM(total_spend)) AS profit,
    ((SUM(total_revenue) - SUM(total_spend)) / NULLIF(SUM(total_spend), 0)) * 100 AS roi_percent
FROM daily_metrics;
Channel Performance Dashboard:
-- Channel performance summary
SELECT
    t.channel,
    SUM(t.cost) AS total_spend,
    SUM(t.conversion_value) AS attributed_revenue,
    SUM(t.conversion_value) / NULLIF(SUM(t.cost), 0) AS roas,
    COUNT(DISTINCT t.customer_id) AS customers_acquired,
    SUM(t.cost) / NULLIF(COUNT(DISTINCT t.customer_id), 0) AS cac,
    AVG(c.lifetime_value) AS avg_ltv,
    AVG(c.lifetime_value)
        / NULLIF(SUM(t.cost) / NULLIF(COUNT(DISTINCT t.customer_id), 0), 0) AS ltv_cac_ratio
FROM fact_touchpoints t
JOIN dim_customers c ON t.customer_id = c.customer_id
WHERE t.timestamp >= CURRENT_DATE() - 30
GROUP BY t.channel
ORDER BY total_spend DESC;
Advanced Analytics Dashboards
Customer Journey Analysis:
- Journey path visualization
- Conversion funnel analysis
- Time-to-conversion metrics
- Channel interaction effects
Attribution Model Comparison:
- First-click vs. last-click vs. data-driven attribution
- Channel performance across different models
- Attribution model impact on budget allocation
- Model accuracy validation
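To see why model comparison matters for budget allocation, apply two single-touch models to the same journeys; a toy sketch with hypothetical data:

```python
def attribute(journeys, model):
    """Credit each journey's revenue to one channel under a
    single-touch model ('first_click' or 'last_click').
    journeys: list of (channel_path, revenue) tuples."""
    credit = {}
    for path, revenue in journeys:
        channel = path[0] if model == "first_click" else path[-1]
        credit[channel] = credit.get(channel, 0) + revenue
    return credit

# Hypothetical journeys: channel path in order, then order revenue
journeys = [
    (["facebook", "email", "google"], 100),
    (["facebook", "google"], 50),
    (["email"], 25),
]
```

Here first-click credits Facebook with $150 while last-click credits Google with $150: the same data produces opposite budget signals, which is why the dashboard should show both alongside a multi-touch model.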
Cohort and Retention Analysis:
- Monthly cohort retention rates
- Revenue per cohort over time
- Cohort LTV progression
- Retention rate by acquisition channel
Implementation Roadmap
Phase 1: Foundation (Weeks 1-4)
Infrastructure Setup:
Week 1: Data warehouse and ETL platform selection
Week 2: Core connector configuration (ads, analytics, ecommerce)
Week 3: Basic data extraction and loading
Week 4: Data quality validation and monitoring setup
Phase 2: Core Models (Weeks 5-8)
Data Modeling:
Week 5: Customer and product dimension tables
Week 6: Transaction and touchpoint fact tables
Week 7: Basic attribution modeling implementation
Week 8: Campaign performance aggregates
Phase 3: Advanced Analytics (Weeks 9-12)
Advanced Features:
Week 9: Multi-touch attribution models
Week 10: Customer lifetime value calculations
Week 11: Cohort analysis and segmentation
Week 12: Media mix modeling foundation
Phase 4: Visualization and Optimization (Weeks 13-16)
Business Intelligence:
Week 13: Executive dashboard development
Week 14: Campaign performance dashboards
Week 15: Customer analytics dashboards
Week 16: Training and optimization
Common Implementation Challenges
Data Quality Issues
Problem: Inconsistent Data Formats
- Different date formats across platforms
- Varying currency representations
- Inconsistent naming conventions
- Missing or null values
Solution: Standardization Layer
-- Data standardization example
SELECT
    campaign_id,
    UPPER(TRIM(campaign_name)) AS campaign_name_clean,
    PARSE_DATE('%Y-%m-%d', date_string) AS date_standardized,  -- BigQuery; use TO_DATE in Snowflake
    ROUND(CAST(spend_string AS DECIMAL(10,2)), 2) AS spend_clean
FROM raw_campaign_data;
Attribution Complexity
Challenge: Cross-Device Tracking
- Users switch between devices
- Cookie limitations and privacy updates
- Mobile app vs. web attribution
- Email click tracking across devices
Solution: Probabilistic Matching
-- Probabilistic customer matching
WITH customer_signals AS (
    SELECT
        touchpoint_id,
        email,
        phone,
        device_id,
        ip_address,
        timestamp
    FROM touchpoint_events
),
match_scores AS (
    SELECT
        a.touchpoint_id AS touchpoint_a,
        b.touchpoint_id AS touchpoint_b,
        CASE
            WHEN a.email = b.email THEN 1.0
            WHEN a.phone = b.phone THEN 0.8
            WHEN a.ip_address = b.ip_address
                AND ABS(DATEDIFF('minute', a.timestamp, b.timestamp)) < 60 THEN 0.6
            ELSE 0.0
        END AS match_score
    FROM customer_signals a
    JOIN customer_signals b ON a.touchpoint_id < b.touchpoint_id  -- compare each pair once
)
-- Cluster touchpoints with high match scores
SELECT touchpoint_a, touchpoint_b, match_score
FROM match_scores
WHERE match_score > 0.7;
Performance and Cost Management
Challenge: Query Performance
- Large dataset scan costs
- Complex attribution calculations
- Real-time dashboard requirements
- Concurrent user access
Solution: Incremental Processing
-- Incremental model in dbt
{{ config(
    materialized='incremental',
    unique_key=['date', 'channel'],
    on_schema_change='fail'
) }}

SELECT
    date,
    channel,
    SUM(spend) AS total_spend,
    SUM(revenue) AS total_revenue
FROM {{ ref('stg_touchpoints') }}
{% if is_incremental() %}
WHERE date > (SELECT MAX(date) FROM {{ this }})
{% endif %}
GROUP BY date, channel;
The Bottom Line
A marketing data warehouse transforms chaotic, fragmented data into actionable business intelligence.
The brands that invest in unified data infrastructure gain massive competitive advantages: faster decision-making, accurate attribution, optimized budget allocation, and deep customer insights. Those that continue operating with fragmented data will fall behind as competition intensifies.
Implement the DATA framework systematically: integrate all data sources comprehensively, design scalable architecture, transform data into business-ready models, and build attribution that reflects reality.
Start with core integrations and basic models, then expand to advanced analytics and real-time capabilities. Focus on data quality and governance from day one—garbage in, garbage out applies especially to marketing analytics.
Remember: the goal isn't to have the fanciest data warehouse—it's to have reliable, unified data that enables confident business decisions. Build for your current needs with future scalability in mind.
Your marketing success depends on data-driven decisions. Build the infrastructure that makes those decisions faster, more accurate, and more profitable.
Related Articles
- GA4 for E-Commerce: The Setup Guide You Actually Need
- E-Commerce Landing Pages That Convert: A Data-Driven Guide
- Pinterest Organic E-Commerce Strategy: The Complete Growth Guide
- E-Commerce Referral Program Guide: Turn Customers Into Your Best Salespeople
- Marketing Budget Planning Framework: Strategic Allocation for E-Commerce Growth
Additional Resources
- Meta Ads Manager Help
- Klaviyo Marketing Resources
- Triple Whale Attribution
- Attentive SMS Resources
- Pinterest Ads
Ready to Grow Your Brand?
ATTN Agency helps DTC and e-commerce brands scale profitably through paid media, email, SMS, and more. Whether you're looking to optimize your current strategy or launch something new, we'd love to chat.
Book a Free Strategy Call or Get in Touch to learn how we can help your brand grow.