Introduction

In Part 1 of this series, we introduced our Stock Market Analytics Platform built on Google Cloud Platform (GCP). We covered the architecture, infrastructure deployment with Terraform, batch processing with Dataproc/Spark, and real-time streaming with Kafka.

In this second part, we'll explore the remaining components: data transformation with dbt, workflow orchestration with Airflow, data visualization, performance optimization, and monitoring.

Data Transformation with dbt

After collecting both historical and real-time stock data in BigQuery, we use dbt (data build tool) to transform this raw data into analytics-ready models.

dbt Project Structure

Our dbt project follows a three-layer architecture:

  1. Staging Models: Initial data cleaning and standardization
  2. Intermediate Models: Business logic and transformations
  3. Mart Models: Final presentation layer for analysis
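As a minimal sketch of the staging layer, a model might look like the one below. The source, table, and column names here are hypothetical, assuming raw price records land in a BigQuery dataset registered as a dbt source called raw_stock_data:

```sql
-- models/staging/stg_stock_prices.sql (hypothetical example)
-- Staging layer: rename, cast, and standardize the raw feed.

with source as (

    select * from {{ source('raw_stock_data', 'stock_prices') }}

),

renamed as (

    select
        cast(ticker as string)      as symbol,
        cast(trade_date as date)    as trade_date,
        cast(open_price as numeric) as open_price,
        cast(close_price as numeric) as close_price,
        cast(volume as int64)       as volume
    from source

)

select * from renamed
```

Downstream intermediate and mart models then build on this cleaned output via {{ ref('stg_stock_prices') }}, so type casts and renames live in exactly one place.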

Transformation Logic

Our transformations include:

  1. Time-based Aggregations: Daily, weekly, monthly views of price data
  2. Technical Indicators: Moving averages, RSI, MACD, Bollinger Bands
  3. Statistical Analysis: Volatility, beta, correlation matrices
  4. Sectoral Analysis: Industry grouping and comparative metrics
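To illustrate how an indicator model can be expressed in SQL, here is a hedged sketch of a 20-day simple moving average and Bollinger Bands using BigQuery window functions. The model and column names are assumptions, building on a hypothetical stg_stock_prices staging model:

```sql
-- models/intermediate/int_price_indicators.sql (hypothetical example)
-- 20-day moving average and Bollinger Bands per symbol.

with prices as (

    select * from {{ ref('stg_stock_prices') }}

),

indicators as (

    select
        symbol,
        trade_date,
        close_price,
        -- 20-day simple moving average
        avg(close_price) over (
            partition by symbol
            order by trade_date
            rows between 19 preceding and current row
        ) as sma_20,
        -- 20-day rolling standard deviation
        stddev(close_price) over (
            partition by symbol
            order by trade_date
            rows between 19 preceding and current row
        ) as stddev_20
    from prices

)

select
    *,
    sma_20 + 2 * stddev_20 as bollinger_upper,
    sma_20 - 2 * stddev_20 as bollinger_lower
from indicators
```

The same windowing pattern extends naturally to the other indicators listed above: RSI and MACD are built from rolling averages of price changes, and beta or correlation matrices from rolling covariances across symbols.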

Testing and Documentation

Each dbt model includes: