Skip to content

GPU Acceleration

The engine supports GPU execution via CuPy, providing significant speedups for large-scale simulations.

Setup

Install CuPy

Install the CuPy package matching your CUDA version:

# CUDA 12.x
pip install cupy-cuda12x

# CUDA 11.x
pip install cupy-cuda11x

Verify

import cupy as cp
print(f"GPUs found: {cp.cuda.runtime.getDeviceCount()}")

GPU Backends

The engine provides two GPU backends:

graph LR
    subgraph Standard["GpuMonteCarloEngine"]
        A[GPU: simulate] --> B[CPU: tuples] --> C[CPU: risk]
    end

    subgraph Fused["GpuAcceleratedPipeline"]
        D[GPU: simulate] --> E[GPU: risk] --> F["CPU: 6 floats"]
    end

    style Fused fill:#e8f5e9,stroke:#4caf50,stroke-width:2px

GpuMonteCarloEngine

Drop-in replacement for CpuMonteCarloEngine. Implements the MonteCarloEngine protocol, so it works with RunMonteCarlo and ComputePortfolioRisk use cases.

from portfolio_risk_engine.infrastructure.simulation.gpu_monte_carlo_engine import GpuMonteCarloEngine

engine = GpuMonteCarloEngine(seed=42)
sim = RunMonteCarlo(engine).execute(
    market_params=params,
    initial_prices=initial_prices,
    num_simulations=1_000_000,
    time_horizon_days=21,
)
risk = ComputePortfolioRisk.execute(portfolio, sim)

Trade-off: terminal prices are transferred from GPU to CPU as Python tuples. For very large simulations (>1M paths), this conversion can be a bottleneck.

GpuAcceleratedPipeline

Fused pipeline that keeps all data on GPU. Simulation and risk computation happen in a single pass — only 6 scalar floats (the final risk metrics) are transferred back.

from portfolio_risk_engine.infrastructure.simulation.gpu_accelerated_pipeline import GpuAcceleratedPipeline

pipeline = GpuAcceleratedPipeline(seed=42)
risk = pipeline.run(
    market_params=params,
    initial_prices=initial_prices,
    weights=(0.5, 0.3, 0.2),
    num_simulations=1_000_000,
    time_horizon_days=21,
)

When to use which

  • GpuMonteCarloEngine: when you need per-path terminal prices (e.g. for custom analytics or visualization)
  • GpuAcceleratedPipeline: when you only need the 6 risk metrics (fastest, no memory overhead)

Performance

See Benchmarks for detailed CPU vs GPU comparisons across different portfolio sizes and simulation counts.

Key Factors

  • First call: includes JIT compilation / kernel launch overhead (~1-2s)
  • Small simulations (<10K paths): CPU may be faster due to GPU launch overhead
  • Large simulations (>100K paths): GPU provides significant speedups
  • Fused pipeline: eliminates tuple conversion overhead, especially impactful at >1M paths

Architecture Note

The GPU backends live in the infrastructure layer and do not modify any domain or application logic. The GpuMonteCarloEngine implements the same MonteCarloEngine protocol as the CPU version. The GpuAcceleratedPipeline bypasses the protocol for maximum performance but produces the same PortfolioRiskMetrics domain model.