Backtesting Guide

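Backtesting simulates a trading strategy on historical data to estimate how it would have performed. A careful backtest cannot guarantee future results, but it shows whether a strategy is worth pursuing and exposes weaknesses before real capital is at risk.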

The Backtest Equation

Backtest Quality = (Data Quality × Cost Modeling × Execution Realism) - Bias

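This is a heuristic, not a number to compute. The first three factors multiply, so a weakness in any one of them undermines the whole result, and bias subtracts from whatever remains. Let's break them down.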

Data Quality

Requirements for Quality Data

Requirement	Why It Matters
Adjusted prices	Splits and dividends create artificial price jumps
Survivorship-free	Delisted securities must be included
Point-in-time	Use only the data that was available at that moment
Complete coverage	No gaps or missing bars
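As a rough illustration of the "complete coverage" requirement, the sketch below scans a list of daily bar dates for missing weekdays. The function name and the sample dates are hypothetical, and the check assumes a Monday to Friday trading week, so exchange holidays will show up as false positives unless you filter them with a trading calendar.

```python
from datetime import date, timedelta

def find_missing_weekdays(bar_dates):
    """Return weekdays between the first and last bar that have no bar."""
    present = set(bar_dates)
    day, last = min(present), max(present)
    missing = []
    while day <= last:
        # weekday() is 0-4 for Monday-Friday
        if day.weekday() < 5 and day not in present:
            missing.append(day)
        day += timedelta(days=1)
    return missing

# Hypothetical daily bars with Thursday 2024-01-04 missing
bar_dates = [date(2024, 1, 2), date(2024, 1, 3), date(2024, 1, 5), date(2024, 1, 8)]
missing = find_missing_weekdays(bar_dates)
print(missing)
```

Running a check like this before a backtest makes gaps visible instead of letting them silently distort indicators.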

Data Adjustments

# VecAlpha handles adjustments automatically
data = vecalpha.get_data(
    symbol='AAPL',
    start='2020-01-01',
    adjusted=True,      # Split/dividend adjusted
    survivorship_free=True  # Includes delisted
)

Point-in-Time Data

Critical for avoiding look-ahead bias. Every value a signal reads must have been published before the trade is placed:

# WRONG: Acts on an earnings figure before it was publicly released
if earnings_announced and earnings > expected:
    buy()

# CORRECT: Only use data available before trade
if yesterday.close > yesterday.open:
    buy()
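One way to enforce point-in-time access is to store a release date with every fundamental value and filter on it. The sketch below assumes a hypothetical record layout (`period_end`, `released_at`, `eps`) and made-up figures; it is not VecAlpha's data model.

```python
from datetime import date

# Hypothetical records: each value carries the date it became public
earnings = [
    {"period_end": date(2023, 9, 30), "released_at": date(2023, 11, 2), "eps": 1.10},
    {"period_end": date(2023, 12, 31), "released_at": date(2024, 2, 1), "eps": 1.25},
]

def latest_known(records, as_of):
    """Return the most recently released record visible on `as_of`, or None."""
    visible = [r for r in records if r["released_at"] <= as_of]
    return max(visible, key=lambda r: r["released_at"]) if visible else None

# On 2024-01-15 the Q4 figure exists in the database but was not yet public,
# so the backtest should still see the Q3 value
snapshot = latest_known(earnings, date(2024, 1, 15))
print(snapshot["eps"])
```

Filtering on the release date rather than the fiscal period end keeps a backtest from trading on numbers the market had not yet seen.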

Cost Modeling

Transaction Costs

Every trade costs money. Include:

Cost Type	Typical Range (per trade)	Impact
Commission	0.01% - 0.1%	Reduces returns linearly
Slippage	0.01% - 0.1%	Higher for larger orders
Spread	0.01% - 0.05%	Varies by liquidity

# VecAlpha backtest configuration
backtest_config = {
    'commission': 0.001,      # 0.1% commission
    'slippage_model': 'volume_share',  # Proportional to order size
    'slippage_impact': 0.1,   # Market impact coefficient
}

Slippage Models

Different models for different markets:

# Fixed slippage, as a fraction of price
slippage = 0.0005  # 5 basis points

# Volume-based slippage (more realistic), in price units per share
slippage = order_size / daily_volume * price * 0.1

# Volatility-adjusted slippage, in price units per share
slippage = atr * 0.1  # 10% of ATR
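The three models above can be easier to compare when they all return the same unit. The sketch below expresses each one as a fraction of price and applies it to a fill price. The function names and default coefficients are illustrative assumptions, not VecAlpha's built-in models.

```python
def fixed_slippage(rate=0.0005):
    """Constant slippage: 5 basis points by default."""
    return rate

def volume_share_slippage(order_size, daily_volume, impact=0.1):
    """Slippage grows with the share of daily volume the order consumes."""
    return impact * order_size / daily_volume

def volatility_slippage(atr, price, atr_fraction=0.1):
    """Slippage as a share of ATR, converted to a fraction of price."""
    return atr_fraction * atr / price

def fill_price(price, slippage, side):
    """Buys fill above the quoted price, sells fill below it."""
    direction = 1 if side == "buy" else -1
    return price * (1 + direction * slippage)

# 10,000 shares against 1,000,000 daily volume is 1% participation
slip = volume_share_slippage(order_size=10_000, daily_volume=1_000_000)
print(fill_price(100.0, slip, "buy"))
print(fill_price(100.0, slip, "sell"))
```

Keeping every model in the same unit lets you swap one for another without touching the fill logic.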

Impact on High-Turnover Strategies

Costs scale with the number of trades. Assuming a total cost of 0.2% per trade:

Strategy A: 100 trades/year, 10% gross return
Cost: 100 × 0.2% = 20%
Net return: 10% - 20% = -10% (LOSS)

Strategy B: 10 trades/year, 10% gross return
Cost: 10 × 0.2% = 2%
Net return: 10% - 2% = 8% (PROFIT)
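The arithmetic above can be reproduced with a short sketch. It uses the same simple additive approximation (each trade subtracts a fixed cost from the gross return) and adds a break-even trade count. The function names are illustrative.

```python
def net_return(gross_return, trades_per_year, cost_per_trade):
    """Additive approximation: each trade subtracts a fixed cost."""
    return gross_return - trades_per_year * cost_per_trade

def break_even_trades(gross_return, cost_per_trade):
    """Trades per year at which costs consume the whole gross return."""
    return gross_return / cost_per_trade

cost = 0.002  # 0.2% per trade, as in the example above
print(f"Strategy A: {net_return(0.10, 100, cost):.1%}")
print(f"Strategy B: {net_return(0.10, 10, cost):.1%}")
print(f"Break-even: {break_even_trades(0.10, cost):.0f} trades/year")
```

With these inputs the break-even point is 50 trades per year: any strategy that trades more often than that needs a higher gross return to stay profitable.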

Execution Realism

Order Types

Model the orders your strategy will use:

Order Type	When to Use	Modeling
Market	Immediate execution	Slippage costs
Limit	Price target	Fill probability
Stop	Risk management	Trigger timing

# Market order (with slippage)
self.buy(size=100, type='market')

# Limit order (may not fill)
self.buy(size=100, type='limit', price=current_price * 0.99)

# Stop order (triggers on price)
self.sell(size=position, type='stop', price=entry_price * 0.95)

Fill Assumptions

Be realistic about fills:

# Too optimistic: Assume limit always fills
if price <= limit_price:
    filled = True

# More realistic: Partial fills, rejections
filled = simulate_fill_probability(
    order_size=size,
    available_volume=bar_volume,
    price_distance=limit_price - current_price
)
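The `simulate_fill_probability` call above is not defined in this guide. As a simpler, deterministic stand-in, the sketch below models a participation-capped fill for a buy limit order: the order fills only if the bar trades below the limit, and only up to a fixed share of the bar's volume. The function name, its parameters, and the 10% participation cap are assumptions for illustration.

```python
def simulate_fill(order_size, bar_volume, bar_low, limit_price, max_participation=0.1):
    """Return the filled quantity for a buy limit order within one bar."""
    # Require the bar to trade below the limit; a mere touch may not reach our order
    if bar_low >= limit_price:
        return 0
    # Assume we can take at most a fixed share of the bar's volume
    available = int(bar_volume * max_participation)
    return min(order_size, available)

# Bar trades through the limit, but volume caps the fill (partial fill)
print(simulate_fill(order_size=1_000, bar_volume=5_000, bar_low=99.0, limit_price=99.5))
# Bar only touches the limit: treated as no fill
print(simulate_fill(order_size=1_000, bar_volume=5_000, bar_low=99.5, limit_price=99.5))
```

Treating a touch as a non-fill is deliberately conservative; a probabilistic model could instead assign touches a fill chance below 100%.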

Avoiding Bias

Look-Ahead Bias

Using future information:

# WRONG: Uses today's close to trade today
if close > open:
    buy()  # Can't know close until end of day

# CORRECT: Use yesterday's data
if prev_close > prev_open:
    buy()
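A common way to make this rule hard to violate is to lag every signal by one bar, so the decision for bar i can only read bar i-1. The sketch below uses plain dictionaries with hypothetical prices; the helper name is illustrative.

```python
def lagged_signals(bars):
    """signals[i] decides the trade for bar i using only bar i-1."""
    if not bars:
        return []
    signals = [False]  # nothing to trade on the first bar: no prior data
    for prev in bars[:-1]:
        signals.append(prev["close"] > prev["open"])
    return signals

# Hypothetical daily bars
bars = [
    {"open": 100.0, "close": 102.0},  # up day
    {"open": 102.0, "close": 101.0},  # down day
    {"open": 101.0, "close": 104.0},  # up day
]
signals = lagged_signals(bars)
print(signals)
```

The last bar's up day produces no signal here, because there is no following bar on which it could be traded.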

Survivorship Bias

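Testing only on companies that survived to the present inflates results, because the failures have already been filtered out:

# WRONG: Only current S&P 500 stocks
symbols = get_current_sp500()

# CORRECT: S&P 500 constituents as of the backtest date
symbols = get_sp500_members(date='2020-01-01')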
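How a historical membership lookup might work can be sketched with a table of add and remove dates. The tickers, dates, and helper name below are made up for illustration and do not describe how `get_sp500_members` is implemented.

```python
from datetime import date

# Hypothetical membership table: (symbol, date added, date removed or None)
membership = [
    ("AAA", date(2015, 1, 1), None),
    ("BBB", date(2016, 6, 1), date(2021, 3, 15)),  # later removed
    ("CCC", date(2022, 9, 1), None),               # joined after the test date
]

def members_on(table, as_of):
    """Return the symbols that were index members on `as_of`."""
    return sorted(
        symbol
        for symbol, added, removed in table
        if added <= as_of and (removed is None or as_of < removed)
    )

print(members_on(membership, date(2020, 1, 1)))
print(members_on(membership, date(2024, 1, 1)))
```

A backtest starting in 2020 built from today's list would wrongly include CCC and silently drop BBB.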

Selection Bias

Picking favorable test periods:

# WRONG: Cherry-pick bull market
start = '2020-04-01'  # Post-COVID bottom
end = '2021-12-01'    # Peak

# CORRECT: Test multiple market regimes
periods = [
    ('2018-01-01', '2019-12-31'),  # Normal
    ('2020-01-01', '2020-12-31'),  # Volatile
    ('2021-01-01', '2022-12-31'),  # Mixed
]

Performance Metrics

Return Metrics

Metric	Formula	Interpretation
Total Return	(End - Start) / Start	Overall profit
CAGR	(End/Start)^(1/years) - 1	Annualized growth
Monthly Return	Mean of monthly returns	Consistency check

Risk Metrics

Metric	Formula	Good Range
Sharpe Ratio	(Return - Rf) / StdDev	> 1.0
Sortino Ratio	(Return - Rf) / DownsideStd	> 1.5
Max Drawdown	Peak to trough decline	< 20%
Calmar Ratio	CAGR / |Max Drawdown|	> 1.0
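The return and risk formulas in the tables above can be sketched with the standard library alone. The function names, the 252-period annualization, and the sample equity curve are assumptions for illustration; VecAlpha's own calculations may differ in detail.

```python
import math
import statistics

def total_return(equity):
    return equity[-1] / equity[0] - 1

def cagr(equity, years):
    return (equity[-1] / equity[0]) ** (1 / years) - 1

def max_drawdown(equity):
    """Largest peak-to-trough decline, as a positive fraction."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

def sharpe_ratio(returns, risk_free_rate=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from per-period returns."""
    excess = [r - risk_free_rate / periods_per_year for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess) * math.sqrt(periods_per_year)

# Hypothetical equity curve sampled over two years
equity = [100.0, 110.0, 99.0, 121.0]
growth = cagr(equity, years=2)
drawdown = max_drawdown(equity)
print(f"Total Return: {total_return(equity):.2%}")
print(f"CAGR: {growth:.2%}")
print(f"Max Drawdown: {drawdown:.2%}")
print(f"Calmar Ratio: {growth / drawdown:.2f}")
```

Max drawdown is returned as a positive fraction here, so the Calmar ratio is a plain division.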

Trade Metrics

Metric	Formula	Target
Win Rate	Wins / Total Trades	> 45%
Profit Factor	Gross Profit / Gross Loss	> 1.5
Avg Win / Avg Loss	Average win / Average loss	> 1.0
Expectancy	(Win% × AvgWin) - (Loss% × AvgLoss)	> 0
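The trade metrics above can be computed directly from a list of per-trade profits and losses. This is a minimal sketch with made-up P&L values and an illustrative function name. Losses are stored as positive magnitudes so the formulas match the table.

```python
def trade_metrics(pnls):
    """Compute win rate, profit factor, win/loss ratio, and expectancy."""
    wins = [p for p in pnls if p > 0]
    losses = [-p for p in pnls if p < 0]  # positive magnitudes
    win_rate = len(wins) / len(pnls)
    loss_rate = len(losses) / len(pnls)
    avg_win = sum(wins) / len(wins) if wins else 0.0
    avg_loss = sum(losses) / len(losses) if losses else 0.0
    return {
        "win_rate": win_rate,
        "profit_factor": sum(wins) / sum(losses) if losses else float("inf"),
        "win_loss_ratio": avg_win / avg_loss if avg_loss else float("inf"),
        "expectancy": win_rate * avg_win - loss_rate * avg_loss,
    }

# Hypothetical per-trade P&L in account currency
pnls = [200.0, -100.0, 300.0, -100.0, 100.0]
metrics = trade_metrics(pnls)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

When there are no break-even trades, expectancy equals the average P&L per trade, which is a useful sanity check.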

# VecAlpha backtest results
results = backtest.run()

print(f"Total Return: {results.total_return:.2%}")
print(f"Sharpe Ratio: {results.sharpe_ratio:.2f}")
print(f"Max Drawdown: {results.max_drawdown:.2%}")
print(f"Win Rate: {results.win_rate:.2%}")
print(f"Profit Factor: {results.profit_factor:.2f}")

Walk-Forward Analysis

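Walk-forward analysis repeatedly optimizes a strategy on one window of data and then tests it on the next, unseen window. It is widely regarded as the gold standard for robustness testing: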

from vecalpha import WalkForwardAnalysis

wfa = WalkForwardAnalysis(
    train_period='2Y',    # 2 years for optimization
    test_period='6M',     # 6 months for out-of-sample
    anchor=False          # Rolling (vs anchored)
)

results = wfa.run(strategy, data)

# Compare in-sample vs out-of-sample
print(f"In-Sample Sharpe: {results.in_sample_sharpe:.2f}")
print(f"Out-of-Sample Sharpe: {results.out_of_sample_sharpe:.2f}")

# Rule of thumb: an OOS Sharpe below 50% of the IS Sharpe suggests overfitting
if results.out_of_sample_sharpe < results.in_sample_sharpe * 0.5:
    print("WARNING: Strategy may be overfitted")
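To show what rolling versus anchored means in practice, the sketch below generates train and test index ranges over a bar series. It illustrates the windowing scheme only; the function name and parameters are assumptions, not the internals of `WalkForwardAnalysis`.

```python
def walk_forward_windows(n_bars, train_size, test_size, anchored=False):
    """Return ((train_start, train_end), (test_start, test_end)) index pairs.

    End indices are exclusive. Rolling windows keep a fixed-length training
    span; anchored windows always start training at bar 0.
    """
    windows = []
    test_start = train_size
    while test_start + test_size <= n_bars:
        train_start = 0 if anchored else test_start - train_size
        windows.append(((train_start, test_start), (test_start, test_start + test_size)))
        test_start += test_size
    return windows

for train, test in walk_forward_windows(n_bars=10, train_size=4, test_size=2):
    print(f"train {train} -> test {test}")
```

Each test window begins exactly where its training window ends, so no out-of-sample bar is ever used for optimization in the same step.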

Monte Carlo Simulation

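Estimate the range of outcomes the strategy could plausibly produce, rather than relying on the single path that history happened to take: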

from vecalpha import MonteCarloSimulation

mc = MonteCarloSimulation(n_simulations=1000)
results = mc.run(strategy, data)

print(f"Expected Return: {results.mean_return:.2%}")
print(f"5th Percentile: {results.percentile_5:.2%}")
print(f"Probability of Loss: {results.prob_loss:.2%}")
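This guide does not say which resampling method `MonteCarloSimulation` uses. One common approach is to bootstrap the per-trade returns: draw trades with replacement, compound them, and repeat. The sketch below assumes that approach, with made-up trade returns and an illustrative function name.

```python
import random
import statistics

def bootstrap_returns(trade_returns, n_simulations=1000, seed=42):
    """Resample trades with replacement and return sorted total returns."""
    rng = random.Random(seed)  # fixed seed makes the run reproducible
    outcomes = []
    for _ in range(n_simulations):
        sample = rng.choices(trade_returns, k=len(trade_returns))
        equity = 1.0
        for r in sample:
            equity *= 1 + r  # compound each resampled trade
        outcomes.append(equity - 1)
    return sorted(outcomes)

# Hypothetical per-trade returns
trade_returns = [0.02, -0.01, 0.03, -0.015, 0.01, -0.005, 0.025, -0.02]
outcomes = bootstrap_returns(trade_returns)

mean_return = statistics.mean(outcomes)
percentile_5 = outcomes[int(0.05 * len(outcomes))]
prob_loss = sum(o < 0 for o in outcomes) / len(outcomes)

print(f"Expected Return: {mean_return:.2%}")
print(f"5th Percentile: {percentile_5:.2%}")
print(f"Probability of Loss: {prob_loss:.2%}")
```

Sampling with replacement matters here. Merely shuffling the order of the same trades would leave the compounded total return unchanged and only alter the drawdown path.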

Backtest Checklist

Before trusting a backtest:

Data is adjusted for splits/dividends
Survivorship-free data used
No look-ahead bias in signals
Realistic transaction costs included
Slippage model appropriate for strategy
Tested across multiple market regimes
Out-of-sample testing performed
Performance compared to buy-and-hold benchmark

Next Steps

Optimization - Fine-tune your strategy
Live Trading - Deploy to production
