certurk23 // Quant Lab
Research Papers & Notes
Quantitative research covering market microstructure, portfolio theory, execution analytics, machine learning infrastructure, and alternative data. All papers are independent, non-commercial research.
10 Papers · 2026 Latest · 5 Topics · Open Access

Article Index

01
VPIN and Order Flow Toxicity
Volume-synchronized probability of informed trading as a practical signal for detecting adverse selection in fragmented equity markets.
VPIN · Microstructure · Order Flow
02
Low-Latency Stack: RTX 5090 & Core Ultra 9
Why GPU-accelerated financial modeling still requires CPU-class hardware. A split-path architecture analysis for real trading systems.
Infrastructure · Latency · GPU
03
Hierarchical Risk Parity (HRP)
Cluster-based portfolio allocation that avoids covariance inversion and delivers more stable out-of-sample diversification.
Portfolio · HRP · Risk
04
Probabilistic Sharpe Ratio & Backtest Overfitting
A statistically rigorous alternative to raw Sharpe that adjusts for non-normality, sample length, and multiple testing bias.
Statistics · Backtest · PSR
05
Bid-Ask Spread Dynamics
Quoted, effective, and realized spreads as microstructure state variables. Decomposing the cost of immediacy for execution models.
Microstructure · Spreads · Execution
06
Sentiment Analysis in the Turkish Market (BIST)
Building a time-aware, Turkish-native NLP pipeline using Qwen and Llama for financial news signal extraction on Borsa İstanbul.
NLP · LLM · BIST
07
Sovereign AI: Local LLMs for Quant Research
Why self-hosted language models are structurally superior for investment research. The Bastion philosophy and local deployment case.
Sovereign AI · LLM · Privacy
08
Automating Alpha Discovery with Genetic Algorithms
Evolutionary search as an alpha hypothesis generator. Fitness design, selection pressure, and rigorous out-of-sample validation.
Alpha · Optimization · GA
09
Slippage and Latency Modeling in Backtesting
Why PnL arises from signal after implementation, not signal alone. Fill models, latency decomposition, and market impact.
Execution · Backtest · Impact
10
The Role of Alternative Data in Quant Finance
Satellite imagery, transaction panels, and e-commerce data as nowcasting signals. Data governance and timestamp integrity.
Alt Data · ML · Nowcasting

Microstructure

Paper 01

VPIN and Order Flow Toxicity: A Practical Microstructure Signal for Quantitative Traders

VPIN Order Flow Microstructure NASDAQ certurk23 Quant Lab · 2026

In modern electronic markets, price does not move solely because of public news. A substantial share of short-term price formation is driven by who is trading, how informed they are, and how aggressively they interact with available liquidity. For quantitative researchers, this leads to a central microstructure question: how can we detect when order flow becomes dangerous for liquidity providers?

One influential answer is VPIN, or Volume-Synchronized Probability of Informed Trading. VPIN is designed to measure order flow toxicity — the extent to which incoming flow is adverse to passive market participants such as market makers, internalizers, or execution algorithms. When toxicity rises, quoting tight spreads becomes more dangerous, slippage tends to increase, and short-horizon returns become harder to model using stationary assumptions.

At a high level, VPIN replaces calendar time with volume time. Instead of asking what happened during the last minute, it asks what happened during the last fixed amount of traded volume. This shift is important because information does not arrive at a constant rate in financial markets. During news events, open and close auctions, or stress periods, a single minute may contain far more information than several minutes in a quiet regime. Volume-synchronized sampling tries to normalize that uneven information arrival.

The core VPIN intuition is straightforward. For each fixed-volume bucket, we estimate the buy volume and the sell volume. The larger the imbalance between the two, the more one-sided the flow appears. A common representation is:

Definition — VPIN Estimator
$$\text{VPIN} = \frac{\sum |V_{buy} - V_{sell}|}{V_{total}}$$

In practice, VPIN is computed over a rolling window of the most recent \(n\) volume buckets:

$$\text{VPIN}_t = \frac{1}{nV} \sum_{i=t-n+1}^{t} \left| V_i^{buy} - V_i^{sell} \right|$$

Here, \(V\) is the fixed bucket size, and \(n\) is the number of buckets in the rolling sample. This normalized formulation makes VPIN interpretable as the recent average order-flow imbalance per unit of volume.

Why Order Flow Toxicity Matters

Order flow toxicity is essentially an adverse selection problem. Suppose a market maker posts bid and ask quotes. If the traders hitting those quotes are mostly uninformed and inventory shocks are balanced, the market maker can earn the spread with manageable risk. But if the counterparties are systematically better informed, the market maker is likely to buy just before prices fall and sell just before prices rise.

A rising VPIN is often associated with:

  • wider spreads and reduced displayed depth
  • higher short-term volatility
  • lower passive execution quality
  • more fragile market impact dynamics

Why Use Volume Buckets Instead of Time Bars?

Traditional indicators are built on fixed clock-time bars. That approach imposes an assumption that market activity is homogeneous through time. In reality, a one-minute interval at the open is not statistically comparable to a one-minute interval during midday inactivity. Volume bucketing ensures that each observation contains the same amount of trading activity, creating a more stable basis for measuring imbalances.

The Practical Challenge: Classifying Buy and Sell Volume

Exchanges do not always label every trade as buyer-initiated or seller-initiated in a directly usable way. Practitioners usually infer trade direction using:

  • the tick rule
  • Lee–Ready style signing against quotes
  • direct aggressor flags, when available in proprietary feeds
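As an illustration of the second approach, a Lee–Ready style signer can be sketched as follows; the column names (`price`, `bid`, `ask`) and the assumption that quotes are already aligned to trade timestamps are conventions of the sketch, not a fixed schema:

```python
import numpy as np
import pandas as pd

def lee_ready_sign(trades: pd.DataFrame) -> pd.Series:
    """Sign trades: +1 buyer-initiated, -1 seller-initiated.

    Assumes columns 'price', 'bid', 'ask' with quotes already
    aligned to trade timestamps (illustrative schema)."""
    mid = (trades["bid"] + trades["ask"]) / 2
    # Above the midpoint -> buy; below -> sell; at the midpoint -> undecided
    sign = pd.Series(np.where(trades["price"] > mid, 1.0,
                     np.where(trades["price"] < mid, -1.0, np.nan)),
                     index=trades.index)
    # Midpoint trades: fall back to the tick rule (sign of last price change)
    tick = np.sign(trades["price"].diff()).replace(0, np.nan).ffill()
    return sign.fillna(tick).fillna(1.0)
```

Real feeds complicate this further (quote staleness, odd lots, hidden liquidity), which is one reason direct aggressor flags are preferred when available.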

A Simple Python Implementation

vpin_core.py Python
import pandas as pd
import numpy as np

def classify_trade_sign(price_series: pd.Series) -> pd.Series:
    price_diff = price_series.diff()
    sign = np.sign(price_diff)
    sign = sign.replace(0, np.nan).ffill().fillna(1)
    return sign

def compute_vpin(trades: pd.DataFrame, bucket_volume: float, window: int = 50) -> pd.DataFrame:
    df = trades.copy()
    df["sign"] = classify_trade_sign(df["price"])

    df["buy_volume"]  = np.where(df["sign"] > 0, df["volume"], 0.0)
    df["sell_volume"] = np.where(df["sign"] < 0, df["volume"], 0.0)

    df["cum_volume"] = df["volume"].cumsum()
    df["bucket_id"] = ((df["cum_volume"] - 1) // bucket_volume).astype(int)

    bucketed = df.groupby("bucket_id").agg({
        "buy_volume":  "sum",
        "sell_volume": "sum",
        "volume":      "sum"
    })

    bucketed["imbalance"] = (bucketed["buy_volume"] - bucketed["sell_volume"]).abs()
    bucketed["vpin"] = bucketed["imbalance"].rolling(window).sum() / (
        bucketed["volume"].rolling(window).sum()
    )
    return bucketed
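A quick standalone sanity check of the estimator's behavior, with the rolling computation restated inline so it runs on its own: perfectly one-sided flow drives VPIN toward 1, balanced flow toward 0.

```python
import pandas as pd

def vpin_from_buckets(buy, sell, window: int) -> pd.Series:
    """Rolling VPIN over pre-formed volume buckets
    (restated inline so this check is self-contained)."""
    buy, sell = pd.Series(buy, dtype=float), pd.Series(sell, dtype=float)
    imbalance = (buy - sell).abs()
    total     = buy + sell
    return imbalance.rolling(window).sum() / total.rolling(window).sum()

# Perfectly one-sided flow: every bucket is all buys
one_sided = vpin_from_buckets([100] * 20, [0] * 20, window=5)
# Perfectly balanced flow: equal buy and sell volume per bucket
balanced  = vpin_from_buckets([50] * 20, [50] * 20, window=5)

print(one_sided.iloc[-1], balanced.iloc[-1])  # 1.0 0.0
```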

A More Realistic Extension

vpin_features.py Python
def add_microstructure_features(bucketed: pd.DataFrame) -> pd.DataFrame:
    df = bucketed.copy()
    df["order_flow_ratio"]     = (df["buy_volume"] - df["sell_volume"]) / df["volume"]
    df["abs_order_flow_ratio"] = df["order_flow_ratio"].abs()
    df["vpin_zscore"] = (
        (df["vpin"] - df["vpin"].rolling(100).mean()) /
         df["vpin"].rolling(100).std()
    )
    return df

How Quants Use VPIN

From a research perspective, VPIN is rarely the final alpha. It is more commonly used as a state variable:

  • A market making desk may reduce quote sizes when VPIN exceeds a threshold
  • An execution algorithm may shift from passive to more aggressive participation when toxicity rises
  • A short-horizon prediction model may condition its parameters on whether the current VPIN regime is high or low
  • A portfolio manager may use it as one input in a broader stress-monitoring dashboard
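The first use case can be made concrete with a toy throttle; the threshold, the linear scaling rule, and the floor below are purely illustrative choices, not a recommendation:

```python
def throttled_quote_size(base_size: float, vpin: float,
                         threshold: float = 0.4, floor: float = 0.1) -> float:
    """Toy quote-size throttle: display full size while VPIN is benign,
    then shrink linearly as toxicity rises (all constants illustrative)."""
    if vpin <= threshold:
        return base_size
    # Scale down linearly between the threshold and 1.0, never below the floor
    scale = max(floor, 1.0 - (vpin - threshold) / (1.0 - threshold))
    return base_size * scale

print(throttled_quote_size(1000, 0.2))  # benign regime: full size
print(throttled_quote_size(1000, 0.7))  # toxic regime: reduced size
```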

Limitations and Critiques

VPIN is useful, but should not be treated as a universal truth. Trade classification error can materially affect the estimate. Bucket size and rolling window length are hyperparameters; different choices can produce very different behavior. High VPIN does not always mean "informed trading" in a strict economic sense — it may also reflect mechanical one-sided flow, hedging pressure, or fragmented liquidity.

VPIN is often strongest as a descriptive microstructure measure rather than as a standalone predictive factor. It tells you something about the market's current fragility, but the exact mapping from fragility to future returns is context-dependent.

Paper 02

Building a Low-Latency Trading Stack with RTX 5090: Why GPU-Accelerated Financial Modeling Still Needs Core Ultra 9-Class CPUs

Infrastructure Latency GPU HFT certurk23 Quant Lab · 2026

A common misconception in quant infrastructure is that buying the fastest GPU automatically creates a low-latency trading stack. It does not. In practice, a modern trading plant is heterogeneous. The GPU is exceptional for massively parallel workloads such as Monte Carlo pricing, large cross-sectional inference, batched feature generation, and local LLM inference. The CPU remains indispensable for market data ingestion, feed normalization, lock-free queues, risk checks, order serialization, and any control path where tail latency matters more than raw throughput.

The correct mental model is not "GPU replaces CPU," but:

End-to-end latency decomposition
$$L_{\text{total}} = L_{\text{net}} + L_{\text{parse}} + L_{\text{feature}} + L_{\text{transfer}} + L_{\text{gpu}} + L_{\text{decision}} + L_{\text{order}}$$

If your strategy is sensitive to wire-to-wire latency, the GPU only occupies one term in that decomposition. The rest lives in the CPU, memory subsystem, NIC path, and operating system.

Why Core Ultra 9-Class Hardware Still Matters

  • Feed handlers are branchy, stateful, and serialization-heavy — GPUs dislike irregular control flow, CPUs excel at it
  • Low-latency stacks depend on pinned threads, cache locality, NUMA awareness, and predictable interrupt behavior
  • The GPU itself needs orchestration: batch assembly, DMA scheduling, memory pinning, and fallback handling all happen on the CPU
  • Pre-trade risk, throttles, and venue adapters are not embarrassingly parallel — they are latency-critical decision layers

A Practical Signal-Flow

  1. NIC receives multicast or market gateway packets
  2. CPU decodes messages and updates the local order book
  3. CPU computes lightweight features for immediate execution logic
  4. GPU receives batched tensors for heavier models
  5. CPU merges GPU output with risk and routing constraints
  6. Orders are serialized and transmitted from the CPU path
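Steps 3 and 4 of this flow hinge on the CPU-to-GPU handoff. A minimal PyTorch sketch of that handoff using a pinned host buffer and a non-blocking copy; the shapes are illustrative, and the code falls back to plain CPU behavior when no GPU is present:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Pinned (page-locked) host buffer: enables asynchronous DMA to the GPU.
# 4096 rows of 64 features is an arbitrary illustrative batch shape.
host_batch = torch.empty(4096, 64, pin_memory=(device == "cuda"))

def push_batch(features_cpu: torch.Tensor) -> torch.Tensor:
    """Stage a CPU feature batch in the pinned buffer, then transfer
    it to the device without blocking the producing CPU thread."""
    host_batch.copy_(features_cpu)
    return host_batch.to(device, non_blocking=True)

gpu_batch = push_batch(torch.randn(4096, 64))
print(gpu_batch.shape, gpu_batch.device)
```

Reusing one staging buffer avoids per-batch allocation on the hot path, which matters more for tail latency than for throughput.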

PyTorch Batched Inference Example

gpu_inference.py Python
import torch
import time

device = "cuda" if torch.cuda.is_available() else "cpu"

class ShortHorizonModel(torch.nn.Module):
    def __init__(self, in_dim=64, hidden=128):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(in_dim, hidden),
            torch.nn.ReLU(),
            torch.nn.Linear(hidden, 1)
        )

    def forward(self, x):
        return self.net(x)

model = ShortHorizonModel().to(device).eval()

# Simulated microstructure features from CPU-side pipeline
features = torch.randn(4096, 64, device=device)

with torch.no_grad():
    start = time.perf_counter()
    score = model(features)
    if device == "cuda": torch.cuda.synchronize()
    elapsed_ms = 1000 * (time.perf_counter() - start)

print(f"Inference latency: {elapsed_ms:.3f} ms")
print(score[:5].flatten())

The honest answer is this: if your objective is true low latency, an RTX 5090 is a powerful accelerator, not a complete solution. You still need CPU-class hardware because markets are not just matrix multiplication — they are interrupts, packets, queues, clocks, and risk gates.

Portfolio Theory

Paper 03

Hierarchical Risk Parity (HRP) for Portfolio Optimization

Portfolio HRP Risk Parity Clustering certurk23 Quant Lab · 2026

Classical portfolio theory is elegant, but in practical quant workflows it often breaks where the algebra looks strongest. Mean-variance optimization requires estimating expected returns and inverting the covariance matrix. In small samples, high dimensions, or unstable regimes, that process becomes fragile. Tiny changes in input can produce violent changes in weights.

Hierarchical Risk Parity (HRP) avoids direct covariance inversion and uses hierarchical clustering to structure allocation. Assets are not independent points in space — they form dependency clusters: banks, semiconductors, sovereign bonds, energy names, or factor-like groups.

HRP first measures similarity using correlation, then transforms that into a distance metric:

Correlation Distance
$$d_{ij} = \sqrt{\frac{1 - \rho_{ij}}{2}}$$

Once the hierarchy is built via clustering, HRP applies two steps: quasi-diagonalization (reorder the covariance matrix so similar assets are adjacent) and recursive bisection. If two clusters have variances \(\sigma_L^2\) and \(\sigma_R^2\), the left cluster receives weight:

Recursive Bisection Allocation
$$w_L = 1 - \frac{\sigma_L^2}{\sigma_L^2 + \sigma_R^2}, \quad w_R = 1 - w_L$$

Python Implementation

hrp.py Python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, leaves_list
from scipy.spatial.distance import squareform

def correl_dist(corr):
    return np.sqrt((1 - corr) / 2)

def get_cluster_var(cov, cluster_items):
    sub_cov = cov.loc[cluster_items, cluster_items]
    ivp = 1 / np.diag(sub_cov)
    ivp = ivp / ivp.sum()
    return np.dot(ivp, np.dot(sub_cov, ivp))

def hrp_allocation(returns: pd.DataFrame) -> pd.Series:
    cov  = returns.cov()
    corr = returns.corr()
    dist = correl_dist(corr)

    link    = linkage(squareform(dist.values, checks=False), method="single")
    sort_ix = corr.index[leaves_list(link)]

    weights  = pd.Series(1.0, index=sort_ix)
    clusters = [list(sort_ix)]

    while clusters:
        cluster = clusters.pop(0)
        if len(cluster) <= 1: continue

        split = len(cluster) // 2
        left, right = cluster[:split], cluster[split:]

        var_left  = get_cluster_var(cov, left)
        var_right = get_cluster_var(cov, right)

        alpha = 1 - var_left / (var_left + var_right)
        weights[left]  *= alpha
        weights[right] *= (1 - alpha)

        clusters.extend([left, right])

    return weights / weights.sum()
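A quick numeric check of the bisection rule with made-up cluster variances: the lower-variance cluster receives the larger share.

```python
def bisect_weights(var_left: float, var_right: float):
    """Inverse-variance split between two clusters (the recursive
    bisection rule): the lower-variance side gets more weight."""
    w_left = 1 - var_left / (var_left + var_right)
    return w_left, 1 - w_left

# Left cluster is half as risky as the right -> it gets 2/3 of the weight
print(bisect_weights(0.01, 0.02))
```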

HRP's advantage is that it treats dependence structure as an object worth modeling directly. That becomes valuable when correlations are unstable, samples are short, and optimization error matters more than elegant closed forms. It tends to behave well when traditional optimizers overreact to noisy means and covariances.

Paper 04

Probabilistic Sharpe Ratio (PSR) and Backtest Overfitting

Statistics Backtest PSR Sharpe certurk23 Quant Lab · 2026

The Sharpe ratio is one of the most abused statistics in quantitative finance. Two strategies can have the same Sharpe ratio even if one is estimated from a short, skewed, fat-tailed sample and the other from a long, well-behaved history. A raw Sharpe number says nothing about statistical confidence, non-normality, or multiple testing.

Standard Sharpe Ratio
$$\widehat{SR} = \frac{\hat{\mu}}{\hat{\sigma}}$$

The Probabilistic Sharpe Ratio (PSR) estimates the probability that an observed Sharpe ratio exceeds a benchmark \(SR^*\), while adjusting for skewness and kurtosis:

Probabilistic Sharpe Ratio
$$PSR(SR^*) = \Phi\left( \frac{(\widehat{SR} - SR^*)\sqrt{T-1}} {\sqrt{1 - \gamma_3 \widehat{SR} + \frac{\gamma_4 - 1}{4}\widehat{SR}^2}} \right)$$

Here, \(T\) is the sample length, \(\gamma_3\) is skewness, \(\gamma_4\) is kurtosis, and \(\Phi\) is the standard normal CDF. The denominator inflates uncertainty when returns are asymmetric or fat-tailed — exactly what classical Sharpe ignores.

Python Implementation

psr.py Python
import numpy as np
import pandas as pd
from scipy.stats import skew, kurtosis, norm

def probabilistic_sharpe_ratio(returns, sr_benchmark_annual=0.0, periods_per_year=252):
    r = pd.Series(returns).dropna()

    # PSR is defined on the per-period Sharpe, consistent with T observations;
    # plugging an annualized sr_hat into the formula would inflate the z-score
    sr_hat  = r.mean() / r.std(ddof=1)
    sr_star = sr_benchmark_annual / np.sqrt(periods_per_year)

    T  = len(r)
    g3 = skew(r, bias=False)
    g4 = kurtosis(r, fisher=False, bias=False)  # Pearson kurtosis

    numerator   = (sr_hat - sr_star) * np.sqrt(T - 1)
    denominator = np.sqrt(1 - g3 * sr_hat + ((g4 - 1) / 4.0) * sr_hat**2)
    z = numerator / denominator

    return {
        "sharpe_annualized": np.sqrt(periods_per_year) * sr_hat,
        "psr":      norm.cdf(z),
        "skew":     g3,
        "kurtosis": g4,
        "z_score":  z
    }

# Example
np.random.seed(42)
rets = np.random.normal(0.0005, 0.01, 500)
print(probabilistic_sharpe_ratio(rets, sr_benchmark_annual=1.0))

Ranking strategies by PSR instead of raw Sharpe forces the strategy to "earn" its Sharpe under a stronger evidentiary standard. It penalizes short histories, punishes ugly tail behavior, and gives you a way to compare estimated skill against a benchmark such as \(SR^* = 1\). Standard Sharpe is descriptive. PSR is inferential.

Execution & Market Making

Paper 05

Market Microstructure: Bid-Ask Spread Dynamics

Microstructure Spreads Execution Adverse Selection certurk23 Quant Lab · 2026

At the surface, the bid-ask spread looks trivial. But in market microstructure, the spread is not merely a transaction cost — it is a compressed summary of inventory risk, adverse selection, tick-size constraints, queue competition, and market-maker expectations about future order flow.

Quoted Spread
$$\text{Quoted Spread}_t = Ask_t - Bid_t$$
Midpoint
$$M_t = \frac{Ask_t + Bid_t}{2}$$

The effective spread measures how far the trade price deviates from the midpoint, adjusted for trade direction (\(D_t = +1\) for buyer-initiated, \(D_t = -1\) for seller-initiated):

Effective Spread
$$\text{Effective Spread}_t = 2 D_t (P_t - M_t)$$

The realized spread compares the execution price to a later midpoint \(M_{t+\Delta}\), showing ex-post dealer revenue net of information effects:

Realized Spread
$$\text{Realized Spread}_t = 2 D_t (P_t - M_{t+\Delta})$$

Why Do Spreads Widen?

  • Inventory risk: dealers need compensation when they accumulate too much long or short exposure
  • Processing costs: technology, clearing, capital, and exchange fees
  • Adverse selection: if incoming orders are likely informed, a passive market maker expects to lose on average after the trade

Python Implementation

spreads.py Python
import pandas as pd
import numpy as np

def compute_spreads(df: pd.DataFrame, horizon=5) -> pd.DataFrame:
    out = df.copy()
    out["mid"]            = (out["bid"] + out["ask"]) / 2
    out["quoted_spread"]  = out["ask"] - out["bid"]
    out["effective_spread"] = 2 * out["trade_sign"] * (out["trade_price"] - out["mid"])
    out["mid_future"]     = out["mid"].shift(-horizon)
    out["realized_spread"] = 2 * out["trade_sign"] * (out["trade_price"] - out["mid_future"])
    out["price_impact"]   = out["effective_spread"] - out["realized_spread"]
    return out
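A tiny worked example of the three definitions, with invented numbers: a buy at 100.05 against a 100.00 midpoint pays a 10-cent effective spread; if the mid then drifts up to 100.04, the dealer's realized spread is only 2 cents, and the remaining 8 cents is price impact.

```python
def spread_decomposition(trade_price, mid_now, mid_later, direction):
    """direction: +1 for buyer-initiated, -1 for seller-initiated."""
    effective = 2 * direction * (trade_price - mid_now)
    realized  = 2 * direction * (trade_price - mid_later)
    impact    = effective - realized   # adverse-selection component
    return effective, realized, impact

eff, real, imp = spread_decomposition(100.05, 100.00, 100.04, +1)
print(round(eff, 4), round(real, 4), round(imp, 4))  # 0.1 0.02 0.08
```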

Spread behavior is endogenous — it reflects the interaction between the limit order book and expected future price movement. This is why microstructure-aware models often include spread, queue position, depth imbalance, cancellation rates, and order flow imbalance together rather than in isolation. The spread is the market's local price of immediacy.

ML & AI

Paper 06

Sentiment Analysis in the Turkish Stock Market (BIST): Generating Signals from Financial News with Qwen and Llama

NLP LLM BIST Qwen Llama certurk23 Quant Lab · 2026

Sentiment analysis in equities is easy to oversell and hard to do well. The useful version treats sentiment as a conditional forecast variable: a noisy signal that may explain cross-sectional returns, volatility, or volume once aligned to the correct event timestamp and trading horizon.

In the Turkish market, open-weight LLM ecosystems have matured significantly. Qwen has publicly released Qwen3-family weights, and Meta promotes Llama 4-family models and Llama Stack distributions for self-hosted workflows. That makes a local, Turkish-language financial NLP stack increasingly practical.

The central problem is label design. For a news item arriving at time \(t\), a common target is the forward return over horizon \(h\):

Forward Return Target
$$r_{t,h} = \ln\left(\frac{P_{t+h}}{P_t}\right)$$

A practical sentiment score can be defined as:

Sentiment Score
$$s_t = p(\text{bullish} \mid x_t) - p(\text{bearish} \mid x_t)$$

Local Inference Prototype

bist_sentiment.py Python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "Qwen/Qwen3-8B"  # replace with local checkpoint
tokenizer  = AutoTokenizer.from_pretrained(model_name)
# NOTE: the 3-class head is randomly initialized here; it must be fine-tuned
# on labeled Turkish financial news before the scores are meaningful
model      = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

texts = [
    "Şirket, beklentilerin üzerinde net kar açıkladı ve yeni yatırım planı duyurdu.",
    "Faiz kararı sonrası banka hisselerinde satış baskısı artıyor."
]

inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
    probs  = torch.softmax(logits, dim=-1)

print(probs)  # [bearish, neutral, bullish]
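The class probabilities can then be collapsed into the scalar score \(s_t\) defined above. A standalone sketch on plain tensors, with invented probabilities, assuming the [bearish, neutral, bullish] label order used in the prototype:

```python
import torch

def sentiment_score(probs: torch.Tensor) -> torch.Tensor:
    """s_t = p(bullish) - p(bearish), assuming label order
    [bearish, neutral, bullish] (an assumption of this sketch)."""
    return probs[:, 2] - probs[:, 0]

probs = torch.tensor([[0.10, 0.25, 0.65],   # clearly bullish headline
                      [0.55, 0.35, 0.10]])  # clearly bearish headline
print(sentiment_score(probs))  # tensor([ 0.5500, -0.4500])
```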

Critical Pitfalls

  • Leakage: if you label a news article with end-of-day return even though it arrived after the close, your backtest is contaminated
  • Non-stationarity: macro regime changes, regulation, inflation cycles, and sector narratives can all alter the meaning of the same words over time
  • Turkish morphology: off-the-shelf English finance sentiment models often misread Turkish context, especially in KAP-style disclosures
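The leakage pitfall has a mechanical defense: align each news item to the first tradable bar at or after its arrival timestamp before computing the forward-return label. A sketch using `pandas.merge_asof`, with invented bars and arrival times:

```python
import numpy as np
import pandas as pd

# Illustrative data only: minute bars and news arrival times are invented
rng  = np.random.default_rng(0)
bars = pd.DataFrame({
    "ts":    pd.date_range("2026-01-05 10:00", periods=60, freq="min"),
    "close": 100 + np.cumsum(rng.normal(0, 0.05, 60)),
})

# Forward log return over the next h bars, defined per entry bar
h = 5
bars["fwd_ret"] = np.log(bars["close"].shift(-h) / bars["close"])

news = pd.DataFrame({"arrival": pd.to_datetime([
    "2026-01-05 10:07:30",
    "2026-01-05 10:42:10",
])})

# Align each item to the FIRST bar at or after arrival: no lookback leakage
labeled = pd.merge_asof(news.sort_values("arrival"),
                        bars[["ts", "fwd_ret"]].sort_values("ts"),
                        left_on="arrival", right_on="ts",
                        direction="forward")
print(labeled)
```

With `direction="forward"` the 10:07:30 item is labeled from the 10:08 bar onward, never from bars the market had already printed when the news arrived.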

The edge comes not from "using an LLM," but from building a time-aware, Turkish-native, market-aligned inference pipeline. The alpha comes from labeling discipline, entity resolution, and proper out-of-sample testing.

Paper 07

Sovereign AI: Why Local LLMs Are the Future of Quant Research

Sovereign AI LLM Privacy Infrastructure certurk23 Quant Lab · 2026

Quant research increasingly depends on language models, but most discussions focus on benchmark performance rather than deployment sovereignty. In real investment workflows, the critical questions are: "Where does the data go?", "Who controls the inference path?", and "Can the full pipeline be audited?"

This is the case for Sovereign AI, which we frame as the "Bastion" philosophy: the research environment should be a defensible stronghold, not a public plaza. Meta's Llama 4 family and Qwen3 open-weight models make the local-model ecosystem deep enough for general reasoning, code assistance, document QA, and domain adaptation without external APIs.

The Case for Local Deployment: Four Pillars

  • Data minimization: local inference reduces the exposure surface for unpublished factor research, private issuer notes, and order-level analytics
  • Auditability: you can log prompts, outputs, retrieval context, model hashes, and evaluation metrics in one controlled system
  • Latency predictability: local inference eliminates WAN variability and vendor-side queueing, making the system operationally deterministic
  • Customization: freedom to fine-tune, distill, constrain tools, attach internal RAG stores, and harden the model around your own research style

Sovereign Research Pipeline
$$\text{Research Output} = f(\text{local LLM},\ \text{private data},\ \text{retrieval layer},\ \text{tool policies})$$

Self-Hosted Inference Example

sovereign_llm.py Python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer  = AutoTokenizer.from_pretrained(model_name)
model      = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

prompt = """You are a quantitative research assistant.
Summarize the main model-risk concerns in this backtest report."""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=200)

print(tokenizer.decode(output[0], skip_special_tokens=True))

For generic drafting, external services may be fine under policy. For alpha research, portfolio analytics, internal memos, and data-rich experimentation, local models are structurally better aligned with how serious research organizations manage information. The future of quant research is not merely "AI-assisted" — it is sovereign, inspectable, and local-first.

Paper 08

Automating Alpha Discovery with Genetic Algorithms

Alpha Optimization Genetic Algorithms Research Automation certurk23 Quant Lab · 2026

Most alpha research pipelines still rely on human-guided search. Genetic algorithms offer an alternative — instead of hand-designing every parameter combination, we let a population of candidate strategies evolve through selection, crossover, and mutation. Trading rules often define non-convex search spaces where gradient methods do not fit naturally and brute force quickly becomes expensive.

A strategy chromosome might encode: feature choices, lookback windows, threshold values, position sizing rules, stop-loss or take-profit logic, and rebalance frequency. The fitness function should not reward raw return alone:

Fitness Function
$$\text{Fitness}(g) = PSR(g) - \lambda \cdot \text{Turnover}(g) - \eta \cdot \text{MaxDrawdown}(g)$$

This matters because the optimizer will exploit whatever you reward. If you maximize in-sample Sharpe without penalties, the algorithm may discover hyperactive, brittle, or capacity-blind rules.

Minimal GA Implementation

genetic_alpha.py Python
import random
import numpy as np

def strategy_fitness(params, prices):
    short_win, long_win, threshold = params
    if short_win >= long_win:
        return -1e6

    short_ma = prices.rolling(short_win).mean()
    long_ma  = prices.rolling(long_win).mean()

    signal = (short_ma / long_ma - 1 > threshold).astype(int) - \
             (short_ma / long_ma - 1 < -threshold).astype(int)

    ret     = signal.shift(1) * prices.pct_change()
    sharpe  = np.sqrt(252) * ret.mean() / ret.std() if ret.std() > 0 else -999
    turnover = signal.diff().abs().mean()

    return sharpe - 0.5 * turnover

def mutate(params):
    p = params.copy()
    idx = random.randint(0, 2)
    if   idx == 0: p[0] = max(2, p[0] + random.randint(-3, 3))
    elif idx == 1: p[1] = max(5, p[1] + random.randint(-5, 5))
    else:          p[2] = max(0.0, p[2] + random.uniform(-0.01, 0.01))
    return p
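The selection and crossover loop that sits on top of `strategy_fitness` and `mutate` can be sketched generically. To keep the sketch self-contained and fast, a deliberately trivial stand-in fitness is used (a smooth peak at an arbitrary chromosome) and the mutation operator is repeated inline; the loop structure, not the fitness, is the point:

```python
import random

def toy_fitness(params):
    """Stand-in for strategy_fitness: peak at (10, 50, 0.02)."""
    s, l, t = params
    return -((s - 10) ** 2 + (l - 50) ** 2 + (100 * (t - 0.02)) ** 2)

def crossover(a, b):
    """Single-point crossover on the 3-gene chromosome."""
    cut = random.randint(1, 2)
    return a[:cut] + b[cut:]

def mutate(params):
    p = params.copy()
    idx = random.randint(0, 2)
    if   idx == 0: p[0] = max(2, p[0] + random.randint(-3, 3))
    elif idx == 1: p[1] = max(5, p[1] + random.randint(-5, 5))
    else:          p[2] = max(0.0, p[2] + random.uniform(-0.01, 0.01))
    return p

def evolve(pop_size=30, generations=40, elite=5, seed=42):
    random.seed(seed)
    pop = [[random.randint(2, 30), random.randint(10, 100),
            random.uniform(0.0, 0.05)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=toy_fitness, reverse=True)
        parents  = pop[:elite]                      # truncation selection
        children = [mutate(crossover(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop_size - elite)]
        pop = parents + children                    # elitism keeps the best
    return max(pop, key=toy_fitness)

best = evolve()
print(best)  # should land near (10, 50, 0.02)
```

In a real pipeline, `toy_fitness` is replaced by a penalized, out-of-sample evaluation of the kind discussed below; the loop itself barely changes.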

Evolutionary search is powerful precisely because it can overfit aggressively. A robust workflow uses nested validation, purged cross-validation, parameter stability checks, transaction-cost modeling, and inferential metrics such as PSR or deflated Sharpe. The best way to think about GA in quant finance is not "automatic profit machine," but automated hypothesis generator.

Paper 09

Slippage and Latency Modeling in Backtesting

Execution Backtest Market Impact Latency certurk23 Quant Lab · 2026

Backtests are usually too optimistic for one simple reason: they assume the market waited for you. Between the instant a signal is computed and the instant an order is filled, several things happen. Threads wake up, messages are serialized, risk checks run, gateways forward packets, the venue processes the order, and other participants move the book. By the time the fill occurs, the price you thought you traded may no longer exist.

Latency Decomposition
$$\Delta t = \Delta t_{\text{decision}} + \Delta t_{\text{queue}} + \Delta t_{\text{network}} + \Delta t_{\text{venue}}$$

A practical fill model for a buy order (where \(M_t\) is the reference midprice at decision time):

Fill Price Model
$$P_{\text{fill}} = M_{t+\Delta t} + \frac{1}{2}S_{t+\Delta t} + I(q) + \epsilon$$

A commonly used specification for market impact uses a square-root law:

Square-Root Impact Model
$$I(q) = \eta \sigma \sqrt{\frac{q}{V}}$$

where \(\sigma\) is volatility, \(V\) is available volume, and \(\eta\) is a calibrated coefficient.

Python Fill Simulator

fill_model.py Python
import numpy as np

def simulate_fill(mid, spread, sigma, q, V, latency_ms, side, eta=0.1):
    # side: +1 for buy, -1 for sell
    impact = eta * sigma * np.sqrt(max(q, 1) / max(V, 1))
    noise  = np.random.normal(0, spread * 0.05)

    # latency drift: price can move during delay
    drift      = np.random.normal(0, sigma * np.sqrt(latency_ms / 1000.0))
    future_mid = mid + drift

    fill = future_mid + side * (0.5 * spread + impact) + noise
    return fill

# Example
for side in [1, -1]:
    f = simulate_fill(
        mid=100.0, spread=0.02, sigma=0.01,
        q=10_000, V=1_000_000, latency_ms=8, side=side
    )
    print(f"Fill ({'buy' if side > 0 else 'sell'}): {f:.4f}")

A strategy with a 1.5 Sharpe ratio in a frictionless backtest may collapse below 0.5 after realistic fill modeling. Mean reversion strategies are especially vulnerable because edge decays quickly and costs are frequent. The broader lesson: PnL does not arise from signal alone — it arises from signal after implementation. Slippage and latency are not "cost assumptions." They are part of the strategy definition.

Alternative Data

Paper 10

The Role of Alternative Data in Quantitative Finance

Alt Data Machine Learning Nowcasting Satellite certurk23 Quant Lab · 2026

Traditional market data tells you what prices did. Alternative data aims to tell you why they might do something next. If public prices already summarize common information, a quant edge must often come from signals that are earlier, orthogonal, or simply structured in a way that the market has not fully absorbed.

A useful framing: for a news item or data point arriving at time \(t\), the research task is:

Nowcasting Model
$$y_{t+1} = \beta^\top x_t + \epsilon_{t+1}$$

The challenge is not writing that equation — it is making sure \(x_t\) is truly available at time \(t\), properly normalized, and economically linked to the target. Alternative data examples include:

  • Satellite imagery: parking-lot traffic, shipping flows, refinery utilization, agricultural conditions
  • Transaction panels: nowcast consumer spend trends before official earnings releases
  • E-commerce price data: proxy for inflation pressure or competitive intensity

Machine Learning Pipeline

alt_data_model.py Python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import r2_score

# Synthetic illustration: 60 observations so each validation fold holds
# several samples (R² is undefined on single-point folds)
rng = np.random.default_rng(42)
n = 60
df = pd.DataFrame({
    "web_price_change":   rng.normal(0.0, 0.02, n),
    "traffic_index":      100 + rng.normal(0.0, 5.0, n),
    "transaction_growth": rng.normal(0.02, 0.02, n),
})
df["target"] = 0.5 * df["web_price_change"] + rng.normal(0.0, 0.01, n)

X = df.drop(columns=["target"])
y = df["target"]

tscv   = TimeSeriesSplit(n_splits=3)
scores = []

for train_idx, test_idx in tscv.split(X):
    model = RandomForestRegressor(n_estimators=200, random_state=42)
    model.fit(X.iloc[train_idx], y.iloc[train_idx])
    pred = model.predict(X.iloc[test_idx])
    scores.append(r2_score(y.iloc[test_idx], pred))

print("Mean OOS R²:", sum(scores) / len(scores))

Data Governance Checklist

  • Is the timestamp point-in-time correct?
  • Is the vendor revising history retroactively?
  • Are there survivorship or coverage biases in the dataset?
  • Does the dataset include entities that later disappeared?
  • What rights do we actually have to use and store the data?
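The second checklist item can be made mechanical: diff successive vendor snapshots and flag any historical row whose value changed. The schema and values below are invented for illustration:

```python
import pandas as pd

def detect_revisions(old_snap: pd.DataFrame, new_snap: pd.DataFrame,
                     key: str = "date", value: str = "value") -> pd.DataFrame:
    """Rows present in both snapshots whose historical value changed,
    i.e. the vendor rewrote history after the fact."""
    merged = old_snap.merge(new_snap, on=key, suffixes=("_old", "_new"))
    return merged[merged[f"{value}_old"] != merged[f"{value}_new"]]

old = pd.DataFrame({"date": ["2026-01-01", "2026-01-02", "2026-01-03"],
                    "value": [1.00, 1.10, 1.20]})
new = pd.DataFrame({"date": ["2026-01-01", "2026-01-02", "2026-01-03"],
                    "value": [1.00, 1.15, 1.20]})   # 01-02 silently revised

print(detect_revisions(old, new))   # flags the 2026-01-02 row
```

Running this check on every vendor delivery, and archiving each snapshot, is what makes a backtest on the data defensible later.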

The best alternative-data teams behave less like headline chasers and more like measurement scientists. They spend as much time on ontology, joins, timestamp integrity, and missing-data behavior as they do on modeling. The real edge comes from converting messy external traces into clean, point-in-time, economically interpretable variables. That translation layer is where most of the alpha lives.