The Kelly Criterion and Position Sizing

A high win rate doesn’t guarantee profit, and a correct direction doesn’t guarantee correct sizing. Take a trader with a 60% win rate and a 1:1 win-loss ratio: bet too heavily and a streak of five losses can blow up the account; bet too lightly and ten years yield almost nothing. “How much should I bet” is a more fundamental question than “which way.” The Kelly Criterion is the mathematical answer to that question. John Kelly derived it at Bell Labs in 1956 from Claude Shannon’s information theory, Ed Thorp later applied it to blackjack and warrant arbitrage, and it ultimately became a cornerstone of professional position management. This guide assembles Kelly and related sizing formulas (expected value, risk of ruin, Sharpe, vol targeting, mean-variance, risk parity, CPPI) into a primer you can return to.

Framework — why sizing dominates

Figure: red casino-style die. The Kelly formula originated with John Kelly at Bell Labs, thinking about gambling and odds as signal. This article organizes 12 position-sizing formulas (Kelly / Half / Quarter Kelly · Sharpe · Sortino · vol targeting · risk parity · CPPI / Optimal F) into an actionable primer.
Image: Wikimedia Commons / CC BY-SA 3.0.

Imagine a game: flip a coin; heads you win twice your stake, tails you lose your stake. The expected value is +50% of the stake per flip, so bet anything and you’ll make money? Wrong. If you bet everything every time, a single tails brings you to zero. Even with positive EV, the long-run probability of ruin is 1.

That’s the problem John Kelly addressed in his 1956 paper A New Interpretation of Information Rate: the bet fraction that maximizes long-run exponential growth of capital. His answer is the Kelly Formula.

  1. Win rate × win-loss ratio ≠ profit. 60% win rate, 1:1 win-loss ratio → positive EV, but a 100% position size goes to zero on the first loss. Bet size turns “arithmetic expectation” into “geometric expectation,” which is the core of compounding.
  2. Arithmetic average ≠ geometric average. Consecutive years of +50% and -50%: arithmetic average 0%, but geometric average = √(1.5×0.5) - 1 = -13.4%. The compounding world only sees geometric averages, which is why big drawdowns matter more than big returns.
  3. Sizing = the main tool of risk management. The same Sharpe 0.8 strategy can be run as an over-bet “disaster” or an under-bet “calm.” Scaling the position leaves the Sharpe ratio unchanged, but the long-term compound growth rate is completely different.
  4. The four answers to “how much should I bet.” ① Kelly (geometric optimum, but assumes true probabilities are known); ② Volatility Target (lock portfolio vol); ③ Mean-Variance (minimum vol for a given return target); ④ Risk Parity (each asset contributes equal risk). The four methods answer different questions.
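
Points 1 and 2 can be checked numerically. The sketch below is illustrative only: the function names are invented for this example, and the game is the 2-to-1 coin flip described above (p = 0.5, b = 2).

```python
import math

def geometric_mean_return(returns):
    """Per-period compound return implied by a sequence of simple returns."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth ** (1.0 / len(returns)) - 1.0

def log_growth_per_bet(f, p, b):
    """Expected log growth per bet when staking fraction f (win prob p, odds b)."""
    if f >= 1.0:
        return float("-inf")  # staking everything: one loss ends the game
    return p * math.log(1.0 + b * f) + (1.0 - p) * math.log(1.0 - f)

# +50% then -50%: the arithmetic average is 0%, the compound rate is negative.
arith = (0.5 + (-0.5)) / 2
geo = geometric_mean_return([0.5, -0.5])
print(f"arithmetic {arith:.1%}, geometric {geo:.1%}")

# Same positive-EV coin, different bet fractions.
for f in (0.10, 0.25, 0.50, 0.90, 1.00):
    print(f"f = {f:.2f}: log growth per flip = {log_growth_per_bet(f, 0.5, 2.0):.4f}")
```

Growth peaks at a 25% stake, falls to zero at 50%, and turns negative beyond that, even though the expected value per flip never changes.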

Bottom Line · Picking direction gives you expected value, picking size gives you compounding.

Professional investors tend to spend most of their effort on risk budgeting and the smaller share on direction (a common rule of thumb is 70/30); retail traders typically do the reverse. Ed Thorp, Jim Simons, and Paul Tudor Jones have all voiced versions of the same idea: the key to wealth is not avoiding every mistake, it’s avoiding the big ones.

Expected Value

Before Kelly, first understand Expected Value (EV). Without positive EV, no sizing formula can rescue a losing strategy.

  1. EV · Expected Value. EV = p × W − (1−p) × L. p = win rate; W = amount won per win; L = amount lost per loss. EV > 0 is the minimum bar to “consider betting”, but doesn’t mean you can bet anything. Example: p=60%, W=$100, L=$100 → EV = $60 − $40 = +$20/round.
  2. b · Odds / Win-Loss Ratio. b = W / L. For every $1 lost, win $b. The classic Kelly parameter. In equity trading: roughly “take-profit space / stop-loss space.”
  3. Edge. Edge = EV / bet = p × b − (1−p). “Average cents per $1 bet.” Edge > 0 is necessary for betting. Professional casino players’ long-term edge is typically 1-2%.
  4. Win Rate. Profitable trades / total trades. Doesn’t equal profitability — trend-following often has 35% win rate but 3:1 win-loss ratio and still makes money.
  5. Payoff Ratio. PR = average gain / average loss. Average amount won on winners vs average lost on losers. PR × Win Rate > 1 − Win Rate is the positive EV condition.
  6. Breakeven Win Rate. BEWR = 1 / (1 + PR). Given a payoff ratio, what win rate breaks even? PR=1 → 50%; PR=2 → 33%; PR=3 → 25%.
  7. Profit Factor. PF = gross profit / gross loss. One of the most-used backtest metrics. PF > 1.5 acceptable, > 2 excellent, > 3 watch for over-fit.
  8. Expectancy. E = (Win% × AvgWin) − (Loss% × AvgLoss). Average dollars earned per trade. Expectancy × annual trade count ≈ annual P&L in dollars (ignoring compounding).
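
The definitions above translate almost directly into code. This is a minimal sketch with hypothetical function names; the trade list is made up to match the 60% win rate, $100 / $100 example.

```python
def expected_value(p, win, loss):
    """EV per round: p * W - (1 - p) * L."""
    return p * win - (1.0 - p) * loss

def edge(p, b):
    """Average profit per $1 staked: p * b - (1 - p)."""
    return p * b - (1.0 - p)

def breakeven_win_rate(payoff_ratio):
    """Win rate at which EV is exactly zero for a given payoff ratio."""
    return 1.0 / (1.0 + payoff_ratio)

def profit_factor(trade_pnls):
    """Gross profit divided by gross loss over a list of trade P&Ls."""
    gross_win = sum(x for x in trade_pnls if x > 0)
    gross_loss = -sum(x for x in trade_pnls if x < 0)
    return float("inf") if gross_loss == 0 else gross_win / gross_loss

def expectancy(trade_pnls):
    """Average P&L per trade, equal to Win% * AvgWin - Loss% * AvgLoss."""
    return sum(trade_pnls) / len(trade_pnls)

trades = [100, -100, 100, 100, -100]  # illustrative: 3 wins, 2 losses
print(expected_value(0.6, 100, 100))
print(edge(0.6, 1.0))
print(breakeven_win_rate(2.0))
print(profit_factor(trades), expectancy(trades))
```

Running the positive-EV check first keeps every later sizing formula from being applied to a losing strategy.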

Advice · Confirm EV positive first, then talk sizing.

Many beginners apply Kelly to clearly negative-EV strategies (e.g., “buy open / sell close every day”), and the result is “faster blowup.” Kelly gives optimal sizing for positive-EV strategies; for negative-EV it gives the “fastest path to zero” — it’s only an amplifier, not a magic wand.

The Kelly Formula

The Kelly formula has three common forms, corresponding to: binary betting (gambling), multi-bet independent (portfolio), continuous distribution (financial assets). Start with the simplest.

  1. Kelly (discrete) · Classic binary form. f* = (bp − q) / b = p − q/b, where p = win rate, q = 1 − p, b = odds (win $b per $1). f* = capital fraction to bet each time. Example: p = 60%, b = 1 (1-to-1), f* = 0.6 − 0.4/1 = 20%. Bet 20% of capital each round to maximize long-run geometric growth.
  2. Kelly (continuous) · Financial asset form. f* = (μ − r) / σ², where μ = asset expected return, r = risk-free rate, σ² = asset variance. It follows from Merton’s 1969 continuous-time portfolio problem under log utility. Since Sharpe = (μ − r)/σ, the optimal fraction is f* = Sharpe / σ, and the excess geometric growth rate at f* is Sharpe² / 2. That is the deep link between Kelly and Sharpe.
  3. Kelly (multi-asset) · Generalized Kelly. f* = Σ⁻¹ × (μ − r·1), where Σ = covariance matrix, μ = return vector. For multiple correlated assets, can’t apply Kelly independently to each — must consider covariance. This formula is essentially Markowitz’s tangency portfolio × a risk preference parameter.
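
The three forms can be sketched as follows. Function names are hypothetical, the multi-asset version is restricted to two assets so that the matrix inverse can be written out by hand, and the two-asset inputs are invented for illustration.

```python
def kelly_discrete(p, b):
    """Binary bet: f* = p - q / b, with q = 1 - p."""
    return p - (1.0 - p) / b

def kelly_continuous(mu, r, sigma):
    """Single asset: f* = (mu - r) / sigma^2."""
    return (mu - r) / sigma ** 2

def kelly_two_assets(mu, r, cov):
    """Two assets: f* = inverse(cov) applied to (mu - r), explicit 2x2 inverse."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    e1, e2 = mu[0] - r, mu[1] - r
    return ((d * e1 - b * e2) / det, (a * e2 - c * e1) / det)

print(kelly_discrete(0.60, 1.0))          # the 60% / 1:1 example
print(kelly_continuous(0.06, 0.0, 0.16))  # 6% excess return, 16% vol

# Uncorrelated assets reduce to the single-asset formula for each one.
cov = [[0.04, 0.0], [0.0, 0.01]]
print(kelly_two_assets((0.08, 0.03), 0.02, cov))
```

With a nonzero off-diagonal covariance the two fractions are no longer independent, which is why Kelly cannot be applied asset by asset.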

Kelly’s three mathematical properties: ① long-run geometric growth rate maximization (log-optimal); ② shortest time to reach any wealth target; ③ never ruined (assuming continuous adjustment).

But three preconditions: ① probability + odds “accurately known”; ② position can be infinitely divided; ③ only concerned with long-run log-wealth.
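
For readers who want to see where the discrete formula comes from, here is a sketch of the standard derivation using the same p, q, and b. The symbol G(f), the expected log growth per bet, is introduced here for the derivation and does not appear elsewhere in this guide.

```latex
\begin{align*}
G(f) &= p \ln(1 + b f) + q \ln(1 - f), \qquad 0 \le f < 1 \\
G'(f) &= \frac{p b}{1 + b f} - \frac{q}{1 - f} = 0 \\
p b (1 - f) &= q (1 + b f) \\
p b - q &= b f (p + q) = b f \\
f^* &= \frac{b p - q}{b} = p - \frac{q}{b}
\end{align*}
```

Because G is concave in f, this stationary point is the maximum, which is the "log-optimal" property listed above.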

Kelly Examples in Various Scenarios

Scenario | Win Rate p | Win-Loss b | Kelly f* | Meaning
Thorp blackjack | 51% | 1:1 | 2% | Bet 2% of bankroll per hand
Trend CTA | 40% | 3:1 | 20% | 20% position per pair
Intraday momentum | 55% | 1:1 | 10% | Win rate slightly above 50%, even odds
Merger arb | 90% | 1:15 | 3.3% | High win rate + extremely low odds
Event-driven | 50% | 2:1 | 25% | Good odds compensate for a moderate win rate
S&P 500 (continuous) | μ = 6% | σ = 16% | 234% | Fully invested plus 134% borrowed (unrealistic)

⚠ Key · Kelly outputs the theoretical optimum “given true probability known.”

In reality, both probability and odds are estimated — your “60%” may be a true 52%. This “parameter uncertainty” means the real world should use fractional Kelly (see next section), not full Kelly. Professional investors typically bet at half or quarter Kelly.

Fractional Kelly — why practitioners under-bet

Academic Kelly is optimal under the assumption of “parameters fully known.” In reality parameters always have error, so professional investors “discount” — this is Fractional Kelly.

  1. Half Kelly. f = 0.5 × f*. The most-used “robust version.” Geometric growth loses only 25% (0.75 × Full Kelly), but drawdowns shrink dramatically. Ed Thorp used Half Kelly when managing his fund.
  2. Quarter Kelly. f = 0.25 × f*. More conservative version. Survival first, growth second. Many CTA funds actually run around 0.2-0.3× Kelly.
  3. Full Kelly Drawdown · The cost of Full Kelly. In the continuous model, the probability that Full Kelly wealth ever falls to a fraction x of its starting value is x, so a 50% drawdown has roughly a 50% chance of happening at some point. Psychological tolerance is far below the mathematical optimum: even if you understand the math, your clients and stakeholders may not.
  4. Kelly Leverage. Kelly often gives f* > 100% (requiring leverage). S&P 500 theoretical Kelly ~234% = 2.34× leverage — almost no one actually does this.
  5. Overbet. f > f*. Geometric growth rate drops sharply: 1.5× Kelly earns only 75% of the optimal growth, 2× Kelly earns zero, and anything beyond 2× has negative long-run growth.
  6. Underbet. f < f*. Safe but slow. “Below Kelly is always positive expectation,” so being conservative has almost no mathematical cost, only opportunity cost.
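
The growth penalty for under- and over-betting follows a simple parabola in the continuous model. The sketch below assumes lognormal returns and continuous rebalancing; both function names are hypothetical, and the drawdown probability is the standard continuous-time result rather than something measured from data.

```python
def relative_growth(k):
    """Growth at k times Kelly as a share of full-Kelly growth: 2k - k^2."""
    return 2.0 * k - k * k

def prob_ever_falling_to(x, k):
    """Chance that wealth ever drops to fraction x of its start at k times Kelly."""
    if not 0.0 < k < 2.0:
        raise ValueError("formula applies for 0 < k < 2")
    if not 0.0 < x <= 1.0:
        raise ValueError("x must be in (0, 1]")
    return x ** (2.0 / k - 1.0)

for k in (0.25, 0.5, 0.75, 1.0, 1.5, 2.0, 2.5):
    print(f"{k:.2f}x Kelly -> {relative_growth(k):.0%} of full-Kelly growth")

for k in (0.25, 0.5, 1.0):
    print(f"{k:.2f}x Kelly -> P(ever halving) = {prob_ever_falling_to(0.5, k):.1%}")
```

Half Kelly keeps three quarters of the growth while cutting the chance of ever halving the account from 50% to 12.5%, which is the usual argument for betting a fraction of Kelly.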

Geometric Growth Rate vs Kelly Multiple

Bet Multiple (vs Kelly) | Geometric Growth Rate (relative) | Max Drawdown (sample) | Verdict
0.25× (Quarter Kelly) | 44% | ~15% | Very robust
0.5× (Half Kelly) | 75% | ~25% | Golden zone
0.75× | 94% | ~40% | Aggressive
1.0× (Full Kelly) | 100% | ~50%+ | Mathematical optimum
1.5× | 75% | ~75%+ | Over-bet (Half Kelly growth, far larger drawdowns)
2.0× | 0% | Huge | Zero growth
> 2.0× | Negative | ~100% | Long-run inevitable ruin

Risk of Ruin

Risk of Ruin is a core concept of gambling theory. For players with finite samples + concave utility, Sharpe, EV, and Kelly can all fool you — but the “probability of going to zero” can never fool you.

  1. Risk of Ruin. ROR = ((1−A)/(1+A))^C, where A = edge (per bet), C = number of capital units. Finite-sample ruin formula. Smaller edge, fewer capital units → higher ROR. Kelly’s “never ruined” only holds with infinitely divisible capital and continuous adjustment.
  2. Gambler’s Ruin. Classic probability problem: each round is a 50/50 bet to win or lose $1; you start with $N and walk away once you are up $M. Probability of ruin = M/(M+N). Making small money is easy; doubling is hard (M = N gives a 50% chance of ruin).
  3. Max Drawdown. Peak-to-trough decline. A 50% MDD requires 100% recovery to get back — psychologically harder than mathematically.
  4. Calmar Ratio. Calmar = annualized return / |MDD|. “How much annualized per unit of max drawdown.” Calmar > 0.5 acceptable, > 1 excellent, > 2 exceptional. Better than Sharpe at reflecting “experience.”
  5. Ulcer Index. UI = √(Σ DD_t²/T). Root mean square of drawdowns. Proposed by Peter Martin; smoother than MDD. Used to penalize the combined psychological burden of “long small drawdowns” and “short large drawdowns.”
  6. Time to Recovery. Time to recover from MDD trough back to prior peak. High-vol strategies even with good Sharpe can have long recovery times. Post-2008 GFC, S&P took 5 years to recover.
  7. Stop-Loss Threshold. Pre-set “stop at X% drawdown.” Not for preventing loss, but for preventing emotional breakdown. Institutions often use 15% monthly stop, 20-25% annual stop.
  8. VaR · CVaR. VaR_95 = 5% quantile loss; CVaR = average loss beyond VaR. “95% probability, won’t lose more than X in a day.” CVaR (ES / Expected Shortfall) better captures tail risk than VaR; post-2008 replaced VaR as the regulatory mainstream.
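
Several of the measures above are easy to compute from an equity curve or a return series. The following is a rough sketch with hypothetical function names; the historical VaR/CVaR estimator here simply averages the worst tail of the sample, which is one common convention among several.

```python
import math

def risk_of_ruin(edge, capital_units):
    """ROR = ((1 - A) / (1 + A)) ** C."""
    return ((1.0 - edge) / (1.0 + edge)) ** capital_units

def drawdowns(equity):
    """Fractional decline from the running peak at each point."""
    peak, out = equity[0], []
    for value in equity:
        peak = max(peak, value)
        out.append(1.0 - value / peak)
    return out

def max_drawdown(equity):
    return max(drawdowns(equity))

def ulcer_index(equity):
    """Root mean square of the drawdown series."""
    dd = drawdowns(equity)
    return math.sqrt(sum(d * d for d in dd) / len(dd))

def var_cvar(returns, level=0.95):
    """Historical VaR and CVaR, both expressed as positive loss numbers."""
    losses = sorted((-r for r in returns), reverse=True)  # worst loss first
    n_tail = max(1, int(round(len(losses) * (1.0 - level))))
    tail = losses[:n_tail]
    return tail[-1], sum(tail) / len(tail)

equity = [100, 120, 90, 110, 60, 130]  # made-up equity curve
print(risk_of_ruin(0.02, 50))
print(max_drawdown(equity), ulcer_index(equity))
print(var_cvar([-i / 100 for i in range(1, 21)], level=0.90))
```

CVaR is never smaller than VaR at the same level, because it averages the losses at or beyond the VaR cutoff.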

Sharpe · Sortino — the performance ratios

Sharpe / Sortino / Calmar / Information Ratio are different versions of “return / risk.” When unsure, default to Sharpe — but actually each answers a slightly different question.

  1. Sharpe · William Sharpe 1966 · Sharpe Ratio. Sharpe = (μ − r) / σ. Most universal. Uses total volatility. Sharpe 1 = usable, > 1 = good, > 2 = excellent, > 3 may indicate over-fitting. CTAs long-term 0.5-0.7, stat arb 2-3, HFT 5-10+.
  2. Sortino. Sortino = (μ − r) / σ_down. Uses only downside volatility as denominator (upside isn’t “risk”). Suitable for asymmetric-distribution strategies (e.g., put selling). Sortino is typically 30-50% higher than Sharpe.
  3. Calmar. Calmar = μ / |MDD|. Return / max drawdown. Most intuitive to clients — “how much did I lose in my worst stretch?”
  4. Information Ratio. IR = α / TE, where α = excess vs benchmark, TE = tracking error. Standard active-management metric. Long-term IR > 0.5 = good active manager; > 1 top tier.
  5. Treynor. Treynor = (μ − r) / β. Uses β as denominator, measuring excess per unit “systematic risk.” Only meaningful in well-diversified portfolios.
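  6. Omega Ratio. Ω(r) = ∫_r^∞ [1 − F(x)] dx / ∫_{−∞}^r F(x) dx, where F is the cumulative distribution of returns and r is the threshold: probability-weighted gains above r divided by probability-weighted losses below r. Considers full return-distribution asymmetry; more informative than Sharpe. Academic-popular, but limited industrial use.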
  7. MAR Ratio. MAR = CAGR / |MDD|. Created by Managed Account Reports. Variant of Calmar using CAGR rather than annualized average. CTA industry standard.
  8. Sharpe → Kelly. Geometric growth rate ≈ Sharpe² / 2. At full Kelly leverage, the long-term excess geometric growth rate is about half of the annualized Sharpe squared. Sharpe 1 ≈ 50%/yr; Sharpe 2 ≈ 200%/yr, both theoretical figures that assume the very high leverage full Kelly implies. This reveals the essence: “high-Sharpe strategy = steeper compounding.”
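
A minimal sketch of the two most common ratios and the Sharpe-to-Kelly link, assuming daily excess returns and 252 trading days per year. The function names and the sample return series are invented for illustration, and the downside deviation uses the convention of averaging squared shortfalls over all observations.

```python
import math
import statistics

TRADING_DAYS = 252  # annualization assumption

def sharpe_ratio(daily_excess_returns):
    """Annualized mean excess return over annualized total volatility."""
    mean = statistics.fmean(daily_excess_returns)
    vol = statistics.stdev(daily_excess_returns)
    return mean / vol * math.sqrt(TRADING_DAYS)

def sortino_ratio(daily_excess_returns, target=0.0):
    """Like Sharpe, but only returns below the target count as risk."""
    mean = statistics.fmean(daily_excess_returns)
    shortfalls = [min(0.0, r - target) ** 2 for r in daily_excess_returns]
    downside_dev = math.sqrt(sum(shortfalls) / len(shortfalls))
    if downside_dev == 0:
        return float("inf")  # no observation fell below the target
    return (mean - target) / downside_dev * math.sqrt(TRADING_DAYS)

def kelly_growth_from_sharpe(annual_sharpe):
    """Excess log growth per year at full Kelly: Sharpe^2 / 2."""
    return annual_sharpe ** 2 / 2.0

sample = [0.010, -0.005, 0.007, -0.002, 0.004]  # made-up daily excess returns
print(sharpe_ratio(sample), sortino_ratio(sample))
print(kelly_growth_from_sharpe(1.0), kelly_growth_from_sharpe(2.0))
```

Five observations are far too few for a meaningful estimate; the sample only shows the mechanics.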

Volatility Targeting — the simplest sizing rule

Volatility Targeting (Vol Targeting) is the most widely-used sizing method for institutional investors. Simpler than Kelly, doesn’t need expected return estimation — just lock portfolio vol to a target level.

  1. Vol Target. Position = (Target σ / Asset σ) × NAV. Example: Target σ = 10%, SPX σ = 16% → Position = 62.5% NAV. Sizing adjusted by “target vol / asset current vol.” Automatically de-leverages when asset vol expands, and re-levers when vol falls. This is the underlying mechanism of most risk parity / CTAs.
  2. Realized Vol. σ_real = std(daily returns) × √252. Annualized std dev of daily returns over past N days (commonly 20-60 days). EWMA (exponentially weighted) and GARCH are more refined estimation methods.
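
The sizing rule and both volatility estimators fit in a few lines. In this sketch the function names are hypothetical, the leverage cap is an added safety assumption that the formula itself does not include, and the EWMA decay of 0.94 is the value popularized by RiskMetrics for daily data.

```python
import math
import statistics

def realized_vol(daily_returns, periods_per_year=252):
    """Annualized sample standard deviation of daily returns."""
    return statistics.stdev(daily_returns) * math.sqrt(periods_per_year)

def ewma_vol(daily_returns, lam=0.94, periods_per_year=252):
    """Annualized EWMA volatility; recent squared returns get more weight."""
    var = daily_returns[0] ** 2
    for r in daily_returns[1:]:
        var = lam * var + (1.0 - lam) * r * r
    return math.sqrt(var * periods_per_year)

def vol_target_position(nav, target_vol, asset_vol, max_leverage=2.0):
    """Position = (target vol / asset vol) * NAV, capped at max_leverage * NAV."""
    leverage = min(target_vol / asset_vol, max_leverage)
    return leverage * nav

# The example from the text: 10% target, 16% asset vol, $100K account.
print(vol_target_position(100_000, 0.10, 0.16))

# When vol doubles, the position halves.
print(vol_target_position(100_000, 0.10, 0.32))
```

In practice the asset_vol input would come from realized_vol or ewma_vol on a recent window, recomputed on a fixed schedule.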

Why Institutions All Use It

  1. Risk budgeting — knows up front “max vol exposure,” easy to communicate to clients.
  2. Vol clustering — realized vol predicts future vol (especially short-term).
  3. Crisis avoidance — auto de-levers when vol spikes in 2020-03 / 2008.
  4. Long-term return preserved: some empirical studies report that vol-targeted portfolios have a Sharpe roughly 10-20% higher than unmanaged ones.
  5. Composable — multiple vol-targeted assets layered = risk parity.

Practical Parameters

Strategy | Target σ
Conservative multi-asset (60/40 type) | 6-8%
Steady absolute return | 8-10%
Equity fund | 12-16%
Hedge fund (typical) | 10-15%
CTA trend | 15-20%
Aggressive long-short | 20-30%

Mean-Variance Optimization

Harry Markowitz established Modern Portfolio Theory (MPT) in Portfolio Selection in 1952, winning the 1990 Nobel Prize in Economics. Kelly 1956 came 4 years later. Both ask “how to size positions”; only their objective functions differ: Markowitz minimizes variance subject to a return target; Kelly maximizes geometric growth.

  1. MPT · Modern Portfolio Theory. min w'Σw s.t. w'μ = μ_p, Σwᵢ = 1. Given target return μ_p, solve for minimum-variance weights w. Output is the “Efficient Frontier” — the envelope of all “optimal” portfolios.
  2. Efficient Frontier. Upper boundary on the return-risk plane. Any portfolio off the frontier is suboptimal — you can lift return at the same risk.
  3. Tangency Portfolio. w = Σ⁻¹(μ − r·1) / (1'Σ⁻¹(μ − r·1)). The line from the risk-free asset that is tangent to the efficient frontier touches it at this point: the maximum-Sharpe portfolio. That line is the CAL (Capital Allocation Line).
  4. CAPM · Capital Asset Pricing Model. E(Rᵢ) = Rf + βᵢ(Rm − Rf). Pricing theory derived from MPT. Only non-diversifiable systematic risk (β) earns a premium; idiosyncratic risk should be diversified away.
  5. Black-Litterman Model. Goldman Sachs 1990 enhancement. Uses market-equilibrium weights as prior + investor “views” as likelihood → posterior portfolio. Solves pure MPT’s extreme sensitivity to inputs.
  6. Shrinkage. Sample covariance matrix is unstable in high dimensions. Ledoit-Wolf shrinkage “shrinks” sample Σ toward a structured target (identity, diagonal), markedly improving out-of-sample performance.

⚠ MPT’s biggest practical issue · Extreme input sensitivity — “the more optimized, the worse.”

MPT uses historical estimates of μ and Σ as inputs, but μ’s estimation error is far larger than Σ’s. Small errors get amplified by the optimizer into extreme weights (short asset A at 300%, long asset B at 400%). In practice MPT is almost never used directly; instead, robust variants like Black-Litterman / Shrinkage / Risk Parity / Equal Weight are preferred. A simple 60/40 long-term Sharpe is often comparable to “scientifically optimal” MPT.
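
The input sensitivity is easy to reproduce with two highly correlated assets. The numbers below are invented purely for illustration (16% and 15% vol, 0.95 correlation, 2% risk-free rate), and the two-asset tangency formula is written out by hand so the sketch needs no libraries.

```python
def tangency_weights(mu, r, cov):
    """Max-Sharpe weights for two assets: inverse(cov) @ (mu - r), scaled to sum to 1."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    e1, e2 = mu[0] - r, mu[1] - r
    raw1 = (d * e1 - b * e2) / det
    raw2 = (a * e2 - c * e1) / det
    total = raw1 + raw2  # assumed positive here; not guaranteed in general
    return raw1 / total, raw2 / total

sigma1, sigma2, rho = 0.16, 0.15, 0.95
cov = [
    [sigma1 ** 2, rho * sigma1 * sigma2],
    [rho * sigma1 * sigma2, sigma2 ** 2],
]

base = tangency_weights((0.070, 0.065), 0.02, cov)
bumped = tangency_weights((0.075, 0.065), 0.02, cov)  # asset 1 estimate raised by 0.5%

print("base weights:  ", [f"{w:.0%}" for w in base])
print("bumped weights:", [f"{w:.0%}" for w in bumped])
```

Raising one expected return by half a percentage point, well inside any realistic estimation error, flips the second asset from a modest long to a large short.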

Risk Parity

Ray Dalio developed Risk Parity at Bridgewater in 1996, with the core idea: don’t allocate by capital weight, allocate by “risk contribution”. Each asset contributes equally to portfolio risk.

  1. Risk Parity · Equal Risk Contribution. RCᵢ = wᵢ × (Σw)ᵢ / σ_p, with Σᵢ RCᵢ = σ_p; require: RCᵢ = σ_p / N ∀i. Each asset contributes equally to total portfolio risk. “In traditional 60/40, equities contribute 90% of the vol,” so risk parity uses more bonds + moderate leverage to balance stock-bond risk.
  2. Naive RP · Simplified / Inverse Vol. wᵢ = (1/σᵢ) / Σ(1/σⱼ). Weights only by inverse vol, ignoring correlations. Not true RP, but in practice often performs similarly, simple and reliable.
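
Both formulas can be sketched in plain Python. The inputs are assumptions chosen for illustration (16% equity vol, 6% bond vol, 0.2 correlation), and the function names are hypothetical.

```python
import math

def inverse_vol_weights(vols):
    """Naive risk parity: weights proportional to 1 / volatility."""
    inv = [1.0 / v for v in vols]
    total = sum(inv)
    return [x / total for x in inv]

def risk_contributions(weights, cov):
    """Return (RC_i list, portfolio vol) with RC_i = w_i * (cov @ w)_i / sigma_p."""
    n = len(weights)
    marginal = [sum(cov[i][j] * weights[j] for j in range(n)) for i in range(n)]
    port_vol = math.sqrt(sum(weights[i] * marginal[i] for i in range(n)))
    return [weights[i] * marginal[i] / port_vol for i in range(n)], port_vol

vols, rho = [0.16, 0.06], 0.2
cov = [
    [vols[0] ** 2, rho * vols[0] * vols[1]],
    [rho * vols[0] * vols[1], vols[1] ** 2],
]

rc_6040, vol_6040 = risk_contributions([0.6, 0.4], cov)
print("60/40 equity share of risk:", rc_6040[0] / vol_6040)

w_rp = inverse_vol_weights(vols)
rc_rp, vol_rp = risk_contributions(w_rp, cov)
print("inverse-vol weights:", w_rp)
print("risk shares:", [rc / vol_rp for rc in rc_rp])
```

With only two assets, inverse-vol weighting equalizes risk contributions exactly whatever the correlation; with three or more assets the two methods diverge as correlations become uneven.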

Classic All Weather Allocation

Asset Class | Weight (typical)
US equities | 30%
Long Treasuries | 40%
Intermediate Treasuries | 15%
Gold | 7.5%
Commodities | 7.5%

Pros and Cons

Pros:

Cons:

Other Common Sizing Rules

  1. CPPI · Constant Proportion Portfolio Insurance. Risk asset = m × (NAV − Floor), where m = multiplier (3-5), Floor = floor amount. Add risk asset on NAV rise; reduce on fall. The underlying of “principal-protected + upside participation” products. Drawback: choppy markets cause repeated stops; in 2008 many CPPI products got trapped after touching floor.
  2. Volatility Scaling. Scale each strategy to the same vol, then weight. The underlying mechanism of all multi-strategy platforms (Millennium / Citadel / Balyasny).
  3. Optimal F · Ralph Vince’s Optimal F. Futures extension of Kelly, normalized by historical max loss, described in Ralph Vince’s The Mathematics of Money Management. More aggressive than Kelly and controversial in practice.
  4. Fixed Fractional. Simplest: always bet X% of account (e.g., 2%). Recommended starting point for new traders. Kelly is essentially “dynamic fixed fractional.”
  5. Fixed Dollar. Bet a constant amount each time (e.g., always $1,000). Doesn’t reduce after losses, doesn’t add after gains. Avoids emotional sizing, but poor long-term compounding.
  6. Anti-Martingale. “Add on wins, reduce on losses.” Opposite of Martingale’s “double on losses.” Natural trend-following sizing — Kelly is also “anti-Martingale” when win rate is stable.
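
The CPPI rule is the most mechanical of these and is worth seeing step by step. This sketch uses hypothetical function names and simplifying assumptions: cash earns zero, rebalancing happens once per period, and risky exposure is capped at 100% of NAV.

```python
def cppi_risky_allocation(nav, floor, multiplier=4.0, max_leverage=1.0):
    """Risky exposure = m * (NAV - floor), never negative, capped at max_leverage * NAV."""
    cushion = max(nav - floor, 0.0)
    return min(multiplier * cushion, max_leverage * nav)

def simulate_cppi(risky_returns, nav=100.0, floor=80.0, multiplier=4.0):
    """NAV path when the risky sleeve is reset to the CPPI rule every period."""
    path = [nav]
    for r in risky_returns:
        risky = cppi_risky_allocation(nav, floor, multiplier)
        nav = nav + risky * r  # the rest sits in cash at 0% (assumption)
        path.append(nav)
    return path

# Two 10% drops, then a 10% rebound: exposure shrinks on the way down.
print(simulate_cppi([-0.10, -0.10, 0.10]))

# A single 30% gap exceeds 1 / multiplier = 25% and breaks the floor.
gap_path = simulate_cppi([-0.30, 0.20])
print(gap_path)
```

After the gap the cushion is gone, so the risky allocation is zero and the rebound is missed entirely, which is the trapped-at-the-floor behavior described above.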

End-to-End Sizing Workflow

Once you have a positive-EV strategy (trading, gambling, or business investment), the workflow below takes you from “should I bet” to “how much.”

Step 1 · Verify Positive EV

Step 2 · Estimate Parameters

Step 3 · Compute Kelly

Step 4 · Run + Review

A typical example · How to size a $100K account with a Sharpe 1.0 strategy?

  • Strategy backtest: μ = 15% (annualized excess), σ = 15%, Sharpe = 1.0
  • Continuous Kelly f* = μ/σ² = 0.15 / 0.0225 = 6.67× (theoretical 667% exposure)
  • Half Kelly → 3.33× leverage (still unrealistic)
  • Quarter Kelly → 1.67× leverage (workable)
  • With 20% MDD constraint: actual position ≤ 1.0× NAV
  • Conclusion: use 100% NAV (no leverage), annualized expectation 15%, max drawdown 20-25%, Sharpe held at 1.0
  • Leverage temptation is large, but parameter uncertainty makes the unlevered position the more robust choice
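
The arithmetic in this example can be wrapped in one small function. The name size_account and its parameters are hypothetical, and the 1.0× leverage cap stands in for the drawdown constraint rather than being derived from it.

```python
def size_account(nav, mu_excess, sigma, kelly_fraction=0.25, max_leverage=1.0):
    """Full Kelly, fractional Kelly, and the capped position for one strategy."""
    full_kelly = mu_excess / sigma ** 2
    scaled = kelly_fraction * full_kelly
    leverage = max(0.0, min(scaled, max_leverage))  # never size a non-positive edge
    return {
        "sharpe": mu_excess / sigma,
        "full_kelly": full_kelly,
        "scaled_kelly": scaled,
        "leverage": leverage,
        "position": leverage * nav,
    }

plan = size_account(100_000, mu_excess=0.15, sigma=0.15)
for key, value in plan.items():
    print(f"{key}: {value:,.2f}")
```

If the live Sharpe turns out to be 0.5 rather than 1.0 (μ = 7.5%), quarter Kelly falls to about 0.83× and the cap no longer binds, which illustrates how much the answer depends on the parameter estimates.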

Common Pitfalls — the usual suspects

  1. Compute Kelly from backtest parameters directly. Backtest Sharpe 2.0 may be real-world 0.5. Parameter uncertainty means always discount — Full Kelly is almost never the right answer.
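  2. Ignore correlation when stacking. “10% on each of 5 strategies” = 50% position. If the 5 strategies have equal vol and pairwise correlation 0.8, the combined risk equals roughly a 46% single-strategy position (0.10 × √21), versus about 22% if they were uncorrelated. Must use the covariance matrix.
  3. Martingale recovery. “Double down on losses, eventually recoverable” works on paper only with infinite capital and no bet limits, and it does not improve the EV of the underlying bet. With finite capital, a long enough losing streak eventually arrives and wipes the account out. “Martingale machines” have a long history of blowing up.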
  4. Kelly applied to negative-EV strategies. Negative EV gives Kelly a negative position (should reverse). If you insist on going long, Kelly becomes the “fastest path to zero” formula. Confirm EV > 0 first.
  5. Confuse fixed-dollar with fixed-fractional. “Always bet $1,000” vs “always bet 1% NAV” yield very different compounding. Use proportional sizing over the long term; with a fixed dollar amount, profits never compound, and after a drawdown the same dollar bet becomes a larger share of a smaller account.
  6. Use too-distant vol. Estimating current vol with 3-year window = under-estimating sudden vol. EWMA or 1-3 month windows better reflect the present.
  7. Ignore “leverage cost.” Kelly often gives > 100% sizing, requiring borrowing. Borrowing cost 3-6%/yr eats much of the excess — actual Kelly should subtract borrowing rate.
  8. One Sharpe rules them all. Strategy A has Sharpe 1.0 but fat tails; Strategy B has Sharpe 0.8 but is normally distributed. Their Kelly conclusions differ. The Kelly growth rate ≈ Sharpe²/2 holds under a normality assumption; non-normal returns need a re-derivation.
  9. MDD stop = selling at the bottom. Strict “20% stop” strategies would have liquidated everything in 2020-03 or 2008-10 → missed the rebound. MDD rules need to combine with “volatility state,” or replace hard stops with vol targeting.
  10. Over-concentration in high-conviction. “I’m very sure this time” → put 40% in. Kelly emphasizes robustness of parameter estimates over confidence — even with high confidence, not exceeding 25% single-asset is recommended.
  11. Sharpe ≠ economic optimum. High Sharpe can come from tail option-selling strategies: 99% of the time earning 1%, 1% of the time losing 30%. Sortino / Calmar / MDD must be viewed together; Sharpe alone isn’t sufficient.
  12. Psychology vs math mismatch. The math may say Half Kelly, but clients (or you) can’t stand a 30% MDD. Psychological tolerance determines actual sizing, not math. Top managers treat psychological tolerance as a hard constraint, not an “adjustable parameter.”
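
Pitfall 2 is the easiest one to quantify. The sketch below assumes every strategy has the same volatility and the same pairwise correlation, which is a simplification; the function name is hypothetical.

```python
import math

def stacked_risk(position, n, rho):
    """Risk of n equal positions, in units of one strategy's vol.

    Portfolio variance = position^2 * (n + n * (n - 1) * rho) under equal vols
    and a single pairwise correlation rho.
    """
    return position * math.sqrt(n + n * (n - 1) * rho)

for rho in (0.0, 0.5, 0.8, 1.0):
    risk = stacked_risk(0.10, 5, rho)
    print(f"rho = {rho:.1f}: equivalent single-strategy position = {risk:.1%}")
```

Diversification across five strategies cuts risk by more than half only when they are close to uncorrelated; at a correlation of 0.8 almost none of the benefit remains.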
