Rajiv Chaitanya M

PPO Portfolio

Reinforcement Learning for Indian Bank Equities

Reinforcement Learning, Quantitative Finance · Apr 2025 – Jun 2025

PPO · Reinforcement Learning · Quant · Indian Equities

Problem

Standard portfolio allocators do not internalise transaction costs, leading to high turnover and erosion of returns.

Approach

Portfolio allocation across HDFC, ICICI, and Kotak (2018-2023) is formulated as a continuous-action MDP. The reward combines portfolio returns with a turnover penalty. PPO is trained against this reward and evaluated on returns, turnover, and drawdown profile.
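
One plausible way to write this reward is sketched below. The exact form is not given here, so the penalty coefficient \lambda, the L1 turnover measure, and the symbol names are all assumptions: \mathbf{a}_t is the raw policy action, \mathbf{w}_t the resulting weights over the three stocks, and \mathbf{R}_{t+1} the vector of next-day asset returns.

```latex
\mathbf{w}_t = \operatorname{softmax}(\mathbf{a}_t), \qquad
r_t = \mathbf{w}_t^{\top} \mathbf{R}_{t+1} \;-\; \lambda \,\lVert \mathbf{w}_t - \mathbf{w}_{t-1} \rVert_1
```

With \lambda = 0 this reduces to a plain return-maximising reward; a larger \lambda makes each unit of rebalancing more expensive to the agent.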

Methodology

Universe: HDFC, ICICI, Kotak daily data, 2018-2023.
Action space: continuous allocation weights, softmax-constrained.
Reward: returns net of turnover penalty.
Trainer: PPO with standard actor-critic.
Evaluation: portfolio evolution and drawdown curves alongside summary metrics.
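
The action mapping and reward listed above can be illustrated with a minimal sketch. It is not the project's actual environment code: the function names, the L1 definition of turnover, and the default `penalty` value are hypothetical and chosen only for illustration.

```python
import math


def softmax(action):
    """Map an unconstrained action vector to long-only weights that sum to 1."""
    m = max(action)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in action]
    total = sum(exps)
    return [e / total for e in exps]


def step_reward(action, prev_weights, asset_returns, penalty=0.001):
    """Return (reward, new_weights, turnover) for one rebalancing step.

    `penalty` is a hypothetical coefficient; its real value is not stated.
    Turnover is assumed to be the L1 distance between old and new weights.
    """
    weights = softmax(action)
    turnover = sum(abs(w - p) for w, p in zip(weights, prev_weights))
    portfolio_return = sum(w * r for w, r in zip(weights, asset_returns))
    reward = portfolio_return - penalty * turnover
    return reward, weights, turnover
```

Calling `step_reward([0.0, 0.0, 0.0], [1/3, 1/3, 1/3], [0.01, 0.02, 0.03])` keeps the portfolio at equal weights, so turnover is zero and the reward equals the average of the three asset returns. The softmax keeps every weight positive, so under this sketch the agent cannot short or hold cash.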

Results

155.7% cumulative return over the test window.
Sharpe ratio of 0.730.
Turnover of 0.683, suggesting the policy learned to account for trading costs.
Drawdown profile consistent with allocator behaviour rather than aggressive trading.
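
For reference, summary metrics of this kind are commonly computed from a series of per-period portfolio returns as in the sketch below. The conventions used here (252 trading days per year, zero risk-free rate, sample standard deviation) are assumptions and may differ from those behind the figures above.

```python
import math


def cumulative_return(returns):
    """Compound per-period simple returns into one total return."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def sharpe_ratio(returns, periods_per_year=252):
    """Annualised Sharpe ratio, assuming a zero risk-free rate.

    Needs at least two returns with non-zero variance.
    """
    n = len(returns)
    mean = sum(returns) / n
    variance = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return math.sqrt(periods_per_year) * mean / math.sqrt(variance)


def max_drawdown(returns):
    """Largest peak-to-trough decline of the equity curve, as a negative number."""
    value, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        value *= 1.0 + r
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst
```

A drawdown curve, as opposed to the single worst value, is the running series of `value / peak - 1.0` inside the loop of `max_drawdown`.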
