PPO Portfolio
Reinforcement Learning for Indian Bank Equities
Problem
Standard portfolio allocators do not internalise transaction costs, which leads to high turnover and erodes returns.
Approach
Portfolio allocation across HDFC, ICICI, and Kotak (2018-2023) is formulated as a continuous-action MDP. The reward combines portfolio returns with a turnover penalty, so the agent is charged for rebalancing. PPO is trained against this reward and evaluated on returns, turnover, and drawdown profile.
Methodology
- Universe: HDFC, ICICI, Kotak daily data, 2018-2023.
- Action space: continuous allocation weights, softmax-constrained.
- Reward: returns net of turnover penalty.
- Trainer: PPO with a standard actor-critic architecture.
- Evaluation: portfolio evolution and drawdown curves alongside summary metrics.
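The action and reward bullets above can be illustrated with a minimal NumPy sketch of a single environment step. It is not the project's implementation. The function names, the `penalty` coefficient, and the definition of turnover as the sum of absolute weight changes are all assumptions made for illustration, and the sketch ignores weight drift between rebalances.

```python
import numpy as np

def softmax(logits):
    """Map unconstrained policy outputs to long-only weights that sum to 1."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def step_reward(prev_weights, logits, asset_returns, penalty=0.001):
    """Return (new_weights, turnover, reward) for one step.

    `penalty` is a hypothetical coefficient, not a value from the project.
    Turnover is assumed to be the L1 distance between consecutive weights.
    """
    weights = softmax(logits)
    turnover = float(np.abs(weights - np.asarray(prev_weights, dtype=float)).sum())
    portfolio_return = float(weights @ np.asarray(asset_returns, dtype=float))
    reward = portfolio_return - penalty * turnover
    return weights, turnover, reward

# Example: three assets, moving from all-in on the first asset to equal weights.
weights, turnover, reward = step_reward(
    prev_weights=[1.0, 0.0, 0.0],
    logits=[0.0, 0.0, 0.0],
    asset_returns=[0.01, 0.02, 0.03],
)
```

With a larger `penalty`, the same reallocation yields a lower reward, which is the mechanism by which a policy trained on this signal would be discouraged from frequent rebalancing.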
Results
- 155.7% cumulative return over the test window.
- Sharpe ratio of 0.730.
- Turnover of 0.683, suggesting the policy learned to limit rebalancing under the turnover penalty.
- Drawdown profile consistent with allocator behaviour rather than aggressive trading.
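For readers who want to reproduce this kind of summary, the sketch below shows one common way to compute the four quantities from a series of daily portfolio returns and a history of weights. The conventions are assumptions, not the project's stated definitions: 252 trading days per year, a zero risk-free rate, sample standard deviation, and turnover averaged per step as the sum of absolute weight changes. Different conventions will give different numbers.

```python
import numpy as np

def cumulative_return(daily_returns):
    """Compounded return over the whole window, e.g. 0.25 for +25%."""
    r = np.asarray(daily_returns, dtype=float)
    return float(np.prod(1.0 + r) - 1.0)

def sharpe_ratio(daily_returns, periods_per_year=252, risk_free_daily=0.0):
    """Annualised Sharpe ratio (assumed conventions, see lead-in)."""
    excess = np.asarray(daily_returns, dtype=float) - risk_free_daily
    sd = excess.std(ddof=1)
    if sd == 0:
        return float("nan")
    return float(np.sqrt(periods_per_year) * excess.mean() / sd)

def drawdown_curve(daily_returns):
    """Fractional drop from the running peak of the equity curve (<= 0)."""
    r = np.asarray(daily_returns, dtype=float)
    equity = np.concatenate(([1.0], np.cumprod(1.0 + r)))  # start at 1.0
    peak = np.maximum.accumulate(equity)
    return equity / peak - 1.0

def max_drawdown(daily_returns):
    """Most negative point on the drawdown curve."""
    return float(drawdown_curve(daily_returns).min())

def mean_turnover(weight_history):
    """Average per-step sum of absolute weight changes.

    `weight_history` has shape (steps, assets).
    """
    w = np.asarray(weight_history, dtype=float)
    return float(np.abs(np.diff(w, axis=0)).sum(axis=1).mean())

# Toy usage with made-up numbers (not project data).
toy_returns = [0.10, -0.50, 0.20]
toy_weights = [[1, 0, 0], [0, 1, 0], [0, 1, 0]]
summary = {
    "cumulative_return": cumulative_return(toy_returns),
    "max_drawdown": max_drawdown(toy_returns),
    "mean_turnover": mean_turnover(toy_weights),
}
```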