aayush garg
garg-aayush
AI & ML interests
None yet
Organizations
RLHF Papers
Proximal Policy Optimization Algorithms
Paper • 1707.06347 • Published • 11
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 66
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 145
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 452
LLM Tech Reports
models 47
garg-aayush/cs336-grpo-exps
Updated
garg-aayush/cs336_exp-iter_exps
Updated
garg-aayush/llama31-8b-sft-mask
Updated
garg-aayush/llama31-8b-sft-nomask
Updated
garg-aayush/ckpt-140
Updated
garg-aayush/ckpt-100
Updated
garg-aayush/test
Updated
garg-aayush/llama-2-7b-miniplatypus-1K
Updated • 2
garg-aayush/zephyr-7b-sft-qlora
Updated
garg-aayush/wolf_plushie
Text-to-Image • Updated • 9 • 1