Microsoft
v-trmyl
Deep Reinforcement Learning Through Policy Optimization
Deep Reinforcement Learning (Deep RL) has seen several breakthroughs in recent years. In this tutorial we will focus on recent advances in Deep RL through policy gradient methods and actor-critic methods. These methods have shown significant success in a wide range of domains, including continuous-action domains such as manipulation, locomotion, and ...
Jun 5, 2024
Direct Preference Optimization: Your Language Model is Secretly a Reward Model Language Model Training
Paper introduction: Direct Preference Optimization: Your Language Model is Secretly a Reward Model
speakerdeck.com
Aug 19, 2024
7:52
21. Direct Preference Optimization (DPO) (Rafailov et al., 2023)
YouTube
LOADING_
1 views
2 months ago
21:06
6th cohort paper review 📎 DPO (2024.06) Direct Preference Optimization: Your Language Model is Secretly a Reward ...
YouTube
KMU X:AI
47 views
3 months ago
Top videos
4:00
When Is Policy Optimization Useful For Reinforcement Learning?
YouTube
AI and Machine Learning Explained
1 month ago
3:19
Can Policy Optimization Help Reinforcement Learning Succeed?
YouTube
AI and Machine Learning Explained
2 views
1 month ago
13:45
An Introduction to Proximal Policy Optimization (PPO) in Deep Reinforcement Learning
YouTube
Udacity-DeepRL
17.8K views
Jun 3, 2019
Direct Preference Optimization: Your Language Model is Secretly a Reward Model Reward Modeling
59:37
The Evolution of LLM Preference Optimization • Guest Lecture at BITS Pilani Goa • Oct 10, 2025
YouTube
Aman Chadha
26 views
3 months ago
7:55
[Paper Review] DPO : Your language model is secretly a reward model
YouTube
LOADING_
5 views
4 months ago
13:14
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
bilibili
dalaska的欢愉
6 views
2 weeks ago
1:27:21
RLHF, PPO and DPO for Large language models
3.6K views
Feb 18, 2024
YouTube
Arvind N
16:27
An introduction to Reinforcement Learning
702K views
Apr 2, 2018
YouTube
Arxiv Insights
25:51
Part 1 of 3 — Proximal Policy Optimization Implementation: 11 C…
62.9K views
Sep 10, 2021
YouTube
Weights & Biases
19:50
An introduction to Policy Gradient methods - Deep Reinforcement Le…
254.6K views
Oct 1, 2018
YouTube
Arxiv Insights
11:31
Reinforcement Learning in DeepSeek-R1 | Visually Explained
42.2K views
11 months ago
YouTube
AGI Lambda
54:00
Deep Reinforcement Learning with Proximal Policy Optimization (PP…
7.7K views
Jan 15, 2024
YouTube
Luke Ditria
30:48
Introduction to Proximal Policy Optimization Tutorial with OpenAI…
9K views
Nov 17, 2020
YouTube
Python Lessons
1:10
What is Proximal Policy Optimization ( PPO)?
19 views
2 months ago
YouTube
Data Science Made Easy
29:04
Introduction to Proximal Policy Optimization algorithm (PPO)
12.8K views
Mar 31, 2020
YouTube
Python Lessons
14:47
Reinforcement Learning: on-policy vs off-policy algorithms
23.5K views
Nov 13, 2023
YouTube
CodeEmporium
21:15
Direct Preference Optimization (DPO) - How to fine-tune LLMs dir…
28.5K views
Jun 21, 2024
YouTube
Serrano.Academy
1:03
Direct Policy Gradients Direct Optimization of Policies in Discret…
66 views
Oct 28, 2020
bilibili
开题开了一万年
38:24
Proximal Policy Optimization (PPO) - How to train Large Language Mod…
76.4K views
Jan 24, 2024
YouTube
Serrano.Academy
5:04
Direct optimization study in the Optimization Tool of ANSA
1.5K views
Nov 18, 2022
YouTube
BETA CAE Systems
Direct Preference Optimization (DPO) explained
100 views
Dec 27, 2024
substack.com
17:32
Design Parameter Optimization (Direct Optimization)
13.4K views
Feb 5, 2023
YouTube
Engineering Educator Academy
37:44
MOPO: Model-Based Offline Policy Optimization
2.7K views
Sep 28, 2020
YouTube
Simons Institute for the Theory of Computing
17:50
Proximal Policy Optimization Explained
75.8K views
May 20, 2021
YouTube
Edan Meyer
Direct Nash Optimization: Teaching language models to self-improve…
Sep 3, 2024
Microsoft
22:17
GRPO - Group Relative Policy Optimization - How DeepSeek trai…
10.1K views
8 months ago
YouTube
Serrano.Academy
35:01
Let's Code Proximal Policy Optimization
17.3K views
May 28, 2021
YouTube
Edan Meyer
58:07
Aligning LLMs with Direct Preference Optimization
33K views
Feb 8, 2024
YouTube
DeepLearningAI
1:17:42
Optimal Control (CMU 16-745) 2023 Lecture 1: Intro and Dynamics Rev…
23.1K views
Jan 19, 2023
YouTube
CMU Robotic Exploration Lab
7:59
Multiple Variable Optimization with Equality Constraints (Direct Subst…
6.7K views
Feb 20, 2021
YouTube
Reindolf Boadu
1:02:47
Proximal Policy Optimization (PPO) is Easy With PyTorch | Full PPO T…
84.2K views
Dec 24, 2020
YouTube
Machine Learning with Phil
18:59
ITIL 4 Strategist: Direct, Plan & Improve | Direct, Plan & Improve:…
3.8K views
Oct 18, 2021
YouTube
GogoTraining - PeopleCert Accredited (ATO)