Wei Yi

AI

Understand REINFORCE, Actor-Critic and PPO in one go

Use the loss function of the Policy Gradient algorithm as the key to understanding several reinforcement learning algorithms: REINFORCE, Actor-Critic, and PPO. Together they form the theoretical groundwork for understanding the Reinforcement Learning from Human Feedback (RLHF) algorithm used to build ChatGPT.

Wei Yi · Published in Towards Data Science · 37 min read · 5 hours ago

Studying reinforcement learning can be frustrating because the field is cursed with confusing jargon and…

AI

How Does an Image-Text Foundation Model Work

Learn how an image-text multi-modality model can perform image classification, image retrieval, and image captioning.

Wei Yi · Published in Towards Data Science · 18 min read · 11 hours ago

Photo by Bozhin Karaivanov on Unsplash

Nowadays, there is a surge of multi-modality foundation models. They understand different kinds of data, including text, images, video, and audio, and can perform tasks that require the knowledge of…
