Reinforcement Learning Example Code

News

Intuit on MSN

6 steps to train an AI model to do whatever you want

A recent study shows that 1 in 5 people use AI every day. From the chatbot helping you budget smarter to the recommendations ...

The Information

Everyone Wants To Be a Reinforcement Learning Startup

These days, artificial intelligence developers, investors and founders are all obsessed with “reinforcement learning,” a ...

INFLY TECH Team Solves the Problem of Diversity Collapse in Large Model Training

This groundbreaking research, jointly completed by INFLY TECH, Fudan University, and Griffith University, was published in ...

INFLY TECH Team Proposes DPH-RL Framework: Helping AI Training Bid Farewell to the Dilemma of 'Specialized Learning'

Through in-depth investigation, a research team composed of INFLY TECH, Fudan University, and Griffith University found that the root of the problem lies in the use of the 'reverse KL divergence' ...

IEEE

Survey on Large Language Model-Enhanced Reinforcement Learning: Concept, Taxonomy, and Methods

Abstract: With extensive pretrained knowledge and high-level general capabilities, large language models (LLMs) emerge as a promising avenue to augment reinforcement learning (RL) in aspects such as ...

Microsoft’s new AI framework trains powerful reasoning models with a fraction of the cost

The rStar2-Agent framework boosts a 14B model to outperform a 671B giant, offering a path to state-of-the-art AI without ...

Baidu's new Ernie-4.5 model is open for enterprise use with Apache 2.0 license and increased efficiency

ERNIE-4.5-21B-A3B-Thinking is available now on Hugging Face under an enterprise-friendly Apache 2.0 license — allowing for commercial usage — and is specifically optimized for advanced reasoning, tool ...

IEEE

Autonomous Operations With a Safe Reinforcement Learning Approach for Urban Rail Transit

Abstract: Reinforcement learning has increasingly showcased its potential in decision-making for the autonomous operation of urban rail transit. However, the inability of reinforcement learning to ...

GitHub

Code for Online Preference-based Reinforcement Learning with Self-augmented Feedback from Large Language Model

We introduce RL-SaLLM-F, a novel approach for preference-based reinforcement learning (PbRL) that leverages large language models (LLMs) to provide trajectory feedback without human intervention or ...

GitHub

open-thought/reasoning-gym

Reasoning Gym is a community-created Python library of procedural dataset generators and algorithmically verifiable reasoning environments for training reasoning models with reinforcement learning (RL ...
