News

A recent study shows that 1 in 5 people use AI every day. From the chatbot helping you budget smarter to the recommendations ...
These days, artificial intelligence developers, investors and founders are all obsessed with “reinforcement learning,” a ...
This groundbreaking research, jointly completed by INFLY TECH, Fudan University, and Griffith University, was published in ...
After an in-depth investigation, a research team from INFLY TECH, Fudan University, and Griffith University found that the root of the problem lies in the use of the 'reverse KL divergence' ...
Abstract: With extensive pretrained knowledge and high-level general capabilities, large language models (LLMs) emerge as a promising avenue to augment reinforcement learning (RL) in aspects such as ...
The rStar2-Agent framework boosts a 14B model to outperform a 671B giant, offering a path to state-of-the-art AI without ...
ERNIE-4.5-21B-A3B-Thinking is available now on Hugging Face under an enterprise-friendly Apache 2.0 license — allowing for commercial usage — and is specifically optimized for advanced reasoning, tool ...
Abstract: Reinforcement learning has increasingly showcased its potential in decision-making for the autonomous operation of urban rail transit. However, the inability of reinforcement learning to ...
We introduce RL-SaLLM-F, a novel approach for preference-based reinforcement learning (PbRL) that leverages large language models (LLMs) to provide trajectory feedback without human intervention or ...
Reasoning Gym is a community-created Python library of procedural dataset generators and algorithmically verifiable reasoning environments for training reasoning models with reinforcement learning (RL ...
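
To make "procedural dataset generators" and "algorithmically verifiable" concrete, here is a minimal sketch of the general idea. It does not use Reasoning Gym's own API: the names `Task`, `generate_tasks`, and `score_answer` are hypothetical, and the addition task is only an illustrative stand-in for a real reasoning environment.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One generated problem with its ground-truth answer (hypothetical type)."""
    question: str
    answer: str


def generate_tasks(size: int, seed: int) -> list[Task]:
    """Procedurally generate `size` addition problems.

    A seeded RNG makes the dataset reproducible: the same seed
    always yields the same sequence of tasks.
    """
    rng = random.Random(seed)
    tasks = []
    for _ in range(size):
        a, b = rng.randint(1, 99), rng.randint(1, 99)
        tasks.append(Task(question=f"What is {a} + {b}?", answer=str(a + b)))
    return tasks


def score_answer(task: Task, response: str) -> float:
    """Verify a model response by exact match against the known answer.

    Returns 1.0 for a correct response and 0.0 otherwise, which is the
    kind of checkable signal an RL training loop could use as a reward.
    """
    return 1.0 if response.strip() == task.answer else 0.0


if __name__ == "__main__":
    for task in generate_tasks(size=3, seed=42):
        print(task.question, "| reward for correct answer:", score_answer(task, task.answer))
```

Because the answers are computed while the questions are generated, no human labelling is needed, and the reward can be checked automatically for as many fresh problems as training requires.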