Reinforcement Learning NLP

18h

EMNLP 2025 | The Combination of SFT and RL: vivo AI Lab Proposes a New Post-Training Method

Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL) fine-tuning are two common methods for post-training large models. While reinforcement learning fine-tuning has made significant progress ...

15h

Daguan Data Intelligent Recommendation: The Core of Building a Highly Relevant Recommendation Mechanism for Short Video Information Platforms

When opening a short video information app and swiping across the screen, users expect to find content that is 'just what they want to see' — but the reality is often different: during commutes, users ...

PWM

Why AlphaGo, not ChatGPT, will shape the future of wealth management

Wealth managers, like much of the business world, have been focusing on LLMs, but the real innovation in managing assets may ...

Phishing 3.0: AI Threats And Overcoming The Risk Of Human Reluctance

When speaking to and surveying end-users about security tools and policies, a few themes frequently arise. One of them is ...

The Information

OpenAI’s Models Are Getting Too Smart For Their Human Teachers

In the fight to improve AI models, Anthropic and OpenAI have doubled down on two methods: letting models train on fake clones ...

Physics World

The pros and cons of reinforcement learning in physical science

David Silver of Google DeepMind thinks AIs that ‘learn by experience’ are the future of AI – but maybe not in particle ...

Tech Xplore on MSN

The AI model that teaches itself to think through problems, no humans required

Artificial intelligence is getting smarter every day, but it still has its limits. One of the biggest challenges has been ...

Google’s Embedding Gemma On-Device RAG Made Easy for NLP Efficiency

Learn how Google’s Embedding Gemma redefines compact AI with customizable dimensions and advanced NLP features for developers ...

Nature

Secrets of DeepSeek AI model revealed in landmark paper

DeepSeek says its R1 model did not learn by copying examples generated by other LLMs.

The Register on MSN

China's DeepSeek applying trial-and-error learning to its AI 'reasoning'

Model can also explain its answers, researchers find. Chinese AI company DeepSeek has shown it can improve the reasoning of its LLM DeepSeek-R1 through trial-and-error based reinforcement learning, and ...

Futurism

ChatGPT Is Blowing Up Marriages as Spouses Use AI to Attack Their Partners

Across the world, marriages are being destroyed as spouses use AI like OpenAI's ChatGPT to attack their partners.
