NVIDIA Unveils Llama 3.1-Nemotron-70B-Reward to Improve AI Alignment with Human Preferences

October 7, 2024

227

NVIDIA Unveils Llama 3.1-Nemotron-70B-Reward to Enhance AI Alignment with Human Preferences

NVIDIA has launched a groundbreaking reward mannequin, Llama 3.1-Nemotron-70B-Reward, geared toward enhancing the alignment of huge language fashions (LLMs) with human preferences. This growth is a part of NVIDIA’s efforts to leverage reinforcement studying from human suggestions (RLHF) to enhance AI programs, in keeping with NVIDIA Technical Weblog.

Developments in AI Alignment

Reinforcement studying from human suggestions is essential for growing AI programs that may emulate human values and preferences. This system permits superior LLMs similar to ChatGPT, Claude, and Nemotron to generate responses that replicate consumer expectations extra precisely. By incorporating human suggestions, these fashions exhibit improved decision-making capabilities and nuanced conduct, fostering belief in AI purposes.

Llama 3.1-Nemotron-70B-Reward Mannequin

The Llama 3.1-Nemotron-70B-Reward mannequin has achieved the highest place on the Hugging Face RewardBench leaderboard, which evaluates the capabilities, security, and pitfalls of reward fashions. With a powerful rating of 94.1% on General RewardBench, the mannequin demonstrates a excessive capability to determine responses aligning with human preferences.

This mannequin excels throughout 4 classes: Chat, Chat-Arduous, Security, and Reasoning, notably attaining 95.1% and 98.1% accuracy in Security and Reasoning, respectively. These outcomes underscore the mannequin’s capability to soundly reject unsafe responses and its potential assist in domains like arithmetic and coding.

Implementation and Effectivity

NVIDIA has optimized the mannequin for top compute effectivity, boasting a measurement solely a fifth of the Nemotron-4 340B Reward whereas sustaining superior accuracy. The mannequin’s coaching utilized CC-BY-4.0-licensed HelpSteer2 knowledge, making it appropriate for enterprise use instances. The coaching course of mixed two standard approaches, guaranteeing excessive knowledge high quality and advancing AI capabilities.

Deployment and Accessibility

The Nemotron Reward mannequin is obtainable as an NVIDIA NIM inference microservice, facilitating simple deployment throughout numerous infrastructures, together with cloud, knowledge facilities, and workstations. NVIDIA NIM employs inference optimization engines and industry-standard APIs to ship high-throughput AI inference that scales with demand.

Customers can discover the Llama 3.1-Nemotron-70B-Reward mannequin immediately from their browsers or make the most of the NVIDIA-hosted API for large-scale testing and proof of idea growth. The mannequin is accessible for obtain on platforms like Hugging Face, offering builders with versatile choices for integration.

Picture supply: Shutterstock

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

NVIDIA Unveils Llama 3.1-Nemotron-70B-Reward to Improve AI Alignment with Human Preferences

Developments in AI Alignment

Llama 3.1-Nemotron-70B-Reward Mannequin

Implementation and Effectivity

Deployment and Accessibility

Related Articles

Advancing Embodied AI: How Meta is Bringing Human-Like Contact and Dexterity to AI

A Smarter Path to AI: Breaking the Boundaries to ROI from AI

A Frosty Beard for Santa STEM Problem

LEAVE A REPLY Cancel reply

Latest Articles

Advancing Embodied AI: How Meta is Bringing Human-Like Contact and Dexterity to AI

A Smarter Path to AI: Breaking the Boundaries to ROI from AI

A Frosty Beard for Santa STEM Problem

NASA’s Curiosity rover captures 360-degree view of Mars — and finds unusual sulfur stones

AI and Simulative Duties: What It Means for Your Job and Keep Forward | by Prajeesh Prathap | Nov, 2024

NVIDIA Unveils Llama 3.1-Nemotron-70B-Reward to Improve AI Alignment with Human Preferences

Developments in AI Alignment

Llama 3.1-Nemotron-70B-Reward Mannequin

Implementation and Effectivity

Deployment and Accessibility

Related Articles

LEAVE A REPLY Cancel reply

Stay Connected

Latest Articles