Meta has released a new AI model, the “Self-Taught Evaluator,” which is designed to reduce human involvement in AI development.

The model, first described in a paper published in August, uses the “chain of thought” technique, similar to the approach taken by OpenAI’s o1 models, to make AI judgments more reliable.

The “chain of thought” method breaks complex problems down into smaller, logical steps. It has reportedly shown promise in improving the accuracy of AI responses, particularly in demanding domains such as science, coding, and mathematics.

Meta’s researchers took a significant step by training the evaluator model entirely on AI-generated data, removing the need for human input at this stage of development.

Using AI to assess other AI models is said to offer a glimpse of a future in which autonomous AI agents can learn from their own mistakes.

Such self-improving models are said to remove the current need for Reinforcement Learning from Human Feedback (RLHF), a process that is both costly and inefficient.


RLHF relies on human annotators with specialised skills to label data and to verify that answers to complex mathematical and writing problems are correct.

Meta researcher Jason Weston said: “We hope, as AI becomes more and more super-human, that it will get better and better at checking its work, so that it will actually be better than the average human.”

He emphasised that self-teaching and self-evaluation will be essential if AI is to reach unprecedented levels of proficiency.

Meta recently partnered with Blumhouse, the Hollywood production company known for its horror films, to trial its generative AI video model, Movie Gen.