IBM releases database of 1 million faces to combat bias in facial recognition

One of the apparent strengths of artificial intelligence is its ability to remove human bias. However, although this may be the intention, AI systems learn what they are taught, meaning that if they are not trained on robust and diverse data sets, bias can still emerge.

The challenge in training AI is clearly demonstrated by facial recognition technology. Facial recognition systems use biometrics to map facial features from an image, and then compare these with a database of known faces to find a match.
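As a rough illustration of the matching step described above, the sketch below compares a probe face with a small database of known faces. It assumes each face has already been reduced to a short list of numbers; the feature values, the names `cosine_similarity` and `best_match`, and the 0.9 threshold are all hypothetical, and real systems use far richer features and tuned thresholds.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two feature vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector carries no usable information
    return dot / (norm_a * norm_b)


def best_match(probe, known_faces, threshold=0.9):
    """Return the name of the most similar known face, or None if no
    face reaches the (hypothetical) similarity threshold."""
    best_name, best_score = None, threshold
    for name, features in known_faces.items():
        score = cosine_similarity(probe, features)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Toy database: invented feature vectors, not real biometric data.
known_faces = {
    "alice": [0.9, 0.1, 0.3],
    "bob": [0.2, 0.8, 0.5],
}

print(best_match([0.88, 0.12, 0.31], known_faces))
print(best_match([-1.0, 0.0, 0.0], known_faces))
```

The first probe sits close to one stored face and is matched to it; the second resembles neither and returns no match. A system trained mostly on one kind of face tends to produce less reliable feature vectors for others, which is where this comparison starts to fail.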

If the data used when training the machine learning software favours particular facial characteristics, problems arise. If, for example, a larger proportion of the data comes from people of a certain ethnicity or skin colour, the system will be better equipped to recognise certain facial features, and will struggle to recognise others.

This means that some users may encounter problems when using facial recognition. According to the New York Times, a study conducted last year by Joy Buolamwini, a researcher at the MIT Media Lab, found that commercial facial analysis software from Microsoft, IBM and Megvii correctly identified the gender of white men up to 99% of the time. However, for darker skinned women, the software made errors in as many as 35% of cases.
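A gap of this kind only becomes visible when accuracy is reported separately for each group rather than as a single overall figure. The sketch below shows one simple way to do that; the function name `error_rate_by_group`, the group labels and the records are invented for illustration and are not data from the study.

```python
from collections import defaultdict


def error_rate_by_group(records):
    """Compute the share of wrong predictions for each group.

    records: iterable of (group, predicted_label, true_label) tuples.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, predicted, actual in records:
        totals[group] += 1
        if predicted != actual:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Invented toy records: (group, predicted gender, true gender).
records = [
    ("group_a", "male", "male"),
    ("group_a", "male", "male"),
    ("group_a", "female", "female"),
    ("group_a", "male", "male"),
    ("group_b", "male", "female"),
    ("group_b", "female", "female"),
    ("group_b", "male", "female"),
    ("group_b", "female", "female"),
]

rates = error_rate_by_group(records)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.0%} error rate")
```

In this toy example the overall error rate is 25%, which hides the fact that every mistake falls on one group.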

To combat this, data sets must be large and varied enough that the technology learns to recognise a wide variety of faces regardless of age, gender, ethnicity and skin tone. Errors are not only annoying for users; they also point to an inherently unrepresentative data set.

This will only become more apparent as facial recognition software becomes more commonplace. The iPhone XR is equipped with Face ID, and many airports are expected to replace passport checks with biometric facial recognition in the future, highlighting the need for AI systems that are fair and accurate.

IBM’s facial recognition dataset

Today, IBM Research, the research division of the technology company, released a new, large and diverse data set called Diversity in Faces (DiF) to advance the study of accuracy in facial recognition technology.

Believed to be the first of its kind, DiF provides annotations of 1 million human facial images drawn from the publicly available YFCC-100M Creative Commons data set.

IBM annotated the faces using ten different coding schemes that measure craniofacial features such as head length, nose length and forehead height, as well as other factors, including age and gender.

It is hoped that, by offering a more balanced distribution and broader coverage of facial images than previous data sets, DiF will improve the diversity of the data used to train AI facial recognition systems.

The dataset is now available to the global research community upon request.
