The MLCommons initiative has unveiled Croissant, a metadata format designed to make it easier for machine learning (ML) practitioners to work with datasets.

ML development faces many challenges, among them disparate data representations: text, structured data, images, audio, and video each come with their own arrangements and formats.

While existing metadata formats like schema.org and DCAT cater to general datasets, they fall short of meeting the specific needs of ML practitioners.

Croissant, a collaborative effort within the MLCommons initiative, offers a standardised method to describe and organise ML-ready datasets.

Building upon the foundation of schema.org, Croissant introduces layers for ML-specific metadata, data resources, organisation, and default ML semantics.
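
To make the layering concrete, here is a rough sketch of how such a description might be assembled as a Python dictionary. It is illustrative only: the helper `build_croissant_sketch`, the dataset name, and the URL are hypothetical, the JSON-LD `@context` is omitted, and the keys only approximate the general shape of Croissant metadata rather than reproduce the full specification.

```python
import json


def build_croissant_sketch(name, csv_url):
    """Return a simplified, Croissant-style description (hypothetical helper)."""
    return {
        # Base layer: a schema.org Dataset with general metadata.
        "@type": "sc:Dataset",
        "name": name,
        # Resources layer: the files that make up the dataset.
        "distribution": [
            {
                "@type": "cr:FileObject",
                "@id": "data.csv",
                "contentUrl": csv_url,
                "encodingFormat": "text/csv",
            }
        ],
        # Organisation layer: how raw files map to records and fields.
        "recordSet": [
            {
                "@type": "cr:RecordSet",
                "@id": "examples",
                "field": [
                    {
                        "@type": "cr:Field",
                        "@id": "examples/label",
                        "dataType": "sc:Text",
                        "source": {
                            "fileObject": {"@id": "data.csv"},
                            "extract": {"column": "label"},
                        },
                    }
                ],
            }
        ],
    }


# Placeholder name and URL, chosen purely for illustration.
metadata = build_croissant_sketch("toy-dataset", "https://example.org/data.csv")
print(json.dumps(metadata, indent=2))
```

The point of the sketch is that each field refers back to a declared resource by its identifier, which is what would let tools locate and load the data without dataset-specific code.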

Major ML platforms, including Kaggle, Hugging Face, and OpenML, along with frameworks like TensorFlow, PyTorch, and JAX, have announced their support for the Croissant format.

The 1.0 release of Croissant includes a comprehensive specification, example datasets, an open-source Python library for validation and generation of Croissant metadata, and a user-friendly visual editor for creating intuitive dataset descriptions.

Most ML work revolves around data, so the absence of a common format imposes a substantial data development burden.

Croissant aims to alleviate this burden by streamlining the ML development process, facilitating dataset discoverability, simplifying data cleaning and analysis, and enabling model training with minimal code.

Croissant datasets are already available on prominent platforms like Google Dataset Search, Hugging Face, Kaggle, and OpenML.

Integration with TensorFlow Datasets allows for data ingestion, while the Croissant editor UI enables users to inspect and modify metadata.

To publish a Croissant dataset, creators can use the editor UI to generate metadata automatically, publish it on their dataset webpage, or leverage supported repositories.