
Why MLArray?

MLArray addresses a gap I repeatedly ran into over the last few years: we have excellent storage formats optimized for machine learning workloads, but no widely usable image format that combines efficient array storage with standardized, software-friendly metadata.

Projects like Zarr and Blosc2 already solve the “store large arrays efficiently” problem extremely well. However, they do not provide a standardized metadata layer for imaging. As a result, it’s difficult to integrate their file formats into common analysis and visualization tools in a meaningful and consistent way.

MLArray is designed to bridge that gap: a machine-learning-friendly array format that preserves metadata and enables a broader ecosystem of tooling around it.


How does MLArray address this gap?

  • A standardized, extensible metadata schema: MLArray defines a metadata schema that balances standardization and flexibility. Software that supports MLArray has a consistent way to access relevant metadata, while users can still attach arbitrary custom metadata when needed.

  • Preserve original metadata across conversions: Users can convert images from arbitrary formats to MLArray while preserving the original metadata in a structured and reproducible way. Tools that integrate MLArray can still access metadata according to the original format’s conventions, which makes MLArray a practical alternative for ML pipelines without breaking downstream analysis or visualization workflows.

  • Machine learning–specific metadata support: In addition to format-preserving metadata, MLArray includes a dedicated schema for machine-learning-relevant information, and it also supports storing dynamic metadata outside predefined schemas.
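
The three points above can be pictured as separate metadata layers stored next to the array. The following is only a minimal sketch of that idea using the Python standard library. It is not MLArray's actual schema or API: the class names (ImageMeta, MetaBundle), the field names (spacing, axes, original, ml, extra), and the example values are all assumptions chosen for illustration.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Any, Dict, Tuple

# NOTE: every name here is hypothetical; this is not MLArray's real schema.


@dataclass
class ImageMeta:
    """Standardized layer: fields a supporting tool could always rely on."""
    spacing: Tuple[float, ...]
    axes: Tuple[str, ...]


@dataclass
class MetaBundle:
    """All metadata layers that travel together with one array."""
    image: ImageMeta
    # Header of the source format, kept as-is so tools that understand
    # that format's conventions can still read it after conversion.
    original: Dict[str, Any] = field(default_factory=dict)
    # Machine-learning-relevant information.
    ml: Dict[str, Any] = field(default_factory=dict)
    # Free-form user metadata outside any predefined schema.
    extra: Dict[str, Any] = field(default_factory=dict)

    def to_json(self) -> str:
        """Serialize every layer into one JSON document."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "MetaBundle":
        """Rebuild the bundle; JSON lists become tuples in the fixed layer."""
        raw = json.loads(text)
        image = ImageMeta(
            spacing=tuple(raw["image"]["spacing"]),
            axes=tuple(raw["image"]["axes"]),
        )
        return cls(image=image, original=raw["original"],
                   ml=raw["ml"], extra=raw["extra"])


bundle = MetaBundle(
    image=ImageMeta(spacing=(1.0, 0.5, 0.5), axes=("z", "y", "x")),
    original={"format": "nifti", "header": {"pixdim": [1.0, 0.5, 0.5]}},
    ml={"task": "segmentation", "labels": {"0": "background", "1": "lesion"}},
    extra={"annotator": "reader-03"},
)

# Round trip: every layer survives serialization unchanged.
restored = MetaBundle.from_json(bundle.to_json())
```

Keeping the layers separate is the point of the sketch: a viewer only needs the standardized layer, a converter can write the original header back out, and a training pipeline reads the ML layer, without any of them having to parse the others.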


What type of images can I store as MLArray?

In short: any array data.

MLArray was designed with very large N-dimensional images in mind, including:

  • medical imaging (radiology, histopathology, etc.)
  • satellite and remote sensing data
  • general scientific imaging
  • segmentation masks and label maps

Natural image data can also be stored in MLArray, but this is often unnecessary, since formats like JPEG and PNG are already a strong default for many ML training pipelines.

MLArray can also store metadata-only or non-array data, such as:

  • bounding boxes
  • regression targets
  • classification results

This can be useful when you want a standardized interface for accessing these annotations and results, enabling simpler analysis and visualization in software that supports MLArray.


How is MLArray optimized for Machine Learning / Deep Learning?

MLArray uses Blosc2 as its storage backend, which provides several properties that are particularly well suited to machine learning and deep learning workloads.

For details, see: ML Optimization