API Reference
MLArray Module
mlarray.mlarray.MLArray
MLArray(array: Optional[Union[ndarray, str, Path]] = None, spacing: Optional[Union[List, Tuple, ndarray]] = None, origin: Optional[Union[List, Tuple, ndarray]] = None, direction: Optional[Union[List, Tuple, ndarray]] = None, meta: Optional[Union[Dict, Meta]] = None, channel_axis: Optional[int] = None, num_threads: int = 1, copy: Optional[MLArray] = None)
Initializes an MLArray instance.
The MLArray file format (".mla") is a Blosc2-compressed container with standardized metadata support for N-dimensional medical images.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| array | Optional[Union[ndarray, str, Path]] | Input data or file path. Use a numpy ndarray for in-memory arrays, or a string/Path to load a ".b2nd" or ".mla" file. If None, an empty MLArray instance is created. | None |
| spacing | Optional[Union[List, Tuple, ndarray]] | Spacing per spatial axis. Provide a list/tuple/ndarray with length equal to the number of spatial dimensions (e.g., [sx, sy, sz]). | None |
| origin | Optional[Union[List, Tuple, ndarray]] | Origin per axis. Provide a list/tuple/ndarray with length equal to the number of spatial dimensions. | None |
| direction | Optional[Union[List, Tuple, ndarray]] | Direction cosine matrix. Provide a 2D list/tuple/ndarray with shape (ndims, ndims) for spatial dimensions. | None |
| meta | Optional[Union[Dict, Meta]] | Free-form metadata dictionary or Meta instance. Must be JSON-serializable when saving. If meta is passed as a Dict, it will internally be converted into a Meta object with the dict being interpreted as meta.image metadata. | None |
| channel_axis | Optional[int] | Axis index that represents channels in the array (e.g., 0 for CHW or -1 for HWC). If None, the array is treated as purely spatial. | None |
| num_threads | int | Number of threads for Blosc2 operations. | 1 |
| copy | Optional[MLArray] | Another MLArray instance to copy metadata fields from. If provided, its metadata overrides any metadata set via arguments. | None |
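The length requirements on spacing and origin depend on how many axes are spatial, which in turn depends on channel_axis. The following is a minimal, library-independent sketch of that relationship. The helpers `spatial_shape` and `check_per_axis` are hypothetical and not part of mlarray, and the exception type the library actually raises may differ.

```python
from typing import Optional, Sequence, Tuple


def spatial_shape(shape: Sequence[int], channel_axis: Optional[int] = None) -> Tuple[int, ...]:
    """Return the shape with the channel axis removed (hypothetical helper)."""
    if channel_axis is None:
        return tuple(shape)
    axis = channel_axis % len(shape)  # normalize negative indices such as -1
    return tuple(s for i, s in enumerate(shape) if i != axis)


def check_per_axis(values: Sequence[float], shape: Sequence[int],
                   channel_axis: Optional[int] = None) -> None:
    """Raise ValueError unless `values` has one entry per spatial axis."""
    n_spatial = len(spatial_shape(shape, channel_axis))
    if len(values) != n_spatial:
        raise ValueError(f"expected {n_spatial} values, got {len(values)}")


# CHW layout: channel axis 0 leaves two spatial axes.
print(spatial_shape((3, 256, 256), channel_axis=0))   # (256, 256)
# HWC layout: channel axis -1 leaves the same two spatial axes.
print(spatial_shape((256, 256, 3), channel_axis=-1))  # (256, 256)
# Two spacing values match two spatial axes, so this passes silently.
check_per_axis([0.5, 0.5], (3, 256, 256), channel_axis=0)
```

Under this reading, a 3-channel 2D image needs two spacing values and a 2 x 2 direction matrix, regardless of where the channel axis sits.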
affine
property
affine: ndarray
Computes the affine transformation matrix for the image.
Returns:
| Type | Description |
|---|---|
| ndarray | Affine matrix with shape (ndims + 1, ndims + 1), or None if no array is loaded. |
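As an illustration of how a matrix of shape (ndims + 1, ndims + 1) can be assembled from spacing, origin, and direction, here is a minimal pure-Python sketch. It assumes the common ITK-style convention in which the upper-left block is the direction matrix with each column scaled by the spacing of that axis, and the last column holds the origin. The function `build_affine` is hypothetical, and the composition used by the library itself may differ.

```python
from typing import List, Sequence


def build_affine(spacing: Sequence[float], origin: Sequence[float],
                 direction: Sequence[Sequence[float]]) -> List[List[float]]:
    """Compose a homogeneous affine matrix (assumed convention, not mlarray's code)."""
    n = len(spacing)
    affine = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            # Column j of the direction matrix is scaled by the spacing of axis j.
            affine[i][j] = direction[i][j] * spacing[j]
        affine[i][n] = origin[i]  # translation part
    affine[n][n] = 1.0  # homogeneous coordinate
    return affine


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
affine = build_affine([1.0, 1.0, 2.5], [10.0, -5.0, 0.0], identity)
for row in affine:
    print(row)
```

For a 3D image this yields a 4 x 4 matrix whose diagonal carries the spacing when the direction is the identity.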
direction
property
direction
Returns the image direction.
Returns:
| Type | Description |
|---|---|
| list | Direction cosine matrix with shape (ndims, ndims). |
dtype
property
dtype
Returns the dtype of the array.
Returns:
| Type | Description |
|---|---|
| np.dtype | Dtype of the underlying array, or None if no array is loaded. |
ndim
property
ndim: int
Returns the number of dimensions of the array.
Returns:
| Type | Description |
|---|---|
| int | Number of dimensions, or None if no array is loaded. |
origin
property
origin
Returns the image origin.
Returns:
| Type | Description |
|---|---|
| list | Origin per spatial axis with length equal to the number of spatial dimensions. |
rotation
property
rotation
Extracts the rotation matrix from the affine matrix.
Returns:
| Type | Description |
|---|---|
| list | Rotation matrix with shape (ndims, ndims), or None if no array is loaded. |
scale
property
scale
Extracts the scaling factors from the affine matrix.
Returns:
| Type | Description |
|---|---|
| list | Scaling factors per axis with length equal to the number of spatial dimensions, or None if no array is loaded. |
shape
property
shape
Returns the shape of the array.
Returns:
| Type | Description |
|---|---|
| tuple | Shape of the underlying array, or None if no array is loaded. |
shear
property
shear
Computes the shear matrix from the affine matrix.
Returns:
| Type | Description |
|---|---|
| list | Shear matrix with shape (ndims, ndims), or None if no array is loaded. |
spacing
property
spacing
Returns the image spacing.
Returns:
| Type | Description |
|---|---|
| list | Spacing per spatial axis with length equal to the number of spatial dimensions. |
translation
property
translation
Extracts the translation vector from the affine matrix.
Returns:
| Type | Description |
|---|---|
| list | Translation vector with length equal to the number of spatial dimensions, or None if no array is loaded. |
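The scale and translation properties are both described as being extracted from the affine matrix. A minimal sketch of one common way to do this follows: scale is taken as the Euclidean norm of each column of the upper-left block, and translation as the last column. The helper `split_affine` is hypothetical, the column-norm convention is an assumption that holds only when there is no shear, and the decomposition the library uses may differ.

```python
import math
from typing import List, Sequence, Tuple


def split_affine(affine: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Return (scale, translation) from a homogeneous affine (assumed convention)."""
    n = len(affine) - 1
    # Scale of axis j: length of column j in the upper-left n x n block.
    scale = [math.sqrt(sum(affine[i][j] ** 2 for i in range(n))) for j in range(n)]
    # Translation: the first n entries of the last column.
    translation = [affine[i][n] for i in range(n)]
    return scale, translation


# 2D example: a 90-degree rotation combined with spacings 3 and 2.
affine = [
    [0.0, -2.0, 10.0],
    [3.0, 0.0, -5.0],
    [0.0, 0.0, 1.0],
]
scale, translation = split_affine(affine)
print(scale)        # [3.0, 2.0]
print(translation)  # [10.0, -5.0]
```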
close
close()
Flush metadata and close the underlying store.
After closing, the MLArray instance no longer has an attached array.
comp_blosc2_params
comp_blosc2_params(image_size: Union[Tuple[int, int], Tuple[int, int, int], Tuple[int, int, int, int]], patch_size: Union[Tuple[int, int], Tuple[int, int, int]], channel_axis: Optional[int] = None, bytes_per_pixel: int = 4, l1_cache_size_per_core_in_bytes: int = 32768, l3_cache_size_per_core_in_bytes: int = 1441792, safety_factor: float = 0.8)
Computes a recommended block and chunk size for saving arrays with Blosc2.
Blosc2 NDIM documentation: "Having a second partition allows for greater flexibility in fitting different partitions to different CPU cache levels. Typically, the first partition (also known as chunks) should be sized to fit within the L3 cache, while the second partition (also known as blocks) should be sized to fit within the L2 or L1 caches, depending on whether the priority is compression ratio or speed." (Source: https://www.blosc.org/posts/blosc2-ndim-intro/)
Our approach does not fully implement this scheme yet. Currently, we size the uncompressed block to fit within the L1 cache and accept that it may occasionally spill over into L2.
Note: This configuration is specifically optimized for nnU-Net data loading, where each read operation is performed by a single core, so multi-threading is not an option.
The default cache values are based on an older Intel 4110 CPU with 32 KB L1, 128 KB L2, and 1408 KB L3 cache per core. We have not tuned them for modern CPUs with larger caches because our data must remain compatible with these older systems.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| image_size | Union[Tuple[int, int], Tuple[int, int, int], Tuple[int, int, int, int]] | Image shape. Use a 2D, 3D, or 4D size; 2D/3D inputs are internally expanded to 4D (with channels first). | required |
| patch_size | Union[Tuple[int, int], Tuple[int, int, int]] | Patch size for spatial dimensions. Use a 2-tuple (x, y) or 3-tuple (x, y, z). | required |
| channel_axis | Optional[int] | Axis index for channels in the original array. If set, the size is moved to channels-first for cache calculations. | None |
| bytes_per_pixel | int | Number of bytes per element. Defaults to 4 for float32. | 4 |
| l1_cache_size_per_core_in_bytes | int | L1 cache per core in bytes. | 32768 |
| l3_cache_size_per_core_in_bytes | int | L3 cache per core in bytes. | 1441792 |
| safety_factor | float | Safety factor to avoid filling caches. | 0.8 |
Returns:
| Type | Description |
|---|---|
| Tuple[List[int], List[int]] | Recommended chunk size and block size. |
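To make the cache arithmetic behind the defaults concrete, the sketch below computes how many elements fit in the usable share of each cache and checks a few candidate partition shapes against that budget. This is only an illustration of the sizing idea described above, not the algorithm comp_blosc2_params implements. The helpers `cache_budget_elements` and `fits` are hypothetical.

```python
import math
from typing import Sequence


def cache_budget_elements(cache_bytes: int, bytes_per_pixel: int = 4,
                          safety_factor: float = 0.8) -> int:
    """Number of array elements that fit in the usable share of a cache."""
    usable_bytes = int(cache_bytes * safety_factor)
    return usable_bytes // bytes_per_pixel


def fits(shape: Sequence[int], budget_elements: int) -> bool:
    """True if an uncompressed partition of this shape stays within the budget."""
    return math.prod(shape) <= budget_elements


l1_budget = cache_budget_elements(32768)    # default L1 size from the signature
l3_budget = cache_budget_elements(1441792)  # default L3 size from the signature
print(l1_budget, l3_budget)                 # 6553 288358

print(fits((16, 16, 16), l1_budget))  # True: 4096 elements
print(fits((32, 32, 32), l1_budget))  # False: 32768 elements
print(fits((64, 64, 64), l3_budget))  # True: 262144 elements
```

With float32 data and the default safety factor, a block of roughly 6,500 elements and a chunk of roughly 288,000 elements are the upper bounds implied by the default cache sizes.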
load
classmethod
load(filepath: Union[str, Path], num_threads: int = 1)
Loads a Blosc2-compressed file. Both MLArray ('.mla') and Blosc2 ('.b2nd') files are supported.
WARNING
MLArray supports both ".b2nd" and ".mla" files. The MLArray format standard and standardized metadata are honored only for ".mla". For ".b2nd", metadata is ignored when loading.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | Union[str, Path] | Path to the Blosc2 file to be loaded. The filepath needs to have the extension ".b2nd" or ".mla". | required |
| num_threads | int | Number of threads to use for loading the file. | 1 |
Raises:
| Type | Description |
|---|---|
| RuntimeError | If the file extension is not ".b2nd" or ".mla". |
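The extension rule and the metadata warning above can be summarized in a small sketch. The helper `check_extension` and the constant `SUPPORTED_SUFFIXES` are hypothetical and not part of mlarray. The comparison here is case-sensitive, which is an assumption, since the reference does not say how the library treats upper-case extensions.

```python
from pathlib import Path
from typing import Tuple, Union

SUPPORTED_SUFFIXES = (".mla", ".b2nd")  # the two extensions named in this reference


def check_extension(filepath: Union[str, Path]) -> Tuple[str, bool]:
    """Return (suffix, metadata_honored), or raise RuntimeError for other suffixes."""
    suffix = Path(filepath).suffix
    if suffix not in SUPPORTED_SUFFIXES:
        raise RuntimeError(
            f"Unsupported extension {suffix!r}; expected one of {SUPPORTED_SUFFIXES}"
        )
    # Standardized metadata is honored only for ".mla" files.
    return suffix, suffix == ".mla"


print(check_extension("scan.mla"))         # ('.mla', True)
print(check_extension(Path("scan.b2nd")))  # ('.b2nd', False)
```

Checking the suffix before calling load avoids a RuntimeError when file paths come from user input.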
open
classmethod
open(filepath: Union[str, Path], shape: Optional[Union[List, Tuple, ndarray]] = None, dtype: Optional[dtype] = None, channel_axis: Optional[int] = None, mmap: str = 'r', patch_size: Optional[Union[int, List, Tuple]] = 'default', chunk_size: Optional[Union[int, List, Tuple]] = None, block_size: Optional[Union[int, List, Tuple]] = None, num_threads: int = 1, cparams: Optional[Dict] = None, dparams: Optional[Dict] = None)
Open an existing Blosc2 file or create a new one with memory mapping.
This method supports both MLArray (".mla") and plain Blosc2 (".b2nd") files. When creating a new file, both shape and dtype must be provided.
WARNING
MLArray supports both ".b2nd" and ".mla" files. The MLArray format standard and standardized metadata are honored only for ".mla". For ".b2nd", metadata is ignored when loading.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | Union[str, Path] | Target file path. Must end with ".b2nd" or ".mla". | required |
| shape | Optional[Union[List, Tuple, ndarray]] | Shape of the array to create. If provided, a new file is created. Length must match the full array dimensionality (including channels if present). | None |
| dtype | Optional[dtype] | Numpy dtype for a newly created array. | None |
| channel_axis | Optional[int] | Axis index for channels in the array. Used for patch/chunk/block calculations. | None |
| mmap | str | Blosc2 mmap mode. One of "r", "r+", "w+", "c". | 'r' |
| patch_size | Optional[Union[int, List, Tuple]] | Patch size hint for chunk/block optimization. Provide an int for isotropic sizes or a list/tuple with length equal to the number of spatial dimensions. Use "default" to use the default patch size of 192. | 'default' |
| chunk_size | Optional[Union[int, List, Tuple]] | Explicit chunk size. Provide an int or tuple/list with length equal to the array dimensions. Ignored when | None |
| block_size | Optional[Union[int, List, Tuple]] | Explicit block size. Provide an int or tuple/list with length equal to the array dimensions. Ignored when | None |
| num_threads | int | Number of threads for Blosc2 operations. | 1 |
| cparams | Optional[Dict] | Blosc2 compression parameters. | None |
| dparams | Optional[Dict] | Blosc2 decompression parameters. | None |
Returns:
| Type | Description |
|---|---|
| MLArray | The current instance (for chaining). |
Raises:
| Type | Description |
|---|---|
| RuntimeError | If the file extension is invalid, if shape/dtype are inconsistent, or if mmap mode is invalid for creation. |
save
save(filepath: Union[str, Path], patch_size: Optional[Union[int, List, Tuple]] = 'default', chunk_size: Optional[Union[int, List, Tuple]] = None, block_size: Optional[Union[int, List, Tuple]] = None, num_threads: int = 1, cparams: Optional[Dict] = None, dparams: Optional[Dict] = None)
Saves the array to a Blosc2-compressed file. Both MLArray ('.mla') and Blosc2 ('.b2nd') files are supported.
WARNING
MLArray supports both ".b2nd" and ".mla" files. The MLArray format standard and standardized metadata are honored only for ".mla". For ".b2nd", metadata is ignored when saving.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | Union[str, Path] | Path to save the file. Must end with ".b2nd" or ".mla". | required |
| patch_size | Optional[Union[int, List, Tuple]] | Patch size hint for chunk/block optimization. Provide an int for isotropic sizes or a list/tuple with length equal to the number of dimensions. Use "default" to use the default patch size of 192. | 'default' |
| chunk_size | Optional[Union[int, List, Tuple]] | Explicit chunk size. Provide an int or a tuple/list with length equal to the number of dimensions, or None to let Blosc2 decide. Ignored when patch_size is not None. | None |
| block_size | Optional[Union[int, List, Tuple]] | Explicit block size. Provide an int or a tuple/list with length equal to the number of dimensions, or None to let Blosc2 decide. Ignored when patch_size is not None. | None |
| num_threads | int | Number of threads to use for saving the file. | 1 |
Raises:
| Type | Description |
|---|---|
| RuntimeError | If the file extension is not ".b2nd" or ".mla". |
to_numpy
to_numpy()
Return the underlying data as a NumPy array.
Returns:
| Type | Description |
|---|---|
| np.ndarray | A NumPy view or copy of the stored array data. |
Raises:
| Type | Description |
|---|---|
| TypeError | If no array data is loaded. |
Metadata Module
mlarray.meta.BaseMeta
dataclass
BaseMeta()
Base class for metadata containers.
Subclasses should implement _validate_and_cast to coerce and validate fields after initialization or mutation.
copy_from
copy_from(other: T, *, overwrite: bool = False) -> None
Copy fields from another instance of the same class.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| other | T | Source instance. | required |
| overwrite | bool | When True, overwrite all fields. When False, only fill destination fields that are "unset" (None or empty containers). Nested BaseMeta fields are merged recursively unless the entire destination sub-meta is default, in which case it is replaced. | False |
Raises:
| Type | Description |
|---|---|
| TypeError | If other is not the same class as self. |
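The difference between overwrite=True and overwrite=False can be sketched with a plain dataclass. This is a flat approximation under stated assumptions: `DemoMeta`, `is_unset`, and `copy_fields` are hypothetical names, and the recursive merging of nested BaseMeta fields described above is deliberately left out.

```python
import copy
from dataclasses import dataclass, field, fields
from typing import Any, Dict, List, Optional


@dataclass
class DemoMeta:
    """Hypothetical stand-in for a BaseMeta subclass."""
    spacing: Optional[List[float]] = None
    origin: Optional[List[float]] = None
    tags: Dict[str, str] = field(default_factory=dict)


def is_unset(value: Any) -> bool:
    """Treat None and empty containers as unset."""
    return value is None or (isinstance(value, (list, tuple, dict, set)) and len(value) == 0)


def copy_fields(dst: DemoMeta, src: DemoMeta, *, overwrite: bool = False) -> None:
    """Flat approximation of the documented copy_from behaviour."""
    if type(src) is not type(dst):
        raise TypeError("source must be the same class as the destination")
    for f in fields(dst):
        if overwrite or is_unset(getattr(dst, f.name)):
            # Deep-copy so the two instances do not share mutable values.
            setattr(dst, f.name, copy.deepcopy(getattr(src, f.name)))


dst = DemoMeta(spacing=[1.0, 1.0])
src = DemoMeta(spacing=[2.0, 2.0], origin=[0.0, 0.0], tags={"site": "A"})

copy_fields(dst, src)                      # fill only the unset fields
print(dst.spacing, dst.origin, dst.tags)   # [1.0, 1.0] [0.0, 0.0] {'site': 'A'}

copy_fields(dst, src, overwrite=True)      # replace every field
print(dst.spacing)                         # [2.0, 2.0]
```

With overwrite=False the existing spacing survives while the empty origin and tags are filled. With overwrite=True the source wins for every field.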
ensure
classmethod
ensure(x: Any) -> T
Coerce x into an instance of cls.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | Any | None, an instance of cls, or a mapping of fields. | required |
Returns:
| Type | Description |
|---|---|
| T | An instance of cls. |
Raises:
| Type | Description |
|---|---|
| TypeError | If x is not None, cls, or a mapping. |
from_mapping
classmethod
from_mapping(d: Mapping[str, Any]) -> T
Construct an instance from a mapping.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| d | Mapping[str, Any] | Input mapping matching dataclass field names. | required |
Returns:
| Type | Description |
|---|---|
| T | A new instance of cls. |
Raises:
| Type | Description |
|---|---|
| TypeError | If d is not a Mapping. |
| KeyError | If unknown keys are present. |
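The documented error behaviour of from_mapping can be mirrored with a small dataclass. The class `DemoStats` is a hypothetical stand-in, not the library's implementation. It only shows the two checks listed above: the input must be a Mapping, and every key must match a dataclass field name.

```python
from collections.abc import Mapping
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class DemoStats:
    """Hypothetical stand-in with two optional fields."""
    min: Optional[float] = None
    max: Optional[float] = None

    @classmethod
    def from_mapping(cls, d: Any) -> "DemoStats":
        if not isinstance(d, Mapping):
            raise TypeError(f"expected a Mapping, got {type(d).__name__}")
        known = {f.name for f in fields(cls)}
        unknown = sorted(set(d) - known)
        if unknown:
            raise KeyError(f"unknown keys: {unknown}")
        return cls(**d)


print(DemoStats.from_mapping({"min": 0.0, "max": 1.0}))

try:
    DemoStats.from_mapping({"mean": 0.5})  # "mean" is not a field of DemoStats
except KeyError as exc:
    print("rejected:", exc)
```

Rejecting unknown keys early catches misspelled field names instead of silently dropping them.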
to_mapping
to_mapping(*, include_none: bool = True) -> Dict[str, Any]
Serialize to a mapping, recursively expanding nested BaseMeta.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| include_none | bool | Include fields with None values when True. | True |
Returns:
| Type | Description |
|---|---|
| Dict[str, Any] | A dict of field names to serialized values. |
to_plain
to_plain(*, include_none: bool = False) -> Any
Convert to plain Python objects recursively.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| include_none | bool | Include fields with None values when True. | False |
Returns:
| Type | Description |
|---|---|
| Any | A dict of field values, with nested BaseMeta expanded. SingleKeyBaseMeta overrides this to return its wrapped value. |
mlarray.meta.SingleKeyBaseMeta
dataclass
SingleKeyBaseMeta()
Bases: BaseMeta
BaseMeta subclass that wraps a single field as a raw value.
ensure
classmethod
ensure(x: Any) -> SK
Coerce input into an instance of cls.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | Any | None, instance of cls, mapping, or raw value. | required |
Returns:
| Type | Description |
|---|---|
| SK | An instance of cls. |
from_mapping
classmethod
from_mapping(d: Any) -> SK
Construct from either schema-shaped mapping or raw value.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| d | Any | None, mapping, or raw value. | required |
Returns:
| Type | Description |
|---|---|
| SK | A new instance of cls. |
to_mapping
to_mapping(*, include_none: bool = True) -> Dict[str, Any]
Serialize to a mapping with the single key.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| include_none | bool | Include the key when the value is None. | True |
Returns:
| Type | Description |
|---|---|
| Dict[str, Any] | A dict with the single field name as the key, or an empty dict. |
to_plain
to_plain(*, include_none: bool = False) -> Any
Return the wrapped value for plain output.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| include_none | bool | Return None when the value is None. | False |
Returns:
| Type | Description |
|---|---|
| Any | The wrapped value or None. |
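The contrast between to_mapping (a one-key dict) and to_plain (the raw wrapped value) can be sketched as follows. `DemoIsSeg` is a hypothetical stand-in modelled on the descriptions above and is not the library's code. In this sketch the include_none argument of to_plain is kept only for signature parity and does not change the result.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class DemoIsSeg:
    """Hypothetical single-field wrapper."""
    is_seg: Optional[bool] = None

    def _key(self) -> str:
        # The single dataclass field is the wrapped key.
        return fields(self)[0].name

    def to_mapping(self, *, include_none: bool = True) -> Dict[str, Any]:
        value = getattr(self, self._key())
        if value is None and not include_none:
            return {}
        return {self._key(): value}

    def to_plain(self, *, include_none: bool = False) -> Any:
        # Returns the wrapped value itself, which is None when unset.
        return getattr(self, self._key())


flag = DemoIsSeg(is_seg=True)
print(flag.to_mapping())                         # {'is_seg': True}
print(flag.to_plain())                           # True
print(DemoIsSeg().to_mapping(include_none=False))  # {}
```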
mlarray.meta.Meta
dataclass
Meta(original: 'MetaOriginal' = (lambda: MetaOriginal())(), extra: 'MetaExtra' = (lambda: MetaExtra())(), spatial: 'MetaSpatial' = (lambda: MetaSpatial())(), stats: 'MetaStatistics' = (lambda: MetaStatistics())(), bbox: 'MetaBbox' = (lambda: MetaBbox())(), is_seg: 'MetaIsSeg' = (lambda: MetaIsSeg())(), _blosc2: 'MetaBlosc2' = (lambda: MetaBlosc2())(), _has_array: 'MetaHasArray' = (lambda: MetaHasArray())(), _image_meta_format: 'MetaImageFormat' = (lambda: MetaImageFormat())(), _mlarray_version: 'MetaVersion' = (lambda: MetaVersion())())
Bases: BaseMeta
Top-level metadata container for mlarray.
Attributes:
| Name | Type | Description |
|---|---|---|
| original | 'MetaOriginal' | Image metadata from the original source (JSON-serializable dict). |
| extra | 'MetaExtra' | Additional metadata (JSON-serializable dict). |
| spatial | 'MetaSpatial' | Spatial metadata (spacing, origin, direction, shape). |
| stats | 'MetaStatistics' | Summary statistics. |
| bbox | 'MetaBbox' | Bounding boxes. |
| is_seg | 'MetaIsSeg' | Segmentation flag. |
| _blosc2 | 'MetaBlosc2' | Blosc2 chunking/tiling metadata. |
| _has_array | 'MetaHasArray' | Payload presence flag. |
| _image_meta_format | 'MetaImageFormat' | Image metadata format identifier. |
| _mlarray_version | 'MetaVersion' | Version string for mlarray. |
to_plain
to_plain(*, include_none: bool = False) -> Any
Convert to plain values, suppressing default sub-metas.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| include_none | bool | Include None values when True. | False |
Returns:
| Type | Description |
|---|---|
| Any | A dict of field values where default child metas are represented as None and optionally filtered out. |
mlarray.meta.MetaOriginal
dataclass
MetaOriginal(data: Dict[str, Any] = dict())
Bases: SingleKeyBaseMeta
Image metadata from the original source, stored as a JSON-serializable dict.
Attributes:
| Name | Type | Description |
|---|---|---|
| data | Dict[str, Any] | Arbitrary JSON-serializable metadata. |
mlarray.meta.MetaExtra
dataclass
MetaExtra(data: Dict[str, Any] = dict())
Bases: SingleKeyBaseMeta
Generic extra metadata stored as JSON-serializable dict.
Attributes:
| Name | Type | Description |
|---|---|---|
| data | Dict[str, Any] | Arbitrary JSON-serializable metadata. |
mlarray.meta.MetaSpatial
dataclass
MetaSpatial(spacing: Optional[List] = None, origin: Optional[List] = None, direction: Optional[List[List]] = None, shape: Optional[List] = None, channel_axis: Optional[int] = None)
Bases: BaseMeta
Spatial metadata describing geometry and layout.
Attributes:
| Name | Type | Description |
|---|---|---|
| spacing | Optional[List] | Per-dimension spacing values. Length must match ndims. |
| origin | Optional[List] | Per-dimension origin values. Length must match ndims. |
| direction | Optional[List[List]] | Direction cosine matrix of shape [ndims, ndims]. |
| shape | Optional[List] | Array shape. Length must match ndims, or (ndims + 1) when channel_axis is set. |
| channel_axis | Optional[int] | Index of the channel dimension, if any. |
mlarray.meta.MetaStatistics
dataclass
MetaStatistics(min: Optional[float] = None, max: Optional[float] = None, mean: Optional[float] = None, median: Optional[float] = None, std: Optional[float] = None, percentile_min: Optional[float] = None, percentile_max: Optional[float] = None, percentile_mean: Optional[float] = None, percentile_median: Optional[float] = None, percentile_std: Optional[float] = None, percentile_min_key: Optional[float] = None, percentile_max_key: Optional[float] = None)
Bases: BaseMeta
Numeric summary statistics for an array.
Attributes:
| Name | Type | Description |
|---|---|---|
| min | Optional[float] | Minimum value. |
| max | Optional[float] | Maximum value. |
| mean | Optional[float] | Mean value. |
| median | Optional[float] | Median value. |
| std | Optional[float] | Standard deviation. |
| percentile_min | Optional[float] | Minimum percentile value. |
| percentile_max | Optional[float] | Maximum percentile value. |
| percentile_mean | Optional[float] | Mean percentile value. |
| percentile_median | Optional[float] | Median percentile value. |
| percentile_std | Optional[float] | Standard deviation of percentile values. |
| percentile_min_key | Optional[float] | Minimum percentile key used to determine percentile_min (for example 0.05). |
| percentile_max_key | Optional[float] | Maximum percentile key used to determine percentile_max (for example 0.95). |
mlarray.meta.MetaBbox
dataclass
MetaBbox(bboxes: Optional[List[List[List[int]]]] = None)
Bases: SingleKeyBaseMeta
Bounding boxes represented as per-dimension min/max pairs.
Attributes:
| Name | Type | Description |
|---|---|---|
| bboxes | Optional[List[List[List[int]]]] | List of bounding boxes with shape [n_boxes, ndims, 2], where each inner pair is [min, max] for a dimension. Values must be ints. |
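The [n_boxes, ndims, 2] layout can be checked with a few lines of plain Python. The function `validate_bboxes` is hypothetical and not part of mlarray. The check that min does not exceed max is an added assumption, since the reference does not state it, nor whether max is inclusive.

```python
from typing import Sequence


def validate_bboxes(bboxes: Sequence[Sequence[Sequence[int]]], ndims: int) -> None:
    """Check the [n_boxes, ndims, 2] layout with integer [min, max] pairs."""
    for b, box in enumerate(bboxes):
        if len(box) != ndims:
            raise ValueError(f"box {b}: expected {ndims} dimensions, got {len(box)}")
        for d, pair in enumerate(box):
            if len(pair) != 2:
                raise ValueError(f"box {b}, dim {d}: expected a [min, max] pair")
            # bool is a subclass of int in Python, so exclude it explicitly.
            if not all(isinstance(v, int) and not isinstance(v, bool) for v in pair):
                raise TypeError(f"box {b}, dim {d}: values must be ints")
            lo, hi = pair
            if lo > hi:  # assumption: min must not exceed max
                raise ValueError(f"box {b}, dim {d}: min {lo} exceeds max {hi}")


# Two boxes in a 3D volume, each with one [min, max] pair per dimension.
bboxes = [
    [[10, 20], [30, 45], [0, 8]],
    [[50, 60], [5, 15], [2, 4]],
]
validate_bboxes(bboxes, ndims=3)
print("valid:", len(bboxes), "boxes")  # valid: 2 boxes
```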
mlarray.meta.MetaIsSeg
dataclass
MetaIsSeg(is_seg: Optional[bool] = None)
Bases: SingleKeyBaseMeta
Flag indicating whether the array is a segmentation mask.
Attributes:
| Name | Type | Description |
|---|---|---|
| is_seg | Optional[bool] | True/False when known, None when unknown. |
mlarray.meta.MetaBlosc2
dataclass
MetaBlosc2(chunk_size: Optional[list] = None, block_size: Optional[list] = None, patch_size: Optional[list] = None)
Bases: BaseMeta
Metadata for Blosc2 tiling and chunking.
Attributes:
| Name | Type | Description |
|---|---|---|
| chunk_size | Optional[list] | List of per-dimension chunk sizes. Length must match ndims. |
| block_size | Optional[list] | List of per-dimension block sizes. Length must match ndims. |
| patch_size | Optional[list] | List of per-dimension patch sizes. Length must match ndims, or (ndims - 1) when a channel axis is present. |
mlarray.meta.MetaHasArray
dataclass
MetaHasArray(has_array: bool = False)
Bases: SingleKeyBaseMeta
Flag indicating whether an array is present.
Attributes:
| Name | Type | Description |
|---|---|---|
| has_array | bool | True when array data is present. |
mlarray.meta.MetaImageFormat
dataclass
MetaImageFormat(image_meta_format: Optional[str] = None)
Bases: SingleKeyBaseMeta
String describing the image metadata format.
Attributes:
| Name | Type | Description |
|---|---|---|
| image_meta_format | Optional[str] | Format identifier, or None. |
mlarray.meta.MetaVersion
dataclass
MetaVersion(mlarray_version: Optional[str] = None)
Bases: SingleKeyBaseMeta
Version metadata for mlarray.
Attributes:
| Name | Type | Description |
|---|---|---|
| mlarray_version | Optional[str] | Version string, or None. |