
batch

Batch processing utilities for GPU memory management.

This module provides the BatchProcessor class for processing large datasets in batches on the GPU while managing memory. It supports batching across many voxels with configurable batch sizes and falls back gracefully to CPU execution if GPU memory is exhausted.

Example

>>> from osipy.common.backend import BatchProcessor
>>> import numpy as np

>>> def process_batch(data):
...     return data ** 2

>>> processor = BatchProcessor(batch_size=10000)
>>> large_data = np.random.randn(100000, 50)
>>> result = processor.map(large_data, process_batch)

References

.. [1] CuPy Memory Management: https://docs.cupy.dev/en/stable/user_guide/memory.html

BatchResult dataclass

BatchResult(
    data,
    used_gpu=False,
    fallback_occurred=False,
    batches_processed=0,
)

Result from batch processing.

ATTRIBUTES

data (NDArray)
    The processed data.

used_gpu (bool)
    Whether the GPU was used for processing.

fallback_occurred (bool)
    Whether CPU fallback occurred due to GPU memory issues.

batches_processed (int)
    Number of batches processed.
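The fields above can be illustrated with a minimal stand-in dataclass. The real BatchResult is provided by the module; the definition below is only a sketch mirroring the documented fields, so the inspection pattern runs without osipy installed:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class BatchResultSketch:
    """Illustrative stand-in mirroring the documented BatchResult fields."""

    data: np.ndarray
    used_gpu: bool = False
    fallback_occurred: bool = False
    batches_processed: int = 0


# Inspect the metadata alongside the processed data.
result = BatchResultSketch(data=np.zeros(4), batches_processed=2)
if result.fallback_occurred:
    print("GPU ran out of memory; result was computed on CPU")
print(result.batches_processed)
```

Checking `fallback_occurred` after a run is how calling code can detect that a silent CPU fallback happened.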

BatchProcessor dataclass

BatchProcessor(
    batch_size=None,
    use_gpu=True,
    auto_fallback=True,
    memory_safety_margin=0.1,
)

Process data in batches with automatic GPU memory management.

This class provides efficient batch processing for large datasets, automatically managing GPU memory and falling back to CPU when necessary.

PARAMETERS

batch_size (int, default: None)
    Number of elements per batch. Default uses the global configuration.

use_gpu (bool, default: True)
    Whether to attempt GPU acceleration.

auto_fallback (bool, default: True)
    Whether to automatically fall back to CPU on GPU memory errors.

memory_safety_margin (float, default: 0.1)
    Fraction of estimated memory to keep free (0.0 to 0.5). The default of 0.1 keeps a 10% safety margin.

Example

>>> processor = BatchProcessor(batch_size=5000)
>>> result = processor.map(data, lambda x: x ** 2)
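One way a safety margin like this can interact with batch sizing is to shrink the usable memory budget before deriving a batch size. The arithmetic below is illustrative only, not the library's internal formula; the free-memory figure and element size are made-up example values:

```python
# Illustrative only: how a 10% safety margin shrinks the memory budget.
free_bytes = 8 * 1024**3       # e.g. 8 GiB reported free on the GPU (assumed)
memory_safety_margin = 0.1     # keep 10% of the free memory untouched
bytes_per_element = 50 * 8     # e.g. one row of 50 float64 values

usable = free_bytes * (1 - memory_safety_margin)
max_rows_per_batch = int(usable // bytes_per_element)
print(max_rows_per_batch)
```

A larger margin trades throughput for headroom: it leaves room for temporaries allocated inside the user-supplied function, which the size estimate cannot see.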

__post_init__

__post_init__()

Initialize with defaults from global config if needed.

map

map(data, func, axis=0)

Apply a function to data in batches.

PARAMETERS

data (NDArray)
    Input data array.

func (Callable)
    Function to apply to each batch. Should accept and return arrays.

axis (int, default: 0)
    Axis along which to batch.

RETURNS

BatchResult
    Result containing processed data and metadata.

Notes

If GPU memory is exhausted, this will automatically fall back to CPU processing (if auto_fallback is True) with a warning.
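The fallback behavior described in the note follows a common try/except pattern, sketched below. CuPy's real exception is cupy.cuda.memory.OutOfMemoryError; a stand-in exception class and a fake GPU runner are used here so the sketch runs without a GPU, and map_with_fallback is a hypothetical helper, not the library's method:

```python
import warnings

import numpy as np


class FakeGpuOutOfMemory(RuntimeError):
    """Stand-in for cupy.cuda.memory.OutOfMemoryError."""


def run_on_gpu(batch):
    # Simulate GPU memory exhaustion for this sketch.
    raise FakeGpuOutOfMemory("out of memory")


def map_with_fallback(data, func, auto_fallback=True):
    """Try the GPU path; on memory exhaustion, warn and rerun on CPU."""
    try:
        return run_on_gpu(data), True       # (result, used_gpu)
    except FakeGpuOutOfMemory:
        if not auto_fallback:
            raise
        warnings.warn("GPU memory exhausted; falling back to CPU")
        return func(data), False


result, used_gpu = map_with_fallback(np.arange(4), lambda x: x ** 2)
```

With auto_fallback disabled, the memory error propagates to the caller instead of being converted into a warning.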

batch_apply

batch_apply(data, func, batch_size=None, axis=0)

Convenience function to apply a function in batches.

PARAMETERS

data (NDArray)
    Input data array.

func (Callable)
    Function to apply to each batch.

batch_size (int, default: None)
    Batch size. Default uses the global configuration.

axis (int, default: 0)
    Axis along which to batch.

RETURNS

NDArray
    Processed data.

Example

>>> result = batch_apply(large_array, lambda x: x ** 2, batch_size=10000)
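In pure NumPy terms, batching along an axis amounts to splitting, applying, and reassembling. The helper name batch_apply_sketch and the use of np.array_split are illustrative assumptions, not the library's internals:

```python
import numpy as np


def batch_apply_sketch(data, func, batch_size, axis=0):
    """Apply func to slices of data along axis, then reassemble."""
    n = data.shape[axis]
    n_batches = -(-n // batch_size)  # ceiling division
    # array_split yields n_batches nearly equal pieces along the axis.
    pieces = np.array_split(data, n_batches, axis=axis)
    return np.concatenate([func(p) for p in pieces], axis=axis)


large_array = np.random.randn(100_000, 50)
result = batch_apply_sketch(large_array, lambda x: x ** 2, batch_size=10_000)
```

The GPU version adds transfers and memory checks around each piece, but the split/apply/concatenate shape of the computation is the same.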