How to Configure GPU/CPU Backend¶
Set up GPU acceleration or force CPU execution.
Prerequisites¶
GPU acceleration requires:
- NVIDIA GPU with CUDA support
- CUDA toolkit installed
- CuPy library installed
Install CuPy¶
Install CuPy matching your CUDA version:
Install CuPy for GPU support
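As a rough guide, CuPy's prebuilt wheels are named after the CUDA major version (`cupy-cuda11x`, `cupy-cuda12x`). The sketch below maps a CUDA version string to the matching pip command. The helper `cupy_wheel_for` is hypothetical (it is not part of osipy or CuPy) and covers only these two wheel names.

```python
def cupy_wheel_for(cuda_version: str) -> str:
    """Return the CuPy wheel name for a CUDA version string such as '12.4'."""
    major = int(cuda_version.split(".")[0])
    # Only the wheel names for CUDA 11.x and 12.x are listed here.
    wheels = {11: "cupy-cuda11x", 12: "cupy-cuda12x"}
    if major not in wheels:
        raise ValueError(f"No wheel mapping known for CUDA {cuda_version}")
    return wheels[major]


# The installed CUDA version is reported by `nvcc --version` or `nvidia-smi`.
print(f"pip install {cupy_wheel_for('12.4')}")
```

Install exactly one CuPy package per environment; mixing several CuPy wheels is a common source of import errors.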
Verify GPU Availability¶
Check whether a GPU is available:
Check GPU availability and info
import osipy
# Check GPU status
print(f"GPU available: {osipy.is_gpu_available()}")
print(f"Current backend: {osipy.get_backend()}")
# Detailed GPU info
if osipy.is_gpu_available():
    import cupy as cp
    device = cp.cuda.Device()
    print(f"Compute capability: {device.compute_capability}")
    # mem_info returns (free_bytes, total_bytes)
    print(f"Total memory: {device.mem_info[1] / 1e9:.1f} GB")
Automatic GPU Usage¶
When a GPU is available, osipy uses it automatically:
Automatic GPU usage with numpy input
import numpy as np
import osipy
# Data on CPU (numpy array)
concentration = np.random.rand(64, 64, 32, 60)
time = np.linspace(0, 300, 60)
aif = osipy.ParkerAIF()(time)
# Fitting automatically uses GPU if available
result = osipy.fit_model("extended_tofts", concentration, aif, time)
# Result is returned as numpy (CPU) array
print(f"Result type: {type(result.parameter_maps['Ktrans'].values)}") # numpy.ndarray
Explicit GPU Arrays¶
For manual control, pass CuPy arrays; the results then stay on the GPU until you copy them back:
Explicit GPU arrays with CuPy
import cupy as cp
import osipy
# Move data to GPU
concentration_gpu = cp.asarray(concentration)
time_gpu = cp.asarray(time)
# Process on GPU
result = osipy.fit_model("extended_tofts", concentration_gpu, aif, time_gpu)
# Result stays on GPU (CuPy array)
print(f"Result type: {type(result.parameter_maps['Ktrans'].values)}") # cupy.ndarray
# Move back to CPU when needed
ktrans_cpu = osipy.to_numpy(result.parameter_maps['Ktrans'].values)
Memory Management¶
Monitor and manage GPU memory:
Monitor and free GPU memory
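One way to do this, sketched here with CuPy's default memory pools rather than any osipy-specific helper, is to read the pool counters and release cached blocks between runs. The function names `gpu_memory_report` and `free_gpu_memory` are hypothetical; both return quietly when CuPy or a CUDA device is missing.

```python
try:
    import cupy as cp
except ImportError:  # CuPy not installed: there is no GPU memory to manage
    cp = None


def gpu_memory_report():
    """Return memory counters in bytes, or None without a usable GPU."""
    if cp is None:
        return None
    try:
        pool = cp.get_default_memory_pool()
        free, total = cp.cuda.Device().mem_info  # (free_bytes, total_bytes)
        return {
            "pool_used": pool.used_bytes(),   # bytes held by live arrays
            "pool_held": pool.total_bytes(),  # bytes cached by the pool
            "free": free,
            "total": total,
        }
    except Exception:  # CuPy is installed but no CUDA device or driver is usable
        return None


def free_gpu_memory():
    """Release cached, unused blocks back to the driver."""
    if cp is None:
        return
    try:
        cp.get_default_memory_pool().free_all_blocks()
        cp.get_default_pinned_memory_pool().free_all_blocks()
    except Exception:  # no usable device: nothing to free
        pass


report = gpu_memory_report()
print(report if report is not None else "No usable GPU")
free_gpu_memory()
```

`free_all_blocks()` only returns memory that no live array references, so drop references to large intermediate arrays (for example with `del`) before calling it.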
Process Large Datasets¶
For datasets larger than GPU memory, process in chunks:
Process large datasets in chunks
import numpy as np
import osipy
def fit_chunked(concentration, aif, time, chunk_size=10000):
    """Fit model in chunks to manage GPU memory."""
    # Reshape to (n_voxels, n_timepoints)
    shape = concentration.shape
    data_2d = concentration.reshape(-1, shape[-1])
    n_voxels = data_2d.shape[0]
    # Initialize results
    results = {
        'Ktrans': np.zeros(n_voxels),
        've': np.zeros(n_voxels),
        'vp': np.zeros(n_voxels),
        'r_squared': np.zeros(n_voxels),
    }
    # Process in chunks
    for start in range(0, n_voxels, chunk_size):
        end = min(start + chunk_size, n_voxels)
        chunk = data_2d[start:end]
        # Fit chunk (will use GPU); pass it as a 4D volume of shape
        # (1, 1, n_chunk_voxels, n_timepoints)
        chunk_result = osipy.fit_model(
            "extended_tofts",
            chunk[np.newaxis, np.newaxis, :, :],
            aif, time
        )
        # Store results
        results['Ktrans'][start:end] = chunk_result.parameter_maps['Ktrans'].values.flatten()
        results['ve'][start:end] = chunk_result.parameter_maps['ve'].values.flatten()
        results['vp'][start:end] = chunk_result.parameter_maps['vp'].values.flatten()
        results['r_squared'][start:end] = chunk_result.r_squared_map.flatten()
    # Reshape back to 3D
    for key in results:
        results[key] = results[key].reshape(shape[:-1])
    return results
# Use for large datasets
result = fit_chunked(concentration, aif, time)
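The default `chunk_size=10000` is a fixed value; a back-of-the-envelope estimate can tie it to the memory actually available. The sketch below is only a heuristic: `estimate_chunk_size` is a hypothetical helper, and the `overhead` factor (how many working copies of each voxel's time series the fit keeps on the GPU) is an unmeasured assumption that should be tuned for your model.

```python
def estimate_chunk_size(free_bytes, n_timepoints, itemsize=8,
                        overhead=10, safety=0.5):
    """Estimate how many voxels fit in a fraction of the free GPU memory.

    itemsize: bytes per value (8 for float64, 4 for float32).
    overhead: assumed number of working copies per voxel time series.
    safety:   fraction of the free memory the fit is allowed to use.
    """
    bytes_per_voxel = n_timepoints * itemsize * overhead
    return max(1, int(free_bytes * safety) // bytes_per_voxel)


# Example: 8 GB free, 60 time points, float64 data
print(estimate_chunk_size(8e9, 60))
```

The result can be passed straight to `fit_chunked(..., chunk_size=...)`. With CuPy installed, the free byte count is the first element of `cp.cuda.Device().mem_info`.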
Configure GPU Settings¶
Configure GPU behavior:
Configure GPU settings
Multi-GPU Processing¶
Use specific GPU devices:
Multi-GPU device selection
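At the CuPy level, a device is selected with the `cp.cuda.Device` context manager; arrays created inside the block live on that device. The sketch below counts the visible devices and allocates on device 1 when a second GPU exists. Whether osipy's fitting routines follow the current CuPy device is an assumption to verify on your installation; `visible_gpu_count` is a hypothetical helper.

```python
try:
    import cupy as cp
except ImportError:  # CuPy not installed
    cp = None


def visible_gpu_count():
    """Return the number of CUDA devices CuPy can see (0 if none)."""
    if cp is None:
        return 0
    try:
        return cp.cuda.runtime.getDeviceCount()
    except Exception:  # driver missing or no device visible
        return 0


n_gpus = visible_gpu_count()
print(f"Visible GPUs: {n_gpus}")

if n_gpus > 1:
    # Everything allocated inside this block lives on device 1.
    with cp.cuda.Device(1):
        data_gpu1 = cp.zeros((64, 64), dtype=cp.float32)
        print(f"Array device: {data_gpu1.device}")
```

Device indices follow the order exposed by the driver, which the standard `CUDA_VISIBLE_DEVICES` environment variable can restrict or reorder.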
Performance Tips¶
1. Batch Your Data¶
Process entire volume at once
2. Use Appropriate Data Types¶
Use float32 for faster computation
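NumPy creates float64 arrays by default, so an explicit conversion is needed. The minimal sketch below uses synthetic data and shows that the conversion halves the memory footprint; whether osipy keeps float32 through the whole fit is an assumption to check against your results.

```python
import numpy as np

# np.random.rand returns float64 by default
concentration = np.random.rand(16, 16, 8, 60)

# Convert once, before handing the data to the fitting routine
concentration32 = concentration.astype(np.float32)

print(f"float64: {concentration.nbytes / 1e6:.2f} MB")
print(f"float32: {concentration32.nbytes / 1e6:.2f} MB")
```

float32 carries about seven significant decimal digits, so compare a float32 run against a float64 run on a small region before switching a whole pipeline.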
3. Pre-allocate Arrays¶
Pre-allocate GPU arrays
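The idea, sketched here under the assumption that results are produced chunk by chunk, is to allocate the output buffer once and write into slices of it, rather than growing or concatenating arrays inside the loop. The example falls back to NumPy when CuPy or a GPU is unavailable, and `xp.full` stands in for a real per-chunk computation.

```python
import numpy as np

try:
    import cupy as xp
    xp.zeros(1)  # fails early if no usable CUDA device is present
except Exception:
    xp = np  # CPU fallback so the sketch runs anywhere

n_chunks, chunk_len = 4, 1000

# Allocate the full output once, outside the loop
out = xp.empty(n_chunks * chunk_len, dtype=xp.float32)

for i in range(n_chunks):
    # Stand-in for a per-chunk result computed on the device
    chunk_result = xp.full(chunk_len, i, dtype=xp.float32)
    # Write into the pre-allocated buffer; no new output array is created
    out[i * chunk_len:(i + 1) * chunk_len] = chunk_result

print(out.shape, out.dtype)
```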
Troubleshooting¶
CUDA Out of Memory¶
Free GPU memory and retry
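A retry loop along these lines can automate the recovery. CuPy's `OutOfMemoryError` derives from Python's built-in `MemoryError`, so the sketch catches that. `fit_with_retry` is a hypothetical helper that is not part of osipy; it halves the chunk size after each failure and optionally calls a cleanup function first.

```python
def fit_with_retry(fit, chunk_size, min_chunk_size=1, free_memory=None):
    """Call fit(chunk_size), halving chunk_size after each MemoryError."""
    while True:
        try:
            return fit(chunk_size)
        except MemoryError:
            if chunk_size <= min_chunk_size:
                raise  # cannot shrink any further: give up
            if free_memory is not None:
                free_memory()  # e.g. release cached GPU blocks
            chunk_size = max(min_chunk_size, chunk_size // 2)


# Demonstration with a stand-in fit that only succeeds for small chunks
def fake_fit(n_voxels):
    if n_voxels > 2500:
        raise MemoryError("simulated out-of-memory")
    return n_voxels


print(fit_with_retry(fake_fit, 10000))
```

With the chunked fit defined earlier, the call would look like `fit_with_retry(lambda n: fit_chunked(concentration, aif, time, chunk_size=n), 10000, free_memory=cp.get_default_memory_pool().free_all_blocks)`.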
CuPy Import Error¶
Fix CuPy installation
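Before reinstalling, it helps to see what is actually installed. The diagnostic sketch below lists every installed distribution whose name starts with `cupy` (more than one usually indicates conflicting wheels) and then tries the import, printing the CUDA runtime version CuPy reports. `installed_cupy_packages` is a hypothetical helper.

```python
import importlib.metadata as metadata


def installed_cupy_packages():
    """List installed distributions whose name starts with 'cupy'."""
    found = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name and name.lower().startswith("cupy"):
            found.add(f"{name}=={dist.version}")
    return sorted(found)


packages = installed_cupy_packages()
print(packages or "CuPy is not installed")
if len(packages) > 1:
    print("Several CuPy packages found; uninstall all but one.")

try:
    import cupy as cp
    print("CUDA runtime version:", cp.cuda.runtime.runtimeGetVersion())
except Exception as exc:  # ImportError, or a CUDA error if no driver is usable
    print(f"CuPy could not be loaded: {exc}")
```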
Slow First Run¶
Warmup CuPy kernel compilation
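CuPy compiles kernels the first time they are used, so the first call is not representative of steady-state speed. A simple pattern, sketched here with a hypothetical helper `time_after_warmup`, is to run the function once untimed and then time the following calls.

```python
import time as timer  # aliased: examples on this page use `time` for the time axis


def time_after_warmup(fn, *args, repeats=3, **kwargs):
    """Run fn once untimed, then return the best time of `repeats` calls."""
    fn(*args, **kwargs)  # warm-up call absorbs one-time setup costs
    timings = []
    for _ in range(repeats):
        start = timer.perf_counter()
        fn(*args, **kwargs)
        timings.append(timer.perf_counter() - start)
    return min(timings)


# Demonstration with a cheap stand-in workload
best = time_after_warmup(sum, range(100_000))
print(f"Best of 3 after warm-up: {best * 1e3:.3f} ms")
```

For a fit, a small sub-volume is enough for the warm-up call. When timing raw CuPy code directly, synchronize the device before reading the clock, because kernel launches are asynchronous.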
Force CPU Execution¶
Force CPU execution even when a GPU is available. This is useful for debugging, reproducibility testing, or when GPU memory is insufficient.
Global CPU Mode¶
Force CPU execution globally
Environment Variable¶
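Independently of any osipy-specific setting, the standard CUDA variable `CUDA_VISIBLE_DEVICES` hides GPUs from every CUDA library in the process. The sketch below sets it to an empty string before CuPy or osipy is imported; that osipy then falls back to the CPU backend is an assumption based on the automatic detection described above.

```python
import os

# Must be set before the first import of cupy or osipy in this process.
# An empty value hides all CUDA devices from the CUDA runtime.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

print(f"CUDA_VISIBLE_DEVICES={os.environ['CUDA_VISIBLE_DEVICES']!r}")
```

The same effect is achieved from the shell by launching the script with `CUDA_VISIBLE_DEVICES=""` in its environment.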
Compare CPU vs GPU Results¶
Compare CPU and GPU results
import numpy as np
import osipy
osipy.set_backend(osipy.GPUConfig(force_cpu=True))
result_cpu = osipy.fit_model("extended_tofts", concentration, aif, time)
osipy.set_backend(osipy.GPUConfig(force_cpu=False))
result_gpu = osipy.fit_model("extended_tofts", concentration, aif, time)
ktrans_cpu = osipy.to_numpy(result_cpu.parameter_maps['Ktrans'].values)
ktrans_gpu = osipy.to_numpy(result_gpu.parameter_maps['Ktrans'].values)
print(f"Max difference: {np.abs(ktrans_cpu - ktrans_gpu).max()}")
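A single maximum difference is hard to judge without a tolerance, and NaN values from failed fits would make it NaN. The sketch below compares only voxels that are finite in both maps and applies `np.allclose`. `compare_maps` is a hypothetical helper, and the tolerances are illustrative rather than values recommended by osipy.

```python
import numpy as np


def compare_maps(a, b, rtol=1e-4, atol=1e-6):
    """Compare two parameter maps on voxels that are finite in both."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    finite = np.isfinite(a) & np.isfinite(b)
    abs_diff = np.abs(a[finite] - b[finite])
    return {
        "n_compared": int(finite.sum()),
        "max_abs_diff": float(abs_diff.max()) if abs_diff.size else 0.0,
        "all_close": bool(np.allclose(a[finite], b[finite], rtol=rtol, atol=atol)),
    }


# Synthetic stand-ins for the CPU and GPU Ktrans maps
cpu_map = np.array([0.10, 0.20, np.nan])
gpu_map = cpu_map + 1e-8
print(compare_maps(cpu_map, gpu_map))
```

Small differences are expected because floating-point operations are ordered differently on the two backends; large or structured differences point to a real problem.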