Optimization of Dimensions
This section covers the optimization of component numbers for multi-view ICA, including both core (shared) and unique (species-specific) components.
Overview
Choosing the right number of components is crucial for meaningful results. MultiModulon provides automated optimization methods to determine:
Optimal number of core components - Shared across all species
Optimal number of unique components - Specific to each species
All optimization now relies on the single-gene filter metric, which removes components dominated by a single gene (largest absolute weight > 3 × second-largest).
Optimizing Core Components
- MultiModulon.optimize_number_of_core_components(**kwargs)
Optimize the number of core (shared) components across species.
- Parameters:
max_k (int) – Maximum number of core components to test (default: auto-determined)
step (int) – Step size for k candidates (default: 5)
num_runs (int) – Number of runs per k value (default: 1)
mode (str) – Computation mode ‘gpu’ or ‘cpu’ (default: ‘gpu’)
seed (int) – Random seed for reproducibility (default: 42)
save_path (str) – Directory to save optimization plot
fig_size (tuple) – Figure size as (width, height) (default: (5, 3))
font_path (str) – Path to font file for plots
- Returns:
Optimal number of core components
- Return type:
int
Basic Usage
# Optimize using the single-gene filter
optimal_core = multiModulon.optimize_number_of_core_components(
    max_k=30,
    step=5,
    save_path="optimization_results/"  # directory where the optimization plot is saved
)
print(f"Optimal number of core components: {optimal_core}")
Understanding the Metric
Single-gene Filter:
Removes components dominated by a single gene (largest absolute weight > 3 × second-largest)
Highlights components with distributed gene contributions
Improves biological interpretability by eliminating single-gene artifacts
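The dominance rule above can be sketched in a few lines of plain Python. This is only an illustration of the stated threshold (largest absolute weight > 3 × second-largest); the function names `is_single_gene_component` and `filter_single_gene_components` and the toy weight vectors are hypothetical and are not part of the MultiModulon API.

```python
def is_single_gene_component(weights, ratio=3.0):
    """Return True if the largest |weight| exceeds `ratio` times the second-largest."""
    magnitudes = sorted((abs(w) for w in weights), reverse=True)
    if len(magnitudes) < 2:
        # Assumption: a component with fewer than two genes counts as single-gene.
        return True
    return magnitudes[0] > ratio * magnitudes[1]


def filter_single_gene_components(components, ratio=3.0):
    """Keep only components whose weight is not dominated by one gene."""
    return [c for c in components if not is_single_gene_component(c, ratio)]


# Toy gene-weight vectors, one list per component (illustrative values only).
components = [
    [0.9, 0.1, -0.05, 0.02],  # 0.9 > 3 * 0.1, so this one is removed
    [0.5, -0.4, 0.3, 0.1],    # 0.5 <= 3 * 0.4, so this one is kept
    [-0.6, 0.25, 0.1, 0.0],   # 0.6 <= 3 * 0.25, so this one is kept
]
kept = filter_single_gene_components(components)
print(f"{len(kept)} of {len(components)} components kept")
```

The comparison uses absolute values, so a strongly negative weight dominates in the same way as a strongly positive one.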
Optimizing Unique Components
After determining core components, optimize unique components:
- MultiModulon.optimize_number_of_unique_components(**kwargs)
Optimize the number of unique components for each species.
- Parameters:
optimal_num_core_components (int) – Number of core components (from previous step)
step (int) – Step size for testing unique components (default: 5)
mode (str) – Computation mode ‘gpu’ or ‘cpu’ (default: ‘gpu’)
seed (int) – Random seed (default: 42)
save_path (str) – Directory to save plots for each species
fig_size (tuple) – Figure size (default: (5, 3))
font_path (str) – Path to font file
- Returns:
Tuple of (optimal_unique_components, optimal_total_components)
- Return type:
tuple
Basic Usage
# Optimize unique components
optimal_unique, optimal_total = multiModulon.optimize_number_of_unique_components(
    optimal_num_core_components=20,  # From previous step
    step=5,
    save_path="unique_optimization/"  # directory where per-species plots are saved
)
# Results
print("Optimal unique components per species:")
for species, n_unique in optimal_unique.items():
    n_total = optimal_total[species]
    print(f"{species}: {n_unique} unique, {n_total} total")
How It Works
For each species:
Tests different numbers of unique components
Runs ICA with fixed core + varying unique
Removes single-gene components before clustering/counting
Selects number that maximizes interpretable components
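The selection loop described above might look roughly like the sketch below for a single species. Everything here is an assumption made for illustration: `toy_ica` is a hypothetical stand-in that fabricates weight vectors instead of running ICA, the clustering step is omitted, the total is taken to be core + unique, and ties are broken in favour of the smallest candidate. None of these names come from MultiModulon.

```python
def is_single_gene(weights, ratio=3.0):
    """Largest |weight| exceeds `ratio` times the second-largest."""
    top = sorted((abs(w) for w in weights), reverse=True)
    return len(top) < 2 or top[0] > ratio * top[1]


def toy_ica(n_components, n_informative, n_genes=60):
    """Hypothetical stand-in for an ICA run (not a real decomposition).

    The first `n_informative` components spread weight over neighbouring
    genes; any further components are spikes on a single gene.
    """
    components = []
    for i in range(n_components):
        if i < n_informative:
            comp = [1.0 / (1 + abs(g - i)) for g in range(n_genes)]
        else:
            comp = [1.0 if g == i % n_genes else 0.01 for g in range(n_genes)]
        components.append(comp)
    return components


def pick_unique(n_core, candidates, n_informative):
    """Score each candidate by the number of components that survive the filter."""
    scores = {}
    for n_unique in candidates:
        comps = toy_ica(n_core + n_unique, n_informative)  # fixed core + varying unique
        scores[n_unique] = sum(not is_single_gene(c) for c in comps)
    # max() returns the first maximum, i.e. the smallest candidate on ties.
    best = max(scores, key=scores.get)
    return best, scores


n_core = 20
best_unique, scores = pick_unique(n_core, range(5, 21, 5), n_informative=30)
print(scores)
print(f"{best_unique} unique, {n_core + best_unique} total")
```

In this toy setup the count of surviving components stops growing once the extra components are all single-gene spikes, which is the plateau the optimization plots are meant to reveal.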
Complete Optimization Workflow
The following workflow optimizes the core components, then the unique components, and finally runs ICA with the selected values:
# Step 1: Optimize core components
print("Optimizing core components...")
optimal_core = multiModulon.optimize_number_of_core_components(
    max_k=40,
    step=5,
    num_runs=3,
    save_path="optimization_results/",
    fig_size=(6, 4)
)
print(f"Optimal core components: {optimal_core}")
# Step 2: Optimize unique components
print("\nOptimizing unique components...")
optimal_unique, optimal_total = multiModulon.optimize_number_of_unique_components(
    optimal_num_core_components=optimal_core,
    step=5,
    save_path="optimization_results/",
    fig_size=(6, 4)
)
print("\nOptimization complete!")
print(f"Core components: {optimal_core}")
for species in multiModulon.species:
    print(f"{species}: {optimal_unique[species]} unique, "
          f"{optimal_total[species]} total")
# Step 3: Run ICA with optimal parameters
print("\nRunning multi-view ICA with optimal parameters...")
M_matrices, A_matrices = multiModulon.run_robust_multiview_ica(
    a=optimal_total,
    c=optimal_core,
    num_runs=100,
    mode='gpu'
)
Best Practices
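Rely on the single-gene filter - It removes single-gene artifacts, so the remaining components are easier to interpret biologically
Use multiple runs - Set num_runs to at least 3-5 for reliability (the default is 1)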
Inspect plots - Don’t just trust automatic selection
Validate results - Check if components make biological sense
Next Steps
After optimization:
Robust Multi-view ICA - Run ICA with optimal parameters
Visualization of iModulons - Visualize and interpret components