Optimization of Dimensions

This section covers the optimization of component numbers for multi-view ICA, including both core (shared) and unique (species-specific) components.

Overview

Choosing the right number of components is crucial for meaningful results. MultiModulon provides automated optimization methods to determine:

  1. Optimal number of core components - Shared across all species

  2. Optimal number of unique components - Specific to each species

All optimization relies on the single-gene filter metric, which discards any component dominated by a single gene, that is, any component whose largest absolute gene weight exceeds 3 × its second-largest.

Optimizing Core Components

MultiModulon.optimize_number_of_core_components(**kwargs)

Optimize the number of core (shared) components across species.

Parameters:
  • max_k (int) – Maximum number of core components to test (default: auto-determined)

  • step (int) – Step size for k candidates (default: 5)

  • num_runs (int) – Number of runs per k value (default: 1)

  • mode (str) – Computation mode ‘gpu’ or ‘cpu’ (default: ‘gpu’)

  • seed (int) – Random seed for reproducibility (default: 42)

  • save_path (str) – Directory to save optimization plot

  • fig_size (tuple) – Figure size as (width, height) (default: (5, 3))

  • font_path (str) – Path to font file for plots

Returns:

Optimal number of core components

Return type:

int

Basic Usage

# Optimize using single-gene filter
optimal_core = multiModulon.optimize_number_of_core_components(
    max_k=30,
    step=5,
    save_path="optimization_results/"  # directory where the optimization plot is saved
)

print(f"Optimal number of core components: {optimal_core}")
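To get a feel for how max_k and step interact, the sketch below lists the k values that would be tested. It assumes candidates start at step and run up to max_k inclusive; this page does not confirm the exact grid the library uses, and candidate_ks is a hypothetical helper, not part of MultiModulon.

```python
def candidate_ks(max_k, step=5):
    """Hypothetical helper: k values tested for a given max_k and step.

    Assumption: the grid starts at `step` and includes `max_k` when it is
    a multiple of `step`. The library's real grid may differ.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    if max_k < step:
        raise ValueError("max_k must be at least step")
    return list(range(step, max_k + 1, step))


grid = candidate_ks(max_k=30, step=5)
print(grid)  # [5, 10, 15, 20, 25, 30]
```

A smaller step gives a finer search at the cost of more ICA runs, since each candidate is evaluated num_runs times.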

Understanding the Metric

Single-gene Filter:

  • Removes components dominated by a single gene (largest absolute weight > 3 × second-largest)

  • Highlights components with distributed gene contributions

  • Improves biological interpretability by eliminating single-gene artifacts
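The rule above can be sketched in a few lines of NumPy. This is a minimal illustration of the stated threshold (largest absolute weight > 3 × second-largest), not the library's implementation; the function names, the genes × components layout of M, and the handling of one-gene inputs are assumptions.

```python
import numpy as np


def is_single_gene_component(weights, ratio=3.0):
    """True if the largest |weight| exceeds `ratio` times the second-largest."""
    magnitudes = np.sort(np.abs(np.asarray(weights, dtype=float)))[::-1]
    if magnitudes.size < 2:
        return True  # assumption: a lone gene trivially dominates
    return bool(magnitudes[0] > ratio * magnitudes[1])


def filter_single_gene_components(M, ratio=3.0):
    """Drop single-gene columns from a genes x components matrix (assumed layout)."""
    M = np.asarray(M, dtype=float)
    keep = [j for j in range(M.shape[1])
            if not is_single_gene_component(M[:, j], ratio)]
    return M[:, keep], keep


# Toy matrix: column 0 is dominated by one gene, column 1 is distributed.
M = np.array([
    [9.0,  2.0],
    [1.0,  1.5],
    [0.5, -1.8],
])
filtered, kept = filter_single_gene_components(M)
print(kept)  # [1]
```

Column 0 is removed because 9.0 > 3 × 1.0, while column 1 is kept because 2.0 does not exceed 3 × 1.8.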

Optimizing Unique Components

After determining core components, optimize unique components:

MultiModulon.optimize_number_of_unique_components(**kwargs)

Optimize the number of unique components for each species.

Parameters:
  • optimal_num_core_components (int) – Number of core components (from previous step)

  • step (int) – Step size for testing unique components (default: 5)

  • mode (str) – Computation mode ‘gpu’ or ‘cpu’ (default: ‘gpu’)

  • seed (int) – Random seed (default: 42)

  • save_path (str) – Directory to save plots for each species

  • fig_size (tuple) – Figure size (default: (5, 3))

  • font_path (str) – Path to font file

Returns:

Tuple of (optimal_unique_components, optimal_total_components)

Return type:

Tuple[Dict[str, int], Dict[str, int]]

Basic Usage

# Optimize unique components
optimal_unique, optimal_total = multiModulon.optimize_number_of_unique_components(
    optimal_num_core_components=20,  # From previous step
    step=5,
    save_path="unique_optimization/"  # directory where per-species plots are saved
)

# Results
print("Optimal unique components per species:")
for species, n_unique in optimal_unique.items():
    n_total = optimal_total[species]
    print(f"{species}: {n_unique} unique, {n_total} total")

How It Works

For each species:

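  1. Tests a range of candidate numbers of unique components

  2. Runs ICA with the number of core components fixed and the number of unique components varied

  3. Removes single-gene components before clustering and counting

  4. Selects the number of unique components that yields the most interpretable components, i.e. those that survive the single-gene filter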
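The selection in the last step can be sketched as follows. The counts are made-up illustrative numbers, select_optimal_k is a hypothetical helper, and breaking ties in favor of the smallest candidate is an assumption; this page does not say how the library resolves ties.

```python
def select_optimal_k(counts):
    """Pick the candidate with the most components surviving the filter.

    counts maps a candidate number of components to the number of
    components left after removing single-gene components.
    Assumption: ties go to the smallest candidate.
    """
    if not counts:
        raise ValueError("counts must not be empty")
    best = max(counts.values())
    return min(k for k, n in counts.items() if n == best)


# Illustrative counts only, not real results.
counts = {5: 4, 10: 8, 15: 11, 20: 11, 25: 9}
print(select_optimal_k(counts))  # 15
```

Preferring the smaller candidate on a tie keeps the decomposition compact, but inspecting the saved plot remains the more reliable check.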

Complete Optimization Workflow

The following script chains both optimization steps and then runs multi-view ICA with the selected values:

# Step 1: Optimize core components
print("Optimizing core components...")
optimal_core = multiModulon.optimize_number_of_core_components(
    max_k=40,
    step=5,
    num_runs=3,
    save_path="optimization_results/",
    fig_size=(6, 4)
)
print(f"Optimal core components: {optimal_core}")

# Step 2: Optimize unique components
print("\nOptimizing unique components...")
optimal_unique, optimal_total = multiModulon.optimize_number_of_unique_components(
    optimal_num_core_components=optimal_core,
    step=5,
    save_path="optimization_results/",
    fig_size=(6, 4)
)

print("\nOptimization complete!")
print(f"Core components: {optimal_core}")
for species in multiModulon.species:
    print(f"{species}: {optimal_unique[species]} unique, "
          f"{optimal_total[species]} total")

# Step 3: Run ICA with optimal parameters
print("\nRunning multi-view ICA with optimal parameters...")
M_matrices, A_matrices = multiModulon.run_robust_multiview_ica(
    a=optimal_total,
    c=optimal_core,
    num_runs=100,
    mode='gpu'
)

Best Practices

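  1. Rely on the single-gene filter - Components dominated by one gene are usually artifacts rather than interpretable modules

  2. Use multiple runs - Set num_runs to at least 3-5 for reliable estimates (the default is 1)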

  3. Inspect plots - Don’t just trust automatic selection

  4. Validate results - Check if components make biological sense

Next Steps

After optimization:

  1. Robust Multi-view ICA - Run ICA with optimal parameters

  2. Visualization of iModulons - Visualize and interpret components