Optimization of Dimensions

This section covers the optimization of component numbers for multi-view ICA, including both core (shared) and unique (species-specific) components.

Overview

Choosing the right number of components is crucial for meaningful results. MultiModulon provides automated optimization methods to determine:

  1. Optimal number of core components - Shared across all species

  2. Optimal number of unique components - Specific to each species

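Two optimization metrics are available: NRE (normalized reconstruction error) and Cohen’s d effect size. Both are described under Understanding the Metrics.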

Optimizing Core Components

MultiModulon.optimize_number_of_core_components(**kwargs)

Optimize the number of core (shared) components across species.

Parameters:
  • max_k (int) – Maximum number of core components to test (default: auto-determined)

  • step (int) – Step size for k candidates (default: 5)

  • max_a_per_view (int) – Maximum components per species (default: max_k)

  • train_frac (float) – Fraction of data for training (default: 0.75)

  • num_runs (int) – Number of cross-validation runs (default: 1)

  • mode (str) – Computation mode ‘gpu’ or ‘cpu’ (default: ‘gpu’)

  • seed (int) – Random seed for reproducibility (default: 42)

  • metric (str) – Optimization metric ‘nre’ or ‘effect_size’ (default: ‘effect_size’)

  • effect_size_threshold (float) – Cohen’s d threshold (default: 5)

  • num_top_gene (int) – Number of top genes for Cohen’s d (default: 20)

  • save_path (str) – Directory to save optimization plot

  • fig_size (tuple) – Figure size as (width, height) (default: (5, 3))

  • font_path (str) – Path to font file for plots

Returns:

Tuple of (optimal_num_core_components, metric_scores)

Return type:

Tuple[int, Dict[int, float]]

Basic Usage

# Optimize using Cohen's d effect size
optimal_core, scores = multiModulon.optimize_number_of_core_components(
    max_k=30,
    step=5,
    metric='effect_size',
    effect_size_threshold=5,  # Minimum Cohen's d
    num_top_gene=20,          # Top genes to consider
    save_path="optimization_results/"  # Directory for the optimization plot
)

print(f"Optimal number of core components: {optimal_core}")
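The second return value maps each tested k to its metric value, so it can be inspected alongside the saved plot. The helper below is a minimal sketch for printing that mapping as a table. The name summarize_scores and the example values are hypothetical placeholders, not part of MultiModulon or real results.

```python
from typing import Dict


def summarize_scores(scores: Dict[int, float]) -> str:
    """Format a {k: score} mapping as an aligned two-column table, sorted by k."""
    lines = [f"{'k':>4}  {'score':>8}"]
    for k in sorted(scores):
        lines.append(f"{k:>4}  {scores[k]:>8.3f}")
    return "\n".join(lines)


# Placeholder values for illustration only; in practice pass the `scores`
# dict returned by optimize_number_of_core_components.
example_scores = {5: 4.0, 10: 9.0, 15: 12.0, 20: 12.0}
print(summarize_scores(example_scores))
```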

Understanding the Metrics

NRE (Normalized Reconstruction Error):

  • Measures how well core components reconstruct the data

  • Lower values are better

  • May include noise components
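To make the NRE bullets concrete, here is a minimal sketch of a generic normalized reconstruction error: squared residual divided by the squared norm of the data. This is an assumption about the general form only; the exact normalization and held-out evaluation used by MultiModulon may differ. The demo uses a truncated SVD as a stand-in for ICA to show that the error keeps shrinking as components are added, even when the extra components only fit noise.

```python
import numpy as np


def normalized_reconstruction_error(X, X_hat):
    """Squared residual norm divided by the squared norm of X (generic form)."""
    X = np.asarray(X, dtype=float)
    X_hat = np.asarray(X_hat, dtype=float)
    denom = np.sum(X ** 2)
    if denom == 0:
        raise ValueError("X must contain at least one non-zero entry")
    return float(np.sum((X - X_hat) ** 2) / denom)


# Stand-in demo: rank-k SVD reconstructions of pure noise (not ICA).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))
U, s, Vt = np.linalg.svd(X, full_matrices=False)
for k in (2, 10, 20):
    X_hat = (U[:, :k] * s[:k]) @ Vt[:k]
    print(k, normalized_reconstruction_error(X, X_hat))
```

Because the error decreases monotonically with k in this demo even though the data is noise, a low NRE alone does not show that the added components are meaningful.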

Cohen’s d Effect Size:

  • Measures separation between top genes and others

  • Higher values indicate components with clearer gene membership

  • Better for biological interpretability

  • Filters out noise components
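The effect-size idea can be sketched as Cohen’s d between the top-weighted genes of one component and all remaining genes. The function below is illustrative only: ranking by absolute weight and using a pooled standard deviation are assumptions, and cohens_d_top_genes is a hypothetical name rather than a MultiModulon function.

```python
import numpy as np


def cohens_d_top_genes(weights, num_top_gene=20):
    """Cohen's d between the top-|weight| genes and the rest of one component."""
    w = np.abs(np.asarray(weights, dtype=float))
    if num_top_gene < 2 or w.size - num_top_gene < 2:
        raise ValueError("need at least 2 genes in each group")
    order = np.argsort(w)[::-1]          # indices sorted by descending |weight|
    top = w[order[:num_top_gene]]
    rest = w[order[num_top_gene:]]
    n1, n2 = top.size, rest.size
    # Pooled variance of the two groups (sample variances, ddof=1).
    pooled_var = ((n1 - 1) * top.var(ddof=1) + (n2 - 1) * rest.var(ddof=1)) / (n1 + n2 - 2)
    return float((top.mean() - rest.mean()) / np.sqrt(pooled_var))


# Synthetic comparison: a sparse component versus a pure-noise component.
rng = np.random.default_rng(0)
noise = rng.normal(0.0, 0.01, size=1000)
sparse = noise.copy()
sparse[:20] += 0.5                        # 20 genes with clearly larger weights
print("sparse:", cohens_d_top_genes(sparse))
print("noise: ", cohens_d_top_genes(noise))
```

Under these assumptions the sparse component scores far higher than the noise component, which is the separation that effect_size_threshold is meant to act on.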

Optimizing Unique Components

After determining core components, optimize unique components:

MultiModulon.optimize_number_of_unique_components(**kwargs)

Optimize the number of unique components for each species.

Parameters:
  • optimal_num_core_components (int) – Number of core components (from previous step)

  • step (int) – Step size for testing unique components (default: 5)

  • mode (str) – Computation mode ‘gpu’ or ‘cpu’ (default: ‘gpu’)

  • seed (int) – Random seed (default: 42)

  • effect_size_threshold (float) – Cohen’s d threshold (default: 5)

  • num_top_gene (int) – Number of top genes for Cohen’s d (default: 20)

  • save_path (str) – Directory to save plots for each species

  • fig_size (tuple) – Figure size (default: (5, 3))

  • font_path (str) – Path to font file

Returns:

Tuple of (optimal_unique_components, optimal_total_components)

Return type:

Tuple[Dict[str, int], Dict[str, int]]

Basic Usage

# Optimize unique components
optimal_unique, optimal_total = multiModulon.optimize_number_of_unique_components(
    optimal_num_core_components=20,  # From previous step
    step=5,
    save_path="unique_optimization/"  # Directory for per-species plots
)

# Results
print("Optimal unique components per species:")
for species, n_unique in optimal_unique.items():
    n_total = optimal_total[species]
    print(f"{species}: {n_unique} unique, {n_total} total")

How It Works

For each species:

  1. Tests different numbers of unique components

  2. Runs ICA with fixed core + varying unique

  3. Calculates mean Cohen’s d for unique components

  4. Selects the number that yields the most interpretable components
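The selection step above can be sketched as follows. This is one possible reading, not the library's confirmed rule: a component counts as interpretable when its Cohen’s d meets the threshold, and ties go to the smaller candidate. The function name select_num_unique and the input values are hypothetical; in practice the effect sizes would come from ICA runs at each candidate size.

```python
from typing import Dict, Sequence


def select_num_unique(effect_sizes: Dict[int, Sequence[float]],
                      threshold: float = 5.0) -> int:
    """Pick the candidate with the most components at or above the threshold.

    `effect_sizes` maps each candidate number of unique components to the
    Cohen's d values of the unique components found at that size.
    Returns 0 if no candidate has any component passing the threshold.
    """
    best_n, best_count = 0, 0
    for n in sorted(effect_sizes):        # ascending, so ties keep the smaller n
        count = sum(d >= threshold for d in effect_sizes[n])
        if count > best_count:
            best_n, best_count = n, count
    return best_n


# Placeholder effect sizes for three candidate sizes (illustrative only).
candidates = {
    5: [8.1, 6.4, 2.0, 1.5, 0.9],
    10: [8.0, 6.5, 5.2, 2.1, 1.8, 1.2, 1.0, 0.8, 0.7, 0.5],
    15: [7.9, 6.3, 5.1, 2.0, 1.9, 1.5, 1.2, 1.1, 1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4],
}
print(select_num_unique(candidates, threshold=5.0))
```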

Custom Thresholds

Different species may need different thresholds:

# Strict threshold for well-studied species
optimal_unique_strict, _ = multiModulon.optimize_number_of_unique_components(
    optimal_num_core_components=20,
    effect_size_threshold=7,  # Higher threshold
    save_path="strict_optimization/"
)

# Permissive threshold for novel species
optimal_unique_permissive, _ = multiModulon.optimize_number_of_unique_components(
    optimal_num_core_components=20,
    effect_size_threshold=3,  # Lower threshold
    save_path="permissive_optimization/"
)

Complete Optimization Workflow

The following workflow chains both optimization steps and then runs ICA with the selected dimensions:

# Step 1: Optimize core components
print("Optimizing core components...")
optimal_core, core_scores = multiModulon.optimize_number_of_core_components(
    max_k=40,
    step=5,
    metric='effect_size',
    effect_size_threshold=5,
    num_runs=3,
    save_path="optimization_results/",
    fig_size=(6, 4)
)
print(f"Optimal core components: {optimal_core}")

# Step 2: Optimize unique components
print("\nOptimizing unique components...")
optimal_unique, optimal_total = multiModulon.optimize_number_of_unique_components(
    optimal_num_core_components=optimal_core,
    step=5,
    effect_size_threshold=5,
    save_path="optimization_results/",
    fig_size=(6, 4)
)

print("\nOptimization complete!")
print(f"Core components: {optimal_core}")
for species in multiModulon.species:
    print(f"{species}: {optimal_unique[species]} unique, "
          f"{optimal_total[species]} total")

# Step 3: Run ICA with optimal parameters
print("\nRunning multi-view ICA with optimal parameters...")
M_matrices, A_matrices = multiModulon.run_robust_multiview_ica(
    a=optimal_total,
    c=optimal_core,
    num_runs=100,
    mode='gpu'
)
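Before passing the results on to ICA, it can help to confirm that the per-species counts are consistent. The check below assumes that each species' total equals the core count plus its unique count, which this section implies ("fixed core + varying unique") but does not state explicitly. The function name and the species names are hypothetical.

```python
from typing import Dict, List


def inconsistent_species(core: int,
                         unique: Dict[str, int],
                         total: Dict[str, int]) -> List[str]:
    """Return species whose total is missing or differs from core + unique."""
    return [sp for sp in unique if total.get(sp) != core + unique[sp]]


# Hypothetical values; in practice pass optimal_core, optimal_unique, optimal_total.
problems = inconsistent_species(
    core=20,
    unique={"species_a": 5, "species_b": 10},
    total={"species_a": 25, "species_b": 30},
)
print(problems or "all species consistent")
```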

Best Practices

  1. Start with effect_size metric - More biologically relevant

  2. Use multiple runs - At least 3-5 for reliability

  3. Inspect plots - Don’t just trust automatic selection

  4. Validate results - Check if components make biological sense
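One way to act on the advice about multiple runs is to repeat the optimization with several values of the documented seed parameter and check how often the same number of components is selected. The helper below is a sketch of that tally; most_common_k is a hypothetical name, and the tie-breaking rule (prefer the smaller k) is an assumption.

```python
from collections import Counter
from typing import Sequence, Tuple


def most_common_k(selected: Sequence[int]) -> Tuple[int, float]:
    """Return the most frequently selected k and the fraction of runs that chose it."""
    if not selected:
        raise ValueError("selected must not be empty")
    counts = Counter(selected)
    # Highest count wins; on ties, the smaller k is preferred.
    k, n = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
    return k, n / len(selected)


# Placeholder selections from five hypothetical runs with different seeds.
k, agreement = most_common_k([20, 20, 25, 20, 15])
print(f"k={k} chosen in {agreement:.0%} of runs")
```

Low agreement across seeds is a signal to inspect the plots before trusting any single automatic selection.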

Next Steps

After optimization:

  1. Robust Multi-view ICA - Run ICA with optimal parameters

  2. Visualization of iModulons - Visualize and interpret components