Robust Multi-view ICA

This section covers running multi-view Independent Component Analysis (ICA) to identify core (conserved) and unique (species-specific) regulatory modules.

Overview

Multi-view ICA decomposes expression matrices from multiple species into:

  • Core components - Shared regulatory modules conserved across species

  • Unique components - Species-specific regulatory modules

The analysis produces:

  • M matrices - Gene weights (mixing matrices)

  • A matrices - Sample activities (source signals)

Basic Multi-view ICA

MultiModulon.run_multiview_ica(**kwargs)

Run multi-view ICA on aligned expression matrices.

Parameters:
  • a – Number of components per species. Can be: - int: Same number for all species - Dict[str, int]: Different numbers per species

  • c (int) – Number of core (shared) components

  • mode (str) – ‘gpu’ or ‘cpu’ (default: ‘gpu’)

  • effect_size_threshold (float) – Cohen’s d threshold for filtering (optional)

  • num_top_gene (int) – Number of top genes for Cohen’s d calculation (default: 20)

Returns:

Dictionary mapping species to M matrices

Return type:

Dict[str, pd.DataFrame]

Basic Usage

# Run with equal components per species
M_matrices = multiModulon.run_multiview_ica(
    a=50,      # 50 total components per species
    c=20,      # 20 core components
    mode='gpu'
)

# Run with different components per species
M_matrices = multiModulon.run_multiview_ica(
    a={'Species1': 50, 'Species2': 60, 'Species3': 45},
    c=20,
    mode='gpu'
)

Component Filtering

Filter components by effect size:

# Only keep components with Cohen's d > 5
M_matrices = multiModulon.run_multiview_ica(
    a=60,
    c=25,
    effect_size_threshold=5,
    num_top_gene=20
)

# Check how many components passed the filter
for species, M in M_matrices.items():
    print(f"{species}: {M.shape[1]} components retained")

Robust Multi-view ICA

For more reliable results, use robust ICA with multiple runs:

MultiModulon.run_robust_multiview_ica(**kwargs)

Run multi-view ICA multiple times and cluster results for robustness.

Parameters:
  • a (Dict[str, int]) – Total components per species

  • c (int) – Number of core components

  • num_runs (int) – Number of ICA runs (default: 100)

  • mode (str) – ‘gpu’ or ‘cpu’ (default: ‘gpu’)

  • seed (int) – Random seed (default: 42)

  • effect_size_threshold (float) – Cohen’s d threshold for all components

  • effect_size_threshold_core (float) – Threshold for core components only

  • effect_size_threshold_unique (float) – Threshold for unique components only

  • num_top_gene (int) – Number of top genes for Cohen’s d

Returns:

Tuple of (M_matrices, A_matrices)

Return type:

Tuple[Dict[str, pd.DataFrame], Dict[str, pd.DataFrame]]

Robust ICA Example

# Run robust ICA with 100 runs
M_matrices, A_matrices = multiModulon.run_robust_multiview_ica(
    a={'Species1': 50, 'Species2': 60},
    c=20,
    num_runs=100,
    mode='gpu',
    seed=42
)

# Access results
M_species1 = M_matrices['Species1']
A_species1 = A_matrices['Species1']

print(f"M matrix shape: {M_species1.shape}")
print(f"A matrix shape: {A_species1.shape}")

Different Thresholds

Apply different thresholds to core and unique components:

# Stricter threshold for core, looser for unique
M_matrices, A_matrices = multiModulon.run_robust_multiview_ica(
    a={'Species1': 50, 'Species2': 60},
    c=20,
    num_runs=100,
    effect_size_threshold_core=7,    # Strict for core
    effect_size_threshold_unique=3,  # Permissive for unique
    num_top_gene=20
)

Understanding the Results

M Matrix (Gene Weights)

The M matrix contains gene weights for each component:

M = M_matrices['Species1']

# Structure:
# Rows: Genes (aligned across species)
# Columns: Components (Core_1, Core_2, ..., Unique_1, ...)

# Get top genes for a component
component = 'Core_1'
weights = M[component].sort_values(ascending=False)

print(f"Top 10 genes in {component}:")
print(weights.head(10))

print(f"\nBottom 10 genes in {component}:")
print(weights.tail(10))

A Matrix (Sample Activities)

The A matrix contains component activities across samples:

A = A_matrices['Species1']

# Structure:
# Rows: Components
# Columns: Samples

# Get activity profile for a component
component = 'Core_1'
activities = A.loc[component]

# Find samples with high activity
high_activity_samples = activities[activities > 5].index
print(f"Samples with high {component} activity:")
print(high_activity_samples.tolist())

Component Types

Components are labeled by type:

# List all components
all_components = M.columns.tolist()

# Separate by type
core_components = [c for c in all_components if c.startswith('Core_')]
unique_components = [c for c in all_components if c.startswith('Unique_')]

print(f"Core components: {len(core_components)}")
print(f"Unique components: {len(unique_components)}")

Generating Activity Matrices

After running ICA, generate A matrices from M and X:

MultiModulon.generate_A()

Generate A matrices (M.T @ X) for all species.

Example:

# Generate A matrices after ICA
multiModulon.generate_A()

# Access generated matrices
for species in multiModulon.species:
    A = multiModulon[species].A
    print(f"{species} activities: {A.shape}")

This is useful when:

  • You’ve loaded pre-computed M matrices

  • You want to recalculate activities after filtering

Advanced Usage

Custom Parameters

Fine-tune the ICA algorithm:

# Direct access to underlying function
from multimodulon.multiview_ica import run_multiview_ica

M_matrices = run_multiview_ica(
    species_X_matrices={s: multiModulon[s].X for s in multiModulon.species},
    a_values={'Species1': 50, 'Species2': 60},
    c=20,
    mode='gpu',
    max_iter=10000,      # More iterations
    learning_rate=0.01,  # Custom learning rate
    batch_size=None,     # Full batch
    seed=42
)

GPU vs CPU Mode

Choose based on your system:

import torch

# Check GPU availability
if torch.cuda.is_available():
    print("GPU available - using GPU mode")
    mode = 'gpu'
else:
    print("No GPU - using CPU mode")
    mode = 'cpu'

# Run ICA
M_matrices = multiModulon.run_multiview_ica(
    a=50,
    c=20,
    mode=mode
)

Quality Control

Assess ICA Results

# Calculate explained variance
explained_var = multiModulon.calculate_explained_variance()
for species, var in explained_var.items():
    print(f"{species}: {var:.1%} variance explained")

# Check component effect sizes
from multimodulon.multiview_ica_optimization import calculate_average_effect_sizes

effect_sizes = calculate_average_effect_sizes(
    M_matrices,
    num_top_gene=20
)

Component Correlation

Check independence of components:

# Within species
M = M_matrices['Species1']
corr_matrix = M.corr()

# High correlation indicates redundancy
import seaborn as sns

plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, cmap='coolwarm', center=0)
plt.title("Component correlation within Species1")
plt.show()

# Across species (for core components)
core_comps = [c for c in M.columns if c.startswith('Core_')]
M1_core = M_matrices['Species1'][core_comps]
M2_core = M_matrices['Species2'][core_comps]

# Compare matching components
for comp in core_comps:
    corr = M1_core[comp].corr(M2_core[comp])
    print(f"{comp} correlation: {corr:.3f}")

Best Practices

  1. Always use robust ICA for final results (20+ runs)

  2. Validate core components across species

  3. Use GPU mode when available for speed

Next Steps

After running ICA:

  1. Visualization of iModulons - Visualize components

  2. Basic Workflow - Complete workflow example

  3. Biological interpretation - Analyze gene sets and activities