Robust Multi-view ICA
This section covers running multi-view Independent Component Analysis (ICA) to identify core (conserved) and unique (species-specific) regulatory modules.
Overview
Multi-view ICA decomposes expression matrices from multiple species into:
Core components - Shared regulatory modules conserved across species
Unique components - Species-specific regulatory modules
The analysis produces:
M matrices - Gene weights (mixing matrices)
A matrices - Sample activities (source signals)
Basic Multi-view ICA
- MultiModulon.run_multiview_ica(**kwargs)
Run multi-view ICA on aligned expression matrices.
- Parameters:
a – Number of components per species. Can be: - int: Same number for all species - Dict[str, int]: Different numbers per species
c (int) – Number of core (shared) components
mode (str) – ‘gpu’ or ‘cpu’ (default: ‘gpu’)
effect_size_threshold (float) – Cohen’s d threshold for filtering (optional)
num_top_gene (int) – Number of top genes for Cohen’s d calculation (default: 20)
- Returns:
Dictionary mapping species to M matrices
- Return type:
Dict[str, pd.DataFrame]
Basic Usage
# Run with equal components per species
M_matrices = multiModulon.run_multiview_ica(
a=50, # 50 total components per species
c=20, # 20 core components
mode='gpu'
)
# Run with different components per species
M_matrices = multiModulon.run_multiview_ica(
a={'Species1': 50, 'Species2': 60, 'Species3': 45},
c=20,
mode='gpu'
)
Component Filtering
Filter components by effect size:
# Only keep components with Cohen's d > 5
M_matrices = multiModulon.run_multiview_ica(
a=60,
c=25,
effect_size_threshold=5,
num_top_gene=20
)
# Check how many components passed the filter
for species, M in M_matrices.items():
print(f"{species}: {M.shape[1]} components retained")
Robust Multi-view ICA
For more reliable results, use robust ICA with multiple runs:
- MultiModulon.run_robust_multiview_ica(**kwargs)
Run multi-view ICA multiple times and cluster results for robustness.
- Parameters:
c (int) – Number of core components
num_runs (int) – Number of ICA runs (default: 100)
mode (str) – ‘gpu’ or ‘cpu’ (default: ‘gpu’)
seed (int) – Random seed (default: 42)
effect_size_threshold (float) – Cohen’s d threshold for all components
effect_size_threshold_core (float) – Threshold for core components only
effect_size_threshold_unique (float) – Threshold for unique components only
num_top_gene (int) – Number of top genes for Cohen’s d
- Returns:
Tuple of (M_matrices, A_matrices)
- Return type:
Robust ICA Example
# Run robust ICA with 100 runs
M_matrices, A_matrices = multiModulon.run_robust_multiview_ica(
a={'Species1': 50, 'Species2': 60},
c=20,
num_runs=100,
mode='gpu',
seed=42
)
# Access results
M_species1 = M_matrices['Species1']
A_species1 = A_matrices['Species1']
print(f"M matrix shape: {M_species1.shape}")
print(f"A matrix shape: {A_species1.shape}")
Different Thresholds
Apply different thresholds to core and unique components:
# Stricter threshold for core, looser for unique
M_matrices, A_matrices = multiModulon.run_robust_multiview_ica(
a={'Species1': 50, 'Species2': 60},
c=20,
num_runs=100,
effect_size_threshold_core=7, # Strict for core
effect_size_threshold_unique=3, # Permissive for unique
num_top_gene=20
)
Understanding the Results
M Matrix (Gene Weights)
The M matrix contains gene weights for each component:
M = M_matrices['Species1']
# Structure:
# Rows: Genes (aligned across species)
# Columns: Components (Core_1, Core_2, ..., Unique_1, ...)
# Get top genes for a component
component = 'Core_1'
weights = M[component].sort_values(ascending=False)
print(f"Top 10 genes in {component}:")
print(weights.head(10))
print(f"\nBottom 10 genes in {component}:")
print(weights.tail(10))
A Matrix (Sample Activities)
The A matrix contains component activities across samples:
A = A_matrices['Species1']
# Structure:
# Rows: Components
# Columns: Samples
# Get activity profile for a component
component = 'Core_1'
activities = A.loc[component]
# Find samples with high activity
high_activity_samples = activities[activities > 5].index
print(f"Samples with high {component} activity:")
print(high_activity_samples.tolist())
Component Types
Components are labeled by type:
# List all components
all_components = M.columns.tolist()
# Separate by type
core_components = [c for c in all_components if c.startswith('Core_')]
unique_components = [c for c in all_components if c.startswith('Unique_')]
print(f"Core components: {len(core_components)}")
print(f"Unique components: {len(unique_components)}")
Generating Activity Matrices
After running ICA, generate A matrices from M and X:
- MultiModulon.generate_A()
Generate A matrices (M.T @ X) for all species.
Example:
# Generate A matrices after ICA multiModulon.generate_A() # Access generated matrices for species in multiModulon.species: A = multiModulon[species].A print(f"{species} activities: {A.shape}")
This is useful when:
You’ve loaded pre-computed M matrices
You want to recalculate activities after filtering
Advanced Usage
Custom Parameters
Fine-tune the ICA algorithm:
# Direct access to underlying function
from multimodulon.multiview_ica import run_multiview_ica
M_matrices = run_multiview_ica(
species_X_matrices={s: multiModulon[s].X for s in multiModulon.species},
a_values={'Species1': 50, 'Species2': 60},
c=20,
mode='gpu',
max_iter=10000, # More iterations
learning_rate=0.01, # Custom learning rate
batch_size=None, # Full batch
seed=42
)
GPU vs CPU Mode
Choose based on your system:
import torch
# Check GPU availability
if torch.cuda.is_available():
print("GPU available - using GPU mode")
mode = 'gpu'
else:
print("No GPU - using CPU mode")
mode = 'cpu'
# Run ICA
M_matrices = multiModulon.run_multiview_ica(
a=50,
c=20,
mode=mode
)
Quality Control
Assess ICA Results
# Calculate explained variance
explained_var = multiModulon.calculate_explained_variance()
for species, var in explained_var.items():
print(f"{species}: {var:.1%} variance explained")
# Check component effect sizes
from multimodulon.multiview_ica_optimization import calculate_average_effect_sizes
effect_sizes = calculate_average_effect_sizes(
M_matrices,
num_top_gene=20
)
Component Correlation
Check independence of components:
# Within species
M = M_matrices['Species1']
corr_matrix = M.corr()
# High correlation indicates redundancy
import seaborn as sns
plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, cmap='coolwarm', center=0)
plt.title("Component correlation within Species1")
plt.show()
# Across species (for core components)
core_comps = [c for c in M.columns if c.startswith('Core_')]
M1_core = M_matrices['Species1'][core_comps]
M2_core = M_matrices['Species2'][core_comps]
# Compare matching components
for comp in core_comps:
corr = M1_core[comp].corr(M2_core[comp])
print(f"{comp} correlation: {corr:.3f}")
Best Practices
Always use robust ICA for final results (20+ runs)
Validate core components across species
Use GPU mode when available for speed
Next Steps
After running ICA:
Visualization of iModulons - Visualize components
Basic Workflow - Complete workflow example
Biological interpretation - Analyze gene sets and activities