Robust Multi-view ICA
This section covers running multi-view Independent Component Analysis (ICA) to identify core (conserved) and unique (species-specific) regulatory modules.
Overview
Robust Multi-view ICA decomposes expression matrices from multiple species into:
Core components - Shared regulatory modules conserved across species
Unique components - Species-specific regulatory modules
The analysis produces:
M matrices - Gene weights (mixing matrices)
A matrices - Sample activities (source signals)
Basic Multi-view ICA
- MultiModulon.run_multiview_ica(**kwargs)
Run multi-view ICA on aligned expression matrices.
- Parameters:
- Returns:
Dictionary mapping species to M matrices
- Return type:
Dict[str, pd.DataFrame]
Basic Usage
# Run with equal components per species
M_matrices = multiModulon.run_multiview_ica(
a=50, # 50 total components per species
c=20, # 20 core components
mode='gpu'
)
# Run with different components per species
M_matrices = multiModulon.run_multiview_ica(
a={'Species1': 50, 'Species2': 60, 'Species3': 45},
c=20,
mode='gpu'
)
Robust Multi-view ICA
For more reliable results, use robust ICA with multiple runs:
- MultiModulon.run_robust_multiview_ica(**kwargs)
Run multi-view ICA multiple times and cluster results for robustness.
- Parameters:
- Returns:
Tuple of (M_matrices, A_matrices)
- Return type:
Robust ICA Example
# Run robust ICA with 100 runs
M_matrices, A_matrices = multiModulon.run_robust_multiview_ica(
a={'Species1': 50, 'Species2': 60},
c=20,
num_runs=100,
mode='gpu',
seed=42
)
# Access results
M_species1 = M_matrices['Species1']
A_species1 = A_matrices['Species1']
print(f"M matrix shape: {M_species1.shape}")
print(f"A matrix shape: {A_species1.shape}")
Understanding the Results
M Matrix (Gene Weights)
The M matrix contains gene weights for each component:
M = M_matrices['Species1']
# Structure:
# Rows: Genes (aligned across species)
# Columns: Components (Core_1, Core_2, ..., Unique_1, ...)
# Get top genes for a component
component = 'Core_1'
weights = M[component].sort_values(ascending=False)
print(f"Top 10 genes in {component}:")
print(weights.head(10))
print(f"\nBottom 10 genes in {component}:")
print(weights.tail(10))
A Matrix (Sample Activities)
The A matrix contains component activities across samples:
A = A_matrices['Species1']
# Structure:
# Rows: Components
# Columns: Samples
# Get activity profile for a component
component = 'Core_1'
activities = A.loc[component]
# Find samples with high activity
high_activity_samples = activities[activities > 5].index
print(f"Samples with high {component} activity:")
print(high_activity_samples.tolist())
Component Types
Components are labeled by type:
# List all components
all_components = M.columns.tolist()
# Separate by type
core_components = [c for c in all_components if c.startswith('Core_')]
unique_components = [c for c in all_components if c.startswith('Unique_')]
print(f"Core components: {len(core_components)}")
print(f"Unique components: {len(unique_components)}")
Advanced Usage
Custom Parameters
Fine-tune the ICA algorithm:
# Direct access to underlying function
from multimodulon.multiview_ica import run_multiview_ica
M_matrices = run_multiview_ica(
species_X_matrices={s: multiModulon[s].X for s in multiModulon.species},
a_values={'Species1': 50, 'Species2': 60},
c=20,
mode='gpu',
max_iter=10000, # More iterations
learning_rate=0.01, # Custom learning rate
batch_size=None, # Full batch
seed=42
)
GPU vs CPU Mode
Choose based on your system:
import torch
# Check GPU availability
if torch.cuda.is_available():
print("GPU available - using GPU mode")
mode = 'gpu'
else:
print("No GPU - using CPU mode")
mode = 'cpu'
# Run ICA
M_matrices = multiModulon.run_multiview_ica(
a=50,
c=20,
mode=mode
)
Quality Control
Assess ICA Results
# Check components passing the single-gene filter
from multimodulon.optimization import passes_single_gene_filter
for species, M in M_matrices.items():
n_components = M.shape[1]
n_passing = sum(
passes_single_gene_filter(M.iloc[:, idx].values)
for idx in range(n_components)
)
print(f"{species}: {n_passing}/{n_components} components pass single-gene filter")
Component Correlation
Check independence of components:
# Within species
M = M_matrices['Species1']
corr_matrix = M.corr()
# High correlation indicates redundancy
import seaborn as sns
plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, cmap='coolwarm', center=0)
plt.title("Component correlation within Species1")
plt.show()
# Across species (for core components)
core_comps = [c for c in M.columns if c.startswith('Core_')]
M1_core = M_matrices['Species1'][core_comps]
M2_core = M_matrices['Species2'][core_comps]
# Compare matching components
for comp in core_comps:
corr = M1_core[comp].corr(M2_core[comp])
print(f"{comp} correlation: {corr:.3f}")
Best Practices
Always use robust ICA for final results (20+ runs)
Validate core components across species
Use GPU mode when available for speed
Next Steps
After running ICA:
Visualization of iModulons - Visualize components
MultiModulon Worflow - Complete workflow example
Biological interpretation - Analyze gene sets and activities