Visualization of iModulons

This section covers the visualization functions for exploring iModulon gene weights and activities across samples.

Overview

MultiModulon provides eight main visualization functions:

  1. view_iModulon_weights - Visualize gene weights within a component for a single species

  2. view_core_iModulon_weights - Visualize a core iModulon component across all species

  3. view_iModulon_activities - Visualize component activities across samples

  4. compare_core_iModulon_activity - Compare core iModulon activities across multiple species for specific conditions

  5. plot_iM_conservation_bubble_matrix - Summarize iModulon conservation across species

  6. show_iModulon_activity_change - Visualize activity changes between two conditions

  7. core_iModulon_stability - Quantify core iModulon stability across species using pairwise correlations

  8. show_gene_iModulon_correlation - Show correlation between gene expression and iModulon activity across species

All functions accept save_path, fig_size, and font_path to control export and appearance. view_iModulon_activities additionally supports highlighting by project, study, or condition.

Visualizing Gene Weights

MultiModulon.view_iModulon_weights(species, component, save_path=None, fig_size=(6, 4), font_path=None, show_COG=False, show_gene_names=None, show_all_gene_names=False)

Create a bar plot showing gene weights for a specific iModulon component.

Parameters:
  • species (str) – Species/strain name

  • component (str) – Component name (e.g., ‘Core_1’, ‘Unique_1’)

  • save_path (str) – Path to save the plot (optional)

  • fig_size (tuple) – Figure size as (width, height) (default: (6, 4))

  • font_path (str) – Path to custom font file (optional)

  • show_COG (bool) – Color genes by COG category (default: False)

  • show_gene_names (bool or None) – Show gene names on the plot. If None, labels are shown automatically for small components with fewer than 10 genes above threshold (default: None). At most 60 labels are shown, chosen by weight magnitude

  • show_all_gene_names (bool) – Label all genes above threshold without the 60-label cap (default: False)

Basic Usage

# Simple gene weight plot
multiModulon.view_iModulon_weights(
    species='Species1',
    component='Core_1',
    save_path='core1_weights.svg'
)

# With COG coloring
multiModulon.view_iModulon_weights(
    species='Species1',
    component='Core_1',
    show_COG=True,
    save_path='core1_weights_COG.svg'
)

# With gene names labeled
multiModulon.view_iModulon_weights(
    species='Species1',
    component='Core_1',
    show_gene_names=True,
    save_path='core1_weights_labeled.svg'
)

# Auto-labeling for small components (default behavior)
# If component has <10 genes above threshold, labels are shown automatically
multiModulon.view_iModulon_weights(
    species='Species1',
    component='Small_Component_1',  # Has only 7 genes
    save_path='small_component_auto_labeled.svg'
)

Understanding the Plot

  • X-axis: Gene positions along genome (Mb)

  • Y-axis: Gene weights (coefficients from M matrix)

  • Dotted lines: Threshold (if optimized)

  • Colors: COG categories (if show_COG=True) or light blue/grey based on threshold

  • Labels: Gene names displayed on plot when show_gene_names=True (max 60 genes)

      - Automatically shown for small components (<10 genes above threshold)

      - Uses Preferred_name if available, otherwise uses standard gene names

      - Text has white background boxes for better readability

      - Positioned with initial offset (2% of y-range) from dots using golden angle distribution

      - Uses adjustText library for optimized positioning with strong repulsion parameters

      - Simple lines connect labels to their corresponding points

      - Fallback smart positioning with alternating pattern if adjustText fails
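
The labeling rules above (automatic labels for small components, at most 60 labels ranked by weight magnitude) can be sketched roughly as follows. This is an illustration only, not MultiModulon's implementation: the function name genes_to_label, its parameters, and the comparison of absolute weights against a single threshold are assumptions.

```python
def genes_to_label(weights, threshold, show_gene_names=None,
                   max_labels=60, auto_label_below=10):
    """Pick gene names to label, strongest |weight| first (hypothetical helper).

    weights: dict mapping gene name -> weight in one component.
    threshold: assumed absolute-weight cutoff for membership.
    """
    above = [g for g, w in weights.items() if abs(w) > threshold]
    if show_gene_names is None:
        # Auto mode: label only components with few genes above threshold.
        show_gene_names = len(above) < auto_label_below
    if not show_gene_names:
        return []
    ranked = sorted(above, key=lambda g: abs(weights[g]), reverse=True)
    return ranked[:max_labels]


weights = {"argA": 0.31, "argB": -0.27, "lacZ": 0.02, "argC": 0.18, "rpoS": -0.01}
print(genes_to_label(weights, threshold=0.1))  # ['argA', 'argB', 'argC']
```

Passing show_gene_names=False returns an empty list, mirroring a plot without labels.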

COG Categories

When show_COG=True, genes are colored by functional category:

# COG categories and their colors:
# - Translation (J): black
# - Transcription (K): sandybrown
# - Replication (L): fuchsia
# - Cell division (D): olive
# - Defense (V): orchid
# - Signal transduction (T): teal
# - Cell membrane (M): purple
# - Energy production (C): red
# - Carbohydrate metabolism (G): gold
# - Amino acid metabolism (E): darkgreen
# - Nucleotide metabolism (F): pink
# - Coenzyme metabolism (H): brown
# - Lipid metabolism (I): lightsalmon
# - Inorganic ion metabolism (P): darkblue
# - Secondary metabolism (Q): sienna
# - Unknown function (S): lightgray
# - Not in COG: gray

Customizing Appearance

# Larger figure with custom font
multiModulon.view_iModulon_weights(
    species='Species1',
    component='Core_1',
    fig_size=(8, 6),
    font_path='/usr/share/fonts/truetype/liberation/LiberationSans-Regular.ttf',
    save_path='custom_weights.svg'
)

Visualizing Core iModulons Across Species

MultiModulon.view_core_iModulon_weights(component, save_path=None, fig_size=(6, 4), font_path=None, show_COG=False, reference_order=None, show_gene_names=False)

Visualize a core iModulon component across all species. Creates individual plots for each species showing the same core component, or a combined plot with subplots when COG coloring is enabled.

Parameters:
  • component (str) – Core component name (e.g., ‘Core_1’, ‘Core_2’)

  • save_path (str) – Directory in which to save the individual plots, or a file path for the combined plot when show_COG=True (optional)

  • fig_size (tuple) – Figure size for individual plots (default: (6, 4))

  • font_path (str) – Path to custom font file (optional)

  • show_COG (bool) – Color genes by COG category (default: False)

  • reference_order (list) – Custom species order for subplot arrangement (optional)

  • show_gene_names (bool) – Show gene names on the plots for all genes above threshold, using adjust_text to reduce label overlap (default: False)

Basic Usage

# Visualize core component across all species
multiModulon.view_core_iModulon_weights(
    component='Core_1',
    save_path='core_plots/'
)

# With COG coloring - creates combined plot
multiModulon.view_core_iModulon_weights(
    component='Core_1',
    show_COG=True,
    save_path='core1_all_species_COG.svg'
)

# With gene labeling - shows all genes above threshold
multiModulon.view_core_iModulon_weights(
    component='Core_1',
    show_gene_names=True,
    save_path='core1_labeled.svg'
)
# This will:
# - Label all genes above threshold in each species plot
# - Print list of shared genes to console (when available)

Custom Species Order

When using COG coloring, arrange species in a specific order:

# Define custom order (first 3 in top row, rest in bottom row)
multiModulon.view_core_iModulon_weights(
    component='Core_1',
    show_COG=True,
    reference_order=['MG1655', 'BL21', 'C', 'Crooks', 'W', 'W3110'],
    save_path='core1_ordered.svg'
)

Understanding the Output

Without COG coloring: Creates individual plots for each species
  • Each plot saved as ‘{species}_{component}_iModulon.svg’

  • Shows gene weights on genomic coordinates

  • Includes threshold lines if available

  • Gene labels shown if show_gene_names=True

With COG coloring: Creates a single combined plot
  • All species shown as subplots

  • Shared COG category legend at bottom

  • Genes colored by functional category

  • Grey dots indicate genes below threshold

  • Gene labels shown if show_gene_names=True (all genes above threshold)

  • Shared genes across all species are printed to console instead

  • Initial positioning uses golden angle spiral (same as view_iModulon_weights)

  • Stronger force parameters for crowded subplots (force_points: 1.0-1.2, force_text: 2.0-2.5)

  • More expansion around points and text (3.5-4.0 for points, 3.0-3.5 for text)

  • 2500 iterations for better convergence in small subplot spaces

  • Fallback to initial positions if adjust_text fails
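
The golden angle placement mentioned above can be pictured with a small sketch: successive labels are pushed away from their points in directions that rotate by the golden angle, so neighboring labels rarely start out on top of each other. How MultiModulon computes the offsets internally is not documented here; the function initial_label_offsets and its arguments are assumptions made for illustration, with the 2% fraction taken from the description of view_iModulon_weights.

```python
import math

# Golden angle in radians (about 137.5 degrees).
GOLDEN_ANGLE = math.pi * (3 - math.sqrt(5))


def initial_label_offsets(n_labels, x_range, y_range, fraction=0.02):
    """Return one (dx, dy) offset per label (hypothetical helper).

    Each offset has a length of `fraction` of the axis range and a
    direction that advances by the golden angle from label to label.
    """
    offsets = []
    for i in range(n_labels):
        angle = i * GOLDEN_ANGLE
        dx = fraction * x_range * math.cos(angle)
        dy = fraction * y_range * math.sin(angle)
        offsets.append((dx, dy))
    return offsets


for dx, dy in initial_label_offsets(3, x_range=4.0, y_range=2.0):
    print(f"dx={dx:+.3f}, dy={dy:+.3f}")
```

These positions would only be a starting layout; a library such as adjustText can then move labels apart further.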

Batch Processing Core Components

# Plot all core components
M = multiModulon[multiModulon.species[0]].M
core_components = [c for c in M.columns if c.startswith('Core_')]

for comp in core_components:
    # Individual species plots
    multiModulon.view_core_iModulon_weights(
        component=comp,
        save_path=f'core_plots/{comp}/'
    )

    # Combined COG plot
    multiModulon.view_core_iModulon_weights(
        component=comp,
        show_COG=True,
        save_path=f'core_plots/{comp}_COG.svg'
    )

Visualizing iModulon Activities

MultiModulon.view_iModulon_activities(species, component, save_path=None, fig_size=(12, 3), font_path=None, highlight_project=None, highlight_study=None, highlight_condition=None, show_highlight_only=False, show_highlight_only_color=None)

Create a bar plot showing component activities across samples.

Parameters:
  • species (str) – Species/strain name

  • component (str) – Component name

  • save_path (str) – Path to save the plot

  • fig_size (tuple) – Figure size (default: (12, 3))

  • font_path (str) – Path to custom font

  • highlight_project – Project(s) to highlight (str or list)

  • highlight_study (str) – Study to highlight

  • highlight_condition – Condition(s) to highlight (str or list)

  • show_highlight_only (bool) – Only show highlighted conditions

  • show_highlight_only_color (list) – Colors for highlighted conditions

Basic Usage

# Simple activity plot
multiModulon.view_iModulon_activities(
    species='Species1',
    component='Core_1',
    save_path='core1_activities.svg'
)

# Highlight specific project
multiModulon.view_iModulon_activities(
    species='Species1',
    component='Core_1',
    highlight_project='ProjectA',
    save_path='core1_highlighted.svg'
)

Condition-based Visualization

When a condition column exists in the sample sheet:

# Activities are averaged by condition
# Individual sample values shown as black dots
multiModulon.view_iModulon_activities(
    species='Species1',
    component='Core_1',
    save_path='condition_averaged.svg'
)

# Highlight specific conditions
multiModulon.view_iModulon_activities(
    species='Species1',
    component='Core_1',
    highlight_condition=['Treatment1', 'Treatment2'],
    save_path='conditions_highlighted.svg'
)

Show Only Highlighted Conditions

Focus on specific conditions:

# Show only specific conditions with custom colors
multiModulon.view_iModulon_activities(
    species='Species1',
    component='Core_1',
    highlight_condition=['Control', 'Stress', 'Recovery'],
    show_highlight_only=True,
    show_highlight_only_color=['blue', 'red', 'green'],
    save_path='focused_conditions.svg'
)

Multiple Highlighting Options

# Highlight multiple projects
multiModulon.view_iModulon_activities(
    species='Species1',
    component='Core_1',
    highlight_project=['ProjectA', 'ProjectB'],
    save_path='multi_project.svg'
)

# Highlight by study
multiModulon.view_iModulon_activities(
    species='Species1',
    component='Core_1',
    highlight_study='GSE12345',
    save_path='study_highlighted.svg'
)

Advanced Visualization

Batch Visualization

Create plots for multiple components:

# Plot all core components
for species in multiModulon.species:
    M = multiModulon[species].M
    core_comps = [c for c in M.columns if c.startswith('Core_')]

    for comp in core_comps:
        # Gene weights
        multiModulon.view_iModulon_weights(
            species=species,
            component=comp,
            show_COG=True,
            save_path=f'weights/{species}_{comp}_weights.svg'
        )

        # Activities
        multiModulon.view_iModulon_activities(
            species=species,
            component=comp,
            save_path=f'activities/{species}_{comp}_activities.svg'
        )

Export Options

File Formats

Save plots in different formats:

# Vector format (scalable)
multiModulon.view_iModulon_weights(
    species='Species1',
    component='Core_1',
    save_path='weights.svg'  # SVG format
)

# High-resolution raster
multiModulon.view_iModulon_weights(
    species='Species1',
    component='Core_1',
    save_path='weights.png'  # PNG format at 300 DPI
)

# PDF for publications
multiModulon.view_iModulon_weights(
    species='Species1',
    component='Core_1',
    save_path='weights.pdf'
)

Directory Organization

Organize outputs systematically:

import os

# Create directory structure
base_dir = 'imodulon_plots'
for subdir in ['weights', 'activities', 'weights_COG']:
    os.makedirs(f'{base_dir}/{subdir}', exist_ok=True)

# Save with organized naming
for species in multiModulon.species:
    for comp in ['Core_1', 'Core_2', 'Unique_1']:
        # Weights without COG
        multiModulon.view_iModulon_weights(
            species=species,
            component=comp,
            save_path=f'{base_dir}/weights/{species}_{comp}.svg'
        )

        # Weights with COG
        multiModulon.view_iModulon_weights(
            species=species,
            component=comp,
            show_COG=True,
            save_path=f'{base_dir}/weights_COG/{species}_{comp}.svg'
        )

        # Activities
        multiModulon.view_iModulon_activities(
            species=species,
            component=comp,
            save_path=f'{base_dir}/activities/{species}_{comp}.svg'
        )

Comparing Core iModulon Activities Across Species

MultiModulon.compare_core_iModulon_activity(component, species_in_comparison, condition_list, save_path=None, fig_size=(12, 3), font_path=None, legend_title=None, title=None)

Compare core iModulon activities across multiple species for specific conditions. Creates a grouped bar plot with conditions on x-axis and species shown as different colored bars.

Parameters:
  • component (str) – Core component name (e.g., ‘Core_1’, ‘Core_2’)

  • species_in_comparison (list) – List of species names to compare

  • condition_list (list) – List of conditions in format “condition:project”

  • save_path (str) – Path to save the plot (optional)

  • fig_size (tuple) – Figure size (default: (12, 3))

  • font_path (str) – Path to custom font file (optional)

  • legend_title (str) – Custom title for the legend (default: ‘Species’)

  • title (str) – Custom title for the plot (default: ‘Core iModulon {component} Activity Comparison’)

Basic Usage

# Compare Core_1 activities across species for specific conditions
multiModulon.compare_core_iModulon_activity(
    component='Core_1',
    species_in_comparison=['E_coli', 'S_enterica', 'K_pneumoniae'],
    condition_list=['glucose:project1', 'lactose:project1', 'arabinose:project2']
)

Condition Format

Conditions must be specified as “condition:project” pairs:

# Comparing growth conditions from different projects
multiModulon.compare_core_iModulon_activity(
    component='Core_1',
    species_in_comparison=['Species1', 'Species2', 'Species3'],
    condition_list=[
        'exponential:growth_study',    # Exponential phase from growth_study
        'stationary:growth_study',     # Stationary phase from growth_study
        'heat_shock:stress_project',   # Heat shock from stress_project
        'cold_shock:stress_project'    # Cold shock from stress_project
    ],
    save_path='core1_condition_comparison.svg'
)
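
To check condition strings before plotting, a small parser for the "condition:project" format can help. This is a sketch under the assumption that the first colon separates the condition from the project; parse_condition is a hypothetical helper, not a MultiModulon function.

```python
def parse_condition(spec):
    """Split 'condition:project' into a (condition, project) tuple."""
    condition, sep, project = spec.partition(":")
    if not sep or not condition or not project:
        raise ValueError(f"expected 'condition:project', got {spec!r}")
    return condition, project


print(parse_condition("heat_shock:stress_project"))  # ('heat_shock', 'stress_project')
```

Malformed entries such as 'heat_shock' (no project) raise a ValueError early, before any plotting call is made.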

Understanding the Plot

  • X-axis: Conditions (grouped by the order in condition_list)

  • Y-axis: iModulon activity values

  • Bars: Different colors for each species

  • Dots: Individual sample values (black dots on bars)

  • Legend: Species names with corresponding colors

Error Handling

The function validates that all conditions exist in all species:

# This will raise an error if any species lacks a condition
try:
    multiModulon.compare_core_iModulon_activity(
        component='Core_1',
        species_in_comparison=['Species1', 'Species2'],
        condition_list=['rare_condition:project1']
    )
except ValueError as e:
    print(f"Error: {e}")
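
When many species or conditions are involved, it can be convenient to find out which combinations are missing before calling the function. The sketch below assumes you have already collected, for each species, the set of available "condition:project" strings (for example from its sample sheet). The helper missing_conditions and the shape of its input are illustrative assumptions.

```python
def missing_conditions(available, species_in_comparison, condition_list):
    """Return {species: [conditions not available for that species]}.

    available: dict mapping species name -> set of 'condition:project' strings.
    """
    missing = {}
    for species in species_in_comparison:
        present = available.get(species, set())
        absent = [c for c in condition_list if c not in present]
        if absent:
            missing[species] = absent
    return missing


available = {
    "Species1": {"control:exp1", "treatment:exp1", "rare_condition:project1"},
    "Species2": {"control:exp1", "treatment:exp1"},
}
print(missing_conditions(available, ["Species1", "Species2"],
                         ["control:exp1", "rare_condition:project1"]))
# {'Species2': ['rare_condition:project1']}
```

An empty result means every requested condition is present in every species being compared.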

Customizing Appearance

# Larger figure with custom font
multiModulon.compare_core_iModulon_activity(
    component='Core_1',
    species_in_comparison=['Species1', 'Species2', 'Species3'],
    condition_list=['control:exp1', 'treatment:exp1'],
    fig_size=(15, 5),  # Wider figure
    font_path='/path/to/font.ttf',
    save_path='comparison_custom.svg'
)

# Custom title and legend
multiModulon.compare_core_iModulon_activity(
    component='Core_1',
    species_in_comparison=['E_coli_K12', 'E_coli_B', 'E_coli_C'],
    condition_list=['glucose:carbon_study', 'lactose:carbon_study'],
    title='Carbon Source Response in E. coli Strains',
    legend_title='E. coli Strain',
    save_path='ecoli_carbon_response.svg'
)

Use Cases

  1. Stress Response Comparison: Compare how different species respond to the same stresses

  2. Metabolic Adaptation: Analyze metabolic shifts across species under different carbon sources

  3. Evolutionary Analysis: Study conservation of regulatory responses

# Example: Comparing stress responses
stress_conditions = [
    'control:stress_study',
    'heat_42C:stress_study',
    'oxidative_H2O2:stress_study',
    'acid_pH5:stress_study'
]

multiModulon.compare_core_iModulon_activity(
    component='Core_1',  # Assuming Core_1 is stress-related
    species_in_comparison=['E_coli', 'S_enterica', 'K_pneumoniae'],
    condition_list=stress_conditions,
    save_path='stress_response_comparison.svg'
)

Conservation Bubble Matrix

MultiModulon.plot_iM_conservation_bubble_matrix(n_components, reference_order=None, iM_colors=None, fig_size=(10, 4), bubble_scale=800.0, y_label='Species/Strains', save_path=None, font_path=None)

Plot a bubble matrix summarizing iModulon conservation across species.

Parameters:
  • n_components (int) – Number of leading components (per species) on the x-axis

  • reference_order (list) – Optional species order for the y-axis

  • iM_colors (list) – Optional list of colors for iModulon columns

  • fig_size (tuple) – Figure size as (width, height) (default: (10, 4))

  • bubble_scale (float) – Scaling factor for bubble sizes (default: 800.0)

  • y_label (str) – Label for the y-axis (default: “Species/Strains”)

  • save_path (str) – Path to save the plot (optional)

  • font_path (str) – Path to custom font file (optional)

Basic Usage

# Summarize conservation for the top 8 components per species
multiModulon.plot_iM_conservation_bubble_matrix(
    n_components=8,
    reference_order=['Species1', 'Species2', 'Species3'],
    save_path='conservation_bubble_matrix.svg'
)

Visualizing Activity Changes Between Conditions

MultiModulon.show_iModulon_activity_change(species, condition_1, condition_2, save_path=None, fig_size=(5, 5), font_path=None, threshold=1.5)

Visualize iModulon activity changes between two conditions as a scatter plot.

Creates a scatter plot with condition_1 activities on x-axis and condition_2 on y-axis. Components with significant changes are highlighted in light blue and labeled. Activities are calculated by averaging all biological replicates for each condition.

Parameters:
  • species (str) – Species/strain name

  • condition_1 (str) – First condition in format “condition_name:project_name” (x-axis)

  • condition_2 (str) – Second condition in format “condition_name:project_name” (y-axis)

  • save_path (str) – Path to save the plot (optional)

  • fig_size (tuple) – Figure size (default: (5, 5))

  • font_path (str) – Path to custom font file (optional)

  • threshold (float) – Threshold for significant change (default: 1.5). Scaled based on activity range

Basic Usage

# Compare activities between two conditions
multiModulon.show_iModulon_activity_change(
    species='E_coli',
    condition_1='glucose:carbon_source_study',
    condition_2='lactose:carbon_source_study',
    save_path='glucose_vs_lactose_changes.svg'
)

# Compare conditions from different projects
multiModulon.show_iModulon_activity_change(
    species='E_coli',
    condition_1='control:experiment_1',
    condition_2='stress:experiment_2',
    save_path='cross_project_comparison.svg'
)

Understanding the Plot

  • Grey dots: Components with minimal change between conditions

  • Light blue dots: Components with significant change (absolute difference > scaled threshold)

  • Labels: Component names shown for significant changes

      - Smart initial positioning that checks distance to ALL points before placing labels

      - Minimum safe distance of 10% of axis range from any point

      - White background boxes with light gray borders for readability

      - Simple gray lines connect labels to their corresponding points

      - No automatic repositioning to prevent labels from moving onto dots

      - 10% axis margins added to ensure labels are fully visible

      - Saved with 0.05 inch padding to prevent label cutoff

  • Dotted lines: Three reference lines at y=x (diagonal), x=0 (vertical), and y=0 (horizontal)

Note: The threshold is automatically scaled based on the range of activities to handle negative ICA values appropriately.
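
The underlying comparison (average the replicates of each condition, then flag components whose mean activities differ by more than a cutoff) can be sketched as below. The cutoff argument stands for the already-scaled threshold; how MultiModulon derives it from the activity range is not reproduced here, and changed_components is a hypothetical helper.

```python
from statistics import mean


def changed_components(replicates_1, replicates_2, cutoff):
    """Return {component: (mean_1, mean_2)} for components that changed.

    replicates_1, replicates_2: dict mapping component name -> list of
    replicate activities for condition 1 and condition 2.
    cutoff: absolute difference in mean activity required to flag a component.
    """
    changed = {}
    for comp in replicates_1:
        a1 = mean(replicates_1[comp])
        a2 = mean(replicates_2[comp])
        if abs(a2 - a1) > cutoff:
            changed[comp] = (a1, a2)
    return changed


cond_1 = {"Core_1": [1.0, 3.0], "Core_2": [0.5, 1.5]}
cond_2 = {"Core_1": [8.0, 10.0], "Core_2": [1.0, 2.0]}
print(changed_components(cond_1, cond_2, cutoff=5.0))  # {'Core_1': (2.0, 9.0)}
```

Because the comparison uses an absolute difference rather than a ratio, it behaves sensibly when activities are negative.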

Customizing the Threshold

# Use stricter threshold for significance
multiModulon.show_iModulon_activity_change(
    species='E_coli',
    condition_1='control:stress_study',
    condition_2='heat_shock:stress_study',
    threshold=2.0,  # Stricter cutoff: fewer components are highlighted
    save_path='stress_response_strict.svg'
)

# Use more lenient threshold
multiModulon.show_iModulon_activity_change(
    species='E_coli',
    condition_1='early_log:growth_curve',
    condition_2='late_log:growth_curve',
    threshold=1.3,  # More lenient cutoff: more components are highlighted
    save_path='growth_phase_changes.svg'
)

Use Cases

  1. Metabolic Shifts: Identify iModulons responding to carbon source changes

  2. Stress Response: Find iModulons activated under stress conditions

  3. Growth Phase: Compare exponential vs stationary phase activities

  4. Treatment Effects: Analyze drug or environmental perturbations

# Example: Analyzing antibiotic response
multiModulon.show_iModulon_activity_change(
    species='E_coli',
    condition_1='untreated:antibiotic_study',
    condition_2='ampicillin:antibiotic_study',
    threshold=1.5,
    save_path='ampicillin_response.svg'
)

# Example: Growth phase comparison
multiModulon.show_iModulon_activity_change(
    species='S_enterica',
    condition_1='exponential:growth_phases',
    condition_2='stationary:growth_phases',
    font_path='/path/to/Arial.ttf',
    save_path='growth_phase_comparison.pdf'
)

Core iModulon Stability Analysis

MultiModulon.core_iModulon_stability(component, save_path=None, fig_size=(6, 4), font_path=None, show_stats=True)

Quantify core iModulon stability across species using pairwise correlations.

This function calculates how similar a core iModulon is across different species by computing the mean pairwise correlation of M matrix weights. It uses adaptive gap detection to identify distinct groups and outliers in small datasets (3-6 species).

Parameters:
  • component (str) – Component name (e.g., ‘Core_1’, ‘Iron’)

  • save_path (str) – Path to save the plot (optional)

  • fig_size (tuple) – Figure size as (width, height) (default: (6, 5))

  • font_path (str) – Path to custom font file (optional)

  • show_stats (bool) – Whether to show statistics on the plot (default: True)

Returns:

Tuple of (stable_species, stable_min, stable_max, stability_scores):

  • stable_species (list) – List of species names classified as stable (non-outliers)

  • stable_min (float) – Lower boundary for stable range (outlier detection threshold)

  • stable_max (float) – Upper boundary for stable range (always 1.0)

  • stability_scores (dict) – Dictionary mapping species names to stability scores

Basic Usage

# Simple usage - robust outlier detection optimized for 3-6 species
stable, min_bound, max_bound, scores = multiModulon.core_iModulon_stability('Core_1')
print(f"Stable species: {stable}")
print(f"Outlier threshold: {min_bound:.3f}")

# With custom font for publications
stable, min_bound, max_bound, scores = multiModulon.core_iModulon_stability(
    'Iron',
    font_path='/usr/share/fonts/truetype/msttcorefonts/Arial.ttf',
    save_path='iron_stability.svg'
)

# Analyze individual species stability
for species, score in scores.items():
    status = "stable" if species in stable else "outlier (problematic)"
    print(f"{species}: {score:.3f} ({status})")

# Check if any species are problematic
if len(stable) < len(scores):
    outliers = [s for s in scores.keys() if s not in stable]
    print(f"⚠️  Potential issues with: {', '.join(outliers)}")
else:
    print("✅ All species show consistent regulatory patterns")

Understanding the Stability Metric

The stability score for each species is calculated as the mean pairwise Pearson correlation of its M matrix weights with all other species for the specified core component:

  • Score = 1.0: Perfect correlation with all other species (highly stable)

  • Score > 0.7: Good stability, similar regulatory pattern across species

  • Score < 0.5: Low stability, divergent regulatory pattern
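
As a rough illustration of the metric, the sketch below computes the mean pairwise Pearson correlation for each species from plain lists of weights. It assumes the weight vectors are already aligned on the same shared genes, which the library handles internally; pearson and stability_scores are hypothetical names and the data are made up.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def stability_scores(weights_by_species):
    """Mean correlation of each species' weights with every other species."""
    scores = {}
    for species, w in weights_by_species.items():
        others = [pearson(w, v)
                  for other, v in weights_by_species.items() if other != species]
        scores[species] = sum(others) / len(others)
    return scores


weights = {
    "SpeciesA": [0.30, 0.25, 0.02, -0.01],
    "SpeciesB": [0.28, 0.27, 0.01, 0.00],
    "SpeciesC": [0.01, 0.02, 0.30, 0.25],  # different genes carry the weight
}
for species, score in stability_scores(weights).items():
    print(f"{species}: {score:.3f}")
```

In this toy data SpeciesC correlates negatively with the other two, so it receives the lowest score.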

Adaptive Gap Detection

The function uses adaptive gap detection, which is designed to handle both group separation and outlier detection in small datasets (3-6 species):

Multi-Level Detection Strategy
  1. Similar scores (range < 0.05): All species marked stable

  2. Significant gaps: Detects natural breaks between groups of species

  3. Outlier detection: Uses IQR method for scattered individual outliers

  4. Edge cases: Special handling for 3-species datasets

Gap Detection Logic
  • Large gap threshold: Gap must be ≥15% of total range (or 60% for small ranges)

  • Split groups: Places threshold at midpoint of largest significant gap

  • Fallback to IQR: Uses interquartile range outlier detection if no clear gaps

Why This Approach Works Better
  • Detects group patterns: Identifies when species form distinct clusters

  • Handles uniform distributions: Correctly identifies when all species are similar

  • Sensitive to structure: Finds meaningful biological separations

  • Robust to sample size: Works for 3-6 species with appropriate thresholds

Example Score Distributions:
  • Core_5 scores: [0.63, 0.65, 0.67, 0.67, 0.68] → Range=0.05 → All stable

  • Core_6 scores: [0.38, 0.38, 0.40, 0.52, 0.52, 0.53] → Gap=0.12 → Two groups detected

This adaptive method correctly identifies both scenarios: truly stable components and those with distinct species groups.
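
The multi-level strategy can be approximated by the sketch below. It is a simplified reading of the rules listed above, not the library's code. The names split_stable and small_range are assumptions, the 0.10 boundary for what counts as a "small range" is a guess, the IQR fallback uses the conventional 1.5 × IQR lower fence, and the special handling for 3-species datasets is omitted.

```python
from statistics import quantiles


def split_stable(scores, similar_range=0.05, gap_fraction=0.15,
                 small_range=0.10, small_range_gap_fraction=0.60):
    """Return (stable_species, lower_bound) for a {species: score} dict."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1])
    values = [v for _, v in ordered]
    score_range = values[-1] - values[0]

    # Level 1: nearly identical scores, so every species is stable.
    if score_range < similar_range:
        return [name for name, _ in ordered], values[0]

    # Level 2: find the largest gap between neighboring sorted scores.
    gaps = [(values[i + 1] - values[i], i) for i in range(len(values) - 1)]
    largest_gap, idx = max(gaps)
    required = small_range_gap_fraction if score_range < small_range else gap_fraction
    if largest_gap >= required * score_range:
        # Split the groups at the midpoint of the gap.
        threshold = (values[idx] + values[idx + 1]) / 2
    else:
        # Level 3: no clear gap, fall back to an IQR lower fence.
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        threshold = q1 - 1.5 * (q3 - q1)

    stable = [name for name, v in ordered if v >= threshold]
    return stable, threshold


core_6 = {"s1": 0.38, "s2": 0.38, "s3": 0.40, "s4": 0.52, "s5": 0.52, "s6": 0.53}
stable, bound = split_stable(core_6)
print(sorted(stable), round(bound, 2))  # ['s4', 's5', 's6'] 0.46
```

With the Core_6 scores from the example above, the 0.12 gap between 0.40 and 0.52 is detected and the boundary lands at its midpoint.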

Understanding the Plot

  • X-axis: Species names

  • Y-axis: Stability scores (mean pairwise correlation)

  • Bar colors:

      - Blue (#C1C6E8): Stable species

      - Peach (#F0DDD2): Unstable species

  • Gray dashed lines: Adaptive detection boundaries (gap detection or outlier threshold)

  • Light blue shading: Stable correlation range (adapts to data structure)

  • Clean legend: Simple “Stable” and “Unstable” labels

Use Cases

  1. Quality Control: Identify species with poorly defined or inconsistent iModulons

  2. Data Validation: Detect potential ICA decomposition issues or data quality problems

  3. Species Selection: Choose the most reliable species for downstream analysis

  4. Comparative Analysis: Understand which species deviate from conserved regulatory patterns

  5. Method Validation: Assess whether core components are truly “core” across species

# Example: Analyzing all core components
M = multiModulon[multiModulon.species[0]].M
core_components = [c for c in M.columns if c.startswith('Core_')]

stability_results = {}
for comp in core_components:
    stable, min_bound, max_bound, scores = multiModulon.core_iModulon_stability(
        comp,
        save_path=f'stability/{comp}_stability.svg'
    )
    stability_results[comp] = {
        'stable_species': stable,
        'range': (min_bound, max_bound),
        'all_scores': scores,
        'range_width': max_bound - min_bound
    }

# Identify components with outlier species
# (use each component's own scores, not the variable left over from the loop)
problematic_components = [comp for comp, res in stability_results.items()
                          if len(res['stable_species']) < len(res['all_scores'])]
print(f"Components with outlier species: {problematic_components}")

# Find the most problematic species across all components
from collections import Counter

all_outliers = []
for res in stability_results.values():
    outliers = [s for s in res['all_scores'] if s not in res['stable_species']]
    all_outliers.extend(outliers)

outlier_counts = Counter(all_outliers)
if outlier_counts:
    print("Species flagged as outliers (count across components):")
    for species, count in outlier_counts.most_common():
        print(f"  {species}: {count} components")

Advanced Options

# Disable statistics display for cleaner plots
stable, min_bound, max_bound, scores = multiModulon.core_iModulon_stability(
    'Core_1',
    show_stats=False,
    save_path='clean_stability.svg'
)

# Custom figure size and font
stable, min_bound, max_bound, scores = multiModulon.core_iModulon_stability(
    'Core_1',
    fig_size=(7, 5),
    font_path='/usr/share/fonts/truetype/msttcorefonts/Arial.ttf',
    save_path='custom_stability.pdf'
)

# Batch analysis of multiple components
import numpy as np  # needed for np.median below

M = multiModulon[multiModulon.species[0]].M
components = [c for c in M.columns if c.startswith('Core_')]

stability_summary = {}
for comp in components:
    stable, min_bound, max_bound, scores = multiModulon.core_iModulon_stability(
        comp,
        save_path=f'stability/{comp}_stability.svg'
    )
    stability_summary[comp] = {
        'n_stable': len(stable),
        'n_total': len(scores),
        'outlier_threshold': min_bound,
        'median_score': np.median(list(scores.values())),
        'outlier_species': [s for s in scores.keys() if s not in stable]
    }

print("Component stability summary:")
for comp, stats in stability_summary.items():
    outlier_info = f", outliers: {stats['outlier_species']}" if stats['outlier_species'] else ""
    print(f"{comp}: {stats['n_stable']}/{stats['n_total']} stable "
          f"(median={stats['median_score']:.3f}){outlier_info}")

# Overall dataset quality assessment
total_stable = sum(stats['n_stable'] for stats in stability_summary.values())
total_possible = sum(stats['n_total'] for stats in stability_summary.values())
print(f"\nOverall stability: {total_stable}/{total_possible} "
      f"({100*total_stable/total_possible:.1f}%) species-component pairs are stable")

Gene-iModulon Correlation Analysis

MultiModulon.show_gene_iModulon_correlation(gene, component, save_path=None, fig_size=(5, 4), font_path=None)

Show correlation between gene expression and iModulon activity across species.

Creates scatter plots showing the correlation between gene expression (from log_tpm) and component activity (from A matrix) for each species where the gene is present.

Parameters:
  • gene (str) – Gene name (any value from combined_gene_db)

  • component (str) – Component name (e.g., ‘Core_1’, ‘Unique_1’)

  • save_path (str) – Path to save the figure (optional). Can be:

      - Full file path with extension (e.g., ‘output/correlation.svg’)

      - Directory path (will save as ‘{gene}_{component}_correlation.svg’)

  • fig_size (tuple) – Figure size for each subplot (default: (5, 4))

  • font_path (str) – Path to custom font file (optional)

Basic Usage

# Show correlation for a specific gene and core iModulon
multiModulon.show_gene_iModulon_correlation(
    gene='argA',
    component='Core_1',
    save_path='argA_Core1_correlation.svg'
)

# With custom appearance
multiModulon.show_gene_iModulon_correlation(
    gene='trpE',
    component='Core_3',
    fig_size=(6, 5),
    font_path='/path/to/Arial.ttf',
    save_path='output_dir/'
)

Features

  • Multi-species visualization: Shows correlation for all species containing the gene

  • Correlation coefficient: Displays Pearson’s r in the top left of each subplot

  • Fitted line: Shows linear relationship between expression and activity

  • Automatic layout: Maximum 3 columns per row for multiple species

  • Species-specific gene names: Uses appropriate gene identifiers for each species
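
The fitted line in each subplot is an ordinary straight-line fit of activity against expression. As a minimal sketch of that idea, the helper below computes a least-squares slope and intercept from two lists. Here fit_line is a hypothetical name, the numbers are invented, and MultiModulon's own fitting routine may differ.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y ~ slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


expression = [0.0, 1.0, 2.0, 3.0]  # e.g. log-TPM values of one gene
activity = [1.0, 3.0, 5.0, 7.0]    # e.g. iModulon activity in the same samples
print(fit_line(expression, activity))  # (2.0, 1.0)
```

A steep slope together with a Pearson r near 1 or -1 suggests the gene closely tracks the iModulon.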

Use Cases

  1. Validate iModulon members: Confirm genes are truly regulated by the iModulon

  2. Cross-species comparison: See if gene-iModulon relationships are conserved

  3. Identify outliers: Find conditions where typical correlations break down

  4. Regulatory strength: Assess how tightly a gene follows iModulon activity

# Example: Analyzing amino acid biosynthesis regulation
multiModulon.show_gene_iModulon_correlation(
    gene='hisG',  # Histidine biosynthesis
    component='Core_5',  # Amino acid biosynthesis iModulon
    save_path='histidine_regulation.pdf'
)

# Example: Stress response gene analysis
multiModulon.show_gene_iModulon_correlation(
    gene='dnaK',  # Heat shock protein
    component='Core_8',  # Stress response iModulon
    fig_size=(5, 4),
    save_path='stress_response_correlation.svg'
)

Best Practices

  1. Use descriptive filenames - Include species and component names

  2. Consistent figure sizes - Use same dimensions for comparable plots

  3. Save vector formats - Use SVG for publication figures

  4. Document parameters - Note thresholds and highlighting used

Next Steps

  1. examples/visualization_gallery - More visualization examples

  2. Biological interpretation - Analyze visualized patterns

  3. Export for further analysis - Use data in other tools