Simulators

The simulator module provides classes for generating population genetic simulations under various demographic models. These simulators produce tree sequences that serve as the foundation for the simulation-based inference workflow.

Overview

Simulators in popgen-npe are responsible for:

  • Generating tree sequences under specified demographic models

  • Sampling parameters from prior distributions

  • Simulating both ancestry and mutations

  • Ensuring consistent random seeding for reproducibility

All simulators inherit from BaseSimulator and follow a consistent interface, making them interchangeable within the workflow. Unless otherwise noted, each parameter listed below is sampled uniformly over the given bounds and stored exactly as drawn in the targets array. Models that work in log10 space (e.g., the variable population-size simulators) return log10-transformed population sizes so downstream analyses must exponentiate them when real sizes are required.

Base Simulator

class workflow.scripts.ts_simulators.BaseSimulator(config: dict, default: dict)[source]

Bases: object

Base class for all simulators. Handles configuration parsing and default parameter assignment.

Parameters:

  • config (dict) – Configuration dictionary with simulator parameters

  • default (dict) – Default parameter values

Available Simulators

YRI_CEU

class workflow.scripts.ts_simulators.YRI_CEU(config: dict)[source]

Bases: BaseSimulator

Simulate a model defined in dadi’s manual (https://dadi.readthedocs.io/en/latest/examples/YRI_CEU/YRI_CEU/). Ancestral population changes in size; splits into YRI/CEU; CEU population undergoes a bottleneck upon splitting and subsequently grows exponentially. There is continuous symmetric migration between YRI and CEU.

Simulates the demographic model from the dadi manual for YRI (Yoruba) and CEU (European) populations.

Model Description:

  1. Ancestral population of size N_A

  2. Split into YRI and CEU populations at time Tp + T

  3. CEU undergoes bottleneck (N_CEU_initial) and exponential growth to N_CEU_final

  4. YRI maintains constant size N_YRI

  5. Continuous symmetric migration M between populations

Fixed Parameters:

  • samples: {“YRI”: 10, “CEU”: 10}

  • sequence_length: 10e6

  • recombination_rate: 1.5e-8

  • mutation_rate: 1.5e-8

Inferred Parameters (uniform priors):

  • N_A: [1e2, 1e5] - Ancestral population size

  • N_YRI: [1e2, 1e5] - YRI population size

  • N_CEU_initial: [1e2, 1e5] - CEU bottleneck size

  • N_CEU_final: [1e2, 1e5] - CEU final size

  • M: [0, 5e-4] - Migration rate

  • Tp: [0, 6e4] - Time before split

  • T: [0, 6e4] - Time of split

Example Configuration:

simulator:
  class_name: YRI_CEU
  sequence_length: 5e6
  samples:
    YRI: 20
    CEU: 20
default_config = {'M': [0, 0.0005], 'N_A': [100.0, 100000.0], 'N_CEU_final': [100.0, 100000.0], 'N_CEU_initial': [100.0, 100000.0], 'N_YRI': [100.0, 100000.0], 'T': [0, 60000.0], 'Tp': [0, 60000.0], 'mutation_rate': 1.5e-08, 'recombination_rate': 1.5e-08, 'samples': {'CEU': 10, 'YRI': 10}, 'sequence_length': 10000000.0}

DroMel_CO_FR

class workflow.scripts.ts_simulators.DroMel_CO_FR(config: dict)[source]

Bases: BaseSimulator

Simulate a two-population isolation-with-migration model for Drosophila melanogaster, Congolese and French populations, with growth in the French population

Simulates a two-population isolation-with-migration model for Drosophila melanogaster, with Cameroonian (CO) and French (FR) populations, including exponential growth in the French population.

Model Description:

  1. Ancestral population of size N_ANC

  2. Split into CO and FR populations at time T

  3. FR population undergoes exponential growth from N_FR0 to N_FR1

  4. CO population maintains constant size N_CO

  5. Asymmetric migration between populations (m_CO_FR, m_FR_CO)

Fixed Parameters:

  • samples: {“CO”: 10, “FR”: 10}

  • sequence_length: 1e6

  • recombination_rate: [0.5e-8, 3.0e-8] (sampled uniformly)

  • mutation_rate: 5.49e-9

  • haploid: True

Inferred Parameters (uniform priors in log10 space):

  • N_ANC: [3.5, 6.5] - log10 ancestral population size

  • N_CO: [3.5, 6.5] - log10 Cameroonian population size

  • N_FR0: [3.5, 6.5] - log10 French population size after split

  • N_FR1: [3.5, 6.5] - log10 French population size after growth

  • T: [3, 6] - log10 split time

  • m_CO_FR: [-8, -3] - log10 migration rate from Cameroonian to French

  • m_FR_CO: [-8, -3] - log10 migration rate from French to Cameroonian

Example Configuration:

simulator:
  class_name: DroMel_CO_FR
  sequence_length: 2e6
  samples:
    CO: 20
    FR: 20
default_config = {'N_ANC': [3.5, 6.5], 'N_CO': [3.5, 6.5], 'N_FR0': [3.5, 6.5], 'N_FR1': [3.5, 6.5], 'T': [3, 6], 'haploid': True, 'm_CO_FR': [-8, -3], 'm_FR_CO': [-8, -3], 'mutation_rate': 5.49e-09, 'recombination_rate': [5e-09, 3e-08], 'samples': {'CO': 10, 'FR': 10}, 'sequence_length': 1000000.0}

AraTha_2epoch

class workflow.scripts.ts_simulators.AraTha_2epoch(config: dict)[source]

Bases: BaseSimulator

Simulate the African2Epoch_1H18 model from stdpopsim for Arabidopsis thaliana. The model consists of a single population that undergoes a size change.

Simulates the African2Epoch_1H18 model from stdpopsim for Arabidopsis thaliana.

Model Description:

A single population model with a size change event:

  1. Ancestral population of size N_A

  2. Size change to nu * N_A at time T * 2 * N_A generations ago

Fixed Parameters:

  • samples: {“SouthMiddleAtlas”: 10}

  • sequence_length: 10e6

  • recombination_rate: 1.5e-8

  • mutation_rate: 1.5e-8

Inferred Parameters (uniform priors):

  • nu: [0.01, 1] - Ratio of current to ancestral population size

  • T: [0.01, 1.5] - Time of size change (scaled by 2*N_A)

Example Configuration:

simulator:
  class_name: AraTha_2epoch
  sequence_length: 1e7
  samples:
    SouthMiddleAtlas: 15
default_config = {'T': [0.01, 1.5], 'nu': [0.01, 1], 'samples': {'SouthMiddleAtlas': 10}, 'sequence_length': 10000000.0}
model = 'African2Epoch_1H18'
species = 'AraTha'

VariablePopulationSize

class workflow.scripts.ts_simulators.VariablePopulationSize(config: dict)[source]

Bases: BaseSimulator

Simulate a model with varying population size across multiple time windows. The model consists of a single population that undergoes multiple size changes.

Simulates a single population with multiple size changes across time windows.

Model Description:

  1. Single population with size changes at fixed time intervals

  2. Time points are logarithmically spaced using np.logspace(2, log10(max_time), num_time_windows) with the first time point set to 0

  3. Population sizes are sampled independently for each time window (in log10 space)

  4. Ensures minimum number of SNPs (400) after MAF filtering

Fixed Parameters:

  • samples: {“pop”: 10}

  • sequence_length: 10e6

  • mutation_rate: 1.5e-8

  • num_time_windows: 3

  • max_time: 100000

  • time_rate: 0.1

Inferred Parameters (uniform priors):

  • N_0, N_1, ..., N_{n-1}: log10 population sizes sampled between log10(pop_sizes[0]) and log10(pop_sizes[1]) (defaults: 2–5, corresponding to 1e2–1e5)

  • recomb_rate: [1e-9, 2e-8] - Recombination rate

The parameter vector returned by this simulator stores log10 population sizes; exponentiate to obtain sizes in diploid individuals.

Example Configuration:

simulator:
  class_name: VariablePopulationSize
  num_time_windows: 5
  pop_sizes: [1e3, 1e6]
  max_time: 50000
default_config = {'max_time': 100000, 'mutation_rate': 1.5e-08, 'num_time_windows': 3, 'pop_sizes': [100.0, 100000.0], 'recomb_rate': [1e-09, 2e-08], 'samples': {'pop': 10}, 'sequence_length': 10000000.0, 'time_rate': 0.1}

DependentVariablePopulationSize

class workflow.scripts.ts_simulators.DependentVariablePopulationSize(config: dict)[source]

Bases: BaseSimulator

Simulate a population with variable population size across multiple time windows, with each population size dependent on the previous one. The model consists of a single population that undergoes multiple size changes.

Simulates a single population with correlated size changes across time windows. Unlike VariablePopulationSize, each population size depends on the previous one, creating a correlated random walk in log-space.

Model Description:

  1. Single population with 21 time windows (by default)

  2. Time points are exponentially spaced using: ((exp(log(1 + rate*max_time) * i/(n-1)) - 1) / rate)

  3. First population size N_0 is sampled uniformly in log10 space

  4. Subsequent sizes follow: N_i = N_{i-1} * 10^beta, where beta is sampled from [-1, 1]

  5. Population sizes are clamped to stay within the specified bounds

  6. Ensures minimum number of SNPs (400) after MAF filtering

Fixed Parameters:

  • samples: {“pop0”: 25}

  • sequence_length: 2e6

  • mutation_rate: 1e-8

  • num_time_windows: 21

  • maf: 0.05

  • max_time: 130000

  • time_rate: 0.1

Inferred Parameters:

  • N_0, N_1, ..., N_{n-1}: log10 population sizes (defaults span log10(1e2)–log10(1e5)); correlations are induced via pop_changes increments sampled uniformly in log10 space

  • recomb_rate: [1e-9, 1e-8] - Recombination rate

The simulator returns log10-transformed population sizes; exponentiate them (base 10) to recover effective population sizes in standard units.

Example Configuration:

simulator:
  class_name: DependentVariablePopulationSize
  num_time_windows: 21
  pop_sizes: [1e2, 1e5]
  pop_changes: [-0.5, 0.5]
  max_time: 100000
default_config = {'maf': 0.05, 'max_time': 130000, 'mutation_rate': 1e-08, 'num_time_windows': 21, 'pop_changes': [-1, 1], 'pop_sizes': [100.0, 100000.0], 'recomb_rate': [1e-09, 1e-08], 'samples': {'pop0': 25}, 'sequence_length': 2000000.0, 'time_rate': 0.1}

recombination_rate

class workflow.scripts.ts_simulators.recombination_rate(config: dict)[source]

Bases: BaseSimulator

Simulate a one population model where recombination rate varies among replicates. The prior is a beta distribution shifted/scaled to a given interval (by default, a noninformative beta).

Simple constant-size population model with variable recombination rate.

Model Description:

Single population of constant size with recombination rate as the only inferred parameter.

Fixed Parameters:

  • samples: {0: 10}

  • sequence_length: 1e6

  • mutation_rate: 1.5e-8

  • pop_size: 1e4

Inferred Parameters (uniform priors):

  • recombination_rate: [0, 1e-8]

Example Configuration:

simulator:
  class_name: recombination_rate
  pop_size: 5e4
  sequence_length: 5e6
default_config = {'mean_and_dispersion': [0.5, 0.5], 'mutation_rate': 1.5e-08, 'pop_size': 10000.0, 'recombination_rate': [0, 1e-08], 'samples': {0: 10}, 'sequence_length': 1000000.0}

Usage in Configuration Files

Simulators are specified in the workflow configuration YAML files:

simulator:
  class_name: YRI_CEU  # Name of the simulator class
  sequence_length: 5e6
  samples:
    YRI: 20
    CEU: 20

The class_name must match one of the available simulator classes. All other keys under simulator must correspond to attributes listed in the simulator’s default_config; values provided here override those defaults. Supplying unused keys will raise an error during configuration parsing.

Creating Custom Simulators

To create a custom simulator:

  1. Inherit from BaseSimulator

  2. Define default_config with fixed and random parameters

  3. Set up the parameter prior in __init__

  4. Implement __call__ to return (tree_sequence, parameters)

Example:

class MySimulator(BaseSimulator):
    default_config = {
        # Fixed parameters
        "samples": {"pop1": 10},
        "sequence_length": 1e6,
        # Random parameters (ranges)
        "pop_size": [1e3, 1e5],
    }

    def __init__(self, config: dict):
        super().__init__(config, self.default_config)
        self.parameters = ["pop_size"]
        self.prior = BoxUniform(
            low=torch.tensor([self.pop_size[0]]),
            high=torch.tensor([self.pop_size[1]])
        )

    def __call__(self, seed: int = None):
        torch.manual_seed(seed)
        theta = self.prior.sample().numpy()
        pop_size = theta[0]

        # Simulate tree sequence
        ts = msprime.sim_ancestry(...)
        ts = msprime.sim_mutations(...)

        return ts, theta

Technical Notes

  • All simulators use msprime for coalescent simulations

  • Tree sequences include both topology and mutations

  • Random seeds ensure reproducibility across runs

  • Parameters are sampled from uniform priors (BoxUniform)

  • The returned parameter vector matches the order in self.parameters