Simulators
The simulator module provides classes for generating population genetic simulations under various demographic models. These simulators produce tree sequences that serve as the foundation for the simulation-based inference workflow.
Overview
Simulators in popgen-npe are responsible for:
Generating tree sequences under specified demographic models
Sampling parameters from prior distributions
Simulating both ancestry and mutations
Ensuring consistent random seeding for reproducibility
All simulators inherit from BaseSimulator and follow a consistent interface, making them interchangeable within the workflow. Unless otherwise noted, each parameter listed below is sampled uniformly over the given bounds and stored exactly as drawn in the targets array. Models that work in log10 space (e.g., the variable population-size simulators) return log10-transformed population sizes so downstream analyses must exponentiate them when real sizes are required.
Base Simulator
- class workflow.scripts.ts_simulators.BaseSimulator(config: dict, default: dict)[source]
Bases:
objectBase class for all simulators. Handles configuration parsing and default parameter assignment.
Parameters:
config (dict) – Configuration dictionary with simulator parameters
default (dict) – Default parameter values
Available Simulators
YRI_CEU
- class workflow.scripts.ts_simulators.YRI_CEU(config: dict)[source]
Bases:
BaseSimulatorSimulate a model defined in dadi’s manual (https://dadi.readthedocs.io/en/latest/examples/YRI_CEU/YRI_CEU/). Ancestral population changes in size; splits into YRI/CEU; CEU population undergoes a bottleneck upon splitting and subsequently grows exponentially. There is continuous symmetric migration between YRI and CEU.
Simulates the demographic model from the dadi manual for YRI (Yoruba) and CEU (European) populations.
Model Description:
Ancestral population of size N_A
Split into YRI and CEU populations at time Tp + T
CEU undergoes bottleneck (N_CEU_initial) and exponential growth to N_CEU_final
YRI maintains constant size N_YRI
Continuous symmetric migration M between populations
Fixed Parameters:
samples: {“YRI”: 10, “CEU”: 10}sequence_length: 10e6recombination_rate: 1.5e-8mutation_rate: 1.5e-8
Inferred Parameters (uniform priors):
N_A: [1e2, 1e5] - Ancestral population sizeN_YRI: [1e2, 1e5] - YRI population sizeN_CEU_initial: [1e2, 1e5] - CEU bottleneck sizeN_CEU_final: [1e2, 1e5] - CEU final sizeM: [0, 5e-4] - Migration rateTp: [0, 6e4] - Time before splitT: [0, 6e4] - Time of split
Example Configuration:
simulator: class_name: YRI_CEU sequence_length: 5e6 samples: YRI: 20 CEU: 20
- default_config = {'M': [0, 0.0005], 'N_A': [100.0, 100000.0], 'N_CEU_final': [100.0, 100000.0], 'N_CEU_initial': [100.0, 100000.0], 'N_YRI': [100.0, 100000.0], 'T': [0, 60000.0], 'Tp': [0, 60000.0], 'mutation_rate': 1.5e-08, 'recombination_rate': 1.5e-08, 'samples': {'CEU': 10, 'YRI': 10}, 'sequence_length': 10000000.0}
DroMel_CO_FR
- class workflow.scripts.ts_simulators.DroMel_CO_FR(config: dict)[source]
Bases:
BaseSimulatorSimulate a two-population isolation-with-migration model for Drosophila melanogaster, Congolese and French populations, with growth in the French population
Simulates a two-population isolation-with-migration model for Drosophila melanogaster, with Cameroonian (CO) and French (FR) populations, including exponential growth in the French population.
Model Description:
Ancestral population of size N_ANC
Split into CO and FR populations at time T
FR population undergoes exponential growth from N_FR0 to N_FR1
CO population maintains constant size N_CO
Asymmetric migration between populations (m_CO_FR, m_FR_CO)
Fixed Parameters:
samples: {“CO”: 10, “FR”: 10}sequence_length: 1e6recombination_rate: [0.5e-8, 3.0e-8] (sampled uniformly)mutation_rate: 5.49e-9haploid: True
Inferred Parameters (uniform priors in log10 space):
N_ANC: [3.5, 6.5] - log10 ancestral population sizeN_CO: [3.5, 6.5] - log10 Cameroonian population sizeN_FR0: [3.5, 6.5] - log10 French population size after splitN_FR1: [3.5, 6.5] - log10 French population size after growthT: [3, 6] - log10 split timem_CO_FR: [-8, -3] - log10 migration rate from Cameroonian to Frenchm_FR_CO: [-8, -3] - log10 migration rate from French to Cameroonian
Example Configuration:
simulator: class_name: DroMel_CO_FR sequence_length: 2e6 samples: CO: 20 FR: 20
- default_config = {'N_ANC': [3.5, 6.5], 'N_CO': [3.5, 6.5], 'N_FR0': [3.5, 6.5], 'N_FR1': [3.5, 6.5], 'T': [3, 6], 'haploid': True, 'm_CO_FR': [-8, -3], 'm_FR_CO': [-8, -3], 'mutation_rate': 5.49e-09, 'recombination_rate': [5e-09, 3e-08], 'samples': {'CO': 10, 'FR': 10}, 'sequence_length': 1000000.0}
AraTha_2epoch
- class workflow.scripts.ts_simulators.AraTha_2epoch(config: dict)[source]
Bases:
BaseSimulatorSimulate the African2Epoch_1H18 model from stdpopsim for Arabidopsis thaliana. The model consists of a single population that undergoes a size change.
Simulates the African2Epoch_1H18 model from stdpopsim for Arabidopsis thaliana.
Model Description:
A single population model with a size change event:
Ancestral population of size N_A
Size change to nu * N_A at time T * 2 * N_A generations ago
Fixed Parameters:
samples: {“SouthMiddleAtlas”: 10}sequence_length: 10e6recombination_rate: 1.5e-8mutation_rate: 1.5e-8
Inferred Parameters (uniform priors):
nu: [0.01, 1] - Ratio of current to ancestral population sizeT: [0.01, 1.5] - Time of size change (scaled by 2*N_A)
Example Configuration:
simulator: class_name: AraTha_2epoch sequence_length: 1e7 samples: SouthMiddleAtlas: 15
- default_config = {'T': [0.01, 1.5], 'nu': [0.01, 1], 'samples': {'SouthMiddleAtlas': 10}, 'sequence_length': 10000000.0}
- model = 'African2Epoch_1H18'
- species = 'AraTha'
VariablePopulationSize
- class workflow.scripts.ts_simulators.VariablePopulationSize(config: dict)[source]
Bases:
BaseSimulatorSimulate a model with varying population size across multiple time windows. The model consists of a single population that undergoes multiple size changes.
Simulates a single population with multiple size changes across time windows.
Model Description:
Single population with size changes at fixed time intervals
Time points are logarithmically spaced using
np.logspace(2, log10(max_time), num_time_windows)with the first time point set to 0Population sizes are sampled independently for each time window (in log10 space)
Ensures minimum number of SNPs (400) after MAF filtering
Fixed Parameters:
samples: {“pop”: 10}sequence_length: 10e6mutation_rate: 1.5e-8num_time_windows: 3max_time: 100000time_rate: 0.1
Inferred Parameters (uniform priors):
N_0, N_1, ..., N_{n-1}: log10 population sizes sampled betweenlog10(pop_sizes[0])andlog10(pop_sizes[1])(defaults: 2–5, corresponding to 1e2–1e5)recomb_rate: [1e-9, 2e-8] - Recombination rate
The parameter vector returned by this simulator stores log10 population sizes; exponentiate to obtain sizes in diploid individuals.
Example Configuration:
simulator: class_name: VariablePopulationSize num_time_windows: 5 pop_sizes: [1e3, 1e6] max_time: 50000
- default_config = {'max_time': 100000, 'mutation_rate': 1.5e-08, 'num_time_windows': 3, 'pop_sizes': [100.0, 100000.0], 'recomb_rate': [1e-09, 2e-08], 'samples': {'pop': 10}, 'sequence_length': 10000000.0, 'time_rate': 0.1}
DependentVariablePopulationSize
- class workflow.scripts.ts_simulators.DependentVariablePopulationSize(config: dict)[source]
Bases:
BaseSimulatorSimulate a population with variable population size across multiple time windows, with each population size dependent on the previous one. The model consists of a single population that undergoes multiple size changes.
Simulates a single population with correlated size changes across time windows. Unlike
VariablePopulationSize, each population size depends on the previous one, creating a correlated random walk in log-space.Model Description:
Single population with 21 time windows (by default)
Time points are exponentially spaced using:
((exp(log(1 + rate*max_time) * i/(n-1)) - 1) / rate)First population size N_0 is sampled uniformly in log10 space
Subsequent sizes follow: N_i = N_{i-1} * 10^beta, where beta is sampled from [-1, 1]
Population sizes are clamped to stay within the specified bounds
Ensures minimum number of SNPs (400) after MAF filtering
Fixed Parameters:
samples: {“pop0”: 25}sequence_length: 2e6mutation_rate: 1e-8num_time_windows: 21maf: 0.05max_time: 130000time_rate: 0.1
Inferred Parameters:
N_0, N_1, ..., N_{n-1}: log10 population sizes (defaults span log10(1e2)–log10(1e5)); correlations are induced viapop_changesincrements sampled uniformly in log10 spacerecomb_rate: [1e-9, 1e-8] - Recombination rate
The simulator returns log10-transformed population sizes; exponentiate them (base 10) to recover effective population sizes in standard units.
Example Configuration:
simulator: class_name: DependentVariablePopulationSize num_time_windows: 21 pop_sizes: [1e2, 1e5] pop_changes: [-0.5, 0.5] max_time: 100000
- default_config = {'maf': 0.05, 'max_time': 130000, 'mutation_rate': 1e-08, 'num_time_windows': 21, 'pop_changes': [-1, 1], 'pop_sizes': [100.0, 100000.0], 'recomb_rate': [1e-09, 1e-08], 'samples': {'pop0': 25}, 'sequence_length': 2000000.0, 'time_rate': 0.1}
recombination_rate
- class workflow.scripts.ts_simulators.recombination_rate(config: dict)[source]
Bases:
BaseSimulatorSimulate a one population model where recombination rate varies among replicates. The prior is a beta distribution shifted/scaled to a given interval (by default, a noninformative beta).
Simple constant-size population model with variable recombination rate.
Model Description:
Single population of constant size with recombination rate as the only inferred parameter.
Fixed Parameters:
samples: {0: 10}sequence_length: 1e6mutation_rate: 1.5e-8pop_size: 1e4
Inferred Parameters (uniform priors):
recombination_rate: [0, 1e-8]
Example Configuration:
simulator: class_name: recombination_rate pop_size: 5e4 sequence_length: 5e6
- default_config = {'mean_and_dispersion': [0.5, 0.5], 'mutation_rate': 1.5e-08, 'pop_size': 10000.0, 'recombination_rate': [0, 1e-08], 'samples': {0: 10}, 'sequence_length': 1000000.0}
Usage in Configuration Files
Simulators are specified in the workflow configuration YAML files:
simulator: class_name: YRI_CEU # Name of the simulator class sequence_length: 5e6 samples: YRI: 20 CEU: 20The
class_namemust match one of the available simulator classes. All other keys undersimulatormust correspond to attributes listed in the simulator’sdefault_config; values provided here override those defaults. Supplying unused keys will raise an error during configuration parsing.
Creating Custom Simulators
To create a custom simulator:
Inherit from
BaseSimulatorDefine
default_configwith fixed and random parametersSet up the parameter prior in
__init__Implement
__call__to return (tree_sequence, parameters)
Example:
class MySimulator(BaseSimulator):
default_config = {
# Fixed parameters
"samples": {"pop1": 10},
"sequence_length": 1e6,
# Random parameters (ranges)
"pop_size": [1e3, 1e5],
}
def __init__(self, config: dict):
super().__init__(config, self.default_config)
self.parameters = ["pop_size"]
self.prior = BoxUniform(
low=torch.tensor([self.pop_size[0]]),
high=torch.tensor([self.pop_size[1]])
)
def __call__(self, seed: int = None):
torch.manual_seed(seed)
theta = self.prior.sample().numpy()
pop_size = theta[0]
# Simulate tree sequence
ts = msprime.sim_ancestry(...)
ts = msprime.sim_mutations(...)
return ts, theta
Technical Notes
All simulators use msprime for coalescent simulations
Tree sequences include both topology and mutations
Random seeds ensure reproducibility across runs
Parameters are sampled from uniform priors (BoxUniform)
The returned parameter vector matches the order in
self.parameters