API Reference
=================

This section provides detailed API documentation for the popgen-npe package modules.

Tree Sequence Processors
------------------------

The ``ts_processors`` module transforms tree sequences into tensor representations for neural networks.

.. currentmodule:: workflow.scripts.ts_processors

BaseProcessor
~~~~~~~~~~~~~

.. autoclass:: BaseProcessor
   :members:
   :undoc-members:
   :show-inheritance:

   Base class for all processors. Handles configuration and default parameters.

genotypes_and_distances
~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: genotypes_and_distances
   :members:
   :undoc-members:
   :show-inheritance:

   Extracts genotype matrix with inter-SNP distances.

cnn_extract
~~~~~~~~~~~

.. autoclass:: cnn_extract
   :members:
   :undoc-members:
   :show-inheritance:

   Feature extractor for CNN architectures using dinf's HaplotypeMatrix.

tskit_sfs
~~~~~~~~~

.. autoclass:: tskit_sfs
   :members:
   :undoc-members:
   :show-inheritance:

   Computes site frequency spectra for single or multiple populations.

tskit_windowed_sfs_plus_ld
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: tskit_windowed_sfs_plus_ld
   :members:
   :undoc-members:
   :show-inheritance:

   Combines windowed SFS with linkage disequilibrium statistics.

SPIDNA_processor
~~~~~~~~~~~~~~~~

.. autoclass:: SPIDNA_processor
   :members:
   :undoc-members:
   :show-inheritance:

   Processor specifically designed for SPIDNA embedding networks.

ReLERNN_processor
~~~~~~~~~~~~~~~~~

.. autoclass:: ReLERNN_processor
   :members:
   :undoc-members:
   :show-inheritance:

   Processor for ReLERNN architecture with phased genotype requirements.

Embedding Networks
------------------

The ``embedding_networks`` module provides neural network architectures that process tensor outputs from processors.

.. currentmodule:: workflow.scripts.embedding_networks

RNN
~~~

.. autoclass:: RNN
   :members:
   :undoc-members:
   :show-inheritance:

   A recurrent neural network using bidirectional GRU layers for processing sequential genetic data.

   **Parameters:**
   
   - **input_size** (*int*) -- The input size of the GRU layer (e.g., num_individuals * ploidy)
   - **output_size** (*int*) -- The dimension of the output feature vector
   - **num_layers** (*int, optional*) -- Number of GRU layers (default: 2)
   - **dropout** (*float, optional*) -- Dropout probability (default: 0.0)

   **Architecture:**
   
   - Bidirectional GRU with configurable layers
   - MLP head with dropout for final embedding

ExchangeableCNN
~~~~~~~~~~~~~~~

.. autoclass:: ExchangeableCNN
   :members:
   :undoc-members:
   :show-inheritance:

   Implements the Exchangeable CNN (permutation-invariant CNN) from Chan et al. 2018.
   This architecture builds in invariance to permutations of individuals in haplotype matrices.

   **Parameters:**
   
   - **output_dim** (*int, optional*) -- Dimension of the final output vector (default: 64)
   - **input_rows** (*list of int, optional*) -- Number of rows (samples) per population
   - **input_cols** (*list of int, optional*) -- Number of cols (SNPs) per population
   - **channels** (*int, optional*) -- Number of input channels (default: 2)
   - **symmetric_func** (*str, optional*) -- Symmetric pooling function: "max", "mean", or "sum" (default: "max")

   **Architecture:**
   
   - Two CNN layers with 2D convolutions (kernel heights = 1)
   - ELU activation and batch normalization
   - Symmetric pooling layer for permutation invariance
   - Global average pooling
   - Feature extractor MLP

   **Notes:**
   
   - Supports multiple populations with different dimensions
   - Automatically masks padded values (-1) when processing multiple populations
   - First CNN layer uses wider kernel and stride for long-range LD capture

SummaryStatisticsEmbedding
~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: SummaryStatisticsEmbedding
   :members:
   :undoc-members:
   :show-inheritance:

   Identity embedding layer for pre-computed summary statistics.

   **Parameters:**
   
   - **output_dim** (*int, optional*) -- Not used, maintained for API consistency

   **Input Formats:**
   
   - Single population SFS: shape (num_samples + 1,)
   - Joint SFS: shape (num_samples_pop1 + 1, num_samples_pop2 + 1)

   **Notes:**
   
   - Simply passes through pre-computed summary statistics
   - Automatically flattens multi-dimensional statistics
   - Converts numpy arrays to torch tensors if needed

SPIDNA_embedding_network
~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: SPIDNA_embedding_network
   :members:
   :undoc-members:
   :show-inheritance:

   SPIDNA (Spatially-aware Population genomics with Deep neural Networks) architecture
   for processing genetic data with positional information.

   **Parameters:**
   
   - **output_dim** (*int, optional*) -- Dimension of output features (default: 64)
   - **num_block** (*int, optional*) -- Number of SPIDNA blocks (default: 3)
   - **num_feature** (*int, optional*) -- Number of convolutional features (default: 32)

   **Architecture:**
   
   - Separate convolutional processing for position and SNP data
   - Sequential SPIDNA blocks with residual connections
   - Progressive feature aggregation across blocks

   **Input Format:**
   
   - Shape: (batch, channels, samples, snps)
   - Channel 0: positional information
   - Channels 1+: SNP/haplotype data

ReLERNN
~~~~~~~

.. autoclass:: ReLERNN
   :members:
   :undoc-members:
   :show-inheritance:

   ReLERNN architecture following the design from https://github.com/kr-colab/ReLERNN/.
   Combines recurrent processing of haplotypes with positional information.

   **Parameters:**
   
   - **input_size** (*int*) -- Input size for GRU (num_individuals * ploidy)
   - **n_snps** (*int*) -- Number of SNPs in the input data
   - **output_size** (*int, optional*) -- Output embedding dimension (default: 64)
   - **shuffle_genotypes** (*bool, optional*) -- Shuffle genotypes during training (default: False)

   **Architecture:**
   
   - Bidirectional GRU for haplotype processing
   - Separate linear layer for positional encoding
   - Concatenated features passed through MLP
   - Dropout for regularization

   **Input Format:**
   
   - Shape: (batch, sequence_length, 1 + input_size)
   - First feature: positional data
   - Remaining features: haplotype representation

Supporting Classes
~~~~~~~~~~~~~~~~~~

.. autoclass:: SymmetricLayer
   :members:
   :undoc-members:
   :show-inheritance:

   Permutation-invariant pooling layer.

   **Parameters:**
   
   - **axis** (*int*) -- Dimension along which to apply the symmetric function
   - **func** (*str, optional*) -- Function type: "max", "mean", or "sum" (default: "max")

.. autoclass:: SPIDNABlock
   :members:
   :undoc-members:
   :show-inheritance:

   Basic building block for SPIDNA architecture.

   **Parameters:**
   
   - **num_feature** (*int*) -- Number of feature channels
   - **output_dim** (*int*) -- Output dimension for feature aggregation

   **Architecture:**
   
   - Convolutional layer with batch normalization
   - Sample-wise averaging for feature extraction
   - Residual connection to output
   - Max pooling for spatial dimension reduction