Detecting Selective Sweeps in the Genomic Era: From Classical Statistics to Deep Learning
Sanchit Pal Singh *
Indian Veterinary Research Institute, Izatnagar, India.
Shruti Gupta
Indian Veterinary Research Institute, Izatnagar, India.
Varinder Singh Raina
National Animal Resource Facility for Biomedical Research, Hyderabad, India.
*Author to whom correspondence should be addressed.
Abstract
Selective sweeps are genomic signatures that arise when a beneficial allele rapidly increases in frequency in a population, reducing diversity in the surrounding chromosomal region. Since the hitchhiking effect was formalised by Maynard Smith and Haigh in 1974, methods for detecting these signatures have evolved substantially: from classical neutrality tests such as Tajima's D and Fay and Wu's H, through haplotype-based statistics including EHH, iHS, and H12, to contemporary machine learning frameworks capable of capturing subtle, ancient, and structurally complex sweep patterns. This review offers a comparative and integrative synthesis of this methodological progression and situates within it two recent deep learning approaches to sweep detection: FlexSweep and the Domain Adaptive Neural Network (DANN). FlexSweep advances versatility by integrating multiple complementary summary statistics across nested genomic scales, enabling reliable detection of diverse sweep types in modern genomic datasets. DANN addresses the technically demanding challenge of sweep detection in ancient DNA by incorporating domain adaptation, which corrects for systematic discrepancies between simulated training data and empirical genomic data. Rather than competing, FlexSweep and DANN serve complementary purposes, with the optimal choice depending on data type and biological context. Persistent challenges across the field, such as demographic confounding, simulation misspecification, and dependence on high-quality phased data, are critically examined. Accurate and scalable sweep detection has broad implications for medical genetics, animal and plant breeding, conservation genomics, and the study of evolutionary adaptation.
As reference genomes and long-read sequencing datasets expand across diverse taxa, the continued development of demographically aware, transferable deep learning frameworks will be central to uncovering the genetic basis of adaptation in non-model organisms and livestock populations.
Keywords: selection signature, selective sweep, hitchhiking, FlexSweep, DANN