GAMUT: A genomics big data management tool
Efficient analysis of Single Nucleotide Polymorphisms (SNPs) across genomic samples helps in deciphering the relationship between genotype and phenotype. The core principle behind SNP comparison is to arrive at a probable list of variants that can differentiate two sets of data (populations). Such SNPs have direct applications in array design, genotype imputation, and the cataloging of variants in regions of interest. We have developed GAMUT (Genomics bigdAta Management Tool), a big-data solution for efficient run-time comparison of SNPs across large datasets, in which samples are partitioned into populations according to user-defined splits. The tool uses a client-server architecture with MongoDB as the back end and JSF with PrimeFaces as the front end. It is readily deployable on a WildFly server as well as in a Docker container. A Spark-based parallel data uploader keeps loading times low. GAMUT supports dynamic querying of large multi-sample datasets using text-based, chromosome-position-based, and gene-name-based options. Charting options such as bar and pie charts, along with tabular views, ease the analysis of the queried data. The results of genome-wide SNP comparisons can also be downloaded in formats such as text, HTML, and JSON for further stand-alone analysis. GAMUT is available for download at: https://github.com/bioinformatics-cdac/gamut
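The comparison principle described above, finding variants that differentiate two user-defined sets of samples, can be illustrated with a minimal Python sketch. This is not GAMUT's implementation (the tool itself is built on MongoDB, JSF, and Spark). The function names `differentiating_snps` and `in_region`, the `(chromosome, position, allele)` representation of a variant, and the strict rule "present in every sample of one set and in no sample of the other" are all assumptions made for illustration only.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical representation: a variant is (chromosome, position, alternate allele).
Variant = Tuple[str, int, str]


def differentiating_snps(
    calls: Dict[str, Set[Variant]],
    group_a: Iterable[str],
    group_b: Iterable[str],
) -> Dict[str, Set[Variant]]:
    """Return variants found in every sample of one group and in no sample of the other.

    `calls` maps a sample name to the set of variants called in that sample.
    The strict all-versus-none rule is an assumption of this sketch.
    """
    sets_a = [calls[name] for name in group_a]
    sets_b = [calls[name] for name in group_b]
    if not sets_a or not sets_b:
        raise ValueError("both groups need at least one sample")

    shared_a = set.intersection(*sets_a)  # present in all samples of group A
    shared_b = set.intersection(*sets_b)  # present in all samples of group B
    any_a = set.union(*sets_a)            # present in at least one sample of A
    any_b = set.union(*sets_b)            # present in at least one sample of B

    return {
        "only_in_a": shared_a - any_b,
        "only_in_b": shared_b - any_a,
    }


def in_region(variants: Set[Variant], chrom: str, start: int, end: int) -> Set[Variant]:
    """Keep variants on `chrom` whose position lies in the inclusive range [start, end]."""
    return {v for v in variants if v[0] == chrom and start <= v[1] <= end}
```

Given per-sample variant sets and a split of sample names into two populations, `differentiating_snps` returns the variants exclusive to each side, and `in_region` narrows such a result to a chromosome interval, mirroring the position-based query option in spirit. A practical tool would likely relax the all-versus-none rule, for example by using allele-frequency thresholds.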