In 2014, the Wall Lab at Stanford University sought to answer one of the most pressing questions in neuroscience: Which genes influence autism spectrum disorder (ASD)? According to the Centers for Disease Control and Prevention (CDC), this neurodevelopmental disorder affects roughly one in 54 children in America and is on the rise, nearly tripling since 1992.

To tease out more of the genetics, the Wall Lab combined a time-honored approach with next-generation sequencing and high-performance computing (HPC) resources. The researchers opted for an experimental design known as a linkage study, which traces how genetic variants are inherited together with a trait within families. Because of the sheer volume of genomic data involved, the project was a strong candidate for cloud-based storage and processing.

Using resizable compute capacity in the cloud with Amazon Elastic Compute Cloud (Amazon EC2), the Wall Lab analyzed genetic variation across all 4,610 samples in parallel. By another Stanford lab’s estimation, a similar workload on an on-premises high-performance computing cluster might have taken four times longer. The Wall Lab also used a serverless analytics tool, Amazon Athena, to quickly query groups of genetic variants by sample or by family using standard SQL.
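To make the idea of querying variants "by sample or by family" concrete, here is a minimal sketch in Python. It is not the Wall Lab's pipeline: the table name `variants`, the column names, and the helper functions `build_variant_query` and `submit_query` are all hypothetical, since the actual iHART schema is not described here. The sketch only shows how a standard SQL filter might be assembled and then handed to Athena through the boto3 `start_query_execution` call.

```python
import re

# Hypothetical schema: the real iHART table and column names are not given
# in this article, so everything below is an illustrative assumption.
VARIANTS_TABLE = "variants"
ALLOWED_FILTERS = {"sample_id", "family_id"}

# Accept only simple identifiers so the value can be embedded in SQL safely.
_SAFE_ID = re.compile(r"[A-Za-z0-9_\-]+")


def build_variant_query(filter_column: str, value: str, limit: int = 100) -> str:
    """Return a SQL string selecting variants for one sample or one family."""
    if filter_column not in ALLOWED_FILTERS:
        raise ValueError(f"unsupported filter column: {filter_column!r}")
    if not _SAFE_ID.fullmatch(value):
        raise ValueError(f"unsafe identifier: {value!r}")
    return (
        "SELECT chrom, pos, ref, alt, sample_id, family_id "
        f"FROM {VARIANTS_TABLE} "
        f"WHERE {filter_column} = '{value}' "
        f"LIMIT {int(limit)}"
    )


def submit_query(sql: str, database: str, output_location: str) -> str:
    """Submit the query to Athena and return its execution ID.

    Requires boto3 and valid AWS credentials; `database` and
    `output_location` (an S3 URI for results) are supplied by the caller.
    """
    import boto3  # imported lazily so the query builder works without it

    client = boto3.client("athena")
    response = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


if __name__ == "__main__":
    # Build (but do not submit) a query for one hypothetical family ID.
    print(build_variant_query("family_id", "FAM001", limit=10))
```

Because Athena is serverless, a query like this runs directly against files in Amazon S3 without provisioning a database server. The only inputs here are the SQL text, a database name, and an S3 location for the results.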

The Wall Lab shares its data upon request. Approved users receive access to the iHART dataset stored in Amazon Simple Storage Service (Amazon S3), as well as an Amazon Machine Image pre-loaded with the Wall Lab’s processing and analytic pipelines. Information about the iHART dataset, including directions for requesting access, is available on the Registry of Open Data on AWS.

Read more about the Registry of Open Data on AWS in the AWS Public Sector Blog.

Amazon Web Services