Analysis-ready, cloud-optimized (ARCO) data formats let us leverage elastic scaling and distributed parallel processing to dramatically accelerate scientific discovery. In this live demo, we will apply these principles to analyze and visualize a massive sea surface height dataset from the Copernicus Marine Environment Monitoring Service (CMEMS). The presentation will feature Pangeo's open-source software stack, including Jupyter, Xarray, Dask, and Zarr, and will conclude with a look at Pangeo Forge, a new open-source toolkit for transforming your own data from archival formats into ARCO data stores.

Author Bio:

Charles Stern is a Data Infrastructure Engineer in the Ocean Transport Group at Lamont-Doherty Earth Observatory. His work focuses on Pangeo Forge, an open-source tool for data Extraction, Transformation, and Loading (ETL). The goal of Pangeo Forge is to make it easy to extract data from traditional data repositories and deposit it in cloud object storage in analysis-ready, cloud-optimized (ARCO) format. He is endlessly curious about elegant, open-source tools that help us understand our changing planet.

For further information:

RRoCCET21 was a virtual conference held by CloudBank from August 10 through 12, 2021. Its aim was to inspire you to consider using the cloud in your research by sharing the success stories of others. We hope the proceedings, of which this case study is a part, give you an idea of what is possible and act as a “recipe book” for mapping powerful computational resources onto your own field of inquiry.