RRoCCET21: Wildfire modeling with TPUs

Abstract:

Increasing rates of catastrophic wildfires have become a major concern for the economy and environment globally. Nearly 8 billion tons of CO2 are released by wildfires every year, which accounts for ~10% of global CO2 emissions. Additionally, wildfires cause an annual loss of 220 billion dollars and 775 thousand residences in the US. High-fidelity numerical simulations have been shown to be useful for investigating and predicting wildfires. These simulations provide insights for fire risk assessment and mitigation, fire management and firefighting strategy optimization, and land-use and wildland-urban interface planning. They are also potentially useful for generating synthetic data, which can be used to develop improved fire prediction models (including ML-based ones). We present a computational fluid dynamics (CFD) simulation framework that runs on clusters of Tensor Processing Units (TPUs). In these clusters, up to 1024 chips are connected through a dedicated high-speed, low-latency, two-dimensional toroidal inter-core interconnect (ICI) network, which provides higher computational efficiency than a CPU-based high-performance computing framework. The simulation framework solves the 3D Navier-Stokes equations along with constitutive models for combustion, pyrolysis, heat transfer, and other thermochemical processes. With this framework, we have produced a high-fidelity wildfire dataset based on the FireFlux II experimental setup. This dataset covers a broad range of physical factors including wind speed, terrain, fuel density, and moisture. Additionally, we have simulated the full event of the 2017 Tubbs fire in California. These high-fidelity simulations are generated by the TPU simulation framework at a significantly lower cost than with CPU-based CFD software, which gives us a foundation for studying the physics of wildfire propagation.

Case Study Summary:

The scientific problem we tackled:
We want to investigate wildfire propagation at unprecedented resolution, potentially coupled with local weather phenomena such as clouds.

We implemented physical models based on first principles by solving the governing conservation equations for mass, momentum, energy, and species transport in conjunction with constitutive models for combustion, pyrolysis, heat transfer, and other thermochemical processes. These models solve for the 3D wildfire behavior, which is two-way coupled to the atmospheric flow field dynamics.

Based on this simulation framework, we performed a large-scale simulation of the 2017 Tubbs fire in California. This simulation represents a domain of 200 thousand acres of mountainous terrain with a mesh of more than 800 million degrees of freedom. The full duration of the fire was simulated in 2.5 days of runtime.

The computational methods we used:
We run large-scale, distributed finite-difference simulations that combine CFD, thermodynamics, and combustion modeling. We are also exploring the combined use of ML and non-ML approaches to understand the problem.

The set of governing equations is solved in a Cartesian coordinate system on a mesh that is equidistant in each direction. All dependent variables are stored on a collocated mesh, and spatial derivatives are discretized using a finite-difference scheme. We use TensorFlow to program the described computational methods on TPUs.
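
As a rough illustration of a finite-difference derivative on an equidistant, collocated mesh, the sketch below applies a second-order central difference to a 1D periodic field. The order of accuracy, the periodic boundary, the test field, and the names `central_difference`, `num_points`, and `spacing` are all assumptions made for this example; the text above does not specify the scheme, and the framework itself is written in TensorFlow rather than plain Python.

```python
import math


def central_difference(values, spacing):
    """Second-order central difference on a periodic, equidistant 1D mesh."""
    n = len(values)
    return [
        (values[(i + 1) % n] - values[(i - 1) % n]) / (2.0 * spacing)
        for i in range(n)
    ]


# Hypothetical test field: u(x) = sin(x) on [0, 2*pi), so du/dx = cos(x).
num_points = 64
spacing = 2.0 * math.pi / num_points
x = [i * spacing for i in range(num_points)]
u = [math.sin(xi) for xi in x]

dudx = central_difference(u, spacing)

# Compare against the analytical derivative at every mesh point.
max_error = max(abs(d - math.cos(xi)) for d, xi in zip(dudx, x))
print(f"max error with {num_points} points: {max_error:.2e}")
```

Because every mesh point uses the same stencil and the same spacing, the update is a uniform, data-parallel operation, which is the kind of workload that maps naturally onto tensor operations.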

The cloud resources we used:
We used TPU clusters, which scale to as many as 1024 chips through dedicated high-bandwidth, low-latency inter-chip connections.

Our work is conducted on clusters of Tensor Processing Units (TPUs). At present we use Google-internal clusters for convenience of development. The framework we built is transferable to Google Cloud.

TPUs are application-specific integrated circuits that have been specifically designed for machine-learning applications. A TPU v3 board, the hardware considered in this work, carries four independent chips. Each chip has two tensor compute cores that are optimized for vectorized operations. Up to 1024 chips are connected through a dedicated high-speed, low-latency, two-dimensional toroidal inter-core interconnect (ICI) network, forming a TPU pod.
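
To make the "two-dimensional toroidal" topology concrete, the sketch below computes the four neighbors of a position on a 2D torus, where the edges wrap around. The 32 x 32 layout and the function name `torus_neighbors` are assumptions chosen so that the grid holds 1024 positions; the text above does not state the actual arrangement of chips.

```python
def torus_neighbors(row, col, num_rows, num_cols):
    """Return the four neighbors of (row, col) on a 2D torus with wraparound."""
    return {
        "north": ((row - 1) % num_rows, col),
        "south": ((row + 1) % num_rows, col),
        "west": (row, (col - 1) % num_cols),
        "east": (row, (col + 1) % num_cols),
    }


# Hypothetical 32 x 32 layout, giving 1024 positions in total.
NUM_ROWS, NUM_COLS = 32, 32

# A corner position still has four neighbors because the edges wrap around.
corner = torus_neighbors(0, 0, NUM_ROWS, NUM_COLS)
for direction, position in corner.items():
    print(direction, position)
```

The wraparound means no position sits on a boundary, so every position has the same number of direct links.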

Each TPU core has 16 GiB of on-chip high-bandwidth memory (HBM) and consists of a scalar unit, a vector unit, and two matrix units (MXUs), with each MXU supporting 128×128 multiply-accumulate (MAC) operations per cycle. Each MXU has a raw peak throughput of 22.5 teraflops.
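
The per-core and per-chip figures quoted in this section can be combined into totals for a full pod. The short calculation below uses only numbers stated in the text (1024 chips, four chips per board, two cores per chip, 16 GiB per core, two MXUs per core, 128×128 MACs per MXU per cycle); the variable names are illustrative, and the totals describe the largest configuration rather than the resources of any particular run.

```python
# Figures taken from the hardware description in this section.
CHIPS_PER_POD = 1024
CHIPS_PER_BOARD = 4
CORES_PER_CHIP = 2
HBM_PER_CORE_GIB = 16
MXUS_PER_CORE = 2
MACS_PER_MXU_PER_CYCLE = 128 * 128

boards_per_pod = CHIPS_PER_POD // CHIPS_PER_BOARD
cores_per_pod = CHIPS_PER_POD * CORES_PER_CHIP
hbm_per_pod_tib = cores_per_pod * HBM_PER_CORE_GIB / 1024  # GiB -> TiB
macs_per_core_per_cycle = MXUS_PER_CORE * MACS_PER_MXU_PER_CYCLE

print(f"boards per pod:          {boards_per_pod}")
print(f"cores per pod:           {cores_per_pod}")
print(f"HBM per pod:             {hbm_per_pod_tib} TiB")
print(f"MACs per core per cycle: {macs_per_core_per_cycle}")
```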

The TPU board is connected through the PCIe bus to the host server, which sends instructions to the TPU for execution. The ICI network directly connects the cores within a TPU pod, so communication between TPU cores goes through this dedicated network without involving the PCIe bus or the host.
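
Direct core-to-core communication matters for finite-difference solvers because each core needs boundary values from its neighbors at every step. The sketch below emulates this with a halo exchange on a 1D periodic field split into four partitions, then checks that the partitioned derivative matches the single-domain result. Halo exchange is a common technique assumed here for illustration; the text does not describe how the mesh is partitioned, and `exchange_halos`, `local_central_difference`, and the partition count are hypothetical. Plain Python lists stand in for data that would live on separate cores.

```python
import math


def exchange_halos(partitions):
    """Pad each partition with one cell from each periodic neighbor."""
    count = len(partitions)
    padded = []
    for rank, local in enumerate(partitions):
        left_halo = partitions[(rank - 1) % count][-1]
        right_halo = partitions[(rank + 1) % count][0]
        padded.append([left_halo] + local + [right_halo])
    return padded


def local_central_difference(padded, spacing):
    """Central difference on the interior cells of one padded partition."""
    return [
        (padded[i + 1] - padded[i - 1]) / (2.0 * spacing)
        for i in range(1, len(padded) - 1)
    ]


# Hypothetical setup: 64 mesh points split evenly across 4 partitions.
num_points, num_partitions = 64, 4
spacing = 2.0 * math.pi / num_points
field = [math.sin(i * spacing) for i in range(num_points)]

size = num_points // num_partitions
partitions = [field[r * size:(r + 1) * size] for r in range(num_partitions)]

# Each partition differentiates its own cells after receiving halo values.
distributed = []
for padded in exchange_halos(partitions):
    distributed.extend(local_central_difference(padded, spacing))

# Single-domain reference computed on the undivided field.
reference = [
    (field[(i + 1) % num_points] - field[(i - 1) % num_points]) / (2.0 * spacing)
    for i in range(num_points)
]
max_mismatch = max(abs(a - b) for a, b in zip(distributed, reference))
print(f"max mismatch between partitioned and single-domain result: {max_mismatch}")
```

Only one cell per side crosses a partition boundary in this sketch, so the volume of exchanged data is small compared with the local work, but the exchange happens at every step, which is why a low-latency link between cores is valuable.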

Author Bios:

Dr. Qing Wang joined Google as a software engineer after receiving his PhD from Stanford University in 2018. His current research focuses on developing computational fluid dynamics libraries based on the tensor processing unit (TPU) infrastructure, which are used to simulate geophysical flow applications such as wildfires and clouds. Before joining Google, he worked on model development for large-eddy simulations with a focus on multiphase and combustion problems.

Dr. Yi-fan Chen works as a software engineer at Google, leading a team that develops high-performance scientific computing libraries and applications. His previous work at Google includes the development of recommendation systems and other Google infrastructure. Before joining Google, he worked at Intel and ASML. He received his PhD in applied physics from Cornell University in 2005.

For further information:

https://research.google/research-areas/general-science/

RRoCCET21 is a conference that was held virtually by CloudBank from August 10th through 12th, 2021. Its intention is to inspire you to consider utilizing the cloud in your research, by way of sharing the success stories of others. We hope the proceedings, of which this case study is a part, give you an idea of what is possible and act as a “recipe book” for mapping powerful computational resources onto your own field of inquiry.