GENIE: A Multimodal Foundation Model for Climate Science

GENIE Logo
Teaser figure

Overview

GENIE is a novel foundation model designed to accelerate hypothesis generation for assessing climate risks. Unlike traditional climate models that require months for hypothesis testing, GENIE integrates numerical data (measurements, climate simulations) and text data (research papers, climate reports) to generate scientifically valid hypotheses in a few-shot manner. Key innovations include:

  • Multi-Modal Learning: Uses Transformer architectures to represent both numerical and textual climate data.

  • Scientific Validity: Ensures physical consistency using Physics-Guided Deep Learning (PGDL) and Reinforcement Learning with World-Feedback (RLWF); a toy PGDL sketch follows this list.

  • Uncertainty Quantification: Leverages Bayesian Active Learning to assess prediction confidence and optimize experimental design.
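
A minimal illustration of the PGDL idea referenced above: augment a standard data-fit loss with a penalty for violating a physical constraint. The constraint below is a toy conservation law and all names are hypothetical; GENIE's actual physics terms are not specified here.

```python
import torch
import torch.nn.functional as F

def physics_guided_loss(pred, target, inputs, lambda_phys=0.1):
    """Hypothetical PGDL-style objective: data fit plus a physical-consistency penalty.

    The penalty here enforces a toy conservation law (the total predicted quantity
    must match the total in the input state); a real model would instead penalize
    residuals of the governing domain equations.
    """
    data_loss = F.mse_loss(pred, target)
    residual = pred.sum(dim=-1) - inputs.sum(dim=-1)  # toy conservation residual
    phys_loss = residual.pow(2).mean()
    return data_loss + lambda_phys * phys_loss
```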

Climate scientists currently rely on running ensembles of complex climate models to test their hypotheses, a process hindered by several limitations. First, hypothesis generation takes an excessive amount of time: a typical climate model run requires six months from conception to analysis, significantly slowing research and decision-making. Second, ad-hoc parameterization across global and regional climate models introduces inconsistencies, as these models use different parameterizations that are related in unknown ways, and no single framework can address a diverse set of climate-related tasks. Lastly, barriers to knowledge sharing prevent policymakers and resource-constrained communities from accessing integrated datasets and tools that can efficiently generate climate risk scenarios.

GENIE transforms climate risk assessment by reducing hypothesis testing turnaround time from six months to one week, allowing for rapid and iterative scientific exploration. The integration of Bayesian active learning further optimizes data acquisition costs, reducing the need for expensive large-scale climate simulations by at least 30%. Beyond climate science, GENIE has broad applications in forecasting, causal inference, and scenario creation, making it a versatile tool for researchers and policymakers. Additionally, the model has direct implications for national security, assisting the Department of Defense in areas such as strategic military planning, operational preparedness, and infrastructure resilience against climate-induced threats. Faster, more accurate climate risk assessments enabled by GENIE can drive more effective policy interventions, ultimately mitigating economic losses and protecting human lives.
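
As a concrete illustration of how Bayesian active learning can cut simulation costs, one common acquisition strategy is to run only the candidate simulations where model uncertainty is highest. The sketch below uses ensemble variance as the uncertainty signal; the function names are hypothetical and this is not GENIE's exact procedure.

```python
import numpy as np

def select_next_simulations(candidate_inputs, ensemble_predict, budget=8):
    """Rank candidate simulation configurations by predictive uncertainty.

    ensemble_predict: hypothetical callable mapping an (N, d) array of candidate
    inputs to an (M, N) array of predictions from M posterior samples or
    ensemble members. Returns indices of the `budget` most uncertain candidates,
    i.e., the simulations expected to be most informative to run next.
    """
    preds = ensemble_predict(candidate_inputs)  # shape (M, N)
    uncertainty = preds.var(axis=0)             # epistemic spread per candidate
    return np.argsort(-uncertainty)[:budget]
```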

Research

ClimaQA: An Automated Evaluation Framework for Climate Foundation Models 🔗

  • Abstract: The use of Large Language Models (LLMs) in climate science has recently gained significant attention. However, a critical issue remains: the lack of a comprehensive evaluation framework capable of assessing the quality and scientific validity of model outputs. To address this issue, we develop ClimaGen (Climate QA Generator), an adaptive learning framework that generates question-answer pairs from graduate textbooks with climate scientists in the loop. As a result, we present ClimaQA-Gold, an expert-annotated benchmark dataset, alongside ClimaQA-Silver, a large-scale, comprehensive synthetic QA dataset for climate science. Finally, we develop evaluation strategies and compare different LLMs on our benchmarks. Our results offer novel insights into various approaches used to enhance the knowledge of climate LLMs.
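
The abstract describes the ClimaGen pipeline only at a high level; a minimal sketch of what a scientist-in-the-loop QA generator could look like is below. All function names are hypothetical placeholders, not the paper's code, and the Gold/Silver split is only schematic.

```python
import random

def generate_qa_datasets(passages, llm_generate_qa, expert_annotate, gold_fraction=0.1):
    """Draft QA pairs from textbook passages with an LLM, then send a sample to
    climate scientists for annotation (Gold); the rest stays synthetic (Silver).

    llm_generate_qa(passage) -> (question, answer); expert_annotate validates and
    possibly corrects a drafted pair. Both are hypothetical placeholders.
    """
    gold, silver = [], []
    for passage in passages:
        question, answer = llm_generate_qa(passage)
        if random.random() < gold_fraction:
            gold.append(expert_annotate(question, answer, passage))
        else:
            silver.append((question, answer, passage))
    return gold, silver
```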

Adapting While Learning: Grounding LLMs for Scientific Problems with Tool Usage Adaptation 🔗

  • Abstract: Large Language Models (LLMs) demonstrate promising capabilities in solving simple scientific problems but, even with domain-specific fine-tuning, often produce hallucinations for complex ones. While integrating LLMs with tools can mitigate this reliability issue, models fine-tuned only on tool usage often over-rely on them, incurring unnecessary costs from resource-intensive scientific tools even for simpler problems. Inspired by how human experts assess problem complexity before choosing a solution, we propose a novel two-component fine-tuning method, Adapting While Learning (AWL). In the first component, World Knowledge Learning (WKL), LLMs internalize scientific knowledge by learning from tool-generated solutions. In the second component, Tool Usage Adaptation (TUA), we classify questions as easy or hard based on the WKL-trained model's accuracy, and train it to maintain direct reasoning for simple problems while switching to tools for challenging ones. We validate our method on 6 scientific benchmark datasets in climate science, epidemiology, and mathematics. Compared to the base 8B model, our trained models achieve 28.27% higher answer accuracy and 13.76% better tool usage accuracy, even surpassing state-of-the-art models including GPT-4o and Claude-3.5 on 4 custom-created datasets.
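
The easy/hard split at the heart of TUA can be made concrete: sample answers from the WKL-trained model, measure per-question accuracy, and route questions accordingly. The sketch below uses hypothetical placeholder functions, not the paper's code.

```python
def build_tua_split(questions, wkl_answer, check_answer, n_samples=4, threshold=0.5):
    """Partition questions by the WKL-trained model's empirical accuracy.

    wkl_answer(question) samples an answer from the WKL-trained model;
    check_answer(question, answer) returns True if the answer is correct.
    Easy questions become direct-reasoning training targets; hard ones are
    re-labeled so the model learns to invoke a scientific tool instead.
    """
    direct, tool_use = [], []
    for q in questions:
        answers = [wkl_answer(q) for _ in range(n_samples)]
        accuracy = sum(check_answer(q, a) for a in answers) / n_samples
        (direct if accuracy >= threshold else tool_use).append(q)
    return direct, tool_use
```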

Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs 🔗

  • Abstract: Accurate uncertainty quantification of large language models (LLMs) provides a credibility measure over their outputs. However, fine-tuned LLMs often struggle with overconfidence in uncertain predictions due to limitations in the models' ability to generalize with limited data. Existing parameter-efficient fine-tuning (PEFT) uncertainty quantification methods for LLMs focus on the post-fine-tuning stage and fall short of calibrating epistemic uncertainty. To address these limitations, we propose Functional-Level Uncertainty Quantification for Calibrated Fine-Tuning (UQ4CT), which captures and calibrates epistemic uncertainty over the space of functions that map input prompts to outputs. We implement UQ4CT during the fine-tuning stage via a mixture-of-experts framework that hierarchically decomposes the functional space. We demonstrate that UQ4CT reduces Expected Calibration Error (ECE) by more than 25% while maintaining high accuracy across 5 benchmarks. Even under distribution shift, UQ4CT maintains superior ECE performance with high accuracy, showcasing improved generalizability.
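
For reference, the Expected Calibration Error (ECE) metric that UQ4CT reports improving is standard and easy to state in code: bucket predictions by confidence and average the gap between confidence and accuracy, weighted by bin occupancy.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard ECE: occupancy-weighted average of |accuracy - confidence|
    over equal-width confidence bins."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for i in range(n_bins):
        lo, hi = edges[i], edges[i + 1]
        # left edge is closed for the first bin so confidence 0.0 is not dropped
        in_bin = (confidences >= lo) if i == 0 else (confidences > lo)
        mask = in_bin & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece
```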

Demo

Coming Soon

Team

Taylor Berg-Kirkpatrick

Associate Professor
UC San Diego, CSE

tberg@ucsd.edu
Yian Ma

Assistant Professor
UC San Diego, HDSI

yianma@ucsd.edu
Duncan Watson-Parris

Assistant Professor
UC San Diego, SIO

dwatsonparris@ucsd.edu
Rose Yu

Associate Professor
UC San Diego, CSE

roseyu@ucsd.edu
Sumanth Varambally

PhD, HDSI
UC San Diego

svarambally@ucsd.edu
Veeramakali Vignesh Manivannan

MS, CSE
UC San Diego

vmanivannan@ucsd.edu
Yasaman Jafari

PhD, CSE
UC San Diego

yajafari@ucsd.edu
Salva Rühling Cachay

PhD, CSE
UC San Diego

sruhlingcachay@ucsd.edu
Brooks (Ruijia) Niu

MS, CSE
UC San Diego

rniu@ucsd.edu
Srikar Eranky

BS, CSE
UC San Diego

seranky@ucsd.edu
Spencer Ho

BS, CSE
UC San Diego

s8ho@ucsd.edu
Zachary Novack

PhD, CSE
UC San Diego

znovack@ucsd.edu