I’m a fifth-year PhD candidate at UC Davis (graduating December
2026), working with Professor
Jason Lowe-Power on hardware/software co-design for memory systems.
My recent work, Pickle, introduces a data prefetcher
that interweaves the flexibility of software with the efficiency of
specialized hardware: software decides what to prefetch, and hardware
manages the prefetch requests. Essentially, this is how a software
engineer would build a data prefetcher.
I have an extensive background in hardware architecture and software
development. I’ve contributed to the gem5 simulator for six years at UC
Davis, and I’ve interned at Google as a software engineer and at AMD as
a researcher.
Research Interests: I began my PhD focused on the
inevitable address translation bottlenecks in
scatter/gather operations of vector
architectures. Over time, I realized this was fundamentally a
data prefetching problem. This insight led to Pickle, a hardware/software
co-design data prefetcher for irregular memory accesses.
Research
My research focuses on building robust tools for hardware modeling
and identifying the optimal interface between hardware and software.
[Pickle Prefetcher] I lead the
development of Pickle, a last-level cache prefetcher for irregular
memory accesses, in collaboration with AMD Research. In our paradigm,
software provides prefetch kernels that generate requests, while
hardware handles the execution to deliver them in a timely manner.
Because the kernels are real code rather than inferred patterns, they
can precisely prefetch for hash-indexed lookups, predicated traversals,
and other accesses that have long resisted conventional hardware
prefetchers. The field’s decades-long focus on pattern recognition has
left this direction underexplored. On graph analytics workloads, Pickle
delivers significant speedups with only 2% DRAM traffic overhead, because
99% of its prefetches are timely and accurate.
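To make the division of labor concrete, here is a minimal Python sketch of the idea, not Pickle's actual interface. Every name in it (`PrefetchEngine`, `make_csr_kernel`, `max_in_flight`) is hypothetical and chosen only for illustration. The kernel is ordinary software that decides which vertices to prefetch for a predicated graph traversal; the engine stands in for the hardware that queues and paces the requests.

```python
from collections import deque
from typing import Callable, Iterable, List

# Hypothetical type: a software-provided kernel maps a trigger (here, a
# vertex ID the core just touched) to the items worth prefetching.
PrefetchKernel = Callable[[int], Iterable[int]]


class PrefetchEngine:
    """Illustrative stand-in for the hardware side: it only queues and
    issues requests; it never decides what to prefetch."""

    def __init__(self, kernel: PrefetchKernel, max_in_flight: int = 4):
        self.kernel = kernel
        self.max_in_flight = max_in_flight  # assumed per-step issue limit
        self.pending: deque = deque()
        self.issued: List[int] = []

    def observe(self, trigger: int) -> None:
        """Run the software kernel on a trigger and enqueue its requests."""
        self.pending.extend(self.kernel(trigger))

    def tick(self) -> None:
        """Issue at most max_in_flight queued requests in this step."""
        for _ in range(min(self.max_in_flight, len(self.pending))):
            self.issued.append(self.pending.popleft())


def make_csr_kernel(row_ptr: List[int], col_idx: List[int],
                    visited: List[bool]) -> PrefetchKernel:
    """Build a kernel for a graph in CSR form. It is predicated: neighbors
    that are already visited are skipped, so no request is generated."""
    def kernel(vertex: int) -> Iterable[int]:
        for e in range(row_ptr[vertex], row_ptr[vertex + 1]):
            neighbor = col_idx[e]
            if not visited[neighbor]:
                yield neighbor
    return kernel


# Tiny graph: vertex 0 -> {1, 2}, vertex 1 -> {2}, vertex 2 -> {}.
row_ptr = [0, 2, 3, 3]
col_idx = [1, 2, 2]
visited = [True, False, True]

engine = PrefetchEngine(make_csr_kernel(row_ptr, col_idx, visited))
engine.observe(0)   # the core starts processing vertex 0
engine.tick()
print(engine.issued)  # [1]: vertex 2 is already visited, so it is skipped
```

The predicate lives in plain code, so skipping visited neighbors needs no pattern inference; the engine's only job in this sketch is pacing.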
This project is named after my first cat, Pickle. She loves to play
fetch, tends to take off before I throw the ball (prefetching),
and ignores the bad throws (conditional prefetching).
[Choreographer] I drive the development of
Choreographer, a gem5-based framework enabling hardware/software
co-design for near-cache accelerators, in collaboration with AMD
Research. Because these accelerators touch every layer of the
hardware/software stack, evaluating them demands full-system visibility.
Choreographer provides exactly that: it models the accelerator alongside
a cluster of high-performance out-of-order CPUs, a chiplet-based on-chip
network with a fully detailed MOESI coherence protocol via gem5’s CHI,
and the complete software stack, all in full-system simulation.
Pickle is built on Choreographer. I extended the framework to track
the source of every cache miss and measure prefetch usefulness across
the system.
[Google]:
Summer 2025: I profiled and analyzed Borglet’s CPU scheduling on AMD
chips.
Summer 2024: I built a pre-RTL area estimation model for the XLS project. The area model is
used to guide pre-RTL optimizations of certain designs.
[AMD Research]:
Summer 2023: I built the Choreographer framework.
Previous Work
As an undergrad in Prof. Ian
Davidson’s lab, I collaborated with Zilong Bai on graph-based
unsupervised feature selection, which resulted in a
SIGKDD paper.
Teaching
I strongly believe that student engagement in the classroom and in research
comes from a deep understanding of the problem, coupled with fluency
in using tools (e.g., software, facts, and abstractions) for
problem-solving. The recent emergence of LLMs makes
this foundational understanding and command of tools even more
essential, because they enable students to formulate the right
questions for AI assistants and to verify the
answers.
Bootcamp Instructor, gem5 Bootcamp, UC Davis (Summer 2022).