I’m a fifth-year PhD student at UC Davis (graduating December 2026),
working with Professor
Jason Lowe-Power on hardware/software co-designed data prefetchers
that interweave the flexibility of software with the efficiency of
specialized hardware: software decides what to prefetch and hardware
carries it out. Put differently, this is how a software engineer
would build a data prefetcher.
I have an extensive background in hardware architecture and software
development. I’ve contributed to the gem5 simulator for 6 years at UC
Davis, and I’ve interned at Google as a software engineer and at AMD as
a researcher.
Research Interests: I started my PhD thinking about
the inevitable address translation bottlenecks in
scatter/gather operations on vector
architectures. After a few years, I realized the underlying issue
had always been a data prefetching problem. This realization led to
Pickle, a hardware/software co-designed data prefetcher for irregular
memory accesses.
Research
My research builds tools for hardware modeling and finds the right
interface between hardware and software.
[Pickle Prefetcher] I lead the
development of Pickle, a last-level cache prefetcher for irregular
memory accesses, in collaboration with AMD Research. In our paradigm,
the software provides prefetch kernels, which generate prefetch requests
that the hardware executes to deliver timely prefetches. Because the
kernels are real code rather than inferred patterns, they can prefetch
for hash-indexed lookups, predicated traversals, and other accesses that
have long resisted hardware prefetchers, a direction left underexplored
by the field's decades-long focus on pattern recognition. On
graph analytics workloads, Pickle delivers significant speedups at only
2% DRAM traffic overhead.
This project is named after my first cat, Pickle. She loves to play
fetch, tends to take off before I throw the ball (prefetching),
and ignores the bad throws (conditional prefetching).
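As a purely illustrative sketch of the paradigm (hypothetical code, not Pickle's actual interface): the software expresses the irregular access pattern as a small kernel that runs ahead of the demand stream and emits the addresses to fetch, while the hardware side simply executes those requests.

```python
def prefetch_kernel(idx, distance):
    """Software side (hypothetical sketch, not Pickle's real API):
    expresses the irregular pattern data[idx[i]] directly as code,
    emitting the element index to prefetch `distance` iterations
    ahead of the demand stream. A stride-based hardware prefetcher
    cannot predict this pattern; the kernel can, because it *is*
    the pattern."""
    for i in range(distance, len(idx)):
        yield idx[i]

def run(data, idx, distance=16):
    """Demand loop plus a stand-in for the hardware side, modeled as
    a queue recording which elements the kernel requested ahead of
    the demand access that needs them."""
    issued = []                      # stands in for the hardware prefetch queue
    kernel = prefetch_kernel(idx, distance)
    total = 0
    for i in range(len(idx)):
        req = next(kernel, None)     # hardware consumes one kernel request
        if req is not None:
            issued.append(req)
        total += data[idx[i]]        # demand access
    return total, issued
```

The point of the sketch is the division of labor: the kernel decides what to prefetch; everything below it only carries requests out.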
[Choreographer] I drive the development of
Choreographer, a gem5-based framework enabling hardware/software
co-design for near-cache accelerators, in collaboration with AMD
Research. Because these accelerators touch every layer of the
hardware/software stack, evaluating them demands full-system visibility.
Choreographer provides exactly that: it models the accelerator alongside
a cluster of high-performance out-of-order CPUs, a chiplet-based on-chip
network with a fully detailed MOESI coherence protocol via gem5’s CHI,
and the complete software stack, all in full-system simulation.
Pickle is built on Choreographer. I extend the framework to track
the source of every cache miss and measure prefetch usefulness across
the system.
[Google]: Summer 2025: I profiled and analyzed Borglet's CPU scheduling on AMD
chips.
Summer 2024: I built a pre-RTL area estimation model for the XLS project,
which is used to guide pre-RTL optimizations of certain designs.
[AMD Research]: Summer 2023: I built the
Choreographer framework.
Previous Work
As an undergrad in Prof. Ian
Davidson's lab, I collaborated with Zilong Bai on graph-based
unsupervised feature selection, which resulted in a
SIGKDD paper.
Teaching
I strongly believe that student engagement in the classroom and in
research comes from understanding the nature of the problem and from
fluency with tools (e.g., software, learned facts, and learned
abstractions) for problem-solving. As a heavy user of
coding agents (Copilot, Gemini, and Claude), I believe this
understanding and fluency matter even more in the AI/LLM era, as
they help students formulate the right questions for the AI.
Bootcamp Instructor, gem5 Bootcamp, UC Davis (Summer 2022).