I’m a fifth-year PhD student at UC Davis (graduating December 2026),
working with Professor
Jason Lowe-Power on rethinking data prefetching through the lens of
hardware/software co-design, building systems that interweave the
flexibility of software prefetching with the efficiency of specialized
hardware. To put it differently: this is how a software engineer
would build a data prefetcher.
I have an extensive background in hardware architecture and software
development. I’ve contributed to the gem5 simulator for 6 years at UC
Davis, and during my PhD I’ve interned as a software engineer at Google
and as a researcher at AMD.
I’ll be returning to Google for another internship in summer
2026.
Research Interests: I started my PhD thinking about
the inevitable address translation bottlenecks in the
scatter/gather operations of vector
architectures. At some point, I realized the underlying issue had
always been a data prefetching problem. This led to Pickle, a data
prefetcher for irregular memory accesses.
Research
My research builds tools for hardware modeling and finds the right
interface between hardware and software.
[Pickle Prefetcher] I lead the development of
Pickle, a last-level cache prefetcher for irregular memory accesses, in
collaboration with AMD Research. In our paradigm, the software provides
prefetch kernels, which generate prefetch requests that are handled by
the hardware to deliver timely prefetches. As the kernels are real code
rather than inferred patterns, they can prefetch for hash-indexed
lookups, predicated traversals, and other accesses that have long
resisted hardware prefetchers, a direction the field’s decades-long
focus on pattern recognition has left underexplored. On graph analytics
workloads, Pickle delivers significant speedups at only 2% DRAM traffic
overhead.
This project is named after my first cat, Pickle. She loves to play
fetch, tends to take off before I throw the ball (prefetching),
and ignores the bad throws (conditional prefetching).
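The prefetch-kernel idea can be sketched in plain C. This is a hypothetical illustration, not Pickle's actual interface: the function names, the prefetch distance, and the use of the GCC/Clang `__builtin_prefetch` hint are all assumptions for exposition. The point is that for an indirect access pattern like `data[idx[i]]`, a kernel written as real code encodes the indirection explicitly, so it can compute future addresses that a pattern-matching hardware prefetcher cannot infer from the address stream alone.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "prefetch kernel" for the indirect pattern data[idx[i]].
 * The kernel runs a fixed distance ahead of the consumer loop and issues
 * a prefetch for the element the consumer will need soon. Because it is
 * ordinary code, the idx[] indirection is visible to it, unlike to a
 * hardware prefetcher watching only the resulting address stream. */
static void prefetch_kernel(const uint32_t *idx, const double *data,
                            size_t i, size_t n, size_t distance) {
    size_t ahead = i + distance;    /* index the consumer will reach later */
    if (ahead < n)                  /* conditional prefetch: skip bad throws */
        __builtin_prefetch(&data[idx[ahead]], /*rw=*/0, /*locality=*/1);
}

/* Consumer loop: each iteration prefetches for a future iteration. */
double sum_indirect(const uint32_t *idx, const double *data, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        prefetch_kernel(idx, data, i, n, /*distance=*/16);
        sum += data[idx[i]];
    }
    return sum;
}
```

In Pickle's paradigm the kernel's requests are handed to hardware so prefetches stay timely; the prefetch distance here is an arbitrary placeholder, since tuning it is exactly the timeliness problem the hardware side addresses.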
[Choreographer] I drive the development of
Choreographer, a gem5-based framework enabling hardware/software
co-design for near-cache accelerators, in collaboration with AMD
Research. Because these accelerators touch every layer of the
hardware/software stack, evaluating them demands full-system visibility.
Choreographer provides exactly that: it models the accelerator alongside
a cluster of high-performance out-of-order CPUs, a chiplet-based on-chip
network with a fully detailed MOESI coherence protocol via gem5’s CHI,
and the complete software stack, all in full-system simulation.
Pickle is built on Choreographer. I extend the framework to track
the source of every cache miss and measure prefetch usefulness across
the system.
[Google]: Summer 2024 — I built a pre-RTL area estimation model for the
XLS project, which is used to guide pre-RTL optimizations of certain
designs. Summer 2025 — I profiled and analyzed Borglet's CPU scheduling
on AMD chips.
[AMD Research]: Summer 2023 — I built the
Choreographer framework.
Teaching
I believe that student engagement, in the classroom and in research,
comes from understanding the nature of the problem and from fluency with
tools (software, learned facts, and learned abstractions) for
problem-solving. As a heavy user of coding agents (Copilot, Gemini, and
Claude), I believe this understanding and fluency matter even more in
the AI/LLM era, since they are what help students formulate the right
questions to ask the AI.
Bootcamp Instructor, gem5 Bootcamp, UC Davis (Summer 2022).