I’m a fifth-year PhD student at UC Davis, working with Professor
Jason Lowe-Power on bringing flexibility to hardware: making hardware
adaptable to the ever-changing demands of modern software systems.
More specifically, we leverage reconfigurable technologies located
near the last-level cache (LLC) to build accelerators. These
technologies integrate hardware construction into the software
development cycle, furthering the impact and practicality of
hardware/software co-design. I strongly believe that, given the vast
capacity of the LLC in modern CPUs, placing reconfigurable logic near
the LLC is the most viable path toward maximizing the computational
efficiency of modern systems.
I have an extensive background in hardware architecture and software
development. I have contributed to the gem5 project, a widely used
hardware simulator, at UC Davis for five years, and I interned at Google
and AMD during my PhD.
I hope my CV will someday land me an
impactful research position. Let’s see!
Research
My research centers on building the right tools for modeling hardware
and on writing software for new hardware.
[Tools] This work is in collaboration with AMD
Research. I drive the development of the Choreographer platform, a
gem5-based framework for studying in-cache accelerators. The framework
provides high-resolution views of both the hardware and software stacks,
which is crucial for evaluating an in-cache accelerator.
This is achieved by modeling the full high-performance system: an
out-of-order CPU, a chiplet-based network-on-chip, a fully detailed
cache coherence protocol (we use gem5’s CHI protocol to model a MOESI
protocol with an L3 victim cache), and the full software stack. We run
full-system simulations, so we do not miss out on optimizing any
part of the software stack! [arXiv link pending]
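As a flavor of what such a setup looks like, here is a rough sketch using gem5’s standard components library. This is not Choreographer itself (its accelerator and chiplet-NoC models are not shown); it assumes a recent gem5 release with the standard library and a binary built with the CHI Ruby protocol, and the resource name and cache parameters are illustrative.

```python
# Hypothetical gem5 config sketch: O3 cores over a CHI-based cache
# hierarchy, in the spirit of the baseline system described above.
# Requires a gem5 binary built with the CHI protocol; parameters are
# illustrative, not Choreographer's actual configuration.
from gem5.components.boards.simple_board import SimpleBoard
from gem5.components.cachehierarchies.chi.private_l1_cache_hierarchy import (
    PrivateL1CacheHierarchy,
)
from gem5.components.memory import SingleChannelDDR4_2400
from gem5.components.processors.cpu_types import CPUTypes
from gem5.components.processors.simple_processor import SimpleProcessor
from gem5.isas import ISA
from gem5.resources.resource import obtain_resource
from gem5.simulate.simulator import Simulator

# Out-of-order (O3) cores model the high-performance CPU side.
processor = SimpleProcessor(cpu_type=CPUTypes.O3, num_cores=4, isa=ISA.X86)
cache_hierarchy = PrivateL1CacheHierarchy(size="32KiB", assoc=8)
memory = SingleChannelDDR4_2400(size="2GiB")

board = SimpleBoard(
    clk_freq="3GHz",
    processor=processor,
    memory=memory,
    cache_hierarchy=cache_hierarchy,
)
# A small syscall-emulation workload stands in for the full software
# stack used in the real (full-system) experiments.
board.set_se_binary_workload(obtain_resource("x86-hello64-static"))

Simulator(board=board).run()
```

The script must be run under gem5 itself (e.g., `gem5 config.py`), so it is a configuration fragment rather than a standalone Python program.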
[Prefetcher] This work is in collaboration with AMD
Research. I drive the development of the Pickle prefetcher, a last-level
cache prefetcher that accelerates irregular memory accesses in
network-on-chip (NoC) architectures.
Because the prefetcher is an integral part of the NoC, we can
monitor the traffic between the prefetcher and the rest of
the NoC. This lets us derive metrics for measuring the
efficiency of the prefetcher in the NoC and gain insights into how to
optimize it further! [arXiv link pending]
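To make "irregular" concrete, here is a hypothetical illustration (not Pickle's code): the classic hard-to-prefetch pattern is indirect indexing, where each address depends on data loaded just before it, so no simple stride predicts it.

```python
# Hypothetical example of an irregular (indirect) access pattern, the
# kind an LLC prefetcher like Pickle targets. The idx array streams
# sequentially (easy to prefetch), but data[idx[i]] is data-dependent:
# the address is unknown until idx[i] has been loaded.
def indirect_sum(data, idx):
    total = 0
    for i in range(len(idx)):
        # Second-level load: address derived from the first load's value.
        total += data[idx[i]]
    return total
```

A prefetcher that understands this pattern can fetch `idx[i]` ahead of time and issue the dependent `data[idx[i]]` load before the core asks for it.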
Teaching
I strongly believe that student engagement in the classroom and in
research comes from understanding the nature of the problem, and from
fluency in using tools (e.g., software, learned facts, and learned
abstractions) for problem-solving. As an extensive user of
coding agents (Copilot, Gemini, and Claude), I believe that
understanding the problem and this fluency are even more important in
the AI/LLM era, as these factors help students formulate the right
questions for the AI.
Bootcamp Instructor, gem5 Bootcamp, UC Davis (Summer 2022).
Teaching Assistant, Optimization (MAT 168), UC Davis (Spring
2019).
Teaching Assistant, Abstract Mathematics (MAT 108), UC Davis (Winter
2019).
Internships
[Google]: I built a pre-RTL area estimation model
for the XLS project in the
Summer of 2024, and worked on Borglet’s CPU scheduling problem in the
Summer of 2025. The area model turned out to be useful for optimizing
the area of certain designs ;).
[AMD Research]: I built the software/hardware stack
for a last-level cache prefetcher (the Pickle prefetcher) in the Summer
of 2023.