Seminar of the Iby and Aladar Fleischman Faculty of Engineering

EE ZOOM Seminar: Optimizing Multi-Core Read Latency in ccNUMA Systems

06 August 2025, 15:00

Zoom seminar

https://tau-ac-il.zoom.us/j/87072113850

Electrical Engineering Systems ZOOM Seminar

Speaker: Alla Lenchner

M.Sc. student under the supervision of Prof. Adam Morrison

Wednesday, 6th August 2025, at 15:00

Optimizing Multi-Core Read Latency in ccNUMA Systems

Abstract

Non-Uniform Memory Access (NUMA) architectures present challenges for efficient system utilization, particularly in data-intensive, read-heavy workloads common in big data analytics and machine learning.

Although read operations should, in theory, scale with the number of cores, this thesis reveals a significant and unexpected degradation in memory read latency for multi-core, read-heavy workloads on cache-coherent NUMA (ccNUMA) systems, one that goes beyond the well-known "NUMA effect."

This issue stems from a problematic interaction within Intel's directory-based cache coherence protocol: when cache lines held in the Exclusive state but never modified are evicted silently, that is, dropped without notifying the home directory, they leave stale entries in the coherence directory. Subsequent requests for exclusive ownership of these cache lines incur redundant inter-node round trips, unnecessarily increasing access latency.
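The stale-entry effect can be illustrated with a small Python toy model. This is a sketch under simplifying assumptions, not Intel's protocol or the thesis's simulator: the classes `DirEntry` and `ToyNuma`, their methods, and the two-state directory ("I" and "E") are all hypothetical, and latency is counted only as inter-node round trips. Node 0 reads a line homed at node 1 and receives it in the Exclusive state, then evicts it silently; a later ownership request from the home node itself must still snoop node 0, because the directory was never told the copy is gone.

```python
# Hypothetical toy model of a directory left stale by a silent eviction.
# Not Intel's protocol: only two directory states, and latency is counted
# as inter-node round trips on the critical path.
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class DirEntry:
    state: str = "I"             # "I": no copy recorded, "E": one exclusive owner
    owner: Optional[int] = None  # node recorded as owner when state == "E"


@dataclass
class ToyNuma:
    home_of: Dict[int, int]  # line address -> home node (holds memory + directory)
    caches: Dict[int, Set[int]] = field(default_factory=dict)     # node -> lines
    directory: Dict[int, DirEntry] = field(default_factory=dict)  # line -> entry

    def _cache(self, node: int) -> Set[int]:
        return self.caches.setdefault(node, set())

    def request(self, node: int, line: int) -> int:
        """Miss that leaves `node` as the sole (Exclusive) holder, e.g. a read of
        an unshared line or a request for ownership. Returns inter-node round trips."""
        if line in self._cache(node):
            return 0  # cache hit, no coherence traffic
        home = self.home_of[line]
        trips = 0 if node == home else 1  # requester <-> home directory
        entry = self.directory.setdefault(line, DirEntry())
        if entry.state == "E" and entry.owner != node:
            # The directory believes another node owns the line, so the home must
            # snoop it first, even if that copy was already silently evicted.
            if entry.owner != home:
                trips += 1
            self._cache(entry.owner).discard(line)  # invalidate; no-op if gone
        entry.state, entry.owner = "E", node
        self._cache(node).add(line)
        return trips

    def silent_evict(self, node: int, line: int) -> None:
        """Drop a clean line without telling the home: the directory stays 'E'."""
        self._cache(node).discard(line)

    def notified_evict(self, node: int, line: int) -> None:
        """Contrast case: an eviction that also clears the directory entry."""
        self._cache(node).discard(line)
        entry = self.directory.get(line)
        if entry is not None and entry.owner == node:
            entry.state, entry.owner = "I", None


if __name__ == "__main__":
    stale = ToyNuma(home_of={0x40: 1})
    stale.request(0, 0x40)         # node 0 reads a remote line, gets it Exclusive
    stale.silent_evict(0, 0x40)    # directory still records E at node 0
    print(stale.request(1, 0x40))  # prints 1: home-local request still snoops node 0

    exact = ToyNuma(home_of={0x40: 1})
    exact.request(0, 0x40)
    exact.notified_evict(0, 0x40)
    print(exact.request(1, 0x40))  # prints 0: accurate directory, no inter-node trip
```

In this sketch, a request issued on the line's own home node pays an inter-node round trip only because of the stale entry. Real directory protocols track more states and message types, so the counts here are qualitative only.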

This thesis investigates this phenomenon, analyzing the latency impact of inter-node traffic and cache coherence overheads.

We detail the interaction between the Exclusive state and silent evictions, providing a theoretical analysis of its latency implications.
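As a rough illustration of why stale entries matter, consider a simplified model assumed here, not the thesis's own analysis. Let $T_0$ be the latency of a request when the directory is accurate, $T_{\text{rt}}$ the cost of one extra inter-node round trip, and $p$ the fraction of requests that hit a stale Exclusive entry pointing to a remote node. These symbols are introduced only for this sketch.

```latex
\mathbb{E}[T] = T_0 + p\,T_{\text{rt}},
\qquad
\frac{\mathbb{E}[T] - T_0}{T_0} = p\,\frac{T_{\text{rt}}}{T_0}
```

For home-local requests, where $T_0$ involves no inter-node traffic at all, $T_{\text{rt}}$ can be a large fraction of $T_0$, so even a moderate $p$ can noticeably raise the average read latency under this model.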

We also examine an existing BIOS-configurable optimization by Intel, demonstrating its benefit for read-heavy workloads but highlighting its detrimental effect on read-write workloads.

To address this, we propose a novel micro-architectural solution: an adaptive directory state mechanism.

This mechanism aims to reduce redundant coherence traffic for cache lines affected by silent evictions without negatively impacting read-write performance. Evaluation through gem5 simulations demonstrates that our proposed solution achieves a 16% reduction in memory read latency for the affected workloads.

Alla Lenchner is an M.Sc. student specializing in CPU performance and architecture.

Attending the seminar earns attendance credit, recorded by writing your full name and ID number in the chat.