(논문) Tiered Memory Management: Access Latency is the Key!, SOSP'24 (6-7. Related Work and Conclusion)

본 글은 논문 Tiered Memory Management: Access Latency is the Key! (SOSP 2024) 를 읽고 정리한 글입니다.

별도의 명시가 없는 한, 본 글의 모든 그림은 위 논문에서 가져왔습니다.

목차

1. Introduction

2. Motivation

3. Colloid

4. Colloid with Existing Memory Tiering Systems

5. Evaluation

6-7. Related Work and Conclusion (현재 글)

6.1. Memory management for NUMA architectures.

NUMA architecture 에서의 초점은 compute task 와 memory page 를 multi-socket 에 배치하는 문제였다고 한다 ¹.
NUMA 에서 Colloid 와 가장 비슷한 연구는 Carrefour, ASPLOS’13 이다.
- 이 연구에서는 여러 NUMA node 의 memory 에 가해지는 memory request load (평균 memory request rate) 를 balancing 한다.
하지만 tiered memory 의 관점에서는 이것은 효율적이지 않다.
- 왜냐면 tiered memory 에서는 unloaded latency 가 서로 다르기 때문에, load 를 balancing 하면 memory interconnect contention 이 없을 때에도 alternate tier 에 hot page 를 배치하게 되기 때문이다.
- 그리고, memory interconnect contention 이 있을 때에도 불필요하게 latency 가 큰 tier 에 page 를 배치하게 되어 비효율적으로 작동한다.
따라서 Colloid 에서는 load 가 아닌 latency 를 balancing 하는 것이 더 좋은 approach 임을 보여준다.

6.2. Software-managed tiered memory management.

일단 Section 4. 에서 보여준 대로 Colloid 는 대부분의 software-managed tiered memory management system 과 통합이 가능하다.
Colloid 와 유사한 이전 연구는 BATMAN, MEMSYS’17 인데 이 연구에서는 각 tier 의 theoretical maximum bandwidth 에 따라서 memory request 를 각 tier 에 분배한다.
하지만 이는 다음의 두가지 이유에 의해 효율적이지 않다:
- 우선 bandwidth 의 비율에 따라 page placement tier 를 결정하기 때문에, 이때에도 hot page 가 불필요하게 latency 가 큰 tier 에 배치될 수 있다.
- 그리고 Section 3.1.0. 에서 말한 것처럼 BW 가 saturate 되지 않아도 memory controller queue 때문에 memory interconnect contention 이 발생할 수 있다: 따라서 BW 만 고려하는 것은 올바르지 않다.
Colloid 는 “balancing latency” 라는 원칙 아래 BW saturation 와 무관하게 memory interconnect contention 을 반영하여 성능을 극대화한다.

6.3. Hardware-managed tiered memory management.

HW managed tiered memory management 는 Intel Optane PMEM 의 memory mode 처럼 default tier 를 alternate tier 에 대한 cache 처럼 사용하는 접근방식을 취한다.
- 구체적으로는 Intel Optane memory mode 와 같은 것을 Inclusive cache 라고 부르고,
- 이 외에도 stacked DRAM 이나 exclusive cache 와 같은 접근이 있다고 한다 ².
이런 접근 방식은 Cache line 단위로 placement 를 할 수 있다는 장점이 있지만, 여전히 default tier 의 latency 가 항상 낮을 것이라는 가정에서 작동한다.
하지만 Colloid 에서는 이 가정이 틀렸다는 것을 보여주고, 따라서 이러한 접근도 효율적이지 않다는 것을 추론할 수 있다.
다만 본 논문에서는 Colloid 와 같은 “balancing latency” 원칙을 HW managed tiered memory system 에 통합하는 것도 추후에 연구해 봄즉 하다고 말하고 있다.

7. Conclusion

본 논문을 정리해 보면,
- 기존의 연구는 page access tracking 하는 방식이나 효율적인 page migration 기법, 그리고 page size determination 에 혁신을 이루어 왔지만 항상 default tier 의 latency 가 alternate tier 에 비해 더 낮을 것이라는 가정을 해 왔었다.
- 하지만 본 논문에서는 memory interconnect contention 상황에서는 이 가정이 틀렸음을 보여준다.
- 따라서 본 논문에서 제시하는 Colloid 에서는 “balancing latency” 라는 원칙 아래 latency 를 측정하고 이 latency 에 따라 page placement 를 하는 방법을 제시한다.
- 그리고 이것을 기존의 SOTA tiered memory management system 와 통합하여 구현하고, 이들에 대한 실험 결과를 제시한다.

이게 뭔소린지 모르겠다. ↩
이놈들이 구체적으로 어떤건지는 모르겠다. ↩

Madison Digital Garden

(논문) Tiered Memory Management: Access Latency is the Key!, SOSP'24 (6-7. Related Work and Conclusion)

6.1. Memory management for NUMA architectures.

6.2. Software-managed tiered memory management.

6.3. Hardware-managed tiered memory management.

7. Conclusion

Graph View

Table of Contents

Backlinks

Madison Digital Garden

(논문) Tiered Memory Management: Access Latency is the Key!, SOSP'24 (6-7. Related Work and Conclusion)

6.1. Memory management for NUMA architectures. §

6.2. Software-managed tiered memory management. §

6.3. Hardware-managed tiered memory management. §

7. Conclusion §

Footnotes §

Graph View

Table of Contents

Backlinks

6.1. Memory management for NUMA architectures.

6.2. Software-managed tiered memory management.

6.3. Hardware-managed tiered memory management.

7. Conclusion

Footnotes