Memory Overcommit Technologies for KVM hosts

A glance at the technologies that enable more effective memory overcommit

Rodrigo Freire

Principal Technical Account Manager

São Paulo - Sep 2018

QEmu-kvm mM

Agenda

Technologies

How to Overcommit RAM

Memory Balloon

Clever Trick

Works in the guest RAM

Inflates and... deflates :-P

Guest RAM: 8 GB

[root@satserver-rf ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           7983        3018        4320          30         644        4633
Swap:          3071           0        3071
[root@satserver-rf ~]# 

Some minutes later...

Guest RAM: 8 GB

[root@satserver-rf ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           7983        4492         794          45        2696        3100
Swap:          3071           0        3071
[root@satserver-rf ~]# 

Let's apply a balloon!

Guest RAM: 6 GB usable + 2 GB held by the balloon

Now:

[root@satserver-rf ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           5935        2799         434          45        2701        2746
Swap:          3071           0        3071
[root@satserver-rf ~]#

Was:

[root@satserver-rf ~]# free -m
              total        used        free      shared  buff/cache   available
Mem:           7983        4492         794          45        2696        3100
Swap:          3071           0        3071
[root@satserver-rf ~]#
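The balloon's effect can be read straight from the `total` column of `free -m`: it went from 7983 MiB to 5935 MiB, so 2048 MiB (2 GiB) is now held by the balloon. The sketch below computes that difference from two `Mem:` lines copied from the output above; the helper names `mem_total` and `balloon_mib` are hypothetical. On a libvirt host the balloon target is usually changed with `virsh setmem <domain> <size> --live`; that command is not part of this example.

```sh
#!/bin/sh
# Sketch: how much RAM did the balloon take away from the guest?
# Compares the "total" column (MiB) of two `free -m` "Mem:" lines.

# Print the "total" field of a `free -m` "Mem:" line
mem_total() {
    printf '%s\n' "$1" | awk '/^Mem:/ { print $2 }'
}

# Print total(before) - total(after), in MiB
balloon_mib() {
    local before after
    before=$(mem_total "$1")
    after=$(mem_total "$2")
    echo $(( before - after ))
}

# Lines copied from the guest's `free -m` output (before and after inflating)
was='Mem:           7983        4492         794          45        2696        3100'
now='Mem:           5935        2799         434          45        2701        2746'
echo "Balloon size: $(balloon_mib "$was" "$now") MiB"   # prints "Balloon size: 2048 MiB"
```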

cons

  • Requires external control
  • Poll interval: reacts only as often as the controller polls
  • Possible OOMs and dead bodies in the way
  • Vulnerable to memory surges

ksm

  • Memory deduplication
  • Works in the hypervisor (host) memory
  • Kicks in when free memory runs low
  • Scans memory for identical pages and merges them
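To get a feel for what KSM is looking for, the sketch below splits a file into 4 KiB chunks, hashes each chunk and counts how many chunks duplicate another one, i.e. how many pages could be freed if identical pages were merged. This only illustrates the idea on a file: the real KSM scans guest memory that QEMU has marked as mergeable (`MADV_MERGEABLE`) and merges pages only after a byte-by-byte comparison. `mergeable_pages` is a hypothetical name, and GNU coreutils (`split --filter`, `stat -c`) is assumed.

```sh
#!/bin/sh
# Illustration only: count the 4 KiB chunks of a file that duplicate another
# chunk, i.e. the pages that could be freed if identical pages were merged.
# Requires GNU coreutils (split --filter, stat -c).
mergeable_pages() {
    local file size pages unique
    file=$1
    size=$(stat -c %s "$file")
    if [ "$size" -eq 0 ]; then
        echo 0
        return
    fi
    pages=$(( (size + 4095) / 4096 ))
    # Hash every 4096-byte chunk, then count the distinct hashes
    unique=$(split -b 4096 --filter='sha256sum' "$file" | sort -u | wc -l)
    echo $(( pages - unique ))
}
```

For example, a file made of three identical pages plus one random page reports 2: one copy is kept and two are freed, which is how KSM would count it too (`pages_shared` 1, `pages_sharing` 2).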

strong

  • Simple
  • VERY advantageous if you run many similar VMs

weak

  • Does not work with hugepages
  • Slow ramp-up time
  • Crossing NUMA domains: check
    /sys/kernel/mm/ksm/merge_across_nodes
  • Non-negligible CPU consumption
  • Performs no miracles :-}
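The slow ramp-up follows from how KSM paces itself: on each wake-up the `ksmd` thread scans `pages_to_scan` pages and then sleeps `sleep_millisecs` (both in /sys/kernel/mm/ksm/). The sketch below estimates one full pass over a given amount of guest memory. It is a lower bound (the time spent comparing pages is ignored) and it assumes 4 KiB pages; the function name is hypothetical. The kernel defaults are 100 pages and 20 ms; tools such as ksmtuned may change `pages_to_scan` at runtime.

```sh
#!/bin/sh
# Rough lower bound, in seconds, for one full KSM pass (assumes 4 KiB pages).
# $1: memory to scan (GiB), $2: pages_to_scan, $3: sleep_millisecs
ksm_full_scan_seconds() {
    local pages pages_per_sec
    if [ "$3" -le 0 ]; then
        echo "sleep_millisecs must be > 0" >&2
        return 1
    fi
    pages=$(( $1 * 1024 * 1024 * 1024 / 4096 ))
    pages_per_sec=$(( $2 * 1000 / $3 ))
    echo $(( pages / pages_per_sec ))
}

# 32 GiB of guests with the kernel defaults (100 pages every 20 ms)
ksm_full_scan_seconds 32 100 20    # prints 1677 (about 28 minutes)
```

On a live host the current knobs can be passed in directly: `ksm_full_scan_seconds 32 "$(cat /sys/kernel/mm/ksm/pages_to_scan)" "$(cat /sys/kernel/mm/ksm/sleep_millisecs)"`.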

zswap

Works on top of frontswap, the Transcendent Memory interface for swap pages; zswap is one of its back-ends

Stashes pages being swapped out in a compressed region of RAM

Extremely optimistic: expects pages to be read back while they are still in RAM

pros

Useful if you have an external memory device as backing for frontswap...

... which is not exactly common.

Cons

May break badly when the system is under memory pressure

Breaks the swappiness algorithm

Consumes... RAM.
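How much RAM zswap is consuming, and how well it compresses, can be read from its debugfs counters `pool_total_size` (bytes) and `stored_pages` under /sys/kernel/debug/zswap/ (root and a mounted debugfs are required). The sketch below turns them into a pool size and a compression ratio; the function name is hypothetical, and the directory argument exists so it can be pointed at test files.

```sh
#!/bin/sh
# Report zswap pool size (KiB) and compression ratio from its debugfs counters.
# $1: counters directory (default: /sys/kernel/debug/zswap)
# PAGE_SIZE may be set in the environment; defaults to the system page size.
zswap_usage() {
    local dir page pool pages ratio
    dir=${1:-/sys/kernel/debug/zswap}
    page=${PAGE_SIZE:-$(getconf PAGESIZE)}
    pool=$(cat "$dir/pool_total_size")    # bytes of RAM used by the pool
    pages=$(cat "$dir/stored_pages")      # uncompressed pages stored in it
    if [ "$pool" -eq 0 ]; then
        echo "pool_kib=0 ratio=n/a"
        return
    fi
    # Ratio multiplied by 100 to stay within integer arithmetic
    ratio=$(( pages * page * 100 / pool ))
    printf 'pool_kib=%d ratio=%d.%02d\n' $(( pool / 1024 )) $(( ratio / 100 )) $(( ratio % 100 ))
}
```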

QEmu-kvm mM

May break bad when system is under memory pressure

 

Breaks swappiness algorithm

 

Consumes...

Cons

zswap

RAM.

QEmu-kvm mM

swap

common swap quotes

  • 'Swap is bad.'
  • 'My systems are swapless.'
  • 'If it is using swap, something is real real bad.'

pros

  • Simple
  • Very useful if properly used/configured
  • Can act as a safety net to avoid OOMs
  • Happily puts unneeded pages out of memory

cons

  • Memory starvation == I/O thrashing
  • I/O thrashing == unusable system for a (good?) while
  • Unsuitable for low-latency workloads

kswapd

The worst-named kernel component

  • It is NOT (just) about swap
  • Also exists (and can peg a CPU at 100%!) in swapless systems
  • Responsible for MEMORY RECLAIM.

¯\_(ツ)_/¯
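What kswapd actually does is driven by per-zone watermarks: it is woken when a zone's free pages fall below the `low` watermark and reclaims until they are back above `high`; below `min`, allocating tasks have to reclaim memory themselves (direct reclaim). The sketch below sums the three watermarks over all zones listed in /proc/zoneinfo (values are in pages). The function name is hypothetical, and the file argument exists so it can be tested against a sample.

```sh
#!/bin/sh
# Sum the min/low/high watermarks (in pages) of all zones in a zoneinfo file.
# $1: zoneinfo file (default: /proc/zoneinfo)
zone_watermarks() {
    awk '$1 == "min"  && NF == 2 { min  += $2 }
         $1 == "low"  && NF == 2 { low  += $2 }
         $1 == "high" && NF == 2 { high += $2 }
         END { printf "min=%d low=%d high=%d\n", min, low, high }' "${1:-/proc/zoneinfo}"
}
```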

How to Overcommit RAM

Mix of solutions: KSM and... Swap!

the fine print

  • Make sure you really, really know what you are doing: overcommitting memory adds extra risk to your operation.
  • In this test scenario we are using several similar VMs (all RHEL systems).
  • Your mileage may vary.

how we will use it

  • The kernel identifies 'cold' pages from the guests
  • Those pages become swap candidates
  • When the hypervisor is under memory pressure, it writes those pages out to the swap device
  • Pages are read back from swap when needed
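On the host, guest RAM is anonymous memory of the qemu-kvm processes, so the kernel's anonymous LRU lists give a rough idea of how much of it is 'cold': pages on the inactive anon list are the ones reclaim considers first for swap-out. The sketch reads `Active(anon)` and `Inactive(anon)` from /proc/meminfo; the function name is hypothetical and the file argument exists for testing.

```sh
#!/bin/sh
# Show active vs. inactive anonymous memory (kB) and the inactive share.
# $1: meminfo file (default: /proc/meminfo)
anon_lru() {
    awk '/^Active\(anon\):/   { a = $2 }
         /^Inactive\(anon\):/ { i = $2 }
         END {
             pct = (a + i) ? int(i * 100 / (a + i)) : 0
             printf "active=%d kB inactive=%d kB (%d%% inactive)\n", a, i, pct
         }' "${1:-/proc/meminfo}"
}
```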

rationale

  • Not all memory is hot all the time
  • Not all VMs have all of their RAM hot all the time
  • Not even all the VMs are hot all the time

rationale

  • Identical pages: merged by KSM (but it reacts slowly)
  • Swap acts as a safety net until KSM kicks in
  • Preserve 'cached' memory as much as possible
  • If out of 'free' memory, prefer dumping cold pages to swap

Expected operation

Expected routine: A few sparse pgin/pgout ops
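A quick way to check that the host stays in this 'few sparse operations' regime is to sample the cumulative counters `pswpin` and `pswpout` in /proc/vmstat (pages swapped in and out since boot) and compute a rate; unlike `pgpgin`/`pgpgout`, they count only swap traffic. `vmstat 1` shows similar information in its `si`/`so` columns. The function below is a hypothetical helper that compares two saved snapshots.

```sh
#!/bin/sh
# Swap-in/swap-out rate (pages per second) between two /proc/vmstat snapshots.
# $1: older snapshot, $2: newer snapshot, $3: seconds between them
swap_rate() {
    local in1 out1 in2 out2
    in1=$(awk '$1 == "pswpin"  { print $2 }' "$1")
    out1=$(awk '$1 == "pswpout" { print $2 }' "$1")
    in2=$(awk '$1 == "pswpin"  { print $2 }' "$2")
    out2=$(awk '$1 == "pswpout" { print $2 }' "$2")
    echo "pswpin/s=$(( (in2 - in1) / $3 )) pswpout/s=$(( (out2 - out1) / $3 ))"
}

# Usage on a live host:
#   cat /proc/vmstat > /tmp/vmstat.1; sleep 10; cat /proc/vmstat > /tmp/vmstat.2
#   swap_rate /tmp/vmstat.1 /tmp/vmstat.2 10
```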

When things get out of hand

tweaking

Be conservative

Swap size: 75% of your overcommitted RAM

Ex.: host with 1 TB RAM, overcommit 1.5:1 (500 GB overcommitted)

  • 375 GB swap
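The sizing rule works out as swap = 75% × (host RAM × (ratio - 1)): with 1 TB (taken as 1000 GB) at 1.5:1, guests get 1500 GB, 500 GB of that is overcommitted, and 75% of 500 GB is 375 GB. A small helper for other sizes is sketched below; the name is hypothetical, and the ratio is passed as a percentage (150 for 1.5:1) because shell arithmetic is integer-only.

```sh
#!/bin/sh
# Swap size = 75% of the overcommitted RAM.
# $1: host RAM in GB, $2: overcommit ratio in percent (150 means 1.5:1)
swap_size_gb() {
    if [ "$2" -le 100 ]; then
        echo 0                     # no overcommit, nothing to cover
        return
    fi
    echo $(( $1 * ($2 - 100) / 100 * 75 / 100 ))
}

swap_size_gb 1000 150    # prints 375
```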

tweaking

vm.swappiness = 100

Rationale: preserve as much page cache as possible and shed cold pages to swap ASAP, avoiding reclaim thrashing by kswapd when memory runs low
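To apply the setting immediately, `sysctl -w vm.swappiness=100` is enough; to keep it across reboots it goes into a file under /etc/sysctl.d/. The helper below writes such a file. The file name `90-kvm-overcommit.conf` is a hypothetical choice, and the target path can be overridden, which is how the example is tested.

```sh
#!/bin/sh
# Persist vm.swappiness in a sysctl.d drop-in file.
# $1: value, $2: target file (hypothetical default name below)
write_swappiness_conf() {
    local conf
    conf=${2:-/etc/sysctl.d/90-kvm-overcommit.conf}
    printf 'vm.swappiness = %s\n' "$1" > "$conf"
}

# As root:
#   write_swappiness_conf 100 && sysctl --system
#   cat /proc/sys/vm/swappiness     # should now print 100
```

`sysctl --system` reloads every sysctl configuration file, so the new value takes effect without a reboot.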

tweaking

requirements

High-throughput swap volume

  • Local NVMe, SSD

or

  • High-end storage LUN

or

  • Mid-end storage LUN + striped LVM
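For the mid-end LUN + striped LVM option, the swap volume is striped across several physical volumes so that page-in/page-out I/O is spread over all of them. The helper below only prints the commands (LVM2 `pvcreate`/`vgcreate`/`lvcreate`, then `mkswap`/`swapon`) so they can be reviewed before running. The volume group name, device names and the 64 KiB stripe size are assumptions for illustration, not recommendations from this deck.

```sh
#!/bin/sh
# Print (do not run) the commands for a swap LV striped over all given PVs.
# $1: volume group name, $2: LV size (e.g. 375G), remaining args: devices
swap_lv_commands() {
    local vg size
    vg=$1
    size=$2
    shift 2
    echo "pvcreate $*"
    echo "vgcreate $vg $*"
    echo "lvcreate --type striped -i $# -I 64k -L $size -n swap $vg"
    echo "mkswap /dev/$vg/swap"
    echo "swapon /dev/$vg/swap"
}

# Hypothetical VG and devices
swap_lv_commands vg_swap 375G /dev/sdb /dev/sdc /dev/sdd
```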

pros

Easy implementation

qemu-kvm is treated as a regular process, with no special tricks needed to make use of swap

Robustness against memory consumption surges

No need for external control

Cons

  • Does not work with hugepages
  • Needs extra volumes
  • Excessive overcommit can lead to I/O thrashing
  • Watch out for events that can wake up too many VMs at the same time: they can lead to I/O thrashing too

Remember!

  • The operating system uses memory!
  • The hypervisor too
  • Ancillary services too
  • And page tables too
  • And {insert name here} too

Sometimes even 1:1 (no overcommit) might not be attainable (no KSM, hugepages in use, heterogeneous workloads), because of all of the above.
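Some of that non-guest memory is visible in /proc/meminfo: page tables (`PageTables`), slab caches (`Slab`) and kernel stacks (`KernelStack`). The sketch below sums those three fields. It is only a partial view (qemu-kvm process overhead and ancillary services are not included); the function name is hypothetical and the file argument exists for testing.

```sh
#!/bin/sh
# Sum a few kernel-side memory consumers from a meminfo file (result in kB).
# $1: meminfo file (default: /proc/meminfo)
kernel_overhead_kb() {
    awk '$1 == "PageTables:" || $1 == "Slab:" || $1 == "KernelStack:" { sum += $2 }
         END { print sum + 0 }' "${1:-/proc/meminfo}"
}
```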

My Test Environment

  • W541, 32 GB RAM
  • 3x SSD
  • Quick-and-dirty environments for tests and reproducers
  • Plus... a Satellite server!

the demo system

recap

  • Fast swap back-end
  • Swap: ¾ of your overcommitted RAM
  • vm.swappiness=100
  • ksm = running
  • Give KSM a few minutes to dedupe
  • Keep an eye on your stats.
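The recap items that can be read from the system are easy to check in one go. The sketch below looks at `vm.swappiness`, the KSM `run` knob and `SwapTotal`; it does not judge swap sizing or back-end speed. The function name is hypothetical, and the optional root-prefix argument exists only so it can be tested against fake files.

```sh
#!/bin/sh
# Check a host against the recap: swappiness 100, KSM running, swap present.
# $1: optional root prefix (used for testing); empty means the live system
check_overcommit_host() {
    local root swappiness ksm swap_kb
    root=${1:-}
    swappiness=$(cat "$root/proc/sys/vm/swappiness")
    ksm=$(cat "$root/sys/kernel/mm/ksm/run")
    swap_kb=$(awk '$1 == "SwapTotal:" { print $2 }' "$root/proc/meminfo")

    if [ "$swappiness" -eq 100 ]; then echo "OK   vm.swappiness=$swappiness"
    else echo "WARN vm.swappiness=$swappiness (recap: 100)"; fi

    if [ "$ksm" -eq 1 ]; then echo "OK   ksm run=1"
    else echo "WARN ksm run=$ksm (recap: 1 = running)"; fi

    if [ "${swap_kb:-0}" -gt 0 ]; then echo "OK   SwapTotal=${swap_kb} kB"
    else echo "WARN no swap configured"; fi
}
```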

Bonus: Monitoring ksm

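#!/bin/bash
# Refresh KSM statistics every second (Ctrl+C to stop)
KSM=/sys/kernel/mm/ksm
PAGE_SIZE=$(getconf PAGESIZE)      # usually 4096 bytes on x86_64
while true ; do
   clear
   echo "KSM state: $(cat $KSM/run)   (0 = stopped, 1 = running, 2 = unmerge all)"
   # pages_sharing: how many more mappings share merged pages, i.e. memory saved
   echo "Saved (deduped) memory: $(numfmt --to=si $(( $(cat $KSM/pages_sharing) * PAGE_SIZE )))"
   # pages_shared: how many merged pages are actually kept in memory
   echo "Actual size in memory : $(numfmt --to=si $(( $(cat $KSM/pages_shared) * PAGE_SIZE )))"
   grep -E '^(Dirty|MemFree|Cached):' /proc/meminfo
   sleep 1
done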