Most things in the kernel use 32-bit reference counters, relying on the fact that the memory constraints of real computers make it impossible to create enough references to overflow the counters. There are exceptions for things like `struct file` because it is possible to create references to them with relatively little memory usage. Using BPF_MAP_TYPE_PROG_ARRAY maps, it is possible to create references to BPF programs that only need sizeof(void*) bytes each (8 bytes on amd64), permitting an overflow after filling ~32GB of memory that is subject to RLIMIT_MEMLOCK restrictions. The requirement for more than 32GB of RAM is relatively high, but not impossible. The requirement that the allocations need to be below RLIMIT_MEMLOCK is probably the bigger obstacle for exploitation: On most Linux systems, every user is only permitted to allocate up to 64KiB of RAM. However: - There are systems where RLIMIT_MEMLOCK is disabled administratively. - On systems with containers (e.g. LXC containers), usually every container's root user has access to 2^16 different UIDs. If an attacker has control over 9 containers and can share file descriptors between them or has control over one container with a relatively high number of mapped UIDs, he should be able to trigger the overflow. The attached PoC, when run in a Ubuntu 16.04 VM with 40GB RAM and the RLIMIT_MEMLOCK limit disabled, needs 25 minutes to execute and causes the following oops: [ 1850.676543] BUG: unable to handle kernel paging request at ffffc900069c5010 [ 1850.676550] IP: [] bpf_prog_put_rcu+0x5/0x30 [ 1850.676556] PGD 9bc094067 PUD 9bc095067 PMD 9b4d2b067 PTE 0 [ 1850.676558] Oops: 0000 [#1] SMP [ 1850.676561] Modules linked in: nls_utf8 isofs vboxsf(OE) snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi snd_seq_midi_event joydev snd_rawmidi snd_seq snd_seq_device snd_timer input_leds snd serio_raw soundcore vboxvideo(OE) 8250_fintek drm i2c_piix4 vboxguest(OE) mac_hid parport_pc ppdev lp parport autofs4 hid_generic usbhid hid psmouse ahci libahci fjes video e1000 pata_acpi [ 1850.676579] CPU: 0 PID: 1861 Comm: overflow Tainted: G OE 4.4.0-21-generic #37-Ubuntu [ 1850.676581] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 1850.676582] task: ffff8809b2fe4b00 ti: ffff8809b2f3c000 task.ti: ffff8809b2f3c000 [ 1850.676583] RIP: 0010:[] [] bpf_prog_put_rcu+0x5/0x30 [ 1850.676585] RSP: 0018:ffff8809b2f3fdb8 EFLAGS: 00010286 [ 1850.676586] RAX: ffffffff81a24f20 RBX: 0000000000000000 RCX: 0000000000000001 [ 1850.676587] RDX: ffff880230ebc110 RSI: ffff880230ebc100 RDI: ffffc900069c5000 [ 1850.676588] RBP: ffff8809b2f3fdc0 R08: 0000000000000000 R09: 0000000000000000 [ 1850.676589] R10: ffff8809b55468e0 R11: ffff880230ebc110 R12: ffffc90814ce6060 [ 1850.676590] R13: ffffc90814ce6000 R14: ffff8809b5a9d1a0 R15: ffff8809b29cf480 [ 1850.676592] FS: 00007fbe54cf5700(0000) GS:ffff8809e3c00000(0000) knlGS:0000000000000000 [ 1850.676593] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 1850.676594] CR2: ffffc900069c5010 CR3: 00000009ae9ce000 CR4: 00000000000006f0 [ 1850.676598] Stack: [ 1850.676599] ffffffff8117810e ffff8809b2f3fde8 ffffffff811783c6 ffffc90814ce6000 [ 1850.676600] 0000000000000008 ffff8809b55468e0 ffff8809b2f3fdf8 ffffffff811729bd [ 1850.676602] ffff8809b2f3fe10 ffffffff811733b9 ffff880230ebc100 ffff8809b2f3fe58 [ 1850.676603] Call Trace: [ 1850.676607] [] ? prog_fd_array_put_ptr+0xe/0x10 [ 1850.676609] [] bpf_fd_array_map_clear+0x36/0x50 [ 1850.676611] [] bpf_map_put_uref+0x1d/0x20 [ 1850.676612] [] bpf_map_release+0x19/0x30 [ 1850.676616] [] __fput+0xe4/0x220 [ 1850.676617] [] ____fput+0xe/0x10 [ 1850.676621] [] task_work_run+0x73/0x90 [ 1850.676625] [] do_exit+0x2e4/0xae0 [ 1850.676626] [] do_group_exit+0x43/0xb0 [ 1850.676628] [] SyS_exit_group+0x14/0x20 [ 1850.676632] [] entry_SYSCALL_64_fastpath+0x16/0x71 [ 1850.676633] Code: cf 00 55 48 89 e5 48 89 78 08 48 89 07 48 c7 47 08 60 55 e6 81 48 89 3d 4a 20 cf 00 5d c3 0f 1f 84 00 00 00 00 00 66 66 66 66 90 <48> 8b 47 10 3e ff 08 74 01 c3 55 48 8b 7f 10 48 c7 c6 20 2f 17 [ 1850.676649] RIP [] bpf_prog_put_rcu+0x5/0x30 [ 1850.676650] RSP [ 1850.676651] CR2: ffffc900069c5010 [ 1850.676653] ---[ end trace 90333448b9273067 ]--- [ 1850.676655] Fixing recursive fault but reboot is needed! I believe that this issue illustrates that reference count hardening makes sense, even without reference leaks. A suggested patch (compile-tested) is attached. Fixed in https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/kernel/bpf?id=92117d8443bc5afacc8d5ba82e541946310f106e # Iranian Exploit DataBase = http://IeDb.Ir [2016-05-10]