Source: https://bugs.chromium.org/p/project-zero/issues/detail?id=981

As part of Samsung KNOX, Samsung phones include a security hypervisor called RKP (Real-time Kernel Protection), running in EL2. This hypervisor is meant to ensure that the HLOS kernel running in EL1 remains protected from exploits and aims to prevent privilege escalation attacks by "shielding" certain data structures within the hypervisor.

One of the protections implemented by RKP is a security policy meant to ensure that only the "authentic" kernel code pages are executable from EL1. This mitigation is achieved by combining a few memory protection policies together, namely:

  -All pages with the exception of the kernel code are marked PXN
  -All kernel code pages are marked read-only in the stage 2 translation table
  -Kernel data pages are never marked executable
  -Kernel code pages are never marked writable

(for more information, see https://www2.samsungknox.com/en/blog/real-time-kernel-protection-rkp)

In order to explore this mitigation technique, I've written a small tool to dump the stage 1 and stage 2 translation tables for EL0/EL1.

First, the initial stage 2 translation table is embedded in the VMM code, so it can be statically retrieved and analysed. Here is a short snippet from the initial stage 2 translation table (the addresses here are PAs, although RKP implements a one-to-one PA<->IPA translation, barring memory protections):

...
0x80000000-0x80200000: S2AP=11, XN=0
0x80200000-0x80400000: S2AP=11, XN=0
0x80400000-0x80600000: S2AP=11, XN=0
0x80600000-0x80800000: S2AP=11, XN=0
0x80800000-0x80a00000: S2AP=11, XN=0
0x80a00000-0x80c00000: S2AP=11, XN=0
0x80c00000-0x80e00000: S2AP=11, XN=0
0x80e00000-0x81000000: S2AP=11, XN=0
0x81000000-0x81200000: S2AP=11, XN=0
0x81200000-0x81400000: S2AP=11, XN=0
0x81400000-0x81600000: S2AP=11, XN=0
...

The physical address range above corresponds with the physical address range in which the kernel code is located.
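
To make the S2AP and XN columns in the dump concrete, here is a minimal sketch of how those two fields could be read out of a raw 64-bit stage 2 block or page descriptor. It is not the dumping tool mentioned above; the struct and function names are hypothetical, and the bit positions assume the ARMv8.0-A VMSAv8-64 long-descriptor format (S2AP in bits [7:6], XN in bit 54) without the later two-bit XN extension.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed ARMv8.0-A stage 2 block/page descriptor layout (VMSAv8-64). */
#define S2_DESC_VALID  (1ULL << 0)   /* bit 0: descriptor is valid */
#define S2_S2AP_SHIFT  6             /* S2AP[1:0] occupies bits [7:6] */
#define S2_S2AP_MASK   0x3ULL
#define S2_XN_BIT      (1ULL << 54)  /* execute-never */

/* Hypothetical decoded view, named for this sketch only. */
struct s2_perms {
    bool readable;  /* S2AP bit 0 set: reads permitted */
    bool writable;  /* S2AP bit 1 set: writes permitted */
    bool xn_clear;  /* XN == 0: stage 2 does not forbid execution */
};

static struct s2_perms s2_decode(uint64_t desc)
{
    struct s2_perms p = { false, false, false };

    /* An invalid descriptor grants nothing. */
    if (!(desc & S2_DESC_VALID))
        return p;

    /* S2AP encoding: 00 none, 01 read-only, 10 write-only, 11 read/write. */
    unsigned s2ap = (unsigned)((desc >> S2_S2AP_SHIFT) & S2_S2AP_MASK);
    assert(s2ap <= 3);

    p.readable = (s2ap & 0x1) != 0;
    p.writable = (s2ap & 0x2) != 0;
    p.xn_clear = (desc & S2_XN_BIT) == 0;
    return p;
}
```

Under these assumptions, a descriptor carrying S2AP=11 and XN=0, as in every row of the dump, decodes to readable, writable and not execute-never, i.e. RWX as far as stage 2 is concerned.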

As the stage 2 dump shows, this entire address range is mapped as RWX in the initial stage 2. However, RKP does not leave this area unprotected, as this might allow an attacker to subvert the kernel's integrity (by writing to the kernel's code pages).

When RKP is initialized (i.e., when the HVC command RKP_INIT is called from EL1), the HLOS kernel passes a structure containing the address ranges for the currently loaded kernel. Here is a short snippet from "rkp_init" (init/main.c):

static void rkp_init(void)
{
    rkp_init_t init;
    init.magic = RKP_INIT_MAGIC;
    init.vmalloc_start = VMALLOC_START;
    init.vmalloc_end = (u64)high_memory;
    init.init_mm_pgd = (u64)__pa(swapper_pg_dir);
    init.id_map_pgd = (u64)__pa(idmap_pg_dir);
    init.rkp_pgt_bitmap = (u64)__pa(rkp_pgt_bitmap);
    init.rkp_map_bitmap = (u64)__pa(rkp_map_bitmap);
    init.rkp_pgt_bitmap_size = RKP_PGT_BITMAP_LEN;
    init.zero_pg_addr = page_to_phys(empty_zero_page);
    init._text = (u64) _text;
    init._etext = (u64) _etext;
    if (!vmm_extra_mem) {
        printk(KERN_ERR"Disable RKP: Failed to allocate extra mem\n");
        return;
    }
    init.extra_memory_addr = __pa(vmm_extra_mem);
    init.extra_memory_size = 0x600000;
    init._srodata = (u64) __start_rodata;
    init._erodata = (u64) __end_rodata;
    init.large_memory = rkp_support_large_memory;
    rkp_call(RKP_INIT, (u64)&init, 0, 0, 0, 0);
}

Upon receiving this command, RKP changes the stage 2 permissions for the address range corresponding to the kernel text (from "init._text" to "init._etext") to read-only and executable, like so:

...
kernel_text_phys_start = rkp_get_pa(text);
kernel_text_phys_end = rkp_get_pa(etext);
rkp_debug_log("DEFERRED INIT START", 0LL, 0LL, 0LL);
if ( etext & 0x1FFFFF )
    rkp_debug_log("Kernel range is not aligned", 0LL, 0LL, 0LL);
if ( !rkp_s2_range_change_permission(kernel_text_phys_start, kernel_text_phys_end, 128LL, 1, 1) )
    rkp_debug_log("Failed to make Kernel range RO", 0LL, 0LL, 0LL);
rkp_l1pgt_process_table(init_mm_pgd, 1u, 1u);
...
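
The "etext & 0x1FFFFF" test in the RKP snippet is a 2 MiB alignment check, since 0x1FFFFF equals 0x200000 - 1 and the stage 2 entries in the dump each cover 0x200000 bytes. The following sketch illustrates that arithmetic only; the helper names are hypothetical and are not taken from RKP. Addresses within 2 MiB of the top of the 64-bit space would overflow the round-up helper and are not handled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BLOCK_2M_SIZE 0x200000ULL
#define BLOCK_2M_MASK (BLOCK_2M_SIZE - 1)  /* 0x1FFFFF, as in the RKP check */

/* True when addr sits exactly on a 2 MiB block boundary. */
static bool is_2m_aligned(uint64_t addr)
{
    return (addr & BLOCK_2M_MASK) == 0;
}

/* Start of the 2 MiB block containing addr. */
static uint64_t round_down_2m(uint64_t addr)
{
    return addr & ~BLOCK_2M_MASK;
}

/* First 2 MiB boundary at or after addr. */
static uint64_t round_up_2m(uint64_t addr)
{
    return (addr + BLOCK_2M_MASK) & ~BLOCK_2M_MASK;
}

/* Number of 2 MiB blocks that the half-open range [start, end) touches. */
static uint64_t blocks_2m_touched(uint64_t start, uint64_t end)
{
    assert(start <= end);
    if (start == end)
        return 0;
    return (round_up_2m(end) - round_down_2m(start)) / BLOCK_2M_SIZE;
}
```

For example, with a hypothetical end address of 0x80e12000, is_2m_aligned() returns false, which is the case in which the snippet logs "Kernel range is not aligned". As shown in the snippet, the unaligned case is only logged; the permission change is still attempted afterwards.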

However, notice that the RKP code shown earlier only marks the region from _text to _etext as read-only. This region is *strictly smaller* than the physical address range reserved for the kernel text region (in part in order to account for RKP's KASLR slide, which means the kernel can be placed at several offsets within this region).

If we take a look at the stage 1 translation table from TTBR1_EL1, we can see that the kernel code pages are allocated using L2 block descriptors (i.e., a large granularity), like so:

...
[256] L1 table [PXNTable: 0, APTable: 0]
  [  0] 0x080000000-0x080200000 [PXN: 0, UXN: 1, AP: 0]
  [  1] 0x080200000-0x080400000 [PXN: 0, UXN: 1, AP: 0]
  [  2] 0x080400000-0x080600000 [PXN: 0, UXN: 1, AP: 0]
  [  3] 0x080600000-0x080800000 [PXN: 0, UXN: 1, AP: 0]
  [  4] 0x080800000-0x080a00000 [PXN: 0, UXN: 1, AP: 0]
  [  5] 0x080a00000-0x080c00000 [PXN: 0, UXN: 1, AP: 0]
  [  6] 0x080c00000-0x080e00000 [PXN: 0, UXN: 1, AP: 0]
  [  7] 0x080e00000-0x081000000 [PXN: 0, UXN: 1, AP: 0]
  [  8] 0x081000000-0x081200000 [PXN: 0, UXN: 1, AP: 0]
  [  9] 0x081200000-0x081400000 [PXN: 0, UXN: 1, AP: 0]
  [ 10] 0x081400000-0x081600000 [PXN: 1, UXN: 1, AP: 0]
...

Moreover, as we can see above, the region 0x080000000-0x081400000 is marked as RWX in the stage 1 translation table (PXN: 0, AP: 0), even though the kernel code pages only take up a much smaller area within this region.

Combining these facts, we arrive at the conclusion that any address in the ranges 0x080000000-"_text" and "_etext"-0x081400000 is marked as RWX both in stage 1 and stage 2, even after RKP is initialized.

This issue can be reproduced by simply writing code to any of these memory regions in EL1 and executing it directly (e.g., writing code to address 0xffffffc000000000 in the kernel's VAS).

Proof of Concept:
https://github.com/offensive-security/exploit-database-bin-sploits/raw/master/sploits/41217.zip
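
The two residual windows described in the conclusion can be written down as simple range arithmetic. The sketch below is illustrative only: the type and function names are hypothetical, and it assumes that RKP protects exactly the physical range between "_text" and "_etext" without rounding it out to block boundaries. The physical addresses of "_text" and "_etext" depend on the device and on the KASLR slide, so any concrete values passed in are placeholders.

```c
#include <assert.h>
#include <stdint.h>

/* Half-open physical address range [start, end). */
struct pa_range {
    uint64_t start;
    uint64_t end;
};

/* The two windows left RWX around the protected kernel text. */
struct residual_rwx {
    struct pa_range below_text;   /* region start up to _text   */
    struct pa_range above_etext;  /* _etext up to region end    */
};

/*
 * region_start/region_end: the range that is executable in stage 1 and
 * RWX in the initial stage 2 (0x80000000-0x81400000 in the dumps).
 * text_pa/etext_pa: physical addresses of _text and _etext (placeholders).
 */
static struct residual_rwx residual_windows(uint64_t region_start,
                                            uint64_t region_end,
                                            uint64_t text_pa,
                                            uint64_t etext_pa)
{
    assert(region_start <= text_pa);
    assert(text_pa <= etext_pa);
    assert(etext_pa <= region_end);

    struct residual_rwx r = {
        { region_start, text_pa },
        { etext_pa, region_end },
    };
    return r;
}

/* Size in bytes of a half-open range. */
static uint64_t range_len(struct pa_range r)
{
    return r.end - r.start;
}
```

With placeholder values of 0x80080000 for "_text" and 0x80e00000 for "_etext", the windows would be 0x80000000-0x80080000 and 0x80e00000-0x81400000. Either window is non-empty whenever the kernel text does not fill the whole reserved region, which is the condition the report relies on.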