Reverse Code Engineering Tutorial Part 2 Hackingloops

Let’s continue our tutorial on reverse engineering. Today I will teach you the basic assembly language you need for reverse engineering. Assembly language is central to reverse engineering: we must know what the registers are and what each register is used for. Understanding how assembly instructions work, and how to relate them to code written in high-level languages (C, Java, VB, etc.), is essential knowledge for hacking software.

Reverse Engineering Hacking class 2 – Introduction to assembly language

What is Assembly Language?

Assembly language is a low-level language in which each machine instruction is written as a human-readable mnemonic (such as mov or add). Assembly language is specific to a processor architecture; for example, assembly for the x86 architecture is not the same as assembly for the SPARC architecture. Assembly code is made up of instructions that operate on CPU registers and memory. This tutorial assumes the x86 architecture. We will start with CPU registers.

CPU registers – Brief Introduction:

First of all, what are registers? Most computer and electronics engineers know about them, but for everyone else: registers are small, very fast storage locations inside the CPU that hold temporary data. Some registers have specific functions; others are used for general data storage. I am assuming that you are all using x86 machines. x86 processors come in 32-bit and 64-bit versions. In a 32-bit processor each general-purpose register holds 32 bits of data, while in a 64-bit processor each holds 64 bits. This tutorial assumes a 32-bit processor; I will cover 64-bit in later classes.

There are many registers, but for reverse engineering we HackingLoops users are mainly interested in nine of them: the eight general-purpose registers plus the instruction pointer (EIP):

EAX
EBX
ECX
EDX
ESI
EDI
ESP
EBP
EIP

All of these registers serve different purposes, so I will explain them one by one for a clearer understanding of register concepts. I am putting extra emphasis on them because these registers are the heart of reverse engineering.

EAX is the accumulator register, used to store the results of calculations. By convention, when a function returns a value, that value is placed in EAX, so after a call instruction the caller reads the function's result from EAX.

Note: EAX can also hold ordinary values that have nothing to do with calculations.

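EDX is the data register. It is basically an extension of EAX that helps it hold extra data in complex operations: for example, a 32-bit multiply stores the upper 32 bits of its 64-bit product in EDX and the lower 32 bits in EAX. It can also be used for general-purpose data storage.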

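ECX, also called the count register, is used as a counter in loops and repeated operations. The loop instruction decrements ECX and jumps back while it is non-zero, and the rep prefix repeats string instructions (such as copying or storing a string) ECX times.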

ESI and EDI are used by loops that process data. ESI is the source index: it holds the location of the input data stream. EDI is the destination index: it points to the location where the result of the data operation is stored.

ESP is the stack pointer and EBP is the base pointer. These registers are used to manage function calls and stack operations. When a function is called, its arguments are pushed onto the stack, followed by the return address (pushed by the call instruction). ESP always points to the top of the stack, so on entry to the function it points to the return address. EBP points to the base of the current function's stack frame and gives the function a fixed reference point for its arguments and local variables.

EBX has no special role in most instructions, so it is commonly used for extra storage.

EIP is the register that points to the current instruction being executed. As the CPU moves through the binary executing code, EIP is updated to reflect the location where the execution is occurring.

The ‘E’ at the beginning of each register name stands for Extended. When a register is referred to by its extended name, all 32 bits of the register are being addressed. An interesting thing about registers is that they can be broken down into smaller subsets of themselves: the lower sixteen bits of each register can be referenced by simply removing the ‘E’ from the name. For example, if you want to manipulate only the lower sixteen bits of the EAX register, you refer to it as AX. Additionally, AX, BX, CX and DX can each be further broken down into two eight-bit parts. If you want to manipulate only the low eight bits (bits 0-7) of AX, you refer to the register as AL; if you want to manipulate the high eight bits (bits 8-15) of AX, you refer to it as AH (‘L’ stands for Low and ‘H’ for High).

Introduction to Memory and Stacks:

There are three main sections of memory:

1. Stack Section – Where the stack is located, stores local variables and function arguments.

2. Data Section – Stores global and static variables; the heap, which holds dynamically allocated variables, is located here too.

3. Code Section – Where the actual program instructions are located.

The stack starts at the high memory addresses and grows downwards, towards the lower memory addresses; conversely, the heap starts at the lower memory addresses and grows upwards, towards the high memory addresses. Therefore the stack and the heap grow towards each other as more variables are placed in each of those sections. The figure below shows this layout.

High Memory Addresses (0xFFFFFFFF)
+---------------------+ <-- Bottom of the stack
|                     |
|        Stack        |  |  Stack grows down
|                     |  v
|---------------------| <-- Top of the stack (ESP points here)
|                     |
|                     |
|                     |
|---------------------| <-- Top of the heap
|                     |  ^
|        Heap         |  |  Heap grows up
|                     |
|---------------------| <-- Bottom of the heap
|                     |
|    Instructions     |
|                     |
+---------------------+
Low Memory Addresses (0x00000000)

Some Essential Assembly Instructions for Reverse Engineering:

Instruction	Example	Description
push	push eax	Pushes the value stored in EAX onto the stack
pop	pop eax	Pops a value off of the stack and stores it in EAX
call	call 0x08abcdef	Pushes the return address onto the stack and calls the function located at 0x08abcdef
mov	mov eax,0x5	Moves the value of 5 into the EAX register
sub	sub eax,0x4	Subtracts 4 from the value in the EAX register
add	add eax,0x1	Adds 1 to the value in the EAX register
inc	inc eax	Increases the value stored in EAX by one
dec	dec eax	Decreases the value stored in EAX by one
cmp	cmp eax,edx	Compares EAX and EDX by subtracting EDX from EAX without storing the result; if they are equal, sets the zero flag* to 1
test	test eax,edx	Performs an AND operation on the values in EAX and EDX without storing the result; if the result is zero, sets the zero flag to 1
jmp	jmp 0x08abcde	Jump to the instruction located at 0x08abcde
jnz	jnz 0x08ffff01	Jump to 0x08ffff01 if the zero flag is 0 (not set)
jne	jne 0x08ffff01	Jump to 0x08ffff01 if a comparison is not equal (the same instruction as jnz)
and	and eax,ebx	Performs a bitwise AND operation on the values stored in EAX and EBX; the result is saved in EAX
or	or eax,ebx	Performs a bitwise OR operation on the values stored in EAX and EBX; the result is saved in EAX
xor	xor eax,eax	Performs a bitwise XOR of EAX with itself; the result (always 0) is saved in EAX, a common way to zero a register
leave	leave	Restores the caller's stack frame before returning (equivalent to mov esp,ebp followed by pop ebp)
ret	ret	Pops the return address off the stack into EIP, returning to the calling function
nop	nop	No operation (a ‘do nothing’ instruction)

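*The zero flag (ZF) is a 1-bit indicator in the EFLAGS register. It is set to 1 when the result of an operation, such as the subtraction performed by cmp or the AND performed by test, is zero, and cleared to 0 otherwise.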

Each instruction performs one specific task and can work directly with registers, memory addresses, and their contents. It is easiest to understand what these instructions are used for by seeing them in the context of a simple hello world program and relating the assembly to a high-level language such as C.

Here is a simple C program that displays Hello World:

#include <stdio.h>

int main(int argc, char *argv[])
{
    printf("Hello World!\n");
    return 0;
}

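Save this program as helloworld.c and compile it with ‘gcc -o helloworld helloworld.c’ (on a 64-bit system, add -m32 to get 32-bit code); run the resulting binary and it should print “Hello World!” on the screen and exit. It looks quite simple. Now let’s see what it looks like in assembly language. The exact instructions depend on your compiler version and flags, so your own disassembly may differ slightly from the listing below.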

0x8048384   push ebp                        <-- Save the caller's EBP value on the stack
0x8048385   mov ebp,esp                     <-- Create a new EBP value for this function
0x8048387   sub esp,0x8                     <-- Allocate 8 bytes on the stack for local variables
0x804838a   and esp,0xfffffff0              <-- Clear the low 4 bits of ESP (align the stack to 16 bytes)
0x804838d   mov eax,0x0                     <-- Place a zero in the EAX register
0x8048392   sub esp,eax                     <-- Subtract EAX (0) from the value in ESP
0x8048394   mov DWORD PTR [esp],0x80484c4   <-- Place the argument for printf() (the string's address, 0x080484c4) onto the stack
0x804839b   call 0x80482b0 <_init+56>       <-- Call printf()
0x80483a0   mov eax,0x0                     <-- Put our return value (0) into EAX
0x80483a5   leave                           <-- Clean up the local variables and restore the EBP value
0x80483a6   ret                             <-- Pop the saved EIP value (return address) back into the EIP register

As you can see, these instructions mirror the C program, and the flow of the program is the same. Of course it is: this is the disassembly of the very binary (executable) that gcc produced from the C program above.

A quick tip for learning assembly language: take some ready-made code, compile it into a binary or exe file, disassemble that binary (for example with objdump -d -M intel helloworld on Linux), and try to relate the assembly code to the high-level code. I guarantee this will help you understand the process better.
