1. Introduction to x86 Assembly
What is x86 Assembly?
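In a nutshell, assembly is a low-level programming language whose instructions map almost one-to-one onto the machine instructions a CPU executes; an assembler translates the text into that machine code. Each instruction consists of a mnemonic (which the assembler encodes as an opcode) followed by zero or more operands, which can be registers, constant values, or memory addresses. Some instructions also take a prefix (such as rep or lock), and many update status flags as a side effect.
The "x86" in the name refers to the processor architecture the code is written for: the instruction set family that began with Intel's 8086 and is used by today's Intel and AMD desktop and server processors. Every CPU architecture has its own assembly language, so x86 processors understand different instructions than ARM processors, and x86 assembly will not run natively on an ARM chip. This course targets 32-bit x86.
This course is a moderately deep dive into x86 assembly language and should give you enough understanding of how assembly works to build a working program.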
Registers
In a sentence: registers are small, fast storage locations inside the CPU that are used to hold temporary data during execution.
Each register has a conventional purpose, but most of them can also be used for general-purpose work. In this section we describe the 32-bit registers of the x86 architecture.
General purpose registers
EAX
Purpose:
The Accumulator register.
Common usage:
It is normally used for arithmetic operations, such as:
add, sub, mul. It is also used to store the results of calculations and, by convention, a function's return value.
Example of usage:
mov eax, 15 ; loads 15 into EAX register
add eax, 15 ; EAX register now holds 30
EBX
Purpose:
The Base register
Common usage:
Usually used as a pointer to data in memory, but can also be used for arithmetic purposes.
Example of usage:
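A minimal sketch, assuming 32-bit Linux and the nasm/ld build commands from the Compiling Assembly Code section; the label numbers is hypothetical. EBX holds the address of a small array, and the code reads each element through it. After running the program, echo $? should print 21.

```nasm
; Sketch for 32-bit Linux (int 0x80 system calls); the label name is illustrative.
section .data
    numbers dd 5, 7, 9          ; three 32-bit integers stored back to back

section .text
    global _start
_start:
    mov ebx, numbers            ; EBX = address of the first integer
    mov eax, [ebx]              ; EAX = 5
    add eax, [ebx + 4]          ; EAX = 5 + 7 = 12 (each dd value is 4 bytes)
    add eax, [ebx + 8]          ; EAX = 12 + 9 = 21

    mov ebx, eax                ; EBX reused for arithmetic data: exit status = 21
    mov eax, 1                  ; system call 1 = sys_exit
    int 0x80
```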
ECX
Purpose:
The Counter register
Common usage:
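Mostly used as a loop counter (the loop instruction and the rep prefix both count down in ECX) and for string/memory operations. Its low byte, CL, holds the shift count for variable shifts.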
Example of usage:
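A minimal sketch under the same assumptions (32-bit Linux; the label .next is hypothetical). The loop instruction decrements ECX and jumps back while ECX is not zero, so this adds 5 + 4 + 3 + 2 + 1 and exits with status 15.

```nasm
; Sketch for 32-bit Linux (int 0x80 system calls); the label name is illustrative.
section .text
    global _start
_start:
    xor eax, eax                ; EAX = running total, starts at 0
    mov ecx, 5                  ; ECX = loop counter
.next:
    add eax, ecx                ; add the current counter value (5, 4, 3, 2, 1)
    loop .next                  ; ECX = ECX - 1, jump back to .next while ECX != 0

    mov ebx, eax                ; exit status = 15
    mov eax, 1                  ; sys_exit
    int 0x80
```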
EDX
Purpose:
The Data register
Common usage:
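Works with the EAX register for multiplication and division: mul places the upper 32 bits of the product in EDX, and div divides the 64-bit value in EDX:EAX, leaving the remainder in EDX. In general it holds parts of large results or data.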
Example of usage:
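A minimal sketch under the same assumptions (32-bit Linux), showing EDX as the upper half of a mul result and as the remainder of a div. The exit status, 7, is simply the sum of the values described in the comments, so you can check it with echo $?.

```nasm
; Sketch for 32-bit Linux (int 0x80 system calls).
section .text
    global _start
_start:
    ; mul ecx computes EDX:EAX = EAX * ECX
    mov eax, 0x80000000         ; 2^31
    mov ecx, 4
    mul ecx                     ; 2^33 does not fit in 32 bits:
                                ; EDX = 2 (upper half), EAX = 0 (lower half)
    mov ebx, edx                ; EBX = 2

    ; div ecx divides the 64-bit value EDX:EAX by ECX
    mov eax, 17
    xor edx, edx                ; clear EDX so the dividend is just 17
    mov ecx, 5
    div ecx                     ; EAX = 3 (quotient), EDX = 2 (remainder)

    add ebx, eax                ; 2 + 3 = 5
    add ebx, edx                ; 5 + 2 = 7, used as the exit status
    mov eax, 1                  ; sys_exit
    int 0x80
```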
NOTE: A 32-bit register cannot hold a 64-bit value, which is why EDX and EAX are sometimes used together as one 64-bit pair, written EDX:EAX, with EDX holding the upper 32 bits and EAX the lower 32 bits.
Pointer and index registers
ESI
Purpose:
The Source Index register
Common usage:
Mostly used by string operations to hold the source address for instructions like:
movsb, movsw, movsd
Example of usage:
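A minimal sketch, assuming 32-bit Linux; the labels are hypothetical. It uses lodsb, another string instruction that reads through ESI (it loads one byte into AL and advances ESI), to count the characters of a null-terminated string. The program exits with status 5.

```nasm
; Sketch for 32-bit Linux (int 0x80 system calls); label names are illustrative.
section .data
    text db "Hello", 0          ; null-terminated string

section .text
    global _start
_start:
    cld                         ; clear the direction flag so ESI counts upward
    mov esi, text               ; ESI = source address
    xor ecx, ecx                ; ECX = number of characters seen
.next_char:
    lodsb                       ; AL = byte at [ESI], then ESI = ESI + 1
    test al, al                 ; is it the terminating 0?
    jz .done
    inc ecx
    jmp .next_char
.done:
    mov ebx, ecx                ; exit status = 5, the length of "Hello"
    mov eax, 1                  ; sys_exit
    int 0x80
```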
EDI
Purpose:
The Destination Index register
Common usage:
Used by the same string instructions (such as movsb) to hold the destination address.
Example of usage:
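A minimal sketch, assuming 32-bit Linux; the labels src, len, and dst are hypothetical. rep movsb copies ECX bytes from the address in ESI to the address in EDI, so it shows both index registers working together. The program prints the copied text, Hi!, and exits with status 0.

```nasm
; Sketch for 32-bit Linux (int 0x80 system calls); label names are illustrative.
section .data
    src db "Hi!", 10            ; 4 bytes: "Hi!" plus a newline
    len equ $ - src             ; number of bytes to copy (4)

section .bss
    dst resb len                ; destination buffer, zero-filled at startup

section .text
    global _start
_start:
    cld                         ; direction flag clear: ESI and EDI move forward
    mov esi, src                ; ESI = source address
    mov edi, dst                ; EDI = destination address
    mov ecx, len                ; ECX = byte count for rep
    rep movsb                   ; copy [ESI] -> [EDI], advance both, repeat ECX times

    mov eax, 4                  ; sys_write: print the copied bytes
    mov ebx, 1                  ; file descriptor 1 = stdout
    mov ecx, dst                ; buffer to print
    mov edx, len                ; number of bytes
    int 0x80

    mov eax, 1                  ; sys_exit
    xor ebx, ebx                ; exit status 0
    int 0x80
```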
EBP
Purpose:
Base Pointer register
Common usage:
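Generally points to the base of the current stack frame, so function parameters and local variables can be referenced at fixed offsets from it (for example [ebp + 8] for the first argument and [ebp - 4] for the first local variable in a standard 32-bit frame).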
Example of usage:
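A minimal sketch of a conventional 32-bit stack frame, assuming 32-bit Linux and a cdecl-style convention in which the caller pushes arguments right to left and removes them afterwards; the function add_two is hypothetical. Inside the function, the arguments are read at positive offsets from EBP and the local variable at a negative offset. The program exits with status 12.

```nasm
; Sketch for 32-bit Linux (int 0x80 system calls); add_two is an illustrative name.
section .text
    global _start

; add_two(a, b): returns a + b in EAX
add_two:
    push ebp                    ; save the caller's base pointer
    mov ebp, esp                ; EBP now marks the base of this stack frame
    sub esp, 4                  ; reserve room for one local variable at [ebp - 4]

    mov eax, [ebp + 8]          ; first argument (a); [ebp + 4] is the return address
    add eax, [ebp + 12]         ; second argument (b)
    mov [ebp - 4], eax          ; store the sum in the local variable
    mov eax, [ebp - 4]          ; the return value goes in EAX

    mov esp, ebp                ; discard the local variable
    pop ebp                     ; restore the caller's base pointer
    ret                         ; pop the return address into EIP

_start:
    push dword 9                ; second argument (b)
    push dword 3                ; first argument (a)
    call add_two                ; push the return address, jump to add_two
    add esp, 8                  ; caller removes the two arguments

    mov ebx, eax                ; exit status = 3 + 9 = 12
    mov eax, 1                  ; sys_exit
    int 0x80
```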
ESP
Purpose:
Stack Pointer register
Common usage:
Points to the top of the stack. It is updated automatically by push, pop, call, and ret.
Example of usage:
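A minimal sketch, assuming 32-bit Linux: after two pushes, [esp] is the most recent item and [esp + 4] the one before it, and adding 8 to ESP discards both at once. The exit status is 20 - 10 = 10.

```nasm
; Sketch for 32-bit Linux (int 0x80 system calls).
section .text
    global _start
_start:
    push dword 10               ; ESP -= 4, [ESP] = 10
    push dword 20               ; ESP -= 4, [ESP] = 20 (new top of stack)
    mov eax, [esp]              ; EAX = 20: ESP points at the most recent push
    mov edx, [esp + 4]          ; EDX = 10: older items sit at higher addresses
    add esp, 8                  ; drop both items without popping (2 x 4 bytes)

    mov ebx, eax
    sub ebx, edx                ; exit status = 20 - 10 = 10
    mov eax, 1                  ; sys_exit
    int 0x80
```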
Special Purpose Registers
EIP
Purpose:
Instruction pointer
Common usage:
Holds the address of the next instruction to be executed. It cannot be read or written with ordinary instructions such as mov; it changes only through control flow instructions such as:
jmp, call, ret.
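A minimal sketch, assuming 32-bit Linux; the labels are hypothetical. EIP is never named directly: jmp loads it with a new address, and call pushes the address of the following instruction before jumping, which lets the called function read where execution will resume. ret then pops that address back into EIP. The program exits with status 2 when both steps behave as described.

```nasm
; Sketch for 32-bit Linux (int 0x80 system calls); label names are illustrative.
section .text
    global _start
_start:
    mov ebx, 1
    jmp skip                    ; jmp loads EIP with the address of skip
    mov ebx, 99                 ; never executed: EIP jumped over it
skip:
    call read_return_address    ; push the address of after_call, set EIP to the function
after_call:
    cmp eax, after_call         ; did the function see after_call as its return address?
    jne done
    inc ebx                     ; yes: EBX = 2
done:
    mov eax, 1                  ; sys_exit, status = EBX
    int 0x80

read_return_address:
    mov eax, [esp]              ; the value call pushed: where EIP will resume
    ret                         ; pop that address back into EIP
```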
EFLAGS
Purpose:
Flags register
Common usage:
Stores status flags that indicate operation results.
Flags:
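Zero Flag (ZF): set when the result of an operation is zero
Carry Flag (CF): set when an unsigned operation produces a carry out of, or a borrow into, the most significant bit
Sign Flag (SF): set when the result is negative (its most significant bit is 1)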
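A minimal sketch, assuming 32-bit Linux. The setz, setc, and sets instructions copy ZF, CF, and SF into byte registers without changing the flags, so the program can read all three after two subtractions. It packs them into the exit status as ZF + 2*CF + 4*SF, which gives 7.

```nasm
; Sketch for 32-bit Linux (int 0x80 system calls).
section .text
    global _start
_start:
    mov eax, 5
    sub eax, 5                  ; 5 - 5 = 0, so ZF = 1
    setz bl                     ; BL = 1 because ZF is set (setcc leaves flags unchanged)

    mov eax, 1
    sub eax, 2                  ; 1 - 2 needs a borrow: CF = 1
                                ; result 0xFFFFFFFF has its top bit set: SF = 1
    setc cl                     ; CL = 1 because CF is set
    sets dl                     ; DL = 1 because SF is set

    movzx ebx, bl               ; widen each flag bit to 32 bits
    movzx ecx, cl
    movzx edx, dl
    shl ecx, 1                  ; CF -> bit 1
    shl edx, 2                  ; SF -> bit 2
    or ebx, ecx
    or ebx, edx                 ; exit status = 1 + 2 + 4 = 7
    mov eax, 1                  ; sys_exit
    int 0x80
```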
Table breakdown:
Register   Role
EAX        Accumulator (arithmetic)
EBX        Base register
ECX        Counter (loops, shifts)
EDX        Data register
ESI        Source index
EDI        Destination index
ESP        Stack pointer
EBP        Base pointer (stack frame)
EIP        Instruction pointer
The Stack
Now that we've covered the registers, we need to learn about the stack. What's the stack? The stack is a region of memory that follows the last in, first out (LIFO) principle: the last item added is the first item removed.
The stack is used to store temporary data such as return addresses, local variables, and saved register values. The image below should give you a better understanding of the stack:
If you don't fully understand this yet, that's okay; it's a lot of information at once. Let's go through it. The basic principles are as follows:
The stack grows downwards in memory, from higher addresses to lower addresses.
The PUSH mnemonic adds an item to the top of the stack (when pushing a 32-bit value, ESP decreases by 4).
The POP mnemonic removes the most recently added item (ESP increases by 4).
The ESP register is the stack pointer: it holds the address of the top of the stack. For example, if you push 10, push 20, and then pop once, the pop returns 20 and 10 is left at the top of the stack, because the last item pushed is the first one popped.
When a function is called, the return address is pushed onto the stack as well, and in the common 32-bit calling conventions the caller pushes the function's arguments before the call.
In short, the stack grows downward in memory, push and pop add and remove items at the top of the stack, and the ESP register automatically tracks where that top is.
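A minimal sketch, assuming 32-bit Linux, of the stack being used to save and restore registers. Because the stack is LIFO, the registers have to be popped in the reverse of the order they were pushed. The exit status is ECX - EAX, which is 10 only if both values came back to the right registers.

```nasm
; Sketch for 32-bit Linux (int 0x80 system calls).
section .text
    global _start
_start:
    mov eax, 10
    mov ecx, 20
    push eax                    ; save EAX; stack (top first): 10
    push ecx                    ; save ECX; stack (top first): 20, 10
    xor eax, eax                ; overwrite both registers, as other code might
    xor ecx, ecx
    pop ecx                     ; last in, first out: ECX = 20
    pop eax                     ; then EAX = 10; ESP is back to its starting value

    mov ebx, ecx
    sub ebx, eax                ; exit status = 20 - 10 = 10
    mov eax, 1                  ; sys_exit
    int 0x80
```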
Writing code in x86 Assembly
When writing code in assembly, you divide the program into sections. Sections organize the code and data into specific areas of memory, and each area has a different purpose at runtime. Let's go through the sections and their responsibilities:
Common Section Information
.text
Contains the code of the program
Executable by the CPU
Read-only to prevent modification
.data
Holds initialized data that the program can read and modify
Usually stores global variables
Read-write
.bss
Stores uninitialized data
The sizes of the variables in this section are known, but their values are zero when the program starts
Saves space in the executable file, because the zero-filled contents don't need to be stored in it
.rdata
Stores read-only data (.rdata is the name used in Windows PE files; Linux ELF files use .rodata)
Data is not modified during execution
Primary purpose is to hold constant data
Breaking a program into sections lets the loader place code and data in separate memory regions with different permissions. Keeping code and data apart also suits modern processors, which typically cache instructions and data separately.
When the operating system loads the program, it does the following:
Loads the code into memory marked as executable
Loads the data into memory segments marked as writable (or read-only, for constants)
Sets the permissions of each region accordingly, which mainly helps security: on modern systems code can't be overwritten and data can't be executed
In short, sections group code and data into executable, read-only, and read-write segments, which helps the processor use memory efficiently, improves security, and makes the program easier for the assembler, linker, and operating system to manage. An example of sections is the following:
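A minimal sketch of a program that uses each kind of section, assuming 32-bit Linux and NASM; the labels are hypothetical. It uses .rodata, the name ELF files on Linux use for the read-only data section that the list above calls .rdata (the Windows PE name). It prints sections ok and exits with status 1, the incremented counter.

```nasm
; Sketch for 32-bit Linux (int 0x80 system calls); label names are illustrative.
section .text                   ; executable code
    global _start
_start:
    inc dword [counter]         ; .data is writable, so this is allowed
    mov eax, [counter]
    mov [buffer], eax           ; .bss is writable too; buffer started out as zero

    mov eax, 4                  ; sys_write
    mov ebx, 1                  ; stdout
    mov ecx, greeting           ; read-only string from .rodata
    mov edx, greeting_len
    int 0x80

    mov eax, 1                  ; sys_exit
    mov ebx, [buffer]           ; exit status = 1 (the incremented counter)
    int 0x80

section .data                   ; initialized, read-write data
    counter dd 0

section .bss                    ; uninitialized data, zero-filled at startup
    buffer resd 1               ; reserve one 32-bit value

section .rodata                 ; read-only constants (.rdata in Windows PE files)
    greeting db "sections ok", 10
    greeting_len equ $ - greeting
```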
Writing the code
Now that we understand sections and what they are for, we can start writing some code. For this course we will write a basic 'Hello, World!' program using x86 assembly.
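Here is a minimal sketch of such a program, assuming 32-bit Linux, where the write and exit system calls are made through int 0x80 (system call number in EAX, arguments in EBX, ECX, and EDX). This matches the elf32/elf_i386 build commands used in the next section. The labels msg and len are just names chosen for this example.

```nasm
; hello.asm: prints "Hello, World!" on 32-bit Linux using int 0x80 system calls
section .data
    msg db "Hello, World!", 10  ; the message plus a newline (10 = '\n')
    len equ $ - msg             ; length of msg in bytes (14)

section .text
    global _start               ; make the entry point visible to the linker

_start:
    mov eax, 4                  ; system call 4 = sys_write
    mov ebx, 1                  ; file descriptor 1 = stdout
    mov ecx, msg                ; address of the message
    mov edx, len                ; number of bytes to write
    int 0x80                    ; call the kernel

    mov eax, 1                  ; system call 1 = sys_exit
    xor ebx, ebx                ; exit status 0
    int 0x80
```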
Compiling Assembly Code
Now that we have written the program, we need to build it. You will need NASM, the assembler; you can see instructions on how to install NASM here. Assembly code is first assembled into an object file and then linked into an executable in the correct format. Save the above code into hello.asm and follow the steps below:
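Let's break down what we just did, starting with nasm -f elf32 -o hello.o hello.asm:
nasm: runs the Netwide Assembler
-f elf32: sets the output format to 32-bit ELF, the object file format used on Linux
-o hello.o: writes the output to the object file hello.o
hello.asm: the source file to assemble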
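There are plenty of other formats you can assemble into; running nasm -hf lists all of them. The next command is ld -m elf_i386 -o hello hello.o. Let's break it down the same way:
ld: runs the GNU linker
-m elf_i386: selects 32-bit x86 ELF emulation, so the linker produces a 32-bit executable
-o hello: names the output executable hello
hello.o: the object file produced by NASM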
Once these steps are done, you can run the output file like so: ./hello. If it prints its message, you have successfully assembled, linked, and run an assembly program.