1. Introduction to x86 Assembly

What is x86 Assembly?

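In a nutshell, assembly is a low-level programming language whose instructions map almost one-to-one onto the machine code a CPU executes; an assembler translates them into that machine code. Each instruction consists of a mnemonic (the readable name for an opcode) followed by zero or more operands, which can be registers, immediate values, or memory addresses. Some instructions also take a prefix such as rep or lock. In AT&T syntax, mnemonics can carry a size suffix, as in movl. Many instructions also update the status flags in the EFLAGS register.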
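As a rough illustration of those parts, here is a minimal sketch of a complete program. It assumes 32-bit Linux and NASM syntax, the assembler used later in this course. The labels src and dst are illustrative names. The program reports its result through its exit status, which you can print with echo $? after running it. The section and int 0x80 lines can be ignored for now.

```nasm
; anatomy.asm: instruction parts, annotated (32-bit Linux, NASM syntax)
section .data
    src db "abc"              ; three source bytes

section .bss
    dst resb 3                ; three destination bytes, zeroed at load time

section .text
    global _start
_start:
    mov ecx, 3                ; mnemonic: mov; operands: register ecx, immediate 3
    mov esi, src              ; operand: the address of src
    mov edi, dst              ; operand: the address of dst
    cld                       ; mnemonic with no operands (clears the direction flag)
    rep movsb                 ; prefix: rep; mnemonic: movsb (copies ECX bytes)
    movzx ebx, byte [dst+2]   ; memory operand: the byte at dst+2, 'c' (99)
    mov eax, 1                ; sys_exit on 32-bit Linux
    int 0x80                  ; exit status = EBX = 99
```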

The x86 in the name refers to the processor architecture the language targets. Every CPU architecture has its own assembly language, so x86 chips (from Intel and AMD) use different instructions than ARM chips. This course covers 32-bit x86.

This course is a semi-deep dive into x86 assembly. It should give you enough understanding of how assembly works to build a working program.


Registers

In a sentence: registers are small, fast storage locations inside the CPU that hold temporary data during execution.

Each register has a conventional purpose, but most can also be used for general computation. This section covers the 32-bit registers of the x86 architecture.

General purpose registers

  • EAX

    • Purpose:

      • The Accumulator register.

    • Common usage:

      • Normally used for arithmetic operations such as add, sub, and mul, and to hold their results. By convention it also holds a function's return value.

    • Example of usage:

```nasm
mov eax, 15   ; load 15 into EAX
add eax, 15   ; EAX now holds 30
```
  • EBX

    • Purpose:

      • The Base register

    • Common usage:

      • Usually used as a pointer to data in memory, but can also be used for arithmetic purposes.

    • Example of usage:
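Here is a minimal sketch of EBX used as a data pointer and then for arithmetic. It assumes 32-bit Linux and NASM. The label values is illustrative, and the result is returned as the program's exit status.

```nasm
; ebx_demo.asm: EBX as a pointer to data (32-bit Linux, NASM)
section .data
    values dd 10, 20, 30      ; three 32-bit integers

section .text
    global _start
_start:
    mov ebx, values           ; EBX holds the address of the array
    mov eax, [ebx]            ; EAX = values[0] = 10
    add eax, [ebx+8]          ; each dd is 4 bytes, so ebx+8 is values[2]; EAX = 40
    mov ebx, eax              ; reuse EBX for plain arithmetic: EBX = 40
    mov eax, 1                ; sys_exit
    int 0x80                  ; exit status = EBX = 40
```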

  • ECX

    • Purpose:

      • The Counter register

    • Common usage:

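      • Mostly used as a loop counter (the loop instruction decrements ECX) and as the repeat count for rep-prefixed string instructions. Its low byte, CL, also holds variable shift counts.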

    • Example of usage:
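This minimal sketch uses ECX as a loop counter with the loop instruction. It assumes 32-bit Linux and NASM, and .next is an illustrative label. The program sums 6 + 5 + 4 + 3 + 2 + 1 and returns the total as its exit status.

```nasm
; ecx_demo.asm: ECX as a loop counter (32-bit Linux, NASM)
section .text
    global _start
_start:
    xor ebx, ebx              ; running total = 0
    mov ecx, 6                ; loop 6 times
.next:
    add ebx, ecx              ; add 6, 5, 4, 3, 2, 1
    loop .next                ; ECX = ECX - 1; jump to .next while ECX != 0
    mov eax, 1                ; sys_exit
    int 0x80                  ; exit status = 21
```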

  • EDX

    • Purpose:

      • The Data register

    • Common usage:

      • Works with EAX in multiplication and division, holding the upper half of large results and the remainder after div.

    • Example of usage:
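The following minimal sketch shows EDX working with EAX in both mul and div. It assumes 32-bit Linux and NASM. The two partial results are added together and returned as the exit status.

```nasm
; edx_demo.asm: EDX paired with EAX in mul and div (32-bit Linux, NASM)
section .text
    global _start
_start:
    ; Multiplication: a 32-bit x 32-bit product can need 64 bits.
    mov eax, 0x80000000       ; 2^31
    mov ecx, 4
    mul ecx                   ; EDX:EAX = 2^33, so EDX = 2 (high half), EAX = 0 (low half)
    mov ebx, edx              ; keep the high half: EBX = 2

    ; Division: EDX:EAX is the 64-bit dividend, so EDX must be cleared first.
    mov eax, 100
    xor edx, edx
    mov ecx, 7
    div ecx                   ; quotient in EAX = 14, remainder in EDX = 2
    add ebx, eax              ; EBX = 2 + 14 = 16
    mov eax, 1                ; sys_exit
    int 0x80                  ; exit status = 16
```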

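NOTE: A 32-bit register cannot hold a 64-bit value, so EDX and EAX are sometimes used together as a single 64-bit pair, written EDX:EAX. For example, mul stores a 64-bit product with the high 32 bits in EDX and the low 32 bits in EAX, and div reads its 64-bit dividend from the same pair.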


Pointer and index registers

  • ESI

    • Purpose:

      • The Source Index register

    • Common usage:

      • Holds the source address for string instructions such as movsb, movsw, and movsd, which update it automatically after each element.

    • Example of usage:
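Here is a minimal sketch of ESI as a source pointer. It assumes 32-bit Linux and NASM. It uses lodsb, which, like movsb, reads from the address in ESI and then advances ESI. The label src_bytes is illustrative, and the byte total is returned as the exit status.

```nasm
; esi_demo.asm: ESI as the source pointer for string instructions (32-bit Linux, NASM)
section .data
    src_bytes db 1, 2, 3, 4   ; four source bytes

section .text
    global _start
_start:
    mov esi, src_bytes        ; ESI = address of the first source byte
    mov ecx, 4                ; number of bytes to read
    xor ebx, ebx              ; running total
    xor eax, eax              ; clear EAX so AL can be added as a full register
    cld                       ; direction flag clear: ESI moves forward
.read:
    lodsb                     ; AL = byte at [ESI], then ESI = ESI + 1
    add ebx, eax              ; total += byte (upper bits of EAX stay 0)
    loop .read
    mov eax, 1                ; sys_exit
    int 0x80                  ; exit status = 1 + 2 + 3 + 4 = 10
```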

  • EDI

    • Purpose:

      • The Destination Index register

    • Common usage:

      • Holds the destination address for the same string instructions (movsb, movsw, movsd), as well as stosb and similar instructions.

    • Example of usage:
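This minimal sketch uses EDI as a destination pointer with rep stosb, which writes AL to the address in EDI and advances EDI. It assumes 32-bit Linux and NASM, and buffer is an illustrative label. The exit status combines the last stored byte (9) with how far EDI moved (8).

```nasm
; edi_demo.asm: EDI as the destination pointer for string instructions (32-bit Linux, NASM)
section .bss
    buffer resb 8             ; 8 bytes, zero-filled at load time

section .text
    global _start
_start:
    mov edi, buffer           ; EDI = destination address
    mov al, 9                 ; byte value to store
    mov ecx, 8                ; number of bytes to store
    cld                       ; direction flag clear: EDI moves forward
    rep stosb                 ; store AL at [EDI] and advance EDI, ECX times
    sub edi, buffer           ; EDI moved 8 bytes past the start of buffer
    movzx ebx, byte [buffer+7] ; last byte written = 9
    add ebx, edi              ; 9 + 8 = 17
    mov eax, 1                ; sys_exit
    int 0x80                  ; exit status = 17
```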

  • EBP

    • Purpose:

      • Base Pointer register

    • Common usage:

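      • Generally points to the base of the current stack frame, so function parameters and local variables can be accessed at fixed offsets from it. For example, [ebp+8] holds the first argument in a typical cdecl function.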

    • Example of usage:
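Here is a minimal sketch of EBP as a frame pointer in a cdecl-style call. It assumes 32-bit Linux and NASM. add_two is a hypothetical helper written for illustration, and its return value becomes the exit status.

```nasm
; ebp_demo.asm: EBP as a stack-frame base pointer (32-bit Linux, NASM)
section .text
    global _start

; add_two(a, b): hypothetical helper that returns a + b in EAX
add_two:
    push ebp                  ; save the caller's frame pointer
    mov ebp, esp              ; EBP = base of this function's frame
    mov eax, [ebp+8]          ; first argument (above saved EBP and return address)
    add eax, [ebp+12]         ; second argument
    pop ebp                   ; restore the caller's frame pointer
    ret                       ; return to the address pushed by call

_start:
    push dword 12             ; second argument (pushed first)
    push dword 30             ; first argument
    call add_two              ; pushes the return address and jumps
    add esp, 8                ; caller removes the two 4-byte arguments
    mov ebx, eax              ; exit status = 30 + 12 = 42
    mov eax, 1                ; sys_exit
    int 0x80
```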

  • ESP

    • Purpose:

      • Stack Pointer register

    • Common usage:

      • Points to the top of the stack and is updated automatically by push, pop, call, and ret.

    • Example of usage:
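This minimal sketch shows ESP changing as values are pushed and popped. It assumes 32-bit Linux and NASM, and .mismatch is an illustrative label. The program exits with the number of bytes the two pushes used, or 255 if ESP does not return to its starting value.

```nasm
; esp_demo.asm: ESP moving as push and pop run (32-bit Linux, NASM)
section .text
    global _start
_start:
    mov esi, esp              ; remember where the stack top starts
    push dword 1              ; ESP decreases by 4
    push dword 2              ; ESP decreases by 4 again
    mov ebx, esi
    sub ebx, esp              ; bytes the two pushes used: 8
    pop eax                   ; EAX = 2, ESP increases by 4
    pop eax                   ; EAX = 1, ESP increases by 4
    cmp esp, esi              ; ESP should be back at its starting value
    jne .mismatch
    mov eax, 1                ; sys_exit
    int 0x80                  ; exit status = 8
.mismatch:
    mov ebx, 255              ; signals that ESP did not return to its start
    mov eax, 1
    int 0x80
```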


Special Purpose Registers

  • EIP

    • Purpose:

      • Instruction pointer

    • Common usage:

      • Holds the address of the next instruction to be executed. It cannot be written with ordinary instructions such as mov. It changes only through control-flow instructions such as jmp, call, and ret (as well as conditional jumps and interrupts).

  • EFLAGS

    • Purpose:

      • Flag register

    • Common usage:

      • Stores status flags that indicate operation results.

    • Flags:

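      • Zero Flag (ZF): set when the result of an operation is zero

      • Carry Flag (CF): set when an unsigned operation produces a carry out of, or needs a borrow into, the most significant bit

      • Sign Flag (SF): set when the result is negative, that is, when its most significant bit is 1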
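Here is a minimal sketch of EIP and EFLAGS in action. It assumes 32-bit Linux and NASM, and the labels are illustrative. EIP cannot be read with mov, so the program uses a common 32-bit idiom: call pushes the return address, which is then popped. The three flags above are each checked with a conditional jump. Every successful check sets one bit in EBX, so the expected exit status is 15.

```nasm
; flags_eip_demo.asm: observing EIP and EFLAGS (32-bit Linux, NASM)
section .text
    global _start
_start:
    xor ebx, ebx              ; collect one bit per successful check

    ; call pushes the address of the next instruction (here, .here) as the return address.
    call .here
.here:
    pop eax                   ; EAX = the pushed address, i.e. EIP at .here
    cmp eax, .here            ; equal, so ZF = 1
    jne .check_zero
    or ebx, 1

.check_zero:
    mov eax, 5
    cmp eax, 5                ; computes 5 - 5 = 0 without storing it: ZF = 1
    jne .check_carry          ; not taken, because ZF = 1
    or ebx, 2

.check_carry:
    mov eax, 1
    cmp eax, 2                ; unsigned 1 - 2 needs a borrow: CF = 1
    jnc .check_sign           ; not taken, because CF = 1
    or ebx, 4

.check_sign:
    mov eax, 1
    sub eax, 2                ; result is -1 (0xFFFFFFFF): SF = 1
    jns .done                 ; not taken, because SF = 1
    or ebx, 8

.done:
    mov eax, 1                ; sys_exit
    int 0x80                  ; exit status = 1 + 2 + 4 + 8 = 15
```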


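Register summary:

| Register | Purpose                    |
|----------|----------------------------|
| EAX      | Accumulator (arithmetic)   |
| EBX      | Base register              |
| ECX      | Counter (loops, shifts)    |
| EDX      | Data register              |
| ESI      | Source index               |
| EDI      | Destination index          |
| ESP      | Stack pointer              |
| EBP      | Base pointer (stack frame) |
| EIP      | Instruction pointer        |
| EFLAGS   | Status flags               |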


The Stack

Now that we've covered the registers, we need to learn about the stack. The stack is a region of memory that follows the last-in, first-out (LIFO) principle: the last item added is the first item removed.

The stack is used to store temporary data such as return addresses, local variables, and saved register values. The image below should give you a better understanding of the stack:

If this doesn't fully make sense yet, that's okay; it's a lot of information at once. Let's go through it. The basic principles are as follows:

  • The stack grows downward in memory, from higher addresses to lower addresses.

  • The PUSH mnemonic adds an item to the top of the stack.

  • The POP mnemonic removes the most recently added item from the top of the stack and copies it into its operand, as in pop eax.

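  • The ESP register is the stack pointer: it holds the address of the top of the stack. In 32-bit code, push decreases ESP by 4 before storing a value, and pop increases it by 4 after reading one. So if you push 10 and then 20, a pop returns 20 and leaves 10 on top of the stack.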

  • When a function is called, its arguments are typically pushed onto the stack first (in 32-bit conventions such as cdecl). The call instruction then pushes the return address.
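The push/pop behavior described above can be sketched as a complete program. This assumes 32-bit Linux and NASM. Because the stack is LIFO, the first pop returns 20 and the second returns 10. Their difference is returned as the exit status.

```nasm
; stack_demo.asm: last in, first out with push and pop (32-bit Linux, NASM)
section .text
    global _start
_start:
    push dword 10             ; top of the stack: 10
    push dword 20             ; top of the stack: 20, with 10 underneath
    pop eax                   ; EAX = 20 (last in, first out); 10 is on top again
    pop ebx                   ; EBX = 10; the stack is back where it started
    sub eax, ebx              ; 20 - 10 = 10
    mov ebx, eax              ; exit status = 10
    mov eax, 1                ; sys_exit
    int 0x80
```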

In short, the stack grows downward in memory, push and pop add and remove items at its top, and ESP automatically tracks where that top is.


Writing code in x86 Assembly

Assembly source code is organized into sections. Each section is placed in its own area of memory, and these areas serve different purposes at runtime. Let's go through the common sections and their responsibilities:

Common Section Information

| Section                     | Contents                                            | Details                                                                      | Notes                                                   |
|-----------------------------|-----------------------------------------------------|------------------------------------------------------------------------------|---------------------------------------------------------|
| .text                       | The program's code                                  | Executable by the CPU                                                        | Read-only, to prevent modification                      |
| .data                       | Initialized static data that the program can modify | Usually stores globals and other initialized data                           | Read-write                                              |
| .bss                        | Uninitialized data                                  | Variable sizes are known at assembly time; contents are zeroed at load time | Takes no space in the executable file, only its size is recorded |
| .rdata (.rodata in ELF/Linux) | Read-only data                                    | Not modified during execution                                                | Primarily holds constant data                           |

Splitting a program into sections lets the operating system give each kind of content the memory permissions it needs and keeps code and data apart. This can also help performance; for example, modern processors cache instructions and data separately.

When the operating system loads the program, it does the following:

  • Maps the code into memory and marks it as executable

  • Maps the data into memory, marking writable data as writable and constant data as read-only

  • Sets these permissions so that, for example, code cannot be overwritten and data cannot be executed, which helps with both security and stability

In short, sections split a program into read-only, read-write, and executable segments. This helps the processor use memory efficiently, improves security, and makes the program easier for both the assembler and the operating system to manage.
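Here is a minimal sketch of a program that uses each kind of section from the table. It assumes 32-bit Linux and NASM. On ELF the read-only data section is named .rodata rather than .rdata, so the sketch uses that name. The labels counter, buffer, and answer are illustrative. The computed value is returned as the exit status.

```nasm
; sections_demo.asm: one item in each common section (32-bit Linux, NASM)
section .data
    counter dd 5              ; initialized, writable 32-bit variable

section .bss
    buffer  resb 64           ; 64 bytes reserved, zero-filled at load time

section .rodata
    answer  dd 42             ; read-only constant

section .text
    global _start
_start:
    mov eax, [answer]         ; read the constant: 42
    add eax, [counter]        ; 42 + 5 = 47
    mov [buffer], eax         ; store the result in the zeroed buffer
    mov ebx, [buffer]         ; exit status = 47
    mov eax, 1                ; sys_exit
    int 0x80
```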

Writing the code

Now that we understand sections and what they are for, we can start writing some code. For this course we will write a basic 'Hello, World!' program using x86 assembly.
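A minimal 'Hello, World!' sketch might look like the following. It assumes 32-bit Linux, where the int 0x80 system calls sys_write (number 4) and sys_exit (number 1) are available, and it is written in NASM syntax. The labels msg and msg_len are illustrative.

```nasm
; hello.asm: print "Hello, World!" on 32-bit Linux (NASM syntax)
section .data
    msg     db "Hello, World!", 10   ; the string followed by a newline (ASCII 10)
    msg_len equ $ - msg              ; length computed by the assembler: 14 bytes

section .text
    global _start                    ; entry point symbol that ld looks for

_start:
    ; write(1, msg, msg_len)
    mov eax, 4                ; sys_write
    mov ebx, 1                ; file descriptor 1 = stdout
    mov ecx, msg              ; address of the string
    mov edx, msg_len          ; number of bytes to write
    int 0x80                  ; call the kernel

    ; exit(0)
    mov eax, 1                ; sys_exit
    xor ebx, ebx              ; exit status 0
    int 0x80
```

The .data section holds the message, and .text holds the instructions. The length is computed with $ - msg so it stays correct if the string changes.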


Compiling Assembly Code

Now that we have written the program, we need to turn it into an executable. For that you will need NASM (the Netwide Assembler). You can see installation instructions on how to install NASM here. Assembly code is first assembled into an object file and then linked into an executable. Save your Hello, World! program as hello.asm and run the following two commands:

  • nasm -f elf32 -o hello.o hello.asm

  • ld -m elf_i386 -o hello hello.o

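Let's break down what we just did, starting with nasm -f elf32 -o hello.o hello.asm:

  • nasm runs the Netwide Assembler

  • -f elf32 sets the output format to 32-bit ELF, the object file format used on Linux

  • -o hello.o names the output object file

  • hello.asm is the source file to assemble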

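NASM can produce plenty of other output formats; running nasm -hf lists them all. The next command is ld -m elf_i386 -o hello hello.o. Let's break it down the same way:

  • ld runs the linker (GNU ld on most Linux systems)

  • -m elf_i386 selects 32-bit i386 ELF emulation, so the result is a 32-bit executable even on a 64-bit system

  • -o hello names the output executable

  • hello.o is the object file NASM produced

By default, ld uses the _start symbol as the program's entry point.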

Once both commands succeed, run the executable with ./hello. If it prints Hello, World!, you have successfully assembled, linked, and run an assembly program.
