All the explanations in this file follow the System V AMD64 ABI calling convention.
- Apple Silicon
- Prerequisites and basic concepts 📚
- ft_strlen
- ft_strcpy
- ft_strcmp
- ft_write
- ft_read
- ft_strdup
- Resources 📖
Apple Silicon is based on the ARM64 architecture, whereas the assembly code in this repository is written for the x86_64 architecture.
If you wish to use and test your assembly code seamlessly on those chips, you need to add the following lines to your Makefile:
CC = gcc
ifeq ($(shell uname -m), arm64)
CC += -ld_classic --target=x86_64-apple-darwin
endif
Instead of rewriting all the assembly code for ARM64, we compile our C code for x86_64 as well, so that the resulting binary can be executed through Rosetta 2.
Don't forget the -f macho64 flag after your nasm command when compiling on Macs.
Assembly works mainly with registers. They can be compared to little pre-defined boxes that can store data.
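The general-purpose registers are:
- rax
- rcx
- rbx
- rdx
- rsi
- rdi
- rsp
- rbp
- r8, r9, ..., r15
There is also rip, the instruction pointer, which holds the address of the next instruction. It is not a general-purpose register and cannot be written to with mov.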
We have to be careful when putting data in those registers, because they can be read or overwritten by the instructions we execute, the functions we call and the system calls we make.
Among the above registers for example:
- rax is used to store the return value of a function.
- rcx is used as a counter in loops.
- rsp is used to store the stack pointer.
- rbp is used to store the base pointer.
- rdi, rsi, rdx, rcx, r8, r9 are used, in that order, to pass arguments, roughly like function(rdi, rsi, rdx, rcx, r8, r9).
This may seem like an inconvenience, but it is actually very useful to manipulate the behavior of the program.
For instance, if we want to call sys_write:
- We put the syscall number (0x2000004 for sys_write on macOS, 1 on Linux) in rax.
- We put the file descriptor in rdi.
- We put the address of the buffer in rsi.
- We put the number of bytes to write in rdx.
- We execute the syscall instruction.
The conclusion is: the little boxes are very useful as intermediate storage for data, but we have to be careful not to overwrite them when we (or the system) need them.
Assembly is read from top to bottom.
The instructions can be grouped in labels, which are used to mark a specific point in the code. They are followed by a colon.
One thing that can be confusing is that although labels look like functions in other languages like C, they are not.
For example:
entry_point:
xor rax, rax
do_something:
mov rax, 0
do_something_else:
mov rax, 1
return_label:
ret
In this example, entry_point is the entry point of the program/function.
do_something and do_something_else will be executed one after the other, even without a "jump" instruction.
The syntax used in this repository is the Intel syntax. It is the syntax NASM uses, and a requirement of the subject. It is characterized by the fact that the destination operand is on the left and the source operand is on the right.
Virtually all the lines in assembly are composed of an instruction followed by its operand(s).
A few examples:
- mov rax, 0 copies the value 0 into the register rax.
- add rax, 1 adds 1 to the value in the register rax.
- cmp rax, 0 compares the value in the register rax with 0.
- jmp do_something jumps to the label do_something.
It is very important to remember that many instructions have implicit side effects on registers or flags that their operands do not show.
For example:
- The cmp instruction sets the flags register according to the result of the comparison.
- The loop instruction decrements the rcx register and jumps to the label if rcx is not zero.
Like in C, we can work with addresses.
The square brackets [] are used to dereference an address.
For example, if we want to move the value at the address 0x1234 into the register rax, we can do:
mov rax, [0x1234]
If we want to compare the byte stored 3 bytes after the address in rax with 0, we can do:
cmp BYTE [rax + 3], 0
A size specifier (BYTE, WORD, DWORD, QWORD) is required here: neither operand is a register, so the assembler cannot otherwise tell how many bytes to read at that address.
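For readers more comfortable with C, the bracket notation maps onto pointer dereferencing. The sketch below is only an analogy: byte_at is a hypothetical helper that is not part of the project, and it reads a single byte the way a BYTE-sized memory operand does.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the project): reads the byte located
 * `offset` bytes after `base`, the way the operand BYTE [rax + 3] does
 * when rax holds `base` and the offset is 3. */
unsigned char byte_at(const unsigned char *base, size_t offset)
{
    return *(base + offset);
}
```

Comparing byte_at(p, 3) with 0 in C plays the same role as comparing the byte 3 bytes after the address in rax with 0.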
In order to use the functions we write in assembly in a C program, we need to export them.
To do so, we can use the global directive.
For example, if we want to export the function ft_strlen, we can do:
global ft_strlen
ft_strlen:
...
The ft_strlen function is a function that returns the length of a string. It is a very simple function that iterates over the string until it finds the null-terminator (\0).
To implement it in assembly, we need to recapitulate the behavior of the function:
1. Set a counter to 0.
2. Look at the character at the position given by the counter.
3. If it is the null-terminator, return the counter.
4. Otherwise, increment the counter and go back to step 2.
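Before moving to assembly, the behavior described above can be sketched in C. This is only an illustrative reference, assuming nothing beyond the standard library; strlen_ref is a made-up name, and the comments point at the register that plays the same role in the assembly version.

```c
#include <stddef.h>

/* Illustrative reference only; `strlen_ref` is a made-up name. */
size_t strlen_ref(const char *str)
{
    size_t count = 0;            /* the counter (rcx in the assembly version) */

    while (str[count] != '\0')   /* look at the current character */
        count++;                 /* not the terminator: count it and move on */
    return count;                /* terminator reached: the counter is the length */
}
```

An empty string never enters the loop, so the function returns 0 for it.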
To replicate this behavior in assembly, we will need to learn a few instructions:
- mov to copy data.
- jmp to jump.
- cmp to compare data.
- je to jump if equal.
- inc to increment a register.
- ret to return from the function.
mov rax, rdi
This instruction copies the value in rdi to rax.
entry_point:
jmp some_other_label
some_label:
...
some_other_label:
...
The first line of entry_point will jump to the label some_other_label (and skip some_label) unconditionally.
Jumps work like a goto in C. They can be used to skip parts of the code, or to create loops (as they can jump to a label that is located earlier in the code).
cmp rax, rdi
This instruction compares the value in rax with the value in rdi.
If we want to check if rax is equal to rdi, we can do:
cmp rax, rdi
je equal
Which leads us to the next instruction.
cmp some_register, some_other_register
je equal
This instruction will jump to the label equal only if the two registers are equal.
inc rax
This instruction increments the value in rax by 1. Simple.
ret
This instruction returns from the function to its caller.
Remember that in the System V AMD64 ABI, the return value of a function is stored in the rax register. Whatever is in rax when the ret instruction executes will be the return value of the function.
Now that we know the instructions we need, we can implement the ft_strlen function.
The first thing we need to do is to set the counter to 0. We can do this by moving 0 to rcx (or any other register, but remember rcx is commonly used as a counter).
mov rcx, 0
Then, we need to define our recurring loop.
We need to look at every character in the string passed in rdi (the register that holds the first argument in this calling convention, as seen before).
So, rdi points at the first character of the string, and rcx is our counter initialized to 0.
Like we would look into *(str + i) in C, we can use assembly's square brackets [] as follows:
cmp BYTE [rdi + rcx], 0
The BYTE keyword tells the assembler to compare a single byte, as a character is one byte long.
What happens next?
- If the character is not the null-terminator, we need to increment the counter and go back to the beginning of the loop.
- If it is the null-terminator, we need to return the counter.
So, with the instructions we learned before, we can use:
- cmp to compare the character with 0.
- je to jump to the end of the function if it is the null-terminator.
- inc to increment the counter.
- jmp to go back to the beginning of the loop.
Our loop would then look like:
loop:
cmp BYTE [rdi + rcx], 0 ; is the current character the null-terminator?
je end ; yes: leave the loop
inc rcx ; no: count it
jmp loop ; and look at the next character
Finally, we need to return the counter. We can do this by moving the value in rcx to rax and returning.
Remember that whatever is in rax when the ret instruction executes will be the return value of the function.
end:
mov rax, rcx
ret
And that's it! We have implemented the ft_strlen function in assembly.
The ft_strcpy function is a function that copies a string into another string, and returns a pointer to the destination string.
The logic is very similar to the ft_strlen function:
1. Set a counter to 0.
2. Look at the character of the source string at the position given by the counter.
3. Copy it to the destination string at the same position.
4. Increment the counter.
5. If the character copied was not the null-terminator, go back to step 2.
6. If it was the null-terminator, exit the loop and return a pointer to the destination string.
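The same steps can be sketched in C as a reference to compare against. This is a minimal illustration, not the project's code; strcpy_ref is a made-up name. The do/while shape mirrors the fact that the null-terminator itself has to be copied before the loop stops.

```c
#include <stddef.h>

/* Illustrative reference only; `strcpy_ref` is a made-up name. */
char *strcpy_ref(char *dst, const char *src)
{
    size_t i = 0;
    char   c;

    do {
        c = src[i];       /* load one byte from the source */
        dst[i] = c;       /* store it in the destination */
        i++;              /* move on to the next position */
    } while (c != '\0');  /* stop once the terminator has been copied */
    return dst;           /* the destination pointer is the return value */
}
```

As with the standard strcpy, the caller is responsible for providing a destination buffer that is large enough.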
A pointer to our dst string is passed in rdi, and a pointer to our src string is passed in rsi.
We can start the same way we did with ft_strlen by setting the counter to 0.
ft_strcpy:
mov rcx, 0
We can then start our loop.
We need to copy every character in rsi to rdi. However, in assembly, we can't copy data directly from one address to another (mov [rdi], [rsi] would not work).
We therefore need to copy the data from the source address to a register, and then copy it to the destination address.
We could use any register to store the character (like r8 as seen before), but it is more appropriate to use al for this purpose: al is the lowest 8 bits of rax, so it holds exactly one byte, and the size of the data to move no longer needs to be spelled out.
loop:
mov al, [rsi + rcx] ; load one byte from the source
mov [rdi + rcx], al ; store it in the destination
inc rcx ; move on to the next position
cmp al, 0 ; was it the null-terminator?
jne loop ; no: copy the next character
In this loop:
- We copy the character at the address rsi + rcx to al.
- We copy al to the address rdi + rcx.
- We increment the counter.
- We check if the character is the null-terminator.
- If it is not, jne (jump if not equal) takes us back to the beginning of the loop.
Finally, we need to return the pointer to the destination string.
Given that we received this pointer in rdi and that we did not modify it, we can simply copy it to rax and return.
mov rax, rdi
ret
The ft_strcmp function is a function that compares two strings, and returns the difference between the first two characters that differ, or 0 if the strings are identical. For example, "abc" and "abd" would return -1, as 'c' - 'd' = -1.
Its logic is not much more complex than the previous functions, but the particularities of assembly make it a bit more challenging (for me at least, maybe I have a shitty logic).
Let's recap the behavior of the function anyway:
- Set a counter to 0.
- Compare the characters of the two strings at the same position, one position at a time.
- If they are different, or if the end of the strings is reached, subtract the second character from the first character and return the result.
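As a point of comparison, here is a small C sketch of that behavior. It is an illustration under the assumption that characters are compared as unsigned bytes, which is what the C standard specifies for strcmp; strcmp_ref is a made-up name.

```c
#include <stddef.h>

/* Illustrative reference only; `strcmp_ref` is a made-up name. */
int strcmp_ref(const char *s1, const char *s2)
{
    size_t i = 0;

    /* Advance while the bytes match and the end has not been reached. */
    while (s1[i] != '\0' && s1[i] == s2[i])
        i++;
    /* Difference of the two bytes, compared as unsigned values.
     * It is 0 when both strings ended at the same position. */
    return (unsigned char)s1[i] - (unsigned char)s2[i];
}
```

With "abc" and "abd" the loop stops at index 2 and the function returns 'c' - 'd', that is -1.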
From this exercise on, I will only explain the new instructions and concepts we use.
sub rax, rdi
performs a subtraction such that rax ← rax - rdi.
movzx rax, BYTE[rdi + rcx]
moves the byte at the address rdi + rcx to rax, and fills the remaining bits with 0.
movzx adapts to the keyword used to specify the size of the data we want to move. For example, movzx rax, WORD [rdi + rcx] would move a word (16 bits) to rax.
This instruction is useful to us as rax is a 64-bit register, and we only want to compare the characters as bytes (8 bits).
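In C terms, zero-extension is what a cast through unsigned char does. The sketch below is only an analogy with hypothetical helper names; the second function shows the contrasting sign-extension (what the related movsx instruction performs), to make clear why the choice matters for bytes above 0x7F.

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only. */

/* Like movzx rax, BYTE [p]: the upper 56 bits of the result are 0. */
uint64_t zero_extend_byte(const char *p)
{
    return (uint64_t)(unsigned char)*p;
}

/* Contrast: sign-extension copies the byte's top bit into the upper bits. */
int64_t sign_extend_byte(const char *p)
{
    return (int64_t)(signed char)*p;
}
```

For the byte 0xE9, the first helper yields 233 while the second yields -23, so only the zero-extended value compares characters as unsigned bytes.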
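jz label
jumps to label if the zero flag is set. It is the same instruction as je under another name (likewise, jnz is the same as jne): cmp sets the zero flag exactly when its two operands are equal.
A conditional jump only sees the flags left by the most recent comparison. For example, if we want to jump to end if one of register1 or register2 is 0, we need a jump after each cmp:
cmp register1, 0
jz end
cmp register2, 0
jz end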
I will not show code like in the previous ones, just describe the logic in more depth.
My intermediate registers will be rax and r8 (not the most efficient code but easier to understand).
- We set rcx, rax and r8 to 0.
- We start our loop.
- We copy the characters at rdi + rcx and rsi + rcx to rax and r8 with movzx.
- We check whether either character is the null-terminator. If so, we leave the loop.
- We compare the characters.
- If they are different, we leave the loop.
- If not, we increment the counter and go back to the beginning of the loop.
- After the loop, we subtract r8 from rax and return. The result is 0 if both strings ended at the same position.
That's it! With all the precautions I mentioned and the new instructions, you should be able to implement the ft_strcmp function.
The ft_write function is a function that, provided a file descriptor, a buffer and a size, writes the buffer to the file descriptor. It returns the number of bytes written, or -1 if an error occurred.
To implement this function, we need to know how to make a syscall.
As we saw in the introduction, to make a syscall, we need to:
- Put the syscall number in rax.
- Put its arguments in the appropriate registers (here rdi, rsi and rdx).
- Trigger the syscall with the syscall instruction, which will read the values in the registers and behave accordingly.
In our case, the parameters we receive from C are already in the right registers, so we can skip the following instructions:
mov rdi, rdi ; file descriptor
mov rsi, rsi ; buffer
mov rdx, rdx ; count
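We can then put the syscall number in rax and execute the syscall.
On macOS, the syscall number for sys_write is 0x2000004 (this is the x86_64 numbering, which still applies on Apple Silicon Macs since our code runs through Rosetta 2). On Linux, it is 1.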
mov rax, 0x2000004
; mov rdi, rdi
; mov rsi, rsi
; mov rdx, rdx
syscall
ret
We could stop here, as the function would properly write to the file descriptor and return the number of bytes written (the syscall puts its return value in rax).
However, we need to return -1 if an error occurred, and set the errno variable accordingly, as asked in the subject.
To handle any error and jump to another label, we can use the jc instruction. It jumps to the label if the Carry Flag is set, which is how the macOS kernel signals that a syscall failed. In that case, rax holds the error code instead of the number of bytes written.
jc error
errno is the variable a C program reads to find out why a call failed. The kernel does not set it: the syscall only hands the error code back in rax, and it is normally the C library wrapper that stores that code in errno. As we bypass the wrapper, we need to set it ourselves.
To do so, we are provided with __errno_location (or ___error on Mac). This function returns a pointer to the errno variable.
We can call it with the instruction call __errno_location. As with any function, its return value ends up in rax.
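However, rax already contains the error code returned by the write syscall. We need to save it before calling __errno_location. A scratch register such as r8 is not a safe place, because the called function is allowed to overwrite it. The stack is, so we push the code before the call and pop it back afterwards (push and pop are detailed in the ft_strdup part).
error:
push rax ; save the error code (this also leaves the stack 16-byte aligned for the call)
call __errno_location
pop r8 ; get the error code back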
We now have the address of the errno variable in rax. We can put the previously saved error code in it by dereferencing the address. errno is a 32-bit int, so we only write the lower 32 bits of r8, which are named r8d.
mov [rax], r8d
Finally, we can return -1 and exit the function.
mov rax, -1
ret
And that's it! We have implemented the ft_write function.
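To check the error path, it helps to know what the C library's own write does in the same situation, since ft_write is expected to match it. The sketch below uses the libc write as that reference; errno_after_write is a hypothetical helper name, not something from the subject.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: returns the errno value left behind by a failing
 * write(), or 0 if the call succeeded. Swapping write() for ft_write()
 * should give the same results once the error handling is in place. */
int errno_after_write(int fd, const void *buf, size_t count)
{
    ssize_t ret;

    errno = 0;
    ret = write(fd, buf, count);
    if (ret == -1)
        return errno;   /* for example EBADF for an invalid descriptor */
    return 0;
}
```

Calling it with an invalid file descriptor such as -1 is a simple way to trigger the error path.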
The ft_read function is a function that, provided a file descriptor, a buffer and a size, reads from the file descriptor to the buffer. It returns the number of bytes read, or -1 if an error occurred.
It's literally the ft_write function, but with the sys_read syscall.
Same logic, same instructions, same everything.
So we just replace 0x2000004 with 0x2000003 and we're good to go.
Easy.
The ft_strdup function is a function that duplicates a string. It allocates memory for the new string, copies the string to it, and returns a pointer to the new string.
Doesn't sound as easy as ft_read.
Luckily, we can use the functions we implemented before, and the malloc function. But we also need to learn two new instructions.
We have another way to store data in assembly: the stack.
The stack is literally a pile of data, organized in a LIFO (Last In, First Out) way.
If I push 1, 2 and 3 on the stack, it will look like:
3
2
1
If I pop the stack here, I will get 3.
In assembly, we can manipulate the stack as follows:
push rax
push rbx
These instructions will push the value of rax, then the value of rbx on the stack.
value of rbx
value of rax
We then have the pop instruction, which takes the last value pushed on the stack and puts it in the operand we specify.
pop any_register
After this instruction, any_register will contain the value of rbx.
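The LIFO behavior can be modeled in a few lines of C. This is a toy model with made-up names (struct stack, stack_push, stack_pop), not how the hardware stack is implemented: the real one lives in memory, is addressed through rsp and grows toward lower addresses.

```c
#include <stddef.h>

#define STACK_CAPACITY 16

/* Toy model of a stack; all names here are made up for the example. */
struct stack {
    long   slots[STACK_CAPACITY];
    size_t top;                 /* number of values currently stored */
};

/* No bounds checking: callers must stay within STACK_CAPACITY
 * and must not pop an empty stack. */
void stack_push(struct stack *s, long value)
{
    s->slots[s->top++] = value; /* store, then move the top up */
}

long stack_pop(struct stack *s)
{
    return s->slots[--s->top];  /* move the top down, then read */
}
```

Pushing 1, 2 and 3 and then popping three times gives back 3, 2 and 1, in that order.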
In this implementation, we will use push and pop to save the values of our registers when calling other functions that manipulate them.
Let's recap the steps of the function:
- Get the length of the string (ft_strlen).
- Allocate memory for the new string (malloc).
- Copy the string to the new string (ft_strcpy).
We also know that:
- When ft_strdup is called, rdi contains s (a pointer to the string to duplicate).
- ft_strlen reads a string in rdi and returns the length in rax.
- malloc reads the size in rdi and returns a pointer to the allocated memory in rax.
- ft_strcpy reads the source string in rsi and the destination string in rdi.
So:
- We push rdi (that contains s) to the stack, to use it later. The functions we call are free to overwrite rdi.
- We call ft_strlen, which returns the length in rax. rdi still contains s at that point, so there is nothing to set up.
- We increment the length by 1 (for the null-terminator).
- We move the length to rdi to call malloc with the right size.
- We call malloc, which returns a pointer to the allocated memory in rax.
- We pop s from the stack to rsi (second argument of ft_strcpy). This must happen even if malloc failed, otherwise ret would not find the return address on top of the stack.
- If malloc returned 0 (NULL), we return right away: rax already contains the NULL we need to return.
- We move the pointer to the allocated memory to rdi (first argument of ft_strcpy).
- We call ft_strcpy, which copies the string to the allocated memory and returns the destination pointer in rax.
- We can directly execute ret as the pointer to the new string is already in rax.
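For comparison, the same sequence can be sketched in C with the standard library functions standing in for ft_strlen and ft_strcpy. This is an illustrative reference with a made-up name (strdup_ref), not the project's code.

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative reference only; `strdup_ref` is a made-up name. */
char *strdup_ref(const char *s)
{
    size_t len = strlen(s);          /* length of the string, without the terminator */
    char  *copy = malloc(len + 1);   /* one extra byte for the null-terminator */

    if (copy == NULL)
        return NULL;                 /* allocation failed: nothing to copy into */
    return strcpy(copy, s);          /* strcpy returns its destination pointer */
}
```

The caller owns the returned pointer and is expected to release it with free.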
And that's it! We have implemented the last mandatory function of the project.
A few differences to keep in mind on Linux:
- The syscall number for sys_write is 1 on Linux.
- The syscall number for sys_read is 0 on Linux.
- The ___error function is named __errno_location on Linux.
- The symbols don't need to be prefixed with an underscore on Linux.
- The Carry Flag is not set when an error occurs on Linux. Instead, the syscall returns the negated error code in rax, so we need to check if rax is negative to detect an error, and negate it to get the value to store in errno.
- The nasm flag -f elf64 should be used instead of -f macho64.
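The Linux error convention can be summed up by the following C sketch. It assumes the usual Linux x86_64 rule that raw results between -4095 and -1 denote errors; syscall_result_to_libc is a hypothetical helper name used only for this illustration.

```c
#include <errno.h>

/* Hypothetical helper: turns the raw value a Linux syscall leaves in rax
 * into the libc convention (-1 with errno set on failure). */
long syscall_result_to_libc(long raw)
{
    if (raw < 0 && raw >= -4095) {
        errno = (int)-raw;   /* the kernel returned the negated error code */
        return -1;
    }
    return raw;              /* success: number of bytes, descriptor, ... */
}
```

This is the logic the error label has to reproduce in assembly on Linux: test the sign of rax, negate it, store it through the pointer returned by __errno_location, then return -1.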