Toolchain

In general, the set of programming tools used to create a program is referred to as the toolchain. The toolchain used here consists of the following:

  • Assembler
  • Linker
  • Loader
  • Debugger

The source file passes through several stages before it becomes a running program: assemble, link, and load. The assembler converts the human-readable source file into an object file, the linker turns the object file into an executable, and the loader loads the executable into memory.

Overview: Assemble, Link, Load

Assembler

The assembler is a program that reads an assembly language source file and converts its instructions into machine language, stored in a binary object file.

During this process, comments are removed, and variable names and labels are converted into the appropriate addresses (as required by the CPU during execution).

The assembler used here is yasm.

yasm -g dwarf2 -f elf64 example.asm -l example.lst
  • -g dwarf2 : tells the assembler to include debugging information in the object file (.o)
  • -f elf64 : tells the assembler to create the object file in elf64 format (which is appropriate for 64-bit Linux-based systems)
  • example.asm : the assembly source file used as input
  • -l example.lst : tells the assembler to create a list file named example.lst

But what is a list file? A list file shows the line number, the relative address, the machine-language version of the instruction (including variable references), and the original source line. This information is useful when debugging.

36 00000009 40660301		dVar1	dd		17000000
37 0000000D 40548900		dVar2	dd		9000000
38 00000011 00000000		dResult	dd		0
  • Line 36 is the source line number.
  • The relative address is 0x00000009, the offset of the variable within the data area.
  • dVar1 is a double-word variable, so it requires four bytes.
  • The next variable starts at 0x0000000D, so dVar1 occupies addresses 0x00000009, 0x0000000A, 0x0000000B, and 0x0000000C.
  • 0x40660301 is the value in hex, in the order the bytes are placed in memory. Decimal 17000000 is 0x01036640 in hex. The architecture used here is little-endian, so the least significant byte (0x40) is placed at the lowest memory address.
  • 0x40 is placed at 0x00000009, 0x66 at 0x0000000A, 0x03 at 0x0000000B, and 0x01 at 0x0000000C.

Little Endian
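
The byte order in the list file can be checked with a few lines of Python. This is only an illustrative sketch, not part of the toolchain; the helper name little_endian_bytes is made up for this example.

```python
import struct

def little_endian_bytes(value):
    """Return the four bytes of a 32-bit unsigned value in memory order, as hex."""
    # "<I" means: little-endian, unsigned 32-bit integer (a double-word).
    return struct.pack("<I", value).hex().upper()

print(little_endian_bytes(17000000))  # 40660301, as on line 36 of the list file
print(little_endian_bytes(9000000))   # 40548900, as on line 37 of the list file
```

Reading each result two hex digits at a time gives the bytes from the lowest address to the highest.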

For example, a fragment of the list file text section, excerpted from the example program in the previous chapter is as follows:

95						last:
96	0000005A 48C7C03C000000		mov		rax, SYS_exit
97	00000061 48C7C700000000		mov		rdi, EXIT_SUCCESS
98	00000068 0F05				syscall
  • Again, the numbers on the left are the line numbers. The next number, 0x0000005A, is the relative address where the line of code is placed.
  • The next number, 0x48C7C03C000000, is the machine language version of the instruction, in hex, that the CPU reads and understands.
  • The rest of the line is the original assembly language source instruction.
  • The label last: has no machine language instruction because it is not an executable instruction.

Two-Pass Assembler

The assembler reads the source code and converts it into machine code, the binary form that the CPU understands. The 1’s and 0’s are referred to as machine language. Because assembly instructions correspond one-to-one with machine instructions, machine language can be converted back to human-readable assembly. However, the comments, variable names, and label names are lost, so the resulting code can be very difficult to read.

The assembler generates machine code for each line as it reads it. For control instructions such as conditional jumps (if statements) or unconditional jumps, however, this is not always possible, because the target of the jump may not have been read yet.

Ex :

mov		rax, 0
jmp		skipRest
...
...
...
skipRest:

Reading line by line, the assembler cannot know the address of skipRest when it reaches the jmp, because the label is defined later in the file. The solution is to read the source file twice, which is why this design is called a two-pass assembler.

First pass

The steps taken on the first pass vary with the design of the specific assembler, but the basic operations performed include:

  • Create symbol table
  • Expand macros
  • Evaluate constant expressions

A macro is a program element that is expanded into a set of instructions predefined by the programmer. A constant expression is an expression composed entirely of constants. For example, if BUFF is a constant that has already been defined, the assembler can compute BUFF+5 during the first pass and use the result directly in the instruction.

Ex:

mov rax, BUFF+5
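
How such an expression might be evaluated can be sketched in Python. This is an illustration only, not how yasm is implemented. The function eval_constant and the value 250 for BUFF are assumptions made for the example.

```python
def eval_constant(expr, symbols):
    """Evaluate a sum of numbers and already-defined constants, e.g. 'BUFF+5'."""
    total = 0
    for term in expr.split("+"):
        term = term.strip()
        if term in symbols:
            total += symbols[term]   # constant defined earlier in the source
        else:
            total += int(term, 0)    # literal such as 5 or 0x10
    return total

symbols = {"BUFF": 250}              # hypothetical definition: BUFF equ 250
print(eval_constant("BUFF+5", symbols))  # 255
```

Because every term is known at assembly time, the result can be placed in the instruction as a plain immediate value.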

Second pass

The steps taken on the second pass vary with the design of the specific assembler. The basic operations performed on the second pass include:

  • Final generation of code
  • Creation of list file (if requested)
  • Create object file

Code generation is the conversion of the assembly language instructions into machine instructions that the CPU can execute. Because of the one-to-one correspondence, instructions that do not use symbols can be converted on either the first or the second pass.

Depending on the assembler's design, some code generation may happen on the first pass, or all of it may be deferred to the second pass. In most cases, final generation is performed on the second pass and uses the symbol table to check program symbols and obtain the appropriate addresses.
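
The two passes can be sketched as a toy model in Python, using the skipRest example from earlier. It is not a real assembler: the instruction sizes in SIZES are assumed values chosen for illustration, and the output is text rather than machine code.

```python
# Hypothetical instruction sizes in bytes; a real assembler derives them
# from the operands and the CPU's encoding rules.
SIZES = {"mov": 7, "jmp": 5, "nop": 1}

def first_pass(lines):
    """Build the symbol table: label name -> relative address."""
    symbols, address = {}, 0
    for line in lines:
        line = line.strip()
        if line.endswith(":"):
            symbols[line[:-1]] = address      # a label emits no bytes
        elif line:
            address += SIZES[line.split()[0]]
    return symbols

def second_pass(lines, symbols):
    """Generate (address, instruction) pairs with labels replaced by addresses."""
    output, address = [], 0
    for line in lines:
        line = line.strip()
        if not line or line.endswith(":"):
            continue
        parts = line.split(None, 1)
        mnemonic = parts[0]
        operands = parts[1] if len(parts) > 1 else ""
        if mnemonic == "jmp":
            operands = hex(symbols[operands])  # forward reference resolved here
        output.append((address, f"{mnemonic} {operands}".strip()))
        address += SIZES[mnemonic]
    return output

source = ["mov rax, 0", "jmp skipRest", "nop", "nop", "skipRest:", "nop"]
table = first_pass(source)
listing = second_pass(source, table)
print(table)       # {'skipRest': 14}
print(listing[1])  # (7, 'jmp 0xe')
```

The jump can only be completed on the second pass, because the address of skipRest is not known until the whole file has been read once.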

Assembler Directives

Assembler directives are instructions to the assembler that direct the assembler to do something. This might be formatting or layout. These directives are not translated into instructions for the CPU.

Linker

The linker, sometimes referred to as a linkage editor, combines one or more object files into a single executable file, including any necessary libraries. The following example links the object file from the previous chapter with the GNU linker, ld.

ld -g -o example example.o
  • -g is used to include debugging information in the final executable file.
  • -o specifies the name of the executable file to create, here example (with no extension). When -o is omitted, the output file is named a.out.
  • example.o is the input object file read by the linker. The output name is arbitrary and does not need to match the name of any input object file.

It is also possible to link multiple object files.

ld -g -o example main.o example.o

When a program uses functions located in another source file, each function that is not defined in the current source file must be declared as extern. Variables, such as global variables, in other source files can be accessed with the extern statement as well; however, data is typically passed as arguments of the function call.

Linking Process

The object files and library routines are combined into a single executable module. As part of combining the object files, the linker must adjust the relocatable addresses as necessary.

Assume there are two source files, a main file and a secondary file, both of which have been assembled into the object files main.o and funcs.o. During assembly, calls to routines outside the file being assembled are declared with the extern assembler directive.

The code is not available for an external reference and such references are marked as external in the object file. The list file will show an R for such relocatable addresses. The linker must satisfy the external references. Additionally, the final location of the external references must be placed in the code. For example, if the main.o object file calls a function in the funcs.o file, the linker must update the call with the appropriate address as shown in the following illustration.

linking multiple

Here fnc1 is external to main.o: it is defined in funcs.o, so the call in main.o is marked with an R. Within funcs.o, fnc1 starts at relative address 0x100. When funcs.o is combined with main.o into the final executable, fnc1 ends up at address 0x400, and the linker inserts this address into the call statement in main to complete the linking process and ensure the function call works correctly. The linker adjusts relocatable addresses in this way for both code and data.
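
The address adjustment can be modelled with a short Python sketch. It is a simplification, not how ld works internally. The object sizes (0x300 and 0x200) and the call site at 0x20 are assumed values, chosen so that fnc1 lands at 0x400 as in the illustration. A real x86-64 call usually stores an offset relative to the call site rather than the absolute address.

```python
def link(objects):
    """Place objects back to back and resolve external references.

    Each object has a 'size', its 'symbols' (name -> relative address) and
    its 'relocs' (relative address of a call site -> external symbol name).
    """
    base, bases, final_symbols = 0, {}, {}
    for name, obj in objects.items():
        bases[name] = base
        for sym, rel in obj["symbols"].items():
            final_symbols[sym] = base + rel   # relative -> final address
        base += obj["size"]
    patches = {}
    for name, obj in objects.items():
        for site, sym in obj["relocs"].items():
            patches[bases[name] + site] = final_symbols[sym]
    return final_symbols, patches

objects = {
    "main.o":  {"size": 0x300, "symbols": {"main": 0x0},   "relocs": {0x20: "fnc1"}},
    "funcs.o": {"size": 0x200, "symbols": {"fnc1": 0x100}, "relocs": {}},
}
symbols, patches = link(objects)
print(hex(symbols["fnc1"]))                           # 0x400
print({hex(k): hex(v) for k, v in patches.items()})   # {'0x20': '0x400'}
```

The entry in patches corresponds to the R mark in the list file: a location in main.o whose contents the linker fills in once the final address of fnc1 is known.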

Dynamic Linking

Linux supports dynamic linking through shared object files (.so), which allows the resolution of some symbols to be postponed until the program is executed. The actual instructions are not placed in the executable file; instead, if needed, they are resolved and accessed at run-time.

This approach offers two advantages:

  • Commonly used libraries can be stored in a single location instead of being duplicated in every binary.
  • If a bug in a shared library is fixed, programs that use it dynamically will benefit from the fix on next run.

Disadvantages:

  • When a library is updated, the executable may break because it depends on the previous library version.
  • A program using its own library must be trusted; replacing components can introduce compatibility issues.

To avoid retyping the assemble and link commands each time, a script can run the whole process:

#!/bin/bash

# Simple assemble/link script

if [ -z "$1" ]; then
    echo "Usage: ./asm64 <asmMainFile> (no extension)"
    exit 1
fi

# Verify no extensions were entered

if [ ! -e "$1.asm" ]; then
    echo "Error, $1.asm not found."
    echo "Note, do not enter file extensions."
    exit 1
fi

# Assemble and link; skip linking if assembly fails.

yasm -Worphan-labels -g dwarf2 -f elf64 "$1.asm" -l "$1.lst" || exit 1

ld -g -o "$1" "$1.o"

This script can be saved as asm64. No file extension is required, because Linux does not use extensions to decide whether a file is executable.

chmod +x asm64  # give the script file execute permission

Execute it:

./asm64 example # assemble and link example.asm into an executable named example; any other file name works the same way

Loader

The loader is the part of the operating system that loads the executable from secondary storage (for example, a hard drive) into primary storage (RAM). It creates a new process for the executable and loads the code into memory. The loader is invoked implicitly when the program is run by typing its name:

./example  # the executable created by the assemble and link steps

Debugger

The debugger is used to control the execution of a program, which makes it possible to inspect results even when the program prints nothing. Several debuggers exist; the one used here is GNU DDD, a graphical front end for GDB.