Thursday, April 3, 2014

link editing

 link editing (ld)

The link-editor, ld(1), concatenates one or more input files (relocatable objects, shared objects, or archive libraries) to produce one output file (a relocatable object, an exe, or a shared object). It is most commonly invoked as part of compilation (cc, gcc).

Link Editing (ld)

Takes input files from cc, as, or ld and produces one output file in one of the following formats: relocatable object, static exe, dynamic exe, or shared object. All input files to ld are in the Executable and Linking Format (ELF). It is therefore crucial that we understand the ELF file format in order to understand link editing. First we shall examine the types of ELF files one can have and their purpose.

  • Relocatable Objects - concatenation of relocatable object input files into one output that can be used again in link-editing. These files contain data telling the linker how to link them to other relocatable objects, shared objects, and executables.
  • Static exe - all symbol references are bound within the exe, so it represents a ready-to-run process. Both forms of executable files contain the data necessary for the operating system to produce an executable image.
  • Dynamic exe - concatenation of relocatable objects that requires intervention by the runtime linker to produce the runnable process. The symbols in the symtab might need binding at runtime. The dynamic executable may also depend on shared objects (.so). Dynamic executables are the default output of a compilation.
  • Shared Objects - concatenation of relocatable objects that provides services to dynamic executables, bound at runtime by the runtime linker ld.so.1. Shared objects might also depend on other shared objects. Think of shared objects as dynamic executables that have not been assigned any virtual address space.

The graphic below demonstrates how to create the various file formats discussed above.

Executable and Linking Format (ELF)

The ELF file format was created by Unix System Laboratories as a better alternative to the a.out and COFF binary formats. Some capabilities of the ELF format include dynamic linking, dynamic loading, imposing runtime control on a program, and an improved method for creating shared libraries. An ELF file is made up of five kinds of parts, not all of which must be present in a given file:

  1. The ELF header.
  2. The Program header table.
  3. The Section header table.
  4. ELF sections. (linker view)
  5. ELF segments. (executable view)

Each of the ELF file types described above can be looked at in two ways (called views). The first is the linker view and the second is the executable view. The views are summarized in the figure below:

The linker view of an ELF file is partitioned into sections, while the executable view is partitioned into segments. Sections represent the smallest indivisible unit that can be processed in the ELF file. A segment is a collection of sections and is the smallest unit that can be mapped (mmap) into memory by exec or ld.so.1. These two views allow us to look at information that is specific to linking, such as the symbol table and relocation information, separately from information specific to creating the process image, like the text and data segments. The bulk of the data is therefore stored in sections and segments, with the rest of the file (the headers) devoted to the organization of and access to those sections and segments. The following is a brief description of each of the five file parts.



ELF Header.

This is the only fixed portion of the ELF file, always occurring at the start. It provides information such as the ELF version, the target architecture, the location of the program header table, the location of the section header table, the location of the string table (which stores the names of sections), the size of each table, and lastly the location of the first instruction to be executed.

#define EI_NIDENT 16

typedef struct {
   unsigned char e_ident[EI_NIDENT];
   uint16_t e_type;
   uint16_t e_machine;
   uint32_t e_version;
   ElfN_Addr e_entry;
   ElfN_Off e_phoff;
   ElfN_Off e_shoff;
   uint32_t e_flags;
   uint16_t e_ehsize;
   uint16_t e_phentsize;
   uint16_t e_phnum;
   uint16_t e_shentsize;
   uint16_t e_shnum;
   uint16_t e_shstrndx;
} ElfN_Ehdr;
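
To make the e_ident field a little more concrete, here is a minimal sketch that inspects the first bytes of a file image and reports whether it looks like a 32-bit or 64-bit ELF file. The helper name elf_ident_class is made up for this post; the magic bytes, the EI_CLASS offset, and the class values come from the ELF specification, and they are defined locally so the sketch does not depend on <elf.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Offsets into e_ident and class values, as given by the ELF specification. */
enum { EI_CLASS = 4, EI_NIDENT = 16 };
enum { ELFCLASS32 = 1, ELFCLASS64 = 2 };

/* Hypothetical helper: returns ELFCLASS32 or ELFCLASS64 if buf starts with a
   valid ELF identification, or -1 otherwise. */
int elf_ident_class(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

    assert(buf != NULL);
    if (len < EI_NIDENT)
        return -1;                      /* too short to hold e_ident */
    if (memcmp(buf, magic, sizeof magic) != 0)
        return -1;                      /* magic number mismatch */
    if (buf[EI_CLASS] != ELFCLASS32 && buf[EI_CLASS] != ELFCLASS64)
        return -1;                      /* unknown or invalid class */
    return buf[EI_CLASS];
}
```

In a real tool you would read the first 16 bytes of the file into a buffer and pass them in; the class then tells you whether to interpret the rest of the header as the 32-bit or the 64-bit layout.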



Program Header Table

The program header table is only useful to executables and shared objects. It provides organizational information on the array of segments in the file. Each entry in the program header table contains the type, file offset, physical address, virtual address, file size, memory image size, and alignment for a segment in the program. A segment is copied into memory only if its p_type is PT_LOAD. As for the physical address: p_paddr only matters on systems where physical addressing is relevant. System V ignores physical addressing for application programs, so the contents of this field are unspecified for executables and shared objects.

typedef struct {
   uint32_t p_type;
   Elf32_Off p_offset;
   Elf32_Addr p_vaddr;
   Elf32_Addr p_paddr;
   uint32_t p_filesz;
   uint32_t p_memsz;
   uint32_t p_flags;
   uint32_t p_align;
} Elf32_Phdr;
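
The difference between p_filesz and p_memsz is easy to miss, so here is a small sketch of it. It walks an array of program headers and adds up how many bytes of each PT_LOAD segment exist only in memory and must be zero-filled (this is where .bss ends up). The Phdr32 type and the zero_fill_bytes name are stand-ins made up for this example; the field order mirrors Elf32_Phdr above and PT_LOAD has the value 1 in the ELF specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PT_LOAD 1   /* segment type value from the ELF specification */

/* Stand-in for Elf32_Phdr with the same field order, using fixed-width types
   so the sketch does not depend on <elf.h>. */
typedef struct {
    uint32_t p_type, p_offset, p_vaddr, p_paddr;
    uint32_t p_filesz, p_memsz, p_flags, p_align;
} Phdr32;

/* Hypothetical helper: total bytes that must be zero-filled across all
   loadable segments (memory size minus file size). */
uint32_t zero_fill_bytes(const Phdr32 *ph, size_t n)
{
    uint32_t total = 0;

    assert(ph != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (ph[i].p_type != PT_LOAD)
            continue;                   /* only PT_LOAD segments are mapped */
        if (ph[i].p_memsz > ph[i].p_filesz)
            total += ph[i].p_memsz - ph[i].p_filesz;
    }
    return total;
}
```

A text segment normally has p_filesz equal to p_memsz, while a data segment that also carries .bss has a larger p_memsz.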



Section Header Table

Provides organization information on the array of sections in the ELF file. These entries provide the name, type, memory image starting address (if loadable), file offset, the section's size in bytes, alignment, and how the information in the section should be interpreted.

typedef struct {
   uint32_t sh_name;
   uint32_t sh_type;
   uint32_t sh_flags;
   Elf32_Addr sh_addr;
   Elf32_Off sh_offset;
   uint32_t sh_size;
   uint32_t sh_link;
   uint32_t sh_info;
   uint32_t sh_addralign;
   uint32_t sh_entsize;
} Elf32_Shdr;
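
Note that sh_name is not a string but an offset into the section header string table (the one located through e_shstrndx in the ELF header). A rough sketch of the lookup, assuming the string table has already been read into memory, could look like this. The function name section_name is made up for this post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: resolve an sh_name offset against the section header
   string table. Returns NULL if the offset is out of range or the string is
   not NUL-terminated inside the table. */
const char *section_name(const char *shstrtab, size_t size, uint32_t sh_name)
{
    assert(shstrtab != NULL);
    if (sh_name >= size)
        return NULL;                    /* offset points past the table */
    if (memchr(shstrtab + sh_name, '\0', size - sh_name) == NULL)
        return NULL;                    /* unterminated string */
    return shstrtab + sh_name;
}
```

The bounds checks are there because the offsets come straight from the file and should not be trusted blindly.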



ELF Sections

Sections can hold executable code, data, dynamic linking information, debugging data, symbol tables, relocation information, comments, string tables, and notes. Some sections provide information for linking, others are loaded into the process image, while others provide information on building an executable.

ELF Segments

Segments are groupings of like sections (text segment, data segment). A process image is created by loading segments into the virtual memory regions described by the program header table.

Tools readelf

readelf is a tool for viewing ELF files. Click here to view an example elfdump. Make sure to view the sections in the example file and return to the example when needed. I found that having an example ELF file handy gave me a better understanding of the material.

Sections of Interest to us

So the basic idea from here is that the link editor concatenates the program .text, .data, and .bss sections into the new output file. The rest of the relocation and symbol information is modified or generated for the output file.

ld Execution

Here is the program flow for the linker:

  • Verify the options passed to it.
  • Concatenate like sections (type, attribute, name) from input relocatable objects to form sections within the output file.
  • Read symbol tables from relocatable objects and shared objects and apply the information to the output file by updating other input sections. In addition, an output relocation section might be generated.
  • Generate program headers that describe all the segments created.
  • Generate a dynamic linking information section that provides the shared object dependencies and symbol bindings to the runtime linker.
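
The "concatenate like sections" step can be sketched as a small layout pass. This is only an illustration of the idea, not how ld is actually implemented: each input object contributes a piece of, say, .text, and the pieces are placed one after another in the output section with each one rounded up to its required alignment. The InputSection type and both function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One input section contribution (hypothetical type for this sketch). */
typedef struct {
    uint32_t size;       /* bytes contributed by the input object */
    uint32_t addralign;  /* required alignment, a power of two (0 or 1 = none) */
    uint32_t out_offset; /* filled in: offset inside the output section */
} InputSection;

/* Round value up to the next multiple of align (align must be a power of two). */
static uint32_t align_up(uint32_t value, uint32_t align)
{
    if (align <= 1)
        return value;
    return (value + align - 1) & ~(align - 1);
}

/* Place each input section after the previous one, honoring alignment.
   Returns the total size of the merged output section. */
uint32_t layout_sections(InputSection *in, size_t n)
{
    uint32_t offset = 0;

    assert(in != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        offset = align_up(offset, in[i].addralign);
        in[i].out_offset = offset;
        offset += in[i].size;
    }
    return offset;
}
```

Once every input piece has an output offset, symbol values and relocations that point into those pieces can be adjusted by the same amount, which is what the symbol table step in the list is about.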

You can change how these sections get mapped by creating a mapping file and using the -M option with (ld). More on this later.

Your Compiler

In practice you rarely invoke ld yourself, and it is generally good practice not to, because the linker on its own will not attach the initialization and termination code to your program. But we will run some tests on our example program to better understand this (example test.c, the simplest C program).


int main(void)
{
    return 0;
}

Then we can ask gcc nicely to compile our test program but not to link it. Once that is done, we can try to link the file manually.

gcc -c test.c

ld test.o
ld: warning: cannot find entry symbol _start; defaulting to 0000000008048094

Click here to view the "readelf -a" of the resulting file

The normal way is to have the compiler driver invoke the linker as follows:

gcc test.o

Click here to view the readelf of the resulting file. The difference is rather substantial: a lot of extra crap gets included in my simple little program. There is actually more stuff added than there is stuff in my program. At this point it could be said that gcc is the author of my program and not me. So what is all this extra crap that is being added? Let's find out.

One of the only times that it is acceptable to invoke the linker on your own is when you are creating another relocatable object. This is done with the -r option for ld.

ld -r test.o

The moral of the story is that during compilation a bunch of extra stuff gets included in your file. Upon realizing this, a good question is: what is it? On a Solaris box we can use the -# option to have the compiler display these mysterious files that are included in our code. On Linux with gcc you can get the same output with a call to gcc --verbose.

cc -# -o prog test.c

Here are the results on Solaris.

/opt/SUNWspro/bin/../WS6U1/bin/acomp -i test.c -y-fbe -y/opt/SUNWspro/bin/../WS6U1/bin/fbe -y-xarch=generic -y-o -ytest.o -y-s -y-verbose -y-xmemalign=4s -Qy -D__SunOS_5_8 -D__SUNPRO_C=0x520 -D__SVR4 -D__unix -D__sun -D__sparc -D__BUILTIN_VA_ARG_INCR -D__SUN_PREFETCH -Xa -D__PRAGMA_REDEFINE_EXTNAME -Dunix -Dsun -Dsparc -D__RESTRICT -I/opt/SUNWspro/WS6U1/include/cc "-g/opt/SUNWspro/bin/../WS6U1/bin/cc -c "
### Note: LD_LIBRARY_PATH = <null>
### Note: LD_RUN_PATH = <null>
/usr/ccs/bin/ld /opt/SUNWspro/WS6U1/lib/crti.o
/opt/SUNWspro/WS6U1/lib/crt1.o
/opt/SUNWspro/WS6U1/lib/values-xa.o -o prog test.o -Y "P,/opt/SUNWspro/WS6U1/lib:/usr/ccs/lib:/usr/lib" -Qy -lc /opt/SUNWspro/WS6U1/lib/crtn.o
gcc --verbose test.c

Here are the results under Debian Linux.

Reading specs from /usr/lib/gcc-lib/i486-linux/3.3.4/specs
Configured with: ../src/configure -v --enable-languages=c,c++,java,f77,pascal,objc,ada,treelang --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-gxx-include-dir=/usr/include/c++/3.3 --enable-shared --with-system-zlib --enable-nls --without-included-gettext --enable-__cxa_atexit --enable-clocale=gnu --enable-debug --enable-java-gc=boehm --enable-java-awt=xlib --enable-objc-gc i486-linux
Thread model: posix
gcc version 3.3.4 (Debian)
/usr/lib/gcc-lib/i486-linux/3.3.4/cc1 -quiet -v -D__GNUC__=3 -D__GNUC_MINOR__=3 -D__GNUC_PATCHLEVEL__=4 test.c -quiet -dumpbase test.c -auxbase test -version -o /tmp/ccSbXIgh.s
GNU C version 3.3.4 (Debian) (i486-linux)
compiled by GNU C version 3.3.4 (Debian)
GGC heuristics: --param ggc-min-expand=98 --param ggc-min-heapsize=129048
ignoring nonexistent directory "/usr/i486-linux/include"
#include "..." search starts here:
#include <...> search starts here:
/usr/local/include
/usr/lib/gcc-lib/i486-linux/3.3.4/include
/usr/include
End of search list.
as -V -Qy -o /tmp/ccWmNHhp.o /tmp/ccSbXIgh.s
GNU assembler version 2.15 (i386-linux) using BFD version 2.15
/usr/lib/gcc-lib/i486-linux/3.3.4/collect2 --eh-frame-hdr -m elf_i386 -dynamic-linker /lib/ld-linux.so.2 /usr/lib/gcc-lib/i486-linux/3.3.4/../../../crt1.o /usr/lib/gcc-lib/i486-linux/3.3.4/../../../crti.o /usr/lib/gcc-lib/i486-linux/3.3.4/crtbegin.o -L/usr/lib/gcc-lib/i486-linux/3.3.4 -L/usr/lib/gcc-lib/i486-linux/3.3.4/../../.. /tmp/ccWmNHhp.o -lgcc -lgcc_eh -lc -lgcc -lgcc_eh /usr/lib/gcc-lib/i486-linux/3.3.4/crtend.o /usr/lib/gcc-lib/i486-linux/3.3.4/../../../crtn.o


Initialization and Termination Sections

Dynamic Objects provide code for runtime initialization and termination. This code may be in the form of function pointers or one entire block. Each of these sections is built from like section types given by input relocatable objects. Sections:

  • .preinit_array
  • .init_array
  • .fini_array

When creating dynamic objects, the link editor identifies these arrays with the .dynamic tags DT_PREINIT_ARRAY and DT_PREINIT_ARRAYSZ, DT_INIT_ARRAY and DT_INIT_ARRAYSZ, and DT_FINI_ARRAY and DT_FINI_ARRAYSZ.

The sections .init and .fini provide the runtime initialization and termination code for your dynamic executable. Compiler drivers usually supply these sections in files that are tacked onto the beginning and end of the input file list. These sections provide the required code in the form of two reserved functions named _init and _fini. When creating a dynamic object, the link editor identifies these symbols with the .dynamic tags DT_INIT and DT_FINI. One thing that is very cool is that you can add functions to the .init_array and the .fini_array.

Refer back to our ELF file to locate these symbols.
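
One way to add your own functions to these arrays, assuming GCC or Clang, is the constructor and destructor function attributes. This is a compiler extension rather than part of the ELF format, and whether the entries land in .init_array/.fini_array or in the older .ctors/.dtors sections depends on the toolchain, so treat the sketch below as an illustration. The function names are made up.

```c
#include <assert.h>
#include <stdio.h>

static int init_calls = 0;

/* GCC/Clang extension: ask the toolchain to register this function to run
   before main() (typically through an .init_array entry). */
__attribute__((constructor))
static void my_init(void)
{
    init_calls++;
}

/* Likewise registered to run after main() returns or exit() is called
   (typically through a .fini_array entry). */
__attribute__((destructor))
static void my_fini(void)
{
    fputs("my_fini ran\n", stderr);
}

/* Reports how many times the constructor has run so far. */
int init_call_count(void)
{
    assert(init_calls >= 0);
    return init_calls;
}
```

Calling init_call_count() from main() should show that my_init has already run by the time main starts. Running readelf -S on the resulting binary is a quick way to check which section your toolchain actually used.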

Symbol Processing and Resolution

During input file processing the link editor passes any local symbols straight through to the output file, while global symbols are accumulated internally. The internal symbol table is searched for each new global symbol entry to determine if two are the same and some form of resolution needs to occur.

Basic types of symbol resolution

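  • Undefined - global symbols that are referenced in a file but have not been assigned a storage address
  • Tentative - symbols created in a file but not yet sized or allocated in storage (for example, uninitialized C globals); they occupy storage only at runtime
  • Defined - symbols that have been created and assigned a storage address and space within the file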
