Saturday 19 August 2017

Understanding linker scripts, with a walkthrough in MDK-OS.

I wanted to write about a fairly complex topic: linker scripts. Although the syntax is simple, the topic is vast. I will take an example from my OS, which can be expanded and built upon. I will discuss the basics here and add to the post as I implement further techniques.

Linker scripts are used by the linker (in the GNU toolchain it is called ld). They allow us to control the positioning and attributes of object code in the final output file. The linker "stitches" the various object files into one single output file, taking its instructions on how to position the various "sections" of the program from the linker script. The extension of a linker script is usually ".ld" or sometimes ".lds". The script is written in the linker command language.

The linker always uses a linker script. If you don't supply one, the linker uses an internal default script which is compiled into the linker executable. You can see the default linker script by typing:

ld --verbose

You can supply your own linker script with the -T option. In my Makefile for the loader and mdkos I have two variables, LOADER_LDSCRIPT and OD_LDSCRIPT, which hold the strings -Tloader.lds and -Tmdkos.lds respectively.

As mentioned above, the linker combines different input files into a single output file. These files are in a special format called an object file format, and the files themselves are called object files. Each object file contains, among other things, a list of sections. A section in an input file is called an input section; similarly, a section in the output file is called an output section.

Each section in an object file has a name and size.  Most sections also have an associated block of data called section contents.  A section may be marked as loadable, which means that the contents should be loaded into memory when the output file is run. A section with no contents may be allocatable, which means that an area in memory should be set aside, but nothing in particular should be loaded there (in some cases this memory must be zeroed out). A section which is neither loadable nor allocatable typically contains some sort of debugging information.

You can see the different sections of an object file by using objdump with the -h option. For example, to view the sections in the mdk_loader ELF file I run:

arm-none-eabi-objdump -h mdk_loader.elf

Every object file also has a list of symbols, known as the symbol table. A symbol may be defined or undefined. Each symbol has a name, and each defined symbol has an address, among other information. If you compile a C or C++ program into an object file, you will get a defined symbol for every defined function and global or static variable. Every undefined function or global variable which is referenced in the input file will become an undefined symbol.
You can check the symbols in an object file using the -t option of objdump, or with nm.
For example:

arm-none-eabi-objdump -t mdk_loader.elf

Typically the different kinds of content in your program, and the places they reside, are:
  • Constant data: For example const char teststr[] = "Test string". This type of information can safely be stored in ROM and used in place; it need not be copied to RAM.
  • Initialized variables: For example int testint = 1234. This data must reside in RAM while the program runs, but the initial values to be loaded at boot time must be kept in ROM.
  • Uninitialized variables: For example a declaration such as int testint;. These need not occupy any space in ROM. The startup code simply needs to set aside (and usually zero) sufficient space in RAM for them, and the linker needs to know how to resolve references to these variables.
In addition we have:
  • Startup code (hardware and C run-time initialization): This code is usually written in assembly and must be located at a specific place in ROM.
  • Application code: This is distinct from startup code and usually doesn't have to reside anywhere specific in the memory map.
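
As a rough illustration of the list above, the following C fragment has one declaration of each kind. The section names in the comments are the ones GCC-based toolchains conventionally use; the identifiers testcount and teststr_length are made up for this sketch, and a particular compiler or set of flags may place things differently.

```c
#include <assert.h>
#include <stddef.h>

/* Constant data: conventionally emitted into .rodata, can stay in ROM. */
const char teststr[] = "Test string";

/* Initialized variable: lives in .data (RAM), initial value kept in ROM. */
int testint = 1234;

/* Uninitialized variable: ends up in .bss (or COMMON), zeroed at startup. */
int testcount;

/* Application code: emitted into .text. */
size_t teststr_length(void)
{
    /* The array includes the terminating NUL, so the text is one shorter. */
    assert(teststr[sizeof teststr - 1] == '\0');
    return sizeof teststr - 1;
}
```

Running objdump -h on the object file compiled from this fragment is a quick way to check where each item actually landed.
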
Next we come to the difficult and confusing part of linker scripts: the "load memory address" (LMA) and the "virtual memory address" (VMA). Note that the VMA here has nothing to do with the concepts of virtual and physical memory; it is simply the address at which a section runs. Every loadable or allocatable output section has these two addresses.

Virtual memory address (VMA): This is the address the section will have when the output file is run. It is the address that all other parts of the code use when they refer to the section. Hence, if the two addresses differ, the section contents have to be copied from the load memory address to the virtual memory address before they are used.
Load memory address (LMA): This is the address at which the section is loaded.
In most cases both addresses are the same. They differ when, say, the .data section is loaded into ROM and then copied to RAM when the program starts up. In this case the ROM address is the LMA and the RAM address is the VMA.
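
The .data case can be sketched in C as follows. This is only a minimal illustration of what startup code does: copy_section and zero_section are hypothetical helpers, not functions from MDK-OS, and on a bare-metal target the addresses and sizes would come from linker script symbols rather than from ordinary arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy a section image from its load address (LMA) to its run
 * address (VMA). Hypothetical helper for illustration only.
 */
void copy_section(void *vma_start, const void *lma_start, size_t size)
{
    assert(vma_start != NULL && lma_start != NULL);
    if (vma_start != lma_start) /* nothing to do when LMA == VMA */
        memmove(vma_start, lma_start, size);
}

/* Clear an allocatable-only region such as .bss. */
void zero_section(void *vma_start, size_t size)
{
    assert(vma_start != NULL);
    memset(vma_start, 0, size);
}
```

The sketch leans on memmove and memset from the C library; very early startup code often cannot, and uses a hand-written loop instead.
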

An interesting example I have encountered previously was a device with little RAM, probably about 4MB. This device had various applications. To simplify: each application was smaller than 4MB, but added together they were much larger than 4MB, somewhere around 50 - 100 MB for the whole monolithic image. The technique used to run them is called overlaying. All the applications were stored in a huge NOR flash, and all of them were linked to overlapping addresses in RAM, i.e. every application had its references starting from the same RAM address. This shared RAM address is the VMA.

The image was monolithic, so the applications were placed one after another at increasing addresses in the NOR flash. These flash addresses are the LMAs. To get an application from the NOR flash into RAM there was a small program called the loader. When the user selected a specific application in the user interface (UI), the loader would copy that application from its start address in flash (the LMA) into RAM (the VMA), and the program counter (PC) would then jump to that address.
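
The loader's job can be sketched like this. I no longer have that device's code, so everything here (struct overlay, load_overlay, the table layout) is a hypothetical reconstruction of the idea, with plain arrays standing in for the NOR flash and the RAM window.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One entry per application image stored in flash (names are made up). */
struct overlay {
    size_t lma_offset; /* where the image starts inside the flash */
    size_t size;       /* image size in bytes */
};

/*
 * Copy overlay `index` from flash (its LMA) into the shared RAM
 * window (the common VMA). Returns the address to jump to.
 */
uint8_t *load_overlay(uint8_t *ram_window, size_t window_size,
                      const uint8_t *flash,
                      const struct overlay *table, size_t index)
{
    const struct overlay *ov = &table[index];

    assert(ov->size <= window_size); /* each app must fit in the window */
    memcpy(ram_window, flash + ov->lma_offset, ov->size);
    return ram_window;
}
```

Every overlay is copied to the same RAM window, which is why all the applications can be linked against the same VMA even though their LMAs differ.
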

We can inform ld where to load various parts of the program in two ways.

The first is to assign names to various memory regions of our device and then direct each code or data section to the appropriate memory region. This is what is followed in my code.

The second method is to start the linker's current memory location counter at a known address (the start address of the first section of the memory to be populated) and emit sections one by one to the current location, manually incrementing this location counter as appropriate in order to skip "holes" in the memory map. The "holes" in the memory map can be peripheral memory mapping etc.

For example:

SECTIONS
{
    . = 0x30000000;
    .text : { *(.text) }
    .data : { *(.data) }
    .bss  : { *(.bss) }
}

We know that the RAM of the S3C2440 starts at 0x30000000, so in this script we set the location counter to that address. The line . = 0x30000000; achieves this.

Next we tell ld which output sections to create, where to place them in memory and which input sections should be mapped into them. The next 3 lines do this. They basically say "collect all .text sections from the input files and emit them to a section called .text in the output file. Next collect all .data sections from the input files and emit them to a section called .data in the output file. Finally collect all .bss sections from the input files and emit them to a section called .bss in the output file".

Now I will describe some of the linker script examples in the MDK OS.

First we define the memory regions as follows:

MEMORY
{
 sram : org = 0x00000000 , len = 0x1000
 /*sdram : org = 0x30000000 , len = 0x4000000*/
 sdram : org = 0x30000000 , len = 0x3F00000 /* 63MB RAM */
 vectors : org = 0x33F00000 , len = 0x100000 /* Last 1MB for the isr handlers */
}


In the above case we have:
  1. SRAM at location 0x00000000 of size 4KB.
  2. SDRAM at location 0x30000000. The full size is 64MB, but I have commented that definition out. Instead I keep the sdram region at 63MB, reserving the last 1MB.
  3. The last 1MB, starting at 0x33F00000, is reserved for the interrupt vectors and handlers; this is the vectors region.
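
The region sizes and origins above can be cross-checked with a little arithmetic. The macro names and the region_end helper below are invented for this check; only the numbers come from the MEMORY block.

```c
#include <assert.h>
#include <stdint.h>

#define MiB            0x100000u
#define SDRAM_ORIGIN   0x30000000u
#define SDRAM_TOTAL    (64u * MiB)                    /* 0x4000000 */
#define VECTORS_LENGTH (1u * MiB)                     /* 0x100000  */
#define SDRAM_LENGTH   (SDRAM_TOTAL - VECTORS_LENGTH) /* 63MB      */
#define VECTORS_ORIGIN (SDRAM_ORIGIN + SDRAM_LENGTH)

/* End address (exclusive) of a region given its origin and length. */
uint32_t region_end(uint32_t origin, uint32_t length)
{
    assert(length <= UINT32_MAX - origin); /* region must not wrap */
    return origin + length;
}
```

With these numbers the sdram region ends exactly where the vectors region begins, and the vectors region ends at the top of the 64MB SDRAM.
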
Next, my sections look as follows:

.text :
{
 *(.text);
 . = ALIGN(4);
} > sdram


.data :
{
 __data_start__ = .;
 *(.data);
 . = ALIGN(4);
 __data_end__ = .;
} > sdram


In this snippet the .text section is placed in the "sdram" region, followed by the .data section, which is also placed in the "sdram" region.

The symbols __data_start__ and __data_end__ are assigned the value of the location counter (.) and are declared extern in the code. They hold the start and end addresses of the .data section. Please note that __data_start__ and __data_end__ hold VMAs. For the .data section here the VMA and LMA are the same.
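
The . = ALIGN(4); lines in the snippet round the location counter up to the next multiple of 4 before the section ends. The same rounding can be written in C; align_up is a name I made up for this sketch, and like ALIGN it only makes sense for power-of-two alignments.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of align (a power of two). */
uint32_t align_up(uint32_t addr, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}
```

An address that is already aligned is returned unchanged, so the ALIGN line costs nothing when the section happens to end on a word boundary.
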

Next we have the following section:

.rodata :
{
 __rodata_start__ = .;
 *(.rodata);
 . = ALIGN(4);
 __rodata_end__ = .;
} > sdram

.bss  :
{
 __bss_start__ = .;
 *(.bss); *(COMMON)
 __bss_end__ = .;

 __usr_sys_stack_bottom__ = .;
 . += 0x1000;
 __usr_sys_stack_top__ = .;

 __irq_stack_bottom__ = .;
 . += 0x1000;
 __irq_stack_top__ = .;

 __fiq_stack_bottom__ = .;
 . += 0x1000;
 __fiq_stack_top__ = .;

 __svc_stack_bottom__ = .;
 . += 0x1000;
 __svc_stack_top__ = .;
} > sdram

In the above example I have kept the ".rodata" or the read-only data in the SDRAM. This will be moved to ROM later on.

Next we come to the .bss section, which holds the uninitialized data. All .bss and COMMON input sections are clubbed together and kept in the "sdram" memory region. In the same output section we also reserve the user/system, irq, fiq and svc stacks, each 4KB in size, and place markers which are used in the assembly and C code to set up the stacks.
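
The stack markers are just the location counter sampled before and after each . += 0x1000. The following sketch mirrors that arithmetic in C; struct stack_region and layout_stacks are hypothetical names, and the starting address passed in is whatever __bss_end__ turns out to be in a real link.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_SIZE 0x1000u /* matches ". += 0x1000" in the script */

struct stack_region {
    uint32_t bottom; /* lowest address of the stack area */
    uint32_t top;    /* one past the highest address */
};

/*
 * Starting at bss_end, carve out `count` consecutive stacks of
 * STACK_SIZE bytes, in the order used by the script:
 * usr/sys, irq, fiq, svc.
 */
void layout_stacks(uint32_t bss_end, struct stack_region *out, unsigned count)
{
    uint32_t dot = bss_end; /* plays the role of "." */

    assert(out != NULL);
    for (unsigned i = 0; i < count; i++) {
        out[i].bottom = dot;
        dot += STACK_SIZE;
        out[i].top = dot;
    }
}
```

Assuming the usual full-descending ARM stack convention, the startup code would load each mode's stack pointer with the corresponding top marker.
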

Next we come to a real use of the VMA and LMA concepts: the interrupt handlers. The ld script is as follows:

.vector_reloc :
{
 *(.vector_reloc);
} >vectors AT>sdram 

/* Get the lma address for the particular section */
__exception_vector_reloc_startaddr__ = LOADADDR(.vector_reloc);
__exception_vector_reloc_endaddr__ = LOADADDR(.vector_reloc) + SIZEOF(.vector_reloc);

/* 
 * Above SDRAM is where it will be stored in the file but address
 * references will be in the addresses of the isr handler section
 */

.isrhandler :
{
 *(.isrhandler);
} >vectors AT>sdram

__exception_handler_start_addr__ = LOADADDR(.isrhandler);
__exception_handler_end_addr__ = LOADADDR(.isrhandler) +  SIZEOF(.isrhandler);

Here the .vector_reloc section has its LMA in sdram, denoted by AT>sdram. Its VMA is in the vectors memory region, which starts at 0x33F00000. Following it is the .isrhandler section, which is set up the same way: the LMA is in the SDRAM and the VMA is in the vectors memory region. We use LOADADDR to get the LMA of a section and SIZEOF to get its size.

In the previous case the symbols __irq_stack_bottom__ etc. hold VMAs. Since the LMA and the VMA are the same there, we do not need to bother with the LOADADDR and SIZEOF functions.

How does all this come together?
Every address generated for .vector_reloc and .isrhandler, and every code reference to them, is a VMA. The contents are stored at the LMA though. The code for exception_vectors is in os_vector.s, in the section .vector_reloc. The code snippet is as follows:

.section .vector_reloc,"ax" //Apparent fix for missing section when objcopy is to have allocatable and executable flags-"ax"
//TODO: Understand the reason for the above flag.
.code 32

.globl exception_vectors

exception_vectors:
 ldr pc,=do_handle_reset  //Reset vector
 ldr pc,=do_handle_undef  //Undefined instruction
 ldr pc,=do_handle_swi   //Software Interrupt
 ldr pc,=do_handle_pabt   //Abort prefetch
 ldr pc,=do_handle_dabt  //Abort data
 ldr pc,=do_handle_reserved //Reserved
 ldr pc,=do_handle_irq  //IRQ
 ldr pc,=do_handle_fiq  //FIQ
.end

The code for exception handling is in the file exception_handler.s, in the section .isrhandler. The snippet of the code is as follows:

.section .isrhandler,"ax"

.code 32

.globl do_handle_reset
do_handle_reset:
 b do_handle_reset

.globl do_handle_undef
do_handle_undef:
 b do_handle_undef

.globl do_handle_swi
do_handle_swi:
 b do_handle_swi

.globl do_handle_pabt
do_handle_pabt:
 b do_handle_pabt

.globl do_handle_dabt
do_handle_dabt:
 b do_handle_dabt

.globl do_handle_reserved
do_handle_reserved:
 b do_handle_reserved

.globl do_handle_irq
do_handle_irq:

          ...
          ...

.globl do_handle_fiq
do_handle_fiq:
  b do_handle_fiq
...
...

.end

Please note that the contents of do_handle_irq and some other unrelated code have been replaced with "..." for clarity.

I dumped these sections with the following command; its output follows:


arm-none-eabi-objdump -tDSl bin/mdk_os.elf


Disassembly of section .vector_reloc:

33f00000 <exception_vectors>:
exception_vectors():
33f00000: e59ff018  ldr pc, [pc, #24] ; 33f00020 <exception_vectors+0x20>
33f00004: e59ff018  ldr pc, [pc, #24] ; 33f00024 <exception_vectors+0x24>
33f00008: e59ff018  ldr pc, [pc, #24] ; 33f00028 <exception_vectors+0x28>
33f0000c: e59ff018  ldr pc, [pc, #24] ; 33f0002c <exception_vectors+0x2c>
33f00010: e59ff018  ldr pc, [pc, #24] ; 33f00030 <exception_vectors+0x30>
33f00014: e59ff018  ldr pc, [pc, #24] ; 33f00034 <exception_vectors+0x34>
33f00018: e59ff018  ldr pc, [pc, #24] ; 33f00038 <exception_vectors+0x38>
33f0001c: e59ff018  ldr pc, [pc, #24] ; 33f0003c <exception_vectors+0x3c>
33f00020: 33f00040  mvnscc r0, #64 ; 0x40
33f00024: 33f00044  mvnscc r0, #68 ; 0x44
33f00028: 33f00048  mvnscc r0, #72 ; 0x48
33f0002c: 33f0004c  mvnscc r0, #76 ; 0x4c
33f00030: 33f00050  mvnscc r0, #80 ; 0x50
33f00034: 33f00054  mvnscc r0, #84 ; 0x54
33f00038: 33f00058  mvnscc r0, #88 ; 0x58
33f0003c: 33f00098  mvnscc r0, #152 ; 0x98

Disassembly of section .isrhandler:

33f00040 <do_handle_reset>:
do_handle_reset():
33f00040: eafffffe  b 33f00040 <do_handle_reset>

33f00044 <do_handle_undef>:
do_handle_undef():
33f00044: eafffffe  b 33f00044 <do_handle_undef>

33f00048 <do_handle_swi>:
do_handle_swi():
33f00048: eafffffe  b 33f00048 <do_handle_swi>

33f0004c <do_handle_pabt>:
do_handle_pabt():
33f0004c: eafffffe  b 33f0004c <do_handle_pabt>

33f00050 <do_handle_dabt>:
do_handle_dabt():
33f00050: eafffffe  b 33f00050 <do_handle_dabt>

33f00054 <do_handle_reserved>:
do_handle_reserved():
33f00054: eafffffe  b 33f00054 <do_handle_reserved>

33f00058 <do_handle_irq>:
do_handle_irq():
...
...
33f00098 <do_handle_fiq>:
do_handle_fiq():
33f00098: eafffffe  b 33f00098 <do_handle_fiq>

Now that we have all the data we can start analysing the dumps.

Firstly we verify the claim that all code references use VMAs. In the disassembly of .vector_reloc and .isrhandler above, the addresses (the first column) come from the vectors region, which starts at 0x33F00000. The .isrhandler section follows, starting at 0x33F00040.

Because all references are in the 0x33F00000 range, we have to copy the code into that memory range, from the part of the RAM pointed to by the LMA to the VMA. To get the LMA and perform the copy we use the following code.

#include <stdint.h>

/* LMA markers defined in the linker script with LOADADDR() and SIZEOF(). */
extern char __exception_handler_start_addr__[];
extern char __exception_handler_end_addr__[];

extern char __exception_vector_reloc_startaddr__[];
extern char __exception_vector_reloc_endaddr__[];

static void setup_interrupt_vector_table(void)
{
 /*
  * TODO: Optimize it to remove the extra index variables. Unoptimized only for test purposes.
  */

 /* Destination: start of the vectors region (defined elsewhere in the OS sources). */
 char *vector_table = (char *)EXCEPTION_INTERRUPT_VECTOR_TABLE_START;

 /*
  * Need to get the lma of the code.
  * The __exception_vector_reloc_startaddr__ is the lma i.e. the generated
  * address in the file. I need to use this as the start address for the
  * later vectors and handlers.
  */
 char *src = (char *)__exception_vector_reloc_startaddr__;

 uint32_t i = 0;

 /* Copy the vector table byte by byte from its LMA to its VMA. */
 for (i = (uint32_t)__exception_vector_reloc_startaddr__;
      i < (uint32_t)__exception_vector_reloc_endaddr__;
      i++) {
  *vector_table = *src;
  vector_table++;
  src++;
 }

 /*
  * Continue with the same place for handler source. This relies on
  * .isrhandler being stored directly after .vector_reloc at the LMA.
  */
 for (i = (uint32_t)__exception_handler_start_addr__;
      i < (uint32_t)__exception_handler_end_addr__;
      i++) {
  *vector_table = *src;
  vector_table++;
  src++;
 }
}

We declare the markers __exception_handler_start_addr__, __exception_handler_end_addr__, __exception_vector_reloc_startaddr__ and __exception_vector_reloc_endaddr__ as extern.
The handlers are copied to the destination right after the point where the copy of the exception vectors stops.

We now get back to the objdump disassembly to analyse the addresses further. We see the PC (program counter) being loaded with the address of a handler function. Take the first entry, at 0x33F00000, which is ldr pc,=do_handle_reset.

The do_handle_reset symbol is located at 0x33F00040. To load this address the assembler has emitted ldr pc, [pc, #24] (#24 is 0x18), which means: load the word stored at PC+24. We face a small dilemma here. The instruction is at 0x33F00000, so if the PC were 0x33F00000 the address after the addition would be 0x33F00018, but the word at 0x33F00018 is 0xe59ff018, which is just another ldr instruction and not the handler address. Why is this?
According to the ARM guide we have the following:

Reading the program counter

When an instruction reads the PC, the value read depends on which instruction set it comes from:

For an ARM instruction, the value read is the address of the instruction plus 8 bytes. Bits [1:0] of this
value are always zero, because ARM instructions are always word-aligned.


Due to this, the PC value read by the LDR instruction is actually 0x33F00000 + 0x8, which is 0x33F00008. Adding #24 (0x18) gives 0x33F00020. The word at that address is 0x33F00040, so 0x33F00040, the address of the function do_handle_reset, is placed in the PC.
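
The same calculation can be done from the instruction word itself. The function below is a small sketch of my own (the name ldr_pc_literal_address is made up); it handles only the ARM-state immediate-offset form with the PC as base register, which is the form seen in the dump.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Address read by an ARM-state "ldr pc, [pc, #imm]" located at
 * instr_addr, given the 32-bit instruction word.
 */
uint32_t ldr_pc_literal_address(uint32_t instr_addr, uint32_t instr)
{
    uint32_t imm12 = instr & 0xFFFu;     /* bits [11:0]: the offset  */
    uint32_t add   = (instr >> 23) & 1u; /* U bit: add or subtract   */
    uint32_t pc    = instr_addr + 8u;    /* PC reads as address + 8  */

    assert(((instr >> 16) & 0xFu) == 0xFu); /* base register must be PC */
    return add ? pc + imm12 : pc - imm12;
}
```

Feeding it the first vector (address 0x33F00000, word 0xe59ff018) gives 0x33F00020, and the last vector at 0x33F0001C gives 0x33F0003C, matching the targets objdump prints next to each instruction.
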

We observe something strange in the disassembly. We see the location 33F00020 has the following in the object dump.

33f00020: 33f00040  mvnscc r0, #64 ; 0x40

What does mvnscc mean? Why do we have instructions there which make no sense? It stumped me for some time, and then I realized that it is just a value placed in memory. The ldr loads that value, the address of do_handle_reset, into the PC, and execution continues from there. Why is it shown as an instruction? Because the disassembler blindly decodes whatever value is present. How did I come to this conclusion? I simply changed the vectors address to 0x32F00000, after which the word was decoded as a different instruction and had the value 0x32Fxxxxx.
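
To see why the address 0x33F00040 happens to look like mvnscc r0, #64, we can pull the word apart using the ARM data-processing instruction layout. This is a sketch of my own; struct dp_fields and decode_data_processing are made-up names and only the fields needed here are extracted.

```c
#include <assert.h>
#include <stdint.h>

struct dp_fields {
    unsigned cond;      /* bits [31:28], 0x3 is CC              */
    unsigned immediate; /* bit 25, 1 means immediate operand    */
    unsigned opcode;    /* bits [24:21], 0xF is MVN             */
    unsigned set_flags; /* bit 20, the "s" suffix               */
    unsigned rd;        /* bits [15:12], destination register   */
    unsigned imm8;      /* bits [7:0], immediate before rotation */
};

struct dp_fields decode_data_processing(uint32_t word)
{
    struct dp_fields f;

    assert(((word >> 26) & 0x3u) == 0u); /* bits [27:26] must be 00 */
    f.cond      = (word >> 28) & 0xFu;
    f.immediate = (word >> 25) & 0x1u;
    f.opcode    = (word >> 21) & 0xFu;
    f.set_flags = (word >> 20) & 0x1u;
    f.rd        = (word >> 12) & 0xFu;
    f.imm8      = word & 0xFFu;
    return f;
}
```

For 0x33F00040 the fields come out as condition CC, opcode MVN with the S bit set, destination r0 and immediate 0x40, which is exactly what the disassembler printed. The upper bits of the address are what select the mnemonic, so moving the vectors region changes the "instruction".
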

Finally, how exactly do I make the CPU jump to the vectors at 0x33F00000 when ARM places the exception vectors at address 0x00000000? I just map the address 0x00000000 to the address 0x33F00000 in the MMU translation table. So when the CPU emits the address 0x00000000 it is translated to 0x33F00000.
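
As a rough sketch of what such a mapping could look like, the code below builds a first-level translation table entry. It assumes a 1MB section mapping with ARMv4/v5 style section descriptors, domain 0, read/write access and caching disabled; the post does not depend on these choices, and the macro and function names are invented for the illustration rather than taken from MDK-OS.

```c
#include <assert.h>
#include <stdint.h>

#define SECTION_SHIFT 20u          /* sections are 1MB                 */
#define DESC_SECTION  0x2u         /* bits [1:0] = 0b10: section entry */
#define DESC_BIT4     (1u << 4)    /* set to 1 on ARMv4/v5 cores       */
#define DESC_AP_RW    (0x3u << 10) /* AP = 0b11: read/write access     */

/* Index of the first-level entry that covers virtual address va. */
uint32_t section_index(uint32_t va)
{
    return va >> SECTION_SHIFT;
}

/* Descriptor mapping a section onto the 1MB block at physical pa. */
uint32_t section_descriptor(uint32_t pa)
{
    assert((pa & 0xFFFFFu) == 0u); /* must be 1MB aligned */
    return pa | DESC_AP_RW | DESC_BIT4 | DESC_SECTION;
}
```

Under these assumptions, entry section_index(0x00000000) of the table would hold section_descriptor(0x33F00000), so a fetch from 0x00000000 lands on the relocated vector table.
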

This concludes the post on linker scripts. I will add to this post if I come across anything interesting, or if I can make things clearer with examples.

Finally I want to conclude with the memory map of the MDK OS.

Memory Map Documentation:
=========================


+-------------------------+ ----> 0x00000000
|                         |       ^
|   Initial bootloader    |       |---> Stepping stone buffer.
|      (mdk_loader)       |       v
+-------------------------+ ----> 0x00001000
|                         |
|                         |
|  Peripheral memory map  |
|          hole           |
.                         .
.                         .
.                         .
+-------------------------+ ----> 0x30000000
|   mdk_os (.text)        |
|         .               |
|         .               |
|   mdk_os (.data)        |
|         .               |
|         .               |
|   mdk_os (.rodata)      |
|         .               |
|         .               |
|   mdk_os (.bss)         |
|         .               |
|         .               |
|   mdk_os (.stack)       |
.                         .
.                         .
.                         .
.                         .
+-------------------------+
+-------------------------+ ----> 0x33F00000
|                         |
| Interrupt Vector table  |
| (section .vector_reloc) |
|                         |
+-------------------------+ ----> 0x33F00040
|                         |
|                         |
| Interrupt handlers      |
| (section .isrhandler)   |
|                         |
|                         |
+-------------------------+ ----> 0x34000000

