The Soul of a Machine: baremetal OS


Wednesday, 27 January 2016

MINI2440 memory address banks and SDRAM setup.

Initially the loader can boot without the SDRAM being set up. This is possible because of the stepping stone controller: it copies the first 4KB of data from the NAND flash into a 4KB internal SRAM called the stepping stone buffer, and execution starts there.

This SRAM is good enough for the loader to load the MDK OS. To do anything serious we need to setup the SDRAM.

My setup:

In my MINI2440 board I have two Samsung K4S561632N SDRAM chips each of size 32MB totalling 64MB of SDRAM.

To proceed further we need to understand the datasheet thoroughly.

From the datasheet we have:

The K4S560432N / K4S560832N / K4S561632N is 268,435,456 bits synchronous high data rate Dynamic RAM organized as 4 x 16,777,216 words by 4 bits / 4 x 8,388,608 words by 8bits / 4 x 4,194,304 words by 16bits.

The SDRAM part-number table from the datasheet is as follows:

Our memory is the K4S561632N, so its organization is 4 x 4,194,304 words by 16 bits (i.e. 16M x 16). The x16 is the data bus width, i.e. 16 bits or 2 bytes, and "word" in the sentence above means one unit of this bus width. Hence the SDRAM outputs a 16-bit "word".

Also, 268,435,456 bits is 256Mb, i.e. 32MB (268,435,456 / 8 = 33,554,432 bytes).

Notice that all the parts have a "4 x" prefix. This is because each of them has 4 internal banks, i.e. they are 4-bank chips.

The organization in the data sheet is as follows for the 3 different memories:

In this case our memory organization is the 16Mx16 with Row Address from A0~A12 and Column Address from A0-A8.

Hence there are 2^13 addressable rows and 2^9 addressable columns, i.e. 2^22 = 4194304 addressable words in each bank. 4 banks x 4194304 words give a total of 16777216 words. Since each word is 16 bits (2 bytes), the chip capacity is 16777216 x 2 bytes = 33554432 bytes, or 32MB.

Similarly we can figure out the numbers for the other 2 memories.

For the K4S560832N:
4 x 8,388,608 words = 33554432 words.
Since it is 8 bits per word or 1 byte per word it is 33554432 x 1 = 33554432 or 32MB.

For the K4S560432N:
4 x 16,777,216 = 67108864 words.
Since it is 4 bits per word or 1/2 a byte per word it is 67108864 x 1/2 = 33554432 or 32MB.
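The same arithmetic for all three parts can be checked with a few lines of C. This is a minimal sketch: the words-per-bank and width figures are the ones from the datasheet sentence quoted above, and the function names are my own.

```c
#include <assert.h>
#include <stdint.h>

/* Capacity in bytes = banks x words-per-bank x (bits-per-word / 8). */
static uint64_t sdram_capacity_bytes(unsigned banks, uint64_t words_per_bank,
                                     unsigned bits_per_word)
{
    return (uint64_t)banks * words_per_bank * bits_per_word / 8;
}

/* Words per bank = 2^(row bits) x 2^(column bits). */
static uint64_t words_per_bank(unsigned row_bits, unsigned col_bits)
{
    return (uint64_t)1 << (row_bits + col_bits);
}
```

For the K4S561632N, `words_per_bank(13, 9)` gives 4194304 and `sdram_capacity_bytes(4, 4194304, 16)` gives 33554432; the x8 and x4 parts come out to the same 32MB.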

Now we come to how these memory chips are wired to our processor. A general diagram of the wiring is below (ASCII art courtesy of Juergen Borleis of Pengutronix, who helped me understand the bank map configuration on the mailing list). It shows two chip selects with a pair of chips on each; on my board a single chip select drives one pair, as described next:

----------+      /CS to bank#2
          |----------------------------------------------------------
          |                                            |            |
S3C2440   |      /CS to bank#1                         |            |
          |------------------------------              |            |
          |                  |          |              |            |
          |             +--------+  +--------+     +--------+   +--------+
          |             | SDRAM1 |  | SDRAM2 |     | SDRAM3 |   | SDRAM4 |
          |             |        |  |        |     |        |   |        |
          |             +--------+  +--------+     +--------+   +--------+
          |            0..15 |          |16..31   0..15|            |16..31
          |                  |          |              |            |
          |----------------------------------------------------------
          |  32 bit databus
          |
----------+

If we go back to the schematic, we find that nGCS6 (net name LLnSCS0) is connected to the nSCS (SDRAM chip select) input of both 32MB chips.

Turning to the S3C2440 data sheet, we see that nGCS6 starts at memory address 0x30000000. Hence our SDRAM starts at 0x30000000. The snapshot of the memory map is below:

According to the data sheet nGCS6 selects Bank 6. Hence the two SDRAM chips are connected to Bank 6, each with a 16-bit data bus, together forming the processor's 32-bit data bus. You can verify this in the schematic snapshot below:

You can see that LDATA0 to LDATA15 go to chip 1 (U6) and LDATA16 to LDATA31 go to chip 2 (U7), forming the 32-bit data bus.

When the processor reads from an address 'A', chip U6 responds with the 16-bit data stored at 'A' on LDATA0 - LDATA15. Since the same address lines are fed to chip U7, it responds at the same time with its own data stored at 'A' on LDATA16 - LDATA31. When a write is done to address 'A', the lower 16 bits of the 32-bit value are stored at 'A' in U6 and the upper 16 bits at 'A' in U7.
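To make the split concrete, here is a sketch of which half of a 32-bit bus word each chip stores. It assumes, as the schematic shows, that U6 sits on LDATA0-LDATA15 and U7 on LDATA16-LDATA31; the function names are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Low halfword travels on LDATA0-15 and is stored by U6. */
static uint16_t u6_halfword(uint32_t bus_word) { return (uint16_t)(bus_word & 0xFFFFu); }

/* High halfword travels on LDATA16-31 and is stored by U7. */
static uint16_t u7_halfword(uint32_t bus_word) { return (uint16_t)(bus_word >> 16); }

/* On a read both chips see the same address; the bus merges their outputs. */
static uint32_t merge_halfwords(uint16_t from_u6, uint16_t from_u7)
{
    return ((uint32_t)from_u7 << 16) | from_u6;
}
```

Writing 0x12345678 therefore leaves 0x5678 in U6 and 0x1234 in U7 at the same SDRAM address.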

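Notice that the SDRAM address pins are connected starting at ADDR2 (LADDR2 goes to the SDRAM's A0). With a 32-bit data bus, each SDRAM address holds one 32-bit word (2 bytes from each chip), so consecutive SDRAM addresses are 4 bytes apart and ADDR0-ADDR1, which only select a byte within the word, are not wired to the SDRAM address pins.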

Notice that the LADDR24 and LADDR25 lines are connected to the BA0 and BA1 inputs respectively. BA0 and BA1 are the chip's bank select pins.

Why were the LADDR24 and LADDR25 pins chosen?

There are 4 banks per chip, so a 2-bit combination is enough to select one of them.
LADDR2 to LADDR23 (22 lines) carry the row and column address (13 + 9 = 22 bits). If all the address bits below LADDR24 are set to 1 we get 0xFFFFFF, i.e. 16777216 bytes (counting from 0), which is the size of one bank across both chips (4194304 words x 4 bytes). Since the SDRAM address starts at LADDR2, shifting right by 2 bits gives 0x3FFFFF, i.e. 4194304 word addresses (counting from 0), which is the size of a single bank. The next word address, 4194304 (byte offset 0x1000000), sets LADDR24 and switches to bank 1. As far as I can see, this is the explanation for bank switching through the addresses themselves: whenever the address bits wired to BA0/BA1 change, a different bank is selected.
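The bank switching can be expressed as a small decoder that splits a physical SDRAM address into bank, row and column. This is a sketch under assumptions: it takes the column as the low 9 bits of the word address (SCAN_9BIT, A0-A8) and the row as the next 13 bits (A0-A12), which matches the 22 lines LADDR2-LADDR23 described above, but I have not checked it against the S3C2440 SDRAM timing diagrams. SDRAM_BASE, the struct and the function are my own names.

```c
#include <assert.h>
#include <stdint.h>

#define SDRAM_BASE 0x30000000u /* start of nGCS6 / bank 6 */

struct sdram_loc {
    unsigned bank; /* BA1:BA0, driven by address bits 25:24 */
    unsigned row;  /* 13 bits, A0-A12 */
    unsigned col;  /* 9 bits, A0-A8 */
};

static struct sdram_loc sdram_decode(uint32_t phys)
{
    uint32_t offset = phys - SDRAM_BASE;
    uint32_t word = offset >> 2;        /* bits 0-1 only pick a byte in the 32-bit word */
    struct sdram_loc loc;
    loc.col  = word & 0x1FFu;           /* byte-address bits 2-10  */
    loc.row  = (word >> 9) & 0x1FFFu;   /* byte-address bits 11-23 */
    loc.bank = (offset >> 24) & 0x3u;   /* byte-address bits 24-25 */
    return loc;
}
```

With this model, 0x30FFFFFC is the last word of bank 0 (row 0x1FFF, column 0x1FF) and 0x31000000 is the first word of bank 1.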

Register setup:

We finally come to source code for the SDRAM setup.

First we need to configure the Bus Width & Wait Control register (BWSCON).

The code is as follows:

void config_bwscon()
{

 /* Configure BWSCON */
 writereg32(BWSCON_REG(MEM_BA),
   DW7_RESERVED|DW6_32b|DW5_RESERVED|DW4_RESERVED|
   DW3_RESERVED|DW2_RESERVED|DW1_RESERVED);
}

Here the DW6 parameter is set to DW6_32b i.e. bus width as 32 bit.
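The position of the DW6 field can be checked with a small sketch. It assumes the S3C2440 BWSCON layout where each bank n (1-7) has a 4-bit group starting at bit 4*n, with DWn in its low two bits (00 = 8-bit, 01 = 16-bit, 10 = 32-bit). The macro and function names below are illustrative and may not match the ones in the MDK OS headers.

```c
#include <assert.h>
#include <stdint.h>

#define BWSCON_DW_SHIFT(n) (4u * (n))  /* DWn occupies bits [4n+1:4n] */
#define DW_8BIT  0u
#define DW_16BIT 1u
#define DW_32BIT 2u

/* DWn field value for one bank; WSn/STn bits are left at 0. */
static uint32_t bwscon_dw(unsigned bank, uint32_t width)
{
    assert(bank >= 1 && bank <= 7);    /* DW0 is read-only (set by OM pins) */
    return width << BWSCON_DW_SHIFT(bank);
}
```

Under these assumptions `bwscon_dw(6, DW_32BIT)` is 0x02000000, i.e. bits [25:24] = 10.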

My SDRAM init is as follows:

void sdram_init()
{

 config_bwscon();

 /* Configure BANKCON6/BANKCON7 */
 /* Trcd (bits [3:2]) is left at 00, i.e. 2 clocks */
 writereg32(BANKCON6_REG(MEM_BA),MT_SYNC_DRAM|SCAN_9BIT);

 /* 
  * Set BANKCON7 to ROM/SRAM i.e 00 and not SYNC_DRAM.
  * Rest of the values should not be used as they are reserved
  * Default value is SYNC_DRAM which should not be used.
  */
 writereg32(BANKCON7_REG(MEM_BA),MT_ROM_SRAM);

 /* Configure SDRAM Refresh settings.
  * TREFMD (0 = auto refresh) and Trp (00 = 2 clocks) are left at 0. */
 writereg32(REFRESHCTL_REG(MEM_BA),REFEN|Tsrc_5|1269);

 /* Configure Banksize setting */
 writereg32(BANKSIZE_REG(MEM_BA),BURST_EN|SCKE_EN|SCLK_EN|BK76MAP_64MB);

 /* Configure mode set register for BANK6 */
 writereg32(MRSRB6_REG(MEM_BA),CAS_LATENCY_2CLK);

 return;
}

I set BANKCON6 to MT_SYNC_DRAM since the memory is SDRAM (synchronous DRAM). For SDRAM, the SCAN field gives the number of column address bits; I set it to SCAN_9BIT because the columns are A0-A8, i.e. 9 bits.

Bank 7 has to be disabled. Its reset default is MT_SYNC_DRAM, which must not be used here, so I set BANKCON7 to MT_ROM_SRAM.
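For reference, the two BANKCON writes can be reproduced from the bit layout. This sketch assumes the S3C2440 BANKCON6/7 fields MT at bits [16:15] (00 = ROM/SRAM, 11 = sync DRAM) and, for sync DRAM, Trcd at [3:2] (00 = 2, 01 = 3, 10 = 4 clocks) and SCAN at [1:0] (00 = 8, 01 = 9, 10 = 10 bits). The macro names mirror the post, but their definitions here are my own.

```c
#include <assert.h>
#include <stdint.h>

#define MT_ROM_SRAM  (0u << 15)
#define MT_SYNC_DRAM (3u << 15)
#define TRCD_2CLK    (0u << 2)
#define TRCD_3CLK    (1u << 2)
#define SCAN_8BIT    0u
#define SCAN_9BIT    1u
#define SCAN_10BIT   2u

/* BANKCON6/7 value for an SDRAM bank. */
static uint32_t bankcon_sdram(uint32_t trcd, uint32_t scan)
{
    return MT_SYNC_DRAM | trcd | scan;
}
```

With these definitions the BANKCON6 value in sdram_init() is 0x18001, and BANKCON7 (MT_ROM_SRAM) is 0.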

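Next is Trcd, the RAS to CAS delay. In the SDRAM datasheet Trcd(min) is 20ns. Our HCLK is set to about 101MHz, a period of about 9.9ns, which I round to 10ns; that makes Trcd 2 clocks, encoded as 00. Since 00 is what the BANKCON6 write leaves in that field, no explicit macro is needed. Strictly, 2 x 9.9ns is about 19.8ns, slightly under 20ns, so 3 clocks (01) would be the conservative choice.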
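The nanosecond-to-clock conversion is easy to get wrong by rounding, so here is a sketch that always rounds up. The HCLK values are examples: at exactly 100MHz the 20ns tRCD needs 2 clocks, while at 101MHz it needs 3. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest whole number of HCLK cycles covering t_ns nanoseconds. */
static uint32_t ns_to_cycles(uint32_t t_ns, uint32_t hclk_hz)
{
    uint64_t scaled = (uint64_t)t_ns * hclk_hz;   /* scaled / 1e9 = cycles */
    return (uint32_t)((scaled + 999999999u) / 1000000000u);
}
```

The same helper gives 5 clocks for the 45ns Tsrc used below, at both 100MHz and 101MHz.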

Refresh control register (REFRESH):
We set REFEN, which enables SDRAM refresh.
We leave TREFMD at 0, which selects CBR/auto refresh (1 would select self refresh).
We leave the SDRAM RAS pre-charge time Trp at 2 clocks, i.e. value 00, as the data sheet gives tRP(min) as 20ns, or 2 clock cycles.
We set the SDRAM semi row cycle time Tsrc to Tsrc_5. The calculation is as follows:

Trc = Tsrc + Trp
or
Tsrc = Trc - Trp

From the data sheet we have Trc = 65ns and Trp = 20ns, hence Tsrc = 45ns. Rounding up to whole clocks gives Tsrc_5, i.e. 5 clocks or 50ns.

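I set the refresh counter to 1269, as in the S3C2440 data sheet example: for a 7.8us refresh period at HCLK = 100MHz, refresh count = 2^11 + 1 - 100 x 7.8 = 1269. The 7.8us comes from the SDRAM needing 8192 refresh cycles every 64ms. At our 101MHz the same count gives (2^11 + 1 - 1269) / 101MHz = 780 / 101MHz, about 7.72us, which is still within 7.8us.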
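The refresh count and the full REFRESH value can be computed the same way. This sketch assumes the S3C2440 REFRESH layout (REFEN bit 23, TREFMD bit 22, Trp bits [21:20] with 00 = 2 clocks, Tsrc bits [19:18] with 00 = 4 clocks, counter bits [10:0]) and the datasheet formula refresh period = (2^11 - count + 1) / HCLK. The names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define REFEN       (1u << 23)   /* enable SDRAM refresh */
#define TREFMD_AUTO (0u << 22)   /* 0 = CBR/auto refresh, 1 = self refresh */
#define TRP_2CLK    (0u << 20)
#define TSRC(clk)   ((uint32_t)((clk) - 4u) << 18)  /* 4..7 clocks -> 00..11 */

/* count = 2^11 + 1 - HCLK x period; truncating keeps the period <= target. */
static uint32_t refresh_count(uint32_t period_ns, uint32_t hclk_hz)
{
    uint32_t cycles = (uint32_t)((uint64_t)hclk_hz * period_ns / 1000000000u);
    return 2048u + 1u - cycles;
}

static uint32_t refresh_value(uint32_t tsrc_clk, uint32_t count)
{
    assert(tsrc_clk >= 4 && tsrc_clk <= 7);
    assert(count <= 0x7FFu);
    return REFEN | TREFMD_AUTO | TRP_2CLK | TSRC(tsrc_clk) | count;
}
```

`refresh_count(7800, 100000000)` reproduces the datasheet's 1269, and `refresh_value(5, 1269)` is 0x008404F5.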

Banksize register settings (BANKSIZE):
Here I enable BURST_EN (ARM core burst operation enable), SCKE_EN (SDRAM power down mode enable via SCKE) and SCLK_EN (SCLK is driven only during SDRAM access cycles, to reduce power consumption), and set BK76MAP to 001, i.e. 64MB, as the size of the memory is 32MiB + 32MiB = 64MiB.
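Putting the BANKSIZE bits together gives 0xB1. This sketch assumes the S3C2440 layout (BURST_EN bit 7, SCKE_EN bit 5, SCLK_EN bit 4, BK76MAP bits [2:0]); the BK76MAP encoding table is from my reading of the datasheet, and the function names are mine.

```c
#include <assert.h>
#include <stdint.h>

#define BURST_EN (1u << 7)
#define SCKE_EN  (1u << 5)
#define SCLK_EN  (1u << 4)

/* BK76MAP encoding for the size of bank 6 (and bank 7); -1 if unsupported. */
static int bk76map(uint32_t size_mb)
{
    switch (size_mb) {
    case 128: return 2;
    case 64:  return 1;
    case 32:  return 0;
    case 16:  return 7;
    case 8:   return 6;
    case 4:   return 5;
    case 2:   return 4;
    default:  return -1;
    }
}

static uint32_t banksize_value(uint32_t size_mb)
{
    int map = bk76map(size_mb);
    assert(map >= 0);
    return BURST_EN | SCKE_EN | SCLK_EN | (uint32_t)map;
}
```

With a 64MB bank 6, the SDRAM spans 0x30000000 to 0x33FFFFFF.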

SDRAM Mode register set register (MRSR):
We simply set the CL parameter or the CAS Latency to 2 clocks. According to the data sheet the CAS latency is 2.
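As a sketch, the MRSR value for CAS latency 2 can be built assuming the S3C2440 MRSRB6 layout as I read it: BL [2:0], BT bit 3, CL [6:4] (010 = 2 clocks, 011 = 3 clocks), TM [8:7] and WBL bit 9, with everything except CL fixed at 0. The function name is mine.

```c
#include <assert.h>
#include <stdint.h>

#define CL_SHIFT 4u

/* Only CL is variable; BL, BT, TM and WBL stay 0. */
static uint32_t mrsr_value(uint32_t cas_latency)
{
    assert(cas_latency == 2 || cas_latency == 3);
    return cas_latency << CL_SHIFT;
}
```

So CAS_LATENCY_2CLK would correspond to 0x20 under these assumptions.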

A note on the memory controller bank select:

The S3C2440 has 8 memory banks. Each bank's general chip select, nGCSn, should be connected to the chip select of the device (memory or peripheral) that occupies that bank's address space.
The memory controller asserts a bank's chip select automatically whenever the address being accessed falls within that bank's region, so there is no need to drive a chip select manually before accessing a memory region. This also means the same address lines can be shared by chips in different banks: when an address is generated, only the chip in the matching region is selected by its bank chip select signal. I will verify this and provide an oscilloscope trace.
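The automatic chip select can be modelled as an address decoder. This is a sketch assuming the S3C2440 map where banks 0-5 are fixed 128MB windows starting at 0x00000000, bank 6 starts at 0x30000000 and bank 7 follows it at 0x30000000 + the BK76MAP size; it ignores the boot-mode aliasing of bank 0 and the internal register area. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define BANK_WINDOW 0x08000000u  /* 128MB per bank for banks 0-5 */
#define BANK6_BASE  0x30000000u

/* Return n for the nGCSn asserted by addr, or -1 if no bank matches.
 * bank67_size is the BK76MAP size in bytes (e.g. 64MB). */
static int ngcs_for_address(uint32_t addr, uint32_t bank67_size)
{
    if (addr < BANK6_BASE)
        return (int)(addr / BANK_WINDOW);            /* banks 0..5 */
    if (addr - BANK6_BASE < bank67_size)
        return 6;
    if (addr - BANK6_BASE < 2u * bank67_size)
        return 7;
    return -1;
}
```

With a 64MB BK76MAP, every address from 0x30000000 to 0x33FFFFFF asserts nGCS6, which is exactly the SDRAM.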

For reference from the data sheet:

Conclusion:
We do all the SDRAM setup in the loader itself as the MDK OS is loaded onto the SDRAM.

References: http://thread.gmane.org/gmane.comp.embedded.ptxdist.oselas.community/1994/focus=2010

Schematics from FriendlyARM.
Data sheet snapshots from Samsung S3C2440 data sheet.
Memory organization snapshots from the Samsung K4S561632N data sheet.

It is better to do the right problem the wrong way than the wrong problem the right way. --Richard Hamming

Wednesday, 23 December 2015

MDK OS update on file systems.

Currently I will be working in parallel on implementing the FAT32 file system. Initially work will be done to just read the file system. Write and Modify will come later.

Now what is the need for implementing it from scratch?
It is to learn how FAT32 is implemented. I will be writing posts on different aspects of the file system and the design decisions taken. Hopefully this will act as a reference for people implementing their own, or help them debug their systems.

The development will be done outside the MDK OS tree. This is mainly for the ease of development and debugging. Later on it will be integrated with the MDK OS.

Friday, 18 December 2015

Interrupt handling in the MDK OS.

In ARM processors the memory address 0x00000000 is reserved for the exception vector table, which is a set of 32-bit words. When an exception occurs the processor suspends normal execution and fetches the next instruction from the corresponding entry in the vector table. Each entry usually contains a branch instruction to a particular routine.

The interrupt vector table is as follows:

Vector	Address
Reset	0x00000000
Undefined	0x00000004
SWI	0x00000008
PABT	0x0000000C
DABT	0x00000010
Reserved	0x00000014
IRQ	0x00000018
FIQ	0x0000001C

In the S3C2440 after a power on reset the initial 4KB of the NAND flash memory will be loaded onto an internal boot SRAM called the "stepping stone" buffer and the boot code present in this memory address will be executed. The loader is flashed onto the NAND flash using supervivi.

Interrupt handling in loader:

The "stepping stone" buffer SRAM memory map address is located at 0x00000000. Hence our MDK loader gets executed from there. The MDK loader has the following code at the start:

.section .text
.code 32
.globl vectors

vectors:
 b reset  /* Reset */
 b fault_state  /* Undefined instruction */
 b fault_state /* Software Interrupt */
 b fault_state  /* Abort prefetch */
 b fault_state  /* Abort data */
 b .  /* Reserved */
 b fault_state /* IRQ */
 b fault_state /* FIQ */

The code is placed in the .text section, whose addresses are generated starting from 0x00000000. The relevant fragment of the linker script is below:

MEMORY
{
 sram : org = 0x00000000 , len = 0x1000
 sdram : org = 0x30000000 , len = 0x4000000
}

SECTIONS
{
 .text :
 {
  *(.text);
  . = ALIGN(4);
 } > sram

As shown above, the .text section is placed in the sram region, which starts at 0x00000000 and is 0x1000 (4096) bytes, i.e. 4KB, long.

Notice that my reset vector contains a branch to the reset label. The reset code fragment is as follows:

reset:
 /* Start by clearing bss section */
 ldr r1, bss_start
 ldr r2, bss_end
 mov r3, #0

clear_bss:
 cmp r1,r2
 strlo r3,[r1],#4  @Store zero only while r1 < r2, then r1 += 4
 blo clear_bss

 /* load r13 i.e. stack pointer with stack_pointer */
 ldr r13,stack_pointer

 bl main

Here I load bss_start and bss_end, whose values come from the linker script. In clear_bss I compare r1 (the current address, starting at bss_start) with r2 (bss_end). While r1 is below r2, I store r3 (zero) at the address in r1, post-increment r1 by 4 and loop. The lo (unsigned lower) condition makes sure the loop never writes at or past bss_end, even if bss_end is not word aligned. Once r1 reaches r2, I load the stack pointer into r13 and branch to main, which is the main() function in os_main.c.
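For readers more comfortable in C, the bss clearing loop amounts to the following sketch. It stops as soon as the pointer reaches or passes the end, so an unaligned end address cannot make it run away. The function name is mine; in the loader the bounds come from __bss_start__ and __bss_end__.

```c
#include <assert.h>
#include <stdint.h>

/* Zero every 32-bit word in [start, end). */
static void clear_bss(uint32_t *start, uint32_t *end)
{
    uint32_t *p = start;
    while (p < end)      /* cmp + blo in assembly */
        *p++ = 0;
}
```

An empty range (start == end) writes nothing, and the word at end is never touched.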

Where have I got the stack_pointer,bss_start and bss_end variables from?

The code fragment below explains:


stack_pointer: .word __stack_top__
bss_start : .word __bss_start__
bss_end : .word __bss_end__

Where did the __stack_top__,__bss_start__ and __bss_end__ come from?

The linker script code explains:

SECTIONS
{
 .text :
 {
  *(.text);
  . = ALIGN(4);
 } > sram

 .data :
 {
  __data_start__ = .;
  *(.data)
  . = ALIGN(4);
  __data_end__ = .;
 } > sram

 .bss  :
 {
  __bss_start__ = .;
  *(.bss); *(COMMON)
  __bss_end__ = .;

  __stack_bottom__ = .;
  . += 0x300;
  __stack_top__ = .;

 } > sram

Notice that linker script symbols have global visibility, so we can take the generated addresses and use them in our code. There are 0x300 (768) bytes between __stack_bottom__ and __stack_top__. We load __stack_top__ into r13 (SP) because the stack is full descending: it grows towards lower addresses.
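To see why __stack_top__ rather than __stack_bottom__ goes into r13, here is a small model of a full descending stack, the convention used by stmfd/ldmfd: sp points at the last pushed item and is decremented before each store. The array stands in for the 0x300-byte region; all names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Full descending push: decrement first, then store (like STMFD). */
static void push(uint32_t **sp, uint32_t value)
{
    *--(*sp) = value;
}

/* Full descending pop: load first, then increment (like LDMFD). */
static uint32_t pop(uint32_t **sp)
{
    return *(*sp)++;
}
```

Starting with sp one past the end of the region, the first push lands in the highest word, so the stack never touches memory above __stack_top__.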

We are not handling any other exceptions in the loader, so if one occurs we simply jump to a fault state, as shown below:

fault_state:
 ldr r3,GPBCON
 ldr r4,GPBDAT
 ldr r5,GPBUP

 ldr r6,=0x15400
 str r6,[r3]  @Set to output
 ldr r6,=0x00
 str r6,[r4]  @Set the led
 ldr r6,=0x1E0
 str r6,[r5]  @Disable pullup 

 b .

I set up the LEDs to light so that I can tell I am in the fault state.
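The magic numbers in fault_state follow from the GPIO register layout. This sketch assumes the LEDs are on GPB5-GPB8 (the pins that 0x1E0 selects), that GPBCON uses a 2-bit field per pin with 01 = output, and that GPBUP uses 1 bit per pin with 1 = pull-up disabled; the function names are mine.

```c
#include <assert.h>
#include <stdint.h>

/* GPBCON value making pins first..last outputs (01 per pin). */
static uint32_t gpbcon_outputs(unsigned first_pin, unsigned last_pin)
{
    uint32_t v = 0;
    for (unsigned pin = first_pin; pin <= last_pin; pin++)
        v |= 1u << (2u * pin);
    return v;
}

/* GPBUP value disabling the pull-ups on pins first..last. */
static uint32_t gpbup_disable(unsigned first_pin, unsigned last_pin)
{
    uint32_t v = 0;
    for (unsigned pin = first_pin; pin <= last_pin; pin++)
        v |= 1u << pin;
    return v;
}
```

For pins 5 to 8 these give 0x15400 and 0x1E0, the values written in fault_state. Writing 0 to GPBDAT then drives the pins low, which lights the LEDs if they are active low.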

This completes interrupt handling in the loader after a Power on Reset. Next we will see how we will handle this in the MDK OS.

Interrupt handling in MDK OS:

In the MDK OS interrupt handling is done differently. Using the initial vectors to jump to a particular interrupt handling routine causes several problems. The first is that a routine placed in the SDRAM at 0x30000000 is too far away for a jump: a b instruction only reaches about +/-32MB from where it sits, while 0x30000000 is 768MB from the vector table.
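The reach limit of a plain b can be checked numerically. An ARM b instruction stores a signed 24-bit word offset relative to pc + 8, which gives roughly +/-32MB. The sketch below reports whether a target is reachable and encodes an unconditional branch; the function names are my own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Can "b target" placed at address 'from' reach 'target'? */
static bool b_reachable(uint32_t from, uint32_t target)
{
    int64_t offset = (int64_t)target - ((int64_t)from + 8);  /* pc reads as from + 8 */
    return (offset % 4 == 0) && offset >= -(1LL << 25) && offset < (1LL << 25);
}

/* Encode "b target" (condition AL); assumes b_reachable(from, target). */
static uint32_t encode_b(uint32_t from, uint32_t target)
{
    uint32_t offset = target - (from + 8u);                   /* wraps for backward branches */
    return 0xEA000000u | ((offset >> 2) & 0x00FFFFFFu);
}
```

A branch from 0x00000000 to 0x30000000 is out of range, whereas "b ." at address 0 encodes as 0xEAFFFFFE. This is why the MDK OS vectors use ldr pc,=handler, which loads a full 32-bit address from a literal pool.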

So how did I fix this? I enabled the MMU and mapped virtual address 0x00000000 to EXCEPTION_INTERRUPT_VECTOR_TABLE_START, which is presently hard coded to 0x33F00000. Now whenever the processor jumps to 0x00000000, the MMU translates the address to 0x33F00000 and the processor executes the code at that address.
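A 1MB section mapping is the simplest way to express this remap on the ARM920T. The sketch below builds a first-level section descriptor in the ARMv4/v5 format (bits [1:0] = 10, B bit 2, C bit 3, bit 4 set as the ARM920T expects, domain [8:5], AP [11:10], section base [31:20]). It is not the MDK OS mmu.c, which is not shown here; the uncached read/write attributes are assumptions, and a real table must also be 16KB aligned and installed through CP15.

```c
#include <assert.h>
#include <stdint.h>

#define SECTION_TYPE 0x2u          /* bits [1:0] = 10: section */
#define SECTION_BIT4 (1u << 4)     /* should-be-one on ARM920T */
#define AP_RW_ALL    (3u << 10)    /* read/write in all modes */
#define DOMAIN(d)    ((uint32_t)(d) << 5)

/* Index into the 4096-entry first-level table for a virtual address. */
static uint32_t l1_index(uint32_t va) { return va >> 20; }

/* Uncached, unbuffered descriptor mapping the 1MB section at pa. */
static uint32_t section_desc(uint32_t pa, unsigned domain)
{
    return (pa & 0xFFF00000u) | AP_RW_ALL | DOMAIN(domain) | SECTION_BIT4 | SECTION_TYPE;
}

/* Make the 1MB section containing va translate to pa. */
static void map_section(uint32_t *l1_table, uint32_t va, uint32_t pa)
{
    l1_table[l1_index(va)] = section_desc(pa, 0);
}
```

Mapping virtual 0x00000000 to physical 0x33F00000 writes 0x33F00C12 into entry 0 of the table.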

So how is the implementation done?

First we visit the code where the exception vectors are written(os_vectors.s).

.section .vector_reloc,"ax" //Apparent fix for missing section when objcopy is to have allocatable and executable flags-"ax"

.code 32

.globl exception_vectors

exception_vectors:
 ldr pc,=do_handle_reset  //Reset vector
 ldr pc,=do_handle_undef  //Undefined instruction
 ldr pc,=do_handle_swi   //Software Interrupt
 ldr pc,=do_handle_pabt   //Abort prefetch
 ldr pc,=do_handle_dabt  //Abort data
 ldr pc,=do_handle_reserved //Reserved
 ldr pc,=do_handle_irq  //IRQ
 ldr pc,=do_handle_fiq  //FIQ
.end

The code is put at section .vector_reloc (intuitive name vector relocation).

The exception vector code by itself is very simple. It just loads the PC (Program Counter) register with the different exception handlers.

Which address is this code linked at? EXCEPTION_INTERRUPT_VECTOR_TABLE_START.

How is the above address generation determined? We have to look at the linker script of the MDK OS(mdkos.lds).

First we look at the memory regions:

MEMORY
{
 sram : org = 0x00000000 , len = 0x1000
 /*sdram : org = 0x30000000 , len = 0x4000000*/
 sdram : org = 0x30000000 , len = 0x3F00000 /* 63MB RAM */
 vectors : org = 0x33F00000 , len = 0x100000 /* Last 1MB for the isr handlers */
}

I have defined vectors region starting at 0x33F00000.

Next we see the sections:

SECTIONS
{

 .text :
 {
  *(.text);
  . = ALIGN(4);
 } > sdram


 .data :
 {
  __data_start__ = .;
  *(.data);
  . = ALIGN(4);
  __data_end__ = .;
 } > sdram

/*
 * Note on constant string bug (related to .rodata):
 * Initially, printing a string constant would make the device loop,
 * printing nonsense. This was because the .rodata section was left out.
 * As a result the constants' addresses were emitted after the interrupt
 * vectors, while the constants themselves sat somewhere in the middle of
 * the file (after the stack setup). All the functions which referred to a
 * string used the address emitted at the end of the isr handlers, but the
 * string was sitting well before that. It would have worked if, after
 * startup, the strings were moved to the address at the end of the isr
 * handlers. Instead of doing this we create a .rodata section and put it
 * in RAM. We must also make sure nothing overwrites this read only section.
 * Later we can write .rodata to, say, flash, lock it against writes and
 * only read from it.
 */
 .rodata :
 {
  __rodata_start__ = .;
  *(.rodata);
  . = ALIGN(4);
  __rodata_end__ = .;
 } > sdram

 .bss  :
 {
  __bss_start__ = .;
  *(.bss); *(COMMON)
  __bss_end__ = .;

  __usr_sys_stack_bottom__ = .;
  . += 0x1000;
  __usr_sys_stack_top__ = .;

  __irq_stack_bottom__ = .;
  . += 0x1000;
  __irq_stack_top__ = .;

  __fiq_stack_bottom__ = .;
  . += 0x1000;
  __fiq_stack_top__ = .;

  __svc_stack_bottom__ = .;
  . += 0x1000;
  __svc_stack_top__ = .;
 } > sdram

 
 .vector_reloc :
 {
  *(.vector_reloc);
 } >vectors AT>sdram 

 /* Get the lma address for the particular section */
 __exception_vector_reloc_startaddr__ = LOADADDR(.vector_reloc);
 __exception_vector_reloc_endaddr__ = LOADADDR(.vector_reloc) + SIZEOF(.vector_reloc);

 /* 
     * Above SDRAM is where it will be stored in the file but address
     * references will be in the addresses of the isr handler section
  */


 .isrhandler :
 {
  *(.isrhandler);
 } >vectors AT>sdram

 __exception_handler_start_addr__ = LOADADDR(.isrhandler);
 __exception_handler_end_addr__ = LOADADDR(.isrhandler) +  SIZEOF(.isrhandler);


 /* 
  * >vma region AT > lma region 
  */

 /* 
  * eg: .data section is linked with LMA in ROM and
  * the VMA pointing to the real RAM versions
  */

 .stab 0 (NOLOAD) : 
 {
  [ .stab ]
 }

 .stabstr 0 (NOLOAD) :
 {
  [ .stabstr ]
 }
}

In the .vector_reloc output section I tell the linker to generate addresses in the range defined by the vectors region, i.e. from 0x33F00000. This is the VMA (virtual memory address) region.

Now how do I know where the code is loaded?
The code is loaded by the loader at 0x30000000, the start of the SDRAM, and the .vector_reloc contents are placed after the .bss section. __exception_vector_reloc_startaddr__ and __exception_vector_reloc_endaddr__ hold the start and end of the exception vector section in that loaded image, so after loading the code sits somewhere at 0x30XXXXXX. This is the LMA (load memory address) region. The code then has to be copied from this region to EXCEPTION_INTERRUPT_VECTOR_TABLE_START (0x33F00000).

This code is copied as follows (setup_interrupt_vector_table() in os/mmu.c):

static void setup_interrupt_vector_table(void)
{
 /* Destination: the VMA of the vectors, EXCEPTION_INTERRUPT_VECTOR_TABLE_START (0x33F00000) */
 char *vector_table = (char *)EXCEPTION_INTERRUPT_VECTOR_TABLE_START;
 const char *src;

 /* 
  * Need to get the lma of the code.
  * The __exception_vector_reloc_startaddr__ is the lma i.e. the address
  * where the section sits in the loaded image. Copy the vectors from
  * there to the vector table.
  */
 for (src = (const char *)__exception_vector_reloc_startaddr__;
      src < (const char *)__exception_vector_reloc_endaddr__;
      src++) {
  *vector_table++ = *src;
 }

 /*
  * .isrhandler is linked right after .vector_reloc in the vectors region,
  * so keep the same destination pointer and copy the handlers from
  * their own lma.
  */
 for (src = (const char *)__exception_handler_start_addr__;
      src < (const char *)__exception_handler_end_addr__;
      src++) {
  *vector_table++ = *src;
 }
}

The first loop copies the contents from the .vector_reloc start address to its end address into vector_table, which points to EXCEPTION_INTERRUPT_VECTOR_TABLE_START, i.e. 0x33F00000.

After that, the second loop copies the interrupt handlers, which are placed right after the exception vectors.

The interrupt service handlers are placed in file exception_handler.s under the section .isrhandler

The code fragment is as follows:

.section .isrhandler,"ax"


.code 32


.globl do_handle_reset
do_handle_reset:
 b do_handle_reset

.globl do_handle_undef
do_handle_undef:
 b do_handle_undef

.globl do_handle_swi
do_handle_swi:
 b do_handle_swi

.globl do_handle_pabt
do_handle_pabt:
 b do_handle_pabt

.globl do_handle_dabt
do_handle_dabt:
 b do_handle_dabt

.globl do_handle_reserved
do_handle_reserved:
 b do_handle_reserved


.globl do_handle_irq
do_handle_irq:
 sub lr,lr,#4 @Subtract r14(lr) by 4.
 stmfd sp!, {r0-r12,lr} @Save r0-r12 and lr. 
       @sp! indicates sp will be subtracted by the sizes of the registers saved.
       @Instruction details can be read in ARM System Developers guide book at Pg 65.
 /*
  * Note on disabling and enabling CPU IRQ.
  * ======================================
  * There is no need to disable IRQ when in IRQ mode. When there is 
  * an interrupt the processor switches to IRQ mode with the I bit 
  * set, which means IRQs are masked.
  *
  * It was tested by printing the cpsr_irq which had the value
  * 0x60000092. Bit 7 (the I bit) is set, which means IRQs are masked.
  *
  * This is the same case with the FIQ.
  */
 
 ldr r2,INTOFFSET    @Load the INTOFFSET value into r2
 ldr r2,[r2]      @Load the value in the address to r2
 

 ldr r3,=interrupt_handler_jmp_table @Load the address of the interrupt handler jump table.

 mov lr,pc
 ldr pc,[r3,r2,LSL #2] @Load the value which is the interrupt handler jmp table.


// bl handle_irq

 //Clear interrupt source pending

 ldr r2,INTOFFSET    @Load the INTOFFSET value into r2
 ldr r2,[r2]      @Load the value in the address to r2

 mov r3,#1    @move 1 to r3.
 mov r3,r3, LSL r2   @Shift left by INTOFFSET and store it in r3
 
 ldr r4,SRCPND
 str r3,[r4]   @Store the value of r3 in r4 address

 ldr r4,INTPND
 str r3,[r4]   @Store the value of r3 in r4 address

 
 
 ldmfd sp!, {r0-r12,pc}^  @Restore r0-r12 from the stack and load the saved lr into pc.
       @The ^ indicates the spsr is copied to cpsr. The cpsr was copied to spsr
       @when the interrupt was generated.
       @The restoration of CPSR will change the mode to whatever mode was
       @present before the interrupt was called.
 

.globl do_handle_fiq
do_handle_fiq:
  b do_handle_fiq

For example, the ldr pc,=do_handle_irq entry in os_vectors.s jumps to do_handle_irq, which is implemented in exception_handler.s.

This concludes the memory juggling needed to execute the interrupts.

Handling of various IRQ's:

To receive interrupts you have to unmask IRQ and FIQ globally by clearing the I and F bits in the CPSR. This is done as follows:

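static void enable_irq_fiq(void)
{
 uint32_t cpsr_val = 0;

 __asm__ __volatile__ (
  "mrs r0,cpsr\n\t"   /* Copy CPSR to r0 */
  "bic r0,r0,#0xC0\n\t"  /* Clear I (bit 7) and F (bit 6): unmask IRQ and FIQ */
  "msr cpsr,r0\n\t"   /* Copy modified value to cpsr */
  "mov %0,r0\n\t"   /* Keep the new CPSR value for debugging */
  : [cpsr_val]"=r"(cpsr_val) /* Output: new CPSR value */
  : /* No input */
  : "r0", "memory" /* r0 is clobbered; "memory" stops the compiler moving accesses across this */
 );

 //print_hex_uart(UART0_BA,cpsr_val);
 (void)cpsr_val; /* Silence unused-variable warnings while the print is disabled */
}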

An optimization would be to rewrite as a macro.
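What the bic does to the CPSR, and what the 0x60000092 value quoted in the IRQ handler comment means, can be checked with plain bit arithmetic. This sketch assumes the standard ARM CPSR layout (mode in bits [4:0], F = bit 6, I = bit 7); the helper names are mine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPSR_MODE_MASK 0x1Fu
#define CPSR_F_BIT     (1u << 6)   /* 1 = FIQ masked */
#define CPSR_I_BIT     (1u << 7)   /* 1 = IRQ masked */
#define MODE_IRQ       0x12u
#define MODE_SVC       0x13u

static bool irq_masked(uint32_t cpsr) { return (cpsr & CPSR_I_BIT) != 0; }
static bool fiq_masked(uint32_t cpsr) { return (cpsr & CPSR_F_BIT) != 0; }
static uint32_t cpsr_mode(uint32_t cpsr) { return cpsr & CPSR_MODE_MASK; }

/* Same operation as "bic r0,r0,#0xC0". */
static uint32_t unmask_irq_fiq(uint32_t cpsr)
{
    return cpsr & ~(CPSR_I_BIT | CPSR_F_BIT);
}
```

0x60000092 decodes as IRQ mode with IRQs masked and FIQs not masked, while an SVC-mode value of 0x600000D3 becomes 0x60000013 after the bic.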

Next we go to the actual handling of the IRQ exception, in do_handle_irq in exception_handler.s (listed above).

Before explaining the rest of the code, the first line (sub lr,lr,#4) needs some explanation.
When an exception occurs the link register is set to a specific address based on the current pc. When an IRQ exception is raised the link register lr points to the last executed instruction plus 8. Care has to be taken to make sure the exception handler does not corrupt the lr because lr is used to return from an exception handler. The IRQ exception is taken only after the current instruction is executed, so the return address has to point to the next instruction i.e. lr-4.

The following table gives the address to return to for each exception, in terms of the lr on entry:

Exception	Return address
Reset	n/a
Undefined	lr
SWI	lr
PABT	lr-4
DABT	lr-8
Reserved	n/a
IRQ	lr-4
FIQ	lr-4

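Next we save r0-r12 and the adjusted lr on the IRQ stack.
Next we read the interrupt offset from the INTOFFSET register and load the program counter with the handler address stored at that index in interrupt_handler_jmp_table (the offset is shifted left by 2, since each entry is 4 bytes). The mov lr,pc just before it makes the handler return to the instruction after the ldr.

The remaining code cleans up the interrupt by writing a 1 to the corresponding bit of the source pending (SRCPND) and interrupt pending (INTPND) registers, which clears that bit. Finally we restore r0-r12 from the stack and load the saved lr into pc to continue where we left off; the ^ suffix also copies SPSR back to CPSR, restoring the previous mode.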

Note on the jump tables:

There are 2 jump tables: interrupt_handler_jmp_table and external_interrupt_handler_jmp_table. Both are arrays of function pointers of type void (*handler)(void).
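The dispatch and clean-up steps of do_handle_irq can be modelled in C. This is a sketch, not the MDK OS code: the struct stands in for the INTOFFSET, SRCPND and INTPND registers, write-1-to-clear is emulated in software, and the handler names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef void (*irq_handler_t)(void);

/* Stand-ins for the memory-mapped interrupt controller registers. */
struct intc {
    uint32_t intoffset;  /* index of the pending source being serviced */
    uint32_t srcpnd;
    uint32_t intpnd;
};

/* SRCPND and INTPND clear the bits that are written as 1. */
static void write_1_to_clear(uint32_t *reg, uint32_t value) { *reg &= ~value; }

static void dispatch_irq(struct intc *ic, irq_handler_t table[32])
{
    uint32_t idx = ic->intoffset;              /* ldr r2,INTOFFSET; ldr r2,[r2] */
    table[idx]();                              /* ldr pc,[r3,r2,LSL #2] */
    write_1_to_clear(&ic->srcpnd, 1u << idx);  /* SRCPND first ... */
    write_1_to_clear(&ic->intpnd, 1u << idx);  /* ... then INTPND */
}

static int demo_hits;
static void demo_handler(void) { demo_hits++; }
static void unused_handler(void) { }
```

Only the serviced source's pending bits are cleared; other pending sources stay set in SRCPND and will be serviced on a later IRQ.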

This completes the generic parts of the interrupt handling by the MDK OS. I will add more details if I see anything lacking.

Restlessness is discontent and discontent is the first necessity of progress. Show me a thoroughly satisfied man and I will show you a failure.
--Thomas A. Edison
