Embedded systems programming is not like normal PC programming. In many ways, programming for an embedded system is like programming a PC was 15 years ago. The hardware for the system is usually chosen to make the device as cheap as possible. Spending an extra dollar a unit in order to make things easier to program can cost millions. Hiring a programmer for an extra month is cheap in comparison. This means the programmer must make do with slow processors and low memory, while at the same time battling a need for efficiency not seen in most PC applications. Below is a list of issues specific to the embedded field.


Embedded development makes up a small fraction of total programming. There's also a large number of embedded architectures, unlike the PC world where 1 instruction set rules, and the Unix world where there's only 3 or 4 major ones. This means that the tools are more expensive. It also means that they're lower featured, and frequently buggier. On a major embedded project, at some point you will almost always find a compiler bug of some sort.

Debugging tools are another issue. Since you can't run general programs on your embedded processor, you can't run a debugger on it. This makes fixing your program difficult. Special hardware such as JTAG ports can overcome this issue in part. However, if you stop on a breakpoint when your system is controlling real world hardware (such as a motor), permanent equipment damage can occur. As a result, people doing embedded programming quickly become masters at using serial IO channels and error message style debugging.


To save costs, embedded systems frequently have the cheapest processors that can do the job. This means your programs need to be written as efficiently as possible. When dealing with large data sets, issues like memory cache misses that never matter in PC programming can hurt you. Luckily, this won't happen too often- use reasonably efficient algorithms to start, and optimize only when necessary. Of course, normal profilers won't work well, due to the same reason debuggers don't work well. So more intuition and an understanding of your software and hardware architecture is necessary to optimize effectively.

Memory is also an issue. For the same cost savings reasons, embedded systems usually have the least memory they can get away with. That means their algorithms must be memory efficient (unlike in PC programs, you will frequently sacrifice processor time for memory, rather than the reverse). It also means you can't afford to leak memory. Embedded applications generally use deterministic memory techniques and avoid the default "new" and "malloc" functions, so that leaks can be found and eliminated more easily.

Other resources programmers expect may not even exist. For example, most embedded processors do not have hardware FPUs (Floating-Point Processing Unit). These resources either need to be emulated in software, or avoided altogether.

Real Time Issues

Embedded systems frequently control hardware, and must be able to respond to them in real time. Failure to do so could cause inaccuracy in measurements, or even damage hardware such as motors. This is made even more difficult by the lack of resources available. Almost all embedded systems need to be able to prioritize some tasks over others, and to be able to put off/skip low priority tasks such as UI in favor of high priority tasks like hardware control.

Fixed-Point Arithmetic

Some embedded microprocessors may have an external unit for performing floating point arithmetic(FPU), but most low-end embedded systems have no FPU. Most C compilers will provide software floating point support, but this is significantly slower than a hardware FPU. As a result, many embedded projects enforce a no floating point rule on their programmers. This is in strong contrast to PCs, where the FPU has been integrated into all the major microprocessors, and programmers take fast floating point number calculations for granted. Many DSPs also do not have an FPU and require fixed-point arithemtic to obtain acceptable performance.

A common technique used to avoid the need for floating point numbers is to change the magnitude of data stored in your variables so you can utilize fixed point mathematics. For example, if you are adding inches and only need to be accurate to the hundreth of an inch, you could store the data as hundreths rather than inches. This allows you to use normal fixed point arithmetic. This technique works so long as you know the magnitude of data you are adding ahead of time, and know the accuracy to which you need to store your data.

Embedded Systems/Microprocessor Architectures

The chapters in this section will discuss some of the basics in microprocessor architecture. They will discuss how many features of a microprocessor are implemented, and will attempt to point out some of the pitfalls (speed decreases and bottlenecks, specifically) that each feature represents to the system.

Memory Bus

In a computer, a processor is connected to the RAM by a data bus. The data bus is a series of wires running in parallel to each other that can send data to the memory, and read data back from the memory. In addition, the processor must send the address of the memory to be accessed to the RAM module, so that the correct information can be manipulated.

Multiplexed Address/Data Bus

In old microprocessors, and in some low-end versions today, the memory bus is a single bus that will carry both the address of the data to be accessed, and then will carry the value of the data. Putting both signals on the same bus, at different times is a technique known as "time division multiplexing", or just multiplexing for short. The effect of a multiplexed memory bus is that reading or writing to memory actually takes twice as long: half the time to send the address to the RAM module, and half the time to access the data at that address. This means that on a multiplexed bus, moving data to and from the memory is a very expensive (in terms of time) process, and therefore memory read/write operations should be minimized. It also makes it important to ensure algorithms which work on large datasets are cache efficient.

Demultiplexed Bus

The opposite of a multiplexed bus is a demultiplexed bus. A demultiplexed bus has the address on one set of wires, and the data on another set. This scheme is twice as fast as a multiplexed system, and therefore memory read/write operations can occur much faster.

Bus Speed

In modern high speed microprocessors, the internal CPU clock may move much faster than the clock that synchronizes the rest of the microprocessor system. This means that operations that need to access resources outside the processor (the RAM for instance) are restricted to the speed of the bus, and cannot go as fast as possible. In these situations, microprocessors have 2 options: They can wait for the memory access to complete (slow), or they can perform other tasks while they are waiting for the memory access to complete (faster). Old microprocessors and low-end microprocessors will always take the first option (so again, limit the number of memory access operations), while newer, and high-end microprocessors will often take the second option.

I/O Bus

Any computer, be it a large PC or a small embedded computer, is useless if it has no means to interact with the outside world. I/O communications for an embedded computer frequently happen over a bus called the I/O Bus. Like the memory bus, the I/O bus frequently multiplexes the input and output signals over the same bus. Also, the I/O bus is moving at a slower speed than the processor is, so large numbers of I/O operations can cause a severe performance bottleneck.

It is not uncommon for different IO methods to have separate buses. Unfortunately, it is also not uncommon for the electrical engineers designing the hardware to cheat and use a bus for more than 1 purpose. Doing so can save the need for extra transistors in the layout, and save cost. For example, a project may use the USB bus to talk to some LEDs that are physically close by. These different devices may have very different speeds of communication. When programming IO bus control, make sure to take this into account.

In some systems, memory mapped IO is used. In this scheme, the hardware reads its IO from predefined memory addresses instead of over a special bus. This means you'll have simpler software, but it also means main memory will get more access requests.

Programming the IO Bus

When programming IO bus controls, there are 5 major variations on how to handle it- the main thread poll, the multithread poll, the interrupt method, the interrupt+thread method, and using a DMA controller.

Main thread poll

In this method, whenever you have output ready to be sent, you check if the bus is free and send it. Depending on how the bus works, sending it can take a large amount of time, during which you may not be able to do anything else. Input works similarly- every so often you check the bus to see if input exists.


* Simple to understand


* Very inefficient, especially if you need to push the data manually over the bus (instead of via DMA)
* If you need to push data manually, you are not doing anything else, which may lead to problem with real time hardware
* Depending on polling frequency and input frequency, you could lose data by not handling it fast enough

In general, this system should only be used if IO only occurs at infrequent intervals, or if you can put it off when there are more important things to do. If your system supports multithreading or interrupts, you should use other techniques instead.

Multi thread polling

In this method, we spawn off a special thread to poll. If there is no IO when it polls, it puts itself back to sleep for a predefined amount of time. If there is IO, it deals with it on the IO thread, allowing the main thread to do whatever is needed.


* Does not put off the main thread
* Allows you to define the importance of IO by changing the priority of the thread


* Still somewhat inefficient
* If IO occurs frequently, your polling interval may be too small for you to sleep sufficiently, starving other threads
* If your thread is too low in priority or there are too many threads for the OS to wake the thread in a timely fashion, data can be lost.
* Requires an OS capable of threading

This technique is good if your system supports threading, but does not support interrupts or has run out of interrupts. It does not work well when frequent IO is expected- the OS may not properly sleep the thread if the interval is too small, and you will be adding the overhead of 2 context switches per poll.


In this method, the bus fires off an interrupt to the processor whenever IO is ready. The processor then jumps to a special function, dropping whatever else it was doing. The special function (called an interrupt handler, or interrupt service routine) takes care of all IO, then goes back to whatever it was doing.


* Very efficient
* Very simple, requires only 1 function


* If dealing with IO takes a long time, you can starve other things. This is especially dangerous if your handler masks interrupts, which can cause you to miss hardware interrupts from real time hardware
* If your handler takes so long more input is ready before you handle existing input, data can be lost.

This technique is great so long as dealing with the IO is a short process, such as when you just need to set up DMA. If its a long process, use multithreaded polling or interrupts with threads.

Interrupts and threads

In this technique, you use an interrupt to detect when IO is ready. Instead of dealing with the IO directly, the interrupt signals a thread that IO is ready and lets that thread deal with it. Signalling the thread is usually done via semaphore- the semaphore is initialized to the taken state. The IO thread tries to take the semaphore, which fails and the OS puts it to sleep. When IO is ready, the interrupt is fired and releases the semaphore. The thread then wakes up, and handles the IO before trying to take the semaphore and being put back to sleep.


* Does not put off the main thread
* Allows you to define the importance of IO by changing the priority of the thread
* Very efficient- only makes context changes when needed and does not poll.
* Very clean solution architecturally, allows you to be very flexible in how you handle IO.


* Requires an OS capable of threading
* Most complex solution

This solution is the most flexible, and one of the most efficient. It also minimizes the risk of starving more important tasks. Its probably the most common method used today.

DMA (Direct Memory Access) Controller

In some specialised situations, such as where a set of data must be transfered to a communications IO device, a DMA controller may be present that can automatically detect when the IO device is ready for more data, and transfer that data. This technique may be used in conjuction with many of the other techniques, for instance an interrupt may be used when the data transfer is complete.


* This provides the best performance, since the I/O can happen in parallel with other code execution


* Only applicable to a limited range of problems
* Not all systems have DMA controllers. This is especially true of the more basic 8-bit microcontrollers.
* Parallel nature may complicate a system


External timer chips are incredibly useful for performing a number of different operations. For instance, many multi-threaded operating systems operate by setting a timer, and then switching to a different thread every time the timer goes off. Programmable timer chips can often be programmed to provide a variety of different timing functions, to take the burden off the microprocessor. You can see some of the 8086 compatible timer chips like 8253/54 also they are the same have three independent timers internally but 8254 can work with higher frequencies and is used to generate interrupt like Memory refresh interrupt ,Time of day TOD interrupt and the last one is used to generate the speaker frequencies.

Interrupt Controllers

The orignal 8086 processor had only a single pin used for signaling an interrupt, so a programmable interrupt controller would handle most of the messy details of calling the interrupt. Also, a programmable interrupt controller could be used to monitor an input port for instance, and triggering an interrupt routine when input is received.

Direct Memory Access

Since memory read/write operations take longer than other operations for the microprocessor, it is very wasteful to ask the microprocessor to move large blocks of memory. Luckily, the original 8086 came with a programmable direct memory access controller (DMA) for use in automatically copying and moving large segments of memory. DMAs could also be used for implementing memory-mapped I/O, by being programmed to automatically move memory data to and from an output port.

DMA memory copies can also greatly enhance system performance by allowing the CPU to execute code in parallel with a DMA controller automatically performing the memory copy.

Peripheral Interface Controllers

Peripheral interface controllers take a number of different forms. Each different type of port has a different controller that the microprocessor will interface with to send the output on that port. For instance there are controllers for parallel ports and more modern USB ports. These controllers are used for controlling settings on output such as timing, and setting different modes on the output/input port.
addressable areas

There are typically 4 distinct addressable areas, each one implemented with a different technology:

* program memory (which holds the programs you write), often called ROM (although most developers prefer to use chips that actually implement this with Flash). While your program is running, it is impossible to change any of the data in program memory. But at least when the power comes back on, it's all still there.
* RAM, which holds the variables and stack. (Initial values for variables are copied from ROM). Forgets everything when power is lost.
* EEPROM. Used kind of like the hard drive in a personal computer, to store settings that might change occasionally, and that need to be remembered next time it starts up.
* I/O. This is really the entire point of a microcontroller.

Many popular microcontrollers (including the 8051, the Atmel AVR, the Microchip PIC, the Cypress PSoC) have a "w:Harvard architecture", meaning that programs can only execute out of "ROM". You can copy bytes from ROM (or elsewhere) into RAM, but it's physically impossible to jump or call such "code" in RAM. This is exactly the opposite of the situation on desktop computers, where the code you write cannot be executed until after it is copied into RAM.

A few popular microcontrollers (such as the 68HC11 and 68HC12 and …) have a unified address space (a "von Neumann architecture"). You can jump or call code anywhere (although jumping to an address in I/O space is almost certainly not what you really wanted to do).

Reserved Memory

Reserved memory is memory which is reserved for some purpose like additional software installation and startup.'

Often software applications grow and grow. Ancient processors (such as the 8085 used on the Mars rover Sojourner) with 16 bit address registers can directly access a maximum of 65 536 locations — however, systems using these processors often have much more physical RAM and ROM than that. They use "paging" hardware that swaps in and out "banks" of memory into the directly accessible space. Early Microchip PIC processors had 2 completely separate set of "banking registers", one for swapping in different banks of program ROM, the other for swapping in different banks of RAM.

Segmented Memory

Old x86 processors were only 16 bit processors, and if a flat memory scheme was used, those processors would only be able to support 65 Kilobytes of memory. The system engineers behind the old 8086 and 80286 processors came up with the idea to segment memory, and use a combination of segment pointers and offset pointers to access an effective 20 bit address range, for a maximum of 1 megabyte of addressable memory.

Address = (Segment register * 16) + pointer register

New 32 bit processors allow for 4 Gigabytes of addressable memory space, and therefore the segmented memory model was all but abandoned in current 32 bit machines (although the segment registers are still used to implement paging schemes).

Memory-Mapped I/O

Memory-Mapped I/O is a mechanism by which the processor performs I/O access by using memory access techniques. This is often put into effect because the memory bus is frequently much faster then the I/O bus. Another reason that memory mapped I/O might be used is that the architecture in use does not have a separate I/O bus.

In memory mapped IO, certain range of CPU's address space is kept aside for the external peripherals. These locations can be accessed using the same instructions as used for other memory accesses. But instead, the read/writes to these addresses are interpreted as access to device rather than a location on the main memory.

A CPU may expect a particular device at a fixed location or can dynamically assign a space for it.

The way this works is that memory interfaces are often designed as a bus (a shared communications resource), where many devices are attached. These devices are usually arranged as master and slave devices, where a master device can send and receive data from any of the slave devices. A typical system would have:

* A CPU as the master
* One or more RAM and/or ROM devices for program code and data storage
* Peripheral devices for interfacing with the outside world. Examples of these might be a UART (serial communications), Display device or Input device

memory management

All too often, programs written for embedded systems grow and grow until they exceed the available program space. There are a variety of techniques for dealing with the out-of-memory problem:

* re-compile with the "-Os" (optimize for size) option
* find and comment-out "dead code"
* "refactor" repeated sections into a common subroutine
* trade RAM space for program space.
* put a small interpreter in "internal program memory" that loads and interprets "instructions".
o use "instructions" — perhaps p-code or threaded code — that are more compact than directly coding it in assembly language; and/or
o place these "instructions" can be placed in EEPROM or external serial Flash that couldn't otherwise be used as program memory. This technique is often used in "stamp" style CPU modules.
* add more memory (perhaps using a paging or banking scheme)

Most CPUs used in desktop machines have a "memory management unit" (MMU). The MMU handles virtual memory, protects regions of memory used by the OS from untrusted programs, and …

Most embedded systems do not have a MMU.


One type of memory that is as cheap as it is useless is Read-Only Memory (ROM). I say that it is useless because you can program it once, and then you can never change the data that is on it. This makes it useless because you can't upgrade the information on the ROM chip (be it program code or data), you can't fix it if there is an error, etc…. Because of this, they are usually called "Programmable Read-Only Memory" (PROM), because you can program it once, but then you can't change it at all.


In contrast to PROM is EPROM ("Erasable Programmable Read-Only Memory"). EPROM chips will have a little window, made of either glass or quartz that can be used to erase the memory on the chip. To erase an EPROM, the window needs to be uncovered (they usually have some sort of guard or cover), and the EPROM needs to be exposed to UV radiation to erase the memory, and allow it to be reprogrammed.


A step up from EPROM is EEPROM ("Electrically Erasable Programmable Read-Only Memory"). EEPROM can be erased by exposing it to an electrical charge. This means that EEPROM can be erased in circuit (as opposed to EPROM, which needs to be removed from the circuit, and exposed to UV). An appropriate electrical charge will erase the entire chip, so you can't erase just certain data items at a time.


Random Access Memory (RAM) is a temporary, volatile memory that requires a persistant electric current to maintain information. As such, a RAM chip will not store data when you turn the power OFF. RAM is more expensive then ROM, and is often at a premium: Embedded systems can have many Kbytes of ROM (sometimes Megabytes or more), but often they have less then 100 bytes of RAM available for use in program flow.

FLASH Memory

Flash memory is a combination of the best parts of RAM and ROM. Like ROM, Flash memory can hold data when the power is turned off. Like RAM, Flash can be reprogrammed electrically, in whole or in part, at any time during program execution.

Flash memory modules are only good for a limited number of Read/Write cycles, which means that they can burn out if you use them too much, too often. As such, Flash memory is better used to store persistant data, and RAM should be used to store volatile data items.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License