MSP430 Size Optimizations

I'm writing this document to collect some of the current/new knowledge on how to minimize the flash/ROM size needed for MSP430 applications, using the new msp430-elf (FSF/Red Hat) tools. I'll keep a copy at http://people.redhat.com/~dj/msp430/size-optimizations.html

This document has two purposes: to collect this information in one public place, and to ask others to test it out and provide feedback.

Obvious: Please do all size testing with -Os, which optimizes for size, and not -O3, which optimizes for speed.

Linker Section Optimizations

The msp430-elf assembler has a new directive ".refsym" that adds a reference to the named symbol into the generated object. In the past, to do this, you'd use a ".word" directive instead, but that takes up space in the final image. This new directive takes up no space in the image.

The startup code provided in the upstream newlib/libgloss (crt0 et al) uses ".refsym" to tell the linker about dependencies between the various snippets of startup code and things in the whole image which require them. This dependency information is partly manual in crt0.S itself, and partly automatic in code generated by gcc and gas.

For example, gcc has code to check if main() ever returns. In most embedded programs, it won't. If it *does* return, gcc adds a ".refsym" that causes a snippet in crt0.S to be included which adds a call to exit() after the call to main(). If main() doesn't return, there won't be any special code after the call to main().
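To make the main() case concrete, here is a minimal sketch of a blink program whose main() never returns, so gcc should not need to pull in the exit() snippet. The pin choice (P1.0), the crude delay loop, and the helper name toggle_led() are assumptions for illustration; the register names come from the usual msp430.h device header, and some devices need extra setup (clock or pin unlocking) that is not shown.

```c
#ifdef __MSP430__
#include <msp430.h>
#endif

/* Hypothetical helper: toggle the bits in 'mask' on an 8-bit port register.
   It takes the register address so it can also be exercised off-target. */
static inline void toggle_led(volatile unsigned char *port, unsigned char mask)
{
    *port ^= mask;
}

#ifdef __MSP430__
int main(void)
{
    volatile unsigned int delay;

    WDTCTL = WDTPW | WDTHOLD;      /* stop the watchdog timer */
    P1DIR |= BIT0;                 /* assumption: LED is on P1.0 */

    for (;;) {                     /* infinite loop: main() never returns */
        toggle_led(&P1OUT, BIT0);
        for (delay = 0; delay < 20000; delay++)
            ;                      /* crude busy-wait delay */
    }
}
#endif
```

Because the loop has no exit, the compiler can see that control never reaches the end of main(), which is the situation described above.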

The assembler has code to detect if either the data or bss sections are used, and if they are, it will use .refsym to tell crt0 to pull in snippets of code to initialize the RAM correctly. However, this means that most objects will now have extra "undefined" symbols that aren't part of your application:

	U __crt0_movedata
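As a rough illustration of what triggers these references, the sketch below shows three kinds of global objects. The variable and function names are hypothetical, and the section placement described in the comments is the usual behaviour for this kind of target rather than something guaranteed for every configuration.

```c
/* Zero-initialized: normally placed in .bss, which startup code must clear. */
unsigned int tick_count;

/* Non-zero initializer: normally placed in .data. The initial value lives
   in flash and startup code copies it to RAM. */
unsigned int blink_period = 500;

/* Read-only: normally placed in .rodata and left in flash, so it needs
   no RAM initialization at all. */
const unsigned int max_period = 2000;

/* Doubles the blink period until it reaches max_period. */
unsigned int next_period(void)
{
    tick_count++;
    if (blink_period < max_period)
        blink_period *= 2;
    return blink_period;
}
```

A program that only uses objects like max_period, or none at all, gives the assembler no reason to request the RAM initialization snippets.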

This new functionality means that there's a new library that must be linked in: "-lcrt". This is built by libgloss (part of newlib) by splitting up crt0.S, and contains all the snippets. To link this library correctly, your linker script needs a new KEEP line in its .text section, which looks like this:

.text           :
  {
    . = ALIGN(2);
    PROVIDE (_start = .);
    KEEP (*(SORT(.crt_*)))
    *(.lowtext .text .stub .text.* .gnu.linkonce.t.* .text:*)

The KEEP/SORT line places all the snippets from crt0 at that point (after _start but before the rest of your program) in asciibetical order. Conveniently, the sections in libcrt.a are all named like ".crt_0013something", so the four-digit number causes them all to be inserted in the right order.
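The ordering trick can be tried out on a host machine. The sketch below sorts a few made-up section names with a plain byte-wise comparison, which is roughly what sorting by name amounts to; the names themselves are hypothetical and are not the real contents of libcrt.a.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical section names, deliberately listed out of order. */
const char *crt_sections[] = {
    ".crt_0300example_main",
    ".crt_0000example_start",
    ".crt_0200example_data",
    ".crt_0100example_bss",
};

/* qsort() comparison callback: compare two names byte by byte. */
static int by_name(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Sort the table in place into ascending name order. */
void sort_sections(void)
{
    qsort(crt_sections,
          sizeof crt_sections / sizeof crt_sections[0],
          sizeof crt_sections[0],
          by_name);
}
```

Zero-padding matters here: without it, a name starting ".crt_10" would sort before one starting ".crt_9", because '1' compares lower than '9'.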

Lastly, there's a GCC option -minrt that tells gcc to use a "minimum runtime" for programs that do not need static initializers or constructors (popular in C++ and Java). Note that this is different from initialized data (like "int j = 5;"); it applies to functions that need to be called before main() is. You can also forcibly remove the extra language support in your linker script by discarding anything from crtbegin/crtend:

  /DISCARD/ : { *crtbegin*.o(*) *crtend*.o(*) }

Don't forget to take out the KEEPs for crtbegin/crtend too, though ;-)
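To show the distinction in C terms, here is a small sketch contrasting initialized data with a function that must run before main(). It uses GCC's constructor attribute; the names hw_ready, init_before_main() and is_hw_ready() are hypothetical.

```c
/* Initialized data: set up by the startup data copy, not by a constructor. */
int j = 5;

/* Flag set by the constructor below. */
static int hw_ready;

/* GCC extension: ask the runtime to call this function before main().
   This is the kind of support that a minimum runtime leaves out. */
__attribute__((constructor))
static void init_before_main(void)
{
    hw_ready = 1;
}

/* Returns 1 only if the constructor actually ran. */
int is_hw_ready(void)
{
    return hw_ready;
}
```

With the full runtime, is_hw_ready() should return 1 by the time main() starts. A program built with -minrt, or with crtbegin/crtend discarded, should not rely on that, while j would still be expected to start at 5.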

What's the net result of all this? A simple "blink an LED" program that has no global variables can take as few as 24 bytes of flash, depending on how you blink the LED!

__int20 Patch for Large Model

The second big change is some ongoing work to add true "__intN" support to the GCC internals. Before now, gcc had one __int128 type built-in and any target that wanted something else had to hack it in somehow, without support from gcc's core. I've put a huge unofficial patch online at:

http://people.redhat.com/~dj/msp430/int20-patch.txt

This patch may not apply cleanly if the upstream sources have changed too much since the patch was generated.

There are two parts to this patch: The first part is the core __intN support, and the second part is changes to the msp430 backend to enable __int20 and support it as a regular integer type. Note that this patch mostly affects "large model" programs (-mlarge) as it changes pointer math to use __int20 for size_t instead of "unsigned long".

To explicitly use the __int20 type, replace "int" with "__int20" like this:

unsigned __int20 x[10];
extern __int20 a, b, c;
void foo (__int20 a, void *b);

Note that __int20 won't work (and you'll get a helpful compile-time error) unless you are building for an MSP430X-class CPU. You are allowed to use an explicit __int20 type with the small model, though.
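As a sketch of how code might use the type while still building with other compilers, the example below selects __int20 only when the compiler appears to provide it. Testing the __MSP430X__ and __SIZEOF_INT20__ macros is an assumption about what a patched compiler defines; the typedef name index_t and the function sum_bytes() are hypothetical.

```c
/* Assumption: a compiler with __int20 enabled defines both macros below. */
#if defined(__MSP430X__) && defined(__SIZEOF_INT20__)
typedef unsigned __int20 index_t;   /* 20-bit index on MSP430X */
#else
typedef unsigned long index_t;      /* fallback so the sketch builds elsewhere */
#endif

/* Add up 'len' bytes starting at 'buf', using index_t as the loop counter. */
unsigned int sum_bytes(const unsigned char *buf, index_t len)
{
    unsigned int total = 0;
    index_t i;

    for (i = 0; i < len; i++)
        total += buf[i];
    return total;
}
```

Hiding the type behind a typedef keeps the rest of the code unchanged if the patch is not applied or the target is not an MSP430X part.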

Building Newlib for Reduced Size

In some cases, applications may want to use the stock newlib runtime while reducing the amount of flash that newlib routines use. If you're willing to rebuild newlib yourself, there are some configure options that remove features you may not need. For an up-to-date list of these options, run "./configure --help" in the newlib/ subdirectory. Any --enable-foo option can be given as --disable-foo to disable a feature. For example:

../newlib-trunk/configure --disable-newlib-io-float

There is also an alternate tiny malloc() implementation that can be enabled:

../newlib-trunk/configure --enable-newlib-nano-malloc

Note that you can specify multiple --enable/--disable options on one configure command.
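If floating-point I/O is disabled as in the first configure example above, application code that used to print fractional values with "%f" needs another approach. One option, sketched here under the assumption that values are kept as integers scaled by 100, is to format them with integer conversions only. The function name format_hundredths() is hypothetical.

```c
#include <stdio.h>

/* Format a value held in hundredths (for example, 1234 means 12.34)
   using only integer conversions, so no float support is required.
   Returns the value returned by snprintf(). */
int format_hundredths(char *buf, size_t len, long value)
{
    const char *sign = (value < 0) ? "-" : "";
    /* Take the magnitude in unsigned arithmetic to avoid overflow
       when negating the most negative value. */
    unsigned long mag = (value < 0) ? 0UL - (unsigned long)value
                                    : (unsigned long)value;

    return snprintf(buf, len, "%s%lu.%02lu", sign, mag / 100, mag % 100);
}
```

For example, passing 1234 produces the string "12.34", and passing -5 produces "-0.05".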