September 28, 2018

How we developed the NIOS II processor module for IDA Pro


IDA Pro has a well-earned place in the toolkit of security researchers worldwide. We at Positive Technologies are no exception. In fact, we like it so much that we developed a disassembler processor module for the NIOS II architecture to make analyzing code faster and more convenient.

Here I will give a brief history of the project and share what exactly it is that we created.


It all started in 2016, when we had to develop a processor module in-house to analyze firmware for some work we were doing. Development started from scratch based on the Nios II Classic Processor Reference Guide, which was the most up-to-date reference at the time. This took about two weeks.

The processor module was developed for IDA version 6.9. IDA Python was the logical choice for the sake of development speed. The procs subfolder inside the IDA Pro installation folder, where processor modules are stored, contains three Python modules: msp430, ebc, and spu. These modules served as examples of module structure and of how to implement basic functionality:

  • Parsing instructions and operands
  • Simplifying and displaying them
  • Creating offsets, cross-references, and the code and data to which they refer
  • Handling switch constructions
  • Handling manipulations with the stack and stack variables

This is the functionality I was able to implement at that time, more or less. Fortunately, these labors came in handy again during a different project a year later, during which I actively used and improved the module.

I decided to share this experience creating a processor module with the community at PHDays 8. The talk drew interest (a video is available on the PHDays site) and even Ilfak Guilfanov, the creator of IDA Pro, was in attendance. One of his questions was: is IDA Pro version 7 supported? The answer then was "no" but after the talk, I committed to releasing a module version that would. And that's when things got interesting.

Now there was a newer manual from Intel, which helped to make comparisons and check for bugs. I made big changes to the module, added numerous new features, and fixed some problems that had previously eluded solution. And of course, I added support for version 7 of IDA Pro. This is the result.

NIOS II programming model

NIOS II is an embedded processor developed for FPGAs from Altera (now a part of Intel). From a software standpoint, it has the following notable features: little-endian byte order, a 32-bit address space, a 32-bit instruction set (meaning a fixed command length of 4 bytes), 32 general-purpose registers, and 32 special-purpose (control) registers.
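To make the fixed 4-byte format concrete, here is a minimal sketch that splits an instruction word into the fields of the three NIOS II encodings (R-type, I-type, J-type). The bit positions follow the NIOS II reference: OP always occupies bits 5..0, all R-type instructions share OP 0x3A, and call/jmpi (OP 0x00/0x01) are the J-type ones. The helper names are our own, not the module's.

```python
R_TYPE_OP = 0x3A          # shared OP value of all R-type instructions
J_TYPE_OPS = {0x00, 0x01}  # call, jmpi

def bits(word, hi, lo):
    """Extract bits hi..lo (inclusive) of a 32-bit word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def decode_fields(word):
    """Split a 32-bit NIOS II instruction word into its format fields."""
    op = bits(word, 5, 0)
    if op == R_TYPE_OP:
        return {"fmt": "R", "a": bits(word, 31, 27), "b": bits(word, 26, 22),
                "c": bits(word, 21, 17), "opx": bits(word, 16, 11),
                "imm5": bits(word, 10, 6), "op": op}
    if op in J_TYPE_OPS:
        return {"fmt": "J", "imm26": bits(word, 31, 6), "op": op}
    # Everything else uses the I-type layout.
    return {"fmt": "I", "a": bits(word, 31, 27), "b": bits(word, 26, 22),
            "imm16": bits(word, 21, 6), "op": op}
```

For example, the eret word 0xEF80083A decodes as R-type with A=0x1D, B=0x1E, C=0 and OPX=0x01.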

Disassembly and code references

Suppose we open a new NIOS II firmware file in IDA Pro. Once the module is installed, it appears in the list of IDA Pro processors, as shown in the following screenshot:

Let's say that the module does not yet support even basic parsing of commands. Since each command occupies 4 bytes, we place the bytes in groups of four, which resembles the following:
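The grouping step can be sketched as reading each 4-byte slot as a little-endian 32-bit word. `group_words` is an illustrative name, not part of the module or of IDA's API.

```python
import struct

def group_words(data, base_ea=0):
    """Yield (address, word) for each 4-byte little-endian instruction slot."""
    usable = len(data) - len(data) % 4  # ignore a trailing partial word
    for off in range(0, usable, 4):
        (word,) = struct.unpack_from("<I", data, off)
        yield base_ea + off, word
```

Fed the bytes `3a 08 80 ef`, it yields the word 0xEF80083A.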

After we implement basic functionality to decode the instructions and operands, display them on screen, and analyze execution transfer instructions, the set of bytes from our above example turns into the following code:

As the example shows, cross-references from execution transfer commands are created as well (in this particular case, a conditional jump and a procedure call).
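Creating those cross-references requires computing each destination. Here is a minimal sketch, assuming the immediate fields have already been extracted. Branches are relative to the next instruction (PC + 4 plus a signed 16-bit offset), while call and jmpi keep the top 4 bits of the PC and treat IMM26 as a word index. In an IDA module, the results would then be passed to the cross-reference API (for example `ida_xref.add_cref`); that wiring is omitted here.

```python
def sext(value, bits):
    """Sign-extend a `bits`-wide two's complement value."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)

def branch_target(ea, imm16):
    """Branch destination: address of next instruction + signed IMM16."""
    return (ea + 4 + sext(imm16, 16)) & 0xFFFFFFFF

def call_target(ea, imm26):
    """call/jmpi destination: PC bits 31..28 kept, IMM26 scaled by 4."""
    return (ea & 0xF0000000) | (imm26 << 2)
```

A branch at 0x1000 with IMM16 = 0xFFFC (that is, -4) points back to itself.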

One useful thing we can implement in processor modules is comments for commands. If we disable display of byte values and enable comments instead, the same code will look as follows:

So if you are dealing with assembler code on an architecture that is new to you, comments can help you to get a feel for what is going on. The remaining code examples here will be given in the same way, with comments, so that you can concentrate on what is happening in the code instead of flipping through the NIOS II manual.

Pseudoinstructions and simplifying commands

Some NIOS II commands are pseudoinstructions. They have no opcodes of their own; each is encoded as a special case of another command. During disassembly, instructions are simplified: certain combinations are replaced with pseudoinstructions. There are four types of NIOS II pseudoinstructions:

  • When the zero register (r0) is one of the sources and can be disregarded
  • When a command has a negative immediate value and is replaced with the opposite command
  • When a condition is replaced with the opposite one
  • When a 32-bit offset is loaded with two commands (high and low halfword) and these are replaced with a single command

Only the first two types have been implemented: replacing a condition with its opposite does not change much, and 32-bit offsets occur in more varieties than the manual describes.

Let's see an example of the first type.

In our example, the zero register is used frequently in calculations. If we look closely, all commands (other than execution transfer commands) simply move values into particular registers.

After pseudoinstruction handling has been applied, the code becomes more readable: the OR and ADD commands are replaced with MOV.
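The two implemented simplifications can be sketched as a rewrite on decoded instructions. The operand representation (register names as strings, immediates as ints) is an assumption of this sketch, not the module's internal format. Per the NIOS II manual, mov is add with r0, movi is addi with r0, movui is ori with r0, and subi is addi with a negated immediate; following the example above, or with r0 is also shown as mov.

```python
def simplify(mnem, ops):
    """Rewrite a decoded instruction as a pseudoinstruction where possible."""
    # Type 1: r0 as a source can be dropped.
    if mnem in ("add", "or") and ops[2] == "r0":
        return "mov", [ops[0], ops[1]]
    if mnem in ("add", "or") and ops[1] == "r0":
        return "mov", [ops[0], ops[2]]
    if mnem == "addi" and ops[1] == "r0":
        return "movi", [ops[0], ops[2]]
    if mnem == "ori" and ops[1] == "r0":
        return "movui", [ops[0], ops[2]]
    # Type 2: a negative immediate turns addi into subi.
    if mnem == "addi" and ops[2] < 0:
        return "subi", [ops[0], ops[1], -ops[2]]
    return mnem, ops
```

For example, `addi sp, sp, -16` becomes `subi sp, sp, 16`, while `addi r2, r3, 4` is left alone.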

Stack variables

NIOS II has stack support, and besides the stack pointer (sp) it also has a stack frame pointer (fp). Here is an example of a short procedure with use of a stack:

Space on the stack is reserved for local variables. Presumably, the ra register is saved in, and then restored from, a stack variable.

After we have added functionality to the module for monitoring stack pointer changes and creating stack variables, this is how the sample will look:

The code is easier to understand now, and we can name stack variables and study their purpose by following the cross-references. In our example, the function is __fastcall, with its arguments in registers r4 and r5; these arguments are moved to the stack in order to call a subprocedure of type __stdcall.
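The stack-tracking idea can be sketched with a toy model in which only `addi sp, sp, imm` changes sp, and `stw`/`ldw` with an sp base define stack slots. Offsets are normalized to sp at function entry, so the save and the restore of ra land on the same slot even though sp was moved in between. This is an illustrative simplification. A real module reports these deltas to IDA's stack-point and frame machinery instead of returning them.

```python
def track_stack(insns):
    """Compute the sp delta before each instruction and the sp-based slots.

    insns: list of (mnemonic, operands); loads/stores look like
    ("stw", ["ra", 4, "sp"]). Returns (deltas, slots), where slots maps an
    entry-relative frame offset to the (mnemonic, register) pairs using it.
    """
    delta = 0
    deltas, slots = [], {}
    for mnem, ops in insns:
        deltas.append(delta)
        if mnem == "addi" and ops[0] == "sp" and ops[1] == "sp":
            delta += ops[2]
        elif mnem in ("stw", "ldw") and len(ops) == 3 and ops[2] == "sp":
            frame_off = delta + ops[1]  # relative to sp at function entry
            slots.setdefault(frame_off, []).append((mnem, ops[0]))
    return deltas, slots
```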

32-bit numbers and offsets

In a single operation (that is, one command), NIOS II can move a value of at most 2 bytes (16 bits) into a register. However, the processor registers and address space are 32-bit, so 4 bytes are needed to address memory.

To overcome this, it is necessary to use offsets consisting of two parts. A similar mechanism is used on PowerPC processors: an offset consists of a high and low part and is moved to the register in two commands. This is how it works on PowerPC:

Cross-references are formed from both commands, although effectively it is the second command that sets the address. This can be inconvenient when trying to count the number of cross-references.

In the offset properties, the high part uses the non-standard type HIGHA16 (sometimes HIGH16), and the low part uses LOW16.

Actually calculating 32-bit numbers from the two parts is not at all difficult. What's difficult is generating operands as the offsets for two separate commands. All of this processing is the job of the processor module. There were no existing examples of how to do this in the IDA SDK (and definitely not any written in Python).

In the PHDays talk, I mentioned offsets as an unresolved task. To solve this, we had to be clever: the 32-bit offset is taken only from the low halfword, relative to the base. The base is calculated as the high halfword shifted 16 bits to the left.

With this approach, a cross-reference is generated only for the command responsible for moving the low halfword of the 32-bit offset.

In the offset properties, we can see the base, along with an option to treat the base address as a plain number. This avoids generating a large number of cross-references to the same address that serves as the base.

NIOS II code uses the following mechanism to move 32-bit numbers into a register: first, the high halfword is moved with the movhi command; then the low halfword is combined with it. This can be done with three different commands: addition (addi), subtraction (subi), or logical OR (ori).
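The arithmetic behind the three variants is sketched below. The key detail is that addi sign-extends its 16-bit immediate while ori zero-extends it, which is why addi pairs use an adjusted high part (the assembler's %hiadj). `combine` is a hypothetical helper. It also returns the base (the high halfword shifted left by 16), which is what the module attaches to the low-halfword command.

```python
def sext16(v):
    """Sign-extend a 16-bit immediate."""
    return (v & 0x7FFF) - (v & 0x8000)

def combine(hi, op, lo):
    """Combine 'movhi rX, hi' with 'op rX, rX, lo' into (base, value).

    For subi, lo is the positive immediate as displayed after
    pseudoinstruction simplification (an assumption of this sketch).
    """
    base = (hi & 0xFFFF) << 16
    if op == "addi":      # immediate is sign-extended
        value = base + sext16(lo)
    elif op == "subi":    # addi with a negated immediate
        value = base - lo
    elif op == "ori":     # immediate is zero-extended
        value = base | (lo & 0xFFFF)
    else:
        raise ValueError("unsupported low-halfword command: %s" % op)
    return base, value & 0xFFFFFFFF
```

With movhi 0x0001 and addi 0x8000, the result is 0x8000, not 0x18000, because the low half is negative once sign-extended.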

For example, in the following code, registers are loaded with 32-bit numbers that are then passed as arguments to a function:

After we have added offset calculations, we get the following representation of the code:

The resulting 32-bit number is displayed next to the command that moves its low halfword. This example is simple enough that we could combine the high and low parts of each number mentally. Judging by the values, they are unlikely to be offsets.

Now let's look at a case where subtraction is used for the low halfword. Here the final 32-bit values (offsets) can no longer be worked out at a glance.

After applying calculation of 32-bit numbers, it looks as follows:

Here we see that if an address is contained in the address space, an offset for it is generated, and the value formed by combining the high and low halfwords is no longer displayed next to it. Here we obtained the offset for the string "10/22/08". To make the final offsets point to valid addresses, we will increase the segment size slightly.
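The decision between an offset and a plain number can be sketched as a range check against the segments. Here `segments` is a list of half-open (start, end) ranges standing in for the database's segments; a real module would query IDA instead (for instance via `ida_segment.getseg`).

```python
def classify(value, segments):
    """Return ("offset", value) if value lies in a segment, else ("number", value)."""
    if any(start <= value < end for start, end in segments):
        return "offset", value
    return "number", value
```

This also shows why enlarging a segment helps. A value just past the old end becomes an offset once the end is moved beyond it.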

After enlarging the segment, we see that now all the calculated 32-bit numbers are offsets and point to valid addresses.

Earlier, I mentioned that we can also use the logical OR command to calculate offsets. The following code uses this approach to calculate two offsets:

The value computed in register r8 is then stored on the stack.

After conversion, we see that the registers are set to the start addresses of procedures, and the procedure address is stored on the stack.

Reading and writing relative to base

So far, the 32-bit number being moved in two commands has been either a number or offset. In the next example, a base is moved to the high halfword of the register and then reading or writing is performed relative to it.

In this case, we get offsets for variables from the read and write commands themselves. Depending on the size of the operation, the size of the variable may be set as well.
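In this pattern, the target address is the high part plus the signed displacement of the load or store, and the mnemonic gives the access size. The sketch below covers only the plain load/store mnemonics; `base_relative` is an illustrative name.

```python
ACCESS_SIZE = {"ldb": 1, "ldbu": 1, "stb": 1,
               "ldh": 2, "ldhu": 2, "sth": 2,
               "ldw": 4, "stw": 4}

def sext16(v):
    """Sign-extend a 16-bit displacement."""
    return (v & 0x7FFF) - (v & 0x8000)

def base_relative(hi, mnem, disp16):
    """'movhi rB, hi' then '<ld/st> rX, disp16(rB)' -> (address, size in bytes)."""
    address = (((hi & 0xFFFF) << 16) + sext16(disp16)) & 0xFFFFFFFF
    return address, ACCESS_SIZE[mnem]
```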

Switch constructions

The switch constructions found in binary files can simplify analysis. For example, based on the number of options inside a switch construction, we can localize the switch responsible for handling a particular protocol or system of commands. This is why we want to recognize switch and its parameters. Take the following code:

The code ends with a register jump, jmp r2. It is followed by blocks of code referenced from data, and each block ends with a jump to the same label. So this is a switch construction, and these blocks handle its individual cases. Above the jump, we also see a check of the number of cases and a jump to the default case.

After we add switch handling, the code looks as follows:

Now we clearly make out the jump, address of the table with offsets, number of cases, and each case with corresponding number.

The offset table is shown below; to save space, only the first five elements are listed.

In essence, switch handling means going through the code, starting from the tail end, and finding all of the switch's components. In other words, the module describes a particular scheme for how a switch is organized. Real code sometimes deviates from such a scheme, which is one reason why existing processor modules can fail to recognize seemingly obvious switches: the real-life switch simply doesn't fit the scheme defined inside the processor module. Or the code may follow a scheme, but with unrelated commands mixed in, with the main commands in a different order, or with the sequence interrupted by jumps.

The NIOS II processor module recognizes a switch even when unrelated instructions appear between the main commands, when the main commands have swapped places, or when jumps interrupt the sequence. It uses a reverse execution path approach that accounts for such scheme-disrupting jumps, setting internal variables that track the recognizer's state. In total, approximately 10 different ways of organizing a switch are found in various firmware.
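The backward walk can be sketched for one assumed scheme. This is not necessarily one of the module's ten variants; the instruction sequence in the docstring is our own illustration. The recognizer keeps a map of registers whose definitions it still needs. Unrelated instructions are skipped, and slli and movhi may appear in either order. A tracked register redefined outside the scheme aborts recognition. Jumps that break up the sequence are not handled in this sketch.

```python
def sext16(v):
    """Sign-extend a 16-bit immediate."""
    return (v & 0x7FFF) - (v & 0x8000)

def recognize_switch(insns, jmp_idx):
    """Walk backward from 'jmp rX' collecting parts of a jump-table switch.

    Assumed scheme, in program order:
        movi  rN, count           ; number of cases
        bgeu  rIdx, rN, default   ; out-of-range index -> default
        slli  rS, rIdx, 2         ; index * 4
        movhi rH, hi              ; high half of the table address
        add   rT, rS, rH          ; sources may come in either order
        ldw   rX, lo(rT)          ; load the case address
        jmp   rX
    """
    mnem, ops = insns[jmp_idx]
    if mnem != "jmp":
        return None
    sw = {}
    pending = {ops[0]: "target"}  # register -> role still to be explained
    for i in range(jmp_idx - 1, -1, -1):
        mnem, ops = insns[i]
        if not ops:
            continue
        role = pending.get(ops[0])
        if role == "target" and mnem == "ldw":
            del pending[ops[0]]
            sw["lo"] = ops[1]
            pending[ops[2]] = "sum"
        elif role == "sum" and mnem == "add":
            del pending[ops[0]]
            pending[ops[1]] = pending[ops[2]] = "part"
        elif role == "part" and mnem == "movhi":
            del pending[ops[0]]
            sw["hi"] = ops[1]
        elif role == "part" and mnem == "slli" and ops[2] == 2:
            del pending[ops[0]]
            sw["index"] = ops[1]
        elif mnem == "bgeu" and ops[0] == sw.get("index"):
            sw["default"] = ops[2]
            pending[ops[1]] = "count"
        elif role == "count" and mnem == "movi":
            del pending[ops[0]]
            sw["ncases"] = ops[1]
        elif role is not None:
            return None  # a tracked register is redefined off-scheme
        if not pending and "ncases" in sw:
            break
    if "hi" not in sw or "lo" not in sw:
        return None
    sw["table"] = ((sw["hi"] << 16) + sext16(sw["lo"])) & 0xFFFFFFFF
    return sw
```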

The custom instruction

NIOS II has an interesting instruction called custom. It gives access to up to 256 user-defined instructions. In addition to the general-purpose registers, the custom instruction can access a special set of 32 custom registers. After implementing logic for parsing the custom command, here is what we see:

Note that the two final instructions have the same instruction number and seem to perform the same actions.

The custom instruction is the subject of a separate manual. According to the manual, one of the most complete and modern custom instruction sets is the NIOS II Floating Point Hardware 2 Component (FPH2) set of instructions for floating-point computations. This is how our example looks after implementing FPH2 command parsing:

Judging by the mnemonics of the last two commands, they do indeed perform the same operation (fadds).
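Parsing custom can be sketched as follows. The field positions are our reading of the NIOS II custom-instruction encoding: OP 0x32, N in bits 13..6 (hence 256 instructions), then the writerc, readrb, and readra flags in bits 14, 15, and 16, which select a general-purpose or a custom register for C, B, and A. The `names` table is where FPH2 mnemonics would go. Their N values must be taken from the FPH2 documentation and are deliberately not filled in here.

```python
CUSTOM_OP = 0x32

def reg_name(num, is_gpr):
    """General-purpose registers print as rN, custom registers as cN."""
    return ("r%d" if is_gpr else "c%d") % num

def decode_custom(word, names=None):
    """Return (mnemonic, [C, A, B]) for a custom instruction, else None."""
    if word & 0x3F != CUSTOM_OP:
        return None
    n = (word >> 6) & 0xFF
    writerc = (word >> 14) & 1
    readrb = (word >> 15) & 1
    readra = (word >> 16) & 1
    c, b, a = (word >> 17) & 0x1F, (word >> 22) & 0x1F, (word >> 27) & 0x1F
    ops = [reg_name(c, writerc), reg_name(a, readra), reg_name(b, readrb)]
    mnem = (names or {}).get(n, "custom %d" % n)
    return mnem, ops
```

With a filled-in table, two words carrying the same N decode to the same mnemonic, which is exactly what exposed the duplicate fadds above.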

Jumping by register value

In firmware, we often see situations when a 32-bit offset (setting the jump location) is moved to a register and a jump is performed based on the register value.

Have a look at the code:

In the last line, there is a jump by register value. Before it, the address of the procedure (the one starting in the first line of the example) is moved to the register. The jump clearly is to the beginning of the procedure.

This is the result after adding functionality for jump recognition:

Next to the jmp r8 command is the address to which the jump is being made, if we were able to determine it. A cross-reference is also generated between the command and the address of the jump destination. The cross-reference is visible in the first line, while the jump itself occurs in the final line.
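Resolving such a jump amounts to simple constant tracking over straight-line code. The sketch below assumes decoded tuples and follows movhi plus addi/ori. It conservatively assumes that any other instruction overwrites its first register operand, which over-approximates for stores but never yields a wrong target.

```python
def sext16(v):
    """Sign-extend a 16-bit immediate."""
    return (v & 0x7FFF) - (v & 0x8000)

def resolve_register_jumps(insns):
    """Return {index: target} for each 'jmp rX' whose value is known."""
    known = {}    # register -> constant value tracked so far
    targets = {}
    for i, (mnem, ops) in enumerate(insns):
        if mnem == "movhi":
            known[ops[0]] = (ops[1] & 0xFFFF) << 16
        elif mnem in ("addi", "ori") and ops[1] in known:
            base = known[ops[1]]
            if mnem == "addi":
                known[ops[0]] = (base + sext16(ops[2])) & 0xFFFFFFFF
            else:
                known[ops[0]] = base | (ops[2] & 0xFFFF)
        elif mnem == "jmp":
            if ops[0] in known:
                targets[i] = known[ops[0]]
        elif ops and isinstance(ops[0], str):
            known.pop(ops[0], None)  # conservatively treat as clobbered
    return targets
```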

gp (global pointer) values: saving and loading

It is common to use a global pointer set to an address and then address variables relative to that pointer. In NIOS II, the gp register is used to store the global pointer. At a certain moment (most often during the firmware initialization procedures), an address value is moved to the gp register. The processor module handles this situation. To illustrate this, we have given examples of code and output from IDA Pro with debug messages enabled for the processor module.

In this case, the processor module finds and calculates the value of the gp register in the new database. When the IDB is closed, the gp value is saved in it.
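The find-then-persist logic can be sketched like this. `find_gp` looks for a movhi/addi (or ori) pair writing gp. `store` is a plain dict standing in for the database. A real module would persist the value in the IDB (for example in a netnode), which this sketch does not attempt.

```python
def sext16(v):
    """Sign-extend a 16-bit immediate."""
    return (v & 0x7FFF) - (v & 0x8000)

def find_gp(insns):
    """Return the value loaded into gp by a movhi/addi-or-ori pair, or None."""
    hi = None
    for mnem, ops in insns:
        if mnem == "movhi" and ops[0] == "gp":
            hi = (ops[1] & 0xFFFF) << 16
        elif hi is not None and mnem in ("addi", "ori") and ops[:2] == ["gp", "gp"]:
            lo = ops[2] & 0xFFFF
            if mnem == "addi":
                return (hi + sext16(lo)) & 0xFFFFFFFF
            return hi | lo
    return None

def load_or_find_gp(store, insns):
    """Reuse a saved gp value if present; otherwise search and save it."""
    if "gp" not in store:
        gp = find_gp(insns)
        if gp is not None:
            store["gp"] = gp
    return store.get("gp")
```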

When an existing IDB is loaded and the gp value has already been found, the value is loaded from the database, as shown by the debug message in the following example:

Reading and writing relative to gp

Reading and writing with an offset relative to gp is a common occurrence. The following example includes three reads and one write of this type:

Since we already know the address value stored in the gp register, we can resolve the addresses of these reads and writes.

Handling of gp-relative reading and writing makes things more convenient for us.

We can see which variables are being accessed, track their use, and determine their purpose.
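Once gp is known, each gp-relative load or store can be turned into a data reference. The sketch distinguishes reads from writes so that the references can be typed accordingly. The tuple layout is an assumption of the sketch.

```python
LOADS = {"ldb", "ldbu", "ldh", "ldhu", "ldw"}
STORES = {"stb", "sth", "stw"}

def sext16(v):
    """Sign-extend a 16-bit displacement (raw or already-signed)."""
    return (v & 0x7FFF) - (v & 0x8000)

def gp_data_refs(insns, gp):
    """Return (index, address, 'read'|'write') for each gp-relative access."""
    refs = []
    for i, (mnem, ops) in enumerate(insns):
        if (mnem in LOADS or mnem in STORES) and len(ops) == 3 and ops[2] == "gp":
            kind = "read" if mnem in LOADS else "write"
            refs.append((i, (gp + sext16(ops[1])) & 0xFFFFFFFF, kind))
    return refs
```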

Addressing relative to gp

The gp register can also be used for addressing variables.

For example, here we see that registers are set relative to gp to certain variables or data regions:

Once we drop in functionality to handle this situation by converting to offsets and adding cross-references, here is the result:

Now it is clear what is happening and we can identify the regions to which registers are being set relative to gp.

Addressing relative to sp

Similarly, registers in the next examples are set to certain memory regions, but this time relative to the stack pointer (sp).

As we can see, the registers are set to certain local variables. Setting arguments to local buffers before procedure calls is fairly common.

After adding handling for these situations (by converting the values to offsets), we obtain the following:

Now it is clear that after the procedure call, the values are loaded from the variables whose addresses were passed in parameters prior to the function call.
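Converting `addi rX, sp, imm` into a reference to a stack variable needs the current sp delta, so that the offset can be expressed relative to sp at function entry. The minimal sketch below uses the same toy model as the stack example, in which only `addi sp, sp, imm` moves sp.

```python
def sp_address_args(insns):
    """Return (index, register, frame offset) for each 'addi rX, sp, imm'."""
    delta = 0
    found = []
    for i, (mnem, ops) in enumerate(insns):
        if mnem == "addi" and ops[0] == "sp" and ops[1] == "sp":
            delta += ops[2]
        elif mnem == "addi" and ops[1] == "sp":
            # rX now points at a local: offset from sp at function entry.
            found.append((i, ops[0], delta + ops[2]))
    return found
```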

Cross-references from code to structure fields

Setting and using structures in IDA Pro can make code analysis more efficient.

We can see from the code that the field_8 field is incremented and perhaps used as a counter for triggering an event. If the reads and writes of a field are far apart in the code, cross-references can be useful.

Let's look at the structure:

Structure fields are accessed, but cross-references from the code to structure elements were not formed.

After these situations are handled, this is how it will all look in our case:

Now there are cross-references to structure fields from the specific commands that access them, in both directions. This lets us see, across different procedures, where the field values are read or written.
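The mapping from a load or store to a structure field can be sketched given two assumed inputs: which register currently holds a pointer to which structure (`reg_types`) and each structure's layout as an offset-to-field map (`layouts`). In IDA these would come from operand types and the structure definitions. Here they are plain dictionaries.

```python
LOADS = {"ldb", "ldbu", "ldh", "ldhu", "ldw"}
STORES = {"stb", "sth", "stw"}

def struct_field_refs(insns, reg_types, layouts):
    """Return (index, 'struct.field', 'read'|'write') for typed base accesses."""
    refs = []
    for i, (mnem, ops) in enumerate(insns):
        if (mnem in LOADS or mnem in STORES) and len(ops) == 3:
            sname = reg_types.get(ops[2])
            if sname is None:
                continue
            field = layouts[sname].get(ops[1])
            if field is not None:
                kind = "read" if mnem in LOADS else "write"
                refs.append((i, sname + "." + field, kind))
    return refs
```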

Where the manual and reality diverge

According to the manual, during decoding of some commands, certain bits are supposed to take only strictly defined values. For example, for the eret command for returning from an exception, bits 22–26 should equal 0x1E.

Here is a command example from firmware:

When we open different firmware in a place with similar context, something different happens:

These bytes were not automatically converted into a command, even though the module otherwise handles all the commands here. Judging by the context (and even the similar address), this should be the same command. But take a close look at the bytes: this is the eret command, except that bits 22–26 equal zero instead of 0x1E.

So we have to slightly tweak the parsing results for the command: although it doesn't exactly correspond to the manual anymore, it does match the reality.
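The tweak can be sketched as accepting both encodings. The strict eret word has A=0x1D, B=0x1E, C=0, OPX=0x01, and OP=0x3A. The firmware variant differs only in the B field (bits 22–26), which is zero. The `tolerant` flag is our own illustrative switch.

```python
ERET_STRICT = 0xEF80083A     # A=0x1D, B=0x1E, C=0, OPX=0x01, OP=0x3A
B_FIELD_MASK = 0x1F << 22    # bits 22-26

def is_eret(word, tolerant=True):
    """Match eret per the manual, optionally also with the B field cleared."""
    if word == ERET_STRICT:
        return True
    # Variant seen in firmware: identical except bits 22-26 are zero.
    return tolerant and word == (ERET_STRICT & ~B_FIELD_MASK)
```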

IDA 7 support

The API provided by IDA Python for ordinary scripts has changed considerably as of IDA version 7.0. For processor modules, the changes are massive. Nonetheless, we succeeded in reworking the NIOS II processor module for version 7.

There is one strange thing: when a new binary file for NIOS II is loaded in IDA 7, analysis does not start automatically, unlike in IDA 6.9.


The SDK contains examples in which a processor module, besides having basic disassembler functionality, supports numerous features that make it easier to pick apart code. Certainly this all could be done by hand, but say that you have a binary file with megabytes of firmware containing tens of thousands of offsets of various types—why waste so much time if there is a more efficient way? A well-implemented processor module can perform this task instead. And cruising through code with the help of cross-references can be downright fun! With these abilities, IDA remains the convenient and helpful tool beloved by so many.

Author: Anton Dorfman, Positive Technologies
