Instruction selection with Selection DAG

Once the frontend generates the LLVM IR, the compiler must convert it into machine instructions.

We will take a look at how LLVM uses a tree pattern-matching algorithm for this and how a backend like Nova can configure LLVM to generate machine instructions.

Instruction Selection on DAGs

Instruction selection is difficult because a processor provides different ways to perform the same operation. If we had exactly one way to perform a certain operation, the compiler would simply replace each IR line with the corresponding machine instruction.

But rarely are things so simple. A simple reg to reg move instruction can be performed with addition, division, subtraction, multiplication, or a bitwise operation (with the other operand being the corresponding identity element).

LLVM uses a tree pattern-matching structure called a Selection DAG to do this job.

A simple example

A Selection DAG is a directed acyclic graph (DAG) where each node represents an operation or a value. In particular, the leaves of the DAG are constants or registers (virtual or physical) and other nodes represent operations that return their result (which can be multiple).

The selection process works on one basic block at a time. So every basic block is converted into a separate DAG, with no edges to the other basic blocks.

Branch operation nodes take a basic block operand as an input, and the basic block is represented as a node in the DAG.

Let’s take this simple example.

The compiler takes this code as input:

int main(int a, int b) {
  if (a + b == 0) {
    return a - b;
  }
  return 0;
}

The LLVM IR for this code is this (consider no optimizations):

define i32 @main(i32 %a, i32 %b) {
entry:
  %add = add i32 %a, %b
  %cmp = icmp eq i32 %add, 0
  br i1 %cmp, label %true, label %false
true:
  %sub = sub i32 %a, %b
  ret i32 %sub
false:
  ret i32 0
}

This is now converted into a DAG like so:
Notice that there are two disconnected components in this DAG. The unconditional branch node br does not use any value from other nodes, so there is nothing to connect it to.

Other basic blocks are not shown here.
After constructing the DAG, we need to lower it to machine instructions. In this example, the brcond and setcc is combined into a single branch with compare instruction.

This completes the lowering phase of the selection. From here we schedule the instructions into a linear sequence.

SelectionDAG in LLVM

You might notice a problem with the above DAG. The unconditional branch is not connected to the other nodes, so there is no way to know when to schedule it. (we certainly should not emit it before all other instructions!)

To represent control flow dependencies like this, nodes return a special type of value called a chain. Further, if two nodes should be scheduled adjacent to each other, they use a Glue value.

Chains represent a dependency between nodes that can’t be represented by a data dependency. For example a load following a store that might alias with the address of the load. The store must happen before the load. So the load’s chain input is dependent on the store’s chain output either directly or through other intermediate nodes that also have chain inputs and outputs. There can be multiple chains in parallel in the DAG. TokenFactor nodes are used to merge separate chains. The InstrEmitter ensures that the chain dependency is satisfied when emitting the linear instruction sequence after isel. But nothing guarantees that parallel chains won’t be interleaved. After a node is schedule all of the nodes dependent on it either through data or chain are checked to see if they are now ready to schedule. The scheduler will pick from the ready to schedule nodes without any concern for whether they were on the same chain as the last node scheduled.

Glue is stricter, it says that two nodes must be scheduled adjacent to each other in the linear instruction sequence.

Taken from: Glue and chains

Viewing the selection DAG output

While compiling with llc, you can use these options to view the DAG output:

-view-dag-combine1-dags displays the DAG after being built, before the first optimization pass.
-view-legalize-dags displays the DAG before Legalization.
-view-dag-combine2-dags displays the DAG before the second optimization pass.
-view-isel-dags displays the DAG before the Select phase.
-view-sched-dags displays the DAG before Scheduling.

Taken from: Viewing the Selection DAG

Selection stages

Read about the stages of selection in the LLVM documentation and the post linked below.

Instruction Selection - Source Code Artisan

Selection DAG in LLVM

Classes used

Let’s look at the classes in LLVM that represent the selection DAG.

MVT and EVT

These two represent the types in LLVM.

1. MVT

This is a union of all types that targets support in Selection DAG.

From the documentation

Machine Value Type. Every type that is supported natively by some processor targeted by LLVM occurs here. This means that any legal value type can be represented by an MVT.

It is basically a single uint16_t value that is used to represent the type. We can see all types listed in ValueTypes.td. We can query the properties of the type using the MVT class methods.

The types included in MVT are integers like i1, i32, floating point types like f32, bf16, vectors like v1i1, v1f16 and others like Glue and Other. Some of the types included in this are pasted here for reference:

}

def i1      : VTInt<1,   2>;  // One bit boolean value
def i2      : VTInt<2,   3>;  // 2-bit integer value
13 collapsed lines
def i4      : VTInt<4,   4>;  // 4-bit integer value
def i8      : VTInt<8,   5>;  // 8-bit integer value
def i16     : VTInt<16,  6>;  // 16-bit integer value
def i32     : VTInt<32,  7>;  // 32-bit integer value
def i64     : VTInt<64,  8>;  // 64-bit integer value
def i128    : VTInt<128, 9>;  // 128-bit integer value

def bf16    : VTFP<16,  10>;  // 16-bit brain floating point value
def f16     : VTFP<16,  11>;  // 16-bit floating point value
def f32     : VTFP<32,  12>;  // 32-bit floating point value
def f64     : VTFP<64,  13>;  // 64-bit floating point value
def f80     : VTFP<80,  14>;  // 80-bit floating point value
def f128    : VTFP<128, 15>;  // 128-bit floating point value
def ppcf128 : VTFP<128, 16>;  // PPC 128-bit floating point value

def v1i1    : VTVec<1,    i1, 17>;  //    1 x i1 vector value

2. EVT

This extends the types include in the MVT set with types that the LLVM IR supports, like i3 or <4 x i5> but may not be supported directly by a target.

From the documentation

Extended Value Type. Capable of holding value types which are not native for any processor (such as the i12345 type), as well as the types an MVT can represent.

An EVT object holds an MVT value and an LLVM Type object.

We can initialize an EVT object in two ways shown below. Note that they can be implicitly casted from MVT.

public:
  constexpr EVT() = default;
  constexpr EVT(MVT::SimpleValueType SVT) : V(SVT) {}
  constexpr EVT(MVT S) : V(S) {}
  bool operator==(EVT VT) const { return !(*this != VT); }
  bool operator!=(EVT VT) const {

/// for any processor (such as the i12345 type), as well as the types an MVT
/// can represent.
struct EVT {
private:
  MVT V = MVT::INVALID_SIMPLE_VALUE_TYPE;
  Type *LLVMTy = nullptr;
public:
  constexpr EVT() = default;

We then have a number of methods to query the type. Some of them are:

isSimple() - Returns true if the type is a simple type (i.e. not a vector or complex type). Here “extended” means that this is not an MVT type but an LLVMTy.
isInteger() - Returns true if the type is an integer type. and so on.

Viewing the DAG

Run llc with -debug-only=isel to see the selection DAG.

Initial selection DAG: %bb.1 'main:true'
SelectionDAG has 8 nodes:
  t0: ch,glue = EntryToken
      t2: i32,ch = CopyFromReg t0, Register:i32 %3
      t4: i32,ch = CopyFromReg t0, Register:i32 %4
    t5: i32 = sub t2, t4
  t7: ch = CopyToReg t0, Register:i32 %1, t5

SDNode

These are the nodes in the selection DAG.

Each node has an opcode, some flags, a list of operands and a list of result values.

Opcodes

Opcodes are represented by an int32_t value, but is taken from the ISD enum or a target-specific XXXISD enum.

We can query for the SDNode opcode with these methods:

  //  Accessors
  //
  /// Return the SelectionDAG opcode value for this node. For
  /// pre-isel nodes (those for which isMachineOpcode returns false), these
  /// are the opcode values in the ISD and <target>ISD namespaces. For
  /// post-isel opcodes, see getMachineOpcode.
  unsigned getOpcode()  const { return (unsigned)NodeType; }

  /// Test if this node has a target-specific opcode (in the
  /// \<target\>ISD namespace).
  bool isTargetOpcode() const { return NodeType >= ISD::BUILTIN_OP_END; }
  /// Return true if the type of the node type undefined.
  bool isUndef() const { return NodeType == ISD::UNDEF; }

You can see all of the built-in opcodes in ISDOpcodes.h.