Instruction selection with Selection DAG
Once the frontend generates the LLVM IR, the compiler must convert it into machine instructions.
We will take a look at how LLVM uses a tree pattern-matching algorithm for this and how a backend like Nova can configure LLVM to generate machine instructions.
Instruction Selection on DAGs
Instruction selection is difficult because a processor provides different ways to perform the same operation. If we had exactly one way to perform a certain operation, the compiler would simply replace each IR line with the corresponding machine instruction.
But rarely are things so simple. A simple reg to reg move instruction can be performed with addition, division, subtraction, multiplication, or a bitwise operation (with the other operand being the corresponding identity element).
LLVM uses a tree pattern-matching structure called a Selection DAG to do this job.
A simple example
A Selection DAG is a directed acyclic graph (DAG) where each node represents an operation or a value. In particular, the leaves of the DAG are constants or registers (virtual or physical) and other nodes represent operations that return their result (which can be multiple).
The selection process works on one basic block at a time. So every basic block is converted into a separate DAG, with no edges to the other basic blocks.
Branch operation nodes take a basic block operand as an input, and the basic block is represented as a node in the DAG.
Let’s take this simple example.
The compiler takes this code as input:
int main(int a, int b) {if (a + b == 0) {return a - b;}return 0;}The LLVM IR for this code is this (consider no optimizations):
define i32 @main(i32 %a, i32 %b) {entry:%add = add i32 %a, %b%cmp = icmp eq i32 %add, 0br i1 %cmp, label %true, label %falsetrue:%sub = sub i32 %a, %bret i32 %subfalse:ret i32 0}This is now converted into a DAG like so:
Notice that there are two disconnected components in this DAG. The unconditional branch node
br
does not use any value from other nodes, so there is nothing to connect it to.Other basic blocks are not shown here.
After constructing the DAG, we need to lower it to machine instructions. In this example, the brcond and setcc is combined into a single branch with compare instruction.
This completes the lowering phase of the selection. From here we schedule the instructions into a linear sequence.
SelectionDAG in LLVM
You might notice a problem with the above DAG. The unconditional branch is not connected to the other nodes, so there is no way to know when to schedule it. (we certainly should not emit it before all other instructions!)
To represent control flow dependencies like this, nodes return a special type of value called a chain. Further, if two nodes should be scheduled adjacent to each other, they use a Glue value.
Chains represent a dependency between nodes that can’t be represented by a data dependency. For example a load following a store that might alias with the address of the load. The store must happen before the load. So the load’s chain input is dependent on the store’s chain output either directly or through other intermediate nodes that also have chain inputs and outputs. There can be multiple chains in parallel in the DAG. TokenFactor nodes are used to merge separate chains. The InstrEmitter ensures that the chain dependency is satisfied when emitting the linear instruction sequence after isel. But nothing guarantees that parallel chains won’t be interleaved. After a node is schedule all of the nodes dependent on it either through data or chain are checked to see if they are now ready to schedule. The scheduler will pick from the ready to schedule nodes without any concern for whether they were on the same chain as the last node scheduled.
Glue is stricter, it says that two nodes must be scheduled adjacent to each other in the linear instruction sequence.
Viewing the selection DAG output
While compiling with llc
, you can use these options to view the DAG output:
-view-dag-combine1-dags
displays the DAG after being built, before the first optimization pass.-view-legalize-dags
displays the DAG before Legalization.-view-dag-combine2-dags
displays the DAG before the second optimization pass.-view-isel-dags
displays the DAG before the Select phase.-view-sched-dags
displays the DAG before Scheduling.
Selection stages
Read about the stages of selection in the LLVM documentation and the post linked below.
Selection DAG in LLVM
Classes used
Let’s look at the classes in LLVM that represent the selection DAG.
MVT and EVT
These two represent the types in LLVM.
1. MVT
This is a union of all types that targets support in Selection DAG.
From the documentation
Machine Value Type. Every type that is supported natively by some processor targeted by LLVM occurs here. This means that any legal value type can be represented by an MVT.
It is basically a single uint16_t
value that is used to represent the type.
We can see all types listed in ValueTypes.td
.
We can query the properties of the type using the MVT
class methods.
The types included in MVT
are integers like i1, i32, floating point types like f32, bf16,
vectors like v1i1, v1f16 and others like Glue
and Other
.
Some of the types included in this are pasted here for reference:
}
def i1 : VTInt<1, 2>; // One bit boolean valuedef i2 : VTInt<2, 3>; // 2-bit integer value13 collapsed lines
def i4 : VTInt<4, 4>; // 4-bit integer valuedef i8 : VTInt<8, 5>; // 8-bit integer valuedef i16 : VTInt<16, 6>; // 16-bit integer valuedef i32 : VTInt<32, 7>; // 32-bit integer valuedef i64 : VTInt<64, 8>; // 64-bit integer valuedef i128 : VTInt<128, 9>; // 128-bit integer value
def bf16 : VTFP<16, 10>; // 16-bit brain floating point valuedef f16 : VTFP<16, 11>; // 16-bit floating point valuedef f32 : VTFP<32, 12>; // 32-bit floating point valuedef f64 : VTFP<64, 13>; // 64-bit floating point valuedef f80 : VTFP<80, 14>; // 80-bit floating point valuedef f128 : VTFP<128, 15>; // 128-bit floating point valuedef ppcf128 : VTFP<128, 16>; // PPC 128-bit floating point value
def v1i1 : VTVec<1, i1, 17>; // 1 x i1 vector value
2. EVT
This extends the types include in the MVT
set with types that the LLVM IR supports, like i3
or <4 x i5>
but may not be supported directly by a target.
From the documentation
Extended Value Type. Capable of holding value types which are not native for any processor (such as the i12345 type), as well as the types an MVT can represent.
An EVT object holds an MVT
value and an LLVM Type
object.
We can initialize an EVT
object in two ways shown below. Note that they can be implicitly casted from MVT
.
public: constexpr EVT() = default; constexpr EVT(MVT::SimpleValueType SVT) : V(SVT) {} constexpr EVT(MVT S) : V(S) {} bool operator==(EVT VT) const { return !(*this != VT); } bool operator!=(EVT VT) const {
/// for any processor (such as the i12345 type), as well as the types an MVT/// can represent.struct EVT {private: MVT V = MVT::INVALID_SIMPLE_VALUE_TYPE; Type *LLVMTy = nullptr;public: constexpr EVT() = default;
We then have a number of methods to query the type. Some of them are:
isSimple()
- Returns true if the type is a simple type (i.e. not a vector or complex type). Here “extended” means that this is not an MVT type but an LLVMTy.isInteger()
- Returns true if the type is an integer type. and so on.
Viewing the DAG
Run llc
with -debug-only=isel
to see the selection DAG.
Initial selection DAG: %bb.1 'main:true'SelectionDAG has 8 nodes: t0: ch,glue = EntryToken t2: i32,ch = CopyFromReg t0, Register:i32 %3 t4: i32,ch = CopyFromReg t0, Register:i32 %4 t5: i32 = sub t2, t4 t7: ch = CopyToReg t0, Register:i32 %1, t5
SDNode
These are the nodes in the selection DAG.
Each node has an opcode, some flags, a list of operands and a list of result values.
Opcodes
Opcodes are represented by an int32_t value, but is taken from the ISD
enum or
a target-specific XXXISD
enum.
We can query for the SDNode opcode with these methods:
// Accessors // /// Return the SelectionDAG opcode value for this node. For /// pre-isel nodes (those for which isMachineOpcode returns false), these /// are the opcode values in the ISD and <target>ISD namespaces. For /// post-isel opcodes, see getMachineOpcode. unsigned getOpcode() const { return (unsigned)NodeType; }
/// Test if this node has a target-specific opcode (in the /// \<target\>ISD namespace). bool isTargetOpcode() const { return NodeType >= ISD::BUILTIN_OP_END; } /// Return true if the type of the node type undefined. bool isUndef() const { return NodeType == ISD::UNDEF; }
You can see all of the built-in opcodes in ISDOpcodes.h
.