TableGen for instruction selection

We have looked at how the instruction selection works. Here, we will see how you can select LLVM IR to your target’s instruction set.

TableGen

TableGen does a lot of heavy-lifting for us in the instruction selection process. Refer to the TableGen documentation for how to use TableGen.

A TableGen introduction is coming soon to this site.

Essentially, we specify instructions and rules on how to match SDNodes to our instructions by using TableGen records. Then, the instruction selection TableGen backend generates the necessary code to actually perform the selection.

This page can be updated

This page is a work in progress. The intention is to provide a comprehensive documentation on how to use TableGen for instruction selection.

The documentation needs some commentary from more experienced developers to be more informative. If you happen to be one of them, please feel free to contribute.

Registers

Before we can write any instruction, we must define some registers for our architecture.

Registers are quite simple, so refer directly to the Adding Registers chapter for how to define them.

Register Classes

The important part in defining the register classes is the supported types of the class. For example, MIPS defines the GPR32 class for register supporting i32 types:

def GPR32 : GPR32Class<[i32]>;

AMDGPU defines the VGPR class for registers supporting v4i32 types:

// VGPR 32-bit registers
// i16/f16 only on VI+
def VGPR_32 : SIRegisterClass<"AMDGPU", !listconcat(Reg32Types.types, Reg16Types.types), 32,
                            (add (sequence "VGPR%u", 0, 255))> {
  let AllocationPriority = 0;
  let Size = 32;
  let Weight = 1;
  let BaseClassOrder = 32;
}
...
// Types are defined as a list:
def Reg16Types : RegisterTypes<[i16, f16, bf16]>;
def Reg32Types : RegisterTypes<[i32, f32, v2i16, v2f16, v2bf16, p2, p3, p5, p6]>;

Defining Instructions

Instructions refer to the MachineInstr class, which holds a MCInstrDesc object and are generated from records that derive from the TableGen Instruction class.

The MCInstrDesc class describes the properties of an instruction, like:

Opcode
Arguments
Arguments that are definitions
Encoding size
Implicit uses and defs
Target-independent flags
Target-specific flags

Instances of this class are created by the InstrInfo TableGen backend. The Opcodes are generated in the target’s namespace and are the same name as the records defined for each instruction.

Types in TableGen

Types come from the ValueTypes.td file. Whenever a valuetype is used here, it refers to records that subclass the ValueType class. Here are some examples:

Integers: i1, i2, ... i128
Floating point: bf16, f16, f32... f128, ppcf128
Vectors: v1i1, ... v2048i1, v64i4, ...
ScalableVectors: nxv1i1 etc

Some special types are:

Glue - used for glue operands
OtherVT - the “Other” value, used as the type of return values of nodes of opcodes like ISD::BasicBlock or ISD::Br.

Find all of them in the llvm/include/llvm/CodeGen/ValueTypes.td file.

SD Nodes in TableGen

To describe patterns, we need to specify the SDNodes that we want to match. An SDNode is defined by its type profile, opcode and node properties.

Syntax:

def SDNodeName : SDNode<"XXXISD::OpcodeName", {type profile record}, [{node properties}...], {sdnode C++ class}>;

Type Profile

Instances of this class are used to describe the types of the operands and results of the SDNode.

Type constraints

This uses a class called SDTypeConstraint to describe the input and output types of the SDNode. TableGen has hardcoded a list of subclasses of SDTypeConstraint (like SDTisFP) that are documented in the TargetSelectionDAG.td file.

class SDTypeConstraint<int opnum>;

OpNum argument

The argument int OpNum is the operand number of the SDNode that we want to apply this to.

The 0th operand is the return value of the SDNode’s, and the operands of the SDNode are numbered from 1 to N.

To use these classes, we create anonymous records in the type profile.

Creating a type profile

To create a type profile, define a record that subclasses the SDTypeProfile class.

Here is an example:

// SDTCisFP - The specified operand has floating-point type.
class SDTCisFP<int OpNum> : SDTypeConstraint<OpNum>;

// Use it by creating an anonymous record in the type profile like: SDTCisFP<0>
// Create a floating point binary operation in SDAG:

def SDFloatingBinaryOp :
             SDTypeProfile<
              1, // the number of result values
              2, // the number of operands
              [ SDTCisSameAs<0, 1>,
                SDTCisSameAs<0, 2>, // all operands and result are same
                SDTCisFP<0> // all are floating point
              ]>;

There are loads of pre-defined type profiles in TargetSelectionDAG.td that you can use.

SDNode Properties

The properties of the SDNode are defined in SDNodeProperties.td file.

Show all properties

  list<SDNodeProperty> Properties = [];
}
//===----------------------------------------------------------------------===//
// Selection DAG Node Properties.
//
// Note: These are hard coded into tblgen.
//
def SDNPCommutative : SDNodeProperty;   // X op Y == Y op X
def SDNPAssociative : SDNodeProperty;   // (X op Y) op Z == X op (Y op Z)
def SDNPHasChain    : SDNodeProperty;   // R/W chain operand and result
def SDNPOutGlue     : SDNodeProperty;   // Write a flag result
def SDNPInGlue      : SDNodeProperty;   // Read a flag operand
def SDNPOptInGlue   : SDNodeProperty;   // Optionally read a flag operand
def SDNPMayStore    : SDNodeProperty;   // May write to memory, sets 'mayStore'.
def SDNPMayLoad     : SDNodeProperty;   // May read memory, sets 'mayLoad'.
def SDNPSideEffect  : SDNodeProperty;   // Sets 'HasUnmodelledSideEffects'.
def SDNPMemOperand  : SDNodeProperty;   // Touches memory, has assoc MemOperand
def SDNPVariadic    : SDNodeProperty;   // Node has variable arguments.

SD Nodes

Targets can create their own SDNodes to match with this TableGen class.

MIPS uses an SDNode with the opcode MIPSISD::Ret to lower return instructions from LLVM IR to SDAG. This gets selected into a RetRA machine pseudo instruction.

class SDNode<string opcode, SDTypeProfile typeprof,
             list<SDNodeProperty> props = [], string sdclass = "SDNode">;

Here is the example from MIPSInstrInfo.td:

// Return
def MipsRet : SDNode<"MipsISD::Ret", SDTNone,
                     [SDNPHasChain, SDNPOptInGlue, SDNPVariadic]>;

// This is used in the RetRA instruction's inline pattern:
def RetRA : PseudoSE<(outs), (ins), [(MipsRet)]>;

Patterns

Patterns in TableGen are used to match the SDNodes to the instructions.

You can jump to the Patterns Semantics section to just see the semantics.

Before we get into the details, here is a simple example of a pattern:

def : Pat<(add GPR32:$gp, (MipsGPRel tglobaladdr:$in)),
                (ADDiu GPR32:$gp, tglobaladdr:$in)>;
// where:
// add = builtin SDNode for the add operation
// GPR32 = register class
// MipsGPRel = SDNode for the global address
// tglobaladdr = builtin SDNode for a global address
// ADDiu = The target's instruction for the add immediate unsigned operation

We match the add SDNode with the ADDiu instruction where one of the operands is a global address.

Syntax

To match a SDAG pattern, we use the Pat class. The following defines a single pattern:

Signature:
class Pattern<dag patternToMatch, list<dag> resultInstrs>;

Pat

Usually we use the Pat class instead of Pattern directly. Targets create their own Pat class wrapper.

class MipsPat<dag pattern, dag result> : Pat<pattern, result>, PredicateControl;

Input pattern The LHS of the Pat is defined as the input pattern. (the patternToMatch in the above signature) The dag patternToMatch is the SDNode that we want to match in the SDAG.

Input patterns cannot contain instructions, so we cannot match instructions to .other instructions through these patterns.

Output pattern The RHS is the output pattern.

The list<dag> resultInstrs is the list of result SDAGs that we want to generate on matching the previous pattern. These cannot contain SDNodes (like add).

def : Pat<(input_pattern), (output_pattern)>;

The list<dag> resultInstrs is the list of result SDAGs that we want to generate on matching the previous pattern. This is called as the output pattern. These cannot contain SDNodes (like add).

Right now resultInstrs can only take a single dag in the list. This could change in the future (but is not likely).

Use the Pat class to create a pattern instead of using Pattern directly.

class Pat<dag patternToMatch, dag resultInstrs> : Pattern<patternToMatch, [resultInstrs]>;

Inline patterns

Patterns can also be defined inline, in the Instruction record. These are input patterns only (so we cannot use OutPatFrags).

Read in detail how inline patterns work

def ADD32 : XXXInstrFormat<(outs GPR32:$dst),
                           (ins GPR32:$src1, GPR32:$src2),
        /*inline pattern*/ (set GPR32:$dst, (add GPR32:$src1, GPR32:$src2))
                           >;

The special record set is used to match the output operand of the instruction.

Using patterns

Using patterns is quite simple. Use the Pat class to create a pattern. You don’t have to name the pattern record.

def : Pat<(add i32:$src1, i32:$src2)),
                (ADDiu i32:$src1, i32:$src2)>;

PatFrags

Pattern fragments are like macros for patterns. They can be used as an operator in other patterns or other pattern fragments. These get expanded at the use site into their corresponding output patterns.

def not_pat_frag : PatFrag<(ops node:$in), (xor node:$in, -1)>;

Syntax

def {pat_frag_name} : PatFrags<(ops { node:$nameN }*), [ {output patternN}* ]>

The ‘input’ of a pattern fragment is the (ops node, ...) pattern. The output pattern (xor node:$in, -1) is what gets inlined at the use of this pattern fragment.

`ops`, `ins`, `outs`, `node`, `set`

All these are “keywords” in the ISel TableGen language.

“keywords” mean just special records that have a hardcoded meaning in the TableGen backend.

ops, ins and outs are all equivalent nodes used only in input patterns for fragments and inline patterns in instructions.
- ins is mostly used for input operand pattern of an instruction and outs for the output.
- ops is mostly used in pattern fragments.
set is a special operator that can be only used in instruction’s inline patterns.
node can be only used in the input pattern of a pattern fragment.

def : Pat<(not_pat_frag i16:$in), (NotInstr i16:$in)>;
// not_pat_frag gets expanded to:
def : Pat<(xor i16:$in, -1), (NotInstr i16:$in)>;

The labels like $in cannot be used more than once in the same pattern fragment.

We can use more than one output for the pattern fragment to expand to.

def multiple_pat_frag : PatFrags<(ops node:$in), [(xor node:$in, -1), (add node:$in, 1)]>;

The use of this PatFrag will get expanded to match both the xor and add instructions to the same output instruction pattern.

def : Pat<(multiple_pat_frag i16:$in), (NotInstr i16:$in)>;
// xor with -1 and add with 1 will match to the NotInstr in the
// target-specific DAG.

OutPatFrags

The PatFrags we saw stand for input pattern macros. For output patterns, we need to use OutPatFrags.

The syntax is the same as PatFrags, but we cannot use SDNodeXForm or SDNode records as operands.

def out_pat_frag : OutPatFrag<
                          (ops node:$src1, node:$src2),
                          (ADD32 $src1, // ADD32 is an instruction
                                 (ADD32 GPR32, $src2)
                          )
                          >;
// We cannot use SDNodes like `add` or `xor` in the output pattern.
def invalid_pat_frag : OutPatFrag< // use PatFrag instead
                          (ops node:$src1, node:$src2),
                          (add $src1, $src2) // invalid
                          >;

These can be used in other pattern fragments also.

def : Pat<(fmul f32:$src1, f32:$src2),
                (out_pat_frag $src1, $src2)>;

ComplexPatterns

Complex patterns are used mainly for addressing modes where we want a C++ function to do the matching manually.

class ComplexPattern<ValueType ty, // the return type of emitted node by the select function
                     int numpos, // number of operands the select function takes
                     string fn, // name of the select function. This is a member
                                // of the ISelDAGToDAG class where we include the
                                // tablegen'erated file.
                     >;

These can be used as nodes in pattern or pattern fragments.

def : Pat<(i32 (load addrRegImm:$a),
               (LW addrRegImm:$a))>;

Using constants

Patterns can match or emit constants. The constants are represented as TargetConstant nodes if emitted, otherwise are just Constant nodes.

def : Pat<(store (i32 0), addr:$dst),
          (SW ZERO, addr:$dst)>;
// where:
// store = SDNode for the store operation. This is a builtin PatFrag.
// i32 = SDNode for the i32 type
// 0 = the constant value
// ZERO = the target's zero register (MIPS zero register)
// addr = a complex pattern for a MIPS addressing mode.

Type inference

The input pattern (in a Pat definition) must resolve to a single type.

This is done by intersecting the possible types of the nodes in the pattern DAG.

For example, the following pattern will not work because the types of the add operands can be i32 or i64:

// GPR32 register class supports [i32, i64] types
def : Pat<(add GPR32:$src1, GPR32:$src2),
                (ADDrr GPR32:$src1, GPR32:$src2)>;

To constrain the types, we use the type cast operator:

TypeCastOperator ::= TGTypes::List<recordOfType(ValueType)> // for intrinsics returning multiple values
                                                        // like ([i16, i32] (intrinsic_with_two_ret_values ...))
                    | recordOfType(ValueType) // single ret value type cast
                                             // like (i16 (op_ret_one_val ...))

Basically we just wrap the multi-typed operation in a type node.

// we need to cast to the type of the return
// value of the extloadi8 operation.
def : Pat<(i32 (extloadi8 addr:$src),
               (LBu addr:$src))>;

Since we need the intersection of all types to be singular, we can constrain one of the types to be a single type:

def : Pat<(add GPR32:$src1, i32:$src2),
                (ADDrr GPR32:$src1, GPR32:$src2)>;

GPR32 must contain i32 though, else it is an error.

Trying this in PatFrags seems to fail, so I am not sure if this is a universal rule. No one really does this sort of partly-constrained matching; it is the return type that is casted like shown above.

def : Pat<(brcond (i32 (cond_op RO:$rs, RO:$rt)), bb:$offset),
          (CBranch RO:$rs, RO:$rt, bb:$offset)>;

Pattern semantics reference

I have attempted to condense the allowed combinations of operations in patterns with this BNF-like grammar syntax. This is not exhaustive, but should be enough to serve as a quick reference.

using TGTypes = TableGenTypes; // only for readability,
// this virtual namespace refers to types in the TableGen language.
function recordOfType(TGTypes::Class className) -> TGTypes::Record;
// this grammar function is to represent an instance record of a
// tablgen class.

// values in single quotes are just characters or built-in records.

pattern ::=  '(' PatternOperator ')' // only operator, no arguments
           | '(' PatternOperator ArgList ')' // operator with arguments

attribute bool pattern.isInput; // these match the SDAG nodes
attribute bool pattern.isOutput; // these are what replace SDAG nodes.

PatternOperator ::= 'null_frag'
                    | TypeCastOperator
                    | SDNodeOperator
                    | recordOfType(
                            PatFrags
                            | Instruction if pattern.isOutput
                            | SDNodeXForm if pattern.isOutput
                            | Intrinsic if pattern.isInput
                            | ComplexPattern
                          )
                  | 'set' // can only be used in instruction inline patterns
                  | 'ops' | 'ins' | 'outs' // only for pattern fragment operators

SDNodeOperator ::= OutputSDNodeOperator if pattern.isOutput
                   | InputSDNodeOperator if pattern.isInput

OutputSDNodeOperator ::= 'imm' | 'timm' | 'fpimm' | 'tglobaltlsaddr'
                    | 'tglobaladdr' | 'tblockaddress' | 'texternalsym'
                    | 'tconstpool' | 'tjumptable' | 'tframeindex'
                    | 'bb' | 'vt' | 'mcsym'
                         // no other SDNodes' can be emitted in the DAG

InputSDNodeOperator ::= recordOfType(SDNode) // like (add imm:$immediate, ...)
                                            // any SDNode can be used here

TypeCastOperator ::= TGTypes::List<recordOfType(ValueType)>
                               // for intrinsics returning multiple values
                               // like ([i16, i32] (intrinsic_with_two_ret_values ...))
                    | recordOfType(ValueType) // single ret value type cast
                                             // like (i16 (op_ret_one_val ...))

ArgList ::= PatternArg ( ',' PatternArg )* // comma separated list of arguments

PatternArg ::= PatternArgVal ArgName? // argument can have an optional name
            | '$' name // no value
            | 'node' ArgName // special record to indicate an input argument
                             // for PatFrag operands only
            | instanceOfType(TGTypes::Int) // int value, to match or emit constants
            | instanceOfType(TGTypes::Bit) // converted to an int value
            | instanceOfType(TGTypes::Bits) ArgName?
                                      // Bits is also converted to an int value
                                      // and emitted as TargetConstant node.

ArgName ::= ':' '$' identifier

PatternArgVal ::= pattern // note: (add imm:$imm) is the same as (add (imm):$imm)
                | '?' // unset value
                | SDNodeOperator // like 'imm' in (add imm:$immediate, ...)
                                // this is wrapped into its own dag: imm:$name -> (imm):$name
                | recordOfType(PatFrags) // this is wrapped similarly