The Return Instruction

Instruction encodings

Since we are not going to generate object code yet, we do not need to write the instruction encodings.

To support instruction selection for the Nova target, we have to write some tablegen and C++ code. Here is what we have to do:

1. Write instructions in NovaInstrInfo.td. These should cover all instructions we want to support from the MIPS ISA.

   Normally you want to add instruction encodings in step 1, but we don’t need encodings for compiling to assembly. We just move to the next step.

2. Write matching patterns in NovaInstrPats.td. This is where we implement the instruction selection.
3. Write the NovaISelLowering.cpp file to lower LLVM code to the target-specific SelectionDAG nodes.
4. Write the NovaISelDAGToDAG.cpp file to implement the instruction selection.

You’ll know what each step means when we get to it.

Instruction Selection

LLVM uses a two-phase process to select instructions.

Most of the time, TableGen will generate the patterns for you.

Selection phase is: LLVM IR -> SelectionDAG —optimize—> SelectionDAG -> Target-specific SelectionDAG —optimize—> Target-specific SelectionDAG

Lowering phase is: Target-specific SelectionDAG —optimize, then lower—> Target-specific SelectionDAG -> MachineInstr —optimize—> MachineInstr

To define Nova’s instructions, we need to write entries for each instruction as a TableGen record that is an instance of the Instruction class. This is done in the NovaInstrInfo.td file.

Instruction formats

Instructions are encoded in a small number of formats built from fields such as rs, rt, rd, shamt, and funct, so we usually define these formats in the XXXInstrFormats.td file. Read the Target.td file (llvm/include/llvm/Target/Target.td) to see how the instruction formats are defined.

Let’s add our file NovaInstrFormats.td to the llvm/lib/Target/Nova/ directory.

>> new file

//===-- NovaInstrFormats.td - Nova Instruction Formats --------------------===//
// This file contains the instruction formats for the Nova architecture.
//===----------------------------------------------------------------------===//

Since we are not going to generate object code yet, we do not need to add the instruction encoding formats. We will just create a simple base class for Nova instructions.

//===----------------------------------------------------------------------===//

class NovaInst<dag outs, dag ins, string asmString> : Instruction {
  let Namespace = "Nova";
  let OutOperandList = outs;
  let InOperandList = ins;
  let AsmString = asmString;
}

Remember that the instructions we define are the MachineInstrs that LLVM IR instructions map to. Ideally, these match the target’s instruction set architecture (ISA) instructions.

But sometimes we need additional instructions that are not part of the ISA. These are called “pseudo instructions”. A pseudo instruction is not a real instruction; it stands in for one or more real instructions. Pseudo instructions simplify the instruction selection process and make the backend easier to write.

For example, the MIPS backend uses PseudoRet to represent a return instruction. PseudoRet is then printed as jr or jalr depending on the MIPS ISA version.
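As a rough illustration of the idea, the sketch below models a return pseudo that is only mapped to a concrete mnemonic late in the pipeline. Everything here is hypothetical: `ToyOpcode`, `expandReturn`, and the `hasR6` flag are made-up names for this toy and not LLVM or MIPS backend APIs, and tying `jalr` to a newer ISA revision is an assumption made only for the example.

```cpp
#include <cassert>
#include <string>

// Toy opcodes: one pseudo and the two real instructions it may become.
enum class ToyOpcode { PseudoRet, JR, JALR };

// Pick the real opcode for the return pseudo. `hasR6` is a hypothetical flag
// standing in for "which ISA revision are we targeting".
ToyOpcode expandReturn(ToyOpcode Op, bool hasR6) {
  assert(Op == ToyOpcode::PseudoRet && "only the return pseudo is expanded");
  return hasR6 ? ToyOpcode::JALR : ToyOpcode::JR;
}

// Map a toy opcode to the mnemonic that would be printed.
std::string mnemonic(ToyOpcode Op) {
  switch (Op) {
  case ToyOpcode::PseudoRet:
    return "<pseudo ret>";
  case ToyOpcode::JR:
    return "jr";
  case ToyOpcode::JALR:
    return "jalr";
  }
  return "<unknown>";
}
```

The selector only ever has to produce the pseudo; the choice between the real instructions is made in one place.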

Let’s add a Pseudo instruction class for Nova instructions.

}

class PseudoNovaInst<dag outs, dag ins, string asmString> : NovaInst<outs, ins, asmString> {
  let isPseudo = 1;
  let isCodeGenOnly = 1;
}

Defining the instruction

We’ll start with the return instruction.

Create the NovaISD::Ret enum value. In our LLVM version, these enum values are not generated by TableGen, but work is in progress to generate them. See this RFC.

>> new file

//==-- Nova DAG Lowering Interface --------//

#ifndef LLVM_LIB_TARGET_NOVA_NOVAISELLOWERING_H
#define LLVM_LIB_TARGET_NOVA_NOVAISELLOWERING_H

#include "llvm/CodeGen/ISDOpcodes.h"
#include "llvm/CodeGen/TargetLowering.h"
namespace llvm {

namespace NovaISD {
enum NodeType : unsigned {
  FIRST_NUMBER = ISD::BUILTIN_OP_END,

  // Return
  Ret,
};
} // end namespace NovaISD

While we are in this file, add the NovaTargetLowering class. This is responsible for lowering LLVM IR to the target-specific DAG nodes.

→ namespace llvm {

} // end namespace NovaISD

class NovaSubtarget;

class NovaTargetLowering : public TargetLowering {
public:
  explicit NovaTargetLowering(const TargetMachine &TM,
                              const NovaSubtarget &STI);

  SDValue LowerReturn(SDValue Chain, CallingConv::ID CallConv, bool isVarArg,
                      const SmallVectorImpl<ISD::OutputArg> &Outs,
                      const SmallVectorImpl<SDValue> &OutVals, const SDLoc &dl,
                      SelectionDAG &DAG) const override;

  SDValue LowerCall(TargetLowering::CallLoweringInfo &CLI,
                    SmallVectorImpl<SDValue> &InVals) const override;

  bool CanLowerReturn(CallingConv::ID CallConv, MachineFunction &MF,
                      bool IsVarArg,
                      const SmallVectorImpl<ISD::OutputArg> &Outs,
                      LLVMContext &Context, const Type *RetTy) const override;
  SDValue
  LowerFormalArguments(SDValue Chain, CallingConv::ID /*CallConv*/,
                       bool /*isVarArg*/,
                       const SmallVectorImpl<ISD::InputArg> & /*Ins*/,
                       const SDLoc & /*dl*/, SelectionDAG & /*DAG*/,
                       SmallVectorImpl<SDValue> & /*InVals*/) const override {
    return Chain;
  }
  /// getTargetNodeName - This method returns the name of a target specific
  /// DAG node.
  const char *getTargetNodeName(unsigned Opcode) const override;
};

} // namespace llvm

#endif

Add the SDNode that LLVM IR’s ret maps to. The opcode of this node is NovaISD::Ret, and it takes a variable number of operands so that a return value can span several registers (for example, returning an i64 value needs two i32 registers, $v0 and $v1).

>> new file

//===- Nova Instruction Definitions ----------------------------===//
include "NovaInstrFormats.td"

//==---------- All SD nodes for Nova ------------------===//
def NovaRetSDN : SDNode<"NovaISD::Ret",
                        SDTNone, // 0 results and 0 operands
                        [SDNPHasChain, SDNPVariadic, SDNPOptInGlue]>;
//==---------- End SD Node definitions ----------------===//

This node will get selected to the PseudoRet instruction.

                        [SDNPHasChain, SDNPVariadic, SDNPOptInGlue]>;
//==---------- End SD Node definitions ----------------===//
//==----------- Nova Instruction Definitions ----------===//
def PseudoRet : PseudoNovaInst<(outs), (ins), "ret"> {
  let isReturn = 1;
  let isTerminator = 1;
}
//==--------- End Nova Instruction Definitions --------===//

Add the pattern that will select the NovaISD::Ret node.

}
//==--------- End Nova Instruction Definitions --------===//
//==---- All patterns to match SD nodes -----------==//
def : Pat<(NovaRetSDN), (PseudoRet)>;

All target tablegen files are included in the top-level XXX.td file. Include the new NovaInstrInfo.td file in Nova.td:

include "NovaRegisterInfo.td"
include "NovaInstrInfo.td"

def : ProcessorModel<"generic", NoSchedModel, []>;

InstrInfo class

TableGen’erated instruction records are stored in the NovaInstrInfo class. Following the common tablegen pattern, we derive our class from the NovaGenInstrInfo class.

>> new file

#ifndef LLVM_LIB_TARGET_NOVA_NOVAINSTRINFO_H
#define LLVM_LIB_TARGET_NOVA_NOVAINSTRINFO_H

#include "Nova.h"
#include "NovaRegisterInfo.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/TargetInstrInfo.h"

#define GET_INSTRINFO_HEADER
#include "NovaGenInstrInfo.inc"

namespace llvm {
class NovaSubtarget;

class NovaInstrInfo : public NovaGenInstrInfo {
public:
  explicit NovaInstrInfo(const NovaSubtarget &STI);
protected:
  const NovaSubtarget &Subtarget;
};
} // end namespace llvm

#endif

Before we create the constructor, we need stack manipulation instructions.

These instructions and the callseq_end SDNode are just placeholders for now. We will use them while lowering call nodes.

def : Pat<(NovaRetSDN), (PseudoRet)>;

def callseq_end : SDNode<"ISD::CALLSEQ_END", SDTNone, [SDNPHasChain, SDNPOptInGlue]>;

def ADJCALLSTACKDOWN : Instruction {
  let OutOperandList = (outs);
  let Namespace = "Nova";
  let InOperandList = (ins);
  let AsmString = "ADJCALLSTACKDOWN";
  let Pattern = [(callseq_end)];
}

def ADJCALLSTACKUP : Instruction {
  let OutOperandList = (outs);
  let Namespace = "Nova";
  let InOperandList = (ins);
  let AsmString = "ADJCALLSTACKUP";
  let Pattern = [(callseq_end)];
}

Create the NovaInstrInfo.cpp file and implement the constructor.

>> new file

#include "NovaInstrInfo.h"
#include "MCTargetDesc/NovaMCTargetDesc.h"
#include "NovaTargetMachine.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"

using namespace llvm;

#define DEBUG_TYPE "nova-instr-info"

#define GET_INSTRINFO_CTOR_DTOR
#include "NovaGenInstrInfo.inc"

NovaInstrInfo::NovaInstrInfo(const NovaSubtarget &STI) :
  NovaGenInstrInfo(Nova::ADJCALLSTACKDOWN, Nova::ADJCALLSTACKUP),
  Subtarget(STI) { }

Include this in CMakeLists.txt to build the file.

  NovaRegisterInfo.cpp
  MCTargetDesc/NovaMCTargetDesc.cpp
  NovaTargetObjectFile.cpp
  NovaSubtarget.cpp
  MCTargetDesc/NovaMCAsmInfo.cpp
  NovaInstrInfo.cpp

// Add the GenInstrInfo.inc include to MCTargetDesc files.

Registering the InstrInfo

Each instruction is identified by an enum value, and the information about each instruction is stored in an MCInstrDesc object.

Include the enum declaration in the MCTargetDesc header file.

#include "NovaGenSubtargetInfo.inc"

#define GET_INSTRINFO_ENUM
#include "NovaGenInstrInfo.inc"

#endif

TableGen generates all instructions in an MCInstrDesc[] array.

using namespace llvm;

#define GET_INSTRINFO_MC_DESC
#define ENABLE_INSTR_PREDICATE_VERIFIER
#include "NovaGenInstrInfo.inc"

#define GET_REGINFO_MC_DESC
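The relationship between the opcode enum and the descriptor array can be pictured with a small standalone model. This is only a sketch: the `toy` namespace, the `InstrDesc` fields, and the opcode ordering are assumptions for illustration and do not reproduce what TableGen actually emits.

```cpp
#include <cassert>
#include <string>

namespace toy {
// Opcodes are plain enum values that double as indices into the table.
enum Opcode : unsigned {
  ADJCALLSTACKDOWN,
  ADJCALLSTACKUP,
  PseudoRet,
  INSTRUCTION_LIST_END
};

// Hypothetical, heavily simplified stand-in for a per-instruction descriptor.
struct InstrDesc {
  std::string Name;
  unsigned NumOperands;
  bool IsReturn;
  bool IsTerminator;
};

// One descriptor per opcode, in enum order.
static const InstrDesc Descs[INSTRUCTION_LIST_END] = {
    {"ADJCALLSTACKDOWN", 0, false, false},
    {"ADJCALLSTACKUP", 0, false, false},
    {"PseudoRet", 0, true, true},
};

// Look up the descriptor for an opcode.
inline const InstrDesc &get(Opcode Op) {
  assert(Op < INSTRUCTION_LIST_END && "opcode out of range");
  return Descs[Op];
}
} // namespace toy
```

Because the enum value is the array index, looking up an instruction's properties is a constant-time table access.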

We should now also include the necessary files for the definitions.

#include "NovaTargetInfo.h"
#include "llvm/MC/MCSubtargetInfo.h"
#include "llvm/MC/MCInstrInfo.h"

↓ after static MCAsmInfo* createNovaMCAsmInfo(const MCRegisterInfo &MRI, const Triple &TT, const MCTargetOptions &Options) {

}

static MCInstrInfo* createNovaMCInstrInfo() {
  MCInstrInfo *X = new MCInstrInfo();
  InitNovaMCInstrInfo(X);
  return X;
}

static MCInstPrinter* createNovaMCInstPrinter(const Triple &T, unsigned SyntaxVariant, const MCAsmInfo &MAI, const MCInstrInfo &MII, const MCRegisterInfo &MRI) {

→ extern "C" void LLVMInitializeNovaTargetMC() {

  TargetRegistry::RegisterMCRegInfo(*T, createNovaMCRegisterInfo);
  TargetRegistry::RegisterMCSubtargetInfo(*T, createNovaSubtargetInfo);
  TargetRegistry::RegisterMCInstrInfo(*T, createNovaMCInstrInfo);

With this, we have defined everything required to support the return instruction.

Lowering to SelectionDAG

We have to tell the SelectionDAGBuilder how to lower the LLVM IR ret instruction to Nova’s SDNodes.

More specifically, we have to construct physical register nodes for the return values and insert the actual return SDNode. This is done in the LowerReturn method of the TargetLowering class.

Let’s consider an example of a return statement that needs to be lowered.

define i64 @rett(i32 %a, i32 %b) {
entry:
    %aext = zext i32 %a to i64
    %bext = zext i32 %b to i64
    %ret = add i64 %aext, %bext
    ret i64 %ret
}

The MIPS backend converts this into the following initial selection DAG:

Initial selection DAG: %bb.0 'rett:entry'
SelectionDAG has 17 nodes:
  t0: ch,glue = EntryToken
      t2: i32,ch = CopyFromReg t0, Register:i32 %0
    t5: i64 = zero_extend t2
      t4: i32,ch = CopyFromReg t0, Register:i32 %1
    t6: i64 = zero_extend t4
  t7: i64 = add t5, t6
    t9: i32 = extract_element t7, Constant:i32<1>
  t13: ch,glue = CopyToReg t0, Register:i32 $v0, t9
    t11: i32 = extract_element t7, Constant:i32<0>
  t15: ch,glue = CopyToReg t13, Register:i32 $v1, t11, t13:1
  t16: ch = MipsISD::Ret t15, Register:i32 $v0, Register:i32 $v1, t15:1

We see that the return node takes two register operands for the single i64 value. This is because i64 is not a legal type on a 32-bit MIPS target, so the return value is split into two 32-bit halves, which are returned in $v0 and $v1.
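The splitting itself is plain integer arithmetic. The following sketch shows it outside of LLVM; `splitI64` and `joinI64` are hypothetical helper names, and the sketch deliberately says nothing about which half ends up in which register.

```cpp
#include <cstdint>
#include <utility>

// Split a 64-bit value into its (low, high) 32-bit halves.
std::pair<uint32_t, uint32_t> splitI64(uint64_t Value) {
  uint32_t Lo = static_cast<uint32_t>(Value & 0xFFFFFFFFu);
  uint32_t Hi = static_cast<uint32_t>(Value >> 32);
  return std::make_pair(Lo, Hi);
}

// Reassemble the original value from the two halves.
uint64_t joinI64(uint32_t Lo, uint32_t Hi) {
  return (static_cast<uint64_t>(Hi) << 32) | Lo;
}
```

In the DAG above, the two extract_element nodes play the role of `splitI64`, producing one i32 value per return register.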

The LowerReturn method is responsible for lowering the return instruction. It iterates over the return values, creates a CopyToReg node that places each value in its return register, and then creates the return node that uses those registers.

See the virtual method in TargetLowering

This method must be implemented by targets.

  }

  /// This hook must be implemented to lower outgoing return values, described
  /// by the Outs array, into the specified DAG. The implementation should
  /// return the resulting token chain value.
  virtual SDValue LowerReturn(SDValue /*Chain*/, CallingConv::ID /*CallConv*/,
                              bool /*isVarArg*/,
                              const SmallVectorImpl<ISD::OutputArg> & /*Outs*/,
                              const SmallVectorImpl<SDValue> & /*OutVals*/,
                              const SDLoc & /*dl*/,
                              SelectionDAG & /*DAG*/) const {
    llvm_unreachable("Not Implemented");
  }

  /// Return true if result of the specified node is used by a return node

To begin, spin up the NovaISelLowering.cpp file.

>> new file

//===- NovaISelLowering.cpp - Nova DAG Lowering Implementation -----------===//
#include "NovaISelLowering.h"
#include "MCTargetDesc/NovaMCTargetDesc.h"
#include "NovaSubtarget.h"

using namespace llvm;

#define DEBUG_TYPE "nova-isel"

We have to declare legal types for the target. This is done in the NovaTargetLowering constructor.

#define DEBUG_TYPE "nova-isel"

NovaTargetLowering::NovaTargetLowering(const TargetMachine &TM,
                                       const NovaSubtarget &STI)
    : TargetLowering(TM) {
  addRegisterClass(MVT::i32, &Nova::GPR32RegClass);

  computeRegisterProperties(STI.getRegisterInfo());
}

Now implement the LowerReturn method.

↓ after : TargetLowering(TM) {

}

SDValue
NovaTargetLowering::LowerReturn(SDValue Chain, CallingConv::ID CallConv,
                                bool isVarArg,
                                const SmallVectorImpl<ISD::OutputArg> &Outs,
                                const SmallVectorImpl<SDValue> &OutVals,
                                const SDLoc &dl, SelectionDAG &DAG) const {

Classes used for lowering arguments and return values

These types carry calling-convention information.

1. `ISD::ArgFlagsTy`

This is a bitset that contains information about the argument. It is used to determine how the argument should be passed to the function.

ISD::ArgFlagsTy

namespace ISD {

struct ArgFlagsTy {
private:
  unsigned IsZExt : 1;  ///< Zero extended
  unsigned IsSExt : 1;  ///< Sign extended
  unsigned IsNoExt : 1; ///< No extension
  unsigned IsInReg : 1; ///< Passed in register
  unsigned IsSRet : 1;  ///< Hidden struct-ret ptr
  unsigned IsByVal : 1;    ///< Struct passed by value
  unsigned IsByRef : 1;    ///< Passed in memory
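To show how such a packed flag set is typically used, here is a reduced standalone model with three of the flags. `ToyArgFlags` and `extensionKind` are hypothetical names for this sketch; the real struct has many more fields and accessors.

```cpp
#include <string>

// Toy version of a packed per-argument flag set: each flag uses one bit.
struct ToyArgFlags {
private:
  unsigned IsZExt : 1;  // Zero extended
  unsigned IsSExt : 1;  // Sign extended
  unsigned IsInReg : 1; // Passed in register

public:
  ToyArgFlags() : IsZExt(0), IsSExt(0), IsInReg(0) {}

  bool isZExt() const { return IsZExt; }
  void setZExt() { IsZExt = 1; }
  bool isSExt() const { return IsSExt; }
  void setSExt() { IsSExt = 1; }
  bool isInReg() const { return IsInReg; }
  void setInReg() { IsInReg = 1; }
};

// Example consumer: decide how a narrow value would be widened.
std::string extensionKind(const ToyArgFlags &Flags) {
  if (Flags.isSExt())
    return "sign_extend";
  if (Flags.isZExt())
    return "zero_extend";
  return "any_extend";
}
```

Lowering code reads these flags through the accessors and never touches the bits directly.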

2. `ISD::InputArg`

This struct contains the flags and type information about a single incoming (formal) argument or incoming return value virtual register.

/// of the caller) return value virtual register.
///
struct InputArg {
  ArgFlagsTy Flags;
  MVT VT = MVT::Other;
  EVT ArgVT;
  bool Used = false;

  /// Index original Function's argument.
  unsigned OrigArgIndex;
  /// Sentinel value for implicit machine-level input arguments.
  static const unsigned NoArgIndex = UINT_MAX;

  /// Offset in bytes of current input value relative to the beginning of
  /// original argument. E.g. if argument was splitted into four 32 bit
  /// registers, we got 4 InputArgs with PartOffsets 0, 4, 8 and 12.
  unsigned PartOffset;

  InputArg() = default;

3. `ISD::OutputArg`

Same as ISD::InputArg, but for outgoing values: arguments passed to a callee and values returned from the current function. It is used to determine how each value should be passed.

/// of the caller) return value virtual register.
///
struct OutputArg {
  ArgFlagsTy Flags;
  MVT VT;
  EVT ArgVT;

  /// IsFixed - Is this a "fixed" value, ie not passed through a vararg "...".
  bool IsFixed = false;

  /// Index original Function's argument.
  unsigned OrigArgIndex;

  /// Offset in bytes of current output value relative to the beginning of
  /// original argument. E.g. if argument was splitted into four 32 bit
  /// registers, we got 4 OutputArgs with PartOffsets 0, 4, 8 and 12.
  unsigned PartOffset;
  OutputArg() = default;
  OutputArg(ArgFlagsTy flags, MVT vt, EVT argvt, bool isfixed, unsigned origIdx,

The Outs vector contains the return values that we have to place into registers according to the calling convention.

It is populated by the generic return lowering code in SelectionDAGBuilder.cpp, which splits a return value of any LLVM type (like i17) into legal types (like i32, f32) and puts the pieces into the Outs vector.
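A simplified model of that splitting for integers, assuming 32-bit registers: widths are rounded up to a power of two (at least one register wide) and then cut into register-sized pieces. `ToyPart` and `splitIntoRegisterParts` are hypothetical names, and the real type legalizer handles many more cases than this sketch.

```cpp
#include <cassert>
#include <vector>

// One register-sized piece of a larger value.
struct ToyPart {
  unsigned Bits;       // width of this piece
  unsigned PartOffset; // byte offset of this piece within the original value
};

// Split an integer of `Bits` bits into pieces of `RegBits` bits each.
std::vector<ToyPart> splitIntoRegisterParts(unsigned Bits,
                                            unsigned RegBits = 32) {
  assert(Bits > 0 && RegBits >= 8 && "widths must be positive");
  // Promote to at least one register, then to the next power-of-two multiple.
  unsigned Promoted = RegBits;
  while (Promoted < Bits)
    Promoted *= 2;

  std::vector<ToyPart> Parts;
  for (unsigned Offset = 0; Offset < Promoted / 8; Offset += RegBits / 8)
    Parts.push_back({RegBits, Offset});
  return Parts;
}
```

Under these assumptions an i17 becomes a single i32 piece, an i64 becomes two pieces, and an i128 becomes four pieces with offsets 0, 4, 8, and 12.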

Let’s just support single register return values for now.

↓ after : TargetLowering(TM) {

                                const SmallVectorImpl<SDValue> &OutVals,
                                const SDLoc &dl, SelectionDAG &DAG) const {
  // Handle only integer return values
  // we need to copy the value to the v0 register.
  if (Outs.size() > 1) {
    report_fatal_error(
        "Multiple return values not supported\n"
        "This could be because the return type is a struct or a large integer "
        "that got split into multiple registers",
        false);
  }

report_fatal_error

We use this function here to report a user error.

In the current LLVM version, the report_fatal_error function is deprecated and replaced by reportFatalUsageError.

If we have no return values, just emit a return node.

↓ after report_fatal_error(

  }

  if (Outs.size() == 0) {
    return DAG.getNode(NovaISD::Ret, dl, MVT::Other, Chain);
  }

Otherwise, we iterate over the values given in Outs and emit a CopyToReg node for each value. These nodes must be glued to each other, and the last one must be glued to the final NovaISD::Ret node.

Note that this only supports i32 values.

↓ after report_fatal_error(

  }

  SDValue Glue;
  SmallVector<SDValue, 3> RetOps(1, Chain);
  for (unsigned i = 0, e = Outs.size(); i != e; ++i) {
    const ISD::OutputArg &Out = Outs[i];
    const SDValue &OutVal = OutVals[i];
    if (!Out.ArgVT.isScalarInteger() || Out.ArgVT.getScalarSizeInBits() > 32) {
      report_fatal_error("Only i32 return values are supported", false);
    }
    Chain = DAG.getCopyToReg(Chain, dl, Nova::V0, OutVal, Glue);
    Glue = Chain.getValue(1);
    RetOps.push_back(DAG.getRegister(Nova::V0, Out.VT));
  }
  RetOps[0] = Chain;
  RetOps.push_back(Glue);

  return DAG.getNode(NovaISD::Ret, dl, MVT::Other, RetOps);
}
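The chain and glue wiring can be easier to follow in a toy model that uses strings instead of SDValues. This is a sketch only: `ToyNode` and `lowerReturnToy` are hypothetical, node results are named "tN:ch" and "tN:glue" purely for illustration, and unlike the code above the toy takes a list of return registers so that more than one value can be shown.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A node is just an opcode name plus the names of its operands.
struct ToyNode {
  std::string Opcode;
  std::vector<std::string> Operands;
};

// Build CopyToReg nodes for each value, then the final Ret node.
// Node number N is referred to as "tN" by later operands.
std::vector<ToyNode> lowerReturnToy(const std::vector<std::string> &Values,
                                    const std::vector<std::string> &Regs) {
  assert(Values.size() <= Regs.size() && "not enough return registers");
  std::vector<ToyNode> Nodes;
  std::string Chain = "entry";
  std::string Glue;                          // empty means "no glue yet"
  std::vector<std::string> RetOps(1, Chain); // slot 0 holds the chain

  for (std::size_t I = 0; I < Values.size(); ++I) {
    ToyNode Copy;
    Copy.Opcode = "CopyToReg";
    Copy.Operands = {Chain, Regs[I], Values[I]};
    if (!Glue.empty())
      Copy.Operands.push_back(Glue); // glue to the previous copy
    Nodes.push_back(Copy);

    std::string Name = "t" + std::to_string(Nodes.size() - 1);
    Chain = Name + ":ch";
    Glue = Name + ":glue";
    RetOps.push_back(Regs[I]); // the return node uses this register
  }

  RetOps[0] = Chain; // update the chain to the last copy
  if (!Glue.empty())
    RetOps.push_back(Glue);

  ToyNode Ret;
  Ret.Opcode = "Ret";
  Ret.Operands = RetOps;
  Nodes.push_back(Ret);
  return Nodes;
}
```

With two values, the return node ends up with the operands chain, first register, second register, glue, which is the same shape as the return node in the DAG dump shown earlier.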

Add dummy implementations for the LowerCall and other required methods.

↓ after report_fatal_error(

}

SDValue NovaTargetLowering::LowerCall(TargetLowering::CallLoweringInfo &CLI,
                                      SmallVectorImpl<SDValue> &InVals) const {
  return SDValue();
}

bool NovaTargetLowering::CanLowerReturn(
    CallingConv::ID CallConv, MachineFunction &MF, bool IsVarArg,
    const SmallVectorImpl<ISD::OutputArg> &Outs, LLVMContext &Context,
    const Type *RetTy) const {
  return true;
}

const char *NovaTargetLowering::getTargetNodeName(unsigned Opcode) const {
  switch (Opcode) {
  case NovaISD::Ret:
    return "NovaISD::Ret";
  default:
    return "Unknown NovaISD::Node";
  }
}

Finally, tell CMakeLists.txt to build the new file.

  MCTargetDesc/NovaMCAsmInfo.cpp
  NovaInstrInfo.cpp
  NovaISelLowering.cpp

Instruction Selection pass

The lowering code above is driven by the instruction selection pass that comes after some IR optimizations in the llc pipeline.

Let’s create the pass for our target. The logic mainly comes from the SelectionDAGISel class.

>> new file

#ifndef LLVM_LIB_TARGET_NOVA_NOVAISELDAGTODAG_H
#define LLVM_LIB_TARGET_NOVA_NOVAISELDAGTODAG_H

#include "NovaSubtarget.h"
#include "NovaTargetMachine.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/SelectionDAGISel.h"

namespace llvm {
class NovaDAGToDAGISel final : public SelectionDAGISel {
  const NovaSubtarget *Subtarget = nullptr;

public:
  explicit NovaDAGToDAGISel(NovaTargetMachine &TM, CodeGenOptLevel OptLevel)
      : SelectionDAGISel(TM, OptLevel) {}

  bool runOnMachineFunction(MachineFunction &MF) override;

private:
#include "NovaGenDAGISel.inc"

  void Select(SDNode *Node) override;
};
} // namespace llvm

#endif

Select() is called for each node in the DAG. We can put custom selection code here and then call the TableGen-generated SelectCode() to select the remaining nodes based on the patterns in the .td files.

>> new file

#include "NovaISelDAGToDAG.h"
#include "NovaSubtarget.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/SelectionDAGISel.h"
#include "llvm/Pass.h"
#include "llvm/Support/CodeGen.h"

using namespace llvm;

#define DEBUG_TYPE "nova-isel"

namespace {
class NovaDAGToDAGISelLegacy : public SelectionDAGISelLegacy {
public:
  static char ID;
  NovaDAGToDAGISelLegacy(NovaTargetMachine &TM, CodeGenOptLevel OptLevel)
      : SelectionDAGISelLegacy(
            ID, std::make_unique<NovaDAGToDAGISel>(TM, OptLevel)) {}
};
} // namespace

char NovaDAGToDAGISelLegacy::ID = 0;

INITIALIZE_PASS(NovaDAGToDAGISelLegacy, DEBUG_TYPE, "nova-isel", false, false);


FunctionPass *llvm::createNovaISelDagLegacy(NovaTargetMachine &TM,
                                     CodeGenOptLevel OptLevel) {
  return new NovaDAGToDAGISelLegacy(TM, OptLevel);
}

bool NovaDAGToDAGISel::runOnMachineFunction(MachineFunction &MF) {
  Subtarget = &MF.getSubtarget<NovaSubtarget>();
  return SelectionDAGISel::runOnMachineFunction(MF);
}

void NovaDAGToDAGISel::Select(SDNode *Node) {
  // Implement the selection logic here.
  // This is where you would match the SelectionDAG nodes to the target
  // instructions. For example, you might want to match a specific node type and
  // then create a corresponding machine instruction.

  // Example: if (Node->getOpcode() == ISD::ADD) { ... }
  // This is just a placeholder for the actual implementation.
  SelectCode(Node);
}
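The "custom code first, generated matcher as fallback" structure of Select() can be sketched without LLVM. All names below (`ToyDAGOpcode`, `generatedPatterns`, `selectCodeToy`, `selectToy`) are hypothetical, and the frame-index special case is only an example of something a target might handle by hand.

```cpp
#include <map>
#include <string>

enum ToyDAGOpcode { ToyRet, ToyAdd, ToyFrameIndex };

// Stand-in for the matcher table that would be generated from .td patterns.
const std::map<ToyDAGOpcode, std::string> &generatedPatterns() {
  static const std::map<ToyDAGOpcode, std::string> Table = {
      {ToyRet, "PseudoRet"}};
  return Table;
}

// Stand-in for the generated fallback: pure table lookup.
std::string selectCodeToy(ToyDAGOpcode Op) {
  std::map<ToyDAGOpcode, std::string>::const_iterator It =
      generatedPatterns().find(Op);
  if (It == generatedPatterns().end())
    return "<cannot select>";
  return It->second;
}

// Custom cases are checked first; everything else goes to the table.
std::string selectToy(ToyDAGOpcode Op) {
  if (Op == ToyFrameIndex)
    return "custom:FrameIndex";
  return selectCodeToy(Op);
}
```

A node with no custom case and no pattern falls through to the "cannot select" result, which in this toy corresponds to a selection failure.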

Legacy passes like this one need to be initialized by registering them in the PassRegistry. We declare such initializer functions in the Nova.h file.

#include "llvm/Support/CodeGen.h"

namespace llvm {
  class FunctionPass;
  class NovaTargetMachine;

  FunctionPass *createNovaISelDagLegacy(NovaTargetMachine &TM,
                                     CodeGenOptLevel OptLevel);

  void initializeNovaDAGToDAGISelLegacyPass(PassRegistry &);
} // namespace llvm
#endif

Finish with required includes.

#include "MCTargetDesc/NovaMCTargetDesc.h"
#include "llvm/Pass.h"
#include "llvm/Support/CodeGen.h"

Plug into the pipeline

We now set up the pass pipeline to use the new NovaISelDAGToDAG pass. First, register the target machine in the target registry and initialize the pass.

→ extern "C" void LLVMInitializeNovaTarget() {

extern "C" void LLVMInitializeNovaTarget() {
  // TODO: Add initialize target
  RegisterTargetMachine<NovaTargetMachine> X(getTheNovaTarget());

  initializeNovaDAGToDAGISelLegacyPass(*PassRegistry::getPassRegistry());
}

Targets construct their pipeline by using the TargetPassConfig class.

↓ after extern "C" void LLVMInitializeNovaTarget() {

}

namespace {
class NovaPassConfig : public TargetPassConfig {
public:
  NovaPassConfig(NovaTargetMachine &TM, PassManagerBase &PM)
      : TargetPassConfig(TM, PM) {}

  NovaTargetMachine &getNovaTargetMachine() const {
    return getTM<NovaTargetMachine>();
  }
  bool addInstSelector() override {
    addPass(createNovaISelDagLegacy(getNovaTargetMachine(), getOptLevel()));
    return false;
  }
  void addPreEmitPass() override {}
};
} // namespace

TargetPassConfig *NovaTargetMachine::createPassConfig(PassManagerBase &PM) {
  return new NovaPassConfig(*this, PM);
}
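The hook-based design can be modeled in a few lines: a base class fixes the order of the pipeline and calls virtual hooks that a target overrides. This is a standalone sketch with hypothetical names (`ToyPassConfig`, `ToyNovaPassConfig`, and the pass-name strings); it mirrors the convention used above where the selector hook returns false on success, but it is not the real class.

```cpp
#include <string>
#include <vector>

class ToyPassConfig {
public:
  virtual ~ToyPassConfig() = default;

  // Build the pipeline by calling the hooks in a fixed order.
  // Returns an empty pipeline if no instruction selector was installed.
  std::vector<std::string> build() {
    Passes.clear();
    addPass("ir-optimizations");
    if (addInstSelector())
      return std::vector<std::string>();
    addPreEmitPass();
    addPass("asm-printer");
    return Passes;
  }

protected:
  void addPass(const std::string &Name) { Passes.push_back(Name); }

  // Hooks: the default selector hook reports failure (true).
  virtual bool addInstSelector() { return true; }
  virtual void addPreEmitPass() {}

private:
  std::vector<std::string> Passes;
};

// The target only overrides the hooks it cares about.
class ToyNovaPassConfig : public ToyPassConfig {
  bool addInstSelector() override {
    addPass("nova-isel");
    return false; // success
  }
};
```

The target never decides where instruction selection runs; it only supplies the pass that goes into that slot.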

Great! We are almost there - the last piece of the backend is the instruction printer.

Instruction Printer

To write the machine instructions to the assembly file, we have to implement our AsmPrinter pass. This uses another class called MCInstPrinter to print the instructions.

This pass writes the MachineInstr to the output file. It is responsible for converting the MachineInstr to the target-specific assembly syntax.

When we write the instructions in the NovaInstrInfo.td file, we also define the assembly string format for each of them. TableGen generates the printing method from that format.

tablegen(LLVM NovaGenInstrInfo.inc -gen-instr-info)
tablegen(LLVM NovaGenDAGISel.inc -gen-dag-isel)
tablegen(LLVM NovaGenAsmWriter.inc -gen-asm-writer)

>> new file

#ifndef LLVM_LIB_TARGET_NOVA_MCTARGETDESC_NOVAMCINSTPRINTER_H
#define LLVM_LIB_TARGET_NOVA_MCTARGETDESC_NOVAMCINSTPRINTER_H

#include "llvm/MC/MCInstPrinter.h"
#include "llvm/MC/MCRegister.h"

namespace llvm {
class NovaInstPrinter : public MCInstPrinter {
public:
  NovaInstPrinter(const MCAsmInfo &MAI, const MCInstrInfo &MII,
                  const MCRegisterInfo &MRI)
      : MCInstPrinter(MAI, MII, MRI) {}

  void printInst(const MCInst *MI, uint64_t Address, StringRef Annot,
                 const MCSubtargetInfo &STI, raw_ostream &O) override;

  bool printAliasInstr(const MCInst *MI, uint64_t Address, raw_ostream &OS);

  void printInstruction(const MCInst *MI, uint64_t Address, raw_ostream &O);

  void printOperand(const MCInst *MI, unsigned OpNo, raw_ostream &O);

  void printRegName(raw_ostream &OS, MCRegister RegNo) override;

  const char *getRegisterName(MCRegister Reg);

  std::pair<const char*, uint64_t>
  getMnemonic(const MCInst &MI) const override;
};
} // end namespace llvm

#endif

The tablegen code is included in the implementation file like so:

>> new file

#include "NovaMCInstPrinter.h"
#include "NovaInstrInfo.h"
#include "llvm/MC/MCInst.h"
#define DEBUG_TYPE "nova-mcinst-printer"

using namespace llvm;

#define PRINT_ALIAS_INSTR
#include "NovaGenAsmWriter.inc"

To print instructions, we use the generated printInstruction method. Sometimes we need to print aliases of the instruction, which is handled by printAliasInstr.

#include "NovaGenAsmWriter.inc"

void NovaInstPrinter::printInst(const MCInst *MI, uint64_t Address, StringRef Annot,
  const MCSubtargetInfo &STI, raw_ostream &O) {
    // check if we have an alias
    if (!printAliasInstr(MI, Address, O)) {
      printInstruction(MI, Address, O);
    }
    printAnnotation(O, Annot);
}

void NovaInstPrinter::printRegName(raw_ostream &OS, MCRegister Reg) {

Registers in MIPS assembly are printed as $v0, $v1, etc. This is done by the printRegName method.

}

void NovaInstPrinter::printRegName(raw_ostream &OS, MCRegister Reg) {
  OS << "$" << StringRef(getRegisterName(Reg)).lower();
}

Printing registers is just a special case of printing operands. MCOperand represents several types of operands:

→ namespace llvm {

class raw_ostream;

/// Instances of this class represent operands of the MCInst class.
/// This is a simple discriminated union.
class MCOperand {
  enum MachineOperandType : unsigned char {
    kInvalid,      ///< Uninitialized.
    kRegister,     ///< Register operand.
    kImmediate,    ///< Immediate operand.
    kSFPImmediate, ///< Single-floating-point immediate operand.
    kDFPImmediate, ///< Double-Floating-point immediate operand.
    kExpr,         ///< Relocatable immediate operand.
    kInst          ///< Sub-instruction operand.
  };
  MachineOperandType Kind = kInvalid;

Handle this on a case-by-case basis in the printOperand method.

↓ after void NovaInstPrinter::printRegName(raw_ostream &OS, MCRegister Reg) {

}

void NovaInstPrinter::printOperand(const MCInst *MI, unsigned OpNo,
                                   raw_ostream &O) {
  const MCOperand &Op = MI->getOperand(OpNo);
  if (Op.isReg()) {
    printRegName(O, Op.getReg());
    return;
  }

  if (Op.isImm()) {
    O << Op.getImm();
    return;
  }

  assert(Op.isExpr() && "unknown operand type");
  Op.getExpr()->print(O, &MAI, true);
}
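The same case-by-case dispatch can be tried out in isolation with a small hand-rolled tagged operand. `ToyOperand` and `printOperandToy` are hypothetical names, only the register and immediate kinds are modeled, and the upper-case register name passed in is an assumption about how names are stored.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>

// Minimal tagged operand: the Kind field says which member is meaningful.
struct ToyOperand {
  enum Kind { Invalid, Register, Immediate };
  Kind K = Invalid;
  std::string RegName; // valid when K == Register
  int64_t Imm = 0;     // valid when K == Immediate

  static ToyOperand createReg(const std::string &Name) {
    ToyOperand Op;
    Op.K = Register;
    Op.RegName = Name;
    return Op;
  }
  static ToyOperand createImm(int64_t Value) {
    ToyOperand Op;
    Op.K = Immediate;
    Op.Imm = Value;
    return Op;
  }
};

// Print an operand: registers as "$" + lower-case name, immediates as numbers.
std::string printOperandToy(const ToyOperand &Op) {
  switch (Op.K) {
  case ToyOperand::Register: {
    std::string Name = Op.RegName;
    std::transform(Name.begin(), Name.end(), Name.begin(), [](unsigned char C) {
      return static_cast<char>(std::tolower(C));
    });
    return "$" + Name;
  }
  case ToyOperand::Immediate:
    return std::to_string(Op.Imm);
  case ToyOperand::Invalid:
    break;
  }
  assert(false && "unknown operand type");
  return "";
}
```

Supporting a new operand kind means adding one enumerator and one case to the switch.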

Let’s hook the printer into our target.

↓ after static MCInstrInfo* createNovaMCInstrInfo() {

}

static MCInstPrinter* createNovaMCInstPrinter(const Triple &T, unsigned SyntaxVariant, const MCAsmInfo &MAI, const MCInstrInfo &MII, const MCRegisterInfo &MRI) {
  return new NovaInstPrinter(MAI, MII, MRI);
}

extern "C" void LLVMInitializeNovaTargetMC() {

Install the instance in the Target POD class.

→ extern "C" void LLVMInitializeNovaTargetMC() {

  TargetRegistry::RegisterMCInstrInfo(*T, createNovaMCInstrInfo);
  TargetRegistry::RegisterMCAsmInfo(*T, createNovaMCAsmInfo);
  TargetRegistry::RegisterMCInstPrinter(*T, createNovaMCInstPrinter);
}

Include the new header.

#include "MCTargetDesc/NovaMCAsmInfo.h"
#include "llvm/MC/MCDwarf.h"
#include "MCTargetDesc/NovaMCInstPrinter.h"

#include "llvm/MC/MCRegisterInfo.h"

Add the new files to CMakeLists.txt.

  NovaISelLowering.cpp
  NovaISelDAGToDAG.cpp
  MCTargetDesc/NovaMCInstPrinter.cpp

ASM Printer

The class above is used by the assembly printer to print the instructions.

>> new file

#include "Nova.h"
#include "NovaSubtarget.h"
#include "NovaTargetInfo.h"
#include "NovaTargetMachine.h"
#include "MCTargetDesc/NovaMCInstPrinter.h"
#include "llvm/CodeGen/AsmPrinter.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/MC/MCExpr.h"
#include "llvm/MC/MCSymbol.h"
#include "llvm/MC/TargetRegistry.h"

#define DEBUG_TYPE "nova-asm-printer"

using namespace llvm;

namespace {
class NovaAsmPrinter : public AsmPrinter {
public:
  NovaAsmPrinter(TargetMachine &TM, std::unique_ptr<MCStreamer> Streamer)
  : AsmPrinter(TM, std::move(Streamer)) {}

  StringRef getPassName() const override {
    return "Nova Assembly Printer";
  }

  void emitInstruction(const MachineInstr *MI) override;

  // Lower the MachineInstr to MCInst
  void lowerInstruction(const MachineInstr &MI, MCInst &Inst);

  // bool lowerPseudoInstExpansion(const MachineInstr *MI, MCInst &Inst);
private:

  MCOperand lowerSymbolOperand(const MachineOperand &MO, MCSymbol *Sym);
};

MCOperand NovaAsmPrinter::lowerSymbolOperand(const MachineOperand &MO, MCSymbol *Sym) {
  auto &Ctx = OutContext;
  const MCExpr *Expr = MCSymbolRefExpr::create(Sym, MCSymbolRefExpr::VK_None, Ctx);
  assert(MO.isMBB() && "Only basic block symbols are supported");
  return MCOperand::createExpr(Expr);
}


void NovaAsmPrinter::lowerInstruction(const MachineInstr &MI, MCInst &Inst) {
  // Convert the MachineInstr to an MCInst by copying the opcode and
  // translating each operand. How operands are represented depends on
  // the instruction set; for now we only handle registers, immediates,
  // and basic block symbols.

  Inst.setOpcode(MI.getOpcode());
  for (const auto &Op : MI.operands()) {
    MCOperand MCOp;
    switch (Op.getType()) {
      case MachineOperand::MO_Register:
        MCOp = MCOperand::createReg(Op.getReg());
        break;
      case MachineOperand::MO_Immediate:
        MCOp = MCOperand::createImm(Op.getImm());
        break;
      case MachineOperand::MO_MachineBasicBlock:
        MCOp = lowerSymbolOperand(Op, Op.getMBB()->getSymbol());
        break;
      // Add other operand types as needed
      default:
        llvm_unreachable("Unsupported operand type");
    }
    Inst.addOperand(MCOp);
  }
}

} // end anonymous namespace

void NovaAsmPrinter::emitInstruction(const MachineInstr *MI) {
  // Lower the instruction to MCInst
  MCInst Inst;
  lowerInstruction(*MI, Inst);
  EmitToStreamer(*OutStreamer, Inst);
}

extern "C" LLVM_EXTERNAL_VISIBILITY void LLVMInitializeNovaAsmPrinter() {
  RegisterAsmPrinter<NovaAsmPrinter> X(getTheNovaTarget());
}

Add to cmake.

  NovaISelDAGToDAG.cpp
  MCTargetDesc/NovaMCInstPrinter.cpp
  NovaAsmPrinter.cpp

  LINK_COMPONENTS

Compiling

And we are done! We can compile this code to assembly now.

define void @voidTest() {
  ret void
}

Run llc on the file.

llc -mtriple=mipsnova test.ll -o -

        .text
        .globl  voidTest                        # -- Begin function voidTest
        .type   voidTest,@function
voidTest:                               # @voidTest
# %bb.0:
        ret
.Lfunc_end0:
        .size   voidTest, .Lfunc_end0-voidTest
                                        # -- End function
        .section        ".note.GNU-stack","",@progbits

Congrats, you just completed your first LLVM backend!
