Diffstat (limited to 'third_party/rust/cranelift-codegen/src/regalloc/liveness.rs')
-rw-r--r--  third_party/rust/cranelift-codegen/src/regalloc/liveness.rs  443
1 file changed, 443 insertions, 0 deletions
diff --git a/third_party/rust/cranelift-codegen/src/regalloc/liveness.rs b/third_party/rust/cranelift-codegen/src/regalloc/liveness.rs
new file mode 100644
index 0000000000..2e9c5015bd
--- /dev/null
+++ b/third_party/rust/cranelift-codegen/src/regalloc/liveness.rs
@@ -0,0 +1,443 @@
+//! Liveness analysis for SSA values.
+//!
+//! This module computes the live range of all the SSA values in a function and produces a
+//! `LiveRange` instance for each.
+//!
+//!
+//! # Liveness consumers
+//!
+//! The primary consumer of the liveness analysis is the SSA coloring pass which goes through each
+//! block and assigns a register to the defined values. This algorithm needs to maintain a set of the
+//! currently live values as it is iterating down the instructions in the block. It asks the
+//! following questions:
+//!
+//! - What is the set of live values at the entry to the block?
+//! - When moving past a use of a value, is that value still alive in the block, or was that the last
+//! use?
+//! - When moving past a branch, which of the live values are still live below the branch?
+//!
+//! The set of `LiveRange` instances can answer these questions through their `def_local_end` and
+//! `livein_local_end` queries. The coloring algorithm visits blocks in a topological order of the
+//! dominator tree, so it can compute the set of live values at the beginning of a block by starting
+//! from the set of live values at the dominating branch instruction and filtering it with
+//! `livein_local_end`. These sets do not need to be stored in the liveness analysis.
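+//!
+//! As a sketch of that filtering step (names like `throughs`, `liveness`, `block`, and `func` are
+//! assumed to be in scope, and `livein_local_end` is assumed to take the block and the function
+//! layout as its program order):
+//!
+//! ```ignore
+//! // Keep only the values from the dominating branch whose live range reaches into `block`.
+//! let live_at_entry: Vec<Value> = throughs
+//!     .iter()
+//!     .copied()
+//!     .filter(|&value| liveness[value].livein_local_end(block, &func.layout).is_some())
+//!     .collect();
+//! ```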
+//!
+//! The secondary consumer of the liveness analysis is the spilling pass which needs to count the
+//! number of live values at every program point and insert spill code until the number of
+//! registers needed is small enough.
+//!
+//!
+//! # Alternative algorithms
+//!
+//! A number of different liveness analysis algorithms exist, so it is worthwhile to look at a few
+//! alternatives.
+//!
+//! ## Data-flow equations
+//!
+//! The classic *live variables analysis* that you will find in all compiler books from the
+//! previous century does not depend on SSA form. It is typically implemented by iteratively
+//! solving data-flow equations on bit-vectors of variables. The result is a live-out bit-vector of
+//! variables for every basic block in the program.
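+//!
+//! For each basic block `B`, the equations are the standard backward ones, iterated to a fixed
+//! point, where `use(B)` is the set of variables read in `B` before being written and `def(B)` is
+//! the set of variables written in `B`:
+//!
+//! ```text
+//! live_out(B) = ⋃ { live_in(S) : S is a successor of B }
+//! live_in(B)  = use(B) ∪ (live_out(B) \ def(B))
+//! ```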
+//!
+//! This algorithm has some disadvantages that make us look elsewhere:
+//!
+//! - Quadratic memory use. We need a bit per variable per basic block in the function.
+//! - Dense representation of sparse data. In practice, the majority of SSA values never leave
+//! their basic block, and those that do rarely span a large number of basic blocks. This makes
+//! the data stored in the bit-vectors quite sparse.
+//! - Traditionally, the data-flow equations were solved for real program *variables* which do
+//! not include temporaries used in evaluating expressions. We have an SSA form program which
+//! blurs the distinction between temporaries and variables. This makes the quadratic memory
+//! problem worse because there are many more SSA values than there were variables in the
+//! original program, and we don't know a priori which SSA values leave their basic block.
+//! - Missing last-use information. For values that are not live-out of a basic block, we would
+//! need to store information about the last use in the block somewhere. LLVM stores this
+//! information as a 'kill bit' on the last use in the IR. Maintaining these kill bits has been a
+//! source of problems for LLVM's register allocator.
+//!
+//! Data-flow equations can detect when a variable is used uninitialized, and they can handle
+//! multiple definitions of the same variable. We don't need this generality since we already have
+//! a program in SSA form.
+//!
+//! ## LLVM's liveness analysis
+//!
+//! LLVM's register allocator computes liveness per *virtual register*, where a virtual register is
+//! a disjoint union of related SSA values that should be assigned to the same physical register.
+//! It uses a compact data structure very similar to our `LiveRange`. The important difference is
+//! that Cranelift's `LiveRange` only describes a single SSA value, while LLVM's `LiveInterval`
+//! describes the live range of a virtual register *and* which one of the related SSA values is
+//! live at any given program point.
+//!
+//! LLVM computes the live range of each virtual register independently by using the use-def chains
+//! that are baked into its IR. The algorithm for a single virtual register is:
+//!
+//! 1. Initialize the live range with a single-instruction snippet of liveness at each def, using
+//! the def-chain. This does not include any phi-values.
+//! 2. Go through the virtual register's use chain and perform the following steps at each use:
+//! 3. Perform an exhaustive depth-first traversal up the CFG from the use. Look for basic blocks
+//! that already contain some liveness and extend the last live SSA value in the block to be
+//! live-out. Also build a list of new basic blocks where the register needs to be live-in.
+//! 4. Iteratively propagate live-out SSA values to the new live-in blocks. This may require new
+//! PHI values to be created when different SSA values can reach the same block.
+//!
+//! The iterative SSA form reconstruction can be skipped if the depth-first search only encountered
+//! one SSA value.
+//!
+//! This algorithm has some advantages compared to the data-flow equations:
+//!
+//! - The live ranges of local virtual registers are computed very quickly without ever traversing
+//! the CFG. The memory needed to store these live ranges is independent of the number of basic
+//! blocks in the program.
+//! - The time to compute the live range of a global virtual register is proportional to the number
+//! of basic blocks covered. Many virtual registers only cover a few blocks, even in very large
+//! functions.
+//! - A single live range can be recomputed after making modifications to the IR. No global
+//! algorithm is necessary. This feature depends on having use-def chains for virtual registers
+//! which Cranelift doesn't have.
+//!
+//! Cranelift uses data structures and algorithms very similar to LLVM's, with the important
+//! difference that live ranges are computed per SSA value instead of per virtual register, and
+//! that uses in Cranelift IR refer to SSA values instead of virtual registers. This means that
+//! Cranelift can skip the last step of reconstructing SSA form for the virtual register uses.
+//!
+//! ## Fast Liveness Checking for SSA-Form Programs
+//!
+//! A liveness analysis that is often brought up in the context of SSA-based register allocation
+//! was presented at CGO 2008:
+//!
+//! > Boissinot, B., Hack, S., Grund, D., de Dinechin, B. D., & Rastello, F. (2008). *Fast Liveness
+//! Checking for SSA-Form Programs.* CGO.
+//!
+//! This analysis uses a global pre-computation that only depends on the CFG of the function. It
+//! then allows liveness queries for any (value, program point) pair. Each query traverses the use
+//! chain of the value and performs lookups in the precomputed bit-vectors.
+//!
+//! I did not seriously consider this analysis for Cranelift because:
+//!
+//! - It depends critically on use chains which Cranelift doesn't have.
+//! - Popular variables like the `this` pointer in a C++ method can have very large use chains.
+//! Traversing such a long use chain on every liveness lookup has the potential for some nasty
+//! quadratic behavior in unfortunate cases.
+//! - It says "fast" in the title, but the paper only claims to be 16% faster than a data-flow
+//! based approach, which isn't that impressive.
+//!
+//! Nevertheless, the property of only depending on the CFG structure is very useful. If Cranelift
+//! gains use chains, this approach would be worth a proper evaluation.
+//!
+//!
+//! # Cranelift's liveness analysis
+//!
+//! The algorithm implemented in this module is similar to LLVM's with these differences:
+//!
+//! - The `LiveRange` data structure describes the liveness of a single SSA value, not a virtual
+//! register.
+//! - Instructions in Cranelift IR contain references to SSA values, not virtual registers.
+//! - All live ranges are computed in one traversal of the program. Cranelift doesn't have use
+//! chains, so it is not possible to compute the live range for a single SSA value independently.
+//!
+//! The liveness computation visits all instructions in the program. The order is not important for
+//! the algorithm to be correct. At each instruction, the used values are examined.
+//!
+//! - The first time a value is encountered, its live range is constructed as a dead live range
+//! containing only the defining program point.
+//! - The local interval of the value's live range is extended so it reaches the use. This may
+//! require creating a new live-in local interval for the block.
+//! - If the live range became live-in to the block, add the block to a work-list.
+//! - While the work-list is non-empty, pop a live-in block and repeat the two steps above, using each
+//! of the live-in block's CFG predecessor instructions as a 'use'.
+//!
+//! The effect of this algorithm is to extend the live range of each value to reach uses as they
+//! are visited. No data about each value beyond the live range is needed between visiting uses,
+//! so nothing is lost by computing the live range of all values simultaneously.
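+//!
+//! As a small worked example, consider a value defined in `block0` and used only in `block2`
+//! (schematic Cranelift IR):
+//!
+//! ```text
+//! block0:
+//!     v0 = iconst.i32 1
+//!     jump block1
+//! block1:
+//!     jump block2
+//! block2:
+//!     v1 = iadd v0, v0    ; only use of v0
+//! ```
+//!
+//! Visiting the use in `block2` makes `v0` live-in there, so `block2` goes on the work-list.
+//! Popping `block2` extends `v0` to the branch in its predecessor `block1`, which therefore also
+//! becomes live-in and is pushed. Popping `block1` extends `v0`'s local interval in `block0` from
+//! the def to the branch; since `block0` already contains the def, it does not become live-in and
+//! the work-list empties.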
+//!
+//! ## Cache efficiency of Cranelift vs LLVM
+//!
+//! Since LLVM computes the complete live range of a virtual register in one go, it can keep the
+//! whole `LiveInterval` for the register in L1 cache. Since it is visiting the instructions in use
+//! chain order, some cache thrashing can occur as a result of pulling instructions into cache
+//! somewhat chaotically.
+//!
+//! Cranelift uses a transposed algorithm, visiting instructions in order. This means that each
+//! instruction is brought into cache only once, and it is likely that the other instructions on
+//! the same cache line will be visited before the line is evicted.
+//!
+//! Cranelift's problem is that the `LiveRange` structs are visited many times and not always
+//! regularly. We should strive to make the `LiveRange` struct as small as possible such that
+//! multiple related values can live on the same cache line.
+//!
+//! - Local values should fit in a 16-byte `LiveRange` struct or smaller. The current
+//! implementation contains a 24-byte `Vec` object and a redundant `value` member pushing the
+//! size to 32 bytes.
+//! - Related values should be stored on the same cache line. The current sparse set implementation
+//! does a decent job of that.
+//! - For global values, the list of live-in intervals is very likely to fit on a single cache
+//! line. These lists are very likely to be found in L2 cache at least.
+//!
+//! There is some room for improvement.
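+//!
+//! As a sketch only (a hypothetical test, not something this module contains), the 16-byte goal
+//! could be pinned down like this once the struct has been shrunk:
+//!
+//! ```ignore
+//! #[test]
+//! fn live_range_is_compact() {
+//!     // Today this would have to read `<= 32`; the target stated above is 16 bytes.
+//!     assert!(core::mem::size_of::<LiveRange>() <= 16);
+//! }
+//! ```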
+
+use crate::entity::SparseMap;
+use crate::flowgraph::{BlockPredecessor, ControlFlowGraph};
+use crate::ir::dfg::ValueDef;
+use crate::ir::{Block, Function, Inst, Layout, ProgramPoint, Value};
+use crate::isa::{EncInfo, OperandConstraint, TargetIsa};
+use crate::regalloc::affinity::Affinity;
+use crate::regalloc::liverange::LiveRange;
+use crate::timing;
+use alloc::vec::Vec;
+use core::mem;
+use core::ops::Index;
+
+/// A set of live ranges, indexed by value number.
+type LiveRangeSet = SparseMap<Value, LiveRange>;
+
+/// Get a mutable reference to the live range for `value`.
+/// Create it if necessary.
+fn get_or_create<'a>(
+    lrset: &'a mut LiveRangeSet,
+    value: Value,
+    isa: &dyn TargetIsa,
+    func: &Function,
+    encinfo: &EncInfo,
+) -> &'a mut LiveRange {
+    // It would be better to use `get_mut()` here, but that leads to borrow checker fighting
+    // which can probably only be resolved by non-lexical lifetimes.
+    // https://github.com/rust-lang/rfcs/issues/811
+    if lrset.get(value).is_none() {
+        // Create a live range for value. We need the program point that defines it.
+        let def;
+        let affinity;
+        match func.dfg.value_def(value) {
+            ValueDef::Result(inst, rnum) => {
+                def = inst.into();
+                // Initialize the affinity from the defining instruction's result constraints.
+                // Don't do this for call return values which are always tied to a single register.
+                affinity = encinfo
+                    .operand_constraints(func.encodings[inst])
+                    .and_then(|rc| rc.outs.get(rnum))
+                    .map(Affinity::new)
+                    .or_else(|| {
+                        // If this is a call, get the return value affinity.
+                        func.dfg
+                            .call_signature(inst)
+                            .map(|sig| Affinity::abi(&func.dfg.signatures[sig].returns[rnum], isa))
+                    })
+                    .unwrap_or_default();
+            }
+            ValueDef::Param(block, num) => {
+                def = block.into();
+                if func.layout.entry_block() == Some(block) {
+                    // The affinity for entry block parameters can be inferred from the function
+                    // signature.
+                    affinity = Affinity::abi(&func.signature.params[num], isa);
+                } else {
+                    // Give normal block parameters a register affinity matching their type.
+                    let rc = isa.regclass_for_abi_type(func.dfg.value_type(value));
+                    affinity = Affinity::Reg(rc.into());
+                }
+            }
+        };
+        lrset.insert(LiveRange::new(value, def, affinity));
+    }
+    lrset.get_mut(value).unwrap()
+}
+
+/// Extend the live range for `value` so it reaches `to` which must live in `block`.
+fn extend_to_use(
+    lr: &mut LiveRange,
+    block: Block,
+    to: Inst,
+    worklist: &mut Vec<Block>,
+    func: &Function,
+    cfg: &ControlFlowGraph,
+) {
+    // This is our scratch working space, and we'll leave it empty when we return.
+    debug_assert!(worklist.is_empty());
+
+    // Extend the range locally in `block`.
+    // If there already was a live interval in that block, we're done.
+    if lr.extend_in_block(block, to, &func.layout) {
+        worklist.push(block);
+    }
+
+    // The work list contains those blocks where we have learned that the value needs to be
+    // live-in.
+    //
+    // This algorithm becomes a depth-first traversal up the CFG, enumerating all paths through the
+    // CFG from the existing live range to `block`.
+    //
+    // Extend the live range as we go. The live range itself also serves as a visited set since
+    // `extend_in_block` will never return true twice for the same block.
+    //
+    while let Some(livein) = worklist.pop() {
+        // We've learned that the value needs to be live-in to the `livein` block.
+        // Make sure it is also live at all predecessor branches to `livein`.
+        for BlockPredecessor {
+            block: pred,
+            inst: branch,
+        } in cfg.pred_iter(livein)
+        {
+            if lr.extend_in_block(pred, branch, &func.layout) {
+                // This predecessor block also became live-in. We need to process it later.
+                worklist.push(pred);
+            }
+        }
+    }
+}
+
+/// Liveness analysis for a function.
+///
+/// Compute a live range for every SSA value used in the function.
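+///
+/// # Example (sketch)
+///
+/// ```ignore
+/// // Minimal sketch of driving the analysis; `isa`, `func`, `cfg`, and `value`
+/// // are assumed to exist already and are not constructed here.
+/// let mut liveness = Liveness::new();
+/// liveness.compute(isa, &mut func, &cfg);
+/// if let Some(lr) = liveness.get(value) {
+///     // `lr` describes where `value` is live and its register/stack affinity.
+///     let _ = lr;
+/// }
+/// ```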
+pub struct Liveness {
+    /// The live ranges that have been computed so far.
+    ranges: LiveRangeSet,
+
+    /// Working space for the `extend_to_use` algorithm.
+    /// This vector is always empty, except for inside that function.
+    /// It lives here to avoid repeated allocation of scratch memory.
+    worklist: Vec<Block>,
+}
+
+impl Liveness {
+    /// Create a new empty liveness analysis.
+    ///
+    /// The memory allocated for this analysis can be reused for multiple functions. Use the
+    /// `compute` method to actually run the analysis for a function.
+    pub fn new() -> Self {
+        Self {
+            ranges: LiveRangeSet::new(),
+            worklist: Vec::new(),
+        }
+    }
+
+    /// Current live ranges.
+    pub fn ranges(&self) -> &LiveRangeSet {
+        &self.ranges
+    }
+
+    /// Clear all data structures in this liveness analysis.
+    pub fn clear(&mut self) {
+        self.ranges.clear();
+        self.worklist.clear();
+    }
+
+    /// Get the live range for `value`, if it exists.
+    pub fn get(&self, value: Value) -> Option<&LiveRange> {
+        self.ranges.get(value)
+    }
+
+    /// Create a new live range for `value`.
+    ///
+    /// The new live range will be defined at `def` with no extent, like a dead value.
+    ///
+    /// This asserts that `value` does not have an existing live range.
+    pub fn create_dead<PP>(&mut self, value: Value, def: PP, affinity: Affinity)
+    where
+        PP: Into<ProgramPoint>,
+    {
+        let old = self
+            .ranges
+            .insert(LiveRange::new(value, def.into(), affinity));
+        debug_assert!(old.is_none(), "{} already has a live range", value);
+    }
+
+    /// Move the definition of `value` to `def`.
+    ///
+    /// The old and new def points must be in the same block, and before the end of the live range.
+    pub fn move_def_locally<PP>(&mut self, value: Value, def: PP)
+    where
+        PP: Into<ProgramPoint>,
+    {
+        let lr = self.ranges.get_mut(value).expect("Value has no live range");
+        lr.move_def_locally(def.into());
+    }
+
+    /// Locally extend the live range for `value` to reach `user`.
+    ///
+    /// It is assumed that `value` is already live before `user` in `block`.
+    ///
+    /// Returns a mutable reference to the value's affinity in case that also needs to be updated.
+    pub fn extend_locally(
+        &mut self,
+        value: Value,
+        block: Block,
+        user: Inst,
+        layout: &Layout,
+    ) -> &mut Affinity {
+        debug_assert_eq!(Some(block), layout.inst_block(user));
+        let lr = self.ranges.get_mut(value).expect("Value has no live range");
+        let livein = lr.extend_in_block(block, user, layout);
+        debug_assert!(!livein, "{} should already be live in {}", value, block);
+        &mut lr.affinity
+    }
+
+    /// Change the affinity of `value` to `Stack` and return the previous affinity.
+    pub fn spill(&mut self, value: Value) -> Affinity {
+        let lr = self.ranges.get_mut(value).expect("Value has no live range");
+        mem::replace(&mut lr.affinity, Affinity::Stack)
+    }
+
+    /// Compute the live ranges of all SSA values used in `func`.
+    /// This clears out any existing analysis stored in this data structure.
+    pub fn compute(&mut self, isa: &dyn TargetIsa, func: &mut Function, cfg: &ControlFlowGraph) {
+        let _tt = timing::ra_liveness();
+        self.ranges.clear();
+
+        // Get ISA data structures used for computing live range affinities.
+        let encinfo = isa.encoding_info();
+        let reginfo = isa.register_info();
+
+        // The liveness computation needs to visit all uses, but the order doesn't matter.
+        // TODO: Perhaps this traversal of the function could be combined with a dead code
+        // elimination pass if we visit a post-order of the dominator tree?
+        for block in func.layout.blocks() {
+            // Make sure we have created live ranges for dead block parameters.
+            // TODO: If these parameters are really dead, we could remove them, except for the
+            // entry block which must match the function signature.
+            for &arg in func.dfg.block_params(block) {
+                get_or_create(&mut self.ranges, arg, isa, func, &encinfo);
+            }
+
+            for inst in func.layout.block_insts(block) {
+                // Eliminate all value aliases; they would confuse the register allocator.
+                func.dfg.resolve_aliases_in_arguments(inst);
+
+                // Make sure we have created live ranges for dead defs.
+                // TODO: When we implement DCE, we can use the absence of a live range to indicate
+                // an unused value.
+                for &def in func.dfg.inst_results(inst) {
+                    get_or_create(&mut self.ranges, def, isa, func, &encinfo);
+                }
+
+                // Iterator of constraints, one per value operand.
+                let encoding = func.encodings[inst];
+                let operand_constraint_slice: &[OperandConstraint] =
+                    encinfo.operand_constraints(encoding).map_or(&[], |c| c.ins);
+                let mut operand_constraints = operand_constraint_slice.iter();
+
+                for &arg in func.dfg.inst_args(inst) {
+                    // Get the live range, create it as a dead range if necessary.
+                    let lr = get_or_create(&mut self.ranges, arg, isa, func, &encinfo);
+
+                    // Extend the live range to reach this use.
+                    extend_to_use(lr, block, inst, &mut self.worklist, func, cfg);
+
+                    // Apply operand constraint, ignoring any variable arguments after the fixed
+                    // operands described by `operand_constraints`. Variable arguments are either
+                    // block arguments or call/return ABI arguments.
+                    if let Some(constraint) = operand_constraints.next() {
+                        lr.affinity.merge(constraint, &reginfo);
+                    }
+                }
+            }
+        }
+    }
+}
+
+impl Index<Value> for Liveness {
+    type Output = LiveRange;
+    fn index(&self, index: Value) -> &LiveRange {
+        self.ranges
+            .get(index)
+            .unwrap_or_else(|| panic!("{} has no live range", index))
+    }
+}