Compiler Passes

Warning

This page has not been revised yet since modularization and refactoring, and may thus be out of date.

Most of the passes in their function and implementation are platform independent, deriving their platform dependent information from options and/or the configuration file. This holds also for mapping, although one wouldn’t think so first, since it is called from the platform dependent part of the compiler now. All passes like this are summarized below first and are described extensively platform independently later in this section.

Note

Some passes are called from the platform independent compiler, other ones from the back-end compiler. That is a platform dependent issue and therefore described with the platform.

Their description includes:

  • their API, including their name: in general, apart from their name, they take all parameters from the program context which includes the configuration file and the platform, the options, and the vector of kernels with their circuits

  • the Intermediate Representation (IR) they expect as input and what they update in the IR; in general, they should accept any IR, so all types of gates, quantum as well as classical

  • the particular options they listen to; there usually is an option to disable it; also there are ways to dump the IR before and/or after it although this is not generally possible yet

  • the function they perform, in terms of the IR and the options

Other passes, of which the implementation (i.e. source code etc.) is platform dependent, can be found with the platforms. An example of the latter passes is QISA (i.e. instruction) generation in CC-Light. In the lists below, these passes are indicated to be platform dependent.

Passes have some general facilities available to them; these are not passes themselves since they don’t transform the IR. Examples of such facilities are:

  • ql::report::report_qasm(prog_name, kernels, platform, relplacename, passname):

    Writing the IR out in an external representation (QASM 1.0) to a file when option write_qasm_files has the value yes. The file is stored in the default output directory; the name of the file is composed from the program name (prog_name), the place relative to the pass (relplacename), and the pass name (passname), all separated by _, and the result suffixed by .qasm. The pass indicates before or after which the IR is written to file. The place relative to the pass indicates e.g. in or out, meaning before or after the pass, respectively. In this way, multiple qasm files can be written per compile, and be easily related to the point in compilation where the writing was done.

  • ql::report::report_bundles(prog_name, kernels, platform, relplacename, passname):

    Identical to report_qasm but the QASM is written as bundles.

  • ql::report::report_statistics(prog_name, kernels, platform, relplacename, passname, prefix):

    Identical to report_qasm but the IR itself is not written but a summary of it, e.g. the number of kernels, the numer of one-qubit, two-qubit and more-qubit gates, which qubits were used and which not, the wall clock time that compilation took until this point, etc. This is done for each kernel separately and for the whole program; additional interfaces are available for making the individual reports and adding pass specific lines to the reports. The prefix string is prepended to each line in the report file, e.g. to make it qasm comment. Furthermore the suffix is .report. And writing the report is only done when option write_report_files has the value yes.

  • ql::utils::write_file(filename, contentstring):

    Writing a content string to the file with given filename in the default output directory for off-line inspection. An example is writing (in dot format) the gate dependence graph which is a scheduling pass internal data structure. The writing to a file of a string is a general facility but the generation of the string representation of the internal data structure is pass dependent. The options controlling this are also pass specific.

Writing the IR out to a file in a form suitable for a particular subsequent tool such as quantumsim is considered code generation for the quantumsim platform and is therefore considered a pass.

Note

A compiler pass is not something defined in OpenQL. It should be. Passes then have a standard API, standard intermediate representation dumpers before and after them, a standard way to include them in the compiler. We could have the list of passes to call be something defined in the configuration file, perhaps with the places where we want to have dumps and reports.

Summary of compiler passes

Compiler passes in OpenQL are the compiler elements that, when called one after the other, gradually transform the OpenQL input program to some platform defined output program. The following passes are available and usually called in this order. More detailed information on each can be found in the sections below.

When it is indicated that a pass is CC-Light (or any other platform) dependent, it means that its implementation with respect to source code is platform dependent. A pass of which the source code is platform independent, can behave platform dependently by its parameterization by the platform configuration file.

  • program reading

    not a real pass now; it covers the code that for a particular program sets its options, connects it to a platform, defines its program parameters such as number of qubits, defines its kernels, and defines its gates; in the current OpenQL implementation this is all code upto and including the call to p.compile(). See Input external representation and Creating your first program.

  • optimize

    attempts to find contigous sequences of quantum gates that are equivalent to identity (within some small epsilon which currently is 10 to the power -4) and then take those sequences out of the circuit; this relies on the function of each gate to be defined in its mat field as a matrix. See Optimization.

  • decompose_toffoli

    each toffoli gate in the IR is replaced by a gate sequence with at most two-qubit gates; depending on the value of the equally named option; it does this in the Neilsen and Chuang way (NC), or in the way as in https://arxiv.org/pdf/1210.0974,pdf (AM). See Decomposition.

  • unitary decomposition

    the unitary decomposition pass is not generally available yet; it is in some private OpenQL branch. See Decomposition.

  • scheduling

    of each kernel’s circuit the gates are scheduled at a particular cycle starting from 0 (by filling in the gate’s cycle attribute) that matches the gates’ dependences, their duration, the constraints imposed by their resource use, the buffer values defined for the platform, and the latency value defined for each gate; multiple gates may start in the same cycle; in the resulting circuits (which are vectors of pointers to gate) the gates are ordered by their cycle value. The schedulers also produce a bundled version of each circuit: the circuit is then represented by a vector of bundles in which each bundle lists the gates that are to be started in the same cycle; each bundle further contains sublists that combines gates with the same operation but with different operands. The resource-constrained and non-constrained versions of the scheduler have different entry points (currently). The latter only considers the gates’ dependences and their duration, which is sufficient as input to QX. Next to the above necessary constraints, the remaining freedom is defined by a scheduling strategy which is defined by the scheduler option value: ASAP, ALAP and some other options. See Scheduling.

  • decomposition before scheduling (CC-Light dependent)

    classical non-primitive gates are decomposed to primitives (e.g. eq is transformed to cmp followed by an empty cycle and an fbr_eq); after measurements an fmr is inserted provided the measurement had a classical register operand. See Decomposition.

  • clifford optimization

    dependency chains of one-qubit clifford gates operating on the same qubit are replaced by equivalent sequences of primitive gates when the latter leads to a shorter execution time. Clifford gates are recognized by their name and use is made of the property that clifford gates form a group of 24 elements. Clifford optimization is called before and after the mapping pass. See Optimization.

  • mapping

    the circuits of all kernels are transformed such that for any two-qubit gate the operand qubits are connected (are NN, Nearest Neighbor) in the platform’s topology; this is done by a kernel-level initial placement pass and when it fails, by a subsequent heuristic; the heuristic essentially transforms each circuit from start to end; doing this, it maintains a map from virtual (program) qubits to real qubits (v2r); each time that it encounters a two-qubit gate that in the current map is not NN, it inserts swap gates before this gate that gradually make the operand qubits NN; when inserting a swap, it updates the v2r map accordingly. There are many refinements to this algorithm that can be controlled through options and the configuration file. It is not complete in the sense that it ignores transfer of the v2r map between kernels. See Mapping.

  • rcscheduler

    resource constraints are taken into account; the result reflects the timing required during execution, i.e. also taking into account any further non-OpenQL passes and run-time stages such as (for CC_Light):

    • QISA assembly

    • classical code execution (from here on these passes are executed as run-time stages)

    • quantum microcode generation

    • micro operation to signal and microwave conversion

    • execution unit reprogramming and inter operation reset times

    • signal communication line delays

    • execution time and feed-back delays

    The resulting circuit is stored in the usual manner and as a sequence of bundles. See Scheduling.

  • decomposition after scheduling (CC-Light dependent)

    two-qubit flux gates are decomposed to a series of one-qubit flux gates of the form sqf q0 to be executed in the same cycle; this is done only when the cz_mode option has the value auto; such a gate is generated for each operand and for all qubits that need to be detuned; see the detuned_qubits resource description in the CC-Light platform configuration file for details. See Decomposition.

  • opcode and control store file generation (CC-Light dependent)

    currently disabled as not used by CC-Light

  • write_quantumsim_program

    writes the current IR as a python script that interfaces with quantumsim

  • write_qsoverlay_program

    writes the current IR as a python script that interfaces with the qsoverlay module of quantumsim

  • QISA generation (CC-Light dependent)

    • bundle to QISA translation

      • deterministic sorting of gates per bundle

      • instruction prefix and wait instruction insertion

      • classical gate to QISA classical instruction translation

      • SOMQ generation and mask to mask register assignment (should include mask instruction generation)

      • insertion of wait states between meas and fmr (should be done by scheduler)

    • mask instruction generation

    • QISA file writing

    See Platforms and architectures.

Decomposition

Decomposition of gates [TBD]

Control decomposition

Entry points

The following entry points are supported:

  • entry() TBD

Input and output intermediate representation

TBD.

Options

The following options are supported:

  • option TBD

Function

TBD

Unitary decomposition

Unitary decomposition allows a developer of quantum algorithms to specify a quantum gate as a unitary matrix, which is then split into a circuit consisting of ry, rz and cnot gates.

To use it, define a Unitary with a name and a (complex) list containing all the values in the unitary matrix in order from the top left to the bottom right. The matrix needs to be unitary to be a valid quantum gate, otherwise an error will be raised by the compilation step.

Name

operands

C++ type

example

Unitary

name

string

“U_name”

unitary matrix

vector<complex<double>>

[0.5+0.5j,0.5-0.5j,0.5-0.5j,0.5+0.5j]

The unitary is first decomposed, by calling the .decompose() function on it. Only then can it be added to the kernel as a normal gate to the number of qubits corresponding to the unitary matrix size. This looks like:

u1 = ql.Unitary("U_name", [0.5+0.5j,0.5-0.5j,0.5-0.5j,0.5+0.5j])
u1.decompose()
k.gate(u1, [0])

Which generates this circuit:

rz q[0], -1.570796
ry q[0], -1.570796
rz q[0], 1.570796

The circuit generated might also have different angles, though not different gates, and result in the same effect on the qubits, this is because a matrix can have multiple valid decompositions.

For a two-qubit unitary gate or matrix, it looks like:

list_matrix = [1, 0       , 0       , 0,
               0, 0.5+0.5j, 0.5-0.5j, 0,
               0, 0.5-0.5j, 0.5+0.5j, 0,
               0, 0       , 0       , 1]
u1 = ql.Unitary("U_name", list_matrix)
u1.decompose()
k.gate(u1, [0,1])

This generates a circuit of 24 gates of which 6 cnots, spanning qubits 0 and 1. The rest are ry and rz gates on both qubits, which looks like this:

rz q[0], -0.785398
ry q[0], -1.570796
rz q[0], -3.926991
rz q[1], -0.785398
cnot q[0],q[1]
rz q[1], 1.570796
cnot q[0],q[1]
rz q[0], 2.356194
ry q[0], -1.570796
rz q[0], -3.926991
ry q[1], 0.785398
cnot q[0],q[1]
ry q[1], 0.785398
cnot q[0],q[1]
rz q[0], -0.000000
ry q[0], -1.570796
rz q[0], 3.926991
rz q[1], 0.785398
cnot q[0],q[1]
rz q[1], -1.570796
cnot q[0],q[1]
rz q[0], 3.926991
ry q[0], -1.570796
rz q[0], -2.356194

The unitary gate has no limit in how many qubits it can apply to. But the matrix size for an n-qubit gate scales as 2^n*2^n, which means the number of elements in the matrix scales with 4^n. This is also the scaling rate of the execution time of the decomposition algorithm and of the number of gates generated in the circuit. Caution is advised for decomposing large matrices both for compilation time and for the size of the resulting quantum circuit.

More detailed information can be found at http://resolver.tudelft.nl/uuid:9c60d13d-4f42-4d8b-bc23-5de92d7b9600

Decomposition before scheduling

Entry points

The following entry points are supported:

  • entry() TBD

Input and output intermediate representation

TBD.

Options

The following options are supported:

  • option TBD

Function

TBD

Decomposition after scheduling

Entry points

The following entry points are supported:

  • entry() TBD

Input and output intermediate representation

TBD.

Options

The following options are supported:

  • option TBD

Function

TBD

Decompose_toffoli

Entry points

The following entry points are supported:

  • entry() TBD

Input and output intermediate representation

TBD.

Options

The following options are supported:

  • option TBD

Function

TBD

Optimization

Optimization of circuits [TBD]

Optimize

attempts to find contigous sequences of quantum gates that are equivalent to identity (within some small epsilon which currently is 10 to the power -4) and then take those sequences out of the circuit; this relies on the function of each gate to be defined in its mat field as a matrix.

Entry points

The following entry points are supported:

  • entry() TBD

Input and output intermediate representation

TBD.

Options

The following options are supported:

  • option TBD

Function

TBD

Clifford optimization

dependency chains of one-qubit clifford gates operating on the same qubit are replaced by equivalent sequences of primitive gates when the latter leads to a shorter execution time. Clifford gates are recognized by their name and use is made of the property that clifford gates form a group of 24 elements. Clifford optimization is called before and after the mapping pass.

Entry points

The following entry points are supported:

  • entry() TBD

Input and output intermediate representation

TBD.

Options

The following options are supported:

  • option TBD

Function

TBD

Scheduling

Of each kernel’s circuit the gates are scheduled at a particular cycle starting from 0 (by filling in the gate’s cycle attribute) that matches the gates’ dependences, their duration, the constraints imposed by their resource use, the buffer values defined for the platform, and the latency value defined for each gate; multiple gates may start in the same cycle; in the resulting circuits (which are vectors of pointers to gate) the gates are ordered by their cycle value. The schedulers also produce a bundled version of each circuit; see Circuits and bundles in the internal representation.

The resource-constrained and non-constrained versions of the scheduler have different entry points (currently). The latter only considers the gates’ dependences and their duration, which is sufficient as input to QX. Next to the above necessary constraints, the remaining freedom is defined by a scheduling strategy which is defined by the scheduler option value: ASAP, ALAP and some other options.

Entry points

The following two entry points are supported, one for the non-constrained and one for the resource-constrained scheduler:

  • p.schedule() In the context of program object p, this method schedules the circuits of the kernels of the program, according to a strategy specified by the scheduling options, but without taking resource constraints, buffers and latency compensation of the platform into account.

  • bundles = cc_light_schedule_rc(circuit, platform, num_qubits, num_creg) In the context of the cc_light_eqasm_compiler, a derived class of the eqasm_compiler class, in its compile(prog_name, kernels, platform) method, inside a loop over the specified kernels, the resource-constrained scheduler is called to schedule the specified circuit, according to a strategy specified by the scheduling options, and taking resource constraints, buffers and latency compensation of the platform into account. It creates a bundled version of the IR and returns it.

Note

These entry points need to be harmonized to fit in the generalized pass model: same class, program-level interface, no result except in IR, buffer and latency compensation split off to separate passes.

The above entry points each create a sched object of class Scheduler and call a selection of its methods:

  • sched.init(circuit, platform, num_qubits, num_creg) A dependence graph representation of the specified circuit is constructed. This graph is a Directed Acyclic Graph (DAG). In this graph, the nodes represent the gates and the directed edges the dependences. The top of the graph is a newly created SOURCE gate, the bottom is a newly created SINK gate. With respect to dependences, the SOURCE and SINK gates behave as if they update all qubits and classical registers with 0 duration. Gates are added in the order of presence in the circuit and linked in dependence chains according to their operation and operands.

    The nodes have as attributes (apart from the gate’s attributes):

    • name with the qasm string representation of the gate (such as cnot q[1],q[2])

    The edges have as attributes:

    • weight representing the number of cycles needed from the start of execution of the gate at the source of the edge, to the start of execution of the gate at the target of the edge; this value is initialized from the duration attribute of the gate

    • cause representing the qubit or classical register causing the dependence

    • depType representing the type of the dependence

    The latter two attributes are currently only used internally in the dependence graph construction.

    This sched.init method is called by both entry points for each circuit of the program.

  • bundles = sched.schedule_asap(sched_dot) The cycle attributes of the gates are initialized consistent with an ASAP (i.e. downward) walk over the dependence graph. Subsequently, the gates in the circuit are sorted by their cycle value; and the bundler called to produce a bundled version of the IR to return.

    This method is called by p.schedule() for each circuit of the program when non-uniform ASAP scheduling.

  • bundles = sched.schedule_alap(sched_dot) The cycle attributes of the gates are initialized consistent with an ALAP (i.e. upward) walk over the dependence graph. Subsequently, the gates in the circuit are sorted by their cycle value; and the bundler called to produce a bundled version of the IR to return.

    This method is called by p.schedule() for each circuit of the program when non-uniform ALAP scheduling.

  • bundles = sched.schedule_alap_uniform() The cycle attributes of the gates are initialized consistent with a uniform ALAP schedule: this modified ALAP schedule aims to have an equal number of gates starting in each non-empty bundle. Subsequently, the gates in the circuit are sorted by their cycle value; and the bundler called to produce a bundled version of the IR to return.

    This method is called by p.schedule() for each circuit of the program when uniform and ALAP scheduling.

  • bundles = sched.schedule_asap(resource_manager, platform, sched_dot)

    This method is called by cc_light_schedule_rc after calling sched.init, and creation of the resource manager for each circuit of the program when non-uniform ASAP scheduling. See Function for a more extensive description.

  • bundles = sched.schedule_alap(resource_manager, platform, sched_dot)

    This method is called by cc_light_schedule_rc after calling sched.init, and creation of the resource manager for each circuit of the program when non-uniform ALAP scheduling. See Function for a more extensive description.

In the sched_dot parameter of the methods above a dot representation of the dependence graph of the kernel’s circuit is constructed, in which the gates are ordered along a timeline according to their cycle attribute.

Input and output intermediate representation

The schedulers expect kernels with or without a circuit. When with a circuit, the cycle attribute need not be valid. Gates that are supported on input are one-qubit measure, no-operand display, any classical gate, cnot, cz/cphase, and any other quantum and scheduling gate.

They produce a circuit with the same gates (but potentially differently ordered). The cycle attribute of each gate has been defined. The gates in the circuit are ordered with non-decreasing cycle value. The cycle values are consistent with all constraints imposed during scheduling and with the scheduling strategy that has been specified through the options or by selection of the entry point.

Note

There are no gates for control flow; so these are not defined in the configuration file; these are not scheduled in the usual way; these are not translated to QASM and external representations in the usual way. See Kernel.

Options

The following options are supported:

  • scheduler With the value ASAP, the scheduler creates a forward As Soon As Possible schedule of the circuit. With the value ALAP, the scheduler creates a backward As Soon As Possible schedule which is equivalent to a forward As Late As Possible schedule of the circuit. Default value is ALAP.

  • scheduler_uniform With the value yes, the scheduler creates a uniform schedule of the circuit. With the value no, it doesn’t. Default value is no.

  • scheduler_commute With the value yes, the scheduler exploits commutation rules for cnot, and cz/cphase to have more scheduling freedom to aim for a shorter latency circuit. With the value no, it doesn’t. Default value is no.

  • output_dir The value is the name of the directory which should be present in the current directory during execution of OpenQL, where all output and report files of OpenQL are created. Default value is test_output.

  • write_qasm_files When it has the value yes, p.schedule produces in the output directory a bundled QASM (see Output external representation) of all kernels in a single file with as name the name of the program followed by _scheduled.qasm.

  • print_dot_graphs When it has the value yes, p.schedule produces in the output directory in multiple files each with as name the name of the kernel followed by _dependence_graph.dot a dot representation of the dependence graph of the kernel’s circuit. Furthermore it produces in the output directory in multiple files each with as name the name of the kernel followed by the value of the scheduler option and _scheduled.dot a dot representation of the dependence graph of the kernel’s circuit, in which the gates are ordered along a timeline according to their cycle attribute.

Note

The options don’t discriminate between the prescheduler and the rcscheduler although these could desire different option values. Also there is not an option to skip this pass.

Function

Scheduling of a circuit starts with creation of the dependence graph; see Entry points for its definition.

Gates that are supported on input are one-qubit measure, no-operand display, any classical gate, cnot, cz/cphase, and any other quantum and scheduling gate. With respect to dependence creation, the latter ones are assumed to use and update each of their operands during the operation; and the former ones each have a specific definition regarding the use and update of their operands:

  • measure also updates its corresponding classical register;

  • display and the classical gates use/update all qubits and classical registers (so these act as barriers);

  • cnot uses and doesn’t update its control operand, and it commutes with cnot/cz/cphase with equal control operand; cnot uses and updates its target operand, it commutes with cnot with equal target operand;

  • cz/cphase commutes with cnot/cz/cphase with equal first operand, and it commutes with cz/cphase with equal second operand. This commutation is exploited to aim for a shorter latency circuit when the scheduler_commute option is in effect.

When scheduling without resource constraints the cycle attributes of the gates are initialized consistent with an ASAP (i.e. downward/forward) or ALAP (i.e. upward/backward) walk over the dependence graph. Subsequently, the gates in the circuit are sorted by their cycle value; and the bundler called to produce a bundled version of the IR to return.

The remaining part of this subsection describes scheduling with resource constraints.

The implementation of this list scheduler is parameterized on doing a forward or a backward schedule. The former is used to create an ASAP schedule and the latter is used to create an ALAP schedule. We here describe the forward case because that is easier to grasp and later come back on the backward case.

A list scheduler maintains at each moment a list of gates that are available for being scheduled because they are not blocked by dependences on non-scheduled gates. Not all gates that are available (not blocked by dependences on non-scheduled gates) can actually be scheduled. It must be made sure in addition that those scheduled gates that it depends on, actually have completed their execution (using its duration) and that the resources are available for it. Furthermore, making a selection from the gates that remain after ignoring these, determines the optimality of the scheduling result. The implemented list scheduler is a critical path scheduler, i.e. it prefers to schedule the most critical gate first. The criticality of a gate estimates the effect that delaying scheduling the gate has on the latency of the resulting circuit, and is determined by computing the length of the longest dependence chain from the gate to the SINK gate; the higher this value, the higher the gate’s scheduling priority in the current cycle is.

The scheduler relies on the dependence graph representation of the circuit. At the start only the SOURCE gate is available. Then one by one, according to a criterion, a gate is selected from the list of available ones and added to the schedule. Having scheduled the gate, it is taken out of the available list; after having scheduled a gate, some new gates may become available because they don’t depend on non-scheduled gates anymore; those gates are found and put in the available list of gates. This continues, filling cycle by cycle from low to high, until the available list gets empty (which happens after scheduling the last gate, the SINK gate).

Above it was mentioned that a gate can only be scheduled in a particular cycle when the resources are available for it. In this, the scheduler relies on the resource manager of the platform. The latter was created and initialized from the platform configuration file before scheduling started. Please refer to cclplatform for a description of the specification of resources of the CC-Light platform. And furthermore note that only the resources that are specified in the platform configuration file determine the resource constraints that apply to the scheduler; recall that for each resource type, several resources can be specified, each of which typically has some kind of exclusive use. The simplest one is the qubits resource type of which there are as many resources as there are qubits. The resource manager maintains a so-called machine state that describes the occupation status of each resource. This resource state typically consists of two elements: the operation type that is using this resource; and the occupation period, which is described by a pair of cycle values, representing the first cycle that it is occupied, and the first cycle that it is free again, respectively.

If a gate is to be scheduled at cycle t, then all the resources for executing the gate are checked to be available from cycle t till (and not including) t plus the gate’s duration in cycles; and when actually committing to scheduling the gate at cycle t, all its resources are set to occupied for the duration of its execution. The resource manager offers methods for this check (bool rm.available()) and commit (rm.reserve()). Doing this check and committing for a particular gate, some additional gate attributes may be required by the resource manager. For the CC-Light resource manager, these additional gate attributes are:

  • operation_name initialized from the configuration file cc_light_instr gate attribute representing the operation of the gate; it is used by the qwgs resource type only; two gates having the same operation_name are assumed to use the same wave form

  • operation_type initialized from the configuration file type gate attribute representing the kind of operation of the gate: mw for rotation gates, readout for measurement gates, and flux for one and two-qubit flux gates; it is used by each resource type

This concludes the description of the involvement of the resource manager in the scheduling of a gate.

The list scheduler algorithm uses a so-called availability list to represent gates that can be scheduled; see above. When the available list becomes empty, all cycle values were assigned and scheduling is almost done. The gates in the circuit are then first sorted on their cycle value.

Then latency compensation is done: for each gate for which in the platform configuration file a latency attribute value is specified, the gate’s cycle value is incremented by this latency value converted to cycles; the latter is usually negative. This mechanism allows to start execution of a gate earlier to compensate for a relative delay in the control electronics that is involved in executing the gate. So in theory, in the quantum hardware, gates which before latency compensation had the same cycle value, also execute in the same cycle. After this, the gates in the circuit are again sorted on their cycle value.

After the bundler has been called to produce a bundled IR, any buffer delays are inserted. Buffer delays can be specified in the platform configuration file in the hardware_settings section. Insertion makes use of the type attribute of the gate in the platform configuration file, the one which can have the values mw, readout and flux. For each bundle, it checks for each gate in the bundle, whether there is a non-zero buffer delay specified with a gate in the previous bundle, and if any, takes the maximum of those buffer delays, and adds it (converted to cycles) to the bundle’s start_cycle attribute. Moreover, when the previous bundle got shifted in time because of earlier bundle delays, the same shift is applied first to the current bundle. In this way, the schedule gets stretched for all qubits at the same time. This is a valid thing to do and doesn’t invalidate dependences nor resource constraints.

Note

Buffer insertion only has effect on the start_cycle attributes of the bundles and not on the cycle attributes of the gates. It would be better to do buffer insertion on the circuit and to do bundling afterwards, so that circuit and bundles are consistent.

In the backward case, the scheduler traverses the dependence graph bottom-up, scheduling the SINK gate first. Gates become available for scheduling at a particular cycle when at that cycle plus its duration all its dependent gates have started execution. And scheduling finishes when the available list is empty, after having scheduled the SOURCE gate. In this, cycles are decremented after having scheduled SINK at some very high cycle value, and later, after having scheduled SOURCE, the cycle values of the gates are consistently shifted down so that SOURCE starts at cycle 0. The resource manager’s state and methods also are parameterized on the scheduling direction.

Scheduling for software platforms

  • Scheduling for qx

  • Scheduling for quantumsim

Scheduling for hardware platforms

  • Scheduling for CC-Light platform

  • Scheduling for CC platform

  • Scheduling for CBox platform

Mapping

The circuits of all kernels are transformed such that after mapping for any two-qubit gate the operand qubits are connected (are NN, Nearest Neighbor) in the platform’s topology; this is done by a kernel-level initial placement and when it fails, by subsequent heuristic routing and mapping. Both maintain a map from virtual (program) qubits to real qubits (v2r) and a map from each real qubit index to its state (rs); both are available after each of the two mapping subpasses.

  • initial placement This module attempts to find a single mapping of the virtual qubits of a circuit to the real qubits (v2r map) of the platform’s qubit topology, that minimizes the sum of the distances between the two mapped operands of all two-qubit gates in the circuit. The distance between two real qubits is the minimum number of swaps that is required to move the state of one of the two qubits to the other. It employs a Mixed Integer Linear Programming (MIP) algorithm to solve the initial placement that is modelled as a Quadratic Assignment Problem. The module can find a mapping that is optimal for the whole circuit, but because its time-complexity is exponential with respect to the size of the circuit, this may take quite some computer time. Also, the result is only really useful when in the mapping found all mapped operands of two-qubit gates are NN. So, there is no guarantee for success: it may take too long and the result may not be optimal.

  • heuristic routing and mapping This module essentially transforms each circuit in a linear scan over the circuit, from start to end, maintaining the v2r and rs maps. Each time that it encounters a two-qubit gate that in the current map is not NN, it inserts swap gates before this gate that make the operand qubits NN (this is called routing the qubits); when inserting a swap, it updates the v2r and rs maps accordingly. There are many refinements to this algorithm that can be controlled through options and the configuration file. The module will find the minimum number of swaps to make the mapped operands of each two-qubit gate NN in the mapping that applies just before it. In the most basic version, it has a linear time-complexity with respect to circuit size and number of qubits. With advanced search options set, the algorithm may become cubic with respect to number of qubits. So, it is still scalable and is guaranteed to find a solution.

The implementation is not complete:

  • In the presence of multiple kernels with control flow among them, the v2r at the start of each kernel must match the v2r at the end of all predecessor kernels: this is not implemented. Instead, the v2r at the start of each kernel is re-initialized freshly, independently of the v2r at the end of predecessor kernels. The current implementation thus assumes that at the end of each kernel all qubits don’t hold a state that must be preserved for a subsequent kernel.

Entry points

Mapping is implemented by a class Mapper with the support of many private other classes among which the scheduler class for obtaining the dependence graph. The following entry points are supported:

  • Mapper() Constructs a new mapper to be used for the whole program. Initialization is left to the Init method.

  • Mapper.Init(platform) Initialize the mapper for the given platform but independently of a particular kernel and circuit. This includes checking and initializing the mapper’s representation of the platform’s topology from the platform’s configuration file.

  • Mapper.Map(kernel) Perform mapping on the kernel, i.e. replace the kernel’s circuit by an equivalent but mapped circuit. Each kernel is mapped independently of any other kernel. Of each gate the cycle attribute is assigned, and the resulting circuit is scheduled; which constraints are obeyed in this schedule depends on the mapping strategy (the value of the mapper attribute). In the argument kernel object, the qubit_count attribute is updated from the number of virtual qubits of the kernel to the number of real qubits as specified by the platform; this is done because in the mapped circuit the qubit operands of all gates will be real qubit indices of which the values should be in the range of the valid real qubit indices of the platform.

    Furthermore, some reporting of internal mapper statistics is done into attributes of the Mapper object. These can be retrieved by the caller of Map:

    • nswapsadded Number of swaps and moves inserted.

    • nmovesadded Number of moves inserted.

    • v2r_in Vector with for each virtual qubit index its mapping to a real qubit index (or UNDEFINED_QUBIT represented by INT_MAX, indicating that the virtual qubit index is not mapped to a real qubit), after initialization and before initial placement and/or heuristic routing and mapping.

    • rs_in Vector with for each real qubit index its state. This vector shows the state after initialization of the mapper and before initial placement and/or heuristic routing and mapping. State values can be:

      • rs_nostate: no statically known quantum state and no dynamically useful quantum state to preserve

      • rs_wasinited: known to be in zero base state (|0>)

      • rs_hasstate: useful but statically unknown quantum state; must be preserved

    • v2r_ip Vector with for each virtual qubit index its mapping to a real qubit index (or UNDEFINED_QUBIT represented by INT_MAX, indicating that the virtual qubit index is not mapped to a real qubit), after initial placement but before heuristic routing and mapping.

    • rs_ip Vector with for each real qubit index its state (see rs_in above for the values), after initial placement but before heuristic routing and mapping.

    • v2r_out Vector with for each virtual qubit index its mapping to a real qubit index (or UNDEFINED_QUBIT represented by INT_MAX, indicating that the virtual qubit index is not mapped to a real qubit), after heuristic routing and mapping.

    • rs_out Vector with for each real qubit index its state (see rs_in above for the values), after heuristic routing and mapping.

Input and output intermediate representation

The mapper expects kernels with or without a circuit. When with a circuit, the cycle attributes of the gates need not be valid. Gates that are supported on input are one-qubit measure, no-operand display, any classical gate, cnot, cz/cphase, and any other quantum and scheduling gate. The mapper refuses multi-qubit quantum gates as input with more than two quantum operands.

The mapper produces a circuit with the same gates but then mapped (see below), with the real qubit operands of two-qubit gates made nearest-neighbor in the platform’s topology, and with additional quantum gates inserted to implement the swapping or moving of qubit states. The mapping of any (quantum, classical, etc.) gate entails replacing the virtual qubit operand indices by the real qubit operand indices corresponding to the mapping of virtual to real qubit indices applicable at the time of execution of the gate; furthermore the gate itself (when a quantum gate) is optionally replaced at the time of its mapping by one or more gates as specified by the platform’s configuration file: if the configuration file contains a definition for a gate with the name of the original gate with _real appended, then that one is created and replaces the original gate. Note that when this created gate is defined in the gate_decomposition section, the net effect is that the specified decomposition is done. When a swap or move gate is created to be inserted in the circuit, first a swap_real (or move_real) is attempted to be created instead before creating a swap or move; this also allows the gate to be decomposed to more primitive gates during mapping.

When a kernel’s circuit has been mapped, an optional final decomposition of the mapped gates is done: each gate is optionally replaced by one or more gates as specified by the platform’s configuration file, by creating a gate with the name of the original gate with _prim appended, if defined in the configuration file, and replacing the original gate by it. Note that when this created gate is specified in the configuration file in the gate_decomposition section, the net effect is that the specified decomposition is done. When in the mapped circuit, swap or move gates were inserted and swap_prim or move_prim are specified in the configuration file, these are also used to replace the swap or move at this time.

The cycle attribute of each gate is assigned a valid value. The gates in the circuit are ordered with non-decreasing cycle value. The cycle values are consistent with the constraints that are imposed during mapping; these are specified by the mapper option.

The above implies that non-quantum gates are accepted on input and are passed unchanged to output.

Options and Function

The options and corresponding function of the mapper are described.

The options include the proper mapper options and a few scheduler options. The subset of the scheduler options applies because the mapper uses the dependence graph created by the initialization method of the scheduler. Also see Options.

Most if not all options can be combined to compose a favorite mapping strategy, i.e. the options are largely independent.

With the options, also the effects that they have on the function of the mapper are described.

The options and function are described in the order of their virtual encountering by a particular gate that is mapped. Please remember that heuristic routing and mapping essentially performs a linear scan over the gates of the circuit to route the qubits, map and transform the gates.

Initialization and configuration

The Init method initializes the mapper for the given platform but independently of a particular kernel and circuit. This includes sanity checking and initializing the mapper’s representation of the platform’s topology from the platform’s configuration file; see Configuration file definitions for mapper control for the description of the platform’s topology.

The topology’s edges define the neighborhood/connection map of the real qubits. Floyd-Warshall is used to compute a distance matrix that contains for each real qubit pair the shortest distance between them. This makes the mapper applicable to arbitrary formed connection graphs but at the same time less scalable in number of qubits. For NISQ systems this is no problem. For larger and more regular connection grids, the implementation contains a provision to replace this by a distance function.

Subsequently, Map is called for each kernel/circuit in the program. It will attempt initial placement and then heuristic routing and mapping. Before anything else, for each kernel again, the v2r and rs are initialized, each under control of an option:

  • mapinitone2one: Definition of the initialization of the v2r map at the start of the mapping of each kernel; this v2r will apply at the start of initial placement.

    • no: there is no initial mapping of virtual to real qubits; each virtual qubit is allocated to the first free real qubit on the fly, when it is mapped

    • yes (default for back-ward compatibility): the initial mapping is 1 to 1: a virtual qubit with index qi is mapped to its real qi counterpart (so: same index)

  • mapassumezeroinitstate: Definition of the initialization of the rs map at the start of the mapping of each kernel; this rs will apply at the start of initial placement. Values can be: rs_nostate (no useful state), rs_wasinited (zero state), and rs_hasstate (useful but unknown state).

    • no (default for back-ward compatibility): each real qubit is assumed not to contain any useful state nor is it known that it is in a particular base state; this corresponds to the state with value rs_nostate.

    • yes (best): each real qubit is assumed to be in a zero state (e.g. |0>) that allows a swap with it to be replaced by a (cheaper) move; this corresponds to the state with value rs_wasinited.

Initial Placement

After initialization and configuration, initial placement is started. See the start of Mapping of a description of initial placement. Since initial placement may take a lot of computer time, provisions have been implemented to time it out; this comes in use during benchmark runs. Initial placement is run under the control of two options:

  • initialplace: Definition of initial placement operation. Initial placement, when run, may be 100% successful (all two-qubit gates were made NN); be moderately successful (not all two-qubit gates were made NN, only some) or fail to find a solution:

    • no (default): no initial placement is attempted

    • yes (best, optimal result): do initial placement starting from the initial v2r mapping; since initial placement employs an Integer Linear Programming model as the base of implementation, finding an initial placement may take quite a while.

    • 1s, 10s, 1m, 10m, 1h (best, limit time, still a result): put a soft time limit on the execution time of initial placement; do initial placement as with yes but limit execution time to the indicated maximum (one second, 10 seconds, one minute, etc.); when it is not successfull in this time, it fails, and subsequently heuristic routing and mapping is started, which cannot fail.

    • 1sx, 10sx, 1mx, 10mx, 1hx: put a hard time limit on the execution time of initial placement; do initial placement as with yes but limit execution time to the indicated maximum (one second, 10 seconds, one minute, etc.); when it is not successfull in this time, it fails, and subsequently the compiler fails as well.

  • initialplace2qhorizon: The initial placement algorithm considers only a specified number of two-qubit gates from the start of the circuit (a horizon) to determine a mapping. This limits computer time but also may make a suboptimal result more useful. Option values are:

    • 0 (default, optimal result): When 0 is specified as option value, there is no limit; all two-qubit gates of the circuit are taken into account.

    • 10, 20, 30, 40, 50, 60, 70, 80, 90, 100: The initial placement algorithm considers only this number of initial two-qubit gates in the circuit to determine a mapping.

Best result would be obtained by running initial placement optionally twice (this is not implemented):

  • Once with a modified model in which only the result with all two-qubit gates NN is successful. When it succeeds, mapping has completed. Depending on the resources one wants to spend on this, a soft time limit could be set.

  • Otherwise, attempt to get a good starting mapping by running initial placement with a soft time limit (of e.g. 1 minute) and with a two-qubit horizon (of e.g. 10 to 20 gates). What ever the result is, run heuristic routing and mapping afterwards.

This concludes initial placement. The v2r and rs at this time are stored in attributes for retrieval by the caller of the Map method. See Input and output intermediate representation.

Heuristic Routing and Mapping

Subsequently heuristic routing and mapping starts for the kernel given in the Map method call.

Dependence Graph and Look-Ahead, Which Gate(s) To Map Next

The mapper optionally uses the dependence graph representation of the circuit to enlarge the number of alternatives it can consider, and to make use of the criticality of gates in the decision which one to map next. To this end, it calls the scheduler’s init method, and sets up the availability list of gates as set of gates to choose from which one to map next: initially it contains just the SOURCE gates. See Scheduling, and below for more information on the availability list’s properties. The mapper listens to the following scheduler options:

  • scheduler_commute: Because the mapper uses the dependence graph that is also generated for the scheduler, the alternatives that are made available by commutation of czs/cnots, can be made available to the mapper:

    • no (default for backward-compatibility): don’t allow two-qubit gates to commute (cz/cnot) in the dependence graph; they are kept in original circuit order and presented to the mapper in this order

    • yes (best): allow commutation of two-qubit cz/cnot gates; e.g. when one isn’t nearest-neighbor but one that comes later in the circuit but commutes with the earlier one is NN now, allow the later one to be mapped before the earlier one

  • print_dot_graphs: When it has the value yes, the mapper produces in the output directory in multiple files each with as name the name of the kernel followed by _mapper.dot a dot representation of the dependence graph of the kernel’s circuit at the start of heuristic routing and mapping, in which the gates are ordered along a timeline according to their cycle attribute.

With the dependence graph available to the mapper, its availability list is used just as in the scheduler:

  • the list at each moment contains those gates that have not been mapped but can be mapped now

  • the availability list forms a kind of cut of the dependence graph: all predecessors of the gates in the list and recursively all their predecessors have been mapped, all other gates have not been mapped (the cut is really the set of dependences between the set of mapped and the set of non-mapped gates)

  • each moment a gate has been mapped, it is taken out of the availability list; those of its successor dependence gates of which all predecessors have been mapped, become available for being mapped, i.e. are added to the availability list

This dependence graph is used to look-ahead, to find which two-qubit to map next, to make a selection from all that are available or take just the most critical one, to try multiple ones and evaluate each alternative to map it, comparing those alternatives against one of the metrics (see later), and even go into recursion (see later as well), i.e. looking farther ahead to see what the effects on subsequent two-qubit gates are when mapping the current one.

In this context the criticality of a gate is an important property of a gate: the criticality of a gate is the length of the longest dependence path from the gate to the SINK gate and is computed in a single linear backward scan over the dependence graph (Dijkstra’s algorithm).

Deciding for the next two-qubit gate to map, is done based on the following option:

  • maplookahead: How does the mapper exploit the lookahead offered by the dependence graph constructed from the input circuit?

    • no: the mapper ignores the dependence graph and takes the gates to be mapped one by one from the input circuit

    • critical: gates that by definition do not need routing, are mapped first (and kind of flushed): these include the classical gates, scheduling gates (such as wait), and the single qubit quantum gates; and of the remaining (only two qubit) quantum gates the most critical gate is selected first to be routed and mapped next; the rationale of taking the most critical gate is that after that one the most cycles are expected until the end of the circuit, and so a wrong routing decision of a critical gate is likely to have most effect on the mapped circuit’s latency; so criticality has higher priority to select the one to be mapped next, than NN (see noroutingfirst for the opposite approach)

    • noroutingfirst (default, best): gates that by definition do not need routing, are mapped first (and kind of flushed): these include the classical gates, scheduling gates (such as wait), and the single qubit quantum gates; in this, this noroutingfirst option has the same effect as critical; but those two qubit quantum gates of which the operands are neighbors in the current mapping are selected to be mapped first, not needing routing, also when these are not critical; and when none such are left, only then take the most critical one; so NN has higher priority to select the one to be mapped next, than criticality

    • all (promising in combination with recursion): as with noroutingfirst but don’t select the most critical one, select them all; so at each moment gates that do not need routing, are mapped first (and kind of flushed); these thus include the NN two-qubit gates; this mapping and flushing stops when only non-NN two-qubit gates remain; instead of selecting one of these to be routed/mapped next, all of these are selected, the decision is postponed; i.e. for all remaining (two qubit non-NN) gates generate alternatives and find the best from these according to the chosen metric (see the mapper option below); and then select that best one to route/map next

Generating Routing Alternatives

Having selected one (or more) two-qubit gates to map next, for each two-qubit gate the routing alternatives are explored. Subsequently, those alternatives will be compared using the selected metric and the best one selected; see further below.

But first the routing alternatives have to be generated. When the mapped operands of a two-qubit gate are not NN, they must be made NN by swapping/moving one or both over nearest-neighbor connections in the target platform’s grid topology towards each other. Only then the two-qubit gate can be executed; the mapper will insert those swaps and moves before the two-qubit gate in the circuit.

There are usually many routes between the qubits. The current implementation only selects the ones with the shortest distance, and these can still be many. In a perfectly rectangular grid, the number of routes is similar to a Fibonaci number depending on the distance decomposed in the x and y directions, and is maximal when the distances in the x and y directions are equal. All shortest paths between two qubits in such a grid stay within a rectangle in the grid with the mapped qubit operands at opposite sides of the diagonal.

A shortest distance leads to a minimal number of swaps and moves. For each route between qubits at a distance d, there are furthermore d possible places in the route where to do the two-qubit gate; the other d-1 places in the route will be a swap or a move.

The implementation supports an arbitrarily formed connection graph, so not only a rectangular grid. All that matter are the distances between the qubits. Those have been computed using Floyd-Warshall from the qubit neighbor relations during initialization of the mapper. The shortests paths are generated in a brute-force way by only navigating to those neighbor qubits that will not make the total end-to-end distance longer. Unlike other implementations that only minimize the number of swaps and for which the routing details are irrelevant, this implementation explicitly generates all alternative paths to allow the more complicated metrics that are supported, to be computed.

The generation of those alternatives is controlled by the following option:

  • mappathselect: When generating alternatives of shortest paths between two real qubits:

    • all (default, best): select all possible alternatives: those following all possible shortest paths and in each path each possible placement of the two-qubit gate

    • borders: only select those alternatives that correspond to following the borders of the rectangle spanning between the two extreme real qubits; so on top of the at most two paths along the borders, there still are all alternatives of the possible placements of the two-qubit gate along each path

It is thus not supported to turn off to generate alternatives for the possible placements of the two-qubit gate along each path.

The alternatives are ordered; this is relevant for the maptiebreak option below. The alternatives are ordered:

  • first by the two-qubit gate for which they are an alternative; the most critical two-qubit gate is first; remember that there can be more than one two-qubit gate when all was selected for the maplookahead option.

  • then by the followed path; each path is represented by a sequence of transitions from the mapped first operand qubit to the mapped second operand qubit. The paths are ordered such that of any set of paths with a common prefix these are ordered by a clock-wise order of the successor qubits as seen from the last qubit of the common prefix.

  • and then by the placement of the two-qubit gate; the placements are ordered from start to end of the path.

So, the first alternative will be the one that clock-wise follows the border and has the two-qubit gate placed directly at the qubit that is the mapped first operand of the gate; the last alternative will be the one that anti-clock-wise follows the border and has the two-qubit gate placed directly at the qubit that is the mapped last operand of the gate.

Comparing Alternatives, Which Metric To Use

With all alternatives available, it is time to compare them using the defined metric. The metric to use is defined by the strategy option, called for historic reasons mapper. What needs to be done when multiple alternatives compare equal, is specified later.

  • mapper: The basic mapper strategy (metric of mapper result optimization) that is employed:

    • no (default for back-ward compatibility): no mapping is done. The output circuit is identical to the input circuit.

    • base: map the circuit: use as metric just the length of the paths between the mapped operands of each two-qubit gate, and minimize this length for each two-qubit gate that is mapped; with only alternatives for one two-qubit gate, all alternatives have the same shortest path, so all alternatives qualify equally; with alternatives for multiple two-qubit gates, those two-qubit gates are preferred that lead to the least swaps and moves.

    • minextend (best): map the circuit: use as metric the extension of the circuit by each of the shortest paths between the mapped operands of each two-qubit gate, and minimize this circuit extension by evaluating all alternatives; the computation of the extension relies on scheduling-in the required swaps and moves in the circuit and just subtracting the depths before and after doing that; the various options controlling this scheduling-in, will be specified later below.

    • minextendrc: map the circuit: as in minextend, but taking resource constraints into account when scheduling-in the swaps and moves.

Look-Back, Maximize Instruction-Level Parallelism By Scheduling

To know the circuit’s latency extension of an alternative, the mapped gates are represented as a scheduled circuit, i.e. with gates with a defined cycle attribute, and the gates ordered in the circuit with non-decreasing cycle value. In case the mapper option has the minextendrc value, also the state of all resources is maintained. When a swap or move gate is added, it is ASAP scheduled (optionally taking the resource constraints into account) into the circuit and the corresponding cycle value is assigned to the cycle attribute of the added gate. Note that when swap or move is defined by a composite gate, the decomposed sequence is scheduled-in instead.

The objective of this is to maximize the parallel execution of gates and especially of swaps and moves. Indeed, the smaller the latency extension of a circuit, the more parallelism was created, i.e. the more the ILP was enlarged. When swaps and moves are not inserted as primitive gates but the equivalent decomposed sequences are inserted, ILP will be improved even more.

This scheduling-in is done separately for each alternative: for each alternative, the swaps or moves are added and the end-result evaluated.

This scheduling-in is controlled by the following options:

  • mapusemoves: Use move instead of swap where possible. In the current implementation, a move is implemented as a sequence of two cnots while a swap is implemented as a sequence of three cnots.

    • no: don’t

    • yes (default, best): do, when swapping with an ancillary qubit which is known to be in the zero state (|0> for moves with 2 cnots); when not in the initial state, insert a move_init sequence (when defined in the configuration file, the defined sequence, otherwise a prepz followed by a hadamard) when it doesn’t additionally extend the circuit; when a move_init sequence would extend the circuit, don’t insert the move

    • 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20: yes, and insert a move_init sequence to get the ancillary qubit in the initial state, if needed; but only when the number of cycles of circuit extension that this move_init causes, is less-equal than 0, 1, ... 20 cycles.

      Please note that the mapassumezeroinitstate option defines whether the implementation of the mapper can assume that each qubit starts off in the initial state; this increases the likelihood that moves are inserted, and makes all these considerations of only inserting a move when a move_init can bring the ancillary qubit in the initial state somehow without additional circuit extension, of no use.

  • mapprepinitsstate: Does a prepz initialize the state, i.e. leave the state of a qubit in the |0> state? When so, this can be reflected in the rs map.

    • no (default, playing safe): no, it doesn’t; a prepz during mapping will, as any other quantum gate, set the state of the operand qubits to rs_hasstate in the rs map

    • yes (best): a prepz during mapping will set the state of the operand qubits to rs_wasinited; any other gate will set the state of the operand qubits to rs_hasstate

  • mapselectswaps: When scheduling-in swaps and moves at the end for the best alternative found, this option selects that potentially not all required swaps and moves are inserted. When not all are inserted but only one, the distance of the mapped operand qubits of the two-qubit gate for which the best alternative was generated, will be one less, and after insertion heuristic routing and mapping starts over generating alternatives for the new situation.

    Please note that during evaluation of the alternatives, all swaps and moves are inserted. So the alternatives are compared with all swaps and moves inserted but only during the final real insertion after having selected the best alternative, just one is inserted.

    • all (best, default): insert all swaps and moves as usual

    • one: insert only one swap or move; take the one swapping/moving the mapped first operand qubit

    • earliest: insert only one swap or move; take the one that can be scheduled earliest from the one swapping/moving the mapped first operand qubit and the one swapping/moving the mapped second operand qubit

  • mapreverseswap: Since swap is symmetrical in effect (the states of the qubits are exchanged) but not in implementation (the gates on the second operand start one cycle earlier and end one cycle later), interchanging the operands may cause a swap to be scheduled at different cycles. Reverse operand real qubits of swap when beneficial:

    • no: don’t

    • yes (best, default): when scheduling a swap, exploiting the knowledge that the execution of a swap for one of the qubits starts one cycle later, a reversal of the real qubit operands might allow scheduling it one cycle earlier

Looking Farther Ahead, Recurse To Find Best Alternative

Looking farther ahead beyond the mapping of the current two-qubit gate, the router recurses considering the effects of its mapping on subsequent two-qubit gates.

After having evaluated the metric for each alternative, multiple alternatives may remain, all with the best value. For the minextend and minextendrc strategies, there are options to select from these by looking ahead farther, i.e. beyond the metric evaluation of this alternative for mapping one two-qubit gate. This recursion assumes that the current alternative is selected, its swaps and moves are added to the circuit the v2r map is updated, and the availability set is updated. And then in this new situation the implementation recurses by selecting one or more two-qubit gates to map next, generating alternatives, evaluating these alternatives against the metric, and deciding which alternatives are the best. This recursion can go deeper and deeper until a particular depth has been reached. Then of the resulting tree of alternatives, for all the leaves representing the deepest alternatives, the metric is computed from the root to the leaf and compared to each other. In this way suboptimalities of individual choices can be balanced to a more optimal combination. From these leaves, the best is taken; when multiple alternatives compare equally well from root to leaf, the maptiebreak option decides which one to take, as usual; see below there.

The following options control this recursion:

  • mapselectmaxlevel: Looking farther ahead beyond the mapping of the current two-qubit gate, the router recurses considering the effects of its mapping on subsequent two-qubit gates. The level specifies the recursion depth: how many two-qubits in a row are considered beyond the current one. This generates a tree of alternatives.

    • 0 (default, back-ward compatible): no recursion is done

    • 1, 2, 3, 4, 5, 6, 7, 8, 9, 10: the indicated number of recursions is done; initial experiments show that a value of 3 produces reasonable results, and that recursion depth of 5 and higher are infeasible because of resource demand explosion

    • inf: there is no limit to the number of recursions; this makes the resource demand of heuristic routing and mapping explode

  • mapselectmaxwidth: Not all alternatives are equally promising, so only some best are selected to recurse on. The width specifies the recursion width: for how many alternatives the recursion is actually done. The specification of the width is done relative to the number of alternatives that came out as best at the current recursion level.

    • min (default): only recurse on those alternatives that came out as best at this point

    • minplusone: only recurse on those alternatives that came out as best at this point, plus one second-best

    • minplushalfmin (best combination of optimality and resources: only recurse on those alternatives that came out as best at this point, plus some number of second-bests: half the number more than the number of best ones

    • minplusmin: only recurse on those alternatives that came out as best at this point, plus some number of second-bests: twice the number of best ones

    • all: don’t put a limit on the recursion width

  • maprecNN2q: In maplookahead with value all, as with noroutingfirst, two-qubit gates which are already NN, are immediately mapped, kind of flushing them. However, in recursion this creates an imbalance: at each level optionally several more than just one two-qubit gate are mapped and this makes the results of the alternatives largely incomparable. Comparision would be easier to understand when at each level only one two-qubit gate would be mapped. This option specifies independently of the maplookahead option that is chosen and that is applied before going into recursion, whether in the recursion this immediate mapping/flushing of NN two-qubit gates is done.

    • no (default, best): no, NN two-qubit gates are not immediately mapped and flushed until only non-NN two-qubit gates remain; at each recursion level exactly one two-qubit gate is mapped

    • yes: yes, NN two-qubit gates are immediately mapped and flushed until only non-NN two-qubit gates remain; this makes recursion more greedy but makes interpreting the evaluations of the alternatives harder

Deciding For The Best, Committing To The Best

With or without recursion, for the base strategy as well as for the minextend and minextendrc strategies, when at the end multiple alternatives still compare equally well, a decision has to be taken which two-qubit gate to route and map. This selection is made based on the value of the following option:

  • maptiebreak: When multiple alternatives remain for a particular strategy with the same best evaluation value, decide how to select the best single one:

    • first: select the first of the set

    • last: select the last of the set

    • random (default, best, non-deterministic): select in a random way from the set; when testing and comparing mapping strategies, this option introduces non-determinism and non-reproducibility, which precludes reasoning about the strategies unless many samples are taken and statistically analyzed

    • critical (deterministic, second best): select the first of the alternatives generated for the most critical two-qubit gate (when there were more)

Having selected a single best alternative, the decision has been made to route and map its corresponding two-qubit gate. This means, scheduling in the result circuit the swaps and moves that route the mapped operand qubits, updating the v2r and rs maps on the fly; see Look-Back, Maximize Instruction-Level Parallelism By Scheduling for the details of this scheduling. And then map the two-qubit gate; see Input and output intermediate representation for what mapping involves.

After this, in the dependence graph a next gate is looked for to map next and heuristic routing and mapping starts over again.

Configuration file definitions for mapper control

The configuration file contains the following sections that are recognized by the mapper:

  • hardware_settings

    the number of real qubits in the platform, and the cycle time in nanoseconds to convert instruction duration into cycles used by the various scheduling actions are taken from here

  • instructions

    the mapper assumes that the OpenQL circuit was read in and that gates were created according to the specifications of these in the configuration file: the name of each encountered gate is looked up in this section and, if not found, in the gate_decomposition section; if found, that gate (or those gates) are created; the duration field specifies the duration of each gate in nanoseconds; the type and various cc_light fields of each instruction are used as parameters to select applicable resource constraints in the resource-constrained scheduler

  • gate_decomposition

    when creating a gate matching an entry in this section, the set of gates specified by the decomposition description of the entry is created instead; the mapper exploits the decomposition support that the configuration file offers by this section in the following way:

    • reading the circuit When a gate specified as a composite gate is created in an OpenQL program, its decomposition is created instead. So a cnot in the OpenQL program that is specified in the gate_decomposition section as e.g. two hadamards with a cz in the middle, is input by the mapper as this latter sequence.

    • swap support A swap is a composite gate, usually consisting of 3 cnots; those cnots usually are decomposed to a sequence of primitive gates itself. The mapper supports generating swap as a primitive; or generating its shallow decomposition (e.g. to cnots); or generating its full decomposition (e.g. to the primitive gate set). The former leads to a more readable intermediate qasm file; the latter to more precise evaluation of the mapper selection criteria. Relying on the configuration file, when generating a swap, the mapper first attempts to create a gate with the name swap_real, and when that fails, create a gate with the name swap. The same machinery is used to create a move.

    • making gates real Each gate input to the mapper is a virtual gate, defined to operate on virtual qubits. After mapping, the output gates are real gates, operating on real qubits. Making gates real is the translation from the former to the latter. This is usually done by replacing the virtual qubits by their corresponding real qubits. But support is provided to also replace the gate itself: when a gate is made real, the mapper first tries to create a gate with the same name but with _real appended to its name (and using the mapped, real qubits); if that fails, it keeps the original gate and uses that (with the mapped, real qubits) in the result circuit.

    • ancilliary initialization For a move to be done instead of a swap, the target qubit must be in a particular state. For CC-Light this is the |+> state. To support other target platforms, the move_init gate is defined to prepare a qubit in that state for the particular target platform. It decomposes to a prepz followed by a Hadamard for CC-Light.

    • making all gates primitive After mapping, the output gates will still have to undergo a final schedule with resource constraints before code can be generated for them. Best results are obtained when then all gates are primitive. The mapper supports a decomposition step to make that possible and this is typically used to decompose leftover swaps and moves to primitives: when a gate is made primitive, the mapper first tries to create a gate with the same name but with _prim appended to its name; if that fails, it keeps the original gate and uses that in the result circuit that is input to the scheduler.

  • topology A qubit grid’s topology is defined by the neighbor relation among its qubits. Each qubit has an id (its index, used as a gate operand and in the resources descriptions) in the range of 0 to the number of qubits in the platform minus 1. Qubits are connected by directed pairs, called edges. Each edge has an id (its index, also used in the resources descriptions) in some contiguous range starting from 0, a source qubit and a destination qubit. Two grid forms are supported: the xy form and the irregular form. In grids of the xy form, there must be two additional attributes: x_size and y_size, and the qubits have in addition an X and a Y coordinate: these coordinates in the X (Y) direction are in the range of 0 to x_size-1 (y_size-1).

  • resources See the scheduler’s documentation.