The quantum support vector machine (QSVM) is the quantum version of the classical support vector machine (SVM): a data classification method that maps data into a high-dimensional space, where it is separated by a hyperplane [1]. QSVM is a hybrid quantum-classical classification algorithm in which classical data are embedded into a high-dimensional quantum Hilbert space using a parameterized quantum feature map. A quantum processor is then used to evaluate inner products between these quantum states, producing a kernel matrix that captures similarities between data points in this quantum feature space. This kernel is passed to a classical support vector machine optimizer, which learns the optimal separating hyperplane by identifying support vectors and model parameters. For prediction, the trained model classifies new data points using quantum-evaluated kernel values and a classical decision rule. For some problem instances, a quantum feature map may enable improved classification performance while using fewer computational resources than a classical algorithm [2]. The algorithm treats the following problem:
  • Input: Classical data points $\{\mathbf{x}_i\}$, where $\mathbf{x}_i \in \mathbb{R}^d$ are $d$-dimensional vectors; corresponding labels $y_i \in \{-1, 1\}$, where $i = 1,\dots,m$; as well as a feature map $U_\phi(\mathbf{x}_i)$ encoding the classical data in a quantum state.
  • Output: A kernel matrix evaluated using quantum measurements. The matrix is then fed into a classical SVM optimizer, producing a full characterization of the separating hyperplane.

Keywords: Quantum Machine Learning (QML), hybrid quantum–classical algorithm, supervised learning, binary classification.

Background

Our goal is to find a hyperplane in $\mathbb{R}^d$ which separates the points $\{\mathbf{x}_i\}$ into those with label $y_i = +1$ and those with label $y_i = -1$. The hyperplane is conveniently defined by a vector normal to it, $\mathbf{w} \in \mathbb{R}^d$, and an offset $b \in \mathbb{R}$. The classification of a point $\mathbf{x}$ is determined by

$$h_{\mathbf{w}}(\mathbf{x}) = \text{sign}(\langle \mathbf{w},\mathbf{x} \rangle + b)~,$$

which decides on which side of the hyperplane the point lies. Here $\langle \mathbf{w},\mathbf{x} \rangle$ is the inner product between the two vectors. To state the goal explicitly, we introduce the geometric margin as the distance from the hyperplane to the closest training point $\mathbf{x}_i$: $\min_i |\langle \mathbf{w},\mathbf{x}_i \rangle + b| / \|\mathbf{w}\|$. The optimal classification corresponds to the hyperplane and offset that maximize the geometric margin. This goal can be stated as a naive optimization problem: find $\mathbf{w}$ and $b$ satisfying

$$\max_{\mathbf{w},b}\, \min_i \frac{y_i(\langle \mathbf{w},\mathbf{x}_i \rangle + b)}{\|\mathbf{w}\|} \quad \text{subject to: } \text{sign}(\langle \mathbf{w},\mathbf{x}_i \rangle + b) = \text{sign}(y_i)~.$$

However, this formulation is a nonlinear optimization problem due to the sign constraints. Alternatively, we can express the objective as a linear function with linear constraints. A separating hyperplane satisfies

$$(\langle \mathbf{w},\mathbf{x}_i \rangle + b)\,y_i \geq 0~;$$

moreover, we fix the scale of $\mathbf{w}$ by enforcing for the nearest point (in fact, there will always be data points on each side of the hyperplane with the same minimal distance to it) that $(\langle \mathbf{w},\mathbf{x}_{\min} \rangle + b)\,y_{\min} = 1$. As a result, all data points satisfy $(\langle \mathbf{w},\mathbf{x}_i \rangle + b)\,y_i \geq 1$. This condition enables defining the optimization problem

$$\text{minimize } \frac{1}{2}\|\mathbf{w}\|^2 \quad \text{subject to } (\langle \mathbf{w},\mathbf{x}_i \rangle + b)\,y_i \geq 1 \quad \text{for } i=1,\dots,m~.$$

In general, it is not possible to separate the bare data points by a hyperplane; therefore, the data is transformed to a higher-dimensional space using a feature map $\phi(\mathbf{x})$. Following the transformation, the hyperplane and offset are evaluated by solving the same problem with $\mathbf{x}_i \rightarrow \phi(\mathbf{x}_i)$. The main disadvantage of this (primal) formulation is that explicitly computing $\phi(\mathbf{x})$ may require infeasible computational resources, or may even involve a mapping to an infinite-dimensional space. An alternative approach utilizes the dual formulation of the problem. This approach relies on the Karush-Kuhn-Tucker theorem [3], which implies that one can formulate a dual optimization problem whose solution (under certain conditions, satisfied in the present case) coincides with the solution of the primal problem.
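To make the primal formulation concrete, here is a minimal classical sketch using scikit-learn (installed later in this notebook); the toy data points and the large C value, which approximates the hard-margin problem, are illustrative choices of ours:

import numpy as np
from sklearn.svm import SVC

# Four toy 2D points, linearly separable by construction (illustrative data)
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 2.0], [2.0, 3.0]])
y = np.array([-1, -1, 1, 1])

# A very large C approximates the hard-margin problem: minimize ||w||^2 / 2
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, " b =", b)
print("geometric margin 1/||w|| =", 1 / np.linalg.norm(w))
print("support vectors:\n", clf.support_vectors_)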
The dual problem of the original primal optimization problem is given by

$$\text{maximize } \mathcal{L}_D(\alpha_1,\dots,\alpha_m) = \sum_{i=1}^m \alpha_i - \frac{1}{2} \sum_{i,j=1}^m y_i y_j \alpha_i \alpha_j K(\mathbf{x}_i, \mathbf{x}_j) \tag{2}$$

$$\text{subject to: } \alpha_i \geq 0 \text{ for } i=1,\dots,m~, \qquad \sum_{i=1}^m y_i \alpha_i = 0~,$$

where $K(\mathbf{x}_i, \mathbf{x}_j) = \langle \mathbf{x}_i,\mathbf{x}_j \rangle$ is the $(i,j)$ component of the matrix called the kernel matrix. The important advantage of the dual formulation is that for specific feature maps, evaluation of the kernel matrix components does not require explicit computation of the inner product of two feature vectors (which might be infinite-dimensional after the feature map transformation). The quantum version of SVM is based on the dual optimization problem; the main innovation is that a quantum computer can perform unitary feature transformations by applying quantum circuits and can evaluate the inner product between transformed states with suitable measurements.
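The dual formulation is precisely the interface QSVM relies on: once the kernel matrix is available, the classical solver needs nothing else. As a sketch, scikit-learn's SVC accepts a precomputed kernel, and its dual_coef_ attribute stores the products $y_i \alpha_i$ of the support vectors; here the kernel is computed classically, whereas QSVM replaces this step with quantum overlap estimates:

import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 2.0], [2.0, 3.0]])
y = np.array([-1, -1, 1, 1])

# Classically evaluated linear kernel K_ij = <x_i, x_j>;
# QSVM instead estimates each entry from quantum measurements
K_train = X @ X.T

clf = SVC(kernel="precomputed").fit(K_train, y)
print("y_i * alpha_i for the support vectors:", clf.dual_coef_[0])

# Predicting new points only requires kernel values against the training set
X_new = np.array([[0.5, 0.5], [2.0, 2.5]])
K_new = X_new @ X.T  # shape (n_new, m)
print("predicted labels:", clf.predict(K_new))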

QSVM Algorithms

The QSVM training algorithm consists of three steps:
  1. Data loading of the classical data and feature map transformation.
  2. Evaluation of the overlap between two feature states.
  3. A classical optimization procedure, which solves the dual problem given the evaluated kernel matrix.
Train:

Step 1: The mapping of a classical data point $\mathbf{x}$ into a quantum feature state involves loading or encoding the data into a quantum state:

$$|0^n\rangle \xrightarrow{U_{DE}(\mathbf{x})} |\mathbf{x}\rangle~.$$

Various popular transformations exist, for example: basis, amplitude, angle, and dense encoding. These approaches showcase a general tradeoff between the number of qubits required to encode the classical data and the circuit depth. Generally, highly entangled states allow encoding more classical data with fewer qubits, while requiring deeper circuits. Next, a unitary feature operation maps the encoded state to a quantum feature state:

$$|\mathbf{x}\rangle \xrightarrow{U_{\phi}(\mathbf{x})} |\phi(\mathbf{x})\rangle~.$$

The two transformations can be combined into a single unitary transformation $U(\mathbf{x}) = U_{\phi}(\mathbf{x})\,U_{DE}(\mathbf{x})$, dependent on the classical data point $\mathbf{x}$.

Step 2: The overlap between two feature vectors $\phi(\mathbf{x}_i)$ and $\phi(\mathbf{x}_j)$ is evaluated by applying the circuit $U_{\text{QSVM}} = U^{\dagger}(\mathbf{x}_i)\,U(\mathbf{x}_j)$ to the initial state $|0^n\rangle$ and estimating the probability of measuring $0^n$. The expected probability,

$$P_{|0^n\rangle} = |\langle \phi(\mathbf{x}_i)|\phi(\mathbf{x}_j)\rangle|^2~,$$

provides the elements of the kernel matrix, $K(\mathbf{x}_i,\mathbf{x}_j) = |\langle \phi(\mathbf{x}_i)|\phi(\mathbf{x}_j)\rangle|^2$.

Step 3: Optimize the dual problem using a classical optimization algorithm. The SVM optimization problem, Eq. (2), is a quadratic programming problem (a specific case of a convex optimization problem), for which several exact or approximate solution methods exist, such as active-set, interior-point, and gradient/projection-based methods. The algorithm is given the kernel matrix and the constraints as input, and produces the optimized $\{\alpha_i\}$ coefficients.

Prediction: For a new data point $\mathbf{s}$, the kernel values of the new datum with respect to the training points are evaluated, and the label is predicted using the optimized $\{\alpha_i\}$:

$$\text{Predicted Label}(\mathbf{s}) = \text{sign}\left(\sum_{i=1}^m y_i \alpha_i K(\mathbf{x}_i,\mathbf{s}) + b\right)~. \tag{1}$$

In practice, only some of the data points contribute to the optimization (the support vectors); only for these data points do the corresponding coefficients $\alpha_i$ not vanish. As a consequence, most of the terms in the sum of Eq. (1) vanish, and we can limit the calculation of the kernel matrix elements to those $i$ for which $\alpha_i \neq 0$.
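To illustrate the classical post-processing in Step 2 and the prediction rule of Eq. (1), here is a minimal Python sketch; the counts dictionary is invented for the example, and the helper names are our own:

import numpy as np


def kernel_entry_from_counts(counts: dict, shots: int, num_qubits: int) -> float:
    """Estimate K(x_i, x_j) = |<phi(x_i)|phi(x_j)>|^2 as the frequency of the
    all-zeros outcome after running U(x_j) followed by U^dagger(x_i)."""
    return counts.get("0" * num_qubits, 0) / shots


def predict_label(kernel_row: np.ndarray, y: np.ndarray, alpha: np.ndarray, b: float) -> int:
    """Eq. (1): classify a new point s from its kernel values against the
    training points; terms with alpha_i = 0 (non-support vectors) drop out."""
    return int(np.sign(np.dot(y * alpha, kernel_row) + b))


# Illustrative counts for a 2-qubit overlap circuit executed with 1000 shots
counts = {"00": 873, "01": 62, "10": 48, "11": 17}
print(kernel_entry_from_counts(counts, shots=1000, num_qubits=2))  # ~0.873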

QSVM with Classiq

We consider two kinds of 2D data sets:
  • A “simple” data set, constructed by randomly distributing points around two source data points.
This data set enables a straightforward linear classification of the data by the introduction of a separating line (hyperplane in 2D) and constitutes a simple preliminary example.
  • A more complex data set, generated by qiskit’s ad_hoc_data function.
This is a special type of data set that requires a highly nonlinear transformation for its classification. Specifically, it can be accurately classified by a Pauli transformation map. The data sets are classified utilizing two feature maps: the Bloch sphere and Pauli maps. From these data sets and feature maps, we construct two examples.
  1. In the first example, we generate “simple” artificial training, test, and prediction data sets (defined points in the data space). A Bloch sphere transformation is employed as a feature map, allowing perfect classification of the data.
The model is trained, tested, and then used for predictions of the labels of the prediction data set.
  2. The second example involves classifying the ad_hoc_data set by applying both the Bloch sphere and Pauli mappings.
The Pauli transformation can accurately classify the data, while the Bloch sphere mapping manages to classify only approximately half of the test and prediction data sets. The examples emphasize the importance of tailoring the chosen feature map to the specific data set. Note: In the following examples, the labels are either $0$ or $1$ instead of $+1/-1$ as in the theoretical description.

Example 1: Bloch Sphere Feature Map Applied to Linearly Classifiable Data

We start coding with the relevant imports:
!pip install -qq -U "classiq[qml]"
!pip install -qq -U qiskit-algorithms
!pip install -qq -U qiskit-machine-learning
!pip install -qq -U scikit-learn

import matplotlib.pyplot as plt
import numpy as np

from classiq import *
from classiq.applications.qsvm.qsvm import QSVM, QuantumKernelEvaluator
from classiq.applications.qsvm.quantum_feature_maps import pauli_feature_map
from classiq.open_library.functions.variational import inplace_encode_on_bloch
Next, we generate data. Three data sets are generated:
  • Training data: labelled data utilized to train and optimize the algorithm parameters
  • Test data: labelled data employed to evaluate the optimization process
  • Prediction data: unlabelled data for which the optimized algorithm predicts the classification labels.
This example takes a 2D input space and a binary classification (i.e., only two groups of data points):
import random

seed = 0
random.seed(seed)
np.random.seed(seed)
In the data generation we utilize a number of utility functions:
  • generate_data: given two source points, outputs a Python dictionary with the training data points (random points in the vicinity of the sources).
  • data_dict_to_data_and_labels: given a generated data dictionary, outputs the input data and associated labels.
# Importing functions used for this demo, to generate random linearly separable data
from classiq.applications.qsvm.qsvm_data_generation import (
    data_dict_to_data_and_labels,
    generate_data,
)

# Generate sample data:
sources = np.array([[1.23016026, 1.72327701], [3.20331931, 5.32365722]])

training_input: dict = generate_data(sources=sources)
test_input: dict = generate_data(sources=sources)
predict_input, predict_real_labels = data_dict_to_data_and_labels(
    generate_data(sources=sources)
)

Defining the Data

In addition to the feature map, we need to prepare our data. The training_input and test_input datasets consist of data and labels. The labels form a 1D array with one label per data point; the label values can be essentially anything, such as (0, 1), (3, 5), or (‘A’, ‘B’). The predict_input consists only of data points (without labels). We normalize the data to be in the range $[-1, 1)$.
# Prepare and define `train_input` and `test_input` datasets consisting of data and labels
TRAIN_DATA_1, TRAIN_LABELS_1 = data_dict_to_data_and_labels(training_input)
TEST_DATA_1, TEST_LABELS_1 = data_dict_to_data_and_labels(test_input)

# Prepare and define `predict_input`
PREDICT_DATA_1 = predict_input


def normalize(data: np.ndarray, range: tuple) -> np.ndarray:
    """
    Normalizes data in the range [range[0], range[1]) to be between [-1,1)

    Args:
      data (np.ndarray): data to be normalized
      range (tuple): range of the data
    """
    return (2 * data - range[1] - range[0]) / (range[1] - range[0])


# normalizing the data appropriately
RANGE = (0, 6 * np.pi)
TRAIN_DATA_1 = normalize(TRAIN_DATA_1, RANGE)
TEST_DATA_1 = normalize(TEST_DATA_1, RANGE)
PREDICT_DATA_1 = normalize(PREDICT_DATA_1, RANGE)
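As a quick sanity check (our own addition), we can print the range of the normalized data:

# Sanity check: the normalized data should lie within [-1, 1)
for name, arr in [("train", TRAIN_DATA_1), ("test", TEST_DATA_1), ("predict", PREDICT_DATA_1)]:
    print(f"{name}: min={arr.min():.3f}, max={arr.max():.3f}")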
To get a better understanding of the classification task at hand, we plot the data.
# Plot the data
plot_range = (-1, -0.4)
colors = {0: "blue", 1: "orange"}
for i in range(TRAIN_DATA_1.shape[0]):
    plt.scatter(*TRAIN_DATA_1[i, :].T, c=colors[TRAIN_LABELS_1[i]])
plt.title("Training Data")
plt.xlim(plot_range)
plt.ylim((-1, 0))
plt.show()

for i in range(TEST_DATA_1.shape[0]):
    plt.scatter(*TEST_DATA_1[i, :].T, c=colors[TEST_LABELS_1[i]])
plt.title("Test Data")
plt.xlim(plot_range)
plt.ylim((-1, 0))
plt.show()

plt.scatter(*PREDICT_DATA_1.T)
plt.title("Prediction Data (unlabeled)")
plt.xlim((-1, -0.4))
plt.ylim((-1, -0.2))
plt.show()
[Output: three scatter plots showing the training, test, and prediction data.] Here the blue and orange dots correspond to the labels 0 and 1, and the prediction data is unlabelled.

Defining the Feature Map

When constructing a QSVM model, we must supply the feature map encoding the classical data into quantum states in Hilbert space (the feature space of the problem). Here, we choose to encode the data onto the surface of the Bloch sphere. This can be defined in terms of the following transformation of the 2D data point $\mathbf{x} = [x_0,x_1]^T$:

$$\mathbf{x} \rightarrow R_Z(\pi x_1)\, R_X(\pi x_0)\,|0\rangle = \cos(\pi x_0/2)|0\rangle + i e^{i \pi x_1}\sin(\pi x_0/2)|1\rangle~,$$

where the circuit takes a single qubit per data point and the last equality holds up to a global phase. We define a quantum function that generalizes the Bloch sphere mapping to an input vector of any dimension (also known as “dense angle encoding” in the field of quantum neural networks). Each pair of entries in the vector is mapped to a Bloch sphere; if the vector size is odd, we apply a single $R_X$ gate on an extra qubit. Since a single qubit stores a pair of entries, such a feature mapping requires $n = \lceil d/2 \rceil$ qubits per data point. The feature map is imported from the open_library.functions.variational module at the beginning of the notebook. Unlike the other feature maps, this feature map is located outside the quantum_feature_maps module, as it serves as a building block beyond the scope of QSVM and QML.
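As a quick consistency check, the following numpy sketch (independent of Classiq, and using the common gate convention $R_A(\theta) = e^{-i\theta A/2}$, so relative-phase signs may differ between conventions) reproduces the measurement probabilities of the encoding above and shows how a kernel entry arises from two encoded states:

import numpy as np


def bloch_state(x0: float, x1: float) -> np.ndarray:
    """R_Z(pi*x1) R_X(pi*x0) |0>, with the convention R_A(t) = exp(-i t A / 2)."""
    theta, phi = np.pi * x0, np.pi * x1
    rx = np.array(
        [[np.cos(theta / 2), -1j * np.sin(theta / 2)],
         [-1j * np.sin(theta / 2), np.cos(theta / 2)]]
    )
    rz = np.diag([np.exp(-1j * phi / 2), np.exp(1j * phi / 2)])
    return rz @ rx @ np.array([1.0, 0.0])


psi = bloch_state(0.3, -0.7)
print(np.abs(psi) ** 2)  # [cos^2(pi*0.3/2), sin^2(pi*0.3/2)]

# A kernel entry is the squared overlap of two encoded states
K = np.abs(np.vdot(bloch_state(0.3, -0.7), bloch_state(0.1, 0.4))) ** 2
print(f"K = {K:.4f}")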

Constructing a Model

We begin by building Classiq’s QSVM class, which encapsulates the QML model. The model is given a feature map from the quantum_feature_maps module, and optionally ExecutionPreferences. The feature map will be employed in the evaluation of the kernel during the training step.
# Build a quantum support vector machine model
bloch_num_qubits = int(np.ceil(np.log2(TRAIN_DATA_1.shape[1])))
qsvm_model = QSVM(feature_map=inplace_encode_on_bloch, num_qubits=bloch_num_qubits)

Executing QSVM

The execution involves the following steps:
  1. Training
  2. Testing the training process, and outputting a test score.
  3. Predicting, by taking unlabeled data and returning its predicted labels.
This may be applied multiple times to different datasets. These steps are performed utilizing the train, test, and predict methods of the QSVM class. In the training stage, the quantum kernel is constructed element by element, by repeated execution of quantum circuits. The execution employs Classiq’s sample_batch to evaluate the overlap between the states encoding the two data points of each pair. The kernel matrix is then fed into scikit-learn’s SVC (Support Vector Classifier) to optimize the quadratic program (the dual optimization problem). Knowledge of the optimized coefficients enables the model to predict the classification of a new data point, utilizing Eq. (1).
# Train the model
qsvm_model.train(TRAIN_DATA_1, TRAIN_LABELS_1)

# Check the test score
test_score, test_predicted_labels = qsvm_model.test(TEST_DATA_1, TEST_LABELS_1)

# Predict labels
predicted_labels = qsvm_model.predict(PREDICT_DATA_1)
The quantum program is accessible through QSVM’s get_qprog method.
qprog_bloch = qsvm_model.get_qprog()
show(qprog_bloch)
Output:

Quantum program link: https://platform.classiq.io/circuit/3CZaBtXy43KaHm7XyENHDOsj1r7
  

Results

We can view the classification accuracy through test_score. Moreover, since this data was previously generated, we also know the real labels and can print them for comparison.
# Printing tests result
print(f"Testing success ratio: {test_score}")
print()
# Printing predictions
print("Prediction from datapoints set:")
print(f"  ground truth: {predict_real_labels}")
print(f"  prediction:   {predicted_labels}")
print(
    f"  success rate: {100 * np.count_nonzero(predicted_labels == predict_real_labels) / len(predicted_labels)}%"
)
Output:

Testing success ratio: 1.0

  Prediction from datapoints set:
    ground truth: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
   1 1 1]
    prediction:   [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
   1 1 1]
    success rate: 100.0%
  

We can even visualize the predicted results:
plt.figure()
for i in range(PREDICT_DATA_1.shape[0]):
    plt.scatter(*PREDICT_DATA_1[i, :].T, c=colors[predicted_labels[i]])
plt.title("Prediction Data")
plt.xlim(plot_range)
plt.ylim((-1, 1 + 0.3))
plt.show()
[Output: scatter plot of the prediction data, colored by predicted label.]

Example 2: Pauli and Bloch Sphere Feature Maps on a Complex Data Set

We begin by importing the relevant software packages:
from itertools import combinations

from qiskit_algorithms.utils import algorithm_globals
from qiskit_machine_learning.datasets import ad_hoc_data
We consider a more complicated classification task. Utilizing qiskit’s ad_hoc_data function, we generate a data set which can be fully separated by the ZZ feature map [1]:
seed = 12345
algorithm_globals.random_seed = seed

adhoc_dimension = 2

TRAIN_DATA_2, TRAIN_LABELS_2, test_features, test_labels, ad_hoc_total = ad_hoc_data(
    training_size=20,
    test_size=5 + 5,  # 5 for test, 5 for predict
    n=adhoc_dimension,
    gap=0.3,
    plot_data=False,
    one_hot=False,
    include_sample_total=True,
)
Next, we split the test features and labels into a test set and a prediction set.
# The sizes of `test_features` and `test_labels` are double the `test_size` argument,
# since there are `test_size` items for each of the two classes


def split(obj: np.ndarray, n: int = 20):
    quarter = n // 4
    half = n // 2
    first = np.concatenate((obj[:quarter], obj[half : half + quarter]))
    second = np.concatenate((obj[quarter:half], obj[half + quarter :]))
    return first, second


TEST_DATA_2, PREDICT_DATA_2 = split(test_features)
TEST_LABELS_2, predict_real_labels_2 = split(test_labels)
The data can be visualized with a color-coded plot:
# Plot data
plt.scatter(
    TRAIN_DATA_2[np.where(TRAIN_LABELS_2[:] == 0), 0],
    TRAIN_DATA_2[np.where(TRAIN_LABELS_2[:] == 0), 1],
    marker="s",
    facecolors="w",
    edgecolors="b",
    label="A train",
)
plt.scatter(
    TRAIN_DATA_2[np.where(TRAIN_LABELS_2[:] == 1), 0],
    TRAIN_DATA_2[np.where(TRAIN_LABELS_2[:] == 1), 1],
    marker="o",
    facecolors="w",
    edgecolors="r",
    label="B train",
)
plt.scatter(
    TEST_DATA_2[np.where(TEST_LABELS_2[:] == 0), 0],
    TEST_DATA_2[np.where(TEST_LABELS_2[:] == 0), 1],
    marker="s",
    facecolors="b",
    edgecolors="w",
    label="A test",
)
plt.scatter(
    TEST_DATA_2[np.where(TEST_LABELS_2[:] == 1), 0],
    TEST_DATA_2[np.where(TEST_LABELS_2[:] == 1), 1],
    marker="o",
    facecolors="r",
    edgecolors="w",
    label="B test",
)

plt.legend(bbox_to_anchor=(1.05, 1), loc="upper left", borderaxespad=0.0)
plt.title("Ad hoc dataset for classification")

plt.show()
[Output: scatter plot of the ad hoc dataset, showing train and test points for classes A and B.]

Pauli Feature Map

We build a Pauli feature map. This feature map uses $n$ qubits for data $\mathbf{x}$ of size $n$, and it corresponds to the following unitary:

$$U = \exp\left(\sum_k f^{(1)}_k(\mathbf{x})H^{(1)}_k + \sum_k f^{(2)}_k(\mathbf{x})H^{(2)}_k+\dots \right) H^{\otimes n}~,$$

where $H^{\otimes n}$ designates the Hadamard transform, $H^{(i)}$ is a Hamiltonian acting on $i$ qubits according to some connectivity map, and $f^{(i)}$ is a classical function, typically taken as a polynomial of degree $i$. For example, if our data is of size $3$ and we assume circular connectivity, taking Hamiltonians depending only on $Z$, the Hamiltonian reads

$$\sum_k f^{(1)}_k(\mathbf{x})H^{(1)}_k = \alpha(x_0+\beta)ZII+\alpha(x_1+\beta)IZI+\alpha(x_2+\beta)IIZ~,$$

$$\sum_k f^{(2)}_k(\mathbf{x})H^{(2)}_k = \gamma^2(x_0+\zeta)(x_1+\zeta)ZZI+\gamma^2(x_1+\zeta)(x_2+\zeta)IZZ + \gamma^2(x_0+\zeta)(x_2+\zeta)ZIZ~,$$

where $(\alpha,\beta)$ and $(\gamma,\zeta)$ define affine transformations on the data and correspond to the functions $f^{(1,2)}$. In this example, the connectivity map and the full Hamiltonian are generated by the library's pauli_feature_map function, configured below.
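To make the construction above concrete, here is a minimal numpy sketch, independent of Classiq's pauli_feature_map, that builds the diagonal of the 3-qubit example Hamiltonian with $\alpha = \gamma = 1$ (the variable names are our own):

import numpy as np
from functools import reduce

I_DIAG = np.array([1.0, 1.0])   # diagonal of the identity
Z_DIAG = np.array([1.0, -1.0])  # diagonal of Pauli Z


def zz_hamiltonian_diagonal(x: np.ndarray, beta: float, zeta: float) -> np.ndarray:
    """Diagonal of the 3-qubit example Hamiltonian (alpha = gamma = 1) with
    circular ZZ connectivity: pairs (0, 1), (1, 2), (0, 2)."""
    kron = lambda ops: reduce(np.kron, ops)
    h = np.zeros(8)
    # First-order terms: (x_k + beta) Z on qubit k
    singles = [[Z_DIAG, I_DIAG, I_DIAG], [I_DIAG, Z_DIAG, I_DIAG], [I_DIAG, I_DIAG, Z_DIAG]]
    for k, ops in enumerate(singles):
        h += (x[k] + beta) * kron(ops)
    # Second-order terms: (x_j + zeta)(x_k + zeta) Z_j Z_k
    pairs = [((0, 1), [Z_DIAG, Z_DIAG, I_DIAG]), ((1, 2), [I_DIAG, Z_DIAG, Z_DIAG]), ((0, 2), [Z_DIAG, I_DIAG, Z_DIAG])]
    for (j, k), ops in pairs:
        h += (x[j] + zeta) * (x[k] + zeta) * kron(ops)
    return h


x = np.array([0.1, 0.2, 0.3])
# The feature-map unitary applies exp(i H) after the Hadamard layer
print(zz_hamiltonian_diagonal(x, beta=0.0, zeta=np.pi))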

Model Construction

We first define the hyperparameters of the Pauli feature map and construct an appropriate wrapper feature map, utilizing pauli_feature_map:
# Define the parameters for the Pauli feature map
N_DIM = 2
PAULIS = [[Pauli.Z], [Pauli.Z, Pauli.Z]]
CONNECTIVITY = 2
AFFINES = [[1, 0], [1, np.pi]]
REPS = 2

# Build the wrapper function for the Pauli feature map
feature_map = lambda data, qba: pauli_feature_map(
    data, PAULIS, AFFINES, CONNECTIVITY, REPS, qba
)
Next, the model is constructed:
pauli_num_qubits = 2
# Build a quantum support vector machine model
qsvm_model_pauli = QSVM(feature_map=feature_map, num_qubits=pauli_num_qubits)

Train, Test and Prediction of the Pauli Model

# Train the model
qsvm_model_pauli.train(TRAIN_DATA_2, TRAIN_LABELS_2)

# Check the test score
test_score_pauli, test_predicted_labels_pauli = qsvm_model_pauli.test(
    TEST_DATA_2, TEST_LABELS_2
)

# Predict labels
predicted_labels_pauli = qsvm_model_pauli.predict(PREDICT_DATA_2)

Prediction Utilizing the Bloch Feature Map

We compare the Pauli feature map results to those of the Bloch feature map. To that end, we construct a new model using the inplace_encode_on_bloch feature map.
# Build a quantum support vector machine model
qsvm_model_bloch = QSVM(
    feature_map=inplace_encode_on_bloch, num_qubits=bloch_num_qubits
)

# Train the model
qsvm_model_bloch.train(TRAIN_DATA_2, TRAIN_LABELS_2)

# Check the test score
test_score_bloch, test_predicted_labels_bloch = qsvm_model_bloch.test(
    TEST_DATA_2, TEST_LABELS_2
)

# Predict labels
predicted_labels_bloch = qsvm_model_bloch.predict(PREDICT_DATA_2)

Results

The Pauli feature map accurately classifies the test and prediction data sets.
# Printing tests result
print(f"Testing success ratio for the Pauli feature map: {test_score_pauli}")
print()
# Printing predictions
print("Prediction from datapoints set:")
print(f"ground truth: {predict_real_labels_2}")
print(f"prediction:   {predicted_labels_pauli}")
print(
    f"success rate: {100 * np.count_nonzero(predicted_labels_pauli == predict_real_labels_2) / len(predicted_labels_pauli)}%"
)
Output:

Testing success ratio for the Pauli feature map: 1.0

  Prediction from datapoints set:
  ground truth: [0 0 0 0 0 1 1 1 1 1]
  prediction:   [0 0 0 0 0 1 1 1 1 1]
  success rate: 100.0%
  

In comparison, the Bloch sphere feature map achieves lower accuracy in the classification task.
# Printing tests result
print(f"Testing success ratio for the Bloch sphere feature map: {test_score_bloch}")
print()
# Printing predictions
print("Prediction from datapoints set:")
print(f"ground truth: {predict_real_labels_2}")
print(f"prediction:   {predicted_labels_bloch}")
print(
    f"success rate: {100 * np.count_nonzero(predicted_labels_bloch == predict_real_labels_2) / len(predicted_labels_bloch)}%"
)
Output:

Testing success ratio for the Bloch sphere feature map: 0.6

  Prediction from datapoints set:
  ground truth: [0 0 0 0 0 1 1 1 1 1]
  prediction:   [1 0 1 1 1 1 1 1 1 1]
  success rate: 60.0%
  

Viewing the Model’s Parameterized Quantum Circuit

qprog_pauli = qsvm_model_pauli.get_qprog()
show(qprog_pauli)
Output:

Quantum program link: https://platform.classiq.io/circuit/3CZaGsdOUu684lPN1pwno4z0Jl9
  

Summary and Discussion

The notebook demonstrated the application of the Quantum Support Vector Machine (QSVM) algorithm to supervised data classification tasks. Two datasets were analyzed using two distinct quantum feature maps: a Bloch-sphere–based encoding and a Pauli-based feature map. The Bloch-sphere mapping achieved perfect classification accuracy on a linearly separable dataset but showed only moderate performance when applied to a more complex, nonlinearly separable dataset. In contrast, the Pauli feature map successfully captured the structure of the complex dataset and yielded accurate classification results. The results emphasize a central consideration in QSVM design: the choice of quantum feature map plays a decisive role in model performance. Effective alignment between the feature map and the underlying structure of the data is essential for achieving high classification accuracy.

References

[1] Havlíček, V., Córcoles, A. D., Temme, K., Harrow, A. W., Kandala, A., Chow, J. M., & Gambetta, J. M. (2019). Supervised learning with quantum-enhanced feature spaces. Nature, 567(7747), 209-212.

[2] Liu, Y., Arunachalam, S., & Temme, K. (2021). A rigorous and robust quantum speed-up in supervised machine learning. Nature Physics, 17(9), 1013-1017.

[3] Karush–Kuhn–Tucker conditions.