Note: This post is part 2 in a series. You can find the inaugural post here.
In the prior post in this series, we covered 2 topics: some preliminaries of quantum computing, and methods for encoding classical data into quantum states. Please take a look at that post if you need a refresher on those topics.
In this post, we start to look at quantum kernels, which are similarity measures between 2 pieces of classical data evaluated using parameterized quantum circuits (PQCs). Quantum kernels can be used in any kernel-based classical ML algorithm. If you need a quick overview of kernel methods in classical machine learning, the Wikipedia article on the topic is pretty good, as is this video.
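Because kernel methods touch the data only through kernel evaluations, any algorithm written against a pluggable kernel function can accept a quantum kernel without modification. As a toy illustration (my own sketch, not from any of the linked material), here is a 1-nearest-neighbor classifier that works purely through kernel evaluations; the classical RBF kernel below could be swapped out for a quantum one:

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Classical RBF kernel; a quantum kernel function could be swapped in here."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def kernel_distance_sq(k, x, y):
    """Squared distance in the feature space induced by kernel k."""
    return k(x, x) + k(y, y) - 2 * k(x, y)

def nearest_neighbor_label(k, train, x):
    """1-NN classification using only kernel evaluations."""
    return min(train, key=lambda pair: kernel_distance_sq(k, pair[0], x))[1]

train = [((0.0, 0.0), "A"), ((1.0, 1.0), "B")]
print(nearest_neighbor_label(rbf_kernel, train, (0.1, 0.2)))  # "A"
```

The point of the indirection is that `nearest_neighbor_label` never sees feature vectors directly, only kernel values — which is exactly the interface a quantum kernel provides.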
Note: people often use the phrase "quantum support vector machine" when they should use "quantum kernels". This is more of a historical artifact than anything else, but it’s worth keeping in mind when reading the literature.
Also – and I should have said this at the start of the series – I’m linking to papers on the arXiv because that way you can access them for free. Many of the papers have been published in peer-reviewed journals, but those typically have paywalls.
What are quantum kernels?
First, let’s define what quantum kernels are. As far as I’m aware, the idea of quantum kernels was introduced at roughly the same time (in early 2018) by the 2 papers Quantum machine learning in feature Hilbert spaces and Supervised learning with quantum-enhanced feature spaces.
Watch the video Lecture 6.2 - Quantum Feature Spaces and Kernels from the 2021 Qiskit Global Summer School. It’s important to pay attention to how a quantum kernel is defined as a quantum circuit, and how the choice of feature map (PQC) influences the kernel.
Lecture notes are here.
A companion chapter from the Qiskit textbook, Quantum Feature Maps and Kernels, would also be useful.
Watch the video Quantum Machine Learning - 33 - Quantum-Enhanced Kernel Methods 2 (Maria Schuld), noting an alternative circuit one could use to calculate kernels.
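To make the definition in those resources concrete, here is a minimal sketch (my own, in plain Python) of a fidelity-style quantum kernel, k(x, x′) = |⟨φ(x′)|φ(x)⟩|², for a simple product-state angle encoding: one RY-rotated qubit per feature. For this particular feature map the kernel has the closed form ∏ⱼ cos²((xⱼ − x′ⱼ)/2), which the explicit statevector computation below reproduces:

```python
import math

def ry_amplitudes(theta):
    """Single-qubit state RY(theta)|0> = [cos(theta/2), sin(theta/2)]."""
    return [math.cos(theta / 2), math.sin(theta / 2)]

def feature_state(x):
    """Product-state angle encoding: one RY-rotated qubit per feature."""
    state = [1.0]
    for theta in x:
        qubit = ry_amplitudes(theta)
        state = [a * b for a in state for b in qubit]  # tensor product
    return state

def quantum_kernel(x, y):
    """Fidelity kernel k(x, y) = |<phi(y)|phi(x)>|^2 via statevector overlap."""
    overlap = sum(a * b for a, b in zip(feature_state(x), feature_state(y)))
    return overlap ** 2

x, y = [0.3, 1.1], [0.7, 0.2]
closed_form = math.prod(math.cos((a - b) / 2) ** 2 for a, b in zip(x, y))
assert abs(quantum_kernel(x, y) - closed_form) < 1e-12
print(quantum_kernel(x, y))
```

Note this unentangled encoding is deliberately trivial — its kernel is classically computable in closed form, which is exactly why real proposals use entangling feature maps.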
It turns out that explicit quantum models (i.e., “quantum support vector machines”) and implicit quantum models (i.e., quantum kernels) are equivalent; the paper Supervised quantum machine learning models are kernel methods explains how. The significance of this work is that it provides a unifying perspective on these two types of models, and it explains why I included the note just before this section. This work also makes the point that, in terms of training such models, the kernel-based approach might offer some benefits.
Why do quantum kernels matter?
While quantum kernels are certainly interesting as a topic unto themselves, having some justification or explanation as to why they might be important would be nice. Within a couple of years of the introduction of the idea, it was shown that for a very specific problem, a classical ML algorithm with access to quantum kernels can provably outperform any classical ML algorithm which does not.
What’s interesting about this result – beyond formally proving that quantum kernels matter – is how it makes provable statements about the performance of these algorithms by using computational complexity theory and learning theory.
Read the blog post Quantum kernels can solve machine learning problems that are hard for all classical methods for an overview of both the setup and the problem itself.
Watch the video A Rigorous & Robust Quantum Speed-up in Supervised ML with Quantum Kernels if you are interested in the details.
The paper upon which the video is based: A Rigorous & Robust Quantum Speed-up in Supervised Machine Learning (Warning: not necessarily for the faint of heart! [Which includes me….])
The above result found a particular problem where quantum kernels have a provable benefit. Another work, Universal expressiveness of variational quantum classifiers and quantum kernels for support vector machines, demonstrates that an entire family of problems with such a benefit exists. As a consequence, this work shows that, in principle, you can use quantum kernels to solve problems which cannot be efficiently solved using classical computers.
What have people done with quantum kernels?
This is probably the question you were asking yourself at the start of the post, no? There have been many projects applying quantum kernels to real-world data sets. While, strictly speaking, no evidence of ‘advantage’ has been found, these projects do highlight how researchers have started working to make quantum kernels practically useful.
The list of papers below isn’t intended to be a comprehensive bibliography on the matter. After it, I will share some thoughts on what I think a ‘good’ project could look like.
Quantum Machine Learning Framework for Virtual Screening in Drug Discovery: a Prospective Quantum Advantage presents a framework for using quantum kernels to facilitate screening drug candidates. A video is here.
Detecting Clouds in Multispectral Satellite Images Using Quantum-Kernel Support Vector Machines describes a project using quantum kernels for processing satellite data.
Application of Quantum Machine Learning using the Quantum Kernel Algorithm on High Energy Physics Analysis at the LHC uses quantum kernels to re-analyze some of the data used to detect the Higgs boson.
Quantum artificial vision for defect detection in manufacturing takes a look at (pun intended) using quantum kernels to process image data arising from a manufacturing process, and compares performance of a quantum-enhanced classifier to other methods.
Quantum Machine Learning for Software Supply Chain Attacks: How Far Can We Go? uses quantum kernels to process data from code bases to understand whether there are threats.
Unsupervised quantum machine learning for fraud detection uses quantum kernels in a financial context, studying how they can enhance fraud detection.
Mixed Quantum-Classical Method For Fraud Detection with Quantum Feature Selection tackles a similar problem as the above paper, but through the lens of developing a method for feature selection.
QuASK -- Quantum Advantage Seeker with Kernels is about a framework (and software implementation) for helping data scientists and other ML practitioners more rapidly assess the potential of a given quantum kernel. A video is here.
Quantum Kernels for Real-World Predictions Based on Electronic Health Records presents a method for evaluating whether using quantum kernels on a given small-scale data set could be advantageous. A blog post about this work is here.
So…what does a ‘good’ project involving quantum kernels look like?
In light of the papers above, it might be useful to share some thoughts about what a good project could look like. At a big-picture level, a lot of the work people have done to date can be thought of as simply getting a handle on the idea of quantum kernels, plus a software package or two for coding them up. In my opinion, while that sort of work remains interesting (after all, there are lots of data-science-like projects out there!), there are a few things one could do to take a project to the next level.
Scale the number of qubits
As you’ll see from the above papers, the number of qubits used is typically rather small – on the order of tens. Circuits acting on that many qubits can be simulated classically, so there’s no real need for quantum hardware. Finding a problem where you need on the order of at least 50 qubits would be great, as that starts to approach the limit(s) of what a typical end-user could simulate classically with modest resources. If you really want to be audacious, go for at least 100 qubits. A straightforward way to scale the number of qubits is to scale the number of features, which would mean finding a problem where having a large number of features is essential.
Go beyond ‘default’ feature maps, and investigate hardware-efficient ones
More work is needed to thoughtfully choose the feature map used to encode the classical data. Many projects end up using a feature map introduced in the paper Supervised learning with quantum-enhanced feature spaces, citing the fact that this feature map is conjectured to be hard to simulate classically. In my opinion, it would be better to use hardware-efficient feature maps (where the 2-qubit gates in the circuit match the native 2-qubit interactions supported by the hardware), because you might have a better chance of reaching higher circuit depths without suffering as much of a performance degradation from noise for a given number of qubits. This allows you to push for deeper circuits (which, generally, can be harder to simulate). Now, these feature maps do have their own problems (we’ll get to the ‘barren plateau’ phenomenon later in this guide).
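As a rough illustration of the structure being described, the following sketch (my own toy simulation, not a feature map from any particular paper) builds layers of data-dependent RY rotations followed by nearest-neighbor CZ entanglers – CZ standing in here for whatever two-qubit gate a given device natively supports – and evaluates the resulting fidelity kernel by brute-force statevector simulation:

```python
import numpy as np

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def apply_single(state, gate, q, n):
    """Apply a single-qubit gate to qubit q of an n-qubit statevector."""
    U = np.array([[1.0]])
    for i in range(n):
        U = np.kron(U, gate if i == q else np.eye(2))
    return U @ state

def apply_cz(state, q1, q2, n):
    """CZ is diagonal: flip the sign of amplitudes where both qubits are 1."""
    out = state.copy()
    for idx in range(len(out)):
        if (idx >> (n - 1 - q1)) & 1 and (idx >> (n - 1 - q2)) & 1:
            out[idx] = -out[idx]
    return out

def feature_state(x, layers=2):
    """Layers of data-dependent RY rotations + nearest-neighbor CZ entanglers."""
    n = len(x)
    state = np.zeros(2 ** n)
    state[0] = 1.0
    for _ in range(layers):
        for q in range(n):
            state = apply_single(state, ry(x[q]), q, n)
        for q in range(n - 1):
            state = apply_cz(state, q, q + 1, n)
    return state

def kernel(x, y):
    """Fidelity kernel between two encoded states."""
    return abs(np.dot(feature_state(x), feature_state(y))) ** 2

print(kernel([0.3, 1.1, 0.7], [0.3, 1.1, 0.7]))  # ~1.0 for identical inputs
```

The hardware-efficient idea lives entirely in the entangling layer: on real hardware you would match it to the device’s native gate and connectivity graph rather than the linear chain used here.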
Make good-faith efforts to use state-of-the-art classical ML
I think people get a little fixated on using quantum-enhanced ML and don’t do the legwork necessary to really use more advanced classical ML. In particular, if there are common or accepted methods for solving the problem you are looking at, then you need to do those methods justice in the project. The reason is that it’s not really appropriate to compare quantum-enhanced ML against a naive or irrelevant classical method. Doing so also introduces a failure point, wherein someone else comes along, uses one of those state-of-the-art methods, and surpasses both your quantum-enhanced and classical results.
Use rigorous assessments for potential advantage
Too many projects end up claiming there might be some kind of quantum advantage simply because the quantum-enhanced model performs better than the classical one. (And, as the point above mentions, projects sometimes use the wrong classical approach and end up with poor classical performance as a result.) However, as the first point above notes, projects also tend to use a small number of qubits; for most of those, one could simply simulate the circuits on a classical computer. Hence, it’s not reasonable to imply there is any kind of quantum advantage.
There are a couple of quantities introduced in the literature which can provide more rigorous assessments about whether some kind of quantum advantage can be found with a given data set:
The ‘geometric difference’, introduced in Power of Data in Quantum Machine Learning. (A video is here.) This quantifies how easily a quantum kernel can be replicated (modeled) using a classical one. It has the advantage that it only depends on the kernel values themselves, not the actual labels for the data. A software package which makes computing the geometric difference easy is described in QuASK -- Quantum Advantage Seeker with Kernels.
The ‘phase space terrain ruggedness index’, introduced in Quantum Kernels for Real-World Predictions Based on Electronic Health Records. This index quantifies how performance varies as you change the number of training samples and features in the data set. It’s pretty straightforward to calculate, though it does require you to train the model(s) for each data set.
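As I understand the definition from Power of Data in Quantum Machine Learning, the geometric difference between a classical kernel matrix K_c and a quantum one K_q takes the form g(K_c ‖ K_q) = sqrt(‖ sqrt(K_q) K_c⁻¹ sqrt(K_q) ‖_∞), i.e., the square root of the spectral norm of that matrix product. A sketch (the regularization parameter is my own addition, for numerical stability when inverting K_c):

```python
import numpy as np

def geometric_difference(K_c, K_q, reg=1e-7):
    """g(K_c || K_q) = sqrt(|| sqrt(K_q) K_c^{-1} sqrt(K_q) ||_inf)."""
    # Matrix square root of K_q via eigendecomposition (K_q is symmetric PSD).
    w, V = np.linalg.eigh(K_q)
    sqrt_Kq = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T
    # Regularized inverse of the classical kernel matrix.
    K_c_inv = np.linalg.inv(K_c + reg * np.eye(len(K_c)))
    M = sqrt_Kq @ K_c_inv @ sqrt_Kq
    # Spectral norm of the symmetric matrix M is its largest eigenvalue.
    return float(np.sqrt(np.max(np.linalg.eigvalsh(M))))

# Identical kernels -> geometric difference ~ 1 (nothing for the quantum
# kernel to offer beyond the classical one).
K = np.eye(4)
print(geometric_difference(K, K))  # ~1.0
```

Large values of g (roughly, growing with the number of samples) are the regime where a quantum kernel might behave differently from the best classical surrogate; values near 1 mean the classical kernel can replicate it.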
Study wall-clock time scaling and identify bottlenecks
By ‘wall-clock time’, I mean the actual runtime needed to calculate a set of quantum kernels for a given data set. If you are going to investigate using quantum kernels in a production (or production-like) environment, then you will need to figure out how the overall runtime of your workflow changes by introducing quantum kernels. And you’ll need to understand how changing the data set (in terms of its size or number of features) impacts that runtime.
There’s a common mythos in the popular science literature that quantum computers can process large data sets blindingly fast. This is false, and for quantum-enhanced machine learning, it means the community needs to get a handle on the kinds of data sets quantum computers can process in a reasonable amount of time. Here, ‘reasonable’ means ‘reasonable for the specific workflow you are dealing with’. Maybe your workflow is only triggered every week, so you could spend several days’ worth of quantum compute runtime without slowing it down. Or maybe your workflow needs to run on the order of minutes; in that case, the time spent calculating kernels might actually slow down the overall workflow.
There are 2 places where latencies and delays kick in:
Having jobs (sets of quantum circuits) waiting in a queue with other jobs from other users. Exactly how long a job sits in the queue depends on the vagaries of the entity provisioning your access. An empirical study is performed in the paper Quantum Computing in the Cloud: Analyzing job and machine characteristics.
Actually running the job. Once the job is pulled from the queue, then the circuits contained within it need to be run. Over the past year, I’ve worked on a project modeling this runtime, which you can find at A Model for Circuit Execution Runtime And Its Implications for Quantum Kernels At Practical Data Set Sizes. Figure 6 is relevant for this discussion: using our model, we find that processing – at full scale – some of the flash flood datasets we used last summer would take prohibitively long at today’s system speeds.
In practice right now, the queue time is much longer than the actual runtime of the job. But as the paper above notes, scaling jobs to encompass practical data set sizes will substantially increase the runtime of the jobs themselves.
End-users should be collecting runtime information to help them see how large of data sets they can process in a reasonable amount of time given their workflows and considerations thereof.
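To make that bookkeeping concrete, here is a sketch of the kind of runtime collection an end-user could do. The kernel below is a classical stand-in for a quantum kernel evaluation, but the accounting carries over: an m-sample data set requires m(m+1)/2 kernel evaluations (i.e., circuit jobs, counting the diagonal), so the evaluation count – and hence wall-clock time – grows quadratically with data set size:

```python
import math
import time

def slow_kernel(x, y):
    """Classical stand-in for one quantum kernel evaluation (a circuit job)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)))

def timed_kernel_matrix(data):
    """Compute the symmetric kernel matrix, reporting evaluations and wall-clock time."""
    m = len(data)
    start = time.perf_counter()
    K = [[0.0] * m for _ in range(m)]
    evaluations = 0
    for i in range(m):
        for j in range(i, m):  # symmetry: only the upper triangle is computed
            K[i][j] = K[j][i] = slow_kernel(data[i], data[j])
            evaluations += 1
    elapsed = time.perf_counter() - start
    return K, evaluations, elapsed

for m in (50, 100, 200):
    data = [(i * 0.01, i * 0.02) for i in range(m)]
    _, evals, t = timed_kernel_matrix(data)
    print(f"m={m}: {evals} kernel evaluations in {t:.4f}s")
```

Swapping in real hardware runs for `slow_kernel` – and logging queue time separately from execution time – would give exactly the kind of runtime record suggested above.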
In sum, this post is about quantum kernels, a quantum-enhanced machine learning algorithm for calculating a similarity measure (kernel) between pieces of classical data. Kernels are ubiquitous in classical ML, and quantum kernels provide a ‘drop-in’ replacement for data scientists and other ML practitioners who want to use quantum-enhanced ML.
We took a look at what quantum kernels are, why they matter, and what some people have been doing with them. I shared some thoughts on some considerations as to what a ‘good’ project with them looks like.
In the next installment of this guide, we’ll take a look at some additional limitations to using quantum kernels. We already saw 2 in this installment; namely, there might be purely-classical kernels which are just as good as quantum ones, and that the quantum compute runtime for calculating them might become prohibitively large for large data sets.
Please let me know how you are finding this guide in the comments below!