An Empirical Approach For Probing the Definiteness of Kernels

Preprint submitted to the Soft Computing journal, Springer (2024)

Martin Zaefferer (corresponding author)
Faculty of Computer Science and Engineering Science, TH Köln - University of Applied Sciences, Steinmüllerallee 1, 51643 Gummersbach, Germany
Telephone: +49 2261 8196 6327
E-mail: Martin.Zaefferer@th-koeln.de

Thomas Bartz-Beielstein
Faculty of Computer Science and Engineering Science, TH Köln - University of Applied Sciences, Steinmüllerallee 1, 51643 Gummersbach, Germany

Günter Rudolph
Department of Computer Science, TU Dortmund University, Otto-Hahn-Str. 14, 44227 Dortmund, Germany

Martin Zaefferer · Thomas Bartz-Beielstein · Günter Rudolph

(Received: date / Accepted: date)

Abstract

Models like support vector machines or Gaussian process regression often require positive semi-definite kernels. These kernels may be based on distance functions. While definiteness is proven for common distances and kernels, a proof for a new kernel may require too much time and effort for users who simply aim at practical usage. Furthermore, designing definite distances or kernels may be equally intricate. Finally, models can be enabled to use indefinite kernels. This may deteriorate the accuracy or computational cost of the model. Hence, an efficient method to determine definiteness is required. We propose an empirical approach. We show that sampling as well as optimization with an evolutionary algorithm may be employed to determine definiteness. We provide a proof-of-concept with 16 different distance measures for permutations. Our approach allows us to disprove definiteness if a respective counter-example is found. It can also provide an estimate of how likely it is to obtain indefinite kernel matrices. This provides a simple, efficient tool to decide whether additional effort should be spent on designing/selecting a more suitable kernel or algorithm.

Keywords:

Definiteness · Kernel · Distance · Sampling · Optimization · Evolutionary Algorithm


1 Introduction

The definiteness of kernels and distances is an important issue in statistics and machine learning (Feller, 1971; Vapnik, 1998; Schölkopf, 2001). One application that recently gained interest is the field of surrogate model-based combinatorial optimization (Moraglio and Kattan, 2011; Zaefferer et al., 2014b; Bartz-Beielstein and Zaefferer, 2017). Continuous distance measures are replaced by distance measures that are adequate for the respective search space (e.g., permutation distances or string distances). Such a measure will have an effect on the definiteness of the employed kernel function. For arbitrary problems, practitioners may come up with any kind of suitable distance measure or kernel.

While it is easy to determine the definiteness of matrices, determining the definiteness of a function is not as simple. Proving definiteness by theoretical means may be infeasible in practice (Murphy, 2012, p. 482; Ong et al., 2004). It may be equally difficult to design a function to be definite. Finally, algorithms may be adapted to handle indefinite kernels. These adaptations usually have a detrimental impact on the computational effort or accuracy of the derived model. Hence, this study tries to answer the following two research questions:

$Q_1$

Discovery: Is there an efficient, empirical approach to determine the definiteness of kernel functions based on arbitrary distance measures?

$Q_2$

Measurability: If $Q_1$ can be answered affirmatively, can we quantify to what extent a lack of definiteness is problematic?

$Q_1$ tries to find an answer to the general question of whether or not a function is definite. Measurability ($Q_2$) is important, as this may allow determining the impact that indefiniteness has in practice. For instance, if a kernel rarely produces indefinite matrices, an optimization or learning process that explores only a small subset of the search space may not be negatively affected. Therefore, this article proposes two approaches.

$A_1$

Sampling: Random sampling is used to determine the proportion of solution sets with indefinite distance or kernel matrices for a given setting.

$A_2$

Optimization: Maximizing the largest eigenvalue related to a certain solution set with an Evolutionary Algorithm (EA), hence finding indefinite cases even when they are rare.

If either approach detects an indefinite matrix, the respective kernel function is demonstrated to be indefinite. On the other hand, if no indefinite matrix is detected, definiteness of the function is not proven. Still, this indeterminate result may indicate that indefiniteness is at least unlikely to occur. Hence, the function may be treated as unproblematic in practical use.

Section 2 provides the background of the methods presented in Section 3. The experimental setup for a proof-of-concept, based on permutation distance measures, is described in Section 4. Results of the experiments are analyzed and discussed in Section 5. Finally, a summary of this work as well as an outlook on future research directions is given in Section 6.

2 Background: Distances, Kernels and Definiteness

2.1 Distance Measures

Distance measures compute the dissimilarity of two objects $x, x' \in \mathcal{X}$, where we do not assume anything about the nonempty set $\mathcal{X}$. Such objects can be, e.g., permutations, trees, strings, or vectors of real values. Thus, a distance measure $d: \mathcal{X} \times \mathcal{X} \to \mathbb{R}_+$ expresses a scalar, numerical value $d(x, x')$ that should become larger the more distinct the objects $x$ and $x'$ are. For a set of $n \in \mathbb{N}$ objects, the distance matrix $D$ of dimension $n \times n$ collects all pairwise distances $D_{ij} = d(x_i, x_j)$ with $i = 1, \ldots, n$ and $j = 1, \ldots, n$. Intuitively, distances can be expected to satisfy certain conditions, e.g., they should be zero when comparing identical objects.

A more formal definition is implied by the term distance metric. A distance metric $d(x, x')$ is symmetric, $d(x, x') = d(x', x)$; non-negative, $d(x, x') \geq 0$; preserves identity, $d(x, x') = 0 \iff x = x'$; and satisfies the triangle inequality, $d(x, x'') \leq d(x, x') + d(x', x'')$. Distance measures that do not preserve identity are often called pseudo-metrics.

An important class of distance measures is the class of edit distance measures. Edit distance measures count the minimal number of edit operations required to transform one object into another. An edit distance measure may concern one specific edit operation (e.g., only swaps) or a set of different operations (e.g., the Levenshtein distance with substitutions, deletions, and insertions). Edit distances usually satisfy the metric axioms.

2.2 Kernels

In the following, a kernel (also: kernel function, similarity measure, or correlation function) is defined as a real-valued function $k(x, x')$ with

$$k: \mathcal{X} \times \mathcal{X} \to \mathbb{R}, \quad (x, x') \mapsto k(x, x') \qquad (1)$$

that will usually be symmetric, $k(x, x') = k(x', x)$, and non-negative, $k(x, x') \geq 0$ (Murphy, 2012, p. 479). Kernels can be based on distance measures, i.e., $k(d(x, x'))$.

2.3 Definiteness of Matrices

One important property of kernels and distance measures is their definiteness. We refer to the literature for more detailed descriptions and proofs, which are the basis for the following paragraphs (Berg et al., 1984; Schölkopf, 2001; Camastra and Vinciarelli, 2008).

First, we introduce the concept of matrix definiteness. A symmetric, square matrix $A$ of dimension $n \times n$ ($n \in \mathbb{N}$) is positive definite (PD) if and only if

$$\sum_{i=1}^{n}\sum_{j=1}^{n} c_i c_j A_{ij} > 0,$$

for all $c \in \mathbb{R}^n \setminus \{0\}$. This is equivalent to all eigenvalues $\lambda_1 \leq \lambda_2 \leq \ldots \leq \lambda_n$ of the matrix $A$ being positive. Due to symmetry, the eigenvalues are real, $\lambda \in \mathbb{R}^n$. Respectively, a matrix is negative definite (ND) if all eigenvalues are negative. If the eigenvalues are non-negative (i.e., some are zero) or non-positive, the matrix is called Positive or Negative Semi-Definite (PSD, NSD), respectively. If mixed signs are present in the eigenvalues, the matrix is called indefinite. Kernel (or correlation, covariance) matrices are examples of matrices that have to be PSD.
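Numerically, this classification reduces to inspecting the signs of the eigenvalues. A minimal R sketch (the helper name definiteness and the tolerance eps are our choices, not part of any formal definition) could look as follows:

```r
# Classify a symmetric matrix by the signs of its eigenvalues.
definiteness <- function(A, eps = 1e-10) {
  lambda <- eigen(A, symmetric = TRUE, only.values = TRUE)$values
  if (all(lambda >  eps)) return("positive definite")
  if (all(lambda >= -eps)) return("positive semi-definite")
  if (all(lambda <  -eps)) return("negative definite")
  if (all(lambda <=  eps)) return("negative semi-definite")
  "indefinite"
}
```

For example, definiteness(diag(2)) yields "positive definite".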

A broader class of matrices are the Conditionally PSD or NSD (CPSD, CNSD) matrices. Here, the coefficients additionally satisfy

$$\sum_{i=1}^{n} c_i = 0, \qquad (2)$$

with $n > 1$. All PSD (NSD) matrices are CPSD (CNSD). To check conditional definiteness, let the $n \times n$ matrix $P$ be

$$P = \begin{pmatrix} I_{n-1} - \mathbf{e}\mathbf{e}^T/n & \mathbf{e}/n \\ (0, \ldots, 0) & 1 \end{pmatrix}$$

with $\mathbf{e} = (1, \ldots, 1)^T$, and $B = P A P^T$. Then, $A$ is CNSD if and only if

$$\hat{A} = B_{n-1} \qquad (3)$$

is NSD (Ikramov and Savel'eva, 2000, Algorithm 1). Here, $B_{n-1}$ is the leading principal submatrix of $B$, that is, the last column and row of $B$ are removed.
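This check translates directly into a few lines of R. The following sketch (the function name is_cnsd and the numerical tolerance eps are ours) builds $P$, forms $B = PAP^T$, drops the last row and column, and tests whether the largest remaining eigenvalue is numerically non-positive:

```r
# CNSD check following equation (3), cf. Ikramov and Savel'eva (2000):
# A is CNSD iff the leading principal submatrix of B = P A P^T is NSD.
is_cnsd <- function(A, eps = 1e-10) {
  n <- nrow(A)
  e <- rep(1, n - 1)
  P <- rbind(cbind(diag(n - 1) - tcrossprod(e) / n, e / n),
             c(rep(0, n - 1), 1))
  B <- P %*% A %*% t(P)
  A_hat <- B[1:(n - 1), 1:(n - 1), drop = FALSE]  # drop last row and column
  max(eigen(A_hat, symmetric = TRUE, only.values = TRUE)$values) <= eps
}
```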

2.4 Definiteness of Kernel Functions

In a similar way to the definiteness of matrices, definiteness can also be determined for kernel functions. The upcoming description roughly follows the definitions and notations by Berg et al. (1984) and Schölkopf (2001). For the nonempty set $\mathcal{X}$, a symmetric kernel $k$ is called PSD if and only if

$$\sum_{i=1}^{n}\sum_{j=1}^{n} c_i c_j k(x_i, x_j) \geq 0,$$

for all $n \in \mathbb{N}$, $x_1, \ldots, x_n \in \mathcal{X}$, and $\mathbf{c} \in \mathbb{R}^n$. A PSD kernel will always yield PSD kernel matrices.

An important special case is that of conditionally definite functions. Analogous to the matrix case, they observe the respective condition in equation (2). One example of a CNSD function is the Euclidean distance. The importance of CNSD functions is also due to the fact that the distance measure $d(x, x')$ is CNSD if and only if the kernel $k(x, x') = \exp(-\theta d(x, x'))$ is PSD for all $\theta > 0$ (Schölkopf, 2001, Proposition 2.28).
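This relation can be probed numerically. The following R snippet illustrates (but, of course, does not prove) the proposition for the CNSD Euclidean distance: the resulting exponential kernel matrices stay PSD for several values of $\theta$:

```r
# Numerical illustration: Euclidean distances are CNSD, so exp(-theta * D)
# should be PSD for every theta > 0 (smallest eigenvalues stay >= 0,
# up to numerical error).
set.seed(1)
X <- matrix(runif(30), ncol = 3)   # 10 random points in [0, 1]^3
D <- as.matrix(dist(X))            # Euclidean distance matrix
sapply(c(0.1, 1, 10), function(theta)
  min(eigen(exp(-theta * D), symmetric = TRUE, only.values = TRUE)$values))
```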

It is often stated that kernels must be PSD (Rasmussen and Williams, 2006; Curriero, 2006). Still, even an indefinite kernel function may yield a PSD kernel matrix. This depends on the specific data set used to train the model (Burges, 1998; Li and Jiang, 2004, Theorem 2) as well as on the parameters of the kernel function. Some frequently used kernels are known to be indefinite. Examples are the sigmoid kernel (Smola et al., 2000; Camps-Valls et al., 2004) or time-warp kernels for time series (Marteau and Gibet, 2014).

To handle the issue of a kernel’s definiteness, different (not mutually exclusive) approaches can be found in the literature.

  • Proving: Definiteness of a specific function can be proven (or disproven) by theoretical considerations in some cases (cf. Berg et al., 1984). For complex cases, or for practitioners, this may be an infeasible approach (Ong et al., 2004).

  • Designing: Functions can be designed to be definite (Haussler, 1999; Gärtner et al., 2003; Marteau and Gibet, 2014). Especially noteworthy are the so-called convolution kernels (Haussler, 1999), as they provide a method to construct PSD kernels for structured data. For a similar purpose, Gärtner et al. (2004) show how to design a syntax-based PSD kernel for structured data. However, convolution kernels may be hard to design (Gärtner et al., 2004). Also, kernels and distance measures may be predetermined for a certain application.

  • Adapting: Algorithms or kernel functions may be adapted to be usable despite a lack of definiteness. This, however, may affect the computational effort or accuracy of the derived model. Some approaches alter matrices (or rather, their eigenspectrum), hence enforcing PSDness (Wu et al., 2005; Chen et al., 2009; Zaefferer and Bartz-Beielstein, 2016). For the case of SVMs, Loosli et al. (2015) provide a nice comparison of various approaches of this type and propose an interesting new solution to the issue of indefinite kernels based on learning in Krein spaces. A recent survey is given by Schleif and Tino (2015).

These three approaches leave a gap: there is no straightforward empirical procedure that avoids complex theoretical reasoning. How can the definiteness of a function be determined? And what impact does a lack of definiteness have on a model?

Hence, we propose the two related empirical approaches $A_1$ and $A_2$ introduced in Sec. 1 to fill this gap. An empirical approach may help to overcome the difficulty of theoretical considerations or designed kernels. Empirical results may also be a starting point for a more formal approach. Furthermore, it may give a quick answer on whether or not the algorithm will have to be adapted for non-PSD matrices (which, if applied by default, would require additional computational effort and may limit the accuracy of derived models).

3 Methods for Estimating Definiteness

We propose an experimental approach to determine and analyze definiteness ($A_1$, $A_2$). As a test case, we determine the definiteness of a distance-based exponential kernel, $k(x, x') = \exp(-\theta d(x, x'))$. The kernel is definite if the underlying distance function is CNSD. For a given distance matrix, CNSDness is determined by the largest eigenvalue $\lambda_n$ of $\hat{D}$, based on equation (3). $D$ is not CNSD if $\lambda_n > 0$. We could also probe the kernel matrix $K$, but in this case the kernel parameter $\theta$ would have to be dealt with.

Of course, we cannot simply check the definiteness of a single matrix $D$, since this would be only one possible outcome of the respective kernel or distance function. Hence, a large number of solution sets with respective distance matrices has to be generated to determine whether any of the matrices are not CNSD (research question $Q_1$: discovery) and to what extent this may affect a model ($Q_2$: measurability). For smaller, finite spaces a brute-force approach may be viable: all potential matrices $D$ can be enumerated and checked for CNSDness. Since this quickly becomes computationally infeasible, we propose to use sampling or optimization instead.

3.1 Estimating Definiteness with Random Sampling

To estimate the definiteness of a distance or kernel function, we propose a simple random sampling approach ($A_1$). This approach randomly generates $t \in \mathbb{N}$ sets $X_1, \ldots, X_t$. Each set has size $n$, that is, it contains $n \in \mathbb{N}$ candidate samples $X = \{x_1, \ldots, x_n\}$.

For each set, the distance matrix $D$ is computed, containing distances between all candidates in the set. Based on this, $\hat{D}$ is derived from equation (3). Then, the largest eigenvalue $\lambda_n$ of $\hat{D}$ is computed. This eigenvalue determines whether $\hat{D}$ is NSD, and hence whether $D$ is CNSD and $K$ is PSD. This is repeated for all $t$ sets. The number of times that the largest eigenvalue is positive ($\lambda_n > 0$) is retained as $n_{\lambda+}$. Accordingly, the proportion of non-CNSD matrices is determined with $p = n_{\lambda+}/t$.
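A compact R sketch of this procedure for permutation spaces, reusing the is_cnsd helper from Sec. 2.3 (the function name sample_p, the argument dist_fun, and the defaults are our choices):

```r
# Approach A1: estimate the proportion p of non-CNSD distance matrices
# over t randomly sampled sets of n permutations with m elements each.
sample_p <- function(dist_fun, t = 1000, n = 10, m = 5) {
  n_pos <- 0
  for (k in 1:t) {
    X <- replicate(n, sample(m), simplify = FALSE)  # n random permutations
    D <- outer(seq_len(n), seq_len(n),
               Vectorize(function(i, j) dist_fun(X[[i]], X[[j]])))
    if (!is_cnsd(D)) n_pos <- n_pos + 1             # lambda_n > 0 found
  }
  n_pos / t                                         # p = n_lambda+ / t
}
```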

Obviously, all distance measures that yield $p > 0$ are proven to be non-CNSD. Hence, an exponential kernel based on these measures is also proven to be indefinite. If $p = 0$, CNSDness is neither proven nor disproven.

In general, the proposed method can be categorized as a randomized algorithm of the complexity class RP (Motwani and Raghavan, 1995, p. 21f.). That is, it stops after polynomially many steps; if the output is "no", the distance measure is non-CNSD with probability 1, and if the output is "yes", the distance measure is CNSD with some probability strictly bounded away from zero.

The parameter $p$ is an estimator of how likely a non-CNSD matrix is to occur, for the specified set size $n$. To determine definiteness, the calculation of $\lambda_n$ of $\hat{D}$ is not mandatory, but it may be useful to see how close to zero $\lambda_n$ is, to distinguish between a pathological case and cases where the matrix is just barely non-CNSD. We will show in Section 5.3 that $\lambda_n$ of $\hat{D}$ can be linked to model quality.

Note that inaccuracies of the numerical algorithm used to compute the eigenvalues might lead to an erroneous sign of the largest eigenvalue. To deal with this, one could use exact or symbolic methods, or else use a tolerance when checking whether the largest eigenvalue is larger than zero. In the latter case, a matrix $\hat{D}$ is assumed to be non-NSD if $\lambda_n > \epsilon$, where $\epsilon$ is a small positive number.

3.2 Estimating Definiteness with Directed Search

If very few sets $X$ yield indefinite matrices, $A_1$ may fail to find indefinite matrices by pure chance. In such cases, it may be more efficient to replace random sampling with a directed search ($A_2$). In detail, a set $X$ can itself be viewed as a candidate solution of an optimization problem. The largest eigenvalue $\lambda_n$ of the transformed distance matrix $\hat{D}$ is the objective to be maximized. By maximizing the largest eigenvalue, a positive $\lambda_n$ may be found more quickly (and more reliably).

This optimization problem strongly depends on the kind of solution representation used. Evolutionary Algorithms (EAs) are a good choice to solve this problem, because they are applicable to a wide range of solution representations (e.g., real values, strings, permutations, graphs, and mixed search spaces). EAs use principles derived from natural evolution for the purpose of optimization. That is, EAs are optimization algorithms based on a cyclic repetition of parent selection, recombination, mutation, and fitness selection (see, e.g., Eiben and Smith, 2003). These operations can be adapted to a large variety of representations, mainly depending on suitable mutation and recombination operators. The EA in this study will operate as follows (a simplified R sketch follows the list):

  • Individual: $X$. A set $X = \{x_1, \ldots, x_n\}$ with set size $n$ is considered as an individual. Set elements $x$ are samples in the actual search or input space, as used throughout Sec. 2.

  • Search space: $S^n$. All possible sets $X$ of size $n$, i.e., $X \in S^n$.

  • Population: $Z$. A population of size $r$, containing $X_k \in Z$ with $k \in \{1, \ldots, r\}$ and $Z \subseteq S^n$.

  • Objective function:

    $$f: S^n \to \mathbb{R}, \quad X \mapsto \lambda_n \qquad (4)$$

    where $\lambda_n$ is the largest eigenvalue of the transformed distance matrix $\hat{D}$ based on equation (3). The objective function $f$ is maximized.

  • Mutation: Alteration of an individual,
    $X_{\text{new}} = mutation(X) = \{x_1, \ldots, x_{j-1}, submutation(x_j), x_{j+1}, \ldots, x_n\}$,
    with $j \in \{1, \ldots, n\}$. For the submutation function, any edit operation that works for a sample $x$ can be chosen: $x_{new} = submutation(x) = edit(x)$.
    For example, in the case of permutations, one permutation $x_j \in X$ is chosen and mutated with typical permutation edit operations (swap, interchange, reversal). The specific edit operation is called the submutation operator, to distinguish between the mutation of the individual set $X$ and the submutation of a single permutation $x_j \in X$.

  • Recombination: Combining two sets. For recombination, two sets are randomly split and the parts of both sets are joined to form a new set of the same size.

  • Repair: Duplicate removal. Mutation and recombination may create duplicates ($x_i = x_j$ with $i \neq j$). In practice, duplicates are not desirable and are irrelevant to the question of definiteness. Hence, duplicates are replaced by randomly generated, unique samples $x^* \notin X$.

  • Stopping criterion: Indefiniteness proven or budget exhausted. The optimization can stop when some solution set $X$ is found which yields $\lambda_n > \epsilon$, where $\epsilon$ is a small positive number. Alternatively, the process stops if a budget of objective function evaluations is exhausted.
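For concreteness, a heavily simplified, mutation-only R sketch of this directed search is given below. It assumes a helper lambda_max_hat(D) that returns the largest eigenvalue $\lambda_n$ of $\hat{D}$ (the quantity computed inside the is_cnsd sketch of Sec. 2.3), and it omits the recombination, repair, and selection details described above; all names are ours.

```r
# Approach A2 (simplified): evolve sets of permutations to maximize the
# largest eigenvalue of the transformed distance matrix D-hat.
ea_search <- function(dist_fun, n = 10, m = 5, pop_size = 20,
                      budget = 10000, eps = 1e-10) {
  fitness <- function(X) {
    D <- outer(seq_len(n), seq_len(n),
               Vectorize(function(i, j) dist_fun(X[[i]], X[[j]])))
    lambda_max_hat(D)             # largest eigenvalue of D-hat, eq. (3)
  }
  pop <- replicate(pop_size, replicate(n, sample(m), simplify = FALSE),
                   simplify = FALSE)
  fit <- sapply(pop, fitness)
  evals <- pop_size
  while (evals < budget && max(fit) <= eps) {
    parent <- pop[[sample(pop_size, 1)]]
    j <- sample(n, 1)                       # pick one permutation in the set
    ab <- sample(m, 2)                      # interchange submutation
    parent[[j]][ab] <- parent[[j]][rev(ab)]
    f <- fitness(parent)
    evals <- evals + 1
    worst <- which.min(fit)                 # replace the worst individual
    if (f > fit[worst]) { pop[[worst]] <- parent; fit[worst] <- f }
  }
  list(lambda_max = max(fit), evals = evals, indefinite = max(fit) > eps)
}
```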

4 Experimental Validation

The proposed approaches can be useful in any case where the definiteness of kernels is of interest. The experiments provide a proof of concept of the proposed approaches. Hence, we chose to pick a recent application as a motivation for our experiments: surrogate model-based optimization in permutation spaces (Moraglio and Kattan, 2011; Zaefferer et al., 2014a).

In many real-world optimization problems, objective function evaluations are expensive. Sequential modeling and optimization techniques are state-of-the-art in these settings (Bartz-Beielstein and Zaefferer, 2017). Typically, an initial model is built at the first stage of the process. The model is subsequently refined by adding further data until the budget is exhausted. At each stage of this sequential process, available information from the model is used to determine promising new candidate solutions. This motivates the rather small data set sizes used throughout this study.

4.1 Test Case: Permutations

As a proof-of-concept, we selected 16 distance measures for permutations; see Table 1 for a complete list. The implementation of these distance measures is taken from the R-package CEGO, which is available on CRAN at http://cran.r-project.org/package=CEGO.

Similarly to Schiavinotto and Stützle (2007), we define $\Pi^m$ as the set of all permutations of the numbers $\{1, 2, \ldots, m\}$. A permutation has exactly $m$ elements. We denote a single permutation with $\pi \in \Pi^m$ and $\pi = \{\pi_1, \pi_2, \ldots, \pi_m\}$, where $\pi_i$ is a specific element of the permutation at position $i$. For example, a permutation in this notation is $\pi = \{3, 2, 1, 4, 5\} \in \Pi^5$. Explanations and formulas (where applicable) for the distance measures are given in Appendix A.

Table 1: Distance measures for permutations.

name            complexity    metric   abbreviation
Levenshtein     O(m^2)        yes      Lev
Swap            O(m^2)        yes      Swa
Interchange     O(m^2)        yes      Int
Insert          O(m log m)    yes      Ins
LC Substring    O(m^2)        yes      LCStr
R               O(m^2)        yes      R
Adjacency       O(m^2)        pseudo   Adj
Position        O(m^2)        yes      Pos
Position2       O(m^2)        no       Posq
Hamming         O(m)          yes      Ham
Euclidean       O(m)          yes      Euc
Manhattan       O(m)          yes      Man
Chebyshev       O(m)          yes      Che
Lee             O(m)          yes      Lee
Cosine          O(m)          no       Cos
Lexicographic   O(m log m)    yes      Lex

4.2 Random Sampling

In a single experiment, $t = 10{,}000$ sets of permutations are randomly generated. Each set contains $n$ permutations, and each permutation has $m$ elements. For each set, the largest eigenvalue $\lambda_n$ of $\hat{D}$ is computed based on equation (3). To summarize all $t$ sets, the largest $\lambda_n$ as well as the ratio $p = n_{\lambda+}/t$ are recorded. The tolerance value used to check whether the largest eigenvalue is positive is $\epsilon = 10^{-10}$. This process is repeated 10 times to achieve a reliable estimate of the recorded values.

Two batches of experiments are performed. In the first, all 16 distance measures are examined, with $n = \{4, \ldots, 20\}$ and $m = \{4, \ldots, 15\}$. In the second batch, larger sizes $n = \{21, \ldots, 40, 45, 50, 60, 70, 80, 90, 100\}$ are tested, but the permutations are restricted to $m = \{5, \ldots, 15\}$ and the distance measures are only LCStr, Insert, Chebyshev, Levenshtein, and Interchange.
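With the sketches from Section 3, a single experiment of the first batch can be outlined as follows (an illustrative call, not the exact experiment code; d_ham is the Hamming distance sketch from Appendix A):

```r
set.seed(42)
p_hat <- sample_p(d_ham, t = 10000, n = 10, m = 10)  # proportion of non-CNSD sets
```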

4.3 Directed Search

To be comparable to the random sampling approach, the budget for each EA run is 10,000 fitness function evaluations. A run stops if the budget is exhausted or if $\lambda_n > \epsilon = 10^{-10}$. The population size of the EA is set to 100. The recombination rate is 0.5, the mutation rate is $1/m$, and truncation selection is used.

To identify bias introduced by the choice of the submutation operator (which may have a strong interaction with the respective distance measures), each EA run is performed repeatedly with three different submutation operators (minimal R sketches of all three follow the list):

  • Swap mutation: Transposing two adjacent elements of the permutation.

    $$\pi = \pi_1, \ldots, \pi_a, \pi_b, \ldots, \pi_m$$
    $$\pi^* = \pi_1, \ldots, \pi_b, \pi_a, \ldots, \pi_m,$$

    with $1 \leq a < m$ and $b = a + 1$.

  • Interchange mutation: Transposing two arbitrary elements of the permutation.

    $$\pi = \pi_1, \ldots, \pi_{a-1}, \pi_a, \pi_{a+1}, \ldots, \pi_{b-1}, \pi_b, \pi_{b+1}, \ldots, \pi_m$$
    $$\pi^* = \pi_1, \ldots, \pi_{a-1}, \pi_b, \pi_{a+1}, \ldots, \pi_{b-1}, \pi_a, \pi_{b+1}, \ldots, \pi_m,$$

    with $1 \leq a \leq m$ and $1 \leq b \leq m$.

  • Reversal mutation: Reversing a substring of the permutation.

    $$\pi = \pi_1, \ldots, \pi_a, \pi_{a+1}, \ldots, \pi_{b-1}, \pi_b, \ldots, \pi_m$$
    $$\pi^* = \pi_1, \ldots, \pi_b, \pi_{b-1}, \ldots, \pi_{a+1}, \pi_a, \ldots, \pi_m,$$

    with $1 \leq a < b \leq m$.
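Minimal R versions of these three submutation operators, with the positions $a$ and $b$ drawn uniformly at random as described above (function names are ours):

```r
swap_mut <- function(p) {          # transpose two adjacent elements
  a <- sample(length(p) - 1, 1)
  p[c(a, a + 1)] <- p[c(a + 1, a)]
  p
}
interchange_mut <- function(p) {   # transpose two arbitrary elements
  ab <- sample(length(p), 2)
  p[ab] <- p[rev(ab)]
  p
}
reversal_mut <- function(p) {      # reverse a substring
  ab <- sort(sample(length(p), 2))
  p[ab[1]:ab[2]] <- rev(p[ab[1]:ab[2]])
  p
}
```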

All 16 distance measures are tested, with $n = \{4, \ldots, 20\}$ and $m = \{4, \ldots, 15\}$. With ten repeats and the three different submutation operators, this results in 97,920 EA runs, each with 10,000 fitness function evaluations. The employed EA implementation is part of the R-package CEGO.

4.4 Tests For Other Search Domains

To show that the proposed approach is not limited to the presented permutation distance example, we also briefly explore other search domains and their respective distances. However, these are not analyzed in further detail. Instead, we provide a list of minimal examples in Appendix B: the smallest (w.r.t. dimension) indefinite distance matrix for each tested distance measure. The examples include distance measures for permutations, signed permutations, trees, and strings.

5 Observations and Discussion

5.1 Sampling Results

The proportions of sets with positive eigenvalues ($p$) are summarized in Fig. 1, and the largest eigenvalues are depicted in Fig. 2. Only the five indefinite distance measures that achieved positive eigenvalues are shown: Longest Common Substring, Insert, Chebyshev, Levenshtein, and Interchange.

[Fig. 1: proportion $p$ of sampled sets that yield positive eigenvalues]

[Fig. 2: largest eigenvalues $\lambda_n$ of $\hat{D}$]

No counter-examples are found for the remaining eleven measures. That does not prove that they are CNSD (although some are known to be CNSD, e.g., the Euclidean distance, the Swap distance (Jiao and Vert, 2015), and the Hamming distance (Hutter et al., 2011)), but it indicates that it may be unproblematic to use them in practice. Some of the five non-CNSD distance measures were reported to work well in a surrogate-model optimization framework; e.g., the Levenshtein distance seemed to work well for modeling scheduling problems (Zaefferer et al., 2014a). This is mainly due to the fact that even a non-CNSD distance may yield a PSD kernel matrix $K$, depending on the specific data set and kernel parameters used. We do not suggest that non-CNSD distance measures should be avoided, but that their application should be handled with care.

Regarding values of $p$ (Fig. 1), some trends can be observed. For indefinite distance measures, increasing the set size ($n$) will in general lead to larger values of $p$. Obviously, a larger set is more likely to contain combinations of samples that yield negative eigenvalues. In addition, the lower bound for eigenvalues decreases with increasing matrix size (Constantine, 1985).

In contrast to the set size, increasing the number of permutation elements ($m$) decreases the proportion of positive eigenvalues $p$ in all five cases. This can be attributed to a larger and hence more difficult search space. Overall, none of the distance measures shows exactly the same behavior. The LCStr distance has the least problematic behavior: only very few sets (of comparatively large size) yielded positive eigenvalues with the LCStr distance. Interchange, Levenshtein, and Insert distance all have relatively large $p$ for small set sizes $n$. Chebyshev, on the other hand, starts to have non-zero $p$ only for relatively large $n$. However, the number of permutation elements has only a weak influence in the case of the Chebyshev distance. Hence, curves for different $m$ are much closer to each other, compared to the other distance measures. Somewhat analogous to $p$, the largest eigenvalues of $\hat{D}$ plotted in Fig. 2 generally increase for larger $n$ and decrease with larger $m$.

Our findings are confirmed by some results from the literature. Cortes et al. (2004) have shown that an exponential kernel function based on the Levenshtein distance is indefinite for strings of more than one symbol. Our experiments show that this result can easily be rediscovered empirically for the case of permutations. At the same time, these findings also confirm (and provide reasons for) problems observed with these kinds of kernel functions in a previous study (Zaefferer et al., 2014a).

As a consistency check, Table 2 compares the sampling results to a brute-force approach for $n = \{4, \ldots, 8\}$ and $m = 4$. It shows that the sampling indeed approximates the proportion of non-CNSD matrices quite well. In this small setting, the sampling identified all combinations of distance measure and $n$ that may yield non-CNSD matrices.

Table 2: Sampling estimates versus true (brute force) proportions of non-CNSD matrices for $m = 4$; $n_s$ denotes the number of possible sets of size $n$.

n   distance      $n_s$     true     estimate
5   Insert        42,504    0.047    0.048
6   Insert        134,596   0.221    0.219
7   Insert        346,104   0.530    0.529
8   Insert        735,471   0.827    0.825
5   Interchange   42,504    0.106    0.105
6   Interchange   134,596   0.465    0.463
7   Interchange   346,104   0.833    0.834
8   Interchange   735,471   0.978    0.978
5   Levenshtein   42,504    0.087    0.087
6   Levenshtein   134,596   0.293    0.293
7   Levenshtein   346,104   0.591    0.589
8   Levenshtein   735,471   0.847    0.846
5   LCStr         42,504    0.002    0.002
6   LCStr         134,596   0.007    0.007
7   LCStr         346,104   0.026    0.026
8   LCStr         735,471   0.093    0.092

5.2 Optimization Results

The average number of fitness function evaluations (i.e., the number of times $\lambda_n$ of $\hat{D}$ is computed) required to find a matrix $\hat{D}$ with positive $\lambda_n$ is depicted in Fig. 3.

[Fig. 3: average number of fitness function evaluations required to find a positive $\lambda_n$]

The optimization results show similar behavior with respect to $n$ and $m$ as the sampling approach. Increasing $m$ leads to an increased number of fitness function evaluations; that is, finding positive eigenvalues becomes more difficult with increasing values of $m$. Increasing $n$ reduces the number of required fitness function evaluations; that is, finding positive eigenvalues becomes easier. In some cases, this effect disappears for large values of $m$, e.g., for the LCStr distance, where the average is more or less constant over $n$ if $m$ is large enough.

Importantly, the comparison to the sampling results clearly shows that the EA has some success in optimizing the largest eigenvalue of the transformed distance matrix $\hat{D}$. In several cases, positive eigenvalues are found by the EA while sampling with the same budget failed to find any. Hence, it can be assumed that the fitness landscape based on $\lambda_n$ of $\hat{D}$ is sufficiently smooth to allow for optimization. The eigenvalue $\lambda_n$ seems to be a good indicator of how close a solution set is to yielding an indefinite kernel matrix.

Furthermore, the expected bias of the used submutation operator becomes visible. For the Insert and LCStr distances, the EA with swap mutation works considerably better than the other two variants. Hence, comparisons of these values across different distance measures should be handled with caution. Clearly, other aspects of the optimization algorithm (e.g., selection criteria or recombination operators) might have similar effects. While this bias is troubling, the results may still offer interesting insights. The eigenvalue optimization could be interpreted as a worst-case scenario that occurs if an iterative learning process strongly correlates with the eigenvalues of the employed distance matrix.

5.3 Verification: Impact on Model Quality

Earlier, we discussed two values that may express the effect of a lack of definiteness in practice, i.e., $p$ and $\lambda_n$ of $\hat{D}$. But what do these values imply?

The value $p$ can be seen as the probability of generating indefinite matrices. If we assume that a model is unable to deal with indefinite data, the fraction $p$ is an estimate of how likely a modeling failure is. For other cases, it is very hard to link it to a model performance measure such as accuracy without making too many assumptions. We still argue that $p$ provides useful information, especially when the kernel is designed and probed before sampling any data (e.g., when planning an experiment). We suggest using $p$ to support an initial decision (e.g., whether to spend additional time on fixing or otherwise dealing with the indefinite kernel). One advantage is that it is rather easy to interpret.

In contrast, the parameter $\lambda_n$ of $\hat{D}$ is more difficult to interpret. But it has the advantage that it may be estimated for a single matrix as well as averaged over a set of matrices. Hence, we want to determine whether the magnitude of this eigenvalue affects model performance. We expect an influence that depends on the choice of model. Consider, e.g., a Gaussian process regression model, as described by Forrester et al. (2008). The model may be able to mitigate the problematic eigenvalue by assigning larger $\theta$ values to the kernel $k(x, x') = \exp(-\theta d(x, x'))$. For very large $\lambda_n$ and thus very large $\theta$, this will lead to kernel matrices that approximate the unit matrix, which is positive definite. A model with a unit kernel matrix would be able to reproduce the training data, but would predict the process mean for most other data points. Hence, we examine Gaussian process regression models, since they provide a transparent and interpretable test case (but similar experiments could easily be made with support vector regression).
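A small R illustration of this mitigation effect on a toy distance matrix: as $\theta$ grows, the off-diagonal kernel entries vanish and $\exp(-\theta D)$ approaches the unit matrix:

```r
# As theta grows, all off-diagonal entries of the kernel matrix shrink
# toward zero, i.e., K approaches the (positive definite) identity matrix.
set.seed(1)
D <- as.matrix(dist(runif(6)))     # toy distance matrix of 6 random points
for (theta in c(1, 10, 1000)) {
  K <- exp(-theta * D)
  cat("theta =", theta, " max off-diagonal entry =", max(K[upper.tri(K)]), "\n")
}
```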

An experimental test has to consider the potential bias of the used data set. We need to be able to reasonably assume that differences in performance are actually due to the properties of the employed distance or kernel (i.e., a kernel performs poorly because the corresponding $\lambda_n$ of $\hat{D}$ is high) rather than properties of the data set (i.e., a kernel performs poorly because it does not fit well to the ground truth of the data set). To that end, we suggest that observations in a test data set are derived from the same distances that are used in the model.

Hence, we randomly created data sets $X$ of size $n$ with permutations of dimension $m$, similarly to the random sampling performed earlier. Then, we created training observations by evaluating the distance of each permutation in $X$ to a reference permutation $x_{ref} = \{1, \ldots, m\}$, i.e., $y = d(x, x_{ref})$. A Gaussian process model was then trained with this data, largely following the descriptions in Forrester et al. (2008). The model was trained with the kernel $k(x, x') = \exp(-\theta d(x, x'))$, and the model parameters (e.g., $\theta$) were determined by maximum likelihood estimation, via the locally biased version of the DIviding RECTangles (DIRECT) algorithm (Gablonsky and Kelley, 2001) with 1,000 likelihood evaluations. For each test, the distance chosen to produce the observations $y$ and the distance chosen in the kernel were identical.
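The data generation step of this experiment is easily sketched in R (dist_fun stands for any of the distance sketches used before; the Gaussian process fitting and the likelihood optimization via DIRECT are omitted here):

```r
# Training data: n random permutations, labeled by their distance to a
# fixed reference permutation x_ref = (1, ..., m).
m <- 10; n <- 20
x_ref <- 1:m
X <- replicate(n, sample(m), simplify = FALSE)
y <- sapply(X, function(x) dist_fun(x, x_ref))
```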

The Root Mean Square Error (RMSE) of the model was evaluated on 1,000 randomly chosen permutations. The resulting RMSE values for each training set $X$, as well as the corresponding eigenvalue $\lambda_n$ of $\hat{D}$, are shown in Fig. 4. The graphic shows a trend that confirms our expectation: distances associated with larger $\lambda_n$ tend to produce larger errors.

[Fig. 4: RMSE of the model versus the largest eigenvalue $\lambda_n$ of $\hat{D}$]

6 Summary and Outlook

The focus of this study was the definiteness of kernel and distance measures. Definiteness is a main requirement for modeling techniques like Gaussian processes or SVMs. Distance-based kernels with unknown definiteness may be promising choices for certain parameter spaces. Yet, their definiteness may be very hard to prove or enforce.

It was found that empirical approaches (based on sampling or optimization) may help to assess the definiteness of the respective function. In detail, two research questions were investigated:

$Q_1$

Discovery: Is there an efficient, empirical approach to determine the definiteness of kernel functions based on arbitrary distance measures?

$Q_2$

Measurability: If $Q_1$ can be answered affirmatively, can we quantify to what extent a lack of definiteness is problematic?

Two empirical approaches were suggested towards that end. The first approach ($A_1$) samples from the space of solution sets and determines whether a set is found which leads to an indefinite distance or kernel matrix. If indefinite matrices are rare, a directed search with an EA is more successful ($A_2$). The EA maximizes the largest eigenvalue of a transformed distance matrix $\hat{D}$, or, respectively, minimizes the smallest eigenvalue of a kernel matrix. Hence, the EA searches for sets that yield indefinite matrices.

As a proof-of-concept, the approaches were applied to distance measures for permutations. It was shown that five problematic distance measures could be identified: Longest Common Substring (LCStr), Insert, Chebyshev, Levenshtein, and Interchange distance. Information known from the literature (regarding indefiniteness of the respective kernel function) could be rediscovered by the empirical approaches.

The optimization approach was successful, as it was able to outperform the sampling approach in discovering sets with indefinite kernel matrices. Still, the results also indicated that the choice of variation operators in the optimization algorithm does introduce bias. Hence, the respective results do not allow a conclusion about the impact of a lack of definiteness of the respective sets/matrices. Still, the success of the EA indicates that the fitness landscape posed by the largest eigenvalue is not excessively rugged and has an exploitable structure. This suggests that the largest eigenvalue is a good indicator of how far a certain solution set (and the respective distance or kernel matrix) is from becoming indefinite. In an additional set of experiments, we further verified that increasing the largest eigenvalue can in fact be linked to a decrease in model quality. This leads to the following responses to the posed research questions:

R1

Discovery: Sampling from the space of potential candidate solution sets allows identifying problems with definiteness, by finding solution sets that lead to non-CNSD distance matrices ($A_1$). Where such situations are rare (and hence more likely to be missed by the sampling), an optimization approach may be more successful ($A_2$). While neither approach $A_1$ nor $A_2$ is able to prove definiteness, both are able to disprove it. If no negative results are found, it is reasonable to assume that using the respective distance/kernel function is feasible.

R2

Measurability: The sampling approach ($A_1$) yields a proportion of potentially non-CNSD matrices, which in turn yields an estimate of how problematic a distance measure is. In a similar way, yet potentially biased by the choice of optimization algorithm, the number of evaluations required by the optimization approach gives a similar estimate. In addition, the success of the optimization approach ($A_2$) suggests that the respective largest eigenvalue is an indicator of how close certain sets (and the respective distance or kernel matrices) are to becoming indefinite. Additional experiments showed how this eigenvalue can be linked to model performance.

For future research, it may be of interest to allow the EA to change the set size. Clearly, one issue would be that enlarging the sets may quickly lead to a trivial solution, since larger sets naturally lead to larger $\lambda_n$ of $\hat{D}$. Hence, there is a trade-off between the largest eigenvalue and the set size. A multi-objective EA (e.g., NSGA-II (Deb et al., 2002) or SMS-EMOA (Beume et al., 2007)) may be used to handle this issue by simultaneously maximizing $\lambda_n$ and minimizing the set size $n$.

Finally, the kernels and distances described herein are not the full story. For other kernels, the relation between distance measure and kernel function may not be as straightforward. Parameters of the distance measure or the kernel function could complicate the situation. It may be necessary to adapt the proposed method to, e.g., include parameters in the sampling and optimization procedures.

Compliance with ethical standards

Conflicts of Interest The authors declare that they have no conflict of interest.

References

  • Bader, D. A., Moret, B. M., Warnow, T., Wyman, S. K., Yan, M., Tang, J., Siepel, A. C., & Caprara, A. (2004). Genome rearrangements analysis under parsimony and other phylogenetic algorithms (GRAPPA) 2.0. Online: https://www.cs.unm.edu/moret/GRAPPA/. Accessed: 16.11.2016.
  • Bartz-Beielstein, T. & Zaefferer, M. (2017). Model-based methods for continuous and discrete global optimization. Applied Soft Computing, 55:154–167.
  • Berg, C., Christensen, J. P. R., & Ressel, P. (1984). Harmonic Analysis on Semigroups, volume 100 of Graduate Texts in Mathematics. Springer New York.
  • Beume, N., Naujoks, B., & Emmerich, M. (2007). SMS-EMOA: Multiobjective selection based on dominated hypervolume. European Journal of Operational Research, 181(3):1653–1669.
  • Boytsov, L. (2011). Indexing methods for approximate dictionary searching: Comparative analysis. J. Exp. Algorithmics, 16:1–91.
  • Burges, C. J. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121–167.
  • Camastra, F. & Vinciarelli, A. (2008). Machine Learning for Audio, Image and Video Analysis: Theory and Applications. Advanced Information and Knowledge Processing. Springer, London.
  • Campos, V., Laguna, M., & Martí, R. (2005). Context-independent scatter and tabu search for permutation problems. INFORMS Journal on Computing, 17(1):111–122.
  • Camps-Valls, G., Martín-Guerrero, J. D., Rojo-Álvarez, J. L., & Soria-Olivas, E. (2004). Fuzzy sigmoid kernel for support vector classifiers. Neurocomputing, 62:501–506.
  • Chen, Y., Gupta, M. R., & Recht, B. (2009). Learning kernels from indefinite similarities. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, (pp. 145–152), New York, NY, USA. ACM.
  • Constantine, G. (1985). Lower bounds on the spectra of symmetric matrices with nonnegative entries. Linear Algebra and its Applications, 65:171–178.
  • Cortes, C., Haffner, P., & Mohri, M. (2004). Rational kernels: Theory and algorithms. J. Mach. Learn. Res., 5:1035–1062.
  • Curriero, F. (2006). On the use of non-Euclidean distance measures in geostatistics. Mathematical Geology, 38(8):907–926.
  • Deb, K., Pratap, A., Agarwal, S., & Meyarivan, T. (2002). A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182–197.
  • Deza, M. & Huang, T. (1998). Metrics on permutations, a survey. Journal of Combinatorics, Information and System Sciences, 23(1-4):173–185.
  • Eiben, A. E. & Smith, J. E. (2003). Introduction to Evolutionary Computing. Springer Berlin Heidelberg.
  • Feller, W. (1971). An Introduction to Probability Theory and Its Applications, Volume 2. John Wiley & Sons Inc.
  • Forrester, A., Sobester, A., & Keane, A. (2008). Engineering Design via Surrogate Modelling. Wiley.
  • Gablonsky, J. & Kelley, C. (2001). A locally-biased form of the DIRECT algorithm. Journal of Global Optimization, 21(1):27–37.
  • Gärtner, T., Lloyd, J., & Flach, P. (2003). Kernels for structured data. In Matwin, S. & Sammut, C., editors, Inductive Logic Programming, volume 2583 of Lecture Notes in Computer Science, pages 66–83. Springer Berlin Heidelberg.
  • Gärtner, T., Lloyd, J., & Flach, P. (2004). Kernels and distances for structured data. Machine Learning, 57(3):205–232.
  • Haussler, D. (1999). Convolution kernels on discrete structures. Technical Report UCSC-CRL-99-10, Department of Computer Science, University of California at Santa Cruz.
  • Hirschberg, D. S. (1975). A linear space algorithm for computing maximal common subsequences. Commun. ACM, 18(6):341–343.
  • Hutter, F., Hoos, H. H., & Leyton-Brown, K. (2011). Sequential model-based optimization for general algorithm configuration. In Proc. of LION-5, (pp. 507–523).
  • Ikramov, K. & Savel'eva, N. (2000). Conditionally definite matrices. Journal of Mathematical Sciences, 98(1):1–50.
  • Jiao, Y. & Vert, J.-P. (2015). The Kendall and Mallows kernels for permutations. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), (pp. 1935–1944).
  • Kendall, M. & Gibbons, J. (1990). Rank Correlation Methods. Oxford University Press.
  • Lee, C. (1958). Some properties of nonbinary error-correcting codes. IRE Transactions on Information Theory, 4(2):77–82.
  • Li, H. & Jiang, T. (2004). A class of edit kernels for SVMs to predict translation initiation sites in eukaryotic mRNAs. In Proceedings of the Eighth Annual International Conference on Research in Computational Molecular Biology, RECOMB '04, (pp. 262–271), New York, NY, USA. ACM.
  • Loosli, G., Canu, S., & Ong, C. (2015). Learning SVM in Krein spaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(6):1204–1216.
  • Marteau, P.-F. & Gibet, S. (2014). On recursive edit distance kernels with application to time series classification. IEEE Transactions on Neural Networks and Learning Systems, PP(99):1–1.
  • Moraglio, A. & Kattan, A. (2011). Geometric generalisation of surrogate model based optimisation to combinatorial spaces. In Proceedings of the 11th European Conference on Evolutionary Computation in Combinatorial Optimization, EvoCOP'11, (pp. 142–154), Berlin, Heidelberg, Germany. Springer.
  • Motwani, R. & Raghavan, P. (1995). Randomized Algorithms. Cambridge University Press.
  • Murphy, K. P. (2012). Machine Learning. MIT Press Ltd.
  • Ong, C. S., Mary, X., Canu, S., & Smola, A. J. (2004). Learning with non-positive kernels. In Proceedings of the Twenty-first International Conference on Machine Learning, ICML '04, (pp. 81–88), New York, NY, USA. ACM.
  • Pawlik, M. & Augsten, N. (2015). Efficient computation of the tree edit distance. ACM Transactions on Database Systems, 40(1):1–40.
  • Pawlik, M. & Augsten, N. (2016). Tree edit distance: Robust and memory-efficient. Information Systems, 56:157–173.
  • Rasmussen, C. E. & Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. The MIT Press.
  • Reeves, C. R. (1999). Landscapes, operators and heuristic search. Annals of Operations Research, 86(0):473–490.
  • Schiavinotto, T. & Stützle, T. (2007). A review of metrics on permutations for search landscape analysis. Computers & Operations Research, 34(10):3143–3153.
  • Schleif, F.-M. & Tino, P. (2015). Indefinite proximity learning: A review. Neural Computation, 27(10):2039–2096.
  • Schölkopf, B. (2001). The kernel trick for distances. In Leen, T. K., Dietterich, T. G., & Tresp, V., editors, Advances in Neural Information Processing Systems 13, pages 301–307. MIT Press.
  • Sevaux, M. & Sörensen, K. (2005). Permutation distance measures for memetic algorithms with population management. In Proceedings of the 6th Metaheuristics International Conference (MIC'05), (pp. 832–838). University of Vienna.
  • Singhal, A. (2001). Modern information retrieval: a brief overview. IEEE Bulletin on Data Engineering, 24(4):35–43.
  • Smola, A. J., Ovári, Z. L., & Williamson, R. C. (2000). Regularization with dot-product kernels. In Advances in Neural Information Processing Systems 13, Proceedings, (pp. 308–314). MIT Press.
  • van der Loo, M. P. (2014). The stringdist package for approximate string matching. The R Journal, 6(1):111–122.
  • Vapnik, V. N. (1998). Statistical Learning Theory, volume 1. Wiley New York.
  • Voutchkov, I., Keane, A., Bhaskar, A., & Olsen, T. M. (2005). Weld sequence optimization: The use of surrogate models for solving sequential combinatorial problems. Computer Methods in Applied Mechanics and Engineering, 194(30-33):3535–3551.
  • Wagner, R. A. & Fischer, M. J. (1974). The string-to-string correction problem. J. ACM, 21(1):168–173.
  • Wu, G., Chang, E. Y., & Zhang, Z. (2005). An analysis of transformation on non-positive semidefinite similarity matrix for kernel machines. In Proceedings of the 22nd International Conference on Machine Learning.
  • Zaefferer, M. & Bartz-Beielstein, T. (2016). Efficient global optimization with indefinite kernels. In Parallel Problem Solving from Nature–PPSN XIV, (pp. 69–79). Springer.
  • Zaefferer, M., Stork, J., & Bartz-Beielstein, T. (2014a). Distance measures for permutations in combinatorial efficient global optimization. In Bartz-Beielstein, T., Branke, J., Filipič, B., & Smith, J., editors, Parallel Problem Solving from Nature–PPSN XIII, (pp. 373–383), Cham, Switzerland. Springer.
  • Zaefferer, M., Stork, J., Friese, M., Fischbach, A., Naujoks, B., & Bartz-Beielstein, T. (2014b). Efficient global optimization for combinatorial problems. In Proceedings of the 2014 Conference on Genetic and Evolutionary Computation, GECCO '14, (pp. 871–878), New York, NY, USA. ACM.

Appendix A: Distance Measures for Permutations

In the following, we describe the distance measures employed in the experiments. R sketches for a few of these measures are collected at the end of this appendix.

  • The Levenshtein distance is an edit distance measure:

    $$d_{Lev}(\pi, \pi') = edits_{\pi \rightarrow \pi'}$$

    Here, $edits_{\pi \rightarrow \pi'}$ is the minimal number of deletions, insertions, or substitutions required to transform one string (or here: permutation) $\pi$ into another string $\pi'$. The implementation is based on Wagner and Fischer (1974).

  • Swaps are transpositions of two adjacent elements. The Swap distance (also: Kendall's Tau (Kendall and Gibbons, 1990; Sevaux and Sörensen, 2005) or Precedence distance (Schiavinotto and Stützle, 2007)) counts the minimum number of swaps required to transform one permutation into another. For permutations, it is (Sevaux and Sörensen, 2005):

    $$d_{Swa}(\pi, \pi') = \sum_{i=1}^{m}\sum_{j=1}^{m} z_{ij} \quad \text{with} \quad z_{ij} = \begin{cases} 1 & \text{if } \pi_i < \pi_j ~\text{and}~ \pi'_i > \pi'_j, \\ 0 & \text{otherwise.} \end{cases}$$
  • An interchange operation is the transposition of two arbitrary elements. Respectively, the Interchange (also: Cayley) distance counts the minimum number of interchanges ($interchanges_{\pi \rightarrow \pi'}$) required to transform one permutation into another (Schiavinotto and Stützle, 2007):

    $$d_{Int}(\pi, \pi') = interchanges_{\pi \rightarrow \pi'}$$

  • The Insert distance is based on the longest common subsequence $LCSeq(\pi, \pi')$. The longest common subsequence is the largest number of elements that follow each other in both permutations, with interruptions allowed. The corresponding distance is

    $$d_{Ins}(\pi, \pi') = m - LCSeq(\pi, \pi').$$

    We use the algorithm described by Hirschberg (1975). The name is due to its interpretation as an edit distance measure. The corresponding edit operation is a combination of insertion and deletion: a single element is moved from one position (delete) to a new position (insert). It is also called Ulam's distance (Schiavinotto and Stützle, 2007).

  • The Longest Common Substring distance is based on the largest number of elements that follow each other in both permutations, without interruption. Unlike the longest common subsequence, all elements have to be adjacent. If $LCStr(\pi, \pi')$ is the length of the longest common substring, the distance is

    $$d_{LCStr}(\pi, \pi') = m - LCStr(\pi, \pi').$$
  • The R-distance (Campos et al., 2005; Sevaux and Sörensen, 2005) counts the number of times that one element follows another in one permutation, but not in the other. It is identical to the uni-directional adjacency distance (Reeves, 1999). It is computed by

    $d_{R}(\pi,\pi') = \sum_{i=1}^{m-1} y_i$ with
    $y_i = \begin{cases} 0 & \text{if } \exists j: \pi_i = \pi'_j \text{ and } \pi_{i+1} = \pi'_{j+1},\\ 1 & \text{otherwise.} \end{cases}$
  • The (bi-directional) Adjacency distance (Reeves, 1999; Schiavinotto and Stützle, 2007) counts the number of times two elements are neighbors in one, but not in the other permutation. Unlike the R-distance (uni-directional), the order of the two elements does not matter. It is computed by

    $d_{Adj}(\pi,\pi') = \sum_{i=1}^{m-1} y_i$ with
    $y_i = \begin{cases} 0 & \text{if } \exists j: \pi_i = \pi'_j \text{ and } \pi_{i+1} \in \{\pi'_{j+1}, \pi'_{j-1}\},\\ 1 & \text{otherwise.} \end{cases}$
  • The Position distance (Schiavinotto and Stützle, 2007) is identical to the Deviation distance or Spearman’s footrule (Sevaux and Sörensen, 2005):

    $d_{Pos}(\pi,\pi') = \sum_{k=1}^{m} |i-j| ~~\text{where}~~ \pi_i = \pi'_j = k$.

  • The non-metric Squared Position distance is Spearman’s rank correlation coefficient (Sevaux and Sörensen, 2005). In contrast to the Position distance, the term $|i-j|$ is replaced by $(i-j)^2$.

  • The Hamming distance or Exact Match distance simply counts the number of unequal elements in two permutations, i.e.,

    $d_{Ham}(\pi,\pi') = \sum_{i=1}^{m} a_i$, where $a_i = \begin{cases} 0 & \text{if } \pi_i = \pi'_i,\\ 1 & \text{otherwise.} \end{cases}$

  • The Euclidean distance is

    $d_{Euc}(\pi,\pi') = \sqrt{\sum_{i=1}^{m} (\pi_i - \pi'_i)^2}$.

  • The Manhattan distance (A-Distance, cf. Sevaux and Sörensen, 2005; Campos et al., 2005) is

    $d_{Man}(\pi,\pi') = \sum_{i=1}^{m} |\pi_i - \pi'_i|$.

  • The Chebyshev distance is

    $d_{Che}(\pi,\pi') = \max_{1 \leq i \leq m} |\pi_i - \pi'_i|$.

  • For permutations, the Lee distance (Lee, 1958; Deza and Huang, 1998) is

    $d_{Lee}(\pi,\pi') = \sum_{i=1}^{m} \min(|\pi_i - \pi'_i|,\, m - |\pi_i - \pi'_i|)$.

  • The non-metric Cosine distance is based on the dot product of two permutations. It is derived from the cosine similarity (Singhal, 2001) of two vectors:

    $d_{Cos}(\pi,\pi') = 1 - \frac{\pi \cdot \pi'}{\lVert\pi\rVert\,\lVert\pi'\rVert}$.
  • The Lexicographic distance regards the lexicographic ordering of permutations. If the position of a permutation $\pi$ in the lexicographic ordering of all permutations with fixed $m$ is $L(\pi)$, then the Lexicographic distance is

    $d_{Lex}(\pi,\pi') = |L(\pi) - L(\pi')|$.
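
To make the definitions above concrete, the following Python sketch implements a representative subset of the measures (Swap, Hamming, Position, Lee, and Cosine). This is an illustrative sketch only, not the implementation used in the experiments; it assumes that permutations are given as NumPy integer arrays over the elements $1,\dots,m$, matching the formulas above.

    import numpy as np

    def d_swap(pi, pp):
        # Swap (Kendall's tau) distance: number of discordant pairs.
        m = len(pi)
        return sum(1 for i in range(m) for j in range(m)
                   if pi[i] < pi[j] and pp[i] > pp[j])

    def d_hamming(pi, pp):
        # Number of positions at which the two permutations differ.
        return int(np.sum(pi != pp))

    def d_position(pi, pp):
        # Spearman's footrule: np.argsort recovers, for each element,
        # its position in the respective permutation.
        return int(np.sum(np.abs(np.argsort(pi) - np.argsort(pp))))

    def d_lee(pi, pp):
        # Element-wise differences with wrap-around modulus m.
        m = len(pi)
        diff = np.abs(pi - pp)
        return int(np.sum(np.minimum(diff, m - diff)))

    def d_cosine(pi, pp):
        # One minus the cosine similarity of the two permutation vectors.
        return 1.0 - float(pi @ pp) / (np.linalg.norm(pi) * np.linalg.norm(pp))

For example, with $\pi = (1,2,3,4)$ and $\pi' = (1,3,4,2)$, the functions d_swap, d_hamming, and d_position return 2, 3, and 4, respectively.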

Appendix B: Minimal Examples for Indefinite Sets

To showcase the usefulness of the proposed methods, this section lists small example data sets and the respective indefinite distance matrices. Besides the standard permutation distances, we also tested the following:

  • Signed permutations, reversal distance: Permutations where each element has a sign are referred to as signed permutations. An application example for signed permutations is weld path optimization (Voutchkov et al., 2005). The reversal distance counts the number of reversals required to transform one permutation into another. We used the non-cyclic reversal distance provided in the GRAPPA library version 2.0 (Bader et al., 2004).

  • Labeled trees, tree edit distance: Trees are widely applied as a solution representation, e.g., in Genetic Programming. In this study, we considered labeled trees. The tree edit distance counts the number of node insertions, deletions, or relabelings. We used the efficient implementation in the APTED 0.1.1 library (Pawlik and Augsten, 2015, 2016). The labeled trees are denoted in bracket notation: curly brackets indicate the tree structure, letters indicate labels (internal and terminal nodes).

  • Strings, Optimal String Alignment distance (OSA): The OSA is a non-metric edit distance that counts insertions, deletions, substitutions, and transpositions of characters, where each substring can be edited no more than once. It is also called the restricted Damerau-Levenshtein distance (Boytsov, 2011). We used the implementation in the stringdist R package (van der Loo, 2014); a sketch of the underlying recursion follows this list.

  • Strings, Jaro-Winkler distance: The Jaro-Winkler distance is based on the number of matching characters in two strings as well as the number of transpositions required to bring all matches into the same order. We used the implementation in the stringdist R package (van der Loo, 2014).
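
Since the OSA distance is less common than the plain Levenshtein distance, the following Python sketch shows its textbook dynamic-programming recursion. This is for illustration only, not the stringdist implementation used in the experiments.

    def osa_distance(a: str, b: str) -> int:
        # Optimal String Alignment (restricted Damerau-Levenshtein) distance:
        # insertions, deletions, substitutions, and transpositions of adjacent
        # characters, where no substring is edited more than once.
        n, m = len(a), len(b)
        d = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(n + 1):
            d[i][0] = i
        for j in range(m + 1):
            d[0][j] = j
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = 0 if a[i - 1] == b[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution
                if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                        and a[i - 2] == b[j - 1]):
                    d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
        return d[n][m]

For instance, osa_distance("abc", "bac") returns 1 (one transposition of the adjacent characters a and b), in line with the corresponding entry of Table 3.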

The respective results are listed in Table 3. All of the listed distance measures are shown to be non-CNSD. A sketch of how such a counter-example can be verified numerically is given below.
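
A symmetric matrix $D$ is CNSD iff $x^T D x \leq 0$ for all $x$ with $\mathbf{1}^T x = 0$, which holds iff all eigenvalues of $PDP$, with the centering matrix $P = I - \frac{1}{n}\mathbf{1}\mathbf{1}^T$, are non-positive. A positive eigenvalue of $PDP$ therefore certifies that $D$ is not CNSD. The following Python sketch applies this check to the first data set of Table 3; we assume here that the reported $\lambda_n$ corresponds to the largest eigenvalue of the centered matrix.

    import numpy as np

    def largest_centered_eigenvalue(D):
        # P projects onto the subspace orthogonal to the all-ones vector;
        # D is CNSD iff all eigenvalues of P D P are <= 0.
        n = D.shape[0]
        P = np.eye(n) - np.ones((n, n)) / n
        return float(np.max(np.linalg.eigvalsh(P @ D @ P)))

    # Insert distance data set of Table 3 (n = 5, m = 4):
    D = np.array([
        [0,   1/3, 1/3, 2/3, 1/3],
        [1/3, 0,   2/3, 1/3, 2/3],
        [1/3, 2/3, 0,   1/3, 2/3],
        [2/3, 1/3, 1/3, 0,   1/3],
        [1/3, 2/3, 2/3, 1/3, 0  ],
    ])
    print(largest_centered_eigenvalue(D))  # positive, hence D is not CNSD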

Table 3: Example data sets with indefinite distance matrices. Each block lists the data set $x_1,\dots,x_n$ and the upper triangle of the corresponding symmetric distance matrix.

Permutations, Insert, n=5, m=4, $\lambda_n \approx 0.090$
 i  x_i          d_{i,1}  d_{i,2}  d_{i,3}  d_{i,4}  d_{i,5}
 1  {1,2,3,4}    0        1/3      1/3      2/3      1/3
 2  {1,3,4,2}             0        2/3      1/3      2/3
 3  {2,3,4,1}                      0        1/3      2/3
 4  {3,4,1,2}                               0        1/3
 5  {4,1,2,3}                                        0

Permutations, Interchange, n=5, m=4, $\lambda_n \approx 0.090$
 1  {1,2,3,4}    0        1/3      1/3      2/3      1/3
 2  {1,2,4,3}             0        2/3      1/3      2/3
 3  {1,3,2,4}                      0        1/3      2/3
 4  {1,3,4,2}                               0        1/3
 5  {1,4,3,2}                                        0

Permutations, Levenshtein, n=5, m=4, $\lambda_n \approx 0.135$
 1  {1,2,4,3}    0        1        1/2      1/2      1
 2  {2,3,1,4}             0        1/2      1/2      1
 3  {2,4,3,1}                      0        1        1/2
 4  {3,1,2,4}                               0        1/2
 5  {3,4,2,1}                                        0

Permutations, LCStr, n=5, m=4, $\lambda_n \approx 0.023$
 1  {1,3,2,4}    0        2/3      1/3      1/3      2/3
 2  {2,4,1,3}             0        1/3      1/3      2/3
 3  {3,2,4,1}                      0        2/3      1
 4  {4,1,3,2}                               0        2/3
 5  {4,2,1,3}                                        0

Permutations, Chebyshev, n=5, m=5, $\lambda_n \approx 0.034$
 1  {1,5,3,4,2}  0        1/4      3/4      3/4      1
 2  {2,5,3,4,1}           0        1        1        3/4
 3  {4,2,3,1,5}                    0        2/4      1/4
 4  {4,3,1,2,5}                             0        1/4
 5  {5,3,2,1,4}                                      0

Signed Permutations, Reversal, n=5, m=5, $\lambda_n \approx 0.016$
 1  {4,5,-1,-2,-3}   0    4/6      5/6      3/6      2/6
 2  {2,1,3,-4,-5}         0        2/6      3/6      5/6
 3  {-2,1,3,5,4}               0        5/6      3/6
 4  {4,-2,3,1,-5}                       0        2/6
 5  {4,-2,1,-5,-3}                               0

Labeled Trees, Edit distance, n=5, $\lambda_n \approx 0.026$
 1  {b{c{b}}}    0        2        1        3        1
 2  {b}                   0        1        3        1
 3  {b{c}}                         0        2        2
 4  {a{c}{a}}                               0        3
 5  {c{b}}                                           0

Strings, Optimal String Alignment, n=5, $\lambda_n \approx 0.102$
 1  abc          0        1        2        3        1
 2  acc                   0        3        2        2
 3  cba                            0        1        2
 4  caa                                     0        2
 5  bac                                              0

Strings, Jaro-Winkler, n=4, $\lambda_n \approx 0.046$
 1  bbbb         0        1        1/6      3/6
 2  aaaa                  0        3/6      1/6
 3  bbba                           0        3/6
 4  aaab                                    0