Attentive Kernel

sgptools.kernels.attentive_kernel.AttentiveKernel
Bases: Kernel
Attentive Kernel function (non-stationary kernel function).
This kernel uses a Multi-Layer Perceptron (MLP) to learn attention weights for a mixture of RBF kernel components, making it adapt to local data characteristics. It is based on the implementation from Weizhe-Chen/attentive_kernels.
Refer to the following paper for more details:
- AK: Attentive Kernel for Information Gathering [Chen et al., 2022]
Attributes:

| Name | Type | Description |
|---|---|---|
| `_free_amplitude` | `Variable` | The amplitude (variance) parameter of the kernel. |
| `lengthscales` | `Variable` | Fixed lengthscales for each RBF mixture component. |
| `num_lengthscales` | `int` | Number of RBF mixture components. |
| `nn` | `NN` | The Neural Network (MLP) used to generate attention representations. |
Source code in sgptools/kernels/attentive_kernel.py
K(X, X2=None)

Computes the covariance matrix between input data points `X` and `X2`. If `X2` is None, it computes the covariance matrix `K(X, X)`.
The covariance is calculated as a weighted sum of RBF kernels, where the weights are derived from the attention representations generated by the MLP.
Formula (simplified): \(K(X, X') = \text{amplitude} \times \text{attention}(X, X') \times \sum_{i=1}^{Q} \text{RBF}(||X - X'||, \text{lengthscale}_i) \times \text{attention\_lengthscale}_i(X, X')\), where \(\text{attention}(X, X') = \text{representation}(X) \cdot \text{representation}(X')^T\).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `X` | `Tensor` | (N1, D); Input data points. | required |
| `X2` | `Optional[Tensor]` | (N2, D); Optional second set of input data points. If None, `K(X, X)` is computed. | `None` |
Returns:

| Type | Description |
|---|---|
| `Tensor` | tf.Tensor: (N1, N2); The computed covariance matrix. |
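To make the weighted mixture concrete, here is a minimal NumPy sketch of the simplified formula above. It is an illustration only, not the library's TensorFlow implementation; the helper `mixture_covariance` and the random stand-ins for the MLP representations are hypothetical.

```python
import numpy as np

def mixture_covariance(X, X2, Z, Z2, lengthscales, amplitude=1.0):
    """Weighted sum of RBF components; weights come from the representation vectors."""
    dists = np.linalg.norm(X[:, None, :] - X2[None, :, :], axis=-1)  # (N1, N2) pairwise distances
    attention = Z @ Z2.T                                             # representation(X) . representation(X')^T
    K = np.zeros_like(dists)
    for i, ell in enumerate(lengthscales):
        rbf = np.exp(-0.5 * (dists / ell) ** 2)   # RBF component with fixed lengthscale_i
        w_i = np.outer(Z[:, i], Z2[:, i])         # per-component attention weight
        K += rbf * w_i
    return amplitude * attention * K

# Random stand-ins for the MLP outputs (rows normalized, as in get_representations)
rng = np.random.default_rng(0)
X, X2 = rng.normal(size=(5, 2)), rng.normal(size=(3, 2))
Z, Z2 = rng.random((5, 4)), rng.random((3, 4))
Z /= np.linalg.norm(Z, axis=1, keepdims=True)
Z2 /= np.linalg.norm(Z2, axis=1, keepdims=True)
print(mixture_covariance(X, X2, Z, Z2, lengthscales=[0.1, 0.5, 1.0, 2.0]).shape)  # (5, 3)
```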
Source code in sgptools/kernels/attentive_kernel.py
K_diag(X)

Computes the diagonal of the covariance matrix `K(X, X)`.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `X` | `Tensor` | (N, D); Input data points. | required |
Returns:

| Type | Description |
|---|---|
| `Tensor` | tf.Tensor: (N,); A 1D tensor representing the diagonal elements of the covariance matrix. |
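A quick sanity check (a sketch, assuming sgptools and TensorFlow are installed): since `K_diag` computes the diagonal of `K(X, X)`, it should agree with the diagonal extracted from the full covariance matrix.

```python
import numpy as np
import tensorflow as tf
from sgptools.kernels.attentive_kernel import AttentiveKernel

l_scales = np.linspace(0.01, 2.0, 10).astype(np.float32)
kernel = AttentiveKernel(lengthscales=l_scales, num_dim=2)
X = np.random.rand(20, 2).astype(np.float32)

# Diagonal of the full matrix vs. the dedicated diagonal computation
print(np.allclose(tf.linalg.diag_part(kernel.K(X)), kernel.K_diag(X), atol=1e-5))
```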
Source code in sgptools/kernels/attentive_kernel.py
__init__(lengthscales, hidden_sizes=None, amplitude=1.0, num_dim=2)

Initializes the Attentive Kernel.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `lengthscales` | `Union[List[float], ndarray]` | A list or NumPy array of lengthscale values to be used in the RBF mixture components. These lengthscales are not trained by the optimizer. | required |
| `hidden_sizes` | `List[int]` | A list where each element specifies the number of hidden units in a layer of the MLPs. The length of this list determines the number of hidden layers. Defaults to [10, 10]. | `None` |
| `amplitude` | `float` | Initial amplitude (variance) of the kernel function. This parameter is trainable. Defaults to 1.0. | `1.0` |
| `num_dim` | `int` | The dimensionality of the input data points (e.g., 2 for 2D data). Defaults to 2. | `2` |
Usage
import gpflow
import numpy as np
from sgptools.kernels.attentive_kernel import AttentiveKernel
# Example: 10 fixed lengthscales ranging from 0.01 to 2.0
l_scales = np.linspace(0.01, 2.0, 10).astype(np.float32)
# Initialize Attentive Kernel for 2D data
kernel = AttentiveKernel(lengthscales=l_scales, hidden_sizes=[10, 10], num_dim=2)
# You can then use this kernel in a GPflow model:
# model = gpflow.models.GPR(data=(X_train, Y_train), kernel=kernel, noise_variance=0.1)
# optimize_model(model)
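As a follow-up (a sketch, assuming a standard GPflow environment): because the kernel is a GPflow `Kernel`, the usual GPflow utilities can be used to inspect its parameters; the fixed mixture lengthscales are not optimized, while the amplitude and MLP weights are trainable.

```python
import gpflow
# Prints the kernel's parameters and variables along with their trainable status
gpflow.utilities.print_summary(kernel)
```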
Source code in sgptools/kernels/attentive_kernel.py
get_representations(X)

Computes normalized latent representations for input data points `X` using the MLP. These representations are used to calculate attention weights for the kernel mixture.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `X` | `Tensor` | (N, D); Input data points. | required |
Returns:

| Type | Description |
|---|---|
| `Tensor` | tf.Tensor: (N, num_lengthscales); Normalized latent representations for each input point. |
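A small sketch (reusing the constructor call from the Usage example above) of how these representations relate to the attention term in `K`: the pairwise dot products of the normalized representation vectors form the attention weights.

```python
import numpy as np
import tensorflow as tf
from sgptools.kernels.attentive_kernel import AttentiveKernel

l_scales = np.linspace(0.01, 2.0, 10).astype(np.float32)
kernel = AttentiveKernel(lengthscales=l_scales, num_dim=2)
X = np.random.rand(5, 2).astype(np.float32)

Z = kernel.get_representations(X)                # (5, num_lengthscales), rows normalized
attention = tf.matmul(Z, Z, transpose_b=True)    # pairwise attention weights, shape (5, 5)
print(Z.shape, attention.shape)
```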