Attentive Kernel
sgptools.kernels.attentive_kernel.AttentiveKernel
Bases: Kernel
Attentive Kernel: a non-stationary kernel function.
This kernel uses a Multi-Layer Perceptron (MLP) to learn attention weights for a mixture of RBF kernel components, which lets it adapt to local data characteristics. It is based on the implementation from Weizhe-Chen/attentive_kernels.
Refer to the following paper for more details:
- AK: Attentive Kernel for Information Gathering [Chen et al., 2022]
Attributes:
Name | Type | Description |
---|---|---|
_free_amplitude | Variable | The amplitude (variance) parameter of the kernel. |
lengthscales | Variable | Fixed lengthscales for each RBF mixture component. |
num_lengthscales | int | Number of RBF mixture components. |
nn | NN | The Neural Network (MLP) used to generate attention representations. |
Source code in sgptools/kernels/attentive_kernel.py
K(X, X2=None)
Computes the covariance matrix between input data points X and X2. If X2 is None, it computes the covariance matrix K(X, X).
The covariance is calculated as a weighted sum of RBF kernels, where the weights are derived from the attention representations generated by the MLP.
Formula (simplified): \(K(X, X') = \text{amplitude} \times \text{attention}(X, X') \times \sum_{i=1}^{Q} \text{RBF}(\|X-X'\|, \text{lengthscale}_i) \times \text{attention_lengthscale}_i(X,X')\) where \(\text{attention}(X, X') = \text{representation}(X) \cdot \text{representation}(X')^T\).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | Tensor | (N1, D); Input data points. | required |
X2 | Optional[Tensor] | (N2, D); Optional second set of input data points. If None, K(X, X) is computed. | None |
Returns:
Type | Description |
---|---|
Tensor | tf.Tensor: (N1, N2); The computed covariance matrix. |
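The simplified formula above can be sketched for a single pair of points in plain Python, so that it runs without TensorFlow. This is a minimal illustration under stated assumptions, not the library's implementation: the names `rbf` and `attentive_cov` are hypothetical, the representations are hand-picked stand-ins for the MLP output, the RBF is taken to be the usual squared-exponential form, and \(\text{attention_lengthscale}_i\) is assumed to be the product of the i-th representation entries of the two points.

```python
import math


def rbf(dist, lengthscale):
    """Squared-exponential correlation for a distance and a lengthscale (assumed form)."""
    return math.exp(-0.5 * (dist / lengthscale) ** 2)


def attentive_cov(x, x2, z, z2, lengthscales, amplitude=1.0):
    """Covariance between two points x and x2 following the simplified formula.

    z and z2 are stand-ins for the MLP representations of x and x2,
    with one entry per lengthscale.
    """
    dist = math.dist(x, x2)
    # attention(X, X') = representation(X) . representation(X')
    attention = sum(a * b for a, b in zip(z, z2))
    # Weighted sum of RBF components; the per-component weight z_i * z2_i
    # is an assumption about attention_lengthscale_i.
    mixture = sum(rbf(dist, ls) * a * b for ls, a, b in zip(lengthscales, z, z2))
    return amplitude * attention * mixture


lengthscales = [0.1, 1.0]
z_a = [0.6, 0.8]  # hypothetical unit-norm representation of point a
z_b = [0.8, 0.6]  # hypothetical unit-norm representation of point b

k_ab = attentive_cov((0.0, 0.0), (0.5, 0.0), z_a, z_b, lengthscales)
k_aa = attentive_cov((0.0, 0.0), (0.0, 0.0), z_a, z_a, lengthscales)
print(k_ab, k_aa)  # k_aa is close to the amplitude because z_a has unit norm
```

Points whose representations put weight on different lengthscale components get a smaller attention term, which is how the mixture can behave differently in different regions of the input space.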
K_diag(X)
Computes the diagonal of the covariance matrix K(X, X).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | Tensor | (N, D); Input data points. | required |
Returns:
Type | Description |
---|---|
Tensor | tf.Tensor: (N,); A 1D tensor representing the diagonal elements of the covariance matrix. |
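As a worked check of what the diagonal looks like under the simplified formula for K, write \(z(x) = \text{representation}(x)\). The following rests on assumptions that the text does not state explicitly: that "normalized" means unit Euclidean norm, that \(\text{RBF}(0, \text{lengthscale}_i) = 1\), and that \(\text{attention_lengthscale}_i(x, x') = z_i(x)\, z_i(x')\). Under those assumptions:

\[
K(x, x) = \text{amplitude} \times \big(z(x) \cdot z(x)\big) \times \sum_{i=1}^{Q} \text{RBF}(0, \text{lengthscale}_i)\, z_i(x)^2 = \text{amplitude} \times 1 \times \sum_{i=1}^{Q} z_i(x)^2 = \text{amplitude}
\]

In that case every diagonal entry would equal the amplitude, independent of the input point.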
__init__(lengthscales, hidden_sizes=None, amplitude=1.0, num_dim=2)
Initializes the Attentive Kernel.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
lengthscales | Union[List[float], ndarray] | A list or NumPy array of lengthscale values to be used in the RBF mixture components. These lengthscales are not trained by the optimizer. | required |
hidden_sizes | List[int] | A list where each element specifies the number of hidden units in a layer of the MLP. The length of this list determines the number of hidden layers. If None, [10, 10] is used. | None |
amplitude | float | Initial amplitude (variance) of the kernel function. This parameter is trainable. Defaults to 1.0. | 1.0 |
num_dim | int | The dimensionality of the input data points (e.g., 2 for 2D data). Defaults to 2. | 2 |
Usage
```python
import gpflow
import numpy as np
from sgptools.kernels.attentive_kernel import AttentiveKernel

# Example: 10 fixed lengthscales ranging from 0.01 to 2.0
l_scales = np.linspace(0.01, 2.0, 10).astype(np.float32)

# Initialize Attentive Kernel for 2D data
kernel = AttentiveKernel(lengthscales=l_scales, hidden_sizes=[10, 10], num_dim=2)

# You can then use this kernel in a GPflow model:
# model = gpflow.models.GPR(data=(X_train, Y_train), kernel=kernel, noise_variance=0.1)
# optimize_model(model)
```
get_representations(X)
Computes normalized latent representations for input data points X using the MLP.
These representations are used to calculate attention weights for the kernel mixture.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
X | Tensor | (N, D); Input data points. | required |
Returns:
Type | Description |
---|---|
Tensor | tf.Tensor: (N, num_lengthscales); Normalized latent representations for each input point. |
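The normalization step can be illustrated in plain Python. This is a sketch under assumptions rather than the library's code: the helper `l2_normalize` is hypothetical, the raw values stand in for MLP outputs of shape (N, num_lengthscales), and unit Euclidean norm is assumed because the text only says the representations are "normalized".

```python
import math


def l2_normalize(rows, eps=1e-12):
    """Scale each row to unit Euclidean length (eps guards against all-zero rows)."""
    normalized = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row))
        normalized.append([v / max(norm, eps) for v in row])
    return normalized


# Hypothetical raw MLP outputs for N = 2 points and 2 lengthscale components
raw = [[3.0, 4.0], [1.0, 0.0]]
reps = l2_normalize(raw)
print(reps)  # each row now has unit length
```

With unit-length rows, the dot product of two representations lies in [-1, 1], so the attention term in the covariance acts as a similarity score between the two points.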