Neural Kernel

sgptools.kernels.neural_kernel.NeuralSpectralKernel

Bases: Kernel

Neural Spectral Kernel function (non-stationary kernel function). This kernel models non-stationarity by using multiple Multi-Layer Perceptrons (MLPs) to map input locations to frequency, lengthscale, and variance parameters for a mixture of spectral components.
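Concretely, for inputs x and x', the `K` method below evaluates a sum of Q input-dependent spectral components. Writing w_q, l_q, and μ_q for the per-component variance, lengthscale, and frequency MLPs, the kernel it computes is (a sketch transcribed from the code; see Remes et al. [2018] for the derivation and exact normalization):

```latex
k(x, x') = \sum_{q=1}^{Q} w_q(x)\, w_q(x')
    \prod_{d=1}^{D} \sqrt{\frac{2\, l_{q,d}(x)\, l_{q,d}(x')}{l_{q,d}(x)^2 + l_{q,d}(x')^2}}
    \exp\!\left(-\sum_{d=1}^{D} \frac{(x_d - x'_d)^2}{l_{q,d}(x)^2 + l_{q,d}(x')^2}\right)
    \cos\!\left(2\pi\,\big(\mu_q(x)^\top x - \mu_q(x')^\top x'\big)\right)
```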

Based on the implementation from the nssm-gp repository: https://github.com/sremes/nssm-gp/tree/master?tab=readme-ov-file

Refer to the following paper for more details:
  • Neural Non-Stationary Spectral Kernel [Remes et al., 2018]

Attributes:
  • input_dim (int): Dimensionality of the input data points.
  • Q (int): Number of MLP mixture components used in the kernel function.
  • num_hidden (int): Number of hidden layers in each MLP.
  • freq (List[NN]): List of MLPs, one for each component, predicting frequencies.
  • length (List[NN]): List of MLPs, one for each component, predicting lengthscales.
  • var (List[NN]): List of MLPs, one for each component, predicting variances.

Source code in sgptools/kernels/neural_kernel.py
class NeuralSpectralKernel(gpflow.kernels.Kernel):
    """
    Neural Spectral Kernel function (non-stationary kernel function).
    This kernel models non-stationarity by using multiple Multi-Layer Perceptrons (MLPs)
    to map input locations to frequency, lengthscale, and variance parameters for a
    mixture of spectral components.

    Based on the implementation from this [repo](https://github.com/sremes/nssm-gp/tree/master?tab=readme-ov-file).

    Refer to the following papers for more details:
        - Neural Non-Stationary Spectral Kernel [Remes et al., 2018]

    Attributes:
        input_dim (int): Dimensionality of the input data points.
        Q (int): Number of MLP mixture components used in the kernel function.
        num_hidden (int): Number of hidden layers in each MLP.
        freq (List[NN]): List of MLPs, one for each component, predicting frequencies.
        length (List[NN]): List of MLPs, one for each component, predicting lengthscales.
        var (List[NN]): List of MLPs, one for each component, predicting variances.
    """

    def __init__(self,
                 input_dim: int,
                 active_dims: Optional[List[int]] = None,
                 Q: int = 1,
                 hidden_sizes: List[int] = None):
        """
        Initializes the Neural Spectral Kernel.

        Args:
            input_dim (int): Number of dimensions of the input data points (e.g., 2 for 2D data).
            active_dims (Optional[List[int]]): A list of indices specifying which input dimensions
                                                the kernel operates on. If None, all dimensions are active.
                                                Defaults to None.
            Q (int): The number of MLP mixture components used in the kernel function.
                     Each component has its own set of MLPs for frequency, lengthscale, and variance.
                     Defaults to 1.
            hidden_sizes (List[int]): A list where each element specifies the number of hidden units
                                      in a layer of the MLPs. The length of this list determines
                                      the number of hidden layers. Defaults to [32, 32].

        Usage:
            ```python
            import gpflow
            import numpy as np
            from sgptools.kernels.neural_kernel import NeuralSpectralKernel

            # Initialize a Neural Spectral Kernel for 2D data with 3 mixture components
            # and MLPs with 2 hidden layers of 64 units each.
            kernel = NeuralSpectralKernel(input_dim=2, Q=3, hidden_sizes=[64, 64])

            # You can then use this kernel in a GPflow model:
            # model = gpflow.models.SGPR(data=(X_train, Y_train), kernel=kernel, ...)
            ```
        """
        super().__init__(active_dims=active_dims)

        if hidden_sizes is None:
            hidden_sizes = [32, 32]  # Default if not provided
        else:
            hidden_sizes = list(hidden_sizes)

        self.input_dim = input_dim
        self.Q = Q
        self.num_hidden = len(hidden_sizes)

        # Initialize lists of MLPs for each component
        self.freq: List[NN] = []
        self.length: List[NN] = []
        self.var: List[NN] = []

        # Create Q sets of MLPs
        for q in range(self.Q):
            # MLP for frequency: maps input_dim -> hidden_sizes -> input_dim
            # Output activation 'softplus' ensures positive frequencies.
            freq_nn = NN([input_dim] + hidden_sizes + [input_dim],
                         output_activation_fn='softplus')

            # MLP for lengthscale: maps input_dim -> hidden_sizes -> input_dim
            # Output activation 'softplus' ensures positive lengthscales.
            length_nn = NN([input_dim] + hidden_sizes + [input_dim],
                           output_activation_fn='softplus')

            # MLP for variance: maps input_dim -> hidden_sizes -> 1 (scalar variance)
            # Output activation 'softplus' ensures positive variances.
            var_nn = NN([input_dim] + hidden_sizes + [1],
                        output_activation_fn='softplus')

            self.freq.append(freq_nn)
            self.length.append(length_nn)
            self.var.append(var_nn)

    @tf.autograph.experimental.do_not_convert
    def K(self, X: tf.Tensor, X2: Optional[tf.Tensor] = None) -> tf.Tensor:
        """
        Computes the covariance matrix between/amongst the input variables `X` and `X2`.
        If `X2` is None, the function computes `K(X, X)` (a symmetric covariance matrix).
        Otherwise, it computes `K(X, X2)` (a cross-covariance matrix).

        The kernel is a sum over `Q` mixture components, where each component's
        parameters (frequency, lengthscale, variance) are determined by MLPs
        based on the input locations.

        Args:
            X (tf.Tensor): (N1, D); First set of input variables to compute covariance from.
                           `N1` is the number of points, `D` is the dimensionality.
            X2 (Optional[tf.Tensor]): (N2, D); Optional second set of input variables.
                                     If provided, computes cross-covariance `K(X, X2)`.
                                     If None, computes auto-covariance `K(X, X)`.

        Returns:
            tf.Tensor: (N1, N2); The computed covariance matrix. If `X2` is None, the
                       diagonal of `K(X, X)` is jittered for numerical stability.
        """
        if X2 is None:
            X2_internal = X
            equal = True  # Flag to add jitter to diagonal for K(X,X)
        else:
            X2_internal = X2
            equal = False

        kern = tf.constant(0.0, dtype=float_type)  # Initialize kernel sum

        for q in range(self.Q):
            # Compute latent function values (frequencies, lengthscales, variances)
            # by passing input locations through the MLPs.
            freq_X, freq_X2 = self.freq[q](X), self.freq[q](
                X2_internal)  # (N, D) frequencies
            lens_X, lens_X2 = self.length[q](X), self.length[q](
                X2_internal)  # (N, D) lengthscales
            var_X, var_X2 = self.var[q](X), self.var[q](
                X2_internal)  # (N, 1) variances

            # Compute length-scale term (E) - based on inverse lengthscales and distances
            Xr = tf.expand_dims(X, 1)  # (N1, 1, D)
            X2r = tf.expand_dims(X2_internal, 0)  # (1, N2, D)
            l1 = tf.expand_dims(lens_X, 1)  # (N1, 1, D)
            l2 = tf.expand_dims(lens_X2, 0)  # (1, N2, D)

            L = tf.square(l1) + tf.square(
                l2)  # (N1, N2, D) - sum of squared lengthscales

            # D term: Squared difference scaled by L, summed over dimensions
            D_term = tf.square(Xr - X2r) / L  # (N1, N2, D)
            D_term = tf.reduce_sum(D_term, 2)  # (N1, N2) - sum over dimensions

            # Determinant term: Product over dimensions of (2 * l1 * l2 / L)^(1/2)
            det_term = tf.sqrt(2 * l1 * l2 / L)  # (N1, N2, D)
            det_term = tf.reduce_prod(det_term,
                                      2)  # (N1, N2) - product over dimensions

            # E term: Combine determinant and exponential of D_term
            E = det_term * tf.exp(-D_term)  # (N1, N2)

            # Compute cosine term (COS) - based on frequencies and dot products with X
            # (N1, D) * (N1, D) -> sum over D -> (N1, 1)
            muX = (tf.reduce_sum(freq_X * X, 1, keepdims=True) - tf.transpose(
                tf.reduce_sum(freq_X2 * X2_internal, 1, keepdims=True)))
            COS = tf.cos(2 * np.pi * muX)  # (N1, N2)

            # Compute kernel variance term (WW) - outer product of variance predictions
            WW = tf.matmul(var_X, var_X2,
                           transpose_b=True)  # (N1, 1) @ (1, N2) -> (N1, N2)

            # Compute the q'th kernel component and add to total kernel
            kern += WW * E * COS

        # Add jitter to the diagonal for K(X,X) matrices for numerical stability
        if equal:
            return robust_kernel(kern, tf.shape(X)[0])
        else:
            return kern

    @tf.autograph.experimental.do_not_convert
    def K_diag(self, X: tf.Tensor) -> tf.Tensor:
        """
        Computes the diagonal of the covariance matrix `K(X, X)`.
        For the Neural Spectral Kernel, this is `sum_q(var_q(X)^2) + jitter`.

        Args:
            X (tf.Tensor): (N, D); Input data points. `N` is the number of points.

        Returns:
            tf.Tensor: (N,); A 1D tensor representing the diagonal elements of the
                        covariance matrix.
        """
        kd = default_jitter()  # Initialize with a small jitter
        for q in range(self.Q):
            # Sum of squared variance predictions from each MLP component
            kd += tf.square(self.var[q](X))
        return tf.squeeze(
            kd)  # Remove singleton dimension (e.g., (N, 1) -> (N,))

K(X, X2=None)

Computes the covariance matrix between/amongst the input variables X and X2. If X2 is None, the function computes K(X, X) (a symmetric covariance matrix). Otherwise, it computes K(X, X2) (a cross-covariance matrix).

The kernel is a sum over Q mixture components, where each component's parameters (frequency, lengthscale, variance) are determined by MLPs based on the input locations.

Parameters:
  • X (tf.Tensor): (N1, D); First set of input variables to compute covariance from. N1 is the number of points, D is the dimensionality. Required.
  • X2 (Optional[tf.Tensor]): (N2, D); Optional second set of input variables. If provided, computes cross-covariance K(X, X2). If None, computes auto-covariance K(X, X). Defaults to None.

Returns:
  • tf.Tensor: (N1, N2); The computed covariance matrix. If X2 is None, the diagonal of K(X, X) is jittered for numerical stability.
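A minimal sketch of calling `K` directly to check shapes (variable names are illustrative; assumes the MLPs run under GPflow's default float64 precision):

```python
import numpy as np
import tensorflow as tf
from sgptools.kernels.neural_kernel import NeuralSpectralKernel

# Kernel for 2D inputs with 3 mixture components.
kernel = NeuralSpectralKernel(input_dim=2, Q=3, hidden_sizes=[32, 32])

X1 = tf.constant(np.random.rand(10, 2))  # (N1, D) = (10, 2), float64 by default
X2 = tf.constant(np.random.rand(5, 2))   # (N2, D) = (5, 2)

K_auto = kernel.K(X1)       # (10, 10) auto-covariance with a jittered diagonal
K_cross = kernel.K(X1, X2)  # (10, 5) cross-covariance
```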

Source code in sgptools/kernels/neural_kernel.py
@tf.autograph.experimental.do_not_convert
def K(self, X: tf.Tensor, X2: Optional[tf.Tensor] = None) -> tf.Tensor:
    """
    Computes the covariance matrix between/amongst the input variables `X` and `X2`.
    If `X2` is None, the function computes `K(X, X)` (a symmetric covariance matrix).
    Otherwise, it computes `K(X, X2)` (a cross-covariance matrix).

    The kernel is a sum over `Q` mixture components, where each component's
    parameters (frequency, lengthscale, variance) are determined by MLPs
    based on the input locations.

    Args:
        X (tf.Tensor): (N1, D); First set of input variables to compute covariance from.
                       `N1` is the number of points, `D` is the dimensionality.
        X2 (Optional[tf.Tensor]): (N2, D); Optional second set of input variables.
                                 If provided, computes cross-covariance `K(X, X2)`.
                                 If None, computes auto-covariance `K(X, X)`.

    Returns:
        tf.Tensor: (N1, N2); The computed covariance matrix. If `X2` is None, the
                   diagonal of `K(X, X)` is jittered for numerical stability.
    """
    if X2 is None:
        X2_internal = X
        equal = True  # Flag to add jitter to diagonal for K(X,X)
    else:
        X2_internal = X2
        equal = False

    kern = tf.constant(0.0, dtype=float_type)  # Initialize kernel sum

    for q in range(self.Q):
        # Compute latent function values (frequencies, lengthscales, variances)
        # by passing input locations through the MLPs.
        freq_X, freq_X2 = self.freq[q](X), self.freq[q](
            X2_internal)  # (N, D) frequencies
        lens_X, lens_X2 = self.length[q](X), self.length[q](
            X2_internal)  # (N, D) lengthscales
        var_X, var_X2 = self.var[q](X), self.var[q](
            X2_internal)  # (N, 1) variances

        # Compute length-scale term (E) - based on inverse lengthscales and distances
        Xr = tf.expand_dims(X, 1)  # (N1, 1, D)
        X2r = tf.expand_dims(X2_internal, 0)  # (1, N2, D)
        l1 = tf.expand_dims(lens_X, 1)  # (N1, 1, D)
        l2 = tf.expand_dims(lens_X2, 0)  # (1, N2, D)

        L = tf.square(l1) + tf.square(
            l2)  # (N1, N2, D) - sum of squared lengthscales

        # D term: Squared difference scaled by L, summed over dimensions
        D_term = tf.square(Xr - X2r) / L  # (N1, N2, D)
        D_term = tf.reduce_sum(D_term, 2)  # (N1, N2) - sum over dimensions

        # Determinant term: Product over dimensions of (2 * l1 * l2 / L)^(1/2)
        det_term = tf.sqrt(2 * l1 * l2 / L)  # (N1, N2, D)
        det_term = tf.reduce_prod(det_term,
                                  2)  # (N1, N2) - product over dimensions

        # E term: Combine determinant and exponential of D_term
        E = det_term * tf.exp(-D_term)  # (N1, N2)

        # Compute cosine term (COS) - based on frequencies and dot products with X
        # (N1, D) * (N1, D) -> sum over D -> (N1, 1)
        muX = (tf.reduce_sum(freq_X * X, 1, keepdims=True) - tf.transpose(
            tf.reduce_sum(freq_X2 * X2_internal, 1, keepdims=True)))
        COS = tf.cos(2 * np.pi * muX)  # (N1, N2)

        # Compute kernel variance term (WW) - outer product of variance predictions
        WW = tf.matmul(var_X, var_X2,
                       transpose_b=True)  # (N1, 1) @ (1, N2) -> (N1, N2)

        # Compute the q'th kernel component and add to total kernel
        kern += WW * E * COS

    # Add jitter to the diagonal for K(X,X) matrices for numerical stability
    if equal:
        return robust_kernel(kern, tf.shape(X)[0])
    else:
        return kern

K_diag(X)

Computes the diagonal of the covariance matrix K(X, X). For the Neural Spectral Kernel, this is sum_q(var_q(X)^2) + jitter.

Parameters:
  • X (tf.Tensor): (N, D); Input data points. N is the number of points. Required.

Returns:
  • tf.Tensor: (N,); A 1D tensor containing the diagonal elements of the covariance matrix.
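As a sanity check, the values returned here should agree with the diagonal of the full `K(X, X)` matrix up to the small jitter terms added in each method (a self-contained sketch with illustrative names):

```python
import numpy as np
import tensorflow as tf
from sgptools.kernels.neural_kernel import NeuralSpectralKernel

kernel = NeuralSpectralKernel(input_dim=2, Q=3)
X = tf.constant(np.random.rand(10, 2))  # (N, D), float64 by default

diag_direct = kernel.K_diag(X)                  # (10,): sum_q var_q(X)^2 + jitter
diag_from_K = tf.linalg.diag_part(kernel.K(X))  # (10,): same values up to the robust_kernel jitter
```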

Source code in sgptools/kernels/neural_kernel.py
@tf.autograph.experimental.do_not_convert
def K_diag(self, X: tf.Tensor) -> tf.Tensor:
    """
    Computes the diagonal of the covariance matrix `K(X, X)`.
    For the Neural Spectral Kernel, this is `sum_q(var_q(X)^2) + jitter`.

    Args:
        X (tf.Tensor): (N, D); Input data points. `N` is the number of points.

    Returns:
        tf.Tensor: (N,); A 1D tensor representing the diagonal elements of the
                    covariance matrix.
    """
    kd = default_jitter()  # Initialize with a small jitter
    for q in range(self.Q):
        # Sum of squared variance predictions from each MLP component
        kd += tf.square(self.var[q](X))
    return tf.squeeze(
        kd)  # Remove singleton dimension (e.g., (N, 1) -> (N,))

__init__(input_dim, active_dims=None, Q=1, hidden_sizes=None)

Initializes the Neural Spectral Kernel.

Parameters:
  • input_dim (int): Number of dimensions of the input data points (e.g., 2 for 2D data). Required.
  • active_dims (Optional[List[int]]): A list of indices specifying which input dimensions the kernel operates on. If None, all dimensions are active. Defaults to None.
  • Q (int): The number of MLP mixture components used in the kernel function. Each component has its own set of MLPs for frequency, lengthscale, and variance. Defaults to 1.
  • hidden_sizes (List[int]): A list where each element specifies the number of hidden units in a layer of the MLPs. The length of this list determines the number of hidden layers. Defaults to [32, 32].
Usage
import gpflow
import numpy as np
from sgptools.kernels.neural_kernel import NeuralSpectralKernel

# Initialize a Neural Spectral Kernel for 2D data with 3 mixture components
# and MLPs with 2 hidden layers of 64 units each.
kernel = NeuralSpectralKernel(input_dim=2, Q=3, hidden_sizes=[64, 64])

# You can then use this kernel in a GPflow model:
# model = gpflow.models.SGPR(data=(X_train, Y_train), kernel=kernel, ...)
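If the training matrix carries extra, non-spatial columns, `active_dims` restricts which columns the kernel sees (standard GPflow slicing, applied when the kernel is used inside a model); a small hedged sketch:

```python
from sgptools.kernels.neural_kernel import NeuralSpectralKernel

# Inputs have three columns overall, but only the first two (e.g., x/y coordinates)
# should drive the kernel; input_dim refers to the sliced dimensionality.
kernel = NeuralSpectralKernel(input_dim=2, active_dims=[0, 1], Q=2, hidden_sizes=[16, 16])
```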
Source code in sgptools/kernels/neural_kernel.py
def __init__(self,
             input_dim: int,
             active_dims: Optional[List[int]] = None,
             Q: int = 1,
             hidden_sizes: List[int] = None):
    """
    Initializes the Neural Spectral Kernel.

    Args:
        input_dim (int): Number of dimensions of the input data points (e.g., 2 for 2D data).
        active_dims (Optional[List[int]]): A list of indices specifying which input dimensions
                                            the kernel operates on. If None, all dimensions are active.
                                            Defaults to None.
        Q (int): The number of MLP mixture components used in the kernel function.
                 Each component has its own set of MLPs for frequency, lengthscale, and variance.
                 Defaults to 1.
        hidden_sizes (List[int]): A list where each element specifies the number of hidden units
                                  in a layer of the MLPs. The length of this list determines
                                  the number of hidden layers. Defaults to [32, 32].

    Usage:
        ```python
        import gpflow
        import numpy as np
        from sgptools.kernels.neural_kernel import NeuralSpectralKernel

        # Initialize a Neural Spectral Kernel for 2D data with 3 mixture components
        # and MLPs with 2 hidden layers of 64 units each.
        kernel = NeuralSpectralKernel(input_dim=2, Q=3, hidden_sizes=[64, 64])

        # You can then use this kernel in a GPflow model:
        # model = gpflow.models.SGPR(data=(X_train, Y_train), kernel=kernel, ...)
        ```
    """
    super().__init__(active_dims=active_dims)

    if hidden_sizes is None:
        hidden_sizes = [32, 32]  # Default if not provided
    else:
        hidden_sizes = list(hidden_sizes)

    self.input_dim = input_dim
    self.Q = Q
    self.num_hidden = len(hidden_sizes)

    # Initialize lists of MLPs for each component
    self.freq: List[NN] = []
    self.length: List[NN] = []
    self.var: List[NN] = []

    # Create Q sets of MLPs
    for q in range(self.Q):
        # MLP for frequency: maps input_dim -> hidden_sizes -> input_dim
        # Output activation 'softplus' ensures positive frequencies.
        freq_nn = NN([input_dim] + hidden_sizes + [input_dim],
                     output_activation_fn='softplus')

        # MLP for lengthscale: maps input_dim -> hidden_sizes -> input_dim
        # Output activation 'softplus' ensures positive lengthscales.
        length_nn = NN([input_dim] + hidden_sizes + [input_dim],
                       output_activation_fn='softplus')

        # MLP for variance: maps input_dim -> hidden_sizes -> 1 (scalar variance)
        # Output activation 'softplus' ensures positive variances.
        var_nn = NN([input_dim] + hidden_sizes + [1],
                    output_activation_fn='softplus')

        self.freq.append(freq_nn)
        self.length.append(length_nn)
        self.var.append(var_nn)

sgptools.kernels.neural_kernel.init_neural_kernel(X_train, Y_train, inducing_variable, Q, n_inits=1, hidden_sizes=None)

Helper function to initialize a Sparse Gaussian Process Regression (SGPR) model with a Neural Spectral Kernel. This function can perform multiple random initializations and return the model with the best initial Evidence Lower Bound (ELBO).

Refer to the original paper for more details
  • Neural Non-Stationary Spectral Kernel [Remes et al., 2018]

Parameters:
  • X_train (np.ndarray): (n, d); Input training set points. Required.
  • Y_train (np.ndarray): (n, 1); Training set labels. Required.
  • inducing_variable (np.ndarray): (m, d); Initial inducing points. These are passed directly to the SGPR model. Required.
  • Q (int): The number of MLP mixture components for the Neural Spectral Kernel. Required.
  • n_inits (int): Number of times to randomly initialize the kernel's MLPs and compute the initial ELBO. The model with the highest ELBO among these initializations is returned. Defaults to 1.
  • hidden_sizes (Optional[List[int]]): List of integers specifying the number of hidden units in each MLP layer. If None, [32, 32] is used. Defaults to None.

Returns:
  • SGPR: The SGPR model instance initialized with the Neural Spectral Kernel that yielded the best initial ELBO.

Usage
import numpy as np
import gpflow
from sgptools.kernels.neural_kernel import init_neural_kernel
from sgptools.utils.misc import get_inducing_pts # For initial inducing points

# Dummy data
X_train_data = np.random.rand(100, 2).astype(np.float32)
Y_train_data = (np.sin(X_train_data[:, 0]) + np.cos(X_train_data[:, 1]))[:, None].astype(np.float32)

# Initial inducing points (e.g., subset of training data or k-means centers)
initial_inducing_points = get_inducing_pts(X_train_data, num_inducing=20)

# Initialize the SGPR model with Neural Spectral Kernel
# Try 3 random initializations for the MLPs.
model_ns_kernel = init_neural_kernel(
    X_train=X_train_data,
    Y_train=Y_train_data,
    inducing_variable=initial_inducing_points,
    Q=5,              # 5 mixture components
    n_inits=3,        # 3 initializations
    hidden_sizes=[16, 16] # Custom hidden layer sizes
)

# You would typically optimize this model further using optimize_model:
# from sgptools.utils.gpflow import optimize_model
# optimize_model(model_ns_kernel)
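Since the returned object is a standard GPflow `SGPR` model, it can be trained further and queried with the usual GPflow API. A hedged sketch continuing the example above (`optimize_model` is the helper referenced in the comment; `X_test_data` is illustrative):

```python
import numpy as np
from sgptools.utils.gpflow import optimize_model  # helper referenced in the usage comment above

# Train the kernel MLPs, inducing points, and likelihood of the model built above.
optimize_model(model_ns_kernel)

# Predict at new locations with the standard GPflow posterior API.
X_test_data = np.random.rand(25, 2).astype(np.float32)  # illustrative test inputs
mean, var = model_ns_kernel.predict_f(X_test_data)
```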
Source code in sgptools/kernels/neural_kernel.py
def init_neural_kernel(X_train: np.ndarray,
                       Y_train: np.ndarray,
                       inducing_variable: np.ndarray,
                       Q: int,
                       n_inits: int = 1,
                       hidden_sizes: Optional[List[int]] = None) -> SGPR:
    """
    Helper function to initialize a Sparse Gaussian Process Regression (SGPR) model
    with a Neural Spectral Kernel. This function can perform multiple random
    initializations and return the model with the best initial Evidence Lower Bound (ELBO).

    Refer to the original paper for more details:
        - Neural Non-Stationary Spectral Kernel [Remes et al., 2018]

    Args:
        X_train (np.ndarray): (n, d); Input training set points.
        Y_train (np.ndarray): (n, 1); Training set labels.
        inducing_variable (np.ndarray): (m, d); Initial inducing points. These are passed
                                        directly to the SGPR model.
        Q (int): The number of MLP mixture components for the Neural Spectral Kernel.
        n_inits (int): Number of times to randomly initialize the kernel's MLPs and
                       compute the initial ELBO. The model with the highest ELBO
                       among these initializations is returned. Defaults to 1.
        hidden_sizes (Optional[List[int]]): List of integers specifying the number of hidden
                                            units in each MLP layer. If None, [32, 32] is used.

    Returns:
        SGPR: The SGPR model instance initialized with the Neural Spectral Kernel
              that yielded the best initial ELBO.

    Usage:
        ```python
        import numpy as np
        import gpflow
        from sgptools.kernels.neural_kernel import init_neural_kernel
        from sgptools.utils.misc import get_inducing_pts # For initial inducing points

        # Dummy data
        X_train_data = np.random.rand(100, 2).astype(np.float32)
        Y_train_data = (np.sin(X_train_data[:, 0]) + np.cos(X_train_data[:, 1]))[:, None].astype(np.float32)

        # Initial inducing points (e.g., subset of training data or k-means centers)
        initial_inducing_points = get_inducing_pts(X_train_data, num_inducing=20)

        # Initialize the SGPR model with Neural Spectral Kernel
        # Try 3 random initializations for the MLPs.
        model_ns_kernel = init_neural_kernel(
            X_train=X_train_data,
            Y_train=Y_train_data,
            inducing_variable=initial_inducing_points,
            Q=5,              # 5 mixture components
            n_inits=3,        # 3 initializations
            hidden_sizes=[16, 16] # Custom hidden layer sizes
        )

        # You would typically optimize this model further using optimize_model:
        # from sgptools.utils.gpflow import optimize_model
        # optimize_model(model_ns_kernel)
        ```
    """
    # Convert NumPy arrays to TensorFlow tensors
    X_train_tf, Y_train_tf = data_input_to_tensor((X_train, Y_train))

    best_loglik = -np.inf  # Track the best ELBO found
    best_m: Optional[SGPR] = None  # Store the best model

    N, input_dim = X_train_tf.shape  # Get number of data points and input dimensionality

    for k_init_idx in range(n_inits):
        # Create a new NeuralSpectralKernel instance for each initialization
        current_kernel = NeuralSpectralKernel(input_dim=input_dim,
                                              Q=Q,
                                              hidden_sizes=hidden_sizes)

        # Create an SGPR model with the current kernel initialization
        model = SGPR(data=(X_train_tf, Y_train_tf),
                     inducing_variable=inducing_variable,
                     kernel=current_kernel)

        # Compute the initial ELBO (Evidence Lower Bound)
        loglik = model.elbo()

        # Check if the current initialization is better than previous ones
        if loglik > best_loglik:
            best_loglik = loglik
            # Deepcopy the model to save its state, as it will be deleted/overwritten in next iteration
            # This requires gpflow.utilities.traversal.deepcopy or similar for GPflow models
            # For simplicity, we directly assign here, assuming shallow copy is sufficient
            # or that the user will optimize it later. For robust best model saving, a deepcopy is safer.
            best_m = model

        # Explicitly delete the model and run garbage collection to free memory
        # (important if n_inits is large and models are complex)
        del model
        gc.collect()

    return best_m