SchurMI

sgptools.objectives.SchurMI

Bases: SLogMI

Computes Mutual Information (MI) using the Schur complement for improved numerical stability and computational efficiency.

This method leverages the properties of block matrix determinants to reformulate the MI calculation. Up to the constant factor of 1/2 in the Gaussian mutual information, which this objective omits, the standard MI formula is: \(MI = \log|K_{XX}| + \log|K_{oo}| - \log|K_{combined}|\) where \(K_{XX} = K(X, X)\), \(K_{oo} = K(X_{objective}, X_{objective})\), and \(K_{combined}\) is the kernel matrix evaluated on the union of \(X\) and \(X_{objective}\).

Using the Schur complement identity for determinants, \(\log|K_{combined}| = \log|K_{oo}| + \log|S|\) with \(S = K_{XX} - K_{Xo} K_{oo}^{-1} K_{oX}\), the \(\log|K_{oo}|\) terms cancel and the MI calculation simplifies to: \(MI = \log|K_{XX}| - \log|S|\). Here \(K_{Xo} = K(X, X_{objective})\), \(K_{oX} = K_{Xo}^{\top}\), and \(S\) is the Schur complement of \(K_{oo}\) in \(K_{combined}\).
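The determinant identity above can be checked numerically. The following is a minimal NumPy sketch, not code from sgptools: the `rbf` helper, the random point sets, the lengthscale, and the jitter value are all assumptions chosen to keep a toy example well conditioned.

```python
import numpy as np

def rbf(A, B, lengthscale=0.3):
    """Squared-exponential kernel matrix between rows of A and B (illustrative helper)."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)

def logdet(A):
    """Log of |det(A)|, taken from the second output of slogdet."""
    return np.linalg.slogdet(A)[1]

rng = np.random.default_rng(0)
X = rng.random((5, 2))        # sensing locations, shape (M, D)
X_obj = rng.random((20, 2))   # objective points, shape (N, D)
jitter = 1e-3                 # assumed value for this toy example

K_XX = rbf(X, X) + jitter * np.eye(len(X))
K_oo = rbf(X_obj, X_obj) + jitter * np.eye(len(X_obj))
K_Xo = rbf(X, X_obj)

# Kernel matrix over the union of X and X_objective
K_combined = np.block([[K_XX, K_Xo], [K_Xo.T, K_oo]])

# Schur complement of K_oo in K_combined
S = K_XX - K_Xo @ np.linalg.solve(K_oo, K_Xo.T)

mi_standard = logdet(K_XX) + logdet(K_oo) - logdet(K_combined)
mi_schur = logdet(K_XX) - logdet(S)
print(mi_standard, mi_schur)  # the two values should agree to numerical precision
```

The Schur form only needs the determinant of an M x M matrix per evaluation, whereas the standard form needs the determinant of the full (M + N) x (M + N) matrix.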

This approach is particularly efficient when the objective is evaluated multiple times for different sensing locations \(X\) but with a fixed set of \(X_{objective}\) points. Because \(K_{oo}\) depends only on \(X_{objective}\) and the kernel, its inverse is computed once and cached (when cache=True), so repeated calls skip the \(O(N^3)\) inversion. Like SLogMI, this class uses tf.linalg.slogdet and adds jitter for robust computation.
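The caching idea can be sketched in plain NumPy as follows. This is an illustration under stated assumptions, not the sgptools implementation: `rbf` and `make_cached_mi` are hypothetical helpers, and the sketch caches a Cholesky factor of \(K_{oo}\) instead of the explicit inverse that the class stores.

```python
import numpy as np

def rbf(A, B, lengthscale=0.3):
    """Squared-exponential kernel matrix between rows of A and B (illustrative helper)."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)

def make_cached_mi(X_obj, jitter=1e-3):
    """Return an MI function that reuses one Cholesky factor of K_oo across calls."""
    K_oo = rbf(X_obj, X_obj) + jitter * np.eye(len(X_obj))
    L_oo = np.linalg.cholesky(K_oo)  # computed once, O(N^3)

    def mi(X):
        K_XX = rbf(X, X) + jitter * np.eye(len(X))
        K_oX = rbf(X_obj, X)
        # A = L_oo^{-1} K_oX, so A.T @ A equals K_Xo K_oo^{-1} K_oX.
        # np.linalg.solve is a general solver; scipy.linalg.solve_triangular
        # would exploit the triangular structure of L_oo.
        A = np.linalg.solve(L_oo, K_oX)
        S = K_XX - A.T @ A
        return np.linalg.slogdet(K_XX)[1] - np.linalg.slogdet(S)[1]

    return mi

rng = np.random.default_rng(1)
X_obj = rng.random((30, 2))
mi = make_cached_mi(X_obj)

# Score several candidate sensor placements against the same objective points
candidates = [rng.random((4, 2)) for _ in range(3)]
scores = [mi(X) for X in candidates]
best = candidates[int(np.argmax(scores))]
```

Each call to `mi` costs roughly \(O(MN^2)\) for the solve, while the \(O(N^3)\) factorization is paid only when the closure is created.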

Source code in sgptools/objectives.py
class SchurMI(SLogMI):
    """
    Computes Mutual Information (MI) using the Schur complement for improved
    numerical stability and computational efficiency.

    This method leverages the properties of block matrix determinants to reformulate
    the MI calculation. The standard MI formula is:
    $MI = \\log|K_{XX}| + \\log|K_{oo}| - \\log|K_{combined}|$
    where $K_{XX} = K(X, X)$, $K_{oo} = K(X_{objective}, X_{objective})$, and
    $K_{combined}$ is the kernel of the union of the points.

    Using the Schur complement identity for determinants,
    $\\log|K_{combined}| = \\log|K_{oo}| + \\log|K_{XX} - K_{Xo} K_{oo}^{-1} K_{oX}|$,
    the MI calculation simplifies to:
    $MI = \\log|K_{XX}| - \\log|SchurComplement|$
    where the Schur Complement is $K_{XX} - K_{Xo} K_{oo}^{-1} K_{oX}$.

    This approach is particularly efficient when the objective is evaluated
    multiple times for different sensing locations $X$ but with a fixed set of
    $X_{objective}$ points. By caching the inverse of $K_{oo}$, we avoid costly
    recomputations. Like `SLogMI`, this class uses `tf.linalg.slogdet` and
    adds jitter for robust computation.
    """
    def __init__(self,
                 X_objective: np.ndarray,
                 kernel: gpflow.kernels.Kernel,
                 noise_variance: float,
                 jitter: float = 1e-6,
                 cache: bool = True,
                 **kwargs: Any):
        """
        Initializes the SchurMI objective.

        Args:
            X_objective (np.ndarray): The fixed set of data points against which
                                      MI is computed. Shape: (N, D).
            kernel (gpflow.kernels.Kernel): The GPflow kernel to compute covariances.
            noise_variance (float): The observed data noise variance.
            jitter (float): A small value added to the diagonal of covariance
                            matrices for numerical stability. Defaults to 1e-6.
            cache (bool): If `True`, the inverse of $K(X_{objective}, X_{objective})$
                          is pre-computed and cached to accelerate subsequent MI
                          calculations. Defaults to True.
            **kwargs: Arbitrary keyword arguments.
        """
        super().__init__(X_objective, kernel, noise_variance, jitter, cache=False, **kwargs)
        self.cache = cache
        if self.cache:
            # K(X_objective, X_objective)
            self.K_obj_obj = self.kernel(self.X_objective)
            # Compute the inverse
            self.inv_K_obj_obj = tf.linalg.inv(self.jitter_fn(self.K_obj_obj))

    def __call__(self, X: tf.Tensor) -> tf.Tensor:
        """
        Computes the Mutual Information for the input points `X` using the
        Schur complement method.

        Args:
            X (tf.Tensor): The input points (e.g., sensing locations) for which
                           MI is to be computed. Shape: (M, D).

        Returns:
            tf.Tensor: The computed Mutual Information value.

        Usage:
            ```python
            import gpflow
            import numpy as np
            import tensorflow as tf
            from sgptools.objectives import SchurMI

            # Example inputs: 100 objective points in 2D, an SE kernel, and a noise variance
            X_objective = np.random.rand(100, 2)
            kernel = gpflow.kernels.SquaredExponential()
            noise_variance = 0.1

            schur_mi_objective = SchurMI(
                X_objective=X_objective,
                kernel=kernel,
                noise_variance=noise_variance
            )
            # 10 candidate sensing locations, float64 to match GPflow's default dtype
            X_sensing = tf.constant(np.random.rand(10, 2), dtype=tf.float64)
            mi_value = schur_mi_objective(X_sensing)
            ```
        """
        if self.cache:
            inv_K_obj_obj = self.inv_K_obj_obj
        else:
            # K(X_objective, X_objective)
            K_obj_obj = self.kernel(self.X_objective)
            # Compute the inverse
            inv_K_obj_obj = tf.linalg.inv(self.jitter_fn(K_obj_obj))

        K_X_X = self.kernel(X)
        _, logdet_K_X_X = tf.linalg.slogdet(self.jitter_fn(K_X_X))
        K_X_obj = self.kernel(X, self.X_objective)
        transpose_K_X_obj = tf.transpose(K_X_obj)
        schur = K_X_X - K_X_obj @ inv_K_obj_obj @ transpose_K_X_obj
        _, schur_det = tf.linalg.slogdet(self.jitter_fn(schur))
        mi = logdet_K_X_X - schur_det
        return mi

    def update(self, kernel: gpflow.kernels.Kernel,
               noise_variance: float) -> None:
        """
        Updates the kernel and noise variance for the MI objective.
        This method is crucial for optimizing the GP hyperparameters externally
        and having the objective function reflect those changes.

        Args:
            kernel (gpflow.kernels.Kernel): The updated GPflow kernel function.
            noise_variance (float): The updated data noise variance.
        """
        super().update(kernel, noise_variance)
        if self.cache:
            # K(X_objective, X_objective)
            self.K_obj_obj = self.kernel(self.X_objective)
            # Compute the inverse
            self.inv_K_obj_obj = tf.linalg.inv(self.jitter_fn(self.K_obj_obj))

__call__(X)

Computes the Mutual Information for the input points X using the Schur complement method.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X | Tensor | The input points (e.g., sensing locations) for which MI is to be computed. Shape: (M, D). | required |

Returns:

| Type | Description |
|------|-------------|
| Tensor | The computed Mutual Information value. |

Usage
import gpflow
import numpy as np
import tensorflow as tf
from sgptools.objectives import SchurMI

# Example inputs: 100 objective points in 2D, an SE kernel, and a noise variance
X_objective = np.random.rand(100, 2)
kernel = gpflow.kernels.SquaredExponential()
noise_variance = 0.1

schur_mi_objective = SchurMI(
    X_objective=X_objective,
    kernel=kernel,
    noise_variance=noise_variance
)
# 10 candidate sensing locations, float64 to match GPflow's default dtype
X_sensing = tf.constant(np.random.rand(10, 2), dtype=tf.float64)
mi_value = schur_mi_objective(X_sensing)
Source code in sgptools/objectives.py
def __call__(self, X: tf.Tensor) -> tf.Tensor:
    """
    Computes the Mutual Information for the input points `X` using the
    Schur complement method.

    Args:
        X (tf.Tensor): The input points (e.g., sensing locations) for which
                       MI is to be computed. Shape: (M, D).

    Returns:
        tf.Tensor: The computed Mutual Information value.

    Usage:
        ```python
        import gpflow
        import numpy as np
        import tensorflow as tf
        from sgptools.objectives import SchurMI

        # Example inputs: 100 objective points in 2D, an SE kernel, and a noise variance
        X_objective = np.random.rand(100, 2)
        kernel = gpflow.kernels.SquaredExponential()
        noise_variance = 0.1

        schur_mi_objective = SchurMI(
            X_objective=X_objective,
            kernel=kernel,
            noise_variance=noise_variance
        )
        # 10 candidate sensing locations, float64 to match GPflow's default dtype
        X_sensing = tf.constant(np.random.rand(10, 2), dtype=tf.float64)
        mi_value = schur_mi_objective(X_sensing)
        ```
    """
    if self.cache:
        inv_K_obj_obj = self.inv_K_obj_obj
    else:
        # K(X_objective, X_objective)
        K_obj_obj = self.kernel(self.X_objective)
        # Compute the inverse
        inv_K_obj_obj = tf.linalg.inv(self.jitter_fn(K_obj_obj))

    K_X_X = self.kernel(X)
    _, logdet_K_X_X = tf.linalg.slogdet(self.jitter_fn(K_X_X))
    K_X_obj = self.kernel(X, self.X_objective)
    transpose_K_X_obj = tf.transpose(K_X_obj)
    schur = K_X_X - K_X_obj @ inv_K_obj_obj @ transpose_K_X_obj
    _, schur_det = tf.linalg.slogdet(self.jitter_fn(schur))
    mi = logdet_K_X_X - schur_det
    return mi

__init__(X_objective, kernel, noise_variance, jitter=1e-06, cache=True, **kwargs)

Initializes the SchurMI objective.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| X_objective | ndarray | The fixed set of data points against which MI is computed. Shape: (N, D). | required |
| kernel | Kernel | The GPflow kernel to compute covariances. | required |
| noise_variance | float | The observed data noise variance. | required |
| jitter | float | A small value added to the diagonal of covariance matrices for numerical stability. Defaults to 1e-6. | 1e-06 |
| cache | bool | If True, the inverse of \(K(X_{objective}, X_{objective})\) is pre-computed and cached to accelerate subsequent MI calculations. Defaults to True. | True |
| `**kwargs` | Any | Arbitrary keyword arguments. | {} |
Source code in sgptools/objectives.py
def __init__(self,
             X_objective: np.ndarray,
             kernel: gpflow.kernels.Kernel,
             noise_variance: float,
             jitter: float = 1e-6,
             cache: bool = True,
             **kwargs: Any):
    """
    Initializes the SchurMI objective.

    Args:
        X_objective (np.ndarray): The fixed set of data points against which
                                  MI is computed. Shape: (N, D).
        kernel (gpflow.kernels.Kernel): The GPflow kernel to compute covariances.
        noise_variance (float): The observed data noise variance.
        jitter (float): A small value added to the diagonal of covariance
                        matrices for numerical stability. Defaults to 1e-6.
        cache (bool): If `True`, the inverse of $K(X_{objective}, X_{objective})$
                      is pre-computed and cached to accelerate subsequent MI
                      calculations. Defaults to True.
        **kwargs: Arbitrary keyword arguments.
    """
    super().__init__(X_objective, kernel, noise_variance, jitter, cache=False, **kwargs)
    self.cache = cache
    if self.cache:
        # K(X_objective, X_objective)
        self.K_obj_obj = self.kernel(self.X_objective)
        # Compute the inverse
        self.inv_K_obj_obj = tf.linalg.inv(self.jitter_fn(self.K_obj_obj))

update(kernel, noise_variance)

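Updates the kernel and noise variance for the MI objective. Call this method after optimizing the GP hyperparameters externally so that the objective reflects the new values. When caching is enabled, the cached inverse of \(K(X_{objective}, X_{objective})\) is recomputed with the updated kernel.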

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| kernel | Kernel | The updated GPflow kernel function. | required |
| noise_variance | float | The updated data noise variance. | required |
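Why the cache has to be rebuilt on `update` can be seen in a small NumPy sketch. `ToySchurMI` below is a hypothetical stand-in written for illustration, not the sgptools class: it uses a squared-exponential kernel with a single lengthscale as its only hyperparameter and an assumed jitter of 1e-3.

```python
import numpy as np

class ToySchurMI:
    """Toy objective that caches the inverse of K_oo for a fixed X_obj."""

    def __init__(self, X_obj, lengthscale, jitter=1e-3):
        self.X_obj = X_obj
        self.jitter = jitter
        self.update(lengthscale)

    def kernel(self, A, B):
        sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-0.5 * sq_dists / self.lengthscale**2)

    def update(self, lengthscale):
        # New hyperparameters change K_oo, so the cached inverse is rebuilt here
        self.lengthscale = lengthscale
        K_oo = self.kernel(self.X_obj, self.X_obj)
        self.inv_K_oo = np.linalg.inv(K_oo + self.jitter * np.eye(len(self.X_obj)))

    def __call__(self, X):
        K_XX = self.kernel(X, X) + self.jitter * np.eye(len(X))
        K_Xo = self.kernel(X, self.X_obj)
        S = K_XX - K_Xo @ self.inv_K_oo @ K_Xo.T
        return np.linalg.slogdet(K_XX)[1] - np.linalg.slogdet(S)[1]

rng = np.random.default_rng(2)
X_obj, X = rng.random((25, 2)), rng.random((4, 2))

objective = ToySchurMI(X_obj, lengthscale=0.3)
mi_before = objective(X)

objective.update(lengthscale=0.6)  # e.g. after re-fitting the GP hyperparameters
mi_after = objective(X)

# A freshly constructed objective with the new lengthscale gives the same value
mi_fresh = ToySchurMI(X_obj, lengthscale=0.6)(X)
```

If `update` changed the kernel without refreshing `inv_K_oo`, the Schur complement would mix the new cross-covariances with a stale inverse and the result would no longer match `mi_fresh`.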
Source code in sgptools/objectives.py
def update(self, kernel: gpflow.kernels.Kernel,
           noise_variance: float) -> None:
    """
    Updates the kernel and noise variance for the MI objective.
    This method is crucial for optimizing the GP hyperparameters externally
    and having the objective function reflect those changes.

    Args:
        kernel (gpflow.kernels.Kernel): The updated GPflow kernel function.
        noise_variance (float): The updated data noise variance.
    """
    super().update(kernel, noise_variance)
    if self.cache:
        # K(X_objective, X_objective)
        self.K_obj_obj = self.kernel(self.X_objective)
        # Compute the inverse
        self.inv_K_obj_obj = tf.linalg.inv(self.jitter_fn(self.K_obj_obj))