Hyper Models

The mixturelib.hyper_models module contains the following classes:

mixturelib.hyper_models.HyperModel
mixturelib.hyper_models.HyperModelDirichlet
mixturelib.hyper_models.HyperExpertNN
mixturelib.hyper_models.HyperModelGateSparsed

class mixturelib.hyper_models.HyperExpertNN(input_dim=20, hidden_dim=10, output_dim=10, epochs=100, device='cpu')

A hyper model for a mixture of experts. The probabilities that this hyper model assigns to the local models depend on the object.

In this hyper model, the probability of each local model is the softmax of a neural network's output. The neural network is a three-layer fully connected network.

Parameters:	input_dim (int) – The number of features.
	hidden_dim (int) – The number of neurons in the hidden layer.
	output_dim (int) – The number of local models.
	epochs (int) – The number of epochs for which the neural network is trained in each M-step.
	device – The device for PyTorch. Can be ‘cpu’ or ‘gpu’. Default ‘cpu’.
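
A minimal pure-Python sketch of the forward pass described above. It assumes that "three-layer" means input, hidden, and output layers (two weight matrices) and assumes a ReLU activation in the hidden layer; the actual activation, initialization, and PyTorch modules used by HyperExpertNN are not stated here. The helper names (`linear`, `gate_forward`, and so on) are hypothetical.

```python
import math
import random


def linear(x, weights, bias):
    """Affine map W x + b for a single input vector x."""
    return [sum(w_j * x_j for w_j, x_j in zip(row, x)) + b_k
            for row, b_k in zip(weights, bias)]


def relu(v):
    """Hidden-layer activation (an assumption, not taken from mixturelib)."""
    return [max(0.0, a) for a in v]


def softmax(v):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    top = max(v)
    exps = [math.exp(a - top) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Gaussian weights scaled by 1/sqrt(n_in) and a zero bias."""
    weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def gate_forward(x, layers):
    """input_dim -> hidden_dim -> output_dim, one score per local model."""
    (w1, b1), (w2, b2) = layers
    return linear(relu(linear(x, w1, b1)), w2, b2)


rng = random.Random(0)
input_dim, hidden_dim, output_dim = 2, 10, 2
layers = [init_layer(input_dim, hidden_dim, rng),
          init_layer(hidden_dim, output_dim, rng)]

X = [[0.3, -1.2], [1.5, 0.4], [-0.7, 0.9]]
# One probability vector per object, so the weighting depends on x.
pi = [softmax(gate_forward(x, layers)) for x in X]
```

Each row of `pi` is a separate probability vector, which is what distinguishes this gate from an object-independent weighting.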

Example:

>>> _ = torch.random.manual_seed(42) # Set random seed for repeatability
>>>
>>> w = torch.randn(2, 1) # Generate real parameter vector
>>> X = torch.randn(5, 2) # Generate features data
>>> Z = torch.distributions.dirichlet.Dirichlet(
...     torch.tensor([0.5, 0.5])).sample(
...         (5,)) # Set correspondence between data and local models.
>>> Y = X@w + 0.1*torch.randn(5, 1) # Generate target data with noise 0.1
>>>
>>> hyper_model = HyperExpertNN(
...     input_dim=2,
...     output_dim=2) # Init hyper model with neural network weighting
>>> hyper_parameters = {} # Without hyper parameters
>>>
>>> hyper_model.LogPiExpectation(
...     X, Y, hyper_parameters) # Log of probability before E step
tensor([[-0.4981, -0.9356],
        [-0.5176, -0.9063],
        [-0.4925, -0.9443],
        [-0.4957, -0.9395],
        [-0.4969, -0.9376]])
>>> 
>>> hyper_model.M_step(X, Y, Z, hyper_parameters)
>>> hyper_model.LogPiExpectation(
...     X, Y, hyper_parameters)  # Log of probability after M step
tensor([[-0.6294, -0.7612],
        [-0.9327, -0.5000],
        [-0.3273, -1.2760],
        [-0.5775, -0.8239],
        [-0.5357, -0.8801]])

E_step(X, Y, Z, HyperParameters)

This method does nothing; the neural network is trained in M_step.

Parameters:	X (FloatTensor) – The tensor of shape num_elements \(\times\) num_feature.
	Y (FloatTensor) – The tensor of shape num_elements \(\times\) num_answers.
	Z (FloatTensor) – The tensor of shape num_elements \(\times\) num_models.
	HyperParameters (dict) – The dictionary of all hyperparameters, whose keys are strings and whose values are FloatTensors.

LogPiExpectation(X, Y, HyperParameters)

Returns the expected value of the log probability of each model.

Computed as the log-softmax of the output of the forward method.

Parameters:	X (FloatTensor) – The tensor of shape num_elements \(\times\) num_feature.
	Y (FloatTensor) – The tensor of shape num_elements \(\times\) num_answers.
	HyperParameters (dict) – The dictionary of all hyperparameters, whose keys are strings and whose values are FloatTensors.
Returns:	The tensor of shape num_elements \(\times\) num_models. The expected value of the log probability of each model.
Return type:	FloatTensor
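
To illustrate how the log-softmax relates to the softmax used by PredictPi, here is a small sketch of a numerically stable log-softmax (the log-sum-exp trick) in plain Python. The logits are made-up numbers standing in for the output of forward, and `log_softmax` here is a hypothetical helper, not mixturelib code.

```python
import math


def log_softmax(logits):
    """log pi_k = z_k - log(sum_j exp(z_j)), using the max-shift trick."""
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(z - top) for z in logits))
    return [z - log_norm for z in logits]


logits = [0.4, -0.2]                # hypothetical forward output for one object
log_pi = log_softmax(logits)        # one row of LogPiExpectation
pi = [math.exp(v) for v in log_pi]  # the softmax, i.e. one row of PredictPi

# Large logits do not overflow thanks to the shift by the maximum.
stable = log_softmax([1000.0, 1000.0])
```

Subtracting the maximum leaves the result unchanged mathematically but keeps `math.exp` from overflowing for large scores.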

M_step(X, Y, Z, HyperParameters)

Performs the M-step of the EM algorithm. Finds the neural network parameters by gradient descent.

Parameters are optimized with respect to the loss function \(loss = -\sum_{i=1}^{num\_elements}\sum_{k=1}^{num\_models} Z_{ik}\log\pi_k(x_i, V)\), where \(V\) denotes the neural network parameters.

Parameters:	X (FloatTensor) – The tensor of shape num_elements \(\times\) num_feature.
	Y (FloatTensor) – The tensor of shape num_elements \(\times\) num_answers.
	Z (FloatTensor) – The tensor of shape num_elements \(\times\) num_models.
	HyperParameters (dict) – The dictionary of all hyperparameters, whose keys are strings and whose values are FloatTensors.
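
The sketch below shows one way such an M-step could be carried out. It assumes the loss weights each log-probability by the responsibility \(Z_{ik}\), that is \(-\sum_i\sum_k Z_{ik}\log\pi_k(x_i, V)\). For brevity it replaces the hidden layer with a single linear softmax gate and uses plain full-batch gradient descent with a hypothetical fixed learning rate `lr`; the optimizer mixturelib actually uses is not stated in this documentation. All names are hypothetical.

```python
import math


def log_softmax(z):
    top = max(z)
    log_norm = top + math.log(sum(math.exp(a - top) for a in z))
    return [a - log_norm for a in z]


def gate_logits(x, W, b):
    """Linear gate: one score per local model (hidden layer omitted)."""
    return [sum(w * x_j for w, x_j in zip(row, x)) + b_k
            for row, b_k in zip(W, b)]


def m_step_loss(X, Z, W, b):
    """loss = -sum_i sum_k Z_ik * log pi_k(x_i)."""
    return -sum(z_k * lp
                for x, z in zip(X, Z)
                for z_k, lp in zip(z, log_softmax(gate_logits(x, W, b))))


def m_step(X, Z, W, b, epochs=100, lr=0.1):
    """Full-batch gradient descent on the weighted loss for `epochs` steps."""
    K, D = len(W), len(X[0])
    for _ in range(epochs):
        grad_W = [[0.0] * D for _ in range(K)]
        grad_b = [0.0] * K
        for x, z in zip(X, Z):
            p = [math.exp(v) for v in log_softmax(gate_logits(x, W, b))]
            s = sum(z)
            for k in range(K):
                d = p[k] * s - z[k]  # d loss / d logit_k for this object
                grad_b[k] += d
                for j in range(D):
                    grad_W[k][j] += d * x[j]
        W = [[w - lr * g for w, g in zip(row, g_row)]
             for row, g_row in zip(W, grad_W)]
        b = [b_k - lr * g for b_k, g in zip(b, grad_b)]
    return W, b


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]
Z = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.5, 0.5]]
W0, b0 = [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0]

loss_before = m_step_loss(X, Z, W0, b0)  # 4 * log(2) with uniform weights
W1, b1 = m_step(X, Z, W0, b0)
loss_after = m_step_loss(X, Z, W1, b1)
```

The gradient with respect to each logit is \(\pi_k \sum_j Z_{ij} - Z_{ik}\), so objects are pushed towards the local models that the responsibilities favour.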

PredictPi(X, HyperParameters)

Returns the probability (weight) of each model.

Computed as the softmax of the output of the forward method.

Parameters:	X (FloatTensor) – The tensor of shape num_elements \(\times\) num_feature.
	HyperParameters (dict) – The dictionary of all hyperparameters, whose keys are strings and whose values are FloatTensors.
Returns:	The tensor of shape num_elements \(\times\) num_models. The probability (weight) of each model.
Return type:	FloatTensor

forward(input)

Returns the model prediction for the given input data.

Warning

The number num_answers can only be 1.

Parameters:	input (FloatTensor) – The tensor of shape num_elements \(\times\) num_feature.
Returns:	The tensor of shape num_elements \(\times\) num_models. The network output (pre-softmax scores) for every local model for the given input data.
Return type:	FloatTensor

class mixturelib.hyper_models.HyperModel

Base class for all hyper models.
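
The interface every hyper model implements can be summarized as an abstract base class. The sketch below is a hypothetical pure-Python mirror of the four documented methods, not mixturelib's own code; `HyperModelSketch`, `UniformHyperModel`, and `run_em` are illustrative names, and the toy subclass uses plain lists instead of FloatTensors.

```python
import math
from abc import ABC, abstractmethod


class HyperModelSketch(ABC):
    """Hypothetical stand-in for the HyperModel interface."""

    @abstractmethod
    def E_step(self, X, Y, Z, HyperParameters):
        """Update the variational distribution q."""

    @abstractmethod
    def M_step(self, X, Y, Z, HyperParameters):
        """Update the hyper model (hyper)parameters."""

    @abstractmethod
    def LogPiExpectation(self, X, Y, HyperParameters):
        """Return E[log pi], shape num_elements x num_models."""

    @abstractmethod
    def PredictPi(self, X, HyperParameters):
        """Return pi, shape num_elements x num_models."""


class UniformHyperModel(HyperModelSketch):
    """Toy subclass: every object puts equal weight on every local model."""

    def __init__(self, output_dim=2):
        self.output_dim = output_dim

    def E_step(self, X, Y, Z, HyperParameters):
        pass  # nothing to learn

    def M_step(self, X, Y, Z, HyperParameters):
        pass  # nothing to learn

    def LogPiExpectation(self, X, Y, HyperParameters):
        return [[-math.log(self.output_dim)] * self.output_dim for _ in X]

    def PredictPi(self, X, HyperParameters):
        return [[1.0 / self.output_dim] * self.output_dim for _ in X]


def run_em(hyper_model, X, Y, Z, HyperParameters, n_iter=3):
    """Alternate E- and M-steps, then return the mixture weights.

    In a full mixture, Z would be recomputed between iterations (not shown).
    """
    for _ in range(n_iter):
        hyper_model.E_step(X, Y, Z, HyperParameters)
        hyper_model.M_step(X, Y, Z, HyperParameters)
    return hyper_model.PredictPi(X, HyperParameters)


weights = run_em(UniformHyperModel(output_dim=4),
                 X=[[0.0], [1.0]], Y=[[0.0], [1.0]],
                 Z=[[0.25] * 4, [0.25] * 4], HyperParameters={})
```

Declaring the methods abstract means a subclass that forgets one of them cannot be instantiated.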

E_step(X, Y, Z, HyperParameters)

Performs the E-step of the EM algorithm. Finds the variational distribution q of the model parameters.

Parameters:	X (FloatTensor) – The tensor of shape num_elements \(\times\) num_feature.
	Y (FloatTensor) – The tensor of shape num_elements \(\times\) num_answers.
	Z (FloatTensor) – The tensor of shape num_elements \(\times\) num_models.
	HyperParameters (dict) – The dictionary of all hyperparameters, whose keys are strings and whose values are FloatTensors.

LogPiExpectation(X, Y, HyperParameters)

Returns the expected value of the log probability of each model.

Parameters:	X (FloatTensor) – The tensor of shape num_elements \(\times\) num_feature.
	Y (FloatTensor) – The tensor of shape num_elements \(\times\) num_answers.
	HyperParameters (dict) – The dictionary of all hyperparameters, whose keys are strings and whose values are FloatTensors.

M_step(X, Y, Z, HyperParameters)

Performs the M-step of the EM algorithm. Finds the model hyperparameters.

Parameters:	X (FloatTensor) – The tensor of shape num_elements \(\times\) num_feature.
	Y (FloatTensor) – The tensor of shape num_elements \(\times\) num_answers.
	Z (FloatTensor) – The tensor of shape num_elements \(\times\) num_models.
	HyperParameters (dict) – The dictionary of all hyperparameters, whose keys are strings and whose values are FloatTensors.

PredictPi(X, HyperParameters)

Returns the probability of each model.

Parameters:	X (FloatTensor) – The tensor of shape num_elements \(\times\) num_feature.
	HyperParameters (dict) – The dictionary of all hyperparameters, whose keys are strings and whose values are FloatTensors.

class mixturelib.hyper_models.HyperModelDirichlet(output_dim=2, device='cpu')

A hyper model for a mixture of models. The model probabilities do not depend on the object, so this hyper model cannot predict a local model for each individual object.

In this hyper model, the vector of local model probabilities is drawn from a Dirichlet distribution with parameter \(\mu\).

Parameters:	output_dim (int) – The number of local models.
	device – The device for PyTorch. Can be ‘cpu’ or ‘gpu’. Default ‘cpu’.
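
To make the generative assumption concrete, here is a sketch that draws one probability vector \(\pi \sim Dir(\mu)\) by normalizing independent Gamma draws (a standard construction) and reuses it for every object, which is why this hyper model cannot tell objects apart. It uses only the Python standard library; `sample_dirichlet` is a hypothetical helper, not part of mixturelib.

```python
import random


def sample_dirichlet(alpha, rng):
    """pi ~ Dir(alpha): normalize independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a_k, 1.0) for a_k in alpha]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(42)
mu = [1.0, 1.0]                   # prior parameter, one entry per local model
pi = sample_dirichlet(mu, rng)    # a single draw...
weights = [pi for _ in range(5)]  # ...reused for every one of 5 objects

# Sanity check: the mean of many draws approaches mu_k / sum(mu).
draws = [sample_dirichlet(mu, rng) for _ in range(5000)]
mean_first = sum(d[0] for d in draws) / len(draws)
```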

Example:

>>> _ = torch.random.manual_seed(42) # Set random seed for repeatability
>>>
>>> w = torch.randn(2, 1) # Generate real parameter vector
>>> X = torch.randn(5, 2) # Generate features data
>>> Z = torch.distributions.dirichlet.Dirichlet(
...     torch.tensor([0.5, 0.5])).sample(
...         (5,)) # Set correspondence between data and local models.
>>> Y = X@w + 0.1*torch.randn(5, 1) # Generate target data with noise 0.1
>>>
>>> hyper_model = HyperModelDirichlet(
...     output_dim=2) # Init hyper model with Dirichlet weighting
>>> hyper_parameters = {} # Without hyper parameters
>>>
>>> hyper_model.LogPiExpectation(
...     X, Y, hyper_parameters) # Log of probability before E step
tensor([[-1.0000, -1.0000],
        [-1.0000, -1.0000],
        [-1.0000, -1.0000],
        [-1.0000, -1.0000],
        [-1.0000, -1.0000]])
>>> 
>>> hyper_model.E_step(X, Y, Z, hyper_parameters)
>>> hyper_model.LogPiExpectation(
...     X, Y, hyper_parameters)  # Log of probability after E step
tensor([[-0.7118, -0.8310],
        [-0.7118, -0.8310],
        [-0.7118, -0.8310],
        [-0.7118, -0.8310],
        [-0.7118, -0.8310]])

E_step(X, Y, Z, HyperParameters)

Performs the E-step of the EM algorithm. Finds the variational distribution q of the model parameters.

Calculates the analytical solution for q in the class of Dirichlet distributions \(q = Dir(m)\), where \(m = \mu + \gamma\), \(\gamma_k = \sum_{i=1}^{num\_elements}Z_{ik}\), and \(\mu\) is the prior.

Warning

Currently \(\mu_k = 1\) for all k, and it cannot be changed.

Parameters:	X (FloatTensor) – The tensor of shape num_elements \(\times\) num_feature.
	Y (FloatTensor) – The tensor of shape num_elements \(\times\) num_answers.
	Z (FloatTensor) – The tensor of shape num_elements \(\times\) num_models.
	HyperParameters (dict) – The dictionary of all hyperparameters, whose keys are strings and whose values are FloatTensors.
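
A minimal sketch of this update in plain Python, with \(\mu_k = 1\) as stated; `dirichlet_e_step` is a hypothetical name, and lists stand in for FloatTensors. When each row of Z sums to 1, the entries of m sum to \(\sum_k \mu_k\) plus the number of objects.

```python
def dirichlet_e_step(Z, mu):
    """Return m with m_k = mu_k + gamma_k, where gamma_k = sum_i Z_ik."""
    gamma = [sum(row[k] for row in Z) for k in range(len(mu))]
    return [mu_k + gamma_k for mu_k, gamma_k in zip(mu, gamma)]


Z = [[0.9, 0.1],
     [0.3, 0.7],
     [0.5, 0.5],
     [1.0, 0.0],
     [0.2, 0.8]]  # soft assignments of 5 objects to 2 local models
m = dirichlet_e_step(Z, mu=[1.0, 1.0])  # approximately [3.9, 3.1]
```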

LogPiExpectation(X, Y, HyperParameters)

Returns the expected value of the log probability of each model.

Returns the expectation of \(\log \pi\), where \(\pi\) is a random vector drawn from the Dirichlet distribution \(Dir(m)\).

It is computed with the digamma function \(\psi\): \(\mathbb{E}[\log \pi_k] = \psi(m_k) - \psi\left(\sum_{j} m_j\right)\).

Parameters:	X (FloatTensor) – The tensor of shape num_elements \(\times\) num_feature.
	Y (FloatTensor) – The tensor of shape num_elements \(\times\) num_answers.
	HyperParameters (dict) – The dictionary of all hyperparameters, whose keys are strings and whose values are FloatTensors.
Returns:	The tensor of shape num_elements \(\times\) num_models. The expected value of the log probability of each model.
Return type:	FloatTensor
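
The expectation can be checked numerically. Python's standard library has no digamma function, so the sketch below implements one with the recurrence \(\psi(x) = \psi(x+1) - 1/x\) followed by the asymptotic series, then evaluates \(\psi(m_k) - \psi(\sum_j m_j)\) for every object. With \(m = (1, 1)\), every entry equals \(\psi(1) - \psi(2) = -1\), which matches the output before the E step in the example above. The function names are hypothetical.

```python
import math


def digamma(x):
    """psi(x) for x > 0: recurrence up to x >= 10, then an asymptotic series."""
    result = 0.0
    while x < 10.0:
        result -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += (math.log(x) - 0.5 / x
               - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))
    return result


def log_pi_expectation(m, num_elements):
    """E[log pi_k] = psi(m_k) - psi(sum_j m_j); the same row for every object."""
    psi_total = digamma(sum(m))
    row = [digamma(m_k) - psi_total for m_k in m]
    return [list(row) for _ in range(num_elements)]


before = log_pi_expectation([1.0, 1.0], num_elements=5)  # every entry is -1
```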

M_step(X, Y, Z, HyperParameters)

The method does nothing.

Parameters:	X (FloatTensor) – The tensor of shape num_elements \(\times\) num_feature.
	Y (FloatTensor) – The tensor of shape num_elements \(\times\) num_answers.
	Z (FloatTensor) – The tensor of shape num_elements \(\times\) num_models.
	HyperParameters (dict) – The dictionary of all hyperparameters, whose keys are strings and whose values are FloatTensors.

PredictPi(X, HyperParameters)

Returns the probability (weight) of each model.

Returns the same vector \(\pi\) for every object: \(\pi = \frac{\textbf{m}}{\sum_k \textbf{m}_k}\), where \(\textbf{m}\) is the parameter of the Dirichlet distribution.

Parameters:	X (FloatTensor) – The tensor of shape num_elements \(\times\) num_feature.
	HyperParameters (dict) – The dictionary of all hyperparameters, whose keys are strings and whose values are FloatTensors.
Returns:	The tensor of shape num_elements \(\times\) num_models. The probability (weight) of each model.
Return type:	FloatTensor
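
A short sketch of this formula: the Dirichlet mean \(m_k / \sum_j m_j\) is computed once and repeated for each object. Plain lists stand in for FloatTensors, the parameter values are made up for illustration, and `predict_pi` is a hypothetical name.

```python
def predict_pi(m, num_elements):
    """pi = m / sum_k m_k, identical for every object."""
    total = sum(m)
    row = [m_k / total for m_k in m]
    return [list(row) for _ in range(num_elements)]


pi = predict_pi([3.9, 3.1], num_elements=3)
```

Note that exponentiating the output of LogPiExpectation does not give this mean: by Jensen's inequality, \(\mathbb{E}[\log \pi_k] \le \log \mathbb{E}[\pi_k]\).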

class mixturelib.hyper_models.HyperModelGateSparsed(output_dim=2, gamma=1.0, mu=<sphinx.ext.autodoc.importer._MockObject object>, device='cpu')

A hyper model for a mixture of models. Each \(i\)-th object from the training dataset has its own vector of model probabilities \(\pi^i\).

In this hyper model, the vector of local model probabilities is drawn from a Dirichlet distribution with parameters \(\mu\) and \(l\).

Parameters:	output_dim (int) – The number of local models.
	device – The device for PyTorch. Can be ‘cpu’ or ‘gpu’. Default ‘cpu’.

Example:

>>> _ = torch.random.manual_seed(42) # Set random seed for repeatability
>>>
>>> w = torch.randn(2, 1) # Generate real parameter vector
>>> X = torch.randn(5, 2) # Generate features data
>>> Z = torch.distributions.dirichlet.Dirichlet(
...     torch.tensor([0.5, 0.5])).sample(
...         (5,)) # Set correspondence between data and local models.
>>> Y = X@w + 0.1*torch.randn(5, 1) # Generate target data with noise 0.1
>>>
>>> hyper_model = HyperModelGateSparsed(
...     output_dim=2) # Model with Dirichlet weighting for each sample
>>> hyper_parameters = {} # Without hyper parameters
>>>
>>> hyper_model.LogPiExpectation(
...     X, Y, hyper_parameters) # Log of probability before E step
tensor([[-1.3863, -1.3863],
        [-1.3863, -1.3863],
        [-1.3863, -1.3863],
        [-1.3863, -1.3863],
        [-1.3863, -1.3863]])
>>> 
>>> hyper_model.E_step(X, Y, Z, hyper_parameters)
>>> hyper_model.LogPiExpectation(
...     X, Y, hyper_parameters)  # Log of probability after E step
tensor([[-1.9677, -0.4830],
        [-1.7785, -0.5417],
        [-0.5509, -1.7521],
        [-0.7250, -1.3642],
        [-0.4839, -1.9644]])

E_step(X, Y, Z, HyperParameters)

Performs the E-step of the EM algorithm. Finds the variational distribution q of the model parameters.

Calculates the analytical solution for q in the class of Dirichlet distributions \(q = Dir(m)\), where \(m = \mu + \gamma\), \(\gamma_k = \sum_{i=1}^{num\_elements}Z_{ik}\), and \(\mu\) is the prior.

Warning

Currently \(\mu_k = 1\) for all k, and it cannot be changed.

Parameters:	X (FloatTensor) – The tensor of shape num_elements \(\times\) num_feature.
	Y (FloatTensor) – The tensor of shape num_elements \(\times\) num_answers.
	Z (FloatTensor) – The tensor of shape num_elements \(\times\) num_models.
	HyperParameters (dict) – The dictionary of all hyperparameters, whose keys are strings and whose values are FloatTensors.

LogPiExpectation(X, Y, HyperParameters)

Returns the expected value of the log probability of each model.

Returns the expectation of \(\log \pi\), where \(\pi\) is a random vector drawn from a Dirichlet distribution.

It is computed with the digamma function \(\psi\).

Parameters:	X (FloatTensor) – The tensor of shape num_elements \(\times\) num_feature.
	Y (FloatTensor) – The tensor of shape num_elements \(\times\) num_answers.
	HyperParameters (dict) – The dictionary of all hyperparameters, whose keys are strings and whose values are FloatTensors.
Returns:	The tensor of shape num_elements \(\times\) num_models. The expected value of the log probability of each model.
Return type:	FloatTensor
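
Because each object has its own \(\pi^i\), the expectation is taken row by row. The sketch below assumes, although this documentation does not state it, that object i carries its own Dirichlet parameter vector \(m^i\), so that \(\mathbb{E}[\log \pi^i_k] = \psi(m^i_k) - \psi(\sum_j m^i_j)\). Under that assumption, \(m^i = (0.5, 0.5)\) gives \(\psi(0.5) - \psi(1) = -2\log 2 \approx -1.3863\), the value printed before the E step in the example above. `log_pi_expectation_per_object` and the parameter values are hypothetical.

```python
import math


def digamma(x):
    """psi(x) for x > 0: recurrence up to x >= 10, then an asymptotic series."""
    result = 0.0
    while x < 10.0:
        result -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += (math.log(x) - 0.5 / x
               - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))
    return result


def log_pi_expectation_per_object(M):
    """Row i holds psi(m^i_k) - psi(sum_j m^i_j) for that object's own m^i."""
    rows = []
    for m in M:
        psi_total = digamma(sum(m))
        rows.append([digamma(m_k) - psi_total for m_k in m])
    return rows


M = [[0.5, 0.5],  # no preference between the two local models
     [2.0, 0.5],  # object leaning towards model 1
     [0.5, 3.0]]  # object leaning towards model 2
log_pi = log_pi_expectation_per_object(M)
```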

M_step(X, Y, Z, HyperParameters)

The method does nothing.

Parameters:	X (FloatTensor) – The tensor of shape num_elements \(\times\) num_feature.
	Y (FloatTensor) – The tensor of shape num_elements \(\times\) num_answers.
	Z (FloatTensor) – The tensor of shape num_elements \(\times\) num_models.
	HyperParameters (dict) – The dictionary of all hyperparameters, whose keys are strings and whose values are FloatTensors.

PredictPi(X, HyperParameters)

Returns the probability (weight) of each model.

Returns the same vector \(\pi\) for every object: \(\pi = \frac{\textbf{m}}{\sum_k \textbf{m}_k}\), where \(\textbf{m}\) is the parameter of the Dirichlet distribution.

Parameters:	X (FloatTensor) – The tensor of shape num_elements \(\times\) num_feature.
	HyperParameters (dict) – The dictionary of all hyperparameters, whose keys are strings and whose values are FloatTensors.
Returns:	The tensor of shape num_elements \(\times\) num_models. The probability (weight) of each model.
Return type:	FloatTensor