A class that represents a Hidden Markov Model with an arbitrary type of emission distribution.
Public Member Functions | |
HMM (const arma::mat &transition, const std::vector< Distribution > &emission, const double tolerance=1e-5) | |
Create the Hidden Markov Model with the given transition matrix and the given emission distributions. | |
HMM (const size_t states, const Distribution emissions, const double tolerance=1e-5) | |
Create the Hidden Markov Model with the given number of hidden states and the given default distribution for emissions. | |
size_t & | Dimensionality () |
Set the dimensionality of observations. | |
size_t | Dimensionality () const |
Get the dimensionality of observations. | |
std::vector< Distribution > & | Emission () |
Return a modifiable reference to the emission distributions. | |
const std::vector< Distribution > & | Emission () const |
Return the emission distributions. | |
double | Estimate (const arma::mat &dataSeq, arma::mat &stateProb) const |
Estimate the probabilities of each hidden state at each time step of each given data observation, using the Forward-Backward algorithm. | |
double | Estimate (const arma::mat &dataSeq, arma::mat &stateProb, arma::mat &forwardProb, arma::mat &backwardProb, arma::vec &scales) const |
Estimate the probabilities of each hidden state at each time step for each given data observation, using the Forward-Backward algorithm. | |
void | Generate (const size_t length, arma::mat &dataSequence, arma::Col< size_t > &stateSequence, const size_t startState=0) const |
Generate a random data sequence of the given length. | |
double | LogLikelihood (const arma::mat &dataSeq) const |
Compute the log-likelihood of the given data sequence. | |
double | Predict (const arma::mat &dataSeq, arma::Col< size_t > &stateSeq) const |
Compute the most probable hidden state sequence for the given data sequence, using the Viterbi algorithm, returning the log-likelihood of the most likely state sequence. | |
double & | Tolerance () |
Modify the tolerance of the Baum-Welch algorithm. | |
double | Tolerance () const |
Get the tolerance of the Baum-Welch algorithm. | |
void | Train (const std::vector< arma::mat > &dataSeq, const std::vector< arma::Col< size_t > > &stateSeq) |
Train the model using the given labeled observations; the transition matrix and emission distributions are directly estimated. | |
void | Train (const std::vector< arma::mat > &dataSeq) |
Train the model using the Baum-Welch algorithm, with only the given unlabeled observations. | |
arma::mat & | Transition () |
Return a modifiable transition matrix reference. | |
const arma::mat & | Transition () const |
Return the transition matrix. | |
Private Member Functions | |
void | Backward (const arma::mat &dataSeq, const arma::vec &scales, arma::mat &backwardProb) const |
The Backward algorithm (part of the Forward-Backward algorithm). | |
void | Forward (const arma::mat &dataSeq, arma::vec &scales, arma::mat &forwardProb) const |
The Forward algorithm (part of the Forward-Backward algorithm). | |
Private Attributes | |
size_t | dimensionality |
Dimensionality of observations. | |
std::vector< Distribution > | emission |
Set of emission probability distributions; one for each state. | |
double | tolerance |
Tolerance of Baum-Welch algorithm. | |
arma::mat | transition |
Transition probability matrix. |
A class that represents a Hidden Markov Model with an arbitrary type of emission distribution.
This HMM class supports training (supervised and unsupervised), prediction of state sequences via the Viterbi algorithm, estimation of state probabilities, generation of random sequences, and calculation of the log-likelihood of a given sequence.
The template parameter, Distribution, specifies the distribution which the emissions follow. The class should implement the following functions:
class Distribution
{
 public:
  // The type of observation used by this distribution.
  typedef something DataType;

  // Return the probability of the given observation.
  double Probability(const DataType& observation) const;

  // Estimate the distribution based on the given observations.
  void Estimate(const std::vector<DataType>& observations);

  // Estimate the distribution based on the given observations, given also
  // the probability of each observation coming from this distribution.
  void Estimate(const std::vector<DataType>& observations,
                const std::vector<double>& probabilities);
};
See the mlpack::distribution::DiscreteDistribution class for an example. One would use the DiscreteDistribution class when the observations are non-negative integers. Other distributions could be Gaussians, a mixture of Gaussians (GMM), or any other probability distribution implementing the Distribution interface listed above.
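As an illustration of the interface described here, the following is a minimal sketch of a distribution over non-negative integer symbols. The class name SymbolDistribution, its constructor, and its private member are hypothetical and written only for this example; they are not part of mlpack, and the real DiscreteDistribution class may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical categorical distribution over symbols 0..k-1 (not an mlpack class).
class SymbolDistribution
{
 public:
  // The type of observation used by this distribution.
  typedef std::size_t DataType;

  // Start with a uniform distribution over numSymbols symbols.
  explicit SymbolDistribution(const std::size_t numSymbols) :
      probabilities(numSymbols, 1.0 / numSymbols)
  {
    assert(numSymbols > 0);
  }

  // Return the probability of the given observation (0 for unknown symbols).
  double Probability(const DataType& observation) const
  {
    return (observation < probabilities.size()) ?
        probabilities[observation] : 0.0;
  }

  // Estimate the distribution from equally weighted observations.
  void Estimate(const std::vector<DataType>& observations)
  {
    Estimate(observations, std::vector<double>(observations.size(), 1.0));
  }

  // Estimate the distribution from observations, each weighted by the
  // probability that it came from this distribution.
  void Estimate(const std::vector<DataType>& observations,
                const std::vector<double>& weights)
  {
    assert(observations.size() == weights.size());
    std::vector<double> counts(probabilities.size(), 0.0);
    double total = 0.0;
    for (std::size_t i = 0; i < observations.size(); ++i)
    {
      if (observations[i] >= counts.size())
        continue; // Ignore symbols outside the known range.
      counts[observations[i]] += weights[i];
      total += weights[i];
    }
    if (total <= 0.0)
      return; // Nothing to learn from; keep the current estimate.
    for (std::size_t s = 0; s < counts.size(); ++s)
      probabilities[s] = counts[s] / total;
  }

 private:
  std::vector<double> probabilities; // One probability per symbol.
};
```

The weighted overload is the one an unsupervised trainer would typically call, since each observation then contributes in proportion to how likely it is to belong to a given state.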
Usage of the HMM class generally involves either training an HMM or loading an already-known HMM and taking probability measurements of sequences. Example code for supervised training of a Gaussian HMM (that is, where the emission output distribution is a single Gaussian for each hidden state) is given below.
extern arma::mat observations; // Each column is an observation.
extern arma::Col<size_t> states; // Hidden states for each observation.

// Create an untrained HMM with 5 hidden states and default (N(0, 1))
// Gaussian distributions with the dimensionality of the dataset.
HMM<GaussianDistribution> hmm(5, GaussianDistribution(observations.n_rows));

// Train() takes vectors of sequences, so wrap the single sequence and its
// labels in one-element vectors.
std::vector<arma::mat> dataSeq(1, observations);
std::vector<arma::Col<size_t> > stateSeq(1, states);

// Train the HMM (the labels could be omitted to perform unsupervised
// training).
hmm.Train(dataSeq, stateSeq);
Once initialized, the HMM can evaluate the probability of a certain sequence (with LogLikelihood()), predict the most likely sequence of hidden states (with Predict()), generate a sequence (with Generate()), or estimate the probabilities of each state for a sequence of observations (with Estimate()).
Distribution | Type of emission distribution for this HMM. |
Definition at line 93 of file hmm.hpp.
mlpack::hmm::HMM< Distribution >::HMM (const size_t states, const Distribution emissions, const double tolerance = 1e-5)
Create the Hidden Markov Model with the given number of hidden states and the given default distribution for emissions.
The dimensionality of the observations is taken from the emissions variable, so it is important that the given default emission distribution is set with the correct dimensionality. Alternately, set the dimensionality with Dimensionality(). Optionally, the tolerance for convergence of the Baum-Welch algorithm can be set.
states | Number of states. | |
emissions | Default distribution for emissions. | |
tolerance | Tolerance for convergence of training algorithm (Baum-Welch). |
mlpack::hmm::HMM< Distribution >::HMM (const arma::mat &transition, const std::vector< Distribution > &emission, const double tolerance = 1e-5)
Create the Hidden Markov Model with the given transition matrix and the given emission distributions.
The dimensionality of the observations of the HMM is taken from the given emission distributions. Alternatively, the dimensionality can be set with Dimensionality().
The transition matrix should be such that T(i, j) is the probability of transition to state i from state j. The columns of the matrix should sum to 1.
The emission vector should hold one distribution for each hidden state, so that emission[j] is the distribution of observations emitted while in state j.
Optionally, the tolerance for convergence of the Baum-Welch algorithm can be set.
transition | Transition matrix. | |
emission | Emission distributions. | |
tolerance | Tolerance for convergence of training algorithm (Baum-Welch). |
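The column convention for the transition matrix is easy to get backwards, so a quick sanity check before constructing the model can help. The sketch below uses a plain nested std::vector instead of arma::mat to stay self-contained; the Matrix typedef and the function name IsColumnStochastic are assumptions made for this example only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Plain matrix type used only for this sketch; T[i][j] is the probability
// of a transition to state i from state j.
typedef std::vector<std::vector<double> > Matrix;

// Return true if T is square, has no negative entries, and every column
// sums to 1 within eps.
bool IsColumnStochastic(const Matrix& T, const double eps = 1e-9)
{
  assert(eps >= 0.0);
  const std::size_t n = T.size();
  for (std::size_t i = 0; i < n; ++i)
    if (T[i].size() != n)
      return false;

  for (std::size_t j = 0; j < n; ++j)
  {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
    {
      if (T[i][j] < 0.0)
        return false;
      sum += T[i][j];
    }
    if (std::fabs(sum - 1.0) > eps)
      return false;
  }
  return true;
}
```

A matrix whose rows (rather than columns) sum to 1 fails this check unless it happens to be doubly stochastic, which is usually a sign that it needs to be transposed.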
void mlpack::hmm::HMM< Distribution >::Backward (const arma::mat &dataSeq, const arma::vec &scales, arma::mat &backwardProb) const [private]
The Backward algorithm (part of the Forward-Backward algorithm).
Computes backward probabilities for each state for each observation in the given data sequence, using the scaling factors found (presumably) by Forward(). The returned matrix has rows equal to the number of hidden states and columns equal to the number of observations.
dataSeq | Data sequence to compute probabilities for. | |
scales | Vector of scaling factors. | |
backwardProb | Matrix in which backward probabilities will be saved. |
size_t& mlpack::hmm::HMM< Distribution >::Dimensionality () [inline]
Set the dimensionality of observations.
Definition at line 281 of file hmm.hpp.
References mlpack::hmm::HMM< Distribution >::dimensionality.
size_t mlpack::hmm::HMM< Distribution >::Dimensionality () const [inline]
Get the dimensionality of observations.
Definition at line 279 of file hmm.hpp.
References mlpack::hmm::HMM< Distribution >::dimensionality.
std::vector<Distribution>& mlpack::hmm::HMM< Distribution >::Emission () [inline]
const std::vector<Distribution>& mlpack::hmm::HMM< Distribution >::Emission () const [inline]
double mlpack::hmm::HMM< Distribution >::Estimate (const arma::mat &dataSeq, arma::mat &stateProb) const
Estimate the probabilities of each hidden state at each time step of each given data observation, using the Forward-Backward algorithm.
The returned matrix of state probabilities has columns equal to the number of data observations, and rows equal to the number of hidden states in the model. The log-likelihood of the most probable sequence is returned.
dataSeq | Sequence of observations. | |
stateProb | Probabilities of each state at each time interval. |
double mlpack::hmm::HMM< Distribution >::Estimate (const arma::mat &dataSeq, arma::mat &stateProb, arma::mat &forwardProb, arma::mat &backwardProb, arma::vec &scales) const
Estimate the probabilities of each hidden state at each time step for each given data observation, using the Forward-Backward algorithm.
Each matrix which is returned has columns equal to the number of data observations, and rows equal to the number of hidden states in the model. The log-likelihood of the most probable sequence is returned.
dataSeq | Sequence of observations. | |
stateProb | Matrix in which the probabilities of each state at each time interval will be stored. | |
forwardProb | Matrix in which the forward probabilities of each state at each time interval will be stored. | |
backwardProb | Matrix in which the backward probabilities of each state at each time interval will be stored. | |
scales | Vector in which the scaling factors at each time interval will be stored. |
void mlpack::hmm::HMM< Distribution >::Forward (const arma::mat &dataSeq, arma::vec &scales, arma::mat &forwardProb) const [private]
The Forward algorithm (part of the Forward-Backward algorithm).
Computes forward probabilities for each state for each observation in the given data sequence. The returned matrix has rows equal to the number of hidden states and columns equal to the number of observations.
dataSeq | Data sequence to compute probabilities for. | |
scales | Vector in which scaling factors will be saved. | |
forwardProb | Matrix in which forward probabilities will be saved. |
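For readers unfamiliar with the scaled Forward recursion, here is a minimal textbook-style sketch for discrete observations. It is not mlpack's implementation: the function name ScaledForward, the explicit initial state distribution, the emission table E, and the nested std::vector matrix type are all assumptions of this example. Each column of the output is normalized to sum to 1, and the normalizers are stored as the scaling factors.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<double> > Matrix;

// T[i][j]:    probability of a transition to state i from state j.
// E[k][j]:    probability of emitting symbol k while in state j.
// initial[i]: probability of state i at the first time step (an assumption
//             of this sketch).
// On return, forwardProb[i][t] is the scaled forward probability of state i
// at time t, and scales[t] is the normalizer used at time t.
// Returns the log-likelihood of the observation sequence.
double ScaledForward(const Matrix& T,
                     const Matrix& E,
                     const std::vector<double>& initial,
                     const std::vector<std::size_t>& obs,
                     std::vector<double>& scales,
                     Matrix& forwardProb)
{
  const std::size_t n = T.size();
  assert(!obs.empty() && initial.size() == n);

  forwardProb.assign(n, std::vector<double>(obs.size(), 0.0));
  scales.assign(obs.size(), 0.0);
  double logLikelihood = 0.0;

  for (std::size_t t = 0; t < obs.size(); ++t)
  {
    for (std::size_t i = 0; i < n; ++i)
    {
      // Probability of being in state i before seeing observation t.
      double prior = initial[i];
      if (t > 0)
      {
        prior = 0.0;
        for (std::size_t j = 0; j < n; ++j)
          prior += T[i][j] * forwardProb[j][t - 1];
      }
      forwardProb[i][t] = E[obs[t]][i] * prior;
      scales[t] += forwardProb[i][t];
    }

    // The observation must be possible under the model.
    assert(scales[t] > 0.0);
    for (std::size_t i = 0; i < n; ++i)
      forwardProb[i][t] /= scales[t];

    logLikelihood += std::log(scales[t]);
  }
  return logLikelihood;
}
```

Scaling keeps the forward probabilities from underflowing on long sequences, and under this convention the sum of the logarithms of the scaling factors gives the log-likelihood of the sequence.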
void mlpack::hmm::HMM< Distribution >::Generate (const size_t length, arma::mat &dataSequence, arma::Col< size_t > &stateSequence, const size_t startState = 0) const
Generate a random data sequence of the given length.
The data sequence is stored in the dataSequence parameter, and the state sequence is stored in the stateSequence parameter. Each column of dataSequence represents a random observation.
length | Length of random sequence to generate. | |
dataSequence | Matrix to store data in; each column is one observation. | |
stateSequence | Vector to store states in. | |
startState | Hidden state to start sequence in (default 0). |
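The generation procedure can be sketched for a discrete HMM as follows: emit a symbol from the current state, then sample the next state from the current state's column of the transition matrix. This is an illustrative sketch rather than mlpack's code; the names GenerateSequence and SampleFromColumn, the emission table E, and the explicit random number generator argument are assumptions of this example.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

typedef std::vector<std::vector<double> > Matrix;

// Draw a row index i with probability proportional to M[i][column].
std::size_t SampleFromColumn(const Matrix& M,
                             const std::size_t column,
                             std::mt19937& rng)
{
  std::vector<double> weights(M.size());
  for (std::size_t i = 0; i < M.size(); ++i)
    weights[i] = M[i][column];
  std::discrete_distribution<std::size_t> dist(weights.begin(), weights.end());
  return dist(rng);
}

// T[i][j]: probability of a transition to state i from state j.
// E[k][j]: probability of emitting symbol k while in state j.
// Fills data and states with a random sequence of the given length that
// starts in startState.
void GenerateSequence(const Matrix& T,
                      const Matrix& E,
                      const std::size_t length,
                      const std::size_t startState,
                      std::mt19937& rng,
                      std::vector<std::size_t>& data,
                      std::vector<std::size_t>& states)
{
  assert(startState < T.size());
  data.assign(length, 0);
  states.assign(length, 0);

  std::size_t state = startState;
  for (std::size_t t = 0; t < length; ++t)
  {
    states[t] = state;
    data[t] = SampleFromColumn(E, state, rng);  // Emit from current state.
    state = SampleFromColumn(T, state, rng);    // Move to the next state.
  }
}
```

Passing the generator in explicitly makes the sequence reproducible for a fixed seed, which is convenient when testing.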
double mlpack::hmm::HMM< Distribution >::LogLikelihood (const arma::mat &dataSeq) const
Compute the log-likelihood of the given data sequence.
dataSeq | Data sequence to evaluate the likelihood of. |
double mlpack::hmm::HMM< Distribution >::Predict (const arma::mat &dataSeq, arma::Col< size_t > &stateSeq) const
Compute the most probable hidden state sequence for the given data sequence, using the Viterbi algorithm, returning the log-likelihood of the most likely state sequence.
dataSeq | Sequence of observations. | |
stateSeq | Vector in which the most probable state sequence will be stored. |
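A compact log-space version of the Viterbi algorithm for discrete observations is sketched below to show what Predict() computes conceptually. It is not mlpack's implementation: the function name Viterbi, the explicit initial state distribution, the emission table E, and the nested std::vector matrix type are assumptions of this example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

typedef std::vector<std::vector<double> > Matrix;

// T[i][j]:    probability of a transition to state i from state j.
// E[k][j]:    probability of emitting symbol k while in state j.
// initial[i]: probability of state i at the first time step (an assumption
//             of this sketch).
// Stores the most probable state sequence in stateSeq and returns its
// log-likelihood.
double Viterbi(const Matrix& T,
               const Matrix& E,
               const std::vector<double>& initial,
               const std::vector<std::size_t>& obs,
               std::vector<std::size_t>& stateSeq)
{
  const std::size_t n = T.size();
  const std::size_t len = obs.size();
  assert(len > 0 && initial.size() == n);

  // logProb[i][t]: log-probability of the best path ending in state i at t.
  Matrix logProb(n, std::vector<double>(len, 0.0));
  // back[i][t]: predecessor of state i on that best path.
  std::vector<std::vector<std::size_t> > back(n,
      std::vector<std::size_t>(len, 0));

  for (std::size_t i = 0; i < n; ++i)
    logProb[i][0] = std::log(initial[i]) + std::log(E[obs[0]][i]);

  for (std::size_t t = 1; t < len; ++t)
  {
    for (std::size_t i = 0; i < n; ++i)
    {
      double best = -std::numeric_limits<double>::infinity();
      for (std::size_t j = 0; j < n; ++j)
      {
        const double candidate = logProb[j][t - 1] + std::log(T[i][j]);
        if (candidate > best)
        {
          best = candidate;
          back[i][t] = j;
        }
      }
      logProb[i][t] = best + std::log(E[obs[t]][i]);
    }
  }

  // Pick the best final state, then follow the back-pointers.
  std::size_t last = 0;
  for (std::size_t i = 1; i < n; ++i)
    if (logProb[i][len - 1] > logProb[last][len - 1])
      last = i;

  stateSeq.assign(len, 0);
  stateSeq[len - 1] = last;
  for (std::size_t t = len - 1; t > 0; --t)
    stateSeq[t - 1] = back[stateSeq[t]][t];

  return logProb[last][len - 1];
}
```

Working with logarithms turns the products of probabilities into sums, which avoids numerical underflow on long sequences.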
double& mlpack::hmm::HMM< Distribution >::Tolerance () [inline]
Modify the tolerance of the Baum-Welch algorithm.
Definition at line 286 of file hmm.hpp.
References mlpack::hmm::HMM< Distribution >::tolerance.
double mlpack::hmm::HMM< Distribution >::Tolerance () const [inline]
Get the tolerance of the Baum-Welch algorithm.
Definition at line 284 of file hmm.hpp.
References mlpack::hmm::HMM< Distribution >::tolerance.
void mlpack::hmm::HMM< Distribution >::Train (const std::vector< arma::mat > &dataSeq, const std::vector< arma::Col< size_t > > &stateSeq)
Train the model using the given labeled observations; the transition matrix and emission distributions are directly estimated.
Each matrix in the vector of data sequences corresponds to a vector in the vector of state sequences. Each point in each individual data sequence should be a column in the matrix, and its state should be the corresponding element in the state sequence vector. For instance, dataSeq[0].col(3) corresponds to the fourth observation in the first data sequence, and its state is stateSeq[0][3]. The number of rows in each matrix should be equal to the dimensionality of the HMM (which is set in the constructor).
dataSeq | Vector of observation sequences. | |
stateSeq | Vector of state sequences, corresponding to each observation. |
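With labeled data, estimating the transition matrix amounts to counting observed transitions and normalizing each column. The sketch below shows that counting step alone, under the same convention that entry (i, j) is the probability of moving to state i from state j. The function name EstimateTransition and the nested std::vector types are assumptions of this example, and details such as how mlpack handles states that never appear are not covered.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

typedef std::vector<std::vector<double> > Matrix;

// Estimate T[i][j], the probability of a transition to state i from state j,
// by counting transitions in the labeled state sequences.
Matrix EstimateTransition(
    const std::vector<std::vector<std::size_t> >& stateSeqs,
    const std::size_t numStates)
{
  Matrix T(numStates, std::vector<double>(numStates, 0.0));

  // Count each observed transition from seq[t - 1] to seq[t].
  for (std::size_t s = 0; s < stateSeqs.size(); ++s)
  {
    const std::vector<std::size_t>& seq = stateSeqs[s];
    for (std::size_t t = 1; t < seq.size(); ++t)
    {
      assert(seq[t] < numStates && seq[t - 1] < numStates);
      T[seq[t]][seq[t - 1]] += 1.0;
    }
  }

  // Normalize each column so that it sums to 1. Columns for states with no
  // observed outgoing transition are left as zeros in this sketch.
  for (std::size_t j = 0; j < numStates; ++j)
  {
    double sum = 0.0;
    for (std::size_t i = 0; i < numStates; ++i)
      sum += T[i][j];
    if (sum > 0.0)
      for (std::size_t i = 0; i < numStates; ++i)
        T[i][j] /= sum;
  }
  return T;
}
```

The emission distributions would be fitted in a similar spirit, by grouping the observations according to their labeled state and calling each distribution's Estimate() on its group.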
void mlpack::hmm::HMM< Distribution >::Train (const std::vector< arma::mat > &dataSeq)
Train the model using the Baum-Welch algorithm, with only the given unlabeled observations.
Supply initial guesses for the transition matrix and the emission distributions in the constructor rather than here. Each matrix in the vector of data sequences holds an individual data sequence; each point in each individual data sequence should be a column in the matrix. The number of rows in each matrix should be equal to the dimensionality of the HMM (which is set in the constructor).
It is preferable to use the other overload of Train(), with labeled data. That will produce much better results. However, if labeled data is unavailable, this will work. In addition, it is possible to use Train() with labeled data first, and then continue to train the model using this overload of Train() with unlabeled data.
The tolerance of the Baum-Welch algorithm can be set either in the constructor or with the Tolerance() method. When the change in log-likelihood of the model between iterations is less than the tolerance, the Baum-Welch algorithm terminates.
dataSeq | Vector of observation sequences. |
arma::mat& mlpack::hmm::HMM< Distribution >::Transition () [inline]
const arma::mat& mlpack::hmm::HMM< Distribution >::Transition () const [inline]
size_t mlpack::hmm::HMM< Distribution >::dimensionality [private] |
Dimensionality of observations.
Definition at line 327 of file hmm.hpp.
Referenced by mlpack::hmm::HMM< Distribution >::Dimensionality().
std::vector<Distribution> mlpack::hmm::HMM< Distribution >::emission [private] |
double mlpack::hmm::HMM< Distribution >::tolerance [private] |
Tolerance of Baum-Welch algorithm.
Definition at line 330 of file hmm.hpp.
Referenced by mlpack::hmm::HMM< Distribution >::Tolerance().
arma::mat mlpack::hmm::HMM< Distribution >::transition [private] |