Language Models
- class pynlpl.lm.lm.ARPALanguageModel(filename, encoding='utf-8', encoder=None, base_e=True, dounknown=True, debug=False, mode='simple')
Full back-off language model, loaded from a file in ARPA format.
This class does not build the model; it lets you use a pre-computed one. To build the model, use a tool such as ngram-count from SRILM.
- class NgramsProbs(data, mode='simple', delim=' ')
Store n-grams with their probabilities and backoffs.
This class abstracts the physical storage layout and enables memory/speed trade-offs.
- backoff(ngram)
Return the backoff value of the given n-gram tuple.
- prob(ngram)
Return the probability of the given n-gram tuple.
- score(data, history=None)
- scoreword(word, history=None)
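To illustrate what "back-off" means for an ARPA-style model, here is a minimal, self-contained sketch. It does not use pynlpl and is not the library's implementation: the tables `PROBS` and `BACKOFFS`, their values, and the function `backoff_logprob` are all hypothetical. It assumes log10 probabilities and backoff weights keyed by n-gram tuples, as in the ARPA file format.

```python
# Toy ARPA-style tables: n-gram tuple -> log10 probability / backoff weight.
# All entries are made up for illustration.
PROBS = {
    ("the",): -1.0,
    ("cat",): -2.0,
    ("sat",): -2.5,
    ("the", "cat"): -0.5,
}
BACKOFFS = {
    ("the",): -0.3,
    ("cat",): -0.4,
}


def backoff_logprob(ngram, probs=PROBS, backoffs=BACKOFFS):
    """Return log10 P(last word | preceding words) using back-off."""
    ngram = tuple(ngram)
    if ngram in probs:
        # The full n-gram was seen: use its stored probability directly.
        return probs[ngram]
    if len(ngram) == 1:
        # Nothing shorter to back off to.
        raise KeyError(f"unknown word: {ngram[0]!r}")
    # Back off: add the backoff weight of the history (0.0 if none is stored)
    # to the score of the n-gram with its first word dropped.
    history = ngram[:-1]
    return backoffs.get(history, 0.0) + backoff_logprob(ngram[1:], probs, backoffs)
```

In this sketch a seen n-gram such as `("the", "cat")` is looked up directly, while an unseen one such as `("cat", "sat")` falls back to the backoff weight of `("cat",)` plus the unigram score of `("sat",)`.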
- class pynlpl.lm.lm.SimpleLanguageModel(n=2, casesensitive=True, beginmarker='<begin>', endmarker='<end>')
This is a simple unsmoothed language model. This class can both hold and compute the model.
- append(sentence)
- load(filename)
- save(filename)
- scoresentence(sentence)
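The following sketch shows what an unsmoothed n-gram model with begin and end markers computes. It is a standalone illustration, not pynlpl's code: the class `ToyNgramModel` is hypothetical, and it assumes that a sentence is a list of tokens and that the sentence score is the product of maximum-likelihood n-gram probabilities.

```python
from collections import Counter


class ToyNgramModel:
    """Unsmoothed maximum-likelihood n-gram model (illustrative only)."""

    def __init__(self, n=2, beginmarker="<begin>", endmarker="<end>"):
        self.n = n
        self.beginmarker = beginmarker
        self.endmarker = endmarker
        self.ngrams = Counter()     # counts of full n-grams
        self.histories = Counter()  # counts of the (n-1)-word histories

    def _windows(self, sentence):
        # Pad with begin markers so the first word has a full history.
        padded = [self.beginmarker] * (self.n - 1) + list(sentence) + [self.endmarker]
        for i in range(len(padded) - self.n + 1):
            yield tuple(padded[i:i + self.n])

    def append(self, sentence):
        """Add the n-gram counts of one tokenised sentence to the model."""
        for ngram in self._windows(sentence):
            self.ngrams[ngram] += 1
            self.histories[ngram[:-1]] += 1

    def scoresentence(self, sentence):
        """Return the product of count(ngram) / count(history) over the sentence."""
        score = 1.0
        for ngram in self._windows(sentence):
            seen = self.histories[ngram[:-1]]
            if seen == 0:
                return 0.0  # unseen history: no smoothing, so probability 0
            score *= self.ngrams[ngram] / seen
        return score
```

Because there is no smoothing, any sentence containing an n-gram that was never appended scores 0 in this sketch.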
- class pynlpl.lm.srilm.SRILM(filename, n)
- logscore(ngram)
- scoresentence(sentence, unknownwordprob=-12)
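One plausible reading of the `unknownwordprob` default is a fallback log score for n-grams the model cannot score. The sketch below illustrates that idea only; it is not taken from pynlpl. The function `score_sentence` is hypothetical, and it assumes that the scoring callable raises `KeyError` for unseen n-grams, that scores are log10 values, and that sentences are padded with SRILM's conventional `<s>` and `</s>` markers.

```python
def score_sentence(sentence, logscore, n, unknownwordprob=-12):
    """Sum log10 scores over the n-grams of a tokenised sentence.

    `logscore` is any callable mapping an n-gram tuple to a log10 score;
    it is assumed (not confirmed by the source) to raise KeyError when
    the n-gram cannot be scored.
    """
    padded = ["<s>"] * (n - 1) + list(sentence) + ["</s>"]
    total = 0.0
    for i in range(len(padded) - n + 1):
        ngram = tuple(padded[i:i + n])
        try:
            total += logscore(ngram)
        except KeyError:
            # Fall back to a fixed, very low log score for unseen n-grams.
            total += unknownwordprob
    return total
```

With a plain dictionary standing in for the model, `score_sentence(["a"], table.__getitem__, 2)` sums the scores of `("<s>", "a")` and `("a", "</s>")`, substituting `unknownwordprob` for any missing entry.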
- exception pynlpl.lm.srilm.SRILMException
Base Exception for SRILM.
- class pynlpl.lm.server.LMNGramFactory(lm)
- protocol
alias of pynlpl.lm.server.LMNGramProtocol
- class pynlpl.lm.server.LMNGramProtocol
- lineReceived(ngram)
Override this to handle each received line.
ngram (bytes): the line that was received, with the delimiter removed.
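The phrase "with the delimiter removed" describes line framing: incoming bytes are buffered, split on a delimiter, and each complete line is passed to `lineReceived`. Below is a minimal sketch of that mechanism without Twisted. The classes `ToyLineReceiver` and `CollectingReceiver` are hypothetical; the `b"\r\n"` delimiter matches the default of Twisted's `LineReceiver`, but whether the pynlpl server keeps that default is an assumption.

```python
class ToyLineReceiver:
    """Buffer incoming bytes and dispatch complete lines (illustrative only)."""

    delimiter = b"\r\n"

    def __init__(self):
        self._buffer = b""

    def dataReceived(self, data):
        # Data may arrive in arbitrary chunks, so keep any partial line buffered.
        self._buffer += data
        while self.delimiter in self._buffer:
            line, self._buffer = self._buffer.split(self.delimiter, 1)
            self.lineReceived(line)  # the delimiter has already been stripped

    def lineReceived(self, line):
        raise NotImplementedError("override in a subclass")


class CollectingReceiver(ToyLineReceiver):
    """Example subclass that simply records every received line."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def lineReceived(self, line):
        self.lines.append(line)
```

A protocol class in the style of the ones documented here would override `lineReceived` to score the received line instead of collecting it.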
- class pynlpl.lm.server.LMSentenceFactory(lm)
- protocol
alias of pynlpl.lm.server.LMSentenceProtocol
- class pynlpl.lm.server.LMSentenceProtocol
- lineReceived(sentence)
Override this to handle each received line.
sentence (bytes): the line that was received, with the delimiter removed.
- class pynlpl.lm.server.LMServer(lm, port=12346, n=0)
Language Model Server