Class TokenStreamToAutomaton

    • Field Detail

      • preservePositionIncrements

        private boolean preservePositionIncrements
      • finalOffsetGapAsHole

        private boolean finalOffsetGapAsHole
      • unicodeArcs

        private boolean unicodeArcs
      • POS_SEP

        public static final int POS_SEP
        We create a transition with this label between two adjacent tokens.
        See Also:
        Constant Field Values
      • HOLE

        public static final int HOLE
        We add an arc with this label to represent a hole, that is, a position with no token.
        See Also:
        Constant Field Values
    • Constructor Detail

      • TokenStreamToAutomaton

        public TokenStreamToAutomaton()
        Sole constructor.
    • Method Detail

      • setPreservePositionIncrements

        public void setPreservePositionIncrements(boolean enablePositionIncrements)
        Whether to generate holes in the automaton for missing positions; true by default.
      • setFinalOffsetGapAsHole

        public void setFinalOffsetGapAsHole(boolean finalOffsetGapAsHole)
        If true, any final offset gaps will result in adding a position hole.
      • setUnicodeArcs

        public void setUnicodeArcs(boolean unicodeArcs)
        Whether to make transition labels Unicode code points instead of UTF-8 bytes; false by default.
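        The difference between the two label modes can be illustrated without the library. The following is a minimal plain-Java sketch, not the class's own code: the class name ArcLabelSketch and both helper methods are hypothetical, and they only show how many labels a single term would yield in each mode.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, for illustration only; not part of the documented API.
class ArcLabelSketch {
    /** Byte mode: one label per UTF-8 encoded byte, each in the range 0..255. */
    static int[] utf8Labels(String term) {
        byte[] bytes = term.getBytes(StandardCharsets.UTF_8);
        int[] labels = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) {
            labels[i] = bytes[i] & 0xFF; // read each byte as an unsigned value
        }
        return labels;
    }

    /** Code point mode: one label per Unicode code point. */
    static int[] codePointLabels(String term) {
        return term.codePoints().toArray();
    }

    public static void main(String[] args) {
        String term = "caf\u00e9"; // "cafe" with an acute accent on the final e
        System.out.println(Arrays.toString(utf8Labels(term)));      // [99, 97, 102, 195, 169]
        System.out.println(Arrays.toString(codePointLabels(term))); // [99, 97, 102, 233]
    }
}
```

        The accented character takes two arcs in byte mode and a single arc in code point mode, which is why the two settings produce automata of different shapes for non-ASCII terms.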
      • changeToken

        protected BytesRef changeToken(BytesRef in)
        Subclass and implement this if you need to change the token (such as escaping certain bytes) before it's turned into a graph.
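        As a rough idea of the kind of transformation an override might apply, the sketch below escapes reserved bytes in a term. It is plain Java with several assumptions: a byte[] stands in for BytesRef, the values used for POS_SEP and HOLE are assumed for illustration (see Constant Field Values for the real ones), and the 0xFF marker scheme and the name escapeReserved are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone model of an escaping step; not the library's code.
class EscapeTokenSketch {
    // Assumed label values, for illustration only.
    static final int POS_SEP = 0x1f;
    static final int HOLE = 0x1e;
    // Hypothetical marker byte; 0xFF never occurs in valid UTF-8 input.
    static final int ESCAPE = 0xff;

    /** Returns a copy of term in which every reserved byte is preceded by ESCAPE. */
    static byte[] escapeReserved(byte[] term) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(term.length + 2);
        for (byte b : term) {
            int label = b & 0xFF;
            if (label == POS_SEP || label == HOLE) {
                out.write(ESCAPE); // mark the next byte as literal token content
            }
            out.write(label);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] term = "a\u001fb".getBytes(StandardCharsets.UTF_8);
        byte[] escaped = escapeReserved(term);
        System.out.println(escaped.length); // 4: 'a', ESCAPE, 0x1f, 'b'
    }
}
```

        In a real subclass the same loop would presumably read from the incoming BytesRef and return a new one; the point is only that bytes inside a token can be kept distinguishable from the separator and hole labels.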
      • toAutomaton

        public Automaton toAutomaton(TokenStream in)
                              throws java.io.IOException
        Pulls the graph (including PositionLengthAttribute) from the provided TokenStream, and creates the corresponding automaton where arcs are bytes (or Unicode code points if unicodeArcs = true) from each term.
        Throws:
        java.io.IOException
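
        To make the roles of POS_SEP, HOLE and preservePositionIncrements more concrete, here is a deliberately simplified plain-Java model. It handles only a linear token stream (every position increment is at least 1, with no stacked tokens and no PositionLengthAttribute graph) and returns the sequence of arc labels along the single path. The Token class, the labels method and the constant values are assumptions made for this sketch; the exact arc layout the real class builds may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical model of a linear token stream; not the library's algorithm.
class LinearArcSketch {
    // Assumed label values, for illustration only; see Constant Field Values.
    static final int POS_SEP = 0x1f;
    static final int HOLE = 0x1e;

    /** Hypothetical stand-in for one token: term text plus position increment (>= 1). */
    static final class Token {
        final String term;
        final int posInc;

        Token(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }
    }

    /** Labels along the single path: term bytes, POS_SEP between positions, HOLE for gaps. */
    static List<Integer> labels(List<Token> tokens, boolean preservePositionIncrements) {
        List<Integer> out = new ArrayList<>();
        boolean first = true;
        for (Token t : tokens) {
            if (!first) {
                out.add(POS_SEP); // separator after the previous token
            }
            // When increments are not preserved, gaps are ignored entirely.
            int missing = preservePositionIncrements ? t.posInc - 1 : 0;
            for (int i = 0; i < missing; i++) {
                out.add(HOLE);    // one hole arc per skipped position
                out.add(POS_SEP); // followed by a separator, like any other position
            }
            for (byte b : t.term.getBytes(StandardCharsets.UTF_8)) {
                out.add(b & 0xFF);
            }
            first = false;
        }
        return out;
    }

    public static void main(String[] args) {
        // "a", then a removed stop word, then "c" (position increment 2).
        List<Token> tokens = Arrays.asList(new Token("a", 1), new Token("c", 2));
        System.out.println(labels(tokens, true));  // [97, 31, 30, 31, 99]
        System.out.println(labels(tokens, false)); // [97, 31, 99]
    }
}
```

        With increments preserved, the gap left by the removed token shows up as an extra HOLE label between two separators; with the flag off, the two tokens look adjacent.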