class Client
This class was heavily based on the existing Client API by Dmytro Shteflyuk and Alexy Kovyrin. Their code worked fine, I just wanted something a bit more Ruby-ish (ie. lowercase and underscored method names). I also have used a few helper classes, just to neaten things up.
Feel free to use it wherever. Send bug reports, patches, comments and suggestions to pat at freelancing-gods dot com.
Most properties of the client are accessible through attribute accessors, and where relevant use symboles instead of the long constants common in other clients. Some examples:
client.sort_mode = :extended client.sort_by = "birthday DESC" client.match_mode = :extended
To add a filter, you will need to create a Filter object:
client.filters << Riddle::Client::Filter.new("birthday", Time.at(1975, 1, 1).to_i..Time.at(1985, 1, 1).to_i, false)
Public Class Methods
# File lib/riddle/client.rb, line 136 def self.connection @@connection end
# File lib/riddle/client.rb, line 130 def self.connection=(value) Riddle.mutex.synchronize do @@connection = value end end
Can instantiate with a specific server and port - otherwise it assumes defaults of localhost and 9312 respectively. All other settings can be accessed and changed via the attribute accessors.
# File lib/riddle/client.rb, line 143 def initialize(servers = nil, port = nil, key = nil) Riddle.version_warning @servers = Array(servers || "localhost") @port = port || 9312 @socket = nil @key = key reset @queue = [] end
Public Instance Methods
# File lib/riddle/client.rb, line 484 def add_override(attribute, type, values) @overrides[attribute] = {:type => type, :values => values} end
Append a query to the queue. This uses the same parameters as the query method.
# File lib/riddle/client.rb, line 221 def append_query(search, index = '*', comments = '') @queue << query_message(search, index, comments) end
# File lib/riddle/client.rb, line 496 def close close_socket end
Build excerpts from search terms (the words
) and the text of
documents. Excerpts are bodies of text that have the words
highlighted. They may also be abbreviated to fit within a word limit.
As part of the options hash, you will need to define:
-
:docs
-
:words
-
:index
Optional settings include:
-
:before_match (defaults to <span class=“match”>)
-
:after_match (defaults to </span>)
-
:chunk_separator (defaults to ' … ' - which is an HTML ellipsis)
-
:limit (defaults to 256)
-
:around (defaults to 5)
-
:exact_phrase (defaults to false)
-
:single_passage (defaults to false)
The defaults differ from the official PHP client, as I've opted for semantic HTML markup.
Example:
client.excerpts(:docs => ["Pat Allan, Pat Cash"], :words => 'Pat', :index => 'pats') #=> ["<span class=\"match\">Pat</span> Allan, <span class=\"match\">Pat</span> Cash"] lorem_lipsum = "Lorem ipsum dolor..." client.excerpts(:docs => ["Pat Allan, #{lorem_lipsum} Pat Cash"], :words => 'Pat', :index => 'pats') #=> ["<span class=\"match\">Pat</span> Allan, Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua … . Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. <span class=\"match\">Pat</span> Cash"]
Workflow:
Excerpt creation is completely isolated from searching the index. The nominated index is only used to discover encoding and charset information.
Therefore, the workflow goes:
-
Do the sphinx query.
-
Fetch the documents found by sphinx from their repositories.
-
Pass the documents' text to
excerpts
for marking up of matched terms.
# File lib/riddle/client.rb, line 395 def excerpts(options = {}) options[:index] ||= '*' options[:before_match] ||= '<span class="match">' options[:after_match] ||= '</span>' options[:chunk_separator] ||= ' … ' # ellipsis options[:limit] ||= 256 options[:limit_passages] ||= 0 options[:limit_words] ||= 0 options[:around] ||= 5 options[:exact_phrase] ||= false options[:single_passage] ||= false options[:query_mode] ||= false options[:force_all_words] ||= false options[:start_passage_id] ||= 1 options[:load_files] ||= false options[:html_strip_mode] ||= 'index' options[:allow_empty] ||= false options[:passage_boundary] ||= 'none' options[:emit_zones] ||= false options[:load_files_scattered] ||= false response = Response.new request(:excerpt, excerpts_message(options)) options[:docs].collect { response.next } end
# File lib/riddle/client.rb, line 476 def flush_attributes response = Response.new request( :flushattrs, Message.new ) response.next_int end
Generates a keyword list for a given query. Each keyword is represented by a hash, with keys :tokenised and :normalised. If return_hits is set to true it will also report on the number of hits and documents for each keyword (see :hits and :docs keys respectively).
# File lib/riddle/client.rb, line 443 def keywords(query, index, return_hits = false) response = Response.new request( :keywords, keywords_message(query, index, return_hits) ) (0...response.next_int).collect do hash = {} hash[:tokenised] = response.next hash[:normalised] = response.next if return_hits hash[:docs] = response.next_int hash[:hits] = response.next_int end hash end end
# File lib/riddle/client.rb, line 488 def open open_socket return if Versions[:search] < 0x116 @socket.send request_header(:persist) + [1].pack('N'), 0 end
Query the Sphinx daemon - defaulting to all indices, but you can specify a specific one if you wish. The search parameter should be a string following Sphinx's expectations.
The object returned from this method is a hash with the following keys:
-
:matches
-
:fields
-
:attributes
-
:attribute_names
-
:words
-
:total
-
:total_found
-
:time
-
:status
-
:warning (if appropriate)
-
:error (if appropriate)
The key :matches
returns an array of hashes - the actual
search results. Each hash has the document id (:doc
), the
result weighting (:weight
), and a hash of the attributes for
the document (:attributes
).
The :fields
and :attribute_names
keys return list
of fields and attributes for the documents. The key
:attributes
will return a hash of attribute name and type
pairs, and :words
returns a hash of hashes representing the
words from the search, with the number of documents and hits for each,
along the lines of:
results[:words]["Pat"] #=> {:docs => 12, :hits => 15}
:total
, :total_found
and :time
return the number of matches available, the total number of matches (which
may be greater than the maximum available, depending on the number of
matches and your sphinx configuration), and the time in milliseconds that
the query took to run.
:status
is the error code for the query - and if there was a
related warning, it will be under the :warning
key. Fatal
errors will be described under :error
.
# File lib/riddle/client.rb, line 344 def query(search, index = '*', comments = '') @queue.clear @queue << query_message(search, index, comments) self.run.first end
Reset attributes and settings to defaults.
# File lib/riddle/client.rb, line 157 def reset # defaults @offset = 0 @limit = 20 @max_matches = 1000 @match_mode = :all @sort_mode = :relevance @sort_by = '' @weights = [] @id_range = 0..0 @filters = [] @group_by = '' @group_function = :day @group_clause = '@weight DESC' @group_distinct = '' @cut_off = 0 @retry_count = 0 @retry_delay = 0 @anchor = {} # string keys are index names, integer values are weightings @index_weights = {} @rank_mode = :proximity_bm25 @rank_expr = '' @max_query_time = 0 # string keys are field names, integer values are weightings @field_weights = {} @timeout = 0 @overrides = {} @select = "*" end
Run all the queries currently in the queue. This will return an array of results hashes.
# File lib/riddle/client.rb, line 227 def run response = Response.new request(:search, @queue) results = @queue.collect do result = { :matches => [], :fields => [], :attributes => {}, :attribute_names => [], :words => {} } result[:status] = response.next_int case result[:status] when Statuses[:warning] result[:warning] = response.next when Statuses[:error] result[:error] = response.next next result end result[:fields] = response.next_array attributes = response.next_int attributes.times do attribute_name = response.next type = response.next_int result[:attributes][attribute_name] = type result[:attribute_names] << attribute_name end result_attribute_names_and_types = result[:attribute_names]. inject([]) { |array, attr| array.push([ attr, result[:attributes][attr] ]) } matches = response.next_int is_64_bit = response.next_int result[:matches] = (0...matches).map do |i| doc = is_64_bit > 0 ? response.next_64bit_int : response.next_int weight = response.next_int current_match_attributes = {} result_attribute_names_and_types.each do |attr, type| current_match_attributes[attr] = attribute_from_type(type, response) end {:doc => doc, :weight => weight, :index => i, :attributes => current_match_attributes} end result[:total] = response.next_int.to_i || 0 result[:total_found] = response.next_int.to_i || 0 result[:time] = ('%.3f' % (response.next_int / 1000.0)).to_f || 0.0 words = response.next_int words.times do word = response.next docs = response.next_int hits = response.next_int result[:words][word] = {:docs => docs, :hits => hits} end result end @queue.clear results rescue Riddle::OutOfBoundsError => error raise error rescue => original error = ResponseError.new original.message error.original = original raise error end
The searchd server to query. Servers are removed from @server after a Timeout::Error is hit to allow for fail-over.
# File lib/riddle/client.rb, line 190 def server @servers.first end
Backwards compatible writer to the @servers array.
# File lib/riddle/client.rb, line 195 def server=(server) @servers = server.to_a end
Set the geo-anchor point - with the names of the attributes that contain the latitude and longitude (in radians), and the reference position. Note that for geocoding to work properly, you must also set #match_mode to :extended. To sort results by distance, you will need to set #sort_by to '@geodist asc', and #sort_mode to extended (as an example). Sphinx expects latitude and longitude to be returned from you SQL source in radians.
Example:
client.set_anchor('lat', -0.6591741, 'long', 2.530770)
# File lib/riddle/client.rb, line 210 def set_anchor(lat_attr, lat, long_attr, long) @anchor = { :latitude_attribute => lat_attr, :latitude => lat, :longitude_attribute => long_attr, :longitude => long } end
# File lib/riddle/client.rb, line 463 def status response = Response.new request( :status, Message.new ) rows, cols = response.next_int, response.next_int (0...rows).inject({}) do |hash, row| hash[response.next.to_sym] = response.next hash end end
Update attributes - first parameter is the relevant index, second is an array of attributes to be updated, and the third is a hash, where the keys are the document ids, and the values are arrays with the attribute values - in the same order as the second parameter.
Example:
client.update('people', ['birthday'], {1 => [Time.at(1982, 20, 8).to_i]})
# File lib/riddle/client.rb, line 430 def update(index, attributes, values_by_doc) response = Response.new request( :update, update_message(index, attributes, values_by_doc) ) response.next_int end
Private Instance Methods
# File lib/riddle/client.rb, line 843 def attribute_from_type(type, response) handler = AttributeHandlers[type] response.send handler end
# File lib/riddle/client.rb, line 530 def close_socket raise "Not Connected" if @socket.nil? @socket.close @socket = nil true end
If there's an active connection to the Sphinx daemon, this will yield the socket. If there's no active connection, then it will connect, yield the new socket, then close it.
# File lib/riddle/client.rb, line 542 def connect(&block) if @socket.nil? || @socket.closed? @socket = nil open_socket begin yield @socket ensure close_socket end else yield @socket end end
# File lib/riddle/client.rb, line 848 def excerpt_flags(options) flags = 1 flags |= 2 if options[:exact_phrase] flags |= 4 if options[:single_passage] flags |= 8 if options[:use_boundaries] flags |= 16 if options[:weight_order] flags |= 32 if options[:query_mode] flags |= 64 if options[:force_all_words] flags |= 128 if options[:load_files] flags |= 256 if options[:allow_empty] flags |= 512 if options[:emit_zones] flags |= 1024 if options[:load_files_scattered] flags end
Generation of the message to send to Sphinx for an excerpts request.
# File lib/riddle/1.10/client.rb, line 8 def excerpts_message(options) message = Message.new message.append [0, excerpt_flags(options)].pack('N2') # 0 = mode message.append_string options[:index] message.append_string options[:words] # options message.append_string options[:before_match] message.append_string options[:after_match] message.append_string options[:chunk_separator] message.append_ints options[:limit], options[:around] message.append_ints options[:limit_passages], options[:limit_words] message.append_ints options[:start_passage_id] message.append_string options[:html_strip_mode] if Versions[:excerpt] >= 0x103 message.append_string options[:passage_boundary] end message.append_array options[:docs] message.to_s end
# File lib/riddle/0.9.9/client.rb, line 7 def initialise_connection(available_server) socket = initialise_socket(available_server) # Send version socket.send [1].pack('N'), 0 # Checking version version = socket.recv(4).unpack('N*').first if version < 1 socket.close raise Riddle::VersionError, "Can only connect to searchd version 1.0 or better, not version #{version}" end socket end
# File lib/riddle/client.rb, line 572 def initialise_socket(available_server) tries = 0 begin socket = if self.connection self.connection.call(self) elsif self.class.connection self.class.connection.call(self) elsif available_server && available_server.index('/') == 0 UNIXSocket.new available_server elsif available_server TCPSocket.new available_server, @port else raise "Server not set." end rescue Errno::ETIMEDOUT, Errno::ECONNRESET, Errno::ECONNREFUSED => e retry if (tries += 1) < 5 raise Riddle::ConnectionError, "Connection to #{available_server} on #{@port} failed. #{e.message}" end socket end
# File lib/riddle/client.rb, line 617 def key_message @key_message ||= begin message = Message.new message.append_string key message.to_s end end
Generates the simple message to send to the daemon for a keywords request.
# File lib/riddle/client.rb, line 822 def keywords_message(query, index, return_hits) message = Message.new message.append_string query message.append_string index message.append_int return_hits ? 1 : 0 message.to_s end
# File lib/riddle/client.rb, line 502 def open_socket raise "Already Connected" unless @socket.nil? available_servers = servers.dup if @timeout == 0 @socket = initialise_connection(available_servers.first) else begin Timeout.timeout(@timeout) { @socket = initialise_connection(available_servers.first) } rescue Timeout::Error, Riddle::ConnectionError => e failed_servers ||= [] failed_servers << available_servers.shift retry if !available_servers.empty? case e when Timeout::Error raise Riddle::ConnectionError, "Connection to #{failed_servers.inspect} on #{@port} timed out after #{@timeout} seconds" else raise e end end end true end
Generation of the message to send to Sphinx for a search.
# File lib/riddle/client.rb, line 689 def query_message(search, index, comments = '') message = Message.new # Mode, Limits message.append_ints @offset, @limit, MatchModes[@match_mode] # Ranking message.append_int RankModes[@rank_mode] message.append_string @rank_expr if @rank_mode == :expr # Sort Mode message.append_int SortModes[@sort_mode] message.append_string @sort_by # Query message.append_string search # Weights message.append_int @weights.length message.append_ints *@weights # Index message.append_string index # ID Range message.append_int 1 message.append_64bit_ints @id_range.first, @id_range.last # Filters message.append_int @filters.length @filters.each { |filter| message.append filter.query_message } # Grouping message.append_int GroupFunctions[@group_function] message.append_string @group_by message.append_int @max_matches message.append_string @group_clause message.append_ints @cut_off, @retry_count, @retry_delay message.append_string @group_distinct # Anchor Point if @anchor.empty? message.append_int 0 else message.append_int 1 message.append_string @anchor[:latitude_attribute] message.append_string @anchor[:longitude_attribute] message.append_floats @anchor[:latitude], @anchor[:longitude] end # Per Index Weights message.append_int @index_weights.length @index_weights.each do |key,val| message.append_string key.to_s message.append_int val end # Max Query Time message.append_int @max_query_time # Per Field Weights message.append_int @field_weights.length @field_weights.each do |key,val| message.append_string key.to_s message.append_int val end message.append_string comments return message.to_s if Versions[:search] < 0x116 # Overrides message.append_int @overrides.length @overrides.each do |key,val| message.append_string key.to_s message.append_int AttributeTypes[val[:type]] message.append_int val[:values].length val[:values].each do |id,map| message.append_64bit_int id method = case val[:type] when :float :append_float when :bigint :append_64bit_int else :append_int end message.send method, map end end message.append_string @select message.to_s end
Send a collection of messages, for a command type (eg, search, excerpts, update), to the Sphinx daemon.
# File lib/riddle/client.rb, line 627 def request(command, messages) response = "" status = -1 version = 0 length = 0 message = Riddle.encode(Array(messages).join(""), 'ASCII-8BIT') connect do |socket| case command when :search if Versions[command] >= 0x118 socket.send request_header(command, message.length) + [0, messages.length].pack('NN') + message, 0 else socket.send request_header(command, message.length) + [messages.length].pack('N') + message, 0 end when :status socket.send request_header(command, message.length), 0 else socket.send request_header(command, message.length) + message, 0 end header = socket.recv(8) status, version, length = header.unpack('n2N') while response.length < (length || 0) part = socket.recv(length - response.length) # will return 0 bytes if remote side closed TCP connection, e.g, searchd segfaulted. break if part.length == 0 && socket.is_a?(TCPSocket) response << part if part end end if response.empty? || response.length != length raise ResponseError, "No response from searchd (status: #{status}, version: #{version})" end case status when Statuses[:ok] if version < Versions[command] puts format("searchd command v.%d.%d older than client (v.%d.%d)", version >> 8, version & 0xff, Versions[command] >> 8, Versions[command] & 0xff) end response when Statuses[:warning] length = response[0, 4].unpack('N*').first puts response[4, length] response[4 + length, response.length - 4 - length] when Statuses[:error], Statuses[:retry] message = response[4, response.length - 4] klass = message[/out of bounds/] ? OutOfBoundsError : ResponseError raise klass, "searchd error (status: #{status}): #{message}" else raise ResponseError, "Unknown searchd error (status: #{status})" end end
# File lib/riddle/client.rb, line 595 def request_header(command, length = 0) length += key_message.length if key core_header = case command when :search # Message length is +4/+8 to account for the following count value for # the number of messages. if Versions[command] >= 0x118 [Commands[command], Versions[command], 8 + length].pack("nnN") else [Commands[command], Versions[command], 4 + length].pack("nnN") end when :status [Commands[command], Versions[command], 4, 1].pack("nnNN") when :persist [Commands[command], 0, 4 + length].pack("nnN") else [Commands[command], Versions[command], length].pack("nnN") end key ? core_header + key_message : core_header end
# File lib/riddle/0.9.9/client.rb, line 23 def update_message(index, attributes, values_by_doc) message = Message.new message.append_string index message.append_int attributes.length attributes.each_with_index do |attribute, index| message.append_string attribute message.append_boolean values_by_doc.values.first[index].is_a?(Array) end message.append_int values_by_doc.length values_by_doc.each do |key,values| message.append_64bit_int key # document ID values.each do |value| case value when Array message.append_int value.length message.append_ints *value else message.append_int value end end end message.to_s end