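After we switched to Sphinx, the product team noticed that searching for a record's old data still returned the updated record. The official documentation (http://sphinxsearch.com/docs/manual-0.9.9.html#index-merging) pointed to the cause: when the index was updated, thinking-sphinx did not also update its custom sphinx_deleted attribute, so the --merge-dst-range option used for the incremental (delta) index merge had no effect. The fix is as follows: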

  1. In the searchd section of config/production.sphinx.conf, add attr_flush_period = 5 so that Sphinx writes the updated sphinx_deleted attribute values to disk.

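  2. Sphinx is deployed on a separate, dedicated machine. To keep deployment and maintenance simple for ops, and to avoid installing software unrelated to Sphinx (rmagick and the like), I wrote a Ruby script that runs from cron, with its gems installed via bundler. Part of the code is shown below: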

    #!/usr/bin/env ruby
    # Load RubyGems explicitly; passing -rubygems through env on the shebang line is not portable.
    require 'rubygems'

    RAILS_ROOT = File.expand_path(File.dirname(__FILE__)) unless defined?(RAILS_ROOT)
    sphinx_config_yml = RAILS_ROOT + '/config/sphinx.yml'
    mysql_config_yml = RAILS_ROOT + '/config/database.yml'
    production_sphinx_conf = RAILS_ROOT + '/config/production.sphinx.conf'
    gem 'rails', '2.3.4'
    require 'initializer'
    %w[active_record active_support action_view action_controller action_mailer].each { |act| require act }
    gem "thinking-sphinx", "1.3.18", :lib => "thinking_sphinx"
    %w[yaml riddle thinking_sphinx].each { |lib| require lib }

    class Hash
      # Return a copy of the hash with string keys converted to symbols.
      def symbolize_keys
        inject({}) do |options, (key, value)|
          options[(key.to_sym rescue key) || key] = value
          options
        end
      end
    end

    sphinx_config = YAML.load_file(sphinx_config_yml)['production'].symbolize_keys
    ActiveRecord::Base.establish_connection(YAML.load_file(mysql_config_yml)['production'].symbolize_keys)

    # Stub out the attachment macros so the models load without their unrelated dependencies.
    class ActiveRecord::Base
      def self.has_attached_file(a, b = {}); end
      def self.validates_attachment_content_type(a, b = {}); end
    end

    # Collect the namespaced model files and derive their class names from the paths.
    files = Dir.glob(RAILS_ROOT + "/app/models/*/*.rb") + Dir.glob(RAILS_ROOT + "/app/models/*/*/*.rb")
    model_strs = files.map { |path| path.scan(/app\/models\/(.*)\.rb/)[0][0].split('/').map(&:camelize).join('::') }

    # Predefine every namespace level as a class so nested models can be loaded in any order.
    model_strs.each do |str|
      arr = str.split("::")
      arr.size.times do |x|
        begin
          eval("class #{arr[0..x].join('::')} < ActiveRecord::Base; end")
        rescue TypeError
          # FIX superclass mismatch for class Data (TypeError)
        end
      end
    end

    files.each do |x|
      begin
        load x
      rescue TypeError
        # Ignore superclass mismatches here as well.
      end
    end


    client = Riddle::Client.new(sphinx_config[:address], sphinx_config[:port])
    indexes = []

    ThinkingSphinx.context.indexed_models.each do |str|
      prefix = str.split('::').map { |s| s.downcase }.join('_')
      indexes << (index = ["#{prefix}_core", "#{prefix}_delta"])
      attrs = {}
      # Rows updated within the last 3650 seconds (just over an hour).
      str.constantize.all(:select => "id", :conditions => ["updated_at > ?", Time.now - 3650]).each do |item|
        # Key is the Sphinx document ID; the multiplier 5 is hardcoded here
        # (presumably the number of indexed models in this app).
        attrs[item.id * 5] = [1]
      end

      unless attrs.blank?
        puts attrs.inspect
        # Flag the stale documents in the core index as deleted.
        puts "Updating #{client.update(index[0], ['sphinx_deleted'], attrs)} docs"
      end
    end
    sleep 10 # wait for the attribute updates to be flushed to disk

    # Rebuild all delta indexes.
    system "/usr/local/bin/indexer --rotate --config #{production_sphinx_conf} #{indexes.map { |x| x[1] }.join(' ')}"
    sleep 2 # merging immediately would leave the delta changes out of the main index
    # Merge each delta into its core index; only documents with sphinx_deleted = 0 may end up in the final index.
    indexes.each do |index|
      system "/usr/local/bin/indexer --rotate --config #{production_sphinx_conf} --merge #{index.join(' ')} --merge-dst-range sphinx_deleted 0 0"
    end
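
To make it clearer why the flag has to be set before the merge, here is a minimal sketch in plain Ruby that imitates the behavior described in the index-merging section of the Sphinx manual: when a document ID exists in both indexes, the attributes from the delta win, but the old keywords stay in the core index unless those documents are filtered out with --merge-dst-range. Everything in it is an assumption made for illustration: ToyIndex, build_index, flag_deleted, merge_indexes, search and sphinx_document_id are hypothetical names, not part of Sphinx, Riddle or thinking-sphinx, and the document ID formula (id * model count + offset) is only my understanding of what thinking-sphinx 1.x does.

```ruby
require 'set'

# A toy index: per-document attributes plus an inverted keyword => doc IDs map.
ToyIndex = Struct.new(:attrs, :postings)

# Assumed ID scheme: record id * number of indexed models + per-model offset.
# The script above hardcodes a multiplier of 5 and an offset of 0.
def sphinx_document_id(record_id, model_count, offset = 0)
  record_id * model_count + offset
end

# Build an index from { doc_id => [keywords] }; every document starts undeleted.
def build_index(docs)
  index = ToyIndex.new({}, Hash.new { |h, k| h[k] = Set.new })
  docs.each do |doc_id, words|
    index.attrs[doc_id] = { :sphinx_deleted => 0 }
    words.each { |word| index.postings[word] << doc_id }
  end
  index
end

# Stand-in for client.update(core, ['sphinx_deleted'], attrs):
# flags the given documents and returns how many were found.
def flag_deleted(index, doc_ids)
  doc_ids.count do |doc_id|
    attrs = index.attrs[doc_id]
    attrs[:sphinx_deleted] = 1 if attrs
    !attrs.nil?
  end
end

# Stand-in for `indexer --merge dst src [--merge-dst-range sphinx_deleted min max]`.
# Without a range every dst document (and its old keywords) is kept.
def merge_indexes(dst, src, range = nil)
  kept = dst.attrs.select { |_id, a| range.nil? || range.include?(a[:sphinx_deleted]) }
  merged = ToyIndex.new(kept.merge(src.attrs), Hash.new { |h, k| h[k] = Set.new })
  dst.postings.each do |word, ids|
    ids.each { |id| merged.postings[word] << id if kept.key?(id) }
  end
  src.postings.each { |word, ids| merged.postings[word].merge(ids) }
  merged
end

# Return the sorted document IDs matching a keyword.
def search(index, word)
  index.postings.fetch(word, Set.new).to_a.sort
end

doc_id = sphinx_document_id(3, 5)           # record 3 => document 15
core   = build_index(doc_id => %w[old title])
delta  = build_index(doc_id => %w[new title])

# Merge without flagging: the old keyword still finds the document.
stale = merge_indexes(core, delta)
p search(stale, 'old')   # prints [15]

# Flag first, then merge with the 0..0 filter: the old keyword is gone.
flag_deleted(core, [doc_id])
clean = merge_indexes(core, delta, 0..0)
p search(clean, 'old')   # prints []
p search(clean, 'new')   # prints [15]
```

The two merges mirror the problem and the fix: the first corresponds to what we saw before sphinx_deleted was being updated, the second to the cron script's sequence of update, delta reindex, and filtered merge.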