安哥网络 posted on 2016-2-25 15:24:15

Using Sphinx full-text indexing in phpcms [performance testing in progress]

Sphinx is a full-text search engine. The latest stable release is 0.9.9-release.
Sphinx features:
    * high indexing speed (up to 10 MB/sec on modern CPUs);
    * high search speed (the average query takes under 0.1 sec on 2-4 GB text collections);
    * high scalability (up to 100 GB of text, up to 100 M documents on a single CPU);
    * ....
English documentation: http://www.sphinxsearch.com/docs/manual-0.9.9.html

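I. First, install Sphinx on the server
Installing Sphinx on Windows
    1. Download the package with MySQL support: http://www.sphinxsearch.com/downloads/sphinx-0.9.9-win32.zip
    2. Unzip sphinx-0.9.9-win32.zip to D:\sphinx
    3. Install the Sphinx service by running this at the command prompt:
       D:\sphinx\searchd --install --config D:\sphinx\sphinx.conf --servicename SphinxSearch
       (if searchd.exe is in the bin subdirectory, as indexer.exe is in the batch files in this post, use D:\sphinx\bin\searchd instead)
    English reference: http://www.sphinxsearch.com/docs ... #installing-windows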

Installing Sphinx on a Linux server
   1. Download the source package http://www.sphinxsearch.com/downloads/sphinx-0.9.9.tar.gz, then unpack and build it:
$ tar xzvf sphinx-0.9.9.tar.gz
$ cd sphinx-0.9.9
$ ./configure --prefix=/usr/local/sphinx --with-mysql=/usr/local/mysql
$ make
$ make install

Sample sphinx.conf

source main
{
type = mysql

sql_host = 10.228.129.199 # host address
sql_user = admin # user name
sql_pass = admin # password
sql_db = demo # database name
sql_port = 3306 # port, default is 3306

sql_query_pre = SET NAMES utf8
sql_query_pre = REPLACE INTO phpcms_counter SELECT 1, MAX(searchid) FROM phpcms_search
sql_query = SELECT searchid, type, data FROM phpcms_search \
      WHERE searchid>=$start AND searchid<=$end
sql_query_range= SELECT 1,max_doc_id FROM phpcms_counter WHERE counter_id=1
sql_range_step = 5000
sql_query_info = SELECT * FROM phpcms_search WHERE searchid=$id
}

source delta : main
{
sql_query_pre = SET NAMES utf8
sql_query = SELECT searchid, type, data FROM phpcms_search \
      WHERE searchid >( SELECT max_doc_id FROM phpcms_counter WHERE counter_id=1 )
}

index main
{
source = main
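# directory where the index is stored
path = D:\sphinx\data\main # main index path
# encoding
charset_type = utf-8
# character table for utf-8
charset_table = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F
# simple n-gram tokenization; only 0 and 1 are supported. Set to 1 to search Chinese
ngram_len = 1
# characters to split into n-grams; keep this line uncommented to search Chinese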
ngram_chars = U+3000..U+2FA1F
}

index delta : main
{
source = delta
path = D:\sphinx\data\delta # delta index path (think of it as a secondary index for now)
}

indexer
{
mem_limit = 128M # memory used for indexing
}

searchd
{
port = 9312
log = D:\sphinx\data\phpcms\searchd.log # service log path
query_log = D:\sphinx\data\phpcms\query.log # query log path
read_timeout = 5
max_children = 30
pid_file = D:\sphinx\data\phpcms\searchd.pid
max_matches = 1000
seamless_rotate = 0
preopen_indexes = 0
unlink_old = 1
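}

II. Upgrade the phpcms search module
    Download the upgrade package and overwrite the search module directory with it.
    Download: search.zip (16.39 KB)
    Then open the admin backend and configure full-text search.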
   
Create the data table:
CREATE TABLE `phpcms_counter` (
`counter_id` INT(11) NOT NULL,
`max_doc_id` INT(11) NOT NULL,
PRIMARY KEY (`counter_id`)
) ENGINE=MYISAM DEFAULT CHARSET=gbk;

III. Set up scheduled tasks to update the indexes
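Before scheduling anything, it may help to see how the phpcms_counter table splits documents between the main and delta indexes. The sketch below is only an illustration: it uses an in-memory SQLite database in place of MySQL, made-up rows, and hypothetical helper names (add_documents, index_main, index_delta). It also replaces the ranged $start/$end query of the main source with a single "up to max_doc_id" query.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here (an assumption for the demo).
# Table and column names follow the sample sphinx.conf; the rows are made up.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE phpcms_search (searchid INTEGER PRIMARY KEY, type TEXT, data TEXT);
    CREATE TABLE phpcms_counter (counter_id INTEGER PRIMARY KEY, max_doc_id INTEGER NOT NULL);
""")


def add_documents(conn, first_id, last_id):
    """Insert made-up rows with ids first_id..last_id."""
    conn.executemany(
        "INSERT INTO phpcms_search (searchid, type, data) VALUES (?, ?, ?)",
        [(i, "news", f"document {i}") for i in range(first_id, last_id + 1)],
    )


def index_main(conn):
    """Mimic source main: store the current max id, then fetch rows up to it."""
    conn.execute(
        "REPLACE INTO phpcms_counter SELECT 1, MAX(searchid) FROM phpcms_search"
    )
    rows = conn.execute(
        "SELECT searchid FROM phpcms_search "
        "WHERE searchid <= (SELECT max_doc_id FROM phpcms_counter WHERE counter_id = 1) "
        "ORDER BY searchid"
    )
    return [row[0] for row in rows]


def index_delta(conn):
    """Mimic source delta: fetch only rows added after the last main run."""
    rows = conn.execute(
        "SELECT searchid FROM phpcms_search "
        "WHERE searchid > (SELECT max_doc_id FROM phpcms_counter WHERE counter_id = 1) "
        "ORDER BY searchid"
    )
    return [row[0] for row in rows]


add_documents(db, 1, 5)
main_ids = index_main(db)    # everything present when main was built
add_documents(db, 6, 7)
delta_ids = index_delta(db)  # only the rows added since then
print("main:", main_ids)
print("delta:", delta_ids)
```

In this simplified model the delta source stays small because it only ever selects rows above the id recorded at the last main run.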
1. On Windows
Set up scheduled tasks:
# at 4 a.m., merge the indexes by running merge.bat
# the rest of the time, update the delta index every minute by running delta.bat

merge.bat
    @ECHO off
    D:\sphinx\bin\indexer.exe --config D:\sphinx\sphinx.conf --merge main delta --rotate
    echo indexing, window will close when complete

delta.bat
    @ECHO off
    D:\sphinx\bin\indexer.exe --config D:\sphinx\sphinx.conf delta --rotate
    echo indexing, window will close when complete

2. On Linux, edit the scheduled jobs with crontab -e
    # merge the indexes at 04:00; update the delta index every minute during hours 0-3 and 6-23
    * 0-3 * * * /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf delta --rotate
    * 6-23 * * * /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf delta --rotate
    0 4 * * * /usr/local/sphinx/bin/indexer --config /usr/local/sphinx/etc/sphinx.conf --merge main delta --rotate

Note: back up your files before upgrading to avoid accidents.
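The three crontab entries above can be restated as a small function, which makes the gap in the schedule easier to see. This is only a sketch of what those cron expressions select; the function name scheduled_job is hypothetical and nothing here talks to cron or to the indexer.

```python
def scheduled_job(hour, minute):
    """Return which indexer job the sample crontab starts at hour:minute, or None."""
    if hour == 4 and minute == 0:
        return "merge"   # 0 4 * * *  -> --merge main delta --rotate
    if 0 <= hour <= 3 or 6 <= hour <= 23:
        return "delta"   # * 0-3 and * 6-23 -> delta --rotate
    return None          # 04:01 through 05:59: nothing is scheduled


# Count how many times each job is started over one day.
runs = {"merge": 0, "delta": 0, None: 0}
for h in range(24):
    for m in range(60):
        runs[scheduled_job(h, m)] += 1
print(runs)
```

Under this schedule no delta update is started between 04:01 and 05:59, so documents added in that window are not indexed until 06:00.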

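All paths and permissions must match the server the application runs on.
For example, sphinx.conf must set:
sql_host
sql_user
sql_pass
sql_db
sql_port
the phpcms table prefix (phpcms_ in the sample)
the index path, e.g. D:\sphinx\data\delta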
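The checklist above can be turned into a quick sanity check before running the indexer. The sketch below is a rough, hypothetical helper (settings_in and missing_settings are not part of Sphinx): it reads "key = value" lines from the text of a config file, ignores # comments, and reports which of the connection settings are absent or empty. It does not understand sections or inheritance.

```python
import re

REQUIRED = ("sql_host", "sql_user", "sql_pass", "sql_db", "sql_port")


def settings_in(conf_text):
    """Collect key -> list of values from 'key = value' lines, ignoring # comments."""
    found = {}
    # Join lines that end with a backslash continuation before parsing.
    joined = re.sub(r"\\\s*\n", " ", conf_text)
    for line in joined.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        found.setdefault(key, []).append(value)
    return found


def missing_settings(conf_text, required=REQUIRED):
    """Return the required keys that are absent or have only empty values."""
    found = settings_in(conf_text)
    return [key for key in required if not any(found.get(key, []))]


sample = """
source main
{
type = mysql
sql_host = 10.228.129.199 # host address
sql_user = admin
sql_db = demo
}
"""
print(missing_settings(sample))
```

For the truncated sample above the helper reports sql_pass and sql_port as missing; for a real file, pass in the contents of sphinx.conf instead.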


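Using coreseek Chinese word segmentation
Sample sphinx.conf; Chinese reference: http://www.coreseek.cn/products-install/
Installation steps:
Follow the installation steps in the Chinese reference. Completing "三、coreseek中文全文检索测试" (part 3, the coreseek Chinese full-text search test) means the installation succeeded.
Sample coreseek.conf: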
source main
{
type = mysql
sql_host = 10.228.129.199 # host address
sql_user = admin # user name
sql_pass = admin # password
sql_db = demo # database name
sql_port = 3306 # port, default is 3306
sql_query_pre = SET NAMES utf8
sql_query_pre = REPLACE INTO phpcms_counter SELECT 1, MAX(searchid) FROM phpcms_search
sql_query = SELECT searchid, type, data FROM phpcms_search \
      WHERE searchid>=$start AND searchid<=$end
sql_query_range= SELECT 1,max_doc_id FROM phpcms_counter WHERE counter_id=1
sql_range_step = 5000
sql_query_info = SELECT * FROM phpcms_search WHERE searchid=$id
}
source delta : main
{
sql_query_pre = SET NAMES utf8
sql_query = SELECT searchid, type, data FROM phpcms_search \
WHERE searchid >( SELECT max_doc_id FROM phpcms_counter WHERE counter_id=1 )
}
index main
{
source = main
# directory where the index is stored
path = D:\sphinx\data\main # main index path
# Variant without word segmentation; see http://www.coreseek.cn/products-install/ngram_len_cjk/
# encoding
#charset_type = zh_cn.utf-8
# character table for utf-8
#charset_table = 0..9, A..Z->a..z, _, a..z, U+410..U+42F->U+430..U+44F, U+430..U+44F
# simple n-gram tokenization; only 0 and 1 are supported. Set to 1 to search Chinese
#ngram_len = 1
# characters to split into n-grams; uncomment to search Chinese
#ngram_chars = U+3000..U+2FA1F
# Variant with word segmentation; see http://www.coreseek.cn/products-install/ngram_len_cjk/
charset_dictpath=D:\sphinx\etc
# encoding
charset_type = zh_cn.utf-8
# character table for zh_cn.utf-8
#charset_table =
ngram_len = 0
#ngram_chars =
}
index delta : main
{
source = delta
path = D:\sphinx\data\delta # delta index path (think of it as a secondary index for now)
}
indexer
{
mem_limit = 128M # memory used for indexing
}
searchd
{
port = 9312
log = D:\sphinx\data\phpcms\searchd.log # service log path
query_log = D:\sphinx\data\phpcms\query.log # query log path
read_timeout = 5
max_children = 30
pid_file = D:\sphinx\data\phpcms\searchd.pid
max_matches = 1000
seamless_rotate = 0
preopen_indexes = 0
unlink_old = 1
}
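To make the difference between the two variants more concrete, here is a rough approximation of what ngram_len = 1 with ngram_chars = U+3000..U+2FA1F does to a string: every character in that range becomes its own token, while other letters and digits are grouped into words. This is a simplified sketch with hypothetical function names, not Sphinx's actual tokenizer, which is driven by charset_table. The dictionary-based segmentation used when ngram_len = 0 is not modelled here.

```python
def in_ngram_chars(ch, low=0x3000, high=0x2FA1F):
    """True if ch falls in the U+3000..U+2FA1F range from the sample config."""
    return low <= ord(ch) <= high


def unigram_tokens(text):
    """Split text roughly the way ngram_len = 1 would: one token per CJK character."""
    tokens, word = [], []

    def flush():
        # Emit the pending non-CJK word, lowercased like A..Z->a..z in charset_table.
        if word:
            tokens.append("".join(word).lower())
            word.clear()

    for ch in text:
        if in_ngram_chars(ch):
            flush()
            tokens.append(ch)      # each CJK character is indexed on its own
        elif ch.isalnum() or ch == "_":
            word.append(ch)        # accumulate an ordinary word
        else:
            flush()                # anything else acts as a separator
    flush()
    return tokens


print(unigram_tokens("Sphinx全文索引 test"))
```

With single-character tokens a query matches on individual characters rather than on dictionary words, which is the trade-off between the two variants in the sample.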

http://bbs.phpcms.cn/thread-149380-1-1.html