New truncate_html (without hpricot)

chenjihua75 posted on 2013-2-7 16:15:51

Reference:
http://mikeburnscoder.wordpress.com/2006/11/11/truncating-html-in-ruby/

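This version does not use hpricot. Instead it uses the pullparser from rexml, a library that ships with Ruby.
Save the file as string.rb and put it in the lib directory of your Rails project. You can also rewrite it as a helper if you prefer (that is what I did).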
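require 'rexml/parsers/pullparser'

class String
  # Truncate an HTML string to roughly len characters of text,
  # closing any tags that are still open at the cut point.
  # Note: String#first comes from ActiveSupport (available in Rails).
  def truncate_html(len = 30)
    p = REXML::Parsers::PullParser.new(self)
    tags = []
    new_len = len
    results = ''
    while p.has_next? && new_len > 0
      p_e = p.pull
      case p_e.event_type
      when :start_element
        # p_e[0] is the tag name, p_e[1] is the attribute hash
        tags.push p_e[0]
        results << "<#{tags.last} #{attrs_to_s(p_e[1])}>"
      when :end_element
        results << "</#{tags.pop}>"
      when :text
        # p_e[0] is the text content; only text counts toward the limit
        results << p_e[0].first(new_len)
        new_len -= p_e[0].length
      else
        results << ""
      end
    end
    # Close every tag that is still open
    tags.reverse.each do |tag|
      results << "</#{tag}>"
    end
    results
  end

  private

  # Turn an attribute hash into a string such as: id="head" class="x"
  def attrs_to_s(attrs)
    if attrs.empty?
      ''
    else
      attrs.to_a.map { |attr| %{#{attr[0]}="#{attr[1]}"} }.join(' ')
    end
  end
end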
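For anyone who prefers the helper route over patching String, here is a minimal sketch of what that could look like. The module name HtmlTruncateHelper and the (html, len) signature are my own assumptions, not code from the referenced article. It uses plain Ruby slicing instead of ActiveSupport's String#first, so it also runs outside Rails. Like the String version, it counts raw text length, so an entity such as &amp; counts as several characters.

```ruby
require 'rexml/parsers/pullparser'

# Hypothetical helper module: the name and signature are assumptions.
module HtmlTruncateHelper
  def truncate_html(html, len = 30)
    parser = REXML::Parsers::PullParser.new(html)
    open_tags = []   # names of tags opened but not yet closed
    remaining = len  # text characters still allowed
    result = +''     # unfrozen string buffer

    while remaining > 0 && parser.has_next?
      event = parser.pull
      case event.event_type
      when :start_element
        open_tags.push(event[0])
        # event[1] is the attribute hash; each pair becomes ' name="value"'
        attrs = event[1].map { |name, value| %( #{name}="#{value}") }.join
        result << "<#{event[0]}#{attrs}>"
      when :end_element
        result << "</#{open_tags.pop}>"
      when :text
        text = event[0]
        result << text[0, remaining]
        remaining -= text.length
      end
    end

    # Close whatever is still open, innermost tag first
    open_tags.reverse_each { |tag| result << "</#{tag}>" }
    result
  end
end

helper = Object.new.extend(HtmlTruncateHelper)
puts helper.truncate_html('<div id="head">some text<h1>head</h1></div>', 4)
# prints <div id="head">some</div>
```

In a Rails app the module would normally live under app/helpers and be called from a view as truncate_html(@post.body, 100), where @post.body stands in for whatever HTML string you have.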
1. First, initialize the PullParser with the HTML string:
p = REXML::Parsers::PullParser.new(self)
2. Then call pull to fetch one event at a time:
e = p.pull
Suppose the string is:
str = '<div id="head">some text<h1>head</h1></div>'
Then the statement above returns an object e, where:
e.event_type  # => :start_element
e[0]          # => 'div'
e[1]          # => {"id" => "head"}, a hash holding the tag's attributes
3. Call it again:
e = p.pull
This time it returns an object whose e.event_type is :text.
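To see the whole event sequence rather than just the first two pulls, a small script like the one below can help. It is only a sketch: pull_events is a name I made up, and it simply collects the event type plus the first two arguments of every event for the sample string used above.

```ruby
require 'rexml/parsers/pullparser'

# Hypothetical helper: returns [event_type, e[0], e[1]] for every pull event.
def pull_events(html)
  parser = REXML::Parsers::PullParser.new(html)
  events = []
  while parser.has_next?
    e = parser.pull
    events << [e.event_type, e[0], e[1]]
  end
  events
end

str = '<div id="head">some text<h1>head</h1></div>'
# Print one line per event so the order of start tags, text and end tags is visible
pull_events(str).each { |event| p event }
```

The first line printed is the :start_element event for div with its attribute hash, followed by the :text event for "some text", which matches steps 2 and 3.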
4. Handling whitespace and the length of Chinese text
To remove leading and trailing whitespace from a string, use:
string.strip
To get the character length (not the byte length) of a string that contains Chinese characters, use:
string.split(//).length
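A quick check of the difference between character count and byte count, assuming the source file and the string are UTF-8. The sample text is made up. On Ruby 1.9 and later, String#length already counts characters, so split(//) is mainly needed on older versions.

```ruby
text = "  你好，world  "        # sample string, UTF-8 assumed
stripped = text.strip           # removes leading and trailing whitespace

puts stripped.split(//).length  # => 8  (characters)
puts stripped.length            # => 8  (characters on Ruby 1.9+)
puts stripped.bytesize          # => 14 (bytes: 3 per CJK character, 1 per ASCII letter)
```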
