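New to Ruby: a small domain-lookup program for practice
A while ago my company paid for some books, and I ordered a copy of Programming Ruby. I took it home and have been reading a little every day; after half a month I have picked up a few things. Below is a small program I wrote at home on Sunday. It started because I wanted to buy a .com domain, and on a whim I decided to find out which short pinyin domains had not been registered yet. Maybe I could snap up a bargain, haha.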
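The key ingredients are a phrase lexicon and its pinyin, and luckily someone has already built one: http://open-phrase.googlecode.com/files/phrase_pinyin_freq_sc_20090402.txt.bz2
This is also the lexicon that ibus uses.
It contains the Chinese phrase, its pinyin, and a word frequency, which is exactly what I need.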
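As a rough illustration of the lexicon format, here is a minimal sketch of parsing one line. It assumes whitespace-separated columns (phrase, pinyin with syllables joined by apostrophes, frequency), which is how the script in this post splits each line. The helper name `parse_line` and the sample lines are made up for illustration and are not taken from the real file.

```ruby
# Hypothetical helper: split one lexicon line into its three fields.
# Assumption: columns are separated by whitespace, in the order
# phrase, pinyin, frequency.
def parse_line(line)
  phrase, pinyin, freq = line.split
  return nil if pinyin.nil?   # blank or malformed line
  # Keep only entries of one to three syllables, e.g. "zhong'guo".
  return nil unless pinyin =~ /\A[a-z]+('[a-z]+){0,2}\z/
  { phrase: phrase, pinyin: pinyin.delete("'"), freq: freq }
end

# Made-up sample lines, not copied from the real lexicon.
samples = ["中国 zhong'guo 12345", "鞋擦 xie'ca 7", ""]
samples.each { |l| p parse_line(l) }
```

Joining the syllables (dropping the apostrophes) gives the candidate domain label, so its length can be compared against the desired number of letters.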
Runtime environment: Linux, Unix, FreeBSD, macOS
#!/usr/bin/ruby
# author tedeyang
# 2010-8-2
#
# A script that runs the whois command to detect which pinyin are still
# available as ".com" domains (like: zhongguo.com). The phrases are parsed
# from the ibus phrase lib.
#
class ReadPhrase
  attr_reader :all

  def initialize(file, lengthRange)
    @file = file
    @range = lengthRange
    @count = 0           # distinct pinyin read from the lexicon
    @runcount = 0        # unregistered domains found so far
    @askcount = 0        # whois queries issued so far
    @all = Array.new()   # entries of [pinyin, phrases, freq]
    # log names are built from the range, e.g. "work-6..6.log"
    @work = File.new("work-#{@range}.log", "w")
    @get = File.new("domain-#{@range}.log", "w")
    @nowPid = 0          # pid of the running whois, 0 when idle
    @ask = ""
  end

  def initFromFile
    phraseFile = File.new(@file, 'r')
    keystore = Hash.new  # pinyin => index into @all
    phraseFile.each{|line|
      phrase, pinyin, freq = line.split
      next if pinyin.nil?  # skip blank or malformed lines
      # accept one to three syllables separated by apostrophes
      two = (pinyin =~ /^\w+('\w+){0,2}$/) != nil
      pinyin.gsub!("'", "")
      if pinyin.length <= 8 && two
        # p "#{pinyin} #{phrase} (freq: #{freq})"
        previousPinyin = @all[keystore[pinyin]] if keystore.has_key?(pinyin)
        if previousPinyin == nil
          @all << [pinyin, phrase, freq]
          keystore[pinyin] = @count
          @count = @count + 1
        else
          # same pinyin seen before: append this phrase to its entry
          @all[keystore[pinyin]] = [previousPinyin[0], previousPinyin[1] + "," + phrase]
        end
      end
    }
    phraseFile.close
    puts "read done!"
    puts "there are #{@count} phrases in file"
    @all.sort!{|p1, p2| p1[0].length - p2[0].length}
    puts "sort done by phrase 's length!"
  end

  def unregisted?(phrasePy)
    @askcount += 1
    domain = phrasePy + ".com"
    @work.puts "#{@askcount}-----run whois #{domain}---------------------------begin with #{Time.new}"
    @ask = "whois #{domain}"
    whois = IO.popen(@ask, "r") do |pipe|
      @nowPid = pipe.pid
      result = pipe.read
      @nowPid = 0
      # whois reports "No match for ..." when a .com name is unregistered
      good = (/\s*(No match for).+/ =~ result) != nil
      @work.puts "#{@askcount}-----run whois #{domain}------------#{good}------------end with #{Time.new}"
      @work.flush
      good
    end
    return whois
  end

  # start a thread for timeout check
  def startTimeoutKiller
    th = Thread.start {
      while 1
        pid1 = @nowPid.to_i
        sleep(3) # wait for n seconds
        pid2 = @nowPid.to_i
        # the same whois is still running after the wait: kill it
        if pid1 == pid2 && pid1 > 0
          begin
            Process.kill 'TERM', pid1
            p "kill a timeout whois , pid=#{pid1},is '#{@ask}'"
          rescue Errno::ESRCH
            # the process exited on its own in the meantime
          end
        end
      end
    }
  end

  def tryall
    @get.puts "begin " + Time.new().to_s
    @work.puts "begin " + Time.new().to_s
    @all.each{|p|
      if @range === p[0].length && self.unregisted?(p[0])
        puts "Now get domain : " + p.join(' ')
        @get.puts "#{p[0]} #{p[1]} #{p[2]} order[#{@askcount}]"
        @get.flush
        @runcount += 1
      end
      # sleep(3) if @askcount % 10 == 9
      # break if @runcount > 1
    }
    puts "done with #{@runcount}!(#{Time.new()}),see domain-#{@range}.log"
  end
end

parser = ReadPhrase.new("phrase_pinyin_freq_sc_20090402.txt", 6..6)
parser.initFromFile
parser.startTimeoutKiller
# p parser.unregisted?('qidian')
parser.tryall

The code is messy. I am not comfortable with TextMate yet and feel cramped using it; there is none of the effortless flow I get writing Java in Eclipse. Nothing to be done about it: writing code takes practice, just like driving, and the feel for it comes as you keep at it.
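The lookups themselves are not multithreaded; the script only starts one extra thread to check for timeouts, because I was afraid that firing off too many whois commands would get my IP banned, heh. The machine ran all night to get through the 10,000-plus five-letter pinyin domains.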
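An alternative to the polling watchdog thread might be Ruby's standard `timeout` library, which puts the time limit right around the read. This is not what the script above does; it is only a minimal sketch under the assumption of a Unix-like system. The helper name `run_with_timeout` is hypothetical, and the demo uses `echo` and `sleep` instead of `whois` so it can run without network access.

```ruby
require "timeout"

# Hypothetical helper: run a command and return its output,
# or nil if it does not finish within `seconds`.
def run_with_timeout(cmd, seconds)
  IO.popen(cmd) do |pipe|
    begin
      Timeout.timeout(seconds) { pipe.read }
    rescue Timeout::Error
      Process.kill("TERM", pipe.pid)  # stop the stuck child process
      nil
    end
  end
end

p run_with_timeout(["echo", "hi"], 5)    # finishes in time
p run_with_timeout(["sleep", "3"], 0.5)  # killed after the timeout
```

For the real script the command would be something like `["whois", domain]`; passing an array avoids going through the shell.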
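Three or four hundred pinyin names are still unregistered, but very few of them have any value.
Take "xieca.com": xieca sounds the same as "鞋擦" (shoe wipe), so it might have a tiny bit of commercial value for the small-commodity markets in Zhejiang, heh. Six-letter names are far too numerous, although not many of them are two-syllable.
Later I could write a multithreaded version that queries GoDaddy and net.cn instead, which should be considerably more efficient.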
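Such a multithreaded version might be structured roughly like the sketch below: a fixed pool of worker threads pulling candidate names from a queue. Everything here is an assumption for illustration. `check_all` and `checker` are hypothetical names, and the demo plugs in a stub instead of a real whois, GoDaddy, or net.cn lookup, since the post does not say how those services would be queried.

```ruby
require "thread"

# Hypothetical sketch: check candidates with a small pool of worker threads.
# `checker` is any callable that returns true when a name looks unregistered.
def check_all(names, checker, workers: 4)
  queue = Queue.new
  names.each { |n| queue << n }
  found = Queue.new  # thread-safe collection of hits

  threads = Array.new(workers) do
    Thread.new do
      loop do
        name = begin
          queue.pop(true)   # non-blocking pop; raises ThreadError when empty
        rescue ThreadError
          break             # no work left, let this worker finish
        end
        found << name if checker.call(name)
      end
    end
  end
  threads.each(&:join)

  Array.new(found.size) { found.pop }.sort
end

# Stub checker for the demo: pretends every 5-letter name is available.
stub = ->(name) { name.length == 5 }
p check_all(%w[xieca zhongguo qidian], stub)
```

Keeping the worker count small is one way to stay polite toward the lookup service, which matters given the worry about getting an IP banned.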