tedeyang posted on 2013-1-16 17:36:59

New to Ruby: a small domain-lookup program for practice

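A while back my company paid for books and I ordered <Programming Ruby>. I took it home and read a little every day, and after half a month I had picked up a few things.
Below is a small program I wrote at home one Sunday. It started because I wanted to buy a .com domain and, on a whim, wondered which short pinyin domains were still unregistered. Maybe I could pick up a bargain, haha.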
 
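The key to the program is a phrase lexicon with pinyin, and luckily someone has already built one: http://open-phrase.googlecode.com/files/phrase_pinyin_freq_sc_20090402.txt.bz2
This is also the lexicon that ibus uses.
It contains the Chinese phrase, its pinyin, and its word frequency, which is exactly what is needed here.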
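To make the lexicon format concrete, here is a minimal sketch of parsing one line. It assumes what the script below also assumes: three whitespace-separated fields (phrase, pinyin, frequency) with syllables joined by an apostrophe. The sample line, the `Entry` struct and the `parse_line` name are made up for illustration and are not taken from the actual file.

```ruby
# Hypothetical helper: one lexicon line -> Entry, or nil if the line is malformed.
Entry = Struct.new(:phrase, :pinyin, :freq, :syllables)

def parse_line(line)
  phrase, pinyin, freq = line.split
  return nil if pinyin.nil? || freq.nil?   # blank or incomplete line
  Entry.new(
    phrase,
    pinyin.delete("'"),      # "zhong'guo" -> "zhongguo", usable as a domain label
    freq.to_i,
    pinyin.count("'") + 1    # syllables = apostrophes + 1
  )
end

entry = parse_line("中国 zhong'guo 12345")   # made-up sample line
puts "#{entry.pinyin}.com (#{entry.syllables} syllables, freq #{entry.freq})"
```

Keeping the syllable count around makes it easy to filter for two-syllable names later.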
Runtime environment: Linux, Unix, FreeBSD, Mac OS (the script calls the whois command)
 
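#!/usr/bin/ruby
# author tedeyang
# 2010-8-2
#
# A script that runs the whois command to detect which pinyin are still free
# as ".com" domains (like: zhongguo.com). The phrases are parsed from the ibus phrase lib.
#
# NOTE: the array indexes and array literals (e.g. [0], keystore[pinyin]) and the
# suffix of the work log name were missing from the archived post. They are
# reconstructed from context and should be treated as assumptions.
#
class ReadPhrase
  attr_reader :all

  def initialize(file, lengthRange)
    @file = file
    @range = lengthRange          # allowed pinyin lengths, e.g. 6..6
    @count = 0                    # number of distinct pinyin read
    @runcount = 0                 # number of free domains found
    @askcount = 0                 # number of whois queries issued
    @all = Array.new()
    @work = File.new("work-#{@range}.log", "w")    # log of every query
    @get = File.new("domain-#{@range}.log", "w")   # log of free domains
    @nowPid = 0                   # pid of the running whois, 0 when idle
    @ask = ""
  end

  # Read the lexicon, keeping pinyin of at most 3 syllables and 8 letters.
  def initFromFile
    phraseFile = File.new(@file, 'r')
    keystore = Hash.new           # pinyin => index into @all
    phraseFile.each { |line|
      phrase, pinyin, freq = line.split
      next if pinyin.nil?         # skip blank or malformed lines
      two = (pinyin =~ /^\w+('\w+){0,2}$/) != nil
      pinyin.gsub!("'", "")
      if pinyin.length <= 8 && two
        # p "#{pinyin}   #{phrase} (freq:#{freq})"
        previousPinyin = @all[keystore[pinyin]] if keystore.has_key?(pinyin)
        if previousPinyin == nil
          @all << [pinyin, phrase, freq]   # assumed entry layout
          keystore[pinyin] = @count
          @count = @count + 1
        else
          # same pinyin seen before: append this phrase to the existing entry
          @all[keystore[pinyin]] = [previousPinyin[0], previousPinyin[1] + "," + phrase]
        end
      end
    }
    phraseFile.close
    puts "read done!"
    puts "there are #{@count} phrases in file"
    @all.sort! { |p1, p2|
      p1[0].length - p2[0].length
    }
    puts "sort done by phrase 's length!"
  end

  # Run whois for <phrasePy>.com; true when the reply contains "No match for".
  def unregisted?(phrasePy)
    @askcount += 1
    domain = phrasePy + ".com"
    @work.puts "#{@askcount}-----run whois #{domain}---------------------------begin with #{Time.new}"
    @ask = "whois #{domain}"
    whois = IO.popen(@ask, "r") do |pipe|
      @nowPid = pipe.pid
      result = pipe.read
      @nowPid = 0
      good = (/\s*(No match for).+/ =~ result) != nil
      @work.puts "#{@askcount}-----run whois #{domain}------------#{good}------------end with #{Time.new}"
      @work.flush
      good
    end
    return whois
  end

  # start a thread for timeout check
  def startTimeoutKiller
    th = Thread.start {
      while 1
        pid1 = @nowPid.to_i
        sleep(3) # wait for n seconds
        pid2 = @nowPid.to_i
        # the same whois is still running after the wait: kill it
        if pid1 == pid2 && pid1 > 0
          Process.kill 'TERM', pid1
          p "kill a timeout whois , pid=#{pid1},is '#{@ask}'"
        end
      end
    }
  end

  # Query every pinyin whose length falls in @range and log the free ones.
  def tryall
    @get.puts "begin " + Time.new().to_s
    @work.puts "begin " + Time.new().to_s
    @all.each { |p|
      if @range === p[0].length && self.unregisted?(p[0])
        puts "Now get domain : " + p.join(' ')
        @get.puts "#{p[0]} #{p[1]} #{p[2]} order[#{@askcount}]"
        @get.flush
        @runcount += 1
      end
      # sleep(3) if @askcount % 10 ==9
      # break if @runcount >1
    }
    puts "done with #{@runcount}!(#{Time.new()}),see run.log"
  end
end

parser = ReadPhrase.new("phrase_pinyin_freq_sc_20090402.txt", 6..6)
parser.initFromFile
parser.startTimeoutKiller
# p parser.unregisted?('qidian')
parser.tryall

The code is messy. I am not comfortable with TextMate yet and felt cramped the whole time, nothing like the smooth flow of writing Java in Eclipse. Oh well, writing code is a matter of experience, like driving: the feel comes as you go.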
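The availability check in `unregisted?` boils down to one regex on the whois reply, so it can be tried without touching the network. A minimal sketch follows; the `no_match?` name and the two sample replies are made up for illustration, and only the "No match for" marker comes from the script itself.

```ruby
# Hypothetical helper: true when a whois reply carries the "No match for" marker.
NO_MATCH = /No match for/

def no_match?(whois_output)
  NO_MATCH.match?(whois_output.to_s)   # to_s so a nil reply counts as "not free"
end

# Made-up sample replies, shaped like what the script expects to see.
free_reply  = %(No match for "XIECA.COM".\n>>> Last update of whois database ...)
taken_reply = "Domain Name: ZHONGGUO.COM\nRegistrar: ..."

puts no_match?(free_reply)    # true
puts no_match?(taken_reply)   # false
```

Other registries word their "not found" replies differently, so this marker should only be trusted for .com.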
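The code does not use multiple threads for the lookups. It starts just one extra thread to check for timeouts, because I was afraid that too many whois commands would get my IP banned, heh. The machine ran all night to get through 10,000-plus 5-letter pinyin domains.
Some 300 to 400 pinyin names were still unregistered, though few of them have any value.
For example "xieca.com": xieca sounds the same as "鞋擦" (shoe brush), which might have a tiny bit of commercial value for the small-commodity markets in Zhejiang, heh. There are far too many 6-letter ones, although not many of them are two-syllable.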
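As an aside on the timeout check: the script polls the pid every 3 seconds from a separate thread. A possible alternative is Ruby's standard `Timeout` module wrapped around the read, sketched below. This is not what the script does; `run_with_timeout` is a made-up name, and the demo uses the Unix `echo` and `sleep` commands as stand-ins for whois.

```ruby
require "timeout"

# Hypothetical helper: run a command and return its output,
# or nil when it does not finish within `seconds`.
def run_with_timeout(cmd, seconds)
  IO.popen(cmd, "r") do |pipe|
    begin
      Timeout.timeout(seconds) { pipe.read }
    rescue Timeout::Error
      Process.kill("TERM", pipe.pid)   # stop the stuck child process
      nil
    end
  end
end

# Stand-in commands; a real run would pass ["whois", "xieca.com"].
p run_with_timeout(["echo", "hi"], 5)
p run_with_timeout(["sleep", "30"], 0.5)
```

Passing the command as an array skips the shell, so the pid that gets killed is the command itself and not a wrapper.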
 
A future version could be multithreaded and query godaddy or net.cn instead, which should be a lot more efficient.
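A rough sketch of what that multithreaded version might look like, using a small pool of worker threads fed from a `Queue`. The `check_all` name and the block interface are made up. The block stands in for whatever lookup is used (whois, or a registrar query), since I have not looked at what godaddy or net.cn actually offer.

```ruby
require "thread"

# Hypothetical helper: check names with a few worker threads.
# The block takes a domain name and returns true when it is free.
def check_all(names, workers: 4, &checker)
  queue = Queue.new
  names.each { |n| queue << n }
  free = Queue.new                     # thread-safe place to collect hits

  threads = Array.new(workers) do
    Thread.new do
      loop do
        name = begin
          queue.pop(true)              # non-blocking pop
        rescue ThreadError
          break                        # queue is empty: this worker is done
        end
        free << name if checker.call(name)
      end
    end
  end
  threads.each(&:join)

  Array.new(free.size) { free.pop }.sort
end

# Stub lookup for demonstration only: pretends everything but b.com is free.
p check_all(%w[a.com b.com c.com], workers: 2) { |d| d != "b.com" }
```

The worker count is kept small on purpose, because the worry about getting an IP banned applies even more once lookups run in parallel.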
 
 
 