Topic: Linux basic-command interview question: count how many times each word occurs in file a.txt (六狼论坛)

SunRuing posted on 2013-2-1 09:54:02


[Question] How do you count how many times each word occurs in file a.txt? And how should you handle it if the file is several GB in size?

Solution 1:

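#!/bin/sh

# Source file and temporary files
srcfile=word.txt
tempfile_words=tempfile_words
tempfile_words_uniq=tempfile_words_uniq

# Extract all words into the temporary file $tempfile_words, one word per line
# (every run of characters other than letters and digits, including \r, acts as a separator)
# Remove duplicate words, replace the newlines with spaces, and store the result in $tempfile_words_uniq
tr -cs '0-9a-zA-Z' '\n' < "$srcfile" | sed '/^$/d' > "$tempfile_words"
sort "$tempfile_words" | uniq | tr '\n' ' ' > "$tempfile_words_uniq"

# Iterate over the distinct words and count each one
# (grep -x matches whole lines only, so "the" is not also counted inside "there")
words=$(cat "$tempfile_words_uniq")
for word in $words
do
    word_num=$(grep -cxF -- "$word" "$tempfile_words")
    echo "$word $word_num"
done

# Clean up the temporary files
rm -f "$tempfile_words" "$tempfile_words_uniq"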
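The loop in Solution 1 runs grep over the entire word list once for every distinct word, so the work grows roughly with (total words × distinct words), which becomes impractical for a file of several GB. A minimal sketch of a one-pass alternative, assuming that the set of *distinct* words fits in memory (the text itself does not have to), counts with an awk associative array instead. The function name `count_words_awk` is hypothetical; its output uses the same `word count` format as Solution 1.

```shell
# Hypothetical helper: reads text on stdin and prints "word count"
# for each distinct word, reading the input only once.
# Assumption: the distinct words (not the whole file) fit in memory.
count_words_awk() {
    # Any character other than a letter or digit separates words.
    # LC_ALL=C makes the character ranges byte-based and predictable.
    # awk skips empty lines (NF == 0) and counts each word in an array;
    # the final sort only orders the output, one line per distinct word.
    LC_ALL=C tr -cs '0-9a-zA-Z' '\n' |
    LC_ALL=C awk 'NF { count[$1]++ }
        END { for (w in count) print w, count[w] }' |
    LC_ALL=C sort
}
```

For example, `count_words_awk < a.txt` prints one `word count` line per distinct word, ordered by word. Memory use depends on the vocabulary size rather than on the file size.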

Solution 2:

tr -s ' \t' '\n' < word.txt | sort | uniq -c
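The Solution 2 pipeline splits only on spaces and tabs, so punctuation stays attached to words ("dog." and "dog" are counted separately). A sketch of a tighter variant, under the assumption that any non-alphanumeric character should separate words, also drops the empty line tr can emit and lists the most frequent words first. The name `count_words_sorted` is hypothetical. Because sort falls back to an external merge sort with temporary files, this approach does not require the input to fit in memory, which is relevant to the several-GB case; with GNU sort, the `-S` (buffer size) and `-T` (temporary directory) options can optionally be added for very large inputs.

```shell
# Hypothetical helper: reads text on stdin and prints "count word" lines
# (uniq -c format), most frequent word first, ties ordered by word.
count_words_sorted() {
    # 1. Turn every run of non-alphanumeric characters into one newline.
    # 2. Remove the empty line produced when the input starts with a separator.
    # 3. Sort so that identical words are adjacent, then count them with uniq -c.
    # 4. Re-sort numerically by count (descending), then by word.
    LC_ALL=C tr -cs '0-9a-zA-Z' '\n' |
    grep -v '^$' |
    LC_ALL=C sort |
    uniq -c |
    LC_ALL=C sort -k1,1nr -k2,2
}
```

For example, `count_words_sorted < a.txt | head` would show the ten most frequent words.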
