zkrym posted on 2013-1-19 04:14:16

Sorting a List collection with mixed Chinese and English text

Example:
Suppose we have the following ten records:
ID   record
1    A ''Healthy Schools'' program in Hong Kong: Enhancing positive health behavior for school children and teachers
2    1996 opinion survey on civic education: Final report
3    'A Luxury for the First World': A western perception of Hong Kong Chinese attitudes towards inclusive education
4    A Chinese cultural critique of the global qualifying standards for social work education
5    A baseline survey of students' attitudes towards gender stereotypes and family roles
6    「中國文化」項目的教學與教學上的銜接
7    中國語文及文化科的組織淺探
8    「中學美術與設計科應用電子科技教學」試驗計劃
9    點滴校園:學校社會工作文集
10   齊來說故事:透過小組學習方式改善說話的表達能力及態度
Normally, the order we would like the sort to produce is:
ID   record
1    1996 opinion survey on civic education: Final report
2    A baseline survey of students' attitudes towards gender stereotypes and family roles
3    A Chinese cultural critique of the global qualifying standards for social work education
4    A ''Healthy Schools'' program in Hong Kong: Enhancing positive health behavior for school children and teachers
5    'A Luxury for the First World': A western perception of Hong Kong Chinese attitudes towards inclusive education
6    點滴校園:學校社會工作文集
7    齊來說故事:透過小組學習方式改善說話的表達能力及態度
8    「中國文化」項目的教學與教學上的銜接
9    中國語文及文化科的組織淺探
10   「中學美術與設計科應用電子科技教學」試驗計劃
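For contrast, here is a minimal sketch of what Java's default String ordering does with this kind of data. The sample strings and the NaturalOrderDemo class are made up for illustration and are not part of the original code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class NaturalOrderDemo {

    /** Sorts a copy of the titles with String's natural (UTF-16 code unit) ordering. */
    static List<String> sortNaturally(List<String> titles) {
        List<String> copy = new ArrayList<String>(titles);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList("apple", "點", "Banana", "中", "'Cherry'");
        // Prints ['Cherry', Banana, apple, 中, 點]
        System.out.println(sortNaturally(titles));
    }
}
```

The apostrophe sorts before every letter, uppercase sorts before lowercase, and 中 (U+4E2D) lands before 點 (U+9EDE) even though by pinyin "dian" precedes "zhong".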
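As the desired result shows, digits come first, then letters in alphabetical order, then Chinese text ordered by pinyin. A sort like this is usually unproblematic, but this data raises a few special cases: letter case, punctuation (special symbols), and the ordering of mixed Chinese and English text. As long as the data set does not run to tens or hundreds of thousands of records, we can handle them with the following steps: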
1. Define an object
public class RecordInfo {
    private String id;
    private String record;
    private String recordtemp;
    // getters and setters omitted ...
}
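2. Suppose these records live in a database. We read them out and store them as RecordInfo objects in a List. While filling the list, to deal with the special symbols, we set the cleaned-up value into the recordtemp field (replaceAll() can be used to strip the special symbols). Then add the following code: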
Collections.sort(listtemp, new AuthorVOCompare());
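The preprocessing described in step 2 could look roughly like the sketch below. It is only an illustration: the regular expression (strip Unicode punctuation and symbols, then trim) is an assumption, since the post does not say exactly which characters were removed, and the RecordPrep class and its normalize and toRecords helpers are hypothetical names. RecordInfo is repeated here with its accessors written out so the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;

class RecordPrep {

    /** Same fields as the RecordInfo class in step 1, with the accessors written out. */
    static class RecordInfo {
        private String id;
        private String record;
        private String recordtemp;

        String getId() { return id; }
        void setId(String id) { this.id = id; }
        String getRecord() { return record; }
        void setRecord(String record) { this.record = record; }
        String getRecordtemp() { return recordtemp; }
        void setRecordtemp(String recordtemp) { this.recordtemp = recordtemp; }
    }

    /** Hypothetical helper: drops punctuation and symbols so they do not affect the sort key. */
    static String normalize(String record) {
        // \p{P} matches Unicode punctuation (quotes, colons, the CJK brackets),
        // \p{S} matches symbols. Which characters to strip is an assumption.
        return record.replaceAll("[\\p{P}\\p{S}]", "").trim();
    }

    /** Builds the list to sort: original text in record, cleaned text in recordtemp. */
    static List<RecordInfo> toRecords(String[] titles) {
        List<RecordInfo> listtemp = new ArrayList<RecordInfo>();
        for (int i = 0; i < titles.length; i++) {
            RecordInfo info = new RecordInfo();
            info.setId(String.valueOf(i + 1));
            info.setRecord(titles[i]);
            info.setRecordtemp(normalize(titles[i]));
            listtemp.add(info);
        }
        return listtemp;
    }

    public static void main(String[] args) {
        String[] titles = {
            "'A Luxury for the First World': A western perception",
            "「中國文化」項目的教學與教學上的銜接"
        };
        for (RecordInfo info : toRecords(titles)) {
            System.out.println(info.getId() + " " + info.getRecordtemp());
        }
    }
}
```

Keeping the untouched title in record and the cleaned one in recordtemp means the list can be sorted on the cleaned key while still displaying the original text.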
The AuthorVOCompare class used in the sort call does the actual re-ordering. Its code is:
import java.util.Comparator;

import net.sourceforge.pinyin4j.PinyinHelper;
import net.sourceforge.pinyin4j.format.HanyuPinyinCaseType;
import net.sourceforge.pinyin4j.format.HanyuPinyinOutputFormat;
import net.sourceforge.pinyin4j.format.HanyuPinyinToneType;
import net.sourceforge.pinyin4j.format.HanyuPinyinVCharType;
import net.sourceforge.pinyin4j.format.exception.BadHanyuPinyinOutputFormatCombination;

public class AuthorVOCompare implements Comparator<RecordInfo> {

    public int compare(RecordInfo record1, RecordInfo record2) {
        String o1 = record1.getRecordtemp();
        String o2 = record2.getRecordtemp();
        int i = 0;
        while (i < o1.length() && i < o2.length()) {
            int codePoint1 = o1.codePointAt(i);
            int codePoint2 = o2.codePointAt(i);
            if (codePoint1 != codePoint2) {
                // Supplementary characters have no pinyin lookup here; compare them directly.
                if (Character.isSupplementaryCodePoint(codePoint1)
                        || Character.isSupplementaryCodePoint(codePoint2)) {
                    return codePoint1 - codePoint2;
                }
                String pinyin1 = pinyin((char) codePoint1);
                String pinyin2 = pinyin((char) codePoint2);
                if (pinyin1 != null && pinyin2 != null) {
                    // Both characters are Chinese: order them by pinyin.
                    // Characters with the same reading are treated as equal at this position.
                    if (!pinyin1.equals(pinyin2)) {
                        return pinyin1.compareTo(pinyin2);
                    }
                } else {
                    // At least one character is not Chinese. Ignoring case is especially
                    // important here: a case-sensitive comparison would put every
                    // uppercase letter before every lowercase one. Chinese code points
                    // are larger than digits and Latin letters, so Chinese sorts last.
                    int lower1 = Character.toLowerCase(codePoint1);
                    int lower2 = Character.toLowerCase(codePoint2);
                    if (lower1 != lower2) {
                        return lower1 - lower2;
                    }
                }
            }
            // Step over both halves of a surrogate pair when the code point is supplementary.
            i += Character.charCount(codePoint1);
        }
        return o1.length() - o2.length();
    }

    /** Returns the lowercase, toneless pinyin of a Chinese character, or null if there is none. */
    private String pinyin(char c) {
        if (c < '\u4E00' || c > '\u9FA5') {
            return null; // not in the CJK range handled here
        }
        HanyuPinyinOutputFormat format = new HanyuPinyinOutputFormat();
        format.setCaseType(HanyuPinyinCaseType.LOWERCASE);
        format.setToneType(HanyuPinyinToneType.WITHOUT_TONE);
        format.setVCharType(HanyuPinyinVCharType.WITH_V);
        try {
            String[] temp = PinyinHelper.toHanyuPinyinStringArray(c, format);
            if (temp != null && temp.length > 0) {
                return temp[0]; // a character may have several readings; use the first
            }
        } catch (BadHanyuPinyinOutputFormatCombination e) {
            e.printStackTrace();
        }
        return null;
    }
}
This class needs one extra jar, pinyin4j-2.5.0.jar, which provides the pinyin conversion used for sorting. It can be downloaded online.
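To try the ordering rule without the pinyin4j jar, the following self-contained sketch may help. It is not the class above: the small READINGS map is a hand-written stand-in for the pinyin4j lookup and only covers a few characters from the sample titles, the MixedSortDemo class and sortMixed method are hypothetical names, and supplementary characters are ignored for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MixedSortDemo {

    // Hand-written stand-in for the pinyin4j lookup (assumption: first reading, no tone).
    static final Map<Character, String> READINGS = new HashMap<Character, String>();
    static {
        READINGS.put('點', "dian");
        READINGS.put('齊', "qi");
        READINGS.put('中', "zhong");
        READINGS.put('國', "guo");
        READINGS.put('文', "wen");
        READINGS.put('語', "yu");
        READINGS.put('學', "xue");
    }

    /** Digits and letters first (ignoring case), then Chinese characters by pinyin. */
    static final Comparator<String> MIXED = new Comparator<String>() {
        public int compare(String a, String b) {
            int n = Math.min(a.length(), b.length());
            for (int i = 0; i < n; i++) {
                char c1 = a.charAt(i);
                char c2 = b.charAt(i);
                if (c1 == c2) {
                    continue;
                }
                String p1 = READINGS.get(c1);
                String p2 = READINGS.get(c2);
                if (p1 != null && p2 != null) {
                    // Both have a known reading: compare by pinyin; homophones fall through.
                    int byPinyin = p1.compareTo(p2);
                    if (byPinyin != 0) {
                        return byPinyin;
                    }
                } else {
                    // Otherwise compare code units, ignoring case. CJK code units are
                    // larger than ASCII ones, so Chinese text sorts after digits and letters.
                    int diff = Character.toLowerCase(c1) - Character.toLowerCase(c2);
                    if (diff != 0) {
                        return diff;
                    }
                }
            }
            return a.length() - b.length();
        }
    };

    /** Returns a sorted copy, leaving the input list untouched. */
    static List<String> sortMixed(List<String> titles) {
        List<String> copy = new ArrayList<String>(titles);
        Collections.sort(copy, MIXED);
        return copy;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList(
            "中國語文", "a baseline survey", "點滴校園", "1996 opinion survey",
            "中學美術", "A Chinese cultural critique", "齊來說故事", "中國文化");
        for (String title : sortMixed(titles)) {
            System.out.println(title);
        }
    }
}
```

With these shortened titles the output starts with the 1996 entry, then the two English titles regardless of case, then the Chinese titles in the order dian, qi, zhong guo wen, zhong guo yu, zhong xue.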


PS: This post is not copyrighted; the code portion was taken from the web.