RednaxelaFX, posted 2013-1-26 12:39:21

A big PermGen is no good either

Just jotting this down.

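Today someone reported that an application was doing full GCs too often. A look showed that what was actually running was CMS GC every time, not a true stop-the-world full GC. It really was very frequent, though: a cycle was triggered every few seconds.
Heap usage: eden, S0 and S1 looked unremarkable, old gen was a bit over 10% used, and perm gen a bit over 70%. Judged by space occupancy alone, neither generation reaches the CMS trigger threshold.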
$ jstat -gcutil `pgrep -u admin java` 1s
  S0     S1     E      O      P     YGC    YGCT     FGC    FGCT      GCT
 37.21   0.00  99.81  12.87  76.82  1767  196.843  3085  2998.088  3194.931
 37.21   0.00  99.81  12.87  76.82  1767  196.843  3086  2998.088  3194.931
  0.00  47.48   1.06  12.90  76.82  1768  196.959  3086  2999.778  3196.737
  0.00  47.48   1.88  12.90  76.82  1768  196.959  3086  2999.778  3196.737

Several VM flags govern how GC heap occupancy relates to when CMS gets triggered:
product(uintx, MinHeapFreeRatio, 40,
        "Min percentage of heap free after GC to avoid expansion")
product(intx, CMSTriggerRatio, 80,
        "Percentage of MinHeapFreeRatio in CMS generation that is allocated before a CMS collection cycle commences")
product(intx, CMSTriggerPermRatio, 80,
        "Percentage of MinHeapFreeRatio in the CMS perm generation that is allocated before a CMS collection cycle commences, that also collects the perm generation")
product(intx, CMSInitiatingOccupancyFraction, -1,
        "Percentage CMS generation occupancy to start a CMS collection cycle. A negative value means that CMSTriggerRatio is used")
product(intx, CMSInitiatingPermOccupancyFraction, -1,
        "Percentage CMS perm generation occupancy to start a CMS collection cycle. A negative value means that CMSTriggerPermRatio is used")

In the HotSpot VM, these flags are used as follows:
// The field "_initiating_occupancy" represents the occupancy percentage
// at which we trigger a new collection cycle.  Unless explicitly specified
// via CMSInitiatingOccupancyFraction (argument "io" below), it
// is calculated by:
//
//   Let "f" be MinHeapFreeRatio in
//
//    _initiating_occupancy = 100-f +
//                           f * (CMSTriggerRatio/100)
//   where CMSTriggerRatio is the argument "tr" below.
//
// That is, if we assume the heap is at its desired maximum occupancy at the
// end of a collection, we let CMSTriggerRatio of the (purported) free
// space be allocated before initiating a new collection cycle.
//
void ConcurrentMarkSweepGeneration::init_initiating_occupancy(intx io, intx tr) {
  assert(io <= 100 && tr >= 0 && tr <= 100, "Check the arguments");
  if (io >= 0) {
    _initiating_occupancy = (double)io / 100.0;
  } else {
    _initiating_occupancy = ((100 - MinHeapFreeRatio) +
                             (double)(tr * MinHeapFreeRatio) / 100.0)
                            / 100.0;
  }
}
_cmsGen ->init_initiating_occupancy(CMSInitiatingOccupancyFraction, CMSTriggerRatio);
_permGen->init_initiating_occupancy(CMSInitiatingPermOccupancyFraction, CMSTriggerPermRatio);

This application sets none of these flags explicitly, so it runs with the defaults:
$ jinfo -flag MinHeapFreeRatio `pgrep -u admin java`
-XX:MinHeapFreeRatio=40
$ jinfo -flag CMSTriggerPermRatio `pgrep -u admin java`
-XX:CMSTriggerPermRatio=80
$ jinfo -flag CMSInitiatingPermOccupancyFraction `pgrep -u admin java`
-XX:CMSInitiatingPermOccupancyFraction=-1
So the perm gen occupancy at which CMS GC gets triggered is ((100 - 40) + (80 * 40) / 100.0) / 100.0 = 0.92, i.e. 92%.
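To make the arithmetic easy to replay, here is a standalone C++ restatement of init_initiating_occupancy() as quoted above, with MinHeapFreeRatio passed in as a parameter instead of read from a VM global. The function names initiating_occupancy() and occupancy_trigger() are my own, so treat this as a sketch for checking numbers, not as HotSpot code.

```cpp
#include <cassert>

// Defaults reported by jinfo for this application.
constexpr long kMinHeapFreeRatio = 40;
constexpr long kCMSTriggerPermRatio = 80;

// Hypothetical standalone version of init_initiating_occupancy().
// io: CMSInitiating[Perm]OccupancyFraction, a negative value means "derive it"
// tr: CMSTrigger[Perm]Ratio
// Returns the initiating occupancy as a fraction (0.0 .. 1.0).
double initiating_occupancy(long io, long tr, long min_heap_free_ratio) {
  assert(io <= 100 && tr >= 0 && tr <= 100);
  if (io >= 0) {
    return static_cast<double>(io) / 100.0;  // explicit setting wins
  }
  // (100 - f) + f * tr / 100, then converted from percent to fraction
  return ((100 - min_heap_free_ratio) +
          static_cast<double>(tr * min_heap_free_ratio) / 100.0) /
         100.0;
}

// Occupancy-only check: does this occupancy (a fraction) exceed the threshold?
bool occupancy_trigger(double occupancy, double threshold) {
  return occupancy > threshold;
}
```

With the defaults above, initiating_occupancy(-1, kCMSTriggerPermRatio, kMinHeapFreeRatio) yields 0.92, and the 76.82% perm gen occupancy from jstat does not pass occupancy_trigger() against it, which matches the observation that occupancy alone does not explain the frequent CMS cycles.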

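To watch how the CMS trigger condition is adjusted at runtime, you can use the -XX:+PrintCMSInitiationStatistics flag. An example of the log it produces is at https://gist.github.com/1050942; it looks roughly like this: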
CMSCollector shouldConcurrentCollect: 42.910
time_until_cms_gen_full 2.0111715
free=32676856
contiguous_available=44957696
promotion_rate=1.00797e+07
cms_allocation_rate=0
occupancy=0.5003915
initiatingOccupancy=0.9200000
initiatingPermOccupancy=0.9200000

By default, CMS GC is not triggered solely by old gen and perm gen occupancy; there are other conditions as well.
product(bool, UseCMSInitiatingOccupancyOnly, false,
        "Only use occupancy as a criterion for starting a CMS collection")
If this flag is true, occupancy is the only trigger.

A source comment on the CMS GC trigger conditions:
// We should be conservative in starting a collection cycle.  To
// start too eagerly runs the risk of collecting too often in the
// extreme.  To collect too rarely falls back on full collections,
// which works, even if not optimum in terms of concurrent work.
// As a work around for too eagerly collecting, use the flag
// UseCMSInitiatingOccupancyOnly.  This also has the advantage of
// giving the user an easily understandable way of controlling the
// collections.
// We want to start a new collection cycle if any of the following
// conditions hold:
// . our current occupancy exceeds the configured initiating occupancy
//   for this generation, or
// . we recently needed to expand this space and have not, since that
//   expansion, done a collection of this generation, or
// . the underlying space believes that it may be a good idea to initiate
//   a concurrent collection (this may be based on criteria such as the
//   following: the space uses linear allocation and linear allocation is
//   going to fail, or there is believed to be excessive fragmentation in
//   the generation, etc... or ...
// [.(currently done by CMSCollector::shouldConcurrentCollect() only for
//   the case of the old generation, not the perm generation; see CR 6543076):
//   we may be approaching a point at which allocation requests may fail because
//   we will be out of sufficient free space given allocation rate estimates.]
bool ConcurrentMarkSweepGeneration::should_concurrent_collect() const {
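The per-generation rules in this comment can be modeled roughly as below. This is a simplified sketch: GenSnapshot and its fields are hypothetical, the bracketed collector-level condition (the estimate based on allocation rates) is not modeled, and placing the UseCMSInitiatingOccupancyOnly check right after the occupancy test follows the flag's description rather than quoting HotSpot's method body.

```cpp
#include <cassert>

// Hypothetical snapshot of one CMS generation (old or perm);
// the field names are illustrative, not HotSpot's.
struct GenSnapshot {
  double occupancy;              // used / capacity, 0.0 .. 1.0
  double initiating_occupancy;   // e.g. 0.92 with default flags
  bool expanded_since_last_cms;  // expanded and not collected since
  bool space_wants_collection;   // e.g. linear allocation would fail
};

// Simplified model of the three per-generation conditions in the
// comment; use_occupancy_only stands in for UseCMSInitiatingOccupancyOnly.
bool gen_should_concurrent_collect(const GenSnapshot& g,
                                   bool use_occupancy_only) {
  assert(g.occupancy >= 0.0 && g.occupancy <= 1.0);
  if (g.occupancy > g.initiating_occupancy) {
    return true;   // condition 1: configured threshold exceeded
  }
  if (use_occupancy_only) {
    return false;  // flag set: ignore every other criterion
  }
  if (g.expanded_since_last_cms) {
    return true;   // condition 2: recent expansion not yet collected
  }
  return g.space_wants_collection;  // condition 3: the space asks for it
}
```

For example, a generation at 50% occupancy that was recently expanded starts a cycle when the flag is off, but not when use_occupancy_only is set; a generation above its threshold starts one either way.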
// Decide if we want to enable class unloading as part of the
// ensuing concurrent GC cycle. We will collect the perm gen and
// unload classes if it's the case that:
// (1) an explicit gc request has been made and the flag
//     ExplicitGCInvokesConcurrentAndUnloadsClasses is set, OR
// (2) (a) class unloading is enabled at the command line, and
//     (b) (i)   perm gen threshold has been crossed, or
//         (ii)  old gen is getting really full, or
//         (iii) the previous N CMS collections did not collect the
//               perm gen
// NOTE: Provided there is no change in the state of the heap between
// calls to this method, it should have idempotent results. Moreover,
// its results should be monotonically increasing (i.e. going from 0 to 1,
// but not 1 to 0) between successive calls between which the heap was
// not collected. For the implementation below, it must thus rely on
// the property that concurrent_cycles_since_last_unload()
// will not decrease unless a collection cycle happened and that
// _permGen->should_concurrent_collect() and _cmsGen->is_too_full() are
// themselves also monotonic in that sense. See check_monotonicity()
// below.
bool CMSCollector::update_should_unload_classes() {
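The class-unloading rule in this comment likewise reduces to a boolean expression. In the sketch below, UnloadInputs and its fields are hypothetical stand-ins for the conditions listed; max_cycles_without_unload plays the role of the unnamed "N", and condition (iii) is interpreted as "at least N CMS cycles have passed since classes were last unloaded". Which command-line flag enables class unloading (for example -XX:+CMSClassUnloadingEnabled) is outside the sketch.

```cpp
#include <cassert>

// Hypothetical inputs to the class-unloading decision.
struct UnloadInputs {
  bool explicit_gc_request;          // e.g. System.gc() was called
  bool explicit_gc_unloads_classes;  // ExplicitGCInvokesConcurrentAndUnloadsClasses
  bool class_unloading_enabled;      // (2)(a): enabled on the command line
  bool perm_gen_threshold_crossed;   // (2)(b)(i)
  bool old_gen_too_full;             // (2)(b)(ii)
  int cycles_since_last_unload;      // CMS cycles that skipped the perm gen
  int max_cycles_without_unload;     // the "N" in (2)(b)(iii)
};

bool should_unload_classes(const UnloadInputs& in) {
  assert(in.cycles_since_last_unload >= 0 && in.max_cycles_without_unload >= 0);
  // (1) explicit GC request with the "also unload classes" flag set
  if (in.explicit_gc_request && in.explicit_gc_unloads_classes) {
    return true;
  }
  // (2)(a) enabled, and (2)(b) any one of (i), (ii), (iii)
  return in.class_unloading_enabled &&
         (in.perm_gen_threshold_crossed ||
          in.old_gen_too_full ||
          in.cycles_since_last_unload >= in.max_cycles_without_unload);
}
```

As long as, between collections, the boolean inputs only flip from false to true and the cycle counter only grows, the result can only go from false to true, which is the monotonicity property the NOTE asks for.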
// Support for concurrent collection policy decisions.
bool CompactibleFreeListSpace::should_concurrent_collect() const {
  // In the future we might want to add in fragmentation stats --
  // including erosion of the "mountain" into this decision as well.
  return !adaptive_freelists() && linearAllocationWouldFail();
}

=====================================================================

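Looking at the GC log, I found that the CMS initial mark pause had grown beyond 1.4 s. As I remember it, the last time I looked at this application its initial mark pause was under 1 s, so things have gotten worse. In the meantime, MaxPermSize had been raised from 256m to 512m. My hunch is that the longer pause is closely tied to the bigger perm gen.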

2011-06-28T11:11:01.417+0800: 432933.547: 2003375K(4019584K), 1.9010460 secs]
2011-06-28T11:11:03.347+0800: 432935.478:
2011-06-28T11:11:04.737+0800: 432936.867:
2011-06-28T11:11:04.737+0800: 432936.868:
2011-06-28T11:11:04.752+0800: 432936.883:
2011-06-28T11:11:04.752+0800: 432936.883:
2011-06-28T11:11:07.783+0800: 432939.913: 2063538K->348295K(4019584K), 0.1675310 secs]
 CMS: abort preclean due to time 2011-06-28T11:11:09.938+0800: 432942.068:
2011-06-28T11:11:09.944+0800: 432942.074: 432942.074: 432942.192: 432942.192: 432942.279: 387351K(4019584K), 0.2316920 secs]
2011-06-28T11:11:10.176+0800: 432942.306:
2011-06-28T11:11:10.688+0800: 432942.818:
2011-06-28T11:11:10.688+0800: 432942.818:
2011-06-28T11:11:10.707+0800: 432942.838:

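For this application, enlarging the perm gen may actually have turned into poison. It uses Groovy scripts heavily, so it generates and loads new classes fairly often; a quick look at the log suggests roughly a dozen classes loaded every few minutes. Classes are unloaded fast enough to keep the perm gen from running out of memory (OOM).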

Well, I'll follow up on this later.