Investigating a Puzzle About the CardTable Array in the HotSpot VM

Posted on 2021-04-22, updated on 2025-08-12, filed under Technical Notes

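How the question came up

This morning I received a WeChat message from a colleague on a previous project:

I see that he (meaning the author of 《深入理解Java虚拟机：JVM高级特性与最佳实践》) still hasn't explained why each card table element takes up 1 byte, rather than 1 bit as I would have expected.

Let's look at the context of the question. In Section 3.4.5 of Chapter 3 of the third edition of 《深入理解Java虚拟机：JVM高级特性与最佳实践》, the author writes:

In its simplest form, a card table can be just a byte array, and this is indeed what the HotSpot VM does. The following line of code is HotSpot's default card-marking logic: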

CARD_TABLE [this address >> 9] = 0

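The text does say byte array, so why not a boolean array with one bit per entry? After a short phone call, we still had not reached a definite conclusion.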
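To make the quoted marking line concrete, here is a minimal C++ sketch. It is a toy model under stated assumptions, not HotSpot code: heap addresses are simulated as plain integers, and the names `ToyCardTable`, `mark_dirty`, and `heap_start` are invented for this example. Unlike the quoted line, it indexes by the offset from the heap start, which keeps the arithmetic easy to follow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kCardShift = 9;        // 2^9 = 512 bytes per card
constexpr std::int8_t kClean = -1;   // same value as clean_card in HotSpot
constexpr std::int8_t kDirty = 0;    // same value as dirty_card in HotSpot

// Hypothetical toy model: one byte per 512-byte block of a simulated heap.
struct ToyCardTable {
  std::uintptr_t heap_start;        // simulated heap start address
  std::vector<std::int8_t> cards;   // the card table itself

  ToyCardTable(std::uintptr_t start, std::size_t heap_bytes)
      : heap_start(start),
        // Round up so a partial last block still gets a card.
        cards((heap_bytes + (std::size_t{1} << kCardShift) - 1) >> kCardShift,
              std::int8_t{kClean}) {}

  // Counterpart of CARD_TABLE[addr >> 9] = 0, relative to the heap start.
  void mark_dirty(std::uintptr_t addr) {
    assert(addr >= heap_start);
    std::size_t index = (addr - heap_start) >> kCardShift;
    assert(index < cards.size());
    cards[index] = kDirty;  // a single byte store, no read needed
  }
};
```

Any two addresses in the same 512-byte block land on the same byte, so marking is one unconditional store regardless of which address in the block was written.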

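Investigating the question

I happen to have had more free time lately, so after getting a vaccine shot at Shenzhen Bay Sports Center, I followed Hou Jie's famous saying, "in front of the source code, there are no secrets," and went to see how the source actually implements this. In JDK 8 the card table is implemented by the class CardTableModRefBS. After some analysis the answer became fairly clear. Below is the code relevant to this article, extracted and annotated: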

class CardTableModRefBS: public ModRefBarrierSet {

public:
  // Constants
  enum SomePublicConstants {
    card_shift                  = 9,
    card_size                   = 1 << card_shift,
    card_size_in_words          = card_size / sizeof(HeapWord)
  };
  
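  //Card states and related values. Interestingly, the enum is presumably not named CardStatus because its
  //values are not only card states; it also defines values along other dimensions, such as last_card, which marks a card as the last one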
  enum CardValues {
    //The card is clean
    clean_card                  = -1,
    // The mask contains zeros in places for all other values.
    clean_card_mask             = clean_card - 31,

    dirty_card                  =  0,
    precleaned_card             =  1,
    claimed_card                =  2,
    deferred_card               =  4,
    last_card                   =  8,
    CT_MR_BS_last_reserved      = 16
  };
  
  //Accessors wrapping the enum values above
  static int clean_card_val()      { return clean_card; }
  static int clean_card_mask_val() { return clean_card_mask; }
  static int dirty_card_val()      { return dirty_card; }
  static int claimed_card_val()    { return claimed_card; }
  static int precleaned_card_val() { return precleaned_card; }
  static int deferred_card_val()   { return deferred_card; }
  
  //The heap memory region covered by this card table
  const MemRegion _whole_heap;       // the region covered by the card table
  //Size of the card table array (one entry per heap memory block, each block being 2^card_shift = 2^9 = 512 bytes)
  size_t          _byte_map_size;    // in bytes
  //The card table array
  jbyte*          _byte_map;         // the card marking array
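  //Key point 1:
  //Translation of the original comment: the base of the card table array, adjusted for the heap's low boundary.
  //If the heap started at address 0x0, this would be element 0 of _byte_map. In practice the heap starts at some higher address,
  //so this pointer actually points somewhere before the beginning of the _byte_map array.
  //A literal translation is hard to follow, so put plainly: the way HotSpot maps heap addresses to card table entries
  //closely resembles register-relative addressing on the 8086/8088 (in principle it is the same idea). That mode computes:
  //effective address = base register + displacement, or as a formula: EA = BX/BP + 8-bit or 16-bit displacement
  //Here byte_map_base is the displacement in that formula; see the comments in the initialize function below for details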
  // Card marking array base (adjusted for heap low boundary)
  // This would be the 0th element of _byte_map, if the heap started at 0x0.
  // But since the heap starts at some higher address, this points to somewhere
  // before the beginning of the actual _byte_map.
  jbyte* byte_map_base;
  
  void CardTableModRefBS::initialize() {
  
    //Find the start and end of the managed heap region
    HeapWord* low_bound  = _whole_heap.start();
    HeapWord* high_bound = _whole_heap.end();

    //Reserve memory for the card table array
    ReservedSpace heap_rs(_byte_map_size, rs_align, false);

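    //Use the address returned by base() as the array address (an array's address is the address of its first element); presumably base() is the start address of the reserved memory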
    _byte_map = (jbyte*) heap_rs.base();
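    //Key point 2:
    //In the line below, uintptr_t(low_bound) >> card_shift computes the base value for the heap start, i.e. its address in units of 512-byte cards,
    //and the whole expression computes the offset from that base value to the card table array address.
    //Rearranging the expression gives: _byte_map = (uintptr_t(low_bound) >> card_shift) + byte_map_base;
    //which means: for any heap address p, its address in the card table array is its own base value plus byte_map_base,
    //that is: (uintptr_t(p) >> card_shift) + byte_map_base, or equivalently: &byte_map_base[uintptr_t(p) >> card_shift]
    //and this is exactly how byte_for is implemented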
    byte_map_base = _byte_map - (uintptr_t(low_bound) >> card_shift);
    //Assertions: the heap start and end must both map to entries inside the card table array
    assert(byte_for(low_bound) == &_byte_map[0], "Checking start of map");
    assert(byte_for(high_bound-1) <= &_byte_map[_last_valid_index], "Checking end of map");

    //omitted...
  }
  
  //Compute the card table entry address that any heap address maps to
  // Mapping from address to card marking array entry
  jbyte* byte_for(const void* p) const {
    assert(_whole_heap.contains(p),
           err_msg("Attempt to access p = "PTR_FORMAT" out of bounds of "
                   " card marking array's _whole_heap = ["PTR_FORMAT","PTR_FORMAT")",
                   p2i(p), p2i(_whole_heap.start()), p2i(_whole_heap.end())));
    //This expression is exactly the result of the analysis in initialize above
    jbyte* result = &byte_map_base[uintptr_t(p) >> card_shift];
    assert(result >= _byte_map && result < _byte_map + _byte_map_size,
           "out of bounds accessor for card marking array");
    return result;
  }
  
  //Compute the card table index for any heap address
  // Mapping from address to card marking array index.
  size_t index_for(void* p) {
    assert(_whole_heap.contains(p),
           err_msg("Attempt to access p = "PTR_FORMAT" out of bounds of "
                   " card marking array's _whole_heap = ["PTR_FORMAT","PTR_FORMAT")",
                   p2i(p), p2i(_whole_heap.start()), p2i(_whole_heap.end())));
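    //First compute where the address maps to inside the card table array, then subtract the array address (the address of its
    //first element); the result is the offset from the array start, which is the array index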
    return byte_for(p) - _byte_map;
  }
  
  //Whether a given card is dirty
  // These are used by G1, when it uses the card table as a temporary data
  // structure for card claiming.
  bool is_card_dirty(size_t card_index) {
    return _byte_map[card_index] == dirty_card_val();
  }
  
}
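The biased-base trick from key points 1 and 2 can be checked in isolation with a small sketch. This is an illustration under assumptions, not the HotSpot implementation: heap addresses are simulated integers, `BiasedCardTable` is a made-up name, and the biased base is kept as an integer rather than a `jbyte*` so the example never forms a pointer outside the array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kCardShift = 9;  // same value as card_shift above

// Hypothetical stand-in for the card table; heap addresses are simulated integers.
struct BiasedCardTable {
  std::uintptr_t low_bound;            // simulated heap start
  std::uintptr_t high_bound;           // simulated heap end (exclusive)
  std::vector<std::int8_t> byte_map;   // plays the role of _byte_map
  std::uintptr_t byte_map_base;        // biased base, stored as an integer

  BiasedCardTable(std::uintptr_t low, std::uintptr_t high)
      : low_bound(low),
        high_bound(high),
        // One entry per 512-byte card touched by [low, high), all clean (-1).
        byte_map(((high - 1) >> kCardShift) - (low >> kCardShift) + 1,
                 std::int8_t{-1}),
        byte_map_base(0) {
    // Same arithmetic as:
    //   byte_map_base = _byte_map - (uintptr_t(low_bound) >> card_shift)
    byte_map_base =
        reinterpret_cast<std::uintptr_t>(byte_map.data()) - (low >> kCardShift);
  }

  // The stored base depends on byte_map's buffer, so forbid copies.
  BiasedCardTable(const BiasedCardTable&) = delete;
  BiasedCardTable& operator=(const BiasedCardTable&) = delete;

  // Same arithmetic as: &byte_map_base[uintptr_t(p) >> card_shift]
  std::int8_t* byte_for(std::uintptr_t p) {
    assert(p >= low_bound && p < high_bound);
    return reinterpret_cast<std::int8_t*>(byte_map_base + (p >> kCardShift));
  }

  // Offset of the entry from the start of the array, i.e. the array index.
  std::size_t index_for(std::uintptr_t p) {
    return static_cast<std::size_t>(byte_for(p) - byte_map.data());
  }
};
```

Because the subtraction of `low >> kCardShift` is done once at construction, each lookup is just a shift and an add, with no need to subtract the heap start on every access.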

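Conclusion

The source analysis shows that every address in the managed heap can be mapped into the card table array. There is no need to map at the granularity of individual addresses, which would waste a great deal of memory, so HotSpot divides the heap into blocks of 512 bytes (each block gets a nicer name: a card page), and every address in a block maps to the same array element. Moreover, a card page has more than just the two states dirty (dirty_card) and clean (clean_card); there are others, such as precleaned_card, claimed_card, and deferred_card. A single bit cannot tell these apart, which is why each element is a byte. As for what these states mean, that is a topic for another blog post.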
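The difference between a byte per card and a bit per card can be sketched side by side. The enum values below are copied from the `CardValues` enum shown earlier; `ByteCards` and `BitCards` are hypothetical types written for this comparison, and the bit-per-card variant is only an imagined alternative, not something HotSpot contains.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Values copied from the CardValues enum in the excerpt above.
enum CardValue : std::int8_t {
  clean_card      = -1,
  dirty_card      = 0,
  precleaned_card = 1,
  claimed_card    = 2,
  deferred_card   = 4
};

// Byte-per-card table: each entry can hold any CardValue,
// and changing a card's state is a plain byte store.
struct ByteCards {
  std::vector<std::int8_t> cards;
  explicit ByteCards(std::size_t n) : cards(n, std::int8_t{clean_card}) {}
  void set(std::size_t i, CardValue v) { cards[i] = v; }
  CardValue get(std::size_t i) const { return static_cast<CardValue>(cards[i]); }
};

// Hypothetical bit-per-card alternative: only dirty/clean fits in an entry,
// and marking must read the containing byte, set one bit, and write it back.
struct BitCards {
  std::vector<std::uint8_t> bits;
  explicit BitCards(std::size_t n) : bits((n + 7) / 8, std::uint8_t{0}) {}
  void mark_dirty(std::size_t i) {
    bits[i / 8] = static_cast<std::uint8_t>(bits[i / 8] | (1u << (i % 8)));
  }
  bool is_dirty(std::size_t i) const { return ((bits[i / 8] >> (i % 8)) & 1u) != 0; }
};
```

The bit version uses one eighth of the memory, but it has no room for states such as `precleaned_card` or `claimed_card`, and eight neighbouring cards share one byte, so updating one card involves the byte that holds the other seven.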