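HBase Compaction Explained

HBase has a data operation called Compaction, which literally means merging data files. This article introduces HBase Compaction from several angles: its purpose, how to control it, and how it is carried out.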

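1. What is Compaction

Compaction merges multiple HFiles into a single HFile.

There are two kinds of Compaction:

  • Minor Compaction (merges a subset of the files)
  • Major Compaction (merges all of the files)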
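
The merge itself can be pictured as a k-way merge of sorted inputs. The following is a minimal sketch, assuming each HFile is modeled as a plain sorted list of row keys; the class name HFileMergeSketch and its methods are hypothetical and are not HBase code. Passing only some of the lists corresponds to a minor compaction, passing all of them to a major compaction.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Toy model (not HBase code): each "HFile" is just a sorted list of row keys.
class HFileMergeSketch {

    // One cursor per input file: the current key plus the rest of that file.
    private static final class Cursor implements Comparable<Cursor> {
        String current;
        final Iterator<String> rest;

        Cursor(Iterator<String> it) {
            this.rest = it;
            this.current = it.next();
        }

        @Override
        public int compareTo(Cursor other) {
            return current.compareTo(other.current);
        }
    }

    // K-way merge of sorted inputs into one sorted output.
    static List<String> merge(List<List<String>> files) {
        PriorityQueue<Cursor> heap = new PriorityQueue<>();
        for (List<String> file : files) {
            if (!file.isEmpty()) {
                heap.add(new Cursor(file.iterator()));
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor smallest = heap.poll();
            merged.add(smallest.current);
            if (smallest.rest.hasNext()) {
                // Advance this file's cursor and put it back on the heap.
                smallest.current = smallest.rest.next();
                heap.add(smallest);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<String>> files = Arrays.asList(
            Arrays.asList("row1", "row4"),
            Arrays.asList("row2", "row5"),
            Arrays.asList("row3"));
        // "Major" in this toy model: merge every file.
        System.out.println(merge(files));
        // "Minor" in this toy model: merge only the first two files.
        System.out.println(merge(files.subList(0, 2)));
    }
}
```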

2. Why Compaction?

  • Reduce the number of HFiles
  • Improve performance, since a read has fewer files to consult
  • Remove expired and deleted data
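
The last point can be illustrated with a small sketch. It assumes a simplified cell model in which a delete marker hides every cell of the same row with an equal or older timestamp, and in which a cell is expired once it is older than a TTL. The class MajorCompactionCleanupSketch, its Cell type, and this masking rule are assumptions made for illustration; real HBase delete semantics are finer-grained (per column and per version).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (not HBase code) of the cleanup done while rewriting files.
class MajorCompactionCleanupSketch {

    static final class Cell {
        final String row;
        final long timestamp;
        final boolean deleteMarker;

        Cell(String row, long timestamp, boolean deleteMarker) {
            this.row = row;
            this.timestamp = timestamp;
            this.deleteMarker = deleteMarker;
        }
    }

    // Returns the cells that survive: no delete markers, no masked cells,
    // and no cells older than the TTL.
    static List<Cell> compact(List<Cell> cells, long now, long ttlMillis) {
        // Pass 1: remember the newest delete marker per row.
        Map<String, Long> deletedUpTo = new HashMap<>();
        for (Cell c : cells) {
            if (c.deleteMarker) {
                deletedUpTo.merge(c.row, c.timestamp, Math::max);
            }
        }
        // Pass 2: keep only live, unexpired cells.
        List<Cell> kept = new ArrayList<>();
        for (Cell c : cells) {
            if (c.deleteMarker) {
                continue; // the marker itself is dropped as well
            }
            Long deleteTs = deletedUpTo.get(c.row);
            boolean masked = deleteTs != null && c.timestamp <= deleteTs;
            boolean expired = c.timestamp < now - ttlMillis;
            if (!masked && !expired) {
                kept.add(c);
            }
        }
        return kept;
    }
}
```

In this model the space is only reclaimed when the files are rewritten, which is why the cleanup is tied to compaction rather than to the delete call itself.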

3. Configuration

Compaction behavior can be controlled by editing the HBase configuration file (hbase-site.xml).

Key                                                    Default   Meaning
hbase.regionserver.thread.splitcompactcheckfrequency   20s       Interval between compaction checks
hbase.hstore.compactionThreshold                       3         Minimum number of files for a minor compaction
hbase.hstore.blockingStoreFiles                        7         Number of StoreFiles in a Store at which flushes are blocked
hbase.hstore.blockingWaitTime                          90s       How long a blocked flush waits
hbase.hstore.compaction.max                            10        Maximum number of files in one minor compaction
hbase.hregion.majorcompaction                          1 day     Interval between major compactions
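
How the file-count settings interact can be sketched as follows. This is a simplified model that reads the keys from a java.util.Properties object rather than from HBase's own configuration classes; the class CompactionPolicySketch, its method names, and the exact comparison operators (>= and >) are assumptions for illustration, not the HBase implementation.

```java
import java.util.Properties;

// Toy model (not HBase code) of how the file-count settings interact.
class CompactionPolicySketch {
    final int compactionThreshold;
    final int compactionMax;
    final int blockingStoreFiles;

    CompactionPolicySketch(Properties conf) {
        // Defaults are the ones listed in the table above.
        compactionThreshold = intOf(conf, "hbase.hstore.compactionThreshold", 3);
        compactionMax = intOf(conf, "hbase.hstore.compaction.max", 10);
        blockingStoreFiles = intOf(conf, "hbase.hstore.blockingStoreFiles", 7);
    }

    private static int intOf(Properties conf, String key, int defaultValue) {
        return Integer.parseInt(conf.getProperty(key, String.valueOf(defaultValue)));
    }

    // A store becomes a minor compaction candidate once it has enough files.
    boolean needsMinorCompaction(int storeFileCount) {
        return storeFileCount >= compactionThreshold;
    }

    // At most compactionMax files are merged in a single minor compaction.
    int filesInOneMinorCompaction(int storeFileCount) {
        return needsMinorCompaction(storeFileCount)
            ? Math.min(storeFileCount, compactionMax)
            : 0;
    }

    // With too many files, flushes are held back until compaction catches up
    // (or until hbase.hstore.blockingWaitTime has elapsed).
    boolean flushBlocked(int storeFileCount) {
        return storeFileCount > blockingStoreFiles;
    }
}
```

For example, with the defaults a store holding 12 files would be compacted 10 files at a time in this model, and its flushes would be held back until the count drops.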

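4. Workflow

Compaction is an asynchronous process. It can be requested by a client, or the server can decide on its own, after a check, to start one.

1) Initiated by the client

Client side: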

HBaseAdmin::compact or majorCompact

==>HMaster modifyTable

==>RegionManager::startAction

==> put into map regionsToCompact and regionsToMajorCompact

==>Send to HRegionServer

Server side:

HRegionServer::run forward the request to CompactionSplitThread

==>CompactionSplitThread handle the request from queue

==>HRegion::compactStores

==>Do compaction preparations, create the compaction folder

==>HStore::compaction

==>Create a HFile.Writer for writing

==>Create a StoreScanner for major compaction

==>Create a MinorCompactionStoreScanner for minor compaction

==>Scan the scanner and write to the hfile

==>Complete the compaction, delete old files and move the file to store folder
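
The asynchronous hand-off in this flow, where a request is queued and a dedicated thread does the work later, can be sketched with a blocking queue. This is a minimal model only: the class CompactionQueueSketch, its methods, and the duplicate-request check are assumptions for illustration and do not reproduce CompactionSplitThread.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Toy model (not HBase code) of a queue-driven compaction worker.
class CompactionQueueSketch implements Runnable {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Set<String> queued = new HashSet<>();
    private final List<String> compacted = new ArrayList<>();
    private volatile boolean stopped = false;

    // Called by the requester; returns immediately without compacting.
    // Returns false if the region is already waiting in the queue.
    synchronized boolean requestCompaction(String regionName) {
        if (!queued.add(regionName)) {
            return false;
        }
        queue.add(regionName);
        return true;
    }

    // Worker loop: take requests off the queue and "compact" them.
    @Override
    public void run() {
        while (!stopped || !queue.isEmpty()) {
            try {
                String region = queue.poll(50, TimeUnit.MILLISECONDS);
                if (region == null) {
                    continue; // nothing to do yet
                }
                synchronized (this) {
                    queued.remove(region);
                    compacted.add(region); // stands in for the real work
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    // Ask the worker to exit once the queue has been drained.
    void stop() {
        stopped = true;
    }

    synchronized List<String> compactedRegions() {
        return new ArrayList<>(compacted);
    }

    public static void main(String[] args) throws InterruptedException {
        CompactionQueueSketch worker = new CompactionQueueSketch();
        Thread thread = new Thread(worker, "compaction-worker");
        thread.start();
        worker.requestCompaction("region-a");
        worker.requestCompaction("region-b");
        worker.stop();
        thread.join();
        System.out.println(worker.compactedRegions());
    }
}
```

The point of the design is that the caller never waits for the merge: it only enqueues a request, which matches the description of Compaction as an asynchronous process.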

2) Initiated by a server-side check

Major compaction:

Major compaction is checked periodically by the region server

==>HRegionServer::MajorCompactionChecker

==>Send the request to CompactionSplitThread
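
What such a periodic check might decide can be sketched as a single comparison. This is a minimal sketch, assuming a store is due for a major compaction once its oldest file is older than the hbase.hregion.majorcompaction period; the class MajorCompactionCheckSketch and its method are hypothetical, and the real checker may apply further conditions.

```java
// Toy model (not HBase code) of a periodic "is a major compaction due?" check.
class MajorCompactionCheckSketch {

    // Default from the configuration table: 1 day, in milliseconds.
    static final long DEFAULT_PERIOD_MS = 24L * 60 * 60 * 1000;

    // Due when the oldest store file is older than the configured period.
    static boolean isMajorCompactionDue(long[] fileModificationTimes,
                                        long now, long periodMs) {
        if (fileModificationTimes.length == 0) {
            return false; // nothing to compact
        }
        long oldest = Long.MAX_VALUE;
        for (long t : fileModificationTimes) {
            oldest = Math.min(oldest, t);
        }
        return now - oldest > periodMs;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        long twoDaysAgo = now - 2 * DEFAULT_PERIOD_MS;
        long oneHourAgo = now - 60L * 60 * 1000;
        System.out.println(
            isMajorCompactionDue(new long[] {twoDaysAgo, oneHourAgo}, now, DEFAULT_PERIOD_MS));
    }
}
```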

Minor compaction:

Minor compaction is checked as part of flushing a MemStore to HDFS

==>MemStoreFlusher::flushRegion

==>Send the request to CompactionSplitThread

Original link: http://www.spnguru.com/?p=271
