Chapter 15. Too Many Column Families
umn will be updated many times a day. Over time, because of all the operations
on the counters, the memstore will be flushed into disk. This will create files that
mostly contain only counter operations, which at some point will be compacted.
However, when the compaction is performed, it will most probably select HFiles
that contain customer metadata. The compaction will rewrite all those huge cells
of customer metadata as well as the small counters. As a result, the vast majority
of the I/O will be wasted rewriting files with little to no change just to update or
compact the small counters. This creates unnecessary I/O overhead. HBase triggers
compactions at the column family level. By separating the customer metadata
and the customer counters into two different column families, we avoid
unnecessarily rewriting the static information. This lowers the total IOPS on the
RegionServers and therefore improves the overall performance of the application.
So how many column families is too many? We cannot give you a magic
number. If it makes sense to separate them based on the access pattern or the format,
separate them. But if you read and write them almost the same way and the data has
almost the same format, then simply keep it together in the same column family.
Overusing column families will affect both your application’s performance and the way
HBase behaves, in several ways. Depending on how hard you are pushing HBase, it might
also affect its stability, because timeouts can occur and RegionServers can get killed.
The first impact of too many column families is on the memory side. HBase shares its
memstore (write cache) among all the regions. Because each region is allowed a
configurable maximum memstore size (128 MB by default), this space has to be shared between all the
column families of the same region. Therefore, the more column families you have,
the smaller the average memstore space available to each of them. When
one column family’s bucket is full, all of the other column families in that region must
be flushed to disk as well, even if they have relatively little data. This will put a lot of
pressure on the memory, as many objects and small files will get created again and
again, but it will also put some pressure on the disks because those small files will
have to be compacted together.
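The arithmetic behind this can be sketched in a few lines of Python. This is a simplified model, not HBase code: it assumes the 128 MB per-region memstore mentioned above is split evenly across the column families, which real workloads rarely do, and the function name is ours.

```python
# Simplified model: one region's memstore budget shared evenly by its column families.
REGION_MEMSTORE_MB = 128  # per-region memstore size used in the text


def average_share_mb(num_families: int,
                     region_memstore_mb: float = REGION_MEMSTORE_MB) -> float:
    """Average memstore space available to each column family, in MB."""
    if num_families < 1:
        raise ValueError("a table needs at least one column family")
    return region_memstore_mb / num_families


for families in (1, 2, 4, 8):
    share = average_share_mb(families)
    print(f"{families} column families: about {share:.0f} MB each")
```

Under this even-split assumption, going from one to eight column families shrinks the average share from the full memstore to one eighth of it, and every region flush then writes eight small files instead of one larger file.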
Some work has been done under HBASE-3149 and HBASE-10201 to
flush only the column families that are full instead of flushing all of
them. However, this is not yet available in HBase 1.0. Once this feature
is available, the memory impact of having too many column
families will be drastically reduced, as will the impact on the
The number of column families affects the number of store files created during
flushes, and subsequently the number of compactions that must be performed. If a
table has eight column families, and a region’s 128 MB memstore is full, the data from
the eight families is flushed to separate files. Over time, more flushes will occur.
When more than three store files exist for a column family, HBase considers those
files for compaction. If a table has one column family, one set of files would need to
be compacted. With eight column families, eight sets of files need to be compacted,
affecting the resources of the RegionServers and HDFS. Configuring fewer column
families allows you to have a larger memstore size per family; therefore fewer store files
need to be flushed and, most importantly, fewer compactions need to occur, reducing
the I/O on the underlying HDFS system. When a table needs to be flushed (for example,
before taking a snapshot, or when administrators trigger flushes from the shell), all the
memstores are flushed to disk. Depending on the previous operations, it is possible
that doing this will make HBase reach yet another compaction trigger and start
compactions for many, if not all, of the regions and column families. The more column
families, the more compactions will go into the queue and the more pressure will be
put on HBase and HDFS.
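To get a feel for how the number of column families multiplies store files and compactions, here is a rough Python simulation. It is a sketch under loud assumptions, not HBase's actual compaction policy: every region flush writes exactly one store file per family, and a family is compacted into a single file as soon as it holds more than three store files (the threshold quoted above). The function name is hypothetical.

```python
def simulate_flushes(num_families: int, num_flushes: int, threshold: int = 3):
    """Return (store files written, compactions run) for one region.

    Simplified model: each flush writes one store file per column family,
    and a family with more than `threshold` files is compacted down to one.
    """
    files_per_family = [0] * num_families
    files_written = 0
    compactions = 0
    for _ in range(num_flushes):
        for family in range(num_families):
            files_per_family[family] += 1
            files_written += 1
            if files_per_family[family] > threshold:
                files_per_family[family] = 1  # all files merged into one
                compactions += 1
    return files_written, compactions


for families in (1, 8):
    written, compacted = simulate_flushes(families, num_flushes=12)
    print(f"{families} families: {written} store files, {compacted} compactions")
```

In this model the same twelve region flushes produce eight times as many store files and eight times as many compactions with eight families as with one, which is the multiplication effect described in the text.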
HBase stores column families’ data in separate files and directories. When one of
those directories becomes bigger than the configured region size, a split is triggered.
Splits affect all column families in a region, not only the column family whose data
grew beyond the maximum size. As a result, if some column families are pretty big
while others are pretty small, you might end up with column families containing only
a few cells. RegionServers allocate resources like memory and CPU threads per
region and column family. Very small regions and column families can create unnecessary
pressure on those resources. The HBase master will also need to manage more
entries in the hbase:meta table and the underlying HDFS system. The HDFS
DataNodes and NameNodes will also need to manage more I/O for the small column
family files. If you expect some column families to contain much more data than
others, you might want to separate that data into other tables, producing fewer but
Causes, Solution, and Prevention
The cause of having too many column families is always related to schema design.
HBase will not create too many column families on your behalf; there’s no automatic-split
behavior that can cause too many column families. Thus, to prevent this problem,
you need to carefully consider HBase principles before you begin designing your
schema so that you can determine an appropriate number of column families.
Several solutions exist for the issue of having too many column families. Understanding
the data in the column families and the access pattern is critical. Sometimes the
column families are not required and can simply be dropped (for example, if the
data is duplicated/denormalized and is available in another table). Sometimes the column
family is present due to the access pattern (e.g., rollups or summary column
families). In that case, the column family can often be decoupled from the
table and moved to its own table. Other times, the data in the separate column family
can be merged together with larger column families.
All the operations in the next sections can be done using the Java API, but they can
also very simply be done using the HBase shell or the command line. Because the Java
API will not really add any benefit to those operations, we have not documented it.
Delete a Column Family
If you have decided that you don’t need a specific column family, simply remove this
column family from the table META information. The following command will delete
the picture column family from the sensors table:
alter 'sensors', NAME => 'picture', METHOD => 'delete'
This operation might take some time to execute because it will be applied in all the
regions one by one. At the end, related files in HDFS will be removed, and the
hbase:meta table will be updated to reflect the modification.
Merge a Column Family
Because of a flaw in the original schema design, or a shift in scope in the original use
case, you might have separated the data in two different column families but now
want to merge it back into a single column family. As we discussed in “Solution” on
page 179, CopyTable allows you to copy data from one table into another one.
CopyTable will help us in the current situation. CopyTable
requires a source table and a destination table; however, those two tables don’t necessarily
need to be different. Also, CopyTable allows us to rename one column family
into a new one (Figure 15-1).
Figure 15-1. CopyTable column families operations
CopyTable will run a MapReduce job over the data you want to read and will emit
puts based on what you asked. If for a given table called customer you want to transfer
the data present in the column family address into the column family profile,
you simply need to set both the input and the output table to be the customer table,
the input column family to be address and the output to be profile. This operation
can be achieved by running the following command:
hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=customer \
  --families=address:profile customer
At the end of the MapReduce job, all the data that you had in the address column
family will also be present in the profile column family. You can now delete the
address column family using the alter command shown earlier.
There are a few things to keep in mind when using the CopyTable
method of merging column families. Data may be overwritten in
the destination column family. If data with the same row/column
qualifier exists in both the source and destination column family,
the data in the destination column family will be overwritten.
When copying a column family, there will need to be enough free
space in HDFS to hold both copies temporarily. Before starting this
operation, estimate this additional space usage and account for it. If
you run this on a live production table, make sure any updates
made to the source column family are also made to the destination
column family. If you are using a supplied or custom timestamp on
your puts or deletes, avoid this method on a live table, as there
might be unexpected results.
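Before merging, it can help to know which cells would collide. The following Python sketch shows the idea on in-memory data only: each column family is modelled as a plain dictionary of rows to qualifiers, the sample rows are made up, and `find_collisions` is a hypothetical helper. On a real cluster the same comparison would have to be done by scanning the table.

```python
def find_collisions(source: dict, destination: dict) -> list:
    """Return the (row, qualifier) pairs present in both column families.

    Each family is modelled as {row_key: {qualifier: value}}.
    """
    collisions = []
    for row, cells in source.items():
        destination_cells = destination.get(row, {})
        for qualifier in cells:
            if qualifier in destination_cells:
                collisions.append((row, qualifier))
    return sorted(collisions)


# Made-up sample data standing in for the address and profile families.
address = {"cust1": {"city": "Paris", "zip": "75001"}, "cust2": {"city": "Lyon"}}
profile = {"cust1": {"name": "Ada", "city": "London"}}

for row, qualifier in find_collisions(address, profile):
    print(f"row {row!r}, qualifier {qualifier!r} exists in both families")
```

Any pair reported here is a cell whose destination value is at risk of being overwritten by the copy, so it deserves a decision before the job is started.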
It is also possible to merge multiple column families back into a single column family.
You simply need to specify them all in the --families option, separated by commas.
Separate a Column Family into a New Table
Separating data into different tables might be desired for various reasons. Perhaps
atomicity of operations between a table’s column families is not needed or there are
significant differences in data size or access patterns between column families. Perhaps
it just makes more logical sense to have separate tables. Again, we will make use
of CopyTable to perform the operations.
The following command copies the data from column family picture of the
customer table to the map table in the same column family:
hbase org.apache.hadoop.hbase.mapreduce.CopyTable --families=picture \
  --new.name=map customer
Here again, if you want to transfer multiple column families into the same
destination table, you can specify each desired column family by separating them
with a comma.
Both the destination table and the destination column family should exist before you
start the MapReduce job.
All the warnings from the previous paragraph also apply to the cur‐
Chapter 16. Hotspotting
As previously discussed, to maintain its parallel nature, HBase takes advantage of
regions contained in RegionServers distributed across the nodes. In HBase, all read
and write requests should be uniformly distributed across all of the regions in the
RegionServers. Hotspotting occurs when a given region serviced by a single RegionServer
receives most or all of the read or write requests.
HBase will process the read and write requests based on the row key. The row key is
instrumental for HBase to be able to take advantage of all regions equally. When a
hotspot occurs, the RegionServer trying to process all of the requests can become
overwhelmed while the other RegionServers are mostly idle. Figure 16-1 illustrates a
region being hotspotted. The higher the load on a single RegionServer, the more
I/O-intensive and blocking processes it will have to execute (e.g., compactions,
garbage collections, and region splits). Hotspotting can also result in increased latencies,
which on the client side cause timeouts or missed SLAs.
Figure 16-1. Region being hotspotted
The main cause of hotspotting is usually an issue in the key design. In the following
sections, we will look at some of the most common causes of hotspotting, including
monotonically incrementing or poorly distributed keys, very small reference tables,
and application issues.
Monotonically Incrementing Keys
Monotonically incrementing keys are keys where only the last bits or bytes are slowly
incrementing. This means that most of the new key being written to or read from
HBase is extremely similar to the previously written or read key. The most commonly
seen monotonically incrementing key occurs when the timestamp is used as the key.
When a timestamp is used as the row key, the key will slowly increment from the first
put. Let’s take a look at a quick example: in HBase, keys are stored in lexicographical
order. Our row keys will update as shown here:
If requests are writes, each of the preceding updates is going to go into the same
region until it reaches the key of the next region or its maximum configured size. At
that point, the region will split, and we will begin incrementally updating the next
region. Notice that most of the write operations are against the same region, thereby
burdening a single RegionServer. Unfortunately, when this issue is detected after a
deployment, there is nothing you can do to prevent hotspotting. The only way to
avoid this kind of issue is to prevent it with a good key design.
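The effect is easy to reproduce outside HBase. The Python sketch below is only a model of lexicographic region lookup, not HBase code: it assumes a hypothetical table presplit into 10 regions on the first digit of the key, and uses made-up epoch-second timestamps as row keys.

```python
import bisect
from collections import Counter


def region_for(key: str, split_points: list) -> int:
    """Index of the region holding `key`, given sorted split points.

    Region 0 holds keys below the first split point; region i (i >= 1)
    holds keys in [split_points[i - 1], split_points[i]).
    """
    return bisect.bisect_right(split_points, key)


# Hypothetical presplit on the first digit: split points "1" .. "9", 10 regions.
splits = [str(digit) for digit in range(1, 10)]

# Made-up epoch-second timestamps used directly as row keys.
start = 1_700_000_000
keys = [str(start + offset) for offset in range(1000)]

print(keys[:3])  # consecutive keys differ only in their last characters
writes_per_region = Counter(region_for(key, splits) for key in keys)
print(dict(writes_per_region))
```

Because consecutive timestamps share the same leading characters, every one of the 1,000 writes in this model is routed to the same region, while the nine other regions receive nothing.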
Poorly Distributed Keys
As stated before, key design is very important, as it will impact not only the scalability
of your application, but also its performance. However, in the first iteration of schema
design in a use case, it is not uncommon for keys to be poorly designed and therefore
wrongly distributed. This can occur due to a lack of information at the time of the
schema design but also because of issues when implementing the application. A good
example of a poorly distributed key is when you expect the source data to send you
keys with digits distributed between “0” and “9”; but you end up receiving keys
always prefixed with a value before the expected “0” to “9” values. In this case, the
application expects to receive “1977” and “2001” but gets “01977” and “02001”
instead. In this example, if you had properly presplit the table into 10 regions (up to
“1”, “1” to “2”, “2” to “3”, etc.), then all the values you received and stored in HBase
would be written to the first region (up to “1”) while all of the other regions would
remain untouched. In this case, even though we had the right intentions with the
schema design, the data will never get fully distributed. This issue should be discovered
during proper testing, but if you discover this issue after the application has been
deployed, don’t despair: all is not lost. It is recommended to split the hotspotting
region into multiple regions. This should restore the expected distribution. In this
example, you will have to split the first region into 10 regions to account for the leading
zero. The new region range distribution will be “00”, then “00” to “01”, “01” to “02”,
and so on. The other regions, after “1”, will not be used if the keys are always prefixed
by “0” and can be merged together one by one through region “10”.
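The leading-zero scenario can be checked with a small Python model. This is a sketch, not HBase code: the sample keys are made up, the lookup only mimics lexicographic ordering of row keys, and the new split points (“01” to “09”) are our own choice for illustration.

```python
import bisect


def region_index(key: str, split_points: list) -> int:
    """Index of the region whose key range contains `key`."""
    return bisect.bisect_right(split_points, key)


# Table presplit into 10 regions on the first digit: split points "1" .. "9".
original_splits = [str(digit) for digit in range(1, 10)]

expected_keys = ["1977", "2001", "3050", "9120"]     # what we planned for
received_keys = ["0" + key for key in expected_keys]  # what actually arrives

expected_regions = [region_index(k, original_splits) for k in expected_keys]
received_regions = [region_index(k, original_splits) for k in received_keys]
print("planned:", expected_regions)
print("actual: ", received_regions)

# Split the first region on the second character to absorb the leading zero.
resplit = ["0" + str(digit) for digit in range(1, 10)] + original_splits
resplit_regions = [region_index(k, resplit) for k in received_keys]
print("after splitting the first region:", resplit_regions)
```

With the original split points, every received key falls into the first region. Once that region is split on the second character, the same keys spread out again, and the regions after “1” are the ones left unused.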
Small Reference Tables
This common hotspotting issue refers to the bottleneck that results from the use of
small reference tables, typically to perform joins in HBase. For this example, we have
two reference tables, each consisting of a single region, defining postal codes and city
names. In this case, we perform a MapReduce join over a billion-row orders table. All
the generated mappers are going to query those two tables to perform the lookups. As
a result, the two RegionServers hosting the two reference regions will be over‐
whelmed by calls from all the other servers in the cluster. If you are really unlucky,
those two regions will be served by the same RegionServer. This kind of contention/
bottleneck will increase latency and create delays that can lead to job failures due to
timeouts. The good news is that there are multiple ways to avoid this situation. The
first and easiest option is to presplit your reference tables. The goal would be to have
close to as many regions as you have RegionServers. Presplitting the reference table
may not be an option if the table is too small or if the cluster currently has too many
RegionServers.
The other option here would be to distribute this table to all the nodes before performing
the join. The distribution of this data will be done using the MapReduce distributed
cache mechanism. The distributed cache framework will copy the data to the
slave nodes before any tasks are executed. The efficiency of this approach stems from
the fact that the files are copied only once per job.
When all servers have a local copy of the data, and if the data is small enough to fit
into memory, it can then be loaded from the setup method of the MapReduce code,
and lookups can now be performed in memory instead of reading from the disk.
This will increase performance, while also fixing the hotspotting issue.
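The pattern of loading the reference data once in setup and then doing in-memory lookups can be sketched as follows. This is plain Python, not the Hadoop MapReduce API: the class, its `setup` and `map` methods, the CSV layout, and the sample rows are all stand-ins chosen for illustration.

```python
import csv
import io


class EnrichOrdersMapper:
    """Stand-in for a mapper task; it only mimics the setup/map life cycle."""

    def setup(self, reference_file):
        # Runs once per task: load the whole reference table into memory.
        reader = csv.reader(reference_file)
        self.city_by_postal_code = {code: city for code, city in reader}

    def map(self, order: dict) -> dict:
        # Runs once per record: a dictionary lookup, no call to HBase.
        city = self.city_by_postal_code.get(order["postal_code"], "UNKNOWN")
        return {**order, "city": city}


# Made-up local copy of the reference data, as a distributed cache would provide.
local_copy = io.StringIO("75001,Paris\n69001,Lyon\n")

mapper = EnrichOrdersMapper()
mapper.setup(local_copy)
enriched = mapper.map({"order_id": 1, "postal_code": "69001"})
print(enriched)
```

The design point is that the cost of reading the reference data is paid once per task rather than once per record, and no mapper ever queries the regions that host the reference tables.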
If MapReduce also updates or enriches the reference table, the distributed
cache is not a viable option. Indeed, distributing a copy of
the table at the beginning means the content is fixed for the duration
of the job. If updates to the reference table are required in your
use case, then the best option is to presplit your reference table to
ensure even distribution of the table across the RegionServers.
The final example of region hotspotting is related to application design or implementation
issues. When a region is hotspotting, it is very important to identify the root
cause. To accomplish this, we need to determine the source of the offending calls. If
the data is very well distributed in the table, the hotspotting may be coming from a
bug causing writes to always land in the same region. This is where mistakenly added
prefix fields, double or triple writes, or badly converted data types can
manifest themselves as application issues. Let’s imagine the system is expected to
receive a four-byte integer value, but the backend code converts it to an eight-byte
value. The four initial bytes might have been very well distributed over the entire
range of data; however, adding four empty bytes to this value creates a never-incrementing
prefix of four empty bytes [0x00, 0x00, 0x00, 0x00], so every key lands in the
same HBase region and creates the hotspot.
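The conversion bug is easy to demonstrate with Python's `struct` module. This sketch assumes big-endian encoding and non-negative values (a negative value would be sign-extended to 0xFF bytes instead of zeros); the sample integers and helper names are made up.

```python
import struct


def as_four_bytes(value: int) -> bytes:
    """Encode as a big-endian 32-bit signed integer (the intended key)."""
    return struct.pack(">i", value)


def as_eight_bytes(value: int) -> bytes:
    """Encode as a big-endian 64-bit signed integer (the buggy conversion)."""
    return struct.pack(">q", value)


values = [0x10203040, 0x5A000001, 0x7FFFFFFF]  # made-up, well-spread keys

for value in values:
    print(as_four_bytes(value).hex(), "->", as_eight_bytes(value).hex())

# Every widened key starts with the same four bytes.
prefixes = {as_eight_bytes(value)[:4] for value in values}
print(prefixes)
```

The four-byte encodings start with different bytes, so they would spread across regions. The eight-byte encodings all share the same four-byte zero prefix, so they sort next to each other and end up in a single region.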
Meta Region Hotspotting
Another commonly seen issue is applications hotspotting the META region. When
creating a connection to an HBase cluster, the application’s first stop is ZooKeeper, to
acquire the locations of the HBase Master and the META region. The application will cache this
information, and will then query the META table to send read and write requests to
the proper region. Each time a new connection is created, the application will again
go to ZooKeeper, and to META. To avoid having all those calls to ZooKeeper and the
META table each time you perform a request to HBase, it is recommended to create a
single connection and to share it across your application. For a web service application,
it is recommended to create a pool of a few HBase connections and share them
with all the threads on the application side.
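The idea of a small shared pool can be sketched in Python as follows. Nothing here is the HBase client API: `FakeConnection` is a stand-in that merely counts how many connections get created, and `ConnectionPool` is a hypothetical helper built on the standard library's thread-safe queue.

```python
import queue
from contextlib import contextmanager


class FakeConnection:
    """Stand-in for a cluster connection; counts how many are created."""

    created = 0

    def __init__(self):
        # In the scenario above, each creation implies ZooKeeper and META lookups.
        FakeConnection.created += 1


class ConnectionPool:
    """A fixed set of connections shared by all callers."""

    def __init__(self, size: int):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(FakeConnection())

    @contextmanager
    def connection(self):
        conn = self._idle.get()  # blocks until a connection is free
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back for the next caller


pool = ConnectionPool(size=3)
for _ in range(1000):  # 1,000 requests reuse the same 3 connections
    with pool.connection() as conn:
        pass  # a real request would be issued here
print(FakeConnection.created)
```

However many requests go through the pool, the number of connections created stays at the pool size, which is what keeps ZooKeeper and the META region from being queried on every request.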
Prevention and Solution
The best way to solve hotspotting is to prevent it from happening. This starts right at
the beginning of the project with a well-tested key design (refer back to “Key and
Table Design” on page 189 if you need a refresher on how to do this). It is also important
to keep an eye on all your region metrics to have early detection of potential hotspotting.
On the HBase Master web interface, the table page shows the number of
requests received for each of the table’s regions. The Requests column represents the number
of read and write requests received by the region since it has been online. When
regions are moved to other RegionServers, or when a table is disabled, the region
metrics are reset. So when a region shows a very high number compared to the others,
it might be hotspotting, but the difference could also be due to very recent balancing of the
region between two servers. The only way to determine whether it is from hotspotting or
region transitions is by looking through the logs or monitoring the suspect region
over time. The best way to avoid these issues in production is to put your application
through proper testing and development cycles.
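Monitoring a suspect region over time can be approximated by comparing two snapshots of the per-region request counters. The Python sketch below is illustrative only: the counters are made up, the helper names are ours, and the "five times the median" threshold is an arbitrary assumption rather than an HBase rule.

```python
from statistics import median


def requests_since(earlier: dict, later: dict) -> dict:
    """Requests received per region between two snapshots of the counters."""
    deltas = {}
    for region, count in later.items():
        previous = earlier.get(region, 0)
        # A lower count means the metric was reset (region moved, table disabled).
        deltas[region] = count - previous if count >= previous else count
    return deltas


def suspect_regions(requests_by_region: dict, factor: float = 5.0) -> list:
    """Regions whose request count exceeds `factor` times the median."""
    typical = max(median(requests_by_region.values()), 1)
    return sorted(region for region, count in requests_by_region.items()
                  if count > factor * typical)


# Made-up counters as they might be read from the table page at two moments.
earlier = {"r1": 5000, "r2": 4800, "r3": 90000, "r4": 5100}
later = {"r1": 6200, "r2": 5900, "r3": 900, "r4": 103100}

deltas = requests_since(earlier, later)
print(deltas)
print(suspect_regions(deltas))
```

In this made-up data, region r3 had the highest lifetime counter in the first snapshot but was reset in between, while r4 is the region actually receiving most of the recent requests. Working on the difference between snapshots avoids being misled by counters that were reset at different times.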