As the name suggests, a filter extracts only the required data and discards useless or excess data. HBase provides a good number of filters that we can use in get and scan operations to fetch only the data we need from HBase, rather than retrieving data that is not required.
HBase filters are a powerful feature that can make working with the data stored in tables considerably more efficient. The two read operations in HBase, get() and scan(), support direct access to data by row key and access to a range of rows bounded by a start and end key, respectively. We can limit the data retrieved by adding limiting selectors to the HBase query. These include column families, column qualifiers, timestamps or time ranges, and the number of versions.
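To make the selectors concrete, here is a minimal sketch that models a table as an in-memory list of cells. It applies the limits listed above: a row-key range, a column family, a column qualifier, a time range, and a maximum number of versions. It does not use the HBase client library. Cell, Selector, and select are hypothetical names chosen for illustration, and the sketch assumes Java 17. In the real Java client, comparable limits are set on Get and Scan objects, for example with addColumn and setTimeRange.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy, in-memory model of HBase-style read selectors. This is NOT the HBase
// client API; Cell, Selector, and select() are hypothetical illustrations.
public class SelectorSketch {

    // One stored cell: row key, column family, column qualifier, timestamp, value.
    record Cell(String row, String family, String qualifier, long ts, String value) {}

    // Hypothetical bundle of limits. A null field means "no limit" for that dimension.
    record Selector(String startRow,   // inclusive start row key
                    String stopRow,    // exclusive stop row key
                    String family,     // restrict to one column family
                    String qualifier,  // restrict to one column qualifier
                    Long minTs,        // inclusive lower timestamp bound
                    Long maxTs,        // exclusive upper timestamp bound
                    int maxVersions) {} // keep the newest N versions per column

    static List<Cell> select(List<Cell> table, Selector s) {
        // Step 1: apply per-cell limits (row range, family, qualifier, time range).
        List<Cell> candidates = new ArrayList<>();
        for (Cell c : table) {
            if (s.startRow() != null && c.row().compareTo(s.startRow()) < 0) continue;
            if (s.stopRow() != null && c.row().compareTo(s.stopRow()) >= 0) continue;
            if (s.family() != null && !c.family().equals(s.family())) continue;
            if (s.qualifier() != null && !c.qualifier().equals(s.qualifier())) continue;
            if (s.minTs() != null && c.ts() < s.minTs()) continue;
            if (s.maxTs() != null && c.ts() >= s.maxTs()) continue;
            candidates.add(c);
        }
        // Step 2: order by column, newest timestamp first, then keep maxVersions per column.
        candidates.sort(Comparator.comparing(Cell::row)
                .thenComparing(Cell::family)
                .thenComparing(Cell::qualifier)
                .thenComparing(Comparator.comparingLong(Cell::ts).reversed()));
        Map<String, Integer> seen = new HashMap<>();
        List<Cell> result = new ArrayList<>();
        for (Cell c : candidates) {
            String column = c.row() + "/" + c.family() + ":" + c.qualifier();
            int count = seen.merge(column, 1, Integer::sum);
            if (count <= s.maxVersions()) result.add(c);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Cell> table = List.of(
                new Cell("row1", "info", "name", 1L, "alice-v1"),
                new Cell("row1", "info", "name", 2L, "alice-v2"),
                new Cell("row1", "info", "email", 2L, "a@example.com"),
                new Cell("row2", "info", "name", 3L, "bob"),
                new Cell("row2", "stats", "visits", 3L, "10"),
                new Cell("row3", "info", "name", 4L, "carol"));
        // Rows [row1, row3), column info:name, newest version only.
        Selector s = new Selector("row1", "row3", "info", "name", null, null, 1);
        for (Cell c : select(table, s)) {
            System.out.println(c.row() + " " + c.value());
        }
        // prints:
        // row1 alice-v2
        // row2 bob
    }
}
```

In this sketch the per-cell limits run first. The version limit is then applied only to the cells that survive, so a time range and a version count combine by keeping the newest versions inside the range.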
We can represent the use of HBase filters as shown in the following diagram. We specify filters in a get or scan, and the client fetches data from the different RegionServers. The filters are shipped to the RegionServers using RPC calls and evaluated against the local data there, so only matching data is returned.
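The flow described above can be imitated with a toy, in-process Java model, assuming Java 17. Each simulated RegionServer holds a sorted slice of rows. The client hands the same filter to every server, standing in for the RPC call, and each server returns only the rows the filter accepts. RowFilter, RegionServerSim, and scanAll are hypothetical names. Real HBase sends the filter over the network and evaluates it inside each RegionServer, which this sketch only imitates with ordinary method calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of server-side filtering. RowFilter, RegionServerSim, and scanAll
// are hypothetical; method calls stand in for the RPCs that real HBase uses.
public class ServerSideFilterSketch {

    // A filter decides, per row, whether the row is sent back to the client.
    interface RowFilter {
        boolean include(String rowKey, String value);
    }

    // One simulated RegionServer owning a sorted slice of row keys.
    static class RegionServerSim {
        final TreeMap<String, String> rows = new TreeMap<>();
        int rowsExamined = 0; // rows this server read locally while scanning

        // Stands in for the RPC: the filter arrives here and runs against local data.
        List<String> scan(RowFilter filter) {
            List<String> matches = new ArrayList<>();
            for (Map.Entry<String, String> e : rows.entrySet()) {
                rowsExamined++;
                if (filter.include(e.getKey(), e.getValue())) {
                    matches.add(e.getKey() + "=" + e.getValue());
                }
            }
            return matches;
        }
    }

    // Client side: send the same filter to every server and merge the results.
    static List<String> scanAll(List<RegionServerSim> servers, RowFilter filter) {
        List<String> result = new ArrayList<>();
        for (RegionServerSim rs : servers) {
            result.addAll(rs.scan(filter));
        }
        return result;
    }

    public static void main(String[] args) {
        RegionServerSim rs1 = new RegionServerSim();
        rs1.rows.put("user001", "active");
        rs1.rows.put("user002", "inactive");
        RegionServerSim rs2 = new RegionServerSim();
        rs2.rows.put("user101", "active");
        rs2.rows.put("user102", "inactive");

        // Only rows whose value is "active" travel back to the client.
        List<String> active = scanAll(List.of(rs1, rs2), (key, value) -> value.equals("active"));
        System.out.println(active); // prints [user001=active, user101=active]
    }
}
```

The rowsExamined counter shows the trade-off in this model. Every server still reads all of its local rows, and the filter's saving is in how much data travels back to the client.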