Some HBase utilities, written in Scala, that could be useful.
Build Settings
The build settings in this repo are not complete yet.
Given a set of IDs (rowkeys) to be removed from a table, supplied as files on HDFS, three options are provided for now.
Option 1: Delete the given set of IDs using a client job. This could be useful for a small number of IDs (not for bulk deletes). I could not delete more than 10,000 in a single run, as the regions timed out quickly (a healthy cluster could do more). See the sketch below.
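A minimal sketch of the client-side approach, assuming rowkeys come one per line (read from a local file here for brevity; the real utility reads them from HDFS) and a hypothetical table name `my_table`. The batch size of 1000 is an arbitrary choice:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Delete}
import org.apache.hadoop.hbase.util.Bytes
import scala.io.Source

object ClientDeleteJob {
  def main(args: Array[String]): Unit = {
    val conf = HBaseConfiguration.create()
    val connection = ConnectionFactory.createConnection(conf)
    val table = connection.getTable(TableName.valueOf("my_table")) // assumed table name
    try {
      // One rowkey per line.
      val ids = Source.fromFile(args(0)).getLines()
      // Batch the Deletes so each RPC carries a reasonable payload.
      ids.grouped(1000).foreach { batch =>
        val deletes = new java.util.ArrayList[Delete]()
        batch.foreach(id => deletes.add(new Delete(Bytes.toBytes(id.trim))))
        table.delete(deletes) // note: HBase removes succeeded Deletes from this list
      }
    } finally {
      table.close()
      connection.close()
    }
  }
}
```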
Option 2: Spawn a map-only MapReduce job and issue parallel RPC calls (Deletes) from the different mappers, as sketched below. This was another experiment, but it gave the same result as Option 1.
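A sketch of the mapper side, under the same assumptions (plain-text rowkeys as input, a hypothetical table `my_table`, an arbitrary batch size of 1000). Each mapper opens its own connection and sends batched Delete RPCs directly:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{Connection, ConnectionFactory, Delete, Table}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.io.{LongWritable, NullWritable, Text}
import org.apache.hadoop.mapreduce.Mapper

class RpcDeleteMapper
    extends Mapper[LongWritable, Text, NullWritable, NullWritable] {

  type Ctx = Mapper[LongWritable, Text, NullWritable, NullWritable]#Context

  private var connection: Connection = _
  private var table: Table = _
  private val buffer = new java.util.ArrayList[Delete]()

  override def setup(context: Ctx): Unit = {
    connection = ConnectionFactory.createConnection(
      HBaseConfiguration.create(context.getConfiguration))
    table = connection.getTable(TableName.valueOf("my_table")) // assumed table name
  }

  override def map(key: LongWritable, value: Text, context: Ctx): Unit = {
    buffer.add(new Delete(Bytes.toBytes(value.toString.trim)))
    if (buffer.size >= 1000) flush()
  }

  private def flush(): Unit = {
    table.delete(buffer) // HBase removes succeeded Deletes from the list
    buffer.clear()
  }

  override def cleanup(context: Ctx): Unit = {
    if (!buffer.isEmpty) flush()
    table.close()
    connection.close()
  }
}
```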
Option 3: Generate HFiles (with Delete markers) and bulk load them, as sketched below. This solution worked for my case, as I had around 10 million IDs to delete. The job completes quickly (in 3-5 minutes) compared to the previous options, but the table did not respond to scans/gets for a long time after the job. It is advisable to perform this in batches (in my case I ended up finishing it in multiple batches of about 1 million IDs each).
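A sketch of the HFile-generation mapper, assuming plain-text rowkeys as input and a single, hypothetical column family `cf`. Instead of emitting data, it emits a DeleteFamily marker per rowkey, which masks every cell of that family for the row once the HFiles are loaded:

```scala
import org.apache.hadoop.hbase.KeyValue
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.Mapper

class DeleteMarkerMapper
    extends Mapper[LongWritable, Text, ImmutableBytesWritable, KeyValue] {

  private val family = Bytes.toBytes("cf") // assumed column family

  override def map(key: LongWritable, value: Text,
                   context: Mapper[LongWritable, Text,
                                   ImmutableBytesWritable, KeyValue]#Context): Unit = {
    val row = Bytes.toBytes(value.toString.trim)
    // A DeleteFamily marker masks all cells of this family for the row.
    val kv = new KeyValue(row, family, null,
                          System.currentTimeMillis(), KeyValue.Type.DeleteFamily)
    context.write(new ImmutableBytesWritable(row), kv)
  }
}
```

In the driver, HFileOutputFormat2.configureIncrementalLoad takes care of sorting and partitioning the output to match the table's regions; once the job finishes, the generated HFiles can be moved into the table with LoadIncrementalHFiles (the same thing the `completebulkload` tool does).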