
How to install snappy with HBase 0.94.x

Posted by jmspaggi on 12/03/12  ~  Posted in: Uncategorized

If you are not using an HBase bundle that already includes Snappy, you might have to install it on your own.

I spent the last three days actively working on this to figure out how to do it, so I think it might help some others too. The steps are very straightforward, and the installation will take you five minutes at most!

First, if you don't already have it, you will need to download the HBase tar file you need, install it, and configure it (conf/).
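For example (a sketch assuming HBase 0.94.3 extracted under /home/hbase; adjust the version and paths for your setup):
# download hbase-0.94.3.tar.gz from an Apache mirror, then:
tar xzf hbase-0.94.3.tar.gz -C /home/hbase
cd /home/hbase/hbase-0.94.3
# edit conf/hbase-site.xml and conf/hbase-env.sh for your environment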

At this point, compression is not yet configured.

If you run:
bin/hbase org.apache.hadoop.hbase.util.CompressionTest file:///tmp/test.txt snappy

You will get something like:
12/12/03 10:30:02 INFO util.ChecksumType: org.apache.hadoop.util.PureJavaCrc32 not available.
12/12/03 10:30:02 INFO util.ChecksumType: Checksum can use java.util.zip.CRC32
12/12/03 10:30:02 INFO util.ChecksumType: org.apache.hadoop.util.PureJavaCrc32C not available.
12/12/03 10:30:02 DEBUG util.FSUtils: Creating file:file:/tmp/test.txtwith permission:rwxrwxrwx
12/12/03 10:30:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/12/03 10:30:02 WARN metrics.SchemaConfigured: Could not determine table and column family of the HFile path file:/tmp/test.txt. Expecting at least 5 path components.
12/12/03 10:30:02 WARN snappy.LoadSnappy: Snappy native library not loaded
Exception in thread "main" java.lang.RuntimeException: native snappy library not available
    at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:123)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:100)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112)
    at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:264)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.<init>(HFileBlock.java:739)
    at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.finishInit(HFileWriterV2.java:127)
    at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.<init>(HFileWriterV2.java:118)
    at org.apache.hadoop.hbase.io.hfile.HFileWriterV2$WriterFactoryV2.createWriter(HFileWriterV2.java:101)
    at org.apache.hadoop.hbase.io.hfile.HFile$WriterFactory.create(HFile.java:394)
    at org.apache.hadoop.hbase.util.CompressionTest.doSmokeTest(CompressionTest.java:108)
    


So download, make, and install libsnappy from http://code.google.com/p/snappy/
Download the snappy tar file, extract it, and run ./configure and then make.
This will generate a .libs/libsnappy.so file. This is the file you want!
Copy this file into your HBase lib/native/Linux-ARCH directory, where ARCH is either amd64-64 or i386-32 depending on your architecture.
HBase does not ship with the amd64 directory already created, so you might have to create it yourself:
mkdir /home/hbase/hbase-0.94.3/lib/native/Linux-amd64-64
If you are not sure where HBase is looking for the lib, change the log level to DEBUG in the log4j properties file and re-run the test. It will tell you where it's looking for the file.
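Putting the libsnappy steps together, it looks roughly like this (a sketch assuming snappy 1.0.5, an amd64 machine and HBase under /home/hbase/hbase-0.94.3; adjust the version, architecture and paths for your setup):
tar xzf snappy-1.0.5.tar.gz
cd snappy-1.0.5
./configure
make
mkdir -p /home/hbase/hbase-0.94.3/lib/native/Linux-amd64-64
cp .libs/libsnappy.so /home/hbase/hbase-0.94.3/lib/native/Linux-amd64-64/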

If you run the test again, it will still fail, and you will get something like:
12/12/03 10:34:35 INFO util.ChecksumType: org.apache.hadoop.util.PureJavaCrc32 not available.
12/12/03 10:34:35 INFO util.ChecksumType: Checksum can use java.util.zip.CRC32
12/12/03 10:34:35 INFO util.ChecksumType: org.apache.hadoop.util.PureJavaCrc32C not available.
12/12/03 10:34:35 DEBUG util.FSUtils: Creating file:file:/tmp/test.txtwith permission:rwxrwxrwx
12/12/03 10:34:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/12/03 10:34:35 WARN metrics.SchemaConfigured: Could not determine table and column family of the HFile path file:/tmp/test.txt. Expecting at least 5 path components.
12/12/03 10:34:35 WARN snappy.LoadSnappy: Snappy native library is available
12/12/03 10:34:35 WARN snappy.LoadSnappy: Snappy native library not loaded
Exception in thread "main" java.lang.RuntimeException: native snappy library not available
    at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:123)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:100)
    at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:112)
    at org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:264)
    at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.<init>(HFileBlock.java:739)
    at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.finishInit(HFileWriterV2.java:127)
    at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.<init>(HFileWriterV2.java:118)
    at org.apache.hadoop.hbase.io.hfile.HFileWriterV2$WriterFactoryV2.createWriter(HFileWriterV2.java:101)
    at org.apache.hadoop.hbase.io.hfile.HFile$WriterFactory.create(HFile.java:394)
    at org.apache.hadoop.hbase.util.CompressionTest.doSmokeTest(CompressionTest.java:108)
    at org.apache.hadoop.hbase.util.CompressionTest.main(CompressionTest.java:138)


As you can see, you now get "Snappy native library is available", but it is still not loaded.
If you want it to be loaded, you need to copy your Hadoop native libs too!
You will find the .so file under hadoop-1.0.3/lib/native/Linux-ARCH/libhadoop.so.

If you don't have the native files there, look into your hbase-0.94.x/lib folder to see which Hadoop version you are running, then go to https://archive.apache.org/dist/hadoop/core/ to download the related .tar.gz file. Extract it and you will find the native files.

Again, adjust ARCH for your architecture, and 1.0.3 for your Hadoop version.
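For example, on an amd64 machine with Hadoop 1.0.3 (again a sketch; adjust the archive name and paths for your versions):
tar xzf hadoop-1.0.3.tar.gz
cp hadoop-1.0.3/lib/native/Linux-amd64-64/libhadoop.so* /home/hbase/hbase-0.94.3/lib/native/Linux-amd64-64/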

Running the test again now gives this result:
12/12/03 10:37:48 INFO util.ChecksumType: org.apache.hadoop.util.PureJavaCrc32 not available.
12/12/03 10:37:48 INFO util.ChecksumType: Checksum can use java.util.zip.CRC32
12/12/03 10:37:48 INFO util.ChecksumType: org.apache.hadoop.util.PureJavaCrc32C not available.
12/12/03 10:37:48 DEBUG util.FSUtils: Creating file:file:/tmp/test.txtwith permission:rwxrwxrwx
12/12/03 10:37:48 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/12/03 10:37:48 WARN metrics.SchemaConfigured: Could not determine table and column family of the HFile path file:/tmp/test.txt. Expecting at least 5 path components.
12/12/03 10:37:48 WARN snappy.LoadSnappy: Snappy native library is available
12/12/03 10:37:48 INFO snappy.LoadSnappy: Snappy native library loaded
12/12/03 10:37:48 INFO compress.CodecPool: Got brand-new compressor
12/12/03 10:37:48 DEBUG hfile.HFileWriterV2: Initialized with CacheConfig:disabled
12/12/03 10:37:49 WARN metrics.SchemaConfigured: Could not determine table and column family of the HFile path file:/tmp/test.txt. Expecting at least 5 path components.
12/12/03 10:37:49 INFO compress.CodecPool: Got brand-new decompressor
SUCCESS


It's working!

As you can see, there is no need (at least with this version) to download and build hadoop-snappy.
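From there, you can enable Snappy on a table from the HBase shell, for example (the table and family names here are just examples):
create 'testtable', {NAME => 'cf', COMPRESSION => 'SNAPPY'}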

Now you need to replicate this on all your region servers, and I recommend configuring your servers to check Snappy availability at startup.
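One way to do that is the hbase.regionserver.codecs property in hbase-site.xml: with it set, a region server will refuse to start if one of the listed codecs is not usable. Something like:
<property>
  <name>hbase.regionserver.codecs</name>
  <value>snappy</value>
</property>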