Tuesday, September 10, 2013

JPEG2000 Encoding with CUDA

For the last few days we have been struggling to write out very large JPEG-2000 files efficiently. In all their wisdom, Erdas decided to supply only a 32-bit version of the ECW compressor. It seems to have trouble allocating memory for large JPEG-2000 files, is single-threaded and is dead slow. Somewhere in the Ermapper install directory there is also the Kakadu core DLL, so I decided to have a look at other, possibly faster and more scalable, ways of creating JPEG-2000. Kakadu seems to supply only an SDK, and we can test the speed of compression with it. GDAL also supports building against the Kakadu SDK, which would be handy for creating the large mosaics we have been struggling with. On modern hardware, though, even that does not seem to be the fastest option.

After some trials with Kakadu and a bit of reading I came across a few CUDA-based implementations: CUJ2K and JPEG2K. I decided to port to Windows the most recently maintained of the CUDA-based JPEG-2000 implementations, which claims to be the fastest on commodity hardware. The CMake-based build system makes things easier. The code, however, was written for POSIX systems, so I was missing a bunch of headers. I had to hack away at it and move variable declarations around to cope with the lack of C99 compliance in the MSVC C compiler; the alternative is to use the /TP flag, which compiles the C files as C++ and allows variable declarations anywhere. Lots of moved declarations later, everything worked fine.
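To illustrate the kind of change this involved, here is a minimal sketch. The function mean_sample is hypothetical and is not taken from the library; it only shows how C99-style mid-block declarations get hoisted to the top of the block so that a C89-only compiler accepts them.

```c
#include <assert.h>
#include <stddef.h>

/*
 * C99 style, as typically written for GCC on POSIX systems:
 *
 *     unsigned long sum = 0;
 *     for (size_t i = 0; i < n; ++i)      <- declaration inside for()
 *         sum += samples[i];
 *     int mean = (int)(sum / n);          <- declaration after a statement
 *
 * C89 style, below: every declaration sits at the top of its block,
 * before the first statement.
 */
int mean_sample(const unsigned char *samples, size_t n)
{
    unsigned long sum;
    size_t i;

    assert(samples != NULL || n == 0);

    if (n == 0)
        return 0;

    sum = 0;
    for (i = 0; i < n; ++i)
        sum += samples[i];

    return (int)(sum / n);
}
```

Compiling with /TP avoids these edits entirely, but the sources are then treated as C++, so stricter rules apply elsewhere (for example, the result of malloc needs an explicit cast).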

The compression library is built against FreeImage, which functions as the I/O driver for the encoder/decoder. However, only writing to the J2K profile (the raw codestream) is supported; a wrapper for the JP2 file format will be required to add all the projection metadata that GIS rasters typically require. To make this library more useful for GIS purposes we will need to add an I/O handler for GDAL instead of FreeImage, to allow faster compression/decompression.
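As a rough idea of what such a JP2 wrapper involves, the sketch below writes the minimal set of JP2 boxes (signature, file type, image header, colour specification) followed by the header of the codestream box, after which the raw J2K codestream can be appended unchanged. The function name jp2_write_header and its parameters are my own invention and not part of the library. It also assumes unsigned samples, the same bit depth for every component, and sRGB or greyscale colour. The georeferencing itself (for example a GeoJP2 UUID box carrying an embedded GeoTIFF) is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 12 (signature) + 20 (ftyp) + 45 (jp2h) + 8 (jp2c box header) */
#define JP2_HEADER_BYTES 85u

/* All multi-byte fields in JP2 boxes are big-endian. */
static uint8_t *put_u16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xFF);
    return p + 2;
}

static uint8_t *put_u32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
    return p + 4;
}

/* A box starts with its total length (header included) and a 4-char type. */
static uint8_t *put_box(uint8_t *p, uint32_t length, const char *type)
{
    p = put_u32(p, length);
    memcpy(p, type, 4);
    return p + 4;
}

/* Hypothetical helper: fills `out` (at least JP2_HEADER_BYTES long) with the
 * bytes that must precede a raw codestream of `codestream_len` bytes.
 * Returns the number of bytes written. */
size_t jp2_write_header(uint8_t *out, uint32_t width, uint32_t height,
                        uint16_t ncomp, uint8_t bit_depth,
                        uint32_t codestream_len)
{
    uint8_t *p = out;

    assert(out != NULL);
    assert(ncomp == 1 || ncomp == 3);              /* grey or RGB only  */
    assert(bit_depth >= 1 && bit_depth <= 38);
    assert(codestream_len <= 0xFFFFFFFFu - 8u);    /* no 64-bit XLBox   */

    /* JPEG 2000 signature box */
    p = put_box(p, 12, "jP  ");
    p = put_u32(p, 0x0D0A870Au);

    /* File type box: brand "jp2 ", minor version 0, one compatible brand */
    p = put_box(p, 20, "ftyp");
    memcpy(p, "jp2 ", 4); p += 4;
    p = put_u32(p, 0);
    memcpy(p, "jp2 ", 4); p += 4;

    /* JP2 header superbox = image header box + colour specification box */
    p = put_box(p, 45, "jp2h");
    p = put_box(p, 22, "ihdr");
    p = put_u32(p, height);
    p = put_u32(p, width);
    p = put_u16(p, ncomp);
    *p++ = (uint8_t)(bit_depth - 1);  /* BPC: depth minus one, unsigned  */
    *p++ = 7;                         /* compression type, always 7      */
    *p++ = 0;                         /* colourspace is known            */
    *p++ = 0;                         /* no intellectual property box    */
    p = put_box(p, 15, "colr");
    *p++ = 1;                         /* METH: enumerated colourspace    */
    *p++ = 0;                         /* PREC                            */
    *p++ = 0;                         /* APPROX                          */
    p = put_u32(p, ncomp == 3 ? 16u : 17u);  /* 16 = sRGB, 17 = greyscale */

    /* Contiguous codestream box header; the J2K bytes follow directly. */
    p = put_box(p, 8u + codestream_len, "jp2c");

    return (size_t)(p - out);
}
```

A caller would write these 85 bytes to the output file and then the encoder's J2K output verbatim. For the very large mosaics discussed here the 32-bit box length would not be enough, so a real wrapper would need the extended 64-bit length form for the codestream box.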

I have tested the JP2ECW, JP2OpenJPEG and JP2Kakadu drivers on a small file (26MB in .bmp format, since the Kakadu demo binary does not support the LZW-compressed .tif files I have lying around). The times, averaged over a few runs on an i7-3930K @ 3.2GHz, are shown below:
  • LibECW - 0.8s
  • Kakadu (1 Thread) - 1.5s
  • Kakadu (8 Threads) - 0.5s
  • OpenJPEG - 2s
Compared to these, the CUDA library on the GTX580 in the same machine runs in 1.2s (compression takes 0.75s; the rest is I/O). The logical next step is hooking up the compressor to GDAL for data-block I/O and building a JP2 wrapper in addition to the J2K profile which it currently uses. The Affero GPL licence also needs to be sorted out.


For those interested in the Windows build, here is the SVN diff against rev 84 and the pre-built release binaries built against CUDA 5.5.