Tuesday, February 22, 2011

Out with the old (SPI) - Modular Java with OSGi

Renewal starts with death. For me renewal often manifests itself in the form of a real or imagined virus detection on my laptop. It prompts me to delete large swaths of stuff and replace it with new things. The latest outbreak was FakeJava, so I promptly deleted my old faithful JDK/JRE, Eclipse and anywhere else in the system where Java installers were lurking. For a Java developer this is pretty drastic. It gave me the chance to update my Eclipse to Helios and try the new, improved OSGi container implementation lurking in the Eclipse plugins folder. If you are a Java developer of any calibre and not fatally attracted to vim and emacs, you probably have Eclipse installed (or had it at some stage). Anyway, long story short - dig inside the Helios install and you shall see something along the lines of org.eclipse.osgi_3.6.1.XXX.jar. This is the source of all Eclipse plugin goodness (no, it is not all because of the SWT).

Start this jar as the OSGi container: java -jar org.eclipse.osgi_3.6.1.XXX.jar -console will launch it in interactive mode. If you are into IDE tooling (that is the reason you are using Eclipse in the first place anyway), there are a couple of options to make OSGi play nicely with you - Eclipse PDE or BndTools. I have written up a few brief steps based on my experience getting a non-SWT, OSGi-bundle-based WorldWind GUI going. The basic ingredients were already there:

1) Eclipse PDE tutorial to wrap JOGL (based on JogAmp, but it works just as well with classic JOGL).
2) Shrink-wrapped Eclipse projects to use with WorldWind, including JOGL and SWTGLCanvas (which I did not quite get to run).
3) BndTools component-based development, somewhat similar to fragments.
4) Using the Eclipse Bundle Repository plugin to pull in the PDE-based bundles, exported out of Eclipse using the Export Application wizard (plus some platform-specific forcing hackery). Add bndtools.bndplugins.repo.eclipse.EclipseRepo;location=${worldwind-repo};name=WorldWind-repo to build.bnd in the bndtools main repo and the freshly exported PDE bundles will become available to BndTools. This step was the most frustrating and probably would not be needed if you stick with pure PDE or pure BndTools, but I had a lot of fun mixing and matching.
5) After everything is set up you can create a project along the lines of the component tutorial, using IGlobe instead of IGreeting, and get WorldWind to provide you with a JPanel (see the sketch after this list).
6) BndTools does not add the contents of the jar-in-jar bundles PDE produces to the build path, so make sure you add worldwind.jar and JOGL to the path for a sample build. At runtime OSGi will take care of this.
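As a rough illustration of step 5, here is the sort of Declarative Services component the globe-impl bundle might publish. IGlobe and createGlobePanel are assumptions in the spirit of the bnd component tutorial, not WorldWind API; only WorldWindowGLCanvas and the configuration call are WorldWind's own.

import java.awt.BorderLayout;
import javax.swing.JPanel;

import gov.nasa.worldwind.Model;
import gov.nasa.worldwind.WorldWind;
import gov.nasa.worldwind.avlist.AVKey;
import gov.nasa.worldwind.awt.WorldWindowGLCanvas;

import aQute.bnd.annotation.component.Component;

// Publishes the hypothetical IGlobe service; consumers only see the API bundle,
// never the JOGL/WorldWind internals wired up behind it.
@Component
public class GlobeComponent implements IGlobe {
    public JPanel createGlobePanel() {
        WorldWindowGLCanvas canvas = new WorldWindowGLCanvas();
        // Standard WorldWind bootstrapping: build the default model (globe + layers)
        Model model = (Model) WorldWind.createConfigurationComponent(AVKey.MODEL_CLASS_NAME);
        canvas.setModel(model);
        JPanel panel = new JPanel(new BorderLayout());
        panel.add(canvas, BorderLayout.CENTER);
        return panel;
    }
}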

After this all the bundles will be playing happily in the OSGi framework, and things will look something like this:


START LEVEL 1
   ID   State         Level  Name
[   0] [Active     ] [    0] OSGi System Bundle (3.6.1.R36x_v20100806)
[   1] [Active     ] [    1] org.trikend.glob3.globeui (0)
[   2] [Active     ] [    1] Apache Felix Shell Service (1.4.2)
[   3] [Active     ] [    1] Apache Felix Declarative Services (1.4.0)
[   4] [Installed  ] [    1] JOGL native bindings for Linux x86 (1.1.2)
[   5] [Active     ] [    1] org.trikend.glob3.globe-impl (0)
[   6] [Active     ] [    1] JOGL (1.1.2)
[   7] [Active     ] [    1] osgi.cmpn (4.2.1.201001051203)
[   8] [Active     ] [    1] org.trikend.glob3.api (0)
[   9] [Installed  ] [    1] JOGL native bindings for MacOSX (1.1.2)
[  10] [Active     ] [    1] Worldwind (1.0.0)
[  11] [Resolved   ] [    1] JOGL native bindings for Windows x86 (1.1.2)
[  12] [Active     ] [    1] Apache Felix Shell TUI (1.4.1)


Here is a screenshot for further clarity (or lack thereof).

[screenshot: WorldWind running from BndTools]

Saturday, February 19, 2011

Raster to Vector - and things in between

Coming from the remote sensing world, your day revolves around processing datasets from raster to vector and back. Grab a stereo satellite image of Melbourne, trawl through all the buildings in the Leica stereo view and digitize their ground footprints, save to shapefile/KML/whatever vector format is the flavour of the day, then render back to raster in 2.5D/3D. There is always important information in the cracks between pixels and between the delineated polygons and lines, and it gets mangled in the constant transformation.

Spending some time in the NetCDF world, I have become aware of the variety in scientific data and how the rigid raster/vector mentality of the GIS world does not really fit with it. Last year I went to the ARSPC conference in Alice Springs, where Chris Tweedie (technical sales at ERDAS) gave a talk on the joys of JPEG2000 and how all sorts of data can be stored in this format. There was somebody at the back of the room from the IMOS/TERN projects grumbling about the disparity between the expectations of the GIS world and the scientific datasets he is used to hosting.
Regular vs Unstructured Grid
NetCDF gives nice and fast access to gridded datasets, with coordinate conventions allowing storage of data not only on cartesian grids but also on curvilinear grids, as long as the parameters producing the grid are properly defined. Now there is a move towards making this rather raster-like approach more vector-like with the idea of unstructured grids. NetCDF will then become an even more general (and more complex) representation of all possible data - raster, vector and everything in between. There is a real need for a generalised (non vector/raster) representation of geographic information, something that emphasises utility rather than conformance. The software that handles this data, and ultimately the memory models, will need to think differently (in parallel) to stay in touch with reality. It does not help at all that there is a nomenclature clash: vectors mean winds and currents in CMAR, and points and polygons in GIS.
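For what it is worth, the regular vs curvilinear distinction shows up directly in the coordinate variables: on a curvilinear grid lat and lon are themselves 2-D arrays. A rough netcdf-java sketch (the file name and variable names are made up) to peek at this:

import ucar.ma2.Array;
import ucar.nc2.NetcdfFile;
import ucar.nc2.Variable;

public class GridPeek {
    public static void main(String[] args) throws Exception {
        // Hypothetical ocean model output on a curvilinear grid
        NetcdfFile nc = NetcdfFile.open("ocean_model_output.nc");
        try {
            Variable lat = nc.findVariable("lat"); // lat(y, x) on a curvilinear grid, 1-D on a regular one
            Variable lon = nc.findVariable("lon");
            System.out.println("lat rank = " + lat.getRank()); // 2 => curvilinear, 1 => regular axis
            System.out.println("lon rank = " + lon.getRank());
            Array latValues = lat.read(); // the full coordinate array
            System.out.println("coordinate values stored: " + latValues.getSize());
        } finally {
            nc.close();
        }
    }
}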

That aside, recently I have spent a fair bit of time migrating from SVN to Mercurial. The OTB project made me fall in love with Mercurial, and afterwards Google Code fostered the obsession with making commits while offline - aeroplane mode. There are myriad ways of migrating from SVN to Hg: the stock convert extension, hgsvn or hgsubversion. You can even stay attached to your beloved Subversion using subrepository techniques. Not everything, however, is smooth, and some strange empty merging takes place along the way due to the repeated rebasing needed to stay in sync with SVN; hopefully the limbo will be over soon.
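For the record, the convert extension route is roughly this (the repository URL and directory names are placeholders); re-running the convert command pulls across any new SVN revisions incrementally:

# enable the bundled convert extension in ~/.hgrc
[extensions]
convert =

# convert the SVN trunk into a new Mercurial repository
hg convert http://example.org/svn/myproject/trunk myproject-hg
cd myproject-hg
hg update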

Sunday, February 6, 2011

Hash conflicts - let's keep it complicated

Hashing (or cryptology in general) is one place where the keep-it-simple-stupid principle is often ... well, quite stupid. I have run into hash collisions in weird and wonderful ways over the last year.

1) Firefox bug - hash collisions due to Google Maps tile URLs being too similar. The Firefox URL hash is based on simple circular shifts, and Google ended up appending permutations of the word Galileo to make the URLs more distinct and less susceptible to collisions (see the toy sketch after this list).
2) Stuxnet exploits - Windows stored CRC32 hashes for scheduled tasks, and these can collide, leading to malicious code being run with superuser privileges. The patch ended up changing the hashing to something less prone to collisions (SHA).
3) ESTA visitor registration site - my re-login ID collided with somebody from Perth and they called me to warn me. Even though a fair few organisations have my details, I am seriously disappointed in the US government's inability to protect my data.
4) Git, Mercurial and other DVCS - I moved over to the DVCS temple due to the weaknesses of classic version control systems: linear version numbers and poor merging. Encoding code change hunks with SHA makes patches much more unique and version numbers unambiguous; if you find SHA hashes hard to remember, just use tags. Collisions will only become a problem with code revisions reaching a billion or so - at that point your software project will go into the cryptography hall of fame.
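The failure mode behind the first item is easy to reproduce with a toy rotate-and-xor hash (this is not Mozilla's actual implementation, and the URLs are made up): with 4-bit rotations on a 32-bit value, characters eight positions apart land on the same bits, so swapping two digits eight characters apart produces a collision.

public class ToyUrlHash {
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = Integer.rotateLeft(h, 4) ^ s.charAt(i); // circular shift, then mix in the char
        }
        return h;
    }

    public static void main(String[] args) {
        // Two tile-style URLs that differ only by swapping two digits eight
        // characters apart; with 4-bit rotations of a 32-bit value those two
        // positions contribute at the same bit offset, so the hashes are identical.
        String a = "vt?x=1230&y=456";
        String b = "vt?x=5230&y=416";
        System.out.printf("%08x %08x%n", hash(a), hash(b));
    }
}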

The proliferation of information in today's world is such that there is a high likelihood of some overlap occurring. Consider the case of generating data pyramids for fast access; this trick is performed everywhere in the image processing world. If you are lucky enough to have the data in a format that supports built-in overviews, like TIFF, you can add them to the file itself. Otherwise you are stuck inventing your own pyramid storage scheme: .rrd files for ERDAS Imagine, .ovr files for OSSIM and so on. Now you have to make sure you store the pyramids somewhere with write access, and in case the data is on a CD you again create a temp file which is reusable but tied to the original file in an unambiguous manner - i.e. hashing the source file is needed.
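A minimal sketch of that last point, assuming SHA-256 and a per-user cache directory (all the paths here are made up): digest the read-only source image and use the digest to name the writable sidecar overview file.

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.MessageDigest;

public class PyramidCacheName {
    static String sha256(File file) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        InputStream in = new FileInputStream(file);
        try {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) > 0) {
                md.update(buf, 0, n); // stream the file through the digest
            }
        } finally {
            in.close();
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        File source = new File("/media/cdrom/scene.img");                // read-only source image
        File cacheDir = new File(System.getProperty("user.home"), ".pyramid-cache");
        File pyramid = new File(cacheDir, sha256(source) + ".ovr");      // writable sidecar overviews
        System.out.println("overviews for " + source + " live at " + pyramid);
    }
}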

It would be much nicer to use a file format which allows internal arbitrary overviews with compression; yes, I am talking about the obvious - JPEG2000. At some point there was talk of JPEG2000-based compression for NetCDF, but it turned out to be mostly talk. The NetCDF libraries already decode GRIB data using JPEG2000; it would be nice to have multi-dimensional data compressed, with arbitrary overviews, using the same scheme. There seems to be ongoing work on NetCDF streaming - ncstream - perhaps compression could feature there.

Friday, February 4, 2011

OTB-GPU and Amazon AWS (Cuda Enabled) - Cloud Processing

A while ago OTB did some experiments running image processing code on GPUs. It has not made it to the mainline yet, since we are waiting on ITK to add more structured and pervasive GPU support to their infrastructure. I thought it would be a nice bit of code to test the not-so-new but still shiny Amazon AWS CUDA support.

The preconfigured instance with the Nvidia CUDA toolkit installed runs CentOS 5. You will need additional repositories, such as RPMForge, to grab goodies like cmake and mercurial to get going. You will need lots and lots of version control tools, e.g. subversion and mercurial, and even compilers - I should have started from the GIS AMI. Getting GDAL installed as a dependency can be slightly tricky: the CentOS packages from ELGIS did not work for me, with lots of missing dependent libraries. The best bet is to install from source, then use a small hack to copy over some headers and build OTB.
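Roughly what the from-source route looks like (the version number, prefix and OTB paths are placeholders, and the CMake variables are the standard FindGDAL ones rather than anything OTB-specific):

# build GDAL from source when the packaged RPMs will not resolve
tar xzf gdal-1.8.0.tar.gz
cd gdal-1.8.0
./configure --prefix=/usr/local
make
sudo make install

# point the OTB build at the freshly installed GDAL
cd ~/OTB-build
cmake ~/OTB -DGDAL_INCLUDE_DIR=/usr/local/include -DGDAL_LIBRARY=/usr/local/lib/libgdal.so
make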
Hudson Nodes
Build servers like Hudson can easily make use of such an image on AWS once configured with the right version control, configuration and build tools. I will have to test drive the Hudson CMake support with this instance. Otherwise I have been playing puppet master at home with VirtualBox and real hardware. I got the swarm plugin to register most of my available platforms with Hudson as build slaves, including the BeagleBoard. AWS can be accessed via a similar cloud/cluster plugin. I found the VirtualBox plugin rather cryptic; I think I will have to use the source for that one. It can be very useful for multi-platform installer and GUI testing, even recording instruction videos by playing through a UI test suite. Especially when using bootstrappers, errors are not detected at compile time; they only become apparent when the program is run on a clean system. Having a set of clean virtual machine snapshots makes it much easier to track down errors before they are released into the client base.