Sunday, December 27, 2020

Trucks vs Trains as an analogy for Microservices vs Monoliths

2018 and 2019 were mostly spent obsessing over containers, trucks, trailers and handwritten paper invoices for me. I was helping build out the technology stack and engineering team for Lori Systems. Early in 2019 we made our first DevOps hire, getting Clive from Safaricom, and got started on migrating our handrolled Django monolith from EC2 to EKS. We would make jokes about shipping containers using containers. Clive even had a container-shaped stress ball with the EKS logo on it. This set me thinking about the parallels between shipping code and shipping goods, and perhaps also laid the foundations of this post.

Intermodal Shipping in the real-world and in software

Over the almost 2 years of work in logistics I learnt a lot about how the global logistics system works. It is almost like the lifeblood of the planet. Large container ships abstract away contents and ship things from Taiwan to Timbuktu. The seminal book on this topic is perhaps The Box. Watching global shipping lanes in Marine Traffic and scraping ships arriving in Mombasa from the KPA Sharepoint became a daily ritual. I digress; back to the original point on the importance of containerization in shipping code or machinery.

Docker uses the ubiquitous whale/ship logo, and most containers arrive at ports this way from the oceans of developers. I don't quite have an analogy here for the massive ships that land the containers at ports, some 500 or 1000 TEUs at a time. The analogy here covers the land transport aspects, somewhat related to how code runs in production and is typically served via datacenters / public clouds to users.

Containers themselves make transfer of goods/code from development (ships) to production (trains/trucks) easy. However, even containerized applications can demonstrate tight coupling similar to what a train has, in effect being a distributed monolith instead of a true suite of microservices. In my opinion, any system that requires a release train approach for new features is most likely a distributed monolith masquerading as microservices. The real flexibility comes from the low coupling between containers and the freedom to release each clearly delineated service at its own cadence on the roads.

Trains are awesome

My 5yo is currently obsessed with steam engines, even though they are from an era long gone. There is something magical about a powerful engine pulling everything along smoothly on a set of constraints (rails). It works nicely as long as no quick changes are needed in the carriages and everyone wants to get to the same destination. Trouble arises when something in the closely coupled chain of components goes awry and requires a quick change. I still don't understand the scene in Snowpiercer where a few wagons were dumped in a siding at speed. If we could do that one neat trick, perhaps monoliths would become much more maintainable. In the early stages of a product, monoliths are a nice simple entry point, especially if the features are narrowly scoped and well coupled. Conversely, a monolith may be a very good idea for a mature product which is not changing rapidly and perhaps needs to be optimised for performance instead, by introducing tight coupling to reduce communication overhead between components. In both cases a modular approach and service-oriented designs are still feasible, as long as the implementation and maintenance team is aware of the implications. People are still driving around in classic cars from the 1900s, whereas steam locomotives from that era are languishing in museums.

Trucks are flexible

One of the killer advantages of trucks in the logistics business is their ability to deliver right to the factory or warehouse loading bay. It is simply not feasible to build train tracks to serve every address. Even in areas with great railway infrastructure, buffers (known as inland container depots) have to be placed to cover the last few miles of transport from the rail to the industrial areas. This sort of mode can sometimes be seen in microservices being layered on older monoliths to provide user facing services, especially in banking systems. The other great advantage trucks have is the ability to overtake each other gradually along the road, which manifests itself in software systems as rolling deployment of new features. Such an approach requires careful management of the stateful parts of the system such as storage and database schemas. Otherwise it turns into a Fast and Furious game of stealing a container from a moving platform, aka the Romanian Rollover.

This analogy is not new

Logistics analogies are rife in software engineering: we ship code, we package things, we have release trains. The largest real-world container orchestration organization, Maersk, uses a 7-point logo surprisingly similar to that of the most popular software container orchestration platform, Kubernetes. I will continue updating this post as more ideas and links come together.

You can engage with this article via comments or the twitter thread.

Sunday, October 4, 2020

Desktop Software API's in Python (KiCAD, FreeCAD, Blender, QGIS)

Python wraps around everything

For the last couple of years I have mostly written Satellite Data Processing code in Python and plenty of Flask/Django web services. However, Python is also an excellent automation tool for GUI based applications, allowing users to write custom plugins and extend the functionality provided out of the box.

The first desktop application I seriously looked at Python plugins for was QGIS. It was early days of learning how to wrap C++ code using SWIG/SIP etc. In the old mailing list you can find a much younger me making inane comments about mixing wrapper metaphors in QGIS with SWIG + SIP. We have come a long way since then and SIP based bindings are the mainstay of QGIS plugins.


QGIS has so many Python plugins that they need a registry of their own. Occasionally QGIS Python gets twisted around itself due to multiple Pythons in the user environment. You can also flip the Python API around and, instead of building a plugin, turn QGIS into a custom desktop application, which is what I have done with my basic Airport Viewer demo.

Since QGIS is a fairly extensive and complex C++ application which takes hours to compile, being able to make small quick changes in Python is invaluable.


At the time of writing KiCAD has an extensive Python API for automating the PCB layout part of the workflow, and this has led to many innovations in automating traditionally laborious hand layout or even performing complex simulations / optimization to set trace lengths. For example Josh Johnson has one for laying parts out in a circle and Greg Davill has several for length matching and rendering file generation. My personal favourite among the KiCAD scripts is the one for generation of Interactive BOM.

I am really looking forward to Python script support in the Schematic Editor. Meanwhile programmatic Schematic generation tools like Skidl provide schematic oriented Python fun.

The rendering of the PCBs is often done in Blender, which has its own set of Python niceties.


My first foray into creating a Blender API based application was during the Kinect USB protocol hacking days. The data stream had just been decoded and I wanted an easy pipeline to a commonly installed / open-source 3D display software. The Python API is mature enough these days for people to quickly put together motion capture plugins for Blender. This plugin however demonstrates the challenges of creating native plugins for Blender: the .pyd files for Python have to be recreated for different versions of Blender for ABI compatibility.

Getting the binaries working has had me thrashing about and posting in forums, then sticking to a working Blender build with Python 2.7 for about 5 years since I did not want to touch it and break it. My integration actually reversed the embedding process, i.e. instead of using additional modules in the Blender embedded python I embedded Blender in a 3D GIS automation.

Native plugin weirdness aside, Blender Python API is a really powerful tool for creating procedural objects from waves / fluid simulation to astrophysics with amuse.


FreeCAD is sort of the third part of my physical electrical / mechanical design triumvirate. I occasionally design parts for KiCAD in FreeCAD, or bring multiple boards together to test enclosure fit. FreeCAD also has an extensive Python library which is leveraged by KiCAD part library maintainers to parametrically generate parts.

The scripting in FreeCAD can be used much like the PCB layout scripts in KiCAD to create things with circular symmetry, like ball bearings, which are difficult and repetitive to do by hand.
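
As a rough illustration of the circular-symmetry idea, here is a minimal sketch in plain Python that computes evenly spaced positions and orientations around a circle. It deliberately does not call the FreeCAD or KiCAD APIs; the function name `circular_positions` and its parameters are my own for illustration, and the resulting coordinates would still need to be passed to whatever placement call the host application provides.

```python
import math


def circular_positions(count, radius, center=(0.0, 0.0), start_deg=0.0):
    """Return (x, y, rotation_deg) for `count` items spaced evenly on a circle."""
    if count <= 0:
        raise ValueError("count must be positive")
    cx, cy = center
    step = 360.0 / count  # angular spacing between neighbouring items
    positions = []
    for i in range(count):
        angle_deg = start_deg + i * step
        angle = math.radians(angle_deg)
        x = cx + radius * math.cos(angle)
        y = cy + radius * math.sin(angle)
        # The rotation points each item outward along its radius.
        positions.append((x, y, angle_deg % 360.0))
    return positions


# Eight balls of a hypothetical bearing on a 10 mm pitch radius.
for x, y, rotation in circular_positions(8, 10.0):
    print(f"x={x:7.3f} y={y:7.3f} rot={rotation:5.1f}")
```

Keeping the geometry in a pure function like this makes it easy to check outside the CAD tool before wiring it into a plugin.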

Final words

There are lots of other pieces of desktop software I have used that have started shipping with Python APIs to address the never ending demand from users to easily automate repeated tasks. The live process for making this blogpost, in somewhat recursive fashion, can be found here.

I have even made videos with a proprietary one; I will leave that here for anyone interested in my attempts at a voiceover.

Sunday, August 30, 2020

Compiling QGIS in MSVC in 2020

Compiling QGIS on Windows in 200x

I don't quite remember when I decided to help compile QGIS on Windows. It was somewhere between compiling GDAL with ECW support for Photoshop on Windows and getting carried into Direct3D and C# land with NASA WorldWind. It was sometime in the 2000's while still working at Apogee Imaging in Lobethal.

At that point I was manually building a database of the footprints of satellite imagery that filled up a wall cabinet with CDs and DVDs. The technique was something like: open up the image, go around the edges and trace a polygon. This was in the days before mature boolean thresholding and reliable/easy raster-to-vector logic.

I hopped on IRC on #qgis in Freenode and chatted with luminaries like timlinux, frankw and gsherman, and listened to the automated notifications from sigq, the commits bot. Things were heating up and, instead of a Linux cross-compile to Windows using MinGW, something native to Windows was desired, say using MSYS+MinGW instead of Cygwin. A lot of GDAL and Qt worked in MinGW, so presumably QGIS would too. So I set myself to put together an MSYS environment with all the third-party dependencies that could be used to happily build QGIS. Eventually I built a release in NSIS as well.

My MSYS environment got packed in a zip and shared with the rest of the community via FTP/HTTP on a VPS I had back then. I earned myself a pin in the QGIS core contributor map in Adelaide, something I am very proud of to this day. Eventually the MinGW build got deprecated and native MSVC builds were supported. That's how contributions work, nothing lasts forever. In my IRC days, I helped on-board Nathan Woodrow to QGIS, who in turn I believe helped on-board Nyall Dawson. Nyall has surpassed us all in feature contributions and work on QGIS.

Fast forward to 2020, compiling QGIS in MSVC

I am getting back into doing lots of open-source work after a long hiatus in private industry with Aerometrex and start-up land with Lori Systems. It is great fun working mostly in the open at Geoscience Australia. There is actually a recently archived opendatacube + qgis repository here. Seeing that repo and speaking to Nathan and LinuxConfAu inspired me to have a go at getting back into actively working on the QGIS code base. It has sprawled out, with lots and lots of new features. The build system is still familiar via CMake and actually much easier now with MSVC. I cast around for a recent guide and found this. The guide mostly works, however I made some refinements.

  • Ditched bison and flex via Cygwin in favour of the ones available via Msys2. These can be found here. Not needing the whole Cygwin system helps in keeping the Windows build system light. Simply download the binaries and add them to the Osgeo4W binaries directory.
  • Captured my CMakeCache.txt to make it easier to reproduce and debug the build environment for others.
  • Used Incredibuild in demo mode to use a few NUCs I have lying around to speed up the build. Recording while building failed the first time and worked the next. The whole build from scratch still took around 35 minutes overall.

I am planning to throw some of my day to day DevOps skills towards the QGIS project and start helping again with Raster enhancements and windows release management. Perhaps getting Incredibuild in the hands of the windows maintainers will help tighten up the iteration cycle and make testing easier.

The twitter thread/ stream of consciousness edition of this is available as well.

Wednesday, July 22, 2020

Microservices the hard way - folders in EC2 Instance

For day to day work I wrangle containers in EKS these days. However when doing personal projects EKS is a luxury (baseline cost being $70 or so per month). So I decided to do microservice development for the rain radar project using no Docker, no Kubernetes but using:

  • multiple venvs
  • multiple service folders
  • environments in .env files (secrets in plain text)
  • web service start using @reboot in Cron
  • scheduled services using ... ya Cron

The whole thing started with noble intentions to use Lambdas all the way; however I got stuck using S3-SNS to trigger the Lambda and decided to scan the S3 bucket using timestamps to find the latest files to process. More on the pitfalls of that later.
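
The timestamp scan can be sketched roughly as below. This is a minimal illustration, not the code running in the project: the function name `newer_objects`, the `radar/` keys and the timestamps are hypothetical, and the input is assumed to be a list of dicts with `Key` and `LastModified` entries, which is the shape of the `Contents` list returned by boto3's `list_objects_v2`.

```python
from datetime import datetime, timezone


def newer_objects(objects, since):
    """Return (keys, new_watermark) for objects modified after `since`.

    Keys come back oldest first so they can be processed in arrival order.
    """
    fresh = sorted(
        (obj for obj in objects if obj["LastModified"] > since),
        key=lambda obj: obj["LastModified"],
    )
    keys = [obj["Key"] for obj in fresh]
    # If nothing is new, keep the old watermark so the next scan repeats it.
    new_watermark = fresh[-1]["LastModified"] if fresh else since
    return keys, new_watermark


def utc(hour, minute):
    """Helper for building the hypothetical timestamps below."""
    return datetime(2020, 7, 1, hour, minute, tzinfo=timezone.utc)


# Hypothetical listing; real entries would come from the S3 API.
listing = [
    {"Key": "radar/c.bin", "LastModified": utc(0, 12)},
    {"Key": "radar/a.bin", "LastModified": utc(0, 0)},
    {"Key": "radar/b.bin", "LastModified": utc(0, 6)},
]
keys, watermark = newer_objects(listing, since=utc(0, 0))
print(keys, watermark)
```

Note that the strict `>` comparison means an object sharing exactly the watermark timestamp is skipped on the next scan, so the watermark has to be stored and compared with care.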

The major microservices are:

  • Raw radar data preparation using custom hand crafted algorithm, being ported to Rust here.
  • Inserting prepared data to DynamoDB as a sparse array and serving this via Flask.
  • Nowcasting using the timeseries of sparse array of rain observations also serving results via Flask.
  • Capturing rain events and nowcasts and creating text and gif to send to twitter.

Each of these applications consumes the others to some extent and is sort of separated in responsibility. I decided to deploy them with a basic folder per application in the /home/ubuntu directory, with a venv per folder.

I had it like this for a while. Then I got tired of sshing into the box and git pulling in each folder. So I decided to write a fabfile per application which would do this for me, and created deployment keys which would be used to pull the code to this folder. Then I got tired of running multiple fabfiles and decided to set up a polled process which ran the fabfiles and git synced the code from a master pipeline.

Eventually I got around to bootstrapping the whole VM using Packer + Ansible playbooks. The development work for it was done locally using Vagrant with Hyper-V as the VM provider to test the same Ansible playbooks. I will follow up on this with a few characters on twitter.

Once the initial Packer AMI is established the choice is to either keep building this image or to move away from the whole VM based old-school stuff to a more modern/fun Kubernetes way.

Monday, June 1, 2020

Replicating Databases in RDS

One Sunday in 2018 I sat for a whole day in Art Caffe on the ground floor of Yaya Centre in Nairobi on the phone to Norman at AWS Support in Cape Town discussing DMS for MSSQL servers. After a whole day of screen sharing and being on call we decided that what we were trying to do was not achievable, but AWS was working on it. The next day AWS sent me an NDA (since expired).

Data replication from on-prem Database instances or between cloud database instances is an issue that comes up all the time. I have hands on experience doing this a couple of times now. This post summarizes my 3 or so attempts at doing this with different sources and targets and lessons learnt.


At the start-up I was working at we adopted a pre-built mini ERP, it covered logistics workflows and finance / billing aspects. It was built in the 2000's in .NET Classic and ran on IIS and MSSQL server. Quickly the MSSQL became the single point of lack of scalability in the system. Since AWS does not natively support read-replicas for MSSQL RDS instances I looked at DMS to create these replicas. DMS did not quite work as expected and led to the conversation alluded to above with Norman. I ended up performing replication using CloudBasic as the SaaS provider for managing the Change Tracking and handling schema changes in the source tables and propagating them to the target replicas. The replication was fine for single replicas, but quickly bogged the source database down as I added more replicas.

As an aside, the same database was also being replicated to a Redshift cluster for BI usage using tooling provided by Periscope Data.

As part of this exercise I came to appreciate the advantage of write-only / append-only schemas in improving no-lock replication performance (at the cost of storage), and also the need for timestamp columns such as update_time to perform incremental data transfer. I spent a lot of time reading the Schemaless articles by Uber Eng around building Schemaless DBs on top of RDBMSs like MySQL. I don't 100% agree with their design choices but they add an interesting perspective. The bottom line: CRUD at scale is HARD.
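
To make the update_time idea concrete, here is a minimal sketch of watermark-based incremental transfer. It uses in-memory SQLite from the Python standard library purely as a stand-in for the real source and target databases, and the `shipments` table, its columns other than update_time, and the function name `incremental_copy` are hypothetical. Timestamps are stored as ISO 8601 strings so that plain string comparison orders them correctly.

```python
import sqlite3


def incremental_copy(source, target, watermark):
    """Copy rows changed after `watermark` to the target; return the new watermark."""
    rows = source.execute(
        "SELECT id, payload, update_time FROM shipments "
        "WHERE update_time > ? ORDER BY update_time",
        (watermark,),
    ).fetchall()
    # Upsert so that rows updated on the source overwrite their older copies.
    target.executemany(
        "INSERT OR REPLACE INTO shipments (id, payload, update_time) VALUES (?, ?, ?)",
        rows,
    )
    target.commit()
    return rows[-1][2] if rows else watermark


def make_db():
    """Create an empty in-memory database with the hypothetical schema."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE shipments (id INTEGER PRIMARY KEY, payload TEXT, update_time TEXT)"
    )
    return db


source, target = make_db(), make_db()
source.executemany(
    "INSERT INTO shipments VALUES (?, ?, ?)",
    [(1, "a", "2018-06-01T10:00:00"), (2, "b", "2018-06-01T11:00:00")],
)
watermark = incremental_copy(source, target, "")  # first run copies everything

# A later update on the source moves only the changed row on the next run.
source.execute(
    "UPDATE shipments SET payload = 'b2', update_time = '2018-06-01T12:00:00' WHERE id = 2"
)
watermark = incremental_copy(source, target, watermark)
print(target.execute("SELECT * FROM shipments ORDER BY id").fetchall(), watermark)
```

The sketch only works because every write touches update_time; deletes are not captured at all, which is one reason append-only tables are easier to replicate this way.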

RDS PostgreSQL to PostgreSQL using DMS

Fast forward a year or so, and I am now working at Geoscience Australia, with Digital Earth Australia. Everything runs on Kubernetes and is highly scalable. The single point of lack of scalability is again the database. A pattern seems to be emerging here. We were performing a cluster migration in Kubernetes and I offered to investigate DMS again.

In the MSSQL scenario there is a small prologue: I had previously migrated around 1 million cells from a massive Google Sheet to the MSSQL database at the start of my tenure at the startup, and by the time we hit scalability issues in the single instance MSSQL we were at 10 million rows in the largest append-only table. The PostgreSQL migration of the datacube tables featured 8-9 million rows in the largest table. However the database also has lots of indexes and uses PostGIS for some applications, particularly Datacube Explorer. DMS fell down in support for the Geometry columns; however I learnt a lot in setting up DMS using Terraform IaC and fine tuning for JSON blob columns, which in the Datacube design in the 1.8.x series can be up to 2MB in size. DMS migrates standard columns separately from LOB columns.

Ultimately DMS was not feasible for datacube DB migration to a new RDS instance. However I believe core datacube can be migrated next time I try, with the applications depending on materialized views and PostGIS set up afresh on the new DB. Also, by the time I try again Amazon may have better PostGIS support. For the cluster migration we ended up using a snapshot of the last DB.

On-prem PostgreSQL to RDS PostgreSQL

There is a Datacube PostgreSQL DB instance at the NCI which needs to be regularly replicated to RDS. It powers the Explorer Datacube application. However, migrating a DB from one server without direct disk access to RDS, where we also don't have disk access, using pg_dump / pg_restore is a long running task when the largest tables are around 22 million rows and the compressed dump is around 11GB. Ideally we sort something out that is incremental using update_time generated using triggers. The options explored so far are:

  • Using an Airflow DAG with Kubernetes Executors wrapped around pg_dump/restore with some application specific details.
  • Using COPY from S3 support for Aurora PostgreSQL, CSV's restored using the COPY command are generated incrementally.
  • Using PostgreSQL publish / subscribe and Logical Replication. Networking over the internet to maintain connectivity securely to the on-prem instance via SSH Tunnels and to the RDS instance via EKS port-forwarding.

Thursday, May 7, 2020

ADE7816 Energy Monitor

I have been meaning to try out the ADE7816 Single-phase 6 current channels energy monitor for a while. However time has been lacking for the last couple of years. Finally I have a working version with successful board bring-up and a semi-working Micropython driver, with an Arduino driver in the works.

PCB Design

The PCB design process for this was not easy mostly due to a footprint choice mistake on my part. I had placed the 5x5mm QFN part instead of the 6x6mm QFN part in KiCAD. This made the DRC fail everywhere in standard settings. However it ended up being a collaboration opportunity with Greg Davill who loves to practice and photograph bodging stuff. So I now have a work of art at hand instead of a non-functional board.

I am even debating whether to place the rest of the parts and possibly take away from the dead-bug awesomeness. Next time I need to order parts in advance and make sure I do 1:1 prints to verify footprints before pulling the trigger on PCBs.

Energy Monitor Details

Now for more about the energy monitor. This ASIC features 3 single-ended and 3 differential current inputs and a single-phase voltage input, all in a very compact 40-pin 6x6mm QFN package. In fact the PCB is large on purpose to accommodate ease of use with stereo-jack type current clamps. The main usage would be in standard households where there are typically 3-4 lighting circuits, 1-2 socket circuits and a dedicated air conditioning circuit. A single energy monitor could be built to monitor all channels using a single ASIC and leave out fancy NILM stuff from worrying about the lights. The socket circuits could have anything plugged into them and can potentially have point-of-load monitoring instead of breaker board based monitoring. All this translates to more data being generated for IoT platforms, and some sensible firmware work needs to be done to handle that.

ADE7816 Driver Development

This is still a work in progress. I have done some initial exploration to find prior art. Nothing exists yet for Arduino; however there are some register lists from a Javascript driver written for the now defunct Intel Edison.

Intel never quite had the maker market pinned right to market that board, and it makes me sad to think of all the engineering hours sunk into a now defunct platform. Open-source software / hardware helps us salvage some of that. I also sped up the register listing by copy-pasting the ubiquitous table from the ADE7816 datasheet and dropping it into a Jupyter notebook to parse all the registers; not as fancy as the PDF parser I had built before, but much more reliable.
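
The copy-paste-and-parse step might look something like the sketch below. The column layout (address, name, access, bits, description) and the two rows are assumptions made up for illustration; the real names and addresses have to come from the ADE7816 datasheet, and the regular expression would need adjusting to match however the table actually pastes.

```python
import re

# Hypothetical pasted rows; NOT real ADE7816 registers.
PASTED = """\
Address Name R/W Bits Description
0x1000 REG_A R/W 24 Example gain register
0x1001 REG_B R 32 Example reading register
"""

# One row: hex address, register name, access mode, bit length, free text.
ROW = re.compile(r"^(0x[0-9A-Fa-f]+)\s+(\w+)\s+(R/W|R|W)\s+(\d+)\s+(.*)$")


def parse_registers(text):
    """Map register name -> dict(address, access, bits, description)."""
    registers = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # skip headers, page furniture and wrapped lines
        address, name, access, bits, description = match.groups()
        registers[name] = {
            "address": int(address, 16),
            "access": access,
            "bits": int(bits),
            "description": description,
        }
    return registers


for name, info in parse_registers(PASTED).items():
    print(f"{name} = 0x{info['address']:04X}  # {info['access']}, {info['bits']} bits")
```

Printing the result as constants like this gives something that can be pasted straight into a driver module and diffed against the datasheet by eye.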

My driver development followed the now tried and tested Micropython + Jupyter Notebook + Logic Analyzer path. I used an ESP32 feather as host processor with standard Micropython loaded and probed the SPI bus with read-write packets for known registers until the protocol gave in and started responding with some values. The ASIC is super versatile in supported protocols: I2C Slave, SPI Master and SPI Slave modes are all viable. So developing a fully functional driver supporting all the possible modes will take a while. The initial work so far is on the SPI slave mode since all my other work in DIN rail and Featherwing formats is linked to the SPI bus; however the I2C mode can be really interesting for host processors with fewer pins and flaky SPI support (while having solid I2C support, like the Onion).

If anyone is interested in driver development I am happy to send you a board or you can get one yourself from Oshpark, Aisler or PCBWay. Once the drivers mature I will list it for a wider audience on Tindie.

Saturday, April 4, 2020

Distributed Locking from Python while running on AWS

In the day and age of eventually consistent web-scale applications the concept of locking may seem very archaic. However in some instances attempting to obtain a lock and failing to do so within a limited window can prevent dogpile effects for expensive server side operations or prevent over-write of already executing long running tasks such as ETL processes.
I have used 3 basic approaches to create distributed locks on AWS with the help of built-in services, and accessed them via Python, which is what I build most of my software in.

File locks upgraded to EFS

File based locks in UNIX file-systems are very common. They are typically created using the flock command, available in Python under the OS-specific flock API. Also check out the platform independent filelock. This is well and good for a VM or single application instance. For distributed locking, we will need EFS as the filesystem on which these locks are held; the Linux kernel and NFS will use byte-range locks to help simulate locally attached file system type locks. However if the client loses connectivity the NFS lock-state cannot be determined, so better run that EFS with enough replicas to ensure connectivity.
File locking this way is very useful if we are using EFS for holding large file and processing data anyway.
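
A minimal sketch of the single-host pattern, using `fcntl.flock` from the Python standard library on Linux, is shown below. The helper names `file_lock` and `try_locked` and the `etl.lock` file are hypothetical, and the demo runs against a local temporary directory; how it behaves on an EFS mount depends on the NFS client behaviour described above.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def file_lock(path):
    """Hold an exclusive, non-blocking flock on `path`.

    Raises BlockingIOError straight away if another holder has the lock.
    """
    fd = os.open(path, os.O_CREAT | os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        try:
            yield fd
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


def try_locked(path):
    """Return True if the lock could be taken, False if it is already held."""
    try:
        with file_lock(path):
            return True
    except BlockingIOError:
        return False


lock_path = os.path.join(tempfile.mkdtemp(), "etl.lock")  # hypothetical lock file
with file_lock(lock_path):
    # A second open of the same file is refused while the first lock is held.
    held_elsewhere = try_locked(lock_path)
free_again = try_locked(lock_path)
print(held_elsewhere, free_again)
```

Failing fast with `LOCK_NB` rather than waiting is what lets a second worker skip a task that is already running instead of piling up behind it.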

Redis locks upgraded to ElastiCache

Another popular pattern for holding locks in Python is using Redis. This can be upgraded in the cloud-hosted scenario to Redis on ElastiCache, which pairs well with the redis-lock library.
Using Redis requires a bit of setup and is subject to similar network vagaries as EFS. It makes sense when Redis is already in use as an in-memory cache for acceleration or as a broker/results mechanism for Celery. Having data encrypted at rest and in transit may require running a Stunnel proxy.

An AWS only Method - DynamoDB

A while ago AWS published an article for creating and holding locks on DynamoDB using a Java lock client. This client creates the lock and holds it live using heart-beats while the relevant code section executes. Since then it has been ported to Python and I am maintaining my own fork.
It works well and helps scale-out singleton processes run as Lambdas to multiple lambdas in a serverless fashion, with a given lambda quickly skipping over a task another lambda is holding a lock on. I have also used it on EC2 based stuff where I was already using DynamoDB for other purposes. This is possibly the easiest and cheapest method for achieving distributed locking. Locally testing this technique is also quite easy using local-dynamodb in a docker container.
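
The core trick behind a DynamoDB lock is a conditional write, which can be sketched as below. This is not the lock client's actual implementation and it leaves out the heartbeat entirely; the table name, the attribute names (`lock_key`, `owner_id`, `expires_at`) and the function `acquire_request` are assumptions for illustration. The function only builds the arguments for a low-level `put_item` call, so it can be read and checked without AWS credentials.

```python
import time


def acquire_request(table, lock_key, owner_id, lease_seconds, now=None):
    """Build put_item arguments that succeed only if the lock is free or expired."""
    now = int(time.time()) if now is None else int(now)
    return {
        "TableName": table,
        "Item": {
            "lock_key": {"S": lock_key},
            "owner_id": {"S": owner_id},
            "expires_at": {"N": str(now + lease_seconds)},
        },
        # The write is rejected if a live lock item already exists.
        "ConditionExpression": "attribute_not_exists(lock_key) OR expires_at < :now",
        "ExpressionAttributeValues": {":now": {"N": str(now)}},
    }


# Hypothetical names throughout.
request = acquire_request("locks", "nightly-etl", "lambda-1", lease_seconds=30, now=1000)
print(request["ConditionExpression"], request["Item"]["expires_at"])
```

With boto3 this dict would be passed as `client.put_item(**request)`, and a `ConditionalCheckFailedException` would mean another worker holds the lock, so the caller can skip the task.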
Feel free to ping me with other distributed locking solutions that work well on AWS and I will try them out.