Wednesday, July 22, 2020

Microservices the hard way - folders in EC2 Instance

For day-to-day work I wrangle containers in EKS. For personal projects, however, EKS is a luxury (the baseline cost is $70 or so per month). So I decided to do microservice development for the rain radar project with no Docker and no Kubernetes, using instead:

  • multiple venvs
  • multiple service folders
  • environments in .env files (secrets in plain text)
  • web service start using @reboot in Cron
  • scheduled services using ... ya Cron

The whole thing started with noble intentions to use Lambdas all the way. However, I got stuck using S3-SNS to trigger the Lambda and decided to scan the S3 bucket using timestamps to find the latest files to process. More on the pitfalls of that later.
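
A minimal sketch of the timestamp scan, assuming the bucket listing has already been fetched and each entry has the `Key` and `LastModified` fields that an S3 ListObjectsV2 response carries. The function name and the object keys are hypothetical, and this is an illustration rather than the code running in the project:

```python
from datetime import datetime, timezone


def newest_objects(contents, last_seen):
    """Return listing entries modified after `last_seen`, oldest first."""
    fresh = [obj for obj in contents if obj["LastModified"] > last_seen]
    return sorted(fresh, key=lambda obj: obj["LastModified"])


# Entries shaped like the "Contents" list of a ListObjectsV2 response.
# The keys below are made up for illustration.
listing = [
    {"Key": "radar/frame-3.png",
     "LastModified": datetime(2020, 7, 22, 0, 20, tzinfo=timezone.utc)},
    {"Key": "radar/frame-1.png",
     "LastModified": datetime(2020, 7, 22, 0, 0, tzinfo=timezone.utc)},
    {"Key": "radar/frame-2.png",
     "LastModified": datetime(2020, 7, 22, 0, 10, tzinfo=timezone.utc)},
]

# Timestamp of the last file processed on the previous run
last_seen = datetime(2020, 7, 22, 0, 0, tzinfo=timezone.utc)
to_process = [obj["Key"] for obj in newest_objects(listing, last_seen)]
```

The scheduled job would need to persist `last_seen` between runs (in a file or a DynamoDB item, for example) so that each Cron invocation only picks up files it has not seen.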

The major microservices are:

  • Raw radar data preparation using a custom hand-crafted algorithm, being ported to Rust here.
  • Inserting prepared data to DynamoDB as a sparse array and serving this via Flask.
  • Nowcasting using the timeseries of sparse array of rain observations also serving results via Flask.
  • Capturing rain events and nowcasts and creating text and gif to send to twitter.

Each of these applications consumes the others to some extent and is sort of separated in responsibility. I decided to deploy them with a basic folder per application under the /home/ubuntu directory, with a venv per folder.

I had it like this for a while. Then I got tired of sshing into the box and git pulling in each folder. So I decided to write a fabfile per application which would do this for me, and created deployment keys which would be used to pull the code to each folder. Then I got tired of running multiple fabfiles and decided to set up a polled process which ran the fabfiles and git synced the code from a master pipeline.

Eventually I got around to bootstrapping the whole VM using Packer + Ansible playbooks. The development work for it was done locally using Vagrant, with Hyper-V as the VM provider, to test the same Ansible playbooks. I will follow up on this with a few characters on twitter.

Once the initial Packer AMI is established the choice is to either keep building this image or to move away from the whole VM based old-school stuff to a more modern/fun Kubernetes way.

Monday, June 1, 2020

Replicating Databases in RDS

One Sunday in 2018 I sat for a whole day in Art Caffe on the ground floor of Yaya Centre in Nairobi, on the phone to Norman at AWS Support in Cape Town, discussing DMS for MSSQL servers. After a whole day of screen sharing and being on call we decided that what we were trying to do was not achievable, but AWS was working on it. The next day AWS sent me an NDA (since expired).

Data replication from on-prem Database instances or between cloud database instances is an issue that comes up all the time. I have hands on experience doing this a couple of times now. This post summarizes my 3 or so attempts at doing this with different sources and targets and lessons learnt.


At the start-up I was working at we adopted a pre-built mini ERP, it covered logistics workflows and finance / billing aspects. It was built in the 2000's in .NET Classic and ran on IIS and MSSQL server. Quickly the MSSQL became the single point of lack of scalability in the system. Since AWS does not natively support read-replicas for MSSQL RDS instances I looked at DMS to create these replicas. DMS did not quite work as expected and led to the conversation alluded to above with Norman. I ended up performing replication using CloudBasic as the SaaS provider for managing the Change Tracking and handling schema changes in the source tables and propagating them to the target replicas. The replication was fine for single replicas, but quickly bogged the source database down as I added more replicas.

As an aside, the same database was also being replicated to a Redshift cluster for BI usage using tooling provided by Periscope Data.

As part of this exercise I came to appreciate the advantage of write-only / append-only schemas in improving no-lock replication performance (at the cost of storage), and also the need for timestamp columns such as update_time to perform incremental data transfer. I spent a lot of time reading the Schemaless articles by Uber Eng about building schemaless DBs on top of RDBMSs like MySQL. I don't 100% agree with their design choices but they add an interesting perspective. The bottom line: CRUD at scale is HARD.
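
To make the update_time idea concrete, here is a small sketch of watermark-based incremental transfer. It uses two in-memory SQLite databases as stand-ins for the source and the replica, and the `shipments` table and its columns are invented for the example, so treat it as an illustration of the pattern rather than what CloudBasic or DMS do internally:

```python
import sqlite3

SCHEMA = ("CREATE TABLE shipments "
          "(id INTEGER PRIMARY KEY, payload TEXT, update_time TEXT)")


def copy_increment(source, target, watermark):
    """Copy rows changed after `watermark`; return the new watermark."""
    rows = source.execute(
        "SELECT id, payload, update_time FROM shipments "
        "WHERE update_time > ? ORDER BY update_time",
        (watermark,),
    ).fetchall()
    # Upsert so that updated rows overwrite their older copies on the replica
    target.executemany(
        "INSERT OR REPLACE INTO shipments (id, payload, update_time) "
        "VALUES (?, ?, ?)",
        rows,
    )
    target.commit()
    return rows[-1][2] if rows else watermark


source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")
for db in (source, target):
    db.execute(SCHEMA)

source.executemany("INSERT INTO shipments VALUES (?, ?, ?)", [
    (1, "created", "2018-06-01T10:00:00"),
    (2, "created", "2018-06-01T11:00:00"),
])
source.commit()

# First run: an empty watermark sorts before every ISO timestamp
watermark = copy_increment(source, target, "")

# A later change on the source bumps update_time for that row only
source.execute(
    "UPDATE shipments SET payload = 'delivered', "
    "update_time = '2018-06-01T12:00:00' WHERE id = 1"
)
source.commit()

# Second run: only the changed row is transferred
watermark = copy_increment(source, target, watermark)
replica_rows = target.execute(
    "SELECT id, payload FROM shipments ORDER BY id"
).fetchall()
```

The sketch only works if every write touches update_time, which is why append-only tables or trigger-maintained timestamps matter. Deletes are not captured by this approach.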

RDS PostgreSQL to PostgreSQL using DMS

Fast forward a year or so, I am now working at Geoscience Australia, with the Digital Earth Australia. Everything runs on Kubernetes and is highly scalable. Single point of lack of scalability is again the database. A pattern seems to be emerging here. We were performing cluster migration in Kubernetes and I offered to investigate DMS again.

In the MSSQL scenario there is a small prologue: I had previously migrated around 1 million cells from a massive Google Sheet to the MSSQL database at the start of my tenure at the startup, and by the time we hit scalability issues in the single-instance MSSQL we were at 10 million rows in the largest append-only table. The PostgreSQL migration of the datacube tables featured 8-9 million rows in the largest table. However the database also has lots of indexes and uses PostGIS for some applications, particularly Datacube Explorer. DMS fell down in its support for the Geometry columns; however I learnt a lot in setting up DMS using Terraform IaC and fine-tuning for JSON blob columns, which in the Datacube design of the 1.8.x series can be up to 2MB in size. DMS migrates standard columns separately from LOB columns.

Ultimately DMS was not feasible for the datacube DB migration to a new RDS instance. However I believe the core datacube can be migrated next time I try, with the applications that depend on materialized views and PostGIS set up afresh on the new DB. Also, by the time I try again Amazon may have better PostGIS support. For the cluster migration we ended up using a snapshot of the last DB.

On-prem PostgreSQL to RDS PostgreSQL

There is a Datacube PostgreSQL DB instance at the NCI which needs to be regularly replicated to RDS. It powers the Explorer Datacube application. However, DB migration using pg_dump / pg_restore from a server without direct disk access to RDS, where we also don't have disk access, is a long-running task for a DB whose largest tables are around 22 million rows and whose compressed dump is around 11GB. Ideally we sort something out that is incremental, using update_time generated using triggers. The options explored so far are:

  • Using an Airflow DAG with Kubernetes Executors wrapped around pg_dump/restore with some application specific details.
  • Using COPY from S3 support for Aurora PostgreSQL, CSV's restored using the COPY command are generated incrementally.
  • Using PostgreSQL publish / subscribe and Logical Replication. Networking over the internet to maintain connectivity securely to the on-prem instance via SSH Tunnels and to the RDS instance via EKS port-forwarding.

Thursday, May 7, 2020

ADE7816 Energy Monitor

I have been meaning to try out the ADE7816 Single-phase 6 current channels energy monitor for a while. However time has been lacking for the last couple of years. Finally I have a working version with successful board bring-up and a semi-working Micropython driver, with an Arduino driver in the works.

PCB Design

The PCB design process for this was not easy mostly due to a footprint choice mistake on my part. I had placed the 5x5mm QFN part instead of the 6x6mm QFN part in KiCAD. This made the DRC fail everywhere in standard settings. However it ended up being a collaboration opportunity with Greg Davill who loves to practice and photograph bodging stuff. So I now have a work of art at hand instead of a non-functional board.

I am even debating whether to place the rest of the parts and possibly take away from the dead-bug awesomeness. Next time I need to order parts in advance and make sure I do 1:1 prints to verify footprints before pulling the trigger on PCBs.

Energy Monitor Details

Now to more about the energy monitor. This ASIC features 3 single-ended and 3 differential current inputs and a single-phase voltage input, all in a very compact 40-pin 6x6mm QFN package. In fact the PCB is large on purpose to accommodate ease of use with stereo-jack type current clamps. The main usage would be in standard households where there are typically 3-4 lighting circuits, 1-2 socket circuits and a dedicated air conditioning circuit. A single energy monitor could be built to monitor all channels using a single ASIC and leave out fancy NILM stuff from worrying about the lights. The socket circuits could have anything plugged into them and can potentially have point-of-load monitoring instead of breaker-board based monitoring. All this translates to more data being generated for IoT platforms, and some sensible firmware work needs to be done to handle that.

ADE7816 Driver Development

This is still a work in progress. I have done some initial exploration to find prior art. Nothing exists yet for Arduino; however there are some register lists from a Javascript driver written for the now defunct Intel Edison.

Intel never quite had the maker market pinned right to market that board; it makes me sad to think of all the engineering hours sunk into a now defunct platform. Open-source software / hardware helps us salvage some of that. I also sped up the register listing by copy-pasting the ubiquitous table from the ADE7816 datasheet and dropping it into a Jupyter notebook to parse all the registers. Not as fancy as the PDF parser I had built before, but much more reliable.
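
The notebook step can be sketched roughly as below. The column layout (address, name, access, bit width, description) is an assumption about how the pasted text looks, and the two rows are placeholders, not real ADE7816 registers:

```python
def parse_registers(table_text):
    """Parse whitespace-separated register rows into a dict keyed by name."""
    registers = {}
    for line in table_text.strip().splitlines():
        fields = line.split()
        # Skip the header row and wrapped description lines
        if len(fields) < 4 or not fields[0].lower().startswith("0x"):
            continue
        address, name, access, bits = fields[:4]
        registers[name] = {
            "address": int(address, 16),
            "access": access,
            "bits": int(bits),
        }
    return registers


# Placeholder rows for illustration only; the real table comes from the datasheet
SAMPLE = """
Address  Name       R/W  Bits  Description
0x1000   EXAMPLE_A  R/W  24    Placeholder gain register
0x1001   EXAMPLE_B  R    32    Placeholder measurement register
                               with a description that wraps
"""

registers = parse_registers(SAMPLE)
```

The resulting dict can then be dumped as a constants block for the Micropython or Arduino driver.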

My driver development followed the now tried and tested Micropython + Jupyter Notebook + Logic Analyzer path. I used an ESP32 feather as the host processor with standard Micropython loaded and probed the SPI bus with read-write packets for known registers until the protocol gave in and started responding with some values. The ASIC is super versatile in supported protocols - I2C Slave, SPI Master and SPI Slave modes are all viable. So developing a fully functional driver supporting all the possible modes will take a while. The initial work so far is on the SPI slave mode since all my other work in DIN rail and Featherwing formats is linked to the SPI bus; however the I2C mode can be really interesting for host processors with fewer pins and flaky SPI support (while having solid I2C support - like the Onion).

If anyone is interested in driver development I am happy to send you a board or you can get one yourself from Oshpark, Aisler or PCBWay. Once the drivers mature I will list it for a wider audience on Tindie.

Saturday, April 4, 2020

Distributed Locking from Python while running on AWS

In the day and age of eventually consistent web-scale applications the concept of locking may seem very archaic. However in some instances attempting to obtain a lock and failing to do so within a limited window can prevent dogpile effects for expensive server side operations or prevent over-write of already executing long running tasks such as ETL processes.
I have used 3 basic approaches to create distributed locks on AWS with the help of built-in services and accessed them via Python, which is what I build most of my software in.

File locks upgraded to EFS

File-based locks in UNIX file-systems are very common. They are typically created using the flock command, available in Python through the Unix-only fcntl.flock API. Also check out the platform-independent filelock. This is well and good for a VM or single application instance. For distributed locking we need EFS as the filesystem on which these locks are held; the Linux kernel and NFS use byte-range locks to simulate the locks of a locally attached filesystem. However, if the client loses connectivity the NFS lock state cannot be determined, so better run that EFS with enough replicas to ensure connectivity.
File locking this way is very useful if we are using EFS for holding large file and processing data anyway.
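
A minimal sketch of the file-lock pattern using the standard library's `fcntl.flock` on Linux. The lock path is hypothetical and sits in the local temp directory here; on EFS it would point at a file on the mounted share, which this example does not exercise:

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def file_lock(path):
    """Hold an exclusive, non-blocking flock on `path` for the with-block."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR)
    try:
        # LOCK_NB makes this raise BlockingIOError instead of waiting
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


# Hypothetical lock file for a long-running ETL job
lock_path = os.path.join(tempfile.gettempdir(), "etl-demo.lock")

with file_lock(lock_path):
    # A second attempt while the lock is held fails fast and can skip the work
    try:
        with file_lock(lock_path):
            second_holder_got_in = True
    except BlockingIOError:
        second_holder_got_in = False

# Once released, the lock can be taken again
with file_lock(lock_path):
    reacquired = True
```

Failing fast with `LOCK_NB` is what gives the "attempt and skip" behaviour described earlier; dropping that flag turns it into a blocking wait.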

Redis locks upgraded to ElastiCache

Another popular pattern for holding locks in Python is using Redis. This can be upgraded in the cloud-hosted scenario to Redis on ElastiCache. This pairs well with the redis-lock library.
Using Redis requires a bit of setup and is subject to the same network vagaries as EFS. It makes sense when Redis is already in use as an in-memory cache for acceleration or as a broker/results mechanism for Celery. Having data encrypted at rest and in transit may require running an Stunnel proxy.

An AWS only Method - DynamoDB

A while ago AWS published an article for creating and holding locks on DynamoDB using a Java lock client. This client creates the lock and holds it live using heart-beats while the relevant code section executes. Since then it has been ported to Python and I am maintaining my own fork.
It works well and helps scale-out singleton processes run as Lambdas to multiple lambdas in a serverless fashion, with a given lambda quickly skipping over a task another lambda is holding a lock on. I have also used it on EC2 based stuff where I was already using DynamoDB for other purposes. This is possibly the easiest and cheapest method for achieving distributed locking. Locally testing this technique is also quite easy using local-dynamodb in a docker container.
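
The lease-and-heartbeat idea can be illustrated without AWS at all. The sketch below uses an in-memory dict as a stand-in for the DynamoDB table and a plain `if` as a stand-in for a conditional write; the class and method names are made up, and it is a simplified picture of the concept, not the API or the exact algorithm of the Java or Python lock clients:

```python
import time


class LeaseTable:
    """In-memory stand-in for a DynamoDB table with conditional writes."""

    def __init__(self):
        self.items = {}

    def try_acquire(self, key, owner, lease_seconds, now=None):
        """Take or renew the lease; return False if someone else holds it."""
        now = time.time() if now is None else now
        current = self.items.get(key)
        # Mirrors a conditional put: succeed only if there is no item,
        # the lease has expired, or we are the current owner (heartbeat)
        if (current is not None
                and current["expires"] > now
                and current["owner"] != owner):
            return False
        self.items[key] = {"owner": owner, "expires": now + lease_seconds}
        return True

    def release(self, key, owner):
        """Delete the lease, but only if `owner` still holds it."""
        if self.items.get(key, {}).get("owner") == owner:
            del self.items[key]


table = LeaseTable()
# Explicit `now` values keep the example deterministic
first = table.try_acquire("etl-job", "lambda-a", lease_seconds=10, now=100.0)
second = table.try_acquire("etl-job", "lambda-b", lease_seconds=10, now=105.0)
heartbeat = table.try_acquire("etl-job", "lambda-a", lease_seconds=10, now=109.0)
takeover = table.try_acquire("etl-job", "lambda-b", lease_seconds=10, now=120.0)
```

Here `lambda-b` skips the task while `lambda-a` keeps heart-beating, and only takes over once the lease has lapsed, which is the behaviour that lets a crashed holder be replaced.
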
Feel free to ping me with other distributed locking solutions that work well on AWS and I will try them out.

Friday, January 24, 2020

Testing the OrangeCrab r0.1

After hassling Greg Davill for a while on twitter and admiring his OrangeCrab hardware I managed to catch up with him in person in Adelaide. I had been away in Nairobi till October last year, then I spent a brief few days in Adelaide before coming over to Canberra to take up a position at Geoscience Australia. The new gig is much less of a time commitment than the start-up world and hopefully will allow more time for blogging and board bring-ups like this one.

I caught up with Greg at a Japanese restaurant in Rundle mall and was treated to his now trademark led cube and led icosahedron. They are insanely detailed pieces of work and deserve staring at. However I am most grateful for the care package he left me, an OrangeCrab v0.1. This is an ECP5 board in feather form factor with an ADC built in to respect the Analog In pins on the feather. My aim for this board is to host some energy monitoring code on the FPGA with a fast 4mbps or so ADC and perform power/energy calculation on some parts of LUT's/DSP and have a softcpu push data out.
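
As a reference for what that power/energy calculation involves, here is a plain Python model of the arithmetic on one cycle of sampled voltage and current. The amplitudes, phase shift and sample count are made-up test values, and this is a software sketch of the maths, not gateware:

```python
import math


def power_metrics(voltage, current):
    """Compute RMS values, real power and power factor from paired samples."""
    n = len(voltage)
    v_rms = math.sqrt(sum(v * v for v in voltage) / n)
    i_rms = math.sqrt(sum(i * i for i in current) / n)
    # Real power is the mean of the instantaneous product v * i
    real_power = sum(v * i for v, i in zip(voltage, current)) / n
    apparent_power = v_rms * i_rms
    return {
        "v_rms": v_rms,
        "i_rms": i_rms,
        "real_power": real_power,
        "power_factor": real_power / apparent_power,
    }


# One full mains cycle: 325 V peak, 10 A peak, current lagging by 60 degrees
samples = 1000
voltage = [325.0 * math.sin(2 * math.pi * k / samples)
           for k in range(samples)]
current = [10.0 * math.sin(2 * math.pi * k / samples - math.pi / 3)
           for k in range(samples)]

metrics = power_metrics(voltage, current)
```

With a 60 degree lag the power factor comes out at about 0.5. The multiply-accumulate per sample is the part that would map onto the DSP blocks, with the soft CPU only reading out the accumulated results.
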
Greg also left me a home-made FTDI based board to use as a JTAG programmer. The whole setup requires 3 USB cables:

  • To plug in and power the FPGA board (eventually it should also be programmable via this port)
  • To attach a USB-Serial converter and watch the console when the gateware comes up 
  • To program the board over JTAG using an FTDI chip

Getting firmware compiled these days is getting easier, but Greg had done his initial testing with Lattice Diamond. I managed to install it in WSL and promptly ran into a tonne of issues. The weirdest was its close coupling to bash, whereas Ubuntu actually uses dash as its default shell. You can get a Diamond licence and help support integration of Diamond in litex-buildenv.
I was about to give up when Greg got it working with the open-source toolchain and NextPNR-ECP5. I had by then set up litex-buildenv to support the OrangeCrab, so getting some gateware on was relatively easy. Then I got stuck on a RAM timing bug in LiteX till a few hours ago, when I tested out some new gateware.

Also check out Greg's foboot fork and help make programming over USB possible, reducing the setup by 1 USB cable. More work on this, including getting MicroWatt running, coming soon.

Sunday, October 6, 2019

Around Kenya in 4 days and a year ( part 2)

The New Year adventure continued past the sad mercury-laden mines of Migori to the Tanzania border, Isebania in particular. We met up with professor Sangai Mohochi and had a brief gander into Tanzania (Serere) for a beer. Later we had some Orokore beer made from millet; we sat around a car tyre sipping still-fermenting beer from straws.

We took our leave from the professor and had a long drive out to Rusinga Island, a beautiful place without much of the tourist trappings lakeside places have. The Suba culture is being revived with festivals. We woke up early, did a trip around Rusinga Island and visited the mausoleum of Tom Mboya, one of my high school's illustrious alumni.

After another long day of driving and some fiddling with charging phones directly from a car battery (another story on how many engineers it takes to charge a phone), we ended up at Naiberi River Campsite. It is more like a glamping spot and ideal for a quiet New Year's celebration around a fire in the main hall.

We had a late lunch/early dinner at Rift Valley Lodge Golf Club. We went for a post-dinner walk on the greens and ran into herds of zebras and antelopes. No wonder this golf club is classed as one of the best in Africa. All the excitement was followed by an uneventful drive back to Nairobi.

We arrived tired but energized by the beauty and possibilities in Kenya. The roads need to be built, electricity needs to be channeled to houses, there is a lot to do. We look forward to hanging around and getting it done. Eric is heading back to his PhD at Harvard, Gichini will be doing something about the weather prediction and I have helped put Lori Systems on a solid technical footing and change the Logistics landscape.

Wednesday, August 14, 2019

Open-source Sustainability (The tale of 2 package managers)

Last weekend I had the privilege to attend the 10th PyconAU and listen to some amazing speakers. I went with my "I will write markdown on the fly and make a blog-post at the end of the day" mindset. Even though I did write a lot of markdown on the fly, I haven't gathered the courage to push these unedited notes into a public post. Excellent examples of live-blogging from conferences here.
What did happen was that the niggling doubts I had around how open-source works in the real world, outside of just the code, crystallized. This was the result of 2 very good talks, one about how the PyPI project works and another about open-source sustainability beyond money.

For the last year I have been writing and reviewing a lot of React frontend, Python backend (Flask/Django) and Notebooks code. Both ecosystems make it super easy to buy batteries where the included ones are running out of juice. Simply via pip install and npm install you can climb onto the shoulders of giants who are library maintainers and the life-blood of lean start-ups everywhere. However maintainer burnout is a thing, and start-ups building their stack should be highly cognizant of this. Package repository burnout is also a thing. In my time in the software industry I have seen Maven repositories disappear. More recently I have seen NPM go through an identity crisis and the left-pad incident.

The PyPI talk gave me great background on a tool I use every single hour without too much thought. It takes some dedicated volunteers to keep the dream alive, who according to Dustin Ingram are:

  • Unemployed and bored and poor (but super talented) 
  • Paid for by their employer (thanks employers who support FOSS) 
  • Not getting enough sleep (or in my case time with the family)

Vicky's talk covers another aspect. Developers and maintainers need more than money to keep going, they need back-up. The community needs insurance against the bus-factor and burn-out. I have been guilty of this myself, putting a few dollars behind features I would like to see on BountySource instead of diving in. This has become more so as I have progressed in my career and become increasingly time-poor. Speaking of which, would anyone at VSCode like to claim the few dollars we put here?

I love the longevity and discipline of project warehouse and will find some time to contribute to it. I also look forward to a similar alternative to npm, rather than a caching proxy with community behind it.