Machine Learning for Satellite Imagery on the UN Global Platform
Creating a containerised inference environment using rastervision.
In the UN Global Platform for Official Statistics, we are developing capabilities to create official statistics from satellite and aerial images. To do this, we have been using rastervision, a new framework specifically designed to aid in aerial image analysis. This article briefly describes how we designed and built a system for analysing new images with the models that we have trained.
When you train a new rastervision model, all the required transformations for the raw image data are zipped inside what is called a predict-package.zip. You can then simply run rastervision predict with a URL to this zip file, and its scripts process the image before running the inference. While this is powerful, it brings a large number of dependencies: some are Python packages, others are C++ libraries such as GDAL. Many of these are highly version specific, which makes it a challenge to build an environment in which we can generate predictions on new images. We have two options:
- Take the raw tensorflow or keras model and develop our own code to manipulate the raw image data. This would all be implemented in the UNGP methods service.
- Develop our own scalable service to create predictions - the requests to this service can then be sent from the methods service, to create a consistent experience for users that only want basic interaction with the aerial imagery algorithms.
After several frustrating and failed attempts at the first option, we chose the second, using a combination of Docker, Flask and Google Kubernetes Engine (GKE). At this point we have only a very basic implementation, but so far it is performing well.
Rastervision provides a docker image for training machine learning models. This includes all the required dependencies, and there is an accompanying image with GPU acceleration enabled. Using docker we can build small scripts on top of this image, creating a method for accessing this environment.
Flask-RESTful is a simple framework for creating APIs in Python. We have written a very simple API that takes a JSON message containing the S3 location of the test image, the S3 location of the rastervision model and the S3 location in which to store the output of the model when run on this test image. Because of the way the rastervision predict-package.zip is written, any model can be loaded into this environment and, provided the image can be processed, run on any new image source.
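As a rough illustration of the message handling described above, the sketch below validates a JSON message and turns it into an argument list for rastervision predict. It is not our actual API code: the field names (image_uri, model_uri, output_uri) and both function names are hypothetical, and the argument order passed to rastervision predict is an assumption that should be checked against the CLI help of the rastervision version in use.

```python
import json
from urllib.parse import urlparse

# Hypothetical field names; the real API's message schema may differ.
REQUIRED_FIELDS = ("image_uri", "model_uri", "output_uri")


def parse_message(body):
    """Parse the JSON message and check that it holds three S3 locations."""
    message = json.loads(body)
    if not isinstance(message, dict):
        raise ValueError("message must be a JSON object")
    for field in REQUIRED_FIELDS:
        if field not in message:
            raise ValueError(f"missing field: {field}")
        parsed = urlparse(str(message[field]))
        if parsed.scheme != "s3" or not parsed.netloc:
            raise ValueError(f"{field} is not an s3:// location")
    return message


def build_predict_command(message):
    """Build the argument list for `rastervision predict`.

    Assumed argument order: predict package, input image, output location.
    """
    return [
        "rastervision",
        "predict",
        message["model_uri"],
        message["image_uri"],
        message["output_uri"],
    ]
```

Inside a Flask-RESTful resource, the post method could pass the request body to parse_message and hand the resulting list to subprocess.run. Keeping the validation separate from the web framework also makes it easy to test without starting a server.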
This is the behaviour that we want from the methods service. Firstly, it’s completely reusable with new images and rastervision models. Secondly, it’s a small, containerised service, meaning we could launch multiple processes each working on a single image. This allows for concurrent processing.
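One way a caller might take advantage of this, sketched under the assumption that several copies of the container are running behind a single address, is to submit one message per image from a small thread pool. The predict_all function and its send argument are hypothetical names; send stands in for whatever code posts a single message to the service and returns its response.

```python
from concurrent.futures import ThreadPoolExecutor


def predict_all(messages, send, max_workers=4):
    """Submit one message per image concurrently.

    `send` is a hypothetical callable that posts a single message to a
    running container and returns its response. Results come back in the
    same order as `messages`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(send, messages))
```

Threads are sufficient here because each call spends its time waiting on the remote container rather than computing locally.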
To create our new docker image, we write a simple Dockerfile that builds FROM the base rastervision image. We then copy the requirements file and the API script into the new image using the COPY command, and run the necessary commands. On the command line, in the folder holding the Dockerfile, we then run docker build to create the new image. This image is based on the original rastervision image and, when run, starts a server. In the docker run command we need to add the argument -p 5000:5000 so that port 5000 in the container is mapped to port 5000 on the host.
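With the port mapped, a client on the same machine can reach the server at localhost:5000. The sketch below builds such a request using only the Python standard library. The /predict route and the message field names are hypothetical, since they depend on how the API script defines its routes and message format.

```python
import json
from urllib.request import Request


def build_predict_request(message, host="localhost", port=5000, path="/predict"):
    """Build (but do not send) a POST request carrying the JSON message.

    The `/predict` path is a hypothetical route name; port 5000 matches the
    `-p 5000:5000` mapping used with `docker run`.
    """
    return Request(
        url=f"http://{host}:{port}{path}",
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical message; the field names are illustrative only.
example = build_predict_request({
    "image_uri": "s3://bucket/images/test.tif",
    "model_uri": "s3://bucket/models/predict-package.zip",
    "output_uri": "s3://bucket/output/test.json",
})
```

Passing the request to urllib.request.urlopen would send it to the running container.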
To deploy this container, I uploaded the image to Google’s Container Registry, created a small cluster in GKE and deployed the container to this cluster. We are now working on developing more z