Michiels Blog

Reinout van Rees: prefect.io

Reinout is a long-time visitor of the Python Leiden meetup. In fact, he is a regular on most nearby Python meetups and conferences. The nice thing is that he also writes summaries of the talks he attends and he publishes those on his blog. I really appreciate this! Today he gave a talk at our meetup, and I decided to ‘return the favour’ so here is my summary of his talk!

Reinout discussed prefect.io: a server that provides a dashboard and an API, shows your workflows, and lets you start or restart jobs, examine logs, and so on.

At his company, much of the IT-related work involves loading data from web APIs, transforming it in some way, loading it into databases, and so on.

Previously this was all done on the command line. People had to log in over SSH, and there were constraints around rotating SSH keys, which made it all very cumbersome. Now they have a nice web UI and far fewer IT constraints, which is much easier for people: they can just log in to the web UI and click around.

In Prefect you define workflows. A workflow consists of one or more tasks, and those tasks are written in Python. Prefect comes with some niceties: you can configure retries on your tasks, it provides a custom logging framework, and a lot more. In the UI you can manually start jobs, configure workers, and monitor progress.

Prefect does not provide its own authentication mechanism at the moment, so people who want to ‘protect’ it use HTTP Basic auth.

It comes with a Docker setup, which is what Reinout is using. A hosted version is also available.

Reinout created a cookiecutter template that allows non-programmers to create simple Python tasks. It has a built-in workflow that creates a GitHub repository and configures GitHub Actions to build Docker containers on every commit to main and push them to a Docker registry.

He uses ‘watchtower’ in the Docker Compose setup, so he only has to deploy the stack once: after that, the stack polls for updated containers and keeps itself up to date.

I think it’s nice to see that they are able to provide a frictionless way to set up, deploy and monitor data jobs for people who are more data-minded and not hardcore software developers!