Thursday, June 15, 2017

Django Docker and Celery

I've finally had the time to create a Django+Celery project that can be run entirely with Docker and Docker Compose, and I'd like to share some of the steps that helped me achieve this. I've created an example project to demo the process, which can be viewed here on GitHub.

https://github.com/JoeJasinski/docker-django-demo/tree/blogpost

To run this example, you will need to have a recent version of Docker and Docker Compose installed. I'm using Docker 17.03.1 and Docker Compose 1.11.2.

Let's take a look at one of the core files, the docker-compose.yml. You'll notice that I have defined the following services:
  • db - the service running the Postgres database container, needed for the Django app
  • rabbitmq - the service running the RabbitMQ container, needed for queuing jobs submitted by Celery
  • app - the service running the Django app container
  • worker - the service that runs the Celery worker container
  • web - the service that runs the Nginx container, which proxies web requests to the Django service
Some of these services depend on others, and are specified as such by using the depends_on Docker Compose parameter. Some of the services set environment variables to configure the behavior of each container.

   e.g.
   environment:
       - DATABASE_URL=postgres://postgres@db/postgres
       - CELERY_BROKER_URL=amqp://guest:guest@rabbitmq:5672//
       - SITE_DIR=/site/
       - PROJECT_NAME=dddemo
       - DJANGO_DEBUG=False
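
On the Django side, these variables arrive as plain strings and have to be interpreted by the settings module. Below is a minimal sketch of how that might look, using only the standard library. The helper names env_bool and database_settings are hypothetical and not taken from the example project, which may well use a dedicated package for this instead.

```python
import os
from urllib.parse import urlparse


def env_bool(name, default=False, environ=os.environ):
    """Read a boolean flag. The raw value is a string, so bool("False") would be True."""
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in ("1", "true", "yes", "on")


def database_settings(url):
    """Split a postgres:// URL into the keys used by Django's DATABASES setting."""
    parts = urlparse(url)
    return {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": parts.path.lstrip("/"),
        "USER": parts.username or "",
        "PASSWORD": parts.password or "",
        "HOST": parts.hostname or "",
        "PORT": str(parts.port or ""),
    }


# Values copied from the compose snippet above, passed in explicitly for the demo.
demo_env = {
    "DATABASE_URL": "postgres://postgres@db/postgres",
    "DJANGO_DEBUG": "False",
}
DEBUG = env_bool("DJANGO_DEBUG", environ=demo_env)
DATABASES = {"default": database_settings(demo_env["DATABASE_URL"])}
```

Note that the host part of the URL is simply db, the name of the database service, which Docker Compose makes resolvable from the other containers on the same network.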

Each of these services belongs to the network named jaz, so they can all communicate with each other. For services that only need to communicate internally with other services, I've used the expose configuration parameter to expose given ports to the other containers. Only the Nginx container uses the ports parameter, which publishes TCP ports 80 and 443 on the Docker host and makes the app available in a web browser.

   e.g.
   web:
     image: nginx:1.11
     ...
     ports:
       - "80:80"
       - "443:443"
     networks:
       - jaz

   networks:
     jaz:

I've created a Docker volume called static-volume to hold the static files generated by the app. This is where Django's collectstatic management command copies the static files. The volume is shared between the app service and the Nginx service, and Nginx serves the static files from it.

    services:
       
       ...
       web:
         image: nginx:1.11
         ...
         volumes:
           ...
           - static-volume:/static

    volumes:
      static-volume:

The Postgres, RabbitMQ, and Nginx services are all built from the official Docker images for those services. However, the app and worker services run the Django codebase, and they require some environmental customization. The build docker-compose keyword lets us specify the Dockerfile to use and define the build context (the subdirectory tree the Dockerfile build process has access to).

   e.g.
   app:
     build:
       context: .
       dockerfile: Dockerfile

Therefore, I've created a Dockerfile to build the app service properly. The Dockerfile uses the ubuntu:16.04 base image; however, a more lightweight image could be used instead. One Docker best practice is to run all the apt-get installs plus a cleanup step in a single Dockerfile RUN command. This ensures that when the Docker layer is created, it is not created with extra temporary files needed by Apt. (Specifically, the layer is created only after doing some space cleanup.)

    RUN apt-get update && apt-get install -y \
        build-essential \
        ...
        zlib1g-dev \
        && rm -rf /var/lib/apt/lists/*

Next, in the Dockerfile, we create a virtualenv and some other supporting directories. There is some debate about the need for creating a virtualenv inside of a Docker container. I personally think it is a good idea because it isolates the application Python modules from the OS-level Python modules (just as it does in a non-Docker setup). It offers one more layer of isolation.

    RUN mkdir -p $SITE_DIR
    WORKDIR $SITE_DIR
    RUN mkdir -p proj/ var/log/ htdocs/
    
    RUN python3 -mvenv env/

After creating the virtualenv, I force an upgrade of pip to ensure I'm using the most recent pip version.

    RUN env/bin/pip install pip --upgrade

One important step that may seem out of place is that I copy in the Python requirements.txt file and install the Python requirements early on in the Dockerfile build process. Installing the Python requirements is a time-consuming process, and I can leverage Docker's built-in caching feature to ensure that Docker only needs to install the requirements if a change is specifically made to the requirements file. If I were to push that step further down in the Dockerfile, I'd risk unnecessarily re-installing the Python requirements every time I make an arbitrary change to the code.

    COPY requirements.txt requirements.txt
    RUN env/bin/pip install -r requirements.txt
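
To make the caching argument concrete, here is a toy model of it in Python. It is only a sketch of the idea and not Docker's actual algorithm: each step's cache key is assumed to depend on the previous key, the instruction text, and the contents of any files the step copies. The file names and contents are made up for illustration.

```python
import hashlib


def layer_keys(steps, files):
    """Return one cache key per build step.

    steps: list of (instruction, [names of files copied by that step])
    files: mapping of file name -> file contents
    """
    keys = []
    parent = ""
    for instruction, copied in steps:
        digest = hashlib.sha256()
        digest.update(parent.encode())       # a changed parent invalidates every later step
        digest.update(instruction.encode())  # RUN steps are keyed by their command text
        for name in sorted(copied):
            digest.update(files[name].encode())  # COPY steps are keyed by file contents
        parent = digest.hexdigest()
        keys.append(parent)
    return keys


steps = [
    ("COPY requirements.txt requirements.txt", ["requirements.txt"]),
    ("RUN env/bin/pip install -r requirements.txt", []),
    ("COPY . proj/", ["requirements.txt", "views.py"]),
]

# Same requirements, different application code (hypothetical contents).
before = layer_keys(steps, {"requirements.txt": "Django\ncelery\n", "views.py": "# v1\n"})
after = layer_keys(steps, {"requirements.txt": "Django\ncelery\n", "views.py": "# v2\n"})

# True where the cached layer could be reused after the code change.
reused = [old == new for old, new in zip(before, after)]
```

In this model, editing views.py only changes the key of the final COPY step, so the pip install layer is reused. Had the whole codebase been copied before the install, the code change would have altered the first key and forced the requirements to be installed again.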

I like to explicitly install uwsgi to ensure it's present, since it's required to even attempt to start the app. I also set some environment variables, such as the database connection string. Note: these environment variables can be overridden by defining them in the environment section of the docker-compose.yml file.

    RUN env/bin/pip install uwsgi

    ENV NUM_THREADS=2
    ENV NUM_PROCS=2

    ENV DJANGO_DATABASE_URL=postgres://postgres@db/postgres

Next, the Dockerfile copies a docker-utils folder into the container. This folder contains most of the Docker-specific scripts and configuration files needed to run Django and other services, and this copy makes these files available to the container.

    COPY docker-utils/ docker-utils/

After that, the entire codebase is copied from the current directory to a proj/ directory inside the container. The proj directory will be where the container runs the codebase from.

    COPY . proj/

Finally, the Dockerfile ENTRYPOINT is set to a script called entrypoint.sh. Without an ENTRYPOINT, Docker runs the CMD directly (wrapping it in /bin/sh -c when the CMD is written in shell form). Setting one means that every command is handed to our script as arguments first, which allows us to do some interesting things - more on that later. The Dockerfile CMD is set to another shell script, app-start.sh. This specifies the default command (or script) that should be run when the container starts. In this case, app-start.sh actually runs the Django process by executing the uwsgi command.

    ENTRYPOINT ["./docker-utils/entrypoint.sh"]
    CMD ["./docker-utils/app-start.sh"]

Let's take a closer look at the entrypoint.sh. This was one of the files in the docker-utils folder copied to the container. The entrypoint.sh is called every time the container starts, regardless of arguments passed to the container. It is just a shell script that lets us add some additional optional logic. In this example, if someone passes "init" as an argument to the docker-compose run command, we run the Django migrate and collectstatic management commands.

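    #!/bin/bash

    set -eoux pipefail

    # ${1:-} keeps `set -u` from aborting when no argument is passed.
    if [ "${1:-}" == 'init' ]; then
        echo "Run Migrations"
        "${SITE_DIR}/env/bin/python" "${SITE_DIR}/proj/manage.py" migrate
        "${SITE_DIR}/env/bin/python" "${SITE_DIR}/proj/manage.py" collectstatic --no-input
    elif [ "${1:-}" == 'manage' ]; then
        shift
        echo "Manage.py $*"
        # Quote "$@" so arguments containing spaces reach manage.py intact.
        "${SITE_DIR}/env/bin/python" "${SITE_DIR}/proj/manage.py" "$@"
    else
        # Anything else (including the default CMD) replaces this shell.
        exec "$@"
    fi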
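
The way ENTRYPOINT, CMD, and the arguments given at run time fit together can be easy to lose track of, so here is a small Python model of it. This is only an illustration of the branching described above, not code from the project. The function names are hypothetical, and the default site directory of /site/ is assumed from the SITE_DIR value in the compose file.

```python
import posixpath

ENTRYPOINT = ["./docker-utils/entrypoint.sh"]
CMD = ["./docker-utils/app-start.sh"]


def container_argv(run_args=None):
    """Docker appends CMD to ENTRYPOINT; arguments given at run time replace CMD."""
    return ENTRYPOINT + (list(run_args) if run_args else CMD)


def dispatch(args, site_dir="/site/"):
    """Mirror the branches of entrypoint.sh and return the commands it would run."""
    python = posixpath.join(site_dir, "env/bin/python")
    manage = posixpath.join(site_dir, "proj/manage.py")
    if args and args[0] == "init":
        return [
            [python, manage, "migrate"],
            [python, manage, "collectstatic", "--no-input"],
        ]
    if args and args[0] == "manage":
        return [[python, manage] + list(args[1:])]
    return [list(args)]  # the `exec "$@"` branch: run the arguments unchanged


# The script receives everything after its own name as "$@".
default_run = dispatch(container_argv()[1:])
init_run = dispatch(container_argv(["init"])[1:])
shell_run = dispatch(container_argv(["manage", "shell"])[1:])
```

With no arguments the model falls through to the default CMD, so the container starts uwsgi via app-start.sh. Passing init or manage instead routes to manage.py inside the virtualenv.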

The app-start.sh script is pretty straightforward, as it is a simple wrapper that calls uwsgi to run the Django app. There are a few things worth mentioning about it. First, several configuration options are passed into it via environment variables. We use the --chdir flag to change directory to the proj/ directory (containing the codebase) before running. As required by Django, we set the DJANGO_SETTINGS_MODULE environment variable to make uwsgi aware of the Django settings. Another useful setting is the --python-autoreload=1 parameter, which tells uwsgi to reload when it detects a change to the codebase. This operates very similarly to how runserver reloads and is very useful for development. The remaining options are fairly standard uwsgi command options.

    #!/bin/bash

    echo "Starting uWSGI for ${PROJECT_NAME}"

    $SITE_DIR/env/bin/uwsgi --chdir ${SITE_DIR}proj/ \
        --module=${PROJECT_NAME}.wsgi:application \
        --master \
        --env DJANGO_SETTINGS_MODULE=${PROJECT_NAME}.settings \
        --vacuum \
        --max-requests=5000 \
        --virtualenv ${SITE_DIR}env/ \
        --socket 0.0.0.0:8000 \
        --processes $NUM_PROCS \
        --threads $NUM_THREADS \
        --python-autoreload=1

Now that we've talked a bit about the Dockerfile and docker build process, let's take a closer look at the Nginx service in the docker-compose.yml file. The official Nginx Docker image has a very specific way that you are supposed to customize the Nginx configuration. Specifically, an Nginx configuration template (default.template.conf) is passed into the container via a Docker volume. When the container starts, the envsubst command combines the configuration template with environment variables from the container and generates the actual Nginx configuration. This allows you to dynamically craft an Nginx configuration file by passing in different environment variables via the docker-compose.yml file.

   web:
     image: nginx:1.11
     ...
     volumes:
       - ./docker-utils/nginx/default.template.conf:/root/default.template.conf
     command: /bin/bash -c "envsubst '$$NGINX_HTTP_PORT $$NGINX_HTTPS_PORT' < /root/default.template.conf > /etc/nginx/conf.d/default.conf && nginx -g 'daemon off;'"
     environment:
       - NGINX_HTTP_PORT=80
       - NGINX_HTTPS_PORT=443
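
The quoted list of variable names given to envsubst matters: it restricts substitution to those names, so Nginx's own variables in the template are left alone. (The doubled $$ is how a literal $ is written in a Compose file.) The following Python sketch imitates that behavior with string.Template. The template text is a made-up stand-in, since the real default.template.conf is not shown here, and render_template is a hypothetical helper.

```python
from string import Template


def render_template(text, env, allowed):
    """Substitute only the allowed variables and leave every other $name untouched."""
    # Like envsubst, a listed variable that is unset becomes an empty string.
    values = {name: env.get(name, "") for name in allowed}
    return Template(text).safe_substitute(values)


# Hypothetical template content, for illustration only.
template = (
    "server {\n"
    "    listen ${NGINX_HTTP_PORT};\n"
    "    proxy_set_header Host $host;\n"
    "}\n"
)

env = {"NGINX_HTTP_PORT": "80", "NGINX_HTTPS_PORT": "443", "host": "not-used"}
rendered = render_template(template, env, ["NGINX_HTTP_PORT", "NGINX_HTTPS_PORT"])
```

In this sketch the listen directive receives the port from the environment, while $host survives unchanged for Nginx to resolve at request time, even though a variable of that name happens to exist in the environment.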

Let's take a look at the Celery worker service in the docker-compose.yml file. This service uses the same Dockerfile that was used for the build of the app service, but a different command executes when the container runs. There is nothing magic going on with this command; it simply executes Celery inside of the virtualenv.

   worker:
     build:
       context: .
       dockerfile: Dockerfile
     container_name: dddemo-worker
     command: /site/env/bin/celery worker -A dddemo --workdir /site/proj/ -l info

Finally, we can move away from the Docker-related configuration and take a look at the Celery configuration in the Django project. Much of the following configuration is boilerplate from the Celery 4.0 docs, so I won't go into too much detail.

First, there is a celery.py file inside of the Django project. This integrates Celery into the Django project and reads in the Celery configuration from settings.py. Next, there is an __init__.py at the root of the project, which initializes the aforementioned celery.py. The Django settings.py contains some Celery configuration, including how to connect to the RabbitMQ service. A very simple Celery add task is defined in tasks.py; this task adds the two numbers passed to it. A very simple Django view hosts a page at the root URL and executes the add task in the background. The results of the add task can be viewed in the taskresult module in the Django Admin.

Though this is a very basic prototype example, it demonstrates a complete Django project and backing services needed to execute Celery jobs. The entire distributed system needed for this application can be executed with only two Docker Compose commands.

To get started, the README describes how to run the project and provides some other common commands that can be used to administer the project. Hope this is useful, and let me know your feedback.


