Thursday, June 15, 2017

Django Docker and Celery

I've finally had the time to create a Django+Celery project that can be run completely using Docker and Docker Compose. I'd like to share some of the steps that helped me achieve this, along with an example project I created to demo the process. The example project can be viewed on GitHub:

https://github.com/JoeJasinski/docker-django-demo/tree/blogpost

To run this example, you will need to have a recent version of Docker and Docker Compose installed. I'm using Docker 17.03.1 and Docker Compose 1.11.2.

Let's take a look at one of the core files, the docker-compose.yml. You'll notice that I have defined the following services:
  • db - the service running the Postgres database container, needed by the Django app
  • rabbitmq - the service running the RabbitMQ container, needed for queuing jobs submitted by Celery
  • app - the service running the Django app container
  • worker - the service running the Celery worker container
  • web - the service running the Nginx container, which proxies web requests to the Django service.
Some of these services depend on others; these dependencies are declared using the depends_on Docker Compose parameter, as sketched after the example below. Some of the services set environment variables to configure the behavior of each container.

   e.g.
   environment:
       - DATABASE_URL=postgres://postgres@db/postgres
       - CELERY_BROKER_URL=amqp://guest:guest@rabbitmq:5672//
       - SITE_DIR=/site/
       - PROJECT_NAME=dddemo
       - DJANGO_DEBUG=False
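
The dependency declarations look roughly like this sketch (the service names come from the list above; see the repo's docker-compose.yml for the authoritative version):

    app:
      ...
      depends_on:
        - db
        - rabbitmq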

Each of these services belongs to the network named jaz and can communicate with the others. For services that only need to communicate internally with other services, I've used the expose configuration parameter to make given ports reachable by the other containers (see the sketch below). Only for the Nginx container have I used the ports parameter, which publishes TCP ports 80 and 443 to the Docker host and makes the app available for browsing with a web browser.

   e.g.
   web:
     image: nginx:1.11
     ...
     ports:
       - "80:80"
       - "443:443"
     networks:
       - jaz

   networks:
     jaz:
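
By contrast, an internal-only service uses expose, roughly like this sketch (port 8000 is an assumption based on the uwsgi socket configured later):

    app:
      ...
      expose:
        - "8000"
      networks:
        - jaz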

I've created a Docker volume called static-volume to hold the static files generated by the app; Django's collectstatic management command copies the static files there. This volume is shared between the app service and the Nginx service, and Nginx serves up the static files.

    services:
       
       ...
       web:
         image: nginx:1.11
         ...
         volumes:
           ...
           - static-volume:/static

    volumes:
      static-volume:

The Postgres, RabbitMQ, and Nginx services are all built from the official Docker images for those services. However, the app and worker services run the Django codebase, and they require some environment customization. The build docker-compose keyword lets us specify the Dockerfile to use and define the build context (the subdirectory tree the Dockerfile build process has access to).

   e.g.
   app:
     build:
       context: .
       dockerfile: Dockerfile

Therefore, I've created a Dockerfile to build the app service properly. The Dockerfile uses the ubuntu:16.04 base image; however, a more lightweight image could be used instead. One Docker best practice is to run all the apt-get installs plus a cleanup step in a single Dockerfile RUN command. Because the image layer is committed only after the cleanup runs, the layer never contains Apt's temporary files.

    RUN apt-get update && apt-get install -y \
        build-essential \
        ...
        zlib1g-dev \
        && rm -rf /var/lib/apt/lists/*

Next, in the Dockerfile, we create a virtualenv and some other supporting directories. There is some debate about the need for a virtualenv inside a Docker container. I personally think it is a good idea because it isolates the application's Python modules from the OS-level Python modules (just as it does in a non-Docker setup); it offers one more layer of isolation.

    RUN mkdir -p $SITE_DIR
    WORKDIR $SITE_DIR
    RUN mkdir -p proj/ var/log/ htdocs/
    
    RUN python3 -mvenv env/

After creating the virtualenv, I force an upgrade of pip to ensure I'm using the most recent pip version.

    RUN env/bin/pip install pip --upgrade

One important step that may seem out of place is that I copy in the Python requirements.txt file and install the Python requirements early in the Dockerfile build process. Installing the Python requirements is a time-consuming process, and I can leverage Docker's built-in layer caching to ensure that Docker reinstalls the requirements only when a change is made to the requirements file itself. If I were to push that step further down in the Dockerfile, I'd risk unnecessarily re-installing the Python requirements every time I make an arbitrary change to the code.

    COPY requirements.txt requirements.txt
    RUN env/bin/pip install -r requirements.txt

I like to explicitly install uwsgi to ensure it's present, since it's required to even attempt to start the app. I also set some environment variables, such as the database connection string. Note: these environment variables can be overridden by defining them in the environment section of the docker-compose.yml file.

    RUN env/bin/pip install uwsgi

    ENV NUM_THREADS=2
    ENV NUM_PROCS=2

    ENV DJANGO_DATABASE_URL=postgres://postgres@db/postgres

Next, the Dockerfile copies a docker-utils folder into the container. This folder contains most of the Docker-specific scripts and configuration files needed to run Django and the other services, and this copy makes those files available inside the container.

    COPY docker-utils/ docker-utils/

After that, the entire codebase is copied from the current directory into a proj/ directory inside the container; the container will run the codebase from that proj/ directory.

    COPY . proj/

Finally, the Dockerfile ENTRYPOINT is set to a script called entrypoint.sh. Normally, the entrypoint defaults to a shell executable, such as /bin/bash; overriding it allows us to do some interesting things - more on that later. The Dockerfile CMD is set to another shell script, app-start.sh. CMD specifies the default command (or script) to run when the container starts; in this case, app-start.sh starts the Django process by executing the uwsgi command.

    ENTRYPOINT ["./docker-utils/entrypoint.sh"]
    CMD ["./docker-utils/app-start.sh"]

Let's take a closer look at the entrypoint.sh. This was one of the files in the docker-utils folder copied to the container. The entrypoint.sh is called every time the container starts, regardless of arguments passed to the container. It is just a shell script that lets us add some additional optional logic. In this example, if someone passes "init" as an argument to the docker-compose run command, we run the Django migrate and collectstatic management commands.

    #!/bin/bash

    set -euxo pipefail

    if [ "$1" == 'init' ]; then
        echo "Run Migrations"
        ${SITE_DIR}/env/bin/python ${SITE_DIR}/proj/manage.py migrate
        ${SITE_DIR}/env/bin/python ${SITE_DIR}/proj/manage.py collectstatic --no-input
    elif [ "$1" == 'manage' ]; then
        shift
        echo "Manage.py $@"
        ${SITE_DIR}/env/bin/python ${SITE_DIR}/proj/manage.py "$@"
    else
        exec "$@"
    fi
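
For example, the one-off initialization and arbitrary management commands can be invoked like this (the --rm flag simply removes the one-off container when it exits):

    docker-compose run --rm app init
    docker-compose run --rm app manage createsuperuser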

The app-start.sh script is pretty straightforward: it is a simple wrapper that calls uwsgi to run the Django app. A few things are worth mentioning about it. First, several configuration options are passed into it via environment variables. We use the --chdir flag to change to the proj/ directory (containing the codebase) before running. As Django requires, we set the DJANGO_SETTINGS_MODULE environment variable so uwsgi can locate the Django settings. Another useful setting is the --python-autoreload=1 parameter, which tells uwsgi to reload when it detects a change to the codebase; this operates much like runserver's reloading and is very useful for development. The remaining options are fairly standard uwsgi command options.

    #!/bin/bash

    echo "Starting uWSGI for ${PROJECT_NAME}"

    $SITE_DIR/env/bin/uwsgi --chdir ${SITE_DIR}proj/ \
        --module=${PROJECT_NAME}.wsgi:application \
        --master \
        --env DJANGO_SETTINGS_MODULE=${PROJECT_NAME}.settings \
        --vacuum \
        --max-requests=5000 \
        --virtualenv ${SITE_DIR}env/ \
        --socket 0.0.0.0:8000 \
        --processes $NUM_PROCS \
        --threads $NUM_THREADS \
        --python-autoreload=1

Now that we've talked a bit about the Dockerfile and the docker build process, let's take a closer look at the Nginx service in the docker-compose.yml file. The official Nginx Docker image has a specific way for you to customize the Nginx configuration: an Nginx configuration template (default.template.conf) is passed into the container via a Docker volume. When the container starts, the envsubst command combines the configuration template with environment variables from the container and generates the actual Nginx configuration. This lets you dynamically craft an Nginx configuration file by passing different environment variables in via the docker-compose.yml file.

   web:
     image: nginx:1.11
     ...
     volumes:
       - ./docker-utils/nginx/default.template.conf:/root/default.template.conf
     command: /bin/bash -c "envsubst '$$NGINX_HTTP_PORT $$NGINX_HTTPS_PORT' < /root/default.template.conf > /etc/nginx/conf.d/default.conf && nginx -g 'daemon off;'"
     environment:
       - NGINX_HTTP_PORT=80
       - NGINX_HTTPS_PORT=443

Let's take a look at the Celery worker service in the docker-compose.yml file. This service is built from the same Dockerfile as the app service, but a different command executes when the container runs. There is nothing magic going on with this command; it simply executes Celery inside of the virtualenv.

   worker:
     build:
       context: .
       dockerfile: Dockerfile
     container_name: dddemo-worker
     command: /site/env/bin/celery worker -A dddemo --workdir /site/proj/ -l info

Finally, we can move away from the Docker-related configuration and take a look at the Celery configuration in the Django project. Much of the following configuration is boilerplate from the Celery 4.0 docs, so I won't go into too much detail.

First, a celery.py file inside the Django project integrates Celery into the project and reads the Celery configuration from settings.py. Next, an __init__.py at the root of the project imports the app from the aforementioned celery.py so it is loaded when Django starts. The Django settings.py contains some Celery configuration, including how to connect to the RabbitMQ service. A very simple Celery add task, which adds two numbers passed to it, is defined in tasks.py. A very simple Django view hosts a page at the root url and executes the add task in the background; the results of the add task can be viewed in the taskresult module in the Django Admin.
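
The wiring typically looks like the following sketch, adapted from the Celery 4.0 first-steps-with-Django boilerplate (module paths assume the project name dddemo; the actual files in the repo may differ slightly):

    # proj/dddemo/celery.py
    import os
    from celery import Celery

    # make the Django settings visible to the Celery worker
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'dddemo.settings')

    app = Celery('dddemo')
    # read CELERY_-prefixed settings (e.g. CELERY_BROKER_URL) from settings.py
    app.config_from_object('django.conf:settings', namespace='CELERY')
    # discover tasks.py modules in all installed apps
    app.autodiscover_tasks()

    # proj/dddemo/__init__.py
    from .celery import app as celery_app
    __all__ = ['celery_app']

    # tasks.py in an app
    from celery import shared_task

    @shared_task
    def add(x, y):
        return x + y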

Though this is a very basic prototype example, it demonstrates a complete Django project and backing services needed to execute Celery jobs. The entire distributed system needed for this application can be executed with only two Docker Compose commands.
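
Concretely, that likely amounts to something like the following (a sketch; the README linked below is authoritative):

    docker-compose run --rm app init   # run migrations and collectstatic
    docker-compose up                  # start db, rabbitmq, app, worker, and web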

To get started, the README describes how to run the project and provides some other common commands that can be used to administer the project. Hope this is useful, and let me know your feedback.



Monday, July 25, 2016

uWSGI Basic Django Setup

Here are two basic examples of almost the same uWSGI configuration to run a Django project; one is configured via an ini configuration file and the other via command line arguments.

This does not represent a production-ready example, but can be used as a starting point for the configuration.

Setup for this example:
# create dir for virtualenv and activate
mkdir envs/
virtualenv envs/runrun/
. ./envs/runrun/bin/activate

# create dir for project codebase
mkdir proj/

# install some django deps
pip install django uwsgi whitenoise

# create a new django project
cd proj/
django-admin startproject runrun
cd runrun/

# Add to or modify django settings.py to setup static file serving with Whitenoise.
# Note: for prod environments, staticfiles can be served via Nginx.
# settings.py 
MIDDLEWARE_CLASSES = [
    ....
    'whitenoise.middleware.WhiteNoiseMiddleware',
]
STATIC_ROOT = os.path.join(BASE_DIR, 'staticfiles')
STATICFILES_STORAGE = 'whitenoise.storage.CompressedManifestStaticFilesStorage'
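
Because CompressedManifestStaticFilesStorage serves files through a manifest that collectstatic generates, collect the static files before starting uwsgi:

# generate staticfiles/ and the Whitenoise manifest
python manage.py collectstatic --noinput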

Create the config file
$ cat run.ini 
[uwsgi]
module = runrun.wsgi:application
virtualenv = ../../envs/runrun/
env = DJANGO_SETTINGS_MODULE=runrun.settings 
env = PYTHONPATH="$PYTHONPATH:./runrun/" 
master = true
processes = 5
enable-threads = true
max-requests = 5000
harakiri = 20
vacuum = true
http = :7777
stats = 127.0.0.1:9191

Execute config
uwsgi --ini run.ini

Or create a shell script
$ cat run.sh
 uwsgi --module=runrun.wsgi:application \
      -H ../../envs/runrun/ \
      --env DJANGO_SETTINGS_MODULE=runrun.settings \
      --env PYTHONPATH="$PYTHONPATH:./runrun/" \
      --master \
      --processes=5 \
      --max-requests=5000 \
      --harakiri=20 \
      --vacuum \
      --http=127.0.0.1:7777 \
      --stats 127.0.0.1:9191 

Execute the shell script:
./run.sh

Command line reference
--env = set environment variable
--max-requests = reload workers after the specified amount of managed requests
--https2 = http2.0
--http = run in http mode
--socket = use in place of http if using an Nginx reverse proxy
--vacuum = clear environment on exit; try to remove all of the generated file/sockets
--uid = setuid to the specified user/uid
--gid = setgid to the specified group/gid
--stats = run a stats server
-H, --virtualenv = virtualenv dir
--processes = num of worker processes
--module = python module entry point

Wednesday, December 9, 2015

Automatic Maintenance Page for Nginx+Django app

If you've used Django with Nginx, you are probably familiar with how to configure the Nginx process group to reverse proxy to a second Gunicorn or uWSGI Django process group.  (The proxy_pass Nginx parameter passes traffic through Nginx to Django.)

One benefit of this approach is that if your Django process crashes, or if you take Django offline while performing an upgrade, Nginx can still be available to serve static content and offer some sort of "the system is down" message to users.  With just a few lines of configuration, Nginx can be set to automatically display a splash page in those scenarios.

If the Django process running behind the reverse proxy becomes unavailable, a 502 error will be emitted by Nginx.  By default, that 502 will be returned to the browser as an ugly error message.  However, Nginx can be configured to catch that 502 error and respond with custom behavior.  Specifically, if a 502 is raised by Nginx, Nginx can check for a custom html error page document and serve up that document if it exists.  If that document does not exist, Nginx can just return the default ugly 502 error message.  The Nginx configuration below executes this described behavior.

This Nginx example is stripped down to just demonstrate the concept described.  It only shows the server section portion of the Nginx config.  This example assumes that you have created your custom error page at the following location: /sites/joejasinski/htdocs/50x.html.

server.conf
...

# reverse proxy boilerplate
upstream app_server {
    server unix:/sites/joejasinski/var/run/django.socket;
}

server {
    listen   443;
    server_name  joejasinski.com www.joejasinski.com;
    ...

    # if Django is unreachable, a 502 is raised...
    error_page 502 @502;
    location @502 {
        root /sites/joejasinski/htdocs/;
        # try to load a file called 50x.html at the document root
        # or re-raise a generic 502 if no such file is present.
        try_files /50x.html =502;
    }

    # reverse proxy boilerplate for hosting the Django app 
    location / {
        try_files $uri @proxy_to_app;
    }

    # reverse proxy boilerplate for hosting the Django app
    location @proxy_to_app {
        ....
       proxy_pass http://app_server;
    }
}


It's worth mentioning that, by default, if your Django app raises a 502 error behind the reverse proxy, Nginx will not catch it and thus will not respond with the custom Nginx 502 page.  The Nginx 502 error page is only displayed for 502 errors that originate from Nginx; an unavailable reverse proxy application qualifies as such.

This behavior is desirable since Django has its own error handler for 50x errors raised within the application; we want Nginx to display an error page when Django is down, but Django should handle all errors generated by itself.

Note: you can change this behavior to have Nginx handle any error generated by the reverse proxied application.  This can be done with the proxy_intercept_errors Nginx parameter.
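
A minimal sketch of what that could look like (not part of the config above):

location @proxy_to_app {
    # let Nginx apply its error_page rules to responses from the upstream app
    proxy_intercept_errors on;
    error_page 500 502 503 504 /50x.html;
    proxy_pass http://app_server;
}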

Note: using a similar approach, you can also create Nginx-based error handlers for other error codes generated by Nginx.

In summary, this approach allows you to automatically show users a friendly error page if your Django process crashes or is down for maintenance.

Wednesday, July 20, 2011

Django: Using Caching to Track Online Users

Recently I wanted a simple solution to track whether a user is online on a given Django site.  The definition of "online" on a site is kind of ambiguous, so I'll define that a user is considered to be online if they have made any request to the site in the last five minutes.

I found that one approach is to use Django's caching framework to track when a user last accessed the site.  For example, upon each request, a middleware can set the current time as a cache value associated with the given user.  This lets us store some basic information about a logged-in user's online state, and easily retrieve it, without having to hit the database on each request.

My approach is below.  Comments welcome.

In settings.py:
# add the middleware that you are about to create to settings
MIDDLEWARE_CLASSES = (
    ....
    'middleware.activeuser_middleware.ActiveUserMiddleware',
    ....
)

# Setup caching per Django docs. In actuality, you'd probably use memcached instead of local memory.
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
        'LOCATION': 'default-cache'
    }
}

# Number of seconds of inactivity before a user is marked offline
USER_ONLINE_TIMEOUT = 300

# Number of seconds that we will keep track of inactive users for before 
# their last seen is removed from the cache
USER_LASTSEEN_TIMEOUT = 60 * 60 * 24 * 7

In activeuser_middleware.py:
import datetime
from django.core.cache import cache
from django.conf import settings

class ActiveUserMiddleware:

    def process_request(self, request):
        current_user = request.user
        if request.user.is_authenticated():
            now = datetime.datetime.now()
            cache.set('seen_%s' % (current_user.username), now, 
                           settings.USER_LASTSEEN_TIMEOUT)

In your UserProfile module or some other model associated with the user:
class UserProfile(models.Model):
    ....
    ....

    def last_seen(self):
        return cache.get('seen_%s' % self.user.username)

    def online(self):
        if self.last_seen():
            now = datetime.datetime.now()
            if now > self.last_seen() + datetime.timedelta(
                         seconds=settings.USER_ONLINE_TIMEOUT):
                return False
            else:
                return True
        else:
            return False

Then in the template where you want to display whether the user is online or not:
{% with request.user.get_profile as profile %}

 <table>
   <tr><th>Last Seen</th><td>{% if profile.last_seen %}{{ profile.last_seen|timesince }}{% else %}awhile{% endif %} ago</td></tr>
   <tr><th>Online</th><td>{{ profile.online }}</td></tr>
 </table>

{% endwith %}

Pros:
 - Simple solution
 - Doesn't need to hit the database to save the timestamp on each request

Cons:
 - Last user access times are cleared if the server is rebooted or the cache is reset
 - Last user access times are accessible only as long as they exist in the cache

Friday, July 8, 2011

Django Permission TemplateTag

In a previous post, I wrote about a way to keep track of user permissions on a model instance.  For example, I suggested that each model have a permissions subclass that could be instantiated with a user instance passed as a constructor argument.  Methods on that permissions class could then be called to determine if that user has permission to perform various actions.

I also suggested that the threadlocals module could then be used to pass in the user instance to the permissions object in the Django template.  However, from various readings, I get the impression that threadlocals may not be the best thing for passing arguments in a template function.   Therefore, I decided to use a more traditional route of creating a template tag to do something similar.

I created a template tag that lets you surround a block of HTML code to hide or show the contents based on the return value of the permission function.  The tag below basically says: if the logged-in user has the 'can_edit_group' permission on the given 'group' object instance, then display the Edit link.

Reference the original post for details.

In the Django template
{% load permission_tags %}
{% permission request.user can_edit_group on group %}
<a href="">Edit</a>
{% endpermission %}

Here is the templatetag definition that fits the example above.

In templatetags/permission_tags.py
from django import template
register = template.Library()

def permission(parser, token):
    try:
        # get the arguments passed to the template tag; 
        # first argument is the tag name
        tag_name, username, permission, onkeyword, object = token.split_contents()
    except ValueError:
        raise template.TemplateSyntaxError("%r tag requires exactly 4 arguments" % token.contents.split()[0])
    # look for the 'endpermission' terminator tag
    nodelist = parser.parse(('endpermission',))
    parser.delete_first_token()
    return PermissionNode(nodelist, username, permission, object)


class PermissionNode(template.Node):
    def __init__(self, nodelist, user, permission, object):
        self.nodelist = nodelist
        # evaluate the user instance as a variable and store
        self.user = template.Variable(user)
        # store the permission string
        self.permission = permission
        # evaluate the object instance as a variable and store
        self.object = template.Variable(object)

    def render(self, context):
        user_inst = self.user.resolve(context)
        object_inst = self.object.resolve(context)
        
        # create a new permissions object by calling a permissions 
        # factory method of the model class
        permissions_obj = object_inst.permissions(user_inst)
        
        content = self.nodelist.render(context)
        
        if hasattr(permissions_obj, self.permission):
            # check to see if the permissions object has the permissions method
            # provided in the template tag
            perm_func = getattr(permissions_obj, self.permission)
            # execute that permissions method
            if perm_func():
                return content 
        return ""

register.tag('permission', permission)

This tag currently works like an 'if' template tag and shows/hides anything wrapped between the permission and endpermission tags.  A future goal may be to make this work like an if/else tag so I can specify an else condition.

Wednesday, June 22, 2011

Django: Logging Quickstart in Django 1.3

Logging is one of those things that always seems to involve a lot of boilerplate code that I have to look up online each time.  Though logging is very customizable, most of the time I simply need to log to a file or files.  Therefore, I'm writing this in hopes of compiling a simple configuration that can be dropped into a project.  My understanding is still rudimentary, so feel free to drop a comment.

A logging framework is built into Django 1.3, which makes it easier to integrate into a given project.  The logging framework contains three main components: formatters, handlers, and loggers.  Formatters determine how the output will be displayed when it is logged.  Handlers determine where output is logged, whether that be a file, the console, or an email.  Loggers are associated with handlers and are the objects one interacts with when logging something.  As in the example below, more than one formatter, handler, or logger can be defined in a project.

In settings.py:
import os

# choose a path that is your virtual environment root
VENV_ROOT = os.path.join('/','web','myvenv')

LOGGING = {
    'version': 1,
    'disable_existing_loggers': True,
    'formatters': {
        'verbose': {
            'format': '%(levelname)s %(asctime)s %(module)s %(process)d %(thread)d %(message)s'
        },
        'simple': {
            'format': '%(levelname)s %(message)s'
        },
    },
    'handlers': {
        'file_userlogins': {                # define and name a handler
            'level': 'DEBUG',
            'class': 'logging.FileHandler', # set the logging class to log to a file
            'formatter': 'verbose',         # define the formatter to associate
            'filename': os.path.join(VENV_ROOT, 'log', 'userlogins.log') # log file
        },

        'file_usersaves': {                 # define and name a second handler
            'level': 'DEBUG',
            'class': 'logging.FileHandler', # set the logging class to log to a file
            'formatter': 'verbose',         # define the formatter to associate
            'filename': os.path.join(VENV_ROOT, 'log', 'usersaves.log')  # log file
        },
    },
    'loggers': {
        'logview.userlogins': {              # define a logger - give it a name
            'handlers': ['file_userlogins'], # specify what handler to associate
            'level': 'INFO',                 # specify the logging level
            'propagate': True,
        },     

        'logview.usersaves': {               # define another logger
            'handlers': ['file_usersaves'],  # associate a different handler
            'level': 'INFO',                 # specify the logging level
            'propagate': True,
        },        
    }       
}

In application code such as in views.py:
import logging   # import the required logging module
logger_logins = logging.getLogger('logview.userlogins')  # logger from settings.py
logger_logins.info('Log info')
logger_logins.debug('Log Debug information %s' % ("can pass in variables"))

# we can log to a different file using the other logger
logger_saves = logging.getLogger('logview.usersaves')
logger_saves.error('log an error')

Tuesday, June 7, 2011

Django: Executing manage.py from Anywhere

One nice convenience would be to have a way to call manage.py from anywhere within a virtual environment.  For example, if I want to execute 'manage.py shell', it would be nice to be able to execute that from anywhere rather than having to change directory to where manage.py lives.

We can already run management commands from anywhere using django-admin by utilizing the --settings flag, but doing so requires a lot of typing, and wrapping django-admin in a shell script seems inelegant somehow.
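
For example, something along these lines (the paths and project name are hypothetical):

django-admin.py shell --settings=myproject.settings --pythonpath=/web/myvenv/app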

Instead, I wrote a small Python script based on manage.py.  It can be placed in the ./bin directory of the virtualenv.  The DjangoManageSettings object is meant to be customized on a per-project basis, and I intend to eventually make it configurable from a file.

Anyway, thought I would share.  I'm curious to see other approaches to this.

# ---------- ${VIRTUAL_ENV}/bin/django_manage.py -------------------------
#!/usr/bin/env python

import os, sys
from django.core.management import execute_manager


class DjangoManageSettings(object):

    def __init__(self):
        self.VENV_ROOT = os.getenv("VIRTUAL_ENV")
        # The name of the module that settings.py resides in
        self.PROJECT_NAME="myproject"
        # The path to the directory where the Django project lives. 
        # I made it relative to the virtualenv root. 
        self.PROJECT_PARENT_PATH = os.path.join(self.VENV_ROOT, "app")
        # The name of the django settings module in the project
        self.SETTINGS_MODULE_NAME='settings'


dm_settings = DjangoManageSettings()

if not dm_settings.VENV_ROOT:
    print("A virtual environment must be activated before using this script.")
    sys.exit(1)


sys.path.insert(0, os.path.join(dm_settings.PROJECT_PARENT_PATH))
mod = __import__(dm_settings.PROJECT_NAME, globals(), locals(), 
                 [dm_settings.SETTINGS_MODULE_NAME])
settings = getattr(mod, dm_settings.SETTINGS_MODULE_NAME)

if __name__ == "__main__":
    execute_manager(settings)
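
With the virtual environment activated and the script marked executable, any management command can then be run from any directory, e.g.:

chmod +x ${VIRTUAL_ENV}/bin/django_manage.py
django_manage.py shell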