Docker: Develop with Private Git Repositories in requirements.txt file

In trying to define my ideal Docker and Django development setup, I've been looking for a solution to the following needs:
  1. I've often found the need to install my project's private Git repositories into a Docker container via a standard requirements.txt file.
  2. I also want to be able to easily develop on those private Git repositories, in addition to the main project repository that contains my Django project.
As you've probably encountered, a standard pip requirements file for installing both PyPI packages and private Git repos might look something like the following.

# requirements.txt
django==1.11.3
-e git+git@github.com:my_github_user/repo_1.git#egg=my_project1
-e git+git@github.com:my_github_user/repo_2.git@my_branch#egg=my_project2

If you are not using Docker and you run pip install -r requirements.txt inside of a virtualenv, pip will download PyPI packages (in this case Django) into the lib/python3.x/site-packages/ directory of your virtualenv and will create Git checkouts of the Git URLs in the src/ directory of your virtualenv. If the Git URLs point to private Git repositories, pip will use your ssh private key to gain access to the Git remote (GitHub in this case). By default, pip will look for your private key in ~/.ssh/id_rsa (though that location can be changed through your ssh configuration).
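
For example, on a local machine the end result might look something like this (a rough sketch; exact paths depend on your Python version and where your virtualenv lives):

# inside an activated virtualenv (paths illustrative)
pip install -r requirements.txt

# PyPI packages such as Django end up in site-packages
ls "$VIRTUAL_ENV"/lib/python3.6/site-packages/

# editable Git requirements are checked out under src/
ls "$VIRTUAL_ENV"/src/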

While this might work fine for checking out private Git code from a requirements.txt on your local desktop or onto a secure webserver, installing this code into a Docker container produces other challenges.

Specifically, your Docker container needs access to your ssh private key in order to download the code from the private Git repos. However, if you were to copy your private key into a Docker container using the Dockerfile COPY command, a new Docker layer would be created and that key would be stored in the Docker "layer stack" permanently, even if you were to delete it in a subsequent layer. This is not ideal for security reasons; if you were to push your Docker image to Docker Hub or some other Docker registry, your ssh private key would be baked into it for all of your colleagues to see (and hopefully not steal).
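
To make the problem concrete, here is a hypothetical sketch of the naive approach. Even though the key is deleted in a later instruction, it remains recoverable from the layer created by the COPY:

# anti-pattern (illustrative): do not do this
FROM python:3.6
COPY id_rsa /root/.ssh/id_rsa
COPY requirements.txt /site/requirements.txt
RUN pip install -r /site/requirements.txt
# deleting the key here does not remove it from the earlier COPY layer
RUN rm /root/.ssh/id_rsa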

I spent a lot of time looking for a good solution to this problem, but every "solution" seemed hack-ish.

I first thought that maybe you could pass the key into the Docker container via a Docker volume mount. However, the volume mount happens at container runtime (when you execute the "docker run" command), but the container needs access to that key prior to that, at container "build" time (when you execute the "docker build" command). So that really didn't seem like a workable solution.

Second, some blog posts on this topic suggested running an http server on your local machine or elsewhere that would serve up the ssh private key to the container. The Dockerfile could use a single RUN command to download the key into the container, use the key to install the private requirements, and then delete the key. This would prevent the layer from being saved with the key in it. However, this solution seemed over the top, as it requires running a server just for this step.

Another approach that I considered was to pre-download the code from the Git repos outside of the container, with the "pip download" command, prior to running a Docker build (i.e. pip download --no-deps -r requirements.txt -d ./vendor/). The downloaded packages directory could then be copied into the container with the Dockerfile COPY command, and then could be installed inside of the container with "pip install". However, I simply couldn't work out how to make this solution work without defining two requirements files: one with the Git urls and one with filesystem urls to the downloaded packages. I tried a number of things with this approach, but none seemed to be very satisfactory.

I also considered using the Docker secrets feature to pass the key into the container. However, this feature seemed to be designed only for use with Docker Swarm, and not plain old Docker containers. I wanted this solution to work for options other than just Docker Swarm, so I didn't end up going this route.

Finally, I found a solution in Docker multi-stage builds. Multi-stage builds are a new feature of Docker (as of Docker 17.05) that lets you use one image to build your code and then use a second image to run it. Multi-stage builds are often used so that you can build your codebase, with all of your compilers and development tools, in the first image, but then copy only the compiled distribution code over to the "runtime image", thus allowing you to produce more lightweight images.

One key property of multi-stage builds is that the layer history of the first image is not carried over to the second image. This is perfect for the case of our ssh private key. In the first image we copy in our private key, create a virtualenv, and install the private repos and other PyPI packages into that virtualenv. We then create a second image, copy the already-built virtualenv into it, run that image in a container, and push it to a Docker registry. At no point does the private ssh key make it into the second image.
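
In schematic form, the Dockerfile we are about to build looks something like this (just a sketch; the full version is walked through below):

# sketch of a multi-stage Dockerfile
FROM python:3.6 as builder
# ... copy in the ssh key and pip install the private repos into a virtualenv ...

FROM python:3.6
# ... copy only the built code; the key and its layers stay behind in the builder image ...
COPY --from=builder /site/ /site/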

Let's get into specifics.

We first need to temporarily store your private key in an environment variable so it can be passed into the container at build time.

export ssh_prv_key="$(cat ~/.ssh/id_rsa)"

Next, we need to define a docker-compose file that reads in the private key environment variable and passes that in as a Docker build argument.

docker-compose.yml


version: '2.2'
services:
  app:
    build:
      context: ./
      dockerfile: Dockerfile
      args:
        ssh_prv_key: "$ssh_prv_key"
    volumes:
      #- ./vendor/:/site/vendor/
      - .:/site/proj/
    environment:
      - DEBUG=True


Dockerfile

Next, we define a Dockerfile. We start with a FROM statement; however, notice the "as builder". This is part of the new magic of multi-stage builds. It lets us temporarily name the image so we can reference it in a later build stage.

FROM python:3.6 as builder

Next, we define and create a working directory where we can store our code. This is mostly Docker housekeeping.

ARG site_dir=/site/
RUN mkdir -p $site_dir
WORKDIR $site_dir

Next, we install whatever Apt packages are needed for the build. Notice how we are following the Docker best practice of removing the temporary Apt list files.

RUN apt-get update; apt-get install -y \
    python3-dev python3-venv \
    && rm -rf /var/lib/apt/lists/*

Here we declare the ssh_prv_key build argument (the one passed in by docker-compose), write the private key to a protected location, /root/.ssh/id_rsa, and adjust its permissions.

ARG ssh_prv_key
RUN mkdir -p /root/.ssh; \
    chmod 700 /root/.ssh; \
    echo "$ssh_prv_key" > /root/.ssh/id_rsa; \
    chmod 600 /root/.ssh/id_rsa

Next, we tell the ssh client about the key. These settings are needed to allow the git+ssh pip installs to succeed within the container.

RUN echo " IdentityFile /root/.ssh/id_rsa" >> /etc/ssh/ssh_config; \
    echo " StrictHostKeyChecking=no" >> /etc/ssh/ssh_config; \
    echo " UserKnownHostsFile=/dev/null" >> /etc/ssh/ssh_config; \
    echo " GlobalKnownHostsFile=/dev/null" >> /etc/ssh/ssh_config

Then, we create a virtual environment. This is not only used to isolate the project packages from the OS image's Python packages; it is also what lets us copy all of the Python dependencies from our "build image" to our "runtime image". More on this later.

RUN python3 -m venv env; \
    env/bin/pip install --upgrade pip

Next, we copy the requirements file into the container and pip install the requirements. I chose to specify the --src flag, which installs all of the Git-downloaded Python requirements into the "vendor" directory (relative to the working directory, so /site/vendor/ here).

COPY requirements.txt $site_dir/requirements.txt
RUN env/bin/pip install -r $site_dir/requirements.txt --src vendor/

This is what makes it all possible! We can define a second FROM statement in the same Dockerfile. Again, this starts a completely new image with no shared layer history from the first.

FROM python:3.6
ARG site_dir=/site/

We install whatever Apt packages are needed to run the app. 

RUN apt-get update; apt-get install -y \
    python3-dev python3-venv \
    && rm -rf /var/lib/apt/lists/*

RUN mkdir -p $site_dir
WORKDIR $site_dir

Finally, we copy the entire site directory (virtualenv included) from the builder image into the second image. The --from flag tells Docker that the source path refers to the builder image rather than the local build context.

COPY --from=builder $site_dir $site_dir

CMD ["echo", "done"]

That's basically it! You can build and run the image as usual with Docker Compose:

docker-compose up --build
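
If you prefer to build without Compose, you can pass the key as a build argument directly (the image tag here is arbitrary):

docker build --build-arg ssh_prv_key="$ssh_prv_key" -t my_app .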

And if you have a look at the image that was built, using the docker history command, you'll notice that the final image does not contain the ssh private key as a build argument, nor do any of its layers contain the key.

$ docker history testdockermulti_app
IMAGE CREATED CREATED BY SIZE COMMENT
0391cd115e89 4 hours ago /bin/sh -c #(nop) CMD ["echo" "done"] 0B
93525a31703c 4 hours ago /bin/sh -c #(nop) COPY dir:49bbdd1f1b1ee2b... 55.1MB
931472817f48 26 hours ago |1 site_dir=/site/ /bin/sh -c apt-get upda... 90MB
2086dc5bbc53 3 days ago /bin/sh -c #(nop) WORKDIR /site/ 0B
5c0aa0069dae 3 days ago |1 site_dir=/site/ /bin/sh -c mkdir -p $si... 0B
969057afd396 3 days ago /bin/sh -c #(nop) ARG site_dir=/site/ 0B
01fd71a97c19 8 days ago /bin/sh -c #(nop) CMD ["python3"] 0B
<missing> 8 days ago /bin/sh -c set -ex; wget -O get-pip.py '... 5.23MB
<missing> 8 days ago /bin/sh -c #(nop) ENV PYTHON_PIP_VERSION=... 0B
<missing> 8 days ago /bin/sh -c cd /usr/local/bin && ln -s idl... 32B
...

Finally, if you want to work on those private repositories (on your Docker host) as part of your development workflow, you can create a "vendor" directory within your project and clone the codebases into the exact same location where pip would install them. Then, using a Docker volume mount, you can mount the Docker host's copy of the "vendor" directory over the image's "vendor" directory.

# on Docker build host
mkdir vendor; cd vendor
git clone git@github.com:my_github_user/repo_1.git  my_project1
git clone git@github.com:my_github_user/repo_2.git  my_project2


# then mount the "vendor" directory 
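
The mount itself is the commented-out line in the docker-compose.yml shown earlier; uncommenting it overlays the host's checkouts onto the same path where pip installed them inside the image:

    volumes:
      - ./vendor/:/site/vendor/
      - .:/site/proj/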

Anyway, hopefully this gives you some ideas as to how to build Docker images with private resources in requirements.txt. I'm still refining this process and working it into my workflow, and I'd be glad to hear your thoughts on the topic.
