Integrating JupyterHub with Django using OAuth

Saturday, 08 February, 2020

Objectives

Imagine that you have just built a Django web application that lets users log in and use various services. You now want to let those users interact with your services through a Jupyter notebook spawned from within the Django application. How would you do it?

Well, the good folks in the Jupyter community have already given us "JupyterHub", which lets users spawn their own notebook servers from a single entry point, 'the hub', after authenticating themselves in some way. By default, JupyterHub authenticates users against the Linux usernames and passwords of the accounts on the machine it runs on. So the real change we need to make is to tell JupyterHub to authenticate against the user accounts stored in the Django application.

These are the goals of this tutorial:

Let's begin! You will gain more insight into how things work if you follow along, but if you just want the minimal working source code, check out this GitHub repository.

Setting Up Django

We are going to build a Django application from scratch. We assume you have some basic experience with Django, so we won't explain every single step. We create a virtual environment in a separate folder, install Django inside it and create a new project called "service_provider", since it will provide the user authentication service.

$ mkdir django-oauth-jupyterhub-demo
$ cd django-oauth-jupyterhub-demo
$ python3 -m venv venv/
$ source venv/bin/activate
$ pip3 install django==2.2.7
$ django-admin startproject service_provider

By default, Django uses SQLite as its database, and that is all we need here. So let us create the database by running the migrations, and then create our first superuser.

$ cd service_provider
$ python manage.py migrate
$ python manage.py createsuperuser
(enter the details when prompted)

Let us now check that everything works. Start the development server:

$ python manage.py runserver

Point your favorite browser to http://127.0.0.1:8000/ and check that the success page appears. Next, go to http://127.0.0.1:8000/admin and check that you can log in with the username and password you chose when creating the superuser.

Crash Course in OAuth

If you are a regular netizen, you have done this many times: you come across a site that requires you to log in, you click Login, and you are offered the option of signing in with your existing Google, GitHub or other account. Stack Overflow and Evernote are examples of such sites. This saves you, the user, the trouble of creating and managing yet another account. Instead, you let Google or GitHub authenticate you with the account you already have there and share some of your information with the service you are trying to use. Here, Google and GitHub are the service providers, and applications such as Evernote or Stack Overflow are the client applications.

The mechanism that lets a client application authenticate a user against account information stored and maintained by the service provider is called OAuth (short for "Open Authorization"). Strictly speaking, OAuth is an authorization framework: the provider grants the client an access token, and the client then uses that token to ask the provider who the user is. Note that OAuth is one popular way to achieve this; there are other methods as well!

In our example, Django will be the service provider since it will store all the user accounts and details while JupyterHub will be the client application.
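The flow used in this tutorial is OAuth 2's authorization code grant. The sketch below, using only Python's standard library, shows its first two steps from the client's side: building the URL the browser is sent to, and checking the code that comes back. It assumes the addresses used later in this tutorial, a placeholder client ID and django-oauth-toolkit's /o/authorize/ endpoint; in the real setup JupyterHub's OAuthenticator does all of this for you.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed values matching the setup later in this tutorial; CLIENT_ID is a placeholder.
AUTHORIZE_URL = "http://localhost:8000/o/authorize/"
CLIENT_ID = "your-client-id"
REDIRECT_URI = "http://localhost:8010/hub/oauth_callback"


def build_authorize_url(state):
    """Step 1: the client sends the browser to the provider with these parameters."""
    params = {
        "response_type": "code",       # ask for an authorization code
        "client_id": CLIENT_ID,        # which registered application is asking
        "redirect_uri": REDIRECT_URI,  # where the provider sends the browser back
        "state": state,                # random value the client checks on return
    }
    return AUTHORIZE_URL + "?" + urlencode(params)


def parse_callback(url, expected_state):
    """Step 2: after the user clicks Authorize, the provider redirects back with a code."""
    query = parse_qs(urlsplit(url).query)
    if query.get("state") != [expected_state]:
        raise ValueError("state mismatch: possible forged callback")
    return query["code"][0]


if __name__ == "__main__":
    state = secrets.token_urlsafe(16)
    print(build_authorize_url(state))
    # A callback of the shape the provider would produce (the code value is made up).
    callback = REDIRECT_URI + "?" + urlencode({"code": "abc123", "state": state})
    print(parse_callback(callback, state))  # abc123
```

The state value guards against forged callbacks: the client remembers what it sent and rejects any redirect that does not echo it back.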

What does Django need to do in order to support the OAuth mechanism?

Phew! Sounds like hard work! Fortunately, a third-party package, Django OAuth Toolkit, automates most of it. Let's set Django up with it.

Setting Up OAuth on the Django Side

We begin by installing the Django OAuth Toolkit:

$ pip install django-oauth-toolkit==1.2.0

Next, we need to add this app to our Django project. Open service_provider/settings.py and add oauth2_provider to the INSTALLED_APPS list, so that this part of the file looks something like this:

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',

    'oauth2_provider',
]

And under MIDDLEWARE add the following entry:

'oauth2_provider.middleware.OAuth2TokenMiddleware',

That's it: all the logic needed to handle OAuth is now inside your Django application. To keep track of registered applications, access tokens and more, the toolkit we just installed needs some tables in the database. Let's run the migrate command to create them.

$ python manage.py migrate

If you are curious to see what new models have been introduced by this, check out the Django admin console and see what has changed!

A few more settings are needed, by the way! Because we are about to make two applications hosted at two different origins (here localhost:8000 and localhost:8010, which differ in port) talk to each other, we are going to run into security mechanisms built into browsers. Under the same-origin policy, a page loaded from one origin may not read responses from another origin unless that server explicitly allows it by sending Cross-Origin Resource Sharing (CORS) headers, such as Access-Control-Allow-Origin, in its responses. So we need to tweak Django to send them. Without too much explanation, here is what needs to be done.

$ pip install django-cors-middleware==1.4.0

Open service_provider/settings.py. Under INSTALLED_APPS, add the following line:

'corsheaders',

And under MIDDLEWARE add,

'corsheaders.middleware.CorsMiddleware',
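Simply appending both middleware entries to the end of MIDDLEWARE is likely fine for this demo, but their order does matter. A sketch of the resulting list, assuming Django 2.2's default middleware: django-oauth-toolkit's middleware has to run after AuthenticationMiddleware (otherwise the latter would replace the user found from the access token), and the CORS middleware is usually recommended as early as possible, before CommonMiddleware.

```python
# service_provider/settings.py (excerpt): Django 2.2's default MIDDLEWARE
# plus the two entries added in this tutorial, in a safe order.
MIDDLEWARE = [
    'django.middleware.security.SecurityMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    # CORS headers: placed early, before CommonMiddleware, so that responses
    # produced by the middleware after it also pass through it.
    'corsheaders.middleware.CorsMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    # OAuth: after AuthenticationMiddleware, so the user identified by a
    # Bearer token is not replaced by the session-based (anonymous) user.
    'oauth2_provider.middleware.OAuth2TokenMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
]
```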

And run a migration.

$ python manage.py migrate
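Why do the Django server and JupyterHub count as different origins at all? Browsers compare the scheme, host and port of two URLs. A small standard-library sketch of that comparison, a simplification of the browser's rules for illustration only:

```python
from urllib.parse import urlsplit

# Ports a browser assumes when the URL does not spell one out.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(url_a, url_b):
    """True only if scheme, host and port all match."""
    return origin(url_a) == origin(url_b)


if __name__ == "__main__":
    django_url = "http://localhost:8000/o/authorize/"
    hub_url = "http://localhost:8010/hub/login"
    print(same_origin(django_url, hub_url))  # False: ports 8000 and 8010 differ
    print(same_origin("http://127.0.0.1:8000/admin/", "http://localhost:8000/admin/"))  # False: different hosts
```

Hostnames are compared literally, so 127.0.0.1 and localhost count as different hosts even though they point at the same machine.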

Setting up Django URLs

The final Django code change is to set up the URLs through which a client application can ask Django to authenticate a user. Yes, the client application needs to know the URLs at which it can ask Django for user information and so on.

The most essential URLs mandated by the OAuth framework are already available to us, thanks to the oauth2_provider application we installed. All we need to do is tell Django to expose them to the world.

Open service_provider/urls.py and under the urlpatterns list, add the following entry:

path('o/', include('oauth2_provider.urls', namespace='oauth2_provider')),

At the top of the file, you will need to add the following:

from django.urls import include
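Because oauth2_provider.urls is mounted under the 'o/' prefix, each of the toolkit's routes ends up under /o/. A small sketch of how the full addresses used in this tutorial are composed; the route list is an assumption limited to the three endpoints we touch (the toolkit's own urls.py defines more). If you pick a different prefix, remember to change the authorize and token URLs you later give to JupyterHub.

```python
from urllib.parse import urljoin

DJANGO_BASE = "http://127.0.0.1:8000/"
PREFIX = "o/"  # the prefix used in path('o/', include('oauth2_provider.urls', ...))

# Routes defined by django-oauth-toolkit relative to where it is mounted
# (only the ones this tutorial uses).
TOOLKIT_ROUTES = ["authorize/", "token/", "applications/"]


def mounted_urls(base=DJANGO_BASE, prefix=PREFIX):
    """Map each toolkit route to its full URL under the chosen prefix."""
    root = urljoin(base, prefix)
    return {route: urljoin(root, route) for route in TOOLKIT_ROUTES}


if __name__ == "__main__":
    for route, url in mounted_urls().items():
        print(route, "->", url)
```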

Now point your browser to http://127.0.0.1:8000/o/applications/ and click "Click here" to register a new application. Enter the following information.

Copy the Client ID and Client Secret into a text file somewhere and keep them handy. We are going to need them later!
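The client secret is what lets JupyterHub prove its identity when it swaps the one-time authorization code for an access token at /o/token/. A standard-library sketch of that request, built but not sent; the code is whatever arrived at the callback, and the redirect URI is assumed to be the oauth_callback_url used in the JupyterHub config, which must match the one registered in Django. Sending the credentials as HTTP Basic auth is one of the options the OAuth 2 specification allows; OAuthenticator may send them differently depending on its version and settings.

```python
import base64
from urllib.parse import quote_plus, urlencode
from urllib.request import Request

TOKEN_URL = "http://localhost:8000/o/token/"
# Must match the redirect URI registered for the application in Django.
REDIRECT_URI = "http://localhost:8010/hub/oauth_callback"


def build_token_request(code, client_id, client_secret):
    """Build (but do not send) the POST that exchanges a code for a token."""
    body = urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": REDIRECT_URI,
    }).encode("ascii")
    # HTTP Basic credentials: form-encode each part, join with ":", base64 the result.
    raw = quote_plus(client_id) + ":" + quote_plus(client_secret)
    credentials = base64.b64encode(raw.encode("ascii")).decode("ascii")
    return Request(
        TOKEN_URL,
        data=body,
        headers={
            "Authorization": "Basic " + credentials,
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
```

Passing the result to urllib.request.urlopen would send it; a successful reply is a JSON object whose access_token field is what the client uses next.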

The final URL we need to set up is one that returns a JSON object describing the currently logged-in user! JupyterHub needs it to find out who has logged in, and it is not part of the minimal requirements of the OAuth framework.

WARNING: I'm breaking form here! Ideally, you should have a separate Django application with its own URL definitions and view functions, all neatly arranged. But for the sake of a bare minimal working example, I'm going to avoid creating an app at all and define a view function inside the main project's urls.py. In your real, ready-to-serve application, you should separate out this logic according to your own style.

Next, open service_provider/urls.py and add the following code above the urlpatterns list (the view has to be defined before the list refers to it).

from django.contrib.auth.decorators import login_required
from django.http import JsonResponse


# Return the logged-in user's name as JSON. Anonymous users are
# redirected to LOGIN_URL by login_required.
@login_required
def userdata(request):
    return JsonResponse({'username': request.user.username})

And add the following entry to the urlpatterns list:

path('userdata', userdata, name='userdata'),

To check that this works, make sure you are logged in (via the admin console) and visit http://127.0.0.1:8000/userdata. You should get a JSON object containing the key "username".
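On the JupyterHub side, this JSON is consumed by looking up the key named in username_key (set to 'username' in the JupyterHub config later in this tutorial). A simplified stand-in for that lookup, not OAuthenticator's actual code, makes the contract explicit: whatever key the view emits, username_key must name it.

```python
import json


def username_from_userdata(body, username_key="username"):
    """Pick the username out of the JSON returned by the /userdata view."""
    data = json.loads(body)
    if username_key not in data:
        raise KeyError(f"userdata response has no {username_key!r} field")
    return data[username_key]


if __name__ == "__main__":
    # A body shaped like the one the userdata view returns ("alice" is made up).
    print(username_from_userdata('{"username": "alice"}'))  # alice
```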

If a user who is not logged in tries to access a page that requires login, Django redirects them to the login page in such a way that, once they log in successfully, they are sent back to the page they originally asked for. Django's default LOGIN_URL is /accounts/login/, which this project does not route, and the standard login view would also need a template at registration/login.html. Instead of creating one, we can ask Django to use the admin login page for now.

Open the settings.py file and add the following line:

LOGIN_URL = '/admin/login'
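For illustration, a sketch of the redirect target Django builds for an anonymous user. It approximates django.contrib.auth's behaviour of appending the originally requested path, query string included, to LOGIN_URL as a next parameter (with "/" left unescaped); it is not Django's own code. This is what lets the JupyterHub flow survive a login: the authorize URL is carried through the login form and revisited afterwards.

```python
from urllib.parse import quote, urlencode

LOGIN_URL = "/admin/login"


def login_redirect(requested_path, login_url=LOGIN_URL):
    """Approximate the URL Django sends an anonymous user to."""
    # The requested path (with its query string) travels in "next";
    # "/" stays readable, other reserved characters are percent-encoded.
    return login_url + "?" + urlencode({"next": requested_path}, safe="/", quote_via=quote)


if __name__ == "__main__":
    print(login_redirect("/userdata"))  # /admin/login?next=/userdata
```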

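Finally, our Django application, which by default authenticates users in only one way (username and password checked against the default User model), needs to be told to also recognize users who identify themselves with an OAuth access token. This is what lets JupyterHub, which calls /userdata with a token rather than a browser session, get past login_required. So we add the following lines to settings.py: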

AUTHENTICATION_BACKENDS = (
    'oauth2_provider.backends.OAuth2Backend',
    'django.contrib.auth.backends.ModelBackend'
)

That's the last change to be made to Django.
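With OAuth2TokenMiddleware and OAuth2Backend in place, a request carrying an access token in an "Authorization: Bearer" header is treated as coming from that token's user. The sketch below builds such a request with the standard library, roughly what a client like JupyterHub does once it has a token; the token value is a placeholder. If the token is missing or invalid, login_required redirects to the login page, so you would get HTML back rather than JSON, which the sketch checks for.

```python
import json
from urllib.request import Request, urlopen

USERDATA_URL = "http://localhost:8000/userdata"


def build_userdata_request(access_token):
    """Build a GET for /userdata that authenticates with a Bearer token."""
    return Request(
        USERDATA_URL,
        headers={"Authorization": "Bearer " + access_token},
        method="GET",
    )


def fetch_username(access_token):
    """Send the request; needs the Django server running and a valid token."""
    with urlopen(build_userdata_request(access_token)) as response:
        # A rejected token ends up at the login page (HTML), not JSON.
        if response.headers.get_content_type() != "application/json":
            raise RuntimeError("token rejected, ended up at " + response.url)
        return json.loads(response.read().decode("utf-8"))["username"]
```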

Enter JupyterHub

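We are now ready to bring in the last piece of the puzzle: JupyterHub itself! Start by installing it along with OAuthenticator, the JupyterHub extension that supports the OAuth framework. JupyterHub also relies on configurable-http-proxy, a Node.js package that pip does not install, so install that with npm as well.

$ pip install jupyter==1.0.0 jupyterhub==1.0.0 oauthenticator==0.9.0
$ npm install -g configurable-http-proxy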

Now, alongside our Django project folder service_provider, we are going to create a new folder called hub_config, where our JupyterHub config file will live and from which JupyterHub will be launched.

$ cd <path/to/django-oauth-jupyterhub-demo>
$ mkdir hub_config
$ cd hub_config

Next, create a file called jupyterhub_config.py in hub_config with the following contents; I've explained each setting with inline comments.

# This is how we tell JupyterHub to use OAuth instead of the default
# authentication, which checks local Linux user accounts.
c.JupyterHub.authenticator_class = 'oauthenticator.generic.GenericOAuthenticator'

# Where should Django pass the authentication results back to?
c.GenericOAuthenticator.oauth_callback_url = 'http://localhost:8010/hub/oauth_callback'

# The client ID and client secret that Django generated when we registered
# JupyterHub as an application (replace these with your own values).
c.GenericOAuthenticator.client_id = 'irhIz1p3G8lyiBDWv66LzuwLacyV1i98jJP0qXQx'
c.GenericOAuthenticator.client_secret = 'tidEvFtozIJTTIfmHqkBEnlEtFl0Wd3tB7WnD2EvXDkRkk36Lphr5N3RoPaJhuJBaSuQ2j3WZSF7OrCrdGwG9ejEWty1VNgkjon3EyTdKpeBXVLw8q4nk0szvU3tHUx6'

# Where can Jupyterhub get the token from?
c.GenericOAuthenticator.token_url = 'http://localhost:8000/o/token/'

# Where can it get the user name from? What method shall it use?
# What key in the JSON output is the username?
c.GenericOAuthenticator.userdata_url = 'http://localhost:8000/userdata'
c.GenericOAuthenticator.userdata_method = 'GET'
c.GenericOAuthenticator.userdata_params = {}
c.GenericOAuthenticator.username_key = 'username'

# What address will Jupyterhub be accessed from?
c.JupyterHub.bind_url = 'http://localhost:8010'

# By default Jupyterhub requires that a Linux user exist for every
# authenticated user. For testing, we are going to trick JupyterHub
# to merely pretend that such a user exists and launch notebook servers
# for the same user running the hub process itself!
from jupyterhub.spawner import LocalProcessSpawner

class SameUserSpawner(LocalProcessSpawner):
    """Local spawner that runs single-user servers as the same user as the Hub itself.

    Overrides user-specific env setup with no-ops.
    """

    def make_preexec_fn(self, name):
        """no-op to avoid setuid"""
        return lambda : None

    def user_env(self, env):
        """no-op to avoid setting HOME dir, etc.""" 
        return env

c.JupyterHub.spawner_class = SameUserSpawner

Wow! That was a lot. Take some time to read the settings and absorb them!

Launching and Testing JupyterHub

Now, keep the Django server running as is! Next, we need to launch JupyterHub, but it needs a couple more pieces of information in the form of environment variables: the Django URLs that authorize JupyterHub as an application and that issue the token. So we will create a shell script called launch.sh in hub_config that sets these variables and then launches the hub.

#! /bin/bash

export OAUTH2_AUTHORIZE_URL="http://localhost:8000/o/authorize"
export OAUTH2_TOKEN_URL="http://localhost:8000/o/token/"

jupyterhub -f jupyterhub_config.py
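Since GenericOAuthenticator takes these two URLs from the environment, a forgotten or misspelled export may only show up later, when someone clicks Sign In. A hypothetical preflight check (call it preflight.py; neither the name nor the script is part of JupyterHub) that launch.sh could run before starting the hub:

```python
import os
import sys

# The variables launch.sh exports for GenericOAuthenticator.
REQUIRED_VARS = ("OAUTH2_AUTHORIZE_URL", "OAUTH2_TOKEN_URL")


def missing_oauth_vars(environ=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_oauth_vars()
    if missing:
        sys.exit("Not set: " + ", ".join(missing))
    print("OAuth endpoints configured; starting JupyterHub should be safe.")
```

For example, a line such as `python preflight.py || exit 1` placed just before the jupyterhub command would stop the script early if either variable is missing.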

Let's launch!

$ chmod u+x launch.sh
$ ./launch.sh

And now test! Head to http://localhost:8010 and click the "Sign In With GenericOAuth2" button. If you are already logged into the admin console, you should be taken to a page where you click Authorize. (The browser keeps separate sessions for 127.0.0.1 and localhost, so a login made at http://127.0.0.1:8000/admin does not count here.) If you are not logged in, the login page appears first, followed by the page where you click Authorize. Once you click Authorize, your Jupyter notebook server should launch.

That's it! JupyterHub has successfully learned how to authenticate a user using user account information stored in your Django application!

Remember: Source Code Available Here.

Note: In the application I actually built, I had to containerize both the Django application and JupyterHub, and let the latter launch per-user notebook servers as containers. I'll try to cover this in a future tutorial.



