Embedding a Jupyter Notebook in Joomla

Proper syntax highlighting, and a demonstration of including rendered Jupyter notebooks inline in a Joomla blog post.

 

I've invested quite a bit of time in the current version of my website in Joomla. Overall, I'm very happy with my Joomla experience and have resolved nearly all of the issues that were troubling me after first setting the site up (most of them involved display or other settings that were hard to find in the Joomla backend or overlapped with other controls or functionality). 

There's been one lingering exception though: properly displaying code (and Jupyter notebooks in particular). This website will present a lot of code and I want it to look just right. I've been using GitHub a lot lately and have gotten used to the way code and notebooks look there. The solution to the code issue turned out to be relatively simple - I just needed the right syntax highlighter. It took me a while to find PRISM Syntax Highlighter, which now makes it easy to display properly formatted and highlighted code:

import numpy as np

x3 = np.random.randint(10, size=(6, 4, 2)) # Three-dimensional array

print(x3[1:3,2:4])

 

Now the code looks proper, with the right color block surrounding it, and beautiful highlighted display like we have come to expect from GitHub and other sites. 

 

Embedding Jupyter Notebooks in Joomla

PRISM makes it easy to properly display beautifully highlighted code. But what about embedding whole Jupyter notebooks in a Joomla page? It turns out that PRISM is a key part of the solution to this issue as well. At GitHub, nbviewer, and many other sites that display notebooks, they are rendered to HTML automatically. That wouldn't be possible on my site without Python installed on my web host account (which I can set up, but would rather avoid if I don't have to). But we can render the notebooks manually from Jupyter and then include them in Joomla pages.

Before I get too far into the details, here's the overview:

  1. Install PRISM Syntax Highlighter
  2. Install Custom CSS and set up an instance with the appropriate CSS to include with notebooks
  3. Render your notebook from Jupyter and put it into a Joomla article, including the Custom CSS module at the top

You only need to do steps 1 and 2 one time - after that you can just render your notebooks, paste them into an article with the CSS include at the top, publish it, and you're good to go. Here are the details:

 

1. Install PRISM

The easiest way to install PRISM for Joomla is to use Extensions: Manage in the Joomla backend. Search for 'PRISM' to find the extension:

 

Click on the PRISM extension to go to the details page, then click 'Install' to install:

 

 

 

2. Install the Custom CSS Extension

Jupyter notebooks include some CSS code that we need to reproduce in Joomla. We can use the Custom CSS extension to do this. Install Custom CSS the same way you did PRISM, using the extensions manager in Joomla. Once Custom CSS is installed, make sure it is enabled (if the button next to the name has a red X, click to enable and the X will turn into a green check mark):

 

Now we need to create an instance of the Custom CSS extension and specify where to include it. Go to Extensions -> Modules -> New and choose 'Custom CSS' from the list. Name your instance and paste in the CSS from this file: jupyter-notebook.css:

Create a new article. At the top of the article, insert a module using your editor and select your Custom CSS, or insert the include code shown in the image below.

 

Then disable or toggle your visual editor so you can enter raw HTML (many of the common Joomla editors strip out some HTML if you enter it in the visual editor). Paste in the notebook HTML from your rendered file:
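A notebook rendered from Jupyter is a complete HTML page with its own head and body, while the article only needs the notebook markup itself (the styling comes from the Custom CSS module). Here is a minimal sketch of pulling out just the body content before pasting. The extract_body helper and the sample markup are hypothetical, and I'm assuming the rendered file has a single body element:

```python
import re


def extract_body(html: str) -> str:
    """Return the markup between <body ...> and </body>, or the input unchanged."""
    match = re.search(r"<body[^>]*>(.*)</body>", html, flags=re.DOTALL | re.IGNORECASE)
    return match.group(1).strip() if match else html


# Tiny stand-in for a rendered notebook file (hypothetical content).
rendered = """<!DOCTYPE html>
<html>
<head><title>demo</title><style>.cell { margin: 0; }</style></head>
<body class="notebook">
<div class="cell"><pre>import numpy as np</pre></div>
</body>
</html>"""

fragment = extract_body(rendered)
print(fragment)
```

For a real file you would read the rendered HTML from disk first and paste the printed fragment into the article in raw HTML mode.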

 

 

Publish and save your article wherever you want it to appear on your site. Now when you go to your page your notebook should be included inline with proper formatting:

 

 

Optimization with numpy and numexpr - Measuring Performance

Nearly everyone who programs in Python has used numpy at some point. Numpy is an essential data science tool, part of the 'scientific stack' which also includes pandas, matplotlib, and several other very powerful libraries commonly used in scientific computing.

Numpy provides an array type and (highly optimized) functions for array operations. Many of us probably know that optimized execution (i.e., speed) is one of the advantages of numpy, but have never paid too much attention to the details of how much performance advantage we can achieve using numpy's types and functions and other related optimization tricks. We'll use execution timing to begin to take a more detailed view of the potential performance improvements available with numpy. We'll also look at numexpr, a numerical array expression evaluator, which can be used on its own or in combination with numpy to achieve some dramatic performance improvements.

First, let's do the necessary imports. Notice that we only import the functions we need from the math library. See this thread for why you should avoid using 'import *'.

In [2]:
from math import log, cos
import numpy as np
 

We will first do some common numeric computation without numpy. This will give us a performance baseline against which to measure various improvements. You have probably written something like the following code many times, performing some key computation step inside a for loop. In this case, we are using a list comprehension, which gives the same result as a for loop but provides a more concise and readable syntax.

We'll try doing a basic numerical computation for each of a series of numbers, using two of Python's built-in types - list and range.

In [3]:
loops = 25000000

a, b = range(1, loops), [i for i in range(1,loops)]
print(f'Type of a is {type(a)}; type of b is {type(b)}')

def f(x):
    return 3 * log(x) + cos(x) ** 2

%timeit r = [f(x) for x in a]
%timeit r = [f(x) for x in b]
 
Type of a is <class 'range'>; type of b is <class 'list'>
1 loop, best of 3: 11.5 s per loop
1 loop, best of 3: 11.1 s per loop
 

The results are similar for each type - it takes ~12s to do the calculation for all the items in the list or range. Intuitively, this seems pretty slow, but we would like to quantify exactly how much improvement is reasonably possible. Now let's do the same calculation using numpy.

In [4]:
a = np.arange(1, loops)
print(f'The type of a is {type(a)}')

%timeit r = 3 * np.log(a) + np.cos(a) ** 2
 
The type of a is <class 'numpy.ndarray'>
1 loop, best of 3: 1.05 s per loop
 

You can see that the execution time is much improved (by an order of magnitude) when using the numpy array type and the arange function (which is equivalent to range() when using integer arguments, but returns an array instead of a range object).
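Before trusting a speedup, it's worth confirming that both versions produce the same numbers. Here is a small sketch of that check, using a much smaller n than the timing runs (the value of n and the variable names are my own choices for illustration):

```python
from math import log, cos

import numpy as np

n = 1_000  # small on purpose; the timing runs above use 25,000,000

# Pure-Python version: one function evaluation per element.
loop_result = [3 * log(x) + cos(x) ** 2 for x in range(1, n)]

# Vectorized version: each numpy call operates on the whole array at once.
a = np.arange(1, n)
vector_result = 3 * np.log(a) + np.cos(a) ** 2

# allclose compares with a small floating-point tolerance.
print(np.allclose(loop_result, vector_result))
```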

Notice also that there is another advantage to using numpy besides just speed - our code is cleaner. Because numpy functions use the array type, a single call to a numpy function performs work that would require a loop or other structure if using a built-in type. In the example above, we only save a couple of lines, since f() is very simple. Let's look at an example where the code reduction (and the speedup) is more significant.

In [7]:
list_1, list_2 = [np.random.random() for i in range(10000)], [np.random.random() for i in range(10000)]
array_1, array_2 = np.asarray(list_1),np.asarray(list_2)

def dot(a,b):
    sum = 0
    for j in a:
        for k in b:
            sum += j*k

    return sum

%timeit r = dot(list_1,list_2)
print('\n')
%timeit r = np.dot(array_1,array_2)
 
1 loop, best of 3: 5 s per loop

The slowest run took 57.63 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 2.84 µs per loop
 

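Here we use numpy's dot function to calculate the dot product of two array vectors. In this case, the performance improvement is even more dramatic. Even the slowest run (~160µs) is tens of thousands of times faster than the best non-numpy run! One caveat: the nested loops in our dot() multiply every element of a by every element of b (10,000 x 10,000 = 100 million multiplications), which is not the same calculation as a dot product of matching elements (10,000 multiplications), so part of the gap comes from the extra work rather than from numpy alone.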
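For a like-for-like check, a pure-Python dot product pairs up matching elements with zip rather than nesting loops. Here is a minimal sketch; dot_pairs is a hypothetical name, and the arrays are smaller than the ones timed above:

```python
import numpy as np


def dot_pairs(a, b):
    """Pure-Python dot product: multiply matching elements and add them up."""
    total = 0.0
    for j, k in zip(a, b):
        total += j * k
    return total


rng = np.random.default_rng(0)
array_1, array_2 = rng.random(1000), rng.random(1000)
list_1, list_2 = array_1.tolist(), array_2.tolist()

# The zip-based loop agrees with np.dot (up to floating-point rounding).
print(np.isclose(dot_pairs(list_1, list_2), np.dot(array_1, array_2)))

# Nested loops sum every pairwise product, which equals sum(a) * sum(b) instead.
nested = sum(j * k for j in list_1[:50] for k in list_2[:50])
print(np.isclose(nested, sum(list_1[:50]) * sum(list_2[:50])))
```

Timing dot_pairs against np.dot with %timeit gives a fairer picture of what numpy itself buys you for this calculation.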

We also see a real advantage here in terms of making the code more compact and readable. Declaring our user-defined dot() function requires several lines, whereas we can do the same calculation in one line with np.dot(). When you have to perform numeric computation of this sort, always ask yourself if there is a numpy function that will take care of this for you. If you find yourself writing a lot of code to perform a numerical calculation, particularly involving nested loops, then you are almost certainly doing it the hard way. Using numpy's functions can often save many lines or tens of lines of code.

Making full use of numpy has numerous advantages:

  • saving development time
  • making your code execute faster
  • making your code more compact and readable
 

Faster numerical expression evaluation with numexpr

Another tool for speeding up execution time of numeric calculations is numexpr, a fast numerical expression evaluator. The full details of how numexpr achieves these improvements are beyond the scope of this article, but here's a quick overview. Expressions are compiled to byte code and executed on a virtual machine written in C, which uses vector registers to handle blocks of elements at a time for the most efficient execution.

In [8]:
import numexpr as ne

ne.set_num_threads(1)
f = '3 * log(a) + cos(a) ** 2'
%timeit r = ne.evaluate(f)
 
1 loop, best of 3: 502 ms per loop
 

We see that numexpr provides quite a performance improvement - evaluating the expression is about twice as fast with numexpr as with numpy (~.5s vs ~1s). Numexpr also makes use of threading to further optimize execution. Let's use four threads instead of just one this time.

In [9]:
ne.set_num_threads(4)
%timeit r = ne.evaluate(f)
 
1 loop, best of 3: 229 ms per loop
 

We have cut execution time in half yet again, down to around 200ms. From the original ~12s execution time, we have cut the time down by roughly a factor of 50 using numexpr with threading. Numpy and numexpr can be combined to achieve further speed improvements. For some more information on combining numpy and numexpr, see Numpy micro-optimization and numexpr.
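One detail that is easy to miss in the cells above: the expression string refers to a by name, and numexpr looks that name up in the calling scope by default. Here is a small sketch that passes the array explicitly through local_dict instead and checks the result against numpy. The guarded import is only there because numexpr is a separate install, and the array is much smaller than in the timing runs:

```python
import numpy as np

try:
    import numexpr as ne
except ImportError:
    ne = None  # numexpr is a separate install; skip the comparison without it

a = np.arange(1, 1_000)
expected = 3 * np.log(a) + np.cos(a) ** 2

if ne is not None:
    # local_dict maps the names used in the expression string to arrays.
    result = ne.evaluate("3 * log(a) + cos(a) ** 2", local_dict={"a": a})
    print(np.allclose(result, expected))
```

Passing the variables explicitly makes it clearer which arrays an expression depends on, especially inside functions.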

Notice that timeit dynamically determines the number of test runs based on execution time. For more info, see the timeit docs.
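Outside a notebook, the standard library's timeit module offers similar behavior. Here is a minimal sketch, with a much smaller input than the runs above; the names values, timer and best are my own choices for illustration:

```python
import timeit
from math import log, cos


def f(x):
    return 3 * log(x) + cos(x) ** 2


values = range(1, 10_000)  # far smaller than the article's 25,000,000

timer = timeit.Timer(lambda: [f(x) for x in values])

# autorange() increases the loop count until the total time reaches at least 0.2 s.
loops, total = timer.autorange()
print(f"{loops} loops, {total / loops * 1e3:.3f} ms per loop")

# repeat() runs the whole measurement several times; the minimum is the least noisy.
best = min(timer.repeat(repeat=3, number=loops)) / loops
print(f"best of 3: {best * 1e3:.3f} ms per loop")
```

The printed numbers depend on the machine, so treat them as relative measurements rather than absolute ones.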

We'll go into more detail on these topics in another article.

Finally, a note on the question of Python's purported slowness.

 

How slow (or not) is Python?

Python has a reputation for being 'slow'. Here are some helpful articles that shed some light on this issue:

In a shallow way this view is correct. If we consider only execution time, speed is not Python's strong suit in comparison to C/C++ and other languages. But when considering development time, speed becomes a big advantage for Python. I've been working on a project involving an old, mostly C++ codebase that is being converted to Python to try to bring it up to date. We're also trying to make some small improvements to the existing codebase in the short term. On several issues, I've spent hours or a couple of days doing something in C++ that could be done in minutes in Python.

I'd like to help make the point that Python doesn't have to be slow. Familiarizing yourself with some basic techniques can go a long way towards improving the performance of your Python applications. This article introduces a few of the ways to optimize numerical computation. As we have seen, some simple techniques can provide large reductions in execution time (an order of magnitude or more).

One common technique for improving application performance is to combine Python with C/C++. There are many, many tools and techniques for building hybrid applications, so we'll cover those in a separate article.

