Hacking Jupyter notebooks is easy and fun, because a notebook runs without any protections by default. So much so that notebooks often appear in CTFs as easy targets for beginners.
Jupyter offers a web-based platform for coding in a bunch of languages. Python devs, especially data scientists, often rely on these notebooks for basic work tasks. Data scientists don’t have to think too much about security, unlike systems programmers and web devs. But since Jupyter notebooks are web-based, anyone on the network can access them. That’s where we come into the picture.
You can find official docs that show you how to install a Jupyter notebook on your system here: https://jupyter.org/install.
To hack a Jupyter notebook, you need to know enough about how it works to exploit a weakness that the notebook’s admin missed. So let’s look at how notebooks run, and try to find something we can exploit.
Network architecture for hacking Jupyter notebooks
In the most common scenario, devs run the Jupyter notebook locally. The web server that serves the notebook is not accessible from the global internet, or even (usually) the LAN.
However, this is not always the case. Sometimes, devs want to run code that will take a long time, and they don’t want to leave their laptop running all night. Even more commonly, the process needs computational resources that are not available on a personal computer.
In either case, the solution is to move the notebook to a cloud hosting provider. Google even offers Colab, a special, dedicated hosting system for Jupyter notebooks.
Forming an attack plan
When devs run a notebook locally, it’s not easy for us to get to it. We basically have to hack into their system some other way, just to get to the notebook, which partially defeats the purpose of attacking the notebook in the first place. But what about the second case? When a developer hosts their notebook on a VPS, for example, it’s available via the global internet. In other words, it’s exposed to attackers.
I propose the following method for hacking Jupyter notebooks:
- Search the internet for publicly accessible Jupyter notebooks
- Filter out password protected and non-root notebooks
- Launch a root shell and pwn the remaining notebooks
Launching our attack
Let’s organize our attack according to the three steps listed above.
A search engine for hacking Jupyter notebooks
We can use a Shodan Dork to quickly find all Jupyter notebooks without authentication: port:8888 title:”Home Page – Select or create a notebook”
In a real attack, we’d use the Shodan CLI to automatically open all of these. From there, we’d write a script to use the Jupyter web server’s API to automatically launch a shell, check our privileges, and so on. But for the purpose of demonstration, it will be simpler to just open them all in a bunch of tabs and see what happens.
When opening one of these public notebooks, you’ll see something like this.
We can open a shell by clicking New -> Terminal, which will give us a command line as the user who owns the notebook.
Who is jovyan?
The user jovyan is the default user that the Jupyter Docker images create. The idea is that Jovian means “related to Jove” (aka Jupiter). And since Jupyter is a play on Jupiter, Jovian becomes jovyan. Or consult this now-famous GitHub issue.
In other words, this notebook is not secure. But it’s also not a very valuable target, because it doesn’t give us root access to the machine. It’s just a Docker container.
Still, if there is anything valuable in the notebook itself, we could access that. And of course, an attacker can still abuse this to get free computing, i.e., mining crypto. A lot of these machines already run Monero mining rigs, which hackers install to monetize their access. For example, the one below.
This machine actually has two separate pieces of malicious software running. The peer2profit file auctions off this machine’s networking capacity, and the script al.sh installs a rig that uses its CPU (or GPU) power to mine Monero.
Anyway, a few of the machines from Shodan do give us root shells.
On this machine, we have total access. From here we could install malware, or even sell access to the machine on a darknet market. But for our purposes, it’s enough to show that we have root access.
What really makes this kind of hack worrying is that Jupyter is a popular choice for data science. Computers used for data science have two traits that make them attractive to hackers.
- Access to sensitive data (used for analysis, training ML models, etc).
- Powerful computing resources (high-end GPUs, plenty of memory, etc).
Companies spend so much money defending their web apps and infra. Yet data engineering lags woefully behind when it comes to security.
Preventing this kind of attack
The first defense against this kind of hack is setting up a password for your Jupyter notebook. It only requires you to run a single command:
➜ ~ jupyter server password
[JupyterPasswordApp] Wrote hashed password to /Users/maria/.jupyter/jupyter_server_config.json
➜ ~ jupyter-notebook
[I 2023-10-10 01:19:09.700 ServerApp] Jupyter Server 2.7.3 is running at: http://localhost:8888/tree
Now, when I load up the notebook, it will insist I supply a password.
Sadly, Jupyter does not ask the user to set up authentication when they run the notebook for the first time in the browser.
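Since Jupyter won’t nag you about it, you can check for yourself that a password hash actually landed in the config file. The sketch below is one way to do that, not an official tool. The function name has_password is hypothetical, and the key names are assumptions: newer Jupyter Server releases appear to store the hash under IdentityProvider.hashed_password and older ones under ServerApp.password, so verify against your own file.

```python
import json
from pathlib import Path

# Assumed locations of the password hash inside jupyter_server_config.json.
# Check your own config file; these key names may differ between versions.
PASSWORD_KEYS = (
    ("IdentityProvider", "hashed_password"),
    ("ServerApp", "password"),
)


def has_password(config_path):
    """Return True if the JSON config file holds a non-empty password hash."""
    path = Path(config_path)
    if not path.is_file():
        return False
    try:
        config = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        # A corrupt config file cannot be trusted to enforce a password.
        return False
    if not isinstance(config, dict):
        return False
    for section, key in PASSWORD_KEYS:
        values = config.get(section, {})
        if isinstance(values, dict) and values.get(key):
            return True
    return False
```

For the example above, you would call has_password("/Users/maria/.jupyter/jupyter_server_config.json"). A True result only tells you the hash is on disk; it’s still worth loading the notebook in a private browser window to confirm the login prompt appears.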
The next step is crucial: do not run your notebook as root. Just don’t! This can happen accidentally when people run sudo jupyter-notebook for example. Make sure you run the notebook as a normal system user. Ideally, create a special user to run the notebook, and don’t give this user any additional permissions on the system.
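Jupyter itself normally refuses to start as root unless you pass --allow-root, but that flag is easy to add without thinking, especially inside containers. If you launch the notebook from your own wrapper script, a small guard like the one below can catch the mistake early. This is a minimal sketch: the names running_as_root and guard are hypothetical, and it assumes a Unix-like system where root has effective user ID 0.

```python
import os
import sys


def running_as_root(euid=None):
    """Return True when the effective user ID is 0 (root on Unix-like systems)."""
    if euid is None:
        # os.geteuid only exists on Unix-like systems; treat other platforms
        # as non-root for the purposes of this sketch.
        euid = os.geteuid() if hasattr(os, "geteuid") else -1
    return euid == 0


def guard():
    """Return a shell-style exit code: 1 if running as root, 0 otherwise."""
    if running_as_root():
        print("Refusing to start Jupyter as root; use an unprivileged user.",
              file=sys.stderr)
        return 1
    return 0
```

Call guard() at the top of the launcher and bail out on a non-zero result before starting the server.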
To make this step easier, you can run a Docker image that will set up Jupyter for you, plus our favorite user: jovyan. You can find out how to set up Jupyter through Docker here: How to Run Jupyter Notebook on Docker.
Finally, if you host any important data in your notebook, make backups. Attackers may not be careful with your data as they rummage in your system, or may even intentionally encrypt and ransom it. Honestly, this is a good practice anyway, just to avoid losing your hard work if your hard drive dies!
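Backups don’t need to be fancy. As a rough sketch, the function below packs every .ipynb file under a directory into a timestamped tar.gz archive, skipping Jupyter’s checkpoint folders. The name backup_notebooks and the archive naming scheme are made up for illustration; adapt them to whatever backup tooling you already use.

```python
import tarfile
import time
from pathlib import Path


def backup_notebooks(source_dir, backup_dir):
    """Archive all .ipynb files under source_dir and return the archive path."""
    source = Path(source_dir)
    target = Path(backup_dir)
    target.mkdir(parents=True, exist_ok=True)

    # Timestamped name so repeated runs do not overwrite earlier backups
    # (unless two runs happen within the same second).
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = target / f"notebooks-{stamp}.tar.gz"

    # Skip autosave copies that Jupyter keeps in .ipynb_checkpoints folders.
    notebooks = [
        p for p in sorted(source.rglob("*.ipynb"))
        if ".ipynb_checkpoints" not in p.parts
    ]

    with tarfile.open(archive, "w:gz") as tar:
        for nb in notebooks:
            # Store paths relative to source_dir so the archive is portable.
            tar.add(nb, arcname=nb.relative_to(source).as_posix())
    return archive
```

An archive sitting on the same machine won’t survive ransomware or a dead drive, so copy it somewhere else afterwards.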