Steganography is the art of hiding information in plain sight. Unlike crypto, which protects data with math so that only the intended reader can read it, stego hides data by making it look as though nothing is hidden at all. Steganalysis is the art of detecting and breaking steganography. In this article, we’ll focus on JPEG and PNG steganalysis specifically.
To demonstrate how this works concretely – imagine a young person wants to share a file that contains a graphic video game with their friend. However, their parents read all of their emails, and they don’t want their parents to see the game.
So the kid comes up with a strategy: what if they could take an innocuous file, like an image, and /hide/ the video game inside of it? This is not hypothetical. There are a variety of techniques for hiding files, secret messages, and encrypted text inside ordinary JPEG and PNG image files.
Moreover, these techniques appear at the heart of some of the most pivotal cultural moments in hacking history, as we’ll see later.
Let’s dive in and look at how JPEG and PNG steganalysis works, common ways that people hide content in images, and how you can detect and defeat these techniques.
Appending archives to JPEGs
The simplest way to hide a file in an image is to put it in a zip archive and append it to a JPEG. A single command is enough:
$ cat hacking_tools.zip >> cat_meme.jpg
The new cat_meme.jpg file looks normal when opened as an image. But if you rename it to .zip and open it, you can access the original files. The same trick works with .rar archives, which makes it easy to add password-based encryption.
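Why does this work? Zip readers locate an archive through its central directory, which sits at the end of the file, so bytes prepended to the archive are generally tolerated. Here is a minimal sketch that reproduces the `cat` trick with Python's standard `zipfile` module instead of the shell. The helper names `append_zip` and `read_hidden_zip` are hypothetical, not part of any tool mentioned in this article.

```python
import io
import zipfile


def append_zip(jpeg_bytes: bytes, files: dict[str, bytes]) -> bytes:
    """Build a zip archive in memory and append it to the JPEG bytes.

    Same idea as `cat hacking_tools.zip >> cat_meme.jpg`.
    """
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    return jpeg_bytes + archive.getvalue()


def read_hidden_zip(data: bytes) -> dict[str, bytes]:
    """Open the combined file as a zip and return every member's contents.

    zipfile finds the end-of-central-directory record at the end of the
    data and adjusts its offsets for the JPEG bytes that precede the archive.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return {name: zf.read(name) for name in zf.namelist()}
```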
The hacker tool Dangerous Kitten was famously shared by Anonymous on imageboards like 4chan.org/g/ and 711chan.org/i/ using precisely this technique. A set of hacking tools was appended to the following JPEG:
You can even see the original page where Anonymous hosted and documented the tool via the Wayback Machine: Dangerous Kitten – /i/nsurgency W/i/ki
Detecting this is as simple as checking whether any data appears after the image data ends. The JPEG file format uses the two-byte End of Image (EOI) marker 0xFFD9 to indicate where the image data ends, so we can simply look at what’s written beyond that marker. But how?
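A minimal sketch of that check in Python follows. Rather than searching for the first 0xFFD9, it walks the JPEG's marker segments, because an Exif thumbnail embedded in the APP1 segment carries its own 0xFFD9 marker. This is not how afterjpg is implemented (its source isn't shown here); the function names are hypothetical, and the sketch assumes a reasonably well-formed file.

```python
def jpeg_end_offset(data: bytes) -> int:
    """Return the offset just past the main image's EOI (0xFFD9) marker.

    Walks the marker segments instead of searching for the first 0xFFD9,
    because an Exif thumbnail inside APP1 has its own EOI marker.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker 0xFFD8")
    i, n = 2, len(data)
    while i + 1 < n:
        if data[i] != 0xFF:
            raise ValueError(f"expected a marker at offset {i}")
        marker = data[i + 1]
        if marker == 0xFF:  # fill byte preceding a marker
            i += 1
        elif marker == 0xD9:  # EOI: the image data ends here
            return i + 2
        elif marker == 0x01 or 0xD0 <= marker <= 0xD7:  # markers without a length
            i += 2
        else:
            # The 2-byte big-endian length counts itself but not the marker.
            i += 2 + int.from_bytes(data[i + 2:i + 4], "big")
            if marker == 0xDA:  # SOS: skip entropy-coded data up to the next marker
                # Inside scan data, 0xFF00 is a stuffed byte and 0xFFD0-0xFFD7
                # are restart markers, so neither ends the scan.
                while i + 1 < n and (
                    data[i] != 0xFF or data[i + 1] == 0x00 or 0xD0 <= data[i + 1] <= 0xD7
                ):
                    i += 1
    raise ValueError("no EOI marker found")


def appended_data(path: str) -> bytes:
    """Return whatever bytes follow the JPEG's EOI marker (empty if none)."""
    with open(path, "rb") as f:
        data = f.read()
    return data[jpeg_end_offset(data):]
```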
Using afterjpg to extract appended JPEG data
The afterjpg tool does precisely this. Observe:
$ afterjpg stego_cat.jpeg > secret.zip
$ unzip secret.zip
$ cat hi
this is content
Marvelous! This tool is easy to use for detecting appended data en masse, because it raises an exception (and therefore exits with a non-zero status) if no appended data is found:
$ afterjpg Downloads/girl.jpeg
Traceback (most recent call last):
  File "/Users/eliana/detect_appended_jpeg.py", line 26, in <module>
    raise FileNotFoundError('No data appended to JPEG!')
FileNotFoundError: No data appended to JPEG!
Thus, for example, we could scan an entire directory of JPEGs for appended data with a few lines of Bash:
for i in *jp*g; do
  afterjpg "$i" &> /dev/null
  if [ $? -eq 0 ]; then
    echo "$i has appended data"
  fi
done
Let’s move on to the next JPEG steganography technique we’ll want to detect.
Exif data in JPEG and PNG steganalysis
Exif data is a kind of metadata for images. JPEG supports Exif, and so does PNG through its eXIf chunk. Unbeknownst to many would-be photographers, Exif is often detrimental to privacy: it can include the make and model of the device the photo was taken with, or the GPS coordinates where the photo was taken. In fact, this is precisely how the hacker w0rmer was caught. He posted photos of his girlfriend on his Twitter page without scrubbing the Exif data, which enabled law enforcement to find his girlfriend and, subsequently, him.
Slightly more recently, and perhaps famously, John McAfee was arrested by Guatemalan authorities using Exif data in a selfie he took with a reporter from Vice. Here’s the original photo, including the Exif data:
Let’s read the Exif data and find the GPS coordinates where this photo was taken, the same way law enforcement did. We can do this in three ways:
- Command line apps like ExifTool
- Web applications like https://exifable.com/
- Desktop photo applications like the macOS Photos app
Any of these work, but since I’m on a Mac, let’s see what happens when we open the photo’s metadata via the Photos app.
Not only do we get the type of phone and the date, we get the GPS coordinates where the photo was taken! You can also easily remove Exif data from an image. However, scrubbing Exif from JPEGs is sometimes not as simple as deleting it, because Exif records how an image should be rotated before viewing.
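The same coordinates can be pulled out in a script. The sketch below uses the Pillow imaging library, which is an assumption (it is not one of the tools listed above), and it assumes a recent Pillow release that provides `Image.Exif.get_ifd`. It reads the GPS sub-IFD and converts the degrees/minutes/seconds values into the decimal degrees that map services expect. The helper names are hypothetical.

```python
GPS_IFD_TAG = 0x8825  # Exif pointer to the GPSInfo sub-IFD

# GPS tag numbers from the Exif specification
GPS_LATITUDE_REF, GPS_LATITUDE = 1, 2
GPS_LONGITUDE_REF, GPS_LONGITUDE = 3, 4


def dms_to_decimal(dms, ref) -> float:
    """Convert (degrees, minutes, seconds) plus an N/S/E/W ref to decimal degrees."""
    if isinstance(ref, bytes):
        ref = ref.decode("ascii", errors="ignore")
    degrees, minutes, seconds = (float(v) for v in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # Southern latitudes and western longitudes are negative
    return -value if ref.strip("\x00 ").upper() in ("S", "W") else value


def gps_coordinates(path: str):
    """Return (latitude, longitude) from a photo's Exif data, or None."""
    from PIL import Image  # Pillow; imported here so dms_to_decimal works without it

    with Image.open(path) as img:
        gps = img.getexif().get_ifd(GPS_IFD_TAG)
    if GPS_LATITUDE not in gps or GPS_LONGITUDE not in gps:
        return None  # no GPS tags, e.g. because they were scrubbed
    lat = dms_to_decimal(gps[GPS_LATITUDE], gps.get(GPS_LATITUDE_REF, "N"))
    lon = dms_to_decimal(gps[GPS_LONGITUDE], gps.get(GPS_LONGITUDE_REF, "E"))
    return lat, lon
```

On a photo with intact GPS tags, `gps_coordinates` returns a (latitude, longitude) pair you can paste into any map service.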
Scrubbing the Exif data
To scrub Exif data with a script, we’ll need to rotate the image accordingly, /then/ delete the Exif data. For example, S3 Exif Cleaner does this in the following code snippet:
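from io import BytesIO
from PIL import Image, ImageOps

# s3, args, obj and scrubbed_files_count come from the surrounding script
# Download image as raw bytes
img_bytes_exif = BytesIO()
s3.meta.client.download_fileobj(args.bucket, obj.key, img_bytes_exif)
img_bytes_exif.seek(0)  # rewind so Pillow reads from the start
# Transpose image per EXIF data, then scrub
img_bytes_NO_exif = BytesIO()
img_ops = Image.open(img_bytes_exif)
original_format = img_ops.format  # e.g. "JPEG"; the transposed copy may not carry it
# Exif data includes the image's orientation, without which the image
# may end up rotated incorrectly. So we rotate the image per the Exif
# data before scrubbing it.
img_ops = ImageOps.exif_transpose(img_ops)
# img_ops.save implicitly excludes Exif data unless you explicitly
# pass it back in via the exif parameter
img_ops.save(img_bytes_NO_exif, format=original_format)
img_bytes_NO_exif.seek(0)  # rewind before uploading
# Overwrite original image w/ scrubbed version
s3.meta.client.upload_fileobj(img_bytes_NO_exif, args.bucket, obj.key)
scrubbed_files_count += 1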
Exif data is well-known, easy to detect, and easy to remove, making it a suboptimal place to hide content. Nevertheless, this technique is commonplace in CTF-style games. Let’s start getting more sophisticated.
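Detection really is easy. In a JPEG, Exif lives in an APP1 segment whose payload begins with the identifier "Exif" followed by two zero bytes; in a PNG, it lives in an eXIf chunk. The hypothetical helpers below are a minimal sketch that only checks for presence and assumes well-formed files; to read the actual values, use ExifTool or a library.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk_types(data: bytes) -> list[str]:
    """List the chunk types in a PNG, stopping at IEND (CRCs are not checked)."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG")
    types, i = [], len(PNG_SIGNATURE)
    while i + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[i:i + 8])
        types.append(ctype.decode("latin-1"))
        i += 12 + length  # 4-byte length + 4-byte type + data + 4-byte CRC
        if ctype == b"IEND":
            break
    return types


def jpeg_has_exif(data: bytes) -> bool:
    """Look for an APP1 segment that carries the Exif identifier."""
    i = 2  # skip the SOI marker
    while i + 4 <= len(data) and data[i] == 0xFF:
        marker = data[i + 1]
        if marker in (0xD9, 0xDA):  # EOI or start of scan: metadata is over
            break
        length = int.from_bytes(data[i + 2:i + 4], "big")
        if marker == 0xE1 and data[i + 4:i + 10] == b"Exif\x00\x00":
            return True
        i += 2 + length  # marker (2 bytes) + segment, whose length counts itself
    return False


def has_exif(data: bytes) -> bool:
    if data.startswith(b"\xff\xd8"):
        return jpeg_has_exif(data)
    if data.startswith(PNG_SIGNATURE):
        return "eXIf" in png_chunk_types(data)
    raise ValueError("unsupported format: expected JPEG or PNG")
```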
Modifying the least significant bit for JPEG and PNG steganalysis
An image file typically contains a section of bytes that encode the color, brightness, and so on of each pixel in the image. One way to hide a message is to overwrite the very last bit of each of these bytes with the bits of our message. Because the last bit in a byte is the least significant one, changing it alters the byte's value by at most 1, so the image changes only imperceptibly.
So imagine we had this binary data:
We would decode this by looking only at the last bit in each byte, which gives us a new number: 01001101. This is the binary equivalent of the decimal number 77, which represents the letter M in ASCII.
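Here is that idea as a minimal Python sketch. It treats the image as a flat sequence of sample bytes, writes each message bit (most significant first) into the LSB of one sample, and reads the bits back. The function names are hypothetical, and unlike real tools, which typically also store the payload length, it assumes the reader already knows how long the message is.

```python
def embed_lsb(cover: bytes, message: bytes) -> bytearray:
    """Overwrite the LSB of each cover byte with the message bits, MSB first."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for this message")
    stego = bytearray(cover)
    for index, bit in enumerate(bits):
        stego[index] = (stego[index] & 0xFE) | bit  # clear the LSB, then set it
    return stego


def extract_lsb(stego: bytes, length: int) -> bytes:
    """Read `length` bytes back out of the LSBs, 8 cover bytes per message byte."""
    out = bytearray()
    for start in range(0, length * 8, 8):
        value = 0
        for sample in stego[start:start + 8]:
            value = (value << 1) | (sample & 1)
        out.append(value)
    return bytes(out)


# These eight bytes have LSBs 0,1,0,0,1,1,0,1 -> 0b01001101 == 77 == "M"
samples = bytes([0b10010110, 0b00110011, 0b11110000, 0b01010100,
                 0b10101011, 0b11001101, 0b00000110, 0b11111111])
print(extract_lsb(samples, 1))  # b'M'
```

To try this on a real picture, you could decode it to raw samples with Pillow's `Image.open(path).convert("RGB").tobytes()`, embed, rebuild the image with `Image.frombytes("RGB", size, bytes(stego))`, and save it as a PNG, since a lossy JPEG save would scramble the LSBs.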
Unlike our previous examples, this time we’ll use a PNG image. Although LSB (least significant bit) steganography works with JPEGs, the lossy nature of JPEG means that the least significant bit is often discarded during compression. An example of a tool that implements LSB stego in PNGs is LSB-Steganography.
Let’s try it out. We can even ask for a JPEG as output, and the tool will write a PNG for us instead.
$ python3 LSBSteg.py encode -i milady.png -o altered_milady.jpg -f secret
Output file changed to altered_milady.png
Let’s see the file it created and compare it to the original:
Looks pretty similar to me! However, if we look at the size of the files…
$ ls -lh *milady.png
-rw-r--r--@ 1 eliana staff 311K Oct 1 00:32 altered_milady.png
-rw-r--r--@ 1 eliana staff 297K Oct 1 00:30 milady.png
The altered file is about 14K larger, which is substantial considering that the file we hid in it is less than 30 bytes! Speaking of which, let’s decode it and access the secret file we hid inside.
Accessing the original secret
$ python3 LSBSteg.py decode -i altered_milady.png -o revealed_secret.txt
$ cat revealed_secret.txt
the secret is: i love you
Of course, given the risk of detection, steganographers should also encrypt such data before concealing it within the image.
So it works… but remember how the files were different sizes? That is one way to detect LSB stego: simply compare the stego image to the original. Of course, you need access to the original image. One way to acquire it is a reverse image search service like TinEye.
Searching for the altered image quickly gives us the original, which we can compare using ls -lh just like before.
Comparing file sizes is unwieldy, and it’s possible to change LSBs without changing the file size at all. To address this, you could compare hashes instead. Getting the hash of an image is easy:
$ sha1sum milady.png
$ sha1sum altered_milady.png
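The same digest can be computed with Python's hashlib. A hash only tells you that two files differ, though, and any change (even re-saved metadata) alters it. So this minimal sketch also compares decoded sample values and counts how many of the differences are confined to the least significant bit. The helper names are hypothetical.

```python
import hashlib


def sha1_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hex SHA-1 digest of a file, read in chunks (the value sha1sum prints)."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_samples(original: bytes, suspect: bytes) -> tuple[int, int]:
    """Return (differing samples, samples that differ only in their LSB)."""
    if len(original) != len(suspect):
        raise ValueError("images must have the same dimensions and mode")
    differing = lsb_only = 0
    for a, b in zip(original, suspect):
        if a != b:
            differing += 1
            if a ^ b == 1:  # XOR of exactly 1 means only bit 0 changed
                lsb_only += 1
    return differing, lsb_only
```

To compare two images, decode both to raw samples first, for example with Pillow's `Image.open(path).convert("RGB").tobytes()`, and pass the results to `compare_samples`. If nearly every difference is confined to the LSB, that is a strong hint of LSB embedding.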
Hash comparison has a variety of uses. For example, the film industry finds pirated files by searching for the hashes of known pirated copies, which is why pirates often alter files slightly before distributing them: even a tiny change produces a completely different hash. State agencies also maintain large databases of known image hashes to detect steganography.
Detecting LSB without access to the original image
What if you want to detect modified LSBs without access to the original image? In this case, there is an imperfect solution: statistical analysis of an image for abnormalities.
To show you what I mean, let’s look at two histograms: on the left is the stego image, and on the right, the original.
When it comes to JPEG and PNG steganalysis, you can use stats a few ways. Typically, we compare the LSBs to norms for similar kinds of images (portraits, landscapes, etc). If the data is unencrypted, it’s less random than normal. If it is encrypted, it’s likely more random than a typical PNG or JPEG.
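One crude way to put a number on "more random" or "less random" is to gather the LSBs into a byte stream and measure its Shannon entropy. ASCII text hidden in LSBs keeps the top bit of every character at 0, which pulls entropy down, while encrypted data should sit close to the 8-bit maximum. This is a rough sketch under assumptions: the hypothetical helpers read LSBs in sample order, the sample must be large enough for the measurement to mean anything, and what counts as normal has to come from baseline images of the same kind.

```python
import math
from collections import Counter


def lsb_bytes(samples: bytes) -> bytes:
    """Pack the LSB of every sample into bytes, MSB first (leftover bits dropped)."""
    out = bytearray()
    for start in range(0, len(samples) - 7, 8):
        value = 0
        for sample in samples[start:start + 8]:
            value = (value << 1) | (sample & 1)
        out.append(value)
    return bytes(out)


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, at most 8.0."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(n / total * math.log2(n / total) for n in Counter(data).values())
```

Because sequential embedders write from the first pixel onward, it can help to compare the entropy of the LSB stream near the start of the image with the rest of it.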
No premade tools do the hard work for you; often, you will implement algorithms from academic papers yourself. To make things even harder for steganalysts, free and open source stego tools like OutGuess are designed to resist detection: OutGuess hides data in the quantized DCT coefficients of a JPEG and then adjusts other coefficients so that their frequency counts stay close to the original's, which defeats statistical tests based on those counts. OutGuess was a crucial part of the famous Cicada 3301 challenge about a decade ago.
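As an illustration of the kind of algorithm you might lift from a paper, here is a sketch of the pairs-of-values chi-square test described by Westfeld and Pfitzmann, named here plainly since the article doesn't cover it. Overwriting LSBs with random-looking data tends to equalize the counts of each value pair (2k, 2k+1), so a suspiciously small chi-square statistic hints at LSB embedding. The function name and the cutoff of 5 expected samples for sparse pairs are assumptions of this sketch.

```python
def chi_square_pov(samples: bytes, min_expected: float = 5.0) -> tuple[float, int]:
    """Pairs-of-values chi-square statistic and its degrees of freedom.

    LSB replacement pushes the counts of each value pair (2k, 2k+1)
    toward equality, so a statistic that is small relative to the
    degrees of freedom hints at embedding. Pairs with too few samples
    are skipped, as is usual for chi-square tests.
    """
    histogram = [0] * 256
    for sample in samples:
        histogram[sample] += 1
    statistic, categories = 0.0, 0
    for k in range(128):
        observed = histogram[2 * k]
        expected = (histogram[2 * k] + histogram[2 * k + 1]) / 2
        if expected >= min_expected:
            statistic += (observed - expected) ** 2 / expected
            categories += 1
    return statistic, max(categories - 1, 0)
```

If SciPy is installed, `scipy.stats.chi2.sf(statistic, dof)` converts the statistic into the probability figure used in that attack, with values near 1 pointing to embedding. Because sequential embedders fill the image from the start, the test is usually run on growing prefixes of the samples.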
In theory, free tools like StegDetect apply a variety of techniques to detect hidden content. In practice, however, these tools lag behind their steganographic counterparts. Download OutGuess here: https://www.rbcafe.com/software/outguess/
I hope this gave you a taste of the world of JPEG and PNG steganalysis. Like steganography itself, steganalysis is more art than science. Nevertheless, starting with the techniques above will set you on a path to defeating basic stego strategies and understanding the kinds of methods used to attack more complex stego.