JPEG and PNG steganalysis - introduction to breaking image stego

Steganography is the art of hiding information in plain sight. Unlike crypto, which hides data by protecting it with math so only the intended reader can read it, stego hides data by making it appear as though you are not hiding anything at all. Steganalysis is the art of detecting and breaking steganography. In this article, we’ll focus on JPEG and PNG steganalysis specifically.

To demonstrate how this works concretely – imagine a young person wants to share a file that contains a graphic video game with their friend. However, their parents read all of their emails, and they don’t want their parents to see the game.

So the kid comes up with a strategy – what if he could take an innocuous file, like an image, and /hide/ the video game file inside of it? This is not hypothetical. There are a variety of techniques available for hiding files, secret messages, and encrypted text inside ordinary JPEG image files.

Moreover, these techniques appear at the heart of some of the most pivotal cultural moments in hacking history, as we’ll see later.

Let’s dive in and look at how JPEG and PNG steganalysis works, common ways that people hide content in images, and how you can detect and defeat these techniques.

Appending archives to JPEGs

The simplest way to hide a file in an image file is to put it in a zip archive and append it to a JPEG. Simply running the following commands is enough:

$ ls
cat_meme.jpg hacking_tools.zip
$ cat hacking_tools.zip >> cat_meme.jpg

The new cat_meme.jpg file appears normal when opened as an image. But if it is renamed with a .zip extension and opened, it gives you access to the archived files. The same trick works for .rar files, perfect for easy password encryption.
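The trick works because ZIP readers locate an archive through a record stored at the end of the file, while image viewers stop reading once the JPEG data ends. Here is a minimal Python sketch of both halves. It uses stand-in bytes rather than a real photo, and the member name and contents are made up for illustration:

```python
import io
import zipfile

# Stand-in for real JPEG bytes: SOI marker, filler, EOI marker.
# (Hypothetical data; the bytes of a real cat_meme.jpg would go here.)
fake_jpeg = b"\xff\xd8" + b"\x00" * 32 + b"\xff\xd9"

# Build a small ZIP archive in memory.
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w") as archive:
    archive.writestr("hi", "this is content\n")

# Equivalent of `cat hacking_tools.zip >> cat_meme.jpg`.
polyglot = fake_jpeg + zip_buffer.getvalue()

# The combined file still starts like a JPEG...
starts_like_jpeg = polyglot.startswith(b"\xff\xd8")

# ...and zipfile still opens it, because it looks for the
# end-of-central-directory record starting from the end of the file.
with zipfile.ZipFile(io.BytesIO(polyglot)) as archive:
    names = archive.namelist()
    content = archive.read("hi")

print(starts_like_jpeg, names, content)
```

Most archive utilities tolerate leading data in the same way, which is why simply renaming the file is usually enough.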

The hacker tool Dangerous Kitten was famously shared by Anonymous on imageboards like 4chan.org/g/ and 711chan.org/i/ using precisely this technique. A set of hacking tools was appended to the following JPEG:

You can even see the original page where Anonymous hosted and documented the tool via the Wayback Machine: Dangerous Kitten – /i/nsurgency W/i/ki

Detecting this is as simple as checking whether any data appears after the image data ends. The JPEG file format uses the two-byte marker 0xFFD9 (End of Image) to indicate where image data ends within a file. So we can simply see what’s written beyond that marker, but how to do so?

Using afterjpg to extract appended JPEG data

Luckily, the afterjpg tool does precisely this! Observe:

$ afterjpg stego_cat.jpeg > secret.zip
$ unzip secret.zip 
Archive:  secret.zip
 extracting: hi                      
$ cat hi
this is content

Marvelous! This tool is easy to use for detecting appended data en masse because it actually raises an exception if no appended data is found:

$ afterjpg Downloads/girl.jpeg 
Traceback (most recent call last):
  File "/Users/eliana/detect_appended_jpeg.py", line 26, in <module>
    raise FileNotFoundError('No data appended to JPEG!')
FileNotFoundError: No data appended to JPEG!

Thus, for example, we could scan an entire directory for JPEGs with appended data using a few lines of Bash:

for i in *jp*g; do
  if afterjpg "$i" &> /dev/null; then
    echo "$i"
  fi
done
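To make the underlying check concrete, here is a rough Python sketch of how the end of the image data might be located. It is not afterjpg's actual implementation; the function names are made up for this example. It walks the JPEG segments instead of searching for the first 0xFFD9, because those two bytes can also occur inside metadata such as an embedded Exif thumbnail:

```python
def _skip_scan(data: bytes, pos: int) -> int:
    """Skip entropy-coded scan data; return the offset of the next marker."""
    while True:
        pos = data.find(b"\xff", pos)
        if pos == -1 or pos + 1 >= len(data):
            raise ValueError("scan data ended without a marker")
        nxt = data[pos + 1]
        if nxt == 0x00 or 0xD0 <= nxt <= 0xD7:
            pos += 2   # stuffed 0xFF00 or restart marker: still scan data
        elif nxt == 0xFF:
            pos += 1   # fill byte
        else:
            return pos


def jpeg_end_offset(data: bytes) -> int:
    """Return the offset just past the End of Image (EOI) marker."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("missing SOI marker; not a JPEG")
    pos = 2
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:       # fill byte before a marker
            pos += 1
            continue
        pos += 2
        if marker == 0xD9:       # EOI: image data ends here
            return pos
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:
            continue             # standalone markers carry no length field
        # Every other segment starts with a 2-byte big-endian length
        # that includes the length field itself.
        pos += int.from_bytes(data[pos:pos + 2], "big")
        if marker == 0xDA:       # SOS: compressed image data follows
            pos = _skip_scan(data, pos)
    raise ValueError("no EOI marker found")


def appended_data(data: bytes) -> bytes:
    """Return whatever bytes follow the JPEG image data (possibly empty)."""
    return data[jpeg_end_offset(data):]
```

Calling appended_data on the bytes of a suspicious file and checking whether the result is non-empty gives the same yes-or-no answer as the Bash loop. This sketch assumes a well-formed file and has not been hardened against malformed input.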

Let’s move on to the next JPEG steganography technique we’ll want to detect.

Exif data in JPEG and PNG steganalysis

Exif data is a kind of metadata for images. JPEGs support Exif. Unbeknownst to many would-be photographers, Exif is often detrimental to privacy: it can include the make and model of the device the photo was taken with, or the GPS coordinates where the photo was taken. In fact, this is precisely how the hacker w0rmer was caught. He posted photos of his girlfriend on his Twitter page without scrubbing the Exif data, which enabled law enforcement to find his girlfriend, and subsequently, him.

Slightly more recently, and perhaps famously, John McAfee was arrested by Guatemalan authorities after Exif data in a photo of him with a reporter from Vice gave away his location. Here’s the original photo, including the Exif data:

Let’s read the Exif data and find the GPS coordinates where this photo was taken, the same way law enforcement did. We can do this in three ways:

Command line apps like ExifTool
Web applications like https://exifable.com/
Desktop photo applications like the macOS Photos app

Any of these work, but since I’m on a Mac, let’s see what happens when we open the photo’s metadata via the Photos app.

Not only do we get the type of phone and date, we get the GPS coordinates where the photo was taken! You can also easily remove Exif data from an image. However, scrubbing Exif from JPEGs is sometimes not as simple as deleting it, because Exif contains data about how an image should be rotated before viewing.
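A note on the GPS coordinates mentioned above: Exif stores latitude and longitude as degrees, minutes, and seconds plus a hemisphere reference (N, S, E, or W), so tools convert them to decimal degrees before showing them on a map. A small sketch of that conversion follows. The function name and the sample values are made up for illustration and are not taken from the photo discussed here:

```python
from fractions import Fraction


def dms_to_decimal(dms, ref):
    """Convert (degrees, minutes, seconds) plus N/S/E/W to decimal degrees."""
    degrees, minutes, seconds = (Fraction(value) for value in dms)
    decimal = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative by convention.
    if ref in ("S", "W"):
        decimal = -decimal
    return float(decimal)


# Hypothetical coordinates, purely for demonstration.
latitude = dms_to_decimal((40, 26, 46), "N")
longitude = dms_to_decimal((79, 58, 56), "W")
print(round(latitude, 6), round(longitude, 6))
```

The resulting pair of decimal numbers can be pasted straight into most mapping services.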

Scrubbing the Exif data

To scrub Exif data with a script, we’ll need to rotate the image accordingly, /then/ delete the Exif data. For example, S3 Exif Cleaner does this in the following code snippet:

            # Download image as raw bytes
            img_bytes_exif = BytesIO()
            img_bytes_exif.seek(0)
            bucket.download_fileobj(obj.key, img_bytes_exif)

            # Transpose image per EXIF data, then scrub
            img_bytes_NO_exif = BytesIO()
            img_ops = Image.open(img_bytes_exif)
            # exif data includes orientation image
            # without which the image may end up rotated
            # incorrectly. So we rotate the image per
            # the exif data before scrubbing it.
            img_ops = ImageOps.exif_transpose(img_ops)
            # img_ops.save implicitly excludes EXIF data
            # unless you explicitly tell it not to via the
            # exif parameter
            img_ops.save(img_bytes_NO_exif, format='jpeg')

            # Overwrite original image w/ scrubbed version
            img_bytes_NO_exif.seek(0)
            s3.meta.client.upload_fileobj(img_bytes_NO_exif, args.bucket, obj.key)
            scrubbed_files_count += 1

source: https://github.com/seisvelas/S3-Exif-Cleaner/blob/main/s3_cleanse.py

Exif data is well-known, easy to detect, and easy to remove, making it a suboptimal place to hide content. Nevertheless, this technique is commonplace in CTF-style games. Let’s start getting more sophisticated.

Modifying the least significant bit for JPEG and PNG steganalysis

An image file will typically contain a section that uses a series of bytes to represent the color, darkness, and so on of the grid of pixels that make up the image. One way to hide a message is to use the very last bit in each byte, and spell our message with these bits. Since the last bit in a byte is the least significant one, the image changes only ever so slightly.

So imagine we had this binary data:

10001000
10100001
11011000
10111000
01001001
11001011
11011000
11001011

We would decode this by looking only at the last bit in each byte, which gives us a new number: 01001101. This is the binary equivalent of the decimal number 77, which represents the letter M in ASCII.
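The same decoding can be sketched in a few lines of Python. This is only an illustration of the idea on raw bytes, not how any particular tool lays out its payload. The function names are made up for this example:

```python
def extract_lsb_bytes(carrier: bytes) -> bytes:
    """Collect the last bit of each carrier byte and pack them into bytes."""
    bits = [byte & 1 for byte in carrier]
    out = bytearray()
    # Only whole groups of 8 bits form a byte; ignore any leftover bits.
    for start in range(0, len(bits) - len(bits) % 8, 8):
        value = 0
        for bit in bits[start:start + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


def embed_lsb(carrier: bytes, message: bytes) -> bytes:
    """Overwrite the last bit of each carrier byte with the message bits."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(carrier):
        raise ValueError("carrier is too small for this message")
    out = bytearray(carrier)
    for index, bit in enumerate(bits):
        out[index] = (out[index] & 0xFE) | bit
    return bytes(out)


# The eight bytes from the example above.
carrier = bytes([
    0b10001000, 0b10100001, 0b11011000, 0b10111000,
    0b01001001, 0b11001011, 0b11011000, 0b11001011,
])
print(extract_lsb_bytes(carrier))  # b'M'
```

Note that embed_lsb changes each carrier byte by at most 1, which is why the visual difference is so hard to spot.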

Unlike our previous examples, this time we’ll use a PNG image. Although LSB (least significant bit) steganography can be adapted to JPEGs, JPEG compression is lossy, so the least significant bits of pixel values are often discarded when the image is saved. PNG is lossless, so the bits survive. An example of a tool that implements LSB stego in PNGs is LSB-Steganography.

Let’s try it out. Even if we ask for a .jpg output file, the tool will write a PNG for us.

$ python3 LSBSteg.py encode -i milady.png -o altered_milady.jpg -f secret
Output file changed to  altered_milady.png

Let’s see the file it created and compare it to the original:

Looks pretty similar to me! However, if we look at the size of the files…

$ ls -lh *milady.png
-rw-r--r--@ 1 eliana  staff   311K Oct  1 00:32 altered_milady.png
-rw-r--r--@ 1 eliana  staff   297K Oct  1 00:30 milady.png

We see that the altered file is substantially larger (well, considering that the file we’ve hidden in it is less than 30 bytes)! Speaking of which, let’s decode it and access the secret file we hid inside.

Accessing the original secret

$ python3 LSBSteg.py decode -i altered_milady.png -o revealed_secret.txt
$ cat revealed_secret.txt 
the secret is: i love you

Of course, given the risk of detection, steganographers should also encrypt such data before concealing it within the image.

So it works… but remember how the files were different sizes? Comparing against the original is an effective way to detect LSB steganography: simply compare the stego image to the original. Of course, you need access to the original image. One way to acquire the original is to use a reverse image lookup service like TinEye.

Searching the altered image quickly gives us the original, which we can compare using ls -lh just like before.

Comparing file sizes is unwieldy. Also, it’s possible to change LSBs without changing the file size at all. To address this, you could compare hashes. Getting a hash of an image is easy.

$ sha1sum milady.png
280bfa02108a01976da228d0ff316cc40867b696  milady.png
$ sha1sum altered_milady.png
c4c1286632c6556e75508675193fc185f5245a3d  altered_milady.png

This has a variety of use cases. For example, the film industry finds pirated files using hashes. They search for hashes of known pirated copies. Thus, pirates often alter the files slightly before distributing them. This creates a unique hash. State agencies also have big databases of known image hashes to detect steganography.
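Returning to the point about size versus hash: the following sketch shows that flipping a single least significant bit leaves the length untouched but produces a completely different SHA-1 digest. The byte strings are stand-ins for real image data, chosen only for illustration:

```python
import hashlib

# Stand-in for the raw bytes of an original image (hypothetical data).
original = bytes(range(256)) * 4

# Flip the least significant bit of one byte, as LSB embedding would.
altered = bytearray(original)
altered[10] ^= 1
altered = bytes(altered)

same_size = len(original) == len(altered)
original_digest = hashlib.sha1(original).hexdigest()
altered_digest = hashlib.sha1(altered).hexdigest()

print("same size:", same_size)
print("original:", original_digest)
print("altered: ", altered_digest)
```

Keep in mind that a differing hash only tells you the two files are not identical. Re-saving or re-compressing an image changes its hash too, so a mismatch is a reason to look closer rather than proof of hidden data.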

Detecting LSB without access to the original image

What if you want to detect modified LSBs without access to the original image? In this case, there is an imperfect solution: statistical analysis of an image for abnormalities.

To show you what I mean, let’s look at two histograms. On the left is the stego image; on the right, the original.

When it comes to JPEG and PNG steganalysis, you can use stats in a few ways. Typically, we compare the LSBs to norms for similar kinds of images (portraits, landscapes, etc). If the hidden data is unencrypted, the LSBs are less random than normal. If it is encrypted, they are likely more random than in a typical PNG or JPEG.
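As one concrete example of this kind of statistic, here is a simplified sketch in the spirit of the chi-square "pairs of values" test described by Westfeld and Pfitzmann. Overwriting LSBs with random-looking data tends to equalize how often each even value 2k and its odd neighbor 2k+1 occur, so an unusually low score is suspicious. The cover data below is synthetic and deliberately extreme. A real detector would work on actual pixel data and convert the score into a probability:

```python
import random
from collections import Counter


def pairs_of_values_statistic(samples: bytes) -> float:
    """Chi-square style score: low when each (2k, 2k+1) pair is balanced."""
    counts = Counter(samples)
    statistic = 0.0
    for even in range(0, 256, 2):
        n_even = counts[even]
        n_odd = counts[even + 1]
        # Under full LSB embedding, both values of a pair are expected
        # to occur about equally often.
        expected = (n_even + n_odd) / 2
        if expected > 0:
            statistic += (n_even - expected) ** 2 / expected
    return statistic


# Synthetic cover: only even values, so every pair is badly unbalanced.
cover = bytes(2 * (i % 50) for i in range(5000))

# Simulated stego: every LSB replaced by a pseudo-random bit.
rng = random.Random(0)
stego = bytes((b & 0xFE) | rng.getrandbits(1) for b in cover)

cover_score = pairs_of_values_statistic(cover)
stego_score = pairs_of_values_statistic(stego)
print(cover_score, stego_score)
```

On this toy data the cover scores far higher than the stego version. Real images are much less clear cut, which is why the approach remains an imperfect solution.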

Few premade tools do the hard work for you. Often, you will implement algorithms from academic papers. To make things even harder for steganalysts, free and open source stego tools like OutGuess are difficult to detect. They use complex mathematics to select and modify bytes in ways that are very difficult to detect with conventional statistical analysis. OutGuess was a crucial part of the famous Cicada 3301 challenge about a decade ago.

In theory, free tools like StegDetect apply a variety of techniques to detect hidden content. In practice, however, these tools lag behind their steganographic counterparts. Download OutGuess here: https://www.rbcafe.com/software/outguess/

I hope this gave you a taste of the world of JPEG and PNG steganalysis. Like steganography itself, steganalysis is more art than science. Nevertheless, starting with the techniques above will set you on a path to defeat basic stego strategies and understand the kinds of methods used for attacking more complex stego.
