Many file formats carry metadata alongside their visible content; a JPEG, for example, stores more than just the photo itself. This hidden information can be invaluable during OSINT investigations and penetration tests. Let’s look at some cases and tools useful for extracting it.
Not Cleaning Metadata from Photos
Gone are the days when well-known social media platforms left metadata intact in the photos users published on their profiles. GPS coordinates left in a photo can cause a serious privacy breach, for example by revealing a user’s home address. Although social media platforms now strip metadata from photos, many other kinds of applications still do not. It is worth checking for this during tests to protect users of the tested software from situations like the one that happened to a German TV journalist. She posted the following photo online:
Curious internet users downloaded the photo and checked its metadata. In addition to the usual information, it contained something more: when the frame above was cropped, the graphics program saved the original photo as a thumbnail showing the wider frame, which revealed far too much.
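To illustrate how such a leftover thumbnail can be found, here is a minimal sketch that looks for a second JPEG stream nested inside a JPEG file. It is a byte-level heuristic and not a real EXIF parser; the function name find_embedded_jpeg is made up for this example, and a dedicated tool such as ExifTool is the more reliable choice in practice.

```python
from typing import Optional

SOI = b"\xff\xd8\xff"  # JPEG start-of-image marker plus the first byte of the next marker
EOI = b"\xff\xd9"      # JPEG end-of-image marker


def find_embedded_jpeg(data: bytes) -> Optional[bytes]:
    """Return the first JPEG stream nested inside another JPEG, or None.

    Heuristic only: it assumes the thumbnail is stored as a plain JPEG
    stream and that no EOI byte pair occurs inside the thumbnail headers.
    """
    if not data.startswith(SOI):
        return None  # not a JPEG at all
    start = data.find(SOI, len(SOI))  # skip the outer image's own SOI
    if start == -1:
        return None  # no nested image found
    end = data.find(EOI, start)
    if end == -1:
        return None  # truncated thumbnail
    return data[start:end + len(EOI)]
```

If the function returns bytes, writing them to a file and opening it in an image viewer shows whether the thumbnail differs from the published frame.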
FOCA – Bulk Metadata Detection in Documents
Extracting GPS coordinates or hidden thumbnails from a photo is “only” a privacy breach. Document metadata, however, can contain information that significantly helps us during penetration tests. Manually searching for such files and reviewing them can take a lot of time. The FOCA (Fingerprinting Organizations with Collected Archives) tool automates this process. For a given domain, FOCA searches for documents such as Microsoft Office, OpenOffice, or PDF files and then extracts metadata from them.
The document metadata may contain the logins of the users who created the document. These logins can then be used in attempts to log into the tested application with those users’ accounts.
Other important data found there includes the name and version of the software used to create the file. The revealed software may turn out to be vulnerable to some kind of attack, in which case a ready-made exploit may be enough to infiltrate the system further.
Besides software information, documents sometimes also record the version of the operating system. This also makes penetration testing significantly easier, since we now know which system to compile our piece of malicious code for.
Another interesting thing that may be found in document metadata is information about the network location of printers. This can help an attacker map the network infrastructure and attack the printers themselves.
Tool available on GitHub – https://github.com/ElevenPaths/FOCA
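To make the fields described above more concrete, here is a minimal sketch that reads the author and software properties from a single Office Open XML file (.docx, .xlsx, .pptx). It is not how FOCA works internally; the function name read_ooxml_metadata and the choice of four fields are assumptions for illustration. Such files are ZIP archives whose properties usually live in docProps/core.xml and docProps/app.xml.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the document property parts.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "ep": "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties",
}

# Archive member -> {output label: element to look up}.
FIELDS = {
    "docProps/core.xml": {"creator": "dc:creator", "last_modified_by": "cp:lastModifiedBy"},
    "docProps/app.xml": {"application": "ep:Application", "app_version": "ep:AppVersion"},
}


def read_ooxml_metadata(document: bytes) -> dict:
    """Return the author and software properties found in an OOXML file."""
    found = {}
    with zipfile.ZipFile(io.BytesIO(document)) as archive:
        present = set(archive.namelist())
        for part, fields in FIELDS.items():
            if part not in present:
                continue  # property parts are optional
            # Note: for untrusted files, a hardened XML parser is a safer choice.
            root = ET.fromstring(archive.read(part))
            for label, path in fields.items():
                node = root.find(path, NS)
                if node is not None and node.text:
                    found[label] = node.text
    return found
```

Calling the function on the bytes of a downloaded document returns a dictionary with whichever of the four fields are present.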
ExifTool Scanner – Automation during Penetration Tests
The previously discussed tool works as a standalone document scanner that searches for what search engines have already indexed, so it will not help us during manual reconnaissance. After all, documents are often generated on demand and offered as a one-time download. Here, a Burp Suite plugin called ExifTool Scanner comes to the rescue. It passively monitors traffic for files containing metadata and extracts the useful information for us. In one audit, the tested application revealed the tool it used to generate PDFs dynamically and the version of the system it was running on.
Tool available in BApp Store and on GitHub – https://github.com/portswigger/exiftool-scanner
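The idea of a passive check can be sketched in a few lines. The example below is not the plugin’s code (the plugin relies on ExifTool itself); it is a simplified illustration that inspects a response body and pulls the Producer and Creator strings out of a PDF. The function name pdf_generator_hints is an assumption, and the regular expression only handles literal strings that are stored uncompressed, so hex-encoded or compressed metadata will be missed.

```python
import re

# Matches "/Producer (text)" or "/Creator (text)", allowing backslash escapes.
_INFO_STRING = re.compile(rb"/(Producer|Creator)\s*\(((?:\\.|[^\\)])*)\)")


def pdf_generator_hints(body: bytes) -> dict:
    """Return Producer/Creator strings found in a PDF response body."""
    if not body.startswith(b"%PDF-"):
        return {}  # not a PDF, nothing to report
    hints = {}
    for key, value in _INFO_STRING.findall(body):
        # Keep the first occurrence of each key.
        hints.setdefault(key.decode("ascii"), value.decode("latin-1"))
    return hints
```

A non-empty result is worth noting in the report, because it often names the PDF library and its version.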
Ironically – Removing Metadata Can Be Dangerous
In 2021, GitLab paid a $20,000 bounty for the discovery of a bug in which the process of removing metadata from an image led to remote code execution on the server. It was enough to place the following code in the image metadata:
(metadata
(Copyright "\
" . qx{echo vakzz >/tmp/vakzz} . \
" b ") )
The vulnerable version of ExifTool performs a check in line 31 of exiftool/lib/Image/ExifTool/DjVu.pm that is meant to neutralize content using $ (Perl variables) or @ (Perl arrays). The check exists because this content is then passed to the eval function [9] in line 34, which executes it as code. To reach the vulnerable function, an attacker must create a valid DjVu file with an annotation containing the payload; the payload slips past the check, and eval runs it as Perl code.
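The root problem described above is that untrusted text reaches eval. As a minimal sketch of the general alternative, and not a reproduction of ExifTool’s Perl code or of its actual patch, C-style escapes in a quoted string can be decoded with a fixed lookup table, so the input is only ever treated as data. The function name decode_c_escapes and the set of supported escapes are assumptions made for illustration.

```python
import re

# Fixed table of supported escapes; anything else is left untouched.
_ESCAPES = {
    "a": "\a", "b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t",
    '"': '"', "\\": "\\",
}


def decode_c_escapes(text: str) -> str:
    """Decode a small set of C escape sequences without evaluating the text."""
    return re.sub(
        r"\\(.)",
        # Unknown escapes are returned unchanged instead of being interpreted.
        lambda match: _ESCAPES.get(match.group(1), match.group(0)),
        text,
        flags=re.DOTALL,
    )
```

With this approach, a string shaped like the payload above is returned as ordinary text, because no part of the input is ever executed.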
For more about this vulnerability, see: A Case Study on CVE-2021-22204: ExifTool RCE.