Sunday, October 11, 2015

Tutorial: Metadata Analysis (ING)

Most image files do not just contain a picture. They also contain information (metadata) about the picture. Metadata provides information about a picture's pedigree, including the type of camera used, color space information, and application notes.

•Finding Metadata

•Extracting Metadata

•Metadata Types (File, EXIF, Maker Notes, IPTC, ICC Profiles, XMP, PrintIM, and other records)

•Advanced Analysis

•Caveats

Finding Metadata

Different picture formats include different types of metadata. Some formats, like BMP, PPM, and PBM contain very little information beyond the image dimensions and color space. In contrast, a JPEG from a camera usually contains a wide variety of information, including the camera's make and model, focal and aperture information, and timestamps.

PNG files typically contain very little information... unless the image was converted from a JPEG or edited with Photoshop. Converted PNG files may include metadata from the source file format.

Extracting Metadata

Viewing metadata requires extracting the information from the file. There are plenty of open source, free, and commercial solutions available. Some of these only support one file type (e.g., JPEG-only), while others support many file formats. In addition, different programs may support different types of metadata.

A few examples of available image metadata tools:

•Exiv2 is an open source tool that decodes Exif, IPTC, and XMP metadata. (See Metadata Types for a description of these formats.) This command-line utility is provided as an executable for Windows, or source code for Linux and Mac.

•ExifTool by Phil Harvey is one of the most powerful command-line metadata extraction tools available. It supports hundreds of different file and metadata formats -- including many that are manufacturer-specific. The entire tool is well-documented and written in Perl; it works very well under Linux and Mac, but could be difficult for Windows users who do not have Perl installed.

•Adobe Photoshop is a commercial application that includes an XMP viewer. In Photoshop CS5, it is under File → File Info. While not as powerful or as complete as Exiv2 and ExifTool, Adobe's viewer does provide the ability to decode XMP, IPTC, Exif, and other types of metadata in a graphical interface.

•Preview Inspector. The default Apple Mac OS X picture viewer is called Preview. Preview contains an 'Inspector' to view metadata. This tool displays a small fraction of the available metadata and can provide misleading analysis results. Do not use Preview Inspector for any official work.

•Microsoft Windows Photo Viewer. The default photo viewer under Windows 7 and Windows 8 contains a 'properties' option that lists metadata. However, this tool displays fictional metadata fields that do not exist in the file, omits most fields that do exist, rewrites some metadata values, and renames some of the fields that it displays. Do not use Windows Photo Viewer for any metadata analysis.

There are also plenty of online web sites where you can upload a picture to see the metadata. Virtually all of them use ExifTool or Exiv2 as the back-end data extractor.

FotoForensics uses ExifTool for metadata extraction. In some situations, the ExifTool results are augmented with additional information identified by tools provided by Hacker Factor.
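As a rough illustration of how a script might drive ExifTool as a back end, the Python sketch below asks ExifTool for JSON output (-j) with group names (-G) and then sorts the tags by metadata block. The function names run_exiftool and group_tags are invented for this example, the code assumes the exiftool command is installed and on the PATH, and it is not a description of how FotoForensics itself works.

```python
import json
import subprocess
from collections import defaultdict


def run_exiftool(path):
    """Run ExifTool on one file and return its JSON text.

    Assumes the `exiftool` executable is installed and on the PATH.
    -j requests JSON output; -G prefixes each tag with its group name.
    """
    result = subprocess.run(
        ["exiftool", "-j", "-G", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def group_tags(exiftool_json):
    """Turn ExifTool's JSON (a list with one object per file) into
    {group: {tag: value}} for the first file."""
    record = json.loads(exiftool_json)[0]
    groups = defaultdict(dict)
    for key, value in record.items():
        # With -G, keys look like "EXIF:Make"; keys without a colon
        # (such as "SourceFile") are filed under "Other".
        group, _, tag = key.rpartition(":")
        groups[group or "Other"][tag] = value
    return dict(groups)
```

Calling group_tags(run_exiftool("photo.jpg")) would then give one dictionary per block (File, EXIF, MakerNotes, and so on), which makes it easy to see at a glance which types of metadata a file carries.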

Metadata Types

There are many different types of metadata. Some types are only generated by cameras. Other types are created by specific applications. And a few types of metadata can come from anywhere. The most common types of metadata you are likely to encounter include File, EXIF, Maker Notes, IPTC, ICC Profiles, XMP, PrintIM. However, there are many other types of metadata records.

Common Metadata Blocks and Likely Sources

From Cameras: MakerNotes, PrintIM

From Applications: ICC Profile, IPTC, Photoshop, XMP

From Either: File, EXIF, JFIF, APP14

File

The file metadata describes the image itself. This includes the type of image (e.g., JPEG or PNG), internal formats, dimensions, and colors. If there is a comment in the file's header, then it is included here. Most digital cameras do not include comments, so the presence of a comment likely indicates that an application processed the image.
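To make the idea of header-level records concrete, here is a minimal Python sketch that walks the segments at the start of a JPEG and labels the ones it recognizes, including any comment. It is an illustration only: the function name list_jpeg_blocks is hypothetical, the signature table covers just a handful of well-known block prefixes, and the walker stops at the start of the image scan rather than parsing the whole file.

```python
import struct

# Identifying prefixes commonly found at the start of APPn payloads.
SIGNATURES = [
    (0xE0, b"JFIF\x00", "JFIF"),
    (0xE1, b"Exif\x00\x00", "EXIF"),
    (0xE1, b"http://ns.adobe.com/xap/1.0/\x00", "XMP"),
    (0xE2, b"ICC_PROFILE\x00", "ICC Profile"),
    (0xED, b"Photoshop 3.0\x00", "Photoshop (may hold IPTC)"),
    (0xEE, b"Adobe", "APP14 (Adobe)"),
]


def list_jpeg_blocks(data):
    """Return (label, payload) for each segment before the image scan."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    blocks = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = data[pos + 1]
        if marker == 0xFF:      # fill byte before the real marker
            pos += 1
            continue
        if marker == 0xDA:      # Start of Scan: compressed image data follows
            break
        # The 2-byte big-endian length counts itself but not the marker.
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:
            raise ValueError("bad segment length at offset %d" % pos)
        payload = data[pos + 4:pos + 2 + length]
        label = "marker 0x%02X" % marker
        if marker == 0xFE:
            label = "Comment"
        for sig_marker, prefix, name in SIGNATURES:
            if marker == sig_marker and payload.startswith(prefix):
                label = name
        blocks.append((label, payload))
        pos += 2 + length
    return blocks
```

Reading a file with open(path, "rb").read() and passing the bytes to list_jpeg_blocks gives a quick inventory of the blocks present; a "Comment" entry is the kind of record that hints at application processing.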

EXIF

The Exchangeable Image File format (commonly called Exif or EXIF) is typically used by camera manufacturers to identify information about the camera's settings used for the photo. It typically includes timestamps, camera make/model, lens settings, and more.

For example, this bookshelf picture contains the following EXIF data:

EXIF

Make Hewlett-Packard

Camera Model Name HP PhotoSmart 618 (V1.10)

Orientation Horizontal (normal)

X Resolution 72

Y Resolution 72

Resolution Unit inches

Y Cb Cr Positioning Centered

Exposure Time 1/60

F Number 2.7

ISO 100

Exif Version 0210

Date/Time Original 2007:05:28 12:56:08

Components Configuration Y, Cb, Cr, -

Compressed Bits Per Pixel 1.6

Shutter Speed Value 1/64

Aperture Value 2.8

Exposure Compensation 0

Max Aperture Value 4.0

Subject Distance 0.72 m

Metering Mode Multi-segment

Light Source Unknown

Flash Fired

Focal Length 7.7 mm

Flashpix Version 0100

Color Space sRGB

Exif Image Width 800

Exif Image Height 600

Image Width 96

Image Height 72

Bits Per Sample 8 8 8

Compression Uncompressed

Photometric Interpretation RGB

Strip Offsets 1750

Samples Per Pixel 3

Rows Per Strip 72

Strip Byte Counts 20736

Planar Configuration Chunky

This data identifies the camera's make (Hewlett-Packard) and model (HP PhotoSmart 618). The photo was taken on 28-May-2007. And the subject (the bookshelf) was about 0.72 meters in front of the camera (about 2 feet 4 inches away).

This metadata also gives information about the image itself. For example, it says it uses the standard RGB (sRGB) color space and should be 800x600. When a picture is cropped or scaled, the metadata may not be updated. If the picture is not 800x600, then at least the dimensions of the prior source are known.
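The comparison described above can be sketched in a few lines of Python. The function name check_dimensions is made up for this example, the dictionary keys simply mirror the field names in the listing above, and the actual width and height are assumed to come from whatever program decoded the image.

```python
def check_dimensions(exif, actual_width, actual_height):
    """Compare the Exif-recorded size with the decoded image size."""
    recorded = (exif.get("Exif Image Width"), exif.get("Exif Image Height"))
    if None in recorded:
        return "no Exif dimensions recorded"
    if recorded == (actual_width, actual_height):
        return "dimensions match the Exif record"
    # A mismatch suggests a crop or resize that left the metadata untouched.
    return "image is %dx%d but Exif records %dx%d" % (
        actual_width, actual_height, recorded[0], recorded[1])
```

A mismatch is a clue rather than proof: it only says that the current pixels differ in size from what the metadata describes.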

Some cameras, such as those in smartphones, may also include GPS information as a subset of the Exif record. Although this is rare to come across on the web, if it exists then it will be decoded in the Exif block. (ExifTool will also decode this under a "Composite" heading.)
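Exif stores each GPS coordinate as three rational numbers (degrees, minutes, seconds) plus a reference letter (N, S, E, or W). The following Python sketch shows the usual conversion to signed decimal degrees. The function name gps_to_decimal is hypothetical, and the input is assumed to be already extracted as (numerator, denominator) pairs.

```python
from fractions import Fraction


def gps_to_decimal(dms, ref):
    """Convert ((deg_n, deg_d), (min_n, min_d), (sec_n, sec_d)) and a
    reference letter into signed decimal degrees."""
    degrees, minutes, seconds = (Fraction(n, d) for n, d in dms)
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative by convention.
    if ref in ("S", "W"):
        value = -value
    return float(value)
```

For example, 40 degrees, 26 minutes, 46 seconds north works out to roughly 40.4461 decimal degrees.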

Maker Notes

Besides the standardized Exif information, cameras may also include manufacturer-specific extensions. In general, pictures that do not originate from a camera will not include Maker Notes.

IPTC

The International Press Telecommunications Council (IPTC) standardized the metadata format used for recording information related to press images. Typically this includes the character set (usually UTF-8) and a version number. However, pictures intended for the mass media, such as those provided by Reuters and Getty Images, will usually include attributions such as the photograph's byline, description, location, and much more.

Most digital cameras do not generate IPTC information. Moreover, few cameras offer a means to enter the photographer's name, photo description, and other details. (The few cameras that do support it make the process so difficult that virtually nobody uses this in-camera functionality.)

The presence of IPTC information, particularly with detailed text fields, indicates that the file was modified. At minimum, IPTC information was added by software after the photo was created. While this modification does not indicate that the picture was edited or modified, it does indicate that the file, as a whole, is not a straight-from-the-camera original.

ICC Profile

The International Color Consortium defined a color-space transformation system using a set of ICC Profiles. These are used to ensure that colors display the way they were intended.

Although the color "red" has a specific RGB value (255,0,0), it may display differently on different monitors. This is most apparent at TV stores, when they have a wall of television sets all showing the same thing, but some TVs look brighter than others. The same problem occurs when printing; red on the monitor may not look the same as the red produced by an inkjet printer.

Color profiles are used to convert between the raw RGB values and the intended color tone. Technically, two ICC Profiles are required. The first one converts the raw color to a common colorspace, such as XYZ or L*a*b*. The second profile converts from the common space to the display device (monitor or printer). When a picture contains an ICC Profile, it includes the first half of this transformation: converting from the raw color values to the common colorspace. When rendered for the screen or printing, a second color profile is applied -- the one for your monitor or printer. (Of course, this all assumes that you are using software that supports color profiles, which is usually not the case. Usually color profiles are just ignored metadata.)
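The first half of that transformation can be sketched in Python using the Red, Green, and Blue Matrix Column values from the sample ICC_Profile listing that appears later in this section. This is a simplified illustration: the function name linear_rgb_to_xyz is made up, and the input is assumed to be already linearized (a real color engine would first apply the profile's tone reproduction curves, which are skipped here).

```python
# Matrix columns copied from the sample ICC_Profile listing in this section.
RED = (0.45427, 0.24263, 0.01482)
GREEN = (0.35332, 0.67441, 0.09042)
BLUE = (0.15662, 0.08336, 0.71953)


def linear_rgb_to_xyz(r, g, b):
    """Map linear RGB values in [0, 1] to XYZ in the connection space.

    Each output component is a weighted sum of the three inputs, with the
    weights taken from the matching entry of each matrix column.
    """
    return tuple(
        r * rc + g * gc + b * bc
        for rc, gc, bc in zip(RED, GREEN, BLUE)
    )
```

Feeding in full white, linear_rgb_to_xyz(1.0, 1.0, 1.0), lands very close to the listing's Connection Space Illuminant (0.9642 1 0.82491), which is a handy sanity check that the columns were read correctly.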

With the exception of a few high-end cameras, cameras do not generate ICC Profiles. ICC Profiles are added by applications as the file is edited or converted. Many applications -- including Adobe Photoshop -- will default to adding a color profile to the picture.

Most profiles are generic and not hardware specific. For example, "Adobe RGB (1998)" is the Adobe standard and implies that an Adobe software package was used, while "IEC 61966" (also called "sRGB") is the standard color profile used by everyone else. (IEC is the International Electrotechnical Commission, a standards organization.)

Besides colorspace information, the ICC Profile will include the primary platform. This typically represents the system used to add in the profile. If you see "Apple Computer", then the image was likely edited on a Mac. If you see "Microsoft", then it likely came from a Windows PC. However, do not assume that "Hewlett-Packard" means an HP-computer was used; many platforms, including non-HP systems, use the HP color profiles.

ICC_Profile

Profile CMM Type appl

Profile Version 2.2.0

Profile Class Input Device Profile

Color Space Data RGB

Profile Connection Space XYZ

Profile Date Time 2003:07:01 00:00:00

Profile File Signature acsp

Primary Platform Apple Computer Inc.

CMM Flags Not Embedded, Independent

Device Manufacturer appl

Device Model

Device Attributes Reflective, Glossy, Positive, Color

Rendering Intent Perceptual

Connection Space Illuminant 0.9642 1 0.82491

Profile Creator appl

Profile ID 0

Red Matrix Column 0.45427 0.24263 0.01482

Green Matrix Column 0.35332 0.67441 0.09042

Blue Matrix Column 0.15662 0.08336 0.71953

Media White Point 0.95047 1 1.0891

Chromatic Adaptation 1.04788 0.02292 -0.0502 0.02957 0.99049 -0.01706 -0.00923 0.01508 0.75165

Red Tone Reproduction Curve (Binary data 14 bytes)

Green Tone Reproduction Curve (Binary data 14 bytes)

Blue Tone Reproduction Curve (Binary data 14 bytes)

Profile Description Camera RGB Profile

Profile Copyright Copyright 2003 Apple Computer Inc., all rights reserved.

Within the ICC Profile is the "Profile Date Time" field. This indicates when the profile was initially generated. This does not indicate when the profile was attached to the file. The Profile Date Time field must predate the picture's last save. In most cases, the ICC Profile's date predates the photo by years since it was generated long before the photo was captured.
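This ordering rule is easy to check mechanically. The Python sketch below parses timestamps in the "YYYY:MM:DD HH:MM:SS" form shown in the listings and tests whether the profile date comes no later than the file's last save. The function names are hypothetical, and the caller is assumed to supply a trustworthy last-save timestamp from elsewhere in the metadata.

```python
from datetime import datetime

# Timestamp layout used in the metadata listings, e.g. "2003:07:01 00:00:00".
STAMP_FORMAT = "%Y:%m:%d %H:%M:%S"


def parse_stamp(text):
    """Parse a metadata timestamp that has no time zone information."""
    return datetime.strptime(text, STAMP_FORMAT)


def profile_date_is_plausible(profile_date, last_save):
    """True if the ICC profile was generated no later than the last save."""
    return parse_stamp(profile_date) <= parse_stamp(last_save)
```

A profile date that falls after the claimed save time would be an inconsistency worth investigating, although clock errors can also produce one.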

In some cases, you may also see a profile description that includes the type of device, such as "DELL 2001FP" or "iMac" -- suggesting the hardware used by the person who edited the picture. The profile may be from the manufacturer or specifically created when the user tuned the monitor. For example, a Mac user who calibrates the display will create a new Apple ICC Profile with a date that identifies when the recalibration occurred.

Although many graphics programs will add in color profiles, few will alter the color profile without an explicit action by the user. This means that a picture saved using Apple's iPhoto may include an Apple ICC Profile. If Photoshop edits the same picture on a Windows computer, then it will likely retain the Apple ICC Profile and not reflect the Windows system or Adobe software.

XMP

Not willing to use the existing, standard metadata formats, Adobe created their own Extensible Metadata Platform (XMP). This is an XML (text) block that replicates much of the information found in existing Exif and IPTC records.

XMP is almost exclusively generated by Adobe products. (Exiv2 and Apple's iPhoto and Quicktime programs do generate XMP records, but they are nowhere near as extensive as the ones generated by Adobe applications. Also, Exiv2 and Apple XMP records will lack the "Adobe" name in the metadata.) If an XMP record is present, then it will usually contain a large amount of information about the image. This can include:

•Exif data. The original Exif data will be replicated here. If the picture was cropped or resized, then there may be a disparity between the Exif, Maker Note, and XMP information.

•Tool identification. XMP typically includes the name and version of software that was used to edit the file and when the edits occurred.

•History. XMP records may include a summary of modifications, such as a record of each time the file was saved or converted. This does not specify what happened; it only indicates that something happened. A long history of edits implies that the image was manipulated.

•Sources. When multiple pictures are combined, the XMP block will record this as multiple sources. This does not indicate what was combined; it only indicates that something happened. In addition, if a source is included but then deleted (or included but not used), then the XMP record will show the additional source but not indicate the removal.

The presence of an XMP block usually indicates a resave by an Adobe product. Adobe products automatically modify images, which can result in rainbowing and sharpening along high-contrast edges. These show up during an Error Level Analysis.
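Because XMP is plain XML, the tool identification and history entries can be pulled out with a standard XML parser. The Python sketch below is a minimal example under several assumptions: the XMP packet has already been extracted from the file as a string, history events are written as attributes on each rdf:li element (other layouts exist and are not handled), and the function name summarize_xmp is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by XMP for the fields read below.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "xmp": "http://ns.adobe.com/xap/1.0/",
    "xmpMM": "http://ns.adobe.com/xap/1.0/mm/",
    "stEvt": "http://ns.adobe.com/xap/1.0/sType/ResourceEvent#",
}


def summarize_xmp(packet):
    """Return (creator_tool, [(action, when, software_agent), ...])."""
    root = ET.fromstring(packet)
    # CreatorTool may be a child element or an attribute of rdf:Description.
    tool = root.findtext(".//xmp:CreatorTool", default=None, namespaces=NS)
    if tool is None:
        for desc in root.iterfind(".//rdf:Description", NS):
            tool = desc.get("{%s}CreatorTool" % NS["xmp"], tool)
    events = []
    for li in root.iterfind(".//xmpMM:History/rdf:Seq/rdf:li", NS):
        events.append((
            li.get("{%s}action" % NS["stEvt"]),
            li.get("{%s}when" % NS["stEvt"]),
            li.get("{%s}softwareAgent" % NS["stEvt"]),
        ))
    return tool, events
```

The result lists that something happened and which software reported it, not what was changed, which matches the limits of the History record described above.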

PrintIM

The Epson Print Image Matching (PrintIM) data is a proprietary block that provides color enhancement information for Epson printers. This data plays the same role as an ICC Profile.

The creation of the PrintIM metadata block appears to be exclusive to digital cameras. While some graphics editors will strip out the PrintIM metadata when the image is saved, other programs will retain the metadata. This means that the presence of a PrintIM record strongly suggests that the image originated from a digital camera, but this version of the image may not necessarily be a camera-original image.

Other Records

There are plenty of other types of metadata records. Some typically contain nothing more than a version number (e.g., the JPEG File Interchange Format -- JFIF), others contain thumbnail images, and still others contain additional information about the picture or file format (e.g., 8BIM or Photoshop records).

FotoForensics uses ExifTool to extract metadata. ExifTool generates a "Composite" block at the end of each extraction. The composite is not metadata found in the file. Rather, this is ExifTool's high-level summary. It includes the actual image size as well as camera and GPS information from the metadata.

Advanced Analysis

Although metadata does not identify the exact changes made to the picture, it can be used to identify attributes, inconsistencies, additional sources, edits, timelines, and a rough sense of how the image was managed. In effect, metadata provides clues about an image's pedigree.

There is no standard for a required set of metadata to exist in any particular picture. However, known tools generate known metadata fields; some types of metadata are generated during a save and others are appended to existing data. Some may be updated, while others may be retained or removed. By understanding the metadata and when it can appear, an investigator can develop a timeline and identify the order of changes made to the file.
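As a first pass at this kind of reasoning, the groupings from the "Common Metadata Blocks and Likely Sources" table can be turned into a small Python helper. This is only a sketch: the function name describe_origin is hypothetical, the block names are assumed to match the table's spelling, and the output is a set of hints to weigh alongside the caveats that follow, not a verdict.

```python
# Groupings taken from the "Common Metadata Blocks and Likely Sources" table.
CAMERA_ONLY = {"MakerNotes", "PrintIM"}
APPLICATION_ONLY = {"ICC Profile", "IPTC", "Photoshop", "XMP"}


def describe_origin(blocks):
    """Summarize what the set of metadata blocks present suggests."""
    present = set(blocks)
    camera = sorted(present & CAMERA_ONLY)
    application = sorted(present & APPLICATION_ONLY)
    notes = []
    if camera:
        notes.append("camera origin suggested by " + ", ".join(camera))
    if application:
        notes.append("application processing suggested by " + ", ".join(application))
    if not notes:
        notes.append("no camera- or application-specific blocks found")
    return notes
```

A file that triggers both notes fits the common pattern of a camera photo that was later resaved by software.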

Caveats

Metadata is an invisible component to a picture. While not required for rendering the picture, it does exist in the image file. Although metadata can provide a wealth of information about a photo, it also has a number of limitations. These include:

•Timestamps. Many cameras and computers do not regularly synchronize clocks. As a result, times can drift. If the camera is transported across time zones, then the local time can be significantly different from the camera's clock. Also, it is common for metadata timestamps to omit time zone information.

•Misleading. Metadata consists of field-value pairs. The field defines the value's purpose. Don't be confused by a field that is vendor-specific. For example, Kodak defined a large number of metadata extensions. Each of these records will have "Kodak" in the field name. That only means that the format was defined by Kodak, not that the data came from a Kodak camera. The value of these fields may identify a non-Kodak system.

•Inconsistent. Not all manufacturers follow the industry standards. Many Kodak cameras set the 'Make' to 'Digital Camera' rather than 'Kodak'. Some manufacturers use the same firmware on multiple camera models, so the 'Model' may contain a list of cameras (and the list may not even include the actual camera model). A few camera models leave all of this information blank. Since many cameras have unique quirks, an inconsistent EXIF field may be due to a specific camera make/model and not an indication of tampering or modification.

•Stripped. Resaving an image may strip some or all of the metadata information. Some people strip metadata to obscure the image's origins. However, this really raises questions when a picture is supposed to be original, or has been processed by an application that should always leave metadata. In addition, stripping metadata can impact how the image is rendered.

•Hosting. Many online picture sharing sites strip out metadata, including Facebook, Twitter, and Imgur. Other sites, like Photobucket and Google's Picasa, may alter the metadata. Even if the picture was unaltered when it was uploaded, the hosting site may have stripped out metadata. As a result, the metadata may no longer indicate whether the picture came from a digital camera.

•Faked. Although uncommon, people who want to create fake photos (such as pictures of UFOs, ghosts, and dead celebrities) may attempt to edit the metadata. Since many metadata fields are plain text and not part of any cryptographic checksum, simple edits to the metadata may not be detectable.

•Residues. Adobe has a known problem with metadata. If you load a photo into Photoshop, paste over it with a different photo, and save it, then the new photo will retain the original metadata. This can generate misleading information since the metadata will not match the new picture.

•Scanners. Pictures from scanners may appear to come from a camera or an application, depending on how it was scanned. If the picture was captured using internal firmware from a standalone scanner, then it will appear more like a camera. However, scanners accessed through a computer program, such as Photoshop, will produce images that appear to come from an application.
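The timestamp caveat in the list above can be made concrete. Since civil time zones currently range from 12 hours behind UTC to 14 hours ahead of it, a timestamp with no zone information could correspond to any instant within a 26-hour window. The Python sketch below computes that window. The function name utc_window is made up, and the calculation assumes the camera clock was otherwise accurate, so any clock drift would widen the range further.

```python
from datetime import datetime, timedelta


def utc_window(exif_timestamp):
    """Return the (earliest, latest) UTC instants that a zone-less
    "YYYY:MM:DD HH:MM:SS" timestamp could represent."""
    local = datetime.strptime(exif_timestamp, "%Y:%m:%d %H:%M:%S")
    # UTC+14 gives the earliest possible instant, UTC-12 the latest.
    return local - timedelta(hours=14), local + timedelta(hours=12)
```

For the bookshelf photo's Date/Time Original of 2007:05:28 12:56:08, the true moment of capture could fall anywhere from late on 27 May to just after midnight on 29 May in UTC.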

Metadata analysis is one of many different types of analysis. The interpretation of results from any single analysis method may be inconclusive. It is important to validate findings with other analysis techniques and algorithms.