Saturday 14 October 2017

Monkey takes a .heic

The hills are alive ... with the compression of H.265!

With iOS 11 and macOS High Sierra (10.13), Apple has introduced a file container format called High Efficiency Image File Format (aka HEIF - apparently it's pronounced "heef"). Apple is using HEIF to store camera/video/Apple "Live Photos". HEIF is based on multiple standards such as:
- ISO Base Media File Format (ISO/IEC 14496-12) for structuring data sections within the file container
- ISO/IEC 23008-2 (MPEG-H Part 2) / ITU-T H.265 for compressing the actual still picture and video data. Also referred to as High Efficiency Video Coding (HEVC). Theoretically, HEIF could use other compression algorithms but Apple is using it exclusively with HEVC / H.265.

Some benefits of HEIF are:
- It approximately halves the file size for a given image/video quality.
- It allows a single file to contain multiple media (eg multiple animated still pictures AND sound, as in an Apple "Live Photo").

Apple HEIF images will have a .heic file extension. Apple HEVC encoded movies will have the familiar Quicktime .mov extension but internally they will use HEVC / H.265 compression. The ISO Base Media File Format (ISO/IEC 14496-12) is based upon the Quicktime file structure and so it will apply to both .heic images and HEVC .mov files.

Because it uses a more complex compression algorithm than previous standards (eg H.264 and JPEG), only recent model Apple devices have the required hardware to create HEVC content.
According to Apple's 2017 WWDC presentation 503 "Introducing HEIF and HEVC", to create HEVC pictures/video you need (at least) an iPhone 7 / iPad Pro (A10 Fusion chip) running iOS 11 or a 6th generation Intel Core processor running macOS 10.13 High Sierra.
Software decoding support is apparently available for all Apple devices (presumably running iOS 11 / High Sierra) but playback performance will probably suffer on older hardware.

For the rest of this post we will discuss:
- how to view .heic and HEVC .mov files
- the file format for .heic files
- the file format for HEVC .mov files

We won't be discussing how HEVC / H.265 compression works. For a quick overview on some basic concepts and the difference between H.264 and H.265, please watch this video.

And before we dive any deeper ...
Special Thanks to Maggie Gaffney from Teeltech USA for providing us with iPhone 8 Plus test media files.
We also used sample .heic files from an Ars Technica review (iPad Pro) and sample files provided to the FFmpeg forum (iPhone 7 Plus).

Viewing & Compatibility Issues

Here is an article showing how to set up an iOS 11 device to save/transfer .heic files in their original format (Camera set to "High Efficiency" and "Photos - Transfer to Mac or PC" set to "Keep Originals"). Apple can auto-magically convert .heic files to .jpg files (and h.265 .mov to h.264 .mov) when transferring to non-compatible devices/destinations (eg PC or emails). So if you're not receiving .heic files, check those iOS settings.

Apart from viewing them natively on iOS or High Sierra (eg using Apple Photos or Preview), we found the easiest way to view .heic files was using this free Windows HEIF utility by @liuziangexit.
Note: there are two versions - Chinese and English. Being the uncultured lapdog monkey that we are, we downloaded the English version. Be sure to read the readme file included. Running it on Windows 10 (VM) also required installing the signed Microsoft C++ Redistributable package which was conveniently included in the download zip file.



There is also a website that converts .heic to .jpeg but this may not be appropriate for sensitive photos.

For playing HEVC/H.265 encoded .mov files, we found that IrfanView and VLC player worked OK (IrfanView seemed to have better performance than VLC when viewing high resolution videos).

FFmpeg (v3.3.3) can also be used to screen grab frames (1 per sec) from an H.265 .mov. The command is:
ffmpeg.exe -i sourcemovie.mov -vf fps=1 outputframe_%d.png
This will result in "outputframe_1.png", "outputframe_2.png" etc. being generated to the current directory.

For more compatible playback, we can convert an H.265 .mov into an H.264 .mov. The command is:
ffmpeg.exe -i source265movie.mov -map 0 -c copy -c:v libx264 outputmovie264.mov
This copies all other streams (eg audio, subtitles) to the new output file and re-encodes/outputs the source video stream to H.264. See here for details on using the FFmpeg map argument.

We found the easiest way to send a test .heic from an iPhone to a PC was to upload it to Dropbox which has been updated to support .heic and H.265 encoded .mov files. You can view both .heic and .mov files from the Dropbox.com website. Unfortunately, it appears that Dropbox might rename the files upon upload. We were expecting to see something like "IMG_4479.heic" but the filename on Dropbox was something like "Photo Oct 08, 10 20 05.heic". Consequently, a hash compare of the source/destination files may be required to verify exact copies.
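
For that hash compare, here's a minimal Python sketch (ours, not part of any tool mentioned above). The function names sha256_of and same_content are made up for illustration, and SHA-256 is just our pick of hash algorithm. It reads each file in chunks so large .mov files don't need to fit in memory.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b""
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """True when both files hash to the same value, whatever they are named."""
    return sha256_of(path_a) == sha256_of(path_b)
```

For example, same_content("IMG_4479.heic", "Photo Oct 08, 10 20 05.heic") should only return True if the renamed Dropbox copy is byte-for-byte identical to the source file.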

Exiftool (v10.63) added improved support for HEIF and it will display the EXIF data from an Apple generated .heic or H.265 .mov file. It has not been confirmed if iOS created .heic / HEVC .mov files will retain ALL of their original EXIF metadata after being auto-magically converted to .jpg / H.264 files.

We have not been able to find a non-Apple viewer for HEVC encoded "Live Photos". Trying to transfer them via Dropbox resulted in a "Live Photo" .heic file containing a single image (no sound or other animation). Sorry, no "Live Photos" for you!

File Structures

Now that we know how we can view iOS created HEIF images and videos, let's take a closer look at the actual file formats.
This will be a (reasonably?) short overview - we aren't going to become "data masochists" and delve into every field or the compression side of things. Maybe in a future post (especially if you've been a bad, bad, dirty, dirty monkey and feel the need to be punished LOL) ...

Apple created .heic and .mov files are BIG Endian.

Both .heic and .mov files are based on the ISO Base Media File Format. This means a .heic or .mov file container is divided into dozens of functional "boxes" of data. The start of each box will be marked with a 4 byte box size (typically) and a four byte box type string (eg. 'ftyp', 'mdat', 'meta'). Within the box, there will be other data fields which may consist of other boxes and/or a structured pattern of bytes. So there is a complicated hierarchy of boxes within boxes thing happening which makes it difficult to quickly understand every detail. The majority of the bytes (ie compressed data) will be stored in an "mdat" box. Other boxes will be used to store meta data about how to access/treat the data in those "mdat" boxes.
For further details on how these boxes are structured, please refer to the ISO Base Media File Format standard. Both it and the Quicktime movie format document will be your best friends for this section. FYI the ISO Base Media File Format is also used for .mp4 and .3gp files - so learning about this format will aid in understanding multiple types of media files.
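
To make the box layout a bit more concrete, here's a minimal Python sketch (ours, not taken from either reference document) of walking the boxes in a buffer. The function name iter_boxes is made up for illustration. It assumes the usual header of a 4 byte big endian size followed by a 4 byte type, and it also handles the two special size values from the ISO Base Media File Format: a size of 1 (a 64 bit size follows the type) and a size of 0 (the box runs to the end of the data).

```python
import struct


def iter_boxes(data, start=0, end=None):
    """Yield (box_type, offset, header_size, box_size) for each box in data[start:end]."""
    end = len(data) if end is None else end
    offset = start
    while offset + 8 <= end:
        # 4 byte big endian size, then a 4 byte type string (eg 'ftyp', 'mdat')
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header_size = 8
        if size == 1:
            # size == 1 means a 64 bit "largesize" follows the type field
            if offset + 16 > end:
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header_size = 16
        elif size == 0:
            # size == 0 means the box extends to the end of the data
            size = end - offset
        if size < header_size or offset + size > end:
            break  # truncated or corrupt box, stop rather than guess
        yield box_type.decode("latin-1"), offset, header_size, size
        offset += size
```

Calling list(iter_boxes(data)) on a whole file gives the top level boxes. The children of a plain container box such as 'moov' can be walked by calling iter_boxes(data, offset + header_size, offset + box_size) with the values yielded for the parent.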

Other handy references include:
- Chapter 3 of Lasse Heikkila's HEIF implementation thesis
- the Nokiatech HEIF Github site
- the 2017 Apple WWDC HEIF presentations (follow the transcript and slide PDF links) for the HEIC File Format and the Intro to HEIF and HEVC.

For an Apple iPhone 8 Plus .heic file (containing a single 4032 x 3024 image) the file structure can look like this:

ftyp (size=0x18, majorbrand = 'heic', minorversion = 0, compatiblebrands = mif1, heic)
meta (size = 0xF74)
    hdlr (size = 0x22, handler_type is "pict" i.e. file is an image)
    dinf (size = 0x24)
    pitm (size = 0xE, item_ID = 0x31 = primary item)
    iinf (size 0x43D, entry_count = 0x33 = number of items stored)
        infe = ItemInfoEntry, size = 0x15, version = 2, item_ID = 0x1, item_type = hvc1, item_name = ""  [Tile 1]
        infe = ItemInfoEntry, size = 0x15, version = 2, item_ID = 0x2, item_type = hvc1, item_name = ""  [Tile 2]
        ...
        infe = ItemInfoEntry, size = 0x15, version = 2, item_ID = 0x30, item_type = hvc1, item_name = "" [Tile 0x30]
        infe = ItemInfoEntry, size = 0x15, version = 2, item_ID = 0x31, item_type = grid, item_name = ""  [derived image from all tiles]
        infe = ItemInfoEntry, size = 0x15, version = 2, item_ID = 0x32, item_type = hvc1, item_name = ""  [thumbnail]
        infe = ItemInfoEntry, size = 0x15, version = 2, item_ID = 0x33, item_type = Exif, item_name = ""  [EXIF]

    iref (size = 0x94, version = 0, contains array of SingleItemTypeReferenceBox)
        dimg (size = 0x6C, from_item_ID = 0x31, reference_count = 0x30, to_item_ID = 0x1, 0x2 ... 0x30) [derived image]
        thmb (size = 0xE, from_item_ID = 0x32, reference_count = 0x1, to_item_ID = 0x31) [thumbnail]
        cdsc (size = 0xE, from_item_ID = 0x33, reference_count = 0x1, to_item_ID = 0x31) [content description ref / exif]

    iprp (size = 0x6F3)
        ipco (size = 0x5AD) = ItemPropertyContainerBox = property data*
            colr (size = 0x230) = Colour Information 1
            hvcC (size = 0x70) = decoder configuration 1
            ispe (size = 0x14) = spatial extent 1-1
            ispe (size = 0x14) = spatial extent 1-2
            irot (size = 0x9) = Image rotation 1
            pixi (size = 0x10) = Pixel information 1
            colr (size = 0x230) = Colour Information 2
            hvcC (size = 0x70) = decoder configuration 2
            ispe (size = 0x14) = spatial extent 2-1
            pixi (size = 0x10) = Pixel information 2
        ipma (size = 0x13E) = Item Property Association = connects property data in ipco to item numbers*
            List of 0x32 items. Each item has the structure [item number (2 bytes), size (1 byte), data (size bytes)]
    idat (size = 0x10)
    iloc (size = 0x340, version = 1, offset_size = 4, length_size = 4, base_offset_size = 0, index_size = 0, item_count = 0x33 )
        [item_id = 0x1, file offsets used, base_offset = 0, extent_count = 0x1, extent_offset = X1, extent_length = Y1]
        [item_id = 0x2, file offsets used, base_offset = 0, extent_count = 0x1, extent_offset = X2, extent_length = Y2]
        ...
        [item_id = 0x33, file offsets used, base_offset = 0, extent_count = 0x1, extent_offset = X33, extent_length = Y33]

mdat (size = variable, contains data on EXIF / thumbnail / image data)

It looks a little daunting (and this doesn't even show all of the boxes/fields!) but once you figure out which fields are relevant, it's not too bad. We've color coded some sections to make it more followable/wake up those weary eyes.

The ftyp section declares the 'majorbrand' (i.e. file type) as "heic".
The meta section declares how to interpret the raw data stored in the mdat section. Notable sub-boxes include:
    hdlr = The 'handler_type' is set to "pict" which means this is an image (as opposed to a video).
    pitm = Specifies the Primary Item number (eg item_ID 0x31)
    iinf = Contains a list of ItemInfoEntrys. The number of items and sizes will change with resolution/shape (eg camera specs, square photo).  From the 2017 WWDC 513 presentation and actual iOS samples we've observed, images are divided/stored as tiles.
            For a 4032 x 3024 resolution image, there were 0x33 items declared in each .heic file. These consisted of:
            0x30 items with each item_type = 'hvc1'. Each item corresponds to a 512x512 tile.
            1 'grid' item represents the full derived image
            1 'hvc1' item is used for the 320x240 thumbnail
            1 'Exif' item is used for storing EXIF data
    iref = contains array of SingleItemTypeReferenceBox items. From this section we can see that item_ID = 0x31 is a derived image ('dimg') that refers to item_IDs 0x1 to 0x30 (tiles). There are also references to the thumbnail and exif items.
    iprp = connects item_IDs in the 'ipma' sub-section to properties in the 'ipco' sub-section. *We were unable to find much public documentation on how this is implemented (apart from the Nokia HEIF Github source code).
    iloc = contains file offsets for each item_ID section. e.g. For EXIF (item_ID = 0x33), the extent_offset = 0x000043DB,  extent_length = 0x000007F8. So if we go to the file offset at 0x000043DB, we will see the EXIF item data.
The mdat section contains the raw image data, thumbnail and Exif information.
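
As a sanity check on those numbers, here's a small Python sketch (ours). The function names tile_grid and read_extent are made up for illustration. The first function assumes the 512x512 tiles are laid out in a simple grid with the edge tiles padded out to full size, which is our assumption rather than something we confirmed from the spec. The second simply slices an item's bytes out of the file using an 'iloc' extent_offset and extent_length when file offsets are used.

```python
def tile_grid(width, height, tile_size=512):
    """Return (columns, rows, tile_count) needed to cover a width x height image."""
    # Integer ceiling division: partial tiles at the right/bottom edges still count
    columns = (width + tile_size - 1) // tile_size
    rows = (height + tile_size - 1) // tile_size
    return columns, rows, columns * rows


def read_extent(data, extent_offset, extent_length):
    """Slice one item's bytes out of the file contents using its iloc extent."""
    end = extent_offset + extent_length
    if end > len(data):
        raise ValueError("extent runs past the end of the file")
    return data[extent_offset:end]
```

tile_grid(4032, 3024) works out to 8 columns by 6 rows = 48 tiles, which lines up with the 0x30 'hvc1' tile items listed in 'iinf'. For the EXIF item above, read_extent(data, 0x43DB, 0x7F8) would return the bytes from file offset 0x43DB up to (but not including) 0x4BD3.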

Due to the tiling, full file recovery could be a lot more complicated compared to recovering a jpeg (where you can carve everything between the 0xFFD8 and 0xFFD9 markers).
As iOS 11 also uses file based encryption, it *should* be impossible to carve & recover .heic files anyway.
However, if those .heic files were also copied to a separate non-encrypted device (eg PC) and then corrupted/deleted, it *may* be possible to repair or recover some/all of the tiles (theoretically!).
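
For comparison, the jpeg style of carving mentioned above can be sketched in a few lines of Python. This is a deliberately naive illustration (ours) and the function name carve_first_jpeg is made up. It grabs everything from the first 0xFFD8 marker to the next 0xFFD9 marker. A real carver has to be more careful, as embedded thumbnails can carry their own markers. There is no equally simple start/end marker pair for the tiled .heic layout described above.

```python
def carve_first_jpeg(blob):
    """Return the bytes from the first FFD8 marker to the next FFD9 marker, or None."""
    start = blob.find(b"\xff\xd8")  # Start Of Image marker
    if start == -1:
        return None
    end = blob.find(b"\xff\xd9", start + 2)  # End Of Image marker
    if end == -1:
        return None
    return blob[start:end + 2]  # include the 2 byte end marker
```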

OK, suck it up buttercups ...because there's more!

Here's the file structure for a 6.73 second 7.3 MB Apple HEVC / H.265 encoded .mov taken with an iPhone 8 Plus:

ftyp (size = 0x14, majorbrand = 'qt  ')

wide (size = 0x8)
mdat (size = 0x00746120, contains HEVC / H.265 video data)
moov (size = 0x0028FA)
    mvhd (size = 0x6C, version = 0, creation_time = 0xD5FFE81E (secs since 1JAN1904), modification_time = 0xD5FFE825,
                timescale = 0x00000258 = 600 dec. units per sec, duration = 0x00000FC7 = 4039 dec units => 4039/600 = 6.73 secs,
                next_track_ID = 0x5)
    trak (size = 0x0FE6)
        tkhd (size = 0x5C, version = 0, creation_time = 0xD5FFE81E, modification_time = 0xD5FFE825, track_ID = 0x1,
                  duration = 0xFC7, width = 0x07800000 => 0x780 = 1920 decimal, height = 0x04380000 => 0x438 = 1080 decimal)
        tapt (size = 0x44)
        edts (size = 0x24)
        mdia (size = 0xF1A) = media box
            mdhd (size = 0x20, version = 0, creation_time = 0xD5FFE81E, modification_time = 0xD5FFE825, timescale = 0x258,
                        duration = 0xFC7)
            hdlr (size = 0x31, component type = mhlr = media handler, component subtype = vide, component manufacturer = appl,
                     component name = "Core Media Video")
            minf (size = 0xEC1) = contains file offsets to samples/chunks of samples
    trak (size = 0x07B4)
        tkhd (size = 0x5C, version = 0, creation_time = 0xD5FFE81E, modification_time = 0xD5FFE825, track_ID = 0x2,
                  duration = 0xFC7, width = 0, height = 0)
        edts (size = 0x24)
        mdia (size = 0x72C)
            mdhd (size = 0x20, version = 0, creation_time = 0xD5FFE81E, modification_time = 0xD5FFE825,
                        timescale = 0x0000AC44 = 44100 samples/sec, duration = 0x00049000 = 299008 samples = 6.78 sec)
            hdlr (size = 0x31, component type = mhlr = media handler, component subtype = soun, component manufacturer = appl,
                     component name = "Core Media Audio")
            minf (size = 0x6D3) = contains file offsets to samples/chunks of samples
    trak (size = 0x042E)
        tkhd (size = 0x5C, version = 0, creation_time = 0xD5FFE81E, modification_time = 0xD5FFE825, track_ID = 0x3,
                  duration = 0xFC7, width = 0, height = 0)
        edts (size = 0x24)
        tref (size = 0x20)
        mdia (size = 0x386)
            mdhd (size = 0x20, version = 0, creation_time = 0xD5FFE81E, modification_time = 0xD5FFE825, timescale = 0x258,
                        duration = 0xFC7)
            hdlr (size = 0x34, component type = mhlr = media handler, component subtype = meta, component manufacturer = appl,
                     component name = "Core Media Metadata")
            minf (size = 0x32A) = contains file offsets to samples/chunks of samples
    trak (size = 0x0271)
        tkhd (size = 0x5C, version = 0, creation_time = 0xD5FFE81E, modification_time = 0xD5FFE825, track_ID = 0x4,
                  duration = 0xFC7, width = 0, height = 0)
        edts (size = 0x24)
        tref (size = 0x20)
        mdia (size = 0x1C9)
            mdhd (size = 0x20, version = 0, creation_time = 0xD5FFE81E, modification_time = 0xD5FFE825, timescale = 0x258,
                        duration = 0xFC7)
            hdlr (size = 0x34, component type = mhlr = media handler, component subtype = meta, component manufacturer = appl,
                     component name = "Core Media Metadata")
            minf (size = 0x16D) = contains file offsets to samples/chunks of samples

    udta (size = 0x08)
    free (size = 0x400)
    meta (size = 0x5BD)
        hdlr (size = 0x22, component subtype = mdta)
        keys (size = 0xC9) => contains various metadata field names
        ilst (size = 0xCA) => contains various metadata field values
        free (size = 0x400)
    free (size = 0x88)

We can see some familiar 4 letter strings (reckon you might be spouting some others of your own by now ...) and the offset information is now contained in the 'moov' section (recall that offset info is stored in the 'meta' / 'iloc' section for a .heic).
Also, instead of utilising "items" like .heic, the movie is organised into traks (eg video trak, sound trak). These 'trak' boxes include file offsets to the 'mdat' section (via 'trak' / 'mdia' / 'minf').
The 'moov' box has a movie header atom labelled 'mvhd'. This shows the length of the movie and creation/modified dates (amongst other things).
There were 4 traks recorded - one for video (track_ID=1), one for sound (track_ID=2) and two for meta data (track_ID=3 and 4). The second (smaller) metadata trak (track_ID=4) may be extending the first (track_ID=3) metadata trak (due to space limitations?) as the metadata strings are different but seem related.
'free' marks boxes that can be ignored/skipped.
'meta' marks a box containing metadata. However, the 'meta' structure from a .mov *will not* match the 'meta' data structure from a .heic image. Presumably Exiftool will grab metadata from both 'trak' and 'meta' boxes.
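
The time fields in 'mvhd' and 'mdhd' are easy to sanity check by hand. Here's a minimal Python sketch (ours) that converts a creation_time value and works out a duration from its timescale. The function names are made up for illustration, and we are assuming the timestamps are stored as UTC, which may not hold for every writer.

```python
from datetime import datetime, timedelta, timezone

# Quicktime style timestamps count seconds from midnight, 1 January 1904
QUICKTIME_EPOCH = datetime(1904, 1, 1, tzinfo=timezone.utc)


def quicktime_to_datetime(seconds):
    """Convert seconds since 1JAN1904 to a datetime (assumes the value is UTC)."""
    return QUICKTIME_EPOCH + timedelta(seconds=seconds)


def duration_seconds(duration_units, timescale):
    """Convert a duration in timescale units to seconds."""
    return duration_units / timescale
```

With the values above, quicktime_to_datetime(0xD5FFE81E) gives 2017-10-08 14:35:10 (if the UTC assumption holds), duration_seconds(0xFC7, 0x258) gives roughly 6.73 secs for the movie and duration_seconds(0x49000, 0xAC44) gives roughly 6.78 secs for the sound trak.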

In other observed H.265 .mov files (both smaller and larger), multiple 'mdat' sections were observed. This may be related to the existence of a 'hoov' box which we couldn't find any documentation on. The 'hoov' box appeared at a lower (earlier) file offset than the 'moov' and also contained 'mvhd' and 'trak' boxes etc.
Perhaps the 'hoov' box was a previous 'moov' box that had its name modified so its data can be overwritten? eg as file grows in size, data gets re-written? #SpeculatorMonkey


Final Thoughts

Oh, my aching lederhosen!
And this post has only just scratched the surface of the HEIF-y beast.
There are a lot more possibilities with HEIF than what Apple has currently implemented. The Nokiatech Github site demonstrates a bunch of different image file possibilities (eg single images, sequences of images, HD movies, combined images/video).

Although we weren't able to capture a native Apple "Live Photo" for examination, we *suspect* it will use an image sequence (.heics) file and have a 'moov' box in addition to 'ftyp', 'mdat' and 'meta' boxes. This was kinda shown on slide 60 of the 2017 Apple WWDC slides for High Efficiency Image File Format.

This post by macrumors.com states that Apple "Live Photos" initially consisted of a 12 MP jpeg with 45 frames of H.264 .mov at 15 fps (i.e. 3 secs video = 1.5 secs before/after button press).
This anandtech forum article states that "Live Photos" are:
"1440x1080 HEVC on certain devices, albeit paired with HEIC images now instead of JPEG. There is also the option of leaving it has 1440x1080 h.264 with JPEG though."

Anyhoo, if you are able to catch a "Live Photos" unicorn file, we'd be very interested to hear about its file structure (leave a comment?).

UPDATE 15OCT2017:
For "Live Photos" we tried directly connecting an iPhone 8 running iOS 11.0.3 to a Windows 7 PC and were able to see the DCIM folders. The iPhone 8 was set to the default "High Efficiency" Photos with Mac/PC transfer set to "Keep Originals". However, after copying the files over, when we looked at the transferred .heic file structures on the PC they were single images.
There were no 'moov' or 'trak' items. So it looks like iOS is not openly exposing their "Live Photo" file structure to non-Apple devices. Boo! :'(

Finally, if you know of any other easy to install/use .heic viewers or have any thoughts/suggestions, please leave a note in the comments.