Friday, April 29, 2011

Creating ZoomWalks

This post will tour the whys and hows of the ZoomWalk video clips that were published in this post. It was a learning experience for me from both a photographic and a technological (software) point of view, one which I'm certain hasn't yet reached its end.

Photography Tips

The heart of the ZoomWalk is the individual photographs that compose it. Hundreds of photographs. Photographs which should make sense when lined up one after the other in a video. Here, in no particular order, are some of the points I discovered.
  • Walking into the sun is a bad idea. The photographs aren't as good, and it's extremely difficult to see the camera display, which means the shots won't line up as well as they should. Squinting can give you a headache.
  • Take advantage of any 'guide lines' or autofocus marks on the camera display to line up your shots: to keep the horizon consistent (avoiding rotation), to avoid sliding off to the right or the left or top or bottom, and so on.
  • It is helpful to find a distant object to use as a benchmark, and try to place it in the same spot in the series of photos. Closer objects are unsuitable because they are supposed to be moving quickly to the side of the frame anyway.
  • Plan the transition from one benchmark object to the next, so your frame doesn't suddenly jump up or down or to one side or another.
  • Sometimes (in the woods) there won't be a suitable distant benchmark that's visible. Do your best and keep going.
  • You don't need 10 megapixel images to create a 1280x720 video. Four would be plenty, and I got by with 3.2. Smaller pictures will be easier for your computer to cope with.
  • Use a variable number of steps based on what's going on. I started with 4 paces between each shot, and my paces are about 2¼ to 2½ feet. Use fewer when visually interesting things are happening (passing over bridges, taking stairs). A few more steps per photo are OK when passing through straight, less interesting parts.
  • Take more shots when going around curves and corners. The sharper the turn, the more shots are needed to maintain continuity. Here you need to keep the horizon consistent, but the view in subsequent pictures will naturally shift from frame to frame until it settles into 'straight forward' for the new direction.
  • Be patient. Yes, you're taking hundreds of photos, but you'll save yourself time and grief at the computer if the photos are better aligned to start.
Other thoughts:
  • When going around a curve or corner, mimic what the human eye does and look through the curve before you arrive there. We don't lock our eyes on the last section of straight ahead trail, but tend to look ahead along the upcoming curve. (Thanks Joan!)
  • When going down a hill or stairs, the far horizon seems to rise to the human eye. Keeping it at a constant height in the image (as a benchmark) may leave that segment appearing unnatural ... this needs more investigation!
  • When passing by or through interesting sights -- a bridge or interesting building -- stop and pan to the sight, so that the viewer appears to pause and regard the view.
  • Take a spare, fully charged battery along. The camera will never go to sleep with the steady taking of pictures, so the battery may drain faster than you anticipate.
  • It would be helpful if the camera had a bubble level or inclinometer, to avoid small rotations of the frame. The iPhone has at least one such app.

Initial Renaming

Now I have a series of still photographs. What do I do with them? It starts out similarly to my discussion on time-lapse videos: for purposes of further processing, rename the images in sequential order starting with '1', in a subdirectory to avoid messing with the originals. I use a simple shell script. (All these scripts are written for a Bourne-style shell in a Linux/Unix environment; because they use the 'let' builtin they need bash or ksh rather than a strict Bourne shell, but they should be adaptable.)
let ix=0
/bin/rm -f fr*.jpg *.tiff

# reduce input pictures to a more manageable size that is
# still sufficient to hold a 1280x720 image after trimming
# due to rotations and translations.
for i in ../P*.JPG
do
    let ix=$ix+1
    fname=`printf "fr%04d" $ix`
    convert -scale 1602x1066 -crop 1600x1065+1+1 "$i" $fname.jpg
    echo $fname.jpg
done
This script first scales to 1602x1066 and then crops to 1600x1065 because I shot the originals in 3:2 format, and scaling alone had rounding errors (only 1598 wide). The frames are deliberately larger than the final 1280x720 video: aligning them shifts and rotates the image, and the extra pixels around the edges are trimmed away later. The convert command used is part of the ImageMagick package.


Unless your photography was extremely skillful, using the images directly will result in a shaky video because of misalignments between frames. It's not your fault, because you can only align as well as your camera display will allow. I prefer to tweak the alignments wherever reasonable; that is, where the benefit is greater than the work. This is where 95% of the effort lies, and so I was curious if there was any way to automate it. The answer is, sort of ... but you must always click through the video (view it frame by frame) to look for errors.

The tool I discovered (initially through this link) was align_image_stack. This command line tool is part of the larger hugin project. It is designed to align multiple images of the same subject taken at different exposures, a preprocessing step for High Dynamic Range (HDR) imaging. As such, it aligns images that are expected to be almost the same. My ZoomWalk images are different from each other, some very much so (corners), some not so much. Also, objects close to the edge of the frame move quickly. How well would align_image_stack work for my ZoomWalk pictures?

The first problem was to determine how good a job align_image_stack thought it did. I had to intercept the status printouts from the command and identify the final RMS (root mean square) error that align_image_stack reported. If it was too large, it was likely though not certain that align_image_stack had overcorrected.

My first script compared successive images; that is, it would:
  • compare #1 with #2
  • based on the rms, either accept the modified #2 or stick with the original
  • compare #2 with #3
  • etc
After looking at the results, I felt that this wasn't working. For one thing, because the replaced frames were changed, comparing a changed frame N with an unchanged frame N+1 allowed small corrections to accumulate from frame to frame. For example, given this method, here are frame #1 and frame #10:
To avoid this problem, I modified the script; if image N was successfully modified, the original image for N+1 was automatically chosen next -- without even running align_image_stack. This prevents accumulation of errors, but means that no more than 50% of the images can be automatically aligned!

Another problem was that, because align_image_stack assumes that the pictures are intended to be identical, it would struggle against the view turning around a curve or corner, even to the point of skewing the perspective:

Because align_image_stack works by selecting control points to compare between frames, it is possible, if the control points are poorly chosen (for whatever reason), for things to go very wrong:

To catch these mistakes, I added a test for the percentage change in the median brightness of the image. If the change was too large, because of large black background areas being added, the script would stick with the original.

Align_image_stack has an option for compensating for small variances in magnification of the images (-m option). I thought this sounded promising for the ZoomWalk photos, because the central part of the image would be undergoing, in effect, magnification. However, it didn't work out as I had hoped:

Sometimes an alignment would be rejected by these tests that, on visual inspection, appeared reasonable. Therefore, the script saves rejected images for later inspection, and in two groups, one for RMS rejections and one for percentage brightness rejections. As I said earlier, you must always inspect the results of the alignment process.

There are two possible enhancements to the alignment script that I still need to investigate:
  • instead of comparing image N with image N+1 and accepting or rejecting the realignment of N+1, and skipping any possible alignment of N+2 if the changes for N+1 are accepted, compare image N with N+1 and accept or reject the changes for N. Then image N+1 is untouched and we can compare N+1 with N+2, etc. This way all, rather than half, of the images could be potentially realigned by align_image_stack. (Technical note: the script would have to list N+1 as the first image, and N as the second, because align_image_stack leaves untouched the first image in the set it is working on.)
  • to reduce the impact of a single poorly taken photo, experiment with comparing three images at a time rather than two. Compare N, N+1, and N+2, accepting or rejecting the changes to N. Then move on to compare N+1, N+2, and N+3.
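The reversed ordering described in the first enhancement can be sketched as a dry run. This is only an illustration, not part of the current script: plan_reversed_pairs is a hypothetical helper that prints the commands rather than running them, and the RMS and brightness tests would still have to be applied to each result.

```shell
#!/bin/sh
# Dry-run sketch: list frame N+1 first and frame N second, so that
# align_image_stack would leave N+1 untouched and realign N.
# plan_reversed_pairs is a hypothetical helper; it only prints commands.
plan_reversed_pairs()
{
    total=$1
    n=1
    while [ "$n" -lt "$total" ]
    do
        ref=`printf "fr%04d.jpg" $((n + 1))`
        cand=`printf "fr%04d.jpg" $n`
        echo "align_image_stack -s 2 -a rms $ref $cand"
        n=$((n + 1))
    done
}

# show the comparisons for a three-frame sequence
plan_reversed_pairs 3
```

With this ordering the realigned copy of frame N would be the second output file (rms0001.tif), just as in the current script, while every original frame still gets a chance to serve as the untouched reference.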
If I could request one enhancement to align_image_stack -- and I realize it wasn't intended for the uses to which I put it -- it would be to have an option to scale back its changes. For example, a way to say that small changes should be adopted in their entirety, moderate changes should be applied but only by half, and large changes shouldn't be applied at all.
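To make the wished-for rule concrete, here is a minimal sketch applied to a single horizontal shift in pixels. Everything in it is an assumption for illustration: align_image_stack offers no such option, scaled_shift is a hypothetical name, and the thresholds of 5 and 20 pixels are invented.

```shell
#!/bin/sh
# Hypothetical "scale back" rule for a proposed shift in pixels:
#   small shifts (up to 5 px)   are applied in full,
#   moderate shifts (to 20 px)  are applied by half,
#   larger shifts               are not applied at all.
# The thresholds are illustrative assumptions only.
scaled_shift()
{
    px=$1
    mag=${px#-}                 # magnitude without the sign
    if [ "$mag" -le 5 ]
    then
        echo "$px"
    elif [ "$mag" -le 20 ]
    then
        echo $(( px / 2 ))      # integer division, truncates toward zero
    else
        echo 0
    fi
}

scaled_shift 4
scaled_shift 12
scaled_shift 40
```

The same three-way decision would apply to vertical shifts and rotations; only the thresholds would differ.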

Here is the current state of the alignment script:


# a subroutine to perform align_image_stack and return a whole number
# approximation of the rms value in the align_image_stack output.
pair_rms()
{
    /bin/rm -f rms*.tif foo.jpg

    frms=`align_image_stack -s 2 $1 $2 -a rms 2>&1 | grep after | \
        tail -1 | awk ' { print $4 } ' - `
    # if no return value, or no tif file, align_image_stack couldn't do
    # anything with this pair.
    if [ -z "$frms" -o ! -f rms0001.tif ]
    then
        echo "99"
        return
    fi

    rmsi=`echo $frms | cut -f1 -d'.' `
    if [ $rmsi -gt 0 ]
    then
        rmsd=`echo $frms | cut -f2 -d'.' | cut -c1`
        # set rounding-up threshold to suit
        if [ $rmsd -ge 5 ]
        then
            let rmsi=$rmsi+1
        fi
    fi
    echo $rmsi
}

# a subroutine to calculate delta percentages using dc to return a
# floating point number: (before - after) * 100 / before.
# The "2 k" precision setting is an assumption.
pct_brt_delta()
{
    pct=`dc <<EOF
2 k
$1
$2 -
100 *
$1 / p
EOF
`
    echo "$pct"
}

/bin/rm -f algn*.jpg reject*.jpg

# align the frames
let i=1
fname=`printf "fr%04d" $i`
aname=`printf "algn%04d" $i`
cp $fname.jpg $aname.jpg

let i=$i+1
aname=`printf "algn%04d" $i`
xname=`printf "fr%04d" $i`

# create an all-black image for background, at the same size as
# what we're getting in our input frames.
convert -size 1600x1065 xc:black blackback.tif

repeat=false
while [ -f $xname.jpg ]
do
    # do NOT even bother accepting two image_align_stacks in a row!
    if [ $repeat = false ]
    then
        echo compare $fname.jpg $xname.jpg
        br=`identify -format "%[mean]" $xname.jpg`
        rms=`pair_rms $fname.jpg $xname.jpg`
        if [ $rms -gt 1 ]
        then
            echo "  rms is $rms"
            echo "  cp $xname.jpg $aname.jpg"
            cp $xname.jpg $aname.jpg
            # save rejected realignment, if it exists, for manual inspection.
            if [ -f rms0001.tif ]
            then
                rname=`printf "reject%04d-rms" $i`
                convert rms0001.tif $rname.jpg
            fi
            # composite -geometry +0+0 rms0001.tif blackback.tif foo.jpg
            # aft=`identify -format "%[mean]" foo.jpg`
            # pct=`pct_brt_delta $br $aft`
            # echo "  brightness delta would have been $pct%"
        else
            echo "  rms is $rms, check brightness delta"
            # put image over a black background to prevent passing transparent
            # pixels to ffmpeg later on.
            composite -geometry +0+0 rms0001.tif blackback.tif foo.jpg
            aft=`identify -format "%[mean]" foo.jpg`
            pct=`pct_brt_delta $br $aft | cut -f1 -d'.'`
            # dc prints values below 1 without a leading zero (".53"),
            # which leaves nothing in front of the decimal point.
            if [ -z "$pct" ]
            then
                pct=0
            fi
            if [ $pct -ge 7 ]
            then
                echo "  brightness delta would have been $pct%"
                echo "  cp $xname.jpg $aname.jpg"
                cp $xname.jpg $aname.jpg
                # save rejected realignment for manual inspection.
                rname=`printf "reject%04d-brt" $i`
                cp foo.jpg $rname.jpg
            else
                echo "  brightness delta is $pct%"
                echo "  cp foo.jpg $aname.jpg"
                cp foo.jpg $aname.jpg
                repeat=true
            fi
        fi
    else
        echo "cp $xname.jpg $aname.jpg (no repeats)"
        cp $xname.jpg $aname.jpg
        repeat=false
    fi

    # the frame just handled becomes the reference for the next comparison
    fname=$xname
    let i=$i+1
    xname=`printf "fr%04d" $i`
    aname=`printf "algn%04d" $i`
done
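The percentage calculation in pct_brt_delta could also be done with awk, which avoids one quirk of dc: dc prints values below 1 without a leading zero (".50"), so cutting at the decimal point leaves an empty string. The following is a minimal sketch, assuming awk is available; pct_brt_delta_awk is a hypothetical name, not part of the script above.

```shell
#!/bin/sh
# Percentage change in brightness, (before - after) * 100 / before,
# printed with two decimals and always with a leading digit.
# pct_brt_delta_awk is a hypothetical stand-in for the dc version.
pct_brt_delta_awk()
{
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.2f\n", (before - after) * 100 / before }'
}

pct_brt_delta_awk 200 186    # a 7% drop in mean brightness
pct_brt_delta_awk 200 199    # a drop of half a percent
```

Because the result always has a digit before the decimal point, taking the whole-number part with cut -f1 -d'.' never produces an empty value.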

To manually adjust/align a frame, I use the GIMP. Open the first frame normally, with File -> Open. If the Layers window isn't open, start it with Windows -> Layers. Then open the second frame as a separate layer with File -> Open as Layers. Set the Opacity of the new layer to roughly 50%. This allows you to see through to the other layer, and you can move either layer to create the alignment you want. Before saving the modified layer, delete the other layer (you don't want it to be part of the frame!), set the Opacity back to 100%, and invoke Layer -> Layer to Image Size. This last step is necessary to fit the modified layer within the original image size, cropping as needed.

Creating a set of aligned frames that you are happy with is the most time-consuming step in creating a ZoomWalk video. It will take too much time to manually adjust every frame, so run the video several times to see where it needs help. You can also click through the aligned frames one-by-one to look for any surprises. You may end up replacing aligned frames with the original version, manually modifying an original or aligned frame, or accepting one of the rejected frame alignments.

Frame Interpolation/Morphing

Taking a photo every 10 feet, and playing back the video at 25 frames per second, would mean that the point of view would hurtle forward at 250 feet/second, or 170 miles/hour. However, to take a photo every one or two steps would double or treble the number of photos taken, the time required to take the photos, the workload for the computer, and the visual inspection of the frames. I used the -morph operator of the convert command from the ImageMagick package to ameliorate this problem.
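The speed arithmetic is easy to check, and to repeat for other spacings, with a small helper. This is just a sketch; playback_mph is a hypothetical name, and its third argument is the number of interpolated frames inserted between each pair of real photos.

```shell
#!/bin/sh
# Apparent forward speed in miles per hour.
#   $1 = feet walked between photos
#   $2 = playback frames per second
#   $3 = interpolated frames added between each pair of photos
playback_mph()
{
    awk -v feet="$1" -v fps="$2" -v extra="$3" \
        'BEGIN { printf "%.1f\n", feet * fps / (extra + 1) * 3600 / 5280 }'
}

playback_mph 10 25 0    # real frames only
playback_mph 10 25 2    # two in-between frames per pair
```

With no interpolation the result is about 170 miles/hour, as above; two in-between frames per pair bring it down to roughly 57 miles/hour.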

Morph does not understand objects, or the concept of objects changing their position between frames. It just blends the color of each pixel; if you invoke "-morph 1" you will get one image that is 50% of each real frame. If you specify "-morph 2," which is what I settled on, you get two interpolated or in-between frames, the first of which is 67% frame #1 and 33% frame #2, and the second will be 33% frame #1 and 67% frame #2. To generate the fade in and fade out of the title sequences, I used "-morph 10" to gradually transition from a black background to the title frame and back again. Morph is in essence a shorthand for this particular use of the blend operator.
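The blend percentages follow a simple pattern: with N in-between frames, frame k contains k/(N+1) of the second real frame and the rest of the first. A short sketch that prints them (morph_weights is a hypothetical helper, not an ImageMagick command):

```shell
#!/bin/sh
# Print the percentage of frame #1 and frame #2 in each in-between
# frame when N interpolated frames are requested.
morph_weights()
{
    awk -v n="$1" 'BEGIN {
        for (k = 1; k <= n; k++) {
            second = 100 * k / (n + 1)
            printf "%.0f %.0f\n", 100 - second, second
        }
    }'
}

morph_weights 2
```

For two in-between frames this prints "67 33" and then "33 67", matching the percentages given above.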

This image shows the results of a "morph -2". The top left image is the first real frame, and the lower right image is the second real frame.

I decided that having three blended images was too much; the appearance became more of fading in and out rather than moving forward. The videos I produced all used two interpolated frames.

Here is the script fragment for generating the title sequence:


echo "create title frames"

/bin/rm -f smalgn*.jpg

convert -size 1280x720 xc:black blackf.tif

convert -size 1280x720 xc:black -font Cooper-Blk-BT-Black -pointsize 40 \
    -gravity center -draw \
        "fill white text 0,-28 \"ZoomWalk #3\" \
        fill white text 0,28 \"Chautauqua Park to Walton Lake\" " \
        title.png

# create intermediate frames to fade from black to title and back.
# Total number of frames generated is 23.
convert blackf.tif title.png blackf.tif -morph 10 smalgn%05d.jpg
let seq=23

After the title sequence has been generated, it is time to enhance the frames and trim them to their final size, here 1280x720.  The trimming also removes the black edges created by realignment, and is accomplished by extracting the 1280x720 frame from the center of the larger image. The file names start with a sequence number (seq) established from the final title sequence frame.

cnt=`ls -1 algn*.jpg | wc -l`
echo "buff and resize $cnt frames"

# to compare same-size and same-color frames, we defer contrasting,
# sharpening, and trimming the edges until this point, when we are
# about to generate the intermediate frames. We are trimming from
# 1600x1065 to 1280x720.
for i in algn*.jpg
do
    of=`printf "smalgn%05d" $seq`
    convert -crop 1280x720+160+172 -contrast-stretch 0.30x0.35% -unsharp 4x1.5+0.36+0.5 $i $of.jpg
    let seq=$seq+1
done
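The +160+172 offsets in the crop are simply what centers a 1280x720 window inside a 1600x1065 frame. A minimal sketch of that calculation, for anyone working with other sizes (center_crop is a hypothetical helper):

```shell
#!/bin/sh
# Build a centered crop geometry string for ImageMagick.
#   $1 $2 = width and height of the source frame
#   $3 $4 = width and height of the final frame
center_crop()
{
    x=$(( ($1 - $3) / 2 ))
    y=$(( ($2 - $4) / 2 ))      # odd differences round down
    echo "${3}x${4}+${x}+${y}"
}

center_crop 1600 1065 1280 720
```

The output, 1280x720+160+172, is the geometry used in the script; the margin of 160 pixels on each side and roughly 172 above and below is the slack available for realignment.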

Using morph on all the images at once uses a lot of the computer memory. Even on my desktop computer, juno, with 4 GB of RAM, processing 1280x720 images would fill the memory rapidly, causing the computer to use the disk (swap/pagefile) to hold information and slowing the processing down immensely. The script works around this limitation by generating the interpolated frames in batches of 100 real frames, and resequencing the numbers:

# we must generate the morphed/interpolated frames in batches, else
# most desktop computers will run out of memory!

/bin/rm -f fbatch*.jpg ffr*.jpg

let seq=0
let bigseq=0

smbase=`printf "smalgn0%02d" $seq`
first=`printf "smalgn0%02d00.jpg" $seq`

while [ -f "$first" ]
do
    let seq=$seq+1
    echo "create interpolation batch ${seq} ($smbase)"
    # handle gap between last of this and first of next batch by
    # including "next"....
    next=`printf "smalgn0%02d00.jpg" $seq`

    if [ -f "$next" ]
    then
        convert $smbase*.jpg $next -morph 2 fbatch%03d.jpg
        # handle dup of last of this batch and first of next batch
        # by removing last of this batch.
        ls -l fbatch300.jpg
        /bin/rm fbatch300.jpg
    else
        convert $smbase*.jpg -morph 2 fbatch%03d.jpg
    fi

    for i in fbatch*.jpg
    do
        oname=`printf "ffr-%05d" $bigseq`
        mv $i $oname.jpg
        let bigseq=$bigseq+1
    done

    smbase=`printf "smalgn0%02d" $seq`
    first=`printf "smalgn0%02d00.jpg" $seq`
done
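The frame counts in these scripts (23 title frames, and fbatch300.jpg as the duplicate to remove) come from the same rule: n real frames with k in-between frames per pair yield n + (n - 1) * k frames in total. A small sketch to verify the numbers (morph_count is a hypothetical helper):

```shell
#!/bin/sh
# Total frames produced by morphing $1 real frames with $2
# interpolated frames between each consecutive pair.
morph_count()
{
    echo $(( $1 + ($1 - 1) * $2 ))
}

morph_count 3 10     # black, title, black with 10 in-betweens
morph_count 101 2    # a batch of 100 frames plus the first of the next
```

The first call gives 23, the length of the title sequence. The second gives 301, numbered 000 through 300, which is why the last file of a full batch is fbatch300.jpg.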

Another area for future experimentation is with the morphing values. By using the blend operator directly, you can play with percentages other than those used by morph. For instance, you could still have two interpolated frames, but instead of 67% and 33%, they could be further apart (75% and 25%) or asymmetrical (80% and 50%). Video from stills is a very large sandbox in which to play!
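As a starting point for that experiment, here is a dry-run sketch that prints one composite command per in-between frame. It assumes that composite's -blend option takes the percentage of the first image listed, with the remainder coming from the second; blend_cmds and the between*.jpg output names are hypothetical, and nothing is executed.

```shell
#!/bin/sh
# Dry run: print the composite commands for custom blend percentages.
#   $1 = first real frame, $2 = second real frame
#   remaining arguments = percentage of the SECOND frame in each
#   in-between frame, in playback order.
blend_cmds()
{
    first=$1
    second=$2
    shift 2
    n=1
    for pct in "$@"
    do
        echo "composite -blend $pct $second $first between$n.jpg"
        n=$((n + 1))
    done
}

# two in-between frames that stay closer to the real frames
blend_cmds fr0001.jpg fr0002.jpg 25 75
```

Piping the output to sh would run the commands; checking the first result by eye is a good way to confirm which frame the percentage applies to before processing a whole sequence.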

Generating the Video

Now, finally, the frames can be assembled into a video. This script generates two slightly different versions, one at a high quality/less compression setting (-qmax 3) and another at a slightly less high (but still high) quality setting (-qmax 4).

/bin/rm -f
/bin/rm -f

# now we can finally assemble the video.
ffmpeg -f image2 -i ffr-%05d.jpg -qmax 3
ffmpeg -f image2 -i ffr-%05d.jpg -qmax 4

These scripts have used a consistent naming convention for each step of the process, so that one step does not interfere with prior steps. For example, you could experiment with the interpolation/morphing step many times while leaving the alignment work untouched. The convention is,
  • files starting with 'fr' are the resized copies of the original photographs.
  • files starting with 'algn' are the aligned versions of the frames, which could be untouched copies of the 'fr' file, automatically aligned versions, or manually aligned versions.
  • files starting with 'smalgn' are the title sequence followed by the enhanced and trimmed frames.
  • files starting with 'ffr' are the final frames, real plus interpolated.

Video Services

Any video service provider translates the videos that are uploaded into a particular encoding scheme or schemes, and allows a certain bandwidth for replaying them. As I documented in the ZoomWalk post, it was necessary to alter those videos (embedding a smaller video in a larger black image) to obtain decent reproduction from YouTube; the gambit caused YouTube to use the HD (High Definition) settings for a video, but the videos had a modest (for HD) bandwidth requirement because of the static black background.

The ZoomWalk videos are handicapped, in a sense, because they contain more change per frame than a typical video. If I look at some of the earlier videos shot as videos by my camera, and adjust for image size, using the same encoding scheme (mp4/mov), the regular videos require roughly 3 MB of uploaded file for each second of playback time, while the ZoomWalks consume about 6¼ MB/second. The Griggs Dam video was taken by my camera at 1280x720 in AVCHD Lite format, which is considered a lossy compressing format, but so is mp4, for which I used high-quality settings to compensate. The AVCHD Lite video used just under 2 MB/second.

Given my dissatisfaction with the embedded-in-black trick for YouTube -- it creates an odd appearance in the blog -- I decided to experiment with Vimeo, an alternative video hosting service. The experience was similar to using YouTube until I upgraded to a paid "Plus" membership, qualifying me for better playback. Here are the three ZoomWalk videos, hosted by Vimeo, in widescreen (16:9 aspect ratio) and in a size that works well embedded in a blog post, without the distracting black background trick.

The companies offering video services are in competition, and will occasionally leapfrog each other in technology or service. YouTube recently bought a video enhancement company (Green Parrot), so in six months or a year my choice might change again.

These three compositions were intriguing and very instructive for me, and I hope to create better ZoomWalks, both technically and artistically. I have enjoyed sharing them with you (although I can do without blogger trying to eat them).
