Quick and dirty web image optimization
Given a large pile of images that nominally live on a web server, I want to make them smaller and more friendly to serve to clients. This is hardly novel: for example, Google offer detailed advice on reducing the size of images for the web. I have mostly JPEG and PNG images, so jpegtran and optipng are the tools of choice for bulk lossless compression.
To locate and compress images, I'll use GNU find and parallel to invoke those tools. For JPEGs I take a simple approach, preserving comment tags and creating a progressive JPEG (which can be displayed at a reduced resolution before the entire image has been downloaded).
find -iname '*.jpg' -printf '%P,%s\n' \
    -exec jpegtran -o -progressive -copy comments -outfile {} {} \; \
    > delta.csv
Where things get a little more interesting is when I output the name and size of each located file (-printf ...) and store those in a file (> delta.csv).[1] This is so I can collect more information about the changes that were made. Since -printf is evaluated before -exec for each file, the size recorded in the CSV is the original size, from before jpegtran rewrites the file.
Check the JPEGs
Loading up a Jupyter notebook for some quick Python-driven computation, I read in the file names and original sizes, then add the current size of each listed file to my dataset.
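The notebook cell itself isn't reproduced here; a minimal sketch of that step, assuming the CSV layout produced by the find command above (relative path, original size, no header row) and that the notebook runs from the directory where find was run, could look like this:

```python
import csv
import os

# Each row of delta.csv is "relative path,original size in bytes",
# written by find's -printf before jpegtran rewrote the file.
with open('delta.csv', newline='') as f:
    rows = [(name, int(size)) for name, size in csv.reader(f)]

# Add the current (post-rewrite) size of each file to the dataset.
data = [(name, orig_size, os.path.getsize(name))
        for name, orig_size in rows]
```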
From there, it’s quick work to do some list comprehensions and generate some overall statistics:
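Again, only a sketch rather than the original cell: continuing from the hypothetical data list above, the headline numbers quoted below reduce to a few comprehensions and sums.

```python
# Totals before and after, plus the overall savings.
total_before = sum(orig for _, orig, _ in data)
total_after = sum(new for _, _, new in data)
saved = total_before - total_after

print(f'{len(data)} files, saved {saved} bytes '
      f'({saved / 2**20:.1f} MiB, {saved / total_before:.2%} of original size)')
```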
This is by no means good code, but that’s why this short write-up is “quick and dirty”. Over the JPEGs alone, I rewrote 2162 files, saving 8820474 bytes overall; about 8.4 MiB, 8.02% of the original size of all of the files, a respectable savings for exactly zero loss in quality.
PNGs
I processed the PNGs in a similar fashion, having optipng emit interlaced PNGs (which can be displayed at reduced resolution without complete data) and try more combinations of compression parameters than it usually would, trading CPU time for possible gains. There was no similar tradeoff with the JPEGs, since apparently jpegtran's optimization of encoding is entirely deterministic, whereas optipng relies on experimental compression of image data with varying parameters to find a minimum size.
Since I observed that many of the PNGs were already optimally-sized and interlacing made them significantly larger (by more than 10% in many cases), I also considered only files that were more than 256 KiB in size. To speed up processing overall, I used parallel to run multiple instances of optipng at once (since it's a single-threaded program) to better utilize the multicore processors at my disposal.
find -iname '*.png' -size +256k -printf '%P,%s\n' \
    -exec parallel optipng -i 1 -o 9 ::: {} + \
    > delta.csv
Running the same analysis over this output, I found that interlacing had a significant negative effect on image size. There were 68 inputs larger than 256 KiB, and the size of all of the files increased by 2316732 bytes; 2.2 MiB, nearly 10% of the original size.
A few files had significant size reductions (about 300 KiB in the best case), but the largest increase in size had similar magnitude. Given the overall size increase, the distribution of changes must be skewed towards net increases.
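That skew is easy to check directly from the per-file deltas. As a sketch (again not the original notebook code, reusing the hypothetical data list from above, reloaded from the PNG run's delta.csv):

```python
# Per-file change in bytes: negative means the file shrank.
deltas = sorted(new - orig for _, orig, new in data)

shrank = sum(1 for d in deltas if d < 0)
grew = sum(1 for d in deltas if d > 0)

print(f'best: {deltas[0]} bytes, worst: {deltas[-1]} bytes')
print(f'{shrank} files shrank, {grew} grew, {len(deltas) - shrank - grew} unchanged')
```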
Try again
Assuming most of the original images were not interlaced (most programs that write PNG don't interlace unless specifically told to) and recognizing that interlaced PNGs tend to compress worse, I ran this again but without interlacing (-i 0) and selecting all files regardless of size.
The results of this second run were much better: over 3102 files, I saved 12719421 bytes (12.1 MiB), 15.9% of the original combined size. One file saw a whopping 98% reduction in size, from 234 KB to only 2914 bytes. Inspecting that one myself, I found the original was inefficiently coded at 32 bits per pixel (8-bit RGBA), and it was reduced to two bits per pixel. I expect a number of other files had similar but less dramatic transformations. Some were not shrunk at all, but optipng is smart enough to skip rewriting those, so it will never make a file larger; the first run was an exception because I asked it to make interlaced images.
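For what it's worth, that kind of inspection doesn't need any image library: the bit depth, color type and interlace flag sit at fixed offsets in a PNG's IHDR chunk, which is always the first chunk in the file. A small sketch (the file name is just a placeholder):

```python
import struct

def png_header_info(path):
    """Read width, height, bit depth, color type and interlace flag from a PNG."""
    with open(path, 'rb') as f:
        header = f.read(29)  # 8-byte signature + IHDR length/type + 13 data bytes
    # IHDR data: width, height, bit depth, color type, compression, filter, interlace.
    width, height, depth, color, _, _, interlace = struct.unpack('>IIBBBBB', header[16:29])
    return width, height, depth, color, interlace

# Color type 6 is RGBA, 3 is palette; depth is bits per channel (or per palette index).
print(png_header_info('example.png'))
```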
That’s all
I saved about 20 MiB for around 5000 files, not bad. A copy of the notebook I used to do the measurements is available (check out nbviewer for an online viewer), which may be useful if you want to do something similar with your own images. I would not recommend doing it to images that are not meant purely for web viewing, since the optimization process may strip metadata that is useful to preserve.
[1] Assumption: none of the file names contain commas, since I'm calling this a CSV (comma-separated values) file. It's true in this instance, but may not be in others.