ZLIB Optimization

ZLIB is used by famous gzip utility for compression and decompression of any data. It consists of two main algorithms inflate and deflate. The inflate algorithm (for decompression) is implemented in files src/inflate.c which in turn calls a heavily optimized routine inflast_fast() implemented in external/zlib/src/inffast.c. The code comments in inffast.c mention a lot of suggested optimizations which have actually turned out to be slower than the existing code.

Possible areas for optimization

The idea is to find any small loops (as inner as possible) in inflate_fast() implementation. There are two places where there is byte by byte copy operation ongoing in a loop namely:

while (len > 2) {
    PUP(out) = PUP(from);
    PUP(out) = PUP(from);
    PUP(out) = PUP(from);
    len -= 3;
} 
if (len) {
    PUP(out) = PUP(from);
    if (len > 1)
        PUP(out) = PUP(from);
}

and

do { /* minimum length is three */
    PUP(out) = PUP(from);
    PUP(out) = PUP(from);
    PUP(out) = PUP(from);
    len -= 3;
} while (len > 2);
if (len) {
    PUP(out) = PUP(from);
    if (len > 1)
        PUP(out) = PUP(from);
}

On a first look they seem similar to memcpy/memmove, but it seems there are some side effects of replacing these with memcpy/memmove (leads to crashes). To overcome the side effects a condition memcpy is used based on the check whether "from" and "out" overlap or not. This is the patch which has been tested at our end. After applying this patch, the modified zlib is used to compile the gzip program. Using this 'gzip' program for compressing and decompressing various files gives mixed results with improvement of around 20-30% in some cases. Next the tests provided by ZLIB distribution were used.

Tests and Performance Figures

All the tests described below are run on Android KitKat on a platform with ARMv7 dual core 1.2GHz.

ZLIB distribution also provides a simple test program in external/zlib/src/test/example.c The tests provided here are of two types:

  • Tests which include file i/o of gzipped files
  • Tests which compress/decompress in memory data

For in memory data the memory used by both input and output buffers is around 40KB and the test fills it with a pattern and tries to compress and decompress it and verifies that the original data is recovered. Since the data size was small, a loop of 1000 times was used to compress and decompress this data and the time taken during the decompression phase was added up. The performance figures generated indicate a very good improvement with the proposed patch. The function clock_gettime() which is supposed to have a nanoseconds accuracy was used to measure timing.

The performance data (in nanoseconds, based on data size 40 * 1000 KB) is as follows:

Iteration

Without Optimization

With Optimization

1

566,912,909

499,453,583

2

541,944,012

498,269,533

3

547,286,352

504,166,203

4

547,140,137

502,552,295

5

557,788,897

498,208,390

6

540,883,863

508,257,004

7

566,540,488

515,776,990

8

554,876,496

505,078,713

9

549,241,562

514,751,947

10

544,583,244

495,904,585

11

567,164,350

494,194,112

12

549,071,289

513,422,074

13

572,865,716

494,835,810

14

544,081,989

495,156,675

15

544,479,517

492,700,283

which gives a performance benefit of 9.1%. However this is based on a data with specific pattern used in the test program. On using some random streams the results are mixed and half the time there is a performance improvement of around 20-30%. The performance data based on these tests is as follows:

Sample

Size (Bytes)

Without Patch

With Patch

Gain (%)

Sample01.gz

27,111,698

5,296,602,106

3,626,360,271

31.5

Sample02.gz

3,836,176

820,564,101

1,055,639,068

-28.6

Sample03.gz

18,378,033

6,482,008,941

4,424,539,739

31.7

Sample04.gz

6,943,249

6,718,112,475

4,697,512,172

30.0

Sample05.gz

169,712,534

142,053,476,569

145,627,166,007

-2.5

Sample06.gz

7,166,666

9,348,545,144

7,495,270,076

19.8

Sample07.gz

12,112,692

14,484,019,084

17,393,463,820

-20.0

Sample08.gz

9,510,643

12,338,618,848

13,953,473,050

-13.0

Sample09.gz

15,881,887

14,275,170,084

17,882,978,188

-25.2

Sample10.gz

5,966,757

4,015,208,971

4,815,521,406

-19.9

Sample11.gz

2,912,761

3,561,491,704

2,481,505,003

30.3

Sample12.gz

7,864,488

12,449,199,915

18,425,318,355

-48.0

Sample13.gz

6,885,443

10,889,146,413

4,656,129,539

57.2

Sample14.gz

11,279,920

26,913,262,266

19,873,775,857

26.1

Sample15.gz

2,020,126

5,037,869,006

5,379,250,807

-6.7

Android_Zlib_Optimizations (last modified 2014-07-17 10:06:05)