[Back to Computer Security] [CTT Home Page] [Part 2]

-----------------------------------------------------------------

File Compression

DOS File Compression Tests
Macintosh File Compression Tests
Conclusions
Disclaimer

File compression is a technique in which a program is used to "compress" or "shrink" other files. At its simplest, file compression means replacing runs of repeated characters with short codes that stand for the entire run. For example, a simple file compression system might replace the following: ==================== with a two or three character code which means "20 equals signs." Newer file compression routines, under favourable circumstances, can achieve compression ratios where the final file is only 5% of the size of the original.
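The "20 equals signs" idea is usually called run-length encoding. The Python sketch below is only an illustration of that simplest case; the function names are made up for this example, and none of the utilities reviewed here is claimed to work this way internally.

```python
def rle_encode(text):
    """Collapse each run of identical characters into a (character, count) pair."""
    runs = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            # Same character as the previous run: extend that run by one.
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            # A different character starts a new run.
            runs.append((ch, 1))
    return runs


def rle_decode(runs):
    """Rebuild the original text from (character, count) pairs."""
    return "".join(ch * count for ch, count in runs)


line = "=" * 20
encoded = rle_encode(line)
print(encoded)                      # [('=', 20)]
print(rle_decode(encoded) == line)  # True
```

Text with few repeated runs gains nothing from this scheme, which is one reason real archivers rely on more elaborate methods.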

The uses of file compression include making more room for programs or files on your hard disk, backing up large files onto diskettes, distributing software on diskettes, and, most commonly, sending or receiving files via modem. The first file compression routines were intended to save money sending files over long-distance calls, but even on a local call, sending a file that is 1/4 or less the size of the original can make life much easier, since you need not tie up your phone for such long periods of time.

There are many file compression utilities available these days. Most of them are incompatible with each other, but the file extension will usually give you a hint which software to use:

In the DOS world, PKZIP and INFO-ZIP create files with an extension of .ZIP, ARC and PKARC default to .ARC, PAK uses .PAK (although it can compress and decompress .ARC and .ZIP files), LArc creates .LZS files, LHA defaults to .LZH, RAR creates .RAR files, SQZ creates .SQZ files, and ZOO, naturally enough, uses .ZOO. Many utilities have the ability to create self-extracting archives, which will usually have the extension .EXE (some very small archives may use .COM.) Some file compression programs, like ARC, ZIP, and ZOO, are available for several operating systems. If you are using some operating system other than DOS, it may be possible to decompress a self-extracting archive by using your operating system's version of the utility which created it. (Thus, a Macintosh .ZIP utility should be able to decompress an .EXE archive created by PKZIP.)

In the Macintosh world, Compact Pro uses .CPT, MacLHA uses .LZH files, ZipIt and several other Mac utilities use .ZIP, and StuffIt, the reigning Macintosh file compression standard, uses .SIT (although it can also compress and decompress .ZIP and several other formats). Self-extracting archives in the Macintosh world usually have an extension of .SEA, for Self-Extracting Archive. Aladdin Systems has released a Windows version of StuffIt, which can compress and decompress .SIT files.
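The extension hints in the two paragraphs above can be collected into a small lookup table. This is a rough sketch: the table holds only the pairings mentioned in the text, and the names EXTENSION_HINTS and suggest_utilities are invented for the example.

```python
import os

# Extension -> utilities named in the text as handling that format.
EXTENSION_HINTS = {
    ".zip": ["PKZIP", "INFO-ZIP", "PAK", "ZipIt", "StuffIt"],
    ".arc": ["ARC", "PKARC", "PAK"],
    ".pak": ["PAK"],
    ".lzs": ["LArc"],
    ".lzh": ["LHA", "MacLHA"],
    ".rar": ["RAR"],
    ".sqz": ["SQZ"],
    ".zoo": ["ZOO"],
    ".cpt": ["Compact Pro"],
    ".sit": ["StuffIt"],
}


def suggest_utilities(filename):
    """Return the utilities the text associates with this file's extension."""
    extension = os.path.splitext(filename)[1].lower()
    return EXTENSION_HINTS.get(extension, [])


print(suggest_utilities("NODELIST.LZH"))
print(suggest_utilities("README.TXT"))
```

Self-extracting archives (.EXE, .COM, .SEA) are left out on purpose, because the extension alone does not say which utility created them.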

Some file compression utilities are only available as commercial software, but most of them have some sort of shareware or demo version which you can find on many BBSs, in the SimTel ftp archives, on commercial online services, and of course on the web through a site such as http://www.shareware.com.

DOS File Compression Tests

I did a test using the various file compression utilities I could get my hands on. Here are the results of the tests. I used eight test conditions: small, medium, large, and very large text files, small, medium, and large binary files, and a typical shareware archive. Except for the very large text file, which was a little over three megabytes (I used the FidoNet nodelist, the largest text file I could find), each of the test archives fit onto a 1.44 Meg floppy disk before being compressed. In this way, I sought to give equal weight to all types of files to be compressed, whether programs or text and data files, and whether large, medium, or small. Some programs perform better under certain conditions than others do, so by testing under several conditions, a better picture of overall performance is possible.

Software Tested

The following programs were used in the tests:

ARC Version 6.02
Registration: $50.00 US.
System Enhancement Associates
21 New Street
Wayne, NJ, 07470, USA

ARC is the oldest of the programs tested. It was first released in 1985, and for many years, the ARC format was the standard for file compression on microcomputers. To this day, the default compression routine used in FidoNet is ARC. For that reason, it's important to have a copy of ARC (or a compatible like PAK) available, even though newer compression routines are much more efficient. There are programs which can handle ARC-format files for almost every operating system, from the Apple II on up. SEA have released a newer version of ARC, which packs more tightly than 6.x, but I could not get my hands on a copy for testing, because it's not available as shareware. ARC 7.x can read ARC files from older versions, and compatibles, but they cannot read ARC 7.x files unless they are created in the older format. Ironically, in most of the tests, PAK (in ARC-compatible mode) created smaller files than ARC did, even though they were ARC format files.

ARJ Version 2.41a
Registration: $40.00 US.
ARJ Software
Robert Jung
2606 Village Road West
Norwood, MA, 02062, USA
CompuServe: 72077,445

ARJ was first released in 1990, and has proved to be among the top tier of file compression programs. It placed fourth overall in my data compression tests. (That's better than it sounds. The top five programs were all within a percentage point of one another.) In two of the eight conditions, it compressed better than PKZIP 2.04g in maximum compression mode, but it did not finish first in any of the test conditions. Since February, 1993, ARJ has been the official compression format for the Shareware Distribution Network (SDN) and compatible networks. There are two basic drawbacks to ARJ. First, it's only available for MS-DOS. As far as I know, there are no ARJ-compatible archivers for other operating systems, like Macintosh, Amiga, Atari, or UNIX. This limits the usefulness of ARJ to those who are running DOS, Windows, OS/2, or a "DOS emulator" under another operating system. Second, ARJ files can contain "install" programs which will run automatically when the file is extracted. This leaves open the possibility for virus or "trojan horse" attacks, and thus ARJ is not particularly beloved by the anti-virus community.

INFO-ZIP
Free
Mark Adler, Richard B. Wales, Jeanloup Gailly, Kai Uwe Rommel, Igor Mandrichenko and John Bush

INFO-ZIP is a public domain file compression utility compatible with PKZIP. There are versions for DOS, Amiga, Atari, Macintosh, OS/2, UNIX, VMS, and Windows NT. ZIP, UNZIP, and the source code for each version are freely available on many ftp sites. INFO-ZIP enjoys all the advantages of being compatible with PKZIP, which has been the de facto compression standard the past few years, and has the additional advantage of price. (You can't beat free!) If you are currently using an unregistered copy of PKZIP, and honestly can't afford the $47.00 to register it, you should switch to INFO-ZIP so you can have a clear conscience. PKZIP still has some features INFO-ZIP doesn't (for instance, INFO-ZIP will not create a .ZIP file which spans multiple floppies), but for most uses, INFO-ZIP is just fine.

LArc Version 3.33
Free
Kazuhiko Miki, Haruhiko Okumura, and Ken Masuyama

LArc first came out in 1988. It doesn't compress very well, although it does do better than ARC or ZOO. Nor is it fast, coming in dead last (by a long time) in my speed tests. Still, it's free, and if anyone ever sends you a file with an extension of LZS, you'll know what to do with it. LArc also has an odd quirk of creating archives where the date and time stamp on the archive will match that of the newest file in the archive. That can be useful if you're repackaging old shareware archives, and want to be able to tell from the date stamp on the archive how old the program inside is, but in many applications, such a practice is confusing at best.

LHA Version 2.13
Free
Haruyasu Yoshizaki

LHA was first released in 1988, and has been widely distributed. It was widely used up to a few years ago, before PKZIP 2.x came out and proved to be superior in terms of compression as well as speed. PKZIP 1.x and LHA were very close in terms of compression capability, although LHA was much slower. PKZIP 2.x produces significantly smaller archives and also works faster. I have not been able to find a more recent version of LHA than 2.13, and it may be that the author has moved on to other projects. Even so, it would probably be a good idea to obtain a copy if you can, just in case somebody sends you a file compressed with LHA. LHA creates files with a default extension of LZH. No other program I tested in this batch can create or uncompress LZH files. There is LZH-compatible software available for Amiga and Atari ST/Falcon computers.

PAK Version 2.51
Registration: $15.00 US. (Full screen version: $30.00)
NoGate Consulting
P.O. Box 88115
Grand Rapids, MI, 49518-0115, USA
Fax: 1-616-455-8491
Data: 1-616-455-5179
Telephone: 1-616-455-6270

PAK appears not to have been upgraded since the release of PKZIP 2.x. Before that time, it was competitive, but not quite as good at compressing as PKZIP or LHarc. PAK does have two advantages. First, it can compress and decompress ARC format and ZIP 1.x format files, as well as its native PAK format. (Ironically, PAK produces smaller archives than ARC 6.x does when it is run in ARC-compatible mode, even though ARC can extract the files created by PAK. In another irony, in some tests, PAK created a smaller archive in ZIP 1.x compatible mode than it did in its native mode.) Second, it has a full-screen version as well as a command line version. (At this time, RAR is the only other program which has full-screen capability, although there are several "shell" programs which provide full-screen control for most archivers available.) For several years, PAK was the official compression format for the Shareware Distribution Network (SDN) and compatible networks. Since PAK packs tighter than ARC 6.x, even in ARC-compatible format, and can zip and unzip ZIP 1.x format files, it can be a good utility if you are hard up for disk space, and need to stick to one program. However, you would need to inform anyone sending you ZIP files to use version 1.x of PKZIP to create files to send to you.


PKZIP Version 2.04g
Registration: $47.00 US.
Phil Katz
PKWare, Inc.
9025 North Deerwood Drive
Brown Deer, WI, 53223, USA
Fax: 1-414-354-8559
Data: 1-414-354-8670
Telephone: 1-414-354-8699
CompuServe: 75300,730

PKZIP and the ZIP format have been the de facto compression standard for several years now. The ZIP file format has been released to the public domain, so that anyone with the technical ability can create a ZIP-compatible program. Thus, there are many available, not only for DOS (like INFO-ZIP and PAK), but for Amiga, Atari, Macintosh, OS/2, UNIX, and Windows. Despite a rash of newer programs, PKZIP has managed to hold its own, coming in third in my tests, less than half a percentage point behind RAR, and less than a tenth of a percentage point behind SQZ. (In fact, if I didn't include the TEXT1 and PROG1 conditions, which were small text and binary files under 100K and 35K respectively, then PKZIP would have come out ahead of SQZ, which did best on very small files.)

RAR 1.55
Registration: $35.00 US.
Eugene Roshal
INFO-SHARE
PL 97
02101 Espoo, Finland
Fax: 35-80298-3308
Data: 35-80506-2622
Telephone: 35-8029-83307
FidoNet: 2:220/22

RAR is a new program, first released in 1994 by Russian programmer Eugene Roshal. It came out tops in my compression tests, and was fairly competitive (second place) in my speed tests. Currently, RAR is only available for DOS and OS/2, but the author promises versions for other operating systems are on the way. (It appears that a UNIX version might be next.) Aside from PAK, RAR is the only compression program to have a full-screen option. (In fact, even when you run RAR in command line mode, the full screen pops up, and then exits when the program is finished processing your commands.) RAR also offers the ability to manage archives created by other programs, like PKZIP or LHA, by running them from RAR as if it were a compression utility shell program. However, like ARJ, RAR can run an "install" program automatically when you extract an archive, and this opens the possibility for virus or "trojan horse" attacks.

SQZ Version 1.08.3
Registration: 150SEK ($30.00 US)
Jonas I Hammarberg
Pl 529. St. Harrie 10:2
S-244 91 Kaevlinge
Sweden
Telephone: 46-46-730-088
FidoNet: 2:200/107.24

SQZ is another relatively new program, released in 1993. The author promises versions for Amiga, Macintosh, OS/2, UNIX, VMS, and Windows. It did very well on the compression tests, coming out on top in small and medium text files and the shareware archive. However, it was quite slow compared to the other top compression performers, coming in fourth slowest in the entire field. Thus, in an application where speed is important, you may want to consider a different choice.

ZOO Version 2.10
Free
Rahul Dhesi

ZOO doesn't compress files as well as most of the other programs tested (although it beats ARC), nor does it work very fast, but it does have the advantage of being available for several different operating systems: DOS, Amiga, UNIX, and VMS. In fact, before the development of a ZIP archiver for the Amiga, it was one of the few archive formats available on both machines.

Files to be Compressed

TEXT1 File (small text files.) 224 files. 936781 bytes in total.
TEXT2 File (medium text files.) 64 files. 1438253 bytes in total.
TEXT3 File (large text files.) 8 files. 1440456 bytes in total.
TEXT4 File (very large text file.) 1 File. 3328651 bytes in total.
PROG1 File (small binary files.) 90 files. 1407575 bytes in total.
PROG2 File (medium binary files.) 9 files. 1409830 bytes in total.
PROG3 File (large binary file.) 1 File. 1249382 bytes in total.
PROG4 File (typical shareware distribution archive.) 27 files. 1120547 bytes in total.

Total: 8 test archives. 12331027 bytes in total.

Test Procedures

The testing was done on a 386 DX 40 with 6 megabytes of RAM and no math co-processor. No RAM-resident programs or drivers were loaded in memory during the tests. Each test was run through a batch file which ran the archiver through each of the test archives and put time stamps before and after each test. Each collection of test files was located in a separate subdirectory (text1, text2, etc.) After the batch file was run, the time stamps and size of each archived file were compared to the originals.
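The same before-and-after timing idea can be sketched in Python. This is not the batch file used for the tests: as an assumption made only for illustration, the compressors from Python's standard library (zlib, bz2, lzma) stand in for the DOS archivers, and the sample data is invented.

```python
import bz2
import lzma
import time
import zlib

# Stand-in compressors; the DOS archivers themselves are not called here.
COMPRESSORS = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def time_compressor(compress, data):
    """Take a time stamp before and after compressing; return (size, seconds)."""
    start = time.perf_counter()
    packed = compress(data)
    elapsed = time.perf_counter() - start
    return len(packed), elapsed


def run_tests(data, compressors):
    """Run every compressor over the same data and collect the results."""
    return {name: time_compressor(fn, data) for name, fn in compressors.items()}


sample = b"The quick brown fox jumps over the lazy dog.\n" * 2000
for name, (size, seconds) in run_tests(sample, COMPRESSORS).items():
    print(f"{name:5s} {size:8d} bytes  {seconds:.4f} s")
```

Timings from a run like this depend heavily on the machine and the data, so they are comparable only with each other.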

Results

The first priority in file compression is the final (archived) file size. The final sizes for each of the test conditions are listed below, along with the total size of the files.

Program                     Archive size (bytes)   Percentage of original size
All eight test conditions   12331515               100
RAR (best compression)      5506983                44.7
SQZ                         5556064                45.1
PKZIP (best compression)    5565635                45.1
ARJ                         5596023                45.4
INFO-ZIP                    5604031                45.4
LHA                         5715489                46.3
RAR (best speed)            5723361                46.4
PAK                         5996867                48.6
PAK (ZIP 1.x format)        6067240                49.2
PKZIP (best speed)          6137121                49.8
LArc                        6832143                55.4
ZOO                         7298409                59.2
PAK (ARC format)            7474051                60.6
ARC                         7555402                61.3
Average                     6256540                50.7
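The percentage column is simply the archive size divided by the original size. The short Python sketch below redoes that arithmetic for a few rows, using the figures from the table; the names are invented for the example.

```python
ORIGINAL_SIZE = 12331515  # "All eight test conditions" row of the table

ARCHIVE_SIZES = {
    "RAR (best compression)": 5506983,
    "SQZ": 5556064,
    "PKZIP (best compression)": 5565635,
    "ARC": 7555402,
}


def percent_of_original(archive_size, original_size=ORIGINAL_SIZE):
    """Archive size as a percentage of the original size, to one decimal place."""
    return round(100 * archive_size / original_size, 1)


for program, size in ARCHIVE_SIZES.items():
    print(f"{program:26s} {percent_of_original(size):5.1f}")

# Gap between PKZIP and RAR in percentage points, before rounding.
gap = 100 * (ARCHIVE_SIZES["PKZIP (best compression)"]
             - ARCHIVE_SIZES["RAR (best compression)"]) / ORIGINAL_SIZE
print(f"PKZIP trails RAR by {gap:.2f} percentage points")
```

Working from the unrounded sizes shows how close the leaders are, even where the rounded percentages look identical.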

A second consideration, at least in some applications (like backing up a hard disk onto floppy disks or using an offline mail reader), is the speed of compression. The speed of compression for the eight test conditions is listed below.

Program                     Compression time, all eight test conditions (min:sec.100ths)
PKZIP (best speed)          2:47.97
RAR (best speed)            4:14.69
PAK (ARC format)            4:24.19
ARC                         5:02.36
PKZIP (best compression)    6:00.42
ARJ                         7:01.39
INFO-ZIP                    7:03.75
RAR (best compression)      8:01.86
SQZ                         8:19.71
LHA                         8:34.55
PAK (ZIP format)            9:27.76
PAK                         9:32.32
ZOO                         10:12.91
LArc                        16:20.35
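To compare two of these timings, it helps to convert the min:sec.100ths notation into seconds first. The Python sketch below does that for three entries from the table; the helper names are invented for the example.

```python
TIMES = {
    "PKZIP (best compression)": "6:00.42",
    "RAR (best compression)": "8:01.86",
    "SQZ": "8:19.71",
}


def to_seconds(stamp):
    """Convert a 'min:sec.100ths' string such as '6:00.42' into seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


def time_saved(faster, slower):
    """Fraction of the slower program's time that the faster program saves."""
    return 1 - to_seconds(TIMES[faster]) / to_seconds(TIMES[slower])


for rival in ("RAR (best compression)", "SQZ"):
    saving = time_saved("PKZIP (best compression)", rival)
    print(f"PKZIP (best compression) needs {saving:.0%} less time than {rival}")
```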

Macintosh File Compression Tests

There are several file compression utilities available for Macintosh. The de facto standard is StuffIt, which is available in several versions: StuffIt Deluxe, a shareware version called StuffIt Lite, and two "drag and drop" utilities called StuffIt Expander (which is free) and DropStuff (which is shareware.) The next most common Mac-only compression utility is Compact Pro. There are also at least three Macintosh programs which use the ZIP compression algorithms, of which I tested ZipIt. And finally, there is MacLHA.

Software Tested

The following programs were used in the tests. Addresses of the software developers are given where known.

Compact Pro 1.51
Registration: $25.00 US
Cyclos-CP
P.O. Box 31417
San Francisco, CA 94131-0417 USA
Fax: 1-415-821-1168
CompuServe: 71101,204
URL: www.cyclos.com

DropStuff with Expander Enhancer 4.5
Registration: $30.00 US
StuffIt Lite 3.6
Registration: $30.00 US
Aladdin Systems
165 Westridge Drive
Watsonville, CA 95076, USA
Fax: 1-831-761-6206
Telephone: 1-831-761-6200
URL: www.aladdinsys.com

MacLHA 2.21
Freeware
Kazuaki Ishizaki
URL: www.vector.co.jp/authors/VA008909/

ZipIt 1.3.8
Registration: $15.00 US
Tom Brown
110-45 Queens Blvd. Apt. 716
Forest Hills, NY 11375 USA
URL: www.awa.com/softlock/zipit/zipit.html

Test Procedures

The testing was done on a Macintosh LC 575 with 24 megabytes of RAM running System 8.0. Each folder was compressed with each program (except in the case of MAIL.ZIP and REPLY.ZIP, in which case only the two text files were compressed, and not the enclosing folder.) The final size of the file was then compared with the original. Due to the difference in sector sizes, the amount of disk space used by a given file will depend on the total size of the disk. Thus, size differences which are less than a few K may not matter on a large disk.
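The effect of allocation size on disk usage can be illustrated with a little arithmetic. In the Python sketch below, the block sizes and the file size are hypothetical values chosen for illustration, not measurements from the test machine, and the helper name is invented.

```python
def space_on_disk(file_size, block_size):
    """Round a file's size up to a whole number of allocation blocks."""
    blocks = -(-file_size // block_size)  # ceiling division with integers
    return blocks * block_size


# Hypothetical block sizes: larger disks typically use larger blocks.
for block_size in (512, 4096, 32768):
    used = space_on_disk(10_000, block_size)
    print(f"block size {block_size:6d}: a 10,000-byte file occupies {used} bytes")
```

With large blocks, two archives that differ by a few kilobytes can end up occupying exactly the same amount of disk space.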

I used twelve test folders: the first four were four randomly-selected folders in my data hierarchy, as they were at the time of last week's backup run. The second four folders were roughly two megabytes of files randomly selected from each of my Control Panels, Extensions, and Fonts folders, plus two megabytes of small applications such as Disinfectant. The last four conditions consisted of my address books from Claris Emailer and MacSoup, the contents of a SOUP mail packet, the contents of a SOUP reply packet, and the beta version of Lynx for Macintosh, which I use for testing my Web sites, along with all of its assorted documentation and support files. Between the 12 test cases, I was trying to test each file compression utility for its effectiveness on data files, application files, and the kinds of files which might be exchanged during a typical online session.

Results

The final sizes for all twelve of the test conditions are listed below, along with the total size of the files.

Program                      Archive size (bytes)   Percentage of original size
All twelve test conditions   18158376               100
ZipIt                        7622475                41.98
DropStuff                    7904381                43.53
StuffIt Lite                 7908093                43.55
MacLHA                       8035399                44.25
Compact Pro                  8379903                46.15
Average                      7970050                43.89

Conclusions

In the DOS tests, the top five programs in terms of compression were, in order, RAR, SQZ, PKZIP, ARJ, and INFO-ZIP. They were all within a percentage point, and the first three were within half a percentage point. All of the first three were tops in at least one test condition. Although RAR came out on top in the total compression capability in these particular tests, the difference between them is so slight as to be insignificant, since the nature of the files to be compressed has a significant bearing on which one comes out ahead. The next rank of programs was LHA, PAK, and the high speed settings of RAR and PKZIP. LHA actually managed to place second in small text files and third in small program files. The least effective programs were LArc, ZOO, and ARC. (ARC placed dead last in six of the eight test conditions.)

In the Macintosh tests, all five programs tested were within five percentage points of one another. The surprise for me was that ZipIt showed a significant advantage over both StuffIt versions I tested in every single case except two (CTT-Web and REPLY.ZIP, in which DropStuff showed a slight edge.) This surprised me, because StuffIt, in its various forms, is far and away the dominant file compression utility for Macintosh, and very nearly every Mac user at least has a copy of StuffIt Expander on their desktop. Compact Pro, which is the oldest of the programs tested, showed the biggest gap in compression performance, but still fares respectably well.

The first time I ran the DOS tests, there was a significant tradeoff between speed and compression performance between all programs. The programs which did well in terms of compression were not very fast, with the exception of PKZIP, which compressed well and was fast, too. The general rule still holds true, but not as firmly as it used to. RAR's high compression setting is still a little faster than SQZ (which isn't saying much.) ARC did manage to redeem itself a little in terms of speed, ranking fourth after PAK. But PKZIP's high speed setting still led the pack, and PKZIP's high compression setting is about 25% faster than RAR's high compression setting or SQZ. In an application where speed is critical (like an offline mail reader where you are calling long distance), PKZIP seems to offer the best compromise between compression ratio and speed. Unfortunately, I was not able to record precise times for the Macintosh tests, but subjectively, it seems like the tradeoff between speed and compression exists there, too.

In terms of compatibility across computing platforms, .ARC, being the oldest standard, is probably the most widely supported, with compatible products on virtually every microcomputer made. There are programs compatible with LZH for the Amiga, Atari, DOS, and Macintosh. RAR is available for DOS and OS/2. (There may be other ports of RAR of which I have not yet heard.) StuffIt is available for Macintosh and Windows, and Aladdin has released a DOS utility which can uncompress .SIT 1.5 archives. Compact Pro is unique to the Macintosh. ZIP-compatible software is available for Amiga, Atari, DOS, Macintosh, OS/2, UNIX, VMS, and Windows. (Note that StuffIt Deluxe and StuffIt Expander both support the .ZIP format, and that PAK supports ZIP 1.x.) ZOO is available for Amiga, DOS, UNIX, and VMS. In the end, whether any one compression format becomes "standard" depends more on how quickly and smoothly it is ported to various operating systems than on the raw performance of the algorithms upon which it is based. It must also be said that some compression programs which are fairly slow under DOS seem to run much faster on computers with different processors. For the past few years, ZIP files seem to have been the most common, although if you are processing FidoNet nodelist files, you still need ARC or a compatible, and if you are processing files from SDN or a similar network, you will need PAK or ARJ.

It is unrealistic to compress each file several times and then keep whichever archive ends up being the smallest. Clearly, unless there are other considerations, use of any of the top-ranked programs should prove perfectly satisfactory. None of them can claim to be superior in every instance. If your most important consideration is saving disk space, then the program at the top of each result table should be considered, although which program comes out on top varies a little according to the kind of files being compressed. In fact, the top five DOS compression utilities and all the Macintosh utilities I've tested work quite well. If price is the bottom line, then go for one of the freebies (or all of them!) If you want to be able to share across operating systems, then a utility which supports the .ZIP format is your best bet. But whatever utility you choose for your own compression use, you might wish to keep several handy to decompress files in formats other than your favourite.

Disclaimer

I do not work for any software developer, nor do I own stock in any software company. I have no financial interest in which software you choose to use.
