Freedom from Paper — At Last

photo by Rich Pasco

Until recently, my home office felt like the one pictured at the right. An information packrat, I was drowning in a sea of paper financial records, technical articles, and more.

Since then, I have reduced eight four-drawer filing cabinets of paper to a hard disk I can hold in the palm of my hand. Even if portability were not required, the savings in floor space, weight, and volume are awesome. Here I share the details with everyone. If you have any questions, feel free to write.

After scanning, I retain original documents with inherent value (e.g. stock and bond certificates, birth and death certificates, and deeds and titles) in my bank's safe deposit box. I dispose of those whose only value is their information content (e.g. bank statements, copies of tax returns, and technical articles). I shred confidential documents and recycle the rest.

Image Quality and File Format

For a target image quality of "xerographic copier quality" I set my scanner as follows:

  • Resolution: 300 pixels per inch (also called 300 dots per inch or DPI).
  • Depth: 1 bit/pixel (bi-level, black/white quantization)
  • File Format: Tagged Image File Format (TIFF) with CCITT Group 4 (G4) compression

With these settings, each scanned letter-size typed or printed page occupies about 100 kilobytes (KB), more or less, depending on the density of the information on it.
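To put the 100 KB figure in perspective, here is a small back-of-the-envelope sketch of what one page would occupy without compression. It assumes a US letter page of 8.5 by 11 inches; the function name is mine, made up for illustration.

```python
# Rough size estimate for one letter-size page scanned bi-level at 300 DPI.
# Assumption: US letter paper, 8.5 x 11 inches.
WIDTH_IN, HEIGHT_IN = 8.5, 11.0
DPI = 300
BITS_PER_PIXEL = 1


def raw_page_bytes(width_in=WIDTH_IN, height_in=HEIGHT_IN,
                   dpi=DPI, bits_per_pixel=BITS_PER_PIXEL):
    """Return the uncompressed size of one scanned page, in bytes."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


raw = raw_page_bytes()
compressed = 100_000  # about 100 KB per page, the figure quoted above

print(f"uncompressed page: {raw / 1e6:.2f} MB")
print(f"implied compression ratio: {raw / compressed:.1f} to 1")
```

In other words, an uncompressed bi-level page is a little over a megabyte, so G4 compression is saving roughly a factor of ten on a typical page.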

Storage

photo

A typical personal computer hard disk of today costs about $100, weighs a few ounces, fits in the palm of your hand, and stores a terabyte (TB) or two. A TB is a thousand gigabytes (GB), a GB is a thousand megabytes (MB), an MB is a thousand kilobytes (KB), and a KB is a thousand bytes. So today's $100 hard disk can store ten million or more pages scanned as above.

Each drawer of a filing cabinet can store about five thousand pages, fewer if they're divided into a lot of folders. So a four-drawer filing cabinet can store around 20,000 pages, or about 2 GB when scanned, and about a thousand filing cabinets (way more than shown in the photo above) would fit on a single Western Digital 2 TB My Passport shown at the right. My four filing cabinets are well under 10 GB.
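For anyone who wants to plug in their own numbers, the arithmetic above can be sketched as follows. The constants are the rough figures from this article (100 KB per page, 5,000 pages per drawer, four drawers per cabinet); the function names are just illustrative.

```python
# Storage arithmetic using decimal units (1 KB = 1,000 bytes).
KB = 1_000
GB = 1_000 ** 3
TB = 1_000 ** 4

PAGE_BYTES = 100 * KB          # rough size of one scanned page
PAGES_PER_DRAWER = 5_000       # rough capacity of one file drawer
DRAWERS_PER_CABINET = 4


def pages_per_disk(disk_bytes):
    """How many scanned pages fit on a disk of the given size."""
    return disk_bytes // PAGE_BYTES


def cabinet_bytes():
    """Bytes needed to hold one full four-drawer cabinet of scans."""
    return PAGES_PER_DRAWER * DRAWERS_PER_CABINET * PAGE_BYTES


def cabinets_per_disk(disk_bytes):
    """How many full cabinets of scans fit on a disk of the given size."""
    return disk_bytes // cabinet_bytes()


print("pages on a 1 TB disk:", pages_per_disk(1 * TB))
print("GB per cabinet:", cabinet_bytes() / GB)
print("cabinets on a 2 TB disk:", cabinets_per_disk(2 * TB))
```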

See also "Storage—Then and Now" for an amusing glimpse of history.

Scanning

photo

The next consideration is how to scan all those papers. I knew I would easily tire of opening and closing the lid on my flatbed scanner for each sheet, so I bought a high-speed duplex scanner with a stack loader. ("Duplex" means it scans both sides of the page at once.) I chose the Canon DR-2080C document scanner which retailed for about $500. It's no longer in production, but I'm sure there are others. The DR-2080C scans about ten pages per minute. I can put a stack of fifty pages into its feeder and come back ten minutes later to find them all done.

One drawback I found is that occasionally the DR-2080C feeds two sheets at once (sticking together), scanning the front of the first one with the back of the second, and omitting the faces between. It is therefore necessary to review the scanned images to ensure that no pages were missed before tossing the paper into the shredder or recycle bin.
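A quick sanity check that complements (but does not replace) looking at the images is to count sheets before loading the feeder and compare that with the number of images saved. The sketch below is only an illustration of that idea; the function name is hypothetical, and it assumes each misfeed involves exactly two sheets stuck together.

```python
def check_batch(sheets_fed, images_saved, duplex=True):
    """Compare the expected image count with what was actually saved.

    Returns (expected_images, missing_images). A positive number of
    missing images suggests that sheets fed together somewhere.
    """
    faces_per_sheet = 2 if duplex else 1
    expected = sheets_fed * faces_per_sheet
    missing = expected - images_saved
    return expected, missing


# Example: 50 sheets fed in duplex mode, but only 98 images were saved.
expected, missing = check_batch(sheets_fed=50, images_saved=98)
if missing > 0:
    # In duplex mode a two-sheet misfeed loses the two inner faces.
    print(f"expected {expected} images, {missing} missing: review this batch")
else:
    print("image count matches; still review the images before shredding")
```

A matching count only shows that nothing was skipped. It says nothing about whether the images are legible, so the visual review is still needed.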

Another issue I've found is that with bi-level scanning, the quality of the result depends on the scanner's "brightness" control. The default setting works well for ordinary laser-printer output but must be set darker for pencil markings or lighter for material printed on shaded backgrounds. Sometimes I need to scan a test page several times to get the right setting before starting an unattended batch run.
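One way to picture why no single setting suits every document: bi-level scanning in effect compares each pixel's gray level with a cutoff. The sketch below is a simplified model, not the DR-2080C's actual algorithm, and the gray levels and cutoffs are made-up values chosen only for illustration.

```python
# Hypothetical 8-bit gray levels (0 = black, 255 = white).
SAMPLES = {
    "laser toner": 30,
    "shaded background": 110,
    "pencil mark": 150,
    "white paper": 245,
}


def binarize(gray, threshold):
    """Return 'black' if the pixel is darker than the threshold, else 'white'."""
    return "black" if gray < threshold else "white"


# A higher threshold turns more pixels black (a "darker" scan);
# a lower threshold turns more pixels white (a "lighter" scan).
for label, threshold in (("lighter", 90), ("default", 128), ("darker", 175)):
    result = {name: binarize(gray, threshold) for name, gray in SAMPLES.items()}
    print(f"{label:8} (cutoff {threshold}): {result}")
```

With these invented numbers, the middle cutoff drops the pencil mark and blackens the shaded background. Raising the cutoff rescues the pencil, and lowering it clears the background, but no one cutoff does both, which is why a test page or two is worth the time.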

Always inspect your scanned images before disposing of your paper documents.

Software

In addition to the minimal software bundled with the scanner (you'll want at least a TWAIN driver; the DR-2080C also comes with a utility called CapturePerfect), I found a good image-management desktop utility to be essential. This allows you to see a desktop full of thumbnails, drag and drop to rearrange them, view a "slide show" of your images, etc. A good image-management utility will support both scanned documents and digital camera photos.

For this purpose I use Thumbs Plus from Cerious Software, which is an excellent and powerful utility I highly recommend.

Other products on the market include Adobe Bridge, Paperport Desktop from Nuance (formerly Scansoft) and Paint Shop Pro from Corel (formerly JASC). I am not familiar enough with any of these products to review them here.

Backup

A fundamental rule of computers is, "never live with just one copy of important data." With every disk (or solid-state memory), the only question is "when will it fail?" Besides, your computer could be stolen, your house could burn down, or a virus could wipe out your files.

As a computer consultant, I often hear clients ask how to recover from a system crash that wiped out their files. Sometimes my advice includes "reformat your hard disk and restore your data from backup," to which my client often responds, "What backup?" I hope you're not one of them.

Most data safety experts recommend off-site backups, in case your house burns down. Hurricane Katrina showed even that to be inadequate. In several cases both an office and its backup site were flooded.

Keep at least one recent backup disconnected from your computer, because particularly vicious malicious software (malware) can wipe out the data on all online drives. Consider the possibility of ransomware.

For daily backup, I use a pair of external 3 TB hard disks (Western Digital MyBook) and a backup utility that runs every night and automatically makes backups of what I did that day. I keep one disk at home for daily backups, and one in my safe deposit box at the bank. Every month or so, I swap them. I periodically burn important files to CD-ROMs and store them. I also duplicate my most important files between my desktop and laptop machines, and keep separate backups in my two homes (2400 miles apart).
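I am not describing the internals of any particular backup utility here, but the basic idea of "back up what I did that day" can be sketched in a few lines. This is a minimal illustration using only Python's standard library; the function name and the example paths are hypothetical, and a real backup tool does much more (deletions, versioning, error handling).

```python
import shutil
import time
from pathlib import Path


def copy_recent(src_root, dst_root, max_age_hours=24, now=None):
    """Copy files modified within the last max_age_hours to dst_root.

    Relative paths under src_root are preserved under dst_root.
    dst_root must not be inside src_root.
    Returns the list of relative paths that were copied.
    """
    src_root, dst_root = Path(src_root), Path(dst_root)
    current = time.time() if now is None else now
    cutoff = current - max_age_hours * 3600
    copied = []
    for src in sorted(src_root.rglob("*")):
        if src.is_file() and src.stat().st_mtime >= cutoff:
            rel = src.relative_to(src_root)
            dst = dst_root / rel
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves timestamps
            copied.append(rel)
    return copied


# Example usage (hypothetical paths; adjust to your own drives):
# copy_recent("C:/Scans", "E:/Backup/Scans")
```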

Another important rule is not to trust media written more than a decade ago. If you burned a CD in 1996, it's probably a good idea to make a copy in 2006. Save both if you like, but verify that you can actually read the new copy before you destroy the old one!

In fact, no matter what backup system you use, it's a good idea to actually test it now and then by restoring a few sample files, both to be sure it works and to become comfortable with the restore procedure before a crisis necessitates it.
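One way to check a test restore, or a freshly copied disc, is to compare checksums of the originals against the restored copies rather than trusting that the files merely open. The following is a minimal sketch using Python's standard hashlib module; the function names and the sample file list are my own illustration, not part of any backup product.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(original_root, restored_root, sample):
    """Return the relative paths whose restored copy is missing or differs."""
    bad = []
    for rel in sample:
        original = Path(original_root) / rel
        restored = Path(restored_root) / rel
        if not restored.is_file() or sha256_of(original) != sha256_of(restored):
            bad.append(rel)
    return bad


# Example usage (hypothetical paths and file names):
# problems = verify_restore("C:/Scans", "D:/RestoreTest",
#                           ["taxes/2005.tif", "bank/statement-01.tif"])
# print("all good" if not problems else f"check these files: {problems}")
```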

Never live with just one copy of important data.

Note: Even a relatively new hard disk drive can fail, whether or not it is still under warranty. Manufacturers' warranties only provide for the replacement of defective hardware and do not guarantee the recovery of any data you may have stored on the defective drive. Services which recover data from failed hard drives are very expensive and are not covered under the drive's warranty. So, you should always back up all your hard drives, even those which are still under warranty.

Recycling and Shredding

Where practical, I recycle non-confidential office paper. San Jose picks it up curbside. But considering rampant identity theft crimes, it's foolish to just toss personal financial records into the recycle bin.

Several years ago when working for scanner maker Visioneer, I quipped that we ought to offer a combination scanner and shredder which would do both operations at once, but I decided that our customers might prefer to back up their hard disks first.

I have an older personal shredder (Fellowes PS50), which fits on a wastebasket and accommodates 3 or 4 sheets at a time. But without a stack feeder, it's tediously slow to do filing cabinets full at a time. For volume shredding I toss everything into bins, and when they're full I call Shred-It. This company has truck-mounted shredders and will come to your home or office and shred all you've got for a fee.

Incidentally, if your shredder produces paper strips (unlike the confetti from the newer cross-cut shredders), you can donate bags of them to the Humane Society for lining animal cages.

Optical Character Recognition (OCR)

Optical Character Recognition (OCR) is the software process of transforming images of printed pages into character files representing the text of those pages.

OCR is necessary if you plan to import scanned text into a word-processor for editing and revisions. It is also necessary if you want to search among scanned pages by the text printed on them.

Generally I don't bother to OCR my pages when I scan them. Saving just the scanned images is sufficient for my goal of reducing paper clutter. If it turns out I do need the text later on, I can always OCR the images then.

When I do need OCR, I've been using OmniPage Pro 12.0 from Nuance (who bought Scansoft, who bought Caere, the original publisher). I'm not totally happy with it. This program works well under Windows 2000 but frequently hangs when starting up under Windows XP. Nuance is aware of the problem, but their complex instructions to install the service release only yield the terse message "Your product is up-to-date!" without actually updating anything.

A possible alternative that has received good reviews is FineReader from ABBYY. I have no personal experience with FineReader.

Conclusion

OK, that's what I do: Scan, back up, shred. I'm not saying what you "should" do, but I am happy to answer questions with more details about what I do.

Exceptions

Of course there are some times when paper is still necessary, as this humorous video attests.