About Digitizing Services
How Do I Use the Digital Solutions FTP System?
Follow the guidelines outlined here: http://www.hfgroup.com/media/HFGroupFTPHowTo.pdf
What is a “document” and “document management”?
A document groups items on a related subject and can run from one page to thousands. Documents are not limited to paper or electronic files; they can also include other media. In basic terms, document management covers storing, sorting, indexing, and combining these components so they are easy to retrieve.
What image resolution should I use?
This depends on how the item will be used. For preservation digitization, many clients select 300 dpi, but you may request 400 dpi or 600 dpi depending on the object. For paper (office document) scanning, 300 dpi typically suffices.
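To see what a dpi setting means in pixel terms, the sketch below converts a page size in inches to pixel dimensions. It assumes a US Letter page (8.5 x 11 inches) purely as an example, and the function names are hypothetical. Pixel count is only one input to quality, as the next answer explains.

```python
from __future__ import annotations


def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel width and height of a page of the given size (inches) scanned at `dpi`."""
    return round(width_in * dpi), round(height_in * dpi)


def megapixels(width_in: float, height_in: float, dpi: int) -> float:
    """Total pixel count in millions; it grows with the square of the dpi."""
    w, h = pixel_dimensions(width_in, height_in, dpi)
    return w * h / 1_000_000


if __name__ == "__main__":
    # Example page: US Letter, 8.5 x 11 inches (an assumption for illustration).
    for dpi in (300, 400, 600):
        w, h = pixel_dimensions(8.5, 11, dpi)
        print(f"{dpi} dpi: {w} x {h} px, {megapixels(8.5, 11, dpi):.1f} MP")
```

For this page, 300 dpi gives 2550 x 3300 pixels (about 8.4 megapixels), while 600 dpi gives 5100 x 6600 pixels (about 33.7 megapixels): doubling the dpi quadruples the pixel count, and file size grows with it.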
Is dpi always the best determination for quality?
No. Raw dpi should not be the deciding factor, because dpi figures are not equivalent across scanning systems (or providers). A 600 dpi image from one system can often be inferior to a 300 dpi image from another. Many factors drive quality, such as sharpness, color accuracy, dynamic range, and artifacts. Our team can help you make these decisions based on our experience and on what others in your industry are doing.
What is the difference between high quality and preservation quality digitization?
Miscellaneous records and office documents call for high quality digitization that is scalable. This type of scanning suits corporations and business offices that simply need the most legible images possible, optimized for OCR.
If you are a library, museum, or other cultural heritage institution, you should not trust your irreplaceable materials to a mass scanning company. Very few providers specialize in preservation quality conversion of unbound and bound cultural heritage materials. Personnel should be conservation-trained to handle one-of-a-kind originals and deliver archival-quality digital renditions. Equipment should also be designed to accommodate a diverse array of fragile materials so that each item is returned in the same condition in which it was received.
In either scenario, select an experienced provider that is committed to delivering quality and is not driven solely by quantity or price.
What is “deskewing” and how is it handled?
Skewed (crooked or tilted) images can reduce the accuracy of the OCR process. Software can correct skew “on the fly”, and DS also has image processing tools for manual adjustment, which are especially effective for correcting individually skewed images. The goal of deskewing is optimal OCR accuracy.
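One common way to estimate skew automatically is the projection-profile method: try a range of candidate angles, undo each one, and keep the angle at which dark pixels stack into the fewest, fullest rows. The sketch below is a minimal pure-Python illustration of that technique, not the software DS uses. It assumes the dark-pixel coordinates have already been extracted from a binarized scan; the function names, the ±5° search range, and the 0.1° step are hypothetical choices.

```python
import math
from collections import Counter


def profile_score(points, angle_deg):
    """Sharpness of the row histogram after rotating the points by -angle_deg."""
    a = math.radians(angle_deg)
    sin_a, cos_a = math.sin(a), math.cos(a)
    # Row coordinate of each dark pixel once the candidate skew is undone.
    rows = Counter(round(y * cos_a - x * sin_a) for x, y in points)
    # Sum of squared row counts is largest when pixels pile into few, full rows.
    return sum(n * n for n in rows.values())


def estimate_skew(points, max_angle=5.0, step=0.1):
    """Return the candidate angle (degrees) that gives the sharpest profile."""
    k = round(max_angle / step)
    candidates = [i * step for i in range(-k, k + 1)]
    return max(candidates, key=lambda ang: profile_score(points, ang))


if __name__ == "__main__":
    # Synthetic page: five rows of "text" pixels, 500 px wide, tilted by 2 degrees.
    slope = math.tan(math.radians(2.0))
    pts = [(x, x * slope + c) for c in (0, 20, 40, 60, 80) for x in range(500)]
    print(f"estimated skew: {estimate_skew(pts):.1f} degrees")  # prints 2.0
```

To straighten the page, rotate the image to cancel the estimated angle with your imaging library of choice; the rotation direction depends on that library's coordinate convention.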
Do the bindings on books always have to be cut to achieve the best quality and lowest price?
No. Specialized equipment, such as the technology deployed at DS, can scan books while they remain bound and still achieve the highest quality images.
Do we have to skip foldouts within a book?
No, with the right combination of hardware and software, foldouts can also be scanned and relatively easily inserted into the proper position within the file.
Can I scan double-sided documents?
Yes, imaging systems can support both simplex (one-sided) and duplex (two-sided) scanning.
Can I scan landscape and portrait pages together in one batch?
Yes, imaging systems allow you to change the orientation of pages as you scan or after scanning through image processing steps.
Isn’t color scanning expensive?
It can be if the proper hardware and software aren’t in place. Providers should have a blend of technology that can intermix color, grayscale, and bitonal images at little or no additional cost.
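One practical reason intermixing matters is storage volume. The sketch below computes the raw, uncompressed size of one page in each image type. It assumes 1 bit per pixel for bitonal, 8 for grayscale, and 24-bit RGB for color, pads each pixel row to a whole byte (a common raster convention), and uses a letter-size page at 300 dpi (2550 x 3300 pixels) as an example. The names are hypothetical.

```python
# Bits stored per pixel for each image type (24-bit RGB assumed for color).
BITS_PER_PIXEL = {"bitonal": 1, "grayscale": 8, "color": 24}


def uncompressed_bytes(width_px: int, height_px: int, mode: str) -> int:
    """Raw size of one image, with each pixel row padded to a whole byte."""
    row_bytes = (width_px * BITS_PER_PIXEL[mode] + 7) // 8
    return row_bytes * height_px


if __name__ == "__main__":
    # A letter-size page at 300 dpi is 2550 x 3300 pixels.
    for mode in BITS_PER_PIXEL:
        size = uncompressed_bytes(2550, 3300, mode)
        print(f"{mode}: {size / 1_000_000:.1f} MB")
```

Before compression, the color page (about 25.2 MB) is roughly 24 times the size of the bitonal page (about 1.1 MB), so scanning each page in the lightest mode that preserves its content keeps storage and transfer volumes down. Compression reduces all of these figures considerably.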
What is the standard output format used for images?
Black and white images are most commonly stored as standard TIFF files using Group 4 (two-dimensional) compression. Grayscale and color images are frequently stored as TIFF files, either uncompressed or with LZW compression. Grayscale and color images may also be stored as JPEG or JPEG 2000 (JP2) compressed images.
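To confirm which compression a delivered TIFF actually uses, you can read the Compression tag (tag 259) defined in the TIFF 6.0 specification, where 1 means no compression, 4 means CCITT Group 4, and 5 means LZW. The sketch below is a hypothetical checker using only the Python standard library; it handles classic TIFF (not BigTIFF) and looks only at the first image in the file.

```python
import struct

# Compression tag values from the TIFF 6.0 specification.
COMPRESSION_NAMES = {
    1: "none",
    2: "CCITT modified Huffman RLE",
    3: "CCITT Group 3",
    4: "CCITT Group 4",
    5: "LZW",
    6: "JPEG (old-style)",
    7: "JPEG",
    8: "Deflate (Adobe)",
    32773: "PackBits",
}


def tiff_compression(data: bytes) -> int:
    """Return the Compression tag (259) value of the first image in a TIFF file."""
    # Bytes 0-1 give the byte order: "II" = little-endian, "MM" = big-endian.
    if data[:2] == b"II":
        endian = "<"
    elif data[:2] == b"MM":
        endian = ">"
    else:
        raise ValueError("not a TIFF file: bad byte-order mark")
    magic, ifd_offset = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF file (BigTIFF is not handled here)")
    # The first image file directory (IFD): a 2-byte entry count, then 12-byte entries.
    (entry_count,) = struct.unpack_from(endian + "H", data, ifd_offset)
    for i in range(entry_count):
        entry = ifd_offset + 2 + 12 * i
        (tag,) = struct.unpack_from(endian + "H", data, entry)
        if tag == 259:
            # A single SHORT value sits in the first 2 bytes of the value field.
            (value,) = struct.unpack_from(endian + "H", data, entry + 8)
            return value
    return 1  # Tag absent: the specification's default is no compression.
```

For example, `tiff_compression(open("scan_0001.tif", "rb").read())` (a hypothetical file name) would return 4 for a Group 4 image, and `COMPRESSION_NAMES` turns that code into a readable label.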
What other formats, including metadata and derivatives, do most clients request?
Besides TIFF and JPEG, clients most often request PDF, PDF/A, and JP2 output for imaging projects. Most clients also want uncorrected OCR, although corrected OCR is sometimes requested for mission-critical documents. Newspaper images often must meet the National Digital Newspaper Program (NDNP) specifications.
Other deliverable options include METS/ALTO, XML conversion services, and ePub for mobile devices. Some clients also request manual keying to correct OCR for the highest accuracy, along with image consolidation and capture of handwritten information.
Clients also increasingly want printed reproductions of their digital files, so we output files formatted specifically for print on demand (POD) and can even test them on our in-house equipment.
What is OCR?
Optical Character Recognition (OCR) is the most common way of converting words in a scanned image to searchable text. OCR is used for full-text indexing and searching. OCR engines generally recognize only typed or laser-printed text, not handwriting. OCR accuracy is directly tied to the quality of the original scan.
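A common way to put a number on OCR accuracy, for example when weighing uncorrected against corrected OCR, is character accuracy: 1 - (edit distance / number of characters in the ground truth), where the edit distance (Levenshtein distance) counts the single-character insertions, deletions, and substitutions needed to turn the reference text into the OCR output. The sketch below assumes you have a hand-keyed reference sample to compare against; the function names and sample strings are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum single-character insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = cur
    return prev[-1]


def character_accuracy(ground_truth: str, ocr_text: str) -> float:
    """1 - edit distance / reference length, clamped at 0 for very noisy output."""
    if not ground_truth:
        raise ValueError("ground truth must be non-empty")
    errors = levenshtein(ground_truth, ocr_text)
    return max(0.0, 1 - errors / len(ground_truth))


if __name__ == "__main__":
    truth = "Annual report 1924"  # hand-keyed reference text (hypothetical)
    ocr = "Annual rep0rt l924"    # OCR output with two misread characters
    print(f"character accuracy: {character_accuracy(truth, ocr):.1%}")  # 88.9%
```

The clamp matters because OCR output with many spurious characters can have more edits than the reference has characters.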
What makes sense, a “backfile” or “day forward” conversion strategy?
That depends on your goals for the overall project and your retention schedule. Often it makes financial sense to start from a set point in time and scan only new documents going forward (day forward). Other times, clients working toward a paperless environment request that all existing files be scanned (backfile). It may also be a combination of historical and new documents, as some items may need to be retained and quickly accessible “forever”. This can be a difficult decision, but our team can work with you and your Records Management team to determine the best fit for your organization’s needs.
Can DS help with a backlog of scanned images that require processing?
Yes, DS offers a unique service that can help you catch up either short term or on an ongoing basis. DS can receive batches of images via FTP or hard drive, perform the necessary image processing steps, and deliver the images back to you with all derivatives required for archiving, web presentation, print on demand, etc.