This paper was published in the 2013 Joint Statistical Meetings (JSM) Proceedings, Government Statistics Section, Montreal, Canada: American Statistical Association, Pages 1157-1171.
ADI presented two talks at the Federal Committee on Statistical Methodology (FCSM) Conference on January 10-11, 2012, at the Walter E. Washington Convention Center, 801 Mount Vernon Place NW, Washington, DC 20001.
The first presentation by K. Bradley Paxton was titled: "Use of Synthetic Data in Testing Administrative Records Systems"
Written by: K. Bradley Paxton and Thomas Hager (ADI, LLC)
The second presentation by K. Bradley Paxton was titled: "Testing Production Data Capture Quality"
Written by: K. Bradley Paxton, Steven P. Spiwak, Douglass Huang, and James K. McGarity (ADI, LLC)
In forms data capture, whether with human data entry keying from paper forms alone or with Optical Character Recognition (OCR), Optical Mark Recognition (OMR), and Key From Image (KFI), it has been customary to employ manual methods for data quality assurance. These methods involve a process we refer to as Double Key & Verify (DK&V), wherein one keyer is asked to key a particular data field and then another keyer is asked to key the same field (preferably without collusion). If the results from both keyers agree with the sampled field value, the sampled data field is deemed correct; if they do not, a third party is usually employed to adjudicate the correct answer. This classic DK&V process is slow and costly, so in practice the amount of data sampled for Quality Assurance (QA) purposes is often smaller than desired for statistically valid results.
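As a rough illustration only, the Python sketch below implements the DK&V decision rule as described in this abstract; the function name and the adjudicator hook are hypothetical assumptions, not drawn from ADI's production system.

```python
from typing import Callable, Optional

def dkv_check(captured: str, key1: str, key2: str,
              adjudicate: Optional[Callable[[str, str, str], str]] = None) -> str:
    """Double Key & Verify: the captured field value is accepted as correct
    only if both independently keyed values agree with it; otherwise a
    third party (the adjudicate callback) decides the correct answer."""
    if key1 == captured and key2 == captured:
        return captured  # both keyers confirm the production value
    if adjudicate is None:
        raise ValueError("Keyings disagree with the captured field; adjudication required")
    return adjudicate(captured, key1, key2)  # third-party resolution of the disagreement
```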
To know how well a production forms data capture system is really performing, it has been customary to have keyers sample captured data fields and perform “double key and verify” operations to determine the correct answers (“truth”) for the production data. In the system we call Production Data Quality, which will be used in the 2010 Census, we use software automation and good statistical design to reduce the human effort involved by a factor of as much as 40 while obtaining high-quality “truth.” Once the “truth” is known, the production data may be scored using whatever correctness criteria are appropriate for the application, for example, some type of a “soft match.”
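The abstract leaves the correctness criteria application-specific. The sketch below shows one plausible form of a “soft match” (case- and punctuation-insensitive comparison with a similarity threshold), written in Python with hypothetical names, purely to illustrate how scored fields could roll up into an error rate; it is not ADI's actual scoring rule.

```python
import re
from difflib import SequenceMatcher

def normalize(field: str) -> str:
    """Drop punctuation, collapse whitespace, and lowercase before comparing."""
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", "", field)).strip().lower()

def soft_match(produced: str, truth: str, threshold: float = 0.9) -> bool:
    """A production field passes if it matches truth after normalization,
    or is close enough under a simple similarity ratio."""
    a, b = normalize(produced), normalize(truth)
    return a == b or SequenceMatcher(None, a, b).ratio() >= threshold

def field_error_rate(pairs) -> float:
    """Fraction of (produced, truth) pairs that fail the soft match."""
    misses = sum(not soft_match(p, t) for p, t in pairs)
    return misses / len(pairs) if pairs else 0.0
```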
A major problem with capturing data from forms filled out by respondents is measuring the accuracy and efficiency of the system. This is true whether the data are captured by traditional “heads-down” keying from paper (KFP), “heads-up” keying from image (KFI), handprint Optical Character Recognition (OCR/ICR), or some combination of these.
It is a fundamental fact that in order to improve the accuracy and efficiency of a system, you must be able to measure these performance factors. Using our new, patent-pending Digital Test Deck® technology, you can now assess your forms processing system more easily and accurately than ever before.
Forms processing technology can be a bit complicated, so in this white paper we attempt to provide a simple explanation of the basics and of how Digital Test Deck® technology can be used to save you a lot of money while ensuring system accuracy.
A major problem associated with understanding the costs of capturing data from forms filled out by respondents is measuring the accuracy and efficiency of the system. Many factors come into play when estimating the costs of data capture, including the amortized cost of equipment and software, labor, and facilities.
Other factors are technical in nature, such as the OCR error rate versus reject rate, forms volume and complexity, keying speeds and costs, keying accuracy, and finally the cost of a data capture error downstream in the business process. This session explores a simple cost model to guide a financial/management decision about whether or not to invest in a potential data capture system improvement.
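The session's actual cost model is not given here; the Python sketch below is a minimal, hypothetical stand-in showing how the factors listed above (reject rate, error rates, keying cost, and downstream error cost) might combine into a single figure that can be compared before and after a proposed improvement. All parameter names and numbers are illustrative assumptions.

```python
def data_capture_cost(n_fields: int,
                      ocr_reject_rate: float,      # fraction of fields sent to keying
                      ocr_error_rate: float,       # error rate on fields OCR accepts
                      key_cost_per_field: float,   # labor cost to key one field
                      key_error_rate: float,       # error rate of manual keying
                      downstream_error_cost: float) -> float:
    """Expected cost = keying labor on rejected fields plus the downstream
    cost of errors that slip through either OCR or keying."""
    keyed = n_fields * ocr_reject_rate
    ocr_accepted = n_fields - keyed
    keying_cost = keyed * key_cost_per_field
    error_cost = (ocr_accepted * ocr_error_rate + keyed * key_error_rate) * downstream_error_cost
    return keying_cost + error_cost

# Compare a baseline system with a proposed improvement (hypothetical numbers):
baseline = data_capture_cost(1_000_000, 0.30, 0.02, 0.05, 0.005, 1.00)
improved = data_capture_cost(1_000_000, 0.20, 0.01, 0.05, 0.005, 1.00)
print(f"Estimated savings: ${baseline - improved:,.2f}")  # invest only if savings exceed the improvement's cost
```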
ADI has successfully performed quality assurance testing for the Census Bureau for both questionnaire printing and electronic data capture in the 2010 Decennial Census.