THE EFFECT OF EXPECTATIONS ON VISUAL INSPECTION PERFORMANCE

John Kane, Derek Moore, Saeed Ghanbartehrani
Oregon State University

INTRODUCTION

The process of visual inspection is used widely in many industries to locate and identify defects in a broad range of areas, including mechanical tool and machine inspection, visual inspection on production lines, quality control of various products, and finding critical defects on products such as microchips and aircraft parts. Many industrial inspection tasks are now performed by automated vision systems to remove human error and improve efficiency (Hata, 2006). Nevertheless, a majority of tasks, such as machine tool inspection and steel quality inspection, are still performed manually or under a microscope (Hata, 2006). Because the process depends on humans, it is susceptible to error and inaccuracy that can lead to equipment failures, unnecessary costs, or personal injury. Consequently, research has been conducted to determine the effects of various factors on the visual inspection process and what can be done to mitigate the negative effects, including age, intelligence, fatigue, inspection speed, and others (US Department of Transportation, 2001).

This research focuses on one specific ergonomic factor that is not yet well understood: the effect of expectations on visual inspection. It explores the idea that inspectors develop expectations based on prior experience that could cause unexpected defects to be missed.

BACKGROUND

Visual inspection is a process in which a person is relied upon to accurately perceive, identify, and categorize visual stimuli. Many factors external to the human element in the human-machine visual inspection system can affect accurate perception, and much research has been focused along this vector.
A literature search into the field of visual inspection (summarized in Table 1) revealed a wealth of knowledge from a "bottom up" perspective. For example, characteristics of the inspected image, such as sharpness, size, and position, were examined and found to have some effect on perception (Hata, 2006). Image complexity was observed to degrade human inspection performance in an expected fashion (Gallwey & Drury, 1986). Characteristics of inspectors, such as gender, age, and intelligence, were also found to have measurable effects on inspection performance (US Department of Transportation, 2001). Magnification of images improved inspection performance, but only for images external to the fovea, or below a size threshold if within the fovea (Chan, 2004). However, we found surprisingly little research from the "top down" perspective; that is, little information was found concerning the effects of preexisting bias or operator expectation on the performance of a human-machine visual inspection system. It is important to understand how top-down processing relates to visual inspection performance so that such systems can be designed to be accurate and consistent. This is the direction in which we have chosen to take our research.
Table 1. Literature review and analysis

US Department of Transportation (2001)
  Factors: visual acuity (static, dynamic, peripheral), color vision, eye movement, scanning, age, experience, personality, sex, intelligence
  Key points: each factor has unique effects on visual inspection performance; no interactions between factors

Tetteh (2006)
  Factors: pace, search strategy, complexity
  Key points: horizontal search works better than vertical

Hata (2006)
  Factors: visual inspection by automatic vision systems vs. human visual inspection
  Key points: several hypotheses were posed and experiments planned to examine human sensitivity to the factors listed

Hata (2008)
  Factors: detection level of image differences and of defects; changes in brightness, color saturation, color defects (using the MacAdam reference), edge sharpness, defect size, defect position, continuous defects
  Key points: human color grading is sensitive to color purity vs. pure white; the MacAdam ellipse characterizes human color-distinguishing thresholds; color-change edges affect human perception of white purity

Gallwey & Drury (1986)
  Factors: visual inspection complexity (number of fault types, variation of standards between faults, location of faults); subject groups (professional vs. nonprofessional)
  Key points: no significant difference in visual inspection performance between subject groups; human detection sensitivity not affected by defect location; increasing complexity in the remaining factors decreases human visual inspection performance in a predictable manner

Chan (2004a, 2004b)
  Factors: magnification (linear and nonlinear) of the inspection surface; location of defect (center to periphery of point of focus)
  Key points: within the fovea, image size above a threshold is key, and above that threshold magnification makes no difference in inspection performance; in the periphery of the visual field magnification makes a large performance difference, but not enough to equal performance in the fovea region

OBJECTIVE

The specific objectives of this research were:
1. To determine whether human subjects exposed to a visual defect location bias perform differently in a subsequent visual inspection than subjects not exposed to such a bias.

2. To determine whether such an expectation or bias has more influence on visual inspection performance while the subject is under time pressure, i.e., rushed.

The research team hypothesized that subjects conditioned to have a bias would find more defects in the biased location and fewer in other parts of the object compared to unbiased subjects. The team also hypothesized that this difference in defect detection would be magnified when subjects were exposed to time pressure.

METHODS

This research project involved a controlled experiment in which volunteer subjects were exposed to biasing and timing conditions while performing a simulated visual inspection process.

Subjects

Eight test subjects of both genders, aged 18-65 years, were identified. All subjects had functional vision, all were volunteers, and no compensation was offered for participation.

Apparatus and Tools

Participants were provided with a chair and a table in a room free from distraction and under consistent lighting conditions.
Upon completion of data collection, the research team performed a statistical analysis of the results using the statistical software package S-Plus. An ANOVA test was run to compare the means of the separate groups.

Stimuli

The stimuli used in the experiment were paper planes with characters printed on both sides of the wings and body. Inspecting characters printed on a paper plane fairly represents a 3D visual inspection job because the plane has several parts (wings and body) folded in different dimensions. The pattern shown in Figure 2 was printed on letter-size plain paper. Figure 1 shows the process of making the paper planes.

Figure 1. Making the paper plane (the panels, read left to right and top to bottom, show the folding steps)

The printed characters represent defects in the paint job or the body manufacturing process. The characters were randomly generated uppercase letters in 12-point Times New Roman with single spacing, with a space between each two successive letters. The letter X represented a defect. Capital letters in a serif font were used to simplify the visual search. Two sets of paper planes were created: Set 1, planes with 90% of defects randomly distributed on the wings; and Set 2, planes with defects distributed randomly over all parts. Set 1 was used to develop a bias (expectation) in the designated subjects; Set 2 was used to evaluate both biased and unbiased subjects. A total of 25 printed "defect" patterns was used: four groups of five planes for the trial runs, and another group of five as the stimuli for the experiment.

Figure 2. Paper plane pattern (top and bottom views, printed on both sides of letter-size paper; dashed lines show where the paper should be folded)

Experimental Design

The study used a 2 × 2 factorial design. There were four control groups, each with two randomly assigned subjects.
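The stimulus-generation procedure described above (random uppercase letters with X marking a defect; 90% of defects on the wings for the bias-creating Set 1) can be sketched as follows. This is a minimal illustration, not the authors' original script: the number of characters per region is an assumption, while the 25 defects per plane and the 90% wing share come from the paper.

```python
import random

# Characters printed per region -- illustrative assumption, not from the paper.
REGIONS = {"wings": 60, "body": 40}
DEFECTS_PER_PLANE = 25  # stated in the paper
LETTERS = [c for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ" if c != "X"]

def make_plane(wing_defect_share, rng=random):
    """Return {region: list of characters}; 'X' marks a defect.

    wing_defect_share: fraction of the plane's defects placed on the wings
    (0.9 for the bias-creating Set 1; proportional to area for Set 2).
    """
    n_wing = round(DEFECTS_PER_PLANE * wing_defect_share)
    plane = {}
    for region, size in REGIONS.items():
        n_def = n_wing if region == "wings" else DEFECTS_PER_PLANE - n_wing
        # Mix the defects with randomly chosen filler letters, then shuffle
        # so defect positions within the region are random.
        chars = ["X"] * n_def + [rng.choice(LETTERS) for _ in range(size - n_def)]
        rng.shuffle(chars)
        plane[region] = chars
    return plane

set1_plane = make_plane(0.9)                      # biased: 90% of defects on wings
set2_plane = make_plane(REGIONS["wings"] / 100)   # unbiased: defects proportional to area
```

A generated plane can then be rendered by printing each region's characters separated by spaces onto the fold pattern.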
For each group of subjects, the independent (controlled) variables were: 1) timed or untimed inspection, and 2) exposure or no exposure to a defect location bias. The dependent (response) variables were: 1) the number of defects found in regions with a potential bias toward a high quantity of defects, and 2) the number of defects found in regions with a potential bias toward a low quantity of defects. The experimental design is summarized in Table 2.

Table 2. Design of the experiment

                        Location Bias
Inspection Time    Biased            Unbiased
Untimed            Group A:          Group B:
                   Set 1, Set 2      Set 2, Set 2
Timed              Group C:          Group D:
                   Set 1, Set 2      Set 2, Set 2

Set 1: planes with defects not randomly distributed (bias-creating)
Set 2: planes with randomly distributed defects
(For each group, the first set listed was used in the trial inspections and the second in the evaluated inspection.)

Procedures

To prevent variable levels of fatigue or alertness from influencing test results, all testing was done during morning hours, between 8:00 a.m. and 12:00 p.m., during the last week of November 2010. Subjects were asked to participate in this time frame at their convenience.
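The comparisons planned for this design (an ANOVA on each factor and two-sample t-tests per region) were run in S-Plus; an equivalent analysis can be sketched in Python with SciPy. The accuracy scores below are invented placeholders for illustration only, not the study's data:

```python
from scipy import stats

# Placeholder per-subject accuracy scores (fraction of defects found).
# These numbers are invented for illustration -- they are NOT the study's data.
untimed = [0.92, 0.88, 0.90, 0.85]   # groups A and B (two subjects each)
timed   = [0.70, 0.66, 0.74, 0.61]   # groups C and D (two subjects each)

# One-way ANOVA on the timing factor.
f_stat, p_anova = stats.f_oneway(untimed, timed)

# Two-sample t-test, as reported per region in the Results.
t_stat, p_ttest = stats.ttest_ind(untimed, timed)

print(f"ANOVA:  F = {f_stat:.2f}, p = {p_anova:.4f}")
print(f"t-test: t = {t_stat:.2f}, p = {p_ttest:.4f}")
```

With only two factor levels, the ANOVA F statistic is the square of the equal-variance t statistic, so the two tests agree; the ANOVA form generalizes if more timing conditions are added.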
1. Each subject was given a training presentation in which the tasks were explained and sufficient descriptions of the "defects" to be identified were provided.

2. After the training presentation, each subject performed two trial inspections to practice the task. For these trial inspections, Groups A and C used planes from Set 1 (planes with the defect location bias), and Groups B and D used planes from Set 2 (planes with randomly located defects).

3. For the third inspection run, all groups used planes from Set 2 (randomly distributed defects). Groups A and B performed the third inspection untimed; Groups C and D were given one minute per plane. This timing was designed to be a challenge and to create a sense of urgency in the subjects.

Each run consisted of five planes per subject, with 25 defects per plane. When a defect was found, the subject marked it with a red marker supplied by the facilitators. All inspected items were collected, with group associations noted for each item. The dependent variables were recorded by the experiment facilitators after the inspection runs were completed. Upon concluding the experiment, subjects were debriefed and thanked for their time.

RESULTS

Experimental results are presented in Table 3. The data represent the percentage of defects found in each region of the planes by each control group.

Table 3. Inspection accuracy

Casual observation reveals that subjects in untimed inspections outperformed subjects in timed inspections in every instance. An analysis of variance (ANOVA) for the time variable across both regions gave an F-value of 4.7, with an associated p-value of 0.085. This is suggestive but inconclusive evidence that the mean accuracies of the timed and untimed groups differ (the null hypothesis being zero difference). Two-sample t-tests were also performed between the timed and untimed data in each region.
The associated t-test p-value for the wing region was 0.0285; for the body region it was 0.0193. These values indicate moderate evidence that the mean accuracy scores differ; that is, the data support a difference in performance between the timed and untimed groups.

An ANOVA for the biased/unbiased variable across both regions gave an F-value of 2.35, with an associated p-value of 0.214. This does not provide evidence that the means of the biased and unbiased groups differ. Two-sample t-tests were also performed between the biased and unbiased group data in each region: the p-value for the wing region was 0.334, and for the body region 0.226. These values do not provide evidence that the mean accuracy scores differ; that is, the data do not support the hypothesis that there is a difference in inspection accuracy between the biased and unbiased groups.

DISCUSSION

As expected, the subject groups that were not timed during the inspection performed better than the groups that were timed. The signal was clear, and though the sample size was low (two subjects per group), there was statistically moderate evidence that timing the inspection made a difference in inspection accuracy. It is very interesting that scores in the regions where the biased groups were taught to expect most of the defects (the wings) were the same for both the biased and unbiased groups, while scores in the regions where the biased groups were taught to expect fewer defects were indeed lower in the biased-timed group than in the unbiased-timed group. These observations are consistent with the original hypotheses of the study; however, since
the sample sizes were very small, no statistically based conclusions can be drawn. The possible implications are intriguing and worthy of further study.

CONCLUSIONS AND RECOMMENDATIONS

The results of this experiment are not inconsistent with the original hypotheses that: 1) human subjects exposed to a visual defect location bias perform differently in a subsequent visual inspection than subjects not exposed to such a bias, and 2) such an expectation or bias has more influence on visual inspection performance while the subject is under time pressure, i.e., rushed. Sample sizes were too small to support either statement with statistical confidence. However, though it was not a test hypothesis, the data, despite the small sample sizes, did moderately support the notion that untimed inspectors outperform timed inspectors.

It is recommended that the experiment be repeated with larger control groups. The larger sample sizes will be necessary to generate statistical evidence that supports (or does not support) the tested hypotheses with greater confidence. In general, the experiment as designed met expectations; however, some small improvements are recommended if it is to be repeated. In particular, observations and subsequent discussions suggested that biased subjects may realize during the test run that defects are not distributed as they expected, and at that point may start to overcompensate in their inspections. A second experiment should be designed with this possibility in mind.

REFERENCES

Chan, A. H. S., & Ma, R. C. W. (2004a). Effect of linear magnification on target detection performance in visual inspection. The International Journal of Advanced Manufacturing Technology, 23(5-6), 375-382.

Chan, A. H. S., & Ma, R. C. W. (2004b). A comparison of linear and non-linear magnification on target detection performance in visual inspection.
Proceedings of the Institution of Mechanical Engineers, Part B: Engineering Manufacture, 218(10), 1373-1386.

Gallwey, T., & Drury, C. (1986). Task complexity in visual inspection. Human Factors, 28(5), 595-606.

Hata, S. (2006). Human factors of visual inspection systems in production. In IECON 2006 - 32nd Annual Conference of the IEEE Industrial Electronics Society, November 6-10, 2006 (pp. 5454-5457). Retrieved from http://dx.doi.org.proxy.library.oregonstate.edu/10.1109/IECON.2006.347916

Hata, S., Matsuda, Y., Yunoki, K., & Hayashi, J. (2008). Several aspects of human sensitivity for visual inspection. In 2008 Conference on Human System Interaction, HSI 2008, May 25-27, 2008 (pp. 522-525). Retrieved from http://dx.doi.org.proxy.library.oregonstate.edu/10.1109/HSI.2008.4581493

Tetteh, E. G., & Jiang, S. (2006). The effects of search strategy, task complexity and pacing on visual inspection performance. In 9th Annual Applied Ergonomics Conference 2006, March 6-9, 2006, Orlando, FL. Institute of Industrial Engineers.

US Department of Transportation (2001). Reliability of visual inspection for highway bridges, Volume 1, 16-27. Retrieved from www.tfhrc.gov/hnr20/nde/01105.pdf