
Nakayama Estimation Of Viewers Response For Contextual Understanding Of Tasks Using Features Of Eye Movements

To estimate viewers’ contextual understanding, features of their eye-movements while viewing question statements in response to definition statements were extracted and compared between correct and incorrect responses. Twelve directional features of eye-movements across a two-dimensional space were created, and these features were compared between correct and incorrect responses. A procedure for estimating the response was developed with Support Vector Machines, using these features. The estimation performance and accuracy were assessed across combinations of features. The number of definition statements, which needed to be memorized to answer the question statements during the experiment, affected the estimation accuracy. These results provide evidence that features of eye-movements during the reading of statements can be used as an index of contextual understanding.
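As an illustration of what a "directional feature" might look like, the sketch below bins saccade displacements into 12 sectors of 30 degrees and averages the saccade length in each sector, giving one 12-dimensional vector per viewing. It is a minimal sketch, not the authors' implementation: the function names, the (dx, dy) input format in degrees of visual angle, the choice to centre sectors on 0, 30, ..., 330 degrees, and the 0.0 value for empty sectors are all assumptions made here for illustration.

```python
import math

N_DIRECTIONS = 12                 # 30-degree steps over the full circle
STEP_DEG = 360 / N_DIRECTIONS


def direction_bin(dx, dy):
    """Return the index (0..11) of the 30-degree sector containing (dx, dy).

    Assumption: sectors are centred on 0, 30, ..., 330 degrees, measured
    counter-clockwise from the positive x axis (rightward on the screen).
    """
    angle = math.degrees(math.atan2(dy, dx)) % 360.0
    return int((angle + STEP_DEG / 2) // STEP_DEG) % N_DIRECTIONS


def mean_length_by_direction(saccades):
    """Build a 12-dimensional vector of mean saccade length per direction.

    `saccades` is an iterable of (dx, dy) displacements in degrees of
    visual angle. Sectors that received no saccade are set to 0.0
    (an assumption; the source does not say how empty sectors are handled).
    """
    sums = [0.0] * N_DIRECTIONS
    counts = [0] * N_DIRECTIONS
    for dx, dy in saccades:
        k = direction_bin(dx, dy)
        sums[k] += math.hypot(dx, dy)
        counts[k] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


if __name__ == "__main__":
    # Hypothetical saccades: two rightward, one leftward, one upward.
    example = [(4.0, 0.0), (2.0, 0.0), (-3.0, 0.0), (0.0, 1.0)]
    print(mean_length_by_direction(example))
```

The same binning could be reused for the other three feature types (fixation position, fixation duration, saccade duration) by replacing the accumulated quantity.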
Published on: Mar 3, 2016

Transcripts - Nakayama Estimation Of Viewers Response For Contextual Understanding Of Tasks Using Features Of Eye Movements

Estimation of Viewer’s Response for Contextual Understanding of Tasks using Features of Eye-movements

Minoru Nakayama, CRADLE (The Center for R & D of Educational Technology), Tokyo Institute of Technology
Yuko Hayashi, Human System Science, Tokyo Institute of Technology

ETRA 2010, Austin, TX, March 22 – 24, 2010. Copyright © 2010 by the Association for Computing Machinery, Inc. ACM 978-1-60558-994-7/10/0003 $10.00

Abstract

To estimate viewers’ contextual understanding, features of their eye-movements while viewing question statements in response to definition statements were extracted and compared between correct and incorrect responses. Twelve directional features of eye-movements across a two-dimensional space were created, and these features were compared between correct and incorrect responses. A procedure for estimating the response was developed with Support Vector Machines, using these features. The estimation performance and accuracy were assessed across combinations of features. The number of definition statements, which needed to be memorized to answer the question statements during the experiment, affected the estimation accuracy. These results provide evidence that features of eye-movements during the reading of statements can be used as an index of contextual understanding.

CR Categories: H.1.2 [User/Machine Systems]: Human information processing; H.5.2 [User Interfaces]: Evaluation/methodology

Keywords: eye-movements, answer correctness, eye-movement metrics, user’s response estimation, discriminant analysis

1 Introduction

“Contextual understanding” is the awareness of knowledge and presented information, including texts, images and other factors. To discern a person’s contextual understanding of something, questions are generally given in order to observe the responses. Even in the human-computer interaction (HCI) environment, various systems ask users about their contextual understanding. To improve these systems and make them more effective for users and designers, an index for the measurement of contextual understanding is desirable, and should be developed to ferret out problems regarding HCI. Eye-movements can be used to evaluate document relevance [Puolamäki et al. 2005], and to estimate user certainty [Nakayama and Takahasi 2008]. Results such as these suggest the possibility that features of eye-movements can estimate viewer responses to questions which are based on contextual understanding and certainty. Estimation techniques using eye-movement metrics have already been applied to Web page assessments [Ehmke and Wilson 2007; Nakamichi et al. 2006]. The effective features for response estimation have not yet been determined, however.

To estimate viewers’ responses using eye-movements, the appropriate features need to be extracted. Jacob & Karn summarized eye tracking related metrics and their effectiveness while subjects completed various tasks [Jacob and Karn 2003]. Additionally, Rayner summarized features of eye-movements in the reading process [Rayner 1998]. Most metrics are based on fixation and saccade during a specific task, and are scalar, not dimensional. Therefore, high-level eye-movement metrics are required, and some of these have already been proposed [Duchowski 2006]. The features of eye-movement metrics across two-dimensional space have also been discussed, as eye-movements can be illustrated in two dimensions [Tatler 2007; Tatler et al. 2007]. The authors’ preliminary analysis of the inferential task suggests that estimations using several features and a linear function are useful, but the estimation performance is not sufficient to be analyzed in depth [Nakayama and Hayashi 2009]. To improve this performance, two approaches are considered. The first is the creation of features of eye-movements which are effective at capturing the viewer’s behavior. The second is the creation of a more robust estimation procedure using non-linear functions such as Support Vector Machines (SVM) [Stork et al. 2001]. Also, performance assessment procedures are used to emphasize the significance of the estimation [Fawcett 2006].

This paper addresses the feasibility of estimating user response correctness of inferential tasks using various features of eye-movements made while the user selects alternative choices based on their contextual understanding of statements presented to them.

2 Experimental method

The subjects were first asked to understand and memorize some definition statements which described locational relationships between two objects (Figure 1(a)). Each definition statement was presented for 5 seconds. Then, ten questions in statement form were given to determine the degree of understanding. These questions asked subjects to choose one of two choices as quickly as possible, regarding whether each question statement was “Yes (True)” or “No (False)” (Figure 1(b)). Each question statement was shown for 10 seconds. When the subject responded to a question statement, the display moved to the next task. All texts were written in Japanese Kanji and Hiragana characters, and the texts are read from left to right. This task asked subjects for “contextual understanding”. The number of statements given to subjects was controlled at 3, 5 or 7 per set, as the task difficulty. Five sets were created for each statement level. In total, 150 responses per subject were gathered (3 levels × 5 sets × 10 questions). The experimental sequence was randomized to prevent subjects from experiencing any learning effect. The subjects were 6 male university students ranging from 23 to 33 years of age. They had normal visual acuity for this experiment.

Figure 1: Left side (a) shows a sample of a definition statement: “A theater is located on the east side of the police station.” Right side (b) shows a sample of a question statement: “There is a post office on the south side of the theater.”

The task was displayed on a 20 inch LCD monitor positioned 60 cm from the subject. During the experiment, a subject’s eye-movements were observed using a video-based eye tracker (nac:EMR-8NL). The subject rested his head on a chin rest and a small infra-red camera was positioned between the subject and the monitor, 40 cm from the subject. Blink was detected using an aspect ratio of the two diameters. During blink, the lack of eye tracking data was compensated for by the use of a simple, previously used procedure [Nakayama and Shimizu 2004]. The tracker was calibrated at the beginning of the session, and eye-movement was tracked on a 640 by 480 pixel screen at 60 Hz. The accuracy of the spatial resolution of this equipment is noted in the manufacturer’s catalog as being a visual angle of 0.1 degrees.

The tracked eye-movement data was extracted for the duration of time subjects viewed each question statement before the mouse button was pressed. Differences between the captured viewing positions were calculated, and eye-movements were divided into saccades and gazes using a threshold of 40 degrees per second [Ebisawa and Sugiura 1998]. The two-dimensional distribution of eye-movements has not been considered as a feature of eye-movements, although several studies have used these factors in their research [Tatler 2007; Tatler et al. 2007]. Therefore, features of fixation and saccade were mapped in 12 directions using 30 degree steps, in order to present a two-dimensional distribution. Four types of features were summarized for each time the statements were viewed, as follows: the fixation position as the distance in degrees from the center of the screen, fixation duration, saccade length, and saccade duration. Each of these is a 12-dimensional vector, and each can also be noted as a scalar, the mean of its components.

3 Results

3.1 Viewer’s response

The subject’s responses were classified as correct (hits and correct rejections) or incorrect (misses and false alarms) according to the context of the question statement. There was a unique answer for every question, because the question statements were generated using a rule of logic. The reaction time was also measured in milliseconds for all responses. The accuracy of the responses across the number of statements is summarized in Figure 2. The accuracy decreases with the total number of statements. According to the results of one-way ANOVA on the accuracy, the factor of the number of statements is significant (F(2, 10) = 16.5, p < 0.01). The accuracy for a total of 3 statements is significantly higher than for the others (p < 0.01), but there is no significant difference between the accuracy for 5 and 7 statements (p = 0.21). This suggests that the number of statements can be used to control the difficulty of the task, and that the task is easiest for 3 statements and hardest for both 5 and 7 statements. Mean reaction times for both correct and incorrect responses are also illustrated in Figure 2. There are significant differences in reaction times between correct and incorrect responses (F(1, 5) = 109.1, p < 0.01). The factor of the number of statements is not significant (F(2, 20) = 0.6, p = 0.54). This suggests that reaction time is a key factor of response correctness.

Figure 2: Mean accuracy and reaction times for correct and incorrect responses across the number of definition statements. (Axes: accuracy and time in seconds against the number of statements.)

3.2 Feature differences between responses

Extracted features of eye movements for question statements, saccade length, differences in saccade length, saccade frequency and saccade duration, were compared between correct and incorrect responses. The distribution of fixation points is illustrated in Figure 3. The figure shows fixation points covering a horizontal area, in particular on the right-hand side. In this experiment, single sentences were written horizontally, so that subjects viewed them according to the outline. In Japanese, verbs and negation are written at the ends of sentences, so that readers may confirm the relationship between the subject and object in the sentence and whether it is a positive or negative statement. When comparing positions between correct and incorrect responses, significant differences are illustrated with an asterisk mark (*) on the axis (p < 0.01). Significant differences appear in the directions from 150 to 330 degrees and in the 60 degree direction. For all cases, mean distances from the center for incorrect responses were larger than those for correct responses. As most differences appear on the left-hand side, this suggests that a subject’s fixation points stayed at the beginning of a statement when he made an incorrect response. Subjects might have had some trouble starting to read. Subjects also viewed a wider area when they made incorrect responses than when they made correct responses.

Figure 3: Mean fixation position for 12 directions. (Polar plot, radial scale 6 deg; correct and incorrect responses; * marks p < 0.01.)

The distribution of the fixation durations for each direction is similar to the distribution of mean position from the center in Figure 3. The durations on the right-hand side are relatively longer than for the other directions. This means that viewers’ eye movements stayed in this area, at the end of the sentence. When comparing the duration between correct and incorrect responses, the results for the far right-hand side are different from the others. Mean durations for correct responses are significantly longer than durations for incorrect responses in the direction of 0 degrees. For other directions, such as upward, leftward and in the direction of 300 degrees, mean durations for incorrect responses are longer than mean durations for correct responses. This means that the distribution of the duration for correct responses shifts towards the right. These metrics show some of the required indices such as the visual area of coverage and visual attention [Duchowski 2006].

Mean saccade lengths in visual angles are summarized across 12 directions in Figure 4. The mean of the saccade lengths is spread more widely along the horizontal axis. In particular, saccade lengths in horizontally opposite directions are the longest, and their lengths are almost equal. This behavior shows that subjects carefully read the question statements. Also, it may depend on the use of Japanese grammar, because the subject term and the verb are separated horizontally in the text. When comparing the lengths between correct and incorrect responses, the mean saccade length in the reverse direction (180 degrees) is longer for correct responses than it is for incorrect responses. For several other directions (0, 90, 150, 210, 300, and 330 degrees), however, the mean saccade lengths for incorrect responses are longer than the saccade lengths for correct responses. Mean saccade durations for incorrect responses are clearly longer than those for correct responses. Though the mean indicates the duration for a single saccade, overall means are quite different, and there are significant differences for all directions between correct and incorrect responses. This suggests that saccadic movement seems to be slower when the viewer’s responses are incorrect.

Figure 4: Mean saccade length across 12 directions. (Polar plot, radial scale 5 deg; correct and incorrect responses; * marks p < 0.01.)

3.3 Estimation of Answer Correctness

The significant differences in eye movement features between responses were summarized in the above sections. These results suggest that it may be possible to estimate responses from viewers’ eye movement patterns before their decisions are made, and this possibility should be examined. Here, the hypothesis is that there is a relationship between “correct” or “incorrect” responses and the acquired features of eye-movements for a question statement. Feature vectors of eye-movements are noted as V, alternative responses are noted as t, and the acquired data can be noted as (V, t) for each question statement. In this section, the performance of the estimation is determined using various features of eye movements. First of all, all extracted features (24+24 dimensions, V_{24+24}) of eye movements, such as fixation and saccade, are applied to a discrimination function. For the estimation function, support vector machines (SVM) are used for this analysis because SVM is quite robust for high dimensionality features and poorly defined feature fields [Stork et al. 2001]. Here, a sign function which is based on the SVM function and a Gaussian kernel is defined as G. The parameters and functions can be noted as follows:

$$t \in \{+1\ (\text{correct}),\ -1\ (\text{incorrect})\}$$

$$\hat{t} = G(V_{24+24}), \qquad \hat{t} \in \{+1\ (\text{correct}),\ -1\ (\text{incorrect})\}$$

The optimization for the function G(V) was conducted using LIBSVM tools [Chang and Lin 2008]. For the SVM, the penalty parameter of the error term C for the soft margin, and the γ parameter as a standard deviation of the Gaussian kernel, should be optimized.

To extract validation results, the leave-one-out procedure was applied to the estimation. Training data consisting of the data of all subjects except the targeted subject was prepared. Training and estimation of responses were then conducted. The results were tallied, and the mean performance was evaluated. The estimation results for three statements are summarized in Table 1, for all features (t vs. t̂). The rate for correct decisions consisting of hits and correct rejections is 76.3%. This discrimination performance is significant according to the binomial distribution.

Table 1: Discrimination results for a condition (f(24)s(24): No. of statements=3).

| Subject’s response [t] | Estimation [t̂]: Correct | Estimation [t̂]: Incorrect | Total |
|---|---|---|---|
| Correct | 158 | 49 | 207 |
| Incorrect | 25 | 68 | 93 |
| Total | 183 | 117 | 300 |

The estimation performance is often evaluated using an ROC (Receiver Operating Characteristics) curve, which is based on signal detection theory [Fawcett 2006]. LIBSVM tools can provide a probability of the discrimination [Chang and Lin 2008], and the ROCs are then created for each level of statements using the probability [Fawcett 2006]. Furthermore, the validation performance of the discrimination is assessed using AUC (Area Under the Curve). The AUC varies between 0 and 1, and the performance is better when the value approaches 1. When AUC is near 0.5, the performance is at the chance level.

Other feature sets of eye-movements, such as fixation and saccades, were applied to the same estimation procedure. Estimation performances and AUCs for each number of statements are summarized in Table 2. The 12 features for fixations consist of the fixation positions across 12 directions. The 13 features for fixation consist of the 12 fixation positions plus a scalar of the fixation duration, and the 24 features consist of the 12 fixation positions and the 12 fixation durations. For saccades, the 12 features consist of the saccade lengths across 12 directions. The 13 features consist of the 12 saccade lengths plus a scalar of the saccade duration, and the 24 features consist of the 12 saccade lengths and the 12 saccade durations. As references, performances using a combination of scalar features were calculated. The combination “A” shows the performance when selected features (four features of saccades) are applied to the estimation [Nakayama and Hayashi 2009]. Another combination “B” shows the performance when another set of selected features for all saccades (four features) [Nakayama and Takahasi 2008] is applied to the estimation. The estimation procedure is based on the previous study. Neither feature set “A” nor “B” includes the reaction time factor in this paper, because that factor affected the performance.

Table 2: Estimation accuracy using feature vectors.

Estimation accuracy (%)

| Tasks | A | B | f(12) | f(13) | f(24) | s(12) | s(13) | s(24) | f(12)s(12) | f(12)s(13) | f(24)s(24) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 68.3 | 43.0 | 65.0 | 65.7 | 73.0 | 77.3 | 78.7 | 76.0 | 76.0 | 75.7 | 76.3 |
| 5 | 50.3 | 55.7 | 68.3 | 68.7 | 65.3 | 65.7 | 65.3 | 64.0 | 66.7 | 68.7 | 70.7 |
| 7 | 44.3 | 61.0 | 67.6 | 67.0 | 61.6 | 67.0 | 66.0 | 66.7 | 68.3 | 67.7 | 68.7 |
| M | 54.3 | 53.2 | 67.0 | 67.1 | 66.6 | 70.0 | 70.0 | 68.9 | 70.3 | 70.7 | 71.9 |

AUC: Area under a curve

| Tasks | A | B | f(12) | f(13) | f(24) | s(12) | s(13) | s(24) | f(12)s(12) | f(12)s(13) | f(24)s(24) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 0.37 | 0.75 | 0.72 | 0.73 | 0.81 | 0.80 | 0.82 | 0.80 | 0.83 | 0.84 | 0.83 |
| 5 | 0.44 | 0.70 | 0.74 | 0.73 | 0.73 | 0.70 | 0.71 | 0.71 | 0.75 | 0.76 | 0.76 |
| 7 | 0.42 | 0.71 | 0.75 | 0.73 | 0.71 | 0.75 | 0.73 | 0.73 | 0.77 | 0.78 | 0.78 |
| M | 0.41 | 0.72 | 0.73 | 0.73 | 0.75 | 0.75 | 0.75 | 0.75 | 0.78 | 0.79 | 0.79 |

Columns A and B are the scalar combinations; f(·) are fixation features, s(·) are saccade features, and f(·)s(·) are fixation plus saccade features.
(13): 12 features of vector information plus the scalar of the duration.
A: mean saccade length, mean differential ratio, saccade frequency, mean saccade duration [Nakayama and Hayashi 2009]
B: saccade length, dx, dy, saccade duration of every saccade [Nakayama and Takahasi 2008]

As a result of the estimations in Table 2, the best performance is obtained using all features of fixation and saccade across 12 directions. According to the table, the estimation performance using saccade features is higher than the performance using fixation features. When the estimation was conducted using fixation or saccade features, the performance was practically independent of the number of feature dimensions. A combination of features including fixation and saccade gives the highest performance. When the two sets of selected features “A” and “B” were applied to the estimation, the performance was not significant. A few estimations were less than 50% accurate. This result provides evidence that the directional information across 12 directions is quite significant. The AUC metrics are also highest when the estimation uses the combined fixation and saccade features. Table 1 shows the rate of false alarms, which is the number of correct responses estimated for incorrect responses. When the number of false alarms is smaller, the AUCs are higher. Additionally, both the estimation accuracy and the AUC metric for the three statement condition are higher than for the other conditions with 5 and 7 statements. According to Figure 2, the response accuracy for three statements is significantly higher than for the other conditions. This response accuracy may affect both the estimation accuracy and the AUCs.

4 Summary

To estimate viewers’ contextual understanding using features of eye-movements, features were extracted and compared between correct and incorrect responses when alternative responses to question statements concerning several definition statements were offered. Twelve directional features of eye-movements across a two-dimensional space were created: fixation position, fixation duration, saccade length and saccade duration. In a comparison of these features between correct and incorrect responses, there were significant differences in most features. This provides evidence that features of eye-movements reflect the viewer’s contextual understanding. An estimation procedure using Support Vector Machines was developed and applied to the experimental data. The estimation performance and accuracy were assessed across several combinations of features. When all extracted features of eye-movements were applied to the estimation, the estimation accuracy was 71.9% and the AUC was 0.79. The number of definition statements affected estimation performance and accuracy.

References

CHANG, C., AND LIN, C., 2008. LIBSVM: A library for support vector machines (last updated: May 13, 2008). Available 21 July 2009 at URL: /˜cjlin/libsvm.

DUCHOWSKI, A. T., 2006. High-level eye movement metrics in the usability context. Position paper, CHI2006 Workshop: Getting a Measure of Satisfaction from Eyetracking in Practice.

EBISAWA, Y., AND SUGIURA, M. 1998. Influences of target and fixation point conditions on characteristics of visually guided voluntary saccade. The Journal of the Institute of Image Information and Television Engineers 52, 11, 1730–1737.

EHMKE, C., AND WILSON, S. 2007. Identifying web usability problems from eye-tracking. In Proceedings of HCI 2007, British Computer Society, L. Ball, M. Sasse, C. Sas, T. Ormerod, A. Dix, P. Bagnall, and T. McEwan, Eds.

FAWCETT, T. 2006. An introduction to ROC analysis. Pattern Recognition Letters 27, 861–874.

JACOB, R. J. K., AND KARN, K. S. 2003. Eye tracking in human–computer interaction and usability research: Ready to deliver the promises. In The Mind’s Eye: Cognitive and Applied Aspects of Eye Movement Research, Hyona, Radach, and Deubel, Eds. Elsevier Science BV, Oxford, UK.

NAKAMICHI, N., SHIMA, K., SAKAI, M., AND ICHI MATSUMOTO, K. 2006. Detecting low usability web pages using quantitative data of users’ behavior. In Proceedings of the 28th International Conference on Software Engineering (ICSE’06), ACM Press.

NAKAYAMA, M., AND HAYASHI, Y. 2009. Feasibility study for the use of eye-movements in estimation of answer correctness. In Proceedings of COGAIN2009, A. Villanueva, J. P. Hansen, and B. K. Ersboell, Eds., 71–75.

NAKAYAMA, M., AND SHIMIZU, Y. 2004. Frequency analysis of task evoked pupillary response and eye-movement. In Eye Tracking Research and Applications Symposium 2002, ACM Press, New York, USA, S. N. Spencer, Ed., ACM, 71–76.

NAKAYAMA, M., AND TAKAHASI, Y. 2008. Estimation of certainty for responses to multiple-choice questionnaires using eye movements. ACM TOMCCAP 5, 2, Article 14.

PUOLAMÄKI, Y., SALOJÄRVI, J., SAVIA, E., SIMOLA, J., AND KASKI, S. 2005. Combining eye movements and collaborative filtering for proactive information retrieval. In Proceedings of ACM-SIGIR 2005, ACM Press, New York, USA, A. Heikkil, A. Pietik, and O. Silven, Eds., ACM, 145–153.

RAYNER, K. 1998. Eye movements in reading and information processing: 20 years of research. Psychological Bulletin 124, 3, 372–422.

STORK, D. G. R., DUDA, O., AND HART, P. E. 2001. Pattern Classification, 2nd ed. John Wiley & Sons, Inc. Japanese translation by M. Onoue, New Technology Communications Co., Ltd., Tokyo, Japan (2001).

TATLER, B. W., WADE, N. J., AND KAULARD, K. 2007. Examining art: dissociating pattern and perceptual influences on oculomotor behaviour. Spatial Vision 21, 1-2, 165–184.

TATLER, B. W. 2007. The central fixation bias in scene viewing: Selecting an optimal viewing position independently of motor biases and image feature distributions. Journal of Vision 7, 14, 1–17.
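Supplementary sketch: the AUC values reported in Table 2 summarize how well a score separates correct from incorrect responses. One way to compute such an area, shown here under stated assumptions, is the pairwise (Mann-Whitney) formulation, which equals the area under the ROC curve. This is not the authors' code: the function name is hypothetical, labels follow the paper's +1 (correct) and -1 (incorrect) convention, and `scores` stands in for whatever probability or decision value the classifier produces.

```python
def auc_from_scores(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: +1 (correct response) or -1 (incorrect response)
    scores: classifier outputs, higher meaning "more likely correct"
    A tie between a positive and a negative score counts as 0.5.
    """
    pos = [s for t, s in zip(labels, scores) if t == +1]
    neg = [s for t, s in zip(labels, scores) if t == -1]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    # Fraction of (positive, negative) pairs ranked in the right order.
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical scores for four responses.
    print(auc_from_scores([+1, -1, +1, -1], [0.9, 0.8, 0.4, 0.3]))
```

A value near 1 means positives are almost always scored above negatives, and a value near 0.5 corresponds to the chance level described in Section 3.3. The quadratic loop is adequate for a few hundred responses; a rank-based version would be preferable for large data sets.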
