Imagine you need to find out how customers navigate in a store and which elements and products catch their eye. You could interview people and ask them what they saw and in which order. However, such results are subject to reporting bias: memory is imperfect at best, and people are not always willing to tell all they remember. Similar challenges exist in all studies of user and consumer behaviour: advertising, web pages, gaming, usability, and package design.
One way to determine the focus of visual attention and the path of the gaze quantitatively and objectively is eye tracking.
Eye tracking – what is it?
Different application areas place different requirements on eye tracking systems, in terms of characteristics such as spatial and temporal accuracy, level of intrusiveness, and ease of setup and calibration. Considerable advances have been made in eye tracking techniques in recent years, transforming eye tracking from a technology suitable only for rather cumbersome and limiting laboratory experiments into an increasingly versatile research tool that can be applied in many types of consumer and user studies in widely varying environments.
A number of different approaches and techniques have been used over the years for tracking eye movements, ranging from electromyography of ocular muscles to application of special contact lenses.
The most widely used method, especially in user and consumer research applications, is optical eye tracking. It is based on comparing the relative positions of the pupil and a reflection on the cornea. This relative position changes when the focus of gaze shifts but is insensitive to small motions of the whole head.
Infrared light reflected from the retina at the back of the eye is used to track the pupil location with computer vision methods. Light reflected from the cornea creates a pinpoint highlight in the eye, the corneal reflection, which can likewise be located and tracked in the video image of the eye. After calibration of the setup, these data allow the gaze projection to be computed in real time.
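The exact mapping from eye image features to screen coordinates is vendor-specific, but as an illustration of the principle, the following sketch (with illustrative function names, not any particular vendor's algorithm) fits a low-order polynomial from the pupil-to-corneal-reflection vector to known on-screen calibration points:

```python
import numpy as np

def fit_gaze_mapping(pg_vectors, screen_points):
    """Fit a second-order polynomial mapping from pupil-glint
    vectors (x, y) to known on-screen calibration points."""
    x, y = pg_vectors[:, 0], pg_vectors[:, 1]
    # Design matrix with polynomial terms: 1, x, y, x*y, x^2, y^2
    A = np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])
    coeffs, *_ = np.linalg.lstsq(A, screen_points, rcond=None)
    return coeffs

def map_gaze(pg_vector, coeffs):
    """Estimate the on-screen gaze point for a new pupil-glint vector."""
    x, y = pg_vector
    a = np.array([1.0, x, y, x * y, x**2, y**2])
    return a @ coeffs
```

In practice, calibration uses a grid of known target points shown on the display; once the coefficients are fitted, each incoming pupil-glint measurement is mapped to screen coordinates in real time.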
In consumer and user research, gaze is typically recorded at a 50–100 Hz sampling rate, while in fields such as neuroscience and psychology, high-speed cameras sampling in the 1000 Hz range are commonly used.
The two main eye tracking device types are remote and head-mounted setups.
In remote systems the cameras are positioned facing the person, usually just below the visual field at a fixed distance from the eyes. In many settings the device is located underneath a computer display, but larger display screens further from the person are also possible, as long as the field-of-view geometry remains the same. The computed gaze vector is then projected onto the stimulus display and reported as coordinates.
Head-mounted systems are goggles with the cameras built into the frames. Typically the system also includes a forward-pointing video camera that records the person’s visual field and serves as the projection plane for the gaze. The choice of setup depends on the context: the remote setup suits stimulus materials presented in 2D, is fully unobtrusive, and its data analysis is usually straightforward, while eye tracking goggles are the tool of choice for 3D contexts, such as shopping behaviour or handling real packages or devices. Goggle data analysis, however, is often quite tedious and hard to summarize at the group level.
Tracking the user: visual attention and navigation
The eye–mind hypothesis assumes that what someone is looking at is associated with what she is paying attention to and thinking about.1
While it is easy to think of real-life examples where attention and gaze do not coincide, there is typically a strong link between the two, especially when a person is looking at something with a goal in mind. By revealing the gaze path, eye tracking thus provides information about the focus of a person’s attention. Moreover, the gaze reacts very quickly, and the first sub-second interval of viewing a novel scene can be used to reveal the targets of non-conscious attention.
The visually salient features that capture our immediate attention are very much biologically hard-wired: we notice faces, especially with emotionally loaded facial expressions, we follow directional cues to see what people are looking at, and we notice motion.
Visual attention and how it shifts between different objects is a central aspect of the behaviour and experiences of users and consumers in various application fields.2 In the continuing quest to better understand consumers and users and to design products, services, and marketing communications that better meet their various needs and preferences, eye tracking is a potent research tool that can yield valuable objective information that cannot be acquired by other means. In this article we show, through selected examples from our prior research, what kind of information can be obtained through the collection and analysis of eye tracking data. The examples also provide a glimpse of some of the application areas for eye tracking.
Figure 1 visualizes the gaze path of a single person visiting the main page of an experimental web site. The study focused on developing analysis methods for classifying and describing different kinds of attention transition paths in digital services.
The circles indicate fixations, during which the gaze was focused on a certain location. The size of each circle indicates the duration of the fixation, which is related to the attention paid to the element. Fixations typically land on visually salient elements such as headers, pictures, logos, or navigation bars. Consecutive fixations are connected by saccades, quick eye movements during which the visual system receives no information.
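For illustration, fixations of this kind can be extracted from raw gaze samples with a dispersion-based algorithm (I-DT): consecutive samples are grouped into a fixation when they stay within a small spatial window for long enough. This is a simplified sketch, and the threshold values are illustrative rather than recommendations:

```python
import numpy as np

def dispersion(window):
    """Spatial spread of a sample window: (max x - min x) + (max y - min y)."""
    return np.ptp(window[:, 0]) + np.ptp(window[:, 1])

def detect_fixations(gaze, sample_rate_hz=60,
                     dispersion_px=35, min_duration_ms=100):
    """Return (centroid, duration_ms) pairs for detected fixations.
    gaze: (n, 2) array of gaze coordinates in pixels."""
    min_samples = int(min_duration_ms * sample_rate_hz / 1000)
    fixations = []
    start = 0
    while start + min_samples <= len(gaze):
        end = start + min_samples
        if dispersion(gaze[start:end]) <= dispersion_px:
            # Grow the window while the samples stay close together
            while end < len(gaze) and dispersion(gaze[start:end + 1]) <= dispersion_px:
                end += 1
            centroid = gaze[start:end].mean(axis=0)
            duration_ms = (end - start) * 1000 / sample_rate_hz
            fixations.append((centroid, duration_ms))
            start = end
        else:
            start += 1  # no fixation starting here; slide the window
    return fixations
```

Samples falling between the detected fixations then correspond to the saccades connecting them.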
In interactive products and digital services, a smooth flow of visual attention is a central factor in a good user experience, necessary both for attracting potential users to start using the service and for engaging them to keep using it. In simpler interfaces, e.g. for filling in one’s personal information or booking a flight, eye tracking can supplement performance measures, such as the time taken to perform a certain task, and user satisfaction measures collected via interviews and questionnaires.
Participants in a usability test may not always be able to explain why a certain visual user interface is problematic for them. Eye tracking can reveal hard-to-pin-down problems in finding information or in transferring attention between user interface elements. More significantly, eye tracking can suggest how the design could be improved for a smoother user experience, or indicate which of several design versions provides a better user experience – in all phases of an iterative design process.
Figure 2. Heat map visualization of gaze data from 34 healthy, normal-weight women viewing an exercise course brochure for 60 seconds. The heat maps are computed for the intervals 0–10 seconds (left), 10–30 seconds (middle) and 30–60 seconds (right).
In the case of a web page, the situation is more complex, as users can have a wide variety of explicit or implicit goals, from quickly finding a specific piece of information to entertaining themselves by unhurriedly reading any articles they might find interesting. Whatever the reason for the visit, the experience of using the service largely determines its value to the user.
In conjunction with other user research methods, eye tracking can help to identify and understand different types of users and suggest how the design could be improved to better cater to their variable needs and values.
Gaze paths of individual users can provide valuable qualitative insights into how people actually use the service, which may be quite different from the assumptions made by the design team. However, design choices cannot be based on idiosyncrasies of individual people. In order to provide actionable results for design choices, eye tracking must answer questions concerning the visual attention of larger groups of people. Do they notice and understand the key interface elements? Is their attention drawn to the correct areas of marketing communication material? Would a different layout provide a better user experience and increased value? To answer these questions, group-level statistics of visual attention need to be extracted and communicated further.
Heat maps are a widely used – and misused – way of visualizing eye tracking data. In heat maps the stimulus material is overlaid with colours indicating the areas of most intense visual attention. Heat maps do not tell us whether specific elements were or were not noticed but they do provide an illustrative summary of the relative distribution of attention, either for an individual person or for a larger group of people.
Figure 2 shows cumulative heat maps from three viewing intervals. During the first 10 seconds, the users check the title of the brochure and read the text following it. Later the attention spreads more widely, but it is notable that the visual information and the logo attract relatively little attention. Presenting such data as the gaze paths of all 34 users would not be visually informative, but time-dependent heat maps or heat map videos summarize the overall navigation quite nicely.
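As an illustration of how such a heat map can be computed (a simplified sketch, not the algorithm of any particular analysis software): fixation durations are accumulated on a grid matching the stimulus and then smoothed with a Gaussian kernel, so that each fixation contributes a blob of "heat" proportional to its duration:

```python
import numpy as np

def gaussian_blur(grid, sigma):
    """Separable Gaussian smoothing via two 1D convolution passes."""
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="same"), 1, grid)
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="same"), 0, rows)

def heat_map(fixations, width, height, sigma=40):
    """fixations: iterable of ((x, y), duration_ms) pairs.
    Returns a (height, width) array of attention intensity."""
    grid = np.zeros((height, width))
    for (x, y), duration in fixations:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= xi < width and 0 <= yi < height:
            grid[yi, xi] += duration  # accumulate dwell time per pixel
    return gaussian_blur(grid, sigma)
```

Time-dependent heat maps, as in Figure 2, are obtained simply by restricting the input fixations to the desired viewing interval before accumulation.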
The picture on the left in Figure 3 shows a cumulative heat map from 40 participants viewing an experimental still-image advertisement for five seconds. The heat map quickly reveals that the face of the person and the lotion bottle were the main foci of visual attention, while the shoes and the feet received less attention. By specifying these four elements as areas of interest (AOIs), descriptive summative eye tracking statistics can be calculated, providing quantitative information about the visual attention.
The statistics, shown in the picture in the centre, include the dwell time, both in milliseconds and as a percentage of the total viewing time, and the hit ratio, i.e. the percentage of participants who had at least one fixation in the given area of interest. We see, for example, that around 25 per cent of the time was spent looking at the lotion bottle and 23 per cent at the woman’s face, while the shoes and feet received less attention, consistent with the heat map.
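These AOI statistics are straightforward to compute from per-participant fixation data. The sketch below assumes rectangular AOIs and hypothetical field names, purely for illustration:

```python
def aoi_statistics(participants, aois, total_ms):
    """participants: list of per-participant fixation lists,
    each fixation a ((x, y), duration_ms) pair.
    aois: dict mapping AOI name to a (x0, y0, x1, y1) box.
    Returns per-AOI mean dwell time, dwell percentage, and hit ratio."""
    n = len(participants)
    stats = {}
    for name, (x0, y0, x1, y1) in aois.items():
        dwell = hits = 0
        for fixations in participants:
            in_aoi = [d for (x, y), d in fixations
                      if x0 <= x <= x1 and y0 <= y <= y1]
            dwell += sum(in_aoi)
            hits += bool(in_aoi)  # at least one fixation in the AOI
        stats[name] = {
            "dwell_ms": dwell / n,
            "dwell_pct": 100 * dwell / (n * total_ms),
            "hit_ratio_pct": 100 * hits / n,
        }
    return stats
```

For real-world scenes (discussed later in the article), the main difficulty is not this computation but mapping each fixation onto the correct AOI in a dynamic, per-participant view.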
Figure 3. Heat map on the left visualizes the main focus of attention for 40 participants of an eye-tracking experiment. In the centre examples of group-level statistics of visual attention associated with four areas of interest (AOI) are shown. The picture on the right shows a path visualizing typical transitions of attention between the AOIs during the first moments of viewing the picture.
The heat maps and statistics discussed so far provide useful information about the focus of attention, averaged over many persons. However, they lack information about the transitions of attention between different areas, another potentially useful piece of information for the design process.
Gaze paths work well for a single person (Figure 1), but the visualization quickly becomes too cluttered for a larger group. It is generally challenging to define or describe a group-level average gaze path in a meaningful way. However, the transitions of attention during the first moments of viewing can be visualized by first calculating the average time of the first fixation in each AOI and then connecting the AOIs based on their first entry times (picture on the right in Figure 3). This visualization is easy to interpret, but such simplified averaging may hide more intricate information, such as transitions of attention back and forth between two elements.
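The first-entry ordering described above can be computed as follows (a minimal sketch; the AOI names used in the test are illustrative):

```python
def first_entry_order(participants_first_times):
    """participants_first_times: list of dicts mapping AOI name to the
    time (ms) of that participant's first fixation in the AOI; an AOI
    is absent from the dict if the participant never fixated it.
    Returns AOI names ordered by average first-entry time."""
    sums, counts = {}, {}
    for times in participants_first_times:
        for aoi, t in times.items():
            sums[aoi] = sums.get(aoi, 0) + t
            counts[aoi] = counts.get(aoi, 0) + 1
    averages = {aoi: sums[aoi] / counts[aoi] for aoi in sums}
    return sorted(averages, key=averages.get)
```

Connecting the AOIs in this order yields the kind of typical attention-transition path shown on the right in Figure 3.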
In our preliminary, as yet unpublished tests, we have obtained promising results with multivariate analysis methods for the automated identification and visualization of different types of gaze paths in the context of digital services.
In the examples presented so far we have dealt with static stimuli (still images and web pages without dynamic content). Naturally, the remote setup can be applied to viewing dynamic content as well: videos, gaming – basically anything in which the gaze can provide useful information. The remote setup suits relatively stationary observers looking at a stationary display device. Eye tracking glasses, shown on the left in Figure 4, allow the participant to move relatively freely in a real-world environment, such as the grocery store seen in this example.
Mobile head-mounted eye trackers capture a video recording of the scene viewed by the person. They also calculate and store the gaze position in real time (at 30 Hz in the Figure 4 setup). Based on these data, the person’s focus of attention at each moment can be visualized in a number of different ways. A live heat map, as seen on the right in Figure 4, shows the focus and transitions of visual attention during the shopping trip.
Areas of interest can be specified for real-world scenes as well but, in contrast to the previous examples, a considerable amount of manual work is required to arrive at AOI statistics. The complication here is that the dynamic stimulus varies from one participant to another. 3D modelling and computer vision methods hold promise for automating this process, and we have obtained promising preliminary results with this approach. Similar challenges and solutions also apply to the analysis of, for example, game performance and social interaction, even when a remote eye tracking setup is used.
Complementing eye tracking information
The information on visual attention can be complemented by other measurements. When measuring the user experience, biosignals from the brain and body can be used to better characterize the quality of attention: the level of arousal, the direction of approach motivation, and even reflections of emotions.
Eye tracking has also been combined with brain imaging, providing more specific information about the cognitive processes involved in, for example, social interaction.3 Gaze data can also be used to pinpoint moments of interest, which vary in time between subjects: when is this subject viewing the target item, and what is the emotional state at that moment? Viewing a video recording augmented with a visualization of their own gaze was found to be very motivating for the participants in a study of digital news reading services.4 The participants’ retrospective comments about specific moments of interest in their interaction with the service provided useful information for interpreting and integrating the eye tracking data with data from questionnaires, interviews, and preference judgments.
In both digital and real-world environments, various other kinds of data can be collected that, when analysed together with the eye tracking data, help to build a more comprehensive picture of the behaviour and experiences of customers and users.
In the case of digital services, such as the web site of Figure 1, click data complements eye tracking data, providing a comprehensive log of the users’ interactions with the service and allowing their paths from one web page to another to be tracked.
In the case of real-world shopping environments, VTT’s people tracking system based on depth camera technology (VTT Impulse 2/2014) reveals the physical paths of all people in the area. It was used in the grocery store study (Figure 4) together with eye tracking in order to understand the navigation routes and the investment of visual attention of different types of shoppers in different parts of the shop.
Figure 4. On the left, a participant of a shopping experience study wears eye tracking glasses while visiting a grocery store. On the right, a frame from a live heat map video visualizing the focus and transition of attention during the shopping trip.
Thinking outside the box – future applications
The major change in the eye tracking scene has been technological development, which has both made the systems easier to use and reduced the prices of the tools. The price of a good-quality consumer study device is around €20,000, and basic use requires no special training. Data transfer interfaces to different devices exist as well. This development will certainly continue, and eye tracking hardware may eventually be integrated into, for example, laptops and display screens.
As eye tracking data is recorded on-line, it can be used as input to external software. User interfaces can be controlled by gaze: a fixation on an element, lasting for a required time, triggers an action. Shutting the eyes (interrupting pupil detection) or looking outside the stimulus display can be used as control signals as well. These control options are attractive for several applications and user groups, one obvious target group being people with physical disabilities.
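A dwell-time trigger of this kind can be sketched as follows (the element names and the threshold value are illustrative, not taken from any particular system):

```python
def dwell_trigger(samples, dwell_ms=800):
    """samples: list of (timestamp_ms, element_or_None) pairs, where the
    second item names the UI element the gaze currently hits, or None.
    Returns the elements whose dwell threshold was reached, in order."""
    triggered = []
    current, since = None, None
    for t, element in samples:
        if element != current:
            # Gaze moved to a new element (or off all elements): restart timer
            current, since = element, t
        elif element is not None and t - since >= dwell_ms:
            triggered.append(element)        # dwell threshold reached: fire
            current, since = None, None      # reset to avoid repeated firing
    return triggered
```

A real gaze-control interface would add visual feedback of the dwell progress so the user knows an action is about to fire, which helps avoid the so-called Midas touch problem of unintended activations.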
Another interesting field is operating vehicles and machinery (partly) by gaze, or using gaze statistics to assess user or driver alertness. VTT’s Transport team has investigated, for example, the effect of new in-car displays on drivers’ visual attention and its connection to traffic safety.
Both virtual reality headsets and see-through glasses for augmented reality are inherently well-suited for integration of eye-tracking capabilities, and there are indeed already first prototypes and developer solutions for such integration.
Figure 5. Detection of spontaneous responses to hedonic stimuli using a remote eye tracking setup with monitoring of the brain (EEG cap), the autonomic nervous system (skin conductance, the finger straps) and facial expressions (EMG electrodes on facial muscles). The eye tracking device is the bar below the screen. VTT Neurosensing lab.
In virtual reality headsets, eye tracking enables foveated rendering, a technique for optimizing the rendering resolution in the area where the user is looking. Analogous to the workings of the human visual system, foveated rendering is regarded as essential to the next generation of virtual reality experiences. In virtual reality, gaze control can also be used in an intuitive manner in combination with gestures and voice control. The promise of gaze-based interaction is perhaps even greater in augmented reality solutions, where information can be overlaid based on the user’s attention, supporting professionals in complex and challenging tasks in fields such as manufacturing, logistics, technical service, and medical treatment. Eye tracking is expected to become a standard functionality in VR headsets and AR glasses.
The Internet of Things will equip everyday objects with sensors and communication capabilities. In the relatively near future we will be able to track users’ eyes on practically every device they use, with increasing accuracy and processing power. For scientists, this will enable large-scale eye tracking studies in natural environments. Gaze-based interaction will also open novel opportunities for creating smart environments: devices and user interfaces can be operated by gaze, and the environment will learn and adapt to the user’s routines and needs. Outside homes, the retail, marketing, media, and even manufacturing industries can apply similar ideas for smoother, more discreet, and more efficient processes.
Johanna Närväinen, PhD (medical physics), works as a senior scientist in VTT’s Digital Health team.
Närväinen is interested in the interaction between physiological responses, personality factors measured in various ways, and implicit and explicit preferences – and combining the understanding of such interaction with the theme of health behaviour. In her work, she uses a range of measurement techniques from brain imaging to activity wrist bands, while the related applications range from the neurobiology of eating behaviour to the identification of stress.
Janne Laine, M.Sc. (engineering), works as a senior scientist in VTT’s Digital Services in Context team. He is currently finishing his doctoral thesis on measurement and modelling of visual perception and user experiences.
Interested in all things visual, with a background in imaging technology and visual psychometry, his current work focuses mainly on understanding the relations between visual and other design variables of various kinds of services and the user and customer experiences and values, and in using this information to guide the design process. He is also interested in the use of serious games and gamification in different application areas.
1. Hoffman, J. E., 1998. “Visual attention and eye movements,” in: Pashler, H. (ed.), Attention, Psychology Press, UK, pp. 119–154.
2. Orquin, J. L. and Loose, S. M., 2013. “Attention and choice: A review on eye movements in decision making,” Acta Psychologica 144(1), 190–206.
3. Wilms, M. et al., 2010. “It’s in your eyes – using gaze-contingent stimuli to create truly interactive paradigms for social cognitive and affective neuroscience,” Social Cognitive and Affective Neuroscience (SCAN) 5, 98–107.
4. Laine, J., 2016. “Experimental comparison of the user experiences of different digital and printed newspaper versions,” Journal of Print and Media Technology Research (under review).