


In our previous article, we explored historical and theoretical aspects of the Mental Workload construct, delved into the nature of MWL and its development over time, and learned about some existing models intertwined with this concept.  


But what is the point of explaining the theory behind it without the practical aspects of conducting research? Practical measures are the daily bread of any researcher, as they are essential for understanding how people interact with technology and environments. There are several approaches to measuring MWL, each with its strengths and weaknesses.  


Physiological measures like heart rate variability (HRV) and eye tracking may offer insights into the body’s response to cognitive demands. Rating scales like the NASA-TLX can give us a closer look at participants’ subjective experiences, while task-dependent measures account for the specific nature of the tasks and the cognitive demands they impose. By evaluating these methods, we aim to provide a brief yet comprehensive overview of how they contribute to our understanding of MWL and the challenges they present.  


Before we discuss some of the existing methods related to MWL, it is worth noting that not every measurement technique targets the same aspect of MWL. Part of the confusion stems from the fact that there's no agreed-upon definition of workload, as I mentioned in the first article. Scientists use the term to refer to the demands on the user, the effort the user puts into meeting those demands, or the effects of trying to meet them (Huey and Wickens, 1993). 


Subjective measures 

Rating scales are among the most basic measurement tools when it comes to MWL. These scales are subjective as they rely on the participant's reported feelings and thoughts, unlike objective indicators found in physiological or task-dependent measures. Mental workload scales, as described by Hart and Wickens (1990), can be divided into three categories: 


  1. Global -> a single overall workload rating given during or after performance of the task. 

  2. Multidimensional -> subscale ratings pinpoint specific sources of workload, while combining these ratings gives a clear picture of the overall workload experienced. 

  3. Hierarchical -> raters make a series of choices, each one narrowing down the options until they reach a final numeric rating. 


So, let’s talk about the pros and cons of subjective techniques for measuring MWL. Starting with the pros, subjective techniques capture how individuals personally interpret their workload, giving us researchers insight into their experience and effort. These methods are generally simple to administer and understand, making them accessible and less intrusive than physiological measures. Well-designed rating scales like the Workload Profile (WP) or NASA-TLX are highly sensitive to changes in task difficulty and can accurately reflect variations in mental workload (e.g., Rubio et al., 2004). Finally, these methods are cost-efficient. 
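To make the multidimensional idea concrete, here is a minimal Python sketch of NASA-TLX-style scoring: six 0–100 subscale ratings are either averaged directly ("Raw TLX") or weighted by pairwise-comparison tallies that sum to 15. The ratings and weights below are invented for illustration, not data from any study.

```python
# Minimal sketch of NASA-TLX scoring. Subscale ratings are assumed to be
# on the standard 0-100 scale; `weights` are pairwise-comparison tallies
# (0-5 per subscale, summing to 15 across the six subscales).
SUBSCALES = ["mental", "physical", "temporal", "performance", "effort", "frustration"]

def tlx_overall(ratings, weights=None):
    """Overall workload: (weighted) mean of the six subscale ratings.

    With no weights, returns the unweighted "Raw TLX" mean.
    """
    if weights is None:
        return sum(ratings[s] for s in SUBSCALES) / len(SUBSCALES)
    total = sum(weights[s] for s in SUBSCALES)  # 15 in the standard procedure
    return sum(ratings[s] * weights[s] for s in SUBSCALES) / total

# Illustrative (made-up) ratings and pairwise weights:
ratings = {"mental": 70, "physical": 20, "temporal": 55,
           "performance": 40, "effort": 65, "frustration": 35}
weights = {"mental": 5, "physical": 1, "temporal": 3,
           "performance": 2, "effort": 3, "frustration": 1}

print(tlx_overall(ratings))                     # → 47.5 (Raw TLX mean)
print(round(tlx_overall(ratings, weights), 1))  # → 56.3 (weighted score)
```

Note how the weighted score exceeds the raw mean here: the pairwise weights push the heavily weighted "mental" subscale to dominate, which is exactly the mechanism that lets a multidimensional scale pinpoint sources of workload.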


What are the minuses then? I think it is possible to name a few. First, bias and variability: this approach is easily influenced by individuals’ internal factors such as mood or level of cooperativeness. Furthermore, surveys and scales require training participants before actual use, which makes procedures more time-consuming. 


Another issue is that most subjective workload scales – with exceptions such as the NASA-TLX – do not disentangle mental and physical effort. Without questions explicitly separating these factors, we don’t know whether a participant’s high-effort rating on a cognitive task reflects the task being genuinely mentally exhausting, or whether the participant was also rating their moderate-to-high physical effort in that task, so the true score and the given score can diverge. 


To better understand this, let’s illustrate it with a simple example. Suppose a participant is performing a physically demanding task, such as manually sorting heavy objects while following simple instructions. Although the task requires minimal cognitive effort, the participant may report a high workload due to its physical intensity. Here the mental and physical aspects are not clearly separated, which may confuse some participants and produce a mismatch between the true score and the given score. 


Subjective measures are more in tune with mental strain but often miss the mark when it comes to physical or automatic tasks. They mostly pick up on workload changes that people are consciously aware of, so the perception of tasks that are repetitive or require little thought might not show up in the ratings. This means they're great for assessing tech that helps with decision-making, but not so much for evaluating repetitive and/or physical tasks (Vidulich, 1988). 

 

 

Performance measures 

Performance measures can be classified into two major categories: primary task measures and secondary task measures, and they can have various forms depending on the main objective. Primary task measures focus on evaluating the operator’s performance on the main task that the operator is asked to accomplish, while secondary task measures involve adding an extra task to the primary one to assess the operator’s remaining cognitive capacity (Eggemeier & Wilson, 1991).  


Let’s say that your primary task is driving a car, which reflects your driving performance – seems pretty simple. Now let’s add a secondary task that increases your mental workload – using a phone is a perfect match, as it consumes cognitive resources. The two tasks are tightly intertwined: one shows performance, and the other makes it more difficult. The result is useful information about where the combined demands (driving plus phone use) exceed the operator’s capacity, such that performance (driving) degrades from baseline or ideal levels. 
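The driving-plus-phone logic can be sketched in a few lines: measure a primary-task metric with and without the secondary task, and express the change as a percent decrement from baseline. The metric name and numbers below are purely illustrative, not empirical values.

```python
# Hypothetical sketch: quantify primary-task decrement under a secondary task.
def decrement_pct(baseline, dual_task):
    """Percent degradation of a primary-task metric (e.g., mean lane-keeping
    error) from the single-task baseline to the dual-task condition."""
    return 100.0 * (dual_task - baseline) / baseline

# Illustrative numbers: lane-keeping error (m) while driving alone vs.
# driving while using a phone.
baseline_error = 0.20
dual_task_error = 0.29
print(round(decrement_pct(baseline_error, dual_task_error), 1))  # → 45.0 (% worse)
```

A larger decrement at the same secondary-task load suggests that less spare cognitive capacity was available – which is precisely what secondary-task measures are meant to probe.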


Now that we know what primary and secondary tasks are, we can delve right into the pluses and minuses of this approach. 


First, these methods are closely related to real-world tasks, which gives them high ecological validity: findings can be generalized from scientific studies to real-world settings. Another advantage is the variety of indicators available; in in-vehicle tests, for example, these can include speed, accuracy, reaction times, or error rates. This variety provides a comprehensive view of the task being studied (Eggemeier & Wilson, 1991). Lastly, these methods can easily be combined with other approaches, such as physiological or subjective methods. 


Now let’s consider the minuses. The level of performance, while necessary, is not sufficient to paint the bigger picture of mental workload, such as the cognitive or emotional cost involved (Wilson, 2004). Another issue is that a lack of scientific rigor in designing procedures around this approach can hinder the outcomes: there is a lot to cover, from matching the modality of the primary and secondary tasks, through choosing methodologically sound indicators, to the environmental setting (laboratory vs. real world) (Hart & Wickens, 1990; Wickens, 1992). 


 

Physiological measures 

Physiological measures are some of the most fascinating tools we have for gauging mental workload (MWL). When the brain is tasked with higher levels of cognitive demand, it requires more resources to maintain performance, which in turn affects various physiological activities in the body. These activities include heart rate, brain electrical activity, eye movements, pupil size, and even metabolic changes—all of which serve as valid indicators of mental workload (Fairclough & Houston, 2004). 


One particularly insightful physiological measure is pupillometry, which involves measuring pupil size and its reactivity. This method is widely used in research to assess cognitive responses, as changes in pupil diameter can reflect levels of mental workload, attention, and arousal (Beatty & Lucero-Wagoner, 2000; Just et al., 2003). 


For instance, when we concentrate intensely, our pupils undergo quick, irregular changes known as the dilation reflex. This reflex is controlled by two muscle groups: the dilator muscles, which cause the pupil to enlarge, and the sphincter muscles, which typically make it constrict. During periods of intense focus, the dilator muscles are activated while the sphincter muscles are inhibited, leading to a noticeable and brief dilation of the pupils. 


And then there is light – the most influential factor when it comes to pupil dilation, which leads to certain issues. Even though the task-evoked pupillary response (TEPR), a standard pupillometric measure, is a solid way to index working memory load (da Silva Castanheira et al., 2021), it cannot by itself tell whether your pupils are changing size because of light or because of cognitive demand (Weber et al., 2021) – a key limitation, since pupil size varies far more strongly with light than with cognitive demand. It is possible to distinguish the effects of light and cognitive load on pupil dilation, but standardized lighting conditions are needed for this. The matter becomes harder still when one wants to conduct a study in dynamic conditions where lighting changes (e.g., while driving). 
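For a concrete picture of what a TEPR is numerically, here is a deliberately simplified sketch: the mean pupil diameter in the task window minus a pre-stimulus baseline. The sample values are made up, and real pipelines also handle blink removal, filtering, and – as discussed above – luminance control.

```python
# Hedged sketch of a task-evoked pupillary response (TEPR): mean pupil
# diameter during the task window minus the pre-stimulus baseline.
def tepr(samples, baseline_n):
    """Baseline-corrected dilation from a list of pupil-diameter samples (mm).
    The first `baseline_n` samples form the pre-stimulus baseline."""
    baseline = sum(samples[:baseline_n]) / baseline_n
    task = samples[baseline_n:]
    return sum(task) / len(task) - baseline

# Illustrative trace: 3 baseline samples, then dilation under cognitive load.
trace = [3.00, 3.02, 2.98, 3.20, 3.30, 3.25]
print(round(tepr(trace, 3), 2))  # → 0.25 (mm of task-evoked dilation)
```

The baseline subtraction is what makes the measure "task-evoked" – but it is also why uncontrolled lighting is so damaging, since a luminance change shifts both windows unevenly.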


But it’s important to say that physiological measures go well beyond pupillometry. When an individual’s mental capacity is exceeded, this can be seen in heart rate variability measured with electrocardiography (ECG; see e.g., Heine et al., 2017), or through electroencephalography (EEG) using event-related potentials (ERPs; see e.g., Dehais et al., 2019), which reflect electrical activity generated by the cerebral cortex in response to particular, repeated stimuli. 
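On the ECG side, one of the most common time-domain heart rate variability indices is RMSSD – the root mean square of successive differences between RR intervals – with lower values often associated with higher workload. The sketch below uses invented interval values purely for illustration.

```python
# Hedged sketch: RMSSD, a common time-domain HRV index, computed from
# RR intervals (milliseconds between successive heartbeats).
import math

def rmssd(rr_ms):
    """Root mean square of successive differences of RR intervals."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

rr = [812, 798, 805, 790, 820]  # illustrative RR intervals in milliseconds
print(round(rmssd(rr), 1))  # → 18.5
```

A perfectly regular heartbeat gives an RMSSD of zero; the more beat-to-beat variability, the higher the index – which is why a drop in RMSSD under task load can signal that cognitive demands are mounting.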

 


Now, let’s delve into the pros and cons of physiological measures. 

 

A distinct advantage of physiological measures over traditional psychological tools, such as response times, is their ability to provide a richer understanding of what’s happening within the body and mind during cognitive tasks. Unlike behavioral measures that primarily capture the outcomes (like how fast or accurately someone responds), physiological data allow us to see the ongoing, underlying processes, offering insights into how cognitive load fluctuates in real time. This continuous monitoring enables moment-by-moment estimates of changes in processing intensity (Beatty, 1982), revealing more about the mechanisms driving our mental and physical responses. 


What are the minuses then? 


The greatest weakness of psychophysiological measures is the lack of a strong conceptual link from the physiological measures to performance (Kramer, 1991); Wilson and O'Donnell (1988) note that individual physiological measures, while sensitive to specific demands, are insufficient to capture the multifaceted nature of mental workload. For example, in chess-boxing, a hybrid sport combining chess and boxing, physiological measures such as heart rate may reflect the physical demands of boxing but will likely fail to account for the cognitive load experienced during the chess rounds. This highlights the challenge of linking physiological data to performance in tasks that involve both physical and mental complexity. 


Another significant drawback is the cost: these measures are expensive to use, whether for data acquisition, analysis, or broader integration into your research setup. Implementation costs include not only the hardware and software needed to capture physiological data but also the integration of these systems with other parts of the experimental setup (for example, synchronization with a driving simulator), which can be complex and costly. Finally, the psychophysiological literature emphasizes the importance of selecting methods based on their sensitivity and relevance to specific aspects of workload, as physiological measures, while generally indicative of systemic stress, can be misleading if mismatched to the research question at hand (Lysaght et al., 1989). 

 

 

Integration of Multiple Measures 

To get a solid understanding of mental workload, it's often best to use a mix of different measurement methods. Each method—whether it’s subjective, performance-based, or physiological—brings something unique to the table, but they also have their own drawbacks. By combining these various approaches, researchers can get a more complete picture of MWL. This strategy also helps to confirm findings across different methods, offering a clearer view of the cognitive demands involved. 


That said, picking these methods shouldn’t be random. It’s important to consider the research goals, the specific tasks, and the context in which the workload is being measured. Leaning too heavily on just one type of measure, or throwing together a random mix, can produce confusing or misleading outcomes. That’s why it’s a good idea to use a well-chosen set of measures that complement each other in what they can capture. 
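One simple, entirely hypothetical way to combine complementary measures is to standardize each indicator across conditions and average the z-scores into a composite. The measure names and values below are invented for illustration; this is a sketch of the fusion idea, not a validated workload index.

```python
# Hypothetical sketch: fuse several workload indicators into one composite
# by z-scoring each across conditions and averaging the z-scores.
def zscores(xs):
    """Population z-scores of a list of values."""
    mean = sum(xs) / len(xs)
    sd = (sum((x - mean) ** 2 for x in xs) / len(xs)) ** 0.5
    return [(x - mean) / sd for x in xs]

# Rows: three task conditions (easy, medium, hard).
# Columns: subjective (NASA-TLX), performance (reaction time, ms),
# physiological (pupil dilation, mm) -- all values invented.
measures = {
    "tlx":   [35.0, 55.0, 70.0],
    "rt_ms": [420.0, 510.0, 640.0],
    "pupil": [0.10, 0.22, 0.31],
}
standardized = {name: zscores(vals) for name, vals in measures.items()}
composite = [sum(col) / len(standardized) for col in zip(*standardized.values())]
print([round(c, 2) for c in composite])  # → [-1.23, 0.03, 1.21]
```

Standardizing first puts a 0–100 rating, a millisecond latency, and a millimeter dilation on the same footing; whether equal weighting is appropriate is exactly the kind of design decision that should follow from the research goals discussed above.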


In the end, while no single method can cover all aspects of MWL, a smart combination of different techniques can offer the most reliable and useful insights for both researchers and practitioners. 



Author:

Mikołaj Sokołowski, Junior Researcher

 

🔴 Interested in full-driver state detection services, including complex testing for drowsiness, distraction, stress, and driving under the influence? E-mail us at: humanfactors@robotec.ai

🔴 Curious about our latest projects? Follow us on LinkedIn to keep up with our news.


 

Bibliography:  

Beatty, J. (1982). Task-evoked pupillary responses, processing load, and the structure of processing resources. Psychological Bulletin, 91(2), 276. 


Beatty, J., & Lucero-Wagoner, B. (2000). The pupillary system. In Handbook of Psychophysiology. 


Dehais, F., Duprès, A., Blum, S., Drougard, N., Scannella, S., Roy, R. N., & Lotte, F. (2019). Monitoring pilot’s mental workload using ERPs and spectral power with a six-dry-electrode EEG system in real flight conditions. Sensors, 19(6), 1324. 


Eggemeier, F. T., & Wilson, G. F. (1991). Workload assessment in multi-task environments. In D. L. Damos (Ed.), Multiple task performance (pp. 207-216). London, UK: Taylor & Francis, Ltd. 


Fairclough, S. H., & Houston, K. (2004). A metabolic measure of mental effort. Biological Psychology, 66(2), 177-190. 


Granholm, E., Asarnow, R. F., Sarkin, A. J., & Dykes, K. L. (1996). Pupillary responses index cognitive resource limitations. Psychophysiology, 33(4), 457-461. 


Hart, S., & Wickens, C. D. (1990). Workload assessment and prediction. In H. R. Booher (Ed.), MANPRINT: An approach to systems integration (pp. 257-296). New York: Van Nostrand Reinhold. 


Heine, T., Lenis, G., Reichensperger, P., Beran, T., Doessel, O., & Deml, B. (2017). Electrocardiographic features for the measurement of drivers' mental workload. Applied Ergonomics, 61, 31-43. 


Huey, F. M., & Wickens, C. D. (1993). Workload transition: Implications for individual and team performance. Washington, DC: National Academy Press. 


Just, M. A., Carpenter, P. A., & Miyake, A. (2003). Neuroindices of cognitive workload: Neuroimaging, pupillometric and event-related potential studies of brain work. Theoretical Issues in Ergonomics Science, 4(1-2), 56-88. 


Kramer, A. F. (1991). Physiological metrics of mental workload: A review of recent progress. In D. L. Damos (Ed.), Multiple task performance (pp. 279-328). London, UK: Taylor & Francis, Ltd. 


Lysaght, R. J., Hill, S. G., et al. (1989). Operator workload: Comprehensive review and evaluation of operator workload methodologies. Fort Bliss, TX: U.S. Army Research Institute for the Behavioral and Social Sciences. 


Rubio, S., Díaz, E., Martín, J., & Puente, J. M. (2004). Evaluation of subjective mental workload: A comparison of SWAT, NASA‐TLX, and workload profile methods. Applied Psychology, 53(1), 61-86. 


da Silva Castanheira, K., LoParco, S., & Otto, A. R. (2021). Task-evoked pupillary responses track effort exertion: Evidence from task-switching. Cognitive, Affective, & Behavioral Neuroscience, 21, 592-606. 


Vidulich, M.A. (1988). The cognitive psychology of subjective mental workload. Human Mental Workload. P.A. Hancock and N. Meshkati. Amsterdam, NL, Elsevier Science Publishers B.V. (North-Holland): 219-229. 


Weber, P., Rupprecht, F., Wiesen, S., Hamann, B., & Ebert, A. (2021). Assessing cognitive load via pupillometry. In Advances in Artificial Intelligence and Applied Cognitive Computing: Proceedings from ICAI’20 and ACC’20 (pp. 1087-1096). Springer International Publishing. 


Wickens, C. D. (1992). Engineering psychology and human performance. New York: HarperCollins Publishers Inc. 


Wilson, G. F., et al. (2004). Operator functional state assessment. Paris, France: North Atlantic Treaty Organisation (NATO), Research and Technology Organisation (RTO). 


Wilson, G. F., & O'Donnell, R. D. (1988). Measurement of operator workload with the neuropsychological workload test battery. In Advances in Psychology (Vol. 52, pp. 63-100). North-Holland. 

 

 

 

 

 
