General overview of the theories used in assessment

Lambert WT Schuwirth, Cees PM van der Vleuten

AMEE GUIDE 57
Theories in Medical Education

AMEE Guides in Medical Education

www.amee.org

Welcome to AMEE Guides Series 2

The AMEE Guides cover important topics in medical and healthcare professions education and provide information, practical advice and support. We hope that they will also stimulate your thinking and reflection on the topic. The Guides have been logically structured for ease of reading and contain useful take-home messages. Text boxes highlight key points and examples in practice. Each page in the guide provides a column for your own personal annotations, stimulated either by the text itself or the quotations. Sources of further information on the topic are provided in the reference list and bibliography. Guides are classified according to subject:

• Teaching and Learning
• Research in Medical Education
• Education Management
• Curriculum Planning
• Assessment
• Theories in Medical Education

The Guides are designed for use by individual teachers to inform their practice and can be used to support staff development programmes.

‘Living Guides’: An important feature of this new Guide series is the concept of supplements, which will provide a continuing source of information on the topic. Published supplements will be available for download.
If you would like to contribute a supplement based on your own experience, please contact the Guides Series Editor, Professor Trevor Gibbs ([email protected]).
Supplements may comprise either a ‘Viewpoint’, where you communicate your views and comments on the Guide or the topic more generally, or a ‘Practical Application’, where you report on implementation of some aspect of the subject of the Guide in your own situation. Submissions for consideration for inclusion as a Guide supplement should be a maximum of 1,000 words.

Other Guides in the new series: The topics in this exciting new series are listed below and continued on the inside back cover.

30 Peer Assisted Learning: a planning and implementation framework
Michael Ross, Helen Cameron (2007) ISBN: 978-1-903934-38-8
Primarily designed to help curriculum developers, course organisers and educational researchers develop and implement their own PAL initiatives.
31 Workplace-based Assessment as an Educational Tool
John Norcini, Vanessa Burch (2008) ISBN: 978-1-903934-39-5
Several methods for assessing work-based activities are described, with preliminary evidence of their application, practicability, reliability and validity.
32 e-Learning in Medical Education
Rachel Ellaway, Ken Masters (2008) ISBN: 978-1-903934-41-8
An increasingly important topic in medical education: a ‘must read’ introduction for the novice and a useful resource and update for the more experienced practitioner.
33 Faculty Development: Yesterday, Today and Tomorrow
Michelle McLean, Francois Cilliers, Jacqueline M van Wyk (2010) ISBN: 978-1-903934-42-5
Useful frameworks for designing, implementing and evaluating faculty development programmes.
34 Teaching in the clinical environment
Subha Ramani, Sam Leinster (2008) ISBN: 978-1-903934-43-2
An examination of the many challenges for teachers in the clinical environment, application of relevant educational theories to the clinical context and practical teaching tips for clinical teachers.
35 Continuing Medical Education
Nancy Davis, David Davis, Ralph Bloch (2010) ISBN: 978-1-903934-44-9
Designed to provide a foundation for developing effective continuing medical education (CME) for practising physicians.
36 Problem-Based Learning: where are we now?
David Taylor, Barbara Miflin (2010) ISBN: 978-1-903934-45-6
A look at the various interpretations and practices that claim the label PBL, and a critique of these against the original concept and practice.
37 Setting and maintaining standards in multiple choice examinations
Raja C Bandaranayake (2010) ISBN: 978-1-903934-51-7
An examination of the more commonly used methods of standard setting, their advantages and disadvantages, and illustrations of the procedures used in each, with the help of an example.
38 Learning in Interprofessional Teams
Marilyn Hammick, Lorna Olckers, Charles Campion-Smith (2010) ISBN: 978-1-903934-52-4
Clarification of what is meant by interprofessional learning and an exploration of the concept of teams and team working.
39 Online eAssessment
Reg Dennick, Simon Wilkinson, Nigel Purcell (2010) ISBN: 978-1-903934-53-1
An outline of the advantages of online eAssessment and an examination of the intellectual, technical, learning and cost issues that arise from its use.
40 Creating effective poster presentations
George Hess, Kathryn Tosney, Leon Liegel (2009) ISBN: 978-1-903934-48-7
Practical tips on preparing a poster, an important but often badly executed communication tool.
41 The Place of Anatomy in Medical Education
Graham Louw, Norman Eizenberg, Stephen W Carmichael (2010) ISBN: 978-1-903934-54-8
The teaching of anatomy in a traditional and in a problem-based curriculum from a practical and a theoretical perspective.

Institution/Corresponding address:
L W T Schuwirth, Flinders Innovations in Clinical Education, Health Professions Education, Flinders University, GPO Box 2100, Adelaide 5001, South Australia, Australia Tel: +61-418-989896 or +61-8-8204-7174 Fax: +61-8-8204 5675 Email: [email protected]
The authors:
Professor Dr Lambert Schuwirth MD PhD was formerly Professor for Innovative Assessment at the Department of Educational Development and Research at Maastricht University in the Netherlands. In 2011 he moved to Flinders University and is Strategic Professor for Medical Education at the Flinders Innovation in Clinical Education, Flinders University.
Professor Dr Cees PM van der Vleuten PhD is a psychologist, and he is Professor of Medical Education and Chairman of the Department of Educational Development and Research at Maastricht University in the Netherlands.

This AMEE Guide was first published in Medical Teacher: Schuwirth LW, van der Vleuten CPM (2011). General overview of the theories used in assessment: AMEE Guide No.57. Medical Teacher, 33(10): 783-97.

Guide Series Editor: Trevor Gibbs ([email protected])
Production Editor: Morag Allan Campbell
Published by: Association for Medical Education in Europe (AMEE), Dundee, UK
Designed by: Lynn Thomson
© AMEE 2012 ISBN: 978-1-903934-98-2

Guide 57: General overview of the theories used in assessment

Contents

Abstract
Introduction
Theories on the development of (medical) expertise
Psychometric theories
Validity
Reliability
Classical Test Theory (CTT)
Generalisability Theory
Item Response Theory (IRT)
Emerging theories
Summary
Recommended reading
References


Abstract
There are no scientific theories that are uniquely related to assessment in medical education. There are many theories in adjacent fields, however, that can be informative for assessment in medical education, and in recent decades they have proven their value. In this AMEE Guide we discuss theories on expertise development and psychometric theories, as well as the relatively young and emerging framework of assessment for learning.
Expertise theories highlight the multistage processes involved. The transition from novice to expert is characterised by an increasing aggregation of concepts, from isolated facts, through semantic networks, to illness scripts and instance scripts. The latter two stages enable the expert to recognise the problem quickly and to form an accurate representation of it in his/her working memory. The most striking difference between experts and novices is not per se the possession of more explicit knowledge, but the superior organisation of that knowledge and its pairing with multiple real experiences, which enables not only better but also more efficient problem solving.
Psychometric theories focus on the validity of the assessment (does it measure what it purports to measure?) and its reliability (are the outcomes of the assessment reproducible?). Validity is currently seen as building a chain of arguments about how observations of behaviour (answering a multiple-choice question is also a behaviour) can best be translated into scores, and how these scores can ultimately be used to make inferences about the construct of interest. Reliability theories can be categorised into classical test theory, generalisability theory and item response theory. All three approaches have specific advantages and disadvantages and different areas of application.
Finally, we discuss the phenomenon of assessment for learning, as opposed to assessment of learning, and its implications for current and future development and research.


Take Home messages
• Neither the sound development of assessment in medical education nor any scientific study of assessment can do without a thorough knowledge of the underlying theories.
• Validation is the building of a series of arguments to defend the claim that assessment results really represent the intended construct; validation is never complete.
• An assessment instrument is never valid per se; it is only valid for a specific goal or goals.
• The validity of an assessment instrument is generally determined not by its format but by its content.
• Reliability is the extent to which test results are reproducible, and can be seen as one of the important components of the validity argument.
• When applying one of the theories of reliability, users should be acquainted with its possibilities, limitations and underlying assumptions, to avoid over- or underestimating reproducibility.
• In addition to calculating the reliability of an instrument, it is insightful to calculate the standard error of measurement and compare it with the original test data.
• When building an assessment programme, it is imperative to define its goals clearly.
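The message on the standard error of measurement (SEM) can be made concrete with the classical test theory relation SEM = SD × √(1 − r), where SD is the standard deviation of the observed scores and r is the reliability coefficient. The Python sketch below is illustrative only: the ten scores, the reliability of 0.80 and the cut-off of 60 are invented, and a ±1.96 SEM band is used as a rough 95% interval. It lists the candidates whose band straddles the cut-off, that is, those whose pass-fail decision is least secure.

```python
import math
import statistics

def standard_error_of_measurement(scores, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    sd = statistics.stdev(scores)  # sample standard deviation of observed scores
    return sd * math.sqrt(1.0 - reliability)

def uncertain_candidates(scores, reliability, cut_score, z=1.96):
    """Return the scores whose band (score +/- z * SEM) straddles the cut score."""
    sem = standard_error_of_measurement(scores, reliability)
    return [s for s in scores if abs(s - cut_score) < z * sem]

# Hypothetical percentage scores for ten candidates (illustrative only).
scores = [48, 55, 58, 61, 63, 66, 70, 74, 79, 85]
reliability = 0.80  # assumed value, e.g. from an internal-consistency estimate
cut_score = 60      # hypothetical pass mark

sem = standard_error_of_measurement(scores, reliability)
print(f"SEM = {sem:.2f}")
print("Pass-fail decision uncertain for:", uncertain_candidates(scores, reliability, cut_score))
```

With these assumed numbers the SEM is about 5 points, and half of the candidates lie close enough to the cut-off that a re-test could plausibly change their result; the reliability coefficient on its own does not reveal this.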




Introduction
In our experience, raising the subject of assessment in medical education often starts extensive discussions. Apparently assessment is high on everyone’s agenda. This is not surprising, because assessment is seen as an important part of education: it not only defines the quality of our students and our educational processes, but is also a major factor in steering the learning and behaviour of our students and faculty.
Arguments and debates on assessment, however, are often strongly based on tradition and intuition. It is not necessarily a bad thing to heed tradition. George Santayana stated (quoting Burke) that “Those who do not learn from history are doomed to repeat it”1. An important lesson, then, is to learn from previous mistakes and avoid repeating them.
Intuition, too, is not something to put aside capriciously; it is often a strong driving force in people’s behaviour. But again, intuition does not always agree with research outcomes. Some research outcomes in assessment are somewhat counterintuitive, or at least unexpected; many researchers may not have exclaimed “Eureka” but “Hey, that is odd” instead. This leaves us, as assessment researchers, with two very important tasks. Firstly, we need to study critically which common, tradition-based practices still have value and, consequently, which are the mistakes that should not be repeated. Secondly, we need to translate research findings into methods and approaches in such a way that they help change the incorrect intuitions of policy makers, teachers and students into correct ones. Neither goal can be attained without a good theoretical framework in which to read, understand and interpret research outcomes. The purpose of this AMEE Guide is to provide an overview of some of the most important and most widely used theories pertaining to assessment. Further Guides on assessment theories will give more detail on the more specific theories.
Unfortunately, like many other scientific disciplines, medical assessment does not have one overarching or unifying theory. Instead, it draws on various theories from adjacent scientific fields, such as general education, cognitive psychology, decision-making and judgement theories in psychology, and psychometric theories. In addition, some theoretical frameworks are evolving that are more directly relevant to health professions assessment, the most important of which (in our view) is the notion of “assessment of learning” versus “assessment for learning” (Shepard, 2009).
In this AMEE Guide we present the theories that have featured most prominently in the medical education literature over the past four decades. Of course, this Guide can never be exhaustive: the number of relevant theoretical domains is simply too large, nor can we discuss all theories to their full extent. Not only would this make the Guide too long, it would also be beyond its scope, which is to provide a concise overview. Therefore, we discuss only the theories on the development of medical expertise and psychometric theories, and end by highlighting the differences between assessment of learning and assessment for learning. As a final caveat, this AMEE Guide is not a guide to methods of assessment. We assume that the reader has some prior knowledge of these; otherwise, we refer to specific articles or to textbooks, for example Dent and Harden (Dent & Harden, 2009).

1 From: George Santayana (1905) Reason in Common Sense, volume 1 of The Life of Reason, found at http://evans-experientialism.freewebspace.com/santayana.htm, accessed 3 February 2011.

Theories on the development of (medical) expertise
What distinguishes someone as an expert in the health sciences? What do experts do differently from novices when solving medical problems? These questions are inextricably tied to assessment, because if you do not know what you are assessing, it is also very difficult to know how best to assess it.
It may be obvious that someone can only become an expert through learning and gaining experience.
One of the first to study the development of expertise was A.D. de Groot (De Groot, 1978), who wanted to explore what made chess grandmasters different from good amateur players. His first intuition was that grandmasters were able to think more moves ahead than amateurs. He was surprised, however, to find that this was not the case: players in both expertise groups thought no further ahead than roughly seven moves. What he found instead was that grandmasters were better able to remember positions on the board. He and his successors (Chase & Simon, 1973) found that, even after seeing a position for only a few seconds, grandmasters could reproduce it with much greater accuracy than amateurs.
One might then think that grandmasters simply had superior memory skills, but this is not the case. Human working memory has a capacity of roughly seven units (plus or minus two), and this cannot be improved by learning (Van Merrienboer & Sweller, 2005; van Merrienboer & Sweller, 2010).
The most salient difference between amateurs and grandmasters was not the number of units they could store in their working memory, but the richness of the information in each of these units.
To illustrate this, imagine having to copy a text in your own language, then a text in a foreign Western European language, and then one in a language that uses a different character set (Cyrillic, for example). Copying a text in your own language is clearly easiest, and copying a text in a foreign character set is the most difficult. While copying, you have to read the text, store it in your memory and then reproduce it on paper. When you store a text in your native language, each word (and some fixed expressions) can be stored as one unit, because it relates directly to memories already present in your long-term memory, so you can spend all your cognitive resources on memorising the text. With a foreign character set, you also have to spend part of your cognitive resources on memorising the characters themselves, for which you have no prior memories (schemas) in your long-term memory.




A medical student who has just started his/her studies will have to memorise all the signs and symptoms when consulting a patient with heart failure, whereas an expert can store them almost as one unit (and perhaps only has to store the findings that do not fit the classical picture or mental model of heart failure). This increasing ability to store information in more information-rich units is called chunking, and it is a central element of expertise and its development. Box 1 provides an illustration of the role of chunking.
Box 1: The role of chunking in storing and retrieving information
Through chunking, a person is able to store more information and, provided the information is meaningful, with even greater ease.
Suppose you were asked to memorise the following 20 characters: Aomcameinaetaiodbtai
You will probably find this a difficult (but doable) task.
Suppose we now increase the number of characters and ask you to memorise them again:
Assessment of medical competence and medical expertise is not an easy task, and is often dominated by tradition and intuition.
Now the message contains 126 characters (including spaces and punctuation), but it is much easier to memorise.
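The arithmetic behind Box 1 can be checked with a few lines of Python. The 20 characters in the first task turn out to be the initial letters of the 20 words in the sentence, so the longer message can be held as exactly as many chunks as the short string, except that each chunk is a familiar word already anchored in long-term memory. Treating each whitespace-separated word as one chunk is a simplifying assumption made for illustration.

```python
sentence = ("Assessment of medical competence and medical expertise is not an easy "
            "task, and is often dominated by tradition and intuition.")
letters = "Aomcameinaetaiodbtai"

words = sentence.split()                       # one chunk per familiar word (simplification)
initials = "".join(word[0] for word in words)  # first letter of each word

print(len(letters), "characters, held as", len(letters), "unrelated units")
print(len(sentence), "characters, held as", len(words), "meaningful units")
print("Initials reproduce the 20-character string:", initials == letters)
```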


So why were the grandmasters better than good amateurs? Mainly because they possessed much more stored information about chess positions; in other words, they had acquired much more knowledge than the amateurs.
If there is one lesson to be drawn from these early chess studies (which have been replicated in so many other expertise domains that it is more than reasonable to assume the findings are generic), it is that a rich and well-organised knowledge base is essential for successful problem solving (Chi et al., 1982; Polsen & Jeffries, 1982).
The next question, then, is: “What does ‘well-organised’ mean?” Basically, it means an organisation that enables a person to store new information rapidly and with good retention, and to retrieve relevant information when needed. Although the computer is often used as a metaphor for the human brain (much as the clock was used in the nineteenth century), information storage on a hard disk is clearly very different from human information storage. Humans do not use a File Allocation Table (FAT) to index where information can be found, but have to embed information in existing (semantic) networks (Schmidt et al., 1990). The implication is that it is very difficult to store new information if there is no existing prior information to which it can be linked. Of course, the development of these knowledge networks is quite individualised, being based on individual learning pathways and experiences. For example, we, the authors of this AMEE Guide, live in Maastricht, so our views, connotations and associations with “Maastricht” differ entirely from those of most readers of the AMEE Guides. Although we may share the knowledge that it is a city (and perhaps that it is in the Netherlands) with a university and a medical school, much of the rest of our knowledge about it is far more individualised.
Knowledge is generally quite domain specific (Elstein et al., 1978; Eva et al., 1998): a person can be very knowledgeable on one topic and a lay person on another, and because expertise is based on a well-organised knowledge base, expertise is domain specific as well. For assessment, this means that a candidate’s performance on one case or item of a test is a poor predictor of his or her performance on any other item or case in the test. Therefore, one can never rely on limited assessment information: high-stakes decisions made on the basis of a single case (for example, a high-stakes final viva) are necessarily unreliable.
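The domain-specificity argument can be quantified with the Spearman-Brown prophecy formula from classical test theory, which predicts the reliability R of a test with k comparable cases from the reliability r of a single case: R = k·r / (1 + (k − 1)·r). The Python sketch below assumes, purely for illustration, a single-case reliability of 0.10; this figure is not taken from the Guide, and real values depend on the instrument and the domain.

```python
def spearman_brown(single_case_reliability: float, n_cases: int) -> float:
    """Predicted reliability of a test made of n_cases parallel cases."""
    r = single_case_reliability
    return n_cases * r / (1 + (n_cases - 1) * r)

def cases_needed(single_case_reliability: float, target: float) -> float:
    """Spearman-Brown solved for the number of parallel cases that reaches a target reliability."""
    r = single_case_reliability
    return target * (1 - r) / (r * (1 - target))

r_case = 0.10  # assumed reliability of a single case (illustrative only)
for k in (1, 5, 10, 20, 40):
    print(f"{k:>2} cases -> predicted reliability {spearman_brown(r_case, k):.2f}")
print(f"Cases needed for a reliability of 0.80: {cases_needed(r_case, 0.80):.0f}")
```

Under this assumption a single case gives a reliability of only 0.10, ten cases about 0.53, and roughly 36 cases are needed to reach 0.80. The formula assumes the added cases are parallel, that is, equally difficult and equally reliable.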
A second important and robust finding in the expertise literature, more specifically the diagnostic expertise literature, is that problem-solving ability is idiosyncratic (see, for example, the overview paper by Swanson et al., 1987). Domain specificity, discussed above, means that the performance of the same person varies considerably across cases. Idiosyncrasy here means that different experts solve the same case in substantially different ways. This is logical, keeping in mind that the way knowledge is organised is highly individual. The implication for assessment is that when trying to capture, for example, the diagnostic expertise of candidates, the process may be less informative than the outcome, because the process is idiosyncratic (and, fortunately, the outcome of the reasoning process is much less so).
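To make the process-versus-outcome point concrete, the toy Python example below compares two hypothetical experts who reach the same diagnosis by different routes. The reasoning steps, the diagnosis and the use of Jaccard overlap as a measure of "process agreement" are all invented for illustration; this is not a validated scoring method.

```python
def jaccard(a: set, b: set) -> float:
    """Share of reasoning steps two solvers have in common (0 = none, 1 = identical)."""
    return len(a & b) / len(a | b) if a | b else 1.0

# Hypothetical reasoning protocols for the same case (illustrative only).
expert_a = {"steps": {"fever", "flank pain", "urine dipstick", "renal percussion"},
            "diagnosis": "pyelonephritis"}
expert_b = {"steps": {"fever", "rigors", "urine culture", "history of UTIs"},
            "diagnosis": "pyelonephritis"}

process_agreement = jaccard(expert_a["steps"], expert_b["steps"])  # idiosyncratic routes
outcome_agreement = expert_a["diagnosis"] == expert_b["diagnosis"]  # convergent conclusion

print(f"Process overlap: {process_agreement:.2f}")
print(f"Same outcome: {outcome_agreement}")
```

The two invented protocols share only one of seven distinct steps yet agree on the diagnosis, which mirrors the claim that outcomes vary much less between experts than processes do.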
The third, and probably most important, issue is transfer (Norman, 1988; Regehr & Norman, 1996; Eva, 2004), which is closely related to the previous issues of domain specificity and idiosyncrasy. Transfer pertains to the extent to which a person is able to apply a given problem-solving approach to different situations. It requires that the candidate understands the similarities between two different problem situations and recognises that the same problem-solving principle can be applied. Box 2 provides an illustration (drawn from a personal communication with GR Norman).


Box 2: The role of transfer in problem solving
Problem 1: You are in possession of a unique and irreplaceable light bulb. Unfortunately the filament is broken, so you cannot light the bulb anymore. There is no way of removing the glass without breaking the bulb, and to repair it you have to weld the filament with a laser beam. For this you will need a power output of 1000 Watts. Unfortunately, the glass will break if a laser beam with an intensity of more than 100 Watts passes through it.
• How can you weld the filament?
Problem 2: You are an evil medieval knight. You want to conquer a tower from your enemy. The tower is located on a small piece of land, an island completely surrounded by a moat. To conquer the tower you must bring 500 men onto the island simultaneously. Unfortunately, any bridge you can build will only hold 100 men.
• How do you bring 500 men onto the island simultaneously?




Most often, the first problem is not recognised as being essentially the same as the second, with the same problem-solving principle. Both solutions lie in splitting the total load into several parts. In problem 1, the 1000 Watt laser beam is replaced by 10 beams of 100 Watts each, converging on the spot where the filament is broken. In the second problem the solution is more obvious: build 5 bridges and let your men run onto the island simultaneously. If the problem were presented as irradiating a tumour while doing minimal harm to the skin above it, physicians would probably recognise it even more readily. The specific presentation of a problem is labelled its surface features, and the underlying principle is referred to as its deep structure. Transfer depends on the expert’s ability to identify the deep structure and not be blinded by the surface features.
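The shared deep structure of the two problems in Box 2 can be written as a single function whose arguments carry the surface features. This is an illustrative sketch rather than anything from the Guide: the name split_load and its parameters are hypothetical, and the function only captures the arithmetic of dividing a load across parallel channels that each have a fixed capacity.

```python
import math

def split_load(total: float, capacity_per_channel: float) -> int:
    """Deep structure: the minimum number of parallel channels that together carry
    the total load without any single channel exceeding its capacity."""
    return math.ceil(total / capacity_per_channel)

# Surface features 1: welding a filament through fragile glass (watts).
beams = split_load(total=1000, capacity_per_channel=100)
# Surface features 2: moving an army across a moat (men).
bridges = split_load(total=500, capacity_per_channel=100)

print(f"Converge {beams} laser beams of 100 W each on the filament")
print(f"Build {bridges} bridges and cross simultaneously")
```

Calling the same function with a tumour dose and a skin tolerance would be the radiotherapy version of the problem; recognising that the call is the same despite different surface features is exactly the transfer described above.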
One of the most widely used theories on the development of medical expertise is the one proposed by Schmidt, Norman and Boshuizen (Schmidt, 1993; Schmidt & Boshuizen, 1993). Put generally, this theory postulates that the development of medical expertise starts with the collection of isolated facts, which later in the process are combined to form meaningful (semantic) networks. These networks are then aggregated into more concise or dense illness scripts (for example, pyelonephritis). As a result of many years of experience, these are further enriched into instance scripts, which enable the experienced clinician to recognise a certain diagnosis instantaneously. The most salient difference between illness scripts (which are solidified patterns of a certain diagnosis) and instance scripts is that the latter also include contextual, and to the lay person sometimes seemingly irrelevant, features in the recognition. Typically these include the patient’s demeanour or appearance, and sometimes even an odour.
These theories then provide important lessons for assessment:
1. Do not rely on short tests. The domain specificity problem informs us that high-stakes decisions based on short tests, or on tests with a low number of different cases, are inherently flawed with respect to their reliability (and therefore also their validity). Keep in mind that unreliability cuts both ways: it implies not only that someone who failed the test could in fact be competent, but also that someone who passed the test could be incompetent. The former candidate will remain in the system and be given a re-sit opportunity, so the incorrect pass-fail decision can be remediated; the latter will escape further observation and assessment, so the incorrect decision cannot be remediated.
2. For high-stakes decisions, asking about the process is less predictive of overall competence than focussing on the outcome of the process. This is counterintuitive, but it is a clear finding that the way someone solves a given problem is not a good indicator of the way in which s/he will solve a similar problem with different surface features; s/he may not even recognise that the same principle applies. Focussing on multiple outcomes or some essential intermediate outcomes, such as with extended-matching questions, key-feature approach assessment or the script concordance test, is probably better than in-depth questioning about the problem-solving process (Bordage, 1987; Case & Swanson, 1993; Page & Bordage, 1995; Charlin et al., 2000).
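The two-way nature of unreliability in lesson 1 can be illustrated with a small Monte Carlo simulation. Everything in the sketch is an assumption chosen for illustration, not a model from the Guide: each candidate's true competence is drawn from a standard normal distribution, each case adds independent normal noise (standing in for case specificity), and the same pass mark applies to true and observed scores. The sketch counts how often candidates are wrongly failed and wrongly passed when the decision rests on 2 versus 20 cases.

```python
import random
import statistics

def misclassification(n_cases, n_candidates=20_000, cut=0.0, case_noise_sd=2.0, seed=1):
    """Return (false_fail_rate, false_pass_rate) for a test of n_cases cases.

    Assumed model: observed case score = true competence + independent case noise;
    the test score is the mean over cases.
    """
    rng = random.Random(seed)
    false_fail = false_pass = 0
    for _ in range(n_candidates):
        true_score = rng.gauss(0.0, 1.0)
        observed = statistics.fmean(true_score + rng.gauss(0.0, case_noise_sd)
                                    for _ in range(n_cases))
        truly_competent = true_score >= cut
        passed = observed >= cut
        if truly_competent and not passed:
            false_fail += 1  # can still be remediated at a re-sit
        elif passed and not truly_competent:
            false_pass += 1  # escapes further assessment
    return false_fail / n_candidates, false_pass / n_candidates

for k in (2, 20):
    ff, fp = misclassification(k)
    print(f"{k:>2} cases: wrongly failed {ff:.1%}, wrongly passed {fp:.1%}")
```

Under these assumptions both error types shrink markedly when the number of cases rises from 2 to 20, but neither disappears, and the wrongly passed group is the one the text warns cannot be remediated.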
