Medical Teacher
ISSN: 0142-159X (Print) 1466-187X (Online)

Twelve tips for developing an OSCE that measures what you want

Vijay John Daniels & Debra Pugh

To cite this article: Vijay John Daniels & Debra Pugh (2017): Twelve tips for developing an OSCE that measures what you want, Medical Teacher, DOI: 10.1080/0142159X.2017.1390214

Published online: 25 Oct 2017.
Download by: [Georgetown University]
Date: 26 October 2017, At: 02:08
Twelve tips for developing an OSCE that measures what you want

Vijay John Daniels (a) and Debra Pugh (b)

(a) Department of Medicine, University of Alberta, Edmonton, Canada; (b) Department of Medicine, University of Ottawa, Ottawa, Canada
Abstract

The Objective Structured Clinical Examination (OSCE) is used globally for both high- and low-stakes assessment. Despite its extensive use, very few published articles provide a set of best practices for developing an OSCE, and of those that do, none apply a modern understanding of validity. This article provides 12 tips for developing an OSCE, guided by Kane's validity framework, to ensure the OSCE is assessing what it purports to measure. The 12 tips are presented in the order they would be operationalized during OSCE development.
Introduction

The Objective Structured Clinical Examination (OSCE) was first introduced in 1975 (Harden et al. 1975) and, since that time, OSCEs have been used extensively (Patrício et al. 2013) for assessing clinical skills, both at local institutions and on national high-stakes examinations. There are multiple review articles that examine the use of OSCEs in health professions education (Walsh et al. 2009; Smith et al. 2012; Patrício et al. 2013; Hastie et al. 2014; Hodges et al. 2014; Setyonugroho et al. 2015; Cömert et al. 2016; Kreptul and Thomas 2016), including psychometric evidence for their use (Brannick et al. 2011); however, few provide a set of best practices for developing an OSCE. Of those that do (Casey et al. 2009; Sturpe 2010; Nulty et al. 2011), none apply a modern understanding of validity to ensure the OSCE is assessing what it purports to measure.
Our understanding of validity has evolved from several separate types of validity (e.g. criterion validity, content validity, etc.) to a unitary concept of construct validity in which various sources of evidence are used to support an argument for validity, first through Messick's framework of five sources of evidence (Messick 1989) and, more recently, through Kane's argument-based approach to validation (Kane 2013). As summarized by Cook et al. (2015), Kane's framework focuses on four key steps to ensure valid interpretation, from observation through to making a decision based on the assessment. The first step is the translation of an observed performance into a score (Scoring), ensuring the score reflects the performance as closely as possible. The second step is generalizing the score from the specific examination to the test performance environment, i.e. all possible equivalent tests (Generalization). Third is extrapolating performance in the test environment to real life (Extrapolation). Finally, the fourth step is the interpretation of this information for making a decision (Implications). The two main threats to validity are construct underrepresentation (too little or inappropriate sampling) and construct-irrelevant variance (anything unrelated to the construct of interest that results in score variability).
Over 25 years ago, Harden published a Twelve Tips paper on organizing an OSCE (Harden 1990). The purpose of that paper was to provide guidance to those developing and administering an OSCE, and it focused mainly on practical concerns. In contrast, the purpose of this paper is to provide 12 tips for developing an OSCE that measures what you want, as viewed through the lens of Kane's validity framework. The 12 tips are presented in the order in which they would be operationalized when developing an OSCE. Key points from each tip are summarized in Table 1, which demonstrates how they relate to each of the categories of validity evidence.
Tip 1
Decide on the intended use of the results from your OSCE
Development of an OSCE should begin with the end: What decisions will I make with the results? Is the OSCE formative or summative? The answers to these questions provide evidence for the Implications stage of Kane's model. Although this stage is last, the answers to these questions will frame the rest of OSCE development, which is why they must be asked first. For example, a lower stakes exam might be used to provide feedback to learners and could lead to individual coaching or remediation, whereas a higher stakes end-of-clerkship or national certification examination can result in repeating a clerkship or a year of residency. For these reasons, a lower stakes exam does not require the same level of score reliability as a high-stakes examination (Downing 2004), and so a shorter examination is possible.
Another novel design is the sequential OSCE in which all
candidates would be required to participate in a relatively
short screening examination. Then, only those who perform
below a predefined standard would subsequently be
required to participate in a full-length OSCE to assess their
skills. Two different studies used available data to model
CONTACT: Vijay John Daniels, Division of General Internal Medicine, University of Alberta Hospital, 5-112 Clinical Sciences Building, 11350 83rd Avenue NW, Edmonton, Alberta, Canada, T6G 2G3
© 2017 Informa UK Limited, trading as Taylor & Francis Group
Table 1. Categories and examples of validity evidence.

Scoring
- Description of how rating instruments were developed and selected
- Training of raters
- Training of standardized patients
- Performance of an item analysis/reliability within each station
- Use of test security measures
- Quality assurance

Generalization
- Use of a blueprint to ensure appropriate sampling of the domain
- Calculation of measures of reliability across stations (e.g. Cronbach's alpha, G-study)

Extrapolation
- Comparison of experts to a novice group
- Demonstration of correlations with other measures of the same construct (e.g. communication skills measured by an OSCE and by an in-training evaluation)
- Use of content experts to develop authentic cases
- Relevance to real-life clinical tasks

Implications
- Standard setting process
- Analysis of pass–fail consequences (e.g. remediation opportunities)
- Exploration of how the assessment influences learning
- Exploration of how the assessment influences curriculum
this approach and demonstrated that it would increase score reliability for borderline candidates and could save money if designed properly (Pell et al. 2013; Currie et al. 2016).
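The resource logic behind a sequential design can be sketched in a few lines. The snippet below compares candidate–station encounters (a crude proxy for examiner time) under a traditional versus a sequential design; all numbers (candidate count, station counts, fraction sent on to the full exam) are hypothetical and chosen for illustration, not taken from the cited studies.

```python
# Rough cost model for a sequential OSCE. All numbers are hypothetical.

def encounters(n_candidates, n_stations):
    """Total candidate-station encounters: a crude proxy for examiner time."""
    return n_candidates * n_stations

def sequential_encounters(n_candidates, screen_stations, full_stations,
                          fraction_flagged):
    """Everyone sits a short screen; only flagged candidates sit the full OSCE."""
    n_flagged = round(n_candidates * fraction_flagged)
    return (encounters(n_candidates, screen_stations)
            + encounters(n_flagged, full_stations))

# 200 candidates: a 14-station OSCE for all, versus a 6-station screen that
# sends the lowest-scoring 25% on to the full 14-station exam.
traditional = encounters(200, 14)
sequential = sequential_encounters(200, 6, 14, 0.25)
print(traditional, sequential)  # the sequential design needs fewer encounters
```

Whether the saving materializes in practice depends on fixed costs (rooms, standardized patients) and on how the screening standard is set, which is why the cited studies modeled real data.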
Tip 2
Decide what your OSCE should assess
OSCEs cannot be used to assess an entire content
domain. Rather, they are used to assess a sample of the
knowledge and skills that learners are expected to have
mastered. To ensure that an OSCE reflects educational
objectives, blueprinting is key. Blueprinting refers to the
process by which content experts ensure that constructs
of interest are adequately represented (Coderre et al.
2009). For example, if the goal of the OSCE is to assess
clinical skills, such as history-taking and physical examination skills, then the blueprint should include a wide variety of stations that reflect this. This helps to ensure that
one can generalize performance on these stations to the
learner’s ability to perform other history and physical
examinations in an OSCE (Generalization). The length of
each station is usually between five and ten minutes
(Khan et al. 2013) but could be longer depending on
what task is being assessed. There must be enough stations to adequately sample the construct of interest, taking into account the intended use of the exam results
(i.e. low versus high stakes). A lower stakes locally developed exam may have only eight to ten stations, whereas
a high stakes OSCE may require 14-18 stations to achieve
acceptable reliability (Khan et al. 2013).
Although OSCEs have been used to assess all of the CanMEDS roles (Jefferies et al. 2007; Frank et al. 2015), there are challenges in assessing the intrinsic (i.e. non-Medical Expert) roles authentically (e.g. professionalism, collaboration, etc.), which has an impact on how well test performance extrapolates to real-world performance. The more focused the OSCE blueprint, the better it will provide validity evidence for generalization to other test settings, though at the expense of extrapolation to other skills. A programmatic approach to assessment (Schuwirth and van der Vleuten 2011) would view an OSCE as one part of an overall assessment framework. This leads to two questions that can guide OSCE development: (1) Where else are these skills assessed (or where could they be assessed) in my overall program? and (2) If I choose to assess this skill in an OSCE, can I do it authentically?
Tip 3
Develop the cases
Once you have decided what will be assessed by your
OSCE, careful consideration should be given to case development. Cases should be developed to ensure that they
authentically represent the clinical problem of interest
(Extrapolation). Instructions to candidates should include
information related to the presenting problem, a task, and
a time-frame for completing the encounter (Pugh and
Smee 2013).
Cases should undergo review by both content experts and educational experts to ensure that they reflect best practices of OSCE case development (Pugh and Smee 2013). These experts should consider the following questions in their review: (1) Is the task clear? (Kane's Scoring stage); (2) Can the task be completed in the allotted time?; (3) Does the case authentically represent a clinical problem?; and (4) Is the level of difficulty appropriate for the learners? (the last three relate to Kane's Extrapolation stage). Pilot-testing of cases at this stage can help identify and mitigate potential issues.
Tip 4
Decide how your OSCE should assess candidates (the
scoring rubric)
The development of scoring rubrics is an area where much
of the research on OSCE validity has focused. A description
for how rubrics were developed or selected can provide
important validity evidence for Scoring in Kane’s framework.
Rubrics for each OSCE case can involve checklists and/or
rating scales.
Checklists are used to assess observable behaviors (e.g.
asked about smoking history, identified the JVP, etc.).
Checklists are generally dichotomous (e.g. did or did not
do), but they can also be polytomous (e.g. done well,
attempted but not done well, not done) (Pugh, Halman,
et al. 2016). Checklists should be carefully constructed to
avoid rewarding learners who use a rote approach unless
that is the goal, such as for very junior medical students.
For most learners, there should be an attempt to include
items that help to discriminate between learners who
understand the subject matter and those who do not (i.e. a
key features approach) (Daniels et al. 2014). If one uses long checklists that reward nonspecific thoroughness, as opposed to focusing on key clinically discriminating features in a history or physical examination, scores will not extrapolate well to what we want in physicians as thoughtful diagnosticians. Although it intuitively makes sense to apply differential weights to checklist items based on their perceived importance, weighting items does not appear to significantly affect overall reliability or pass/fail decisions (Sandilands et al. 2014); thus, the decision regarding whether to weight checklist items should be based on the construct of interest and the behaviors you are seeking to reward.
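To make the weighting comparison concrete, the snippet below scores one hypothetical five-item checklist with and without weights; the items, weights, and cut score are invented for illustration.

```python
# Scoring one hypothetical five-item checklist with and without weights.

def percent_score(responses, weights=None):
    """responses: 1 = done, 0 = not done; unweighted if weights is None."""
    if weights is None:
        weights = [1] * len(responses)
    return 100 * sum(r * w for r, w in zip(responses, weights)) / sum(weights)

# Key clinically discriminating items carry weight 3; routine items weight 1.
weights = [3, 1, 1, 3, 1]
candidate = [1, 1, 0, 1, 1]          # missed one routine item

unweighted = percent_score(candidate)           # 4/5 of items done
weighted = percent_score(candidate, weights)    # 8/9 of the weight earned

cut_score = 60
# Consistent with the finding cited above, the two scores differ but lead
# to the same pass/fail decision for this candidate.
print(unweighted, weighted, unweighted >= cut_score, weighted >= cut_score)
```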
Unlike checklist items, rating scales can capture a wider
spectrum of performance, and are better suited for skills
that exist along a continuum (e.g. communication, rapport,
organization, procedural flow, etc.) (Swanson and van der
Vleuten 2013). Rating scales allow raters to make judgments about candidate performance, thus capitalizing on
their expertise. When developing rating scales, careful consideration should be given to the anchors used to provide
guidance to raters. Vague anchors (e.g. "inferior," "borderline," or "excellent") may be less meaningful to raters than behavioral anchors (e.g. "scattered, shotgun approach to the problem"). Entrustability-aligned scales (e.g. "could perform the procedure with minimal assistance") are emerging as a useful approach to assessment, but are generally reserved for workplace-based assessment (Gofton et al. 2012).
Tip 5
Train your raters
Further support for Scoring includes evidence demonstrating raters were trained to ensure they interpreted scoring
rubrics as intended. Raters should be provided with an
orientation that includes information about the purpose of
the OSCE, the level of the learners, and how they should
interact with learners (e.g. can they provide prompts or
feedback to learners?). They should also be provided with
examples of the scoring rubrics, including the operational
definition of success on any checklist items and the meaning of each behavioral anchor for rating scales.
A more detailed form of orientation, such as frame-of-reference training, is sometimes provided to raters. This involves creating a shared mental model of the desired performance by defining performance dimensions, providing examples of behaviors for each dimension, and then allowing raters to practice and receive feedback on sample performances (Roch et al. 2012). This method can be time-consuming and is usually reserved for high-stakes examinations, but it can strengthen the validity argument for Scoring.
It is important to remember that any undesired variation in rater scoring may introduce construct-irrelevant variance and thus threaten the validity of scoring inferences.
Despite training, raters may make mistakes. Although traditionally we often think of some raters as excessively harsh
or lenient compared to other raters (i.e. hawks and doves),
more recent research demonstrates that rater variability is
more complex than this (Govaerts et al. 2013; Gingerich
et al. 2014). With that said, when systematic errors in raters
are evident, there are published approaches to deal with
these (Bartman et al. 2013; Fuller et al. 2017) that are
beyond the scope of this paper.
Tip 6
Develop scripts for and train standardized patients
Most OSCEs employ the use of standardized patients (SPs)
to allow learners to demonstrate their clinical skills. A rigorous and standardized approach to SP training provides
further validity evidence for the integrity of Scoring as it
reduces the variance between SP portrayals.
SPs should be provided with a script to guide their portrayal, and basing the script on a real patient adds authenticity. For history stations, the script is relatively rich in
details about: the presenting problem (including a timeline
and pertinent positives and negatives); the SP’s past medical history (including medication use); and social history
(e.g. smoking and alcohol use), as required. At a minimum, there should be a scripted answer for every checklist item, but ideally answers should also be provided for any anticipated questions that learners might ask. For unanticipated questions, SPs can be trained to answer either "no" or "I'm not sure," depending on the context. In contrast, for physical
examination stations, fewer details may be required, but
SPs can be trained to react to stimuli (e.g. guarding during
an abdominal examination, limited range of motion of a
joint, etc.).
Other details to be included in the script may relate to
demographics (e.g. age and gender), SP starting position in
room (e.g. sitting vs lying down), appearance (e.g. anxious
vs calm), and behavior (e.g. cooperative vs evasive). The
script may also include statements or prompts for the SP to ask (e.g. "What do you think is going on with me?") to allow raters to better assess learners' understanding of the case.
Tip 7
Ensure integrity of data collection processes
Data collection should have some sort of quality assurance
to ensure data integrity. This provides further evidence
that test scores reflect the observations (Kane’s Scoring
stage). During an OSCE, staff can periodically verify that
raters are completing the rating instruments correctly (i.e.
not skipping any items) and address any questions they
might have. After the OSCE, if scores are manually
entered into a computer, a random set of score sheets
should be checked to ensure accurate data entry. There
are reasonably priced software packages for creating scannable score sheets, which reduce, but do not eliminate, the need for random verification. Some centers may have access to tablets and eOSCE systems, which have the added advantages of reducing the time spent transcribing comments and the number of missed rating scales, and can increase the quantity and quality of feedback (Daniels et al. 2016; Denison et al. 2016). However, reliable internet access for internet-based systems, and back-up plans for when a tablet or the eOSCE system fails, are essential.

Decisions must be made about missing data (e.g. a rating scale that is left blank). For example, scores may be calculated without the missing data, data may be
extrapolated, or, in extreme circumstances, the station may
need to be deleted if there is insufficient data to render a
judgment (Fuller et al. 2017).
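The two simplest policies for a single blank rating, scoring from the observed items only or imputing the missing value, can be sketched as follows; the ratings are hypothetical, and each item is scored out of 5.

```python
# Two simple policies for a single blank rating on a station score sheet.
from statistics import mean

ratings = [4, 5, None, 3, 4]        # one rater left the third scale blank

observed = [r for r in ratings if r is not None]

# Policy 1: score the station from observed items only.
score_omit = mean(observed)

# Policy 2: impute the missing rating with the candidate's own item mean.
# Mean imputation leaves a simple average unchanged; the policies diverge
# for weighted or summed scores, or when reporting item-level results.
imputed = [r if r is not None else mean(observed) for r in ratings]
score_impute = mean(imputed)

print(score_omit, score_impute)
```

Whatever policy is chosen, it should be decided and documented before the examination, not improvised after a problem is discovered.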
Finally, as with any assessment, one must consider the
issue of test security. To ensure an accurate measurement
of learners’ abilities, it is important that all students have
equal access to information about the assessment.
Unauthorized access to test materials (e.g. through student-created "ghost banks") provides learners with an unfair
advantage that threatens the validity of the interpretation
of scores from the OSCE.
Tip 8
Choose a standard setting approach
The choice of standard-setting method (i.e. how the cut score is set) also deserves careful attention in order to support the validity of score interpretations, as this affects the Implications of
the assessment. Cut scores that are inappropriately high
may result in failing learners who are actually competent,
while cut scores that are too low may lead weak learners
to be overly confident in their abilities. This is especially
important for high-stakes assessments in which pass-fail
decisions have important repercussions for learners, educators and patients.
Although there is no gold standard for setting a cut score, a detailed rationale for the method chosen should be
provided. The three most common criterion-referenced
methods used for OSCEs are Angoff, Borderline Group, and
Borderline Regression. Detailed explanations of each are provided in Yousuf and colleagues’ (Yousuf et al. 2015) recent
OSCE standard setting study. The chosen method is applied
at the station level to determine the initial cut score.
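As an illustration of one of these methods, borderline regression, the sketch below regresses hypothetical station scores on examiners' global ratings and reads the cut score off the regression line at the rating defined as borderline; all data are invented.

```python
# Borderline regression on hypothetical data: regress each candidate's
# station score on the examiner's global rating, then take the predicted
# score at the "borderline" rating as the station cut score.
from statistics import mean

global_ratings = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]            # 1 = clear fail ... 5 = excellent
station_scores = [35, 48, 52, 60, 63, 58, 72, 75, 85, 88]  # percent scores

# Ordinary least-squares slope and intercept, computed by hand.
gx, gy = mean(global_ratings), mean(station_scores)
slope = (sum((x - gx) * (y - gy) for x, y in zip(global_ratings, station_scores))
         / sum((x - gx) ** 2 for x in global_ratings))
intercept = gy - slope * gx

BORDERLINE_RATING = 2                # the global rating treated as borderline
cut_score = intercept + slope * BORDERLINE_RATING
print(round(cut_score, 1))
```

One attraction of this method is that it uses data from every candidate, not only those rated borderline, which makes the cut score more stable for small cohorts.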
The next decision is whether the overall pass/fail determination should be based on the overall OSCE score alone,
or if examinees must also pass a minimum number of stations. The latter (conjunctive) approach is favored by some educators to ensure that examinees demonstrate a breadth of knowledge (i.e. that failing performances on several stations cannot be compensated for by very strong performance on others) (Homer et al. 2017). A conjunctive approach will increase failure rates, so this decision should be based on the intended use of the OSCE and the consequences of failing it.
Tip 9
Consider how well the OSCE would generalize to all
possible forms
Another important source of validity evidence relates to
the Generalizability of the results. Support for this element
of the validity argument can be provided by analyzing the
psychometric properties of the OSCE.
The reliability (i.e. reproducibility) of scores is an important element of validity evidence. Many readers will be familiar with Cronbach's alpha, which is available in common statistical software packages. Alpha is usually used across
stations to measure overall reliability and to look for problematic stations. If decisions are made based on the performance of a single station (e.g. failing a station leads to
remediating that specific station), then alpha can be used
at the station level to evaluate reliability and identify problematic items. Because OSCEs are inherently multi-faceted
(e.g. persons, items, raters, tracks, etc.), generalizability theory (G-theory) is often preferred for calculating reliability as
well as determining the impact of the various sources of
error. However, G-theory works best if there are multiple
raters per station; otherwise, one cannot tease out the variance due to raters as opposed to due to the station. There
are freely available packages for running G-studies such as
the syntax-based GENOVA (Crick and Brennan 1983) and
the more user friendly G-string IV (Bloch and Norman
2015). For more on G-theory, one can review the AMEE
guide on G-theory (Bloch and Norman 2012) but, in brief,
G-theory seeks to estimate various sources of error in measuring the construct of interest.
Regardless of which approach is used, alpha or G-theory, the desired coefficient depends on the purpose and use of the test. If a high-stakes decision is based on one OSCE, such as a national certification exam, the desired reliability is 0.8 or even 0.9 (Downing 2004), whereas for a moderate-stakes locally developed examination, especially one that is only one piece of a program of assessment, lower reliability would be acceptable. This is why the intended use of the assessment (Tip 1) frames everything.
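For readers who want to see the calculation, Cronbach's alpha across stations can be computed directly from a candidates-by-stations score matrix. The scores below are fabricated for illustration; because these candidates are very consistently ordered across stations, the resulting alpha is unusually high.

```python
# Cronbach's alpha across stations, from a candidates-by-stations matrix
# of percent scores (hypothetical data).
from statistics import pvariance

scores = [            # rows = candidates, columns = stations
    [60, 65, 58, 70],
    [75, 80, 72, 78],
    [55, 50, 60, 52],
    [85, 88, 80, 90],
    [65, 70, 62, 68],
]

k = len(scores[0])    # number of stations
station_variances = [pvariance([row[j] for row in scores]) for j in range(k)]
total_variance = pvariance([sum(row) for row in scores])

alpha = (k / (k - 1)) * (1 - sum(station_variances) / total_variance)
print(round(alpha, 2))
```

A G-study would go further than this single coefficient by partitioning the error into components (candidates, stations, raters, tracks), which is why it needs multiple raters per station to separate rater from station variance.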
Tip 10
Review the correlation of your examination with other assessments
One of the main reasons for an argument-based approach to validity is the lack of an easy gold standard criterion against which we can compare our assessments. In medical education, the strongest such evidence would come in the form of patient outcomes. An example of such work is the study by Tamblyn and colleagues (Tamblyn et al. 1998), which demonstrated that lower scores on a licensing examination were associated with lower quality of clinical practice, as measured by patterns in consultations, prescribing, and mammography screening. These data support Kane's Extrapolation stage of validity for that licensing exam.
More commonly, evidence is sought by comparing OSCE
scores to other assessments. For example, Pugh and colleagues (Pugh, Bhanji, et al. 2016) demonstrated that performance on a locally developed Internal Medicine OSCE
progress test correlated with scores on the high stakes
Internal Medicine certification examination and could identify residents at an elevated risk of failure. Not all correlations need to be done with data external to the institution.
Local data can be used to correlate OSCE scores to other
assessments measuring similar and dissimilar competencies.
For example, if OSCE scores correlate better with workplace-based assessments than with an MCQ exam, this supports the validity argument as both the OSCE and
workplace-based assessments are measuring performance
over knowledge. Another analysis could examine whether an OSCE discriminates between more senior and more junior learners, as this also provides validity evidence.
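Such local correlation analyses are straightforward to run. The sketch below computes Pearson correlations between hypothetical OSCE, workplace-based assessment (WBA), and MCQ scores; a notably higher OSCE–WBA correlation than OSCE–MCQ correlation would support the argument that the OSCE captures performance rather than knowledge alone.

```python
# Correlating OSCE scores with local measures of similar and dissimilar
# constructs. All scores are hypothetical.
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from first principles."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

osce = [62, 70, 55, 80, 74, 68, 59, 85]
wba  = [60, 72, 58, 78, 75, 65, 57, 88]   # similar construct: performance
mcq  = [70, 65, 72, 68, 60, 75, 66, 71]   # dissimilar construct: knowledge

r_wba = pearson_r(osce, wba)
r_mcq = pearson_r(osce, mcq)
print(round(r_wba, 2), round(r_mcq, 2))
```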
Tip 11
Evaluate the effects of the OSCE on learners
Whether formative or summative, we know that assessment
drives learning (Kane’s Implications stage). However, it is
important to recall that assessment can influence learning
in both positive and negative ways (Pugh and Regehr
2016), and so one should seek evidence for how an OSCE
is promoting or impeding learning. Cook and colleagues (2015), in their review of Kane's model, argue that this is an underutilized but important aspect of the validity argument.

Questions to be considered include: How does the OSCE
influence learning?; What are the outcomes of learners who
fail versus pass?; If remediation is provided to those who
fail, is there evidence that performance improves on a
repeat assessment?; How does the OSCE influence subsequent changes in the curriculum (e.g. if a high number of
candidates fail a station) and, conversely, do changes to
the curriculum influence OSCE performance?; and finally,
how does the OSCE influence patient care?
If the purpose of the OSCE is to drive learning, is there evidence that learners are learning as a result of the OSCE? Follow-up surveys or focus groups of learners can examine the impact of the assessment on learning. A recent study involved learners reviewing their OSCE rubrics immediately after the OSCE (tablet scoring was used to facilitate this) and then writing an action plan describing what they would study and how they would change their clinical behavior as a result of reviewing their results. A follow-up survey demonstrated that this process did influence future learning, with almost all residents having either reviewed material or changed how they approached the history or physical examination in the workplace as a result of this feedback process (Strand and Daniels 2017).
framework in mind such as Kane’s model during development will allow assessment data that can be used for the
intended purpose of the assessment, whether it is a high
stakes end-of-training exam, or a low stakes formative
Dr. Daniels would like to acknowledge the Department of Medicine’s
Academic Alternative Relationship Plan at the University of Alberta
for its financial support. Dr. Pugh would like to acknowledge the
Department of Medicine at The Ottawa Hospital for their financial
Disclosure statement
The authors report no conflicts of interest. The authors alone are
responsible for the content and writing of this article.
Notes on contributors
Vijay John Daniels, MD, MHPE, FRCPC, is an Associate Professor in the
Department of Medicine at the University of Alberta. He is a member
of the Royal College of Physicians and Surgeons of Canada’s
Examination Committee which reviews the quality of all specialty certification examinations.
Debra Pugh, MD, MHPE, FRCPC, is an Associate Professor in the
Department of Medicine at the University of Ottawa. She serves as
Vice Chair of the Central Examination Committee at the Medical
Council of Canada, and Vice Chair of the General Internal Medicine
Examination Board at the Royal College of Physicians and Surgeons of
Tip 12
Review the entire process to look for threats to
Vijay John Daniels
Debra Pugh
An argument for validity is an iterative process where one
states the proposed interpretation and use of the assessment, then examines the evidence of validity, and if the
evidence does not support the intended interpretation or
use, either revise the use or revise the assessment process.
This should continually happen to ensure the assessment is
meeting its purpose. Too often this ongoing quality assurance is focused solely on psychometrics such as reliability,
but all aspects of the development of an OSCE should be
reviewed to look for issues related to each of the four
stages of Kane’s model. For a full guide on OSCE quality
assurance strategies, we refer the reader to Pell and
colleague’s AMEE guide (Pell et al. 2010). Some OSCE metrics that are often overlooked are the percent of students
who fail overall or fail a specific station (can be program
evaluation information), correlation between a station’s
sum score and global rating scale (lower correlation raises
concern about score sheet content), and comparisons
between groups who encounter the same stations, but
with differences such as raters or locations, (Pell et al.
Development of an OSCE is a significant undertaking with
several steps involved. However, keeping a validity
Bartman I, Smee S, Roy M. 2013. A method for identifying extreme
OSCE examiners. Clin Teach. 10:27–31.
Bloch R, Norman G. 2012. Generalizability theory for the perplexed: a
practical introduction and guide: AMEE Guide No. 68. Med Teach.
Bloch R, Norman G. 2015. G-String IV program. http://fhsperd.mcmaster.
Brannick MT, Erol-Korkmaz HT, Prewett M. 2011. A systematic review of
the reliability of objective structured clinical examination scores.
Med Educ. 45:1181–1189.
Casey PM, Goepfert AR, Espey EL, Hammoud MM, Kaczmarczyk JM,
Katz NT, Neutens JJ, Nuthalapaty FS, Peskin E. Association of
Professors of Gynecology and Obstetrics Undergraduate Medical
Education Committee. 2009. To the point: reviews in medical
education-the objective structured clinical examination. Am J
Obstet Gynecol. 200:25–34.
Coderre S, Woloschuk W, McLaughlin K. 2009. Twelve tips for blueprinting. Med Teach. 31:322–324.
Cömert M, Zill JM, Christalle E, Dirmaier J, Härter M, Scholl I. 2016.
Assessing communication skills of medical students in objective
structured clinical examinations (OSCE): a systematic review of
rating scales. PLoS One. 11:e0152717.
Cook DA, Brydges R, Ginsburg S, Hatala R. 2015. A contemporary
approach to validity arguments: a practical guide to Kane’s framework. Med Educ. 49:560–575.
Currie GP, Sivasubramaniam S, Cleland J. 2016. Sequential testing in a
high stakes OSCE: determining number of screening tests. Med
Teach. 38:708–714.
Crick JE, Brennan RL. 1983. GENOVA program. https://education.uiowa.
Daniels VJ, Bordage G, Gierl MJ, Yudkowsky R. 2014. Effect of clinically
discriminating, evidence-based checklist items on the reliability of
scores from an Internal Medicine residency OSCE. Adv in Health Sci
Educ. 19:497–506.
Daniels VJ, Surgin C, Lai H. 2016. Enhancing formative feedback of an
OSCE through tablet scoring. Med Educ. 50 Supplement 1:28.
Denison A, Bate E, Thompson J. 2016. Tablet versus paper marking in
assessment: feedback matters. Perspect Med Educ. 5:108–113.
Downing SM. 2004. Reliability: on the reproducibility of assessment
data. Med Educ. 38:1006–1012.
Frank JR, Snell L, Sherbino J, editors. 2015. CanMEDS 2015 physician
competency framework. Ottawa: Royal College of Physicians and
Surgeons of Canada.
Fuller R, Homer M, Pell G, Hallam J. 2017. Managing extremes of assessor judgment within the OSCE. Med Teach. 39:58–66.
Gingerich A, van der Vleuten CP, Eva KW, Regehr G. 2014. More consensus than idiosyncrasy: categorizing social judgments to examine
variability in Mini-CEX ratings. Acad Med. 89:1510–1519.
Gofton WT, Dudek NL, Wood TJ, Balaa F, Hamstra SJ. 2012. The Ottawa
surgical competency operating room evaluation (O-SCORE): a tool
to assess surgical competence. Acad Med. 87:1401–1407.
Govaerts MJ, Van de Wiel MW, Schuwirth LW, Van der Vleuten CP,
Muijtjens AM. 2013. Workplace-based assessment: raters’ performance theories and constructs. Adv in Health Sci Educ. 18:375–396.
Harden RM. 1990. Twelve tips for organizing an objective structured
clinical examination (OSCE). Med Teach. 12:259–264.
Harden RM, Stevenson M, Downie WW, Wilson GM. 1975. Assessment
of clinical competence using objective structured examination. Br
Med J. 1:447–451.
Hastie MJ, Spellman JL, Pagano PP, Hastie J, Egan BJ. 2014. Designing
and implementing the objective structured clinical examination in
anesthesiology. Anesthesiology. 120:196–203.
Hodges BD, Hollenberg E, McNaughton N, Hanson MD, Regehr G.
2014. The psychiatry OSCE: a 20-year retrospective. Acad Psychiatry.
Homer M, Pell G, Fuller R. 2017. Problematizing the concept of the
“borderline” group in performance assessments. Med Teach.
Jefferies A, Simmons B, Tabak D, McIlroy JH, Lee KS, Roukema H,
Skidmore M. 2007. Using an objective structured clinical examination (OSCE) to assess multiple physician competencies in postgraduate training. Med Teach. 29:183–191.
Kane MT. 2013. Validating the interpretations and uses of test scores.
J Educ Meas. 50:1–73.
Khan KZ, Gaunt K, Ramachandran S, Pushkar P. 2013. The objective
structured clinical examination (OSCE): AMEE guide no. 81. Part II:
organisation and administration. Med Teach. 35:e1447–e1463.
Kreptul D, Thomas RE. 2016. Family medicine resident OSCEs: a systematic review. Educ Prim Care. 27:471–477.
Messick S. 1989. Validity. In: Linn RL, editor. Educational measurement.
3rd edn. New York (NY): American Council on Education and
Macmillan. p. 13–103.
Nulty DD, Mitchell ML, Jeffrey CA, Henderson A, Groves M. 2011. Best
practice guidelines for use of OSCEs: maximising value for student
learning. Nurse Educ Today. 31:145–151.
Patrício MF, Julião M, Fareleira F, Carneiro AV. 2013. Is the OSCE a feasible tool to assess competencies in undergraduate medical education? Med Teach. 35:503–514.
Pell G, Fuller R, Homer M, Roberts T. International Association for
Medical Education 2010. How to measure the quality of the
OSCE: a review of metrics - AMEE guide no. 49. Med Teach.
Pell G, Fuller R, Homer M, Roberts T. 2013. Advancing the objective
structured clinical examination: sequential testing in theory and
practice. Med Educ. 47:569–577.
Pugh D, Bhanji F, Cole G, Dupre J, Hatala R, Humphrey-Murto S,
Touchie C, Wood TJ. 2016. Do OSCE progress test scores predict
performance in a national high-stakes examination? Med Educ.
Pugh D, Halman S, Desjardins I, Humphrey-Murto S, Wood TJ.
2016. Done or almost done? Improving OSCE checklists to better capture performance in progress tests. Teach Learn Med.
Pugh D, Regehr G. 2016. Taking the sting out of assessment: is there a
role for progress testing? Med Educ. 50:721–729.
Pugh D, Smee S. 2013. Guidelines for the development of objective
structured clinical examination (OSCE) Cases. Ottawa: Medical
Council of Canada.
Roch SG, Woehr DJ, Mishra V, Kieszczynska U. 2012. Rater training
revisited: an updated meta-analytic review of frame-of-reference
training. J Occup Organ Psychol. 85:370–395.
Sandilands DD, Gotzmann A, Roy M, Zumbo BD, De Champlain A.
2014. Weighting checklist items and station components on a largescale OSCE: is it worth the effort? Med Teach. 36:585–590.
Schuwirth LW, Van der Vleuten CP. 2011. Programmatic assessment:
from assessment of learning to assessment for learning. Med Teach.
Setyonugroho W, Kennedy KM, Kropmans TJ. 2015. Reliability and validity of OSCE checklists used to assess the communication skills of
undergraduate medical students: a systematic review. Patient Educ
Couns. pii: S0738-3991:00277–00273.
Smith V, Muldoon K, Biesty L. 2012. The objective structured clinical
examination (OSCE) as a strategy for assessing clinical competence
in midwifery education in Ireland: a critical review. Nurse Educ
Pract. 12:242–247.
Strand A, Daniels VJ. 2017. Improving Learning Outcomes through
Immediate OSCE Score Sheet Review. Med Educ. 51(Suppl 1):114.
Sturpe DA. 2010. Objective structured clinical examinations in doctor
of pharmacy programs in the United States. Am J Pharm Educ.
Swanson DB, van der Vleuten CP. 2013. Assessment of clinical skills
with standardized patients: state of the art revisited. Teach Learn
Med. 25 (Suppl 1):S17–S25.
Tamblyn R, Abrahamowicz M, Brailovsky C, Grand’Maison P, Lescop J,
Norcini J, Girard N, Haggerty J. 1998. Association between licensing
examination scores and resource use and quality of care in primary
care practice. JAMA. 280:989–996.
Walsh M, Bailey PH, Koren I. 2009. Objective structured clinical evaluation of clinical competence: an integrative review. J Adv Nurs.
Yousuf N, Violato C, Zuberi RW. 2015. Standard setting methods for
pass/fail decisions on high-stakes objective structured clinical examinations: a validity study. Teach Learn Med. 27:280–291.