Exploring the use of videotaped objective structured clinical examination in the assessment of joint examination skills of medical students.

Arthritis & Rheumatism (Arthritis Care & Research)
Vol. 57, No. 5, June 15, 2007, pp 869–876
DOI 10.1002/art.22763
© 2007, American College of Rheumatology
Objective. Objective structured clinical examination (OSCE) is a key part of medical student assessment. Currently,
assessment is performed by medical examiners in situ. Our objective was to determine whether assessment by videotaped
OSCE is as reliable as live OSCE assessment.
Methods. Participants were 95 undergraduate medical students attending their musculoskeletal week at Freeman
Hospital, Newcastle (UK). Student performance on OSCE stations for shoulder or knee examinations was assessed by
experienced rheumatologists. The stations were also videotaped and independently scored by a rheumatologist. The
examinations were scored using a 14-item checklist and a global rating scale (GRS).
Results. Mean values for the shoulder OSCE checklist were 17.9 by live assessment and 17.4 by video (n = 50), and 20.9
and 20.0 for live and video knee assessment, respectively (n = 45). Intraclass correlation coefficients for shoulder and
knee checklists were 0.55 and 0.58, respectively, indicating moderate reliability between live and video scores for the
OSCE checklists. GRS scores were less reliable than checklist scores. There was 84% agreement in the classification of
examination grades between live and video checklist scores for the shoulder and 87% agreement for the knee (κ = 0.43
and 0.51, respectively; P < 0.001).
Conclusion. Video OSCE has the potential to be reliable and offers some advantages over live OSCE including more
efficient use of examiners’ time, increased fairness, and better monitoring of standards across various schools/sites.
However, further work is needed to support our findings and to implement and evaluate the quality assurance issues
identified in this work before justifiable recommendations can be made.
KEY WORDS. Videotape; Examination; Assessment; Skills.
Supported by an educational project grant from the Arthritis Research Campaign Education Subcommittee. The
Virtual Rheumatology CD was funded by the Arthritis Research Campaign.
Pirashanthie Vivekananda-Schmidt, DPhil, Martyn Lewis, PhD: Primary Care Sciences Research Centre, Keele
University, Keele, North Staffordshire, UK; David Coady, MRCP, Catherine Morley, MRCP, Lesley Kay, FRCP, David
Walker, MD: University of Newcastle upon Tyne, Newcastle upon Tyne, UK; Andrew B. Hassell, MD: Keele University,
Keele, North Staffordshire, UK.
Dr. Kay has received consulting fees and/or honoraria (less than $10,000 each) from Wyeth, Schering-Plough, and
Address correspondence to Pirashanthie Vivekananda-Schmidt, DPhil, Academic Unit of Medical Education, 85
Wilkinson Street, Sheffield University, S10 2GB, UK. E-mail:
Submitted for publication May 4, 2006; accepted in revised form October 18, 2006.
Objective structured clinical examinations (OSCEs), first
introduced by Harden and Gleeson in 1979 (1), are widely
used in the assessment of undergraduate and postgraduate
medical students and are regarded as offering better validity than traditional long-case final examinations (2). In a
long-case examination, the student sees a patient alone for
30–60 minutes, obtains a history, and performs a physical
examination. The student is then questioned about the
findings, relevant investigations, and further treatment of
the patient. The processes of history taking and examination are not observed. Therefore, communication and examination skills may not be adequately assessed (3). The
advantages of OSCEs include greater reliability (they provide a consistent challenge for all candidates assessed [4])
and greater face and content validity (5) because the process as well as the outcome can be assessed. In addition,
OSCEs allow sampling of a greater range of skills than the
long-case examination. They have also been shown to
correlate better with consultant rating of the candidate (2)
than traditional clinical assessments based on long- and
short-case examinations. However, OSCEs require a great
deal of organization, not least the coordination of a
large number of clinicians to be in the same place at the
same time for a single examination. Not only must these
clinicians be in one place, but they should also have undergone some training in the assessment to maximize reliability. Furthermore, assessor fatigue during long OSCE sessions can reduce the objectivity of assessment.
Videotapes have been used for a number of years for a
variety of purposes within medical education. They are
perceived as effective learning resources in the field of
communication skills (6,7) and have been used in the
learning of skills for self- and tutor assessment (8–11). Lane
and Gottlieb (8) found that use of videotaping improved
students’ interviewing skills and self-assessment and had
the advantage of identifying students who overrated themselves. Videos have also been used in the evaluation of
educational interventions (12). They have been used for
evaluating performance and competency (13) as well as
rater bias (14). Videos have been used in the assessment of
communication skills (15,16) and in the assessment of
general practice trainees’ consultation skills in the UK
since the 1980s and have been found to be effective, valid,
and reliable (17).
Successful implementation of videotaped OSCEs
(VOSCEs) would offer considerable potential advantages
to faculty, examiners, and candidates. The first advantage
is in terms of quality control. Videotaped OSCE stations
offer the potential for establishing consensus between examiners, for investigating interexaminer variability, and
even for comparing standards between medical
schools. It is possible to increase the objectivity of assessment by having assessors evaluate examination skills
against agreed, standardized marking criteria. The
second advantage is in terms of practicality. Running an
OSCE for a group of students is very time consuming and
requires expensive clinical expertise and coordination.
Videotaping the student performance and marking the performance at a later point means the OSCE can be run with
relatively few, if any, clinicians present because stations
do not necessarily have to be manned by clinical assessors.
Therefore, the examination process may be perceived to be
more efficient and reliable: the cost and stress involved in
organizing the OSCE might be reduced while improving
the consistency and fairness of assessments.
Evidence that VOSCEs are a practical, valid method of
implementing OSCEs has not been established in the field
of musculoskeletal medicine and is little explored in other
fields of physical examination. In this study, we carried
out formative OSCE assessments of third-year undergraduate medical students performing shoulder and/or knee
examination as part of a larger educational randomized
controlled trial (18) and videotaped these OSCEs. We
present results of an investigation of the relationship between the live examiner’s assessment and that of a video
assessor, and we discuss the practicalities of videotaping
musculoskeletal examination OSCE stations.
Setting. This study was performed alongside a randomized controlled trial evaluating the educational value of a
computer-assisted learning program, Virtual Rheumatology CD (Newfangled Media, Stoke on Kent, UK), in the
teaching of musculoskeletal clinical examination skills in
undergraduate medical students (18). The study took place
at the University of Newcastle upon Tyne, Newcastle, UK.
Participants. Participants were a subgroup of subjects
who took part in the randomized controlled trial and included third-year undergraduate medical students attending their musculoskeletal week during a 12-week clinical
skills module at Freeman Hospital, Newcastle between
January 2002 and June 2003. Prior to the start of placement, these students had attended a 1-week clinical skills
block, which included teaching of musculoskeletal examination.
OSCE. The OSCE consisted of a station on knee examination and a station on shoulder examination. Participants
in this study were examined on one station only. Each
station was 6 minutes long. There was a 14-item checklist
for scoring the OSCE. Students did not have access to this
checklist prior to the examination. For each item, a score
of 0, 1, or 2 was given for “not done,” “done,” and “done
well,” respectively. Scores of individual items were
summed for each station, resulting in total scores for the
OSCE shoulder and knee assessments based on discrete
numerical scales ranging from 0 (“not done” recorded on
all items) to 28 (“done well” recorded on all items). In
addition, a global rating scale (GRS) was added to the OSCE score
sheets as a supplementary measure (a 10-cm visual analog scale ranging
from 0 = poor to 100 = excellent). GRSs have been shown to be valid measures for the
assessment of clinical skills (19).
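The checklist arithmetic above is simple, but a small helper makes the 0–28 range explicit. This is a minimal Python sketch, not the study's scoring software; the function name `checklist_total` and its validation rules are assumptions for illustration.

```python
# Hypothetical scoring helper for a 14-item OSCE checklist in which each item
# is scored 0 ("not done"), 1 ("done") or 2 ("done well").
N_ITEMS = 14
VALID_SCORES = {0, 1, 2}


def checklist_total(item_scores: list[int]) -> int:
    """Sum the item scores into a station total on the 0-28 scale."""
    if len(item_scores) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item scores, got {len(item_scores)}")
    for score in item_scores:
        if score not in VALID_SCORES:
            raise ValueError(f"invalid item score: {score}")
    return sum(item_scores)


# Made-up example: 10 items "done well", 3 "done", 1 "not done".
example = [2] * 10 + [1] * 3 + [0]
print(checklist_total(example))  # 23
```

Rejecting out-of-range item scores, rather than silently summing them, keeps every total inside the 0–28 range on which the grade bands are defined.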
Video recording of the OSCEs. Digital video cameras
were attached to tripods and placed in the room where the
examination took place. Student performance was videotaped with consent. The recording was done on mini digital videotapes and was converted to VHS tapes for ease of
Assessors. Local rheumatology specialist registrars
(SpR; qualified physicians undertaking specialist postgraduate training in rheumatology, equivalent to residents
in the US) volunteered for the OSCE assessment of students. It was not feasible to train the registrars specifically
for this study, but all had prior experience in administering
and scoring OSCEs. We had 2 main assessors, one for the knee
station and one for the shoulder station. However, because of clinical
commitments, other SpRs had to stand in for our regular
assessors. Overall, 4 raters were involved in assessing the
knee and/or shoulder stations. A consultant rheumatologist (ABH), who was blind to the live scores of the OSCE
assessment, scored the VOSCE for both knee and shoulder stations.
Figure 1. Scatterplots of interrater scores for live versus video scoring of the objective
structured clinical examination.
Procedure. On day 4 of the week’s rotation, students
were asked to volunteer for a formative OSCE examination. Students were randomly allocated to 1 of 2 OSCE
stations and at the end of the assessment were given verbal
and written feedback on their performance. To preserve
participant confidentiality, each student was assigned an
anonymized code, which concealed his or her name and
identity from the VOSCE assessor. Approval was obtained
from respective chairs of the university ethics committee.
Written consent was obtained from the students for video
recording of the OSCE and for using the data from the
OSCE for the research study.
Sample size. A sample size between 40 and 50 participants was required for each reliability analysis in order to
calculate confidence intervals to a precision of ±0.2 on
either side of the reliability coefficients.
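The paper does not state how the ±0.2 precision target was derived. As a rough illustration only, the sketch below uses Fisher's z transformation for a Pearson correlation to show how interval width changes between 40 and 50 participants for a hypothetical coefficient of 0.6. This is an assumption: exact confidence intervals for an ICC are based on F distributions and will differ somewhat.

```python
import math


def pearson_ci(r: float, n: int, z_crit: float = 1.96) -> tuple[float, float]:
    """Approximate 95% CI for a correlation via Fisher's z transformation."""
    z = math.atanh(r)                 # transform r to an approximately normal scale
    se = 1.0 / math.sqrt(n - 3)       # standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Hypothetical coefficient of 0.6; sample sizes from the range quoted above.
for n in (40, 45, 50):
    lo, hi = pearson_ci(0.6, n)
    print(f"n={n}: 95% CI for r=0.6 is ({lo:.2f}, {hi:.2f})")
```

The interval is asymmetric about the estimate, so "±0.2 on either side" is only an approximate description of its width.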
Statistical analysis. The association between live and
video OSCE checklist and GRS scores was assessed in a
number of ways. Mean difference and 95% limits of agreement were calculated. Consistency of scoring between the
measures was assessed using Pearson’s correlation. Absolute agreement was determined with the intraclass correlation coefficient (ICC) from a 2-way random-effects
model (ICC2,1). OSCE checklist scores were classified according to the traditional examination grading system of
fail (score <14, i.e., <50%), pass (score 14–20, i.e., 50–74%), and honors (score ≥21, i.e., ≥75%), and reliability
between live and video grades was evaluated using observed agreement and the chance-corrected weighted
kappa statistic (using linear weights). In addition to evaluating total scores, we also examined the reliability of each
of the individual items of the 2 OSCE stations using observed agreement and weighted kappa.
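To make the two agreement measures concrete, here is a minimal sketch assuming the standard Shrout and Fleiss formula for ICC(2,1) (2-way random effects, absolute agreement, single rater), computed from 2-way ANOVA mean squares, together with Bland–Altman-style limits of agreement. The function names are hypothetical, and the default width of 2 SD follows the "±2 SD difference" convention of Table 2 (1.96 SD is also common).

```python
from statistics import mean, stdev


def icc_2_1(live: list[float], video: list[float]) -> float:
    """ICC(2,1) after Shrout and Fleiss: 2-way random effects, absolute agreement, single rater."""
    rows = list(zip(live, video))  # one row per student, one column per rater
    n, k = len(rows), 2
    grand = mean(live + video)
    row_means = [mean(r) for r in rows]
    col_means = [mean(live), mean(video)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between students
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_error = ss_total - ss_rows - ss_cols                 # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def limits_of_agreement(
    live: list[float], video: list[float], width: float = 2.0
) -> tuple[float, float, float]:
    """Mean difference (live minus video) and mean +/- width * SD of the differences."""
    diffs = [a - b for a, b in zip(live, video)]
    d_bar, sd = mean(diffs), stdev(diffs)
    return d_bar, d_bar - width * sd, d_bar + width * sd


# Made-up scores for five students (illustration only, not study data).
live_scores = [18, 20, 15, 22, 17]
video_scores = [17, 20, 14, 21, 18]
print(f"ICC(2,1) = {icc_2_1(live_scores, video_scores):.2f}")
d_bar, lower, upper = limits_of_agreement(live_scores, video_scores)
print(f"mean difference {d_bar:.2f}, limits ({lower:.2f}, {upper:.2f})")
```

Because ICC(2,1) measures absolute agreement, a constant offset between raters (for example, a video examiner who marks every student lower) reduces the ICC even when Pearson's r is high. This is one reason the two statistics can diverge, as they do for the global scales in Table 2.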
Fleiss demonstrated that the ICC is closely related to
the weighted kappa (20) and recommended that an ICC
value <0.4 be considered poor, 0.4–0.75 fair to good, and >0.75 excellent (21). We adopted the similar and widely accepted classification of Landis
and Koch (22) to describe the ICC and kappa values calculated in this
study: 0.01–0.20 indicated slight, 0.21–0.40 fair,
0.41–0.60 moderate, 0.61–0.80 substantial, and 0.81–1.00 almost perfect reliability.

Table 1. Mean ± SD scores for live and video objective structured clinical examination assessments
Station and scale | Live | Video
Shoulder checklist (n = 50) | 17.9 ± 3.4 | 17.4 ± 3.4
Shoulder GRS (n = 38) | 76.0 ± 11.3 | 59.0 ± 16.1
Knee checklist (n = 45) | 20.9 ± 2.5 | 20.0 ± 2.8
Knee GRS (n = 31) | 73.0 ± 11.5 | 60.5 ± 15.8

Table 2. Reliability of scoring of the objective structured clinical examination for live versus video assessments*
Station and scale | Mean absolute difference (range)† | Mean difference (±2 SD difference)‡ | ICC2,1 (95% CI) | Pearson’s r (95% CI)
Shoulder, composite scale (n = 50) | 2.26 (0–9) | 0.50 (−5.98, 6.98) | 0.55 (0.33, 0.72) | 0.55 (0.32, 0.72)
Shoulder, global scale (n = 38) | 17.8 (2–45.5) | 17.0 (−7.1, 41.2) | 0.36 (−0.10, 0.69) | 0.66 (0.43, 0.81)
Knee, composite scale (n = 45) | 1.96 (0–5) | 0.93 (−3.71, 5.57) | 0.58 (0.34, 0.75) | 0.62 (0.38, 0.80)
Knee, global scale (n = 31) | 15.3 (2–42.5) | 12.6 (−16.6, 41.7) | 0.32 (−0.05, 0.61) | 0.46 (0.13, 0.70)
* ICC = intraclass correlation coefficient; 95% CI = 95% confidence interval.
† Absolute value of difference between live score and video score.
‡ Live score minus video score.
A random subsample of participant videos was rescored after 3 months by the same consultant (ABH). The
intrarater agreement of the OSCE checklist scores was
evaluated by ICC and by kappa (after classifying the scores
into grades [fail, pass, honors] as described above).
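The grading rule and the descriptive scale above can be expressed as two small lookup functions. This is a minimal sketch; the function names are hypothetical, values are assumed to be rounded to 2 decimals, and the label for negative values ("poor") comes from Landis and Koch's original scale rather than from the bands quoted above.

```python
def grade(total: int) -> str:
    """Map a 0-28 checklist total to the fail/pass/honors bands used in the study."""
    if not 0 <= total <= 28:
        raise ValueError("checklist totals range from 0 to 28")
    if total < 14:
        return "fail"
    if total <= 20:
        return "pass"
    return "honors"


def landis_koch(value: float) -> str:
    """Descriptive label for an ICC or kappa value (assumes 2-decimal rounding)."""
    if value < 0:
        return "poor"  # below chance: Landis and Koch's label, not one quoted in the text
    if value <= 0.20:
        return "slight"
    if value <= 0.40:
        return "fair"
    if value <= 0.60:
        return "moderate"
    if value <= 0.80:
        return "substantial"
    return "almost perfect"


print([grade(t) for t in (13, 14, 20, 21)])  # ['fail', 'pass', 'pass', 'honors']
print(landis_koch(0.55), landis_koch(0.98))  # moderate almost perfect
```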
The results are based on 50 matched pairs of observations
for the shoulder OSCE and 45 for the knee. Of the 4
assessors, one (CM) scored 39 (78%) participants, one (DC)
scored 7 (14%), and another (MF) scored 4 (8%) at the
shoulder station; one assessor (DC) scored 26 (58%) participants, one (MB) scored 15 (33%), and one (MF) scored
4 (9%) at the knee station. The subgroup of individuals
who were assessed by VOSCE in addition to the OSCE for
this study had similar baseline characteristics to individuals who were assessed by OSCE but not VOSCE, e.g., 66%
and 69% were women, respectively; mean OSCE shoulder
scores were 19.0 and 18.5, respectively; and mean OSCE
knee scores were 21.1 and 21.0, respectively.
Paired data for the live and video scores are illustrated
in Figure 1. Live and video summary scores on the OSCE
checklist were very similar (Table 1). Mean values for the
OSCE checklist shoulder scores were 17.9 by live assessment and 17.4 by video assessment, and for the knee
scores were 20.9 and 20.0, respectively. By contrast, GRS
scores were lower for the video assessment than the live
assessment. Pearson’s correlation coefficients between
live and videotaped scores for the OSCE checklist and GRS ranged from 0.46 to 0.66 (Table 2). The ICCs indicated moderate reliability between video and
live scores, with values of 0.55 and 0.58 for the OSCE
checklist of the shoulder and knee, respectively. Reliability between live and video global ratings was only fair.
Table 3. Agreement between live and video ratings for individual items of the objective structured clinical examination (OSCE) shoulder assessment*
Shoulder OSCE item | Kappa (95% CI)‡ | P
Approach to the patient | −0.03 (−0.27, 0.21) |
Inspected shoulder from in front and from behind | 0.57 (0.33, 0.81) | < 0.001
Palpated shoulder for tenderness | 0.46 (0.20, 0.72) | < 0.001
Identifies bony landmarks | 0.08 (−0.04, 0.20) |
External rotation of the shoulder with the elbows tucked in | 0.63 (0.41, 0.85) | < 0.001
Asked patient to put hands behind head and hands behind back | 0.68 (0.44, 0.92) | < 0.001
Assess forward flexion | 0.32 (0.12, 0.52) | < 0.01
Assess extension | 0.41 (0.19, 0.63) | < 0.001
Inspects active neck movements | 0.83 (0.56, 1.00) | < 0.001
Assess for painful arc | −0.04 (−0.17, 0.08) |
Assess scapular movement (viewed from behind) | 0.44 (0.20, 0.64) | < 0.001
Assess the acromioclavicular joint | 0.76 (0.52, 1.00) | < 0.001
Performs resisted movement | 0.85 (0.59, 1.00) | < 0.001
Identifies abnormalities correctly | 0.42 (0.21, 0.62) | < 0.001
* 95% CI = 95% confidence interval.
† Based on linear weights.
‡ Based on linear weights. Agreement expected by chance alone; the kappa coefficient measures the chance-corrected agreement ([observed agreement − expected agreement]/[1 − expected agreement]).
§ Agreement expected by chance alone; the kappa coefficient measures the chance-corrected agreement ([observed agreement − expected agreement]/[1 − expected agreement]).
Table 4. Agreement between live and video ratings for individual items of the objective structured clinical examination (OSCE) knee assessment*
Knee OSCE item | Kappa (95% CI)‡ | P
Approach to the patient (including asking about knee pain) | −0.06 (−0.32, 0.21) |
Inspection (including from the end of the bed) | 0.18 (−0.10, 0.46) |
Assessment of temperature | 0.68 (0.43, 0.94) | < 0.001
Assessment of muscle bulk | 0.18 (0.00, 0.37) | < 0.05
Palpation of patella | 0.27 (0.07, 0.47) | < 0.01
Palpate joint line (including the back of the knee) | 0.50 (0.26, 0.74) | < 0.001
Patella tap ± cross fluctuation | 0.44 (0.21, 0.66) | < 0.001
Assess full extension | 0.52 (0.32, 0.73) | < 0.001
Assess full flexion | 0.45 (0.24, 0.66) | < 0.001
Collateral ligament assessment at 15 degrees | 0.19 (−0.07, 0.45) |
Undertakes active and passive movements | 0.44 (0.22, 0.66) | < 0.001
Anterior draw test | −0.03 (−0.31, 0.25) |
Gets patient to walk | 0.05 (−0.06, 0.16) |
Identifies normality/abnormalities correctly | −0.12 (−0.32, 0.07) |
* 95% CI = 95% confidence interval.
† Based on linear weights.
‡ Based on linear weights. Agreement expected by chance alone; the kappa coefficient measures the chance-corrected agreement ([observed agreement − expected agreement]/[1 − expected agreement]).
§ Agreement expected by chance alone; the kappa coefficient measures the chance-corrected agreement ([observed agreement − expected agreement]/[1 − expected agreement]).
The video examiner consistently scored candidates lower
than did the live examiner on the GRS score, but not on the
checklist score (Figure 1).
Data comparing the live and video ratings of the individual items of the shoulder assessment and the knee
assessment are presented in Tables 3 and 4, respectively.
Large variations in reliability were seen across the items
in both shoulder and knee OSCE stations. At least substantial reliability (κ > 0.6) for the shoulder OSCE was seen for the items
“performs resisted movement,” “inspects active neck
movements,” “assesses the acromioclavicular joint,”
“asked patient to put hands behind head and hands behind back,” and “external rotation of the shoulder with the
elbows tucked in.” Similarly, substantial reliability (κ >
0.6) for the knee OSCE was observed for “assesses temperature,” and moderate reliability (κ = 0.41–0.60) was observed for “assesses full extension,” “assesses full flexion,”
“palpates joint line,” “patella tap,” and “undertakes active
and passive movements.”
Reliability was moderate (κ = 0.41–0.60) for the overall
grades of the shoulder and knee OSCE assessments (Table
5). It might be improved further by omitting or modifying
the items with poorer agreement (see Tables 3 and 4) in the
OSCE checklists.
We rescored 22 video OSCEs to evaluate the intrarater
agreement of the VOSCE. The test–retest sample included 11
shoulder checklists and 11 knee checklists. The scores
were pooled so that the reliability analysis was based on
22 pairs of scores. The ICC was 0.98 (95% confidence
interval 0.96, 0.99) and the kappa value based on graded
classification of scores was 1.00 (100% agreement).

Table 5. Agreement between live and video ratings for graded classification of the objective structured clinical examination (OSCE) checklist*
Video rater | Live rater: Fail (<14) | Pass (14–20) | Honors (≥21)
Shoulder checklist (agreement 84%†)
Fail (<14) | 3 (6) | 3 (6) | 1 (2)
Pass (14–20) | 2 (4) | 26 (52) | 7 (14)
Honors (≥21) | 0 (0) | 2 (4) | 6 (12)
Weighted kappa (95% CI): 0.43 (0.23, 0.63)‡
Knee checklist (agreement 87%†)
Fail (<14) | 1 (2) | 0 (0) | 0 (0)
Pass (14–20) | 0 (0) | 17 (38) | 8 (18)
Honors (≥21) | 0 (0) | 4 (9) | 15 (33)
Weighted kappa (95% CI): 0.51 (0.24, 0.78)‡
* Values are the number (percentage) unless otherwise indicated. 95% CI = 95% confidence interval.
† Based on linear weights.
‡ P < 0.001.
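The weighted agreement and kappa values for the graded classification in Table 5 can be recomputed from its cell counts. The sketch below assumes linear agreement weights (1 for identical grades, 0.5 for adjacent grades, 0 for fail versus honors) and the usual chance-corrected formula; the function name is hypothetical. Kappa is symmetric in the two raters, so it does not matter which rater is placed on the rows.

```python
def weighted_kappa_linear(table: list[list[int]]) -> tuple[float, float]:
    """Return (weighted observed agreement, linearly weighted kappa) for a k x k table."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Linear agreement weights; for k = 3 these are 1, 0.5 and 0.
    w = [[1 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]
    p_obs = sum(w[i][j] * table[i][j] for i in range(k) for j in range(k)) / n
    p_exp = sum(w[i][j] * row_tot[i] * col_tot[j] for i in range(k) for j in range(k)) / n**2
    return p_obs, (p_obs - p_exp) / (1 - p_exp)


# Cell counts from Table 5: one rater per axis, grades ordered fail, pass, honors.
shoulder = [[3, 3, 1], [2, 26, 7], [0, 2, 6]]
knee = [[1, 0, 0], [0, 17, 8], [0, 4, 15]]
for name, t in (("shoulder", shoulder), ("knee", knee)):
    agree, kappa = weighted_kappa_linear(t)
    print(f"{name}: weighted agreement {agree:.0%}, weighted kappa {kappa:.2f}")
# shoulder: weighted agreement 84%, weighted kappa 0.43
# knee: weighted agreement 87%, weighted kappa 0.51
```

These reproduce the 84% and 87% agreement and the kappas of 0.43 and 0.51 reported in the abstract, which supports reading the flattened counts as 3 rows of 3 cells per checklist.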
Our goal was to investigate the relationship between the
assessments of live and videotaped OSCE stations. Our
results demonstrated moderate interrater reliability between the live scorer(s) and the video scorer for both the
knee station and the shoulder station using a checklist
scoring approach. The interrater reliability using a GRS
was lower: the live (SpR) scorer consistently scored the
students higher than did the (consultant) video scorer,
indicating examiner bias. The poor interrater reliability for the
GRS may reflect different expectations on the part of a
stricter consultant compared with more lenient SpRs. Additionally, the live examiner may form more of
a relationship with the candidate and therefore tend to
give them higher scores.
Reliability between live and video assessments ranged
from moderate to almost perfect for 16 of the 28 individual items of the OSCE checklists. We were not able to
determine which of the 2 methods of assessment, live or
video, was more accurate because there was no gold standard to compare against. The reasons for the poorer
agreement of the remaining 12 items may be viewed from
clinical and statistical perspectives. The shoulder items
“identifies bony landmarks” and “assess for painful arc”
had poor reliability. Identification of bony landmarks can
be particularly difficult to score on a video if the focus is
not close enough and/or the students do not name the
landmarks or explain what they are doing. The video assessor scored students as having done well only if students
gave a verbal description of what they were palpating
during the procedure. This highlights one of the key areas
where differences may occur. If any aspect of a student’s clinical examination is unclear, live assessors can seek further clarification from
the students. This is not possible when scoring from a video recording after the event,
which underlines a limitation of the video method
of assessment. Similarly, the items “inspection” and “assessing muscle bulk” from the knee station are also difficult to score via video unless students describe what they
are doing. There was only slight reliability for “collateral
ligament assessment at 15 degrees” from the knee checklist, which may be explained in part by the fact that this
examination requires complex movements and handling
skills. Hoving et al (23) found that movements that were
complex and required handling skills had poorer interrater
reliability than movements that were simple. The item
“approach to the patient” scored very poorly at both knee
and shoulder stations in terms of agreement. The video
assessor gave a score of 2 only if the student both introduced themselves and specifically asked the patient about
pain prior to examining the patient, whereas the live assessors did not appear to have the same criteria for scoring
this question. This finding raises another generic point
concerning checklist marking: to maximize reliability, the
checklist must make explicit how marks are awarded. The
item “gets patient to walk” was also scored very differently
between the live and video examiners. Scores by the video
examiner were most frequently recorded as 0 (“not done”),
whereas the live examiners most frequently scored this
item as “done well,” suggesting that quite different scoring criteria were adopted for the 2 approaches, with
stricter criteria for video scoring. If the recording showed that a student
had been instructed or prompted, the video examiner did not award a full
mark. The item “identifies normality/abnormalities correctly” from the knee station had only
slight reliability, although there was better agreement for
this item at the shoulder station. Because this
item draws on the other checklist items, the poor agreement for this item may reflect poor concordance on those
other items.
It should be noted that low reliability values may also
reflect limitations of the statistical measure rather than of the raters. In this study, less than moderate
reliability was found for some items (specifically “approach to the patient,” “inspection,” and “anterior draw test”)
whose expected (chance) agreement
was high. As the expected agreement increases, kappa
becomes increasingly limited in its capacity to yield meaningful reliability values (24,25). If for a specified item a
certain category has a high likelihood of being scored by
all raters, then the expected interrater agreement for that
item will be high. For example, both the live rater and the
video rater most frequently scored the items “inspection”
and “anterior draw test” as having been done well because
both items were relatively easy examinations for the students to perform. As a result, the expected agreements of
the 2 items were 84% and 97%, respectively (i.e., close to
100%), leaving little room for measuring agreement above
that expected by chance alone.
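The effect of high expected agreement can be seen with two hypothetical tables that share the same 95% observed agreement. For simplicity this sketch uses unweighted Cohen's kappa on 2 categories, whereas the study used linearly weighted kappa on 3; the function name and all counts are invented for illustration and are not study data.

```python
def cohen_kappa(table: list[list[int]]) -> tuple[float, float, float]:
    """Unweighted Cohen's kappa for a k x k table: returns (observed, expected, kappa)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_obs = sum(table[i][i] for i in range(k)) / n
    p_exp = sum(row_tot[i] * col_tot[i] for i in range(k)) / n**2
    if p_exp == 1:
        # Both raters used a single category throughout: kappa is undefined.
        return p_obs, p_exp, float("nan")
    return p_obs, p_exp, (p_obs - p_exp) / (1 - p_exp)


# Hypothetical 2 x 2 tables for 40 students (rows: live rater, columns: video rater;
# categories: "done well", "not done well").
skewed = [[38, 1], [1, 0]]     # nearly every student scored "done well" by both raters
balanced = [[19, 1], [1, 19]]  # both categories used about equally often

for name, table in (("skewed", skewed), ("balanced", balanced)):
    p_o, p_e, kappa = cohen_kappa(table)
    print(f"{name}: observed {p_o:.2f}, expected {p_e:.2f}, kappa {kappa:.2f}")
# skewed: observed 0.95, expected 0.95, kappa -0.03
# balanced: observed 0.95, expected 0.50, kappa 0.90
```

When almost all ratings fall in one category, chance agreement alone is about 95%, so the same observed agreement yields a kappa near zero.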
No gold standard exists to establish the content validity
of a musculoskeletal examination OSCE station. However,
Coady et al (26) have derived a core set of clinical skills
relevant to musculoskeletal examination skills in students.
Of the 22 core skills relevant to the examination of the
shoulder and knee joints from the Regional Examination of the
Musculoskeletal System (REMS) for undergraduate medical students (26), our OSCEs included 19. The skills
not addressed by our tool were assessing leg length
when a leg length discrepancy is suspected; assessing the neurologic and vascular systems, when appropriate, during the assessment of a problematic joint; and making a
qualitative assessment of movement.
There is published evidence that examiners’ clinical
experience has an impact on interexaminer agreement on
the palpatory diagnosis in osteopathy (27). In this study,
the level of agreement between live and video examiners
might have been stronger had their level of clinical experience been closer. Unlike the study by Branch and Lipsky
(28), which measured the impact of an educational intervention on retention, confidence, and ability of musculoskeletal examination skills of medical students, ours is an
exploratory study. Several aspects of this study could
be improved. The key one is the lack
of face-to-face preassessment discussion between all the
assessors on how to score each of the items. This was not
possible for several pragmatic reasons. The video assessor
was geographically too far away from the live assessors,
and, owing to the busy schedule of the clinical placement, the
formative OSCE assessments were offered as an optional
addition during the lunch hour, so the volunteer assessors had little time to discuss scoring criteria beforehand.
OSCE assessment via video is a very attractive proposition in the current climate of increasing pressure on clinicians to take on the roles of teachers and assessors. It may
also provide a higher level of consistency between institutions and could pave the way for better quality assurance,
such as anonymized marking to increase fairness,
the ability of all students to go through the stations in the same
order, and the ability to monitor assessment standards
across hospital sites as well as across
schools. Moreover, the interrater reliability of live scorers has
been shown to vary from 0.25 at some stations to 0.77 at
others (29), suggesting that the consistency between live assessors is not much different from the reliability between live and video assessments in this study.
To improve the reliability of video or live assessment, it is
important to improve the process of assessment (for example, by standardizing methods of evaluation, scoring, and
administration). Our test–retest results, albeit based on a
subsample of our original study population, suggest that
reproducibility of video scoring is likely to be almost perfect, which further implies that the overall reliability of
video scoring by different observers and its reliability
against live scoring would probably be increased by standardizing the methods of evaluation and trying to establish
scoring consensus between different assessors.
There are a number of other key pragmatic issues that
need to be taken into account when designing an OSCE
station that is to be videotaped. It is important that the
necessary equipment and expertise are available so that
good quality recordings can be obtained. We discarded one
videotaped examination as not scorable because poor positioning of the video camera resulted in a poor recording.
In this study, we used only 1 video camera to record the
student examining the patient’s joint. An alternative
method would be to use 2 cameras simultaneously, with
one camera recording the student examining and the
other focusing on the joint being inspected. The 2-camera arrangement
may give better visual information to the video assessor
and may improve the reliability of assessment of items that
involve visual inspection. However, this would have to be
weighed against the probable increase in the duration of assessment by video. Although studies in other specialties
(largely communication skills) demonstrate that
videotaping can be a valuable learning experience that helps students improve their skills, some students may not be
comfortable with being recorded. We did not explore students’ views in this study. It is also known that student
performance may be influenced by the videotaping process
(30). Offering a period of adaptation before the
formal assessment phase begins, so that students have a
chance to familiarize themselves with the environment,
may minimize this effect. In contrast, it could be argued
that students may experience different levels of anxiety about
performing clinical examinations in front of a live examiner. It also remains to be seen whether VOSCEs are suitable for
other specialties in medicine.
Further work is needed to establish the potential for the
VOSCE in the assessment of clinical examination skills.
The reliability of video scoring after standardized scoring
methods have been put in place should be established; our
work on intraobserver variability suggests that reliability will
be considerably enhanced. There is room for further investigation of how procedures, including the setup process
and quality of equipment, can improve the integrity of
scoring videotaped OSCE assessments. We have not yet
addressed the views of examiners and students regarding
videotaped assessment. Finally, there is considerable opportunity for investigating whether VOSCE assessments
are valid across different clinical specialties.
In conclusion, VOSCEs have the potential to improve
quality assurance and saving resources. In practice they
need to be conducted with care, taking into account practical issues of camera and patient placement as well as the
principles of effective assessment, with good examiner
training to ensure consistency of scoring. Finally, this
study highlights the potential of VOSCE stations in examiner training. We cannot conclude that videotaped scoring
is better than live scoring of OSCE assessments, but our
findings do suggest that VOSCE may be an efficient and
reliable alternative to traditional live scoring.
We thank Dr. Matt Bridges and Dr. Mohammed Farhod for
their help in conducting this study.
Dr. Vivekananda-Schmidt had full access to all of the data in
the study and takes responsibility for the integrity of the data and
the accuracy of the data analysis.
Study design. Vivekananda-Schmidt, Lewis, Coady, Hassell.
Acquisition of data. Vivekananda-Schmidt, Coady, Morley, Kay,
Analysis and interpretation of data. Vivekananda-Schmidt,
Lewis, Hassell.
Manuscript preparation. Vivekananda-Schmidt, Lewis, Coady,
Statistical analysis. Vivekananda-Schmidt, Lewis.
1. Harden RM, Gleeson FA. Assessment of clinical competence
using an objective structured clinical examination (OSCE).
Med Educ 1979;13:41–54.
2. Probert CS, Cahill DJ, McCann GL, Ben-Shlomo Y. Traditional
finals and OSCEs in predicting consultant and self-reported
clinical skills of PRHOs: a pilot study. Med Educ 2003;37:597–602.
3. Sood R. Long case examination: can it be improved? J India
Acad Clin Med 2001;2:251–5.
4. Harden RM, Stevenson M, Downie WW, Wilson GM. Assessment of clinical competence using the objective structured
examination. BMJ 1975;1:447–51.
5. Newble D. Techniques for measuring clinical competence:
objective structured clinical examinations. Med Educ 2004;38:199–203.
6. Mir MA, Marshall RJ, Evans RW, Hall R, Duthie HL. Comparison between videotape and personal teaching as methods of
communicating clinical skills to medical students. Br Med J
(Clin Res Ed) 1984;289:31–4.
Marita P, Leena L, Tarja K. Nurses’ self-reflection via videotaping to improve communication skills in health counselling. Patient Educ Couns 1999;36:3–11.
Lane JL, Gottlieb RP. Improving the interviewing and self-assessment skills of medical students: is it time to readopt
videotaping as an educational tool? Ambul Pediatr 2004;4:244–8.
Thorburn J, Dean M, Finn T, King J, Wilkinson M. Student
learning through video assessment [review]. Contemp Nurse 2001;10:39–45.
Winters J, Hauck B, Riggs CJ, Clawson J, Collins J. Use of
videotaping to assess competencies and course outcomes.
J Nurs Educ 2003;42:472–6.
Hill R, Hooper C, Wahl S. Look, learn, and be satisfied: video
playback as a learning strategy to improve clinical skills performance. J Nurses Staff Dev 2000;16:232–9.
Yudkowsky R, Downing S, Klamen D, Valaski M, Eulenberg
B, Popa M. Assessing the head-to-toe physical examination
skills of medical students. Med Teach 2004;26:415–9.
Ritchie PD, Cameron PA. An evaluation of trauma team leader
performance by video recording. Aust N Z J Surg 1999;69:183–6.
Vogt VY, Givens VM, Keathley CA, Lipscomb GH, Summitt
RL Jr. Is a resident’s score on a videotaped objective structured assessment of technical skills affected by revealing the
resident’s identity? Am J Obstet Gynecol 2003;189:688–91.
Humphris GM, Kaney S. The Objective Structured Video
Exam for assessment of communication skills. Med Educ
2000;34:939–45.
Smit GN, van der Molen HT. Development and evaluation of
a video test for the assessment of interviewing skills. J Cancer
Educ 1995;10:195–9.
Ram P, Grol R, Rethans JJ, Schouten B, van der Vleuten C,
Kester A. Assessment of general practitioners by video observation of communicative and medical performance in daily
practice: issues of validity, reliability and feasibility. Med
Educ 1999;33:447–54.
Vivekananda-Schmidt P, Lewis M, Hassell AB, and the ARC
Virtual Rheumatology CAL Research Group. Cluster randomized controlled trial of the impact of a Computer-Assisted
Learning package on the learning of musculoskeletal examination skills by undergraduate medical students. Arthritis
Rheum 2005;53:764–71.
Hodges B, McIlroy JH. Analytic global OSCE ratings are sensitive to level of training. Med Educ 2003;37:1012–6.
Fleiss JL. Measuring agreement between two judges on the
presence or absence of a trait. Biometrics 1975;31:651–9.
Fleiss JL. The design and analysis of clinical experiments.
New York: John Wiley & Sons; 1986.
Landis JR, Koch GG. The measurement of observer agreement
for categorical data. Biometrics 1977;33:159–74.
Hoving JL, Buchbinder R, Green S, Forbes A, Bellamy N,
Brand C, et al. How reliably do rheumatologists measure
shoulder movement? Ann Rheum Dis 2002;61:612–6.
Feinstein AR, Cicchetti DV. High agreement but low kappa. I.
The problems of two paradoxes. J Clin Epidemiol 1990;43:
Hasnain M, Onishi H, Elstein AS. Inter-rater agreement in
judging errors in diagnostic reasoning. Med Educ 2004;38:609–16.
Coady D, Walker D, Kay L. Regional Examination of the
Musculoskeletal System (REMS): a core set of clinical skills
for medical students. Rheumatology (Oxford) 2004;43:
Beal MC, Patriquin DA. Interexaminer agreement on palpatory diagnosis and patient self-assessment of disability: a pilot
study. J Am Osteopath Assoc 1995;95:97–106.
Branch VK, Lipsky PE. Positive impact of an intervention by
arthritis educators on retention of information, confidence,
and examination skills of medical students. Arthritis Care Res
1998;11:32–8.
Newble DI, Hoare J, Elmslie R. The validity and reliability of
a new examination of the clinical competence of medical
students. Med Educ 1981;15:46–52.
Wakefield J. Direct observation. In: Neufeld VR, Norman GR,
editors. Assessing clinical competence. New York: Springer
Publishing Company; 1985. p. 51–71.