Materials and methods
Participants were 314 trainees. The evaluation was conducted in two parts. The first consisted of a 50-item multiple choice question (MCQ) test, with each item worth two points. The items were chosen according to the core knowledge required, as indicated by the Taiwan Joint Commission on Hospital Accreditation (TJCHA). The second part was an objective structured clinical examination (OSCE) comprising six 10-min stations. Standardized patients (SPs) were used in four stations: internal medicine, surgery, obstetrics and gynecology, and pediatrics. The other two stations assessed clinical skills performance, namely endotracheal tube intubation and infection-protective clothing. The evaluation was held in the last month of the training program undertaken by the Group PGY trainees, as described below.
All of the 314 trainees participated in the MCQ exam. They were divided into four groups according to their training program.
Group R2 contained 156 2nd-year residents enrolled in a 6-month PGY training program.
Groups R1a and R1b contained 61 and 49 1st-year residents, respectively, who were also enrolled in a 6-month PGY training program. According to the TJCHA's policy, the 61 R1a residents were enrolled in the PGY training program from July to December 2011 and then continued on to their 1st-year resident training program. The 49 R1b residents began their resident training program first and then enrolled in the PGY training program from January to June 2012.
Group PGY consisted of 48 general residents who had just completed their internship training and then enrolled in a 1-year PGY training program from July 2011 to June 2012.
In Groups R2, R1a, and R1b, the residents chose their specialization for residency prior to enrolling in the PGY training program. The trainees of Group PGY had not decided on their specialization for a residency at the time the study was conducted.
In the second part, 24 randomly selected residents from each group (n = 96) participated in the OSCE. The criteria for passing or failing each station were determined by the Angoff method. Each checklist item was given one of three possible scores: not completed (score of 0), partially completed (score of 1), or fully completed (score of 2). The final score obtained at each station was determined by using the following equation: (Score obtained/maximum obtainable score) × 100. The mean score was then calculated across all stations. All the raters were qualified by the Taiwan Association of Medical Education after completing the rater training program.
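The station scoring described above can be illustrated with a minimal Python sketch. The function names and the example checklist values are hypothetical and are not taken from the study; the sketch assumes that the maximum obtainable score at a station is two points per checklist item.

```python
from statistics import mean


def station_score(checklist_scores):
    """Percentage score for one station.

    Each checklist item is scored 0 (not completed), 1 (partially
    completed), or 2 (fully completed). Assumption: the maximum
    obtainable score is 2 points per item.
    """
    if not checklist_scores:
        raise ValueError("a station needs at least one checklist item")
    if any(s not in (0, 1, 2) for s in checklist_scores):
        raise ValueError("checklist scores must be 0, 1, or 2")
    max_score = 2 * len(checklist_scores)
    return sum(checklist_scores) / max_score * 100


def osce_final_score(stations):
    """Mean of the station percentage scores across all stations."""
    return mean(station_score(items) for items in stations)


# Hypothetical trainee with two stations (the study used six).
example = [
    [2, 2, 1, 0],  # 5 of 8 points -> 62.5
    [2, 1, 1, 2],  # 6 of 8 points -> 75.0
]
print(osce_final_score(example))  # prints 68.75
```

Because each station is first converted to a percentage, stations with different numbers of checklist items contribute equally to the mean.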
The item difficulty index and the item discrimination index of the MCQ test were analyzed after the assessment. The trainees were ranked by total score, and those in the upper and lower quartiles were assigned to the high- and low-scoring groups, respectively. For each item, the correct-response rate was then calculated in each group as the percentage in high (PH) or the percentage in low (PL). The item difficulty index was calculated as (PH + PL)/2 and the item discrimination index as (PH−PL). An unpaired t-test, ANCOVA, and Pearson correlations were used to analyze the data via SPSS Version 19.0 (SPSS, Inc., Chicago, IL, USA). A p value below 0.05 was considered to indicate statistical significance.
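As an illustration of the item analysis described above, the two indices can be computed as in the following Python sketch. The function name, the data layout, and the example values are hypothetical. How ties at the quartile boundary were handled is not stated, so the sketch simply takes the top and bottom 25% of the ranked list.

```python
def item_indices(responses, total_scores, fraction=0.25):
    """Return a (difficulty, discrimination) pair for each item.

    responses[t][i] is 1 if trainee t answered item i correctly, else 0.
    total_scores[t] is trainee t's total test score.
    """
    n = len(total_scores)
    k = max(1, int(n * fraction))  # size of each extreme group
    # Rank trainees from highest to lowest total score.
    order = sorted(range(n), key=lambda t: total_scores[t], reverse=True)
    high, low = order[:k], order[-k:]

    results = []
    for i in range(len(responses[0])):
        ph = sum(responses[t][i] for t in high) / k  # PH
        pl = sum(responses[t][i] for t in low) / k   # PL
        results.append(((ph + pl) / 2, ph - pl))
    return results


# Hypothetical data: eight trainees, two items.
total_scores = [90, 84, 76, 70, 66, 60, 52, 44]
responses = [
    [1, 1], [1, 1], [1, 0], [0, 1],
    [1, 1], [0, 1], [0, 1], [0, 0],
]
print(item_indices(responses, total_scores))
# prints [(0.5, 1.0), (0.75, 0.5)]
```

In this example, the first item is answered correctly by both high scorers and by neither low scorer, giving the maximum discrimination of 1.0 at a difficulty of 0.5.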
Results
The mean MCQ score for all 314 trainees was 68 ± 7 (range: 40–86). By group, the mean scores were 68 ± 7 (range: 52–82) in the R2 group, 69 ± 7 (range: 40–86) in the R1a group, 68 ± 8 (range: 48–86) in the R1b group, and 69 ± 7 (range: 46–86) in the PGY group. There was no significant difference among the four groups (p = 0.424).
The correct-response rates of the highest- and lowest-scoring 25% of trainees were used to determine the item discrimination and difficulty indices for the MCQ test. The item discrimination index was classified as bad (≤0.19), acceptable (0.2–0.29), good (0.3–0.39), or excellent (≥0.4). The item difficulty index was classified as difficult (<0.4), moderate (0.4–0.6), or easy (>0.6). Among the 50 MCQ items, the item discrimination index was bad in 27 (54%), acceptable in 11 (22%), good in six (12%), and excellent in six (12%) [Fig. 1A]. We re-evaluated the trainees' performance after excluding the 27 items with a bad index. The mean number of items passed among the remaining 23 was 13.2 in Group R2, 13.8 in Group R1a, 12.8 in Group R1b, and 13.5 in Group PGY. Again, there was no significant difference among the groups [p = 0.429, Fig. 2]. Among these 23 items, the item difficulty index was easy in nine (39%), moderate in 10 (44%), and difficult in four (17%) [Fig. 1B].
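The classification bands above can be expressed as a short Python sketch. The function names are hypothetical. Because the bands are stated to two decimal places, the sketch assumes that each index is rounded to two decimals before it is classified; this rounding rule is an assumption and is not stated in the study.

```python
def classify_discrimination(d):
    """Map an item discrimination index to its quality band."""
    d = round(d, 2)  # assumption: bands apply to two-decimal values
    if d >= 0.4:
        return "excellent"
    if d >= 0.3:
        return "good"
    if d >= 0.2:
        return "acceptable"
    return "bad"  # 0.19 or lower


def classify_difficulty(p):
    """Map an item difficulty index to its difficulty band."""
    p = round(p, 2)  # assumption: bands apply to two-decimal values
    if p < 0.4:
        return "difficult"
    if p <= 0.6:
        return "moderate"
    return "easy"


# Hypothetical item with PH = 0.7 and PL = 0.4.
ph, pl = 0.7, 0.4
print(classify_discrimination(ph - pl))    # prints good
print(classify_difficulty((ph + pl) / 2))  # prints moderate
```

Rounding first also keeps floating-point artifacts from moving a borderline item into the wrong band; for example, 0.7 − 0.4 evaluates to slightly less than 0.3 in binary floating point.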
In the OSCE, the mean final scores across the six stations were 64.6 ± 6.5 in Group R2, 64.9 ± 6 in Group R1a, 64.1 ± 6.2 in Group R1b, and 68 ± 4.8 in Group PGY. The difference among the four groups was not significant [p = 0.082, ANCOVA, Fig. 3]. When the two types of station were analyzed separately, the p values were 0.236 for the SP stations and 0.527 for the clinical skills performance stations. Finally, the correlation coefficient between the MCQ and OSCE scores of all trainees was 0.333 [p = 0.002, Fig. 4].
Discussion
The postgraduate training program for general medicine was implemented by the Taiwanese government after the severe acute respiratory syndrome pandemic in 2003 to address a need for improved professional training. The training program was implemented by the TJCHA with the aim of improving the competency of medical graduates with respect to patient-centered care, as well as developing their ability to perform holistic medical care and their competency in medical knowledge, clinical skills, professional attitude, etc. PGY residents in Taiwan have been required to complete a general medicine training program since August 2003.
The current form of the Taiwanese postgraduate training program developed over three stages. The initial stage of the training program included a 3-month training period in which the goal was to improve medical graduates' knowledge of and attitude toward community health. After July 2006, the PGY program was extended to incorporate a 6-month training course (the second stage). This stage included the development of the training model and assessment methods and consisted of 1 month of training in general medicine, 2 months of training in community medicine, and 3 months of training in specialty courses focused on primary care, followed by another 6 months of training in holistic care practice. In the third stage, a full-year program was initiated in August 2011. This program included 3 months of community medicine, 3 months of general medicine, 2 months of general surgery, 1 month of emergency medicine, 1 month of pediatric medicine, 1 month of obstetrics and gynecology, and 1 month of a chosen specialty course.
In the first and second stages, the students could choose a specialized residency after graduating from medical school with the PGY training program being included in the 1st year of the residency training program. In the third stage, the students became general medical residents after graduation and enrolled in the full-year PGY training course prior to choosing a specialty residency. In 2011, there was an overlap of the second and third stages of the PGY training program, which provided a good opportunity to analyze and compare the results of the two programs.
The six core competencies emphasized and cultivated in the PGY training program followed those suggested by the Accreditation Council for Graduate Medical Education (ACGME). These competencies were patient care, medical knowledge, professionalism, interpersonal and communication skills, practice-based learning and improvement, and systems-based practice. It was important that the program had an effective plan for assessing trainees' performance throughout the program and a method for utilizing assessment results to improve the residents' performance. An evaluation toolbox from the ACGME suggested the best methods to assess competence. SPs, checklists, and OSCEs were used to evaluate competency in interpersonal and communication skills and patient care. MCQs and oral examinations were useful for evaluating competency in medical knowledge, while OSCEs and checklists evaluated professionalism effectively. Student competency in practice-based learning and improvement was assessed with OSCEs, SPs, checklists, and MCQ tests. In the evaluation of systems-based practice, MCQs, OSCEs, and checklists proved useful. Given their proven efficacy, MCQ tests, OSCEs with SPs, and checklists were used to analyze the learning outcomes of the different training programs examined in this study.
The MCQ test is used in departmental or comprehensive examinations to determine progress or certification. It is used more widely than the other methods because of its cost-effectiveness and its ability to yield a reliable score. The effectiveness of the MCQ test depends on a close relationship between the quality of the overall examination and that of the individual items. Each item should be developed to test competence in a clinical situation or in handling laboratory data, not memory, and students should be required to apply the knowledge they have gained to find a solution to the problem presented. Guidelines on the development of such items have been published. Furthermore, the structure of the items plays an important role in their discriminatory power. Jozefowicz et al. presented a scale for rating item quality. All of the items we used were developed following the aforementioned principles and scored at least four on Jozefowicz et al.'s scale. One of the most important aspects of quality is the ability to discriminate between students who learned well and those who did not. The discrimination index is also a valid measure of item quality. A relationship has also been demonstrated between the item discrimination index and the difficulty index. When we analyzed the discrimination index of the original 50 items, 23 items (46%) had acceptable or better results, and nearly half of these 23 items were moderately difficult (10 items, 44%). Although there was no difference among the trainees when they were evaluated on these 23 items, more items are needed to confirm the result.
Another useful assessment tool is the use of SPs in a simulated clinical encounter, as in the OSCE. The OSCE was first introduced by Harden and Gleeson in 1979. Interactions with SPs can be tailored to meet specific educational goals, and student performance can be rated dependably. According to the literature, evaluation reliability could be increased from 0.85 to 0.90 if there were a sufficient number of stations and trainees. The specific skills rated during the OSCE at our institute include history taking, physical examination, communication, technical skills, data interpretation, differential diagnosis, and treatment decision making. A checklist whose reliability and validity have been evaluated can provide an objective, structured assessment of trainees' technical skills. Across the modalities, there were no statistically significant differences among the four groups in our study, and further analysis by station showed that the trainees performed similarly (the p values were 0.834 for internal medicine, 0.297 for surgery, 0.071 for obstetrics and gynecology, 0.633 for pediatrics, 0.525 for endotracheal tube intubation, and 0.575 for infection-protective clothing).
This study revealed no significant differences in medical knowledge and clinical performance among the four groups of trainees regardless of program, and suggested that the learning results persisted long after the training ended if the programs were well designed. The weaknesses of the study were the limited number of stations that the trainees participated in during the clinical performance evaluation and possibly the limited number of high-quality MCQ items; however, the results still provide valuable information that can be used to improve the design of the training program, such as arranging more core competencies in the 1-year program.
Source of support
Peng-Wei Hsu received funding for this study from Chang Gung Memorial Hospital (grant CDRPG3B0021).
Conflicts of interest