The First Step: Establishing a Reliable and Valid Tool

By Elaine R. Graf
Assistant Professor of Nursing
Originally published in Faculty Bulletin 59 (Jan. 1996): 35-38

In an era that focuses on the outcomes of a baccalaureate education, it is imperative that curricular decisions be based on data obtained from reliable and valid assessment measures. In 1993, the School of Nursing received a Student Outcomes Assessment Grant and began assessing student writing performance with a holistic grading methodology. The project is a direct extension of the assessment studies completed by our English Department and provides reliability and validity data on the use of the "Exit Criteria for First Year Composition," a holistic grading tool, within a professional degree-granting program.

Holistic grading is a method of writing assessment with a long record of successful use in large-scale academic assessments. At Northern, the English Department uses this method to determine English course placement for entering freshmen. The intent of holistic grading is to provide a score that indicates the general quality of a student's writing based on a given set of writing criteria: engagement of the topic, establishment of an appropriate voice, organization, interaction with sources, use of an appropriate documentation/referencing style, and presentation of a polished paper. The School of Nursing sought to validate the writing criteria within the nursing program and to assess the writing abilities of our students at various points within the curriculum. We collected and analyzed 160 sophomore and 79 senior papers in the first year of the project and an additional 79 senior papers in the second year, for a total sample of 318 papers.

All students used the same style and format for their papers. Acting as lobbyists for the Illinois Nurses Association (INA), they analyzed a "given" legislative health proposal and developed an argumentative paper recommending INA action on the proposed legislation. Five faculty members graded the papers using faculty-generated analytic scoring guides that assigned either percentages or points to specific paper attributes (content, organization, use of APA style, spelling, grammar). Additionally, each faculty member independently read five selected papers and graded them according to a provided analytic scoring guide, obtaining an intragroup reliability of .48.
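To illustrate the analytic approach, the short Python sketch below shows how a weighted scoring guide of this kind might combine attribute scores into a single grade. The weights, attribute names, and sample scores are hypothetical, not those of the faculty-generated guides.

    # A minimal sketch of a weighted analytic scoring guide; the weights
    # below are illustrative assumptions, not the faculty guides' values.
    ATTRIBUTE_WEIGHTS = {
        "content": 0.40,
        "organization": 0.25,
        "apa_style": 0.15,
        "spelling": 0.10,
        "grammar": 0.10,
    }

    def analytic_score(attribute_scores):
        """Combine per-attribute scores (each 0-100) into one weighted grade."""
        return sum(ATTRIBUTE_WEIGHTS[name] * score
                   for name, score in attribute_scores.items())

    # A paper strong in content but weak in APA style:
    print(analytic_score({"content": 90, "organization": 85,
                          "apa_style": 60, "spelling": 95, "grammar": 80}))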

Subsequently, a writing assessment team trained in the use of the "Exit Criteria for First Year Composition" holistically analyzed a second, ungraded copy of the papers. Each paper was read and scored separately on a scale of one to six by two readers. Agreement was defined as exact or adjacent scores given by the two readers. The scores were then added, resulting in a range from two to twelve. If the readers disagreed, the paper was read by a third independent reader, whose score was then compared with the first two scores to determine agreement. In the first year of the study, the writing assessment team, consisting of four graduate research assistants, maintained an interrater reliability of .81. Additionally, to obtain a six-month measure of score stability, two of the four graduate assistants reread ten papers, yielding a stability coefficient of .80. In the second year of the study, the assessment team, consisting of two graduate research assistants, established an interrater reliability of .91. These results support the findings of other studies that holistic grading methods yield higher levels of interrater reliability than analytic methods (White, 1985).
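The two-reader protocol can be captured in a brief sketch. The one-to-six scale, the exact-or-adjacent agreement rule, and the summed 2-12 score follow the description above; the fallback when the third reader agrees with neither original score is our assumption, since the protocol does not specify one.

    # A minimal sketch of the two-reader holistic protocol described above.
    def agreement_rate(score_pairs):
        """Proportion of papers whose two readers gave exact or adjacent scores."""
        return sum(abs(a - b) <= 1 for a, b in score_pairs) / len(score_pairs)

    def combined_score(r1, r2, read_third):
        """Return the summed holistic score (2-12) for one paper."""
        if abs(r1 - r2) <= 1:                 # exact or adjacent: agreement
            return r1 + r2
        r3 = read_third()                     # disagreement: third independent reading
        # Assumption: sum the third score with the closer of the first two.
        closer = min((r1, r2), key=lambda r: abs(r - r3))
        return r3 + closer

    print(combined_score(4, 5, lambda: 6))    # readers agree: 9
    print(combined_score(2, 5, lambda: 4))    # adjudicated: 4 + 5 = 9
    print(agreement_rate([(4, 5), (2, 5), (3, 3)]))   # 2 of 3 pairs agree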

Eleven nursing faculty members participated in holistic grading inservices during the second year of the project. They learned how to read papers holistically using the scale described above and commented on the face validity of the criteria for the nursing writing curriculum, the usability of the scale, and the use of holistic grading as a measurement technique. Each read and scored three to five papers, and the group achieved an intragroup reliability of .78. Faculty unanimously agreed that the scale was easy to use and that the criteria it measured were similar to the writing criteria in the School of Nursing. The inservices provided a unique opportunity for faculty to compare impressions of the quality of the papers and to discuss openly what they value in student writing. Many faculty could identify a philosophical perspective underlying their grading methods and found themselves questioning their beliefs.

Project staff analyzed data from the 318 papers using frequency, correlation, and paired t-test procedures. Frequency distributions of the assessment team scores approximated a normal curve with a mean score of 8.85 (reflecting C+, or satisfactory, writing skills), while faculty score distributions were negatively skewed, with a mode in the B+ to A range and a mean score of 3.45 (reflecting B+, or above-satisfactory, skills). The holistic scoring method thus produced a higher degree of score discrimination. Based on the assessment team scores, 91 percent of the papers were judged to be at a satisfactory writing level or higher. This percentage remained the same for both years of the project.
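As a concrete illustration of the frequency analysis, the sketch below tabulates a distribution of summed holistic scores and computes its mean; the score values are placeholders, not project data.

    # A minimal sketch of the frequency analysis; placeholder scores only.
    from collections import Counter
    from statistics import mean

    scores = [8, 9, 9, 10, 7, 8, 9, 11, 6, 9, 10, 8]   # summed scores, 2-12 scale
    print(sorted(Counter(scores).items()))             # frequency distribution
    print(round(mean(scores), 2))                      # the project's mean was 8.85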

We found significant correlations between assessment team scores and the following variables: faculty scores for the sophomore and senior level classes (r=.39 and r=.30, respectively; p=.01) and freshman English core competency exam I scores (r=.31, p=.05). Given the longitudinal design of the project, we were able to assess the correlation between sophomore and senior writing abilities for 37 students. This correlation was also significant (r=.33, p=.05).
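For readers who wish to replicate the correlation analysis, the sketch below computes a Pearson correlation with SciPy on hypothetical paired scores; the values are placeholders, not the 37 matched scores the project analyzed.

    # A minimal sketch of the correlation analysis; placeholder scores only.
    from scipy import stats

    sophomore = [8, 9, 7, 10, 8, 9, 6, 11, 8, 9]   # summed scores, 2-12 scale
    senior    = [9, 9, 8, 10, 7, 10, 7, 11, 9, 8]

    r, p = stats.pearsonr(sophomore, senior)
    print(f"r = {r:.2f}, p = {p:.3f}")             # project: r=.33, p=.05 (n=37)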

Paired t-test analysis (n=53) of the mean assessment team score (M=9.1) against the mean English core competency I score (M=7.6) showed a significant mean difference (p<.001), suggesting that the writing skills of sophomore and senior nursing students improved dramatically during their college years. On the other hand, paired t-test analysis (n=37) of the mean assessment team score for senior papers (M=8.8) against the mean assessment team score for sophomore papers (M=8.7) showed no significant difference, suggesting that writing skills may remain stable during the last two years of a student's college experience. Given the sample sizes, however, these data should be interpreted cautiously.
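The paired t-test can be reproduced in the same way. The sketch below assumes each student's English core competency I score is matched with an assessment team score on the same scale; the values are placeholders, not project data.

    # A minimal sketch of the paired t-test; placeholder scores only.
    from scipy import stats

    core_exam   = [8, 7, 8, 6, 9, 7, 8, 7, 9, 6]     # freshman exam scores
    team_scores = [9, 8, 10, 7, 11, 8, 9, 9, 10, 8]  # later paper scores

    t, p = stats.ttest_rel(team_scores, core_exam)
    print(f"t = {t:.2f}, p = {p:.4f}")               # project: M=9.1 vs. M=7.6, p<.001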

The assessment project has been very helpful in identifying areas of strength and areas of concern within the School of Nursing's writing curriculum, particularly with regard to the 9 percent of students with unsatisfactory skills. Unsatisfactory papers will be analyzed to detect specific areas of weakness. Concern over the low level of interrater reliability among faculty paper scores will continue to be addressed through faculty inservices. Faculty will be encouraged to participate in open reading sessions to compare their impressions of paper attributes, discuss the importance of various writing subskills, and monitor interrater reliability. On the basis of the reliability and validity data we have established, we believe that the "Exit Criteria for First Year Composition" is an effective tool for evaluating writing and that replication of our project in other departments within the university would be invaluable.

White, Edward M. (1985). Teaching and assessing writing. San Francisco: Jossey-Bass.

Appreciation is extended to Dr. Bob Self, Dr. Charles Pennel, Ms. Ellen Franklin, and all nursing faculty who participated in this project for their ongoing support of and dedication to the assessment effort.

Appreciation is also extended to the Committee for the Improvement of Undergraduate Education for funding the second year of the project.

The "Exit Criteria for First Year Composition" is available to all faculty within the university. Contact the English department if you are interested.