Assessment Terms Glossary
Annual update: A brief report from each academic program based on its assessment plan and submitted annually, which outlines how evidence was used to improve student learning outcomes through curricular and/or other changes or to document that no changes were needed.
Assessment: A method for analyzing and describing student learning outcomes or program achievement of objectives. Many assessments are not tests. For students, a reading miscue analysis is an assessment, a direct observation of student behavior can be an assessment, and a student conference can be an assessment. For programs, a senior exit interview can be an assessment, and an employer survey of satisfaction with graduates can be an assessment. Good assessment requires feedback to those who are being assessed so that they can use that information to make improvements. A good assessment program requires using a variety of assessment instruments each one designed to discover unique aspects of student learning outcomes and achievement of program objectives.
Assessment for accountability: Assessment of some unit (could be a program, department, college or entire institution) to satisfy stakeholders external to the unit itself. Results are summative and are often compared across units. Example: to retain state approval, graduates of a school of education must achieve a pass rate of 90 percent or better on teacher certification tests.
Assessment of individuals: Uses the individual student, and his/her learning, as the level of analysis. Can be quantitative or qualitative, formative or summative, standards-based or value added, and used for improvement. Would need to be aggregated if used for accountability purposes. Examples: improvement in student knowledge of a subject during a single course; improved ability of a student to build cogent arguments over the course of an undergraduate career.
Assessment of institutions: Uses the institution as the level of analysis. Can be quantitative or qualitative, formative or summative, standards-based or value added, and used for improvement or for accountability. Ideally institution-wide goals and objectives would serve as a basis for the assessment. Example: how well students across the institution can work in multi-cultural teams as sophomores and seniors.
Assessment of programs: Uses the department or program as the level of analysis. Can be quantitative or qualitative, formative or summative, standards-based or value added, and used for improvement or for accountability. Ideally program goals and objectives would serve as a basis for the assessment. Example: how sophisticated a close reading of texts senior English majors can accomplish (if used to determine value added, would be compared to the ability of newly declared majors).
Assessment plan: A document that outlines the student learning outcomes and program objectives, the direct and indirect assessment methods used to demonstrate the attainment of each outcome/objective, a brief explanation of the assessment methods, an indication of which outcome(s)/objectives is/are addressed by each method, the intervals at which evidence is collected and reviewed, and the individual(s) responsible for the collection/review of evidence.
Authentic assessment: An assessment that measures a student's ability to perform a “real world” task in the way professionals in the field would perform it. An authentic writing task might arise if students had been reading about nutrition and decided to ask the school to provide healthy snacks rather than candy machines; their writing would be assessed in terms of the response it received from the principal and/or school board. An authentic reading task would require assessing a student's understanding of a book he or she had selected to read without any suggestions or restrictions by the teacher. Opportunities for truly authentic assessment do not occur regularly in most classrooms.
Authentic performance assessment: Since regular opportunities for truly authentic tasks come infrequently in most classrooms, this term generally indicates an evaluation of a student's ability to perform a complex task that is common in the classroom. An authentic performance assessment in a science class would occur when a student is asked to perform an experiment and write a lab report; an authentic writing performance assessment would occur when a student generated a topic, created multiple drafts, sought outside opinions and editorial assistance, and published his or her paper in a classroom magazine or web page. Taking a test over science terms or labeling the parts of a sentence would not be authentic performance assessment. Writing an essay in a limited amount of time in response to a prompt is not an authentic writing assessment either because these circumstances do not match the way writing is usually produced outside of school.
Behavioral observations: Measuring the frequency, duration, topography, etc. of student actions, usually in a natural setting with non-interactive methods, for example, formal or informal observations of a classroom. Observations are most often made by an individual and can be augmented by audio or video recording.
Capstone Courses: could be a senior seminar or designated assessment course. Program learning outcomes can be integrated into assignments.
Case Studies: involve a systematic inquiry into a specific phenomenon, e.g. individual, event, program, or process. Data are collected via multiple methods often utilizing both qualitative and quantitative approaches.
Classroom Assessment: is often designed for individual faculty who wish to improve their teaching of a specific course. Data collected can be analyzed to assess student learning outcomes for a program.
College Portrait: is a web-based tool with which users can obtain information about state colleges' and universities' students, programs, degrees awarded, financial aid, admissions, undergraduate success and progress rates, etc. College Portrait is the product of the Voluntary System of Accountability (VSA).
Collective Portfolios: Faculty assemble samples of student work from various classes and use the “collective” to assess specific program learning outcomes. Portfolios can be assessed by using scoring rubrics; expectations should be clarified before portfolios are examined.
Commercial, norm-referenced, standardized exams: Group administered, mostly or entirely multiple-choice, "objective" tests in one or more curricular areas. Scores are based on comparison with a reference or norm group. Typically must be purchased from a private vendor.
Content Analysis: is a procedure that categorizes the content of written documents. The analysis begins with identifying the unit of observation, such as a word, phrase, or concept, and then creating meaningful categories to which each item can be assigned. For example, a student’s statement that “I learned that I could be comfortable with someone from another culture” could be assigned to the category of “Positive Statements about Diversity.” The number of times this type of response occurs can then be counted and compared with the number of neutral or negative responses in the same category.
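The tallying step described above can be sketched in a few lines of Python. This is only an illustration: the keyword codebook, the category names other than “Positive Statements about Diversity,” and the sample statements are all assumptions, and a real content analysis would rely on trained coders and a documented codebook rather than keyword matching.

```python
from collections import Counter

# Hypothetical codebook (keyword -> category). Keyword matching is a crude
# stand-in for human coding and is used here only to show the tallying step.
CODEBOOK = {
    "comfortable": "Positive Statements about Diversity",
    "uneasy": "Negative Statements about Diversity",
}

def code_statement(statement):
    """Assign a statement to the first category whose keyword it contains."""
    text = statement.lower()
    for keyword, category in CODEBOOK.items():
        if keyword in text:
            return category
    return "Neutral or Uncoded"

def tally(statements):
    """Count how many statements fall into each category."""
    return Counter(code_statement(s) for s in statements)

# Made-up sample responses (the first is the example quoted in the entry).
statements = [
    "I learned that I could be comfortable with someone from another culture",
    "I still feel uneasy in unfamiliar groups",
    "The course met twice a week",
]
counts = tally(statements)
```

The resulting counts can then be compared across categories, as the entry describes.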
Convergent validity: General agreement among ratings, gathered independently of one another, where measures should be theoretically related.
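One common way to check for this kind of agreement is to correlate two sets of ratings gathered independently. The sketch below assumes a Pearson correlation is an appropriate index and uses made-up scores; the variable names are hypothetical, and other statistics may suit a given instrument better.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of ratings."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("ratings must vary for a correlation to be defined")
    return cov / (spread_x * spread_y)

# Hypothetical data: the same five students rated by two independent measures.
rubric_scores = [2, 3, 4, 4, 5]
faculty_ratings = [1, 3, 3, 4, 5]
r = pearson_r(rubric_scores, faculty_ratings)
```

A value of r close to 1 would be consistent with convergent validity for these two measures; a value near 0 would not.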
Course-embedded assessment: Course-embedded assessment refers to techniques that can be utilized within the context of a classroom (one class period, several or over the duration of the course) to assess students' learning, as individuals and in groups. Course-embedded assessments can be formative or summative. When used in conjunction with other assessment tools, course-embedded assessment can provide valuable information at specific points of a program. For example, faculty members teaching multiple sections of an introductory course might include a common pre-test to determine student knowledge, skills and dispositions in a particular field at program admission. There are literally hundreds of classroom assessment techniques, limited only by the instructor's imagination (see also embedded assessment).
Criterion-referenced: Criterion-referenced tests determine what test-takers can do and what they know, not how they compare to others. Criterion-referenced tests report on how well students are doing relative to a predetermined performance level on a specified set of educational goals or outcomes included in the curriculum. For example, student scores on tests as indicators of student performance on standardized exams.
Direct assessment methods: These methods involve students' displays of knowledge and skills (e.g. test results, written assignments, presentations, classroom assignments) resulting from learning experiences in the class/program.
Embedded assessment: A means of gathering information about student learning that is built into and a natural part of the teaching-learning process. Often used for assessment purposes in classroom assignments that are evaluated to assign students a grade. Can assess individual student performance or aggregate the information to provide information about the course or program; can be formative or summative, quantitative or qualitative. Example: as part of a course, expecting each senior to complete a research paper that is graded for content and style, but is also assessed for advanced ability to locate and evaluate Web-based information (as part of a college-wide outcome to demonstrate information literacy).
eportfolio (electronic portfolio): An electronic format of a collection of work developed across varied contexts over time. The eportfolio can advance learning by providing students and/or faculty with a way to organize, archive and display pieces of work. The electronic format allows faculty and other professionals to evaluate student portfolios using technology, which may include the Internet, CD-ROM, video, animation or audio. Electronic portfolios are becoming a popular alternative to traditional paper-based portfolios because they offer practitioners and peers the opportunity to review, communicate and assess portfolios in an asynchronous manner (see also portfolios and course-embedded assessment).
Evaluation: (1) Depending on the context, evaluation may mean either assessment or test. Many test manufacturers and teachers use these three terms interchangeably, which means you have to pay close attention to how the terms are being used and why they are being used that way. For instance, tests that do not provide any immediate, helpful feedback to students and teachers should never be called “assessments,” but many testing companies and some administrators use this term to describe tests that return only score numbers to students and/or teachers.
Evaluation: (3) A value judgment about the results of assessment data. For example, evaluation of student learning requires that educators compare student performance to a standard to determine how the student measures up. Depending on the result, decisions are made regarding whether and how to improve student performance.
Exit and other interviews: Asking individuals to share their perceptions of their own attitudes and/or behaviors or those of others, evaluating student reports of their attitudes and/or behaviors in a face-to-face dialogue.
External examiner: Using an expert in the field from outside your program, usually from a similar program at another institution to conduct, evaluate, or supplement assessment of your students. Information can be obtained from external evaluators using many methods including surveys, interviews, etc.
External validity: External validity refers to the extent to which the results of a study are generalizable or transferable to other settings. Generalizability is the extent to which assessment findings and conclusions from a study conducted on a sample population can be applied to the population at large. Transferability is the ability to apply the findings in one context to another similar context.
Fairness: (1) Assessment or test that provides an even playing field for all students. Absolute fairness is an impossible goal because all tests privilege some test takers over others; standardized tests provide one kind of fairness while performance tests provide another. The highest degree of fairness can be achieved when students can demonstrate their understanding in a variety of ways.
Fairness: (2) Teachers, students, parents and administrators agree that the instrument has validity, reliability, and authenticity, and they therefore have confidence in the instrument and its results.
Focus groups: Typically conducted with 7-12 individuals who share certain characteristics that are related to a particular topic, area or assessment question. Group discussions are conducted by a trained moderator with participants to identify trends/patterns in perceptions. The moderator's purpose is to provide direction and set the tone for the group discussion, encourage active participation from all group members, and manage time. Moderators must not allow their own biases to enter, verbally or nonverbally. Careful and systematic analysis of the discussions provides information that can be used to assess and/or improve the desired outcome.
Follow-up report: A report requested by the Academic Planning Council (APC) following program review to address specific issue(s)/concern(s) that result from the Council's examination of program review documents. The report is submitted within the time frame identified by the Council prior to the program's full review by the APC.
Formative assessment: The gathering of information about student learning during the progression of a course or program, usually repeatedly, to improve the learning of those students. Assessment feedback is short term in duration. Example: reading the first lab reports of a class to assess whether some or all students in the group need a lesson on how to make them succinct and informative.
“High stakes” use of assessment: The decision to use the results of assessment to set a hurdle that needs to be cleared for completing a program of study, receiving certification, or moving to the next level. Most often the assessment so used is externally developed, based on set standards, carried out in a secure testing situation, and administered at a single point in time. Examples: at the secondary school level, statewide exams required for graduation; in postgraduate education, the bar exam.
Indirect assessment of learning: Gathers reflection about the learning or secondary evidence of its existence. Example: a student survey about whether a course or program helped develop a greater sensitivity to issues of diversity.
Institutional portfolios: Institutional portfolios provide a means of assessing the impact of the entire educational experience on student learning. Like student portfolios, they can be used to drive internal improvement and external accountability, but at the level of the whole institution (see also portfolios).
Internal validity: Internal validity refers to (1) the rigor with which the study was conducted (e.g., the study's design, the care taken to conduct measurements, and decisions concerning what was and wasn't measured) and (2) the extent to which the designers of a study have taken into account alternative explanations for any causal relationships they explore.
Interviews: are conversations or direct questioning with an individual or group of people. The interviews can be conducted in person or on the telephone. The length of an interview can vary from 20 minutes to over an hour. Interviewers should be trained to follow agreed-upon procedures (protocols).
Local assessment: Means and methods that are developed by an institution's faculty based on their teaching approaches, students, and learning goals. Is an antonym for “external assessment.” Example: one college's use of nursing students' writing about the “universal precautions” at multiple points in their undergraduate program as an assessment of the development of writing competence.
Matrices: are used to summarize the relationship between program objectives and courses, course assignments, or course syllabus objectives to examine congruence and to ensure that all objectives have been sufficiently structured into the curriculum.
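A matrix of this kind can be represented as a simple mapping and checked for gaps. The sketch below is illustrative only; the course numbers and objective names are invented, and a real curriculum map would usually also record the level at which each objective is addressed.

```python
# Hypothetical curriculum map: course -> program objectives it addresses.
curriculum_map = {
    "ENGL 101": {"writing", "critical reading"},
    "ENGL 250": {"critical reading"},
    "ENGL 495": {"writing", "research"},
}

# Hypothetical list of program objectives.
program_objectives = {"writing", "critical reading", "research", "oral communication"}

def coverage(course_map, objectives):
    """Return objective -> sorted list of courses that address it."""
    return {
        objective: sorted(
            course for course, addressed in course_map.items()
            if objective in addressed
        )
        for objective in sorted(objectives)
    }

def uncovered(course_map, objectives):
    """List objectives that no course in the map addresses."""
    return [
        objective
        for objective, courses in coverage(course_map, objectives).items()
        if not courses
    ]

matrix = coverage(curriculum_map, program_objectives)
gaps = uncovered(curriculum_map, program_objectives)
```

Here the check would flag "oral communication" as an objective that has not yet been built into any course.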
Norm-referenced: A norm-referenced test is one designed to highlight achievement differences between and among students to produce a dependable rank order of students across a continuum of achievement from high achievers to low achievers.
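Rank-order reporting of this kind is often expressed as a percentile rank. The following is a minimal sketch, assuming percentile rank is defined as the percentage of the norm group scoring strictly below a given score (other definitions handle ties differently); the norm group scores are made up.

```python
def percentile_rank(score, norm_group):
    """Percentage of norm-group scores that fall strictly below `score`."""
    if not norm_group:
        raise ValueError("norm group must not be empty")
    below = sum(1 for s in norm_group if s < score)
    return 100 * below / len(norm_group)

# Hypothetical norm group of ten scores.
norm_group = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
rank = percentile_rank(80, norm_group)
```

Note that the result describes standing relative to other test-takers, not mastery of any particular content.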
Observations: can be of any social phenomenon, such as student presentations, students working in the library, or interactions at student help desks. Observations can be recorded as a narrative or in a highly structured format, such as a checklist, and they should be focused on specific program objectives.
Observer effect: The degree to which the assessment results are affected by the presence of an observer.
Performance assessment: A method for assessing how well students use their knowledge and skills in order to do something. Music students performing a new piece of music before a panel of judges are undergoing performance assessment; students who are expected to demonstrate an understanding of basic grammar, spelling, and organizational skills while writing a paper are undergoing performance assessment; business students asked to write a proposal to solve a problem presented in a case study are undergoing performance assessment.
Program review: The administrative (college and provost's staff) and peer (Academic Planning Council) review of academic programs conducted on an eight-year cycle, the results of which are reported to the NIU Board of Trustees and the IBHE. This review includes a comprehensive analysis of the structure, processes, and outcomes of the program. The outcomes reported in the program reviews include program outcomes (e.g. costs, degrees awarded) as well as student learning outcomes (i.e. what students know and can do at the completion of the program).
Qualitative methods of assessment: Methods that rely on descriptions rather than numbers. Examples: ethnographic field studies, logs, journals, participant observations, open-ended questions on interviews and surveys.
Quantitative methods of assessment: Methods that rely on numerical scores or ratings. Examples: surveys, inventories, institutional/departmental data, departmental/course-level exams (locally constructed, standardized, etc.)
Reflective Essays: generally are brief (five to ten minute) essays on topics related to identified learning outcomes, although they may be longer when assigned as homework. Students are asked to reflect on a selected issue. Content analysis is used to analyze results.
Reliability: The extent to which an experiment, test or any measuring procedure yields the same result on repeated trials.
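As a rough illustration, agreement between two administrations of the same measure can be summarized as the share of cases with identical results. This is only one simple index, chosen here as an assumption; correlation-based or chance-corrected statistics are often preferred in practice. The pass/fail data are made up.

```python
def agreement_rate(first_trial, second_trial):
    """Share of cases where two administrations give the same result."""
    if len(first_trial) != len(second_trial) or not first_trial:
        raise ValueError("trials must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(first_trial, second_trial))
    return matches / len(first_trial)

# Hypothetical results for the same five students on two occasions.
first_trial = ["pass", "pass", "fail", "pass", "fail"]
second_trial = ["pass", "fail", "fail", "pass", "fail"]
rate = agreement_rate(first_trial, second_trial)
```

In this made-up case, four of five students receive the same result on both occasions.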
Rubrics: A set of categories that define and describe the important components of the work being completed, critiqued or assessed. Each category contains a gradation of levels of completion or competence with a score assigned to each level and a clear description of what criteria need to be met to attain the score at each level.
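The structure described above (categories, levels, a score per level, and a description per level) maps naturally onto a nested dictionary. The sketch below is a hypothetical two-category writing rubric; the category names, level descriptions, and the choice to sum scores into a total are all assumptions made for illustration.

```python
# Hypothetical rubric: category -> {score: description of that level}.
RUBRIC = {
    "thesis": {
        1: "No identifiable thesis",
        2: "Thesis present but vague",
        3: "Clear, arguable thesis",
    },
    "evidence": {
        1: "Claims are unsupported",
        2: "Some claims are supported",
        3: "All major claims are supported",
    },
}

def score_work(ratings):
    """Validate one rating per category and return the total score."""
    if set(ratings) != set(RUBRIC):
        raise ValueError("rate every rubric category exactly once")
    for category, level in ratings.items():
        if level not in RUBRIC[category]:
            raise ValueError(f"{level} is not a defined level for {category}")
    return sum(ratings.values())

# A rater assigns level 3 for thesis and level 2 for evidence.
total = score_work({"thesis": 3, "evidence": 2})
```

Keeping the level descriptions next to the scores makes it easy to report back to students why a given score was assigned.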
Simulations: A competency-based measure where a person's abilities are measured in a situation that approximates a "real world" setting. Simulation is primarily used when it is impractical to observe a person performing a task in a real world situation (e.g. on the job).
Stakeholder: Anyone who has a vested interest in the outcome of the program/project. In a high stakes standardized test (a graduation requirement, for example), when students' scores are aggregated and published in the paper by school, the stakeholders include students, teachers, parents, school and district administrators, lawmakers (including the governor), and even real estate agents. It is always interesting to note which stakeholders seem to have the most at risk and which stakeholders seem to have the most power; these groups are seldom the same.
Standard: The performance level associated with a particular rating or grade on a test. For instance, 90% may be the standard for an A in a particular course; on a standardized test, a cutting score or cut point is used to determine the difference between one standard and the next.
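The role of cut scores can be shown with a short lookup. Only the 90% threshold for an A comes from the entry above; the remaining cut points and the letter-grade labels are assumptions chosen for illustration.

```python
from bisect import bisect_right

# Cut scores separating adjacent standards. Only 90 -> "A" comes from the
# glossary entry; the other cut points are hypothetical.
CUT_SCORES = [60, 70, 80, 90]
GRADES = ["F", "D", "C", "B", "A"]

def grade_for(percent):
    """Return the grade whose standard the percentage score meets."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    # bisect_right counts how many cut scores are <= percent, so a score
    # exactly at a cut point earns the higher grade.
    return GRADES[bisect_right(CUT_SCORES, percent)]

grade_at_cut = grade_for(90)
grade_just_below = grade_for(89.9)
```

A score that lands exactly on a cut point is placed in the higher category here; a program could just as reasonably adopt the opposite convention.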
Standardized test: This kind of test (sometimes called “norm-referenced”) is used to measure the performance of a group against that of a larger group. Standardized tests are often used in large-scale assessment projects, where the overall results of the group are more important than specific data on each individual client. Standardized tests are not authentic. They are most useful for reporting summative information, and are least useful for classroom diagnosis and formative purposes.
Status report: A description of the implementation of the plan's assessment methods, the findings (evidence) from assessment methods, how the findings were used in decisions to maintain or improve student learning (academic programs) or unit outcomes (support units), the results of previous changes to improve outcomes, and the need for additional information and/or resources to implement an approved assessment plan or gather additional evidence.
Summative assessment: Assessment that is done at the conclusion of a course or some larger instructional period (e.g., at the end of the program). The purpose is to determine success or to what extent the program/project/course met its goals.
Surveys: are commonly used with open-ended and closed-ended questions. Closed ended questions require respondents to answer the question from a provided list of responses. Typically, the list is a progressive scale ranging from low to high, or strongly agree to strongly disagree.
Transcript Analysis: Transcripts are examined to see whether students followed expected enrollment patterns or to address specific research questions, such as differences between transfer students and students who enrolled as freshmen.
Utility: (2) The relative value of an outcome with respect to a set of other possible outcomes. Hence test utility refers to an evaluation, often in cost-benefit form, of the relative value of using a test vs. not using it, of using a test in one manner vs. another, or of using one test vs. another test.
- relevance - the option measures your educational objective as directly as possible
- accuracy - the option measures your educational objective as precisely as possible
- utility - the option provides formative and summative results with clear implications for educational program evaluation and improvement
Value added: The increase in learning that occurs during a course, program, or undergraduate education. Can either focus on the individual student (how much better a student can write, for example, at the end than at the beginning) or on a cohort of students (whether senior papers demonstrate more sophisticated writing skills, in the aggregate, than freshmen papers). Requires a baseline measurement for comparison.
Voluntary System of Accountability (VSA): a joint accountability initiative by the American Association of State Colleges and Universities (AASCU) and the Association of Public and Land-grant Universities (APLU) aimed at making institutional data transparent.
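Both views of value added reduce to comparing a later measurement with a baseline. The sketch below uses hypothetical student identifiers and rubric scores, and it assumes a simple difference of scores (or of cohort means) is an acceptable measure of gain.

```python
from statistics import mean

def individual_gains(baseline, final):
    """Per-student gain for students measured at both points."""
    return {
        student: final[student] - baseline[student]
        for student in baseline
        if student in final
    }

def cohort_gain(baseline_scores, final_scores):
    """Difference between the mean final score and the mean baseline score."""
    return mean(final_scores) - mean(baseline_scores)

# Hypothetical writing scores at the start and end of a program.
baseline = {"s1": 2, "s2": 3, "s3": 3}
final = {"s1": 4, "s2": 3, "s3": 5}

gains = individual_gains(baseline, final)
aggregate = cohort_gain(list(baseline.values()), list(final.values()))
```

The individual view shows which students improved, while the cohort view reports a single aggregate figure; without the baseline scores neither can be computed.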
References and definitions adopted from the following links: