Cultivate Learning: Center for research and professional development

WaKIDS Concurrent-Validity and Reliability study (2012-2013)

During the 2012-2013 school year, the CQEL team collaborated with OSPI and DEL to evaluate the use of the WaKIDS whole child assessment (an adapted version of TS-GOLD) in Washington State. The study had two components: (1) inter-rater reliability, an examination of how well teachers’ ratings of students on the WaKIDS assessment matched ratings from an independent, well-trained observer, and (2) concurrent validity, an examination of whether teachers’ ratings on the WaKIDS assessment gave a valid picture of student skills when compared with student scores on externally administered standardized measures in each of the six developmental domains.

For the reliability component, 54 WaKIDS teachers (across 42 schools and 26 districts) were recruited to view video portfolios of four different students working and interacting in various kindergarten classrooms. The four students were selected based on gender, ethnicity, skill/developmental level, and language proficiency to ensure that results reflected teachers’ reliability in using the WaKIDS assessment across multiple types of learners. Teachers were asked to complete the 19 GOLD objectives from the WaKIDS assessment for each student by entering scores into an online survey. Teachers’ scores were then compared with the master code, that is, the scores given by an independent observer trained to expertise in scoring the GOLD assessment. Analyses explored the degree to which teachers were in exact agreement with the master code, in adjacent agreement, or discrepant enough from it to produce a different readiness rating relative to the cut point score.
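The three outcomes named above (exact agreement, adjacent agreement, and a discrepancy that crosses the cut point) can be illustrated with a minimal sketch. It is not the study's analysis code. It assumes that "adjacent" means a difference of exactly one scale point and that a single cut point is applied to each score; the function name `agreement_summary` and all ratings shown are hypothetical.

```python
def agreement_summary(teacher, master, cut_point):
    """Compare one teacher's ratings with the master code.

    Assumptions (not taken from the study): ratings are integers,
    "adjacent" means the two ratings differ by exactly one point, and
    a rating counts as meeting readiness when it is >= cut_point.
    """
    if len(teacher) != len(master) or not teacher:
        raise ValueError("rating lists must be non-empty and equal in length")

    pairs = list(zip(teacher, master))
    n = len(pairs)

    exact = sum(t == m for t, m in pairs)
    adjacent = sum(abs(t - m) == 1 for t, m in pairs)
    # The pair falls on opposite sides of the cut point when exactly
    # one of the two ratings meets it.
    crosses_cut = sum((t >= cut_point) != (m >= cut_point) for t, m in pairs)

    return {
        "exact": exact / n,
        "adjacent": adjacent / n,
        "crosses_cut": crosses_cut / n,
    }


# Hypothetical ratings on four objectives for one student.
summary = agreement_summary(teacher=[4, 5, 3, 6], master=[4, 4, 5, 6], cut_point=4)
```

Note that an adjacent rating can still cross the cut point, so the three proportions are not mutually exclusive and need not sum to one.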

For the concurrent validity component, a psychometric design was implemented in which kindergarten students’ scores (n = 333) from the WaKIDS assessment (administered by their teachers in the fall of 2012) were compared to scores from individually-administered, norm-referenced assessments (one or two selected per WaKIDS domain). The selected assessments were divided into two batteries so that each student was asked to complete only three or four assessments; our team administered them in the fall of the same year. These assessments included:

Battery A:

The Peabody Picture Vocabulary Test-Fourth Edition (PPVT-4)
Woodcock-Johnson III Tests of Achievement (WJ III) Applied Problems
Test of Phonological Awareness PLUS (TOPA-2+)
The Early Screening Inventory-Revised (ESI-R)
Social Skills Improvement System (SSIS) – Parent Form
Individual Observation Form

Battery B:

The Test of Early Reading Ability-Third Edition (TERA-3)
The Oral and Written Language Scales-Second Edition (OWLS II) Oral Expression Scale
The Learning Motivation Task (adapted from Smiley & Dweck, 1994)
Social Skills Improvement System (SSIS) – Parent Form
Individual Observation Form

Across all classrooms, 49.5% of the students received Battery A, and 50.5% received Battery B. These groups were balanced in terms of student gender, which was the only grouping factor used to determine battery assignment.

The concurrent validity of the WaKIDS assessment was examined by computing zero-order correlations between student scores on each of the six WaKIDS domains and the scores on the corresponding individually-administered assessments. Additionally, Hierarchical Linear Modeling was used to examine the student data more closely by accounting for the nesting of students within classrooms and related issues of reliability in teacher ratings.
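A zero-order correlation is the plain Pearson correlation between two sets of scores, with no other variables controlled for. The sketch below shows that calculation for one domain and does not attempt the Hierarchical Linear Modeling step. It is an illustration rather than the study's analysis code; the function name `pearson_r` and the scores are hypothetical.

```python
import math


def pearson_r(x, y):
    """Zero-order (Pearson) correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)

    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a list has no variance")

    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: teacher domain ratings and standardized test scores
# for the same five students.
domain_ratings = [3, 4, 4, 6, 7]
test_scores = [88, 95, 92, 104, 110]
r = pearson_r(domain_ratings, test_scores)
```

A positive `r` means that students rated higher by their teacher also tended to score higher on the external measure, which is the pattern a concurrent validity check looks for.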

Key findings from both the reliability and validity components are summarized below:

In general, ratings are similar

Overall, ratings from teachers in the state of Washington and ratings from an independent master rater for the same children were moderately correlated for most teachers (98.1%).

Training helps

Completion of the Teaching Strategies GOLD inter-rater reliability certification (in addition to the OSPI summer trainings) is beneficial for teachers. Teachers in the state of Washington who participated in this certification were more likely to agree with the ratings from an independent, well-trained observer of the same child’s skill levels.

Experience matters

The accuracy of teachers’ ratings of students on TS-GOLD tends to increase with the number of years spent teaching kindergarten. Teachers in the state of Washington with more experience teaching kindergarten students were more likely to agree with ratings from an independent, well-trained observer of the same child’s skill levels.

Accuracy varies by domain

The accuracy of teachers’ ratings was greater in the social-emotional, physical, and language domains than in the cognitive, literacy, and math domains. Teachers in the state of Washington were more likely to agree with ratings from an independent, well-trained observer of the same child’s skill levels in the social-emotional, physical, and language domains.

Accuracy varies in terms of individual student characteristics

The accuracy of teachers’ ratings was greater for the typically developing, native English-speaking male and female students. Teachers in the state of Washington were more likely to agree with ratings from an independent, well-trained observer of the same child’s skill levels for these students.

 


Some areas of development on TS-GOLD may be trickier to identify than others

Misidentification of students with regard to demonstrating the “characteristics of entering kindergarteners” was more pronounced in the physical, cognitive, and math domains and for the lower-functioning and English language learner students. Teachers in the state of Washington had more difficulty rating accurately in these areas, to the extent that their rating placed the same student on the other side of the developmental cut point from the rating of the independent, well-trained observer.

Ratings are valid

Domain ratings on TS-GOLD are correlated with their corresponding standardized measures in each of the six learning domains, showing significant positive relationships. Four of the six domains revealed correlations in the moderate range.

Further, domain ratings on TS-GOLD significantly predicted individual scores on standardized measures in the domains of math, language, and literacy. For example, a teacher’s rating for “Johnny” in the TS-GOLD math domain was predictive of “Johnny’s” score on an individually administered standardized math measure conducted by an external assessor. That is, as teacher ratings in TS-GOLD math increase, so does the score on the standardized math measure. This indicates that TS-GOLD taps the intended constructs in the math, language, and literacy domains.

These results provide support for the reliability and concurrent validity of the WaKIDS assessment as it pertains to the range of abilities and linguistic/ethnic/cultural diversity represented by kindergarteners in the State of Washington. For further details on this Reliability and Concurrent-Validity study, you can view the full report here.