Introduction to Classical Test Theory

Sun Qianhui (孫千惠), 2017-03-31
青春歲月 (Qingchun Suiyue), 2017, Issue 3

Abstract: This paper introduces Classical Test Theory (CTT), covering its history, its procedure and its extensions. It also lists the theory's shortcomings and the reasons for its decline.

Key words: CTT; theory introduction

1. Introduction

Classical Test Theory (CTT) is a body of related psychometric theory that predicts outcomes of psychological testing such as the difficulty of items or the ability of test-takers. Generally speaking, the aim of the theory is to understand and improve the reliability of psychological tests.

2. History

CTT emerged only after three achievements or ideas had been conceptualized: the recognition that measurements contain errors, the conception of that error as a random variable, and the conception of correlation and how to index it. In 1904, Charles Spearman worked out how to correct a correlation coefficient for attenuation due to measurement error and how to obtain the index of reliability needed to make the correction; his finding is regarded as the beginning of the theory (Traub, 1997). Others who influenced the theory's framework include G. U. Yule, G. F. Kuder and M. W. Richardson (authors of the Kuder–Richardson formulas), and M. R. Novick, among others. CTT as we know it today was codified by Novick (1966) and described in classic texts such as Lord and Novick (1968) and Allen and Yen (1979/2002).
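For reference, Spearman's correction for attenuation is commonly written in the following standard form (not given in the original text), where $r_{XY}$ is the observed correlation between two measures X and Y, and $r_{XX'}$ and $r_{YY'}$ are their reliabilities; the result estimates the correlation between the corresponding true scores:

$$\hat{\rho}_{T_X T_Y} = \frac{r_{XY}}{\sqrt{r_{XX'}\, r_{YY'}}}$$

With illustrative (hypothetical) values $r_{XY} = 0.48$, $r_{XX'} = 0.64$ and $r_{YY'} = 0.81$, the corrected correlation is $0.48 / \sqrt{0.64 \times 0.81} = 0.48 / 0.72 \approx 0.67$.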

Although Spearman laid the foundation of the theory in 1904, it was used only loosely until 1966, when Novick brought it to the forefront of psychological theory (Novick, 1966). CTT can be described as a theory of true test scores: it takes into account the previous scores on a test item, or of a test-taking population, to predict future scores for the same item or population. Using previous scores, theorists can predict which test questions will be answered correctly and which populations tend to answer them successfully. Successful responses are then referred to as normative responses.

When a population is considered, the entire population must be taken into account. For example, if all eleventh graders in the United States took the Advanced Placement Exam (APE) for English and the same overall score was obtained trial after trial, that score would be identified as the normative score for the population. Such a score is meaningless when applied to any single individual, who could score higher or lower than the normative score. Nevertheless, CTT can make reliable identifications based on populations or on individuals, depending on the purpose of the test.

CTT assumes that each person has a true score, T, that would be obtained if there were no errors in measurement. Unfortunately, test users never observe a person's true score, only an observed score, X, which is assumed to equal the true score plus some error, E:

$$X = T + E$$

The relations between the three variables X, T and E are used to describe the quality of test scores. The reliability of the observed test scores X, denoted $\rho^2_{XT}$, is defined as the ratio of the true score variance $\sigma^2_T$ to the observed score variance $\sigma^2_X$:

$$\rho^2_{XT} = \frac{\sigma^2_T}{\sigma^2_X}$$

Because CTT assumes that errors are uncorrelated with true scores, the variance of the observed scores equals the sum of the variance of the true scores and the variance of the error scores, so this is equivalent to

$$\rho^2_{XT} = \frac{\sigma^2_T}{\sigma^2_X} = \frac{\sigma^2_T}{\sigma^2_T + \sigma^2_E}$$

This equation, which expresses reliability as the share of signal (true score variance) in signal plus noise (error variance), has intuitive appeal: the reliability of test scores becomes higher as the proportion of error variance in the test scores becomes lower, and vice versa. The reliability equals the proportion of the variance in test scores that we could explain if we knew the true scores. The square root of the reliability, $\rho_{XT}$, is the correlation between true and observed scores.
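These relationships can be checked numerically. The following is a minimal Python sketch, not part of the original paper, that simulates scores under X = T + E. The normal distributions, the mean of 50 and the standard deviations of 10 (true scores) and 5 (errors) are arbitrary assumptions, chosen so that the theoretical reliability is 100 / (100 + 25) = 0.8. The simulated variance ratio and the squared correlation between true and observed scores should both come out close to that value.

```python
import random
import statistics

def simulate_scores(n, true_sd, error_sd, seed=0):
    """Draw n (true, observed) score pairs under X = T + E.

    Assumed setup for illustration: T ~ Normal(50, true_sd) and
    E ~ Normal(0, error_sd), with E drawn independently of T.
    """
    rng = random.Random(seed)
    true = [rng.gauss(50.0, true_sd) for _ in range(n)]
    observed = [t + rng.gauss(0.0, error_sd) for t in true]
    return true, observed

def pearson_r(a, b):
    """Pearson correlation computed from population moments."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)
    return cov / (statistics.pstdev(a) * statistics.pstdev(b))

true_sd, error_sd = 10.0, 5.0
# Theoretical reliability: sigma_T^2 / (sigma_T^2 + sigma_E^2) = 100 / 125
theoretical = true_sd**2 / (true_sd**2 + error_sd**2)

T, X = simulate_scores(100_000, true_sd, error_sd)
empirical = statistics.pvariance(T) / statistics.pvariance(X)  # sigma_T^2 / sigma_X^2
r_xt = pearson_r(X, T)  # correlation between observed and true scores

print(f"theoretical reliability: {theoretical:.3f}")  # 0.800
print(f"empirical var(T)/var(X): {empirical:.3f}")
print(f"squared corr(X, T):      {r_xt**2:.3f}")
```

Because the errors are drawn independently of the true scores, the observed variance comes out close to 100 + 25 = 125, which is the decomposition used in the second formula.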

3. The process of CTT

The process consists of five steps: 1. formulate the question; 2. collect data; 3. analyze the data; 4. interpret the data; 5. draw a conclusion.

The data may be measured on one of the following scales: 1. nominal scale; 2. ordinal scale; 3. interval scale.

4. Item Discrimination

The more an item discriminates among individuals with different amounts of the underlying concept of interest, the higher its item-discrimination index. The extreme group method calculates the discrimination index in three steps. Step 1 is to rank respondents by their overall score on the scale, aggregated across all items, and place those with the highest and lowest scores into upper and lower groups. Step 2 is to determine, for each item, the proportion of respondents in the upper group and the proportion in the lower group who endorse (or correctly answer) the item. Step 3 is to subtract the lower-group proportion from the upper-group proportion. The higher the resulting item-discrimination index, the more the item discriminates. It is useful to compare the discrimination indexes of all the items in the scale.
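A minimal Python sketch of the three steps follows, assuming dichotomously scored items (1 = endorsed or correct, 0 = not) and equal-sized upper and lower groups. The original text does not say how large the extreme groups should be; the `group_fraction` parameter (default 0.27, a commonly used convention) and the function name are assumptions, and ties at the group boundaries are broken by input order.

```python
def discrimination_index(responses, group_fraction=0.27):
    """Extreme-group item-discrimination index for each item.

    responses: one list of 0/1 item scores per respondent.
    group_fraction: assumed share of respondents in each extreme group.
    Returns p_upper - p_lower for every item.
    """
    n = len(responses)
    k = max(1, round(n * group_fraction))  # size of each extreme group
    if 2 * k > n:
        raise ValueError("upper and lower groups would overlap")
    # Step 1: rank respondents by total score across all items.
    ranked = sorted(responses, key=sum, reverse=True)
    upper, lower = ranked[:k], ranked[-k:]
    indices = []
    for j in range(len(responses[0])):
        # Step 2: proportion endorsing item j in each extreme group.
        p_upper = sum(r[j] for r in upper) / k
        p_lower = sum(r[j] for r in lower) / k
        # Step 3: subtract the lower-group proportion from the upper-group one.
        indices.append(p_upper - p_lower)
    return indices

# Hypothetical answers of 6 respondents to 3 items.
responses = [
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 1],
    [0, 0, 0],
    [1, 1, 0],
    [1, 1, 1],
]
print(discrimination_index(responses, group_fraction=1/3))  # [0.5, 1.0, 1.0]
```

Here the first item is endorsed by half of the lower group as well as by all of the upper group, so it discriminates less than the other two.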

5. Second Language Testing

ESL students are the fastest-growing group of school-age children, and it is common to have non-native English speakers in the classroom. However, the standard entrance exam for ESL students applying to college is the Test of English as a Foreign Language (TOEFL), a standardized exam in multiple-choice format. Dudley (2006) argues that multiple true-false (MTF) exams can be just as reliable as multiple-choice tests, which can be confusing to students, and that they offer a valid alternative to them (p. 199).

Dudley (2006) took two test forms that were multiple-choice in nature and converted them to a multiple true-false format. He reports that the findings support the MTF format (p. 224), and that the study provides sound empirical evidence that central factors such as item interdependence, reliability and concurrent validity are viable with MTF items that assess vocabulary and reading comprehension in norm-referenced testing (p. 224). Although Dudley's (2006) focus was on undergraduate students, it is not a stretch to suggest that K-12 teachers could begin creating MTF exams, or converting existing multiple-choice exams to the MTF format, using CTT.

6. Reliability

Reliability is important in the development of patient-reported outcome (PRO) measures. Reliability places a limit on validity: if responses are inconsistent (unreliable), the measure cannot be valid. Reliability refers to the proportion of variance in a measure that can be ascribed to a common characteristic shared by the individual items, whereas validity refers to whether that characteristic is actually the one intended.

Test–retest reliability, which applies to both single-item and multi-item scales, reflects the reproducibility of scale scores over repeated administrations during a period in which the respondents' condition did not change. To compute test–retest reliability, the kappa statistic can be used for categorical responses and the intraclass correlation coefficient for continuous responses. Further, having multiple items in a scale generally increases its reliability. In multi-item scales, a common indicator of scale reliability is Cronbach's coefficient alpha, which is driven by the number of items in the scale and the correlations among them.
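As an illustration of test–retest agreement for categorical responses, the sketch below computes Cohen's kappa for one item administered twice to the same respondents. The text only says "the kappa statistic"; choosing Cohen's two-occasion kappa, the function name and the yes/no data are assumptions for illustration. The intraclass correlation coefficient for continuous responses is not shown.

```python
from collections import Counter

def cohen_kappa(time1, time2):
    """Cohen's kappa between two administrations of a categorical item.

    time1, time2: responses from the same respondents, in the same order.
    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
    p_e is the agreement expected by chance from each occasion's marginals.
    """
    if len(time1) != len(time2) or not time1:
        raise ValueError("need two non-empty response lists of equal length")
    n = len(time1)
    p_o = sum(a == b for a, b in zip(time1, time2)) / n
    m1, m2 = Counter(time1), Counter(time2)
    p_e = sum(m1[c] * m2[c] for c in m1.keys() | m2.keys()) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)

# Hypothetical yes/no answers from 10 respondents on two occasions.
time1 = ["yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no", "no"]
time2 = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "no"]
print(round(cohen_kappa(time1, time2), 3))  # 0.6
```

Observed agreement here is 0.8, but because each answer is given by half the respondents on each occasion, chance agreement is already 0.5, so kappa credits only the agreement beyond chance.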

The greater the proportion of shared variation, the more the items have in common and the more consistently they reflect a common true score. The covariance-based formula for coefficient alpha expresses this reliability while adjusting for the number of items contributing to the variance calculations. The corresponding correlation-based formula, an alternative expression, represents coefficient alpha in terms of the mean inter-item correlation among all pairs of items, adjusted for the number of items.
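Both formulas can be sketched in Python. The covariance-based version below uses the standard form alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), and the correlation-based (standardized) version uses alpha = k * r_bar / (1 + (k-1) * r_bar), where k is the number of items and r_bar is the mean inter-item correlation. These are textbook forms rather than formulas quoted from the paper; the function names and item scores are hypothetical.

```python
import statistics
from itertools import combinations

def pearson_r(a, b):
    """Pearson correlation between two equal-length score lists."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)
    return cov / (statistics.pstdev(a) * statistics.pstdev(b))

def cronbach_alpha(items):
    """Covariance-based alpha: k/(k-1) * (1 - sum(item variances) / var(total)).

    items: k columns of item scores, each listing the same respondents in order.
    """
    k = len(items)
    totals = [sum(row) for row in zip(*items)]  # each respondent's total score
    sum_item_var = sum(statistics.pvariance(col) for col in items)
    return k / (k - 1) * (1 - sum_item_var / statistics.pvariance(totals))

def alpha_from_mean_r(k, r_bar):
    """Correlation-based (standardized) alpha for k items with mean inter-item r."""
    return k * r_bar / (1 + (k - 1) * r_bar)

def standardized_alpha(items):
    """Standardized alpha using the mean correlation over all item pairs."""
    r_bar = statistics.fmean(pearson_r(a, b) for a, b in combinations(items, 2))
    return alpha_from_mean_r(len(items), r_bar)

# Hypothetical 3-item scale answered by 5 respondents (one list per item).
items = [
    [2, 4, 3, 5, 1],
    [3, 5, 3, 4, 2],
    [2, 5, 4, 5, 1],
]
print(f"covariance-based alpha:  {cronbach_alpha(items):.3f}")  # 0.945
print(f"correlation-based alpha: {standardized_alpha(items):.3f}")
# Same mean inter-item correlation, twice as many items: alpha rises.
print(alpha_from_mean_r(3, 0.5), alpha_from_mean_r(6, 0.5))  # 0.75 then about 0.857
```

The last line illustrates the earlier point that adding items raises reliability: holding the mean inter-item correlation at 0.5, going from 3 to 6 items raises alpha from 0.75 to about 0.857.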

7. Shortcomings

One of the best-known shortcomings of CTT is that examinee characteristics and test characteristics cannot be separated: each can only be interpreted in the context of the other. A second shortcoming lies in the CTT definition of reliability as "the correlation between test scores on parallel forms of a test". The problem is that the various reliability coefficients provide either lower-bound estimates of reliability or reliability estimates with unknown biases. A third shortcoming involves the standard error of measurement, which is assumed to be the same for all examinees. However, as Hambleton and colleagues explain, scores on any test are unequally precise measures for examinees of different ability, which makes the assumption of equal errors of measurement for all examinees implausible (Hambleton, Swaminathan, & Rogers, 1991, p. 4). A fourth and final shortcoming is that CTT is test oriented rather than item oriented; in other words, CTT cannot help us predict how well an individual, or even a group of examinees, might do on a particular test item.
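To make the third shortcoming concrete: in CTT the standard error of measurement is usually computed once for the whole test as SEM = sigma_X * sqrt(1 - rho^2_XT), a standard formula not given in the original text, and then applied to every examinee. The Python sketch below uses a hypothetical observed-score SD of 10, a reliability of 0.91 and hypothetical function names; it shows that the resulting score bands have the same width for low, middle and high scorers. The 1.96 multiplier assumes normally distributed errors.

```python
import math

def standard_error_of_measurement(sd_observed, reliability):
    """CTT standard error of measurement: SD of observed scores * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd_observed * math.sqrt(1.0 - reliability)

def score_bands(observed_scores, sd_observed, reliability, z=1.96):
    """Bands X +/- z * SEM (about 95% if errors are normal); one SEM for everyone."""
    sem = standard_error_of_measurement(sd_observed, reliability)
    return [(x - z * sem, x + z * sem) for x in observed_scores]

# Hypothetical test: observed-score SD of 10 and reliability of 0.91.
sem = standard_error_of_measurement(10.0, 0.91)
bands = score_bands([35, 60, 92], 10.0, 0.91)
widths = [upper - lower for lower, upper in bands]
print(f"SEM = {sem:.2f}")              # SEM = 3.00
print([round(w, 2) for w in widths])   # [11.76, 11.76, 11.76]
```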

What makes CTT effective is also its primary weakness: the normative scores used to predict future scores are specific to the samples previously studied. An individual may have received the highest score on the exam yet still be grouped with the rest of the test-taking population when the APE results are used to predict future success or effectiveness (Reid et al., 2007, p. 179). A second problem with CTT is that the entire testing instrument must be completed before predictive information about a population or an individual can be obtained; only the completed exam counts. Finally, as Reid et al. (2007) point out, "the instability of scores at extreme levels of an ability or trait, even within the normative sample" is a concern with CTT (p. 179).

8. Conclusion

Although CTT has many shortcomings by modern standards, it remains a well-known theory that has contributed much to education and to psychometric modeling. It is a practical way of tackling complex questions and problems by collecting data, analyzing data and giving answers.

【References】

[1] American Psychiatric Association. Diagnostic and statistical manual of mental disorders (3rd ed., rev.)[M]. Washington, DC: Author, 1987.

[2] Bolton, B. Handbook of measurement and evaluation in rehabilitation (3rd ed.)[M]. Gaithersburg, MD: Aspen, 2001.

[3] Brown, J. D., & Hudson, T. Criterion-referenced language testing[M]. New York, NY: Cambridge University Press, 2002.

[4] Corkum, P., Andreou, P., Schachar, R., Tannock, R., & Cunningham, C. The Telephone Interview Probe[J]. Educational and Psychological Measurement, 2007, 67: 169-185.

[5] Cronbach, L. J. Note on the multiple true-false test exercise[J]. Journal of Educational Psychology, 1939, 30: 628-631.

[6] Cronbach, L. J., Rajaratnam, N., & Gleser, G. C. Theory of generalizability: A liberalization of reliability theory[J]. The British Journal of Statistical Psychology, 1963, 16: 137-163.

[7] Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. The dependability of behavioral measurements: Theory of generalizability for scores and profiles[M]. New York: John Wiley, 1972.

[8] Dudley, A. Multiple dichotomous-scored items in second language testing: Investigating the multiple true-false item type under norm-referenced conditions[J]. Language Testing, 2006, 23: 198-228.

[9] Haladyna, T. M. Developing and validating multiple-choice test items (2nd ed.)[M]. Mahwah, NJ: Lawrence Erlbaum, 1999.

[10] Koppitz, E. M. Psychological evaluation of children's human-figure drawings[M]. New York: Grune & Stratton, 1968.

[11] Novick, M. R. The axioms and principal results of classical test theory[J]. Journal of Mathematical Psychology, 1966, 3: 1-18.

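【About the Author】

Sun Qianhui (孫千惠, born 1992), female, Han nationality, holds a master's degree and is a teaching assistant in College English at the Logistics College of the Chinese People's Armed Police Force, Tianjin. Research interests: foreign linguistics and applied linguistics.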
