Search conditions: 彭亚风 (Peng Yafeng)
  • Test Design Modes for Examinees with Different Cognitive Structures

    Subjects: Psychology >> Social Psychology; Submitted: 2023-03-27; Cooperative journal: Acta Psychologica Sinica (《心理学报》)

    Abstract: Just as doctors must use different medical technologies to diagnose different illnesses effectively, teachers need well-designed tests to evaluate students with different cognitive structures accurately. For such an evaluation, we recommend Cognitive Diagnostic Assessment (CDA), which measures students' specific cognitive structures and processing skills and thus reveals their cognitive strengths and weaknesses. The typical design procedure for a CDA test is as follows: first, identify the target attributes and their hierarchical relationships; second, design a Q-matrix, which characterizes the test's construct and content; finally, construct the test items. Within this framework, two test forms are available: the traditional test and the computerized adaptive test (CAT). The former has the same fixed structure for all participants regardless of their cognitive structures, whereas the latter is tailored to each participant's cognitive structure. However, researchers have not considered test designs specific to different cognitive structures when using either form. As a result, the traditional test requires more items to evaluate a group of participants with mixed cognitive structures precisely, and cognitive diagnostic computerized adaptive testing (CD-CAT) uses its item bank inefficiently because of problems in how that item bank is assembled. The key to overcoming these hurdles is to explore designs tailored to participants with different cognitive structures. As discussed above, a reasonable diagnostic test should be specific to the cognitive structure of its target examinees so as to classify them precisely and efficiently. This idea is in line with CAT, in which an ideal item bank is the cornerstone for achieving this purpose.
In this regard, Reckase (2003, 2007, 2010) proposed an approach called p-optimality for designing an optimal item bank. Inspired by p-optimality and drawing on the characteristics of CDA, we proposed a method for designing tests for different cognitive structures. We conducted a Monte Carlo simulation study to explore test design modes for different cognitive structures under six attribute hierarchical structures (Linear, Convergent, Divergent, Unstructured, Independent, and Mixture). The results show that: (1) under the same hierarchical structure, the optimal test design modes for different cognitive structures differ in test length, in the initial exploration stage (Stage 0), and in the accurate estimation stage (Stage 1); (2) the CD-CAT item bank we built from the optimal test design modes of the different cognitive structures achieves better item pool usage than other commonly used item banks, for both fixed-length and variable-length tests. Based on these results, we offer suggestions for item bank assembly.
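The first two design steps described above (identify an attribute hierarchy, then design a Q-matrix) can be made concrete with a short sketch. Assuming binary attributes, it derives which attribute profiles a hierarchy permits, under the usual reading that mastering an attribute requires mastering all of its prerequisites, and the conjunctive (DINA-style) ideal response of each permitted profile to each Q-matrix item. The function names, the example hierarchy, and the example Q-matrix are hypothetical; this is not the authors' p-optimality-based design procedure.

```python
from itertools import product

import numpy as np


def reachability(adj):
    """Transitive closure (Warshall) of a prerequisite matrix; adj[k, l] = 1 means k -> l."""
    r = np.array(adj, dtype=bool) | np.eye(len(adj), dtype=bool)
    for m in range(len(r)):
        # k reaches l if k reaches m and m reaches l
        r = r | (r[:, [m]] & r[[m], :])
    return r.astype(int)


def permissible_states(adj):
    """All 0/1 attribute profiles consistent with the hierarchy, in lexicographic order."""
    r = reachability(adj)
    K = len(r)
    states = []
    for profile in product([0, 1], repeat=K):
        a = np.array(profile)
        # if attribute l is mastered, every prerequisite k of l must be mastered too
        if all(a[k] >= a[l] for k in range(K) for l in range(K) if r[k, l]):
            states.append(a)
    return np.array(states)


def ideal_responses(states, Q):
    """Conjunctive ideal responses: 1 iff the profile has every attribute the item requires."""
    return (states @ Q.T == Q.sum(axis=1)).astype(int)


if __name__ == "__main__":
    linear = np.zeros((4, 4), dtype=int)
    linear[0, 1] = linear[1, 2] = linear[2, 3] = 1  # hypothetical A1 -> A2 -> A3 -> A4
    states = permissible_states(linear)
    print(states)  # 5 permissible profiles
    print(len(permissible_states(np.zeros((4, 4), dtype=int))))  # independent: 16
    Q = np.array([[1, 0, 0, 0], [1, 1, 0, 0], [0, 0, 1, 0]])  # hypothetical 3-item Q-matrix
    print(ideal_responses(states, Q))
```

A linear hierarchy over four attributes leaves only five of the 16 possible profiles, while the independent structure keeps all 16, so the set of knowledge states a test must separate depends directly on the hierarchy.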

  • Change Point Analysis of Response Time Data for Detecting Speededness: An Exploration with Known and Unknown Item Parameters

    Subjects: Psychology >> Social Psychology; Submitted: 2023-03-27; Cooperative journal: Acta Psychologica Sinica (《心理学报》)

    Abstract: In recent years, response time has received rapidly growing attention in psychometric research, likely because (item-level) response time data have become increasingly available through computer-based testing and online survey data collection. Compared with conventional item response data, which are often dichotomous or polytomous, response time is continuous and can provide much more information. Aberrant response behaviors are frequently encountered during testing and can have various negative effects. Change point analysis (CPA) is a well-established statistical process control method for detecting changes in a sequence, and it has given testing professionals a new lens through which to understand test-taking behavior at both the examinee and item levels. In this paper, we took test speededness as an example to illustrate how CPA can be used to detect aberrant behavior from item response time data. Response times under speededness were simulated using the gradual-change log-normal model for response time. Two CPA-based test statistics, the likelihood ratio test and the Wald test, were used to detect aberrant response behaviors. Critical values were obtained through Monte Carlo simulations and compared with the approximate critical values from a previous study. Using the chosen critical values, we examined the power and empirical Type I error of the likelihood ratio and Wald tests in detecting speeded responses. On the one hand, the critical values for the Wald and likelihood ratio tests are almost identical; they vary substantially across nominal α levels but differ little across test lengths. On the other hand, the simulated critical values are close to, but not the same as, the approximate critical values, possibly because the approximate values are suited to situations where the change point appears in the middle of the test.
Results indicate that, with these critical values, the proposed method is much more powerful than conventional methods that use item response data. Power was close to 1 in most conditions while the Type I error rate remained well controlled. A real-data analysis also demonstrates the method's performance. By applying CPA to response time data, this study offers a promising approach to detecting aberrant response behavior. The simulation study shows that fixed critical values can be used across different test lengths, which makes the method straightforward to apply: the simulation need not be rerun to update critical values when the test changes slightly. CPA is also flexible: although this study assumed that the log-normal model fits the response time data, the method is not bound by that assumption.
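The core of a CPA test for speededness can be sketched in a few lines. The sketch assumes the standard log-normal response-time model, log T_j ~ N(β_j − τ, 1/α_j²), with known item parameters, and an abrupt shift in the examinee's speed τ after some item k; the gradual-change model used in the study's simulations is not implemented. For each candidate split it computes a likelihood ratio and a Wald statistic, takes the maximum over splits, and compares it with a critical value obtained as a Monte Carlo quantile under the no-change model. In this simplified known-variance setting the two statistics coincide algebraically. All function names and parameter values are hypothetical, and this is not the authors' exact implementation.

```python
import numpy as np


def _segment_speed(d, w):
    """Weighted MLE of the speed parameter tau from one segment of items."""
    return np.sum(w * d) / np.sum(w)


def cpa_stats(log_t, alpha, beta, trim=3):
    """Max likelihood-ratio and Wald statistics over candidate change points.

    Assumed model (a sketch): log T_j ~ N(beta_j - tau, 1 / alpha_j**2) with known
    item parameters; under the alternative, tau shifts after item k.
    Returns (max_lr, max_wald, k_hat), with k_hat the LR-maximizing change point.
    """
    d = beta - log_t  # item-level evidence about tau
    w = alpha**2  # precision weights
    n = len(d)
    rss0 = np.sum(w * (d - _segment_speed(d, w)) ** 2)  # fit with no change
    best_lr, best_wald, k_hat = -np.inf, -np.inf, None
    for k in range(trim, n - trim + 1):  # each segment keeps >= trim items
        d1, w1, d2, w2 = d[:k], w[:k], d[k:], w[k:]
        t1, t2 = _segment_speed(d1, w1), _segment_speed(d2, w2)
        rss1 = np.sum(w1 * (d1 - t1) ** 2) + np.sum(w2 * (d2 - t2) ** 2)
        lr = rss0 - rss1  # = 2 * (logL_change - logL_null)
        wald = (t1 - t2) ** 2 / (1 / w1.sum() + 1 / w2.sum())
        if lr > best_lr:
            best_lr, k_hat = lr, k
        best_wald = max(best_wald, wald)
    return float(best_lr), float(best_wald), k_hat


def mc_critical_value(alpha, beta, level=0.05, reps=1000, seed=0):
    """Monte Carlo critical value: the (1 - level) quantile of max LR with no change."""
    rng = np.random.default_rng(seed)
    null = []
    for _ in range(reps):
        tau = rng.normal()  # constant speed throughout the test
        log_t = beta - tau + rng.normal(size=len(beta)) / alpha
        null.append(cpa_stats(log_t, alpha, beta)[0])
    return float(np.quantile(null, 1 - level))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    alpha = np.full(40, 2.0)  # hypothetical 40-item test
    beta = np.linspace(3.5, 4.5, 40)
    tau = np.where(np.arange(40) < 30, 0.0, 2.0)  # speeds up from item 31 on
    log_t = beta - tau + rng.normal(size=40) / alpha
    lr, wald, k_hat = cpa_stats(log_t, alpha, beta)
    crit = mc_critical_value(alpha, beta, reps=500)
    print(lr, wald, k_hat, lr > crit)
```

With a shift this large, the maximum statistic should exceed the critical value and k_hat should land at or near item 30; smaller or gradual shifts are where the power comparison matters.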

  • Person Fit Based on a Residual Statistic in Polytomously Scored Tests

    Subjects: Psychology >> Social Psychology; Submitted: 2023-03-27; Cooperative journal: Acta Psychologica Sinica (《心理学报》)

    Abstract: Tests are widely used in educational measurement and psychometrics, and examinees' aberrant responses affect the estimation of their abilities. Such examinees should not be scored with conventional methods; what matters is to screen them accurately out of the normal group. A common way to achieve this is to construct person-fit statistics that detect whether response patterns fit the estimated abilities. In this study, we proposed a residual-based person-fit statistic, R, which can be applied to both dichotomous and polytomous IRT models. R is constructed from a weighted residual between the observed and expected responses: the weighted residuals are accumulated into a goodness-of-fit value, which is compared with a specific critical value to decide whether an examinee is aberrant. Because tests with polytomous items can provide more information, polytomously scored items are increasingly popular in educational measurement and psychometrics, and this article mainly considers the ability of R to detect aberrant response patterns under the graded response model. An existing polytomous person-fit statistic, lzp, was also included because of its standardized form and superior power. In Study 1, a simulation was conducted to generate the empirical distributions of R and lzp. Because R accumulates weighted residuals, its distribution is positively skewed; lzp is negatively skewed when the test has fewer than 80 items. Since both differ from the standard normal distribution, critical values must be set according to the nominal Type I error rate and then used to decide whether each respondent's response pattern fits.
In Study 2, examinees with different aberrant behaviors (cheaters, lucky guessers, random respondents, careless respondents, creative respondents, and mixed) were simulated under different test lengths, and the detection rate and the area under the curve (AUC) were used to compare the effectiveness of the two person-fit statistics. The results show that R has a higher detection rate than lzp when aberrant behavior affects only a few items or when the aberrant behavior is cheating or guessing. When aberrant behavior covers many items, lzp performs slightly better than R. An empirical study was also conducted to show the power of R. Since R and lzp each have pros and cons, future person-fit studies might combine them. Given that R achieves a higher detection rate than lzp under certain conditions, especially cheating and lucky guessing, and that cheating and guessing by low-ability examinees are among the more common aberrant test behaviors, R merits further research and exploration in real-world applications.
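The two ingredients compared above, graded response model probabilities and a person-fit statistic computed from them, can be sketched as follows, assuming the examinee's θ is known (in practice it is estimated). The `lzp` function follows the standard standardized log-likelihood construction for polytomous items. Because the abstract does not give the formula of R, `weighted_residual_stat` is only a hypothetical stand-in that accumulates squared residuals between observed and expected item scores, weighted by the inverse item score variance; it is not the authors' R. All names and parameter values are hypothetical.

```python
import numpy as np


def grm_probs(theta, a, b):
    """Graded response model category probabilities.

    a: (J,) discriminations; b: (J, m-1) increasing thresholds per item.
    Returns a (J, m) array with P[j, k] = P(X_j = k | theta), k = 0..m-1.
    """
    star = 1.0 / (1.0 + np.exp(-a[:, None] * (theta - b)))  # P(X_j >= k), k = 1..m-1
    J = len(a)
    star = np.hstack([np.ones((J, 1)), star, np.zeros((J, 1))])
    return star[:, :-1] - star[:, 1:]


def lzp(x, theta, a, b):
    """Standardized polytomous log-likelihood person-fit statistic (small = misfit)."""
    p = grm_probs(theta, a, b)
    logp = np.log(p)
    l0 = logp[np.arange(len(x)), x].sum()  # observed log-likelihood
    expected = (p * logp).sum()  # E(l0) under the model
    var = ((p * logp**2).sum(axis=1) - (p * logp).sum(axis=1) ** 2).sum()
    return float((l0 - expected) / np.sqrt(var))


def weighted_residual_stat(x, theta, a, b):
    """Hypothetical residual-based statistic (NOT the paper's R): sum of squared
    residuals between observed and expected item scores, weighted by 1 / Var(X_j)."""
    p = grm_probs(theta, a, b)
    k = np.arange(p.shape[1])
    e = p @ k  # expected item scores
    v = p @ k**2 - e**2  # item score variances
    return float(np.sum((x - e) ** 2 / v))


if __name__ == "__main__":
    a = np.full(20, 1.5)  # hypothetical 20-item, 4-category test
    b = np.tile([-1.0, 0.0, 1.0], (20, 1))
    fitting = np.full(20, 3)  # consistent with a high theta
    careless = np.zeros(20, dtype=int)  # lowest category on every item
    for x in (fitting, careless):
        print(lzp(x, 1.5, a, b), weighted_residual_stat(x, 1.5, a, b))
```

For a high-ability examinee (θ = 1.5 here), answering every item in the lowest category yields a strongly negative lzp and a large weighted residual sum, whereas the model-consistent pattern does not.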

  • Application of Change Point Analysis to Detect Speededness Based on Response Time Data with Known/Unknown Item Parameters

    Subjects: Psychology >> Psychological Measurement; Submitted: 2022-05-14

    Abstract:

    In recent years, response time has received rapidly growing attention in psychometric research, likely because (item-level) response time data have become increasingly available through computer-based testing and online survey data collection. Compared with conventional item response data, which are often dichotomous or polytomous, response time is continuous and can provide much more information. Aberrant response behaviors are frequently encountered during testing and can have various negative effects. Change point analysis (CPA) is a well-established statistical process control method for detecting changes in a sequence, and it has given testing professionals a new lens through which to understand test-taking behavior at both the examinee and item levels.

    In this paper, we took test speededness as an example to illustrate how CPA can be used to detect aberrant behavior from item response time data. Response times under speededness were simulated using the gradual-change log-normal model for response time. Two CPA-based test statistics, the likelihood ratio test and the Wald test, were used to detect aberrant response behaviors. Critical values were obtained through Monte Carlo simulations and compared with the approximate critical values from a previous study. Using the chosen critical values, we examined the power and empirical Type I error of the likelihood ratio and Wald tests in detecting speeded responses.

    On the one hand, the critical values for the Wald and likelihood ratio tests are almost identical; they vary substantially across nominal α levels but differ little across test lengths. On the other hand, the simulated critical values are close to, but not the same as, the approximate critical values, possibly because the approximate values are suited to situations where the change point appears in the middle of the test. Results indicate that, with these critical values, the proposed method is much more powerful than conventional methods that use item response data. Power was close to 1 in most conditions while the Type I error rate remained well controlled. A real-data analysis also demonstrates the method's performance.

    By applying CPA to response time data, this study offers a promising approach to detecting aberrant response behavior. The simulation study shows that fixed critical values can be used across different test lengths, which makes the method straightforward to apply: the simulation need not be rerun to update critical values when the test changes slightly. CPA is also flexible: although this study assumed that the log-normal model fits the response time data, the method is not bound by that assumption.

  • Detection of aberrant response patterns using a residual-based statistic in testing with polytomous items

    Subjects: Psychology >> Psychological Measurement; Psychology >> Statistics in Psychology; Submitted: 2022-04-06

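    Abstract: This paper proposes a person-fit statistic, R, for polytomously scored items, examines its performance in detecting six common aberrant response patterns (cheating, guessing, random, careless, creative, and mixed aberrant responding), and compares it with the standardized log-likelihood statistic lzp. The results show that: (1) when aberrant responses cover a small proportion of items and the aberrance type is cheating or guessing, the detection rate of R is significantly higher than that of lzp; (2) the detection rates of both statistics increase with test length and with the degree of examinee aberrance; (3) under some conditions, R and lzp perform similarly. An empirical data analysis further demonstrates the procedure for using R, and the results also indicate that R has good application prospects.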
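The evaluation criteria used in these person-fit abstracts (a critical value fixed by the nominal Type I error, the detection rate, and the AUC) can be computed from simulated statistic values as sketched below. The function names are hypothetical, and the `higher_is_aberrant` flag is an assumption added so the same code handles statistics such as lzp, where small values signal misfit, as well as statistics where large values do. The AUC uses the Mann-Whitney formulation, counting ties as one half.

```python
import numpy as np


def empirical_critical_value(null_stats, level=0.05, higher_is_aberrant=True):
    """Critical value that flags a `level` fraction of normal (fitting) examinees."""
    q = 1 - level if higher_is_aberrant else level
    return float(np.quantile(null_stats, q))


def detection_rate(aberrant_stats, crit, higher_is_aberrant=True):
    """Share of aberrant examinees flagged by the critical value."""
    s = np.asarray(aberrant_stats)
    flagged = s > crit if higher_is_aberrant else s < crit
    return float(np.mean(flagged))


def auc(normal_stats, aberrant_stats, higher_is_aberrant=True):
    """P(random aberrant examinee looks more aberrant than random normal one); ties = 1/2."""
    n = np.asarray(normal_stats, dtype=float)[None, :]
    a = np.asarray(aberrant_stats, dtype=float)[:, None]
    if not higher_is_aberrant:
        n, a = -n, -a  # flip so that larger always means more aberrant
    return float(np.mean((a > n) + 0.5 * (a == n)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    normal = rng.normal(0, 1, 1000)  # hypothetical lzp values, fitting examinees
    aberrant = rng.normal(-2, 1, 200)  # hypothetical lzp values, aberrant examinees
    crit = empirical_critical_value(normal, 0.05, higher_is_aberrant=False)
    print(crit, detection_rate(aberrant, crit, False), auc(normal, aberrant, False))
```

Fixing the critical value on simulated normal examinees, rather than on a normal-theory cutoff, matters here because, as reported above, neither statistic follows the standard normal distribution.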