
Item Analysis For Essay Questions

Test Item Analysis


Overview

Item analysis provides statistics on overall test performance and individual test questions. This data helps you recognize questions that might be poor discriminators of student performance. You can use this information to improve questions for future test administrations or to adjust credit on current attempts.

Roles with grading privileges (such as instructors, graders, and teaching assistants) can access item analysis in three locations within the assessment workflow. It is available in the contextual menu for a:

  • Test deployed in a content area: select Item Analysis from the test link's contextual menu in the content area.
  • Deployed test listed on the Tests page: select Item Analysis from the Control Panel > Course Tools > Tests, Surveys and Pools > Tests page.
  • Grade Center column: access Item Analysis from the test column's contextual menu in the Grade Center.

You can run item analyses on deployed tests with submitted attempts, but not on surveys. Access previously run item analyses under the Available Analysis heading or select a deployed test from the drop-down list and click Run to generate a new report. The new report’s link appears under the Available Analysis heading or in the status receipt at the top of the page.

For best results, run item analyses on single-attempt tests after all attempts have been submitted and all manually graded questions are scored. Interpret the item analysis data carefully and with the awareness that the statistics are influenced by the number of test attempts, the type of students taking the test, and chance errors.


How to Run an Item Analysis on a Test

You can run item analyses on tests that include single or multiple attempts, question sets, random blocks, auto-graded question types, and questions that need manual grading. For tests with manually graded questions that have not yet been assigned scores, statistics are generated only for the scored questions. After you manually grade questions, run the item analysis again. Statistics for the manually graded questions are generated and the test summary statistics are updated.

  1. Go to one of the following locations to access item analysis:
    • A test deployed in a content area.
    • A deployed test listed on the Tests page.
    • A Grade Center column for a test.
  2. Access the test’s contextual menu.
  3. Select Item Analysis.
  4. In the Select Test drop-down list, select a test. Only deployed tests are listed.
  5. Click Run.
  6. View the item analysis by clicking the new report’s link under the Available Analysis heading or by clicking View Analysis in the status receipt at the top of the page.


About the Test Summary on the Item Analysis Page

The Test Summary is located at the top of the Item Analysis page and provides data on the test as a whole.

Item Analysis Test Summary Report Section


About the Question Statistics Table on the Item Analysis Page

The question statistics table provides item analysis statistics for each question in the test. Questions that are recommended for your review are indicated with red circles so that you can quickly scan for questions that might need revision.

Item Analysis Question Statistics Report Section

In general, good questions have:

  • Medium (30% to 80%) difficulty.
  • Good or Fair (greater than 0.1) discrimination values.

Questions that are recommended for review are indicated with red circles. They may be of low quality or scored incorrectly. In general, questions recommended for review have:

  • Easy (greater than 80%) or Hard (less than 30%) difficulty.
  • Poor (less than 0.1) discrimination values.
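
The thresholds above can be expressed as a simple rule. The following Python sketch is illustrative only (it is not part of the product, and the function name and inputs are assumptions); it flags a question exactly as described in the two lists above.

```python
# Illustrative sketch of the review rule described above (not Blackboard code).
# difficulty_pct: percentage of students who answered correctly (0-100)
# discrimination: item discrimination value (-1.0 to +1.0)
def review_recommended(difficulty_pct, discrimination):
    too_easy = difficulty_pct > 80               # Easy category
    too_hard = difficulty_pct < 30               # Hard category
    poor_discrimination = discrimination < 0.1   # includes negative values
    return too_easy or too_hard or poor_discrimination

print(review_recommended(85.0, 0.25))  # True  (too easy)
print(review_recommended(55.0, 0.05))  # True  (poor discriminator)
print(review_recommended(55.0, 0.35))  # False (medium difficulty, good discrimination)
```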

With the Question Statistics table, you may filter questions by question type and difficulty category, and then investigate specific questions by clicking on their titles and reviewing their Question Details pages. Statistics displayed for each question include:

  • Discrimination: Indicates how well a question differentiates between students who know the subject matter and those who do not. A question is a good discriminator when students who answer the question correctly also do well on the test. Values can range from -1.0 to +1.0. Questions are flagged for review if their discrimination value is less than 0.1 or is negative. Discrimination values cannot be calculated when the question’s difficulty score is 100% or when all students receive the same score on a question.
  • Difficulty: The percentage of students who answered the question correctly. Difficulty values can range from 0% to 100%, with a high percentage indicating that the question was easy. Questions in the Easy (greater than 80%) or Hard (less than 30%) categories are flagged for review. Difficulty levels that are slightly higher than midway between chance and perfect scores do a better job differentiating students who know the tested material from those who do not. It is important to note that high difficulty values do not assure high levels of discrimination.
  • Graded Attempts: Number of question attempts where grading is complete. Higher numbers of graded attempts produce more reliable calculated statistics.
  • Average Score: Scores denoted with an * indicate that some attempts are not graded and that the average score might change after all attempts are graded. The score displayed here is the average score reported for the test in the Grade Center.
  • Standard Deviation: Measure of how far the scores deviate from the average. If the scores are tightly grouped, with most of the values being close to the average, the standard deviation is small. If the data set is widely dispersed, with values far from the average, the standard deviation is larger.
  • Standard Error: An estimate of the amount of variability in a student’s score due to chance. The smaller the standard error of measurement, the more accurate the measurement provided by the test question.
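
This page does not give the formulas behind these statistics, so the sketch below shows one common way such values could be computed from a per-student, per-question score matrix. The data layout, the use of a correlation between item and total scores for discrimination, and the fixed reliability estimate used for the standard error are all assumptions made for illustration, not Blackboard's actual implementation.

```python
import statistics

# scores[s][q] = points student s earned on question q (each question worth 1 point here)
def question_stats(scores, q):
    item = [row[q] for row in scores]        # per-student score on question q
    totals = [sum(row) for row in scores]    # per-student total test score
    n = len(item)

    difficulty = 100.0 * sum(item) / n       # percent of available credit earned on this question
    average = statistics.mean(item)
    std_dev = statistics.pstdev(item)        # spread of scores around the average

    # Discrimination: one common choice is the correlation between the item score
    # and the total test score; it is undefined when every student scores the same.
    discrimination = None
    if std_dev > 0 and statistics.pstdev(totals) > 0:
        discrimination = statistics.correlation(item, totals)  # Python 3.10+

    # Standard error of measurement, assuming a test reliability estimate is
    # available from elsewhere (hypothetical value used here).
    reliability = 0.8
    std_error = std_dev * (1 - reliability) ** 0.5

    return {"Difficulty %": difficulty, "Average Score": average,
            "Std Dev": std_dev, "Std Error": std_error,
            "Discrimination": discrimination, "Graded Attempts": n}
```

For example, `question_stats([[1, 0], [1, 1], [0, 0]], 0)` reports a difficulty of about 66.7% for the first question, because two of the three students earned its point.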


How to View Question Details on a Single Question

You can investigate a question that is flagged for review by accessing its Question Details page.

Item Analysis Question Details Example

This page displays student performance on the individual test question you selected.

  1. On the Item Analysis page, scroll down to the question statistics table.
  2. Select a linked question title to display the Question Details page.
  3. Use the arrows to page through questions sequentially or to skip to the first or last question.
  4. Click Edit Test to access the Test Canvas.
  5. The summary table displays statistics for the question, including:
    • Discrimination: Indicates how well a question differentiates between students who know the subject matter and those who do not. The discrimination score is listed along with its category: Poor (less than 0.1), Fair (0.1 to 0.3), and Good (greater than 0.3). A question is a good discriminator when students who answer the question correctly also do well on the test. Values can range from -1.0 to +1.0. Questions are flagged for review if their discrimination value is less than 0.1 or is negative. Discrimination values cannot be calculated when the question’s difficulty score is 100% or when all students receive the same score on a question.
    • Difficulty: The percentage of students who answered the question correctly. The difficulty percentage is listed along with its category: Easy (greater than 80%), Medium (30% to 80%), and Hard (less than 30%). Difficulty values can range from 0% to 100%, with a high percentage indicating that the question was easy. Questions in the easy or hard categories are flagged for review. Difficulty levels that are slightly higher than midway between chance and perfect scores do a better job differentiating students who know the tested material from those who do not. It is important to note that high difficulty values do not assure high levels of discrimination.
    • Graded Attempts: Number of question attempts where grading is complete. Higher numbers of graded attempts produce more reliable calculated statistics.
    • Average Score: Scores denoted with an * indicate that some attempts are not graded and that the average score might change after all attempts are graded. The score displayed here is the average score reported for the test in the Grade Center.
    • Std Dev: Measure of how far the scores deviate from the average score. If the scores are tightly grouped, with most of the values being close to the average, the standard deviation is small. If the data set is widely dispersed, with values far from the average, the standard deviation is larger.
    • Std Error: An estimate of the amount of variability in a student’s score due to chance. The smaller the standard error of measurement, the more accurate the measurement provided by the test question.
    • Skipped: Number of students who skipped this question.
  6. The question text and answer choices are displayed. The information varies depending on the question type.
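
As a rough illustration of the category labels mentioned above, the following sketch maps raw values to the named bands. How Blackboard treats the exact boundary values (0.1, 0.3, 30%, 80%) is not stated on this page, so the comparisons below are assumptions.

```python
# Illustrative mapping of statistics to category labels (boundary handling assumed).
def discrimination_category(value):
    if value < 0.1:
        return "Poor"
    return "Fair" if value <= 0.3 else "Good"

def difficulty_category(pct):
    if pct > 80:
        return "Easy"
    return "Hard" if pct < 30 else "Medium"

print(discrimination_category(0.25), difficulty_category(65))  # Fair Medium
```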


Answer Distributions

The distribution of answers among the class quartiles is included for Multiple Choice, Multiple Answer, True/False, Either/Or, and Opinion Scale/Likert question types. The distribution shows which performance quartiles of the class selected the correct or incorrect answers.

  • Top 25%: Number of students with total test scores in the top quarter of the class who selected the answer option.
  • 2nd 25%: Number of students with total test scores in the second quarter of the class who selected the answer option.
  • 3rd 25%: Number of students with total test scores in the third quarter of the class who selected the answer option.
  • Bottom 25%: Number of students with total test scores in the bottom quarter of the class who selected the answer option.
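
A minimal sketch of how such a distribution could be tallied is shown below: students are ranked by total test score, split into quartiles, and the answer choices within each quartile are counted. The input shapes (`totals`, `answers`) are assumptions for illustration, not the product's data model.

```python
from collections import Counter

def answer_distribution(totals, answers):
    # totals:  {student_id: total test score}
    # answers: {student_id: option selected for this question, e.g. "A".."D"}
    ranked = sorted(totals, key=totals.get, reverse=True)
    quarter = max(1, len(ranked) // 4)
    groups = {
        "Top 25%":    ranked[:quarter],
        "2nd 25%":    ranked[quarter:2 * quarter],
        "3rd 25%":    ranked[2 * quarter:3 * quarter],
        "Bottom 25%": ranked[3 * quarter:],
    }
    return {label: Counter(answers[s] for s in group) for label, group in groups.items()}
```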


Symbol Legend

Item Analysis Symbol Legend

Symbols appear next to the questions to alert you to possible issues:

  • Review recommended: This condition is triggered when discrimination values are less than 0.1 or when difficulty values are either greater than 80% (question was too easy) or less than 30% (question was too hard). Review the question to determine if it needs to be revised.
  • Question may have changed after deployment: Indicates that a part of the question changed since the test was deployed. Changing any part of a question after the test has been deployed could mean that the data for that question might not be reliable. Attempts submitted after the question was changed may have benefited from the change. This indicator helps you interpret the data with this in mind. This indicator is not displayed for restored courses.
  • Not all attempts have been graded: Appears for a test containing questions that require manual grading, such as essay questions. In a test containing an essay question with 50 student attempts, this indicator shows until the instructor grades all 50 attempts. The item analysis tool uses only attempts that had been graded when the report was run.
  • (QS) and (RB): Indicate that a question came from a question set or random block. Due to random question delivery, it is possible that some questions get more attempts than others.


About Item Analysis and Multiple Attempts, Question Overrides, and Question Edits

The item analysis tool handles multiple attempts, overrides, and other common scenarios in the following ways:

  • When students are allowed to take a test multiple times, the last submitted attempt is used as the input for item analysis. For example, a test allows three attempts and Student A has completed two attempts with a third attempt in progress. Student A’s current attempt counts toward the number listed under In Progress Attempts and none of Student A’s previous attempts are included in the current item analysis data. As soon as Student A submits the third attempt, subsequent item analyses will include Student A’s third attempt.
  • Grade Center overrides do not impact the item analysis data because the item analysis tool generates statistical data for questions based on completed student attempts.
  • Manual grading of questions, and changes made to the question text, correct answer choices, partial credit, or point values, are not reflected automatically in the item analysis report. Run the analysis again to see whether the changes affected the item analysis data.
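
The first rule above, that only the last submitted attempt per student is analyzed while in-progress attempts are counted separately, can be sketched as follows. The attempt record fields used here are assumptions for illustration, not Blackboard's data model.

```python
# Illustrative selection of attempts for item analysis (record fields are assumed).
def attempts_for_analysis(attempts):
    latest, in_progress = {}, 0
    for a in attempts:
        if a["in_progress"]:
            in_progress += 1            # counted under In Progress Attempts, not analyzed
            continue
        prev = latest.get(a["student"])
        if prev is None or a["submitted_at"] > prev["submitted_at"]:
            latest[a["student"]] = a    # keep only the most recently submitted attempt
    return list(latest.values()), in_progress
```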


Examples

Item analysis can help you improve questions for future test administrations or fix misleading or ambiguous questions in a current test. Some examples are:

  • You investigate a multiple choice question that was flagged for your review on the item analysis page. More Top 25% students chose answer B, even though A was keyed as the correct answer. You realize that the correct answer was incorrectly keyed during question creation. You edit the test question, and it is automatically regraded.
  • In a multiple choice question, you find that nearly equal numbers of students chose A, B, and C. You examine the answer choices to determine whether they were too ambiguous, whether the question was too difficult, or whether the material was not adequately covered.
  • A question is recommended for review because it falls into the hard difficulty category. You examine the question and determine that it is a hard question, but you keep it in the test because it is necessary to adequately test your course objectives.


Abstract
Background: Essay questions are among the most common tools used for the assessment of knowledge. Their evaluation depends on test and item analysis, which consists of analyzing individual questions and the test as a whole. The objective of our study was to calculate item analysis statistics (facility value, FV, and discrimination index, DI) for the questions given in the terminal examination of MBBS students and to assess how adequately the questions were framed.
Methods: The study included 150 medical students who took a terminal examination consisting of essay-type, structured essay-type, and short-answer questions. Students were divided into a high-ability group and a low-ability group. Mark ranges were defined for the essay-type, structured essay-type, and short-answer questions, and FV and DI were calculated.
Results: The facility values of 62.5% of the questions fell within the recommended and acceptable ranges, and 50% fell within the acceptable range. The discrimination indices of 100% of the questions fell within the recommended and acceptable ranges, and 75% fell within the acceptable range.
Conclusion: Our results highlight the importance of item analysis. To improve examinations, items with average difficulty and high discrimination should be incorporated into future examinations to improve test scores and properly discriminate among students.

Key words: Item analysis of essay type questions, Facility value, Discrimination index
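
The abstract does not state the exact formulas used, but facility value and discrimination index for essay-type items are often computed from the mean marks of the high-ability and low-ability groups, as in the sketch below. Treat the formulation and the sample marks as illustrative assumptions rather than the study's method.

```python
# Illustrative FV and DI for an essay item, based on group mean marks (formulas assumed).
def facility_value(high_marks, low_marks, max_mark):
    h = sum(high_marks) / len(high_marks)
    l = sum(low_marks) / len(low_marks)
    return 100.0 * (h + l) / (2 * max_mark)   # percentage of the available marks earned

def discrimination_index(high_marks, low_marks, max_mark):
    h = sum(high_marks) / len(high_marks)
    l = sum(low_marks) / len(low_marks)
    return (h - l) / max_mark                 # ranges from -1.0 to +1.0

high = [8, 9, 7, 8]   # hypothetical marks out of 10 for the high-ability group
low = [5, 4, 6, 5]    # hypothetical marks for the low-ability group
print(facility_value(high, low, 10), discrimination_index(high, low, 10))  # 65.0 0.3
```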