Verbal Ability Test Technical Report

Introduction:
The purpose of this test is twofold. First, it is intended to provide a free, rapid, reliable and accurate measure of verbal intelligence. The test was constructed using items from the Scholastic Aptitude Test (SAT) and Graduate Record Exam (GRE) from the mid-1980s. The SAT and GRE before 1994 are often considered to be thinly veiled IQ tests. The goal was to select items with a track record of reliably measuring the general factor of intelligence (g) and IQ. Antonyms (a form of vocabulary item) and verbal analogies are both highly g-loaded item types with a long history of reliability. As such, extracting only antonyms and analogies from these tests should provide a fast and reliable test of verbal intelligence.

Second, collecting data on this battery should put to the test the idea that the old SAT and GRE were IQ tests. If this is true, we should see a significant correlation with professional IQ tests.

Test construction:
Antonyms and analogies were extracted from two SATs and two GREs from the 1980s. Initially, antonym and analogy items were included in proportion to their representation on the SAT. Items were rank ordered by difficulty. The ceiling of the test was then extended from 45 items to 49 by adding two difficult GRE items each to the antonyms and the analogies. These items represent the hardest items in their randomly selected test and the hardest item across 10 years of GREs.

Preliminary normalization:
Raw data from the 1984 SAT were used for an initial analysis.

[Figure: distribution of 1984 SAT scores. Horizontal axis: SAT Score, 200 to 1600 in steps of 50. Vertical axis: 0 to 20,000 (unlabeled in the source).]

The raw data, representing 964,739 high school seniors in the USA, very nearly approximate a perfect Gaussian distribution. The only skew is evident in the bottom 2% of the distribution.
It is likely that people with intellectual disabilities would not take the SAT. Correcting for this, it was estimated that the population average IQ for high school seniors taking the SAT was 101, equating to an SAT score of 890. The SAT ceiling corresponds to an IQ in the approximate range of 160-166.

Comparing the difficulty of the then 45-item antonym/analogy battery with the average difficulty of the full 800-point SAT Verbal allowed for a simple estimate of the conversion from raw score to SAT Verbal score out of 800. The relative distribution of Math scores compared with Verbal scores, and the fact that Math scores were lower, was taken into account. Aggregate SAT Verbal scores were then converted to an estimated total score out of 1600.

After this conversion from raw score to IQ was complete, the ceiling was extended using the more difficult items. IQ estimates for these items were rough guesses based on the existing curve and the relative difficulty of the additional items. Given the extreme difficulty of the items, it was assumed that scores above 45/49 would be extremely rare. A score of 46/49 was selected as the preliminary ceiling equivalent of the SAT Verbal and 47/49 as the SAT ceiling. Using the mean score and the distribution, a preliminary norm was formed.

Data collection:
As of January 16, 2021, 163 attempts at the test had been recorded. Attempts were removed for any of the following reasons:
1. Blatant trolling, or the test taker otherwise advising me not to use the data.
2. Incomplete test.
3. More than one attempt (only the first attempt was counted).
The final count of acceptable attempts was N = 108.

Data spread:
The highest score achieved was 46/49.

Regression:
Forty professional scores were reported and judged acceptable, with an average of 137. WAIS VIQ was the most common, followed by Reynolds. Several people reported Stanford-Binet or PPVT scores. For this analysis I allowed Dr. Jouve's tests if both the IAW and the CCAT were taken, in which case the average of the two was used. In a few cases, WAIS VIQ was reported as maxed (150).
In these cases, I substituted FSIQ if it was higher. The correlation at N = 40 was r = 0.774. No correction or further analysis was done at this time.

Reliability:
N = 108. All measures of reliability were deemed high.

Reliability Statistics
  Cronbach's Alpha                                  .901
  Cronbach's Alpha Based on Standardized Items      .902
  N of Items                                        49

Reliability Statistics (split-half)
  Cronbach's Alpha             Part 1   Value         .856
                                        N of Items    25
                               Part 2   Value         .791
                                        N of Items    24
                               Total N of Items       49
  Correlation Between Forms                           .752
  Spearman-Brown Coefficient   Equal Length           .859
                               Unequal Length         .859
  Guttman Split-Half Coefficient                      .851

Factor Analysis:
N = 108. The extraction communalities (h^2), which this report treats as factor loadings (sometimes called g-loadings), were all above 0.5, with more than half of the items (28 of 49) above 0.7.

Communalities (Extraction)
  Item  h^2     Item  h^2     Item  h^2     Item  h^2     Item  h^2
  Q1    .718    Q11   .707    Q21   .720    Q31   .792    Q41   .637
  Q2    .780    Q12   .681    Q22   .710    Q32   .799    Q42   .594
  Q3    .621    Q13   .728    Q23   .668    Q33   .680    Q43   .665
  Q4    .765    Q14   .746    Q24   .656    Q34   .723    Q44   .726
  Q5    .694    Q15   .699    Q25   .666    Q35   .718    Q45   .716
  Q6    .741    Q16   .841    Q26   .735    Q36   .704    Q46   .748
  Q7    .687    Q17   .587    Q27   .622    Q37   .826    Q47   .647
  Q8    .690    Q18   .773    Q28   .750    Q38   .703    Q48   .730
  Q9    .571    Q19   .627    Q29   .637    Q39   .716    Q49   .714
  Q10   .833    Q20   .751    Q30   .675    Q40   .647

Normalization:
Version 1 of the normalization follows. There was no statistically significant difference between the mean of the preliminary norm and the mean of the reported scores, and the average difference between the reported scores and the preliminary norm was 0. The difference in the data spread was minor and deviated only towards the ceiling, which was corrected.

  RAW   IQ          RAW   IQ     RAW   IQ
  49    Over 170    36    131    23    103
  48    167         35    128    22    101
  47    162         34    125    21    99
  46    158         33    123    20    97
  45    155         32    121    19    94
  44    152         31    118    18    91
  43    150         30    115    17    88
  42    148         29    113    16    85
  41    145         28    111    15    82
  40    143         27    110    14    79
  39    140         26    108    13    76
  38    137         25    106    12    73
  37    134         24    104    11    70

Most difficult items:
3rd place: MARTIAL : MILITARY ::
  14% answered correctly. Factor loading was 0.714. Average raw score of correct responders was 42.
2nd place: QUAFF : SIP ::
  8% answered correctly. Factor loading was 0.730. Average raw score of correct responders was 40.
1st place: FLAG
  4% answered correctly. Factor loading was 0.735. Average raw score of correct responders was 41.

Conclusions:
The Verbal Ability Test is a reliable battery with very good items, and its scores correlate strongly with VIQ as measured by reputable tests. It is therefore considered acceptable for personal use as a rapid measure of general intelligence and, in particular, verbal ability. Additionally, the strong correlation with professional tests (r = 0.774, N = 40) supports the idea that the old SAT and GRE were IQ tests. Future improvements to the test could include adding difficult items to smooth the difficulty curve near the ceiling and improve the ability to measure very high intelligence.
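
Worked check of the split-half reliability:
The Spearman-Brown coefficient reported in the Reliability section can be recomputed by hand from the reported correlation between forms. The sketch below applies the standard Spearman-Brown prophecy formula to that single rounded figure. The function and variable names are illustrative only and are not part of the original analysis.

```python
def spearman_brown(r_half: float, length_factor: float = 2.0) -> float:
    """Spearman-Brown prophecy formula.

    Predicts the reliability of a test lengthened by `length_factor`,
    given the correlation `r_half` between the shorter parts.
    With length_factor = 2 this is the usual equal-length split-half correction.
    """
    return length_factor * r_half / (1.0 + (length_factor - 1.0) * r_half)


# "Correlation Between Forms" as reported in the Reliability section
# (already rounded to three decimals in the report).
r_between_halves = 0.752

full_test_reliability = spearman_brown(r_between_halves)
print(round(full_test_reliability, 3))  # prints 0.858
```

The result, about 0.858, agrees with the reported equal-length coefficient of .859 to within the rounding of the input correlation. The two halves actually contain 25 and 24 items, so this equal-length form is only an approximation; the report lists the same value (.859) for the unequal-length coefficient.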