Validity refers to the accuracy of a measurement, or how well it measures what it’s intended to measure. Reliability refers to the repeatability of a measure. It’s important for coaches to understand these terms when testing and monitoring athletes: for the data to provide meaningful information, they must truly reflect what the coach is trying to assess. Two major factors determine the validity and reliability of a metric: 1) the test and/or technology must be engineered and designed so that it can accurately and repeatedly measure what it is supposed to, and 2) the user must conduct the test the way it was intended to be done.
For instance, a BodPod can be a valid predictor of body composition, and it can track changes over time, making it a reliable metric. However, measurement error by the coach or technician can make the results invalid and obscure true changes in body composition. A more practical example pertains to hand-timed speed tests. The stopwatch itself is an accurate tool, but the error introduced by the coach’s reaction time is substantial. This is evident in the discrepancy between electronically timed and hand-timed 40-yard dash times in high school and collegiate football players.
Assuming the measurement procedures are followed correctly, it is generally desirable to complete testing as efficiently as possible so that time can be spent on physical and technical development. For performance testing and monitoring, athletes are typically given two, three, or more attempts at a given test, after which the results are either averaged or the best score is recorded. Which method is better: the best performance or the average value?
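To make the two scoring options concrete, here is a minimal Python sketch. The function name summarize_trials and the example values are hypothetical, not taken from any study. Note that "best" depends on the test: it is the highest value for a jump, but the lowest time for a sprint.

```python
from statistics import mean


def summarize_trials(trials, lower_is_better=False):
    """Return (best, average) for one athlete's repeated attempts.

    For sprint times a lower value is better, so "best" is the minimum;
    for jump heights a higher value is better, so "best" is the maximum.
    """
    if not trials:
        raise ValueError("need at least one trial")
    best = min(trials) if lower_is_better else max(trials)
    return best, mean(trials)


# Hypothetical example values (not from the study discussed here)
cmj_cm = [38.2, 39.5, 38.9]          # counter-movement jump heights, cm
sprint_10m_s = [1.86, 1.82, 1.84]    # 10 m split times, s

print(summarize_trials(cmj_cm))
print(summarize_trials(sprint_10m_s, lower_is_better=True))
```

Either value can then be carried forward into the athlete's record; the study below asks whether that choice changes the conclusions drawn from testing.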
In a study by Al Haddad and colleagues, recently published ahead of print in the International Journal of Sports Physiology and Performance, results of performance testing in over 100 high-level young soccer players (13-17 years old) were reported as either the best or the average of repeated trials. The performance tests were administered before and after four months of training and included counter-movement jumps and 40 m sprints with 10 m splits. Counter-movement jump performance increased by 6.1% and 7% for best and average values, respectively. Changes in 40 m and 10 m sprint times were the same for both best and average measures. The typical error for counter-movement jumps was 4.8% for best values and 4.3% for average values. For the 40 m and 10 m sprints, the typical errors were virtually the same for both methods. The authors conclude that best and average values yield similar outcomes. From a practical standpoint, it would be interesting to see whether repeated trials are necessary at all, or whether a single trial is sufficient. This question is particularly relevant for routine vertical jump testing used to monitor fatigue.
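Typical error describes the within-athlete noise of a test. One common estimate from two testing sessions is the standard deviation of the paired differences divided by √2, often expressed as a percentage of the mean. The Python sketch below assumes that approach and uses made-up numbers; the function names are hypothetical, and Al Haddad and colleagues may have computed their values differently (for example, on log-transformed data).

```python
from math import sqrt
from statistics import mean, stdev


def typical_error(test1, test2):
    """Typical error from two repeated measures on the same athletes.

    Assumed method: TE = SD of the paired differences / sqrt(2).
    Also returns TE as a percentage of the grand mean of both tests.
    """
    if len(test1) != len(test2) or len(test1) < 2:
        raise ValueError("need two equal-length lists with at least 2 athletes")
    diffs = [b - a for a, b in zip(test1, test2)]
    te = stdev(diffs) / sqrt(2)
    grand_mean = mean(list(test1) + list(test2))
    return te, 100 * te / grand_mean


def percent_change(pre, post):
    """Percentage change from pre to post (e.g. before vs after training)."""
    return 100 * (post - pre) / pre


# Hypothetical jump heights (cm) for three athletes on two occasions
session1 = [38.0, 40.0, 42.0]
session2 = [39.0, 40.0, 43.0]

te_cm, te_pct = typical_error(session1, session2)
print(f"typical error: {te_cm:.2f} cm ({te_pct:.1f}%)")
print(f"change: {percent_change(40.0, 42.8):.1f}%")
```

A cautious way to use this is to treat an individual change smaller than the typical error as possibly reflecting measurement noise rather than a real change, which matters when single jump tests are used for fatigue monitoring.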
Reference:
Al Haddad, H., Simpson, B. M., & Buchheit, M. (2015). Monitoring Changes in Jump and Sprint Performance: Best or Average Values? International Journal of Sports Physiology and Performance. Ahead of print.