The use of play as an indicator of animal welfare shows promise, and the arena test is one method used to assess how different procedures or housing conditions affect calves’ motivation to perform locomotor play. However, it is unclear whether this test reflects play in the home pen. In addition, the specific conditions of the test, including the timing of tests and the design of the arena, vary between studies, and the consequences of these methodological differences are largely unknown. In a series of experiments, we explored the relationship between play in the home pen and play in the arena (Experiment 1), and how testing on consecutive versus alternate days (Experiment 2) and the size and shape of the arena (Experiment 3) affected the performance of play behavior. In all experiments, running duration was continuously recorded. Additionally, the numbers of bucks, jumps, and kicks were recorded in the second and third experiments. On average, play in the arena reflected play in the home environment (r ≥ 0.56, P ≤ 0.002). However, neither play in the home pen nor play in the arena was consistent from day to day. In both the first and second experiments, the frequency and duration of running were, on average, three times greater during the initial exposure to the arena than in subsequent tests, which may reflect calves’ reaction to novelty or an increase in motivation to play due to restriction in the home pen. Conducting arena tests on alternate rather than consecutive days had no effect on play. In contrast, calves spent more time running in larger pens (60 m² vs. 30 m²; ranked means: 41 ± 3.0 vs. 32 ± 3.0 rank of s/15 min, P = 0.034; true means: 19 vs. 8 s/15 min) and in longer pens (twice the length; ranked means: 40.6 ± 3.0 vs. 32.4 ± 3.0 rank of s/15 min, P = 0.054; true means: 18 vs. 9 s/15 min). This is likely due to the greater distance available for uninterrupted running in the larger and longer spaces. Calves tended to jump more often in the small pens (ranked means: 40.6 ± 3.0 vs. 32.4 ± 3.0 rank of times/15 min, P = 0.066; true means: 3 vs. 2 times/15 min), possibly because less space is required to perform this behavior. We conclude that although arena tests offer a promising tool for measuring play, caution is needed when interpreting their results, because more play is seen during the first exposure to the arena than in subsequent exposures, and in larger and longer arenas.