The aim of the present study was to compare horses' heart rate (HR), heart rate variability (RMSSD, pNN50) and behaviour in the same temperament test when ridden, led, and released to run free. Behavioural measurements included scores and linear measurements for reactivity (R), activity (A), time to calm down (T) and emotionality (E), recorded during the approach (1) and/or during confrontation with the stimulus (2). Sixty-five horses were each confronted 3 times (once ridden, once led, once free running, in balanced order) with 3 novel and/or sudden stimuli. Mixed model analysis indicated that leading resulted in the weakest reactions (P < 0.05 throughout) as measured by A1, A2, E1, E2, R2, and pNN50, while riding produced the strongest (A1, T2, HR, RMSSD, pNN50) or medium (E1, E2, R2) reactions. Free running resulted in either the strongest (A2, E1, E2, R2) or the weakest (A1, T2, HR, RMSSD, pNN50) reactions. The repeatability across tests was higher for HR (0.57), but not for RMSSD (0.23) or pNN50 (0.25), than for any behavioural measurement; the latter ranged from values below 0.10 (A1, A2, T2) to values between 0.30 and 0.45 (E1, E2, R2). Overall, the results show that a rider or handler influences, but does not completely mask, the horses' intrinsic behaviour in a temperament test, and this influence appeared to be stronger on behavioural variables and heart rate variability than on the horses' heart rates. Taking both practical considerations and repeatabilities into account, reactivity appears to be the most valuable parameter. Emotionality and heart rate can also yield valid results reflecting additional dimensions of temperament, although their practical relevance may be less obvious. If a combination of observed variables is chosen with care, a valid assessment of a horse's temperament may be possible in all types of tests. In practice, however, the test that most closely resembles the conditions of practical use should be chosen, i.e. riding horses should be tested under a rider.
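For readers unfamiliar with the two heart rate variability measures named above, the following is a minimal sketch of their standard definitions: RMSSD is the root mean square of successive differences between adjacent beat-to-beat (RR) intervals, and pNN50 is the percentage of successive differences exceeding 50 ms in absolute value. The function names and the example RR series are hypothetical and chosen for illustration only; the abstract does not describe how the study itself computed these values, so this should not be read as the authors' procedure.

```python
import math


def successive_differences(rr_ms):
    """Differences between adjacent RR intervals (milliseconds)."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    if not diffs:
        raise ValueError("need at least two RR intervals")
    return diffs


def rmssd(rr_ms):
    """Root mean square of successive RR-interval differences (ms)."""
    diffs = successive_differences(rr_ms)
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def pnn50(rr_ms):
    """Percentage of successive differences larger than 50 ms in absolute value."""
    diffs = successive_differences(rr_ms)
    return 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs)


# Hypothetical RR intervals in milliseconds (illustrative, not study data).
rr_example = [1500, 1560, 1540, 1600, 1480]

print(round(rmssd(rr_example), 2))  # successive differences: 60, -20, 60, -120
print(pnn50(rr_example))            # 3 of 4 differences exceed 50 ms
```

Both measures are computed on the same series of successive differences, which is why they tend to move together and respond to the same short-term changes in beat-to-beat variability.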