The Welfare Quality® (WQ) protocol for on-farm assessment of dairy cattle welfare describes 33 measures and a step-wise method for integrating their outcomes into 12 criterion scores. These are grouped into four principle scores and, finally, into an overall welfare categorisation with four possible levels. The relative contribution of the various welfare measures to the integrated scores has been contested. Using a European dataset (491 herds), we investigated: i) the variation in sensitivity of integrated outcomes to extremely low and high values of measures, criteria and principles, by replacing each actual value with the minimum and maximum observed and theoretically possible values; and ii) the reasons for this variation in sensitivity. As intended by the WQ consortium, the sensitivity of integrated scores depends on: i) the observed value of the specific measure or criterion; ii) whether the change is positive or negative; and iii) the relative weight attributed to the measure. Additionally, two unintended factors of considerable influence appear to be side-effects of the complexity of the integration method, namely: i) the number of measures integrated into criterion and principle scores; and ii) the method used to aggregate the measures. As a result, resource-based measures related to drinkers, whose validity for assessing absence of prolonged thirst has been criticised, have a much larger influence on integrated scores than health-related measures such as 'mortality rate' and 'lameness score'. Hence, the integration method of the WQ protocol for dairy cattle should be revised so that the relative contribution of the various welfare measures to the integrated scores more accurately reflects their relevance for dairy cattle welfare.
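The extreme-value substitution described above can be sketched as a one-at-a-time sensitivity analysis. This is a minimal illustration under stated assumptions, not the WQ integration method: scores are assumed to lie on a 0 to 100 scale, and the two aggregators used here (an unweighted mean and a minimum) are hypothetical stand-ins chosen only to show the two unintended factors. The function name `one_at_a_time` is likewise illustrative. Under a mean, a measure that is the sole input to a criterion moves that criterion much more than one of four inputs does, and switching the aggregator alone changes both the size and the direction-asymmetry of the effect.

```python
from statistics import mean
from typing import Callable, Sequence

# Hypothetical aggregator type: maps a list of measure scores (assumed 0-100)
# to one criterion score. These are NOT the actual WQ integration formulas.
Aggregator = Callable[[Sequence[float]], float]


def one_at_a_time(measures: Sequence[float], aggregate: Aggregator,
                  low: float = 0.0, high: float = 100.0) -> list[tuple[float, float]]:
    """For each measure, replace its value with `low`, then with `high`,
    keeping all other measures unchanged, and return the resulting
    (decrease, increase) of the aggregated score relative to baseline."""
    base = aggregate(measures)
    effects = []
    for i in range(len(measures)):
        at_low = list(measures)
        at_low[i] = low
        at_high = list(measures)
        at_high[i] = high
        effects.append((base - aggregate(at_low), aggregate(at_high) - base))
    return effects


if __name__ == "__main__":
    single = [60.0]                    # criterion built from one measure
    four = [60.0, 60.0, 60.0, 60.0]    # criterion built from four measures
    print(one_at_a_time(single, mean)[0])  # (60.0, 40.0)
    print(one_at_a_time(four, mean)[0])    # (15.0, 10.0)
    print(one_at_a_time(four, min)[0])     # (60.0, 0.0)
```

With identical measure values, the single-measure criterion reacts four times as strongly as the four-measure one under the mean, and the minimum aggregator makes the criterion fully sensitive to a drop but insensitive to an improvement.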