Investigations comparing the behaviour and welfare of animals in different environments have led to mixed and often conflicting results. These conflicts could arise from genuine differences in welfare, poor validity of indicators, low statistical power, publication bias, or inappropriate statistical analysis. Our aim was to investigate how four approaches to the inferential analysis of datasets of varying size affect model outcomes and the conclusions that could be drawn from them. We considered aggression in 864 growing pigs over six weeks, as measured by ear and body injury scores, and its relationships with less and more enriched environments, the pig's relative weight, and sex. Pigs were housed in groups of 18 in one of four pens, and the experiment was replicated 12 times. We applied four inferential models that either used a summary statistic approach or else fully or partially accounted for complexities in the study design. We tested the models using both the full dataset (n = 864) and small samples (n = 72). The most appropriate inferential model for comparing ear and body scores was a mixed effects, repeated measures model. Statistical models that did not account for the correlation between repeated measures and/or the random effects from replications and pens led to spurious associations between environmental factors and indicators of aggression, which were not supported by the initial exploratory analysis. For analyses of the smaller datasets (n = 72), given the effect size and the number of independent factors, there was insufficient power to detect statistically significant associations. Based on the mixed effects, repeated measures models, higher body injury scores were associated with more enrichment (coef. est. = 0.09, p = 0.02), weight (coef. est. = 0.05, p < 0.001), pen location on the right side of the experimental room (coef. est. = 0.08, p = 0.03), and pen location at the front of the experimental room (coef. est. = 0.11, p = 0.003). By comparison, lower ear injury scores were associated with more enrichment (coef. est. = −0.51, p = 0.005) and pen location at the front of the experimental room (coef. est. = −0.4, p = 0.02). These observed differences support the hypothesis that injuries to the body and ears arise from different risk factors. Although calculating the minimum required sample size before conducting an experiment and selecting an appropriate inferential analysis method will contribute to the validity of study results, conflicts between outcomes will require further investigation using other methods, such as sensitivity and specificity analysis.
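One way to see why ignoring pen-level clustering can produce spurious associations, and why n = 72 may be underpowered, is the standard design effect for clustered data, 1 + (m − 1) × ICC, where m is the group size. The sketch below is illustrative only and is not the analysis used in the study: the intraclass correlation `ASSUMED_ICC` is a hypothetical value chosen for the example, the function names are our own, and the calculation covers only clustering within pens, not repeated measures or replicate effects.

```python
def design_effect(cluster_size: int, icc: float) -> float:
    """Variance inflation from clustering: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1) * icc


def effective_sample_size(n_total: int, cluster_size: int, icc: float) -> float:
    """Number of independent observations the clustered sample is worth."""
    return n_total / design_effect(cluster_size, icc)


PIGS_PER_PEN = 18   # group size stated in the abstract
ASSUMED_ICC = 0.10  # hypothetical within-pen correlation, NOT from the study

if __name__ == "__main__":
    # 864 = full dataset, 72 = small-sample scenario from the abstract
    for n_total in (864, 72):
        n_eff = effective_sample_size(n_total, PIGS_PER_PEN, ASSUMED_ICC)
        print(f"n = {n_total}: effective n = {n_eff:.1f}")
```

Under this assumed ICC the design effect is 2.7, so 864 pigs carry roughly the information of 320 independent animals and 72 pigs roughly that of 27. A model that treats every pig as independent therefore overstates its precision, which is consistent with the spurious associations reported for models that ignore pen and replicate effects.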