Strategies selected by combining multiple signals suffer severe overfitting
biases, because underlying signals are typically signed such that each predicts
positive in-sample returns. As a result, “highly significant” backtested
performance is easy to generate by selecting stocks using combinations of randomly
generated signals, which by construction have no true power. This paper
analyzes t-statistic distributions for multi-signal strategies, both empirically
and theoretically, to determine appropriate critical values, which can be several
times standard levels. Overfitting bias also severely exacerbates the multiple
testing bias that arises when investigators consider more results than they
present. Combining the best k out of n candidate signals yields biases similar
to those obtained using the single best of n^k candidate signals.
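
The mechanism can be illustrated with a small Monte Carlo sketch. It is not the paper's procedure: it assumes each candidate signal yields a long-short return series that is i.i.d. standard normal noise (so no signal has true power), signs each series so that its in-sample mean is positive, selects the k series with the largest absolute t-statistics, and equal-weights them. The function names, the equal weighting, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def t_stat(x):
    """t-statistic of the sample mean, computed along the last axis."""
    x = np.asarray(x, dtype=float)
    n_obs = x.shape[-1]
    return x.mean(axis=-1) / (x.std(axis=-1, ddof=1) / np.sqrt(n_obs))


def best_k_of_n_tstats(n, k, n_months=240, n_sims=500, seed=0):
    """Simulated t-statistics of an equal-weighted best-k-of-n strategy.

    Assumption: every candidate strategy's returns are pure noise, so any
    apparent significance comes from in-sample signing and selection.
    """
    rng = np.random.default_rng(seed)
    # Returns of n powerless strategies: shape (n_sims, n, n_months).
    r = rng.standard_normal((n_sims, n, n_months))
    t = t_stat(r)  # shape (n_sims, n)

    # Sign each strategy so that its in-sample mean return is positive.
    signed = r * np.sign(t)[:, :, None]

    # Keep the k strategies with the largest in-sample |t| in each simulation.
    idx = np.argsort(-np.abs(t), axis=1)[:, :k]
    chosen = np.take_along_axis(signed, idx[:, :, None], axis=1)

    # Equal-weight the selected strategies and evaluate on the same sample.
    combo = chosen.mean(axis=1)  # shape (n_sims, n_months)
    return t_stat(combo)


if __name__ == "__main__":
    for n, k in [(1, 1), (20, 1), (20, 3)]:
        t = best_k_of_n_tstats(n, k)
        share = (t > 1.96).mean()
        print(f"n={n:2d} k={k}: mean t = {t.mean():.2f}, share above 1.96 = {share:.2f}")
```

Comparing the (n, k) = (1, 1) case with the others separates the effect of signing a single signal from the additional effect of selecting and combining several; the upper quantiles of the simulated t-statistics give a rough sense of how far critical values would need to move under these assumptions.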