What is better: Factor Zoo or Factor Museum?

Here are my eight thoughts and one proposed solution regarding Campbell Harvey and Yan Liu’s recently released paper on their influential concept of the factor zoo. In short, the paper argues that there are too many data-mined factors out there and that we should require much higher t-statistics before accepting a factor.

Ironically (and perhaps with subtle intent), the paper’s mega-list of factors feels like an invitation for quants to browse for factor ideas they might have missed, rather than what the paper directly implies: that most factors on the list should be avoided (see a top 10 list of quant mega-multi-factor papers here).
The least data-mined and most academically accepted narrow definitions of quant factors have had flat returns for a couple of decades now (you can see for yourself here), making them candidates for a factor museum rather than a factor zoo. Less data mining is not going to solve the problem of failing to assemble models that work.
Doesn’t focusing on the t-stat alone seem incomplete? For example, the Size factor has been accepted by Nobel Prize winner Professor Fama and has robust theoretical backing, and yet it doesn’t pass Professor Harvey’s adjusted t-stat hurdle. And what about all the Type II errors, i.e. the cost of failing to innovate?
Does anyone expect every sketch and painting by Picasso to be a masterpiece? Does anyone expect every song on an album to be a hit? Many individual factors are indeed likely to be noise, but that does not invalidate a final combination of factors that produces positive model returns.
Do factors with higher backtest t-stats perform any better out of sample than the ones with lower, but still significant, t-stats?
Quants know that even the simplest factor, such as Earnings to Price, can be defined in dozens of ways through normalization, adjustments, smoothing, trailing, filtering, lagging, contextualizing, etc. These details add up to a big difference at the model level. Looking at any univariate factor applied to the entire universe is not the end of factor investing but only a small beginning, a sort of idea sketch. Hence, to pass a go/no-go judgement on any given ratio based only on its t-statistic is, in my opinion, to miss the point of this early stage of research.
Even if many factor species vanish and end up in the factor museum, it is still better to have diversity of ideas than total homogeneity. Just as quants spent many years explaining to fundamental investors that quant investing is about getting baskets of stocks to outperform rather than predicting any individual stock, the same logic applies to factors: factor models are about creating baskets of factors that perform, not about expecting each factor to stay positive out of sample. What is your model’s factor hit rate?
On the one hand, we have the chaotic zoo. On the other, we have the orderly Fama-French 5-factor model. Neither extreme feels like the answer that will save quants from ending up in the museum of investment approaches. How about the messy middle?
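To make the point about focusing on the t-stat alone concrete, here is a minimal Python sketch of the plain t-statistic of a factor’s mean return, checked against the conventional 2.0 cut-off and a stricter 3.0 hurdle. The 3.0 figure, the function names and the twelve monthly returns are illustrative assumptions, not numbers from the paper, and the sketch ignores autocorrelation and any multiple-testing adjustment.

```python
import math
import statistics


def factor_t_stat(returns):
    """t-statistic of the mean of a factor's periodic returns (H0: mean = 0).

    Uses the simple i.i.d. formula t = mean / (stdev / sqrt(n)); no
    autocorrelation or multiple-testing adjustment is applied.
    """
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two return observations")
    mean = statistics.fmean(returns)
    stdev = statistics.stdev(returns)  # sample standard deviation (n - 1)
    return mean / (stdev / math.sqrt(n))


def passes_hurdle(returns, hurdle):
    """True if the factor's t-statistic clears the given hurdle."""
    return factor_t_stat(returns) > hurdle


# Hypothetical monthly long-short factor returns (illustrative, not real data).
monthly = [0.004, -0.002, 0.006, 0.001, -0.003, 0.005, 0.002, 0.000,
           0.007, -0.001, 0.003, 0.004]
t = factor_t_stat(monthly)
print(f"t = {t:.2f}")
print("clears 2.0:", passes_hurdle(monthly, 2.0))
print("clears 3.0:", passes_hurdle(monthly, 3.0))
```

With these made-up returns the t-statistic comes out at roughly 2.3, so the same factor is "accepted" at 2.0 and "rejected" at 3.0. The verdict flips on the threshold alone, which is exactly where the Type I versus Type II trade-off lives.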
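The Picasso argument about noisy factors inside a working combination has a simple arithmetic core. The sketch below assumes, purely for illustration, equal-volatility, mutually uncorrelated factors combined with equal weights, some with a true Sharpe ratio `signal_sharpe` and the rest pure noise. Under those assumptions the basket’s Sharpe ratio is `signal_sharpe * n_signal / sqrt(n_signal + n_noise)`. Real factors are correlated, so treat this as stylized intuition rather than a model.

```python
import math


def composite_sharpe(signal_sharpe, n_signal, n_noise):
    """Sharpe ratio of an equal-weighted basket of independent factors.

    Illustrative assumptions: every factor has the same volatility sigma,
    factor returns are mutually uncorrelated, n_signal factors each have
    Sharpe ratio `signal_sharpe`, and n_noise factors have a true mean of
    zero. With weights 1/k (k = n_signal + n_noise), the basket mean is
    n_signal * mu / k and its volatility is sigma / sqrt(k), so the basket
    Sharpe ratio is signal_sharpe * n_signal / sqrt(k).
    """
    k = n_signal + n_noise
    if n_signal < 0 or n_noise < 0 or k == 0:
        raise ValueError("need at least one factor and non-negative counts")
    return signal_sharpe * n_signal / math.sqrt(k)


# One genuine factor alone versus a basket in which most members are noise.
single = composite_sharpe(0.3, n_signal=1, n_noise=0)
mixed = composite_sharpe(0.3, n_signal=4, n_noise=6)
print(f"single factor Sharpe: {single:.3f}")
print(f"4 real + 6 noise basket Sharpe: {mixed:.3f}")
```

Four genuine factors mixed with six noise factors give a basket Sharpe of about 0.38, above the 0.3 of any single genuine factor, even though most basket members are noise. The same formula also shows the cost of too much noise: one genuine factor among nine noise factors drops to about 0.095.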
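The question of whether higher backtest t-stats translate into better out-of-sample results can be checked mechanically once in-sample t-stats and post-sample returns are recorded side by side. This hypothetical sketch buckets factors at illustrative thresholds of 2.0 and 3.0 and compares average out-of-sample returns. The data pairs are invented, and a real test would need many factors and a genuine holdout period.

```python
import statistics


def compare_oos_by_t_bucket(factors, significance=2.0, high_hurdle=3.0):
    """Mean out-of-sample return of moderately vs. highly significant factors.

    `factors` is a list of (in_sample_t, out_of_sample_mean_return) pairs.
    Factors with t below `significance` are dropped; the rest are split at
    `high_hurdle`. Both thresholds are illustrative assumptions.
    """
    moderate = [oos for t, oos in factors if significance <= t < high_hurdle]
    high = [oos for t, oos in factors if t >= high_hurdle]
    if not moderate or not high:
        raise ValueError("each bucket needs at least one factor")
    return statistics.fmean(moderate), statistics.fmean(high)


# Made-up (in-sample t, out-of-sample monthly mean return) pairs.
sample = [(2.1, 0.0010), (2.4, 0.0002), (2.8, 0.0015), (3.2, 0.0008),
          (3.9, 0.0003), (4.5, 0.0012), (1.5, 0.0030)]
moderate_avg, high_avg = compare_oos_by_t_bucket(sample)
print(f"2.0 <= t < 3.0: {moderate_avg:.5f}, t >= 3.0: {high_avg:.5f}")
```

In this invented sample the moderately significant bucket happens to average slightly more than the highly significant one; nothing should be read into that beyond the mechanics of the comparison.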
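To illustrate how many knobs even Earnings to Price has, here is a hypothetical sketch of three of the variations mentioned above: trailing four-quarter E/P, E/P smoothed over several years of earnings, and a sector-relative (contextualized) z-score. The function names, the three-year window and the sample numbers are assumptions for illustration only.

```python
import statistics


def ep_trailing(eps_quarters, price):
    """Trailing E/P: sum of the last four quarterly EPS figures divided by price."""
    if len(eps_quarters) < 4:
        raise ValueError("need at least four quarters of EPS")
    return sum(eps_quarters[-4:]) / price


def ep_smoothed(annual_eps, price, years=3):
    """Smoothed E/P: mean of the last `years` annual EPS figures divided by price."""
    if len(annual_eps) < years:
        raise ValueError("not enough years of EPS history")
    return statistics.fmean(annual_eps[-years:]) / price


def zscore_within_groups(values, groups):
    """Contextualize a raw ratio by z-scoring it within each group (e.g. sector)."""
    members = {}
    for value, group in zip(values, groups):
        members.setdefault(group, []).append(value)
    # Population mean and standard deviation of the ratio inside each group.
    moments = {g: (statistics.fmean(v), statistics.pstdev(v)) for g, v in members.items()}
    scores = []
    for value, group in zip(values, groups):
        mean, sd = moments[group]
        scores.append(0.0 if sd == 0 else (value - mean) / sd)
    return scores


# Hypothetical inputs for one stock priced at 40.
print(ep_trailing([0.5, 0.6, 0.4, 0.7], 40.0))
print(ep_smoothed([1.8, 2.0, 2.2], 40.0))
# Hypothetical raw E/P for four stocks in two sectors.
print(zscore_within_groups([0.05, 0.07, 0.02, 0.04], ["tech", "tech", "util", "util"]))
```

In the last step, two stocks with quite different raw E/P (0.07 and 0.04) receive the same sector-relative score of +1, because each sits one standard deviation above its own sector’s mean; a univariate, whole-universe test would treat them very differently.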
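The factor hit rate question can be answered with a few lines of code. The sketch below assumes a dictionary of cumulative out-of-sample returns per factor (names and numbers made up) and uses a simple average of those returns as a rough proxy for an equal-weighted model, ignoring compounding and rebalancing.

```python
import statistics


def factor_hit_rate(oos_returns):
    """Share of factors whose cumulative out-of-sample return is positive."""
    if not oos_returns:
        raise ValueError("need at least one factor")
    wins = sum(1 for r in oos_returns.values() if r > 0)
    return wins / len(oos_returns)


def equal_weight_model_return(oos_returns):
    """Simple average of factor returns: a rough proxy for an equal-weighted model."""
    return statistics.fmean(oos_returns.values())


# Made-up cumulative out-of-sample returns per factor.
oos = {"value": -0.02, "momentum": 0.05, "quality": 0.01,
       "size": -0.01, "low_vol": 0.03}
hit_rate = factor_hit_rate(oos)
model_return = equal_weight_model_return(oos)
print(f"hit rate: {hit_rate:.0%}, model return: {model_return:.2%}")
```

Here only three of five factors (a 60% hit rate) finish positive, yet the equal-weighted basket still earns about 1.2%.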

Life is better in a Zoo than a Museum, and even better in the Wild.

My proposed solution: every finance journal should team up with the key data providers to create a unified, mandatory, open-source code repository that replicates and updates the factor results from each published paper going forward. (The authors of each paper would be required to build a fresh set of code in this common repository.) Think of something similar to the Fama-French website. That way, others can replicate, stress-test, and ‘break’ the results and leave comments, much like on a Wikipedia page. And a decade later, we can all look back and see whether the original strategies stood the test of time. Data mining is inevitable, and only real out-of-sample performance can inform investors about the quality of the original idea and, more importantly, about the quality of the selected combinations of ideas (the final models) and actual investment performance.
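As a rough idea of what each entry in such a repository might hold, here is a hypothetical Python schema: a factor record tied to its paper and publication year, extended with new annual returns over time so that pre- and post-publication performance can be compared. The class name, the fields and the rule that years before publication count as the original sample are all assumptions, not an existing service or API.

```python
import statistics
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class FactorRecord:
    """Hypothetical entry in a shared factor-replication repository."""
    name: str
    paper: str
    publication_year: int
    # Annual long-short factor returns keyed by calendar year.
    annual_returns: Dict[int, float] = field(default_factory=dict)

    def append_year(self, year: int, ret: float) -> None:
        """Add (or overwrite) one year of updated factor returns."""
        self.annual_returns[year] = ret

    def pre_post_means(self) -> Tuple[float, float]:
        """Mean annual return before vs. from the publication year onward.

        Assumption: years before publication stand in for the original sample.
        """
        pre = [r for y, r in self.annual_returns.items() if y < self.publication_year]
        post = [r for y, r in self.annual_returns.items() if y >= self.publication_year]
        if not pre or not post:
            raise ValueError("need returns both before and after publication")
        return statistics.fmean(pre), statistics.fmean(post)


# Invented example: a factor published in 2010, then updated each year.
rec = FactorRecord("hypothetical_factor", "Hypothetical et al. (2010)", 2010,
                   {2005: 0.04, 2006: 0.06, 2007: 0.05})
for year, ret in [(2010, 0.01), (2011, -0.01), (2012, 0.03)]:
    rec.append_year(year, ret)
pre, post = rec.pre_post_means()
print(f"pre-publication mean: {pre:.2%}, post-publication mean: {post:.2%}")
```

Keeping the record open to yearly appends is the design choice that matters here: the original result stays fixed, while the out-of-sample track record keeps growing for anyone to inspect.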

P.S.1 If the proposed repository existed, I would bet that many factor definitions that people would declare ex-ante to be clearly data-mined would actually hold up quite well empirically out of sample, and vice versa.

P.S.2 This ‘zoo problem’ is not isolated to finance. The “Replication Crisis” in the sciences is a hot topic these days. For example, a recent large-scale replication effort reproduced only 13 out of 21 social-science experiments published in Nature and Science. But this is not necessarily a bad thing. It just means that publishing an idea backed by some empirical findings is only an early step in the research process, not the end of it.

P.S.3 [Idea Mining -> Factor Zoo] vs [Quant Alpha Extinction -> Factor Museum]