Gary Connolly recalls the 1907 Knickerbocker crisis, the 1929 Great Crash, Black Monday in 1987 and the 2008 collapse during the financial crisis.
History is not reassuring. According to the Financial Times, more than a quarter of US monthly stock market losses of a tenth or more since 1871 have happened in October.
As an investor, are you right to be concerned – and what possible explanation could there be for this phenomenon? I’d argue that for the most part we are being fooled by randomness. There is no explanation, as there is nothing to explain. It’s human nature to seek patterns and assign significance to otherwise meaningless events.
The most famous stock market adage, “sell in May and go away…”, recommends that you take the summer off and give your portfolio of stocks a rest. Much has been written about this apparent anomaly and it’s tempting to believe that we can juice the returns from our portfolio by doing essentially nothing other than selling stocks in May and moving to cash until September, after the St Leger horse race.
CAUTION WHEN CORRELATING
So consistent has this pattern been that a myriad of papers attempt to explain it. The tendency to find patterns in data has some unfortunate consequences for our investment behaviour. It leads us to attempt to time the market and, worse, causes us to be a lot more active with our portfolios than we should be.
Slicing and dicing stock market data to predict the future is big business. There are hundreds of websites hawking ‘proprietary trading tools’ and plenty of charlatans willing to offer you access to ‘winning’ trading systems. Stock market trading rules are limitless, but most are, in fact, useless.
The stock market generates vast quantities of information. But remember: it only has one past. Scouring a vast selection of data to explain a small piece of financial market history can produce bizarre results.
In attempting to figure out what exactly is correlated with high stock market returns, academics and finance gurus often compare many different variables.
The issue here is what Tim Harford refers to as the “jelly bean problem”, named after a cartoon. The cartoon shows scientists testing whether jelly beans cause acne, using a commonly applied test: assume that jelly beans don’t cause acne, then abandon that assumption only if the observed correlation between jelly beans and acne has less than a 5% probability of occurring by chance. The scientists test 20 different colours of jelly bean and, amazingly, it turns out that the green jelly beans are correlated with acne.
If you’re studying lots and lots of variables, you have a greater chance of a false result. If 20 statistical patterns are analysed and there is no genuine causal relationship in any of them, we’d still expect one of them to look strikingly correlated. Correlation is not causation. Green jelly beans no more cause acne than umbrellas cause rain, yet in each case the two variables are correlated.
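The arithmetic behind the jelly bean cartoon is easy to check. A minimal sketch (the function name is mine, not Harford’s):

```python
# Probability of at least one false positive when running many
# independent tests at the 5% significance level.
# Illustrative arithmetic only; real-world tests are rarely independent.

def prob_false_positive(n_tests: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n_tests truly-null results looks 'significant'."""
    return 1 - (1 - alpha) ** n_tests

print(prob_false_positive(1))   # one test: 5% chance of a fluke
print(prob_false_positive(20))  # twenty jelly bean colours: roughly 64%
```

With 20 colours tested, a “significant” green-jelly-bean result is more likely than not, even when nothing is going on.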
According to quantitative investor Ronald Kahn, it is surprisingly easy to search through historical data and find patterns that don’t really exist. To understand why data mining is easy, it helps to understand the statistics of coincidence.
Kahn provides a great real-life example. In the 1980s, Evelyn Adams won the New Jersey (NJ) state lottery twice in four months. Newspapers at the time put the odds of that happening at 17 trillion to one. Two Harvard statisticians, however, estimated the odds at 30 to one. How could there be such a gulf between the two estimates?
It turns out that the odds of winning the NJ lottery twice were in fact 17 trillion to one. But that result is of interest only to Evelyn Adams’ immediate family. The odds of someone, somewhere, winning two lotteries, given the millions of people entering lotteries every day, are only 30 to one. Coincidences appear improbable only when viewed from a narrow perspective. When viewed from the correct (broad) perspective, coincidences are no longer so improbable.
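The same arithmetic shows how many “chances at a double win” it takes to close the gap between the two estimates. The numbers below are illustrative assumptions, not the Harvard statisticians’ actual inputs:

```python
# Narrow vs broad perspective on a double lottery win.
# Assumed inputs for illustration; not the statisticians' actual model.
import math

p_single_pair = 1 / 17e12   # narrow view: odds that one named person wins twice
target_prob = 1 / 31        # broad view: '30 to one' expressed as a probability

# How many independent chances at a double win would make the
# probability of at least one occurrence roughly 1/31?
# Solve 1 - (1 - p)^n = target_prob for n.
n_needed = math.log(1 - target_prob) / math.log(1 - p_single_pair)
print(f"{n_needed:.2g}")    # on the order of hundreds of billions of chances
```

Hundreds of billions of player-draw combinations sounds enormous, but with millions of people entering multiple lotteries, week after week, for years, the chances accumulate quickly.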
Investment research involves exactly the same statistics and the same issues of perspective. The narrow perspective substantially inflates our confidence in the results; viewed from the proper (broad) perspective, that confidence falls accordingly. Shown long-term evidence that stock market returns between May and September appear much lower than those from October to April, we would be impressed, maybe even convinced. If you were told that a thousand other tests were run before the ‘sell in May’ rule was found to work, would you be as impressed?
Data-mined numbers can be very enticing. Billions of euros flow into data-mined investing strategies every year. According to veteran quantitative money manager David Leinweber, they are one of the leading causes of ‘money evaporation’ in quantitative strategies. So incensed was he by the “stupid tricks of data mining”, as he calls them, that he conducted a satirical experiment on data mining in financial markets.
Leinweber wanted to be able to predict the US stock market, so he set about conducting analysis on the S&P 500. Sifting through a large data series published by the United Nations, Leinweber found a strong correlation between the S&P 500 and butter production in Bangladesh. In fact, butter production in Bangladesh explained 75% of the variation in the US stock market. By adding in cheese production in the US and sheep population, accuracy increased to 99%.
If this seems utterly ridiculous, it’s because it is. Before the start of the data period and after the end, these variables were about as useful in predicting the stock market as your intuition would suggest.
However, substitute interest rates, GDP and inflation as the variables and all of a sudden the story sounds much more plausible, though potentially no more useful than the butter-in-Bangladesh parable.
As humans we are hard-wired to find patterns in data, resisting randomness where possible. Luckily, Leinweber offers some advice to mitigate the risks of being duped by the tricks of data mining. He recommends dividing the data into thirds to see if the strategy did well only part of the time.
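Leinweber’s thirds test is straightforward to sketch. Assuming a series of historical returns and a trading rule expressed as a daily position, we can check whether the rule’s edge shows up in every sub-period or only one (the function name and data below are illustrative, not Leinweber’s):

```python
# Split a backtest into thirds; a rule that only 'works' in one
# third is a data-mining suspect. Illustrative sketch only.
import numpy as np

def thirds_check(returns: np.ndarray, positions: np.ndarray) -> list[float]:
    """Mean strategy return in the first, middle and last third of the sample."""
    strategy = returns * positions        # daily P&L of the rule
    chunks = np.array_split(strategy, 3)  # first / middle / last third
    return [float(chunk.mean()) for chunk in chunks]

# Example with assumed noise returns and a trivial always-long rule:
rng = np.random.default_rng(1)
rets = rng.normal(0.0003, 0.01, size=900)  # assumed daily returns
means = thirds_check(rets, np.ones(900))
print(means)  # three sub-period means; consistency across them is the point
```

If the rule’s profits are concentrated in a single third, that is a warning the pattern was fitted to one stretch of history rather than discovered in all of it.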
A lot of strategies look great in back-testing but get found out at implementation, once trading costs, management fees and taxes are taken into account.
October is associated with a number of infamous market crashes throughout history, but it is a stretch to argue that the tenth month of the calendar year is the cause of these episodes. Too much investment tomfoolery is the result of spurious links being drawn between correlated variables, links that are no more causal than the one between umbrellas and rain.
Leinweber offers the clichéd advice that if something seems too good to be true, then it is. I’d offer some other clichéd advice: don’t put all your eggs in one basket. If you accept an inevitable tendency to be persuaded by irresistible-looking returns, at least you won’t blow up your portfolio if you allocate to such investments only in small doses.