We just showed you an objective approach to investing with our forecast data that yields both decent and consistent returns. At this point, however, you might have some questions about the forecast data itself. For example, how do we generate our data? What is our accuracy? Are there any important considerations to keep in mind? In this blog-post we take a step back from finance to answer these and other questions about our forecast data.
Identifying Market Patterns
Simply stated, our goal is to identify market patterns for a specific security or a group of securities like an index. However, markets are extremely complex, often illogical and highly random environments, so it is impossible to create a purely deterministic model of market behavior. A better approach is to acknowledge the inherent randomness of markets and adopt a statistical approach that models markets in terms of statistical parameters over time. This implies the use of some type of statistical model, from which we can identify statistical patterns over time and ultimately forecast the likelihood of particular events occurring, such as a price decrease in the next four days.
The first step in generating our forecast data is building a statistical model for a particular security or group of securities. The process for creating such a model is shown in the schematic below.
Everything begins with a machine-learning algorithm that constructs, trains, evaluates and tunes a statistical model using so-called "training-data". In our case the training-data are historical price-data for a particular security, basket of securities, or an index like the CAC 40. During training, the machine-learning algorithm attempts to find relationships among statistical parameters in the underlying price-data over different timespans. As you can imagine, this process is mathematically complex and relies on robust machine-learning tools that abstract much of that complexity away from the end-user while still yielding valid results. Nonetheless, our end-goal always remains the same no matter the complexity: a statistical model that identifies statistically significant patterns in market behavior.
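To make this concrete, here is a minimal sketch of such a training step. It uses scikit-learn purely as a generic stand-in for our actual tooling (described next); the file name, feature choices and the four-day label are illustrative assumptions, not our production setup.

```python
# Illustrative sketch only, using scikit-learn as a generic stand-in.
# File name, features and the four-day label are assumptions for demonstration.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

prices = pd.read_csv("cac40_daily.csv", parse_dates=["date"])  # hypothetical file
prices = prices.sort_values("date").set_index("date")

# Statistical features over different timespans: trailing returns and volatility.
feats = pd.DataFrame(index=prices.index)
for window in (5, 10, 20):
    feats[f"ret_{window}d"] = prices["close"].pct_change(window)
    feats[f"vol_{window}d"] = prices["close"].pct_change().rolling(window).std()

# Label the event we want to forecast: a price decrease within four trading days.
label = (prices["close"].shift(-4) < prices["close"]).astype(int)

data = feats.join(label.rename("down_in_4d")).dropna()
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(data.drop(columns="down_in_4d"), data["down_in_4d"])
```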
For machine-learning algorithms we rely on BigML, a leading US-based provider of machine-learning tools with a worldwide presence. BigML's software environment and training materials, along with its deep expertise in machine-learning, allow us not only to create and train statistical models but also to iteratively improve the results. In close collaboration with BigML, we are able to optimize our statistical models for both accuracy and consistency over long timespans with various securities and indices.
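For readers curious what this looks like in practice, below is a minimal sketch using BigML's open-source Python bindings. The file name and input fields are hypothetical placeholders, and the real workflow involves far more evaluation and tuning than shown here.

```python
# Minimal sketch with BigML's Python bindings (pip install bigml).
# File name and input fields are hypothetical placeholders.
from bigml.api import BigML

api = BigML()  # credentials via BIGML_USERNAME / BIGML_API_KEY env variables

source = api.create_source("cac40_features.csv")
api.ok(source)                        # wait until the resource is ready
dataset = api.create_dataset(source)
api.ok(dataset)
model = api.create_model(dataset)     # objective field defaults to the last column
api.ok(model)

# Prediction for one new set of feature values.
prediction = api.create_prediction(model, {"ret_5d": -0.012, "vol_5d": 0.018})
print(prediction["object"]["output"])
```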
Model Verification is Key
Just because a statistical model for a specific security or index has been created does not mean one is finished. Far from it! The tedious yet necessary process of testing the model now begins. This step entails testing the model's results against historical prices over many timespans. We want to ensure both statistical accuracy and consistency against the actual price-record, regardless of the timespan.
Of course we must be very careful during model verification to avoid testing only the timespans on which we trained the model. Why? Doing so would induce a bias into the verification process: we would be testing the model with the very price-data used to train it. In other words, "training-data in, training-data out". Therefore we also test against price-data that was not used to train the model. This ensures that we truly vet the underlying statistical model and avoid fooling ourselves with biased test results.
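Continuing the illustrative scikit-learn sketch from earlier, a simple time-based split captures the idea: hold out a span the model never saw during training, and compare performance on both. The cutoff dates below are examples that mirror the comparison in the charts further down.

```python
# Continuing the earlier sketch: a time-based split so that verification
# also covers price-data the model never saw during training.
# Cutoff dates are illustrative examples.
train = data.loc[:"2012-12-31"]                # spans used for training
holdout = data.loc["2013-01-01":"2014-12-31"]  # span withheld from training

X_cols = [c for c in data.columns if c != "down_in_4d"]
model.fit(train[X_cols], train["down_in_4d"])

in_sample = model.score(train[X_cols], train["down_in_4d"])
out_of_sample = model.score(holdout[X_cols], holdout["down_in_4d"])
print(f"in-sample accuracy: {in_sample:.1%}, out-of-sample: {out_of_sample:.1%}")
```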
How Have Our Models Performed?
Each week from April to August 2016, we published a summary of the previous week's forecasts in an earlier forecast format. These summaries also included a running tally of our forecast accuracy since we began publishing forecast data in early April. But how have we performed over periods longer than just six months? Let us take a look at our historical performance with the CAC 40 index from our previous blog-post.
First consider the ceiling forecast data for 1990-2015, shown in the chart below. Along the horizontal axis are our three confidence intervals: 50%, 60% and 70%. The vertical axis shows the actual success rate for each of these confidence intervals. The sample sizes are listed in each data column (e.g., n = 1097 for the 50% interval).
Over time, with ever-increasing sample sizes, our success rates should converge to the given confidence intervals. Indeed this is the case: each success rate lies within (or almost within) its respective confidence interval. For example, when we state there is a 70% chance that the price goes above the ceiling in the next four days, we are on average correct 70% of the time.
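As a concrete illustration of the check behind this chart, the snippet below (with toy numbers, not our actual records) groups forecasts by their stated confidence level and compares the empirical success rate against it. For a well-calibrated model each rate should approach its level as n grows.

```python
import numpy as np

# Toy illustration of the calibration check: group forecasts by stated
# confidence level and compare the empirical success rate to that level.
stated = np.array([0.50, 0.70, 0.60, 0.70, 0.50, 0.60])  # toy forecasts
hit = np.array([1, 1, 0, 1, 0, 1])                       # 1 = event occurred

for level in (0.50, 0.60, 0.70):
    mask = stated == level
    if mask.any():
        print(f"{level:.0%} bucket: n = {mask.sum()}, "
              f"success rate = {hit[mask].mean():.1%}")
```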
Second consider the floor forecast-data shown in the next chart.
Like the ceiling, the success rates for the floor converge to their corresponding confidence intervals. As a result, a forecast of "70% stay above" really means that about 70% of the time the price remains above the floor.
However, the results in the previous two charts include significant timespans containing training data, that is, data used to train the statistical model. Therefore we must also consider timespans without training data. For example, let us now compare the timespan 2013-2014 (without training-data) against the timespan 1990-2013 (with training-data). We begin with the ceiling forecasts in the following two charts: the first chart shows the "go above" forecasts; the second shows the "stay below" forecasts.
As you can see, the results still converge to the expected confidence intervals, with little difference between timespans with and without training-data. This indicates that the underlying statistical model performs consistently whether or not a timespan contains training data.
Now consider the floor comparisons with and without training-data in the two charts below. The first chart shows the "go below" forecasts, while the second shows the "stay above" forecasts.
Just like the ceiling results, the floor results converge to the same confidence intervals. This is exactly what we want and indeed expect to see, assuming of course that we have done our job correctly!
Important Considerations to Keep in Mind
The first and foremost consideration involves the potential bias from too few data samples, whether during model training or result interpretation. For example, just consider our weekly summaries from April to August 2016: although we collected twenty-one weeks of data, we still have only about ten data samples for each confidence interval.
Unfortunately, too few data samples yield significant uncertainty in our results. Therefore, when we train or verify models, we use at least two years of data (and usually much more). Two years typically provides sixty or more data samples. This is exactly why we clearly state: anyone expecting to become rich quickly with our forecast data will be sorely disappointed. Instead, one needs about two years to begin seeing decent returns relative to traditional investment strategies like buy-and-hold.
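To see why sample size matters so much, consider the standard error of an observed success rate. With a true rate of 70% and only ten samples, the approximate 95% interval spans roughly plus or minus 28 percentage points; at sixty samples it narrows to about plus or minus 12. A quick back-of-the-envelope sketch:

```python
import math

# Standard error of an observed success rate p over n forecasts:
# sqrt(p * (1 - p) / n). Small n means a very wide uncertainty band.
def success_rate_std_error(p: float, n: int) -> float:
    return math.sqrt(p * (1 - p) / n)

for n in (10, 60, 1000):
    half_width = 1.96 * success_rate_std_error(0.70, n)
    print(f"n = {n:4d}: 70% +/- {half_width:.1%} (approx. 95% interval)")
```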
The second consideration involves the trade-off between model accuracy and consistency over different timespans, both short and long. Sure, one can build models with higher probabilities, up to 90%, but consistency will suffer. Nothing in life is without compromise, and our models are no exception. Therefore we always aim to create models that balance accuracy and consistency over any given timespan, with neither characteristic optimized alone but always in combination.
The final consideration involves yet another bias, this time associated with the training-data itself. Imagine, for example, that you have training-data for a timespan in which the market continually trends upward, as indeed occurred from about 1995-2000. Now suppose that you train your statistical model with data only from this timespan. What would happen? If you then used the model to forecast performance after 2000, you would be rather disappointed: the years immediately after 2000 were characterized by significant market declines across a wide range of securities. Therefore it is critically important that we use training-data from long timespans that contain realistic market behavior, with both gains and declines. Likewise, it is important that we test the models with and without training-data over a range of timespans with realistic market behavior.
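One simple, illustrative guard against this bias is to check that a candidate training span actually contains both rising and falling years, not just a bull run like 1995-2000. Reusing the hypothetical prices frame from the first sketch:

```python
# Illustrative sanity check that a candidate training span includes both
# rising and falling years. Reuses the hypothetical `prices` frame above.
yearly_returns = (
    prices["close"].groupby(prices.index.year).last().pct_change().dropna()
)
print(f"up years: {(yearly_returns > 0).sum()}, "
      f"down years: {(yearly_returns < 0).sum()}")
```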
Generating Value with Our Forecast Data
As you can see, the process of generating our forecast data with statistical models involves several steps and state-of-the-art tools from BigML. However, the overall goal always remains the same: to identify underlying statistical patterns for a given security, basket of securities, or an index, and thereby accurately forecast upcoming price movements.
The process of generating both accurate and consistent forecast data underpins our focus on creating sustainable value for investors: decent yet stable returns over a number of years instead of decades. As a result we offer small investors a better path to investing, one characterized by data-driven trading decisions within an automated and configurable software framework, all without relying solely on subjective information, error-prone emotions or expensive financial advisors.
In closing, we again emphasize a point we have made several times before in our posts: please do not trade with our public forecast data! At this point we are simply trying to educate you about our approach by being as transparent as possible. If you nonetheless insist on trading with our data, then you assume full responsibility for any outcomes.
Next Blog-Post
In our next post we will introduce new forecast-data types and associated investment strategies that we have been testing in parallel over the past months. These new data-types and strategies yield higher, more consistent returns than the simple strategy outlined in the last post with the current four-day forecast data.
As always let us know if you have any questions or feedback. We are always happy to hear from you. And stay tuned for upcoming blog posts!