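In the Quick Start section, we described the general steps involved in using OpenForecast: create a DataSet, add DataPoint objects to it, obtain a ForecastingModel, and use that model to generate forecasts.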
In the following sections, we'll describe each of these steps in a little more detail.
A DataSet is simply a collection of DataPoint objects. In many respects, you can think of it as just another Java 2 Collection. In fact, the DataSet class does implement the java.util.Collection interface.
You can create a new DataSet object directly, as follows:
DataSet observations = new DataSet();
No surprises here. This creates a new DataSet object named observations. We will refer to this data set in the next section, when we start to build our list of observations or data points.
Once you have a data set, you can begin to add DataPoint objects to it, using the DataSet add method.
There are two main ways of defining DataPoint objects. If you already have your observations/data points defined in some Java class, you could extend or modify that class to implement the DataPoint interface. Alternatively, OpenForecast provides an implementation of the DataPoint interface called Observation. The Observation class is most convenient if you don't currently have an implementation of your observation data.
The Observation class provides a complete implementation of the DataPoint interface. In other words, an Observation is a DataPoint.
Consider the quarterly sales of a company product to be $500 (thousand) for period 1, $600 (thousand) for period 2, and $700 (thousand) for period 3. To create three Observation objects representing these observations, you could use code along the following lines.
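The listing below is a minimal sketch of that code. So that it compiles on its own, it declares a simplified stand-in for Observation; the stand-in and the QuarterlySales helper class are assumptions made for illustration. With OpenForecast on the classpath you would use the library's own Observation class instead.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for OpenForecast's Observation class, included only so
// that this sketch compiles without the library (an assumption, not the
// library's implementation).
class Observation {
    private double dependentValue;
    private final Map<String, Double> independentValues = new HashMap<>();

    Observation(double dependentValue) {
        this.dependentValue = dependentValue;
    }

    void setIndependentValue(String name, double value) {
        independentValues.put(name, value);
    }

    double getIndependentValue(String name) {
        return independentValues.get(name);
    }

    double getDependentValue() {
        return dependentValue;
    }
}

// Hypothetical helper class, used here only to hold the example.
class QuarterlySales {
    // Builds the three observations from the text: sales (in $ thousands)
    // for quarters 1, 2 and 3.
    static Observation[] build() {
        Observation dp1 = new Observation(500.0);
        dp1.setIndependentValue("quarter", 1);

        Observation dp2 = new Observation(600.0);
        dp2.setIndependentValue("quarter", 2);

        Observation dp3 = new Observation(700.0);
        dp3.setIndependentValue("quarter", 3);

        return new Observation[] { dp1, dp2, dp3 };
    }
}
```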
Note that the Observation constructor takes a single value. This value is the dependent value - the value we observe initially, but later want to forecast. After creating a new Observation object, we then invoke the setIndependentValue method for each independent value associated with the observation. In this case, we have just one independent variable, "quarter". Therefore, for each observation we must set the independent variable, quarter, to the appropriate value, as shown in the sample code.
Note that the independent variable name, quarter, used in each of the observations must be exactly the same among the different observations if they really refer to the same independent variable.
The above listing shows how to create and initialize three Observation objects to describe our observations. Next, we must add these to our DataSet. In the section called Create a DataSet object, we defined a DataSet object, observations. The following code adds the Observations just defined to this data set.
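As a rough sketch of this step, the listing below adds three observations to a collection using add. Because DataSet implements java.util.Collection, an ArrayList is used here as a stand-in for the DataSet so that the example runs without the library; the Observation stand-in and the AddObservations helper class are likewise assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for OpenForecast's Observation (illustration only).
class Observation {
    private final double dependentValue;
    private final Map<String, Double> independentValues = new HashMap<>();

    Observation(double dependentValue) {
        this.dependentValue = dependentValue;
    }

    void setIndependentValue(String name, double value) {
        independentValues.put(name, value);
    }

    double getDependentValue() {
        return dependentValue;
    }
}

// Hypothetical helper class, used here only to hold the example.
class AddObservations {
    static Collection<Observation> build() {
        // Stand-in for "DataSet observations = new DataSet();". The real
        // DataSet is a java.util.Collection, so add() is called the same way.
        Collection<Observation> observations = new ArrayList<>();

        Observation dp1 = new Observation(500.0);
        dp1.setIndependentValue("quarter", 1);
        Observation dp2 = new Observation(600.0);
        dp2.setIndependentValue("quarter", 2);
        Observation dp3 = new Observation(700.0);
        dp3.setIndependentValue("quarter", 3);

        // Add each observation to the data set.
        observations.add(dp1);
        observations.add(dp2);
        observations.add(dp3);
        return observations;
    }
}
```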
In the previous example, we used the following scenario:
Consider the quarterly sales of a company product to be $500 (thousand) for period 1, $600 (thousand) for period 2, and $700 (thousand) for period 3.
If, in addition, we expect that the average daytime high temperature for the quarter has an influence on sales, then we can add the value of this independent variable to each observation. For example, if we knew the average daytime high temperatures for quarters 1, 2 and 3 were 45°F, 63°F and 97°F respectively, then we could define the three observations as follows:
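A minimal sketch of these observations follows. As before, the Observation class shown is a simplified stand-in so that the example compiles without the library, and the SalesWithTemperature helper class is hypothetical. The variable names quarter and avgTemp come from the text.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for OpenForecast's Observation (illustration only).
class Observation {
    private final double dependentValue;
    private final Map<String, Double> independentValues = new HashMap<>();

    Observation(double dependentValue) {
        this.dependentValue = dependentValue;
    }

    void setIndependentValue(String name, double value) {
        independentValues.put(name, value);
    }

    double getIndependentValue(String name) {
        return independentValues.get(name);
    }

    double getDependentValue() {
        return dependentValue;
    }
}

// Hypothetical helper class, used here only to hold the example.
class SalesWithTemperature {
    // Each observation now carries two independent variables:
    // the quarter and the average daytime high temperature (in °F).
    static Observation[] build() {
        Observation dp1 = new Observation(500.0);
        dp1.setIndependentValue("quarter", 1);
        dp1.setIndependentValue("avgTemp", 45.0);

        Observation dp2 = new Observation(600.0);
        dp2.setIndependentValue("quarter", 2);
        dp2.setIndependentValue("avgTemp", 63.0);

        Observation dp3 = new Observation(700.0);
        dp3.setIndependentValue("quarter", 3);
        dp3.setIndependentValue("avgTemp", 97.0);

        return new Observation[] { dp1, dp2, dp3 };
    }
}
```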
Once again, note that the independent variable names used in each of the observations, quarter and avgTemp, must be exactly the same among the different observations for the same independent variables.
To add these observations to our DataSet, we use the add method, as before.
In the next section, we look at how to use a set of observations - or DataPoint objects - to select a ForecastingModel to use for forecasting other values.
Now that we have a set of DataPoint objects defining our observations, we need to obtain a ForecastingModel. Two primary approaches are available here. The first approach requires little knowledge of the different forecasting models available and, in general, is the preferred approach.
The second approach to obtaining a ForecastingModel is to decide which model to use, and instantiate it directly. This lets you select a specific model; the trade-off is that you may not necessarily get the model that best fits your data.
These two approaches are described in the section called Obtaining a "good" forecasting model and the section called Using a specific forecasting model.
The Forecaster class is a factory class; that is, a class that can be used to "produce" instances of other classes.
The Forecaster class contains one method of particular interest, and that is the static getBestForecast method. By calling this method and passing it the DataSet created earlier, you can obtain an instance of the ForecastingModel that is the "best" - most appropriate - forecasting model for your data set.
The concept of what is a "good" forecasting model and, even more so, what is the "best" forecasting model is always somewhat subjective. Just as any investment plan carries a disclaimer of the form, "past performance is no guarantee of future performance", the same idea applies to forecasting models. Just because a forecasting model performs well against past observations does not mean that it will be any good at estimating future values.
The getBestForecast method is responsible for evaluating a variety of different forecasting models that may be appropriate to your DataSet, and selecting the one that, based on the DataSet provided, gives the "best" model. Several metrics - such as Mean Squared Error, Mean Absolute Percentage Error, and others - are calculated and a combination of these metrics is used to determine which model is "best".
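To make the idea of comparing candidate models concrete, here is a small self-contained sketch that scores two candidate models by Mean Squared Error alone and picks the one with the lower error. This is not the logic used by Forecaster, which combines several metrics; the BestModelSketch class, its method names, and the two candidate models are all assumptions made for illustration.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical illustration of metric-based model selection.
class BestModelSketch {
    // Mean squared error of a candidate model over the observed (x, y) pairs.
    static double meanSquaredError(DoubleUnaryOperator model, double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double error = model.applyAsDouble(x[i]) - y[i];
            sum += error * error;
        }
        return sum / x.length;
    }

    // Returns the index of the candidate with the lowest mean squared error.
    static int indexOfBest(DoubleUnaryOperator[] candidates, double[] x, double[] y) {
        int best = 0;
        double bestError = meanSquaredError(candidates[0], x, y);
        for (int i = 1; i < candidates.length; i++) {
            double error = meanSquaredError(candidates[i], x, y);
            if (error < bestError) {
                best = i;
                bestError = error;
            }
        }
        return best;
    }

    // Compares two candidates against the quarterly sales example.
    static int example() {
        double[] quarter = { 1, 2, 3 };
        double[] sales = { 500, 600, 700 };
        DoubleUnaryOperator[] candidates = {
            q -> 600.0,              // constant model: always predicts the mean
            q -> 400.0 + 100.0 * q   // linear trend through the three points
        };
        return indexOfBest(candidates, quarter, sales);
    }
}
```

On the three example observations the linear trend has zero error, so it is selected over the constant model. With so few data points this says little about future accuracy, which is the caveat the text raises.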
As an aside, if you are interested in how the "best" model is selected, you might want to look into the source code for the private, static betterThan method in Forecaster. [Ah! The beauty and freedom of open source! :-)] The betterThan method is the one that determines whether one forecasting model is "better than" another forecasting model.
You can help the Forecaster choose a "good" model for your data set by gathering a decent-sized data set to begin with. In the example code we gave earlier, we only created three observations. This was done just to illustrate the creation of DataPoint objects and their addition to a DataSet. In practice, you should try to use many more observations. Again, just how many are really necessary is somewhat subjective but, in general (though by no means in all cases), the more data points used to select and define a forecasting model, the better the accuracy of the model.
In particular, if you are dealing with seasonal data and hope to capture seasonality, a minimum of 3 years of data is recommended. If your data is quarterly, then that would be a total of 3 years * 4 periods per year = 12 observations. However, if you were using monthly data, then a total of 3 years * 12 periods per year = 36 observations would provide a better starting point.
The Forecaster class provides a good forecasting model for your data set in most cases. However, more advanced users - in particular those with some additional forecasting knowledge and experience - or those dealing with special cases may want to override this behavior and use a specific type of forecasting model.
To use a specific forecasting model, the most difficult part is deciding exactly which model to use. Assuming you have done that, then create a new instance of that model by instantiating it directly using the appropriate class from the net.sourceforge.openforecast.models package. Once you have a ForecastingModel instance, you must then invoke the init method passing in your data set to initialize your forecasting model.
For example, if you have a ForecastingModel instance referenced by the variable model, and your data set is the observations object created earlier, then you could call model.init(observations).
The init method uses the data set passed to it to initialize various properties of the underlying model. For example, if one of the regression models is used, then the DataSet passed to init will be used to initialize the coefficients of regression. Alternatively, if a moving average model is used, then properties of a moving average model will be initialized from the DataSet.
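To illustrate what initializing the coefficients of regression can involve, the sketch below fits a straight line to the data by ordinary least squares inside an init method. It is a self-contained illustration only, not the implementation used by OpenForecast's regression models; the SimpleLinearModel class and its array-based method signatures are assumptions.

```java
// Hypothetical single-variable linear model: y = intercept + slope * x.
class SimpleLinearModel {
    private double intercept;
    private double slope;

    // Initializes the coefficients from the observed (x, y) pairs
    // using ordinary least squares.
    void init(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0.0; // sum of (x - meanX) * (y - meanY)
        double sxx = 0.0; // sum of (x - meanX)^2
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        slope = sxy / sxx;
        intercept = meanY - slope * meanX;
    }

    double getIntercept() {
        return intercept;
    }

    double getSlope() {
        return slope;
    }

    // Forecasts the dependent value for a given independent value.
    double forecast(double x) {
        return intercept + slope * x;
    }
}
```

For the quarterly sales example (500, 600 and 700 for quarters 1, 2 and 3), this yields a slope of 100 and an intercept of 400.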
Once the model has been initialized, it is then ready for use to forecast further values. Details of how to do this are described in the next section.
In order to generate a forecast, you must create a data set containing all the data points for which you require a forecast. You then pass this data set to the ForecastingModel's forecast method to obtain the required forecasts. This section illustrates these steps by building on the example in the section called Using the Observation class.
Note that it is also possible to generate forecast values for individual DataPoints. However, the preferred approach is to create a data set and use that, as described in the following sections.
To obtain a forecast for other data points, you first need to decide - or otherwise determine - what data points you want to produce a forecast for. Using the quarterly sales example from the section called Using the Observation class, say that we want to produce quarterly sales forecasts for quarters 4 and 5 (the first quarter in the following year). Then we'd need to define two data points, as before, but referring to quarters 4 and 5.
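A minimal sketch of these two data points is shown below. The Observation class is again a simplified stand-in so that the example compiles without the library, and the ForecastPoints helper class and the variable names fcDataPoint4 and fcDataPoint5 are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for OpenForecast's Observation (illustration only).
class Observation {
    private final double dependentValue;
    private final Map<String, Double> independentValues = new HashMap<>();

    Observation(double dependentValue) {
        this.dependentValue = dependentValue;
    }

    void setIndependentValue(String name, double value) {
        independentValues.put(name, value);
    }

    double getIndependentValue(String name) {
        return independentValues.get(name);
    }

    double getDependentValue() {
        return dependentValue;
    }
}

// Hypothetical helper class, used here only to hold the example.
class ForecastPoints {
    // Data points for the quarters we want forecasts for. The dependent
    // value is a placeholder (0.0) at this stage.
    static Observation[] build() {
        Observation fcDataPoint4 = new Observation(0.0);
        fcDataPoint4.setIndependentValue("quarter", 4);

        Observation fcDataPoint5 = new Observation(0.0);
        fcDataPoint5.setIndependentValue("quarter", 5);

        return new Observation[] { fcDataPoint4, fcDataPoint5 };
    }
}
```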
Note that in the above code, we initialize the dependent value for each DataPoint to 0.0. It really doesn't matter what value is used here because the dependent value - the value that we intend to forecast - will be updated when we call forecast.
Once we have created the required DataPoint objects, we gather them together in a data set, fcDataSet.
Once we have defined a data set containing the data points for which we require forecasts of the dependent values, all that is necessary is to pass this to the model's forecast method, as follows:
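The sketch below shows the shape of this step: gather the data points into fcDataSet, then pass it to the model's forecast method, which fills in the dependent values. Everything other than the names fcDataSet, model and forecast is an assumption made so the example runs on its own: the Observation stand-in, the ArrayList used in place of DataSet, and the LinearSalesModel class with its fixed trend of 400 + 100 * quarter, which merely fits the three example observations.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for OpenForecast's Observation (illustration only).
class Observation {
    private double dependentValue;
    private final Map<String, Double> independentValues = new HashMap<>();

    Observation(double dependentValue) {
        this.dependentValue = dependentValue;
    }

    void setIndependentValue(String name, double value) {
        independentValues.put(name, value);
    }

    double getIndependentValue(String name) {
        return independentValues.get(name);
    }

    double getDependentValue() {
        return dependentValue;
    }

    void setDependentValue(double value) {
        dependentValue = value;
    }
}

// Hypothetical stand-in for an already initialized forecasting model.
class LinearSalesModel {
    // Updates each data point's dependent value in place and returns the
    // same collection that was passed in.
    Collection<Observation> forecast(Collection<Observation> dataSet) {
        for (Observation dp : dataSet) {
            double quarter = dp.getIndependentValue("quarter");
            dp.setDependentValue(400.0 + 100.0 * quarter);
        }
        return dataSet;
    }
}

// Hypothetical helper class, used here only to hold the example.
class ForecastCall {
    static Collection<Observation> run() {
        Observation fcDataPoint4 = new Observation(0.0);
        fcDataPoint4.setIndependentValue("quarter", 4);
        Observation fcDataPoint5 = new Observation(0.0);
        fcDataPoint5.setIndependentValue("quarter", 5);

        // Stand-in for a DataSet holding the points to forecast.
        Collection<Observation> fcDataSet = new ArrayList<>();
        fcDataSet.add(fcDataPoint4);
        fcDataSet.add(fcDataPoint5);

        LinearSalesModel model = new LinearSalesModel();
        model.forecast(fcDataSet); // fcDataSet is updated in place
        return fcDataSet;
    }
}
```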
For convenience, the forecast method returns the DataSet passed in, which will have been updated with the new forecast values. However, since it is the same object as the one passed in, it is not uncommon to ignore the return value.
You should now have a complete set of DataPoint objects containing the forecast values as their dependent values. To access these forecast values, you'll generally want to iterate through the data set, using something like the following code:
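A minimal sketch of such a loop follows. It iterates over the data set, reads each point's dependent value, and formats it as a line of text. The Observation stand-in, the ForecastReport helper class, and the forecast values of 800 and 900 used in example are assumptions made for illustration; they are not output from OpenForecast.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Simplified stand-in for OpenForecast's Observation (illustration only).
class Observation {
    private final double dependentValue;
    private final Map<String, Double> independentValues = new HashMap<>();

    Observation(double dependentValue) {
        this.dependentValue = dependentValue;
    }

    void setIndependentValue(String name, double value) {
        independentValues.put(name, value);
    }

    double getIndependentValue(String name) {
        return independentValues.get(name);
    }

    double getDependentValue() {
        return dependentValue;
    }
}

// Hypothetical helper class, used here only to hold the example.
class ForecastReport {
    // Walks the data set and formats one line per forecast data point.
    static List<String> describe(Collection<Observation> fcDataSet) {
        List<String> lines = new ArrayList<>();
        Iterator<Observation> it = fcDataSet.iterator();
        while (it.hasNext()) {
            Observation dp = it.next();
            double quarter = dp.getIndependentValue("quarter");
            double forecastValue = dp.getDependentValue();
            lines.add(String.format(Locale.ROOT, "quarter %.0f: %.1f", quarter, forecastValue));
        }
        return lines;
    }

    // Uses illustrative values in place of real forecasts.
    static List<String> example() {
        Observation dp4 = new Observation(800.0);
        dp4.setIndependentValue("quarter", 4);
        Observation dp5 = new Observation(900.0);
        dp5.setIndependentValue("quarter", 5);

        List<Observation> fcDataSet = new ArrayList<>();
        fcDataSet.add(dp4);
        fcDataSet.add(dp5);
        return describe(fcDataSet);
    }
}
```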
Once you have your forecast values, you'll probably want to output them in some form, whether you choose to output them to a CSV file, XML file, database table, or display them graphically on a chart - using something like JFreeChart. See the examples in the section called ForecastingChartDemo.java, as well as the helper "outputter" classes in the net.sourceforge.openforecast.output package.