stevengould.org


Writing an Application using OpenForecast

In the Quick Start section, we described the general steps involved in using OpenForecast, as follows:

  • Create a DataSet object.

  • Add to the data set object a series of DataPoint objects that define a series of observations.

  • Using the static getBestForecast method of Forecaster, obtain a reference to the most appropriate forecast model for your data set.

  • Use the forecast method of this ForecastingModel to forecast additional values.

In the following sections, we'll describe each of these steps in a little more detail.

Create a DataSet object

A DataSet is simply a collection of DataPoint objects. In many respects, you can think of it as just another Java 2 Collection. In fact, the DataSet class does implement the java.util.Collection interface.

You can create a new DataSet object directly, as follows:

     DataSet dataSet = new DataSet();

No surprises here. This creates a new, empty DataSet object named dataSet. We will refer to this data set in the next section, when we start to build our list of observations, or data points.

Add DataPoints to the DataSet

Once you have a data set, you can begin to add DataPoint objects to it, using the DataSet add method.

There are two main ways of defining DataPoint objects. If you already have your observations/data points defined in some Java class, you could extend or modify that class to implement the DataPoint interface. Alternatively, OpenForecast provides an implementation of the DataPoint interface called Observation. The Observation class is the most convenient choice if you don't currently have a class of your own that holds your observation data.
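As a rough illustration of the first option, the sketch below adapts a hypothetical domain class, QuarterlySales, to a data-point style interface. Because the sketch is meant to compile on its own, it declares a simplified stand-in interface, DataPointLike, rather than importing the real DataPoint interface. The method set shown is an assumption, so check the OpenForecast Javadoc for the methods that DataPoint actually requires.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the library's DataPoint interface.
// ASSUMPTION: the real interface may declare more (or different) methods.
interface DataPointLike {
    double getDependentValue();
    void setDependentValue(double value);
    double getIndependentValue(String name);
    void setIndependentValue(String name, double value);
}

// Hypothetical existing domain class, adapted to act as a data point.
class QuarterlySales implements DataPointLike {
    private double salesThousands;  // the dependent value
    private final Map<String, Double> independentValues = new HashMap<>();

    QuarterlySales(int quarter, double salesThousands) {
        this.salesThousands = salesThousands;
        independentValues.put("quarter", (double) quarter);
    }

    public double getDependentValue() {
        return salesThousands;
    }

    public void setDependentValue(double value) {
        salesThousands = value;
    }

    public double getIndependentValue(String name) {
        Double value = independentValues.get(name);
        if (value == null) {
            throw new IllegalArgumentException("Unknown variable: " + name);
        }
        return value;
    }

    public void setIndependentValue(String name, double value) {
        independentValues.put(name, value);
    }
}
```

With the real library, the class would implement DataPoint directly, and instances could then be added to a DataSet in the same way as Observation objects.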

Using the Observation class

The Observation class provides a complete implementation of the DataPoint interface. In other words, an Observation is a DataPoint.

Consider the quarterly sales of a company product to be $500 (thousand) for period 1, $600 (thousand) for period 2, and $700 (thousand) for period 3. To create three Observation objects representing these observations, you could use the code listed below.

Note that the Observation constructor takes a single value. This value is the dependent value - the value we observe initially, but later want to forecast. After creating a new Observation object, we invoke the setIndependentValue method once for each independent variable associated with the observation. In this case, we have just one independent variable, "quarter". Therefore, for each observation we must set the independent variable, quarter, to the appropriate value, as shown in the sample code.

     // Create Observation for quarter 1
     Observation observationQ1 = new Observation(500.0);
     observationQ1.setIndependentValue("quarter",1);

     // Create Observation for quarter 2
     Observation observationQ2 = new Observation(600.0);
     observationQ2.setIndependentValue("quarter",2);

     // Create Observation for quarter 3
     Observation observationQ3 = new Observation(700.0);
     observationQ3.setIndependentValue("quarter",3);

Note that the independent variable name, quarter, must be spelled exactly the same in every observation if the observations really refer to the same independent variable.
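To see why the names must match exactly, here is a minimal sketch that uses a plain java.util.Map as a stand-in for an observation's independent values. This is an illustration only, and is not a claim about how Observation stores its values internally. String keys are compared character by character, so "quarter" and "Quarter" count as two unrelated variables.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only; VariableNameSketch is not part of OpenForecast.
class VariableNameSketch {
    // Returns true if both observations define exactly the same variable names.
    static boolean sameVariables(Map<String, Double> a, Map<String, Double> b) {
        return a.keySet().equals(b.keySet());
    }

    public static void main(String[] args) {
        Map<String, Double> q1 = new HashMap<>();
        q1.put("quarter", 1.0);

        Map<String, Double> q2 = new HashMap<>();
        q2.put("Quarter", 2.0);  // capital Q: a different name

        System.out.println(sameVariables(q1, q2));  // prints "false"
    }
}
```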

The above listing shows how to create and initialize three Observation objects to describe our observations. Next, we must add these to our DataSet. In the section called Create a DataSet object, we defined a DataSet object, dataSet. The following code adds the Observations just defined to this data set.

     // Add Observations to the DataSet
     dataSet.add( observationQ1 );
     dataSet.add( observationQ2 );
     dataSet.add( observationQ3 );

A more advanced example

In the previous example, we used the following scenario:

Consider the quarterly sales of a company product to be $500 (thousand) for period 1, $600 (thousand) for period 2, and $700 (thousand) for period 3.

If, in addition, we expect that the average daytime high temperature for the quarter has an influence on sales, then we can add the value of this independent variable to each observation. For example, if we knew the average daytime high temperatures for quarters 1, 2 and 3 were 45°F, 63°F and 97°F respectively, then we could define the three observations as follows:

     // Create Observation for quarter 1
     Observation observationQ1 = new Observation(500.0);
     observationQ1.setIndependentValue("quarter",1);
     observationQ1.setIndependentValue("avgTemp",45);

     // Create Observation for quarter 2
     Observation observationQ2 = new Observation(600.0);
     observationQ2.setIndependentValue("quarter",2);
     observationQ2.setIndependentValue("avgTemp",63);

     // Create Observation for quarter 3
     Observation observationQ3 = new Observation(700.0);
     observationQ3.setIndependentValue("quarter",3);
     observationQ3.setIndependentValue("avgTemp",97);

Once again, note that the independent variable names, quarter and avgTemp, must be spelled exactly the same in every observation that refers to the same independent variables.

To add these observations to our DataSet, we use the add method, as before.

In the next section, we look at how to use a set of observations - or DataPoint objects - to select a ForecastingModel to use for forecasting other values.

Obtain a ForecastingModel

Now that we have a set of DataPoint objects defining our observations, we need to obtain a ForecastingModel. Two primary approaches are available here. The first approach requires little knowledge of the different forecasting models available and, in general, is the preferred approach.

The second approach to obtaining a ForecastingModel is to decide which model to use, and instantiate it directly. This lets you select a specific model. The trade-off, however, is that you may not get the model that best fits your data.

These two approaches are described in the section called Obtaining a "good" forecasting model and the section called Using a specific forecasting model.

Obtaining a "good" forecasting model

The Forecaster class is a factory class; that is, a class that can be used to "produce" instances of other classes.

The Forecaster class contains one method of particular interest: the static getBestForecast method. By calling this method and passing it the DataSet created earlier, you can obtain an instance of a ForecastingModel that is the "best" - most appropriate - forecast model for your data set.

The concept of what is a "good" forecasting model and, even more so, what is the "best" forecasting model is always somewhat subjective. Just as any investment plan carries a disclaimer of the form, "past performance is no guarantee of future performance", the same idea applies to forecasting models. Just because a forecasting model performs well against past observations does not mean that it will be any good at estimating future values.

The getBestForecast method is responsible for evaluating a variety of different forecasting models that may be appropriate to your DataSet, and selecting the one that best fits the DataSet provided. Several metrics - such as Mean Squared Error, Mean Absolute Percentage Error, and others - are calculated, and a combination of these metrics is used to determine which model is "best".
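For readers unfamiliar with these metrics, the following sketch shows how two of them are conventionally computed from a list of actual values and a model's predictions for the same points. The class and method names are hypothetical, and the code is not taken from OpenForecast; it uses only the standard textbook definitions. The predicted values in main are made up purely for illustration.

```java
// Illustration only; AccuracySketch is a hypothetical helper, not OpenForecast code.
class AccuracySketch {
    // Mean Squared Error: the average of the squared prediction errors.
    static double meanSquaredError(double[] actual, double[] predicted) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double error = actual[i] - predicted[i];
            sum += error * error;
        }
        return sum / actual.length;
    }

    // Mean Absolute Percentage Error, returned as a fraction (0.05 means 5%).
    // Assumes that no actual value is zero.
    static double meanAbsolutePercentageError(double[] actual, double[] predicted) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs((actual[i] - predicted[i]) / actual[i]);
        }
        return sum / actual.length;
    }

    public static void main(String[] args) {
        double[] actual    = {500.0, 600.0, 700.0};  // observed sales
        double[] predicted = {510.0, 590.0, 700.0};  // made-up model output
        System.out.println("MSE:  " + meanSquaredError(actual, predicted));
        System.out.println("MAPE: " + meanAbsolutePercentageError(actual, predicted));
    }
}
```

For both metrics, a lower value indicates a closer fit to the observations.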

As an aside, if you are interested in how the "best" model is selected, you might want to look into the source code for the private, static betterThan method in Forecaster. [Ah! The beauty and freedom of open source! :-)] The betterThan method is the one that determines whether one forecasting model is "better than" another forecasting model.
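To give a feel for what "a combination of metrics" can mean, here is one plausible comparison rule: count the metrics on which each model scores lower, and prefer the model that wins on more of them. This is a hypothetical sketch and is not the logic used by Forecaster; the source code remains the authority on how OpenForecast actually decides.

```java
// Hypothetical comparison rule for illustration; NOT Forecaster's actual logic.
class ModelComparisonSketch {
    // Each array holds error metrics in the same order, e.g. {MSE, MAPE, MAD}.
    // Lower is assumed to be better for every entry.
    static boolean betterThan(double[] candidate, double[] incumbent) {
        if (candidate.length != incumbent.length) {
            throw new IllegalArgumentException("Metric lists must be the same length");
        }
        int wins = 0;
        int losses = 0;
        for (int i = 0; i < candidate.length; i++) {
            if (candidate[i] < incumbent[i]) {
                wins++;
            } else if (candidate[i] > incumbent[i]) {
                losses++;
            }
        }
        // The candidate is "better" only if it wins on more metrics than it loses.
        return wins > losses;
    }
}
```

Under this rule, a model with metrics {60.0, 0.012, 8.0} would be preferred over one with {70.0, 0.010, 9.0}, because it is lower on two of the three metrics.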

You can help the Forecaster choose a "good" model for your data set by gathering a decent-sized data set to begin with. In the example code we gave earlier, we only created three observations. This was done just to illustrate the creation of DataPoint objects and their addition to a DataSet. In practice, you should try to use many more observations. Again, just how many are really necessary is somewhat subjective but, in general (though by no means in all cases), the more data points used to select and define a forecasting model, the better the accuracy of the model.

In particular, if you are dealing with seasonal data and hope to capture seasonality, a minimum of 3 years of data is recommended. If your data is quarterly, that is a total of 3 years * 4 periods per year = 12 observations. However, if you were using monthly data, the same 3 years would give 3 years * 12 periods per year = 36 observations, which would provide a better starting point.

Using a specific forecasting model

The Forecaster class provides a good forecasting model for your data set in most cases. However, more advanced users - in particular those with additional forecasting knowledge and experience - may want to override this behavior and use a specific type of forecasting model, as may anyone dealing with certain special cases.

The most difficult part of using a specific forecasting model is deciding exactly which model to use. Assuming you have done that, create a new instance of that model by instantiating the appropriate class from the net.sourceforge.openforecast.models package directly. Once you have a ForecastingModel instance, you must then invoke its init method, passing in your data set, to initialize the model.

For example, if you have a ForecastingModel instance referenced by the variable model, then you could use the following:

     // ForecastingModel model
     model.init(dataSet);

The init method uses the data set passed to it to initialize various properties of the underlying model. For example, if one of the regression models is used, then the DataSet passed to init will be used to initialize the coefficients of regression. Alternatively, if a moving average model is used, then properties of a moving average model will be initialized from the DataSet.
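To make the idea of "initializing the coefficients of regression" concrete, the sketch below fits a straight line to the three sample observations using ordinary least squares. It is a self-contained illustration of the kind of calculation that a simple regression model performs when it is initialized. The class and method names are hypothetical, and the code is not OpenForecast's implementation.

```java
// Illustration only; LinearFitSketch is hypothetical, not OpenForecast code.
class LinearFitSketch {
    // Returns {intercept, slope} of the least-squares line y = intercept + slope * x.
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double sumX = 0.0, sumY = 0.0, sumXY = 0.0, sumXX = 0.0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
            sumXX += x[i] * x[i];
        }
        double slope = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
        double intercept = (sumY - slope * sumX) / n;
        return new double[] {intercept, slope};
    }

    // Evaluates the fitted line at x.
    static double predict(double[] coefficients, double x) {
        return coefficients[0] + coefficients[1] * x;
    }

    public static void main(String[] args) {
        double[] quarter = {1.0, 2.0, 3.0};
        double[] sales   = {500.0, 600.0, 700.0};  // in thousands of dollars

        double[] coefficients = fit(quarter, sales);  // intercept 400.0, slope 100.0
        System.out.println(predict(coefficients, 4.0));  // prints "800.0"
        System.out.println(predict(coefficients, 5.0));  // prints "900.0"
    }
}
```

Because the three sample observations lie exactly on a straight line, the fitted line passes through all of them. Real data will rarely be this tidy.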

Once the model has been initialized, it is then ready for use to forecast further values. Details of how to do this are described in the next section.

Generate forecasts

In order to generate a forecast, you must create a data set containing all the data points for which you require a forecast. You then pass this data set to the ForecastingModel's forecast method to obtain the required forecasts. This section illustrates these steps by building on the example in the section called Using the Observation class.

Note that it is also possible to generate forecast values for individual DataPoints. However, the preferred approach is to create a data set and use that, as described in the following sections.

Defining the forecast data set

To obtain a forecast for other data points, you first need to decide - or otherwise determine - which data points you want to produce a forecast for. Using the quarterly sales example from the section called Using the Observation class, say that we want to produce quarterly sales forecasts for quarters 4 and 5 (quarter 5 being the first quarter of the following year). We then need to define two data points, as before, but referring to quarters 4 and 5.

     // Create DataPoint/Observation for quarter 4
     DataPoint fcDataPointQ4 = new Observation(0.0);
     fcDataPointQ4.setIndependentValue("quarter",4);

     // Create Observation/DataPoint for quarter 5
     DataPoint fcDataPointQ5 = new Observation(0.0);
     fcDataPointQ5.setIndependentValue("quarter",5);

     // Create forecast data set and add these DataPoints
     DataSet fcDataSet = new DataSet();
     fcDataSet.add( fcDataPointQ4 );
     fcDataSet.add( fcDataPointQ5 );

Note that in the above code, we initialize the dependent value for each DataPoint to 0.0. It really doesn't matter what value is used here because the dependent value - the value that we intend to forecast - will be updated when we call forecast.

Once we have created the required DataPoint objects, we gather them together in a data set, fcDataSet.

Obtaining Forecast values

Once we have defined a data set containing the data points for which we require forecasts of the dependent values, all that is necessary is to pass this to the model's forecast method, as follows:

     // Assume forecasting model already created and initialized
     model.forecast( fcDataSet );

For convenience, the forecast method returns the DataSet that was passed in, updated with the new forecast values. Since the returned object is the same one that was passed in, it is common to ignore the return value.

You should now have a complete set of DataPoint objects containing the forecast values as their dependent values. To access these forecast values, you'll generally want to iterate through the data set, using something like the following code:

     // After calling model.forecast, our fcDataSet now contains
     //  forecast data points (requires: import java.util.Iterator;)
     Iterator it = fcDataSet.iterator();
     while ( it.hasNext() )
        {
        DataPoint dp = (DataPoint)it.next();
        double forecastValue = dp.getDependentValue();

        // Do something with the forecast value, e.g. print it
        System.out.println( forecastValue );
        }

Once you have your forecast values, you'll probably want to output them in some form, whether you choose to output them to a CSV file, XML file, database table, or display them graphically on a chart - using something like JFreeChart. See the examples in the section called ForecastingChartDemo.java, as well as the helper "outputter" classes in the net.sourceforge.openforecast.output package.
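If you just need something quick and would rather not use the outputter classes, writing CSV by hand is straightforward. The sketch below formats quarter/forecast pairs as CSV text using only the standard library. The class name, the column headings, and the sample values (taken from a straight line through the three example observations) are all assumptions made for illustration; with OpenForecast you would read each pair from the DataPoint objects in fcDataSet.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only; CsvSketch is hypothetical and not part of OpenForecast.
class CsvSketch {
    // Formats quarter -> forecast pairs as CSV text with a header row.
    static String toCsv(Map<Integer, Double> forecasts) {
        StringBuilder csv = new StringBuilder("quarter,forecast\n");
        for (Map.Entry<Integer, Double> entry : forecasts.entrySet()) {
            csv.append(entry.getKey())
               .append(',')
               .append(entry.getValue())
               .append('\n');
        }
        return csv.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the rows in insertion order.
        Map<Integer, Double> forecasts = new LinkedHashMap<>();
        forecasts.put(4, 800.0);  // placeholder values, not library output
        forecasts.put(5, 900.0);
        System.out.print(toCsv(forecasts));
    }
}
```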