1. Refresher and Setup
In the first part, we discussed how to check for stationarity using the Augmented Dickey-Fuller test. Remember, it's crucial to have a stationary time series for accurate modeling. Here's how you can quickly check for stationarity:
from statsmodels.tsa.stattools import adfuller
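result = adfuller(df_summary['Sales'])
print(f"ADF statistic: {result[0]:.4f}")
print(f"p-value: {result[1]:.4f}")  # p < 0.05 suggests the series is stationary
Once stationarity is confirmed (or the series has been differenced to achieve it), we can proceed to advanced forecasting methods.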
2. Holt-Winters Method
Holt-Winters (triple exponential smoothing) is a good fit for datasets with a clear seasonal pattern and a trend, whether increasing or decreasing. The model can be additive or multiplicative: use additive when the seasonal swings stay roughly the same size over time, and multiplicative when they grow or shrink with the level of the series. Here’s how you can implement it:
from statsmodels.tsa.holtwinters import ExponentialSmoothing
# Preprocessing data and splitting into training and testing sets
# ... [your code for preprocessing and splitting] ...
# Initialize and fit the model
model = ExponentialSmoothing(train, seasonal='add', seasonal_periods=12, trend='add')
model_fit = model.fit()
# Forecasting
forecast = model_fit.forecast(steps=len(test))
We then evaluated the forecast against the test set using MAE and RMSE and plotted it alongside the actual values.
3. SARIMA Model
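Seasonal AutoRegressive Integrated Moving Average (SARIMA) extends ARIMA with a second set of autoregressive, differencing, and moving-average terms that operate at the seasonal lag (for example, 12 for monthly data with a yearly cycle). This makes it suitable for series whose seasonal pattern a plain ARIMA cannot capture. Here's a snippet for SARIMA implementation: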
import statsmodels.api as sm
# Defining the model with identified order (p, d, q) and seasonal order (P, D, Q, m);
# m=12 is the seasonal period for monthly data
sarima_model = sm.tsa.SARIMAX(df_monthly, order=(0, 1, 1), seasonal_order=(1, 0, 1, 12))
sarima_model_fit = sarima_model.fit()
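# Forecasting: get_forecast returns a results object, not the values themselves
forecast_extended = sarima_model_fit.get_forecast(steps=13)
forecast_mean = forecast_extended.predicted_mean  # point forecasts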
Because SARIMA handles the trend through differencing and the seasonal pattern through its seasonal terms, it is well suited to data that show both.
4. Prophet by Facebook
Prophet, developed by Facebook, is a forecasting procedure based on an additive model in which non-linear trends are fit together with yearly, weekly, and daily seasonality. It works best with time series that have strong seasonal effects and several seasons of historical data.
from prophet import Prophet
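# Preparing data for Prophet, which expects a date column 'ds' and a value column 'y'.
# resample() moves 'Order Date' into the index, so reset_index() turns it back
# into a column before renaming. 'M' is month-end ('ME' in pandas 2.2+).
monthly_data = df_copy.resample('M', on='Order Date')['Sales'].sum().reset_index()
monthly_data = monthly_data.rename(columns={'Order Date': 'ds', 'Sales': 'y'})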
# Fitting the model
model = Prophet()
model.fit(monthly_data)
# Predicting
# The model was fit on monthly data, so forecast 12 month-end steps rather than daily ones
future = model.make_future_dataframe(periods=12, freq='M')
forecast = model.predict(future)
Prophet is particularly good at handling outliers, missing data, and changing trends, making it robust for real-life data.
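Most Prophet errors come from the input frame not having the shape it expects: exactly one date column named `ds` and one numeric column named `y`. As a quick sanity check, here is a small pandas-only sketch on a hypothetical orders table (the rows are made up, and it uses month-start bins via `'MS'`) that shows what the prepared frame should look like before it is passed to `fit`.

```python
import pandas as pd

# Hypothetical order-level data; only the column names mirror the ones used above.
orders = pd.DataFrame({
    "Order Date": pd.to_datetime(
        ["2023-01-05", "2023-01-20", "2023-02-10", "2023-03-01", "2023-03-15"]
    ),
    "Sales": [100.0, 50.0, 75.0, 20.0, 30.0],
})

# Aggregate to one row per month, then restore the date as a regular column.
prophet_df = (
    orders.resample("MS", on="Order Date")["Sales"]
    .sum()
    .reset_index()
    .rename(columns={"Order Date": "ds", "Sales": "y"})
)

print(prophet_df)
```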
5. Evaluation and Comparison
We evaluated all models using R-squared, MAE, and RMSE. These metrics help us compare their performance and understand their strengths and weaknesses in different scenarios.
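- R-squared: The share of the variance in the actual values that the predictions explain. A value of 1 is a perfect fit, and it can go negative on a test set when the model does worse than simply predicting the mean.
- MAE (Mean Absolute Error): Shows the average size of errors in a set of predictions, without considering their direction. It is expressed in the same units as the data.
- RMSE (Root Mean Square Error): Similar to MAE, but because errors are squared before averaging, large errors carry more weight.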
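The three metrics are easy to compute by hand, which helps when you want to be sure every model is scored the same way. Below is a minimal NumPy sketch. The function name `regression_metrics` and the toy numbers are assumptions for illustration; scikit-learn offers equivalent functions if you prefer a library.

```python
import numpy as np

def regression_metrics(actual, predicted):
    """Return MAE, RMSE, and R-squared for two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = actual - predicted
    mae = np.mean(np.abs(errors))
    rmse = np.sqrt(np.mean(errors ** 2))
    ss_res = np.sum(errors ** 2)                     # unexplained variation
    ss_tot = np.sum((actual - actual.mean()) ** 2)   # total variation around the mean
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "RMSE": rmse, "R2": r2}

actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]
metrics = regression_metrics(actual, predicted)
print(metrics)  # MAE = 2.5, RMSE = sqrt(6.5) ≈ 2.55, R2 = 0.948
```

Here the errors are -2, 2, -3, and 3, so RMSE comes out slightly above MAE: the two larger misses count for more once they are squared.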
6. Conclusion and Best Practices
Through this journey, we’ve seen that there's no one-size-fits-all in time series forecasting. The choice of model depends on the dataset's characteristics. Remember:
- Start with simple models and move to complex ones as needed.
- Always visualize your data and model results to gain insights.
- Regularly evaluate your model’s performance.
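One way to put "start simple" into practice is a seasonal naive baseline: predict each future period with the value from the same period one season earlier. This technique is not part of the walkthrough above; it is offered as a sketch of a yardstick that Holt-Winters, SARIMA, and Prophet should beat. The function name and the toy history are hypothetical.

```python
import numpy as np

def seasonal_naive(train, horizon, season_length=12):
    """Repeat the last full season of `train` to cover `horizon` steps."""
    train = np.asarray(train, dtype=float)
    last_season = train[-season_length:]
    reps = int(np.ceil(horizon / season_length))
    return np.tile(last_season, reps)[:horizon]

history = np.arange(1, 25)  # 24 toy observations: 1, 2, ..., 24
baseline = seasonal_naive(history, horizon=14)
print(baseline)  # values 13..24, then 13 and 14 again
```

If a more complex model cannot improve on this baseline's MAE or RMSE on the test set, the extra complexity is probably not paying for itself.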