Analyzing Time Series Data with Python: Part 2 - Advanced Forecasting Techniques

2023-10-29 20:58:46

Abstract

In Part 2 of the sales data authentication series, we delve into three advanced forecasting techniques: Holt-Winters, SARIMA, and Prophet.

1. Refresher and Setup

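In the first part, we discussed how to check for stationarity using the Augmented Dickey-Fuller (ADF) test. Many classical forecasting models assume a stationary series, or difference the data to obtain one, so it is worth confirming before modeling. Here's how you can quickly check for stationarity:

from statsmodels.tsa.stattools import adfuller

# Null hypothesis: the series has a unit root (is non-stationary)
result = adfuller(df_summary['Sales'])
print(f"ADF statistic: {result[0]:.4f}")
print(f"p-value: {result[1]:.4f}")

A p-value below 0.05 lets us reject the null hypothesis and treat the series as stationary. Once that is confirmed, we can proceed to advanced forecasting methods.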
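To make the decision rule explicit, here is a minimal sketch of a helper that unpacks the tuple adfuller() returns with its default settings. The helper name interpret_adf, the 0.05 threshold, and the example numbers are assumptions for illustration, not values from the sales dataset.

```python
def interpret_adf(result, alpha=0.05):
    """Summarize the tuple returned by statsmodels' adfuller().

    With the default autolag setting the tuple is
    (statistic, p-value, lags used, n observations, critical values, best IC).
    """
    statistic, p_value, used_lag, n_obs, critical_values = result[:5]
    return {
        "statistic": statistic,
        "p_value": p_value,
        "used_lag": used_lag,
        "n_obs": n_obs,
        "critical_values": critical_values,
        # Rejecting the unit-root null is evidence of stationarity
        "stationary": p_value < alpha,
    }


# Hypothetical numbers shaped like an adfuller() result, for illustration only
example = (-3.91, 0.002, 1, 46, {"1%": -3.58, "5%": -2.93, "10%": -2.60}, 120.5)
summary = interpret_adf(example)
print(summary["stationary"])
```

With the real data, the call would look like interpret_adf(adfuller(df_summary['Sales'])).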

2. Holt-Winters Method

Holt-Winters (triple exponential smoothing) is a good fit for datasets with a clear seasonal pattern and a trend, either increasing or decreasing. The seasonal component can be additive, when the seasonal swings stay roughly constant in size, or multiplicative, when they grow or shrink with the level of the series. Here’s how you can implement the additive form:

from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Preprocessing data and splitting into training and testing sets
# ... [your code for preprocessing and splitting] ...

# Initialize and fit the model
model = ExponentialSmoothing(train, seasonal='add', seasonal_periods=12, trend='add')
model_fit = model.fit()

# Forecasting
forecast = model_fit.forecast(steps=len(test))

We then evaluate the forecast against the test set using metrics like MAE and RMSE and plot it alongside the actual values.
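To show what the additive model does under the hood, the sketch below implements the standard additive Holt-Winters recurrences in plain Python. It is not how statsmodels is implemented: the library estimates the smoothing parameters and initial states by optimization, whereas this version takes fixed alpha, beta, and gamma values and uses a naive initialization from the first two seasons. The function name and the toy series are assumptions.

```python
def holt_winters_additive(y, m, alpha, beta, gamma, horizon):
    """Additive Holt-Winters with fixed smoothing parameters (no fitting)."""
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of data")

    # Naive initial states taken from the first two seasons
    first = sum(y[:m]) / m
    second = sum(y[m:2 * m]) / m
    level = first
    trend = (second - first) / m
    season = [y[i] - first for i in range(m)]  # season[t] is the effect at time t

    for t in range(m, len(y)):
        s_prev = season[t - m]
        prev_level, prev_trend = level, trend
        # Level: deseasonalized observation blended with the previous projection
        level = alpha * (y[t] - s_prev) + (1 - alpha) * (prev_level + prev_trend)
        # Trend: change in level blended with the previous trend
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        # Seasonal effect: detrended observation blended with last season's effect
        season.append(gamma * (y[t] - prev_level - prev_trend) + (1 - gamma) * s_prev)

    n = len(y)
    # Forecast = last level + h * trend + most recent matching seasonal effect
    return [level + h * trend + season[n - m + (h - 1) % m]
            for h in range(1, horizon + 1)]


# Toy series with a repeating period-3 pattern and no trend
toy = [10, 20, 30] * 3
print(holt_winters_additive(toy, m=3, alpha=0.5, beta=0.1, gamma=0.5, horizon=4))
```

On this trend-free toy series the forecast simply repeats the seasonal pattern. The multiplicative form replaces the seasonal additions and subtractions with multiplications and divisions.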

3. SARIMA Model

Seasonal AutoRegressive Integrated Moving Average (SARIMA) is a powerful model for more complex time series data. It extends the ARIMA model by adding seasonality components. Here's a snippet for SARIMA implementation:

import statsmodels.api as sm

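# Defining the model with identified order and seasonal order
# order=(p, d, q); seasonal_order=(P, D, Q, s), with s=12 for monthly data
sarima_model = sm.tsa.SARIMAX(df_monthly, order=(0, 1, 1), seasonal_order=(1, 0, 1, 12))
sarima_model_fit = sarima_model.fit()

# Forecasting 13 steps ahead
forecast_extended = sarima_model_fit.get_forecast(steps=13)
forecast_mean = forecast_extended.predicted_mean  # point forecasts
forecast_ci = forecast_extended.conf_int()  # 95% intervals by default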

Because it models the trend (through differencing) and the seasonal pattern explicitly, SARIMA is well suited to data that have both trends and seasonality.
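The "Integrated" part of SARIMA refers to differencing. In the snippet above, order=(0, 1, 1) sets d=1, so the model works on first differences, while seasonal_order=(1, 0, 1, 12) sets D=0, so no seasonal differencing is applied. The sketch below shows what both kinds of differencing do to a series; the helper name difference and the toy numbers are assumptions for illustration.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must be between 1 and len(series) - 1")
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# First differencing (d=1) removes a linear trend
trending = [5, 8, 11, 14, 17]
print(difference(trending))  # → [3, 3, 3, 3]

# Seasonal differencing (D=1 with period 4) removes a repeating pattern
seasonal = [10, 20, 30, 40, 12, 22, 32, 42]
print(difference(seasonal, lag=4))  # → [2, 2, 2, 2]
```

With pandas, the same operation is available as Series.diff(periods=lag), which keeps the leading positions as NaN instead of dropping them.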

4. Prophet by Facebook

Prophet, developed by Facebook, is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet expects a DataFrame with two columns: ds (the date) and y (the value to forecast).

from prophet import Prophet

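# Preparing data for Prophet: monthly totals with columns 'ds' and 'y'
# reset_index() turns the 'Order Date' index back into a column so it can be renamed
monthly_data = df_copy.resample('M', on='Order Date')['Sales'].sum().reset_index()
monthly_data = monthly_data.rename(columns={'Order Date': 'ds', 'Sales': 'y'})

# Fitting the model
model = Prophet()
model.fit(monthly_data)

# Predicting 12 months ahead; the frequency matches the monthly training data
# (use 'ME' instead of 'M' on pandas 2.2+)
future = model.make_future_dataframe(periods=12, freq='M')
forecast = model.predict(future)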

Prophet is particularly good at handling outliers, missing data, and changing trends, making it robust for real-life data.

5. Evaluation and Comparison

We evaluated all models using R-squared, MAE, and RMSE. These metrics help us compare their performance and understand their strengths and weaknesses in different scenarios.

R-squared: The proportion of the variance in the actual values that the forecast explains; values closer to 1 are better.
MAE (Mean Absolute Error): Shows the average size of errors in a set of predictions, without considering their direction.
RMSE (Root Mean Square Error): Similar to MAE but gives a relatively high weight to large errors, because the errors are squared before they are averaged.
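The three metrics can be computed directly from their definitions. The sketch below uses plain Python and made-up numbers, not results from the models above; in practice, sklearn.metrics provides mean_absolute_error, mean_squared_error, and r2_score for the same purpose.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average size of the errors, ignoring sign."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error: squaring penalizes large errors more heavily."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up values for illustration only
actual = [3, 5, 7, 9]
predicted = [2, 5, 8, 11]
print(f"MAE:  {mae(actual, predicted):.3f}")
print(f"RMSE: {rmse(actual, predicted):.3f}")
print(f"R^2:  {r_squared(actual, predicted):.3f}")
```

Here the single error of size 2 pulls RMSE (about 1.225) above MAE (1.0), which is the extra weight on large errors described above.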

6. Conclusion and Best Practices

Through this journey, we’ve seen that there's no one-size-fits-all in time series forecasting. The choice of model depends on the dataset's characteristics. Remember:

Start with simple models and move to complex ones as needed.
Always visualize your data and model results to gain insights.
Regularly evaluate your model’s performance.

Copyright: please credit the source when reposting

Post Link: https://digitaldwellings.tech/blog/article/31/
