Blog Unveiling Year-Over-Year Trends: A Time Series Analysis
a

Unveiling Year-Over-Year Trends: A Time Series Analysis

Time series analysis is a powerful method for revealing hidden patterns in temporal data, such as trends, noise, and seasonality. While statistical models are often used for this, SQL functions can also uncover these insights. Knowi makes this process easy with rolling, cumulative, and moving average functions, allowing direct time series analysis within the platform.

In this article, we’ll explore the key components of time series data and apply them to a real-world case study on wildlife sightings, aiming to identify long-term population trends.

What is Time Series Analysis?

A time series is a sequence of data points indexed by time. The goal is to identify patterns and trends in the data and sometimes forecast future trends based on historical data.

Key Characteristics:

  1. Trend – Long-term increase or decrease.
  2. Seasonality – Repeating short-term patterns at regular intervals.
  3. Cyclic Patterns – Fluctuations not tied to fixed intervals, often due to external factors.
  4. Noise – Random variations considered errors or outliers.

Key Cloud9QL Functions for Time Series Analysis

Cloud9QL, Knowi’s version of SQL, streamlines data transformation and analysis with built-in functions tailored for tasks like trend analysis, data smoothing, and tracking changes over time. The table below summarizes these key functions

FunctionDescriptionBest ForExample Query
ACCUMULATECreates cumulative totals for a field between records, given a sorted dataset.Cumulative totals and tracking growth over time, smoothing data to highlight long-term trends.select accumulate(sent), date
GROWTHCalculates a growth percentage for a field between records, for a sorted dataset.Identifying growth or decline patterns between consecutive periods. Helpful for financial and sales metrics.select growth(sent), date
DELTACalculates the difference between data points over time for a given field.Analyzing changes between records to detect fluctuations or step changes in time series data.select delta(sent), date
SMASimple moving average based on a window size. Assumes sorted data.Smoothing out short-term fluctuations and highlighting longer-term trends and seasonality.select sma(sent, 10)
CMACumulative moving average that includes all data up to the current point.Long-term trend analysis that reduces short-term noise. Useful for identifying overall upward or downward trends.select cma(sent)
TMATime moving average calculated over a specified time window (e.g., weekly, monthly). Assumes sorted data.Detecting patterns in data with fixed time intervals, such as seasonality and recurring events.select tma(sent, date, 1w)
TMSTime moving sum calculated over a specific time unit window.Analyzing the total over a set period, such as weekly or monthly totals for sales or production.select tms(sent, date, 1w)

Case Study Example: Gatekeeper Butterfly Sightings

In this case study, we aim to uncover population trends for Pyronia tithonus, commonly known as the Gatekeeper butterfly, using time series analysis. Our primary goal is to explore long-term patterns in butterfly sightings, isolating seasonal behaviors and determining whether there is evidence of population decline. To achieve this, we will utilize SQL-based methods to smooth out seasonality and noise and detect subtle changes in the population.

The dataset we’ll analyze comes from the Regional GAM Species Counts on Kaggle, which tracks the abundance and distribution of various species across multiple sites. For those interested in following along with the analysis or conducting their own, the dataset can be downloaded from Kaggle’s Regional GAM Species Counts.

The data shows a clear seasonal pattern in butterfly sightings, with most sightings occurring between June and August. Over the span of 10 summers, it appears that either the total number of sightings is declining, or the peak periods are becoming less pronounced, with butterflies being spotted more steadily over an extended period rather than in sharp, concentrated bursts. 

Step 1: Handing Seasonality

To account for seasonality, we use the Time Moving Average (TMA), which calculates an average based on a time window of your choice. 

SELECT Date, TMA(Sightings, Date, ‘1y’) AS Sightings_TMA_1y

The TMA(Sightings, Date, ‘1y’) function calculates a time-based moving average over a one-year window. This helps us smooth out data over annual cycles, allowing us to compare sightings across years. Here we see an overall downward trend in sightings with some annual fluctuations.

Step 2: Identifying the Long-Term Trend

After removing seasonality, we can calculate the Cumulative Moving Average (CMA) to uncover the long-term trend in sightings. This function helps us understand whether sightings are increasing, stable, or declining over time:

SELECT Date, CMA(Sightings) AS Sightings_CMA

The CMA(Sightings) function calculates the cumulative moving average of sightings up to the current date. This smooths out short-term variations and provides a clear view of the overall trend in sightings, which is decreasing. 

Conclusion: Are Gatekeeper Butterfly Sightings Declining?

Our analysis suggests a decline in the Gatekeeper butterfly population when we look at the long term trends. Removing the seasonal functions make this abundantly clear. This approach is widely applicable, not just for wildlife data, but for any time series dataset—whether you’re analyzing sales data, environmental data, or any other kind of time-dependent information.

What is Knowi? Knowi is a business intelligence (BI) platform based in Oakland, California. Unlike other BI tools that focus primarily on visuals, Knowi takes a data-first approach, allowing it to seamlessly connect to any type of data, from any source. Knowi’s in-house version of SQL, Cloud9QL, comes with many powerful functions to streamline time series analysis.

Share This Post

Share on facebook
Share on linkedin
Share on twitter
Share on email
About the Author:

RELATED POSTS