Heighten SQL Abilities with SQL Problem/Solution Examples That Include a Time Component

Jmstipanowich
8 min readJan 4, 2022

SQL Exercises Involving Time Expand the Data Processing Possibilities within Databases and the Use Cases for SQL

Structured Query Language or SQL is “a standardized programming language that is used to manage relational databases and perform various operations on the data in them” (definition from https://searchdatamanagement.techtarget.com/definition/SQL). People implement SQL to perform a variety of different tasks on database data in order to organize data in databases or modify database structures. The purpose of utilizing SQL is to identify data relationships and comprehend data arrangements in databases to obtain insights on how to adapt a world of data to life and how data processes can be improved. There are many ways to manipulate data with SQL, but in this blog I will discuss a unique use case for SQL relating to the transformation of data with SQL when a time element is applied. SQL data can be adjusted to showcase only data in specified time frames. Also, time can determine the overall construction of SQL data.

To demonstrate application of time with SQL today, I am providing three SQL problems and solutions I completed on the courses section of the Kaggle website (https://www.kaggle.com/learn) that all contain time as a major component in the SQL queries. These exercises are offered to help data professionals practice including time in SQL creations and enhance understanding of how time can be manipulated with SQL. The problems and solutions are provided below.

Note: The problems and solutions are not my own creation, but were obtained from the “Intro to SQL” and “Advanced SQL” courses on the Kaggle website. However, explanations of the solutions to the provided exercises will be given, and those explanations were procured from my own experiences working through these problems on the Kaggle website.

Chicago Taxi Trips By Month Exercise

The project for this exercise is titled, “bigquery-public-data.” The dataset is titled, “chicago_taxi_trips.” The table used for the query is the taxi_trips table.

Problem:

You’d like to take a look at taxi rides from 2017. Write a query that counts the number of trips in each month. Use a WHERE clause to limit the query to data from 2017.

Your results should have two columns:

  • month- the month of the trips
  • num_trips- the number of trips in that month

Solution:

Query Results:

Explanation:

The trip_start_timestamp column in the taxi_trips data table exists as a date with each date presented in seconds, minutes, hours, days, months, and years. The “SELECT EXTRACT(MONTH FROM trip_start_timestamp) AS month” statement selects just the month from each date in the trip_start_timestamp column to be used in a month column for the query. The “COUNT(1) AS num_trips” adds up the number of rows that correspond with the other query instructions. Each row in this data is a taxi trip so each taxi trip gets counted as 1 trip and goes under the num_trips column with a monthly specified structure. The “FROM `bigquery-public-data.chicago_taxi_trips.taxi_trips`” specifies the project, dataset, and table to utilize data from for the query. The “WHERE EXTRACT(YEAR FROM trip_start_timestamp) = 2017” statement searches for the year 2017 in each trip_start_timestamp and limits the queried results to only include data from the year 2017 in the results. The “GROUP BY month” statement groups the results (taxi trips) by each month of the year in 2017. The “ORDER BY month” orders the months in the queried results beginning with the first month of the year, January, and proceeding to the last month of the year, December. The number of taxi trips per month in 2017 are shown in the query results above.

Stack Overflow Questions and Answers Exercise (Part 1)

The project for this exercise is titled, “bigquery-public-data.” The dataset is titled, “stackoverflow.” The tables used for the query are the posts_questions and posts_answers tables.

Problem:

You’re interested in understanding the initial experiences that users typically have with the Stack Overflow website. Is it more common for users to first ask questions or provide answers? After signing up, how long does it take for users to first interact with the website? To explore this further, you draft the query below.

The query returns a table with three columns:

  • owner_user_id- the user ID
  • q_creation_date- the first time the user asked a question
  • a_creation_date- the first time the user contributed an answer

You want to keep track of the users who have asked questions, but have yet to provide answers. And, your table should also include users who have answered questions, but have yet to pose their own questions.

To avoid returning too much data, we’ll restrict our attention to questions and answers posed in January 2019.

Solution:

Query Results (the first five items):

Explanation:

The “SELECT q.owner_user_id AS owner_user_id” statement creates the owner_user_id column to be included in the query results since owner_user_id is a common field between both the posts_questions and posts_answers tables that are utilized in this query. The “MIN(q.creation_date) AS q_creation_date,” statement selects the first time or minimum time that a Stack Overflow user created a question and adds that value to a q_creation_date column for the query results. The “MIN(a.creation_date) AS a_creation_date” statement selects the first time or minimum time that a Stack Overflow user created an answer and adds that value to an a_creation_date column for the query results. The “FROM `bigquery-public-data.stackoverflow.posts_questions` AS q” specifies the project, dataset, and table to include data from first for the query. The “FULL JOIN `bigquery-public-data.stackoverflow.posts_answers`AS a” and “ON q.owner_user_id = a.owner_user_id ” statements joins the posts_questions table and posts_answers table with a common column of owner_user_id. A full join includes all results whether there is a corresponding row or not from either table in the results. In relation to this problem, a full join means the users who have posed questions and answers are included in the results, the users who have asked questions without providing answers are included in the results, and the users who have answered questions without posing their own questions are included in the results. The “WHERE q.creation_date ≥ ‘2019–01–01’ AND q.creation_date < ‘2019–02–01’ AND a.creation_date ≥ ‘2019–01–01’ AND a.creation_date < ‘2019–02–01’” statement restricts all questions and answers observed as created by users on the Stack Overflow website to only include those originating in January 2019 in the results. The “GROUP BY owner_user_id” groups the Stack Overflow questions and answers to their owners. The first five user IDs with Stack Overflow questions or answers are shown above in the query results.

Stack Overflow Questions and Answers Exercise (Part 2)

The project for this exercise is titled, “bigquery-public-data.” The dataset is titled, “stackoverflow.” The tables used for the query are the posts_questions and posts_answers tables.

Problem:

How many distinct users posted on January 1, 2019?

Write a query that returns a table with a single column:

  • owner_user_id- the IDs of all users who posted at least one question or answer on January 1, 2019. Each user should appear at most once.

In order for your answer to be marked correct, your query must use a UNION.

Solution:

Query Results (the first five items):

Explanation:

The “SELECT q.owner_user_id” and “FROM `bigquery-public-data.stackoverflow.posts_questions` as q” statements select all the owner_user_id values from the posts_questions table in the stackoverflow dataset. THE “WHERE EXTRACT(DATE FROM q.creation_date) = ‘2019–01–01’” statement limits the posts_questions table data to only allow owner_user_id values from the posts_questions table that link to a question created on the Stack Overflow website on January 1, 2019. The “SELECT a.owner_user_id” and “FROM `bigquery-public-data.stackoverflow.posts_answers` as a” statements select all the owner_user_id values from the posts_answers table in the stackoverflow dataset. THE “WHERE EXTRACT(DATE FROM a.creation_date) = ‘2019–01–01’” statement limits the posts_answers table data to only allow owner_user_id values from the posts_answers table that link to an answer created on the Stack Overflow website on January 1, 2019. The “UNION DISTINCT” statement in the middle of the query unites the two data tables of posts_questions and posts_answers on the common owner_user_id field in both tables. Any time there was a user who posed a question or provided an answer on January 1, 2019, the owner_user_id for the user is taken note of to be part of the query results. The distinct part of the union of data tables makes sure that users are only included once in the resulting query. An owner_user_id is not duplicated in the results query with a distinct union if the owner_user_id was in both the posts_questions table and the posts_answers table or appeared in one table more than one time. The first five resulting user IDs that posted questions or answers on January 1, 2019 on the Stack Overflow website are shown in the query results above.

SQL Problem/Solution Examples with a Time Component Solved

Structured Query Language or SQL can be applied to database data for a variety of purposes from simple categorization of data to mass creations of new data structures embracing new data representations. In my blog today, I discussed a more unusual way to employ SQL: the SQL data expression with a time-related component. I provided three SQL exercises manipulating time data with problems, solutions, and explanations to allow data professionals to further scrutinize time data and develop their knowledge of SQL queries that comprise a time component. SQL can greatly alter the way database data is expressed when incorporating time components in the queries. I hope the exercises were helpful. Good luck with future adventures performing SQL queries when time components are involved!

Resources:

--

--