by Luke Vella Critien

This second article within this series focuses on data manipulation techniques through the use of SQL (Standard Query Language). This article follows the Data Collection for Renewable Energy Data article.

SQL is a language that is used in Relational Database Management Systems (RDBMs) such as Microsoft SQL Server.  This language has been around for many years and is widely used to handle structured data, stored in tables (entities).  SQL can be split down into different sub-sections such as Data Definition Language (DDL), Data Manipulation Language (DML), Data Control Language (DCL), and Transaction Control Language (TCL).  These sub-sections include different commands which can be used to perform different operations with a relational database environment.  In this section, we will use some of the commands within the Database Manipulation Language section which is used to add, modify, delete, and read data within the database.

In order to follow through this article, you need to have Microsoft SQL Server Management Studio (SSMS) installed. You should open this tool, connect to the local Microsoft SQL Server database instance where the data has been loaded and open a New Query window. The below query is then to be executed so as to point to the PVData database.

--Point to the database which contains the Renewable Energy data
USE PVData;

Selecting columns from a table

While using SQL to retrieve data from a table you can either select all the columns and rows using SELECT * or else you can select particular columns using SELECT column_name/s

--Obtaining all the rows and columns from the ['1 - Stock of PV installations$'] table
SELECT * 
FROM dbo.['1 - Stock of PV installations$'];
--Obtain the Year, Sector and [Gozo & Comino] columns for all the available rows
SELECT Year, 
       Sector, 
       [Gozo & Comino]
FROM dbo.['1 - Stock of PV installations$'];

Filtering rows of data from the whole table

The WHERE clause is to be used whenever particular rows are to be included in the result of an SQL statement.  The WHERE clause is typically followed by one or more conditions which can include different operators – numeric, logical, etc..

-- Obtain the Year, Sector and [Gozo & Comino] columns for the commercial sector 
SELECT Year, Sector, [Gozo & Comino]
FROM dbo.['1 - Stock of PV installations$']
WHERE Sector = 'Commercial';

Note: In the output, only data for the Commercial sector is displayed.  Also in the SQL statement, the word Commercial is enclosed in single quotes as it’s data type is a string

-- Obtain the Year, Sector and [Gozo & Comino] columns for the years 2105 and 2019 only
SELECT Year, Sector, [Gozo & Comino]
FROM dbo.['1 - Stock of PV installations$']
WHERE Year = 2015 OR Year = 2019;

--or

SELECT Year, Sector, [Gozo & Comino]
FROM dbo.['1 - Stock of PV installations$']
WHERE Year IN (2015, 2019);

Note:

  • In the first statement, we have included two conditions (one for 2015 and another for 2019) and made use of the logical operator OR (either of the two conditions is to be true for a row to be included in the result).
  • In the second statement, we used the IN operator which allows for multiple data values to be included between curved brackets

Aggregating and grouping data

It is common that data will need to undergo some form of arithmetic calculation and for this reason the typically operators (+, -, *, /, etc.) can be used.  There are also ready built aggregate functions (SUM, MIN, MAX, COUNT, AVG, etc.) which can be applied to data.  Below are a few simple examples of what can be done.

--Display the Sector and Total PV Stock in 2015 across the Maltese archipelago
SELECT Sector, 
       [Southern Harbour]+[Northern Harbour]+[South Eastern]+Western+Northern+[Gozo & Comino] 'Total 2015'
FROM dbo.['1 - Stock of PV installations$']
WHERE Year = 2015;

Note:

  • In the above query, we simply added the columns for each region using the + operator.
  • Since for each year we had data split by Sector, the result included the total stock of PV panels per sector for 2015
--Display the Grand Total PV Stock in 2015 for all the sectors together
SELECT SUM([Southern Harbour]+[Northern Harbour]+[South Eastern]+Western+Northern+[Gozo & Comino]) 'Total PV Stock in 2015'
FROM dbo.['1 - Stock of PV installations$']
WHERE Year = 2015;

Note: In this query we used the SUM function to add the values obtained by the previous SQL statement as one grand total.

--Display the Year and the Grand Total PV Stock in 2015 for all the sectors together 
SELECT Year,
       SUM([Southern Harbour]+[Northern Harbour]+[South Eastern]+Western+Northern+[Gozo & Comino]) 'Total PV Stock'
FROM dbo.['1 - Stock of PV installations$']
GROUP BY Year;

Note: In the above SQL statement we had to use the GROUP BY clause so that the grand total for each year could be calculated.  As a general rule when in the SELECT statement there exists a column (in this case YEAR) which does not make use of an aggregate function (in this case SUM) then this is to be included in the GROUP BY clause.

--Display the Year, Sector and Grand Total PV Stock for all the sectors together with sub-totals
SELECT Year, 
       Sector,
       SUM([Southern Harbour]+[Northern Harbour]+[South Eastern]+Western+Northern+[Gozo & Comino]) 'Total PV Stock'
FROM dbo.['1 - Stock of PV installations$']
GROUP BY ROLLUP(Year, Sector);

Note: In this statement the ROLLUP function was used and this will add the sub-totals for the Year alone and the grand total at the very end.

----Display the Year, Sector and Grand Total PV Stock for all the sectors together with sub-totals for each value in the Year and Sector columns
SELECT Year, 
       Sector,
       SUM([Southern Harbour]+[Northern Harbour]+[South Eastern]+Western+Northern+[Gozo & Comino]) 'Total PV Stock'
FROM dbo.['1 - Stock of PV installations$']
GROUP BY CUBE(Year, Sector);

Note: In the above statement the CUBE was used and this is a step further than the ROLLUP as in this case it also includes the sub-totals for the distinct sectors apart from those of the Year and the grand total.