by Luke Vella Critien
This latest series of articles are related to three simple data related techniques that involve basic data collection, querying using SQL (Standard Query Language), and data visualization. This latest series requires the software listed at the end of this article.
Renewable Energy is a topic that has gained a lot of importance in recent years. This is so, as the world is coming to terms with decisions that have harmed the environment quite drastically. In this article, official data related to renewable energy generated from Photovoltaic Panels (PV) in Malta is going to be used.
Disclaimer – The data used may contain incorrect data as there were several discrepancies in the original files which could not be verified.
The Photovoltaic Panels data was obtained from three (3) separate files from Malta’s National Statistics Office, through the following link.
The original Microsoft Excel files (News2018_094.xls, News2019_097.xls and News2020_099.xls) contained different data:
- numerical tabular data in T1, T2, T3, and T4 for three (3) successive years starting from 2015
- images of maps without the actual data on which they were based
- classification and methodological notes
Given that the files did not contain large amounts of data the amalgamation process was done manually (typically this process would require an automated tool in order to correct/avoid mistakes and to be more efficient). The numerical tabular data in T1, T2, T3, and T4 from the original 3 files were copied into a single file (Aggregated_Data.xls) with 3 different tabs. The different tabs contained the following data:
- Stock of PV installations: the number of PV panels installed between 2015 and 2019 in the different Maltese regions (Southern Harbour, Northern Harbour, South Eastern, Western, Northern, Gozo, and Comino) within different sectors (Commercial, Domestic, Public)
- Total kWp from PV: the number of total kWp generated from PV panels between 2015 and 2019 in the different Maltese regions within the different sectors specified above
- GWh estimate by PV: an estimate of the GWh generated by PV panels between 2015 and 2019 within different regions. No reference for sectors is included in this table
Loading data in Microsoft SQL Server instance
There are various possibilities of loading data from a Microsoft Excel file to a Microsoft SQL Server. In the below section a demonstration of how to perform such an operation using an inbuilt tool wizard and an SQL based query are included.
Microsoft SQL Server import data tool
The process of loading the data from Microsoft Excel file to the Microsoft SQL Server database server is quite straight forward while using the in-built SQL Server import data tool. In order to transfer the data, we will use Microsoft SQL Server Management Studio (SSMS) which is an intermediary tool (graphical interface) between the user and the database server. It is important to note that there are other tools that can be used to communicate with the database server.
Loading the data from the Microsoft Excel file (Aggregate_Data.xls) to our database instance (LobesLab) requires a number of steps as described and shown below:
- Connect to the database instance via SSMS
- Create a new database that will store the loaded data.
- Load the data via the data import tool
- Choose a Data Source: the Microsoft Excel file that contains the data
- Choose a Destination: the server and respective database that will store the data
- Select Source Tables and Views: the different tabs in the Microsoft Excel file that are to be copied to the tables within the database
- Save and Run Package: allows the creation of an SSIS package for future reference or immediate execution
Following the execution of these steps the data in each Microsoft Excel tab should be successfully loaded into a new table within the selected server and database.
SQL data import
Another alternative to importing data from a Microsoft Excel file to a table within a Microsoft SQL Server database is via the use of a distributed SQL query. In order to perform this operation, you will need to first install Microsoft Access Database Engine 2010 Redistributable and make sure that the Microsoft SQL Server database instance is using a Local System account as shown below.
In the Query editor execute the below SQL statements to enable ad-hoc distributed queries server configuration option
sp_configure 'show advanced options', 1; RECONFIGURE; GO sp_configure 'Ad Hoc Distributed Queries', 1; RECONFIGURE; GO
If you encounter the Cannot create an instance of OLE DB provider “Microsoft.ACE.OLEDB.12.0” for linked server (null) you are to execute the below code:
EXEC sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', N'AllowInProcess', 1 EXEC sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', N'DynamicParameters', 1
You are to execute the below code that essentially creates a new database named PVData, changes the scope to the PVData and executes an ad-hoc distributed query that takes all the data from the Microsoft Excel file name Aggregate_Data.xlsx within the ‘1 – Stock of PV installations’ tab and copies it to a table named Data1.
--Creates a database named PVData CREATE DATABASE PVData; --Changes the scope to the PVData database USE Test; GO --Copies data from the tab '1 - Stock of PV installations' within Aggregate_Data.xlsx to the table Data1 in the PVData database SELECT * INTO Data1 FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0', 'Excel 12.0; Database=C:\Temp\Aggregate_Data.xlsx', ['1 - Stock of PV installations$']); GO
In order to follow through this article the following tools are required:
- Microsoft Excel
- Microsoft SQL Server instance
- Microsoft SQL Server Management Studio
- Microsoft Power BI