From the course: Introduction to Career Skills in Data Analytics

Transforming data in SQL

From the course: Introduction to Career Skills in Data Analytics

Transforming data in SQL

- [Instructor] If you are a data analyst, at some point you will encounter or hear SQL or SEQUEL. Let's start with the basics. SQL stands for Structured Query Language. Structured query language is vast. It's not unlike any other language. SEQUEL, however, is a computer language that works with data and the relationships between them. Microsoft SQL Server is a relational database management system developed by Microsoft with the primary function of being storing and retrieving data, although it does so much more. It was developed over 30 years ago and it does a lot of different things with data. And it's important to understand you don't need to know them all. As a data analyst, you do need to know some basic queries. A basic query allows you to select data from the database. There are two required statements for a SELECT. You must know what you want and where you want it from. This is the SELECT and the FROM statement. The SELECT will list all the fields from the table, and the FROM actually lists the table name. If I want to filter data, then I'll use the WHERE statement. And if I want to sort data, I can use the ORDER BY statement. WHERE and ORDER BY are not required to be in the statement. However, when they are used together, they are required to be in the right order. You have to filter the data before you sort it. Let me show you how to run some basic SEQUEL statements. I'm using SQL Server Express and I'm working with Microsoft SQL Server Management Studio on the Wide World Importer Sample Database. I'm going to run the supplier transactions. I'll right click, I'll select top 1000 rows. This generates a basic SQL statement. It selects all of the fields from Wide World Importers. Okay, it's also a filter for the top 1000 records. I'll go ahead and remove that statement and execute it again. If you look on the bottom right hand side, this tells me I have 2438 supplier transactions. To add more meaning to this data, I actually need to add another table. And this brings us to working with joins. When you have data in multiple tables, you leverage joins to control what data shows in the results. Okay, I'm going to highlight my select statement. I'll right click it and go to the design query and editor. Even though I can code all these statements, it is easier to work inside a GUI, a graphical user interface, especially if you're at the beginning. Okay. So I'm going to size my table here so I can see everything. All right, I'll right click and go add a table. And I want to add the suppliers. Because these tables have an established relationship in the database design, they're automatically joined. They're joined by the supplier ID being in both tables. I can also see that it's a key shape with a one to many, meaning I have one supplier listed and they may be attached to many transactions. When I hover over the diamond shape, it shows me the inner join, but I can also see that in the statement here. Okay, perfect. Now let me add the supplier name. Now it will automatically throw the supplier name at the last part of the select statement. But if I want to put it at the beginning, I can just drag it up. Okay, I'll click OK, and then I'll execute my statement. An inner join works by looking at both tables to find a match. And what that means with these two tables is that if I have a supplier name and that supplier has a transaction record, they will show in the results. This is showing me 2438 records where I have a supplier and a transaction. This is perfect. The only issue I have with this data is if I wanted to report on suppliers that we have in our system, regardless of their transactions, I have to adjust the join type. All right, I'll highlight my statement, I'll go to the design view. The diamond shape is where I can control the joins. I'll right click this and tell it to show me all rows from suppliers. And this will create an outer join. If it's left or right, will be determined on how the database sees left or right. So I've told it to show me all suppliers, regardless of their transactions. I'll click OK. And if you'll notice in the statement, it's a right outer join. It sees the supplier table on the right of the data. Okay, I'll go ahead and execute. I now see that I have more records. I have 2,444 records. This means I do have suppliers listed in our data set that do not have transaction records. Let's scroll to the bottom and see what that looks like. Starting with nod publishers, I see the first set of suppliers that do not have transaction records. They're easy to see because the transaction record all says null in each field. That's because there are no supplier transactions for these final suppliers. If I want to see if there are supplier transactions that do not have a supplier, then I can adjust the join type. I can tell it to give me a left outer join. Now I could go to the design view and adjust this, or I can just type left outer join here in my statement. And then I can execute. That's because there's a relationship between these tables that will not allow you to put a transaction in without a valid supplier. But again, you could easily have suppliers that do not have transactions yet. Because join types do impact the data we have in our results set, you always need to critically think through what you're trying to achieve with your data and know that you might need to adjust the join type.

Contents