From the course: Introduction to Career Skills in Data Analytics

Understanding the rules of the data

- We hear about business requirements all the time in the business world. They control what we do on any given project. Part of meeting the business requirements is following the business rules. When working with any data, it is important that you understand the rules around it. These rules can tell you when to expect data, what you can do with data that meets certain criteria, and what needs to happen when the data is transformed. Let's work through some examples of business rules and how they can impact our data.

Let's get started with just understanding what we mean by rules. Business rules can be as simple as a definition: is a contact a salesperson, a customer, or a prospect? It could be as simple as a rule that says a prospect becomes a customer once they actually place an order. These rules also control the flow of data. So if our system has a sales order record, that means the order has occurred. It means that the prospect and the potential sale made it to a certain stage of the process. The business can then use this to easily distinguish a potential sale from an actual sale. This is an example of a simple business rule, and the same rule can be used to convert a prospect to a customer in the data.

Some rules can be a bit more specific and have a technical requirement. We have some sales order data. This data is going to be prepared to go into a new system that provides additional reporting about our sales orders, and that information will go to our production team. So the business requirement is that we need to prepare the data to go into the new system. The new system has a specific template, and the data from our system must match the data specification of where it's going. We've been provided this technical requirements document for our data. Let's take a quick read through that.
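Before reading the spec, the customer rule from a moment ago can be made concrete. The following is a minimal sketch in Python, not the course's own method: the contact and order records, the field names (`contact_id`, `sales_order_id`), and the `classify_contacts` helper are all hypothetical and exist only to illustrate the rule that a contact with a sales order is a customer.

```python
# Hypothetical contacts and sales orders; all names and values are
# assumptions made up for illustration, not data from the course.
contacts = [
    {"contact_id": 1, "name": "Avery"},
    {"contact_id": 2, "name": "Blake"},
    {"contact_id": 3, "name": "Casey"},
]
sales_orders = [
    {"sales_order_id": 101, "contact_id": 1},
    {"sales_order_id": 102, "contact_id": 1},
    {"sales_order_id": 103, "contact_id": 3},
]


def classify_contacts(contacts, sales_orders):
    """Apply the rule: a contact with at least one sales order is a customer."""
    # Collect every contact that appears on a sales order record.
    contacts_with_orders = {order["contact_id"] for order in sales_orders}
    return {
        contact["contact_id"]: (
            "customer" if contact["contact_id"] in contacts_with_orders else "prospect"
        )
        for contact in contacts
    }


statuses = classify_contacts(contacts, sales_orders)
print(statuses)
```

Here contacts 1 and 3 come back as customers because a sales order record exists for them, while contact 2 stays a prospect. The existence of the record is what drives the status, which is the flow-of-data idea described above.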
First of all, it tells us that the sales order ID must be converted to a text data type, and it must not contain any letters. None of the date fields should include timestamps. We also have to have a main account GL number. That main account GL number holds a four-digit code for accounting, with the last two digits specifying the category. Also, we see that the territory ID and comment fields need to be removed. And the final step is to save our data in a CSV, or comma-separated values, file so that we can import it into the new reporting system.

So now that we have our technical requirements, let's take a look at the data. Okay, so the business rule in our technical spec said that sales order ID and sales order number need to be text data types. I can look at sales order ID and see pretty quickly it's a number data type. I know that because it's right-aligned in the field. I can see the sales order number is already a text data type. It's aligned left, but it doesn't meet the requirements because it contains two letters, S and O for sales order. I'll take a look at my dates. I can clearly see they include timestamps, so part of my technical requirement will be to clean this data to meet the rules, which means only dates and no timestamps. Our specification also said we had to have a main account GL number, a four-digit code for accounting with the last two digits specifying the category. But when I look at the data, I don't see a main account GL number. However, because I know the business rules of the account number for these records, I know that the main account GL number can actually be created from the account number. I also see we have the columns that the spec said not to include, which are the territory ID and the comments.

When working on any new data project, you want to make sure you consider the organization's rules and its definitions for data. You also need to account for the flow of data and any specific technical requirements.
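The technical requirements walked through above can be sketched as a small cleaning script. This is only an illustration under stated assumptions, using Python's standard library. The column names, the sample row, the timestamp format, and especially the account number layout (`10-4020-000676`, with the middle segment treated as the main account GL number) are assumptions. The transcript does not say how the GL number is actually derived from the account number.

```python
import csv
import io
from datetime import datetime

# Hypothetical source row; column names, formats, and values are assumptions.
rows = [
    {
        "SalesOrderID": 43659,
        "SalesOrderNumber": "SO43659",
        "OrderDate": "2011-05-31 00:00:00",
        "AccountNumber": "10-4020-000676",
        "TerritoryID": 5,
        "Comment": "",
    },
]

DATE_FIELDS = ["OrderDate"]              # fields that must not carry timestamps
DROP_FIELDS = ["TerritoryID", "Comment"]  # fields the spec says to remove


def apply_spec(row):
    """Return a cleaned copy of one row that follows the technical spec."""
    out = dict(row)
    # Rule: ID fields are text and contain no letters.
    out["SalesOrderID"] = str(out["SalesOrderID"])
    out["SalesOrderNumber"] = "".join(
        ch for ch in out["SalesOrderNumber"] if not ch.isalpha()
    )
    # Rule: date fields keep the date only (assumed input format).
    for field in DATE_FIELDS:
        parsed = datetime.strptime(out[field], "%Y-%m-%d %H:%M:%S")
        out[field] = parsed.date().isoformat()
    # Rule: main account GL number is built from the account number.
    # ASSUMPTION: it is the middle, four-digit segment of the account number.
    out["MainAccountGLNumber"] = out["AccountNumber"].split("-")[1]
    # Rule: remove the territory ID and comment fields.
    for field in DROP_FIELDS:
        out.pop(field, None)
    return out


def to_csv(rows):
    """Clean every row and return the result as CSV text."""
    cleaned = [apply_spec(row) for row in rows]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(cleaned[0].keys()))
    writer.writeheader()
    writer.writerows(cleaned)
    return buffer.getvalue()


csv_text = to_csv(rows)
print(csv_text)
```

Each rule in the spec maps to one clearly commented step, which makes it easy to check the script against the requirements document line by line. To produce a file for import, the same text could be written to disk instead of printed.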
