Python is an experiment in how much freedom programmers need. Too much freedom and nobody can read another’s code; too little and expressiveness is endangered.
Python has acquired the front seat in relevance to Data, which exactly passes through the centre of Data Science circle. What about SQL?. SQL is among the top skills that are required to avail Data. Before that, you might be interested to know what role exactly SQL fulfils for the Data.
SQL is a structured query language. In most of the organisations, the majority of the Data is stored within the relational database.
In the hierarchy of Data Analysis, when the Data is investigated and visualised and when Data is put through fetching mode, SQL roles come into the part. Yes, you can use SQL to fetch your data. Few relational databases use SQL as the key API.
Relational Database management is an integral part of Full-stack Data Science. CRM (Customer relationship management) in organisations make good use of SQL. Modern big data systems like Hadoop and spark use of SQL to make relational Database too.
Why SQL fits in well?
If you ask me why SQL makes a convenient programing language to elaborate Data, I would say, SQL is one of the easiest programming languages to learn and pick up. SQL is language friendly and has a comparatively smaller syntax. SQL is the query language which makes all the data sorting possible. It is used for querying, inserting, and modifying data. All the work related to storing information about Data, be it customers information (name, email, location, gender) or business information ( information about sales, supply numbers ).
In a layman language, what is Data?. The list of all the relevant information right?. When the list is small, it is easy to analyse the patterns. But what will happen when the information list is huge with thousands of rows and columns. How does the little mind going to decipher the pattern? It needs to stack down a pattern which is visually impactful. SQL helps the Data scientist to retrieve the data in the information environment.
What are the SQL directive Skills that a Data scientist should know?
- Relational Database model system (RDBMS)
RDBMS is a Database management system based on relational models outlined by E.F Codd. RDBMS makes the basic framework for SQL, MS SQL Server, MySQL, Oracle, IBM DB2 and Microsoft access.
Components of RDBMS:
A table is the cumulative collection of all the related data consisting of rows and columns. It is the simplest format for any data storage.
Let the name of the table be declared as “CLIENT.”
A table is divided into different sectors to store specific information.
Each column is called a Field. A Field contains facts in bits such as “Name”, “ID”, “Address”, “Age”, “Salary”.
A table thus divided into rows, and each row is called a Record. One row comprises of all the details for an individual Client. It is the horizontal entity of the table.
Column forms the vertical entity of the table. It stores all the data for a particular field. One column for this particular Table is Name.
e) Null value:
A null value means “no information”. While creating Fields, there may be some which has been left blank, and these are designated as “Null Value.” I want you to know that a null value is not equal to Zero value. A Null Value is not equal to a field that contains space either.
2. Understanding of SQL commands
As a Data scientist, you would need to have knowledge about several SQL commands such as:
i) Data Query Language: The SQL command, which is used to retrieve data from the Database, is called DQL. One such command is “SELECT.”
SELECT field names
FROM <table name>;
Data is retrieved either row-wise or column-wise using two operations;
- Project operation: When you want to retrieve an information column-wise, use this command, syntax:
- SELECTION operation: When you want to retrieve an information row-wise, from the relation or schema, use this command, the syntax for SELECTION operation is;
WHERE ClientSalary >30,000.
ii) Data manipulation language: The set of commands which deals with manipulation of data in the Database comes under DML.
The operations used for this purpose are
- INSERT: To insert any information/data to the table, this command is used. You can append one data to the table by adding this instruction, the syntax for this command is;
INSERT INTO table_name (column1, column2, column3….)
VALUES (value1, value2, value3,…..)
- UPDATE: To make modifications to the existing data in the table, this command is helpful. You can alter the preexisting information. The syntax for this command is;
SET <column_name= value>
- DELETE: This command is used to eliminate data or information from the table. The syntax for this command is;
DELETE FROM CLIENT
iii) Data Definition Language: The set of commands used to define any derivative of data inside a table consists of DDL. It generally deals with the description of the schema and can be utilised for creating and modifying the table framework.
The operations used for this purpose are,
- CREATE: syntax;
CREATE DATABASE testDB;
- DROP: It is used to delete data from the table.
DROP DATABASE databasename;
- ALTER: It is used to modify the Database.
ALTER TABLE table_name
ADD column_type datatype;
- TRUNCATE: This command removes all the records from the table, including spaces, but not the table itself.
TRUNCATE TABLE categories;
- COMMENT: This command is useful in adding a comment to the data dictionary.
iv) Data Control Language: The set of command which deals with the authenticity, right, permission and access to the Database.
The operations include:
- GRANT: This command gives the user access to the Database.
- REVOKE: This command retaliate the user from having the privilege to access the Database.
Did you see how relevant is SQL to the Data world? SQL is the helping hand which has enhanced the usability of data by taking charge of its representation in the most convenient manner.