Data Science Interview Questions — SQL
Below are some of the common data science interview questions that I have seen being asked in interviews. If you think there are some important ones that are missing please let me know. I have not added the “write this select query” or “write this join query” as this can be different in different interviews. You should absolutely practise those questions before you go to interviews. The best resource that I know for practising SQL questions is hackerrank (follow the link).
Share your thoughts on normalised and denormalised data. What is the best for a data science?
Normalized is what SQL freaks love. For example to normalize addresses, SQL freaks love to create 3 separate tables (Entities):
STREETS
ZIP_CODES
HOUSE_NUMBERS
This is an extreme example of over-normalization. This was actually done in the ‘70-ties when disk-space was expensive. Today only a few idiots are going this far. It doesn’t make sense anymore. We just create one single table with full addresses.
Demoralized is the opposite. It’s also known as ‘flat’. A flat Database only has one single table It has only one single table where all data is stored. The disadvantage of this approach is that doubles may exist. On the other hand, Data storage is cheap and we don’t need to worry a lot of this.
It also explains the popularity of NoSQL or Non-Relational Databases. They are easy to Index and Search and can become very large. Another great advantage of NoSQL is that these types of Databases can easily replicate (sharding) over multiple locations. This is a must for Big Data.
A denormalized data structure uses fewer tables because it groups data and doesn’t exclude data redundancies — it offers better performance when reading data for analytical purposes.
I guess working on denormalised data is better as this would mean that it is easier for the analyst to find all the information that is needed to him and would help him to understand and find the relationships better.
What is Normalization and what are the advantages of it?
Normalization is the process of organizing data to avoid duplication and redundancy. Some of the advantages are:
- Better Database organization
- More Tables with smaller rows
- Efficient data access
- Greater Flexibility for Queries
- Quickly find the information
- Easier to implement Security
- Allows easy modification
- Reduction of redundant and duplicate data
- More Compact Database
- Ensure Consistent data after modification
What is the difference between CHAR and VARCHAR2 datatype in SQL?
Both Char and Varchar2 are used for characters datatype but varchar2 is used for character strings of variable length whereas Char is used for strings of fixed length. For example, char(10) can only store 10 characters and will not be able to store a string of any other length whereas varchar2(10) can store any length i.e 6,8,2 in this variable.
What are Constraints?
Constraints are used to specify the limit on the data type of the table. It can be specified while creating or altering the table statement. The sample of constraints are:
- Not null
- check
- defeault
- unique
- primary key
- foreign key
What is ACID property in a database?
ACID stands for Atomicity, Consistency, Isolation, Durability. It is used to ensure that the data transactions are processed reliably in a database system.
Atomicity: Atomicity refers to the transactions that are completely done or failed where transaction refers to a single logical operation of a data. It means if one part of any transaction fails, the entire transaction fails and the database state is left unchanged.
Consistency: Consistency ensures that the data must meet all the validation rules. In simple words, you can say that your transaction never leaves the database without completing its state.
Isolation: The main goal of isolation is concurrency control.
Durability: Durability means that if a transaction has been committed, it will occur whatever may come in between such as power loss, crash or any sort of error.