Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and possibly automated systems to extract knowledge and insights (wisdom) from data. It is important for a data science practitioner (a data scientist) to understand the core components of data science.
What are the Core Components of Data Science?
- Big Data
- Machine Learning
- Probability and Statistics
- Programming Languages
What is Data?
Data is any characteristic or piece of information, of any type, that is collected through observation. In a more technical sense, data is a set of values of qualitative or quantitative variables about one or more objects. In practice, there are two main types of data, as given below.
- Structured Data
- Unstructured Data
To understand the difference between structured and unstructured data, consider the following diagram:
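The distinction can also be sketched in a few lines of Python. This is only an illustration: the records, the review text, and the helper names `people_in_city` and `count_word` are made up for this example. Structured data has named fields that can be queried directly, while unstructured text has to be processed before anything can be counted.

```python
# Structured data: every record has the same named fields (hypothetical values).
structured_rows = [
    {"name": "Asha", "age": 31, "city": "Pune"},
    {"name": "Ben", "age": 45, "city": "Leeds"},
    {"name": "Chen", "age": 28, "city": "Pune"},
]

# Unstructured data: free text with no fixed schema (hypothetical review text).
unstructured_text = "Asha wrote: the delivery was late, but support in Pune was helpful."


def people_in_city(rows, city):
    """Return the names of people in `city`; easy because fields are named."""
    return [row["name"] for row in rows if row["city"] == city]


def count_word(text, word):
    """Count occurrences of `word` in free text after basic normalisation."""
    tokens = [t.strip(".,:;!?").lower() for t in text.split()]
    return tokens.count(word.lower())


print(people_in_city(structured_rows, "Pune"))
print(count_word(unstructured_text, "was"))
```

The structured query is a simple field lookup, whereas the text first has to be split and cleaned, which is typical of the extra work unstructured data requires.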
What is Big Data?
The main purpose of data science is to extract insight and wisdom from data, including big data, which is difficult to process with traditional application software. The term big data is closely related to the Internet of Things (IoT). When a large number of devices (computers, vehicles, people, etc.) are connected together, they generate a huge amount of data. Data at this scale is termed big data.
What Do You Mean by Machine Learning?
Machine learning is an application of artificial intelligence (AI) that gives systems (machines) the ability to automatically learn and improve from experience without being explicitly programmed.
Let’s understand Machine Learning first
What are the Types of Machine Learning?
- Supervised Machine Learning
- Unsupervised Machine Learning
Review Machine Learning techniques
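To make the two types concrete, here is a minimal sketch using only the Python standard library. The height data, the labels, and the function names `nearest_neighbour_predict` and `two_means` are assumptions made for illustration, and the two techniques shown (nearest neighbour and a basic k-means loop) are just one simple representative of each type. The supervised function learns from labelled examples, while the unsupervised one groups the same points without seeing any labels.

```python
def nearest_neighbour_predict(train_points, train_labels, x):
    """Supervised: predict the label of x from the closest labelled example."""
    distances = [abs(p - x) for p in train_points]
    return train_labels[distances.index(min(distances))]


def two_means(points, iterations=10):
    """Unsupervised: split 1-D points into two groups with a basic k-means loop."""
    centres = [min(points), max(points)]  # simple initial guess
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for p in points:
            # Assign each point to the nearer centre.
            idx = 0 if abs(p - centres[0]) <= abs(p - centres[1]) else 1
            groups[idx].append(p)
        # Move each centre to the mean of its group (keep it if the group is empty).
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return groups


# Hypothetical data: heights in cm, labelled "child" or "adult".
heights = [110, 120, 125, 165, 170, 180]
labels = ["child", "child", "child", "adult", "adult", "adult"]

print(nearest_neighbour_predict(heights, labels, 172))  # uses the labels
print(two_means(heights))                               # never sees the labels
```

In practice a library would normally be used instead of hand-written loops; the point here is only that supervised learning needs labels and unsupervised learning does not.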
Define Probability and Statistics
Mathematics (mainly statistics and probability) is a key component for understanding machine learning algorithms and the data science process.
Statistics deals with the collection, organization, interpretation, and presentation of data. (Statistics Details)
Probability is the branch of mathematics concerning numerical descriptions of how likely an event is to occur. (Probability Details)
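As a small illustration of both ideas, the sketch below uses Python's standard `statistics`, `fractions`, and `itertools` modules. The exam scores are hypothetical, and the dice example is simply one convenient way to show probability as a count of favourable outcomes over all equally likely outcomes.

```python
from fractions import Fraction
from itertools import product
import statistics

# Statistics: summarise a small sample (hypothetical exam scores).
scores = [62, 75, 75, 81, 90]
mean_score = statistics.mean(scores)
median_score = statistics.median(scores)
spread = statistics.stdev(scores)  # sample standard deviation

# Probability: chance that two fair dice sum to 7, by counting outcomes.
outcomes = list(product(range(1, 7), repeat=2))    # all 36 equally likely pairs
favourable = [o for o in outcomes if sum(o) == 7]  # pairs that add up to 7
p_seven = Fraction(len(favourable), len(outcomes))

print(mean_score, median_score, round(spread, 2))
print(p_seven)
```

The first half describes data that has already been collected, while the second half reasons about how likely an event is before any data exists, which mirrors the two definitions above.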
Which Programming Languages are Used?
To implement these concepts on a computer system, we have to use a programming language, and in theory data science concepts can be implemented in any programming language. Because of the large number of libraries and functions available, the following programming languages are widely used in data science.
Congratulations! You now have an idea of the components of the data science process. The complete data science process, which will be explained in other chapters, is shown in the diagram below: