What is Data Science in layman's terms?
In this post I try to explain what data science is at 4 levels of maturity.
When sharing with people what I study, I generally get a response along the lines of:
1) "Is that with computers and data analysis? bloody good money is that" - Random old chap at the Fox House bus stop.
2) "I don't even know what that is, sounds complicated though" - colleague from work.
While neither of these responses is "academically" correct, they're actually not far off. We do use computers, perform data analysis and yes, at times it's quite complicated! So without further ado, I shall attempt to explain Data Science at 4 different levels (inspired by one of my favourite YouTube series).
Level 1 Child
Data Science is about using computers to find stuff out from data. Data can be anything, e.g. how many toys you have, what colour your toys are and what types of toys you have. From this we can see which toys you prefer and which ones you are most likely to get your parents to buy.
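For the slightly older reader, the toy example can be sketched in a few lines of Python. This is only an illustration: the toy box and the `favourite` helper are made up for this post, and "preference" is assumed to mean nothing more than "whatever shows up most often".

```python
from collections import Counter

# Hypothetical toy box: each entry is (type, colour). Made-up data.
toys = [
    ("car", "red"),
    ("car", "blue"),
    ("teddy", "brown"),
    ("car", "red"),
    ("doll", "pink"),
]

def favourite(toys):
    """Return the most common toy type and the most common colour."""
    type_counts = Counter(toy_type for toy_type, _ in toys)
    colour_counts = Counter(colour for _, colour in toys)
    # most_common(1) gives a one-item list of (value, count) pairs.
    return type_counts.most_common(1)[0][0], colour_counts.most_common(1)[0][0]

fav_type, fav_colour = favourite(toys)
print(f"You have {len(toys)} toys; you seem to like {fav_colour} {fav_type}s best.")
```

Counting is about the simplest "finding stuff out" there is, but it is the same idea the later levels build on.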
Level 2 Teen
Data Science uses computers and different software to produce insight from data. There are 5 main stages to doing this: generating a question, gathering data, structuring the data, exploring the data and communicating the findings. We may need to revisit the first 3 stages whilst we explore the data: the question might change, we may need more data, or we might need to restructure it.
Level 3 College Student
Data Science is a method that utilises elements from computer science, maths & statistics, and knowledge of the domain you're applying it in (e.g. healthcare, manufacturing, cyber security etc.) to produce insights from data that is available or generated. This data is collected, processed and cleaned using various software. Then we explore the data using statistics and visualisations to see what's going on. We then might deploy machine learning algorithms and statistical models to make inferences/predictions. With these results we can communicate, visualise or report findings to stakeholders to help them make decisions or gain insights within their domain.
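The clean, explore and model steps described above can be sketched with nothing but the Python standard library. This is a minimal sketch under loud assumptions: the records are invented (hours of exercise per week against resting heart rate), `clean`, `fit_line` and `predict` are hypothetical helper names, and a straight-line least-squares fit stands in for whatever statistical model or machine learning algorithm a real project would pick.

```python
from statistics import mean

# Hypothetical raw records: (hours of exercise per week, resting heart rate).
# Invented data, with a couple of deliberately broken rows.
raw = [("1", "80"), ("2", "78"), ("", "75"), ("3", "76"),
       ("4", "74"), ("5", "n/a"), ("5", "72")]

def clean(records):
    """Drop rows with missing or non-numeric values; convert the rest to floats."""
    cleaned = []
    for x, y in records:
        try:
            cleaned.append((float(x), float(y)))
        except ValueError:
            continue  # skip rows such as ("", "75") or ("5", "n/a")
    return cleaned

def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in points)
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

def predict(model, x):
    """Use the fitted line to predict y for a new x."""
    slope, intercept = model
    return slope * x + intercept

data = clean(raw)                      # process and clean
print("rows kept:", len(data))         # explore: how much data survived?
print("mean heart rate:", mean(y for _, y in data))
model = fit_line(data)                 # model
print("predicted heart rate at 6 hours:", predict(model, 6))
```

In practice the communication step matters just as much as the fit: a stakeholder wants "more exercise goes with a lower resting heart rate in this sample", not a slope and an intercept.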
Level 4 University Student
There is no commonly agreed definition of data science. Definitions vary depending on perspective; however, there are common themes. Data Science is a methodology that transforms data using mathematics and statistics to produce valuable insights, decisions and products. For my preferred definition, please see below:
"Data Science is a field of study and practice that involves using data to achieve specified goals by designing or applying computational methods for finding patterns, connections or relationships that can be used for inference and prediction. Data Science also involves subject or domain understanding, effective communication of insights, building data products, and consideration of wider issues (e.g. ethics). The goal of Data Science is to help people understand real-world phenomena, support data-driven decision-making and provide automation." - (Fayyad and Hamutcu, 2020)
Data Science uses a range of techniques (clustering, dimension reduction, classification, deep learning networks etc.), tools (regression, compression, graph algorithms, linear and non-linear optimisation, encryption) and technologies (Python, R, Java, C, C#, SQL etc.). For a visual representation of just some of the technologies, click here.
The skills needed to be an efficient data scientist, as defined by Kelleher and Tierney (2018), include: business understanding, data exploration and preparation, data representation and transformation, computing with data, model building, model evaluation and maintenance, visualisation and preparation, deployment and application development, and communication.