IBM Data Science Professional Certification

A review of the most popular introductory certification to Data Science.

The IBM Data Science Professional Certification was a comprehensive, fun way to introduce myself to Data Science, and it ultimately helped me get onto my master's in Data Science and land my first job in Data Analysis.

Here's an overview of what I did on the course, plus a review for anyone contemplating it. If you're interested in the code for the capstone project of predicting SpaceX rocket launches, check out my GitHub.

If you're in a hurry, scroll down to the bottom for the TLDR 😄


Overview

This is a beginner-level course with no prior experience required and, according to the homepage, it takes 5 months to complete at 10 hours a week. When I was completing it I was working full time and could afford roughly 5-7 hours per week whilst maintaining a reasonable social life (you could probably do it quicker if you drink less Aperol Spritz than me on a nice summer's day).

You can log on whenever you want and pick up where you left off. You can also start any of the 10 modules at any time, but I'd advise doing them in the order listed. Modules vary in difficulty and there are some guides online to help you through the difficult sections (if needed).

The content is well produced and maintained, as detailed in the versioning notes at the bottom of the Jupyter Lab Notebooks provided. Each module comes with presentations with quizzes throughout (pay attention!), videos and mini lab sessions. At the end of each module there is an assignment with a Jupyter Lab Notebook detailing the scenario, the task required and blank sections for your code. Coursework is uploaded and marked by fellow students on the course, and you usually get your mark within 1-2 days of submitting (probably quicker now, given how popular the course currently is).

It cost me £29 per month for the Coursera subscription, which grants you access to the course. Don't worry about the enrollment deadline like I did; it's just a marketing ploy to get you to sign up there and then. It's an online course where students mark each other's work, so it runs itself (other than maintenance of material).

Stuff you'll get exposure to: Jupyter notebooks, RStudio, IBM Watson, SPSS Modeler, statistics, neural networks, the foundations of John Rollins' methodology for data science, Python basics (data types, expressions, variables and data structures), Python programming logic (branching, loops, functions, objects and classes) and Python libraries (Pandas, NumPy and Beautiful Soup).

More stuff you'll get exposure to...

Access NBA data on the web using APIs and web scraping from Python in Jupyter Notebooks.

Use Jupyter notebooks with SQL to collect and analyse census, crime and school data for a given neighbourhood, then use a multiple linear regression model to identify the factors that impact the enrolment, safety, health and environment ratings of schools. The aim is to improve educational outcomes for children and youth in the City of Chicago.
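For a flavour of the SQL side, here's a minimal sketch of running a query from Python. It isn't the course notebook: I'm using Python's built-in `sqlite3` so it runs anywhere (an assumption, not necessarily the database the course uses), and the `schools` table, its columns and every row in it are made up for illustration.

```python
import sqlite3

# In-memory database with invented rows (hypothetical table and columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE schools (name TEXT, community_area TEXT, safety_score INTEGER)"
)
conn.executemany(
    "INSERT INTO schools VALUES (?, ?, ?)",
    [
        ("School A", "Northside", 72),
        ("School B", "Northside", 58),
        ("School C", "Southside", 41),
        ("School D", "Southside", 49),
    ],
)

# Average safety score per community area, highest first.
query = """
    SELECT community_area, ROUND(AVG(safety_score), 1) AS avg_safety
    FROM schools
    GROUP BY community_area
    ORDER BY avg_safety DESC
"""
rows = conn.execute(query).fetchall()
conn.close()

for area, avg_safety in rows:
    print(area, avg_safety)
```

The same GROUP BY and ORDER BY pattern carries over to the census and crime tables; only the connection line would change for a different database.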

Use JupyterLab and Python (libraries: pandas, numpy, seaborn and matplotlib) to predict the market price of a house given a set of its features, e.g. zip code, bedrooms, bathrooms etc. This starts with a linear regression model, followed by a second-order polynomial transformation on the training and testing data to capture the curvilinear relationship between the independent and dependent variables, as the data is non-linear.
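To make the linear versus second-order polynomial idea concrete, here's a minimal sketch using NumPy on made-up data. It isn't the course notebook: the `area` and `price` values, the coefficients behind them and the 150/50 train/test split are all assumptions for illustration.

```python
import numpy as np

# Synthetic data (not course data): price depends on the square of the area.
rng = np.random.default_rng(0)
area = rng.uniform(50, 250, size=200)  # hypothetical feature
price = 1000 + 20 * area + 3 * area**2 + rng.normal(0, 500, size=200)

# Hold out the last 50 rows as a test set.
x_train, x_test = area[:150], area[150:]
y_train, y_test = price[:150], price[150:]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual variance / total variance."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot


# Degree 1 is a straight line, degree 2 is a second-order polynomial.
linear_coefs = np.polyfit(x_train, y_train, deg=1)
poly_coefs = np.polyfit(x_train, y_train, deg=2)

r2_linear = r_squared(y_test, np.polyval(linear_coefs, x_test))
r2_poly = r_squared(y_test, np.polyval(poly_coefs, x_test))

print(f"linear R^2: {r2_linear:.4f}, polynomial R^2: {r2_poly:.4f}")
```

Because the made-up relationship is curved, the degree-2 fit scores a higher R^2 on the held-out rows than the straight line does, which is the reason for reaching for the polynomial transformation in the first place.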

Learn the rudiments of machine learning techniques such as Regression (Linear Regression, Multiple Linear Regression), Classification techniques (K-Nearest Neighbours, Decision trees and Regression trees), Linear Classification (Logistic Regression) and Clustering with K-Means.

Utilise Python (Pandas, Beautiful Soup) to extract Tesla and GameStop financial data to build an automated/live dashboard comparing stock price and company financial metrics.

Use SQL and the SpaceX API to select and sort SpaceX Falcon 9 launch data. Conduct EDA in JupyterLab (Python), then build a dashboard to analyse launch records interactively with Plotly and map launch site proximity with Folium. Split the data into training and test sets, tune hyperparameters for several models (Logistic Regression, Support Vector Machines, Decision Tree Classifier and K-Nearest Neighbours) and use them to predict the successful landing of the Falcon 9 rocket. Communicate the findings in a PowerPoint.
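The split-then-tune step can be sketched roughly like this with scikit-learn. This is not my capstone code: the data comes from `make_classification` as a synthetic stand-in for the launch records, the parameter grids are arbitrary picks, and I've only included two of the four model types to keep it short.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the launch data (assumption, not real SpaceX records).
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Each candidate is a (pipeline, parameter grid) pair. make_pipeline names
# each step after its lowercased class name, hence the grid key prefixes.
candidates = {
    "logistic_regression": (
        make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        {"logisticregression__C": [0.01, 0.1, 1, 10]},
    ),
    "knn": (
        make_pipeline(StandardScaler(), KNeighborsClassifier()),
        {"kneighborsclassifier__n_neighbors": [3, 5, 7, 9]},
    ),
}

results = {}
for name, (model, grid) in candidates.items():
    # 5-fold cross-validation on the training set only.
    search = GridSearchCV(model, grid, cv=5)
    search.fit(X_train, y_train)
    results[name] = {
        "best_params": search.best_params_,
        "test_accuracy": search.score(X_test, y_test),
    }

for name, outcome in results.items():
    print(name, outcome)
```

The test set is only touched once per model, after the grid search has picked its parameters, so the reported accuracy isn't flattered by the tuning.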


Modules

Below is a list of the modules included and a brief summary of their contents:

What is Data Science?

  • Define data science and its importance in today’s data-driven world.
  • Describe the various paths that can lead to a career in data science.
  • Summarize advice given by seasoned data science professionals to data scientists who are just starting out.
  • Explain why data science is considered the most in-demand job in the 21st century.

Tools for Data Science

  • Describe the Data Scientist’s tool kit which includes: Libraries & Packages, Data sets, Machine learning models, and Big Data tools
  • Utilize languages commonly used by data scientists like Python, R, and SQL
  • Demonstrate working knowledge of tools such as Jupyter notebooks and RStudio and utilize their various features
  • Create and manage source code for data science using Git repositories and GitHub.

Data Science Methodology

  • Describe what a data science methodology is and why data scientists need a methodology.
  • Apply the six stages in the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology to analyze a case study.
  • Evaluate which analytic model is appropriate among predictive, descriptive, and classification models used to analyze a case study.
  • Determine appropriate data sources for your data science analysis methodology.

Python for Data Science, AI & Development

  • Learn Python - the most popular programming language for Data Science and Software Development.
  • Apply Python programming logic: Variables, Data Structures, Branching, Loops, Functions, Objects & Classes.
  • Demonstrate proficiency in using Python libraries such as Pandas & Numpy, and developing code using Jupyter Notebooks.
  • Access and web scrape data using APIs and Python libraries like Beautiful Soup.

Python Project for Data Science

  • Play the role of a Data Scientist / Data Analyst working on a real project.
  • Demonstrate your Skills in Python - the language of choice for Data Science and Data Analysis.
  • Apply Python fundamentals, Python data structures, and working with data in Python.
  • Build a dashboard using Python and libraries like Pandas, Beautiful Soup and Plotly using Jupyter notebook.

Databases and SQL for Data Science with Python

  • Analyze data within a database using SQL and Python.
  • Create a relational database and work with multiple tables using DDL commands.
  • Construct basic to intermediate level SQL queries using DML commands.
  • Compose more powerful queries with advanced SQL techniques like views, transactions, stored procedures, and joins.

Data Analysis with Python

  • Develop Python code for cleaning and preparing data for analysis - including handling missing values, formatting, normalizing, and binning data
  • Perform exploratory data analysis and apply analytical techniques to real-world datasets using libraries such as Pandas, Numpy and Scipy
  • Manipulate data using dataframes, summarize data, understand data distribution, perform correlation and create data pipelines
  • Build and evaluate regression models using machine learning scikit-learn library and use them for prediction and decision making

Data Visualization with Python

  • Implement data visualization techniques and plots using Python libraries, such as Matplotlib, Seaborn, and Folium to tell a stimulating story
  • Create different types of charts and plots such as line, area, histograms, bar, pie, box, scatter, and bubble
  • Create advanced visualizations such as waffle charts, word clouds, regression plots, maps with markers, & choropleth maps
  • Generate interactive dashboards containing scatter, line, bar, bubble, pie, and sunburst charts using the Dash framework and Plotly library

Machine Learning with Python

  • Describe the various types of Machine Learning algorithms and when to use them
  • Compare and contrast linear classification methods including multiclass prediction, support vector machines, and logistic regression
  • Write Python code that implements various classification techniques including K-Nearest neighbors (KNN), decision trees, and regression trees 
  • Evaluate the results from simple linear, non-linear, and multiple regression on a data set using evaluation metrics 

Applied Data Science Capstone Project

  • Demonstrate proficiency in data science and machine learning techniques using a real-world data set and prepare a report for stakeholders
  • Apply your skills to perform data collection, data wrangling, exploratory data analysis, data visualization, model development, and model evaluation
  • Write Python code to create machine learning models including support vector machines, decision tree classifiers, and k-nearest neighbors
  • Evaluate the results of machine learning models for predictive analysis, compare their strengths and weaknesses and identify the optimal model

source: https://www.coursera.org/professional-certificates/ibm-data-science

Now on to the meat and potatoes of the review...


Strengths

  • Comprehensive course that prepared me well for my masters in data science (had more coding involved in it than a £14,000 MSc - no I'm not salty at all...).
  • Great structure and content for the price. You can download all the notes, slides and Jupyter Notebooks provided, which I referred to more than once when I started my first Data Analyst position.
  • A well-respected certification. From Reddit to LinkedIn, Data Nerds across the simulation refer to this certification as a great starting point, and I agree.
  • You get a certification badge for LinkedIn and a printable certificate, which is worth attaching to job application portals in the additional documents section.
  • Great prospects. Each to their own, but with this certification, a BA in Accounting and Financial Management, 60k worth of student debt and a plethora of kitchen work experience on my CV, I received several interviews for data-related roles and settled for a Business Operations Analyst role with an up-and-coming start-up. Obviously there are lots of variables at play here, but for simplicity: I believe having this on my CV stood me in good stead for a career in data.

Limitations

  • Due to the extensive coverage of this certification it doesn't go into the
    minutiae of the machine learning models which a £14k masters would.
  • My first Data Scientist position reinforced one of my old professor's warnings: that data engineering is 80% of the job. This is where the course falls short in terms of technical skills. All data sets are neatly provided and little to no data formatting/transformation is needed. It will come as a big shock when you need to explode some dicts and handle varchars, missing data and duplicated data in the workplace.
  • I'm convinced that some people are just marking to receive their mark. A caveat of this course is that you only receive your grade for module assignments once you've marked 2 of your colleagues' submissions. Only once did I receive a mark that wasn't full marks, which yes is flattering, but I would have liked some more in-depth feedback. So whether the grade accurately reflected the work done or was just rushed through by someone eager to receive their own grade, I don't know.
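
To give a taste of the workplace data wrangling mentioned above, here's a minimal pandas sketch. The records are entirely made up (hypothetical `order_id`, `customer` and `items` fields), and it only covers the basics: flattening a nested dict, casting an ID stored as text, dropping a duplicate, exploding a list and filling a missing value.

```python
import pandas as pd

# Invented records: a nested dict, a list column, a duplicate row and a
# missing city, with the ID stored as text.
raw = [
    {"order_id": "1001", "customer": {"name": "Ada", "city": "Leeds"}, "items": ["tea", "milk"]},
    {"order_id": "1001", "customer": {"name": "Ada", "city": "Leeds"}, "items": ["tea", "milk"]},
    {"order_id": "1002", "customer": {"name": "Bo", "city": None}, "items": ["bread"]},
]

# Flatten the nested dict into columns named customer.name and customer.city.
df = pd.json_normalize(raw)

# The ID arrived as a string, so cast it to an integer.
df["order_id"] = df["order_id"].astype(int)

# Lists aren't hashable, so de-duplicate on a hashable key before exploding.
df = df.drop_duplicates(subset=["order_id"])

# One row per item instead of one list per order.
df = df.explode("items").rename(columns={"items": "item"}).reset_index(drop=True)

# Fill the missing city with a placeholder.
df["customer.city"] = df["customer.city"].fillna("unknown")

print(df)
```

None of this is hard on its own; the shock is that real data needs all of it at once, and nobody tells you which bits in advance.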

Summary

Let's keep this snappy, I've got meal prep to make (chicken, feta, hummus, olive, butter bean, and tomato salad).

I'd recommend this course for anyone wanting to start their journey in Data Science, whether you've just read the inspiring Harvard Business Review article, you're already in the data world as an analyst or engineer and want to get to the science'y part, or you've been studying biology for 3 years and want to earn mega bucks as a biostatistician. This is the perfect certification to get you started.


TLDR

"The IBM Data Science Professional Certification is an accessible and comprehensive online course ideal for beginners, requiring no prior experience and offering flexible learning at 10 hours a week. It includes a broad range of topics from Python programming to machine learning techniques, with practical assignments in Jupyter Notebooks. The course offers good value at £29 per month on Coursera, and successfully completing it can bolster one's CV, as evidenced by the author's transition to a Masters in Data Science and a career in Data Analysis. However, it lacks depth in machine learning theory and data engineering skills, which may require further learning." - Mr GPT