Review: SQL for Data Science Specialization by UC Davis
Program Overview
There are winners and losers in the realm of analytics, and SQL is definitely a winner. After aggregating thousands of job descriptions for analytics careers, we identified SQL as the #2 most sought-after skill among employers, behind Tableau (which UC Davis also covers, btw).
Enter UC Davis's SQL Basics for Data Science Specialization. This specialization runs the gamut of concepts, from the basics of SQL syntax all the way to more theoretical concepts like SQL in Machine Learning and AB testing. If you're interested in learning more about the general field of analytics first, check out our pieces on data scientists and product analysts to get a better handle on the responsibilities. Either way, SQL is a very applicable skill for many different tech careers.
The courses also cover topics such as the importance of cleaning data, how to optimize queries for performance, and how to use Databricks to enhance a company's data science program. The program finishes with a hands-on capstone where students can apply their newfound skills to complete a comprehensive project with several custom datasets and metrics.
The four courses to complete the specialization are:
- SQL for Data Science
- Data Wrangling, Analysis and AB Testing with SQL
- Distributed Computing with Spark SQL
- SQL for Data Science Capstone
The material is taught by real (and impressive, might we add) industry professionals, including two data scientists at Databricks and Sadie St. Lawrence, the founder of Women in Data, who is an even kinder soul in person!
If you only have time for one course and want to be extra dangerous this year (maybe you're a product manager looking to influence others with your data skills), we especially loved the very first course, which focuses on SQL fundamentals. We loved it because it was super practical, and anyone who completes it could walk away able to analyze a database with the most popular queries.
The curriculum is meant to be consumed over a 16-ish week period, but it can also be completed at your own pace. From the reviews, we think you could finish in as little as 6-8 weeks if you really push yourself.
Best for: Beginners, SQL Newbs and Aspiring Analytics Professionals
This course has some great material, but it is definitely aimed at an audience with no prior knowledge of data science or SQL. If that's you, great! We definitely recommend weighing this course as a way to broaden your understanding of the role and its responsibilities.
If you have experience with data or web analytics tools, even in another program like Tableau, this specialization may be taught at a pace that's a little slow for you. If that's you, we've compiled some supplemental resources for your learning that progress at a more engaging pace.
After completing the specialization, you'll be equipped to apply for entry-level analyst positions, and have a leg up at companies that use SQL (which is a lot of them!).
Weekly Breakdowns
We’ve recapped the learning objectives from each week to set your expectations for course material. The great part about this program is that you can jump to any course, and any section if it’s interesting to you. For example, if you’re looking to just learn the rudimentary syntactical commands of SQL, check out course 1, week 2.
To audit an individual week, find the exact course (we've linked them individually here) and click "audit" to save it to your profile. Then open the desired week on the side panel that aligns with our recaps.
Course 1: SQL for Data Science
Learning Objectives from Week 1: Getting Started and Selecting & Retrieving Data with SQL
- Learn the difference between SQL for data science applications and SQL for common data management.
- Utilize an Entity Relationship diagram to show the relationships and inter-dependencies of data elements to answer business questions.
- Practice retrieving relevant columns of data from a table using SQL queries.
- Review basics of SQL and practice adding comments in queries for collaborators to understand.
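To give a flavor of what Week 1 looks like in practice, here is a minimal sketch of retrieving specific columns with a commented query. It runs SQL through Python's built-in sqlite3 module; the `customers` table and its rows are our own invented example, not course material.

```python
import sqlite3

# Hypothetical table and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "id INTEGER PRIMARY KEY, name TEXT, city TEXT, signup_date TEXT)"
)
conn.executemany(
    "INSERT INTO customers (name, city, signup_date) VALUES (?, ?, ?)",
    [
        ("Ana", "Davis", "2023-01-15"),
        ("Ben", "Sacramento", "2023-02-03"),
        ("Cy", "Davis", "2023-03-22"),
    ],
)

query = """
-- Retrieve only the columns the question needs (rather than SELECT *).
SELECT name, city  /* signup_date is deliberately left out */
FROM customers;
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Both comment styles (`--` for a single line, `/* ... */` inline) are standard SQL, so collaborators reading the query later can see why a column was skipped.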
Learning Objectives from Week 2: Filtering, Sorting and Calculating Data with SQL
- Showcase the difference between working with filtered and unfiltered data sets, including performance metrics.
- Get familiar with SQL syntax, including basic clauses and operators (WHERE, IN, NOT, AND, OR), aggregate functions (AVG, COUNT, MAX, MIN, SUM), sorting (ORDER BY) and summary terms (GROUP BY and HAVING).
- Learn how to utilize wildcards in data filtering and searching situations.
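The Week 2 keywords are easier to picture with a query or two. The sketch below is our own illustration (the `orders` table and its values are made up), again using Python's sqlite3 module so it can be run anywhere.

```python
import sqlite3

# Hypothetical orders table, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT, category TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, category, amount) VALUES (?, ?, ?)",
    [
        ("Ana", "books", 12.0),
        ("Ana", "games", 30.0),
        ("Ben", "books", 8.0),
        ("Ben", "garden", 20.0),
        ("Cy", "games", 45.0),
        ("Cy", "books", 15.0),
        ("Ana", "garden", 5.0),
    ],
)

# Filtering (WHERE, IN, AND) plus sorting (ORDER BY).
filtered = conn.execute(
    """
    SELECT customer, amount
    FROM orders
    WHERE category IN ('books', 'games') AND amount >= 12
    ORDER BY amount DESC;
    """
).fetchall()

# Wildcard search: % matches any run of characters after the leading 'g'.
g_category_count = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE category LIKE 'g%';"
).fetchone()[0]

# Aggregates with GROUP BY, then HAVING to filter the grouped rows.
big_spenders = conn.execute(
    """
    SELECT customer, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > 40
    ORDER BY total DESC;
    """
).fetchall()

print(filtered)
print(g_category_count)
print(big_spenders)
```

The thing to notice is the split between WHERE, which filters individual rows before grouping, and HAVING, which filters the groups after the aggregates are computed.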
Learning Objectives from Week 3: Subqueries and Joins in SQL
- Use subqueries to connect multiple tables and retrieve data.
- Practice filtering a dataset using set theory by joining tables using Natural, Inner, Outer, and Self Joins.
- Compare the pros and cons of Cartesian (cross) joins.
- Learn how to create an analysis table from multiple queries using the UNION operator.
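For readers who want to see the Week 3 ideas side by side, here is a small sketch of a subquery, an inner join, a left outer join, a cross join, and a UNION. The two tables are hypothetical and the queries are our own examples, not the course's exercises.

```python
import sqlite3

# Two hypothetical tables, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL, amount REAL)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Ben"), (3, "Cy")]
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, 30.0), (2, 1, 12.0), (3, 2, 20.0)],
)

# Subquery: customers with at least one order over 15.
subquery_rows = conn.execute(
    """
    SELECT name FROM customers
    WHERE id IN (SELECT customer_id FROM orders WHERE amount > 15)
    ORDER BY name;
    """
).fetchall()

# Inner join: only customers that have matching orders.
inner_rows = conn.execute(
    """
    SELECT c.name, o.amount
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.name, o.amount;
    """
).fetchall()

# Left outer join: keeps customers with no orders (count of 0).
left_rows = conn.execute(
    """
    SELECT c.name, COUNT(o.id) AS n_orders
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY c.name;
    """
).fetchall()

# Cross (Cartesian) join: every customer paired with every order.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM customers CROSS JOIN orders;"
).fetchone()[0]

# UNION: stack two query results into one analysis table.
union_rows = conn.execute(
    """
    SELECT name, 'has orders' AS status FROM customers
    WHERE id IN (SELECT customer_id FROM orders)
    UNION
    SELECT name, 'no orders' FROM customers
    WHERE id NOT IN (SELECT customer_id FROM orders)
    ORDER BY name;
    """
).fetchall()

print(subquery_rows, inner_rows, left_rows, cross_count, union_rows)
```

The cross join returns 3 x 3 = 9 rows from tiny tables, which hints at the main downside: on real data the row count multiplies quickly.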
Learning Objectives from Week 4: Modifying and Analyzing Data with SQL
- Integrate data from different sources by using strings, dates, and numeric data.
- Identify circumstances that need a join when preparing data for analysis, including organizational, governance, business, and data considerations.
- Practice implementing the 3 rules for translating an analysis question into a SQL statement: (1) identify the columns needed for the analysis, (2) specify the conditions for filtering the data, and (3) define the desired level of aggregation.
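The three rules above map neatly onto the parts of a SELECT statement. As a rough sketch, suppose the analysis question is "what was monthly revenue per region from completed orders in 2023?" The `sales` table, its messy region labels, and the question itself are all assumptions we made up for the example.

```python
import sqlite3

# Hypothetical sales table with inconsistent region strings.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (order_date TEXT, region TEXT, status TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("2023-01-05", " west ", "complete", 100.0),
        ("2023-01-20", "West", "complete", 50.0),
        ("2023-01-25", "east", "cancelled", 70.0),
        ("2023-02-02", "EAST", "complete", 80.0),
        ("2022-12-30", "west", "complete", 999.0),
    ],
)

monthly_revenue = conn.execute(
    """
    -- Rule 1: the columns needed (date and string functions tidy them up).
    SELECT strftime('%Y-%m', order_date) AS month,
           UPPER(TRIM(region))           AS region_clean,
           SUM(amount)                   AS revenue
    FROM sales
    -- Rule 2: the conditions for filtering the data.
    WHERE status = 'complete' AND order_date >= '2023-01-01'
    -- Rule 3: the level of aggregation.
    GROUP BY month, region_clean
    ORDER BY month, region_clean;
    """
).fetchall()

print(monthly_revenue)
```

Giving the cleaned column a new alias (`region_clean`) rather than reusing `region` is a deliberate choice: it keeps GROUP BY from grouping on the raw, untrimmed column by mistake.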
Course 2: Data Wrangling, Analysis and AB Testing with SQL
Learning Objectives from Week 1: Data of Unknown Quality
- Practice identifying trustworthy and unreliable data points.
- Troubleshoot why some data might be missing.
- Learn how to answer more ambiguous questions by defining new metrics to measure.
Learning Objectives from Week 2: Creating Clean Datasets
- Practice naming categories of data types and using tools to create trustworthy tables.
- Show how unfiltered data can be worked into a table, and learn why a data warehouse is different than a production database.
Learning Objectives from Week 3: SQL Problem Solving
- Practice mapping out joins and identifying the level of detail needed to answer different types of questions.
- Practice creating plans to answer all questions with a data model.
Learning Objectives from Week 4: AB Testing Case Study
- Learn how to use SQL with an AB testing calculator tool.
- Practice checking data quality and identifying key metrics that are tied to business value.
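To make the AB testing week more concrete, here is a hedged sketch of the general workflow: check data quality, summarize each test group in SQL, then hand the counts to a significance calculation. We don't know which calculator tool the course uses, so a generic two-proportion z-test written in Python stands in for it. The tables and the numbers are invented.

```python
import math
import sqlite3

# Hypothetical experiment data: 1,000 users per group.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assignments (user_id INTEGER PRIMARY KEY, test_group TEXT)"
)
conn.execute("CREATE TABLE conversions (user_id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO assignments VALUES (?, ?)",
    [(i, "control" if i <= 1000 else "treatment") for i in range(1, 2001)],
)
# Made-up outcome: 100 control users and 130 treatment users convert.
converted_ids = list(range(1, 101)) + list(range(1001, 1131))
conn.executemany(
    "INSERT INTO conversions VALUES (?)", [(i,) for i in converted_ids]
)

# Data-quality check: conversions from users who were never assigned.
orphans = conn.execute(
    """
    SELECT COUNT(*)
    FROM conversions AS c
    LEFT JOIN assignments AS a ON a.user_id = c.user_id
    WHERE a.user_id IS NULL;
    """
).fetchone()[0]

# Key metric per group: users exposed and users converted.
summary = conn.execute(
    """
    SELECT a.test_group, COUNT(*) AS users, COUNT(c.user_id) AS converted
    FROM assignments AS a
    LEFT JOIN conversions AS c ON c.user_id = a.user_id
    GROUP BY a.test_group
    ORDER BY a.test_group;
    """
).fetchall()

# Two-proportion z-test (our stand-in for an AB testing calculator).
(_, n_a, x_a), (_, n_b, x_b) = summary
p_a, p_b = x_a / n_a, x_b / n_b
pooled = (x_a + x_b) / (n_a + n_b)
std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / std_err
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided

print(summary, round(z, 2), round(p_value, 3))
```

The SQL does the counting and the statistics happen afterwards, which is roughly the division of labor you'd expect when pairing queries with an external calculator.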
Course 3: Distributed Computing with Spark SQL
Learning Objectives from Week 1: Introduction to Spark
- Learn the basic data structure of Spark, also known as a DataFrame.
- Practice using the collaborative Databricks workspace and writing SQL code that executes against a machine cluster.
- Identify the core concepts of distributed computing, and when it's useful for an organization.
Learning Objectives from Week 2: Spark Core Concepts
- Learn the core concepts of Spark and use Spark SQL.
- Practice caching data and making configurations for increased performance.
- Show how Spark UI can analyze performance and identify bottlenecks.
Learning Objectives from Week 3: Engineering Data Pipelines
- Access and compare the tradeoffs between a variety of data formats.
- Examine semi-structured JSON data.
- Create and train an end-to-end pipeline that reads, transforms, and saves data.
Learning Objectives from Week 4: Data Lakes, Warehouses and Lakehouses
- Identify characteristics of data lakes and warehouses, and highlight the advantages of a lakehouse architecture.
- Demonstrate the value add of combining Delta Lake with Apache Spark.
- Practice building your own lakehouse with Delta Lake.
- Recap how to utilize Spark for data science & machine learning use cases.
Course 4: SQL for Data Science Capstone Project
Week 1 Milestone: Project Proposal and Data Selection/Preparation
- Select your client, audience, datasets and create a project proposal for a data analysis.
- Develop a working Entity Relationship Diagram (ERD).
Week 2 Milestone: Descriptive Stats & Understanding Your Data
- Characterize, analyze and clean your dataset.
- Prove or disprove your project hypothesis.
Week 3 Milestone: Beyond Descriptive Stats
- Track relationships in your data by using advanced SQL techniques.
Week 4 Milestone: Present your Findings with a Data Story
- Use best practices to identify your audience and nail your presentation.
- Present your insights and recommendations to a group.
Cost and Auditing
The program is only $39/month, and comes with a LinkedIn certificate on behalf of the University of California, Davis. If you complete the curriculum on the proposed timeline, it should take about 4-5 months, though you could blitz through it on a break in far less. While that may seem steep, compared to a degree or bootcamp this micro-certification is a steal!
If you have a learning budget, or are dedicated to upskilling your career with a data focus, we recommend paying for and completing the program to get the shareable certificate (GET RECEIPTS!). This will help make your LinkedIn more searchable to recruiters who may be looking for specific keywords and programs.
If you just want to audit the program and learn the material, it's completely free! Thanks Coursera!
Student Reviews
This program has been around since 2020, and the concepts are fairly up to date and relevant. The capstone has a little under 200 reviews, which is to be expected for a predominantly online class. By Coursera standards, the course is incredibly popular and highly rated!
Some of our favorite positive review points:
- I thought this course was great! Great introduction to Relational Databases and SQLite. Highly recommend for anyone new to SQL, Databases, or someone looking to get started with a data science career.
-Joshua G.
- The course starts with the definition of SQL and how it is different from other computer languages. This course also provides related reading resources, which helped me gain more insights into this field and come to know about good resources from where I can practice this newly acquired skill set.
This course also introduced ER diagrams, necessary clauses, and operators, including WHERE, BETWEEN, IN, OR, NOT, LIKE, ORDER BY, and GROUP BY, subqueries, and joins with advantages and disadvantages. You will be able to use the wildcard function to search for more specific or parts of records, including their advantages and disadvantages and how best to use them.
You will be able to discuss how to use basic math operators and aggregate functions like AVERAGE, COUNT, MAX, MIN, and others to begin analyzing our data. It also discussed how to modify strings by concatenating, trimming, changing the case, and using the substring function. Also discussed the date and time strings specifically.
You will be able to use case statements and finish this module by discussing data governance and profiling. You will also be able to apply fundamental principles when using SQL for data science. You'll be able to use tips and tricks to apply SQL in a data science context.
-Alpesh G.
- Great high level overview for Spark beginners with focus on application. Course materials are reasonably up to date and well designed. Might be nice if there was a PySpark complement to this course but I understand that it's part of the SQL specialization. Would highly recommend.
-David Y.
A roundup of negative review points:
- Material took way less than the given timeline (this feels like a good thing to us)
- Course 3 tried to cover too much ground and had a lot of superficial Databricks content.
- Capstone project is peer-reviewed and felt like an unnecessary addition to the specialization.
... and our favorite overall review:
Well-crafted course for a beginner. You don't need any prior knowledge of programming or other languages like C. The course pushes you to read about SQL from other sources as well. The quizzes are also designed well for enhanced learning. The final assignment was also interesting as it requires you to work like a data scientist, design your own problem and solve it. Final word: Go for this course if you're looking for a basic introduction to SQL.
-Vishal G.
Supplemental Materials
For the fans of the data-storytelling piece: Udemy's Data Storytelling
This class has more than ten thousand reviews and averages a 4.5 star rating. Upon completion you will be able to use a 5-stage arc to tell memorable stories and engage your audience.
It's tool agnostic, so if you're just looking for the story piece, this might be for you. Plus, it's only twenty bucks so your risk is low.
For the fans of data visualization: Mastering Data Visualization: Theory and Foundations
This class is extremely highly rated with a 4.7 and more than two thousand students. It focuses on how to present data clearly and effectively. It's a more high-level course and not based on a tool (take that how you will), but a great start for beginners who may be interested in the concept. The professor brags that once you take this class, you'll look at all charts in a new light!
For analysts who just need Tableau Help: Tableau 2022 A-Z: Hands-On Tableau Training for Data Science
This course is extremely popular on Udemy with more than eighty thousand reviews and a bestseller tag. It's also extremely highly rated with a 4.6 star average.
The material is more up to date than UC Davis's, and goes way deeper into the functionality of Tableau as a tool. We'd recommend this as a follow-on to the course after you've learned the basics, or if you're familiar with another analytics tool and looking for something with a quicker pace.
Best Data Science Specializations in 2023
This is specifically aimed at Coursera Data Analytics programs, so if you sign up for Coursera unlimited, theoretically you could stack all of these. We don't recommend that, but we do recommend checking these out and seeing if any of them hit particular concepts that are interesting to you.
For the web analytics power-user: Google's Data Analytics Specialization
Google also sponsors a data analytics certificate program through Coursera. This is one of the more coveted certificates in the industry for learning the Google Analytics tool specifically, hence our recommendation. Google's course is also free to audit, but the same rules apply if you want the certificate to show off: it runs $49 a month.
Comparable mid-level program: University of Minnesota's Analytics for Decision Making
The University of Minnesota runs a great program with a 4.7 star average. It's free to audit, but if you want the certificate it's covered under a $49/month Coursera subscription. We especially love course 2 for the experimenters out there... you can never go wrong with data-driven optimization strategies.
This is a newer beginner-level class that has a great overview of types of analytics, and when to use each method to maximize effectiveness.
For a comprehensive overview of analytics fields: Wharton's (UPenn) Business Analytics Specialization
Wharton is a prestigious business school and offers a great overview of different analytics fields, including marketing, ops and HR analytics.
This is definitely a beginner-level specialization for people looking to identify their favorite concepts. Read our full writeup here.
More data science-y: University of Michigan's Applied Data Science Specialization
The University of Michigan also runs a great specialization that focuses on Python techniques for effective data science. The reviews said it was pretty tough, but it might be worth it if you're looking to expand your skillset into data science and are already enrolled in other Coursera options. Read our full writeup here.
Conclusion
UC Davis's program is a great way to sharpen your SQL skillset in a traditional online classroom format. A certification in SQL is a great way to show diligence and focus on analytics technology without breaking the bank with an additional college degree or bootcamp.
Here at Bridged we are huge fans of stacking micro-certifications to achieve desired career results. This program could be one notch in your arsenal to really kick your technical expertise into gear!