In September 2016, almost two years ago, I decided to upgrade my DataCamp account to a paid subscription. Luckily, back then there was a $9/month sale and I’m still on that contract. Currently, they’re charging $25 a month - not cheap, but definitely worth the money! Right after joining their paid service, I received an email telling me that I would become a “proficient Data Science Professional in no time!”. Well, that’s quite a value proposition! Spoiler alert: The experience was (and is) really great, but this claim is seriously far-fetched!

DataCamp Certificate (Data Science, Ingo Kleiber) An example of what DataCamp certificates look like (a.k.a. bragging).


Warning: This post will be both a report of my personal experience with DataCamp and a (rather narrative) review of their (Python) career track. Also, much of this review will sound like a blatant plug. I’m in no way associated with DataCamp - I just really like what they are doing!

TL;DR: DataCamp is really great and extremely good value. However, you’ll most likely not be a fully-fledged “Data Scientist” at the end of it. The courses are, nonetheless, a great addition to your education if you’re already well-rounded in some field and you want to ‘brush up’ on some coding and statistics.



Alright, let’s step back for a moment. DataCamp is an education company that offers various interactive courses in data science and especially R and Python. While there are many vendors offering similar courses, DataCamp scores with amazing teachers (and often true data science experts), great interactive exercises, a great community, as well as a very solid and (didactically as well as technically) very sound curriculum.

Back in 2016, I needed to learn some R and, for the reasons mentioned above, ended up with DataCamp, which at the time was heavily focused on R. I was happy with the service and quickly learned everything about R that I needed to know. Ultimately, the exercise-heavy approach meant that I was able to use the language effectively very quickly. That being said, the courses (that I took) did not force me to understand the inner workings of the language and the various packages at all. Well, DataCamp didn’t claim to be a CS degree!

Slowly, but steadily, the DataCamp team started to come up with more and more courses related to Python - my usual ‘weapon’ of choice. At the time, out of curiosity, I took a few of their courses and really enjoyed them.

Usually, DataCamp courses consist of a combination of (very well done) instructional videos followed by interactive coding exercises in an IPythonesque web-IDE. Also, sometimes a little formative assessment in the form of multiple choice quizzes is thrown your way. Overall, the level of instruction is really great and the exercises, for the most part, are engaging and based on real-life datasets and problems.

Time passed, I kept my subscription, and in May 2017 DataCamp released a new feature called ‘Career and Skill Tracks’. These tracks are essentially predefined paths (curricula) through a specific topic and are supposed to help you find the right courses for your current goals (i.e. the things you want to study).

Being slightly addicted to online courses and eLearning in general (also for professional reasons), I decided to enroll in the Python Programmer career track. Almost two years after first subscribing, I happily finished their most advanced track: Data Scientist with Python. While I started the track out of mere curiosity, at some point I really wanted to finish it. Maybe gamification works after all … I mean, I collected over 130,000 XP points; that must mean something!

DataCamp Career Tracks (as of June 2018) The three DataCamp Python career tracks (simplified; as of June 2018). Please don’t sue me for using your lovely badge-icons!


As you can see above, the DataCamp team currently offers three Python career tracks. There are, as you probably expected, parallel tracks for R. Basically, with a few minor exceptions, the Analyst track contains everything the Programmer track includes, and the Data Scientist track in turn builds on the Analyst one. There are also many highly interesting courses available that are not officially part of the career tracks. For example, I can recommend Katharine Jarmul’s Natural Language Processing Fundamentals in Python. Over the course of 15 videos and 51 exercises, she goes from “What is a token?” to “How to build a fake news classifier?”.

As I already mentioned, I progressed through all three (Python) career tracks over a period of roughly one year. According to DataCamp, the Data Scientist with Python track (from now on: the track), consisting of 22 individual courses, will take you 67 hours of work to complete. Since I took the courses in low-priority mode (alongside my actual job), I can’t really judge the validity of this estimate. However, if one wants to really work through the problems and exercises, 67 hours seems optimistic.

The track is designed to take you from novice (i.e. almost no knowledge of programming) to “Data Scientist” (whatever that exactly means) over the course of 22 courses. Since at the time of starting the track I already had a solid foundation in statistics and research (that’s what a couple of years working at a university gets you) as well as rather advanced knowledge of Python, the first couple of courses felt like a fast-forward recap of various classes and projects that, back then, definitely took me more than 67 hours.

Also, I didn’t exactly follow the predefined (linear) path because I had already taken some of the required courses before tracks were available. From a didactic perspective, however, following the track’s progression seems very promising and rewarding. The courses are selected well and, with some (minor) exceptions (see below), the choice of content makes a whole lot of sense.

All of that being said, the introductory courses (both for Python and R) are fantastic and well designed. While you’re being thrown in at the deep end, you’ll learn to swim quite fast! However, if you have absolutely no experience in Python whatsoever, I would consider taking some prior classes (e.g. via Codecademy) geared more towards general coding skills rather than data science.

The (loosely spiral) curriculum itself also works quite well and is well-rounded. After building foundations in Python and the SciPy stack, it swiftly progresses towards machine learning and deep learning, modeling, data management, and advanced statistical methods. Databases and various visualization techniques are also introduced along the way.

Ultimately, I had only two serious issues with the curriculum. Firstly, some specific aspects and techniques became very repetitive after a short while. Put simply, I really don’t need to practice importing matplotlib over and over again. Overall, a certain level of (automated) adaptability would greatly enhance the (sometimes cumbersome but mostly very engaging) exercise experience! Of course, repetition and practice are important, but some of the courses need tweaking in their level of repetitiveness.

Secondly, DataCamp, first and foremost, teaches you how to code. While there are excellent theoretical courses in the curriculum (e.g. Justin Bois’ Statistical Thinking in Python), theoretical and methodological considerations are often treated as less important. Personally, I would like to see more courses geared towards building background knowledge.

Most of the courses I took were really engaging and I enjoyed working through most of the (real-world) examples and datasets. I commend DataCamp and their teachers for providing a lot of specialized datasets alongside the ‘usual suspects’ (Iris!). To my surprise, I was often able to find (useful) links between my day job as a researcher/teacher and the DataCamp courses! This leads me to the conclusion that the DataCamp team really thinks about skills that are required ‘in the wild’. Nevertheless, to me, some of the classes and exercises (especially the ones on pandas) felt rather tedious. This, however, doesn’t necessarily have to be a bad thing. While these classes weren’t the most engaging ones, they still delivered on what they promised.

A big issue with many online classes is whether they are able to deliver on their promises. For this course (or rather this track) there are at least two things to consider:

  1. The track promises (or at least hints at) you becoming a ‘data scientist’. This is, even considering how underdefined the term ‘data scientist’ is, an unrealistic promise for a program that is supposed to be achievable in only a couple of weeks. However, if you have a solid (academic) background in research and a specific field, the track certainly can fill in the gaps. In my case, the DataCamp curriculum felt like a very fitting and valuable addition to my previous coursework and my professional experience. As I said, I highly commend them for providing classes that are clearly tailored towards real-world needs!

  2. From a career (read: CV) perspective, the validity or worth of a DataCamp certificate needs to be considered. In other words: Would it be a good idea to put a DataCamp certificate on your CV? The answer to this question comes down to the fact that there’s no way of proving that you actually did the coursework yourself and that you didn’t cheat [1]. Since there’s no proctoring going on, the certificates, albeit shiny and motivating, are, in reality, weak signaling instruments. Hence, what it ultimately comes down to is the ability to show (and apply) what you’ve learned in these courses.

[1] ‘Cheating’, i.e. getting XP for an exercise you were unable to solve, is fairly trivial. Get all hints for an exercise (including the actual solution), reload the page before submitting the solution, and paste the solution in. I only did this for a couple of the super-repetitive exercises - I promise!

Making DataCamp Even Better

While there’s a lot to love about the track (and all the other material such as the DataFramed podcast), I would like to suggest four things that could be optimized:

  1. Adaptive exercises that react to the courses and exercises already taken. For example, if I’ve successfully completed Joining Data in PostgreSQL, I probably do not need to practice simple SELECT queries. An alternative would be the option to skip certain basic courses within the track after passing some sort of placement test.

  2. Offer more advanced courses and especially more courses related to statistics, math, and (research) theory and methodology. The teachers are great, the platform is great, and the coding content is great, too. What’s missing, from my point of view, is more content regarding the (maybe less exciting) theory behind everything. Having said all of this, the upcoming courses seem really promising with regard to this criticism (e.g. they have a course planned on Linear Algebra for Data Science in Python).

  3. Update the tracks with the current courses available. For example, I don’t understand why Network Analysis in Python II, Introduction to Shell for Data Science, and Introduction to Git for Data Science are not part of the career track(s)!

  4. Based on my short discussion of validity and trust above, I would love to see some options regarding proctored exams and validated certificates.

Conclusion

After two years of using their service, I’m still recommending DataCamp to colleagues and friends who want to brush up on (or start building) their ‘data’ skillset. Looking at the upcoming courses, I’m really excited. Currently, I’m particularly looking forward to the courses related to geospatial data and RNNs!

Of course, you can find me on DataCamp! I also sometimes hang around the Slack channel :).