The course 5340 Discovery and Learning with Big Data seemed to carry over a title from a different era, because by this time the term “Big Data” isn’t really in vogue anymore. Is there any other kind of data now? I think this course could easily have been titled “An Introduction to Machine Learning Using Python”. It began with a crash course/review of Python, which we then used for the rest of the course in a Jupyter notebook environment.

Just as Posit Cloud was a great resource for learning and using the R programming language in my previous course, this time I discovered Google Colab, and I was in love! Unfortunately, we still had to go through the full local Anaconda installation and environment configuration process for our first assignment. Thankfully, our professor agreed to let us use Anaconda Cloud and Google Colab for the rest of the assignments and classwork. I tried Anaconda Cloud and really wanted to like it; it had great Python courses, among other things. In the end, however, Google Colab won out for me.

Python Class Books

The crash course in Python at the start of the course consisted of 12 homework assignments/exercises covering programming basics, basic Python data types, and data structures (lists, ranges, strings, tuples, pandas Series and DataFrames, and NumPy arrays). The lectures began with the basics of the data analytics life cycle and exploratory data analysis (EDA), including data visualization using Matplotlib, which I did not care for at all. Just plain ugly. By this time I had already used Plotly for Python, Seaborn, Altair, and Bokeh, and preferred any of them over Matplotlib.
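For readers who haven't seen these structures side by side, here is a small sketch of the ones listed above. It assumes NumPy and pandas are installed (both come preinstalled in Google Colab). The variable names and values are made up for illustration and are not taken from the course assignments.

```python
import numpy as np
import pandas as pd

# Core Python data structures (illustrative values only)
scores = [88, 92, 79]        # list: ordered and mutable
weeks = range(1, 4)          # range: lazy sequence of 1, 2, 3
name = "Python"              # string: immutable text
point = (3.5, 7.2)           # tuple: ordered and immutable

# NumPy array: a single dtype, with vectorized math
arr = np.array(scores)
curved = arr + 5             # adds 5 to every element at once, no loop needed

# pandas Series: a 1-D array with index labels
s = pd.Series(scores, index=[f"week{w}" for w in weeks])

# pandas DataFrame: a 2-D table of labeled columns
df = pd.DataFrame({"week": list(weeks), "score": scores})
df["curved"] = df["score"] + 5   # new column computed column-wise

print(curved)          # [93 97 84]
print(s["week2"])      # 92
print(df.describe())   # summary statistics, a common first EDA step
```

The NumPy and pandas lines do the same "add 5" operation without an explicit loop, which is the main reason the course moved from plain lists to these libraries for data work.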

The rest of the course moved rapidly from an introduction to Machine Learning using Python (NumPy, pandas, scikit-learn) to Supervised Machine Learning (Linear and Logistic Regression, CART, and KNN), Unsupervised Machine Learning (K-Means and Anomaly Detection), and the Azure Machine Learning Studio (a drag-and-drop, wireframe-style, no-code ML tool).
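The supervised/unsupervised split above can be sketched in scikit-learn roughly as follows. This is a minimal example under stated assumptions, not the course's actual notebooks. It uses scikit-learn's bundled iris and diabetes datasets instead of the course data, and hyperparameters such as `n_neighbors=5` and `contamination=0.05` are illustrative. `DecisionTreeClassifier` stands in for CART (scikit-learn documents its trees as an optimized version of CART). `IsolationForest` is just one of several anomaly-detection methods and may not be the one the course used.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_diabetes, load_iris
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# --- Supervised: regression (continuous target), bundled diabetes data ---
X_reg, y_reg = load_diabetes(return_X_y=True)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(
    X_reg, y_reg, test_size=0.25, random_state=42)
lin = LinearRegression().fit(Xr_train, yr_train)
r2 = r2_score(yr_test, lin.predict(Xr_test))  # evaluate on held-out data

# --- Supervised: classification (categorical target), bundled iris data ---
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

classifiers = {
    # Scaling helps distance-based KNN and the logistic solver converge
    "logistic": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "cart": DecisionTreeClassifier(random_state=42),  # scikit-learn's CART-style tree
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}
accuracies = {}
for model_name, model in classifiers.items():
    model.fit(X_train, y_train)
    accuracies[model_name] = accuracy_score(y_test, model.predict(X_test))

# --- Unsupervised: clustering (the labels y are never used) ---
X_scaled = StandardScaler().fit_transform(X)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X_scaled)
cluster_labels = kmeans.labels_

# --- Unsupervised: anomaly detection (IsolationForest is an assumed choice) ---
iso = IsolationForest(contamination=0.05, random_state=42).fit(X_scaled)
flags = iso.predict(X_scaled)  # 1 = inlier, -1 = flagged anomaly

print(f"Linear regression R^2: {r2:.2f}")
for model_name, acc in accuracies.items():
    print(f"{model_name}: accuracy {acc:.2f}")
print("cluster sizes:", [int((cluster_labels == k).sum()) for k in range(3)])
print("anomalies flagged:", int((flags == -1).sum()))
```

All of these models share the same `fit`/`predict` interface. That is much of what made it possible to move so quickly from one algorithm to the next. The supervised models are scored on a held-out test split, while the unsupervised ones never see `y` at all.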

Tools Utilized: Python (NumPy, Pandas, Scikit-Learn), Jupyter Notebooks, Google Colab, Microsoft Azure Machine Learning Studio
Skills Acquired/Developed: Data Analytics Life Cycle, Preprocessing, Exploratory Data Analysis (EDA), Supervised Machine Learning Algorithms, Supervised Non-Linear Algorithms, Unsupervised Algorithms, Evaluating Algorithms