After completing my first two Information Science courses, along with a Crash Course in Data Science from Johns Hopkins University, I decided to take two data-analytics-related courses. Now that I have completed both (Data Analysis and Knowledge Discovery, and Data Visualization and Communication), I have decided to end my brief journey in the Information Science graduate program and pursue a Master of Science in Advanced Data Analytics. A new journey begins!
You May Also Enjoy
Analyze Trends and Predict College Enrollment
For my final capstone project (M.S. Advanced Data Analytics), I analyzed **Fall 2023 U.S. college enrollment data** (115K records, 5,900+ institutions) to uncover demographic patterns and predict graduate enrollment.

- **Key Findings:**
  - Women consistently outnumber men in enrollment, and the gap widens at the graduate level (60.6% vs. 39.4%).
  - Hispanic student representation drops sharply from undergraduate (25.6%) to graduate (15.2%).
  - The distribution of institution size is highly skewed (median 588 vs. mean 3,332).
- **Models Utilized:** Linear Regression, Decision Trees, and Random Forest.
  - Best performer: Random Forest (R² ≈ 0.78, MAE ~631).
  - Strongest predictors of graduate enrollment: female enrollment and Asian student representation.

🔗 [View Full Repository on GitHub](https://github.com/saulzmtz/Analyze-Enrollment-Trends-and-Predict-College-Enrollment)
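The two regression metrics reported for the capstone are related as follows. R² is the share of the variance in graduate enrollment that a model explains. MAE is the average absolute prediction error, in the same units as the target. Below is a minimal pure-Python sketch of both formulas. The enrollment numbers are invented for illustration and are not the project's data. scikit-learn's `r2_score` and `mean_absolute_error` compute the same quantities.

```python
# Minimal sketch of the two regression metrics (R^2 and MAE).
# The values below are hypothetical, not taken from the enrollment dataset.

def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical graduate-enrollment counts for five institutions
actual = [120, 450, 980, 2300, 60]
predicted = [150, 400, 1100, 2100, 90]

print(f"MAE: {mean_absolute_error(actual, predicted):.1f}")
print(f"R^2: {r_squared(actual, predicted):.3f}")
```

Read an MAE of about 631 alongside the skewed size distribution noted above (median 588). An average error of that size is small for large institutions but large for small ones.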
Analyze Demographic Factors and Predict College Completion
Tools: Python (pandas, scikit-learn, matplotlib), Jupyter Notebook

This project analyzed **2022–2023 college degree completions across 16,000+ U.S. institutions** to uncover the strongest demographic predictors of success.

Key Findings:
- Female completions were the most influential factor.
- Non-traditional students (ages 25–39) play a critical role in completions.
- Random Forest achieved **~99% accuracy**, outperforming logistic regression and decision trees.

[View Full Repository on GitHub](https://github.com/saulzmtz/Analyze-Demographic-Factors-and-Predict-College-Completion)
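The completion project compared three classifiers, and that comparison can be sketched with scikit-learn. This is a minimal illustration, not the project's code. Synthetic data from `make_classification` stands in for the real completion dataset, and the hyperparameters are arbitrary assumptions. The sketch also reads `feature_importances_` from the fitted forest, which is one common way to rank predictors. Whether the project identified its "most influential factor" this way is likewise an assumption.

```python
# Minimal sketch: compare logistic regression, a decision tree, and a random forest.
# Assumption: synthetic data replaces the real completion dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data (hypothetical stand-in)
X, y = make_classification(n_samples=1000, n_features=8, n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
}

# Fit each model on the training split and score accuracy on the held-out split
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {scores[name]:.3f}")

# Impurity-based importances from the forest (they sum to 1 across features)
importances = models["Random Forest"].feature_importances_
top = max(range(len(importances)), key=lambda i: importances[i])
print(f"Most important synthetic feature: x{top}")
```

Accuracy is computed on a held-out split here so that the score reflects unseen data, not the data the model was trained on.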
Data Visualization and Communication
I recently completed the [Data Visualization and Communication](https://s3.amazonaws.com/mirror.facultyinfo.unt.edu/jy0282%2Fschteach%2FINFO5709_Fall2020-1.pdf) course and used spreadsheets, Python, and Plotly to [explore and analyze the 2019 Austin ISD TEA accountability ratings](https://sites.google.com/view/5709final-saulmtz/home?authuser=0) for my final project! The project analyzes Austin ISD's 2019 statewide accountability ratings from the Texas Education Agency (TEA).

Course Description: Introduces principles and techniques of data visualization for creating meaningful displays of quantitative and qualitative data that facilitate decision-making. Emphasis is placed on identifying patterns, trends, and differences among data sets.

TOOLS: Excel, Power BI, Tableau, Python (Matplotlib, Seaborn, Plotly)

SKILLS:
- Graphic design principles: color, text, interaction, perception
- Exploratory data analysis
- Data visualization techniques, from charts to dashboards
Data Analysis and Knowledge Discovery
For my final project in the [Data Analysis and Knowledge Discovery](https://s3.amazonaws.com/mirror.facultyinfo.unt.edu/joh0019%2Fschteach%2FINFO5810-001005%20Syllabus-1.pdf) course, I created a [side-by-side comparison tool to explore the 2019 ratings of any two schools or districts in Texas](https://docs.google.com/spreadsheets/d/1ssrQSMmZnD8PD6fi37IlOLM9aGGupy1KaSN_yn7VQeI/edit?usp=sharing). I originally built it in Excel with advanced lookup functions and formulas, and it has since been moved to Google Sheets. Make a copy to interact with it. My plan is to build an improved, web-based version of this tool.

Course Description: An introduction to data analysis, data mining, text mining, and knowledge discovery principles, concepts, theories, and practices. Designed for aspiring or practicing information professionals, the course covers the basics of working with data from a hands-on, practical perspective.

TOOLS: Excel, RapidMiner

SKILLS:
- Spreadsheet Modeling Basics: lookup, INDEX, and MATCH functions; pivot tables; array formulas; charts and dashboards
- Data Mining Basics: data prep, correlation methods, association rules, k-means clustering, discriminant analysis, k-nearest neighbors, naive Bayes, text mining, decision trees, neural networks
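The side-by-side comparison tool relies on Excel-style lookups. The same idea could be the starting point for a web-based version, and a rough Python sketch of it follows. It locates each campus's row with an exact match (like `MATCH(..., 0)`) and then pulls every column for both rows (like `INDEX`). The column names, campuses, and values are all hypothetical placeholders, not the actual TEA ratings data or the spreadsheet's real layout. The helper names `match_exact`, `index_value`, and `compare` are illustrative.

```python
# Hypothetical sketch of an INDEX/MATCH-style side-by-side comparison.
# HEADERS and ROWS are invented placeholders, not real TEA ratings.
HEADERS = ["Campus", "Overall Score", "Overall Rating", "Student Achievement"]
ROWS = [
    ["Campus A", 88, "B", 85],
    ["Campus B", 92, "A", 90],
    ["Campus C", 74, "C", 71],
]

def match_exact(value, values):
    """Like Excel's MATCH(value, range, 0), but returns a 0-based position.
    Raises ValueError when not found (Excel would show #N/A)."""
    return values.index(value)

def index_value(rows, row, col):
    """Like Excel's INDEX(range, row, col), but 0-based."""
    return rows[row][col]

def compare(campus_1, campus_2):
    """Return (metric, value_1, value_2) for every non-name column."""
    names = [r[0] for r in ROWS]
    r1, r2 = match_exact(campus_1, names), match_exact(campus_2, names)
    return [
        (header, index_value(ROWS, r1, c), index_value(ROWS, r2, c))
        for c, header in enumerate(HEADERS)
        if c > 0  # skip the name column itself
    ]

for metric, a, b in compare("Campus A", "Campus C"):
    print(f"{metric:<20} {a!s:>6} {b!s:>6}")
```

Looking up rows by name (instead of hard-coding positions) is what lets the user pick any two schools or districts.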
