From a single neuron to a deployed, monitored ML service. 100 challenges covering the full supervised learning lifecycle.
Start with nothing and end with a production ML system. You'll implement linear and logistic regression from scratch in NumPy, understand every line of the gradient descent loop, then move into scikit-learn Pipelines for real datasets, decision trees and ensembles, neural networks in PyTorch, CNNs, sequence models, hyperparameter tuning with Optuna, model evaluation and SHAP explanations, containerized REST serving with FastAPI and Docker, and production monitoring for data drift. Every module has runnable code in Python and a real project to ship.
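To make the "gradient descent loop" concrete, here is a minimal sketch of batch gradient descent for linear regression in NumPy. The function name `fit_linear_regression`, the learning rate, the step count, and the synthetic data are all illustrative assumptions, not the course's own code.

```python
import numpy as np

def fit_linear_regression(X, y, lr=0.1, n_steps=500):
    """Batch gradient descent on mean squared error for y ~ X @ w + b."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_steps):
        residual = X @ w + b - y                      # prediction error, shape (n_samples,)
        grad_w = (2.0 / n_samples) * (X.T @ residual)  # d(MSE)/dw
        grad_b = 2.0 * residual.mean()                 # d(MSE)/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Synthetic data (an assumption for illustration): y = 3*x0 - 2*x1 + 0.5 + small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_w, true_b = np.array([3.0, -2.0]), 0.5
y = X @ true_w + true_b + rng.normal(scale=0.01, size=200)

w, b = fit_linear_regression(X, y)
print("learned weights:", w, "learned bias:", b)
```

With standardized inputs like these, a fixed learning rate of 0.1 is comfortably stable; on unscaled real data the step size usually needs tuning or the features need scaling first.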
Built by Lakshya Kumar
We grant free access case-by-case — students, career-switchers, builders on a tight budget. Sign in to send us a note.
Complete all modules, then submit the required number of capstone projects. Each must earn a passing rating from an admin reviewer.
Pick a real tabular dataset (not Iris, not MNIST — something from Kaggle, UCI, or your own domain). Train at least two model families (e.g., Ridge + GBM), compare them with proper nested cross-validation, tune hyperparameters with Optuna, generate SHAP explanations for the winning model, serialize it, and deploy behind a FastAPI + Docker endpoint. Ship as a GitHub repo with: README with dataset description, model card (performance metrics, feature importance, fairness notes), `train.py`, `serve/` directory with Dockerfile, and a `curl` example that hits your running container.
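The "proper nested cross-validation" step can be sketched as follows: an inner loop tunes hyperparameters and an outer loop scores the tuned model on folds the tuner never saw. This sketch makes several assumptions for the sake of being runnable: it uses scikit-learn's `GridSearchCV` in place of Optuna, a synthetic regression dataset in place of a real one, and small illustrative parameter grids.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption); the capstone itself calls for a real dataset.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

inner_cv = KFold(n_splits=3, shuffle=True, random_state=0)  # tunes hyperparameters
outer_cv = KFold(n_splits=5, shuffle=True, random_state=1)  # estimates generalization

# Two model families with small, illustrative grids.
candidates = {
    "ridge": (
        make_pipeline(StandardScaler(), Ridge()),
        {"ridge__alpha": [0.1, 1.0, 10.0]},
    ),
    "gbm": (
        GradientBoostingRegressor(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [2, 3]},
    ),
}

nested_scores = {}
for name, (estimator, grid) in candidates.items():
    # The search is refit inside every outer training fold, so the outer
    # test fold never influences the chosen hyperparameters.
    search = GridSearchCV(estimator, grid, cv=inner_cv, scoring="r2")
    nested_scores[name] = cross_val_score(search, X, y, cv=outer_cv, scoring="r2")

for name, scores in nested_scores.items():
    print(f"{name}: R^2 = {scores.mean():.3f} +/- {scores.std():.3f}")
```

The same structure should carry over to an Optuna-based search: whatever tuner is used, it has to run inside each outer training fold rather than once on the full dataset.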
Paste this into any AI chat. Fill in the bracketed parts with your context — you'll get back a straight answer on whether this belongs on your plate.
I'm considering a 'Machine Learning: Zero to Production' course. It starts from a single neuron in NumPy and ends with a containerised, monitored ML service. 100 challenges in Python: gradient descent from scratch, scikit-learn Pipelines, decision trees, ensembles (RF/GBM), PyTorch neural nets, CNNs, sequence models, Optuna tuning, SHAP, FastAPI+Docker serving, and drift monitoring.

Context about me:
1. My current background: [e.g. "Python developer, never touched ML", "data analyst who uses Excel and SQL", "CS student who took one stats course", "backend engineer tired of calling OpenAI APIs without understanding them"]
2. What I can already do in Python: [e.g. "write functions and classes", "use pandas", "never used NumPy", "comfortable with decorators"]
3. What I want to be able to do after this: [e.g. "get an ML engineer job", "deploy my own model at work", "understand what Kaggle competitors are doing", "build a recommendation system"]

Answer these:
- For my background, which 2 modules will give me the highest leverage in the next 3 months, and why?
- Name a concrete artifact I'd build that I could actually show in a job interview or use at work.
- Is 60 hours worth it for me, or should I do something shorter first?
- What will I NOT be able to do after this course, e.g. "train large language models", "build real-time video classifiers at scale", "replace a data science team"?
Build a feature pipeline (batch via Airflow + real-time via streaming) that writes to an offline and online feature store. Train a model on the offline features and serve predictions using the online store at P95 < 50ms. Test feature parity between training and serving.
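One way to approach the parity test and the latency target is sketched below in plain Python. The two transform functions, the feature names, and the event shape are hypothetical stand-ins for the batch and streaming code paths; a real project would call the actual feature store clients instead.

```python
import math

def offline_features(event):
    """Hypothetical batch-side transform (stands in for the Airflow job)."""
    return {
        "amount_log": math.log1p(event["amount"]),
        "is_weekend": int(event["weekday"] >= 5),
    }

def online_features(event):
    """Hypothetical serving-side transform (stands in for the streaming job)."""
    return {
        "amount_log": math.log1p(event["amount"]),
        "is_weekend": int(event["weekday"] in (5, 6)),
    }

def parity_mismatches(events, tol=1e-9):
    """Return (event_index, feature_name) pairs where the two paths disagree."""
    mismatches = []
    for i, event in enumerate(events):
        off, on = offline_features(event), online_features(event)
        for name in sorted(off.keys() | on.keys()):
            a, b = off.get(name), on.get(name)
            # A feature missing on either side counts as a mismatch.
            if a is None or b is None or not math.isclose(a, b, abs_tol=tol):
                mismatches.append((i, name))
    return mismatches

def p95(latencies_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]

events = [{"amount": 12.5, "weekday": d} for d in range(7)]
print("parity mismatches:", parity_mismatches(events))
print("P95 under 50 ms:", p95([10] * 19 + [200]) < 50)
```

Replaying the same raw events through both paths, as done here, is the simplest form of parity check. It catches logic drift between the two implementations but not timing issues such as late-arriving data.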
Deploy a model to production (real or simulated) with monitoring: prediction logging, ground-truth join, drift detection (KS-test on inputs, prediction distribution change), and an alert that fires on a deliberately injected distribution shift.
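The KS-test drift check with an injected shift can be sketched in dependency-free Python as below. The function names, the sample sizes, and the 0.05 significance level are illustrative assumptions, and the threshold uses the large-sample approximation for the two-sample Kolmogorov-Smirnov test. In practice `scipy.stats.ks_2samp` is the more usual choice.

```python
import bisect
import math
import random

def ks_statistic(reference, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    d = 0.0
    for x in ref + cur:
        cdf_ref = bisect.bisect_right(ref, x) / len(ref)
        cdf_cur = bisect.bisect_right(cur, x) / len(cur)
        d = max(d, abs(cdf_ref - cdf_cur))
    return d

def drift_alert(reference, current, alpha=0.05):
    """Fire when the KS statistic exceeds the asymptotic critical value."""
    n, m = len(reference), len(current)
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, current) > critical

random.seed(0)
reference = [random.gauss(0.0, 1.0) for _ in range(1000)]  # training-time inputs
shifted = [random.gauss(1.0, 1.0) for _ in range(1000)]    # injected mean shift

print("KS statistic on shifted data:", round(ks_statistic(reference, shifted), 3))
print("alert fires on injected shift:", drift_alert(reference, shifted))
```

Note that at `alpha=0.05` a check like this will also fire on roughly one in twenty windows with no real drift, so a production alert typically needs a stricter threshold or a requirement of several consecutive violations.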
Build a pipeline that retrains a model weekly: pulls fresh data, validates quality, retrains, evaluates against the current production model, and only promotes if metrics improve. Include rollback for catastrophic regressions.
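The promote-or-rollback decision at the end of such a pipeline can be sketched as two small gate functions. Everything here is a hypothetical illustration: the `Evaluation` record, the choice of AUC as the metric, and the `min_gain` and `max_drop` thresholds are assumptions to be replaced with whatever the project actually measures.

```python
from dataclasses import dataclass

@dataclass
class Evaluation:
    """Hypothetical record of one model's score on a shared holdout set."""
    model_id: str
    auc: float

def should_promote(candidate, production, min_gain=0.002):
    """Promote only if the candidate beats production by a meaningful margin."""
    return candidate.auc >= production.auc + min_gain

def should_rollback(live_auc, promoted_auc, max_drop=0.05):
    """Roll back if live performance falls far below the score at promotion time."""
    return live_auc < promoted_auc - max_drop

production = Evaluation("model-week-41", auc=0.864)
candidate = Evaluation("model-week-42", auc=0.871)

if should_promote(candidate, production):
    print(f"promote {candidate.model_id}")
else:
    print(f"keep {production.model_id}")

# Later, once ground truth arrives for the promoted model:
if should_rollback(live_auc=0.79, promoted_auc=candidate.auc):
    print(f"roll back to {production.model_id}")
```

Requiring a minimum gain rather than any improvement keeps the pipeline from swapping models on noise, and both models should be scored on the same fresh holdout data for the comparison to be fair.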
Wire MLflow or Weights & Biases into your training stack. Run 5+ experiments with different hyperparameters; produce a comparison report. Reproduce one experiment from scratch using only the tracked metadata. Document the data versioning approach.
Canonical reference. Open it alongside every scikit-learn task.