Run a small model locally and build a generation loop you control: implement greedy and sampled decoding, measure prefill time vs. decode tokens/sec separately, handle stop conditions, and produce a short report on where the time goes. This is the baseline you'll optimize in later modules.