Lab 3 — Build and Package GitHub Pulse (Milestone)
Time: ~5 hrs · Difficulty: Core / Stretch · Builds on: Lab 1, Lab 2
Objective
Ship the month’s milestone: GitHub Pulse, a real, installable CLI that takes a GitHub username and produces a Markdown activity report — recent public activity, recent pull requests, repositories the user has starred, and a breakdown of the languages across their repos. It reads its token from .env, supports a --json flag, survives rate limits using your get_with_retry from Lab 2, paginates where needed, ships with a small pytest suite that never touches the network, and installs as a genuine command via uv tool / pipx. This is the first program in the course that reaches into the world and returns something genuinely useful — and it is the same GET + auth + retry + pagination + parse pipeline that every model-API call later in the course is built from.
Setup
cd ~/agentic/month-04
uv init github-pulse --package --python 3.12
cd github-pulse
uv add requests python-dotenv
uv add --dev pytest
git init
echo ".env" >> .gitignore
echo ".venv/" >> .gitignore
cp ~/agentic/month-04/api-basics/.env .env # reuse your token
Confirm: git status does not list .env, and cat pyproject.toml shows requests and python-dotenv under dependencies plus pytest under a dev group.
Background
Recall first (from memory): From Lab 2, what does get_with_retry retry on, and what does the max_pages cap protect against? From Month 3, which pyproject.toml table turns a function into an installable command? Answer before reading on.
Everything from Labs 1 and 2 converges. Here is the pipeline you are assembling — the same shape every model-API call takes later:
flowchart LR
A["username arg"] --> B["build_headers (token from .env)"]
B --> C["client.py: GET + retry + paginate"]
C --> D["raw JSON (impure edge)"]
D --> E["report.py: pure tally + render"]
E --> F{"--json flag?"}
F -->|"Yes"| G["print JSON"]
F -->|"No"| H["print Markdown"]
Notice: the network lives only in client.py (the left edge); everything in report.py is pure data-in/data-out, which is exactly why the tests can run with no network.
Auth and .env come from Lab 1; get_with_retry and get_all_pages come from Lab 2 — you will copy that resilience module into this project. The new work is composition and polish: a thin main() that parses arguments (Month 3’s argparse), a small client that gathers the data, pure functions that transform raw API JSON into a tally and into Markdown, a --json escape hatch, a tiny test suite, and a [project.scripts] entry point so the tool installs as github-pulse.
The design principle that makes this testable: separate I/O from logic. Functions that hit the network are impure and hard to test; functions that take data and return data (the language tally, the Markdown renderer) are pure and trivial to test with a fixed input. We keep them apart on purpose. That is the same architecture you will use to test agents in Month 5 and beyond — the network lives at the edges, the logic in the middle.
Steps
1. Confirm the package layout and entry point
uv init --package created src/github_pulse/ and a [project.scripts] table. Open pyproject.toml and edit the script entry to point at a main function we will write:
[project.scripts]
github-pulse = "github_pulse:main"
Checkpoint: pyproject.toml lists requests and python-dotenv in [project] dependencies, a [dependency-groups] (or [tool.uv]) dev group with pytest, and the github-pulse = "github_pulse:main" script line.
If not: if dependencies are missing, re-run uv add requests python-dotenv and uv add --dev pytest. If uv init did not create [project.scripts], add the table by hand exactly as shown.
2. Bring over the resilience layer
Copy your Lab 2 module so this project is self-contained. Create src/github_pulse/http.py with the same get_with_retry, _backoff, and _retry_after from Lab 2, and add get_all_pages:
import random
import time

import requests

RETRYABLE_STATUS = {429, 500, 502, 503, 504}

def get_with_retry(url, *, headers=None, params=None,
                   max_attempts=5, timeout=10, max_wait=60):
    for attempt in range(max_attempts):
        try:
            resp = requests.get(url, headers=headers, params=params, timeout=timeout)
        except (requests.exceptions.Timeout, requests.exceptions.ConnectionError):
            if attempt == max_attempts - 1:
                raise
            time.sleep(_backoff(attempt, max_wait))
            continue
        if resp.status_code not in RETRYABLE_STATUS:
            return resp
        if attempt == max_attempts - 1:
            return resp
        time.sleep(_retry_after(resp) or _backoff(attempt, max_wait))
    return resp

def get_all_pages(url, *, headers=None, params=None, max_pages=10):
    params = dict(params or {}, per_page=100)
    results, next_url, next_params = [], url, params
    for _ in range(max_pages):
        resp = get_with_retry(next_url, headers=headers, params=next_params)
        resp.raise_for_status()
        results.extend(resp.json())
        nxt = resp.links.get("next")
        if not nxt:
            break
        next_url, next_params = nxt["url"], None
    return results

def _backoff(attempt, max_wait):
    return min(2 ** attempt, max_wait) + random.uniform(0, 1)

def _retry_after(resp):
    value = resp.headers.get("Retry-After")
    return float(value) if value and value.isdigit() else None
Checkpoint: uv run python -c "from github_pulse.http import get_with_retry, get_all_pages; print('ok')" prints ok.
If not: ModuleNotFoundError means the file is in the wrong place (it must be src/github_pulse/http.py) or you ran bare python — use uv run from the project root.
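To make the retry timing concrete, here is a minimal sketch of the delay schedule that a capped exponential backoff with jitter produces. The name backoff_delay and its jitter parameter are illustrative assumptions; the formula mirrors _backoff above, so the total delay can exceed max_wait by up to the jitter amount.

```python
import random

def backoff_delay(attempt, max_wait=60, jitter=1.0):
    """Capped exponential delay plus up to `jitter` seconds of random noise.

    Hypothetical stand-in for _backoff; `jitter` is added here only so the
    random part can be switched off when inspecting the schedule.
    """
    base = min(2 ** attempt, max_wait)  # 1, 2, 4, 8, ... capped at max_wait
    return base + random.uniform(0, jitter)

# With max_wait=10 the deterministic part of the schedule is 1, 2, 4, 8, 10.
for attempt in range(5):
    print(attempt, round(backoff_delay(attempt, max_wait=10), 2))
```

The jitter matters when several clients are rate-limited at the same moment: without it they would all retry on the same schedule and collide again.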
3. The client: gather raw data from GitHub
Create src/github_pulse/client.py. It builds the auth headers and fetches the four data sources, returning raw Python data (lists/dicts) — no formatting here.
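import os

from dotenv import find_dotenv, load_dotenv

from github_pulse.http import get_with_retry, get_all_pages

API = "https://api.github.com"

def build_headers():
    # usecwd=True searches from the current directory upward, so the
    # installed command (Step 10) also finds a .env where you run it.
    # A bare load_dotenv() searches from this file's own location instead.
    load_dotenv(find_dotenv(usecwd=True))
    token = os.environ.get("GITHUB_TOKEN")
    if not token:
        raise SystemExit("github-pulse: set GITHUB_TOKEN in .env")
    return {
        "Authorization": f"Bearer {token}",
        "Accept": "application/vnd.github+json",
        "User-Agent": "github-pulse",
    }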
def fetch_user(headers, username):
    resp = get_with_retry(f"{API}/users/{username}", headers=headers)
    if resp.status_code == 404:
        raise SystemExit(f"github-pulse: no such user '{username}'")
    resp.raise_for_status()
    return resp.json()

def fetch_events(headers, username, max_pages=2):
    return get_all_pages(f"{API}/users/{username}/events/public",
                         headers=headers, max_pages=max_pages)

def fetch_repos(headers, username, max_pages=3):
    return get_all_pages(f"{API}/users/{username}/repos",
                         headers=headers, params={"sort": "updated"},
                         max_pages=max_pages)

def fetch_starred(headers, username, max_pages=2):
    return get_all_pages(f"{API}/users/{username}/starred",
                         headers=headers, max_pages=max_pages)

def gather(username):
    headers = build_headers()
    fetch_user(headers, username)  # validates the user exists / token works
    events = fetch_events(headers, username)
    repos = fetch_repos(headers, username)
    starred = fetch_starred(headers, username)
    return {"username": username, "events": events,
            "repos": repos, "starred": starred}
Checkpoint: uv run python -c "from github_pulse.client import gather; import json; print(len(gather('octocat')['repos']), 'repos gathered')" prints a repo count. (This makes live calls — it is the only network step; the tests in Step 7 will not.)
If not: a SystemExit about GITHUB_TOKEN means .env did not load — confirm it copied over in Setup. A 403 means you are rate-limited or anonymous; check the token loaded (Lab 2’s ratelimit.py).
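When a 403 does turn out to be a rate limit, GitHub reports the reset time in the X-RateLimit-Reset response header as UTC epoch seconds. The sketch below shows one way to turn that into a wait time. The helper name seconds_until_reset is hypothetical, and it takes a plain dict for illustration (real requests headers are case-insensitive, a plain dict is not).

```python
import time

def seconds_until_reset(headers, now=None):
    """Seconds until the rate-limit window resets, or None if unknown.

    Hypothetical helper: `headers` is any mapping holding the
    X-RateLimit-Reset value (UTC epoch seconds, as a string).
    """
    value = headers.get("X-RateLimit-Reset")
    if value is None or not str(value).isdigit():
        return None  # header absent or malformed: caller falls back to backoff
    now = time.time() if now is None else now
    return max(0.0, float(int(value)) - now)  # never negative

print(seconds_until_reset({"X-RateLimit-Reset": "1700000060"}, now=1700000000))  # 60.0
print(seconds_until_reset({}, now=1700000000))  # None
```

Passing now explicitly keeps the function pure, so it can be tested with fixed values in the same way as the report functions.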
Steps 4–5 teach the new skill of the milestone — separating pure logic from I/O so it is testable — as a gradual release: a worked pure function, a faded one you complete, then (in Step 7) an independent test you design.
Stage 1 — Worked example (I do)
4. Pure logic: tally languages and slice events
Study these complete functions, then create src/github_pulse/report.py. Notice the shared property: each takes raw API data and returns summarized data — no network, no printing — which is exactly what makes them testable.
from collections import Counter

def language_tally(repos):
    """Count repos per language, ignoring repos with no detected language."""
    counts = Counter(r["language"] for r in repos if r.get("language"))
    return counts.most_common()

def recent_pushes(events, limit=5):
    """Pull commit messages out of PushEvents."""
    pushes = []
    for ev in events:
        if ev.get("type") != "PushEvent":
            continue
        repo = ev["repo"]["name"]
        for commit in ev["payload"].get("commits", []):
            pushes.append((repo, commit["message"].splitlines()[0]))
    return pushes[:limit]

def recent_prs(events, limit=5):
    """Pull pull-request actions out of PullRequestEvents."""
    prs = []
    for ev in events:
        if ev.get("type") != "PullRequestEvent":
            continue
        pr = ev["payload"]["pull_request"]
        prs.append((ev["payload"]["action"], ev["repo"]["name"], pr["title"]))
    return prs[:limit]

def starred_summary(starred, limit=10):
    return [(r["full_name"], r.get("stargazers_count", 0)) for r in starred[:limit]]
Checkpoint: uv run python -c "from github_pulse.report import language_tally; print(language_tally([{'language':'Python'},{'language':'Python'},{'language':'Go'},{'language':None}]))" prints [('Python', 2), ('Go', 1)] — note the None repo is ignored.
If not: if the None repo is counted, the if r.get("language") filter is missing; a KeyError means the filter uses r["language"] instead of r.get("language"), which fails on any repo dict that lacks the key.
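One detail worth seeing before you write exact-match assertions: Counter.most_common() orders equal counts by first appearance, so the order of tied languages depends on the input order. A minimal sketch, with language_tally copied from above so the block runs on its own and with made-up repo dicts:

```python
from collections import Counter

def language_tally(repos):
    """Count repos per language, ignoring repos with no detected language."""
    counts = Counter(r["language"] for r in repos if r.get("language"))
    return counts.most_common()

# Illustrative data: Go and Rust tie, one repo has language None,
# and one dict has no "language" key at all.
tied = [
    {"language": "Go"},
    {"language": "Rust"},
    {"language": "Go"},
    {"language": "Rust"},
    {"language": None},
    {"name": "no-language-key"},
]

print(language_tally(tied))  # [('Go', 2), ('Rust', 2)]
```

Go comes first only because it appears first in the input; reversing the list would put Rust first, which is why fixed test data should avoid ties or assert on them deliberately.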
Stage 2 — Faded practice (we do)
5. Pure rendering: data to Markdown
Same pure-function discipline, but you complete one section. The renderer takes the gathered data and returns a Markdown string — no I/O, so a test can assert on the exact text. Three sections are written; fill the # TODO for the Languages used section, following the exact pattern of the others (a bulleted line per item, an italic placeholder when empty).
def render_markdown(data):
    user = data["username"]
    pushes = recent_pushes(data["events"])
    prs = recent_prs(data["events"])
    langs = language_tally(data["repos"])
    stars = starred_summary(data["starred"])
    lines = [f"# GitHub Pulse: {user}", ""]

    lines.append("## Recent commits")
    if pushes:
        lines += [f"- `{repo}` — {msg}" for repo, msg in pushes]
    else:
        lines.append("- _no recent public pushes_")
    lines.append("")

    lines.append("## Recent pull requests")
    if prs:
        lines += [f"- {action} **{title}** in `{repo}`" for action, repo, title in prs]
    else:
        lines.append("- _no recent pull request activity_")
    lines.append("")

    # TODO: render "## Languages used" from `langs` (list of (lang, count));
    # one "- {lang}: {count}" line each, or "- _no languages detected_" if empty.

    lines.append("## Recently starred")
    if stars:
        lines += [f"- `{name}` (★ {count})" for name, count in stars]
    else:
        lines.append("- _none_")
    lines.append("")
    return "\n".join(lines)
Solution for the TODO
```python
    lines.append("## Languages used")
    if langs:
        lines += [f"- {lang}: {count}" for lang, count in langs]
    else:
        lines.append("- _no languages detected_")
    lines.append("")
```
Checkpoint: you can render without the network by feeding empty lists: uv run python -c "from github_pulse.report import render_markdown; print(render_markdown({'username':'x','events':[],'repos':[],'starred':[]}))" prints a full report skeleton with the italic “no …” placeholders, including the languages section.
If not: a missing ## Languages used heading means the TODO is unfilled; a ValueError on unpacking means you iterated langs as single values instead of (lang, count) pairs.
6. The CLI: argparse, --json, and main
Create src/github_pulse/__init__.py:
import argparse
import json
import sys

from github_pulse.client import gather
from github_pulse.report import render_markdown

def main():
    ap = argparse.ArgumentParser(
        prog="github-pulse",
        description="Produce a Markdown activity report for a GitHub user.",
    )
    ap.add_argument("username", help="the GitHub username to report on")
    ap.add_argument("--json", action="store_true",
                    help="emit raw gathered data as JSON instead of Markdown")
    args = ap.parse_args()
    try:
        data = gather(args.username)
    except SystemExit:
        raise  # already a clean message
    except Exception as exc:  # last-resort guard
        print(f"github-pulse: {exc}", file=sys.stderr)
        raise SystemExit(1)
    if args.json:
        print(json.dumps(data, indent=2))
    else:
        print(render_markdown(data))

if __name__ == "__main__":
    main()
uv run github-pulse octocat
uv run github-pulse octocat --json | head -20
Checkpoint: the first command prints a full Markdown report to your terminal; the second prints the first 20 lines of the JSON output. Because head truncates the document, confirm it parses by piping the full output through jq . from Month 2 (uv run github-pulse octocat --json | jq .). uv run github-pulse --help shows a clean help screen. Redirect to a file and open it: uv run github-pulse octocat > pulse.md.
If not: command not found for github-pulse under uv run means the [project.scripts] line is wrong — it must be github-pulse = "github_pulse:main" and main must exist in __init__.py. An import error means a module name is misspelled.
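The argument parsing itself can also be checked without running the whole command, because parse_args accepts an explicit list in place of sys.argv. A minimal sketch, assuming the parser is factored into a helper; build_parser is a hypothetical name, not part of the lab code:

```python
import argparse

def build_parser():
    # Hypothetical helper: declares the same two arguments as main() above,
    # factored out so they can be exercised without touching sys.argv.
    ap = argparse.ArgumentParser(prog="github-pulse")
    ap.add_argument("username", help="the GitHub username to report on")
    ap.add_argument("--json", action="store_true",
                    help="emit raw gathered data as JSON instead of Markdown")
    return ap

# Passing a list makes the parse deterministic and offline.
args = build_parser().parse_args(["octocat", "--json"])
default = build_parser().parse_args(["octocat"])

print(args.username, args.json)  # octocat True
print(default.json)              # False
```

This keeps main() thin: parsing becomes one more piece of logic that can be verified with fixed input.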
Stage 3 — Independent (you do)
7. A tiny pytest suite — no network
This is where the pure-vs-impure split pays off. Below are three tests over the pure functions with fixed data; they run instantly and offline. Create tests/test_report.py with them, then add a fourth test of your own (no scaffolding): assert that starred_summary(FAKE["starred"]) returns the expected (full_name, count) pair. You will need to add starred_summary to the import line, and you design the assertion from the function’s behavior.
from github_pulse.report import language_tally, render_markdown, recent_pushes

FAKE = {
    "username": "tester",
    "events": [
        {"type": "PushEvent", "repo": {"name": "me/proj"},
         "payload": {"commits": [{"message": "fix bug\n\ndetails"}]}},
    ],
    "repos": [{"language": "Python"}, {"language": "Python"},
              {"language": "Rust"}, {"language": None}],
    "starred": [{"full_name": "torvalds/linux", "stargazers_count": 1}],
}

def test_language_tally_counts_and_ignores_none():
    assert language_tally(FAKE["repos"]) == [("Python", 2), ("Rust", 1)]

def test_recent_pushes_takes_first_line_of_message():
    pushes = recent_pushes(FAKE["events"])
    assert pushes == [("me/proj", "fix bug")]

def test_render_markdown_has_all_sections_and_user():
    md = render_markdown(FAKE)
    assert "# GitHub Pulse: tester" in md
    for heading in ["## Recent commits", "## Languages used",
                    "## Recently starred", "## Recent pull requests"]:
        assert heading in md
    assert "Python: 2" in md
uv run pytest -q
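Checkpoint: 4 passed in a fraction of a second (three given + your own starred_summary test), with no network access. Importing github_pulse.report does load client.py as a side effect (the package’s __init__.py imports gather), but nothing in client.py touches the network until gather() is called. This is the habit that matters: logic is tested without the network, because network calls live only in client.py.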
If not: if pytest cannot import github_pulse, run uv run pytest from the project root (not bare pytest). If your fourth test fails, print the actual return value of starred_summary(FAKE["starred"]) and match your assertion to it.
8. A faked-response test (stretch into mocking)
Add tests/test_http.py to prove get_with_retry returns a non-retryable response without sleeping, using monkeypatch to replace the real network call:
from github_pulse import http

class FakeResp:
    def __init__(self, status_code):
        self.status_code = status_code
        self.headers = {}

def test_get_with_retry_returns_non_retryable_immediately(monkeypatch):
    calls = {"n": 0}

    def fake_get(url, **kwargs):
        calls["n"] += 1
        return FakeResp(404)  # not in RETRYABLE_STATUS

    monkeypatch.setattr(http.requests, "get", fake_get)
    resp = http.get_with_retry("http://example.test", max_attempts=5)
    assert resp.status_code == 404
    assert calls["n"] == 1  # no retries on a 404
uv run pytest -q
Checkpoint: 5 passed (the four from Step 7 plus this faked-network test). You faked the network entirely — the test confirms a 404 is returned after exactly one call, no retries — and it ran with no internet at all. This is the core technique for testing networked code, expanded fully in Month 5.
If not: if monkeypatch did not intercept the call (the test hangs or hits the network), confirm you patched http.requests.get — the same requests object the module imported — not a fresh import requests in the test.
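The same faking idea extends to the retry path. The sketch below uses a simplified stand-in for get_with_retry, with the HTTP call and the sleep passed in as arguments, to show how a scripted sequence of statuses can prove "retry twice, then succeed" without waiting. retry_get and make_fake_get are hypothetical names, and the loop omits the exception handling, jitter, and Retry-After logic of the real module.

```python
RETRYABLE_STATUS = {429, 500, 502, 503, 504}

class FakeResp:
    def __init__(self, status_code):
        self.status_code = status_code
        self.headers = {}

def retry_get(get, url, *, max_attempts=5, sleep=lambda seconds: None):
    """Simplified stand-in for get_with_retry with injectable get and sleep."""
    for attempt in range(max_attempts):
        resp = get(url)
        last_attempt = attempt == max_attempts - 1
        if resp.status_code not in RETRYABLE_STATUS or last_attempt:
            return resp
        sleep(2 ** attempt)  # recorded by the fake instead of really sleeping

def make_fake_get(statuses):
    """Return a fake get() that replays `statuses` in order, plus its call log."""
    remaining = iter(statuses)
    calls = []
    def fake_get(url):
        calls.append(url)
        return FakeResp(next(remaining))
    return fake_get, calls

fake_get, calls = make_fake_get([503, 503, 200])
slept = []
resp = retry_get(fake_get, "http://example.test", sleep=slept.append)
print(resp.status_code, len(calls), slept)  # 200 3 [1, 2]
```

With the real module you would get the same effect by monkeypatching both http.requests.get and http.time.sleep; injecting them as parameters is just the variant that needs no patching.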
9. Write the README and commit
Create README.md documenting: what it does, the GITHUB_TOKEN setup in .env, how to run (uv run github-pulse <user> and --json), and the install command from Step 10. Then:
git add -A
git commit -m "GitHub Pulse milestone"
git log -p | grep -i -E 'github_pat_|ghp_|GITHUB_TOKEN=' || echo "CLEAN: no token in history"
Checkpoint: CLEAN: no token in history. Push to a new GitHub repo (gh repo create github-pulse --public --source=. --push) and confirm .env is not in the GitHub file list.
If not: if the grep prints a token, .env was committed — rotate the token immediately (it is compromised), git rm --cached .env, fix .gitignore, and scrub history before pushing (see Troubleshooting).
10. Install it as a real command
uv tool install .
github-pulse octocat
(pipx install . works identically if you prefer pipx.) Because the installed tool runs from its own location, it needs the token in the environment — either run it from a directory with a .env, or export GITHUB_TOKEN=... in your shell first (the production pattern from Core Concepts §4).
Checkpoint: typing github-pulse octocat from any directory (with the token available) produces the report. You shipped a real, installed CLI. Uninstall later with uv tool uninstall github-pulse.
If not: command not found after install means ~/.local/bin is not on your PATH — run uv tool update-shell and restart the shell. “Token missing” from any directory is expected: the installed tool needs export GITHUB_TOKEN=... or a .env in the current directory (see Troubleshooting).
Definition of Done
- uv run github-pulse <username> prints a Markdown report with all four sections (commits, PRs, languages, starred) from live data.
- uv run github-pulse <username> --json emits valid JSON (confirmed with jq .).
- The token loads from .env; .env is git-ignored and the leak check prints CLEAN.
- client.py uses get_with_retry / get_all_pages (timeouts, backoff+jitter, pagination with a page cap) for every call.
- uv run pytest -q passes with at least three tests, all offline (no live network).
- pyproject.toml has a [project.scripts] entry; uv tool install . produces a working github-pulse command.
- Repo is pushed to GitHub with a README and no token in history.
- Self-verify in one block:
uv run pytest -q && \
uv run github-pulse octocat --json | jq '.username' && \
( git check-ignore -q .env && echo "OK: .env ignored" || echo "FAIL: .env tracked or not ignored" )
Self-explain: in one sentence, why does keeping the network inside client.py and the logic inside report.py let your test suite run with no internet at all?
Stretch Goals
- A --top N flag to control how many languages/starred repos appear, threaded through argparse into the renderer.
- Cache to disk. Add --cache that writes the gathered JSON to a file and reuses it on the next run (with a timestamp check), so you can iterate on rendering without re-hitting the API, a preview of caching agent results.
- Sleep-until-reset. Wire the Lab 2 stretch (honor X-RateLimit-Reset on a 403/429) into this project’s http.py and prove it with a comment in the README.
- A real fixture file. Save one live --json run to tests/fixtures/octocat.json, load it in a test, and assert the renderer produces stable Markdown from it (golden-file testing).
- Publish to TestPyPI (advanced): build with uv build and upload to TestPyPI so pipx install --index-url ... works from anywhere.
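For the cache-to-disk goal, one possible starting point is a pair of small functions that save the gathered data and only reuse it while the file is fresh. This is a sketch under assumptions: load_cached, save_cache, and the use of the file's modification time as the timestamp are all illustrative choices, not part of the lab code.

```python
import json
import tempfile
import time
from pathlib import Path

def save_cache(path, data):
    """Write gathered data to disk as JSON (hypothetical helper)."""
    Path(path).write_text(json.dumps(data), encoding="utf-8")

def load_cached(path, max_age_seconds, now=None):
    """Return cached data if the file exists and is fresh enough, else None."""
    path = Path(path)
    if not path.exists():
        return None
    now = time.time() if now is None else now
    if now - path.stat().st_mtime > max_age_seconds:
        return None  # stale: the caller should gather again
    return json.loads(path.read_text(encoding="utf-8"))

# Demonstration in a temporary directory, using made-up data.
with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "octocat.json"
    save_cache(cache_file, {"username": "octocat", "events": [],
                            "repos": [], "starred": []})
    fresh = load_cached(cache_file, max_age_seconds=600)
    # Pretend an hour has passed to exercise the staleness branch.
    stale = load_cached(cache_file, max_age_seconds=600, now=time.time() + 3600)

print(fresh["username"], stale)  # octocat None
```

In the CLI, a --cache flag could call load_cached first and fall back to gather plus save_cache when it returns None.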
Troubleshooting
- github-pulse: command not found after uv tool install. uv installs tools to ~/.local/bin; ensure it is on your PATH (uv tool update-shell, then restart the shell).
- Installed tool says token is missing but uv run works. The installed command does not see your project’s .env. export GITHUB_TOKEN=... in the shell, or run it from a directory containing a .env.
- KeyError: 'payload' or 'pull_request' for some users. Event payloads vary by event type. The report.py functions already filter by ev["type"] first; if you add event types, guard every nested access with .get(...).
- Empty “Recent commits” for an active user. The public events feed only covers recent activity and only public events; a user whose recent work is private or older than the feed window will legitimately show “no recent public pushes.” Try a busy user like torvalds.
- 403 mid-run. You hit the rate limit. Confirm the token loaded (authenticated = 5000/hr vs 60/hr anonymous), and let get_with_retry back off; for the unauthenticated case, run ratelimit.py from Lab 2 to see your reset time.
- pytest can’t import github_pulse. With the --package/src layout, run tests via uv run pytest from the project root so the package is on the path; do not run bare pytest.
- .env appeared on GitHub. Stop. It was committed before being ignored. Rotate the token immediately (it is compromised), git rm --cached .env, fix .gitignore, and scrub history (git filter-repo) before re-pushing. This is the Month-2 lesson in practice.
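As an illustration of the KeyError advice above, guarding every nested access with .get(...) might look like the following. pr_title is a hypothetical helper and the event dicts are made-up examples, not real API output.

```python
def pr_title(event):
    """Return the PR title from an event dict, or None if any level is missing."""
    # `or {}` also covers keys that are present but set to None.
    payload = event.get("payload") or {}
    pull_request = payload.get("pull_request") or {}
    return pull_request.get("title")

with_pr = {"type": "PullRequestEvent",
           "payload": {"pull_request": {"title": "Fix typo"}}}
other_type = {"type": "WatchEvent", "payload": {"action": "started"}}
no_payload = {"type": "PushEvent"}

print(pr_title(with_pr))     # Fix typo
print(pr_title(other_type))  # None
print(pr_title(no_payload))  # None
```

The trade-off is that a missing field becomes a silent None in place of a loud KeyError, so the renderer then has to decide how to display an absent value.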