4 Codex-backed agents, deterministic routing, live WebSocket orchestration
Built a real-time multi-agent orchestration runtime where four thread-resumed AI musicians improvise together while the server enforces deterministic routing, state continuity, and final pattern composition.
- Architected a dual-mode Codex and Strudel runtime that powers both assistant interactions and live multi-agent jam sessions.
- Implemented server-owned orchestration for four isolated Codex-backed agent sessions with deterministic @mention routing and structured composition safeguards.
- Added real-time streaming, multimodal control, and validation gates to keep the system responsive, reliable, and safe under live conditions.
Next.js, TypeScript, Codex, Strudel, MCP, WebSockets
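A minimal sketch of what deterministic @mention routing can look like. The agent names, the broadcast fallback, and the `route` helper are illustrative assumptions, not the project's actual implementation (which runs on a TypeScript stack); the point is only that the server, not the model, decides who receives a message and in what order.

```python
import re

# Hypothetical roster; the real agent names are not given in the source.
AGENTS = ("drums", "bass", "melody", "fx")
MENTION = re.compile(r"@(\w+)")


def route(message: str, agents: tuple[str, ...] = AGENTS) -> list[str]:
    """Return the agents that should receive a message, always in roster order."""
    mentioned = {name.lower() for name in MENTION.findall(message)}
    targets = [agent for agent in agents if agent in mentioned]
    # Assumed fallback: with no valid mention, broadcast to every agent.
    return targets or list(agents)


print(route("@bass and @drums, lock in"))  # ['drums', 'bass']
```

Because the output order follows the roster rather than the order of mentions in the text, the same message always produces the same delivery sequence.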
Validate-then-stream 10-chapter LLM app with persistent state
Deployed a stateful LLM application that turns deterministic story and lesson content into 10-chapter educational adventures, using validate-then-stream delivery, task-specific Gemini routing, and persistent session state to make non-deterministic generation reliable for real users.
- Validate-then-stream delivery generates a full chapter, checks structural correctness with retries, and only then streams approved content to the user.
- Deterministic story YAML and lesson CSV inputs are wrapped in task-specific Gemini routing, keeping content grounded without relying on embeddings or RAG.
- Supabase-backed state, reconnect-safe WebSockets, deferred background tasks, and telemetry keep adventures resumable, responsive, and observable in a production-style stack.
FastAPI, Supabase, Gemini 2.5, Imagen 4, Railway, React, WebSockets
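The validate-then-stream idea can be sketched roughly as follows. The `generate` and `is_valid` callables, the retry count, and the fixed-size chunking are assumptions standing in for the real model call, structural checks, and WebSocket delivery.

```python
from typing import Callable, Iterator


def validate_then_stream(
    generate: Callable[[], str],
    is_valid: Callable[[str], bool],
    max_attempts: int = 3,
    chunk_size: int = 64,
) -> Iterator[str]:
    """Generate a full chapter, validate it, and only then yield it in chunks."""
    for _ in range(max_attempts):
        chapter = generate()  # hypothetical stand-in for the LLM call
        if is_valid(chapter):
            # Nothing reaches the user until the whole chapter has passed.
            for start in range(0, len(chapter), chunk_size):
                yield chapter[start:start + chunk_size]
            return
    raise ValueError(f"no valid chapter after {max_attempts} attempts")
```

The trade-off in this design is latency for safety: the user waits for a complete, checked chapter, but never sees content that is later retracted.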
Aegis AI Travel Insurance Recommender
13 PDFs, 80 evaluated customers, 85-100% scenario pass rates
Built an evaluated LLM travel-insurance recommender that turns policy PDFs and synthetic customer transcripts into structured requirements, insurer comparisons, and transcript-aware recommendations.
- Extracted structured coverage data from 13 travel-insurance policy PDFs using Gemini 2.5 and schema validation.
- Built a hybrid recommendation pipeline with deterministic requirement scoring and transcript-aware LLM reranking.
- Evaluated 80 synthetic customers across four scenarios, reaching 85-100% scenario pass rates and 87.53% valid requirement coverage.
Python, Gemini 2.5, OpenAI, CrewAI, Pydantic, React, Vite, Netlify
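One way to picture the hybrid pipeline is a deterministic shortlist followed by a pluggable reranker. Everything here is an illustrative assumption: the coverage-fraction score, the policy dictionaries, and the `rerank` callable (a stand-in for the LLM step) are not taken from the project.

```python
from typing import Callable

Policy = dict[str, bool]  # hypothetical shape: requirement name -> covered?


def score(policy: Policy, requirements: list[str]) -> float:
    """Fraction of the customer's requirements that the policy covers."""
    if not requirements:
        return 0.0
    covered = sum(1 for req in requirements if policy.get(req, False))
    return covered / len(requirements)


def shortlist(policies: dict[str, Policy], requirements: list[str], top_k: int = 3) -> list[str]:
    """Deterministic stage: sort by score, break ties by policy name."""
    ranked = sorted(policies, key=lambda name: (-score(policies[name], requirements), name))
    return ranked[:top_k]


def recommend(
    policies: dict[str, Policy],
    requirements: list[str],
    rerank: Callable[[list[str]], list[str]],
    top_k: int = 3,
) -> list[str]:
    """Hybrid stage: the reranker only reorders the deterministic shortlist."""
    return rerank(shortlist(policies, requirements, top_k))
```

Keeping the reranker downstream of a deterministic shortlist means the non-deterministic step can change the order of candidates but cannot introduce a policy that failed the requirement scoring.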
Large-PDF processing, adaptive notes, BYOK study workspace
NANA is a public BYOK study prototype that transforms dense PDFs into personalized study workspaces, combining adaptive notes, document overviews, and inline AI explanations around real study workflows.
- Two-phase PDF pipeline with large-file preprocessing and splitting for cost-efficient note generation.
- Personalized document overviews and study notes adapted to learner background and goals.
- Inline AI commands, browser-resumable study state, and exportable Markdown notes built for real study workflows.
Python, FastAPI, TypeScript, React, Vite, Gemini 3 Flash
CLIP + EmoNet fusion improved VEATIC MAE by about 19%
End-to-end affective computing pipeline that turns noisy real-world video into music recommendations through dual-pathway ML, variance-weighted fusion, and full-stack delivery.
- CLIP scene analysis + EmoNet facial cues fused through uncertainty-aware dual-pathway ML
- Variance-weighted fusion with temporal smoothing improved VEATIC MAE by ~19% vs scene-only baselines
- Delivered a React + Flask demo with synced playback, emotion visualizations, and DEAM music matching
PyTorch, React, Flask, OpenCV, CLIP ViT-B/32, EmoNet, MediaPipe
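As a rough illustration of variance-weighted fusion with temporal smoothing: the sketch below assumes inverse-variance weighting of the two pathway predictions and an exponential moving average across frames. Both choices, along with the function names and the `alpha` value, are assumptions; the project's exact fusion rule and smoother are not specified here.

```python
def fuse(scene: float, scene_var: float, face: float, face_var: float) -> float:
    """Inverse-variance weighted average; variances are assumed strictly positive."""
    w_scene = 1.0 / scene_var
    w_face = 1.0 / face_var
    return (w_scene * scene + w_face * face) / (w_scene + w_face)


def smooth(values: list[float], alpha: float = 0.3) -> list[float]:
    """Exponential moving average over per-frame fused predictions."""
    out: list[float] = []
    for value in values:
        out.append(value if not out else alpha * value + (1 - alpha) * out[-1])
    return out


# The less certain pathway (larger variance) pulls the estimate less.
print(fuse(0.0, 1.0, 1.0, 1.0))  # 0.5
```

In LaTeX terms, the fused estimate under this assumption is $\hat{y} = \frac{y_s/\sigma_s^2 + y_f/\sigma_f^2}{1/\sigma_s^2 + 1/\sigma_f^2}$, where $s$ and $f$ denote the scene and face pathways.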
204 personas, 1,651 synthetic entries, ordinal VIF model
Values-alignment journal analyzing whether daily behavior reflects stated priorities across ten Schwartz dimensions.
- VIF: ordinal MLP heads with MC Dropout uncertainty estimation
- Synthetic data generation: 204 personas, 1,651 journal entries
- Automated LLM judge labeling pipeline via Claude Code subagents
PyTorch, Shiny, Polars, nomic-embed-text, OpenAI API, Three.js
139 expert images, 631 boxes, seven vision models benchmarked
Expert-grounded ViT attention study testing whether vision models look at the architectural evidence human experts mark as diagnostic.
- Benchmarked DINOv2, DINOv3, MAE, CLIP, SigLIP, SigLIP2, and ResNet-50 on 139 images with 631 expert boxes.
- Found DINOv3 strongest as a frozen spatial-prior model; CLIP and MAE improve after adaptation but on different evidence.
- Used IoU, Coverage, MSE, KL, EMD, calibrated baselines, bootstrap CIs, Wilcoxon tests, and Holm correction.
PyTorch, FastAPI, React, SQLite, Vision Transformers
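For readers unfamiliar with the Holm correction mentioned above, here is a small standalone sketch of the standard step-down procedure. The function name and the example p-values are illustrative; they are not results from the study.

```python
def holm(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Holm step-down procedure: one reject (True) or retain (False) flag per test."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value is compared against alpha / (m - k + 1).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # every remaining (larger) p-value is retained as well
    return reject


print(holm([0.01, 0.04, 0.03]))  # [True, False, False]
```

Compared with a plain Bonferroni threshold of `alpha / m` for every test, the step-down thresholds relax as hypotheses are rejected, which keeps family-wise error control while recovering some power.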