
How to run a use-case hackathon that actually ships something

Small groups, one real task each, working prototypes by close of play. I've hosted this AI hackathon format inside private markets firms enough times now that the playbook is worth writing down - including the part everyone skips: the prep.

The internal AI hackathon has a bad reputation, mostly deserved. Done lazily, it's a fun afternoon that produces nothing: toy demos on fake data, a winner's mug, and no change whatsoever to how anyone works on Monday. Done properly, it's the single fastest way I know to surface a firm's real use cases, build genuine capability, and leave working prototypes behind — because the people building them are the people who'll use them, on their own tasks, with their own files.

The difference between the two outcomes is decided almost entirely before anyone enters the room.

01 The prep is the product

For every hour of hackathon, expect several hours of preparation. This is what that looks like:

  • Meet the team first. Before the event, sit down with the people who'll attend — or at least their leads. Understand what a normal week looks like, where the drudgery is, what's been tried before. The best use cases are usually complaints in disguise.
  • Inventory the tools, honestly. What do attendees actually have access to on the day? Which models, which connectors, which integrations — can the AI reach SharePoint, email, the deal drive? What's blocked? A hackathon designed for tools people don't have is a demo, not a hackathon. This audit also tells you what's realistically operationalisable afterwards.
  • Write out the candidate use cases in advance — and verify them. Collect ideas from the team beforehand, write them up properly, then test-drive each one against the firm's actual stack before the event. Nothing kills a session's energy like a team discovering at minute 40 that their idea needs a data source nobody can reach. You want a bench of pre-verified ideas ready for any team that arrives without one.
  • Pick the teams deliberately. Two or three people per team, and think about composition. Sometimes you go vertical — a whole team from one function (legal, finance, the investment team), working on their shared workflow. Sometimes horizontal — mixed seniority and mixed functions, which spreads capability and surfaces cross-team use cases. Both work; choose based on what you want out of the day. What doesn't work is random assignment.

Facilitator's note: The people running the room are facilitators, not lecturers. Once teams are building, the job is to circulate, unblock, and push each team one level deeper, not to present.

02 Open with 30–60 minutes of training

Don't skip straight to building. The opening session sets the ceiling for everything after it, and it needs to cover exactly three things:

  1. Core principles. What the tools are genuinely good at (first drafts, synthesis, reformatting, structured output) and where they bite (overconfidence, context gaps). Plus the golden guardrail for anything factual: if it's not in the documents, say Not Found.
  2. Get the AI to help you build. The single most useful habit on the day: before doing anything, ask the model to plan the approach with you. We give every attendee a starter prompt for exactly this (below).
  3. What's connected. Show which integrations are live — files, email, internal document stores — so teams design use cases around what the AI can actually reach.

The planning prompt we hand out: "I'm in a hackathon where we're using [tool]. We have access to skills, artifacts, the M365 connector and our own documents. I'm trying to plan out the steps needed to achieve my use case, which is [use case]. Ask me anything you need, then give me a step-by-step plan that fits in 60 minutes."
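
If you want to hand the prompt out pre-filled rather than with blanks, a few lines of Python will do it. This is a minimal sketch, not something we run on the day: the function name `build_planning_prompt`, the `minutes` parameter and the example values are all assumptions for illustration. Only the prompt wording comes from the handout above.

```python
# Planning prompt from the handout, with the two blanks turned into placeholders.
# The {minutes} placeholder is an added assumption so the time box can be changed.
PLANNING_PROMPT = (
    "I'm in a hackathon where we're using {tool}. We have access to skills, "
    "artifacts, the M365 connector and our own documents. I'm trying to plan "
    "out the steps needed to achieve my use case, which is {use_case}. "
    "Ask me anything you need, then give me a step-by-step plan that fits "
    "in {minutes} minutes."
)


def build_planning_prompt(tool: str, use_case: str, minutes: int = 60) -> str:
    """Fill in the handout prompt; refuse to produce one with empty blanks."""
    tool, use_case = tool.strip(), use_case.strip()
    if not tool or not use_case:
        raise ValueError("tool and use_case must both be filled in")
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return PLANNING_PROMPT.format(tool=tool, use_case=use_case, minutes=minutes)


# Hypothetical example values, one per team on the pre-verified menu.
prompt = build_planning_prompt(
    tool="our AI assistant",
    use_case="a first-pass redline of an incoming NDA",
)
print(prompt)
```

Refusing empty blanks is the only real design choice here: a prompt that still says "[use case]" gets a generic plan back, which is exactly what the planning step is meant to avoid.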

Prompting fundamentals matter more than any feature walkthrough, and you don't need to invent the material — Anthropic's applied team has a solid, free 25-minute version worth stealing from[1]:

Anthropic's "Prompting 101" — the structure we compress into the opening training [1].

03 The run of show: 80 minutes, one real task

The brief to every team is the same: pick something from your actual day-to-day — a report, a summary, a brief, a review — and build it with AI during the session. Real task, real files, real output. Here's the clock:

The 80-minute run of show:

  • Choose: 10 minutes.
  • Plan with AI: 15 minutes. Interact with the AI to build the use-case plan.
  • Collect data: 15 minutes. Documents, folders, spreadsheets, URLs, preferred sources.
  • Build: 10 minutes.
  • Test & iterate: ongoing.

Milestones: a half-way check-in, then show & tell. If something doesn't work, ask the AI why. Troubleshooting is part of the training.
Fig 1 — The clock we run. The half-way check-in catches teams that are stuck politely and quietly.

Show and tell is strict: three minutes per team, four beats — the title, what the task was, a walkthrough of what they built, and the honest question that turns a demo into a pipeline: what would it take to operationalise this — or is it ready to use now? That last answer is the real output of the day. It becomes the prioritised backlog for the build phase that follows.

04 Go deeper, not wider

Teams that finish early always want to start a second use case. Don't let them. The value curve bends upward as a single use case climbs four levels:

The four levels of any use case:

  • Level 1 · Do it once: a prompt, a useful answer.
  • Level 2 · Repeatable: reusable prompt, artifact or skill.
  • Level 3 · Reliable: stress-tested with planted errors.
  • Level 4 · Scalable: documented, others can run it.
Fig 2 — Climb one use case as far as it'll go before starting a second. Level 3 — planting errors and checking they get caught — is where trust is earned.

Level 2 is where the compounding starts: a prompt that worked becomes a saved skill anyone can run with one click — a pattern Anthropic has since formalised as an open standard, with pre-built libraries for financial services you can borrow from rather than starting cold[2][3].
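
Level 3 is easier to picture with a toy example. The sketch below plants one error at a time in a clean document, runs a reviewer over the corrupted copy, and records whether the planted text shows up in the findings. Everything in it is an assumption for illustration: the names `PlantedError`, `plant`, `stress_test` and `toy_review` are hypothetical, the invoice text is made up, and `toy_review` is a deliberately simple stand-in for whatever AI workflow the team actually built.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PlantedError:
    label: str      # name shown on the scorecard
    original: str   # text that appears in the clean document
    corrupted: str  # text swapped in to plant the error


def plant(document: str, error: PlantedError) -> str:
    """Return a copy of the document with exactly one error planted."""
    if error.original not in document:
        raise ValueError(f"cannot plant {error.label!r}: original text not found")
    return document.replace(error.original, error.corrupted, 1)


def stress_test(
    document: str,
    errors: list[PlantedError],
    review: Callable[[str], str],
) -> dict[str, bool]:
    """Plant each error separately and record whether the review mentions it.

    'Caught' is a crude substring check on the reviewer's findings; a real
    session would have a person read the output instead.
    """
    results = {}
    for error in errors:
        findings = review(plant(document, error))
        results[error.label] = error.corrupted in findings
    return results


_AMOUNTS = re.compile(r"net ([\d,.]+), VAT ([\d,.]+), total ([\d,.]+)")


def toy_review(document: str) -> str:
    """Stand-in for the workflow under test: flags lines where net + VAT != total."""
    flagged = []
    for line in document.splitlines():
        match = _AMOUNTS.search(line)
        if match:
            net, vat, total = (float(g.replace(",", "")) for g in match.groups())
            if abs(net + vat - total) > 0.005:
                flagged.append(line)
    return "\n".join(flagged)


# Made-up sample document and planted errors.
CLEAN = (
    "Invoice 1041: net 1,200.00, VAT 240.00, total 1,440.00\n"
    "Payment due: 30 days from invoice date\n"
)
ERRORS = [
    PlantedError("wrong total", "total 1,440.00", "total 1,540.00"),
    PlantedError("wrong payment term", "30 days", "300 days"),
]

results = stress_test(CLEAN, ERRORS, toy_review)
for label, caught in results.items():
    print(f"{label}: {'caught' if caught else 'MISSED'}")
```

The toy reviewer catches the arithmetic error and misses the payment term, which is the point of the exercise: the misses tell the team what the workflow cannot yet be trusted with.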

05 Seed ideas by team

Every team is told "feel free to bring your own" — but a pre-verified menu means nobody stares at a blank page. A sample of what we put on it, by function:

Team | Example use cases
Investment (senior) | IC deck stress test (pressure-test the thesis before committee) · asset-review talking points · board meeting prep
Investment (VP / associate) | Memo first drafts from prior memos · sourcing screens (long list → short list) · model sanity checks and anomaly flags
Legal | NDA / term-sheet redline first pass · SPA issues-list generator · obligation tracker (covenants and deadlines → calendar)
Finance / ops | Invoice tracking · budget vs approvals comparison · cash-flow forecast visualised as a live artifact from Excel
EAs / support | Daily prep brief on the day's meetings · survey year-on-year comparison · itinerary builder

06 Why this works when top-down programmes don't

MIT's enterprise AI research found that deployments succeed when adoption is driven by the people who own the workflows — line managers and their teams — rather than a central lab pushing tools outward[4]. A hackathon is that finding turned into an event: the team picks the use case, builds it on their own work, and presents it to their own colleagues. Ownership is baked in from minute one. And because the ideas were verified against the firm's real stack in prep, the distance from prototype to production is short — which is the whole point.

If you want the honest version of this for your team (prep, facilitation, and the follow-through that turns the winning prototypes into operational workflows), that's exactly what I do.

Sources & further reading

  1. Anthropic Applied AI team (2025), Prompting 101, Code w/ Claude. YouTube
  2. Anthropic (2025), Introducing Agent Skills. claude.com/blog/skills; engineering deep-dive: Equipping agents for the real world with Agent Skills.
  3. Anthropic, Financial services skills & agents library (open source). github.com/anthropics/financial-services
  4. MIT NANDA (2025), The GenAI Divide: State of AI in Business 2025 — on line-manager-driven adoption and buy-vs-build success rates. Report PDF
  5. Karpathy, A. (2025), Software Is Changing (Again) — on natural language as the new programming interface, which is what makes non-technical teams productive in a hackathon at all. YouTube
James Bell

Founder, Next Step Ventures — a boutique applied-AI practice for private markets and regulated firms, based in London. Builds in public on LinkedIn.