Skills, agents, and workflows — where the lines actually are

A skill looks exactly like a prompt. That's the trap — and almost everyone building with AI falls into it. There's a chunk of instructions at the center, and if you squint, that's all of it. But "just a prompt" is the kind of simplification that feels right until you try to build something, and then every boundary blurs — skill, agent, workflow, chaining — and you can't tell which one you're holding. This is the map I wish I'd had. It's the mental model, then the process for turning one of your own repeatable tasks into a skill, then a dumb little example built from nothing.

A skill is not a prompt

A prompt is what you say in the moment. It's loose, single-use, gone when the chat ends.

A skill is a module. Concretely it's a folder with three parts that load at different moments:

  • Metadata — a name and a short description. This stays resident in context all the time. It's tiny, and its only job is to let the agent decide "is this relevant to what I'm doing right now?"
  • The instructions (the SKILL.md body) — the part that looks like a prompt. But it only gets pulled into context after the description matched and the skill fired.
  • Bundled resources — scripts, reference docs, templates, examples. Read or executed on demand, never preloaded.

That staged loading is the first move a flat prompt can't make. It's called progressive disclosure, and it matters because context is finite. If you tried to hand an agent fifty capabilities as one giant prompt, you'd spend most of the window on instructions for capabilities it isn't using. Skills let you keep fifty cheap pointers resident and hydrate only the one you need. It's the difference between memorizing every manual and keeping a labeled shelf so you can grab the right binder.
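Here's a minimal sketch of that staged loading in Python. None of this is a real framework's API: `Skill`, `load_body`, `resident_context`, and the fifty placeholder skills are names I made up to show the shape, and the loader just returns a string where a real one would read SKILL.md from the skill's folder.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Skill:
    name: str
    description: str                  # always resident: the cheap pointer
    load_body: Callable[[], str]      # runs only when the skill fires
    _body: Optional[str] = field(default=None, repr=False)

    def body(self) -> str:
        # Hydrate the instructions on first use, then reuse the cached copy.
        if self._body is None:
            self._body = self.load_body()
        return self._body


def resident_context(skills: List[Skill]) -> str:
    # What sits in context all the time: one short line per skill.
    return "\n".join(f"{s.name}: {s.description}" for s in skills)


loaded: List[str] = []  # records which bodies were actually hydrated


def make_loader(name: str) -> Callable[[], str]:
    # Hypothetical loader; a real one would read the skill's files from disk.
    def loader() -> str:
        loaded.append(name)
        return f"Full instructions for {name} would be read from disk here."
    return loader


skills = [
    Skill(f"skill-{i}", f"Use when working with topic {i}.", make_loader(f"skill-{i}"))
    for i in range(50)
]

index = resident_context(skills)   # fifty pointers, no bodies loaded yet
instructions = skills[7].body()    # only this one gets hydrated
print(len(index.splitlines()), loaded)
```

The point of the sketch is the asymmetry: the index costs fifty short lines no matter what, while the expensive part is paid once, for the one skill that fired.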

The second difference is bigger: a skill can ship executable code, not just instructions. A prompt asks the model to reason through a transformation token by token. A bundled script just runs the deterministic step. For manipulating a document or filling out a structured record, you don't want the model freehanding byte work every time — you want a jig. You cut the template once, and every cut after is reproducible because the jig removes the chance to wander. That's reliability you can't prompt your way to.

So: a prompt is freehand instruction; a skill is a self-contained module that declares when to engage (description), how to think (instructions), and what to physically run (bundled code), all loaded lazily so it scales.

The part that tangles everyone

Here's where I kept getting lost. I'd try to compare "skill" against "workflow" against "agent" as if they were a single menu I picked from. They're not a menu. They're two different questions wearing similar clothes.

Axis one — packaging. Is this capability bundled and reusable, or loose? Loose = prompt. Bundled and triggerable = skill. That's the only question "skill" answers. A skill is inert. It does nothing on its own. It's the binder on the shelf — not the worker, not the route the worker takes.

Axis two — control flow. Who decides what happens next: you, in advance, or the model, at runtime? You author the path = workflow. The model authors the path = agent. That's the whole distinction.

Skills don't sit on axis two at all. A skill is what either a workflow or an agent reaches for. Here's the proof they're different in kind: take three skills, and wire the order yourself — A then B then C. That's a workflow. Now instead let the model look at each result and pick which skill runs next. Same three skills, untouched. That's now agentic. The skills didn't change; only who chose the order did. So "skill" can't be the same category as "workflow."
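That A-then-B-then-C argument can be sketched in a few lines of Python. Everything here is a toy I invented for illustration: the three skills are plain string functions, and `toy_chooser` is a hard-coded stand-in for what would really be a model call deciding the next step.

```python
from typing import Callable, Dict, List, Optional, Tuple


# Three toy skills. Each is just a function from text to text.
def skill_a(text: str) -> str:
    return text.strip()


def skill_b(text: str) -> str:
    return text.lower()


def skill_c(text: str) -> str:
    return text.replace(" ", "-")


SKILLS: Dict[str, Callable[[str], str]] = {"a": skill_a, "b": skill_b, "c": skill_c}


def run_workflow(text: str, order: List[str]) -> Tuple[str, List[str]]:
    # Workflow: the author fixed the order in advance.
    for name in order:
        text = SKILLS[name](text)
    return text, list(order)


def run_agent(
    text: str,
    choose_next: Callable[[str], Optional[str]],
    max_steps: int = 10,
) -> Tuple[str, List[str]]:
    # Agentic: something inspects each result and picks the next skill.
    trace: List[str] = []
    for _ in range(max_steps):
        name = choose_next(text)
        if name is None:
            break
        text = SKILLS[name](text)
        trace.append(name)
    return text, trace


def toy_chooser(text: str) -> Optional[str]:
    # Stand-in for a model call: looks at the current result, names a skill.
    if text != text.strip():
        return "a"
    if text != text.lower():
        return "b"
    if " " in text:
        return "c"
    return None  # nothing left to do


fixed = run_workflow("  Hello World ", ["a", "b", "c"])
chosen = run_agent("hello world", toy_chooser)
```

Both runners share the same `SKILLS` table untouched. The only thing that differs is whether the order arrives as a list you wrote or as a decision made after looking at each result.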

It helps to picture it as a stack, read top to bottom along the arrows:

   [ you set the order ]   [ model sets the order ]     <- control flow
            \_________________________/                    (who decides)
                        |
                  decides order of
                        v
            [   the agent (worker)   ]                   <- the worker
                holds many skills, picks one               (who acts)
                        |
                   reaches for
                        v
      [ skill A ]   [ skill B ]   [ skill C ]            <- the parts
        (inert, on the shelf, do nothing alone)           (what)

The two arrows mean different things, and that's the whole untangling. "Reaches for" (worker to parts) is containment — the skill/agent relationship. "Decides order" (control flow to worker) is the other axis entirely. They're stacked but they're not the same kind of line. That's why "skill vs workflow" never resolves as one comparison: you're comparing across two arrows.

A few clean facts that fall out of the picture:

  • One agent holds many skills. It reads the descriptions against the task and grabs the matching one. This is the normal case, not the exception — it's the entire reason descriptions sit resident and cheap.
  • One skill is used by many agents. Same binder, different workers, different rooms. That's the portability: author once, hand to anything.
  • Chaining isn't a fourth category. It's slicing that bottom row into smaller binders when one gets too heavy for the context window. Same stack, finer parts.

And the one line to keep: skill is what; workflow-vs-agent is who decides the order; chaining is how finely you've sliced it. Three questions, not one.

Who picks the skill?

When I first thought about this I reasoned like an engineer wiring dependencies: the agent gets a skill assigned at construction. That's a real pattern — it's just not the only one, and not the default.

There are two ways a skill gets picked up, and they're the same who-decides axis showing up at a smaller scale:

  • You assign it. A slash-command pointing at a specific binder, or scoping which skills an agent can even see. This is the controlled end. You decided.
  • It selects. The agent reads the descriptions against the task and grabs the match. Nobody assigned it — it matched "this is about X" to the skill whose description says "use when working with X." That's why the description is load-bearing: it's not documentation, it's the selection key.

You assign when you want control or you're sitting in the loop. You let it match when you want it to run unattended. If you had to hand-invoke every skill, you'd be back to manually driving each step, which is fine when you're watching and useless when you aren't.
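To make the two pickup modes concrete, here's a rough Python sketch. It's heavily simplified and built on assumptions: a real agent has the model read the descriptions, whereas this uses naive word overlap as a stand-in, and the skill names, descriptions, and `select_skill` function are all hypothetical.

```python
import re
from typing import Dict, Optional, Set

# Hypothetical skills. The description is the selection key, not documentation.
DESCRIPTIONS: Dict[str, str] = {
    "release-notes": "Use when writing release notes from merged changes.",
    "invoice-filler": "Use when filling out an invoice from order data.",
}

# Filler words that should not count as a match.
STOP: Set[str] = {"use", "when", "a", "an", "the", "from", "for", "out", "to", "of"}


def keywords(text: str) -> Set[str]:
    return set(re.findall(r"[a-z]+", text.lower())) - STOP


def select_skill(task: str, assigned: Optional[str] = None) -> Optional[str]:
    # Mode one: you assigned it (for example via a slash-command). You decided.
    if assigned is not None:
        return assigned
    # Mode two: it selects. Word overlap stands in for the model's judgment.
    task_words = keywords(task)
    best, best_score = None, 0
    for name, description in DESCRIPTIONS.items():
        score = len(task_words & keywords(description))
        if score > best_score:
            best, best_score = name, score
    return best  # None means no description matched the task


matched = select_skill("Draft the release notes for this sprint")
forced = select_skill("Anything at all", assigned="invoice-filler")
unmatched = select_skill("Translate this poem")
```

Notice that a vague description would quietly break mode two: with nothing distinctive to match on, the skill never fires, which is why the description deserves more care than the name.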

When should a process become a skill?

Not every task earns one. Building a skill has a cost: discovery (figuring the process out), encoding (writing it and testing it), and a small tail of maintenance as the real process drifts. Only pay that cost if it pays back. Most tasks shouldn't be skills.

The threshold, in plain terms:

  1. Repeatable. You've done it three or more times, or you know you will. A one-off is just a saved prompt.
  2. Needs judgment somewhere. If the whole process is deterministic — no decisions, just rules — it doesn't need a model at all. Use a script, or boring automation like a Zap. Determinism is cheaper and more reliable than any model.
  3. You can explain it. This is the sharpest gate. If you can't walk another person through the process start to finish without hand-waving, you don't understand it well enough to automate it, and anything you encode is a frozen snapshot of a moving target. Inability to articulate the steps is the signal it's not ready.

Repeatable plus needs-judgment plus you-can-explain-it → build it. Otherwise leave it as a prompt, a script, or a note.

One trap worth naming because it's the opposite of the obvious one: don't set the bar so high that nothing ever clears it. Premature skills freeze immature work; but a bar nobody can reach just means you never ship. Keep the test light enough that you actually run it.

The process, end to end

Two phases. Resist the urge to design the whole framework on paper first — the building surfaces the gaps for you.

Discovery

Articulate the existing process as a list of steps. If you can't, reverse-engineer it: start from the goal and walk backward — "to get the output, what had to happen right before? and before that?" — with the model as a research buddy when you don't know the tools or the standard approach. Be specific when you ask; "what do people do for X" gets slop, "I'm a solo founder in Y trying to do Z, what works in this space" gets something useful.

Then decide if the whole process is a skill (repeatable + judgment + explainable). This is a per-process question. Separately, look at each step and ask "deterministic or needs judgment?" — but that only decides how each step is implemented inside the skill: a script for the deterministic ones, the model for the judgment ones. One process is one skill made of mixed steps. You decompose to implement, not to split into many skills. If you find yourself making every judgment step its own separate skill, you've jumped to chaining before you needed it.

Encoding

Two beats, and keep them distinct.

  • Beat one: build the dumbest version that runs end to end. Ignore quality completely. You're only checking that every step actually executes and the connectors are there. Don't touch voice, don't tune anything.
  • Beat two: refine what's broken. Now you care about output. Work in this order: examples first, then guardrails, then swap the model for a script wherever a step turns out not to need judgment. Most bad output comes from weak examples of "what good looks like" or missing guardrails, not from a weak model. If the output still disappoints after that, escalate in this order: better examples → more thinking effort → bigger model. Don't reach for a bigger model first; that's the expensive reflex that usually wasn't the problem.

A dumb little example, start to finish

Say I generate release notes every couple weeks from a pile of merged changes. Let's build it.

Discovery. I write the steps the way I'd hand them to a new hire: (1) gather the changes merged since the last release, (2) group them — features, fixes, everything else, (3) rewrite each one in language a user understands, (4) write a one-line intro, (5) format it as markdown.

Is the whole process a skill? It's repeatable (every release), it needs judgment (the rewriting and the intro are not mechanical), and I can explain it cleanly. Yes.

Now per step: gathering is deterministic (an API call). Grouping is deterministic (sort by label). Rewriting is judgment. Intro is judgment. Formatting is deterministic (a template). So it's one skill containing a mix — scripts do steps 1, 2, and 5; the model does 3 and 4. I do not split this into five skills. One binder.
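Written down as data, that per-step classification might look like the sketch below. The `Step` type and `steps_by_kind` helper are my own illustration, not part of any skill format; the step names and the script/model split come straight from the list above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    number: int
    name: str
    kind: str  # "script" for deterministic steps, "model" for judgment steps


# One skill, five steps, mixed implementations.
RELEASE_NOTES_STEPS: List[Step] = [
    Step(1, "gather changes merged since the last release", "script"),
    Step(2, "group into features, fixes, everything else", "script"),
    Step(3, "rewrite each change in user-facing language", "model"),
    Step(4, "write a one-line intro", "model"),
    Step(5, "format as markdown", "script"),
]


def steps_by_kind(kind: str) -> List[int]:
    # Which step numbers are implemented this way inside the single skill.
    return [step.number for step in RELEASE_NOTES_STEPS if step.kind == kind]


script_steps = steps_by_kind("script")
model_steps = steps_by_kind("model")
```

The list is one object on purpose. The `kind` field decides how a step is implemented, not whether it gets its own binder.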

Encoding, beat one. I make the dumbest version that runs. The SKILL.md says, in plain instructions, "given a list of merged changes, group them by type, rewrite each in user-facing language, add a one-line intro, output as markdown." For now I paste the change list in by hand instead of wiring the API. I run it. Do I get something shaped like release notes, and did every step actually run? Yes. I do not care that the writing is robotic. Beat one is done.

Encoding, beat two. The rewrites read like a corporate changelog generator. So: I add two or three examples of release notes in the voice I actually want — that alone moves the needle more than anything else. I add a guardrail: no marketing fluff, no "we're excited to announce." And I notice the grouping step never needed the model — it's pure sorting — so I pull it out into a tiny bundled script. Now the model only does the part that needs a brain, the script does the mechanical part reliably every time, and the examples carry the voice.
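The bundled grouping script could be as small as this. It's a sketch under assumptions: I'm guessing each change arrives as a dict with a `title` and a `label`, and the label values (`feature`, `fix`) are placeholders for whatever your tracker actually uses.

```python
from typing import Dict, List

# Hypothetical label names; swap in whatever your tracker actually emits.
LABEL_TO_GROUP: Dict[str, str] = {"feature": "Features", "fix": "Fixes"}
GROUP_ORDER: List[str] = ["Features", "Fixes", "Everything else"]


def group_changes(changes: List[Dict[str, str]]) -> Dict[str, List[str]]:
    # Pure sorting: same input, same output, no model involved.
    groups: Dict[str, List[str]] = {name: [] for name in GROUP_ORDER}
    for change in changes:
        # Unknown or missing labels fall through to the catch-all group.
        group = LABEL_TO_GROUP.get(change.get("label", ""), "Everything else")
        groups[group].append(change["title"])
    return groups


sample = [
    {"title": "Add CSV export", "label": "feature"},
    {"title": "Fix login timeout", "label": "fix"},
    {"title": "Bump dependencies", "label": "chore"},
    {"title": "Add dark mode", "label": "feature"},
]
grouped = group_changes(sample)
```

The model now receives changes that are already grouped, so its whole job is the rewriting and the intro.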

What I'm left with is a real skill: a name, a description written as a selection key, the instructions, a bundled grouping script, and a couple of voice examples. It runs the same way every time. That's the point.

The escalation order

The mistake is pre-classifying: trying to name the bucket before you've built anything. Don't. Start at the top, stop at the first thing that fits, and move further down only when the condition on that line actually shows up:

  1. No judgment needed? Use a script or boring automation. Not a skill.
  2. Not repeatable? A saved one-off prompt. Not a skill.
  3. Repeatable and needs judgment? A skill. Build it as one skill with mixed steps inside.
  4. Context or cost blowing up? Now chain it — split into smaller skills with isolated context.
  5. Path genuinely can't be predetermined? Now reach for an agent.

In my experience, most real business processes live at steps 3 and 4. Agents are the exception you reach for when the path is unknown, not the default.
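If it helps to see the ladder as a decision function, here's one way to write it. This is my reading, not a rule: the function name and flags are invented, and where the list leaves a case open (context blowing up and the path being unknowable at the same time) I've assumed the agent wins.

```python
def choose_form(
    needs_judgment: bool,
    repeatable: bool,
    context_or_cost_blowing_up: bool = False,
    path_predeterminable: bool = True,
) -> str:
    # Steps 1 and 2: things that should not be skills at all.
    if not needs_judgment:
        return "script or boring automation"
    if not repeatable:
        return "saved one-off prompt"
    # Step 5 (assumption: checked before step 4 when both conditions hold).
    if not path_predeterminable:
        return "agent"
    # Step 4: only once the single skill is actually too heavy.
    if context_or_cost_blowing_up:
        return "chained smaller skills"
    # Step 3: the default landing spot.
    return "one skill with mixed steps"


release_notes = choose_form(needs_judgment=True, repeatable=True)
nightly_export = choose_form(needs_judgment=False, repeatable=True)
```

The two escalation flags default to the boring answer because you can't know them until you've built the dumb version and watched it run.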

That's the whole model. A skill is what the agent reaches for. Who decides the order is a separate question. And the way you find out what you're building is by building the dumbest version of it and watching what breaks.


