Guide

The complete guide to writing AI step prompts that actually work

The AI step is the most powerful component in Flowpath — and the most underused. A well-crafted prompt is the difference between an agent that works reliably and one you have to babysit. Here's everything we've learned from thousands of AI step configurations.

10 min read

Mar 25, 2026

An article by

Sofia Reyes

Co-founder & CTO

There's a common pattern we see with new Flowpath users. They add an AI step, type something like "analyze this lead and tell me if it's a good fit," watch it work perfectly on the first three test runs, and then deploy it to production. Two weeks later they notice the outputs have been inconsistent — sometimes a number, sometimes a sentence, sometimes a JSON object — and downstream steps have been failing silently because the data format they expected wasn't what the AI returned.

This guide is about preventing that. It's built on patterns we've observed across thousands of AI step configurations on Flowpath, and it covers everything from basic output formatting to advanced prompt engineering techniques for complex classification tasks.

Principle 1 — Specify output format with surgical precision

This is the single most important thing you can do to make an AI step reliable. GPT-4o is a language model — its natural output is prose. If you want structured data, you have to ask for it explicitly and describe the format in detail.

Bad: "Score this lead from 0 to 100."
Good: "Return only a single integer between 0 and 100. No explanation, no additional text, no punctuation. Only the integer."

Bad: "Extract the key information from this invoice."
Good: "Extract the following fields from the invoice and return them as a JSON object with exactly these keys: invoice_number (string), invoice_date (ISO 8601 date string), total_amount (number, no currency symbol), vendor_name (string), line_items (array of objects with keys: description, quantity, unit_price, total). If any field cannot be found, return null for that key."

The second example leaves no room for interpretation. The model knows exactly what to return and in what format, so your downstream steps receive consistent, parseable data far more reliably.
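A precise prompt pairs well with a downstream check that the reply really has the requested shape. The following is a minimal sketch in Python, not part of Flowpath itself: `parse_score` and `validate_invoice` are hypothetical helper names, and the key list simply mirrors the invoice prompt above.

```python
import json

# Keys the invoice prompt above asks for (copied from the prompt text).
EXPECTED_KEYS = {
    "invoice_number", "invoice_date", "total_amount",
    "vendor_name", "line_items",
}


def parse_score(raw: str) -> int:
    """Accept only a bare integer between 0 and 100, as the prompt demands."""
    text = raw.strip()
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"expected a bare integer, got {raw!r}")
    score = int(text)
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    return score


def validate_invoice(raw: str) -> dict:
    """Parse the model's reply and check it has exactly the expected keys."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on prose
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    keys = set(data)
    if keys != EXPECTED_KEYS:
        missing = sorted(EXPECTED_KEYS - keys)
        extra = sorted(keys - EXPECTED_KEYS)
        raise ValueError(f"missing keys: {missing}; unexpected keys: {extra}")
    items = data["line_items"]
    if items is not None and not isinstance(items, list):
        raise ValueError("line_items must be an array or null")
    return data
```

Rejecting a malformed reply loudly at this point is one way to avoid the silent downstream failures described in the introduction.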

Principle 2 — Be context-rich

The quality of an AI step's output depends heavily on the quality of its input. Use every piece of enriched data available to you; don't just pass a company name and hope the model infers the rest.

A lead scoring prompt that passes only {{company_name}} will produce generic, unreliable scores. A prompt that passes {{company_name}}, {{industry}}, {{employee_count}}, {{funding_stage}}, {{job_title}}, {{tech_stack}} and {{annual_revenue_estimate}} will produce scores that are meaningfully differentiated and consistent across runs.
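One way to picture the difference is to render the prompt from a lead record and look at what the model actually receives. The sketch below is illustrative Python, not Flowpath's own template engine: `render` is a hypothetical helper that fills `{{name}}` placeholders, the template wording is an assumption, and the sample values are made up.

```python
import re

# Hypothetical prompt template using the variables named above.
TEMPLATE = (
    "Score this lead for fit with our product.\n"
    "Company: {{company_name}}\n"
    "Industry: {{industry}}\n"
    "Employees: {{employee_count}}\n"
    "Funding stage: {{funding_stage}}\n"
    "Job title: {{job_title}}\n"
    "Tech stack: {{tech_stack}}\n"
    "Estimated annual revenue: {{annual_revenue_estimate}}\n"
    "Return only a single integer between 0 and 100."
)

PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render(template: str, record: dict) -> str:
    """Replace each {{name}} with the record's value, or 'unknown' if absent."""
    def fill(match):
        value = record.get(match.group(1))
        return "unknown" if value in (None, "") else str(value)
    return PLACEHOLDER.sub(fill, template)


# Made-up lead record for illustration only.
lead = {"company_name": "Acme", "industry": "Logistics", "employee_count": 250}
print(render(TEMPLATE, lead))
```

Printing the rendered prompt makes it obvious how much context the model is missing when only a few fields are populated.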

The same principle applies to classification tasks. If you're classifying a support ticket, pass the full ticket body, the customer's account tier, their tenure in months, and their historical ticket count. The model will use all of it — and the output will be significantly better for it.

Principle 3 — Always handle missing data explicitly

Clearbit enrichment fails on roughly 8% of contacts. API calls time out. Form fields get left blank. Your AI step will eventually receive incomplete data, and if your prompt doesn't tell the model how to handle it, the behavior will be unpredictable.

Always add a fallback instruction to any prompt that depends on enrichment data: "If any of the following fields are null or empty, treat them as unknown and adjust your output accordingly. Do not fail or return an error — make your best assessment with the available information and indicate which fields were missing by appending a 'missing_fields' array to your JSON output."

This won't completely compensate for missing data, but it will produce a usable output rather than a failed step — and the missing_fields array lets you identify data quality problems over time.
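The `missing_fields` array is easier to trust when it can be cross-checked against the input. The following is a minimal sketch, assuming the enriched record and the parsed model reply are both available as Python dicts; `find_missing` and `check_reported` are hypothetical names, not Flowpath functions.

```python
def find_missing(record: dict, required: list) -> list:
    """Return the required fields that are absent, None, or blank strings."""
    missing = []
    for name in required:
        value = record.get(name)
        if value is None or (isinstance(value, str) and not value.strip()):
            missing.append(name)
    return missing


def check_reported(reply: dict, expected_missing: list) -> bool:
    """True if the model's missing_fields matches what the input lacked."""
    reported = reply.get("missing_fields") or []
    return sorted(reported) == sorted(expected_missing)


# Made-up record: two fields failed enrichment, one was never provided.
record = {"company_name": "Acme", "industry": "", "employee_count": None}
required = ["company_name", "industry", "employee_count", "tech_stack"]
print(find_missing(record, required))
```

Logging the result of `find_missing` per run gives an independent view of data quality, even when the model forgets to append the array.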

Principle 4 — Use chain-of-thought for complex decisions

For simple classification (qualify/disqualify, positive/negative, category A/B/C), direct prompting works well. For complex multi-factor decisions — lead scoring with six variables, content moderation with nuanced edge cases, anomaly detection in financial data — chain-of-thought prompting produces significantly better results.

Chain-of-thought means asking the model to reason through the problem before producing the final output. Structure it like this: "First, evaluate each of the following criteria separately and assign a sub-score for each: [criteria list]. Then sum the sub-scores to produce a total. Return a JSON object with the individual criteria scores and the total."

This approach has two benefits: better accuracy (the model is less likely to make lazy generalizations when forced to reason step by step) and better debuggability (you can see exactly how the model scored each criterion when reviewing run logs).
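Because the sub-scores come back alongside the total, a downstream step can recompute the sum instead of trusting the model's arithmetic. The following is a minimal sketch, assuming the reply is a JSON object with a `criteria` object and a `total` key; both key names and the criterion names are assumptions, since the prompt above does not fix them.

```python
import json


def checked_total(raw: str) -> int:
    """Recompute the total from the sub-scores and reject replies that disagree."""
    reply = json.loads(raw)
    criteria = reply["criteria"]  # e.g. {"industry_fit": 20, ...}
    recomputed = sum(criteria.values())
    if recomputed != reply["total"]:
        raise ValueError(
            f"model reported total {reply['total']}, "
            f"but sub-scores sum to {recomputed}"
        )
    return recomputed


# Made-up reply for illustration only.
reply = '{"criteria": {"industry_fit": 20, "company_size": 15, "seniority": 25}, "total": 60}'
print(checked_total(reply))
```

A mismatch here is a useful signal in the run logs that the reasoning and the final answer have drifted apart.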

Principle 5 — Set temperature intentionally

Temperature controls how deterministic vs creative the model's outputs are. For scoring, classification and data extraction tasks (anything where consistency matters more than creativity), set temperature to 0. The same input will then produce the same output on nearly every run, although hosted models do not guarantee bit-for-bit identical results.

For tasks where some variation is acceptable or desirable — drafting outreach emails, generating content summaries, producing personalized messages — a temperature of 0.3 to 0.7 gives the model creative latitude while keeping outputs grounded and relevant.

In Flowpath, temperature is set in the AI step configuration panel. It defaults to 0.2 — a sensible middle ground, but worth adjusting deliberately for your specific use case.
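To see why a low temperature makes outputs more repeatable, it helps to look at how temperature reshapes the probabilities a model assigns to its next token. The sketch below applies the standard temperature-scaled softmax to made-up scores; it illustrates the general mechanism and is not Flowpath or GPT-4o code. Temperature 0 is handled as always picking the top-scoring token, a common convention since dividing by zero is undefined.

```python
import math


def token_probs(logits: list, temperature: float) -> list:
    """Temperature-scaled softmax; temperature 0 is treated as greedy argmax."""
    if temperature == 0:
        best = logits.index(max(logits))
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0, 0.2, 0.7, 1.0):
    print(t, [round(p, 3) for p in token_probs(logits, t)])
```

As the temperature rises, probability mass spreads from the top candidate to the alternatives, which is where the extra variation in drafted emails or summaries comes from.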

Other useful guides

How to automate your entire sales pipeline without writing a single line of code

Most sales teams spend 40% of their time on tasks that have nothing to do with selling. This guide walks you through building the exact workflow we use internally at Flowpath, from form submission to Slack notification, in under an hour.

James Kim

Co-founder & CEO

Apr 15, 2026

8 min

Read

Run logs explained: how to debug any Flowpath agent in under 5 minutes

Every agent run generates a complete execution trace: inputs, outputs, latency, errors. Most users glance at the green success indicator and move on. Knowing how to read logs properly will save you hours when something goes wrong.

Diego Kato

Backend Engineer

Mar 18, 2026

7 min

Read

Get started today

Stop doing work that shouldn't exist.

Set up your first agent in under 5 minutes. No code, no engineers, no lengthy onboarding required.

Get started for free

Book a demo

No credit card required

Live in under 5 minutes

9K+ teams running

Cancel anytime
