Write step-by-step data cleaning instructions for a messy spreadsheet

Data & Analysis | data-cleaning, spreadsheet, data-prep

Raw exports from CRMs, survey tools, or ERPs are almost always dirty. This prompt produces a concrete, ordered cleaning checklist tailored to the specific problems in your dataset.

Prompt

You are a data analyst who specializes in preparing raw data for analysis. I have a messy spreadsheet that needs cleaning before I can use it.

Spreadsheet description: {{SPREADSHEET_DESCRIPTION}}
Known problems (list everything you've noticed): {{KNOWN_PROBLEMS}}
Tool I'm using (e.g., Excel, Google Sheets, Python/pandas, R): {{TOOL}}
Final goal for this data (e.g., pivot table, chart, export to database): {{FINAL_GOAL}}

Follow these steps:
1. Group the problems I listed into categories: structural issues (wrong shape, merged cells, headers in wrong row), data-type issues (dates stored as text, numbers with currency symbols), consistency issues (inconsistent naming, mixed cases), and completeness issues (blanks, nulls).
2. For each problem, write a specific, numbered cleaning step with the exact formula, function, or code snippet needed in {{TOOL}}.
3. Order the steps so that no step breaks or undoes a later one (e.g., fix structure before applying formulas).
4. Add a validation check after each major step so I can confirm it worked before moving on.
5. Flag any problems that cannot be fixed programmatically and require a manual review or a decision from the data owner.

Do not suggest steps that require tools or permissions I have not mentioned.

Variables to fill in

{{SPREADSHEET_DESCRIPTION}}
{{KNOWN_PROBLEMS}}
{{TOOL}}
{{FINAL_GOAL}}

How to use this prompt

Copy the prompt above.
Replace each {{VAR}} with your own value. Variables: {{SPREADSHEET_DESCRIPTION}}, {{KNOWN_PROBLEMS}}, {{TOOL}}, {{FINAL_GOAL}}.
Paste it into one of the recommended tools below.
Iterate: tighten constraints in the prompt if the output is generic.
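
If you reuse the prompt often, the replacement step can be scripted. The following is a minimal sketch, not part of the original prompt: the helper name `fill_prompt` and the example values are hypothetical, and only the four variable lines of the prompt are reproduced to keep it short.

```python
import re

# Only the variable lines of the prompt are reproduced here for brevity.
PROMPT_TEMPLATE = """Spreadsheet description: {{SPREADSHEET_DESCRIPTION}}
Known problems (list everything you've noticed): {{KNOWN_PROBLEMS}}
Tool I'm using (e.g., Excel, Google Sheets, Python/pandas, R): {{TOOL}}
Final goal for this data (e.g., pivot table, chart, export to database): {{FINAL_GOAL}}"""


def fill_prompt(template: str, values: dict) -> str:
    """Replace every {{NAME}} placeholder; fail loudly if a value is missing or blank."""

    def substitute(match):
        name = match.group(1)
        value = values.get(name, "").strip()
        if not value:
            raise ValueError("No value supplied for {{" + name + "}}")
        return value

    return re.sub(r"\{\{([A-Z_]+)\}\}", substitute, template)


# Hypothetical example values, invented for illustration.
example_values = {
    "SPREADSHEET_DESCRIPTION": "CRM export, one row per contact, about 4,000 rows",
    "KNOWN_PROBLEMS": "dates stored as text; amounts with currency symbols; blank regions",
    "TOOL": "Google Sheets",
    "FINAL_GOAL": "pivot table of revenue by region",
}

filled_prompt = fill_prompt(PROMPT_TEMPLATE, example_values)
print(filled_prompt)
```

Raising on a blank value is a deliberate choice here: an unfilled placeholder is the quickest route to the generic output the last step warns about.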

Why this prompt is structured this way

The prompt is split into explicit steps because LLMs tend to produce more complete, better-ordered output when the path is spelled out rather than implied. Each variable forces specificity at the input layer: vague inputs get vague outputs.
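
To make the step structure concrete, here is a rough sketch of the kind of output steps 2, 4, and 5 ask for when {{TOOL}} is Python. It uses only the standard library; the sample rows, column names, and accepted date formats are all invented for illustration, and a real answer would depend on your own description and problem list.

```python
from datetime import datetime

# Hypothetical rows standing in for a raw export; not real data.
raw_rows = [
    {"customer": "  acme corp ", "amount": "$1,200.50", "signup": "03/15/2024"},
    {"customer": "ACME Corp", "amount": "980", "signup": "2024-03-16"},
    {"customer": "Globex", "amount": "", "signup": "16.03.2024"},
]

# Assumed formats. Whether 03/04/2024 means March or April is a question
# for the data owner, not something code can decide.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def parse_amount(text):
    """Data-type fix: strip currency symbol and thousands separators; blank becomes None."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    return float(cleaned) if cleaned else None


def parse_date(text):
    """Data-type fix: try each assumed format; return None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def clean_row(row):
    return {
        # Consistency fix: collapse whitespace and normalize case.
        "customer": " ".join(row["customer"].split()).title(),
        "amount": parse_amount(row["amount"]),
        "signup": parse_date(row["signup"]),
    }


cleaned = [clean_row(row) for row in raw_rows]

# Validation check: every date parsed, and no rows were lost.
assert len(cleaned) == len(raw_rows)
assert all(row["signup"] is not None for row in cleaned)

# Completeness: blank amounts are flagged for manual review, not guessed.
needs_review = [row for row in cleaned if row["amount"] is None]
print(f"{len(needs_review)} row(s) need a decision from the data owner")
```

The order mirrors the prompt: type and consistency fixes first, a validation check before moving on, and blanks surfaced for a human decision instead of being filled in silently.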


Pair this prompt with a tool

Perplexity

$0/mo (Pro at $20)

AI search engine with citations.

Perplexity is the answer engine Google would build if it weren't protecting search ad revenue. Cited answers, follow-up questions, focused source modes.

learning, data

Claude (Anthropic)

$0/mo (Pro at $20)

Frontier model with long context and strong reasoning.

Claude (Opus / Sonnet / Haiku tiers) is the assistant favored by writers and engineers who care about reasoning quality and tone. 1M token context on Opus.

writing, coding, learning

ChatGPT (OpenAI)

$0/mo (Plus at $20)

The category-defining general-purpose AI assistant.

ChatGPT has the broadest feature surface: image gen, voice, custom GPTs, web browsing, code execution. Often the right default; sometimes beaten on specific tasks by Claude or Perplexity.

writing, coding, learning
