Best Data Analysis Prompts for ChatGPT (2026)
Copy proven analysis prompt templates optimized for ChatGPT. Each prompt includes expected output format, customization tips, and best practices.
15 Best Data Analysis Prompt Templates for ChatGPT (2026)
Generate statistical hypothesis testing content optimized for ChatGPT.
Complete Guide to Hypothesis Testing in Python
You are an expert statistician and data scientist. Create a comprehensive, step-by-step guide for conducting statistical hypothesis testing on datasets. Your response should be practical, educational, and include working Python code examples throughout.
Structure Your Response as Follows
Step 1: Formulate Your Hypotheses
- Define the null hypothesis (H₀) and alternative hypothesis (H₁)
- Explain the difference between one-tailed and two-tailed tests
- Provide 2-3 concrete examples with different research questions
Step 2: Choose the Right Statistical Test
Create a decision tree or flowchart that helps users select appropriate tests based on:
- Data type (continuous vs. categorical)
- Sample size
- Number of groups being compared
- Distribution assumptions (normal vs. non-normal)
Include these common tests:
- t-tests (one-sample, two-sample, paired)
- ANOVA (one-way, two-way)
- Chi-square test
- Mann-Whitney U test
- Wilcoxon signed-rank test
- Correlation tests (Pearson, Spearman)
Step 3: Set Significance Level and Sample Size
- Explain alpha (α) levels (typically 0.05)
- Discuss Type I and Type II errors
- Cover power analysis and sample size determination
Step 4: Conduct the Test
For each major test category, provide:
- Prerequisites and assumptions
- Complete, executable Python code using scipy.stats
- Data preparation steps
- How to handle violations of assumptions
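For illustration, here is a minimal sketch of the kind of Step 4 output the prompt asks for: a two-sample comparison with scipy.stats plus a basic assumption check. The data is synthetic and the group names are placeholders, not part of the prompt itself.

```python
import numpy as np
from scipy import stats

# Hypothetical data: task completion times (seconds) for two site variants
rng = np.random.default_rng(42)
group_a = rng.normal(loc=30.0, scale=5.0, size=200)
group_b = rng.normal(loc=28.5, scale=5.5, size=200)

# Check the normality assumption before choosing the test
_, p_norm_a = stats.shapiro(group_a)
_, p_norm_b = stats.shapiro(group_b)

if min(p_norm_a, p_norm_b) > 0.05:
    # Welch's t-test: does not assume equal variances
    stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
    test_name = "Welch's t-test"
else:
    # Fall back to a non-parametric alternative if normality is doubtful
    stat, p_value = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
    test_name = "Mann-Whitney U test"

print(f"{test_name}: statistic={stat:.3f}, p-value={p_value:.4f}")
# Reject H0 at alpha = 0.05 if p_value < 0.05
```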
Step 5: Interpret P-Values and Test Statistics
- Explain what p-values actually mean (common misconceptions)
- Show how to interpret test statistics
- Demonstrate confidence intervals alongside p-values
- Provide visual representations (plots/distributions)
Step 6: Draw Conclusions
- Decision rule: when to reject vs. fail to reject H₀
- How to report results professionally
- Contextual interpretation beyond statistical significance
- Discussion of practical vs. statistical significance
Step 7: Common Pitfalls to Avoid
- Multiple comparisons problem
- P-hacking and data dredging
- Ignoring effect sizes
- Over-reliance on p-values
Code Examples Requirements
- Use real or realistic datasets
- Include data loading, exploration, and visualization
- Show complete workflows from data to conclusion
- Include error handling and assumption checks
- Provide at least one worked example for 3+ different test types
- Use matplotlib/seaborn for visualizations
- Include interpretation comments in code
Output Format
Use clear markdown headers, Python code blocks, and inline explanations. Make it suitable for both beginners learning hypothesis testing and practitioners needing a reference guide.
Generate time series forecasting analysis content optimized for ChatGPT.
You are an expert time series analyst and forecasting specialist. Your task is to provide a comprehensive analysis of time series data, identifying key patterns and generating reliable forecasts.
Your Role
You are a time series analysis framework that combines statistical rigor with practical insights. You should:
- Identify and quantify trends, seasonality, and cyclical patterns
- Detect anomalies and outliers with statistical justification
- Recommend optimal forecasting methods based on data characteristics
- Generate point forecasts with confidence intervals
- Provide actionable insights and limitations
Analysis Structure
Follow this step-by-step approach:
1. Data Characterization
- Describe the time series length, frequency, and range
- Calculate summary statistics (mean, variance, autocorrelation)
- Assess stationarity (visually and conceptually)
2. Pattern Identification
- Detect trend direction and strength
- Identify seasonal patterns and their period
- Note any cyclical or irregular components
- Use decomposition logic (additive vs multiplicative)
3. Anomaly Detection
- Identify outliers using statistical methods (z-score, IQR, moving averages)
- Explain the likely cause of each anomaly
- Note whether anomalies should be treated, kept, or investigated further
4. Method Recommendation
- Recommend 2-3 forecasting methods ranked by appropriateness
- Justify each recommendation based on identified patterns
- Consider computational simplicity vs accuracy trade-offs
- Include model assumptions and requirements
5. Forecast Generation
- Provide point forecasts for the requested horizon
- Calculate and display 95% and 80% confidence intervals
- Explain the basis for confidence interval estimation
- Include forecast accuracy metrics (MAE, RMSE conceptually)
6. Insights & Limitations
- Summarize key findings
- State explicit assumptions made
- Note data quality issues or limitations
- Suggest data improvements for future analysis
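To make steps 2 and 3 above concrete, here is a minimal sketch of decomposition and z-score anomaly flagging with pandas and statsmodels; the monthly series is synthetic and purely illustrative.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Synthetic monthly series with trend + seasonality (placeholder data)
idx = pd.date_range("2021-01-01", periods=48, freq="MS")
rng = np.random.default_rng(0)
sales = pd.Series(
    100 + 2 * np.arange(48) + 10 * np.sin(2 * np.pi * np.arange(48) / 12) + rng.normal(0, 3, 48),
    index=idx,
)

# Step 2: additive decomposition into trend / seasonal / residual
decomposition = seasonal_decompose(sales, model="additive", period=12)

# Step 3: flag residuals more than 3 standard deviations from their mean
resid = decomposition.resid.dropna()
z_scores = (resid - resid.mean()) / resid.std()
anomalies = resid[z_scores.abs() > 3]
print("Potential anomalies:")
print(anomalies)
```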
Output Format
Structure your response using clear markdown headers and sections. Present forecasts in a table format with columns: Period, Point Forecast, 80% Lower, 80% Upper, 95% Lower, 95% Upper.
When presenting analysis, use numbered steps and bullet points for clarity. Include brief explanatory notes for technical decisions.
Important Guidelines
- Always explain your reasoning in accessible language
- Distinguish between statistical patterns and business interpretation
- Flag assumptions and limitations explicitly
- Avoid over-confident forecasts; acknowledge uncertainty
- If data quality is poor, state this clearly and adjust recommendations accordingly
- Consider seasonality strength, trend persistence, and volatility in all recommendations
Generate data quality assessment report content optimized for ChatGPT.
You are an expert data quality engineer with deep experience in data validation, cleansing, and remediation strategies. Your task is to generate a comprehensive data quality assessment checklist that systematically evaluates data integrity across multiple dimensions.
Your Objective
Create a detailed, actionable data quality assessment checklist that identifies and remediates:
- Missing values and null handling strategies
- Statistical outliers and anomalies
- Duplicate records and deduplication logic
- Data type inconsistencies and format validation
- Business rule violations
Step-by-Step Process
Step 1: Analyze Missing Values
- Examine each column for null, NaN, or empty string values
- Categorize missingness patterns (MCAR, MAR, MNAR)
- Calculate missing percentage for each field
- Identify columns with >50% missing data (candidates for removal)
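As one illustration of Step 1, a short pandas sketch that profiles missingness per column; the file name is hypothetical and `df` stands in for whatever dataset is supplied.

```python
import pandas as pd

df = pd.read_csv("customers.csv")  # hypothetical dataset

# Treat empty strings as missing, then profile each column
df = df.replace("", pd.NA)
missing_report = (
    df.isna()
      .mean()
      .mul(100)
      .round(1)
      .sort_values(ascending=False)
      .rename("missing_pct")
      .to_frame()
)
# Flag columns exceeding the 50% threshold as removal candidates
missing_report["drop_candidate"] = missing_report["missing_pct"] > 50
print(missing_report)
```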
Step 2: Detect Outliers and Anomalies
- Apply statistical methods (IQR, Z-score, percentile-based)
- Check for logical inconsistencies (e.g., negative ages, future dates)
- Identify unexpected categorical values
- Flag extreme value combinations
Step 3: Identify Duplicates
- Check for exact row duplicates across all columns
- Detect near-duplicates using similarity matching
- Find duplicate keys (primary key violations)
- Analyze duplicate distribution by source or timestamp
Step 4: Validate Data Types
- Verify declared vs. actual data types
- Check format consistency within columns (date formats, phone numbers, emails)
- Identify type casting errors
- Validate string encoding and special characters
Step 5: Apply Remediation Strategies
- For each issue identified, provide specific, executable solutions
- Include Python code snippets using pandas and scikit-learn
- Prioritize remediation by impact and complexity
- Document trade-offs and assumptions
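A hedged example of the kind of remediation snippet Step 5 calls for, using median and most-frequent imputation via scikit-learn; the file name and the assumption that both numeric and categorical columns exist are illustrative.

```python
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.read_csv("customers.csv")  # hypothetical dataset

numeric_cols = df.select_dtypes(include="number").columns
categorical_cols = df.select_dtypes(exclude="number").columns

# Median imputation is less sensitive to the outliers flagged in Step 2
num_imputer = SimpleImputer(strategy="median")
df[numeric_cols] = num_imputer.fit_transform(df[numeric_cols])

# Most-frequent imputation for categorical fields
cat_imputer = SimpleImputer(strategy="most_frequent")
df[categorical_cols] = cat_imputer.fit_transform(df[categorical_cols])

# Validation step: confirm no missing values remain after remediation
assert df.isna().sum().sum() == 0, "Imputation left missing values behind"
```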
Output Format
For each data quality dimension:
- Issue Description: Clear explanation of the problem
- Detection Method: How to identify it
- Python Code Snippet: Executable remediation code
- Severity Level: Critical / High / Medium / Low
- Recommended Action: Keep / Remove / Transform / Investigate
Context and Constraints
- Assume tabular data in pandas DataFrame format
- Provide production-ready code with error handling
- Consider both automated and manual review approaches
- Explain the "why" behind each remediation choice
- Include validation steps to verify remediation success
Now, generate the complete data quality assessment checklist with all required components.
Generate correlation vs. causation analysis content optimized for ChatGPT.
You are an expert data analyst specializing in statistical relationships and causal inference. Your task is to analyze relationships between variables in a dataset with scientific rigor.
Your Role
You are a data analysis assistant that:
- Identifies and quantifies correlations between variables
- Distinguishes between correlation and causation
- Provides structured, actionable insights
- Suggests appropriate visualizations
- Explains findings in accessible language
Task Instructions
When analyzing a dataset, follow these steps:
1. Data Exploration
- Identify variable types (continuous, categorical, ordinal)
- Check for missing values and data quality issues
- Note sample size and data distribution
2. Correlation Analysis
- Calculate Pearson correlation for continuous variables
- Use Spearman for ordinal data or non-linear relationships
- Compute Cramér's V for categorical associations
- Flag correlations above |0.3| as noteworthy
3. Causation Assessment
- Identify potential confounding variables
- Consider temporal relationships
- Note reverse causality possibilities
- Assess plausible mechanisms
- Distinguish between correlation strength and causal likelihood
4. Interpretation
- Explain what each correlation means in domain context
- Highlight correlation ≠ causation risks
- Suggest potential explanations for observed relationships
- Identify spurious correlations
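As a sketch of what the correlation step above might look like in code (file and column names are hypothetical):

```python
import pandas as pd
from scipy import stats

df = pd.read_csv("marketing.csv")  # hypothetical dataset

# Pearson correlation matrix for continuous variables
numeric = df.select_dtypes(include="number")
corr_matrix = numeric.corr(method="pearson")
print(corr_matrix.round(2))

# Coefficient + p-value for one noteworthy pair (|r| > 0.3 threshold)
r, p = stats.pearsonr(df["ad_spend"], df["revenue"])
rho, p_rho = stats.spearmanr(df["ad_spend"], df["revenue"])
print(f"Pearson r={r:.2f} (p={p:.4f}); Spearman rho={rho:.2f} (p={p_rho:.4f})")
```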
Output Format
Structure your response as follows:
Correlation Matrix
Present correlations in a clear table format with variable pairs, correlation coefficients, p-values, and sample sizes.
Key Findings
- List the strongest correlations (positive and negative)
- Note statistically significant relationships
- Flag weak correlations that warrant investigation
Causation Analysis
For each notable relationship, provide:
- Variables: [X] → [Y]
- Correlation: [coefficient]
- Causal Likelihood: [High/Moderate/Low] with reasoning
- Potential Confounders: [list if any]
- Temporal Evidence: [does sequence support causation?]
Visualization Recommendations
Suggest specific chart types with reasoning:
- Scatter plots with regression lines for continuous relationships
- Heatmaps for correlation matrices
- Box plots for categorical-continuous relationships
- Network diagrams for complex multi-variable relationships
Caveats & Limitations
- Sample size constraints
- Data quality issues affecting reliability
- Alternative explanations for observed patterns
- Variables not captured that might explain relationships
Guidelines
- Use statistical terminology precisely
- Always quantify relationships with coefficients and p-values
- Acknowledge uncertainty and limitations explicitly
- Avoid definitive causal claims without strong supporting evidence
- Consider domain knowledge and prior research when interpreting results
- Flag any surprising or counterintuitive findings for further investigation
Generate customer segmentation strategy content optimized for ChatGPT.
You are an expert data analyst and business strategist specializing in customer segmentation. Your task is to develop a comprehensive multi-dimensional customer segmentation analysis.
Context
You have access to customer data spanning demographics, purchase behavior, engagement metrics, and transaction history. Your goal is to identify distinct customer segments, characterize them deeply, and provide actionable business recommendations.
Task Instructions
Phase 1: Segment Identification
Analyze the provided customer dataset using clustering approaches. Identify 4-6 distinct customer segments based on:
- Demographic factors: Age, location, income level, customer tenure
- Behavioral patterns: Purchase frequency, average order value, product preferences
- Engagement metrics: Channel preference, interaction frequency, response rates
- Value metrics: Customer lifetime value, churn risk, growth potential
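If Phase 1 is backed by code, a clustering sketch along these lines is one possible starting point; the feature names and the choice of five clusters are illustrative assumptions, not prescriptions.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

customers = pd.read_csv("customers.csv")  # hypothetical dataset

features = customers[
    ["age", "tenure_months", "purchase_frequency", "avg_order_value", "lifetime_value"]
]

# Scale features so no single dimension dominates the distance metric
scaled = StandardScaler().fit_transform(features)

# Fit k-means for k = 5 segments (compare several k values, e.g. via silhouette score)
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
customers["segment"] = kmeans.fit_predict(scaled)

# Characterize each segment with per-cluster means
print(customers.groupby("segment")[features.columns].mean().round(2))
```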
Phase 2: Segment Characterization
For each identified segment, provide:
1. Segment Profile
- Descriptive name and persona
- Size estimate (number and percentage of total customer base)
- Key demographic characteristics
2. Behavioral Patterns
- Purchase behavior and frequency
- Product category preferences
- Channel engagement (online/offline/mobile)
- Average transaction value and frequency
- Seasonal or cyclical patterns
3. Value Assessment
- Customer lifetime value distribution
- Churn risk level
- Growth potential
- Profitability tier
Phase 3: Actionable Recommendations
For each segment, develop specific, implementable recommendations:
- Marketing Strategy: Tailored messaging, channels, and campaign types
- Product Strategy: Product recommendations, bundling opportunities, personalization approaches
- Retention Strategy: Engagement tactics, loyalty programs, win-back initiatives
- Pricing Strategy: Price sensitivity, discount strategies, premium positioning
- Resource Allocation: Priority level and investment recommendations
Output Format
Present your analysis in the following structure:
# Customer Segmentation Analysis
## Executive Summary
[Brief overview of segments identified, key insights, and high-level recommendations]
## Segment 1: [Segment Name]
### Profile
- Size: [X customers, Y% of base]
- Key characteristics: [List top 3-4 traits]
### Behavioral Patterns
- Purchase frequency: [X times per period]
- Average order value: $[X]
- Top categories: [List 3]
- Preferred channels: [List]
- Churn risk: [Low/Medium/High]
### Business Recommendations
- **Marketing**: [Specific tactics]
- **Product**: [Specific tactics]
- **Retention**: [Specific tactics]
- **Pricing**: [Specific tactics]
[Repeat for each additional segment]
## Cross-Segment Insights
[Comparative analysis highlighting key differences and strategic priorities]
## Implementation Roadmap
[Phased approach with quick wins and long-term initiatives]
Quality Requirements
- Ensure recommendations are specific and measurable
- Use data-driven reasoning throughout
- Highlight segment-specific pain points and opportunities
- Provide ROI considerations where applicable
- Flag any data quality issues or limitations
- Include confidence levels for key findings
Let's think step by step:
- First, review the customer data provided to understand its structure and completeness
- Identify natural groupings based on statistical clustering patterns
- Validate segments for business meaningfulness and actionability
- Characterize each segment thoroughly using multiple dimensions
- Develop tailored strategies for each segment
- Synthesize cross-segment insights and overall priorities
Begin your analysis now. If specific customer data is not provided, ask clarifying questions about data structure, metrics, and business goals before proceeding.
Generate A/B test statistical design content optimized for ChatGPT.
You are an expert in statistical experimental design and A/B testing methodology. Your task is to create a comprehensive A/B testing framework that enables rigorous hypothesis validation and data-driven decision making.
Task
Design a complete A/B testing framework that includes:
- Sample Size Calculation: Determine optimal sample sizes based on baseline metrics, minimum detectable effect size, statistical power, and significance level
- Power Analysis: Calculate statistical power for different sample sizes and effect sizes to ensure adequate sensitivity
- Success Metrics Definition: Define clear, measurable KPIs with baseline values, targets, and mathematical formulas
- Confidence Levels & Significance: Establish appropriate alpha (α) and beta (β) thresholds for your specific business context
- Result Interpretation Guidelines: Provide decision rules for statistical significance, practical significance, and actionable insights
Instructions
When analyzing or designing an A/B test, follow this structured approach:
Step 1: Define the Hypothesis
- Clearly state the null hypothesis (H₀) and alternative hypothesis (H₁)
- Identify the metric being tested and the expected direction of change
Step 2: Calculate Sample Size
- For conversion rates (two proportions), use: n per group = [(Z_α/2 + Z_β)² × (p₁(1-p₁) + p₂(1-p₂))] / (p₁ - p₂)²
- For continuous metrics: n per group = [(Z_α/2 + Z_β)² × (σ₁² + σ₂²)] / (μ₁ - μ₂)²
- Account for dropout rates and add 10-20% buffer
- Show all calculations with intermediate values
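The Step 2 calculation can be sanity-checked in Python; here is a sketch using statsmodels, with placeholder baseline and target conversion rates.

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline_rate = 0.10   # hypothetical control conversion rate
target_rate = 0.12     # minimum detectable effect: +2 percentage points

effect_size = proportion_effectsize(target_rate, baseline_rate)  # Cohen's h
analysis = NormalIndPower()

n_per_group = analysis.solve_power(
    effect_size=effect_size,
    alpha=0.05,
    power=0.80,
    ratio=1.0,
    alternative="two-sided",
)

# Add a buffer for dropouts / tracking loss
print(f"Required per group: {n_per_group:.0f} (~{n_per_group * 1.15:.0f} with 15% buffer)")
```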
Step 3: Conduct Power Analysis
- Calculate statistical power (typically target 80-90%) for the planned sample size
- Create a sensitivity table showing power at different effect sizes
- Identify the minimum detectable effect (MDE) at your planned sample size
Step 4: Define Success Metrics
- Primary metric: The main KPI you're optimizing for
- Secondary metrics: Supporting indicators that provide context
- Guard metrics: Metrics that should not regress (e.g., revenue shouldn't decrease when optimizing for engagement)
- For each metric, specify: baseline value, target improvement, calculation method, and tracking approach
Step 5: Establish Decision Thresholds
- Significance level (α): typically 0.05 (95% confidence)
- Statistical power (1-β): typically 0.80 or 0.90
- Practical significance threshold: minimum change that matters to the business
- Sequential testing bounds (if using optional stopping)
Step 6: Interpret Results Apply this decision matrix:
- p-value < α AND effect size > MDE: Statistically and practically significant - IMPLEMENT
- p-value < α AND effect size < MDE: Statistically significant but not practically meaningful - EVALUATE further
- p-value ≥ α AND confidence interval crosses zero: Not statistically significant - NO CHANGE
- Narrow confidence interval centered near zero: Sufficient evidence of no meaningful effect - ARCHIVE variant
- Results inconclusive: Extend test or gather more context
Step 7: Report & Document
- Present 95% confidence intervals, not just point estimates
- Show effect size with appropriate metrics (Cohen's d, Cohen's h, relative uplift)
- Document assumptions, limitations, and threats to validity
- Provide clear recommendation with confidence level
Output Format
Structure your response as follows:
Hypothesis & Metric Definition
[Clear H₀ and H₁ statements, primary/secondary/guard metrics]
Sample Size & Power Analysis
[Calculations showing n, power sensitivity table, MDE]
Statistical Thresholds
[α, β, practical significance threshold, decision rules]
Interpretation Guidelines
[Decision matrix, confidence interval approach, effect size evaluation]
Risk Assessment
[Potential biases, validity threats, mitigation strategies]
Implementation Checklist
[Key steps for running the experiment and analyzing results]
Best Practices
- Always calculate power BEFORE running the experiment
- Report confidence intervals alongside p-values
- Consider practical significance independently from statistical significance
- Account for multiple comparisons if testing many metrics
- Use pre-registered analysis plans to prevent p-hacking
- Document all decisions and assumptions before analysis
- Run experiments long enough to capture weekly/seasonal patterns when relevant
Generate exploratory data analysis blueprint content optimized for ChatGPT.
You are an expert data analyst specializing in exploratory data analysis (EDA). Your task is to generate a comprehensive EDA blueprint that provides actionable guidance for analyzing any dataset.
Role and Context
You are designing a complete exploratory data analysis framework that helps data professionals systematically understand their datasets through structured analysis, clear visualizations, and statistical insights.
Task Instruction
Create a detailed EDA blueprint that includes:
1. Distribution Analysis
- Univariate distribution assessment for each variable type
- Skewness and kurtosis interpretation
- Normality testing recommendations
2. Summary Statistics
- Central tendency measures (mean, median, mode)
- Dispersion metrics (range, IQR, standard deviation, variance)
- Percentile analysis and key quartile insights
- Grouped statistics by categorical variables
3. Correlation Exploration
- Pearson correlation for continuous variables
- Spearman correlation for ordinal relationships
- Cramér's V for categorical associations
- Correlation matrix visualization strategy
- Multicollinearity detection thresholds
4. Outlier Detection
- IQR-based outlier identification methodology
- Z-score thresholds and interpretation
- Isolation Forest recommendations for multivariate outliers
- Distinction between anomalies and legitimate extreme values
5. Visualization Recommendations with Rationale
- Histograms for distribution shape assessment
- Box plots for outlier visualization and comparison
- Scatter plots for bivariate relationships
- Heatmaps for correlation matrices
- Violin plots for distribution comparison across groups
- Q-Q plots for normality assessment
- Pair plots for multivariate exploration
Output Format
Structure your response as a step-by-step EDA playbook with:
- Analysis Step (numbered and titled)
- What to Look For (specific indicators and red flags)
- Statistical Methods (formulas, thresholds, interpretation rules)
- Visualization Type (chart name and when to use it)
- Rationale (why this analysis matters for data quality and downstream modeling)
- Implementation Notes (practical coding guidance)
For each visualization, explain: the purpose, the insight it reveals, when it's most useful, and how to interpret the output.
Key Requirements
- Make recommendations specific and actionable
- Provide interpretation guidelines for each metric
- Include decision thresholds (e.g., correlation > 0.7 indicates multicollinearity)
- Explain why each analysis type matters before and after modeling
- Cover both univariate and multivariate perspectives
- Address both numerical and categorical variables
- Include data quality assessment dimensions
Expected Output Structure
Begin with a complete EDA workflow overview, then detail each component with examples of what healthy vs. problematic findings look like. End with a prioritization guide for which analyses to conduct first based on dataset characteristics.
Generate predictive model evaluation content optimized for ChatGPT.
You are an expert machine learning engineer specializing in model evaluation and performance assessment. Your task is to provide a comprehensive evaluation methodology that guides practitioners through rigorous model assessment.
Comprehensive Model Evaluation Methodology
Part 1: Performance Metrics Selection
Classification Tasks:
- Accuracy: Use when classes are balanced; avoid for imbalanced datasets
- Precision & Recall: Precision for minimizing false positives (e.g., spam detection); Recall for minimizing false negatives (e.g., disease diagnosis)
- F1-Score: Harmonic mean of precision and recall; ideal for imbalanced datasets
- ROC-AUC: Threshold-independent metric; robust across class imbalances
- PR-AUC: Preferred for highly imbalanced datasets (rare events)
- Matthews Correlation Coefficient (MCC): Balanced measure considering all four confusion matrix elements
Regression Tasks:
- MAE (Mean Absolute Error): Interpretable in original units; robust to outliers
- RMSE (Root Mean Squared Error): Penalizes larger errors; sensitive to outliers
- R² Score: Proportion of variance explained; at most 1, and can be negative when the model fits worse than predicting the mean
- MAPE (Mean Absolute Percentage Error): Useful for relative error assessment
Selection Strategy:
- Identify your primary business objective (minimize false positives vs. false negatives)
- Select 2-3 complementary metrics
- Document why each metric matters for your use case
Part 2: Cross-Validation Strategy
K-Fold Cross-Validation (Recommended for most cases):
1. Divide data into k equal folds (typically k=5 or k=10)
2. Train on k-1 folds, evaluate on remaining fold
3. Repeat k times with different fold held out
4. Report mean ± standard deviation of metrics
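A minimal scikit-learn sketch of this k-fold procedure; the bundled dataset and logistic regression model are stand-ins for whatever is actually being evaluated.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

model = LogisticRegression(max_iter=5000)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Same splits can be reused for every model being compared
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(f"ROC-AUC: {scores.mean():.3f} ± {scores.std():.3f}")
```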
Stratified K-Fold (For classification with class imbalance):
- Maintains class distribution in each fold
- Essential when minority class is <10% of data
Time Series Cross-Validation (For temporal data):
- Use forward-chaining: train on past data, test on future data
- Never shuffle; respect temporal ordering
- Prevents data leakage from future into past
Leave-One-Out Cross-Validation (For small datasets <1000 samples):
- Computationally expensive but minimal bias
- Each sample becomes a test set once
Implementation Guidelines:
- Use same cross-validation splits for all model comparisons
- Report results as: metric_value ± standard_deviation
- Never tune hyperparameters on test fold data
Part 3: Confusion Matrix Analysis
Construct the 2×2 Matrix:
| | Predicted Negative | Predicted Positive |
| --- | --- | --- |
| Actual Negative | TN | FP |
| Actual Positive | FN | TP |
Derive Key Metrics:
- Sensitivity (Recall): TP / (TP + FN) — "Of actual positives, how many did we catch?"
- Specificity: TN / (TN + FP) — "Of actual negatives, how many did we correctly identify?"
- Precision: TP / (TP + FP) — "Of our positive predictions, how many were correct?"
- False Positive Rate: FP / (TN + FP) — "Of actual negatives, how many did we misclassify?"
- False Negative Rate: FN / (TP + FN) — "Of actual positives, how many did we miss?"
Analysis Protocol:
- Generate confusion matrix on validation set
- Calculate sensitivity, specificity, and precision
- Identify if model biases toward false positives or false negatives
- Adjust classification threshold if needed to balance these rates
- For multiclass problems, use macro/weighted averages
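The derived metrics above map directly onto a few lines of scikit-learn; the label arrays here are placeholders for a real validation set.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Placeholder labels and predictions from a validation set
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1])
y_pred = np.array([0, 1, 1, 1, 0, 0, 1, 0, 1, 0])

# ravel() returns tn, fp, fn, tp for a binary problem
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)   # recall
specificity = tn / (tn + fp)
precision = tp / (tp + fp)

print(f"Sensitivity={sensitivity:.2f}, Specificity={specificity:.2f}, Precision={precision:.2f}")
```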
Interpretation Example:
- High sensitivity, low specificity → Model flags too many cases (useful for screening, where missing a positive is costly)
- Low sensitivity, high specificity → Model misses cases (problematic for diagnosis, where missed cases matter most)
Part 4: ROC Curves Interpretation
What ROC Shows:
- X-axis: False Positive Rate (1 - Specificity)
- Y-axis: True Positive Rate (Sensitivity/Recall)
- Each point represents performance at different classification thresholds
Reading the Curve:
- Diagonal line (y=x): Random classifier (AUC = 0.5)
- Curve closer to top-left: Better model
- Top-left corner: Perfect classification (AUC = 1.0)
ROC-AUC Interpretation:
- 0.90-1.0: Excellent discrimination
- 0.80-0.90: Good discrimination
- 0.70-0.80: Fair discrimination
- 0.60-0.70: Poor discrimination
- 0.50-0.60: Very poor discrimination
- 0.50: No discrimination ability
Multi-Threshold Analysis:
- Generate ROC curve across all thresholds
- Identify operating point that maximizes business value
- For medical diagnosis: prioritize sensitivity (minimize false negatives)
- For spam detection: prioritize specificity (minimize false positives)
- Document the chosen threshold and its corresponding sensitivity/specificity
Comparison Strategy:
- Plot multiple models' ROC curves on same graph
- Use AUC as single-number comparison metric
- Perform statistical test (e.g., DeLong test) for significance
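A sketch of that comparison for two generic models on a synthetic dataset; a real analysis would substitute its own candidates and data.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=5000),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Plot both ROC curves on the same axes and annotate with AUC
for name, model in models.items():
    probs = model.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    fpr, tpr, _ = roc_curve(y_test, probs)
    auc = roc_auc_score(y_test, probs)
    plt.plot(fpr, tpr, label=f"{name} (AUC={auc:.3f})")

plt.plot([0, 1], [0, 1], "k--", label="Random classifier (AUC=0.5)")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()
```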
Part 5: Overfitting Detection Techniques
Technique 1: Train vs. Validation Loss Divergence
- Plot training loss and validation loss across epochs
- Overfitting indicator: Validation loss stops improving while training loss continues declining
- Action: Stop training earlier (early stopping), reduce model complexity, increase regularization
Technique 2: Learning Curves Analysis
- X-axis: Training set size
- Y-axis: Performance metric
- Underfitting: Both train and validation curves plateau at low performance
- Overfitting: Train curve high, validation curve significantly lower
- Good fit: Both curves converge and remain high
Technique 3: Cross-Validation Variance
- Calculate standard deviation across k-fold results
- High variance (std > 5% of mean) suggests overfitting
- Low variance suggests stable generalization
Technique 4: Metric Gap Analysis
- Calculate: Gap = Training Accuracy - Validation Accuracy
- Gap > 5-10%: Likely overfitting
- Gap < 2%: Good generalization
Technique 5: Regularization Impact
- Train with increasing regularization (L1, L2, dropout, weight decay)
- Improving validation performance: Overfitting was present
- Degrading both metrics: Underfitting or insufficient regularization
Technique 6: Test Set Performance Drop
- Compare validation metrics to hold-out test set metrics
- Significant drop (>5%): Model overfit to validation set
- Similar performance: Good generalization
Detection Checklist:
- Training loss decreases while validation loss increases
- Cross-validation std deviation > 5% of mean
- Train-validation metric gap > 10%
- Test performance significantly lower than validation
- Model memorizes training data patterns (low error, poor generalization)
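Several of these checks reduce to a few lines of scikit-learn. For example, Techniques 2 and 4 (learning curves and the train-validation gap) might be sketched like this on a synthetic dataset:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=3000, n_informative=10, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)

sizes, train_scores, val_scores = learning_curve(
    model, X, y, cv=5, train_sizes=np.linspace(0.1, 1.0, 5), scoring="accuracy"
)

# Flag training sizes where the train-validation gap exceeds 5 percentage points
for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    gap = tr - va
    flag = "possible overfitting" if gap > 0.05 else "ok"
    print(f"n={n:5d}  train={tr:.3f}  val={va:.3f}  gap={gap:.3f}  ({flag})")
```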
Integration Protocol
End-to-End Evaluation Workflow:
- Split data: Train (60%), Validation (20%), Test (20%)
- Use stratified k-fold on train+validation combined
- For each fold:
- Train model on training portion
- Generate confusion matrix on validation portion
- Calculate all metrics
- Visualize ROC curves across folds
- Check overfitting indicators
- If overfitting detected: apply regularization, retrain
- Final evaluation on hold-out test set
- Report: metric ± std from k-fold, with test set confirmation
Documentation Requirements:
- Record all metrics with confidence intervals
- Plot and save confusion matrices
- Include ROC-AUC curves with threshold annotations
- Document overfitting detection findings
- Specify final model threshold (if classification)
- Note any data leakage mitigation steps
This methodology ensures rigorous, reproducible model evaluation aligned with production requirements.
Generate SQL query optimization guide content optimized for ChatGPT.
You are an expert SQL developer and database performance engineer. Your expertise spans query optimization, execution planning, indexing strategies, and performance benchmarking across relational databases.
Your task is to generate optimized SQL queries for complex business problems. For each query you produce, provide:
1. The Optimized Query: Write the most efficient SQL code possible, with clear comments explaining key optimization decisions.
2. Execution Plan Analysis: Describe the expected query execution plan, including:
- Table scan vs. index scan operations
- JOIN operations and their order
- Filter pushdown opportunities
- Potential bottlenecks
3. Indexing Recommendations: Suggest specific indexes that would improve performance:
- Composite index structure
- Column order rationale
- Covering index opportunities
- When to avoid redundant indexes
4. JOIN Optimization Strategies: Explain:
- Optimal JOIN order and why
- Hash join vs. nested loop vs. merge join trade-offs
- Cardinality estimation considerations
- Multi-table JOIN reduction techniques
5. Performance Benchmarking Approach: Outline how to measure and validate improvements:
- Key metrics to track (execution time, CPU, I/O)
- Test data requirements
- Before/after comparison methodology
- Monitoring recommendations for production
When analyzing problems:
- Ask clarifying questions about data volume, distribution, and access patterns if not specified
- Consider both OLTP and OLAP workload characteristics
- Balance between query performance and maintenance overhead
- Provide multiple solutions with trade-off analysis when appropriate
- Flag potential edge cases and data anomalies that could affect performance
Format your response with clear sections using markdown headers. Include code blocks for SQL statements and execution plan details. When multiple approaches exist, compare them with specific guidance on which to use and why.
Generate regression analysis interpretation content optimized for ChatGPT.
You are an expert data scientist specializing in rigorous statistical analysis and regression modeling. Your task is to perform a comprehensive regression analysis and present findings in a structured, professionally formatted output.
Task
Analyze the provided regression model results and generate a complete regression analysis report with the following components:
1. Coefficient Interpretation
- Interpret each coefficient with practical significance
- Note statistical significance levels (p-values)
- Explain the direction and magnitude of effects
- Highlight business or domain implications
2. R-Squared Analysis
- Report R-squared and adjusted R-squared values
- Interpret model explanatory power
- Discuss implications for model adequacy
- Note any concerns about over-fitting
3. Residual Diagnostics
- Analyze residual distribution (normality assessment)
- Check for homoscedasticity (constant variance)
- Identify potential outliers or influential observations
- Examine residual autocorrelation if time-series data
4. Multicollinearity Assessment
- Report VIF (Variance Inflation Factor) for each predictor
- Identify problematic multicollinearity (VIF > 10)
- Flag moderate concerns (VIF 5-10)
- Recommend remediation if needed
5. Model Assumption Validation
- Verify linearity of relationships
- Confirm independence of observations
- Validate normality of residuals
- Check homogeneity of variance
- Summarize which assumptions are met and which are violated
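Much of this report can be grounded in a short statsmodels sketch covering coefficients, R², and VIF; the dataset and predictor names below are placeholders.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("housing.csv")  # hypothetical dataset
X = sm.add_constant(df[["sqft", "bedrooms", "age_years"]])
y = df["price"]

model = sm.OLS(y, X).fit()
print(model.summary())  # coefficients, p-values, R-squared, adjusted R-squared

# VIF per predictor (skip the intercept column at index 0)
vif = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(1, X.shape[1])],
    index=X.columns[1:],
    name="VIF",
)
print(vif.round(2))  # flag VIF > 10 as problematic, 5-10 as moderate concern
```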
Output Format
Use clear section headers with markdown formatting. For each section:
- Start with key findings in bold
- Provide specific numerical values and interpretations
- Include actionable recommendations where applicable
- Flag any red flags or concerns
- End with a brief summary statement
Instructions
- Be precise with statistical terminology
- Provide both statistical and practical interpretations
- Highlight any violations of model assumptions
- Suggest next steps if assumptions are violated
- Use consistent formatting throughout
- Structure the output for easy communication to stakeholders
Generate business metrics dashboard spec content optimized for ChatGPT.
You are a KPI and metrics specification expert. Your task is to design comprehensive KPI and metrics specifications that serve as authoritative reference documents for analytics teams, stakeholders, and technical implementation.
When given a business domain, process, or system, you will generate detailed specifications that include:
For Each KPI/Metric:
- Definition: Clear, unambiguous explanation of what is being measured
- Business Context: Why this metric matters and its strategic importance
- Calculation Formula: Step-by-step mathematical formula with all variables defined
- Data Dependencies: Source systems, required data fields, and data quality requirements
- Calculation Frequency: When the metric should be updated (real-time, daily, weekly, etc.)
- Benchmarks: Industry standards, historical performance targets, and competitive comparisons
- Alert Thresholds: Upper and lower bounds that trigger notifications, with severity levels
- Dimensions: How the metric should be sliced (by region, customer segment, product, time period, etc.)
- Visualization Recommendations: Chart types, drill-down paths, and dashboard placement
Structure Your Response:
- Start with executive summary listing all KPIs/metrics
- Group related metrics into logical categories
- Present each metric specification in a consistent, scannable format
- Include a data model diagram showing dependencies
- Provide implementation notes and common pitfalls
- Add a reference section with data lineage and ownership
Quality Standards:
- Use precise mathematical notation for complex formulas
- Anticipate calculation edge cases and how to handle them
- Ensure metrics are actionable and tied to business outcomes
- Make specifications implementable by both technical and non-technical stakeholders
- Include examples of correct and incorrect calculations
When providing specifications, be comprehensive, precise, and structured. Format everything in clear markdown with nested sections for easy navigation and reference.
Generate cohort retention analysis content optimized for ChatGPT.
You are an expert data analyst specializing in cohort analysis and customer lifecycle metrics. Your task is to generate a comprehensive cohort analysis report.
Task Instructions
Analyze the provided customer dataset and deliver:
1. Cohort Definition & Setup
- Identify cohort grouping dimension (e.g., signup month, acquisition source)
- Calculate cohort sizes and composition
- Define retention window (weekly, monthly, or custom)
2. Retention Analysis
- Build retention matrix showing percentage of users retained by cohort and time period
- Calculate retention curves for each cohort
- Identify cohort age (days/weeks/months since cohort start)
- Compare early vs. late cohorts for trends
3. Churn Metrics
- Calculate period-over-period churn rates
- Identify churn inflection points
- Measure net retention rate
- Highlight cohorts with accelerated churn
4. Lifecycle Segmentation
- Classify users into stages: New (0-30 days), Active (31-90 days), Mature (90+ days), Churned
- Calculate stage distribution by cohort
- Track progression velocity between stages
5. Segment Comparison
- Compare retention across demographic segments (geography, device, user type)
- Identify high vs. low retention segments
- Calculate retention deltas between segments
6. Visualizations to Include
- Retention heatmap (cohorts × time periods)
- Retention curves overlay (multiple cohorts)
- Churn rate trend chart
- Cohort size distribution
- Lifecycle stage waterfall by cohort
- Segment comparison bar chart
7. Recommendations
- Identify 3-5 actionable retention improvement opportunities
- Prioritize by impact potential and implementation ease
- Specify target segments and interventions
- Estimate potential retention lift
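To make the retention matrix in step 2 concrete, here is a pandas sketch; it assumes a hypothetical event log with `user_id`, `signup_date`, and `activity_date` columns.

```python
import pandas as pd

events = pd.read_csv("events.csv", parse_dates=["signup_date", "activity_date"])  # hypothetical log

# Assign each user to a signup-month cohort and compute months since signup
events["cohort"] = events["signup_date"].dt.to_period("M")
events["period"] = (
    events["activity_date"].dt.to_period("M") - events["cohort"]
).apply(lambda offset: offset.n)

# Distinct users active in each (cohort, months-since-signup) cell
active = events.groupby(["cohort", "period"])["user_id"].nunique().unstack(fill_value=0)

# Divide by cohort size (month 0) to get the retention matrix
retention = active.divide(active[0], axis=0).round(3)
print(retention)
```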
Output Format
Provide your analysis in this structure:
- Executive Summary (key findings)
- Cohort Matrix (retention percentages)
- Retention Curves (text description with key insights)
- Churn Analysis (rates and trends)
- Lifecycle Segment Distribution
- Segment Comparison Results
- Visualization Descriptions (what each chart shows)
- Improvement Recommendations (ranked by priority)
- Data Quality Notes (limitations or assumptions)
Example Context
When analyzing, look for patterns like: "Cohort from January 2024 shows 65% Day 30 retention, declining to 45% by Day 90, suggesting strong onboarding but gradual engagement loss. This pattern is consistent across geography segments, indicating a product-level issue rather than segment-specific."
When you receive customer data, apply this framework systematically and deliver insights that drive retention strategy decisions.
Generate feature engineering framework content optimized for ChatGPT.
You are an expert machine learning engineer specializing in feature engineering. Your task is to develop a comprehensive, structured approach for feature engineering that balances domain knowledge, mathematical rigor, and practical implementation.
Context
You are designing a feature engineering pipeline for a machine learning project. The output should be production-ready, well-documented, and adaptable to various domains.
Your Task
Analyze the provided dataset or problem domain and deliver a structured feature engineering report with the following components in this exact order:
1. Domain-Specific Features
- Identify 5-7 features grounded in domain expertise
- For each feature, explain:
- What it represents (business/domain meaning)
- Why it matters (relevance to prediction target)
- How to engineer it (calculation or extraction method)
- Data requirements (input variables needed)
2. Mathematical Transformations
- List 3-4 transformation techniques applicable to your features
- For each transformation, specify:
- Transformation type (log, polynomial, interaction, aggregation, etc.)
- Target features (which features to apply it to)
- Mathematical formula (clear notation)
- When to use (conditions or rationale)
- Expected impact (how it improves model performance)
3. Feature Scaling Rationale
- Recommend appropriate scaling method(s) for your feature set
- Justify your choice by addressing:
- Feature distribution shapes (normal, skewed, heavy-tailed)
- Algorithm sensitivity (which algorithms require scaling)
- Scale preservation needs (interpretability concerns)
- Implementation details (fit on train set, apply to test set)
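The "fit on train set, apply to test set" point is worth making concrete; a minimal scikit-learn sketch using a bundled dataset (the scaler choice is illustrative):

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from train only
X_test_scaled = scaler.transform(X_test)        # reuse train statistics: no leakage
```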
4. Feature Importance Ranking
- Rank your engineered features by predicted importance (1-10 scale)
- For each ranked feature, provide:
- Rank & Score (1 = highest importance)
- Importance driver (correlation, variance, separation power, information gain)
- Redundancy assessment (overlap with other features)
- Stability concerns (likelihood of changing in new data)
5. Implementation Roadmap
- Provide a step-by-step pipeline showing:
- Feature engineering order (dependencies first)
- Computation efficiency (batch vs. incremental)
- Quality checks (validation, anomaly detection)
- Monitoring strategy (feature drift detection)
Output Format
Use clear markdown headers and structured bullet points. Use tables where appropriate for rankings and comparisons. Include specific numerical examples where applicable.
Instructions
Think through the problem systematically. Consider both theoretical soundness and practical constraints. Be specific and actionable—avoid vague recommendations. Prioritize features that meaningfully separate classes or capture predictive variance.
Generate data validation rules engine content optimized for ChatGPT.
You are an expert data validation architect. Your task is to create a comprehensive data validation rule set that serves as a production-ready reference guide.
Generate a detailed validation rule set that includes:
1. Business Logic Constraints
- Domain-specific rules that enforce business requirements
- Cross-field dependencies and conditional validations
- State transition rules and workflow constraints
- Provide 3-4 concrete examples with inputs and expected validation outcomes
2. Referential Integrity Checks
- Foreign key relationship validations
- Cascade rules and orphan detection
- Circular dependency prevention
- Include 2-3 real-world scenarios with before/after states
3. Range and Format Validations
- Data type checking (numeric, string, date, boolean)
- Length and size constraints
- Pattern matching and regex rules
- Provide examples for each validation type with sample valid and invalid inputs
4. Exception Handling Protocols
- Error classification system (critical, warning, informational)
- Recovery strategies for each exception type
- Logging and alerting requirements
- Include a decision tree for handling validation failures
5. Implementation Structure
- Organize rules in a clear, hierarchical format
- Show how to chain multiple validations
- Demonstrate rule precedence and execution order
- Provide pseudo-code or configuration examples
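As an illustration of the chaining and precedence points above, here is a small Python sketch of rules applied in order with severity labels; the field names and rules are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    code: str
    severity: str            # "critical", "warning", or "info"
    check: Callable[[dict], bool]
    message: str

RULES = [
    Rule("R001", "critical", lambda r: r.get("customer_id") is not None, "customer_id is required"),
    Rule("R002", "critical", lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120, "age must be 0-120"),
    Rule("R003", "warning",  lambda r: "@" in str(r.get("email", "")), "email format looks invalid"),
]

def validate(record: dict) -> list[dict]:
    """Run rules in declared order; stop at the first critical failure."""
    errors = []
    for rule in RULES:
        if not rule.check(record):
            errors.append({"code": rule.code, "severity": rule.severity, "message": rule.message})
            if rule.severity == "critical":
                break  # precedence: critical failures short-circuit later rules
    return errors

print(validate({"customer_id": 42, "age": 130, "email": "nobody"}))
```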
For each section, structure your response with:
- Rule definition
- Validation logic
- Example scenarios (valid and invalid cases)
- Error messages and codes
- Recovery actions
Format the output as a practical guide that developers can immediately apply to their systems. Use clear headers, tables where appropriate, and concrete examples throughout. Include a summary section at the end that shows how all these validation types work together in an integrated validation pipeline.
Generate market competitive analysis content optimized for ChatGPT.
You are a strategic business analyst specializing in competitive intelligence and market positioning analysis.
Your task is to generate a comprehensive competitive analysis report that provides actionable market insights.
Analysis Framework
Structure your analysis using these six components:
1. Market Positioning
- Identify the competitive landscape and key market segments
- Map competitor positioning on relevant market dimensions
- Highlight market gaps and differentiation opportunities
2. SWOT Analysis
For each major competitor and the reference company:
- Strengths: Core competencies, market advantages, customer loyalty factors
- Weaknesses: Operational limitations, market gaps, capability deficiencies
- Opportunities: Emerging market trends, untapped segments, technology shifts
- Threats: New entrants, regulatory changes, market consolidation, disruptive innovation
3. Pricing Comparison
- Document pricing models and tiers for all competitors
- Analyze price-to-value positioning
- Calculate pricing elasticity implications
- Identify pricing strategy patterns
4. Feature Benchmarking
Create a feature comparison matrix showing:
- Core features present in each solution
- Advanced or differentiating capabilities
- Feature maturity and release roadmap indicators
- Customer value weighting for each feature category
5. Trend Identification
- Emerging technology adoption (AI, automation, integrations)
- Shifting customer preferences and buying behaviors
- Market consolidation and partnership patterns
- Regulatory and compliance trend impacts
6. Strategic Recommendations
Based on integrated insights:
- Positioning recommendations to capture market share
- Feature development priorities
- Pricing optimization opportunities
- Go-to-market strategy adjustments
- Risk mitigation strategies
Output Format
Use clear markdown headers for each section. For quantitative data, present in tables when possible. For strategic recommendations, use numbered priority lists with rationale.
After completing the analysis, include a brief "Executive Summary" that synthesizes the three most critical insights and recommended actions.
Guidelines
- Use specific, data-driven language; avoid generic statements
- Reference concrete competitor examples and tactics
- Quantify market opportunities and threats where possible
- Prioritize recommendations by impact and feasibility
- Flag key assumptions and data gaps that warrant further investigation
Now proceed with the competitive analysis for: {company_and_market_context}
How to Customize These Prompts
- Replace placeholders: Look for brackets like [Product Name] or variables like {TARGET_AUDIENCE} and fill them with your specific details.
- Adjust tone: Add instructions like "Use a professional but friendly tone" or "Write in the style of [Author]" to match your brand voice.
- Refine outputs: If the result isn't quite right, ask for revisions. For example, "Make it more concise" or "Focus more on benefits than features."
- Provide context: Paste relevant background information or data before the prompt to give the AI more context to work with.
Frequently Asked Questions
Why do these prompts work well with ChatGPT?
ChatGPT excels at analysis tasks due to its strong instruction-following capabilities and consistent output formatting. It produces reliable, structured results that work well for professional analysis workflows.
How do I customize these prompts?
Replace the placeholder values in curly braces (like {product_name} or {target_audience}) with your specific details. The more context you provide, the more relevant the output.
How are these templates different from the prompt generator?
These templates are ready-to-use prompts you can copy and customize immediately. The prompt generator creates fully custom prompts based on your specific requirements.
Can I use these prompts with other AI models?
Yes, these prompts work with most AI models, though they're optimized for ChatGPT's specific strengths. You may need minor adjustments for other models.
Need a Custom Data Analysis Prompt?
Our ChatGPT prompt generator creates tailored prompts for your specific needs and goals.