Best Data Analysis Prompts for ChatGPT (2026)
Copy proven analysis prompt templates optimized for ChatGPT. Each prompt includes expected output format, customization tips, and best practices.
15 Best Data Analysis Prompt Templates for ChatGPT (2026)
Generate statistical hypothesis testing content optimized for ChatGPT.
Complete Guide to Hypothesis Testing in Python
You are an expert statistician and data scientist. Create a comprehensive, step-by-step guide for conducting statistical hypothesis testing on datasets. Your response should be practical, educational, and include working Python code examples throughout.
Structure Your Response as Follows
Step 1: Formulate Your Hypotheses
- Define the null hypothesis (H₀) and alternative hypothesis (H₁)
- Explain the difference between one-tailed and two-tailed tests
- Provide 2-3 concrete examples with different research questions
Step 2: Choose the Right Statistical Test
Create a decision tree or flowchart that helps users select appropriate tests based on:
- Data type (continuous vs. categorical)
- Sample size
- Number of groups being compared
- Distribution assumptions (normal vs. non-normal)
Include these common tests:
- t-tests (one-sample, two-sample, paired)
- ANOVA (one-way, two-way)
- Chi-square test
- Mann-Whitney U test
- Wilcoxon signed-rank test
- Correlation tests (Pearson, Spearman)
Step 3: Set Significance Level and Sample Size
- Explain alpha (α) levels (typically 0.05)
- Discuss Type I and Type II errors
- Cover power analysis and sample size determination
Step 4: Conduct the Test
For each major test category, provide:
- Prerequisites and assumptions
- Complete, executable Python code using scipy.stats
- Data preparation steps
- How to handle violations of assumptions
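For illustration, here is a minimal sketch of the kind of Step 4 output the prompt asks for: a two-sample comparison with scipy.stats plus a basic assumption check. The data is synthetic and the group names are placeholders, not part of the prompt itself.

```python
import numpy as np
from scipy import stats

# Hypothetical data: task completion times (seconds) for two site variants
rng = np.random.default_rng(42)
group_a = rng.normal(loc=30.0, scale=5.0, size=200)
group_b = rng.normal(loc=28.5, scale=5.5, size=200)

# Check the normality assumption before choosing the test
_, p_norm_a = stats.shapiro(group_a)
_, p_norm_b = stats.shapiro(group_b)

if min(p_norm_a, p_norm_b) > 0.05:
    # Welch's t-test: does not assume equal variances
    stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)
    test_name = "Welch's t-test"
else:
    # Fall back to a non-parametric alternative if normality is doubtful
    stat, p_value = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
    test_name = "Mann-Whitney U test"

print(f"{test_name}: statistic={stat:.3f}, p-value={p_value:.4f}")
# Reject H0 at alpha = 0.05 if p_value < 0.05
```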
Step 5: Interpret P-Values and Test Statistics
- Explain what p-values actually mean (common misconceptions)
- Show how to interpret test statistics
- Demonstrate confidence intervals alongside p-values
- Provide visual representations (plots/distributions)
Step 6: Draw Conclusions
- Decision rule: when to reject vs. fail to reject H₀
- How to report results professionally
- Contextual interpretation beyond statistical significance
- Discussion of practical vs. statistical significance
Step 7: Common Pitfalls to Avoid
- Multiple comparisons problem
- P-hacking and data dredging
- Ignoring effect sizes
- Over-reliance on p-values
Code Examples Requirements
- Use real or realistic datasets
- Include data loading, exploration, and visualization
- Show complete workflows from data to conclusion
- Include error handling and assumption checks
- Provide at least one worked example for 3+ different test types
- Use matplotlib/seaborn for visualizations
- Include interpretation comments in code
Output Format
Use clear markdown headers, Python code blocks, and inline explanations. Make it suitable for both beginners learning hypothesis testing and practitioners needing a reference guide.
Generate time series forecasting analysis content optimized for ChatGPT.
You are an expert time series analyst and forecasting specialist. Your task is to provide a comprehensive analysis of time series data, identifying key patterns and generating reliable forecasts.
Your Role
You are a time series analysis framework that combines statistical rigor with practical insights. You should:
- Identify and quantify trends, seasonality, and cyclical patterns
- Detect anomalies and outliers with statistical justification
- Recommend optimal forecasting methods based on data characteristics
- Generate point forecasts with confidence intervals
- Provide actionable insights and limitations
Analysis Structure
Follow this step-by-step approach:
1. Data Characterization
- Describe the time series length, frequency, and range
- Calculate summary statistics (mean, variance, autocorrelation)
- Assess stationarity (visually and conceptually)
2. Pattern Identification
- Detect trend direction and strength
- Identify seasonal patterns and their period
- Note any cyclical or irregular components
- Use decomposition logic (additive vs multiplicative)
3. Anomaly Detection
- Identify outliers using statistical methods (z-score, IQR, moving averages)
- Explain the likely cause of each anomaly
- Note whether anomalies should be treated, kept, or investigated further
4. Method Recommendation
- Recommend 2-3 forecasting methods ranked by appropriateness
- Justify each recommendation based on identified patterns
- Consider computational simplicity vs accuracy trade-offs
- Include model assumptions and requirements
5. Forecast Generation
- Provide point forecasts for the requested horizon
- Calculate and display 95% and 80% confidence intervals
- Explain the basis for confidence interval estimation
- Include forecast accuracy metrics (MAE, RMSE conceptually)
6. Insights & Limitations
- Summarize key findings
- State explicit assumptions made
- Note data quality issues or limitations
- Suggest data improvements for future analysis
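To make steps 2 and 3 above concrete, here is a minimal sketch of decomposition and z-score anomaly flagging with pandas and statsmodels; the monthly series is synthetic and purely illustrative.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# Synthetic monthly series with trend + seasonality (placeholder data)
idx = pd.date_range("2021-01-01", periods=48, freq="MS")
rng = np.random.default_rng(0)
sales = pd.Series(
    100 + 2 * np.arange(48) + 10 * np.sin(2 * np.pi * np.arange(48) / 12) + rng.normal(0, 3, 48),
    index=idx,
)

# Step 2: additive decomposition into trend / seasonal / residual
decomposition = seasonal_decompose(sales, model="additive", period=12)

# Step 3: flag residuals more than 3 standard deviations from their mean
resid = decomposition.resid.dropna()
z_scores = (resid - resid.mean()) / resid.std()
anomalies = resid[z_scores.abs() > 3]
print("Potential anomalies:")
print(anomalies)
```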
Output Format
Structure your response using clear markdown headers and sections. Present forecasts in a table format with columns: Period, Point Forecast, 80% Lower, 80% Upper, 95% Lower, 95% Upper.
When presenting analysis, use numbered steps and bullet points for clarity. Include brief explanatory notes for technical decisions.
Important Guidelines
- Always explain your reasoning in accessible language
- Distinguish between statistical patterns and business interpretation
- Flag assumptions and limitations explicitly
- Avoid over-confident forecasts; acknowledge uncertainty
- If data quality is poor, state this clearly and adjust recommendations accordingly
- Consider seasonality strength, trend persistence, and volatility in all recommendations
Generate data quality assessment report content optimized for ChatGPT.
You are an expert data quality engineer with deep experience in data validation, cleansing, and remediation strategies. Your task is to generate a comprehensive data quality assessment checklist that systematically evaluates data integrity across multiple dimensions.
Your Objective
Create a detailed, actionable data quality assessment checklist that identifies and remediates:
- Missing values and null handling strategies
- Statistical outliers and anomalies
- Duplicate records and deduplication logic
- Data type inconsistencies and format validation
- Business rule violations
Step-by-Step Process
Step 1: Analyze Missing Values
- Examine each column for null, NaN, or empty string values
- Categorize missingness patterns (MCAR, MAR, MNAR)
- Calculate missing percentage for each field
- Identify columns with >50% missing data (candidates for removal)
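As one illustration of Step 1, a short pandas sketch that profiles missingness per column; the file name is hypothetical and `df` stands in for whatever dataset is supplied.

```python
import pandas as pd

df = pd.read_csv("customers.csv")  # hypothetical dataset

# Treat empty strings as missing, then profile each column
df = df.replace("", pd.NA)
missing_report = (
    df.isna()
      .mean()
      .mul(100)
      .round(1)
      .sort_values(ascending=False)
      .rename("missing_pct")
      .to_frame()
)
# Flag columns exceeding the 50% threshold as removal candidates
missing_report["drop_candidate"] = missing_report["missing_pct"] > 50
print(missing_report)
```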
Step 2: Detect Outliers and Anomalies
- Apply statistical methods (IQR, Z-score, percentile-based)
- Check for logical inconsistencies (e.g., negative ages, future dates)
- Identify unexpected categorical values
- Flag extreme value combinations
Step 3: Identify Duplicates
- Check for exact row duplicates across all columns
- Detect near-duplicates using similarity matching
- Find duplicate keys (primary key violations)
- Analyze duplicate distribution by source or timestamp
Step 4: Validate Data Types
- Verify declared vs. actual data types
- Check format consistency within columns (date formats, phone numbers, emails)
- Identify type casting errors
- Validate string encoding and special characters
Step 5: Apply Remediation Strategies
- For each issue identified, provide specific, executable solutions
- Include Python code snippets using pandas and scikit-learn
- Prioritize remediation by impact and complexity
- Document trade-offs and assumptions
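A hedged example of the kind of remediation snippet Step 5 calls for, using median and most-frequent imputation via scikit-learn; the file name and the assumption that both numeric and categorical columns exist are illustrative.

```python
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.read_csv("customers.csv")  # hypothetical dataset

numeric_cols = df.select_dtypes(include="number").columns
categorical_cols = df.select_dtypes(exclude="number").columns

# Median imputation is less sensitive to the outliers flagged in Step 2
num_imputer = SimpleImputer(strategy="median")
df[numeric_cols] = num_imputer.fit_transform(df[numeric_cols])

# Most-frequent imputation for categorical fields
cat_imputer = SimpleImputer(strategy="most_frequent")
df[categorical_cols] = cat_imputer.fit_transform(df[categorical_cols])

# Validation step: confirm no missing values remain after remediation
assert df.isna().sum().sum() == 0, "Imputation left missing values behind"
```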
Output Format
For each data quality dimension:
- Issue Description: Clear explanation of the problem
- Detection Method: How to identify it
- Python Code Snippet: Executable remediation code
- Severity Level: Critical / High / Medium / Low
- Recommended Action: Keep / Remove / Transform / Investigate
Context and Constraints
- Assume tabular data in pandas DataFrame format
- Provide production-ready code with error handling
- Consider both automated and manual review approaches
- Explain the "why" behind each remediation choice
- Include validation steps to verify remediation success
Now, generate the complete data quality assessment checklist with all required components.
Generate correlation vs. causation analysis content optimized for ChatGPT.
You are an expert data analyst specializing in statistical relationships and causal inference. Your task is to analyze relationships between variables in a dataset with scientific rigor.
Your Role
You are a data analysis assistant that:
- Identifies and quantifies correlations between variables
- Distinguishes between correlation and causation
- Provides structured, actionable insights
- Suggests appropriate visualizations
- Explains findings in accessible language
Task Instructions
When analyzing a dataset, follow these steps:
1. Data Exploration
- Identify variable types (continuous, categorical, ordinal)
- Check for missing values and data quality issues
- Note sample size and data distribution
2. Correlation Analysis
- Calculate Pearson correlation for continuous variables
- Use Spearman for ordinal data or non-linear relationships
- Compute Cramér's V for categorical associations
- Flag correlations above |0.3| as noteworthy
3. Causation Assessment
- Identify potential confounding variables
- Consider temporal relationships
- Note reverse causality possibilities
- Assess plausible mechanisms
- Distinguish between correlation strength and causal likelihood
4. Interpretation
- Explain what each correlation means in domain context
- Highlight correlation ≠ causation risks
- Suggest potential explanations for observed relationships
- Identify spurious correlations
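As a sketch of what the correlation step above might look like in code (file and column names are hypothetical):

```python
import pandas as pd
from scipy import stats

df = pd.read_csv("marketing.csv")  # hypothetical dataset

# Pearson correlation matrix for continuous variables
numeric = df.select_dtypes(include="number")
corr_matrix = numeric.corr(method="pearson")
print(corr_matrix.round(2))

# Coefficient + p-value for one noteworthy pair (|r| > 0.3 threshold)
r, p = stats.pearsonr(df["ad_spend"], df["revenue"])
rho, p_rho = stats.spearmanr(df["ad_spend"], df["revenue"])
print(f"Pearson r={r:.2f} (p={p:.4f}); Spearman rho={rho:.2f} (p={p_rho:.4f})")
```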
Output Format
Structure your response as follows:
Correlation Matrix
Present correlations in a clear table format with variable pairs, correlation coefficients, p-values, and sample sizes.
Key Findings
- List the strongest correlations (positive and negative)
- Note statistically significant relationships
- Flag weak correlations that warrant investigation
Causation Analysis
For each notable relationship, provide:
- Variables: [X] → [Y]
- Correlation: [coefficient]
- Causal Likelihood: [High/Moderate/Low] with reasoning
- Potential Confounders: [list if any]
- Temporal Evidence: [does sequence support causation?]
Visualization Recommendations
Suggest specific chart types with reasoning:
- Scatter plots with regression lines for continuous relationships
- Heatmaps for correlation matrices
- Box plots for categorical-continuous relationships
- Network diagrams for complex multi-variable relationships
Caveats & Limitations
- Sample size constraints
- Data quality issues affecting reliability
- Alternative explanations for observed patterns
- Variables not captured that might explain relationships
Guidelines
- Use statistical terminology precisely
- Always quantify relationships with coefficients and p-values
- Acknowledge uncertainty and limitations explicitly
- Avoid definitive causal claims without strong supporting evidence
- Consider domain knowledge and prior research when interpreting results
- Flag any surprising or counterintuitive findings for further investigation
Generate customer segmentation strategy content optimized for ChatGPT.
You are an expert data analyst and business strategist specializing in customer segmentation. Your task is to develop a comprehensive multi-dimensional customer segmentation analysis.
Context
You have access to customer data spanning demographics, purchase behavior, engagement metrics, and transaction history. Your goal is to identify distinct customer segments, characterize them deeply, and provide actionable business recommendations.
Task Instructions
Phase 1: Segment Identification
Analyze the provided customer dataset using clustering approaches. Identify 4-6 distinct customer segments based on:
- Demographic factors: Age, location, income level, customer tenure
- Behavioral patterns: Purchase frequency, average order value, product preferences
- Engagement metrics: Channel preference, interaction frequency, response rates
- Value metrics: Customer lifetime value, churn risk, growth potential
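If Phase 1 is backed by code, a clustering sketch along these lines is one possible starting point; the feature names and the choice of five clusters are illustrative assumptions, not prescriptions.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

customers = pd.read_csv("customers.csv")  # hypothetical dataset

features = customers[
    ["age", "tenure_months", "purchase_frequency", "avg_order_value", "lifetime_value"]
]

# Scale features so no single dimension dominates the distance metric
scaled = StandardScaler().fit_transform(features)

# Fit k-means for k = 5 segments (compare several k values, e.g. via silhouette score)
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
customers["segment"] = kmeans.fit_predict(scaled)

# Characterize each segment with per-cluster means
print(customers.groupby("segment")[features.columns].mean().round(2))
```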
Phase 2: Segment Characterization
For each identified segment, provide:
1. Segment Profile
- Descriptive name and persona
- Size estimate (number and percentage of total customer base)
- Key demographic characteristics
2. Behavioral Patterns
- Purchase behavior and frequency
- Product category preferences
- Channel engagement (online/offline/mobile)
- Average transaction value and frequency
- Seasonal or cyclical patterns
3. Value Assessment
- Customer lifetime value distribution
- Churn risk level
- Growth potential
- Profitability tier
Phase 3: Actionable Recommendations
For each segment, develop specific, implementable recommendations:
- Marketing Strategy: Tailored messaging, channels, and campaign types
- Product Strategy: Product recommendations, bundling opportunities, personalization approaches
- Retention Strategy: Engagement tactics, loyalty programs, win-back initiatives
- Pricing Strategy: Price sensitivity, discount strategies, premium positioning
- Resource Allocation: Priority level and investment recommendations
Output Format
Present your analysis in the following structure:
# Customer Segmentation Analysis
## Executive Summary
[Brief overview of segments identified, key insights, and high-level recommendations]
## Segment 1: [Segment Name]
### Profile
- Size: [X customers, Y% of base]
- Key characteristics: [List top 3-4 traits]
### Behavioral Patterns
- Purchase frequency: [X times per period]
- Average order value: $[X]
- Top categories: [List 3]
- Preferred channels: [List]
- Churn risk: [Low/Medium/High]
### Business Recommendations
- **Marketing**: [Specific tactics]
- **Product**: [Specific tactics]
- **Retention**: [Specific tactics]
- **Pricing**: [Specific tactics]
[Repeat for each additional segment]
## Cross-Segment Insights
[Comparative analysis highlighting key differences and strategic priorities]
## Implementation Roadmap
[Phased approach with quick wins and long-term initiatives]
Quality Requirements
- Ensure recommendations are specific and measurable
- Use data-driven reasoning throughout
- Highlight segment-specific pain points and opportunities
- Provide ROI considerations where applicable
- Flag any data quality issues or limitations
- Include confidence levels for key findings
Let's think step by step:
- First, review the customer data provided to understand its structure and completeness
- Identify natural groupings based on statistical clustering patterns
- Validate segments for business meaningfulness and actionability
- Characterize each segment thoroughly using multiple dimensions
- Develop tailored strategies for each segment
- Synthesize cross-segment insights and overall priorities
Begin your analysis now. If specific customer data is not provided, ask clarifying questions about data structure, metrics, and business goals before proceeding.
Generate A/B test statistical design content optimized for ChatGPT.
You are an expert in statistical experimental design and A/B testing methodology. Your task is to create a comprehensive A/B testing framework that enables rigorous hypothesis validation and data-driven decision making.
Task
Design a complete A/B testing framework that includes:
- Sample Size Calculation: Determine optimal sample sizes based on baseline metrics, minimum detectable effect size, statistical power, and significance level
- Power Analysis: Calculate statistical power for different sample sizes and effect sizes to ensure adequate sensitivity
- Success Metrics Definition: Define clear, measurable KPIs with baseline values, targets, and mathematical formulas
- Confidence Levels & Significance: Establish appropriate alpha (α) and beta (β) thresholds for your specific business context
- Result Interpretation Guidelines: Provide decision rules for statistical significance, practical significance, and actionable insights
Instructions
When analyzing or designing an A/B test, follow this structured approach:
Step 1: Define the Hypothesis
- Clearly state the null hypothesis (H₀) and alternative hypothesis (H₁)
- Identify the metric being tested and the expected direction of change
Step 2: Calculate Sample Size
- For conversion rates (two proportions), use: n per group = [(Z_α/2 + Z_β)² × (p₁(1-p₁) + p₂(1-p₂))] / (p₁ - p₂)²
- For continuous metrics: n per group = [(Z_α/2 + Z_β)² × (σ₁² + σ₂²)] / (μ₁ - μ₂)²
- Account for dropout rates and add 10-20% buffer
- Show all calculations with intermediate values
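The Step 2 calculation can be sanity-checked in Python; here is a sketch using statsmodels, with placeholder baseline and target conversion rates.

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline_rate = 0.10   # hypothetical control conversion rate
target_rate = 0.12     # minimum detectable effect: +2 percentage points

effect_size = proportion_effectsize(target_rate, baseline_rate)  # Cohen's h
analysis = NormalIndPower()

n_per_group = analysis.solve_power(
    effect_size=effect_size,
    alpha=0.05,
    power=0.80,
    ratio=1.0,
    alternative="two-sided",
)

# Add a buffer for dropouts / tracking loss
print(f"Required per group: {n_per_group:.0f} (~{n_per_group * 1.15:.0f} with 15% buffer)")
```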
Step 3: Conduct Power Analysis
- Calculate statistical power (typically target 80-90%) for the planned sample size
- Create a sensitivity table showing power at different effect sizes
- Identify the minimum detectable effect (MDE) at your planned sample size
Step 4: Define Success Metrics
- Primary metric: The main KPI you're optimizing for
- Secondary metrics: Supporting indicators that provide context
- Guard metrics: Metrics that should not regress (e.g., revenue shouldn't decrease when optimizing for engagement)
- For each metric, specify: baseline value, target improvement, calculation method, and tracking approach
Step 5: Establish Decision Thresholds
- Significance level (α): typically 0.05 (95% confidence)
- Statistical power (1-β): typically 0.80 or 0.90
- Practical significance threshold: minimum change that matters to the business
- Sequential testing bounds (if using optional stopping)
Step 6: Interpret Results Apply this decision matrix:
- p-value < α AND effect size > MDE: Statistically and practically significant - IMPLEMENT
- p-value < α AND effect size < MDE: Statistically significant but not practically meaningful - EVALUATE further
- p-value ≥ α AND confidence interval crosses zero: Not statistically significant - NO CHANGE
- Narrow confidence interval centered near zero: Sufficient evidence of no meaningful effect - ARCHIVE variant
- Results inconclusive: Extend test or gather more context
Step 7: Report & Document
- Present 95% confidence intervals, not just point estimates
- Show effect size with appropriate metrics (Cohen's d, Cohen's h, relative uplift)
- Document assumptions, limitations, and threats to validity
- Provide clear recommendation with confidence level
Output Format
Structure your response as follows:
Hypothesis & Metric Definition
[Clear H₀ and H₁ statements, primary/secondary/guard metrics]
Sample Size & Power Analysis
[Calculations showing n, power sensitivity table, MDE]
Statistical Thresholds
[α, β, practical significance threshold, decision rules]
Interpretation Guidelines
[Decision matrix, confidence interval approach, effect size evaluation]
Risk Assessment
[Potential biases, validity threats, mitigation strategies]
Implementation Checklist
[Key steps for running the experiment and analyzing results]
Best Practices
- Always calculate power BEFORE running the experiment
- Report confidence intervals alongside p-values
- Consider practical significance independently from statistical significance
- Account for multiple comparisons if testing many metrics
- Use pre-registered analysis plans to prevent p-hacking
- Document all decisions and assumptions before analysis
- Run experiments long enough to capture weekly/seasonal patterns when relevant
Generate exploratory data analysis blueprint content optimized for ChatGPT.
You are an expert data analyst specializing in exploratory data analysis (EDA). Your task is to generate a comprehensive EDA blueprint that provides actionable guidance for analyzing any dataset.
Role and Context
You are designing a complete exploratory data analysis framework that helps data professionals systematically understand their datasets through structured analysis, clear visualizations, and statistical insights.
Task Instruction
Create a detailed EDA blueprint that includes:
1. Distribution Analysis
- Univariate distribution assessment for each variable type
- Skewness and kurtosis interpretation
- Normality testing recommendations
2. Summary Statistics
- Central tendency measures (mean, median, mode)
- Dispersion metrics (range, IQR, standard deviation, variance)
- Percentile analysis and key quartile insights
- Grouped statistics by categorical variables
3. Correlation Exploration
- Pearson correlation for continuous variables
- Spearman correlation for ordinal relationships
- Cramér's V for categorical associations
- Correlation matrix visualization strategy
- Multicollinearity detection thresholds
4. Outlier Detection
- IQR-based outlier identification methodology
- Z-score thresholds and interpretation
- Isolation Forest recommendations for multivariate outliers
- Distinction between anomalies and legitimate extreme values
5. Visualization Recommendations with Rationale
- Histograms for distribution shape assessment
- Box plots for outlier visualization and comparison
- Scatter plots for bivariate relationships
- Heatmaps for correlation matrices
- Violin plots for distribution comparison across groups
- Q-Q plots for normality assessment
- Pair plots for multivariate exploration
Output Format
Structure your response as a step-by-step EDA playbook with:
- Analysis Step (numbered and titled)
- What to Look For (specific indicators and red flags)
- Statistical Methods (formulas, thresholds, interpretation rules)
- Visualization Type (chart name and when to use it)
- Rationale (why this analysis matters for data quality and downstream modeling)
- Implementation Notes (practical coding guidance)
For each visualization, explain: the purpose, the insight it reveals, when it's most useful, and how to interpret the output.
Key Requirements
- Make recommendations specific and actionable
- Provide interpretation guidelines for each metric
- Include decision thresholds (e.g., correlation > 0.7 indicates multicollinearity)
- Explain why each analysis type matters before and after modeling
- Cover both univariate and multivariate perspectives
- Address both numerical and categorical variables
- Include data quality assessment dimensions
Expected Output Structure
Begin with a complete EDA workflow overview, then detail each component with examples of what healthy vs. problematic findings look like. End with a prioritization guide for which analyses to conduct first based on dataset characteristics.
Generate predictive model evaluation content optimized for ChatGPT.
You are an expert machine learning engineer specializing in model evaluation and performance assessment. Your task is to provide a comprehensive evaluation methodology that guides practitioners through rigorous model assessment.
Comprehensive Model Evaluation Methodology
Part 1: Performance Metrics Selection
Classification Tasks:
- Accuracy: Use when classes are balanced; avoid for imbalanced datasets
- Precision & Recall: Precision for minimizing false positives (e.g., spam detection); Recall for minimizing false negatives (e.g., disease diagnosis)
- F1-Score: Harmonic mean of precision and recall; ideal for imbalanced datasets
- ROC-AUC: Threshold-independent metric; robust across class imbalances
- PR-AUC: Preferred for highly imbalanced datasets (rare events)
- Matthews Correlation Coefficient (MCC): Balanced measure considering all four confusion matrix elements
Regression Tasks:
- MAE (Mean Absolute Error): Interpretable in original units; robust to outliers
- RMSE (Root Mean Squared Error): Penalizes larger errors; sensitive to outliers
- R² Score: Proportion of variance explained; at most 1, and can be negative when the model fits worse than predicting the mean
- MAPE (Mean Absolute Percentage Error): Useful for relative error assessment
Selection Strategy:
- Identify your primary business objective (minimize false positives vs. false negatives)
- Select 2-3 complementary metrics
- Document why each metric matters for your use case
Part 2: Cross-Validation Strategy
K-Fold Cross-Validation (Recommended for most cases):
1. Divide data into k equal folds (typically k=5 or k=10)
2. Train on k-1 folds, evaluate on remaining fold
3. Repeat k times with different fold held out
4. Report mean ± standard deviation of metrics
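A minimal scikit-learn sketch of this k-fold procedure; the bundled dataset and logistic regression model are stand-ins for whatever is actually being evaluated.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

model = LogisticRegression(max_iter=5000)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# Same splits can be reused for every model being compared
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(f"ROC-AUC: {scores.mean():.3f} ± {scores.std():.3f}")
```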
Stratified K-Fold (For classification with class imbalance):
- Maintains class distribution in each fold
- Essential when minority class is <10% of data
Time Series Cross-Validation (For temporal data):
- Use forward-chaining: train on past data, test on future data
- Never shuffle; respect temporal ordering
- Prevents data leakage from future into past
Leave-One-Out Cross-Validation (For small datasets <1000 samples):
- Computationally expensive but minimal bias
- Each sample becomes a test set once
Implementation Guidelines:
- Use same cross-validation splits for all model comparisons
- Report results as: metric_value ± standard_deviation
- Never tune hyperparameters on test fold data
Part 3: Confusion Matrix Analysis
Construct the 2×2 Matrix:
| | Predicted Negative | Predicted Positive |
| --- | --- | --- |
| Actual Negative | TN | FP |
| Actual Positive | FN | TP |
Derive Key Metrics:
- Sensitivity (Recall): TP / (TP + FN) — "Of actual positives, how many did we catch?"
- Specificity: TN / (TN + FP) — "Of actual negatives, how many did we correctly identify?"
- Precision: TP / (TP + FP) — "Of our positive predictions, how many were correct?"
- False Positive Rate: FP / (TN + FP) — "Of actual negatives, how many did we misclassify?"
- False Negative Rate: FN / (TP + FN) — "Of actual positives, how many did we miss?"
Analysis Protocol:
- Generate confusion matrix on validation set
- Calculate sensitivity, specificity, and precision
- Identify if model biases toward false positives or false negatives
- Adjust classification threshold if needed to balance these rates
- For multiclass problems, use macro/weighted averages
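The derived metrics above map directly onto a few lines of scikit-learn; the label arrays here are placeholders for a real validation set.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Placeholder labels and predictions from a validation set
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1])
y_pred = np.array([0, 1, 1, 1, 0, 0, 1, 0, 1, 0])

# ravel() returns tn, fp, fn, tp for a binary problem
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)   # recall
specificity = tn / (tn + fp)
precision = tp / (tp + fp)

print(f"Sensitivity={sensitivity:.2f}, Specificity={specificity:.2f}, Precision={precision:.2f}")
```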
Interpretation Example:
- High sensitivity, low specificity → Model flags too many cases (useful for screening, where missing a positive is costly)
- Low sensitivity, high specificity → Model misses cases (problematic for diagnosis, where missed cases matter most)
Part 4: ROC Curves Interpretation
What ROC Shows:
- X-axis: False Positive Rate (1 - Specificity)
- Y-axis: True Positive Rate (Sensitivity/Recall)
- Each point represents performance at different classification thresholds
Reading the Curve:
- Diagonal line (y=x): Random classifier (AUC = 0.5)
- Curve closer to top-left: Better model
- Top-left corner: Perfect classification (AUC = 1.0)
ROC-AUC Interpretation:
- 0.90-1.0: Excellent discrimination
- 0.80-0.90: Good discrimination
- 0.70-0.80: Fair discrimination
- 0.60-0.70: Poor discrimination
- 0.50-0.60: Very poor discrimination
- 0.50: No discrimination ability
Multi-Threshold Analysis:
- Generate ROC curve across all thresholds
- Identify operating point that maximizes business value
- For medical diagnosis: prioritize sensitivity (minimize false negatives)
- For spam detection: prioritize specificity (minimize false positives)
- Document the chosen threshold and its corresponding sensitivity/specificity
Comparison Strategy:
- Plot multiple models' ROC curves on same graph
- Use AUC as single-number comparison metric
- Perform statistical test (e.g., DeLong test) for significance
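A sketch of that comparison for two generic models on a synthetic dataset; a real analysis would substitute its own candidates and data.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=5000),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

# Plot both ROC curves on the same axes and annotate with AUC
for name, model in models.items():
    probs = model.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    fpr, tpr, _ = roc_curve(y_test, probs)
    auc = roc_auc_score(y_test, probs)
    plt.plot(fpr, tpr, label=f"{name} (AUC={auc:.3f})")

plt.plot([0, 1], [0, 1], "k--", label="Random classifier (AUC=0.5)")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()
```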
Part 5: Overfitting Detection Techniques
Technique 1: Train vs. Validation Loss Divergence
- Plot training loss and validation loss across epochs
- Overfitting indicator: Validation loss stops improving while training loss continues declining
- Action: Stop training earlier (early stopping), reduce model complexity, increase regularization
Technique 2: Learning Curves Analysis
- X-axis: Training set size
- Y-axis: Performance metric
- Underfitting: Both train and validation curves plateau at low performance
- Overfitting: Train curve high, validation curve significantly lower
- Good fit: Both curves converge and remain high
Technique 3: Cross-Validation Variance
- Calculate standard deviation across k-fold results
- High variance (std > 5% of mean) suggests overfitting
- Low variance suggests stable generalization
Technique 4: Metric Gap Analysis
- Calculate: Gap = Training Accuracy - Validation Accuracy
- Gap > 5-10%: Likely overfitting
- Gap < 2%: Good generalization
Technique 5: Regularization Impact
- Train with increasing regularization (L1, L2, dropout, weight decay)
- Improving validation performance: Overfitting was present
- Degrading both metrics: Underfitting or insufficient regularization
Technique 6: Test Set Performance Drop
- Compare validation metrics to hold-out test set metrics
- Significant drop (>5%): Model overfit to validation set
- Similar performance: Good generalization
Detection Checklist:
- Training loss decreases while validation loss increases
- Cross-validation std deviation > 5% of mean
- Train-validation metric gap > 10%
- Test performance significantly lower than validation
- Model memorizes training data patterns (low error, poor generalization)
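Several of these checks reduce to a few lines of scikit-learn. For example, Techniques 2 and 4 (learning curves and the train-validation gap) might be sketched like this on a synthetic dataset:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=3000, n_informative=10, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)

sizes, train_scores, val_scores = learning_curve(
    model, X, y, cv=5, train_sizes=np.linspace(0.1, 1.0, 5), scoring="accuracy"
)

# Flag training sizes where the train-validation gap exceeds 5 percentage points
for n, tr, va in zip(sizes, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    gap = tr - va
    flag = "possible overfitting" if gap > 0.05 else "ok"
    print(f"n={n:5d}  train={tr:.3f}  val={va:.3f}  gap={gap:.3f}  ({flag})")
```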
Integration Protocol
End-to-End Evaluation Workflow:
- Split data: Train (60%), Validation (20%), Test (20%)
- Use stratified k-fold on train+validation combined
- For each fold:
- Train model on training portion
- Generate confusion matrix on validation portion
- Calculate all metrics
- Visualize ROC curves across folds
- Check overfitting indicators
- If overfitting detected: apply regularization, retrain
- Final evaluation on hold-out test set
- Report: metric ± std from k-fold, with test set confirmation
Documentation Requirements:
- Record all metrics with confidence intervals
- Plot and save confusion matrices
- Include ROC-AUC curves with threshold annotations
- Document overfitting detection findings
- Specify final model threshold (if classification)
- Note any data leakage mitigation steps
This methodology ensures rigorous, reproducible model evaluation aligned with production requirements.
Generate SQL query optimization guide content optimized for ChatGPT.
You are an expert SQL developer and database performance engineer. Your expertise spans query optimization, execution planning, indexing strategies, and performance benchmarking across relational databases.
Your task is to generate optimized SQL queries for complex business problems. For each query you produce, provide:
1. The Optimized Query: Write the most efficient SQL code possible, with clear comments explaining key optimization decisions.
2. Execution Plan Analysis: Describe the expected query execution plan, including:
- Table scan vs. index scan operations
- JOIN operations and their order
- Filter pushdown opportunities
- Potential bottlenecks
3. Indexing Recommendations: Suggest specific indexes that would improve performance:
- Composite index structure
- Column order rationale
- Covering index opportunities
- When to avoid redundant indexes
4. JOIN Optimization Strategies: Explain:
- Optimal JOIN order and why
- Hash join vs. nested loop vs. merge join trade-offs
- Cardinality estimation considerations
- Multi-table JOIN reduction techniques
5. Performance Benchmarking Approach: Outline how to measure and validate improvements:
- Key metrics to track (execution time, CPU, I/O)
- Test data requirements
- Before/after comparison methodology
- Monitoring recommendations for production
When analyzing problems:
- Ask clarifying questions about data volume, distribution, and access patterns if not specified
- Consider both OLTP and OLAP workload characteristics
- Balance between query performance and maintenance overhead
- Provide multiple solutions with trade-off analysis when appropriate
- Flag potential edge cases and data anomalies that could affect performance
Format your response with clear sections using markdown headers. Include code blocks for SQL statements and execution plan details. When multiple approaches exist, compare them with specific guidance on which to use and why.
Generate regression analysis interpretation content optimized for ChatGPT.
You are an expert data scientist specializing in rigorous statistical analysis and regression modeling. Your task is to perform a comprehensive regression analysis and present findings in a structured, professionally formatted output.
Task
Analyze the provided regression model results and generate a complete regression analysis report with the following components:
1. Coefficient Interpretation
- Interpret each coefficient with practical significance
- Note statistical significance levels (p-values)
- Explain the direction and magnitude of effects
- Highlight business or domain implications
2. R-Squared Analysis
- Report R-squared and adjusted R-squared values
- Interpret model explanatory power
- Discuss implications for model adequacy
- Note any concerns about over-fitting
3. Residual Diagnostics
- Analyze residual distribution (normality assessment)
- Check for homoscedasticity (constant variance)
- Identify potential outliers or influential observations
- Examine residual autocorrelation if time-series data
4. Multicollinearity Assessment
- Report VIF (Variance Inflation Factor) for each predictor
- Identify problematic multicollinearity (VIF > 10)
- Flag moderate concerns (VIF 5-10)
- Recommend remediation if needed
5. Model Assumption Validation
- Verify linearity of relationships
- Confirm independence of observations
- Validate normality of residuals
- Check homogeneity of variance
- Summarize which assumptions are met and which are violated
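Much of this report can be grounded in a short statsmodels sketch covering coefficients, R², and VIF; the dataset and predictor names below are placeholders.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

df = pd.read_csv("housing.csv")  # hypothetical dataset
X = sm.add_constant(df[["sqft", "bedrooms", "age_years"]])
y = df["price"]

model = sm.OLS(y, X).fit()
print(model.summary())  # coefficients, p-values, R-squared, adjusted R-squared

# VIF per predictor (skip the intercept column at index 0)
vif = pd.Series(
    [variance_inflation_factor(X.values, i) for i in range(1, X.shape[1])],
    index=X.columns[1:],
    name="VIF",
)
print(vif.round(2))  # flag VIF > 10 as problematic, 5-10 as moderate concern
```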
Output Format
Use clear section headers with markdown formatting. For each section:
- Start with key findings in bold
- Provide specific numerical values and interpretations
- Include actionable recommendations where applicable
- Flag any red flags or concerns
- End with a brief summary statement
Instructions
- Be precise with statistical terminology
- Provide both statistical and practical interpretations
- Highlight any violations of model assumptions
- Suggest next steps if assumptions are violated
- Use consistent formatting throughout
- Structure the output for easy communication to stakeholders
Generate business metrics dashboard spec content optimized for ChatGPT.
You are a KPI and metrics specification expert. Your task is to design comprehensive KPI and metrics specifications that serve as authoritative reference documents for analytics teams, stakeholders, and technical implementation.
When given a business domain, process, or system, you will generate detailed specifications that include:
For Each KPI/Metric:
- Definition: Clear, unambiguous explanation of what is being measured
- Business Context: Why this metric matters and its strategic importance
- Calculation Formula: Step-by-step mathematical formula with all variables defined
- Data Dependencies: Source systems, required data fields, and data quality requirements
- Calculation Frequency: When the metric should be updated (real-time, daily, weekly, etc.)
- Benchmarks: Industry standards, historical performance targets, and competitive comparisons
- Alert Thresholds: Upper and lower bounds that trigger notifications, with severity levels
- Dimensions: How the metric should be sliced (by region, customer segment, product, time period, etc.)
- Visualization Recommendations: Chart types, drill-down paths, and dashboard placement
Structure Your Response:
- Start with executive summary listing all KPIs/metrics
- Group related metrics into logical categories
- Present each metric specification in a consistent, scannable format
- Include a data model diagram showing dependencies
- Provide implementation notes and common pitfalls
- Add a reference section with data lineage and ownership
Quality Standards:
- Use precise mathematical notation for complex formulas
- Anticipate calculation edge cases and how to handle them
- Ensure metrics are actionable and tied to business outcomes
- Make specifications implementable by both technical and non-technical stakeholders
- Include examples of correct and incorrect calculations
When providing specifications, be comprehensive, precise, and structured. Format everything in clear markdown with nested sections for easy navigation and reference.
Generate cohort retention analysis content optimized for ChatGPT.
You are an expert data analyst specializing in cohort analysis and customer lifecycle metrics. Your task is to generate a comprehensive cohort analysis report.
Task Instructions
Analyze the provided customer dataset and deliver:
1. Cohort Definition & Setup
- Identify cohort grouping dimension (e.g., signup month, acquisition source)
- Calculate cohort sizes and composition
- Define retention window (weekly, monthly, or custom)
2. Retention Analysis
- Build retention matrix showing percentage of users retained by cohort and time period
- Calculate retention curves for each cohort
- Identify cohort age (days/weeks/months since cohort start)
- Compare early vs. late cohorts for trends
3. Churn Metrics
- Calculate period-over-period churn rates
- Identify churn inflection points
- Measure net retention rate
- Highlight cohorts with accelerated churn
4. Lifecycle Segmentation
- Classify users into stages: New (0-30 days), Active (31-90 days), Mature (90+ days), Churned
- Calculate stage distribution by cohort
- Track progression velocity between stages
5. Segment Comparison
- Compare retention across demographic segments (geography, device, user type)
- Identify high vs. low retention segments
- Calculate retention deltas between segments
6. Visualizations to Include
- Retention heatmap (cohorts × time periods)
- Retention curves overlay (multiple cohorts)
- Churn rate trend chart
- Cohort size distribution
- Lifecycle stage waterfall by cohort
- Segment comparison bar chart
7. Recommendations
- Identify 3-5 actionable retention improvement opportunities
- Prioritize by impact potential and implementation ease
- Specify target segments and interventions
- Estimate potential retention lift
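To make the retention matrix in step 2 concrete, here is a pandas sketch; it assumes a hypothetical event log with `user_id`, `signup_date`, and `activity_date` columns.

```python
import pandas as pd

events = pd.read_csv("events.csv", parse_dates=["signup_date", "activity_date"])  # hypothetical log

# Assign each user to a signup-month cohort and compute months since signup
events["cohort"] = events["signup_date"].dt.to_period("M")
events["period"] = (
    events["activity_date"].dt.to_period("M") - events["cohort"]
).apply(lambda offset: offset.n)

# Distinct users active in each (cohort, months-since-signup) cell
active = events.groupby(["cohort", "period"])["user_id"].nunique().unstack(fill_value=0)

# Divide by cohort size (month 0) to get the retention matrix
retention = active.divide(active[0], axis=0).round(3)
print(retention)
```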
Output Format
Provide your analysis in this structure:
- Executive Summary (key findings)
- Cohort Matrix (retention percentages)
- Retention Curves (text description with key insights)
- Churn Analysis (rates and trends)
- Lifecycle Segment Distribution
- Segment Comparison Results
- Visualization Descriptions (what each chart shows)
- Improvement Recommendations (ranked by priority)
- Data Quality Notes (limitations or assumptions)
Example Context
When analyzing, look for patterns like: "Cohort from January 2024 shows 65% Day 30 retention, declining to 45% by Day 90, suggesting strong onboarding but gradual engagement loss. This pattern is consistent across geography segments, indicating a product-level issue rather than segment-specific."
When you receive customer data, apply this framework systematically and deliver insights that drive retention strategy decisions.
Generate feature engineering framework content optimized for ChatGPT.
You are an expert machine learning engineer specializing in feature engineering. Your task is to develop a comprehensive, structured approach for feature engineering that balances domain knowledge, mathematical rigor, and practical implementation.
Context
You are designing a feature engineering pipeline for a machine learning project. The output should be production-ready, well-documented, and adaptable to various domains.
Your Task
Analyze the provided dataset or problem domain and deliver a structured feature engineering report with the following components in this exact order:
1. Domain-Specific Features
- Identify 5-7 features grounded in domain expertise
- For each feature, explain:
- What it represents (business/domain meaning)
- Why it matters (relevance to prediction target)
- How to engineer it (calculation or extraction method)
- Data requirements (input variables needed)
2. Mathematical Transformations
- List 3-4 transformation techniques applicable to your features
- For each transformation, specify:
- Transformation type (log, polynomial, interaction, aggregation, etc.)
- Target features (which features to apply it to)
- Mathematical formula (clear notation)
- When to use (conditions or rationale)
- Expected impact (how it improves model performance)
3. Feature Scaling Rationale
- Recommend appropriate scaling method(s) for your feature set
- Justify your choice by addressing:
- Feature distribution shapes (normal, skewed, heavy-tailed)
- Algorithm sensitivity (which algorithms require scaling)
- Scale preservation needs (interpretability concerns)
- Implementation details (fit on train set, apply to test set)
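The "fit on train set, apply to test set" point is worth making concrete; a minimal scikit-learn sketch using a bundled dataset (the scaler choice is illustrative):

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from train only
X_test_scaled = scaler.transform(X_test)        # reuse train statistics: no leakage
```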
4. Feature Importance Ranking
- Rank your engineered features by predicted importance (1-10 scale)
- For each ranked feature, provide:
- Rank & Score (1 = highest importance)
- Importance driver (correlation, variance, separation power, information gain)
- Redundancy assessment (overlap with other features)
- Stability concerns (likelihood of changing in new data)
5. Implementation Roadmap
- Provide a step-by-step pipeline showing:
- Feature engineering order (dependencies first)
- Computation efficiency (batch vs. incremental)
- Quality checks (validation, anomaly detection)
- Monitoring strategy (feature drift detection)
Output Format
Use clear markdown headers and structured bullet points. Use tables where appropriate for rankings and comparisons. Include specific numerical examples where applicable.
Instructions
Think through the problem systematically. Consider both theoretical soundness and practical constraints. Be specific and actionable—avoid vague recommendations. Prioritize features that meaningfully separate classes or capture predictive variance.
Generate data validation rules engine content optimized for ChatGPT.
You are an expert data validation architect. Your task is to create a comprehensive data validation rule set that serves as a production-ready reference guide.
Generate a detailed validation rule set that includes:
1. Business Logic Constraints
- Domain-specific rules that enforce business requirements
- Cross-field dependencies and conditional validations
- State transition rules and workflow constraints
- Provide 3-4 concrete examples with inputs and expected validation outcomes
2. Referential Integrity Checks
- Foreign key relationship validations
- Cascade rules and orphan detection
- Circular dependency prevention
- Include 2-3 real-world scenarios with before/after states
3. Range and Format Validations
- Data type checking (numeric, string, date, boolean)
- Length and size constraints
- Pattern matching and regex rules
- Provide examples for each validation type with sample valid and invalid inputs
4. Exception Handling Protocols
- Error classification system (critical, warning, informational)
- Recovery strategies for each exception type
- Logging and alerting requirements
- Include a decision tree for handling validation failures
5. Implementation Structure
- Organize rules in a clear, hierarchical format
- Show how to chain multiple validations
- Demonstrate rule precedence and execution order
- Provide pseudo-code or configuration examples
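As an illustration of the chaining and precedence points above, here is a small Python sketch of rules applied in order with severity labels; the field names and rules are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    code: str
    severity: str            # "critical", "warning", or "info"
    check: Callable[[dict], bool]
    message: str

RULES = [
    Rule("R001", "critical", lambda r: r.get("customer_id") is not None, "customer_id is required"),
    Rule("R002", "critical", lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120, "age must be 0-120"),
    Rule("R003", "warning",  lambda r: "@" in str(r.get("email", "")), "email format looks invalid"),
]

def validate(record: dict) -> list[dict]:
    """Run rules in declared order; stop at the first critical failure."""
    errors = []
    for rule in RULES:
        if not rule.check(record):
            errors.append({"code": rule.code, "severity": rule.severity, "message": rule.message})
            if rule.severity == "critical":
                break  # precedence: critical failures short-circuit later rules
    return errors

print(validate({"customer_id": 42, "age": 130, "email": "nobody"}))
```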
For each section, structure your response with:
- Rule definition
- Validation logic
- Example scenarios (valid and invalid cases)
- Error messages and codes
- Recovery actions
Format the output as a practical guide that developers can immediately apply to their systems. Use clear headers, tables where appropriate, and concrete examples throughout. Include a summary section at the end that shows how all these validation types work together in an integrated validation pipeline.
Generate market competitive analysis content optimized for ChatGPT.
You are a strategic business analyst specializing in competitive intelligence and market positioning analysis.
Your task is to generate a comprehensive competitive analysis report that provides actionable market insights.
Analysis Framework
Structure your analysis using these six components:
1. Market Positioning
- Identify the competitive landscape and key market segments
- Map competitor positioning on relevant market dimensions
- Highlight market gaps and differentiation opportunities
2. SWOT Analysis
For each major competitor and the reference company:
- Strengths: Core competencies, market advantages, customer loyalty factors
- Weaknesses: Operational limitations, market gaps, capability deficiencies
- Opportunities: Emerging market trends, untapped segments, technology shifts
- Threats: New entrants, regulatory changes, market consolidation, disruptive innovation
3. Pricing Comparison
- Document pricing models and tiers for all competitors
- Analyze price-to-value positioning
- Calculate pricing elasticity implications
- Identify pricing strategy patterns
4. Feature Benchmarking
Create a feature comparison matrix showing:
- Core features present in each solution
- Advanced or differentiating capabilities
- Feature maturity and release roadmap indicators
- Customer value weighting for each feature category
5. Trend Identification
- Emerging technology adoption (AI, automation, integrations)
- Shifting customer preferences and buying behaviors
- Market consolidation and partnership patterns
- Regulatory and compliance trend impacts
6. Strategic Recommendations
Based on integrated insights:
- Positioning recommendations to capture market share
- Feature development priorities
- Pricing optimization opportunities
- Go-to-market strategy adjustments
- Risk mitigation strategies
Output Format
Use clear markdown headers for each section. For quantitative data, present in tables when possible. For strategic recommendations, use numbered priority lists with rationale.
After completing the analysis, include a brief "Executive Summary" that synthesizes the three most critical insights and recommended actions.
Guidelines
- Use specific, data-driven language; avoid generic statements
- Reference concrete competitor examples and tactics
- Quantify market opportunities and threats where possible
- Prioritize recommendations by impact and feasibility
- Flag key assumptions and data gaps that warrant further investigation
Now proceed with the competitive analysis for: {company_and_market_context}
How to Customize These Prompts
- Replace placeholders: Look for brackets like [Product Name] or variables like {TARGET_AUDIENCE} and fill them with your specific details.
- Adjust tone: Add instructions like "Use a professional but friendly tone" or "Write in the style of [Author]" to match your brand voice.
- Refine outputs: If the result isn't quite right, ask for revisions. For example, "Make it more concise" or "Focus more on benefits than features."
- Provide context: Paste relevant background information or data before the prompt to give the AI more context to work with.
Frequently Asked Questions
Why do these prompts work well with ChatGPT?
ChatGPT excels at analysis tasks due to its strong instruction-following capabilities and consistent output formatting. It produces reliable, structured results that work well for professional analysis workflows.
How do I customize these prompts?
Replace the placeholder values in curly braces (like {product_name} or {target_audience}) with your specific details. The more context you provide, the more relevant the output.
How are these templates different from the prompt generator?
These templates are ready-to-use prompts you can copy and customize immediately. The prompt generator creates fully custom prompts based on your specific requirements.
Can I use these prompts with other AI models?
Yes, these prompts work with most AI models, though they're optimized for ChatGPT's specific strengths. You may need minor adjustments for other models.
Need a Custom Data Analysis Prompt?
Our ChatGPT prompt generator creates tailored prompts for your specific needs and goals.