TestKase Docs
Reports & Analytics

AI Insights Reports

ML-powered analytics for flaky tests, risk heatmaps, predictive failure, and more.

AI Insights

AI Insights are Tier 3 reports that use machine learning to surface actionable patterns your team would not easily find with traditional charts. These reports analyze your historical testing data to detect flaky behavior, predict failures, score test quality, and optimize your suite.

AI reports consume credits each time they are generated. The credit cost is displayed on each report card before generation. See the AI Features guide for details on credit pricing and management.

Flaky Tests

Chart type: Table | Tier: 3

Identifies tests with inconsistent pass/fail outcomes across runs. A test is considered flaky if it alternates between passing and failing without any code changes, indicating non-deterministic behavior.

What it detects: Tests whose results flip between pass and fail across multiple executions. The report lists each flaky test along with its flip count, recent execution history, and a confidence score.

When to use: Before trusting your test results. Flaky tests erode confidence in the suite. Address the root causes (timing issues, test data dependencies, shared state) to stabilize your results.
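To make the flip-count idea concrete, here is a minimal sketch: a "flip" is any pass/fail transition between consecutive runs. The normalization used for the confidence score below is an illustrative assumption, not TestKase's actual scoring model.

```python
def flip_count(history):
    """Count pass/fail transitions in a chronological result history."""
    return sum(1 for a, b in zip(history, history[1:]) if a != b)

def flakiness_score(history):
    """Naive confidence: flips normalized by the maximum possible flips."""
    if len(history) < 2:
        return 0.0
    return flip_count(history) / (len(history) - 1)

runs = ["pass", "fail", "pass", "pass", "fail"]  # chronological results
print(flip_count(runs))       # 3
print(flakiness_score(runs))  # 0.75
```

A test that never changes outcome scores 0.0; one that alternates on every run scores 1.0.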

Release Readiness

Chart type: Gauge | Tier: 3

An AI-computed readiness score combining three key factors: overall pass rate, execution progress (percentage of tests executed), and critical test case pass rate. The result is a single gauge from 0 to 100 indicating how ready the build is for release.

What it detects: Whether your project meets the quality bar for release. The gauge provides a red/yellow/green indication, along with a breakdown of the contributing factors so you know exactly what is dragging the score down.

When to use: As the final quality gate before a release. Share the readiness score with stakeholders and product owners to support go/no-go decisions.
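The three factors can be combined along these lines. The weights and color thresholds below are assumptions for illustration; the actual weighting TestKase applies is not documented here.

```python
def readiness_score(pass_rate, progress, critical_pass_rate,
                    weights=(0.4, 0.2, 0.4)):
    """Combine the three factors (each 0-100) into a single 0-100 gauge."""
    w_pass, w_prog, w_crit = weights
    return round(w_pass * pass_rate + w_prog * progress
                 + w_crit * critical_pass_rate, 1)

def gauge_band(score):
    """Map a score to a red/yellow/green indication (example cutoffs)."""
    if score >= 80:
        return "green"
    if score >= 60:
        return "yellow"
    return "red"

score = readiness_score(pass_rate=92, progress=75, critical_pass_rate=88)
print(score, gauge_band(score))  # 87.0 green
```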

Risk Heatmap (Folder)

Chart type: Heatmap | Tier: 3

Generates a heatmap of failure risk across test case folders. Each cell represents a folder, with color intensity indicating the risk level based on historical failure rates, defect counts, and test coverage gaps.

What it detects: High-risk product areas that combine poor pass rates, frequent defects, and low coverage. These are the areas most likely to cause production issues.

When to use: During test planning to allocate more testing effort to high-risk areas. Share with development leads to prioritize stabilization work.
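As a rough sketch of how the three inputs could combine into one heatmap cell, consider the toy scoring below. The factor weights and the defect saturation point are invented for illustration and do not reflect TestKase's internal model.

```python
def folder_risk(fail_rate, defect_count, coverage):
    """Combine failure rate (0-1), open defect count, and coverage (0-1)
    into a 0-100 risk value for one heatmap cell."""
    defect_factor = min(defect_count / 10, 1.0)  # saturate at 10 defects
    risk = 0.5 * fail_rate + 0.3 * defect_factor + 0.2 * (1 - coverage)
    return round(100 * risk)

# A folder with a 40% failure rate, 6 open defects, and 50% coverage:
print(folder_risk(fail_rate=0.4, defect_count=6, coverage=0.5))  # 48
```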

Risk Heatmap (Feature)

Chart type: Heatmap | Tier: 3

Similar to the folder heatmap but organized by product feature instead of folder structure. This is useful when your folder hierarchy does not directly map to features.

What it detects: Feature-level risk concentrations. Identifies which product features carry the most testing risk based on failure patterns and defect history.

When to use: When presenting quality status to product managers who think in terms of features rather than test folders.

Test Case Quality

Chart type: Table | Tier: 3

Assigns an AI quality score to each test case based on factors such as step completeness, clarity of expected results, requirement linkage, and historical effectiveness (does the test actually find bugs?).

What it detects: Poorly written test cases that lack detail, have vague expected results, or have never detected a defect despite being run multiple times.

When to use: During test suite audits and continuous improvement efforts. Focus on improving the lowest-scoring test cases first for maximum impact.

Predictive Failure

Chart type: Table | Tier: 3

Predicts which tests are most likely to fail in the next execution based on historical patterns. The model considers factors like recent failure streaks, code change proximity, and defect correlation.

What it detects: Tests with a high probability of failing. This allows your team to prioritize executing these tests early in the cycle so failures are discovered sooner.

When to use: At the start of a new test cycle. Run the predicted-to-fail tests first to get early feedback and triage defects while the rest of the suite is still executing.
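A toy prioritization along these lines ranks tests by their trailing failure streak and historical failure rate. The real model's features and weights are internal to TestKase; this sketch only illustrates the ordering idea.

```python
def failure_streak(history):
    """Length of the trailing run of consecutive failures."""
    streak = 0
    for result in reversed(history):
        if result != "fail":
            break
        streak += 1
    return streak

def priority(history):
    """Higher priority = run earlier in the cycle."""
    fail_rate = history.count("fail") / len(history)
    return failure_streak(history) + fail_rate

tests = {
    "login_test": ["pass", "fail", "fail"],
    "search_test": ["pass", "pass", "pass"],
}
ordered = sorted(tests, key=lambda t: priority(tests[t]), reverse=True)
print(ordered)  # ['login_test', 'search_test']
```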

Tester Effectiveness

Chart type: Table | Tier: 3

Provides per-tester performance metrics including pass rate, defect discovery rate, execution velocity, and overall effectiveness score. This is not a ranking -- it helps identify patterns and coaching opportunities.

What it detects: Variations in testing thoroughness across the team. Testers with low defect discovery rates combined with high pass rates may be doing surface-level testing.

When to use: During one-on-one meetings and team retrospectives to identify skill development opportunities and optimize team composition.

Stale Tests

Chart type: Table | Tier: 3

Identifies test cases that have not been executed recently. Stale tests may be outdated, irrelevant, or simply forgotten. The report lists each stale test along with the date it was last executed and the number of days since.

What it detects: Tests that have fallen out of rotation. These may be testing deprecated features, may be too difficult to execute (indicating they need simplification), or may have been superseded by newer tests.

When to use: Quarterly test suite reviews. Review the stale test list and decide whether to re-execute, update, or archive each one.
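The staleness check itself is straightforward, as the standard-library sketch below shows. The 90-day threshold is an example value, not a TestKase default.

```python
from datetime import date

def stale_tests(last_executed, today, threshold_days=90):
    """Return (test, days_since) pairs for tests not run within the
    threshold, stalest first."""
    result = []
    for test, last_run in last_executed.items():
        days = (today - last_run).days
        if days > threshold_days:
            result.append((test, days))
    return sorted(result, key=lambda pair: pair[1], reverse=True)

history = {
    "checkout_flow": date(2024, 1, 10),
    "login_basic": date(2024, 5, 1),
}
print(stale_tests(history, today=date(2024, 6, 1)))
# [('checkout_flow', 143)]
```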

Cycle Health

Chart type: Table | Tier: 3

A health dashboard for each test cycle showing execution progress, pass rate, defect count, blocked test count, and an overall health grade (A through F). This provides a quick summarized view of every active cycle.

What it detects: Cycles that are struggling -- low progress, high failure rates, or many blocked tests. The health grade makes it easy to spot trouble at a glance.

When to use: When managing multiple concurrent test cycles. Check the health dashboard to prioritize which cycles need attention.
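An A-through-F grade of this kind might be derived as sketched below. The weights and letter cutoffs are illustrative assumptions, not TestKase's published grading rules.

```python
def health_grade(progress, pass_rate, blocked_pct):
    """Grade a cycle A-F from execution progress, pass rate, and
    blocked-test percentage (all 0-100)."""
    score = (0.45 * progress + 0.45 * pass_rate
             + 0.10 * (100 - blocked_pct))  # blocked tests drag score down
    for grade, cutoff in [("A", 90), ("B", 80), ("C", 70), ("D", 60)]:
        if score >= cutoff:
            return grade
    return "F"

print(health_grade(progress=95, pass_rate=90, blocked_pct=2))   # A
print(health_grade(progress=30, pass_rate=40, blocked_pct=50))  # F
```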

Suite Optimization

Chart type: Table | Tier: 3

Analyzes your test suite to identify unused, redundant, and always-passing test cases. The report categorizes tests into actionable groups: tests that have never been run, tests that always pass (and may not be providing value), and tests that appear to duplicate other tests.

What it detects: Bloat in your test suite. Over time, test suites accumulate redundant and low-value tests that increase execution time without improving quality.

When to use: Before starting a new major release cycle. Trim the suite to reduce execution time and focus on tests that provide real value.
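Two of the three categories can be sketched directly from execution history, as below; duplicate detection is omitted here since it requires comparing test content. The data shape and category names are illustrative.

```python
def categorize(tests):
    """tests: {name: list of historical results}.
    Returns never-run and always-passing tests as actionable groups."""
    never_run = [t for t, runs in tests.items() if not runs]
    always_pass = [t for t, runs in tests.items()
                   if runs and all(r == "pass" for r in runs)]
    return {"never_run": never_run, "always_pass": always_pass}

suite = {
    "new_feature_check": [],
    "smoke_login": ["pass", "pass", "pass"],
    "payment_edge": ["pass", "fail", "pass"],
}
print(categorize(suite))
# {'never_run': ['new_feature_check'], 'always_pass': ['smoke_login']}
```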

Execution Velocity

Chart type: Line | Tier: 3

Tracks daily execution throughput with a rolling average trend line. The chart shows how many tests are being executed each day along with a smoothed trend to filter out daily noise.

What it detects: Throughput slowdowns and bottlenecks. If the rolling average declines, something is slowing the team down -- perhaps environment issues, complex failures requiring investigation, or team capacity changes.

When to use: For sprint capacity planning. Use historical velocity to predict how many tests the team can execute in the next sprint and plan scope accordingly.
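The smoothing applied by the trend line works like a trailing rolling average. The 7-day window below is an assumption; the report's actual window size is not documented here.

```python
def rolling_average(daily_counts, window=7):
    """Return the trailing-window mean for each day with enough history,
    rounded to one decimal place."""
    out = []
    for i in range(window - 1, len(daily_counts)):
        window_slice = daily_counts[i - window + 1 : i + 1]
        out.append(round(sum(window_slice) / window, 1))
    return out

daily = [30, 42, 35, 38, 40, 33, 37, 20, 25]  # tests executed per day
print(rolling_average(daily))  # [36.4, 35.0, 32.6]
```

A declining tail in the smoothed series, as in this example, is the slowdown signal the report surfaces.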