DPSMF — User Guide

Need help?
support@answerpoint.com

Getting Started

Quick Start

Connect your first SQL Server instance and start monitoring in under 5 minutes.

🔌

Connect Your First Instance

1
Log in to DPSMF
Navigate to your DPSMF URL and log in with your admin credentials. First-time users are prompted to set a password.
2
Open Instance Manager
Go to Settings → Instances → Add Instance. Enter the SQL Server hostname or IP address, port (default 1433), and credentials. SQL Auth and Windows Auth are both supported.
3
Test the Connection
Click Test Connection. DPSMF will verify connectivity and permissions. A green check confirms the instance is reachable and the monitoring account has the required permissions.
4
Baseline Learning Period (7 Days)
DPSMF observes silently for the first 7 days, building statistical baselines for every query hash, wait type, and resource metric. No alerts fire during this period — this is expected behaviour.
5
Alerts Go Live on Day 8
From day 8, any deviation beyond 3σ from the learned baseline triggers an alert. The longer DPSMF runs, the sharper the baselines become. False positive rates typically drop below 6% after 30 days.
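
The 3σ rule in step 5 can be illustrated with a minimal sketch. It assumes the baseline is a simple list of past samples and uses the sample mean and standard deviation; the function name `is_anomalous` and the sample values are hypothetical, and DPSMF's actual baseline model is not documented here.

```python
from statistics import mean, stdev

def is_anomalous(baseline, value, sigmas=3.0):
    """Return True when value lies more than `sigmas` standard deviations
    away from the mean of the baseline samples."""
    if len(baseline) < 2:
        return False  # not enough history to estimate the spread
    mu = mean(baseline)
    sd = stdev(baseline)
    if sd == 0:
        return value != mu  # flat baseline: any change is a deviation
    return abs(value - mu) > sigmas * sd

# Hypothetical CPU samples (percent): mean 40, standard deviation 2.
cpu_baseline = [40, 42, 38, 41, 39, 40, 43, 37]
print(is_anomalous(cpu_baseline, 44))  # within 3 sigma of the mean
print(is_anomalous(cpu_baseline, 90))  # far outside 3 sigma
```

A longer baseline narrows the estimate of "normal", which is consistent with the note above that baselines sharpen the longer DPSMF runs.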

💡

DPSMF is read-only. At minimum it needs VIEW SERVER STATE and VIEW DATABASE STATE on the monitored instance (see System Requirements for the full list). It never writes to your SQL Server.

📌

What Happens Next

📊

Metrics Collection

DMV polling begins immediately every 10 seconds. CPU, memory, waits, blocking, and query stats are collected and stored.

🤖

Anomaly Engine

Anomaly detection analyses patterns from day 1. Baseline models improve continuously as more data is collected.

🧠

Knowledge Graph

Every query, wait type, and resource metric is linked in a live knowledge graph. Relationships between events are inferred automatically.

🏆

Scorecard

A health scorecard is generated within 24 hours of connection, scoring your instance across 8 dimensions.

Getting Started

Trial Workflow Guide

A step-by-step walkthrough of your DPSMF free trial — from sign-up to your first intelligent recommendation.

1️⃣

Choose Tier

5 min

2️⃣

Register

2 min

3️⃣

First Login

2 min

4️⃣

Connect Instance

15 min

5️⃣

Baseline

24–48 hr

6️⃣

First Alerts

Day 2+

7️⃣

Convert / Extend

Day 15/30

🆕

Step 1 — Choose Your Trial Tier

Tier	Instances	Duration	Sentinel AI	All Modules	Best For
Monitor	3	15 days	Text only	✗	Small teams, single SQL Server
Intelligence	10	15 days	Voice + all tabs	✓	Mid-size DBA teams, multiple instances
Enterprise	Unlimited	30 days	Voice + priority	✓	Large estates, multi-team rollout

💡

Recommendation: Start with Intelligence if you have more than one SQL Server. The self-learning Knowledge Graph requires at least one full business day of data before recommendations are meaningful — the sooner you connect, the faster the graph builds.

👤

Enterprise trials require a brief call with our team. Email sales@dpsmf.ai to schedule — we'll provision a 30-day trial with a dedicated onboarding session.

📝

Step 2 — Register for a Trial

1
Open the Trial Form
Click Start Free Trial on the Pricing page or navigate directly to dpsmf.ai/trial.html. No credit card is required.
2
Fill in Your Details
Enter your full name, work email address, organisation name, and choose a tier from the dropdown. Create a password of at least 8 characters.
3
Submit and Save Your License Key
On success the page displays your DPSMF-XXXXXX-XXXXXX-TRIAL license key. Copy it immediately — it is also emailed to you, but the email may take a few minutes to arrive. You will need the key on first login.

⚠️

One trial per email address. If you need to test multiple tiers, use a different email (e.g. you+monitor@company.com). You cannot switch tiers mid-trial, but you can convert to a paid subscription at any time.

🔒

Step 3 — First Login & License Activation

1
Navigate to Login
Go to dpsmf.ai/dpsmf-login.html or click Log In from any DPSMF page. Enter the email and password you registered with.
2
Activate Your License Key
After login, if your account does not yet have an active license you will see the License Activation prompt. Paste your DPSMF-XXXXXX-XXXXXX-TRIAL key and click Activate. The key is validated server-side and tied to your account — it cannot be used again.
3
Land on the Dashboard
After activation you are taken to the main dashboard at /app/. The dashboard is initially empty — no instances are connected yet. The trial countdown begins at this point.

🔑

Lost your license key? Check your registration email (search for DPSMF Trial). If you still can't find it, contact support@dpsmf.ai with your registered email address.

🔌

Step 4 — Connect Your First SQL Server Instance

⏱️

Allow 10–15 minutes for this step. You will need SQL Server credentials and network access details before you start.

1
Open Instance Manager
From the dashboard, click Instances in the left sidebar, then click + Add Instance. Alternatively go to Settings → Instances → Add.
2
Enter Connection Details
Fill in: Instance Name (display label), Server Host (hostname or IP), Port (default 1433), SQL Server Version (2016–2025 or Azure SQL). Select authentication: SQL Auth (username + password) or Windows Auth (service account via Kerberos).
3
Verify Connectivity
Click Test Connection. DPSMF runs a lightweight query (SELECT @@VERSION) to confirm network connectivity and credentials. Common failures: firewall blocking port 1433, incorrect hostname, login not mapped to a database. See Requirements for the full permission list.
4
Select Databases to Monitor
After a successful test, DPSMF lists all databases on the instance. Choose which ones to include in per-DB monitoring. System databases (master, tempdb, msdb) are always included for wait-stat and config analysis.
5
Save and Begin Collection
Click Save Instance. DPSMF begins polling immediately at 10-second intervals for wait stats, DMVs, and execution plans. The instance card on the dashboard shows a green Live badge within 30 seconds.

💡

Minimum SQL Server permissions required: VIEW SERVER STATE (for DMVs), VIEW DATABASE STATE (per-DB), VIEW ANY DEFINITION (execution plans). Read the Requirements panel for the exact GRANT script.

📈

Step 5 — The Baseline Period (24–48 Hours)

DPSMF does not generate recommendations immediately. The Intelligence Engine needs to observe a representative sample of workload patterns — typically one full business day — before the Knowledge Graph can distinguish normal from anomalous behaviour.

✅ What happens during baselining

· Wait stat distributions are recorded by hour-of-day
· Query plan hashes are catalogued and clustered
· CPU, memory, and I/O patterns are fingerprinted
· Knowledge Graph nodes begin forming from DMV data
· Thresholds are auto-calibrated to your instance

🚫 What is not yet available

· Anomaly Detection alerts (needs baseline first)
· Capacity Planner projections (needs trend data)
· Knowledge Graph recommendations (needs node density)
· Scorecard badge scoring (needs full observation window)

💡

Speed up baselining: Run your typical workload — batch jobs, report queries, OLTP load — during the first 24 hours. The more representative data DPSMF sees, the faster the Knowledge Graph reaches useful density. A quiet weekend is a poor time to start.

During the baseline period you can still use Sentinel in Query Mode — ask questions like "What are my top wait types right now?" or "Show me execution plans for queries over 5 seconds." These are answered directly from live DMV data without requiring a trained baseline.

🔔

Step 6 — Your First Alerts Fire

After the baseline window closes, Anomaly Detection activates and the alert system comes fully online. Here is what to expect on Day 2:

1
Dashboard Alert Cards Appear
The dashboard shows active alerts sorted by severity (Critical → Warning → Info). Each card shows the affected instance, the alert type, and a one-line Sentinel summary. Click any card to open Sentinel with the alert pre-loaded into context.
2
Open Sentinel for Root-Cause Analysis
Sentinel displays its evidence chain: the triggering metric, correlated wait types, execution plans, and Knowledge Graph context nodes. Click Explain to get a plain-English explanation. Click Recommend to see prioritised remediation steps with confidence scores.
3
Configure Notification Channels
Go to Settings → Notifications to add email, Microsoft Teams webhook, or SMS recipients. During a trial, alert emails are sent from alerts@dpsmf.ai. Add this address to your allowlist to avoid spam filtering.
4
Create Your First Runbook
When Sentinel recommends a remediation step you want to formalise, click Save as Runbook. The runbook captures the evidence chain, the recommendation, and any notes you add — creating an institutional knowledge record that persists beyond your trial.

💡

Trial tip: The fastest way to evaluate DPSMF is to deliberately cause a workload spike — run a table scan, block a session, or fill tempdb — and watch Sentinel detect and explain it in real time. Most evaluators complete this test within the first 48 hours.

🚀

Step 7 — End of Trial: Convert, Extend, or Export

💳

Convert to Paid

Go to Settings → Licensing → Upgrade. All Knowledge Graph data, runbooks, alert history, and badge progress carry forward. No re-connection required.

🕐

Request Extension

Need more time? Email sales@dpsmf.ai before your trial expires. Extensions are typically 7 days and are granted once per organisation.

📤

Export Your Data

Go to Settings → Export to download your runbooks, alert history, and scorecard report as JSON or PDF before expiry.

⚠️

What happens when a trial expires: The account is suspended. Monitoring stops. No data is deleted for 30 days — you can reactivate by converting to a paid subscription within that window. After 30 days, data is permanently removed.

🏆

Trial checklist before you decide: Have you asked Sentinel a question about a real production issue? Have you reviewed the Knowledge Graph recommendations? Have you generated a Scorecard report for your DBA team? These three activities represent the core value proposition of DPSMF.

Module How-To — Core Module

Performance Intelligence

Real-time wait stats, batch requests, PLE, CPU and memory pressure — with baseline comparison so you know exactly what “normal” looks like.

⚡

What Performance Intelligence Does

⏱

Wait Stat Analysis

Continuously captures sys.dm_os_wait_stats and classifies waits into CPU, I/O, lock, network, and memory categories. Deltas are computed every 10 seconds.

📊

Batch Request Rate

Tracks batch requests/sec as the primary SQL Server workload signal. Sudden spikes or drops are flagged against the 7-day rolling baseline.

💾

Page Life Expectancy (PLE)

PLE is sampled every 30 seconds per NUMA node. Sustained drops below your instance-specific floor (not the generic “300” threshold) fire an alert.

🔥

CPU & Memory Pressure

OS-level CPU utilisation and SQL Server memory counters (Target KB, Total KB, Stolen, Free) tracked side by side. Memory pressure is detected when Free pages drop below 5% of Target.
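
Because sys.dm_os_wait_stats reports cumulative totals, the per-interval deltas mentioned above come from subtracting one snapshot from the next. The sketch below shows that arithmetic on two hypothetical snapshots represented as plain dictionaries of wait type to wait_time_ms. The function names and the reset handling are assumptions for illustration, not DPSMF internals.

```python
def wait_deltas(previous, current):
    """Per-wait-type increase in wait_time_ms between two cumulative snapshots."""
    deltas = {}
    for wait_type, now_ms in current.items():
        before_ms = previous.get(wait_type, 0)
        # Cumulative counters restart from zero when the instance restarts
        # or the stats are cleared; treat a decrease as a fresh start.
        deltas[wait_type] = now_ms - before_ms if now_ms >= before_ms else now_ms
    return deltas

def top_waits(deltas, n=5):
    """Rank wait types by wait time accumulated in the interval, not by count."""
    ranked = sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)
    return [(wait_type, ms) for wait_type, ms in ranked[:n] if ms > 0]

# Hypothetical snapshots taken one polling interval apart.
earlier = {"CXPACKET": 1000, "PAGEIOLATCH_SH": 500}
later = {"CXPACKET": 1200, "PAGEIOLATCH_SH": 1500, "RESOURCE_SEMAPHORE": 50}
print(top_waits(wait_deltas(earlier, later), n=2))
```

Ranking by accumulated wait time rather than by wait count matches how the Wait Analysis view orders its top wait types.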

🚀

How to Use It — Step by Step

1
Open the Sentinel Dashboard
Navigate to Sentinel in the top nav. The live SN1 stream shows current waits, batch rate, and memory state in real time. No configuration needed — this starts collecting on day 1.
2
Check the Baseline Ribbon
The orange baseline ribbon on each metric tile shows what “normal” looks like for this instance. Current value turns red when it exceeds the 3σ threshold. Hover any tile for a 24-hour sparkline.
3
Drill Into Wait Stats
Click Wait Analysis in the Sentinel panel to see the full wait category breakdown. The top 5 wait types for the current 10-minute window are ranked by total wait time, not count. This shows you what SQL Server is actually waiting on.
4
Correlate With Top Queries
When CPU or I/O waits are elevated, click Top Queries to see which query hashes are responsible. Sort by avg CPU ms, avg reads, or execution count to find the culprit fast.
5
Set a Performance Alert Threshold
Go to Alerts → Rules → Add Rule. Choose metric cpu_pct, batch_requests_sec, or page_life_expectancy. Set the operator and threshold, then choose a notification channel. The baseline-aware option fires only when the value deviates from learned normal — not just crosses a static number.

💡

The most actionable signal in Performance Intelligence is wait category shift. If RESOURCE_SEMAPHORE waits suddenly dominate where CXPACKET used to, that’s a memory grant problem — not a parallelism one. The category view catches this instantly.

🔍

Reading the Output

Signal	What It Means	Action
`cpu_pct > 85%` sustained	CPU-bound workload or runaway query	Open Top Queries, sort by avg CPU ms
PLE dropping > 20% in 10 min	Memory pressure from large sort/hash	Check memory grants in Top Queries
Batch rate drops 30%+ vs baseline	Blocking or connectivity issue	Open Blocking Chain view in Sentinel
PAGEIOLATCH waits spike	Missing index or cold buffer pool	See Index & Query Health module
RESOURCE_SEMAPHORE dominates	Insufficient memory for query grants	Review `max server memory` setting in Configuration Intelligence

Module How-To — Flagship Module

Security Audit Engine

Shadow Identity analysis, permission sprawl detection, TDE status, login anomalies, and continuous compliance scoring across every connected instance.

🔒

What the Security Audit Engine Does

🕵

Shadow Identity Detection

Identifies orphaned logins, users without matching logins, disabled accounts still holding permissions, and logins that haven’t authenticated in 90+ days.

📝

Permission Sprawl

Maps every server role, database role, and direct permission grant. Flags overprivileged accounts (e.g., non-SA accounts with CONTROL SERVER) and cross-database permission chains.

🔐

TDE & Encryption Status

Reports encryption state for every database. Highlights databases where TDE is absent, where the certificate is nearing expiry, or where backup encryption is not configured.

🚨

Login Anomalies

Detects failed login storms, off-hours authentications, logins from new source IPs, and service accounts authenticating interactively.

⚠️

The Security Audit Engine requires the VIEW SERVER STATE and VIEW ANY DEFINITION permissions. If your monitoring account is missing VIEW ANY DEFINITION, TDE and permission sprawl scans will return partial results. Check Settings → Instances → Permission Check to validate.

🚀

How to Run a Security Audit

1
Open the Security Panel
From the Sentinel dashboard, click the Security tab. The compliance score (0–100) for each connected instance is shown at the top. A score below 70 is flagged red.
2
Review Shadow Identities
Click Shadow Identities. The list shows all logins and users with anomalous states. For each entry, the Risk column explains why it was flagged. Use the Export button to share the list with your security team.
3
Audit Permission Sprawl
Click Permission Map. The tree view starts from sysadmin and expands downward through role chains to direct grants. Any path leading to CONTROL SERVER, ALTER ANY LOGIN, or db_owner on a sensitive database is highlighted in amber.
4
Check TDE Coverage
The Encryption table lists every database with its TDE state, certificate thumbprint, and expiry date. Green = encrypted + cert valid. Amber = encrypted but cert expires within 90 days. Red = not encrypted.
5
Create a Security Alert Policy
Go to Alerts → Policies → New Policy, choose template Security Baseline. This pre-configures alerts for: new sysadmin-role member, failed login rate > 10/min, TDE disabled on any new database, and off-hours interactive service-account login.
6
Export a Compliance Report
Go to Reports → Security Audit. Choose the instance, date range, and output format (PDF or Excel). The report covers all 5 compliance dimensions and includes a remediation checklist.
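
The colour coding of the Encryption table in step 4 can be written as a small decision function. This is a sketch under stated assumptions: the function name `tde_status` is hypothetical, and treating an already-expired certificate as amber is an assumption, since the guide only describes certificates expiring within 90 days.

```python
from datetime import date

def tde_status(encrypted, cert_expiry, today):
    """Map a database's TDE state to the colour used in the Encryption table."""
    if not encrypted:
        return "red"  # not encrypted
    days_left = (cert_expiry - today).days
    # Assumption: an expired certificate (negative days_left) is also amber.
    return "amber" if days_left <= 90 else "green"

today = date(2025, 1, 1)
print(tde_status(False, None, today))             # unencrypted database
print(tde_status(True, date(2025, 2, 1), today))  # certificate expires in 31 days
print(tde_status(True, date(2026, 1, 1), today))  # certificate valid for a year
```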

🏆

Compliance Score Breakdown

Dimension	Weight	What Drops It
Identity hygiene	25%	Orphaned logins, stale accounts, disabled users with active grants
Privilege minimisation	30%	Non-SA sysadmin members, `CONTROL SERVER` grants, nested role escalation
Encryption coverage	20%	Unencrypted databases, expired TDE certificates, missing backup encryption
Authentication hygiene	15%	Mixed-mode auth on domain instances, SQL logins without password policy
Audit trail	10%	SQL Audit not configured, C2 audit disabled, no login failure logging
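
One way to read the weights in the table above is as a weighted average of per-dimension scores. The sketch below assumes each dimension is scored 0–100 and combined linearly. The dictionary keys, the function names, and the linear combination itself are assumptions, since the guide lists only the weights.

```python
# Weights from the Compliance Score Breakdown table, in percent.
WEIGHTS = {
    "identity_hygiene": 25,
    "privilege_minimisation": 30,
    "encryption_coverage": 20,
    "authentication_hygiene": 15,
    "audit_trail": 10,
}

def compliance_score(dimension_scores):
    """Weighted average of 0-100 dimension scores, rounded to an integer."""
    total = sum(WEIGHTS[name] * dimension_scores[name] for name in WEIGHTS)
    return round(total / 100)

def is_flagged_red(score):
    """The Security panel flags scores below 70 in red."""
    return score < 70

# Hypothetical instance with weak privilege and audit-trail hygiene.
scores = {
    "identity_hygiene": 80,
    "privilege_minimisation": 50,
    "encryption_coverage": 100,
    "authentication_hygiene": 60,
    "audit_trail": 40,
}
print(compliance_score(scores), is_flagged_red(compliance_score(scores)))
```

Because privilege minimisation carries the largest weight, a low score in that dimension pulls the total down more than any other single dimension.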

Module How-To — Intelligence Layer

Knowledge Graph

10,000+ nodes of verified SQL Server knowledge, sourced from Microsoft docs, recognised experts, and real-world application — continuously growing.

🧠

What the Knowledge Graph Does

🔗

Node-Based Knowledge

Each node is a verified fact, rule, or pattern — e.g., “RESOURCE_SEMAPHORE waits indicate insufficient memory grant availability.” Nodes are confidence-scored and source-cited.

💡

Contextual Recommendations

When an alert fires, the Knowledge Graph surfaces the 3–5 most relevant nodes: likely cause, diagnostic query, and fix — ranked by how closely they match the live symptom pattern.

📈

Pattern Matching

Combines live metrics with node relationships to detect compound problems: e.g., high CPU + CXPACKET waits + new execution plan = parallelism regression, not a hardware issue.

🧰

Semantic Search

Search the Knowledge Graph in plain English. “Why is my tempdb slow?” returns ranked nodes about contention, metadata, allocation pages, and version store exhaustion.

🚀

How to Use the Knowledge Graph

1
Open Knowledge Graph
Navigate to Intelligence → Knowledge Graph. The graph explorer loads with your instance’s most recently active nodes highlighted — these are the patterns most relevant to what your SQL Server is doing right now.
2
Search for a Symptom
Use the search bar to enter a symptom in plain English, e.g. slow queries after index rebuild or blocking on tempdb. Results are ranked by semantic similarity to your query, not keyword match.
3
Read the Recommendation Chain
Click any node to expand its detail panel. The Related Nodes tab shows the cause chain (upstream) and fix chain (downstream). Follow the chain from symptom → cause → diagnostic → remediation.
4
Use During an Incident
When an alert fires, the alert detail page includes a Knowledge Graph Insight card. This card links directly to the 3 most relevant nodes for that specific alert — no manual searching needed during an incident.
5
Validate and Contribute
Each node has a Validate button. Clicking it after following a recommendation that worked increases the node’s confidence score. Validated nodes are weighted higher in future recommendations for all users.

🤖

The Knowledge Graph grows continuously. New nodes are sourced from Microsoft documentation updates, SQL Server release notes, and real-world patterns submitted by the DPSMF customer base. Your instance-specific patterns are private and never shared.

📚

Common Use Cases

Scenario	What to Search	What You’ll Find
Query suddenly slow after deploy	`plan regression after schema change`	Parameter sniffing nodes, plan cache flush triggers, UPDATE STATISTICS guidance
Tempdb contention alerts firing	`tempdb allocation contention`	GAM/SGAM page contention nodes, trace flag 1118/1117 analysis, file count recommendations
Memory pressure on busy OLTP	`memory grant timeout OLTP`	Resource semaphore nodes, max server memory tuning, memory-optimised table guidance
AG failover investigation	`availability group unexpected failover`	Health check timeout nodes, lease timeout chain, network jitter patterns

Module How-To — Performance Module

Index & Query Health

Missing index detection, fragmentation analysis, execution plan visualisation with AI reasoning chains. Know exactly why a query is slow.

📦

What Index & Query Health Does

🔍

Missing Index Detection

Reads sys.dm_db_missing_index_details continuously, but de-duplicates and scores candidates by projected impact (avg user seeks × avg total cost) rather than just listing them raw.

📸

Fragmentation Analysis

Nightly scan of sys.dm_db_index_physical_stats across all user databases. Results are stored historically so you can see fragmentation growth over time, not just a point-in-time snapshot.

📄

Execution Plan Capture

For the top 25 queries by CPU and I/O cost, DPSMF captures and stores query plans. Plans are compared week-over-week — a plan change that coincides with a performance regression is automatically flagged.

🤖

AI Reasoning Chains

For each flagged query, Sentinel AI generates a plain-English explanation: what the plan is doing, why it’s inefficient, and what to fix first. Powered by the Knowledge Graph + live plan data.

🚀

How to Use Index & Query Health

1
Open Index Health
Go to Intelligence → Index & Query Health. The summary card shows: total missing index candidates, indexes with >30% fragmentation, and queries with plan regressions detected this week.
2
Review Missing Index Candidates
The Missing Indexes tab lists candidates sorted by impact score (not raw column lists). Each entry shows the table, equality/inequality columns, included columns, and an estimated read improvement percentage. Click Generate Script to get a ready-to-review CREATE INDEX statement.
3
Prioritise Fragmentation Rebuilds
The Fragmentation tab shows all indexes with >5% fragmentation, sorted by size × fragmentation impact. Indexes under 1,000 pages are marked REORGANIZE candidates; larger indexes are marked REBUILD. Use the maintenance window filter to schedule rebuilds outside peak hours.
4
Investigate a Slow Query
Click Top Queries, then select any query hash. The detail pane shows: the query text, historical CPU/reads trend (30 days), current execution plan as a visual tree, and the AI reasoning chain explaining the bottleneck. The reasoning chain cites specific Knowledge Graph nodes so you can read the background.
5
Set a Plan Regression Alert
In the query detail pane, click Watch This Query. If the execution plan changes and performance degrades by more than 20%, an alert fires with a side-by-side plan diff so you can see exactly what changed.
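
The Watch This Query condition in step 5 combines two checks: the plan changed, and performance degraded by more than 20%. A minimal sketch follows, assuming "performance" means average duration in milliseconds and that plans are identified by a hash. Both are assumptions, as is the function name.

```python
def plan_regressed(old_plan_hash, new_plan_hash, old_avg_ms, new_avg_ms,
                   threshold=0.20):
    """True when the plan changed AND average duration rose by more than threshold."""
    if old_plan_hash == new_plan_hash or old_avg_ms <= 0:
        return False  # same plan, or no usable baseline to compare against
    return (new_avg_ms - old_avg_ms) / old_avg_ms > threshold

print(plan_regressed("0xA1", "0xB2", 100, 130))  # new plan, 30% slower
print(plan_regressed("0xA1", "0xA1", 100, 200))  # slower, but the plan is unchanged
```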

⚠️

Don’t add every missing index. SQL Server suggests indexes based on individual queries in isolation — it doesn’t account for write overhead or index overlap. Use DPSMF’s impact score to prioritise the top 3–5 candidates, test in a non-prod environment, and measure the actual improvement before deploying to production.

📉

Interpreting the Fragmentation Scan

Fragmentation %	Page Count	Recommended Action
< 5%	Any	No action needed
5–30%	< 1,000 pages	`ALTER INDEX ... REORGANIZE` (online, low impact)
5–30%	≥ 1,000 pages	`ALTER INDEX ... REBUILD WITH (ONLINE=ON)` if Enterprise
> 30%	Any	`ALTER INDEX ... REBUILD` — schedule in maintenance window
Any	< 100 pages	Ignore — small tables don’t benefit from fragmentation fixes
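
The table above can be read as a decision function. The sketch below encodes its rows in order of precedence, checking the 100-page floor first. The function name and return labels are hypothetical, and falling back to a plain rebuild for larger indexes on non-Enterprise editions is an assumption, since the table covers only the Enterprise case for that row.

```python
def recommend_action(fragmentation_pct, page_count, enterprise_edition=False):
    """Return the maintenance action suggested by the fragmentation table."""
    if page_count < 100:
        return "ignore"        # too small to benefit from fragmentation fixes
    if fragmentation_pct < 5:
        return "none"          # no action needed
    if fragmentation_pct > 30:
        return "rebuild"       # schedule in a maintenance window
    if page_count < 1000:
        return "reorganize"    # online, low impact
    # 5-30% fragmentation on 1,000 pages or more.
    # Assumption: without Enterprise edition, fall back to an offline rebuild.
    return "rebuild_online" if enterprise_edition else "rebuild"

print(recommend_action(12, 500))         # mid fragmentation, small index
print(recommend_action(12, 5000, True))  # mid fragmentation, large index, Enterprise
print(recommend_action(45, 5000))        # heavy fragmentation
```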

Module How-To — Config Module

Configuration Intelligence

MAXDOP, cost threshold, memory settings, tempdb file counts — audited against your specific SQL Server version, edition, and workload profile.

🔧

What Configuration Intelligence Does

⚙️

Version-Aware Audit

Configuration checks are scoped to your exact SQL Server version and edition. Recommendations for SQL Server 2019 Enterprise on 32-core hardware differ from 2016 Standard on 8 cores.

📈

Workload-Adaptive Rules

DPSMF observes your actual workload mix (OLTP vs. analytical vs. mixed) before scoring configuration settings. MAXDOP 1 is correct for some OLTP workloads; wrong for others.

📝

Drift Detection

Configuration settings are snapshotted daily. If any setting changes, an alert fires immediately with the before/after value. Configuration drift between instances in the same environment is also detected.

📄

Remediation Scripts

Every flagged misconfiguration includes a one-click script generation: EXEC sp_configure or ALTER DATABASE statements ready to review and run.

🚀

How to Run a Configuration Audit

1
Open Configuration Intelligence
Navigate to Intelligence → Configuration. The audit summary shows a score (0–100) and a count of settings flagged as Critical, Warning, or Advisory.
2
Review Critical Findings First
Critical findings are configurations proven to cause data loss, corruption, or severe performance problems (e.g., max degree of parallelism = 0 on a 32-core NUMA system). These should be addressed before any other tuning.
3
Read the Recommendation Detail
Click any finding to see: the current value, the recommended value, the reasoning (with SQL Server version context), the Knowledge Graph node it links to, and the risk of changing it. Some changes require a service restart — this is flagged prominently.
4
Generate and Review the Script
Click Generate Fix Script on any finding. The script includes the current value as a comment so you can revert if needed. Always test configuration changes on a non-production instance first.
5
Enable Drift Alerting
Go to Alerts → Rules → New Rule and select the Configuration Drift template. This fires within 60 seconds of any sp_configure change, any ALTER DATABASE to a monitored setting, or any tempdb file count change.
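
Conceptually, drift detection is a comparison of two configuration snapshots that reports the before and after value of anything that changed. The sketch below shows that comparison on plain dictionaries. The function name and snapshot format are assumptions for illustration, not DPSMF's storage format.

```python
def config_drift(before, after):
    """Return {setting: (old_value, new_value)} for every setting that differs."""
    changes = {}
    for setting in sorted(set(before) | set(after)):
        old, new = before.get(setting), after.get(setting)
        if old != new:
            changes[setting] = (old, new)
    return changes

# Hypothetical snapshots of two sp_configure settings.
yesterday = {"max degree of parallelism": 8, "cost threshold for parallelism": 50}
today = {"max degree of parallelism": 0, "cost threshold for parallelism": 50}
print(config_drift(yesterday, today))
```

The same comparison applies to two instances in the same environment: pass one instance's snapshot as `before` and the other's as `after`.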

💡

MAXDOP and Cost Threshold for Parallelism should be tuned together. DPSMF will not recommend a MAXDOP value without also validating that your Cost Threshold is set high enough to make that MAXDOP meaningful. The two settings are shown side-by-side in the audit.

📊

Key Settings Audited

Setting	Why It Matters	Common Misconfiguration
`max degree of parallelism`	Controls CPU parallelism per query	0 (unlimited) on NUMA systems causes CXPACKET storms
`cost threshold for parallelism`	Minimum plan cost before parallel plan is used	Default value of 5 is too low for most OLTP workloads
`max server memory (MB)`	Caps SQL Server buffer pool	Default 2,147,483,647 starves OS and other services
`tempdb file count`	Reduces allocation page contention	Single file on multi-core servers causes GAM/SGAM waits
`optimize for ad hoc workloads`	Reduces plan cache bloat from single-use plans	Disabled by default; should be ON for most OLTP systems
`backup compression default`	Reduces backup I/O and storage	Disabled by default

Module How-To — Maintenance Module

Maintenance Scheduler

Intelligent backup monitoring, job health tracking, and scheduled PDF/Excel reports — delivered to email, Teams, or ServiceNow automatically.

📅

What the Maintenance Scheduler Does

💾

Backup Monitoring

Reads msdb.dbo.backupset and msdb.dbo.suspect_pages across all monitored instances. Alerts when a database has not been backed up within its defined SLA window, or when a backup fails.

✂️

SQL Agent Job Health

Tracks all SQL Agent job outcomes. Duration regressions (job takes 3× longer than its 30-day average) and failure streaks (3+ consecutive failures) generate automatic alerts.

📥

Scheduled Reports

Deliver daily, weekly, or monthly reports to email, Microsoft Teams, or ServiceNow. Reports include health scorecard, alert summary, top queries, backup status, and job health — in PDF or Excel.

🕒

Maintenance Window Management

Define per-instance maintenance windows. Alerts, anomaly scoring, and performance comparisons are suppressed during these windows so nightly index rebuilds don’t generate noise.
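
The two SQL Agent job health rules described above (a run 3× longer than the 30-day average, and 3 or more consecutive failures) can be sketched as follows. The function names and input shapes are hypothetical, and using a strict "greater than" comparison for the duration rule is an assumption.

```python
from statistics import mean

def duration_regression(history_seconds, latest_seconds, factor=3.0):
    """True when the latest run exceeds `factor` times the historical average."""
    if not history_seconds:
        return False  # no history to compare against
    return latest_seconds > factor * mean(history_seconds)

def failure_streak(outcomes, threshold=3):
    """outcomes: booleans, oldest first, True = success.
    True when the most recent `threshold` or more runs all failed."""
    streak = 0
    for succeeded in reversed(outcomes):
        if succeeded:
            break
        streak += 1
    return streak >= threshold

print(duration_regression([100, 110, 90], 350))     # average is 100 seconds
print(failure_streak([True, False, False, False]))  # last three runs failed
```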

🚀

How to Configure Maintenance Monitoring

1
Set Backup SLA Windows
Go to Settings → Instances → [Instance] → Backup SLA. Set the maximum allowed hours since last full backup and since last log backup (for databases in Full recovery model). DPSMF will alert if either threshold is breached.
2
Review Job Health Dashboard
Navigate to Maintenance → Job Health. Every SQL Agent job is listed with its last outcome, average duration (30-day), and a trend sparkline. Red rows indicate a job that failed on its last run or is running significantly longer than normal.
3
Configure a Scheduled Report
Go to Settings → Reports → New Report. Choose: report type (Daily Health, Weekly Executive, Monthly Trend), instances to include, output format (PDF or Excel), delivery channel (email address, Teams webhook, or ServiceNow assignment group), and delivery time. Click Save — the first report delivers on the next scheduled run.
4
Set a Maintenance Window
In Settings → Instances → [Instance] → Maintenance Window, enter the start and end time (e.g., 01:00–05:00). Select days of week. During this window: no new alerts fire, anomaly scores are frozen, and performance metrics are excluded from baseline calculations.
5
Add a Job Failure Alert
Go to Alerts → Rules → New Rule, select category SQL Agent. Choose: specific job name or “any job,” failure trigger (single failure or N consecutive), severity, and notification channel. Duration regression alerts are available under the same category.
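
The maintenance window in step 4 is a start time, an end time, and a set of weekdays. The sketch below checks whether a timestamp falls inside such a window. The function name is hypothetical, and attributing a window that crosses midnight to the weekday on which it starts is an assumption.

```python
from datetime import datetime, time

def in_maintenance_window(moment, start, end, weekdays):
    """weekdays uses datetime.weekday() numbering (Monday = 0)."""
    now = moment.time()
    if start <= end:
        return moment.weekday() in weekdays and start <= now < end
    # Window crosses midnight (e.g. 23:00-03:00).
    if now >= start:
        return moment.weekday() in weekdays
    if now < end:
        # Assumption: the early-morning part belongs to the previous day's window.
        return (moment.weekday() - 1) % 7 in weekdays
    return False

weeknights = {0, 1, 2, 3, 4}  # Monday to Friday
monday_0230 = datetime(2025, 1, 6, 2, 30)
print(in_maintenance_window(monday_0230, time(1, 0), time(5, 0), weeknights))
```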

💡

Set your backup SLA window slightly looser than your actual backup schedule. If your full backup runs nightly at 01:00, set the SLA to 26 hours (not 24) to absorb occasional delays without noise. Set log backup SLA to 30 minutes for databases requiring point-in-time recovery.
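
To see why a 26-hour SLA is quieter than a 24-hour one for a nightly backup, consider a run that finishes 90 minutes late. The sketch below is a minimal SLA check. The function name and timestamps are hypothetical.

```python
from datetime import datetime, timedelta

def sla_breached(last_backup, now, sla_hours):
    """True when more than `sla_hours` have passed since the last backup."""
    return now - last_backup > timedelta(hours=sla_hours)

last_full = datetime(2025, 1, 6, 1, 0)  # Monday's 01:00 backup
check_at = datetime(2025, 1, 7, 2, 30)  # Tuesday's backup is running 90 minutes late

print(sla_breached(last_full, check_at, 24))  # a 24-hour SLA would alert
print(sla_breached(last_full, check_at, 26))  # a 26-hour SLA absorbs the delay
```

The same function covers log backups by passing a fractional value, for example `sla_hours=0.5` for a 30-minute SLA.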

📥

Report Types & Contents

Report	Cadence	Sections Included
Daily Health	Daily at chosen time	Alert count, top 3 performance issues, backup status, job failures, scorecard delta
Weekly Executive	Weekly (Mon 07:00 default)	Trend charts, scorecard history, top query changes, capacity forecast, action items
Monthly Trend	1st of month	30-day performance summary, index health evolution, security audit delta, growth projections
Security Audit	On-demand or scheduled	Full compliance scorecard, shadow identity list, permission map, TDE status, remediation checklist
Index Rebuild Plan	Weekly or on-demand	Fragmentation scan, rebuild/reorganise recommendations, estimated duration, maintenance window fit

❓

Trial FAQ

Is a credit card required to start a trial?

No. Monitor and Intelligence trials require no payment information. Enterprise trials require a brief qualification call but no card.

Can I connect a cloud SQL Server (Azure SQL, AWS RDS)?

Azure SQL Database and SQL Server on Azure VM are fully supported. AWS RDS for SQL Server is supported with SQL Auth. Ensure port 1433 is open from the DPSMF server to your cloud instance, or use the DPSMF Relay Agent for private VNet deployments.

Does DPSMF install anything on my SQL Server?

No. DPSMF is agentless — it connects over a standard SQL connection and reads DMVs. Nothing is installed on your SQL Server host. The only footprint is a read-only login.

What data does DPSMF store from my SQL Server?

DPSMF stores aggregated performance metrics, wait stat distributions, query hashes (not query text by default), execution plan shapes, and configuration values. No business data, row data, or PII is accessed or stored. You can review the full Data Processing Agreement at dpsmf.ai/legal/dpa.

Can I trial DPSMF against a production server?

Yes — in fact we recommend it. DPSMF is read-only and adds negligible load (polling queries are lightweight DMV reads). Many customers connect their busiest production instance first to get the most relevant Knowledge Graph output during the trial window.

How do I get help during my trial?

Email support@dpsmf.ai. Trial customers receive best-effort support with a 24-hour response window. Intelligence and Enterprise trial customers may request a 30-minute onboarding call with the DPSMF team.

Getting Started

System Requirements

What DPSMF needs to connect, collect, and alert correctly.

✅

SQL Server Permissions

VIEW SERVER STATE

Required for DMV access: wait stats, blocking, CPU, memory, connections.

VIEW DATABASE STATE

Required for per-database metrics: index usage, query stats, transaction log.

VIEW ANY DEFINITION

Used for Query Store integration and execution plan analysis. Optional but recommended.

VIEW ANY DATABASE

Required to discover and list all databases on the instance.

💡

DPSMF never requires sysadmin or any write permission. Create a dedicated low-privilege monitoring account.
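
As an illustration of what a dedicated low-privilege account involves, the sketch below assembles a T-SQL script that grants the four permissions listed above. It is not the vendor's official script: the function name, the login name `dpsmf_monitor`, and the database name are hypothetical, and the login is assumed to exist already. The GRANT statements themselves are standard T-SQL.

```python
def build_grant_script(login, databases):
    """Return a T-SQL script granting read-only monitoring permissions.
    Assumes the login already exists on the instance."""
    quoted = "[" + login.replace("]", "]]") + "]"
    lines = [
        "USE [master];",
        f"GRANT VIEW SERVER STATE TO {quoted};",
        f"GRANT VIEW ANY DEFINITION TO {quoted};",  # optional but recommended
        f"GRANT VIEW ANY DATABASE TO {quoted};",
    ]
    for db in databases:
        db_quoted = "[" + db.replace("]", "]]") + "]"
        lines += [
            f"USE {db_quoted};",
            # Fails if the user already exists in this database.
            f"CREATE USER {quoted} FOR LOGIN {quoted};",
            f"GRANT VIEW DATABASE STATE TO {quoted};",
        ]
    return "\n".join(lines)

print(build_grant_script("dpsmf_monitor", ["Sales"]))
```

Review the generated script before running it; a DBA with sufficient rights executes it once, after which the monitoring account itself holds no write permission.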

💻

Supported Versions

Platform	Minimum Version	Notes
SQL Server	2016 (13.x)	Full DMV support from 2016 onwards
Azure SQL Database	Any	Some DMVs restricted; DPSMF adapts automatically
Azure SQL Managed Instance	Any	Full support
Amazon RDS for SQL Server	SQL Server 2016+	Windows Auth not available on RDS

⚠️

SQL Server 2014 and earlier are not supported. Several DMVs required for baseline learning were introduced in 2016.
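A quick way to check an instance against the minimum version is to compare the major version number, since SQL Server 2016 is 13.x. The sketch below assumes a version string in the form returned by `SERVERPROPERTY('ProductVersion')`; the function name is illustrative. Azure SQL Database and Managed Instance use their own version numbering, so the check is only meaningful for SQL Server itself.

```python
MIN_MAJOR_VERSION = 13  # SQL Server 2016 is version 13.x


def is_supported(product_version: str) -> bool:
    """Return True if a version string such as '15.0.2000.5' is 2016 or later."""
    major = int(product_version.split(".")[0])
    return major >= MIN_MAJOR_VERSION


for version in ("12.0.6024.0", "13.0.5026.0", "16.0.1000.6"):
    print(version, "supported" if is_supported(version) else "not supported")
```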

Monitoring

Sentinel Dashboard

Real-time visibility across all monitored SQL Server instances.

🚩

Overview

▾

Sentinel is the primary monitoring view in DPSMF. It shows a live feed of all connected instances, current health scores, active alerts, and real-time metric sparklines — all updated every 10 seconds.

🔌

Instance Tiles

Each connected instance shows its current health state (green / amber / red), active alert count, and CPU/memory trend over the last 60 minutes.

⚡

Live Alert Feed

New alerts appear at the top of the feed in real time. Click any alert to open the full detail view with root cause analysis and recommended action.

📈

Metric Sparklines

CPU, wait time, blocking count, and active connections are charted as 60-minute sparklines directly on each instance tile.

🅾

Health Score

A 0–100 composite health score is computed every 10 minutes from the latest metrics, alert severity, and baseline deviation across 8 dimensions.

⛏

Live Stream (SN1)

▾

The SN1 live feed delivers a sub-second stream of metric events, alert state changes, and anomaly detections via server-sent events. It powers the real-time indicators on Sentinel tiles.

💡

If Sentinel tiles stop updating, check that your browser allows persistent connections to the DPSMF server. Corporate proxies that terminate long-lived HTTP connections will interrupt the live stream. DPSMF attempts to reconnect automatically every 30 seconds.
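For readers who want to see what a server-sent events stream looks like on the wire, here is a minimal parser for the standard SSE text format. It is a sketch only: the event name `metric` and the JSON fields are hypothetical, and the real SN1 payload may differ.

```python
from typing import Iterable, Iterator

RECONNECT_DELAY_SECONDS = 30  # matches the automatic retry described above


def parse_sse(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per server-sent event from an iterable of text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield {"event": event, "data": "\n".join(data)}
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format drops a single leading space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)


# Hypothetical payload: the real SN1 event names and fields may differ.
sample = [
    ": keep-alive",
    "event: metric",
    'data: {"instance": "sql01", "cpu": 42}',
    "",
]
events = list(parse_sse(sample))
print(events)
```

A proxy that closes the connection mid-stream simply ends the iterable; a client would then wait `RECONNECT_DELAY_SECONDS` before opening a new connection.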

🏭

Sentinel Feature Flags

▾

Individual Sentinel features (anomaly overlay, AG health panel, capacity widgets) can be enabled or disabled per-user by an admin under Settings → Sentinel Features. This allows feature rollout to be controlled without a code deployment.

Monitoring

Metrics & Collection

What DPSMF collects, how often, and how baselines are built.

📊

Collected Metrics

▾

Category	Metrics	Type
CPU & Schedulers	SQL CPU %, OS CPU %, scheduler queue length, context switches	CPU
Memory	Buffer pool pages, page life expectancy, memory grants pending, stolen server memory	MEM
Wait Statistics	All 900+ wait types tracked; top 20 surfaced with baseline deviation	WAIT
I/O	Read/write latency per file, stall rate, pending I/O count	I/O
Blocking & Deadlocks	Active blocking chains, deadlock events, lock wait time, lead blocker query	LOCK
Query Performance	Top queries by CPU, duration, reads, writes, executions; plan regressions	QUERY
Index Health	Fragmentation %, missing index recommendations, unused indexes, duplicate indexes	INDEX
Tempdb	Version store size, allocation contention, active tasks per session	MEM
Connections	Active connections, blocked sessions, sleeping sessions, orphaned transactions	CONN

⏳

Collection Intervals

▾

Data Type	Default Interval	Configurable
DMV snapshot (waits, CPU, blocking)	10 seconds	10s – 60s
Query stats (sys.dm_exec_query_stats)	60 seconds	30s – 300s
Index fragmentation scan	Daily (2 AM)	Schedule via Settings
Anomaly model refresh	Every 6 hours	Not configurable
Baseline recalculation	Daily (midnight)	Not configurable

⚠️

The configurable range for the DMV poll interval starts at 10 seconds. Polling more frequently than that can add measurable overhead on high-load instances, so use the default unless instructed by support.
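The configurable ranges in the table above can be expressed as a small validation routine. This is a sketch under assumptions: the keys `dmv_snapshot` and `query_stats` are illustrative names, not DPSMF setting identifiers.

```python
# Allowed ranges in seconds, taken from the Collection Intervals table.
ALLOWED_RANGES = {
    "dmv_snapshot": (10, 60),
    "query_stats": (30, 300),
}
DEFAULTS = {"dmv_snapshot": 10, "query_stats": 60}


def validate_interval(kind: str, seconds: int) -> int:
    """Return the interval if it is within the configurable range, else raise."""
    low, high = ALLOWED_RANGES[kind]
    if not low <= seconds <= high:
        raise ValueError(
            f"{kind} interval must be between {low}s and {high}s, got {seconds}s"
        )
    return seconds


print(validate_interval("dmv_snapshot", DEFAULTS["dmv_snapshot"]))
print(validate_interval("query_stats", 120))
```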

Monitoring

Per-Database Monitoring

Database-level metrics, alerts, and health scoring broken out by individual database.

📄

How Per-DB Monitoring Works

▾

DPSMF collects instance-level metrics by default. Per-database monitoring extends this to track transaction log usage, index health, active query load, and alert thresholds individually per database — so a single busy reporting database does not mask the health of other databases on the same instance.

📄

Log Space Usage

Transaction log utilisation is tracked per database. Alerts fire when log space exceeds configurable thresholds (by default, warning at 75% and critical at 90%).

🔍

Query Attribution

Query-level metrics are attributed to their database context, allowing hotspot analysis per database rather than just per instance.

🚨

Per-DB Alerts

Alert thresholds can be set independently per database. A reporting database can have looser CPU thresholds than an OLTP database on the same host.

📈

Growth Trends

Data and log file growth is tracked over time. Projected full dates are shown when a consistent growth trend is detected.
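The per-database thresholds described above can be pictured as a default pair of limits with optional overrides for individual databases. The sketch below assumes the 75% default is a warning level; the database names and override values are hypothetical.

```python
DEFAULT_THRESHOLDS = {"warning": 75.0, "critical": 90.0}


def log_space_severity(used_pct, thresholds=None):
    """Classify transaction log utilisation for one database."""
    limits = {**DEFAULT_THRESHOLDS, **(thresholds or {})}
    if used_pct > limits["critical"]:
        return "critical"
    if used_pct > limits["warning"]:
        return "warning"
    return "ok"


# Hypothetical overrides: a reporting database tolerates a fuller log.
overrides = {"ReportingDB": {"warning": 85.0, "critical": 95.0}}

for db, used in [("OrdersDB", 80.0), ("ReportingDB", 80.0)]:
    print(db, log_space_severity(used, overrides.get(db)))
```

The same utilisation produces a warning on one database and no alert on the other, which is the point of setting thresholds per database rather than per instance.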

Monitoring

Availability Groups

Monitor Always On AG health, replica synchronisation, and failover readiness.

🔗

AG Health Overview

▾

The AG Monitor tracks all Availability Groups on connected instances. It polls sys.dm_hadr_availability_group_states, sys.dm_hadr_database_replica_states, and sys.dm_hadr_availability_replica_states every 10 seconds.

🔗

Replica State

Primary/secondary role, synchronisation state (SYNCHRONIZED / SYNCHRONIZING / NOT SYNCHRONIZING), and connected/disconnected status.

Live

⌛

Redo & Send Queue

Log send queue and redo queue sizes tracked per replica. Sudden queue growth triggers a warning before data loss risk becomes critical.

🚨

Failover Readiness

Automatic failover eligibility is assessed continuously. An alert fires if an AG with automatic failover configured loses its eligible secondary.

📈

Latency Trending

Commit latency per replica tracked and baselined. Unusual spikes indicate network contention or secondary CPU pressure.

💡

AG monitoring is available on the Intelligence and Enterprise tiers. Connect both the primary and all secondary replicas as separate instances in DPSMF for full cross-replica visibility.
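To make the failover readiness check concrete, the sketch below treats a secondary as eligible when it is configured for automatic failover, connected, and in the SYNCHRONIZED state. The `Replica` class and its field names are illustrative and do not represent DPSMF's internal model.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    role: str            # "PRIMARY" or "SECONDARY"
    failover_mode: str   # "AUTOMATIC" or "MANUAL"
    sync_state: str      # "SYNCHRONIZED", "SYNCHRONIZING", "NOT SYNCHRONIZING"
    connected: bool


def has_eligible_secondary(replicas):
    """True if at least one secondary could take over automatically."""
    return any(
        r.role == "SECONDARY"
        and r.failover_mode == "AUTOMATIC"
        and r.sync_state == "SYNCHRONIZED"
        and r.connected
        for r in replicas
    )


def failover_readiness_alert(replicas):
    """Alert only when automatic failover is configured but not currently possible."""
    primary_auto = any(
        r.role == "PRIMARY" and r.failover_mode == "AUTOMATIC" for r in replicas
    )
    return primary_auto and not has_eligible_secondary(replicas)


ag = [
    Replica("sql01", "PRIMARY", "AUTOMATIC", "SYNCHRONIZED", True),
    Replica("sql02", "SECONDARY", "AUTOMATIC", "SYNCHRONIZING", True),
]
print(failover_readiness_alert(ag))
```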

Alerting

Alert Manager

View, acknowledge, escalate, and resolve alerts across all instances.

🔔

Alert Lifecycle

▾

1
Alert Fires
A metric breaches its 3σ threshold (or a configured static threshold). The alert appears in the live feed with severity: Info, Warning, or Critical.
2
Acknowledge
Click Acknowledge to indicate the alert has been seen. An alert left unacknowledged for 60 minutes escalates to L1 and a further email is sent.
3
L2 Escalation
If still unacknowledged after 120 minutes, the alert escalates to L2 with an urgent email. The alert manager shows an escalation badge.
4
Resolve
Once the underlying condition clears, click Resolve and optionally add a resolution note. Resolved alerts are archived and contribute to the knowledge graph for future pattern matching.
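The escalation timings in the lifecycle above reduce to a simple function of how long an alert has gone unacknowledged. This is a minimal sketch; the function name and return labels are illustrative.

```python
L1_AFTER_MINUTES = 60
L2_AFTER_MINUTES = 120


def escalation_level(minutes_since_fired: float, acknowledged: bool = False) -> str:
    """Return the lifecycle stage for an alert that has not been resolved."""
    if acknowledged:
        return "acknowledged"   # acknowledged alerts stop escalating
    if minutes_since_fired >= L2_AFTER_MINUTES:
        return "L2"
    if minutes_since_fired >= L1_AFTER_MINUTES:
        return "L1"
    return "fired"


for minutes in (5, 75, 130):
    print(minutes, escalation_level(minutes))
```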

🔍

Alert Detail View

▾

Each alert opens a detail view showing the metric timeline, the anomaly score at the time of firing, correlated events within ±30 minutes, and AI-generated root cause analysis with a specific recommended action.

🤖

The Correlated Events panel is the most useful part of the detail view. It shows which other metrics were also deviating at the same time — often pointing directly to the root cause (e.g., a blocking chain that caused a CPU spike 8 minutes later).

Alerting

Alert Policies

Control which metrics alert, at what thresholds, and for which instances.

📋

Creating a Policy

▾

1
Go to Settings → Policies
All active policies are listed here with their scope (instance, database, or global) and current status.
2
Choose Policy Type
Select Baseline (fires when deviation exceeds Nσ), Threshold (fires when metric exceeds a fixed value), or Absence (fires if a metric stops being collected).
3
Set Scope
Scope the policy to a specific instance, a specific database, or apply it globally across all monitored instances.
4
Configure Severity & Suppression
Set the severity (Info / Warning / Critical) and optionally configure a suppression window (e.g., suppress alerts from 01:00 to 05:00 to cover a maintenance window).
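The three policy types and the suppression window can be sketched as follows. The `Policy` class, its field names, and the evaluation order are assumptions made for illustration, not DPSMF's actual policy engine. The suppression check also handles windows that cross midnight.

```python
from dataclasses import dataclass
from datetime import time
from typing import Optional


@dataclass
class Policy:
    kind: str                        # "baseline", "threshold", or "absence"
    limit: float = 0.0               # N sigma for baseline, fixed value for threshold
    suppress_from: Optional[time] = None
    suppress_to: Optional[time] = None


def is_suppressed(policy: Policy, now: time) -> bool:
    start, end = policy.suppress_from, policy.suppress_to
    if start is None or end is None:
        return False
    if start <= end:
        return start <= now < end
    return now >= start or now < end   # window crosses midnight


def should_fire(policy: Policy, now: time, value=None, sigma=None) -> bool:
    """value is the latest metric reading (None if nothing was collected)."""
    if is_suppressed(policy, now):
        return False
    if policy.kind == "baseline":
        return sigma is not None and abs(sigma) > policy.limit
    if policy.kind == "threshold":
        return value is not None and value > policy.limit
    if policy.kind == "absence":
        return value is None
    raise ValueError(f"unknown policy kind: {policy.kind}")


cpu_policy = Policy("threshold", limit=90.0,
                    suppress_from=time(1, 0), suppress_to=time(5, 0))
print(should_fire(cpu_policy, time(3, 0), value=95.0))   # inside the window
print(should_fire(cpu_policy, time(12, 0), value=95.0))  # outside the window
```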

🔒

Policy Check Enforcement

▾

DPSMF enforces license-level policy limits. Monitor tier allows up to 10 active policies per instance. Intelligence and Enterprise tiers are unlimited. Policy checks are evaluated before every alert fires — a suppressed policy will not alert even if the condition is met.

Alerting

Runbooks

Automated response playbooks that attach to alerts and guide remediation.

📚

What Are Runbooks?

▾

A runbook is a structured set of steps that appears automatically when a specific alert type fires. Instead of DBA tribal knowledge living in someone's head, runbooks encode the correct response procedure and make it instantly available to any engineer on call.

💡

Runbooks are most valuable for alerts that fire at 3 AM. Attaching a runbook to your most common Critical alerts means any on-call engineer can follow the correct remediation steps without escalating to a senior DBA.

📝

Creating a Runbook

▾

1
Go to Settings → Runbooks → New Runbook
Give the runbook a name and description. Choose which alert type(s) it attaches to (e.g., all blocking_count critical alerts).
2
Add Steps
Each step has a title, detail text, and an optional verification query — a read-only T-SQL query that the engineer can run to confirm the condition before and after applying a fix.
3
Set Escalation Path
Configure the escalation contact (name, email, phone) shown at the bottom of the runbook view, for cases where the runbook steps don’t resolve the issue.
4
Activate
Toggle the runbook to Active. It will now appear automatically in the alert detail view whenever a matching alert fires.
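The pieces of a runbook described in the steps above can be modelled as a small data structure. The class names, field names, and example content below are illustrative assumptions; the verification query is an ordinary read-only query against `sys.dm_exec_requests`.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    title: str
    detail: str
    verification_query: str = ""   # optional read-only T-SQL


@dataclass
class Runbook:
    name: str
    alert_metric: str              # e.g. "blocking_count"
    severity: str                  # "Info", "Warning", or "Critical"
    steps: list = field(default_factory=list)
    escalation_contact: str = ""
    active: bool = False


def matching_runbooks(runbooks, alert_metric, severity):
    """Return the active runbooks attached to this alert type."""
    return [
        r for r in runbooks
        if r.active and r.alert_metric == alert_metric and r.severity == severity
    ]


blocking = Runbook(
    name="Blocking chain triage",
    alert_metric="blocking_count",
    severity="Critical",
    steps=[Step(
        title="Find the lead blocker",
        detail="List blocked requests and the sessions blocking them.",
        verification_query=(
            "SELECT session_id, blocking_session_id, wait_type, wait_time "
            "FROM sys.dm_exec_requests WHERE blocking_session_id <> 0;"
        ),
    )],
    escalation_contact="Senior DBA on call",
    active=True,
)
print([r.name for r in matching_runbooks([blocking], "blocking_count", "Critical")])
```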

Alerting

Notifications

How DPSMF delivers alert notifications and how to configure channels.

📨

Notification Channels

▾

💌

Email

Alert emails are sent on first fire, on L1 escalation (60 min unacknowledged), and on L2 escalation (120 min). Configure recipients under Settings → Notifications.

📱

In-App Feed

All alerts appear in the Sentinel live feed in real time. No configuration required.

🔌

Webhook (coming soon)

Outbound webhook support for Slack, Teams, and PagerDuty is on the roadmap for the next release.

Soon

📰

Scheduled Reports

Daily and weekly alert summary reports can be emailed to a distribution list. Configure under Settings → Reports.

Intelligence

Anomaly Detection

Statistical anomaly detection across 140+ metrics using learned baselines.

🤖

How It Works

▾

DPSMF’s anomaly engine builds a rolling baseline for every tracked metric using a 7-day sliding window. It computes the mean and standard deviation for each metric at each time-of-day and day-of-week slot, then scores incoming values against the expected distribution.

Score Range	Meaning	Action
0 – 1σ	Normal variation	No alert
1σ – 2σ	Elevated — watch	Logged only
2σ – 3σ	Significant deviation	Info alert (Aggressive mode)
> 3σ	Strong anomaly	Warning or Critical alert
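The scoring described above can be sketched in a few lines. This is a simplified illustration, not the engine's actual model: it assumes hourly slots keyed by day of week and hour, uses the population standard deviation, and treats a slot with no history as normal.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev


def build_baseline(samples):
    """samples: iterable of (timestamp, value). Returns {(weekday, hour): (mean, std)}."""
    slots = defaultdict(list)
    for ts, value in samples:
        slots[(ts.weekday(), ts.hour)].append(value)
    return {slot: (mean(vals), pstdev(vals)) for slot, vals in slots.items()}


def sigma_score(baseline, ts, value):
    """Distance of a value from its slot mean, in standard deviations."""
    slot = (ts.weekday(), ts.hour)
    if slot not in baseline:
        return 0.0                      # no history yet: treat as normal
    mu, sd = baseline[slot]
    if sd == 0:
        return 0.0 if value == mu else float("inf")
    return abs(value - mu) / sd


def classify(score):
    """Map a sigma score onto the bands in the table above."""
    if score <= 1:
        return "normal"
    if score <= 2:
        return "elevated"
    if score <= 3:
        return "significant"
    return "strong anomaly"


# Three CPU readings from a Monday 09:00 slot, then a reading one week later.
history = [
    (datetime(2024, 1, 1, 9, 0), 40.0),
    (datetime(2024, 1, 1, 9, 20), 50.0),
    (datetime(2024, 1, 1, 9, 40), 60.0),
]
baseline = build_baseline(history)
print(classify(sigma_score(baseline, datetime(2024, 1, 8, 9, 30), 80.0)))
```

Keying the baseline by time slot is what lets a value that is normal during a nightly batch window still stand out at mid-morning.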

⚙️

Sensitivity Modes

▾

Mode	Alert Threshold	Best For
Conservative	5σ	High-noise environments; batch/ETL servers
Default	3σ	Most OLTP and mixed-workload environments
Aggressive	2σ	Critical OLTP servers where early warning is paramount

⚠️

Aggressive mode significantly increases alert volume during the first 30 days while baselines are still maturing. Recommended only after at least 14 days of data collection.
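The effect of each sensitivity mode on alert volume is easiest to see by applying the three thresholds to the same set of deviations. The sigma scores in the example are invented for illustration.

```python
# Alert thresholds in standard deviations, from the Sensitivity Modes table.
SENSITIVITY_THRESHOLDS = {"conservative": 5.0, "default": 3.0, "aggressive": 2.0}


def alerts_in_mode(sigma: float, mode: str = "default") -> bool:
    """True if a deviation of this size would alert in the given mode."""
    return sigma > SENSITIVITY_THRESHOLDS[mode]


# Hypothetical deviations observed over one day on a single metric.
observed = [0.4, 1.8, 2.3, 2.9, 3.4, 5.6]
for mode in SENSITIVITY_THRESHOLDS:
    count = sum(alerts_in_mode(s, mode) for s in observed)
    print(f"{mode}: {count} alerts")
```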

🔗

Event Correlation

▾

When an anomaly fires, DPSMF searches the previous 30 minutes for other anomalies on the same instance. Correlated events are ranked by temporal proximity and knowledge graph relationship strength, then presented in the alert detail view as likely contributing causes.
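One way to picture the ranking is a score that blends how recently the earlier anomaly fired with how strongly the knowledge graph links the two metrics. The equal 0.5 weights, the `Anomaly` class, and the metric names below are assumptions for illustration; the guide does not specify the actual formula.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Anomaly:
    instance: str
    metric: str
    fired_at: datetime


WINDOW = timedelta(minutes=30)


def correlated(trigger, history, edge_strength):
    """Rank earlier anomalies on the same instance within the 30-minute window.

    edge_strength maps (earlier metric, trigger metric) to a value from 0 to 1.
    """
    scored = []
    for a in history:
        gap = trigger.fired_at - a.fired_at
        if a is trigger or a.instance != trigger.instance:
            continue
        if not timedelta(0) <= gap <= WINDOW:
            continue
        proximity = 1 - gap / WINDOW            # 1.0 = same moment, 0.0 = 30 min ago
        strength = edge_strength.get((a.metric, trigger.metric), 0.0)
        scored.append((0.5 * proximity + 0.5 * strength, a))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [a for _, a in scored]


t = datetime(2024, 1, 8, 12, 0)
trigger = Anomaly("sql01", "cpu_pct", t)
history = [
    Anomaly("sql01", "blocking_count", t - timedelta(minutes=8)),
    Anomaly("sql01", "io_latency", t - timedelta(minutes=2)),
    Anomaly("sql02", "blocking_count", t - timedelta(minutes=1)),   # other instance
    Anomaly("sql01", "memory_grants", t - timedelta(minutes=45)),   # outside window
]
strength = {("blocking_count", "cpu_pct"): 0.9}
print([a.metric for a in correlated(trigger, history, strength)])
```

In this example the blocking anomaly ranks first even though the I/O anomaly is more recent, because the graph already holds a strong relationship between blocking and CPU.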

Intelligence

Knowledge Graph

A self-learning graph that links SQL Server events, queries, and resources into a searchable intelligence layer.

🧠

What Is the Knowledge Graph?

▾

The knowledge graph is DPSMF’s long-term memory. Every query hash, wait type, metric event, and alert is stored as a node. Observed relationships between them — temporal co-occurrence, causal linkage, query-to-wait attribution — are stored as edges. A taxonomy layer classifies nodes into categories (workload, resource, maintenance, replication, etc.).

💡

The knowledge graph is the primary reason DPSMF’s false positive rate decreases over time. After 30 days, DPSMF knows that query X always causes wait type Y on your instance — and stops alerting on Y when X is the known cause.
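The suppression behaviour described above can be illustrated with a toy graph in which each observed co-occurrence adds evidence to an edge. The class, the node naming scheme, and the `min_evidence` cut-off are hypothetical and much simpler than a production knowledge graph.

```python
from collections import defaultdict


class KnowledgeGraph:
    def __init__(self):
        # source node -> {target node: number of times the link was observed}
        self.edges = defaultdict(dict)

    def observe(self, cause: str, effect: str) -> None:
        """Record one more observation of cause being followed by effect."""
        self.edges[cause][effect] = self.edges[cause].get(effect, 0) + 1

    def known_cause(self, effect: str, active_nodes, min_evidence: int = 5):
        """Return an active node with enough evidence of causing effect, if any."""
        for node in active_nodes:
            if self.edges.get(node, {}).get(effect, 0) >= min_evidence:
                return node
        return None


kg = KnowledgeGraph()
for _ in range(5):
    kg.observe("query:0xA1B2", "wait:CXPACKET")

# The wait spikes while the query is running: a known pattern, so no alert.
cause = kg.known_cause("wait:CXPACKET", active_nodes=["query:0xA1B2"])
print("suppress" if cause else "alert", cause)
```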

🔍

Exploring the Graph

▾

Open Knowledge Graph → Explorer (/kg-explorer.html) to browse nodes and edges interactively. Search by query hash, wait type name, metric name, or alert ID. Click any node to see its connected edges and the evidence strength for each relationship.

Intelligence

Capacity Planner

Forecast when your SQL Server will run out of CPU, memory, storage, or connections.

📈

How Capacity Forecasting Works

▾

DPSMF fits a linear trend model to each capacity metric (CPU, buffer pool, data file size, log file size, connection count) using the last 30 days of collected data. It extrapolates forward and identifies the projected date when each metric will breach its defined capacity threshold.

📈

Projected Full Date

For storage metrics, DPSMF shows the date when the data or log file is projected to fill the available disk space at the current growth rate.

⚠️

Warning Horizon

An amber warning fires when projected full is within 30 days. A red critical alert fires when within 7 days. Both are configurable.

📊

Trend Confidence

Each forecast shows an R² confidence score. Low confidence (below 0.5) means the growth pattern is irregular — treat the projection as indicative, not precise.

📰

Capacity Reports

A weekly capacity summary email can be configured under Settings → Reports, listing all instances with projected constraint dates.
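The forecasting approach described above amounts to an ordinary least-squares line, an extrapolation to the capacity limit, and an R² value for the fit. The sketch below uses invented daily file sizes and the default 30-day and 7-day horizons; the function names are illustrative.

```python
def linear_fit(xs, ys):
    """Least-squares line through (xs, ys). Returns (slope, intercept, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_tot = sum((y - my) ** 2 for y in ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1 - ss_res / ss_tot if ss_tot else 1.0
    return slope, intercept, r_squared


def days_until_full(xs, ys, capacity):
    """Days from the last sample until the trend line reaches capacity."""
    slope, intercept, r_squared = linear_fit(xs, ys)
    if slope <= 0:
        return None, r_squared          # flat or shrinking: no projected date
    day_full = (capacity - intercept) / slope
    return max(0.0, day_full - xs[-1]), r_squared


def horizon_severity(days_left):
    if days_left is None or days_left > 30:
        return "ok"
    return "critical" if days_left <= 7 else "warning"


# Invented example: a data file growing 2 GB per day from 100 GB on a 200 GB disk.
days = list(range(30))
size_gb = [100 + 2 * d for d in days]
left, r2 = days_until_full(days, size_gb, capacity=200)
print(round(left, 1), round(r2, 2), horizon_severity(left))
```

An R² below 0.5 would suggest the straight line is a poor description of the growth, in which case the projected date is indicative only.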

Intelligence

Scorecard

A composite health score across 8 dimensions, updated every 10 minutes.

🏆

Scorecard Dimensions

▾

Dimension	Weight	What It Measures
Query Health	25%	Top query baseline deviation, plan regression count, missing index severity
Wait Profile	20%	Dominant wait types vs. baseline, unusual wait pattern frequency
Blocking	15%	Active blocking chain count, max block duration, deadlock rate
I/O Performance	15%	Read/write latency vs. baseline, stall rate
Memory Pressure	10%	Page life expectancy trend, memory grants pending, buffer pool pressure
CPU Utilisation	5%	Average and peak CPU vs. baseline, scheduler queue
Index Health	5%	Fragmentation level, unused index count, missing index impact score
Connectivity	5%	Connection count trend, orphaned transactions, sleeping session count

💡

A score of 80+ is healthy. 60–79 indicates monitored issues that are not yet critical. Below 60 indicates one or more significant problems that should be investigated.
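The weights in the table sum to 100%, so the composite can be read as a weighted average. The sketch below assumes each dimension is itself scored from 0 to 100, which the guide does not state explicitly; the example scores are invented.

```python
# Weights from the Scorecard Dimensions table.
WEIGHTS = {
    "Query Health": 0.25,
    "Wait Profile": 0.20,
    "Blocking": 0.15,
    "I/O Performance": 0.15,
    "Memory Pressure": 0.10,
    "CPU Utilisation": 0.05,
    "Index Health": 0.05,
    "Connectivity": 0.05,
}


def composite_score(dimension_scores):
    """Weighted average of per-dimension scores (each assumed to be 0-100)."""
    return sum(WEIGHTS[name] * dimension_scores[name] for name in WEIGHTS)


def band(score):
    if score >= 80:
        return "healthy"
    if score >= 60:
        return "monitored issues"
    return "investigate"


# Invented example: everything perfect except query health.
scores = {name: 100.0 for name in WEIGHTS}
scores["Query Health"] = 40.0
total = composite_score(scores)
print(round(total, 1), band(total))
```

Because Query Health carries a quarter of the weight, a poor score there moves the composite far more than the same drop in Index Health or Connectivity would.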

Administration

Instance Management

Add, configure, and remove monitored SQL Server instances.

🔌

Adding an Instance

▾

1
Go to Settings → Instances → Add Instance
Enter a friendly name, hostname or IP address, and port. The default SQL Server port is 1433.
2
Enter Credentials
Choose SQL Auth (username + password) or Windows Auth. Credentials are stored encrypted using AES-256-GCM. They are never displayed in plaintext after saving.
3
Test & Save
Click Test Connection to verify. A successful test confirms connectivity and permission adequacy. Click Save — collection begins immediately.

⚠️

Your license determines the maximum number of instances you can connect. Attempting to add an instance beyond your limit will show a license upgrade prompt.

⚙️

Instance Settings

▾

Setting	Default	Description
Poll Interval	10s	How often DMVs are polled. Lower = more granular but higher overhead.
Alert Sensitivity	Default (3σ)	Anomaly threshold. Conservative / Default / Aggressive.
Baseline Window	7 days	Rolling window used to compute the statistical baseline.
Maintenance Window	None	Time range when alerts are suppressed (e.g., nightly index rebuilds).
Tags	—	Free-text tags for grouping instances (e.g., Production, Staging, Region).

Administration

User Management

Create, manage, and reset passwords for DPSMF users.

👤

Roles

▾

Role	Capabilities
dba	Full access: acknowledge/resolve alerts, manage instances, runbooks, policies, settings
viewer	Read-only: view dashboards, alerts, metrics, and scorecard — no write actions
readonly	Demo-level: GET requests only — suitable for stakeholder dashboards

📝

Creating a User

▾

1
Go to Settings → Users → Add User
Enter a username, email address, and temporary password (minimum 8 characters).
2
Assign a Role
Choose dba, viewer, or readonly. Role controls what actions the user can perform across all DPSMF screens.
3
Send Credentials
Share the temporary password with the user. They can change it after first login via Profile → Change Password.

🔒

Resetting a Password

▾

Go to Settings → Users, find the user, and click Reset Password. Enter a new temporary password (min 8 characters). The user will be required to change it on next login.

Administration

Licensing

Manage your DPSMF license, view usage, and understand tier limits.

🏭

License Tiers

▾

Tier	Instances	Anomaly	AG Monitor	Capacity	Policies
Monitor	3	✓	✗	✗	10/instance
Intelligence	10	✓	✓	✓	Unlimited
Enterprise	Unlimited	✓	✓	✓	Unlimited
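The tier table can be read as a lookup of limits and feature switches. The dictionary layout and function name below are illustrative, with `None` standing for "Unlimited".

```python
# Limits and features from the License Tiers table; None means unlimited.
TIERS = {
    "Monitor": {"instances": 3, "ag_monitor": False, "capacity": False,
                "policies_per_instance": 10},
    "Intelligence": {"instances": 10, "ag_monitor": True, "capacity": True,
                     "policies_per_instance": None},
    "Enterprise": {"instances": None, "ag_monitor": True, "capacity": True,
                   "policies_per_instance": None},
}


def can_add_instance(tier: str, connected: int) -> bool:
    """True if one more instance fits within the tier's instance limit."""
    limit = TIERS[tier]["instances"]
    return limit is None or connected < limit


print(can_add_instance("Monitor", 3))        # at the limit: upgrade prompt
print(can_add_instance("Intelligence", 3))
print(can_add_instance("Enterprise", 250))
```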

📋

Activating a License Key

▾

Go to Settings → License → Activate and enter your license key. Keys take the format DPSMF-XXXXXX-XXXXXX. After activation, the license manager shows the tier, expiry date, instance count, and days remaining.
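A quick format check can catch a mistyped key before it is submitted. The pattern below assumes each X stands for an uppercase letter or digit, which the guide does not specify; it checks the shape only and says nothing about whether a key is valid. The sample keys are made up.

```python
import re

# Assumption: each X in DPSMF-XXXXXX-XXXXXX is an uppercase letter or a digit.
KEY_PATTERN = re.compile(r"DPSMF-[A-Z0-9]{6}-[A-Z0-9]{6}")


def looks_like_license_key(key: str) -> bool:
    """Check the shape of a key; this does not validate it against a license server."""
    return KEY_PATTERN.fullmatch(key.strip()) is not None


for candidate in ("DPSMF-A1B2C3-D4E5F6", "DPSMF-A1B2C3", "dpsmf-a1b2c3-d4e5f6"):
    print(candidate, looks_like_license_key(candidate))
```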

⚠️

Trial licenses expire automatically. If your trial expires while instances are connected, monitoring continues in read-only mode until a paid license is activated. No historical data is lost.

Administration

Settings

Global configuration for alerting, collection, notifications, and maintenance.

⚙️

Settings Reference

▾

Section	Key Settings
General	Default alert sensitivity, timezone, date format, dashboard refresh rate
Instances	Add / edit / remove monitored instances; per-instance poll interval and maintenance windows
Notifications	Alert email recipients, escalation contacts (L1/L2), email server configuration
Policies	Create and manage alert policies; set suppression windows
Runbooks	Create and manage remediation runbooks attached to alert types
Users	Add users, assign roles, reset passwords
License	View current license, activate new keys, check usage vs. limits
Reports	Configure scheduled daily/weekly email reports
Sentinel Features	Enable/disable individual dashboard feature panels per user

Help

FAQ

Common questions and troubleshooting.

❓

Frequently Asked Questions

▾

Why is DPSMF not alerting on anything?

DPSMF observes silently for the first 7 days to build baselines. No alerts fire during this learning period. Go to Settings → Instances to confirm the connection is active and check the baseline start date.

What SQL Server permissions does DPSMF need?

VIEW SERVER STATE and VIEW DATABASE STATE are required. DPSMF is read-only and never writes to your SQL Server. For Query Store integration, VIEW ANY DEFINITION is also needed. See the System Requirements section for the full list.

Why does the same alert fire every day at the same time?

This typically means a recurring job or report runs at that time. After 7–14 days DPSMF will learn the pattern and stop alerting. If it persists, open the alert detail, find the responsible query, and use Mark as Expected to suppress it.

Can DPSMF monitor Azure SQL or Amazon RDS?

Yes. DPSMF connects via standard TDS and supports SQL Server 2016+, Azure SQL Database, Azure SQL Managed Instance, and Amazon RDS for SQL Server. Some DMVs are restricted in Azure SQL — DPSMF adapts its collection strategy automatically.

My Sentinel tiles stopped updating. What do I do?

The live feed uses a persistent server-sent events connection. Corporate proxies that terminate long connections will interrupt it. DPSMF retries automatically every 30 seconds. If tiles remain stale, reload the page. If the issue persists, check that your proxy allows long-lived HTTP connections to the DPSMF server.

How do I suppress alerts during a maintenance window?

Go to Settings → Instances, select the instance, and set a Maintenance Window (e.g., 01:00–05:00 daily). Alerts will not fire during that window. Alternatively, create an alert policy with a suppression schedule under Settings → Policies.

What happens when my trial expires?

Monitoring continues in read-only mode. Historical data and all settings are preserved. You can activate a paid license key at any time to restore full functionality.

How do I add more monitored instances beyond my current limit?

You need to upgrade to a higher tier. Go to Settings → License and click Upgrade, or contact sales@answerpoint.com. Your existing instances and historical data are unaffected by a tier upgrade.

Is collected data stored on my SQL Server?

No. DPSMF stores all collected metrics, baselines, and alert history in its own database, separate from the instances it monitors. Nothing is written to your SQL Server instances.