In this final post about xEvent based query monitoring data collection, I’ll cover the following:
- Running daily pre-aggregations of query stats by query hash and purging aging raw xEvent data
- Some useful sample queries for making sense of all the data you are now collecting.
Once you have a solution like this in place and large datasets are growing larger on your central database, you need to consider a data retention process. Obviously you can’t keep a record of every execution of every SQL statement in perpetuity, but you do want to strike a balance: high granularity in the short term, lower granularity over a longer period of time. The way I do this is to have a couple of extra tables that store daily aggregations of stats (by database and by query hash). Basically, I run a job every day to calculate the aggregates by database and by query hash (individual statements). I added some basic data warehousing fields (day/week/month/year/etc.), and I am aggregating execution count, CPU, reads, and duration. The table diagram below shows this:
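For the purge side of the retention process, a minimal sketch looks like the following. It assumes the raw table XEvents_Queries from the collection process and uses an example 30-day window; the retention period and batch size are arbitrary choices you should adjust to your own volume.

```sql
-- Purge raw xEvent rows older than the retention window (30 days here,
-- as an example), deleting in batches to keep transaction log growth
-- manageable on a busy central database.
declare @Cutoff date = dateadd(day, -30, getdate());

while 1 = 1
begin
    delete top (50000)
    from XEvents_Queries
    where [timestamp] < @Cutoff;

    -- stop once a pass deletes nothing
    if @@rowcount = 0 break;
end
```

Run this after the daily aggregation job so the detail rows have already been rolled up before they are removed.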
The queries to generate rows for these tables are straightforward. For example:
declare @StartDate date = getdate()-1
declare @EndDate date
select @EndDate = dateadd(day,1,@StartDate)
-- delete any data you already have for yesterday
-- Run the group by query against the raw table XEvents_Queries
-- and insert rows to the pre-aggregate table [XEvents_Queries_ByDBAggregate]
INSERT INTO [dbo].[XEvents_Queries_ByDBAggregate]
([dbserverid]
,[servername]
,[rowyear]
,[rowmonth]
,[rowDay]
,[rowweek]
,[rowdayofweek]
,[databasename]
,[executioncount]
,[TotalReads]
,[AvgReads]
,[MaxReads]
,[TotalCPU]
,[AvgCPU]
,[MaxCPU]
,[TotalDuration]
,[AvgDuration]
,[MaxDuration])
Select b.dbserverid, a.servername, datepart(year,a.timestamp) rowyear,datepart(month,a.timestamp) rowmonth,datepart(day,a.timestamp) rowDay,
datepart(week,a.timestamp) rowweek,datepart(weekday,a.timestamp) rowdayofweek,a.databasename, count(*) as executioncount,
sum(a.logical_reads) as TotalReads,
Avg(a.logical_reads) as AvgReads,
Max(a.logical_reads) as MaxReads,
sum(cpu_time_ms) as TotalCPU,
avg(cpu_time_ms) as AvgCPU,
max(cpu_time_ms) as MaxCPU,
sum(duration_ms) as TotalDuration,
avg(duration_ms) as AvgDuration,
max(duration_ms) as MaxDuration
from XEvents_Queries a
inner join DB_ServerMain b
on a.ServerName = b.ServerName
WHERE a.timestamp >= @StartDate
and a.timestamp < @EndDate
group by b.dbserverid, a.servername, datepart(year,a.timestamp) ,datepart(month,a.timestamp) ,datepart(day,a.timestamp) ,
datepart(week,a.timestamp) ,datepart(weekday,a.timestamp) ,a.databasename
The code above sets two variables (notice they are date, not datetime) for yesterday and today, and then generates the aggregate data (by databasename in the example above). As designed, you can put this logic into a SQL Agent job and run it at 12:01 AM each day to generate the aggregates for the previous day. The whole script to populate both the database-level and query-hash-level aggregate tables can be found here: aggregatesonceaday.sql. This script will create the aggregate tables: preaggregatetablecreates.sql
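The inline comment in the script above mentions deleting any data you already have for yesterday before inserting. The linked aggregatesonceaday.sql presumably contains the author's version of that step; a sketch of it, using only the columns shown in the INSERT above, would be:

```sql
-- Make the daily job safely re-runnable: clear any rows already
-- aggregated for yesterday before the INSERT repopulates them.
declare @StartDate date = getdate() - 1;

delete from [dbo].[XEvents_Queries_ByDBAggregate]
where rowyear  = datepart(year,  @StartDate)
  and rowmonth = datepart(month, @StartDate)
  and rowDay   = datepart(day,   @StartDate);
```

With this in place, a failed or duplicate run of the SQL Agent job cannot double-count yesterday's executions.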
Once you have these in place, you can easily build trend analysis and troubleshooting queries like this:
-- return the queries by database and week, ranked by highest total reads
select
qs.servername
as instancename,
qs.databasename,
query_hash,
isnull(objectname,'') as objectname,
isnull(objectname,'') + ' - ' + sql as sql,
dateadd(week, rowweekofyear, dateadd(year, rowyear-1900, 0)) - 4 -
datepart(dw, dateadd(week, rowweekofyear, dateadd(year, rowyear-1900, 0)) - 4) + 1 as WeekStart,
sum(executioncount) executioncount,
sum(TotalReads) TotalReads,
avg(AvgReads) AvgReads,
max(MaxReads) MaxReads,
sum(TotalCPU) TotalCPU,
avg(AvgCPU) AvgCPU,
max(MaxCPU) MaxCPU,
sum(TotalDuration) TotalDuration,
avg(AvgDuration) AvgDuration,
max(MaxDuration) MaxDuration
,RANK() over (PARTITION by dateadd(week, rowweekofyear, dateadd(year, rowyear-1900, 0)) - 4 -
datepart(dw, dateadd(week, rowweekofyear, dateadd(year, rowyear-1900, 0)) - 4) + 1, DatabaseName
order by sum(TotalReads) desc) as Rank
from XEvents_Queries_ByDBByQueryHashAggregate qs
WHERE dateadd(week,rowweekofyear,'1/1/'+convert(varchar(10),rowyear)) > getdate()-35 -- for the last 5 weeks
AND qs.servername = '<servernamehere>'
group by
qs.servername,
databasename,
query_hash,
isnull(objectname,''),
isnull(objectname,'') + ' - ' + sql,
dateadd(week, rowweekofyear, dateadd(year, rowyear-1900, 0)) - 4 - datepart(dw, dateadd(week, rowweekofyear, dateadd(year, rowyear-1900, 0)) - 4) + 1
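The WeekStart expression in that query is dense, so here is a sketch of the same calculation broken into steps (the example values for rowyear and rowweek are arbitrary, and the result of the final snap-to-week-start step depends on your DATEFIRST setting):

```sql
-- Decomposing the WeekStart calculation for example values:
declare @rowyear int = 2016, @rowweek int = 10;
declare @approx datetime;

-- 1. dateadd(year, @rowyear - 1900, 0) gives January 1 of @rowyear,
--    because 0 is the datetime base date 1900-01-01.
-- 2. Add @rowweek weeks, then back up 4 days to land inside the target week.
set @approx = dateadd(week, @rowweek, dateadd(year, @rowyear - 1900, 0)) - 4;

-- 3. Subtract the weekday number and add 1 to snap to the first day
--    of that week.
select @approx - datepart(dw, @approx) + 1 as WeekStart;
```

This is exactly the inline expression from the query, just with the intermediate date pulled into a variable so each step is visible.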
And this one returns the aggregates by database only (so you can get a basic “how active is this database in general?” type metric):
-- return the query metrics by database only
select dbserverid,
servername,
databasename,
dateadd(week, rowweek, dateadd(year, rowyear-1900, 0)) - 4 -
datepart(dw, dateadd(week, rowweek, dateadd(year, rowyear-1900, 0)) - 4) + 1 as StartDate,
sum(executioncount) executioncount,
sum(TotalReads) TotalReads,
avg(AvgReads) AvgReads,
max(MaxReads) MaxReads,
sum(TotalCPU) TotalCPU,
avg(AvgCPU) AvgCPU,
max(MaxCPU) MaxCPU,
sum(TotalDuration) TotalDuration,
avg(AvgDuration) AvgDuration,
max(MaxDuration) MaxDuration
from emdw..XEvents_Queries_ByDBAggregate
WHERE dateadd(week,rowweek,'1/1/'+convert(varchar(10),rowyear)) > getdate()-35
AND servername = '<servernamehere>'
group by
dbserverid,
servername,
databasename,
dateadd(week, rowweek, dateadd(year, rowyear-1900, 0)) - 4 -
datepart(dw, dateadd(week, rowweek, dateadd(year, rowyear-1900, 0)) - 4) + 1
order by dateadd(week, rowweek, dateadd(year, rowyear-1900, 0)) - 4 -
datepart(dw, dateadd(week, rowweek, dateadd(year, rowyear-1900, 0)) - 4) + 1 desc
This should get you going. Please post comments with any questions or requests for clarification.