本篇文章將為大家詳細(xì)介紹一種通過電子郵件技術(shù)發(fā)出的警報(bào),可以應(yīng)用在長時(shí)間運(yùn)行的 sql 代理作業(yè)時(shí)收到通知。下面是詳細(xì)請內(nèi)容,希望能夠?qū)Υ蠹矣兴鶐椭?/p>
問題
當(dāng)你必須在繁忙的生產(chǎn)服務(wù)器上管理數(shù)百個(gè)計(jì)劃作業(yè)時(shí),不可避免地會(huì)出現(xiàn)作業(yè)需要很長時(shí)間才能完成的情況,從而導(dǎo)致大量等待或影響其他進(jìn)程的性能。在深入調(diào)查性能下降的原因之前,我們想知道一項(xiàng)工作何時(shí)開始花費(fèi)太長時(shí)間。有些工作不僅需要很長時(shí)間才能完成,而且它們可能需要比通常使用的時(shí)間更長的時(shí)間才能完成。
哪些工作的表現(xiàn)正在倒退?緩慢的工作現(xiàn)在是否一直成為問題,還是一次性發(fā)生?當(dāng)需要調(diào)查一項(xiàng)工作時(shí),通常在時(shí)間限制內(nèi),運(yùn)行時(shí)長是我們分析中首先要考慮的指標(biāo)。
為了將當(dāng)前運(yùn)行時(shí)間與同一作業(yè)之前執(zhí)行的持續(xù)時(shí)間進(jìn)行比較,我們通常會(huì)查看作業(yè)的前一個(gè)運(yùn)行時(shí)間的歷史記錄,并看到當(dāng)前運(yùn)行時(shí)間比之前的幾個(gè)持續(xù)時(shí)間長。然而,如果你以 DBA、開發(fā)人員或 DevOps 工程師的身份管理 500 份工作,一次調(diào)查一項(xiàng)工作的任務(wù)可能需要半天時(shí)間,但仍然無法讓你更接近結(jié)論。
你可能需要檢查 20 個(gè)作業(yè),你不想一直單獨(dú)照看每份工作。在這種情況下,發(fā)送到你的郵箱的自動(dòng)警報(bào)會(huì)派上用場。本文中的解決方案是創(chuàng)建一個(gè)將接受參數(shù)的存儲(chǔ)過程。此參數(shù)是用于計(jì)算過去作業(yè)平均持續(xù)時(shí)間的天數(shù)。該過程會(huì)生成一份報(bào)告,顯示當(dāng)前正在運(yùn)行的作業(yè)的性能正在下降(退化)以及可選的電子郵件警報(bào)。
理解這種方法的某些步驟以及我們已經(jīng)在生產(chǎn)中使用的方便的存儲(chǔ)過程如下所示(在運(yùn)行受監(jiān)控作業(yè)的同一服務(wù)器上安排為單獨(dú)的作業(yè)(有關(guān)此類作業(yè)的腳本,請參閱“附錄”) ) 以幫助您收到有關(guān)在輪詢時(shí)正在退化的一個(gè)或多個(gè)作業(yè)的電子郵件警報(bào)。我們?asp_long_running_Regressing_Jobs_Alerts
?每分鐘在我們的環(huán)境中執(zhí)行 ?Stored Proc
?, ,因?yàn)槲覀兊淖鳂I(yè)持續(xù)時(shí)間從幾秒到幾小時(shí)不等。在給予之前SP DDL,SP 的某些組件被分解為先決條件(小尺寸代碼片段),有助于全面了解警報(bào) SP。
所需工具列表:?SQL Server
?(以下代碼在 2012 及更高版本上測試)
這里的目標(biāo)是將每個(gè)當(dāng)前正在運(yùn)行的作業(yè)的持續(xù)時(shí)間與給定時(shí)間段內(nèi)同一作業(yè)的所有運(yùn)行時(shí)間的平均持續(xù)時(shí)間進(jìn)行比較。
注意:在繼續(xù)編譯下面給出的存儲(chǔ)過程之前,請確保你在要分析計(jì)劃作業(yè)的環(huán)境中具有高訪問權(quán)限。理想情況下,管理員。要檢查你的訪問級(jí)別(如果你不是該服務(wù)器的管理員,請運(yùn)行此或類似的 T-SQL 語句。如果你沒有此查詢的結(jié)果,則你沒有對 MSDB 表的選擇權(quán)限。
USE MSDB
GO
SELECT HAS_PERMS_BY_NAME(QUOTENAME(SCHEMA_NAME(schema_id)) + '.' +
QUOTENAME(name), 'OBJECT', 'SELECT') AS have_select, *
FROM sys.tables
GO
圖 1:結(jié)果集顯示你擁有 SELECT 權(quán)限的表。
解決方案
下面存儲(chǔ)過程的 T-SQL DDL 代碼可以在您選擇的任何數(shù)據(jù)庫中編譯。我們使用專用?DBA_db
?于此類管理 SP。
USE [DBA_db]
GO
-- <begin stored procedure DDL/>
CREATE PROCEDURE asp_long_running_Regressing_Jobs_Alerts
@history_days int = 7,
@avg_duration_multiplier float = 1.5,
@bEmail bit = 0,
@bSaveToTable bit = 0,
@RecipientsList Varchar(1000) = 'myName@myCoDomain.com',
@ignore_zero_durations bit = 0
AS
/* example of usage:
exec DBA_db..asp_long_running_Regressing_Jobs_Alerts
@history_days = 45,
@avg_duration_multiplier = 2,
@bEmail = 0,
@bSaveToTable = 0,
@RecipientsList = 'myName@myCoDomain.com;' ,
@ignore_zero_durations = 1
AUTHOR(s):
Vladimir Isaev;
-- + V.B., S.L;
-- contact@sqlexperts.org
*/
/*PARAMETERS:
@history_days int (how many days back we use for AVF run duration)
@avg_duration_multiplier (how many times longer than AVG will qualify job
for producing an alert)
@bEmail (send out Alert Email or just print the msg about Regressing jobs)
-- 'REGRESSION' is defined here by Duration only
*/
SET NOCOUNT ON
BEGIN
select sj.name,
sja.start_execution_date,
sja.stop_execution_date,
ajt.min_run_duration,
ajt.max_run_duration,
ajt.avg_run_duration,
datediff(ss, start_execution_date, getdate()) as cur_run_duration
into #Regressing_Jobs
from msdb..sysjobactivity sja
left join
(select job_id,
avg(dbo.udf_convert_int_time2ss(run_duration)) as avg_run_duration,
min(dbo.udf_convert_int_time2ss(run_duration)) as min_run_duration,
max(dbo.udf_convert_int_time2ss(run_duration)) as max_run_duration
from msdb..sysjobhistory
where step_id=0
and run_date >CONVERT(varchar(8),GETDATE() - @history_days,112)
and ((run_duration <> 0 or @ignore_zero_durations = 0))
and run_duration < 240000
group by job_id
)ajt on sja.job_id=ajt.job_id
join msdb..sysjobs sj on sj.job_id=sja.job_id
where
sja.session_id = (SELECT TOP 1 session_id
FROM msdb.dbo.syssessions
ORDER BY agent_start_date DESC)
AND start_execution_date is not null
and stop_execution_date is null
and datediff(ss, start_execution_date, getdate()) >
ajt.avg_run_duration * @avg_duration_multiplier
select name as JobName,
start_execution_date,
stop_execution_date,
dateadd(second, min_run_duration, 0) as min_run_duration,
dateadd(second, max_run_duration, 0) as max_run_duration,
dateadd(second, avg_run_duration, 0) as avg_run_duration,
dateadd(second, cur_run_duration, 0) as cur_run_duration
into #Regressing_Jobs_DurAsDate
from #Regressing_Jobs
-- waitfor delay '00:00:10'
declare @sHtml varchar(max) = ''
declare @tableHTML nvarchar(max) =
N'<H1>Job(s) taking longer than recent baseline duration
(in descending avg duration order):</H1>' + Char(13)
+ N' <table border="1">' + Char(13)
+ N' <tr bgcolor="#ddd">' + Char(13)
+ N' <th>Start Time</th>' + Char(13)
+ N' <th>Job Name</th>' + Char(13)
+ N' <th>Host Name</th>' + Char(13)
+ N' <th>History Days</th>' + Char(13)
+ N' <th>Avg Dur Mul</th>' + Char(13)
+ N' <th>Min Dur</th>' + Char(13)
+ N' <th>Max Dur</th>' + Char(13)
+ N' <th>Avg Dur</th>' + Char(13)
+ N' <th>Cur Dur</th>' + Char(13)
+ N' </tr>' + Char(13)
select @tableHTML = @tableHTML
+ FORMATMESSAGE(
'<tr><td>%s</td>' _
+ Char(13) --start_execution_date
+ '<td>%s</td>' + Char(13) --name
+ '<td>%s</td>' + Char(13) --@@SERVERNAME
+ '<td style="text-align:center">%i</td>' _
+ Char(13) --@history_days
+ '<td style="text-align:center">%s</td>'
+ Char(13) --@avg_duration_multiplier
+ '<td>%s</td>' + Char(13) --Min Dur
+ '<td>%s</td>' + Char(13) --Max Dur
+ '<td>%s</td>' + Char(13) --Avg Dur
+ '<td>%s</td>' + Char(13),--Cur Dur
convert(varchar, start_execution_date, 120),
JobName,
@@SERVERNAME,
@history_days,
convert(varchar, @avg_duration_multiplier),
format(min_run_duration, N'HH\hmm\mss\s'),
format(max_run_duration, N'HH\hmm\mss\s'),
format(avg_run_duration, N'HH\hmm\mss\s'),
format(cur_run_duration, N'HH\hmm\mss\s')
)
from #Regressing_Jobs_DurAsDate
order by avg_run_duration desc, JobName
select @tableHTML = @tableHTML + '</tr></table>' + Char(13)
select @sHtml = @tableHTML
--select @sHtml
declare @DateStr varchar(30) = convert(varchar,getdate(),121)
IF @bEmail = 1 and (select count(*) from #Regressing_Jobs) > 0
begin
declare @sSubject varchar(250)
= @@SERVERNAME + ' Job(s) taking longer than recent baseline duration: ' _
+ @DateStr
EXEC msdb.dbo.sp_send_dbmail @profile_name='SQL Server Monitoring Account',
@recipients= @RecipientsList,
@subject=@sSubject,
@body=@sHtml,
@body_format = 'HTML'
print 'email sent: ' + CHAR(13) + @sHtml
end
IF @bSaveToTable = 1
begin
insert into RegressingJobs
(
CaptureDateTime,
JobName,
start_execution_date,
HostName,
history_days,
avg_duration_multiplier,
min_run_duration,
max_run_duration,
avg_run_duration,
cur_run_duration
)
select @DateStr,
JobName,
start_execution_date,
@@SERVERNAME,
@history_days,
@avg_duration_multiplier,
min_run_duration,
max_run_duration,
avg_run_duration,
cur_run_duration
from #Regressing_Jobs_DurAsDate
end
begin
SELECT 'JOBS THAT ARE TAKING LONGER THAN USUAL: '
select @DateStr as CaptureDateTime, JobName, _
start_execution_date, @@SERVERNAME as 'Server',
@history_days as '@history_days', _
@avg_duration_multiplier as '@avg_duration_multiplier',
min_run_duration, max_run_duration, _
avg_run_duration, cur_run_duration
from #Regressing_Jobs_DurAsDate
end
--all currently running jobs:
begin
SELECT ' ALL JOBS THAT ARE CURRENTLY RUNNING: '
SELECT
-- '', -- CAST (ja.job_id AS VARCHAR(max)),
j.name AS job_name,
cast ( ja.start_execution_date as varchar) start_execution_time,
cast ( ja.stop_execution_date as varchar) stop_execution_time,
-- ISNULL(last_executed_step_id,0)+1 AS current_executed_step_id,
Js.step_name step_name
FROM msdb.dbo.sysjobactivity ja
LEFT JOIN msdb.dbo.sysjobhistory jh
ON ja.job_history_id = jh.instance_id
JOIN msdb.dbo.sysjobs j
ON ja.job_id = j.job_id
JOIN msdb.dbo.sysjobsteps js
ON ja.job_id = js.job_id
AND ISNULL(ja.last_executed_step_id,0)+1 = js.step_id
WHERE ja.session_id =
(SELECT TOP 1 session_id
FROM msdb.dbo.syssessions
ORDER BY agent_start_date DESC)
AND start_execution_date is not null
AND stop_execution_date is null;
end
END
GO
-- <end of stored procedure DDL/>
asp_long_running_Regressing_Jobs_Alerts 使用的 2 個(gè) UDF
-- dependencies of asp_long_running_Regressing_Jobs_Alerts:
-- udf_convert_int_time
CREATE FUNCTION [dbo].[udf_convert_int_time] (@time_in INT)
RETURNS TIME
AS
BEGIN
DECLARE @time_out TIME
DECLARE @time_in_str varchar(6)
SELECT @time_in_str = RIGHT('000000' + CAST(@time_in AS VARCHAR(6)), 6)
SELECT @time_out = CAST(STUFF(STUFF(@time_in_str,3,0,':'),6,0,':') AS TIME)
RETURN @time_out
END
GO
-- udf_convert_int_time2ss
CREATE FUNCTION [dbo].[udf_convert_int_time2ss] (@time_in INT)
RETURNS int
AS
BEGIN
DECLARE @time_out int
select @time_out = datediff(ss, 0, dbo.udf_convert_int_time(@time_in))
RETURN @time_out
END
GO
除了 SP 標(biāo)頭中列出的調(diào)用示例之外,以下是一個(gè)典型調(diào)用的示例:
exec dba_DB.dbo.asp_long_running_Regressing_Jobs_Alerts
@history_days = 45,
@avg_duration_multiplier = 2,
@bEmail = 1,
@bSaveToTable = 0,
@RecipientsList = 'myName@myCoDomain.com;
AssociateName@myCoDomain.com’
@ignore_zero_durations = 1
SP 的此調(diào)用意味著以下內(nèi)容:
給我一份報(bào)告(或警報(bào)),說明在 45 天內(nèi)完成相同作業(yè)的平均運(yùn)行時(shí)間所需時(shí)間的兩倍的所有作業(yè)。通過電子郵件將此類報(bào)告發(fā)送給我 ( ?myName
?) 和我的同事 ( ?myAssociateName
?)。不要將此數(shù)據(jù)保存到基線表,并且不包括持續(xù)時(shí)間為零的作業(yè)。
圖 2:SSMS 中對 SP 的另一個(gè)類似調(diào)用的示例輸出:在這種情況下,顯示現(xiàn)在花費(fèi)的時(shí)間比過去 2 天花費(fèi) AVG 多 10% 的作業(yè)。
下圖顯示了由 SP 生成的電子郵件警報(bào)在 HTML 中的外觀示例。
此電子郵件通知僅在 時(shí)通過調(diào)用此 SP 生成?parameter @bEmail = 1
?。
圖 3
電子郵件主題行如下:?<ServerName>
?作業(yè)花費(fèi)的時(shí)間比最近的基線持續(xù)時(shí)間長;?yyyy-mm-dd mm:ss
?
將結(jié)果保存到表格以供將來的歷史分析
如果你決定將報(bào)告保存到表格,請執(zhí)行以下操作:除了通過電子郵件或在 SSMS 中直接運(yùn)行 SP 收到警報(bào)外,還需要一個(gè)表格。用?@bSaveToTable = 1
?呼叫 SP 。(它是 SP 的依賴項(xiàng)之一,因此即使此時(shí)您沒有將結(jié)果放入表格,也最好創(chuàng)建它)。這是表 DDL:
CREATE TABLE [RegressingJobs](
[CaptureDateTime] [datetime] NULL,
[JobName] [sysname] NOT NULL,
[start_execution_date] [datetime] NULL,
[HostName] [sysname] NOT NULL,
[history_days] [int] NULL,
[avg_duration_multiplier] [float] NULL,
[min_run_duration] [time](7) NULL,
[max_run_duration] [time](7) NULL,
[avg_run_duration] [time](7) NULL,
[cur_run_duration] [time](7) NULL
) ON [PRIMARY]
GO
此調(diào)用具有完整功能的 SP 的示例,包括保存到表格和電子郵件警報(bào) ( ?bSaveToTable= 1, bEmail=1
?):
EXEC DBA_db.dbo.asp_long_running_Regressing_Jobs_Alerts
@history_days = 30,
@avg_duration_multiplier = 2,
@bEmail = 1,
@bSaveToTable = 1,
@RecipientsList = 'myName@myCoDomain.com;',
@ignore_zero_durations = 1
如何出于各種目的調(diào)用此 SP 的其他示例
將當(dāng)前運(yùn)行的作業(yè)與其過去 30 天的歷史進(jìn)行比較,并報(bào)告當(dāng)前持續(xù)時(shí)間超過 30 天平均值 1.5 倍的每個(gè)作業(yè)。不要發(fā)送警報(bào)電子郵件,也不要將此信息保存到基線表:
EXECUTE DBA_db.dbo.asp_long_running_Regressing_Jobs_Alerts 30, 1.5, 0, 0
將當(dāng)前運(yùn)行的作業(yè)與其過去 60 天的歷史進(jìn)行比較,并報(bào)告當(dāng)前持續(xù)時(shí)間超過 60 天平均值 2 倍的每個(gè)作業(yè)。向默認(rèn)收件人(列表)發(fā)送電子郵件警報(bào),并且不要將此信息保存到基線表:
EXECUTE DBA_db.dbo.asp_long_running_Regressing_Jobs_Alerts 60, 1.5, 1, 0
注意:強(qiáng)烈建議對參數(shù)進(jìn)行賦值,明確命名每個(gè)參數(shù)。給出上述示例是為了簡潔。
結(jié)論
本文描述了管理員在高度自動(dòng)化的工作負(fù)載環(huán)境中管理和分析多個(gè)作業(yè)的性能問題時(shí)面臨的問題。我在此處共享的存儲(chǔ)過程允許管理員在某些作業(yè)在給定時(shí)間段內(nèi)花費(fèi)的時(shí)間超過其過去平均持續(xù)時(shí)間時(shí)收到警報(bào)。
附錄
下面是用于創(chuàng)建名為 [ ?MyMonitoredServerName_Maintenance- Regressing Jobs
?]的計(jì)劃作業(yè)的 DDL,以?asp_long_running_Regressing_Jobs_Alerts
?每分鐘執(zhí)行一次并向?BigShotAdminBigShotAdmin@MyCoDomain.com
?發(fā)送警報(bào)。
USE [msdb]
GO
/****** Object: Job [MyMonitoredServerName_Maintenance - Regressing_Jobs] ******/
BEGIN TRANSACTION
DECLARE @ReturnCode INT
SELECT @ReturnCode = 0
IF NOT EXISTS (SELECT name
FROM msdb.dbo.syscategories
WHERE name=N'[BigLoad]'
AND category_class=1)
BEGIN
EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[BigLoad]'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
END
DECLARE @jobId BINARY(16)
EXEC @ReturnCode = msdb.dbo.sp_add_job
@job_name=N'MyMonitoredServerName_Maintenance - Regressing_Jobs',
@enabled=1,
@notify_level_eventlog=0,
@notify_level_email=0,
@notify_level_netsend=0,
@notify_level_page=0,
@delete_level=0,
@description=N'Send email notifications to _
@RrecipientsList (last parameter in SP) on CURRENTLY RUNNING _
Agent Jobs that regress in performance by duration compared to _
baseline (baseline collected during the number of days before _
getdate() specified by the first parameter @history_days.',
@category_name=N'[BigLoad]',
@owner_login_name=N'sa', @job_id = @jobId OUTPUT
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
/****** Object: Step [asp_long_running_Regressing_Jobs_Alerts] ******/
EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, _
@step_name=N'asp_long_running_Regressing_Jobs_Alerts',
@step_id=1,
@cmdexec_success_code=0,
@on_success_action=1,
@on_success_step_id=0,
@on_fail_action=2,
@on_fail_step_id=0,
@retry_attempts=0,
@retry_interval=0,
@os_run_priority=0, @subsystem=N'TSQL',
@command=N'exec _
dbmaint..asp_long_running_Regressing_Jobs_Alerts
@history_days = 45,
@avg_duration_multiplier = 2,
@bEmail = 1,
@bSaveToTable = 1,
@RecipientsList = ''BigShotAdmin@MyCoDomain.com;'',
@ignore_zero_durations = 1',
@database_name=N'master',
@flags=0
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'every 1 min',
@enabled=1,
@freq_type=4,
@freq_interval=1,
@freq_subday_type=4,
@freq_subday_interval=1,
@freq_relative_interval=0,
@freq_recurrence_factor=0,
@active_start_date=20201222,
@active_end_date=99991231,
@active_start_time=60000,
@active_end_time=235959,
@schedule_uid=N'999ac144-4e13-4965-82f2-55555cc37a09'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)'
IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback
COMMIT TRANSACTION
GOTO EndSave
QuitWithRollback:
IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION
EndSave:
GO
享受有關(guān)回歸/長時(shí)間運(yùn)行的 SQL 代理作業(yè)的警報(bào)!