Database Engineer: Always On extended events in SQL server

alwayson_ddl_executed - This event collects data when a data definition language statement is executed, including CREATE, ALTER or DROP. These actions causes your AlwaysOn configuration to have runtime issues.
availability_group_lease_expired - This event indicates that there is a connectivity issue in your AlwaysOn implementation and when the lease expires it will cause AlwaysOn to failover or go offline.
availability_replica_automatic_failover_validation - This events collects information about the readiness of your databases after a failover event. It would be good to have this data especially if your Availability Group has automatic failovers and you need to troubleshoot why these automatic failovers are happening.
availability_replica_manager_state - This event collects data about the heartbeat of your replica manager.
availability_replica_state_change - This event indicates that your replica state changed maybe caused by creation of Availability Group or failover event.

SELECT *

FROM sys.availability_groups;

SELECT *

FROM sys.availability_groups_cluster;

SELECT *

FROM sys.availability_group_listeners;

SELECT *

FROM sys.availability_group_listener_ip_addresses;

SELECT *

FROM sys.availability_replicas;

SELECT *

FROM sys.dm_hadr_availability_group_states;

SELECT *

FROM sys.dm_hadr_availability_replica_cluster_nodes;

SELECT *

FROM sys.dm_hadr_availability_replica_cluster_states;

SELECT *

FROM sys.dm_hadr_availability_replica_states;

SELECT *

FROM sys.dm_hadr_auto_page_repair;

SELECT *

FROM sys.dm_hadr_cluster;

SELECT *

FROM sys.dm_hadr_cluster_members;

SELECT *

FROM sys.dm_hadr_cluster_networks;

SELECT *

FROM sys.dm_hadr_database_replica_states;

SELECT *

FROM sys.dm_hadr_database_replica_cluster_states;

SELECT *

FROM sys.dm_tcp_listener_states;

Performance Monitor

The third native solution that can be used for AlwaysOn Availability Group performance monitoring is the use of performance counters via the Performance Monitor. For that purpose, Microsoft has added new objects with number of different counters: SQLServer:Availability Replica and SQLServer:Database Replica

SQLServer:Availability Replica consists of performance counters that returns performance data of the AlwaysOn replicas. All performance counters in this group are valid for both – the primary and the secondary replicas, while the send/receive counters are related to the local replica

The object’s counters with description according to Microsoft msdn¹

Counter Name	Description
Bytes Received from Replica/sec	Number of bytes received from the availability replica per second. Pings and status updates will generate network traffic even on databases with no user updates.
Bytes Sent to Replica/sec	Number of bytes sent to the remote availability replica per second. On the primary replica this is the number of bytes sent to the secondary replica. On the secondary replica this is the number of bytes sent to the primary replica.
Bytes Sent to Transport/sec	Actual number of bytes sent per second over the network to the remote availability replica. On the primary replica this is the number of bytes sent to the secondary replica. On the secondary replica this is the number of bytes sent to the primary replica.
Flow Control Time (ms/sec)	Time in milliseconds that log stream messages waited for send flow control, in the last second.
Flow Control/sec	Number of times flow-control initiated in the last second. Flow Control Time (ms/sec) divided by Flow Control/sec is the average time per wait.
Receives from Replica/sec	Number of AlwaysOn messages received from the replica per second.
Resent Messages/sec	Number of AlwaysOn messages resent in the last second.
Sends to Replica/sec	Number of AlwaysOn messages sent to this availability replica per second.
Sends to Transport/sec	Actual number of AlwaysOn messages sent per second over the network to the remote availability replica. On the primary replica this is the number of messages sent to the secondary replica. On the secondary replica this is the number of messages sent to the primary replica.

SQLServer:Database Replica consists of performance counters that are returning performance info about the databases on the secondary replica of an AlwaysOn availability group. The SQLServer:Database Replica object can be used only for SQL Server instance databases hosted by a secondary replica.

The object’s counters with description according to Microsoft msdn²

Counter Name	Description	View on…
File Bytes Received/sec	Amount of FILESTREAM data received by the secondary replica for the secondary database in the last second.	Secondary replica
Log Bytes Received/sec	Amount of log records received by the secondary replica for the database in the last second.	Secondary replica
Log remaining for undo	The amount of log in kilobytes remaining to complete the undo phase.	Secondary replica
Log Send Queue	Amount of log records in the log files of the primary database, in kilobytes, that has not yet been sent to the secondary replica. This value is sent to the secondary replica from the primary replica. Queue size does not include FILESTREAM files that are sent to a secondary.	Secondary replica
Mirrored Write Transaction/sec	Number of transactions that wrote to the mirrored database and waited for the log to be sent to the mirror in order to commit, in the last second.	Primary replica
Recovery Queue	Amount of log records in the log files of the secondary replica that has not yet been redone.	Secondary replica
Redo Bytes Remaining	The amount of log in kilobytes remaining to be redone to finish the reverting phase.	Secondary replica
Redone Bytes/sec	Amount of log records redone on the secondary database in the last second.	Secondary replica
Total Log requiring undo	Total kilobytes of log that must be undone.	Secondary replica
Transaction Delay	Delay in waiting for unterminated commit acknowledgement, in milliseconds.	Primary replica

These counters are also available via sys.dm_os_performance_counters Dynamic Management View

Below are the suggested list error codes with descriptions for which configuring alerts is recommended

976	The target database cannot be queried. Either data movement is suspended or the availability replica is not enabled for read access.
983	The availability database is not accessible and reason needs to be investigated
1480	The AlwaysOn Availability Group has failed over. The reason for failing over should be examined. This is an informational message only.
19406	The state of the local availability replica has changed. Reason should be investigated
35254	An error occurred while accessing the availability group metadata. Check and investigate the root cause of this error
35262	This is informational error and it indicates that the default startup of database will be skipped as database is member of an availability group
35273	Bypassing recovery since availability group database is marked as inaccessible, because the session with the primary replica was interrupted or the WSFC node lacks quorum endpoint configuration
35274	An availability database recovery is pending while waiting for the secondary replica to receive transaction log from the primary
35275	The availability database is in a potentially damaged state, and as such it cannot be joined to availability group. Restoring and rejoining database is recommended
35276	This error indicates that manual intervention could be needed to restart synchronization of the database. If the problem is persistent, restart of the local SQL Server might be required.
35279	The primary replica rejected joining of the new database to availability group due to error.
35299	Info message that some nonqualified transactions are being rolled back in database for an AlwaysOn change of state.
41048	Local Windows Server Failover Clustering service is not available
41049	Local Windows Server Failover Clustering node is not online anymore
41050	Waiting for the start of a local Windows Server Failover Clustering service
41051	Local Windows Server Failover Clustering service started
41052	AlwaysOn Availability Groups: Waiting for local Windows Server Failover Clustering node to start. This is an informational message only. No user action is required.
41053	AlwaysOn Availability Groups: Local Windows Server Failover Clustering node started. This is an informational message only. No user action is required.
41054	AlwaysOn Availability Groups: Waiting for local Windows Server Failover Clustering node to come online. This is an informational message only. No user action is required.
41055	AlwaysOn Availability Groups: Local Windows Server Failover Clustering node is online. This is an informational message only. No user action is required.
41089	AlwaysOn Availability Groups startup has been cancelled because SQL Server is shutting down. This is an informational message only. No user action is required.
41091	The local AlwaysOn availability group replica is going offline because the lease expired or lease renewal failed
41131	Bring availability group online has been failed. Verify that the local Windows Server Failover Clustering (WSFC) node is online
41406	The availability group is prepared for automatic failover due to the secondary replica is not ready for an automatic failover. The secondary replica is unavailable, or its data synchronization state is currently not in the SYNCHRONIZED synchronization state.
41414	At least one secondary replica is not connected to the primary replica and the indicated connected state is DISCONNECTED.
41421	Availability database is suspended.

To configure alerts, the first item that needs to be configured is the Operator that will be used for alerts. Creating operators is possible via T-SQL script or using SSMS

Database Engineer

Thursday, 21 June 2018

Always On extended events in SQL server

No comments:

Post a Comment

Check if column exists or not in the SQL server

Total Pageviews

Online Visitors