Thursday, 21 June 2018

Always On extended events in SQL server


  1. alwayson_ddl_executed - This event collects data when a data definition language statement is executed, including CREATE, ALTER or DROP.  These actions causes your AlwaysOn configuration to have runtime issues.
  2. availability_group_lease_expired - This event indicates that there is a connectivity issue in your AlwaysOn implementation and when the lease expires it will cause AlwaysOn to failover or go offline.
  3. availability_replica_automatic_failover_validation - This events collects information about the readiness of your databases after a failover event. It would be good to have this data especially if your Availability Group has automatic failovers and you need to troubleshoot why these automatic failovers are happening.
  4. availability_replica_manager_state - This event collects data about the heartbeat of your replica manager.
  5. availability_replica_state_change - This event indicates that your replica state changed maybe caused by creation of Availability Group or failover event.


SELECT *
FROM sys.availability_groups;

SELECT *
FROM sys.availability_groups_cluster;

SELECT *
FROM sys.availability_group_listeners;

SELECT *
FROM sys.availability_group_listener_ip_addresses;

SELECT *
FROM sys.availability_replicas;

SELECT *
FROM sys.dm_hadr_availability_group_states;

SELECT *
FROM sys.dm_hadr_availability_replica_cluster_nodes;

SELECT *
FROM sys.dm_hadr_availability_replica_cluster_states;

SELECT *
FROM sys.dm_hadr_availability_replica_states;

SELECT *
FROM sys.dm_hadr_auto_page_repair;

SELECT *
FROM sys.dm_hadr_cluster;

SELECT *
FROM sys.dm_hadr_cluster_members;

SELECT *
FROM sys.dm_hadr_cluster_networks;

SELECT *
FROM sys.dm_hadr_database_replica_states;

SELECT *
FROM sys.dm_hadr_database_replica_cluster_states;

SELECT *
FROM sys.dm_tcp_listener_states;


Performance Monitor
The third native solution that can be used for AlwaysOn Availability Group performance monitoring is the use of performance counters via the Performance Monitor. For that purpose, Microsoft has added new objects with number of different counters: SQLServer:Availability Replica and SQLServer:Database Replica
SQLServer:Availability Replica consists of performance counters that returns performance data of the AlwaysOn replicas. All performance counters in this group are valid for both – the primary and the secondary replicas, while the send/receive counters are related to the local replica
The object’s counters with description according to Microsoft msdn1
Counter Name
Description
Bytes Received from Replica/sec
Number of bytes received from the availability replica per second. Pings and status updates will generate network traffic even on databases with no user updates.
Bytes Sent to Replica/sec
Number of bytes sent to the remote availability replica per second. On the primary replica this is the number of bytes sent to the secondary replica. On the secondary replica this is the number of bytes sent to the primary replica.
Bytes Sent to Transport/sec
Actual number of bytes sent per second over the network to the remote availability replica. On the primary replica this is the number of bytes sent to the secondary replica. On the secondary replica this is the number of bytes sent to the primary replica.
Flow Control Time (ms/sec)
Time in milliseconds that log stream messages waited for send flow control, in the last second.
Flow Control/sec
Number of times flow-control initiated in the last second. Flow Control Time (ms/sec) divided by Flow Control/sec is the average time per wait.
Receives from Replica/sec
Number of AlwaysOn messages received from the replica per second.
Resent Messages/sec
Number of AlwaysOn messages resent in the last second.
Sends to Replica/sec
Number of AlwaysOn messages sent to this availability replica per second.
Sends to Transport/sec
Actual number of AlwaysOn messages sent per second over the network to the remote availability replica. On the primary replica this is the number of messages sent to the secondary replica. On the secondary replica this is the number of messages sent to the primary replica.
SQLServer:Database Replica consists of performance counters that are returning performance info about the databases on the secondary replica of an AlwaysOn availability group. The SQLServer:Database Replica object can be used only for SQL Server instance databases hosted by a secondary replica.
The object’s counters with description according to Microsoft msdn2
Counter Name
Description
View on…
File Bytes Received/sec
Amount of FILESTREAM data received by the secondary replica for the secondary database in the last second.
Secondary replica
Log Bytes Received/sec
Amount of log records received by the secondary replica for the database in the last second.
Secondary replica
Log remaining for undo
The amount of log in kilobytes remaining to complete the undo phase.
Secondary replica
Log Send Queue
Amount of log records in the log files of the primary database, in kilobytes, that has not yet been sent to the secondary replica. This value is sent to the secondary replica from the primary replica. Queue size does not include FILESTREAM files that are sent to a secondary.
Secondary replica
Mirrored Write Transaction/sec
Number of transactions that wrote to the mirrored database and waited for the log to be sent to the mirror in order to commit, in the last second.
Primary replica
Recovery Queue
Amount of log records in the log files of the secondary replica that has not yet been redone.
Secondary replica
Redo Bytes Remaining
The amount of log in kilobytes remaining to be redone to finish the reverting phase.
Secondary replica
Redone Bytes/sec
Amount of log records redone on the secondary database in the last second.
Secondary replica
Total Log requiring undo
Total kilobytes of log that must be undone.
Secondary replica
Transaction Delay
Delay in waiting for unterminated commit acknowledgement, in milliseconds.
Primary replica
These counters are also available via sys.dm_os_performance_counters Dynamic Management View

Below are the suggested list error codes with descriptions for which configuring alerts is recommended
976
The target database cannot be queried. Either data movement is suspended or the availability replica is not enabled for read access.
983
The availability database is not accessible and reason needs to be investigated
1480
The AlwaysOn Availability Group has failed over. The reason for failing over should be examined. This is an informational message only.
19406
The state of the local availability replica has changed. Reason should be investigated
35254
An error occurred while accessing the availability group metadata. Check and investigate the root cause of this error
35262
This is informational error and it indicates that the default startup of database will be skipped as database is member of an availability group
35273
Bypassing recovery since availability group database is marked as inaccessible, because the session with the primary replica was interrupted or the WSFC node lacks quorum endpoint configuration
35274
An availability database recovery is pending while waiting for the secondary replica to receive transaction log from the primary
35275
The availability database is in a potentially damaged state, and as such it cannot be joined to availability group. Restoring and rejoining database is recommended
35276
This error indicates that manual intervention could be needed to restart synchronization of the database. If the problem is persistent, restart of the local SQL Server might be required.
35279
The primary replica rejected joining of the new database to availability group due to error.
35299
Info message that some nonqualified transactions are being rolled back in database for an AlwaysOn change of state.
41048
Local Windows Server Failover Clustering service is not available
41049
Local Windows Server Failover Clustering node is not online anymore
41050
Waiting for the start of a local Windows Server Failover Clustering service
41051
Local Windows Server Failover Clustering service started
41052
AlwaysOn Availability Groups: Waiting for local Windows Server Failover Clustering node to start. This is an informational message only. No user action is required.
41053
AlwaysOn Availability Groups: Local Windows Server Failover Clustering node started. This is an informational message only. No user action is required.
41054
AlwaysOn Availability Groups: Waiting for local Windows Server Failover Clustering node to come online. This is an informational message only. No user action is required.
41055
AlwaysOn Availability Groups: Local Windows Server Failover Clustering node is online. This is an informational message only. No user action is required.
41089
AlwaysOn Availability Groups startup has been cancelled because SQL Server is shutting down. This is an informational message only. No user action is required.
41091
The local AlwaysOn availability group replica is going offline because the lease expired or lease renewal failed
41131
Bring availability group online has been failed. Verify that the local Windows Server Failover Clustering (WSFC) node is online
41406
The availability group is prepared for automatic failover due to the secondary replica is not ready for an automatic failover. The secondary replica is unavailable, or its data synchronization state is currently not in the SYNCHRONIZED synchronization state.
41414
At least one secondary replica is not connected to the primary replica and the indicated connected state is DISCONNECTED.
41421
Availability database is suspended.
To configure alerts, the first item that needs to be configured is the Operator that will be used for alerts. Creating operators is possible via T-SQL script or using SSMS


No comments:

Post a Comment

I thought you'd be interested in this job at NICE or know someone who might be a good match.

Job: Tech Lead SQL Developer  I thought you'd be interested in this job at NICE or know someone who might be a good match.