We have been burned a few times by high backlogs for one reason or another: DFS has an issue, or someone mass-changed permissions on a folder and created hundreds of thousands of files that needed to replicate.
Long story short, I would like to be notified if there is a high backlog on any DFS server in the domain (we have 5), in any direction, for any replicated folder or group.
I found a fantastic script to check this. It takes about 40 minutes to run, and I can build in an alert (i.e. if the count is greater than 1000, send an email, etc.) and then run the script once every couple of hours.
I assume we are not supposed to link to external web sites, but if one were to go to a certain search engine and look for
"How do I get the current DFS replication backlog count? hkey" you would find the PowerShell script I am referring to. I had to make a change described in the comments on that page to make it work for me.
Credit to the author, Kamal, on this. My only changes are a timer to see how long the script took to run and the change I mentioned above. The original lines are just commented out below.
So back to my question: am I overthinking this? Is there a better event in the event log that I can search for and alert on that will tell me there is a high backlog count? Is there an easy way to notify me that I might have an issue?
PS - not sure why it thinks the second half is comments... but it pastes into the PowerShell editor and looks fine.
# Get Start Time
$startDTM = (Get-Date)

# Get all replication groups
$replicationgroups = dfsradmin rg list;

# Reduce loop by 3 lines to filter out junk from dfsradmin
$i = 0;
$imax = ($replicationgroups.count -3);

# Loop through each replication group
foreach ($replicationgroup in $replicationgroups)
{
    # Exclude first and last two lines as junk, and exclude the domain system volume
    if (($i -ge 1) -and ($i -le $imax) -and ($replicationgroup -notlike "*domain system volume*"))
    {
        # Format replication group name
        $replicationgroup = $replicationgroup.split(" ");
        $replicationgroup[-1] = "";
        $replicationgroup = ($replicationgroup.trim() -join " ").trim();

        # Get and format replication folder name
        $replicationfolder = & cmd /c ("dfsradmin rf list /rgname:`"{0}`"" -f $replicationgroup);
        $replicationfolder = (($replicationfolder[1].split("\"))[0]).trim();

        # Get servers for the current replication group
        $replicationservers = & cmd /c ("dfsradmin conn list /rgname:`"{0}`"" -f $replicationgroup);

        # Reduce loop by 3 lines to filter out junk from dfsradmin
        $j = 0;
        $jmax = ($replicationservers.count -3);

        # Loop through each replication member server
        foreach ($replicationserver in $replicationservers)
        {
            # Exclude first and last two lines as junk
            if (($j -ge 1) -and ($j -le $jmax))
            {
                # Format server names
                # $sendingserver = ($replicationserver.split(" "))[0].trim();
                # $receivingserver = ($replicationserver.split(" "))[2].trim();
                $sendingserver = ($replicationserver.split()| where {$_})[0].trim();
                $receivingserver = ($replicationserver.split()| where {$_})[1].trim();

                # Get backlog count with dfsrdiag
                $backlog = & cmd /c ("dfsrdiag backlog /rgname:`"{0}`" /rfname:`"{1}`" /smem:{2} /rmem:{3}" -f $replicationgroup, $replicationfolder, $sendingserver, $receivingserver);
                $backlogcount = ($backlog[1]).split(":")[1];

                # Format backlog count
                if ($backlogcount -ne $null)
                {
                    $backlogcount = $backlogcount.trim();
                }
                else
                {
                    $backlogcount = 0;
                }

                # Create output string to <replication group> <sending server> <receiving server> <backlog count>;
                $outline = $replicationgroup + " From: " + $sendingserver + " To: " + $receivingserver + " Backlog: " + $backlogcount;
                $outline;
            }
            $j = $j + 1;
        }
    }
    $i = $i + 1;
}

# Get End Time
$endDTM = (Get-Date)

# Echo Time elapsed
"Elapsed Time: $(($endDTM-$startDTM).totalseconds) seconds"
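
For the alert I mentioned (count greater than 1000, then send an email), something like the following might work as a rough sketch. It assumes the script's output lines have been collected into an array and keep the "<group> From: <sender> To: <receiver> Backlog: <count>" shape built in $outline above. The function name Get-HighBacklog, the sample lines, and the mail server and addresses are all made up for illustration, so treat it as a starting point and not a tested solution.

```powershell
# Sketch only: pick out output lines whose backlog exceeds a threshold.
# Assumes lines look like "<group> From: <sender> To: <receiver> Backlog: <count>".
# Get-HighBacklog is a hypothetical helper name, not part of the original script.
function Get-HighBacklog {
    param(
        [string[]]$Lines,
        [int]$Threshold = 1000
    )
    foreach ($line in $Lines) {
        # Lines that do not match (for example the "Elapsed Time" line) are skipped
        if ($line -match '^(?<Group>.+) From: (?<From>\S+) To: (?<To>\S+) Backlog: (?<Count>\d+)\s*$') {
            $count = [int]$Matches['Count']
            if ($count -gt $Threshold) {
                [pscustomobject]@{
                    Group   = $Matches['Group']
                    From    = $Matches['From']
                    To      = $Matches['To']
                    Backlog = $count
                }
            }
        }
    }
}

# Made-up sample lines standing in for the script's real output
$sample = @(
    'Group A From: SERVER1 To: SERVER2 Backlog: 12',
    'Group A From: SERVER2 To: SERVER1 Backlog: 250000'
)
$high = @(Get-HighBacklog -Lines $sample -Threshold 1000)

if ($high.Count -gt 0) {
    # One line per connection that is over the threshold
    $body = ($high | ForEach-Object {
        "{0}: {1} -> {2} = {3}" -f $_.Group, $_.From, $_.To, $_.Backlog
    }) -join "`r`n"

    # Hypothetical mail settings; replace with real values before uncommenting.
    # Send-MailMessage -SmtpServer 'smtp.example.com' -From 'dfs-alerts@example.com' `
    #     -To 'admins@example.com' -Subject 'High DFS replication backlog' -Body $body
    $body
}
```

In practice the sample array would be replaced by the lines the backlog script emits, and the whole thing could run from a scheduled task every couple of hours.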