Recently I noticed an alarm issue with our Steelhead fleet. At random times the Steelhead appliances were losing their trust relationship with the Windows domain they are members of. This can be really bad when the main reason you have a product like Riverbed Steelhead is to cache chatty protocols such as Microsoft CIFS/SMB. Looking further into the logs on the Steelhead appliances (we have one at every site), I was seeing the following log entry:
Jul 14 10:31:36 rb-flee sport[9207]: [domain_auth/trusted_domains.WARN] - {- -} Failed to update trusted domains : Failed to communicate with winbindd
Further entries in the logs reported that CIFS and SMB optimizations would be disabled while the trust with the domain was down. This is very bad: we lose all application-layer optimization for pretty much anything related to file transfer since, like many organisations, we use Active Directory.
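If the appliance logs are also exported to a syslog server, a small script can show how often each Steelhead hits this error. The following is only a rough sketch in Python: it assumes the exported lines keep the same layout as the entry above, and the function name find_winbind_failures is my own, not anything provided by RiOS.

```python
import re

# Matches lines shaped like the log entry quoted above (assumed layout):
# "<Mon DD HH:MM:SS> <host> sport[<pid>]: [domain_auth/trusted_domains.WARN] ... winbindd"
WINBIND_FAILURE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+sport\[\d+\]:\s+"
    r"\[domain_auth/trusted_domains\.WARN\].*"
    r"Failed to communicate with winbindd"
)


def find_winbind_failures(lines):
    """Return (timestamp, host) pairs for every winbindd failure line."""
    hits = []
    for line in lines:
        match = WINBIND_FAILURE.search(line)
        if match:
            hits.append((match.group("timestamp"), match.group("host")))
    return hits
```

Feeding it the lines of an exported log file (for example via open(path) on whatever file your syslog server writes) gives a list you can count per host to see which sites are affected most.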
So, to troubleshoot this issue, the first thing I did was make sure the Steelheads could reach the site-local domain controller for Kerberos authentication (TCP port 88). I wanted to rule out any layer 2 or layer 3 connectivity issue. To do this you can issue a telnet command from the RiOS CLI:
SH > en
SH # telnet 10.10.10.1 88
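The same kind of reachability check can be scripted from a workstation on the same network segment, which is handy when there are many sites to test. This is a minimal sketch in Python, not something that runs on the Steelhead itself, so it only tells you about the path from wherever the script runs. The function name tcp_port_open is my own.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection performs the full TCP handshake, like telnet does
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts
        return False
```

For example, tcp_port_open("10.10.10.1", 88) checks the Kerberos port on the example domain controller address used in the telnet command above.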
The job below, entered in configuration mode, restarts the winbind service once every day:
SH # (config) job 1 command 1 "en"
SH # (config) job 1 command 2 "pm process winbind restart"
SH # (config) job 1 comment "This job will restart winbind service once every day"
SH # (config) job 1 date-time 00:00:00 2014/08/06
SH # (config) job 1 enable
SH # (config) job 1 name "Restart-Winbind"
SH # (config) job 1 recurring "86400"