This is going to be a reasonably short and quick entry. Last week I demonstrated using BGP Anycast on a server in place of a load balancer. The follow-up post described the health-checking script that I wrote in Python to check whether the server was healthy. That script injects the BGP route if the server is healthy, and withdraws the route if it is unhealthy.
However, I felt the script could use a bit more intelligence, so I kept working at it. In the previous script, a static variable called service was first set to “down”, meaning the BGP route was not being announced. Then, in the main loop:
- If the Apache server was healthy
  - If the service variable was “down”, meaning the BGP route was not being announced
    - Inject the route
    - Set the service variable to “up”
  - Otherwise do nothing
- If the Apache server was unhealthy
  - If the service variable was “up”, meaning the BGP route was being announced
    - Withdraw the route
    - Set the service variable to “down”
  - Otherwise do nothing
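As a rough sketch of that older logic (this is not the original script), the state tracking can be modeled as one small function. Only the service variable and its “up”/“down” values come from the earlier script; the function name step and the action labels "inject" and "withdraw" are made up for illustration.

```python
def step(healthy, service):
    """Return (action, new_service) for one pass of the old main loop.

    healthy -- result of the Apache health check (bool)
    service -- "up" if we believe the route is announced, else "down"
    action  -- "inject", "withdraw", or None (hypothetical labels)
    """
    if healthy:
        if service == "down":
            return "inject", "up"
        return None, service          # already announcing: do nothing
    if service == "up":
        return "withdraw", "down"
    return None, service              # already withdrawn: do nothing


# Walk through a short sequence of health-check results.
service = "down"                      # the route starts out unannounced
for healthy in (True, True, False, False, True):
    action, service = step(healthy, service)
    print(healthy, action, service)
```

Note that service only records what the script believes it did last. Nothing here asks the router whether the route is really being announced.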
The new version of the script looks like this:

    #!/usr/local/bin/python
    # Loops forever, at an interval defined below, checking the health of the local
    # Apache server. If the server is up, the BGP route for the server's address
    # will be injected. If down, it'll be withdrawn.
    #
    # Best to start this with nohup.
    #   nohup anycast_healthchk.py &
    #
    import urllib3
    import socket
    import subprocess
    import time

    # Some variables we'll be using.
    # Change as needed.
    server = "172.20.0.1"     # server's IP
    httpport = "80"           # server's port (80 or 443)
    index = "/index.html"     # file we'll grab during the health check
    hc_interval = 5           # health check interval, in seconds
    ASN = "65300"             # server's BGP ASN

    # These variables probably don't need changing.
    url = "http://" + server + index    # URL we'll be grabbing to health check
    route_add = "/usr/local/bin/vtysh -c 'enable' -c 'config term' -c 'router bgp " + ASN + "' -c 'network " + server + "/32' -c 'exit' -c 'exit'"
    route_del = "/usr/local/bin/vtysh -c 'enable' -c 'config term' -c 'router bgp " + ASN + "' -c 'no network " + server + "/32' -c 'exit' -c 'exit'"
    route_check = "/usr/local/bin/vtysh -c enable -c 'show ip bgp " + server + "/32' -c exit | grep available"

    #
    # isOpen(IP_addr, Port)
    #
    # Checks to see if it can open a TCP connection to IP:Port.
    # Returns True if it can, False otherwise
    def isOpen(ip, port):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.connect((ip, int(port)))
            s.shutdown(2)
            return True
        except Exception:
            return False

    # Main loop forever, until killed.
    while(not time.sleep(hc_interval)):
        #
        # Set the stdout/stderr variables; we'll need the stdout one for the loop
        # to make sure the route is or isn't being sent
        result = subprocess.Popen(route_check, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
        stdout,stderr = result.communicate()
        if stdout.decode('utf-8'):    # set the string to the ASCII output, if there is any
            routing = True
        else:
            routing = False

        # Use our isOpen() function along with a URL request to see if:
        # A) the server is accepting connections on its HTTP port (L4)
        #    AND
        # B) we can pull the HTML file successfully. (L7)
        #
        # A success on both will mean the server is healthy.
        if isOpen(server, httpport) and urllib3.PoolManager().request('GET', url).status == 200:
            if not routing:                             # we're not announcing the route
                subprocess.call(route_add, shell=True)  # inject the route
        else:
            if routing:                                 # we are announcing the route
                subprocess.call(route_del, shell=True)  # withdraw the route

I got rid of the static service variable completely. Now, before it does any injection or withdrawal, the main loop checks whether the server is actually announcing the prefix. At the beginning of the loop, you can see I’m calling the subprocess.Popen function to ask the external vtysh application: is the route being sent in BGP? If it is, the variable called routing is set to True; otherwise it is set to False.
The loop then does pretty much the same thing as the previous loop, except it no longer sets a state variable by hand after changing the routing. The actual routing state is checked on every pass through the loop.
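To make the check-then-act idea a bit more concrete, here is a minimal sketch that splits it into three small functions. It is a variation on the script, not a copy of it: the vtysh commands and the search for the word "available" are taken from the script, but the function names (is_announced, query_route, decide) and the "add"/"del" labels are my own illustrative choices, and the substring test is done in Python rather than by piping through grep.

```python
import subprocess


def is_announced(show_output):
    """True if the 'show ip bgp <prefix>' output mentions an available path.

    This mirrors the script's `grep available`, done in Python instead.
    """
    return "available" in show_output


def query_route(prefix, vtysh="/usr/local/bin/vtysh"):
    """Run the same vtysh commands as route_check and return stdout as text.

    stderr is discarded so that only vtysh's normal output is searched.
    Raises FileNotFoundError if the vtysh binary is not at the given path.
    """
    result = subprocess.run(
        [vtysh, "-c", "enable", "-c", "show ip bgp " + prefix, "-c", "exit"],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
    )
    return result.stdout


def decide(healthy, routing):
    """Return "add", "del", or None from health and the actual route state."""
    if healthy and not routing:
        return "add"      # healthy but not announced: inject the route
    if not healthy and routing:
        return "del"      # unhealthy but still announced: withdraw it
    return None           # state already matches: nothing to do
```

Inside the loop this would be used along the lines of decide(healthy, is_announced(query_route("172.20.0.1/32"))), with the result selecting route_add, route_del, or no action. Passing the command as a list also avoids the need for shell=True.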
Further Changes?
I don’t think I’m going to continue developing this health-checking script any further. This was just used as an example of what could be done. However, were I serious about this, I might add a way to parse arguments, such as the server’s ASN, the prefix, and the health-checking interval. I might also add a few extra error checks.
But, again, not at this point in time.
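For anyone who wants to pick that up, argument parsing with the standard argparse module might look something like the sketch below. The option names (--asn, --prefix, --interval) are hypothetical; the defaults are simply the values hard-coded in the script above.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options; argv=None means use sys.argv."""
    parser = argparse.ArgumentParser(
        description="Announce a BGP prefix while the local server is healthy.")
    # Hypothetical option names; defaults match the script's variables.
    parser.add_argument("--asn", default="65300",
                        help="server's BGP ASN")
    parser.add_argument("--prefix", default="172.20.0.1/32",
                        help="prefix to inject or withdraw")
    parser.add_argument("--interval", type=float, default=5,
                        help="health check interval, in seconds")
    return parser.parse_args(argv)


# Example: override the ASN and interval, keep the default prefix.
args = parse_args(["--asn", "65301", "--interval", "10"])
print(args.asn, args.prefix, args.interval)
```

The parsed values would then replace the ASN, server, and hc_interval variables near the top of the script.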