Introduction to ipvsadm
Connection synchronization keeps load balancer failover transparent to client applications
Connection state is sent over UDP multicast
<Protocol, CIP:CPort, VIP:VPort, RIP:RPort, Flags, State>
Each entry is at least 24 bytes
5,000 connections/second => about 120 KBytes of connection state per second
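The estimate above is a simple multiplication. A minimal sketch of the arithmetic, assuming every entry has the 24-byte minimum size and ignoring UDP/IP header overhead (the function name is hypothetical):

```python
# Rough estimate of connection-sync payload, assuming the minimum
# entry size of 24 bytes and no packet header overhead.
ENTRY_BYTES = 24


def sync_bytes_per_second(new_connections_per_second, entry_bytes=ENTRY_BYTES):
    """Return the bytes of connection state generated per second."""
    return new_connections_per_second * entry_bytes


print(sync_bytes_per_second(5000))  # 24 * 5000 bytes
```

Actual traffic will be somewhat higher, since entries can be larger than the minimum and each multicast packet carries its own headers.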
An ipvs syncmaster daemon is started inside the kernel on the primary load balancer, and it periodically multicasts the connection state held in its queue.
An ipvs syncbackup daemon is also started inside the kernel on each backup load balancer.
three packet-forwarding methods: NAT, tunneling, and direct routing
eight load balancing algorithms
(round robin, weighted round robin, least-connection, weighted least-connection, locality-based least-connection, locality-based least-connection with replication, destination-hashing, and source-hashing)
Usage
ipvsadm COMMAND [protocol] service-address [scheduling-method] [persistence options]
ipvsadm COMMAND [protocol] service-address server-address [packet-forwarding-method] [weight]
COMMANDS
Upper-case: virtual services
Lower-case: real servers
-A, --add-service
# Add a virtual service
-E, --edit-service
# Edit a virtual service.
-D, --delete-service
# Delete a virtual service
-C, --clear
# Clear the virtual server table.
-S, --save
# Dump the IPVS rules to stdout in a format that -R can read
-R, --restore
# Restore IPVS rules from stdin
# Change the timeout values, in seconds (0 = keep the current value)
# tcp: TCP sessions
# tcpfin: TCP sessions after receiving a FIN packet
# udp: UDP packets
--set tcp tcpfin udp
# Zero the packet, byte and rate counters
-Z, --zero [service-address]
-L, -l, --list [service-address]
# with -c, list the connection table
# connection synchronization daemon
# implemented inside the Linux kernel
# state: master or backup
--start-daemon state [--mcast-interface interface] [--syncid syncid]
--stop-daemon
PARAMETERS
# host[:port]
-t, --tcp-service service-address
-u, --udp-service service-address
-r, --real-server server-address
-s, --scheduler scheduling-method
rr - Round Robin
# higher weights receive new jobs first and get more jobs
wrr - Weighted Round Robin
Example: servers A, B and C have the weights 4, 3 and 2
Scheduling sequence: AABABCABC
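The sequence above can be reproduced with a short sketch of weighted round robin. This follows the commonly described LVS approach of lowering a "current weight" by the gcd of all weights on each pass; it is an illustration under that assumption, not the kernel code, and the function name `wrr` is hypothetical.

```python
from functools import reduce
from itertools import islice
from math import gcd


def wrr(servers):
    """Yield server names in weighted round-robin order.

    servers: list of (name, weight) pairs with non-negative integer weights.
    """
    if not servers:
        return
    weights = [w for _, w in servers]
    max_weight = max(weights)
    if max_weight == 0:
        return  # every server is quiescent, nothing to schedule
    step = reduce(gcd, weights)
    index = -1
    current = 0
    while True:
        index = (index + 1) % len(servers)
        if index == 0:
            # Start of a new pass: lower the bar, wrapping to the maximum.
            current -= step
            if current <= 0:
                current = max_weight
        name, weight = servers[index]
        if weight >= current:
            yield name


sequence = "".join(islice(wrr([("A", 4), ("B", 3), ("C", 2)]), 9))
print(sequence)
```

Servers with higher weights qualify in more passes, so they are picked first and more often.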
# With servers of differing capacity it does not perform very well, because of TCP's TIME_WAIT state
lc - Least-Connection
# The default scheduler
wlc - Weighted Least-Connection
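As a rough illustration of weighted least-connection, the sketch below picks the server with the smallest connections-to-weight ratio. It is a simplification under stated assumptions: only one connection count per server is considered, ratios are compared as floats, and the name `pick_wlc` is hypothetical.

```python
def pick_wlc(servers):
    """Return the name of the server with the lowest connections/weight ratio.

    servers: dict mapping name -> (connection_count, weight).
    Servers with weight 0 are quiescent and never chosen.
    """
    candidates = {
        name: conns / weight
        for name, (conns, weight) in servers.items()
        if weight > 0
    }
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


# Ratios: A = 10/4 = 2.5, B = 6/3 = 2.0, C = 3/1 = 3.0
print(pick_wlc({"A": (10, 4), "B": (6, 3), "C": (3, 1)}))
```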
#
sh - Source Hashing
# Netmask by which clients are grouped for persistent virtual services.
# Default: 255.255.255.255 (each client address is its own group)
-M, --netmask netmask
# Persistence timeout in seconds; the default is 300 seconds
-p, --persistent [timeout]
# persistent port
<cip, 0, vip, 0, rip, 0>
where cip is client IP address, vip is virtual IP address and rip is real server IP address.
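To make the template and the -M netmask concrete, here is a minimal sketch of a persistence table keyed on the masked client address. It ignores the persistence timeout, and the class and method names are hypothetical, not part of IPVS.

```python
import ipaddress


class PersistenceTable:
    """Maps <masked cip, 0, vip, 0> to the real server chosen for that group."""

    def __init__(self, netmask="255.255.255.255"):
        self.netmask = netmask
        self.templates = {}

    def key(self, cip, vip):
        # Apply the persistence netmask so that clients in the same
        # network share one template.
        network = ipaddress.ip_network(f"{cip}/{self.netmask}", strict=False)
        return (str(network.network_address), 0, vip, 0)

    def lookup_or_assign(self, cip, vip, choose_rip):
        """Return the remembered rip, or call choose_rip() and remember it."""
        k = self.key(cip, vip)
        if k not in self.templates:
            self.templates[k] = choose_rip()
        return self.templates[k]


table = PersistenceTable(netmask="255.255.255.0")
first = table.lookup_or_assign("10.1.2.3", "10.0.0.3", lambda: "192.168.1.2")
second = table.lookup_or_assign("10.1.2.200", "10.0.0.3", lambda: "192.168.1.3")
print(first, second)  # both clients fall in 10.1.2.0/24
```

With the default netmask of 255.255.255.255 the masked address equals the client address, so each client gets its own template.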
# Catch-all persistence example of VS/DR
# port zero here to catch all persistent services
ipvsadm -A -t virtualdomain:0 -p
ipvsadm -a -t virtualdomain:0 -r 192.168.1.2 -g
ipvsadm -a -t virtualdomain:0 -r 192.168.1.3 -g
ipvsadm -a -t virtualdomain:0 -r 192.168.1.4 -g
lblc - Locality-Based Least-Connection: assigns jobs destined
for the same IP address to the same server if the server is not
overloaded and is available; otherwise assigns jobs to servers with
fewer jobs, and keeps that choice for future assignment.
lblcr - Locality-Based Least-Connection with Replication:
assigns jobs destined for the same IP address to the
least-connection node in the server set for the IP address. If all the
nodes in the server set are overloaded, it picks a node with
fewer jobs in the cluster and adds it to the server set for the
target. If the server set has not been modified for the
specified time, the most loaded node is removed from the server
set, in order to avoid a high degree of replication.
[packet-forwarding-method]
-g, --gatewaying Use gatewaying (direct routing). This is the default.
-i, --ipip Use ipip encapsulation (tunneling)
-m, --masquerading Use masquerading (network address translation, or NAT).
-w, --weight weight
# The default is 1
# 0: quiescent server (receives no new jobs but still serves existing ones)
-x, --u-threshold uthreshold
# The default is 0 (no upper threshold is set)
# No new connections are sent to the server once the number of its connections exceeds its upper connection threshold.
# The default is 0 (not set; 3/4 * uthreshold is used when only -x is given)
# The server receives new connections again once the number
# of its connections drops below its lower connection threshold.
-y, --l-threshold lthreshold
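The two thresholds act as a hysteresis: a server is taken out of rotation above the upper threshold and only returns below the lower one. A minimal sketch of that behaviour as described in these notes, with hypothetical class and method names:

```python
class RealServer:
    """Tracks connections and an 'overloaded' flag for one real server."""

    def __init__(self, u_threshold=0, l_threshold=0):
        self.u_threshold = u_threshold
        self.l_threshold = l_threshold
        self.conns = 0
        self.overloaded = False

    def _refresh(self):
        if self.u_threshold == 0:
            # No upper threshold set: the server is never overloaded.
            self.overloaded = False
            return
        # With no lower threshold, fall back to 3/4 of the upper one.
        resume_below = self.l_threshold or self.u_threshold * 3 / 4
        if self.conns > self.u_threshold:
            self.overloaded = True
        elif self.conns < resume_below:
            self.overloaded = False

    def open_connection(self):
        self.conns += 1
        self._refresh()

    def close_connection(self):
        self.conns -= 1
        self._refresh()


server = RealServer(u_threshold=8)  # lower threshold defaults to 6
for _ in range(9):
    server.open_connection()
print(server.conns, server.overloaded)
```

Between the two thresholds the flag keeps its previous value, which avoids flapping when the connection count hovers around a single limit.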
<List>
-c, --connection
# List the current IPVS connections.
--daemon
# display the daemon status
--stats
# statistics information
--rate
# connections/second bytes/second packets/second
--thresholds
# display the upper/lower connection threshold
-n, --numeric
# Numeric output.
--persistent-conn
# Output of persistent connection information.
http://www.linuxvirtualserver.org/VS-DRouting.html
http://www.linuxvirtualserver.org/VS-IPTunneling.html
http://www.linuxvirtualserver.org/VS-NAT.html
Example:
# NAT: default route of the real servers must be set to the director
# director: echo "1" > /proc/sys/net/ipv4/ip_forward
ipvsadm -A -t 207.175.44.110:80 -s rr
ipvsadm -a -t 207.175.44.110:80 -r 192.168.10.1:80 -m
ipvsadm -a -t 207.175.44.110:80 -r 192.168.10.2:80 -m
ipvsadm -a -t 207.175.44.110:80 -r 192.168.10.3:80 -m
Virtual Server via IP Tunneling:
1. The load balancer sends requests to the real servers through an IP tunnel.
2. When a real server receives the encapsulated packet, it decapsulates the packet and processes the request.
3. Finally, it returns the result directly to the user according to its own routing table.
* The real servers can be geographically distributed.
Implementation:
real server 1
# Insert the ipip module
insmod ipip
# Bring the tunl0 device up so that the system can decapsulate ipip packets properly
ifconfig tunl0 up
# Start the hiding interface functionality
echo 1 > /proc/sys/net/ipv4/conf/all/hidden
# Hide all addresses for this tunnel device
echo 1 > /proc/sys/net/ipv4/conf/tunl0/hidden
# Configure a VIP on an alias of the tunnel device
# (VIPs are best configured on an alias of the tunnel, dummy or loopback device)
ifconfig tunl0:0 <VIP> up
load balancer:
echo 1 > /proc/sys/net/ipv4/ip_forward
ipvsadm -A -t 172.26.20.110:80 -s wlc
ipvsadm -a -t 172.26.20.110:80 -r 172.26.20.111 -i
ipvsadm -a -t 172.26.20.110:80 -r 172.26.20.112 -i
Virtual Server via Direct Routing
In the VS/TUN and VS/DR clusters, the Virtual IP (VIP) addresses are shared by both the load balancer and real servers, because they all configure the VIP address on one of their interfaces.
We must guarantee that only the load balancer answers ARP requests for the VIP, so that it accepts the incoming packets for the virtual service. The real servers (in the same network as the load balancer) must not answer ARP requests for the VIP, but must still process packets destined for the VIP locally.
# to hide interface from ARP for LVS (The hidden interface approach)
echo 1 > /proc/sys/net/ipv4/conf/all/hidden
All the real servers have their non-arp alias interface configured with the virtual IP address or redirect packets destined for the virtual IP address to a local socket, so that the real servers can process the packets locally.
The load balancer simply changes the MAC address of the data frame to that of the chosen server and retransmits it on the LAN. This is why the load balancer and each server must be directly connected to one another by a single uninterrupted segment of a LAN.
Strategies against DoS attacks
# Randomly drop entries in the connection hash table;
# entries in the SYN-RECV/SYNACK state are the ones dropped
/proc/sys/net/ipv4/vs/drop_entry
# 0: the strategy is always disabled
# 1 and 2: automatic modes; the value is set to 2 while the strategy is enabled (not enough available memory) and to 1 while it is disabled
# 3: the strategy is always enabled
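The four values can be read as a small state machine. The sketch below models only the mode switching described in these notes, assuming "low memory" means the available pages are below amemthresh (default 1024 pages); the function name is hypothetical and nothing here touches /proc.

```python
def next_drop_entry(mode, available_pages, amemthresh=1024):
    """Return (new_mode, strategy_enabled) for a drop_entry value.

    mode 0: always disabled, 3: always enabled,
    1 and 2: automatic, switching with available memory.
    """
    if mode == 0:
        return 0, False
    if mode == 3:
        return 3, True
    if mode in (1, 2):
        low_memory = available_pages < amemthresh
        return (2, True) if low_memory else (1, False)
    raise ValueError(f"unsupported drop_entry value: {mode}")


print(next_drop_entry(1, available_pages=500))   # memory is low
print(next_drop_entry(2, available_pages=4096))  # memory has recovered
```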
# Drop 1/rate of the packets before forwarding them to real servers
/proc/sys/net/ipv4/vs/drop_packet
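# Available-memory threshold, in pages, used by the automatic modes: the strategies are enabled when available memory falls below it. The default value is 1024 pages.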
/proc/sys/net/ipv4/vs/amemthresh
# Use a more complicated TCP state transition table
/proc/sys/net/ipv4/vs/secure_tcp
The timeout of secure tcp states can be tuned by the following sysctl variables:
/proc/sys/net/ipv4/vs/timeout_close
/proc/sys/net/ipv4/vs/timeout_closewait
/proc/sys/net/ipv4/vs/timeout_established
/proc/sys/net/ipv4/vs/timeout_finwait
/proc/sys/net/ipv4/vs/timeout_icmp
/proc/sys/net/ipv4/vs/timeout_lastack
/proc/sys/net/ipv4/vs/timeout_listen
/proc/sys/net/ipv4/vs/timeout_synack
/proc/sys/net/ipv4/vs/timeout_synrecv
/proc/sys/net/ipv4/vs/timeout_synsent
/proc/sys/net/ipv4/vs/timeout_timewait
/proc/sys/net/ipv4/vs/timeout_udp
ldirectord
ldirectord = Linux Director Daemon
A solution similar to keepalived
# ldirectord can easily be started and stopped by heartbeat.
/etc/ha.d/haresources
lvs1.domain.com IPaddr::10.0.0.3 ldirectord::www
# When the ldirectord is up, the IPVS routing table will be configured properly.
/etc/ha.d/www.cf
# the number of seconds until a real server is declared dead
timeout = 10
# the number of seconds between server checks
checkinterval = 10
virtual = 10.0.0.3:80
    protocol = tcp
    scheduler = wlc
    real = 192.168.0.1:80 gate 5
    real = 192.168.0.2:80 gate 5
    request = "/.testpage"
    receive = "test page"
Other