Introduction to ipvsadm


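Load balancer failover is transparent to client applications.
Connection state is synchronized over UDP multicast; each entry has the form
<Protocol, CIP:CPort, VIP:VPort, RIP:RPort, Flags, State>

Each entry takes at least 24 bytes.

5,000 connections/second => about 120 KBytes/second of synchronization traffic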
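
The bandwidth figure above is the entry size multiplied by the connection rate. A quick check in shell arithmetic, assuming the minimum entry size of 24 bytes and counting 1 KByte as 1,000 bytes (both variable names are only illustrative):

```shell
entry_bytes=24        # minimum size of one sync entry, from the note above
conns_per_sec=5000    # new connections per second, from the note above

# bytes of synchronization traffic generated per second
sync_bytes_per_sec=$((entry_bytes * conns_per_sec))
echo "$sync_bytes_per_sec bytes/second"
```

Larger entries raise the figure proportionally, so this is a lower bound rather than a measurement.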

An ipvs syncmaster daemon is started inside the kernel on the primary load balancer; it periodically multicasts the connection state held in its queue.

An ipvs syncbackup daemon is likewise started inside the kernel on each backup load balancer.

three packet-forwarding methods: NAT, tunneling, and direct routing

eight load balancing algorithms  
(round robin, weighted round robin, least-connection, weighted least-connection, locality-based least-connection, locality-based least-connection with replication,  destination-hashing, and source-hashing)




ipvsadm COMMAND [protocol] service-address [scheduling-method] [persistence options]

ipvsadm COMMAND [protocol] service-address server-address [packet-forwarding-method] [weight]


Upper-case commands (-A, -E, -D): act on virtual services

Lower-case commands (-a, -e, -d): act on real servers

-A, --add-service
# Add a virtual service

-E, --edit-service
# Edit a virtual service.

-D, --delete-service
# Delete  a  virtual  service

-C, --clear
# Clear the virtual server table.

-S, --save
# Dump the IPVS rules to stdout

-R, --restore
# Restore IPVS rules from stdin

# Change the timeout values, in seconds (0 = keep the current value)
# tcp: TCP sessions
# tcpfin: TCP sessions after receiving a FIN packet
# udp: UDP packets
--set tcp tcpfin udp

# Zero the packet, byte and rate counters
-Z, --zero [service-address]

-L, -l, --list
    -c connection table

# Start the connection synchronization daemon,
# which is implemented inside the Linux kernel
# state: master or backup
--start-daemon state [--mcast-interface interface] [--syncid syncid]
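
As an illustration of the option above, here is a dry-run sketch that prints the commands for a master and a backup daemon. The interface name `eth0` and the sync ID `1` are assumptions, not values from these notes, and the `IPVSADM` variable is just a hypothetical switch between printing and executing.

```shell
# Dry run by default: the commands are only printed.
# Run as root with IPVSADM=ipvsadm to actually start the daemons.
IPVSADM=${IPVSADM:-echo ipvsadm}
IFACE=eth0     # hypothetical multicast interface
SYNCID=1       # hypothetical sync ID, the same on master and backups

start_sync_master() { $IPVSADM --start-daemon master --mcast-interface "$IFACE" --syncid "$SYNCID"; }
start_sync_backup() { $IPVSADM --start-daemon backup --mcast-interface "$IFACE" --syncid "$SYNCID"; }

start_sync_master   # on the primary load balancer
start_sync_backup   # on each backup load balancer
```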

# host[:port]
-t, --tcp-service service-address
-u, --udp-service service-address
-r, --real-server server-address

-s, --scheduler scheduling-method

rr - Round Robin

# servers with higher weights receive new jobs first and get more jobs
wrr - Weighted Round Robin
# e.g. servers A, B and C with weights 4, 3 and 2
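
To see what "receive new jobs first and get more jobs" means for weights 4, 3 and 2, the following sketch simulates one scheduling period. It is a simplified model and not the kernel code: it assumes the scheduler walks a "current weight" down from the maximum weight in steps of the weights' greatest common divisor, and on each pass picks every server whose weight is at least the current weight.

```shell
servers="A:4 B:3 C:2"     # name:weight pairs from the example above

# greatest common divisor of two non-negative integers
gcd() (
  a=$1 b=$2
  while [ "$b" -ne 0 ]; do
    t=$((a % b)); a=$b; b=$t
  done
  echo "$a"
)

max_w=0
g=0
for s in $servers; do
  w=${s##*:}
  if [ "$w" -gt "$max_w" ]; then max_w=$w; fi
  g=$(gcd "$w" "$g")
done

# One period: lower the current weight from max_w to 1 in steps of g,
# and on each pass pick every server whose weight reaches it.
sequence=""
cw=$max_w
while [ "$cw" -gt 0 ]; do
  for s in $servers; do
    name=${s%%:*}; w=${s##*:}
    if [ "$w" -ge "$cw" ]; then sequence="$sequence$name"; fi
  done
  cw=$((cw - g))
done
echo "$sequence"
```

With these weights the script prints `AABABCABC`: A is picked 4 times, B 3 times and C twice, and the heavier servers appear earlier in the period.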

# does not balance well when servers differ in capacity, because of TCP's TIME_WAIT state
lc - Least-Connection

# the default scheduler
wlc - Weighted Least-Connection
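
A minimal sketch of the weighted least-connection choice, assuming the rule "pick the server with the smallest connections/weight ratio". The server names, weights and connection counts are hypothetical. The sketch counts only active connections, whereas the kernel's measure also takes inactive connections into account.

```shell
# pick_wlc "name:weight:conns ...": print the server with the lowest conns/weight
pick_wlc() {
  best="" best_w=0 best_c=0
  for s in $1; do
    name=${s%%:*}; rest=${s#*:}
    w=${rest%%:*}; c=${rest#*:}
    [ "$w" -gt 0 ] || continue      # weight 0 = quiescent, gets no new jobs
    # c/w < best_c/best_w  <=>  c*best_w < best_c*w  (integer-only comparison)
    if [ -z "$best" ] || [ $((c * best_w)) -lt $((best_c * w)) ]; then
      best=$name; best_w=$w; best_c=$c
    fi
  done
  echo "$best"
}

# hypothetical snapshot: A has 10/4 = 2.5, B has 3/2 = 1.5, C has 2/1 = 2
pick_wlc "A:4:10 B:2:3 C:1:2"
```

Here B wins even though C has fewer connections, because B's load relative to its weight is the lowest.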

sh  - Source Hashing

# clients are grouped by this netmask for persistent virtual services
# Default: 255.255.255.255 (persistence is per client host)
-M, --netmask netmask
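
The grouping can be pictured as a bitwise AND of the client address with the netmask: clients that yield the same result share one persistence entry. A small sketch, with hypothetical client addresses and a hypothetical helper name `persistence_group`:

```shell
# persistence_group IP MASK: print the group (network) a client falls into
persistence_group() (
  IFS=.
  set -- $1 $2              # split both dotted quads into 8 octets
  echo "$(($1 & $5)).$(($2 & $6)).$(($3 & $7)).$(($4 & $8))"
)

persistence_group 192.0.2.77  255.255.255.0      # 192.0.2.0
persistence_group 192.0.2.200 255.255.255.0      # 192.0.2.0, same group as above
persistence_group 192.0.2.77  255.255.255.255    # 192.0.2.77, one group per host
```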

# default of 300 seconds
-p, --persistent [timeout]

# persistent port (port 0): the connection template is
<cip, 0, vip, 0, rip, 0>
where cip is the client IP address, vip is the virtual IP address and rip is the real server IP address.

# Catch-all persistence example for VS/DR
# port zero is used here to catch all persistent services
# <RIP1>..<RIP3> are placeholders for the real server addresses
ipvsadm -A -t virtualdomain:0 -p
ipvsadm -a -t virtualdomain:0 -r <RIP1> -g
ipvsadm -a -t virtualdomain:0 -r <RIP2> -g
ipvsadm -a -t virtualdomain:0 -r <RIP3> -g

lblc - Locality-Based Least-Connection: assigns jobs destined
         for the same IP address to the same server if the server is
         available and not overloaded; otherwise assigns the jobs to a
         server with fewer jobs and keeps it for future assignment.
lblcr - Locality-Based Least-Connection with Replication:
          assigns jobs destined for the same IP address to the
          least-connection node in the server set for that IP address.
          If all the nodes in the server set are overloaded, it picks a
          node with fewer jobs in the cluster and adds it to the server
          set for the target. If the server set has not been modified
          for the specified time, the most loaded node is removed from
          the server set, in order to avoid a high degree of replication.


-g, --gatewaying       Use gatewaying (direct routing). This is the default.
-i, --ipip             Use ipip encapsulation (tunneling).
-m, --masquerading     Use masquerading (network address translation, or NAT).

-w, --weight weight
# The default is 1
# 0: quiescent server (no new jobs, but existing jobs are still served)

-x, --u-threshold uthreshold
# The default is 0 (no upper threshold is set)
# No new connections are sent to the server while the number of its
# connections exceeds its upper connection threshold.

# The default is 0 (no lower threshold is set; if only uthreshold is set,
# 3/4 * uthreshold is used instead)
# The server receives new connections again once the number
# of its connections drops below its lower connection threshold.
-y, --l-threshold lthreshold
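
The two thresholds form a hysteresis: a server stops receiving new connections above the upper threshold and resumes only below the lower one. The sketch below models that behaviour as described in these notes. The threshold value of 100 and the function name `update_overload` are hypothetical, and the kernel's exact comparisons may differ.

```shell
upper=100      # hypothetical -x value
lower=0        # -y left at its default
overloaded=0   # 1 = server receives no new connections

# update_overload CONNS: refresh the overloaded flag for a connection count
update_overload() {
  conns=$1
  resume_below=$lower
  if [ "$resume_below" -eq 0 ]; then
    resume_below=$((upper * 3 / 4))    # fallback when no lower threshold is set
  fi
  if [ "$conns" -gt "$upper" ]; then
    overloaded=1
  elif [ "$conns" -lt "$resume_below" ]; then
    overloaded=0
  fi
}

for conns in 50 101 90 74; do
  update_overload "$conns"
  echo "conns=$conns overloaded=$overloaded"
done
```

At 90 connections the server is still marked overloaded even though it is back under the upper threshold; it is released only at 74, below 3/4 of 100.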


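-c, --connection
# With -L, list the current IPVS connections.

--daemon
# With -L, display the daemon status

--stats
# With -L, output statistics information

--rate
# With -L, output rate information: connections/second, bytes/second, packets/second

--thresholds
# With -L, display the upper/lower connection threshold of each server

-n, --numeric
# Numeric output.

--persistent-conn
# With -L, output persistent connection information.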



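Virtual Server via NAT:

# the default route of the real servers must be set to the director
# on the director: echo "1" > /proc/sys/net/ipv4/ip_forward
# <VIP>, <RIP1>..<RIP3> and <port> are placeholders

ipvsadm -A -t <VIP>:<port> -s rr
ipvsadm -a -t <VIP>:<port> -r <RIP1>:<port> -m
ipvsadm -a -t <VIP>:<port> -r <RIP2>:<port> -m
ipvsadm -a -t <VIP>:<port> -r <RIP3>:<port> -m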
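
For a concrete picture of the NAT setup, here is a dry-run sketch that prints one `-A` command and one `-a ... -m` command per real server. All addresses and the port are hypothetical examples (a documentation-range VIP and private real server addresses), and the `IPVSADM` variable is only an illustrative switch between printing and executing.

```shell
# Dry run by default: the commands are only printed.
# Run as root with IPVSADM=ipvsadm to actually apply them.
IPVSADM=${IPVSADM:-echo ipvsadm}
VIP=192.0.2.10                                  # hypothetical virtual IP
PORT=80                                         # hypothetical service port
REAL_SERVERS="10.0.0.11 10.0.0.12 10.0.0.13"    # hypothetical real servers

setup_nat() {
  # virtual service with round-robin scheduling
  $IPVSADM -A -t "$VIP:$PORT" -s rr
  # one real server entry each, forwarded by masquerading (NAT)
  for rip in $REAL_SERVERS; do
    $IPVSADM -a -t "$VIP:$PORT" -r "$rip:$PORT" -m
  done
}

setup_nat
```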

Virtual Server via IP Tunneling:


1. The load balancer sends requests to the real servers through an IP tunnel.

2. When a real server receives the encapsulated packet, it decapsulates the packet and processes the request.

3. Finally, it returns the result directly to the user according to its own routing table.

* the real servers can be geographically distributed


real server 1

# Insert the ipip module

insmod ipip

# Bring the tunl0 device up
# so that the system can decapsulate ipip packets properly

ifconfig tunl0 up

# Start the hiding interface functionality

echo 1 > /proc/sys/net/ipv4/conf/all/hidden

# Hide all addresses for this tunnel device

echo 1 > /proc/sys/net/ipv4/conf/tunl0/hidden

# Configure a VIP on an alias of the tunnel device
# (VIPs can also be configured on aliases of a dummy or loopback device)

ifconfig tunl0:0 <VIP> up

load balancer:

echo 1 > /proc/sys/net/ipv4/ip_forward
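# <VIP>, <RIP1>, <RIP2> and <port> are placeholders
ipvsadm -A -t <VIP>:<port> -s wlc
ipvsadm -a -t <VIP>:<port> -r <RIP1> -i
ipvsadm -a -t <VIP>:<port> -r <RIP2> -i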


Virtual Server via Direct Routing


In the VS/TUN and VS/DR clusters, the Virtual IP (VIP) address is shared by the load balancer and the real servers, because they all configure the VIP on one of their interfaces.

We must guarantee that only the load balancer answers ARP requests for the VIP, so that it accepts the incoming packets for the virtual service. The real servers (on the same network as the load balancer) must not answer ARP requests for the VIP, but must still be able to process packets destined for the VIP locally.

# to hide the interface from ARP for LVS (the hidden interface approach)

echo 1 > /proc/sys/net/ipv4/conf/all/hidden

All the real servers have a non-ARP alias interface configured with the virtual IP address, or redirect packets destined for the virtual IP address to a local socket, so that they can process the packets locally.

The load balancer simply changes the MAC address of the data frame to that of the chosen server and retransmits it on the LAN. This is why the load balancer and each server must be directly connected to one another by a single uninterrupted segment of a LAN.
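
The `hidden` flag used in these notes comes from a separate kernel patch. On kernels without it, a commonly used alternative (a different technique, not the one described above) is to set the `arp_ignore` and `arp_announce` sysctls on the real servers, with the VIP configured on a loopback alias. A dry-run sketch, where the `SYSCTL` variable is only an illustrative switch between printing and executing:

```shell
# Dry run by default: the commands are only printed.
# Run as root with SYSCTL=sysctl to actually apply them on a real server.
SYSCTL=${SYSCTL:-echo sysctl}

hide_vip_from_arp() {
  # answer ARP only for addresses configured on the interface the request arrived on
  $SYSCTL -w net.ipv4.conf.all.arp_ignore=1
  # never announce the VIP as the source address in ARP requests
  $SYSCTL -w net.ipv4.conf.all.arp_announce=2
}

hide_vip_from_arp
```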


Strategies against DoS attacks


# drop_entry: randomly drop entries in the connection hash table;
# it drops entries that are in the SYN-RECV/SYNACK state

 controlled by "/proc/sys/net/ipv4/vs/drop_entry"
 # 0 means that this strategy is always disabled; 1 and 2 mean automatic modes
 # in automatic mode the value is 2 while the strategy is enabled and 1 while it is disabled
 # 3 means that the strategy is always enabled.
# drop_packet: drop 1/rate packets before forwarding them to real servers
 controlled by "/proc/sys/net/ipv4/vs/drop_packet"
 # the automatic modes switch on when available memory falls below a threshold (amemthresh); its default value is 1024 pages.
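
As a worked example of the packet-drop rate in automatic mode, the sketch below assumes the formula rate = amemthresh / (amemthresh - available_memory), as given in the kernel's IPVS sysctl documentation to the best of my recollection. The available-memory values are hypothetical, and `drop_rate` is an illustrative helper, not a real command.

```shell
amemthresh=1024      # available-memory threshold in pages (the default noted above)

# drop_rate AVAILABLE_PAGES: print the rate; 1 of every <rate> packets is dropped
drop_rate() {
  available=$1
  if [ "$available" -lt "$amemthresh" ]; then
    echo $((amemthresh / (amemthresh - available)))
  else
    echo 0           # 0 here means "strategy not active" in this sketch
  fi
}

drop_rate 768    # 1024 / 256 = 4: drop 1 packet in 4
drop_rate 512    # 1024 / 512 = 2: drop 1 packet in 2
drop_rate 2048   # enough memory: strategy not active
```

The less memory is available, the smaller the rate and therefore the larger the share of dropped packets.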
# secure_tcp: use a more complicated state transition table
The timeouts of the secure TCP states can be tuned by the following sysctl variables:





ldirectord  = Linux Director Daemon

A solution similar to keepalived

# ldirectord can easily be started and stopped by heartbeat.


lvs1.domain.com IPaddr:: ldirectord::www

# When the ldirectord is up, the IPVS routing table will be configured properly.


# the number of seconds until a real server is declared dead
timeout = 10

# the number of seconds between server checks
checkinterval = 10

virtual =
     protocol = tcp
     scheduler = wlc
     real = gate 5
     real = gate 5
     request = "/.testpage"
     receive = "test page"