symmetricds (DB replicate)

最後更新: 2017-06-28

介紹

It can also be used to replicate files and directories between multiple hosts.

It uses a light-weight, web-based protocol to send and receive data, which makes it easy to work with firewalls.

Replication is done in the background asynchronously, allowing data changes in offline mode.

HomePage: https://www.symmetricds.org

功能

 * partial tables replicate (filtered synchronization)
 * table names and column names be re-mapped
 * data transformation
 * connections between instances of SymmetricDS require password authentication.
 * Data can be encrypted by configuring node addresses to use HTTPS
 * Extendable - Scripts and Java code can be configured to handle events, transform data, and create customized behavior.

Requirements

JRE > 7

 


原理

 

Node

A node is responsible for synchronizing the data from a database or file system with other nodes in the network using HTTP.

Nodes are assigned to one of the node Groups that are configured together as a unit.

The node groups are linked together with Group Links to define either a push or pull communication.

A pull causes one node to connect with other nodes and request changes that are waiting,

while a push causes one node to connect with other nodes when it has changes to send.

Each node is connected to a database with a Java Database Connectivity (JDBC) driver using a connection URL, username, and password.

At startup, SymmetricDS looks for Node Properties Files and starts a node for each file it finds, which allows multiple nodes to run in the same instance and share resources. The property file for a node contains its external ID, node group, registration server URL, and database connection information. The external ID is the name for a node used to identify it from other nodes. One node is configured as the registration server where the master configuration is stored. When a node is started for the first time, it contacts the registration server using a registration process that sends its external ID and node group. In response, the node receives its configuration and a node password that must be sent as authentication during synchronization with other nodes.

Triggers are installed in the database to guarantee that data changes are captured.
(without any special driver software.)

each change is given a sequence number
(timestamps are not used.)

Data changes are grouped together and given a batch number to route through the system.

Channels

Channels define those logical groupings of data.

i.e.

a set of tables that hold customer data might be logically grouped together in a Customer channel.

 


Delay

 

The user configures a delay in milliseconds for when the changes are replicated to the target system.

 

 


Running as a Windows Service

 

To install the service, run the following command as Administrator:

bin\sym_service.bat install
bin\sym_service.bat uninstall

bin\sym_service.bat start
bin\sym_service.bat stop

 


Running as Standalone

 

# starts a standalone instance of SymmetricDS using the built-in Jetty web server

bin\sym

# Configure File

conf/symmetric-server.properties

# Enable synchronization over HTTP.
http.enable=true
http.port=31415
....
# Enable Java Management Extensions (JMX) web console.
jmx.http.enable=true
jmx.http.port=31416

conf/sym_service.conf

# Initial Java Heap Size (in MB)
wrapper.java.initmemory=256

# Maximum Java Heap Size (in MB)
wrapper.java.maxmemory=1024

# Log level (SEVERE, WARNING, INFO, CONFIG, FINE, FINER, FINEST)
wrapper.logfile.loglevel=INFO

# Size of log file when it will rotate to next file
wrapper.logfile.maxsize=10m

# Number of log files to rotate and keep
wrapper.logfile.maxfiles=3

 


Tools

 

SymmetricDS service (sym_service)

command line launcher (sym)

administrative tools (symadmin)

utilities (dbexport, dbimport, dbfill)

 


Demo 環境

 

# samples

cd samples

# Create the sample tables

..\bin\dbimport --engine corp-000 --format XML --alter-case create_sample.xml

# Next, create the SymmetricDS tables (Tables are automatically created when the server starts for the first time.)

..\bin\symadmin --engine corp-000 create-sym-tables

# Load the SymmetricDS configuration and sample item

..\bin\dbimport --engine corp-000 insert_sample.sql

# All three nodes run in the same instance.

bin\sym

 


Initial Load

 

An initial load is the process of seeding tables at a target node with data from a source node.

Instead of capturing data, data is selected from the source table using a SQL statement and then it is streamed to the client.

Initial Load by command

symadmin --engine source-000 reload-node 001

console output

....
Successfully enabled initial load for node 001

log

2017-06-28 17:28:38,453 INFO [store-001] [H2DatabasePlatform] [dataloader-corp-000-1] 
    Tables up to date.  No alters found for ITEM_SELLING_PRICE
2017-06-28 17:28:38,468 INFO [store-001] [ConfigurationChangedDatabaseWriterFilter] [dataloader-corp-000-1] 
    About to refresh the cache of node security because new configuration came through the data loader
2017-06-28 17:28:38,468 INFO [store-001] [PullService] [store-001-pull-default-7] 
    Pull data received from corp:000:000 on channel thread default.  9 rows and 7 batches were processed

Initial Load by SQL

update sym_node_security set initial_load_enabled = 1 where node_id = '001';

* load 後會 initial_load_id 會加 1 及 update "initial_load_time"

 

 

 


verify

 

# verify the databases by logging in

bin\dbsql --engine corp-000

# 在 corp-000 加入 Data

insert into item (item_id, name) values (110000055, 'Soft Drink');
insert into item_selling_price (item_id, store_id, price)
        values (110000055, '001', 0.65);
insert into item_selling_price (item_id, store_id, price)
        values (110000055, '002', 1.00);

# 查看 console output

[store-002] - PullService - Pull data received from corp:000:000 on channel thre
ad default.  2 rows and 1 batches were processed
[store-001] - PullService - Pull data received from corp:000:000 on channel thre
ad default.  2 rows and 1 batches were processed

# Verify Outgoing Batches (on source)

# login "corp-000"

bin\dbsql --engine corp-000

# 改用 list 形式看 BATCH

list

# sql

select * from sym_outgoing_batch order by batch_id desc limit 1;

BATCH_ID                : 534
NODE_ID                 : -1
CHANNEL_ID              : config
STATUS                  : OK
LOAD_ID                 : -1
EXTRACT_JOB_FLAG        : 0
LOAD_FLAG               : 0
ERROR_FLAG              : 0
COMMON_FLAG             : 0
IGNORE_COUNT            : 0
BYTE_COUNT              : 0
EXTRACT_COUNT           : 0
SENT_COUNT              : 0
LOAD_COUNT              : 0
DATA_EVENT_COUNT        : 1
RELOAD_EVENT_COUNT      : 0
INSERT_EVENT_COUNT      : 0
UPDATE_EVENT_COUNT      : 1
DELETE_EVENT_COUNT      : 0
OTHER_EVENT_COUNT       : 0
ROUTER_MILLIS           : 0
NETWORK_MILLIS          : 0
FILTER_MILLIS           : 0
LOAD_MILLIS             : 0
EXTRACT_MILLIS          : 0
TRANSFORM_EXTRACT_MILLIS: 0
TRANSFORM_LOAD_MILLIS   : 0
TOTAL_EXTRACT_MILLIS    : 0
TOTAL_LOAD_MILLIS       : 0
SQL_STATE               : null
SQL_CODE                : 0
SQL_MESSAGE             : null
FAILED_DATA_ID          : 0
FAILED_LINE_NUMBER      : 0
LAST_UPDATE_HOSTNAME    : S08R2
LAST_UPDATE_TIME        : 2017-06-28 17:28:29.779
CREATE_TIME             : 2017-06-28 17:28:29.779
CREATE_BY               : null
SUMMARY                 : sym_node_security
(1 row, 0 ms)

# Verify Incoming Batches (on destination)

bin\dbsql --engine store-001

list
select * from sym_incoming_batch order by batch_id desc limit 1;

output

BATCH_ID             : 535
NODE_ID              : 000
CHANNEL_ID           : heartbeat
STATUS               : OK
ERROR_FLAG           : 0
NETWORK_MILLIS       : 0
FILTER_MILLIS        : 0
DATABASE_MILLIS      : 16
FAILED_ROW_NUMBER    : 0
FAILED_LINE_NUMBER   : 0
BYTE_COUNT           : 766
STATEMENT_COUNT      : 1
FALLBACK_INSERT_COUNT: 0
FALLBACK_UPDATE_COUNT: 0
IGNORE_COUNT         : 0
IGNORE_ROW_COUNT     : 0
MISSING_DELETE_COUNT : 0
SKIP_COUNT           : 0
SQL_STATE            : null
SQL_CODE             : 0
SQL_MESSAGE          : null
LAST_UPDATE_HOSTNAME : S08R2
LAST_UPDATE_TIME     : 2017-06-28 17:41:25.255
CREATE_TIME          : 2017-06-28 17:41:25.239
SUMMARY              : sym_node_host
(1 row, 15 ms)

 


cli - dbsql

 

 a sql shell for database

-e,--engine <arg>

The name of a configured engine. 

The name should correspond to a engine.name setting in one of the properties files in the engines directory.

log

--no-log-console     No output will be sent to the console.

--no-log-file           No output will be sent to the log file. (logs/symmetric.log)

-p,--properties <arg>  

The properties file with settings for the SymmetricDS engine. 

If not provided, defaults are used, then overridden with first symmetric.properties in classpath,

then overridden with symmetric.properties values in user.home directory.

--sql <arg>

Run this sql statement in the shell

 


cli - symadmin

 

list-engines

symadmin list-engines

Log output will be written to C:\symmetric-server-3.8.26/logs/symmetric.log
Engines directory is C:\symmetric-server-3.8.26\engines
The following engines and properties files are available:
corp-000 -> C:\symmetric-server-3.8.26\engines\corp-000.properties
store-001 -> C:\symmetric-server-3.8.26\engines\store-001.properties
store-002 -> C:\symmetric-server-3.8.26\engines\store-002.properties
0 engines returned

open-registration

Usage

symadmin --engine <engine name> open-registration  <node group> <external id>

ie.

symadmin --engine source-000 open-registration dest 001

 


A -> B

 

過程

1. create source properties file

2. create properties source file

3. test properties file (dbsql -e source-000 / dbsql -e dest-000)

4. create "sym_" tables (symadmin --engine source-000 create-sym-tables)

5. create Group, Group Links, Router, Trigger, Trigger Router record

6. start engines (bin\sym)

7. open registration (symadmin --engine source-000 open-registration DST 001)

8. Initial Load (symadmin --engine source-000 reload-node 001)

[1] Connect Source Database

# This is an arbitrary name that is used to access a specific engine using an HTTP URL
# The engine name is also used for the domain name of registered JMX beans.

engine.name=centralNode

# The node group that this node is a member of.

group.id=corp

# The external id for this node has meaning to the user and provides integration into the system where it is deployed.
# For example, it might be a retail store number or a region number.
# each node has a unique sequence number for tracking synchronization events.

external.id=000

# contacted for synchronization. 
# At startup and during each heartbeat, the node updates its entry in the database with this URL.

sync.url=http://localhost:31415/sync/centralNode

# where this node can connect for registration to receive its configuration.
# leave this blank for the source node.
# typically equal to the value of the sync.url of the registration server.

registration.url=

db.driver={JDBC driver class}
db.url={JDBC url for your source database}
db.user=root
db.password=xxxxxx

[3] Connect Target Database

engine.name={an arbitrary name for the node}
group.id={this must match the name of a group inserted in step 3, ex: store}
external.id={id for the node, ex: 001 }
sync.url=http://{hostname}:{port}/{webcontext}/sync/{engine.name}
registration.url={this should match the sync.url in your source property file}
db.driver={JDBC driver class}
db.url={JDBC url for your target database}
db.user={JDBC username}
db.password={JDBC password}

Test properties File by show tables

# Source Side (postgresql)
# Output: schemaname|tablename|tableowner|tablespace|has indexes|hasrules|hastriggers|rowsecurity

dbsql -e source-000

SELECT * FROM pg_catalog.pg_tables;

# Dest. Side (mssql)
# Output: TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME

dbsql -e dest-001

SELECT * FROM information_schema.TABLES;

[2] Configure

 * minimum configuration

 [1] Group
 [2] Group Links       <--  complete registration
 [3] Channels
 [4] Router
 [5] Trigger
 [6] Trigger Router

Groups

insert into SYM_NODE_GROUP
        (node_group_id, description)
        values ('SRC', 'A corporate node');

insert into SYM_NODE_GROUP
        (node_group_id, description)
        values ('DST', 'A retail store node');

Group Links (PUT/GET)

Group links define at a high level how data moves throughout your synchronization scenario.

insert into SYM_NODE_GROUP_LINK
(source_node_group_id, target_node_group_id, data_event_action)
values 
('SRC', 'DST', 'W');

# 雙向連接要加以下 setting
# insert into SYM_NODE_GROUP_LINK
# (source_node_group_id, target_node_group_id, data_event_action)
# values 
# ('DST', 'SRC', 'P');

Wait for Pull [W]

Indicates nodes in the source node group will wait for a node in the target node group to connect via an HTTP GET

Push [P]

Indicates that nodes in the source node group will initiate communication over an HTTP PUT

Route-only [R]

data isn’t exchanged between nodes in the source and nodes in the target node groups

# 當 link 不是相向時, 要注意 P, W

2017-07-06 11:00:50,932 WARN [source-000] [PushService] [source-000-push-default-2] 
Cannot push to node '001' in the group 'dest'.  The sync url is blank

Router

It define more specifically which captured data (Router Type) from a source node

should be sent to which specific nodes in a target node group, all within the context of the node group link.

# defines a router that will send data from the 'corp' group to the 'store' group

insert into SYM_ROUTER
(router_id, source_node_group_id, target_node_group_id, create_time, last_update_time) 
values 
('myRouter','SRC', 'DST', current_timestamp, current_timestamp);

Router Type

  • default  # sends all captured data to all nodes that belong to the target node group
  • java      # A router that executes a Java expression in order to select nodes to route to.
  • bsh       # A router that executes a Bean Shell script expression in order to select nodes to route to
  • ....

Other Setting

  • Sync on Update
  • Sync on Insert
  • Sync on Delete
  • Target Table (target table name is different than the source)

Channels

define logical groupings(transaction data grouped into a transaction channel)

The default channel is called 'default'. (automatically created that all tables will fall into)

 * Transactions will NOT be preserved across channels

    so its important to setup channels to contain all tables that participate in a given transaction.

Trigger

SymmetricDS captures synchronization data using database triggers.

Each record is used by SymmetricDS when generating database triggers.

Database triggers are only generated when a trigger is associated with a ROUTER whose source_node_group_id matches the node group id of the current node.

insert into SYM_TRIGGER 
(trigger_id, source_table_name, channel_id, last_update_time, create_time)
values 
('mytable1', 'mytable1', 'default', current_timestamp, current_timestamp);

Trigger Router

maps the router('myRouter')to the trigger('mytable1')

insert into SYM_TRIGGER_ROUTER 
(trigger_id, router_id, initial_load_order, create_time, last_update_time) 
values 
('mytable1', 'myRouter', 1, current_timestamp, current_timestamp);

One way replication

Table triggers (entries in sym_trigger) specify which tables need to be synchronized.

Trigger router entries match a trigger to a router, specifying where (in which direction) the data for those tables should be routed or sent.

So, in the sample, you have a trigger entry for a table 'item' which is connected to a router corp_2_store by a trigger/router entry.

That specifies that the item table will be sent (one way) from the corp node group to the store node group.

If you added a second trigger_router entry tying the item table trigger to the 'store_2_corp' router, the the item table would be sent bi-drectionaly from corp to store and store to corp.

 


Connection Setting

 

SymmetricDS will search the following locations for database properties:

1. conf/symmetric.properties
2. CLASSPATH:symmetric-override.properties
3. engines/*.properties

 * If the properties are changed in conf/symmetric.properties they will take effect across all engines deployed to the server.

Connection setting

db.driver          # The class name for the JDBC driver
db.url
db.user
db.password
 

driver & url

mySQL (com.mysql.jdbc.Driver)

jdbc:mysql://localhost/sampleroot?tinyInt1isBit=false&zeroDateTimeBehavior=convertToNull

PostgreSQL (org.postgresql.Driver)

jdbc:postgresql://localhost/sampleroot

mssql (net.sourceforge.jtds.jdbc.Driver)

jdbc:jtds:sqlserver://[sql-server:port]/[Database-Name];useCursors=true;bufferMaxMemory=10240;lobBuffer=5242880

H2 (org.h2.Driver)

jdbc:h2:sampleroot;AUTO_SERVER=TRUE;LOCK_TIMEOUT=60000

 


Other properties file setting

 

# This is how often the routing job will be run in milliseconds (Default: 10000)
job.routing.period.time.ms=5000

# This is how often the push job will be run. (Default: 60000)
job.push.period.time.ms=10000

# This is how often the pull job will be run. (Default: 60000)
job.pull.period.time.ms=10000

By default, an initial load will not create the table on the target if it doesn’t already exist.

# Kick off initial load (create the table on the target if it is not present)
initial.load.create.first=true

# delete data on the target node
initial.load.delete.first=true

 * The initial.load.delete.first needs enabled for the source node that sends the initial load.

    It's used when the initial load first starts to create batches.

    You'll see batches that contain rows in sym_data with an event_type of S that contains the delete statement.

 


Node

 

A node is represented by four tables:

  • NODE (basic node information)
  • NODE_IDENTITY (identifies the current node)
  • NODE_SECURITY (Contains a password need to authenticate with another node)
  • NODE_HOST (informational data about the node(IP,OS,Arch,CPU,Java Version ...))

Stopping a Node

Installed nodes are started automatically when the SymmetricDS server is started. An individual node instance can be stopped while other nodes continue to run.

bin/jmx --bean org.jumpmind.symmetric.<engine name>:name=Node --method stop

Uninstalling a Node

Uninstalling a node will remove all SymmetricDS database artifacts and delete the engine’s property file.

bin/symadmin --engine <engine name> uninstall

Registration

When a node is registered, it downloads its SymmetricDS configuration as well as references to the nodes that it should sync with.

A node is considered unregistered if it does not have an NODE_IDENTITY row.

open registration

# A node is considered unregistered if it does not have an "NODE_IDENTITY" row.

# Nodes are only allowed to register if rows exist for the registering node and the registration_enabled(NODE_SECURITY) flag is set to 1.

# Node registration is stored in the NODE and NODE_SECURITY tables.

symadmin --engine <engine name> <node group> <external id>

node group link

Cannot register a client node unless a node group link exists so the registering node can receive configuration updates. 

Please add a group link where the source group id is source and the target group id is dest

insert into SYM_NODE_GROUP_LINK \
(source_node_group_id, target_node_group_id, data_event_action) \
values ('source', 'dest', 'P');

 


Tables

 

sym_node

EXTERNAL_ID

A domain-specific identifier for context within the local system. For example, the retail store number.

SYNC_ENABLED

Indicates whether this node should be sent synchronization.

Disabled nodes are ignored by the triggers, so no entries are made in data_event for the node.

CREATED_AT_NODE_ID

The node_id of the node where this node was created.

This is typically filled automatically with the node_id found in node_identity where registration was opened for the node.

DEPLOYMENT_TYPE

An indicator as to the type of SymmetricDS software that is running.

Possible values are, but not limited to: engine, standalone, war, professional, mobile

sym_node_security

NODE_PASSWORD

The password used by the node to prove its identity during synchronization.
The master node only needs this row if it is going to initiate communicate with another node.

REGISTRATION_ENABLED

Indicates whether registration is open for this node. Re-registration may be forced for a node if this is set back to '1' in a parent database for the node_id that should be re-registred.

INITIAL_LOAD_ENABLED (Default: 0)

Indicates whether an initial load will be sent to this node.

sym_channel

Processing Order

processed in ascending order

Batch Algorithm

Batching is the grouping of data, by channel, to be transferred and committed at the client together.

Default

All changes that happen in a transaction are guaranteed to be batched together. Multiple transactions will be batched and committed together until there is no more data to be sent or the max_batch_size is reached.

Group Link Direction

override the default group link communication.("push" / "pull" )

Enabled

Indicates whether the channel is enabled or disabled. If a channel is disabled, data is still captured for changes that occur on the source system, but it will not be routed and sent to the target until the channel is re-enabled.

 


Jobs

 

Most work done by SymmetricDS is initiated by jobs.
Jobs are tasks that are started and scheduled by a job manager.

job.<name>.period.time.ms

 


TRIGGER_HIST

 

SymmetricDS installs database triggers to capture changes in the DATA table.

A record of the triggers that were installed and
what columns are being captured is stored in the TRIGGER_HIST table.

reason a trigger was rebuilt

N     New trigger that has not been created before
S     Schema changes in the table were detected
C     Configuration changes in Trigger
T     Trigger was missing

 


Disabling Synchronization

 

# 0: Disable the extraction of all channels

#    with the exception of the config channel
dataextractor.enable=1

 


Offline Synchronization

 

Configuring a node as offline will still allow changes to be captured and batched for replication.

# offline push and pull jobs

INSERT INTO sym_parameter
(external_id, node_group_id, param_key, param_value, create_time, last_update_by, last_update_time) VALUES
('ALL', 'ALL', 'start.offline.pull.job', 'true', current_timestamp, 'userid', current_timestamp);

INSERT INTO sym_parameter
(external_id, node_group_id, param_key, param_value, create_time, last_update_by, last_update_time) VALUES
('ALL', 'ALL', 'start.offline.push.job', 'true', current_timestamp, 'userid', current_timestamp);

# node that should be offline

INSERT INTO sym_parameter
(external_id, node_group_id, param_key, param_value, create_time, last_update_by, last_update_time) VALUES
('001', 'STORE', 'node.offline', 'true', current_timestamp, 'userid', current_timestamp);

outgoing batches intended for the offline node are now written as files(csv) <= in outgoing folder

Other nodes are unaffected and will continue to synchronize normally

Instead the local file system will be used for replication.

It is up to the user transport batch (*.csv) files to and from the node based on incoming or outgoing changes.

outgoing folder -> incoming folder

 


Conflict Strategy

 

source_wins
target_wins
manual

conflict table

Defines how conflicts in row data should be handled during the load process.

 


Outgoing batches

 

Outgoing batches are delivered to the target node when the source node pushes or when the target node pulls.

# show outgoing synchronization failures by node:

select count(*), node_id from sym_outgoing_batch
  where error_flag=1 group by node_id;

# how the number of data rows that have not been delivered to target nodes:

select sum(data_event_count), node_id from sym_outgoing_batch
  where status != 'OK' group by node_id;

# To locate batches in error

select * from sym_outgoing_batch where error_flag=1;

# Causing SymmetricDS to skip the batch completely.

update sym_outgoing_batch set status='OK' where batch_id='XXXXXX'

 


Startup / RuntimeParameters

 

* Runtime parameters are read periodically from properties files or the database.

Startup Parameters

auto.config.database (Default: true)

If this is true, when symmetric starts up it will try to create the necessary tables.

auto.sync.config.at.startup

If this is true, then check if configuration should be pulled from registration server at startup.

If the config version in the database does not match the software version, it will pull config.

auto.sync.configuration

Capture and send SymmetricDS configuration changes to client nodes.

engine.name

external.id

registration.url

sync.url

group.id

db.driver

db.url

db.user

db.password

sync.table.prefix (Default: sym)

Runtime parameters

initial.load.create.first (Default: false)

job.purge.period.time.ms

# Don't muddy the waters with purge logging

job.purge.period.time.ms=7200000

The Purge Outgoing Job is responsible to purging outgoing data that has successfully been loaded at the target and is older than purge.retention.minutes.

This job purges the following tables:

  • DATA
  • DATA_EVENT
  • OUTGOING_BATCH
  • EXTRACT_REQUEST

  


Cheat List

 

select * from sym_router;

select * from sym_trigger;

select * from sym_trigger_router;

update sym_node_security set initial_load_enabled = 1 where node_id = '001';

select sum(data_event_count), node_id from sym_outgoing_batch where status != 'OK' group by node_id;

 

 


Doc

 

https://www.symmetricds.org/doc/3.8/html/tutorials.html

https://www.symmetricds.org/doc/3.8/html/user-guide.html

https://www.symmetricds.org/docs/how-to/connect-to-database

 


Other

 

http://datahunter.org/h2_database

 

 

Creative Commons license icon Creative Commons license icon