Elasticsearch Architecture
Elasticsearch is distributed, which means that indices can be divided into shards and
each shard can have zero or more replicas.
Each node hosts one or more shards and acts as a coordinator to delegate operations to the correct shard(s).
Rebalancing and routing are done automatically.
Elasticsearch itself is the data store: documents are saved as JSON in indices, not in a separate database.
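As a concrete example of shards and replicas, an index can be created with explicit counts; the index name "my-index" and the numbers below are illustrative, not defaults:
# Create an index with 3 primary shards and 1 replica per shard
curl -X PUT -H "Content-Type: application/json" 'http://localhost:9200/my-index' -d '
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}'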
Elasticsearch Node
Elasticsearch master node
controls the Elasticsearch cluster, processing one cluster state at a time and
broadcasting the state to all other nodes.
The master node is in charge of all cluster-wide operations,
including the creation and deletion of indices.
Elasticsearch data node
contains data and the inverted index. This is the default configuration for nodes.
Elasticsearch client node
serves as a load balancer that routes incoming requests to the appropriate cluster nodes.
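A rough sketch of how these roles can be assigned in elasticsearch.yml on Elasticsearch 7.9+ (the node.roles values below are illustrative; a node with no node.roles line takes all default roles, which is the default configuration mentioned above):
# Dedicated master-eligible node
node.roles: [ master ]
# Data node
node.roles: [ data ]
# Coordinating-only ("client") node: an empty role list
node.roles: [ ]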
Installing Java and Elasticsearch on Ubuntu 20.04
apt update -y
apt-get install -y wget perl   # shasum is provided by the perl package
apt install default-jdk -y
java -version
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.17.6-amd64.deb
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.17.6-amd64.deb.sha512
shasum -a 512 -c elasticsearch-7.17.6-amd64.deb.sha512
dpkg -i elasticsearch-7.17.6-amd64.deb
systemctl daemon-reload
systemctl start elasticsearch
systemctl enable elasticsearch
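An optional sanity check after the install; Elasticsearch can take a while to come up, so the log tail helps if the service is not active yet:
systemctl status elasticsearch --no-pager
journalctl -u elasticsearch -n 50 --no-pager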
Firewall Settings
# ufw
ufw allow from x.x.x.x to any port 9200
ufw enable
ufw status
# Test
curl 'http://localhost:9200'
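A healthy node answers with a small JSON banner similar to the following; the name and version values are placeholders and will differ per installation:
{
  "name" : "node-1",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "...",
  "version" : {
    "number" : "7.17.6",
    ...
  },
  "tagline" : "You Know, for Search"
}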
# Ports
9200 is for the REST API (HTTP). # http.port
9300 is for node-to-node communication (TCP). # transport.port
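To confirm both ports are actually listening (assumes the ss tool from iproute2, which ships with Ubuntu):
ss -tlnp | grep -E ':9200|:9300'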
Settings
Config File: /etc/elasticsearch/elasticsearch.yml # Debian
# Elasticsearch listens for traffic from everywhere on port 9200
network.host: 0.0.0.0
http.port: 9200
discovery.type
discovery.type: single-node
Specifies whether Elasticsearch should form a multiple-node cluster.
Defaults to multi-node, which means that Elasticsearch discovers other nodes when forming a cluster and
allows other nodes to join the cluster later.
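Putting these settings together, a minimal /etc/elasticsearch/elasticsearch.yml for a single-node setup might look like the sketch below; cluster.name and node.name are arbitrary labels, not defaults:
cluster.name: my-cluster        # arbitrary label
node.name: node-1               # arbitrary label
network.host: 0.0.0.0           # listen on all interfaces
http.port: 9200                 # REST API port
discovery.type: single-node     # skip discovery and form a one-node cluster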
MAX_LOCKED_MEMORY
# Lock the memory on startup:
/etc/elasticsearch/elasticsearch.yml
bootstrap.memory_lock: true
# Configuration for the Debian (*.deb) install
systemctl edit elasticsearch
[Service]
LimitMEMLOCK=infinity
LimitMEMLOCK sets the maximum locked-memory size.
Set it to infinity if you use the bootstrap.memory_lock option in elasticsearch.yml.
systemctl daemon-reload
systemctl restart elasticsearch
Checking
curl -s http://localhost:9200/_nodes?pretty | grep mlockall
"mlockall" : true
Heap size settings
By default, Elasticsearch automatically sets the JVM heap size based on a node’s roles and total memory.
To override the default heap size, set the minimum and maximum heap size settings, Xms and Xmx.
The minimum and maximum values must be the same.
* Set Xms and Xmx to no more than 50% of your total memory.
Why:
- Elasticsearch requires memory for purposes other than the JVM heap.
- For example, Elasticsearch uses off-heap buffers for efficient network communication and relies on the operating system's filesystem cache for efficient access to files.
- The JVM itself also requires some memory.
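To override the automatic heap sizing on a Debian package install, the supported place is a custom file under /etc/elasticsearch/jvm.options.d/; the 4 GB value below is a sketch assuming a machine with 8 GB of RAM (the 50% rule above):
# /etc/elasticsearch/jvm.options.d/heap.options   (file name is arbitrary)
-Xms4g
-Xmx4g
# then: systemctl restart elasticsearch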
Status
# Check License
curl -s http://localhost:9200/_license | jq
{
  "license" : {
    "status" : "active",
    "uid" : "UUID",
    "type" : "basic",
    ...
  }
}
# Get cluster status
curl -s http://localhost:9200/_cluster/health | jq
{ "cluster_name" : "elasticsearch", "status" : "yellow", "timed_out" : false, "number_of_nodes" : 1, "number_of_data_nodes" : 1, "active_primary_shards" : 11, ... }
- red means that at least one primary shard is not allocated in the cluster,
- yellow means that all primary shards are allocated but at least one replica is not, and
- green means that all primary and replica shards are allocated.
"pretty" in the above request => It enables human-readable format
# Get node status
curl -s http://localhost:9200/_nodes
{ "_nodes" : { "total" : 1, "successful" : 1, "failed" : 0 }, "cluster_name" : "elasticsearch", "nodes" : { ... } }
P.S.
- GET /_nodes/<node_id>
- GET /_nodes/<node_id>/<metric>
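For example, _local selects the node handling the request and jvm restricts the output to a single metric; both are standard parameters of the nodes info API:
curl -s 'http://localhost:9200/_nodes/_local/jvm?pretty'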
# Get index statistics (docs, store size, indexing, search, etc.)
curl -s http://localhost:9200/_stats
Using Elasticsearch
Elasticsearch exposes a RESTful API that maps HTTP methods to the CRUD operations: create, read, update, and delete.
You can add your first entry like so:
curl -X POST -H "Content-Type: application/json" 'http://localhost:9200/tutorial/_doc/1' -d '{ "message": "Hello World!" }'
You can retrieve this first entry with an HTTP GET request.
curl -X GET 'http://localhost:9200/tutorial/_doc/1'
To modify an existing entry, you can use an HTTP PUT request.
curl -X PUT -H "Content-Type: application/json" 'localhost:9200/tutorial/_doc/1?pretty' -d '
{
"message": "Hello, People!"
}'
"pretty" in the above request. => It enables human-readable format so that you can write each data field on a new row.