MySQL Charset 與 Collate

最後更新: 2018-05-29

目錄

  • Charset 與 Collate
  • Character Set
  • DB, Table, Column Level 的 Character Set 設定
  • Character set and Collation 的順序
  • my.cnf
  • SQL 更改 Character Set
  • utf8 vs utf8mb4
  • PHP Settings
     - php.ini
     - Code
  • Troubleshoot

 


Charset 與 Collate

 

分別

  • Charset:   儲存值時用
  • Collate:    值比較時用 (簡單格式: bin)

The server sets below variables whenever the default database changes

  • collation_database
  • character_set_database

# To see the default character set and collation for a given database

USE db_name;

SELECT @@character_set_database, @@collation_database;

+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| utf8                     | utf8_general_ci      |
+--------------------------+----------------------+

# Show MySQL default character set

SHOW VARIABLES LIKE  'char%';

+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

P.S.

character_set_database (Deprecated)

If there is no default database, the variable has the same value as character_set_server.

Checking

show charset;

SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';

 


Character Set

 

查看本機支援什麼 Character Set:

# To see the default collation for each character set

mysql> SHOW CHARACTER SET;

  • _ci     大小写不敏感
  • _cs    大小写敏感
  • _bin   二元

查看某 Character Set 的 collation

mysql> SHOW COLLATION WHERE Charset = 'big5';

+-----------------+---------+----+---------+----------+---------+
| Collation       | Charset | Id | Default | Compiled | Sortlen |
+-----------------+---------+----+---------+----------+---------+
| big5_chinese_ci | big5    |  1 | Yes     | Yes      |       1 |
| big5_bin        | big5    | 84 |         | Yes      |       1 |
+-----------------+---------+----+---------+----------+---------+

utf8_*

utf8_bin: compare strings by the binary value of each character in the string

utf8_general_ci: compare strings using general language rules and using case-insensitive comparisons

utf8_general_cs: compare strings using general language rules and using case-sensitive comparisons

For example, the following will evaluate at true with either of the UTF8_general collations, but not with the utf8_bin collation: Code:

Ä = A
Ö = O
Ü = U

With the utf8_general_ci collation, they would also return true even if not the same case.

None of these is inherently "better"; they simply have different functionalities. You need to choose which one best suits your particular needs.

INFORMATION_SCHEMA.SCHEMATA

Database options are stored in the data dictionary and

can be examined by checking the INFORMATION_SCHEMA.SCHEMATA table.

 


DB, Table, Column Level 的 Character Set 設定

 

DB Level

CREATE DATABASE db_name [ DEFAULT CHARACTER SET charset_name [COLLATE collation_name] ]

CREATE DATABASE `domain_db` DEFAULT CHARACTER SET utf8

Table Level

CREATE TABLE tbl_name (column_list) [ DEFAULT CHARACTER SET charset_name [COLLATE collation_name] ]

Column Level

// CHAR、VARCHAR或TEXT類型

col_name {CHAR | VARCHAR | TEXT} (col_length) [CHARACTER SET charset_name [COLLATE collation_name]]

 


Character set and Collation 的順序

 

If both CHARACTER SET charset_name and COLLATE collation_name are specified,

 - character set charset_name and collation collation_name are used.

If CHARACTER SET charset_name is specified without COLLATE,

character set charset_name and its default collation are used.

mysql> SHOW CHARACTER SET WHERE CHARSET = 'utf8';

+---------+---------------+-------------------+--------+
| Charset | Description   | Default collation | Maxlen |
+---------+---------------+-------------------+--------+
| utf8    | UTF-8 Unicode | utf8_general_ci   |      3 |
+---------+---------------+-------------------+--------+

If COLLATE collation_name is specified without CHARACTER SET,

 - the character set associated with collation_name and collation collation_name are used
    
Otherwise (neither CHARACTER SET nor COLLATE is specified)

 - the server character set and server collation are used.

If there is no default database,

the variables have the same value as the corresponding server-level system variables,

 - character_set_server

 - collation_server

 


my.cnf

 

For mariadb-5.5 (Centos 7)

/etc/my.cnf.d/server.cnf

[mysqld]
character-set-server=utf8

/etc/my.cnf.d/client.cnf

[client]
default-character-set=utf8

 * character-set 有分 server 及 client 的 !!!

 * It is still necessary for applications to configure their connection using "SET NAMES" after they connect

 


SQL 更改 Character Set

 

SET NAMES utf8;

相當於

SET character_set_client = utf8;
SET character_set_connection = utf8;
SET character_set_results = utf8;

Remark: Checking

SHOW VARIABLES LIKE  'char%';

SHOW VARIABLES LIKE 'collation%';

SET CHARACTER SET utf8;

相當於

SET character_set_client = utf8;
SET character_set_results = utf8;
SET collation_connection = @@collation_database;

P.S.

character_set 與 collation 是兩樣不同東西來

可用的 CHARACTER SET

SHOW CHARACTER SET;

+----------+-----------------------------+---------------------+--------+
| Charset  | Description                 | Default collation   | Maxlen |
+----------+-----------------------------+---------------------+--------+
| big5     | Big5 Traditional Chinese    | big5_chinese_ci     |      2 |
...

 


utf8 vs utf8mb4

 

The utf8 character set in MySQL only supports a subset of Unicode characters,
 specifically those that can be represented with 3 bytes.
It cannot handle certain characters: emojis.

The utf8mb4 is an extension of utf8 that supports the full range of Unicode characters (4 bytes)

Converting from utf8mb4 to utf8

1. Backup

2. DATABASE

ALTER DATABASE database_name CHARACTER SET utf8 COLLATE utf8_general_ci;

3. TABLE

SHOW TABLES;
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;

 


PHP Settings

 

php.ini

default_charset = "UTF-8"

Code

Version >= 5.2.3

bool mysql_set_charset ( string $charset [, resource $link_identifier = NULL ] )

<?php
  $link = mysql_connect('localhost','user1','pass1',TRUE);
  mysql_selectdb('db1',$link);
  mysql_set_charset('utf8',$link);
?>

 


Troubleshoot

 

解決: had no or invalid character set, and default character set is multi-byte, so character column sizes may have changed

[mysqld]
init-connect='SET NAMES big5'

# character set (--character-set-server=)
character-set-server = big5
# collation (--collation-server=)
collation-server=big5_chinese_ci
default-character-set = big5

mysql> show variables like 'char%';

mysql> show variables like 'collation%';

Mysql 4.0 Character Sets and Corresponding Mysql 4.1 Character Set/Collation Pairs

If you want to start mysqld from a 4.1.x distribution with data created by MySQL 4.0,

you should start the server with the same character set and collation.

In this case, you won't need to reindex your data.

4.0      4.1      4.1 Collation
big5     big5     big5_chinese_ci

 


Change Tables Character Set

 

If you use ALTER TABLE to convert a column from one character set to another,

MySQL attempts to map the data values,

but if the character sets are incompatible, there may be data loss.

 


 

Creative Commons license icon Creative Commons license icon