author | michel <michel@bestpractical.com> | 2019-11-26 18:26:07 +0300 |
---|---|---|
committer | sunnavy <sunnavy@bestpractical.com> | 2020-02-13 00:06:13 +0300 |
commit | c59003c543e2db4a6e18d8bcc7764ce4b2af5447 (patch) | |
tree | 690784091fee57d6386c13f6038778be1e52d95a /README.MariaDB | |
parent | 57eb2ba67d17a4462b9b8c19943a6045d6012a58 (diff) |
Explain utf8mb4 character set updates
Diffstat (limited to 'README.MariaDB')
-rw-r--r-- | README.MariaDB | 58 |
1 files changed, 58 insertions, 0 deletions
diff --git a/README.MariaDB b/README.MariaDB
new file mode 100644
index 0000000000..149f1ffa23
--- /dev/null
+++ b/README.MariaDB
@@ -0,0 +1,58 @@
+Starting with RT 5.0.0, the minimum supported MariaDB version is 10.2.5
+because this is the first version to provide full support for 4-byte
+utf8 characters in tables and indexes. Read on for details on this
+change.
+
+RT 5.0.0 now defaults MariaDB tables to utf8mb4, which is available in
+versions before 10.2.5. However, before MariaDB version 10.2.5, utf8mb4
+tables could not have indexes with type VARCHAR(255): the default size
+for index entries was 767 bytes, which is enough for 255 chars stored
+as at most 3 bytes each (the utf8 format), but not as 4 bytes (utf8mb4).
+10.2.5 sets the default index size to 3072 for InnoDB tables, resolving
+that issue.
+
+https://mariadb.com/kb/en/changes-improvements-in-mariadb-102/
+https://mariadb.com/kb/en/mariadb-1025-changelog/ (search for utf8)
+
+In MariaDB, RT uses the utf8mb4 character set to support all
+Unicode characters, including the ones that are encoded with 4 bytes in
+utf8 (some Kanji characters and a good number of emojis). The DB tables
+and RT are both set to this character set.
+
+If your MariaDB database is used only for RT, you can consider
+setting the default character set to utf8mb4. This will
+ensure that backups and other database access outside of RT have the
+correct character set.
+
+This is done by adding the following lines to the MariaDB configuration:
+
+[client-server]
+character-set-server = utf8mb4
+
+You can check the values your server is using by running this command:
+    mysqladmin variables | grep -i character_set
+
+Setting the default is particularly important for mysqldump, to avoid
+backups being silently corrupted.
+
+If the MariaDB database is shared with other applications and the default
+character set cannot be set to utf8mb4, the command to back up the
+database must set it explicitly:
+
+    ( mysqldump --default-character-set=utf8mb4 rt5 --tables sessions --no-data --single-transaction; \
+      mysqldump --default-character-set=utf8mb4 rt5 --ignore-table rt5.sessions --single-transaction ) \
+        | gzip > rt-`date +%Y%m%d`.sql.gz
+
+Restoring a backup is done the usual way. Since the character set for
+all tables is set to utf8mb4, there is no further need to tell MariaDB
+about it:
+
+    gunzip -c rt-20191125.sql.gz | mysql -uroot -p rt5
+
+These character set updates now allow RT on MariaDB to accept and store 4-byte
+characters like emojis. However, searches can still be inconsistent. You may be
+able to get different or better results by experimenting with different collation
+settings. For more information:
+
+https://stackoverflow.com/a/41148052
+https://mariadb.com/kb/en/character-sets/
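
The index-size reasoning in the README text comes down to simple arithmetic: a VARCHAR(255) index entry needs at most 255 × 3 = 765 bytes under utf8 but 255 × 4 = 1020 bytes under utf8mb4, so only the former fits the old 767-byte limit, while both fit within 3072 bytes. A minimal sketch of that check follows. The helper names `index_bytes` and `fits_index` are hypothetical, introduced only for illustration, and are not part of RT or MariaDB; the sketch also ignores any per-entry overhead the storage engine may add.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of RT or MariaDB.

# index_bytes CHARS BYTES_PER_CHAR
# Prints the worst-case byte size of an index entry on a VARCHAR(CHARS) column.
index_bytes() {
    echo $(( $1 * $2 ))
}

# fits_index CHARS BYTES_PER_CHAR LIMIT
# Succeeds (exit status 0) if the worst-case entry fits within LIMIT bytes.
fits_index() {
    [ "$(index_bytes "$1" "$2")" -le "$3" ]
}

# Compare utf8 (3 bytes/char) and utf8mb4 (4 bytes/char) against the
# two default index size limits mentioned in the README (767 and 3072).
for bytes_per_char in 3 4; do
    for limit in 767 3072; do
        if fits_index 255 "$bytes_per_char" "$limit"; then
            verdict="fits"
        else
            verdict="does not fit"
        fi
        echo "VARCHAR(255) at $bytes_per_char bytes/char needs $(index_bytes 255 "$bytes_per_char") bytes: $verdict in $limit"
    done
done
```

Running the script prints one line per combination, which makes it easy to see that utf8mb4 with the 767-byte limit is the only case that fails.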