diff options
author | GitLab Bot <gitlab-bot@gitlab.com> | 2022-08-18 11:17:02 +0300 |
---|---|---|
committer | GitLab Bot <gitlab-bot@gitlab.com> | 2022-08-18 11:17:02 +0300 |
commit | b39512ed755239198a9c294b6a45e65c05900235 (patch) | |
tree | d234a3efade1de67c46b9e5a38ce813627726aa7 /doc/development/serializing_data.md | |
parent | d31474cf3b17ece37939d20082b07f6657cc79a9 (diff) |
Add latest changes from gitlab-org/gitlab@15-3-stable-eev15.3.0-rc42
Diffstat (limited to 'doc/development/serializing_data.md')
-rw-r--r-- | doc/development/serializing_data.md | 93 |
1 files changed, 7 insertions, 86 deletions
diff --git a/doc/development/serializing_data.md b/doc/development/serializing_data.md index 97e6f665484..aa8b20eded7 100644 --- a/doc/development/serializing_data.md +++ b/doc/development/serializing_data.md @@ -1,90 +1,11 @@ --- -stage: Data Stores -group: Database -info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments +redirect_to: 'database/serializing_data.md' +remove_date: '2022-11-06' --- -# Serializing Data +This document was moved to [another location](database/serializing_data.md). -**Summary:** don't store serialized data in the database, use separate columns -and/or tables instead. This includes storing of comma separated values as a -string. - -Rails makes it possible to store serialized data in JSON, YAML or other formats. -Such a field can be defined as follows: - -```ruby -class Issue < ActiveRecord::Model - serialize :custom_fields -end -``` - -While it may be tempting to store serialized data in the database there are many -problems with this. This document outlines these problems and provide an -alternative. - -## Serialized Data Is Less Powerful - -When using a relational database you have the ability to query individual -fields, change the schema, index data, and so forth. When you use serialized data -all of that becomes either very difficult or downright impossible. While -PostgreSQL does offer the ability to query JSON fields it is mostly meant for -very specialized use cases, and not for more general use. If you use YAML in -turn there's no way to query the data at all. - -## Waste Of Space - -Storing serialized data such as JSON or YAML ends up wasting a lot of space. -This is because these formats often include additional characters (for example, double -quotes or newlines) besides the data that you are storing. - -## Difficult To Manage - -There comes a time where you must add a new field to the serialized -data, or change an existing one. Using serialized data this becomes difficult -and very time consuming as the only way of doing so is to re-write all the -stored values. To do so you would have to: - -1. Retrieve the data -1. Parse it into a Ruby structure -1. Mutate it -1. Serialize it back to a String -1. Store it in the database - -On the other hand, if one were to use regular columns adding a column would be: - -```sql -ALTER TABLE table_name ADD COLUMN column_name type; -``` - -Such a query would take very little to no time and would immediately apply to -all rows, without having to re-write large JSON or YAML structures. - -Finally, there comes a time when the JSON or YAML structure is no longer -sufficient and you must migrate away from it. When storing only a few rows -this may not be a problem, but when storing millions of rows such a migration -can take hours or even days to complete. - -## Relational Databases Are Not Document Stores - -When storing data as JSON or YAML you're essentially using your database as if -it were a document store (for example, MongoDB), except you're not using any of the -powerful features provided by a typical RDBMS _nor_ are you using any of the -features provided by a typical document store (for example, the ability to index fields -of documents with variable fields). In other words, it's a waste. - -## Consistent Fields - -One argument sometimes made in favour of serialized data is having to store -widely varying fields and values. Sometimes this is truly the case, and then -perhaps it might make sense to use serialized data. However, in 99% of the cases -the fields and types stored tend to be the same for every row. Even if there is -a slight difference you can still use separate columns and just not set the ones -you don't need. - -## The Solution - -The solution is to use separate columns and/or separate tables. -This allows you to use all the features provided by your database, it -makes it easier to manage and migrate the data, you conserve space, you can -index the data efficiently and so forth. +<!-- This redirect file can be deleted after <2022-11-06>. --> +<!-- Redirects that point to other docs in the same project expire in three months. --> +<!-- Redirects that point to docs in a different project or site (for example, link is not relative and starts with `https:`) expire in one year. --> +<!-- Before deletion, see: https://docs.gitlab.com/ee/development/documentation/redirects.html --> |