Welcome to mirror list, hosted at ThFree Co, Russian Federation.

gitlab.com/gitlab-org/gitlab-foss.git - Unnamed repository; edit this file 'description' to name the repository.
summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
Diffstat (limited to 'doc/development/database/serializing_data.md')
-rw-r--r--doc/development/database/serializing_data.md90
1 files changed, 90 insertions, 0 deletions
diff --git a/doc/development/database/serializing_data.md b/doc/development/database/serializing_data.md
new file mode 100644
index 00000000000..97e6f665484
--- /dev/null
+++ b/doc/development/database/serializing_data.md
@@ -0,0 +1,90 @@
+---
+stage: Data Stores
+group: Database
+info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/engineering/ux/technical-writing/#assignments
+---
+
+# Serializing Data
+
+**Summary:** don't store serialized data in the database, use separate columns
+and/or tables instead. This includes storing of comma separated values as a
+string.
+
+Rails makes it possible to store serialized data in JSON, YAML or other formats.
+Such a field can be defined as follows:
+
+```ruby
+class Issue < ActiveRecord::Model
+ serialize :custom_fields
+end
+```
+
+While it may be tempting to store serialized data in the database there are many
+problems with this. This document outlines these problems and provide an
+alternative.
+
+## Serialized Data Is Less Powerful
+
+When using a relational database you have the ability to query individual
+fields, change the schema, index data, and so forth. When you use serialized data
+all of that becomes either very difficult or downright impossible. While
+PostgreSQL does offer the ability to query JSON fields it is mostly meant for
+very specialized use cases, and not for more general use. If you use YAML in
+turn there's no way to query the data at all.
+
+## Waste Of Space
+
+Storing serialized data such as JSON or YAML ends up wasting a lot of space.
+This is because these formats often include additional characters (for example, double
+quotes or newlines) besides the data that you are storing.
+
+## Difficult To Manage
+
+There comes a time where you must add a new field to the serialized
+data, or change an existing one. Using serialized data this becomes difficult
+and very time consuming as the only way of doing so is to re-write all the
+stored values. To do so you would have to:
+
+1. Retrieve the data
+1. Parse it into a Ruby structure
+1. Mutate it
+1. Serialize it back to a String
+1. Store it in the database
+
+On the other hand, if one were to use regular columns adding a column would be:
+
+```sql
+ALTER TABLE table_name ADD COLUMN column_name type;
+```
+
+Such a query would take very little to no time and would immediately apply to
+all rows, without having to re-write large JSON or YAML structures.
+
+Finally, there comes a time when the JSON or YAML structure is no longer
+sufficient and you must migrate away from it. When storing only a few rows
+this may not be a problem, but when storing millions of rows such a migration
+can take hours or even days to complete.
+
+## Relational Databases Are Not Document Stores
+
+When storing data as JSON or YAML you're essentially using your database as if
+it were a document store (for example, MongoDB), except you're not using any of the
+powerful features provided by a typical RDBMS _nor_ are you using any of the
+features provided by a typical document store (for example, the ability to index fields
+of documents with variable fields). In other words, it's a waste.
+
+## Consistent Fields
+
+One argument sometimes made in favour of serialized data is having to store
+widely varying fields and values. Sometimes this is truly the case, and then
+perhaps it might make sense to use serialized data. However, in 99% of the cases
+the fields and types stored tend to be the same for every row. Even if there is
+a slight difference you can still use separate columns and just not set the ones
+you don't need.
+
+## The Solution
+
+The solution is to use separate columns and/or separate tables.
+This allows you to use all the features provided by your database, it
+makes it easier to manage and migrate the data, you conserve space, you can
+index the data efficiently and so forth.