Despite the many strategies that are being promoted for optimizing MySQL 5 databases there is one basic strategy that I believe to be important, optimization with proper use of data types. Granted that other strategies are great, but they essentially rely on a database with tables designed in an intelligent fashion.
The two primary categories which I am going to focus on are Numeric Data Types and Character String Data Types for these are used most often and are usually the ones which need optimizing the most.
ABC's of Data Types
First, lets get this out of the way. When designing your table's you should keep the ABC's of data types in mind.
A –
Apt – use a data type that is most fitting for the data that will be stored
B –
Brief – try to use the smallest fitting size for your data as it will save storage space and improve performance
C –
Complete – use a type that is most accommodating for the size of the data being stored (keep in mind future data sizes)
This simple guideline will in fact guide you in the right direction when creating your table and its related fields (columns).
Numeric Data Types
When dealing with numeric data types, there are many options in which you can utilize. Again keeping in mind the ABC's of data types, you must ask yourself, “What am I storing?”. Depending on your requirements, you can use one of the following data types:
Integer Types
Numeric data that does not contain fractions (ie. 1.005)
| Type | Bytes | Minimum Value | Maximum Value |
|---|
| | | (Signed/Unsigned) | (Signed/Unsigned) |
| TINYINT | 1 | -128 | 127 |
| | | 0 | 255 |
| SMALLINT | 2 | -32768 | 32767 |
| | | 0 | 65535 |
| MEDIUMINT | 3 | -8388608 | 8388607 |
| | | 0 | 16777215 |
| INT | 4 | -2147483648 | 2147483647 |
| | | 0 | 4294967295 |
| BIGINT | 8 | -9223372036854775808 | 9223372036854775807 |
| | | 0 | 18446744073709551615 |
Optimize: Following the ABC's you want to select the data type that is most fitting for your min an max data size. Simple as that!
Floating Point Types (FPT)
Numeric data that is used to represent approximate (not accurate) value numbers. Don't think about using this for any kind of monetary system such as an accounting or banking system.
Although not accurate, FPT's are very efficient for storage and computation as they are stored in the native binary floating-point format (IEEE 754) used also by the server's CPU. But again, values are subject to rounding error!
| Type | Storage Required | Signed Range | Unsigned Range |
|---|
| FLOAT | 4 bytes | -3.402823466E+38 to -1.175494351E-38 | 0, and 1.175494351E-38 to 3.402823466E+38 |
| DOUBLE | 8 bytes | -1.7976931348623157E+308 to -2.2250738585072014E-308 | 0, and 2.2250738585072014E-308 to 1.7976931348623157E+308 |
Optimize: Following the ABC's you want to select the data type that is most fitting for your min and max data size. If you are using non fractional numbers with out any calculations, resort to Integer Data Types as they will potentially free up additional byes per entry.
Fixed-Point Types (FXPT)
Numeric data that is used to represent exact-value numbers that have a integer part, a fractional part, or both. This type, unlike the floating point type is not subject to rounding error as they are not stored using the processor's native binary format, hence, not calculated as quickly.
Optimize: Use only if you are in fact using some fractional numbers. In the event that you are not you may want to resort to the most suitable Integer Data Type. These may prove to be smaller in storage size freeing up additional space. Also, if you are storing smaller numbers you could also take advantage of Floating Points and utilize their storage format for efficient calculations, but the smallest storage size will be that of 4 bytes, possibly 8 for Doubles.
Character String Data Types
Although the Numeric Data Types are a bit more straight forward, Text Types are a bit more involved as far as how they are stored and handled by the server. Being that most these types are used quite often, there is usually a great amount for room for enhancing performance with these fields.
CHAR
This data type is a fixed-length string that is padded with spaces to the right of the specified length of the field (ie. CHAR(5) will store “Dan” as “Dan ”). When defining a field as CHAR(m) m specifies the max number of characters accepted in addition to the number of bytes (8 bits in a byte) stored regardless of number of characters stored. (ie From the previous example, “Dan” will take up 5 bytes). The max for CHAR(m) is m = 255.
VARCHAR
This data type is a variable-length string that is not padded with spaces to the specified length, but does include an additional byte to account for the length of the stored text. This length byte is read by the server when querying against the table. Thereby, taking up some extra processing time.
Optimize: First, refer to the ABC's to decide what your field requirements are. Second, lets look at the following scenarios to help in you decide which approach is best in optimizing your fields.
- You know that your data will always be a fixed length. Use CHAR and not VARCHAR! VARCHAR will store an additional byte for the length of the text. You don't need this extra byte of information as the server already knows the fixed length of the column. In addition, the server does not have to read each row to figure out the length of the text like it does for VARCHAR.
- If you are storing data that can vary in length but you know the max length, then use VARCHAR and not CHAR. CHAR(m) will pad the text with trailing spaces to accommodate the specified length of the field, always taking up m bytes. VARCHAR on the other hand will always append an additional byte but it will not pad the text with any trailing spaces, keeping the storage down to a minimum.
| Type | Storage Required | Maximum Length |
|---|
| CHAR (m) | m characters | 255 characters |
| VARCHAR (m) | L characters plus 1 byte | 65,535 characters (subject to limitations) |
| TINYTEXT | L characters + 1 byte | 255 characters |
| TEXT | L characters + 2 bytes | 65,535 characters |
| MEDIUMTEXT | L characters + 3 bytes | 16,777,215 characters |
| LONGTEXT | L characters + 4 bytes | 4,294,967,295 characters |
TEXT
This data type and its variations are closely similar to the previously mentioned with one exception, they all include informational bytes ranging from 1 – 4 in addition to the length of the text itself.
Optimize: Taking into consideration the ABC's begin reviewing your options in terms of suitable data types. Then determine if you can use CHAR or VARCHAR for any of your data fields as they may only use 1 or 0 additional bytes for storage. If not, then use the most fitting data type for your requirements.
Comments
Thu, 14.08.2008 16:58
Thanks for the tip. I made a slight mod you might be interested [...]
Mon, 28.07.2008 15:06
Solution (to my issue): Views > Tools > Flush Views Cache It explains that Views doesn't always keep up with changes [...]
Mon, 28.07.2008 14:52
Thanks for this helpful post. I've seen this effect too. I'm running into a different (but related?) issue - the Views [...]
Tue, 15.07.2008 20:25
Oscar, Krista from Calais here, writing to let you know that Calais 2.1 is live. In addition to our ongoing [...]
Tue, 01.07.2008 11:30
Dan, You are absolutely correct and I should have stated this within my post; the described steps within the post [...]
Mon, 30.06.2008 09:45
i wouldnt recomand this at all, because if something happens and the conection is lost u will have your data lost if the [...]
Mon, 09.06.2008 13:42
PDT syntax highlighting support does not seem to work when subclipse is installed, any one else had this problem?
Mon, 09.06.2008 11:56
I didn't mean to imply that you were bashing unit tests.
Mon, 09.06.2008 11:52
My point isn't to bash unit tests, but rather to say there are a bunch of things you should be doing before you get [...]
Mon, 09.06.2008 11:43
I agree with, what I think is, the gist of your argument. That is, if you don't write code that anticipates failure, [...]
Mon, 09.06.2008 08:58
clipse is an open source IDE — or as they put it themselves: “universal toolset for development”. It [...]
Tue, 27.05.2008 12:17
Navigation links should fill their container to ensure ease of selection. A good method for that is to make them [...]
Thu, 22.05.2008 10:35
One of the better comments I've seen in a while: "Although I like PHP, I agree the language is only as good as the [...]
Tue, 20.05.2008 14:03
Oscar, Yahoo's Term Extraction service takes an entire article and returns a few of (what it thinks are) the most [...]
Tue, 20.05.2008 13:13
Hi, Tom Tague from Calais here. First, thanks for taking note of Calais. And integrating an example right within the [...]