 |
|
Thursday, January 31. 2008
Despite the many strategies that are being promoted for optimizing MySQL 5 databases there is one basic strategy that I believe to be important, optimization with proper use of data types. Granted that other strategies are great, but they essentially rely on a database with tables designed in an intelligent fashion.
The two primary categories which I am going to focus on are Numeric Data Types and Character String Data Types for these are used most often and are usually the ones which need optimizing the most.
ABC's of Data Types
First, lets get this out of the way. When designing your table's you should keep the ABC's of data types in mind.
A – Apt – use a data type that is most fitting for the data that will be stored
B – Brief – try to use the smallest fitting size for your data as it will save storage space and improve performance
C – Complete – use a type that is most accommodating for the size of the data being stored (keep in mind future data sizes)
This simple guideline will in fact guide you in the right direction when creating your table and its related fields (columns).
Numeric Data Types
When dealing with numeric data types, there are many options in which you can utilize. Again keeping in mind the ABC's of data types, you must ask yourself, “What am I storing?”. Depending on your requirements, you can use one of the following data types:
Integer Types
Numeric data that does not contain fractions (ie. 1.005)
| Type | Bytes | Minimum Value | Maximum Value |
|---|
| | | (Signed/Unsigned) | (Signed/Unsigned) | | TINYINT | 1 | -128 | 127 | | | | 0 | 255 | | SMALLINT | 2 | -32768 | 32767 | | | | 0 | 65535 | | MEDIUMINT | 3 | -8388608 | 8388607 | | | | 0 | 16777215 | | INT | 4 | -2147483648 | 2147483647 | | | | 0 | 4294967295 | | BIGINT | 8 | -9223372036854775808 | 9223372036854775807 | | | | 0 | 18446744073709551615 |
Optimize: Following the ABC's you want to select the data type that is most fitting for your min an max data size. Simple as that!
Floating Point Types (FPT)
Numeric data that is used to represent approximate (not accurate) value numbers. Don't think about using this for any kind of monetary system such as an accounting or banking system.
Although not accurate, FPT's are very efficient for storage and computation as they are stored in the native binary floating-point format (IEEE 754) used also by the server's CPU. But again, values are subject to rounding error!
| Type | Storage Required | Signed Range | Unsigned Range |
|---|
| FLOAT | 4 bytes | -3.402823466E+38 to -1.175494351E-38 | 0, and 1.175494351E-38 to 3.402823466E+38 | | DOUBLE | 8 bytes | -1.7976931348623157E+308 to -2.2250738585072014E-308 | 0, and 2.2250738585072014E-308 to 1.7976931348623157E+308 |
Optimize: Following the ABC's you want to select the data type that is most fitting for your min and max data size. If you are using non fractional numbers with out any calculations, resort to Integer Data Types as they will potentially free up additional byes per entry.
Fixed-Point Types (FXPT)
Numeric data that is used to represent exact-value numbers that have a integer part, a fractional part, or both. This type, unlike the floating point type is not subject to rounding error as they are not stored using the processor's native binary format, hence, not calculated as quickly.
Optimize: Use only if you are in fact using some fractional numbers. In the event that you are not you may want to resort to the most suitable Integer Data Type. These may prove to be smaller in storage size freeing up additional space. Also, if you are storing smaller numbers you could also take advantage of Floating Points and utilize their storage format for efficient calculations, but the smallest storage size will be that of 4 bytes, possibly 8 for Doubles.
Character String Data Types
Although the Numeric Data Types are a bit more straight forward, Text Types are a bit more involved as far as how they are stored and handled by the server. Being that most these types are used quite often, there is usually a great amount for room for enhancing performance with these fields.
CHAR
This data type is a fixed-length string that is padded with spaces to the right of the specified length of the field (ie. CHAR(5) will store “Dan” as “Dan ”). When defining a field as CHAR(m) m specifies the max number of characters accepted in addition to the number of bytes (8 bits in a byte) stored regardless of number of characters stored. (ie From the previous example, “Dan” will take up 5 bytes). The max for CHAR(m) is m = 255.
VARCHAR
This data type is a variable-length string that is not padded with spaces to the specified length, but does include an additional byte to account for the length of the stored text. This length byte is read by the server when querying against the table. Thereby, taking up some extra processing time.
Optimize: First, refer to the ABC's to decide what your field requirements are. Second, lets look at the following scenarios to help in you decide which approach is best in optimizing your fields.
- You know that your data will always be a fixed length. Use CHAR and not VARCHAR! VARCHAR will store an additional byte for the length of the text. You don't need this extra byte of information as the server already knows the fixed length of the column. In addition, the server does not have to read each row to figure out the length of the text like it does for VARCHAR.
- If you are storing data that can vary in length but you know the max length, then use VARCHAR and not CHAR. CHAR(m) will pad the text with trailing spaces to accommodate the specified length of the field, always taking up m bytes. VARCHAR on the other hand will always append an additional byte but it will not pad the text with any trailing spaces, keeping the storage down to a minimum.
| Type | Storage Required | Maximum Length |
|---|
| CHAR (m) | m characters | 255 characters | | VARCHAR (m) | L characters plus 1 byte | 65,535 characters (subject to limitations) | | TINYTEXT | L characters + 1 byte | 255 characters | | TEXT | L characters + 2 bytes | 65,535 characters | | MEDIUMTEXT | L characters + 3 bytes | 16,777,215 characters | | LONGTEXT | L characters + 4 bytes | 4,294,967,295 characters |
TEXT
This data type and its variations are closely similar to the previously mentioned with one exception, they all include informational bytes ranging from 1 – 4 in addition to the length of the text itself.
Optimize: Taking into consideration the ABC's begin reviewing your options in terms of suitable data types. Then determine if you can use CHAR or VARCHAR for any of your data fields as they may only use 1 or 0 additional bytes for storage. If not, then use the most fitting data type for your requirements.
Thursday, January 31. 2008
I mentioned in the previous post that a widget is just some code that people can stick on their website. A web-based widget can be directly embedded (such as a Flash or Flex application), or it could simply be an IFRAME linking to code somewhere else. For this example, I created the file widget-photo.php and linked to it from an iframe.
The cool part of this example isn't that a page "lives" within another. The magic happens when the user searches for their local congressperson. When a last name is submitted, the code within the embedded page makes a couple API calls. Using the SunlightLabs API, the PHP script asks for a list of congresspeople with the given last name. The SunlightLabs API returns a nicely-formatted JSON result set. We have officially used an API.
A mashup is just an easy way of saying that two or more functionalities have been combined in hopes of producing something more interesting / helpful / meaningful / awesome, etc. A pretty solid mashup example is HousingMaps, which mashes together craigslist and Google Maps.
The example isn't over yet. We have just returned data regarding a congressperson or bunch of congresspeople. We're about to display them, but there's something we forgot before making this a true mashup. For each congressperson, we're also making a call to the CARMA API to try and find some other information related to that person. We make a call, enter the congressperson's name for the input, and in exchange are given an XML file of that congressperson and their CARMA statistics. How we choose to display all this information is anyone's call from here.
Mission accomplished.
Source: http://tech.forumone.com/code_examples/widget-photo.phph
Wednesday, January 30. 2008
In the next several months, we'll be rolling out several widgets that take full advantage of CARMA's Application Programming Interface.
The CARMA website, especially with the Dig Deeper tool, allows the user to sift through vast amounts of cleanly presented CO2 emissions data like never before. Plants, companies and locations can be filtered by numerous criteria, including but not limited to: future power output, present CO2 emissions, past intensity and parent location. Maps present an intuitive display to compliment the raw data.
With all this information in the CARMA database, there's no wonder that some connections were left untouched. We just couldn't add too much information to the website without it getting overwhelming for the casual visitor. If you have unanswered questions after viewing the CARMA site, then the chances are that the API might be worth looking in to.
With the CARMA API, you can create a widget to simply repackage CARMA data. Examples include:
1. Displaying the future emissions of planned power plants by location.
2. Displaying trends in emissions and carbon intensity for specific plants, companies, or geographic regions.
3. Looking up the carbon intensity of power providers by zip code in the U.S.
You can also mashup data from CARMA and other sources:
1. Using the carbon intensity of individual power companies to calculate accurate carbon footprints for individual users.
2. Mashing up CARMA’s emissions data with campaign contribution data for congressional districts.
3. Mashing up CARMA’s power company emissions data with financial information.
Remember that a lot of the plants have coordinates, so creating visually-appealing map widgets (perhaps mashing up with other types of geographic data, like asthma rates) is entirely possible.
A widget is a little chunk of code that people can call their own and place on their blog or website. For example, this widget simply grabs the latest RSS feeds from somewhere. Some other widgets are more interactive, such as for weather, allowing users to enter some inputs (in this case, their zip code).
Calling the API is relatively simple, and it returns data back in XML format. Let's say that you wanted to get the raw data for the state of California. You can simply browse to http://carma.org/api/1.0/searchLocations?name=California. To build a widget, you could make multiple API calls and even mash it up with external data sources (like Yahoo! finance).
We created the API so anyone can download the raw data and use it to form their own connections. Forum One will be creating several widgets to showcase the great potential of the API, but the real excitement comes from how you interpret and display the data.
Tuesday, January 29. 2008
Dan and I are back from the MySQL for Developers training, and we decided to share the enlightenment. Instead of boring you with a summary of what we've learned, we've decided to divvy up our notes into various "tips & tricks" posts.
Let's say that a client sends you an Access, Excel, or CSV file and tells you to stick all that information into your database. The client is oblivious to the structure of your table, and the file contains several columns that aren't needed. Some of the other columns are improperly formatted, making your life that much harder.
Your first instinct might be to write a PHP (or Perl, or Python, etc) script that squeezes and trims data to the right dimensions, then sends to MySQL. That's fine, but many don't know that MySQL can usually do this tedius work for you. It might also be worth mentioning that MySQL parses this data exceedingly fast.
LOAD DATA INFILE (or LOAD LOCAL DATA INFILE from your local machine) is an importer's dream come true. It allows for MySQL itself to load and parse a file. Once loaded, you can set the FIELDS TERMINATED BY, ENCLOSED BY, LINES TERMINATED BY, and IGNORE ... LINES parameters to split the data into meaningful chunks.
Setup:
LOAD DATE INFILE '[file.ext]' INTO TABLE [table_name]
FIELDS TERMINATED BY '[column_separator]'
ENCLOSED BY '[usually doublequote]'
LINES TERMINATED BY '[usually \r or \n]'
IGNORE [lines_to_skip] LINES
(col1, @variable2, col3, col4)
SET col2 = SOME_FUNCTION(@variable2, some_parameters);
You can use @variables instead of columns for processing. If unneeded file columns exist, then we store that column into a dummy variable (look at @skip below).
The import file uses "Apr 20 1982" for the birthday, but we want it looking like "1982-04-20". Below, we set the @birthday variable equal to the file's birthday column, then SET birthday (the DB column) equal to STR_TO_DATE(@birthday, 'input_format').
Real-world usage:
LOAD DATA INFILE 'import.csv' INTO TABLE people
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 2 LINES
(id, @skip, @fname, @lname, @skip, @birthday)
SET name = CONCAT(@fname, ' ', $lname),
birthday = STR_TO_DATE(@birthday, '%b %e %Y');
NOTE: STR_TO_DATE expects the input date format to be specified. The 2nd parameter contains the curious string '%b %e %Y'. In this example, it's looking for a date formatted like 'Jan 9 2001'. The meaning of those special characters can be found here.
I would highly recommend attending the MySQL training seminars. David Swain, our trainer, was highly competent and had helpful answers and further insight to every question we could throw at him.
Tuesday, January 22. 2008
Congrats to CollabNet! Subversion, the open source versioning tool written as a "compelling replacement to CVS", was named by developer.com as the Product of the Year 2008. We've been using subversion for our work for over a year now and its proved to be a very useful tool in helping us get work done and be more productive.
Before SVN, we had a manual cut over process to move files from our development server to production servers. It was slow, since you had to navigate around the file system, and error-prone, since one could never be sure that you moved all the correct files. This became more of a problem as multiple developers and designers would work on a single site. Now, SVN is constantly keeping track of which files are modified on each environment and automatically synchronizes the code on each. It's also allowed us to move from a single, shared development server, to working locally on our own workstations. We've also started exploiting the hooks in svn for checking PHP syntax before commits, and sending emails for code-reviews upon commit.
If you're working with or hiring programmers, one of your first questions you should ask them is how they manage and track changes to their source code. If they aren't using subversion or another system, that's an immediate red flag.
We've written about subversion before. Dan's previously written about Subversion: Simple Practices and Sandy wrote HOWTO: Use Eclipse PDT with Subversion in 13 Easy Steps.
Tuesday, January 22. 2008
Non Profit Tech blog has posted an excellent start to their 3 part series on How to get your small nonprofit up on the Web, showing how to register for your domain name and why you want one. I'm also heartened to see that part 2 will detail how to use Google Apps for Domains as your email hosting provider, for two reasons. First, we're in the process of switching our email infrastructure from hosting it ourselves to GAD - it made to much sense from a cost, reliability, and support angle. Second, this past weekend I spent time moving my personal email to GAD for the same reason, and found the whole process to be well documented and structured by Google. At every step of the way the instructions are pretty clear and not overly technical. Where it does veer into technical details, there are specific, step by step guides to follow.
One caution about registering domain names - make sure that the email contact address you have for both the Administrative and Technical contacts is a reliable email address that you check regularly. You'll want to keep an eye out for bogus DNS transfers, and more importantly, when your domain name comes up to renewal you'll be notified there. You absolutely do not want your domain to expire, since another person could quickly come along and register it.
Once you have your domain name, you'll likely need DNS hosting. DNS hosting lets route host names, mail, and other services to servers by associating a numerical IP address with, you can read up on DNS hosting services on wikipedia. If you register with Godaddy, I believe they offer that as part of your registration, as most registars do. You may not want to use a registrar's DNS hosting to insulate yourself from having to also move DNS hosting if you switch registrars in the future. Using a separate DNS hosting provider can prevent downtime for your web site and services int hat case. There are many free DNS hosting services available, one to look at is everydns.net, which is free but accepts donations.
Wednesday, January 16. 2008
There's no question that most people have heard of AJAX by now, or have seen some interesting uses of it. When done right, AJAX dramatically improves the user experience, but using AJAX simply for "style points" can be disastrous:
1. DIV popups (with AJAX-powered content) appearing without the user's consent (Snap Shots)
We've all seen it before. When rolling over text or a hyperlink, a bubble pops up with a screenshot of the prospective site, or sometimes, context-related ads appear. Anything that takes control or surprises the user is evil. I would never wish something so vile for even my worst enemies.
2. Automatically loading more content when the user scrolls down (DZone, Live)
An age old idiom of web and interface design in general is not to break existing functionality. When the user hits the 'End' key, we expect to be sent to the end of the page, for example. Pageless pagination, or Endless Pageless styles of loading more data break every convention that makes the web analogous to physical medium. No longer can you remind yourself, "Oh, that cool widget was on page 3". We use page numbers for a reason, and although these pageless methods are technically interesting and flashy, they do no service to the user aside from a monotonous scroll-wheel workout.
More importantly, AJAX content keeps getting loaded onto the same page. Add a couple hundred items (each with related images and/or embedded media), and your inquisitive user might start wondering why their browser isn't responding.
3. Multi-page forms that don't support the browser's "back" button (PBwiki)
Here we are again, altering the way a user expects the page to behave.
4. Browser incompatibility (mostly IE)
One of the more annoying uses of AJAX is when it doesn't work to begin with. When working with AJAX, I'd say that a good 25% of my time was spent correcting code inoperability among browsers. Thanks to libraries like jQuery, that time has been cut down dramatically. Regardless, if these various browsers simply stuck to the standards instead of being web cowboys, web developers would have a noticeably brighter outlook on life.
Contributors: Mike Shade, Oscar Merida, Sandy Smith
Tuesday, January 15. 2008
Many home routers tend to have default settings that close idle connections after a period of time. Normally this interval is 5 minutes. For general home use, that's fine, but as a web worker with many ssh sessions open, it can be maddening to receive the dreaded Connection to server closed. when you were in the middle of editing a file or comparing settings.
One solution is to use the GNU screen app for all of your work. However, this doesn't solve the problem of dropped connections due to inactivity.
Instead, you can nip the problem in the bud with a couple of tweaks to your sshd_config. On Red Hat and derivative systems, edit /etc/ssh/sshd_config and add the following lines:
TCPKeepAlive yes
ClientAliveInterval 60
Save the file, restart sshd, and you'll never want to smash your terminal again.
Monday, January 14. 2008
As a web developer, my job depends greatly not only on how efficient the backend code is, but how intuitive the site is for the user. As the web moves further and further from the "monkey see" to "monkey do" mentality, web developers are needing to become more design conscious.
There are some great User Experience and Design teams out there to do the heavy lifting, but it's important to remember that the success of any site ultimately depends on what makes the user, not necessarily the client, happy. Even if the code is beautiful and highly optimized, the user will be gone in seconds unless they find comfort in the site's layout and ease-of-use. There will undoubtedly be the client who is adamant on a site looking or behaving a certain way, which is fine as long as user testing & feedback is provided. Clients don't often like to hear that their ideas "need some work", but having collected data tends to soften the blow.
No user wants to be hassled into disclosing their private information unless there's a really good incentive to do so. Believe it or not, people normally don't wake up, brush their teeth, and say to themselves, "You know what, I'm in the mood to fill in a couple forms to get the day started". If the user's information is needed, then make the process as painless as humanly possible. As this article implies, only require the user to fill in the bare minimum. Once logged in, any remaining fields can be filled in by the user at his or her own convenience.
Just because the visitor happened to browse to your site doesn't mean that you can or should control their viewing experience. The fact of the matter is that if the user grows wary or even the slightest bit impatient from your site, then then they probably won't be coming back. Selecting the "Sign me up for..." checkbox by default could be seen as disrespect towards the user, and many simply don't appreciate it. The key takeaway is that every visitor should feel privileged, and not feel like they're being rushed through the door to meet some quota.
URL: http://codeulate.com/?p=9
|
|
Comments
Thu, 14.08.2008 16:58
Thanks for the tip. I made a slight mod you might be interested [...]
Mon, 28.07.2008 15:06
Solution (to my issue): Views > Tools > Flush Views Cache It explains that Views doesn't always keep up with changes [...]
Mon, 28.07.2008 14:52
Thanks for this helpful post. I've seen this effect too. I'm running into a different (but related?) issue - the Views [...]
Tue, 15.07.2008 20:25
Oscar, Krista from Calais here, writing to let you know that Calais 2.1 is live. In addition to our ongoing [...]
Tue, 01.07.2008 11:30
Dan, You are absolutely correct and I should have stated this within my post; the described steps within the post [...]
Mon, 30.06.2008 09:45
i wouldnt recomand this at all, because if something happens and the conection is lost u will have your data lost if the [...]
Mon, 09.06.2008 13:42
PDT syntax highlighting support does not seem to work when subclipse is installed, any one else had this problem?
Mon, 09.06.2008 11:56
I didn't mean to imply that you were bashing unit tests.
Mon, 09.06.2008 11:52
My point isn't to bash unit tests, but rather to say there are a bunch of things you should be doing before you get [...]
Mon, 09.06.2008 11:43
I agree with, what I think is, the gist of your argument. That is, if you don't write code that anticipates failure, [...]
Mon, 09.06.2008 08:58
clipse is an open source IDE — or as they put it themselves: “universal toolset for development”. It [...]
Tue, 27.05.2008 12:17
Navigation links should fill their container to ensure ease of selection. A good method for that is to make them [...]
Thu, 22.05.2008 10:35
One of the better comments I've seen in a while: "Although I like PHP, I agree the language is only as good as the [...]
Tue, 20.05.2008 14:03
Oscar, Yahoo's Term Extraction service takes an entire article and returns a few of (what it thinks are) the most [...]
Tue, 20.05.2008 13:13
Hi, Tom Tague from Calais here. First, thanks for taking note of Calais. And integrating an example right within the [...]