Languages are tricky, quickly evolving systems, but almost all of them, in their current written form, predate computers. Indeed, most predate typesetting. So there are a ton of writing systems out there that were designed for writing with a trained human hand, not optimised for discrete mechanical reproduction. Those writing systems persist, and computers are finally getting powerful enough to reproduce them elegantly. So how do we deal with them?
First, get familiar with Unicode.
Once you've done that, you'll find that regular web applications need a little more to work properly. For instance, what if you want to use full-text indexing on a script that isn't derived from Latin?
Fortunately, MySQL as of version 4.1.1 makes that a lot easier, but there are some tricks.
You'll need to learn about collations in MySQL. It is entirely possible to set everything you're doing in PHP to UTF-8 and have a site that accepts and reproduces lots of different scripts while the database stores the text as something else. However, you give up a lot. First, your database dumps will be gibberish that is very hard to recover. Second, features that depend on MySQL understanding how the underlying data behaves, such as full-text indexes, will fail.
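To see where the gibberish comes from, here is a minimal Python simulation. It assumes PHP sends UTF-8 bytes into a column MySQL believes is latin1, so MySQL treats every byte as a separate character, and that a UTF-8 export then re-encodes those "characters" a second time. The helper names are hypothetical, and Python's `latin-1` codec only approximates MySQL's `latin1` (which is closer to Windows-1252), so the exact garbage differs but the effect is the same.

```python
def misread_as_latin1(text: str) -> str:
    """Hypothetical helper: UTF-8 bytes stored in a latin1-labelled column
    are interpreted as one latin1 character per byte."""
    return text.encode("utf-8").decode("latin-1")


def dump_bytes(text: str) -> bytes:
    """Hypothetical helper: a UTF-8 dump re-encodes the misread characters,
    so every non-ASCII byte becomes two bytes (double encoding)."""
    return misread_as_latin1(text).encode("utf-8")


original = "日本語"
print(len(original))                     # 3 characters
print(len(misread_as_latin1(original)))  # 9 "characters" as MySQL sees them
print(len(dump_bytes(original)))         # 18 bytes of double-encoded output
print(misread_as_latin1("plain ASCII"))  # ASCII passes through unchanged
```

The last line is why this mistake often goes unnoticed: English text survives intact, and only non-Latin scripts come out mangled.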
Out of the box, MySQL uses latin1_swedish_ci as its default collation, which I find an extremely odd choice. OK, so they're fighting Amerocentrism, but they're not exactly promoting international standardization. So unless your server is configured to default to utf8_unicode_ci (which I've found to be the most hassle-free collation for the widest range of character sets), you'll need to specify that your database and all its tables and columns use the utf8_unicode_ci collation, which will also cause them to store everything as UTF-8.
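Declaring the collation at every level might look something like the following. This is a minimal sketch that only assembles the SQL as Python strings, so it can be checked without a server; the database name `mysite`, the `articles` table and its columns, and the `utf8_schema` helper are all hypothetical. The table uses MyISAM because, in the MySQL versions this post targets, FULLTEXT indexes were only available on MyISAM tables (InnoDB gained them in 5.6).

```python
CHARSET = "utf8"
COLLATION = "utf8_unicode_ci"


def utf8_schema(db_name: str) -> list[str]:
    """Hypothetical helper: DDL that names utf8_unicode_ci explicitly at the
    database, table and column level, so nothing falls back to latin1."""
    return [
        f"CREATE DATABASE {db_name} CHARACTER SET {CHARSET} COLLATE {COLLATION};",
        f"USE {db_name};",
        (
            "CREATE TABLE articles ("
            " id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,"
            f" title VARCHAR(255) CHARACTER SET {CHARSET} COLLATE {COLLATION} NOT NULL,"
            f" body TEXT CHARACTER SET {CHARSET} COLLATE {COLLATION} NOT NULL,"
            " FULLTEXT (title, body)"  # the full-text index this post is after
            f") ENGINE=MyISAM DEFAULT CHARSET={CHARSET} COLLATE={COLLATION};"
        ),
    ]


for statement in utf8_schema("mysite"):
    print(statement)
```

Running `SHOW CREATE TABLE articles;` afterwards is a quick way to confirm that no column silently kept the latin1 default.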
But wait: you're not done. You'll also need to make sure your connection defaults to UTF-8. That's right, you need a clean UTF-8 path through your whole application. A single place where UTF-8 isn't respected will turn everything into unreadable gibberish or a string of question marks. You can have your application issue 'SET NAMES utf8' right after connecting on every request. The better option, until MySQL comes up with better defaults, is an environment where you control the MySQL server configuration yourself.
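The question-mark failure is easy to reproduce without a database. This minimal Python sketch assumes the text is stored correctly as UTF-8 but the connection's character set is latin1. MySQL replaces characters the connection character set cannot represent with '?', and Python's `errors="replace"` handler does the same when encoding. `through_connection` is a hypothetical name, and Python's `latin-1` codec only approximates MySQL's `latin1`.

```python
def through_connection(text: str, connection_charset: str) -> str:
    """Hypothetical helper: simulate a value passing through a connection
    that converts it to connection_charset, substituting '?' for any
    character that charset cannot represent."""
    return text.encode(connection_charset, errors="replace").decode(connection_charset)


stored = "Русский"
print(through_connection(stored, "latin-1"))  # ??????? (data lost in transit)
print(through_connection(stored, "utf-8"))    # Русский (clean UTF-8 path)
```

Issuing 'SET NAMES utf8' is what moves a real connection from the first, lossy path onto the second one.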
Once you do this, and provided you also do everything needed to make your HTML and PHP (or the scripting language of your choice) UTF-8-clean, you should be able to do everything with non-Latin scripts that you're used to doing with Latin-based ones.
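On the HTML side, "UTF-8-clean" usually means declaring the encoding everywhere the browser might look and making forms submit UTF-8 as well. Here is a small Python sketch, assuming a page assembled by hand; `utf8_page` and the form field `q` are hypothetical, and a PHP page would send the same header with `header()` and emit the same markup.

```python
from html import escape


def utf8_page(title: str, body_text: str) -> tuple[dict[str, str], bytes]:
    """Hypothetical helper: build an HTML response that declares UTF-8 in
    the HTTP header and in a <meta> tag, and asks the browser to submit the
    form as UTF-8, then encodes the whole body as UTF-8 bytes."""
    headers = {"Content-Type": "text/html; charset=utf-8"}
    html = (
        "<!DOCTYPE html>\n"
        "<html><head>"
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
        f"<title>{escape(title)}</title></head><body>"
        f"<p>{escape(body_text)}</p>"
        '<form method="post" accept-charset="UTF-8">'
        '<input type="text" name="q"></form>'
        "</body></html>"
    )
    return headers, html.encode("utf-8")


headers, body = utf8_page("検索", "Ελληνικά & 日本語")
print(headers["Content-Type"])
print(body.decode("utf-8"))
```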