How to get MySQL to sorting Unicode strings correctly
We received a report from one TeamworkPM customer that the sorting for Czech language on our people page was incorrect.
It took me 30 minutes to track down the problem so I hope this will safe some of you a bit of frustration.
Problem #1 - Merging of accented and non-accented characters
Say we had 2 users - "Candy" and "Čatelyn" (note the accent on Č), MySQL was returning only one row when we ran the following code
SELECT DISTINCT ucase( left( userFirstName, 1 ) ) AS letter
FROM users u
ORDER BY letter
was returning just 1 row with "C".
We wanted it to return "C" and "Č".
The fix:
I was scratching my head but it's obvious in hindsight – we just needed to change the collation on the userFirstname field from utf8_unicode_ci to utf8_bin.
Problem #2 - Sort order incorrect
That was one problem fixed. It turns out that MySQL was still putting names that start with accented characters at the end of the list so we had:
Anya
Candy
Mary
Čatelyn
instead of
Anya
Candy
Čatelyn
Mary
To fix this I just sorted using the following:
SELECT DISTINCT ucase( left( userFirstName, 1 ) ) AS letter
FROM users u
ORDER BY letter collate utf8_general_ci, letter
Hope this helps somebody.

