Showing posts with label Collation. Show all posts
Showing posts with label Collation. Show all posts

Character set, Collation

Here is an interesting challenge that I ran into in the Data Warehouse environment. While debugging a particular business user query, I joined between a string built in a sub-query to a column in a particular fact table with some 20million rows. BTW, I was using MySQL 5.0 and Toad 4.5 to run this query and instead of getting any result I got the following error!

MySQL Database Error: Illegal mix of collations (utf8_general_ci,COERCIBLE) and (latin1_swedish_ci,IMPLICIT) for operation '='    22    0

The query was simple and similar to
SELECT s.str, s.d1, s.d2
FROM (
    SELECT distinct 'string' str, dimension1 d1, dimension 2 d2
    FROM table_1
    WHERE ...
) s
LEFT OUTER JOIN table_2 t2
ON ...
WHERE s.str = t2.str
AND ...


Case Sensitive SQL Search

MySQL on default installation uses the case insensitive configuration meaning the data you stored and searched will not be case sensitive.
Let us first check our current server and database's appropariate variables.

SHOW VARIABLES
WHERE variable_name = 'character_set_server'
> latin1
SHOW VARIABLES
WHERE variable_name = 'collation_database'
> latin1_swedish_ci

Since we are using latin based language settings, the cases are different unlike some languages that don't have cases. And with the database collation set to latin_swedish_ci the string search will be case insensitive (ci).

For this illustration, I have created a table "country" with single column "country_name" and inserted 3 rows.
CREATE TABLE country (
     country_name varchar(32) NOT NULL
);
INSERT INTO country
VALUES ('US'), ('UK'), ('ukraine')
; 
COMMIT;