Character set, Collation

Here is an interesting challenge that I ran into in a Data Warehouse environment. While debugging a particular business user query, I joined a string built in a subquery to a column in a fact table with some 20 million rows. BTW, I was using MySQL 5.0 and Toad 4.5 to run this query, and instead of getting any result I got the following error!

MySQL Database Error: Illegal mix of collations (utf8_general_ci,COERCIBLE) and (latin1_swedish_ci,IMPLICIT) for operation '='    22    0

The query was simple and similar to:
SELECT s.str, s.d1, s.d2
FROM (
    SELECT DISTINCT 'string' str, dimension1 d1, dimension2 d2
    FROM table_1
    WHERE ...
) s
LEFT OUTER JOIN table_2 t2
ON ...
WHERE s.str = t2.str
AND ...
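
The error message itself says that the two sides of the '=' disagree: the literal built in the subquery is utf8_general_ci, while the fact-table column is latin1_swedish_ci. What follows is only a sketch of one way to line them up, not necessarily the fix used in the original environment. It assumes the column t2.str really is latin1 (as the error suggests), and it uses a simplified join in place of the elided ON and WHERE conditions above.

```sql
-- Check which character set and collation the fact-table column uses.
SHOW FULL COLUMNS FROM table_2 LIKE 'str';

-- Check the collation and coercibility of a literal on this connection.
SELECT COLLATION('string'), COERCIBILITY('string');

-- Build the literal in the column's character set so that both sides
-- of the comparison share one collation (assumed here to be latin1).
SELECT s.str, t2.str
FROM (
    SELECT DISTINCT CONVERT('string' USING latin1) AS str
    FROM table_1
) s
LEFT OUTER JOIN table_2 t2
ON s.str = t2.str;
```

Converting the column side instead would also make the character sets agree, but wrapping an indexed column in a function generally prevents index use, which matters on a fact table with some 20 million rows.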


Regular Expressions

MySQL regular expressions (RE) are a powerful tool that can be very useful in SQL string searches. They enable a software engineer to build statements that are very concise and to handle complex string operations that otherwise wouldn't be possible.

If you are new to regular expressions or would like to know more about them, http://www.regular-expressions.info/ is a good site to visit. You can also get a RE tutorial at net|tuts+. The following is a quick list of meta characters that can get you started in using them.

. => A dot matches any single character
* => An asterisk matches zero or more occurrences of the preceding token
? => A question mark matches zero or one occurrence of the preceding token
$ => A dollar at the end anchors the search to the end of the string
^ => A caret at the start anchors the search to the beginning of the string
| => A pipe matches either of the two alternatives. Example: abc|xyz => either 'abc' or 'xyz'
{m,n} => A quantifier matching the preceding token between 'm' and 'n' times. m & n are integers.

Different computer languages have some variations when it comes to more advanced searches and how they handle given character sets. MySQL implements regular expressions through the REGEXP operator, which matches strings in case-insensitive mode by default; to match case-sensitively, see this blog.
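
To make the meta characters above concrete, here are a few one-line checks, offered as a small sketch. The sample strings are made up for illustration, and the results in the comments are what the MySQL 5.x series returns (1 for a match, 0 for no match). The last line uses the BINARY keyword, which is one way to force a case-sensitive match in those versions; newer releases may handle binary strings in REGEXP differently.

```sql
SELECT 'abc'   REGEXP '^a.c$';        -- 1: the dot stands for 'b'
SELECT 'ac'    REGEXP '^ab*c$';       -- 1: zero occurrences of 'b' is fine
SELECT 'color' REGEXP '^colou?r$';    -- 1: the 'u' is optional
SELECT 'xyz'   REGEXP 'abc|xyz';      -- 1: second alternative matches
SELECT 'aaa'   REGEXP '^a{2,3}$';     -- 1: three a's is within 2..3
SELECT 'aaaa'  REGEXP '^a{2,3}$';     -- 0: four a's is too many
SELECT 'ABC'   REGEXP 'abc';          -- 1: case-insensitive by default
SELECT 'ABC'   REGEXP BINARY 'abc';   -- 0: binary string, case-sensitive
```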


MySQL User Defined Variable

This is a brief introduction to user-defined variables in MySQL, since I have often been asked how to use them in dynamic queries.

In MySQL you can easily create a user-defined variable and use it throughout that particular session or client connection. A variable name consists of alphanumeric characters following an "@". In versions 5 and above the name is case-insensitive, and hence @VAR and @vaR mean the same. For example:

set @var = 1;

The above statement creates a variable called "var" and sets it to 1. Also note you don't need to declare it before using it.

The statement,

set @var1 = 1, @var2 = 2, @var3 = 3;

sets all three variables in a single statement. You can then select the variables with

select @var1, @var2, @var3;
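
A couple of quick checks, as a small sketch, of the points made so far: names are case-insensitive in version 5 and above, and no declaration is needed. The name @never_set is made up for this illustration; a variable that has not been assigned yet simply reads as NULL.

```sql
set @VAR = 5;
select @var, @vaR;     -- both columns show 5: same variable, different case
select @never_set;     -- NULL: never assigned in this session
```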

Now, let us say that you would like to select particular row(s) from a given table. To do so you first build the SQL string, then prepare it, and then execute it. This allows you to pass the variables one or more times to the same statement.

For this let us assume we have a table (table_t) that has two columns, id and name, and 3 rows: (1, 'one'), (2, 'two') and (3, 'three'). To select the row with id = 1:

set @var = 1;
set @st = concat('select id, name from table_t where id = ', @var);
prepare stmt from @st;
execute stmt ;

And to select on a string column such as name, you need to escape the single quotes as below.

set @var = 'one';
set @st = concat('select id, name from table_t where name = ', '''', @var, '''');
prepare stmt from @st;
execute stmt ;
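
As an alternative to concatenating and escaping quotes by hand, MySQL prepared statements also accept ? placeholders that are filled in with EXECUTE ... USING. The sketch below assumes the table_t described above; the create and insert statements are only there to make the example self-contained, and the column types are a guess.

```sql
-- Hypothetical setup matching the table described above.
create table table_t (id int, name varchar(20));
insert into table_t values (1, 'one'), (2, 'two'), (3, 'three');

-- The ? marks where a value is bound at execute time; no quoting needed.
prepare stmt from 'select id, name from table_t where name = ?';

set @var = 'one';
execute stmt using @var;     -- returns the row (1, 'one')

-- The same prepared statement can be reused with a different value.
set @var = 'three';
execute stmt using @var;     -- returns the row (3, 'three')

-- Release the statement when it is no longer needed.
deallocate prepare stmt;
```

Because the value is bound rather than pasted into the SQL text, a name that itself contains a single quote does not break the statement.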

This is a trivial example, but you get the idea of how to use user-defined variables. The same technique can be used to build more complex stored procedures and statements that are executed often with different variables.

Cheers,
Shiva