Joins between tables of different charset

Not too long ago I got bitten by a slow performing MySQL query even though I had right indices (in fact unique index) and the tables were not big tables. The query which is similar to below was taking more than 20 mins which I ended up killing it.

The issue was one table was storing data in "latin1" while other in "utf8".

select count(1)
from table_1 T1
left outer join table_2 T2
  on T1.id = T2.id
where T2.id is null

Note: "id"s here are varchars and they are not integers.

T1 had around 100K while T2 has around 800K joining on MySQL5.1, quad machine with 16G RAM.  I expected much quicker response and on running the explain plan I could see the query doing the full table scan!  First I wasn't sure why this was happening but after digging little bit and found out that one table was defined with "charset=latin1" and another with "charset=utf8".

MySQL was converting latin1 to utf8 and then performing the join which ended up with full table scan. On converting T2 to utf8 with collation utf8_unicode_ci, the query timing dropped to 10secs or so!

With that said by default now we create all tables with charset=utf8. 

No comments:

Post a Comment