Input strings has the format "BucketN_N[ -N | + ]" where N is decimal number. For example, "Bucket1_1" or "Bucket7_21-50" or "Bucket5_100+" are all valid and output should extract bucket number (first N), minium value (second N) and maximum value of the bucket (third optional N).
The query
SELECT bucket_string ,substring(qty_bucket, 7, 1)::integer as bucket_num ,regexp_replace(bucket_string, '^Bucket\\d{1,}_(\\d{1,})([-+].*)?', E'\\1')::integer as minimum_qty ,regexp_replace(regexp_replace(bucket_string, '.*-(\\d{1,})', E'\\1'), '.*_(\\d{1,})([+]?)$', E'\\1')::integer as maximum_qty FROM a_table ORDER BY bucket_string
Though I could have used other functions like substring and other, it was fun using regular expression in smallest code. regexp_replace is the workhorse taking a string and replacing it with matched string (\\1 or $1). "^ and $" are anchors to beginning and end of string, \\d matches with decimal numbers and {1,} means 1 to many quantifier.
And more with regexp_matches and array selection. I wanted to select offset number of days from 2012 December 01 in this simplified example. The offset is a text in another external table of Greenplum and data is not always clean and could have some text before the offset.
SELECT d regexp_matches(d, '(?:.*_)?(\\d+)$') re_ary , '2012-12-01'::date+((regexp_matches(d, '(?:.*_)?(\\d+)$'))[1] || 'days')::interval new_d FROM ( SELECT '90' d union SELECT 'new_90' d union SELECT 'old_90_10' d ) ttreturns
d re_ary new_d ------------------------------------- 90 {90} 2013-03-01 00:00:00 new_90 {90} 2013-03-01 00:00:00 new_90_10 {10} 2012-12-11 00:00:00
With "?:" is non-capturing group and this matched text is optional meaning any text before string ending number is not-captured for later use and original string may have it but not all the time. '$' anchors it to the end of string with regexp_matches returning an array of single element and subscript [1] selecting it.
No comments:
Post a Comment