Yahoo SEM daily report - Currency and Dups handling

If you are pulling daily keyword performance data from major search engines to store in your in Data Warehouse, you need to perform many of transformation before having a daily, weekly or monthly summaries available for business (end) users.  In the case of Yahoo the required transformations that I needed to make are, one to handle the currency conversion and two to handle Yahoo sending more than one keyword when only one is expected. This has seem to happen when query or landing page url is modified for the same bid unit and for high impressions queries. Also, for disabled queries/keywords.

The following sql handles both with data read from staging table and loaded into final table after some transformation and cleaning. It does currency conversion for all countries except US and Mexico (units are in US dollars) which do not need conversions.

Sub-query below uses MySQL GROUP_CONCAT to find any duplicate keywords for the bid unit and dim_curreny has the daily currency exchange rates pulled from external third party services early in the morning. se_query_group table has the grouping of the keywords based on company requirements and search engine. Query also filters out any keyword that has zero impressions.

The query can further be modified for better performance of using query_id by joining with data warehouse query tables and using query_id in the joins also using search engine bid unit as the surrogate key.



INSERT INTO INT_MKTG_KEYWORD_PERF
    (se_date, search_engine, account_id, campaign_id, ad_group_id, keyword_id
    ,domain, url, query_group_id
    ,query, query_num_2
    ,impressions, clicks, ctr
    ,cpc_orig, cpc
    ,revenue_orig, revenue
    ,cost_orig, cost
    ,num_assist
    ,avg_position )
SELECT KP.se_date, KP.search_engine, KP.account_id, KP.campaign_id
       ,KP.ad_group_id, KP.keyword_id, 
       ,KP.domain, KP.url, KP.query_group_id
       ,CASE
           WHEN query2 NOT RLIKE  '---'  THEN  query2                        # Normal single kw
           WHEN query2 RLIKE  '---'  THEN  substring_index(query2, '---', 1) # Get 1st kw
       END query2
       ,CASE
           WHEN query2 RLIKE  '---'  THEN  substring_index(substring_index(query2, '---', 2), '---', -1)
       END  query_num_2                                            # Get 2nd query
       ,impressions
       ,clicks
       ,ctr
       ,cpc cpc_orig
       ,CASE
           WHEN  C.country_code not in ('US', 'MX') THEN cpc / C.currency_rate
           ELSE cpc
       END cpc
       ,revenue revenue_orig
       ,CASE
           WHEN C.country_code not in ('US', 'MX') THEN revenue / C.currency_rate
           ELSE revenue
       END revenue
       ,cost cost_orig
       ,CASE
           WHEN C.country_code  not in ('US', 'MX') THEN cost / C.currency_rate
           ELSE cost
       END cost
       ,CASE
           WHEN avg_position > 0 and avg_position < 1 THEN 1
           ELSE avg_position
       END avg_position
     FROM (
          SELECT se_date, search_engine
              ,account_id, campaign_id, ad_group_id, keyword_id, 
              ,domain, url, query_group_id, query
              ,GROUP_CONCAT(query SEPARATOR '---') query2
              ,SUM(impressions) impressions, SUM(clicks) clicks
              ,AVG(ctr) ctr, AVG(cpc) cpc, 
              ,AVG(revenue) revenue, 
              ,SUM(cost) cost, AVG(avg_position) avg_position
          FROM stg_keyword_perf
          WHERE se_date >=  '<date>' #-- Date here
          GROUP BY se_date ,account_id ,campaign_id ,ad_group_id ,keyword_id
          ) KP #- stg_kw_perf
     JOIN se_query_group QG
       ON KP.query_group_id = QG.query_group_id AND KP.search_engine = QG.search_engine
     JOIN dim_currency C 
       ON KP.se_date = C.currency_date AND QG.country_code = C.country_code
     WHERE KP.se_date = '<date>'   #-- Date here
       AND KP.impressions <> 0
      ;


HTH,
Shiva

No comments:

Post a Comment