MySQL优化之index_merge

查询器的查询负优化？

慢查询

最近在完成一个项目的三期工作，很是艰辛。
前两期历史遗留了许多慢查询，但由于过于久远，团队一直没有合适的时间去再回顾一遍业务逻辑，趁着这次三期工作，也想着顺便干掉之前的慢查询。

主要是针对下表：

CREATE TABLE `demo` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `site_id` int(11) unsigned NOT NULL DEFAULT '1' COMMENT '站点ID',
  `survey_id` int(11) unsigned NOT NULL DEFAULT '0' COMMENT '问卷id',
  `date` int(11) unsigned NOT NULL DEFAULT '0' COMMENT '日期 (例如: 20160906)',
  `org_id` varchar(20) NOT NULL DEFAULT '' COMMENT '门店id',
  `choice` int(11) unsigned NOT NULL DEFAULT '0' COMMENT '选项id',
  `num` int(11) unsigned NOT NULL DEFAULT '0' COMMENT '选择数量',
  `send_num` int(11) unsigned NOT NULL DEFAULT '0' COMMENT '问卷发送量',
  `reply_num` int(11) unsigned NOT NULL DEFAULT '0' COMMENT '问卷回复量',
  `created_at` int(11) unsigned NOT NULL DEFAULT '0' COMMENT '创建时间',
  `updated_at` int(11) unsigned NOT NULL DEFAULT '0' COMMENT '更新时间',
  PRIMARY KEY (`id`),
  KEY `idx_survey_id` (`survey_id`),
  KEY `idx_org_id` (`org_id`),
  KEY `idx_date_id_org` (`date`,`survey_id`,`org_id`)
) ENGINE=InnoDB AUTO_INCREMENT=2273822 DEFAULT CHARSET=utf8 COMMENT='问卷选项分布'

分析

有这么一句sql（涉及敏感数据均用’×‘代替）：

SELECT `id`, `org_id`, `date`, `survey_id`, `send_num`, `reply_num`, 
SUM(case when choice='×' then num else 0 end) as __A0, 
SUM(case when choice='×' then num else 0 end) as __B0, 
SUM(case when choice='×' then num else 0 end) as __C0, 
SUM(case when choice='×' then num else 0 end) as __D0, 
SUM(case when choice='×' then num else 0 end) as __E0, 
SUM(case when choice='×' then num else 0 end) as __A1, 
SUM(case when choice='×' then num else 0 end) as __B1, 
SUM(case when choice='×' then num else 0 end) as __C1, 
SUM(case when choice='×' then num else 0 end) as __D1, 
SUM(case when choice='×' then num else 0 end) as __E1, 
SUM(case when choice='×' then num else 0 end) as __A2, 
SUM(case when choice='×' then num else 0 end) as __B2, 
SUM(case when choice='×' then num else 0 end) as __C2, 
SUM(case when choice='×' then num else 0 end) as __D2, 
SUM(case when choice='×' then num else 0 end) as __E2, 
SUM(case when choice='×' then num else 0 end) as __A3, 
SUM(case when choice='×' then num else 0 end) as __B3, 
SUM(case when choice='×' then num else 0 end) as __C3, 
SUM(case when choice='×' then num else 0 end) as __D3, 
SUM(case when choice='×' then num else 0 end) as __E3, 
SUM(case when choice='×' then num else 0 end) as __A4, 
SUM(case when choice='×' then num else 0 end) as __B4, 
SUM(case when choice='×' then num else 0 end) as __C4, 
SUM(case when choice='×' then num else 0 end) as __D4, 
SUM(case when choice='×' then num else 0 end) as __E4 
FROM `demo` 
WHERE `date` BETWEEN 20190312 AND 20190412 AND `survey_id` = '×' AND `org_id`='×' AND `site_id`='×'
GROUP BY `date`, `survey_id`, `org_id` ORDER BY `date` DESC LIMIT 10;

其执行时间足足有829ms，当org_id是in操作的时候，可以达到2000ms+。

肯定有人第一直觉，觉得是SUM函数计算过多，导致慢查询。首先，这里并不会介绍其中的业务逻辑，但是这么多SUM运算都是有意义的；其次，这并不是导致慢查询的原因，把SUM全部去掉，执行时间还是有800ms+。

一眼看不出优化策略的，我们可以试着explain一下它：

根据索引最左匹配原则，为什么没有命中idx_date_survey_org这个联合索引呢？在sql的实际执行过程，MySQL查询优化器会根据实际的情况选择索引，之所以在这里不命中idx_date_survey_org是因为date检索的时间跨度比较大，检索数据多，不如查询优化器选择的索引。

那么查询优化器选择的索引又是什么呢？可以看到在这一条sql的执行过程中竟然用到了两个索引，并且进行了一个叫index_merge的东西。

index_merge

什么是index_merge？查阅资料说明如下：

MySQL5.0之前，一个表一次只能使用一个索引，无法同时使用多个索引分别进行条件扫描。但是从5.1开始，引入了 index merge 优化技术，对同一个表可以使用多个索引分别进行条件扫描。

原来是这样，所以说这是MySQL针对查询的优化才对…吗？

这里不得不介绍一下survey_id这一列的特殊性：这个字段总共有三个值，分别对应每一期。也就是说合适的时间跨度下，survey_id的值是一样的。而全表200w+的数据，如果要对此索引进行检索，真的有必要吗？

稍微了解过索引优化的就会知道，这当然是没有必要的。不仅没有必要，这还会拖慢整个sql的查询。因为index_merge必须等待survey_id检索完，再进行intersect（此sql用到的合并算法）。

优化

如此一来，优化策略就很简单粗暴了，有以下三种：

强制使用其一索引
直接干掉where中的survey_id
联系dba干掉index_merge

最后一种过于粗暴，不建议使用（因为这可能会被dba干掉）

强制索引

优化后的语句如下：

SELECT `id`, `org_id`, `date`, `survey_id`, `send_num`, `reply_num`, 
...
FROM `demo` FORCE index (idx_org_id)
WHERE `date` BETWEEN 20190312 AND 20190412 AND `survey_id` = '×' AND `org_id`='×' AND `site_id`='×'
GROUP BY `date`, `survey_id`, `org_id` ORDER BY `date` DESC LIMIT 10;

执行耗时20ms，explain结果：

去掉对应检索项

由于业务方面的考虑，没有对前几期survey_id检索的必要，顾项目中使用了这种方案。

优化后的语句如下：

SELECT `id`, `org_id`, `date`, `survey_id`, `send_num`, `reply_num`, 
...
FROM `demo`
WHERE `date` BETWEEN 20190312 AND 20190412 AND `org_id`='×' AND `site_id`='×'
GROUP BY `date`, `survey_id`, `org_id` ORDER BY `date` DESC LIMIT 10;

执行耗时38ms，explain结果：