Elasticsearch 7.x In Depth [10]: Aggregation

1. References

Geek Time (极客时间): "Elasticsearch Core Technology and Practice" by 阮一鸣
Elasticsearch: a detailed summary of Aggregation (aggregate statistics)
Elasticsearch aggregations: Bucket Aggregations
Elasticsearch aggregations: Metrics Aggregations
Elasticsearch aggregations: Pipeline Aggregations
Official docs: search-aggregations
Geo distance filter
Elasticsearch: an introduction to aggregation
ES aggregation explained
aggregation explained 1 (overview)
aggregation explained 2 (metrics aggregations)
aggregation explained 3 (bucket aggregation)
aggregation explained 4 (pipeline aggregations)
[Elasticsearch] Filtering Queries and Aggregations
Official docs: search-aggregations-bucket
Official docs: search-aggregations-metrics
Official docs: search-aggregations-pipeline
Official docs: search-aggregations-matrix
Using a bucket script aggregation inside filter aggregation
Question: in a nested query, how do you aggregate internally and then filter?

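2. Getting started

Data preparation: <Elasticsearch 7.x In Depth: Data Preparation>

Aggregation categories

Aggregations provide aggregated data based on a search query. They fall into the following categories:

  • Bucket
    A family of aggregations that build buckets, where each bucket is associated with a key and a document criterion. When the aggregation runs, every bucket criterion is evaluated against each document in the context; when a criterion matches, the document is considered to "fall into" that bucket. At the end of the aggregation we get a list of buckets, each with the set of documents that "belong" to it.
  • Metric
    Aggregations that track and compute metrics over a set of documents.
  • Matrix
    A family of aggregations that operate on multiple fields and produce a matrix result from the values extracted from the requested document fields. Unlike Bucket and Metric aggregations, this family does not yet support scripting.
  • Pipeline
    Aggregations that aggregate the output of other aggregations and their associated metrics.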

Aggregation syntax

"aggregations" : { // keyword
    "<aggregation_name>" : { // user-defined aggregation name
        "<aggregation_type>" : { // aggregation type
            <aggregation_body>
        }
        [,"meta" : {  [<meta_data_body>] } ]?
        [,"aggregations" : { [<sub_aggregation>]+ } ]?  // sub-aggregations
    }
    [,"<aggregation_name_2>" : { ... } ]*  // sibling aggregations
}

Let's go through them one by one.

Bucket

The ES documentation lists many bucket aggregation types; I won't enumerate them all here.

  • Terms
  • Range
  • Date Range
  • Histogram
  • Date Histogram
  • ...
Example 1: terms

As an example, let's see which products appear in the orders.

GET /aggs_order/_search
{
  "size": 0,
  "aggs": {
    "group_by_goodsName": {
      "terms": {
        "field": "goodsName.keyword",
        "size": 10
      }
    }
  }
}

Here is the result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_goodsName" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "IPhone 8 Plus",
          "doc_count" : 2
        },
        {
          "key" : "IPhone 9 Plus",
          "doc_count" : 2
        },
        {
          "key" : "IPhone 10 Plus",
          "doc_count" : 1
        }
      ]
    }
  }
}
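  • Optimizing terms aggregation performance [set eager_global_ordinals to true in the mapping]
    Consider this setting when a field is aggregated frequently while new documents keep being written.
  • min_doc_count: sets the minimum document count for the aggregation; only terms whose document count reaches this value are returned.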
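
As a rough sketch of the two settings above: the first request creates a hypothetical index aggs_order_demo (the name is an assumption, not one of the indices used in this article) whose goodsName.keyword sub-field has eager_global_ordinals enabled; the second request uses min_doc_count so that only products with at least 2 orders come back.

```
# Sketch only: aggs_order_demo is a made-up index name used for illustration
PUT /aggs_order_demo
{
  "mappings": {
    "properties": {
      "goodsName": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "eager_global_ordinals": true
          }
        }
      }
    }
  }
}

# Only terms that occur in at least 2 documents are returned
GET /aggs_order_demo/_search
{
  "size": 0,
  "aggs": {
    "group_by_goodsName": {
      "terms": {
        "field": "goodsName.keyword",
        "min_doc_count": 2
      }
    }
  }
}
```

The trade-off with eager_global_ordinals is that global ordinals are built at refresh time rather than by the first aggregation that needs them, which moves the cost from search to indexing.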

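Attributes in the terms aggregation response:

Attribute                      Meaning
doc_count_error_upper_bound    Upper bound on the document count of any term that was left out of the returned buckets
sum_other_doc_count            Total document count of all terms not included in the returned buckets (overall total minus the returned total)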
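
To see these two attributes move, a request along the following lines may help. It is only a sketch: the values 2 and 20 are arbitrary, chosen for illustration. show_term_doc_count_error adds a per-bucket doc_count_error_upper_bound, and shard_size controls how many candidate terms each shard returns before the final merge.

```
# Sketch: size 2 and shard_size 20 are arbitrary illustration values
GET /aggs_order/_search
{
  "size": 0,
  "aggs": {
    "group_by_goodsName": {
      "terms": {
        "field": "goodsName.keyword",
        "size": 2,
        "shard_size": 20,
        "show_term_doc_count_error": true
      }
    }
  }
}
```

With size smaller than the number of distinct products, the documents of the products that are cut off are counted in sum_other_doc_count. A larger shard_size generally tightens doc_count_error_upper_bound on multi-shard indices, at the cost of more data being sent from each shard.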
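Example 2: sub-aggregation

For each product, fetch the single highest-priced order.

# First group by goodsName.keyword, then sort by amount in descending order and take the first hit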
GET /aggs_order/_search
{
  "size": 0,
  "aggs": {
    "group_by_goodsName": {
      "terms": {
        "field": "goodsName.keyword",
        "size": 10
      },
      "aggs": {
        "more_amount": {
          "top_hits": {
            "size": 1,
            "sort": [
              {
                "amount": {
                  "order": "desc"
                }
              }
              ]
          }
        }
      }
    }
  }
}

Here is the response:

{
  "took" : 8,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_goodsName" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "IPhone 8 Plus",
          "doc_count" : 2,
          "more_amount" : {
            "hits" : {
              "total" : {
                "value" : 2,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "aggs_order",
                  "_type" : "_doc",
                  "_id" : "HAOY-SKXIS-LIWN",
                  "_score" : null,
                  "_source" : {
                    "platform" : "IOS",
                    "amount" : 1200,
                    "createTime" : "2020-04-15 10:00",
                    "originatorId" : 2,
                    "originatorName" : "李四",
                    "goodsId" : 1,
                    "goodsName" : "IPhone 8 Plus"
                  },
                  "sort" : [
                    1200
                  ]
                }
              ]
            }
          }
        },
        {
          "key" : "IPhone 9 Plus",
          "doc_count" : 2,
          "more_amount" : {
            "hits" : {
              "total" : {
                "value" : 2,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "aggs_order",
                  "_type" : "_doc",
                  "_id" : "USYX_SJJSUL_XUSYA",
                  "_score" : null,
                  "_source" : {
                    "platform" : "PC",
                    "amount" : 500,
                    "createTime" : "2020-01-20 10:00",
                    "originatorId" : 1,
                    "originatorName" : "张三",
                    "goodsId" : 2,
                    "goodsName" : "IPhone 9 Plus"
                  },
                  "sort" : [
                    500
                  ]
                }
              ]
            }
          }
        },
        {
          "key" : "IPhone 10 Plus",
          "doc_count" : 1,
          "more_amount" : {
            "hits" : {
              "total" : {
                "value" : 1,
                "relation" : "eq"
              },
              "max_score" : null,
              "hits" : [
                {
                  "_index" : "aggs_order",
                  "_type" : "_doc",
                  "_id" : "XXSA-KSUWL-USIA",
                  "_score" : null,
                  "_source" : {
                    "platform" : "PC",
                    "createTime" : "2020-01-20 10:00",
                    "originatorId" : 3,
                    "originatorName" : "王五",
                    "goodsId" : 3,
                    "goodsName" : "IPhone 10 Plus"
                  },
                  "sort" : [
                    -9223372036854775808
                  ]
                }
              ]
            }
          }
        }
      ]
    }
  }
}
Example 3: range

Group orders by price range (this example shows that each range is half-open: from is included and to is excluded, e.g. [300, 700))

GET /aggs_order/_search
{
  "size": 0,
  "aggs": {
    "amount_range": {
      "range": {
        "field": "amount",
        "ranges": [
          {
            "to": 300
          },
          {
            "from": 300,
            "to": 700
          },
          {
            "key": "gt 700",
            "from": 700
          }
        ]
      }
    }
  }
}

Here is the result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "amount_range" : {
      "buckets" : [
        {
          "key" : "*-300.0",
          "to" : 300.0,
          "doc_count" : 1
        },
        {
          "key" : "300.0-700.0",
          "from" : 300.0,
          "to" : 700.0,
          "doc_count" : 2
        },
        {
          "key" : "gt 700",
          "from" : 700.0,
          "doc_count" : 1
        }
      ]
    }
  }
}
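Example 4: script

First compute how many years before 2020 (params.now) each order was created, then group the orders by that value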

GET /aggs_order/_search
{
  "size": 0,
  "aggs": {
    "group_by_year": {
      "range": {
        "script": {
          "source": """
              JodaCompatibleZonedDateTime dateTime = doc['createTime'].value;
              return params.now - dateTime.getYear();
          """,
          "params": {
            "now": 2020
          }
        },
        "ranges": [
          {
            "to": 1
          },
          {
            "from": 1,
            "to": 3
          },
          {
            "from": 3,
            "to": 5
          },
          {
            "from": 5
          }
        ]
      }
    }
  }
}
Example 5: geo_distance

Draw circles around a given origin and count the documents whose coordinates fall into each distance range

GET /aggs_hotel/_search
{
  "size": 0, 
  "aggs": {
    "rings_around_amsterdam": {
      "geo_distance": {
        "field": "location",
        "origin": {
          "lon": 109.0000000,
          "lat": 34.0000000
        },
        "ranges": [
          { "to" : 100000 },
          { "from" : 100000, "to" : 300000 },
          { "from" : 300000 }
        ]
      }
    }
  }
}

Here is the result:

{
  "took" : 82,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 8,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "rings_around_amsterdam" : {
      "buckets" : [
        {
          "key" : "*-100000.0",
          "from" : 0.0,
          "to" : 100000.0,
          "doc_count" : 6
        },
        {
          "key" : "100000.0-300000.0",
          "from" : 100000.0,
          "to" : 300000.0,
          "doc_count" : 0
        },
        {
          "key" : "300000.0-*",
          "from" : 300000.0,
          "doc_count" : 2
        }
      ]
    }
  }
}
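  • The unit parameter sets the distance unit; the default is m. From the official documentation:
    By default, the distance unit is m (meters) but it can also accept: mi (miles), in (inches), yd (yards), km (kilometers), cm (centimeters), mm (millimeters).
  • Setting keyed to true returns the buckets as a hash keyed by bucket key instead of as an array.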
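
A minimal sketch combining both options, assuming the same aggs_hotel index and location field as above (the aggregation name rings_around_origin is made up for this example). The ranges are the same rings as before, now written in kilometers.

```
# Sketch: same rings as the previous request, in km, returned as a keyed hash
GET /aggs_hotel/_search
{
  "size": 0,
  "aggs": {
    "rings_around_origin": {
      "geo_distance": {
        "field": "location",
        "origin": { "lon": 109.0, "lat": 34.0 },
        "unit": "km",
        "keyed": true,
        "ranges": [
          { "to": 100 },
          { "from": 100, "to": 300 },
          { "from": 300 }
        ]
      }
    }
  }
}
```

With keyed set to true, buckets becomes an object whose property names are the bucket keys, which is convenient when a client looks buckets up by name rather than by position.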
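Example 6: filter, nested

Let's look up the hotel "泽兰雅家酒店" for member level 001 and a stay of [2020-05-01, 2020-05-03) (check-in to check-out), and see the price statistics for that stay.

# First form: filter directly in the query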
GET /aggs_hotel_price/_search
{
  "size": 0, 
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "name.keyword": "泽兰雅家酒店"
        }
      }
    }
  },
  "aggs": {
     "prices": {
        "nested": {
          "path": "prices"
        },
        "aggs": {
          "group_by_level": {
            "terms": {
              "field": "prices.level",
              "size": 1,
              "include": "001"
            },
            "aggs": {
              "date_range": {
                "date_range": {
                  "field": "prices.selldate",
                  "ranges": [
                    {
                      "from": "2020-05-01",
                      "to": "2020-05-03"
                    }
                  ]
                },
                "aggs": {
                  "stats": {
                    "stats": {
                      "field": "prices.price"
                    }
                  }
                }
              }
            }
          }
        }
    }
  }
}

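# Second form: not really recommended, because the hotel name is filtered inside the aggregation tree (a filter aggregation) instead of in the query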
GET /aggs_hotel_price/_search
{
  "size": 0,
  "aggs": {
    "group_by_name": {
      "filter": {
        "term": {
          "name.keyword": "泽兰雅家酒店"
        }
      },
      "aggs": {
        "prices": {
          "nested": {
            "path": "prices"
          },
          "aggs": {
            "group_by_level": {
              "terms": {
                "field": "prices.level",
                "size": 1,
                "include": "001"
              },
              "aggs": {
                "date_range": {
                  "date_range": {
                    "field": "prices.selldate",
                    "ranges": [
                      {
                        "from": "2020-05-01",
                        "to": "2020-05-03"
                      }
                    ]
                  },
                  "aggs": {
                    "stats": {
                      "stats": {
                        "field": "prices.price"
                      }
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

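Here is the result of the second form [note that the two forms return differently shaped results, because the second one has one more aggregation level]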

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_name" : {
      "doc_count" : 1,
      "prices" : {
        "doc_count" : 6,
        "group_by_level" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : "001",
              "doc_count" : 2,
              "date_range" : {
                "buckets" : [
                  {
                    "key" : "2020-05-01-2020-05-03",
                    "from" : 1.5882912E12,
                    "from_as_string" : "2020-05-01",
                    "to" : 1.588464E12,
                    "to_as_string" : "2020-05-03",
                    "doc_count" : 2,
                    "stats" : {
                      "count" : 2,
                      "min" : 9.0,
                      "max" : 15.0,
                      "avg" : 12.0,
                      "sum" : 24.0
                    }
                  }
                ]
              }
            }
          ]
        }
      }
    }
  }
}

Metric

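Metric aggregations come in two kinds: single-value and multi-value. Let's look at each in turn.

Single-value (returns a single result)

The ES documentation lists many types; I won't enumerate them all here.

  • min: minimum
  • max: maximum
  • avg: average
  • sum: sum
  • cardinality: number of distinct values

Here is an example.

  • Find the smallest payment amount among the orders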
GET /aggs_order/_search
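{
  "size": 0,  // there is no query part here and I don't care about the hits, so size is set to 0
  "aggs": { // keyword; cannot be changed
    "min_aggs": { // user-defined name of the aggregation
      "min": { // aggregation type; must be one of the types listed above
        "field": "amount" // field to aggregate on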
      }
    }
  }
}

The response:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "min_aggs" : {
      "value" : 100.0 // the minimum value is returned here
    }
  }
}

We can also run several aggregations at once. Here we query the maximum, minimum and average payment amount of the orders.

GET /aggs_order/_search
{
  "size": 0, 
  "aggs": {
    "min_aggs": {
      "min": {
        "field": "amount"
      }
    },
    "max_aggs": {
      "max": {
        "field": "amount"
      }
    },
    "avg_aggs": {
      "avg": {
        "field": "amount"
      }
    }
  }
}

Here is the result:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "avg_aggs" : { // average
      "value" : 525.0
    },
    "min_aggs" : { // minimum
      "value" : 100.0
    },
    "max_aggs" : { // maximum
      "value" : 1200.0
    }
  }
}
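
The cardinality type from the list above follows the same pattern. A minimal sketch, assuming originatorId is indexed as a numeric or keyword field (the aggregation name distinct_originators is made up for this example):

```
# Sketch: counts how many distinct users placed orders
GET /aggs_order/_search
{
  "size": 0,
  "aggs": {
    "distinct_originators": {
      "cardinality": {
        "field": "originatorId"
      }
    }
  }
}
```

Keep in mind that cardinality is an approximate count (it is based on the HyperLogLog++ algorithm); for small data sets like this one the result is normally exact, and the precision_threshold parameter trades memory for accuracy on larger ones.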

Multi-value (returns multiple results)

The ES documentation lists many types; I won't enumerate them all here.

  • stats
  • extended stats
  • percentile
  • percentile rank
  • top hits
  • ...

As an example, let's get summary statistics for the amount field of the orders, such as the maximum and the minimum.

GET /aggs_order/_search
{
  "size": 0,
  "aggs": {
    "stats_aggs": {
      "stats": { // use stats, one of the multi-value types
        "field": "amount"
      }
    }
  }
}

Here is the response:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "stats_aggs" : {
      "count" : 4, // number of documents that have an amount field
      "min" : 100.0, // minimum
      "max" : 1200.0, // maximum
      "avg" : 525.0, // average
      "sum" : 2100.0 // sum
    }
  }
}
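
Other multi-value types are requested the same way. As a sketch, the percentiles aggregation on the same amount field could look like this (the aggregation name amount_percentiles and the chosen percents are illustrative assumptions):

```
# Sketch: median, 95th and 99th percentile of the order amount
GET /aggs_order/_search
{
  "size": 0,
  "aggs": {
    "amount_percentiles": {
      "percentiles": {
        "field": "amount",
        "percents": [50, 95, 99]
      }
    }
  }
}
```

The response contains a values object with one entry per requested percentile. Percentiles are computed approximately, so on large data sets the numbers should be read as estimates.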

Pipeline

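Aggregate again over the results of other aggregations.

  • A pipeline aggregation writes its result to a different place depending on its kind [the explanation below is quoted from 冰钟, with thanks]
    1. Sibling: takes the result of a sibling (same-level) aggregation as input and computes over it. The new result is placed at the same level as the sibling aggregation.
      max, min, avg, sum
      stats, extended stats
      percentiles
      ...
    2. Parent: takes the result of the parent aggregation as input and computes over it. It can produce new buckets, or new aggregation results that are added to the existing buckets.
      derivative
      cumulative sum
      moving function (sliding window)
      ...

A pipeline aggregation must specify buckets_path. Let's look at the syntax of this path.

buckets_path syntax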

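# Aggregation separator ">": expresses the parent/child relationship between aggregations, e.g. "my_bucket>my_stats"
AGG_SEPARATOR       =  `>` ;

# Metric separator ".": selects a metric value, e.g. "my_stats.avg"
# From my own experiments: use > between bucket aggregations; between a bucket and a metric aggregation either > or . works; between a multi-value metric aggregation and one of its metrics use .
METRIC_SEPARATOR    =  `.` ;

# Aggregation name
AGG_NAME            =  <the name of the aggregation> ;

# For a multi-value metric aggregation, the name of the metric to use
METRIC              =  <the name of the metric (in case of multi-value metrics aggregation)> ;

# For a multi-bucket aggregation, selects the bucket with the given key
# e.g. sale_type['hat']>sales
MULTIBUCKET_KEY     =  `[<KEY_NAME>]`

# The full path grammar: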
PATH                =  <AGG_NAME><MULTIBUCKET_KEY>? (<AGG_SEPARATOR>, <AGG_NAME> )* ( <METRIC_SEPARATOR>, <METRIC> ) ;
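
To make the MULTIBUCKET_KEY part of the grammar concrete, here is a hedged sketch modelled on the sale_type['hat']>sales example. It assumes that platform is a keyword field and that createTime is a date field in aggs_order; the aggregation names by_year, by_platform, sum_amount and pc_minus_ios are made up. calendar_interval is available from 7.2 onwards; earlier 7.x releases use interval.

```
# Sketch: per year, the PC order amount minus the IOS order amount
# Assumes platform is a keyword field (otherwise try platform.keyword)
GET /aggs_order/_search
{
  "size": 0,
  "aggs": {
    "by_year": {
      "date_histogram": {
        "field": "createTime",
        "calendar_interval": "year"
      },
      "aggs": {
        "by_platform": {
          "terms": { "field": "platform" },
          "aggs": {
            "sum_amount": { "sum": { "field": "amount" } }
          }
        },
        "pc_minus_ios": {
          "bucket_script": {
            "buckets_path": {
              "pc": "by_platform['PC']>sum_amount",
              "ios": "by_platform['IOS']>sum_amount"
            },
            "script": "params.pc - params.ios"
          }
        }
      }
    }
  }
}
```

Each path first picks one bucket of the by_platform terms aggregation by key and then descends into its sum_amount metric. If a year has no bucket for one of the two keys, the default gap policy skips the script for that year.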
Example 1: min_bucket

Compute the average order amount per person, then pick the smallest of those averages.

GET /aggs_order/_search
{
  "size": 0,
  "aggs": {
    "group_by_originatorId": {
      "terms": {
        "field": "originatorName"
      },
      "aggs": {
        "avg_amount": {
          "avg": {
            "field": "amount",
            "missing": 0
          }
        }
      }
    },
    "min_avg_amount": { // user-defined name of the pipeline aggregation
      "min_bucket": { // keyword
        "buckets_path": "group_by_originatorId>avg_amount" // path to the aggregation
      }
    }
  }
}

Here is the result. Because min_bucket is a sibling pipeline aggregation, its result sits at the same level as the sibling aggregation.

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_originatorId" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "张三",
          "doc_count" : 2,
          "avg_amount" : {
            "value" : 300.0
          }
        },
        {
          "key" : "王五",
          "doc_count" : 2,
          "avg_amount" : {
            "value" : 150.0
          }
        },
        {
          "key" : "李四",
          "doc_count" : 1,
          "avg_amount" : {
            "value" : 1200.0
          }
        }
      ]
    },
    "min_avg_amount" : {
      "value" : 150.0,
      "keys" : [
        "王五"
      ]
    }
  }
}
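
For contrast, a parent pipeline aggregation is declared inside the multi-bucket aggregation it works on, and its result appears inside each bucket. A minimal sketch with cumulative_sum, assuming createTime is a date field (the names orders_per_month, monthly_amount and running_total are made up; calendar_interval requires 7.2 or later):

```
# Sketch: running total of the order amount, month by month
GET /aggs_order/_search
{
  "size": 0,
  "aggs": {
    "orders_per_month": {
      "date_histogram": {
        "field": "createTime",
        "calendar_interval": "month"
      },
      "aggs": {
        "monthly_amount": {
          "sum": { "field": "amount" }
        },
        "running_total": {
          "cumulative_sum": {
            "buckets_path": "monthly_amount"
          }
        }
      }
    }
  }
}
```

Here buckets_path is relative to the enclosing date_histogram, so it names the sibling metric monthly_amount directly, and every monthly bucket gains a running_total value.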

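Aggregation scope

By default, an aggregation runs over the result set of the query.
The scope can be changed in the following ways.

post filter

Filters the hits after the aggregations have been computed.

# Bucket by name and compute order-amount statistics per person [shown under aggregations in the response], then filter down to 张三's orders [shown under hits in the response]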
GET /aggs_order/_search
{
  "aggs": {
    "group_by_originatorName": {
      "terms": {
        "field": "originatorName"
      },
      "aggs": {
        "stats": {
          "stats": {
            "field": "amount"
          }
        }
      }
    }
  },
  "post_filter": {
    "term": {
      "originatorName": "张三"
    }
  }
}

Query result:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "aggs_order",
        "_type" : "_doc",
        "_id" : "HASA-XSIAN-SIWU",
        "_score" : 1.0,
        "_source" : {
          "platform" : "Android",
          "amount" : 100,
          "createTime" : "2019-05-20 10:00",
          "originatorId" : 1,
          "originatorName" : "张三",
          "goodsId" : 1,
          "goodsName" : "IPhone 8 Plus"
        }
      },
      {
        "_index" : "aggs_order",
        "_type" : "_doc",
        "_id" : "USYX_SJJSUL_XUSYA",
        "_score" : 1.0,
        "_source" : {
          "platform" : "PC",
          "amount" : 500,
          "createTime" : "2020-01-20 10:00",
          "originatorId" : 1,
          "originatorName" : "张三",
          "goodsId" : 2,
          "goodsName" : "IPhone 9 Plus"
        }
      }
    ]
  },
  "aggregations" : {
    "group_by_originatorName" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "张三",
          "doc_count" : 2,
          "stats" : {
            "count" : 2,
            "min" : 100.0,
            "max" : 500.0,
            "avg" : 300.0,
            "sum" : 600.0
          }
        },
        {
          "key" : "王五",
          "doc_count" : 2,
          "stats" : {
            "count" : 1,
            "min" : 300.0,
            "max" : 300.0,
            "avg" : 300.0,
            "sum" : 300.0
          }
        },
        {
          "key" : "李四",
          "doc_count" : 1,
          "stats" : {
            "count" : 1,
            "min" : 1200.0,
            "max" : 1200.0,
            "avg" : 1200.0,
            "sum" : 1200.0
          }
        }
      ]
    }
  }
}

global

Inside this aggregation, the restrictions of the query part are ignored.

GET /aggs_order/_search
{
  "size": 0, 
  "query": {
    "range": {
      "amount": {
        "gt": 100
      }
    }
  }, 
  "aggs": {
    "group_by_originatorName": {
      "terms": {
        "field": "originatorName"
      },
      "aggs": {
        "stats": {
          "stats": {
            "field": "amount"
          }
        }
      }
    },
    "all": {
      "global": {},
      "aggs": {
        "group_by_originatorName": {
          "terms": {
            "field": "originatorName"
          },
          "aggs": {
            "stats": {
              "stats": {
                "field": "amount"
              }
            }
          }
        }
      }
    }
  }
}

Let's compare the two results:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "all" : {
      "doc_count" : 5,
      "group_by_originatorName" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
          {
            "key" : "张三",
            "doc_count" : 2,
            "stats" : {
              "count" : 2,
              "min" : 100.0,
              "max" : 500.0,
              "avg" : 300.0,
              "sum" : 600.0
            }
          },
          {
            "key" : "王五",
            "doc_count" : 2,
            "stats" : {
              "count" : 1,
              "min" : 300.0,
              "max" : 300.0,
              "avg" : 300.0,
              "sum" : 300.0
            }
          },
          {
            "key" : "李四",
            "doc_count" : 1,
            "stats" : {
              "count" : 1,
              "min" : 1200.0,
              "max" : 1200.0,
              "avg" : 1200.0,
              "sum" : 1200.0
            }
          }
        ]
      }
    },
    "group_by_originatorName" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "张三",
          "doc_count" : 1,
          "stats" : {
            "count" : 1,
            "min" : 500.0,
            "max" : 500.0,
            "avg" : 500.0,
            "sum" : 500.0
          }
        },
        {
          "key" : "李四",
          "doc_count" : 1,
          "stats" : {
            "count" : 1,
            "min" : 1200.0,
            "max" : 1200.0,
            "avg" : 1200.0,
            "sum" : 1200.0
          }
        },
        {
          "key" : "王五",
          "doc_count" : 1,
          "stats" : {
            "count" : 1,
            "min" : 300.0,
            "max" : 300.0,
            "avg" : 300.0,
            "sum" : 300.0
          }
        }
      ]
    }
  }
}

Sorting

Sorting by built-in keys

  • _count
  • _key

Sort by the document count of each bucket and by the bucket key.

GET /aggs_order/_search
{
  "size": 0,
  "aggs": {
    "group_by_originatorName": {
      "terms": {
        "field": "originatorName",
        "order": [
          {"_count": "desc"},
          {"_key": "desc"}
          ]
      }
    }
  }
}

Sorting by a single-value sub-aggregation

Use a sub-aggregation that returns a single value, such as min, max or avg, as the sort criterion.

GET /aggs_order/_search
{
  "size": 0,
  "aggs": {
    "group_by_originatorName": {
      "terms": {
        "field": "originatorName",
        "order": {
          "avg_amount": "desc"
        }
      },
      "aggs": {
        "avg_amount": {
          "avg": {
            "field": "amount"
          }
        }
      }
    }
  }
}

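Sorting by a multi-value sub-aggregation

Use one of the values returned by a multi-value sub-aggregation, such as stats, as the sort criterion.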

GET /aggs_order/_search
{
  "size": 0,
  "aggs": {
    "group_by_originatorName": {
      "terms": {
        "field": "originatorName",
        "order": {
          "stats_amount.sum": "desc"
        }
      },
      "aggs": {
        "stats_amount": {
          "stats": {
            "field": "amount"
          }
        }
      }
    }
  }
}

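Thought exercise

In a nested query, how do you aggregate internally and then filter?

Business scenario: there are currently 1,000,000 users and 500,000 red-envelope records, and one user can have several red-envelope records. First index 1,000,000 user documents, and in each user document use a nested field to hold that user's list of red envelopes.
A red-envelope record has: an amount and a validity period.
Requirement: implement a feature that answers how many users have an accumulated red-envelope amount, counted over red envelopes that are still valid, that meets a given threshold.

I came across this question while looking for material, tried it myself, and here is my answer.

  • First approach
    This approach requires a size in the terms aggregation. With multiple shards the results can be inaccurate, and a large size consumes more memory, so use it with care.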
GET /aggs_user_envelope/_search
{
  "size": 0,
  "aggs": {
    "aggs_nested": {
      "nested": {
        "path": "envelope"
      },
      "aggs": {
        "filter_date": {
          "filters": {
            "filters": {
              "range": {
                "range": {
                  "envelope.until": {
                    "gte": "2020-05-30 00:00"
                  }
                }
              }
            }
          },
          "aggs": {
            "group_by_username": {
              "terms": {
                "field": "envelope.userId",
                "size": 10
              },
              "aggs": {
                "sum_of_money": {
                  "sum": {
                    "field": "envelope.money"
                  }
                },
                "filter_money": {
                  "bucket_selector": {
                    "buckets_path": {
                      "money": "sum_of_money"
                    },
                    "script": "params.money >= 50"
                  }
                },
                "sort": {
                  "bucket_sort": {
                    "sort": [
                      {"sum_of_money": {"order": "desc"}}
                      ,{"_count": {"order": "desc"}}
                      ,{"_key": {"order": "desc"}}
                      ]
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
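  • Second approach
    The official documentation recommends paging with the composite aggregation, similar to scroll paging. composite has its own limitation: its sources can only be of the three types Terms, Histogram and Date histogram.

First page: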

GET /aggs_user_envelope/_search
{
  "size": 0,
  "aggs": {
    "nested_wrapper": {
      "nested": {
        "path": "envelope"
      },
      "aggs": {
        "group_by_userName": {
          "composite": {
            "size": 2, 
            "sources": [
              {
                "userName": {
                  "terms": {
                    "field": "envelope.userId",
                    "missing_bucket": true
                  }
                }
              }
            ]
          },
          "aggs": {
            "filter_date": {
              "filter": {
                "range": {
                  "envelope.until": {
                    "gte": "2020-05-30 00:00"
                  }
                }
              },
              "aggs": {
                "sum_of_money": {
                  "sum": {
                    "field": "envelope.money"
                  }
                }
              }
            },
            "filter_money": {
              "bucket_selector": {
                "buckets_path": {
                  "money": "filter_date>sum_of_money"
                },
                "script": "params.money >= 50"
              }
            },
            "sort": {
              "bucket_sort": {
                "sort": [
                  {"filter_date>sum_of_money": {"order": "desc"}}
                  ,{"_count": {"order": "desc"}}
                  ,{"_key": {"order": "desc"}}
                  ]
              }
            }
          }
        }
      }
    }
  }
}

Here is the result of the first page:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "nested_wrapper" : {
      "doc_count" : 9,
      "group_by_userName" : {
        "after_key" : {
          "userName" : "10086" // use this as the starting point for the next page
        },
        "buckets" : [
          {
            "key" : {
              "userName" : "10086"
            },
            "doc_count" : 3,
            "filter_date" : {
              "doc_count" : 3,
              "sum_of_money" : {
                "value" : 50.0
              }
            }
          }
        ]
      }
    }
  }
}

The second page needs to specify after:

GET /aggs_user_envelope/_search
{
  "size": 0,
  "aggs": {
    "nested_wrapper": {
      "nested": {
        "path": "envelope"
      },
      "aggs": {
        "group_by_userName": {
          "composite": {
            "size": 2, 
            "sources": [
              {
                "userName": {
                  "terms": {
                    "field": "envelope.userId",
                    "missing_bucket": true
                  }
                }
              }
            ],
            "after": {"userName" : "10086"} // specify after here
          },
          "aggs": {
            "filter_date": {
              "filter": {
                "range": {
                  "envelope.until": {
                    "gte": "2020-05-30 00:00"
                  }
                }
              },
              "aggs": {
                "sum_of_money": {
                  "sum": {
                    "field": "envelope.money"
                  }
                }
              }
            },
            "filter_money": {
              "bucket_selector": {
                "buckets_path": {
                  "money": "filter_date>sum_of_money"
                },
                "script": "params.money >= 50"
              }
            },
            "sort": {
              "bucket_sort": {
                "sort": [
                  {"filter_date>sum_of_money": {"order": "desc"}}
                  ,{"_count": {"order": "desc"}}
                  ,{"_key": {"order": "desc"}}
                  ]
              }
            }
          }
        }
      }
    }
  }
}

3. All done
