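1. Basic Concepts

2. Installation

2.1 Starting a Single Node

In the D:\DevTools\Elasticsearch\elasticsearch-2.4.1\bin directory, double-click elasticsearch.bat to start ES. In the browser

Download

Basic Operations

Basic Elasticsearch Operations

  • Get basic Elasticsearch information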
[GET]
http://localhost:9200

response:

{
    "name": "es1",
    "cluster_name": "elasticsearch",
    "cluster_uuid": "WLSntdVPS0SVXnsiTk8xqQ",
    "version": {
        "number": "2.4.1",
        "build_hash": "c67dc32e24162035d18d6fe1e952c4cbcbe79d16",
        "build_timestamp": "2016-09-27T18:57:55Z",
        "build_snapshot": false,
        "lucene_version": "5.5.2"
    },
    "tagline": "You Know, for Search"
}
  • Get information about the nodes in the ES cluster
[GET]
http://localhost:9200/_cluster/state/nodes

response:

{
    "cluster_name": "elasticsearch",
    "nodes": {
        "0Vq3zolQQsap79JayAoOPw": {
            "name": "es1",
            "transport_address": "127.0.0.1:9300",
            "attributes": {}
        },
        "xByNWSIjS0ukAQhlJqIUcw": {
            "name": "es2",
            "transport_address": "127.0.0.1:9301",
            "attributes": {}
        }
    }
}
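  • Shut down one or more nodes in the cluster
    > PS. The shutdown API was deprecated in the 1.x line and removed in Elasticsearch 2.0, so the requests below do not work on 2.4.1.
[POST]
Shut down a specific node
http://localhost:9200/_cluster/nodes/es1/_shutdown

Shut down all nodes in the cluster
http://localhost:9200/_cluster/nodes/_shutdown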

Data Operations

  • Add a document
[PUT]
http://localhost:9200/blog/article/1
{
"id": "1",
"title": "New version of Elasticsearch released!",
"content": "Version 1.0 released today!",
"priority": 10,
"tags": ["announce", "elasticsearch", "release"]
}

or with cURL:
curl -XPUT http://localhost:9200/blog/article/1 -d '{"title": "New version of Elasticsearch released!", "content": "Version 1.0 released today!", "tags": ["announce", "elasticsearch", "release"] }'

Note the new cURL option used here: -d. Its value is the text sent as the request payload, that is, the request body. This lets us send additional information such as the document definition.

response:

{
    "_index": "blog",
    "_type": "article",
    "_id": "1",
    "_version": 1,
    "_shards": {
        "total": 2,
        "successful": 1,
        "failed": 0
    },
    "created": true
}

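The response above reports the status of the operation and shows where the new document was placed. It also contains the document's unique identifier (_id) and its current version (_version). Elasticsearch increments the version automatically on every update.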
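The two ways of adding a document (PUT with an explicit identifier, POST with an auto-generated one) can be sketched in Python with only the standard library. This is a minimal sketch: the helper name build_index_request is hypothetical, the base URL assumes the local node used throughout these notes, and the request is only built, not sent.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_type, doc, doc_id=None):
    """Build (but do not send) an HTTP request that indexes one document.

    With doc_id the request is a PUT to /index/type/id; without it the
    request is a POST to /index/type, which lets Elasticsearch pick the _id.
    """
    url = f"{base_url}/{index}/{doc_type}"
    if doc_id is not None:
        url += f"/{doc_id}"
        method = "PUT"
    else:
        method = "POST"
    body = json.dumps(doc).encode("utf-8")  # plays the role of cURL's -d payload
    return urllib.request.Request(
        url,
        data=body,
        method=method,
        headers={"Content-Type": "application/json"},
    )


article = {
    "title": "New version of Elasticsearch released!",
    "content": "Version 1.0 released today!",
    "tags": ["announce", "elasticsearch", "release"],
}
request = build_index_request("http://localhost:9200", "blog", "article", article, doc_id=1)
# To actually send it against a running node: urllib.request.urlopen(request)
```

Building the body with json.dumps avoids the quoting mistakes that are easy to make when typing JSON by hand inside a shell command.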

  • Add a document (auto-generated identifier)
[POST]
http://localhost:9200/blog/article/
{
"id": "1",
"title": "New version of Elasticsearch released!",
"content": "Version 1.0 released today!",
"priority": 10,
"tags": ["announce", "elasticsearch", "release"]
}

or with cURL:
curl -XPOST http://localhost:9200/blog/article -d '{"title": "New version of Elasticsearch released!", "content": "Version 1.0 released today!", "tags": ["announce", "elasticsearch", "release"] }'

response:

{
    "_index": "blog",
    "_type": "article",
    "_id": "AWN88KPc0YFBzeeAiJSt",
    "_version": 1,
    "_shards": {
        "total": 2,
        "successful": 2,
        "failed": 0
    },
    "created": true
}

Here AWN88KPc0YFBzeeAiJSt is the automatically generated identifier.

  • Retrieve a document by identifier
[GET]
http://localhost:9200/blog/article/1

response:

{
    "_index": "blog",
    "_type": "article",
    "_id": "1",
    "_version": 1,
    "found": true,
    "_source": {
        "id": "1",
        "title": "New version of Elasticsearch released!",
        "content": "Version 1.0 released today!",
        "priority": 10,
        "tags": [
            "announce",
            "elasticsearch",
            "release"
        ]
    }
}

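In the response above, besides the index, type, identifier, and version, we also see whether the document was found (the found property) and the document's source (the _source property). If the document is not found, the response looks like this: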

{
"_index" : "blog",
"_type" : "article",
"_id" : "9999",
"found" : false
}
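A client usually branches on the found property before touching _source. The following is a small sketch of that check; the function name extract_source is hypothetical, and the two sample payloads are shortened copies of the responses shown above.

```python
import json


def extract_source(response_text):
    """Return the document body from a GET-by-id response, or None if absent."""
    payload = json.loads(response_text)
    if not payload.get("found", False):
        return None  # "found": false responses carry no _source
    return payload.get("_source")


hit = '{"_index": "blog", "_type": "article", "_id": "1", "_version": 1, "found": true, "_source": {"id": "1", "priority": 10}}'
miss = '{"_index": "blog", "_type": "article", "_id": "9999", "found": false}'

print(extract_source(hit))
print(extract_source(miss))
```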
  • Update a document
[POST]
http://localhost:9200/blog/article/AWN88KPc0YFBzeeAiJSt/_update

{
    "doc":{
        "title":"哈利波特与魔法石",
        "counter":1
    }
}

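This is the standard update operation in ES 2.4.1. It changes the title of the existing document and adds a counter field with the value 1. Under the hood, the old document is deleted and a new one is indexed in its place.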

response:

{
    "_index": "blog",
    "_type": "article",
    "_id": "AWN88KPc0YFBzeeAiJSt",
    "_version": 3,
    "_shards": {
        "total": 2,
        "successful": 2,
        "failed": 0
    }
}
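The effect of the doc object in an update request can be modelled in a few lines. This is only a toy model of the merge, not Elasticsearch code: it assumes that nested objects are merged key by key while scalars and arrays are overwritten, and the function name merge_partial is hypothetical.

```python
def merge_partial(source, partial):
    """Return a new document: `source` with the fields of `partial` applied."""
    merged = dict(source)  # the stored document itself is never modified
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_partial(merged[key], value)  # merge nested objects
        else:
            merged[key] = value  # overwrite scalars and arrays, add new fields
    return merged


stored = {"title": "New version of Elasticsearch released!", "priority": 10}
updated = merge_partial(stored, {"title": "哈利波特与魔法石", "counter": 1})
print(updated)
```

The merged result is what gets indexed as the new version of the document, which is why the old copy has to be deleted and rebuilt.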
  • Delete a document
[DELETE]
http://localhost:9200/blog/article/2

response:

{
    "found": true,
    "_index": "blog",
    "_type": "article",
    "_id": "2",
    "_version": 2,
    "_shards": {
        "total": 2,
        "successful": 2,
        "failed": 0
    }
}

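Note: every create, update, or delete call increments the document's _version. Besides telling us how many times the document has changed, the version also makes optimistic locking possible, as shown here: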

[DELETE]
http://localhost:9200/blog/article/AWN88KPc0YFBzeeAiJSt?version=3

response:

{
    "error": {
        "root_cause": [
            {
                "type": "version_conflict_engine_exception",
                "reason": "[article][AWN88KPc0YFBzeeAiJSt]: version conflict, current [-1], provided [3]",
                "shard": "2",
                "index": "blog"
            }
        ],
        "type": "version_conflict_engine_exception",
        "reason": "[article][AWN88KPc0YFBzeeAiJSt]: version conflict, current [-1], provided [3]",
        "shard": "2",
        "index": "blog"
    },
    "status": 409
}

The delete succeeds only when the current version is passed in, as shown here:

response:

{
    "found": true,
    "_index": "blog",
    "_type": "article",
    "_id": "1",
    "_version": 4,
    "_shards": {
        "total": 2,
        "successful": 2,
        "failed": 0
    }
}
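On the client side, optimistic locking comes down to sending the version you last read and treating a 409 as "someone else changed the document first". Here is a minimal sketch under those assumptions; the helper names are hypothetical and nothing is sent over the network.

```python
from urllib.parse import quote, urlencode


def versioned_delete_url(base_url, index, doc_type, doc_id, version):
    """URL for a DELETE that only succeeds if the stored version matches."""
    query = urlencode({"version": version})
    return f"{base_url}/{index}/{doc_type}/{quote(str(doc_id))}?{query}"


def is_version_conflict(status, payload):
    """True when a response is the version-conflict error shown above."""
    error = payload.get("error")
    return (
        status == 409
        and isinstance(error, dict)
        and error.get("type") == "version_conflict_engine_exception"
    )


url = versioned_delete_url("http://localhost:9200", "blog", "article", "AWN88KPc0YFBzeeAiJSt", 3)
conflict = {"error": {"type": "version_conflict_engine_exception"}, "status": 409}
print(url, is_version_conflict(409, conflict))
```

When a conflict is detected, the usual reaction is to re-read the document, look at its current _version, and decide whether to retry.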

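Custom Operations

  • Querying multiple indexes

    All Elasticsearch queries are sent to the _search endpoint, and a single request can search one or more indexes.

[GET]
http://localhost:9200/books/_search?pretty
http://localhost:9200/books,blog/_search?pretty

response: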

{
    "took": 32,
    "timed_out": false,
    "_shards": {
        "total": 10,
        "successful": 10,
        "failed": 0
    },
    "hits": {
        "total": 6,
        "max_score": 1,
        "hits": [
            {
                "_index": "blog",
                "_type": "article",
                "_id": "AWOC-NrEnH9uBqdR_hCF",
                "_score": 1,
                "_source": {
                    "id": "3",
                    "title": "New version of Elasticsearch released!",
                    "content": "Version 1.0 released today!",
                    "priority": 10,
                    "tags": [
                        "announce",
                        "elasticsearch",
                        "release"
                    ]
                }
            },
            {
                "_index": "books",
                "_type": "es",
                "_id": "2",
                "_score": 1,
                "_source": {
                    "title": "Mastering Elasticsearch",
                    "published": 2013
                }
            },
            {
                "_index": "blog",
                "_type": "article",
                "_id": "AWOC-GIknH9uBqdR_hBX",
                "_score": 1,
                "_source": {
                    "id": "1",
                    "title": "New version of Elasticsearch released!",
                    "content": "Version 1.0 released today!",
                    "priority": 10,
                    "tags": [
                        "announce",
                        "elasticsearch",
                        "release"
                    ]
                }
            },
            {
                "_index": "books",
                "_type": "es",
                "_id": "1",
                "_score": 1,
                "_source": {
                    "title": "Elasticsearch Server",
                    "published": 2013
                }
            },
            {
                "_index": "books",
                "_type": "solr",
                "_id": "1",
                "_score": 1,
                "_source": {
                    "title": "Apache Solr 4 Cookbook",
                    "published": 2012
                }
            },
            {
                "_index": "blog",
                "_type": "article",
                "_id": "AWOC-brznH9uBqdR_hDR",
                "_score": 1,
                "_source": {
                    "id": "5",
                    "title": "哈利波特与魔法学院",
                    "content": "Version 1.0 released today!",
                    "priority": 9,
                    "tags": [
                        "哈利波特",
                        "魔法学院",
                        "version"
                    ]
                }
            }
        ]
    }
}
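  • Lucene query syntax
    A query passed to Lucene is split by the query parser into terms and operators.
    Terms
    For example, a single word and a phrase are written as follows:
    title:book
    title:"elasticsearch book"
    Operators
    To find documents whose title field contains the word book but whose
    description field does not contain the word cat, send the following query:
    q=+title:book -description:cat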
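  • Custom sorting
    By default Elasticsearch sorts results by the computed score. To sort on a field instead, add sort=field:desc for descending order or sort=field:asc for ascending order.

    Note: with a custom sort, Elasticsearch by default stops computing the _score of each document. To sort on a field and still compute _score, add track_scores=true.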

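  • Setting a search timeout
    By default Elasticsearch applies no search timeout. To set one, add the timeout parameter with an explicit unit, for example timeout=5s to limit the query to 5 seconds.
  • Specifying the size of the result set
    from=5 returns results starting from the 6th one
    size=10 returns 10 results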
  • Search types
    URI queries accept a search_type parameter that selects the search type; the default is query_then_fetch.
    The following types are available:
dfs_query_then_fetch
dfs_query_and_fetch
query_then_fetch
query_and_fetch
count
scan
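The URI search parameters described above (q, sort, track_scores, timeout, from, size, search_type) have to be URL-encoded before they are sent. In particular, a literal + in a query string is decoded as a space, so the + operator of the Lucene syntax must travel as %2B. The sketch below assembles such a URL with the standard library; the function name build_search_url and the keyword start (used because from is reserved in Python) are hypothetical.

```python
from urllib.parse import urlencode


def build_search_url(base_url, indices, query, sort=None, track_scores=False,
                     timeout=None, start=None, size=None, search_type=None):
    """Assemble a URI search request for one or more indexes."""
    params = [("q", query)]
    if sort is not None:
        params.append(("sort", sort))            # e.g. "published:desc"
    if track_scores:
        params.append(("track_scores", "true"))  # keep _score while sorting on a field
    if timeout is not None:
        params.append(("timeout", timeout))      # e.g. "5s"
    if start is not None:
        params.append(("from", start))           # zero-based offset of the first hit
    if size is not None:
        params.append(("size", size))
    if search_type is not None:
        params.append(("search_type", search_type))
    # urlencode percent-encodes "+" and ":" and turns spaces into "+"
    return f"{base_url}/{','.join(indices)}/_search?{urlencode(params)}"


url = build_search_url(
    "http://localhost:9200",
    ["books", "blog"],
    "+title:book -description:cat",
    sort="published:desc",
    track_scores=True,
    start=5,
    size=10,
)
print(url)
```

When testing by hand in a browser or with cURL, the same encoding applies: type %2Btitle:book rather than +title:book.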

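Indexes

Creating an Index

By default ES creates an index with 5 primary shards and one replica of each, so 10 shards (Lucene indexes) are built in total. In some situations you need to create the index manually and set the number of shards and replicas yourself, as follows:

  • Creating an index manually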
curl -XPUT http://localhost:9200/blog/ -d '{
"settings" : {
"number_of_shards" : 1,
"number_of_replicas" : 2
}
}'

This creates an index with 1 shard and 2 replicas, that is, 3 physical Lucene indexes in total.
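The shard arithmetic used above is simply primaries times (1 + replicas). A tiny sketch, with a hypothetical helper name, that reproduces both figures quoted in this section:

```python
def total_lucene_indexes(number_of_shards, number_of_replicas):
    """Each primary shard plus each of its replicas is one physical Lucene index."""
    return number_of_shards * (1 + number_of_replicas)


default_total = total_lucene_indexes(5, 1)  # default settings described above
custom_total = total_lucene_indexes(1, 2)   # the manually created blog index
print(default_total, custom_total)
```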

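  • Setting mappings
    A mapping describes the structure of an index, much like a table schema in SQL. Although ES is a schema-less search engine that can work out the index structure on the fly, we recommend controlling the index structure (the mapping) yourself.

In JSON, strings are enclosed in double quotes, numbers are written as bare numbers, and booleans as true/false, which usually helps ES detect the index structure automatically.

>> Sometimes the fields in the JSON cannot be given proper data types and everything is indexed as a string. In that case type guessing can be enabled: numeric_detection turns on detection of numbers, and dynamic_date_formats lists the formats that are automatically recognized as dates.
curl -XPUT 'http://localhost:9200/blog/?pretty' -d '{
"mappings": {
"article": {
"numeric_detection": true,
"dynamic_date_formats": ["yyyy-MM-dd hh:mm"]
}
}
}'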

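>> Automatic type guessing can, however, lead to imprecise numbers or other problems. For example, ES may detect a field as integer or long, and when a float value arrives later at index time, its fractional part is dropped and precision is lost. The solution is to disable type guessing and set the mapping manually: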
curl -XPUT http://localhost:9200/blog -d '{
"mappings": {
"article": {
"dynamic": "false",
"properties": {
"id": {
"type": "string"
},
"content": {
"type": "string"
},
"author": {
"type": "string"
}
}
}
}
}'

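>> Defining mapping types
Suppose we want to create a posts index that stores blog post data.
We can create a mapping for a post type (with an identifier, a name, a publication date, and the contents) by defining the fields under the properties key of the post type: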
curl -XPOST http://localhost:9200/posts -d '
{
"mappings": {
"post": {
"properties": {
"id": {
"type": "long",
"store": "yes",
"precision_step": "0"
},
"name": {
"type": "string",
"store": "yes",
"index": "analyzed"
},
"published": {
"type": "date",
"store": "yes",
"precision_step": "0"
},
"contents": {
"type": "string",
"store": "no",
"index": "analyzed"
}
}
}
}
}'

response:
{"acknowledged":true}

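Note: the mappings object can be kept in a JSON file and referenced from the command line, for example curl -XPOST http://localhost:9200/posts -d @mapping.json;
several type mappings can also be specified at once, for example a post type and a user type: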

{
"mappings": {
"post": {
"properties": {
"id": {
"type": "long",
"store": "yes",
"precision_step": "0"
},
"name": {
"type": "string",
"store": "yes",
"index": "analyzed"
},
"published": {
"type": "date",
"store": "yes",
"precision_step": "0"
},
"contents": {
"type": "string",
"store": "no",
"index": "analyzed"
}
}
},
"user": {
"properties": {
"id": {
"type": "long",
"store": "yes",
"precision_step": "0"
},
"name": {
"type": "string",
"store": "yes",
"index": "analyzed"
}
}
}
}
}

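**>> Field attribute reference**
See p. 51 of the book.

**>> Using analyzers**
See p. 56 of the book.

## Advanced Operations

#### Bulk operations [Bulk API](https://www.elastic.co/guide/en/elasticsearch/reference/5.2/docs-bulk.html)
- **Preparing data for bulk operations**
A bulk request can carry the following operations:
- add a document to the index or replace an existing one (**index**);
- remove a document from the index (**delete**);
- add a new document to the index only when no other definition of that document already exists in the index (**create**).

For efficient processing, ES imposes a specific request format:
each operation takes two lines. The first line is a JSON object describing the operation, and the second line is the document's JSON object itself. Think of the **first line as the information line** and the **second line as the data line**. The only exception is the delete operation, which consists of the information line alone. For example: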

{ "index": { "_index": "addr", "_type": "contact", "_id": 1 }}
{ "name": "Fyodor Dostoevsky", "country": "RU" }
{ "create": { "_index": "addr", "_type": "contact", "_id": 2 }}
{ "name": "Erich Maria Remarque", "country": "DE" }
{ "create": { "_index": "addr", "_type": "contact", "_id": 2 }}
{ "name": "Joseph Heller", "country": "US" }
{ "delete": { "_index": "addr", "_type": "contact", "_id": 4 }}
{ "delete": { "_index": "addr", "_type": "contact", "_id": 1 }}

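Each document or operation description sits on a single line terminated by a newline character, which means the documents cannot be pretty-printed. The size of a bulk request is limited: the limit is set to 100 MB and can be changed through the http.max_content_length property in the Elasticsearch configuration file. The limit guards against request timeouts and memory problems caused by oversized requests.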
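Generating the bulk file from code is less error-prone than writing it by hand, because the serializer guarantees one JSON object per line. The sketch below builds such a payload; the function name build_bulk_body is hypothetical, and the size constant simply restates the 100 MB limit mentioned above rather than reading the node's actual configuration.

```python
import json

MAX_CONTENT_LENGTH = 100 * 1024 * 1024  # assumed limit, taken from the text above


def build_bulk_body(operations):
    """Serialize (action, metadata, document) triples into a bulk payload.

    `document` must be None for delete and a dict for index/create.
    """
    lines = []
    for action, metadata, document in operations:
        if action == "delete" and document is not None:
            raise ValueError("delete takes an information line only")
        if action != "delete" and document is None:
            raise ValueError(f"{action} needs a data line")
        lines.append(json.dumps({action: metadata}, ensure_ascii=False))  # information line
        if document is not None:
            lines.append(json.dumps(document, ensure_ascii=False))        # data line
    body = ("\n".join(lines) + "\n").encode("utf-8")  # every line ends with a newline
    if len(body) > MAX_CONTENT_LENGTH:
        raise ValueError("bulk payload exceeds the configured request size limit")
    return body


operations = [
    ("index", {"_index": "addr", "_type": "contact", "_id": 1},
     {"name": "Fyodor Dostoevsky", "country": "RU"}),
    ("delete", {"_index": "addr", "_type": "contact", "_id": 4}, None),
]
payload = build_bulk_body(operations)
print(payload.decode("utf-8"), end="")
```

Splitting a large data set into several payloads that each stay well under the limit is the usual way to avoid the oversized-request problem.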

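  • Indexing the data
    A bulk request can be sent to one of three endpoints:

    • /_bulk *(not tied to an _index/_type)*
    • /index_name/_bulk *(tied to an _index)*
    • /index_name/type_name/_bulk *(tied to an _index/_type)*

    Assuming the data is stored in a file named documents.json, send the request like this: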

  curl -XPOST 'localhost:9200/_bulk?pretty' --data-binary @documents.json

We use --data-binary instead of -d here because -d strips newline characters, and the bulk parser depends on them to separate the lines of the request.
response:

{
    "took": 139,
    "errors": true,
    "items": [{
        "index": {
            "_index": "addr",
            "_type": "contact",
            "_id": "1",
            "_version": 1,
            "status": 201
        }
    },
    {
        "create": {
            "_index": "addr",
            "_type": "contact",
            "_id": "2",
            "_version": 1,
            "status": 201
        }
    },
    {
        "create": {
            "_index": "addr",
            "_type": "contact",
            "_id": "2",
            "status": 409,
            "error": "DocumentAlreadyExistsException[[addr][3][contact][2]: document already exists]"
        }
    },
    {
        "delete": {
            "_index": "addr",
            "_type": "contact",
            "_id": "4",
            "_version": 1,
            "status": 404,
            "found": false
        }
    },
    {
        "delete": {
            "_index": "addr",
            "_type": "contact",
            "_id": "1",
            "_version": 2,
            "status": 200,
            "found": true
        }
    }]
}

In the response, each item object is the reply to one operation in the bulk request, so a very large request also produces a very large response.
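Because the response can be long, it helps to filter it down to the operations that failed. The sketch below treats an item as failed when it carries an error field, which matches the sample response above but is an assumption about how you want to classify results (a delete that reports found: false is not counted here). The function name failed_items is hypothetical.

```python
def failed_items(bulk_response):
    """Return (action, _id, status) for every bulk item that reports an error."""
    failures = []
    for item in bulk_response.get("items", []):
        for action, result in item.items():  # each item has exactly one action key
            if "error" in result:
                failures.append((action, result["_id"], result["status"]))
    return failures


sample = {
    "took": 139,
    "errors": True,
    "items": [
        {"index": {"_index": "addr", "_type": "contact", "_id": "1", "_version": 1, "status": 201}},
        {"create": {"_index": "addr", "_type": "contact", "_id": "2", "status": 409,
                    "error": "DocumentAlreadyExistsException[[addr][3][contact][2]: document already exists]"}},
        {"delete": {"_index": "addr", "_type": "contact", "_id": "4", "_version": 1,
                    "status": 404, "found": False}},
    ],
}
print(failed_items(sample))
```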

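Segment Merging

The discussion of segments and their immutability points out that in the Lucene library, and therefore in Elasticsearch, data written to certain structures is never changed afterwards. This simplifies some things but also introduces extra work. One example is deletion: because a segment cannot change, information about deletions must be stored separately and applied dynamically during search, so that deleted documents are left out of the results. Another example is that documents cannot be modified in place (some modifications are possible, such as changing numeric doc values). Elasticsearch does support document updates (see section 1.4), but under the hood the old document is deleted and the updated document is indexed again.
As time passes and indexing continues, more and more segments are created. Search performance may therefore degrade, and the index may be larger than it needs to be because it still contains the deleted documents. This is where segment merging comes in.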
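The interplay of immutable segments, deletion marks, and merging can be illustrated with a toy model. This is not how Lucene is implemented; it is a deliberately simplified sketch in which every indexing call creates its own segment, and all class and method names are hypothetical.

```python
from types import MappingProxyType


class Segment:
    """A write-once batch of documents plus a separate set of deletion marks."""

    def __init__(self, docs):
        self.docs = MappingProxyType(dict(docs))  # read-only view: contents never change
        self.deleted = set()                      # deletions are recorded beside the data

    def live_docs(self):
        return {doc_id: doc for doc_id, doc in self.docs.items()
                if doc_id not in self.deleted}


class ToyIndex:
    def __init__(self):
        self.segments = []

    def delete(self, doc_id):
        # Mark the document as deleted; the segment data itself is untouched.
        for segment in self.segments:
            if doc_id in segment.docs:
                segment.deleted.add(doc_id)

    def index(self, doc_id, doc):
        # An update is a delete of the old copy followed by indexing a new one.
        self.delete(doc_id)
        self.segments.append(Segment({doc_id: doc}))

    def search(self):
        # Deletion marks are applied at search time.
        results = {}
        for segment in self.segments:
            results.update(segment.live_docs())
        return results

    def stored_count(self):
        return sum(len(segment.docs) for segment in self.segments)

    def merge(self):
        # Rewrite all live documents into one new segment, dropping deleted ones.
        self.segments = [Segment(self.search())]


toy = ToyIndex()
toy.index(1, "first draft")
toy.index(2, "second post")
toy.index(1, "final text")  # update of document 1
toy.delete(2)
before = (len(toy.segments), toy.stored_count())
toy.merge()
after = (len(toy.segments), toy.stored_count())
print(before, after, toy.search())
```

Before the merge the toy index still stores every superseded and deleted copy; after the merge only the live document remains, which is the space and search-time saving that merging provides.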

See p. 74 of the book for details.

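5 Query DSL for Complex Queries

5.1 Querying ES

5.1.1 Simple Queries

The approach used so far:

curl -XGET 'localhost:9200/library/book/_search?q=title:crime&pretty=true'

The same request using the DSL:
It finds documents whose title field contains the word crime

[GET]
localhost:9200/library/book/_search -d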
{
"query" : {
"query_string" : { "query" : "title:crime" }
}
}

Note: when using Postman, choose the POST method here; otherwise the -d request body is not sent properly.

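5.1.2 Paging and Result Set Size

  • from: the position of the first document to return. The default is 0, meaning the
    results start from the first document.
  • size: the maximum number of documents returned by one query. The default is 10. If you are only
    interested in faceting results and not in the documents themselves, set this parameter to 0.

For example, to return 10 documents starting from the 9th document: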

{
    "from":8,
    "size":10,
    "query":{
        "query_string":{"query":"title:crime"}
    }
}
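Since from is a zero-based offset, the "Nth document" in everyday counting maps to from = N - 1. A small sketch that builds the request body shown above from a one-based position; the function name page_query is hypothetical.

```python
def page_query(query, first_position=1, size=10):
    """Build a query_string search body starting at a one-based document position."""
    if first_position < 1:
        raise ValueError("first_position is one-based and must be at least 1")
    return {
        "from": first_position - 1,  # zero-based offset expected by Elasticsearch
        "size": size,
        "query": {"query_string": {"query": query}},
    }


body = page_query("title:crime", first_position=9, size=10)
print(body)
```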

5.1.3 Returning the Version Value

Add the version:true request parameter:

{
    "version":true,
    "from":0,
    "size":10,
    "query":{
        "query_string":{"query":"title:crime"}
    }
}

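In some special cases the result set can be filtered by a minimum score. For example, adding
"min_score" : 0.75 keeps only the results whose score is at least 0.75.

5.1.4 Selecting the Fields to Return

Add the fields array parameter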

{
    "fields" : [ "title", "year" ],
    "query":{
        "query_string":{"query":"title:crime"}
    }
}

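1. If fields is not set, all values under the _source field are returned by default
2. Returning _source performs better than selecting individual fields
3. If a selected field does not exist in the data, that field is not returned

As a special case, ES supports partial fields through the partial_fields feature, as follows:
To include fields whose names start with titl and exclude fields whose names start with chara, send the following query: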

{
    "partial_fields" : {
        "partial1" : {
            "include" : [ "titl*" ],
            "exclude" : [ "chara*" ]
        }
    },
    "query":{
        "query_string":{"query":"title:crime"}
    }
}

Data Synchronization

Importing from a database into ES