Elasticsearch version 2.4.5

Installation

  • Download Elasticsearch and extract it
  • Edit the elasticsearch.yml file in the config directory

    cluster.name: my-es
    node.name: node-1
    network.host: 192.168.70.128
    http.port: 9200
  • Create a data directory under the Elasticsearch root directory

Elasticsearch refuses to start under the root account, so a new account has to be added first

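useradd esroot # create the user
passwd esroot # set the password (enter it twice)
groupadd es # create the group
usermod -G es esroot # add user esroot to group es (the group comes first, then the user; reversing them fails with "user 'es' does not exist")
chown -R esroot:es * # run in the Elasticsearch root directory to give the new user ownership
su esroot # switch to the new user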

Startup

bin/elasticsearch

If a request to http://192.168.70.128:9200/ returns JSON like the following, startup succeeded

{
  "name": "node-1",
  "cluster_name": "my-es",
  "cluster_uuid": "PTij19gvR6iKW2lY9XWGtw",
  "version": {
    "number": "2.4.5",
    "build_hash": "c849dd13904f53e63e88efc33b2ceeda0b6a1276",
    "build_timestamp": "2017-04-24T16:18:17Z",
    "build_snapshot": false,
    "lucene_version": "5.5.4"
  },
  "tagline": "You Know, for Search"
}
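The startup check can also be scripted. The following is a minimal sketch, not part of the original setup: it assumes the host and port from the elasticsearch.yml above, and the class name EsRootCheck and its helper methods are hypothetical. It fetches the root endpoint over HTTP and pulls the version number out of the response with a regular expression, so no JSON library is needed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: checks that the node answers on its HTTP port.
final class EsRootCheck {
    // Matches the "number": "x.y.z" entry of the root response.
    private static final Pattern VERSION =
            Pattern.compile("\"number\"\\s*:\\s*\"([^\"]+)\"");

    /** Returns the version number found in the JSON body, or null if there is none. */
    static String extractVersion(String json) {
        Matcher m = VERSION.matcher(json);
        return m.find() ? m.group(1) : null;
    }

    /** Reads the whole response body of the given URL as UTF-8 text. */
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(3000);
        conn.setReadTimeout(3000);
        try (InputStream in = conn.getInputStream();
             Scanner s = new Scanner(in, StandardCharsets.UTF_8.name()).useDelimiter("\\A")) {
            return s.hasNext() ? s.next() : "";
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Address assumed from network.host and http.port in elasticsearch.yml.
        String body = fetch("http://192.168.70.128:9200/");
        System.out.println("Elasticsearch version: " + extractVersion(body));
    }
}
```

If the node is not reachable, fetch throws an IOException, which is itself a useful signal that startup did not succeed or that network.host is not what the client expects.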

Plugin installation

Plugins can also be installed from the local file system

bin/plugin install file:///usr/local/hadoop/thirdparty/elasticsearch-2.4.5/elasticsearch-sql-2.4.5.0.zip

JDBC usage demo

Maven repository: http://mvnrepository.com/artifact/org.nlpcn/elasticsearch-sql

<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>2.4.5</version>
</dependency>
<dependency>
    <groupId>org.nlpcn</groupId>
    <artifactId>elasticsearch-sql</artifactId>
    <version>2.4.1.0</version>
</dependency>

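import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Properties;

import com.alibaba.druid.pool.DruidDataSource;
import com.alibaba.druid.pool.ElasticSearchDruidDataSourceFactory;

// DruidDataSource comes from the com.alibaba:druid artifact, which must also be on the classpath.
public void testJDBC() throws Exception {
    Properties properties = new Properties();
    // 9300 is the default transport port; 9200 is the HTTP port
    properties.put("url", "jdbc:elasticsearch://192.168.70.128:9300/");
    DruidDataSource dds = (DruidDataSource) ElasticSearchDruidDataSourceFactory.createDataSource(properties);
    // try-with-resources closes the result set, statement and connection even if the query fails
    try (Connection connection = dds.getConnection();
         PreparedStatement ps = connection.prepareStatement("SELECT * from radiott");
         ResultSet resultSet = ps.executeQuery()) {
        while (resultSet.next()) {
            System.out.println(resultSet.getString("id") + "," + resultSet.getString("name"));
        }
    } finally {
        dds.close();
    }
}

Notes on SQL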
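  • Not-equal cannot be written as "!="; it has to be "<>"
  • JOIN is supported, but count over a join is not, e.g. select count(0) from a join b on a.x=b.y
  • Pagination uses limit, with the same syntax as MySQL
  • update is not supported; delete is supported
  • To query a type, use select * from indexName/typeName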
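The notes above can be folded into a small query builder. This is only a sketch: the class EsSqlBuilder and its method are hypothetical, and the index, type and field names are borrowed from the examples in this post. It always emits "<>" for not-equal, addresses a type as indexName/typeName, and pages with limit.

```java
// Hypothetical helper that builds query text following the SQL notes above.
final class EsSqlBuilder {
    /**
     * Builds "SELECT * FROM index/type WHERE field <> 'value' LIMIT n".
     * Values containing a single quote are rejected rather than escaped,
     * because the plugin's escaping rules are not covered here.
     */
    static String selectNotEqual(String index, String type, String field, String value, int limit) {
        if (value.indexOf('\'') >= 0) {
            throw new IllegalArgumentException("value must not contain a single quote");
        }
        if (limit < 0) {
            throw new IllegalArgumentException("limit must not be negative");
        }
        return "SELECT * FROM " + index + "/" + type
                + " WHERE " + field + " <> '" + value + "'"
                + " LIMIT " + limit;
    }

    public static void main(String[] args) {
        System.out.println(selectNotEqual("radiott", "artiststt", "name", "test111", 10));
    }
}
```

The resulting string can be passed to connection.prepareStatement in the JDBC demo.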

elasticsearch-head plugin

GitHub home page

URL: http://192.168.70.128:9200/_plugin/head/

bin/plugin install mobz/elasticsearch-head

elasticsearch-analysis-ik tokenizer plugin (v1.10.5)

GitHub home page

Integration with Hadoop

Integration with Hive

Download the elasticsearch-hadoop-5.4.0.zip package
GitHub home page

Maven reference in a Java project

<!-- Includes Hive, Pig, Storm, Spark, etc. -->
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-hadoop</artifactId>
    <version>5.4.0</version>
</dependency>
<!-- For Hive-only integration, the slimmer dependency below is enough -->
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-hadoop-hive</artifactId>
    <version>5.4.0</version>
</dependency>

See the documentation for details: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/install.html

  • Extract elasticsearch-hadoop-5.4.0.zip and copy elasticsearch-hadoop-5.4.0.jar into the $HIVE_HOME/lib/ directory
  • In hive-site.xml, edit the value of the hive.aux.jars.path property and append the path of elasticsearch-hadoop-5.4.0.jar

    <property>
    <name>hive.aux.jars.path</name>
    <value>xxx.jar,file:///usr/local/hadoop/thirdparty/hive/apache-hive-2.1.1/lib/elasticsearch-hadoop-5.4.0.jar</value>
    </property>
  • Create the source data table

    If Hive is not needed for data analysis, the source table can be skipped and an intermediate table used instead

    CREATE TABLE hive_es_source (id STRING, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
  • Create a Hive external table mapped to the ES index

    CREATE EXTERNAL TABLE hive_es (id STRING, name STRING)
    STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
    TBLPROPERTIES('es.resource' = 'radiott/artiststt','es.index.auto.create' = 'true','es.nodes' = '192.168.70.128','es.port' = '9200');

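In es.resource, radiott/artiststt is the index name followed by the type name; ES uses it when accessing the data.
es.nodes is the address of the ES node (default localhost), and es.port is the port number (default 9200).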

  • Insert data

    /* using the source data table */
    insert into table hive_es_source values('1','test111');
    insert into table hive_es select * from hive_es_source where id='1';
    /* using the temp intermediate table, similar to dual in Oracle */
    insert into table hive_es select '1','test111' from temp limit 1;

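Notes on some configuration options

Documentation: https://www.elastic.co/guide/en/elasticsearch/hadoop/2.4/hive.html
https://www.elastic.co/guide/en/elasticsearch/hadoop/2.4/configuration.html

  • 'es.resource' = 'xxx/yyy': index name/type name
  • 'es.index.auto.create' = 'true|false': whether to create the index automatically
  • 'es.nodes' = 'ip': address of the ES node, default localhost
  • 'es.port' = '9200': ES port, default 9200
  • 'es.mapping.names' = 'date:@timestamp,url:url_123': column name mapping. The Hive column date maps to the ES field @timestamp, and the Hive column url maps to url_123. By default the Hive column names are used and mapped one-to-one.
  • 'es.mapping.id' = 'id': use the value of the Hive column named id as the ES document id
  • 'es.input.json' = 'yes': use JSON as the input to ES
  • 'es.output.json' = 'yes': use JSON as the output from ES; used together with es.input.json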
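Since these options always end up in a TBLPROPERTIES clause, the statement can be generated instead of typed by hand. The sketch below is an illustration only: the class EsTableDdl and its methods are hypothetical, while the property names and the storage handler class are the ones used in this post. It renders a map of options into the DDL for an external table.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that renders the options above into Hive DDL text.
final class EsTableDdl {
    /** Renders the map as TBLPROPERTIES('k' = 'v',...), keeping insertion order. */
    static String tblProperties(Map<String, String> props) {
        StringJoiner joiner = new StringJoiner(",", "TBLPROPERTIES(", ")");
        for (Map.Entry<String, String> e : props.entrySet()) {
            joiner.add("'" + e.getKey() + "' = '" + e.getValue() + "'");
        }
        return joiner.toString();
    }

    /** Builds the full CREATE EXTERNAL TABLE statement for an ES-backed table. */
    static String createExternalTable(String table, String columns, Map<String, String> props) {
        return "CREATE EXTERNAL TABLE " + table + " (" + columns + ")\n"
                + "STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'\n"
                + tblProperties(props) + ";";
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("es.resource", "radiott/artiststt");
        props.put("es.index.auto.create", "true");
        props.put("es.nodes", "192.168.70.128");
        props.put("es.port", "9200");
        props.put("es.mapping.id", "id");
        System.out.println(createExternalTable("hive_es", "id STRING, name STRING", props));
    }
}
```

A LinkedHashMap is used so the properties appear in the order they were added, which keeps the generated statement easy to compare with a handwritten one.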

Using JSON as the sole input to ES

  • Create the table definition
    CREATE EXTERNAL TABLE hive_es_json (json STRING)
    STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
    TBLPROPERTIES('es.resource' = 'json/es_json','es.index.auto.create' = 'true','es.nodes' = '192.168.70.128','es.port' = '9200','es.input.json' = 'yes','es.output.json' = 'yes');
  • Insert data

    insert into table hive_es_json select '{"id":"1","name":"test111"}' from temp limit 1;

The data can then be queried in ES with SQL, using fields of the JSON as conditions, e.g. SELECT * FROM json where name like '%test%'

Some errors during integration

Error: java.io.IOException: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Index [radiott/artiststt] missing and settings [es.index.read.missing.as.empty] is set to false (state=,code=0)

The index does not exist. The Hive external table is mapped to the ES index radiott/artiststt, and this error is raised when ES has no such index.
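One way to catch this before querying through Hive is to ask ES whether the index exists. The following is a minimal sketch under assumptions: the class EsIndexCheck and its methods are hypothetical, and the host and port are the ones configured earlier. It sends an HTTP HEAD request to the index URL, which ES answers with status 200 when the index exists and 404 when it does not.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper: checks for an index over the ES HTTP port.
final class EsIndexCheck {
    /** Builds the URL of an index, e.g. http://host:9200/radiott. */
    static String indexUrl(String host, int port, String index) {
        return "http://" + host + ":" + port + "/" + index;
    }

    /** Returns true when a HEAD request on the index URL answers with HTTP 200. */
    static boolean indexExists(String host, int port, String index) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(indexUrl(host, port, index)).openConnection();
        conn.setRequestMethod("HEAD");
        conn.setConnectTimeout(3000);
        conn.setReadTimeout(3000);
        try {
            return conn.getResponseCode() == HttpURLConnection.HTTP_OK;
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Address assumed from the es.nodes and es.port values used in this post.
        System.out.println("radiott exists: " + indexExists("192.168.70.128", 9200, "radiott"));
    }
}
```

If the check returns false, inserting a row through the Hive table with es.index.auto.create set to true creates the index.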

Java API

Documentation: https://www.elastic.co/guide/en/elasticsearch/client/java-api/2.4/index.html

Some errors encountered in use

Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.util.concurrent.MoreExecutors.directExecutor()Ljava/util/concurrent/Executor

Add the guava dependency (MoreExecutors.directExecutor() was introduced in Guava 18.0)

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>18.0</version>
</dependency>

Caused by: java.lang.NoClassDefFoundError: com/fasterxml/jackson/core/FormatFeature

The jackson versions are inconsistent; align them by setting the jackson version

<jackson.version>2.8.1</jackson.version>
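Both errors above come from an older jar winning on the classpath, so it helps to see which jar a class is actually loaded from. This is a diagnostic sketch using only standard reflection; the class ClasspathProbe and its methods are hypothetical names. It prints the jar location of the two classes named in the errors and whether directExecutor is present.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

// Hypothetical diagnostic: shows where conflicting classes are loaded from.
final class ClasspathProbe {
    /** Returns the jar or directory a class was loaded from, or a note if unavailable. */
    static String locate(String className) {
        try {
            Class<?> c = Class.forName(className);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // JDK classes loaded by the bootstrap loader have no code source.
            return src == null ? "(JDK/bootstrap class)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    /** Returns true when the class has a public method with the given name. */
    static boolean hasMethod(String className, String methodName) {
        try {
            for (Method m : Class.forName(className).getMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String guava = "com.google.common.util.concurrent.MoreExecutors";
        System.out.println(guava + " -> " + locate(guava)
                + ", directExecutor present: " + hasMethod(guava, "directExecutor"));
        String jackson = "com.fasterxml.jackson.core.FormatFeature";
        System.out.println(jackson + " -> " + locate(jackson));
    }
}
```

Running it with the same classpath as the failing application shows whether an older guava or jackson jar is the one being picked up.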