Elasticsearch Installation and Integration

Elasticsearch version: 2.4.5

Installation

Download Elasticsearch and extract the archive.

Edit the elasticsearch.yml file in the config directory:

cluster.name: my-es
node.name: node-1
network.host: 192.168.70.128
http.port: 9200

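Create a data directory under the Elasticsearch root directory.

Elasticsearch refuses to start under the root account, so create a dedicated account:

useradd esroot #create the user
passwd esroot #set the password (entered twice)
groupadd es #create the group
usermod -G es esroot #add user esroot to group es (usermod takes the group list first, then the login name)
chown -R esroot:es * #run in the Elasticsearch root directory to hand ownership to the new user
su esroot #switch to the new user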

Startup

bin/elasticsearch

Startup succeeded if a request to http://192.168.70.128:9200/ returns a response like the following:

{
    "name": "node-1",
    "cluster_name": "my-es",
    "cluster_uuid": "PTij19gvR6iKW2lY9XWGtw",
    "version": {
        "number": "2.4.5",
        "build_hash": "c849dd13904f53e63e88efc33b2ceeda0b6a1276",
        "build_timestamp": "2017-04-24T16:18:17Z",
        "build_snapshot": false,
        "lucene_version": "5.5.4"
    },
    "tagline": "You Know, for Search"
}
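
The check above can also be scripted. The following is a minimal sketch, not part of the original setup: it fetches the root endpoint with the JDK's HttpURLConnection and pulls the version number out of the body with a regular expression. The class name EsRootResponse and its methods are hypothetical names chosen for illustration, and the regex assumes the response has the shape shown above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: verifies a node by reading the JSON served at its root URL.
final class EsRootResponse {
    // Matches "number": "x.y.z" as it appears inside the "version" object.
    private static final Pattern VERSION =
            Pattern.compile("\"number\"\\s*:\\s*\"([^\"]+)\"");

    /** Returns the version string, or null if the body has no "number" field. */
    static String versionOf(String body) {
        Matcher m = VERSION.matcher(body);
        return m.find() ? m.group(1) : null;
    }

    /** True when the body reports exactly the expected version. */
    static boolean isExpectedVersion(String body, String expected) {
        return expected.equals(versionOf(body));
    }

    /** Reads the whole response body of a GET request as UTF-8 text. */
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(3000);
        conn.setReadTimeout(3000);
        try (InputStream in = conn.getInputStream();
             Scanner s = new Scanner(in, "UTF-8").useDelimiter("\\A")) {
            return s.hasNext() ? s.next() : "";
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Address taken from network.host and http.port in elasticsearch.yml.
        String body = fetch("http://192.168.70.128:9200/");
        System.out.println("version: " + versionOf(body));
        System.out.println("is 2.4.5: " + isExpectedVersion(body, "2.4.5"));
    }
}
```

A regex keeps the sketch free of external JSON libraries; a real client would more likely parse the body properly.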

Plugin installation

bin/plugin install plugin-name

elasticsearch-sql plugin (v2.4.5.0)

GitHub home page: https://github.com/NLPchina/elasticsearch-sql
Web UI: http://192.168.70.128:9200/_plugin/sql/

bin/plugin install https://github.com/NLPchina/elasticsearch-sql/releases/download/2.4.5.0/elasticsearch-sql-2.4.5.0.zip

Plugins can also be installed from the local file system:

bin/plugin install file:///usr/local/hadoop/thirdparty/elasticsearch-2.4.5/elasticsearch-sql-2.4.5.0.zip

JDBC usage demo

Maven repository: http://mvnrepository.com/artifact/org.nlpcn/elasticsearch-sql

<dependency>
      <groupId>org.elasticsearch</groupId>
      <artifactId>elasticsearch</artifactId>
      <version>2.4.5</version>
</dependency>
<dependency>
      <groupId>org.nlpcn</groupId>
      <artifactId>elasticsearch-sql</artifactId>
      <version>2.4.1.0</version>
</dependency>

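import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Properties;

import com.alibaba.druid.pool.DruidDataSource;
import com.alibaba.druid.pool.ElasticSearchDruidDataSourceFactory;

public void testJDBC() throws Exception {
    Properties properties = new Properties();
    // The JDBC URL points at the transport port (9300), not the HTTP port (9200).
    properties.put("url", "jdbc:elasticsearch://192.168.70.128:9300/");
    DruidDataSource dds = (DruidDataSource) ElasticSearchDruidDataSourceFactory.createDataSource(properties);
    // try-with-resources closes the result set, statement, and connection in reverse order.
    try (Connection connection = dds.getConnection();
         PreparedStatement ps = connection.prepareStatement("SELECT * from radiott");
         ResultSet resultSet = ps.executeQuery()) {
        while (resultSet.next()) {
            System.out.println(resultSet.getString("id") + "," + resultSet.getString("name"));
        }
    } finally {
        // Release the pool even if the query fails.
        dds.close();
    }
}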

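SQL notes

Inequality must be written as <>; != is not supported.
JOIN is supported, but count over a join is not, e.g. select count(0) from a join b on a.x=b.y
Pagination uses limit, as in MySQL.
update is not supported; delete is.
To query a type, use select * from indexName/typeName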
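
As a rough illustration of these notes, the sketch below builds a query that targets index/type, uses <> for inequality, and pages with a MySQL-style LIMIT offset, size. The class EsSqlQuery and its methods are hypothetical. The /_sql?sql= REST endpoint is an assumption taken from the plugin's usual usage and does not appear in this post, so verify it against the installed plugin version.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that follows the SQL notes: index/type, <>, and limit.
final class EsSqlQuery {
    /** Builds: SELECT * FROM index/type WHERE field <> 'value' LIMIT offset, size */
    static String notEqualPage(String index, String type, String field,
                               String value, int offset, int size) {
        if (value.indexOf('\'') >= 0) {
            // Quote escaping is deliberately left out of this sketch.
            throw new IllegalArgumentException("quotes in values are not handled here");
        }
        return "SELECT * FROM " + index + "/" + type
                + " WHERE " + field + " <> '" + value + "'"
                + " LIMIT " + offset + ", " + size;
    }

    /** Percent-encodes the query as the sql parameter of the assumed _sql endpoint. */
    static String toSqlUrl(String host, int port, String sql) {
        try {
            // URLEncoder emits '+' for spaces; switch to %20 for a plain URL query.
            String encoded = URLEncoder.encode(sql, "UTF-8").replace("+", "%20");
            return "http://" + host + ":" + port + "/_sql?sql=" + encoded;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        String sql = notEqualPage("radiott", "artiststt", "name", "test111", 0, 10);
        System.out.println(sql);
        System.out.println(toSqlUrl("192.168.70.128", 9200, sql));
    }
}
```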

elasticsearch-head plugin

GitHub home page: https://github.com/mobz/elasticsearch-head
Web UI: http://192.168.70.128:9200/_plugin/head/

bin/plugin install mobz/elasticsearch-head

elasticsearch-analysis-ik analyzer plugin (v1.10.5)

GitHub home page

Integration with Hadoop

Integration with Hive

Download the elasticsearch-hadoop-5.4.0.zip package.
GitHub home page

Maven dependency for a Java project:

<!-- Includes Hive, Pig, Storm, Spark, etc. -->
<dependency>
      <groupId>org.elasticsearch</groupId>
      <artifactId>elasticsearch-hadoop</artifactId>
      <version>5.4.0</version>
</dependency>
<!-- For Hive-only integration, the slimmer dependency below is enough -->
<dependency>
      <groupId>org.elasticsearch</groupId>
      <artifactId>elasticsearch-hadoop-hive</artifactId>
      <version>5.4.0</version>
</dependency>

See the documentation for details: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/install.html

Extract elasticsearch-hadoop-5.4.0.zip and copy elasticsearch-hadoop-5.4.0.jar to the $HIVE_HOME/lib/ directory.

In hive-site.xml, append the path of elasticsearch-hadoop-5.4.0.jar to the value of the hive.aux.jars.path property:

<property>
    <name>hive.aux.jars.path</name>
    <value>xxx.jar,file:///usr/local/hadoop/thirdparty/hive/apache-hive-2.1.1/lib/elasticsearch-hadoop-5.4.0.jar</value>
</property>

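Create the source data table

If Hive is not needed for analyzing the data, the source table can be skipped and an intermediate table used instead.

CREATE TABLE hive_es_source (id STRING, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

Create a Hive external table mapped to the Elasticsearch index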

CREATE EXTERNAL TABLE hive_es (id STRING, name STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'radiott/artiststt','es.index.auto.create' = 'true','es.nodes' = '192.168.70.128','es.port' = '9200');

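In es.resource, radiott/artiststt is the index name followed by the type name; Elasticsearch uses it to locate the data.
es.nodes is the address of the Elasticsearch node and defaults to localhost. es.port is the port number and defaults to 9200.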
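
When several such tables are needed, the statement can be generated rather than typed by hand. The sketch below is one possible way to do that, assuming the same four properties as above. HiveEsDdl and its method names are hypothetical; only the storage handler class name and the property keys come from this post.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical generator for the CREATE EXTERNAL TABLE statement shown above.
final class HiveEsDdl {
    static final String HANDLER = "org.elasticsearch.hadoop.hive.EsStorageHandler";

    /** Properties used in this post, in the same order; the port stays at the default 9200. */
    static Map<String, String> properties(String resource, String host) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("es.resource", resource);          // index name/type name
        props.put("es.index.auto.create", "true");   // create the index on first write
        props.put("es.nodes", host);
        props.put("es.port", "9200");
        return props;
    }

    /** Renders the DDL; columns is the raw column list, e.g. "id STRING, name STRING". */
    static String createExternalTable(String table, String columns, Map<String, String> props) {
        StringJoiner joined = new StringJoiner(",");
        for (Map.Entry<String, String> e : props.entrySet()) {
            joined.add("'" + e.getKey() + "' = '" + e.getValue() + "'");
        }
        return "CREATE EXTERNAL TABLE " + table + " (" + columns + ")\n"
                + "STORED BY '" + HANDLER + "'\n"
                + "TBLPROPERTIES(" + joined + ");";
    }

    public static void main(String[] args) {
        System.out.println(createExternalTable(
                "hive_es", "id STRING, name STRING",
                properties("radiott/artiststt", "192.168.70.128")));
    }
}
```

A LinkedHashMap keeps the properties in insertion order, so the generated statement reads the same as the hand-written one.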

Insert data

/* Using the source data table */
insert into table hive_es_source values('1','test111');
insert into table hive_es select * from hive_es_source where id='1';
/* Using the temp intermediate table, similar to dual in Oracle */
insert into table hive_es select '1','test111' from temp limit 1;

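Configuration options

Documentation: https://www.elastic.co/guide/en/elasticsearch/hadoop/2.4/hive.html
https://www.elastic.co/guide/en/elasticsearch/hadoop/2.4/configuration.html

'es.resource' = 'xxx/yyy': index name/type name

'es.index.auto.create' = 'true|false': whether the index is created automatically

'es.nodes' = 'ip': address of the Elasticsearch node, default localhost

'es.port' = '9200': Elasticsearch port, default 9200

'es.mapping.names' = 'date:@timestamp,url:url_123': column name mapping. The Hive column date maps to the Elasticsearch field @timestamp, and the Hive column url maps to url_123. By default the Hive column names are used as-is, one to one.

'es.mapping.id' = 'id': use the value of the Hive column named id as the Elasticsearch document id

'es.input.json' = 'yes': the input written to Elasticsearch is JSON

'es.output.json' = 'yes': the output read from Elasticsearch is JSON; used together with es.input.json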
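
To make the es.mapping.names format concrete, here is a small sketch that parses a value such as date:@timestamp,url:url_123 and resolves which Elasticsearch field a Hive column ends up in. It only illustrates the semantics described above; it is not how elasticsearch-hadoop itself parses the setting, and the class EsMappingNames is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the "hiveColumn:esField,..." format of es.mapping.names.
final class EsMappingNames {
    /** Parses "hiveCol:esField,hiveCol2:esField2" into an ordered map. */
    static Map<String, String> parse(String value) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String pair : value.split(",")) {
            String trimmed = pair.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate a trailing comma
            }
            int colon = trimmed.indexOf(':');
            if (colon <= 0 || colon == trimmed.length() - 1) {
                throw new IllegalArgumentException("expected hiveColumn:esField, got: " + trimmed);
            }
            mapping.put(trimmed.substring(0, colon).trim(),
                        trimmed.substring(colon + 1).trim());
        }
        return mapping;
    }

    /** Columns without an explicit mapping keep their own name. */
    static String esFieldFor(String hiveColumn, Map<String, String> mapping) {
        return mapping.getOrDefault(hiveColumn, hiveColumn);
    }

    public static void main(String[] args) {
        Map<String, String> mapping = parse("date:@timestamp,url:url_123");
        for (String column : new String[] {"date", "url", "id"}) {
            System.out.println(column + " -> " + esFieldFor(column, mapping));
        }
    }
}
```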

Using JSON as the sole Elasticsearch input

Create the table definition

CREATE EXTERNAL TABLE hive_es_json (json STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'json/es_json','es.index.auto.create' = 'true','es.nodes' = '192.168.70.128','es.port' = '9200','es.input.json' = 'yes','es.output.json' = 'yes');

Insert data

insert into table hive_es_json select '{"id":"1","name":"test111"}' from temp limit 1;

In Elasticsearch the data can then be queried with SQL, using the JSON fields as conditions, e.g. SELECT * FROM json where name like '%test%'

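Common integration errors

Error: java.io.IOException: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Index [radiott/artiststt] missing and settings [es.index.read.missing.as.empty] is set to false (state=,code=0)

The index does not exist. The Hive external table was mapped to the Elasticsearch index radiott/artiststt when it was created, and this error is raised when that index is missing from Elasticsearch.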
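
One way to catch this before running a Hive query is to ask Elasticsearch whether the index exists. The sketch below sends a HEAD request to the index URL, which Elasticsearch answers with 200 when the index exists and 404 when it does not. EsIndexCheck and its methods are hypothetical names, and the host and port are the ones used throughout this post.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical pre-flight check for the index behind an es.resource value.
final class EsIndexCheck {
    /** Extracts the index name from an es.resource value such as "radiott/artiststt". */
    static String indexName(String resource) {
        int slash = resource.indexOf('/');
        return slash < 0 ? resource : resource.substring(0, slash);
    }

    /** URL of the index itself, without the type part. */
    static String indexUrl(String host, int port, String resource) {
        return "http://" + host + ":" + port + "/" + indexName(resource);
    }

    /** Sends HEAD /index and reports whether the answer was 200 OK. */
    static boolean exists(String host, int port, String resource) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(indexUrl(host, port, resource)).openConnection();
        conn.setRequestMethod("HEAD");
        conn.setConnectTimeout(3000);
        conn.setReadTimeout(3000);
        try {
            return conn.getResponseCode() == HttpURLConnection.HTTP_OK;
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String resource = "radiott/artiststt";
        System.out.println(indexUrl("192.168.70.128", 9200, resource)
                + " exists: " + exists("192.168.70.128", 9200, resource));
    }
}
```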

Java API

Documentation: https://www.elastic.co/guide/en/elasticsearch/client/java-api/2.4/index.html

Common usage errors

Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.util.concurrent.MoreExecutors.directExecutor()Ljava/util/concurrent/Executor

Fix: add the Guava dependency:

<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>18.0</version>
</dependency>

Caused by: java.lang.NoClassDefFoundError: com/fasterxml/jackson/core/FormatFeature

The Jackson versions on the classpath are inconsistent. Align the Jackson version:

<jackson.version>2.8.1</jackson.version>
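
Both errors above usually come down to an unexpected jar winning on the classpath. A quick diagnostic, sketched here with the hypothetical class JarLocator, is to print where a class was actually loaded from. Classes are looked up by name, so the sketch compiles even when Guava or Jackson is absent.

```java
import java.security.CodeSource;

// Hypothetical diagnostic: shows which jar a class is loaded from at runtime.
final class JarLocator {
    /** Returns the jar or directory of the class, or "<bootstrap>" for JDK classes. */
    static String locationOf(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        return source == null || source.getLocation() == null
                ? "<bootstrap>"
                : source.getLocation().toString();
    }

    /** Same as above, but resolves the class by name without initializing it. */
    static String locationOf(String className) {
        try {
            return locationOf(Class.forName(className, false, JarLocator.class.getClassLoader()));
        } catch (ClassNotFoundException | LinkageError e) {
            return "<not on classpath>";
        }
    }

    public static void main(String[] args) {
        // The two classes named in the error messages above.
        System.out.println(locationOf("com.google.common.util.concurrent.MoreExecutors"));
        System.out.println(locationOf("com.fasterxml.jackson.core.FormatFeature"));
    }
}
```

If the printed Guava jar is older than 18.0, or the Jackson jar does not match the configured jackson.version, that jar is the likely source of the conflict.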

忘语's personal blog
