
Commit

Merge pull request #179 from allwefantasy/mlsql
Add carbondata documentation
allwefantasy authored Apr 25, 2018
2 parents 7e87624 + 45f0cc0 commit ce4d22f
Showing 3 changed files with 99 additions and 1 deletion.
4 changes: 4 additions & 0 deletions README.md
@@ -30,6 +30,10 @@

1. [Web crawling](https://github.com/allwefantasy/streamingpro/blob/master/docs/crawler.md)

## CarbonData Integration

1. [How to use CarbonData as storage](https://github.com/allwefantasy/streamingpro/blob/master/docs/carbondata.md)

## Programming with configuration
1. [Spark Streaming](https://github.com/allwefantasy/streamingpro/blob/master/docs/sparkstreamingjson.md)
1. [Spark batch processing](https://github.com/allwefantasy/streamingpro/blob/master/docs/batchjson.md)
92 changes: 92 additions & 0 deletions docs/carbondata.md
@@ -0,0 +1,92 @@
## How to use CarbonData

To enable CarbonData support, build with `-Pcarbondata` and pass the following parameters at startup. The current CarbonData version is 1.3.1.

```
-streaming.enableCarbonDataSupport true
-streaming.carbondata.store "/data/carbon/store"
-streaming.carbondata.meta "/data/carbon/meta"
```

Create a table through the REST endpoint `/run/sql`:

```sql
CREATE TABLE carbon_table2 (
col1 STRING,
col2 STRING
)
STORED BY 'carbondata'
TBLPROPERTIES('streaming'='true')
```
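The DDL above is sent over plain HTTP. A minimal sketch in Python, assuming the server listens on `127.0.0.1:9003` and that `/run/sql` accepts the statement in a form field named `sql` (both are illustrative assumptions, not confirmed by this document):

```python
# Hedged sketch: submit the CREATE TABLE statement to /run/sql.
# The host/port (127.0.0.1:9003) and the `sql` form-field name are
# assumptions for illustration only.
from urllib.parse import urlencode

ddl = """CREATE TABLE carbon_table2 (
  col1 STRING,
  col2 STRING
)
STORED BY 'carbondata'
TBLPROPERTIES('streaming'='true')"""

payload = urlencode({"sql": ddl}).encode("utf-8")

# To actually send the request (requires a running StreamingPro instance):
# from urllib.request import urlopen
# print(urlopen("http://127.0.0.1:9003/run/sql", data=payload).read())
print(payload.decode("utf-8")[:10])
```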

Submit a streaming job through the REST endpoint `/run/script`:

```sql
set streamName="streamExample";
load kafka9.`-` where `kafka.bootstrap.servers`="127.0.0.1:9092"
and `topics`="testM"
as newkafkatable1;

select "abc" as col1,decodeKafka(value) as col2 from newkafkatable1
as table21;

save append table21
as carbondata.`-`
options mode="append"
and duration="10"
and dbName="default"
and tableName="carbon_table2"
and `carbon.stream.parser`="org.apache.carbondata.streaming.parser.RowStreamParserImp"
and checkpointLocation="/data/carbon/store/default/carbon_table2/.streaming/checkpoint";
```

Check that the job is running through the REST endpoint `/stream/jobs/running`.
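A small helper for inspecting the `/stream/jobs/running` response could look like the sketch below. The exact response schema is an assumption here, so the check simply looks for the stream name anywhere in each returned job entry:

```python
# Hedged sketch: decide whether the stream named "streamExample" appears
# in the /stream/jobs/running response. The response schema is assumed,
# so we only search for the name inside each job entry.
import json

def is_running(resp_text: str, name: str = "streamExample") -> bool:
    jobs = json.loads(resp_text)
    return any(name in json.dumps(job) for job in jobs)

# resp_text = urlopen("http://127.0.0.1:9003/stream/jobs/running").read().decode()
sample = '[{"name": "streamExample", "duration": 10}]'  # illustrative only
print(is_running(sample))
```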

Query the result table through the REST endpoint `/run/sql`:

```sql
select * from carbon_table2
```

The query result looks like this:

```json
[
{
"col1": "abc"
},
{
"col1": "abc"
},
{
"col1": "abc"
},
{
"col1": "abc"
},
{
"col1": "abc",
"col2": "今天才是我的dafk"
},
{
"col1": "abc",
"col2": "dakfea"
},
{
"col1": "abc",
"col2": "afek"
},
{
"col1": "abc",
"col2": "dafkea"
},
{
"col1": "abc",
"col2": "dafkea"
}
]
```
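The first few rows show only `col1`, presumably because `col2` was null for those messages and null values are omitted from the JSON. A quick way to filter out incomplete rows from such a response (the excerpt below is a shortened copy of the output above):

```python
import json

# Shortened excerpt of the /run/sql response shown above.
resp = '''[
  {"col1": "abc"},
  {"col1": "abc", "col2": "dakfea"},
  {"col1": "abc", "col2": "afek"}
]'''

rows = json.loads(resp)
complete = [r for r in rows if "col2" in r]  # keep rows with a non-null col2
print(len(complete))
```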


4 changes: 3 additions & 1 deletion docs/compile.md
@@ -25,6 +25,8 @@ mvn -DskipTests clean package \
```

To enable CarbonData support, just add `-Pcarbondata`.

If the build complains that the streamingpro-dsl dependency cannot be found, enter streamingpro-dsl and streamingpro-dsl-legacy respectively and run the following command:

```
@@ -129,4 +131,4 @@ cd streamingpro
mvn -DskipTests clean package -pl streamingpro-spark -am -Ponline -Pscala-2.10 -Pcarbondata -Phive-thrift-server -Pspark-1.6.1 -Pshade
```

Note that StreamingPro has stopped maintaining support for Spark 1.6.x.
