Quick Start

Last updated: 2019-09-06 10:28:36

Uploading job JAR packages, Python files, and other resources

The analytics cluster uses the HttpFS service to let users upload and manage job resources (JAR packages, Python files, and so on) on the server side.

1. Obtain the HttpFS service address from the analytics cluster console, for example:

  HttpFS: http://ap-xxx-.9b78df04-b.rds.aliyuncs.com:14000

2. Usage recommendations:

  • You can currently manage these resources through the RESTful API or from the command line; see the reference documentation for details. The examples below use the RESTful API.
  • It is recommended to use resource as the HttpFS user name and /resourcesdir/ as the root directory for uploads; subdirectories can be created under it later.

3. Upload a local JAR package or Python file to the Spark server

  • Create the directory /resourcesdir:

    curl -i -X PUT "http://ap-xxx.rds.aliyuncs.com:14000/webhdfs/v1/resourcesdir?op=MKDIRS&user.name=resource"
  • Upload a JAR:

    Upload the local ./examples/jars/examples_2.11-2.3.2.jar to /resourcesdir/examples_2.11-2.3.2.jar on HttpFS:
    curl -i -X PUT -T ./examples/jars/examples_2.11-2.3.2.jar "http://ap-xxx-.9b78df04-b.rds.aliyuncs.com:14000/webhdfs/v1/resourcesdir/examples_2.11-2.3.2.jar?op=CREATE&data=true&user.name=resource" -H "Content-Type:application/octet-stream"
  • Upload a Python file:
    Upload the local ./examples/src/main/python/pi.py to /resourcesdir/pi.py on HttpFS:
    curl -i -X PUT -T ./examples/src/main/python/pi.py "http://ap-xxx-.9b78df04-b.rds.aliyuncs.com:14000/webhdfs/v1/resourcesdir/pi.py?op=CREATE&data=true&user.name=resource" -H "Content-Type:application/octet-stream"
  • List the files:
    List the files under the HttpFS directory /resourcesdir/:
    curl -i "http://ap-xxx-.9b78df04-b.rds.aliyuncs.com:14000/webhdfs/v1/resourcesdir/?op=LISTSTATUS&user.name=resource"
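
The three curl calls above can be wrapped in small shell helpers so the endpoint and user name are written only once. A minimal sketch, assuming the sample HttpFS address from this guide (substitute the address shown in your own console):

```shell
#!/bin/sh
# HttpFS endpoint and user name -- placeholders taken from the
# examples above; substitute your own cluster's address.
HTTPFS_HOST="ap-xxx-.9b78df04-b.rds.aliyuncs.com:14000"
HTTPFS_USER="resource"

# Build a WebHDFS URL for a given HDFS path and operation.
webhdfs_url() {
    # $1 = HDFS path (e.g. /resourcesdir), $2 = op plus any extra params
    echo "http://${HTTPFS_HOST}/webhdfs/v1$1?op=$2&user.name=${HTTPFS_USER}"
}

# Create a directory on the server.
httpfs_mkdir() {
    curl -i -X PUT "$(webhdfs_url "$1" MKDIRS)"
}

# Upload a local file ($1) to an HDFS path ($2).
httpfs_upload() {
    curl -i -X PUT -T "$1" "$(webhdfs_url "$2" "CREATE&data=true")" \
        -H "Content-Type:application/octet-stream"
}

# List a directory.
httpfs_list() {
    curl -i "$(webhdfs_url "$1" LISTSTATUS)"
}
```

With these defined, the steps above become, for example, `httpfs_mkdir /resourcesdir`, then `httpfs_upload ./examples/src/main/python/pi.py /resourcesdir/pi.py`, then `httpfs_list /resourcesdir/`.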

Submitting jobs through the job management service (Livy Server)

The Spark service uses Apache Livy Server to provide job management, supporting submission of JAR jobs (including streaming) and Python jobs.

1. Obtain the Livy Server service address from the analytics cluster console, for example:

  LivyServer: http://ap-xxx-master1-001.spark.9b78df04-b.rds.aliyuncs.com:8998

2. Submit a job

  • Write livy_pi.json, the JSON file describing the job to submit to Livy Server:
    {
        "file": "/resourcesdir/spark-examples_2.11-2.3.2.jar",
        "className": "org.apache.spark.examples.SparkPi",
        "driverMemory": "1g",
        "executorMemory": "1g",
        "conf": {
            "spark.executor.instances": "1",
            "spark.executor.cores": "1"
        }
    }
  • Submit the JAR job

Command:

  curl -H "Content-Type: application/json" -X POST -d @livy_pi.json http://ap-xxx-master1-001.spark.9b78df04-b.rds.aliyuncs.com:8998/batches | python -m json.tool

Sample:

  [root@master]# curl -H "Content-Type: application/json" -X POST -d @livy_pi.json http://ap-xxx-master1-001.spark.9b78df04-b.rds.aliyuncs.com:8998/batches | python -m json.tool
    % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                   Dload  Upload   Total   Spent    Left  Speed
  100   368  100   145  100   223   4815   7405 --:--:-- --:--:-- --:--:--  7689
  {
      "appId": null,
      "appInfo": {
          "driverLogUrl": null,
          "sparkUiUrl": null
      },
      "id": 1,
      "log": [
          "stdout: ",
          "\nstderr: ",
          "\nYARN Diagnostics: "
      ],
      "state": "starting"
  }
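
The id field in this response identifies the batch for the status queries below. One way to capture it in a script, sketched here against an abridged copy of the sample response (in practice, pipe the actual curl output into the same command):

```shell
# Abridged copy of the sample submit response above; in practice the
# RESPONSE variable would hold the real output of the curl call.
RESPONSE='{"appId": null, "id": 1, "state": "starting"}'

# Pull out the batch id with Python's standard-library json module.
BATCH_ID=$(echo "$RESPONSE" | python3 -c 'import json,sys; print(json.load(sys.stdin)["id"])')
echo "$BATCH_ID"
```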
  • Submit the Python job

Command:

  curl -X POST --data '{"file": "/resourcesdir/pi.py"}' -H "Content-Type: application/json" http://ap-xxx-master1-001.spark.9b78df04-b.rds.aliyuncs.com:8998/batches
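
Livy's POST /batches API also accepts an args array for scripts that take command-line arguments. A hedged sketch passing a partition count to pi.py (assuming the script reads it as its first argument); the payload is sanity-checked locally before the commented-out submit:

```shell
# JSON payload for a Python job with one command-line argument ("10");
# the "args" field is part of Livy's POST /batches API.
PAYLOAD='{"file": "/resourcesdir/pi.py", "args": ["10"]}'

# Sanity-check the payload before submitting (exits non-zero on bad JSON).
echo "$PAYLOAD" | python3 -m json.tool > /dev/null

# Submit (uncomment and fill in your own Livy Server address):
# curl -X POST --data "$PAYLOAD" -H "Content-Type: application/json" \
#     http://<livy-server>:8998/batches
```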

3. Query the job status

Check the status through the Livy Server API or the Spark UI.

Command:

  curl http://ap-xxx-master1-001.spark.9b78df04-b.rds.aliyuncs.com:8998/batches/1/state | python -m json.tool

Sample:

  [root@master t-apsara-spark-2.2.2]# curl http://ap-xxx-master1-001.spark.9b78df04-b.rds.aliyuncs.com:8998/batches/1/state | python -m json.tool
    % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                   Dload  Upload   Total   Spent    Left  Speed
  100    26  100    26    0     0   1904      0 --:--:-- --:--:-- --:--:--  2000
  {
      "id": 1,
      "state": "success"
  }
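
The status query above can be wrapped in a simple polling loop that waits for a terminal state. A minimal sketch, assuming the same Livy Server address; success, dead, and killed are treated as terminal per the Livy state model (the functions below only define the loop, they do not run it):

```shell
# Livy Server endpoint -- placeholder from the examples above.
LIVY_HOST="ap-xxx-master1-001.spark.9b78df04-b.rds.aliyuncs.com:8998"

# Return 0 if a Livy batch state is terminal.
is_terminal() {
    case "$1" in
        success|dead|killed) return 0 ;;
        *) return 1 ;;
    esac
}

# Fetch the current state of batch $1 via the same API as above.
batch_state() {
    curl -s "http://${LIVY_HOST}/batches/$1/state" \
        | python3 -c 'import json,sys; print(json.load(sys.stdin)["state"])'
}

# Poll batch $1 every 5 seconds until it reaches a terminal state.
poll_batch() {
    while state=$(batch_state "$1") && ! is_terminal "$state"; do
        echo "batch $1: $state"
        sleep 5
    done
    echo "batch $1 finished with state: $state"
}
```

For example, `poll_batch 1` would poll the batch submitted above until it finishes.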

4. References

Livy community documentation: https://livy.incubator.apache.org/

Spark community documentation: http://spark.apache.org/docs/2.3.2/

Official Aliyun demo: https://github.com/aliyun/aliyun-apsaradb-hbase-demo/tree/master/spark