2017-04-18
Avro Data Source for Apache Spark

A library for reading and writing Avro data from Spark SQL.

Requirements

This documentation is for version 3.2.0 of this library, which supports Spark 2.0+. For documentation on earlier versions of this library, see the links below.

This library has different versions for Spark 1.2, 1.3, 1.4 through 1.6, and 2.0+:

Spark Version    Compatible version of Avro Data Source for Spark
1.2              0.2.0
1.3              1.0.0
1.4-1.6          2.0.1
2.0+             3.2.0 (this version)

Linking

This library is cross-published for Scala 2.11, so 2.11 users should replace 2.10 with 2.11 in the commands listed below.

You can link against this library in your program at the following coordinates:

Using SBT:

libraryDependencies += "com.databricks" %% "spark-avro" % "3.2.0"

Using Maven:

<dependency>
    <groupId>com.databricks</groupId>
    <artifactId>spark-avro_2.10</artifactId>
    <version>3.2.0</version>
</dependency>

With spark-shell or spark-submit

This library can also be added to Spark jobs launched through spark-shell or spark-submit by using the --packages command line option. For example, to include it when starting the spark shell:

$ bin/spark-shell --packages com.databricks:spark-avro_2.11:3.2.0

Unlike using --jars, using --packages ensures that this library and its dependencies will be added to the classpath. The --packages argument can also be used with bin/spark-submit.
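
The version table and the coordinate format above can be combined into a small helper. This is only a sketch: the function names spark_avro_version and package_coordinate are hypothetical and not part of the library, and the code assumes that every listed release is published for the Scala version you pass in, which this page does not confirm for the older releases.

```python
def spark_avro_version(spark_version):
    """Return the spark-avro release listed in the table for a Spark version."""
    major, minor = (int(part) for part in spark_version.split(".")[:2])
    if major >= 2:
        return "3.2.0"  # the table's "2.0+" row
    if major == 1 and minor == 2:
        return "0.2.0"
    if major == 1 and minor == 3:
        return "1.0.0"
    if major == 1 and 4 <= minor <= 6:
        return "2.0.1"
    raise ValueError("Spark %s is not covered by the table" % spark_version)


def package_coordinate(spark_version, scala_version="2.11"):
    """Build the groupId:artifactId:version string expected by --packages."""
    return "com.databricks:spark-avro_%s:%s" % (
        scala_version,
        spark_avro_version(spark_version),
    )


if __name__ == "__main__":
    # Print the command line for a Spark 2.1 installation built with Scala 2.11.
    print("bin/spark-shell --packages " + package_coordinate("2.1.0"))
```

For Spark 2.1.0 with Scala 2.11 the helper yields the same coordinate as the spark-shell command shown above.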

Features

Avro Data Source for Spark supports reading and writing of Avro data from Spark SQL.

  • Automatic schema conversion: It supports most conversions between Spark SQL and Avro records, making Avro a first-class citizen in Spark.
  • Partitioning: This library lets you read and write partitioned data without any extra configuration. Pass the columns you want to partition on, just as you would for Parquet.
  • Compression: You can specify the type of compression to use when writing Avro out to disk. The supported types are uncompressed, snappy, and deflate. You can also specify the deflate level.
  • Specifying record names: You can specify the record name and namespace to use by passing a map of parameters with recordName and recordNamespace.
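
The features listed above might be exercised from PySpark roughly as follows. Treat this as a sketch under stated assumptions, not the library's reference usage. The helper names read_avro and write_avro are hypothetical. The data source name com.databricks.spark.avro and the configuration keys spark.sql.avro.compression.codec and spark.sql.avro.deflate.level are not given on this page; they are taken from the upstream README as best recalled, so verify them against the release you use. The helpers take an existing SparkSession and DataFrame as arguments, so the snippet itself does not import pyspark.

```python
AVRO_FORMAT = "com.databricks.spark.avro"  # assumed data source name
SUPPORTED_CODECS = ("uncompressed", "snappy", "deflate")


def read_avro(spark, path):
    """Load the Avro files under `path` into a DataFrame."""
    return spark.read.format(AVRO_FORMAT).load(path)


def write_avro(spark, df, path, partition_cols=(), codec=None,
               deflate_level=None, record_name=None, record_namespace=None):
    """Write `df` as Avro, optionally partitioned, compressed and renamed."""
    # Compression is a session-level setting (assumed configuration keys).
    if codec is not None:
        if codec not in SUPPORTED_CODECS:
            raise ValueError("unsupported codec: %s" % codec)
        spark.conf.set("spark.sql.avro.compression.codec", codec)
        if codec == "deflate" and deflate_level is not None:
            spark.conf.set("spark.sql.avro.deflate.level", str(deflate_level))

    # Record name and namespace are passed as writer options.
    options = {}
    if record_name is not None:
        options["recordName"] = record_name
    if record_namespace is not None:
        options["recordNamespace"] = record_namespace

    writer = df.write.format(AVRO_FORMAT).options(**options)
    if partition_cols:
        # Same call you would use when writing partitioned Parquet.
        writer = writer.partitionBy(*partition_cols)
    writer.save(path)
```

In a session started with the --packages option, a call such as write_avro(spark, df, "/tmp/output", partition_cols=["year"], codec="deflate", deflate_level=5) would write deflate-compressed Avro files partitioned by a year column; the path and column name here are placeholders.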
All replies
2017-4-18 07:08:48
Thanks to the original poster for sharing!