2015-01-21
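Installing Rwordseg, Rweibo, and tm

Installing the Chinese text-mining packages for R (tmcn, Rwordseg, Rweibo) with the default method fails. What works is installing them from source: download the source tarballs and their dependencies by hand, then build and install them.

First comes the foundational tm package, R's general-purpose text-mining package. It installs directly with install.packages.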


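install.packages("tm")
tmcn, Rwordseg, and Rweibo are Chinese text-mining packages developed by Li Jian and others. Each has an official web page with a package description and installation instructions, but in my tests the installation methods given there did not work.
The correct way to install them is to first download the source of tmcn, Rwordseg, and Rweibo.

tmcn has no dependencies, so install it directly.
install.packages("~/Downloads/tmcn_0.1-3.tar", repos=NULL, type="source")
Rwordseg depends on rJava.
rJava needs a Java environment to be installed beforehand. If you have not installed Java yet, do that first; installing Java (and configuring PATH) is not covered here.
install.packages("rJava")
install.packages("~/Downloads/Rwordseg_0.2-1.tar", repos=NULL, type="source")
Rweibo depends on four packages: RCurl, rjson, XML, and digest.
These four dependencies cannot be installed directly either. Download their source from the USTC mirror first (search by package name: RCurl, XML, rjson, digest), then install them.
install.packages("bitops") # dependency of RCurl
install.packages("~/Downloads/RCurl_1.95-4.1.tar", repos=NULL, type="source")
install.packages("~/Downloads/XML_3.98-1.1.tar", repos=NULL, type="source")
install.packages("~/Downloads/rjson_0.2.13.tar", repos=NULL, type="source")
install.packages("~/Downloads/digest_0.6.4.tar", repos=NULL, type="source")

install.packages("~/Downloads/Rweibo_0.2-9.tar", repos=NULL, type="source")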
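
The install order above could be wrapped in a small helper. This is only a sketch: install_local_sources and its dry_run argument are hypothetical names, and the tarball names are simply the ones used in this post. With dry_run = TRUE it only reports which files it found, in order, without installing anything.

```r
# Hypothetical helper: install local source tarballs in the given order.
install_local_sources <- function(dir, tarballs, dry_run = TRUE) {
  paths <- file.path(dir, tarballs)
  found <- file.exists(paths)
  if (any(!found)) {
    warning("Missing tarballs: ", paste(tarballs[!found], collapse = ", "))
  }
  if (!dry_run) {
    for (p in paths[found]) {
      # repos = NULL plus type = "source" means: build from this local file
      install.packages(p, repos = NULL, type = "source")
    }
  }
  invisible(paths[found])
}

# Dependencies first, Rweibo last (file names as used in this post).
rweibo_chain <- c("RCurl_1.95-4.1.tar", "XML_3.98-1.1.tar",
                  "rjson_0.2.13.tar", "digest_0.6.4.tar", "Rweibo_0.2-9.tar")
# install_local_sources("~/Downloads", rweibo_chain, dry_run = FALSE)
```

Keeping the order in one vector makes it harder to install Rweibo before one of its dependencies by accident.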

http://andy-henry.github.io/2014/05/24/rwordseg_install/







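Part 1

People have been practicing word segmentation with the Rwordseg package, and I could not resist running an experiment of my own.
The first step, of course, is installing the packages. Personally I think that goes without saying and is just padding.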

http://blog.sina.com.cn/s/blog_70f6320901017int.html


http://f.dataguru.cn/thread-46051-1-1.html
http://f.dataguru.cn/forum.php?mod=viewthread&tid=46051
http://f.dataguru.cn/forum.php?mod=viewthread&tid=114967
http://f.dataguru.cn/forum.php?mod=viewthread&tid=19179
And so on. I find the code a bit hard to read.


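Part 2


tm is the R package that provides a comprehensive framework for text mining. Load tm before doing anything else; the vignette command brings up the accompanying documentation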



library(tm)
vignette("tm")

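# First read the text. This example uses the 20 XML documents that ship with tm, stored in the library\tm\texts\crude folder. The Corpus command reads them and builds a corpus.


reut21578 <- system.file("texts", "crude", package = "tm")
reuters <- Corpus(DirSource(reut21578), readerControl = list(reader = readReut21578XML))

# Next, preprocess the corpus with tm_map: convert to plain text, strip extra whitespace, convert to lowercase, remove stopwords, and stem words so that variant forms are merged.
# (In tm 0.6 and later, base functions must be wrapped, e.g. content_transformer(tolower).)

reuters <- tm_map(reuters, as.PlainTextDocument)
reuters <- tm_map(reuters, stripWhitespace)
reuters <- tm_map(reuters, tolower)
reuters <- tm_map(reuters, removeWords, stopwords("english"))
reuters <- tm_map(reuters, stemDocument)  # assign the result; tm_map does not modify the corpus in place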

# Use DocumentTermMatrix to tokenize the processed corpus and build a term-frequency matrix

dtm <- DocumentTermMatrix(reuters)

# Part of the matrix can be viewed with inspect

inspect(dtm[1:5, 100:105])

Docs abdul-aziz ability able abroad, abu accept
127 0 0 0 0 0 0
144 0 2 0 0 0 0
191 0 0 0 0 0 0
194 0 0 0 0 0 0
211 0 0 0 0 0 0
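
To make the structure of such a matrix concrete, here is a minimal base-R sketch that builds a document-term count matrix by hand. The three toy documents are made up for illustration, and the whitespace tokenizer is much simpler than what tm does.

```r
# Toy documents (made up for illustration).
docs <- c(d1 = "oil prices rise", d2 = "crude oil oil", d3 = "prices fall")

# Tokenize on whitespace and collect the vocabulary.
tokens <- strsplit(tolower(docs), "\\s+")
vocab <- sort(unique(unlist(tokens)))

# Count each vocabulary term per document: one column per document.
counts <- vapply(tokens,
                 function(tk) tabulate(match(tk, vocab), nbins = length(vocab)),
                 integer(length(vocab)))

# Transpose so rows are documents and columns are terms, as in a DTM.
dtm_toy <- t(counts)
colnames(dtm_toy) <- vocab
print(dtm_toy)
```

Each row is a document and each column a term, which is the same layout inspect(dtm) prints above.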

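To examine how often specific terms occur across the documents, build a dictionary by hand and pass it as a parameter when generating the matrix

(d <- Dictionary(c("prices", "crude", "oil")))
inspect(DocumentTermMatrix(reuters, list(dictionary = d)))

The resulting matrix is sparse, so reduce its dimensionality next, then convert it to a standard data frame

dtm2 <- removeSparseTerms(dtm, sparse=0.95)
data <- as.data.frame(as.matrix(dtm2))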
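
As I understand removeSparseTerms, it keeps only the terms whose share of empty cells is below the sparse threshold. The base-R sketch below approximates that rule on a made-up matrix; drop_sparse is a hypothetical helper, not tm's implementation.

```r
# Toy document-term counts: 4 documents, 3 terms (made-up numbers).
m <- matrix(c(1, 2, 0, 3,   # "oil": present in 3 of 4 documents
              0, 0, 0, 1,   # "opec": present in 1 of 4 documents
              1, 1, 1, 1),  # "said": present in all documents
            nrow = 4,
            dimnames = list(paste0("doc", 1:4), c("oil", "opec", "said")))

# Fraction of documents in which each term does NOT occur.
term_sparsity <- colMeans(m == 0)
print(term_sparsity)

# Hypothetical helper: keep terms whose sparsity is below the threshold.
drop_sparse <- function(m, sparse) m[, colMeans(m == 0) < sparse, drop = FALSE]
kept <- drop_sparse(m, sparse = 0.5)
colnames(kept)
```

With a threshold of 0.5, the term that appears in only one of four documents is dropped and the other two are kept.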

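After that, any tool in R can be used for the analysis. Let's try hierarchical clustering.
First standardize the data, then compute a distance matrix, then run the clustering

data.scale <- scale(data)
d <- dist(data.scale, method = "euclidean")
fit <- hclust(d, method="ward.D")  # "ward" was renamed "ward.D" in R 3.1.0

Plot the dendrogram
plot(fit)
The plot shows that, among the 20 documents, 489 and 502 form one cluster that differs markedly from the others.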
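
The same three steps (standardize, distance matrix, clustering) can be tried on a tiny made-up matrix where the expected grouping is obvious. This is a sketch with invented counts, not the Reuters data; cutree is added here only to read the clusters off without a plot.

```r
# Made-up term counts: docA/docB are about oil prices, docC/docD about banks.
counts <- matrix(c(5, 4, 0, 1,    # "oil"
                   4, 5, 1, 0,    # "price"
                   0, 0, 5, 4),   # "bank"
                 nrow = 4,
                 dimnames = list(c("docA", "docB", "docC", "docD"),
                                 c("oil", "price", "bank")))

scaled <- scale(counts)                      # center and scale each term column
d <- dist(scaled, method = "euclidean")      # pairwise document distances
fit <- hclust(d, method = "ward.D")          # named "ward" before R 3.1.0
groups <- cutree(fit, k = 2)                 # cut the tree into two clusters
print(groups)
```

docA and docB end up in one cluster and docC and docD in the other; plot(fit) would show the same split as a dendrogram.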


Runnable code:

library(tm)
vignette("tm")

reut21578 <- system.file("texts", "crude", package = "tm")
reuters <- Corpus(DirSource(reut21578), readerControl = list(reader = readReut21578XML))

reuters <- tm_map(reuters, as.PlainTextDocument)
reuters <- tm_map(reuters, stripWhitespace)
reuters <- tm_map(reuters, tolower)
reuters <- tm_map(reuters, removeWords, stopwords("english"))
reuters <- tm_map(reuters, stemDocument)

dtm <- DocumentTermMatrix(reuters)

inspect(dtm[1:5, 100:105])

dtm2 <- removeSparseTerms(dtm, sparse=0.95)
data <- as.data.frame(as.matrix(dtm2))

data.scale <- scale(data)
d <- dist(data.scale, method = "euclidean")
fit <- hclust(d, method="ward.D")

plot(fit)










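There are now many carefully designed, well maintained, and widely supported R packages related to machine learning. In the case studies we will work through, the packages are mainly used for handling spatial data, performing text analysis, and analyzing network topologies; some are for interacting with web APIs, and there are many other functions, too many to list. Our tasks will therefore rely heavily on the functions built into these packages.
Loading an R package is simple. Two functions do it: library and require. There are subtle differences between them; for this book, the main one is that require returns a Boolean value (TRUE or FALSE) indicating whether the package loaded successfully. For example, in Chapter 6 we will use the tm package to tokenize text. To load it, we can use either library or require. In the example below, we load tm with library and XML with require, then use print to show the value that require returns. The returned value is TRUE, so the XML package loaded successfully.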
library(tm)
print(require(XML))
#[1] TRUE
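
The difference is easiest to see with a package that is not installed. The sketch below uses a made-up package name (assumed not to exist on your system) to show that require returns FALSE while library signals an error.

```r
pkg <- "notARealPackage123"  # hypothetical name, assumed not to be installed

# require() returns FALSE when the package is missing.
ok <- suppressWarnings(require(pkg, character.only = TRUE, quietly = TRUE))
print(ok)

# library() signals an error instead, so it has to be caught explicitly.
res <- tryCatch({
  library(pkg, character.only = TRUE)
  "loaded"
}, error = function(e) "error")
print(res)
```

This is why require is convenient inside an if() test, while library is the safer choice at the top of a script that should stop when a package is missing.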

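If the XML package were not yet installed, that is, if require returned FALSE, we would have to install it before using it.
Note: if you have just installed R, you will need to install a fair number of packages to complete all the case studies in this book.
There are two ways to install packages in R: through the graphical user interface, or with the install.packages function in the R console. Given the level of this book's intended readers, all the case studies interact through the R console, but it is still worth showing how to install packages with the GUI. On the R application's menu bar, go to Packages & Data → Package Installer, which opens the window shown in Figure 1-4. From the repository drop-down list choose CRAN (binaries) or CRAN (sources), then click Get List to load all installable packages; the newest package versions are available from the CRAN (sources) repository. If the required compilers are installed on your computer, we recommend installing from source. Then select the packages to install and click Install Selected.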


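By comparison, install.packages is the better method, because it is more flexible about how and where packages are installed. One of its main advantages is that it can install from local source code as well as from source on CRAN. Although uncommon, this is sometimes needed. Occasionally you may want to install a package version that has not yet been released on CRAN, for example to update a package to a test version, and in that case you must install from source: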
install.packages("tm", dependencies=TRUE)
setwd("~/Downloads/")
install.packages("RCurl_1.5-0.tar.gz", repos=NULL, type="source")

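The first line installs the tm package from CRAN with default settings. tm is used for text mining, and in Chapter 3 we will use it to classify email text. A very useful parameter of install.packages is dependencies. By default only the packages that the target strictly requires (Depends, Imports, LinkingTo) are installed; with dependencies=TRUE, install.packages also downloads and installs the packages that the target suggests. As a best practice we recommend always setting it to TRUE, especially when starting from a fresh R installation with no packages.
There is also another way to install: directly from a compressed source archive. In the example above, we installed the RCurl package from source code obtained from the author's website. With setwd making sure that R's working directory is the directory where the source is saved, the command above installs from source. Note that two parameters have to change. First, set repos=NULL to tell the function not to use any CRAN repository; then set type="source" to tell it to install from source.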
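Table 1-2: Packages used in this book
Name | URL | Author | Description and usage
arm | http://cran.r-project.org/web/packages/arm/ | Andrew Gelman et al. | Package for fitting multilevel/hierarchical regression models
ggplot2 | http://had.co.nz/ggplot2/ | Hadley Wickham | An implementation of the grammar of graphics in R; the package of choice for creating high-quality graphics
glmnet | http://cran.r-project.org/web/packages/glmnet/index.html | Jerome Friedman, Trevor Hastie, and Rob Tibshirani | Lasso and elastic-net regularized generalized linear models
igraph | http://igraph.sourceforge.net/ | Gabor Csardi | Routines for simple graphs and network analysis; used to model social networks
lme4 | http://cran.r-project.org/web/packages/lme4/ | Douglas Bates, Martin Maechler, and Ben Bolker | Functions for fitting linear and generalized mixed-effects models
lubridate | https://github.com/hadley/lubridate | Hadley Wickham | Convenience functions that make working with dates in R easier
RCurl | http://www.omegahat.org/RCurl/ | Duncan Temple Lang | An R interface to the HTTP protocol facilities of the libcurl library; used to import raw data from the web
reshape | http://had.co.nz/plyr/ | Hadley Wickham | A set of tools for manipulating, aggregating, and managing data in R
RJSONIO | http://www.omegahat.org/RJSONIO/ | Duncan Temple Lang | Functions for reading and writing JSON (JavaScript Object Notation) data; used to parse data from web APIs
tm | http://www.spatstat.org/spatstat/ | Ingo Feinerer | A collection of text-mining functions; used to work with unstructured text data
XML | http://www.omegahat.org/RSXML/ | Duncan Temple Lang | Parses XML and HTML files in order to extract structured data from the web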

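As mentioned earlier, we use a number of packages in this book. Table 1-2 lists all the packages used in the case studies, with a short description of what each is for and a link to more details. That is a lot of packages to install, so to speed things up we wrote a short script that checks whether each required package is installed and, if not, installs it from CRAN. To run the script, use setwd to set the working directory to the folder containing this chapter's code, then execute the source command as shown: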
source("package_installer.R")
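
The contents of package_installer.R are not shown in this thread. A script of that kind might look roughly like the sketch below; missing_packages and install_missing are hypothetical names, and the package list is copied from Table 1-2.

```r
# Hypothetical sketch; the book's actual package_installer.R is not shown here.
needed <- c("arm", "ggplot2", "glmnet", "igraph", "lme4", "lubridate",
            "RCurl", "reshape", "RJSONIO", "tm", "XML")

# Return the packages from `pkgs` that are not yet installed.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

# Install whatever is missing; with dry_run = TRUE only report it.
install_missing <- function(pkgs, dry_run = TRUE) {
  todo <- missing_packages(pkgs)
  if (!dry_run && length(todo) > 0) {
    install.packages(todo, dependencies = TRUE)
  }
  invisible(todo)
}

# install_missing(needed, dry_run = FALSE)
```

Checking against installed.packages() first avoids reinstalling packages that are already present.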

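If you have never installed a package before, you may be asked to choose a CRAN mirror. Once that is set, the script starts running and you can watch the installation progress of all the required packages. Now we are ready to begin our machine learning journey with R! Before starting the case studies, we still need to review some commonly used R functions and operations.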




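I want to call the write.xlsx() function. I have installed the xlsx, xlsxjars, and rJava packages, but it still does not work: library(xlsx) fails with an error as soon as I enter it.
There are two things to check:
1. Whether Java is installed. If it is, check that the Java version matches your R (for example, 64-bit Java for 64-bit R). Reinstalling Java is recommended: http://www.java.com/en/download/manual.jsp
2. If it still does not work, set the Java location manually before loading the package
Sys.setenv(JAVA_HOME='C:\\Program Files\\Java\\jre7') # for 64-bit version
Sys.setenv(JAVA_HOME='C:\\Program Files (x86)\\Java\\jre7') # for 32-bit version
library(rJava)
------
That usually solves the problem.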
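
A slightly more defensive variant checks that the directory exists before setting JAVA_HOME. This is only a sketch: set_java_home is a hypothetical helper, and the two candidate paths are the ones from the reply above, which may not match your Java installation.

```r
# Hypothetical helper: point JAVA_HOME at the first candidate directory that exists.
set_java_home <- function(candidates) {
  hit <- candidates[dir.exists(candidates)]
  if (length(hit) == 0) return(invisible(NA_character_))
  Sys.setenv(JAVA_HOME = hit[1])
  invisible(hit[1])
}

# Forward slashes avoid R's backslash-escape rules in Windows paths.
set_java_home(c("C:/Program Files/Java/jre7",
                "C:/Program Files (x86)/Java/jre7"))
# library(rJava)  # load only after JAVA_HOME is set
```

If none of the candidates exists, the helper leaves JAVA_HOME untouched and returns NA, which makes a wrong path easier to notice than a failed library(rJava) call.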


All replies
2015-1-21 23:05:02

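A Concise Guide to Creating R Packages on Windows


The CRAN mirrors have so far collected more than 2,300 packages, covering nearly every area of statistical programming. Each package is available both as source code and as a compiled build for Windows or MacOS. Once you have written a fair number of R functions, it is best to bundle them into a package, which makes them easier to manage and use. If you like, you can also submit the package to CRAN and share your work with users around the world.

How do you write an R package on Windows, that is, produce both a tar.gz file that can be built and run on Linux and a .zip file for use on Windows? The process is not complicated, but you need to download some tools, fill in a few "forms" step by step, and then type some commands in the console. If you are an R user, none of this should feel unfamiliar.

Writing an R package on Windows usually involves the following steps:

(1) Install the Rtools toolchain and the optional software.

(2) Prepare the R script, i.e. the script with the functions that will go into the package.

(3) Use R's built-in package.skeleton() function to generate the DESCRIPTION file and the .Rd help files that the package needs.

(4) Fill in the generated DESCRIPTION file and .Rd help files as required.

(5) Type the appropriate commands at the Windows cmd prompt to build the .zip or .tar.gz file and run the checks.

Below we create the simplest possible R package, containing just one function.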

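I. Installing and configuring the tools

The tools for building R packages are Rtools, an HTML compiler (not needed from R 2.10 on), and MikTeX or CTeX (not needed if you do not want a PDF manual).

1 Installing the tools

(1) Rtools (the main tool for building R packages)

Rtools is a collection of tools for building R packages on Windows. It includes:

1) CYGWIN, which emulates a UNIX environment on Windows

2) The MinGW compilers, for compiling C and Fortran code

3) Perl

Download: http://www.murdoch-sutherland.com/Rtools/

(2) Microsoft HTML compiler (optional):

Generates HTML help files from the source files (not needed from version 2.10 on)

Download: http://go.microsoft.com/fwlink/?LinkId=14188

(3) MikTeX or CTeX (optional)

Generates help files in PDF format

Download: http://www.miktex.org/      www.ctex.org/

Install each of them as instructed.

2 Setting the startup path:

The startup path is set so that Rtools and the other programs can be called directly from the cmd prompt.

Right-click:

My Computer > Properties > Advanced > Environment Variables > System variables, select PATH, click "Edit", and check that it contains the following paths. The installers usually configure these automatically. If not, add them by hand:

c:\Rtools\bin;c:\Rtools\perl\bin;c:\Rtools\MinGW\bin; C:\CTEX\MiKTeX\miktex\bin;C:\CTEX\CTeX\ctex\bin;C:\CTEX\CTeX\cct\bin;C:\CTEX\CTeX\ty\bin; C:\Program Files\R\R-2.11.0\bin\;


Figure 1. Setting the startup path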
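
The PATH check can also be done from inside R. The sketch below uses a hypothetical helper, missing_from_path, that reports which required directories are absent from a PATH string; the directories in the example are the ones listed in this guide.

```r
# Hypothetical helper: which of the required directories are absent from PATH?
missing_from_path <- function(required, path = Sys.getenv("PATH"),
                              sep = .Platform$path.sep) {
  # Compare case-insensitively and ignore trailing slashes or backslashes.
  norm <- function(x) tolower(sub("[/\\\\]+$", "", trimws(x)))
  entries <- norm(strsplit(path, sep, fixed = TRUE)[[1]])
  required[!(norm(required) %in% entries)]
}

# Example with a Windows-style PATH string (directories taken from this guide).
example_path <- "c:\\Rtools\\bin;c:\\Rtools\\perl\\bin;C:\\Program Files\\R\\R-2.11.0\\bin\\"
missing_from_path(c("c:\\Rtools\\bin", "c:\\Rtools\\MinGW\\bin"),
                  path = example_path, sep = ";")
```

On a real Windows machine the path and sep arguments can be left at their defaults, so the helper reads the live PATH.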


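II. Preparing the R script

Suppose we already have an R function, freq, that computes the relative frequency of occurrence of species, saved as an R script named freq.r

Its contents are as follows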

##############################################

# Relative frequency of each species: the share of plots in which it occurs.
freq <-
function(matr){
   matr <- as.matrix(matr)
   if(!is.matrix(matr)){
       stop("The input data must be matrix!\n")
   }
   if(any(is.na(matr))){
       # na.omit() drops every row (plot) that contains an NA
       matr <- na.omit(matr)
       message("NA found in matrix; rows containing NA have been removed")
   }
   matr[matr>1] <- 1    # convert abundances to presence/absence
   result <- apply(matr, 2, sum)/nrow(matr)
   return(result)
}

##############################################
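
To see what the function returns, here is a small usage sketch. The abundance matrix is made up, and a compact copy of freq() is included only so that the example runs on its own.

```r
# Compact copy of the freq() function from this section, so the example is self-contained.
freq <- function(matr) {
  matr <- as.matrix(matr)
  matr <- na.omit(matr)          # drop plots (rows) containing NA
  matr[matr > 1] <- 1            # abundance -> presence/absence
  apply(matr, 2, sum) / nrow(matr)
}

# Made-up abundance data: 4 plots (rows) x 3 species (columns).
plots <- matrix(c(2, 0, 1, 0,
                  0, 0, 0, 3,
                  1, 1, 5, 1),
                nrow = 4,
                dimnames = list(paste0("plot", 1:4), c("spA", "spB", "spC")))
freq(plots)
```

spA occurs in two of the four plots, spB in one, and spC in all four, so the result is 0.5, 0.25, and 1.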

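Next we use R's built-in package.skeleton() function to generate the skeleton of the R package

III. Preparing the package skeleton

1 Generating the starter files

Start R: Start > All Programs > R > R 2.9.0

(1) Clear the workspace, to remove all unneeded data and functions from R's memory:

rm(list=ls())

(2) Set the working directory, here c:/pa

setwd("c:/pa")

(3) Read the functions from the R script with source() (this assumes freq.r was saved in c:/pa).

source("freq.r")

If the package will contain many functions, it is best to save them all in one script file first, then read them in with source() and load any data you need into memory. package.skeleton(name="packname", list = ls()) then generates the corresponding package skeleton.

Here we want to create an R package named freq, so we enter:

package.skeleton(name="freq", list = ls())

The R console then shows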

> package.skeleton(name="freq", list = ls())

Creating directories ...

Creating DESCRIPTION ...

Creating Read-and-delete-me ...

Saving functions and data ...

Making help files ...

Done.

Further steps are described in './freq/Read-and-delete-me'.

>

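A new freq folder has now appeared under c:/pa

Its contents are the skeleton of the R package: Read-and-delete-me, the DESCRIPTION file, the R folder, and the man folder. Fill these in as required and then build the package.

Read-and-delete-me explains how to create the R package

DESCRIPTION is a brief description of the R package

The R folder holds the .R files, i.e. the source code of the functions

The man folder holds the Rd files, i.e. the source of the R help pages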
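
The skeleton step can be tried safely in a temporary directory. In the sketch below, the temporary directory and the separate environment are my own choices for illustration (the guide itself works in c:/pa with list = ls()), and the function body is a shortened copy of freq().

```r
# Build the skeleton in a temporary directory instead of c:/pa (assumption for illustration).
work_dir <- tempfile("pa")
dir.create(work_dir)

# Keep the function in its own environment so only freq ends up in the package.
pkg_env <- new.env()
assign("freq", function(matr) {
  matr <- as.matrix(matr)
  matr[matr > 1] <- 1
  apply(matr, 2, sum) / nrow(matr)
}, envir = pkg_env)

package.skeleton(name = "freq", list = "freq",
                 environment = pkg_env, path = work_dir)

# List what was generated: DESCRIPTION, the R and man folders, Read-and-delete-me, ...
list.files(file.path(work_dir, "freq"), recursive = TRUE)
```

Passing list = "freq" explicitly avoids sweeping unrelated objects from the workspace into the package, which is what rm(list=ls()) guards against in the steps above.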

First look at the Read-and-delete-me file

Its contents are:

####################################################################################

* Edit the help file skeletons in 'man', possibly combining help files for multiple functions.

* Put any C/C++/Fortran code in 'src'.

* If you have compiled code, add a .First.lib() function in 'R' to load the shared library.

* Run R CMD build to build the package tarball.

* Run R CMD check to check the package tarball.

Read "Writing R Extensions" for more information.

####################################################################################

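In short:

Edit the help files in the man folder

Put C/C++/Fortran source code in the src folder

If the package has compiled code, add a .First.lib() function under R to load the shared library

Run R CMD build and R CMD check to build and check the package

Note: R CMD here is the command as typed in a Linux terminal; on Windows, type Rcmd

Rcmd build packname packages the source,

Rcmd build --binary packname builds the zip package.

Rcmd check packname checks the package for errors.

After reading this file, delete it.

2 Editing the DESCRIPTION file and the Rd files

(1) Editing the DESCRIPTION file

Fill in each field as prompted

The DESCRIPTION file is the package's summary. It follows the Debian control file format.

Its contents are shown below:

The parts marked in red in the original post are the ones to edit by hand.

Note in particular: the examples in this package use data from the vegan package, so Suggests: vegan must be added to the DESCRIPTION file; otherwise Rcmd check will fail.

If the R functions in the package call functions from vegan, add Depends: vegan to the DESCRIPTION file instead, which guarantees that vegan is loaded whenever this package is loaded.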

####################################

Package: freq

Type: Package

Title: Calculate relative frequency

Version: 1.0

Date: 2010-05-20

Author: Jinlong Zhang

Maintainer: Jinlong Zhang

Description: Calculate relative frequency for species matrix.

License: GPL-2

LazyLoad: yes

Suggests: vegan

#####################################
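
Because DESCRIPTION is a Debian-control-style file, base R can read it with read.dcf. The sketch below writes a shortened copy of the file above to a temporary location and reads the fields back; the temporary file is only for illustration.

```r
# Write a shortened copy of the DESCRIPTION above to a temporary file.
desc_file <- tempfile("DESCRIPTION")
writeLines(c("Package: freq",
             "Type: Package",
             "Title: Calculate relative frequency",
             "Version: 1.0",
             "License: GPL-2",
             "Suggests: vegan"),
           desc_file)

desc <- read.dcf(desc_file)   # character matrix, one column per field
desc[1, "Package"]
desc[1, "Suggests"]

# Fields that are absent simply do not appear as columns.
"Depends" %in% colnames(desc)
```

Reading the file back like this is a quick way to confirm that a field such as Suggests was actually saved before running the check.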

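(2) Editing the .Rd files in the man folder

The man folder contains two files, freq.Rd and freq-package.Rd, which document the freq() function and the freq package respectively. Fill them in item by item:

The Rd format closely resembles TeX, so it takes little effort if you know some LaTeX; if not, it needs a bit of study. Sections in an Rd file must not be left empty, or the check will report warnings. The title is mandatory. Also note that Rd files must not contain non-ASCII characters, or Rcmd check will fail.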
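
Non-ASCII characters are easy to miss by eye, so a quick scan before running the check can help. The sketch below uses a hypothetical helper, has_non_ascii, on a few made-up Rd lines; in practice the lines would come from readLines() on the .Rd file.

```r
# Hypothetical helper: flag lines that contain characters outside ASCII.
has_non_ascii <- function(lines) {
  vapply(enc2utf8(lines),
         function(s) any(utf8ToInt(s) > 127L),
         logical(1), USE.NAMES = FALSE)
}

rd_lines <- c("\\title{Species relative frequency}",
              "\\author{Jinlong Zhang}",
              "\\details{caf\u00e9}")   # made-up line with a non-ASCII character
which(has_non_ascii(rd_lines))
```

Only the third line is flagged. The tools package also ships showNonASCII and showNonASCIIfile for the same purpose.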

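Contents of freq.Rd: the parts in red (in the original post) were typed in by hand; anything after % in the original file is a comment and can be ignored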

#################################################################

\name{freq}

\alias{freq}

\title{

Species relative frequency

}

\description{

This function calculates the relative frequency of each species, which equals the number of occupied plots divided by the total number of plots.

}

\usage{

freq(matr)

}

\arguments{

\item{matr}{ The standard species matrix

}

}

\details{

The input data is a standard species matrix with rows for plots and column for species.

}

\value{

Returns a vector that contains relative frequency for each species included in the input matrix.

}

\references{

None

}

\author{

Jinlong Zhang \email{jinlongzhang01@gmail.com}

}

\examples{

library(vegan)

data(BCI)

freq(BCI)

}

\keyword{ frequency }

\keyword{ species }

######################################################################

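The help file freq-package.Rd is filled in the same way as freq.Rd.

IV. Building the R package from cmd

On Windows: Start > Run > cmd

Type cd c:\pa\ to change the working directory to c:/pa

Type Rcmd INSTALL --build freq to build the Windows zip package (Editor's note, 2013-12-06: in newer versions of Rcmd, the command for building a Windows binary package has changed to Rcmd INSTALL --build)

Type Rcmd build freq to build the tar.gz package that runs on Linux

When the commands finish, c:/pa/ contains the freq.zip and freq_1.0.tar.gz archives.

Type Rcmd check freq to check the contents of freq_1.0.tar.gz.

Type Rcmd Rd2pdf freq to generate the reference manual in PDF format.

Figure 4. Typing Rcmd build freq in cmd produces the tar.gz package

To upload a package to CRAN, it must pass Rcmd check without any errors or warnings.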



2015-1-21 23:12:09
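I need the Snowball package, but I cannot find it on the official site; there is only a package called SnowballC. Are the two the same?
Also, I want to use the SnowballStemmer function from Snowball, which is not in SnowballC. What should I do? I hope someone can help. Thanks~

Hello, is this package the same as the Snowball package? I am doing text mining and need the SnowballStemmer function from Snowball, but after installing SnowballC I cannot use that function.

They are much the same; the latter is written in C. Its wordStem function is similar to that one.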
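
A minimal sketch of the wordStem route, assuming SnowballC is installed. The word list is made up, and the code falls back to NULL when the package is missing so that it still runs.

```r
words <- c("running", "runs", "prices")

if (requireNamespace("SnowballC", quietly = TRUE)) {
  # wordStem() takes a character vector and a language name.
  stems <- SnowballC::wordStem(words, language = "english")
} else {
  stems <- NULL  # SnowballC not installed; run install.packages("SnowballC") first
}
print(stems)
```

wordStem returns one stem per input word, so it can be applied directly to a vector of tokens.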

2015-1-22 00:05:49
rJava



About rJava
What is rJava?
        

rJava is a simple R-to-Java interface. It is comparable to the .C/.Call C interface. rJava provides a low-level bridge between R and Java (via JNI). It allows to create objects, call methods and access fields of Java objects from R.

rJava release versions can be obtained from CRAN - usually install.packages("rJava") in R will do the trick. The current development version can be downloaded from the files section.

In a sense the inverse of rJava is JRI (Java/R Interface) which provides the opposite direction - calling R from Java. JRI is now shipped as a part of the rJava package, although it still can be used as a separate entity (especially for development). Currently rJava is used as a part of JGR, iPlots and JavaGD software/packages.

Please report any bugs or wishes related to rJava or JRI using Issues on GitHub.

What's new?
rJava source repository is now on GitHub and that is also the place to report bugs. The main page and builds are still on RForge.net.

2012/12/23 - rJava 0.9-6 released. Fixes Java parameter issue introduced in 0.9-5 on systems with headless mode (e.g. OS X).

2011/06/22 - rJava 0.9-0 released. This is a major upgrade that changes behavior of array references in low-level calls back to early 0.8 state as intended by the original design. It should be more consistent now. We have had rJava 0.9-0 in RC state for a long time so hopefully all developers relying on rJava have checked their packages. For the full list of fixes and changes see NEWS.

2009/10/27 - rJava 0.8-0 released. Many new features mostly thanks to Romain Francois -- check the NEWS file for details.

2009/08/22 - rJava 0.7-0 released. Recommended update (includes bugfixes to fields handling, new support for with(), method/field auto-completion and more forgiving $ operator), includes JRI 0.5-0.


2008/09/22 - rJava 0.6-0 released. Adds support for Java serialization and R-side cache which enables Java objects to become persistent across sessions. See ?.jcache in R (or the online help) for details.

2007/11/05 - rJava 0.5-1 released. Fixes issues with Windows; and minor updates (see NEWS).

2007/08/22 - rJava 0.5-0 released. This is a major update featuring many new features and bugfixes. It sports a new custom class loader, much improved (and faster) field support, integration of all native Java types, automatic fall-back to static methods, infrastructure for writing Java packages easily (see .jpackage), support for custom convertors and call-backs. Please read the NEWS file for details.

2007/02/27 - rJava 0.4-14 (update is recommended to all users due to memory leak fixes), please use CRAN to get the latest release. The current development version is rJava 0.5-0 (available from here - see SVN access and download on the left). It is under heavy construction right now with many new features, so feel free to test-drive it if you want to be on the bleeding edge (it is a bit chatty as some debugging output is still enabled). Some of the highlights are memory profiler, intelligent class loader, easy Java package integration and callback support.


Installation

First, make sure you have JDK 1.4 or higher installed (some platforms require a higher version; see R Wiki). On unix systems make sure that R was configured with Java support. If not, you can re-configure R by using R CMD javareconf (you may have to prepend sudo or run it as root depending on your installation - see R-ext manual A.2.2 for details). On Windows Java is detected at run-time from the registry.

rJava can be installed as any other R package from CRAN using install.packages('rJava'). See the files section in the left menu for development versions.

JRI is only compiled if supported, i.e. if R was configured as a framework or with --enable-R-shlib.

Documentation

If you want to run R within a Java application, please see the JRI pages for details. rJava allows you to use R code to create Java objects, call Java methods and pass data between R and Java. Most functions are already documented by the corresponding help pages in R, but here is a very basic crash course:

The following gives a quick guide to the use of rJava. If you have questions about installation, please visit the R Wiki - rJava package.

Let's start with some low-level examples. Remember, this is all essentially a JNI interface, so you may encounter new things if you used Java from high level only.

library(rJava)
.jinit() # this starts the JVM
s <- .jnew("java/lang/String", "Hello World!")

Ok, here we have our first Java object. It is equivalent to the Java line of (pseudo) code s = new java.lang.String("Hello World!"); The class name may look strange to casual Java users, but this is how JNI class names look like - slashes instead of dots. Also note that you must always refer to the full class name, because there is no import ... facility.

Next, we will call a simple method:

.jcall(s,"I","length")
[1] 12

This is equivalent to s.length(), but it is a bit more complex than expected. The main reason is that in JNI when looking up methods, you must supply the return type as well. So here we say that we want to call a method length on the object s with the return type I which means int. The table of JNI types is as follows:
I  integer    D  double (numeric)    J  long (*)          F  float (*)
V  void       Z  boolean             C  char (integer)    B  byte (raw)
L<class>;  Java object of the class <class> (e.g. Ljava/lang/Object;)
[<type>    Array of objects of type <type> (e.g. [D for an array of doubles)

Not all types or combinations are supported, but most are. Note that the Java type short was sacrificed for greater good (and pushed to T), namely S return type specification in .jcall is a shortcut for Ljava/lang/String;. When passing parameters to methods, R objects are automatically converted where possible:

.jcall(s,"I","indexOf","World")
[1] 6

This is equivalent to s.indexOf("World") and the string parameter "World" is automatically converted into java.lang.String object and passed to Java. Note that you can equally pass Java object references as well. But before we do that, let us see how you can find a method signature if you're not sure. Let's say we want to know more about the concat method for our object s:

.jmethods(s,"concat")
[1] "public java.lang.String java.lang.String.concat(java.lang.String)"

We see that concat expects a string and returns a string, so we can just concatenate the string itself if we want:

.jcall(s,"Ljava/lang/String;","concat",s)
[1] "Hello World!Hello World!"

We are telling JNI that the return value will be an object of the class java.lang.String (there is a convenience shortcut of S as well). The parameter evalString is by default set to TRUE (see help for .jcall), so you don't get a reference to the new string, but the actual contents instead.

There is a simple function .jstrVal that returns the content of a string reference or calls toString() method and returns the string:

print(s)
[1] "Java-Object: Hello World!"
.jstrVal(s)
[1] "Hello World!"

At the end, let us create some windows and buttons using AWT:

f <- .jnew("java/awt/Frame", "Hello")
b <- .jnew("java/awt/Button", "OK")
.jcall(f, "Ljava/awt/Component;", "add", .jcast(b, "java/awt/Component"))
.jcall(f,, "pack")
.jcall(f,, "setVisible", TRUE)

This should show a simple AWT window with an OK button inside. Note that we need to cast the button b into java/awt/Component, because the method signature needs to be matched precisely (call .jmethods(f, "add") to see the expected type). You can get rid of this window by calling .jcall(f,,"dispose").

Finally a note about the $ convenience operator. It provides an experimental, but simple way of writing code in Java style at the cost of speed. Taking the String example, you can achieve the same with:

s$length()
[1] 12
s$indexOf("World")
[1] 6

You simply use $ instead of the dot (.). This interface uses the Java reflection API to find the correct method, so it is much slower and may not always pick the right one (it works for simple examples but may not for more complex ones). For now its use is discouraged in programs as it may change in the future. However, feel free to test it and report any issues with it.
(*) Note that there is no long or float type in R. Both are converted into numeric in R. In order to pass a numeric as either of the two back to Java, you will need to use the .jfloat or .jlong functions to mark an object as belonging to one of those classes.
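
The JNI type codes in the table above can be assembled in plain R without starting a JVM. This is only a sketch: jni_class_sig and jni_primitive are hypothetical names, not part of rJava.

```r
# Hypothetical helper: turn a dotted Java class name into a JNI object signature.
jni_class_sig <- function(class_name) {
  paste0("L", gsub(".", "/", class_name, fixed = TRUE), ";")
}

# Single-letter codes from the table of JNI types.
jni_primitive <- c(int = "I", double = "D", long = "J", float = "F",
                   void = "V", boolean = "Z", char = "C", byte = "B")

jni_class_sig("java.lang.String")          # the form used as return type in .jcall
paste0("[", jni_primitive[["double"]])     # array of doubles
```

The first call yields the same Ljava/lang/String; string that the concat example passes to .jcall, and the second yields [D.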



2015-1-22 00:36:44
I have been traveling lately and rarely get to visit the forum~