venerdì 27 aprile 2007

What's the best way to upgrade R on Windows©?

Copied and pasted from R for Windows FAQ - Version for R-2.5.0 - FAQ 2.8:

That's a matter of taste. For most people the best thing to do is to uninstall R (see the previous Q), install the new version, copy any installed packages to the library folder in the new installation, run update.packages() in the new R (`Update packages...' from the Packages menu, if you prefer) and then delete anything left of the old installation. Different versions of R are quite deliberately installed in parallel folders so you can keep old versions around if you wish.

Upgrading from R 1.x.y to R 2.x.y is special as all the packages need to be reinstalled. Rather than copy them across, make a note of their names and re-install them from CRAN.

giovedì 26 aprile 2007

How to Superimpose Histograms

Function inspired by the code of Martin Maechler found on the R-List at http://tolstoy.newcastle.edu.au/R/help/06/06/30059.html
superhist2pdf <- function(x, filename = "super_histograms.pdf",
dev = "pdf", title = "Superimposed Histograms", nbreaks ="Sturges") {
junk = NULL
grouping = NULL
for(i in 1:length(x)) {
junk = c(junk,x[[i]])
grouping <- c(grouping, rep(i,length(x[[i]]))) }
grouping <- factor(grouping)
n.gr <- length(table(grouping))
xr <- range(junk)
histL <- tapply(junk, grouping, hist, breaks=nbreaks, plot = FALSE)
maxC <- max(sapply(lapply(histL, "[[", "counts"), max))
if(dev == "pdf") { pdf(filename, version = "1.4") } else{}
if((TC <- transparent.cols <- .Device %in% c("pdf", "png"))) {
cols <- hcl(h = seq(30, by=360 / n.gr, length = n.gr), l = 65, alpha = 0.5) }
else {
h.den <- c(10, 15, 20)
h.ang <- c(45, 15, -30) }
if(TC) {
plot(histL[[1]], xlim = xr, ylim= c(0, maxC), col = cols[1], xlab = "x", main = title) }
else { plot(histL[[1]], xlim = xr, ylim= c(0, maxC), density = h.den[1], angle = h.ang[1], xlab = "x") }
if(!transparent.cols) {
for(j in 2:n.gr) plot(histL[[j]], add = TRUE, density = h.den[j], angle = h.ang[j]) } else {
for(j in 2:n.gr) plot(histL[[j]], add = TRUE, col = cols[j]) }
invisible()
if( dev == "pdf") {
dev.off() }
}
# How to use the function:
d1 = rnorm(1:100)
d2 = rnorm(1:100) + 4
# the input object MUST be a list!
l1 = list(d1,d2)
superhist2pdf(l1, nbreaks="Sturges")

martedì 24 aprile 2007

Plotting multiple smooth lines on the same graph

d1 <- cbind(rnorm(100), rnorm(100,3,1))
d2 <- cbind(rnorm(100), rnorm(100,1,1))
plot(d1[,1], d1[,2], xlim=range(c(d1[,1], d2[,1])), ylim=range(c(d1[,2], d2[,2])), col="blue", xlab="X", ylab="Y")
points(d2[,1], d2[,2], col="red")
points(loess.smooth(d1[,1], d1[,2]), type="l", col="blue")
points(loess.smooth(d2[,1], d2[,2]), type="l", col="red")


venerdì 20 aprile 2007

How to join two arrays using their column names intersection

ar1 <- array(data = c(1:16), dim = c(4,4))
ar2 <- array(data = c(1:16), dim = c(4,4))
colnames(ar1) <- c("A","B","D","E")
colnames(ar2) <- c("C","A","E","B")
# get the common names between the matrices
same <- intersect(colnames(ar1), colnames(ar2))
# join them together
rbind(ar1[,same], ar2[,same])

giovedì 19 aprile 2007

Highlight overlapping area between two curves

This post was updated thanks to Blaž Triglav contribution.
p <- seq(0.2,1.4,0.01)
x1 <- dnorm(p, 0.70, 0.12)
x2 <- dnorm(p, 0.90, 0.12) 
plot(range(p), range(x1,x2), type = "n") 
lines(p, x1, col = "red", lwd = 4, lty = 2) 
lines(p, x2, col = "blue", lwd = 4)
polygon(c(p,p[1]), c(pmin(x1,x2),0), col = "grey")

Below an advanced example (Thanks to Blaž):
p <- seq(54,71,0.1)
x1 <- dnorm(p, 60, 1.5)
x2 <- dnorm(p, 65, 1.5)
par(mar=c(5.5, 4, 0.5, 0.5))
plot(range(p), range(x1,x2), type = "n", xlab="", ylab="", axes=FALSE)
box()
mtext(expression(bar(y)), side=1, line=1, adj=1.0, cex=2, col="black")
mtext("f"~(bar(y)), at=0.265, side=2, line=0.5, adj=1.0, las=2.0, cex=2, col="black")
axis(1, at=60, lab=mu[H[0]]~"=60", cex.axis=1.5, pos=-0.0165, tck=0.02)
axis(1, at=65, lab=mu[H[1]]~"=65", cex.axis=1.5, pos=-0.0165, tck=0.02)
axis(1, at=63.489, lab=y[k][","][zg]~"=63,489", pos=-0.035, tck=0.2)
axis(1, at=63.489, lab="", pos=-0.035, tck=-0.02)
lines(p, x1, col = "black")
lines(p, x2, col = "black")
line3 <- 63.5
polygon(c(line3, p[p<=line3], line3), c(0,x2[p<=line3],0), col = "grey23")
lines(p, x1, col = "black")
line <- 63.489
line1 <- 60
line2 <- 65
abline(v=line, col="black", lwd=1.5, lty = "dashed")
abline(v=line1, col="black", lwd=0.3, lty = "dashed")
abline(v=line2, col="black", lwd=0.3, lty = "dashed")
xt3 <- c(line,p[p>line],line)
yt3 <- c(0,x1[p>line],0)
polygon(xt3, yt3, col="grey")
legend("topright", inset=.05, c(expression(alpha), expression(beta)), fill=c("grey", "grey23"), cex=2)

mercoledì 18 aprile 2007

It's not R but It's VERY useful

On the chance that you are talking about replacing a word, say, 500 times in a large text file, you might be interested in sed.

Example (you must be in a unix environment with GNU free software):

sed 's/old_word/new_word/g' text_file > new_text_file

After which, new_text_file will contain the contents of text_file, except with the new_word values in place of the old_word values. For those interested in more detailed information, as usual, man (in this case man sed) is your friend.

martedì 17 aprile 2007

From Similarity To Distance matrix


# This function returns an object of class "dist"
sim2dist <- function(mx) as.dist(sqrt(outer(diag(mx), diag(mx), "+") - 2*mx)) # from similarity to distance matrix d.mx = as.matrix(d.mx) d.mx = sim2dist(d.mx) # The distance matrix can be used to visualize # hierarchical clustering results as dendrograms hc = hclust(d.mx) plot(hc)


See Multivariate Analysis (Probability and Mathematical Statistics) for the statistical theory.