R-Tips: Q-Tips for R

Distance Between Sounds


library(stringdist)
> stringdist("ham", "jam", method ="soundex")
[1] 1
> stringdist("ham", "banana", method ="soundex")
[1] 1

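This is not very informative. With method = "soundex", stringdist returns 0 when two strings share a Soundex code and 1 otherwise, so "jam" and "banana" look equally far from "ham". Character-wise cosine distance, or some other distance based on Soundex codes, may be better. Better still may be distance matrices based on errors in naive translation.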
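
One possible way to get a graded distance out of Soundex codes, offered as a sketch rather than a recommendation: convert the words to codes with phonetic() from stringdist and count the code positions that differ (Hamming distance). The variable names are illustrative, and Hamming distance is just one choice of metric on the codes.

```r
library(stringdist)

words <- c("jam", "banana")

# Soundex codes have a fixed length (one letter plus three digits),
# so the Hamming distance between two codes is always defined.
ref_code <- phonetic("ham", method = "soundex")
codes    <- phonetic(words, method = "soundex")

# Number of code positions that differ from the code for "ham"
code_dist <- stringdist(ref_code, codes, method = "hamming")
names(code_dist) <- words
code_dist
```

With this measure, "jam" differs from "ham" in fewer code positions than "banana" does, a difference that the plain soundex distance above does not show.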

same same


"1" == 1
[1] TRUE

"01" == 1
[1] FALSE

identical("1", 1)
[1] FALSE

grepl NA


grepl("senate", c(NA, "senate"))
[1] FALSE  TRUE

library(stringr)
str_detect(c(NA, "senate"), fixed("senate", ignore_case=TRUE))
[1]   NA TRUE

vignettes about knitr
When building package vignettes, use

%\VignetteEngine{knitr::knitr}

rather than

%\VignetteEngine{knitr::rmarkdown}

The rmarkdown engine turns every word that starts with http:// into a link. Use knitr::rmarkdown only if you want that to happen.

Why stringsAsFactors?


object.size(as.factor(c(rep("1",1000), rep("2", 1000))))
8512 bytes
object.size(c(rep("1",1000), rep("2", 1000)))
16136 bytes

A factor stores each element as a 4-byte integer code plus one copy of the levels, while a character vector stores an 8-byte pointer per element on a 64-bit build, so the factor takes about half the memory here.

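Rprofile Customizations from Rscript
Rscript normally reads .Rprofile at startup, but not when it is run with --vanilla or --no-init-file, or when the profile is in neither the working directory nor the home directory. To access the customizations in those cases, source .Rprofile within the R script.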
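
A minimal sketch of sourcing the profile from inside a script. The helper name source_profile and the default location (~/.Rprofile) are assumptions; adjust the path if the profile lives elsewhere.

```r
# Hypothetical helper: source a profile file if it exists.
# The default path assumes the profile is in the home directory.
source_profile <- function(path = path.expand("~/.Rprofile")) {
  if (!file.exists(path)) {
    return(invisible(FALSE))   # nothing to source
  }
  source(path, local = FALSE)  # evaluate in the global environment
  invisible(TRUE)
}
```

Calling source_profile() at the top of the script makes the options and functions defined in the profile available to the rest of the script.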

Missing weights, weighted.mean
There are instances where sampling weights are not only unknown but cannot be known (unless one makes certain unsavory assumptions). In those circumstances, weights for certain respondents can be ‘missing’. Typically, the strategy is to code those weights as 0. However, if you retain them as NA, weighted.mean and similar functions tend to return NA, even if you set na.rm=TRUE. In weighted.mean, na.rm removes missing values of x only, not missing weights.

To get non-NA answers, set the NA weights to zero or estimate the mean over respondents with non-missing weights.
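
A small illustration of the behavior and of both workarounds. The values of x and w are made up for the example.

```r
x <- c(10, 20, 30)   # made-up responses
w <- c(1, NA, 2)     # made-up weights; the second one is missing

# na.rm = TRUE removes missing x values only, so the NA weight propagates
with_na <- weighted.mean(x, w, na.rm = TRUE)

# Option 1: treat missing weights as zero
w_zero <- ifelse(is.na(w), 0, w)
mean_zero <- weighted.mean(x, w_zero)

# Option 2: keep only respondents with non-missing weights
keep <- !is.na(w)
mean_subset <- weighted.mean(x[keep], w[keep])
```

Both options give the same estimate, because a zero weight drops a respondent from the numerator and the denominator alike.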
