R-Tips: Q-Tips for R

Distance Between Sounds

> stringdist("ham", "jam", method = "soundex")
[1] 1
> stringdist("ham", "banana", method = "soundex")
[1] 1

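This is pretty bad. With method = "soundex", stringdist returns 0 when the two Soundex codes match and 1 otherwise, so "jam" and "banana" come out equally far from "ham". Character-wise cosine distance, or some other distance computed on the Soundex codes, may be better. Better still may be distance matrices based on errors in naive translation.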
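One way to get a graded distance from Soundex codes is sketched below. It is an illustration, not a recommendation: it takes the codes from stringdist's phonetic() and then computes Levenshtein distance between them. The word list and the choice of Levenshtein are assumptions made for the example.

```r
library(stringdist)

words <- c("ham", "jam", "banana")

# Soundex code for each word
codes <- phonetic(words, method = "soundex")

# Levenshtein distance between the codes rather than a 0/1 match on them
code_dist <- stringdistmatrix(codes, codes, method = "lv")
dimnames(code_dist) <- list(words, words)

code_dist
```

Any other string distance supported by stringdist (for instance method = "cosine" with a suitable q) can be swapped in for "lv".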

same same

"1" == 1
[1] TRUE

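"01" == 1
[1] FALSE

identical("1", 1)
[1] FALSE

== coerces the number to a character string before comparing, so "1" == 1 is TRUE while "01" == 1 is FALSE. identical() does no coercion.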

grepl NA

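grepl("senate", c(NA, "senate"))
[1] FALSE  TRUE

str_detect(c(NA, "senate"), fixed("senate", ignore_case = TRUE))
[1]   NA TRUE

grepl treats NA as a non-match and returns FALSE; str_detect (from stringr) returns NA.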
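If you want base R to behave like str_detect here, one option is to put the NAs back after matching. The helper name grepl_na below is made up for this sketch.

```r
# Hypothetical helper: grepl() that returns NA for NA inputs
grepl_na <- function(pattern, x, ...) {
  hit <- grepl(pattern, x, ...)
  hit[is.na(x)] <- NA   # grepl() reports FALSE for NA; restore the NA
  hit
}

x <- c(NA, "senate", "house")
hit <- grepl_na("senate", x, fixed = TRUE)
hit
```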

vignettes about knitr
When building package vignettes, use


rather than


rmarkdown turns every word that starts with http:// into a link, so use it only if you want that to happen.

Why stringsAsFactors?

object.size(as.factor(c(rep("1",1000), rep("2", 1000))))
8512 bytes
object.size(c(rep("1",1000), rep("2", 1000)))
16136 bytes
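The gap in size comes from how the two are stored. A factor keeps one integer code per element and stores each distinct level only once, while a character vector keeps one pointer per element. A small check, using the same vectors as above (the variable names are just for illustration):

```r
x_chr <- c(rep("1", 1000), rep("2", 1000))
x_fct <- as.factor(x_chr)

typeof(x_chr)   # storage type of the character vector
typeof(x_fct)   # factors are stored as integer codes
levels(x_fct)   # the distinct strings, stored once

# The factor takes less memory than the character vector
object.size(x_fct) < object.size(x_chr)
```

Exact byte counts depend on the R version and platform, so the comparison is more portable than the numbers.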

.Rprofile Customizations from Rscript
If a script run with Rscript does not pick up the customizations in .Rprofile (for example, when it is started with --vanilla or --no-init-file), source .Rprofile within the R script.
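A minimal sketch of that, assuming the profile lives at ~/.Rprofile (it may instead sit in the project directory). The function name source_rprofile is invented for this example.

```r
# Hypothetical helper: source a profile file if it exists
source_rprofile <- function(path = "~/.Rprofile") {
  path <- path.expand(path)
  if (!file.exists(path)) {
    return(invisible(FALSE))   # nothing to load
  }
  source(path, local = FALSE)  # evaluate in the global environment
  invisible(TRUE)
}

# Put this at the top of the script run with Rscript
source_rprofile()
```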

Missing weights, weighted.mean
There are instances where sampling weights are not only unknown but, in fact, cannot be known (unless one makes certain unsavory assumptions). Under those circumstances, weights for certain respondents can be ‘missing’. Typically, the strategy is to code those weights as 0. However, if you retain them as NA, weighted.mean and similar functions return NA even if you set na.rm = TRUE, because na.rm drops only the missing values of x, not missing weights.

To get non-NA answers, set the NA weights to zero or estimate the mean over respondents with non-missing weights.
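Both fixes are shown below on made-up data. The values in x and w are invented for the example.

```r
x <- c(10, 20, 30, 40)   # responses (toy data)
w <- c(1, 2, NA, 1)      # weights, one of them missing

# na.rm = TRUE does not help when the NA is in the weights
wm_na <- weighted.mean(x, w, na.rm = TRUE)

# Option 1: recode missing weights as zero
w_zero <- ifelse(is.na(w), 0, w)
wm_zero <- weighted.mean(x, w_zero)

# Option 2: keep only respondents with non-missing weights
keep <- !is.na(w)
wm_keep <- weighted.mean(x[keep], w[keep])

c(wm_na, wm_zero, wm_keep)
```

The two options give the same answer here, since a zero weight removes the respondent from both the numerator and the denominator.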
