Distance Between Sounds
library(stringdist)
> stringdist("ham", "jam", method ="soundex")
[1] 1
> stringdist("ham", "banana", method ="soundex")
[1] 1
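This is pretty bad: the soundex method only reports whether the two Soundex codes match (0) or differ (1), so "ham" is as far from "jam" as it is from "banana". Character-wise cosine distance or some other distance computed on the Soundex codes may be better. Better still may be distance matrices based on errors in naive translation.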
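As a rough sketch of the "distance on Soundex codes" idea, one could first extract the codes with phonetic() from the same package and then compare the codes character by character. The choice of Hamming and unigram cosine distance here is an assumption made for illustration, not a recommendation from the original note.

```r
library(stringdist)

# Soundex codes for the three words
codes <- phonetic(c("ham", "jam", "banana"), method = "soundex")
codes
# "H500" "J500" "B550"

# Hamming distance between codes: number of positions that differ
d_jam    <- stringdist(codes[1], codes[2], method = "hamming")  # 1
d_banana <- stringdist(codes[1], codes[3], method = "hamming")  # 2

# Character-wise (unigram) cosine distance between codes
c_jam    <- stringdist(codes[1], codes[2], method = "cosine", q = 1)
c_banana <- stringdist(codes[1], codes[3], method = "cosine", q = 1)
```

With the codes compared this way, "ham" ends up closer to "jam" than to "banana", which the 0/1 soundex method cannot express.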
same same
"1" == 1
[1] TRUE
"01" == 1
[1] FALSE
identical("1", 1)
[1] FALSE
grepl NA
grepl("senate", c(NA, "senate"))
[1] FALSE TRUE
library(stringr)
str_detect(c(NA, "senate"), fixed("senate", ignore_case = TRUE))
[1] NA TRUE
vignettes about knitr
When building package vignettes, use
%\VignetteEngine{knitr::knitr}
rather than
%\VignetteEngine{knitr::rmarkdown}
The rmarkdown engine turns every word containing http:// into a link, so use it only if you want that to happen.
Why stringsAsFactors?
object.size(as.factor(c(rep("1",1000), rep("2", 1000))))
8512 bytes
object.size(c(rep("1",1000), rep("2", 1000)))
16136 bytes
.Rprofile Customizations from Rscript
To access the customizations in your .Rprofile when running a script with Rscript, source .Rprofile within the R script.
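A minimal sketch of what that could look like follows. The helper name load_profile and the default path ~/.Rprofile are assumptions for illustration; point it at wherever your profile actually lives.

```r
# Hypothetical helper: source a profile file into the global
# environment if it exists, and report whether it was loaded.
load_profile <- function(path = "~/.Rprofile") {
  path <- path.expand(path)
  if (!file.exists(path)) {
    return(invisible(FALSE))
  }
  source(path, local = FALSE)
  invisible(TRUE)
}
```

Calling load_profile() at the top of the script then makes the options and functions defined in the profile available to the rest of the script.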
Missing weights, weighted.mean
There are instances where sampling weights are not only unknown but cannot be known (unless one makes certain unsavory assumptions). In those cases, weights for some respondents can be 'missing'. The usual strategy is to code those weights as 0. If you keep them as NA, however, weighted.mean returns NA even if you set na.rm=TRUE, because na.rm only drops missing values of the data, not of the weights.
To get non-NA answers, set the NA weights to zero or estimate the mean over respondents with non-missing weights.
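The toy example below illustrates both workarounds. The values in x and w are made up for the illustration.

```r
x <- c(10, 20, 30)   # responses (made-up values)
w <- c(1, NA, 3)     # sampling weights, one of them missing

# na.rm = TRUE does not help: the NA is in the weights, not in x
res_na <- weighted.mean(x, w, na.rm = TRUE)      # NA

# Option 1: code the missing weights as zero
w_zero   <- ifelse(is.na(w), 0, w)
res_zero <- weighted.mean(x, w_zero)             # 25

# Option 2: keep only respondents with non-missing weights
keep     <- !is.na(w)
res_keep <- weighted.mean(x[keep], w[keep])      # 25
```

Both options give the same answer here, since a respondent with zero weight contributes nothing to either the numerator or the denominator.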