CleanText.Rd
Cleans review text and builds a term matrix using bag of words, TF-IDF, or bigrams.
CleanText(source_dataset, dtm_method, reductionrate)
Argument | Description |
---|---|
source_dataset | A data frame with two columns: the review as text and the label as a binary value. |
dtm_method | 1 for bag of words, 2 for TF-IDF, 3 for bigrams. |
reductionrate | Proportion of the term matrix to keep, usually 0.999 and not less than 0.99. |
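To make the three `dtm_method` options concrete, here is a minimal base-R sketch of what each representation looks like on a toy corpus. It is an illustration only, not the package's implementation: the helper `count_matrix`, the toy reviews, and the particular TF-IDF variant (term frequency normalised by document length, natural-log inverse document frequency) are all assumptions made for this example, and CleanText may tokenise, clean, and weight terms differently.

```r
# Toy corpus; these reviews are made up for illustration.
reviews <- c("good food good service", "bad food", "good place")

# Tokenise: lower-case and split on whitespace.
tokens <- strsplit(tolower(reviews), "\\s+")

# Hypothetical helper: build a document-by-term count matrix
# from a list of character vectors (one vector per document).
count_matrix <- function(token_list) {
  vocab <- sort(unique(unlist(token_list)))
  m <- matrix(0, nrow = length(token_list), ncol = length(vocab),
              dimnames = list(NULL, vocab))
  for (i in seq_along(token_list)) {
    counts <- table(token_list[[i]])
    m[i, names(counts)] <- as.numeric(counts)
  }
  m
}

# Option 1, bag of words: raw term counts per document.
bow <- count_matrix(tokens)

# Option 2, TF-IDF (one common variant, assumed here):
# tf = count / document length, idf = log(number of docs / document frequency).
tf <- bow / rowSums(bow)
idf <- log(nrow(bow) / colSums(bow > 0))
tfidf <- sweep(tf, 2, idf, `*`)

# Option 3, bigrams: count pairs of adjacent tokens instead of single tokens.
bigrams <- lapply(tokens, function(tk) {
  if (length(tk) < 2) return(character(0))
  paste(head(tk, -1), tail(tk, -1))
})
bigram_counts <- count_matrix(bigrams)

print(bow)
print(round(tfidf, 3))
print(bigram_counts)
```

In this sketch, a term that appears in every document would get a TF-IDF weight of zero, which is the usual motivation for choosing TF-IDF over raw counts.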
Data frame "dataset": the term matrix converted to a data frame, plus the target label column.
A clean data frame and a term matrix.
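The sketch below shows one plausible reading of how `reductionrate` and the returned data frame fit together. It assumes the rate acts as a sparsity threshold, similar in spirit to `removeSparseTerms` in the tm package, where a term is kept only if the share of documents that do not contain it is below the threshold. The source does not confirm that CleanText works this way, and the matrix, labels, and the low threshold of 0.6 are invented so the effect is visible on four documents.

```r
# Toy document-by-term count matrix (4 documents, 3 terms); values are made up.
counts <- matrix(c(1, 0, 0, 0,   # "rare": present in 1 of 4 documents
                   1, 1, 1, 0,   # "food": present in 3 of 4 documents
                   0, 1, 0, 1),  # "good": present in 2 of 4 documents
                 nrow = 4,
                 dimnames = list(NULL, c("rare", "food", "good")))
labels <- c(1, 0, 1, 0)          # hypothetical binary target labels

# Assumed semantics: keep terms whose sparsity (share of documents with a
# zero count) is below the threshold. 0.6 is used only because the corpus is tiny;
# the documented values are 0.99 to 0.999.
rate <- 0.6
sparsity <- colMeans(counts == 0)
kept <- counts[, sparsity < rate, drop = FALSE]

# Convert the reduced term matrix to a data frame and append the label.
dataset <- as.data.frame(kept)
dataset$label <- labels

print(sparsity)
print(dataset)
```

Under this reading, a rate close to 1 removes only terms that occur in almost no documents, which would explain the advice to stay at 0.99 or above.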
# NOT RUN {
library("SentiAnalyzer")

# Locate the example data shipped with the package
direction <- system.file(package = "SentiAnalyzer", "extdata/Restaurant_Reviews.tsv")
original_dataset <- read.delim(direction, quote = '', stringsAsFactors = FALSE)

# Bag of words
CleanText(original_dataset, dtm_method = 1, reductionrate = 0.99)
# TF-IDF
CleanText(original_dataset, dtm_method = 2, reductionrate = 0.99)
# Bigrams
CleanText(original_dataset, dtm_method = 3, reductionrate = 0.999)
# }