Clean text and build a term matrix using bag of words, TF-IDF, or bigrams.

CleanText(source_dataset, dtm_method, reductionrate)

Arguments

source_dataset

A data frame with two columns: the review as text and the label as a binary value.

dtm_method

1 for bag of words, 2 for TF-IDF, 3 for bigrams.

reductionrate

The proportion of the term matrix to keep, usually 0.999 and not less than 0.99.
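
The three dtm_method options and the effect of reductionrate can be illustrated with a minimal base-R sketch on a toy corpus. This is not the package's implementation: the helpers count_matrix and keep_dense_terms are hypothetical, the TF-IDF formula is one common variant, and treating reductionrate as a maximum allowed sparsity (in the manner of the sparse argument of tm::removeSparseTerms) is an assumption.

```r
# Toy corpus; real input would be the review column of source_dataset.
docs <- c("good food good service", "bad food", "good place")
tokens <- strsplit(docs, "\\s+")

# Hypothetical helper: per-document term counts as a dense matrix
# (documents in rows, terms in columns).
count_matrix <- function(token_list) {
  vocab <- sort(unique(unlist(token_list)))
  m <- do.call(rbind, lapply(token_list, function(tk) {
    as.numeric(table(factor(tk, levels = vocab)))
  }))
  colnames(m) <- vocab
  m
}

# Method 1: bag of words (raw counts).
bow <- count_matrix(tokens)

# Method 2: TF-IDF (term frequency times log inverse document frequency).
tf <- bow / rowSums(bow)
idf <- log(nrow(bow) / colSums(bow > 0))
tfidf <- sweep(tf, 2, idf, "*")

# Method 3: bigrams (pairs of adjacent tokens counted as single terms).
bigrams <- lapply(tokens, function(tk) {
  if (length(tk) < 2) character(0) else paste(head(tk, -1), tail(tk, -1))
})
bigram <- count_matrix(bigrams)

# Assumed meaning of reductionrate: keep only terms whose share of
# documents with a zero count is below the threshold.
keep_dense_terms <- function(m, sparse) {
  sparsity <- colMeans(m == 0)
  m[, sparsity < sparse, drop = FALSE]
}
reduced <- keep_dense_terms(bow, 0.5)
```

Under this reading, a value close to 1 such as 0.999 removes only the rarest terms, which would be consistent with the advice not to go below 0.99.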

Value

A data frame, "dataset": the term matrix converted to a data frame, plus the target label.

A clean data frame and a term matrix.
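
As a rough illustration of the returned shape, the sketch below converts a toy term matrix to a data frame and appends a target column. The matrix values, the labels, and the column name label are all assumptions made for illustration; the package's actual column names may differ.

```r
# Toy 3 x 2 term matrix (documents in rows, terms in columns).
term_matrix <- matrix(c(1, 0, 2,
                        0, 1, 1),
                      nrow = 3,
                      dimnames = list(NULL, c("food", "good")))
labels <- c(1L, 0L, 1L)  # hypothetical binary target, one per document

# One column per term, then the target appended as the last column.
dataset <- as.data.frame(term_matrix)
dataset$label <- labels  # the name "label" is an assumption
```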

Examples

# NOT RUN {
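library("SentiAnalyzer")
# Locate the sample restaurant reviews shipped with the package.
direction <- system.file(package = "SentiAnalyzer", "extdata/Restaurant_Reviews.tsv")
# quote = "" disables quote handling so apostrophes in reviews are read literally.
original_dataset <- read.delim(direction, quote = "", stringsAsFactors = FALSE)
# Build the term matrix with each method: 1 = bag of words, 2 = TF-IDF, 3 = bigrams.
CleanText(original_dataset, dtm_method = 1, reductionrate = 0.99)
CleanText(original_dataset, dtm_method = 2, reductionrate = 0.99)
CleanText(original_dataset, dtm_method = 3, reductionrate = 0.999)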
# }
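
For data other than the bundled file, source_dataset needs the two-column layout described under Arguments. The sketch below builds and checks such a data frame in base R. The column names Review and Liked and the sample rows are assumptions for illustration, and whether CleanText accepts so small an input has not been verified.

```r
# Hypothetical input: text in the first column, binary label in the second.
reviews <- data.frame(
  Review = c("Loved this place", "Crust is not good", "Great service"),
  Liked = c(1L, 0L, 1L),
  stringsAsFactors = FALSE
)

# Basic shape checks before handing the data to a cleaning function.
stopifnot(
  ncol(reviews) == 2,
  is.character(reviews[[1]]),
  all(reviews[[2]] %in% c(0, 1))
)
```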