Clean text and build a term matrix for bag-of-words, TF-IDF, and bigram models.
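This page does not spell out which cleaning steps CleanText applies. For illustration only, a typical review-cleaning pass lowercases the text, strips punctuation and digits, and drops stopwords. The helper `clean_reviews` and its short stopword list are hypothetical, not part of SentiAnalyzer, and stemming is deliberately omitted.

```r
# Hypothetical helper (not SentiAnalyzer code): a minimal base-R cleaning pass.
# The stopword list is a small illustrative assumption; "not" is kept on
# purpose because negations matter for sentiment.
clean_reviews <- function(text,
                          stopwords = c("the", "a", "an", "and", "is", "was", "to", "of")) {
  text <- tolower(text)                      # normalise case
  text <- gsub("[^a-z ]", " ", text)         # replace punctuation and digits with spaces
  tokens <- strsplit(trimws(text), "\\s+")   # split on runs of whitespace
  tokens <- lapply(tokens, function(tok) tok[!(tok %in% stopwords) & nzchar(tok)])
  vapply(tokens, paste, character(1), collapse = " ")  # rejoin each review
}

clean_reviews(c("Wow... Loved this place!", "Crust is not good."))
# [1] "wow loved this place" "crust not good"
```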

CleanText(source_dataset, dtm_method, reductionrate)

Arguments

source_dataset	A data frame with two columns: the review text and a binary label.
dtm_method	1 for bag of words, 2 for TF-IDF, 3 for bigram.
reductionrate	The fraction of the term matrix to keep, given as a proportion rather than a percentage; usually 0.999 and not less than 0.99.
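A rough sketch of what the three dtm_method options compute, written in base R so it runs without extra packages. The helpers `tokenize`, `bigrams`, `build_dtm`, and `reduce_sparse` are hypothetical. The TF-IDF weighting shown is one common variant (length-normalised term frequency times natural-log IDF) and may differ from the one CleanText uses. Treating reductionrate as a maximum allowed sparsity, in the manner of tm::removeSparseTerms, is also an assumption; it is consistent with higher values keeping more terms.

```r
# Hypothetical helpers (not SentiAnalyzer code): base-R sketch of the three
# dtm_method options and a sparsity filter for reductionrate.

# Split an already-cleaned review into word tokens
tokenize <- function(doc) {
  tok <- strsplit(doc, "\\s+")[[1]]
  tok[nzchar(tok)]
}

# Join adjacent tokens into bigrams, e.g. "not good" -> "not_good"
bigrams <- function(tok) {
  if (length(tok) < 2) return(character(0))
  paste(head(tok, -1), tail(tok, -1), sep = "_")
}

build_dtm <- function(docs, dtm_method = 1) {
  terms <- lapply(docs, tokenize)
  if (dtm_method == 3) terms <- lapply(terms, bigrams)  # 3: bigram terms
  vocab <- sort(unique(unlist(terms)))
  # Raw counts: one row per document, one column per term
  counts <- do.call(rbind, lapply(terms, function(tok)
    as.vector(table(factor(tok, levels = vocab)))))
  colnames(counts) <- vocab
  if (dtm_method == 2) {                                # 2: TF-IDF weighting
    tf <- counts / pmax(rowSums(counts), 1)             # term frequency per document
    idf <- log(nrow(counts) / colSums(counts > 0))      # inverse document frequency
    return(sweep(tf, 2, idf, `*`))
  }
  counts                                                # 1 and 3: raw counts
}

# Keep only terms whose sparsity (share of documents lacking the term) is
# below reductionrate; modelled on tm::removeSparseTerms (an assumption)
reduce_sparse <- function(dtm, reductionrate = 0.999) {
  sparsity <- colMeans(dtm == 0)
  dtm[, sparsity < reductionrate, drop = FALSE]
}

docs <- c("loved this place", "crust not good", "loved the crust")
bow <- build_dtm(docs, dtm_method = 1)
colnames(reduce_sparse(bow, reductionrate = 0.5))
# [1] "crust" "loved"
```

With reductionrate close to 1, only terms that are almost never used are dropped, which is why values such as 0.999 keep most of the vocabulary.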

Value

A data frame, "dataset": the term matrix converted to a data frame, with the target label added as a column.

A cleaned data frame derived from the term matrix.
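The Value description amounts to binding the label onto the term matrix. A minimal sketch, assuming a hypothetical helper `dtm_to_dataset` and a label column named `label`; converting the binary label to a factor is a design choice for classification models, not something this page confirms.

```r
# Hypothetical helper (not SentiAnalyzer code): term matrix -> modelling data frame
dtm_to_dataset <- function(dtm, label) {
  dataset <- as.data.frame(dtm)   # one column per term
  dataset$label <- factor(label)  # binary target stored as a factor (assumption)
  dataset
}

m <- matrix(c(1, 0, 2, 1), nrow = 2, dimnames = list(NULL, c("good", "bad")))
dataset <- dtm_to_dataset(m, c(1, 0))
names(dataset)
# [1] "good"  "bad"   "label"
```

If the vocabulary itself contains the word "label", the term column would be overwritten, so a real implementation may need a name that cannot collide with a term.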

Examples

# NOT RUN {
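library("SentiAnalyzer")
# Locate the sample restaurant-review file shipped with the package
direction <- system.file("extdata/Restaurant_Reviews.tsv", package = "SentiAnalyzer")
# quote = "" keeps apostrophes in reviews as literal text
original_dataset <- read.delim(direction, quote = "", stringsAsFactors = FALSE)
# dtm_method: 1 = bag of words, 2 = TF-IDF, 3 = bigram
CleanText(original_dataset, dtm_method = 1, reductionrate = 0.99)
CleanText(original_dataset, dtm_method = 2, reductionrate = 0.99)
CleanText(original_dataset, dtm_method = 3, reductionrate = 0.999)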
# }
