20.14 Remove Own Stop Words

# Remove our own stop words from each document, then inspect one
# document to check the result.
docs <- tm_map(docs, removeWords, c("department", "email"))
viewDocs(docs, 16)
## hybrid weighted random forests 
## classifying  highdimensional data
## baoxun xu  joshua zhexue huang  graham williams 
## yunming ye
## 
## 
##   computer science harbin institute  technology shenzhen graduate
## school shenzhen  china
## 
## shenzhen institutes  advanced technology chinese academy  sciences shenzhen
##  china
##  amusing gmailcom
## random forests   popular classification method based   ensemble  
## single type  decision trees  subspaces  data   literature 
##  many different types  decision tree algorithms including c cart 
## chaid  type  decision tree algorithm may capture different information
##  structure  paper proposes  hybrid weighted random forest algorithm
## simultaneously using  feature weighting method   hybrid forest method 
## classify  high dimensional data  hybrid weighted random forest algorithm
## can effectively reduce subspace size  improve classification performance
## without increasing  error bound  conduct  series  experiments  eight
## high dimensional datasets  compare  method  traditional random forest
## methods   classification methods  results show   method
## consistently outperforms  traditional methods
## keywords random forests hybrid weighted random forest classification decision tree
## 
## ....
## 

Previously we used the English stopwords provided by the tm package. We could instead, or in addition, remove our own stop words, as we have done above. Here we have chosen just two words, department and email, simply for illustration: notice that both have disappeared from the affiliation and contact lines of the document. The choice of stop words will depend on the domain of discourse, and suitable candidates might not become apparent until we’ve done some analysis.
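One way to surface candidate stop words is to list the most frequent terms across the corpus. Below is a minimal sketch using the tm package, assuming docs is the corpus built earlier in this chapter; the threshold of 100 is an arbitrary illustrative value rather than a recommendation.

library(tm)

# Build a term-document matrix and list terms occurring at least 100
# times across the corpus. Domain-specific terms that dominate this
# listing are candidates for removal with removeWords() as above.
tdm <- TermDocumentMatrix(docs)
findFreqTerms(tdm, lowfreq=100)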


