Doug Biber; Jesse Egbert; Mark Davies
Panel: A linguistic taxonomy of registers on the searchable web: Distribution, linguistic descriptions, and automatic register identification
Abstract book, pp. 52-54
Oral-literate and narrative dimensions recur in all multi-dimensional analyses (MDA), across languages and registers
3 dimensions here
Pronouns & questions, verbs, dependent clauses crucial in interactivity
These analyses show that there are major linguistic differences among the eight major user-defined register categories.
Can we automatically identify web registers?
Starting point: 150+ linguistic features as predictors
90% of the corpus was used for training and 10% for testing
Each document was assigned to a single category
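A minimal sketch of a 90/10 split over single-label documents. The talk did not say how the split was drawn; the random shuffle, the seed, the function name split_corpus and the toy register labels are all assumptions for illustration.

```python
import random

def split_corpus(documents, test_fraction=0.10, seed=0):
    """Shuffle the documents and hold out test_fraction of them for testing."""
    shuffled = list(documents)
    random.Random(seed).shuffle(shuffled)  # fixed seed: reproducible split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical toy corpus: each document carries exactly one register label.
corpus = [(f"doc{i}", "narrative" if i % 2 else "opinion") for i in range(100)]
train, test = split_corpus(corpus)
print(len(train), len(test))
```

With real data a split stratified by register may be preferable, so that small categories still appear in the test set.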
Stepwise discriminant analysis to select the strongest predictive features
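A rough sketch of the idea behind stepwise selection, not the procedure used in the talk. Stepwise discriminant analysis commonly enters features by a statistical criterion such as Wilks' lambda; here a simpler stand-in is swapped in: greedy forward selection scored by the training accuracy of a nearest-centroid classifier. All function names and the toy data are hypothetical.

```python
def centroids(rows, labels, features):
    """Mean of each selected feature, per class."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += row[f]
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}

def predict(row, cents, features):
    """Assign the class with the nearest centroid (squared Euclidean)."""
    def dist(c):
        return sum((row[f] - m) ** 2 for f, m in zip(features, cents[c]))
    return min(sorted(cents), key=dist)

def accuracy(rows, labels, features):
    """Share of training rows classified correctly with these features."""
    cents = centroids(rows, labels, features)
    hits = sum(predict(r, cents, features) == y for r, y in zip(rows, labels))
    return hits / len(rows)

def forward_select(rows, labels, n_features, max_features):
    """Greedily add the feature that most improves training accuracy."""
    selected, best = [], 0.0
    while len(selected) < max_features:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        # Ties are broken in favour of the lowest feature index.
        score, neg_f = max((accuracy(rows, labels, selected + [f]), -f)
                           for f in candidates)
        if score <= best:  # stop once no candidate helps
            break
        selected.append(-neg_f)
        best = score
    return selected, best

# Toy data (assumed): feature 0 separates the classes, 1 and 2 do not.
rows = [[1.0, 5.0, 0.2], [1.2, 5.0, 0.9], [0.9, 5.0, 0.5],
        [3.0, 5.0, 0.3], [3.1, 5.0, 0.8], [2.9, 5.0, 0.4]]
labels = ["spoken"] * 3 + ["written"] * 3
selected, best = forward_select(rows, labels, n_features=3, max_features=2)
print(selected, best)
```

Scoring on the training rows keeps the sketch short; in practice the selection score would be computed on held-out data to avoid overfitting.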
10-feature model: 0.34 precision
44-feature model: 0.44 precision
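For reference, a small sketch of how precision can be computed for a multi-class register classifier. The notes do not record whether the reported figures were averaged per class or pooled; macro-averaging is an assumption here, and the labels and predictions are invented toy values.

```python
def precision_per_class(gold, predicted):
    """Precision per class: correct assignments / all assignments to it."""
    result = {}
    for c in sorted(set(gold) | set(predicted)):
        assigned = [g for g, p in zip(gold, predicted) if p == c]
        # A class that is never predicted gets precision 0.0 by convention.
        result[c] = sum(g == c for g in assigned) / len(assigned) if assigned else 0.0
    return result

def macro_precision(gold, predicted):
    """Unweighted mean of the per-class precision values."""
    scores = precision_per_class(gold, predicted)
    return sum(scores.values()) / len(scores)

# Hypothetical gold labels and classifier output for six documents.
gold = ["narrative", "narrative", "opinion", "opinion", "discussion", "discussion"]
pred = ["narrative", "opinion", "opinion", "opinion", "narrative", "discussion"]
per_class = precision_per_class(gold, pred)
macro = macro_precision(gold, pred)
print(per_class, round(macro, 3))
```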