Convert various text file formats in the OS X Terminal with textutil


Original post here. Copyright MacIssues.

There are several ways to convert a text document to another format. The simplest is to open it in a text editor such as TextEdit and choose Save As from the File menu to export it. TextEdit can save the current file as Word, Rich Text, Plain Text, or OpenDocument Text, among other formats. If you are a Terminal user, however, you might enjoy knowing you can do the same right from the command line.

One command-line tool Apple includes in OS X is “textutil”, which can perform a number of manipulations on supported text documents, one of which is converting a target document to a specified format:

Open the Terminal
Type the following command, replacing FORMAT with one of txt, html, rtf, rtfd, doc, docx, wordml, odt, or webarchive to specify the desired format:
textutil -convert FORMAT
Ensure there is a space after the format specification, and then drag your target document to the Terminal window, so the command looks something like this (in this case, converting a webarchive called “mypage” to docx):
textutil -convert docx ~/Desktop/mypage.webarchive

When this command is executed, the resulting file will appear in the same folder as the original.
While using the Terminal for this might seem unnecessary given that various word processing programs can convert files, it is useful when managing text documents in Automator workflows, shell scripts, or AppleScripts where conversion of these documents might be desired.
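For use in a shell script, the conversion could be wrapped in a small function. The following is only a sketch: the function names converted_path and convert_doc are made up for illustration, and the output file is named explicitly with textutil's -output option rather than relying on the default naming.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; only "textutil" itself ships with OS X.

converted_path() {
    # $1 = source document, $2 = new extension (e.g. docx)
    # Strips the last extension from the path and appends the new one.
    printf '%s.%s\n' "${1%.*}" "$2"
}

convert_doc() {
    # $1 = source document, $2 = target format (e.g. docx)
    src=$1
    fmt=$2
    if ! command -v textutil >/dev/null 2>&1; then
        echo "textutil not found (it is included with OS X)" >&2
        return 1
    fi
    out=$(converted_path "$src" "$fmt")
    # -output names the result explicitly; print the path on success.
    textutil -convert "$fmt" -output "$out" "$src" && printf '%s\n' "$out"
}

# Example: convert_doc "$HOME/Desktop/mypage.webarchive" docx
```

Quoting "$src" and "$out" keeps the function safe for file names that contain spaces, which are common for documents dragged from the Finder.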

One prime use for this routine is batch-converting files. If you have a folder of txt documents you would like to convert to docx, you can use shell wildcards with this command to target all, or at least a group, of the desired files in the folder:

textutil -convert docx ~/Desktop/TextDocuments/*.txt
In the above command, any .txt documents in the folder called “TextDocuments” on the current user’s desktop will be converted to docx format.
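If you would rather handle files one at a time, for example to report which ones fail, the same batch conversion can be sketched as a loop. The function name batch_convert and the DRY_RUN switch are assumptions made for this example, not part of textutil; with DRY_RUN set, the function only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical sketch: convert every .txt file in a folder to docx.
# DRY_RUN is a switch invented for this example, not a textutil option.

batch_convert() {
    # $1 = folder containing the .txt documents
    dir=$1
    for src in "$dir"/*.txt; do
        # When nothing matches, the pattern stays unexpanded; skip it.
        [ -e "$src" ] || continue
        if [ -n "${DRY_RUN:-}" ]; then
            echo "textutil -convert docx $src"
        else
            textutil -convert docx "$src" || echo "failed: $src" >&2
        fi
    done
}

# Example: DRY_RUN=1 batch_convert "$HOME/Desktop/TextDocuments"
```

Running it with DRY_RUN=1 first is a cheap way to check that the wildcard picks up the files you expect before anything is converted.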

The textutil command can be used for these format conversion routines, but it also supports a number of other features, such as specifying encoding, changing font sizes and typefaces, and modifying file metadata.
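As a rough illustration of those extra features, the sketch below combines -convert with the -encoding, -font, -fontsize, -title, and -author options from the textutil manual page. The wrapper functions, the DRY_RUN switch, and the font and encoding values are assumptions chosen for the example.

```shell
#!/bin/sh
# Hypothetical wrappers; DRY_RUN is invented for this sketch.

run() {
    # Print the command when DRY_RUN is set, otherwise execute it.
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

styled_rtf() {
    # Plain text to RTF with a chosen typeface and size; $1 = .txt file
    run textutil -convert rtf -font Helvetica -fontsize 14 "$1"
}

utf16_text() {
    # Document to plain text written in UTF-16; $1 = source file
    run textutil -convert txt -encoding UTF-16 "$1"
}

titled_html() {
    # Document to HTML with title and author metadata
    # $1 = source file, $2 = title, $3 = author
    run textutil -convert html -title "$2" -author "$3" "$1"
}

# Example: DRY_RUN=1 styled_rtf "$HOME/Desktop/notes.txt"
```

The font and size options only matter when going from plain text to a rich format, so styled_rtf is meant for .txt input.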

 

My Mac posts.

Corpus linguistics in the South 11, U. Sussex

 


Freeman Centre, University of Sussex, 27 February 2016

Some of the presentations

____________________

Lee Oakley, University of Birmingham
Some challenges when analysing a Census Corpus

The SexEd Corpus: a census corpus 1950-2014
93,202 words
11-16 year olds
Teenage readership
How are different sexualities presented to British teenagers?

Methodological approach to more qualitative analyses
All analysis is comparison

____________________

Jill Bowie & Sean Wallis, UCL

Investigating changes in structures and collocations, from a treebank to a megacorpus

Corpus: COHA (Davies 2012)

The to-infinitival perfect

80% decrease in use since 1820

402 verb lemmas in order of frequency

Top 30 collocates account for 95% of tokens (top 95% percentile)

Seem, Appear, Say, Ought, Be, Report, Claim

Seeming group

Cognition group

Cognition and saying group

Modality group

Grammatical change tends to be lexically constrained

Benefits of using dual corpora (ICE-GB + COHA)

We need open data to do more with the corpus data

____________________

Taming the beast: getting to grips with a mega corpus.

Chris Turner, Coventry

Oxford corpus of English

some / any

Corpus of law reports

____________________

Frequency and grammaticalization in a spoken corpus of Cameroon Pidgin English

Gabriel Ozon, Sheffield

estimated 50% of the population use it

West of Cameroon

Stigmatised status

Pilot study: 30 hours of recordings, British Academy

____________________

 
How to use a nanocorpus. Enriching corpora of interpreting.
 
Camille Ciollard & Bart Defrancq
 
Female interpreters hedge more than male speakers
 
Use of the marker well
 

____________________

 
Capturing the zoo: a system for downloading, preparing and managing corpus data from online forums.
 
Clausia Viggiana & John Williams
 

Open source tools 
Citizen science 
To capture and interrogate linguistic data from online CS forums: zooniverse
 

____________________

How small corpora paradoxically uncovered the next quark in corpus studies.
 
Bill Louw, Coventry & Zimbabwe
 

Theory of scientific method, William Whewell, Trinity College, Colligation.
 
Text reads text
 

#CFP CLS12: Corpus studies at the lexis-grammar interface NEW deadline March 10

CLS12 will take place on Saturday 2 April 2016 at Edge Hill University.

The focus of CLS12 is the interaction of lexis and grammar. The focus is influenced by Halliday’s view of lexis and grammar as “complementary perspectives” (1991: 32), and his conception of the two as notional ends of a continuum (lexicogrammar), in that “if you interrogate the system grammatically you will get grammar-like answers and if you interrogate it lexically you get lexis-like answers” (1992: 64).

We welcome corpus-based papers which examine any aspect of the interaction of lexis and grammar, or to extend Halliday’s conception, studies which interrogate the system lexicogrammatically to get lexicogrammatical answers. The studies …

-may be located more towards the lexis end or the grammar end of the continuum.
-may be descriptive, theoretical or applied (e.g. related to language teaching).
-may (but don’t need to) be situated within any theoretical approach that recognises the combination or interaction of lexis and grammar (e.g. Construction Grammar, Lexical Grammar, Pattern Grammar, Systemic Functional Grammar).
-may be synchronic or diachronic.
We also welcome papers which discuss methodological issues related to the corpus-based study of the lexis-grammar interface.

Presentations will be allocated a total of 40 minutes (including at least 10 minutes for discussion).

Please send an abstract of 400-500 words (excluding references) to Costas Gabrielatos (gabrielc@edgehill.ac.uk). Please make sure to specify the research questions or hypotheses, the corpus and methodology, and the main findings.


The deadline for abstract submission is 10 March 2016. Abstracts will be double-blind reviewed.

More info: https://www.edgehill.ac.uk/english/research/conferences/cls12/

Thrilled to take part in the @CamLangsci Multilingualism seminar tomorrow

 

camLangSciences

Cambridge Language Sciences Initiative

Speakers: Dr Teresa Parodi (Dept. of Theoretical and Applied Linguistics) and Dr Pascual Pérez-Paredes (Faculty of Education)

-Longitudinal corpora of untutored learners

Teresa Parodi, Dept. of Theoretical and Applied Linguistics

-Noun Phrases in Spanish young learners of EFL: insights from the International Corpus of Crosslinguistic Interlanguage (ICCI)

Pascual Pérez-Paredes, RSLE, Faculty of Education, University of Cambridge