A Corpus-Based Study on the Characteristics of Texts Generated in Computer-Mediated Communication

Author: joannelee0228 • March 26, 2012 • Case Study

With advances in technology and easy access to the Internet, computer-mediated communication (CMC), which reduces the constraints of time and geography, has become a common form of interaction in today’s information era. In particular, the blog has gained worldwide attention in recent years as a new and important online genre (Vincent et al. 2007). The resulting linguistic features merit a corpus-based study of how people use their language resources to communicate in this specific genre.

In this study, a target corpus of 30,103 words is built by collecting texts from five personal blogs on a popular theme, beauty. The blogs originate in various places (Figure 1), which makes the resulting analysis more representative.

| Blog | Description |
| --- | --- |
| Beaut.ie | Established in 2006 by Irish sisters Kirstie and Aisling, freelance beauty writers in Ireland |
| Jack and Hill: A Beauty Blog | Founded in 2005 by two beauty-obsessed women: Hillary, an author in LA, and Jankie, a PR professional in London and the US |
| Beauty Addict | Founded in 2005 by Kristen (a "Beauty Addict" who grew up in New York and lived in Manhattan) as a way to share her product obsessions with the world |
| Skin Care and Beauty | Has published Galina’s reviews of beauty and skin care products, and articles from recent fashion magazines (English, German and Russian issues), since 2007 |
| My Women Stuff | Run by Paris B since 2007 and based in Kuala Lumpur, Malaysia; covers all things that make women beautiful |

Figure 1: Details of the collected beauty blogs

The beauty corpus is compared with both the spoken and written samplers of the British National Corpus (BNC), a two-million-word representative sample of Standard British English, to bring out its distinctive linguistic features. The analysis is conducted with Wmatrix, a corpus linguistics tool that offers word frequency profiles and concordances. Using Wmatrix’s part-of-speech (POS) tagging by CLAWS and semantic tagging (Semtag) by USAS, the linguistic characteristics that occur with unexpectedly high frequency in the target corpus, as indicated by relatively high log-likelihood (LL) values, can be identified by comparison with more general texts. The results are discussed as follows.
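To make the role of the LL values concrete, the following is a minimal sketch of the log-likelihood keyness statistic as it is commonly computed in corpus comparison. It is not Wmatrix's own code. The function name `log_likelihood` and the word counts in the usage lines are hypothetical; only the corpus sizes (30,103 words and roughly two million words) come from the text above.

```python
import math


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood (LL) keyness of one word across two corpora.

    freq_target / freq_ref: observed frequency of the word in each corpus.
    size_target / size_ref: total number of words in each corpus.
    """
    total_freq = freq_target + freq_ref
    total_size = size_target + size_ref

    # Expected frequencies if the word were equally likely in both corpora.
    expected_target = size_target * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size

    ll = 0.0
    # A zero observed frequency contributes nothing (0 * ln 0 is taken as 0).
    if freq_target > 0:
        ll += freq_target * math.log(freq_target / expected_target)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll


# Corpus sizes taken from the text; the word counts below are HYPOTHETICAL.
TARGET_SIZE = 30_103       # beauty corpus
REFERENCE_SIZE = 2_000_000  # approximate size of the reference sample

ll_skin = log_likelihood(300, TARGET_SIZE, 100, REFERENCE_SIZE)
print(f"LL for a hypothetical word: {ll_skin:.2f}")
```

A higher LL value means the difference in relative frequency between the two corpora is less likely to be due to chance. A commonly cited cut-off is 6.63, which corresponds to p < 0.01 with one degree of freedom.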

The aboutness of the corpus is revealed in its word frequency list (Appendix I) and collocation list (Appendix II). The first two content words in the word frequency profile are ‘skin’ (18th) and ‘hair’ (24th), while 80% of the top 60 collocations are also related to the topic, such as product names (e.g. ‘cleansing oils’, ‘BB cream’) and brand names (e.g. ‘Esmeria Organics’, ‘Clarins White’). The remaining 20% are

...
