Journal of Diplomatic Language

JDL I:4 (2004)

Visualizing Co-occurrence Structures in Political Language:
Content Analysis, Multidimensional Scaling, and Unrooted Cluster Trees

Lawrence Alfred Powell, Ph.D.
Department of Government
University of the West Indies-Jamaica

This paper demonstrates the integrated use of (1) word-use frequency counts, (2) analysis of co-occurrences, (3) nonmetric multidimensional scaling, and (4) hierarchical cluster analysis trees to reveal and visualize the underlying thematic patterns in public political language. As exemplary political 'texts', four of George W. Bush's post-9/11 public addresses to the U.S. Congress are examined. The President's September 20, 2001 special address to Congress and the three subsequent State of the Union addresses were combined into a composite "post-9/11 addresses" text file. Frequently occurring thematic keywords (and synonyms) were then identified using the CONCORDANCE program. The resulting co-occurrence matrix of keywords was analyzed using the HAMLET program, several matrix conversion programs written by the author, MINISSA (MDSx version), QCLUST, and TREEVIEW. The derived two- and three-dimensional scaling plots of word co-occurrence patterns, and the plots of the unrooted cluster trees, reveal a consistent, bifurcated overall rhetorical structure in these addresses: "us vs. them", "civilized forces of good" vs. "barbaric forces of evil". The paper concludes that this integrated approach to exploring and visualizing word co-occurrences is a useful heuristic for isolating generalized patterns within public political documents and speeches, though epistemologically it is more appropriate within an 'interpretive' or 'verstehen' framework, which treats speeches as constructions of social reality, than within a stricter confirmatory, logical positivist framework.




VISUALIZING CO-OCCURRENCE STRUCTURES IN POLITICAL LANGUAGE:
Content Analysis, Multidimensional Scaling, and Unrooted Cluster Trees

One of the recurrent problems in analyzing the content of public language is how to uncover and visualize embedded structural patterns.[1] This paper demonstrates how, using a series of commonly available programs, one can make integrated use of (i) word-use frequency counts, (ii) analysis of co-occurrences, (iii) nonmetric multidimensional scaling, and (iv) hierarchical cluster analysis trees to reveal and visualize the underlying thematic patterns in public political language. As exemplary political texts, four of U.S. President George W. Bush's post-9/11 public addresses to Congress are examined. Frequently occurring thematic keywords and synonyms are identified (using CONCORDANCE), and the resulting co-occurrence matrix of keywords is then analyzed for evidence of structure, using a sequence of HAMLET, MINISSA (MDSx), several matrix conversion and plotting programs written by the author, QCLUST, and TREEVIEW, in order to produce additional insights into the patterns of language use.[2]

Preparing Political Language 'Data' for Content Analysis

The first step in 'excavating' underlying meaning structures within a public document or speech is to prepare the raw text file so that its content can be analyzed unambiguously. To obtain a clean, high-fidelity text file with a high signal-to-noise ratio: (i) non-meaningful, stray punctuation and symbols should be deleted; (ii) 'garbage' or 'junk' words (typically articles and conjunctions) should be eliminated, unless a keyword-in-context analysis is planned, in which case they should be retained; and (iii) keywords of potential theoretical importance should be identified at the outset. Without this preliminary pruning of the raw file, it is rarely possible to home in on salient, recurrent word-use structures with any degree of resolution (Weber 1990; Krippendorff 1980; Brislin 1980).

A number of available programs will produce an initial concordance of moderate-to-large text files; among them, CONCORDANCE stands out as particularly flexible, with many useful options (Watt 2002; Sinclair 1986; Howard-Hill 1979). In the present example, raw ASCII text files of U.S. President George W. Bush's September 20, 2001 address to Congress and three subsequent State of the Union addresses were combined into a single "address4.txt" file in MS Word. A concordance of this file, saved in ".txt" format, was then produced using CONCORDANCE (which will also process ".rtf" files). CONCORDANCE has an option for listing every unique word in the file, from highest to lowest frequency of occurrence. From this listing, a sample of 40 words that were (i) of interest to the analysis and (ii) frequent in the Bush addresses was selected for further analysis. The initial listing also made it possible to identify frequently occurring stray punctuation characters and junk words that might interfere with a 'clean count' of the words in the subsequent content analysis. Once identified, all stray punctuation and symbols (such as : ; , - $ !) were eliminated from the file. Periods (full stops) were retained, to preserve the option of using sentences as context units.

Since keyword-in-context searches were to be done later in the analysis, in this instance the junk words ("the", "a", "of", "and", etc.) were retained, to keep the original phrasing of the speeches intact for later exploration and quotation. If there is no compelling reason to keep them, however, it is usually better to eliminate all such junk words from the file; this reduces 'noise' and so tightens the focus of searches for co-occurrences.
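The cleaning and listing steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the behaviour of CONCORDANCE itself: it assumes plain-text input, treats any character other than letters, digits, apostrophes, periods, and whitespace as stray punctuation, and uses a short, hypothetical junk-word list that would need to be extended for real use.

```python
import re
from collections import Counter

# Hypothetical junk-word list for illustration only; extend it for a real corpus.
JUNK_WORDS = {"the", "a", "an", "of", "and", "or", "to", "in"}


def clean_text(raw: str) -> str:
    """Blank out stray punctuation/symbols, keeping periods so sentences survive."""
    return re.sub(r"[^\w\s.']", " ", raw)


def word_list(text: str, drop_junk: bool = False) -> list[str]:
    """Lower-cased words (periods are not part of any word); optionally drop junk words."""
    words = re.findall(r"[\w']+", text.lower())
    if drop_junk:
        words = [w for w in words if w not in JUNK_WORDS]
    return words


def frequency_listing(words: list[str]) -> list[tuple[str, int]]:
    """Every unique word, from highest to lowest frequency (ties alphabetical)."""
    return sorted(Counter(words).items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    sample = "We will act; they will not, and we will prevail."  # made-up sentence
    print(frequency_listing(word_list(clean_text(sample))))
```

For the made-up sample sentence, the listing starts with ("will", 3) and ("we", 2); passing `drop_junk=True` removes "and" when, as in this study's keyword-in-context case, junk words are not needed.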

Generating Vocabulary, Synonyms, Word-Use Frequencies, and a Word Co-occurrences Matrix

Having thus cleaned the file and reduced the word list under consideration to a subset of words of theoretical interest, Alan Brier's (1985, 2003) HAMLET content analysis program was then applied to the resulting "address4.txt" file. HAMLET's "create vocabulary list" option allows one to construct a list of significant words to be searched for and counted in the text file, and further to associate "synonyms" with each of these words, so that whenever either a keyword or one of its synonyms is encountered, it is counted as one occurrence of that word/code/concept (within a pre-specified "fixed context unit", or within sentences). The vocabulary file constructed for this illustration (which in current versions of HAMLET could have contained up to 100 words) used 37 of the 40 words originally identified from the CONCORDANCE listing. Because some words could be conceptually subsumed under others with similar meanings, a final vocabulary of 20 keywords, with 17 related synonyms, was produced, as follows:

TABLE 1: Counted Keywords and Synonyms in HAMLET
of Post-9/11 Public Addresses to Congress by U.S. President
George W. Bush (word sample = 20 keywords, with 17 synonyms)
*Indicates use of a 'wild card', permitting flexible counting of word variants
(e.g. "terror*" includes terror, terrors, terrorize, terrorist, terrorism, etc.)
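The keyword/synonym logic and the trailing-asterisk wild card can be illustrated with a small Python sketch. It is not HAMLET's implementation; the dictionary below is a hypothetical excerpt containing only the synonyms named in this paper ("us", "our", "you*", "I", "my" for "we"; "their", "them" for "they"), and it assumes that matching is case-insensitive.

```python
from typing import Optional

# Hypothetical excerpt of a vocabulary: keyword -> list of synonyms.
# Patterns ending in "*" are wild cards (prefix matches).
VOCABULARY: dict[str, list[str]] = {
    "we": ["us", "our", "you*", "i", "my"],
    "they": ["their", "them"],
    "america*": [],
    "terror*": [],
}


def matches(pattern: str, word: str) -> bool:
    """True if `word` matches `pattern`; a trailing '*' matches any continuation."""
    if pattern.endswith("*"):
        return word.startswith(pattern[:-1])
    return word == pattern


def code_word(word: str, vocabulary: dict[str, list[str]] = VOCABULARY) -> Optional[str]:
    """Return the keyword under which `word` is counted, or None if it is not listed."""
    w = word.lower()
    for keyword, synonyms in vocabulary.items():
        if any(matches(p, w) for p in (keyword, *synonyms)):
            return keyword
    return None
```

With this excerpt, "Terrorists" is counted under "terror*" and "them" under "they". A prefix wild card such as "you*" will also match "young" or "youth", so wild-card entries are worth checking against the concordance listing.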

Having generated a vocabulary list of theoretically important terms and synonyms (saved to a ".voc" file for further use), the next step within HAMLET is to produce a word frequency count for these 20 terms, together with an initial matrix of joint frequencies (i.e. a count of word co-occurrences within the file). On HAMLET's "Options" menu, a "fixed context unit" of 9 was requested. This instructs the program to count word frequencies and co-occurrences within intervals of 9 words. For a public document with a more complex logical structure and/or many dependent clauses, a larger fixed context unit (e.g. 11-15 words) might be appropriate, whereas for televised, 'sound bite'-studded political speeches composed of compact sentences intended for popular consumption, a shorter fixed context unit (6-10 words) is usually more suitable. Once the fixed context unit ("9") has been set, the text file identified ("address4.txt"), and the vocabulary file created and saved ("address4.voc"), the word frequency count and co-occurrence matrix are generated by clicking the "Count Joint Frequencies for the Specified Vocabulary" bar in HAMLET. The output from this step (for the vocabulary list in Table 1) is summarized in Tables 2A, 2B, and 2C, below. (Note, however, that in the actual HAMLET output, unlike these printed tables, the 'B' and 'C' matrices are 'wrapped' to fit an 80-character page width, and so can be rather confusing to read.)
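The counting step can be sketched in Python under two assumptions that are not documented here: that a fixed context unit is a run of consecutive, non-overlapping words, and that a pair's joint frequency is the number of units containing both words. The first assumption is at least consistent with Table 2a: 17,848 words in units of 9 give 1,983 full units plus one partial unit, i.e. the 1,984 context units reported. The `code` argument is any function mapping a word to its keyword (or None), such as a vocabulary lookup.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Optional


def context_units(words: list[str], size: int = 9) -> list[list[str]]:
    """Consecutive, non-overlapping units of `size` words (the last may be shorter)."""
    return [words[i:i + size] for i in range(0, len(words), size)]


def joint_frequencies(
    words: list[str], code: Callable[[str], Optional[str]], size: int = 9
) -> tuple[int, Counter, Counter]:
    """Return (number of units, f_i per keyword, f_ij per keyword pair).

    f_i counts the units in which keyword i occurs at least once;
    f_ij counts the units in which keywords i and j both occur.
    """
    unit_freq: Counter = Counter()
    joint: Counter = Counter()
    units = context_units(words, size)
    for unit in units:
        present = sorted({k for k in map(code, unit) if k is not None})
        unit_freq.update(present)
        joint.update(combinations(present, 2))  # pairs stored in sorted order
    return len(units), unit_freq, joint
```

For example, with units of 3 words, `joint_frequencies("we x they our y z they we".split(), {"we": "we", "our": "we", "they": "they"}.get, size=3)` finds 3 units, "we" in all 3, "they" in 2, and the pair ("they", "we") in 2. Raw occurrence counts, as reported in Table 2A, are a separate and simpler tally of matched words.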

Table 2a: Raw Frequencies, % of Vocabulary, and
% of Total Text, for 20 Keywords, Counted
Across 4 Bush Congressional Addresses
17848 words were read from the "addressx4.txt" text
file, 2114 of these were in the search list, and 1984
context-units were counted.

Table 2b: 'Raw' Joint Frequencies (Co-occurrences) for 20 Keywords,
Counted Across 4 Bush Congressional Addresses. (Co-occurrences
were counted within fixed context units of 9 words.)


Table 2c: Adjusted Joint Frequencies (Co-occurrences) for 20 Keywords,
Counted Across 4 Bush Congressional Addresses. (Co-occurrences were
counted within fixed context units of 9 words.)

As summarized in Table 2A, (i) raw word frequency counts, (ii) word counts as a percentage of total words in the text, and (iii) word counts as a percentage of total vocabulary words counted can provide useful insights into some of the dominant overall patterns within the text of a public document or speech. For example, the 951 counts of "we" (with the associated synonyms "us", "our", "you*", "I", and "my", encompassing Bush and his intended U.S. audience) and the 227 counts of "America*" (which, owing to the wild card, incorporates variants such as "American" and "America's") account for 50% and 10.7%, respectively, of the total vocabulary words located by HAMLET. Given that these 'ingroup signifiers' occur far more frequently than most of the other vocabulary words, and seem to dominate the linguistic space of these speeches, this pattern might be interpreted, for example, as evidence of an obsessive, narcissistic preoccupation of the U.S. with itself in the wake of 9/11, set against an imminent threat to its collective sense of self. That threat is evidenced by the next two most frequent terms, "they" (with synonyms "their" and "them") and "terror*" (including variants such as "terrorist", "terrorism", and "terrorize"), which have 205 counts (9.7%) and 112 counts (5.3%), respectively.
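The percentages cited above appear to be simple ratios of each count to the 2,114 vocabulary words found and the 17,848 words read (Table 2a). A quick Python check under that assumption reproduces the 10.7%, 9.7%, and 5.3% figures; the "% of total text" values it prints are computed the same way and are not copied from the table.

```python
TOTAL_WORDS = 17848      # words read from the text file (Table 2a)
VOCABULARY_HITS = 2114   # words found in the search list (Table 2a)


def shares(count: int) -> tuple[float, float]:
    """(% of vocabulary hits, % of total text) for one keyword's raw count."""
    return (round(100 * count / VOCABULARY_HITS, 1),
            round(100 * count / TOTAL_WORDS, 2))


if __name__ == "__main__":
    for keyword, count in [("America*", 227), ("they", 205), ("terror*", 112)]:
        pct_vocab, pct_text = shares(count)
        print(f"{keyword:10s} {count:4d} {pct_vocab:5.1f}% {pct_text:5.2f}%")
```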

Inspecting the 'raw' (Table 2B) and 'standardized' (Table 2C) co-occurrence matrices can provide additional insights into how symbolic words are used interactively, or in context, within a speech or document. Thus we see, for instance, that in these Bush speeches "we" and "America*" were used together (i.e. co-occurred within fixed context units of 9 words) 69 times, and "we" and "good" co-occurred 30 times. Similarly, "they" and "terror*" co-occurred 19 times, and so on. In Table 2C, each raw joint frequency has been normalized by the frequencies of the two words involved, which yields values that tend to be more stable than the raw frequencies when used in subsequent analyses such as MDS or cluster analysis. These 'adjusted' or 'standardized' joint frequencies (sij) are calculated within HAMLET using the formula sij = fij / (fi + fj - fij), where fi and fj are the individual frequencies of words i and j, and fij is their joint frequency, all expressed in terms of the chosen context unit.
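The normalization formula above is the Jaccard coefficient: the number of context units containing both words divided by the number containing either. A minimal Python sketch, assuming fi and fj are counted in context units (as fij is) and setting the diagonal to 1.0 by convention:

```python
def jaccard(f_ij: int, f_i: int, f_j: int) -> float:
    """s_ij = f_ij / (f_i + f_j - f_ij); defined as 0.0 when neither word occurs."""
    union = f_i + f_j - f_ij
    return f_ij / union if union else 0.0


def similarity_matrix(labels: list[str], unit_freq: dict, joint: dict) -> list[list[float]]:
    """Symmetric matrix of s_ij for every pair of labels (diagonal set to 1.0)."""
    n = len(labels)
    s = [[1.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            i, j = labels[a], labels[b]
            # Accept the pair in either order.
            f_ij = joint.get((i, j), 0) or joint.get((j, i), 0)
            s[a][b] = s[b][a] = jaccard(f_ij, unit_freq.get(i, 0), unit_freq.get(j, 0))
    return s
```

With hypothetical counts fij = 3, fi = 10, and fj = 8, sij = 3 / 15 = 0.2: three shared units out of the fifteen units that contain at least one of the two words.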

Nonmetric Multidimensional Scaling of the Word Co-occurrences Matrix,
to Visually Represent/Explore Interactive Patterns of Word Association

Because it seeks the most parsimonious fit of these word relationships in a 'smallest space' of two or three dimensions, nonmetric multidimensional scaling of the normalized co-occurrence matrix can be a useful further tool for representing and exploring patterns of association among clusters of keywords within the text (Kruskal 1964; Guttman 1968; Rapoport & Fillenbaum 1972; Burton 1972; Cox & Cox 1994; Brier 2003).

To accomplish this, one must first save the normalized co-occurrence matrix within HAMLET (in ".mat" format). After the word frequency table and co-occurrence matrices are listed, the next HAMLET screen asks whether a cluster analysis is wanted (answer "no"); this brings up the MDS screen, where answering "yes" and "OK" yields a crude MDS map of the word associations, useful for a quick visual inspection of the pattern of relationships. If HAMLET's matrix file ("addressx4.mat") is saved at this stage, it can then be pasted into more robust and flexible MDS programs such as the MDS(x) version of MINISSA, KYST, the SPSS version of ALSCAL, or the MDS procedures available in SAS and SYSTAT.

In the example used here, HAMLET's co-occurrence matrix (in the saved "addressx4.mat" file), together with the 20 word labels (also in the .mat file), was pasted into an input file for the MDS(x) version of MINISSA, which provides more flexible analysis options than the cruder version included in HAMLET and tends to produce more stable solutions. (The MDS(x) version of MINISSA can be obtained from http://www.newmdsx.com/MINISSA/minissa.htm) A sample input file, sufficient to run an MDS(x)-MINISSA analysis of the co-occurrence matrix listed in Table 2C, is shown in Table 3:

Table 3: A Sample Input File to Run MDS(x)-MINISSA on a Normalized
Co-occurrences Matrix, Derived from HAMLET


          Input file name: bush202d.inp
          Output file name: bush202d.out
          Command-line syntax to run this file at the DOS prompt: minissa bush202d.out

Note that this file must be prepared and saved in Notepad or the DOS text editor (not in MS Word), since MINISSA-MDS(x) is a DOS-based program from the original "Bell Labs" Fortran series of programs designed by Guttman, Lingoes, Roskam, Kruskal, Young, and others (Roskam & Lingoes 1970). Owing to the Fortran-derived syntax, commands must begin in column 1 and parameters or options in column 16, with the right margin limited to 80 columns (the width of the punched 'computer cards' these programs were originally written to process on mainframe computers in the 1960s and 1970s). Spacing and syntax are therefore 'literal' and must be exact, since minor syntax errors or incorrect spacing will cause Fortran programs to crash.
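Because these column rules are easy to violate by hand, a small helper can generate the lines instead. The Python sketch below only enforces the layout described above (command in columns 1-15, parameters from column 16, at most 80 columns) and writes matrix rows as fixed-width fields. The command names in the checks are placeholders, not MINISSA keywords, and the field width and whether the program expects a full or lower-triangular matrix are assumptions to be checked against Table 3 and the MDS(x) documentation.

```python
CARD_WIDTH = 80    # punched-card width
PARAM_COLUMN = 16  # 1-based column where parameters start


def card_line(command: str, parameters: str = "") -> str:
    """Command in column 1, parameters starting in column 16, max 80 columns."""
    if len(command) >= PARAM_COLUMN:
        raise ValueError("command must fit in columns 1-15")
    line = command.ljust(PARAM_COLUMN - 1) + parameters if parameters else command
    if len(line) > CARD_WIDTH:
        raise ValueError(f"line is {len(line)} columns; the limit is {CARD_WIDTH}")
    return line


def matrix_lines(rows: list[list[float]], width: int = 8, decimals: int = 4) -> list[str]:
    """Write each row as fixed-width fields, wrapping so no line passes column 80.

    Assumes every formatted value fits within `width` characters.
    """
    per_line = CARD_WIDTH // width
    out = []
    for row in rows:
        fields = [f"{v:{width}.{decimals}f}" for v in row]
        for k in range(0, len(fields), per_line):
            out.append("".join(fields[k:k + per_line]))
    return out
```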

To produce the MINISSA two-dimensional 'map' of the 20x20 co-occurrence matrix, first name the input file, for example "bush202d.inp" (the content of which is listed in Table 3), and save it in the same directory as the MDS(x)-MINISSA program ("minissa.exe"). As this is a DOS-based program, one then goes to DOS level (called the "command prompt" in recent versions of Windows). If, for example, the MINISSA program and the "bush202d.inp" file have been placed in a directory named "mds", then at the "c:\" command prompt, type "cd\mds" to switch to that directory. With "minissa.exe" and "bush202d.inp" in that directory, type "minissa bush202d.out" at the command prompt and press Enter. If the file syntax is correct, this will yield an output file named "bush202d.out", the main components of which are summarized in Figure 1. The essential statistics in this output file for further use are (i) the coordinates of the 20 words on the derived 'smallest space' dimensions I and II; (ii) the Guttman-Lingoes coefficient of alienation (roughly equivalent to Kruskal's stress, a measure of the derived configuration's overall goodness of fit to the original matrix); and (iii) the 'map', which simply plots the derived coordinates as points in a two-dimensional space.
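To show what a goodness-of-fit index of this kind measures, the Python sketch below computes Kruskal's (1964) stress-1 for a given configuration: distances in the map are compared with 'disparities' obtained by a monotone (isotonic) regression on the rank order of the input dissimilarities. MINISSA reports the Guttman-Lingoes coefficient of alienation instead, which is related but not identical, so values from this sketch are not directly comparable with the .13 and .08 reported in Figures 1 and 2.

```python
import math


def isotonic(values: list[float]) -> list[float]:
    """Pool-adjacent-violators: least-squares non-decreasing fit to `values`."""
    blocks = []  # each block is [sum, count]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while a block's mean exceeds the next block's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def kruskal_stress1(dissimilarities: list[float], distances: list[float]) -> float:
    """Kruskal's stress-1 for paired lists of input dissimilarities and map distances."""
    order = sorted(range(len(dissimilarities)), key=lambda k: dissimilarities[k])
    fitted = isotonic([distances[k] for k in order])
    disparities = [0.0] * len(distances)
    for k, d_hat in zip(order, fitted):
        disparities[k] = d_hat
    num = sum((d - h) ** 2 for d, h in zip(distances, disparities))
    den = sum(d ** 2 for d in distances)
    return math.sqrt(num / den)
```

Co-occurrence similarities would first need converting to dissimilarities (for example 1 - sij), since the disparities here are fitted to be non-decreasing in the input dissimilarities; ties in the input are simply kept in their original order.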


Figure 1: Smallest Space Analysis (MINISSA/MDSx) of 20 Word
Co-occurrences in 'Post-9/11' Bush Addresses to Congress, Two-Dimensional
Solution. (Coefficient of alienation = .13)

As Figure 1 shows, a refined MDS 'mapping' of these co-occurrences reveals an overall structure of word associations in the co-occurrence matrix that would not otherwise be evident from simply inspecting the matrix. Dimensions, clusters, partitions, and bipolarities in the word-usage patterns can now be identified when the two-dimensional MINISSA 'solution' is plotted. For example, a partitions approach to ascertaining structure reveals a left partition of words (Figure 1) reflecting "we" or "America" and its "friends" as a "civilized" force for "good", pitted against "they", the "enemy" and "terrorist" forces of "evil" in the world (right partition).[3] If one analyzes the map in terms of bipolarities, one can see "friend" and "enemy" located at opposite ends of the semantic space, with similar bipolarities between "good" and "evil", and between "civil*" and "kill". A clusters interpretation of the configuration shows "allies" and "friends" closely clustered together within the same spatial region, as are "we"/"America", "war"/"terror*", "enemy"/"Iraq"/"they", and "home*"/"protect". Not surprisingly, one also finds "evil"/"destruct*"/"weapon"/"Hussein" in close proximity within the same region of this linguistic space derived from President Bush's post-9/11 Congressional speeches. Though MDS can sometimes yield meaningful dimensional interpretations (equivalent to factors in factor analysis), in this case no clear dimensional interpretation seems to emerge, and the partition, cluster, and bipolarity clues appear more useful in highlighting the patterns of linguistic meaning.

Overall, the 'gestalt' of this derived MDS map is a bifurcated, 'black and white' foreign policy world view of "us vs. them", dramatically reflected in these four post-9/11 speeches. "Us" signifies America as a civilized force for good in the world, in alliance with its "good" "friends", who help "protect" the "homeland" against "them": the uncivilized "enemy", the "evil", "destructive" forces of "terror" in the world, against which it is necessary to wage a pre-emptive "war".[4]

These word interrelationships can be seen in even finer resolution in the three-dimensional map in Figure 2 (generated by simply changing the line "2 of 2" to "3 of 3" in the "bush202d.inp" file). When the points are allowed to 'fit' into a three-dimensional space, the fit improves (the coefficient of alienation falls from .13 to .08), components of the underlying world view become clearer, and regions of the semantic space are more clearly differentiated.


Figure 2: Smallest Space Analysis (MINISSA/MDSx) of 20 Word
Co-occurrences in 'Post-9/11' Bush Addresses to Congress, Three-Dimensional
Solution. (Coefficient of alienation = .08)


Producing Unrooted Hierarchical Cluster Trees, using QCLUST and TREEVIEW, to Further Elaborate Word-Use Structures

Beyond the structural insights derived from the two- and three-dimensional MDS configurations, a hierarchical cluster analysis of the co-occurrence matrix, visualized as an unrooted cluster tree, can help further elaborate the linguistic and conceptual 'branchings' in the complex patterns of word use found in public documents and speeches. To do this, it is first necessary to convert either (i) the similarities in the co-occurrence matrix or (ii) the derived MDS coordinates into dissimilarities (distances), as required by many cluster analysis programs. One can use the "proximities" conversion features of SPSS, SAS, or SYSTAT to convert the co-occurrence similarities into dissimilarities, or alternatively convert the derived MDS coordinates (in the "bush202d.out" output file) into a Euclidean distance matrix for the 20 words. The latter strategy has the advantage of matching the MDS configuration more precisely.
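Both conversions are straightforward. The following Python sketch is not CNF2DIST, whose internals are not described here: `distance_matrix` computes Euclidean distances between MDS coordinates in any number of dimensions, and `similarities_to_dissimilarities` uses d = 1 - s, which is only one of several possible conversions and assumes similarities lie between 0 and 1, as the normalized co-occurrence values do.

```python
import math


def distance_matrix(coords: list[tuple[float, ...]]) -> list[list[float]]:
    """Square matrix of Euclidean distances between every pair of MDS points."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def similarities_to_dissimilarities(s: list[list[float]]) -> list[list[float]]:
    """One simple conversion, d = 1 - s, for similarities scaled to [0, 1]."""
    return [[1.0 - v for v in row] for row in s]
```

Applied to the 20 two-dimensional coordinates from "bush202d.out", `distance_matrix` would give a 20x20 symmetric matrix with zeros on the diagonal.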

In the present example, the configuration-to-distances conversion program CNF2DIST was used to convert the two-dimensional MDS coordinates (from "bush202d.out") into a square 20x20 matrix of Euclidean distances between all pairs of the 20 word stimuli. This converted matrix ("dist20.mat") was then processed with John Brzustowski's hierarchical clustering program QCLUST, which has the advantage of writing a readable 'nested cluster chain' file that can then be plotted by TREEVIEW. To do this, the matrix from "dist20.mat" was pasted into a new text file, "clustr20.inp", and the number of cluster objects (20) and a list of labels for the 20 objects were added, as shown in Table 4:

Table 4: A Sample (CNF2DIST-converted) 20x20 Euclidean Distances
Matrix File, with Identifying Information Added, Ready for Analysis by QCLUST

Input file name: clustr20.inp
Output file name: clustr20.out
Command-line syntax to run this file at the DOS prompt:
qclust -m0 -n0 -c7 -i clustr20.inp -o clustr20.out -t tree20.txt

(Note that the matrix entries, as shown here, have been truncated in order to fit on the page. In the actual file, precision was to at least 4 decimal places, so as to obtain an optimal clustering and well-defined unrooted tree structures.)
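Writing this file programmatically avoids transcription errors and keeps full precision. The Python sketch below follows the layout described in the text (item count, then labels, then the square matrix, cf. the -n0 option), but the exact line structure QCLUST expects, such as one label per line, is an assumption to be checked against Table 4 and the QCLUST documentation.

```python
def qclust_input(labels: list[str], dist: list[list[float]], decimals: int = 4) -> str:
    """Item count, then one label per line, then the square distance matrix."""
    n = len(labels)
    if len(dist) != n or any(len(row) != n for row in dist):
        raise ValueError("distance matrix must be n x n for n labels")
    lines = [str(n), *labels]
    lines += [" ".join(f"{v:.{decimals}f}" for v in row) for row in dist]
    return "\n".join(lines) + "\n"
```

The `decimals` parameter defaults to 4, matching the minimum precision recommended above.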

The (DOS-level) command line for running qclust.exe on the distance matrix file shown in Table 4 ("clustr20.inp") would be:

qclust -m0 -n0 -c7 -i clustr20.inp -o clustr20.out -t tree20.txt

This instructs QCLUST to use the following options when running the hierarchical cluster analysis on the distance matrix file:

Input matrix format = square distance matrix (m=0)
Clustering method = Saitou & Nei neighbor joining (c=7)
Names/labels location = after item count but before matrix (n=0)
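To show what the c=7 option does, the following is a textbook Python sketch of Saitou and Nei's neighbor-joining algorithm. It is not QCLUST's code: tie-breaking, rounding, and the handling of negative branch lengths (which neighbor joining can produce, and which are left unclamped here) may all differ, so its output should not be expected to match "tree20.txt" exactly. Like the QCLUST output, it ends by joining the last three clusters at a single central node.

```python
def neighbor_joining(labels: list[str], d: list[list[float]]) -> str:
    """Saitou & Nei neighbor joining on a symmetric distance matrix; returns Newick."""
    if len(labels) < 3:
        raise ValueError("need at least three items")
    nodes = list(labels)             # current clusters, as Newick fragments
    dist = [list(row) for row in d]  # working copy of the distance matrix
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in dist]
        # Join the pair minimizing Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        i, j = min(
            ((a, b) for a in range(n) for b in range(a + 1, n)),
            key=lambda p: (n - 2) * dist[p[0]][p[1]] - r[p[0]] - r[p[1]],
        )
        # Branch lengths from i and j to their new parent node.
        li = dist[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = dist[i][j] - li
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the new node to every remaining node.
        new_row = [(dist[i][k] + dist[j][k] - dist[i][j]) / 2 for k in keep]
        dist = [[dist[a][b] for b in keep] + [new_row[x]] for x, a in enumerate(keep)]
        dist.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [f"({nodes[i]}:{li:.5f},{nodes[j]}:{lj:.5f})"]
    # Join the last three clusters at one central (trifurcating) node.
    (a, b, c), m = nodes, dist
    la = (m[0][1] + m[0][2] - m[1][2]) / 2
    lb = (m[0][1] + m[1][2] - m[0][2]) / 2
    lc = (m[0][2] + m[1][2] - m[0][1]) / 2
    return f"({a}:{la:.5f},{b}:{lb:.5f},{c}:{lc:.5f});"
```

On a small hypothetical matrix generated from a known tree, such as A-B = 3, A-C = 5, A-D = 6, B-C = 6, B-D = 7, C-D = 7, the sketch recovers that tree: "(C:3.00000,D:4.00000,(A:1.00000,B:2.00000):1.00000);".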

The output 'nested cluster chain' file that is produced by QCLUST reads as follows:

(((((allies:0.05096,friend:0.44904):0.13594,civil*:0.96406):0.29288,world:0.23212):0.37803,(((((AlQaeda:0.48529,kill:0.31471):0.30714,(home*:0.10000,protect:0.20000):0.29286):0.37824,war:0.14676):0.15553,terror*:0.10697):0.07202,((((destruct*:0.12614,(evil:0.10469,Hussein:0.79531):0.37386):0.02812,weapon:0.24687):0.32607,(enemy*:0.39000,Iraq:0.01000):0.31768):0.37207,they:0.13730):0.06138):0.24102):0.04697,(America*:0.10521,we:0.09479):0.06016,good:0.58984);

and is saved as "tree20.txt" in the same directory as QCLUST. This nested cluster chain file can be read directly by Rod Page's (1993, 1996, 2002) TREEVIEW program, which can produce "most parsimonious fit" unrooted cluster tree graphics that further elaborate the 'branchings' and 'sub-branchings' within the patterns of word interrelationships (Harding 1972, Furnas 1984, Le Quesne 1989). The commands needed within TREEVIEW (a Windows-based program) are:

Open = "tree20.txt"
Tree = unrooted
Tree/Order = ladderize right
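Before opening "tree20.txt" in TREEVIEW, it can be useful to check that the chain is well formed and contains all 20 labels. The minimal Python parser below handles only the features used in this file (unquoted names, which may contain '*', and branch lengths); it does not handle the quoted labels or comments that the full Newick format allows.

```python
def parse_newick(text: str):
    """Parse a Newick string into nested (children, name, branch_length) tuples."""
    text = "".join(text.split())  # drop line breaks and other whitespace
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1                 # consume "("
            children.append(node())
            while text[pos] == ",":
                pos += 1             # consume ","
                children.append(node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1                 # consume ")"
        start = pos
        while pos < len(text) and text[pos] not in ",();":
            pos += 1
        name, _, length = text[start:pos].partition(":")
        return children, name, float(length) if length else None

    tree = node()
    if text[pos:] != ";":
        raise ValueError("expected a single ';' at the end")
    return tree


def leaves(tree) -> list[str]:
    """Leaf labels in left-to-right order."""
    children, name, _ = tree
    return [name] if not children else [x for c in children for x in leaves(c)]
```

Applied to the QCLUST output shown above, `leaves` should return the 20 keywords, and the outermost node should have three children, reflecting the central three-way join.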

As can be seen in Figure 3, TREEVIEW reveals two primary, counterposed branch structures. The "we Americans" branch is associated with goodness, civility, and a world characterized by alliances with "friends". On the cluster tree this stands in opposition to "they", which in turn has two distinct sub-branches: an "al-Qaeda" branch, concerned with protecting the homeland against terrorist threats, and a "Hussein/Iraq" branch, which defines an "evil enemy" bent on using weapons of mass destruction. The analysis of these branching patterns thus gives additional definition to the structures isolated by the earlier MDS procedures, making possible further insights into the intricacies of the word-usage patterns.

Figure 3: Unrooted Hierarchical Cluster Tree of Interrelationships between 20 Keywords
in Bush 'Post-9/11' Addresses to Congress, 2001-2004


DISCUSSION

This paper has demonstrated how patterns in the structure of public language can be elaborated using a congeries of text-analysis and visualization programs that are readily available to researchers, in either public-domain or 'demo' form. As we have seen, CONCORDANCE (or a similar program capable of producing concordances of large texts) can be used to create an initial concordance of public documents or speeches, thereby aiding in the identification of frequently used keywords. HAMLET is useful for constructing and reading vocabulary lists (with synonyms) of the significant terms chosen for analysis, and for producing word frequency lists and an initial matrix of joint frequencies (word co-occurrences). The MDS(x) version of MINISSA is a flexible program for producing a refined, robust (more so than the HAMLET version) two- or three-dimensional nonmetric MDS configuration from the co-occurrence matrix, and for identifying and exploring dimensions, clusters, partitions, and bipolarities within the word-usage patterns. After conversion of the MINISSA 2-D coordinates into a Euclidean distance matrix (using CNF2DIST or a similar 'proximities conversion' program), Brzustowski's QCLUST can perform a hierarchical cluster analysis on the distance matrix, producing a nested cluster chain file. That file is readable by Rod Page's public-domain TREEVIEW, which produces unrooted cluster trees that help further elaborate branchings and sub-branchings in complex patterns of word use.

This word frequencies → co-occurrence matrix → nonmetric MDS → unrooted trees sequence provides a useful overall analytical process for elaborating the meaning structures embedded in public language. As the 20-keyword/17-synonym illustration has shown, a "good vs. evil", "us vs. them", "civilized vs. barbaric world" underlying rhetorical structure was revealed in this sample of public language (Bush's four post-9/11 speeches), indicating the potential utility of this technique for disentangling and better understanding complex structural patterns within political language. Though it is a helpful heuristic for isolating generalized patterns within public documents and speeches, it should be kept in mind that, epistemologically, this approach is more appropriate within an 'interpretive' or 'verstehen' framework, which treats speeches and documents as constructions of social reality, than within a stricter confirmatory, 'logical positivist' framework.

REFERENCES

Altemeyer, B. (1981). Enemies of Freedom. San Francisco: Jossey-Bass.

Ashby, F. G. (1992). Multidimensional models of perception and cognition. Mahwah, NJ: Erlbaum.

Axelrod, R. (1973). Schema theory: An information processing model of perception and cognition. American Political Science Review, 67, 1248-1266.

Barthes, R. (1968). Elements of semiology. New York: Hill & Wang.

Barthes, R. (1974). Mythologies. New York: Hill & Wang.

Bakhtin, M. (1986). Speech genres and other late essays. Austin: University of Texas Press.

Berger, P., & Luckmann, T. (1967). The social construction of reality. London: Penguin.

Brier, A. (1985). HAMLET: A Pascal program to count joint frequencies of words in a text. Siegener Periodicum für internationale empirische Sozialwissenschaft, 4, 177-196.

Brier, A. (1988). Natural language processing and the analysis of structure. European Political Data Newsletter, 68, 39-53.

Brier, A. (2003). Analysis of joint frequencies of words in a text: User notes for HAMLET for Windows. Southampton University: University Computing Service.

Brislin, R. W. (1980). Translation and content analysis of oral and written materials. In H. C. Triandis & J. W. Berry (Eds.), Handbook of cross-cultural psychology, Vol. 2 (pp. 389-444). Boston: Allyn and Bacon.

Burke, K. (1945). A grammar of motives. New York: Prentice-Hall.

Burke, K. (1966). Language as symbolic action. Berkeley: University of California Press.

Burton, M. (1972). Semantic dimensions of occupation names. In K. Romney, R. Shepard, and S. Nerlove (Eds.), Multidimensional scaling: theory and applications in the social sciences, Volume II: Applications (pp. 55-71). New York: Seminar Press.

Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: M.I.T. Press.

Chomsky, N. (1967). Deep structure, surface structure, and semantic interpretation. Bloomington, IN: University Linguistics.

Conover, P. J., & Feldman, S. (1984). How people organize their political world: A schematic model. American Journal of Political Science, 28, 95-125.

Cox, T. F. & Cox, M. A. (1994). Multidimensional Scaling. London: Chapman and Hall.

D'Andrade, R. G. (1984). Cultural meaning systems. In R. A. Shweder & R. A. LeVine (Eds.), Culture theory: Essays on mind, self, and emotion (pp. 88-119). Cambridge: Cambridge University Press.

Edelman, M. (1977). Political language: Words that succeed and policies that fail. San Diego: Academic Press.

Edelman, M. (1985). Political language and political reality. Political Science and Politics, 18, 10-19.

Edelman, M. (1988). Constructing the political spectacle. Chicago: University of Chicago Press.

Feldman, S. & Stenner, K. (1997). Perceived threat and authoritarianism. Political Psychology, 18, 741-770.

Fiske, S. T., & Taylor, S. E. (1991). Social cognition. New York: McGraw-Hill.

Freeman, L. (2000). Visualizing social networks. Journal of Social Structure, 1, 1-10.

Freeman, L. (2004). The development of social network analysis. Vancouver: Empirical Press.

Furnas, G. W. (1984). The generation of random, binary unordered trees. Journal of Classification, 1, 187-233.

Gamson, W., & Stuart, D. (1992). Media discourse as a symbolic contest. Sociological Forum, 7, 55-86.

Geertz, C. (1973). The interpretation of cultures. New York: Basic Books.

Gordon, A. D., Jupp, P. E., & Byrne, R. W. (1989). The construction and assessment of mental maps. British Journal of Mathematical and Statistical Psychology, 42, 169-182.

Glazer, R., & Nakamoto, K. (1991). Cognitive geometry: An analysis of structure underlying representations of similarity. Marketing Science, 10, 205-228.

Goffman, E. (1974). Frame analysis: An essay on the organization of experience. Cambridge, MA: Harvard University Press.

Guttman, L. (1968). A general nonmetric technique for finding the smallest coordinate space for a configuration of points. Psychometrika, 33, 469-506.

Harding, E. F. (1972). The probabilities of random tree shapes generated by random bifurcation. Advances in Applied Probability, 3, 44-77.

Howard-Hill, T. H. (1979). Literary concordances: A complete handbook for the preparation of manual and computer concordances. Oxford: Pergamon Press.

Holsti, O. (1967). Cognitive dynamics and images of the enemy. Journal of International Affairs, 21, 16-39.

Holsti, O., & Fagan, R. (1967). Enemies in politics. Chicago: Rand McNally.

Iker, H. P. (1974). An historical note on the use of word-frequency contiguities in content analysis. Computers and the Humanities, 8, 1-15.

Inkeles, A. (1996). National character: A Psycho-social perspective. New Brunswick, NJ: Transaction.

Jervis, R. (1976). Perception and misperception in international politics. Princeton, NJ: Princeton University Press.

Keen, S. (1988). Faces of the Enemy: Reflections of the Hostile Imagination. San Francisco: Harper & Row.

Krippendorff, K. (1980). Content analysis: An introduction to its methodology. Beverly Hills: Sage.

Kruskal, J. B. (1964). Multidimensional scaling by optimizing goodness of fit to a non-metric hypothesis. Psychometrika, 29, 1-27.

Lane, R. (1973). Patterns of political belief. In J. Knutson (Ed.), Handbook of political psychology (pp. 83-116). San Francisco: Jossey-Bass.

Lau, R. & Sears, D. (1982). Political cognition. Hillsdale, NJ: Lawrence Erlbaum Associates.

Le Quesne, W. J. (1989). Frequency distributions of lengths of possible networks from a data matrix. Cladistics, 5, 395-407.

Levine, R. A. & Campbell, D. T. (1973). Ethnocentrism: Theories of conflict, ethnic attitudes, and group behavior. New York: Wiley.

Lingoes, J.C., Roskam, E.E. & Borg, I. (1979). Geometric representation of relational data. Ann Arbor, MI: Mathesis Press.

Luce, R. D., D'Zmura, M., Hoffman, D. D., Iverson, G., & Romney, A. K. (Eds.). (1995). Geometric representations of perceptual phenomena. Mahwah, NJ: Erlbaum.

Mead, G. H. (1934). Mind, self, and society. Chicago: University of Chicago Press.

Merleau-Ponty, M. (1962). Phenomenology of perception. New York: Humanities Press.

Osgood, C. et al. (1957). The measurement of meaning. Urbana, IL: University of Illinois Press.

Page, R. D. M. (1993). On describing the shape of rooted and unrooted trees. Cladistics, 9, 93-99.

Page, R. D. M. (1996). TreeView: An application to display phylogenetic trees on personal computers. Computer Applications in the Biosciences, 12, 357-358.

Page, R. D. M. (Ed.). (2002). Tangled trees: Phylogeny, cospeciation, and coevolution. Chicago: University of Chicago Press.

Perdue, C., Dovidio, J., Gurtman, M., & Tyler, R. (1990). Us and them: Social categorization and the process of intergroup bias. Journal of Personality and Social Psychology, 59, 475-486.

Rapoport, A. & Fillenbaum, S. (1972). An experimental study of semantic structures. In A. K. Romney, R. Shepard, & S. Nerlove (Eds.), Multidimensional scaling: Theory and applications in the social sciences, Volume II: Applications (pp. 93-131). New York: Seminar Press.

Robinson, C. & Powell, L. (1996). The postmodern politics of context definition: Competing reality frames. Sociological Quarterly, 37, 279-305.

Roskam, E. E. & Lingoes, J. C. (1970). MINISSA-1: A FORTRAN IV (G) program for the smallest space analysis of square symmetric matrices. Behavioral Science, 15, 204-205.

Sapir, E. (1951). The status of linguistics as a science. In D. Mandelbaum (Ed.), Selected writings (pp. 207-214). Berkeley: University of California Press.

Schutz, A. (1972). The phenomenology of the social world. London: Heinemann.

Sinclair, J. (1986). Basic computer processing of long texts. In G. Leech & C. Candlin (Eds.), Computers in English language teaching and research. Harlow, Essex: Longman.

Staub, E. (1996). Cultural-societal roots of violence. American Psychologist, 51, 117-132.

Tajfel, H. (1978). Differentiation between social groups: Studies in the social psychology of intergroup relations. London: Academic Press.

Triandis, H. (1994). Culture and social behavior. New York: McGraw-Hill.

Triandis, H. (1996). The psychological measurement of cultural syndromes. American Psychologist, 51, 407-415.

Van der Dennen, J. (1987). Ethnocentrism and ingroup-outgroup differentiation. In V. Reynolds, V. Falger, & I. Vine (Eds.), The sociobiology of ethnocentrism. London: Croom Helm.

Watt, R. J. C. (2002). CONCORDANCE manual.

Weber, R. P. (1990). Basic content analysis. Newbury Park, CA: Sage.

Whorf, B. (1956). Language, thought, and reality. New York: Wiley.

Wish, M., Deutsch, M. & Biener, L. (1970). Differences in conceptual structures of nations: An exploratory study. Journal of Personality and Social Psychology, 16, 361-373.

PROGRAM SOURCES

CONCORDANCE
http://www.rjcw.freeserve.co.uk/
http://www.rjcw.freeserve.co.uk/manual/hs2030.htm

HAMLET
http://www.apb.cwc.net/homepage.htm
http://www.apb.cwc.net/download.htm

MINISSA (MDS(x) version)
http://www.newmdsx.com/MINISSA/minissa.htm

QCLUST
http://www2.biology.ualberta.ca/jbrzusto/docslust.html

TREEVIEW
http://taxonomy.zoology.gla.ac.uk/rod/treeview.html

CNF2DIST (converts MDS configuration to square Euclidean distance matrix)
Available from the author.

MDSPLOT2 (plots multidimensional scaling coordinates in two dimensions;
saves as publication-quality bitmap)
Available from the author.

MDSPLOT3 (plots multidimensional scaling coordinates in three dimensions;
saves as publication-quality bitmap)
Available from the author.

NOTES

[1] For discussions of the epistemological issues involved in developing theoretical models of culturally shared linguistic meaning in human communication, see Chomsky (1965, 1967), Barthes (1968, 1974), Bakhtin (1986), Edelman (1977, 1985, 1988), Sapir (1951), Whorf (1956), Geertz (1973), D'Andrade (1984), Mead (1934), Berger & Luckmann (1967), Goffman (1974), Merleau-Ponty (1962), Schutz (1972), Burke (1945, 1966), Lane (1973), Axelrod (1973), Fiske & Taylor (1991), Conover & Feldman (1984), Lau & Sears (1982), Triandis (1994).

[2] On methodological techniques for the description, quantification, and/or visual portrayal of meaning structures, see Osgood et al. (1957), Kruskal (1964), Guttman (1968), Lingoes, Roskam, & Borg (1979), Iker (1974), Brier (1988), Ashby (1992), Glazer & Nakamoto (1991), Gordon, Jupp & Byrne (1989), Rapoport & Fillenbaum (1972), Wish, Deutsch & Biener (1970), Burton (1972), Gamson & Stuart (1992), Luce et al. (1995), Robinson & Powell (1996), Triandis (1996), Freeman (2000, 2004).

[3] On cognitive functions of enemy-making and the psychology of hostility, see Holsti (1967), Holsti & Fagan (1967), Jervis (1976), Altemeyer (1981), Keen (1988), Inkeles (1996), Staub (1996), Feldman & Stenner (1997).

[4] For related studies of "us vs. them" ingroup-outgroup perception processes within societies, and in international politics, see Levine & Campbell (1973), Tajfel (1978), Van der Dennen (1987), Keen (1988), Perdue et al. (1990).
