Introducing Custom Tables in Spyral Notebook

This notebook will help you work with custom Spyral Tables in your notebooks (it's part of the Art of Literary Text Analysis with Spyral Notebooks). In particular, we'll look at:

  • Creating a Table
  • Displaying Data from a Table
  • Next Steps

In the previous notebook we looked at analyzing Edgar Allan Poe's "The Gold Bug", though we didn't spend much time on one of its key topics: cryptography. Indeed, for a literary short story, it offers a fairly good introduction to simple encryption, including this passage:

These characters, as any one might readily guess, form a cipher --that is to say, they convey a meaning … Now, in English, the letter which most frequently occurs is 'e'. Afterwards, the succession runs thus: a o i d h n r s t u y c f g l m w b k p q x z. E however predominates so remarkably that an individual sentence of any length is rarely seen, in which it is not the prevailing character.

Beyond an analysis of the story, wouldn't it be interesting to test this claim, and in particular to see whether the sequence of most frequent letters described in the short story matches the sequence of most frequent letters in the text of the short story itself? We can do that! Spyral can help!

Creating a Table

Let's take a moment to describe the steps we'll follow:

  • create a corpus with text from "The Gold Bug"
  • extract the actual text from the corpus
  • clean the text (so that we are left with letters from a to z)
  • create a table of letters and their frequencies
  • display output from the table

Create a Corpus with Text from "The Gold Bug"

Just as we did in the previous notebook, we'll start by creating our corpus (in case it doesn't yet exist).

loadCorpus("https://gist.githubusercontent.com/sgsinclair/84c9da05e9e142af30779cc91440e8c1/raw/goldbug.txt", {
    inputRemoveUntil: 'THE GOLD-BUG',
    inputRemoveFrom: 'FOUR BEASTS IN ONE'
}).assign("goldbug")
This corpus has 1 document with 13,731 total words and 2,756 unique word forms. Created about 7 minutes ago.

Extract the Actual Text from the Corpus

You may recall from the previous notebook a discussion of asynchronous JavaScript, which led us to use the assign function (instead of the more complicated promise … then pattern). We can use a similar trick to get our plain text (note that we can assign and show a subset of the string, but any other processing after assign would need to occur in a separate block). We use getPlainText here because the text isn't too long; we might not want to do this with a larger corpus of several books.

goldbug.getPlainText().assign("text").show(100); // show first 100 characters

Clean the Text

We have the full text of "The Gold Bug", but we're only interested in counting letters from a to z. In order to prepare the letters to be counted, we'll do two operations:

  1. convert the entire text to lowercase characters
  2. remove all the characters that aren't from a to z using a regular expression

The first operation, converting to lowercase, is trivial: we use the toLowerCase function. The second operation is a bit trickier. This isn't the venue to explain regular expressions in detail, but essentially they provide a powerful mechanism for string matching that can use wildcards, character classes and character ranges. In our case we can simply strip out all the characters that are not in the range from a to z: [^a-z].

  • [] denotes a set of characters
  • ^ at the start of the set negates it, so the set matches any character that is NOT listed
  • a-z is treated as a single range (every character from a to z)
// convert to lower case and replace non a-z characters with nothing
var clean = text.toLowerCase().replace(/[^a-z]/g, "")
show(clean.substring(0,50))
thegoldbugwhathowhathothisfellowisdancingmadhehath
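
As a small, self-contained illustration of the character class described above, here is a sketch in plain JavaScript (no Spyral functions) that applies the same two operations to short sample strings. The helper name keepLetters and the sample strings are made up for this example. One thing it makes visible: the range a-z covers unaccented letters only, so a character such as "é" is removed along with digits and punctuation.

```javascript
// Hypothetical helper (not part of Spyral): lowercase a string and
// keep only the letters a to z.
function keepLetters(input) {
  return input.toLowerCase().replace(/[^a-z]/g, "");
}

// Digits, spaces and punctuation are all stripped.
console.log(keepLetters("The Gold-Bug, 1843!"));

// The accented letter (\u00e9 is "é") falls outside a-z and is stripped too.
console.log(keepLetters("Caf\u00e9"));
```

For an English text like "The Gold Bug" this loss is probably negligible, but it would matter for texts in languages that rely on accented characters.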

Create a Table of Letter Frequencies

Creating a Table in Spyral is easy; it can be something as simple as this:

var table = new VoyantTable();

As we'll see, tables provide convenient methods for updating values and for displaying output. Another benefit of Tables is a convenient way to count the items in an array. In order to convert our string of letters into an array of letters, we'll use the split function.

var letters = clean.split(""); // make an array, one letter per item
var table = new VoyantTable({count: letters, orientation: 'horizontal'}); // create a table with letter counts
table.show();
e    t    a    o    i    n    s    h    r    d    l    u    c    m    f    w    p    y    g    b    v    k    x    j    q    z
7625 5485 4478 4208 4183 3912 3516 3372 3278 2533 2324 1893 1523 1500 1392 1303 1170 1146 1143 1031 525  351  120  111  60   44
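
To make the counting step less of a black box, here is a rough plain-JavaScript sketch of the kind of tally that counting an array involves: count each item, then sort by descending frequency. This illustrates the general idea only and is not Spyral's actual implementation; countItems and the sample data are made-up names for this example.

```javascript
// Hypothetical helper (not part of Spyral): count how often each item
// occurs in an array and return [item, count] pairs, most frequent first.
function countItems(items) {
  const counts = new Map();
  for (const item of items) {
    counts.set(item, (counts.get(item) || 0) + 1);
  }
  // Array.prototype.sort is stable, so items with equal counts
  // keep the order in which they were first seen.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

const sampleLetters = "thegoldbug".split(""); // one letter per array item
const sampleCounts = countItems(sampleLetters);
console.log(sampleCounts.map(([letter, n]) => letter + ":" + n).join(" "));
```

In this tiny sample only "g" occurs twice, so it sorts to the front; with the full text, the same approach would yield a ranking like the one in the table.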

So we have a simple output of our table of letters. We definitely see that "e" is first (counting the letters sorts them automatically by frequency), though "t" in second place is a bit of a surprise compared to the sequence given in the story. Let's try to output the two sets of letters together:

var lettersDescribed = "e a o i d h n r s t u y c f g l m w b k p q x z";
var lettersCounted = table.getHeaders().join(' '); // combine letters with a space
show("<pre>"+lettersDescribed+"\n"+lettersCounted+"</pre>");
e a o i d h n r s t u y c f g l m w b k p q x z
e t a o i n s h r d l u c m f w p y g b v k x j q z

Very interesting! For one thing, this shows that the list of letters described in the text is missing two members, which we can determine to be "v" and "j". Is this an omission by the author? By the edition? A quick look at some other editions online suggests that the omission is common, perhaps to all editions. The order of many other letters is different as well; the differences among the top letters are the most significant (such as the prominence of "t" in the text of "The Gold Bug", which interrupts the sequence of vowels). We'd probably want to compare these frequencies to other texts as well to see if "The Gold Bug" is somehow distinctive (and how commonly the sequence claimed in the text is right or wrong).
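
The comparison made by eye here can also be sketched in plain JavaScript. The two strings below are copied from the output shown earlier; the helper rankOf and the other variable names are made up for this illustration and are not part of Spyral.

```javascript
// The sequence described in the story and the sequence we counted,
// both copied from the output shown earlier.
const described = "e a o i d h n r s t u y c f g l m w b k p q x z".split(" ");
const counted = "e t a o i n s h r d l u c m f w p y g b v k x j q z".split(" ");

// Letters that we counted but that the story's sequence never mentions.
const missing = counted.filter(letter => !described.includes(letter));

// Hypothetical helper: 1-based rank of a letter in a sequence (0 if absent).
function rankOf(letter, sequence) {
  return sequence.indexOf(letter) + 1;
}

console.log("missing from the story's list: " + missing.join(", "));
console.log("rank of t: described " + rankOf("t", described) +
            ", counted " + rankOf("t", counted));
```

The same rankOf helper could be applied to every letter to list how far each one moves between the two sequences.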

Displaying Data from a Table

Although the default table.show() method that we used provides a nice display of the letter frequencies, it's also possible to display data from a table in other ways. In Spyral, show() is usually used for the simplest (and static) output possible, while embed() is used for embedding interactive views. We can generate a default grid from a table by simply calling embed (with a parameter to indicate that the grid should occupy 100% of the width available).

table.embed({width: "100%", height: "100px"});

Although we can show and hide columns (see documentation on grids), in this case the grid isn't all that useful, and arguably harder to read than output from show(). But when tables have several rows of data and one wishes to do things like sort columns, embedding a table can be useful.

In this case what might be more useful is embedding a chart. However, before we do that, we're going to recreate our table with the default vertical orientation instead of the horizontal orientation we had previously (which was a very compact way of seeing letters as column headers and counts in the first row, but which arguably isn't a very standard way of showing tabular data).

var verticalTable = new VoyantTable({count: letters, headers: ["Letter", "Count"]}); // create a table with letter counts
verticalTable.show();
Letter  Count
e       7625
t       5485
a       4478
o       4208
i       4183
n       3912
s       3516
h       3372
r       3278
d       2533
l       2324
u       1893
c       1523
m       1500
f       1392
w       1303
p       1170
y       1146
g       1143
b       1031
v       525
k       351
x       120
j       111
q       60
z       44

No surprises, but now that we have a more standard table, we can easily create a line chart that shows the letter counts on the y (vertical) axis and the letters on the x (horizontal) axis.

verticalTable.embed('VoyantChart', {width: 500});

The resulting graph is something close to what's called Zipf's Law, which states that (for many linguistic phenomena like letter and word counts) frequency is inversely proportional to rank (roughly speaking, the second most frequent item occurs about half as often as the first, the third about a third as often, and so on). While not quite a clean curve, the letter counts in "The Gold Bug" somewhat follow Zipf's Law.
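
To get a rough sense of how closely the counts track Zipf's Law, one option is to compare the observed counts with the value that a strict inverse relationship would predict, namely the top count divided by the rank. The sketch below does this in plain JavaScript for the five most frequent letters, using the counts from the table above; the variable names are made up for this illustration, and the calculation is a simple check rather than a formal statistical test.

```javascript
// Top five letter counts, copied from the vertical table above.
const observed = [["e", 7625], ["t", 5485], ["a", 4478], ["o", 4208], ["i", 4183]];

// Under a strict inverse relationship, the count at a given rank
// would be the top count divided by that rank.
const topCount = observed[0][1];
const comparison = observed.map(([letter, count], index) => {
  const rank = index + 1;
  return { letter, rank, count, zipf: Math.round(topCount / rank) };
});

for (const row of comparison) {
  console.log(row.rank + " " + row.letter +
              " observed " + row.count + " predicted " + row.zipf);
}
```

For these top letters the observed counts stay above the predicted values, which is one way to see why the curve is not a clean match.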

So we've looked at how easy and convenient it can be to create a table, in this case of letter frequencies in "The Gold Bug", and how tables can be displayed, including as grids and as charts.

Next Steps

Here are some exercises to try, based on the contents of this notebook:

  • the prominence of the letter "t" in "The Gold Bug" may seem unusual next to the sequence described in the story. Any theories why? Could you describe in English (not in code) the steps that might be needed to better understand the use and distribution of the letter "t"?
  • create a new corpus from a different short story and start exploring it in the same way we did here. Is the order of letters different?

If you're working sequentially through the Art of Literary Text Analysis with Spyral Notebooks, the next notebook is Thinking about Scale.