@JonathanReeve I wanted to make sure I understand the methodology correctly. Is this a fair summary?
To calculate which words are disproportionately present in the quoted vs. unquoted parts of the novel, we first split the novel's text into two parts: all the words that had been quoted at least once and all the words that hadn't. For the quoted corpus, we then weighted each word by the number of times it had appeared in any quotation [did we do this? if not I think it's crucial that we do!]. Finally we normalized the word frequencies so that they represented the frequency per 100,000 words, to allow for comparison of different-sized corpora.
When a word's frequency per 100,000 words was similar in both corpora, we deemed its quotation to be proportionate to the whole. When the frequencies diverged most strongly, we considered the word "disproportionate" and thus possible evidence of selection, as opposed to simply arising by chance. Here we offer a list of the top 25 words most disproportionately present in quoted vs. unquoted text.
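To make sure we mean the same thing, here is a minimal Python sketch of the procedure as I've summarized it. It is not the code we actually ran. It assumes we have the novel as a list of tokens plus a parallel list giving how many quotations cover each token (`tokens` and `quote_counts` are hypothetical inputs), and it measures divergence as a smoothed ratio of per-100,000 frequencies, which is my own assumption about what "diverged most strongly" means.

```python
from collections import Counter

PER = 100_000  # normalization base: frequency per 100,000 words


def weighted_counts(tokens, quote_counts):
    """Split tokens into a quoted (weighted) corpus and an unquoted corpus.

    tokens: list of word strings (hypothetical input).
    quote_counts: parallel list of ints, the number of quotations
        covering each token (hypothetical input).
    """
    quoted, unquoted = Counter(), Counter()
    for word, n in zip(tokens, quote_counts):
        if n > 0:
            quoted[word] += n  # weight by how often the token was quoted
        else:
            unquoted[word] += 1
    return quoted, unquoted


def per_100k(counts):
    """Convert raw counts to frequencies per 100,000 words."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count * PER / total for word, count in counts.items()}


def most_disproportionate(quoted, unquoted, top=25, smoothing=1.0):
    """Rank words by the ratio of quoted to unquoted frequency.

    The additive smoothing (an assumption) avoids division by zero
    for words that never appear in the unquoted corpus.
    """
    q, u = per_100k(quoted), per_100k(unquoted)
    vocab = set(q) | set(u)
    ratios = {
        word: (q.get(word, 0.0) + smoothing) / (u.get(word, 0.0) + smoothing)
        for word in vocab
    }
    return sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

If the weighting step in the middle of `weighted_counts` is not something we did, then `quoted[word] += n` would effectively have been `quoted[word] += 1`, and heavily quoted passages would count no more than passages quoted once.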
And just a question that occurs to me as I'm writing this... When we're comparing these two corpora (unquoted vs. quoted-weighted), does it make more sense to compare their frequencies to each other or to the original full text of Middlemarch? Right now I'm struggling to get my head around the different implications of each approach, but I know they are different!
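A toy calculation may help show one way the two baselines differ. The counts below are invented purely for illustration (they are not from our data), and the sketch ignores the quotation weighting so that the full text is simply the quoted and unquoted parts added together.

```python
def rel_freq(count, total, per=100_000):
    """Frequency of a word per `per` words of a corpus."""
    return count * per / total


# Invented toy counts for a single word (illustration only).
quoted_count, quoted_total = 30, 10_000
unquoted_count, unquoted_total = 90, 90_000

# Without weighting, the full text is the union of the two parts.
full_count = quoted_count + unquoted_count
full_total = quoted_total + unquoted_total

q = rel_freq(quoted_count, quoted_total)      # quoted corpus
u = rel_freq(unquoted_count, unquoted_total)  # unquoted corpus
f = rel_freq(full_count, full_total)          # full text

ratio_vs_unquoted = q / u  # quoted compared to unquoted
ratio_vs_full = q / f      # quoted compared to the whole novel

print(ratio_vs_unquoted, ratio_vs_full)
```

Because the full text contains the quoted passages, comparing against it dampens the contrast: in this unweighted setup the quoted-to-full ratio can never exceed `full_total / quoted_total`, whereas the quoted-to-unquoted ratio has no such ceiling. Once the quoted corpus is weighted, it is no longer a plain subset of the full text, so this bound would not carry over directly.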