# Sentiment Analysis
```{r setup_sentiment, echo=FALSE}
knitr::opts_chunk$set(cache.rebuild=T, cache=T)
```
<div style='text-align:center; font-size: 10rem; color:#990024CC'>
<i class="far fa-smile-beam fa-10x"></i>
<i class="far fa-meh-blank fa-10x"></i>
<i class="far fa-angry fa-10x"></i>
</div>
## Basic idea
A common and intuitive approach to text is <span class="emph">sentiment analysis</span>. In a grand sense, we are interested in the emotional content of some text, e.g. posts on Facebook, tweets, or movie reviews. Most of the time the sentiment is obvious to a human reader, but if you have hundreds of thousands or millions of strings to analyze, you'd like to be able to do so efficiently.
We will use the <span class="pack">tidytext</span> package for our demonstration. It comes with a lexicon of positive and negative words that is actually a combination of multiple sources, one of which provides numeric ratings, while the others suggest different classes of sentiment.
```{r lexicon, echo=-1}
set.seed(1234)
library(tidytext)
library(dplyr)

sentiments %>% slice(sample(1:nrow(sentiments)))
```
The gist is that we are dealing with a specific, pre-defined vocabulary, so any analysis will only be as good as the lexicon. The goal is usually to assign a sentiment score to a text: possibly an overall numeric score, or a general positive or negative label. Given such scores, other analyses may then be implemented to predict sentiment via standard regression tools or machine learning approaches.
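Before getting into the issues and fuller examples, here is a minimal sketch of the scoring idea on a toy sentence. It assumes the <span class="pack">dplyr</span> and <span class="pack">tidytext</span> packages, and note that `get_sentiments("afinn")` may prompt a one-time download via the <span class="pack">textdata</span> package.
```{r toy_sentiment_sketch, eval=FALSE}
library(dplyr)
library(tidytext)

# a toy 'document'
doc = tibble(text = "I love this wonderful, horrible movie")

doc %>%
  unnest_tokens(word, text) %>%                          # one row per word
  inner_join(get_sentiments("afinn"), by = "word") %>%   # keep only lexicon words
  summarise(sentiment = sum(value))                      # overall document score
```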
## Issues
### Context, sarcasm, etc.
Now consider the following.
```{r sent_is_sick}
sentiments %>% filter(word=='sick')
```
Despite the above assigned sentiments, the word *sick* has been used since at least 1960s surfing culture as slang for positive affect. A basic approach to sentiment analysis as described here will not be able to detect slang or other context, like sarcasm. However, lots of training data for a particular context may allow one to correctly predict such sentiment. In addition, there are slang lexicons available, or one can simply add their own entries to complement any available lexicon, as sketched below.
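For example, here is a sketch of complementing the bing lexicon with a couple of slang entries (the specific words and their assigned sentiment here are purely for illustration).
```{r slang_lexicon_sketch, eval=FALSE}
library(dplyr)
library(tidytext)

# hypothetical slang entries to complement the standard lexicon
slang = tibble(
  word      = c('sick', 'gnarly'),
  sentiment = c('positive', 'positive')
)

my_lexicon = get_sentiments('bing') %>%
  filter(word != 'sick') %>%   # drop the default negative entry
  bind_rows(slang)
```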
### Lexicons
In addition, the lexicons are mostly applicable to *general* usage of English in the Western world. Some might wonder where exactly these came from, or who decided that the word *abacus* should be affiliated with 'trust'. You may start down that path by typing `?sentiments` at the console, with the <span class="pack">tidytext</span> package loaded.
## Sentiment Analysis Examples
### The first thing the baby did wrong
We demonstrate sentiment analysis with the text *The first thing the baby did wrong*, which is a very popular brief guide to parenting written by world-renowned psychologist [Donald Barthelme][Donald Barthelme], who, in his spare time, also wrote postmodern literature. This particular text concerns an issue with the baby, whose name is Born Dancin', and who likes to tear pages out of books. Her parents attempt to rectify the situation without much success, but things are finally resolved in the end. The ultimate goal will be to see how sentiment in the text evolves over time, and in general we'd expect things to end more positively than they began.
How do we start? Let's look again at the <span class="objclass">sentiments</span> data set in the <span class="pack">tidytext</span> package.
```{r inspect_sentiments}
sentiments %>% slice(sample(1:nrow(sentiments)))
```
The <span class="objclass">bing</span> lexicon provides only *positive* or *negative* labels. The AFINN lexicon, on the other hand, is numeric, with ratings from -5 to 5 in the <span class="objclass">score</span> column (named <span class="objclass">value</span> in more recent versions). The others get more imaginative, but also more problematic. Why *assimilate* is *superfluous* is beyond me. It clearly should be negative, given the [Borg](https://en.wikipedia.org/wiki/Borg_%28Star_Trek%29) connotations.
```{r superfluous}
sentiments %>%
filter(sentiment=='superfluous')
```
#### Read in the text files
But I digress. We start with the raw text, reading it in line by line. In what follows we read in all the texts (three) in a given directory, such that each element of 'text' is a work itself, i.e. `text` is a list column[^text]. The <span class="func">unnest</span> function will then unravel the works so that each entry is essentially a paragraph.
```{r baby_sentiment_importraw, echo=T}
library(tidytext)
library(tidyverse)  # for map, read_lines, unnest, etc.

barth0 =
  tibble(file = dir('data/texts_raw/barthelme', full.names = TRUE)) %>%
  mutate(text = map(file, read_lines)) %>%
  transmute(work = basename(file), text) %>%
  unnest(text)
```
#### Iterative processing
One of the things stressed in this document is the iterative nature of text analysis. You will consistently take two steps forward, and then one or two back as you find issues that need to be addressed. For example, in a subsequent step I found there were encoding issues[^encoding], so the following attempts to fix them. In addition, we want to <span class="emph">tokenize</span> the documents such that our <span class="emph">tokens</span> are sentences (e.g. as opposed to words or paragraphs). The reason for this is that I will be summarizing the sentiment at sentence level.
```{r barth_fix_encoding, echo=1:2, eval=1:2}
# Fix encoding, convert to sentences; you may get a warning message
barth = barth0 %>%
mutate(
text =
sapply(
text,
stringi::stri_enc_toutf8,
is_unknown_8bit = TRUE,
validate = TRUE
)
) %>%
unnest_tokens(
output = sentence,
input = text,
token = 'sentences'
)
save(barth, file='data/barth_sentences.RData')
```
#### Tokenization
The next step is to drill down to just the document we want, and subsequently tokenize to the word level. Along the way, I also create a sentence id so that we can group on it later.
```{r get_the_baby}
# get baby doc, convert to words
baby = barth %>%
filter(work=='baby.txt') %>%
mutate(sentence_id = 1:n()) %>%
unnest_tokens(
output = word,
input = sentence,
token = 'words',
drop = FALSE
) %>%
ungroup()
```
#### Get sentiments
Now that the data has been prepped, getting the sentiments is ridiculously easy. But that is how it is with text analysis: all the hard work goes into the data processing. Here all we need is an <span class="emph">inner join</span> of our words with a sentiment lexicon of choice. This process will only retain words that are also in the lexicon. I use the numeric AFINN lexicon here, and then compute a summed sentiment score by sentence.
```{r baby_sentiment}
# get sentiment via inner join
baby_sentiment = baby %>%
  inner_join(get_sentiments("afinn"), by = 'word') %>%
  group_by(sentence_id, sentence) %>%
  summarise(sentiment = sum(value)) %>%  # 'value' was named 'score' in older versions of the lexicon
  ungroup()
```
#### Alternative approach
As we are interested in the sentence level, it turns out that the <span class="pack">sentimentr</span> package has built-in functionality for this, and it includes more nuanced sentiment scoring that takes into account <span class="emph">valence shifters</span>, e.g. words that would negate something with positive or negative sentiment ('I do ***not*** like it').
```{r sentimentr, eval=FALSE}
library(sentimentr)

baby_sentiment = barth0 %>%
filter(work=='baby.txt') %>%
get_sentences(text) %>%
sentiment() %>%
drop_na() %>% # empty lines
mutate(sentence_id = row_number())
```
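To see the valence shifter handling in isolation, a quick sketch (assuming the <span class="pack">sentimentr</span> package):
```{r sentimentr_negation_sketch, eval=FALSE}
library(sentimentr)

# the negated sentence should receive a negative score rather than a positive one
sentiment(get_sentences(c('I like it.', 'I do not like it.')))
```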
The following visualizes sentiment over the progression of sentences (note that not every sentence will receive a sentiment score). You can read the sentence by hovering over the dot. The <span style="color:#5500ff">▬</span> is the running average.
```{r plot_sentiment, echo=FALSE}
# plot sentiment over sentences
baby_sentiment %>%
mutate(running_avg = cumsum(sentiment)/row_number()) %>%
plot_ly(width='50%') %>%
add_lines(x=~sentence_id, y=~running_avg,
color=I('#5500ff'),
line = list(shape = "spline"),
opacity=.5,
name='running average',
showlegend=T) %>%
add_lines(x=~sentence_id, y=~sentiment,
color=I('#00aaff'),
line = list(shape = "spline"),
showlegend=F) %>%
add_markers(x=~sentence_id, y=~sentiment,
color=I('#ff5500'),
marker = list(size=15),
hoverinfo=~ 'text',
text=~str_wrap(sentence),
showlegend=F) %>%
theme_plotly()
```
<br>
In general, the sentiment starts out negative as the problem is explained. It bounces back and forth a bit, but ends on a positive note. You'll also see that some sentences' context is not captured. For example, sentence 16 is 'But it didn't do any good'. However, *good* is going to be marked as positive sentiment in any lexicon by default. In addition, token length will matter: longer sentences are more likely to have some sentiment, for example.
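One hedge against the length issue is to average word sentiment per sentence rather than sum it; a sketch using the objects from above:
```{r baby_sentiment_avg_sketch, eval=FALSE}
# average rather than sum, so longer sentences aren't scored more extremely
# merely for containing more words
baby_sentiment_avg = baby %>%
  inner_join(get_sentiments('afinn'), by = 'word') %>%
  group_by(sentence_id, sentence) %>%
  summarise(sentiment = mean(value)) %>%
  ungroup()
```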
### Romeo & Juliet
For this example, I'll invite you to more or less follow along, as there is notable pre-processing that must be done. We'll look at sentiment in Shakespeare's Romeo and Juliet. I have a cleaner version in the raw texts folder, but we can take the opportunity to use the <span class="pack">gutenbergr</span> package to download it directly from Project Gutenberg, a storehouse for works that have entered the public domain.
```{r rnj_load, echo=-c(3,5)}
library(gutenbergr)
gw0 = gutenberg_works(title == "Romeo and Juliet") # look for something with this title
gw0[,1:4]
rnj = gutenberg_download(gw0$gutenberg_id)
DT::datatable(rnj,
rownames=F,
options=list(dom='tp',
autoWidth = TRUE,
columnDefs = list(list(width = '50px', targets = 0))))
```
<br>
We've got the text now, but there is still work to be done. The following is a quick and dirty approach, but see the [Shakespeare section][Shakespeare Start to Finish] to see a more deliberate one.
We first slice off the initial parts we don't want, like the title, author, etc. Then we get rid of other tidbits that would interfere, using a little regex to aid the process.
```{r rnj_clean, echo=-2}
rnj_filtered = rnj %>%
slice(-(1:49)) %>%
filter(!text==str_to_upper(text), # will remove THE PROLOGUE etc.
!text==str_to_title(text), # will remove names/single word lines
!str_detect(text, pattern='^(Scene|SCENE)|^(Act|ACT)|^\\[')) %>%
select(-gutenberg_id) %>%
unnest_tokens(sentence, input=text, token='sentences') %>%
mutate(sentenceID = 1:n())
DT::datatable(select(rnj_filtered, sentenceID, sentence),
rownames=F,
options=list(dom='tp',
autoWidth = TRUE,
columnDefs = list(list(width = '50px', targets = 0)))
)
```
<br>
The following unnests the data to word tokens. In addition, you can remove stopwords like *a*, *an*, *the*, etc., and <span class="pack">tidytext</span> comes with a <span class="objclass">stop_words</span> data frame. However, some of the stopwords have sentiments, so you would get a somewhat different result if you retained them. As Black Sheep once said, the choice is yours: you can deal with this, or you can deal with that.
```{r rnj_stopwords}
# show some of the matches
stop_words$word[which(stop_words$word %in% sentiments$word)] %>% head(20)
# remember to name the output 'word', or the anti_join below won't work without a 'by' argument
rnj_filtered = rnj_filtered %>%
unnest_tokens(output=word, input=sentence, token='words') %>%
anti_join(stop_words)
```
Now we add the sentiments via the <span class="func">inner_join</span> function. Here I use 'bing', but you can use another, and you might get a different result.
```{r rnj_sentiment}
rnj_filtered %>%
count(word) %>%
arrange(desc(n))
rnj_sentiment = rnj_filtered %>%
inner_join(sentiments)
rnj_sentiment
```
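As a point of comparison, the same join with a different lexicon, e.g. NRC, yields different coverage and categories (again, this may prompt a one-time download via the <span class="pack">textdata</span> package):
```{r rnj_nrc_sketch, eval=FALSE}
# same idea with the nrc lexicon; expect different words matched and
# multiple sentiment categories per word
rnj_filtered %>%
  inner_join(get_sentiments('nrc'), by = 'word') %>%
  count(sentiment, sort = TRUE)
```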
```{r rnj_bing}
rnj_sentiment_bing = rnj_sentiment %>%
filter(lexicon=='bing')
table(rnj_sentiment_bing$sentiment)
```
```{r sentimentr_rng, eval=FALSE, echo=FALSE, cache=FALSE}
# appears to give more (too much?) weight to positive
library(sentimentr)
test = rnj %>%
slice(-(1:49)) %>%
filter(!text==str_to_upper(text), # will remove THE PROLOGUE etc.
!text==str_to_title(text), # will remove names/single word lines
!str_detect(text, pattern='^(Scene|SCENE)|^(Act|ACT)|^\\[')) %>%
select(-gutenberg_id) %>%
pull(text) %>%
paste(collapse = ' ') %>%
get_sentences() %>%
sentiment()
ay <- list(
tickfont = list(color = "green"),
overlaying = "y",
side = "right",
title = "raw sentiment",
titlefont = list(textangle=45),
zeroline = F
)
test %>%
mutate(positivity = cumsum(if_else(sentiment>0, sentiment, 0)),
negativity = cumsum(abs(if_else(sentiment<0, sentiment, 0)))) %>%
plot_ly() %>%
add_lines(x=~sentence_id, y=~positivity, name='positive') %>%
add_lines(x=~sentence_id, y=~negativity, name='negative') %>%
layout() %>%
add_lines(x=~sentence_id, y=~sentiment, name='sentiment', yaxis = "y2", opacity=.1) %>%
# plotly by default has the second axis title backwards, but provides no fix
# other than to use annotate, which won't work because it extends the plot x's
# range; 'shift' does nothing but add or subtract to the x value, making it
# pointless
# add_annotations(text = 'plotly strikes again', x = 1440, xshift=100, y = 75, textangle=90, showarrow = F) %>%
theme_plotly() %>%
layout(
yaxis = list(title='absolute cumulative sentiment'),
yaxis2 = ay,
xaxis = list(title="sentence ID")
)
test %>%
mutate(positivity = cumsum(if_else(sentiment>0, sentiment, 0)),
negativity = cumsum(abs(if_else(sentiment<0, sentiment, 0)))) %>%
plot_ly() %>%
add_lines(x=~sentence_id, y=~positivity - negativity) %>%
layout(yaxis = list(title='sentiment')) %>%
theme_plotly()
```
Looks like this one is going to be a downer. The following visualizes the positive and negative sentiment scores as one progresses sentence by sentence through the work, using the <span class="pack">plotly</span> package. I also show the same information expressed as a difference (the fainter line).
```{r rnj_sentiment_as_game, echo=FALSE}
# Note that plotly will start disappearing with ANY change, whether this has to
# do with the code, text or whatever, so you'll need to rebuild the cache for
# this page one last time to ensure it will be displayed.
ay <- list(
tickfont = list(color = "#2ca02c40"),
overlaying = "y",
side = "right",
# title = "sentiment difference",
titlefont = list(textangle=45),
zeroline = F
)
rnj_sentiment_bing %>%
arrange(sentenceID) %>%
mutate(positivity = cumsum(sentiment=='positive'),
negativity = cumsum(sentiment=='negative')) %>%
plot_ly() %>%
add_lines(x=~sentenceID, y=~positivity, name='positive') %>%
add_lines(x=~sentenceID, y=~negativity, name='negative') %>%
add_lines(x=~sentenceID, y=~positivity-negativity, name='difference',
yaxis = "y2",
opacity=.25) %>%
layout(
xaxis = list(dtick = 200),
yaxis = list(title='absolute cumulative sentiment'),
yaxis2 = ay
) %>%
theme_plotly()
```
```{r rnj_sentiment_diff, echo=F, eval=FALSE}
rnj_sentiment_bing %>%
arrange(sentenceID) %>%
mutate(positivity = cumsum(sentiment=='positive'),
negativity = cumsum(sentiment=='negative')) %>%
plot_ly() %>%
add_lines(x=~sentenceID, y=~positivity-negativity) %>%
theme_plotly() %>%
config(displayModeBar = F)
```
<br>
It's a close game until perhaps the midway point, when negativity takes over and despair settles in for the rest of the story. By the end [[:SPOILER ALERT:]] Sean Bean is beheaded, Darth Vader reveals himself to be Luke's father, and Verbal is Keyser Söze.
## Sentiment Analysis Summary
In general, sentiment analysis can be a useful exploration of data, but it is highly dependent on the context and the tools used. Note also that 'sentiment' can be anything; it doesn't have to be positive vs. negative. Any vocabulary may be applied, so the approach has more utility than the usual implementation suggests.
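For instance, the Loughran-McDonald lexicon, available through <span class="pack">tidytext</span>, tags terms from financial text with categories like 'uncertainty' and 'litigious' rather than just positive versus negative (it is also where the *superfluous* category seen earlier comes from):
```{r loughran_sketch, eval=FALSE}
# sentiment categories beyond positive/negative
get_sentiments('loughran') %>%
  distinct(sentiment)
```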
It should also be noted that the above demonstration is largely conceptual and descriptive. While fun, it's a bit simplified. For starters, classifying words as simply positive or negative is itself not a straightforward endeavor. As we noted at the beginning, context matters, and in general you'd want to take it into account. Modern methods of sentiment analysis would use approaches like word2vec or deep learning to predict a sentiment probability, as opposed to a simple word match. Even in the above, matching sentiments to texts would probably only be a precursor to building a model predicting sentiment, which could then be applied to new data.
## Exercise
### Step 0: Install the packages
If you haven't already, install the <span class="pack">tidytext</span> package. Install the <span class="pack">janeaustenr</span> package and load both of them[^lazy].
### Step 1: Initial inspection
First you'll want to look at what we're dealing with, so take a gander at <span class="objclass">austen_books</span>.
```{r ja_inspect}
library(tidytext); library(janeaustenr)
austen_books()
austen_books() %>%
distinct(book)
```
We will examine only one text. In addition, for this exercise we'll take a little bit of a different approach, looking for a specific kind of sentiment using the NRC database. It contains 10 distinct sentiments.
```{r nrc_sentiment}
get_sentiments("nrc") %>% distinct(sentiment)
```
Now select any of those sentiments you like (one or more), and one of the texts, as follows.
```{r nrc_init, eval=FALSE}
nrc_sadness <- get_sentiments("nrc") %>%
  filter(sentiment == "sadness")
ja_book = austen_books() %>%
filter(book == "Emma")
```
```{r nrc_init_, echo=FALSE}
nrc_bad <- get_sentiments("nrc") %>%
filter(sentiment %in% c('fear', 'negative', 'sadness', 'anger', 'disgust'))
ja_book = austen_books() %>%
filter(book == "Mansfield Park")
```
### Step 2: Data prep
Now we do a little prep, and I'll save you the trouble. You can just run the following.
```{r ja_prep, eval=FALSE}
ja_book = ja_book %>%
mutate(chapter = str_detect(text, regex("^chapter [\\divxlc]", ignore_case = TRUE)),
chapter = cumsum(chapter),
line_book = row_number()) %>%
unnest_tokens(word, text)
```
```{r ja_prep2}
ja_book = ja_book %>%
mutate(chapter = str_detect(text, regex("^chapter [\\divxlc]", ignore_case = TRUE)),
chapter = cumsum(chapter),
line_book = row_number()) %>%
group_by(chapter) %>%
mutate(line_chapter = row_number()) %>%
unnest_tokens(word, text)
```
### Step 3: Get sentiment
Now, on your own, try the inner join approach we used previously to match the sentiments to the text. Don't overthink this. The third pipe step will use the <span class="func">count</span> function with the `word` column, along with the argument `sort=TRUE`. Note this is just to look at your result; we aren't assigning it to an object yet.
```{r ja_join_sentiment_example, eval=FALSE}
ja_book %>%
? %>%
?
```
The following shows my negative evaluation of <span class="emph">Mansfield Park</span>.
```{r ja_join_sentiment, echo=FALSE}
ja_book %>%
inner_join(nrc_bad) %>%
count(word, sort = TRUE)
```
### Step 4: Visualize
Now let's do a visualization for sentiment. So redo your inner join, but we'll create a data frame that has the information we need.
```{r negativity_data}
plot_data = ja_book %>%
inner_join(nrc_bad) %>%
group_by(chapter, line_book, line_chapter) %>%
count() %>%
group_by(chapter) %>%
mutate(negativity = cumsum(n),
mean_chapter_negativity=mean(negativity)) %>%
group_by(line_chapter) %>%
mutate(mean_line_negativity=mean(n))
plot_data
```
At this point you have enough to play with, so I leave you to plot whatever you want.
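If you want a simple starting point, the following sketch (assuming <span class="pack">ggplot2</span>) draws one line per chapter:
```{r negativity_sketch, eval=FALSE}
library(ggplot2)

# cumulative negativity across each chapter, one line per chapter
plot_data %>%
  ggplot(aes(x = line_chapter, y = negativity, group = chapter)) +
  geom_line(alpha = .5) +
  labs(x = 'Line within chapter', y = 'Cumulative negativity')
```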
The following[^badplot] shows both the total negativity within a chapter, as well as the per-line negativity within a chapter. We can see that there is less negativity toward the end of chapters. We can also see that there appears to be more negativity in later chapters (darker lines).
```{r negativity_plot, echo=FALSE, fig.height=6, fig.width=8, dpi=100, fig.retina=2}
plot_data = ungroup(plot_data)
plot_data %>%
mutate(colspec = viridis::plasma(n=n_distinct(plot_data$chapter), begin=1, end=0)[chapter],
colspec2 = viridis::plasma(n=n_distinct(plot_data$line_chapter), begin=0, end=1)[factor(line_chapter)]) %>%
ggplot(aes(x=line_chapter, y=negativity)) +
geom_point(aes(y=mean_line_negativity*85, color=I(colspec2)), alpha=.05) +
# scale_color_viridis(option='C', discrete=T) +
geom_line(aes(color=I(colspec), group=factor(chapter)), alpha=.9, lwd=.75) +
# geom_point(aes(y=mean_line_negativity*85), stat='identity', color='#ff5500', alpha=.25) +
labs(x='Line number', y='Negativity') +
ggtitle('Negativity in Mansfield Park') +
scale_y_continuous(breaks = c(0,200,400,600),
limits = c(0,900),
sec.axis = sec_axis(~./85, name = "Per line mean negativity")) +
theme(legend.position='none') +
theme_trueMinimal()
```
[^text]: In practice, I suggest not naming your column 'text'. `text` is a base R function, and using the same name within the tidyverse may cause problems distinguishing the function from the column (similar to the `n()` function and the `n` column created by <span class="func">count</span> and <span class="func">tally</span>). I only do so here for pedagogical reasons.
[^encoding]: There are almost always encoding issues in my experience.
[^lazy]: This exercise is more or less taken directly from the [tidytext book](http://tidytextmining.com/sentiment.html).
[^badplot]: This depiction goes against many of my visualization principles. I like it anyway.
```{r setup_sentiment_return, echo=FALSE}
knitr::opts_chunk$set(cache.rebuild=F, cache=T)
```