In CL we use a lot of statistics and probability theory. Why?
- A: They give us several distributions to characterise how people use language.
- B: Natural languages are ambiguous.
- C: We need to perform statistical tests to compare models.
- D: They offer a set of efficient algorithms for automating processes.
Which of the following features does not apply to natural languages?
- A: Subject to change
- B: Unambiguous
- C: Conventional
- D: Context-dependent
Which level of linguistic analysis deals with the meaning of morphemes and words?
- A: Lexical semantics
- B: Syntax
- C: Morphology
- D: Compositional semantics
Which of the following is a diachronic corpus?
- A: CHILDES
- B: SubtLex
- C: Corpus of Historical American English
- D: TASA
Which of the following resources is not a lexicon?
- A: Words with proportion of native speakers who know the word meaning
- B: Words with concreteness ratings
- C: Words with age of acquisition estimates
- D: Words with their meaning definition
What is a valid hyponym of dog in WordNet?
- A: Dalmatian
- B: Animal
- C: Canine
- D: Cat
- B: Natural languages are ambiguous.
- B: Unambiguous
- A: Lexical semantics
- C: Corpus of Historical American English
- A: Dalmatian
Which of the following is not an example of text classification?
- A: Essay grading (pass/fail)
- B: Text simplification
- C: Sentiment analysis
- D: Cyberbullying detection
Which of the following is an advantage of rule systems?
- A: Robust to rare events
- B: Cheap to write
- C: Cannot incorporate domain knowledge
- D: Can deal with ambiguity effortlessly
Which of the following statements about discriminative classifiers is wrong?
- A: They can only address binary classification problems
- B: They learn the hidden process which yielded the data sample
- C: They can only learn linear boundaries
- D: They are non-deterministic classifiers
Which of the following is an example of extrinsic evaluation?
- A: Run a t-test between precision scores in automatic grading
- B: Compute the difference in accuracy between two classifiers
- C: Measure the customer satisfaction when interacting with two different bots
- D: Compare translation quality between two machine translation models
What does the likelihood capture in Bayes Rule?
- A: The probability of the class given the input
- B: The probability of the input
- C: The probability of the input given the class
- D: The probability of the class
What does the conditional independence assumption entail in NBC?
- A: An NBC doesn't track feature co-presence
- B: An NBC doesn't consider the probability of the document given the class
- C: An NBC doesn't track sequential information
- D: An NBC doesn't consider all classes when classifying
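For reference, the two definitions behind this pair of questions, in standard textbook form: Bayes' rule, and the conditional independence (naive Bayes) factorisation that drops feature co-presence:

$$
P(c \mid x) = \frac{P(x \mid c)\,P(c)}{P(x)} \qquad\qquad P(x_1, \ldots, x_n \mid c) \approx \prod_{i=1}^{n} P(x_i \mid c)
$$

Here $P(x \mid c)$ is the likelihood, $P(c)$ the prior, and $P(x)$ the evidence.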
Which of the following is not a stop word?
- A: I
- B: Child
- C: Do
- D: Because
In a dataset consisting of 100 tweets, 20 contain instances of cyberbullying. For the sake of argument, we pretend to be dealing with two binary features: whether the tweet contains at least one curse word, and whether the tweet contains non-alphabetic characters. The likelihood of containing at least one curse word given that a tweet is an instance of cyberbullying is 0.8, while the likelihood of containing non-alphabetic characters given that a tweet is not an instance of cyberbullying is 0.7.
What is the prior of the cyberbullying class?
- A: 0.1
- B: 0.8
- C: cannot tell
- D: 0.2
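The prior is just the relative class frequency in the training data:

$$
P(\text{cyberbullying}) = \frac{20}{100} = 0.2
$$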
In a dataset consisting of 100 tweets, 20 contain instances of cyberbullying. For the sake of argument, we pretend to be dealing with two binary features: whether the tweet contains at least one curse word, and whether the tweet contains non-alphabetic characters. The likelihood of containing at least one curse word given that a tweet is an instance of cyberbullying is 0.8, while the likelihood of containing non-alphabetic characters given that a tweet is not an instance of cyberbullying is 0.3.
Consider a test tweet containing at least one curse word and only alphabetic characters. What is the probability of the test tweet being an instance of cyberbullying?
- A: $0.8 \times 0.8 \times 0.3$
- B: $0.8 \times 0.2 \times 0.7$
- C: $0.2 \times 0.8 \times 0.3$
- D: $0.2 \times 0.2 \times 0.7$
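Spelled out, the keyed product is the prior times one likelihood per feature (curse word present, non-alphabetic characters absent):

$$
P(\text{cyber} \mid x) \propto P(\text{cyber}) \times P(\text{curse} \mid \text{cyber}) \times P(\neg\text{non-alph} \mid \text{cyber}) = 0.2 \times 0.8 \times 0.3
$$

where the final factor assumes, as the keyed answer does, $P(\text{non-alph} \mid \text{cyber}) = 1 - 0.3 = 0.7$, hence $P(\neg\text{non-alph} \mid \text{cyber}) = 0.3$.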
In a dataset consisting of 100 tweets, 20 contain instances of cyberbullying. For the sake of argument, we pretend to be dealing with two binary features: whether the tweet contains at least one curse word, and whether the tweet contains non-alphabetic characters. The likelihood of containing at least one curse word given that a tweet is an instance of cyberbullying is 0.8, while the likelihood of containing non-alphabetic characters given that a tweet is not an instance of cyberbullying is 0.3.
Consider a test tweet containing at least one curse word and only alphabetic characters. Would an NBC using these features classify it as an instance of cyberbullying?
- A: Yes
- B: Not enough information given
- C: It'd be a tie
- D: No
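A minimal numeric check of the decision. The question only states two likelihoods, so this sketch assumes the unstated ones are their complements ($P(\text{curse} \mid \neg\text{cyber}) = 0.2$, $P(\text{non-alph} \mid \text{cyber}) = 0.7$), which is the reading under which the keyed answers come out:

```python
# Naive Bayes scores for the test tweet: curse word present, non-alphabetic
# characters absent. The "not stated" likelihoods are taken as complements of
# the stated ones -- an assumption of this sketch, not a theorem.
prior = {"cyber": 0.2, "not_cyber": 0.8}
p_curse = {"cyber": 0.8, "not_cyber": 0.2}      # P(curse word | class)
p_nonalpha = {"cyber": 0.7, "not_cyber": 0.3}   # P(non-alphabetic | class)

scores = {
    c: prior[c] * p_curse[c] * (1 - p_nonalpha[c])  # second feature is absent
    for c in prior
}
print(scores)                       # cyber ~ 0.048, not_cyber ~ 0.112
print(max(scores, key=scores.get))  # not_cyber, so the keyed answer is "No"
```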
- B: Text simplification
- A: Robust to rare events
- B: They learn the hidden process which yielded the data sample
- C: Measure the customer satisfaction when interacting with two different bots
- C: The probability of the input given the class
- A: An NBC doesn't track feature co-presence
- B: Child
- D: 0.2
- C: $0.2 \times 0.8 \times 0.3$
- D: No
How many lemmas are there in the sentence:
"The children were curious about whether there would be a surprise at home or whether there had been enough surprises already."
Punctuation doesn't count.
- A: 21
- B: 19
- C: 18
- D: 16
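A sanity check of the count with a hand-written lemma map (a sketch; it assumes *would* lemmatises to *will* and *were*/*had*/*been* to *be*/*have*, the conventional dictionary lemmas):

```python
# Hand-written lemma map for the example sentence (illustrative; not the
# output of any particular lemmatiser).
lemma = {
    "the": "the", "children": "child", "were": "be", "curious": "curious",
    "about": "about", "whether": "whether", "there": "there", "would": "will",
    "be": "be", "a": "a", "surprise": "surprise", "at": "at", "home": "home",
    "or": "or", "had": "have", "been": "be", "enough": "enough",
    "surprises": "surprise", "already": "already",
}
tokens = ("the children were curious about whether there would be a surprise "
          "at home or whether there had been enough surprises already").split()
print(len(tokens))                      # 21 tokens (option A is the token count)
print(len({lemma[t] for t in tokens}))  # 16 distinct lemmas
```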
How many affixes are there in the word untrustworthy?
- A: 3
- B: 2
- C: 0
- D: 1
Which of the following words is inflected?
- A: Touchstone
- B: Colourful
- C: Children
- D: Professor
Which normalisation technique would you use before doing language identification?
- A: Lemmatisation
- B: Case folding
- C: None of them
- D: Tokenisation
Consider the regular expression /^[a-zA-Z]{2,6}\b/. What does it match?
- A: alphabetic strings between two and six characters at the beginning of a line followed by a word boundary
- B: any string but alphabetic strings between two and six characters
- C: Lines containing alphabetic strings between two and six characters
- D: any string between three and six characters
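A quick way to probe the pattern in Python (the test strings are mine):

```python
import re

pattern = re.compile(r"^[a-zA-Z]{2,6}\b")

# "seventy" fails: seven letters, and the {2,6} window cannot end at a word
# boundary inside the word. "hello123" fails: digits are word characters,
# so there is no boundary after the alphabetic prefix.
for text in ["hi", "hello world", "a", "seventy", "hello123"]:
    m = pattern.search(text)
    print(repr(text), "->", m.group(0) if m else "no match")
```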
What is the minimum edit distance between glowing and growling?
- A: 4
- B: 1
- C: 2
- D: 3
- D: 16
- B: 2
- C: Children
- C: None of them
- A: alphabetic strings between two and six characters at the beginning of a line followed by a word boundary
- C: 2
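A standard dynamic-programming sketch to verify the distance of 2; it assumes unit cost for substitution as well as insertion and deletion (under the Levenshtein variant where a substitution costs 2, the answer would be 3):

```python
def min_edit_distance(s: str, t: str) -> int:
    """Levenshtein distance with unit insert/delete/substitute costs."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (cs != ct),  # substitution (free if chars match)
            ))
        prev = curr
    return prev[-1]

print(min_edit_distance("glowing", "growling"))  # 2: substitute l->r, insert l
```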
How do we use language modelling in machine translation?
- A: To make sure the translation has the same meaning as the source
- B: To predict the next sentence in the translation
- C: To pick the most fluent candidate translation
- D: To pick the best word among possible candidate translations for a word in the source
Why do we care about the chain rule of probability?
- A: It tells us how to compute the probability of a sentence
- B: It deals with the infinite nature of language
- C: It tells us how to use limited context to approximate larger contexts
- D: It tells us how to deal with underflowing problems
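The rule itself, applied to a word sequence:

$$
P(w_1, \ldots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \ldots, w_{i-1})
$$

It decomposes the probability of a sentence exactly; it is the Markov assumption, not the chain rule, that later truncates the histories.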
How do we get ML estimates for bigram transition probabilities?
- A: Get co-occurrence counts and normalise by row marginals
- B: Get co-occurrence counts and take the log
- C: Get co-occurrence counts and normalise by column marginals
- D: Get co-occurrence counts and normalise by the matrix total
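Row-marginal normalisation written out: each bigram count is divided by the total count of its history word.

$$
P_{\mathrm{ML}}(w_j \mid w_i) = \frac{C(w_i, w_j)}{\sum_{w} C(w_i, w)} = \frac{C(w_i, w_j)}{C(w_i)}
$$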
Which of the following is NOT a component of Markov Chains?
- A: Transition counts
- B: Initial probability distribution
- C: Accepting state
- D: History states
If we fit a 4-gram language model, how many BoS symbols do we need to prepend to the sentences?
- A: 1
- B: 4
- C: 2
- D: 3
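Concretely, the first word of a sentence must already see a full three-word history,

$$
P(w_1 \mid \langle s \rangle, \langle s \rangle, \langle s \rangle)
$$

so an $n$-gram model needs $n-1$ BoS symbols.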
In linear interpolation, lambdas have to meet a strict requirement. Which one?
- A: Their sum must equal 1
- B: They must be lower than 1
- C: The highest value matches the largest n-gram available
- D: Their algebraic sum must be 0
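The trigram case written out; the constraint guarantees the interpolated estimates still form a probability distribution:

$$
\hat{P}(w_i \mid w_{i-2}, w_{i-1}) = \lambda_1 P(w_i) + \lambda_2 P(w_i \mid w_{i-1}) + \lambda_3 P(w_i \mid w_{i-2}, w_{i-1}), \qquad \lambda_1 + \lambda_2 + \lambda_3 = 1
$$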
- C: To pick the most fluent candidate translation
- A: It tells us how to compute the probability of a sentence
- A: Get co-occurrence counts and normalise by row marginals
- C: Accepting state
- D: 3
- A: Their sum must equal 1
Which of the following lexical categories is an example of open class words?
- A: Auxiliaries
- B: Possessive pronouns
- C: Adverbs
- D: Conjunctions
In terms of PoS tag ambiguity, types tend to be ... tokens.
- A: More ambiguous than
- B: As unambiguous as
- C: Less ambiguous than
- D: As ambiguous as
What does the emission probability matrix encode in a bigram HMM?
- A: The probability of a word given a word
- B: The probability of a tag given a word
- C: The probability of a tag given a tag
- D: The probability of a word given a tag
Which component of the HMM encodes the Markov assumption?
- A: The sequence of observations
- B: The observation likelihood matrix
- C: The initial distribution
- D: The state transition probability matrix
The Viterbi algorithm is an example of which of the following?
- A: A classifier
- B: A dynamic programming algorithm
- C: A rule-based system
- D: A vector space
What is the complexity of the Viterbi algorithm? $Q$ is the number of states, $t$ is the length of the sentence, $n$ is the order of the model, and $V$ is the vocabulary size.
- A: $O(Q^t)$
- B: $O(V^t \cdot n)$
- C: $O(Q^n \cdot t)$
- D: $O(Q \cdot V)$
Which of the following quantities does not contribute to the computation of the new posterior probability of observing each tag given the sequence of observed events up to that point?
- A: Transition probabilities from state $q_i$ to state $q_j$
- B: Emission probability for observation $o_j$ given state $q_j$
- C: Posterior probability up to the previous observed event
- D: The likelihood of state $q_j$ given observed event $o_j$
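For reference, the Viterbi recursion these questions revolve around (Jurafsky & Martin notation: $a_{ij}$ transition probabilities, $b_j(o_t)$ emission probabilities):

$$
v_t(j) = \max_{i=1}^{|Q|} v_{t-1}(i)\, a_{ij}\, b_j(o_t)
$$

Options A, B, and C each name one factor in this product; the reversed quantity $P(q_j \mid o_j)$ does not appear.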
In the Viterbi algorithm, we apply the argmax function. What is the input?
- A: The last column in the trellis
- B: The product of the last column in the trellis, the transition probability matrix and the emission probability matrix
- C: The product of the last column in the trellis and the transition probability matrix
- D: The product of the transition and emission probability matrices
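A compact sketch of the algorithm over dictionaries, in log space to avoid underflow (the toy tagset, words, and probabilities are made up for illustration). Note the sketch omits an explicit end state, so its final argmax is over the last trellis column alone; with an end state you would also multiply in the transition into it, as the keyed answer to the argmax question notes:

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most probable state sequence for obs (log space avoids underflow)."""
    # Initial column: initial distribution times the first emission.
    trellis = [{s: (math.log(start_p[s]) + math.log(emit_p[s][obs[0]]), None)
                for s in states}]
    for o in obs[1:]:
        column = {}
        for s in states:
            # Previous posterior * transition, maximised over predecessors;
            # then multiply in the emission probability for this observation.
            best, back = max((trellis[-1][r][0] + math.log(trans_p[r][s]), r)
                             for r in states)
            column[s] = (best + math.log(emit_p[s][o]), back)
        trellis.append(column)
    # argmax over the last column, then follow the backpointers.
    state = max(states, key=lambda s: trellis[-1][s][0])
    path = [state]
    for column in reversed(trellis[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]

states = ["N", "V"]
start_p = {"N": 0.7, "V": 0.3}
trans_p = {"N": {"N": 0.4, "V": 0.6}, "V": {"N": 0.8, "V": 0.2}}
emit_p = {"N": {"fish": 0.6, "sleep": 0.4}, "V": {"fish": 0.3, "sleep": 0.7}}
print(viterbi(["fish", "sleep"], states, start_p, trans_p, emit_p))  # ['N', 'V']
```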
- C: Adverbs
- C: Less ambiguous than
- D: The probability of a word given a tag
- D: The state transition probability matrix
- B: A dynamic programming algorithm
- C: $O(Q^n \cdot t)$
- D: The likelihood of state $q_j$ given observed event $o_j$
- C: The product of the last column in the trellis and the transition probability matrix
Which of the following is not a component of a grammar?
- A: a finite set of non-terminal states
- B: a distinguished start state
- C: a finite set of terminal states
- D: an infinite set of production rules
Which of the following rules is in CNF?
- A: S → he ran
- B: S → NP VP NP
- C: S → you VP
- D: S → NP VP
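Chomsky Normal Form allows exactly two rule shapes,

$$
A \rightarrow B\ C \qquad \text{or} \qquad A \rightarrow a
$$

with $A$, $B$, $C$ non-terminals and $a$ a terminal, which is why S → NP VP qualifies while the three-symbol and mixed right-hand sides do not.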
What kind of corpus do we need to estimate a CFG?
- A: Plain corpus
- B: Parallel corpus
- C: Treebank
- D: Corpus with PoS annotations
What is the defining feature of an S constituent?
- A: Its main verb has all its arguments
- B: Can only occur as the LHS of rules
- C: Cannot be coordinated with other S constituents
- D: Always contains at least one NP
What is the relation between CFGs and dynamic programming?
- A: We can use dynamic programming because context changes our sub-parses
- B: We cannot use dynamic programming because context will change our sub-parses
- C: We can use dynamic programming because context cannot change sub-parses
- D: They're unrelated
In order to initialise the table for the CKY, what do we need to know?
- A: The number of possible rules in the grammar
- B: The number of non-terminals in the grammar
- C: The number of terminals in the grammar
- D: The symbols in the target sentence
How do we use the CKY to know if a string is grammatical?
- A: We check whether the final cell contains the S state
- B: We check whether the final cell is not empty
- C: We check whether the S state happens anywhere in the table
- D: We check whether all terminals appear as the RHS in at least one rule
Which of the following is a pre-terminal symbol?
- A: N
- B: NP
- C: PP
- D: do
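A minimal CKY recogniser sketch over a toy CNF grammar (the grammar and sentence are invented for illustration). It makes the two keyed answers concrete: the table is initialised from the symbols of the target sentence, and the string is grammatical iff the final cell contains S:

```python
def cky_recognise(words, lexical, binary):
    """CKY recognition: table[i][j] holds the non-terminals spanning words[i:j]."""
    n = len(words)
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):          # initialisation needs the words
        table[i][i + 1] = {lhs for lhs, rhs in lexical if rhs == w}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):      # every split point of the span
                for lhs, (b, c) in binary:
                    if b in table[i][k] and c in table[k][j]:
                        table[i][j].add(lhs)
    return "S" in table[0][n]              # grammatical iff final cell has S

lexical = [("NP", "she"), ("V", "eats"), ("NP", "fish")]
binary = [("S", ("NP", "VP")), ("VP", ("V", "NP"))]
print(cky_recognise(["she", "eats", "fish"], lexical, binary))  # True
```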
- D: an infinite set of production rules
- D: S → NP VP
- C: Treebank
- A: Its main verb has all its arguments
- C: We can use dynamic programming because context cannot change sub-parses
- D: The symbols in the target sentence
- A: We check whether the final cell contains the S state
- A: N