In the make_del_knob function, when the size product (e_size * f_size) is smaller than sample_size (20000 by default), the script ends up calculating the similarity score for all combinations of the src and tgt sentences, plus the remainder (20000 - e_size * f_size). Is this behavior a mistake or an intended feature? It creates a biased histogram of the "real" distribution by scoring the 0:0 indexed sentence pair multiple times.
```python
if e_size * f_size < sample_size:
    # dont sample, just compute full matrix
    sample_size = e_size * f_size
    x_idxs = np.zeros(sample_size, dtype=np.int32)
    y_idxs = np.zeros(sample_size, dtype=np.int32)
    c = 0
    for ii in range(e_size):
        for jj in range(f_size):
            x_idxs[c] = ii
            y_idxs[c] = jj
            c += 1
else:
    # get random samples
    x_idxs = np.random.choice(range(e_size), size=sample_size, replace=True).astype(np.int32)
    y_idxs = np.random.choice(range(f_size), size=sample_size, replace=True).astype(np.int32)

# output
random_scores = np.empty(sample_size, dtype=np.float32)
score_path(x_idxs, y_idxs,
           e_laser_norms, f_laser_norms,
           e_laser, f_laser,
           random_scores, )
```
What do you mean by "plus the remainder (20000 - e_size * f_size)"? Which lines are you referring to?
The line sample_size = e_size * f_size stores the correct size in both cases: when e_size * f_size < sample_size is true, sample_size is overwritten before the index arrays are allocated.
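To illustrate, here is a minimal, self-contained sketch of that branch with hypothetical sizes (e_size = 3, f_size = 4; these values are not from the repo). Because sample_size is overwritten before np.zeros is called, the index arrays hold exactly e_size * f_size entries, so every (src, tgt) pair is enumerated once and there is no 20000-element tail of (0, 0) entries:

```python
import numpy as np

e_size, f_size = 3, 4   # hypothetical corpus sizes for illustration
sample_size = 20000     # the default

if e_size * f_size < sample_size:
    # overwrite happens BEFORE allocation, so the arrays are exactly this long
    sample_size = e_size * f_size
    x_idxs = np.zeros(sample_size, dtype=np.int32)
    y_idxs = np.zeros(sample_size, dtype=np.int32)
    c = 0
    for ii in range(e_size):
        for jj in range(f_size):
            x_idxs[c] = ii
            y_idxs[c] = jj
            c += 1

pairs = list(zip(x_idxs.tolist(), y_idxs.tolist()))
print(len(pairs))       # 12 entries, not 20000
print(len(set(pairs)))  # 12 distinct pairs: (0, 0) appears only once
```

If an older revision allocated the arrays with the original sample_size of 20000 before (or without) the overwrite, the np.zeros initialization would indeed leave 20000 - e_size * f_size trailing (0, 0) pairs, which would bias the histogram as described in the issue.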