In the make_del_knob function, when the size product (e_size * f_size) is smaller than sample_size (20000 by default), the script ends up calculating the similarity score for all combinations of the src and tgt sentences, plus the remainder (20000 - e_size * f_size). Is this behavior a mistake or an intended feature? It creates a biased histogram of the "real" distribution by scoring the 0:0 indexed sentence pair multiple times.
```python
if e_size * f_size < sample_size:
    # dont sample, just compute full matrix
    sample_size = e_size * f_size
    x_idxs = np.zeros(sample_size, dtype=np.int32)
    y_idxs = np.zeros(sample_size, dtype=np.int32)
    c = 0
    for ii in range(e_size):
        for jj in range(f_size):
            x_idxs[c] = ii
            y_idxs[c] = jj
            c += 1
else:
    # get random samples
    x_idxs = np.random.choice(range(e_size), size=sample_size, replace=True).astype(np.int32)
    y_idxs = np.random.choice(range(f_size), size=sample_size, replace=True).astype(np.int32)

# output
random_scores = np.empty(sample_size, dtype=np.float32)
score_path(x_idxs, y_idxs,
           e_laser_norms, f_laser_norms,
           e_laser, f_laser,
           random_scores, )
```
What do you mean by "plus the remainder (20000 - e_size * f_size)"? Which lines are you referring to?
The line sample_size = e_size * f_size stores the correct size in both cases: when e_size * f_size < sample_size is true, sample_size is overwritten before the index arrays are allocated.
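To illustrate, here is a minimal, self-contained sketch of that branch with hypothetical sizes (e_size = 3, f_size = 4; these values are not from the repo). Because sample_size is overwritten before np.zeros is called, the index arrays hold exactly e_size * f_size entries, so every (src, tgt) pair is enumerated once and there is no 20000-element tail of (0, 0) entries:

```python
import numpy as np

e_size, f_size = 3, 4   # hypothetical corpus sizes for illustration
sample_size = 20000     # the default

if e_size * f_size < sample_size:
    # overwrite happens BEFORE allocation, so the arrays are exactly this long
    sample_size = e_size * f_size
    x_idxs = np.zeros(sample_size, dtype=np.int32)
    y_idxs = np.zeros(sample_size, dtype=np.int32)
    c = 0
    for ii in range(e_size):
        for jj in range(f_size):
            x_idxs[c] = ii
            y_idxs[c] = jj
            c += 1

pairs = list(zip(x_idxs.tolist(), y_idxs.tolist()))
print(len(pairs))       # 12 entries, not 20000
print(len(set(pairs)))  # 12 distinct pairs: (0, 0) appears only once
```

If an older revision allocated the arrays with the original sample_size of 20000 before (or without) the overwrite, the np.zeros initialization would indeed leave 20000 - e_size * f_size trailing (0, 0) pairs, which would bias the histogram as described in the issue.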