Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New content pack causes issues w/ displayed figures in Video Manager #5561

Open
benjaoming opened this issue Jan 17, 2018 · 13 comments
Open
Assignees
Milestone

Comments

@benjaoming
Copy link
Contributor

Summary

It seems like the duplication of video nodes is somehow causing a glitch. I can hardly believe that 11,406 videos have been removed and only 394 added in the video set.

image

System information

Traceback or relevant snippet from server.log

n/a

@benjaoming benjaoming added the bug label Jan 17, 2018
@benjaoming benjaoming added this to the 0.17.5 milestone Jan 17, 2018
@benjaoming
Copy link
Contributor Author

benjaoming commented Jan 17, 2018

A quick verification...

I have 9855 videos on my drive from the previous content pack.

Videos that are marked as available by videoscan:

SELECT COUNT(*) FROM (SELECT COUNT(youtube_id) from item WHERE available=1 GROUP by youtube_id)
> 8879

Videos that are marked as unavailable by videoscan:

SELECT COUNT(*) FROM (SELECT COUNT(youtube_id) from item WHERE available=1 GROUP by youtube_id)
> 450

This suggests that neither the function that summarizes missing videos, nor the one that summarizes how many have been downloaded take uniqueness of the youtube_id into account.

image

@benjaoming
Copy link
Contributor Author

Hi @mrpau-eugene -- would you be able to look at this? @rtibbles will help out with any questions you might have :)

@mrpau-eugene
Copy link
Contributor

@benjaoming sure! will take a look into this one and would find a fix for it 👍

@mrpau-eugene
Copy link
Contributor

@benjaoming seems like it doesn't account the uniqueness of each youtube_id
https://github.com/learningequality/ka-lite/blob/master/kalite/updates/static/js/updates/bundle_modules/update_videos.js#L160

Should this logic be changed given that we now have duplicated youtube_ids?

There is also a problem with staging.learningequality.org coz I can't download videos. I'm always getting a 404.

@benjaoming benjaoming modified the milestones: 0.17.5, 0.18 Jan 19, 2018
@benjaoming
Copy link
Contributor Author

There is also a problem with staging.learningequality.org coz I can't download videos. I'm always getting a 404.

This server is used because of using kalite.project.settings.dev, you can reconfigure it to use the real server by changing the values locally, put this in ~/.kalite/settings.py:

from kalite.project.settings.dev import *

SECURESYNC_PROTOCOL = "http"
CENTRAL_SERVER_HOST = "kalite.%s" % CENTRAL_SERVER_DOMAIN
CENTRAL_SERVER_URL = "%s://%s" % (SECURESYNC_PROTOCOL, CENTRAL_SERVER_HOST)

...and then invoke kalite without setting --settings.

@mrpau-eugene
Copy link
Contributor

mrpau-eugene commented Jan 22, 2018

@benjaoming I just found out that this doesn't seem to be the issue with the new content packs since I also experience a glitch using the old content packs.

What I did was:

  1. Download Class 5 (India)
  2. Check the Math checkbox to select all videos inside Math

screen shot 2018-01-22 at 3 27 11 pm

But if you uncheck at least one of the videos, it will have a different value

screen shot 2018-01-22 at 3 31 59 pm

It seems to be a problem on how the remote_size of the Math topic is computed in the database. I have a guess that it has something to do when building the content packs.

screen shot 2018-01-22 at 3 35 57 pm

In the screenshot above, the remote size is just around 2.5GB which is equivalent to the size when we just select the Math topic checkbox

@benjaoming
Copy link
Contributor Author

@mrpau-eugene yes, the sizes are wrong, too :) And they have been so for a while, I'm afraid that it has gone under the radar -- I can't find an issue for it.

What worries me here is that the file count is wrong, too -- which AFAIK is a new issue. And very problematic, too, because a user can select to delete videos in Topic A and then they're deleted in Topic B as well.

@mrpau-eugene
Copy link
Contributor

@benjaoming I think I seem to have found a fix regarding the sizes of the video. The only problem so far is the file count.

If we have this kind of scenario:

Math (12 total_files)
  Early Math
    ... 5 Videos (3 unique videos)
  Algebra I
    ... 3 Videos (1 unique video)
  Algebra II
    ... 4 Videos (2 unique videos)

Should the 12 total_files stay as is, or should it be 6 total files because we only have 6 unique videos?

Regarding video deletion, I have a suggestion..

  1. Check the videos by its path instead of its youtube_id
  2. If the video is deleted, mark that video with that path as available = False
  3. Check that youtube_id if other topics are still using it. If no topic uses it, we should completely delete it, else do nothing.

This might work, however when a user performs another videoscan I think videos using those youtube_ids will be marked as available = True again so I dont think this fixes the problem at all (?)

Do you have other suggestions?

@benjaoming
Copy link
Contributor Author

benjaoming commented Jan 24, 2018

Should the 12 total_files stay as is, or should it be 6 total files because we only have 6 unique videos?

It should be 6, otherwise the reported number of videos to download/delete will be wrong. This is because the figure is used to show information about video download counts -- not subjects or topic names.

Regarding video deletion, I have a suggestion..

I think your suggestion is a good one!

This might work, however when a user performs another videoscan I think videos using those youtube_ids will be marked as available = True again so I dont think this fixes the problem at all (?)

I can see that it might cause some unwanted scenario: User wants to delete content A (containing video A) such that it becomes hidden in the topic tree. I don't think users are likely to sort things manually like this because: 1) You cannot delete exercises anyways, so there's lots of "read-only" content in the topic-tree and 2) There are so many videos everywhere, users are unlikely to use the Delete function for a very careful manual video curation.

I think we can also see it like this:

User navigates to Video Management in order to Delete videos. The most likely reason for this, is that a specific video is unwanted, either because of disk space or because students aren't supposed to view it. In both cases, the video should be entirely removed, so all occurrences of youtube_id should have available=False and the file removed.

This would make things easier, right? :)

@mrpau-eugene
Copy link
Contributor

It should be 6, otherwise the reported number of videos to download/delete will be completely wrong

I'm still wondering how can this be achieved because we are only using the data from the API which has the total_files for each topic tree. I cant think of any way to fix this as of now.

I think we can also see it like this:

User navigates to Video Management in order to Delete videos. The most likely reason for this, is that a specific video is unwanted, either because of disk space or because students aren't supposed to view it. In both cases, the video should be entirely removed, so all occurrences of youtube_id should have available=False and the file removed.

This would make things easier, right? :)

I think that makes things easier. So are we gonna leave the deletion the way it works right now?

@benjaoming
Copy link
Contributor Author

So are we gonna leave the deletion the way it works right now?

I think right now it only marks the contents marked for deletion as available=False so it needs an additional lookup based on the youtube_id to mark other occurrences unavailable..?

@mrpau-eugene
Copy link
Contributor

@benjaoming when I tested awhile ago, after deleting a topic and refreshed the page, other occurences of that youtube_id were also deleted. I think I saw it also in the codebas where deleting a video is based on its youtube_id

@benjaoming benjaoming changed the title New content pack causes issues w/ video deletion New content pack causes issues w/ displayed figures in Video Manager Jan 29, 2018
@mrpau-eugene
Copy link
Contributor

mrpau-eugene commented Jan 30, 2018

@benjaoming

I don't think using a raw SQL fixes the wrong file count..

SELECT COUNT(*) FROM (SELECT COUNT(youtube_id) from item WHERE available=1 GROUP by youtube_id)

Let's say we have this kind of scenario:

We won't have any problems if a user just ticks the Math
Total: 50GB (assumption)

  • Math
    • Early Math
      • Counting
      • Counting Objects
    • Algebra I
      • Algebra Foundations
      • Solving equations

This seems to be easy since we just have to do a query like the one you posted.

But when a user only excludes one topic like this:

  • Math
    • Early Math
      • Counting
      • Counting Objects
    • Algebra I
      • Algebra Foundations
      • Solving equations

I don't know what would be the total, but won't we also have a weird file count? I also think this might have a complicated query too.

Though we can, let's say, query the total_files first and then just subtract the unchecked afterwards. But this has some drawbacks since we won't know whether those unchecked topics contains those duplicate youtube videos.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants