Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #208: broken tv_grab_pt_meo #217

Closed
wants to merge 5 commits into from
Closed

Fix #208: broken tv_grab_pt_meo #217

wants to merge 5 commits into from

Conversation

jlbribeiro
Copy link

@jlbribeiro jlbribeiro commented Oct 18, 2023

What type of Pull Request is this?

  • fixes/improves existing functionality

Does this PR close any currently open issues?

This PR fixes #208.

Please explain what this PR does

tv_grab_pt_meo currently had several issues:

  • the API endpoints have been moved to a different host/path;
  • the JSON payloads being POSTed to the API were malformed (the structure remains the same), which were rejected by the "new" API;
  • the API rejects requests without an Origin header matching https://www.meo.pt.

This PR addresses those issues, and extends XMLTV::Get_nice::post_nice_json() to accept an optional hash of HTTP headers to be included in the HTTP request; I wasn't sure whether this would feel the right approach, but given this function only appears to be used in tv_grab_pt_meo I thought it was ok. Any suggestions more than welcome.

Any other information?

I am not at all familiar with Perl; read a couple of things to understand some of the basics, what was the meaning of the $$ signature, ... I believe the changes "make sense", but I feel there might be some Perl-specifics I might be missing. Please advise if so.

This change has not been discussed on the xmltv-devel mailing list.

Where have you tested these changes?

Operating System: Alpine Linux v3.18.4

Perl Version: v5.36.1

Endpoints have moved to a separate services' host.
While the previous version of the API apparently accepted malformed JSON,
the current version rejects invalid payloads.
Some hosts expect specific HTTP header values to validate a request;
accept them as an optional argument, to make it backward compatible.
The endpoints now reject requests which do not present the expected
Origin.
@moob158
Copy link

moob158 commented Oct 19, 2023

@honir
Copy link
Contributor

honir commented Oct 19, 2023

Nice detective work! However...

Spoofing the Origin header -- to pretend to be the Meo app when you're not -- is not something we would support in the XMLTV project.

@jlbribeiro
Copy link
Author

jlbribeiro commented Oct 19, 2023

@moob158 I had written a long response but it is probably invalid now (see @honir's comment above, and mine further below); will leave some notes though.
I get "no programmes found" for DW_TV (which has an id of -1) as well, but not for other channels. Trying to grab all the channels I'm now hitting a different error related to some code that is only being exercised after this fix (Label not found for "next PROG" at /usr/bin/tv_grab_pt_meo line 579). The API contains many inconsistencies (channels with no sigla, duplicate channel IDs, ...), so it would probably require more work. But in any case, can you provide the stderr for tv_grab_pt_meo --fast --debug?


@honir I feared that, and I totally understand :P I had understood XMLTV strives to be a "good netizen", but was hoping not spoofing the User-Agent (and allowing services to block so if they wanted) passed that criterion. I guess that means this grabber is broken for good. Should it be removed, then? Seeing the comments on #208, I believe pt_vodafone might be a good enough replacement, as they probably share most of the channel grid.
(And thank you for all your work on the XMLTV project, btw!)

@rmeden
Copy link
Contributor

rmeden commented Oct 19, 2023

So adding an Origin header and keeping an XMLTV user-agent works? That's interesting... I can see the argument "Origin" is really just part of the API and if we're setting user-agent, we're not hiding who we are (and easily block-able if desired). Geoff, did you realize that user-agent wasn't changing?

@honir
Copy link
Contributor

honir commented Oct 19, 2023

Origin is not part of the API. It is a HTTP header normally set by a browser (or app) to indicate the source of the request. Thus it performs a similar function to the User-Agent but at a deeper level.

By spoofing the Origin you are hiding who you are: pretending to be someone else.

Origin => 'https://www.meo.pt'

You are effectively saying, "trust me, I'm from the Meo app"...but you're not. You're really xmltv.org!

(This is why browsers do not allow manual setting of the Origin header since it opens the browser up to cross-site security issues.)

As the site is actively checking the Origin header they are clearly expecting only requests from their own app to be permitted.

@jlbribeiro
Copy link
Author

jlbribeiro commented Oct 19, 2023

@rmeden Let me clarify that I agree with @honir; Origin is a forbidden header name, i.e. cannot be modified programmatically. Browsers send it implicitly on certain contexts, and as it is not modifiable (except outside the context of a browser) it is sometimes used as a trusting mechanism (notice that Cookie is also a forbidden header name). The fact the API checks its value is almost surely on purpose (nit: but it doesn't mean the API's intent is to check "clients"; preventing CORS is more about the security of the user, in this case implicitly sending the Cookie and performing an auth'd request on their behalf, for instance). I'm fully aware I was discussing semantics above, I was just trying to move the "line" to include Origin but exclude User-Agent 😛

With that said, I'm assuming this PR should be closed (and that tv_grab_pt_meo should be removed from this project, as it is broken "for good")? @honir + @rmeden

@honir
Copy link
Contributor

honir commented Oct 19, 2023

With that said, I'm assuming this PR should be closed (and that tv_grab_pt_meo should be removed from this project, as it is broken "for good")?

It will probably be removed in the next release (c. January?). We can leave it till then in case MEO have a change of heart and reinstate the previous urls (...unlikely, but you never know!)

@jlbribeiro
Copy link
Author

Closing this PR; again, thank you all for this project!

@jlbribeiro jlbribeiro closed this Oct 19, 2023
@jlbribeiro jlbribeiro deleted the bugfix/208-broken-tv_grab_pt_meo branch October 19, 2023 15:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

tv_grab_pt_meo broken
4 participants