No Longer Valid #1

Open
whatamithinking opened this issue Feb 10, 2018 · 7 comments
Comments

@whatamithinking

The URLs for the site have changed. This code no longer works.

@varadchoudhari
Owner

I ran the crawler again, the URLs are still the same and the code works.

@whatamithinking
Author

You are right. I tried again and realized you must be using Python 2, while I am using Python 3.

Running this though, how did you not get blocked by LinkedIn? I am creating my own version of this and they blocked the test accounts I used to scrape the directory after about 80 page requests.

@varadchoudhari
Owner

Yes, the code is using Python 2. I will be releasing a Python 3 version of this code soon.

Use an appropriate politeness policy to avoid getting blocked by LinkedIn.

@whatamithinking
Author

I tried being polite. I have throttled it down to a random wait time between 15 and 30 seconds, but after ~80 requests I am locked out with a captcha that won't let me through even when I fill it out in person. I have gone so far as to randomly shut down for a few minutes and then start up again to look more like a person. I also have randomized headers. No luck. Any suggestions?
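
For reference, the throttling described above could be sketched roughly as follows. This is only an illustration, not the code from either crawler: the names `polite_delays` and `run_throttled` are hypothetical, and the values for `pause_every` and `pause_range` are assumptions (the comment only says "a few minutes"). As reported above, this kind of throttling did not prevent the lockout.

```python
import random
import time


def polite_delays(n_requests, min_wait=15.0, max_wait=30.0,
                  pause_every=20, pause_range=(120.0, 300.0), rng=None):
    """Yield one wait time (in seconds) to sleep before each request."""
    rng = rng or random.Random()
    for i in range(n_requests):
        # Base wait: random value between 15 and 30 seconds, as in the comment.
        delay = rng.uniform(min_wait, max_wait)
        # Assumption: add a longer break of a few minutes every `pause_every` requests.
        if i > 0 and i % pause_every == 0:
            delay += rng.uniform(*pause_range)
        yield delay


def run_throttled(fetch, urls, sleep=time.sleep, **kwargs):
    """Call fetch(url) for each URL, sleeping for the scheduled delay first."""
    results = []
    for url, delay in zip(urls, polite_delays(len(urls), **kwargs)):
        sleep(delay)
        results.append(fetch(url))
    return results
```

Here `fetch` is whatever function performs the actual page request; passing a custom `sleep` makes the schedule easy to inspect without waiting.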

@prakashsagadevan

I have checked this script in Python 2.7.2. The code doesn't work for me.
@varadchoudhari Has the site changed?

@ogamaniuk

Yes, it doesn't work anymore. Pages like https://www.linkedin.com/directory/topics-a/ no longer exist.

@tatanfort

Does anyone have a solution to make it work now?
