
How to iterate/parse all children for given tag #52

Open
pkasson opened this issue Oct 2, 2014 · 2 comments

pkasson commented Oct 2, 2014

Hi,

I have tried various XPath expressions to get all images (inside lists, divs, etc.), but not all images (or script tags) are found.

I'm not sure if it's an XPath issue, but how can I iterate through the entire tree, searching for these tags? It may be slower, but brute force sometimes works.

Thanks,

Peter
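The brute-force approach asked about here can be sketched as a recursive walk over the parsed tree. This is a minimal sketch, not code from this thread. It assumes the page is parsed with hpple's `TFHpple` (the class behind the `searchWithXPathQuery:` call used later in this issue) and that `TFHppleElement` exposes `children` and `tagName`. The helper names `CollectElements` and `AllElementsNamed` are hypothetical.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"  // assumption: hpple is the parser in use

// Hypothetical helper: recursively appends every descendant of `element`
// whose tag name equals `tagName` (case-insensitive) to `results`.
static void CollectElements(TFHppleElement *element,
                            NSString *tagName,
                            NSMutableArray *results) {
    for (TFHppleElement *child in element.children) {
        // Check for nil first: messaging nil returns 0, which equals NSOrderedSame.
        if (child.tagName != nil &&
            [child.tagName caseInsensitiveCompare:tagName] == NSOrderedSame) {
            [results addObject:child];
        }
        CollectElements(child, tagName, results);  // descend into every subtree
    }
}

// Hypothetical helper: returns all elements named `tagName` anywhere in `doc`.
static NSArray *AllElementsNamed(TFHpple *doc, NSString *tagName) {
    NSMutableArray *results = [NSMutableArray array];
    // "/*" selects the document's root element (normally <html>).
    for (TFHppleElement *root in [doc searchWithXPathQuery:@"/*"]) {
        if (root.tagName != nil &&
            [root.tagName caseInsensitiveCompare:tagName] == NSOrderedSame) {
            [results addObject:root];
        }
        CollectElements(root, tagName, results);
    }
    return results;
}
```

Usage might look like `NSArray *images = AllElementsNamed(doc, @"img");`. Each node is visited once, so the cost grows linearly with the size of the page.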

mrhevor commented Oct 8, 2014

Can you show me your code?

pkasson commented Oct 8, 2014

The code I used earlier is below. In the interim, I just iterated through the whole tree, looking for child tags that matched what I wanted (brute force). While not elegant, it worked.

This is reused in a method that takes various tag names, such as "script" or "img":

NSArray * elements = [doc searchWithXPathQuery:[@"//" stringByAppendingFormat:@"%@", tagName]];

Since that XPath expression might not be exhaustive, I also tried these paths:

NSArray *imageTags = [NSArray arrayWithObjects:@"img", @"ul/li/img", @"ul/img", @"div/img", @"li/img", @"a/img", nil];

and iterated through these variations to determine if more hits were found.
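For comparison, in XPath `//img` is shorthand for `/descendant-or-self::node()/child::img`, so it already matches every `img` element at any depth; relative paths such as `ul/li/img` can only find a subset of those. The sketch below assumes the same hpple `TFHpple` object and combines several tags into one union query such as `//img | //script`. The helper name `ElementsForTags` is hypothetical.

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"  // assumption: hpple provides searchWithXPathQuery:

// Hypothetical helper: builds one XPath union such as "//img | //script",
// so every listed tag is matched at any depth in a single query.
static NSArray *ElementsForTags(TFHpple *doc, NSArray *tagNames) {
    NSMutableArray *parts = [NSMutableArray array];
    for (NSString *tag in tagNames) {
        // "//tag" selects matching elements anywhere below the document root.
        [parts addObject:[NSString stringWithFormat:@"//%@", tag]];
    }
    NSString *query = [parts componentsJoinedByString:@" | "];
    return [doc searchWithXPathQuery:query];
}
```

For example, `ElementsForTags(doc, @[@"img", @"script"])` returns both kinds of element from one query. If some images still do not appear, one possibility (not confirmed in this thread) is that they are added by JavaScript at runtime and were never part of the downloaded HTML.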

Lastly, I tried matching the raw HTML directly with a regular expression:

NSString *REGEX_IMG_CONTENT = @"<img[^>]+src=\"([^\">]+)\"";

[self search:doc withRegexString:REGEX_IMG_CONTENT];
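The body of `search:withRegexString:` is not shown in the thread. A minimal Foundation-only sketch of such a search, using `NSRegularExpression` with the pattern above, could look like this. The function name `ImageSourcesInHTML` is hypothetical, and the pattern only handles double-quoted `src` values.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the src value of every <img> tag matched by
// the pattern. Single-quoted or unquoted src values are not matched.
static NSArray *ImageSourcesInHTML(NSString *html) {
    NSString *pattern = @"<img[^>]+src=\"([^\">]+)\"";
    NSError *error = nil;
    NSRegularExpression *regex =
        [NSRegularExpression regularExpressionWithPattern:pattern
                                                  options:NSRegularExpressionCaseInsensitive
                                                    error:&error];
    if (regex == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return @[];
    }
    NSMutableArray *sources = [NSMutableArray array];
    NSArray *matches = [regex matchesInString:html
                                      options:0
                                        range:NSMakeRange(0, html.length)];
    for (NSTextCheckingResult *match in matches) {
        // Capture group 1 holds the text between the src quotes.
        [sources addObject:[html substringWithRange:[match rangeAtIndex:1]]];
    }
    return sources;
}
```

Regex matching works as a fallback, but a parser-based query is usually more reliable, since a regex also matches tags inside HTML comments and misses other quoting styles.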
