Can this download actual media files? #216

billbeans · 2024-09-21T19:32:14Z

Maybe I'm a bit confused about what this software does, but can it actually grab a user's uploaded media (jpg, mp4) from their tweets and download them? I ran user_media on a profile, and I just got a bunch of stdout in my terminal. I saved that output to a text file and had a hell of a time grepping the links out of it to make wget work, and even then, it didn't grab all of the media from the profile I wanted scraped


vladkens · 2024-10-06T03:21:47Z

@billbeans user_media is an API call to Twitter that returns a list of media: links to photos and videos. That's why you see so much log output in your terminal.
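To get only the links instead of grepping them out of the log output, something like the sketch below might help. It assumes each returned tweet exposes media.photos (items with a url) and media.videos (items with variants carrying bitrate and url), which are the same attributes the script later in this thread relies on. The helper names collect_media_urls and write_url_list are made up for illustration and are not part of twscrape.

```python
from types import SimpleNamespace


def collect_media_urls(tweets):
    """Return every photo URL plus the highest-bitrate URL of each video."""
    urls = []
    for tweet in tweets:
        urls.extend(photo.url for photo in tweet.media.photos)
        for video in tweet.media.videos:
            if not video.variants:
                continue  # nothing downloadable for this video
            # Assumption: each variant has `bitrate` and `url` attributes
            best = max(video.variants, key=lambda v: v.bitrate)
            urls.append(best.url)
    return urls


def write_url_list(urls, path):
    """Write one URL per line, the format `wget -i` reads."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(urls) + "\n")


# Stand-in object shaped like a tweet; real ones would come from api.user_media(...)
fake_tweet = SimpleNamespace(
    media=SimpleNamespace(
        photos=[SimpleNamespace(url="https://example.com/a.jpg")],
        videos=[
            SimpleNamespace(
                variants=[
                    SimpleNamespace(bitrate=100, url="https://example.com/low.mp4"),
                    SimpleNamespace(bitrate=900, url="https://example.com/high.mp4"),
                ]
            )
        ],
    )
)
print(collect_media_urls([fake_tweet]))
```

With real data you would gather the tweets in an async for loop over api.user_media(user_id), pass them to collect_media_urls, and then run wget -i on the file produced by write_url_list.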

There is no real media download in twscrape right now, because nobody has requested it before.

For now, you can download media with this simple script:

import asyncio
import os

import httpx

from twscrape import API


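async def download_file(client: httpx.AsyncClient, url: str, outdir: str):
    # Name the file after the last URL path segment, without the query string
    filename = url.split("/")[-1].split("?")[0]
    outpath = os.path.join(outdir, filename)

    # Stream the response to disk in chunks instead of buffering it in memory
    async with client.stream("GET", url) as resp:
        if resp.status_code != 200:
            # Skip failed requests so an error page is not saved as a media file
            print(f"skipped {url}: HTTP {resp.status_code}")
            return
        with open(outpath, "wb") as f:
            async for chunk in resp.aiter_bytes():
                f.write(chunk)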


async def load_user_media(api: API, user_id: int, outdir: str):
    os.makedirs(outdir, exist_ok=True)
    all_photos = []
    all_videos = []

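    # Collect all media URLs first, then download them in one batch
    async for doc in api.user_media(user_id):
        all_photos.extend([x.url for x in doc.media.photos])
        for video in doc.media.videos:
            # Each video comes in several variants; keep the highest bitrate
            variant = sorted(video.variants, key=lambda x: x.bitrate)[-1]
            all_videos.append(variant.url)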

    async with httpx.AsyncClient() as client:
        await asyncio.gather(
            *[download_file(client, url, outdir) for url in all_photos],
            *[download_file(client, url, outdir) for url in all_videos],
        )


async def main():
    api = API()
    await load_user_media(api, 2244994945, "output")


if __name__ == "__main__":
    asyncio.run(main())
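One thing to watch with the script above: asyncio.gather starts every download at once, which may be a lot of simultaneous requests for a large profile. A minimal sketch of capping concurrency with asyncio.Semaphore follows. The helper name gather_limited and the default limit of 8 are assumptions for illustration, not part of twscrape or httpx.

```python
import asyncio


async def gather_limited(coros, limit=8):
    """Await all coroutines, running at most `limit` of them at a time.

    Results come back in the same order as the input, like asyncio.gather.
    """
    sem = asyncio.Semaphore(limit)

    async def run(coro):
        # Each coroutine waits here until one of the `limit` slots is free
        async with sem:
            return await coro

    return await asyncio.gather(*(run(c) for c in coros))
```

In load_user_media, the asyncio.gather(...) call could then become await gather_limited([download_file(client, url, outdir) for url in all_photos + all_videos], limit=8).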

vladkens added the endpoint-request label Oct 6, 2024
