Updated the files for an X mirror bot using Puppeteer #13

Open: wants to merge 3 commits into base: main
12 changes: 11 additions & 1 deletion ts-bot/.gitignore
@@ -2,4 +2,14 @@
node_modules/

# environment variable files
.env
.env

# cookie files
cookies.json

# recent tweet hashes
lastTweetHashes.json

# debugging script for puppeteer
puppeteer.js
puppeteer.ts
16 changes: 9 additions & 7 deletions ts-bot/README.md
@@ -1,18 +1,20 @@
# Bluesky Bot Tutorial

This folder contains a starter template for creating a bot on Bluesky. In this example, the bot posts a smiley emoji on an automated schedule once every three hours.
This folder contains a starter template for creating a bot on Bluesky. In this example, the bot mirrors the most recent X posts from a chosen profile to your Bluesky account, using Puppeteer to scrape the X profile page.

## Set Up

1. Install TypeScript: `npm i -g typescript`
2. Install ts-node: `npm i -g ts-node`
3. Make a copy of the example `.env` file by running: `cp example.env .env`. Set your username and password in `.env`. Use an App Password.
4. Compile your project by running: `npx tsc` or activate watch mode to have your code automatically compile: `npx tsc -w`
3. Install Puppeteer: `npm i puppeteer`
4. Install dotenv: `npm i dotenv`
5. Install crypto-js: `npm i crypto-js` (the compiled `index.js` imports Node's built-in `crypto` module, so this step may be optional)
6. Make a copy of the example `.env` file by running: `cp example.env .env`. In `.env`, set your Bluesky username and password, your X username and password, and the X username of the profile to mirror. Use a Bluesky App Password rather than your account password.
7. Compile your project by running: `npx tsc` or activate watch mode to have your code automatically compile: `npx tsc -w`

## Running the script
1. You can run the script locally: `node index.js`. You should see a smiley emoji posted to your Bluesky account.
2. Modify the script however you like to make this bot your own!
## Running the script
1. You can run the script locally: `node index.js`. You should see the most recent X post appear on your Bluesky account.
2. Modify the script however you like to make this bot your own!

## Deploying your bot
1. You can deploy a simple bot for free or low cost on a variety of platforms. For example, check out [Heroku](https://devcenter.heroku.com/articles/github-integration) or [Fly.io](https://fly.io/docs/reference/fly-launch/).

5 changes: 4 additions & 1 deletion ts-bot/example.env
@@ -1,2 +1,5 @@
BLUESKY_USERNAME=
BLUESKY_PASSWORD=
BLUESKY_PASSWORD=
TWITTER_USERNAME=
TWITTER_PASSWORD=
TWITTER_PROFILE=
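
An empty value in `.env` only surfaces later as a failed login or a scrape of the wrong URL. A minimal sketch of an up-front check, assuming a hypothetical helper named `missingEnvVars` (not part of this PR). The list of required names below covers the three variables that `index.js` reads in this diff; whether the X login credentials are also required is an assumption left open.

```javascript
// Hypothetical helper, not part of this PR: lists required variables that are unset or blank.
function missingEnvVars(env, required) {
  return required.filter((name) => !env[name] || env[name].trim() === "");
}

// The three variables that index.js reads in this diff; extend the list if the
// X login credentials are used elsewhere.
const REQUIRED_VARS = ["BLUESKY_USERNAME", "BLUESKY_PASSWORD", "TWITTER_PROFILE"];

// Example with a plain object standing in for process.env.
const missing = missingEnvVars(
  { BLUESKY_USERNAME: "me.bsky.social", BLUESKY_PASSWORD: "", TWITTER_PROFILE: "someone" },
  REQUIRED_VARS
);
console.log(missing); // [ 'BLUESKY_PASSWORD' ]
```

In `index.js` this could run as `missingEnvVars(process.env, REQUIRED_VARS)` right after `dotenv.config()`, exiting early when the returned list is non-empty.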
182 changes: 170 additions & 12 deletions ts-bot/index.js
@@ -22,25 +22,183 @@ var __importStar = (this && this.__importStar) || function (mod) {
__setModuleDefault(result, mod);
return result;
};
var __importDefault = (this && this.__importDefault) || function (mod) {
return (mod && mod.__esModule) ? mod : { "default": mod };
};
Object.defineProperty(exports, "__esModule", { value: true });
const api_1 = require("@atproto/api");
const dotenv = __importStar(require("dotenv"));
const cron_1 = require("cron");
const process = __importStar(require("process"));
const puppeteer = __importStar(require("puppeteer"));
const fs = __importStar(require("fs"));
const path = __importStar(require("path"));
const crypto_1 = __importDefault(require("crypto")); // for generating unique hashes
dotenv.config();
// Create a Bluesky Agent
const COOKIES_PATH = path.join(__dirname, "cookies.json");
const HASHES_PATH = path.join(__dirname, "lastTweetHashes.json");
const IntervalTime = 1000 * 60; // 1 minute; append "* 5" for 5 minutes
const HASH_LOG_SIZE = 10;
// Create a Bluesky Agent
const agent = new api_1.BskyAgent({
service: 'https://bsky.social',
service: "https://bsky.social",
});
async function main() {
await agent.login({ identifier: process.env.BLUESKY_USERNAME, password: process.env.BLUESKY_PASSWORD });
const response = await agent.post({
text: "🙂"
// Function to load cookies from file
async function loadCookies(page) {
if (fs.existsSync(COOKIES_PATH)) {
const cookies = JSON.parse(fs.readFileSync(COOKIES_PATH, "utf8"));
await page.setCookie(...cookies);
console.log("Cookies loaded from file.");
}
}
// Function to save cookies to file
async function saveCookies(page) {
const cookies = await page.cookies();
fs.writeFileSync(COOKIES_PATH, JSON.stringify(cookies, null, 2));
console.log("Cookies saved to file.");
}
// Function to load the stored tweet hashes (at most HASH_LOG_SIZE) from a file
function loadLastTweetHashes() {
if (fs.existsSync(HASHES_PATH)) {
return JSON.parse(fs.readFileSync(HASHES_PATH, "utf8"));
}
return [];
}
// Function to save the stored tweet hashes (at most HASH_LOG_SIZE) to a file
function saveLastTweetHashes(hashes) {
fs.writeFileSync(HASHES_PATH, JSON.stringify(hashes, null, 2));
console.log("Last tweet hashes saved to file.");
}
// Function to scrape the latest three tweets and handle quote tweets and image URLs properly
async function scrapeLatestTweets(username) {
const browser = await puppeteer.launch({
headless: true, // Run without a visible browser window; set to false for a manual login
});
const page = await browser.newPage();
// Load session cookies if they exist
await loadCookies(page);
try {
// Navigate to the user's profile
const profileUrl = `https://twitter.com/${username}`;
await page.goto(profileUrl, {
waitUntil: "networkidle2",
timeout: 60000,
});
console.log(`Navigated to profile: ${profileUrl}`);
// Wait for the tweets to be visible
await page.waitForSelector("article div[data-testid='tweetText']", {
timeout: 30000,
});
// Scrape the latest three tweets and their quotes if they exist, and also capture image URLs
const latestTweets = await page.evaluate(() => {
const tweetElements = Array.from(document.querySelectorAll("article"));
return tweetElements.slice(0, 3).map((el) => {
// Main tweet text
const mainTweetText = el.querySelector("div[data-testid='tweetText']")
?.textContent || "";
// Check if the tweet contains a quoted tweet (look for nested div with data-testid='tweetText')
let quotedTweetText = "";
const quoteElement = el.querySelector("div[aria-labelledby] div[data-testid='tweetText']");
if (quoteElement &&
quoteElement.textContent !== mainTweetText) {
quotedTweetText = `\n\nQuoted tweet: "${quoteElement.textContent}"`;
}
// Initialize arrays for image and video URLs
let imageUrls = [];
let videoUrls = [];
// Select image elements and cast to HTMLImageElement to access 'src'
const imageElements = el.querySelectorAll('img[alt="Image"]');
if (imageElements.length > 0) {
imageUrls = Array.from(imageElements).map((img) => img.src); // Get the src attribute of each image
}
// Select video source elements and get the src attribute
const videoElements = el.querySelectorAll('source[type="video/mp4"]');
if (videoElements.length > 0) {
videoUrls = Array.from(videoElements).map((video) => video.src); // Get the src attribute of each video
}
// Return the combined main tweet text, quoted tweet text (if any), image URLs, and video URLs
return {
text: mainTweetText.trim() + quotedTweetText,
images: imageUrls, // Attach the array of image URLs
videos: videoUrls, // Attach the array of video URLs
};
});
});
// Save session cookies after navigating (for future launches)
await saveCookies(page);
await browser.close();
if (latestTweets.length === 0) {
throw new Error("No tweets found");
}
console.log(`Latest tweets: ${JSON.stringify(latestTweets, null, 2)}`);
// Generate a hash ID from each tweet text (to simulate an ID)
const tweetData = latestTweets.map((tweet) => ({
id: crypto_1.default.createHash("sha256").update(tweet.text).digest("hex"),
text: tweet.text,
images: tweet.images, // Include the image URLs in the tweet data
}));
return tweetData;
}
catch (error) {
console.error("Error during scraping or page interaction:", error);
await browser.close();
throw error;
}
}
async function main() {
try {
console.log("Logging into Bluesky...");
// Login to Bluesky
await agent.login({
identifier: process.env.BLUESKY_USERNAME,
password: process.env.BLUESKY_PASSWORD,
});
console.log("Logged into Bluesky successfully.");
// Load the stored tweet hashes from file
let lastTweetIds = loadLastTweetHashes();
setInterval(async () => {
console.log("Starting new interval to check for latest tweets...");
try {
const tweets = await scrapeLatestTweets(process.env.TWITTER_PROFILE);
for (const tweet of tweets) {
// Check if the tweet ID (hash) already exists in the lastTweetIds
if (!lastTweetIds.some((id) => id === tweet.id)) {
// New tweet detected, proceed to post it to Bluesky
const postContent = `${tweet.text}\n\n(mirrored from X)`;
console.log(`New tweet found. Tweet ID: "${tweet.id}". Posting to Bluesky...`);
try {
// Post the tweet to Bluesky
await agent.post({
$type: "app.bsky.feed.post", // Specify the type explicitly
text: postContent, // Ensure text is correctly passed with the added content
});
console.log(`Successfully posted tweet to Bluesky: "${postContent}"`);
// Add the newly posted tweet's hash to the lastTweetIds array
lastTweetIds.push(tweet.id);
// Keep only the latest HASH_LOG_SIZE tweet IDs to avoid storing too many
if (lastTweetIds.length > HASH_LOG_SIZE) {
lastTweetIds =
lastTweetIds.slice(-HASH_LOG_SIZE);
}
// Save the updated last tweet hashes to the file
saveLastTweetHashes(lastTweetIds);
console.log(`Updated lastTweetIds to: "${lastTweetIds}"`);
}
catch (error) {
console.error("Error while posting to Bluesky:", error);
}
}
else {
console.log(`Tweet with ID "${tweet.id}" has already been posted. Skipping.`);
}
}
}
catch (error) {
console.error("Error fetching or posting tweets:", error);
}
}, IntervalTime);
}
catch (error) {
console.error("Error logging into Bluesky:", error);
}
}
main();
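
`setInterval` does not wait for an async callback to finish, and a single scrape here can take longer than `IntervalTime` (the `goto` timeout alone is 60 seconds), so two runs may overlap and check the hash log before either has updated it. One possible guard, sketched with a hypothetical `skipIfRunning` wrapper that is not part of this PR:

```javascript
// Hypothetical wrapper, not part of this PR: skips a call while the previous one is still running.
function skipIfRunning(task) {
  let running = false;
  return async function guarded(...args) {
    if (running) {
      return false; // previous run still in flight, skip this tick
    }
    running = true;
    try {
      await task(...args);
      return true;
    } finally {
      running = false; // always release the guard, even if the task throws
    }
  };
}

// Demo: the second call is skipped because the first has not finished yet.
let runs = 0;
const guardedTask = skipIfRunning(async () => {
  runs += 1;
  await new Promise((resolve) => setTimeout(resolve, 10));
});
guardedTask();
guardedTask();
console.log(runs); // 1
```

In `main`, the interval callback could be wrapped as `setInterval(skipIfRunning(async () => { /* scrape and post */ }), IntervalTime)`.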
// Run this on a cron job
const scheduleExpressionMinute = '* * * * *'; // Run once every minute for testing
const scheduleExpression = '0 */3 * * *'; // Run once every three hours in prod
const job = new cron_1.CronJob(scheduleExpressionMinute, main);
job.start();
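
The duplicate check in this file boils down to hashing each tweet's text with SHA-256 and keeping a bounded list of recent hashes. A self-contained sketch of that logic, using hypothetical helper names (`hashTweet`, `rememberHash`) that do not appear in the PR:

```javascript
const crypto = require("crypto");

const HASH_LOG_SIZE = 10; // same cap as in index.js

// Hypothetical helper: derives a stable ID from the tweet text, as index.js does inline.
function hashTweet(text) {
  return crypto.createHash("sha256").update(text).digest("hex");
}

// Hypothetical helper: returns a list with `id` appended, trimmed to the newest `maxSize` entries.
// An ID that is already present leaves the list unchanged.
function rememberHash(hashes, id, maxSize = HASH_LOG_SIZE) {
  if (hashes.includes(id)) {
    return hashes;
  }
  return [...hashes, id].slice(-maxSize);
}

// Demo: 12 distinct tweets leave only the 10 most recent hashes in the log.
let seen = [];
for (let i = 0; i < 12; i++) {
  seen = rememberHash(seen, hashTweet(`tweet ${i}`));
}
console.log(seen.length); // 10
console.log(seen.includes(hashTweet("tweet 0"))); // false
console.log(seen.includes(hashTweet("tweet 11"))); // true
```

Because the ID is derived from text alone, an edited tweet is treated as new and two tweets with identical text are treated as duplicates. That is a property of hashing the text, which the PR also does, rather than of this sketch.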