
[FEATURE]: REST API Mode with Axum Framework #66

Closed
angelplusultra opened this issue Feb 7, 2025 · 6 comments
Labels
enhancement New feature or request

Comments

@angelplusultra
Owner

Describe the feature
Run jobshell in API mode with a flag option, jobshell --api. This will serve an API using the axum framework and start a cron job that scrapes all companies on an interval. The --api flag will require two arguments after it: the port number and the hourly cron interval: jobshell --api {port} {interval}

Example:

jobshell --api 3000 4

How the API works
Every company we support will have an endpoint to get that company's jobs. I think it would work best as a route parameter, since every company name is essentially treated as a unique identifier, and it makes things easier for the API consumer.
GET /api/{company} -> Vec<Job>

For example:

GET "/api/disney" triggers the handler to fetch all the disney jobs from the data file and return them.

Technical Details
I think Axum would be a good choice for this. Version 0.8 is the latest.
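
For reference, a minimal axum 0.8 server sketch (port and route here are illustrative; in 0.8 path params use {param} syntax and serving goes through axum::serve, the old axum::Server type being gone):

    use axum::{routing::get, Router};

    #[tokio::main]
    async fn main() {
        let app = Router::new().route("/api/{company}", get(|| async { "todo" }));
        let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await.unwrap();
        axum::serve(listener, app).await.unwrap();
    }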

@angelplusultra added the enhancement label on Feb 7, 2025

Hello! I’m CodePlannerAI, and I’m here to help you generate implementation plans for your issues.

To get started, simply comment @codeplannerai to have me generate an implementation plan for this issue.

If you’d like to modify the plan, you can regenerate it by adding a message along with the @codeplannerai command. For example:

@codeplannerai Please avoid using extra dependencies for this issue.

Feel free to provide additional context to refine the plan to your needs!

You have 10 implementation plan generations left.

@angelplusultra
Owner Author

@codeplannerai


Implementation Plan:

Implementation Strategy for REST API Mode with Axum Framework


Overview

The goal is to implement an API mode in the jobshell application that:

  • Starts an HTTP server using the Axum framework on a specified port.
  • Serves job data for supported companies via endpoints like /api/{company}.
  • Initiates a cron job that scrapes all companies at a specified hourly interval.

This implementation will involve modifying existing files and adding new modules to handle API functionality while adhering to the existing code structure and patterns.


Modifications to Existing Files

  1. src/args.rs

    • Add new fields to the Args struct to handle the --api flag along with port and interval arguments.

      use clap::Parser;
      
      #[derive(Parser, Debug)]
      #[command(author, version, about, long_about = None)]
      pub struct Args {
          /// Enable Discord mode
          #[arg(long)]
          pub discord: bool,
      
          /// Enable API mode with specified port and interval:
          /// `jobshell --api <PORT> <INTERVAL>`
          /// (clap cannot parse a custom struct as a single flag value,
          /// so both values are captured here and packed into ApiArgs)
          #[arg(long, num_args = 2, value_names = ["PORT", "INTERVAL"])]
          pub api: Option<Vec<u64>>,
      }
      
      #[derive(Debug, Clone)]
      pub struct ApiArgs {
          /// Port number to run the API server on
          pub port: u16,
          /// Hourly interval for the cron job to scrape companies
          pub interval: u64,
      }
      • Reasoning: Capturing both values under the --api flag with num_args = 2 ensures they're provided together; they are then packed into the plain ApiArgs struct for the rest of the program (clap cannot derive a parser for a custom struct used as a single flag value).
  2. src/main.rs

    • Update the main function to handle the new --api flag.

      #[tokio::main]
      async fn main() -> AppResult<()> {
          clear_console();
          dotenv().ok();
          let args = Args::parse();
      
          if let Some(vals) = args.api {
              // Both values are guaranteed present by `num_args = 2`.
              let api_args = args::ApiArgs { port: vals[0] as u16, interval: vals[1] };
              modes::api::run(api_args).await?;
          } else if args.discord {
              modes::discord::run().await?;
          } else {
              modes::cli::run().await?;
          }
      
          Ok(())
      }
      • Reasoning: This ensures that when the --api flag is provided, the application runs in API mode.
  3. Cargo.toml

    • Add dependencies for Axum and any required crates for scheduling.

      [dependencies]
      axum = "0.8"
      tokio = { version = "1", features = ["full"] }
      tokio-cron-scheduler = "0.5"
      # Other existing dependencies...
      • Reasoning: Axum is required for building the HTTP server, and tokio-cron-scheduler is used for scheduling the cron jobs asynchronously.

New Files and Components

  1. src/modes/api.rs

    • Create a new module to handle API mode.

      use axum::{
          routing::get,
          Router,
      };
      use crate::args::ApiArgs;
      use crate::error::AppResult;
      use crate::handlers::api_handlers;
      use crate::models::job_data::JobData;
      use tokio_cron_scheduler::{JobScheduler, Job};
      use tokio::sync::RwLock;
      use std::sync::Arc;
      
      pub async fn run(api_args: ApiArgs) -> AppResult<()> {
          // Shared state for job data
          let job_data = Arc::new(RwLock::new(load_initial_job_data()?));
      
          // Start the cron job
          start_cron_job(api_args.interval, job_data.clone()).await?;
      
          // Build the Axum app
          let app = Router::new()
              .route("/api/:company", get(api_handlers::get_company_jobs))
              .with_state(job_data);
      
          // Start the server
          axum::Server::bind(&format!("0.0.0.0:{}", api_args.port).parse().unwrap())
              .serve(app.into_make_service())
              .await
              .unwrap();
      
          Ok(())
      }
      
      // Function to load initial job data
      fn load_initial_job_data() -> AppResult<JobData> {
          // Implement loading logic (e.g., read the existing data file);
          // an empty map is a placeholder so the sketch compiles.
          Ok(JobData::new())
      }
      
      // Function to start the cron job
      async fn start_cron_job(interval: u64, job_data: Arc<RwLock<JobData>>) -> AppResult<()> {
          let sched = JobScheduler::new().await?;
      
          let cron_expression = format!("0 0/{} * * * *", interval);
          let job = Job::new_async(cron_expression.as_str(), move |_uuid, _l| {
              let job_data = job_data.clone();
              Box::pin(async move {
                  // Scrape all companies first, then hold the write lock
                  // only long enough to swap in the fresh data
                  // (errors are ignored here; see Error Handling below).
                  if let Ok(new_data) = scrape_all_companies().await {
                      let mut data = job_data.write().await;
                      *data = new_data;
                  }
              })
          })?;
      
          sched.add(job).await?;
          sched.start().await?;
      
          Ok(())
      }
      • Reasoning: This module sets up the API server and cron job. It uses shared state (job_data) protected by an RwLock to allow safe concurrent access.
  2. src/handlers/api_handlers.rs

    • Create handlers for the API endpoints.

      use axum::{
          extract::{Path, State},
          Json,
      };
      use std::sync::Arc;
      use tokio::sync::RwLock;
      use crate::models::job_data::{Job, JobData};
      
      pub async fn get_company_jobs(
          Path(company): Path<String>,
          State(job_data): State<Arc<RwLock<JobData>>>,
      ) -> Json<Vec<Job>> {
          let data = job_data.read().await;
          let jobs = data.get(&company).cloned().unwrap_or_default();
          Json(jobs)
      }
      • Reasoning: This handler fetches jobs for a specific company from the shared state and returns them as JSON.
  3. Update src/handlers/mod.rs

    • Include the new API handlers module.

      pub mod api_handlers;
      pub mod handlers;
      pub mod scrape_options;
      • Reasoning: Ensures the new handlers are properly integrated.
  4. src/models/job_data.rs

    • Define a model to represent the job data structure.

      use std::collections::HashMap;
      use serde::{Deserialize, Serialize};
      
      pub type JobData = HashMap<String, Vec<Job>>;
      
      #[derive(Clone, Serialize, Deserialize)]
      pub struct Job {
          // Fields representing job details
      }
      • Reasoning: Provides a centralized data structure for storing and accessing job information.

Technical Considerations and Challenges

  1. Asynchronous Data Access

    • Challenge: Ensuring thread-safe, asynchronous access to shared job data between the cron job and API handlers.

    • Solution: Use Arc<RwLock<JobData>> to allow multiple readers and safe mutation by the cron job.

  2. Error Handling

    • Challenge: Properly propagating errors in an asynchronous context, especially within the cron job.

    • Solution: Use AppResult and implement error handling in asynchronous blocks.

  3. Cron Job Scheduling

    • Challenge: Scheduling tasks in an asynchronous runtime without blocking.

    • Solution: Use tokio-cron-scheduler which is designed for scheduling in tokio runtime environments.

  4. Data Consistency

    • Challenge: Ensuring that data served by the API is consistent, especially during writes by the cron job.

    • Solution: The RwLock allows concurrent reads but ensures writes are exclusive, preventing data races.

  5. Command-line Argument Parsing

    • Challenge: Parsing nested command-line arguments with clap.

    • Solution: Capture the two values under the --api flag (e.g., with num_args = 2) and pack them into an ApiArgs struct, ensuring the required arguments are captured correctly.

  6. Integration with Existing Codebase

    • Challenge: Maintaining code organization and adhering to existing project structure.

    • Solution: Place new modules in logical locations (modes, handlers, models) and follow existing naming and coding conventions.


Additional Notes

  • Testing

    • Implement unit tests for the new modules, especially for the API handlers and the cron job functionality.
  • Logging

    • Add logging within the API and cron job to aid in monitoring and debugging.
  • Configuration

    • Consider allowing configuration of the API settings via environment variables or a configuration file in addition to command-line arguments.
  • Security

    • Evaluate the need for API security (e.g., authentication), especially if the API will be exposed beyond localhost.
  • Documentation

    • Update the project documentation to include instructions on running the API mode and details about the new endpoints.

By carefully implementing the steps above, we'll successfully add the requested API mode to the jobshell application, providing users with the ability to access job data via HTTP endpoints and keep the data up-to-date with scheduled scraping tasks.

You have 9 implementation plan generations left

@angelplusultra
Owner Author

@codeplannerai

This is deprecated

axum::Server::bind(&format!("0.0.0.0:{}", api_args.port).parse().unwrap())
        .serve(app.into_make_service())
        .await
        .unwrap();


Generating implementation plan, please wait...


Implementation Plan:

Updated Implementation Plan for REST API Mode with Axum Framework


Overview

The goal is to implement an API mode in the jobshell application that:

  • Starts an HTTP server using the Axum framework on a specified port.
  • Serves job data for supported companies via endpoints like /api/{company}.
  • Initiates a cron job that scrapes all companies at a specified hourly interval.

This implementation will involve modifying existing files and adding new modules to handle API functionality while adhering to the existing code structure and patterns. We will ensure that we address the deprecated code and use the latest recommended practices with Axum 0.8.


Modifications to Existing Files

1. Cargo.toml

  • Add dependencies for Axum and any required crates for scheduling.
[dependencies]
# Existing dependencies...
axum = "0.8"
tokio = { version = "1.0", features = ["full"] } # Already present
tokio-cron-scheduler = { version = "0.13.0", features = ["english"] } # Update to match the latest compatible version
serde = { version = "1.0", features = ["derive"] } # Already present
serde_json = "1.0" # Already present
  • Reasoning: Axum is required for building the HTTP server, and tokio-cron-scheduler is used for scheduling the cron jobs asynchronously. We ensure versions are compatible with Axum 0.8.

2. src/args.rs

  • Modify the Args struct to include the --api flag with required port and interval arguments.

Given the existing Args struct:

use clap::Parser;

#[derive(Parser, Debug)]
#[command(author, version, about, long_about = None)]
pub struct Args {
    /// Enable Discord mode
    #[arg(long)]
    pub discord: bool,
}

We will update it to:

use clap::Parser;

#[derive(Parser, Debug)]
#[command(author, version, about, long_about = None)]
pub struct Args {
    /// Enable Discord mode
    #[arg(long, conflicts_with = "api")]
    pub discord: bool,

    /// Enable API mode
    #[arg(long, requires_all = ["port", "interval"], conflicts_with = "discord")]
    pub api: bool,

    /// Port number to run the API server on (required when --api is used)
    #[arg()]
    pub port: Option<u16>,

    /// Hourly interval for the cron job to scrape companies (required when --api is used)
    #[arg()]
    pub interval: Option<u64>,
}
  • Reasoning: We added the api flag and the port and interval arguments. Using requires_all ensures that port and interval must be provided when --api is used. Adding conflicts_with prevents --api and --discord from being used simultaneously.

3. src/main.rs

  • Update the main function to handle the new --api flag and its arguments.

Existing main.rs:

mod args;
mod company_options;
mod discord;
mod error;
mod handlers;
mod modes;
mod reports;
mod scrapers;
mod utils;
mod models {
    pub mod ai;
    pub mod custom_error;
    pub mod data;
    pub mod gemini;
    pub mod scraper;
}

use args::Args;
use clap::Parser;
use dotenv::dotenv;
use error::AppResult;
use jobshell::utils::clear_console;

#[tokio::main]
async fn main() -> AppResult<()> {
    clear_console();
    dotenv().ok();
    let args = Args::parse();

    if args.discord {
        modes::discord::run().await?;
    } else {
        modes::cli::run().await?;
    }

    Ok(())
}

Modify the main function:

#[tokio::main]
async fn main() -> AppResult<()> {
    clear_console();
    dotenv().ok();
    let args = Args::parse();

    if args.api {
        let port = args.port.expect("Port number is required when using --api");
        let interval = args.interval.expect("Interval is required when using --api");
        let api_args = ApiArgs { port, interval };
        modes::api::run(api_args).await?;
    } else if args.discord {
        modes::discord::run().await?;
    } else {
        modes::cli::run().await?;
    }

    Ok(())
}
  • Add the following struct definition at an appropriate place (e.g., in args.rs or main.rs):
#[derive(Debug, Clone)]
pub struct ApiArgs {
    pub port: u16,
    pub interval: u64,
}
  • Reasoning: We added logic to handle the --api flag and ensure that the required port and interval arguments are provided. The ApiArgs struct encapsulates these arguments for passing to the run function.

New Files and Components

1. src/modes/api.rs

  • Create a new module to handle API mode.
// src/modes/api.rs
use axum::{
    routing::get,
    Router,
};
use crate::handlers::api_handlers;
use std::net::SocketAddr;
use crate::error::AppResult;
use tokio_cron_scheduler::{JobScheduler, Job};
use std::sync::Arc;
use tokio::sync::RwLock;
use crate::models::job_data::JobData;
use crate::ApiArgs;

pub async fn run(api_args: ApiArgs) -> AppResult<()> {
    // Shared state for job data
    let job_data = Arc::new(RwLock::new(load_initial_job_data().await?));

    // Start the cron job
    start_cron_job(api_args.interval, job_data.clone()).await?;

    // Build the Axum app
    let app = Router::new()
        // axum 0.8 captures path params with `{param}` (not `:param`)
        .route("/api/{company}", get(api_handlers::get_company_jobs))
        .with_state(job_data);

    // Start the server: `axum::Server` no longer exists in axum 0.8;
    // bind a tokio TcpListener and hand it to `axum::serve` instead.
    let addr = SocketAddr::from(([0, 0, 0, 0], api_args.port));
    let listener = tokio::net::TcpListener::bind(addr).await?;
    axum::serve(listener, app).await?;

    Ok(())
}

// Function to load initial job data
async fn load_initial_job_data() -> AppResult<JobData> {
    // Implement loading logic, e.g., read from a data file or scrape initial data
    Ok(JobData::new())
}

// Function to start the cron job
async fn start_cron_job(interval: u64, job_data: Arc<RwLock<JobData>>) -> AppResult<()> {
    let sched = JobScheduler::new().await?;

    let cron_expression = format!("0 0 */{} * * * *", interval);
    let job_data_clone = job_data.clone();
    let job = Job::new_async(cron_expression.as_str(), move |_uuid, _l| {
        let job_data = job_data_clone.clone();
        Box::pin(async move {
            // Scrape first, then hold the write lock only long enough
            // to swap in the fresh data; on failure keep the old data.
            match scrape_all_companies().await {
                Ok(new_data) => {
                    let mut data = job_data.write().await;
                    *data = new_data;
                }
                Err(e) => eprintln!("scheduled scrape failed: {e:?}"),
            }
        })
    })?;

    sched.add(job).await?;
    sched.start().await?;

    Ok(())
}

// Function to scrape all companies
async fn scrape_all_companies() -> AppResult<JobData> {
    // Implement scraping logic
    Ok(JobData::new())
}
  • Updated Server Initialization:

    • Issue: The previous code used axum::Server, which has been removed from Axum entirely (as of 0.7), so neither Server::bind nor into_make_service() exists in 0.8.
    • Solution: Bind a tokio::net::TcpListener and pass it, together with the Router, to axum::serve(), the supported approach in Axum 0.8.
  • Reasoning: This module sets up the API server and cron job. It uses shared state (job_data) protected by an RwLock to allow safe concurrent access. The server initialization now complies with the latest Axum practices.

2. src/handlers/api_handlers.rs

  • Create handlers for the API endpoints.
// src/handlers/api_handlers.rs
use axum::{
    extract::{Path, State},
    Json,
};
use std::sync::Arc;
use tokio::sync::RwLock;
use crate::models::job_data::{JobData, Job};

pub async fn get_company_jobs(
    Path(company): Path<String>,
    State(job_data): State<Arc<RwLock<JobData>>>,
) -> Json<Vec<Job>> {
    let data = job_data.read().await;
    let jobs = data.get(&company).cloned().unwrap_or_default();
    Json(jobs)
}
  • Reasoning: This handler fetches jobs for a specific company from the shared state and returns them as JSON.

3. Update src/handlers/mod.rs

  • Include the new API handlers module.
// src/handlers/mod.rs
pub mod api_handlers;
pub mod handlers;
pub mod scrape_options;
  • Reasoning: Ensures the new handlers are properly integrated into the project.

4. src/models/job_data.rs

  • Define a model to represent the job data structure.
// src/models/job_data.rs
use std::collections::HashMap;
use serde::{Serialize, Deserialize};

pub type JobData = HashMap<String, Vec<Job>>;

#[derive(Clone, Serialize, Deserialize)]
pub struct Job {
    // Fields representing job details
    pub title: String,
    pub location: String,
    pub url: String,
    // Add other relevant fields
}

// Note: `JobData::new()` used above already resolves to `HashMap::new()`
// through the type alias, so no `impl JobData` block is needed (an `impl`
// on an alias of the foreign HashMap type would not compile anyway).
  • Reasoning: Provides a centralized data structure for storing and accessing job information.

Potential Technical Considerations and Challenges

1. Deprecated Methods in Axum

  • Challenge: The previous implementation used axum::Server::bind(...).serve(app.into_make_service()), an API that was removed from Axum before 0.8.

  • Solution: Update the server initialization to bind a tokio::net::TcpListener and pass it, along with the Router, to the free function axum::serve:

    let listener = tokio::net::TcpListener::bind(addr).await?;
    axum::serve(listener, app).await?;

2. Command-line Argument Parsing

  • Challenge: Ensuring that port and interval are required when the --api flag is used and that the flags do not conflict.
  • Solution: Use Clap's requires_all and conflicts_with attributes to enforce argument requirements and mutual exclusivity.

3. Asynchronous Data Access

  • Challenge: Managing asynchronous, thread-safe access to shared job data between the cron job and API handlers.
  • Solution: Use Arc<RwLock<JobData>> for shared ownership and synchronization between asynchronous tasks.

4. Cron Job Scheduling

  • Challenge: Scheduling periodic scraping without blocking the asynchronous runtime.
  • Solution: Use tokio-cron-scheduler to schedule and run cron jobs within the Tokio runtime.

5. Data Consistency

  • Challenge: Ensuring that the job data remains consistent when being read by API handlers and written by the cron job.
  • Solution: The RwLock allows multiple concurrent reads or exclusive write access, ensuring data consistency.

6. Error Handling

  • Challenge: Handling errors within asynchronous contexts, especially in the cron job and server.
  • Solution: Use the AppResult type alias for consistent error handling and propagate errors appropriately.

7. Integration with Existing Codebase

  • Challenge: Maintaining the existing code structure and style.
  • Solution: Place new modules in the appropriate directories (modes, handlers, models) and follow existing naming conventions.

Additional Notes

Testing

  • Unit Tests: Implement unit tests for the new API handlers and cron job functions to ensure they work as expected.
  • Integration Tests: Test the API endpoints to ensure they return the correct data.
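
For example, a minimal endpoint test might look like the sketch below (it assumes a dev-dependency on tower for its oneshot helper; module paths follow the layout proposed above):

#[cfg(test)]
mod tests {
    use std::sync::Arc;
    use axum::{body::Body, http::{Request, StatusCode}, routing::get, Router};
    use tokio::sync::RwLock;
    use tower::ServiceExt; // provides `oneshot`

    use crate::handlers::api_handlers::get_company_jobs;
    use crate::models::job_data::JobData;

    #[tokio::test]
    async fn unknown_company_returns_empty_list() {
        // Empty shared state: no scrape has run yet.
        let state = Arc::new(RwLock::new(JobData::new()));
        let app = Router::new()
            .route("/api/{company}", get(get_company_jobs))
            .with_state(state);

        // Drive the router directly, without binding a socket.
        let response = app
            .oneshot(Request::builder().uri("/api/unknown").body(Body::empty()).unwrap())
            .await
            .unwrap();

        assert_eq!(response.status(), StatusCode::OK);
    }
}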

Logging

  • Implement Logging: Add logging statements within the API handlers and cron job to aid in debugging and monitoring.
  • Logging Level: Use appropriate logging levels (e.g., info, debug, error) based on the importance of the messages.
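
A sketch of what this could look like with the tracing crates (an assumed addition; jobshell does not currently list them as dependencies):

// Once at startup:
tracing_subscriber::fmt().init();

// In the cron job and handlers, for example:
tracing::info!(interval_hours = interval, "starting scheduled scrape");
tracing::error!(error = ?e, "scrape failed; keeping previous job data");
tracing::debug!(%company, "serving jobs from cached data");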

Configuration

  • Environment Variables: Consider allowing the port and interval to be set via environment variables for flexibility.
  • Configuration File: Optionally, introduce a configuration file to manage settings.
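
A possible fallback order, CLI argument first and environment second (the JOBSHELL_PORT variable name is hypothetical):

fn resolve_port(cli_port: Option<u16>) -> u16 {
    cli_port
        // Fall back to the environment, then to a default.
        .or_else(|| std::env::var("JOBSHELL_PORT").ok()?.parse().ok())
        .unwrap_or(3000)
}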

Security

  • Input Validation: Validate and sanitize the company parameter in the API endpoint to prevent injection attacks.
  • Rate Limiting: Implement rate limiting if necessary to prevent abuse of the API.
  • Access Control: If exposing the API publicly, consider implementing authentication and authorization mechanisms.
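
For instance, a small normalization step before the lookup (treating company names as lowercase slugs is an assumption about how jobshell keys its data):

fn normalize_company(raw: &str) -> Option<String> {
    let key = raw.trim().to_ascii_lowercase();
    // Accept only simple slugs; reject anything else up front.
    if !key.is_empty() && key.chars().all(|c| c.is_ascii_alphanumeric() || c == '-') {
        Some(key)
    } else {
        None
    }
}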

Documentation

  • Update README: Provide instructions on how to run the application in API mode, including examples.
  • API Documentation: Document the API endpoints, request parameters, and response formats.
  • Code Comments: Add comments to the code to explain complex logic and decisions.

By carefully implementing these steps and addressing the deprecated code, we'll successfully add the requested API mode to the jobshell application. Users will be able to access job data via HTTP endpoints and keep the data up-to-date with scheduled scraping tasks, enhancing the functionality of the tool while maintaining code quality and consistency.

You have 8 implementation plan generations left

@angelplusultra closed this as not planned on Mar 7, 2025