Skip to content

DataDigger is a powerful and intuitive web application designed to extract and analyze data from web pages.

License

Notifications You must be signed in to change notification settings

Solrikk/DataDigger

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

DataDigger Logo


DataDigger

Overview

DataDigger is a powerful web application designed to extract and analyze structured data from websites. Built with Go, it provides a seamless experience for data extraction, analysis, and export.

📊 Example Output

DataDigger organizes extracted data into the following categories:

Content Type HTML Tag Text URL Metadata Date
title title Website Title 2023-05-20
heading h1 Main Heading 2023-05-20
paragraph p Content text... 2023-05-20
link a Link text https://example.com 2023-05-20
image img Alt text https://example.com/image.jpg 2023-05-20
metadata description Site description 2023-05-20

Key Features

  • Comprehensive Data Extraction: Automatically collects and organizes:

    • Page titles and metadata
    • Headings (H1-H6)
    • Paragraph text
    • Lists (ordered and unordered)
    • Links with their text and URLs
    • Images with their alt text and URLs
    • Tables with formatted content
  • Excel Export: One-click export to Excel (.xlsx) format with properly formatted sheets and columns

  • User-Friendly Interface: Clean, intuitive design that requires no technical knowledge

  • Real-Time Processing: Fast and efficient scraping engine with immediate results

How It Works

  1. Enter the URL of any website you want to analyze in the input field
  2. Click "Extract Data" and let DataDigger work its magic
  3. Receive a structured Excel file with all the extracted data
  4. Review organized content categorized by type and HTML element

Use Cases

  • Market Research: Analyze competitor websites and product information
  • Content Aggregation: Build databases of information from multiple sources
  • SEO Analysis: Extract and analyze headings, metadata, and content structure
  • Data Journalism: Collect data for reporting and analysis
  • Academic Research: Gather information from online sources for studies

Technical Details

DataDigger is built with:

  • Go (Golang) for the backend processing
  • GoQuery for HTML parsing
  • Excelize for Excel file generation
  • Clean HTML/CSS/JavaScript frontend

Getting Started

Prerequisites

  • Go 1.19 or higher

Running Locally

  1. Clone the repository
  2. Run go mod download to install dependencies
  3. Start the server with go run main.go
  4. Access the application at http://0.0.0.0:8080

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Contributions are welcome! Feel free to submit a pull request or open an issue.


Made with ❤️ by Solrikk