Skip to content

Commit

Permalink
Merge pull request #170 from VinciGit00/pre/beta
Browse files Browse the repository at this point in the history
New release, many new features and bug-fix
  • Loading branch information
PeriniM authored May 8, 2024
2 parents d86437f + 3f0e069 commit 5ea4df4
Show file tree
Hide file tree
Showing 70 changed files with 807 additions and 1,074 deletions.
6 changes: 1 addition & 5 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -31,8 +31,4 @@ examples/graph_examples/ScrapeGraphAI_generated_graph
examples/**/result.csv
examples/**/result.json
main.py
poetry.lock

# lock files
*.lock
poetry.lock

97 changes: 87 additions & 10 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,27 +1,104 @@
## [0.9.0](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.8.0...v0.9.0) (2024-05-04)
## [0.10.0-beta.2](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.10.0-beta.1...v0.10.0-beta.2) (2024-05-08)


### Bug Fixes

* **examples:** local, mixed models and fixed SearchGraph embeddings problem ([6b71ec1](https://github.com/VinciGit00/Scrapegraph-ai/commit/6b71ec1d2be953220b6767bc429f4cf6529803fd))
* **examples:** openai std examples ([186c0d0](https://github.com/VinciGit00/Scrapegraph-ai/commit/186c0d035d1d211aff33c38c449f2263d9716a07))
* removed .lock file for deployment ([d4c7d4e](https://github.com/VinciGit00/Scrapegraph-ai/commit/d4c7d4e7fcc2110beadcb2fc91efc657ec6a485c))


### Docs

* update README.md ([17ec992](https://github.com/VinciGit00/Scrapegraph-ai/commit/17ec992b498839e001277e7bc3f0ebea49fbd00d))

## [0.10.0-beta.1](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.9.0...v0.10.0-beta.1) (2024-05-06)


### Features

* Enable end users to pass model instances of HuggingFaceHub ([7599234](https://github.com/VinciGit00/Scrapegraph-ai/commit/7599234ab9563ca4ee9b7f5b2d0267daac621ecf))
* add claude documentation ([5bdee55](https://github.com/VinciGit00/Scrapegraph-ai/commit/5bdee558760521bab818efc6725739e2a0f55d20))
* add gemini embeddings ([79daa4c](https://github.com/VinciGit00/Scrapegraph-ai/commit/79daa4c112e076e9c5f7cd70bbbc6f5e4930832c))
* add llava integration ([019b722](https://github.com/VinciGit00/Scrapegraph-ai/commit/019b7223dc969c87c3c36b6a42a19b4423b5d2af))
* add new hugging_face models ([d5547a4](https://github.com/VinciGit00/Scrapegraph-ai/commit/d5547a450ccd8908f1cf73707142b3481fbc6baa))
* Fix bug for gemini case when embeddings config not passed ([726de28](https://github.com/VinciGit00/Scrapegraph-ai/commit/726de288982700dab8ab9f22af8e26f01c6198a7))
* fixed custom_graphs example and robots_node ([84fcb44](https://github.com/VinciGit00/Scrapegraph-ai/commit/84fcb44aaa36e84f775884138d04f4a60bb389be))
* multiple graph instances ([dbb614a](https://github.com/VinciGit00/Scrapegraph-ai/commit/dbb614a8dd88d7667fe3daaf0263f5d6e9be1683))
* **node:** multiple url search in SearchGraph + fixes ([930adb3](https://github.com/VinciGit00/Scrapegraph-ai/commit/930adb38f2154ba225342466bfd1846c47df72a0))
* refactoring search function ([aeb1acb](https://github.com/VinciGit00/Scrapegraph-ai/commit/aeb1acbf05e63316c91672c99d88f8a6f338147f))


### Bug Fixes

* trailing whitespace ([2878695](https://github.com/VinciGit00/Scrapegraph-ai/commit/2878695d5f35cc9d81f24e4844fdc1988d10cb26))
* bug on .toml ([f7d66f5](https://github.com/VinciGit00/Scrapegraph-ai/commit/f7d66f51818dbdfddd0fa326f26265a3ab686b20))
* **llm:** fixed gemini api_key ([fd01b73](https://github.com/VinciGit00/Scrapegraph-ai/commit/fd01b73b71b515206cfdf51c1d52136293494389))


### Build
### CI

* **deps:** bump tqdm from 4.66.1 to 4.66.3 ([0a17c74](https://github.com/VinciGit00/Scrapegraph-ai/commit/0a17c74e50d0457aec289e81183e9c779c735842))
* **deps:** bump tqdm from 4.66.1 to 4.66.3 ([aff6f98](https://github.com/VinciGit00/Scrapegraph-ai/commit/aff6f983b02a37ced21826847a6ace5fb15ecf3d))
* **release:** 0.9.0-beta.2 [skip ci] ([5aa600c](https://github.com/VinciGit00/Scrapegraph-ai/commit/5aa600cb0a85d320ad8dc786af26ffa46dd4d097))
* **release:** 0.9.0-beta.3 [skip ci] ([da8c72c](https://github.com/VinciGit00/Scrapegraph-ai/commit/da8c72ce138bcfe2627924d25a67afcd22cfafd5))
* **release:** 0.9.0-beta.4 [skip ci] ([8c5397f](https://github.com/VinciGit00/Scrapegraph-ai/commit/8c5397f67a9f05e0c00f631dd297b5527263a888))
* **release:** 0.9.0-beta.5 [skip ci] ([532adb6](https://github.com/VinciGit00/Scrapegraph-ai/commit/532adb639d58640bc89e8b162903b2ed97be9853))
* **release:** 0.9.0-beta.6 [skip ci] ([8c0b46e](https://github.com/VinciGit00/Scrapegraph-ai/commit/8c0b46eb40b446b270c665c11b2c6508f4d5f4be))
* **release:** 0.9.0-beta.7 [skip ci] ([6911e21](https://github.com/VinciGit00/Scrapegraph-ai/commit/6911e21584767460c59c5a563c3fd010857cbb67))
* **release:** 0.9.0-beta.8 [skip ci] ([739aaa3](https://github.com/VinciGit00/Scrapegraph-ai/commit/739aaa33c39c12e7ab7df8a0656cad140b35c9db))

## [0.9.0-beta.8](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.9.0-beta.7...v0.9.0-beta.8) (2024-05-06)

### CI

* **release:** 0.8.0-beta.1 [skip ci] ([d277b34](https://github.com/VinciGit00/Scrapegraph-ai/commit/d277b349a98848749a7e38ea3c511271bced3b71))
* **release:** 0.8.0-beta.2 [skip ci] ([892500a](https://github.com/VinciGit00/Scrapegraph-ai/commit/892500afe93c4d96dcffe897b382977a22079b83))
* **release:** 0.9.0-beta.1 [skip ci] ([14615a7](https://github.com/VinciGit00/Scrapegraph-ai/commit/14615a73c71bb5250772a75c415c57cb153660f8))
### Features

* add llava integration ([019b722](https://github.com/VinciGit00/Scrapegraph-ai/commit/019b7223dc969c87c3c36b6a42a19b4423b5d2af))

## [0.9.0-beta.7](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.9.0-beta.6...v0.9.0-beta.7) (2024-05-06)


### Bug Fixes

* **llm:** fixed gemini api_key ([fd01b73](https://github.com/VinciGit00/Scrapegraph-ai/commit/fd01b73b71b515206cfdf51c1d52136293494389))

## [0.9.0-beta.6](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.9.0-beta.5...v0.9.0-beta.6) (2024-05-06)


### Features

* Fix bug for gemini case when embeddings config not passed ([726de28](https://github.com/VinciGit00/Scrapegraph-ai/commit/726de288982700dab8ab9f22af8e26f01c6198a7))

## [0.9.0-beta.5](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.9.0-beta.4...v0.9.0-beta.5) (2024-05-06)


### Features

* fixed custom_graphs example and robots_node ([84fcb44](https://github.com/VinciGit00/Scrapegraph-ai/commit/84fcb44aaa36e84f775884138d04f4a60bb389be))
* multiple graph instances ([dbb614a](https://github.com/VinciGit00/Scrapegraph-ai/commit/dbb614a8dd88d7667fe3daaf0263f5d6e9be1683))
* **node:** multiple url search in SearchGraph + fixes ([930adb3](https://github.com/VinciGit00/Scrapegraph-ai/commit/930adb38f2154ba225342466bfd1846c47df72a0))

## [0.9.0-beta.4](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.9.0-beta.3...v0.9.0-beta.4) (2024-05-05)


### Features

* add gemini embeddings ([79daa4c](https://github.com/VinciGit00/Scrapegraph-ai/commit/79daa4c112e076e9c5f7cd70bbbc6f5e4930832c))

## [0.9.0-beta.3](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.9.0-beta.2...v0.9.0-beta.3) (2024-05-05)


### Features

* add claude documentation ([5bdee55](https://github.com/VinciGit00/Scrapegraph-ai/commit/5bdee558760521bab818efc6725739e2a0f55d20))

## [0.9.0-beta.2](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.9.0-beta.1...v0.9.0-beta.2) (2024-05-05)


### Features

* refactoring search function ([aeb1acb](https://github.com/VinciGit00/Scrapegraph-ai/commit/aeb1acbf05e63316c91672c99d88f8a6f338147f))


### Bug Fixes

* bug on .toml ([f7d66f5](https://github.com/VinciGit00/Scrapegraph-ai/commit/f7d66f51818dbdfddd0fa326f26265a3ab686b20))

## [0.9.0-beta.1](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.8.0...v0.9.0-beta.1) (2024-05-04)

Expand Down
1 change: 1 addition & 0 deletions SECURITY.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,3 +3,4 @@
## Reporting a Vulnerability

For reporting a vulnerability contact directly [email protected]

59 changes: 59 additions & 0 deletions examples/anthropic/smart_scraper_haiku.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
"""
Basic example of scraping pipeline using SmartScraper using Azure OpenAI Key
"""

import os
from dotenv import load_dotenv
from scrapegraphai.graphs import SmartScraperGraph
from scrapegraphai.utils import prettify_exec_info
from langchain_community.llms import HuggingFaceEndpoint
from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings


# required environment variables in .env
# HUGGINGFACEHUB_API_TOKEN
# ANTHROPIC_API_KEY
load_dotenv()

HUGGINGFACEHUB_API_TOKEN = os.getenv('HUGGINGFACEHUB_API_TOKEN')
# ************************************************
# Initialize the model instances
# ************************************************


embedder_model_instance = HuggingFaceInferenceAPIEmbeddings(
api_key=HUGGINGFACEHUB_API_TOKEN, model_name="sentence-transformers/all-MiniLM-l6-v2"
)

# ************************************************
# Create the SmartScraperGraph instance and run it
# ************************************************

graph_config = {
"llm": {
"api_key": os.getenv("ANTHROPIC_API_KEY"),
"model": "claude-3-haiku-20240307",
"max_tokens": 4000},
"embeddings": {"model_instance": embedder_model_instance}
}

smart_scraper_graph = SmartScraperGraph(
prompt="""Don't say anything else. Output JSON only. List me all the events, with the following fields: company_name, event_name, event_start_date, event_start_time,
event_end_date, event_end_time, location, event_mode, event_category,
third_party_redirect, no_of_days,
time_in_hours, hosted_or_attending, refreshments_type,
registration_available, registration_link""",
# also accepts a string with the already downloaded HTML code
source="https://www.hmhco.com/event",
config=graph_config
)

result = smart_scraper_graph.run()
print(result)

# ************************************************
# Get graph execution info
# ************************************************

graph_exec_info = smart_scraper_graph.get_execution_info()
print(prettify_exec_info(graph_exec_info))
Original file line number Diff line number Diff line change
@@ -1,17 +1,15 @@
"""
Basic example of scraping pipeline using JSONScraperGraph from JSON documents
Example of Search Graph
"""

import os
from dotenv import load_dotenv
from scrapegraphai.graphs import JSONScraperGraph
from langchain_openai import AzureChatOpenAI
from langchain_openai import AzureOpenAIEmbeddings
from scrapegraphai.graphs import SearchGraph
from scrapegraphai.utils import convert_to_csv, convert_to_json, prettify_exec_info
load_dotenv()

# ************************************************
# Read the JSON file
# ************************************************

FILE_NAME = "inputs/example.json"
curr_dir = os.path.dirname(os.path.realpath(__file__))
file_path = os.path.join(curr_dir, FILE_NAME)
Expand All @@ -20,42 +18,47 @@
text = file.read()

# ************************************************
# Define the configuration for the graph
# Initialize the model instances
# ************************************************

llm_model_instance = AzureChatOpenAI(
openai_api_version=os.environ["AZURE_OPENAI_API_VERSION"],
azure_deployment=os.environ["AZURE_OPENAI_CHAT_DEPLOYMENT_NAME"]
)

embedder_model_instance = AzureOpenAIEmbeddings(
azure_deployment=os.environ["AZURE_OPENAI_EMBEDDINGS_DEPLOYMENT_NAME"],
openai_api_version=os.environ["AZURE_OPENAI_API_VERSION"],
)

# ************************************************
# Create the JSONScraperGraph instance and run it
# ************************************************

graph_config = {
"llm": {
"model": "ollama/mistral",
"temperature": 0,
"format": "json", # Ollama needs the format to be specified explicitly
# "model_tokens": 2000, # set context length arbitrarily
},
"embeddings": {
"model": "ollama/nomic-embed-text",
"temperature": 0,
}
"llm": {"model_instance": llm_model_instance},
"embeddings": {"model_instance": embedder_model_instance}
}

# ************************************************
# Create the JSONScraperGraph instance and run it
# Create the SearchGraph instance and run it
# ************************************************

json_scraper_graph = JSONScraperGraph(
prompt="List me all the authors, title and genres of the books",
source=text, # Pass the content of the file, not the file object
search_graph = SearchGraph(
prompt="List me the best escursions near Trento",
config=graph_config
)

result = json_scraper_graph.run()
result = search_graph.run()
print(result)

# ************************************************
# Get graph execution info
# ************************************************

graph_exec_info = json_scraper_graph.get_execution_info()
graph_exec_info = search_graph.get_execution_info()
print(prettify_exec_info(graph_exec_info))

# Save to json or csv
# Save to json and csv
convert_to_csv(result, "result")
convert_to_json(result, "result")
2 changes: 2 additions & 0 deletions examples/gemini/search_graph_gemini.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,8 @@
"temperature": 0,
"streaming": True
},
"max_results": 5,
"verbose": True,
}

# ************************************************
Expand Down
54 changes: 0 additions & 54 deletions examples/local_models/Docker/csv_scraper_docker.py

This file was deleted.

Loading

0 comments on commit 5ea4df4

Please sign in to comment.