Merge from main #30

Merged
101 commits merged on Feb 4, 2024
Changes from all commits
Commits
101 commits
e5da502
FEAT: Adding callbacks functionality
Pratyush-exe Dec 17, 2023
b87c5eb
CHORE: Structure change
Pratyush-exe Dec 21, 2023
73962f1
CHORE: Created some methods for on_epoch_end, 'DeepEvalCallback'
Pratyush-exe Dec 22, 2023
77af0cc
FEAT: Added custom metrics table for CLI
Pratyush-exe Dec 24, 2023
08eb920
FIX: fixed issues with progress table and added progress bar
Pratyush-exe Dec 25, 2023
410b268
FEAT: Added calc metric every x epoch and fixed some issues
Pratyush-exe Dec 27, 2023
1dcdc8d
CHORE: added progress_context while evaluation
Pratyush-exe Dec 27, 2023
004ef1c
FIX: fixed overlapping of progress-bars with tables
Pratyush-exe Jan 16, 2024
46b4004
FIX: fixed flickering of initial progress bars and order of columns
Pratyush-exe Jan 16, 2024
a1b8431
CHORE: code reformat
Pratyush-exe Jan 16, 2024
210df2e
FEAT: Added test_callback script for testing
Pratyush-exe Jan 16, 2024
d0afde7
FEAT: Added test_callback script for testing
Pratyush-exe Jan 17, 2024
c213f9b
CHORE: code lint
Pratyush-exe Jan 17, 2024
eb59e3d
Merge branch 'main' of https://github.com/Pratyush-exe/deepeval into …
Pratyush-exe Jan 17, 2024
3b27e05
CHORE: made callbacks compatible with latest pull
Pratyush-exe Jan 17, 2024
8b7e3bc
FEAT: Added list of Goldens as an optional parameter to EvaluationDat…
Pratyush-exe Jan 20, 2024
d0771b1
FEAT: Added 'retrieval_context' as param for Golden
Pratyush-exe Jan 20, 2024
f09d531
FEAT: Added support for Golden
Pratyush-exe Jan 20, 2024
c06871a
CHORE: Separated callback and rich code for better readability
Pratyush-exe Jan 21, 2024
bf81870
CHORE: Added docstrings to deepeval.callbacks.huggingface.utils
Pratyush-exe Jan 21, 2024
ff2b3a7
CHORE: Spell corrections
Pratyush-exe Jan 21, 2024
ea46dd6
Rename bias and toxicity
penguine-ip Jan 21, 2024
a1d2a67
fix docs
penguine-ip Jan 21, 2024
4a4e8d1
fix docs
penguine-ip Jan 21, 2024
c8e230d
Merge pull request #423 from confident-ai/hotfix/rename-bias-and-toxi…
penguine-ip Jan 21, 2024
50958f5
new release
penguine-ip Jan 21, 2024
a880ea5
Merge pull request #424 from confident-ai/release-v0.20.50
penguine-ip Jan 21, 2024
5cd7c13
CHORE: Added new test code
Pratyush-exe Jan 21, 2024
591e268
Fixed assertion on actual output
penguine-ip Jan 23, 2024
d22ebd1
Merge pull request #426 from confident-ai/hotfix/bias-and-toxic-check
penguine-ip Jan 23, 2024
0d67204
new release
penguine-ip Jan 23, 2024
09ad27b
Merge pull request #427 from confident-ai/release-v0.20.51
penguine-ip Jan 23, 2024
03fd6c4
FIX: Fixed sample_size code inside create_deepeval_dataset function
Pratyush-exe Jan 23, 2024
b1d6cb7
Merge pull request #368 from Pratyush-exe/feature/callbacks
penguine-ip Jan 23, 2024
3415c19
Added default to golden
penguine-ip Jan 23, 2024
a91f2b6
Merge pull request #428 from confident-ai/hotfix/datasetscreation
penguine-ip Jan 23, 2024
eeb79b1
added deployment option
penguine-ip Jan 23, 2024
b9274be
fix test
penguine-ip Jan 23, 2024
5d449cc
reformat
penguine-ip Jan 23, 2024
334a233
Add pytest deployment flag
penguine-ip Jan 23, 2024
cd1df87
.
penguine-ip Jan 23, 2024
e082fe6
.
penguine-ip Jan 23, 2024
66e350e
Merge pull request #429 from confident-ai/feature/deployment
penguine-ip Jan 23, 2024
2f55255
Fix global mutable defaults in EvaluationDataset
jschomay Jan 23, 2024
a7c78ea
Merge pull request #431 from jeffometer/main
penguine-ip Jan 23, 2024
1b50b98
.
penguine-ip Jan 23, 2024
57f6c5d
Merge pull request #432 from confident-ai/release-v0.20.52
penguine-ip Jan 23, 2024
10bf854
Added docsearch
penguine-ip Jan 24, 2024
2e46802
delete redundant "Toxicity"
nicholasburka Jan 25, 2024
8322dba
New integrations
penguine-ip Jan 25, 2024
c040b5a
fix tests
penguine-ip Jan 25, 2024
14e33fd
lint
penguine-ip Jan 25, 2024
46a9ae4
Merge pull request #434 from nicholasburka/patch-1
penguine-ip Jan 25, 2024
d7a4b93
Merge pull request #435 from confident-ai/features/integrations
penguine-ip Jan 25, 2024
94e5457
Llamaindex docs and integration
penguine-ip Jan 25, 2024
5a1cfda
Merge pull request #436 from confident-ai/features/llamaindex
penguine-ip Jan 25, 2024
a88314a
Updated docs
penguine-ip Jan 25, 2024
20d854b
.
penguine-ip Jan 25, 2024
aac5834
new release
penguine-ip Jan 25, 2024
1dfa192
Merge pull request #437 from confident-ai/release-v0.20.53
penguine-ip Jan 25, 2024
f1c7a6e
Update docs
penguine-ip Jan 25, 2024
da2f799
Conditional import
penguine-ip Jan 25, 2024
8d23b80
Merge pull request #438 from confident-ai/features/lazy-import
penguine-ip Jan 25, 2024
e107c40
added deployment flag
penguine-ip Jan 25, 2024
e53cbdd
Merge pull request #439 from confident-ai/feature/cicd
penguine-ip Jan 25, 2024
ddf5735
Added retrieval Context
penguine-ip Jan 26, 2024
861f3cd
Merge pull request #440 from confident-ai/features/deployment-github
penguine-ip Jan 27, 2024
58993da
Added deployment configs
penguine-ip Jan 27, 2024
bcbe21e
Added deployment key
penguine-ip Jan 27, 2024
7c3e546
Debug
penguine-ip Jan 27, 2024
ebe574c
debug
penguine-ip Jan 27, 2024
6550f51
debug
penguine-ip Jan 27, 2024
ffa3ffc
.
penguine-ip Jan 27, 2024
9ca77f6
.
penguine-ip Jan 27, 2024
3eedb7d
Merge pull request #442 from confident-ai/features/deploymentgithub
penguine-ip Jan 27, 2024
c8942cb
Added basic docs
Pratyush-exe Jan 28, 2024
c43f3aa
Merge pull request #444 from Pratyush-exe/callbacks-docs
penguine-ip Jan 28, 2024
9e463bb
Added custom model
penguine-ip Jan 29, 2024
970ffa9
Merge pull request #445 from confident-ai/features/custommodelagain
penguine-ip Jan 29, 2024
b287283
new release
penguine-ip Jan 29, 2024
fca2c1c
Merge pull request #446 from confident-ai/release-v0.20.54
penguine-ip Jan 29, 2024
45d64f3
Fix package setup
nictuku Jan 29, 2024
8505e1f
Merge pull request #447 from nictuku/integrations-fix
penguine-ip Jan 29, 2024
4c3076b
Fix evaluate
penguine-ip Jan 29, 2024
995103a
new release
penguine-ip Jan 29, 2024
e7eec73
Merge pull request #448 from confident-ai/release-v0.20.55
penguine-ip Jan 29, 2024
6566679
Added confident cost latency logging
penguine-ip Jan 30, 2024
6f4618c
Merge pull request #449 from confident-ai/features/latency-and-cost
penguine-ip Jan 30, 2024
ab08118
Added dataset alias for Confident
penguine-ip Jan 30, 2024
27090fc
Merge pull request #450 from confident-ai/features/dataset-integration
penguine-ip Jan 30, 2024
4c64ed8
Fix convert goldens
penguine-ip Jan 30, 2024
73e9eb3
Fix cost and latency thresholds
penguine-ip Jan 30, 2024
6c9d056
updated docs
penguine-ip Jan 30, 2024
3a5a555
Merge pull request #451 from confident-ai/hotfix/costandlatency
penguine-ip Jan 30, 2024
c7ccd87
.
penguine-ip Jan 30, 2024
08c11b9
Merge pull request #452 from confident-ai/release-v0.20.56
penguine-ip Jan 30, 2024
bead5c0
Added docs
penguine-ip Jan 30, 2024
d3939c9
Updated docs
penguine-ip Jan 31, 2024
8c99c56
updated docs
penguine-ip Feb 1, 2024
49b165b
updated docs
penguine-ip Feb 3, 2024
0728b19
Fixed docs
penguine-ip Feb 3, 2024
5 changes: 5 additions & 0 deletions .github/workflows/deepeval-results.yml
@@ -43,6 +43,11 @@ jobs:
      - name: Check if 'deepeval' script is available
        run: ls -l $(poetry env info --path)/bin/deepeval || echo "deepeval script not found"

+      - name: Run deepeval login
+        env:
+          CONFIDENT_API_KEY: ${{ secrets.CONFIDENT_API_KEY }}
+        run: poetry run deepeval login --confident-api-key "$CONFIDENT_API_KEY"
+
      - name: Run deepeval tests and capture output
        run: poetry run deepeval test run tests/test_quickstart.py > output.txt 2>&1

2 changes: 1 addition & 1 deletion .github/workflows/test.yml
@@ -65,4 +65,4 @@ jobs:
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
-          poetry run pytest tests/ --ignore=tests/test_g_eval.py
+          poetry run pytest tests/
1 change: 0 additions & 1 deletion README.md
@@ -47,7 +47,6 @@ Whether your application is implemented via RAG or fine-tuning, LangChain or Lla
- Contextual Recall
- Contextual Precision
- RAGAS
- Toxicity
- Hallucination
- Toxicity
- Bias
2 changes: 1 addition & 1 deletion deepeval/_version.py
@@ -1 +1 @@
-__version__: str = "0.20.49"
+__version__: str = "0.20.56"
8 changes: 7 additions & 1 deletion deepeval/cli/test.py
@@ -1,10 +1,11 @@
import pytest
import typer
import os
+import json
from typing_extensions import Annotated
from typing import Optional
from deepeval.test_run import test_run_manager, TEMP_FILE_NAME
-from deepeval.utils import delete_file_if_exists
+from deepeval.utils import delete_file_if_exists, get_deployment_configs
from deepeval.test_run import invoke_test_run_end_hook
from deepeval.telemetry import capture_evaluation_count

@@ -56,6 +57,11 @@ def run(
    if exit_on_first_failure:
        pytest_args.insert(0, "-x")

+    deployment_configs = get_deployment_configs()
+    if deployment_configs is not None:
+        deployment_configs_json = json.dumps(deployment_configs)
+        pytest_args.extend(["--deployment", deployment_configs_json])
+
    pytest_args.extend(
        [
            "--verbose" if verbose else "--quiet",
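
The effect of the new block is easiest to see in isolation: the deployment configs are serialized to JSON and passed to pytest as a single --deployment argument. A minimal sketch of that behaviour, with a hypothetical dict standing in for whatever get_deployment_configs() returns (its shape is not shown in this diff):

import json

# Hypothetical stand-in for get_deployment_configs(); the real return value's
# shape is not shown in this PR.
deployment_configs = {"env": "ci", "actor": "github-actions"}

pytest_args = ["tests/test_quickstart.py", "-x"]
if deployment_configs is not None:
    # Serialize to JSON so the whole config travels as one CLI argument.
    deployment_configs_json = json.dumps(deployment_configs)
    pytest_args.extend(["--deployment", deployment_configs_json])

print(pytest_args)
# ['tests/test_quickstart.py', '-x', '--deployment', '{"env": "ci", "actor": "github-actions"}']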
1 change: 1 addition & 0 deletions deepeval/dataset/api.py
@@ -7,6 +7,7 @@ class Golden(BaseModel):
    actual_output: Optional[str] = Field(None, alias="actualOutput")
    expected_output: Optional[str] = Field(None, alias="expectedOutput")
    context: Optional[list] = Field(None)
+    retrieval_context: Optional[list] = Field(None, alias="retrievalContext")
    additional_metadata: Optional[Dict] = Field(
        None, alias="additionalMetadata"
    )
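
With the new field, a Golden can carry retrieved context alongside the expected answer. A minimal sketch using the camelCase aliases defined above; the input field and all values are assumptions for illustration, since input is not visible in this hunk:

from deepeval.dataset.api import Golden

golden = Golden(
    input="What is the capital of France?",  # assumed field; used later by convert_goldens_to_test_cases
    expectedOutput="Paris",  # alias for expected_output
    context=["France is a country in Europe."],
    retrievalContext=["Paris is the capital of France."],  # alias for retrieval_context, new in this PR
)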
33 changes: 26 additions & 7 deletions deepeval/dataset/dataset.py
@@ -3,7 +3,6 @@
from rich.console import Console
import json
import webbrowser
import os

from deepeval.metrics import BaseMetric
from deepeval.test_case import LLMTestCase
@@ -26,9 +25,20 @@ class EvaluationDataset:
    test_cases: List[LLMTestCase]
    goldens: List[Golden]

-    def __init__(self, test_cases: List[LLMTestCase] = []):
-        self.test_cases = test_cases
-        self.goldens = []
+    def __init__(
+        self,
+        alias: Optional[str] = None,
+        goldens: Optional[List[Golden]] = None,
+        test_cases: Optional[List[LLMTestCase]] = None,
+    ):
+        if test_cases is not None:
+            for test_case in test_cases:
+                test_case.dataset_alias = alias
+            self.test_cases = test_cases
+        else:
+            self.test_cases = []
+        self.goldens = goldens or []
+        self.alias = alias

    def add_test_case(self, test_case: LLMTestCase):
        self.test_cases.append(test_case)
@@ -39,6 +49,11 @@ def __iter__(self):
    def evaluate(self, metrics: List[BaseMetric]):
        from deepeval import evaluate

+        if len(self.test_cases) == 0:
+            raise ValueError(
+                "No test cases found in evaluation dataset. Unable to evaluate empty dataset."
+            )
+
        return evaluate(self.test_cases, metrics)

    def add_test_cases_from_csv_file(
@@ -109,6 +124,7 @@ def get_column_data(df: pd.DataFrame, col_name: str, default=None):
actual_output=actual_output,
expected_output=expected_output,
context=context,
dataset_alias=self.alias,
)
)

@@ -171,6 +187,7 @@ def add_test_cases_from_json_file(
actual_output=actual_output,
expected_output=expected_output,
context=context,
dataset_alias=self.alias,
)
)

@@ -238,6 +255,7 @@ def add_test_cases_from_hf_dataset(
actual_output=actual_output,
expected_output=expected_output,
context=context,
dataset_alias=self.alias,
)
)

@@ -274,6 +292,7 @@ def push(self, alias: str):

def pull(self, alias: str, auto_convert_goldens_to_test_cases: bool = True):
if is_confident():
self.alias = alias
api = Api()
result = api.get_request(
endpoint=Endpoints.DATASET_ENDPOINT.value,
@@ -284,10 +303,10 @@ def pull(self, alias: str, auto_convert_goldens_to_test_cases: bool = True):
goldens=result["goldens"],
)

self.goldens = response.goldens

if auto_convert_goldens_to_test_cases:
self.test_cases = convert_goldens_to_test_cases(self.goldens)
self.test_cases = convert_goldens_to_test_cases(
response.goldens, alias
)
else:
raise Exception(
"Run `deepeval login` to pull dataset from Confident AI"
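
Taken together, the new constructor, the alias plumbing, and the empty-dataset guard change how a dataset is assembled. A rough usage sketch, under the assumption that EvaluationDataset is importable from deepeval.dataset and with illustrative values:

from deepeval.dataset import EvaluationDataset
from deepeval.test_case import LLMTestCase

test_cases = [
    LLMTestCase(
        input="What is the capital of France?",
        actual_output="Paris",
        expected_output="Paris",
    )
]

# The alias passed here is propagated onto each test case's dataset_alias attribute.
dataset = EvaluationDataset(alias="geo-qa", test_cases=test_cases)

# dataset.evaluate(metrics) now raises ValueError on an empty dataset instead of
# silently evaluating nothing, and dataset.pull("geo-qa") records the alias on pull.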
8 changes: 6 additions & 2 deletions deepeval/dataset/utils.py
@@ -1,4 +1,4 @@
-from typing import List
+from typing import List, Optional
from deepeval.dataset.api import Golden
from deepeval.test_case import LLMTestCase

@@ -18,14 +18,18 @@ def convert_test_cases_to_goldens(
    return goldens


-def convert_goldens_to_test_cases(goldens: List[Golden]) -> List[LLMTestCase]:
+def convert_goldens_to_test_cases(
+    goldens: List[Golden], dataset_alias: Optional[str] = None
+) -> List[LLMTestCase]:
    test_cases = []
    for golden in goldens:
        test_case = LLMTestCase(
            input=golden.input,
            actual_output=golden.actual_output,
            expected_output=golden.expected_output,
            context=golden.context,
+            retrieval_context=golden.retrieval_context,
+            dataset_alias=dataset_alias,
        )
        test_cases.append(test_case)
    return test_cases
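
A short sketch of what the updated helper produces: each Golden becomes an LLMTestCase that now also carries its retrieval_context and the alias of the dataset it came from. The golden's values are made up for illustration:

from deepeval.dataset.api import Golden
from deepeval.dataset.utils import convert_goldens_to_test_cases

goldens = [
    Golden(
        input="What is the capital of France?",
        actualOutput="Paris",  # alias for actual_output
        retrievalContext=["Paris is the capital of France."],  # alias for retrieval_context
    )
]

test_cases = convert_goldens_to_test_cases(goldens, dataset_alias="geo-qa")
# Each resulting LLMTestCase keeps input/actual_output/expected_output/context as before,
# plus retrieval_context and dataset_alias="geo-qa".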
Empty file.
1 change: 1 addition & 0 deletions deepeval/integrations/harness/__init__.py
@@ -0,0 +1 @@
from deepeval.integrations.harness import DeepEvalHarnessCallback
26 changes: 26 additions & 0 deletions deepeval/integrations/harness/callback.py
@@ -0,0 +1,26 @@
from typing import List, Union


# from deepeval.experimental import BaseEvaluationExperiment

try:
    from transformers.trainer_callback import TrainerCallback

    class DeepEvalHarnessCallback(TrainerCallback):
        """
        A [transformers.TrainerCallback] that logs various harness LLM evaluation metrics to DeepEval
        """

        def __init__(self, experiments):
            super().__init__()
            self.experiments = experiments

            raise NotImplementedError("DeepEvalHarnessCallback is WIP")

except ImportError:

    class DeepEvalHarnessCallback:
        def __init__(self, *args, **kwargs):
            raise ImportError(
                "The 'transformers' library is required to use the DeepEvalHarnessCallback."
            )
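
The try/except above is a standard optional-dependency guard: with transformers installed, the real TrainerCallback subclass is defined (and currently raises NotImplementedError because it is WIP); without it, a stub class raises ImportError on construction. A defensive usage sketch covering both outcomes, with an empty experiments list as a placeholder:

try:
    from deepeval.integrations.harness import DeepEvalHarnessCallback

    # With transformers installed this currently fails with NotImplementedError ("WIP");
    # without transformers the stub class raises ImportError instead.
    callback = DeepEvalHarnessCallback(experiments=[])
except ImportError:
    print("Install 'transformers' to use DeepEvalHarnessCallback.")
except NotImplementedError:
    print("DeepEvalHarnessCallback is still a work in progress.")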
1 change: 1 addition & 0 deletions deepeval/integrations/hugging_face/__init__.py
@@ -0,0 +1 @@
from deepeval.integrations.hugging_face import DeepEvalHuggingFaceCallback