Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor scripts to avoid anti-patterns, redundancy #1986

Merged
merged 47 commits into from
Sep 26, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
47 commits
Select commit Hold shift + click to select a range
d721099
Update bicep for ACA
1yefuwang1 Aug 22, 2024
30f00e5
First working version
1yefuwang1 Aug 23, 2024
72e34d2
Support workload profile
1yefuwang1 Aug 28, 2024
55a97fd
Merge branch 'Azure-Samples:main' into main
1yefuwang1 Aug 29, 2024
7edd2db
Add support for CORS and fix identity for openai
1yefuwang1 Aug 30, 2024
8fc2d5a
Add aca-host
1yefuwang1 Sep 2, 2024
9cadd14
Make acr unique
1yefuwang1 Sep 2, 2024
0623e9b
Add doc for aca host
1yefuwang1 Sep 3, 2024
73b2bb3
Merge branch 'Azure-Samples:main' into yefu/aca
1yefuwang1 Sep 3, 2024
e362545
Update ACA docs
1yefuwang1 Sep 3, 2024
24d668a
Remove unneeded bicep files
1yefuwang1 Sep 3, 2024
fbb4b05
Revert chanes to infra/main.parameters.json
1yefuwang1 Sep 3, 2024
4ced7ce
Fix markdown lint issues
1yefuwang1 Sep 3, 2024
625866f
Run frontend build before building docker image
1yefuwang1 Sep 4, 2024
40287f2
remove symlinks and update scripts with paths relative to its own fol…
1yefuwang1 Sep 5, 2024
a99a6c5
Merge with main.bicep
1yefuwang1 Sep 6, 2024
9dc65ca
output AZURE_CONTAINER_REGISTRY_ENDPOINT
1yefuwang1 Sep 9, 2024
7f523a0
Fix deployment with app service
1yefuwang1 Sep 9, 2024
9e6e145
Improve naming and README
1yefuwang1 Sep 9, 2024
4ec32f7
Fix identity name and cost esitmation for aca
1yefuwang1 Sep 9, 2024
4174fd3
Share env vars in bicep and update docs
1yefuwang1 Sep 10, 2024
7e49c99
Revert "remove symlinks and update scripts with paths relative to its…
1yefuwang1 Sep 13, 2024
259e7a5
Add containerapps as a commented out host option
1yefuwang1 Sep 13, 2024
920e979
Update app/backend/.dockerignore
pamelafox Sep 13, 2024
eb09e46
Apply suggestions from code review
pamelafox Sep 13, 2024
56025eb
Merge branch 'main' into yefu/aca
pamelafox Sep 13, 2024
13021cb
More steps for deployment guide
pamelafox Sep 13, 2024
8b19702
Update azure.yaml
pamelafox Sep 13, 2024
6550960
Merge branch 'main' into yefu/aca
pamelafox Sep 13, 2024
d49f60c
Update comment
pamelafox Sep 13, 2024
11837ba
cleanup bicep files and improve docs
1yefuwang1 Sep 14, 2024
560076b
Update condition for running in production for credential
pamelafox Sep 14, 2024
5682b67
Merge branch 'yefu/aca' of https://github.com/tawalke/azure-search-op…
pamelafox Sep 14, 2024
59e01c8
Refactors to scripts
pamelafox Sep 23, 2024
436ea35
Merge branch 'main' into loadazdenv
pamelafox Sep 23, 2024
a508bae
Remove phi changes
pamelafox Sep 23, 2024
457224d
Make mypy happy
pamelafox Sep 23, 2024
7c385c8
Add dotenv requirement
pamelafox Sep 23, 2024
02c280c
Env var tweaks
pamelafox Sep 24, 2024
a4a4f11
Fix error handling
pamelafox Sep 25, 2024
e4a7abf
Update manageacl.py commands
pamelafox Sep 25, 2024
0ab84da
Doc update
pamelafox Sep 25, 2024
4fef884
Adding more tests for prepdocs
pamelafox Sep 26, 2024
7d57de8
Fix markdown copy
pamelafox Sep 26, 2024
1980845
Fix relative links
pamelafox Sep 26, 2024
b378727
Make prepdocs mypy happy
pamelafox Sep 26, 2024
697fa01
Fix auth_update if check
pamelafox Sep 26, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 4 additions & 3 deletions app/backend/app.py
Original file line number Diff line number Diff line change
Expand Up @@ -714,9 +714,10 @@ def create_app():
# Log levels should be one of https://docs.python.org/3/library/logging.html#logging-levels
# Set root level to WARNING to avoid seeing overly verbose logs from SDKS
logging.basicConfig(level=logging.WARNING)
# Set the app logger level to INFO by default
default_level = "INFO"
app.logger.setLevel(os.getenv("APP_LOG_LEVEL", default_level))
# Set our own logger levels to INFO by default
app_level = os.getenv("APP_LOG_LEVEL", "INFO")
app.logger.setLevel(os.getenv("APP_LOG_LEVEL", app_level))
logging.getLogger("scripts").setLevel(app_level)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since prepdocs uses the "scripts" logger, I wanted to make sure we also see its logs when using user upload feature.
Though I wonder if I should give it a different name, like "ragapp".

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think scripts is an acceptable name for now - we can change it later if it's not working


if allowed_origin := os.getenv("ALLOWED_ORIGIN"):
app.logger.info("ALLOWED_ORIGIN is set, enabling CORS for %s", allowed_origin)
Expand Down
23 changes: 23 additions & 0 deletions app/backend/load_azd_env.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
import json
import logging
import subprocess

from dotenv import load_dotenv

logger = logging.getLogger("scripts")


def load_azd_env():
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file actually shows up in two places, both here and in scripts folder, for convenience.

"""Get path to current azd env file and load file using python-dotenv"""
result = subprocess.run("azd env list -o json", shell=True, capture_output=True, text=True)
if result.returncode != 0:
raise Exception("Error loading azd env")
env_json = json.loads(result.stdout)
env_file_path = None
for entry in env_json:
if entry["IsDefault"]:
env_file_path = entry["DotEnvPath"]
if not env_file_path:
raise Exception("No default azd env file found")
logger.info(f"Loading azd env from {env_file_path}")
load_dotenv(env_file_path, override=True)
9 changes: 9 additions & 0 deletions app/backend/main.py
Original file line number Diff line number Diff line change
@@ -1,3 +1,12 @@
import os

from app import create_app
from load_azd_env import load_azd_env

# WEBSITE_HOSTNAME is always set by App Service, RUNNING_IN_PRODUCTION is set in main.bicep
RUNNING_ON_AZURE = os.getenv("WEBSITE_HOSTNAME") is not None or os.getenv("RUNNING_IN_PRODUCTION") is not None
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code also exists in two places-

  1. here, so that we load the environment when starting up the app locally
  2. in app.py, so that we can decide what kind of credential to use


if not RUNNING_ON_AZURE:
load_azd_env()

app = create_app()
274 changes: 105 additions & 169 deletions app/backend/prepdocs.py

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion app/backend/prepdocslib/blobmanager.py
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@

from .listfilestrategy import File

logger = logging.getLogger("ingester")
logger = logging.getLogger("scripts")


class BlobManager:
Expand Down
2 changes: 1 addition & 1 deletion app/backend/prepdocslib/embeddings.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@
)
from typing_extensions import TypedDict

logger = logging.getLogger("ingester")
logger = logging.getLogger("scripts")


class EmbeddingBatch:
Expand Down
2 changes: 1 addition & 1 deletion app/backend/prepdocslib/filestrategy.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@
from .searchmanager import SearchManager, Section
from .strategy import DocumentAction, SearchInfo, Strategy

logger = logging.getLogger("ingester")
logger = logging.getLogger("scripts")


async def parse_file(
Expand Down
2 changes: 1 addition & 1 deletion app/backend/prepdocslib/htmlparser.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@
from .page import Page
from .parser import Parser

logger = logging.getLogger("ingester")
logger = logging.getLogger("scripts")


def cleanup_data(data: str) -> str:
Expand Down
2 changes: 1 addition & 1 deletion app/backend/prepdocslib/integratedvectorizerstrategy.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,7 @@
from .searchmanager import SearchManager
from .strategy import DocumentAction, SearchInfo, Strategy

logger = logging.getLogger("ingester")
logger = logging.getLogger("scripts")


class IntegratedVectorizerStrategy(Strategy):
Expand Down
2 changes: 1 addition & 1 deletion app/backend/prepdocslib/listfilestrategy.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,7 +13,7 @@
DataLakeServiceClient,
)

logger = logging.getLogger("ingester")
logger = logging.getLogger("scripts")


class File:
Expand Down
2 changes: 1 addition & 1 deletion app/backend/prepdocslib/pdfparser.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@
from .page import Page
from .parser import Parser

logger = logging.getLogger("ingester")
logger = logging.getLogger("scripts")


class LocalPdfParser(Parser):
Expand Down
2 changes: 1 addition & 1 deletion app/backend/prepdocslib/searchmanager.py
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@
from .strategy import SearchInfo
from .textsplitter import SplitPage

logger = logging.getLogger("ingester")
logger = logging.getLogger("scripts")


class Section:
Expand Down
2 changes: 1 addition & 1 deletion app/backend/prepdocslib/textsplitter.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@

from .page import Page, SplitPage

logger = logging.getLogger("ingester")
logger = logging.getLogger("scripts")


class TextSplitter(ABC):
Expand Down
1 change: 1 addition & 0 deletions app/backend/requirements.in
Original file line number Diff line number Diff line change
Expand Up @@ -29,3 +29,4 @@ beautifulsoup4
types-beautifulsoup4
msgraph-sdk==1.1.0
openai-messages-token-helper
python-dotenv
2 changes: 2 additions & 0 deletions app/backend/requirements.txt
Original file line number Diff line number Diff line change
Expand Up @@ -348,6 +348,8 @@ python-dateutil==2.9.0.post0
# microsoft-kiota-serialization-text
# pendulum
# time-machine
python-dotenv==1.0.1
# via -r requirements.in
quart==0.19.6
# via
# -r requirements.in
Expand Down
2 changes: 1 addition & 1 deletion app/frontend/src/pages/chat/Chat.tsx
Original file line number Diff line number Diff line change
Expand Up @@ -131,7 +131,7 @@ const Chat = () => {
if (event["context"] && event["context"]["data_points"]) {
event["message"] = event["delta"];
askResponse = event as ChatAppResponse;
} else if (event["delta"]["content"]) {
} else if (event["delta"] && event["delta"]["content"]) {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unrelated error that I found while presenting. This error meant we arent currently rendering errors correctly, as {error: } responses dont contain a delta key.

setIsLoading(false);
await updateState(event["delta"]["content"]);
} else if (event["context"]) {
Expand Down
16 changes: 0 additions & 16 deletions app/start.sh
Original file line number Diff line number Diff line change
@@ -1,21 +1,5 @@
#!/bin/sh

echo ""
echo "Loading azd .env file from current environment"
echo ""

while IFS='=' read -r key value; do
value=$(echo "$value" | sed 's/^"//' | sed 's/"$//')
export "$key=$value"
done <<EOF
$(azd env get-values)
EOF

if [ $? -ne 0 ]; then
echo "Failed to load environment variables from azd environment"
exit $?
fi

cd ../
echo 'Creating python virtual environment ".venv"'
python3 -m venv .venv
Expand Down
4 changes: 2 additions & 2 deletions docs/deploy_features.md
Original file line number Diff line number Diff line change
Expand Up @@ -224,13 +224,13 @@ and will have ACLs associated with that directory. When the ingester runs, it wi
If you are enabling this feature on an existing index, you should also update your index to have the new `storageUrl` field:

```shell
./scripts/manageacl.ps1 -v --acl-action enable_acls
python ./scripts/manageacl.py -v --acl-action enable_acls
```

And then update existing search documents with the storage URL of the main Blob container:

```shell
./scripts/manageacl.ps1 -v --acl-action update_storage_urls --url <https://YOUR-MAIN-STORAGE-ACCOUNT.blob.core.windows.net/content/>
python ./scripts/manageacl.py -v --acl-action update_storage_urls --url <https://YOUR-MAIN-STORAGE-ACCOUNT.blob.core.windows.net/content/>
```

Going forward, all uploaded documents will have their `storageUrl` set in the search index.
Expand Down
Loading
Loading