Skip to content

Commit

Permalink
feat: use course cache to retrieve course data (#72)
Browse files Browse the repository at this point in the history
  • Loading branch information
alangsto authored Feb 28, 2024
1 parent fddf0bc commit dc9179c
Show file tree
Hide file tree
Showing 12 changed files with 136 additions and 54 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.rst
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,10 @@ Change Log
Unreleased
**********

4.1.0 - 2024-02-26
******************
* Use course cache to inject course title and course skill names into prompt template.

4.0.0 - 2024-02-21
******************
* Remove use of course waffle flag. Use the django setting LEARNING_ASSISTANT_AVAILABLE
Expand Down
50 changes: 50 additions & 0 deletions docs/decisions/0004-course-cache-use.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,50 @@
4. Use of the LMS course caches
###############################

Status
******

**Accepted** *2024-02-26*

Context
*******
Each course run ID in edx-platform is associated with another course ID. While for many courses, the mapping between
course run ID and course ID is straight forward, i.e. edX+testX+2019 -> edX+testX, this is not the case for every
course on edX. The discovery service is the source of truth for mappings between course run ID and course IDs, and
string manipulation cannot be relied on as an accurate way to map between the two forms of ID.

The learning-assistant `CourseChatView`_ accepts a course run ID as a path parameter, but a number of API functions
in the learning assistant backend also require the course ID associated with a given course.

In our initial release, we also found that the current courses available in 2U's Xpert Platform team's index, which was
being used to inject course skill names and course titles into the system prompt (see `System Prompt Design Changes`_ for
original details), were too limited. Courses included in that index were conditionally added depending on course
enrollment dates and additional fields from the discovery course API. While the 2U Xpert Platform team may work to address
the gap in product needs for their current course index, an alternate method for retrieving course skills and title should
be considered.

Decision
********
In order to determine the mapping between a course run ID and course ID in the learning-assistant app, we will make
use of an `existing course run cache that is defined in edx-platform`_. Similarly, to retrieve the skill names and title of a course, we will also use
an `existing course cache`_. Both caches store data from the discovery API for course runs and courses, respectively.
These are long term caches with a TTL of 24 hours, and on a cache miss the discovery API will be called.

Consequences
************
* If the caches were to be removed, code in the learning-assistant repository would no longer function as expected.
* On a cache miss, the learning-assistant backend will incur additional performance cost on calls to the discovery API.

Rejected Alternatives
*********************
* Calling the discovery API directly from the learning-assistant backend
* This would require building a custom solution in the learning-assistant app to call the discovery service directly.
* Without a cache, this would impact performance on every call to the learning-assistant backend.
* Using string manipulation to map course run ID to course ID.
* If we do not use the discovery service as our source of truth for course run ID to course ID mappings,
we run the risk of being unable to support courses that do not fit the usual pattern mapping.

.. _existing course run cache that is defined in edx-platform: https://github.com/openedx/edx-platform/blob/c61df904c1d2a5f523f1da44460c21e17ec087ee/openedx/core/djangoapps/catalog/utils.py#L801
.. _CourseChatView: https://github.com/edx/learning-assistant/blob/fddf0bc27016bd4a1cabf82de7bcb80b51f3763b/learning_assistant/views.py#L29
.. _System Prompt Design Changes: https://github.com/edx/learning-assistant/blob/main/docs/decisions/0002-system-prompt-design-changes.rst
.. _existing course cache: https://github.com/openedx/edx-platform/blob/3a2b6dd8fcc909fd9128f81750f52650ba8ff906/openedx/core/djangoapps/catalog/utils.py#L767
2 changes: 1 addition & 1 deletion learning_assistant/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,6 @@
Plugin for a learning assistant backend, intended for use within edx-platform.
"""

__version__ = '4.0.0'
__version__ = '4.1.0'

default_app_config = 'learning_assistant.apps.LearningAssistantConfig' # pylint: disable=invalid-name
23 changes: 19 additions & 4 deletions learning_assistant/api.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,8 @@
from learning_assistant.platform_imports import (
block_get_children,
block_leaf_filter,
get_cache_course_data,
get_cache_course_run_data,
get_single_block,
get_text_transcript,
traverse_block_pre_order,
Expand Down Expand Up @@ -101,19 +103,23 @@ def get_block_content(request, user_id, course_id, unit_usage_key):
return cache_data['content_length'], cache_data['content_items']


def render_prompt_template(request, user_id, course_id, unit_usage_key):
def render_prompt_template(request, user_id, course_run_id, unit_usage_key, course_id):
"""
Return a rendered prompt template, specified by the LEARNING_ASSISTANT_PROMPT_TEMPLATE setting.
"""
unit_content = ''

course_run_key = CourseKey.from_string(course_id)
course_run_key = CourseKey.from_string(course_run_id)
if unit_usage_key and course_content_enabled(course_run_key):
_, unit_content = get_block_content(request, user_id, course_id, unit_usage_key)
_, unit_content = get_block_content(request, user_id, course_run_id, unit_usage_key)

course_data = get_cache_course_data(course_id, ['skill_names', 'title'])
skill_names = course_data['skill_names']
title = course_data['title']

template_string = getattr(settings, 'LEARNING_ASSISTANT_PROMPT_TEMPLATE', '')
template = Environment(loader=BaseLoader).from_string(template_string)
data = template.render(unit_content=unit_content)
data = template.render(unit_content=unit_content, skill_names=skill_names, title=title)
return data


Expand Down Expand Up @@ -168,3 +174,12 @@ def set_learning_assistant_enabled(course_key, enabled):
course_key=obj.course_id,
enabled=obj.enabled
)


def get_course_id(course_run_id):
"""
Given a course run id (str), return the associated course key.
"""
course_data = get_cache_course_run_data(course_run_id, ['course'])
course_key = course_data['course']
return course_key
2 changes: 1 addition & 1 deletion learning_assistant/constants.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@

EXTERNAL_COURSE_KEY_PATTERN = r'([A-Za-z0-9-_:]+)'

COURSE_ID_PATTERN = rf'(?P<course_id>({INTERNAL_COURSE_KEY_PATTERN}|{EXTERNAL_COURSE_KEY_PATTERN}))'
COURSE_ID_PATTERN = rf'(?P<course_run_id>({INTERNAL_COURSE_KEY_PATTERN}|{EXTERNAL_COURSE_KEY_PATTERN}))'

ACCEPTED_CATEGORY_TYPES = ['html', 'video']
CATEGORY_TYPE_MAP = {
Expand Down
16 changes: 14 additions & 2 deletions learning_assistant/platform_imports.py
Original file line number Diff line number Diff line change
Expand Up @@ -51,14 +51,26 @@ def get_cache_course_run_data(course_run_id, fields):
"""
Return course run related data given a course run id.
This function makes use of the discovery course run cache, which is necessary because
only the discovery service stores the relation between courseruns and courses.
This function makes use of the course run cache in the LMS, which caches data from the discovery service. This is
necessary because only the discovery service stores the relation between courseruns and courses.
"""
# pylint: disable=import-error, import-outside-toplevel
from openedx.core.djangoapps.catalog.utils import get_course_run_data
return get_course_run_data(course_run_id, fields)


def get_cache_course_data(course_id, fields):
"""
Return course related data given a course id.
This function makes use of the course cache in the LMS, which caches data from the discovery service. This is
necessary because only the discovery service stores course skills data.
"""
# pylint: disable=import-error, import-outside-toplevel
from openedx.core.djangoapps.catalog.utils import get_course_data
return get_course_data(course_id, fields)


def get_user_role(user, course_key):
"""
Return the role of the user on the edX platform.
Expand Down
19 changes: 4 additions & 15 deletions learning_assistant/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -10,8 +10,6 @@
from requests.exceptions import ConnectTimeout
from rest_framework import status as http_status

from learning_assistant.platform_imports import get_cache_course_run_data

log = logging.getLogger(__name__)


Expand Down Expand Up @@ -60,24 +58,15 @@ def get_reduced_message_list(prompt_template, message_list):
return new_message_list


def get_course_id(course_run_id):
"""
Given a course run id (str), return the associated course key.
"""
course_data = get_cache_course_run_data(course_run_id, ['course'])
course_key = course_data['course']
return course_key


def create_request_body(prompt_template, message_list, courserun_id):
def create_request_body(prompt_template, message_list, course_id):
"""
Form request body to be passed to the chat endpoint.
"""
response_body = {
'context': {
'content': prompt_template,
'render': {
'doc_id': get_course_id(courserun_id),
'doc_id': course_id,
'fields': ['skillNames', 'title']
}
},
Expand All @@ -87,7 +76,7 @@ def create_request_body(prompt_template, message_list, courserun_id):
return response_body


def get_chat_response(prompt_template, message_list, courserun_id):
def get_chat_response(prompt_template, message_list, course_id):
"""
Pass message list to chat endpoint, as defined by the CHAT_COMPLETION_API setting.
"""
Expand All @@ -98,7 +87,7 @@ def get_chat_response(prompt_template, message_list, courserun_id):
connect_timeout = getattr(settings, 'CHAT_COMPLETION_API_CONNECT_TIMEOUT', 1)
read_timeout = getattr(settings, 'CHAT_COMPLETION_API_READ_TIMEOUT', 15)

body = create_request_body(prompt_template, message_list, courserun_id)
body = create_request_body(prompt_template, message_list, course_id)

try:
response = requests.post(
Expand Down
20 changes: 11 additions & 9 deletions learning_assistant/views.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@
except ImportError:
pass

from learning_assistant.api import learning_assistant_enabled, render_prompt_template
from learning_assistant.api import get_course_id, learning_assistant_enabled, render_prompt_template
from learning_assistant.serializers import MessageSerializer
from learning_assistant.utils import get_chat_response, user_role_is_staff

Expand All @@ -34,9 +34,9 @@ class CourseChatView(APIView):
authentication_classes = (SessionAuthentication, JwtAuthentication,)
permission_classes = (IsAuthenticated,)

def post(self, request, course_id):
def post(self, request, course_run_id):
"""
Given a course ID, retrieve a chat response for that course.
Given a course run ID, retrieve a chat response for that course.
Expected POST data: {
[
Expand All @@ -46,7 +46,7 @@ def post(self, request, course_id):
}
"""
try:
courserun_key = CourseKey.from_string(course_id)
courserun_key = CourseKey.from_string(course_run_id)
except InvalidKeyError:
return Response(
status=http_status.HTTP_400_BAD_REQUEST,
Expand Down Expand Up @@ -89,11 +89,13 @@ def post(self, request, course_id):
'Attempting to retrieve chat response for user_id=%(user_id)s in course_id=%(course_id)s',
{
'user_id': request.user.id,
'course_id': course_id
'course_id': course_run_id
}
)

prompt_template = render_prompt_template(request, request.user.id, course_id, unit_id)
course_id = get_course_id(course_run_id)

prompt_template = render_prompt_template(request, request.user.id, course_run_id, unit_id, course_id)

status_code, message = get_chat_response(prompt_template, message_list, course_id)

Expand Down Expand Up @@ -122,16 +124,16 @@ class LearningAssistantEnabledView(APIView):
authentication_classes = (SessionAuthentication, JwtAuthentication,)
permission_classes = (IsAuthenticated,)

def get(self, request, course_id):
def get(self, request, course_run_id):
"""
Given a course ID, retrieve whether the Learning Assistant is enabled for the corresponding course.
Given a course run ID, retrieve whether the Learning Assistant is enabled for the corresponding course.
The response will be in the following format.
{'enabled': <bool>}
"""
try:
courserun_key = CourseKey.from_string(course_id)
courserun_key = CourseKey.from_string(course_run_id)
except InvalidKeyError:
return Response(
status=http_status.HTTP_400_BAD_REQUEST,
Expand Down
2 changes: 2 additions & 0 deletions test_settings.py
Original file line number Diff line number Diff line change
Expand Up @@ -83,6 +83,8 @@ def root(*args):
"{{ unit_content }}"
"\""
"{% endif %}"
"{{ skill_names }}"
"{{ title }}"
)

LEARNING_ASSISTANT_AVAILABLE = True
19 changes: 14 additions & 5 deletions tests/test_api.py
Original file line number Diff line number Diff line change
Expand Up @@ -82,7 +82,7 @@ def setUp(self):
]
self.block = FakeBlock(self.children)

self.course_id = 'course-v1:edx+test+23'
self.course_run_id = 'course-v1:edx+test+23'

@ddt.data(
('video', True),
Expand Down Expand Up @@ -156,7 +156,7 @@ def test_get_block_content(self, mock_get_children_contents, mock_get_single_blo
# args does not matter for this test right now, as the `get_single_block` function is entirely mocked.
request = MagicMock()
user_id = 1
course_id = self.course_id
course_id = self.course_run_id
unit_usage_key = 'block-v1:edX+A+B+type@vertical+block@verticalD'

length, items = get_block_content(request, user_id, course_id, unit_usage_key)
Expand All @@ -183,25 +183,34 @@ def test_get_block_content(self, mock_get_children_contents, mock_get_single_blo
('', False),
)
@ddt.unpack
@patch('learning_assistant.api.get_cache_course_data')
@patch('learning_assistant.toggles._is_learning_assistant_waffle_flag_enabled')
@patch('learning_assistant.api.get_block_content')
def test_render_prompt_template(self, unit_content, flag_enabled, mock_get_content, mock_is_flag_enabled):
def test_render_prompt_template(
self, unit_content, flag_enabled, mock_get_content, mock_is_flag_enabled, mock_cache
):
mock_get_content.return_value = (len(unit_content), unit_content)
mock_is_flag_enabled.return_value = flag_enabled
skills_content = ['skills']
title = 'title'
mock_cache.return_value = {'skill_names': skills_content, 'title': title}

# mock arguments that are passed through to `get_block_content` function. the value of these
# args does not matter for this test right now, as the `get_block_content` function is entirely mocked.
request = MagicMock()
user_id = 1
course_id = self.course_id
course_run_id = self.course_run_id
unit_usage_key = 'block-v1:edX+A+B+type@vertical+block@verticalD'
course_id = 'edx+test'

prompt_text = render_prompt_template(request, user_id, course_id, unit_usage_key)
prompt_text = render_prompt_template(request, user_id, course_run_id, unit_usage_key, course_id)

if unit_content and flag_enabled:
self.assertIn(unit_content, prompt_text)
else:
self.assertNotIn('The following text is useful.', prompt_text)
self.assertIn(str(skills_content), prompt_text)
self.assertIn(title, prompt_text)


@ddt.ddt
Expand Down
11 changes: 2 additions & 9 deletions tests/test_utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -24,14 +24,7 @@ def setUp(self):
self.prompt_template = 'This is a prompt.'

self.message_list = [{'role': 'assistant', 'content': 'Hello'}, {'role': 'user', 'content': 'Goodbye'}]
self.course_id = 'course-v1:edx+test+23'
self.course_key = 'edx-test'

self.patcher = patch(
'learning_assistant.utils.get_cache_course_run_data',
return_value={'course': self.course_key}
)
self.patcher.start()
self.course_id = 'edx+test'

def get_response(self):
return get_chat_response(self.prompt_template, self.message_list, self.course_id)
Expand Down Expand Up @@ -99,7 +92,7 @@ def test_post_request_structure(self, mock_requests):
'context': {
'content': self.prompt_template,
'render': {
'doc_id': self.course_key,
'doc_id': self.course_id,
'fields': ['skillNames', 'title']
}
},
Expand Down
Loading

0 comments on commit dc9179c

Please sign in to comment.