This repository has been archived by the owner on Mar 1, 2024. It is now read-only.

Fix chain of table prompts
MarouaneMaatouk committed Jan 23, 2024
1 parent dcdb1fa commit c1c37e1
Showing 1 changed file with 29 additions and 63 deletions.
92 changes: 29 additions & 63 deletions llama_hub/llama_packs/tables/chain_of_table/base.py
@@ -73,10 +73,10 @@ def generate_prompt_component(self, **kwargs: Any) -> QueryComponent:
If the table needs an extra inferred column to answer the question, we use f_add_column() to
add this column. For example,
/*
col : Week | When | Kickoff | Opponent | Results; Final score | Results; Team record
row 1 : 1 | Saturday, April 13 | 7:00 p.m. | at Rhein Fire | W 27-21 | 1-0
row 2 : 2 | Saturday, April 20 | 7:00 p.m. | London Monarchs | W 37-3 | 2-0
row 3 : 3 | Sunday, April 28 | 6:00 p.m. | at Barcelona Dragons | W 33-29 | 3-0
col : Week | When | Kickoff | Opponent | Results; Final score | Results; Team record | Attendance
row 1 : 1 | Saturday, April 13 | 7:00 p.m. | at Rhein Fire | W 27-21 | 1-0 | 32092 supporters
row 2 : 2 | Saturday, April 20 | 7:00 p.m. | London Monarchs | W 37-3 | 2-0 | 34186 supporters
row 3 : 3 | Sunday, April 28 | 6:00 p.m. | at Barcelona Dragons | W 33-29 | 3-0 | 17503 supporters
*/
Question : what is the date of the competition with highest attendance?
The existing columns are: "Week", "When", "Kickoff", "Opponent", "Results; Final score",
@@ -97,7 +97,7 @@ def generate_prompt_component(self, **kwargs: Any) -> QueryComponent:
Question : Whose home team score is higher, richmond or st kilda?
Function : f_select_row(row 1, row 3)
Explanation: The question asks about the home team score of richmond and st kilda. We need
to know the the information of richmond and st kilda in row 1 and row 3. We select row 1
to know the information of richmond and st kilda in row 1 and row 3. We select row 1
and row 3.
If the table only needs a few columns to answer the question, we use
@@ -124,7 +124,7 @@ def generate_prompt_component(self, **kwargs: Any) -> QueryComponent:
*/
Question: tell me the number of athletes from japan.
Function : f_group_by(Country)
Explanation: The question asks about the number of athletes from India. Each row is about
Explanation: The question asks about the number of athletes from japan. Each row is about
an athlete. We can group column "Country" to group the athletes from the same country.
If the question asks about the order of items in a column, we use f_sort_by() to sort
@@ -162,7 +162,7 @@ def generate_prompt_component(self, **kwargs: Any) -> QueryComponent:
col : Date | Division | League | Regular Season | Playoffs | Open Cup
row 1 : 2001/01/02 | 2 | USL A-League | 4th, Western | Quarterfinals | Did not qualify
row 2 : 2002/08/06 | 2 | USL A-League | 2nd, Pacific | 1st Round | Did not qualify
row 5 : 2005/03/24 | 2 | USL First Division | 5th | Quarterfinals | 4th Round
row 5 : 2005/03/24 | 2 | USL A-League | 5th | Quarterfinals | 4th Round
*/
Question: what was the last year where this team was a part of the usl a-league?
Candidates: {candidates}
@@ -198,15 +198,15 @@ def generate_prompt_component(self, **kwargs: Any) -> QueryComponent:
2. Datetype: the strings that describe a date, such as year, month, day
3. String: other strings
/*
col : Week | When | Kickoff | Opponent | Results; Final score | Results; Team record
row 1 : 1 | Saturday, April 13 | 7:00 p.m. | at Rhein Fire | W 27-21 | 1-0
row 2 : 2 | Saturday, April 20 | 7:00 p.m. | London Monarchs | W 37-3 | 2-0
row 3 : 3 | Sunday, April 28 | 6:00 p.m. | at Barcelona Dragons | W 33-29 | 3-0
col : Week | When | Kickoff | Opponent | Results; Final score | Results; Team record | Attendance
row 1 : 1 | Saturday, April 13 | 7:00 p.m. | at Rhein Fire | W 27-21 | 1-0 | 32092 supporters
row 2 : 2 | Saturday, April 20 | 7:00 p.m. | London Monarchs | W 37-3 | 2-0 | 34186 supporters
row 3 : 3 | Sunday, April 28 | 6:00 p.m. | at Barcelona Dragons | W 33-29 | 3-0 | 17503 supporters
*/
Question: what is the date of the competition with highest attendance?
The existing columns are: "Week", "When", "Kickoff", "Opponent", "Results; Final score",
"Results; Team record", "Game site", "Attendance".
Explanation: the question asks about the date of the competition with highest score. Each
Explanation: the question asks about the date of the competition with attendance number. Each
row is about one competition. We extract the value from column "Attendance" and create a
different column "Attendance number" for each row. The datatype is Numerical.
Therefore, the answer is: f_add_column(Attendance number). The value: 32092 | 34186 | 17503
@@ -333,37 +333,11 @@ def fn(self, table: pd.DataFrame, args: Any) -> pd.DataFrame:
select_column_schema = SelectColumnSchema()


# select_args_str = """\
# Using f_select_row() to select relevant rows in the given table that support or oppose the
# statement.
# Please use f_select_row([*]) to select all rows in the table.
# /*
# table caption : 1972 vfl season.
# col : home team | home team score | away team | away team score | venue | crowd
# row 1 : st kilda | 13.12 (90) | melbourne | 13.11 (89) | moorabbin oval | 18836
# row 2 : south melbourne | 9.12 (66) | footscray | 11.13 (79) | lake oval | 9154
# row 3 : richmond | 20.17 (137) | fitzroy | 13.22 (100) | mcg | 27651
# row 4 : geelong | 17.10 (112) | collingwood | 17.9 (111) | kardinia park | 23108
# row 5 : north melbourne | 8.12 (60) | carlton | 23.11 (149) | arden street oval | 11271
# row 6 : hawthorn | 15.16 (106) | essendon | 12.15 (87) | vfl park | 36749
# */
# statement : what is the away team with the highest score?
# explain : the statement want to ask the away team of highest away team score. the highest
# away team score is 23.11 (149). it is on the row 5.so we need row 5.
# The answer is : f_select_row([row 5])
# """
# select_args_prompt = PromptTemplate(select_args_str)
# select_args_schema = FunctionSchema(
# prompt=select_args_str,
# regex="f_select_row\([(.*)]\)",
# )

select_row_str = """\
Using f_select_row() to select relevant rows in the given table that support or oppose the
statement.
Please use f_select_row([*]) to select all rows in the table.
/*
table caption : 1972 vfl season.
col : home team | home team score | away team | away team score | venue | crowd
row 1 : st kilda | 13.12 (90) | melbourne | 13.11 (89) | moorabbin oval | 18836
row 2 : south melbourne | 9.12 (66) | footscray | 11.13 (79) | lake oval | 9154
@@ -374,7 +348,7 @@ def fn(self, table: pd.DataFrame, args: Any) -> pd.DataFrame:
*/
statement : what is the away team with the highest score?
explain : the statement want to ask the away team of highest away team score. the highest
away team score is 23.11 (149). it is on the row 5.so we need row 5.
away team score is 23.11 (149). it is on the row 5. so we need row 5.
The answer is : f_select_row([row 5])
{serialized_table}
@@ -422,9 +396,9 @@ def fn(self, table: pd.DataFrame, args: Any) -> pd.DataFrame:
row 5 : 5 | 8 | Marina Maslyonko (KAZ) | 53.99 | KAZ
*/
Question: tell me the number of athletes from japan.
Explanation: The question asks about the number of athletes from japan.
The existing columns are: Rank, Lane, Athlete, Time, Country.
Explanation: The question asks about the number of athletes from India. Each row is about
an athlete. We can group column "Country" to group the athletes from the same country.
And each row is about an athlete. We can group column "Country" to group the athletes from the same country.
Therefore, the answer is: f_group_by(Country).
{serialized_table}
@@ -460,31 +434,30 @@ def fn(self, table: pd.DataFrame, args: Any) -> pd.DataFrame:

sort_by_str = """\
To answer the question, we can first use f_sort_by() to sort the values in a column to get
the
order of the items. The order can be "large to small" or "small to large".
the order of the items. The order can be "large to small" or "small to large".
The column to sort should have these data types:
1. Numerical: the numerical strings that can be used in sort
2. DateType: the strings that describe a date, such as year, month, day
3. String: other strings
/*
col : Position | Club | Played | Points | Wins | Draws | Losses | Goals for | Goals against
row 1 : 1 | Malaga CF | 42 | 79 | 22 | 13 | 7 | 72 | 47
row 10 : 10 | CP Merida | 42 | 59 | 15 | 14 | 13 | 48 | 41
row 3 : 3 | CD Numancia | 42 | 73 | 21 | 10 | 11 | 68 | 40
*/
More rules:
- The answer MUST be in the format "the answer is: f_sort_by(Arg1)", where Arg1 is the
column name.
- The answer CANNOT include multiple columns
- You CANNOT run f_sort_by on a row. For instance, f_sort_by(row 1) is not allowed.
/*
col : Position | Club | Played | Points | Wins | Draws | Losses | Goals for | Goals against
row 1 : 1 | Malaga CF | 42 | 79 | 22 | 13 | 7 | 72 | 47
row 10 : 10 | CP Merida | 42 | 59 | 15 | 14 | 13 | 48 | 41
row 3 : 3 | CD Numancia | 42 | 73 | 21 | 10 | 11 | 68 | 40
*/
Question: what club placed in the last position?
Explanation: the question asks about the club in the last position.
The existing columns are: Position, Club, Played, Points, Wins, Draws, Losses, Goals for,
Goals against
Explanation: the question asks about the club in the last position. Each row is about a
club. We need to know the order of position from last to front. There is a column for
position and the column name is Position. The datatype is Numerical.
Goals against.
And Each row is about a club. We need to know the order of position from last to front.
There is a column for position and the column name is Position. The datatype is Numerical.
Therefore, the answer is: f_sort_by(Position), the order is "large to small".
{serialized_table}
@@ -523,16 +496,9 @@ def fn(self, table: pd.DataFrame, args: Any) -> pd.DataFrame:
Here is the table to answer this question. Please understand the table and answer the
question:
/*
col : Rank | City | Passengers Number | Ranking | Airline
row 1 : 1 | United States, Los Angeles | 14749 | 2 | Alaska Airlines
row 2 : 2 | United States, Houston | 5465 | 8 | United Express
row 3 : 3 | Canada, Calgary | 3761 | 5 | Air Transat, WestJet
row 4 : 4 | Canada, Saskatoon | 2282 | 4 |
row 5 : 5 | Canada, Vancouver | 2103 | 2 | Air Transat
row 6 : 6 | United States, Phoenix | 1829 | 1 | US Airways
row 7 : 7 | Canada, Toronto | 1202 | 1 | Air Transat, CanJet
row 8 : 8 | Canada, Edmonton | 110 | 2 |
row 9 : 9 | United States, Oakland | 107 | 5 |
col : City | Passengers Number
row 1 : United States, Los Angeles | 14749
row 2 : Canada, Saskatoon | 2282
*/
Question: how many more passengers flew to los angeles than to saskatoon from manzanillo
airport in 2013?

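For readers who want to see what the few-shot prompts in this file are asking the model to do, here is a minimal sketch of the f_add_column and f_sort_by steps in pandas, using the attendance table from the prompt. It is not the pack's implementation: the helper names `parse_call`, `add_attendance_number`, and `sort_by` are hypothetical, and the regex is only one plausible way to read answers of the form "the answer is: f_sort_by(Arg1)".

```python
import re

import pandas as pd

# Table taken from the f_add_column example in the prompt (columns trimmed).
games = pd.DataFrame(
    {
        "Week": [1, 2, 3],
        "When": ["Saturday, April 13", "Saturday, April 20", "Sunday, April 28"],
        "Attendance": ["32092 supporters", "34186 supporters", "17503 supporters"],
    }
)


def parse_call(answer: str) -> tuple[str, str]:
    """Hypothetical helper: pull the function name and raw argument out of an answer."""
    match = re.search(r"(f_\w+)\((.*?)\)", answer)
    if match is None:
        raise ValueError(f"no function call found in: {answer!r}")
    return match.group(1), match.group(2)


def add_attendance_number(table: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: derive a numeric column from the "Attendance" strings."""
    out = table.copy()
    digits = out["Attendance"].str.extract(r"(\d+)", expand=False)
    out["Attendance number"] = digits.astype(int)
    return out


def sort_by(table: pd.DataFrame, column: str, order: str = "large to small") -> pd.DataFrame:
    """Hypothetical helper: sort by one column using the order wording from the prompt."""
    ascending = order == "small to large"
    return table.sort_values(column, ascending=ascending).reset_index(drop=True)


name, arg = parse_call("Therefore, the answer is: f_add_column(Attendance number).")
with_number = add_attendance_number(games)
ranked = sort_by(with_number, arg)
print(name, "->", ranked.loc[0, "When"])
```

Sorting the derived column from large to small puts the best-attended game in the first row, which is the chain the prompt describes for the question "what is the date of the competition with highest attendance?".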