-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create KGTK augment command #638
base: dev
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Every function should have augmented parameters. Lot of changes required. Some repeated mistakes from previous PRs. Please update
@@ -0,0 +1,177 @@ | |||
## Summary | |||
|
|||
This command will augmented graph from a KGTK Edge file with numeric value in float (or date) on node2. This command will automatically detect date in wikidata format and transform it to float in year |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix the grammar. Also I can't understand what this command will do from this description. Please update
|
||
### The Output File | ||
|
||
The output file is an edge file for each mode that contains the following columns: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what extra edges will be added? Example?
-o OUTPUT_FILE, --output-file OUTPUT_FILE | ||
The KGTK output file. (May be omitted or '-' for | ||
stdout.) | ||
--dataset DATASET Specify the location of dataset. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what exactly is the location of dataset?
The KGTK output file. (May be omitted or '-' for | ||
stdout.) | ||
--dataset DATASET Specify the location of dataset. | ||
--train-file-name TRAIN_FILE_NAME |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these new parameters need to have longer descriptions
Specify name for training file | ||
--numerical-literal-name NUM_LITERAL_NAME | ||
Specify name for numerical literal file | ||
--valid-file-name VALID_FILE_NAME |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
?? I will only add this comment here, add longer description help messages
collections_raw = defaultdict(list) | ||
|
||
if train_edges_raw is not None: | ||
for i, row in train_edges_raw.iterrows(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove df.iterrows(), it is the most inefficient function
import pandas as pd | ||
from tqdm import tqdm | ||
from bisect import bisect | ||
from kgtk.augment.utils import * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import * again
|
||
def gen_plabel(pnode, unit=None): | ||
if not unit: | ||
return pnode + ' (Interval)' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use f strings everywhere, be consistent
parser.add_output_file() | ||
|
||
parser.add_argument('--dataset', dest='dataset', type=str, | ||
default=None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add longer meaningful help messages
from kgtk.cli_entry import cli_entry | ||
from kgtk.exceptions import KGTKArgumentParseException | ||
import glob | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for all of unit tests, you have to test on content also, instead of only length
No description provided.