Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide source code info for Symbol #180

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

PaulZ-98
Copy link

@PaulZ-98 PaulZ-98 commented Jun 9, 2022

For function symbols, provide the source code file, optional line number, and optional column. This is useful for noting the source code file/line/col when disassembling a function. In the code, look up the module by the function Symbol address, and use the module to help get the line number.

root@ub20-sdb:~/drgn_dev/drgn# drgn
drgn 0.0.19+2.g57320ac.dirty (using Python 3.8.10, elfutils 0.176, with libkdumpfile)
For help, type help(drgn).
>>> import drgn
>>> from drgn import NULL, Object, cast, container_of, execscript, offsetof, reinterpret, sizeof
>>> from drgn.helpers.linux import *
>>> 
>>> prog.symbol('mutex_lock').source(offset=0, line_number=True, line_column=False)
'/build/linux-hwe-5.8-Ox7OXf/linux-hwe-5.8-5.8.0/kernel/locking/mutex.c: 280'
>>> prog.symbol('mutex_lock').source(offset=42, line_number=True, line_column=False)
'/build/linux-hwe-5.8-Ox7OXf/linux-hwe-5.8-5.8.0/kernel/locking/mutex.c: 284'
>>> 

First drgn PR, so was unsure how to develop a test for this... let me know. Also how to create drgn_Symbol_source_DOC instead of putting in a string directly? Thanks!

@PaulZ-98 PaulZ-98 force-pushed the drgn_symbol_source branch 3 times, most recently from 02ab1f2 to 0d2772a Compare June 9, 2022 19:56
@brenns10
Copy link
Contributor

Hi @PaulZ-98, I just wanted to provide some feedback on this. I'm not a maintainer so Omar's opinion will matter more here, but he may not be able to get back to you for a week or two.

First, this is a great idea, thanks for contributing it!

Second, regarding your question:

Also how to create drgn_Symbol_source_DOC instead of putting in a string directly?

These are generated from the _drgn.pyi stubs file directly. (See the script docs/exts/drgndoc/docstrings.py). You've already added the docstring to the stub file, so you should be able to just use drgn_Symbol_source_DOC directly, and it will be generated. If not, try doing a clean build.

Third, I have a few notes regarding the approach.

  • The stack frame object provides this information already. I know that it can't satisfy all the use cases that this interface could, since stack frames are limited to whatever is currently on some thread's stack. But I think the API is a bit better - it simply returns a tuple of (filename, line, column), and leaves the string formatting up to the user. It would be more consistent if your API ultimately returned the same tuple, and it would also ensure that users wouldn't need to parse the string returned by Symbol.source() if they did care about the column or line and wanted them as integers.
  • I'm not sure that this interface belongs with the Symbol object. Symbols may be a lot of things - code and data, which should have source information, but also untyped linker symbols or section names, or a variety of other stuff that doesn't really relate to a line in a source file. Symbols also have a lot of unpleasantness: namely, they aren't unique! The same symbol name can appear multiple times, and the same address can be contained by multiple symbols (thus the Prog.symbols() API). This does really happen due to static inline functions, or due to name collisions for static functions in different compilation units. Imagine you have an address and you'd like to get some source info about it. First, you'd need to create a symbol and deal with all of the above, including the possibility that there's no symbol containing that address -- something which seems distinctly possible to me. It seems that it would be better if we just had some sort of method prog.source_info(address), which would take a virtual address and return the tuple mentioned above. It would still be easy to use this API if you already have a Symbol, but it would prevent you from needing to interact with Symbols if you don't need to.

I'll also put a few code review notes in the PR itself, which should help regardless of whether Omar agrees with me on the approach.

Again, thanks for this and I hope to see it merged :)


if (!PyArg_ParseTupleAndKeywords(args,
kwargs,
"iii",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the Python docs you can use "k" to mean "unsigned long". That way you can support the full width of unsigned long and don't need the cast.

Further, you can use p to mean bool (see here), so that your bool usage above makes sense and matches the type stubs you created.

Suggested change
"iii",
"kpp",

return NULL;

(void) drgn_symbol_source(((Symbol *)self)->sym, p,
(unsigned long)offset,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you've changed to unsigned long, then avoid the cast.

Suggested change
(unsigned long)offset,
offset,

struct drgn_program *p = &prgm->prog;
int ln_num = 0;
int ln_col = 0;
int offset, line_number, line_column;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Offset should be unsigned long, and it would be cleaner to use bool for line_number and line_column - since you have ln_num and ln_col above, and it's not clear what their purpose will be.

Suggested change
int offset, line_number, line_column;
bool line_number, line_column;
unsigned long offset;

libdrgn/symbol.c Outdated
@@ -40,6 +45,39 @@ void drgn_symbol_from_elf(const char *name, uint64_t address,
ret->kind = DRGN_SYMBOL_KIND_UNKNOWN;
}

LIBDRGN_PUBLIC void drgn_symbol_source(struct drgn_symbol *sym,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API has no way of showing whether it succeeded or failed. Return struct drgn_error * and return NULL on success.

Suggested change
LIBDRGN_PUBLIC void drgn_symbol_source(struct drgn_symbol *sym,
LIBDRGN_PUBLIC struct drgn_error *
drgn_symbol_source(struct drgn_symbol *sym,

libdrgn/symbol.c Outdated
*line_col_ret = linecol;
}

return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return;
return NULL;

libdrgn/symbol.c Outdated
int *line_col_ret)
{
if (sym->kind != DRGN_SYMBOL_KIND_FUNC)
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs an error return, otherwise caller won't know if src_file_ret is a valid string.

libdrgn/symbol.c Outdated
Comment on lines 62 to 63
if (dim == NULL)
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, needs error return.

libdrgn/symbol.c Outdated
Comment on lines 66 to 67
if (line == NULL)
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same.

Comment on lines 41 to 42
char source[256];
char formatted[296];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do the magic numbers come from? Comments would be nice here, though if you end up ditching the string formatting, then it will be irrelevant.

@PaulZ-98
Copy link
Author

Thanks for reviewing @brenns10 I will go through your comments and start making changes.

For function symbols, provide the source code file,
line number, and column.

Signed-off-by: Paul Zuchowski <[email protected]>
@PaulZ-98
Copy link
Author

I think I've covered most of your comments @brenns10. I can always move the code out of Symbol if that's what we decide to do.

@pfactum
Copy link
Contributor

pfactum commented Sep 22, 2022

I'd love to have something like prog.source_info(address) implemented.

My use-case is crush, where I pair drgn with capstone to have a nice disassembler (see disassemble.py), and the only thing I miss now is translating arbitrary virtual address into a source code info.

Copy link
Owner

@osandov osandov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about the delay, I was on vacation for a month while you submitted this and have been perpetually behind on code review since then :) I agree with the other comments here that have the interface for this should be prog.source_info(address). I additionally left some comments about the implementation. Thanks for the contribution!

Comment on lines +306 to +323
struct drgn_debug_info_module *
drgn_debug_info_module_byaddress(struct drgn_debug_info *dbinfo, uint64_t addr)
{

for (struct drgn_debug_info_module_table_iterator it =
drgn_debug_info_module_table_first(&dbinfo->modules); it.entry; ) {
struct drgn_debug_info_module *module = *it.entry;
do {
struct drgn_debug_info_module *next = module->next;
if (!next)
it = drgn_debug_info_module_table_next(it);
if (addr >= module->start && addr <= module->end)
return module;
module = next;
} while (module);
}
return NULL;
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than adding this, you can look up the Dwfl_Module * for an address with dwfl_addrmodule(). If you need the struct drgn_debug_info_module *, too, you can get it with dwfl_module_info(). Here's an example:

if (prog->dbinfo) {
Dwfl_Module *dwfl_module = dwfl_addrmodule(prog->dbinfo->dwfl,
pc - !regs->interrupted);
if (dwfl_module) {
void **userdatap;
dwfl_module_info(dwfl_module, &userdatap, NULL, NULL,
NULL, NULL, NULL, NULL);
struct drgn_debug_info_module *module = *userdatap;

"could not locate module from function address");


Dwfl_Line *line = dwfl_module_getsrc(dim->dwfl_module, sym_address);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dwfl_module_getsrc() doesn't work on binaries compiled by Clang, because Clang doesn't generate .debug_aranges by default and dwfl_module_getsrc() only checks .debug_aranges. I would really like this to work with Clang as well. I had to work around this same problem in add17a9. The difference here is that we don't have the CU containing the address yet. We need a function like:

struct drgn_error *
drgn_debug_info_module_find_dwarf_cu(struct drgn_debug_info_module *module,
				     uint64_t pc, uint64_t *bias_ret,
				     Dwarf_Die *cu_die_ret)

Which would be very similar to the beginning of drgn_debug_info_module_find_dwarf_scopes(), just without needing to iterate into children of the CU DIE. (Let me know if you need help factoring that out.) Then, you can do something similar to drgn_stack_frame_source():

drgn/libdrgn/stack_trace.c

Lines 312 to 331 in 330c71b

Dwarf_Addr bias;
if (!dwfl_module_getdwarf(dwfl_module, &bias))
return NULL;
pc.value = pc.value - bias - !regs->interrupted;
Dwarf_Die *scopes = trace->frames[frame].scopes;
size_t num_scopes = trace->frames[frame].num_scopes;
Dwarf_Die cu_die;
if (!dwarf_cu_die(scopes[num_scopes - 1].cu, &cu_die, NULL,
NULL, NULL, NULL, NULL, NULL))
return NULL;
Dwarf_Line *line = dwarf_getsrc_die(&cu_die, pc.value);
if (!line)
return NULL;
if (line_ret)
dwarf_lineno(line, line_ret);
if (column_ret)
dwarf_linecol(line, column_ret);
return dwarf_linesrc(line, NULL, NULL);

int lineno, linecol;
if ((src = dwfl_lineinfo(line, &sym_address, &lineno, &linecol,
NULL, NULL)) != NULL) {
strncpy(src_file_ret, src, size);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather that we just strdup() this and leave it up to the caller to free() it instead of requiring a size.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants