Linux GETTID syscall problem on OpenWrt #98

Open
aaaaalbert opened this issue Jun 2, 2015 · 18 comments

@aaaaalbert
Collaborator

I saw this exception while running RepyV2 code in a vessel on Guilherme Martin's WNDR3800 OpenWrt router. My program calls getresources(). linux_api.py then tries to read the current task's CPU consumption, but the _get_current_thread_id() call returns -1. As a result, get_current_thread_cpu_time() tries to read from a nonexistent directory, and the Repy program crashes.

========================================
Running program: easy-loop.r2py
Arguments: []
========================================

---
Uncaught exception!

---
Following is a full traceback, and a user traceback.
The user traceback excludes non-user modules. The most recent call is displayed last.

Full debugging traceback:
  "repyV2/repy.py", line 177, in execute_namespace_until_completion
  "/mnt/seattle/seattle_repy/repyV2/virtual_namespace.py", line 117, in evaluate
  "/mnt/seattle/seattle_repy/repyV2/safe.py", line 588, in safe_run
  "easy-loop.r2py", line 3, in <module>
  "/mnt/seattle/seattle_repy/repyV2/nonportable.py", line 319, in get_resources
  "/mnt/seattle/seattle_repy/repyV2/linux_api.py", line 227, in get_current_thread_cpu_time
  "/mnt/seattle/seattle_repy/repyV2/linux_api.py", line 100, in _process_stat_file

User traceback:
  "easy-loop.r2py", line 3, in <module>
  "/mnt/seattle/seattle_repy/repyV2/linux_api.py", line 227, in get_current_thread_cpu_time
  "/mnt/seattle/seattle_repy/repyV2/linux_api.py", line 100, in _process_stat_file

Exception (with type 'exceptions.IOError'): [Errno 2] No such file or directory: '/proc/5652/task/-1/stat'
@aaaaalbert
Collaborator Author

@XuefengHuang, could you please test this on your OpenWrt install?

@XuefengHuang

Sure! I get the same exception on a TL-WDR3600 OpenWrt router.

Running program: test.r2py
Arguments: []
========================================
---
Uncaught exception!
---
Following is a full traceback, and a user traceback.
The user traceback excludes non-user modules. The most recent call is displayed last.

Full debugging traceback:
  "repyV2/repy.py", line 177, in execute_namespace_until_completion
  "/seattle/seattle_repy/repyV2/virtual_namespace.py", line 117, in evaluate
  "/seattle/seattle_repy/repyV2/safe.py", line 588, in safe_run
  "test.r2py", line 3, in <module>
  "/seattle/seattle_repy/repyV2/nonportable.py", line 319, in get_resources
  "/seattle/seattle_repy/repyV2/linux_api.py", line 227, in get_current_thread_cpu_time
  "/seattle/seattle_repy/repyV2/linux_api.py", line 100, in _process_stat_file

User traceback:
  "test.r2py", line 3, in <module>
  "/seattle/seattle_repy/repyV2/linux_api.py", line 227, in get_current_thread_cpu_time
  "/seattle/seattle_repy/repyV2/linux_api.py", line 100, in _process_stat_file

Exception (with type 'exceptions.IOError'): [Errno 2] No such file or directory: '/proc/1377/task/-1/stat'
---

@XuefengHuang

It seems that OpenWrt doesn't have libc.so.6, so the _get_current_thread_id() call returns -1.
OpenWrt:

>>> import ctypes.util
>>> libc = ctypes.CDLL(ctypes.util.find_library("c"))
>>> print ctypes.util.find_library("c")
None
>>> libc.syscall(186)
-1
>>> libc.syscall(224)
-1

Ubuntu:

>>> import ctypes.util
>>> libc = ctypes.CDLL(ctypes.util.find_library("c"))
>>> print ctypes.util.find_library("c")
libc.so.6
>>> libc.syscall(186)
2424

@aaaaalbert
Collaborator Author

OK, that's interesting. Can you check the contents of the procfs directory for a running sandbox and see if it contains anything in its task subdir?

@XuefengHuang

On OpenWrt, the PID is the same as the thread ID. If I change this line to file = "/proc/"+str(pid)+"/task/"+str(pid)+"/stat", it works! But I think it will not work if a program has multiple threads.
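
For illustration, here is a minimal sketch of reading a thread's CPU time from procfs while refusing to build a path from an invalid thread ID. It is not the code in linux_api.py; the names parse_cpu_ticks and thread_cpu_seconds are hypothetical. It relies on the procfs stat layout, where utime and stime are fields 14 and 15.

```python
import os


def parse_cpu_ticks(stat_text):
    """Return (utime, stime) in clock ticks from the text of a procfs stat file."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces,
    # so only split the part that follows the last closing parenthesis.
    fields_after_comm = stat_text.rsplit(")", 1)[1].split()
    # fields_after_comm[0] is field 3 (state), so fields 14 and 15
    # (utime and stime) sit at indexes 11 and 12.
    return int(fields_after_comm[11]), int(fields_after_comm[12])


def thread_cpu_seconds(pid, tid):
    """Return the CPU time of thread `tid` in process `pid`, in seconds."""
    if tid < 0:
        # A failed GETTID yields -1; fail early instead of opening
        # a path such as /proc/<pid>/task/-1/stat.
        raise ValueError("invalid thread ID: %r" % (tid,))
    path = "/proc/%d/task/%d/stat" % (pid, tid)
    with open(path) as stat_file:
        utime, stime = parse_cpu_ticks(stat_file.read())
    # Convert clock ticks to seconds.
    return (utime + stime) / float(os.sysconf("SC_CLK_TCK"))
```

Passing the PID as the TID, as in the workaround above, reads the stat file of the main thread only, which is why it would misreport for any other thread.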

@aaaaalbert
Collaborator Author

Hmmm.

def just_wait():
  while True:
    sleep(100)

for i in range(5):
  createthread(just_wait)

While this runs, ls the procfs task dir again. I would think that even a non-threaded sandboxed program should have a process with at least two threads (the sandbox and the nanny), but try it and we will know more.
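
If ls is inconvenient, the same check can be done from Python. This is only a sketch; list_thread_ids and its proc_root parameter are hypothetical names, not part of Repy.

```python
import os


def list_thread_ids(pid, proc_root="/proc"):
    """Return the sorted thread IDs found under <proc_root>/<pid>/task."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    # Every entry in the task directory is named after one thread ID.
    return sorted(int(name) for name in os.listdir(task_dir) if name.isdigit())
```

For a running sandbox, list_thread_ids(sandbox_pid) with the sandbox's PID should return more than one entry if the process really has several threads.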

@XuefengHuang

Yes, it has two threads. This link helps explain why ctypes.util.find_library("c") doesn't work on OpenWrt.

@aaaaalbert
Collaborator Author

That's one relatively deep issue it seems. Their observation is that Python's ctypes.util lacks clauses for certain platforms such as MIPS. This seems to be the piece of code they are referring to, https://github.com/python/cpython/blob/2.7/Lib/ctypes/util.py#L220-L227 (EDITED to point to the Python 2.7 sources), although I couldn't verify that this is the actual branch taken on OpenWrt.

However, let's consider a comprehensive fix for this later (if at all --- @python's bug tracker has just too many ctypes-related issues open, whereas @openwrt's packages repo doesn't have one corresponding to the ticket you linked to).

The easy patch I would propose is this: Just report the total CPU time as the thread's CPU time for now. This errs on the safe side in that it reports a higher number than the actual consumption was.
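
A minimal sketch of that fallback, assuming two callables that stand in for the existing per-thread and per-process readers (both parameter names are hypothetical):

```python
def cpu_time_with_fallback(read_thread_cpu_time, read_total_cpu_time):
    """Return the thread's CPU time, or the process total if it cannot be read."""
    try:
        return read_thread_cpu_time()
    except (IOError, OSError):
        # For example, /proc/<pid>/task/-1/stat does not exist because
        # GETTID failed. The process total is never lower than a single
        # thread's share, so this over-reports rather than under-reports.
        return read_total_cpu_time()
```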

A not much more difficult way to work around the ctypes problem would be to use Python + procfs instead, and change the createthread function to

  • Acquire a lock.
  • os.listdir the task's directory under /proc. Note the existing thread IDs.
  • Spawn a new thread.
  • listdir again and observe what thread ID was added.
  • Store the TID as a private property of the newly-spawned thread.

Then, when getresources comes, it looks up the TID attached to the thread object, and can read from the proper subdir in proc. Making sense?
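
The steps above could be sketched roughly as follows. This is an illustration under assumptions, not Repy's createthread: spawn_thread_with_tid, the tid attribute, and the injectable list_tids parameter are all hypothetical. The new thread is held back until its TID has been recorded, and the TID is left as None when the two listings do not differ by exactly one entry.

```python
import os
import threading

_spawn_lock = threading.Lock()


def _list_tids(pid):
    """Return the set of thread IDs currently listed in procfs for `pid`."""
    return set(int(name) for name in os.listdir("/proc/%d/task" % pid))


def spawn_thread_with_tid(target, list_tids=_list_tids):
    """Start `target` in a new thread and return (thread, tid)."""
    started = threading.Event()
    may_run = threading.Event()

    def wrapper():
        started.set()
        may_run.wait()  # stay alive until our TID has been recorded
        target()

    pid = os.getpid()
    with _spawn_lock:  # only one spawn at a time, so the diff is unambiguous
        before = list_tids(pid)
        thread = threading.Thread(target=wrapper)
        thread.daemon = True
        thread.start()
        try:
            started.wait()
            new_tids = list_tids(pid) - before
            # Threads created outside this lock could also show up here;
            # in that case give up rather than guess.
            thread.tid = new_tids.pop() if len(new_tids) == 1 else None
        finally:
            may_run.set()
    return thread, thread.tid
```

Code running inside the spawned thread could then look up its own ID with threading.current_thread().tid, without any syscall.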


(In the long run, my bet would be we should phase out ctypes completely, but I didn't look yet to see how feasible / how much work this will be.)

@aaaaalbert
Collaborator Author

@XuefengHuang, how is this progressing?

@XuefengHuang

I tried to use Python + procfs instead and changed the createthread function as Albert described above. However, the returned TID is always empty. I think the reason is that getresources doesn't actually call createthread.

I also tried updating the OpenWrt firmware from Barrier Breaker 14.07 to Chaos Calmer 15.05 to see whether ctypes works. Unfortunately, it doesn't work with the new firmware version either.

@aaaaalbert
Collaborator Author

I can imagine that the proposed fix for createthread is a little tricky to implement.

However, I think you should be OK just patching getresources for now so that it just avoids the GETTID syscall in https://github.com/SeattleTestbed/repy_v2/blob/master/nonportable.py#L319 .

(Edit: Fixed linked line number.)

@aaaaalbert
Collaborator Author

I dug through ctypes/util.py, which is the cause of our problem as it doesn't find libc on OpenWrt. I also found Python issue 13508, which documents the background. In short, util.find_library's methods of finding libraries are not very good, especially when running on non-development machines that lack gcc, ldconfig, objdump, and similar tools.

However, libc.so does exist on my OpenWrt x86 VM. It's in /lib. Thus, I can hardcode the path to libc in nix_common_api.py, just as the Android portion does:

25c25,31
<   libc = ctypes.CDLL(ctypes.util.find_library("c"))
---
>   path_to_libc = ctypes.util.find_library("c")
>   if path_to_libc is None:
>     # `find_library` failed to find `libc`. Use the hardcoded location 
>     # valid on OpenWrt.
>     path_to_libc = "/lib/libc.so.0"
>   libc = ctypes.CDLL(path_to_libc)
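
The same idea as a standalone sketch, generalized to a list of candidates. The helper names libc_candidates and load_libc are hypothetical, and /lib/libc.so.0 is only the location observed on OpenWrt in this thread:

```python
import ctypes
import ctypes.util


def libc_candidates(found, fallback_paths=("/lib/libc.so.0",)):
    """Return the libc names to try: the find_library result first, if any."""
    candidates = [] if found is None else [found]
    candidates.extend(fallback_paths)
    return candidates


def load_libc(fallback_paths=("/lib/libc.so.0",)):
    """Load libc, falling back to hardcoded paths if find_library fails."""
    candidates = libc_candidates(ctypes.util.find_library("c"), fallback_paths)
    for path in candidates:
        try:
            return ctypes.CDLL(path)
        except OSError:
            continue  # not loadable on this system; try the next candidate
    raise OSError("could not load libc from any of: %r" % (candidates,))
```

Unlike passing a None result straight to ctypes.CDLL, this raises a clear error when no candidate can be loaded.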

This makes syscall and thus GETTID work. @XuefengHuang, could you please take a look on your MIPS box?

@XuefengHuang

Yes, I can also find libc.so.0 in /lib. But ctypes.util.find_library("c") returns libc.so.6 on Ubuntu, so I am not sure whether libc.so.0 works. I will check it!

@aaaaalbert
Collaborator Author

If you ls -l /lib, you'll see that libc.so.0 links to a libUclibc-.... shared object. That the file names don't match should not be a problem.

@XuefengHuang

I tried libc.so.0, but it doesn't work on OpenWrt. Here is the result:

>>> import ctypes.util
>>> libc = ctypes.CDLL("/lib/libc.so.0")
>>> libc.syscall(186)
-1
>>> libc.syscall(224)
-1
>>> 
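
A bare -1 does not say why the call failed. As a diagnostic sketch (the helper names are hypothetical), ctypes can capture errno when the library is loaded with use_errno=True. An ENOSYS result would typically mean that the kernel does not implement that syscall number on this architecture.

```python
import ctypes
import errno
import os


def explain_syscall_result(result, err):
    """Describe a raw syscall return value and the errno captured after it."""
    if result != -1:
        return "ok: returned %d" % result
    name = errno.errorcode.get(err, "unknown errno")
    return "failed: errno %d (%s): %s" % (err, name, os.strerror(err))


def try_syscall(libc_path, number):
    """Invoke syscall `number` without arguments through the libc at `libc_path`."""
    # use_errno=True makes ctypes keep a private copy of errno for us.
    libc = ctypes.CDLL(libc_path, use_errno=True)
    libc.syscall.restype = ctypes.c_long
    ctypes.set_errno(0)
    result = libc.syscall(ctypes.c_long(number))
    return explain_syscall_result(result, ctypes.get_errno())
```

For example, try_syscall("/lib/libc.so.0", 186) would report the errno behind the -1 seen above.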

@aaaaalbert
Collaborator Author

Very interesting. I do the precise same thing on OpenWrt-x86, and it works.

So we're back to "patch out thread accounting", I guess...

XuefengHuang added a commit to XuefengHuang/seattle-package that referenced this issue Feb 9, 2016
Since this issue (SeattleTestbed/repy_v2#98), we decide to patch out thread accounting.
@aaaaalbert
Collaborator Author

Confirming your negative result on my TP-Link MR3020 (MIPS 32-bit) running a custom Chaos Calmer image.

However, I started to think more about how uClibc might affect which syscalls have what number, and behold, gettid has yet a different number in the uClibc / MIPS ABI: 4222!

Relevant header file reference:

We need to hardcode the path to libc.so, and then use a different syscall number due to the non-x86 / non-x64 ABI. (In other words, let's get rid of syscall as soon as possible.)

@aaaaalbert
Collaborator Author

For reference, you can hardcode the ctypes path in nix_common_api.py like so:

25c25,31
<   libc = ctypes.CDLL(ctypes.util.find_library("c"))
---
>   path_to_libc = ctypes.util.find_library("c")
>   if path_to_libc is None:
>     # `find_library` failed. Use the hardcoded location 
>     # valid on OpenWrt.
>     path_to_libc = "/lib/libc.so.0"
>   libc = ctypes.CDLL(path_to_libc)
> 

aaaaalbert added a commit to aaaaalbert/repy_v2 that referenced this issue Jul 19, 2017
This commit fixes an issue with the GETTID syscall on MIPS-based
systems such as most of our OpenWrt routers.

At the same time, it *creates* an issue on non-x86 and non-MIPS
systems, as I don't know and can't test the GETTID for, say,
ARMs right now. A friendly TODO comment points out this situation.
aaaaalbert added a commit to aaaaalbert/repy_v2 that referenced this issue Jul 20, 2017
This is a follow-up commit to 643b912
which fixed SeattleTestbed#98 (GETTID on MIPS) but broke it
for ARMs at the same time. GETTID's syscall number on ARM is identical
with x86_32's, or so my testing on a Raspberry Pi Model B Rev 2 with
BCM2835 SoC with an ARM CPU core indicates.

This commit also adds raising an exception to signal that we have no
GETTID number in case we are running on a platform other than
x86_32/64, MIPS, or ARM.
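
A sketch of what such a per-architecture lookup might look like. This is not the code from the commits above. The numbers are the ones discussed in this issue (186 on x86_64, 224 on 32-bit x86 and ARM, 4222 on 32-bit MIPS). The machine strings are assumptions about what platform.machine() reports, and a 32-bit Python on a 64-bit kernel may need special handling.

```python
import platform

# GETTID syscall numbers keyed by the (assumed) platform.machine() string.
GETTID_BY_MACHINE = {
    "x86_64": 186,
    "i386": 224,
    "i686": 224,
    "mips": 4222,
    "mipsel": 4222,
}


def gettid_syscall_number(machine=None):
    """Return the GETTID syscall number for `machine`, or raise if unknown."""
    if machine is None:
        machine = platform.machine()
    if machine in GETTID_BY_MACHINE:
        return GETTID_BY_MACHINE[machine]
    if machine.startswith("armv"):
        # 32-bit ARM (e.g. "armv6l") shares the number with 32-bit x86.
        return 224
    # Guessing a number here could silently invoke an unrelated syscall.
    raise NotImplementedError("no known GETTID syscall number for %r" % (machine,))
```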