Link "read more" nella pagina principale #2

claudioscordino · 2012-05-18T13:59:54Z

Il link "Read more" alla pagina principale (https://github.com/jlelli/sched-deadline/) linka il file README di Linux!

Claudio

All PA1.1 systems have been oopsing on boot since commit f311847 Author: James Bottomley <[email protected]> Date: Wed Dec 22 10:22:11 2010 -0600 parisc: flush pages through tmpalias space because a PA2.0 instruction was accidentally introduced into the PA1.1 TLB insertion interruption path when it was consolidated with the do_alias macro. Fix the do_alias macro only to use PA2.0 instructions if compiled for 64 bit. Cc: [email protected] #2.6.39+ Signed-off-by: James Bottomley <[email protected]>

As pointed out by serveral people, PA1.1 only has a type 26 instruction meaning that the space register must be explicitly encoded. Not giving an explicit space means that the compiler uses the type 24 version which is PA2.0 only resulting in an illegal instruction crash. This regression was caused by commit f311847 Author: James Bottomley <[email protected]> Date: Wed Dec 22 10:22:11 2010 -0600 parisc: flush pages through tmpalias space Reported-by: Helge Deller <[email protected]> Signed-off-by: John David Anglin <[email protected]> Cc: [email protected] #2.6.39+ Signed-off-by: James Bottomley <[email protected]>

Since XRC support was added, the uverbs code has locked SRQ, CQ and PD objects needed during QP and SRQ creation in different orders depending on the the code path. This leads to the (at least theoretical) possibility of deadlock, and triggers the lockdep splat below. Fix this by making sure we always lock the SRQ first, then CQs and finally the PD. ====================================================== [ INFO: possible circular locking dependency detected ] 3.4.0-rc5+ #34 Not tainted ------------------------------------------------------- ibv_srq_pingpon/2484 is trying to acquire lock: (SRQ-uobj){+++++.}, at: [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs] but task is already holding lock: (CQ-uobj){+++++.}, at: [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (CQ-uobj){+++++.}: [<ffffffff81070fd0>] lock_acquire+0xbf/0xfe [<ffffffff81384f28>] down_read+0x34/0x43 [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs] [<ffffffffa00af542>] idr_read_obj+0x9/0x19 [ib_uverbs] [<ffffffffa00b16c3>] ib_uverbs_create_qp+0x180/0x684 [ib_uverbs] [<ffffffffa00ae3dd>] ib_uverbs_write+0xb7/0xc2 [ib_uverbs] [<ffffffff810fe47f>] vfs_write+0xa7/0xee [<ffffffff810fe65f>] sys_write+0x45/0x69 [<ffffffff8138cdf9>] system_call_fastpath+0x16/0x1b -> #1 (PD-uobj){++++++}: [<ffffffff81070fd0>] lock_acquire+0xbf/0xfe [<ffffffff81384f28>] down_read+0x34/0x43 [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs] [<ffffffffa00af542>] idr_read_obj+0x9/0x19 [ib_uverbs] [<ffffffffa00af8ad>] __uverbs_create_xsrq+0x96/0x386 [ib_uverbs] [<ffffffffa00b31b9>] ib_uverbs_detach_mcast+0x1cd/0x1e6 [ib_uverbs] [<ffffffffa00ae3dd>] ib_uverbs_write+0xb7/0xc2 [ib_uverbs] [<ffffffff810fe47f>] vfs_write+0xa7/0xee [<ffffffff810fe65f>] sys_write+0x45/0x69 [<ffffffff8138cdf9>] system_call_fastpath+0x16/0x1b -> #0 (SRQ-uobj){+++++.}: [<ffffffff81070898>] __lock_acquire+0xa29/0xd06 [<ffffffff81070fd0>] lock_acquire+0xbf/0xfe [<ffffffff81384f28>] down_read+0x34/0x43 [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs] [<ffffffffa00af542>] idr_read_obj+0x9/0x19 [ib_uverbs] [<ffffffffa00b1728>] ib_uverbs_create_qp+0x1e5/0x684 [ib_uverbs] [<ffffffffa00ae3dd>] ib_uverbs_write+0xb7/0xc2 [ib_uverbs] [<ffffffff810fe47f>] vfs_write+0xa7/0xee [<ffffffff810fe65f>] sys_write+0x45/0x69 [<ffffffff8138cdf9>] system_call_fastpath+0x16/0x1b other info that might help us debug this: Chain exists of: SRQ-uobj --> PD-uobj --> CQ-uobj Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(CQ-uobj); lock(PD-uobj); lock(CQ-uobj); lock(SRQ-uobj); *** DEADLOCK *** 3 locks held by ibv_srq_pingpon/2484: #0: (QP-uobj){+.+...}, at: [<ffffffffa00b162c>] ib_uverbs_create_qp+0xe9/0x684 [ib_uverbs] #1: (PD-uobj){++++++}, at: [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs] #2: (CQ-uobj){+++++.}, at: [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs] stack backtrace: Pid: 2484, comm: ibv_srq_pingpon Not tainted 3.4.0-rc5+ #34 Call Trace: [<ffffffff8137eff0>] print_circular_bug+0x1f8/0x209 [<ffffffff81070898>] __lock_acquire+0xa29/0xd06 [<ffffffffa00af37c>] ? __idr_get_uobj+0x20/0x5e [ib_uverbs] [<ffffffffa00af51b>] ? idr_read_uobj+0x2f/0x4d [ib_uverbs] [<ffffffff81070fd0>] lock_acquire+0xbf/0xfe [<ffffffffa00af51b>] ? idr_read_uobj+0x2f/0x4d [ib_uverbs] [<ffffffff81070eee>] ? lock_release+0x166/0x189 [<ffffffff81384f28>] down_read+0x34/0x43 [<ffffffffa00af51b>] ? idr_read_uobj+0x2f/0x4d [ib_uverbs] [<ffffffffa00af51b>] idr_read_uobj+0x2f/0x4d [ib_uverbs] [<ffffffffa00af542>] idr_read_obj+0x9/0x19 [ib_uverbs] [<ffffffffa00b1728>] ib_uverbs_create_qp+0x1e5/0x684 [ib_uverbs] [<ffffffff81070fec>] ? lock_acquire+0xdb/0xfe [<ffffffff81070c09>] ? lock_release_non_nested+0x94/0x213 [<ffffffff810d470f>] ? might_fault+0x40/0x90 [<ffffffff810d470f>] ? might_fault+0x40/0x90 [<ffffffffa00ae3dd>] ib_uverbs_write+0xb7/0xc2 [ib_uverbs] [<ffffffff810fe47f>] vfs_write+0xa7/0xee [<ffffffff810ff736>] ? fget_light+0x3b/0x99 [<ffffffff810fe65f>] sys_write+0x45/0x69 [<ffffffff8138cdf9>] system_call_fastpath+0x16/0x1b Reported-by: Or Gerlitz <[email protected]> Signed-off-by: Roland Dreier <[email protected]>

The following lockdep problem was reported by Or Gerlitz <[email protected]>: [ INFO: possible recursive locking detected ] 3.3.0-32035-g1b2649e-dirty #4 Not tainted --------------------------------------------- kworker/5:1/418 is trying to acquire lock: (&id_priv->handler_mutex){+.+.+.}, at: [<ffffffffa0138a41>] rdma_destroy_i d+0x33/0x1f0 [rdma_cm] but task is already holding lock: (&id_priv->handler_mutex){+.+.+.}, at: [<ffffffffa0135130>] cma_disable_ca llback+0x24/0x45 [rdma_cm] other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&id_priv->handler_mutex); lock(&id_priv->handler_mutex); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by kworker/5:1/418: #0: (ib_cm){.+.+.+}, at: [<ffffffff81042ac1>] process_one_work+0x210/0x4a 6 #1: ((&(&work->work)->work)){+.+.+.}, at: [<ffffffff81042ac1>] process_on e_work+0x210/0x4a6 #2: (&id_priv->handler_mutex){+.+.+.}, at: [<ffffffffa0135130>] cma_disab le_callback+0x24/0x45 [rdma_cm] stack backtrace: Pid: 418, comm: kworker/5:1 Not tainted 3.3.0-32035-g1b2649e-dirty #4 Call Trace: [<ffffffff8102b0fb>] ? console_unlock+0x1f4/0x204 [<ffffffff81068771>] __lock_acquire+0x16b5/0x174e [<ffffffff8106461f>] ? save_trace+0x3f/0xb3 [<ffffffff810688fa>] lock_acquire+0xf0/0x116 [<ffffffffa0138a41>] ? rdma_destroy_id+0x33/0x1f0 [rdma_cm] [<ffffffff81364351>] mutex_lock_nested+0x64/0x2ce [<ffffffffa0138a41>] ? rdma_destroy_id+0x33/0x1f0 [rdma_cm] [<ffffffff81065a78>] ? trace_hardirqs_on_caller+0x11e/0x155 [<ffffffff81065abc>] ? trace_hardirqs_on+0xd/0xf [<ffffffffa0138a41>] rdma_destroy_id+0x33/0x1f0 [rdma_cm] [<ffffffffa0139c02>] cma_req_handler+0x418/0x644 [rdma_cm] [<ffffffffa012ee88>] cm_process_work+0x32/0x119 [ib_cm] [<ffffffffa0130299>] cm_req_handler+0x928/0x982 [ib_cm] [<ffffffffa01302f3>] ? cm_req_handler+0x982/0x982 [ib_cm] [<ffffffffa0130326>] cm_work_handler+0x33/0xfe5 [ib_cm] [<ffffffff81065a78>] ? trace_hardirqs_on_caller+0x11e/0x155 [<ffffffffa01302f3>] ? cm_req_handler+0x982/0x982 [ib_cm] [<ffffffff81042b6e>] process_one_work+0x2bd/0x4a6 [<ffffffff81042ac1>] ? process_one_work+0x210/0x4a6 [<ffffffff813669f3>] ? _raw_spin_unlock_irq+0x2b/0x40 [<ffffffff8104316e>] worker_thread+0x1d6/0x350 [<ffffffff81042f98>] ? rescuer_thread+0x241/0x241 [<ffffffff81046a32>] kthread+0x84/0x8c [<ffffffff8136e854>] kernel_thread_helper+0x4/0x10 [<ffffffff81366d59>] ? retint_restore_args+0xe/0xe [<ffffffff810469ae>] ? __init_kthread_worker+0x56/0x56 [<ffffffff8136e850>] ? gs_change+0xb/0xb The actual locking is fine, since we're dealing with different locks, but from the same lock class. cma_disable_callback() acquires the listening id mutex, whereas rdma_destroy_id() acquires the mutex for the new connection id. To fix this, delay the call to rdma_destroy_id() until we've released the listening id mutex. Signed-off-by: Sean Hefty <[email protected]> Signed-off-by: Roland Dreier <[email protected]>

xfs_sync_worker checks the MS_ACTIVE flag in s_flags to avoid doing work during mount and unmount. This flag can be cleared by unmount after the xfs_sync_worker checks it but before the work is completed. The has caused crashes in the completion handler for the dummy transaction commited by xfs_sync_worker: PID: 27544 TASK: ffff88013544e040 CPU: 3 COMMAND: "kworker/3:0" #0 [ffff88016fdff930] machine_kexec at ffffffff810244e9 #1 [ffff88016fdff9a0] crash_kexec at ffffffff8108d053 #2 [ffff88016fdffa70] oops_end at ffffffff813ad1b8 #3 [ffff88016fdffaa0] no_context at ffffffff8102bd48 #4 [ffff88016fdffaf0] __bad_area_nosemaphore at ffffffff8102c04d #5 [ffff88016fdffb40] bad_area_nosemaphore at ffffffff8102c12e #6 [ffff88016fdffb50] do_page_fault at ffffffff813afaee #7 [ffff88016fdffc60] page_fault at ffffffff813ac635 [exception RIP: xlog_get_lowest_lsn+0x30] RIP: ffffffffa04a9910 RSP: ffff88016fdffd10 RFLAGS: 00010246 RAX: ffffc90014e48000 RBX: ffff88014d879980 RCX: ffff88014d879980 RDX: ffff8802214ee4c0 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff88016fdffd10 R8: ffff88014d879a80 R9: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffff8802214ee400 R13: ffff88014d879980 R14: 0000000000000000 R15: ffff88022fd96605 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #8 [ffff88016fdffd18] xlog_state_do_callback at ffffffffa04aa186 [xfs] #9 [ffff88016fdffd98] xlog_state_done_syncing at ffffffffa04aa568 [xfs] Protect xfs_sync_worker by using the s_umount semaphore at the read level to provide exclusion with unmount while work is progressing. Reviewed-by: Mark Tinguely <[email protected]> Signed-off-by: Ben Myers <[email protected]>

As observed and suggested by Tushar Gosavi... --------- readdir calls these function to send TRANS2_FIND_FIRST and TRANS2_FIND_NEXT command to the server. The current cifs module is not specifying CIFS_SEARCH_BACKUP_SEARCH flag while sending these command when backupuid/backupgid is specified. This can be resolved by specifying CIFS_SEARCH_BACKUP_SEARCH flag. --------- Cc: <[email protected]> Reported-and-Tested-by: Tushar Gosavi <[email protected]> Signed-off-by: Shirish Pargaonkar <[email protected]> Acked-by: Jeff Layton <[email protected]> Signed-off-by: Steve French <[email protected]>

When BT traffic load changes from its previous state, a new LQ command needs to be sent down to the firmware. This needs to be done only once per change. The state variable that keeps track of this change is last_bt_traffic_load. However, it was not being updated when the change had been handled. Not updating this variable was causing a flood of advanced BT config commands to be sent to the firmware. Fix this. Cc: [email protected] #2.6.38+ Signed-off-by: Meenakshi Venkataraman <[email protected]> Signed-off-by: Wey-Yi Guy <[email protected]> Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: John W. Linville <[email protected]>

Shadow registers in the device are meant to allow the driver to update certain device registers without needing to wake up all components of the device. However, using this feature in the device causes communication between the driver and the device to become unreliable, resulting in host command timeouts. Disable this feature by default till a fix is available for the bug. Cc: [email protected] #2.6.38+ Signed-off-by: Meenakshi Venkataraman <[email protected]> Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: John W. Linville <[email protected]>

Now when we set the group inode free count, we don't have a proper group lock so that multiple threads may decrease the inode free count at the same time. And e2fsck will complain something like: Free inodes count wrong for group #1 (1, counted=0). Fix? no Free inodes count wrong for group #2 (3, counted=0). Fix? no Directories count wrong for group #2 (780, counted=779). Fix? no Free inodes count wrong for group #3 (2272, counted=2273). Fix? no So this patch try to protect it with the ext4_lock_group. btw, it is found by xfstests test case 269 and the volume is mkfsed with the parameter "-O ^resize_inode,^uninit_bg,extent,meta_bg,flex_bg,ext_attr" and I have run it 100 times and the error in e2fsck doesn't show up again. Signed-off-by: Tao Ma <[email protected]> Signed-off-by: "Theodore Ts'o" <[email protected]>

Pull CIFS updates from Steve French. * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: (29 commits) cifs: fix oops while traversing open file list (try #4) cifs: Fix comment as d_alloc_root() is replaced by d_make_root() CIFS: Introduce SMB2 mounts as vers=2.1 CIFS: Introduce SMB2 Kconfig option CIFS: Move add/set_credits and get_credits_field to ops structure CIFS: Move protocol specific demultiplex thread calls to ops struct CIFS: Move protocol specific part from cifs_readv_receive to ops struct CIFS: Move header_size/max_header_size to ops structure CIFS: Move protocol specific part from SendReceive2 to ops struct cifs: Include backup intent search flags during searches {try #2) CIFS: Separate protocol specific part from setlk CIFS: Separate protocol specific part from getlk CIFS: Separate protocol specific lock type handling CIFS: Convert lock type to 32 bit variable CIFS: Move locks to cifsFileInfo structure cifs: convert send_nt_cancel into a version specific op cifs: add a smb_version_operations/values structures and a smb_version enum cifs: remove the vers= and version= synonyms for ver= cifs: add warning about change in default cache semantics in 3.7 cifs: display cache= option in /proc/mounts ...

…condition When holding the mmap_sem for reading, pmd_offset_map_lock should only run on a pmd_t that has been read atomically from the pmdp pointer, otherwise we may read only half of it leading to this crash. PID: 11679 TASK: f06e8000 CPU: 3 COMMAND: "do_race_2_panic" #0 [f06a9dd8] crash_kexec at c049b5ec #1 [f06a9e2c] oops_end at c083d1c2 #2 [f06a9e40] no_context at c0433ded #3 [f06a9e64] bad_area_nosemaphore at c043401a #4 [f06a9e6c] __do_page_fault at c0434493 #5 [f06a9eec] do_page_fault at c083eb45 #6 [f06a9f04] error_code (via page_fault) at c083c5d5 EAX: 01fb470c EBX: fff35000 ECX: 00000003 EDX: 00000100 EBP: 00000000 DS: 007b ESI: 9e201000 ES: 007b EDI: 01fb4700 GS: 00e0 CS: 0060 EIP: c083bc14 ERR: ffffffff EFLAGS: 00010246 #7 [f06a9f38] _spin_lock at c083bc14 #8 [f06a9f44] sys_mincore at c0507b7d #9 [f06a9fb0] system_call at c083becd start len EAX: ffffffda EBX: 9e200000 ECX: 00001000 EDX: 6228537f DS: 007b ESI: 00000000 ES: 007b EDI: 003d0f00 SS: 007b ESP: 62285354 EBP: 62285388 GS: 0033 CS: 0073 EIP: 00291416 ERR: 000000da EFLAGS: 00000286 This should be a longstanding bug affecting x86 32bit PAE without THP. Only archs with 64bit large pmd_t and 32bit unsigned long should be affected. With THP enabled the barrier() in pmd_none_or_trans_huge_or_clear_bad() would partly hide the bug when the pmd transition from none to stable, by forcing a re-read of the *pmd in pmd_offset_map_lock, but when THP is enabled a new set of problem arises by the fact could then transition freely in any of the none, pmd_trans_huge or pmd_trans_stable states. So making the barrier in pmd_none_or_trans_huge_or_clear_bad() unconditional isn't good idea and it would be a flakey solution. This should be fully fixed by introducing a pmd_read_atomic that reads the pmd in order with THP disabled, or by reading the pmd atomically with cmpxchg8b with THP enabled. Luckily this new race condition only triggers in the places that must already be covered by pmd_none_or_trans_huge_or_clear_bad() so the fix is localized there but this bug is not related to THP. NOTE: this can trigger on x86 32bit systems with PAE enabled with more than 4G of ram, otherwise the high part of the pmd will never risk to be truncated because it would be zero at all times, in turn so hiding the SMP race. This bug was discovered and fully debugged by Ulrich, quote: ---- [..] pmd_none_or_trans_huge_or_clear_bad() loads the content of edx and eax. 496 static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd) 497 { 498 /* depend on compiler for an atomic pmd read */ 499 pmd_t pmdval = *pmd; // edi = pmd pointer 0xc0507a74 <sys_mincore+548>: mov 0x8(%esp),%edi ... // edx = PTE page table high address 0xc0507a84 <sys_mincore+564>: mov 0x4(%edi),%edx ... // eax = PTE page table low address 0xc0507a8e <sys_mincore+574>: mov (%edi),%eax [..] Please note that the PMD is not read atomically. These are two "mov" instructions where the high order bits of the PMD entry are fetched first. Hence, the above machine code is prone to the following race. - The PMD entry {high|low} is 0x0000000000000000. The "mov" at 0xc0507a84 loads 0x00000000 into edx. - A page fault (on another CPU) sneaks in between the two "mov" instructions and instantiates the PMD. - The PMD entry {high|low} is now 0x00000003fda38067. The "mov" at 0xc0507a8e loads 0xfda38067 into eax. ---- Reported-by: Ulrich Obergfell <[email protected]> Signed-off-by: Andrea Arcangeli <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Larry Woodman <[email protected]> Cc: Petr Matousek <[email protected]> Cc: Rik van Riel <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>

Shrink TIF_WORK_MASK so that it will fit in the 12-bit signed immediate operand field of an ANDI instruction. Suggested-by: Al Viro <[email protected]> Signed-off-by: David Howells <[email protected]> Signed-off-by: Al Viro <[email protected]>

Optimise the system call exit path in entry.S by packing some instructions. Suggested-by: Al Viro <[email protected]> Signed-off-by: David Howells <[email protected]> Signed-off-by: Al Viro <[email protected]>

…/git/viro/signal Pull third pile of signal handling patches from Al Viro: "This time it's mostly helpers and conversions to them; there's a lot of stuff remaining in the tree, but that'll either go in -rc2 (isolated bug fixes, ideally via arch maintainers' trees) or will sit there until the next cycle." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: x86: get rid of calling do_notify_resume() when returning to kernel mode blackfin: check __get_user() return value whack-a-mole with TIF_FREEZE FRV: Optimise the system call exit path in entry.S [ver #2] FRV: Shrink TIF_WORK_MASK [ver #2] FRV: Prevent syscall exit tracing and notify_resume at end of kernel exceptions new helper: signal_delivered() powerpc: get rid of restore_sigmask() most of set_current_blocked() callers want SIGKILL/SIGSTOP removed from set set_restore_sigmask() is never called without SIGPENDING (and never should be) TIF_RESTORE_SIGMASK can be set only when TIF_SIGPENDING is set don't call try_to_freeze() from do_signal() pull clearing RESTORE_SIGMASK into block_sigmask() sh64: failure to build sigframe != signal without handler openrisc: tracehook_signal_handler() is supposed to be called on success new helper: sigmask_to_save() new helper: restore_saved_sigmask() new helpers: {clear,test,test_and_clear}_restore_sigmask() HAVE_RESTORE_SIGMASK is defined on all architectures now

Remove spinlock as atomic_t can be used instead. Note we use only 16 lower bits, upper bits are changed but we impilcilty cast to u16. This fix possible deadlock on IBSS mode reproted by lockdep: ================================= [ INFO: inconsistent lock state ] 3.4.0-wl+ #4 Not tainted --------------------------------- inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. kworker/u:2/30374 [HC0[0]:SC0[0]:HE1:SE1] takes: (&(&intf->seqlock)->rlock){+.?...}, at: [<f9979a20>] rt2x00queue_create_tx_descriptor+0x380/0x490 [rt2x00lib] {IN-SOFTIRQ-W} state was registered at: [<c04978ab>] __lock_acquire+0x47b/0x1050 [<c0498504>] lock_acquire+0x84/0xf0 [<c0835733>] _raw_spin_lock+0x33/0x40 [<f9979a20>] rt2x00queue_create_tx_descriptor+0x380/0x490 [rt2x00lib] [<f9979f2a>] rt2x00queue_write_tx_frame+0x1a/0x300 [rt2x00lib] [<f997834f>] rt2x00mac_tx+0x7f/0x380 [rt2x00lib] [<f98fe363>] __ieee80211_tx+0x1b3/0x300 [mac80211] [<f98ffdf5>] ieee80211_tx+0x105/0x130 [mac80211] [<f99000dd>] ieee80211_xmit+0xad/0x100 [mac80211] [<f9900519>] ieee80211_subif_start_xmit+0x2d9/0x930 [mac80211] [<c0782e87>] dev_hard_start_xmit+0x307/0x660 [<c079bb71>] sch_direct_xmit+0xa1/0x1e0 [<c0784bb3>] dev_queue_xmit+0x183/0x730 [<c078c27a>] neigh_resolve_output+0xfa/0x1e0 [<c07b436a>] ip_finish_output+0x24a/0x460 [<c07b4897>] ip_output+0xb7/0x100 [<c07b2d60>] ip_local_out+0x20/0x60 [<c07e01ff>] igmpv3_sendpack+0x4f/0x60 [<c07e108f>] igmp_ifc_timer_expire+0x29f/0x330 [<c04520fc>] run_timer_softirq+0x15c/0x2f0 [<c0449e3e>] __do_softirq+0xae/0x1e0 irq event stamp: 18380437 hardirqs last enabled at (18380437): [<c0526027>] __slab_alloc.clone.3+0x67/0x5f0 hardirqs last disabled at (18380436): [<c0525ff3>] __slab_alloc.clone.3+0x33/0x5f0 softirqs last enabled at (18377616): [<c0449eb3>] __do_softirq+0x123/0x1e0 softirqs last disabled at (18377611): [<c041278d>] do_softirq+0x9d/0xe0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&(&intf->seqlock)->rlock); <Interrupt> lock(&(&intf->seqlock)->rlock); *** DEADLOCK *** 4 locks held by kworker/u:2/30374: #0: (wiphy_name(local->hw.wiphy)){++++.+}, at: [<c045cf99>] process_one_work+0x109/0x3f0 #1: ((&sdata->work)){+.+.+.}, at: [<c045cf99>] process_one_work+0x109/0x3f0 #2: (&ifibss->mtx){+.+.+.}, at: [<f98f005b>] ieee80211_ibss_work+0x1b/0x470 [mac80211] #3: (&intf->beacon_skb_mutex){+.+...}, at: [<f997a644>] rt2x00queue_update_beacon+0x24/0x50 [rt2x00lib] stack backtrace: Pid: 30374, comm: kworker/u:2 Not tainted 3.4.0-wl+ #4 Call Trace: [<c04962a6>] print_usage_bug+0x1f6/0x220 [<c0496a12>] mark_lock+0x2c2/0x300 [<c0495ff0>] ? check_usage_forwards+0xc0/0xc0 [<c04978ec>] __lock_acquire+0x4bc/0x1050 [<c0527890>] ? __kmalloc_track_caller+0x1c0/0x1d0 [<c0777fb6>] ? copy_skb_header+0x26/0x90 [<c0498504>] lock_acquire+0x84/0xf0 [<f9979a20>] ? rt2x00queue_create_tx_descriptor+0x380/0x490 [rt2x00lib] [<c0835733>] _raw_spin_lock+0x33/0x40 [<f9979a20>] ? rt2x00queue_create_tx_descriptor+0x380/0x490 [rt2x00lib] [<f9979a20>] rt2x00queue_create_tx_descriptor+0x380/0x490 [rt2x00lib] [<f997a5cf>] rt2x00queue_update_beacon_locked+0x5f/0xb0 [rt2x00lib] [<f997a64d>] rt2x00queue_update_beacon+0x2d/0x50 [rt2x00lib] [<f9977e3a>] rt2x00mac_bss_info_changed+0x1ca/0x200 [rt2x00lib] [<f9977c70>] ? rt2x00mac_remove_interface+0x70/0x70 [rt2x00lib] [<f98e4dd0>] ieee80211_bss_info_change_notify+0xe0/0x1d0 [mac80211] [<f98ef7b8>] __ieee80211_sta_join_ibss+0x3b8/0x610 [mac80211] [<c0496ab4>] ? mark_held_locks+0x64/0xc0 [<c0440012>] ? virt_efi_query_capsule_caps+0x12/0x50 [<f98efb09>] ieee80211_sta_join_ibss+0xf9/0x140 [mac80211] [<f98f0456>] ieee80211_ibss_work+0x416/0x470 [mac80211] [<c0496d8b>] ? trace_hardirqs_on+0xb/0x10 [<c077683b>] ? skb_dequeue+0x4b/0x70 [<f98f207f>] ieee80211_iface_work+0x13f/0x230 [mac80211] [<c045cf99>] ? process_one_work+0x109/0x3f0 [<c045d015>] process_one_work+0x185/0x3f0 [<c045cf99>] ? process_one_work+0x109/0x3f0 [<f98f1f40>] ? ieee80211_teardown_sdata+0xa0/0xa0 [mac80211] [<c045ed86>] worker_thread+0x116/0x270 [<c045ec70>] ? manage_workers+0x1e0/0x1e0 [<c0462f64>] kthread+0x84/0x90 [<c0462ee0>] ? __init_kthread_worker+0x60/0x60 [<c083d382>] kernel_thread_helper+0x6/0x10 Cc: [email protected] Signed-off-by: Stanislaw Gruszka <[email protected]> Acked-by: Helmut Schaa <[email protected]> Acked-by: Gertjan van Wingerde <[email protected]> Signed-off-by: John W. Linville <[email protected]>

In 3.5 kernel, the endpoint is assigned dynamically for the substreams, but the PCM assignment still checks the presence of the endpoint pointer. This ended up in duplicated PCM substream creations at probing time, resulting in kernel warnings like: WARNING: at fs/proc/generic.c:586 proc_register+0x169/0x1a6() Pid: 1152, comm: modprobe Not tainted 3.5.0-rc1-00110-g71fae7e #2 Call Trace: [<ffffffff8102a400>] warn_slowpath_common+0x83/0x9c [<ffffffff8102a4bc>] warn_slowpath_fmt+0x46/0x48 [<ffffffff813829ad>] ? add_preempt_count+0x39/0x3b [<ffffffff811292f0>] proc_register+0x169/0x1a6 [<ffffffff8112962e>] create_proc_entry+0x74/0x8c [<ffffffffa018eb63>] snd_info_register+0x3e/0xc3 [snd] [<ffffffffa01fde2e>] snd_pcm_new_stream+0xb1/0x404 [snd_pcm] [<ffffffffa024861f>] snd_usb_add_audio_stream+0xd2/0x230 [snd_usb_audio] [<ffffffffa0241d33>] ? snd_usb_parse_audio_format+0x252/0x34f [snd_usb_audio] [<ffffffff810d6b17>] ? kmem_cache_alloc_trace+0xab/0xbb [<ffffffffa0248c29>] snd_usb_parse_audio_interface+0x4ac/0x567 [snd_usb_audio] [<ffffffffa023f0ff>] snd_usb_create_stream+0xe9/0x125 [snd_usb_audio] [<ffffffffa023f9b1>] usb_audio_probe+0x62a/0x72c [snd_usb_audio] ..... This patch fixes the regression by checking the fixed endpoint number for each substream instead of the endpoint pointer. Reported-and-tested-by: Jamie Heilman <[email protected]> Signed-off-by: Takashi Iwai <[email protected]>

Some edac drivers register themselves as mce decoders via notifier_chain. But in current notifier_chain implementation logic, it doesn't accept same notifier registered twice. If so, it will be wrong when adding/removing the element from the list. For example, on one SandyBridge platform, remove module sb_edac and then trigger one error, it will hit oops because it has no mce decoder registered but related notifier_chain still points to an invalid callback function. Here is an example: Call Trace: [<ffffffff8150ef6a>] atomic_notifier_call_chain+0x1a/0x20 [<ffffffff8102b936>] mce_log+0x46/0x180 [<ffffffff8102eaea>] apei_mce_report_mem_error+0x4a/0x60 [<ffffffff812e19d2>] ghes_do_proc+0x192/0x210 [<ffffffff812e2066>] ghes_proc+0x46/0x70 [<ffffffff812e20d8>] ghes_notify_sci+0x48/0x80 [<ffffffff8150ef05>] notifier_call_chain+0x55/0x80 [<ffffffff81076f1a>] __blocking_notifier_call_chain+0x5a/0x80 [<ffffffff812aea11>] ? acpi_os_wait_events_complete+0x23/0x23 [<ffffffff81076f56>] blocking_notifier_call_chain+0x16/0x20 [<ffffffff812ddc4d>] acpi_hed_notify+0x19/0x1b [<ffffffff812b16bd>] acpi_device_notify+0x19/0x1b [<ffffffff812beb38>] acpi_ev_notify_dispatch+0x67/0x7f [<ffffffff812aea3a>] acpi_os_execute_deferred+0x29/0x36 [<ffffffff81069dc2>] process_one_work+0x132/0x450 [<ffffffff8106bbcb>] worker_thread+0x17b/0x3c0 [<ffffffff8106ba50>] ? manage_workers+0x120/0x120 [<ffffffff81070aee>] kthread+0x9e/0xb0 [<ffffffff81514724>] kernel_thread_helper+0x4/0x10 [<ffffffff81070a50>] ? kthread_freezable_should_stop+0x70/0x70 [<ffffffff81514720>] ? gs_change+0x13/0x13 Code: f3 49 89 d4 45 85 ed 4d 89 c6 48 8b 0f 74 48 48 85 c9 75 17 eb 41 0f 1f 80 00 00 00 00 41 83 ed 01 4c 89 f9 74 22 4d 85 ff 74 1d <4c> 8b 79 08 4c 89 e2 48 89 de 48 89 cf ff 11 4d 85 f6 74 04 41 RIP [<ffffffff8150eef6>] notifier_call_chain+0x46/0x80 RSP <ffff88042868fb20> CR2: ffffffffa01af838 ---[ end trace 0100930068e73e6f ]--- BUG: unable to handle kernel paging request at fffffffffffffff8 IP: [<ffffffff810705b0>] kthread_data+0x10/0x20 PGD 1a0d067 PUD 1a0e067 PMD 0 Oops: 0000 [#2] SMP Only i7core_edac and sb_edac have such issues because they have more than one memory controller which means they have to register mce decoder many times. Cc: <[email protected]> # 3.2 and upper Signed-off-by: Chen Gong <[email protected]> Signed-off-by: Mauro Carvalho Chehab <[email protected]>

The warning below triggers on AMD MCM packages because physical package IDs on the cores of a _physical_ socket are the same. I.e., this field says which CPUs belong to the same physical package. However, the same two CPUs belong to two different internal, i.e. "logical" nodes in the same physical socket which is reflected in the CPU-to-node map on x86 with NUMA. Which makes this check wrong on the above topologies so circumvent it. [ 0.444413] Booting Node 0, Processors #1 #2 #3 #4 #5 Ok. [ 0.461388] ------------[ cut here ]------------ [ 0.465997] WARNING: at arch/x86/kernel/smpboot.c:310 topology_sane.clone.1+0x6e/0x81() [ 0.473960] Hardware name: Dinar [ 0.477170] sched: CPU #6's mc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency. [ 0.486860] Booting Node 1, Processors #6 [ 0.491104] Modules linked in: [ 0.494141] Pid: 0, comm: swapper/6 Not tainted 3.4.0+ #1 [ 0.499510] Call Trace: [ 0.501946] [<ffffffff8144bf92>] ? topology_sane.clone.1+0x6e/0x81 [ 0.508185] [<ffffffff8102f1fc>] warn_slowpath_common+0x85/0x9d [ 0.514163] [<ffffffff8102f2b7>] warn_slowpath_fmt+0x46/0x48 [ 0.519881] [<ffffffff8144bf92>] topology_sane.clone.1+0x6e/0x81 [ 0.525943] [<ffffffff8144c234>] set_cpu_sibling_map+0x251/0x371 [ 0.532004] [<ffffffff8144c4ee>] start_secondary+0x19a/0x218 [ 0.537729] ---[ end trace 4eaa2a86a8e2da22 ]--- [ 0.628197] #7 #8 #9 #10 #11 Ok. [ 0.807108] Booting Node 3, Processors #12 #13 #14 #15 #16 #17 Ok. [ 0.897587] Booting Node 2, Processors #18 #19 #20 #21 #22 #23 Ok. [ 0.917443] Brought up 24 CPUs We ran a topology sanity check test we have here on it and it all looks ok... hopefully :). Signed-off-by: Borislav Petkov <[email protected]> Cc: Andreas Herrmann <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>

A bunch of bugzillas have complained how noisy the nmi_watchdog is during boot-up especially with its expected failure cases (like virt and bios resource contention). This is my attempt to quiet them down and keep it less confusing for the end user. What I did is print the message for cpu0 and save it for future comparisons. If future cpus have an identical message as cpu0, then don't print the redundant info. However, if a future cpu has a different message, happily print that loudly. Before the change, you would see something like: ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1 CPU0: Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz stepping 0a Performance Events: PEBS fmt0+, Core2 events, Intel PMU driver. ... version: 2 ... bit width: 40 ... generic registers: 2 ... value mask: 000000ffffffffff ... max period: 000000007fffffff ... fixed-purpose events: 3 ... event mask: 0000000700000003 NMI watchdog enabled, takes one hw-pmu counter. Booting Node 0, Processors #1 NMI watchdog enabled, takes one hw-pmu counter. #2 NMI watchdog enabled, takes one hw-pmu counter. #3 Ok. NMI watchdog enabled, takes one hw-pmu counter. Brought up 4 CPUs Total of 4 processors activated (22607.24 BogoMIPS). After the change, it is simplified to: ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1 CPU0: Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz stepping 0a Performance Events: PEBS fmt0+, Core2 events, Intel PMU driver. ... version: 2 ... bit width: 40 ... generic registers: 2 ... value mask: 000000ffffffffff ... max period: 000000007fffffff ... fixed-purpose events: 3 ... event mask: 0000000700000003 NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter. Booting Node 0, Processors #1 #2 #3 Ok. Brought up 4 CPUs V2: little changes based on Joe Perches' feedback V3: printk cleanup based on Ingo's feedback; checkpatch fix V4: keep printk as one long line V5: Ingo fix ups Reported-and-tested-by: Nathan Zimmer <[email protected]> Signed-off-by: Don Zickus <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>

reg_timeout_work() calls restore_regulatory_settings() which takes cfg80211_mutex. reg_set_request_processed() already holds cfg80211_mutex before calling cancel_delayed_work_sync(reg_timeout), so it might deadlock. Call the async cancel_delayed_work instead, in order to avoid the potential deadlock. This is the relevant lockdep warning: cfg80211: Calling CRDA for country: XX ====================================================== [ INFO: possible circular locking dependency detected ] 3.4.0-rc5-wl+ #26 Not tainted ------------------------------------------------------- kworker/0:2/1391 is trying to acquire lock: (cfg80211_mutex){+.+.+.}, at: [<bf28ae00>] restore_regulatory_settings+0x34/0x418 [cfg80211] but task is already holding lock: ((reg_timeout).work){+.+...}, at: [<c0059e94>] process_one_work+0x1f0/0x480 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 ((reg_timeout).work){+.+...}: [<c008fd44>] validate_chain+0xb94/0x10f0 [<c0090b68>] __lock_acquire+0x8c8/0x9b0 [<c0090d40>] lock_acquire+0xf0/0x114 [<c005b600>] wait_on_work+0x4c/0x154 [<c005c000>] __cancel_work_timer+0xd4/0x11c [<c005c064>] cancel_delayed_work_sync+0x1c/0x20 [<bf28b274>] reg_set_request_processed+0x50/0x78 [cfg80211] [<bf28bd84>] set_regdom+0x550/0x600 [cfg80211] [<bf294cd8>] nl80211_set_reg+0x218/0x258 [cfg80211] [<c03c7738>] genl_rcv_msg+0x1a8/0x1e8 [<c03c6a00>] netlink_rcv_skb+0x5c/0xc0 [<c03c7584>] genl_rcv+0x28/0x34 [<c03c6720>] netlink_unicast+0x15c/0x228 [<c03c6c7c>] netlink_sendmsg+0x218/0x298 [<c03933c8>] sock_sendmsg+0xa4/0xc0 [<c039406c>] __sys_sendmsg+0x1e4/0x268 [<c0394228>] sys_sendmsg+0x4c/0x70 [<c0013840>] ret_fast_syscall+0x0/0x3c -> #1 (reg_mutex){+.+.+.}: [<c008fd44>] validate_chain+0xb94/0x10f0 [<c0090b68>] __lock_acquire+0x8c8/0x9b0 [<c0090d40>] lock_acquire+0xf0/0x114 [<c04734dc>] mutex_lock_nested+0x48/0x320 [<bf28b2cc>] reg_todo+0x30/0x538 [cfg80211] [<c0059f44>] process_one_work+0x2a0/0x480 [<c005a4b4>] worker_thread+0x1bc/0x2bc [<c0061148>] kthread+0x98/0xa4 [<c0014af4>] kernel_thread_exit+0x0/0x8 -> #0 (cfg80211_mutex){+.+.+.}: [<c008ed58>] print_circular_bug+0x68/0x2cc [<c008fb28>] validate_chain+0x978/0x10f0 [<c0090b68>] __lock_acquire+0x8c8/0x9b0 [<c0090d40>] lock_acquire+0xf0/0x114 [<c04734dc>] mutex_lock_nested+0x48/0x320 [<bf28ae00>] restore_regulatory_settings+0x34/0x418 [cfg80211] [<bf28b200>] reg_timeout_work+0x1c/0x20 [cfg80211] [<c0059f44>] process_one_work+0x2a0/0x480 [<c005a4b4>] worker_thread+0x1bc/0x2bc [<c0061148>] kthread+0x98/0xa4 [<c0014af4>] kernel_thread_exit+0x0/0x8 other info that might help us debug this: Chain exists of: cfg80211_mutex --> reg_mutex --> (reg_timeout).work Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock((reg_timeout).work); lock(reg_mutex); lock((reg_timeout).work); lock(cfg80211_mutex); *** DEADLOCK *** 2 locks held by kworker/0:2/1391: #0: (events){.+.+.+}, at: [<c0059e94>] process_one_work+0x1f0/0x480 #1: ((reg_timeout).work){+.+...}, at: [<c0059e94>] process_one_work+0x1f0/0x480 stack backtrace: [<c001b928>] (unwind_backtrace+0x0/0x12c) from [<c0471d3c>] (dump_stack+0x20/0x24) [<c0471d3c>] (dump_stack+0x20/0x24) from [<c008ef70>] (print_circular_bug+0x280/0x2cc) [<c008ef70>] (print_circular_bug+0x280/0x2cc) from [<c008fb28>] (validate_chain+0x978/0x10f0) [<c008fb28>] (validate_chain+0x978/0x10f0) from [<c0090b68>] (__lock_acquire+0x8c8/0x9b0) [<c0090b68>] (__lock_acquire+0x8c8/0x9b0) from [<c0090d40>] (lock_acquire+0xf0/0x114) [<c0090d40>] (lock_acquire+0xf0/0x114) from [<c04734dc>] (mutex_lock_nested+0x48/0x320) [<c04734dc>] (mutex_lock_nested+0x48/0x320) from [<bf28ae00>] (restore_regulatory_settings+0x34/0x418 [cfg80211]) [<bf28ae00>] (restore_regulatory_settings+0x34/0x418 [cfg80211]) from [<bf28b200>] (reg_timeout_work+0x1c/0x20 [cfg80211]) [<bf28b200>] (reg_timeout_work+0x1c/0x20 [cfg80211]) from [<c0059f44>] (process_one_work+0x2a0/0x480) [<c0059f44>] (process_one_work+0x2a0/0x480) from [<c005a4b4>] (worker_thread+0x1bc/0x2bc) [<c005a4b4>] (worker_thread+0x1bc/0x2bc) from [<c0061148>] (kthread+0x98/0xa4) [<c0061148>] (kthread+0x98/0xa4) from [<c0014af4>] (kernel_thread_exit+0x0/0x8) cfg80211: Calling CRDA to update world regulatory domain cfg80211: World regulatory domain updated: cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp) cfg80211: (2402000 KHz - 2472000 KHz @ 40000 KHz), (300 mBi, 2000 mBm) cfg80211: (2457000 KHz - 2482000 KHz @ 20000 KHz), (300 mBi, 2000 mBm) cfg80211: (2474000 KHz - 2494000 KHz @ 20000 KHz), (300 mBi, 2000 mBm) cfg80211: (5170000 KHz - 5250000 KHz @ 40000 KHz), (300 mBi, 2000 mBm) cfg80211: (5735000 KHz - 5835000 KHz @ 40000 KHz), (300 mBi, 2000 mBm) Cc: [email protected] Signed-off-by: Eliad Peller <[email protected]> Signed-off-by: Johannes Berg <[email protected]>

Ignoring interfaces with additional descriptors is not a reliable method for locating the correct interface on Gobi devices. There is at least one device where this method fails: https://bbs.archlinux.org/viewtopic.php?id=143506 The result is that the AT command port (interface #2) is hidden from qcserial, preventing traditional serial modem usage: [ 15.562552] qmi_wwan 4-1.6:1.0: cdc-wdm0: USB WDM device [ 15.562691] qmi_wwan 4-1.6:1.0: wwan0: register 'qmi_wwan' at usb-0000:00:1d.0-1.6, Qualcomm Gobi wwan/QMI device, 1e:df:3c:3a:4e:3b [ 15.563383] qmi_wwan: probe of 4-1.6:1.1 failed with error -22 [ 15.564189] qmi_wwan 4-1.6:1.2: cdc-wdm1: USB WDM device [ 15.564302] qmi_wwan 4-1.6:1.2: wwan1: register 'qmi_wwan' at usb-0000:00:1d.0-1.6, Qualcomm Gobi wwan/QMI device, 1e:df:3c:3a:4e:3b [ 15.564328] qmi_wwan: probe of 4-1.6:1.3 failed with error -22 [ 15.569376] qcserial 4-1.6:1.1: Qualcomm USB modem converter detected [ 15.569440] usb 4-1.6: Qualcomm USB modem converter now attached to ttyUSB0 [ 15.570372] qcserial 4-1.6:1.3: Qualcomm USB modem converter detected [ 15.570430] usb 4-1.6: Qualcomm USB modem converter now attached to ttyUSB1 Use static interface numbers taken from the interface map in qcserial for all Gobi devices instead: Gobi 1K USB layout: 0: serial port (doesn't respond) 1: serial port (doesn't respond) 2: AT-capable modem port 3: QMI/net Gobi 2K+ USB layout: 0: QMI/net 1: DM/DIAG (use libqcdm from ModemManager for communication) 2: AT-capable modem port 3: NMEA This should be more reliable over all, and will also prevent the noisy "probe failed" messages. The whitelisting logic is expected to be replaced by direct interface number matching in 3.6. Reported-by: Heinrich Siebmanns (Harvey) <[email protected]> Cc: <[email protected]> # v3.4: 0000188 USB: qmi_wwan: Make forced int 4 whitelist generic Cc: <[email protected]> # v3.4: f7142e6 USB: qmi_wwan: Add ZTE (Vodafone) K3520-Z Cc: <[email protected]> # v3.4 Signed-off-by: Bjørn Mork <[email protected]> Signed-off-by: David S. Miller <[email protected]>

Denys Fedoryshchenko reported a LOCKDEP issue with l2tp code. [ 8683.927442] ====================================================== [ 8683.927555] [ INFO: possible circular locking dependency detected ] [ 8683.927672] 3.4.1-build-0061 #14 Not tainted [ 8683.927782] ------------------------------------------------------- [ 8683.927895] swapper/0/0 is trying to acquire lock: [ 8683.928007] (slock-AF_INET){+.-...}, at: [<e0fc73ec>] l2tp_xmit_skb+0x173/0x47e [l2tp_core] [ 8683.928121] [ 8683.928121] but task is already holding lock: [ 8683.928121] (_xmit_ETHER#2){+.-...}, at: [<c02f062d>] sch_direct_xmit+0x36/0x119 [ 8683.928121] [ 8683.928121] which lock already depends on the new lock. [ 8683.928121] [ 8683.928121] [ 8683.928121] the existing dependency chain (in reverse order) is: [ 8683.928121] [ 8683.928121] -> #1 (_xmit_ETHER#2){+.-...}: [ 8683.928121] [<c015a561>] lock_acquire+0x71/0x85 [ 8683.928121] [<c034da2d>] _raw_spin_lock+0x33/0x40 [ 8683.928121] [<c0304e0c>] ip_send_reply+0xf2/0x1ce [ 8683.928121] [<c0317dbc>] tcp_v4_send_reset+0x153/0x16f [ 8683.928121] [<c0317f4a>] tcp_v4_do_rcv+0x172/0x194 [ 8683.928121] [<c031929b>] tcp_v4_rcv+0x387/0x5a0 [ 8683.928121] [<c03001d0>] ip_local_deliver_finish+0x13a/0x1e9 [ 8683.928121] [<c0300645>] NF_HOOK.clone.11+0x46/0x4d [ 8683.928121] [<c030075b>] ip_local_deliver+0x41/0x45 [ 8683.928121] [<c03005dd>] ip_rcv_finish+0x31a/0x33c [ 8683.928121] [<c0300645>] NF_HOOK.clone.11+0x46/0x4d [ 8683.928121] [<c0300960>] ip_rcv+0x201/0x23d [ 8683.928121] [<c02de91b>] __netif_receive_skb+0x329/0x378 [ 8683.928121] [<c02deae8>] netif_receive_skb+0x4e/0x7d [ 8683.928121] [<e08d5ef3>] rtl8139_poll+0x243/0x33d [8139too] [ 8683.928121] [<c02df103>] net_rx_action+0x90/0x15d [ 8683.928121] [<c012b2b5>] __do_softirq+0x7b/0x118 [ 8683.928121] [ 8683.928121] -> #0 (slock-AF_INET){+.-...}: [ 8683.928121] [<c0159f1b>] __lock_acquire+0x9a3/0xc27 [ 8683.928121] [<c015a561>] lock_acquire+0x71/0x85 [ 8683.928121] [<c034da2d>] _raw_spin_lock+0x33/0x40 [ 8683.928121] [<e0fc73ec>] l2tp_xmit_skb+0x173/0x47e [l2tp_core] [ 8683.928121] [<e0fe31fb>] l2tp_eth_dev_xmit+0x1a/0x2f [l2tp_eth] [ 8683.928121] [<c02e01e7>] dev_hard_start_xmit+0x333/0x3f2 [ 8683.928121] [<c02f064c>] sch_direct_xmit+0x55/0x119 [ 8683.928121] [<c02e0528>] dev_queue_xmit+0x282/0x418 [ 8683.928121] [<c031f4fb>] NF_HOOK.clone.19+0x45/0x4c [ 8683.928121] [<c031f524>] arp_xmit+0x22/0x24 [ 8683.928121] [<c031f567>] arp_send+0x41/0x48 [ 8683.928121] [<c031fa7d>] arp_process+0x289/0x491 [ 8683.928121] [<c031f4fb>] NF_HOOK.clone.19+0x45/0x4c [ 8683.928121] [<c031f7a0>] arp_rcv+0xb1/0xc3 [ 8683.928121] [<c02de91b>] __netif_receive_skb+0x329/0x378 [ 8683.928121] [<c02de9d3>] process_backlog+0x69/0x130 [ 8683.928121] [<c02df103>] net_rx_action+0x90/0x15d [ 8683.928121] [<c012b2b5>] __do_softirq+0x7b/0x118 [ 8683.928121] [ 8683.928121] other info that might help us debug this: [ 8683.928121] [ 8683.928121] Possible unsafe locking scenario: [ 8683.928121] [ 8683.928121] CPU0 CPU1 [ 8683.928121] ---- ---- [ 8683.928121] lock(_xmit_ETHER#2); [ 8683.928121] lock(slock-AF_INET); [ 8683.928121] lock(_xmit_ETHER#2); [ 8683.928121] lock(slock-AF_INET); [ 8683.928121] [ 8683.928121] *** DEADLOCK *** [ 8683.928121] [ 8683.928121] 3 locks held by swapper/0/0: [ 8683.928121] #0: (rcu_read_lock){.+.+..}, at: [<c02dbc10>] rcu_lock_acquire+0x0/0x30 [ 8683.928121] #1: (rcu_read_lock_bh){.+....}, at: [<c02dbc10>] rcu_lock_acquire+0x0/0x30 [ 8683.928121] #2: (_xmit_ETHER#2){+.-...}, at: [<c02f062d>] sch_direct_xmit+0x36/0x119 [ 8683.928121] [ 8683.928121] stack backtrace: [ 8683.928121] Pid: 0, comm: swapper/0 Not tainted 3.4.1-build-0061 #14 [ 8683.928121] Call Trace: [ 8683.928121] [<c034bdd2>] ? printk+0x18/0x1a [ 8683.928121] [<c0158904>] print_circular_bug+0x1ac/0x1b6 [ 8683.928121] [<c0159f1b>] __lock_acquire+0x9a3/0xc27 [ 8683.928121] [<c015a561>] lock_acquire+0x71/0x85 [ 8683.928121] [<e0fc73ec>] ? l2tp_xmit_skb+0x173/0x47e [l2tp_core] [ 8683.928121] [<c034da2d>] _raw_spin_lock+0x33/0x40 [ 8683.928121] [<e0fc73ec>] ? l2tp_xmit_skb+0x173/0x47e [l2tp_core] [ 8683.928121] [<e0fc73ec>] l2tp_xmit_skb+0x173/0x47e [l2tp_core] [ 8683.928121] [<e0fe31fb>] l2tp_eth_dev_xmit+0x1a/0x2f [l2tp_eth] [ 8683.928121] [<c02e01e7>] dev_hard_start_xmit+0x333/0x3f2 [ 8683.928121] [<c02f064c>] sch_direct_xmit+0x55/0x119 [ 8683.928121] [<c02e0528>] dev_queue_xmit+0x282/0x418 [ 8683.928121] [<c02e02a6>] ? dev_hard_start_xmit+0x3f2/0x3f2 [ 8683.928121] [<c031f4fb>] NF_HOOK.clone.19+0x45/0x4c [ 8683.928121] [<c031f524>] arp_xmit+0x22/0x24 [ 8683.928121] [<c02e02a6>] ? dev_hard_start_xmit+0x3f2/0x3f2 [ 8683.928121] [<c031f567>] arp_send+0x41/0x48 [ 8683.928121] [<c031fa7d>] arp_process+0x289/0x491 [ 8683.928121] [<c031f7f4>] ? __neigh_lookup.clone.20+0x42/0x42 [ 8683.928121] [<c031f4fb>] NF_HOOK.clone.19+0x45/0x4c [ 8683.928121] [<c031f7a0>] arp_rcv+0xb1/0xc3 [ 8683.928121] [<c031f7f4>] ? __neigh_lookup.clone.20+0x42/0x42 [ 8683.928121] [<c02de91b>] __netif_receive_skb+0x329/0x378 [ 8683.928121] [<c02de9d3>] process_backlog+0x69/0x130 [ 8683.928121] [<c02df103>] net_rx_action+0x90/0x15d [ 8683.928121] [<c012b2b5>] __do_softirq+0x7b/0x118 [ 8683.928121] [<c012b23a>] ? local_bh_enable+0xd/0xd [ 8683.928121] <IRQ> [<c012b4d0>] ? irq_exit+0x41/0x91 [ 8683.928121] [<c0103c6f>] ? do_IRQ+0x79/0x8d [ 8683.928121] [<c0157ea1>] ? trace_hardirqs_off_caller+0x2e/0x86 [ 8683.928121] [<c034ef6e>] ? common_interrupt+0x2e/0x34 [ 8683.928121] [<c0108a33>] ? default_idle+0x23/0x38 [ 8683.928121] [<c01091a8>] ? cpu_idle+0x55/0x6f [ 8683.928121] [<c033df25>] ? rest_init+0xa1/0xa7 [ 8683.928121] [<c033de84>] ? __read_lock_failed+0x14/0x14 [ 8683.928121] [<c0498745>] ? start_kernel+0x303/0x30a [ 8683.928121] [<c0498209>] ? repair_env_string+0x51/0x51 [ 8683.928121] [<c04980a8>] ? i386_start_kernel+0xa8/0xaf It appears that like most virtual devices, l2tp should be converted to LLTX mode. This patch takes care of statistics using atomic_long in both RX and TX paths, and fix a bug in l2tp_eth_dev_recv(), which was caching skb->data before a pskb_may_pull() call. Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Denys Fedoryshchenko <[email protected]> Cc: James Chapman <[email protected]> Cc: Hong zhi guo <[email protected]> Cc: Francois Romieu <[email protected]> Signed-off-by: David S. Miller <[email protected]>

A recursive lockdep warning occurs if you call regulator_set_optimum_mode() on a regulator with a supply because there is no nesting annotation for the rdev->mutex. To avoid this warning, get the supply's load before locking the regulator's mutex to avoid grabbing the same class of lock twice. ============================================= [ INFO: possible recursive locking detected ] 3.4.0 #3257 Tainted: G W --------------------------------------------- swapper/0/1 is trying to acquire lock: (&rdev->mutex){+.+.+.}, at: [<c036e9e0>] regulator_get_voltage+0x18/0x38 but task is already holding lock: (&rdev->mutex){+.+.+.}, at: [<c036ef38>] regulator_set_optimum_mode+0x24/0x224 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&rdev->mutex); lock(&rdev->mutex); *** DEADLOCK *** May be due to missing lock nesting notation 3 locks held by swapper/0/1: #0: (&__lockdep_no_validate__){......}, at: [<c03dbb48>] __driver_attach+0x40/0x8c #1: (&__lockdep_no_validate__){......}, at: [<c03dbb58>] __driver_attach+0x50/0x8c #2: (&rdev->mutex){+.+.+.}, at: [<c036ef38>] regulator_set_optimum_mode+0x24/0x224 stack backtrace: [<c001521c>] (unwind_backtrace+0x0/0x12c) from [<c00cc4d4>] (validate_chain+0x760/0x1080) [<c00cc4d4>] (validate_chain+0x760/0x1080) from [<c00cd744>] (__lock_acquire+0x950/0xa10) [<c00cd744>] (__lock_acquire+0x950/0xa10) from [<c00cd990>] (lock_acquire+0x18c/0x1e8) [<c00cd990>] (lock_acquire+0x18c/0x1e8) from [<c080c248>] (mutex_lock_nested+0x68/0x3c4) [<c080c248>] (mutex_lock_nested+0x68/0x3c4) from [<c036e9e0>] (regulator_get_voltage+0x18/0x38) [<c036e9e0>] (regulator_get_voltage+0x18/0x38) from [<c036efb8>] (regulator_set_optimum_mode+0xa4/0x224) ... Signed-off-by: Stephen Boyd <[email protected]> Signed-off-by: Mark Brown <[email protected]>

With commit 49dca5a I introduced a bug (visible if CONFIG_PROVE_RCU is enabled) which occures when a panic has happened: [ 1526.520230] =============================== [ 1526.520230] [ INFO: suspicious RCU usage. ] [ 1526.520230] 3.5.0-rc1+ #12 Not tainted [ 1526.520230] ------------------------------- [ 1526.520230] /c/kernel-tests/mm/include/linux/rcupdate.h:436 Illegal context switch in RCU read-side critical section! [ 1526.520230] [ 1526.520230] other info that might help us debug this: [ 1526.520230] [ 1526.520230] [ 1526.520230] rcu_scheduler_active = 1, debug_locks = 0 [ 1526.520230] 3 locks held by net.agent/3279: [ 1526.520230] #0: (&mm->mmap_sem){++++++}, at: [<ffffffff82f85962>] do_page_fault+0x193/0x390 [ 1526.520230] #1: (panic_lock){+.+...}, at: [<ffffffff82ed2830>] panic+0x37/0x1d3 [ 1526.520230] #2: (rcu_read_lock){.+.+..}, at: [<ffffffff810b9b28>] rcu_lock_acquire+0x0/0x29 [ 1526.520230] [ 1526.520230] stack backtrace: [ 1526.520230] Pid: 3279, comm: net.agent Not tainted 3.5.0-rc1+ #12 [ 1526.520230] Call Trace: [ 1526.520230] [<ffffffff810e1570>] lockdep_rcu_suspicious+0x109/0x112 [ 1526.520230] [<ffffffff810bfe3a>] rcu_preempt_sleep_check+0x45/0x47 [ 1526.520230] [<ffffffff810bfe5a>] __might_sleep+0x1e/0x19a [ 1526.520230] [<ffffffff82f8010e>] down_write+0x26/0x81 [ 1526.520230] [<ffffffff8276a966>] led_trigger_unregister+0x1f/0x9c [ 1526.520230] [<ffffffff8276def5>] heartbeat_reboot_notifier+0x15/0x19 [ 1526.520230] [<ffffffff82f85bf5>] notifier_call_chain+0x96/0xcd [ 1526.520230] [<ffffffff82f85cba>] __atomic_notifier_call_chain+0x8e/0xff [ 1526.520230] [<ffffffff81094b7c>] ? kmsg_dump+0x37/0x1eb [ 1526.520230] [<ffffffff82f85d3f>] atomic_notifier_call_chain+0x14/0x16 [ 1526.520230] [<ffffffff82ed28e1>] panic+0xe8/0x1d3 [ 1526.520230] [<ffffffff811473e2>] out_of_memory+0x15d/0x1d3 So in case of a panic, now just turn of the LED. Other approaches like scheduling a work to unregister the trigger aren't working because there isn't much which still runs after a panic occured (except timers). Signed-off-by: Alexander Holler <[email protected]> Signed-off-by: Bryan Wu <[email protected]>

tvp5150 driver probe function doesn't check if the chip is present. Thus the driver can be loaded without having a device. This is dangerous and can cause kernel crash like this: Kernel BUG at c03c0964 [verbose debug info unavailable] Internal error: Oops - BUG: 0 [#1] PREEMPT ARM Modules linked in: CPU: 0 Tainted: G W (3.4.0-cm-t3730+ #2) PC is at media_entity_create_link+0xe4/0xf4 LR is at isp_register_entities+0x228/0x2f4 pc : [<c03c0964>] lr : [<c03f3b30>] psr: 60000013 sp : cf02de50 ip : 00000000 fp : c079405c r10: 00000000 r9 : 00000000 r8 : cf33c800 r7 : c0794834 r6 : 00000000 r5 : 00000000 r4 : cf365b48 r3 : 00000000 r2 : cf365b48 r1 : 00000000 r0 : cf33c800 Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel Control: 10c5387d Table: 80004019 DAC: 00000015 Process swapper (pid: 1, stack limit = 0xcf02c2f0) Stack: (0xcf02de50 to 0xcf02e000) de40: cf360000 cf366f28 cf366218 c0794834 de60: cf365b48 00000000 cf33c800 c03f3b30 00000000 00000000 cf360890 cf3668a0 de80: 00000003 cf360000 c0785a58 00000000 cf360528 00000000 00000000 00003fff dea0: cf360500 c03f4cdc c06cc1f4 cf360000 c0785a58 c0d27808 c07d55ec c0785a58 dec0: c031f0e0 c07d55ec c0776900 000000bb 00000000 c032040c c03203f4 c031ef0c dee0: c0785a58 c07d55ec c0785a8c c031f0e0 c075e670 c031f0c8 cf02deb8 c0785a58 df00: c07d55ec c031f174 c07d55ec 00000000 cf02df18 c031d7a0 cf01d4a8 cf068b10 df20: 00000000 c07d55ec c07c74d0 cf34bcc0 00000000 c031ded8 c0672340 c054cd38 df40: 00000000 c07e68c0 c07d55ec 00000000 00000000 c075e670 c0776900 000000bb df60: 00000000 c031f770 c07e68c0 00000007 c07e68c0 00000000 c075e670 c0008790 df80: 000000bb 00000006 00000006 c066e650 cf02dfa4 c07689b8 c07689b8 00000007 dfa0: c07e68c0 c073f2e8 c07689c0 000000bb 00000000 c073f2bc 00000006 00000006 dfc0: c073f2e8 00000000 c077649c c077649c c00150cc 00000013 00000000 00000000 dfe0: 00000000 c073f3cc cf02dfe8 00000000 c073f368 c00150cc 00000000 00000000 [<c03c0964>] (media_entity_create_link+0xe4/0xf4) from [<c03f3b30>] (isp_register_entities+0x228/0x2f4) [<c03f3b30>] (isp_register_entities+0x228/0x2f4) from [<c03f4cdc>] (isp_probe+0x7ac/0x9b8) [<c03f4cdc>] (isp_probe+0x7ac/0x9b8) from [<c032040c>] (platform_drv_probe+0x18/0x1c) [<c032040c>] (platform_drv_probe+0x18/0x1c) from [<c031ef0c>] (really_probe+0x64/0x1d8) [<c031ef0c>] (really_probe+0x64/0x1d8) from [<c031f0c8>] (driver_probe_device+0x48/0x60) [<c031f0c8>] (driver_probe_device+0x48/0x60) from [<c031f174>] (__driver_attach+0x94/0x98) [<c031f174>] (__driver_attach+0x94/0x98) from [<c031d7a0>] (bus_for_each_dev+0x54/0x80) [<c031d7a0>] (bus_for_each_dev+0x54/0x80) from [<c031ded8>] (bus_add_driver+0xac/0x2a8) [<c031ded8>] (bus_add_driver+0xac/0x2a8) from [<c031f770>] (driver_register+0x78/0x180) [<c031f770>] (driver_register+0x78/0x180) from [<c0008790>] (do_one_initcall+0x34/0x184) [<c0008790>] (do_one_initcall+0x34/0x184) from [<c073f2bc>] (do_basic_setup+0x9c/0xc8) [<c073f2bc>] (do_basic_setup+0x9c/0xc8) from [<c073f3cc>] (kernel_init+0x64/0xec) [<c073f3cc>] (kernel_init+0x64/0xec) from [<c00150cc>] (kernel_thread_exit+0x0/0x8) Code: e1c812b6 e8bd87f0 e7f001f2 eafffffe (e7f001f2) ---[ end trace 3ed3c618b26ff3e8 ]--- Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b This patch fixes the tvp5150_read() function to return an error in case the I2C transaction fails. tvp5150_probe() and other relevant driver callbacks changed to check the status of the I2C read operations. In case of a read error throw an error message with v4l2_err() instead of v4l2_dbg(). [[email protected]: Fix a small typo breaking compilation] Signed-off-by: Dmitry Lifshitz <[email protected]> Signed-off-by: Igor Grinberg <[email protected]> Signed-off-by: Mauro Carvalho Chehab <[email protected]>

While running hotplug tests I ran into this RCU splat =============================== [ INFO: suspicious RCU usage. ] 3.4.0 #3275 Tainted: G W ------------------------------- include/linux/rcupdate.h:729 rcu_read_lock() used illegally while idle! other info that might help us debug this: RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0 RCU used illegally from extended quiescent state! 4 locks held by swapper/2/0: #0: ((cpu_died).wait.lock){......}, at: [<c00ab128>] complete+0x1c/0x5c #1: (&p->pi_lock){-.-.-.}, at: [<c00b275c>] try_to_wake_up+0x2c/0x388 #2: (&rq->lock){-.-.-.}, at: [<c00b2860>] try_to_wake_up+0x130/0x388 #3: (rcu_read_lock){.+.+..}, at: [<c00abe5c>] cpuacct_charge+0x28/0x1f4 stack backtrace: [<c001521c>] (unwind_backtrace+0x0/0x12c) from [<c00abec8>] (cpuacct_charge+0x94/0x1f4) [<c00abec8>] (cpuacct_charge+0x94/0x1f4) from [<c00b395c>] (update_curr+0x24c/0x2c8) [<c00b395c>] (update_curr+0x24c/0x2c8) from [<c00b59c4>] (enqueue_task_fair+0x50/0x194) [<c00b59c4>] (enqueue_task_fair+0x50/0x194) from [<c00afea4>] (enqueue_task+0x30/0x34) [<c00afea4>] (enqueue_task+0x30/0x34) from [<c00b0908>] (ttwu_activate+0x14/0x38) [<c00b0908>] (ttwu_activate+0x14/0x38) from [<c00b28a8>] (try_to_wake_up+0x178/0x388) [<c00b28a8>] (try_to_wake_up+0x178/0x388) from [<c00a82a0>] (__wake_up_common+0x34/0x78) [<c00a82a0>] (__wake_up_common+0x34/0x78) from [<c00ab154>] (complete+0x48/0x5c) [<c00ab154>] (complete+0x48/0x5c) from [<c07db7cc>] (cpu_die+0x2c/0x58) [<c07db7cc>] (cpu_die+0x2c/0x58) from [<c000f954>] (cpu_idle+0x64/0xfc) [<c000f954>] (cpu_idle+0x64/0xfc) from [<80208160>] (0x80208160) When a cpu is marked offline during its idle thread it calls cpu_die() during an RCU idle period. cpu_die() calls complete() to notify the killing process that the cpu has died. complete() calls into the scheduler code and eventually grabs an RCU read lock in cpuacct_charge(). Mark complete() as RCU_NONIDLE so that RCU pays attention to this CPU for the duration of the complete() function even though it's in idle. Suggested-by: "Paul E. McKenney" <[email protected]> Signed-off-by: Stephen Boyd <[email protected]> Signed-off-by: Russell King <[email protected]>

Jian found that when he ran fsx on a 32 bit arch with a large wsize the process and one of the bdi writeback kthreads would sometimes deadlock with a stack trace like this: crash> bt PID: 2789 TASK: f02edaa0 CPU: 3 COMMAND: "fsx" #0 [eed63cbc] schedule at c083c5b3 #1 [eed63d80] kmap_high at c0500ec8 #2 [eed63db0] cifs_async_writev at f7fabcd7 [cifs] #3 [eed63df0] cifs_writepages at f7fb7f5c [cifs] #4 [eed63e50] do_writepages at c04f3e32 #5 [eed63e54] __filemap_fdatawrite_range at c04e152a #6 [eed63ea4] filemap_fdatawrite at c04e1b3e #7 [eed63eb4] cifs_file_aio_write at f7fa111a [cifs] #8 [eed63ecc] do_sync_write at c052d202 #9 [eed63f74] vfs_write at c052d4ee #10 [eed63f94] sys_write at c052df4c #11 [eed63fb0] ia32_sysenter_target at c0409a98 EAX: 00000004 EBX: 00000003 ECX: abd73b73 EDX: 012a65c6 DS: 007b ESI: 012a65c6 ES: 007b EDI: 00000000 SS: 007b ESP: bf8db178 EBP: bf8db1f8 GS: 0033 CS: 0073 EIP: 40000424 ERR: 00000004 EFLAGS: 00000246 Each task would kmap part of its address array before getting stuck, but not enough to actually issue the write. This patch fixes this by serializing the marshal_iov operations for async reads and writes. The idea here is to ensure that cifs aggressively tries to populate a request before attempting to fulfill another one. As soon as all of the pages are kmapped for a request, then we can unlock and allow another one to proceed. There's no need to do this serialization on non-CONFIG_HIGHMEM arches however, so optimize all of this out when CONFIG_HIGHMEM isn't set. Cc: <[email protected]> Reported-by: Jian Li <[email protected]> Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Steve French <[email protected]>

…d reasons We've had some reports of a deadlock where rpciod ends up with a stack trace like this: PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14" #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9 #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs] #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8 #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs] #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs] #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670 #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271 #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638 #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942 #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9 #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808 #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6 #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7 #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc] #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc] #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0 #24 [ffff8810343bfee8] kthread at ffffffff8108dd96 #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca rpciod is trying to allocate memory for a new socket to talk to the server. The VM ends up calling ->releasepage to get more memory, and it tries to do a blocking commit. That commit can't succeed however without a connected socket, so we deadlock. Fix this by setting PF_FSTRANS on the workqueue task prior to doing the socket allocation, and having nfs_release_page check for that flag when deciding whether to do a commit call. Also, set PF_FSTRANS unconditionally in rpc_async_schedule since that function can also do allocations sometimes. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Trond Myklebust <[email protected]> Cc: [email protected]

On architectures where cputime_t is 64 bit type, is possible to trigger divide by zero on do_div(temp, (__force u32) total) line, if total is a non zero number but has lower 32 bit's zeroed. Removing casting is not a good solution since some do_div() implementations do cast to u32 internally. This problem can be triggered in practice on very long lived processes: PID: 2331 TASK: ffff880472814b00 CPU: 2 COMMAND: "oraagent.bin" #0 [ffff880472a51b70] machine_kexec at ffffffff8103214b #1 [ffff880472a51bd0] crash_kexec at ffffffff810b91c2 #2 [ffff880472a51ca0] oops_end at ffffffff814f0b00 #3 [ffff880472a51cd0] die at ffffffff8100f26b #4 [ffff880472a51d00] do_trap at ffffffff814f03f4 #5 [ffff880472a51d60] do_divide_error at ffffffff8100cfff #6 [ffff880472a51e00] divide_error at ffffffff8100be7b [exception RIP: thread_group_times+0x56] RIP: ffffffff81056a16 RSP: ffff880472a51eb8 RFLAGS: 00010046 RAX: bc3572c9fe12d194 RBX: ffff880874150800 RCX: 0000000110266fad RDX: 0000000000000000 RSI: ffff880472a51eb8 RDI: 001038ae7d9633dc RBP: ffff880472a51ef8 R8: 00000000b10a3a64 R9: ffff880874150800 R10: 00007fcba27ab680 R11: 0000000000000202 R12: ffff880472a51f08 R13: ffff880472a51f10 R14: 0000000000000000 R15: 0000000000000007 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffff880472a51f00] do_sys_times at ffffffff8108845d #8 [ffff880472a51f40] sys_times at ffffffff81088524 #9 [ffff880472a51f80] system_call_fastpath at ffffffff8100b0f2 RIP: 0000003808caac3a RSP: 00007fcba27ab6d8 RFLAGS: 00000202 RAX: 0000000000000064 RBX: ffffffff8100b0f2 RCX: 0000000000000000 RDX: 00007fcba27ab6e0 RSI: 000000000076d58e RDI: 00007fcba27ab6e0 RBP: 00007fcba27ab700 R8: 0000000000000020 R9: 000000000000091b R10: 00007fcba27ab680 R11: 0000000000000202 R12: 00007fff9ca41940 R13: 0000000000000000 R14: 00007fcba27ac9c0 R15: 00007fff9ca41940 ORIG_RAX: 0000000000000064 CS: 0033 SS: 002b Cc: [email protected] Signed-off-by: Stanislaw Gruszka <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>

Commit 6f458df (tcp: improve latencies of timer triggered events) added bug leading to following trace : [ 2866.131281] IPv4: Attempt to release TCP socket in state 1 ffff880019ec0000 [ 2866.131726] [ 2866.132188] ========================= [ 2866.132281] [ BUG: held lock freed! ] [ 2866.132281] 3.6.0-rc1+ #622 Not tainted [ 2866.132281] ------------------------- [ 2866.132281] kworker/0:1/652 is freeing memory ffff880019ec0000-ffff880019ec0a1f, with a lock still held there! [ 2866.132281] (sk_lock-AF_INET-RPC){+.+...}, at: [<ffffffff81903619>] tcp_sendmsg+0x29/0xcc6 [ 2866.132281] 4 locks held by kworker/0:1/652: [ 2866.132281] #0: (rpciod){.+.+.+}, at: [<ffffffff81083567>] process_one_work+0x1de/0x47f [ 2866.132281] #1: ((&task->u.tk_work)){+.+.+.}, at: [<ffffffff81083567>] process_one_work+0x1de/0x47f [ 2866.132281] #2: (sk_lock-AF_INET-RPC){+.+...}, at: [<ffffffff81903619>] tcp_sendmsg+0x29/0xcc6 [ 2866.132281] #3: (&icsk->icsk_retransmit_timer){+.-...}, at: [<ffffffff81078017>] run_timer_softirq+0x1ad/0x35f [ 2866.132281] [ 2866.132281] stack backtrace: [ 2866.132281] Pid: 652, comm: kworker/0:1 Not tainted 3.6.0-rc1+ #622 [ 2866.132281] Call Trace: [ 2866.132281] <IRQ> [<ffffffff810bc527>] debug_check_no_locks_freed+0x112/0x159 [ 2866.132281] [<ffffffff818a0839>] ? __sk_free+0xfd/0x114 [ 2866.132281] [<ffffffff811549fa>] kmem_cache_free+0x6b/0x13a [ 2866.132281] [<ffffffff818a0839>] __sk_free+0xfd/0x114 [ 2866.132281] [<ffffffff818a08c0>] sk_free+0x1c/0x1e [ 2866.132281] [<ffffffff81911e1c>] tcp_write_timer+0x51/0x56 [ 2866.132281] [<ffffffff81078082>] run_timer_softirq+0x218/0x35f [ 2866.132281] [<ffffffff81078017>] ? run_timer_softirq+0x1ad/0x35f [ 2866.132281] [<ffffffff810f5831>] ? rb_commit+0x58/0x85 [ 2866.132281] [<ffffffff81911dcb>] ? tcp_write_timer_handler+0x148/0x148 [ 2866.132281] [<ffffffff81070bd6>] __do_softirq+0xcb/0x1f9 [ 2866.132281] [<ffffffff81a0a00c>] ? _raw_spin_unlock+0x29/0x2e [ 2866.132281] [<ffffffff81a1227c>] call_softirq+0x1c/0x30 [ 2866.132281] [<ffffffff81039f38>] do_softirq+0x4a/0xa6 [ 2866.132281] [<ffffffff81070f2b>] irq_exit+0x51/0xad [ 2866.132281] [<ffffffff81a129cd>] do_IRQ+0x9d/0xb4 [ 2866.132281] [<ffffffff81a0a3ef>] common_interrupt+0x6f/0x6f [ 2866.132281] <EOI> [<ffffffff8109d006>] ? sched_clock_cpu+0x58/0xd1 [ 2866.132281] [<ffffffff81a0a172>] ? _raw_spin_unlock_irqrestore+0x4c/0x56 [ 2866.132281] [<ffffffff81078692>] mod_timer+0x178/0x1a9 [ 2866.132281] [<ffffffff818a00aa>] sk_reset_timer+0x19/0x26 [ 2866.132281] [<ffffffff8190b2cc>] tcp_rearm_rto+0x99/0xa4 [ 2866.132281] [<ffffffff8190dfba>] tcp_event_new_data_sent+0x6e/0x70 [ 2866.132281] [<ffffffff8190f7ea>] tcp_write_xmit+0x7de/0x8e4 [ 2866.132281] [<ffffffff818a565d>] ? __alloc_skb+0xa0/0x1a1 [ 2866.132281] [<ffffffff8190f952>] __tcp_push_pending_frames+0x2e/0x8a [ 2866.132281] [<ffffffff81904122>] tcp_sendmsg+0xb32/0xcc6 [ 2866.132281] [<ffffffff819229c2>] inet_sendmsg+0xaa/0xd5 [ 2866.132281] [<ffffffff81922918>] ? inet_autobind+0x5f/0x5f [ 2866.132281] [<ffffffff810ee7f1>] ? trace_clock_local+0x9/0xb [ 2866.132281] [<ffffffff8189adab>] sock_sendmsg+0xa3/0xc4 [ 2866.132281] [<ffffffff810f5de6>] ? rb_reserve_next_event+0x26f/0x2d5 [ 2866.132281] [<ffffffff8103e6a9>] ? native_sched_clock+0x29/0x6f [ 2866.132281] [<ffffffff8103e6f8>] ? sched_clock+0x9/0xd [ 2866.132281] [<ffffffff810ee7f1>] ? trace_clock_local+0x9/0xb [ 2866.132281] [<ffffffff8189ae03>] kernel_sendmsg+0x37/0x43 [ 2866.132281] [<ffffffff8199ce49>] xs_send_kvec+0x77/0x80 [ 2866.132281] [<ffffffff8199cec1>] xs_sendpages+0x6f/0x1a0 [ 2866.132281] [<ffffffff8107826d>] ? try_to_del_timer_sync+0x55/0x61 [ 2866.132281] [<ffffffff8199d0d2>] xs_tcp_send_request+0x55/0xf1 [ 2866.132281] [<ffffffff8199bb90>] xprt_transmit+0x89/0x1db [ 2866.132281] [<ffffffff81999bcd>] ? call_connect+0x3c/0x3c [ 2866.132281] [<ffffffff81999d92>] call_transmit+0x1c5/0x20e [ 2866.132281] [<ffffffff819a0d55>] __rpc_execute+0x6f/0x225 [ 2866.132281] [<ffffffff81999bcd>] ? call_connect+0x3c/0x3c [ 2866.132281] [<ffffffff819a0f33>] rpc_async_schedule+0x28/0x34 [ 2866.132281] [<ffffffff810835d6>] process_one_work+0x24d/0x47f [ 2866.132281] [<ffffffff81083567>] ? process_one_work+0x1de/0x47f [ 2866.132281] [<ffffffff819a0f0b>] ? __rpc_execute+0x225/0x225 [ 2866.132281] [<ffffffff81083a6d>] worker_thread+0x236/0x317 [ 2866.132281] [<ffffffff81083837>] ? process_scheduled_works+0x2f/0x2f [ 2866.132281] [<ffffffff8108b7b8>] kthread+0x9a/0xa2 [ 2866.132281] [<ffffffff81a12184>] kernel_thread_helper+0x4/0x10 [ 2866.132281] [<ffffffff81a0a4b0>] ? retint_restore_args+0x13/0x13 [ 2866.132281] [<ffffffff8108b71e>] ? __init_kthread_worker+0x5a/0x5a [ 2866.132281] [<ffffffff81a12180>] ? gs_change+0x13/0x13 [ 2866.308506] IPv4: Attempt to release TCP socket in state 1 ffff880019ec0000 [ 2866.309689] ============================================================================= [ 2866.310254] BUG TCP (Not tainted): Object already free [ 2866.310254] ----------------------------------------------------------------------------- [ 2866.310254] The bug comes from the fact that timer set in sk_reset_timer() can run before we actually do the sock_hold(). socket refcount reaches zero and we free the socket too soon. timer handler is not allowed to reduce socket refcnt if socket is owned by the user, or we need to change sk_reset_timer() implementation. We should take a reference on the socket in case TCP_DELACK_TIMER_DEFERRED or TCP_DELACK_TIMER_DEFERRED bit are set in tsq_flags Also fix a typo in tcp_delack_timer(), where TCP_WRITE_TIMER_DEFERRED was used instead of TCP_DELACK_TIMER_DEFERRED. For consistency, use same socket refcount change for TCP_MTU_REDUCED_DEFERRED, even if not fired from a timer. Reported-by: Fengguang Wu <[email protected]> Tested-by: Fengguang Wu <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>

Not all I/O ASIC versions have the free-running counter implemented, an early revision used in the 5000/1xx models aka 3MIN and 4MIN did not have it. Therefore we cannot unconditionally use it as a clock source. Fortunately if not implemented its register slot has a fixed value so it is enough if we check for the value at the end of the calibration period being the same as at the beginning. This also means we need to look for another high-precision clock source on the systems affected. The 5000/1xx can have an R4000SC processor installed where the CP0 Count register can be used as a clock source. Unfortunately all the R4k DECstations suffer from the missed timer interrupt on CP0 Count reads erratum, so we cannot use the CP0 timer as a clock source and a clock event both at a time. However we never need an R4k clock event device because all DECstations have a DS1287A RTC chip whose periodic interrupt can be used as a clock source. This gives us the following four configuration possibilities for I/O ASIC DECstations: 1. No I/O ASIC counter and no CP0 timer, e.g. R3k 5000/1xx (3MIN). 2. No I/O ASIC counter but the CP0 timer, i.e. R4k 5000/150 (4MIN). 3. The I/O ASIC counter but no CP0 timer, e.g. R3k 5000/240 (3MAX+). 4. The I/O ASIC counter and the CP0 timer, e.g. R4k 5000/260 (4MAX+). For #1 and #2 this change stops the I/O ASIC free-running counter from being installed as a clock source of a 0Hz frequency. For #2 it also arranges for the CP0 timer to be used as a clock source rather than a clock event device, because having an accurate wall clock is more important than a high-precision interval timer. For #3 there is no change. For #4 the change makes the I/O ASIC free-running counter installed as a clock source so that the CP0 timer can be used as a clock event device. Unfortunately the use of the CP0 timer as a clock event device relies on a succesful completion of c0_compare_interrupt. That never happens, because while waiting for a CP0 Compare interrupt to happen the function spins in a loop reading the CP0 Count register. This makes the CP0 Count erratum trigger reliably causing the interrupt waited for to be lost in all cases. As a result #4 resorts to using the CP0 timer as a clock source as well, just as #2. However we want to keep this separate arrangement in case (hope) c0_compare_interrupt is eventually rewritten such that it avoids the erratum. Signed-off-by: Maciej W. Rozycki <[email protected]> Cc: [email protected] Patchwork: https://patchwork.linux-mips.org/patch/5825/ Signed-off-by: Ralf Baechle <[email protected]>

When parsing lines from objdump a line containing source code starting with a numeric label is mistaken for a line of disassembly starting with a memory address. Current validation fails to recognise that the "memory address" is out of range and calculates an invalid offset which later causes this segfault: Program received signal SIGSEGV, Segmentation fault. 0x0000000000457315 in disasm__calc_percent (notes=0xc98970, evidx=0, offset=143705, end=2127526177, path=0x7fffffffbf50) at util/annotate.c:631 631 hits += h->addr[offset++]; (gdb) bt #0 0x0000000000457315 in disasm__calc_percent (notes=0xc98970, evidx=0, offset=143705, end=2127526177, path=0x7fffffffbf50) at util/annotate.c:631 #1 0x00000000004d65e3 in annotate_browser__calc_percent (browser=0x7fffffffd130, evsel=0xa01da0) at ui/browsers/annotate.c:364 #2 0x00000000004d7433 in annotate_browser__run (browser=0x7fffffffd130, evsel=0xa01da0, hbt=0x0) at ui/browsers/annotate.c:672 #3 0x00000000004d80c9 in symbol__tui_annotate (sym=0xc989a0, map=0xa02660, evsel=0xa01da0, hbt=0x0) at ui/browsers/annotate.c:962 #4 0x00000000004d7aa0 in hist_entry__tui_annotate (he=0xdf73f0, evsel=0xa01da0, hbt=0x0) at ui/browsers/annotate.c:823 #5 0x00000000004dd648 in perf_evsel__hists_browse (evsel=0xa01da0, nr_events=1, helpline= 0x58b768 "For a higher level overview, try: perf report --sort comm,dso", ev_name=0xa02cd0 "cycles", left_exits=false, hbt= 0x0, min_pcnt=0, env=0xa011e0) at ui/browsers/hists.c:1659 #6 0x00000000004de372 in perf_evlist__tui_browse_hists (evlist=0xa01520, help= 0x58b768 "For a higher level overview, try: perf report --sort comm,dso", hbt=0x0, min_pcnt=0, env=0xa011e0) at ui/browsers/hists.c:1950 #7 0x000000000042cf6b in __cmd_report (rep=0x7fffffffd6c0) at builtin-report.c:581 #8 0x000000000042e25d in cmd_report (argc=0, argv=0x7fffffffe4b0, prefix=0x0) at builtin-report.c:965 #9 0x000000000041a0e1 in run_builtin (p=0x801548, argc=1, argv=0x7fffffffe4b0) at perf.c:319 #10 0x000000000041a319 in handle_internal_command (argc=1, argv=0x7fffffffe4b0) at perf.c:376 #11 0x000000000041a465 in run_argv (argcp=0x7fffffffe38c, argv=0x7fffffffe380) at perf.c:420 #12 0x000000000041a707 in main (argc=1, argv=0x7fffffffe4b0) at perf.c:521 After the fix is applied the symbol can be annotated showing the problematic line "1: rep" copy_user_generic_string /usr/lib/debug/lib/modules/3.9.10-100.fc17.x86_64/vmlinux */ ENTRY(copy_user_generic_string) CFI_STARTPROC ASM_STAC andl %edx,%edx and %edx,%edx jz 4f je 37 cmpl $8,%edx cmp $0x8,%edx jb 2f /* less than 8 bytes, go to byte copy loop */ jb 33 ALIGN_DESTINATION mov %edi,%ecx and $0x7,%ecx je 28 sub $0x8,%ecx neg %ecx sub %ecx,%edx 1a: mov (%rsi),%al mov %al,(%rdi) inc %rsi inc %rdi dec %ecx jne 1a movl %edx,%ecx 28: mov %edx,%ecx shrl $3,%ecx shr $0x3,%ecx andl $7,%edx and $0x7,%edx 1: rep 100.00 rep movsq %ds:(%rsi),%es:(%rdi) movsq 2: movl %edx,%ecx 33: mov %edx,%ecx 3: rep rep movsb %ds:(%rsi),%es:(%rdi) movsb 4: xorl %eax,%eax 37: xor %eax,%eax data32 xchg %ax,%ax ASM_CLAC ret retq Signed-off-by: Adrian Hunter <[email protected]> Cc: David Ahern <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>

This patch fixes the issues indicated by the test results that ipmi_msg_handler() is invoked in atomic context. BUG: scheduling while atomic: kipmi0/18933/0x10000100 Modules linked in: ipmi_si acpi_ipmi ... CPU: 3 PID: 18933 Comm: kipmi0 Tainted: G AW 3.10.0-rc7+ #2 Hardware name: QCI QSSC-S4R/QSSC-S4R, BIOS QSSC-S4R.QCI.01.00.0027.070120100606 07/01/2010 ffff8838245eea00 ffff88103fc63c98 ffffffff814c4a1e ffff88103fc63ca8 ffffffff814bfbab ffff88103fc63d28 ffffffff814c73e0 ffff88103933cbd4 0000000000000096 ffff88103fc63ce8 ffff88102f618000 ffff881035c01fd8 Call Trace: <IRQ> [<ffffffff814c4a1e>] dump_stack+0x19/0x1b [<ffffffff814bfbab>] __schedule_bug+0x46/0x54 [<ffffffff814c73e0>] __schedule+0x83/0x59c [<ffffffff81058853>] __cond_resched+0x22/0x2d [<ffffffff814c794b>] _cond_resched+0x14/0x1d [<ffffffff814c6d82>] mutex_lock+0x11/0x32 [<ffffffff8101e1e9>] ? __default_send_IPI_dest_field.constprop.0+0x53/0x58 [<ffffffffa09e3f9c>] ipmi_msg_handler+0x23/0x166 [ipmi_si] [<ffffffff812bf6e4>] deliver_response+0x55/0x5a [<ffffffff812c0fd4>] handle_new_recv_msgs+0xb67/0xc65 [<ffffffff81007ad1>] ? read_tsc+0x9/0x19 [<ffffffff814c8620>] ? _raw_spin_lock_irq+0xa/0xc [<ffffffffa09e1128>] ipmi_thread+0x5c/0x146 [ipmi_si] ... Also Tony Camuso says: We were getting occasional "Scheduling while atomic" call traces during boot on some systems. Problem was first seen on a Cisco C210 but we were able to reproduce it on a Cisco c220m3. Setting CONFIG_LOCKDEP and LOCKDEP_SUPPORT to 'y' exposed a lockdep around tx_msg_lock in acpi_ipmi.c struct acpi_ipmi_device. ================================= [ INFO: inconsistent lock state ] 2.6.32-415.el6.x86_64-debug-splck #1 --------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. ksoftirqd/3/17 [HC0[0]:SC1[1]:HE1:SE0] takes: (&ipmi_device->tx_msg_lock){+.?...}, at: [<ffffffff81337a27>] ipmi_msg_handler+0x71/0x126 {SOFTIRQ-ON-W} state was registered at: [<ffffffff810ba11c>] __lock_acquire+0x63c/0x1570 [<ffffffff810bb0f4>] lock_acquire+0xa4/0x120 [<ffffffff815581cc>] __mutex_lock_common+0x4c/0x400 [<ffffffff815586ea>] mutex_lock_nested+0x4a/0x60 [<ffffffff8133789d>] acpi_ipmi_space_handler+0x11b/0x234 [<ffffffff81321c62>] acpi_ev_address_space_dispatch+0x170/0x1be The fix implemented by this change has been tested by Tony: Tested the patch in a boot loop with lockdep debug enabled and never saw the problem in over 400 reboots. Reported-and-tested-by: Tony Camuso <[email protected]> Signed-off-by: Lv Zheng <[email protected]> Reviewed-by: Huang Ying <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>

Michael Semon reported that xfs/299 generated this lockdep warning: ============================================= [ INFO: possible recursive locking detected ] 3.12.0-rc2+ #2 Not tainted --------------------------------------------- touch/21072 is trying to acquire lock: (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64 but task is already holding lock: (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&xfs_dquot_other_class); lock(&xfs_dquot_other_class); *** DEADLOCK *** May be due to missing lock nesting notation 7 locks held by touch/21072: #0: (sb_writers#10){++++.+}, at: [<c11185b6>] mnt_want_write+0x1e/0x3e #1: (&type->i_mutex_dir_key#4){+.+.+.}, at: [<c11078ee>] do_last+0x245/0xe40 #2: (sb_internal#2){++++.+}, at: [<c122c9e0>] xfs_trans_alloc+0x1f/0x35 #3: (&(&ip->i_lock)->mr_lock/1){+.+...}, at: [<c126cd1b>] xfs_ilock+0x100/0x1f1 #4: (&(&ip->i_lock)->mr_lock){++++-.}, at: [<c126cf52>] xfs_ilock_nowait+0x105/0x22f #5: (&dqp->q_qlock){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64 #6: (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64 The lockdep annotation for dquot lock nesting only understands locking for user and "other" dquots, not user, group and quota dquots. Fix the annotations to match the locking heirarchy we now have. Reported-by: Michael L. Semon <[email protected]> Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Ben Myers <[email protected]> Signed-off-by: Ben Myers <[email protected]> (cherry picked from commit f112a04)

When btrfs creates a bioset, we must also allocate the integrity data pool. Otherwise btrfs will crash when it tries to submit a bio to a checksumming disk: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 IP: [<ffffffff8111e28a>] mempool_alloc+0x4a/0x150 PGD 2305e4067 PUD 23063d067 PMD 0 Oops: 0000 [#1] PREEMPT SMP Modules linked in: btrfs scsi_debug xfs ext4 jbd2 ext3 jbd mbcache sch_fq_codel eeprom lpc_ich mfd_core nfsd exportfs auth_rpcgss af_packet raid6_pq xor zlib_deflate libcrc32c [last unloaded: scsi_debug] CPU: 1 PID: 4486 Comm: mount Not tainted 3.12.0-rc1-mcsum #2 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff8802451c9720 ti: ffff880230698000 task.ti: ffff880230698000 RIP: 0010:[<ffffffff8111e28a>] [<ffffffff8111e28a>] mempool_alloc+0x4a/0x150 RSP: 0018:ffff880230699688 EFLAGS: 00010286 RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000005f8445 RDX: 0000000000000001 RSI: 0000000000000010 RDI: 0000000000000000 RBP: ffff8802306996f8 R08: 0000000000011200 R09: 0000000000000008 R10: 0000000000000020 R11: ffff88009d6e8000 R12: 0000000000011210 R13: 0000000000000030 R14: ffff8802306996b8 R15: ffff8802451c9720 FS: 00007f25b8a16800(0000) GS:ffff88024fc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000018 CR3: 0000000230576000 CR4: 00000000000007e0 Stack: ffff8802451c9720 0000000000000002 ffffffff81a97100 0000000000281250 ffffffff81a96480 ffff88024fc99150 ffff880228d18200 0000000000000000 0000000000000000 0000000000000040 ffff880230e8c2e8 ffff8802459dc900 Call Trace: [<ffffffff811b2208>] bio_integrity_alloc+0x48/0x1b0 [<ffffffff811b26fc>] bio_integrity_prep+0xac/0x360 [<ffffffff8111e298>] ? mempool_alloc+0x58/0x150 [<ffffffffa03e8041>] ? alloc_extent_state+0x31/0x110 [btrfs] [<ffffffff81241579>] blk_queue_bio+0x1c9/0x460 [<ffffffff8123e58a>] generic_make_request+0xca/0x100 [<ffffffff8123e639>] submit_bio+0x79/0x160 [<ffffffffa03f865e>] btrfs_map_bio+0x48e/0x5b0 [btrfs] [<ffffffffa03c821a>] btree_submit_bio_hook+0xda/0x110 [btrfs] [<ffffffffa03e7eba>] submit_one_bio+0x6a/0xa0 [btrfs] [<ffffffffa03ef450>] read_extent_buffer_pages+0x250/0x310 [btrfs] [<ffffffff8125eef6>] ? __radix_tree_preload+0x66/0xf0 [<ffffffff8125f1c5>] ? radix_tree_insert+0x95/0x260 [<ffffffffa03c66f6>] btree_read_extent_buffer_pages.constprop.128+0xb6/0x120 [btrfs] [<ffffffffa03c8c1a>] read_tree_block+0x3a/0x60 [btrfs] [<ffffffffa03caefd>] open_ctree+0x139d/0x2030 [btrfs] [<ffffffffa03a282a>] btrfs_mount+0x53a/0x7d0 [btrfs] [<ffffffff8113ab0b>] ? pcpu_alloc+0x8eb/0x9f0 [<ffffffff81167305>] ? __kmalloc_track_caller+0x35/0x1e0 [<ffffffff81176ba0>] mount_fs+0x20/0xd0 [<ffffffff81191096>] vfs_kern_mount+0x76/0x120 [<ffffffff81193320>] do_mount+0x200/0xa40 [<ffffffff81135cdb>] ? strndup_user+0x5b/0x80 [<ffffffff81193bf0>] SyS_mount+0x90/0xe0 [<ffffffff8156d31d>] system_call_fastpath+0x1a/0x1f Code: 4c 8d 75 a8 4c 89 6d e8 45 89 e0 4c 8d 6f 30 48 89 5d d8 41 83 e0 af 48 89 fb 49 83 c6 18 4c 89 7d f8 65 4c 8b 3c 25 c0 b8 00 00 <48> 8b 73 18 44 89 c7 44 89 45 98 ff 53 20 48 85 c0 48 89 c2 74 RIP [<ffffffff8111e28a>] mempool_alloc+0x4a/0x150 RSP <ffff880230699688> CR2: 0000000000000018 ---[ end trace 7a96042017ed21e2 ]--- Signed-off-by: Darrick J. Wong <[email protected]> Signed-off-by: Josef Bacik <[email protected]> Signed-off-by: Chris Mason <[email protected]>

->needs_read_fill is used to implement the following behaviors. 1. Ensure buffer filling on the first read. 2. Force buffer filling after a write. 3. Force buffer filling after a successful poll. However, #2 and #3 don't really work as sysfs doesn't reset file position. While the read buffer would be refilled, the next read would continue from the position after the last read or write, requiring an explicit seek to the start for it to be useful, which makes ->needs_read_fill superflous as read buffer is always refilled if f_pos == 0. Update sysfs_read_file() to test buffer->page for #1 instead and remove ->needs_read_fill. While this changes behavior in extreme corner cases - e.g. re-reading a sysfs file after seeking to non-zero position after a write or poll, it's highly unlikely to lead to actual breakage. This change is to prepare for using seq_file in the read path. While at it, reformat a comment in fill_write_buffer(). Signed-off-by: Tejun Heo <[email protected]> Cc: Kay Sievers <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

Booting a mx6 with CONFIG_PROVE_LOCKING we get: ====================================================== [ INFO: possible circular locking dependency detected ] 3.12.0-rc4-next-20131009+ #34 Not tainted ------------------------------------------------------- swapper/0/1 is trying to acquire lock: (&imx_drm_device->mutex){+.+.+.}, at: [<804575a8>] imx_drm_encoder_get_mux_id+0x28/0x98 but task is already holding lock: (&crtc->mutex){+.+...}, at: [<802fe778>] drm_modeset_lock_all+0x40/0x54 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&crtc->mutex){+.+...}: [<800777d0>] __lock_acquire+0x18d4/0x1c24 [<80077fec>] lock_acquire+0x68/0x7c [<805ead5c>] _mutex_lock_nest_lock+0x58/0x3a8 [<802fec50>] drm_crtc_init+0x48/0xa8 [<80457c88>] imx_drm_add_crtc+0xd4/0x144 [<8045e2e8>] ipu_drm_probe+0x114/0x1fc [<80312278>] platform_drv_probe+0x20/0x50 [<80310c68>] driver_probe_device+0x110/0x22c [<80310e20>] __driver_attach+0x9c/0xa0 [<8030f218>] bus_for_each_dev+0x5c/0x90 [<80310750>] driver_attach+0x20/0x28 [<8031034c>] bus_add_driver+0xdc/0x1dc [<803114d8>] driver_register+0x80/0xfc [<80312198>] __platform_driver_register+0x50/0x64 [<808172fc>] ipu_drm_driver_init+0x18/0x20 [<800088c0>] do_one_initcall+0xfc/0x160 [<807e7c5c>] kernel_init_freeable+0x104/0x1d4 [<805e2930>] kernel_init+0x10/0xec [<8000ea68>] ret_from_fork+0x14/0x2c -> #1 (&dev->mode_config.mutex){+.+.+.}: [<800777d0>] __lock_acquire+0x18d4/0x1c24 [<80077fec>] lock_acquire+0x68/0x7c [<805eb100>] mutex_lock_nested+0x54/0x3a4 [<802fe758>] drm_modeset_lock_all+0x20/0x54 [<802fead4>] drm_encoder_init+0x20/0x7c [<80457ae4>] imx_drm_add_encoder+0x88/0xec [<80459838>] imx_ldb_probe+0x344/0x4fc [<80312278>] platform_drv_probe+0x20/0x50 [<80310c68>] driver_probe_device+0x110/0x22c [<80310e20>] __driver_attach+0x9c/0xa0 [<8030f218>] bus_for_each_dev+0x5c/0x90 [<80310750>] driver_attach+0x20/0x28 [<8031034c>] bus_add_driver+0xdc/0x1dc [<803114d8>] driver_register+0x80/0xfc [<80312198>] __platform_driver_register+0x50/0x64 [<8081722c>] imx_ldb_driver_init+0x18/0x20 [<800088c0>] do_one_initcall+0xfc/0x160 [<807e7c5c>] kernel_init_freeable+0x104/0x1d4 [<805e2930>] kernel_init+0x10/0xec [<8000ea68>] ret_from_fork+0x14/0x2c -> #0 (&imx_drm_device->mutex){+.+.+.}: [<805e510c>] print_circular_bug+0x74/0x2e0 [<80077ad0>] __lock_acquire+0x1bd4/0x1c24 [<80077fec>] lock_acquire+0x68/0x7c [<805eb100>] mutex_lock_nested+0x54/0x3a4 [<804575a8>] imx_drm_encoder_get_mux_id+0x28/0x98 [<80459a98>] imx_ldb_encoder_prepare+0x34/0x114 [<802ef724>] drm_crtc_helper_set_mode+0x1f0/0x4c0 [<802f0344>] drm_crtc_helper_set_config+0x828/0x99c [<802ff270>] drm_mode_set_config_internal+0x5c/0xdc [<802eebe0>] drm_fb_helper_set_par+0x50/0xb4 [<802af580>] fbcon_init+0x490/0x500 [<802dd104>] visual_init+0xa8/0xf8 [<802df414>] do_bind_con_driver+0x140/0x37c [<802df764>] do_take_over_console+0x114/0x1c4 [<802af65c>] do_fbcon_takeover+0x6c/0xd4 [<802b2b30>] fbcon_event_notify+0x7c8/0x818 [<80049954>] notifier_call_chain+0x4c/0x8c [<80049cd8>] __blocking_notifier_call_chain+0x50/0x68 [<80049d10>] blocking_notifier_call_chain+0x20/0x28 [<802a75f0>] fb_notifier_call_chain+0x1c/0x24 [<802a9224>] register_framebuffer+0x188/0x268 [<802ee994>] drm_fb_helper_initial_config+0x2bc/0x4b8 [<802f118c>] drm_fbdev_cma_init+0x7c/0xec [<80817288>] imx_fb_helper_init+0x54/0x90 [<800088c0>] do_one_initcall+0xfc/0x160 [<807e7c5c>] kernel_init_freeable+0x104/0x1d4 [<805e2930>] kernel_init+0x10/0xec [<8000ea68>] ret_from_fork+0x14/0x2c other info that might help us debug this: Chain exists of: &imx_drm_device->mutex --> &dev->mode_config.mutex --> &crtc->mutex Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&crtc->mutex); lock(&dev->mode_config.mutex); lock(&crtc->mutex); lock(&imx_drm_device->mutex); *** DEADLOCK *** 6 locks held by swapper/0/1: #0: (registration_lock){+.+.+.}, at: [<802a90bc>] register_framebuffer+0x20/0x268 #1: (&fb_info->lock){+.+.+.}, at: [<802a7a90>] lock_fb_info+0x20/0x44 #2: (console_lock){+.+.+.}, at: [<802a9218>] register_framebuffer+0x17c/0x268 #3: ((fb_notifier_list).rwsem){.+.+.+}, at: [<80049cbc>] __blocking_notifier_call_chain+0x34/0x68 #4: (&dev->mode_config.mutex){+.+.+.}, at: [<802fe758>] drm_modeset_lock_all+0x20/0x54 #5: (&crtc->mutex){+.+...}, at: [<802fe778>] drm_modeset_lock_all+0x40/0x54 In order to avoid this lockdep warning, remove the locking from imx_drm_encoder_get_mux_id() and imx_drm_crtc_panel_format_pins(). Tested on a mx6sabrelite and mx53qsb. Reported-by: Russell King <[email protected]> Tested-by: Russell King <[email protected]> Signed-off-by: Fabio Estevam <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

If EM Transmit bit is busy during init ata_msleep() is called. It is wrong - msleep() should be used instead of ata_msleep(), because if EM Transmit bit is busy for one port, it will be busy for all other ports too, so using ata_msleep() causes wasting tries for another ports. The most common scenario looks like that now (six ports try to transmit a LED meaasege): - port #0 tries for the 1st time and succeeds - ports #1-5 try for the 1st time and sleeps - port #1 tries for the 2nd time and succeeds - ports #2-5 try for the 2nd time and sleeps - port #2 tries for the 3rd time and succeeds - ports #3-5 try for the 3rd time and sleeps - port #3 tries for the 4th time and succeeds - ports #4-5 try for the 4th time and sleeps - port #4 tries for the 5th time and succeeds - port #5 tries for the 5th time and sleeps At this moment port #5 wasted all its five tries and failed to initialize. Because there are only 5 (EM_MAX_RETRY) tries available usually only five ports succeed to initialize. The sixth port and next ones usually will fail. If msleep() is used instead of ata_msleep() the first port succeeds to initialize in the first try and next ones usually succeed to initialize in the second try. tj: updated comment Signed-off-by: Lukasz Dorau <[email protected]> Signed-off-by: Tejun Heo <[email protected]>

…nfs_open() Use i_writecount to control whether to get an fscache cookie in nfs_open() as NFS does not do write caching yet. I *think* this is the cause of a problem encountered by Mark Moseley whereby __fscache_uncache_page() gets a NULL pointer dereference because cookie->def is NULL: BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 IP: [<ffffffff812a1903>] __fscache_uncache_page+0x23/0x160 PGD 0 Thread overran stack, or stack corrupted Oops: 0000 [#1] SMP Modules linked in: ... CPU: 7 PID: 18993 Comm: php Not tainted 3.11.1 #1 Hardware name: Dell Inc. PowerEdge R420/072XWF, BIOS 1.3.5 08/21/2012 task: ffff8804203460c0 ti: ffff880420346640 RIP: 0010:[<ffffffff812a1903>] __fscache_uncache_page+0x23/0x160 RSP: 0018:ffff8801053af878 EFLAGS: 00210286 RAX: 0000000000000000 RBX: ffff8800be2f8780 RCX: ffff88022ffae5e8 RDX: 0000000000004c66 RSI: ffffea00055ff440 RDI: ffff8800be2f8780 RBP: ffff8801053af898 R08: 0000000000000001 R09: 0000000000000003 R10: 0000000000000000 R11: 0000000000000000 R12: ffffea00055ff440 R13: 0000000000001000 R14: ffff8800c50be538 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88042fc60000(0063) knlGS:00000000e439c700 CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 CR2: 0000000000000010 CR3: 0000000001d8f000 CR4: 00000000000607f0 Stack: ... Call Trace: [<ffffffff81365a72>] __nfs_fscache_invalidate_page+0x42/0x70 [<ffffffff813553d5>] nfs_invalidate_page+0x75/0x90 [<ffffffff811b8f5e>] truncate_inode_page+0x8e/0x90 [<ffffffff811b90ad>] truncate_inode_pages_range.part.12+0x14d/0x620 [<ffffffff81d6387d>] ? __mutex_lock_slowpath+0x1fd/0x2e0 [<ffffffff811b95d3>] truncate_inode_pages_range+0x53/0x70 [<ffffffff811b969d>] truncate_inode_pages+0x2d/0x40 [<ffffffff811b96ff>] truncate_pagecache+0x4f/0x70 [<ffffffff81356840>] nfs_setattr_update_inode+0xa0/0x120 [<ffffffff81368de4>] nfs3_proc_setattr+0xc4/0xe0 [<ffffffff81357f78>] nfs_setattr+0xc8/0x150 [<ffffffff8122d95b>] notify_change+0x1cb/0x390 [<ffffffff8120a55b>] do_truncate+0x7b/0xc0 [<ffffffff8121f96c>] do_last+0xa4c/0xfd0 [<ffffffff8121ffbc>] path_openat+0xcc/0x670 [<ffffffff81220a0e>] do_filp_open+0x4e/0xb0 [<ffffffff8120ba1f>] do_sys_open+0x13f/0x2b0 [<ffffffff8126aaf6>] compat_SyS_open+0x36/0x50 [<ffffffff81d7204c>] sysenter_dispatch+0x7/0x24 The code at the instruction pointer was disassembled: > (gdb) disas __fscache_uncache_page > Dump of assembler code for function __fscache_uncache_page: > ... > 0xffffffff812a18ff <+31>: mov 0x48(%rbx),%rax > 0xffffffff812a1903 <+35>: cmpb $0x0,0x10(%rax) > 0xffffffff812a1907 <+39>: je 0xffffffff812a19cd <__fscache_uncache_page+237> These instructions make up: ASSERTCMP(cookie->def->type, !=, FSCACHE_COOKIE_TYPE_INDEX); That cmpb is the faulting instruction (%rax is 0). So cookie->def is NULL - which presumably means that the cookie has already been at least partway through __fscache_relinquish_cookie(). What I think may be happening is something like a three-way race on the same file: PROCESS 1 PROCESS 2 PROCESS 3 =============== =============== =============== open(O_TRUNC|O_WRONLY) open(O_RDONLY) open(O_WRONLY) -->nfs_open() -->nfs_fscache_set_inode_cookie() nfs_fscache_inode_lock() nfs_fscache_disable_inode_cookie() __fscache_relinquish_cookie() nfs_inode->fscache = NULL <--nfs_fscache_set_inode_cookie() -->nfs_open() -->nfs_fscache_set_inode_cookie() nfs_fscache_inode_lock() nfs_fscache_enable_inode_cookie() __fscache_acquire_cookie() nfs_inode->fscache = cookie <--nfs_fscache_set_inode_cookie() <--nfs_open() -->nfs_setattr() ... ... -->nfs_invalidate_page() -->__nfs_fscache_invalidate_page() cookie = nfsi->fscache -->nfs_open() -->nfs_fscache_set_inode_cookie() nfs_fscache_inode_lock() nfs_fscache_disable_inode_cookie() -->__fscache_relinquish_cookie() -->__fscache_uncache_page(cookie) <crash> <--__fscache_relinquish_cookie() nfs_inode->fscache = NULL <--nfs_fscache_set_inode_cookie() What is needed is something to prevent process #2 from reacquiring the cookie - and I think checking i_writecount should do the trick. It's also possible to have a two-way race on this if the file is opened O_TRUNC|O_RDONLY instead. Reported-by: Mark Moseley <[email protected]> Signed-off-by: David Howells <[email protected]>

As the new x86 CPU bootup printout format code maintainer, I am taking immediate action to improve and clean (and thus indulge my OCD) the reporting of the cores when coming up online. Fix padding to a right-hand alignment, cleanup code and bind reporting width to the max number of supported CPUs on the system, like this: [ 0.074509] smpboot: Booting Node 0, Processors: #1 #2 #3 #4 #5 #6 #7 OK [ 0.644008] smpboot: Booting Node 1, Processors: #8 #9 #10 #11 #12 #13 #14 #15 OK [ 1.245006] smpboot: Booting Node 2, Processors: #16 #17 #18 #19 #20 #21 #22 #23 OK [ 1.864005] smpboot: Booting Node 3, Processors: #24 #25 #26 #27 #28 #29 #30 #31 OK [ 2.489005] smpboot: Booting Node 4, Processors: #32 #33 #34 #35 #36 #37 #38 #39 OK [ 3.093005] smpboot: Booting Node 5, Processors: #40 #41 #42 #43 #44 #45 #46 #47 OK [ 3.698005] smpboot: Booting Node 6, Processors: #48 #49 #50 #51 #52 #53 #54 #55 OK [ 4.304005] smpboot: Booting Node 7, Processors: #56 #57 #58 #59 #60 #61 #62 #63 OK [ 4.961413] Brought up 64 CPUs and this: [ 0.072367] smpboot: Booting Node 0, Processors: #1 #2 #3 #4 #5 #6 #7 OK [ 0.686329] Brought up 8 CPUs Signed-off-by: Borislav Petkov <[email protected]> Cc: Libin <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>

Michael Semon reported that xfs/299 generated this lockdep warning: ============================================= [ INFO: possible recursive locking detected ] 3.12.0-rc2+ #2 Not tainted --------------------------------------------- touch/21072 is trying to acquire lock: (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64 but task is already holding lock: (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&xfs_dquot_other_class); lock(&xfs_dquot_other_class); *** DEADLOCK *** May be due to missing lock nesting notation 7 locks held by touch/21072: #0: (sb_writers#10){++++.+}, at: [<c11185b6>] mnt_want_write+0x1e/0x3e #1: (&type->i_mutex_dir_key#4){+.+.+.}, at: [<c11078ee>] do_last+0x245/0xe40 #2: (sb_internal#2){++++.+}, at: [<c122c9e0>] xfs_trans_alloc+0x1f/0x35 #3: (&(&ip->i_lock)->mr_lock/1){+.+...}, at: [<c126cd1b>] xfs_ilock+0x100/0x1f1 #4: (&(&ip->i_lock)->mr_lock){++++-.}, at: [<c126cf52>] xfs_ilock_nowait+0x105/0x22f #5: (&dqp->q_qlock){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64 #6: (&xfs_dquot_other_class){+.+...}, at: [<c12902fb>] xfs_trans_dqlockedjoin+0x57/0x64 The lockdep annotation for dquot lock nesting only understands locking for user and "other" dquots, not user, group and quota dquots. Fix the annotations to match the locking heirarchy we now have. Reported-by: Michael L. Semon <[email protected]> Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Ben Myers <[email protected]> Signed-off-by: Ben Myers <[email protected]>

Turn it into (for example): [ 0.073380] x86: Booting SMP configuration: [ 0.074005] .... node #0, CPUs: #1 #2 #3 #4 #5 #6 #7 [ 0.603005] .... node #1, CPUs: #8 #9 #10 #11 #12 #13 #14 #15 [ 1.200005] .... node #2, CPUs: #16 #17 #18 #19 #20 #21 #22 #23 [ 1.796005] .... node #3, CPUs: #24 #25 #26 #27 #28 #29 #30 #31 [ 2.393005] .... node #4, CPUs: #32 #33 #34 #35 #36 #37 #38 #39 [ 2.996005] .... node #5, CPUs: #40 #41 #42 #43 #44 #45 #46 #47 [ 3.600005] .... node #6, CPUs: #48 #49 #50 #51 #52 #53 #54 #55 [ 4.202005] .... node #7, CPUs: #56 #57 #58 #59 #60 #61 #62 #63 [ 4.811005] .... node #8, CPUs: #64 #65 #66 #67 #68 #69 #70 #71 [ 5.421006] .... node #9, CPUs: #72 #73 #74 #75 #76 #77 #78 #79 [ 6.032005] .... node #10, CPUs: #80 #81 #82 #83 #84 #85 #86 #87 [ 6.648006] .... node #11, CPUs: #88 #89 #90 #91 #92 #93 #94 #95 [ 7.262005] .... node #12, CPUs: #96 #97 #98 #99 #100 #101 #102 #103 [ 7.865005] .... node #13, CPUs: #104 #105 #106 #107 #108 #109 #110 #111 [ 8.466005] .... node #14, CPUs: #112 #113 #114 #115 #116 #117 #118 #119 [ 9.073006] .... node #15, CPUs: #120 #121 #122 #123 #124 #125 #126 #127 [ 9.679901] x86: Booted up 16 nodes, 128 CPUs and drop useless elements. Change num_digits() to hpa's division-avoiding, cell-phone-typed version which he went at great lengths and pains to submit on a Saturday evening. Signed-off-by: Borislav Petkov <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Linus Torvalds <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>

In nfs4_proc_getlk(), when some error causes a retry of the call to _nfs4_proc_getlk(), we can end up with Oopses of the form BUG: unable to handle kernel NULL pointer dereference at 0000000000000134 IP: [<ffffffff8165270e>] _raw_spin_lock+0xe/0x30 <snip> Call Trace: [<ffffffff812f287d>] _atomic_dec_and_lock+0x4d/0x70 [<ffffffffa053c4f2>] nfs4_put_lock_state+0x32/0xb0 [nfsv4] [<ffffffffa053c585>] nfs4_fl_release_lock+0x15/0x20 [nfsv4] [<ffffffffa0522c06>] _nfs4_proc_getlk.isra.40+0x146/0x170 [nfsv4] [<ffffffffa052ad99>] nfs4_proc_lock+0x399/0x5a0 [nfsv4] The problem is that we don't clear the request->fl_ops after the first try and so when we retry, nfs4_set_lock_state() exits early without setting the lock stateid. Regression introduced by commit 70cc648 (locks: make ->lock release private data before returning in GETLK case) Reported-by: Weston Andros Adamson <[email protected]> Reported-by: Jorge Mora <[email protected]> Signed-off-by: Trond Myklebust <[email protected]> Cc: <[email protected]> #2.6.22+

This patch activates the audio device of the Cubox. The audio flow (pin mpp_audio1) is output on both I2S and S/PDIF. The third si5351 clock (#2, pin mpp13) is used as the external clock. Signed-off-by: Jean-Francois Moine <[email protected]> Acked-by: Sebastian Hesselbarth <[email protected]> Signed-off-by: Jason Cooper <[email protected]>

Previously we coalesced windows by expanding the first overlapping one and making the second invalid. But we never look at the expanded first window again, so we fail to notice other windows that overlap it. For example, we coalesced these: [io 0x0000-0x03af] // #0 [io 0x03e0-0x0cf7] // #1 [io 0x0000-0xdfff] // #2 into these, which still overlap: [io 0x0000-0xdfff] // #0 [io 0x03e0-0x0cf7] // #1 The fix is to expand the *second* overlapping resource and ignore the first, so we get this instead with no overlaps: [io 0x0000-0xdfff] // #2 [bhelgaas: changelog] Reference: https://bugzilla.kernel.org/show_bug.cgi?id=62511 Signed-off-by: Alexey Neyman <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]>

This patch adds a batch support to nfnetlink. Basically, it adds two new control messages: * NFNL_MSG_BATCH_BEGIN, that indicates the beginning of a batch, the nfgenmsg->res_id indicates the nfnetlink subsystem ID. * NFNL_MSG_BATCH_END, that results in the invocation of the ss->commit callback function. If not specified or an error ocurred in the batch, the ss->abort function is invoked instead. The end message represents the commit operation in nftables, the lack of end message results in an abort. This patch also adds the .call_batch function that is only called from the batch receival path. This patch adds atomic rule updates and dumps based on bitmask generations. This allows to atomically commit a set of rule-set updates incrementally without altering the internal state of existing nf_tables expressions/matches/targets. The idea consists of using a generation cursor of 1 bit and a bitmask of 2 bits per rule. Assuming the gencursor is 0, then the genmask (expressed as a bitmask) can be interpreted as: 00 active in the present, will be active in the next generation. 01 inactive in the present, will be active in the next generation. 10 active in the present, will be deleted in the next generation. ^ gencursor Once you invoke the transition to the next generation, the global gencursor is updated: 00 active in the present, will be active in the next generation. 01 active in the present, needs to zero its future, it becomes 00. 10 inactive in the present, delete now. ^ gencursor If a dump is in progress and nf_tables enters a new generation, the dump will stop and return -EBUSY to let userspace know that it has to retry again. In order to invalidate dumps, a global genctr counter is increased everytime nf_tables enters a new generation. This new operation can be used from the user-space utility that controls the firewall, eg. nft -f restore The rule updates contained in `file' will be applied atomically. cat file ----- add filter INPUT ip saddr 1.1.1.1 counter accept #1 del filter INPUT ip daddr 2.2.2.2 counter drop #2 -EOF- Note that the rule 1 will be inactive until the transition to the next generation, the rule 2 will be evicted in the next generation. There is a penalty during the rule update due to the branch misprediction in the packet matching framework. But that should be quickly resolved once the iteration over the commit list that contain rules that require updates is finished. Event notification happens once the rule-set update has been committed. So we skip notifications is case the rule-set update is aborted, which can happen in case that the rule-set is tested to apply correctly. This patch squashed the following patches from Pablo: * nf_tables: atomic rule updates and dumps * nf_tables: get rid of per rule list_head for commits * nf_tables: use per netns commit list * nfnetlink: add batch support and use it from nf_tables * nf_tables: all rule updates are transactional * nf_tables: attach replacement rule after stale one * nf_tables: do not allow deletion/replacement of stale rules * nf_tables: remove unused NFTA_RULE_FLAGS Signed-off-by: Pablo Neira Ayuso <[email protected]>

Andrey reported the following report: ERROR: AddressSanitizer: heap-buffer-overflow on address ffff8800359c99f3 ffff8800359c99f3 is located 0 bytes to the right of 243-byte region [ffff8800359c9900, ffff8800359c99f3) Accessed by thread T13003: #0 ffffffff810dd2da (asan_report_error+0x32a/0x440) #1 ffffffff810dc6b0 (asan_check_region+0x30/0x40) #2 ffffffff810dd4d3 (__tsan_write1+0x13/0x20) #3 ffffffff811cd19e (ftrace_regex_release+0x1be/0x260) #4 ffffffff812a1065 (__fput+0x155/0x360) #5 ffffffff812a12de (____fput+0x1e/0x30) #6 ffffffff8111708d (task_work_run+0x10d/0x140) #7 ffffffff810ea043 (do_exit+0x433/0x11f0) #8 ffffffff810eaee4 (do_group_exit+0x84/0x130) #9 ffffffff810eafb1 (SyS_exit_group+0x21/0x30) #10 ffffffff81928782 (system_call_fastpath+0x16/0x1b) Allocated by thread T5167: #0 ffffffff810dc778 (asan_slab_alloc+0x48/0xc0) #1 ffffffff8128337c (__kmalloc+0xbc/0x500) #2 ffffffff811d9d54 (trace_parser_get_init+0x34/0x90) #3 ffffffff811cd7b3 (ftrace_regex_open+0x83/0x2e0) #4 ffffffff811cda7d (ftrace_filter_open+0x2d/0x40) #5 ffffffff8129b4ff (do_dentry_open+0x32f/0x430) #6 ffffffff8129b668 (finish_open+0x68/0xa0) #7 ffffffff812b66ac (do_last+0xb8c/0x1710) #8 ffffffff812b7350 (path_openat+0x120/0xb50) #9 ffffffff812b8884 (do_filp_open+0x54/0xb0) #10 ffffffff8129d36c (do_sys_open+0x1ac/0x2c0) #11 ffffffff8129d4b7 (SyS_open+0x37/0x50) #12 ffffffff81928782 (system_call_fastpath+0x16/0x1b) Shadow bytes around the buggy address: ffff8800359c9700: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd ffff8800359c9780: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa ffff8800359c9800: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa ffff8800359c9880: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa ffff8800359c9900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 =>ffff8800359c9980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00[03]fb ffff8800359c9a00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa ffff8800359c9a80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa ffff8800359c9b00: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00 ffff8800359c9b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8800359c9c00: 00 00 00 00 00 00 00 00 fa fa fa fa fa fa fa fa Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap redzone: fa Heap kmalloc redzone: fb Freed heap region: fd Shadow gap: fe The out-of-bounds access happens on 'parser->buffer[parser->idx] = 0;' Although the crash happened in ftrace_regex_open() the real bug occurred in trace_get_user() where there's an incrementation to parser->idx without a check against the size. The way it is triggered is if userspace sends in 128 characters (EVENT_BUF_SIZE + 1), the loop that reads the last character stores it and then breaks out because there is no more characters. Then the last character is read to determine what to do next, and the index is incremented without checking size. Then the caller of trace_get_user() usually nulls out the last character with a zero, but since the index is equal to the size, it writes a nul character after the allocated space, which can corrupt memory. Luckily, only root user has write access to this file. Link: http://lkml.kernel.org/r/[email protected] Reported-by: Andrey Konovalov <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>

Milan Kocian writes: I found the crash in virtio-net-rx thread (I can reproduce it every time by 'aptitude update' in VM): traps: virtio-net-rx[28933] general protection ip:7f00dda3d107 sp:7f00c58f4de8 error:0 in libc-2.17.so[7f00dd90f000+1a2000] gdb backtrace: (gdb) bt #0 0x00007fb6a548e107 in ?? () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x000000000041259c in memcpy_toiovecend (iov=0x7fb68d346ea0, iov@entry=0x7fb68d345e90, kdata=<optimized out>, kdata@entry=0x7fb68d346e90 "", offset=<optimized out>, len=<optimized out>) at util/iovec.c:70 #2 0x000000000040c66d in virtio_net_rx_thread (p=0x23688a0) at virtio/net.c:117 #3 0x00007fb6a5b2ee0e in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 #4 0x00007fb6a54489ed in clone () from /lib/x86_64-linux-gnu/libc.so.6 I tried to add some printf to diagnose it but it isn't clear to me: virtio_net_rx_thread: before memcpy_toiovecend; copied: 0, len: 18890, iovsize: 4096, realiovsize: 4096 memcpy_toiovecend: offset: 0, len: 4096 memcpy_toiovecend: iov_len: 4096, len: 4096 virtio_net_rx_thread: before memcpy_toiovecend; copied: 4096, len: 18890, iovsize: 4096, realiovsize: 4096 memcpy_toiovecend: offset: 4096, len: 4096 memcpy_toiovecend: iov_len: 4096, len: 4096 memcpy_toiovecend: iov_len: 0, len: 4096 memcpy_toiovecend: iov_len: 0, len: 4096 . N x memcpy_toiovecend: iov_len: 0, len: 4096 . memcpy_toiovecend: iov_len: 0, len: 4096 memcpy_toiovecend: iov_len: 0, len: 4096 memcpy_toiovecend: iov_len: 1519143547641528320, len: 4096 memcpy_toiovecend: iov_len: 193827583623176, len: 4096 ./runlkvm.sh: line 2: 16090 Segmentation fault IMHO problem come when received len size is bigger than maximum of the dst iovec (realiovsize). Only iovec size is copied and in the next run isn't place to copy the rest of len size. Asias He writes: We should skip copied bytes from the buffer not from the iov itself which memcpy_toiovecend does. Reported-and-tested-by: Milan Kocian <[email protected]> Signed-off-by: Asias He <[email protected]> Signed-off-by: Pekka Enberg <[email protected]>

Having them be different seems an obscure configuration. Signed-off-by: Vineet Gupta <[email protected]>

While enabling lockdep on seqlocks, I ran across the warning below caused by the ipv6 stats being updated in both irq and non-irq context. This patch changes from IP6_INC_STATS_BH to IP6_INC_STATS (suggested by Eric Dumazet) to resolve this problem. [ 11.120383] ================================= [ 11.121024] [ INFO: inconsistent lock state ] [ 11.121663] 3.12.0-rc1+ #68 Not tainted [ 11.122229] --------------------------------- [ 11.122867] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. [ 11.123741] init/4483 [HC0[0]:SC1[3]:HE1:SE0] takes: [ 11.124505] (&stats->syncp.seq#6){+.?...}, at: [<c1ab80c2>] ndisc_send_ns+0xe2/0x130 [ 11.125736] {SOFTIRQ-ON-W} state was registered at: [ 11.126447] [<c10e0eb7>] __lock_acquire+0x5c7/0x1af0 [ 11.127222] [<c10e2996>] lock_acquire+0x96/0xd0 [ 11.127925] [<c1a9a2c3>] write_seqcount_begin+0x33/0x40 [ 11.128766] [<c1a9aa03>] ip6_dst_lookup_tail+0x3a3/0x460 [ 11.129582] [<c1a9e0ce>] ip6_dst_lookup_flow+0x2e/0x80 [ 11.130014] [<c1ad18e0>] ip6_datagram_connect+0x150/0x4e0 [ 11.130014] [<c1a4d0b5>] inet_dgram_connect+0x25/0x70 [ 11.130014] [<c198dd61>] SYSC_connect+0xa1/0xc0 [ 11.130014] [<c198f571>] SyS_connect+0x11/0x20 [ 11.130014] [<c198fe6b>] SyS_socketcall+0x12b/0x300 [ 11.130014] [<c1bbf880>] syscall_call+0x7/0xb [ 11.130014] irq event stamp: 1184 [ 11.130014] hardirqs last enabled at (1184): [<c1086901>] local_bh_enable+0x71/0x110 [ 11.130014] hardirqs last disabled at (1183): [<c10868cd>] local_bh_enable+0x3d/0x110 [ 11.130014] softirqs last enabled at (0): [<c108014d>] copy_process.part.42+0x45d/0x11a0 [ 11.130014] softirqs last disabled at (1147): [<c1086e05>] irq_exit+0xa5/0xb0 [ 11.130014] [ 11.130014] other info that might help us debug this: [ 11.130014] Possible unsafe locking scenario: [ 11.130014] [ 11.130014] CPU0 [ 11.130014] ---- [ 11.130014] lock(&stats->syncp.seq#6); [ 11.130014] <Interrupt> [ 11.130014] lock(&stats->syncp.seq#6); [ 11.130014] [ 11.130014] *** DEADLOCK *** [ 11.130014] [ 11.130014] 3 locks held by init/4483: [ 11.130014] #0: (rcu_read_lock){.+.+..}, at: [<c109363c>] SyS_setpriority+0x4c/0x620 [ 11.130014] #1: (((&ifa->dad_timer))){+.-...}, at: [<c108c1c0>] call_timer_fn+0x0/0xf0 [ 11.130014] #2: (rcu_read_lock){.+.+..}, at: [<c1ab6494>] ndisc_send_skb+0x54/0x5d0 [ 11.130014] [ 11.130014] stack backtrace: [ 11.130014] CPU: 0 PID: 4483 Comm: init Not tainted 3.12.0-rc1+ #68 [ 11.130014] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 11.130014] 00000000 00000000 c55e5c10 c1bb0e71 c57128b0 c55e5c4c c1badf79 c1ec1123 [ 11.130014] c1ec1484 00001183 00000000 00000000 00000001 00000003 00000001 00000000 [ 11.130014] c1ec1484 00000004 c5712dcc 00000000 c55e5c84 c10de492 00000004 c10755f2 [ 11.130014] Call Trace: [ 11.130014] [<c1bb0e71>] dump_stack+0x4b/0x66 [ 11.130014] [<c1badf79>] print_usage_bug+0x1d3/0x1dd [ 11.130014] [<c10de492>] mark_lock+0x282/0x2f0 [ 11.130014] [<c10755f2>] ? kvm_clock_read+0x22/0x30 [ 11.130014] [<c10dd8b0>] ? check_usage_backwards+0x150/0x150 [ 11.130014] [<c10e0e74>] __lock_acquire+0x584/0x1af0 [ 11.130014] [<c10b1baf>] ? sched_clock_cpu+0xef/0x190 [ 11.130014] [<c10de58c>] ? mark_held_locks+0x8c/0xf0 [ 11.130014] [<c10e2996>] lock_acquire+0x96/0xd0 [ 11.130014] [<c1ab80c2>] ? ndisc_send_ns+0xe2/0x130 [ 11.130014] [<c1ab66d3>] ndisc_send_skb+0x293/0x5d0 [ 11.130014] [<c1ab80c2>] ? ndisc_send_ns+0xe2/0x130 [ 11.130014] [<c1ab80c2>] ndisc_send_ns+0xe2/0x130 [ 11.130014] [<c108cc32>] ? mod_timer+0xf2/0x160 [ 11.130014] [<c1aa706e>] ? addrconf_dad_timer+0xce/0x150 [ 11.130014] [<c1aa70aa>] addrconf_dad_timer+0x10a/0x150 [ 11.130014] [<c1aa6fa0>] ? addrconf_dad_completed+0x1c0/0x1c0 [ 11.130014] [<c108c233>] call_timer_fn+0x73/0xf0 [ 11.130014] [<c108c1c0>] ? __internal_add_timer+0xb0/0xb0 [ 11.130014] [<c1aa6fa0>] ? addrconf_dad_completed+0x1c0/0x1c0 [ 11.130014] [<c108c5b1>] run_timer_softirq+0x141/0x1e0 [ 11.130014] [<c1086b20>] ? __do_softirq+0x70/0x1b0 [ 11.130014] [<c1086b70>] __do_softirq+0xc0/0x1b0 [ 11.130014] [<c1086e05>] irq_exit+0xa5/0xb0 [ 11.130014] [<c106cfd5>] smp_apic_timer_interrupt+0x35/0x50 [ 11.130014] [<c1bbfbca>] apic_timer_interrupt+0x32/0x38 [ 11.130014] [<c10936ed>] ? SyS_setpriority+0xfd/0x620 [ 11.130014] [<c10e26c9>] ? lock_release+0x9/0x240 [ 11.130014] [<c10936d7>] ? SyS_setpriority+0xe7/0x620 [ 11.130014] [<c1bbee6d>] ? _raw_read_unlock+0x1d/0x30 [ 11.130014] [<c1093701>] SyS_setpriority+0x111/0x620 [ 11.130014] [<c109363c>] ? SyS_setpriority+0x4c/0x620 [ 11.130014] [<c1bbf880>] syscall_call+0x7/0xb Signed-off-by: John Stultz <[email protected]> Acked-by: Eric Dumazet <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: Alexey Kuznetsov <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Hideaki YOSHIFUJI <[email protected]> Cc: James Morris <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Patrick McHardy <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>

Erik Hugne says: ==================== tipc: message reassembly using fragment chain We introduce a new reassembly algorithm that improves performance and eliminates the risk of causing out-of-memory situations. v3: -Use skb_try_coalesce, and revert to fraglist if this does not succeed. -Make sure reassembly list head is uncloned. v2: -Rebased on Ying's indentation fix. -Node unlock call in msg_fragmenter case moved from patch #2 to #1. ('continue' with this lock held would cause spinlock recursion if only patch #1 is used) ==================== Signed-off-by: David S. Miller <[email protected]>

…ux/kernel/git/tip/tip Pull x86 boot changes from Ingo Molnar: "Two changes that prettify and compactify the SMP bootup output from: smpboot: Booting Node 0, Processors #1 #2 #3 OK smpboot: Booting Node 1, Processors #4 #5 #6 #7 OK smpboot: Booting Node 2, Processors #8 #9 #10 #11 OK smpboot: Booting Node 3, Processors #12 #13 #14 #15 OK Brought up 16 CPUs to something like: x86: Booting SMP configuration: .... node #0, CPUs: #1 #2 #3 .... node #1, CPUs: #4 #5 #6 #7 .... node #2, CPUs: #8 #9 #10 #11 .... node #3, CPUs: #12 #13 #14 #15 x86: Booted up 4 nodes, 16 CPUs" * 'x86-boot-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/boot: Further compress CPUs bootup message x86: Improve the printout of the SMP bootup CPU table

Nathan Zimmer found that once we get over 10+ cpus, the scalability of SPECjbb falls over due to the contention on the global 'epmutex', which is taken in on EPOLL_CTL_ADD and EPOLL_CTL_DEL operations. Patch #1 removes the 'epmutex' lock completely from the EPOLL_CTL_DEL path by using rcu to guard against any concurrent traversals. Patch #2 remove the 'epmutex' lock from EPOLL_CTL_ADD operations for simple topologies. IE when adding a link from an epoll file descriptor to a wakeup source, where the epoll file descriptor is not nested. This patch (of 2): Optimize EPOLL_CTL_DEL such that it does not require the 'epmutex' by converting the file->f_ep_links list into an rcu one. In this way, we can traverse the epoll network on the add path in parallel with deletes. Since deletes can't create loops or worse wakeup paths, this is safe. This patch in combination with the patch "epoll: Do not take global 'epmutex' for simple topologies", shows a dramatic performance improvement in scalability for SPECjbb. Signed-off-by: Jason Baron <[email protected]> Tested-by: Nathan Zimmer <[email protected]> Cc: Eric Wong <[email protected]> Cc: Nelson Elhage <[email protected]> Cc: Al Viro <[email protected]> Cc: Davide Libenzi <[email protected]> Cc: "Paul E. McKenney" <[email protected]> CC: Wu Fengguang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>

Now that seqcounts are lockdep enabled objects, we need to explicitly initialize runtime allocated seqcounts so that lockdep can track them. Without this patch, Fengguang was seeing: [ 4.127282] INFO: trying to register non-static key. [ 4.128027] the code is fine but needs lockdep annotation. [ 4.128027] turning off the locking correctness validator. [ 4.128027] CPU: 0 PID: 96 Comm: kworker/u4:1 Not tainted 3.12.0-next-20131108-10601-gbad570d #2 [ 4.128027] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ ... ] [ 4.128027] Call Trace: [ 4.128027] [<7908e744>] ? console_unlock+0x353/0x380 [ 4.128027] [<79dc7cf2>] dump_stack+0x48/0x60 [ 4.128027] [<7908953e>] __lock_acquire.isra.26+0x7e3/0xceb [ 4.128027] [<7908a1c5>] lock_acquire+0x71/0x9a [ 4.128027] [<794079aa>] ? blk_throtl_bio+0x1c3/0x485 [ 4.128027] [<7940658b>] throtl_update_dispatch_stats+0x7c/0x153 [ 4.128027] [<794079aa>] ? blk_throtl_bio+0x1c3/0x485 [ 4.128027] [<794079aa>] blk_throtl_bio+0x1c3/0x485 ... Use u64_stats_init() for all affected data structures, which initializes the seqcount. Reported-and-Tested-by: Fengguang Wu <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Jens Axboe <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> [ Folded in another fix from the mailing list as well as a fix to that fix. Tweaked commit message. ] Signed-off-by: John Stultz <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] [ So I actually think that the two SOBs from PeterZ are the right depiction of the patch route. ] Signed-off-by: Ingo Molnar <[email protected]>

The commit 94a86df seem to have uncovered a long standing bug that did not trigger so far. BUG: unable to handle kernel paging request at 00000009dd503502 IP: [<ffffffff815b1868>] rfcomm_sock_getsockopt+0x128/0x200 PGD 0 Oops: 0000 [#1] SMP Modules linked in: ath5k ath mac80211 cfg80211 CPU: 2 PID: 1459 Comm: bluetoothd Not tainted 3.11.0-133163-gcebd830 #2 Hardware name: System manufacturer System Product Name/P6T DELUXE V2, BIOS 1202 12/22/2010 task: ffff8803304106a0 ti: ffff88033046a000 task.ti: ffff88033046a000 RIP: 0010:[<ffffffff815b1868>] [<ffffffff815b1868>] rfcomm_sock_getsockopt+0x128/0x200 RSP: 0018:ffff88033046bed8 EFLAGS: 00010246 RAX: 00000009dd503502 RBX: 0000000000000003 RCX: 00007fffa2ed5548 RDX: 0000000000000003 RSI: 0000000000000012 RDI: ffff88032fd37480 RBP: ffff88033046bf28 R08: 00007fffa2ed554c R09: ffff88032f5707d8 R10: 00007fffa2ed5548 R11: 0000000000000202 R12: ffff880330bbd000 R13: 00007fffa2ed5548 R14: 0000000000000003 R15: 00007fffa2ed554c FS: 00007fc44cfac700(0000) GS:ffff88033fc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000009dd503502 CR3: 00000003304c2000 CR4: 00000000000007e0 Stack: ffff88033046bf28 ffffffff815b0f2f ffff88033046bf18 0002ffff81105ef6 0000000600000000 ffff88032fd37480 0000000000000012 00007fffa2ed5548 0000000000000003 00007fffa2ed554c ffff88033046bf78 ffffffff814c0380 Call Trace: [<ffffffff815b0f2f>] ? rfcomm_sock_setsockopt+0x5f/0x190 [<ffffffff814c0380>] SyS_getsockopt+0x60/0xb0 [<ffffffff815e0852>] system_call_fastpath+0x16/0x1b Code: 02 00 00 00 0f 47 d0 4c 89 ef e8 74 13 cd ff 83 f8 01 19 c9 f7 d1 83 e1 f2 e9 4b ff ff ff 0f 1f 44 00 00 49 8b 84 24 70 02 00 00 <4c> 8b 30 4c 89 c0 e8 2d 19 cd ff 85 c0 49 89 d7 b9 f2 ff ff ff RIP [<ffffffff815b1868>] rfcomm_sock_getsockopt+0x128/0x200 RSP <ffff88033046bed8> CR2: 00000009dd503502 It triggers in the following segment of the code: 0x1313 is in rfcomm_sock_getsockopt (net/bluetooth/rfcomm/sock.c:743). 738 739 static int rfcomm_sock_getsockopt_old(struct socket *sock, int optname, char __user *optval, int __user *optlen) 740 { 741 struct sock *sk = sock->sk; 742 struct rfcomm_conninfo cinfo; 743 struct l2cap_conn *conn = l2cap_pi(sk)->chan->conn; 744 int len, err = 0; 745 u32 opt; 746 747 BT_DBG("sk %p", sk); The l2cap_pi(sk) is wrong here since it should have been rfcomm_pi(sk), but that socket of course does not contain the low-level connection details requested here. Tracking down the actual offending commit, it seems that this has been introduced when doing some L2CAP refactoring: commit 8c1d787 Author: Gustavo F. Padovan <[email protected]> Date: Wed Apr 13 20:23:55 2011 -0300 @@ -743,6 +743,7 @@ static int rfcomm_sock_getsockopt_old(struct socket *sock, int optname, char __u struct sock *sk = sock->sk; struct sock *l2cap_sk; struct rfcomm_conninfo cinfo; + struct l2cap_conn *conn = l2cap_pi(sk)->chan->conn; int len, err = 0; u32 opt; @@ -787,8 +788,8 @@ static int rfcomm_sock_getsockopt_old(struct socket *sock, int optname, char __u l2cap_sk = rfcomm_pi(sk)->dlc->session->sock->sk; - cinfo.hci_handle = l2cap_pi(l2cap_sk)->conn->hcon->handle; - memcpy(cinfo.dev_class, l2cap_pi(l2cap_sk)->conn->hcon->dev_class, 3); + cinfo.hci_handle = conn->hcon->handle; + memcpy(cinfo.dev_class, conn->hcon->dev_class, 3); The l2cap_sk got accidentally mixed into the sk (which is RFCOMM) and now causing a problem within getsocketopt() system call. To fix this, just re-introduce l2cap_sk and make sure the right socket is used for the low-level connection details. Reported-by: Fabio Rossi <[email protected]> Reported-by: Janusz Dziedzic <[email protected]> Tested-by: Janusz Dziedzic <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]> Signed-off-by: Johan Hedberg <[email protected]>

If an TRACE_EVENT() uses __assign_str() or __get_str on a NULL pointer then the following oops will happen: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<c127a17b>] strlen+0x10/0x1a *pde = 00000000 ^M Oops: 0000 [#1] PREEMPT SMP Modules linked in: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.13.0-rc1-test+ #2 Hardware name: /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006^M task: f5cde9f0 ti: f5e5e000 task.ti: f5e5e000 EIP: 0060:[<c127a17b>] EFLAGS: 00210046 CPU: 1 EIP is at strlen+0x10/0x1a EAX: 00000000 EBX: c2472da8 ECX: ffffffff EDX: c2472da8 ESI: c1c5e5fc EDI: 00000000 EBP: f5e5fe84 ESP: f5e5fe80 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 CR0: 8005003b CR2: 00000000 CR3: 01f32000 CR4: 000007d0 Stack: f5f18b90 f5e5feb8 c10687a8 0759004f 00000005 00000005 00000005 00200046 00000002 00000000 c1082a93 f56c7e28 c2472da8 c1082a93 f5e5fee4 c106bc61^M 00000000 c1082a93 00000000 00000000 00000001 00200046 00200082 00000000 Call Trace: [<c10687a8>] ftrace_raw_event_lock+0x39/0xc0 [<c1082a93>] ? ktime_get+0x29/0x69 [<c1082a93>] ? ktime_get+0x29/0x69 [<c106bc61>] lock_release+0x57/0x1a5 [<c1082a93>] ? ktime_get+0x29/0x69 [<c10824dd>] read_seqcount_begin.constprop.7+0x4d/0x75 [<c1082a93>] ? ktime_get+0x29/0x69^M [<c1082a93>] ktime_get+0x29/0x69 [<c108a46a>] __tick_nohz_idle_enter+0x1e/0x426 [<c10690e8>] ? lock_release_holdtime.part.19+0x48/0x4d [<c10bc184>] ? time_hardirqs_off+0xe/0x28 [<c1068c82>] ? trace_hardirqs_off_caller+0x3f/0xaf [<c108a8cb>] tick_nohz_idle_enter+0x59/0x62 [<c1079242>] cpu_startup_entry+0x64/0x192 [<c102299c>] start_secondary+0x277/0x27c Code: 90 89 c6 89 d0 88 c4 ac 38 e0 74 09 84 c0 75 f7 be 01 00 00 00 89 f0 48 5e 5d c3 55 89 e5 57 66 66 66 66 90 83 c9 ff 89 c7 31 c0 <f2> ae f7 d1 8d 41 ff 5f 5d c3 55 89 e5 57 66 66 66 66 90 31 ff EIP: [<c127a17b>] strlen+0x10/0x1a SS:ESP 0068:f5e5fe80 CR2: 0000000000000000 ---[ end trace 01bc47bf519ec1b2 ]--- New tracepoints have been added that have allowed for NULL pointers being assigned to strings. To fix this, change the TRACE_EVENT() code to check for NULL and if it is, it will assign "(null)" to it instead (similar to what glibc printf does). Reported-by: Shuah Khan <[email protected]> Reported-by: Jovi Zhangwei <[email protected]> Link: http://lkml.kernel.org/r/CAGdX0WFeEuy+DtpsJzyzn0343qEEjLX97+o1VREFkUEhndC+5Q@mail.gmail.com Link: http://lkml.kernel.org/r/[email protected] Fixes: 9cbf117 ("tracing/events: provide string with undefined size support") Cc: [email protected] # 2.6.31+ Signed-off-by: Steven Rostedt <[email protected]>

The following two commits implemented mmap support in the regular file path and merged bin file support into the regular path. 73d9714 ("sysfs: copy bin mmap support from fs/sysfs/bin.c to fs/sysfs/file.c") 3124eb1 ("sysfs: merge regular and bin file handling") After the merge, the following commands trigger a spurious lockdep warning. "test-mmap-read" simply mmaps the file and dumps the content. $ cat /sys/block/sda/trace/act_mask $ test-mmap-read /sys/devices/pci0000\:00/0000\:00\:03.0/resource0 4096 ====================================================== [ INFO: possible circular locking dependency detected ] 3.12.0-work+ #378 Not tainted ------------------------------------------------------- test-mmap-read/567 is trying to acquire lock: (&of->mutex){+.+.+.}, at: [<ffffffff8120a8df>] sysfs_bin_mmap+0x4f/0x120 but task is already holding lock: (&mm->mmap_sem){++++++}, at: [<ffffffff8114b399>] vm_mmap_pgoff+0x49/0xa0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (&mm->mmap_sem){++++++}: ... -> #2 (sr_mutex){+.+.+.}: ... -> #1 (&bdev->bd_mutex){+.+.+.}: ... -> #0 (&of->mutex){+.+.+.}: ... other info that might help us debug this: Chain exists of: &of->mutex --> sr_mutex --> &mm->mmap_sem Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&mm->mmap_sem); lock(sr_mutex); lock(&mm->mmap_sem); lock(&of->mutex); *** DEADLOCK *** 1 lock held by test-mmap-read/567: #0: (&mm->mmap_sem){++++++}, at: [<ffffffff8114b399>] vm_mmap_pgoff+0x49/0xa0 stack backtrace: CPU: 3 PID: 567 Comm: test-mmap-read Not tainted 3.12.0-work+ #378 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 ffffffff81ed41a0 ffff880009441bc8 ffffffff81611ad2 ffffffff81eccb80 ffff880009441c08 ffffffff8160f215 ffff880009441c60 ffff880009c75208 0000000000000000 ffff880009c751e0 ffff880009c75208 ffff880009c74ac0 Call Trace: [<ffffffff81611ad2>] dump_stack+0x4e/0x7a [<ffffffff8160f215>] print_circular_bug+0x2b0/0x2bf [<ffffffff8109ca0a>] __lock_acquire+0x1a3a/0x1e60 [<ffffffff8109d6ba>] lock_acquire+0x9a/0x1d0 [<ffffffff81615547>] mutex_lock_nested+0x67/0x3f0 [<ffffffff8120a8df>] sysfs_bin_mmap+0x4f/0x120 [<ffffffff8115d363>] mmap_region+0x3b3/0x5b0 [<ffffffff8115d8ae>] do_mmap_pgoff+0x34e/0x3d0 [<ffffffff8114b3ba>] vm_mmap_pgoff+0x6a/0xa0 [<ffffffff8115be3e>] SyS_mmap_pgoff+0xbe/0x250 [<ffffffff81008282>] SyS_mmap+0x22/0x30 [<ffffffff8161a4d2>] system_call_fastpath+0x16/0x1b This happens because one file nests sr_mutex, which nests mm->mmap_sem under it, under of->mutex while mmap implementation naturally nests of->mutex under mm->mmap_sem. The warning is false positive as of->mutex is per open-file and the two paths belong to two different files. This warning didn't trigger before regular and bin file supports were merged because only bin file supported mmap and the other side of locking happened only on regular files which used equivalent but separate locking. It'd be best if we give separate locking classes per file but we can't easily do that. Let's differentiate on ->mmap() for now. Later we'll add explicit file operations struct and can add per-ops lockdep key there. Signed-off-by: Tejun Heo <[email protected]> Reported-by: Dave Jones <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

ttyA has ld associated to n_gsm, when ttyA is closing, it triggers to release gsmttyB's ld data dlci[B], then race would happen if gsmttyB is opening in parallel. Here are race cases we found recently in test: CASE #1 ==================================================================== releasing dlci[B] race with gsmtty_install(gsmttyB), then panic in gsmtty_open(gsmttyB), as below: tty_release(ttyA) tty_open(gsmttyB) | | ----- gsmtty_install(gsmttyB) | | ----- gsm_dlci_alloc(gsmttyB) => alloc dlci[B] tty_ldisc_release(ttyA) ----- | | gsm_dlci_release(dlci[B]) ----- | | gsm_dlci_free(dlci[B]) ----- | | ----- gsmtty_open(gsmttyB) gsmtty_open() { struct gsm_dlci *dlci = tty->driver_data; => here it uses dlci[B] ... } In gsmtty_open(gsmttyA), it uses dlci[B] which was release, so hit a panic. ===================================================================== CASE #2 ===================================================================== releasing dlci[0] race with gsmtty_install(gsmttyB), then panic in gsmtty_open(), as below: tty_release(ttyA) tty_open(gsmttyB) | | ----- gsmtty_install(gsmttyB) | | ----- gsm_dlci_alloc(gsmttyB) => alloc dlci[B] | | ----- gsmtty_open(gsmttyB) fail | | ----- tty_release(gsmttyB) | | ----- gsmtty_close(gsmttyB) | | ----- gsmtty_detach_dlci(dlci[B]) | | ----- dlci_put(dlci[B]) | | tty_ldisc_release(ttyA) ----- | | gsm_dlci_release(dlci[0]) ----- | | gsm_dlci_free(dlci[0]) ----- | | ----- dlci_put(dlci[0]) In gsmtty_detach_dlci(dlci[B]), it tries to use dlci[0] which was released, then hit panic. ===================================================================== IMHO, n_gsm tty operations would refer released ldisc, as long as gsm_dlci_release() has chance to release ldisc data when some gsmtty operations are not completed.. This patch is try to avoid it by: 1) in n_gsm driver, use a global gsm spin lock to avoid gsm_dlci_release() run in parallel with gsmtty_install(); 2) Increase dlci's ref count in gsmtty_install() instead of in gsmtty_open(), the purpose is to prevent gsm_dlci_release() releasing dlci after gsmtty_install() allocats dlci but before gsmtty_open increases dlci's ref count; 3) Decrease dlci's ref count in gsmtty_remove(), which is a tty framework api, and this is the opposite process of step 2). Signed-off-by: Chao Bi <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>

The patch fixes the following lockdep warning, which is 100% reproducible on network restart: ====================================================== [ INFO: possible circular locking dependency detected ] 3.12.0+ #47 Tainted: GF ------------------------------------------------------- kworker/1:1/27 is trying to acquire lock: ((&(&adapter->watchdog_task)->work)){+.+...}, at: [<ffffffff8108a5b0>] flush_work+0x0/0x70 but task is already holding lock: (&adapter->mutex){+.+...}, at: [<ffffffffa0177c0a>] e1000_reset_task+0x4a/0xa0 [e1000] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&adapter->mutex){+.+...}: [<ffffffff810bdb5d>] lock_acquire+0x9d/0x120 [<ffffffff816b8cbc>] mutex_lock_nested+0x4c/0x390 [<ffffffffa017233d>] e1000_watchdog+0x7d/0x5b0 [e1000] [<ffffffff8108b972>] process_one_work+0x1d2/0x510 [<ffffffff8108ca80>] worker_thread+0x120/0x3a0 [<ffffffff81092c1e>] kthread+0xee/0x110 [<ffffffff816c3d7c>] ret_from_fork+0x7c/0xb0 -> #0 ((&(&adapter->watchdog_task)->work)){+.+...}: [<ffffffff810bd9c0>] __lock_acquire+0x1710/0x1810 [<ffffffff810bdb5d>] lock_acquire+0x9d/0x120 [<ffffffff8108a5eb>] flush_work+0x3b/0x70 [<ffffffff8108b5d8>] __cancel_work_timer+0x98/0x140 [<ffffffff8108b693>] cancel_delayed_work_sync+0x13/0x20 [<ffffffffa0170cec>] e1000_down_and_stop+0x3c/0x60 [e1000] [<ffffffffa01775b1>] e1000_down+0x131/0x220 [e1000] [<ffffffffa0177c12>] e1000_reset_task+0x52/0xa0 [e1000] [<ffffffff8108b972>] process_one_work+0x1d2/0x510 [<ffffffff8108ca80>] worker_thread+0x120/0x3a0 [<ffffffff81092c1e>] kthread+0xee/0x110 [<ffffffff816c3d7c>] ret_from_fork+0x7c/0xb0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&adapter->mutex); lock((&(&adapter->watchdog_task)->work)); lock(&adapter->mutex); lock((&(&adapter->watchdog_task)->work)); *** DEADLOCK *** 3 locks held by kworker/1:1/27: #0: (events){.+.+.+}, at: [<ffffffff8108b906>] process_one_work+0x166/0x510 #1: ((&adapter->reset_task)){+.+...}, at: [<ffffffff8108b906>] process_one_work+0x166/0x510 #2: (&adapter->mutex){+.+...}, at: [<ffffffffa0177c0a>] e1000_reset_task+0x4a/0xa0 [e1000] stack backtrace: CPU: 1 PID: 27 Comm: kworker/1:1 Tainted: GF 3.12.0+ #47 Hardware name: System manufacturer System Product Name/P5B-VM SE, BIOS 0501 05/31/2007 Workqueue: events e1000_reset_task [e1000] ffffffff820f6000 ffff88007b9dba98 ffffffff816b54a2 0000000000000002 ffffffff820f5e50 ffff88007b9dbae8 ffffffff810ba936 ffff88007b9dbac8 ffff88007b9dbb48 ffff88007b9d8f00 ffff88007b9d8780 ffff88007b9d8f00 Call Trace: [<ffffffff816b54a2>] dump_stack+0x49/0x5f [<ffffffff810ba936>] print_circular_bug+0x216/0x310 [<ffffffff810bd9c0>] __lock_acquire+0x1710/0x1810 [<ffffffff8108a5b0>] ? __flush_work+0x250/0x250 [<ffffffff810bdb5d>] lock_acquire+0x9d/0x120 [<ffffffff8108a5b0>] ? __flush_work+0x250/0x250 [<ffffffff8108a5eb>] flush_work+0x3b/0x70 [<ffffffff8108a5b0>] ? __flush_work+0x250/0x250 [<ffffffff8108b5d8>] __cancel_work_timer+0x98/0x140 [<ffffffff8108b693>] cancel_delayed_work_sync+0x13/0x20 [<ffffffffa0170cec>] e1000_down_and_stop+0x3c/0x60 [e1000] [<ffffffffa01775b1>] e1000_down+0x131/0x220 [e1000] [<ffffffffa0177c12>] e1000_reset_task+0x52/0xa0 [e1000] [<ffffffff8108b972>] process_one_work+0x1d2/0x510 [<ffffffff8108b906>] ? process_one_work+0x166/0x510 [<ffffffff8108ca80>] worker_thread+0x120/0x3a0 [<ffffffff8108c960>] ? manage_workers+0x2c0/0x2c0 [<ffffffff81092c1e>] kthread+0xee/0x110 [<ffffffff81092b30>] ? __init_kthread_worker+0x70/0x70 [<ffffffff816c3d7c>] ret_from_fork+0x7c/0xb0 [<ffffffff81092b30>] ? __init_kthread_worker+0x70/0x70 == The issue background == The problem occurs, because e1000_down(), which is called under adapter->mutex by e1000_reset_task(), tries to synchronously cancel e1000 auxiliary works (reset_task, watchdog_task, phy_info_task, fifo_stall_task), which take adapter->mutex in their handlers. So the question is what does adapter->mutex protect there? The adapter->mutex was introduced by commit 0ef4ee ("e1000: convert to private mutex from rtnl") as a replacement for rtnl_lock() taken in the asynchronous handlers. It targeted on fixing a similar lockdep warning issued when e1000_down() was called under rtnl_lock(), and it fixed it, but unfortunately it introduced the lockdep warning described above. Anyway, that said the source of this bug is that the asynchronous works were made to take rtnl_lock() some time ago, so let's look deeper and find why it was added there. The rtnl_lock() was added to asynchronous handlers by commit 338c15 ("e1000: fix occasional panic on unload") in order to prevent asynchronous handlers from execution after the module is unloaded (e1000_down() is called) as it follows from the comment to the commit: > Net drivers in general have an issue where timers fired > by mod_timer or work threads with schedule_work are running > outside of the rtnl_lock. > > With no other lock protection these routines are vulnerable > to races with driver unload or reset paths. > > The longer term solution to this might be a redesign with > safer locks being taken in the driver to guarantee no > reentrance, but for now a safe and effective fix is > to take the rtnl_lock in these routines. I'm not sure if this locking scheme fixed the problem or just made it unlikely, although I incline to the latter. Anyway, this was long time ago when e1000 auxiliary works were implemented as timers scheduling real work handlers in their routines. The e1000_down() function only canceled the timers, but left the real handlers running if they were running, which could result in work execution after module unload. Today, the e1000 driver uses sane delayed works instead of the pair timer+work to implement its delayed asynchronous handlers, and the e1000_down() synchronously cancels all the works so that the problem that commit 338c15 tried to cope with disappeared, and we don't need any locks in the handlers any more. Moreover, any locking there can potentially result in a deadlock. So, this patch reverts commits 0ef4ee and 338c15. Fixes: 0ef4eed ("e1000: convert to private mutex from rtnl") Fixes: 338c15e ("e1000: fix occasional panic on unload") Cc: Tushar Dave <[email protected]> Cc: Patrick McHardy <[email protected]> Signed-off-by: Vladimir Davydov <[email protected]> Tested-by: Aaron Brown <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>

…is completed Currently, when mounting pstore file system, a read callback of efi_pstore driver runs mutiple times as below. - In the first read callback, scan efivar_sysfs_list from head and pass a kmsg buffer of a entry to an upper pstore layer. - In the second read callback, rescan efivar_sysfs_list from the entry and pass another kmsg buffer to it. - Repeat the scan and pass until the end of efivar_sysfs_list. In this process, an entry is read across the multiple read function calls. To avoid race between the read and erasion, the whole process above is protected by a spinlock, holding in open() and releasing in close(). At the same time, kmemdup() is called to pass the buffer to pstore filesystem during it. And then, it causes a following lockdep warning. To make the dynamic memory allocation runnable without taking spinlock, holding off a deletion of sysfs entry if it happens while scanning it via efi_pstore, and deleting it after the scan is completed. To implement it, this patch introduces two flags, scanning and deleting, to efivar_entry. On the code basis, it seems that all the scanning and deleting logic is not needed because __efivars->lock are not dropped when reading from the EFI variable store. But, the scanning and deleting logic is still needed because an efi-pstore and a pstore filesystem works as follows. In case an entry(A) is found, the pointer is saved to psi->data. And efi_pstore_read() passes the entry(A) to a pstore filesystem by releasing __efivars->lock. And then, the pstore filesystem calls efi_pstore_read() again and the same entry(A), which is saved to psi->data, is used for resuming to scan a sysfs-list. So, to protect the entry(A), the logic is needed. [ 1.143710] ------------[ cut here ]------------ [ 1.144058] WARNING: CPU: 1 PID: 1 at kernel/lockdep.c:2740 lockdep_trace_alloc+0x104/0x110() [ 1.144058] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags)) [ 1.144058] Modules linked in: [ 1.144058] CPU: 1 PID: 1 Comm: systemd Not tainted 3.11.0-rc5 #2 [ 1.144058] 0000000000000009 ffff8800797e9ae0 ffffffff816614a5 ffff8800797e9b28 [ 1.144058] ffff8800797e9b18 ffffffff8105510d 0000000000000080 0000000000000046 [ 1.144058] 00000000000000d0 00000000000003af ffffffff81ccd0c0 ffff8800797e9b78 [ 1.144058] Call Trace: [ 1.144058] [<ffffffff816614a5>] dump_stack+0x54/0x74 [ 1.144058] [<ffffffff8105510d>] warn_slowpath_common+0x7d/0xa0 [ 1.144058] [<ffffffff8105517c>] warn_slowpath_fmt+0x4c/0x50 [ 1.144058] [<ffffffff8131290f>] ? vsscanf+0x57f/0x7b0 [ 1.144058] [<ffffffff810bbd74>] lockdep_trace_alloc+0x104/0x110 [ 1.144058] [<ffffffff81192da0>] __kmalloc_track_caller+0x50/0x280 [ 1.144058] [<ffffffff815147bb>] ? efi_pstore_read_func.part.1+0x12b/0x170 [ 1.144058] [<ffffffff8115b260>] kmemdup+0x20/0x50 [ 1.144058] [<ffffffff815147bb>] efi_pstore_read_func.part.1+0x12b/0x170 [ 1.144058] [<ffffffff81514800>] ? efi_pstore_read_func.part.1+0x170/0x170 [ 1.144058] [<ffffffff815148b4>] efi_pstore_read_func+0xb4/0xe0 [ 1.144058] [<ffffffff81512b7b>] __efivar_entry_iter+0xfb/0x120 [ 1.144058] [<ffffffff8151428f>] efi_pstore_read+0x3f/0x50 [ 1.144058] [<ffffffff8128d7ba>] pstore_get_records+0x9a/0x150 [ 1.158207] [<ffffffff812af25c>] ? selinux_d_instantiate+0x1c/0x20 [ 1.158207] [<ffffffff8128ce30>] ? parse_options+0x80/0x80 [ 1.158207] [<ffffffff8128ced5>] pstore_fill_super+0xa5/0xc0 [ 1.158207] [<ffffffff811ae7d2>] mount_single+0xa2/0xd0 [ 1.158207] [<ffffffff8128ccf8>] pstore_mount+0x18/0x20 [ 1.158207] [<ffffffff811ae8b9>] mount_fs+0x39/0x1b0 [ 1.158207] [<ffffffff81160550>] ? __alloc_percpu+0x10/0x20 [ 1.158207] [<ffffffff811c9493>] vfs_kern_mount+0x63/0xf0 [ 1.158207] [<ffffffff811cbb0e>] do_mount+0x23e/0xa20 [ 1.158207] [<ffffffff8115b51b>] ? strndup_user+0x4b/0xf0 [ 1.158207] [<ffffffff811cc373>] SyS_mount+0x83/0xc0 [ 1.158207] [<ffffffff81673cc2>] system_call_fastpath+0x16/0x1b [ 1.158207] ---[ end trace 61981bc62de9f6f4 ]--- Signed-off-by: Seiji Aguchi <[email protected]> Tested-by: Madper Xie <[email protected]> Cc: [email protected] Signed-off-by: Matt Fleming <[email protected]>

jlelli pushed a commit that referenced this issue Nov 21, 2013

ARC: cacheflush refactor #2: I and D caches lines to have same size

63d2dfd

Having them be different seems an obscure configuration. Signed-off-by: Vineet Gupta <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Link "read more" nella pagina principale #2

Link "read more" nella pagina principale #2

claudioscordino commented May 18, 2012

Link "read more" nella pagina principale #2

Link "read more" nella pagina principale #2

Comments

claudioscordino commented May 18, 2012