Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linux v4.10 merge #335

Merged
merged 10,000 commits into from
Feb 22, 2017
Merged

Linux v4.10 merge #335

merged 10,000 commits into from
Feb 22, 2017

Conversation

thehajime
Copy link
Member

@thehajime thehajime commented Feb 20, 2017

Cc: @liuyuan10 @tavip (for cpu_idle_loop() usage)


This change is Reviewable

toshikani and others added 30 commits February 3, 2017 14:13
Patch series "fix a kernel oops when reading sysfs valid_zones", v2.

A sysfs memory file is created for each 2GiB memory block on x86-64 when
the system has 64GiB or more memory.  [1] When the start address of a
memory block is not backed by struct page, i.e.  a memory range is not
aligned by 2GiB, reading its 'valid_zones' attribute file leads to a
kernel oops.  This issue was observed on multiple x86-64 systems with
more than 64GiB of memory.  This patch-set fixes this issue.

Patch 1 first fixes an issue in test_pages_in_a_zone(), which does not
test the start section.

Patch 2 then fixes the kernel oops by extending test_pages_in_a_zone()
to return valid [start, end).

Note for stable kernels: The memory block size change was made by commit
bdee237 ("x86: mm: Use 2GB memory block size on large-memory x86-64
systems"), which was accepted to 3.9.  However, this patch-set depends
on (and fixes) the change to test_pages_in_a_zone() made by commit
5f0f288 ("mm/memory_hotplug.c: check for missing sections in
test_pages_in_a_zone()"), which was accepted to 4.4.

So, I recommend that we backport it up to 4.4.

[1] 'Commit bdee237 ("x86: mm: Use 2GB memory block size on
    large-memory x86-64 systems")'

This patch (of 2):

test_pages_in_a_zone() does not check 'start_pfn' when it is aligned by
section since 'sec_end_pfn' is set equal to 'pfn'.  Since this function
is called for testing the range of a sysfs memory file, 'start_pfn' is
always aligned by section.

Fix it by properly setting 'sec_end_pfn' to the next section pfn.

Also make sure that this function returns 1 only when the range belongs
to a zone.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Toshi Kani <[email protected]>
Cc: Andrew Banman <[email protected]>
Cc: Reza Arbab <[email protected]>
Cc: Greg KH <[email protected]>
Cc: <[email protected]>	[4.4+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Reading a sysfs "memoryN/valid_zones" file leads to the following oops
when the first page of a range is not backed by struct page.
show_valid_zones() assumes that 'start_pfn' is always valid for
page_zone().

 BUG: unable to handle kernel paging request at ffffea017a000000
 IP: show_valid_zones+0x6f/0x160

This issue may happen on x86-64 systems with 64GiB or more memory since
their memory block size is bumped up to 2GiB.  [1] An example of such
systems is desribed below.  0x3240000000 is only aligned by 1GiB and
this memory block starts from 0x3200000000, which is not backed by
struct page.

 BIOS-e820: [mem 0x0000003240000000-0x000000603fffffff] usable

Since test_pages_in_a_zone() already checks holes, fix this issue by
extending this function to return 'valid_start' and 'valid_end' for a
given range.  show_valid_zones() then proceeds with the valid range.

[1] 'Commit bdee237 ("x86: mm: Use 2GB memory block size on
    large-memory x86-64 systems")'

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Toshi Kani <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Zhang Zhen <[email protected]>
Cc: Reza Arbab <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: <[email protected]>	[4.4+]

Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Tetsuo has noticed that an OOM stress test which performs large write
requests can cause the full memory reserves depletion.  He has tracked
this down to the following path

	__alloc_pages_nodemask+0x436/0x4d0
	alloc_pages_current+0x97/0x1b0
	__page_cache_alloc+0x15d/0x1a0          mm/filemap.c:728
	pagecache_get_page+0x5a/0x2b0           mm/filemap.c:1331
	grab_cache_page_write_begin+0x23/0x40   mm/filemap.c:2773
	iomap_write_begin+0x50/0xd0             fs/iomap.c:118
	iomap_write_actor+0xb5/0x1a0            fs/iomap.c:190
	? iomap_write_end+0x80/0x80             fs/iomap.c:150
	iomap_apply+0xb3/0x130                  fs/iomap.c:79
	iomap_file_buffered_write+0x68/0xa0     fs/iomap.c:243
	? iomap_write_end+0x80/0x80
	xfs_file_buffered_aio_write+0x132/0x390 [xfs]
	? remove_wait_queue+0x59/0x60
	xfs_file_write_iter+0x90/0x130 [xfs]
	__vfs_write+0xe5/0x140
	vfs_write+0xc7/0x1f0
	? syscall_trace_enter+0x1d0/0x380
	SyS_write+0x58/0xc0
	do_syscall_64+0x6c/0x200
	entry_SYSCALL64_slow_path+0x25/0x25

the oom victim has access to all memory reserves to make a forward
progress to exit easier.  But iomap_file_buffered_write and other
callers of iomap_apply loop to complete the full request.  We need to
check for fatal signals and back off with a short write instead.

As the iomap_apply delegates all the work down to the actor we have to
hook into those.  All callers that work with the page cache are calling
iomap_write_begin so we will check for signals there.  dax_iomap_actor
has to handle the situation explicitly because it copies data to the
userspace directly.  Other callers like iomap_page_mkwrite work on a
single page or iomap_fiemap_actor do not allocate memory based on the
given len.

Fixes: 68a9f5e ("xfs: implement iomap based buffered write path")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Reported-by: Tetsuo Handa <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Al Viro <[email protected]>
Cc: <[email protected]>	[4.8+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
do_generic_file_read() can be told to perform a large request from
userspace.  If the system is under OOM and the reading task is the OOM
victim then it has an access to memory reserves and finishing the full
request can lead to the full memory depletion which is dangerous.  Make
sure we rather go with a short read and allow the killed task to
terminate.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Al Viro <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Merge fixes from Andrew Morton:
 "8 fixes"

* emailed patches from Andrew Morton <[email protected]>:
  mm, fs: check for fatal signals in do_generic_file_read()
  fs: break out of iomap_file_buffered_write on fatal signals
  base/memory, hotplug: fix a kernel oops in show_valid_zones()
  mm/memory_hotplug.c: check start_pfn in test_pages_in_a_zone()
  jump label: pass kbuild_cflags when checking for asm goto support
  shmem: fix sleeping from atomic context
  kasan: respect /proc/sys/kernel/traceoff_on_warning
  zswap: disable changing params if init fails
Some Kabylake desktop processors may not reach max turbo when running in
HWP mode, even if running under sustained 100% utilization.

This occurs when the HWP.EPP (Energy Performance Preference) is set to
"balance_power" (0x80) -- the default on most systems.

It occurs because the platform BIOS may erroneously enable an
energy-efficiency setting -- MSR_IA32_POWER_CTL BIT-EE, which is not
recommended to be enabled on this SKU.

On the failing systems, this BIOS issue was not discovered when the
desktop motherboard was tested with Windows, because the BIOS also
neglects to provide the ACPI/CPPC table, that Windows requires to enable
HWP, and so Windows runs in legacy P-state mode, where this setting has
no effect.

Linux' intel_pstate driver does not require ACPI/CPPC to enable HWP, and
so it runs in HWP mode, exposing this incorrect BIOS configuration.

There are several ways to address this problem.

First, Linux can also run in legacy P-state mode on this system.
As intel_pstate is how Linux enables HWP, booting with
"intel_pstate=disable"
will run in acpi-cpufreq/ondemand legacy p-state mode.

Or second, the "performance" governor can be used with intel_pstate,
which will modify HWP.EPP to 0.

Or third, starting in 4.10, the
/sys/devices/system/cpu/cpufreq/policy*/energy_performance_preference
attribute in can be updated from "balance_power" to "performance".

Or fourth, apply this patch, which fixes the erroneous setting of
MSR_IA32_POWER_CTL BIT_EE on this model, allowing the default
configuration to function as designed.

Signed-off-by: Srinivas Pandruvada <[email protected]>
Reviewed-by: Len Brown <[email protected]>
Cc: 4.6+ <[email protected]> # 4.6+
Signed-off-by: Rafael J. Wysocki <[email protected]>
Pull VFIO fix from Alex Williamson:
 "Fix an error path in SPAPR IOMMU backend (Alexey Kardashevskiy)"

* tag 'vfio-v4.10-rc7' of git://github.com/awilliam/linux-vfio:
  vfio/spapr: Fix missing mutex unlock when creating a window
…t/mst/vhost

Pull virtio/vhost fixes from Michael S. Tsirkin:
 "Last minute fixes:

   - ARM DMA fix revert

   - vhost endian-ness fix

   - MAINTAINERS: email address change for Amit"

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
  MAINTAINERS: update email address for Amit Shah
  vhost: fix initialization for vq->is_le
  Revert "vring: Force use of DMA API for ARM-based systems with legacy devices"
The might_sleep_if() assertions in __pm_runtime_idle(),
__pm_runtime_suspend() and __pm_runtime_resume() may generate
false-positive warnings in some situations.  For example, that
happens if a nested pm_runtime_get_sync()/pm_runtime_put() pair
is executed with disabled interrupts within an outer
pm_runtime_get_sync()/pm_runtime_put() section for the same device.
[Generally, pm_runtime_get_sync() may sleep, so it should not be
called with disabled interrupts, but in this particular case the
previous pm_runtime_get_sync() guarantees that the device will not
be suspended, so the inner pm_runtime_get_sync() will return
immediately after incrementing the device's usage counter.]

That started to happen in the i915 driver in 4.10-rc, leading to
the following splat:

 BUG: sleeping function called from invalid context at drivers/base/power/runtime.c:1032
 in_atomic(): 1, irqs_disabled(): 0, pid: 1500, name: Xorg
 1 lock held by Xorg/1500:
  #0:  (&dev->struct_mutex){+.+.+.}, at:
  [<ffffffffa0680c13>] i915_mutex_lock_interruptible+0x43/0x140 [i915]
 CPU: 0 PID: 1500 Comm: Xorg Not tainted
 Call Trace:
  dump_stack+0x85/0xc2
  ___might_sleep+0x196/0x260
  __might_sleep+0x53/0xb0
  __pm_runtime_resume+0x7a/0x90
  intel_runtime_pm_get+0x25/0x90 [i915]
  aliasing_gtt_bind_vma+0xaa/0xf0 [i915]
  i915_vma_bind+0xaf/0x1e0 [i915]
  i915_gem_execbuffer_relocate_entry+0x513/0x6f0 [i915]
  i915_gem_execbuffer_relocate_vma.isra.34+0x188/0x250 [i915]
  ? trace_hardirqs_on+0xd/0x10
  ? i915_gem_execbuffer_reserve_vma.isra.31+0x152/0x1f0 [i915]
  ? i915_gem_execbuffer_reserve.isra.32+0x372/0x3a0 [i915]
  i915_gem_do_execbuffer.isra.38+0xa70/0x1a40 [i915]
  ? __might_fault+0x4e/0xb0
  i915_gem_execbuffer2+0xc5/0x260 [i915]
  ? __might_fault+0x4e/0xb0
  drm_ioctl+0x206/0x450 [drm]
  ? i915_gem_execbuffer+0x340/0x340 [i915]
  ? __fget+0x5/0x200
  do_vfs_ioctl+0x91/0x6f0
  ? __fget+0x111/0x200
  ? __fget+0x5/0x200
  SyS_ioctl+0x79/0x90
  entry_SYSCALL_64_fastpath+0x23/0xc6

even though the code triggering it is correct.

Unfortunately, the might_sleep_if() assertions in question are
too coarse-grained to cover such cases correctly, so make them
a bit less sensitive in order to avoid the false-positives.

Reported-and-tested-by: Sedat Dilek <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
…it/jejb/scsi

Pull SCSI fix from James Bottomley:
 "A single fix this time: a fix for a virtqueue removal bug which only
  appears to affect S390, but which results in the queue hanging forever
  thus causing the machine to fail shutdown"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: virtio_scsi: Reject commands when virtqueue is broken
…/git/gregkh/usb

Pull USB fixes from Greg KH:
 "Here are some small USB fixes for some reported issues, and the usual
  number of new device ids for 4.10-rc7.

  All of these, except the last new device id, have been in linux-next
  for a while with no reported issues"

* tag 'usb-4.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  USB: serial: pl2303: add ATEN device ID
  usb: gadget: f_fs: Assorted buffer overflow checks.
  USB: Add quirk for WORLDE easykey.25 MIDI keyboard
  usb: musb: Fix external abort on non-linefetch for musb_irq_work()
  usb: musb: Fix host mode error -71 regression
  USB: serial: option: add device ID for HP lt2523 (Novatel E371)
  USB: serial: qcserial: add Dell DW5570 QDL
…rnel/git/gregkh/staging

Pull staging/IIO fixes from Greg KH:
 "Here are a few small IIO and one staging driver fix for 4.10-rc7. They
  fix some reported issues with the drivers.

  All of them have been in linux-next for a week or so with no reported
  issues"

* tag 'staging-4.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: greybus: timesync: validate platform state callback
  iio: dht11: Use usleep_range instead of msleep for start signal
  iio: adc: palmas_gpadc: retrieve a valid iio_dev in suspend/resume
  iio: health: max30100: fixed parenthesis around FIFO count check
  iio: health: afe4404: retrieve a valid iio_dev in suspend/resume
  iio: health: afe4403: retrieve a valid iio_dev in suspend/resume
…kernel/git/gregkh/char-misc

Pull char/misc driver fixes from Greg KH:
 "Here are two bugfixes that resolve some reported issues. One in the
  firmware loader, that should fix the much-reported problem of crashes
  with it. The other is a hyperv fix for a reported regression.

  Both have been in linux-next for a week or so with no reported issues"

* tag 'char-misc-4.10-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  Drivers: hv: vmbus: finally fix hv_need_to_signal_on_read()
  firmware: fix NULL pointer dereference in __fw_load_abort()
Pull KVM fix from Radim Krčmář:
 "Fix a regression that prevented migration between hosts with different
  XSAVE features even if the missing features were not used by the guest
  (for stable)"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: do not save guest-unsupported XSAVE state
…inux/kernel/git/tip/tip

Pull irq fixes from Thomas Gleixner:

 - Prevent double activation of interrupt lines, which causes problems
   on certain interrupt controllers

 - Handle the fallout of the above because x86 (ab)uses the activation
   function to reconfigure interrupts under the hood.

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/irq: Make irq activate operations symmetric
  irqdomain: Avoid activating interrupts more than once
When vmemmap_populate() allocates space for the memmap it does so in 2MB
sized chunks. The libnvdimm-pfn driver incorrectly accounts for this
when the alignment of the device is set to 4K. When this happens we
trigger memory allocation failures in altmap_alloc_block_buf() and
trigger warnings of the form:

 WARNING: CPU: 0 PID: 3376 at arch/x86/mm/init_64.c:656 arch_add_memory+0xe4/0xf0
 [..]
 Call Trace:
  dump_stack+0x86/0xc3
  __warn+0xcb/0xf0
  warn_slowpath_null+0x1d/0x20
  arch_add_memory+0xe4/0xf0
  devm_memremap_pages+0x29b/0x4e0

Fixes: 315c562 ("libnvdimm, pfn: add 'align' attribute, default to HPAGE_SIZE")
Cc: <[email protected]>
Signed-off-by: Dan Williams <[email protected]>
Andrey Konovalov got crashes in __ip_options_echo() when a NULL skb->dst
is accessed.

ipv4_pktinfo_prepare() should not drop the dst if (evil) IP options
are present.

We could refine the test to the presence of ts_needtime or srr,
but IP options are not often used, so let's be conservative.

Thanks to syzkaller team for finding this bug.

Fixes: d826eb1 ("ipv4: PKTINFO doesnt need dst reference")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Andrey Konovalov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
syzkaller found another out of bound access in ip_options_compile(),
or more exactly in cipso_v4_validate()

Fixes: 20e2a86 ("cipso: handle CIPSO options correctly when NetLabel is disabled")
Fixes: 446fda4 ("[NetLabel]: CIPSOv4 engine")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Dmitry Vyukov  <[email protected]>
Cc: Paul Moore <[email protected]>
Acked-by: Paul Moore <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Commit:

  a33d331 ("x86/CPU/AMD: Fix Bulldozer topology")

restored the initial approach we had with the Fam15h topology of
enumerating CU (Compute Unit) threads as cores. And this is still
correct - they're beefier than HT threads but still have some
shared functionality.

Our current approach has a problem with the Mad Max Steam game, for
example. Yves Dionne reported a certain "choppiness" while playing on
v4.9.5.

That problem stems most likely from the fact that the CU threads share
resources within one CU and when we schedule to a thread of a different
compute unit, this incurs latency due to migrating the working set to a
different CU through the caches.

When the thread siblings mask mirrors that aspect of the CUs and
threads, the scheduler pays attention to it and tries to schedule within
one CU first. Which takes care of the latency, of course.

Reported-by: Yves Dionne <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: <[email protected]> # 4.9
Cc: Brice Goglin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Yazen Ghannam <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
After:

  a33d331 ("x86/CPU/AMD: Fix Bulldozer topology")

our  SMT scheduling topology for Fam17h systems is broken, because
the ThreadId is included in the ApicId when SMT is enabled.

So, without further decoding cpu_core_id is unique for each thread
rather than the same for threads on the same core. This didn't affect
systems with SMT disabled. Make cpu_core_id be what it is defined to be.

Signed-off-by: Yazen Ghannam <[email protected]>
Signed-off-by: Borislav Petkov <[email protected]>
Cc: <[email protected]> # 4.9
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
Andrey Konovalov reported out of bound accesses in ip6gre_err()

If GRE flags contains GRE_KEY, the following expression
*(((__be32 *)p) + (grehlen / 4) - 1)

accesses data ~40 bytes after the expected point, since
grehlen includes the size of IPv6 headers.

Let's use a "struct gre_base_hdr *greh" pointer to make this
code more readable.

p[1] becomes greh->protocol.
grhlen is the GRE header length.

Fixes: c12b395 ("gre: Support GRE over IPv6")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Andrey Konovalov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Incorrect num_elem parameter value (1 vs. 5) was used in the
aes_siv_encrypt() call. This resulted in only the first one of the five
AAD vectors to SIV getting included in calculation. This does not
protect all the contents correctly and would not interoperate with a
standard compliant implementation.

Fix this by using the correct number. A matching fix is needed in the AP
side (hostapd) to get FILS authentication working properly.

Fixes: 39404fe ("mac80211: FILS AEAD protection for station mode association frames")
Reported-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Jouni Malinen <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
The skcipher could have been of the async variant which may return from
skcipher_encrypt() with -EINPROGRESS after having queued the request.
The FILS AEAD implementation here does not have code for dealing with
that possibility, so allocate a sync cipher explicitly to avoid
potential issues with hardware accelerators.

This is based on the patch sent out by Ard.

Fixes: 39404fe ("mac80211: FILS AEAD protection for station mode association frames")
Reported-by: Ard Biesheuvel <[email protected]>
Signed-off-by: Jouni Malinen <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
The function ieee80211_ie_split_vendor doesn't return 0 on errors. Instead
it returns any offset < ielen when WLAN_EID_VENDOR_SPECIFIC is found. The
return value in mesh_add_vendor_ies must therefore be checked against
ifmsh->ie_len and not 0. Otherwise all ifmsh->ie starting with
WLAN_EID_VENDOR_SPECIFIC will be rejected.

Fixes: 082ebb0 ("mac80211: fix mesh beacon format")
Signed-off-by: Thorsten Horstmann <[email protected]>
Signed-off-by: Mathias Kretschmer <[email protected]>
Signed-off-by: Simon Wunderlich <[email protected]>
[[email protected]: Add commit message]
Signed-off-by: Sven Eckelmann <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
A previous change to fix checks for NL80211_MESHCONF_HT_OPMODE
missed setting the flag when replacing FILL_IN_MESH_PARAM_IF_SET
with checking codes. This results in dropping the received HT
operation value when called by nl80211_update_mesh_config(). Fix
this by setting the flag properly.

Fixes: 9757235 ("nl80211: correct checks for NL80211_MESHCONF_HT_OPMODE value")
Signed-off-by: Masashi Honma <[email protected]>
[rewrite commit message to use Fixes: line]
Signed-off-by: Johannes Berg <[email protected]>
* pm-core-fixes:
  PM / runtime: Avoid false-positive warnings from might_sleep_if()

* pm-cpufreq-fixes:
  cpufreq: intel_pstate: Disable energy efficiency optimization
  cpufreq: brcmstb-avs-cpufreq: properly retrieve P-state upon suspend
  cpufreq: brcmstb-avs-cpufreq: extend sysfs entry brcm_avs_pmap
snd_seq_pool_done() syncs with closing of all opened threads, but it
aborts the wait loop with a timeout, and proceeds to the release
resource even if not all threads have been closed.  The timeout was 5
seconds, and if you run a crazy stuff, it can exceed easily, and may
result in the access of the invalid memory address -- this is what
syzkaller detected in a bug report.

As a fix, let the code graduate from naiveness, simply remove the loop
timeout.

BugLink: http://lkml.kernel.org/r/CACT4Y+YdhDV2H5LLzDTJDVF-qiYHUHhtRaW4rbb4gUhTCQB81w@mail.gmail.com
Reported-by: Dmitry Vyukov <[email protected]>
Cc: <[email protected]>
Signed-off-by: Takashi Iwai <[email protected]>
Dmitry reported use-after-free in ip6_datagram_recv_specific_ctl()

A similar bug was fixed in commit 8ce4862 ("ipv6: tcp: restore
IP6CB for pktoptions skbs"), but I missed another spot.

tcp_v6_syn_recv_sock() can indeed set np->pktoptions from ireq->pktopts

Fixes: 971f10e ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: Dmitry Vyukov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
…/scm/linux/kernel/git/jberg/mac80211

Johannes Berg says:

====================
A few simple fixes:
 * fix FILS AEAD cipher usage to use the correct AAD vectors
   and to use synchronous algorithms
 * fix using mesh HT operation data from userspace
 * fix adding mesh vendor elements to beacons & plink frames
====================

Signed-off-by: David S. Miller <[email protected]>
torvalds and others added 21 commits February 17, 2017 09:53
…kernel/git/wsa/linux

Pull i2c fix from Wolfram Sang:
 "I2C has a revert to fix a regression"

* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  Revert "i2c: designware: detect when dynamic tar update is possible"
…/git/dtor/input

Pull input fix from Dmitry Torokhov:
 "Just a single change to Elan touchpad driver to recognize a new ACPI
  ID"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: elan_i2c - add ELAN0605 to the ACPI table
…el/git/powerpc/linux

Pull powerpc fix from Michael Ellerman:
 "One fix from Paul: we can not use the radix MMU under a hypervisor for
  now.

  Although the code checked if the processor supports radix, that is not
  sufficient"

* tag 'powerpc-4.10-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64: Disable use of radix under a hypervisor
In the current DCCP implementation an skb for a DCCP_PKT_REQUEST packet
is forcibly freed via __kfree_skb in dccp_rcv_state_process if
dccp_v6_conn_request successfully returns.

However, if IPV6_RECVPKTINFO is set on a socket, the address of the skb
is saved to ireq->pktopts and the ref count for skb is incremented in
dccp_v6_conn_request, so skb is still in use. Nevertheless, it gets freed
in dccp_rcv_state_process.

Fix by calling consume_skb instead of doing goto discard and therefore
calling __kfree_skb.

Similar fixes for TCP:

fb7e239 [TCP]: skb is unexpectedly freed.
0aea76d tcp: SYN packets are now
simply consumed

Signed-off-by: Andrey Konovalov <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Since the commit 0c1d70a ("net: use dst_cache for vxlan device")
vxlan_fill_metadata_dst() calls vxlan_get_route() passing a NULL
dst_cache pointer, so the latter should explicitly check for
valid dst_cache ptr. Unfortunately the commit d71785f ("net: add
dst_cache to ovs vxlan lwtunnel") removed said check.

As a result is possible to trigger a null pointer access calling
vxlan_fill_metadata_dst(), e.g. with:

ovs-vsctl add-br ovs-br0
ovs-vsctl add-port ovs-br0 vxlan0 -- set interface vxlan0 \
	type=vxlan options:remote_ip=192.168.1.1 \
	options:key=1234 options:dst_port=4789 ofport_request=10
ip address add dev ovs-br0 172.16.1.2/24
ovs-vsctl set Bridge ovs-br0 ipfix=@i -- --id=@i create IPFIX \
	targets=\"172.16.1.1:1234\" sampling=1
iperf -c 172.16.1.1 -u -l 1000 -b 10M -t 1 -p 1234

This commit addresses the issue passing to vxlan_get_route() the
dst_cache already available into the lwt info processed by
vxlan_fill_metadata_dst().

Fixes: d71785f ("net: add dst_cache to ovs vxlan lwtunnel")
Signed-off-by: Paolo Abeni <[email protected]>
Acked-by: Jiri Benc <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Pull block layer fix from Jens Axboe:
 "A single fix for a lockdep splat reported by Thomas and Gabriel"

* 'for-linus' of git://git.kernel.dk/linux-block:
  cfq-iosched: don't call wbt_disable_default() with IRQs disabled
A nested lock depth was added to the hasbin_delete() code but it
doesn't actually work some well and results in tons of lockdep splats.

Fix the code instead to properly drop the lock around the operation
and just keep peeking the head of the hashbin queue.

Reported-by: Dmitry Vyukov <[email protected]>
Tested-by: Dmitry Vyukov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
This patch enables the Qualcomm RPM based Clock Controller present on
A-family boards.

Signed-off-by: Andy Gross <[email protected]>
Acked-by: Bjorn Andersson <[email protected]>
Signed-off-by: Olof Johansson <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Use rcuidle console tracepoint because, apparently, it may be issued
from an idle CPU:

  hw-breakpoint: Failed to enable monitor mode on CPU 0.
  hw-breakpoint: CPU 0 failed to disable vector catch

  ===============================
  [ ERR: suspicious RCU usage.  ]
  4.10.0-rc8-next-20170215+ lkl#119 Not tainted
  -------------------------------
  ./include/trace/events/printk.h:32 suspicious rcu_dereference_check() usage!

  other info that might help us debug this:

  RCU used illegally from idle CPU!
  rcu_scheduler_active = 2, debug_locks = 0
  RCU used illegally from extended quiescent state!
  2 locks held by swapper/0/0:
   #0:  (cpu_pm_notifier_lock){......}, at: [<c0237e2c>] cpu_pm_exit+0x10/0x54
   #1:  (console_lock){+.+.+.}, at: [<c01ab350>] vprintk_emit+0x264/0x474

  stack backtrace:
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-rc8-next-20170215+ lkl#119
  Hardware name: Generic OMAP4 (Flattened Device Tree)
    console_unlock
    vprintk_emit
    vprintk_default
    printk
    reset_ctrl_regs
    dbg_cpu_pm_notify
    notifier_call_chain
    cpu_pm_exit
    omap_enter_idle_coupled
    cpuidle_enter_state
    cpuidle_enter_state_coupled
    do_idle
    cpu_startup_entry
    start_kernel

This RCU warning, however, is suppressed by lockdep_off() in printk().
lockdep_off() increments the ->lockdep_recursion counter and thus
disables RCU_LOCKDEP_WARN() and debug_lockdep_rcu_enabled(), which want
lockdep to be enabled "current->lockdep_recursion == 0".

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Sergey Senozhatsky <[email protected]>
Reported-by: Tony Lindgren <[email protected]>
Tested-by: Tony Lindgren <[email protected]>
Acked-by: Paul E. McKenney <[email protected]>
Acked-by: Steven Rostedt (VMware) <[email protected]>
Cc: Petr Mladek <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Lindgren <[email protected]>
Cc: Russell King <[email protected]>
Cc: <[email protected]> [3.4+]
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Pull networking fixes from David Miller:

 1) Fix leak in dpaa_eth error paths, from Dan Carpenter.

 2) Use after free when using IPV6_RECVPKTINFO, from Andrey Konovalov.

 3) fanout_release() cannot be invoked from atomic contexts, from Anoob
    Soman.

 4) Fix bogus attempt at lockdep annotation in IRDA.

 5) dev_fill_metadata_dst() can OOP on a NULL dst cache pointer, from
    Paolo Abeni.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  irda: Fix lockdep annotations in hashbin_delete().
  vxlan: fix oops in dev_fill_metadata_dst
  dccp: fix freeing skb too early for IPV6_RECVPKTINFO
  dpaa_eth: small leak on error
  packet: Do not call fanout_release from atomic contexts
…m/linux/kernel/git/tip/tip

Pull timer fixes from Thomas Gleixner:
 "Two small fixes::

   - Prevent deadlock on the tick broadcast lock. Found and fixed by
     Mike.

   - Stop using printk() in the timekeeping debug code to prevent a
     deadlock against the scheduler"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timekeeping: Use deferred printk() in debug code
  tick/broadcast: Prevent deadlock on tick_broadcast_lock
…cm/linux/kernel/git/tip/tip

Pull locking fix from Thomas Gleixner:
 "Move the futex init function to core initcall so user mode helper does
  not run into an uninitialized futex syscall"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  futex: Move futex_init() to core_initcall
…inux/kernel/git/tip/tip

Pull x86 fix from Thomas Gleixner:
 "Make the build clean by working around yet another GCC stupidity"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/vm86: Fix unused variable warning if THP is disabled
Pull ARM fixes from Russell King:
 "A couple of fixes from Kees concerning problems he spotted with our
  user access support"

* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
  ARM: 8658/1: uaccess: fix zeroing of 64-bit get_user()
  ARM: 8657/1: uaccess: consistently check object sizes
…nel/git/arm/arm-soc

Pull ARM SoC fixes from Arnd Bergmann:
 "Two more bugfixes that came in during this week:

   - a defconfig change to enable a vital driver used on some Qualcomm
     based phones. This was already queued for 4.11, but the maintainer
     asked to have it in 4.10 after all.

   - a regression fix for the reset controller framework, this got
     broken by a typo in the 4.10 merge window"

* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: multi_v7_defconfig: enable Qualcomm RPMCC
  reset: fix shared reset triggered_count decrement on error
If ip6_dst_lookup_tail has acquired a dst and fails the IPv4-mapped
check, release the dst before returning an error.

Fixes: ec5e3b0 ("ipv6: Inhibit IPv4-mapped src address on the wire.")
Signed-off-by: Willem de Bruijn <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
Don't crash the machine just because of an empty transfer. Use WARN_ON()
combined with returning an error.

Found by Dmitry Vyukov and syzkaller.

[ Changed to "WARN_ON_ONCE()". Al has a patch that should fix the root
  cause, but a BUG_ON() is not acceptable in any case, and a WARN_ON()
  might still be a cause of excessive log spamming.

  NOTE! If this warning ever triggers, we may end up leaking resources,
  since this doesn't bother to try to clean the command up. So this
  WARN_ON_ONCE() triggering does imply real problems. But BUG_ON() is
  much worse.

  People really need to stop using BUG_ON() for "this shouldn't ever
  happen". It makes pretty much any bug worse.     - Linus ]

Signed-off-by: Johannes Thumshirn <[email protected]>
Reported-by: Dmitry Vyukov <[email protected]>
Cc: James Bottomley <[email protected]>
Cc: Al Viro <[email protected]>
Cc: [email protected]
Signed-off-by: Linus Torvalds <[email protected]>
What happens is that a write to /dev/sg is given a request with non-zero
->iovec_count combined with zero ->dxfer_len.  Or with ->dxferp pointing
to an array full of empty iovecs.

Having write permission to /dev/sg shouldn't be equivalent to the
ability to trigger BUG_ON() while holding spinlocks...

Found by Dmitry Vyukov and syzkaller.

[ The BUG_ON() got changed to a WARN_ON_ONCE(), but this fixes the
  underlying issue.  - Linus ]

Signed-off-by: Al Viro <[email protected]>
Reported-by: Dmitry Vyukov <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: [email protected]
Signed-off-by: Linus Torvalds <[email protected]>
Linux 4.10

This fixes lkl#245.

Conflicts:
	fs/xfs/xfs_buf.c
	kernel/sched/idle.c
	scripts/link-vmlinux.sh
	tools/Makefile

Signed-off-by: Hajime Tazaki <[email protected]>
This commit includes several adapts for arch/lkl:
- change cpu_idle_loop to do_idle for scheduling
- cycle_t to u64 for clock_read() return value

Signed-off-by: Hajime Tazaki <[email protected]>
Copy link
Member

@liuyuan10 liuyuan10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great upstream exposes do_idle for us :)
Don't know what's the best process. Do you want to reverse commit 9072b9a so that it doesn't count as an change to upstream in LKL

@thehajime
Copy link
Member Author

Great upstream exposes do_idle for us :)
Don't know what's the best process. Do you want to reverse commit 9072b9a so that it doesn't count as an change to upstream in LKL

@liuyuan10 thanks for the review.

no, exposure of do_idle() is not from upstream, but from mine (b405804). The upstream changed the name of function from cpu_idle_loop() to do_idle().

@liuyuan10
Copy link
Member

ha, I'm too optimistic.

BTW: this fixes #245

@laijs
Copy link

laijs commented Feb 22, 2017

#336 changes how the idle is handled.

@tavip tavip merged commit ddc34f7 into lkl:master Feb 22, 2017
@thehajime thehajime deleted the v4.10-merge branch March 6, 2017 02:57
@tavip tavip mentioned this pull request Sep 15, 2017
retrage pushed a commit to retrage/linux that referenced this pull request Dec 14, 2018
This patch fixes the following splat.

[118709.054937] BUG: using smp_processor_id() in preemptible [00000000] code: test/1571
[118709.054970] caller is nft_update_chain_stats.isra.4+0x53/0x97 [nf_tables]
[118709.054980] CPU: 2 PID: 1571 Comm: test Not tainted 4.17.0-rc6+ lkl#335
[...]
[118709.054992] Call Trace:
[118709.055011]  dump_stack+0x5f/0x86
[118709.055026]  check_preemption_disabled+0xd4/0xe4

Signed-off-by: Pablo Neira Ayuso <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.