merge-upstream/v5.10.42 from branch/tag: upstream/v5.10.42 into branch: cos-5.10
Changelog:
-------------------------------------------------------------
A. Cody Schuffelen (1):
virt_wifi: Return micros for BSS TSF values
Abhijeet Rao (1):
xhci-pci: Allow host runtime PM as default for Intel Alder Lake xHCI
Abhinav Kumar (1):
drm/msm/dp: Fix incorrect NULL check kbot warnings in DP driver
Adam Ford (1):
clk: imx: Fix reparenting of UARTs not associated with stdout
Adrian Hunter (8):
perf inject: Fix repipe usage
mmc: sdhci-pci: Fix initialization of some SD cards for Intel BYT-based controllers
mmc: sdhci-pci: Add PCI IDs for Intel LKF
perf intel-pt: Fix sample instruction bytes
perf intel-pt: Fix transaction abort handling
perf scripts python: exported-sql-viewer.py: Fix copy to clipboard from Top Calls by elapsed Time report
perf scripts python: exported-sql-viewer.py: Fix Array TypeError
perf scripts python: exported-sql-viewer.py: Fix warning display
Ahmed S. Darwish (1):
net: xfrm: Localize sequence counter per network namespace
Ajish Koshy (1):
scsi: pm80xx: Fix drives missing during rmmod/insmod loop
Al Cooper (1):
mmc: sdhci-brcmstb: Remove CQE quirk
Al Viro (2):
LOOKUP_MOUNTPOINT: we are cleaning "jumped" flag too late
hostfs: fix memory handling in follow_link()
Alaa Emad (2):
media: dvb: Add check on sp8870_readreg return
media: gspca: mt9m111: Check write_bridge for timeout
Alain Volmat (1):
spi: stm32: Fix use-after-free on unbind
Alan Stern (1):
USB: usbfs: Don't WARN about excessively large memory allocations
Alban Bedel (1):
platform/x86: intel-hid: Support Lenovo ThinkPad X1 Tablet Gen 2
Aleksander Jan Bajkowski (1):
net: lantiq: fix memory corruption in RX ring
Alex Elder (2):
net: ipa: fix init header command validation
net: ipa: memory region array is variable size
Alexander Aring (23):
net: ieee802154: nl-mac: fix check on panid
net: ieee802154: fix nl802154 del llsec key
net: ieee802154: fix nl802154 del llsec dev
net: ieee802154: fix nl802154 add llsec key
net: ieee802154: fix nl802154 del llsec devkey
net: ieee802154: forbid monitor for set llsec params
net: ieee802154: forbid monitor for del llsec seclevel
net: ieee802154: stop dump llsec params for monitors
net: ieee802154: stop dump llsec keys for monitors
net: ieee802154: forbid monitor for add llsec key
net: ieee802154: forbid monitor for del llsec key
net: ieee802154: stop dump llsec devs for monitors
net: ieee802154: forbid monitor for add llsec dev
net: ieee802154: forbid monitor for del llsec dev
net: ieee802154: stop dump llsec devkeys for monitors
net: ieee802154: forbid monitor for add llsec devkey
net: ieee802154: forbid monitor for del llsec devkey
net: ieee802154: stop dump llsec seclevels for monitors
net: ieee802154: forbid monitor for add llsec seclevel
fs: dlm: fix debugfs dump
fs: dlm: add errno handling to check callback
fs: dlm: check on minimum msglen size
fs: dlm: flush swork on shutdown
Alexander Gordeev (1):
s390/cpcmd: fix inline assembly register clobbering
Alexander Lobakin (3):
mtd: spinand: core: add missing MODULE_DEVICE_TABLE()
xsk: Respect device's headroom and tailroom on generic xmit path
gro: fix napi_gro_frags() Fast GRO breakage due to IP alignment check
Alexander Shishkin (2):
intel_th: pci: Add Rocket Lake CPU support
intel_th: pci: Add Alder Lake-M support
Alexander Shiyan (1):
ASoC: fsl_esai: Fix TDM slot setup for I2S mode
Alexander Usyskin (1):
mei: request autosuspend after sending rx flow control
Alexandru Ardelean (3):
iio: adc: Kconfig: make AD9467 depend on ADI_AXI_ADC symbol
iio: hid-sensors: select IIO_TRIGGERED_BUFFER under HID_SENSOR_IIO_TRIGGER
iio: adc: ad7192: handle regulator voltage error first
Alexandru Elisei (1):
KVM: arm64: Initialize VCPU mdcr_el2 before loading it
Alexey Kardashevskiy (2):
powerpc/iommu: Annotate nested lock for lockdep
powerpc: Fix early setup to make early_ioremap() work
Ali Saidi (1):
locking/qrwlock: Fix ordering in queued_write_lock_slowpath()
Alison Schofield (1):
x86, sched: Treat Intel SNC topology as default, COD as exception
Amir Goldstein (1):
ovl: invalidate readdir cache on changes to dir with origin
Amit (1):
nvmet: remove unused ctrl->cqs
Anastasia Kovaleva (1):
scsi: qla2xxx: Prevent PRLI in target mode
Andre Edich (1):
net: phy: lan87xx: fix access to wrong register of LAN87xx
Andre Przywara (5):
kselftest/arm64: sve: Do not use non-canonical FFR register value
arm64: dts: allwinner: Fix SD card CD GPIO for SOPine systems
arm64: dts: allwinner: Revert SD card CD GPIO for Pine64-LTS
kselftest/arm64: mte: Fix compilation with native compiler
kselftest/arm64: mte: Fix MTE feature detection
Andrei Matei (1):
bpf: Allow variable-offset stack access
Andrew Jeffery (1):
serial: 8250: Add UART_BUG_TXRACE workaround for Aspeed VUART
Andrew Price (1):
gfs2: Flag a withdraw if init_threads() fails
Andrew Scull (1):
bug: Remove redundant condition check in report_bug
Andrii Nakryiko (7):
libbpf: Add explicit padding to bpf_xdp_set_link_opts
bpftool: Fix maybe-uninitialized warnings
selftests/bpf: Re-generate vmlinux.h and BPF skeletons if bpftool changed
selftests/bpf: Fix BPF_CORE_READ_BITFIELD() macro
selftests/bpf: Fix field existence CO-RE reloc tests
selftests/bpf: Fix core_reloc test runner
bpf: Prevent writable memory-mapping of read-only ringbuf pages
Andy Gospodarek (1):
bnxt_en: Include new P5 HV definition in VF check.
Andy Shevchenko (16):
i2c: designware: Adjust bus_freq_hz when refuse high speed mode set
gpiolib: Read "gpio-line-names" from a firmware node
dmaengine: dw: Make it dependent to HAS_IOMEM
pinctrl: core: Show pin numbers for the controllers with base = 0
usb: gadget: pch_udc: Revert d3cb25a12138 completely
usb: gadget: pch_udc: Replace cpu_to_le32() by lower_32_bits()
usb: gadget: pch_udc: Check if driver is present before calling ->setup()
usb: gadget: pch_udc: Check for DMA mapping error
usb: gadget: pch_udc: Initialize device pointer before use
usb: gadget: pch_udc: Provide a GPIO line used on Intel Minnowboard (v1)
driver core: platform: Declare early_platform_cleanup() prototype
usb: typec: ucsi: Put fwnode in any case during ->probe()
scripts: switch explicitly to Python 3
iio: dac: ad5770r: Put fwnode in error case during ->probe()
platform/x86: intel_punit_ipc: Append MODULE_DEVICE_TABLE for ACPI
spi: Assume GPIO CS active high in ACPI case
Aniruddha Tvs Rao (1):
mmc: sdhci-tegra: Add required callbacks to set/clear CQE_EN bit
Anirudh Rayabharam (8):
net: hso: fix null-ptr-deref during tty device unregistration
usb: gadget: dummy_hcd: fix gpf in gadget_setup
rapidio: handle create_workqueue() failure
net: stmicro: handle clk_prepare() failure during init
video: hgafb: correctly handle card detect failure during probe
net: fujitsu: fix potential null-ptr-deref
net/smc: properly handle workqueue allocation failure
ath6kl: return error code in ath6kl_wmi_set_roam_lrssi_cmd()
Anirudh Venkataramanan (2):
ice: Continue probe on link/PHY errors
ice: Use port number instead of PF ID for WoL
Anna Schumaker (1):
NFSv4: Fix a NULL pointer dereference in pnfs_mark_matching_lsegs_return()
Annaliese McDermond (3):
ASoC: tlv320aic32x4: Register clocks before registering component
ASoC: tlv320aic32x4: Increase maximum register in regmap
sc16is7xx: Defer probe if device read fails
Anson Jacob (3):
drm/amdkfd: Fix UBSAN shift-out-of-bounds warning
drm/amd/display: Fix UBSAN warning for not a valid value for type '_Bool'
drm/amd/display: Fix UBSAN: shift-out-of-bounds warning
Anthony Wang (1):
drm/amd/display: Force vsync flip when reconfiguring MPCC
Antoine Tenart (2):
vxlan: do not modify the shared tunnel info when PMTU triggers an ICMP reply
geneve: do not modify the shared tunnel info when PMTU triggers an ICMP reply
Antonio Borneo (1):
spi: stm32: drop devres version of spi_register_master
Anup Patel (1):
RISC-V: Fix error code returned by riscv_hartid_to_cpuid()
Archie Pusaka (3):
Bluetooth: verify AMP hci_chan before amp_destroy
Bluetooth: Set CONF_NOT_COMPLETE as l2cap_chan default
Bluetooth: check for zapped sk before connecting
Ard Biesheuvel (6):
ARM: 9056/1: decompressor: fix BSS size calculation for LLVM ld.lld
crypto: api - check for ERR pointers in crypto_destroy_tfm()
ARM: 9011/1: centralize phys-to-virt conversion of DT/ATAGS address
ARM: 9012/1: move device tree mapping out of linear region
ARM: 9020/1: mm: use correct section size macro to describe the FDT virtual address
ARM: 9027/1: head.S: explicitly map DT even if it lives in the first physical section
Aric Cyr (2):
drm/amd/display: Don't optimize bandwidth before disabling planes
drm/amd/display: DCHUB underflow counter increasing in some scenarios
Ariel Levkovich (2):
net/mlx5e: Fix mapping of ct_label zero
net/mlx5: Set term table as an unmanaged flow table
Arkadiusz Kubalewski (4):
i40e: Fix sparse warning: missing error code 'err'
i40e: Fix sparse error: 'vsi->netdev' could be null
i40e: Fix sparse error: uninitialized symbol 'ring'
i40e: Fix sparse errors in i40e_txrx.c
Arnaldo Carvalho de Melo (2):
perf map: Tighten snprintf() string precision to pass gcc check on some 32-bit arches
perf symbols: Fix dso__fprintf_symbols_by_name() to return the number of printed chars
Arnd Bergmann (25):
x86/build: Turn off -fcf-protection for realmode targets
remoteproc: qcom: pil_info: avoid 64-bit division
soc/fsl: qbman: fix conflicting alignment attributes
lockdep: Address clang -Wformat warning printing for %hd
drm/imx: imx-ldb: fix out of bounds array access warning
ARM: keystone: fix integer overflow warning
ARM: omap1: fix building with clang IAS
Input: i8042 - fix Pegatron C15B ID entry
kasan: fix hwasan build for gcc
amdgpu: avoid incorrect %hu format string
security: commoncap: fix -Wstringop-overread warning
spi: rockchip: avoid objtool warning
crypto: poly1305 - fix poly1305_core_setkey() declaration
irqchip/gic-v3: Fix OF_BAD_ADDR error handling
wlcore: fix overlapping snprintf arguments in debugfs
net: enetc: fix link error again
smp: Fix smp_call_function_single_async prototype
ext4: fix debug format string warning
x86/msr: Fix wr/rdmsr_safe_regs_on_cpu() prototypes
airo: work around stack usage warning
kgdb: fix gcc-11 warning on indentation
usb: sl811-hcd: improve misleading indentation
isdn: capi: fix mismatched prototypes
PCI: thunder: Fix compile testing
kcsan: Fix debugfs initcall return type
Artur Petrosyan (3):
usb: dwc2: Fix session request interrupt handler
usb: dwc2: Fix host mode hibernation exit with remote wakeup flow.
usb: dwc2: Fix hibernation between host and device modes.
Arun Easi (2):
scsi: qla2xxx: Fix crash in qla2xxx_mqueuecommand()
PCI: Allow VPD access for QLogic ISP2722
Athira Rajeev (2):
powerpc/perf: Fix PMU constraint check for EBB events
powerpc/perf: Fix the threshold event selection for memory events in power10
Atul Gopinathan (2):
cdrom: gdrom: deallocate struct gdrom_unit fields in remove_gdrom
serial: max310x: unregister uart driver in case of failure and abort
Aurelien Aptel (2):
smb2: fix use-after-free in smb2_ioctl_query_info()
cifs: set server->cipher_type to AES-128-CCM for SMB3.0
Avri Altman (2):
mmc: block: Update ext_csd.cache_ctrl if it was written
mmc: block: Issue a cache flush only when it's enabled
Axel Rasmussen (1):
userfaultfd: release page in error path to avoid BUG_ON
Aya Levin (5):
net/mlx5e: Fix ethtool indication of connector type
net/mlx5: Fix PPLM register mapping
net/mlx5: Fix PBMC register mapping
net/mlx5e: Fix setting of RS FEC mode
net/mlx5e: Fix error path of updating netdev queues
Ayush Garg (1):
Bluetooth: Fix incorrect status handling in LE PHY UPDATE event
Ayush Sawal (2):
crypto: chelsio - Read rxchannel-id from firmware
cxgb4/ch_ktls: Clear resources when pf4 device is removed
Badhri Jagan Sridharan (5):
usb: typec: tcpm: Address incorrect values of tcpm psy for fixed supply
usb: typec: tcpm: Address incorrect values of tcpm psy for pps supply
usb: typec: tcpm: update power supply once partner accepts
usb: typec: tcpci: Check ROLE_CONTROL while interpreting CC_STATUS
usb: typec: tcpm: Fix error while calculating PPS out values
Baptiste Lepers (1):
sunrpc: Fix misplaced barrier in call_decode
Bart Van Assche (4):
scsi: qla2xxx: Always check the return value of qla24xx_get_isp_stats()
scsi: libfc: Fix a format specifier
blk-mq: Swap two calls in blk_mq_exit_queue()
scsi: ufs: core: Increase the usable queue depth
Bas Nieuwenhuizen (1):
tools/power/turbostat: Fix turbostat for AMD Zen CPUs
Bastian Germann (1):
ASoC: sunxi: sun4i-codec: fill ASoC card owner
Ben Gardon (5):
KVM: x86/mmu: change TDP MMU yield function returns to match cond_resched
KVM: x86/mmu: Merge flush and non-flush tdp_mmu_iter_cond_resched
KVM: x86/mmu: Rename goal_gfn to next_last_level_gfn
KVM: x86/mmu: Ensure forward progress when yielding in TDP MMU iter
KVM: x86/mmu: Yield in TDU MMU iter even if no SPTES changed
Ben Greear (1):
mac80211: fix time-is-after bug in mlme
Bence Csókás (1):
i2c: Add I2C_AQ_NO_REP_START adapter quirk
Benjamin Block (1):
dm rq: fix double free of blk_mq_tag_set in dev remove after table load fails
Benjamin Segall (1):
kvm: exit halt polling on need_resched() as well
Bhaumik Bhatt (3):
bus: mhi: core: Clear configuration from channel context during reset
bus: mhi: core: Destroy SBL devices when moving to mission mode
bus: mhi: core: Clear context for stopped channels from remove()
Bill Wendling (1):
arm64/vdso: Discard .note.gnu.property sections in vDSO
Bixuan Cui (3):
usb: musb: fix PM reference leak in musb_irq_work()
usb: core: hub: Fix PM reference leak in usb_port_resume()
iommu/virtio: Add missing MODULE_DEVICE_TABLE
Bjorn Andersson (5):
net: qrtr: Avoid potential use after free in MHI send
soc: qcom: mdt_loader: Validate that p_filesz < p_memsz
soc: qcom: mdt_loader: Detect truncated read of segments
remoteproc: qcom_q6v5_mss: Validate p_filesz in ELF loader
usb: typec: mux: Fix matching with typec_altmode_desc
Bob Pearson (1):
RDMA/rxe: Fix a bug in rxe_fill_ip_info()
Bob Peterson (1):
gfs2: report "already frozen/thawed" errors
Bodo Stroesser (1):
scsi: target: tcmu: Return from tcmu_handle_completions() if cmd_id not found
Boris Brezillon (2):
drm/panfrost: Clear MMU irqs before handling the fault
drm/panfrost: Don't try to map pages that are already mapped
Boris Burkov (1):
btrfs: return whole extents in fiemap
Brendan Jackman (1):
libbpf: Fix signed overflow in ringbuf_process_ring
Brian King (1):
scsi: ibmvfc: Fix invalid state machine BUG_ON()
Bruce Allan (1):
ice: fix memory allocation call
Caleb Connolly (1):
Input: s6sy761 - fix coordinate read bit shift
Calvin Walton (1):
tools/power turbostat: Fix offset overflow issue in index converting
Can Guo (5):
scsi: ufs: core: Fix task management request completion timeout
scsi: ufs: core: Fix wrong Task Tag used in task management request UPIUs
scsi: ufs: core: Do not put UFS power into LPM if link is broken
scsi: ufs: core: Cancel rpm_dev_flush_recheck_work during system suspend
scsi: ufs: core: Narrow down fast path in system suspend path
Carl Philipp Klemm (1):
power: supply: cpcap-charger: Add usleep to cpcap charger to avoid usb plug bounce
Carlos Leija (1):
ARM: OMAP4: PM: update ROM return address for OSWR and OFF
Carsten Haitzler (1):
drm/komeda: Fix bit check to import to value of proper type
Catalin Marinas (3):
arm64: mte: Ensure TIF_MTE_ASYNC_FAULT is set atomically
arm64: Remove arm64_dma32_phys_limit and its uses
arm64: Fix race condition on PG_dcache_clean in __sync_icache_dcache()
Catherine Sullivan (2):
gve: Check TX QPL was actually assigned
gve: Upgrade memory barrier in poll routine
Cezary Rojewski (1):
ASoC: Intel: Skylake: Compile when any configuration is selected
Chaitanya Kulkarni (3):
scsi: target: pscsi: Fix warning in pscsi_complete_cmd()
nvmet: add lba to sect conversion helpers
nvmet: fix inline bio check for bdev-ns
Changfeng (1):
drm/amdgpu: disable 3DCGCG on picasso/raven1 to avoid compute hang
Chao Yu (14):
f2fs: fix to avoid out-of-bounds memory access
f2fs: move ioctl interface definitions to separated file
f2fs: fix compat F2FS_IOC_{MOVE,GARBAGE_COLLECT}_RANGE
f2fs: fix to allow migrating fully valid segment
f2fs: fix panic during f2fs_resize_fs()
f2fs: fix to align to section for fallocate() on pinned file
f2fs: fix to update last i_size if fallocate partially succeeds
f2fs: fix to avoid touching checkpointed data in get_victim()
f2fs: fix to cover __allocate_new_section() with curseg_lock
f2fs: fix to avoid accessing invalid fio in f2fs_allocate_data_block()
f2fs: avoid unneeded data copy in f2fs_ioc_move_range()
f2fs: compress: fix to free compress page correctly
f2fs: compress: fix race condition of overwrite vs truncate
f2fs: compress: fix to assign cc.cluster_idx correctly
Charan Teja Reddy (1):
sched,psi: Handle potential task count underflow bugs more gracefully
Chen Huang (1):
powerpc: Fix HAVE_HARDLOCKUP_DETECTOR_ARCH build configuration
Chen Hui (2):
clk: qcom: a53-pll: Add missing MODULE_DEVICE_TABLE
clk: qcom: apss-ipq-pll: Add missing MODULE_DEVICE_TABLE
Chen Jun (1):
posix-timers: Preserve return value in clock_adjtime32()
Chinh T Cao (2):
ice: Refactor DCB related variables out of the ice_port_info struct
ice: Recognize 860 as iSCSI port in CEE mode
Chinmay Agarwal (1):
neighbour: Prevent Race condition in neighbour subsytem
Chris Chiu (2):
block: clear GD_NEED_PART_SCAN later in bdev_disk_changed
USB: Add reset-resume quirk for WD19's Realtek Hub
Chris Dion (1):
SUNRPC: Handle major timeout in xprt_adjust_timeout()
Chris Park (1):
drm/amd/display: Disconnect non-DP with no EDID
Chris von Recklinghausen (1):
PM: hibernate: x86: Use crc32 instead of md5 for hibernation e820 integrity check
Christian A. Ehrhardt (1):
vfio/pci: Add missing range check in vfio_pci_mmap
Christian Gmeiner (1):
serial: 8250_pci: handle FL_NOIRQ board flag
Christian König (2):
drm/amdgpu: fix concurrent VM flushes on Vega/Navi v2
drm/amdgpu: stop touching sched.ready in the backend
Christoph Hellwig (5):
block: return -EBUSY when there are open partitions in blkdev_reread_part
md: split mddev_find
md: factor out a mddev_find_locked helper from mddev_find
nvme: do not try to reconfigure APST when the controller is not live
nvme-multipath: fix double initialization of ANA state
Christophe JAILLET (16):
net: davicom: Fix regulator not turned off on failed probe
mmc: uniphier-sd: Fix an error handling path in uniphier_sd_probe()
mmc: uniphier-sd: Fix a resource leak in the remove function
usb: gadget: s3c: Fix incorrect resources releasing
usb: gadget: s3c: Fix the error handling path in 's3c2410_udc_probe()'
media: venus: core: Fix some resource leaks in the error path of 'venus_probe()'
usb: fotg210-hcd: Fix an error message
usb: musb: Fix an error message
ACPI: scan: Fix a memory leak in an error handling path
xhci: Do not use GFP_KERNEL in (potentially) atomic context
openrisc: Fix a memory leak
uio_hv_generic: Fix a memory leak in error handling paths
spi: spi-fsl-dspi: Fix a resource leak in an error handling path
net: netcp: Fix an error message
net: mdio: thunder: Fix a double free issue in the .remove function
net: mdio: octeon: Fix some double free issues
Christophe Kerello (1):
spi: stm32-qspi: fix pm_runtime usage_count counter
Christophe Leroy (5):
mm: ptdump: fix build failure
powerpc/32: Fix boot failure with CONFIG_STACKPROTECTOR
powerpc/64: Fix the definition of the fixmap area
powerpc/52xx: Fix an invalid ASM expression ('addi' used instead of 'add')
powerpc/32: Statically initialise first emergency context
Chuck Lever (6):
NFSD: Fix sparse warning in nfs4proc.c
SUNRPC: Move fault injection call sites
SUNRPC: Remove trace_xprt_transmit_queued
xprtrdma: Avoid Receive Queue wrapping
xprtrdma: Fix cwnd update ordering
xprtrdma: rpcrdma_mr_pop() already does list_del_init()
Chunfeng Yun (6):
arm64: dts: mt8173: fix property typo of 'phys' in dsi node
usb: xhci-mtk: support quirk to disable usb2 lpm
usb: xhci-mtk: remove or operator for setting schedule parameters
usb: xhci-mtk: improve bandwidth scheduling with TT
usb: core: hub: fix race condition about TRSMRCY of resume
usb: core: reduce power-on-good delay time of root hub
Ciara Loftus (4):
libbpf: Ensure umem pointer is non-NULL before dereferencing
libbpf: Restore umem state after socket create failure
libbpf: Only create rx and tx XDP rings when necessary
libbpf: Fix potential NULL pointer dereference
Claire Chang (1):
swiotlb: Fix the type of index
Claudio Imbrenda (5):
KVM: s390: VSIE: correctly handle MVPG when in VSIE
KVM: s390: split kvm_s390_logical_to_effective
KVM: s390: VSIE: fix MVPG handling for prefixing and MSO
KVM: s390: split kvm_s390_real_to_abs
KVM: s390: extend kvm_s390_shadow_fault to return entry pointer
Claudiu Beznea (2):
net: macb: restore cmp registers on resume path
net: macb: fix the restore of cmp registers
Claudiu Manoil (1):
gianfar: Handle error code at MAC address change
Colin Ian King (30):
ice: Fix potential infinite loop when using u8 loop counter
clk: socfpga: arria10: Fix memory leak of socfpga_clk on error return
drm/radeon: fix copy of uninitialized variable back to userspace
memory: gpmc: fix out of bounds read and dereference on gpmc_cs[]
crypto: sun8i-ss - Fix memory leak of object d when dma_iv fails to map
staging: rtl8192u: Fix potential infinite loop
crypto: sun8i-ss - Fix memory leak of pad
crypto: sa2ul - Fix memory leak of rxd
usb: gadget: r8a66597: Add missing null check on return from platform_get_resource
media: vivid: fix assignment of dev->fbuf_out_flags
media: [next] staging: media: atomisp: fix memory leak of object flash
media: m88rs6000t: avoid potential out-of-bounds reads on arrays
clk: uniphier: Fix potential infinite loop
scsi: pm80xx: Fix potential infinite loop
ASoC: Intel: boards: sof-wm8804: add check for PLL setting
liquidio: Fix unintented sign extension of a left shift of a u16
xfs: fix return of uninitialized value in variable error
mt7601u: fix always true expression
cxgb4: Fix unintentional sign extension issues
net: thunderx: Fix unintentional sign extension issue
net/mlx5: Fix bit-wise and with zero
ALSA: usb: midi: don't return -ENOMEM when usb_urb_ep_type_check fails
net: davinci_emac: Fix incorrect masking of tx and rx error channel
wlcore: Fix buffer overrun by snprintf due to incorrect buffer size
KEYS: trusted: Fix memory leak on object td
f2fs: fix a redundant call to f2fs_balance_fs if an error occurs
dmaengine: idxd: Fix potential null dereference on pointer status
fs/proc/generic.c: fix incorrect pde_is_permanent check
iio: tsl2583: Fix division by a zero lux_val
serial: tegra: Fix a mask operation that is always true
Colin Xu (2):
drm/i915/gvt: Fix virtual display setup for BXT/APL
drm/i915/gvt: Fix vfio_edid issue for BXT/APL
Cong Wang (1):
smc: disallow TCP_ULP in smc_setsockopt()
Corentin Labbe (2):
crypto: sun8i-ss - fix result memory leak on error path
crypto: allwinner - add missing CRYPTO_ prefix
Cédric Le Goater (2):
powerpc/xive: Drop check on irq_data in xive_core_debug_show()
powerpc/xive: Fix xmon command "dxi"
DENG Qingfang (1):
net: dsa: mt7530: fix VLAN traffic leaks
Dafna Hirschfeld (1):
media: rkisp1: rsz: crash fix when setting src format
Damien Le Moal (1):
null_blk: fix command timeout completion handling
Dan Carpenter (30):
thunderbolt: Fix a leak in tb_retimer_add()
thunderbolt: Fix off by one in tb_port_find_retimer()
dmaengine: plx_dma: add a missing put_device() on error path
ipw2x00: potential buffer overflow in libipw_wx_set_encodeext()
ovl: fix missing revert_creds() on error path
mtd: rawnand: fsmc: Fix error code in fsmc_nand_probe()
regulator: bd9576: Fix return from bd957x_probe()
node: fix device cleanups in error handling code
Drivers: hv: vmbus: Use after free in __vmbus_open()
soc: aspeed: fix a ternary sign expansion bug
drm/amd/display: Fix off by one in hdmi_14_process_transaction()
media: atomisp: Fix use after free in atomisp_alloc_css_stat_bufs()
drm: xlnx: zynqmp: fix a memset in zynqmp_dp_train()
HSI: core: fix resource leaks in hsi_add_client_from_dt()
nfc: pn533: prevent potential memory corruption
rtw88: Fix an error code in rtw_debugfs_set_rsvd_page()
drm/i915/gvt: Fix error code in intel_gvt_init_device()
drm/amd/pm: fix error code in smu_set_power_limit()
bnxt_en: fix ternary sign extension bug in bnxt_show_temp()
kfifo: fix ternary sign extension bugs
SUNRPC: fix ternary sign expansion bug in tracing
firmware: arm_scpi: Prevent the ternary sign expansion bug
RDMA/uverbs: Fix a NULL vs IS_ERR() bug
NFS: fix an incorrect limit in filelayout_decode_layout()
net: dsa: fix a crash if ->get_sset_count() fails
chelsio/chtls: unlock on error in chtls_pt_recvmsg()
net: hso: check for allocation failure in hso_create_bulk_serial_device()
staging: emxx_udc: fix loop in _nbu2ss_nuke()
ASoC: cs35l33: fix an error code in probe()
scsi: libsas: Use _safe() loop in sas_resume_port()
Dan Williams (1):
xen/unpopulated-alloc: consolidate pgmap manipulation
Daniel Beer (1):
mmc: sdhci-pci-gli: increase 1.8V regulator wait
Daniel Borkmann (15):
bpf: Use correct permission flag for mixed signed bounds arithmetic
bpf: Ensure off_reg has no mixed signed bounds for all types
bpf: Move off_reg into sanitize_ptr_alu
bpf: Rework ptr_limit into alu_limit and add common error path
bpf: Improve verifier error messages for users
bpf: Move sanitize_val_alu out of op switch
bpf: Refactor and streamline bounds check into helper
bpf: Tighten speculative pointer arithmetic mask
bpf: Fix masking negation logic upon negative dst register
bpf: Fix leakage of uninitialized bpf stack under speculation
bpf: Fix propagation of 32 bit unsigned bounds from 64 bit bounds
bpf: Fix alu32 const subreg bound tracking on bitwise operations
bpf: Wrap aux data inside bpf_sanitize_info container
bpf: Fix mask direction swap upon off reg sign change
bpf: No need to simulate speculative domain for immediates
Daniel Cordova A (1):
ALSA: hda: fixup headset for ASUS GU502 laptop
Daniel Gomez (2):
drm/amdgpu/ttm: Fix memory leak userptr pages
drm/radeon/ttm: Fix memory leak userptr pages
Daniel Jurgens (1):
net/mlx5: Don't request more than supported EQs
Daniel Niv (1):
media: media/saa7164: fix saa7164_encoder_register() memory leak bugs
Daniel Phan (1):
mac80211: Check crypto_aead_encrypt for errors
Daniel Wagner (1):
nvmet: seset ns->file when open fails
Daniele Palmas (1):
USB: serial: option: add Telit LE910-S1 compositions 0x7010, 0x7011
Danielle Ratson (1):
selftests: mlxsw: Remove a redundant if statement in tc_flower_scale test
Dario Binacchi (2):
serial: omap: don't disable rs485 if rts gpio is missing
serial: omap: fix rs485 half-duplex filtering
Darren Powell (1):
amdgpu/pm: Prevent force of DCEFCLK on NAVI10 and SIENNA_CICHLID
Darrick J. Wong (1):
ics932s401: fix broken handling of errors when word reading fails
Dave Ertman (1):
ice: remove DCBNL_DEVRESET bit from PF state
Dave Jiang (7):
dmaengine: idxd: Fix clobbering of SWERR overflow bit on writeback
dmaengine: idxd: fix delta_rec and crc size field for completion record
dmaengine: idxd: fix opcap sysfs attribute output
dmaengine: idxd: fix wq size store permission state
dmaengine: idxd: fix wq cleanup of WQCFG registers
dmaengine: idxd: fix dma device lifetime
dmaengine: idxd: fix cdev setup and free device lifetime issues
Dave Marchevsky (1):
bpf: Refcount task stack in bpf_get_task_stack
Dave Rigby (1):
perf unwind: Set userdata for all __report_module() paths
David Awogbemila (3):
gve: Update mgmt_msix_idx if num_ntfy changes
gve: Add NULL pointer checks when freeing irqs.
gve: Correct SKB queue index validation.
David Bauer (5):
spi: ath79: always call chipselect function
spi: ath79: remove spi-master setup and cleanup assignment
spi: sync up initial chipselect state
mtd: don't lock when recursively deleting partitions
mt76: mt76x0: disable GTK offloading
David E. Box (2):
platform/x86: intel_pmc_core: Ignore GBE LTR on Tiger Lake platforms
platform/x86: intel_pmc_core: Don't use global pmcdev in quirks
David Edmondson (1):
KVM: x86: dump_vmcs should not assume GUEST_IA32_EFER is valid
David Gow (1):
kunit: tool: Fix a python tuple typing error
David Hildenbrand (3):
s390: fix detection of vector enhancements facility 1 vs. vector packed decimal facility
kernel/resource: make walk_system_ram_res() find all busy IORESOURCE_SYSTEM_RAM resources
kernel/resource: make walk_mem_res() find all busy IORESOURCE_MEM resources
David Howells (3):
afs: Fix updating of i_mode due to 3rd party change
afs: Fix speculative status fetches
afs: Fix the nlink handling of dir-over-dir rename
David Matlack (1):
kvm: Cap halt polling at kvm->max_halt_poll_ns
David S. Miller (1):
math: Export mul_u64_u64_div_u64
David Ward (3):
ASoC: rt286: Generalize support for ALC3263 codec
ASoC: rt286: Make RT286_SET_GPIO_* readable and writable
drm/amd/display: Initialize attribute for hdcp_srm sysfs file
Davide Caratti (3):
openvswitch: fix stack OOB read while fragmenting IPv4 packets
net/sched: fq_pie: re-factor fix for fq_pie endless loop
net/sched: fq_pie: fix OOB access in the traffic path
Davidlohr Bueso (1):
fs/epoll: restore waking from ep_done_scan()
Dean Anderson (1):
usb: gadget/function/f_fs string table fix for multiple languages
Dejin Zheng (1):
PCI: xgene: Fix cfg resource mapping
Dima Chumak (2):
net/mlx5e: Fix multipath lag activation
net/mlx5e: Fix nullptr in add_vlan_push_action()
Dingchen (David) Zhang (1):
drm/amd/display: add handling for hdcp2 rx id list validation
Dinghao Liu (7):
dmaengine: tegra20: Fix runtime PM imbalance on error
media: platform: sti: Fix runtime PM imbalance in regs_show
media: sun8i-di: Fix runtime PM imbalance in deinterlace_start_streaming
mfd: arizona: Fix rumtime PM imbalance on error
iio: light: gp2ap002: Fix rumtime PM imbalance on error
iio: proximity: pulsedlight: Fix rumtime PM imbalance on error
PCI: tegra: Fix runtime PM imbalance in pex_ep_event_pex_rst_deassert()
Dmitry Baryshkov (2):
drm/msm/dsi_pll_7nm: Fix variable usage for pll_lockdet_rate
PCI: Release OF node in pci_scan_device()'s error path
Dmitry Osipenko (6):
drm/tegra: dc: Don't set PLL clock to 0Hz
cpuidle: tegra: Fix C7 idling state on Tegra114
ARM: tegra: acer-a500: Rename avdd to vdda of touchscreen node
soc/tegra: pmc: Fix completion of power-gate toggling
soc/tegra: regulators: Fix locking up when voltage-spread is out of range
iio: gyro: mpu3050: Fix reported temperature value
Dmitry Safonov (1):
xfrm/compat: Cleanup WARN()s that can be user-triggered
Dmitry Vyukov (1):
drm/vkms: fix misuse of WARN_ON
Dmytro Laktyushkin (1):
drm/amd/display: fix dml prefetch validation
Dom Cobley (1):
drm/vc4: crtc: Reduce PV fifo threshold on hvs4
Dominik Andreas Schorpp (1):
USB: serial: ftdi_sio: add IDs for IDS GmbH Products
Don Brace (1):
scsi: smartpqi: Use host-wide tag space
Dong Aisheng (1):
PM / devfreq: Use more accurate returned new_freq as resume_freq
Dongliang Mu (2):
NFC: nci: fix memory leak in nci_allocate_device
misc/uss720: fix memory leak in uss720_probe
DooHyun Hwang (1):
mmc: core: Do a power cycle when the CMD11 fails
Douglas Gilbert (1):
HID cp2112: fix support for multiple gpiochips
Du Cheng (4):
cfg80211: remove WARN_ON() in cfg80211_sme_connect
net: sched: tapr: prevent cycle_time == 0 in parse_taprio_schedule
ethernet: sun: niu: fix missing checks of niu_pci_eeprom_read()
net: caif: remove BUG_ON(dev == NULL) in caif_xmit
Eckhart Mohr (1):
ALSA: hda/realtek: Add quirk for Intel Clevo PCx0Dx
Eddie James (2):
ARM: dts: aspeed: Rainier: Fix humidity sensor bus address
hwmon: (occ) Fix poll rate limiting
Edward Cree (3):
sfc: farch: fix TX queue lookup in TX flush done handling
sfc: farch: fix TX queue lookup in TX event handling
sfc: ef10: fix TX queue lookup in TX event handling
Elad Grupi (1):
nvmet-tcp: fix a segmentation fault during io parsing error
Eli Cohen (4):
vdpa/mlx5: Fix suspend/resume index restoration
vdpa/mlx5: Fix wrong use of bit numbers
vdpa/mlx5: Set err = -ENOMEM in case dma_map_sg_attrs fails
{net,vdpa}/mlx5: Configure interface MAC into mpfs L2 table
Elia Devito (1):
ALSA: hda/realtek: Add fixup for HP Spectre x360 15-df0xxx
Emily Deng (1):
drm/amdgpu: Fix some unload driver issues
Emmanuel Grumbach (1):
mac80211: clear the beacon's CRC after channel switch
Eric Auger (2):
KVM: arm/arm64: Fix KVM_VGIC_V3_ADDR_TYPE_REDIST read
KVM: arm64: Fix KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION read
Eric Biggers (3):
random: initialize ChaCha20 constants with correct endianness
f2fs: fix error handling in f2fs_end_enable_verity()
crypto: rng - fix crypto_rng_reset() refcounting when !CRYPTO_STATS
Eric Dumazet (13):
net: ensure mac header is set in virtio_net_hdr_to_skb()
sch_red: fix off-by-one checks in red_check_params()
netfilter: nft_limit: avoid possible divide error in nft_limit_init
gro: ensure frag0 meets IP header alignment
inet: use bigger hash table for IP ID generation
net/packet: remove data races in fanout operations
ip6_vti: proper dev_{hold|put} in ndo_[un]init methods
netfilter: nftables: avoid overflows in nft_hash_buckets()
virtio_net: Do not pull payload in skb->head
ip6_gre: proper dev_{hold|put} in ndo_[un]init methods
sit: proper dev_{hold|put} in ndo_[un]init methods
ip6_tunnel: sit: proper dev_{hold|put} in ndo_[un]init methods
ipv6: remove extra dev_hold() for fallback tunnels
Eric Farman (1):
vfio-ccw: Check initialized flag in cp_init()
Erwan Le Ray (14):
serial: stm32: fix code cleaning warnings and checks
serial: stm32: add "_usart" prefix in functions name
serial: stm32: fix probe and remove order for dma
serial: stm32: fix startup by enabling usart for reception
serial: stm32: fix incorrect characters on console
serial: stm32: fix TX and RX FIFO thresholds
serial: stm32: fix a deadlock condition with wakeup event
serial: stm32: fix wake-up flag handling
serial: stm32: fix a deadlock in set_termios
serial: stm32: fix tx dma completion, release channel
serial: stm32: call stm32_transmit_chars locked
serial: stm32: fix FIFO flush in startup and set_termios
serial: stm32: add FIFO flush when port is closed
serial: stm32: fix tx_empty condition
Eryk Brol (1):
drm/amd/display: Check for DSC support instead of ASIC revision
Eryk Rybak (2):
i40e: Fix kernel oops when i40e driver removes VF's
i40e: Fix display statistics for veb_tc
Esteve Varela Colominas (1):
platform/x86: thinkpad_acpi: Allow the FnLock LED to change state
Eugene Korenevsky (1):
cifs: fix out-of-bound memory access when calling smb3_notify() at mount point
Evan Nimmo (1):
xfrm: Use actual socket sk instead of skb socket for xfrm_output_resume
Evan Quan (1):
drm/amd/pm: correct MGpuFanBoost setting
Ewan D. Milne (1):
scsi: scsi_dh_alua: Remove check for ASC 24h in alua_rtpg()
Eyal Birger (1):
xfrm: interface: fix ipv4 pmtu check to honor ip header df
Fabian Vogt (7):
Input: nspire-keypad - enable interrupts only when opened
fotg210-udc: Fix DMA on EP0 for length > max packet size
fotg210-udc: Fix EP0 IN requests bigger than two packets
fotg210-udc: Remove a dubious condition leading to fotg210_done
fotg210-udc: Mask GRP2 interrupts we don't handle
fotg210-udc: Don't DMA more than the buffer can take
fotg210-udc: Complete OUT requests on short packets
Fabien Parent (1):
arm64: dts: mediatek: fix reset GPIO level on pumpkin
Fabio Estevam (1):
media: rkvdec: Remove of_match_ptr()
Fabio Pricoco (1):
ice: Increase control queue timeout
Fabrice Gasnier (1):
mfd: stm32-timers: Avoid clearing auto reload register
Fangzhi Zuo (1):
drm/amd/display: Fix debugfs link_settings entry
Feilong Lin (1):
ACPI / hotplug / PCI: Fix reference count leak in enable_slot()
Felix Fietkau (6):
mt76: fix potential DMA mapping leak
mt76: mt7615: fix tx skb dma unmap
mt76: mt7915: fix tx skb dma unmap
mt76: mt7615: fix entering driver-own state on mt7663
net: ethernet: mtk_eth_soc: fix RX VLAN offload
perf jevents: Fix getting maximum number of fds
Felix Kuehling (1):
drm/amdkfd: fix build error with AMD_IOMMU_V2=m
Fenghua Yu (8):
selftests/resctrl: Enable gcc checks to detect buffer overflows
selftests/resctrl: Fix compilation issues for global variables
selftests/resctrl: Fix compilation issues for other global variables
selftests/resctrl: Clean up resctrl features check
selftests/resctrl: Fix missing options "-n" and "-p"
selftests/resctrl: Use resctrl/info for feature detection
selftests/resctrl: Fix incorrect parsing of iMC counters
selftests/resctrl: Fix checking for < 0 for unsigned values
Fengnan Chang (1):
ext4: fix error code in ext4_commit_super
Fernando Fernandez Mancera (1):
ethtool: fix missing NLM_F_MULTI flag when dumping
Ferry Toth (1):
usb: dwc3: pci: Enable usb2-gadget-lpm-disable for Intel Merrifield
Filipe Manana (5):
btrfs: fix metadata extent leak after failure to create subvolume
btrfs: fix race between transaction aborts and fsyncs leading to use-after-free
btrfs: fix race when picking most recent mod log operation for an old root
btrfs: fix race leading to unpersisted data and metadata on fsync
btrfs: release path before starting transaction when cloning inline extent
Finn Behrens (1):
tweewide: Fix most Shebang lines
Finn Thain (1):
m68k: mvme147,mvme16x: Don't wipe PCC timer config bits
Florent Revest (1):
libbpf: Initialize the bpf_seq_printf parameters array field by field
Florian Fainelli (1):
net: phy: broadcom: Only advertise EEE for supported modes
Florian Westphal (3):
netfilter: x_tables: fix compat match/target pad out-of-bound write
netfilter: bridge: add pre_exit hooks for ebtable unregistration
netfilter: arp_tables: add pre_exit hook for table unregister
Francesco Ruggeri (1):
ipv6: record frag_max_size in atomic fragments in input path
Francois Gervais (1):
rtc: pcf85063: fallback to parent of_node
Fredrik Strupe (1):
ARM: 9071/1: uprobes: Don't hook on thumb instructions
Frieder Schrempf (1):
can: mcp251x: fix resume from sleep before interface was brought up
Fugang Duan (1):
net: fec: fix the potential memory leak in fec_enet_init()
Gao Xiang (2):
parisc: avoid a warning on u8 cast for cmpxchg on u8 pointers
erofs: add unsupported inode i_format check
Geert Uytterhoeven (5):
regulator: bd9571mwv: Fix AVS and DVFS voltage range
phy: marvell: ARMADA375_USBCLUSTER_PHY should not default to y, unconditionally
dt-bindings: media: renesas,vin: Make resets optional on R-Car Gen1
serial: sh-sci: Fix off-by-one error in FIFO threshold register setting
i2c: sh_mobile: Use new clock calculation formulas for RZ/G2E
Geoffrey D. Bennett (2):
ALSA: usb-audio: scarlett2: Fix device hang with ehci-pci
ALSA: usb-audio: scarlett2: Improve driver startup messages
George McCollister (1):
net: hsr: fix mac_len checks
Gerd Hoffmann (3):
drm/qxl: release shadow on shutdown
drm/qxl: use ttm bo priorities
Revert "drm/qxl: do not run release if qxl failed to init"
Gioh Kim (2):
block/rnbd-clt: Fix missing a memory free when unloading the module
RDMA/rtrs-clt: destroy sysfs after removing session from active list
Giovanni Cabiddu (1):
crypto: qat - fix error path in adf_isr_resource_alloc()
Greg Kroah-Hartman (62):
Linux 5.10.29
Linux 5.10.30
Linux 5.10.31
Linux 5.10.32
Linux 5.10.33
Linux 5.10.34
Linux 5.10.35
Linux 5.10.36
Linux 5.10.37
Revert "iommu/vt-d: Remove WO permissions on second-level paging entries"
Revert "iommu/vt-d: Preset Access/Dirty bits for IOVA over FL"
kobject_uevent: remove warning in init_uevent_argv()
Linux 5.10.38
Linux 5.10.39
Revert "ALSA: sb8: add a check for request_region"
Revert "rapidio: fix a NULL pointer dereference when create_workqueue() fails"
Revert "serial: mvebu-uart: Fix to avoid a potential NULL pointer dereference"
Revert "video: hgafb: fix potential NULL pointer dereference"
Revert "net: stmicro: fix a missing check of clk_prepare"
Revert "leds: lp5523: fix a missing check of return value of lp55xx_read"
Revert "hwmon: (lm80) fix a missing check of bus read in lm80 probe"
Revert "video: imsttfb: fix potential NULL pointer dereferences"
Revert "ecryptfs: replace BUG_ON with error handling code"
Revert "scsi: ufs: fix a missing check of devm_reset_control_get"
Revert "gdrom: fix a memory leak bug"
cdrom: gdrom: initialize global variable at init time
Revert "media: rcar_drif: fix a memory disclosure"
Revert "rtlwifi: fix a potential NULL pointer dereference"
Revert "qlcnic: Avoid potential NULL pointer dereference"
Revert "niu: fix missing checks of niu_pci_eeprom_read"
net: rtlwifi: properly check for alloc_workqueue() failure
Linux 5.10.40
Linux 5.10.41
kgdb: fix gcc-11 warnings harder
Revert "crypto: cavium/nitrox - add an error message to explain the failure of pci_request_mem_regions"
Revert "media: usb: gspca: add a missed check for goto_low_power"
Revert "ALSA: sb: fix a missing check of snd_ctl_add"
Revert "serial: max310x: pass return value of spi_register_driver"
Revert "net: fujitsu: fix a potential NULL pointer dereference"
Revert "net/smc: fix a NULL pointer dereference"
Revert "net: caif: replace BUG_ON with recovery code"
Revert "char: hpet: fix a missing check of ioremap"
Revert "ALSA: gus: add a check of the status of snd_ctl_add"
Revert "ALSA: usx2y: Fix potential NULL pointer dereference"
Revert "isdn: mISDNinfineon: fix potential NULL pointer dereference"
Revert "ath6kl: return error code in ath6kl_wmi_set_roam_lrssi_cmd()"
Revert "isdn: mISDN: Fix potential NULL pointer dereference of kzalloc"
Revert "dmaengine: qcom_hidma: Check for driver register failure"
Revert "libertas: add checks for the return value of sysfs_create_group"
libertas: register sysfs groups properly
Revert "ASoC: cs43130: fix a NULL pointer dereference"
ASoC: cs43130: handle errors in cs43130_probe() properly
Revert "media: dvb: Add check on sp8870_readreg"
Revert "media: gspca: mt9m111: Check write_bridge for timeout"
Revert "media: gspca: Check the return value of write_bridge for timeout"
media: gspca: properly check for errors in po1030_probe()
Revert "net: liquidio: fix a NULL pointer dereference"
Revert "brcmfmac: add a check for the status of usb_register"
brcmfmac: properly check for bus register errors
i915: fix build warning in intel_dp_get_link_status()
Revert "Revert "ALSA: usx2y: Fix potential NULL pointer dereference""
Linux 5.10.42
Grzegorz Siwik (1):
i40e: Fix parameters in aq_get_phy_register()
Guangbin Huang (2):
net: hns3: clear VF down state bit before request link status
net: hns3: remediate a potential overflow risk of bd_num_list
Guangqing Zhu (2):
power: supply: cpcap-battery: fix invalid usage of list cursor
thermal/drivers/tsens: Fix missing put_device error
Guchun Chen (3):
drm/amdgpu: fix NULL pointer dereference
drm/amdgpu: update gc golden setting for Navi12
drm/amdgpu: update sdma golden setting for Navi12
Guennadi Liakhovetski (1):
ASoC: SOF: Intel: HDA: fix core status verification
Guenter Roeck (1):
pcnet32: Use pci_resource_len to validate PCI resource
Gulam Mohamed (1):
block: fix a race between del_gendisk and BLKRRPART
Guochun Mao (1):
ubifs: Only check replay with inode type to judge if inode linked
Gustavo A. R. Silva (6):
mtd: physmap: physmap-bt1-rom: Fix unintentional stack access
sctp: Fix out-of-bounds warning in sctp_process_asconf_param()
flow_dissector: Fix out-of-bounds warning in __skb_flow_bpf_to_target()
ethtool: ioctl: Fix out-of-bounds warning in store_link_ksettings_for_user()
wl3501_cs: Fix out-of-bounds warnings in wl3501_send_pkt
wl3501_cs: Fix out-of-bounds warnings in wl3501_mgmt_join
Gustavo Pimentel (1):
dmaengine: dw-edma: Fix crash on loading/unloading driver
Hanna Hawa (2):
pinctrl: pinctrl-single: remove unused parameter
pinctrl: pinctrl-single: fix pcs_pin_dbg_show() when bits_per_mux is not zero
Hannes Reinecke (1):
nvme: retrigger ANA log update if group descriptor isn't found
Hans Verkuil (5):
media: gspca/sq905.c: fix uninitialized variable
media: vivid: update EDID
media: gscpa/stv06xx: fix memory leak
media: v4l2-ctrls: fix reference to freed memory
media: v4l2-ctrls.c: fix race condition in hdl->requests list
Hans de Goede (18):
ASoC: intel: atom: Stop advertising non working S24LE support
extcon: arizona: Fix some issues when HPDET IRQ fires after the jack has been unplugged
extcon: arizona: Fix various races on driver unbind
usb: roles: Call try_module_get() from usb_role_switch_find_by_fwnode()
misc: lis3lv02d: Fix false-positive WARN on various HP models
HID: lenovo: Use brightness_set_blocking callback for setting LEDs brightness
HID: lenovo: Fix lenovo_led_set_tp10ubkbd() error handling
HID: lenovo: Check hid_get_drvdata() returns non NULL in lenovo_event()
HID: lenovo: Map mic-mute button to KEY_F20 instead of KEY_MICMUTE
ASoC: Intel: bytcr_rt5640: Enable jack-detect support on Asus T100TAF
ASoC: Intel: bytcr_rt5640: Add quirk for the Chuwi Hi8 tablet
ASoC: rt5670: Add a quirk for the Dell Venue 10 Pro 5055
Input: elants_i2c - do not bind to i2c-hid compatible ACPI instantiated devices
Input: silead - add workaround for x86 BIOS-es which bring the chip up in a stuck state
gpiolib: acpi: Add quirk to ignore EC wakeups on Dell Venue 10 Pro 5055
platform/x86: intel_int0002_vgpio: Only call enable_irq_wake() when using s2idle
platform/x86: dell-smbios-wmi: Fix oops on rmmod dell_smbios
platform/x86: touchscreen_dmi: Add info for the Chuwi Hi10 Pro (CWI529) tablet
Hansem Ro (1):
Input: ili210x - add missing negation for touch indication on ili210x
Hao Chen (1):
net: hns3: fix for vxlan gpe tx checksum bug
Harald Freudenberger (2):
s390/zcrypt: fix zcard and zqueue hot-unplug memleak
s390/archrandom: add parameter check for s390_arch_random_generate
Harry Wentland (1):
drm/amd/display: Reject non-zero src_y and src_x for video planes
Hauke Mehrtens (1):
mtd: rawnand: mtk: Fix WAITRDY break condition and timeout
He Ying (3):
irqchip/gic-v3: Do not enable irqs when handling spurious interrups
cpuidle: Fix ARM_QCOM_SPM_CPUIDLE configuration
firmware: qcom-scm: Fix QCOM_SCM configuration
Heiko Carstens (2):
init/Kconfig: make COMPILE_TEST depend on !S390
KVM: s390: fix guarded storage control register handling
Heiner Kallweit (2):
r8169: tweak max read request size for newer chips also in jumbo mtu mode
r8169: don't advertise pause in jumbo mode
Heinz Mauelshagen (1):
dm raid: fix inconclusive reshape layout on fast raid4/5/6 table reload sequences
Helge Deller (1):
parisc: parisc-agp requires SBA IOMMU driver
Hemant Kumar (1):
usb: gadget: Fix double free of device descriptor pointers
Heming Zhao (1):
md-cluster: fix use-after-free issue when removing rdev
Hillf Danton (1):
tty: n_gsm: check error while registering tty devices
Hoang Le (2):
tipc: convert dest node's address to network order
Revert "net:tipc: Fix a double free in tipc_sk_mcast_rcv"
Hou Pu (3):
nvmet: return proper error code from discovery ctrl
nvmet: use new ana_log_size instead the old one
nvmet-tcp: fix inline data size comparison in nvmet_tcp_queue_response
Hristo Venev (2):
net: sit: Unregister catch-all devices
net: ip6_tunnel: Unregister catch-all devices
Hsin-Yi Wang (1):
misc: eeprom: at24: check suspend status before disable regulator
Huang Pei (2):
MIPS: fix local_irq_{disable,enable} in asmmacro.h
MIPS: loongson64: fix bug when PAGE_SIZE > 16KB
Hubert Streidl (1):
mfd: da9063: Support SMBus and I2C mode
Hui Tang (1):
crypto: qat - fix unmap invalid dma address
Hui Wang (4):
ALSA: hda: generic: change the DAC ctl name for LO+SPK or LO+HP
ALSA: hda/realtek: reset eapd coeff to default value for alc287
ALSA: hda/realtek: the bass speaker can't output sound on Yoga 9i
ALSA: hda/realtek: Headphone volume is controlled by Front mixer
Hyeongseok Kim (1):
exfat: fix erroneous discard when clear cluster bit
Håkon Bugge (1):
RDMA/core: Fix corrupted SL on passive side
Ian Abbott (1):
staging: comedi: tests: ni_routes_test: Fix compilation error
Ido Schimmel (2):
mlxsw: spectrum: Fix ECN marking in tunnel decapsulation
mlxsw: spectrum_mr: Update egress RIF list before route's action
Igor Matheus Andrade Torrente (1):
video: hgafb: fix potential NULL pointer dereference
Igor Pylypiv (1):
scsi: pm80xx: Increase timeout for pm80xx mpi_uninit_check()
Ilya Leoshkevich (1):
selftests: fix prepending $(OUTPUT) to $(TEST_PROGS)
Ilya Lipnitskiy (4):
of: property: fw_devlink: do not link ".*,nr-gpios"
MIPS: pci-mt7620: fix PLL lock check
MIPS: pci-rt2880: fix slot 0 configuration
MIPS: pci-legacy: stop using of_pci_range_to_resource
Ilya Maximets (1):
openvswitch: fix send of uninitialized stack memory in ct limit reply
Ingo Molnar (1):
x86/platform/uv: Fix !KEXEC build failure
J. Bruce Fields (1):
nfsd: ensure new clients break delegations
Jacek Bułatek (1):
ice: Fix for dereference of NULL pointer
Jack Pham (3):
usb: dwc3: gadget: Free gadget structure only after freeing endpoints
usb: dwc3: gadget: Enable suspend events
usb: typec: ucsi: Retrieve all the PDOs instead of just the first 4
Jack Qiu (1):
fs: direct-io: fix missing sdio->boundary
Jacob Pan (1):
iommu/vt-d: Reject unsupported page request modes
Jae Hyun Yoo (2):
Revert "i3c master: fix missing destroy_workqueue() on error in i3c_master_register"
media: aspeed: fix clock handling logic
Jaegeuk Kim (1):
dm verity fec: fix misaligned RS roots IO
Jakub Kicinski (1):
ethtool: pause: make sure we init driver stats
James Bottomley (2):
KEYS: trusted: Fix TPM reservation for seal/unseal
security: keys: trusted: fix TPM2 authorizations
James Smart (7):
scsi: lpfc: Fix incorrect dbde assignment when building target abts wqe
scsi: lpfc: Fix pt2pt connection does not recover after LOGO
scsi: lpfc: Fix crash when a REG_RPI mailbox fails triggering a LOGO response
scsi: lpfc: Fix error handling for mailboxes completed in MBX_POLL mode
scsi: lpfc: Remove unsupported mbox PORT_CAPABILITIES logic
scsi: lpfc: Fix illegal memory access on Abort IOCBs
nvme-fc: clear q_live at beginning of association teardown
James Zhu (4):
drm/amdgpu/vcn1: add cancel_delayed_work_sync before power gate
drm/amdgpu/vcn2.0: add cancel_delayed_work_sync before power gate
drm/amdgpu/vcn2.5: add cancel_delayed_work_sync before power gate
drm/amdgpu/jpeg2.0: add cancel_delayed_work_sync before power gate
Jan Beulich (3):
xen-pciback: redo VF placement in the virtual topology
xen-pciback: reconfigure also from backend watch handler
x86/Xen: swap NX determination and GDT setup on BSP
Jan Glauber (1):
md: Fix missing unused status line of /proc/mdstat
Jan Kara (3):
ext4: annotate data race in start_this_handle()
ext4: annotate data race in jbd2_journal_dirty_metadata()
ext4: Fix occasional generic/418 failure
Jan Kratochvil (1):
perf unwind: Fix separate debug info files when using elfutils' libdw's unwinder
Jane Chu (1):
mm/memory-failure: unnecessary amount of unmapping
Jann Horn (1):
tty: Remove dead termiox code
Jared Baldridge (1):
drm: Added orientation quirk for OneGX1 Pro
Jarkko Sakkinen (2):
tpm, tpm_tis: Extend locality handling to TPM2 in tpm_tis_gen_interrupt()
tpm, tpm_tis: Reserve locality in tpm_tis_resume()
Jaroslaw Gawin (1):
i40e: fix the restart auto-negotiation after FEC modified
Jason Gunthorpe (5):
vfio: Depend on MMU
vfio/fsl-mc: Re-order vfio_fsl_mc_probe()
vfio/pci: Move VGA and VF initialization to functions
vfio/pci: Re-order vfio_pci_probe()
vfio/mdev: Do not allow a mdev_type to have a NULL parent pointer
Jason Wang (1):
vhost-vdpa: fix vm_flags for virtqueue doorbell mapping
Jason Xing (1):
i40e: fix the panic when running bpf in xdpdrv mode
Javed Hasan (1):
scsi: qedf: Add pointer checks in qedf_update_link_speed()
Jean Delvare (1):
i2c: i801: Don't generate an interrupt on bus reset
Jeff Layton (4):
ceph: fix inode leak on getattr error in __fh_to_dentry
ceph: fix fscache invalidation
ceph: don't clobber i_snap_caps on non-I_NEW inode
ceph: don't allow access to MDS-private inodes
Jeffrey Hugo (2):
bus: mhi: core: Fix check for syserr at power_up
bus: mhi: core: Sanity check values from remote device before use
Jeffrey Mitchell (1):
ecryptfs: fix kernel panic with null dev_name
Jens Axboe (1):
io_uring: don't mark S_ISBLK async work as unbounded
Jeremy Szu (4):
ALSA: hda/realtek: fix mute/micmute LEDs for HP 855 G8
ALSA: hda/realtek: fix mute/micmute LEDs and speaker for HP Zbook G8
ALSA: hda/realtek: fix mute/micmute LEDs and speaker for HP Zbook Fury 15 G8
ALSA: hda/realtek: fix mute/micmute LEDs and speaker for HP Zbook Fury 17 G8
Jernej Skrabec (2):
arm64: dts: allwinner: h6: beelink-gs1: Remove ext. 32 kHz osc reference
media: cedrus: Fix H265 status definitions
Jerome Forissier (1):
tee: optee: do not check memref size on return from Secure World
Jesse Brandeburg (1):
ixgbe: fix large MTU request from VF
Jia Zhou (1):
ALSA: core: remove redundant spin_lock pair in snd_card_disconnect
Jia-Ju Bai (7):
interconnect: core: fix error return code of icc_link_destroy()
HID: alps: fix error return code in alps_input_configured()
mtd: maps: fix error return code of physmap_flash_remove()
media: platform: sunxi: sun6i-csi: fix error return code of sun6i_video_start_streaming()
thermal: thermal_of: Fix error return code of thermal_of_populate_bind_params()
rpmsg: qcom_glink_native: fix error return code of qcom_glink_rx_data()
kernel: kexec_file: fix error return code of kexec_calculate_store_digests()
Jian Shen (2):
net: hns3: add check for HNS3_NIC_STATE_INITED in hns3_reset_notify_up_enet()
net: hns3: put off calling register_netdev() until client initialize complete
Jianbo Liu (1):
net/mlx5: Set reformat action when needed for termination rules
Jianxiong Gao (9):
driver core: add a min_align_mask field to struct device_dma_parameters
swiotlb: add a IO_TLB_SIZE define
swiotlb: factor out an io_tlb_offset helper
swiotlb: factor out a nr_slots helper
swiotlb: clean up swiotlb_tbl_unmap_single
swiotlb: refactor swiotlb_tbl_map_single
swiotlb: don't modify orig_addr in swiotlb_tbl_sync_single
swiotlb: respect min_align_mask
nvme-pci: set min_align_mask
Jiapeng Zhong (1):
HID: wacom: Assign boolean values to a bool variable
Jiaran Zhang (1):
net: hns3: fix incorrect resp_msg issue
Jim Ma (1):
tls splice: check SPLICE_F_NONBLOCK instead of MSG_DONTWAIT
Jim Mattson (1):
perf/x86/kvm: Fix Broadwell Xeon stepping in isolation_ucodes[]
Jin Yao (1):
perf report: Fix wrong LBR block sorting
Jingwen Chen (1):
drm/amd/amdgpu: fix refcount leak
Jinzhou Su (1):
drm/amdgpu: Add mem sync flag for IB allocated by SA
Jiri Kosina (3):
iwlwifi: Fix softirq/hardirq disabling in iwl_pcie_enqueue_hcmd()
iwlwifi: Fix softirq/hardirq disabling in iwl_pcie_gen2_enqueue_hcmd()
Bluetooth: avoid deadlock between hci_dev->lock and socket lock
Jiri Olsa (6):
tools/resolve_btfids: Build libbpf and libsubcmd in separate directories
tools/resolve_btfids: Check objects before removing
tools/resolve_btfids: Set srctree variable unconditionally
kbuild: Add resolve_btfids clean to root clean target
kbuild: Do not clean resolve_btfids if the output does not exist
perf tools: Fix dynamic libbpf link
Jisheng Zhang (1):
arm64: kprobes: Restore local irqflag if kprobes is cancelled
Joakim Zhang (1):
net: stmmac: Fix MAC WoL not working if PHY does not support WoL
Joe Thornber (2):
dm persistent data: packed struct should have an aligned() attribute too
dm space map common: fix division bug in sm_ll_find_free_block()
Joel Stanley (1):
jffs2: Hook up splice_write callback
Joerg Roedel (5):
x86/sev: Do not require Hypervisor CPUID bit for SEV guests
x86/sev-es: Don't return NULL from sev_es_get_ghcb()
x86/sev-es: Use __put_user()/__get_user() for data accesses
x86/sev-es: Forward page-faults which happen during emulation
x86/boot/compressed/64: Check SEV encryption in the 32-bit boot-path
Johan Hovold (22):
net: hso: fix NULL-deref on disconnect regression
Revert "USB: cdc-acm: fix rounding error in TIOCSSERIAL"
tty: moxa: fix TIOCSSERIAL jiffies conversions
tty: amiserial: fix TIOCSSERIAL permission check
USB: serial: usb_wwan: fix TIOCSSERIAL jiffies conversions
staging: greybus: uart: fix TIOCSSERIAL jiffies conversions
USB: serial: ti_usb_3410_5052: fix TIOCSSERIAL permission check
staging: fwserial: fix TIOCSSERIAL jiffies conversions
tty: moxa: fix TIOCSSERIAL permission check
staging: fwserial: fix TIOCSSERIAL permission check
staging: fwserial: fix TIOCSSERIAL implementation
staging: fwserial: fix TIOCGSERIAL implementation
staging: greybus: uart: fix unprivileged TIOCCSERIAL
USB: cdc-acm: fix unprivileged TIOCCSERIAL
USB: cdc-acm: fix TIOCGSERIAL implementation
tty: actually undefine superseded ASYNC flags
tty: fix return value for unsupported ioctls
tty: fix return value for unsupported termiox ioctls
serial: core: return early on unsupported ioctls
net: hso: fix control-request directions
USB: trancevibrator: fix control-request direction
net: hso: bail out on interrupt URB allocation failure
Johannes Berg (15):
iwlwifi: pcie: properly set LTR workarounds on 22000 devices
nl80211: fix beacon head validation
nl80211: fix potential leak of ACL params
cfg80211: check S1G beacon compat element length
mac80211: fix TXQ AC confusion
cfg80211: scan: drop entry from hidden_list on overflow
mac80211: bail out if cipher schemes are invalid
iwlwifi: pcie: make cfg vs. trans_cfg more robust
um: Mark all kernel symbols as local
um: Disable CONFIG_GCOV with MODULES
mac80211: drop A-MSDUs on old ciphers
mac80211: add fragment cache to sta_info
mac80211: check defrag PN against current frame
mac80211: prevent attacks on TKIP/WEP as well
mac80211: do not accept/forward invalid EAPOL frames
John Fastabend (2):
bpf, sockmap: Fix sk->prot unhash op reset
bpf, sockmap: Fix incorrect fwd_alloc accounting
John Millikin (1):
x86/build: Propagate $(CLANG_FLAGS) to $(REALMODE_FLAGS)
John Paul Adrian Glaubitz (2):
ia64: tools: remove inclusion of ia64-specific version of errno.h header
ia64: tools: remove duplicate definition of ia64_mf() on ia64
Jolly Shah (1):
scsi: libsas: Reset num_scatter if libata marks qc as NODATA
Jonas Holmberg (1):
ALSA: aloop: Fix initialization of controls
Jonas Witschel (1):
ALSA: hda/realtek: fix mute/micmute LEDs for HP ProBook 445 G7
Jonathan Cameron (7):
iio:accel:adis16201: Fix wrong axis assignment that prevents loading
iio:adc:ad7476: Fix remove handling
iio: adc: ad7768-1: Fix too small buffer passed to iio_push_to_buffers_with_timestamp()
iio: adc: ad7124: Fix missbalanced regulator enable / disable on error.
iio: adc: ad7124: Fix potential overflow due to non sequential channel numbers
iio: adc: ad7923: Fix undersized rx buffer.
iio: adc: ad7192: Avoid disabling a clock that was never enabled.
Jonathan Kim (1):
drm/amdgpu: mask the xgmi number of hops reported from psp to kfd
Jonathan McDowell (1):
net: stmmac: Set FIFO sizes for ipq806x
Jonathon Reinhart (3):
net: Make tcp_allowed_congestion_control readonly in non-init netns
netfilter: conntrack: Make global sysctls readonly in non-init netns
net: Only allow init netns to set default tcp cong to a restricted algo
Jordan Niethe (1):
powerpc/64s: Fix pte update for kernel memory on radix
Josef Bacik (5):
btrfs: do proper error handling in create_reloc_root
btrfs: do proper error handling in btrfs_update_reloc_root
btrfs: convert logic BUG_ON()'s in replace_path to ASSERT()'s
btrfs: avoid RCU stalls while running delayed iputs
btrfs: do not BUG_ON in link_to_fixup_dir
Jouni Roivas (1):
hfsplus: prevent corruption in shrinking truncate
Juergen Gross (2):
xen/events: fix setting irq affinity
xen/gntdev: fix gntdev_mmap() error exit path
Julian Braha (2):
lib: fix kconfig dependency on ARCH_WANT_FRAME_POINTERS
media: drivers: media: pci: sta2x11: fix Kconfig dependency on GPIOLIB
Julian Wiedmann (1):
net/smc: remove device from smcd_dev_list after failed device_add()
Jussi Maki (1):
bpf: Set mac_len in bpf_skb_change_head
KP Singh (1):
libbpf: Add explicit padding to btf_dump_emit_type_decl_opts
Kai Stuhlemmer (ebee Engineering) (1):
mtd: rawnand: atmel: Update ecc_stats.corrected counter
Kai Vehmanen (1):
ALSA: hda/hdmi: fix race in handling acomp ELD notification at resume
Kai-Heng Feng (3):
USB: Add LPM quirk for Lenovo ThinkPad USB-C Dock Gen2 Ethernet
drm/radeon/dpm: Disable sclk switching on Oland when two 4K 60Hz monitors are connected
platform/x86: hp_accel: Avoid invoking _INI to speed up resume
Kailang Yang (1):
ALSA: hda/realtek - Headset Mic issue on HP platform
Kaixu Xia (1):
cxgb4: Fix the -Wmisleading-indentation warning
Kalyan Thota (1):
drm/msm/disp/dpu1: icc path needs to be set before dpu runtime resume
Kamal Heib (1):
RDMA/qedr: Fix kernel panic when trying to access recv_cq
Kan Liang (1):
perf/x86/intel/uncore: Remove uncore extra PCI dev HSWEP_PCI_PCU_3
Karthikeyan Kathirvel (1):
mac80211: choose first enabled channel for monitor
Kees Cook (4):
drm/radeon: Fix off-by-one power_state index heap overwrite
drm/radeon: Avoid power table parsing memory leaks
debugfs: Make debugfs_allow RO after init
proc: Check /proc/$pid/attr/ writes against file opener
Kefeng Wang (1):
riscv: Fix spelling mistake "SPARSEMEM" to "SPARSMEM"
Keith Busch (2):
nvmet: remove unsupported command noise
nvme-tcp: rerun io_work if req_list is not empty
Kenneth Feng (1):
drm/amd/pm: fix workload mismatch on vega10
Kenta.Tada@sony.com (1):
seccomp: Fix CONFIG tests for Seccomp_filters
Kevin Barnett (1):
scsi: smartpqi: Add new PCI IDs
Kevin Wang (1):
drm/amdkfd: correct sienna_cichlid SDMA RLC register offset error
Kiran Gunda (1):
backlight: qcom-wled: Fix FSC update issue for WLED5
Kishon Vijay Abraham I (7):
PCI: keystone: Let AM65 use the pci_ops defined in pcie-designware-host.c
phy: cadence: Sierra: Fix PHY power_on sequence
phy: ti: j721e-wiz: Invoke wiz_init() before of_platform_device_create()
phy: ti: j721e-wiz: Delete "clk_div_sel" clk provider during cleanup
PCI: endpoint: Make *_get_first_free_bar() take into account 64 bit BAR
PCI: endpoint: Add helper API to get the 'next' unreserved BAR
PCI: endpoint: Make *_free_bar() to return error codes on failure
Konrad Dybcio (1):
drm/msm/adreno: a5xx_power: Don't apply A540 lm_setup to other GPUs
Krzysztof Goreczny (1):
ice: prevent ice_open and ice_stop during reset
Krzysztof Kozlowski (14):
clk: socfpga: fix iomem pointer cast on 64-bit
ARM: dts: exynos: correct fuel gauge interrupt trigger level on GT-I9100
ARM: dts: exynos: correct fuel gauge interrupt trigger level on Midas family
ARM: dts: exynos: correct MUIC interrupt trigger level on Midas family
ARM: dts: exynos: correct PMIC interrupt trigger level on Midas family
ARM: dts: exynos: correct PMIC interrupt trigger level on Odroid X/U3 family
ARM: dts: exynos: correct PMIC interrupt trigger level on SMDK5250
ARM: dts: exynos: correct PMIC interrupt trigger level on Snow
ARM: dts: s5pv210: correct fuel gauge interrupt trigger level on Fascinate family
memory: renesas-rpc-if: fix possible NULL pointer dereference of resource
memory: samsung: exynos5422-dmc: handle clk_set_parent() failure
ASoC: simple-card: fix possible uninitialized single_cpu local variable
pinctrl: samsung: use 'int' for register masks in Exynos
i2c: s3c2410: fix possible NULL pointer deref on read message after write
Kumar Kartikeya Dwivedi (1):
net: sched: bump refcount for new action in ACT replace mode
Kunihiko Hayashi (2):
ARM: dts: uniphier: Change phy-mode to RGMII-ID to enable delay pins for RTL8211E
arm64: dts: uniphier: Change phy-mode to RGMII-ID to enable delay pins for RTL8211E
Kuninori Morimoto (2):
ASoC: rsnd: call rsnd_ssi_master_clk_start() from rsnd_ssi_init()
ASoC: rsnd: check all BUSIF status when error
Kuogee Hsieh (1):
drm/msm/dp: initialize audio_comp when audio starts
Kurt Kanzenbach (1):
net: hsr: Reset MAC header for Tx path
Lad Prabhakar (2):
media: i2c: imx219: Move out locking/unlocking of vflip and hflip controls from imx219_set_stream
media: i2c: imx219: Balance runtime PM use-count
Lai Jiangshan (1):
KVM/VMX: Invoke NMI non-IST entry instead of IST entry
Lang Yu (1):
drm/amd/amdgpu: fix a potential deadlock in gpu reset
Lars-Peter Clausen (1):
iio: inv_mpu6050: Fully validate gyro and accel scale writes
Laurent Pinchart (3):
dmaengine: xilinx: dpdma: Fix descriptor issuing on video group
dmaengine: xilinx: dpdma: Fix race condition in done IRQ
media: imx: capture: Return -EPIPE from __capture_legacy_try_fmt()
Lee Gibson (1):
qtnfmac: Fix possible buffer overflow in qtnf_event_handle_external_auth
Lee Jones (1):
drm/amd/display/dc/dce/dce_aux: Remove duplicate line causing 'field overwritten' issue
Len Brown (1):
Revert "tools/power turbostat: adjust for temperature offset"
Leo Yan (5):
perf auxtrace: Fix potential NULL pointer dereference
perf tools: Change fields type in perf_record_time_conv
perf jit: Let convert_timestamp() be backwards-compatible
perf session: Add swap operation for event TIME_CONV
locking/lockdep: Correct calling tracepoints
Leon Romanovsky (5):
RDMA/addr: Be strict with gid size
RDMA/siw: Properly check send and receive CQ pointers
RDMA/siw: Release xarray entry
RDMA/core: Prevent divide-by-zero error triggered by the user
RDMA/rxe: Clear all QP fields if creation failed
Li Huafei (1):
ima: Fix the error code for restoring the PCR value
Liam Howlett (1):
m68k: Add missing mmap_read_lock() to sys_cacheflush()
Lijun Pan (3):
ibmvnic: avoid calling napi_disable() twice
ibmvnic: remove duplicate napi_schedule call in do_reset function
ibmvnic: remove duplicate napi_schedule call in open function
Like Xu (1):
perf/x86: Avoid touching LBR_TOS MSR for Arch LBR
Liming Sun (1):
platform/mellanox: mlxbf-tmfifo: Fix a memory barrier issue
Lin Ma (1):
bluetooth: eliminate the potential race condition when removing the HCI controller
Lingutla Chandrasekhar (1):
sched/fair: Ignore percpu threads for imbalance pulls
Linus Lüssing (1):
net: bridge: mcast: fix broken length + header check for MRDv6 Adv.
Linus Torvalds (3):
readdir: make sure to verify directory entry for legacy interfaces too
Fix misc new gcc warnings
drm/i915/display: fix compiler warning about array overrun
Linus Walleij (3):
ARM: dts: ux500: Fix up TVK R3 sensors
drm/mcde/panel: Inverse misunderstood flag
net: ethernet: ixp4xx: Set the DMA masks explicitly
Liu Jian (1):
bpftool: Add sock_release help info for cgroup attach/prog load command
Liu Ying (1):
media: docs: Fix data organization of MEDIA_BUS_FMT_RGB101010_1X30
Loic Poulain (1):
net: qrtr: Fix memory leak on qrtr_tx_wait failure
Longfang Liu (1):
crypto: hisilicon/sec - fixes a printing error
Lorenz Bauer (1):
bpf: link: Refuse non-O_RDWR flags in BPF_OBJ_GET
Lorenzo Bianconi (2):
mt76: mt7915: fix aggr len debugfs node
mt76: mt7615: fix mib stats counter reporting to mac80211
Lu Baolu (7):
iommu/vt-d: Don't set then clear private data in prq_event_thread()
iommu/vt-d: Report right snoop capability when using FL for IOVA
iommu/vt-d: Report the right page fault address
iommu/vt-d: Preset Access/Dirty bits for IOVA over FL
iommu/vt-d: Remove WO permissions on second-level paging entries
iommu/vt-d: Invalidate PASID cache when root/context entry changed
iommu/vt-d: Use user privilege for RID2PASID translation
Luca Ceresoli (1):
fpga: fpga-mgr: xilinx-spi: fix error messages on -EPROBE_DEFER
Luca Coelho (1):
iwlwifi: fix 11ax disabled bit in the regulatory capability flags
Luca Fancellu (1):
xen/evtchn: Change irq_info lock to raw_spinlock_t
Lucas Stankus (1):
staging: iio: cdc: ad7746: avoid overwrite of num_channels
Ludovic Desroches (1):
ARM: dts: at91: change the key code of the gpio key
Ludovic Senecaux (1):
netfilter: conntrack: Fix gre tunneling over ipv6
Luis Henriques (1):
virtiofs: fix memory leak in virtio_fs_probe()
Luiz Augusto von Dentz (1):
Bluetooth: SMP: Fail if remote and local public keys are identical
Lukasz Bartosik (2):
clk: fix invalid usage of list cursor in register
clk: fix invalid usage of list cursor in unregister
Lukasz Luba (2):
thermal/core/fair share: Lock the thermal zone while looping over instances
PM / devfreq: Unlock mutex and free devfreq struct in error path
Lukasz Majczak (1):
ASoC: Intel: kbl_da7219_max98927: Fix kabylake_ssp_fixup function
Luke D Jones (1):
ALSA: hda/realtek: GA503 use same quirks as GA401
Lv Yunlong (21):
ethernet/netronome/nfp: Fix a use after free in nfp_bpf_ctrl_msg_rx
drivers/net/wan/hdlc_fr: Fix a double free in pvc_xmit
ethernet: myri10ge: Fix a use after free in myri10ge_sw_tso
net:tipc: Fix a double free in tipc_sk_mcast_rcv
net/rds: Fix a use after free in rds_message_map_pages
dmaengine: Fix a double free in dma_async_device_register
gpu/xen: Fix a use after free in xen_drm_drv_init
ALSA: emu8000: Fix a use after free in snd_emu8000_create_mixer
ALSA: sb: Fix two use after free in snd_sb_qsound_build
mtd: rawnand: gpmi: Fix a double free in gpmi_nand_init
crypto: qat - Fix a double free in adf_create_ring
drivers/block/null_blk/main: Fix a double free in null_init.
IB/isert: Fix a use after free in isert_connect_request
mwl8k: Fix a double Free in mwl8k_probe_hw
ath10k: Fix a use after free in ath10k_htc_send_bundle
net:emac/emac-mac: Fix a use after free in emac_mac_tx_buf_send
RDMA/siw: Fix a use after free in siw_alloc_mr
RDMA/bnxt_re: Fix a double free in bnxt_qplib_alloc_res
net:nfc:digital: Fix a double free in digital_tg_recv_dep_req
ethernet:enic: Fix a use after free bug in enic_hard_start_xmit
drm/i915/gt: Fix a double free in gen8_preallocate_top_level_pdp
Maciej W. Rozycki (9):
x86/build: Disable HIGHMEM64G selection for M486SX
FDDI: defxx: Bail out gracefully with unassigned PCI resource for CSR
FDDI: defxx: Make MMIO the configuration default except for EISA
MIPS: Reinstate platform `__div64_32' handler
MIPS: Avoid DIVU in `__div64_32' if result would be zero
MIPS: Avoid handcoded DIVU in `__div64_32' altogether
vgacon: Record video mode changes with VT_RESIZEX
vt_ioctl: Revert VT_RESIZEX parameter handling removal
vt: Fix character height handling with VT_RESIZEX
Maciej Żenczykowski (2):
net-ipv6: bugfix - raw & sctp - switch to ipv6_can_nonlocal_bind()
net: fix nla_strcmp to handle more than one trailing null character
Magnus Karlsson (2):
i40e: fix broken XDP support
samples/bpf: Consider frame size in tx_only of xdpsock sample
Mahesh Salgaonkar (1):
powerpc/eeh: Fix EEH handling for hugepages in ioremap space.
Manivannan Sadhasivam (3):
mtd: Handle possible -EPROBE_DEFER from parse_mtd_partitions()
mtd: rawnand: qcom: Return actual error code instead of -ENODEV
ARM: 9075/1: kernel: Fix interrupted SMC calls
Mans Rullgard (1):
ARM: dts: am33xx: add aliases for mmc interfaces
Maor Gottlieb (3):
RDMA/mlx5: Fix drop packet rule in egress table
RDMA/mlx5: Recover from fatal event in dual port mode
RDMA/mlx5: Fix query DCT via DEVX
Marc Kleine-Budde (3):
can: mcp251x: fix support for half duplex SPI host controllers
can: mcp251xfd: mcp251xfd_probe(): add missing can_rx_offload_del() in error path
can: m_can: m_can_tx_work_queue(): fix tx_skb race condition
Marc Zyngier (4):
ACPI: GTDT: Don't corrupt interrupt mappings on watchdog probe failure
KVM: arm64: Fully zero the vcpu state on reset
arm64: entry: factor irq triage logic into macros
KVM: arm64: Prevent mixed-width VM creation
Marcel Hamer (1):
usb: dwc3: omap: improve extcon initialization
Marco Elver (1):
kcsan, debugfs: Move debugfs file creation out of early init
Marcus Folkesson (1):
wilc1000: write value to WILC_INTR2_ENABLE register
Marek Behún (4):
ARM: dts: turris-omnia: configure LED[2]/INTn pin as interrupt pin
arm64: dts: marvell: armada-37xx: add syscon compatible to NB clk node
cpufreq: armada-37xx: Fix setting TBG parent for load levels
clk: mvebu: armada-37xx-periph: remove .set_parent method for CPU PM clock
Marek Vasut (2):
rsi: Use resume_noirq for SDIO
drm/stm: Fix bus_flags handling
Marijn Suijten (2):
drm/msm/mdp5: Configure PP_SYNC_HEIGHT to double the vtotal
drm/msm/mdp5: Do not multiply vclk line count by 100
Mark Langsdorf (2):
ACPI: custom_method: fix potential use-after-free issue
ACPI: custom_method: fix a possible memory leak
Mark Pearson (1):
platform/x86: thinkpad_acpi: Correct thermal sensor allocation
Mark Rutland (1):
arm64: entry: always set GIC_PRIO_PSR_I_SET during entry
Mark Zhang (1):
RDMA/mlx5: Fix mlx5 rates to IB rates map
Martin Blumenstingl (3):
net: dsa: lantiq_gswip: Let GSWIP automatically set the xMII clock
net: dsa: lantiq_gswip: Don't use PHY auto polling
net: dsa: lantiq_gswip: Configure all remaining GSWIP_MII_CFG bits
Martin Leung (1):
drm/amd/display: changing sr exit latency
Martin Schiller (1):
net: phy: intel-xway: enable integrated led functions
Martin Wilck (2):
scsi: target: pscsi: Clean up after failure in pscsi_map_sg()
scsi: scsi_transport_srp: Don't block target in SRP_PORT_LOST state
Masahiro Yamada (4):
init/Kconfig: make COMPILE_TEST depend on HAS_IOMEM
kbuild: update config_data.gz only when the content of .config is changed
kbuild: generate Module.symvers only when vmlinux exists
scripts/clang-tools: switch explicitly to Python 3
Masami Hiramatsu (1):
x86/kprobes: Fix to check non boostable prefixes correctly
Mateusz Palczewski (2):
i40e: Added Asym_Pause to supported link modes
i40e: Fix PHY type identifiers for 2.5G and 5G adapters
Mathias Krause (1):
nitro_enclaves: Fix stale file descriptors on failed usercopy
Mathias Nyman (5):
xhci: check port array allocation was successful before dereferencing it
xhci: check control context is valid before dereferencing it.
xhci: fix potential array out of bounds with several interrupters
thunderbolt: usb4: Fix NVM read buffer bounds and offset issue
thunderbolt: dma_port: Fix NVM read buffer bounds and offset issue
Mathy Vanhoef (4):
mac80211: assure all fragments are encrypted
mac80211: prevent mixed key and fragment cache attacks
mac80211: properly handle A-MSDUs that start with an RFC 1042 header
cfg80211: mitigate A-MSDU aggregation attacks
Matt Chen (1):
iwlwifi: add support for Qu with AX201 device
Matt Wang (1):
scsi: BusLogic: Fix 64-bit system enumeration error for Buslogic
Matthew Wilcox (Oracle) (5):
XArray: Fix splitting to non-zero orders
radix tree test suite: Register the main thread with the RCU library
idr test suite: Take RCU read lock in idr_find_test_1
idr test suite: Create anchor before launching throbber
mm: fix struct page layout on 32-bit systems
Matthias Schiffer (1):
power: supply: bq27xxx: fix power_avg for newer ICs
Matti Vaittinen (1):
gpio: sysfs: Obey valid_mask
Mauro Carvalho Chehab (1):
atomisp: don't let it go past pipes array
Maxim Kochetkov (3):
net: dsa: Fix type was not set for devlink port
net: phy: marvell: fix m88e1011_set_downshift
net: phy: marvell: fix m88e1111_set_downshift
Maxim Mikityanskiy (2):
HID: plantronics: Workaround for double volume key presses
net/mlx5e: Use net_prefetchw instead of prefetchw in MPWQE TX datapath
Maximilian Luz (2):
usb: xhci: Increase timeout for HC halt
serial: 8250_dw: Add device HID for new AMD UART controller
Md Haris Iqbal (3):
RDMA/rtrs-clt: Close rtrs client conn before destroying rtrs clt session files
block/rnbd-clt: Change queue_depth type in rnbd_clt_session to size_t
block/rnbd-clt: Check the return value of the function rtrs_clt_query
Meng Li (1):
regmap: set debugfs_name to NULL after it is freed
Miaohe Lin (4):
khugepaged: fix wrong result value for trace_mm_collapse_huge_page_isolate()
mm/hugetlb: handle the error case in hugetlb_fix_reserve_counts()
mm/migrate.c: fix potential indeterminate pte entry in migrate_vma_insert_page()
ksm: fix potential missing rmap_item for stable_node
Michael Brown (1):
xen-netback: Check for hotplug-status existence before watching
Michael Chan (3):
bnxt_en: Fix RX consumer index logic in the error path.
bnxt_en: Add PCI IDs for Hyper-V VF devices.
bnxt_en: Fix context memory setup for 64K page size.
Michael Ellerman (7):
powerpc/pseries: Only register vio drivers if vio bus exists
powerpc/pseries: Stop calling printk in rtas_stop_self()
powerpc/64s: Fix crashes when toggling stf barrier
powerpc/64s: Fix crashes when toggling entry flush barrier
selftests/gpio: Use TEST_GEN_PROGS_EXTENDED
selftests/gpio: Move include of lib.mk up
selftests/gpio: Fix build when source tree is read only
Michael Kelley (1):
Drivers: hv: vmbus: Increase wait time for VMbus unload
Michael Walle (2):
mtd: require write permissions for locking and badblock ioctls
rtc: fsl-ftm-alarm: add MODULE_TABLE()
Michal Kalderon (1):
nvmet-rdma: Fix NULL deref when SEND is completed with error
Michal Simek (1):
firmware: xilinx: Add a blank line after function declaration
Mickaël Salaün (1):
ovl: fix leaked dentry
Mihai Moldovan (1):
kconfig: nconf: stop endless search loops
Mike Galbraith (1):
x86/crash: Fix crash_setup_memmap_entries() out-of-bounds access
Mike Marciniszyn (3):
IB/hfi1: Fix probe time panic when AIP is enabled with a buggy BIOS
IB/hfi1: Use kzalloc() for mmu_rb_handler allocation
IB/hfi1: Correct oversized ring allocation
Mike Rapoport (2):
nds32: flush_dcache_page: use page_mapping_file to avoid races with swapoff
openrisc: mm/init.c: remove unused memblock_region variable in map_ram()
Mike Travis (1):
x86/platform/uv: Set section block size for hubless architectures
Mikhail Durnev (1):
ASoC: rsnd: core: Check convert rate in rsnd_hw_params
Mikko Perttunen (1):
gpu: host1x: Use different lock classes for each client
Miklos Szeredi (3):
ovl: allow upperdir inside lowerdir
virtiofs: fix userns
cuse: prevent clone
Mikulas Patocka (2):
dm snapshot: fix crash with transient storage and zero chunk size
dm snapshot: properly fix a crash when an origin has no snapshots
Milton Miller (1):
net/ncsi: Avoid channel_monitor hrtimer deadlock
Ming Lei (1):
blk-mq: plug request for shared sbitmap
Muchun Song (1):
mm: memcontrol: slab: fix obtain a reference to a freeing memcg
Muhammad Usama Anjum (2):
net: ipv6: check for validity before dereferencing cfg->fc_nlinfo.nlh
media: em28xx: fix memory leak
Murthy Bhat (1):
scsi: smartpqi: Correct request leakage during reset operations
Nathan Chancellor (13):
arm64: alternatives: Move length validation in alternative_{insn, endif}
x86/boot: Add $(CLANG_FLAGS) to compressed KBUILD_CFLAGS
efi/libstub: Add $(CLANG_FLAGS) to x86 flags
Makefile: Move -Wno-unused-but-set-variable out of GCC only block
crypto: arm/curve25519 - Move '.fpu' after '.arch'
ACPI: CPPC: Replace cppc_attr with kobj_attribute
x86/events/amd/iommu: Fix sysfs type mismatch
perf/amd/uncore: Fix sysfs type mismatch
powerpc/fadump: Mark fadump_calculate_reserve_size as __init
powerpc/prom: Mark identical_pvr_fixup as __init
riscv: Use $(LD) instead of $(CC) to link vDSO
scripts/recordmcount.pl: Fix RISC-V regex for clang
riscv: Workaround mcount name prior to clang-13
Neil Armstrong (1):
drm/meson: fix shutdown crash when component not probed
NeilBrown (1):
SUNRPC in case of backlog, hand free slots directly to waiting task
Nicholas Piggin (5):
powerpc/powernv: Enable HAIL (HV AIL) for ISA v3.1 processors
KVM: PPC: Book3S HV P9: Restore host CTRL SPR after guest exit
powerpc/pseries: Fix hcall tracing recursion in pv queued spinlocks
powerpc/64s/syscall: Use pt_regs.trap to distinguish syscall ABI difference between sc and scv syscalls
powerpc/64s/syscall: Fix ptrace syscall info with scv syscalls
Nick Desaulniers (1):
gcov: re-fix clang-11+ support
Nick Lowe (1):
igb: Enable RSS for Intel I211 Ethernet Controller
Niklas Cassel (1):
nvme-pci: don't simple map sgl when sgls are disabled
Nikola Livic (1):
pNFS/flexfiles: fix incorrect size check in decode_nfs_fh()
Nikolay Aleksandrov (1):
net: bridge: when suppression is enabled exclude RARP packets
Nikolay Borisov (1):
mm/sl?b.c: remove ctor argument from kmem_cache_flags
Nirmoy Das (1):
drm/amd/display: use GFP_ATOMIC in dcn20_resource_construct
Nobuhiro Iwamatsu (2):
firmware: xilinx: Remove zynqmp_pm_get_eemi_ops() in IS_REACHABLE(CONFIG_ZYNQMP_FIRMWARE)
rtc: ds1307: Fix wday settings for rx8130
Noralf Trønnes (1):
drm/probe-helper: Check epoch counter in output_poll_execute()
Norbert Ciosek (1):
virtchnl: Fix layout of RSS structures
Norman Maurer (1):
net: udp: Add support for getsockopt(..., ..., UDP_GRO, ..., ...);
Obeida Shamoun (1):
backlight: qcom-wled: Use sink_addr for sync toggle
Odin Ugedal (1):
sched/fair: Fix unfairness caused by missing load decay
Oleg Nesterov (1):
ptrace: make ptrace() fail if the tracee changed its pid unexpectedly
Olga Kornievskaia (2):
NFSv4.2: fix copy stateid copying for the async copy
NFSv4.2 fix handling of sr_eof in SEEK's reply
Oliver Hartkopp (2):
can: bcm/raw: fix msg_namelen values depending on CAN_REQUIRED_SIZE
can: isotp: fix msg_namelen values depending on CAN_REQUIRED_SIZE
Oliver Neukum (2):
USB: CDC-ACM: fix poison/unpoison imbalance
cdc-wdm: untangle a circular dependency between callback and softint
Oliver Stäbler (1):
arm64: dts: imx8mm/q: Fix pad control of SD1_DATA0
Omar Sandoval (1):
kyber: fix out of bounds access when preempted
Ondrej Mosnacek (5):
selinux: make nslot handling in avtab more robust
selinux: fix cond_list corruption when changing booleans
selinux: fix race between old and new sidtab
perf/core: Fix unconditional security_locked_down() call
serial: core: fix suspicious security_locked_down() call
Ong Boon Leong (2):
xdp: fix xdp_return_frame() kernel BUG throw for page_pool memory model
net: stmmac: fix TSO and TBS feature enabling during driver open
Or Cohen (2):
net/sctp: fix race condition in sctp_destroy_sock
net/nfc: fix use-after-free llcp_sock_bind/connect
Orson Zhai (1):
mailbox: sprd: Introduce refcnt when clients request/free channels
Otavio Pontes (1):
x86/microcode: Check for offline CPUs before requesting new microcode
Pablo Neira Ayuso (9):
netfilter: nftables: skip hook overlap logic if flowtable is stale
netfilter: flowtable: fix NAT IPv6 offload mangling
netfilter: conntrack: do not print icmpv6 as unknown via /proc
netfilter: nft_payload: fix C-VLAN offload support
netfilter: nftables_offload: VLAN id needs host byteorder in flow dissector
netfilter: nftables_offload: special ethertype handling for VLAN
netfilter: xt_SECMARK: add new revision to fix structure layout
netfilter: nfnetlink_osf: Fix a missing skb_header_pointer() NULL check
netfilter: nftables: Fix a memleak from userdata error path in new objects
Pali Rohár (7):
net: phy: marvell: fix detection of PHY on Topaz switches
cpufreq: armada-37xx: Fix the AVS value for load L1
clk: mvebu: armada-37xx-periph: Fix switching CPU freq from 250 Mhz to 1 GHz
clk: mvebu: armada-37xx-periph: Fix workaround for switching from L1 to L0
cpufreq: armada-37xx: Fix driver cleanup when registration failed
cpufreq: armada-37xx: Fix determining base CPU frequency
PCI: iproc: Fix return value of iproc_msi_irq_domain_alloc()
Pan Bian (1):
bus: qcom: Put child node before return
Paolo Abeni (8):
net: let skb_orphan_partial wake-up waiters.
mptcp: forbid mcast-related sockopt on MPTCP sockets
udp: never accept GSO_FRAGLIST packets
mptcp: fix splat when closing unaccepted socket
mptcp: avoid error message on infinite mapping
mptcp: drop unconditional pr_warn on bad opt
mptcp: fix data stream corruption
net: really orphan skbs tied to closing sk
Paolo Bonzini (1):
KVM: x86/mmu: preserve pending TLB flush across calls to kvm_tdp_mmu_zap_sp
Paul Aurich (1):
cifs: Return correct error code from smb2_get_enc_key
Paul Cercueil (1):
drm: bridge/panel: Cleanup connector on bridge detach
Paul Clements (1):
md/raid1: properly indicate failure when ending a failed write request
Paul Durrant (1):
xen-blkback: fix compatibility bug with single page rings
Paul Fertser (1):
hwmon: (pmbus/pxe1610) don't bail out when not all pages are active
Paul M Stillwell Jr (1):
ice: handle increasing Tx or Rx ring sizes
Paul Menzel (2):
iommu/amd: Put newline after closing bracket in warning
Revert "iommu/amd: Fix performance counter initialization"
Paul Moore (1):
selinux: add proper NULL termination to the secclass_map permissions
Pavel Andrianov (1):
net: pxa168_eth: Fix a potential data race in pxa168_eth_remove
Pavel Begunkov (3):
io_uring: fix timeout cancel return code
block: don't ignore REQ_NOWAIT for direct IO
io_uring: fix overflows checks in provide buffers
Pavel Machek (1):
intel_th: Consistency and off-by-one fix
Pavel Skripkin (6):
drivers: net: fix memory leak in atusb_probe
drivers: net: fix memory leak in peak_usb_create_dev
net: mac802154: Fix general protection fault
media: dvb-usb: fix memory leak in dvb_usb_adapter_init
tty: fix memory leak in vc_deallocate
net: usb: fix memory leak in smsc75xx_bind
Pavel Tatashin (3):
mm/gup: check every subpage of a compound page during isolation
mm/gup: return an error on migration failure
mm/gup: check for isolation errors
Pavel Tikhomirov (1):
net: sched: sch_teql: fix null-pointer dereference
Pawel Laszczak (2):
usb: gadget: uvc: add bInterval checking for HS mode
usb: webcam: Invalid size of Processing Unit Descriptor
Paweł Chmiel (1):
clk: exynos7: Mark aclk_fsys1_200 as critical
Pedro Tammela (1):
libbpf: Fix bail out from 'ringbuf_process_ring()' on error
PeiSen Hou (1):
ALSA: hda/realtek: Add some CLOVE SSIDs of ALC293
Peilin Ye (1):
media: dvbdev: Fix memory leak in dvb_media_device_free()
Peng Fan (1):
mmc: sdhci-esdhc-imx: validate pinctrl before use it
Peng Li (1):
net: hns3: use netif_tx_disable to stop the transmit queue
Peter Collingbourne (3):
arm64: fix inline asm in load_unaligned_zeropad()
kasan: fix unit tests with CONFIG_UBSAN_LOCAL_BOUNDS enabled
arm64: mte: initialize RGSR_EL1.SEED in __cpu_setup
Peter Ujfalusi (1):
ALSA: hda/realtek: Chain in pop reduction fixup for ThinkStation P340
Peter Wang (1):
scsi: ufs: ufs-mediatek: Fix power down spec violation
Peter Xu (1):
mm/hugetlb: fix F_SEAL_FUTURE_WRITE
Peter Zijlstra (3):
perf: Rework perf_event_exit_event()
sched,fair: Alternative sched_slice()
openrisc: Define memory barrier mb
Petr Machata (3):
selftests: net: mirror_gre_vlan_bridge_1q: Make an FDB entry static
selftests: mlxsw: Increase the tolerance of backlog buildup
selftests: mlxsw: Fix mausezahn invocation in ERSPAN scale test
Petr Mladek (4):
watchdog: rename __touch_watchdog() to a better descriptive name
watchdog: explicitly update timestamp when reporting softlockup
watchdog/softlockup: remove logic that tried to prevent repeated reports
watchdog: fix barriers when printing backtraces from all CPUs
Phil Calvin (1):
ALSA: hda/realtek: fix mic boost on Intel NUC 8
Phil Elwell (1):
usb: dwc2: Fix gadget DMA unmap direction
Phillip Lougher (1):
squashfs: fix divide error in calculate_skip()
Phillip Potter (11):
net: tun: set tun->dev->addr_len during TUNSETLINK processing
net: geneve: check skb is large enough for IPv4/IPv6 header
net: usb: ax88179_178a: initialize local variables before use
fbdev: zero-fill colormap in fbcmap.c
net: geneve: modify IP header check in geneve6_xmit_skb and geneve_xmit_skb
net: hsr: check skb can contain struct hsr_ethhdr in fill_frame_info
scsi: ufs: handle cleanup correctly on devm_reset_control_get error
leds: lp5523: check return value of lp5xx_read and jump to cleanup code
isdn: mISDNinfineon: check/cleanup ioremap failure correctly in setup_io
isdn: mISDN: correctly handle ph_info allocation failure in hfcsusb_ph_info
dmaengine: qcom_hidma: comment platform_driver_register call
Pierre-Louis Bossart (2):
soundwire: cadence: only prepare attached devices on clock stop
ASoC: samsung: tm2_wm5110: check of_parse return value
Ping Cheng (1):
HID: wacom: set EV_KEY and EV_ABS only for non-HID_GENERIC type of devices
Ping-Ke Shih (2):
rtw88: Fix array overrun in rtw_get_tx_power_params()
rtlwifi: 8821ae: upgrade PHY and RF parameters
Piotr Krysiuk (2):
bpf, x86: Validate computation of branch displacements for x86-64
bpf, x86: Validate computation of branch displacements for x86-32
Po-Hao Huang (1):
rtw88: 8822c: add LC calibration for RTL8822C
Potnuri Bharat Teja (2):
RDMA/cxgb4: check for ipv6 address properly while destroying listener
RDMA/cxgb4: add missing qpid increment
Pradeep Kumar Chitrapu (1):
ath11k: fix thermal temperature read
Pradeep P V K (1):
mmc: sdhci: Check for reset prior to DMA address unmap
Prashant Malani (1):
platform/chrome: cros_ec_typec: Add DP mode check
Qii Wang (3):
i2c: mediatek: Fix wrong dma sync flag
i2c: mediatek: Fix send master code at more than 1MHz
i2c: mediatek: Disable i2c start_en and clear intr_stat before reset
Qinglang Miao (9):
soc: qcom: pdr: Fix error return code in pdr_register_listener
i2c: cadence: fix reference leak when pm_runtime_get_sync fails
i2c: img-scb: fix reference leak when pm_runtime_get_sync fails
i2c: imx-lpi2c: fix reference leak when pm_runtime_get_sync fails
i2c: imx: fix reference leak when pm_runtime_get_sync fails
i2c: omap: fix reference leak when pm_runtime_get_sync fails
i2c: sprd: fix reference leak when pm_runtime_get_sync fails
i2c: stm32f7: fix reference leak when pm_runtime_get_sync fails
i2c: xiic: fix reference leak when pm_runtime_get_sync fails
Qu Huang (1):
drm/amdkfd: Fix cat debugfs hang_hws file causes system crash bug
Qu Wenruo (1):
btrfs: handle remount to no compress during compression
Quanyang Wang (11):
spi: spi-zynqmp-gqspi: use wait_for_completion_timeout to make zynqmp_qspi_exec_op not interruptible
spi: spi-zynqmp-gqspi: add mutex locking for exec_op
spi: spi-zynqmp-gqspi: transmit dummy cycles by using the controller's internal functionality
spi: spi-zynqmp-gqspi: fix incorrect operating mode in zynqmp_qspi_read_op
spi: spi-zynqmp-gqspi: fix clk_enable/disable imbalance issue
spi: spi-zynqmp-gqspi: fix hang issue when suspend/resume
spi: spi-zynqmp-gqspi: fix use-after-free in zynqmp_qspi_exec_op
spi: spi-zynqmp-gqspi: return -ENOMEM if dma_map_single fails
drm/tilcdc: send vblank event when disabling crtc
clk: zynqmp: move zynqmp_pll_set_mode out of round_rate callback
clk: zynqmp: pll: add set_pll_mode to check condition in zynqmp_pll_enable
Quentin Perret (1):
sched: Fix out-of-bound access in uclamp
Quinn Tran (1):
scsi: qla2xxx: Fix use after free in bsg
Raed Salem (1):
net/mlx5: Fix placement of log_max_flow_counter
Rafael J. Wysocki (4):
ACPI: x86: Call acpi_boot_table_init() after acpi_table_upgrade()
PCI: PM: Do not read power state in pci_enable_device_flags()
cpufreq: intel_pstate: Use HWP if enabled by platform firmware
drivers: base: Fix device link removal
Rafał Miłecki (2):
dt-bindings: net: ethernet-controller: fix typo in NVMEM
ARM: dts: BCM5301X: fix "reg" formatting in /memory node
Rahul Lakkireddy (1):
cxgb4: avoid collecting SGE_QBASE regs during traffic
Raju Rangoju (1):
cxgb4: avoid accessing registers when clearing filters
Ramesh Babu B (1):
net: stmmac: Clear receive all(RA) bit when promiscuous mode is off
Rander Wang (1):
soundwire: stream: fix memory leak in stream config error path
Randy Dunlap (8):
ia64: remove duplicate entries in generic_defconfig
csky: change a Kconfig symbol name to fix e1000 build error
ia64: fix discontig.c section mismatches
NFS: fs_context: validate UDP retrans to prevent shift out-of-bounds
drm: bridge: fix LONTIUM use of mipi_dsi_() functions
powerpc: iommu: fix build when neither PCI or IBMVIO is set
MIPS: alchemy: xxs1500: add gpio-au1000.h header file
MIPS: ralink: export rt_sysc_membase for rt2880_wdt.c
Randy Wright (1):
serial: 8250_pci: Add support for new HPE serial device
Rasmus Villemoes (2):
lib/vsprintf.c: remove leftover 'f' and 'F' cases from bstr_printf()
devtmpfs: fix placement of complete() call
Ravi Kumar Bokka (1):
drivers: nvmem: Fix voltage settings for QTI qfprom-efuse
Reiji Watanabe (1):
KVM: VMX: Don't use vcpu->run->internal.ndata as an array index
Ricardo Ribalda (3):
media: staging/intel-ipu3: Fix memory leak in imu_fmt
media: staging/intel-ipu3: Fix set_fmt error handling
media: staging/intel-ipu3: Fix race condition during set_fmt
Ricardo Rivera-Matos (1):
power: supply: bq25980: Move props from battery node
Richard Fitzgerald (1):
ASoC: cs42l42: Regmap must use_single_read/write
Richard Sanger (1):
net: packetmmap: fix only tx timestamp on request
Rijo Thomas (2):
crypto: ccp - fix command queuing to TEE ring buffer
tee: amdtee: unload TA only when its refcount becomes 0
Rikard Falkeborn (1):
linux/bits.h: fix compilation error with GENMASK
Rob Clark (2):
drm/msm: Ratelimit invalid-fence message
drm/msm: Fix a5xx/a6xx timestamps
Robert Malz (1):
ice: Cleanup fltr list in case of allocation issues
Robin Murphy (2):
perf/arm_pmu_platform: Use dev_err_probe() for IRQ errors
perf/arm_pmu_platform: Fix error handling
Robin Singh (1):
drm/amd/display: fixed divide by zero kernel crash during dsc enablement
Rodrigo Siqueira (1):
drm/amd/display: Fix two cursor duplication when using overlay
Roi Dayan (2):
net/mlx5e: Fix null deref accessing lag dev
netfilter: flowtable: Remove redundant hw refresh bit
Rolf Eike Beer (1):
iommu/vt-d: Fix sysfs leak in alloc_iommu()
Romain Naour (1):
mips: Do not include hi and lo in clobber list for R6
Roman Bolshakov (1):
scsi: target: iscsi: Fix zero tag inside a trace event
Roman Gushchin (1):
percpu: make pcpu_nr_empty_pop_pages per chunk type
Rong Chen (1):
selftests/vm: fix out-of-tree build
Ronnie Sahlberg (2):
cifs: revalidate mapping when we open files for SMB1 POSIX
cifs: fix memory leak in smb2_copychunk_range
Rui Miguel Silva (1):
iio: gyro: fxas21002c: balance runtime power in error path
Ruslan Bilovol (2):
usb: gadget: f_uac2: validate input parameters
usb: gadget: f_uac1: validate input parameters
Russ Weight (1):
fpga: dfl: pci: add DID for D5005 PAC cards
Russell Currey (1):
selftests/powerpc: Fix L1D flushing tests for Power10
Russell King (3):
net: sfp: relax bitrate-derived mode check
net: sfp: cope with SFPs that set both LOS normal and LOS inverted
ARM: footbridge: fix PCI interrupt mapping
Ryan Lee (2):
ASoC: max98373: Changed amp shutdown register as volatile
ASoC: max98373: Added 30ms turn on/off time delay
Ryder Lee (3):
mt76: mt7615: use ieee80211_free_txskb() in mt7615_tx_token_put()
mt76: mt7915: fix mib stats counter reporting to mac80211
mt76: mt7615: fix memleak when mt7615_unregister_device()
Saeed Mahameed (1):
net/mlx5e: reset XPS on error flow if netdev isn't registered yet
Sagi Grimberg (3):
nvme-tcp: block BH in sk state_change sk callback
nvmet-tcp: fix incorrect locking in state_change sk callback
nvme-tcp: fix possible use-after-completion
Sai Prakash Ranjan (2):
arm64: dts: qcom: sm8250: Fix level triggered PMU interrupt polarity
arm64: dts: qcom: sm8250: Fix timer interrupt to specify EL2 physical timer
Salil Mehta (1):
net: hns3: Limiting the scope of vector_ring_chain variable
Sami Loone (2):
ALSA: hda/realtek: fix static noise on ALC285 Lenovo laptops
ALSA: hda/realtek: ALC285 Thinkpad jack pin quirk is unreachable
Sandeep Singh (1):
xhci: Add reset resume quirk for AMD xhci controller.
Sander Vanheule (1):
mt76: mt7615: support loading EEPROM for MT7613BE
Saravana Kannan (1):
driver core: Fix locking bug in deferred_probe_timeout_work_func()
Sargun Dhillon (2):
Documentation: seccomp: Fix user notification documentation
seccomp: Refactor notification handler to prepare for new semantics
Sean Christopherson (26):
KVM: x86/mmu: Ensure TLBs are flushed when yielding during GFN range zap
KVM: x86/mmu: Ensure TLBs are flushed for TDP MMU during NX zapping
KVM: x86/mmu: Don't allow TDP MMU to yield when recovering NX pages
KVM: VMX: Convert vcpu_vmx.exit_reason to a union
x86/cpu: Initialize MSR_TSC_AUX if RDTSCP *or* RDPID is supported
KVM: x86: Defer the MMU unload to the normal path on an global INVPCID
KVM: x86/mmu: Alloc page for PDPTEs when shadowing 32-bit NPT with 64-bit
KVM: x86: Remove emulator's broken checks on CR0/CR3/CR4 loads
KVM: nSVM: Set the shadow root level to the TDP level for nested NPT
KVM: SVM: Don't strip the C-bit from CR2 on #PF interception
KVM: SVM: Do not allow SEV/SEV-ES initialization after vCPUs are created
KVM: SVM: Inject #GP on guest MSR_TSC_AUX accesses if RDTSCP unsupported
KVM: nVMX: Defer the MMU reload to the normal path on an EPTP switch
KVM: nVMX: Truncate bits 63:32 of VMCS field on nested check in !64-bit
KVM: nVMX: Truncate base/index GPR value on address calc in !64-bit
KVM: Destroy I/O bus devices on unregister failure _after_ sync'ing SRCU
KVM: Stop looking for coalesced MMIO zones if the bus is destroyed
KVM: x86/mmu: Retry page faults that hit an invalid memslot
crypto: ccp: Detect and reject "invalid" addresses destined for PSP
KVM: VMX: Intercept FS/GS_BASE MSR accesses for 32-bit KVM
KVM: x86/mmu: Remove the defunct update_pte() paging hook
crypto: ccp: Free SEV device if SEV init fails
KVM: x86: Emulate RDPID only if RDTSCP is supported
KVM: x86: Move RDPID emulation intercept to its own enum
KVM: VMX: Do not advertise RDPID if ENABLE_RDTSCP control is unsupported
KVM: VMX: Disable preemption when probing user return MSRs
Sean MacLennan (1):
USB: serial: ti_usb_3410_5052: add startech.com device id
Sean Wang (2):
mt76: mt7663s: make all of packets 4-bytes aligned in sdio tx aggregation
mt76: mt7663s: fix the possible device hang in high traffic
Sean Young (1):
media: ite-cir: check for receive overflow
Sebastian Krzyszkowiak (1):
arm64: dts: imx8mq-librem5-r3: Mark buck3 as always on
Seevalamuthu Mariappan (1):
mac80211: clear sta->fast_rx when STA removed from 4-addr VLAN
Serge E. Hallyn (1):
capabilities: require CAP_SETFCAP to map uid 0
Sergei Trofimovich (5):
ia64: mca: allocate early mca with GFP_ATOMIC
ia64: fix format strings for err_inject
ia64: fix user_stack_pointer() for ptrace()
ia64: fix EFI_DEBUG build
ia64: module: fix symbolizer crash on fdescr
Sergey Shtylyov (16):
pata_arasan_cf: fix IRQ check
pata_ipx4xx_cf: fix IRQ check
sata_mv: add IRQ checks
ata: libahci_platform: fix IRQ check
scsi: ufs: ufshcd-pltfrm: Fix deferred probing
scsi: hisi_sas: Fix IRQ checks
scsi: jazz_esp: Add IRQ check
scsi: sun3x_esp: Add IRQ check
scsi: sni_53c710: Add IRQ check
i2c: cadence: add IRQ check
i2c: emev2: add IRQ check
i2c: jz4780: add IRQ check
i2c: mlxbf: add IRQ check
i2c: rcar: add IRQ check
i2c: sh7760: add IRQ check
i2c: sh7760: fix IRQ error path
Seunghui Lee (1):
mmc: core: Set read only for SD cards with permanent write protect bit
Shameer Kolothum (1):
iommu: Check dev->iommu in iommu_dev_xxx functions
Shawn Guo (4):
soc: qcom: geni: shield geni_icc_get() for ACPI boot
arm64: dts: qcom: sdm845: fix number of pins in 'gpio-ranges'
arm64: dts: qcom: sm8150: fix number of pins in 'gpio-ranges'
arm64: dts: qcom: sm8250: fix number of pins in 'gpio-ranges'
Shay Drory (2):
RDMA/core: Add CM to restrack after successful attachment to a device
RDMA/core: Don't access cm_id after its destruction
Shayne Chen (1):
mt76: mt7915: fix txpower init for TSSI off chips
Shengjiu Wang (3):
ASoC: wm8960: Fix wrong bclk and lrclk with pll enabled for some chips
ASoC: wm8960: Remove bitclk relax condition in wm8960_configure_sysclk
ASoC: ak5558: correct reset polarity
Shixin Liu (6):
crypto: sun8i-ss - Fix PM reference leak when pm_runtime_get_sync() fails
crypto: sun8i-ce - Fix PM reference leak in sun8i_ce_probe()
crypto: stm32/hash - Fix PM reference leak on stm32-hash.c
crypto: stm32/cryp - Fix PM reference leak on stm32-cryp.c
crypto: sa2ul - Fix PM reference leak in sa_ul_probe()
crypto: omap-aes - Fix PM reference leak on omap-aes.c
Shou-Chieh Hsu (1):
HID: google: add don USB id
Shradha Todi (1):
PCI: endpoint: Fix NULL pointer dereference for ->get_features()
Shuah Khan (5):
usbip: add sysfs_lock to synchronize sysfs code paths
usbip: stub-dev synchronize sysfs code paths
usbip: vudc synchronize sysfs code paths
usbip: synchronize event handler with sysfs code paths
ath10k: Fix ath10k_wmi_tlv_op_pull_peer_stats_info() unlock without lock
Shuo Chen (1):
dyndbg: fix parsing file query without a line-range suffix
Shyam Prasad N (1):
cifs: detect dead connections only when echoes are enabled.
Shyam Sundar S K (2):
amd-xgbe: Update DMA coherency values
platform/x86: hp-wireless: add AMD's hardware id to the supported list
Si-Wei Liu (1):
vdpa/mlx5: should exclude header length and fcs from mtu
Sibi Sankar (1):
remoteproc: qcom_q6v5_mss: Replace ioremap with memremap
Simon Rettberg (1):
drm/i915/gt: Disable HiZ Raw Stall Optimization on broken gen7
Sindhu Devale (1):
RDMA/i40iw: Fix error unwinding when i40iw_hmc_sd_one fails
Smita Koralahalli (1):
perf vendor events amd: Fix broken L2 Cache Hits from L2 HWPF metric
Souptick Joarder (1):
media: atomisp: Fixed error handling path
Sourabh Jain (1):
powerpc/kexec_file: Use current CPU info while setting up FDT
Sreekanth Reddy (1):
scsi: mpt3sas: Block PCI config access from userspace during reset
Srikar Dronamraju (2):
powerpc/smp: Reintroduce cpu_core_mask
powerpc/smp: Set numa node before updating mask
Srinivas Kandagatla (2):
arm64: dts: qcom: db845c: fix correct powerdown pin for WSA881x
soundwire: bus: Fix device found flag correctly
Srinivas Pandruvada (3):
tools/power/x86/intel-speed-select: Increase string size
platform/x86: ISST: Account for increased timeout in some cases
thermal/drivers/intel: Initialize RW trip to THERMAL_TEMP_INVALID
Sriram R (2):
ath10k: Validate first subframe of A-MSDU before processing the list
ath11k: Clear the fragment cache during key install
Stanimir Varbanov (1):
media: venus: hfi_parser: Don't initialize parser on v1
Stanislav Fomichev (1):
tools/resolve_btfids: Add /libbpf to .gitignore
Stefan Assmann (1):
iavf: remove duplicate free resources calls
Stefan Berger (3):
tpm: acpi: Check eventlog signature before using it
tpm: efi: Use local variable for calculating final log size
tpm: vtpm_proxy: Avoid reading host log when using a virtual device
Stefan Chulski (1):
net: mvpp2: add buffer header handling in RX
Stefan Raspl (1):
tools/kvm_stat: Add restart delay
Stefan Riedmueller (1):
ARM: dts: imx6: pbab01: Set vmmc supply for both SD interfaces
Stefan Roese (1):
net: ethernet: mtk_eth_soc: Fix packet statistics support for MT7628/88
Stefano Brivio (1):
netfilter: nft_set_pipapo_avx2: Add irq_fpu_usable() check, fallback to non-AVX2 version
Stefano Garzarella (2):
vsock/vmci: log once the failed queue pair allocation
vsock/virtio: free queued packets when closing socket
Steffen Dirkwinkel (1):
platform/x86: pmc_atom: Match all Beckhoff Automation baytrail boards with critclk_systems DMI table
Steffen Klassert (2):
xfrm: Fix NULL pointer dereference on policy lookup
xfrm: Provide private skb extensions for segmented and hw offloaded ESP packets
Stephen Boyd (6):
drm/msm: Set drvdata to NULL when msm_drm_init() fails
serial: stm32: Use of_device_get_match_data()
firmware: qcom_scm: Make __qcom_scm_is_call_available() return bool
firmware: qcom_scm: Reduce locking section for __get_convention()
firmware: qcom_scm: Workaround lack of "is available" call on SC7180
ASoC: qcom: lpass-cpu: Use optional clk APIs
Steve French (3):
smb3: when mounting with multichannel include it in requested capabilities
smb3: do not attempt multichannel to server which does not support it
SMB3: incorrect file id in requests compounded with open
Steven Rostedt (VMware) (4):
ftrace: Check if pages were allocated before calling free_pages()
ftrace: Handle commands when closing set_ftrace_filter file
tracing: Map all PIDs to command lines
tracing: Restructure trace_clock_global() to never block
Stéphane Marchesin (1):
drm/i915: Fix crash in auto_retire
Subbaraman Narayanamurthy (1):
interconnect: qcom: bcm-voter: add a missing of_node_put()
Sudhakar Panneerselvam (1):
md/bitmap: wait for external bitmap writes to complete during tear down
Sumeet Pawnikar (1):
ACPI: PM: Add ACPI ID of Alder Lake Fan
Sun Ke (1):
nbd: Fix NULL pointer in flush_workqueue
Suravee Suthikulpanit (1):
iommu/amd: Remove performance counter pre-initialization test
Suzuki K Poulose (3):
KVM: arm64: Hide system instruction access to Trace registers
KVM: arm64: Disable guest access to trace filter controls
coresight: Do not scan for graph if none is present
Taehee Yoo (2):
mld: fix panic in mld_newpack()
sch_dsmark: fix a NULL deref in qdisc_reset()
Takashi Iwai (28):
ALSA: hda/realtek: Fix speaker amp setup on Acer Aspire E1
ALSA: hda/conexant: Apply quirk for another HP ZBook G5 model
drm/i915: Fix invalid access to ACPI _DSM objects
ALSA: usb-audio: Add MIDI quirk for Vox ToneLab EX
ALSA: hda/conexant: Re-order CX5066 quirk table entries
ALSA: usb-audio: Explicitly set up the clock selector
media: dvb-usb: Fix use-after-free access
media: dvb-usb: Fix memory leak at error in dvb_usb_device_init()
ALSA: hda/realtek: Re-order ALC882 Acer quirk table entries
ALSA: hda/realtek: Re-order ALC882 Sony quirk table entries
ALSA: hda/realtek: Re-order ALC882 Clevo quirk table entries
ALSA: hda/realtek: Re-order ALC269 HP quirk table entries
ALSA: hda/realtek: Re-order ALC269 Acer quirk table entries
ALSA: hda/realtek: Re-order ALC269 Dell quirk table entries
ALSA: hda/realtek: Re-order ALC269 ASUS quirk table entries
ALSA: hda/realtek: Re-order ALC269 Sony quirk table entries
ALSA: hda/realtek: Re-order ALC269 Lenovo quirk table entries
ALSA: hda/realtek: Re-order remaining ALC269 quirk table entries
ALSA: hda/realtek: Re-order ALC662 quirk table entries
ALSA: hda/realtek: Remove redundant entry for ALC861 Haier/Uniwill devices
ALSA: hda/realtek: Fix speaker amp on HP Envy AiO 32
ALSA: usb-audio: Add error checks for usb_driver_claim_interface() calls
ALSA: hda/realtek: Add quirk for Lenovo Ideapad S740
ALSA: intel8x0: Don't update period unless prepared
ALSA: line6: Fix racy initialization of LINE6 MIDI
ALSA: usb-audio: Validate MS endpoint descriptors
ALSA: hda/realtek: Fix silent headphone output on ASUS UX430UA
ALSA: hda/realtek: Add fixup for HP OMEN laptop
Takashi Sakamoto (7):
ALSA: bebob: enable to deliver MIDI messages for multiple ports
ALSA: dice: fix stream format for TC Electronic Konnekt Live at high sampling transfer frequency
ALSA: firewire-lib: fix amdtp_packet tracepoints event for packet_index field
ALSA: dice: fix stream format at middle sampling rate for Alesis iO 26
ALSA: firewire-lib: fix calculation for size of IR context payload
ALSA: bebob/oxfw: fix Kconfig entry for Mackie d.2 Pro
ALSA: firewire-lib: fix check for the size of isochronous packet payload
Tanner Love (1):
net/packet: make packet_fanout.arr size configurable up to 64K
Tao Liu (1):
openvswitch: meter: fix race when getting now_ms.
Tao Ren (1):
usb: gadget: aspeed: fix dma map failure
Tariq Toukan (1):
net/mlx5e: Enforce minimum value check for ICOSQ size
Tasos Sahanidis (2):
media: saa7134: use sg_dma_len when building pgtable
media: saa7146: use sg_dma_len when building pgtable
Teava Radu (1):
platform/x86: touchscreen_dmi: Add info for the Mediacom Winpad 7.0 W700 tablet
Tejas Patel (1):
firmware: xilinx: Fix dereferencing freed memory
Tejun Heo (1):
blk-iocost: fix weight updates of inner active iocgs
Tetsuo Handa (7):
batman-adv: initialize "struct batadv_tvlv_tt_vlan_data"->reserved field
lockdep: Add a missing initialization hint to the "INFO: Trying to register non-static key" message
misc: vmw_vmci: explicitly initialize vmci_notify_bm_set_msg struct
misc: vmw_vmci: explicitly initialize vmci_datagram payload
ttyprintk: Add TTY hangup callback.
Bluetooth: initialize skb_queue_head at l2cap_chan_create()
tty: vt: always invoke vc->vc_sw->con_resize callback
Thadeu Lima de Souza Cascardo (3):
io_uring: truncate lengths larger than MAX_RW_COUNT on provide buffers
bpf, ringbuf: Deny reserve of buffers larger than ringbuf
Bluetooth: cmtp: fix file refcount when cmtp_attach_device fails
Theodore Ts'o (2):
fs: fix reporting supported extra file attributes for statx()
ext4: allow the dax flag to be set and cleared on inline directories
Thinh Nguyen (5):
usb: xhci: Fix port minor revision
usb: dwc3: gadget: Check for disabled LPM quirk
usb: dwc3: gadget: Remove FS bInterval_m1 limitation
usb: dwc3: gadget: Fix START_TRANSFER link state check
usb: dwc3: gadget: Properly track pending and queued SG
Thomas Gleixner (4):
Revert 337f13046ff0 ("futex: Allow FUTEX_CLOCK_REALTIME with FUTEX_WAIT op")
futex: Do not apply time namespace adjustment on FUTEX_LOCK_PI
KVM: x86: Cancel pvclock_gtod_work on module removal
KVM: x86: Prevent deadlock against tk_core.seq
Thomas Richter (1):
perf ftrace: Fix access to pid in array when setting a pid filter
Thomas Zimmermann (1):
drm/ast: Fix invalid usage of AST_MAX_HWC_WIDTH in cursor atomic_check
Tian Tao (1):
dm integrity: fix missing goto in bitmap_flush_interval error handling
Tiezhu Yang (2):
MIPS/bpf: Enable bpf_probe_read{, str}() on MIPS again
MIPS: Loongson64: Use _CACHE_UNCACHED instead of _CACHE_UNCACHED_ACCELERATED
Timo Gurr (1):
ALSA: usb-audio: Add dB range mapping for Sennheiser Communications Headset PC 8
Toke Høiland-Jørgensen (2):
bpf: Enforce that struct_ops programs be GPL-only
ath9k: Fix error check in ath9k_hw_read_revisions() for PCI devices
Tom Lendacky (2):
x86/sev-es: Move sev_es_put_ghcb() in prep for follow on patch
x86/sev-es: Invalidate the GHCB after completing VMGEXIT
Tom Seewald (3):
qlcnic: Add null check after calling netdev_alloc_skb
char: hpet: add checks after calling ioremap
net: liquidio: Add missing null pointer checks
Tomas Winkler (1):
mei: me: add Alder Lake P device id.
Tong Zhang (8):
mISDN: fix crash in fritzpci
drm/qxl: do not run release if qxl failed to init
drm/ast: fix memory leak when unload the driver
crypto: qat - don't release uninitialized resources
crypto: qat - ADF_STATUS_PF_RUNNING should be set after adf_dev_init
ALSA: hdsp: don't disable if not enabled
ALSA: hdspm: don't disable if not enabled
ALSA: rme9652: don't disable if not enabled
Tong Zhu (1):
neighbour: Disregard DEAD dst in neigh_update
Tony Ambardar (1):
powerpc: fix EDEADLOCK redefinition error in uapi/asm/errno.h
Tony Lindgren (14):
bus: ti-sysc: Fix warning on unbind if reset is not deasserted
ARM: OMAP4: Fix PMIC voltage domains for bionic
ARM: dts: Drop duplicate sha2md5_fck to fix clk_disable race
ARM: dts: Fix moving mmc devices with aliases for omap4 & 5
ARM: OMAP2+: Fix warning for omap_init_time_of()
ARM: OMAP2+: Fix uninitialized sr_inst
gpio: omap: Save and restore sysconfig
ARM: dts: Fix swapped mmc order for omap3
bus: ti-sysc: Probe for l4_wkup and l4_cfg interconnect devices first
clocksource/drivers/timer-ti-dm: Fix posted mode status check order
clocksource/drivers/timer-ti-dm: Add missing set_state_oneshot_stopped
PM: runtime: Fix unpaired parent child_count for force_resume
clocksource/drivers/timer-ti-dm: Prepare to handle dra7 timer wrap issue
clocksource/drivers/timer-ti-dm: Handle dra7 timer wrap errata i940
Trond Myklebust (11):
NFS: Don't discard pNFS layout segments that are marked for return
NFSv4: Don't discard segments marked for return in _pnfs_return_layout()
NFS: nfs4_bitmask_adjust() must not change the server global bitmasks
NFS: Fix attribute bitmask in _nfs42_proc_fallocate()
NFSv4.2: Always flush out writes in nfs42_proc_fallocate()
NFS: Deal correctly with attribute generation counter overflow
NFSv4.x: Don't return NFS4ERR_NOMATCHING_LAYOUT if we're unmounting
NFS: NFS_INO_REVAL_PAGECACHE should mark the change attribute invalid
NFS: Fix an Oopsable condition in __nfs_pageio_add_request()
NFS: Don't corrupt the value of pg_bytes_written in nfs_do_recoalesce()
SUNRPC: More fixes for backlog congestion
Tudor Ambarus (2):
Revert "mtd: spi-nor: macronix: Add support for mx25l51245g"
spi: spi-ti-qspi: Free DMA resources
Tvrtko Ursulin (1):
drm/i915/overlay: Fix active retire callback alignment
Tyrel Datwyler (1):
powerpc/pseries: extract host bridge from pci_bus prior to bus removal
Uladzislau Rezki (Sony) (1):
kvfree_rcu: Use same set of GFP flags as does single-argument
Ulf Hansson (1):
mmc: core: Fix hanging on I/O during system suspend for removable cards
Uwe Kleine-König (1):
pwm: atmel: Fix duty cycle calculation in .get_state()
Vadym Kochan (1):
net: marvell: prestera: fix port event handling on init
Vaibhav Jain (2):
libnvdimm/region: Fix nvdimm_has_flush() to handle ND_REGION_ASYNC
powerpc/mm: Add cond_resched() while removing hpte mappings
Valentin CARON - foss (1):
ARM: dts: stm32: fix usart 2 & 3 pinconf to wake up with flow control
Valentin Schneider (1):
sched/fair: Fix shift-out-of-bounds in load_balance()
Vamshi Krishna Gopal (1):
ASoC: Intel: sof_sdw: add quirk for new ADL-P Rvp
Varad Gautam (1):
ipc/mqueue, msg, sem: avoid relying on a stack reference past its expiry
Vasily Averin (1):
tools/cgroup/slabinfo.py: updated to work on current kernel
Vasily Gorbik (2):
s390/entry: save the caller of psw_idle
s390/disassembler: increase ebpf disasm buffer size
Ville Syrjälä (2):
drm/i915: Avoid div-by-zero on gen2
drm/i915: Read C0DRB3/C1DRB3 as 16 bits again
Vinay Kumar Yadav (4):
ch_ktls: Fix kernel panic
ch_ktls: fix device connection close
ch_ktls: tcb close causes tls connection failure
ch_ktls: do not send snd_una update to TCB in middle
Vincent Donnefort (1):
sched/pelt: Fix task util_est update filtering
Vincent Whitchurch (1):
cifs: Silently ignore unknown oplock break handle
Vineet Gupta (1):
ARC: entry: fix off-by-one error in syscall number validation
Viswas G (1):
scsi: pm80xx: Fix chip initialization failure
Vitaly Chikunov (1):
perf beauty: Fix fsconfig generator
Vitaly Kuznetsov (3):
ACPI: processor: Fix build when CONFIG_ACPI_PROCESSOR=m
genirq/matrix: Prevent allocation counter corruption
KVM: nVMX: Always make an attempt to map eVMCS after migration
Vivek Goyal (5):
fuse: fix write deadlock
fuse: invalidate attrs when page writeback completes
dax: Add an enum for specifying dax wakup mode
dax: Add a wakeup mode parameter to put_unlocked_entry()
dax: Wake up all waiters after invalidating dax entry
Vlad Buslov (4):
net: sched: fix action overwrite reference counting
net: sched: fix err handler in tcf_action_init()
Revert "net: sched: bump refcount for new action in ACT replace mode"
net: zero-initialize tc skb extension on allocation
Vladimir Barinov (1):
arm64: dts: renesas: r8a77980: Fix vin4-7 endpoint binding
Vladimir Isaev (2):
ARC: mm: PAE: use 40-bit physical page mask
ARC: mm: Use max_high_pfn as a HIGHMEM zone border
Vladimir Murzin (1):
ARM: 9069/1: NOMMU: Fix conversion for_each_membock() to for_each_mem_range()
Vladimir Oltean (8):
net/sched: cls_flower: use ntohs for struct flow_dissector_key_ports
net: dsa: sja1105: update existing VLANs from the bridge VLAN list
net: dsa: sja1105: use 4095 as the private VLAN for untagged traffic
net: dsa: sja1105: error out on unsupported PHY mode
net: dsa: sja1105: add error handling in sja1105_setup()
net: dsa: sja1105: call dsa_unregister_switch when allocating memory fails
net: dsa: sja1105: fix VL lookup command packing for P/Q/R/S
net: dsa: fix error code getting shifted with 4 in dsa_slave_get_sset_count
Vladyslav Tarasiuk (1):
net/mlx4: Fix EEPROM dump support
Waiman Long (1):
sched/debug: Fix cgroup_path[] serialization
Wan Jiabing (1):
cavium/liquidio: Fix duplicate argument
Wang Li (2):
spi: qup: fix PM reference leak in spi_qup_remove()
spi: fsl-lpspi: Fix PM reference leak in lpspi_prepare_xfer_hardware()
Wang Qing (1):
arc: kernel: Return -EFAULT if copy_to_user() fails
Wang Wensheng (5):
RDMA/qedr: Fix error return code in qedr_iw_connect()
IB/hfi1: Fix error return code in parse_platform_config()
RDMA/bnxt_re: Fix error return code in bnxt_qplib_cq_process_terminal()
RDMA/srpt: Fix error return code in srpt_cm_req_recv()
mm/sparse: add the missing sparse_buffer_fini() in error branch
Wanpeng Li (5):
KVM: LAPIC: Accurately guarantee busy wait for timer to expire when using hv_timer
context_tracking: Move guest exit context tracking to separate helpers
context_tracking: Move guest exit vtime accounting to separate helpers
KVM: x86: Defer vtime accounting 'til after IRQ handling
KVM: X86: Fix vCPU preempted state from guest's point of view
Wayne Lin (2):
drm/dp_mst: Revise broadcast msg lct & lcr
drm/dp_mst: Set CLEAR_PAYLOAD_ID_TABLE as broadcast
Wei Yongjun (7):
spi: dln2: Fix reference leak to master
spi: omap-100k: Fix reference leak to master
usb: typec: tps6598x: Fix return value check in tps6598x_probe()
usb: typec: stusb160x: fix return value check in stusb160x_probe()
clocksource/drivers/ingenic_ost: Fix return value check in ingenic_ost_probe()
spi: spi-zynqmp-gqspi: Fix missing unlock on error in zynqmp_qspi_exec_op()
media: m88ds3103: fix return value check in m88ds3103_probe()
Wen Gong (6):
mac80211: extend protection against mixed key and fragment cache attacks
ath10k: add CCMP PN replay protection for fragmented frames for PCIe
ath10k: drop fragments with multicast DA for PCIe
ath10k: drop fragments with multicast DA for SDIO
ath10k: drop MPDU which has discard flag set by firmware for SDIO
ath10k: Fix TKIP Michael MIC verification for PCIe
Wengang Wang (1):
ocfs2: fix deadlock between setattr and dio_end_io_write
Werner Sembach (1):
drm/amd/display: Try YCbCr420 color when YCbCr444 fails
Wesley Cheng (2):
usb: dwc3: gadget: Ignore EP queue requests during bus reset
usb: dwc3: gadget: Return success always for kick transfer in ep queue
William A. Kennington III (1):
spi: Fix use-after-free with devm_spi_alloc_*
William Roche (1):
RAS/CEC: Correct ce_add_elem()'s returned values
Wolfram Sang (4):
i2c: turn recovery error on init to debug
i2c: rcar: make sure irq is not threaded on Gen2 and earlier
i2c: rcar: protect against supurious interrupts on V3U
i2c: bail out early when RDWR parameters are wrong
Wong Vee Khee (1):
ethtool: fix incorrect datatype in set_eee ops
Wu Bo (2):
nvmet: fix memory leak in nvmet_alloc_ctrl()
nvme-loop: fix memory leak in nvme_loop_create_ctrl()
Xiang Chen (2):
mtd: spi-nor: core: Fix an issue of releasing resources during read/write
iommu: Fix a boundary issue to avoid performance drop
Xiao Ni (1):
async_xor: increase src_offs when dropping destination page
Xiaogang Chen (1):
drm/amdgpu/display: buffer INTERRUPT_LOW_IRQ_CONTEXT interrupt work
Xiaoming Ni (4):
nfc: fix refcount leak in llcp_sock_bind()
nfc: fix refcount leak in llcp_sock_connect()
nfc: fix memory leak in llcp_sock_connect()
nfc: Avoid endless loops caused by repeated llcp_sock_connect()
Xie He (2):
Revert "drivers/net/wan/hdlc_fr: Fix a double free in pvc_xmit"
net: lapbether: Prevent racing when checking whether the netif is running
Xie Yongji (1):
vhost-vdpa: protect concurrent access to vhost device iotlb
Xin Long (9):
esp: delete NETIF_F_SCTP_CRC bit from features for esp offload
tipc: increment the tmp aead refcnt before attaching it
xfrm: BEET mode doesn't support fragments for inner packets
Revert "net/sctp: fix race condition in sctp_destroy_sock"
sctp: delay auto_asconf init until binding the first addr
sctp: do asoc update earlier in sctp_sf_do_dupcook_a
sctp: fix a SCTP_MIB_CURRESTAB leak in sctp_sf_do_dupcook_b
tipc: wait and exit until all work queues are done
tipc: skb_linearize the head skb when reassembling msgs
Xingui Yang (1):
ata: ahci: Disable SXS for Hisilicon Kunpeng920
Xu Yihang (1):
ext4: fix error return code in ext4_fc_perform_commit()
Xu Yilun (1):
mfd: intel-m10-bmc: Fix the register access range
Xuan Zhuo (1):
xsk: Fix for xp_aligned_validate_desc() when len == chunk_size
Yang Yang (1):
jffs2: check the validity of dstlen in jffs2_zlib_compress()
Yang Yingliang (15):
usb: gadget: tegra-xudc: Fix possible use-after-free in tegra_xudc_remove()
phy: phy-twl4030-usb: Fix possible use-after-free in twl4030_usb_remove()
power: supply: generic-adc-battery: fix possible use-after-free in gab_remove()
power: supply: s3c_adc_battery: fix possible use-after-free in s3c_adc_bat_remove()
media: tc358743: fix possible use-after-free in tc358743_remove()
media: adv7604: fix possible use-after-free in adv76xx_remove()
media: i2c: adv7511-v4l2: fix possible use-after-free in adv7511_remove()
media: i2c: tda1997: Fix possible use-after-free in tda1997x_remove()
media: i2c: adv7842: fix possible use-after-free in adv7842_remove()
USB: gadget: udc: fix wrong pointer passed to IS_ERR() and PTR_ERR()
spi: fsl: add missing iounmap() on error in of_fsl_spi_probe()
media: omap4iss: return error code when omap4iss_get() failed
net/tipc: fix missing destroy_workqueue() on error in tipc_crypto_start()
PCI: endpoint: Fix missing destroy_workqueue()
tools/testing/selftests/exec: fix link error
Yangbo Lu (1):
ptp_qoriq: fix overflow in ptp_qoriq_adjfine() u64 calcalation
Yannick Vignon (1):
net: stmmac: Do not enable RX FIFO overflow interrupts
Yaqi Chen (1):
samples/bpf: Fix broken tracex1 due to kprobe argument change
Ye Bin (2):
ext4: fix ext4_error_err save negative errno into superblock
usbip: vudc: fix missing unlock on error in usbip_sockfd_store()
Yi Li (1):
drm/amdgpu: Fix GPU TLB update error when PAGE_SIZE > AMDGPU_PAGE_SIZE
Yi Zhuang (1):
f2fs: Fix a hungtask problem in atomic write
Yingjie Wang (1):
drm/radeon: Fix a missing check bug in radeon_dp_mst_detect()
Yinjun Zhang (3):
nfp: flower: ignore duplicate merge hints from FW
nfp: devlink: initialize the devlink port attribute "lanes"
bpf, offload: Reorder offload callback 'prepare' in verifier
Yonghong Song (3):
bpf, x86: Use kvmalloc_array instead kmalloc_array in bpf_jit_comp
bpf: Permits pointers on stack for helper calls
selftests: Set CC to clang in lib.mk if LLVM is set
Yongxin Liu (2):
ice: fix memory leak of aRFS after resuming from suspend
ixgbe: fix unbalanced device enable/disable in suspend/resume
Yoshihiro Shimoda (5):
ARM: dts: renesas: Add mmc aliases into R-Car Gen2 board dts files
arm64: dts: renesas: Add mmc aliases into board dts files
arm64: dts: renesas: r8a779a0: Fix PMU interrupt
net: renesas: ravb: Fix a stuck issue when a lot of frames are received
usb: gadget: udc: renesas_usb3: Fix a race in usb3_start_pipen()
Yu Chen (1):
usb: dwc3: core: Do core softreset when switch mode
Yuanyuan Zhong (1):
pinctrl: lewisburg: Update number of pins in community
YueHaibing (2):
PM: runtime: Replace inline function pm_runtime_callbacks_present()
iio: adc: ad7793: Add missing error code in ad7793_setup()
Yufen Yu (1):
block: only update parent bi_status when bio fail
Yufeng Mo (3):
net: hns3: fix incorrect configuration for igu_egu_hw_err
net: hns3: initialize the message content in hclge_get_link_mode()
net: hns3: disable phy loopback setting in hclge_mac_start_phy
Yunjian Wang (2):
net: cls_api: Fix uninitialised struct field bo->unlocked_driver_cb
i40e: Fix use-after-free in i40e_client_subtask()
Yunsheng Lin (5):
net: hns3: add handling for xmit skb with recursive fraglist
net: sched: fix packet stuck problem for lockless qdisc
net: sched: fix tx action rescheduling issue during deactivation
net: sched: fix tx action reschedule issue with stopped queue
net: hns3: check the return of skb_checksum_help()
Zhang Xiaoxu (1):
NFSv4: Fix v4.0/v4.1 SEEK_DATA return -ENOTSUPP when set NFS_V4_2 config
Zhang Yi (2):
ext4: fix check to prevent false positive report of incorrect used inodes
ext4: do not set SB_ACTIVE in ext4_orphan_cleanup()
Zhang Zhengming (1):
bridge: Fix possible races between assigning rx_handler_data and setting IFF_BRIDGE_PORT bit
Zhao Heming (1):
md: md_open returns -EBUSY when entering racing area
Zhen Lei (9):
perf map: Fix error return code in maps__clone()
perf data: Fix error return code in perf_data__create_dir()
iommu/arm-smmu-v3: add bit field SFM into GERROR_ERR_MASK
tpm: fix error return code in tpm2_get_cc_attrs_tbl()
ARM: 9064/1: hw_breakpoint: Do not directly check the event's overflow_handler hook
xen/unpopulated-alloc: fix error return code in fill_list()
dt-bindings: serial: 8250: Remove duplicated compatible strings
scsi: qla2xxx: Fix error return code in qla82xx_write_flash_dword()
net: bnx2: Fix error return code in bnx2_init_board()
Zheng Yongjun (1):
net: openvswitch: conntrack: simplify the return expression of ovs_ct_limit_get_default_limit()
Zheyu Ma (1):
serial: rp2: use 'request_firmware' instead of 'request_firmware_nowait'
Zhouyi Zhou (1):
rcu: Remove spurious instrumentation_end() in rcu_nmi_enter()
Zhu Lingshan (1):
Revert "irqbypass: do not start cons/prod when failed connect"
Zihao Yu (1):
riscv,entry: fix misaligned base for excp_vect_table
Zolton Jheng (1):
USB: serial: pl2303: add device id for ADLINK ND-6530 GC
Zou Wei (2):
gpio: cadence: Add missing MODULE_DEVICE_TABLE
interconnect: qcom: Add missing MODULE_DEVICE_TABLE
Zqiang (3):
workqueue: Move the position of debug_work_activate() in __queue_work()
lib: stackdepot: turn depot_lock spinlock to raw_spinlock
locking/mutex: clear MUTEX_FLAGS if wait_list is empty due to signal
brian-sy yang (1):
thermal/drivers/cpufreq_cooling: Fix slab OOB issue
dillon min (1):
dt-bindings: serial: stm32: Use 'type: object' instead of false for 'additionalProperties'
dongjian (1):
power: supply: Use IRQF_ONESHOT
gexueyuan (1):
memory: pl353: fix mask of ECC page_size config register
karthik alapati (1):
staging: wimax/i2400m: fix byte-order issue
kernel test robot (2):
of: overlay: fix for_each_child.cocci warnings
ALSA: usb-audio: scarlett2: snd_scarlett_gen2_controls_create() can be static
lizhe (1):
jffs2: Fix kasan slab-out-of-bounds problem
louis.wang (1):
ARM: 9066/1: ftrace: pause/unpause function graph tracer in cpu_suspend()
mark-yw.chen (1):
Bluetooth: btusb: Enable quirk boolean flag for Mediatek Chip.
shaoyunl (1):
drm/amdgpu : Fix asic reset regression issue introduce by 8f211fe8ac7c4f
wenxu (1):
net/mlx5e: fix ingress_ifindex check in mlx5e_flower_parse_meta
xinhui pan (1):
drm/amdgpu: Fix a use-after-free
yangerkun (1):
block: reexpand iov_iter after read/write
zhouchuangao (1):
fs/nfs: Use fatal_signal_pending instead of signal_pending
Álvaro Fernández Rojas (3):
mtd: rawnand: brcmnand: fix OOB R/W with Hamming ECC
gpio: guard gpiochip_irqchip_add_domain() with GPIOLIB_IRQCHIP
mips: bmips: fix syscon-reboot nodes
Íñigo Huguet (1):
net:CXGB4: fix leak if sk_buff is not used
周琰杰 (Zhou Yanjie) (1):
I2C: JZ4780: Fix bug for Ingenic X1000.
BUG=None
TEST=tryjob, validation and K8s e2e
RELEASE_NOTE=Updated the Linux kernel to v5.10.42.
Signed-off-by: Oleksandr Tymoshenko <ovt@google.com>
Change-Id: I9b9de1782f69d84d170525465fd19af43a7a9160
diff --git a/Documentation/block/queue-sysfs.rst b/Documentation/block/queue-sysfs.rst
index 2638d34..4874c52 100644
--- a/Documentation/block/queue-sysfs.rst
+++ b/Documentation/block/queue-sysfs.rst
@@ -273,4 +273,13 @@
do not support zone commands, they will be treated as regular block devices
and zoned will report "none".
+split_alignment (RW)
+----------------------
+This is the alignment, in bytes, at which the request queue is
+allowed to split IO requests. Once this value is set, the request
+queue splits IOs such that the individual IOs are aligned to
+split_alignment. A value of 0 indicates that an IO request can be
+split anywhere. This value must be a power of 2 and greater than or
+equal to 512.
+
Jens Axboe <jens.axboe@oracle.com>, February 2009
diff --git a/Documentation/filesystems/ext4/journal.rst b/Documentation/filesystems/ext4/journal.rst
index 849d5b1..cdbfec4 100644
--- a/Documentation/filesystems/ext4/journal.rst
+++ b/Documentation/filesystems/ext4/journal.rst
@@ -681,3 +681,53 @@
- Stores the TID of the commit, CRC of the fast commit of which this tag
represents the end of
+Fast Commit Replay Idempotence
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Fast commit tags are idempotent in nature, provided the recovery code follows
+certain rules. The guiding principle that the commit path follows while
+committing is that it stores the result of a particular operation instead of
+storing the procedure.
+
+Let's consider this rename operation: 'mv /a /b'. Let's assume dirent '/a'
+was associated with inode 10. During fast commit, instead of storing this
+operation as a procedure "rename a to b", we store the resulting file system
+state as a "series" of outcomes:
+
+- Link dirent b to inode 10
+- Unlink dirent a
+- Inode 10 with valid refcount
+
+Now when the recovery code runs, it needs to "enforce" this state on the file
+system. This is what guarantees idempotence of fast commit replay.
+
+Let's take an example of a procedure that is not idempotent and see how fast
+commits make it idempotent. Consider the following sequence of operations:
+
+1) rm A
+2) mv B A
+3) read A
+
+If we store this sequence of operations as-is, the replay is not idempotent.
+Say that during replay we crash after (2). During the second replay, file A
+(which was actually created as a result of the "mv B A" operation) would get
+deleted. Thus, a file named A would be absent when we try to read A, so this
+sequence of operations is not idempotent. However, as mentioned above, instead
+of storing the procedure, fast commits store the outcome of each procedure.
+Thus the fast commit log for the above procedure would be as follows:
+
+(Let's assume dirent A was linked to inode 10 and dirent B was linked to
+inode 11 before the replay)
+
+1) Unlink A
+2) Link A to inode 11
+3) Unlink B
+4) Inode 11
+
+If we crash after (3), we will have file A linked to inode 11. During the
+second replay, we will remove file A (inode 11), but we will recreate it and
+make it point to inode 11. We won't find B, so we'll simply skip that step.
+At this point the refcount for inode 11 is not reliable, but that gets fixed
+by the replay of the last inode 11 tag. Thus, by converting a non-idempotent
+procedure into a series of idempotent outcomes, fast commits ensure
+idempotence during the replay.
diff --git a/Documentation/networking/filter.rst b/Documentation/networking/filter.rst
index debb59e..1583d59 100644
--- a/Documentation/networking/filter.rst
+++ b/Documentation/networking/filter.rst
@@ -1006,13 +1006,13 @@
Mode modifier is one of::
- BPF_IMM 0x00 /* used for 32-bit mov in classic BPF and 64-bit in eBPF */
- BPF_ABS 0x20
- BPF_IND 0x40
- BPF_MEM 0x60
- BPF_LEN 0x80 /* classic BPF only, reserved in eBPF */
- BPF_MSH 0xa0 /* classic BPF only, reserved in eBPF */
- BPF_XADD 0xc0 /* eBPF only, exclusive add */
+ BPF_IMM 0x00 /* used for 32-bit mov in classic BPF and 64-bit in eBPF */
+ BPF_ABS 0x20
+ BPF_IND 0x40
+ BPF_MEM 0x60
+ BPF_LEN 0x80 /* classic BPF only, reserved in eBPF */
+ BPF_MSH 0xa0 /* classic BPF only, reserved in eBPF */
+ BPF_ATOMIC 0xc0 /* eBPF only, atomic operations */
eBPF has two non-generic instructions: (BPF_ABS | <size> | BPF_LD) and
(BPF_IND | <size> | BPF_LD) which are used to access packet data.
@@ -1044,11 +1044,19 @@
BPF_MEM | <size> | BPF_STX: *(size *) (dst_reg + off) = src_reg
BPF_MEM | <size> | BPF_ST: *(size *) (dst_reg + off) = imm32
BPF_MEM | <size> | BPF_LDX: dst_reg = *(size *) (src_reg + off)
- BPF_XADD | BPF_W | BPF_STX: lock xadd *(u32 *)(dst_reg + off16) += src_reg
- BPF_XADD | BPF_DW | BPF_STX: lock xadd *(u64 *)(dst_reg + off16) += src_reg
-Where size is one of: BPF_B or BPF_H or BPF_W or BPF_DW. Note that 1 and
-2 byte atomic increments are not supported.
+Where size is one of: BPF_B or BPF_H or BPF_W or BPF_DW.
+
+It also includes atomic operations, which use the immediate field for extra
+encoding::
+
+ .imm = BPF_ADD, .code = BPF_ATOMIC | BPF_W | BPF_STX: lock xadd *(u32 *)(dst_reg + off16) += src_reg
+ .imm = BPF_ADD, .code = BPF_ATOMIC | BPF_DW | BPF_STX: lock xadd *(u64 *)(dst_reg + off16) += src_reg
+
+Note that 1 and 2 byte atomic operations are not supported.
+
+You may encounter BPF_XADD - this is a legacy name for BPF_ATOMIC, referring to
+the exclusive-add operation encoded when the immediate field is zero.
eBPF has one 16-byte instruction: BPF_LD | BPF_DW | BPF_IMM which consists
of two consecutive ``struct bpf_insn`` 8-byte blocks and interpreted as single
diff --git a/PRESUBMIT.cfg b/PRESUBMIT.cfg
new file mode 100644
index 0000000..0d7269b
--- /dev/null
+++ b/PRESUBMIT.cfg
@@ -0,0 +1,11 @@
+[Hook Overrides]
+aosp_license_check: false
+cros_license_check: false
+long_line_check: false
+stray_whitespace_check: false
+tab_check: false
+tabbed_indent_required_check: false
+signoff_check: true
+
+# Make sure RELEASE_NOTE field is present.
+release_note_field_check: true
diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index 0207b6e..897634d 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -1620,10 +1620,9 @@
}
emit_str_r(dst_lo, tmp2, off, ctx, BPF_SIZE(code));
break;
- /* STX XADD: lock *(u32 *)(dst + off) += src */
- case BPF_STX | BPF_XADD | BPF_W:
- /* STX XADD: lock *(u64 *)(dst + off) += src */
- case BPF_STX | BPF_XADD | BPF_DW:
+ /* Atomic ops */
+ case BPF_STX | BPF_ATOMIC | BPF_W:
+ case BPF_STX | BPF_ATOMIC | BPF_DW:
goto notyet;
/* STX: *(size *)(dst + off) = src */
case BPF_STX | BPF_MEM | BPF_W:
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index ef9f1d5..f7b1948 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -875,10 +875,18 @@
}
break;
- /* STX XADD: lock *(u32 *)(dst + off) += src */
- case BPF_STX | BPF_XADD | BPF_W:
- /* STX XADD: lock *(u64 *)(dst + off) += src */
- case BPF_STX | BPF_XADD | BPF_DW:
+ case BPF_STX | BPF_ATOMIC | BPF_W:
+ case BPF_STX | BPF_ATOMIC | BPF_DW:
+ if (insn->imm != BPF_ADD) {
+ pr_err_once("unknown atomic op code %02x\n", insn->imm);
+ return -EINVAL;
+ }
+
+ /* STX XADD: lock *(u32 *)(dst + off) += src
+ * and
+ * STX XADD: lock *(u64 *)(dst + off) += src
+ */
+
if (!off) {
reg = dst;
} else {
diff --git a/arch/mips/net/ebpf_jit.c b/arch/mips/net/ebpf_jit.c
index 561154c..939dd06 100644
--- a/arch/mips/net/ebpf_jit.c
+++ b/arch/mips/net/ebpf_jit.c
@@ -1423,8 +1423,8 @@
case BPF_STX | BPF_H | BPF_MEM:
case BPF_STX | BPF_W | BPF_MEM:
case BPF_STX | BPF_DW | BPF_MEM:
- case BPF_STX | BPF_W | BPF_XADD:
- case BPF_STX | BPF_DW | BPF_XADD:
+ case BPF_STX | BPF_W | BPF_ATOMIC:
+ case BPF_STX | BPF_DW | BPF_ATOMIC:
if (insn->dst_reg == BPF_REG_10) {
ctx->flags |= EBPF_SEEN_FP;
dst = MIPS_R_SP;
@@ -1438,7 +1438,12 @@
src = ebpf_to_mips_reg(ctx, insn, src_reg_no_fp);
if (src < 0)
return src;
- if (BPF_MODE(insn->code) == BPF_XADD) {
+ if (BPF_MODE(insn->code) == BPF_ATOMIC) {
+ if (insn->imm != BPF_ADD) {
+ pr_err("ATOMIC OP %02x NOT HANDLED\n", insn->imm);
+ return -EINVAL;
+ }
+
/*
* If mem_off does not fit within the 9 bit ll/sc
* instruction immediate field, use a temp reg.
diff --git a/arch/powerpc/net/bpf_jit_comp64.c b/arch/powerpc/net/bpf_jit_comp64.c
index 022103c..aaf1a88 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -683,10 +683,18 @@
break;
/*
- * BPF_STX XADD (atomic_add)
+ * BPF_STX ATOMIC (atomic ops)
*/
- /* *(u32 *)(dst + off) += src */
- case BPF_STX | BPF_XADD | BPF_W:
+ case BPF_STX | BPF_ATOMIC | BPF_W:
+ if (insn->imm != BPF_ADD) {
+ pr_err_ratelimited(
+ "eBPF filter atomic op code %02x (@%d) unsupported\n",
+ code, i);
+ return -ENOTSUPP;
+ }
+
+ /* *(u32 *)(dst + off) += src */
+
/* Get EA into TMP_REG_1 */
EMIT(PPC_RAW_ADDI(b2p[TMP_REG_1], dst_reg, off));
tmp_idx = ctx->idx * 4;
@@ -699,8 +707,15 @@
/* we're done if this succeeded */
PPC_BCC_SHORT(COND_NE, tmp_idx);
break;
- /* *(u64 *)(dst + off) += src */
- case BPF_STX | BPF_XADD | BPF_DW:
+ case BPF_STX | BPF_ATOMIC | BPF_DW:
+ if (insn->imm != BPF_ADD) {
+ pr_err_ratelimited(
+ "eBPF filter atomic op code %02x (@%d) unsupported\n",
+ code, i);
+ return -ENOTSUPP;
+ }
+ /* *(u64 *)(dst + off) += src */
+
EMIT(PPC_RAW_ADDI(b2p[TMP_REG_1], dst_reg, off));
tmp_idx = ctx->idx * 4;
EMIT(PPC_RAW_LDARX(b2p[TMP_REG_2], 0, b2p[TMP_REG_1], 0));
diff --git a/arch/riscv/net/bpf_jit_comp32.c b/arch/riscv/net/bpf_jit_comp32.c
index 579575f..81de865 100644
--- a/arch/riscv/net/bpf_jit_comp32.c
+++ b/arch/riscv/net/bpf_jit_comp32.c
@@ -881,7 +881,7 @@
const s8 *rd = bpf_get_reg64(dst, tmp1, ctx);
const s8 *rs = bpf_get_reg64(src, tmp2, ctx);
- if (mode == BPF_XADD && size != BPF_W)
+ if (mode == BPF_ATOMIC && size != BPF_W)
return -1;
emit_imm(RV_REG_T0, off, ctx);
@@ -899,7 +899,7 @@
case BPF_MEM:
emit(rv_sw(RV_REG_T0, 0, lo(rs)), ctx);
break;
- case BPF_XADD:
+ case BPF_ATOMIC: /* Only BPF_ADD supported */
emit(rv_amoadd_w(RV_REG_ZERO, lo(rs), RV_REG_T0, 0, 0),
ctx);
break;
@@ -1260,7 +1260,6 @@
case BPF_STX | BPF_MEM | BPF_H:
case BPF_STX | BPF_MEM | BPF_W:
case BPF_STX | BPF_MEM | BPF_DW:
- case BPF_STX | BPF_XADD | BPF_W:
if (BPF_CLASS(code) == BPF_ST) {
emit_imm32(tmp2, imm, ctx);
src = tmp2;
@@ -1271,8 +1270,21 @@
return -1;
break;
+ case BPF_STX | BPF_ATOMIC | BPF_W:
+ if (insn->imm != BPF_ADD) {
+ pr_info_once(
+ "bpf-jit: not supported: atomic operation %02x ***\n",
+ insn->imm);
+ return -EFAULT;
+ }
+
+ if (emit_store_r64(dst, src, off, ctx, BPF_SIZE(code),
+ BPF_MODE(code)))
+ return -1;
+ break;
+
/* No hardware support for 8-byte atomics in RV32. */
- case BPF_STX | BPF_XADD | BPF_DW:
+ case BPF_STX | BPF_ATOMIC | BPF_DW:
/* Fallthrough. */
notsupported:
diff --git a/arch/riscv/net/bpf_jit_comp64.c b/arch/riscv/net/bpf_jit_comp64.c
index 8a56b52..b44ff52 100644
--- a/arch/riscv/net/bpf_jit_comp64.c
+++ b/arch/riscv/net/bpf_jit_comp64.c
@@ -1027,10 +1027,18 @@
emit_add(RV_REG_T1, RV_REG_T1, rd, ctx);
emit_sd(RV_REG_T1, 0, rs, ctx);
break;
- /* STX XADD: lock *(u32 *)(dst + off) += src */
- case BPF_STX | BPF_XADD | BPF_W:
- /* STX XADD: lock *(u64 *)(dst + off) += src */
- case BPF_STX | BPF_XADD | BPF_DW:
+ case BPF_STX | BPF_ATOMIC | BPF_W:
+ case BPF_STX | BPF_ATOMIC | BPF_DW:
+ if (insn->imm != BPF_ADD) {
+ pr_err("bpf-jit: not supported: atomic operation %02x ***\n",
+ insn->imm);
+ return -EINVAL;
+ }
+
+ /* atomic_add: lock *(u32 *)(dst + off) += src
+ * atomic_add: lock *(u64 *)(dst + off) += src
+ */
+
if (off) {
if (is_12b_int(off)) {
emit_addi(RV_REG_T1, rd, off, ctx);
diff --git a/arch/s390/net/bpf_jit_comp.c b/arch/s390/net/bpf_jit_comp.c
index 0a41827..f973e2e 100644
--- a/arch/s390/net/bpf_jit_comp.c
+++ b/arch/s390/net/bpf_jit_comp.c
@@ -1205,18 +1205,23 @@
jit->seen |= SEEN_MEM;
break;
/*
- * BPF_STX XADD (atomic_add)
+ * BPF_ATOMIC
*/
- case BPF_STX | BPF_XADD | BPF_W: /* *(u32 *)(dst + off) += src */
- /* laal %w0,%src,off(%dst) */
- EMIT6_DISP_LH(0xeb000000, 0x00fa, REG_W0, src_reg,
- dst_reg, off);
- jit->seen |= SEEN_MEM;
- break;
- case BPF_STX | BPF_XADD | BPF_DW: /* *(u64 *)(dst + off) += src */
- /* laalg %w0,%src,off(%dst) */
- EMIT6_DISP_LH(0xeb000000, 0x00ea, REG_W0, src_reg,
- dst_reg, off);
+ case BPF_STX | BPF_ATOMIC | BPF_DW:
+ case BPF_STX | BPF_ATOMIC | BPF_W:
+ if (insn->imm != BPF_ADD) {
+ pr_err("Unknown atomic operation %02x\n", insn->imm);
+ return -1;
+ }
+
+ /* *(u32/u64 *)(dst + off) += src
+ *
+ * BPF_W: laal %w0,%src,off(%dst)
+ * BPF_DW: laalg %w0,%src,off(%dst)
+ */
+ EMIT6_DISP_LH(0xeb000000,
+ BPF_SIZE(insn->code) == BPF_W ? 0x00fa : 0x00ea,
+ REG_W0, src_reg, dst_reg, off);
jit->seen |= SEEN_MEM;
break;
/*
diff --git a/arch/sparc/net/bpf_jit_comp_64.c b/arch/sparc/net/bpf_jit_comp_64.c
index 3364e2a..4b8d3c6 100644
--- a/arch/sparc/net/bpf_jit_comp_64.c
+++ b/arch/sparc/net/bpf_jit_comp_64.c
@@ -1366,12 +1366,18 @@
break;
}
- /* STX XADD: lock *(u32 *)(dst + off) += src */
- case BPF_STX | BPF_XADD | BPF_W: {
+ case BPF_STX | BPF_ATOMIC | BPF_W: {
const u8 tmp = bpf2sparc[TMP_REG_1];
const u8 tmp2 = bpf2sparc[TMP_REG_2];
const u8 tmp3 = bpf2sparc[TMP_REG_3];
+ if (insn->imm != BPF_ADD) {
+ pr_err_once("unknown atomic op %02x\n", insn->imm);
+ return -EINVAL;
+ }
+
+ /* lock *(u32 *)(dst + off) += src */
+
if (insn->dst_reg == BPF_REG_FP)
ctx->saw_frame_pointer = true;
@@ -1390,11 +1396,16 @@
break;
}
/* STX XADD: lock *(u64 *)(dst + off) += src */
- case BPF_STX | BPF_XADD | BPF_DW: {
+ case BPF_STX | BPF_ATOMIC | BPF_DW: {
const u8 tmp = bpf2sparc[TMP_REG_1];
const u8 tmp2 = bpf2sparc[TMP_REG_2];
const u8 tmp3 = bpf2sparc[TMP_REG_3];
+ if (insn->imm != BPF_ADD) {
+ pr_err_once("unknown atomic op %02x\n", insn->imm);
+ return -EINVAL;
+ }
+
if (insn->dst_reg == BPF_REG_FP)
ctx->saw_frame_pointer = true;
diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index a11796b..d5b0bed 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -205,6 +205,18 @@
return byte + reg2hex[dst_reg] + (reg2hex[src_reg] << 3);
}
+/* Some 1-byte opcodes for binary ALU operations */
+static u8 simple_alu_opcodes[] = {
+ [BPF_ADD] = 0x01,
+ [BPF_SUB] = 0x29,
+ [BPF_AND] = 0x21,
+ [BPF_OR] = 0x09,
+ [BPF_XOR] = 0x31,
+ [BPF_LSH] = 0xE0,
+ [BPF_RSH] = 0xE8,
+ [BPF_ARSH] = 0xF8,
+};
+
static void jit_fill_hole(void *area, unsigned int size)
{
/* Fill whole space with INT3 instructions */
@@ -681,6 +693,42 @@
*pprog = prog;
}
+/* Emit the suffix (ModR/M etc) for addressing *(ptr_reg + off) and val_reg */
+static void emit_insn_suffix(u8 **pprog, u32 ptr_reg, u32 val_reg, int off)
+{
+ u8 *prog = *pprog;
+ int cnt = 0;
+
+ if (is_imm8(off)) {
+ /* 1-byte signed displacement.
+ *
+ * If off == 0 we could skip this and save one extra byte, but
+ * special case of x86 R13 which always needs an offset is not
+ * worth the hassle
+ */
+ EMIT2(add_2reg(0x40, ptr_reg, val_reg), off);
+ } else {
+ /* 4-byte signed displacement */
+ EMIT1_off32(add_2reg(0x80, ptr_reg, val_reg), off);
+ }
+ *pprog = prog;
+}
+
+/*
+ * Emit a REX byte if it will be necessary to address these registers
+ */
+static void maybe_emit_mod(u8 **pprog, u32 dst_reg, u32 src_reg, bool is64)
+{
+ u8 *prog = *pprog;
+ int cnt = 0;
+
+ if (is64)
+ EMIT1(add_2mod(0x48, dst_reg, src_reg));
+ else if (is_ereg(dst_reg) || is_ereg(src_reg))
+ EMIT1(add_2mod(0x40, dst_reg, src_reg));
+ *pprog = prog;
+}
+
/* LDX: dst_reg = *(u8*)(src_reg + off) */
static void emit_ldx(u8 **pprog, u32 size, u32 dst_reg, u32 src_reg, int off)
{
@@ -708,15 +756,7 @@
EMIT2(add_2mod(0x48, src_reg, dst_reg), 0x8B);
break;
}
- /*
- * If insn->off == 0 we can save one extra byte, but
- * special case of x86 R13 which always needs an offset
- * is not worth the hassle
- */
- if (is_imm8(off))
- EMIT2(add_2reg(0x40, src_reg, dst_reg), off);
- else
- EMIT1_off32(add_2reg(0x80, src_reg, dst_reg), off);
+ emit_insn_suffix(&prog, src_reg, dst_reg, off);
*pprog = prog;
}
@@ -751,13 +791,53 @@
EMIT2(add_2mod(0x48, dst_reg, src_reg), 0x89);
break;
}
- if (is_imm8(off))
- EMIT2(add_2reg(0x40, dst_reg, src_reg), off);
- else
- EMIT1_off32(add_2reg(0x80, dst_reg, src_reg), off);
+ emit_insn_suffix(&prog, dst_reg, src_reg, off);
*pprog = prog;
}
+static int emit_atomic(u8 **pprog, u8 atomic_op,
+ u32 dst_reg, u32 src_reg, s16 off, u8 bpf_size)
+{
+ u8 *prog = *pprog;
+ int cnt = 0;
+
+ EMIT1(0xF0); /* lock prefix */
+
+ maybe_emit_mod(&prog, dst_reg, src_reg, bpf_size == BPF_DW);
+
+ /* emit opcode */
+ switch (atomic_op) {
+ case BPF_ADD:
+ case BPF_SUB:
+ case BPF_AND:
+ case BPF_OR:
+ case BPF_XOR:
+ /* lock *(u32/u64*)(dst_reg + off) <op>= src_reg */
+ EMIT1(simple_alu_opcodes[atomic_op]);
+ break;
+ case BPF_ADD | BPF_FETCH:
+ /* src_reg = atomic_fetch_add(dst_reg + off, src_reg); */
+ EMIT2(0x0F, 0xC1);
+ break;
+ case BPF_XCHG:
+ /* src_reg = atomic_xchg(dst_reg + off, src_reg); */
+ EMIT1(0x87);
+ break;
+ case BPF_CMPXCHG:
+ /* r0 = atomic_cmpxchg(dst_reg + off, r0, src_reg); */
+ EMIT2(0x0F, 0xB1);
+ break;
+ default:
+ pr_err("bpf_jit: unknown atomic opcode %02x\n", atomic_op);
+ return -EFAULT;
+ }
+
+ emit_insn_suffix(&prog, dst_reg, src_reg, off);
+
+ *pprog = prog;
+ return 0;
+}
+
static bool ex_handler_bpf(const struct exception_table_entry *x,
struct pt_regs *regs, int trapnr,
unsigned long error_code, unsigned long fault_addr)
@@ -802,6 +882,7 @@
int i, cnt = 0, excnt = 0;
int proglen = 0;
u8 *prog = temp;
+ int err;
detect_reg_usage(insn, insn_cnt, callee_regs_used,
&tail_call_seen);
@@ -837,17 +918,9 @@
case BPF_ALU64 | BPF_AND | BPF_X:
case BPF_ALU64 | BPF_OR | BPF_X:
case BPF_ALU64 | BPF_XOR | BPF_X:
- switch (BPF_OP(insn->code)) {
- case BPF_ADD: b2 = 0x01; break;
- case BPF_SUB: b2 = 0x29; break;
- case BPF_AND: b2 = 0x21; break;
- case BPF_OR: b2 = 0x09; break;
- case BPF_XOR: b2 = 0x31; break;
- }
- if (BPF_CLASS(insn->code) == BPF_ALU64)
- EMIT1(add_2mod(0x48, dst_reg, src_reg));
- else if (is_ereg(dst_reg) || is_ereg(src_reg))
- EMIT1(add_2mod(0x40, dst_reg, src_reg));
+ maybe_emit_mod(&prog, dst_reg, src_reg,
+ BPF_CLASS(insn->code) == BPF_ALU64);
+ b2 = simple_alu_opcodes[BPF_OP(insn->code)];
EMIT2(b2, add_2reg(0xC0, dst_reg, src_reg));
break;
@@ -1027,12 +1100,7 @@
else if (is_ereg(dst_reg))
EMIT1(add_1mod(0x40, dst_reg));
- switch (BPF_OP(insn->code)) {
- case BPF_LSH: b3 = 0xE0; break;
- case BPF_RSH: b3 = 0xE8; break;
- case BPF_ARSH: b3 = 0xF8; break;
- }
-
+ b3 = simple_alu_opcodes[BPF_OP(insn->code)];
if (imm32 == 1)
EMIT2(0xD1, add_1reg(b3, dst_reg));
else
@@ -1066,11 +1134,7 @@
else if (is_ereg(dst_reg))
EMIT1(add_1mod(0x40, dst_reg));
- switch (BPF_OP(insn->code)) {
- case BPF_LSH: b3 = 0xE0; break;
- case BPF_RSH: b3 = 0xE8; break;
- case BPF_ARSH: b3 = 0xF8; break;
- }
+ b3 = simple_alu_opcodes[BPF_OP(insn->code)];
EMIT2(0xD3, add_1reg(b3, dst_reg));
if (src_reg != BPF_REG_4)
@@ -1230,21 +1294,60 @@
}
break;
- /* STX XADD: lock *(u32*)(dst_reg + off) += src_reg */
- case BPF_STX | BPF_XADD | BPF_W:
- /* Emit 'lock add dword ptr [rax + off], eax' */
- if (is_ereg(dst_reg) || is_ereg(src_reg))
- EMIT3(0xF0, add_2mod(0x40, dst_reg, src_reg), 0x01);
- else
- EMIT2(0xF0, 0x01);
- goto xadd;
- case BPF_STX | BPF_XADD | BPF_DW:
- EMIT3(0xF0, add_2mod(0x48, dst_reg, src_reg), 0x01);
-xadd: if (is_imm8(insn->off))
- EMIT2(add_2reg(0x40, dst_reg, src_reg), insn->off);
- else
- EMIT1_off32(add_2reg(0x80, dst_reg, src_reg),
- insn->off);
+ case BPF_STX | BPF_ATOMIC | BPF_W:
+ case BPF_STX | BPF_ATOMIC | BPF_DW:
+ if (insn->imm == (BPF_AND | BPF_FETCH) ||
+ insn->imm == (BPF_OR | BPF_FETCH) ||
+ insn->imm == (BPF_XOR | BPF_FETCH)) {
+ u8 *branch_target;
+ bool is64 = BPF_SIZE(insn->code) == BPF_DW;
+ u32 real_src_reg = src_reg;
+
+ /*
+ * Can't be implemented with a single x86 insn.
+ * Need to do a CMPXCHG loop.
+ */
+
+ /* Will need RAX as a CMPXCHG operand so save R0 */
+ emit_mov_reg(&prog, true, BPF_REG_AX, BPF_REG_0);
+ if (src_reg == BPF_REG_0)
+ real_src_reg = BPF_REG_AX;
+
+ branch_target = prog;
+ /* Load old value */
+ emit_ldx(&prog, BPF_SIZE(insn->code),
+ BPF_REG_0, dst_reg, insn->off);
+ /*
+ * Perform the (commutative) operation locally,
+ * put the result in the AUX_REG.
+ */
+ emit_mov_reg(&prog, is64, AUX_REG, BPF_REG_0);
+ maybe_emit_mod(&prog, AUX_REG, real_src_reg, is64);
+ EMIT2(simple_alu_opcodes[BPF_OP(insn->imm)],
+ add_2reg(0xC0, AUX_REG, real_src_reg));
+ /* Attempt to swap in new value */
+ err = emit_atomic(&prog, BPF_CMPXCHG,
+ dst_reg, AUX_REG, insn->off,
+ BPF_SIZE(insn->code));
+ if (WARN_ON(err))
+ return err;
+ /*
+ * ZF tells us whether we won the race. If it's
+ * cleared we need to try again.
+ */
+ EMIT2(X86_JNE, -(prog - branch_target) - 2);
+ /* Return the pre-modification value */
+ emit_mov_reg(&prog, is64, real_src_reg, BPF_REG_0);
+ /* Restore R0 after clobbering RAX */
+ emit_mov_reg(&prog, true, BPF_REG_0, BPF_REG_AX);
+ break;
+
+ }
+
+ err = emit_atomic(&prog, insn->imm, dst_reg, src_reg,
+ insn->off, BPF_SIZE(insn->code));
+ if (err)
+ return err;
break;
/* call */
@@ -1295,20 +1398,16 @@
case BPF_JMP32 | BPF_JSGE | BPF_X:
case BPF_JMP32 | BPF_JSLE | BPF_X:
/* cmp dst_reg, src_reg */
- if (BPF_CLASS(insn->code) == BPF_JMP)
- EMIT1(add_2mod(0x48, dst_reg, src_reg));
- else if (is_ereg(dst_reg) || is_ereg(src_reg))
- EMIT1(add_2mod(0x40, dst_reg, src_reg));
+ maybe_emit_mod(&prog, dst_reg, src_reg,
+ BPF_CLASS(insn->code) == BPF_JMP);
EMIT2(0x39, add_2reg(0xC0, dst_reg, src_reg));
goto emit_cond_jmp;
case BPF_JMP | BPF_JSET | BPF_X:
case BPF_JMP32 | BPF_JSET | BPF_X:
/* test dst_reg, src_reg */
- if (BPF_CLASS(insn->code) == BPF_JMP)
- EMIT1(add_2mod(0x48, dst_reg, src_reg));
- else if (is_ereg(dst_reg) || is_ereg(src_reg))
- EMIT1(add_2mod(0x40, dst_reg, src_reg));
+ maybe_emit_mod(&prog, dst_reg, src_reg,
+ BPF_CLASS(insn->code) == BPF_JMP);
EMIT2(0x85, add_2reg(0xC0, dst_reg, src_reg));
goto emit_cond_jmp;
@@ -1344,10 +1443,8 @@
case BPF_JMP32 | BPF_JSLE | BPF_K:
/* test dst_reg, dst_reg to save one extra byte */
if (imm32 == 0) {
- if (BPF_CLASS(insn->code) == BPF_JMP)
- EMIT1(add_2mod(0x48, dst_reg, dst_reg));
- else if (is_ereg(dst_reg))
- EMIT1(add_2mod(0x40, dst_reg, dst_reg));
+ maybe_emit_mod(&prog, dst_reg, dst_reg,
+ BPF_CLASS(insn->code) == BPF_JMP);
EMIT2(0x85, add_2reg(0xC0, dst_reg, dst_reg));
goto emit_cond_jmp;
}
diff --git a/arch/x86/net/bpf_jit_comp32.c b/arch/x86/net/bpf_jit_comp32.c
index 2cf4d21..6a99def 100644
--- a/arch/x86/net/bpf_jit_comp32.c
+++ b/arch/x86/net/bpf_jit_comp32.c
@@ -2243,10 +2243,8 @@
return -EFAULT;
}
break;
- /* STX XADD: lock *(u32 *)(dst + off) += src */
- case BPF_STX | BPF_XADD | BPF_W:
- /* STX XADD: lock *(u64 *)(dst + off) += src */
- case BPF_STX | BPF_XADD | BPF_DW:
+ case BPF_STX | BPF_ATOMIC | BPF_W:
+ case BPF_STX | BPF_ATOMIC | BPF_DW:
goto notyet;
case BPF_JMP | BPF_EXIT:
if (seen_exit) {
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 7cdd566..9164e67 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -106,15 +106,18 @@
static struct bio *blk_bio_write_zeroes_split(struct request_queue *q,
struct bio *bio, struct bio_set *bs, unsigned *nsegs)
{
+ sector_t split;
+
*nsegs = 0;
- if (!q->limits.max_write_zeroes_sectors)
+ split = q->limits.max_write_zeroes_sectors;
+ if (split && q->split_alignment)
+ split = round_down(split, q->split_alignment);
+
+ if (!split || bio_sectors(bio) <= split)
return NULL;
- if (bio_sectors(bio) <= q->limits.max_write_zeroes_sectors)
- return NULL;
-
- return bio_split(bio, q->limits.max_write_zeroes_sectors, GFP_NOIO, bs);
+ return bio_split(bio, split, GFP_NOIO, bs);
}
static struct bio *blk_bio_write_same_split(struct request_queue *q,
@@ -122,15 +125,18 @@
struct bio_set *bs,
unsigned *nsegs)
{
+ sector_t split;
+
*nsegs = 1;
- if (!q->limits.max_write_same_sectors)
+ split = q->limits.max_write_same_sectors;
+ if (split && q->split_alignment)
+ split = round_down(split, q->split_alignment);
+
+ if (!split || bio_sectors(bio) <= split)
return NULL;
- if (bio_sectors(bio) <= q->limits.max_write_same_sectors)
- return NULL;
-
- return bio_split(bio, q->limits.max_write_same_sectors, GFP_NOIO, bs);
+ return bio_split(bio, split, GFP_NOIO, bs);
}
/*
@@ -249,7 +255,9 @@
{
struct bio_vec bv, bvprv, *bvprvp = NULL;
struct bvec_iter iter;
- unsigned nsegs = 0, sectors = 0;
+ unsigned int nsegs = 0, nsegs_aligned = 0;
+ unsigned int sectors = 0, sectors_aligned = 0, before = 0, after = 0;
+ unsigned int sector_alignment = q->split_alignment;
const unsigned max_sectors = get_max_io_size(q, bio);
const unsigned max_segs = queue_max_segments(q);
@@ -265,12 +273,31 @@
sectors + (bv.bv_len >> 9) <= max_sectors &&
bv.bv_offset + bv.bv_len <= PAGE_SIZE) {
nsegs++;
- sectors += bv.bv_len >> 9;
- } else if (bvec_split_segs(q, &bv, &nsegs, &sectors, max_segs,
- max_sectors)) {
- goto split;
+ before = round_down(sectors, sector_alignment);
+ sectors += (bv.bv_len >> 9);
+ after = round_down(sectors, sector_alignment);
+ if (sector_alignment && before != after) {
+ /* This is a valid split point */
+ nsegs_aligned = nsegs;
+ sectors_aligned = after;
+ }
+ goto next;
}
-
+ if (sector_alignment) {
+ before = round_down(sectors, sector_alignment);
+ after = round_down(sectors + (bv.bv_len >> 9),
+ sector_alignment);
+ if ((nsegs < max_segs) && before != after &&
+ ((after - before) << 9) + bv.bv_offset <= PAGE_SIZE
+ && after <= max_sectors) {
+ sectors_aligned = after;
+ nsegs_aligned = nsegs + 1;
+ }
+ }
+ if (bvec_split_segs(q, &bv, &nsegs, &sectors, max_segs,
+ max_sectors))
+ goto split;
+next:
bvprv = bv;
bvprvp = &bvprv;
}
@@ -279,7 +306,13 @@
return NULL;
split:
*segs = nsegs;
- return bio_split(bio, sectors, GFP_NOIO, bs);
+ if (sector_alignment && sectors_aligned == 0)
+ return NULL;
+
+ *segs = sector_alignment ? nsegs_aligned : nsegs;
+
+ return bio_split(bio, sector_alignment ? sectors_aligned : sectors,
+ GFP_NOIO, bs);
}
/**
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index b513f16..bdc214d 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -548,6 +548,33 @@
return queue_var_show(blk_queue_dax(q), page);
}
+static ssize_t queue_split_alignment_show(struct request_queue *q, char *page)
+{
+ return queue_var_show((q->split_alignment << 9), page);
+}
+
+static ssize_t queue_split_alignment_store(struct request_queue *q, const char *page,
+ size_t count)
+{
+ unsigned long split_alignment;
+ int ret;
+
+ ret = queue_var_store(&split_alignment, page, count);
+ if (ret < 0)
+ return ret;
+
+ /* split_alignment can only be a power of 2 */
+ if (split_alignment & (split_alignment - 1))
+ return -EINVAL;
+
+ /* ... and it must be at least 512 */
+ if (!(split_alignment >> 9))
+ return -EINVAL;
+
+ q->split_alignment = split_alignment >> 9;
+ return count;
+}
+
#define QUEUE_RO_ENTRY(_prefix, _name) \
static struct queue_sysfs_entry _prefix##_entry = { \
.attr = { .name = _name, .mode = 0444 }, \
@@ -600,6 +627,7 @@
QUEUE_RO_ENTRY(queue_dax, "dax");
QUEUE_RW_ENTRY(queue_io_timeout, "io_timeout");
QUEUE_RW_ENTRY(queue_wb_lat, "wbt_lat_usec");
+QUEUE_RW_ENTRY(queue_split_alignment, "split_alignment");
#ifdef CONFIG_BLK_DEV_THROTTLING_LOW
QUEUE_RW_ENTRY(blk_throtl_sample_time, "throttle_sample_time");
@@ -659,6 +687,7 @@
#ifdef CONFIG_BLK_DEV_THROTTLING_LOW
&blk_throtl_sample_time_entry.attr,
#endif
+ &queue_split_alignment_entry.attr,
NULL,
};
diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 8d70017..817fed4 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -59,6 +59,15 @@
rescue mode with init=/bin/sh, even when the /dev directory
on the rootfs is completely empty.
+config DEVTMPFS_SAFE
+ bool "Automount devtmpfs with nosuid/noexec"
+ depends on DEVTMPFS_MOUNT
+ default y
+ help
+ This instructs the kernel to automount devtmpfs with the
+ MS_NOEXEC and MS_NOSUID mount flags, which can prevent
+ certain kinds of code-execution attacks on embedded platforms.
+
config STANDALONE
bool "Select only drivers that don't need compile-time external firmware"
default y
diff --git a/drivers/base/devtmpfs.c b/drivers/base/devtmpfs.c
index a71d141..6bf82b3 100644
--- a/drivers/base/devtmpfs.c
+++ b/drivers/base/devtmpfs.c
@@ -353,6 +353,7 @@
int __init devtmpfs_mount(void)
{
int err;
+ int mflags = MS_SILENT;
if (!mount_dev)
return 0;
@@ -360,7 +361,10 @@
if (!thread)
return 0;
- err = init_mount("devtmpfs", "dev", "devtmpfs", MS_SILENT, NULL);
+#ifdef CONFIG_DEVTMPFS_SAFE
+ mflags |= MS_NOEXEC | MS_NOSUID;
+#endif
+ err = init_mount("devtmpfs", "dev", "devtmpfs", mflags, NULL);
if (err)
printk(KERN_INFO "devtmpfs: error mounting %i\n", err);
else
diff --git a/drivers/md/dm-init.c b/drivers/md/dm-init.c
index b0c45c6..188b23c 100644
--- a/drivers/md/dm-init.c
+++ b/drivers/md/dm-init.c
@@ -301,3 +301,254 @@
module_param(create, charp, 0);
MODULE_PARM_DESC(create, "Create a mapped device in early boot");
+
+/* ---------------------------------------------------------------
+ * ChromeOS shim - convert dm= format to dm-mod.create= format
+ * ---------------------------------------------------------------
+ */
+
+struct dm_chrome_target {
+ char *field[4];
+};
+
+struct dm_chrome_dev {
+ char *name, *uuid, *mode;
+ unsigned int num_targets;
+ struct dm_chrome_target targets[DM_MAX_TARGETS];
+};
+
+static char __init *dm_chrome_parse_target(char *str, struct dm_chrome_target *tgt)
+{
+ unsigned int i;
+
+ tgt->field[0] = str;
+ /* Delimit first 3 fields that are separated by space */
+ for (i = 0; i < ARRAY_SIZE(tgt->field) - 1; i++) {
+ tgt->field[i + 1] = str_field_delimit(&tgt->field[i], ' ');
+ if (!tgt->field[i + 1])
+ return NULL;
+ }
+ /* Delimit last field that can be terminated by comma */
+ return str_field_delimit(&tgt->field[i], ',');
+}
+
+static char __init *dm_chrome_parse_dev(char *str, struct dm_chrome_dev *dev)
+{
+ char *target, *num;
+ unsigned int i;
+
+ if (!str)
+ return ERR_PTR(-EINVAL);
+
+ target = str_field_delimit(&str, ',');
+ if (!target)
+ return ERR_PTR(-EINVAL);
+
+ /* Delimit first 3 fields that are separated by space */
+ dev->name = str;
+ dev->uuid = str_field_delimit(&dev->name, ' ');
+ if (!dev->uuid)
+ return ERR_PTR(-EINVAL);
+
+ dev->mode = str_field_delimit(&dev->uuid, ' ');
+ if (!dev->mode)
+ return ERR_PTR(-EINVAL);
+
+ /* num is optional */
+ num = str_field_delimit(&dev->mode, ' ');
+ if (!num)
+ dev->num_targets = 1;
+ else {
+ /* Delimit num and check whether it is the last field */
+ if (str_field_delimit(&num, ' '))
+ return ERR_PTR(-EINVAL);
+ if (kstrtouint(num, 0, &dev->num_targets))
+ return ERR_PTR(-EINVAL);
+ }
+
+ if (dev->num_targets > DM_MAX_TARGETS) {
+ DMERR("too many targets %u > %d",
+ dev->num_targets, DM_MAX_TARGETS);
+ return ERR_PTR(-EINVAL);
+ }
+
+ for (i = 0; i < dev->num_targets - 1; i++) {
+ target = dm_chrome_parse_target(target, &dev->targets[i]);
+ if (!target)
+ return ERR_PTR(-EINVAL);
+ }
+ /* The last one can return NULL if it reaches the end of str */
+ return dm_chrome_parse_target(target, &dev->targets[i]);
+}
+
+static char __init *dm_chrome_convert(struct dm_chrome_dev *devs, unsigned int num_devs)
+{
+ char *str = kmalloc(DM_MAX_STR_SIZE, GFP_KERNEL);
+ char *p = str;
+ unsigned int i, j;
+ int ret;
+
+ if (!str)
+ return ERR_PTR(-ENOMEM);
+
+ for (i = 0; i < num_devs; i++) {
+ if (!strcmp(devs[i].uuid, "none"))
+ devs[i].uuid = "";
+ ret = snprintf(p, DM_MAX_STR_SIZE - (p - str),
+ "%s,%s,,%s",
+ devs[i].name,
+ devs[i].uuid,
+ devs[i].mode);
+ if (ret < 0)
+ goto out;
+ p += ret;
+
+ for (j = 0; j < devs[i].num_targets; j++) {
+ ret = snprintf(p, DM_MAX_STR_SIZE - (p - str),
+ ",%s %s %s %s",
+ devs[i].targets[j].field[0],
+ devs[i].targets[j].field[1],
+ devs[i].targets[j].field[2],
+ devs[i].targets[j].field[3]);
+ if (ret < 0)
+ goto out;
+ p += ret;
+ }
+ if (i < num_devs - 1) {
+ ret = snprintf(p, DM_MAX_STR_SIZE - (p - str), ";");
+ if (ret < 0)
+ goto out;
+ p += ret;
+ }
+ }
+
+ return str;
+
+out:
+ kfree(str);
+ return ERR_PTR(ret);
+}
+
+/**
+ * dm_chrome_shim - convert old dm= format used in chromeos to the new
+ * upstream format.
+ *
+ * ChromeOS old format
+ * -------------------
+ * <device> ::= [<num>] <device-mapper>+
+ * <device-mapper> ::= <head> "," <target>+
+ * <head> ::= <name> <uuid> <mode> [<num>]
+ * <target> ::= <start> <length> <type> <options> ","
+ * <mode> ::= "ro" | "rw"
+ * <uuid> ::= xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | "none"
+ * <type> ::= "verity" | "bootcache" | ...
+ *
+ * Example:
+ * 2 vboot none ro 1,
+ * 0 1768000 bootcache
+ * device=aa55b119-2a47-8c45-946a-5ac57765011f+1
+ * signature=76e9be054b15884a9fa85973e9cb274c93afadb6
+ * cache_start=1768000 max_blocks=100000 size_limit=23 max_trace=20000,
+ * vroot none ro 1,
+ * 0 1740800 verity payload=254:0 hashtree=254:0 hashstart=1740800 alg=sha1
+ * root_hexdigest=76e9be054b15884a9fa85973e9cb274c93afadb6
+ * salt=5b3549d54d6c7a3837b9b81ed72e49463a64c03680c47835bef94d768e5646fe
+ *
+ * Notes:
+ * 1. uuid is a label for the device and we set it to "none".
+ * 2. The <num> field is optional for now and assumed to be 1 when absent.
+ *    Once all the scripts that set this field have been updated, it
+ *    will be made mandatory.
+ */
+
+static char *chrome_create;
+
+static int __init dm_chrome_shim(char *arg) {
+ if (!arg || create)
+ return -EINVAL;
+ chrome_create = arg;
+ return 0;
+}
+
+static int __init dm_chrome_parse_devices(void)
+{
+ struct dm_chrome_dev *devs;
+ unsigned int num_devs, i;
+ char *next, *base_str;
+ int ret = 0;
+
+ /* Verify if dm-mod.create was not used */
+ if (!chrome_create || create)
+ return -EINVAL;
+
+ if (strlen(chrome_create) >= DM_MAX_STR_SIZE) {
+ DMERR("Argument is too big. Limit is %d\n", DM_MAX_STR_SIZE);
+ return -EINVAL;
+ }
+
+ base_str = kstrdup(chrome_create, GFP_KERNEL);
+ if (!base_str)
+ return -ENOMEM;
+
+ next = str_field_delimit(&base_str, ' ');
+ if (!next) {
+ ret = -EINVAL;
+ goto out_str;
+ }
+
+ /* if first field is not the optional <num> field */
+ if (kstrtouint(base_str, 0, &num_devs)) {
+ num_devs = 1;
+ /* rewind next pointer */
+ next = base_str;
+ }
+
+ if (num_devs > DM_MAX_DEVICES) {
+ DMERR("too many devices %u > %d", num_devs, DM_MAX_DEVICES);
+ ret = -EINVAL;
+ goto out_str;
+ }
+
+ devs = kcalloc(num_devs, sizeof(*devs), GFP_KERNEL);
+ if (!devs)
+ return -ENOMEM;
+
+ /* restore string */
+ strcpy(base_str, chrome_create);
+
+ /* parse devices */
+ for (i = 0; i < num_devs; i++) {
+ next = dm_chrome_parse_dev(next, &devs[i]);
+ if (IS_ERR(next)) {
+ DMERR("couldn't parse device");
+ ret = PTR_ERR(next);
+ goto out_devs;
+ }
+ }
+
+ create = dm_chrome_convert(devs, num_devs);
+ if (IS_ERR(create)) {
+ ret = PTR_ERR(create);
+ goto out_devs;
+ }
+
+ DMDEBUG("Converting:\n\tdm=\"%s\"\n\tdm-mod.create=\"%s\"\n",
+ chrome_create, create);
+
+ /* Call upstream code */
+ dm_init_init();
+
+ kfree(create);
+
+out_devs:
+ create = NULL;
+ kfree(devs);
+out_str:
+ kfree(base_str);
+
+ return ret;
+}
+
+late_initcall(dm_chrome_parse_devices);
+
+__setup("dm=", dm_chrome_shim);
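The shim above treats the first space-delimited field of the `dm=` argument as an optional device count, falling back to one device and rewinding the parse pointer when the field is not numeric. A minimal userspace sketch of that decision (hypothetical helper name, plain libc instead of `kstrtouint()`/`str_field_delimit()`):

```c
#include <stdlib.h>
#include <string.h>

/* If the first space-delimited field of str is a number, consume it as
 * the device count; otherwise assume a single device and leave the
 * whole string for the device parser. */
static unsigned int parse_num_devs(char *str, char **rest)
{
	char *sp = strchr(str, ' ');
	char *end;
	unsigned long n;

	if (sp)
		*sp = '\0';
	n = strtoul(str, &end, 0);
	if (*str == '\0' || *end != '\0') {
		/* not a number: restore the delimiter and rewind */
		if (sp)
			*sp = ' ';
		*rest = str;
		return 1;
	}
	*rest = sp ? sp + 1 : str + strlen(str);
	return (unsigned int)n;
}
```

This mirrors why the kernel code must `strcpy()` the original string back after probing the first field: the delimiter was overwritten during the probe.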
diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c
index 808a98e..ac5f9ad 100644
--- a/drivers/md/dm-verity-target.c
+++ b/drivers/md/dm-verity-target.c
@@ -16,8 +16,10 @@
#include "dm-verity.h"
#include "dm-verity-fec.h"
#include "dm-verity-verify-sig.h"
+#include <linux/delay.h>
#include <linux/module.h>
#include <linux/reboot.h>
+#include <crypto/hash.h>
#define DM_MSG_PREFIX "verity"
@@ -33,8 +35,9 @@
#define DM_VERITY_OPT_PANIC "panic_on_corruption"
#define DM_VERITY_OPT_IGN_ZEROES "ignore_zero_blocks"
#define DM_VERITY_OPT_AT_MOST_ONCE "check_at_most_once"
+#define DM_VERITY_OPT_ERROR_BEHAVIOR "error_behavior"
-#define DM_VERITY_OPTS_MAX (3 + DM_VERITY_OPTS_FEC + \
+#define DM_VERITY_OPTS_MAX (4 + DM_VERITY_OPTS_FEC + \
DM_VERITY_ROOT_HASH_VERIFICATION_OPTS)
static unsigned dm_verity_prefetch_cluster = DM_VERITY_DEFAULT_PREFETCH_SIZE;
@@ -48,6 +51,120 @@
unsigned n_blocks;
};
+/* Provide a lightweight means of specifying the global default for
+ * error behavior: eio, reboot, or none
+ * Legacy support for 0 = eio, 1 = reboot/panic, 2 = none, 3 = notify.
+ * This is matched to the enum in dm-verity.h.
+ */
+static char *error_behavior_istring[] = { "0", "1", "2", "3" };
+static const char *allowed_error_behaviors[] = { "eio", "panic", "none",
+ "notify", NULL };
+static char *error_behavior = "eio";
+module_param(error_behavior, charp, 0644);
+MODULE_PARM_DESC(error_behavior, "Behavior on error "
+ "(eio, panic, none, notify)");
+
+/* Controls whether verity_get_device will wait forever for a device. */
+static int dev_wait;
+module_param(dev_wait, int, 0444);
+MODULE_PARM_DESC(dev_wait, "Wait forever for a backing device");
+
+static BLOCKING_NOTIFIER_HEAD(verity_error_notifier);
+
+int dm_verity_register_error_notifier(struct notifier_block *nb)
+{
+ return blocking_notifier_chain_register(&verity_error_notifier, nb);
+}
+EXPORT_SYMBOL_GPL(dm_verity_register_error_notifier);
+
+int dm_verity_unregister_error_notifier(struct notifier_block *nb)
+{
+ return blocking_notifier_chain_unregister(&verity_error_notifier, nb);
+}
+EXPORT_SYMBOL_GPL(dm_verity_unregister_error_notifier);
+
+/* If the request is not successful, this handler takes action.
+ * TODO make this call a registered handler.
+ */
+static void verity_error(struct dm_verity *v, struct dm_verity_io *io,
+ blk_status_t status)
+{
+ const char *message = v->hash_failed ? "integrity" : "block";
+ int error_behavior = DM_VERITY_ERROR_BEHAVIOR_PANIC;
+ dev_t devt = 0;
+ u64 block = ~0;
+ struct dm_verity_error_state error_state;
+ /* If the hash did not fail, then this is likely transient. */
+ int transient = !v->hash_failed;
+
+ devt = v->data_dev->bdev->bd_dev;
+ error_behavior = v->error_behavior;
+
+ DMERR_LIMIT("verification failure occurred: %s failure", message);
+
+ if (error_behavior == DM_VERITY_ERROR_BEHAVIOR_NOTIFY) {
+ error_state.code = status;
+ error_state.transient = transient;
+ error_state.block = block;
+ error_state.message = message;
+ error_state.dev_start = v->data_start;
+ error_state.dev_len = v->data_blocks;
+ error_state.dev = v->data_dev->bdev;
+ error_state.hash_dev_start = v->hash_start;
+ error_state.hash_dev_len = v->hash_blocks;
+ error_state.hash_dev = v->hash_dev->bdev;
+
+ /* Set default fallthrough behavior. */
+ error_state.behavior = DM_VERITY_ERROR_BEHAVIOR_PANIC;
+ error_behavior = DM_VERITY_ERROR_BEHAVIOR_PANIC;
+
+ if (!blocking_notifier_call_chain(
+ &verity_error_notifier, transient, &error_state)) {
+ error_behavior = error_state.behavior;
+ }
+ }
+
+ switch (error_behavior) {
+ case DM_VERITY_ERROR_BEHAVIOR_EIO:
+ break;
+ case DM_VERITY_ERROR_BEHAVIOR_NONE:
+ break;
+ default:
+ if (!transient)
+ goto do_panic;
+ }
+ return;
+
+do_panic:
+ panic("dm-verity failure: "
+ "device:%u:%u status:%d block:%llu message:%s",
+ MAJOR(devt), MINOR(devt), status, (u64)block, message);
+}
+
+/**
+ * verity_parse_error_behavior - parse a behavior charp to the enum
+ * @behavior: NUL-terminated char array
+ *
+ * Checks if the behavior is valid either as text or as an index digit
+ * and returns the proper enum value in string form or ERR_PTR(-EINVAL)
+ * on error.
+ */
+static char *verity_parse_error_behavior(const char *behavior)
+{
+ const char **allowed = allowed_error_behaviors;
+ int index;
+
+ for (index = 0; *allowed; allowed++, index++)
+ if (!strcmp(*allowed, behavior) || behavior[0] == index + '0')
+ break;
+
+ if (!*allowed)
+ return ERR_PTR(-EINVAL);
+
+ /* Convert to the integer index matching the enum. */
+ return error_behavior_istring[index];
+}
+
/*
* Auxiliary structure appended to each dm-bufio buffer. If the value
* hash_verified is nonzero, hash of the block has been verified.
@@ -554,6 +671,8 @@
struct dm_verity *v = io->v;
struct bio *bio = dm_bio_from_per_bio_data(io, v->ti->per_io_data_size);
+ if (status && !verity_fec_is_enabled(io->v))
+ verity_error(v, io, status);
bio->bi_end_io = io->orig_bi_end_io;
bio->bi_status = status;
@@ -942,6 +1061,22 @@
return r;
continue;
+ } else if (!strcasecmp(arg_name, DM_VERITY_OPT_ERROR_BEHAVIOR)) {
+ int behavior;
+
+ if (!argc) {
+ ti->error = "Missing error behavior parameter";
+ return -EINVAL;
+ }
+ if (kstrtoint(dm_shift_arg(as), 0, &behavior) ||
+ behavior < 0) {
+ ti->error = "Bad error behavior parameter";
+ return -EINVAL;
+ }
+ v->error_behavior = behavior;
+ argc--;
+ continue;
+
} else if (verity_is_fec_opt_arg(arg_name)) {
r = verity_fec_parse_opt_args(as, v, &argc, arg_name);
if (r)
@@ -964,6 +1099,132 @@
return r;
}
+static int verity_get_device(struct dm_target *ti, const char *devname,
+ struct dm_dev **dm_dev)
+{
+ do {
+ /* Try the normal path first since if everything is ready, it
+ * will be the fastest.
+ */
+ if (!dm_get_device(ti, devname,
+ dm_table_get_mode(ti->table), dm_dev))
+ return 0;
+
+ if (!dev_wait)
+ break;
+
+ /* No need to be too aggressive since this is a slow path. */
+ msleep(500);
+ } while (dev_wait && (driver_probe_done() != 0 || *dm_dev == NULL));
+ return -1;
+}
+
+static void splitarg(char *arg, char **key, char **val)
+{
+ *key = strsep(&arg, "=");
+ *val = strsep(&arg, "");
+}
+
+/* Convert Chrome OS arguments into standard arguments */
+
+static char *chromeos_args(unsigned *pargc, char ***pargv)
+{
+ char *hashstart = NULL;
+ char **argv = *pargv;
+ int argc = *pargc;
+ char *key, *val;
+ int nargc = 10;
+ char **nargv;
+ char *errstr;
+ int i;
+
+ nargv = kcalloc(14, sizeof(char *), GFP_KERNEL);
+ if (!nargv)
+ return "Failed to allocate memory";
+
+ nargv[0] = "0"; /* version */
+ nargv[3] = "4096"; /* hash block size */
+ nargv[4] = "4096"; /* data block size */
+ nargv[9] = "-"; /* salt (optional) */
+
+ for (i = 0; i < argc; ++i) {
+ DMDEBUG("Argument %d: '%s'", i, argv[i]);
+ splitarg(argv[i], &key, &val);
+ if (!key) {
+ DMWARN("Bad argument %d: missing key?", i);
+ errstr = "Bad argument: missing key";
+ goto err;
+ }
+ if (!val) {
+ DMWARN("Bad argument %d='%s': missing value", i, key);
+ errstr = "Bad argument: missing value";
+ goto err;
+ }
+ if (!strcmp(key, "alg")) {
+ nargv[7] = val;
+ } else if (!strcmp(key, "payload")) {
+ nargv[1] = val;
+ } else if (!strcmp(key, "hashtree")) {
+ nargv[2] = val;
+ } else if (!strcmp(key, "root_hexdigest")) {
+ nargv[8] = val;
+ } else if (!strcmp(key, "hashstart")) {
+ unsigned long num;
+
+ if (kstrtoul(val, 10, &num)) {
+ errstr = "Invalid hashstart";
+ goto err;
+ }
+ num >>= (12 - SECTOR_SHIFT);
+ hashstart = kmalloc(24, GFP_KERNEL);
+ if (!hashstart) {
+ errstr = "Failed to allocate memory";
+ goto err;
+ }
+ /* sizeof(hashstart) is the pointer size; use the kmalloc size */
+ scnprintf(hashstart, 24, "%lu", num);
+ nargv[5] = hashstart;
+ nargv[6] = hashstart;
+ } else if (!strcmp(key, "salt")) {
+ nargv[9] = val;
+ } else if (!strcmp(key, DM_VERITY_OPT_ERROR_BEHAVIOR)) {
+ char *behavior = verity_parse_error_behavior(val);
+
+ if (IS_ERR(behavior)) {
+ errstr = "Invalid error behavior";
+ goto err;
+ }
+ nargv[10] = "2";
+ nargv[11] = key;
+ nargv[12] = behavior;
+ nargc = 13;
+ }
+ }
+
+ if (!nargv[1] || !nargv[2] || !nargv[5] || !nargv[7] || !nargv[8]) {
+ errstr = "Missing argument";
+ goto err;
+ }
+
+ *pargc = nargc;
+ *pargv = nargv;
+ return NULL;
+
+err:
+ kfree(nargv);
+ kfree(hashstart);
+ return errstr;
+}
+
+/* Release memory allocated for Chrome OS parameter conversion */
+
+static void free_chromeos_argv(char **argv)
+{
+ if (argv) {
+ kfree(argv[5]);
+ kfree(argv);
+ }
+}
+
/*
* Target parameters:
* <version> The current format is version 1.
@@ -990,10 +1251,19 @@
sector_t hash_position;
char dummy;
char *root_hash_digest_to_validate;
+ char **chromeos_argv = NULL;
+
+ if (argc < 10) {
+ ti->error = chromeos_args(&argc, &argv);
+ if (ti->error)
+ return -EINVAL;
+ chromeos_argv = argv;
+ }
v = kzalloc(sizeof(struct dm_verity), GFP_KERNEL);
if (!v) {
ti->error = "Cannot allocate verity structure";
+ free_chromeos_argv(chromeos_argv);
return -ENOMEM;
}
ti->private = v;
@@ -1023,13 +1293,13 @@
}
v->version = num;
- r = dm_get_device(ti, argv[1], FMODE_READ, &v->data_dev);
+ r = verity_get_device(ti, argv[1], &v->data_dev);
if (r) {
ti->error = "Data device lookup failed";
goto bad;
}
- r = dm_get_device(ti, argv[2], FMODE_READ, &v->hash_dev);
+ r = verity_get_device(ti, argv[2], &v->hash_dev);
if (r) {
ti->error = "Hash device lookup failed";
goto bad;
@@ -1229,14 +1499,14 @@
__alignof__(struct dm_verity_io));
verity_verify_sig_opts_cleanup(&verify_args);
-
+ free_chromeos_argv(chromeos_argv);
return 0;
bad:
verity_verify_sig_opts_cleanup(&verify_args);
verity_dtr(ti);
-
+ free_chromeos_argv(chromeos_argv);
return r;
}
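The error-behavior option added above accepts either a name or a legacy index digit. As a standalone sketch of that lookup (a hypothetical, userspace-only simplification of `verity_parse_error_behavior()` that returns the enum index directly; it also checks the string terminator on the digit form, which the kernel loop does not):

```c
#include <string.h>

/* Matches the allowed_error_behaviors table in the patch. */
static const char *const allowed[] = { "eio", "panic", "none", "notify" };

/* Return the enum index for a behavior name or index digit, -1 on error. */
static int parse_error_behavior(const char *behavior)
{
	int i;

	for (i = 0; i < 4; i++) {
		if (!strcmp(allowed[i], behavior))
			return i;
		if (behavior[0] == '0' + i && behavior[1] == '\0')
			return i;
	}
	return -1;
}
```

Either spelling maps to the same `dm_verity_error_behavior` value, which is why the patch keeps both for legacy command lines.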
diff --git a/drivers/md/dm-verity.h b/drivers/md/dm-verity.h
index 4e769d13..6ec6ffa 100644
--- a/drivers/md/dm-verity.h
+++ b/drivers/md/dm-verity.h
@@ -14,6 +14,7 @@
#include <linux/dm-bufio.h>
#include <linux/device-mapper.h>
#include <crypto/hash.h>
+#include <linux/notifier.h>
#define DM_VERITY_MAX_LEVELS 63
@@ -56,6 +57,7 @@
int hash_failed; /* set to 1 if hash of any block failed */
enum verity_mode mode; /* mode for handling verification errors */
unsigned corrupted_errs;/* Number of errors for corrupted blocks */
+ int error_behavior; /* selects error behavior on io errors */
struct workqueue_struct *verify_wq;
@@ -93,6 +95,40 @@
*/
};
+struct verity_result {
+ struct completion completion;
+ int err;
+};
+
+struct dm_verity_error_state {
+ int code;
+ int transient; /* Likely to not happen after a reboot */
+ u64 block;
+ const char *message;
+
+ sector_t dev_start;
+ sector_t dev_len;
+ struct block_device *dev;
+
+ sector_t hash_dev_start;
+ sector_t hash_dev_len;
+ struct block_device *hash_dev;
+
+ /* Final behavior after all notifications are completed. */
+ int behavior;
+};
+
+/* This enum must be matched to allowed_error_behaviors in dm-verity.c */
+enum dm_verity_error_behavior {
+ DM_VERITY_ERROR_BEHAVIOR_EIO = 0,
+ DM_VERITY_ERROR_BEHAVIOR_PANIC,
+ DM_VERITY_ERROR_BEHAVIOR_NONE,
+ DM_VERITY_ERROR_BEHAVIOR_NOTIFY
+};
+
+int dm_verity_register_error_notifier(struct notifier_block *nb);
+int dm_verity_unregister_error_notifier(struct notifier_block *nb);
+
static inline struct ahash_request *verity_io_hash_req(struct dm_verity *v,
struct dm_verity_io *io)
{
diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
index f5c8022..f7fa111 100644
--- a/drivers/net/ethernet/google/gve/gve.h
+++ b/drivers/net/ethernet/google/gve/gve.h
@@ -97,6 +97,7 @@
/* A TX desc ring entry */
union gve_tx_desc {
struct gve_tx_pkt_desc pkt; /* first desc for a packet */
+ struct gve_tx_mtd_desc mtd; /* optional metadata descriptor */
struct gve_tx_seg_desc seg; /* subsequent descs for a packet */
};
diff --git a/drivers/net/ethernet/google/gve/gve_desc.h b/drivers/net/ethernet/google/gve/gve_desc.h
index 54779871..e24d031 100644
--- a/drivers/net/ethernet/google/gve/gve_desc.h
+++ b/drivers/net/ethernet/google/gve/gve_desc.h
@@ -31,6 +31,14 @@
__be64 seg_addr; /* Base address (see note) of this segment */
} __packed;
+struct gve_tx_mtd_desc {
+ u8 type_flags; /* type is lower 4 bits, subtype upper */
+ u8 path_state; /* state is lower 4 bits, hash type upper */
+ __be16 reserved0;
+ __be32 path_hash;
+ __be64 reserved1;
+} __packed;
+
struct gve_tx_seg_desc {
u8 type_flags; /* type is lower 4 bits, flags upper */
u8 l3_offset; /* TSO: 2 byte units to start of IPH */
@@ -44,6 +52,7 @@
#define GVE_TXD_STD (0x0 << 4) /* Std with Host Address */
#define GVE_TXD_TSO (0x1 << 4) /* TSO with Host Address */
#define GVE_TXD_SEG (0x2 << 4) /* Seg with Host Address */
+#define GVE_TXD_MTD (0x3 << 4) /* Metadata */
/* GVE Transmit Descriptor Flags for Std Pkts */
#define GVE_TXF_L4CSUM BIT(0) /* Need csum offload */
@@ -52,6 +61,17 @@
/* GVE Transmit Descriptor Flags for TSO Segs */
#define GVE_TXSF_IPV6 BIT(1) /* IPv6 TSO */
+/* GVE Transmit Descriptor Options for MTD Segs */
+#define GVE_MTD_SUBTYPE_PATH 0
+
+#define GVE_MTD_PATH_STATE_DEFAULT 0
+#define GVE_MTD_PATH_STATE_TIMEOUT 1
+#define GVE_MTD_PATH_STATE_CONGESTION 2
+#define GVE_MTD_PATH_STATE_RETRANSMIT 3
+
+#define GVE_MTD_PATH_HASH_NONE (0x0 << 4)
+#define GVE_MTD_PATH_HASH_L4 (0x1 << 4)
+
/* GVE Receive Packet Descriptor */
/* The start of an ethernet packet comes 2 bytes into the rx buffer.
* gVNIC adds this padding so that both the DMA and the L3/4 protocol header
diff --git a/drivers/net/ethernet/google/gve/gve_tx.c b/drivers/net/ethernet/google/gve/gve_tx.c
index b653197..61c6b30 100644
--- a/drivers/net/ethernet/google/gve/gve_tx.c
+++ b/drivers/net/ethernet/google/gve/gve_tx.c
@@ -379,6 +379,19 @@
pkt_desc->pkt.seg_addr = cpu_to_be64(addr);
}
+static void gve_tx_fill_mtd_desc(union gve_tx_desc *mtd_desc,
+ struct sk_buff *skb)
+{
+ BUILD_BUG_ON(sizeof(mtd_desc->mtd) != sizeof(mtd_desc->pkt));
+
+ mtd_desc->mtd.type_flags = GVE_TXD_MTD | GVE_MTD_SUBTYPE_PATH;
+ mtd_desc->mtd.path_state = GVE_MTD_PATH_STATE_DEFAULT |
+ GVE_MTD_PATH_HASH_L4;
+ mtd_desc->mtd.path_hash = cpu_to_be32(skb->hash);
+ mtd_desc->mtd.reserved0 = 0;
+ mtd_desc->mtd.reserved1 = 0;
+}
+
static void gve_tx_fill_seg_desc(union gve_tx_desc *seg_desc,
struct sk_buff *skb, bool is_gso,
u16 len, u64 addr)
@@ -414,6 +427,7 @@
int pad_bytes, hlen, hdr_nfrags, payload_nfrags, l4_hdr_offset;
union gve_tx_desc *pkt_desc, *seg_desc;
struct gve_tx_buffer_state *info;
+ int mtd_desc_nr = !!skb->l4_hash;
bool is_gso = skb_is_gso(skb);
u32 idx = tx->req & tx->mask;
int payload_iov = 2;
@@ -445,7 +459,7 @@
&info->iov[payload_iov]);
gve_tx_fill_pkt_desc(pkt_desc, skb, is_gso, l4_hdr_offset,
- 1 + payload_nfrags, hlen,
+ 1 + mtd_desc_nr + payload_nfrags, hlen,
info->iov[hdr_nfrags - 1].iov_offset);
skb_copy_bits(skb, 0,
@@ -456,8 +470,13 @@
info->iov[hdr_nfrags - 1].iov_len);
copy_offset = hlen;
+ if (mtd_desc_nr) {
+ next_idx = (tx->req + 1) & tx->mask;
+ gve_tx_fill_mtd_desc(&tx->desc[next_idx], skb);
+ }
+
for (i = payload_iov; i < payload_nfrags + payload_iov; i++) {
- next_idx = (tx->req + 1 + i - payload_iov) & tx->mask;
+ next_idx = (tx->req + 1 + mtd_desc_nr + i - payload_iov) & tx->mask;
seg_desc = &tx->desc[next_idx];
gve_tx_fill_seg_desc(seg_desc, skb, is_gso,
@@ -473,7 +492,7 @@
copy_offset += info->iov[i].iov_len;
}
- return 1 + payload_nfrags;
+ return 1 + mtd_desc_nr + payload_nfrags;
}
netdev_tx_t gve_tx(struct sk_buff *skb, struct net_device *dev)
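The slot arithmetic in the hunk above relies on the TX ring size being a power of two, so `& tx->mask` wraps the running request counter; when a metadata descriptor is present (`mtd_desc_nr == 1`) every segment descriptor shifts down one slot. A small standalone sketch of that index computation (hypothetical function name):

```c
#include <stdint.h>

/* Ring slot for the i-th payload segment: one pkt descriptor, an
 * optional mtd descriptor, then segments, all wrapped by the
 * power-of-two mask (mask == ring_size - 1). */
static uint32_t seg_desc_idx(uint32_t req, uint32_t mask,
			     int mtd_desc_nr, int i, int payload_iov)
{
	return (req + 1 + mtd_desc_nr + i - payload_iov) & mask;
}
```

With a 256-entry ring (`mask == 255`) and `req == 254`, the first segment after a metadata descriptor wraps back to slot 0, which is exactly the case the masking handles.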
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
index 0a721f6..e31f8fb 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
@@ -3109,13 +3109,19 @@
return 0;
}
-static int mem_xadd4(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+static int mem_atomic4(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
{
+ if (meta->insn.imm != BPF_ADD)
+ return -EOPNOTSUPP;
+
return mem_xadd(nfp_prog, meta, false);
}
-static int mem_xadd8(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
+static int mem_atomic8(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta)
{
+ if (meta->insn.imm != BPF_ADD)
+ return -EOPNOTSUPP;
+
return mem_xadd(nfp_prog, meta, true);
}
@@ -3475,8 +3481,8 @@
[BPF_STX | BPF_MEM | BPF_H] = mem_stx2,
[BPF_STX | BPF_MEM | BPF_W] = mem_stx4,
[BPF_STX | BPF_MEM | BPF_DW] = mem_stx8,
- [BPF_STX | BPF_XADD | BPF_W] = mem_xadd4,
- [BPF_STX | BPF_XADD | BPF_DW] = mem_xadd8,
+ [BPF_STX | BPF_ATOMIC | BPF_W] = mem_atomic4,
+ [BPF_STX | BPF_ATOMIC | BPF_DW] = mem_atomic8,
[BPF_ST | BPF_MEM | BPF_B] = mem_st1,
[BPF_ST | BPF_MEM | BPF_H] = mem_st2,
[BPF_ST | BPF_MEM | BPF_W] = mem_st4,
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/main.h b/drivers/net/ethernet/netronome/nfp/bpf/main.h
index fac9c6f..d0e17ee 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/main.h
+++ b/drivers/net/ethernet/netronome/nfp/bpf/main.h
@@ -428,9 +428,9 @@
return is_mbpf_classic_store(meta) && meta->ptr.type == PTR_TO_PACKET;
}
-static inline bool is_mbpf_xadd(const struct nfp_insn_meta *meta)
+static inline bool is_mbpf_atomic(const struct nfp_insn_meta *meta)
{
- return (meta->insn.code & ~BPF_SIZE_MASK) == (BPF_STX | BPF_XADD);
+ return (meta->insn.code & ~BPF_SIZE_MASK) == (BPF_STX | BPF_ATOMIC);
}
static inline bool is_mbpf_mul(const struct nfp_insn_meta *meta)
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
index e92ee51..9d235c0 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/verifier.c
@@ -479,7 +479,7 @@
pr_vlog(env, "map writes not supported\n");
return -EOPNOTSUPP;
}
- if (is_mbpf_xadd(meta)) {
+ if (is_mbpf_atomic(meta)) {
err = nfp_bpf_map_mark_used(env, meta, reg,
NFP_MAP_USE_ATOMIC_CNT);
if (err)
@@ -523,12 +523,17 @@
}
static int
-nfp_bpf_check_xadd(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
- struct bpf_verifier_env *env)
+nfp_bpf_check_atomic(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
+ struct bpf_verifier_env *env)
{
const struct bpf_reg_state *sreg = cur_regs(env) + meta->insn.src_reg;
const struct bpf_reg_state *dreg = cur_regs(env) + meta->insn.dst_reg;
+ if (meta->insn.imm != BPF_ADD) {
+ pr_vlog(env, "atomic op not implemented: %d\n", meta->insn.imm);
+ return -EOPNOTSUPP;
+ }
+
if (dreg->type != PTR_TO_MAP_VALUE) {
pr_vlog(env, "atomic add not to a map value pointer: %d\n",
dreg->type);
@@ -655,8 +660,8 @@
if (is_mbpf_store(meta))
return nfp_bpf_check_store(nfp_prog, meta, env);
- if (is_mbpf_xadd(meta))
- return nfp_bpf_check_xadd(nfp_prog, meta, env);
+ if (is_mbpf_atomic(meta))
+ return nfp_bpf_check_atomic(nfp_prog, meta, env);
if (is_mbpf_alu(meta))
return nfp_bpf_check_alu(nfp_prog, meta, env);
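The `is_mbpf_atomic()` rename above keys off the eBPF opcode encoding: masking out the size bits and comparing the remaining class+mode bits against `BPF_STX | BPF_ATOMIC`. A simplified standalone sketch (the constants are copied from the uapi encoding; treat this as illustrative, not the kernel's header):

```c
#include <stdint.h>

#define BPF_CLASS_STX	0x03	/* store from register */
#define BPF_MODE_ATOMIC	0xc0	/* formerly BPF_XADD */
#define BPF_SIZE_BITS	0x18	/* BPF_W = 0x00, BPF_DW = 0x18 */

/* True for both 32-bit and 64-bit atomic store opcodes. */
static int is_atomic_stx(uint8_t code)
{
	return (code & ~BPF_SIZE_BITS) == (BPF_CLASS_STX | BPF_MODE_ATOMIC);
}
```

Masking the size bits is what lets one predicate cover `BPF_W` (0xc3) and `BPF_DW` (0xdb) at once, while a plain `BPF_STX | BPF_MEM` store (0x63) is rejected.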
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 7cae226..e4379cc 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -152,7 +152,12 @@
#define EXT4_MB_USE_RESERVED 0x2000
/* Do strict check for free blocks while retrying block allocation */
#define EXT4_MB_STRICT_CHECK 0x4000
-
+/* Large fragment size list lookup succeeded at least once for cr = 0 */
+#define EXT4_MB_CR0_OPTIMIZED 0x8000
+/* Avg fragment size rb tree lookup succeeded at least once for cr = 1 */
+#define EXT4_MB_CR1_OPTIMIZED 0x00010000
+/* Perform linear traversal for one group */
+#define EXT4_MB_SEARCH_NEXT_LINEAR 0x00020000
struct ext4_allocation_request {
/* target inode for block we're allocating */
struct inode *inode;
@@ -1212,7 +1217,7 @@
#define EXT4_MOUNT_JOURNAL_CHECKSUM 0x800000 /* Journal checksums */
#define EXT4_MOUNT_JOURNAL_ASYNC_COMMIT 0x1000000 /* Journal Async Commit */
#define EXT4_MOUNT_WARN_ON_ERROR 0x2000000 /* Trigger WARN_ON on error */
-#define EXT4_MOUNT_PREFETCH_BLOCK_BITMAPS 0x4000000
+#define EXT4_MOUNT_NO_PREFETCH_BLOCK_BITMAPS 0x4000000
#define EXT4_MOUNT_DELALLOC 0x8000000 /* Delalloc support */
#define EXT4_MOUNT_DATA_ERR_ABORT 0x10000000 /* Abort on file data write */
#define EXT4_MOUNT_BLOCK_VALIDITY 0x20000000 /* Block validity checking */
@@ -1237,7 +1242,9 @@
#define EXT4_MOUNT2_JOURNAL_FAST_COMMIT 0x00000010 /* Journal fast commit */
#define EXT4_MOUNT2_DAX_NEVER 0x00000020 /* Do not allow Direct Access */
#define EXT4_MOUNT2_DAX_INODE 0x00000040 /* For printing options only */
-
+#define EXT4_MOUNT2_MB_OPTIMIZE_SCAN 0x00000080 /* Optimize group
+ * scanning in mballoc
+ */
#define clear_opt(sb, opt) EXT4_SB(sb)->s_mount_opt &= \
~EXT4_MOUNT_##opt
@@ -1518,9 +1525,14 @@
unsigned int s_mb_free_pending;
struct list_head s_freed_data_list; /* List of blocks to be freed
after commit completed */
+ struct rb_root s_mb_avg_fragment_size_root;
+ rwlock_t s_mb_rb_lock;
+ struct list_head *s_mb_largest_free_orders;
+ rwlock_t *s_mb_largest_free_orders_locks;
/* tunables */
unsigned long s_stripe;
+ unsigned int s_mb_max_linear_groups;
unsigned int s_mb_stream_request;
unsigned int s_mb_max_to_scan;
unsigned int s_mb_min_to_scan;
@@ -1540,12 +1552,17 @@
atomic_t s_bal_success; /* we found long enough chunks */
atomic_t s_bal_allocated; /* in blocks */
atomic_t s_bal_ex_scanned; /* total extents scanned */
+ atomic_t s_bal_groups_scanned; /* number of groups scanned */
atomic_t s_bal_goals; /* goal hits */
atomic_t s_bal_breaks; /* too long searches */
atomic_t s_bal_2orders; /* 2^order hits */
- spinlock_t s_bal_lock;
- unsigned long s_mb_buddies_generated;
- unsigned long long s_mb_generation_time;
+ atomic_t s_bal_cr0_bad_suggestions;
+ atomic_t s_bal_cr1_bad_suggestions;
+ atomic64_t s_bal_cX_groups_considered[4];
+ atomic64_t s_bal_cX_hits[4];
+ atomic64_t s_bal_cX_failed[4]; /* cX loop didn't find blocks */
+ atomic_t s_mb_buddies_generated; /* number of buddies generated */
+ atomic64_t s_mb_generation_time;
atomic_t s_mb_lost_chunks;
atomic_t s_mb_preallocated;
atomic_t s_mb_discarded;
@@ -2780,8 +2797,10 @@
/* mballoc.c */
extern const struct seq_operations ext4_mb_seq_groups_ops;
+extern const struct seq_operations ext4_mb_seq_structs_summary_ops;
extern long ext4_mb_stats;
extern long ext4_mb_max_to_scan;
+extern int ext4_seq_mb_stats_show(struct seq_file *seq, void *offset);
extern int ext4_mb_init(struct super_block *);
extern int ext4_mb_release(struct super_block *);
extern ext4_fsblk_t ext4_mb_new_blocks(handle_t *,
@@ -3283,11 +3302,14 @@
ext4_grpblk_t bb_free; /* total free blocks */
ext4_grpblk_t bb_fragments; /* nr of freespace fragments */
ext4_grpblk_t bb_largest_free_order;/* order of largest frag in BG */
+ ext4_group_t bb_group; /* Group number */
struct list_head bb_prealloc_list;
#ifdef DOUBLE_CHECK
void *bb_bitmap;
#endif
struct rw_semaphore alloc_sem;
+ struct rb_node bb_avg_fragment_size_rb;
+ struct list_head bb_largest_free_order_node;
ext4_grpblk_t bb_counters[]; /* Nr of free power-of-two-block
* regions, index is order.
* bb_counters[3] = 5 means
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 12eac88..3960b7e 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -5988,7 +5988,6 @@
kfree(path);
break;
}
- ex = path2[path2->p_depth].p_ext;
for (i = 0; i <= max(path->p_depth, path2->p_depth); i++) {
cmp1 = cmp2 = 0;
if (i <= path->p_depth)
diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index 896e117..585ecbd 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -103,8 +103,69 @@
*
* Replay code should thus check for all the valid tails in the FC area.
*
+ * Fast Commit Replay Idempotence
+ * ------------------------------
+ *
+ * Fast commits tags are idempotent in nature provided the recovery code follows
+ * certain rules. The guiding principle that the commit path follows while
+ * committing is that it stores the result of a particular operation instead of
+ * storing the procedure.
+ *
+ * Let's consider this rename operation: 'mv /a /b'. Let's assume dirent '/a'
+ * was associated with inode 10. During fast commit, instead of storing this
+ * operation as a procedure "rename a to b", we store the resulting file system
+ * state as a "series" of outcomes:
+ *
+ * - Link dirent b to inode 10
+ * - Unlink dirent a
+ * - Inode <10> with valid refcount
+ *
+ * Now when the recovery code runs, it needs to "enforce" this state on the file
+ * system. This is what guarantees idempotence of fast commit replay.
+ *
+ * Let's take an example of a procedure that is not idempotent and see how fast
+ * commits make it idempotent. Consider following sequence of operations:
+ *
+ * rm A; mv B A; read A
+ * (x) (y) (z)
+ *
+ * (x), (y) and (z) are the points at which we can crash. If we store this
+ * sequence of operations as is then the replay is not idempotent. Let's say
+ * while in replay, we crash at (z). During the second replay, file A (which was
+ * actually created as a result of "mv B A" operation) would get deleted. Thus,
+ * file named A would be absent when we try to read A. So, this sequence of
+ * operations is not idempotent. However, as mentioned above, instead of storing
+ * the procedure fast commits store the outcome of each procedure. Thus the fast
+ * commit log for above procedure would be as follows:
+ *
+ * (Let's assume dirent A was linked to inode 10 and dirent B was linked to
+ * inode 11 before the replay)
+ *
+ * [Unlink A] [Link A to inode 11] [Unlink B] [Inode 11]
+ * (w) (x) (y) (z)
+ *
+ * If we crash at (z), we will have file A linked to inode 11. During the second
+ * replay, we will remove file A (inode 11). But we will create it back and make
+ * it point to inode 11. We won't find B, so we'll just skip that step. At this
+ * point, the refcount for inode 11 is not reliable, but that gets fixed by the
+ * replay of last inode 11 tag. Crashes at points (w), (x) and (y) get handled
+ * similarly. Thus, by converting a non-idempotent procedure into a series of
+ * idempotent outcomes, fast commits ensure idempotence during replay.
+ *
* TODOs
* -----
+ *
+ * 0) Fast commit replay path hardening: Fast commit replay code should use
+ * journal handles to make sure all the updates it does during the replay
+ * path are atomic. With that if we crash during fast commit replay, after
+ * trying to do recovery again, we will find a file system where fast commit
+ * area is invalid (because new full commit would be found). In order to deal
+ * with that, fast commit replay code should ensure that the "FC_REPLAY"
+ * superblock state is persisted before starting the replay, so that after
+ * the crash, fast commit recovery code can look at that flag and perform
+ * fast commit recovery even if that area is invalidated by later full
+ * commits.
+ *
* 1) Make fast commit atomic updates more fine grained. Today, a fast commit
* eligible update must be protected within ext4_fc_start_update() and
* ext4_fc_stop_update(). These routines are called at much higher
@@ -548,13 +609,13 @@
trace_ext4_fc_track_range(inode, start, end, ret);
}
-static void ext4_fc_submit_bh(struct super_block *sb)
+static void ext4_fc_submit_bh(struct super_block *sb, bool is_tail)
{
int write_flags = REQ_SYNC;
struct buffer_head *bh = EXT4_SB(sb)->s_fc_bh;
- /* TODO: REQ_FUA | REQ_PREFLUSH is unnecessarily expensive. */
- if (test_opt(sb, BARRIER))
+ /* Add REQ_FUA | REQ_PREFLUSH only for the tail */
+ if (test_opt(sb, BARRIER) && is_tail)
write_flags |= REQ_FUA | REQ_PREFLUSH;
lock_buffer(bh);
set_buffer_dirty(bh);
@@ -628,7 +689,7 @@
*crc = ext4_chksum(sbi, *crc, tl, sizeof(*tl));
if (pad_len > 0)
ext4_fc_memzero(sb, tl + 1, pad_len, crc);
- ext4_fc_submit_bh(sb);
+ ext4_fc_submit_bh(sb, false);
ret = jbd2_fc_get_buf(EXT4_SB(sb)->s_journal, &bh);
if (ret)
@@ -685,7 +746,7 @@
tail.fc_crc = cpu_to_le32(crc);
ext4_fc_memcpy(sb, dst, &tail.fc_crc, sizeof(tail.fc_crc), NULL);
- ext4_fc_submit_bh(sb);
+ ext4_fc_submit_bh(sb, true);
return 0;
}
@@ -1227,18 +1288,6 @@
/* Ext4 Replay Path Routines */
-/* Get length of a particular tlv */
-static inline int ext4_fc_tag_len(struct ext4_fc_tl *tl)
-{
- return le16_to_cpu(tl->fc_len);
-}
-
-/* Get a pointer to "value" of a tlv */
-static inline u8 *ext4_fc_tag_val(struct ext4_fc_tl *tl)
-{
- return (u8 *)tl + sizeof(*tl);
-}
-
/* Helper struct for dentry replay routines */
struct dentry_info_args {
int parent_ino, dname_len, ino, inode_len;
@@ -1778,32 +1827,6 @@
return 0;
}
-static inline const char *tag2str(u16 tag)
-{
- switch (tag) {
- case EXT4_FC_TAG_LINK:
- return "TAG_ADD_ENTRY";
- case EXT4_FC_TAG_UNLINK:
- return "TAG_DEL_ENTRY";
- case EXT4_FC_TAG_ADD_RANGE:
- return "TAG_ADD_RANGE";
- case EXT4_FC_TAG_CREAT:
- return "TAG_CREAT_DENTRY";
- case EXT4_FC_TAG_DEL_RANGE:
- return "TAG_DEL_RANGE";
- case EXT4_FC_TAG_INODE:
- return "TAG_INODE";
- case EXT4_FC_TAG_PAD:
- return "TAG_PAD";
- case EXT4_FC_TAG_TAIL:
- return "TAG_TAIL";
- case EXT4_FC_TAG_HEAD:
- return "TAG_HEAD";
- default:
- return "TAG_ERROR";
- }
-}
-
static void ext4_fc_set_bitmaps_and_counters(struct super_block *sb)
{
struct ext4_fc_replay_state *state;
diff --git a/fs/ext4/fast_commit.h b/fs/ext4/fast_commit.h
index 3a6e5a1..b77f70f 100644
--- a/fs/ext4/fast_commit.h
+++ b/fs/ext4/fast_commit.h
@@ -3,6 +3,11 @@
#ifndef __FAST_COMMIT_H__
#define __FAST_COMMIT_H__
+/*
+ * Note this file is present in e2fsprogs/lib/ext2fs/fast_commit.h and
+ * linux/fs/ext4/fast_commit.h. These files should always be byte identical.
+ */
+
/* Fast commit tags */
#define EXT4_FC_TAG_ADD_RANGE 0x0001
#define EXT4_FC_TAG_DEL_RANGE 0x0002
@@ -50,7 +55,7 @@
struct ext4_fc_dentry_info {
__le32 fc_parent_ino;
__le32 fc_ino;
- u8 fc_dname[0];
+ __u8 fc_dname[0];
};
/* Value structure for EXT4_FC_TAG_INODE and EXT4_FC_TAG_INODE_PARTIAL. */
@@ -66,19 +71,6 @@
};
/*
- * In memory list of dentry updates that are performed on the file
- * system used by fast commit code.
- */
-struct ext4_fc_dentry_update {
- int fcd_op; /* Type of update create / unlink / link */
- int fcd_parent; /* Parent inode number */
- int fcd_ino; /* Inode number */
- struct qstr fcd_name; /* Dirent name */
- unsigned char fcd_iname[DNAME_INLINE_LEN]; /* Dirent name string */
- struct list_head fcd_list;
-};
-
-/*
* Fast commit reason codes
*/
enum {
@@ -107,6 +99,20 @@
EXT4_FC_REASON_MAX
};
+#ifdef __KERNEL__
+/*
+ * In memory list of dentry updates that are performed on the file
+ * system used by fast commit code.
+ */
+struct ext4_fc_dentry_update {
+ int fcd_op; /* Type of update create / unlink / link */
+ int fcd_parent; /* Parent inode number */
+ int fcd_ino; /* Inode number */
+ struct qstr fcd_name; /* Dirent name */
+ unsigned char fcd_iname[DNAME_INLINE_LEN]; /* Dirent name string */
+ struct list_head fcd_list;
+};
+
struct ext4_fc_stats {
unsigned int fc_ineligible_reason_count[EXT4_FC_REASON_MAX];
unsigned long fc_num_commits;
@@ -145,13 +151,51 @@
};
#define region_last(__region) (((__region)->lblk) + ((__region)->len) - 1)
+#endif
#define fc_for_each_tl(__start, __end, __tl) \
- for (tl = (struct ext4_fc_tl *)start; \
- (u8 *)tl < (u8 *)end; \
- tl = (struct ext4_fc_tl *)((u8 *)tl + \
+ for (tl = (struct ext4_fc_tl *)(__start); \
+ (__u8 *)tl < (__u8 *)(__end); \
+ tl = (struct ext4_fc_tl *)((__u8 *)tl + \
sizeof(struct ext4_fc_tl) + \
+ le16_to_cpu(tl->fc_len)))
+static inline const char *tag2str(__u16 tag)
+{
+ switch (tag) {
+ case EXT4_FC_TAG_LINK:
+ return "ADD_ENTRY";
+ case EXT4_FC_TAG_UNLINK:
+ return "DEL_ENTRY";
+ case EXT4_FC_TAG_ADD_RANGE:
+ return "ADD_RANGE";
+ case EXT4_FC_TAG_CREAT:
+ return "CREAT_DENTRY";
+ case EXT4_FC_TAG_DEL_RANGE:
+ return "DEL_RANGE";
+ case EXT4_FC_TAG_INODE:
+ return "INODE";
+ case EXT4_FC_TAG_PAD:
+ return "PAD";
+ case EXT4_FC_TAG_TAIL:
+ return "TAIL";
+ case EXT4_FC_TAG_HEAD:
+ return "HEAD";
+ default:
+ return "ERROR";
+ }
+}
+
+/* Get length of a particular tlv */
+static inline int ext4_fc_tag_len(struct ext4_fc_tl *tl)
+{
+ return le16_to_cpu(tl->fc_len);
+}
+
+/* Get a pointer to "value" of a tlv */
+static inline __u8 *ext4_fc_tag_val(struct ext4_fc_tl *tl)
+{
+ return (__u8 *)tl + sizeof(*tl);
+}
#endif /* __FAST_COMMIT_H__ */
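As an aside, the TLV walk that the fixed `fc_for_each_tl` macro and the new `ext4_fc_tag_len()`/`ext4_fc_tag_val()` helpers implement can be sketched standalone. The struct and function names below are simplified stand-ins for `ext4_fc_tl` and friends, not the kernel definitions; the byte-wise length decode stands in for `le16_to_cpu()`.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Simplified model of the fast-commit TLV layout: a 16-bit tag and a
 * 16-bit little-endian length, followed by fc_len bytes of value. */
struct fc_tl {
	uint16_t fc_tag;
	uint16_t fc_len;	/* stored little-endian on disk */
};

/* Portable stand-in for le16_to_cpu(): decode bytes explicitly. */
static uint16_t fc_le16(uint16_t v)
{
	const uint8_t *p = (const uint8_t *)&v;

	return (uint16_t)(p[0] | (p[1] << 8));
}

static int fc_tag_len(const struct fc_tl *tl)
{
	return fc_le16(tl->fc_len);
}

static const uint8_t *fc_tag_val(const struct fc_tl *tl)
{
	return (const uint8_t *)tl + sizeof(*tl);
}

/* Count TLVs in [start, end), mirroring the fc_for_each_tl stride:
 * header size plus the decoded value length. */
static int fc_count_tlvs(const uint8_t *start, const uint8_t *end)
{
	const struct fc_tl *tl;
	int n = 0;

	for (tl = (const struct fc_tl *)start;
	     (const uint8_t *)tl < end;
	     tl = (const struct fc_tl *)((const uint8_t *)tl +
					 sizeof(*tl) + fc_tag_len(tl)))
		n++;
	return n;
}
```

The macro fix above matters for exactly this stride: without parenthesizing `__start`/`__end`, passing an expression as either argument would mis-parse.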
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index b6229fe..22fb863 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -127,11 +127,50 @@
* smallest multiple of the stripe value (sbi->s_stripe) which is
* greater than the default mb_group_prealloc.
*
+ * If "mb_optimize_scan" mount option is set, we maintain in memory group info
+ * structures in two data structures:
+ *
+ * 1) Array of largest free order lists (sbi->s_mb_largest_free_orders)
+ *
+ * Locking: sbi->s_mb_largest_free_orders_locks(array of rw locks)
+ *
+ * This is an array of lists where the index in the array represents the
+ * largest free order in the buddy bitmap of the participating group infos of
+ * that list. So there are exactly MB_NUM_ORDERS(sb) lists, one per possible
+ * buddy bitmap order. Group infos are placed in the appropriate lists.
+ *
+ * 2) Average fragment size rb tree (sbi->s_mb_avg_fragment_size_root)
+ *
+ * Locking: sbi->s_mb_rb_lock (rwlock)
+ *
+ * This is a red black tree consisting of group infos and the tree is sorted
+ * by average fragment sizes (which is calculated as ext4_group_info->bb_free
+ * / ext4_group_info->bb_fragments).
+ *
+ * When "mb_optimize_scan" mount option is set, mballoc consults the above data
+ * structures to decide the order in which groups are to be traversed for
+ * fulfilling an allocation request.
+ *
+ * At CR = 0, we look for groups which have the largest_free_order >= the order
+ * of the request. We directly look at the largest free order list in the data
+ * structure (1) above where largest_free_order = order of the request. If that
+ * list is empty, we look at the remaining lists in increasing order of
+ * largest_free_order. This allows us to perform CR = 0 lookup in O(1) time.
+ *
+ * At CR = 1, we only consider groups where average fragment size > request
+ * size. So, we look up a group which has average fragment size just above or
+ * equal to request size using our rb tree (data structure 2) in O(log N) time.
+ *
+ * If "mb_optimize_scan" mount option is not set, mballoc traverses groups in
+ * linear order which requires O(N) search time for each CR 0 and CR 1 phase.
+ *
* The regular allocator (using the buddy cache) supports a few tunables.
*
* /sys/fs/ext4/<partition>/mb_min_to_scan
* /sys/fs/ext4/<partition>/mb_max_to_scan
* /sys/fs/ext4/<partition>/mb_order2_req
+ * /sys/fs/ext4/<partition>/mb_linear_limit
*
* The regular allocator uses buddy scan only if the request len is power of
* 2 blocks and the order of allocation is >= sbi->s_mb_order2_reqs. The
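The largest-free-order bookkeeping described above reduces to scanning the buddy counters from the top order down, which is what `mb_set_largest_free_order()` does later in this patch. A standalone sketch, with `NUM_ORDERS` as a fixed stand-in for `MB_NUM_ORDERS(sb)`:

```c
#include <assert.h>

/* Stand-in for MB_NUM_ORDERS(sb) (s_blocksize_bits + 2 in the kernel). */
#define NUM_ORDERS 12

/* Model of mb_set_largest_free_order(): scan buddy counters from the
 * highest order down and report the largest order that has a free
 * extent; -1 means "uninitialized / no free extents". */
static int largest_free_order(const int counters[NUM_ORDERS])
{
	int i;

	for (i = NUM_ORDERS - 1; i >= 0; i--)
		if (counters[i] > 0)
			return i;
	return -1;
}
```

The group is then filed on `s_mb_largest_free_orders[largest_free_order]`, which is what makes the CR 0 lookup cheap.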
@@ -149,6 +188,16 @@
* can be used for allocation. ext4_mb_good_group explains how the groups are
* checked.
*
+ * When "mb_optimize_scan" is turned on, as mentioned above, the groups may not
+ * get traversed linearly. That may result in subsequent allocations not being
+ * close to each other. And so, the underlying device may get filled up in a
+ * non-linear fashion. While that may not matter on non-rotational devices, for
+ * rotational devices that may result in higher seek times. "mb_linear_limit"
+ * tells mballoc how many groups mballoc should search linearly before
+ * consulting the above data structures for more efficient lookups. For
+ * non-rotational devices, this value defaults to 0 and for rotational devices
+ * this is set to MB_DEFAULT_LINEAR_LIMIT.
+ *
* Both the prealloc space are getting populated as above. So for the first
* request we will hit the buddy cache which will result in this prealloc
* space getting filled. The prealloc space is then later used for the
@@ -299,6 +348,8 @@
* - bitlock on a group (group)
* - object (inode/locality) (object)
* - per-pa lock (pa)
+ * - cr0 lists lock (cr0)
+ * - cr1 tree lock (cr1)
*
* Paths:
* - new pa
@@ -328,6 +379,9 @@
* group
* object
*
+ * - allocation path (ext4_mb_regular_allocator)
+ * group
+ * cr0/cr1
*/
static struct kmem_cache *ext4_pspace_cachep;
static struct kmem_cache *ext4_ac_cachep;
@@ -351,6 +405,9 @@
ext4_group_t group);
static void ext4_mb_new_preallocation(struct ext4_allocation_context *ac);
+static bool ext4_mb_good_group(struct ext4_allocation_context *ac,
+ ext4_group_t group, int cr);
+
/*
* The algorithm using this percpu seq counter goes below:
* 1. We sample the percpu discard_pa_seq counter before trying for block
@@ -744,6 +801,269 @@
}
}
+static void ext4_mb_rb_insert(struct rb_root *root, struct rb_node *new,
+ int (*cmp)(struct rb_node *, struct rb_node *))
+{
+ struct rb_node **iter = &root->rb_node, *parent = NULL;
+
+ while (*iter) {
+ parent = *iter;
+ if (cmp(new, *iter) > 0)
+ iter = &((*iter)->rb_left);
+ else
+ iter = &((*iter)->rb_right);
+ }
+
+ rb_link_node(new, parent, iter);
+ rb_insert_color(new, root);
+}
+
+static int
+ext4_mb_avg_fragment_size_cmp(struct rb_node *rb1, struct rb_node *rb2)
+{
+ struct ext4_group_info *grp1 = rb_entry(rb1,
+ struct ext4_group_info,
+ bb_avg_fragment_size_rb);
+ struct ext4_group_info *grp2 = rb_entry(rb2,
+ struct ext4_group_info,
+ bb_avg_fragment_size_rb);
+ int num_frags_1, num_frags_2;
+
+ num_frags_1 = grp1->bb_fragments ?
+ grp1->bb_free / grp1->bb_fragments : 0;
+ num_frags_2 = grp2->bb_fragments ?
+ grp2->bb_free / grp2->bb_fragments : 0;
+
+ return (num_frags_2 - num_frags_1);
+}
+
+/*
+ * Reinsert grpinfo into the avg_fragment_size tree with new average
+ * fragment size.
+ */
+static void
+mb_update_avg_fragment_size(struct super_block *sb, struct ext4_group_info *grp)
+{
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
+
+ if (!test_opt2(sb, MB_OPTIMIZE_SCAN) || grp->bb_free == 0)
+ return;
+
+ write_lock(&sbi->s_mb_rb_lock);
+ if (!RB_EMPTY_NODE(&grp->bb_avg_fragment_size_rb)) {
+ rb_erase(&grp->bb_avg_fragment_size_rb,
+ &sbi->s_mb_avg_fragment_size_root);
+ RB_CLEAR_NODE(&grp->bb_avg_fragment_size_rb);
+ }
+
+ ext4_mb_rb_insert(&sbi->s_mb_avg_fragment_size_root,
+ &grp->bb_avg_fragment_size_rb,
+ ext4_mb_avg_fragment_size_cmp);
+ write_unlock(&sbi->s_mb_rb_lock);
+}
+
+/*
+ * Choose next group by traversing largest_free_order lists. Updates *new_cr if
+ * cr level needs an update.
+ */
+static void ext4_mb_choose_next_group_cr0(struct ext4_allocation_context *ac,
+ int *new_cr, ext4_group_t *group, ext4_group_t ngroups)
+{
+ struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
+ struct ext4_group_info *iter, *grp;
+ int i;
+
+ if (ac->ac_status == AC_STATUS_FOUND)
+ return;
+
+ if (unlikely(sbi->s_mb_stats && ac->ac_flags & EXT4_MB_CR0_OPTIMIZED))
+ atomic_inc(&sbi->s_bal_cr0_bad_suggestions);
+
+ grp = NULL;
+ for (i = ac->ac_2order; i < MB_NUM_ORDERS(ac->ac_sb); i++) {
+ if (list_empty(&sbi->s_mb_largest_free_orders[i]))
+ continue;
+ read_lock(&sbi->s_mb_largest_free_orders_locks[i]);
+ if (list_empty(&sbi->s_mb_largest_free_orders[i])) {
+ read_unlock(&sbi->s_mb_largest_free_orders_locks[i]);
+ continue;
+ }
+ grp = NULL;
+ list_for_each_entry(iter, &sbi->s_mb_largest_free_orders[i],
+ bb_largest_free_order_node) {
+ if (sbi->s_mb_stats)
+ atomic64_inc(&sbi->s_bal_cX_groups_considered[0]);
+ if (likely(ext4_mb_good_group(ac, iter->bb_group, 0))) {
+ grp = iter;
+ break;
+ }
+ }
+ read_unlock(&sbi->s_mb_largest_free_orders_locks[i]);
+ if (grp)
+ break;
+ }
+
+ if (!grp) {
+ /* Increment cr and search again */
+ *new_cr = 1;
+ } else {
+ *group = grp->bb_group;
+ ac->ac_last_optimal_group = *group;
+ ac->ac_flags |= EXT4_MB_CR0_OPTIMIZED;
+ }
+}
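The list scan in `ext4_mb_choose_next_group_cr0()` boils down to "find the first non-empty largest-free-order list at an index >= the request order". A minimal model, using per-order list lengths instead of real lists and omitting the locking and `ext4_mb_good_group()` filtering:

```c
#include <assert.h>

/* Model of the CR 0 order-list walk: starting at the requested buddy
 * order, return the first order whose list has candidate groups, or -1
 * if none does (the caller then bumps cr to 1). */
static int first_nonempty_order(const int *list_len, int num_orders,
				int order)
{
	int i;

	for (i = order; i < num_orders; i++)
		if (list_len[i] > 0)
			return i;
	return -1;
}
```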
+
+/*
+ * Choose next group by traversing average fragment size tree. Updates *new_cr
+ * if cr level needs an update. Sets EXT4_MB_SEARCH_NEXT_LINEAR to indicate that
+ * the linear search should continue for one iteration since there's lock
+ * contention on the rb tree lock.
+ */
+static void ext4_mb_choose_next_group_cr1(struct ext4_allocation_context *ac,
+ int *new_cr, ext4_group_t *group, ext4_group_t ngroups)
+{
+ struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
+ int avg_fragment_size, best_so_far;
+ struct rb_node *node, *found;
+ struct ext4_group_info *grp;
+
+ /*
+ * If there is contention on the lock, instead of waiting for the lock
+ * to become available, just continue searching linearly. We'll resume
+ * our rb tree search later starting at ac->ac_last_optimal_group.
+ */
+ if (!read_trylock(&sbi->s_mb_rb_lock)) {
+ ac->ac_flags |= EXT4_MB_SEARCH_NEXT_LINEAR;
+ return;
+ }
+
+ if (unlikely(ac->ac_flags & EXT4_MB_CR1_OPTIMIZED)) {
+ if (sbi->s_mb_stats)
+ atomic_inc(&sbi->s_bal_cr1_bad_suggestions);
+ /* We have found something at CR 1 in the past */
+ grp = ext4_get_group_info(ac->ac_sb, ac->ac_last_optimal_group);
+ for (found = rb_next(&grp->bb_avg_fragment_size_rb); found != NULL;
+ found = rb_next(found)) {
+ grp = rb_entry(found, struct ext4_group_info,
+ bb_avg_fragment_size_rb);
+ if (sbi->s_mb_stats)
+ atomic64_inc(&sbi->s_bal_cX_groups_considered[1]);
+ if (likely(ext4_mb_good_group(ac, grp->bb_group, 1)))
+ break;
+ }
+ goto done;
+ }
+
+ node = sbi->s_mb_avg_fragment_size_root.rb_node;
+ best_so_far = 0;
+ found = NULL;
+
+ while (node) {
+ grp = rb_entry(node, struct ext4_group_info,
+ bb_avg_fragment_size_rb);
+ avg_fragment_size = 0;
+ if (ext4_mb_good_group(ac, grp->bb_group, 1)) {
+ avg_fragment_size = grp->bb_fragments ?
+ grp->bb_free / grp->bb_fragments : 0;
+ if (!best_so_far || avg_fragment_size < best_so_far) {
+ best_so_far = avg_fragment_size;
+ found = node;
+ }
+ }
+ if (avg_fragment_size > ac->ac_g_ex.fe_len)
+ node = node->rb_right;
+ else
+ node = node->rb_left;
+ }
+
+done:
+ if (found) {
+ grp = rb_entry(found, struct ext4_group_info,
+ bb_avg_fragment_size_rb);
+ *group = grp->bb_group;
+ ac->ac_flags |= EXT4_MB_CR1_OPTIMIZED;
+ } else {
+ *new_cr = 2;
+ }
+
+ read_unlock(&sbi->s_mb_rb_lock);
+ ac->ac_last_optimal_group = *group;
+}
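Because the tree is keyed on average fragment size, the CR 1 descent above behaves like a bound search for the smallest average fragment size that still satisfies the request. That can be modeled over a sorted array rather than an rb tree (a sketch of the search shape only, not the kernel's traversal):

```c
#include <assert.h>

/* Lower-bound search: given sizes[] sorted ascending, return the index
 * of the smallest value >= request, or -1 if no element is large
 * enough (the CR 1 analogue of "no good group; bump cr to 2"). */
static int best_fit_index(const int *sizes, int n, int request)
{
	int lo = 0, hi = n, mid;

	while (lo < hi) {
		mid = lo + (hi - lo) / 2;
		if (sizes[mid] < request)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo < n ? lo : -1;
}
```

This is the O(log N) behavior the mballoc doc comment claims for CR 1.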
+
+static inline int should_optimize_scan(struct ext4_allocation_context *ac)
+{
+ if (unlikely(!test_opt2(ac->ac_sb, MB_OPTIMIZE_SCAN)))
+ return 0;
+ if (ac->ac_criteria >= 2)
+ return 0;
+ if (ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS))
+ return 0;
+ return 1;
+}
+
+/*
+ * Return next linear group for allocation. If linear traversal should not be
+ * performed, this function just returns the same group
+ */
+static int
+next_linear_group(struct ext4_allocation_context *ac, int group, int ngroups)
+{
+ if (!should_optimize_scan(ac))
+ goto inc_and_return;
+
+ if (ac->ac_groups_linear_remaining) {
+ ac->ac_groups_linear_remaining--;
+ goto inc_and_return;
+ }
+
+ if (ac->ac_flags & EXT4_MB_SEARCH_NEXT_LINEAR) {
+ ac->ac_flags &= ~EXT4_MB_SEARCH_NEXT_LINEAR;
+ goto inc_and_return;
+ }
+
+ return group;
+inc_and_return:
+ /*
+ * Artificially restricted ngroups for non-extent
+ * files makes group > ngroups possible on first loop.
+ */
+ return group + 1 >= ngroups ? 0 : group + 1;
+}
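The decision logic in `next_linear_group()` can be modeled with a small context struct; the field names here are simplified stand-ins for the `ac_*` members and `EXT4_MB_SEARCH_NEXT_LINEAR`:

```c
#include <assert.h>

struct alloc_ctx {
	int optimize_scan;	/* mb_optimize_scan enabled for this sb */
	int linear_remaining;	/* ac_groups_linear_remaining */
	int search_next_linear;	/* EXT4_MB_SEARCH_NEXT_LINEAR set */
};

/* Model of next_linear_group(): advance linearly (with wrap-around)
 * while a linear budget or a one-shot "continue linearly" flag says
 * so; otherwise return the same group and let the optimizer pick. */
static int next_linear_group_model(struct alloc_ctx *ac, int group,
				   int ngroups)
{
	if (!ac->optimize_scan)
		goto inc_and_return;
	if (ac->linear_remaining) {
		ac->linear_remaining--;
		goto inc_and_return;
	}
	if (ac->search_next_linear) {
		ac->search_next_linear = 0;
		goto inc_and_return;
	}
	return group;
inc_and_return:
	return group + 1 >= ngroups ? 0 : group + 1;
}
```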
+
+/*
+ * ext4_mb_choose_next_group: choose next group for allocation.
+ *
+ * @ac Allocation Context
+ * @new_cr This is an output parameter. If there is no good group
+ * available at current CR level, this field is updated to indicate
+ * the new cr level that should be used.
+ * @group This is an input / output parameter. As an input it indicates the
+ * next group that the allocator intends to use for allocation. As
+ * output, this field indicates the next group that should be used as
+ * determined by the optimization functions.
+ * @ngroups Total number of groups
+ */
+static void ext4_mb_choose_next_group(struct ext4_allocation_context *ac,
+ int *new_cr, ext4_group_t *group, ext4_group_t ngroups)
+{
+ *new_cr = ac->ac_criteria;
+
+ if (!should_optimize_scan(ac) || ac->ac_groups_linear_remaining)
+ return;
+
+ if (*new_cr == 0) {
+ ext4_mb_choose_next_group_cr0(ac, new_cr, group, ngroups);
+ } else if (*new_cr == 1) {
+ ext4_mb_choose_next_group_cr1(ac, new_cr, group, ngroups);
+ } else {
+ /*
+ * TODO: For CR=2, we can arrange groups in an rb tree sorted by
+ * bb_free. But until that happens, we should never come here.
+ */
+ WARN_ON(1);
+ }
+}
+
/*
* Cache the order of the largest free extent we have available in this block
* group.
@@ -751,18 +1071,33 @@
static void
mb_set_largest_free_order(struct super_block *sb, struct ext4_group_info *grp)
{
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
int i;
- int bits;
+ if (test_opt2(sb, MB_OPTIMIZE_SCAN) && grp->bb_largest_free_order >= 0) {
+ write_lock(&sbi->s_mb_largest_free_orders_locks[
+ grp->bb_largest_free_order]);
+ list_del_init(&grp->bb_largest_free_order_node);
+ write_unlock(&sbi->s_mb_largest_free_orders_locks[
+ grp->bb_largest_free_order]);
+ }
grp->bb_largest_free_order = -1; /* uninit */
- bits = sb->s_blocksize_bits + 1;
- for (i = bits; i >= 0; i--) {
+ for (i = MB_NUM_ORDERS(sb) - 1; i >= 0; i--) {
if (grp->bb_counters[i] > 0) {
grp->bb_largest_free_order = i;
break;
}
}
+ if (test_opt2(sb, MB_OPTIMIZE_SCAN) &&
+ grp->bb_largest_free_order >= 0 && grp->bb_free) {
+ write_lock(&sbi->s_mb_largest_free_orders_locks[
+ grp->bb_largest_free_order]);
+ list_add_tail(&grp->bb_largest_free_order_node,
+ &sbi->s_mb_largest_free_orders[grp->bb_largest_free_order]);
+ write_unlock(&sbi->s_mb_largest_free_orders_locks[
+ grp->bb_largest_free_order]);
+ }
}
static noinline_for_stack
@@ -816,10 +1151,9 @@
clear_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &(grp->bb_state));
period = get_cycles() - period;
- spin_lock(&sbi->s_bal_lock);
- sbi->s_mb_buddies_generated++;
- sbi->s_mb_generation_time += period;
- spin_unlock(&sbi->s_bal_lock);
+ atomic_inc(&sbi->s_mb_buddies_generated);
+ atomic64_add(period, &sbi->s_mb_generation_time);
+ mb_update_avg_fragment_size(sb, grp);
}
static void mb_regenerate_buddy(struct ext4_buddy *e4b)
@@ -977,7 +1311,7 @@
grinfo->bb_fragments = 0;
memset(grinfo->bb_counters, 0,
sizeof(*grinfo->bb_counters) *
- (sb->s_blocksize_bits+2));
+ (MB_NUM_ORDERS(sb)));
/*
* incore got set to the group block bitmap below
*/
@@ -1542,6 +1876,7 @@
done:
mb_set_largest_free_order(sb, e4b->bd_info);
+ mb_update_avg_fragment_size(sb, e4b->bd_info);
mb_check_buddy(e4b);
}
@@ -1678,6 +2013,7 @@
}
mb_set_largest_free_order(e4b->bd_sb, e4b->bd_info);
+ mb_update_avg_fragment_size(e4b->bd_sb, e4b->bd_info);
ext4_set_bits(e4b->bd_bitmap, ex->fe_start, len0);
mb_check_buddy(e4b);
@@ -1953,7 +2289,7 @@
int max;
BUG_ON(ac->ac_2order <= 0);
- for (i = ac->ac_2order; i <= sb->s_blocksize_bits + 1; i++) {
+ for (i = ac->ac_2order; i < MB_NUM_ORDERS(sb); i++) {
if (grp->bb_counters[i] == 0)
continue;
@@ -2132,7 +2468,7 @@
if (free < ac->ac_g_ex.fe_len)
return false;
- if (ac->ac_2order > ac->ac_sb->s_blocksize_bits+1)
+ if (ac->ac_2order >= MB_NUM_ORDERS(ac->ac_sb))
return true;
if (grp->bb_largest_free_order < ac->ac_2order)
@@ -2171,6 +2507,8 @@
ext4_grpblk_t free;
int ret = 0;
+ if (sbi->s_mb_stats)
+ atomic64_inc(&sbi->s_bal_cX_groups_considered[ac->ac_criteria]);
if (should_lock)
ext4_lock_group(sb, group);
free = grp->bb_free;
@@ -2338,13 +2676,13 @@
* We also support searching for power-of-two requests only for
* requests up to the maximum buddy size we have constructed.
*/
- if (i >= sbi->s_mb_order2_reqs && i <= sb->s_blocksize_bits + 2) {
+ if (i >= sbi->s_mb_order2_reqs && i <= MB_NUM_ORDERS(sb)) {
/*
* This should tell if fe_len is exactly power of 2
*/
if ((ac->ac_g_ex.fe_len & (~(1 << (i - 1)))) == 0)
ac->ac_2order = array_index_nospec(i - 1,
- sb->s_blocksize_bits + 2);
+ MB_NUM_ORDERS(sb));
}
/* if stream allocation is enabled, use global goal */
@@ -2370,17 +2708,21 @@
* from the goal value specified
*/
group = ac->ac_g_ex.fe_group;
+ ac->ac_last_optimal_group = group;
+ ac->ac_groups_linear_remaining = sbi->s_mb_max_linear_groups;
prefetch_grp = group;
- for (i = 0; i < ngroups; group++, i++) {
- int ret = 0;
+ for (i = 0; i < ngroups; group = next_linear_group(ac, group, ngroups),
+ i++) {
+ int ret = 0, new_cr;
+
cond_resched();
- /*
- * Artificially restricted ngroups for non-extent
- * files makes group > ngroups possible on first loop.
- */
- if (group >= ngroups)
- group = 0;
+
+ ext4_mb_choose_next_group(ac, &new_cr, &group, ngroups);
+ if (new_cr != cr) {
+ cr = new_cr;
+ goto repeat;
+ }
/*
* Batch reads of the block allocation bitmaps
@@ -2445,6 +2787,9 @@
if (ac->ac_status != AC_STATUS_CONTINUE)
break;
}
+ /* Processed all groups and haven't found blocks */
+ if (sbi->s_mb_stats && i == ngroups)
+ atomic64_inc(&sbi->s_bal_cX_failed[cr]);
}
if (ac->ac_b_ex.fe_len > 0 && ac->ac_status != AC_STATUS_FOUND &&
@@ -2474,6 +2819,9 @@
goto repeat;
}
}
+
+ if (sbi->s_mb_stats && ac->ac_status == AC_STATUS_FOUND)
+ atomic64_inc(&sbi->s_bal_cX_hits[ac->ac_criteria]);
out:
if (!err && ac->ac_status != AC_STATUS_FOUND && first_err)
err = first_err;
@@ -2573,6 +2921,157 @@
.show = ext4_mb_seq_groups_show,
};
+int ext4_seq_mb_stats_show(struct seq_file *seq, void *offset)
+{
+ struct super_block *sb = (struct super_block *)seq->private;
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
+
+ seq_puts(seq, "mballoc:\n");
+ if (!sbi->s_mb_stats) {
+ seq_puts(seq, "\tmb stats collection turned off.\n");
+ seq_puts(seq, "\tTo enable, please write \"1\" to sysfs file mb_stats.\n");
+ return 0;
+ }
+ seq_printf(seq, "\treqs: %u\n", atomic_read(&sbi->s_bal_reqs));
+ seq_printf(seq, "\tsuccess: %u\n", atomic_read(&sbi->s_bal_success));
+
+ seq_printf(seq, "\tgroups_scanned: %u\n", atomic_read(&sbi->s_bal_groups_scanned));
+
+ seq_puts(seq, "\tcr0_stats:\n");
+ seq_printf(seq, "\t\thits: %llu\n", atomic64_read(&sbi->s_bal_cX_hits[0]));
+ seq_printf(seq, "\t\tgroups_considered: %llu\n",
+ atomic64_read(&sbi->s_bal_cX_groups_considered[0]));
+ seq_printf(seq, "\t\tuseless_loops: %llu\n",
+ atomic64_read(&sbi->s_bal_cX_failed[0]));
+ seq_printf(seq, "\t\tbad_suggestions: %u\n",
+ atomic_read(&sbi->s_bal_cr0_bad_suggestions));
+
+ seq_puts(seq, "\tcr1_stats:\n");
+ seq_printf(seq, "\t\thits: %llu\n", atomic64_read(&sbi->s_bal_cX_hits[1]));
+ seq_printf(seq, "\t\tgroups_considered: %llu\n",
+ atomic64_read(&sbi->s_bal_cX_groups_considered[1]));
+ seq_printf(seq, "\t\tuseless_loops: %llu\n",
+ atomic64_read(&sbi->s_bal_cX_failed[1]));
+ seq_printf(seq, "\t\tbad_suggestions: %u\n",
+ atomic_read(&sbi->s_bal_cr1_bad_suggestions));
+
+ seq_puts(seq, "\tcr2_stats:\n");
+ seq_printf(seq, "\t\thits: %llu\n", atomic64_read(&sbi->s_bal_cX_hits[2]));
+ seq_printf(seq, "\t\tgroups_considered: %llu\n",
+ atomic64_read(&sbi->s_bal_cX_groups_considered[2]));
+ seq_printf(seq, "\t\tuseless_loops: %llu\n",
+ atomic64_read(&sbi->s_bal_cX_failed[2]));
+
+ seq_puts(seq, "\tcr3_stats:\n");
+ seq_printf(seq, "\t\thits: %llu\n", atomic64_read(&sbi->s_bal_cX_hits[3]));
+ seq_printf(seq, "\t\tgroups_considered: %llu\n",
+ atomic64_read(&sbi->s_bal_cX_groups_considered[3]));
+ seq_printf(seq, "\t\tuseless_loops: %llu\n",
+ atomic64_read(&sbi->s_bal_cX_failed[3]));
+ seq_printf(seq, "\textents_scanned: %u\n", atomic_read(&sbi->s_bal_ex_scanned));
+ seq_printf(seq, "\t\tgoal_hits: %u\n", atomic_read(&sbi->s_bal_goals));
+ seq_printf(seq, "\t\t2^n_hits: %u\n", atomic_read(&sbi->s_bal_2orders));
+ seq_printf(seq, "\t\tbreaks: %u\n", atomic_read(&sbi->s_bal_breaks));
+ seq_printf(seq, "\t\tlost: %u\n", atomic_read(&sbi->s_mb_lost_chunks));
+
+ seq_printf(seq, "\tbuddies_generated: %u/%u\n",
+ atomic_read(&sbi->s_mb_buddies_generated),
+ ext4_get_groups_count(sb));
+ seq_printf(seq, "\tbuddies_time_used: %llu\n",
+ atomic64_read(&sbi->s_mb_generation_time));
+ seq_printf(seq, "\tpreallocated: %u\n",
+ atomic_read(&sbi->s_mb_preallocated));
+ seq_printf(seq, "\tdiscarded: %u\n",
+ atomic_read(&sbi->s_mb_discarded));
+ return 0;
+}
+
+static void *ext4_mb_seq_structs_summary_start(struct seq_file *seq, loff_t *pos)
+{
+ struct super_block *sb = PDE_DATA(file_inode(seq->file));
+ unsigned long position;
+
+ read_lock(&EXT4_SB(sb)->s_mb_rb_lock);
+
+ if (*pos < 0 || *pos >= MB_NUM_ORDERS(sb) + 1)
+ return NULL;
+ position = *pos + 1;
+ return (void *) ((unsigned long) position);
+}
+
+static void *ext4_mb_seq_structs_summary_next(struct seq_file *seq, void *v, loff_t *pos)
+{
+ struct super_block *sb = PDE_DATA(file_inode(seq->file));
+ unsigned long position;
+
+ ++*pos;
+ if (*pos < 0 || *pos >= MB_NUM_ORDERS(sb) + 1)
+ return NULL;
+ position = *pos + 1;
+ return (void *) ((unsigned long) position);
+}
+
+static int ext4_mb_seq_structs_summary_show(struct seq_file *seq, void *v)
+{
+ struct super_block *sb = PDE_DATA(file_inode(seq->file));
+ struct ext4_sb_info *sbi = EXT4_SB(sb);
+ unsigned long position = ((unsigned long) v);
+ struct ext4_group_info *grp;
+ struct rb_node *n;
+ unsigned int count, min, max;
+
+ position--;
+ if (position >= MB_NUM_ORDERS(sb)) {
+ seq_puts(seq, "fragment_size_tree:\n");
+ n = rb_first(&sbi->s_mb_avg_fragment_size_root);
+ if (!n) {
+ seq_puts(seq, "\ttree_min: 0\n\ttree_max: 0\n\ttree_nodes: 0\n");
+ return 0;
+ }
+ grp = rb_entry(n, struct ext4_group_info, bb_avg_fragment_size_rb);
+ min = grp->bb_fragments ? grp->bb_free / grp->bb_fragments : 0;
+ count = 1;
+ while (rb_next(n)) {
+ count++;
+ n = rb_next(n);
+ }
+ grp = rb_entry(n, struct ext4_group_info, bb_avg_fragment_size_rb);
+ max = grp->bb_fragments ? grp->bb_free / grp->bb_fragments : 0;
+
+ seq_printf(seq, "\ttree_min: %u\n\ttree_max: %u\n\ttree_nodes: %u\n",
+ min, max, count);
+ return 0;
+ }
+
+ if (position == 0) {
+ seq_printf(seq, "optimize_scan: %d\n",
+ test_opt2(sb, MB_OPTIMIZE_SCAN) ? 1 : 0);
+ seq_puts(seq, "max_free_order_lists:\n");
+ }
+ count = 0;
+ list_for_each_entry(grp, &sbi->s_mb_largest_free_orders[position],
+ bb_largest_free_order_node)
+ count++;
+ seq_printf(seq, "\tlist_order_%u_groups: %u\n",
+ (unsigned int)position, count);
+
+ return 0;
+}
+
+static void ext4_mb_seq_structs_summary_stop(struct seq_file *seq, void *v)
+{
+ struct super_block *sb = PDE_DATA(file_inode(seq->file));
+
+ read_unlock(&EXT4_SB(sb)->s_mb_rb_lock);
+}
+
+const struct seq_operations ext4_mb_seq_structs_summary_ops = {
+ .start = ext4_mb_seq_structs_summary_start,
+ .next = ext4_mb_seq_structs_summary_next,
+ .stop = ext4_mb_seq_structs_summary_stop,
+ .show = ext4_mb_seq_structs_summary_show,
+};
+
static struct kmem_cache *get_groupinfo_cache(int blocksize_bits)
{
int cache_index = blocksize_bits - EXT4_MIN_BLOCK_LOG_SIZE;
@@ -2675,7 +3174,10 @@
INIT_LIST_HEAD(&meta_group_info[i]->bb_prealloc_list);
init_rwsem(&meta_group_info[i]->alloc_sem);
meta_group_info[i]->bb_free_root = RB_ROOT;
+ INIT_LIST_HEAD(&meta_group_info[i]->bb_largest_free_order_node);
+ RB_CLEAR_NODE(&meta_group_info[i]->bb_avg_fragment_size_rb);
meta_group_info[i]->bb_largest_free_order = -1; /* uninit */
+ meta_group_info[i]->bb_group = group;
mb_group_bb_bitmap_alloc(sb, meta_group_info[i], group);
return 0;
@@ -2836,7 +3338,7 @@
unsigned max;
int ret;
- i = (sb->s_blocksize_bits + 2) * sizeof(*sbi->s_mb_offsets);
+ i = MB_NUM_ORDERS(sb) * sizeof(*sbi->s_mb_offsets);
sbi->s_mb_offsets = kmalloc(i, GFP_KERNEL);
if (sbi->s_mb_offsets == NULL) {
@@ -2844,7 +3346,7 @@
goto out;
}
- i = (sb->s_blocksize_bits + 2) * sizeof(*sbi->s_mb_maxs);
+ i = MB_NUM_ORDERS(sb) * sizeof(*sbi->s_mb_maxs);
sbi->s_mb_maxs = kmalloc(i, GFP_KERNEL);
if (sbi->s_mb_maxs == NULL) {
ret = -ENOMEM;
@@ -2870,10 +3372,30 @@
offset_incr = offset_incr >> 1;
max = max >> 1;
i++;
- } while (i <= sb->s_blocksize_bits + 1);
+ } while (i < MB_NUM_ORDERS(sb));
+
+ sbi->s_mb_avg_fragment_size_root = RB_ROOT;
+ sbi->s_mb_largest_free_orders =
+ kmalloc_array(MB_NUM_ORDERS(sb), sizeof(struct list_head),
+ GFP_KERNEL);
+ if (!sbi->s_mb_largest_free_orders) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ sbi->s_mb_largest_free_orders_locks =
+ kmalloc_array(MB_NUM_ORDERS(sb), sizeof(rwlock_t),
+ GFP_KERNEL);
+ if (!sbi->s_mb_largest_free_orders_locks) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ for (i = 0; i < MB_NUM_ORDERS(sb); i++) {
+ INIT_LIST_HEAD(&sbi->s_mb_largest_free_orders[i]);
+ rwlock_init(&sbi->s_mb_largest_free_orders_locks[i]);
+ }
+ rwlock_init(&sbi->s_mb_rb_lock);
spin_lock_init(&sbi->s_md_lock);
- spin_lock_init(&sbi->s_bal_lock);
sbi->s_mb_free_pending = 0;
INIT_LIST_HEAD(&sbi->s_freed_data_list);
@@ -2924,6 +3446,10 @@
spin_lock_init(&lg->lg_prealloc_lock);
}
+ if (blk_queue_nonrot(bdev_get_queue(sb->s_bdev)))
+ sbi->s_mb_max_linear_groups = 0;
+ else
+ sbi->s_mb_max_linear_groups = MB_DEFAULT_LINEAR_LIMIT;
/* init file for buddy data */
ret = ext4_mb_init_backend(sb);
if (ret != 0)
@@ -2935,6 +3461,8 @@
free_percpu(sbi->s_locality_groups);
sbi->s_locality_groups = NULL;
out:
+ kfree(sbi->s_mb_largest_free_orders);
+ kfree(sbi->s_mb_largest_free_orders_locks);
kfree(sbi->s_mb_offsets);
sbi->s_mb_offsets = NULL;
kfree(sbi->s_mb_maxs);
@@ -2991,6 +3519,8 @@
kvfree(group_info);
rcu_read_unlock();
}
+ kfree(sbi->s_mb_largest_free_orders);
+ kfree(sbi->s_mb_largest_free_orders_locks);
kfree(sbi->s_mb_offsets);
kfree(sbi->s_mb_maxs);
iput(sbi->s_buddy_cache);
@@ -3001,17 +3531,18 @@
atomic_read(&sbi->s_bal_reqs),
atomic_read(&sbi->s_bal_success));
ext4_msg(sb, KERN_INFO,
- "mballoc: %u extents scanned, %u goal hits, "
+ "mballoc: %u extents scanned, %u groups scanned, %u goal hits, "
"%u 2^N hits, %u breaks, %u lost",
atomic_read(&sbi->s_bal_ex_scanned),
+ atomic_read(&sbi->s_bal_groups_scanned),
atomic_read(&sbi->s_bal_goals),
atomic_read(&sbi->s_bal_2orders),
atomic_read(&sbi->s_bal_breaks),
atomic_read(&sbi->s_mb_lost_chunks));
ext4_msg(sb, KERN_INFO,
- "mballoc: %lu generated and it took %Lu",
- sbi->s_mb_buddies_generated,
- sbi->s_mb_generation_time);
+ "mballoc: %u generated and it took %llu",
+ atomic_read(&sbi->s_mb_buddies_generated),
+ atomic64_read(&sbi->s_mb_generation_time));
ext4_msg(sb, KERN_INFO,
"mballoc: %u preallocated, %u discarded",
atomic_read(&sbi->s_mb_preallocated),
@@ -3606,12 +4137,13 @@
{
struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
- if (sbi->s_mb_stats && ac->ac_g_ex.fe_len > 1) {
+ if (sbi->s_mb_stats && ac->ac_g_ex.fe_len >= 1) {
atomic_inc(&sbi->s_bal_reqs);
atomic_add(ac->ac_b_ex.fe_len, &sbi->s_bal_allocated);
if (ac->ac_b_ex.fe_len >= ac->ac_o_ex.fe_len)
atomic_inc(&sbi->s_bal_success);
atomic_add(ac->ac_found, &sbi->s_bal_ex_scanned);
+ atomic_add(ac->ac_groups_scanned, &sbi->s_bal_groups_scanned);
if (ac->ac_g_ex.fe_start == ac->ac_b_ex.fe_start &&
ac->ac_g_ex.fe_group == ac->ac_b_ex.fe_group)
atomic_inc(&sbi->s_bal_goals);
diff --git a/fs/ext4/mballoc.h b/fs/ext4/mballoc.h
index e75b474..02585e3 100644
--- a/fs/ext4/mballoc.h
+++ b/fs/ext4/mballoc.h
@@ -78,6 +78,23 @@
*/
#define MB_DEFAULT_MAX_INODE_PREALLOC 512
+/*
+ * Number of groups to search linearly before performing group scanning
+ * optimization.
+ */
+#define MB_DEFAULT_LINEAR_LIMIT 4
+
+/*
+ * Minimum number of groups that should be present in the file system to perform
+ * group scanning optimizations.
+ */
+#define MB_DEFAULT_LINEAR_SCAN_THRESHOLD 16
+
+/*
+ * Number of valid buddy orders
+ */
+#define MB_NUM_ORDERS(sb) ((sb)->s_blocksize_bits + 2)
+
struct ext4_free_data {
/* this links the free block information from sb_info */
struct list_head efd_list;
@@ -161,11 +178,14 @@
/* copy of the best found extent taken before preallocation efforts */
struct ext4_free_extent ac_f_ex;
+ ext4_group_t ac_last_optimal_group;
+ __u32 ac_groups_considered;
+ __u32 ac_flags; /* allocation hints */
__u16 ac_groups_scanned;
+ __u16 ac_groups_linear_remaining;
__u16 ac_found;
__u16 ac_tail;
__u16 ac_buddy;
- __u16 ac_flags; /* allocation hints */
__u8 ac_status;
__u8 ac_criteria;
__u8 ac_2order; /* if request is to allocate 2^N blocks and
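For a concrete feel for `MB_NUM_ORDERS`: with 4 KiB blocks (`s_blocksize_bits` = 12) there are 14 buddy orders, from a single block at order 0 up through the top of the buddy. A hypothetical helper mirroring the macro, parameterized on blocksize bits instead of a `super_block`:

```c
#include <assert.h>

/* Mirror of MB_NUM_ORDERS(sb): number of valid buddy orders for a
 * given blocksize (illustrative stand-in, not the kernel macro). */
#define MB_NUM_ORDERS_FOR_BITS(blocksize_bits) ((blocksize_bits) + 2)
```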
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index ab7baf5..8840acb 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1776,7 +1776,14 @@
memcpy (to, de, rec_len);
((struct ext4_dir_entry_2 *) to)->rec_len =
ext4_rec_len_to_disk(rec_len, blocksize);
+
+ /* wipe dir_entry excluding the rec_len field */
de->inode = 0;
+ memset(&de->name_len, 0, ext4_rec_len_from_disk(de->rec_len,
+ blocksize) -
+ offsetof(struct ext4_dir_entry_2,
+ name_len));
+
map++;
to += rec_len;
}
@@ -2101,6 +2108,7 @@
data2 = bh2->b_data;
memcpy(data2, de, len);
+ memset(de, 0, len); /* wipe old data */
de = (struct ext4_dir_entry_2 *) data2;
top = data2 + len;
while ((char *)(de2 = ext4_next_entry(de, blocksize)) < top)
@@ -2481,15 +2489,27 @@
entry_buf, buf_size, i))
return -EFSCORRUPTED;
if (de == de_del) {
- if (pde)
+ if (pde) {
pde->rec_len = ext4_rec_len_to_disk(
ext4_rec_len_from_disk(pde->rec_len,
blocksize) +
ext4_rec_len_from_disk(de->rec_len,
blocksize),
blocksize);
- else
+
+ /* wipe entire dir_entry */
+ memset(de, 0, ext4_rec_len_from_disk(de->rec_len,
+ blocksize));
+ } else {
+ /* wipe dir_entry excluding the rec_len field */
de->inode = 0;
+ memset(&de->name_len, 0,
+ ext4_rec_len_from_disk(de->rec_len,
+ blocksize) -
+ offsetof(struct ext4_dir_entry_2,
+ name_len));
+ }
+
inode_inc_iversion(dir);
return 0;
}
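The dentry-wiping pattern these namei.c hunks apply (clear everything except `rec_len`, since `rec_len` keeps the directory block's free-space chain walkable) can be sketched with a simplified stand-in for `ext4_dir_entry_2`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified ext4_dir_entry_2 layout for illustration. */
struct dir_entry {
	uint32_t inode;
	uint16_t rec_len;
	uint8_t  name_len;
	uint8_t  file_type;
	char     name[12];
};

/* Model of the patch's wipe: zero the inode, then clear everything
 * from name_len to the end of the record, leaving rec_len intact so
 * stale names don't linger but the entry chain stays valid. */
static void wipe_entry_keep_rec_len(struct dir_entry *de, size_t rec_len)
{
	de->inode = 0;
	memset(&de->name_len, 0,
	       rec_len - offsetof(struct dir_entry, name_len));
}
```

This mirrors the `memset(&de->name_len, ...)` calls above; the full-record `memset` in the merge-with-previous-entry case is the same idea starting at offset 0.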
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index c7f5b66..a105ad1 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1704,7 +1704,7 @@
Opt_dioread_nolock, Opt_dioread_lock,
Opt_discard, Opt_nodiscard, Opt_init_itable, Opt_noinit_itable,
Opt_max_dir_size_kb, Opt_nojournal_checksum, Opt_nombcache,
- Opt_prefetch_block_bitmaps,
+ Opt_no_prefetch_block_bitmaps, Opt_mb_optimize_scan,
#ifdef CONFIG_EXT4_DEBUG
Opt_fc_debug_max_replay, Opt_fc_debug_force
#endif
@@ -1787,7 +1787,6 @@
{Opt_auto_da_alloc, "auto_da_alloc"},
{Opt_noauto_da_alloc, "noauto_da_alloc"},
{Opt_dioread_nolock, "dioread_nolock"},
- {Opt_dioread_lock, "nodioread_nolock"},
{Opt_dioread_lock, "dioread_lock"},
{Opt_discard, "discard"},
{Opt_nodiscard, "nodiscard"},
@@ -1804,7 +1803,9 @@
{Opt_inlinecrypt, "inlinecrypt"},
{Opt_nombcache, "nombcache"},
{Opt_nombcache, "no_mbcache"}, /* for backward compatibility */
- {Opt_prefetch_block_bitmaps, "prefetch_block_bitmaps"},
+ {Opt_removed, "prefetch_block_bitmaps"},
+ {Opt_no_prefetch_block_bitmaps, "no_prefetch_block_bitmaps"},
+ {Opt_mb_optimize_scan, "mb_optimize_scan=%d"},
{Opt_removed, "check=none"}, /* mount option from ext2/3 */
{Opt_removed, "nocheck"}, /* mount option from ext2/3 */
{Opt_removed, "reservation"}, /* mount option from ext2/3 */
@@ -1837,6 +1838,8 @@
}
#define DEFAULT_JOURNAL_IOPRIO (IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, 3))
+#define DEFAULT_MB_OPTIMIZE_SCAN (-1)
+
static const char deprecated_msg[] =
"Mount option \"%s\" will be removed by %s\n"
"Contact linux-ext4@vger.kernel.org if you think we should keep it.\n";
@@ -2023,8 +2026,9 @@
{Opt_max_dir_size_kb, 0, MOPT_GTE0},
{Opt_test_dummy_encryption, 0, MOPT_STRING},
{Opt_nombcache, EXT4_MOUNT_NO_MBCACHE, MOPT_SET},
- {Opt_prefetch_block_bitmaps, EXT4_MOUNT_PREFETCH_BLOCK_BITMAPS,
+ {Opt_no_prefetch_block_bitmaps, EXT4_MOUNT_NO_PREFETCH_BLOCK_BITMAPS,
MOPT_SET},
+ {Opt_mb_optimize_scan, EXT4_MOUNT2_MB_OPTIMIZE_SCAN, MOPT_GTE0},
#ifdef CONFIG_EXT4_DEBUG
{Opt_fc_debug_force, EXT4_MOUNT2_JOURNAL_FAST_COMMIT,
MOPT_SET | MOPT_2 | MOPT_EXT4_ONLY},
@@ -2106,9 +2110,15 @@
return 1;
}
+struct ext4_parsed_options {
+ unsigned long journal_devnum;
+ unsigned int journal_ioprio;
+ int mb_optimize_scan;
+};
+
static int handle_mount_opt(struct super_block *sb, char *opt, int token,
- substring_t *args, unsigned long *journal_devnum,
- unsigned int *journal_ioprio, int is_remount)
+ substring_t *args, struct ext4_parsed_options *parsed_opts,
+ int is_remount)
{
struct ext4_sb_info *sbi = EXT4_SB(sb);
const struct mount_opts *m;
@@ -2265,7 +2275,7 @@
"Cannot specify journal on remount");
return -1;
}
- *journal_devnum = arg;
+ parsed_opts->journal_devnum = arg;
} else if (token == Opt_journal_path) {
char *journal_path;
struct inode *journal_inode;
@@ -2301,7 +2311,7 @@
return -1;
}
- *journal_devnum = new_encode_dev(journal_inode->i_rdev);
+ parsed_opts->journal_devnum = new_encode_dev(journal_inode->i_rdev);
path_put(&path);
kfree(journal_path);
} else if (token == Opt_journal_ioprio) {
@@ -2310,7 +2320,7 @@
" (must be 0-7)");
return -1;
}
- *journal_ioprio =
+ parsed_opts->journal_ioprio =
IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, arg);
} else if (token == Opt_test_dummy_encryption) {
return ext4_set_test_dummy_encryption(sb, opt, &args[0],
@@ -2400,6 +2410,13 @@
sbi->s_mount_opt |= m->mount_opt;
} else if (token == Opt_data_err_ignore) {
sbi->s_mount_opt &= ~m->mount_opt;
+ } else if (token == Opt_mb_optimize_scan) {
+ if (arg != 0 && arg != 1) {
+ ext4_msg(sb, KERN_WARNING,
+ "mb_optimize_scan should be set to 0 or 1.");
+ return -1;
+ }
+ parsed_opts->mb_optimize_scan = arg;
} else {
if (!args->from)
arg = 1;
@@ -2427,8 +2444,7 @@
}
static int parse_options(char *options, struct super_block *sb,
- unsigned long *journal_devnum,
- unsigned int *journal_ioprio,
+ struct ext4_parsed_options *ret_opts,
int is_remount)
{
struct ext4_sb_info __maybe_unused *sbi = EXT4_SB(sb);
@@ -2448,8 +2464,8 @@
*/
args[0].to = args[0].from = NULL;
token = match_token(p, tokens, args);
- if (handle_mount_opt(sb, p, token, args, journal_devnum,
- journal_ioprio, is_remount) < 0)
+ if (handle_mount_opt(sb, p, token, args, ret_opts,
+ is_remount) < 0)
return 0;
}
#ifdef CONFIG_QUOTA
@@ -3704,11 +3720,11 @@
elr->lr_super = sb;
elr->lr_first_not_zeroed = start;
- if (test_opt(sb, PREFETCH_BLOCK_BITMAPS))
- elr->lr_mode = EXT4_LI_MODE_PREFETCH_BBITMAP;
- else {
+ if (test_opt(sb, NO_PREFETCH_BLOCK_BITMAPS)) {
elr->lr_mode = EXT4_LI_MODE_ITABLE;
elr->lr_next_group = start;
+ } else {
+ elr->lr_mode = EXT4_LI_MODE_PREFETCH_BBITMAP;
}
/*
@@ -3739,7 +3755,7 @@
goto out;
}
- if (!test_opt(sb, PREFETCH_BLOCK_BITMAPS) &&
+ if (test_opt(sb, NO_PREFETCH_BLOCK_BITMAPS) &&
(first_not_zeroed == ngroups || sb_rdonly(sb) ||
!test_opt(sb, INIT_INODE_TABLE)))
goto out;
@@ -4013,7 +4029,6 @@
ext4_fsblk_t sb_block = get_sb_block(&data);
ext4_fsblk_t logical_sb_block;
unsigned long offset = 0;
- unsigned long journal_devnum = 0;
unsigned long def_mount_opts;
struct inode *root;
const char *descr;
@@ -4024,8 +4039,13 @@
int needs_recovery, has_huge_files;
__u64 blocks_count;
int err = 0;
- unsigned int journal_ioprio = DEFAULT_JOURNAL_IOPRIO;
ext4_group_t first_not_zeroed;
+ struct ext4_parsed_options parsed_opts;
+
+ /* Set defaults for the variables that will be set during parsing */
+ parsed_opts.journal_ioprio = DEFAULT_JOURNAL_IOPRIO;
+ parsed_opts.journal_devnum = 0;
+ parsed_opts.mb_optimize_scan = DEFAULT_MB_OPTIMIZE_SCAN;
if ((data && !orig_data) || !sbi)
goto out_free_base;
@@ -4273,8 +4293,7 @@
GFP_KERNEL);
if (!s_mount_opts)
goto failed_mount;
- if (!parse_options(s_mount_opts, sb, &journal_devnum,
- &journal_ioprio, 0)) {
+ if (!parse_options(s_mount_opts, sb, &parsed_opts, 0)) {
ext4_msg(sb, KERN_WARNING,
"failed to parse options in superblock: %s",
s_mount_opts);
@@ -4282,8 +4301,7 @@
kfree(s_mount_opts);
}
sbi->s_def_mount_opt = sbi->s_mount_opt;
- if (!parse_options((char *) data, sb, &journal_devnum,
- &journal_ioprio, 0))
+ if (!parse_options((char *) data, sb, &parsed_opts, 0))
goto failed_mount;
#ifdef CONFIG_UNICODE
@@ -4324,9 +4342,9 @@
#endif
if (test_opt(sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA) {
- printk_once(KERN_WARNING "EXT4-fs: Warning: mounting with data=journal disables delayed allocation, dioread_nolock, O_DIRECT and fast_commit support!\n");
- /* can't mount with both data=journal and dioread_nolock. */
- clear_opt(sb, DIOREAD_NOLOCK);
+ printk_once(KERN_WARNING "EXT4-fs: Warning: mounting "
+ "with data=journal disables delayed "
+ "allocation, O_DIRECT and fast_commit support!\n");
clear_opt2(sb, JOURNAL_FAST_COMMIT);
if (test_opt2(sb, EXPLICIT_DELALLOC)) {
ext4_msg(sb, KERN_ERR, "can't mount with "
@@ -4773,7 +4791,7 @@
* root first: it may be modified in the journal!
*/
if (!test_opt(sb, NOLOAD) && ext4_has_feature_journal(sb)) {
- err = ext4_load_journal(sb, es, journal_devnum);
+ err = ext4_load_journal(sb, es, parsed_opts.journal_devnum);
if (err)
goto failed_mount3a;
} else if (test_opt(sb, NOLOAD) && !sb_rdonly(sb) &&
@@ -4872,7 +4890,7 @@
goto failed_mount_wq;
}
- set_task_ioprio(sbi->s_journal->j_task, journal_ioprio);
+ set_task_ioprio(sbi->s_journal->j_task, parsed_opts.journal_ioprio);
sbi->s_journal->j_submit_inode_data_buffers =
ext4_journal_submit_inode_data_buffers;
@@ -4983,6 +5001,19 @@
ext4_fc_replay_cleanup(sb);
ext4_ext_init(sb);
+
+ /*
+ * Enable optimize_scan if number of groups is > threshold. This can be
+ * turned off by passing "mb_optimize_scan=0". This can also be
+ * turned on forcefully by passing "mb_optimize_scan=1".
+ */
+ if (parsed_opts.mb_optimize_scan == 1)
+ set_opt2(sb, MB_OPTIMIZE_SCAN);
+ else if (parsed_opts.mb_optimize_scan == 0)
+ clear_opt2(sb, MB_OPTIMIZE_SCAN);
+ else if (sbi->s_groups_count >= MB_DEFAULT_LINEAR_SCAN_THRESHOLD)
+ set_opt2(sb, MB_OPTIMIZE_SCAN);
+
err = ext4_mb_init(sb);
if (err) {
ext4_msg(sb, KERN_ERR, "failed to initialize mballoc (%d)",
@@ -5776,13 +5807,16 @@
struct ext4_mount_options old_opts;
int enable_quota = 0;
ext4_group_t g;
- unsigned int journal_ioprio = DEFAULT_JOURNAL_IOPRIO;
int err = 0;
#ifdef CONFIG_QUOTA
int i, j;
char *to_free[EXT4_MAXQUOTAS];
#endif
char *orig_data = kstrdup(data, GFP_KERNEL);
+ struct ext4_parsed_options parsed_opts;
+
+ parsed_opts.journal_ioprio = DEFAULT_JOURNAL_IOPRIO;
+ parsed_opts.journal_devnum = 0;
if (data && !orig_data)
return -ENOMEM;
@@ -5813,7 +5847,8 @@
old_opts.s_qf_names[i] = NULL;
#endif
if (sbi->s_journal && sbi->s_journal->j_task->io_context)
- journal_ioprio = sbi->s_journal->j_task->io_context->ioprio;
+ parsed_opts.journal_ioprio =
+ sbi->s_journal->j_task->io_context->ioprio;
/*
* Some options can be enabled by ext4 and/or by VFS mount flag
@@ -5823,7 +5858,7 @@
vfs_flags = SB_LAZYTIME | SB_I_VERSION;
sb->s_flags = (sb->s_flags & ~vfs_flags) | (*flags & vfs_flags);
- if (!parse_options(data, sb, NULL, &journal_ioprio, 1)) {
+ if (!parse_options(data, sb, &parsed_opts, 1)) {
err = -EINVAL;
goto restore_opts;
}
@@ -5873,7 +5908,7 @@
if (sbi->s_journal) {
ext4_init_journal_params(sb, sbi->s_journal);
- set_task_ioprio(sbi->s_journal->j_task, journal_ioprio);
+ set_task_ioprio(sbi->s_journal->j_task, parsed_opts.journal_ioprio);
}
if ((bool)(*flags & SB_RDONLY) != sb_rdonly(sb)) {
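The ext4 hunks above make mb_optimize_scan a tri-state: an explicit mb_optimize_scan=1 or =0 on the mount command line wins, and the DEFAULT_MB_OPTIMIZE_SCAN sentinel (-1, "not specified") falls back to a group-count threshold. A minimal standalone C sketch of that decision follows; the threshold value 16 is a hypothetical stand-in for MB_DEFAULT_LINEAR_SCAN_THRESHOLD (defined elsewhere, in mballoc.h), and the function name is illustrative, not kernel API:

```c
#include <assert.h>

#define DEFAULT_MB_OPTIMIZE_SCAN (-1)   /* sentinel: option not given on the command line */
#define LINEAR_SCAN_THRESHOLD    16UL   /* hypothetical stand-in for MB_DEFAULT_LINEAR_SCAN_THRESHOLD */

/* Mirrors the mount-time decision in the hunk above: an explicit
 * mb_optimize_scan=0/1 always wins; otherwise filesystems with many
 * block groups opt in automatically. */
static int mb_optimize_scan_enabled(int parsed_opt, unsigned long groups_count)
{
	if (parsed_opt == 1)
		return 1;
	if (parsed_opt == 0)
		return 0;
	/* parsed_opt == DEFAULT_MB_OPTIMIZE_SCAN: threshold-based default */
	return groups_count >= LINEAR_SCAN_THRESHOLD;
}
```

Note that on remount the patch leaves mb_optimize_scan untouched unless the option is passed again, which is why only set-file-time defaults go through this path.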
diff --git a/fs/ext4/sysfs.c b/fs/ext4/sysfs.c
index f24bef3..68c772a 100644
--- a/fs/ext4/sysfs.c
+++ b/fs/ext4/sysfs.c
@@ -221,6 +221,7 @@
EXT4_RW_ATTR_SBI_UI(mb_stream_req, s_mb_stream_request);
EXT4_RW_ATTR_SBI_UI(mb_group_prealloc, s_mb_group_prealloc);
EXT4_RW_ATTR_SBI_UI(mb_max_inode_prealloc, s_mb_max_inode_prealloc);
+EXT4_RW_ATTR_SBI_UI(mb_max_linear_groups, s_mb_max_linear_groups);
EXT4_RW_ATTR_SBI_UI(extent_max_zeroout_kb, s_extent_max_zeroout_kb);
EXT4_ATTR(trigger_fs_error, 0200, trigger_test_error);
EXT4_RW_ATTR_SBI_UI(err_ratelimit_interval_ms, s_err_ratelimit_state.interval);
@@ -269,6 +270,7 @@
ATTR_LIST(mb_stream_req),
ATTR_LIST(mb_group_prealloc),
ATTR_LIST(mb_max_inode_prealloc),
+ ATTR_LIST(mb_max_linear_groups),
ATTR_LIST(max_writeback_mb_bump),
ATTR_LIST(extent_max_zeroout_kb),
ATTR_LIST(trigger_fs_error),
@@ -534,6 +536,10 @@
ext4_fc_info_show, sb);
proc_create_seq_data("mb_groups", S_IRUGO, sbi->s_proc,
&ext4_mb_seq_groups_ops, sb);
+ proc_create_single_data("mb_stats", 0444, sbi->s_proc,
+ ext4_seq_mb_stats_show, sb);
+ proc_create_seq_data("mb_structs_summary", 0444, sbi->s_proc,
+ &ext4_mb_seq_structs_summary_ops, sb);
}
return 0;
}
diff --git a/fs/file_table.c b/fs/file_table.c
index 709ada3..fbd45a1a 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -279,6 +279,7 @@
}
if (file->f_op->release)
file->f_op->release(inode, file);
+ security_file_pre_free(file);
if (unlikely(S_ISCHR(inode->i_mode) && inode->i_cdev != NULL &&
!(mode & FMODE_PATH))) {
cdev_put(inode->i_cdev);
diff --git a/fs/io-wq.c b/fs/io-wq.c
index f72d538..fe3b0f9 100644
--- a/fs/io-wq.c
+++ b/fs/io-wq.c
@@ -500,8 +500,10 @@
current->signal->rlim[RLIMIT_FSIZE].rlim_cur = RLIM_INFINITY;
io_wq_switch_blkcg(worker, work);
#ifdef CONFIG_AUDIT
- current->loginuid = work->identity->loginuid;
- current->sessionid = work->identity->sessionid;
+ if (current->audit) {
+ current->audit->loginuid = work->identity->loginuid;
+ current->audit->sessionid = work->identity->sessionid;
+ }
#endif
}
@@ -516,8 +518,10 @@
}
#ifdef CONFIG_AUDIT
- current->loginuid = KUIDT_INIT(AUDIT_UID_UNSET);
- current->sessionid = AUDIT_SID_UNSET;
+ if (current->audit) {
+ current->audit->loginuid = KUIDT_INIT(AUDIT_UID_UNSET);
+ current->audit->sessionid = AUDIT_SID_UNSET;
+ }
#endif
spin_lock_irq(&worker->lock);
diff --git a/fs/io_uring.c b/fs/io_uring.c
index 369ec81..f2ed92d 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1124,8 +1124,8 @@
id->fs = current->fs;
id->fsize = rlimit(RLIMIT_FSIZE);
#ifdef CONFIG_AUDIT
- id->loginuid = current->loginuid;
- id->sessionid = current->sessionid;
+ id->loginuid = audit_get_loginuid(current);
+ id->sessionid = audit_get_sessionid(current);
#endif
refcount_set(&id->count, 1);
}
@@ -1378,8 +1378,8 @@
req->work.flags |= IO_WQ_WORK_CREDS;
}
#ifdef CONFIG_AUDIT
- if (!uid_eq(current->loginuid, id->loginuid) ||
- current->sessionid != id->sessionid)
+ if (!uid_eq(audit_get_loginuid(current), id->loginuid) ||
+ audit_get_sessionid(current) != id->sessionid)
return false;
#endif
if (!(req->work.flags & IO_WQ_WORK_FS) &&
@@ -6903,8 +6903,10 @@
}
io_sq_thread_associate_blkcg(ctx, &cur_css);
#ifdef CONFIG_AUDIT
- current->loginuid = ctx->loginuid;
- current->sessionid = ctx->sessionid;
+ if (current->audit) {
+ current->audit->loginuid = ctx->loginuid;
+ current->audit->sessionid = ctx->sessionid;
+ }
#endif
ret |= __io_sq_thread(ctx, start_jiffies, cap_entries);
@@ -9405,8 +9407,8 @@
ctx->user = user;
ctx->creds = get_current_cred();
#ifdef CONFIG_AUDIT
- ctx->loginuid = current->loginuid;
- ctx->sessionid = current->sessionid;
+ ctx->loginuid = audit_get_loginuid(current);
+ ctx->sessionid = audit_get_sessionid(current);
#endif
ctx->sqo_task = get_task_struct(current);
diff --git a/fs/jbd2/journal.c b/fs/jbd2/journal.c
index 188f79d..2dc9444 100644
--- a/fs/jbd2/journal.c
+++ b/fs/jbd2/journal.c
@@ -1869,9 +1869,7 @@
if (jbd2_has_feature_fast_commit(journal)) {
journal->j_fc_last = be32_to_cpu(sb->s_maxlen);
- num_fc_blocks = be32_to_cpu(sb->s_num_fc_blks);
- if (!num_fc_blocks)
- num_fc_blocks = JBD2_MIN_FC_BLOCKS;
+ num_fc_blocks = jbd2_journal_get_num_fc_blks(sb);
if (journal->j_last - num_fc_blocks >= JBD2_MIN_JOURNAL_BLOCKS)
journal->j_last = journal->j_fc_last - num_fc_blocks;
journal->j_fc_first = journal->j_last + 1;
@@ -2102,9 +2100,7 @@
journal_superblock_t *sb = journal->j_superblock;
unsigned long long num_fc_blks;
- num_fc_blks = be32_to_cpu(sb->s_num_fc_blks);
- if (num_fc_blks == 0)
- num_fc_blks = JBD2_MIN_FC_BLOCKS;
+ num_fc_blks = jbd2_journal_get_num_fc_blks(sb);
if (journal->j_last - num_fc_blks < JBD2_MIN_JOURNAL_BLOCKS)
return -ENOSPC;
diff --git a/fs/proc/Kconfig b/fs/proc/Kconfig
index c930001..c54bc40 100644
--- a/fs/proc/Kconfig
+++ b/fs/proc/Kconfig
@@ -107,3 +107,11 @@
config PROC_CPU_RESCTRL
def_bool n
depends on PROC_FS
+
+config PROC_SELF_MEM_READONLY
+ bool "Force /proc/<pid>/mem paths to be read-only"
+ default y
+ help
+ When enabled, attempts to open /proc/self/mem for write access
+ will always fail. Write access to this file can be used to bypass
+ memory map permissions (for example, to modify read-only code).
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 297ea12..e5ef64e 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -150,6 +150,12 @@
NULL, &proc_pid_attr_operations, \
{ .lsm = LSM })
+#ifdef CONFIG_PROC_SELF_MEM_READONLY
+# define PROC_PID_MEM_MODE S_IRUSR
+#else
+# define PROC_PID_MEM_MODE (S_IRUSR|S_IWUSR)
+#endif
+
/*
* Count the number of hardlinks for the pid_entry table, excluding the .
* and .. links.
@@ -896,7 +902,11 @@
static ssize_t mem_write(struct file *file, const char __user *buf,
size_t count, loff_t *ppos)
{
+#ifdef CONFIG_PROC_SELF_MEM_READONLY
+ return -EACCES;
+#else
return mem_rw(file, (char __user*)buf, count, ppos, 1);
+#endif
}
loff_t mem_lseek(struct file *file, loff_t offset, int orig)
@@ -3194,7 +3204,7 @@
#ifdef CONFIG_NUMA
REG("numa_maps", S_IRUGO, proc_pid_numa_maps_operations),
#endif
- REG("mem", S_IRUSR|S_IWUSR, proc_mem_operations),
+ REG("mem", PROC_PID_MEM_MODE, proc_mem_operations),
LNK("cwd", proc_cwd_link),
LNK("root", proc_root_link),
LNK("exe", proc_exe_link),
@@ -3534,7 +3544,7 @@
#ifdef CONFIG_NUMA
REG("numa_maps", S_IRUGO, proc_pid_numa_maps_operations),
#endif
- REG("mem", S_IRUSR|S_IWUSR, proc_mem_operations),
+ REG("mem", PROC_PID_MEM_MODE, proc_mem_operations),
LNK("cwd", proc_cwd_link),
LNK("root", proc_root_link),
LNK("exe", proc_exe_link),
diff --git a/include/linux/audit.h b/include/linux/audit.h
index b3d8598..3d5b243 100644
--- a/include/linux/audit.h
+++ b/include/linux/audit.h
@@ -118,6 +118,17 @@
AUDIT_NFT_OP_INVALID,
};
+struct audit_task_info {
+ kuid_t loginuid;
+ unsigned int sessionid;
+ u64 contid;
+#ifdef CONFIG_AUDITSYSCALL
+ struct audit_context *ctx;
+#endif
+};
+
+extern struct audit_task_info init_struct_audit;
+
extern int is_audit_feature_set(int which);
extern int __init audit_register_class(int class, unsigned *list);
@@ -154,6 +165,9 @@
#ifdef CONFIG_AUDIT
/* These are defined in audit.c */
/* Public API */
+extern int audit_alloc(struct task_struct *task);
+extern void audit_free(struct task_struct *task);
+extern void __init audit_task_init(void);
extern __printf(4, 5)
void audit_log(struct audit_context *ctx, gfp_t gfp_mask, int type,
const char *fmt, ...);
@@ -197,12 +211,25 @@
static inline kuid_t audit_get_loginuid(struct task_struct *tsk)
{
- return tsk->loginuid;
+ if (!tsk->audit)
+ return INVALID_UID;
+ return tsk->audit->loginuid;
}
static inline unsigned int audit_get_sessionid(struct task_struct *tsk)
{
- return tsk->sessionid;
+ if (!tsk->audit)
+ return AUDIT_SID_UNSET;
+ return tsk->audit->sessionid;
+}
+
+extern int audit_set_contid(struct task_struct *tsk, u64 contid);
+
+static inline u64 audit_get_contid(struct task_struct *tsk)
+{
+ if (!tsk->audit)
+ return AUDIT_CID_UNSET;
+ return tsk->audit->contid;
}
extern u32 audit_enabled;
@@ -210,6 +237,14 @@
extern int audit_signal_info(int sig, struct task_struct *t);
#else /* CONFIG_AUDIT */
+static inline int audit_alloc(struct task_struct *task)
+{
+ return 0;
+}
+static inline void audit_free(struct task_struct *task)
+{ }
+static inline void __init audit_task_init(void)
+{ }
static inline __printf(4, 5)
void audit_log(struct audit_context *ctx, gfp_t gfp_mask, int type,
const char *fmt, ...)
@@ -261,6 +296,11 @@
return AUDIT_SID_UNSET;
}
+static inline u64 audit_get_contid(struct task_struct *tsk)
+{
+ return AUDIT_CID_UNSET;
+}
+
#define audit_enabled AUDIT_OFF
static inline int audit_signal_info(int sig, struct task_struct *t)
@@ -285,8 +325,6 @@
/* These are defined in auditsc.c */
/* Public API */
-extern int audit_alloc(struct task_struct *task);
-extern void __audit_free(struct task_struct *task);
extern void __audit_syscall_entry(int major, unsigned long a0, unsigned long a1,
unsigned long a2, unsigned long a3);
extern void __audit_syscall_exit(int ret_success, long ret_value);
@@ -306,12 +344,14 @@
static inline void audit_set_context(struct task_struct *task, struct audit_context *ctx)
{
- task->audit_context = ctx;
+ task->audit->ctx = ctx;
}
static inline struct audit_context *audit_context(void)
{
- return current->audit_context;
+ if (!current->audit)
+ return NULL;
+ return current->audit->ctx;
}
static inline bool audit_dummy_context(void)
@@ -319,11 +359,7 @@
void *p = audit_context();
return !p || *(int *)p;
}
-static inline void audit_free(struct task_struct *task)
-{
- if (unlikely(task->audit_context))
- __audit_free(task);
-}
+
static inline void audit_syscall_entry(int major, unsigned long a0,
unsigned long a1, unsigned long a2,
unsigned long a3)
@@ -556,12 +592,6 @@
extern int audit_n_rules;
extern int audit_signals;
#else /* CONFIG_AUDITSYSCALL */
-static inline int audit_alloc(struct task_struct *task)
-{
- return 0;
-}
-static inline void audit_free(struct task_struct *task)
-{ }
static inline void audit_syscall_entry(int major, unsigned long a0,
unsigned long a1, unsigned long a2,
unsigned long a3)
@@ -694,4 +724,19 @@
return uid_valid(audit_get_loginuid(tsk));
}
+static inline bool audit_contid_valid(u64 contid)
+{
+ return contid != AUDIT_CID_UNSET;
+}
+
+static inline bool audit_contid_set(struct task_struct *tsk)
+{
+ return audit_contid_valid(audit_get_contid(tsk));
+}
+
+static inline void audit_log_string(struct audit_buffer *ab, const char *buf)
+{
+ audit_log_n_string(ab, buf, strlen(buf));
+}
+
#endif
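The audit.h changes above move loginuid/sessionid out of task_struct into a separately allocated struct audit_task_info, so every accessor now has to tolerate a NULL tsk->audit and return the "unset" sentinel. A self-contained C sketch of that accessor pattern (struct layouts simplified; only the shape of the NULL check matters):

```c
#include <assert.h>
#include <stddef.h>

#define AUDIT_SID_UNSET ((unsigned int)-1)

/* Simplified stand-ins for the kernel structures in the hunk above. */
struct audit_task_info {
	unsigned int sessionid;
};

struct task_struct {
	struct audit_task_info *audit;  /* may be NULL if allocation failed */
};

/* Same shape as the reworked audit_get_sessionid(): fall back to the
 * "unset" sentinel when the per-task audit blob was never allocated. */
static unsigned int audit_get_sessionid(const struct task_struct *tsk)
{
	if (!tsk->audit)
		return AUDIT_SID_UNSET;
	return tsk->audit->sessionid;
}
```

The io_uring/io-wq hunks earlier in this diff follow the same rule: direct current->loginuid accesses become guarded current->audit->loginuid accesses or calls to these sentinel-returning helpers.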
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 542471b..2d91f2aa 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -471,6 +471,7 @@
unsigned int dma_pad_mask;
unsigned int dma_alignment;
+ unsigned int split_alignment;
#ifdef CONFIG_BLK_INLINE_ENCRYPTION
/* Inline crypto capabilities */
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 8ad8191..6a86d65 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -311,6 +311,7 @@
RET_PTR_TO_BTF_ID_OR_NULL, /* returns a pointer to a btf_id or NULL */
RET_PTR_TO_MEM_OR_BTF_ID_OR_NULL, /* returns a pointer to a valid memory or a btf_id or NULL */
RET_PTR_TO_MEM_OR_BTF_ID, /* returns a pointer to a valid memory or a btf_id */
+ RET_PTR_TO_BTF_ID, /* returns a pointer to a btf_id */
};
/* eBPF function prototype used by verifier to allow BPF_CALLs from eBPF programs
@@ -1860,6 +1861,8 @@
extern const struct bpf_func_proto bpf_snprintf_btf_proto;
extern const struct bpf_func_proto bpf_per_cpu_ptr_proto;
extern const struct bpf_func_proto bpf_this_cpu_ptr_proto;
+extern const struct bpf_func_proto bpf_ktime_get_coarse_ns_proto;
+extern const struct bpf_func_proto bpf_get_socket_ptr_cookie_proto;
const struct bpf_func_proto *bpf_tracing_func_proto(
enum bpf_func_id func_id, const struct bpf_prog *prog);
diff --git a/include/linux/bpf_lsm.h b/include/linux/bpf_lsm.h
index aaacb6a..0d1c33a 100644
--- a/include/linux/bpf_lsm.h
+++ b/include/linux/bpf_lsm.h
@@ -7,6 +7,7 @@
#ifndef _LINUX_BPF_LSM_H
#define _LINUX_BPF_LSM_H
+#include <linux/sched.h>
#include <linux/bpf.h>
#include <linux/lsm_hooks.h>
@@ -26,6 +27,8 @@
int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
const struct bpf_prog *prog);
+bool bpf_lsm_is_sleepable_hook(u32 btf_id);
+
static inline struct bpf_storage_blob *bpf_inode(
const struct inode *inode)
{
@@ -35,12 +38,29 @@
return inode->i_security + bpf_lsm_blob_sizes.lbs_inode;
}
+static inline struct bpf_storage_blob *bpf_task(
+ const struct task_struct *task)
+{
+ if (unlikely(!task->security))
+ return NULL;
+
+ return task->security + bpf_lsm_blob_sizes.lbs_task;
+}
+
extern const struct bpf_func_proto bpf_inode_storage_get_proto;
extern const struct bpf_func_proto bpf_inode_storage_delete_proto;
+extern const struct bpf_func_proto bpf_task_storage_get_proto;
+extern const struct bpf_func_proto bpf_task_storage_delete_proto;
void bpf_inode_storage_free(struct inode *inode);
+void bpf_task_storage_free(struct task_struct *task);
#else /* !CONFIG_BPF_LSM */
+static inline bool bpf_lsm_is_sleepable_hook(u32 btf_id)
+{
+ return false;
+}
+
static inline int bpf_lsm_verify_prog(struct bpf_verifier_log *vlog,
const struct bpf_prog *prog)
{
@@ -53,10 +73,20 @@
return NULL;
}
+static inline struct bpf_storage_blob *bpf_task(
+ const struct task_struct *task)
+{
+ return NULL;
+}
+
static inline void bpf_inode_storage_free(struct inode *inode)
{
}
+static inline void bpf_task_storage_free(struct task_struct *task)
+{
+}
+
#endif /* CONFIG_BPF_LSM */
#endif /* _LINUX_BPF_LSM_H */
diff --git a/include/linux/bpf_types.h b/include/linux/bpf_types.h
index 2e6f568..99f7fd6 100644
--- a/include/linux/bpf_types.h
+++ b/include/linux/bpf_types.h
@@ -109,6 +109,7 @@
#endif
#ifdef CONFIG_BPF_LSM
BPF_MAP_TYPE(BPF_MAP_TYPE_INODE_STORAGE, inode_storage_map_ops)
+BPF_MAP_TYPE(BPF_MAP_TYPE_TASK_STORAGE, task_storage_map_ops)
#endif
BPF_MAP_TYPE(BPF_MAP_TYPE_CPUMAP, cpu_map_ops)
#if defined(CONFIG_XDP_SOCKETS)
diff --git a/include/linux/filter.h b/include/linux/filter.h
index e2ffa02..299b2f0 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -259,15 +259,32 @@
.off = OFF, \
.imm = 0 })
-/* Atomic memory add, *(uint *)(dst_reg + off16) += src_reg */
-#define BPF_STX_XADD(SIZE, DST, SRC, OFF) \
+/*
+ * Atomic operations:
+ *
+ * BPF_ADD *(uint *) (dst_reg + off16) += src_reg
+ * BPF_AND *(uint *) (dst_reg + off16) &= src_reg
+ * BPF_OR *(uint *) (dst_reg + off16) |= src_reg
+ * BPF_XOR *(uint *) (dst_reg + off16) ^= src_reg
+ * BPF_ADD | BPF_FETCH src_reg = atomic_fetch_add(dst_reg + off16, src_reg);
+ * BPF_AND | BPF_FETCH src_reg = atomic_fetch_and(dst_reg + off16, src_reg);
+ * BPF_OR | BPF_FETCH src_reg = atomic_fetch_or(dst_reg + off16, src_reg);
+ * BPF_XOR | BPF_FETCH src_reg = atomic_fetch_xor(dst_reg + off16, src_reg);
+ * BPF_XCHG src_reg = atomic_xchg(dst_reg + off16, src_reg)
+ * BPF_CMPXCHG r0 = atomic_cmpxchg(dst_reg + off16, r0, src_reg)
+ */
+
+#define BPF_ATOMIC_OP(SIZE, OP, DST, SRC, OFF) \
((struct bpf_insn) { \
- .code = BPF_STX | BPF_SIZE(SIZE) | BPF_XADD, \
+ .code = BPF_STX | BPF_SIZE(SIZE) | BPF_ATOMIC, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = OFF, \
- .imm = 0 })
+ .imm = OP })
+
+/* Legacy alias */
+#define BPF_STX_XADD(SIZE, DST, SRC, OFF) BPF_ATOMIC_OP(SIZE, BPF_ADD, DST, SRC, OFF)
/* Memory store, *(uint *) (dst_reg + off16) = imm32 */
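The filter.h hunk above generalizes BPF_STX_XADD into BPF_ATOMIC_OP, with the atomic operation carried in the instruction's immediate field rather than the opcode. A small userspace C sketch of that encoding follows; the numeric constants mirror the uapi values, but the struct layout is a simplified stand-in for the real struct bpf_insn:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins mirroring the uapi encoding constants. */
#define BPF_STX      0x03
#define BPF_DW       0x18
#define BPF_ATOMIC   0xc0   /* op type lives in the immediate */
#define BPF_ADD      0x00
#define BPF_SIZE(x)  ((x) & 0x18)

/* Simplified instruction layout (the real one packs regs into bitfields). */
struct bpf_insn {
	uint8_t  code;
	uint8_t  dst_reg, src_reg;
	int16_t  off;
	int32_t  imm;
};

/* BPF_ATOMIC_OP puts the atomic op (BPF_ADD, BPF_XCHG, ...) in imm;
 * the legacy BPF_STX_XADD is just BPF_ATOMIC_OP(..., BPF_ADD, ...). */
static struct bpf_insn bpf_atomic_op(int size, int op, int dst, int src, int off)
{
	return (struct bpf_insn){
		.code = BPF_STX | BPF_SIZE(size) | BPF_ATOMIC,
		.dst_reg = dst, .src_reg = src,
		.off = off, .imm = op,
	};
}
```

Because BPF_ATOMIC reuses the 0xc0 opcode slot and a plain add keeps imm == 0, old XADD-encoded programs remain valid under the new scheme.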
diff --git a/include/linux/ima.h b/include/linux/ima.h
index 8fa7bcf..7233a27 100644
--- a/include/linux/ima.h
+++ b/include/linux/ima.h
@@ -29,6 +29,7 @@
enum kernel_read_file_id id);
extern void ima_post_path_mknod(struct dentry *dentry);
extern int ima_file_hash(struct file *file, char *buf, size_t buf_size);
+extern int ima_inode_hash(struct inode *inode, char *buf, size_t buf_size);
extern void ima_kexec_cmdline(int kernel_fd, const void *buf, int size);
#ifdef CONFIG_IMA_KEXEC
@@ -115,6 +116,11 @@
return -EOPNOTSUPP;
}
+static inline int ima_inode_hash(struct inode *inode, char *buf, size_t buf_size)
+{
+ return -EOPNOTSUPP;
+}
+
static inline void ima_kexec_cmdline(int kernel_fd, const void *buf, int size) {}
#endif /* CONFIG_IMA */
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index 578ff19..096ed3f 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -68,7 +68,7 @@
extern void jbd2_free(void *ptr, size_t size);
#define JBD2_MIN_JOURNAL_BLOCKS 1024
-#define JBD2_MIN_FC_BLOCKS 256
+#define JBD2_DEFAULT_FAST_COMMIT_BLOCKS 256
#ifdef __KERNEL__
@@ -1691,6 +1691,13 @@
return journal->j_chksum_driver != NULL;
}
+static inline int jbd2_journal_get_num_fc_blks(journal_superblock_t *jsb)
+{
+ int num_fc_blocks = be32_to_cpu(jsb->s_num_fc_blks);
+
+ return num_fc_blocks ? num_fc_blocks : JBD2_DEFAULT_FAST_COMMIT_BLOCKS;
+}
+
/*
* Return number of free blocks in the log. Must be called under j_state_lock.
*/
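The jbd2.h hunk above factors the duplicated "zero means default" logic from journal.c into jbd2_journal_get_num_fc_blks(). Its fallback behavior can be sketched in a few lines of standalone C (byte-swapping of the on-disk big-endian field is elided here, and the function name is illustrative):

```c
#include <assert.h>

#define JBD2_DEFAULT_FAST_COMMIT_BLOCKS 256

/* Same fallback as jbd2_journal_get_num_fc_blks(): a zero count in the
 * journal superblock means "use the compiled-in default". */
static int get_num_fc_blks(int on_disk_count)
{
	return on_disk_count ? on_disk_count : JBD2_DEFAULT_FAST_COMMIT_BLOCKS;
}
```

Both former open-coded sites in fs/jbd2/journal.c (journal load and fast-commit setup) now call the shared helper, so the default can no longer drift between them.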
diff --git a/include/linux/lsm_hook_defs.h b/include/linux/lsm_hook_defs.h
index 32a9401..4cd6eb3 100644
--- a/include/linux/lsm_hook_defs.h
+++ b/include/linux/lsm_hook_defs.h
@@ -156,6 +156,7 @@
LSM_HOOK(int, 0, file_permission, struct file *file, int mask)
LSM_HOOK(int, 0, file_alloc_security, struct file *file)
LSM_HOOK(void, LSM_RET_VOID, file_free_security, struct file *file)
+LSM_HOOK(void, LSM_RET_VOID, file_pre_free_security, struct file *file)
LSM_HOOK(int, 0, file_ioctl, struct file *file, unsigned int cmd,
unsigned long arg)
LSM_HOOK(int, 0, mmap_addr, unsigned long addr)
@@ -173,6 +174,7 @@
LSM_HOOK(int, 0, file_open, struct file *file)
LSM_HOOK(int, 0, task_alloc, struct task_struct *task,
unsigned long clone_flags)
+LSM_HOOK(void, LSM_RET_VOID, task_post_alloc, struct task_struct *task)
LSM_HOOK(void, LSM_RET_VOID, task_free, struct task_struct *task)
LSM_HOOK(int, 0, cred_alloc_blank, struct cred *cred, gfp_t gfp)
LSM_HOOK(void, LSM_RET_VOID, cred_free, struct cred *cred)
@@ -211,6 +213,7 @@
LSM_HOOK(int, 0, task_movememory, struct task_struct *p)
LSM_HOOK(int, 0, task_kill, struct task_struct *p, struct kernel_siginfo *info,
int sig, const struct cred *cred)
+LSM_HOOK(void, LSM_RET_VOID, task_exit, struct task_struct *p)
LSM_HOOK(int, -ENOSYS, task_prctl, int option, unsigned long arg2,
unsigned long arg3, unsigned long arg4, unsigned long arg5)
LSM_HOOK(void, LSM_RET_VOID, task_to_inode, struct task_struct *p,
diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h
index c503f7a..fa18e56 100644
--- a/include/linux/lsm_hooks.h
+++ b/include/linux/lsm_hooks.h
@@ -513,6 +513,10 @@
* @file_free_security:
* Deallocate and free any security structures stored in file->f_security.
* @file contains the file structure being modified.
+ * @file_pre_free_security:
+ * Perform any logging or LSM state updates for a file being deleted
+ * using fields of the file before they have been cleared.
+ * @file contains the file structure being freed.
* @file_ioctl:
* @file contains the file structure.
* @cmd contains the operation to perform.
@@ -589,6 +593,10 @@
* @clone_flags contains the flags indicating what should be shared.
* Handle allocation of task-related resources.
* Returns a zero on success, negative values on failure.
+ * @task_post_alloc:
+ * @task task being allocated.
+ * Handle allocation of task-related resources after all task fields are
+ * filled in.
* @task_free:
* @task task about to be freed.
* Handle release of task-related resources. (Note that this can be called
@@ -759,6 +767,9 @@
* @cred contains the cred of the process where the signal originated, or
* NULL if the current task is the originator.
* Return 0 if permission is granted.
+ * @task_exit:
+ * Called early when a task is exiting before all state is lost.
+ * @p contains the task_struct for the process.
* @task_prctl:
* Check permission before performing a process control operation on the
* current process.
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index 5a5cb45..1a1ee88 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -33,6 +33,9 @@
struct ucounts *ucounts;
int reboot; /* group exit code if this pidns was rebooted */
struct ns_common ns;
+#ifdef CONFIG_SECURITY_CONTAINER_MONITOR
+ u64 cid; /* Main container identifier, zero if not assigned. */
+#endif
} __randomize_layout;
extern struct pid_namespace init_pid_ns;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 76cd21f..52893c2 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -36,7 +36,6 @@
#include <linux/kcsan.h>
/* task_struct member predeclarations (sorted alphabetically): */
-struct audit_context;
struct backing_dev_info;
struct bio_list;
struct blk_plug;
@@ -980,11 +979,7 @@
struct callback_head *task_works;
#ifdef CONFIG_AUDIT
-#ifdef CONFIG_AUDITSYSCALL
- struct audit_context *audit_context;
-#endif
- kuid_t loginuid;
- unsigned int sessionid;
+ struct audit_task_info *audit;
#endif
struct seccomp seccomp;
@@ -1757,6 +1752,12 @@
extern int wake_up_process(struct task_struct *tsk);
extern void wake_up_new_task(struct task_struct *tsk);
+/*
+ * Wake up tsk and try to swap it into the current task's place, which
+ * initially means just trying to migrate it to the current CPU.
+ */
+extern int wake_up_swap(struct task_struct *tsk);
+
#ifdef CONFIG_SMP
extern void kick_process(struct task_struct *tsk);
#else
diff --git a/include/linux/security.h b/include/linux/security.h
index 3964262..19a47c8 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -364,6 +364,7 @@
int security_file_permission(struct file *file, int mask);
int security_file_alloc(struct file *file);
void security_file_free(struct file *file);
+void security_file_pre_free(struct file *file);
int security_file_ioctl(struct file *file, unsigned int cmd, unsigned long arg);
int security_mmap_file(struct file *file, unsigned long prot,
unsigned long flags);
@@ -378,6 +379,7 @@
int security_file_receive(struct file *file);
int security_file_open(struct file *file);
int security_task_alloc(struct task_struct *task, unsigned long clone_flags);
+void security_task_post_alloc(struct task_struct *task);
void security_task_free(struct task_struct *task);
int security_cred_alloc_blank(struct cred *cred, gfp_t gfp);
void security_cred_free(struct cred *cred);
@@ -415,6 +417,7 @@
int security_task_movememory(struct task_struct *p);
int security_task_kill(struct task_struct *p, struct kernel_siginfo *info,
int sig, const struct cred *cred);
+void security_task_exit(struct task_struct *p);
int security_task_prctl(int option, unsigned long arg2, unsigned long arg3,
unsigned long arg4, unsigned long arg5);
void security_task_to_inode(struct task_struct *p, struct inode *inode);
@@ -916,6 +919,9 @@
static inline void security_file_free(struct file *file)
{ }
+static inline void security_file_pre_free(struct file *file)
+{ }
+
static inline int security_file_ioctl(struct file *file, unsigned int cmd,
unsigned long arg)
{
@@ -979,6 +985,9 @@
return 0;
}
+static inline void security_task_post_alloc(struct task_struct *task)
+{ }
+
static inline void security_task_free(struct task_struct *task)
{ }
@@ -1129,6 +1138,9 @@
return 0;
}
+static inline void security_task_exit(struct task_struct *p)
+{ }
+
static inline int security_task_prctl(int option, unsigned long arg2,
unsigned long arg3,
unsigned long arg4,
diff --git a/include/uapi/linux/audit.h b/include/uapi/linux/audit.h
index cd2d827..26d65d0 100644
--- a/include/uapi/linux/audit.h
+++ b/include/uapi/linux/audit.h
@@ -71,6 +71,7 @@
#define AUDIT_TTY_SET 1017 /* Set TTY auditing status */
#define AUDIT_SET_FEATURE 1018 /* Turn an audit feature on or off */
#define AUDIT_GET_FEATURE 1019 /* Get which features are enabled */
+#define AUDIT_CONTAINER_OP 1020 /* Define the container id and info */
#define AUDIT_FIRST_USER_MSG 1100 /* Userspace messages mostly uninteresting to kernel */
#define AUDIT_USER_AVC 1107 /* We filter this differently */
@@ -495,6 +496,7 @@
#define AUDIT_UID_UNSET (unsigned int)-1
#define AUDIT_SID_UNSET ((unsigned int)-1)
+#define AUDIT_CID_UNSET ((u64)-1)
/* audit_rule_data supports filter rules with both integer and string
* fields. It corresponds with AUDIT_ADD_RULE, AUDIT_DEL_RULE and
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 556216dc..1276953 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -19,7 +19,8 @@
/* ld/ldx fields */
#define BPF_DW 0x18 /* double word (64-bit) */
-#define BPF_XADD 0xc0 /* exclusive add */
+#define BPF_ATOMIC 0xc0 /* atomic memory ops - op type in immediate */
+#define BPF_XADD 0xc0 /* exclusive add - legacy name */
/* alu/jmp fields */
#define BPF_MOV 0xb0 /* mov reg to reg */
@@ -43,6 +44,11 @@
#define BPF_CALL 0x80 /* function call */
#define BPF_EXIT 0x90 /* function return */
+/* atomic op type fields (stored in immediate) */
+#define BPF_FETCH 0x01 /* not an opcode on its own, used to build others */
+#define BPF_XCHG (0xe0 | BPF_FETCH) /* atomic exchange */
+#define BPF_CMPXCHG (0xf0 | BPF_FETCH) /* atomic compare-and-write */
+
/* Register numbers */
enum {
BPF_REG_0 = 0,
@@ -157,6 +163,7 @@
BPF_MAP_TYPE_STRUCT_OPS,
BPF_MAP_TYPE_RINGBUF,
BPF_MAP_TYPE_INODE_STORAGE,
+ BPF_MAP_TYPE_TASK_STORAGE,
};
/* Note that tracing related programs such as
@@ -1661,6 +1668,14 @@
* Return
* A 8-byte long non-decreasing number.
*
+ * u64 bpf_get_socket_cookie(struct sock *sk)
+ * Description
+ * Equivalent to **bpf_get_socket_cookie**\ () helper that accepts
+ * *sk*, but gets socket from a BTF **struct sock**. This helper
+ * also works for sleepable programs.
+ * Return
+ * An 8-byte long unique number or 0 if *sk* is NULL.
+ *
* u32 bpf_get_socket_uid(struct sk_buff *skb)
* Return
* The owner UID of the socket associated to *skb*. If the socket
@@ -2442,7 +2457,7 @@
* running simultaneously.
*
* A user should care about the synchronization by himself.
- * For example, by using the **BPF_STX_XADD** instruction to alter
+ * For example, by using the **BPF_ATOMIC** instructions to alter
* the shared data.
* Return
* A pointer to the local storage area.
@@ -3742,6 +3757,80 @@
* Return
* The helper returns **TC_ACT_REDIRECT** on success or
* **TC_ACT_SHOT** on error.
+ *
+ * void *bpf_task_storage_get(struct bpf_map *map, struct task_struct *task, void *value, u64 flags)
+ * Description
+ * Get a bpf_local_storage from the *task*.
+ *
+ * Logically, it could be thought of as getting the value from
+ * a *map* with *task* as the **key**. From this
+ * perspective, the usage is not much different from
+ * **bpf_map_lookup_elem**\ (*map*, **&**\ *task*) except this
+ * helper enforces the key must be an task_struct and the map must also
+ * be a **BPF_MAP_TYPE_TASK_STORAGE**.
+ *
+ * Underneath, the value is stored locally at *task* instead of
+ * the *map*. The *map* is used as the bpf-local-storage
+ * "type". The bpf-local-storage "type" (i.e. the *map*) is
+ * searched against all bpf_local_storage residing at *task*.
+ *
+ * An optional *flags* (**BPF_LOCAL_STORAGE_GET_F_CREATE**) can be
+ * used such that a new bpf_local_storage will be
+ * created if one does not exist. *value* can be used
+ * together with **BPF_LOCAL_STORAGE_GET_F_CREATE** to specify
+ * the initial value of a bpf_local_storage. If *value* is
+ * **NULL**, the new bpf_local_storage will be zero initialized.
+ * Return
+ * A bpf_local_storage pointer is returned on success.
+ *
+ * **NULL** if not found or there was an error in adding
+ * a new bpf_local_storage.
+ *
+ * long bpf_task_storage_delete(struct bpf_map *map, struct task_struct *task)
+ * Description
+ * Delete a bpf_local_storage from a *task*.
+ * Return
+ * 0 on success.
+ *
+ * **-ENOENT** if the bpf_local_storage cannot be found.
+ *
+ * struct task_struct *bpf_get_current_task_btf(void)
+ * Description
+ * Return a BTF pointer to the "current" task.
+ * This pointer can also be used in helpers that accept an
+ * *ARG_PTR_TO_BTF_ID* of type *task_struct*.
+ * Return
+ * Pointer to the current task.
+ *
+ * long bpf_bprm_opts_set(struct linux_binprm *bprm, u64 flags)
+ * Description
+ * Set or clear certain options on *bprm*:
+ *
+ * **BPF_F_BPRM_SECUREEXEC** Set the secureexec bit
+ * which sets the **AT_SECURE** auxv for glibc. The bit
+ * is cleared if the flag is not specified.
+ * Return
+ * **-EINVAL** if invalid *flags* are passed, zero otherwise.
+ *
+ * u64 bpf_ktime_get_coarse_ns(void)
+ * Description
+ * Return a coarse-grained version of the time elapsed since
+ * system boot, in nanoseconds. Does not include time the system
+ * was suspended.
+ *
+ * See: **clock_gettime**\ (**CLOCK_MONOTONIC_COARSE**)
+ * Return
+ * Current *ktime*.
+ *
+ * long bpf_ima_inode_hash(struct inode *inode, void *dst, u32 size)
+ * Description
+ * Returns the stored IMA hash of the *inode* (if it's available).
+ * If the hash is larger than *size*, then only *size*
+ * bytes will be copied to *dst*.
+ * Return
+ * The **hash_algo** is returned on success,
+ * **-EOPNOTSUPP** if IMA is disabled or **-EINVAL** if
+ * invalid arguments are passed.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -3900,6 +3989,12 @@
FN(per_cpu_ptr), \
FN(this_cpu_ptr), \
FN(redirect_peer), \
+ FN(task_storage_get), \
+ FN(task_storage_delete), \
+ FN(get_current_task_btf), \
+ FN(bprm_opts_set), \
+ FN(ktime_get_coarse_ns), \
+ FN(ima_inode_hash), \
/* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
@@ -4071,6 +4166,11 @@
BPF_LWT_ENCAP_IP,
};
+/* Flags for bpf_bprm_opts_set helper */
+enum {
+ BPF_F_BPRM_SECUREEXEC = (1ULL << 0),
+};
+
#define __bpf_md_ptr(type, name) \
union { \
type name; \
diff --git a/include/uapi/linux/futex.h b/include/uapi/linux/futex.h
index a89eb0a..7e2b8a8 100644
--- a/include/uapi/linux/futex.h
+++ b/include/uapi/linux/futex.h
@@ -22,6 +22,8 @@
#define FUTEX_WAIT_REQUEUE_PI 11
#define FUTEX_CMP_REQUEUE_PI 12
+#define GFUTEX_SWAP 60
+
#define FUTEX_PRIVATE_FLAG 128
#define FUTEX_CLOCK_REALTIME 256
#define FUTEX_CMD_MASK ~(FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME)
@@ -41,6 +43,8 @@
#define FUTEX_CMP_REQUEUE_PI_PRIVATE (FUTEX_CMP_REQUEUE_PI | \
FUTEX_PRIVATE_FLAG)
+#define GFUTEX_SWAP_PRIVATE (GFUTEX_SWAP | FUTEX_PRIVATE_FLAG)
+
/*
* Support for robust futexes: the kernel cleans up held futexes at
* thread exit time.
diff --git a/init/init_task.c b/init/init_task.c
index 5fa18ed..68bfc90 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -134,8 +134,7 @@
.thread_group = LIST_HEAD_INIT(init_task.thread_group),
.thread_node = LIST_HEAD_INIT(init_signals.thread_head),
#ifdef CONFIG_AUDIT
- .loginuid = INVALID_UID,
- .sessionid = AUDIT_SID_UNSET,
+ .audit = &init_struct_audit,
#endif
#ifdef CONFIG_PERF_EVENTS
.perf_event_mutex = __MUTEX_INITIALIZER(init_task.perf_event_mutex),
diff --git a/init/main.c b/init/main.c
index d9d9141..fd4dee3 100644
--- a/init/main.c
+++ b/init/main.c
@@ -98,6 +98,7 @@
#include <linux/mem_encrypt.h>
#include <linux/kcsan.h>
#include <linux/init_syscalls.h>
+#include <linux/audit.h>
#include <asm/io.h>
#include <asm/bugs.h>
@@ -1046,6 +1047,7 @@
nsfs_init();
cpuset_init();
cgroup_init();
+ audit_task_init();
taskstats_init_early();
delayacct_init();
diff --git a/kernel/audit.c b/kernel/audit.c
index 68cee3b..e9a3e40 100644
--- a/kernel/audit.c
+++ b/kernel/audit.c
@@ -208,6 +208,75 @@
struct sk_buff *skb;
};
+static struct kmem_cache *audit_task_cache;
+
+void __init audit_task_init(void)
+{
+ audit_task_cache = kmem_cache_create("audit_task",
+ sizeof(struct audit_task_info),
+ 0, SLAB_PANIC, NULL);
+}
+
+/**
+ * audit_alloc - allocate an audit info block for a task
+ * @tsk: task
+ *
+ * Call audit_alloc_syscall to filter on the task information and
+ * allocate a per-task audit context if necessary. This is called from
+ * copy_process, so no lock is needed.
+ */
+int audit_alloc(struct task_struct *tsk)
+{
+ int ret = 0;
+ struct audit_task_info *info;
+
+ info = kmem_cache_alloc(audit_task_cache, GFP_KERNEL);
+ if (!info) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ info->loginuid = audit_get_loginuid(current);
+ info->sessionid = audit_get_sessionid(current);
+ info->contid = audit_get_contid(current);
+ tsk->audit = info;
+
+ ret = audit_alloc_syscall(tsk);
+ if (ret) {
+ tsk->audit = NULL;
+ kmem_cache_free(audit_task_cache, info);
+ }
+out:
+ return ret;
+}
+
+struct audit_task_info init_struct_audit = {
+ .loginuid = INVALID_UID,
+ .sessionid = AUDIT_SID_UNSET,
+ .contid = AUDIT_CID_UNSET,
+#ifdef CONFIG_AUDITSYSCALL
+ .ctx = NULL,
+#endif
+};
+
+/**
+ * audit_free - free per-task audit info
+ * @tsk: task whose audit info block to free
+ *
+ * Called from copy_process and do_exit
+ */
+void audit_free(struct task_struct *tsk)
+{
+ struct audit_task_info *info = tsk->audit;
+
+ audit_free_syscall(tsk);
+ /* Freeing the audit_task_info struct must be performed after
+ * audit_log_exit() due to need for loginuid and sessionid.
+ */
+ info = tsk->audit;
+ tsk->audit = NULL;
+ kmem_cache_free(audit_task_cache, info);
+}
+
/**
* auditd_test_task - Check to see if a given task is an audit daemon
* @task: the task to check
@@ -2322,8 +2391,8 @@
sessionid = (unsigned int)atomic_inc_return(&session_id);
}
- current->sessionid = sessionid;
- current->loginuid = loginuid;
+ current->audit->sessionid = sessionid;
+ current->audit->loginuid = loginuid;
out:
audit_log_set_loginuid(oldloginuid, loginuid, oldsessionid, sessionid, rc);
return rc;
@@ -2356,6 +2425,62 @@
return audit_signal_info_syscall(t);
}
+/*
+ * audit_set_contid - set a task's audit container identifier
+ * @task: target task
+ * @contid: contid value
+ *
+ * Returns 0 on success, or a negative errno on failure.
+ *
+ * Called (set) from fs/proc/base.c::proc_contid_write().
+ */
+int audit_set_contid(struct task_struct *task, u64 contid)
+{
+ u64 oldcontid;
+ int rc = 0;
+ struct audit_buffer *ab;
+
+ task_lock(task);
+ /* Can't set if audit disabled */
+ if (!task->audit) {
+ task_unlock(task);
+ return -ENOPROTOOPT;
+ }
+ oldcontid = audit_get_contid(task);
+ read_lock(&tasklist_lock);
+ /* Don't allow the audit containerid to be unset */
+ if (!audit_contid_valid(contid))
+ rc = -EINVAL;
+ /* if we don't have caps, reject */
+ else if (!capable(CAP_AUDIT_CONTROL))
+ rc = -EPERM;
+ /* if task has children or is not single-threaded, deny */
+ else if (!list_empty(&task->children))
+ rc = -EBUSY;
+ else if (!(thread_group_leader(task) && thread_group_empty(task)))
+ rc = -EALREADY;
+ /* if contid is already set, deny */
+ else if (audit_contid_set(task))
+ rc = -ECHILD;
+ read_unlock(&tasklist_lock);
+ if (!rc)
+ task->audit->contid = contid;
+ task_unlock(task);
+
+ if (!audit_enabled)
+ return rc;
+
+ ab = audit_log_start(audit_context(), GFP_KERNEL, AUDIT_CONTAINER_OP);
+ if (!ab)
+ return rc;
+
+ audit_log_format(ab,
+ "op=set opid=%d contid=%llu old-contid=%llu",
+ task_tgid_nr(task), contid, oldcontid);
+ audit_log_end(ab);
+ return rc;
+}
+
/**
* audit_log_end - end one audit record
* @ab: the audit_buffer
diff --git a/kernel/audit.h b/kernel/audit.h
index 3b9c094..c650ed4 100644
--- a/kernel/audit.h
+++ b/kernel/audit.h
@@ -135,6 +135,7 @@
kuid_t target_uid;
unsigned int target_sessionid;
u32 target_sid;
+ u64 target_cid;
char target_comm[TASK_COMM_LEN];
struct audit_tree_refs *trees, *first_trees;
@@ -251,6 +252,8 @@
extern unsigned int audit_serial(void);
extern int auditsc_get_stamp(struct audit_context *ctx,
struct timespec64 *t, unsigned int *serial);
+extern int audit_alloc_syscall(struct task_struct *tsk);
+extern void audit_free_syscall(struct task_struct *tsk);
extern void audit_put_watch(struct audit_watch *watch);
extern void audit_get_watch(struct audit_watch *watch);
@@ -292,6 +295,9 @@
extern struct list_head *audit_killed_trees(void);
#else /* CONFIG_AUDITSYSCALL */
#define auditsc_get_stamp(c, t, s) 0
+#define audit_alloc_syscall(t) 0
+#define audit_free_syscall(t) {}
+
#define audit_put_watch(w) {}
#define audit_get_watch(w) {}
#define audit_to_watch(k, p, l, o) (-EINVAL)
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 8dba8f0..b0867f06 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -114,6 +114,7 @@
kuid_t target_uid[AUDIT_AUX_PIDS];
unsigned int target_sessionid[AUDIT_AUX_PIDS];
u32 target_sid[AUDIT_AUX_PIDS];
+ u64 target_cid[AUDIT_AUX_PIDS];
char target_comm[AUDIT_AUX_PIDS][TASK_COMM_LEN];
int pid_count;
};
@@ -932,23 +933,25 @@
return context;
}
-/**
- * audit_alloc - allocate an audit context block for a task
+/*
+ * audit_alloc_syscall - allocate an audit context block for a task
* @tsk: task
*
* Filter on the task information and allocate a per-task audit context
* if necessary. Doing so turns on system call auditing for the
- * specified task. This is called from copy_process, so no lock is
- * needed.
+ * specified task. This is called from copy_process via audit_alloc, so
+ * no lock is needed.
*/
-int audit_alloc(struct task_struct *tsk)
+int audit_alloc_syscall(struct task_struct *tsk)
{
struct audit_context *context;
enum audit_state state;
char *key = NULL;
- if (likely(!audit_ever_enabled))
+ if (likely(!audit_ever_enabled)) {
+ audit_set_context(tsk, NULL);
return 0; /* Return if not auditing. */
+ }
state = audit_filter_task(tsk, &key);
if (state == AUDIT_DISABLED) {
@@ -958,7 +961,7 @@
if (!(context = audit_alloc_context(state))) {
kfree(key);
- audit_log_lost("out of memory in audit_alloc");
+ audit_log_lost("out of memory in audit_alloc_syscall");
return -ENOMEM;
}
context->filterkey = key;
@@ -1603,14 +1606,15 @@
}
/**
- * __audit_free - free a per-task audit context
+ * audit_free_syscall - free per-task audit context info
* @tsk: task whose audit context block to free
*
- * Called from copy_process and do_exit
+ * Called from audit_free
*/
-void __audit_free(struct task_struct *tsk)
+void audit_free_syscall(struct task_struct *tsk)
{
- struct audit_context *context = tsk->audit_context;
+ struct audit_task_info *info = tsk->audit;
+ struct audit_context *context = info->ctx;
if (!context)
return;
@@ -1633,7 +1637,6 @@
if (context->current_state == AUDIT_RECORD_CONTEXT)
audit_log_exit();
}
-
audit_set_context(tsk, NULL);
audit_free_context(context);
}
@@ -2415,6 +2418,7 @@
context->target_uid = task_uid(t);
context->target_sessionid = audit_get_sessionid(t);
security_task_getsecid(t, &context->target_sid);
+ context->target_cid = audit_get_contid(t);
memcpy(context->target_comm, t->comm, TASK_COMM_LEN);
}
@@ -2442,6 +2446,7 @@
ctx->target_uid = t_uid;
ctx->target_sessionid = audit_get_sessionid(t);
security_task_getsecid(t, &ctx->target_sid);
+ ctx->target_cid = audit_get_contid(t);
memcpy(ctx->target_comm, t->comm, TASK_COMM_LEN);
return 0;
}
@@ -2463,6 +2468,7 @@
axp->target_uid[axp->pid_count] = t_uid;
axp->target_sessionid[axp->pid_count] = audit_get_sessionid(t);
security_task_getsecid(t, &axp->target_sid[axp->pid_count]);
+ axp->target_cid[axp->pid_count] = audit_get_contid(t);
memcpy(axp->target_comm[axp->pid_count], t->comm, TASK_COMM_LEN);
axp->pid_count++;
diff --git a/kernel/bpf/Makefile b/kernel/bpf/Makefile
index c1b9f71..d124934 100644
--- a/kernel/bpf/Makefile
+++ b/kernel/bpf/Makefile
@@ -10,6 +10,7 @@
obj-$(CONFIG_BPF_SYSCALL) += hashtab.o arraymap.o percpu_freelist.o bpf_lru_list.o lpm_trie.o map_in_map.o
obj-$(CONFIG_BPF_SYSCALL) += local_storage.o queue_stack_maps.o ringbuf.o
obj-${CONFIG_BPF_LSM} += bpf_inode_storage.o
+obj-${CONFIG_BPF_LSM} += bpf_task_storage.o
obj-$(CONFIG_BPF_SYSCALL) += disasm.o
obj-$(CONFIG_BPF_JIT) += trampoline.o
obj-$(CONFIG_BPF_SYSCALL) += btf.o
diff --git a/kernel/bpf/bpf_lsm.c b/kernel/bpf/bpf_lsm.c
index 56cc5a9..1622a44 100644
--- a/kernel/bpf/bpf_lsm.c
+++ b/kernel/bpf/bpf_lsm.c
@@ -7,6 +7,7 @@
#include <linux/filter.h>
#include <linux/bpf.h>
#include <linux/btf.h>
+#include <linux/binfmts.h>
#include <linux/lsm_hooks.h>
#include <linux/bpf_lsm.h>
#include <linux/kallsyms.h>
@@ -14,6 +15,7 @@
#include <net/bpf_sk_storage.h>
#include <linux/bpf_local_storage.h>
#include <linux/btf_ids.h>
+#include <linux/ima.h>
/* For every LSM hook that allows attachment of BPF programs, declare a nop
* function where a BPF program can be attached.
@@ -51,6 +53,52 @@
return 0;
}
+/* Mask for all the currently supported BPRM option flags */
+#define BPF_F_BRPM_OPTS_MASK BPF_F_BPRM_SECUREEXEC
+
+BPF_CALL_2(bpf_bprm_opts_set, struct linux_binprm *, bprm, u64, flags)
+{
+ if (flags & ~BPF_F_BRPM_OPTS_MASK)
+ return -EINVAL;
+
+ bprm->secureexec = (flags & BPF_F_BPRM_SECUREEXEC);
+ return 0;
+}
+
+BTF_ID_LIST_SINGLE(bpf_bprm_opts_set_btf_ids, struct, linux_binprm)
+
+static const struct bpf_func_proto bpf_bprm_opts_set_proto = {
+ .func = bpf_bprm_opts_set,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_BTF_ID,
+ .arg1_btf_id = &bpf_bprm_opts_set_btf_ids[0],
+ .arg2_type = ARG_ANYTHING,
+};
+
+BPF_CALL_3(bpf_ima_inode_hash, struct inode *, inode, void *, dst, u32, size)
+{
+ return ima_inode_hash(inode, dst, size);
+}
+
+static bool bpf_ima_inode_hash_allowed(const struct bpf_prog *prog)
+{
+ return bpf_lsm_is_sleepable_hook(prog->aux->attach_btf_id);
+}
+
+BTF_ID_LIST_SINGLE(bpf_ima_inode_hash_btf_ids, struct, inode)
+
+static const struct bpf_func_proto bpf_ima_inode_hash_proto = {
+ .func = bpf_ima_inode_hash,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_BTF_ID,
+ .arg1_btf_id = &bpf_ima_inode_hash_btf_ids[0],
+ .arg2_type = ARG_PTR_TO_UNINIT_MEM,
+ .arg3_type = ARG_CONST_SIZE,
+ .allowed = bpf_ima_inode_hash_allowed,
+};
+
static const struct bpf_func_proto *
bpf_lsm_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
{
@@ -63,11 +111,115 @@
return &bpf_sk_storage_get_proto;
case BPF_FUNC_sk_storage_delete:
return &bpf_sk_storage_delete_proto;
+ case BPF_FUNC_spin_lock:
+ return &bpf_spin_lock_proto;
+ case BPF_FUNC_spin_unlock:
+ return &bpf_spin_unlock_proto;
+ case BPF_FUNC_task_storage_get:
+ return &bpf_task_storage_get_proto;
+ case BPF_FUNC_task_storage_delete:
+ return &bpf_task_storage_delete_proto;
+ case BPF_FUNC_bprm_opts_set:
+ return &bpf_bprm_opts_set_proto;
+ case BPF_FUNC_ima_inode_hash:
+ return prog->aux->sleepable ? &bpf_ima_inode_hash_proto : NULL;
default:
return tracing_prog_func_proto(func_id, prog);
}
}
+/* The set of hooks which are called without pagefaults disabled and are allowed
+ * to "sleep" and thus can be used for sleepable BPF programs.
+ */
+BTF_SET_START(sleepable_lsm_hooks)
+BTF_ID(func, bpf_lsm_bpf)
+BTF_ID(func, bpf_lsm_bpf_map)
+BTF_ID(func, bpf_lsm_bpf_map_alloc_security)
+BTF_ID(func, bpf_lsm_bpf_map_free_security)
+BTF_ID(func, bpf_lsm_bpf_prog)
+BTF_ID(func, bpf_lsm_bprm_check_security)
+BTF_ID(func, bpf_lsm_bprm_committed_creds)
+BTF_ID(func, bpf_lsm_bprm_committing_creds)
+BTF_ID(func, bpf_lsm_bprm_creds_for_exec)
+BTF_ID(func, bpf_lsm_bprm_creds_from_file)
+BTF_ID(func, bpf_lsm_capget)
+BTF_ID(func, bpf_lsm_capset)
+BTF_ID(func, bpf_lsm_cred_prepare)
+BTF_ID(func, bpf_lsm_file_ioctl)
+BTF_ID(func, bpf_lsm_file_lock)
+BTF_ID(func, bpf_lsm_file_open)
+BTF_ID(func, bpf_lsm_file_receive)
+
+#ifdef CONFIG_SECURITY_NETWORK
+BTF_ID(func, bpf_lsm_inet_conn_established)
+#endif /* CONFIG_SECURITY_NETWORK */
+
+BTF_ID(func, bpf_lsm_inode_create)
+BTF_ID(func, bpf_lsm_inode_free_security)
+BTF_ID(func, bpf_lsm_inode_getattr)
+BTF_ID(func, bpf_lsm_inode_getxattr)
+BTF_ID(func, bpf_lsm_inode_mknod)
+BTF_ID(func, bpf_lsm_inode_need_killpriv)
+BTF_ID(func, bpf_lsm_inode_post_setxattr)
+BTF_ID(func, bpf_lsm_inode_readlink)
+BTF_ID(func, bpf_lsm_inode_rename)
+BTF_ID(func, bpf_lsm_inode_rmdir)
+BTF_ID(func, bpf_lsm_inode_setattr)
+BTF_ID(func, bpf_lsm_inode_setxattr)
+BTF_ID(func, bpf_lsm_inode_symlink)
+BTF_ID(func, bpf_lsm_inode_unlink)
+BTF_ID(func, bpf_lsm_kernel_module_request)
+BTF_ID(func, bpf_lsm_kernfs_init_security)
+
+#ifdef CONFIG_KEYS
+BTF_ID(func, bpf_lsm_key_free)
+#endif /* CONFIG_KEYS */
+
+BTF_ID(func, bpf_lsm_mmap_file)
+BTF_ID(func, bpf_lsm_netlink_send)
+BTF_ID(func, bpf_lsm_path_notify)
+BTF_ID(func, bpf_lsm_release_secctx)
+BTF_ID(func, bpf_lsm_sb_alloc_security)
+BTF_ID(func, bpf_lsm_sb_eat_lsm_opts)
+BTF_ID(func, bpf_lsm_sb_kern_mount)
+BTF_ID(func, bpf_lsm_sb_mount)
+BTF_ID(func, bpf_lsm_sb_remount)
+BTF_ID(func, bpf_lsm_sb_set_mnt_opts)
+BTF_ID(func, bpf_lsm_sb_show_options)
+BTF_ID(func, bpf_lsm_sb_statfs)
+BTF_ID(func, bpf_lsm_sb_umount)
+BTF_ID(func, bpf_lsm_settime)
+
+#ifdef CONFIG_SECURITY_NETWORK
+BTF_ID(func, bpf_lsm_socket_accept)
+BTF_ID(func, bpf_lsm_socket_bind)
+BTF_ID(func, bpf_lsm_socket_connect)
+BTF_ID(func, bpf_lsm_socket_create)
+BTF_ID(func, bpf_lsm_socket_getpeername)
+BTF_ID(func, bpf_lsm_socket_getpeersec_dgram)
+BTF_ID(func, bpf_lsm_socket_getsockname)
+BTF_ID(func, bpf_lsm_socket_getsockopt)
+BTF_ID(func, bpf_lsm_socket_listen)
+BTF_ID(func, bpf_lsm_socket_post_create)
+BTF_ID(func, bpf_lsm_socket_recvmsg)
+BTF_ID(func, bpf_lsm_socket_sendmsg)
+BTF_ID(func, bpf_lsm_socket_shutdown)
+BTF_ID(func, bpf_lsm_socket_socketpair)
+#endif /* CONFIG_SECURITY_NETWORK */
+
+BTF_ID(func, bpf_lsm_syslog)
+BTF_ID(func, bpf_lsm_task_alloc)
+BTF_ID(func, bpf_lsm_task_getsecid)
+BTF_ID(func, bpf_lsm_task_prctl)
+BTF_ID(func, bpf_lsm_task_setscheduler)
+BTF_ID(func, bpf_lsm_task_to_inode)
+BTF_SET_END(sleepable_lsm_hooks)
+
+bool bpf_lsm_is_sleepable_hook(u32 btf_id)
+{
+ return btf_id_set_contains(&sleepable_lsm_hooks, btf_id);
+}
+
const struct bpf_prog_ops lsm_prog_ops = {
};
diff --git a/kernel/bpf/bpf_task_storage.c b/kernel/bpf/bpf_task_storage.c
new file mode 100644
index 0000000..e0da025
--- /dev/null
+++ b/kernel/bpf/bpf_task_storage.c
@@ -0,0 +1,318 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2020 Facebook
+ * Copyright 2020 Google LLC.
+ */
+
+#include <linux/pid.h>
+#include <linux/sched.h>
+#include <linux/rculist.h>
+#include <linux/list.h>
+#include <linux/hash.h>
+#include <linux/types.h>
+#include <linux/spinlock.h>
+#include <linux/bpf.h>
+#include <linux/bpf_local_storage.h>
+#include <linux/filter.h>
+#include <uapi/linux/btf.h>
+#include <linux/bpf_lsm.h>
+#include <linux/btf_ids.h>
+#include <linux/fdtable.h>
+
+DEFINE_BPF_STORAGE_CACHE(task_cache);
+
+static struct bpf_local_storage __rcu **task_storage_ptr(void *owner)
+{
+ struct task_struct *task = owner;
+ struct bpf_storage_blob *bsb;
+
+ bsb = bpf_task(task);
+ if (!bsb)
+ return NULL;
+ return &bsb->storage;
+}
+
+static struct bpf_local_storage_data *
+task_storage_lookup(struct task_struct *task, struct bpf_map *map,
+ bool cacheit_lockit)
+{
+ struct bpf_local_storage *task_storage;
+ struct bpf_local_storage_map *smap;
+ struct bpf_storage_blob *bsb;
+
+ bsb = bpf_task(task);
+ if (!bsb)
+ return NULL;
+
+ task_storage = rcu_dereference(bsb->storage);
+ if (!task_storage)
+ return NULL;
+
+ smap = (struct bpf_local_storage_map *)map;
+ return bpf_local_storage_lookup(task_storage, smap, cacheit_lockit);
+}
+
+void bpf_task_storage_free(struct task_struct *task)
+{
+ struct bpf_local_storage_elem *selem;
+ struct bpf_local_storage *local_storage;
+ bool free_task_storage = false;
+ struct bpf_storage_blob *bsb;
+ struct hlist_node *n;
+
+ bsb = bpf_task(task);
+ if (!bsb)
+ return;
+
+ rcu_read_lock();
+
+ local_storage = rcu_dereference(bsb->storage);
+ if (!local_storage) {
+ rcu_read_unlock();
+ return;
+ }
+
+ /* Neither the bpf_prog nor the bpf-map's syscall
+ * could be modifying the local_storage->list now.
+ * Thus, no elem can be added-to or deleted-from the
+ * local_storage->list by the bpf_prog or by the bpf-map's syscall.
+ *
+ * It is racing with bpf_local_storage_map_free() alone
+ * when unlinking elem from the local_storage->list and
+ * the map's bucket->list.
+ */
+ raw_spin_lock_bh(&local_storage->lock);
+ hlist_for_each_entry_safe(selem, n, &local_storage->list, snode) {
+ /* Always unlink from map before unlinking from
+ * local_storage.
+ */
+ bpf_selem_unlink_map(selem);
+ free_task_storage = bpf_selem_unlink_storage_nolock(
+ local_storage, selem, false);
+ }
+ raw_spin_unlock_bh(&local_storage->lock);
+ rcu_read_unlock();
+
+ /* free_task_storage should always be true as long as
+ * local_storage->list was non-empty.
+ */
+ if (free_task_storage)
+ kfree_rcu(local_storage, rcu);
+}
+
+static void *bpf_pid_task_storage_lookup_elem(struct bpf_map *map, void *key)
+{
+ struct bpf_local_storage_data *sdata;
+ struct task_struct *task;
+ unsigned int f_flags;
+ struct pid *pid;
+ int fd, err;
+
+ fd = *(int *)key;
+ pid = pidfd_get_pid(fd, &f_flags);
+ if (IS_ERR(pid))
+ return ERR_CAST(pid);
+
+ /* We should be in an RCU read side critical section, it should be safe
+ * to call pid_task.
+ */
+ WARN_ON_ONCE(!rcu_read_lock_held());
+ task = pid_task(pid, PIDTYPE_PID);
+ if (!task) {
+ err = -ENOENT;
+ goto out;
+ }
+
+ sdata = task_storage_lookup(task, map, true);
+ put_pid(pid);
+ return sdata ? sdata->data : NULL;
+out:
+ put_pid(pid);
+ return ERR_PTR(err);
+}
+
+static int bpf_pid_task_storage_update_elem(struct bpf_map *map, void *key,
+ void *value, u64 map_flags)
+{
+ struct bpf_local_storage_data *sdata;
+ struct task_struct *task;
+ unsigned int f_flags;
+ struct pid *pid;
+ int fd, err;
+
+ fd = *(int *)key;
+ pid = pidfd_get_pid(fd, &f_flags);
+ if (IS_ERR(pid))
+ return PTR_ERR(pid);
+
+ /* We should be in an RCU read side critical section, it should be safe
+ * to call pid_task.
+ */
+ WARN_ON_ONCE(!rcu_read_lock_held());
+ task = pid_task(pid, PIDTYPE_PID);
+ if (!task || !task_storage_ptr(task)) {
+ err = -ENOENT;
+ goto out;
+ }
+
+ sdata = bpf_local_storage_update(
+ task, (struct bpf_local_storage_map *)map, value, map_flags);
+
+ err = PTR_ERR_OR_ZERO(sdata);
+out:
+ put_pid(pid);
+ return err;
+}
+
+static int task_storage_delete(struct task_struct *task, struct bpf_map *map)
+{
+ struct bpf_local_storage_data *sdata;
+
+ sdata = task_storage_lookup(task, map, false);
+ if (!sdata)
+ return -ENOENT;
+
+ bpf_selem_unlink(SELEM(sdata));
+
+ return 0;
+}
+
+static int bpf_pid_task_storage_delete_elem(struct bpf_map *map, void *key)
+{
+ struct task_struct *task;
+ unsigned int f_flags;
+ struct pid *pid;
+ int fd, err;
+
+ fd = *(int *)key;
+ pid = pidfd_get_pid(fd, &f_flags);
+ if (IS_ERR(pid))
+ return PTR_ERR(pid);
+
+ /* We should be in an RCU read side critical section, it should be safe
+ * to call pid_task.
+ */
+ WARN_ON_ONCE(!rcu_read_lock_held());
+ task = pid_task(pid, PIDTYPE_PID);
+ if (!task) {
+ err = -ENOENT;
+ goto out;
+ }
+
+ err = task_storage_delete(task, map);
+out:
+ put_pid(pid);
+ return err;
+}
+
+BPF_CALL_4(bpf_task_storage_get, struct bpf_map *, map, struct task_struct *,
+ task, void *, value, u64, flags)
+{
+ struct bpf_local_storage_data *sdata;
+
+ if (flags & ~(BPF_LOCAL_STORAGE_GET_F_CREATE))
+ return (unsigned long)NULL;
+
+ /* explicitly check that the task_storage_ptr is not
+ * NULL as task_storage_lookup returns NULL in this case and
+ * bpf_local_storage_update expects the owner to have a
+ * valid storage pointer.
+ */
+ if (!task || !task_storage_ptr(task))
+ return (unsigned long)NULL;
+
+ sdata = task_storage_lookup(task, map, true);
+ if (sdata)
+ return (unsigned long)sdata->data;
+
+ /* This helper must only be called from places where the lifetime of the task
+ * is guaranteed. Either by being refcounted or by being protected
+ * by an RCU read-side critical section.
+ */
+ if (flags & BPF_LOCAL_STORAGE_GET_F_CREATE) {
+ sdata = bpf_local_storage_update(
+ task, (struct bpf_local_storage_map *)map, value,
+ BPF_NOEXIST);
+ return IS_ERR(sdata) ? (unsigned long)NULL :
+ (unsigned long)sdata->data;
+ }
+
+ return (unsigned long)NULL;
+}
+
+BPF_CALL_2(bpf_task_storage_delete, struct bpf_map *, map, struct task_struct *,
+ task)
+{
+ if (!task)
+ return -EINVAL;
+
+ /* This helper must only be called from places where the lifetime of the task
+ * is guaranteed. Either by being refcounted or by being protected
+ * by an RCU read-side critical section.
+ */
+ return task_storage_delete(task, map);
+}
+
+static int notsupp_get_next_key(struct bpf_map *map, void *key, void *next_key)
+{
+ return -ENOTSUPP;
+}
+
+static struct bpf_map *task_storage_map_alloc(union bpf_attr *attr)
+{
+ struct bpf_local_storage_map *smap;
+
+ smap = bpf_local_storage_map_alloc(attr);
+ if (IS_ERR(smap))
+ return ERR_CAST(smap);
+
+ smap->cache_idx = bpf_local_storage_cache_idx_get(&task_cache);
+ return &smap->map;
+}
+
+static void task_storage_map_free(struct bpf_map *map)
+{
+ struct bpf_local_storage_map *smap;
+
+ smap = (struct bpf_local_storage_map *)map;
+ bpf_local_storage_cache_idx_free(&task_cache, smap->cache_idx);
+ bpf_local_storage_map_free(smap);
+}
+
+static int task_storage_map_btf_id;
+const struct bpf_map_ops task_storage_map_ops = {
+ .map_meta_equal = bpf_map_meta_equal,
+ .map_alloc_check = bpf_local_storage_map_alloc_check,
+ .map_alloc = task_storage_map_alloc,
+ .map_free = task_storage_map_free,
+ .map_get_next_key = notsupp_get_next_key,
+ .map_lookup_elem = bpf_pid_task_storage_lookup_elem,
+ .map_update_elem = bpf_pid_task_storage_update_elem,
+ .map_delete_elem = bpf_pid_task_storage_delete_elem,
+ .map_check_btf = bpf_local_storage_map_check_btf,
+ .map_btf_name = "bpf_local_storage_map",
+ .map_btf_id = &task_storage_map_btf_id,
+ .map_owner_storage_ptr = task_storage_ptr,
+};
+
+BTF_ID_LIST_SINGLE(bpf_task_storage_btf_ids, struct, task_struct)
+
+const struct bpf_func_proto bpf_task_storage_get_proto = {
+ .func = bpf_task_storage_get,
+ .gpl_only = false,
+ .ret_type = RET_PTR_TO_MAP_VALUE_OR_NULL,
+ .arg1_type = ARG_CONST_MAP_PTR,
+ .arg2_type = ARG_PTR_TO_BTF_ID,
+ .arg2_btf_id = &bpf_task_storage_btf_ids[0],
+ .arg3_type = ARG_PTR_TO_MAP_VALUE_OR_NULL,
+ .arg4_type = ARG_ANYTHING,
+};
+
+const struct bpf_func_proto bpf_task_storage_delete_proto = {
+ .func = bpf_task_storage_delete,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_CONST_MAP_PTR,
+ .arg2_type = ARG_PTR_TO_BTF_ID,
+ .arg2_btf_id = &bpf_task_storage_btf_ids[0],
+};
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 182e162..e51c28e 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -1317,8 +1317,8 @@
INSN_3(STX, MEM, H), \
INSN_3(STX, MEM, W), \
INSN_3(STX, MEM, DW), \
- INSN_3(STX, XADD, W), \
- INSN_3(STX, XADD, DW), \
+ INSN_3(STX, ATOMIC, W), \
+ INSN_3(STX, ATOMIC, DW), \
/* Immediate based. */ \
INSN_3(ST, MEM, B), \
INSN_3(ST, MEM, H), \
@@ -1626,13 +1626,59 @@
LDX_PROBE(DW, 8)
#undef LDX_PROBE
- STX_XADD_W: /* lock xadd *(u32 *)(dst_reg + off16) += src_reg */
- atomic_add((u32) SRC, (atomic_t *)(unsigned long)
- (DST + insn->off));
- CONT;
- STX_XADD_DW: /* lock xadd *(u64 *)(dst_reg + off16) += src_reg */
- atomic64_add((u64) SRC, (atomic64_t *)(unsigned long)
- (DST + insn->off));
+#define ATOMIC_ALU_OP(BOP, KOP) \
+ case BOP: \
+ if (BPF_SIZE(insn->code) == BPF_W) \
+ atomic_##KOP((u32) SRC, (atomic_t *)(unsigned long) \
+ (DST + insn->off)); \
+ else \
+ atomic64_##KOP((u64) SRC, (atomic64_t *)(unsigned long) \
+ (DST + insn->off)); \
+ break; \
+ case BOP | BPF_FETCH: \
+ if (BPF_SIZE(insn->code) == BPF_W) \
+ SRC = (u32) atomic_fetch_##KOP( \
+ (u32) SRC, \
+ (atomic_t *)(unsigned long) (DST + insn->off)); \
+ else \
+ SRC = (u64) atomic64_fetch_##KOP( \
+ (u64) SRC, \
+ (atomic64_t *)(unsigned long) (DST + insn->off)); \
+ break;
+
+ STX_ATOMIC_DW:
+ STX_ATOMIC_W:
+ switch (IMM) {
+ ATOMIC_ALU_OP(BPF_ADD, add)
+ ATOMIC_ALU_OP(BPF_AND, and)
+ ATOMIC_ALU_OP(BPF_OR, or)
+ ATOMIC_ALU_OP(BPF_XOR, xor)
+#undef ATOMIC_ALU_OP
+
+ case BPF_XCHG:
+ if (BPF_SIZE(insn->code) == BPF_W)
+ SRC = (u32) atomic_xchg(
+ (atomic_t *)(unsigned long) (DST + insn->off),
+ (u32) SRC);
+ else
+ SRC = (u64) atomic64_xchg(
+ (atomic64_t *)(unsigned long) (DST + insn->off),
+ (u64) SRC);
+ break;
+ case BPF_CMPXCHG:
+ if (BPF_SIZE(insn->code) == BPF_W)
+ BPF_R0 = (u32) atomic_cmpxchg(
+ (atomic_t *)(unsigned long) (DST + insn->off),
+ (u32) BPF_R0, (u32) SRC);
+ else
+ BPF_R0 = (u64) atomic64_cmpxchg(
+ (atomic64_t *)(unsigned long) (DST + insn->off),
+ (u64) BPF_R0, (u64) SRC);
+ break;
+
+ default:
+ goto default_label;
+ }
CONT;
default_label:
@@ -1642,7 +1688,8 @@
*
* Note, verifier whitelists all opcodes in bpf_opcode_in_insntable().
*/
- pr_warn("BPF interpreter: unknown opcode %02x\n", insn->code);
+ pr_warn("BPF interpreter: unknown opcode %02x (imm: 0x%x)\n",
+ insn->code, insn->imm);
BUG_ON(1);
return 0;
}
@@ -2211,6 +2258,7 @@
const struct bpf_func_proto bpf_get_numa_node_id_proto __weak;
const struct bpf_func_proto bpf_ktime_get_ns_proto __weak;
const struct bpf_func_proto bpf_ktime_get_boot_ns_proto __weak;
+const struct bpf_func_proto bpf_ktime_get_coarse_ns_proto __weak;
const struct bpf_func_proto bpf_get_current_pid_tgid_proto __weak;
const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
@@ -2269,6 +2317,10 @@
/* Return TRUE if the JIT backend wants verifier to enable sub-register usage
* analysis code and wants explicit zero extension inserted by verifier.
* Otherwise, return FALSE.
+ *
+ * The verifier inserts an explicit zero extension after BPF_CMPXCHGs even if
+ * you don't override this. JITs that don't want these extra insns can detect
+ * them using insn_is_zext.
*/
bool __weak bpf_jit_needs_zext(void)
{
diff --git a/kernel/bpf/disasm.c b/kernel/bpf/disasm.c
index b44d8c4..3acc7e0 100644
--- a/kernel/bpf/disasm.c
+++ b/kernel/bpf/disasm.c
@@ -80,6 +80,13 @@
[BPF_END >> 4] = "endian",
};
+static const char *const bpf_atomic_alu_string[16] = {
+ [BPF_ADD >> 4] = "add",
+ [BPF_AND >> 4] = "and",
+ [BPF_OR >> 4] = "or",
+ [BPF_XOR >> 4] = "xor",
+};
+
static const char *const bpf_ldst_string[] = {
[BPF_W >> 3] = "u32",
[BPF_H >> 3] = "u16",
@@ -153,14 +160,44 @@
bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
insn->dst_reg,
insn->off, insn->src_reg);
- else if (BPF_MODE(insn->code) == BPF_XADD)
- verbose(cbs->private_data, "(%02x) lock *(%s *)(r%d %+d) += r%d\n",
+ else if (BPF_MODE(insn->code) == BPF_ATOMIC &&
+ (insn->imm == BPF_ADD || insn->imm == BPF_AND ||
+ insn->imm == BPF_OR || insn->imm == BPF_XOR)) {
+ verbose(cbs->private_data, "(%02x) lock *(%s *)(r%d %+d) %s r%d\n",
insn->code,
bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
insn->dst_reg, insn->off,
+ bpf_alu_string[BPF_OP(insn->imm) >> 4],
insn->src_reg);
- else
+ } else if (BPF_MODE(insn->code) == BPF_ATOMIC &&
+ (insn->imm == (BPF_ADD | BPF_FETCH) ||
+ insn->imm == (BPF_AND | BPF_FETCH) ||
+ insn->imm == (BPF_OR | BPF_FETCH) ||
+ insn->imm == (BPF_XOR | BPF_FETCH))) {
+ verbose(cbs->private_data, "(%02x) r%d = atomic%s_fetch_%s((%s *)(r%d %+d), r%d)\n",
+ insn->code, insn->src_reg,
+ BPF_SIZE(insn->code) == BPF_DW ? "64" : "",
+ bpf_atomic_alu_string[BPF_OP(insn->imm) >> 4],
+ bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
+ insn->dst_reg, insn->off, insn->src_reg);
+ } else if (BPF_MODE(insn->code) == BPF_ATOMIC &&
+ insn->imm == BPF_CMPXCHG) {
+ verbose(cbs->private_data, "(%02x) r0 = atomic%s_cmpxchg((%s *)(r%d %+d), r0, r%d)\n",
+ insn->code,
+ BPF_SIZE(insn->code) == BPF_DW ? "64" : "",
+ bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
+ insn->dst_reg, insn->off,
+ insn->src_reg);
+ } else if (BPF_MODE(insn->code) == BPF_ATOMIC &&
+ insn->imm == BPF_XCHG) {
+ verbose(cbs->private_data, "(%02x) r%d = atomic%s_xchg((%s *)(r%d %+d), r%d)\n",
+ insn->code, insn->src_reg,
+ BPF_SIZE(insn->code) == BPF_DW ? "64" : "",
+ bpf_ldst_string[BPF_SIZE(insn->code) >> 3],
+ insn->dst_reg, insn->off, insn->src_reg);
+ } else {
verbose(cbs->private_data, "BUG_%02x\n", insn->code);
+ }
} else if (class == BPF_ST) {
if (BPF_MODE(insn->code) != BPF_MEM) {
verbose(cbs->private_data, "BUG_st_%02x\n", insn->code);
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index c489430..41ca280 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -167,6 +167,17 @@
.ret_type = RET_INTEGER,
};
+BPF_CALL_0(bpf_ktime_get_coarse_ns)
+{
+ return ktime_get_coarse_ns();
+}
+
+const struct bpf_func_proto bpf_ktime_get_coarse_ns_proto = {
+ .func = bpf_ktime_get_coarse_ns,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+};
+
BPF_CALL_0(bpf_get_current_pid_tgid)
{
struct task_struct *task = current;
@@ -685,6 +696,8 @@
return &bpf_ktime_get_ns_proto;
case BPF_FUNC_ktime_get_boot_ns:
return &bpf_ktime_get_boot_ns_proto;
+ case BPF_FUNC_ktime_get_coarse_ns:
+ return &bpf_ktime_get_coarse_ns_proto;
case BPF_FUNC_ringbuf_output:
return &bpf_ringbuf_output_proto;
case BPF_FUNC_ringbuf_reserve:
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 9433ab9..f1a146f 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -773,7 +773,8 @@
map->map_type != BPF_MAP_TYPE_ARRAY &&
map->map_type != BPF_MAP_TYPE_CGROUP_STORAGE &&
map->map_type != BPF_MAP_TYPE_SK_STORAGE &&
- map->map_type != BPF_MAP_TYPE_INODE_STORAGE)
+ map->map_type != BPF_MAP_TYPE_INODE_STORAGE &&
+ map->map_type != BPF_MAP_TYPE_TASK_STORAGE)
return -ENOTSUPP;
if (map->spin_lock_off + sizeof(struct bpf_spin_lock) >
map->value_size) {
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 4f50d6f..fcfefff 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -496,6 +496,13 @@
func_id == BPF_FUNC_skc_to_tcp_request_sock;
}
+static bool is_cmpxchg_insn(const struct bpf_insn *insn)
+{
+ return BPF_CLASS(insn->code) == BPF_STX &&
+ BPF_MODE(insn->code) == BPF_ATOMIC &&
+ insn->imm == BPF_CMPXCHG;
+}
+
/* string representation of 'enum bpf_reg_type' */
static const char * const reg_type_str[] = {
[NOT_INIT] = "?",
@@ -1649,7 +1656,11 @@
}
if (class == BPF_STX) {
- if (reg->type != SCALAR_VALUE)
+ /* BPF_STX (including atomic variants) has multiple source
+ * operands, one of which is a ptr. Check whether the caller is
+ * asking about it.
+ */
+ if (t == SRC_OP && reg->type != SCALAR_VALUE)
return true;
return BPF_SIZE(code) == BPF_DW;
}
@@ -1681,22 +1692,38 @@
return true;
}
-/* Return TRUE if INSN doesn't have explicit value define. */
-static bool insn_no_def(struct bpf_insn *insn)
+/* Return the regno defined by the insn, or -1. */
+static int insn_def_regno(const struct bpf_insn *insn)
{
- u8 class = BPF_CLASS(insn->code);
-
- return (class == BPF_JMP || class == BPF_JMP32 ||
- class == BPF_STX || class == BPF_ST);
+ switch (BPF_CLASS(insn->code)) {
+ case BPF_JMP:
+ case BPF_JMP32:
+ case BPF_ST:
+ return -1;
+ case BPF_STX:
+ if (BPF_MODE(insn->code) == BPF_ATOMIC &&
+ (insn->imm & BPF_FETCH)) {
+ if (insn->imm == BPF_CMPXCHG)
+ return BPF_REG_0;
+ else
+ return insn->src_reg;
+ } else {
+ return -1;
+ }
+ default:
+ return insn->dst_reg;
+ }
}
/* Return TRUE if INSN has defined any 32-bit value explicitly. */
static bool insn_has_def32(struct bpf_verifier_env *env, struct bpf_insn *insn)
{
- if (insn_no_def(insn))
+ int dst_reg = insn_def_regno(insn);
+
+ if (dst_reg == -1)
return false;
- return !is_reg64(env, insn, insn->dst_reg, NULL, DST_OP);
+ return !is_reg64(env, insn, dst_reg, NULL, DST_OP);
}
static void mark_insn_zext(struct bpf_verifier_env *env,
@@ -3934,13 +3961,30 @@
return err;
}
-static int check_xadd(struct bpf_verifier_env *env, int insn_idx, struct bpf_insn *insn)
+static int check_atomic(struct bpf_verifier_env *env, int insn_idx, struct bpf_insn *insn)
{
+ int load_reg;
int err;
- if ((BPF_SIZE(insn->code) != BPF_W && BPF_SIZE(insn->code) != BPF_DW) ||
- insn->imm != 0) {
- verbose(env, "BPF_XADD uses reserved fields\n");
+ switch (insn->imm) {
+ case BPF_ADD:
+ case BPF_ADD | BPF_FETCH:
+ case BPF_AND:
+ case BPF_AND | BPF_FETCH:
+ case BPF_OR:
+ case BPF_OR | BPF_FETCH:
+ case BPF_XOR:
+ case BPF_XOR | BPF_FETCH:
+ case BPF_XCHG:
+ case BPF_CMPXCHG:
+ break;
+ default:
+ verbose(env, "BPF_ATOMIC uses invalid atomic opcode %02x\n", insn->imm);
+ return -EINVAL;
+ }
+
+ if (BPF_SIZE(insn->code) != BPF_W && BPF_SIZE(insn->code) != BPF_DW) {
+ verbose(env, "invalid atomic operand size\n");
return -EINVAL;
}
@@ -3954,6 +3998,13 @@
if (err)
return err;
+ if (insn->imm == BPF_CMPXCHG) {
+ /* Check comparison of R0 with memory location */
+ err = check_reg_arg(env, BPF_REG_0, SRC_OP);
+ if (err)
+ return err;
+ }
+
if (is_pointer_value(env, insn->src_reg)) {
verbose(env, "R%d leaks addr into mem\n", insn->src_reg);
return -EACCES;
@@ -3963,21 +4014,42 @@
is_pkt_reg(env, insn->dst_reg) ||
is_flow_key_reg(env, insn->dst_reg) ||
is_sk_reg(env, insn->dst_reg)) {
- verbose(env, "BPF_XADD stores into R%d %s is not allowed\n",
+ verbose(env, "BPF_ATOMIC stores into R%d %s is not allowed\n",
insn->dst_reg,
reg_type_str[reg_state(env, insn->dst_reg)->type]);
return -EACCES;
}
- /* check whether atomic_add can read the memory */
+ if (insn->imm & BPF_FETCH) {
+ if (insn->imm == BPF_CMPXCHG)
+ load_reg = BPF_REG_0;
+ else
+ load_reg = insn->src_reg;
+
+ /* check and record load of old value */
+ err = check_reg_arg(env, load_reg, DST_OP);
+ if (err)
+ return err;
+ } else {
+ /* This instruction accesses a memory location but doesn't
+ * actually load it into a register.
+ */
+ load_reg = -1;
+ }
+
+ /* check whether we can read the memory */
err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
- BPF_SIZE(insn->code), BPF_READ, -1, true);
+ BPF_SIZE(insn->code), BPF_READ, load_reg, true);
if (err)
return err;
- /* check whether atomic_add can write into the same memory */
- return check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
- BPF_SIZE(insn->code), BPF_WRITE, -1, true);
+ /* check whether we can write into the same memory */
+ err = check_mem_access(env, insn_idx, insn->dst_reg, insn->off,
+ BPF_SIZE(insn->code), BPF_WRITE, -1, true);
+ if (err)
+ return err;
+
+ return 0;
}
/* When register 'regno' is used to read the stack (either directly or through
@@ -4798,6 +4870,11 @@
func_id != BPF_FUNC_inode_storage_delete)
goto error;
break;
+ case BPF_MAP_TYPE_TASK_STORAGE:
+ if (func_id != BPF_FUNC_task_storage_get &&
+ func_id != BPF_FUNC_task_storage_delete)
+ goto error;
+ break;
default:
break;
}
@@ -4876,6 +4953,11 @@
if (map->map_type != BPF_MAP_TYPE_INODE_STORAGE)
goto error;
break;
+ case BPF_FUNC_task_storage_get:
+ case BPF_FUNC_task_storage_delete:
+ if (map->map_type != BPF_MAP_TYPE_TASK_STORAGE)
+ goto error;
+ break;
default:
break;
}
@@ -5508,11 +5590,14 @@
PTR_TO_BTF_ID : PTR_TO_BTF_ID_OR_NULL;
regs[BPF_REG_0].btf_id = meta.ret_btf_id;
}
- } else if (fn->ret_type == RET_PTR_TO_BTF_ID_OR_NULL) {
+ } else if (fn->ret_type == RET_PTR_TO_BTF_ID_OR_NULL ||
+ fn->ret_type == RET_PTR_TO_BTF_ID) {
int ret_btf_id;
mark_reg_known_zero(env, regs, BPF_REG_0);
- regs[BPF_REG_0].type = PTR_TO_BTF_ID_OR_NULL;
+ regs[BPF_REG_0].type = fn->ret_type == RET_PTR_TO_BTF_ID ?
+ PTR_TO_BTF_ID :
+ PTR_TO_BTF_ID_OR_NULL;
ret_btf_id = *fn->ret_btf_id;
if (ret_btf_id == 0) {
verbose(env, "invalid return type %d of func %s#%d\n",
@@ -9868,14 +9953,19 @@
} else if (class == BPF_STX) {
enum bpf_reg_type *prev_dst_type, dst_reg_type;
- if (BPF_MODE(insn->code) == BPF_XADD) {
- err = check_xadd(env, env->insn_idx, insn);
+ if (BPF_MODE(insn->code) == BPF_ATOMIC) {
+ err = check_atomic(env, env->insn_idx, insn);
if (err)
return err;
env->insn_idx++;
continue;
}
+ if (BPF_MODE(insn->code) != BPF_MEM || insn->imm != 0) {
+ verbose(env, "BPF_STX uses reserved fields\n");
+ return -EINVAL;
+ }
+
/* check src1 operand */
err = check_reg_arg(env, insn->src_reg, SRC_OP);
if (err)
@@ -10200,11 +10290,21 @@
verbose(env, "trace type programs with run-time allocated hash maps are unsafe. Switch to preallocated hash maps.\n");
}
- if ((is_tracing_prog_type(prog_type) ||
- prog_type == BPF_PROG_TYPE_SOCKET_FILTER) &&
- map_value_has_spin_lock(map)) {
- verbose(env, "tracing progs cannot use bpf_spin_lock yet\n");
- return -EINVAL;
+ if (map_value_has_spin_lock(map)) {
+ if (prog_type == BPF_PROG_TYPE_SOCKET_FILTER) {
+ verbose(env, "socket filter progs cannot use bpf_spin_lock yet\n");
+ return -EINVAL;
+ }
+
+ if (is_tracing_prog_type(prog_type)) {
+ verbose(env, "tracing progs cannot use bpf_spin_lock yet\n");
+ return -EINVAL;
+ }
+
+ if (prog->aux->sleepable) {
+ verbose(env, "sleepable progs cannot use bpf_spin_lock yet\n");
+ return -EINVAL;
+ }
}
if ((bpf_prog_is_dev_bound(prog->aux) || bpf_map_is_dev_bound(map)) &&
@@ -10229,9 +10329,11 @@
return -EINVAL;
}
break;
+ case BPF_MAP_TYPE_RINGBUF:
+ break;
default:
verbose(env,
- "Sleepable programs can only use array and hash maps\n");
+ "Sleepable programs can only use array, hash, and ringbuf maps\n");
return -EINVAL;
}
@@ -10268,13 +10370,6 @@
return -EINVAL;
}
- if (BPF_CLASS(insn->code) == BPF_STX &&
- ((BPF_MODE(insn->code) != BPF_MEM &&
- BPF_MODE(insn->code) != BPF_XADD) || insn->imm != 0)) {
- verbose(env, "BPF_STX uses reserved fields\n");
- return -EINVAL;
- }
-
if (insn[0].code == (BPF_LD | BPF_IMM | BPF_DW)) {
struct bpf_insn_aux_data *aux;
struct bpf_map *map;
@@ -10789,8 +10884,10 @@
for (i = 0; i < len; i++) {
int adj_idx = i + delta;
struct bpf_insn insn;
+ int load_reg;
insn = insns[adj_idx];
+ load_reg = insn_def_regno(&insn);
if (!aux[adj_idx].zext_dst) {
u8 code, class;
u32 imm_rnd;
@@ -10800,14 +10897,14 @@
code = insn.code;
class = BPF_CLASS(code);
- if (insn_no_def(&insn))
+ if (load_reg == -1)
continue;
/* NOTE: arg "reg" (the fourth one) is only used for
- * BPF_STX which has been ruled out in above
- * check, it is safe to pass NULL here.
+ * BPF_STX + SRC_OP, so it is safe to pass NULL
+ * here.
*/
- if (is_reg64(env, &insn, insn.dst_reg, NULL, DST_OP)) {
+ if (is_reg64(env, &insn, load_reg, NULL, DST_OP)) {
if (class == BPF_LD &&
BPF_MODE(code) == BPF_IMM)
i++;
@@ -10822,18 +10919,33 @@
imm_rnd = get_random_int();
rnd_hi32_patch[0] = insn;
rnd_hi32_patch[1].imm = imm_rnd;
- rnd_hi32_patch[3].dst_reg = insn.dst_reg;
+ rnd_hi32_patch[3].dst_reg = load_reg;
patch = rnd_hi32_patch;
patch_len = 4;
goto apply_patch_buffer;
}
- if (!bpf_jit_needs_zext())
+ /* Add in a zero-extend instruction if a) the JIT has requested
+ * it or b) it's a CMPXCHG.
+ *
+ * The latter is because: BPF_CMPXCHG always loads a value into
+ * R0, therefore always zero-extends. However some archs'
+ * equivalent instruction only does this load when the
+ * comparison is successful. This detail of CMPXCHG is
+ * orthogonal to the general zero-extension behaviour of the
+ * CPU, so it's treated independently of bpf_jit_needs_zext.
+ */
+ if (!bpf_jit_needs_zext() && !is_cmpxchg_insn(&insn))
continue;
+ if (WARN_ON(load_reg == -1)) {
+ verbose(env, "verifier bug. zext_dst is set, but no reg is defined\n");
+ return -EFAULT;
+ }
+
zext_patch[0] = insn;
- zext_patch[1].dst_reg = insn.dst_reg;
- zext_patch[1].src_reg = insn.dst_reg;
+ zext_patch[1].dst_reg = load_reg;
+ zext_patch[1].src_reg = load_reg;
patch = zext_patch;
patch_len = 2;
apply_patch_buffer:
@@ -11941,20 +12053,6 @@
return -EINVAL;
}
-/* non exhaustive list of sleepable bpf_lsm_*() functions */
-BTF_SET_START(btf_sleepable_lsm_hooks)
-#ifdef CONFIG_BPF_LSM
-BTF_ID(func, bpf_lsm_bprm_committed_creds)
-#else
-BTF_ID_UNUSED
-#endif
-BTF_SET_END(btf_sleepable_lsm_hooks)
-
-static int check_sleepable_lsm_hook(u32 btf_id)
-{
- return btf_id_set_contains(&btf_sleepable_lsm_hooks, btf_id);
-}
-
/* list of non-sleepable functions that are otherwise on
* ALLOW_ERROR_INJECTION list
*/
@@ -12176,7 +12274,7 @@
/* LSM progs check that they are attached to bpf_lsm_*() funcs.
* Only some of them are sleepable.
*/
- if (check_sleepable_lsm_hook(btf_id))
+ if (bpf_lsm_is_sleepable_hook(btf_id))
ret = 0;
break;
default:
diff --git a/kernel/exit.c b/kernel/exit.c
index d13d67f..7ecf893 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -64,6 +64,7 @@
#include <linux/rcuwait.h>
#include <linux/compat.h>
#include <linux/io_uring.h>
+#include <linux/security.h>
#include <linux/uaccess.h>
#include <asm/unistd.h>
@@ -786,6 +787,8 @@
#endif
if (tsk->mm)
setmax_mm_hiwater_rss(&tsk->signal->maxrss, tsk->mm);
+
+ security_task_exit(tsk);
}
acct_collect(code, group_dead);
if (group_dead)
diff --git a/kernel/fork.c b/kernel/fork.c
index 7c044d3..ed32614 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2033,7 +2033,6 @@
posix_cputimers_init(&p->posix_cputimers);
p->io_context = NULL;
- audit_set_context(p, NULL);
cgroup_fork(p);
#ifdef CONFIG_NUMA
p->mempolicy = mpol_dup(p->mempolicy);
@@ -2317,6 +2316,7 @@
uprobe_copy_process(p, clone_flags);
copy_oom_score_adj(clone_flags, p);
+ security_task_post_alloc(p);
return p;
diff --git a/kernel/futex.c b/kernel/futex.c
index 3136aba..30048a7 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1585,16 +1585,16 @@
}
/*
- * Wake up waiters matching bitset queued on this futex (uaddr).
+ * Prepare wake queue matching bitset queued on this futex (uaddr).
*/
static int
-futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset)
+prepare_wake_q(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset,
+ struct wake_q_head *wake_q)
{
struct futex_hash_bucket *hb;
struct futex_q *this, *next;
union futex_key key = FUTEX_KEY_INIT;
int ret;
- DEFINE_WAKE_Q(wake_q);
if (!bitset)
return -EINVAL;
@@ -1622,14 +1622,28 @@
if (!(this->bitset & bitset))
continue;
- mark_wake_futex(&wake_q, this);
+ mark_wake_futex(wake_q, this);
if (++ret >= nr_wake)
break;
}
}
spin_unlock(&hb->lock);
- wake_up_q(&wake_q);
+ return ret;
+}
+
+/*
+ * Wake up waiters matching bitset queued on this futex (uaddr).
+ */
+static int
+futex_wake(u32 __user *uaddr, unsigned int flags, int nr_wake, u32 bitset)
+{
+ int ret;
+ DEFINE_WAKE_Q(wake_q);
+
+ ret = prepare_wake_q(uaddr, flags, nr_wake, bitset, &wake_q);
+ if (ret > 0)
+ wake_up_q(&wake_q);
return ret;
}
@@ -2577,9 +2591,12 @@
* @hb: the futex hash bucket, must be locked by the caller
* @q: the futex_q to queue up on
* @timeout: the prepared hrtimer_sleeper, or null for no timeout
+ * @next: if present, wake next and hint to the scheduler that we'd
+ * prefer to execute it locally.
*/
static void futex_wait_queue_me(struct futex_hash_bucket *hb, struct futex_q *q,
- struct hrtimer_sleeper *timeout)
+ struct hrtimer_sleeper *timeout,
+ struct task_struct *next)
{
/*
* The task state is guaranteed to be set before another task can
@@ -2604,10 +2621,25 @@
* flagged for rescheduling. Only call schedule if there
* is no timeout, or if it has yet to expire.
*/
- if (!timeout || timeout->task)
+ if (!timeout || timeout->task) {
+ if (next) {
+#ifdef CONFIG_SMP
+ wake_up_swap(next);
+#else
+ wake_up_process(next);
+#endif
+ put_task_struct(next);
+ next = NULL;
+ }
freezable_schedule();
+ }
}
__set_current_state(TASK_RUNNING);
+
+ if (next) {
+ wake_up_process(next);
+ put_task_struct(next);
+ }
}
/**
@@ -2683,7 +2715,7 @@
}
static int futex_wait(u32 __user *uaddr, unsigned int flags, u32 val,
- ktime_t *abs_time, u32 bitset)
+ ktime_t *abs_time, u32 bitset, struct task_struct *next)
{
struct hrtimer_sleeper timeout, *to;
struct restart_block *restart;
@@ -2707,7 +2739,8 @@
goto out;
/* queue_me and wait for wakeup, timeout, or a signal. */
- futex_wait_queue_me(hb, &q, to);
+ futex_wait_queue_me(hb, &q, to, next);
+ next = NULL;
/* If we were woken (and unqueued), we succeeded, whatever. */
ret = 0;
@@ -2739,6 +2772,10 @@
ret = set_restart_fn(restart, futex_wait_restart);
out:
+ if (next) {
+ wake_up_process(next);
+ put_task_struct(next);
+ }
if (to) {
hrtimer_cancel(&to->timer);
destroy_hrtimer_on_stack(&to->timer);
@@ -2758,10 +2795,30 @@
}
restart->fn = do_no_restart_syscall;
- return (long)futex_wait(uaddr, restart->futex.flags,
- restart->futex.val, tp, restart->futex.bitset);
+ return (long)futex_wait(uaddr, restart->futex.flags, restart->futex.val,
+ tp, restart->futex.bitset, NULL);
}
+static int futex_swap(u32 __user *uaddr, unsigned int flags, u32 val,
+ ktime_t *abs_time, u32 __user *uaddr2)
+{
+ u32 bitset = FUTEX_BITSET_MATCH_ANY;
+ struct task_struct *next = NULL;
+ DEFINE_WAKE_Q(wake_q);
+ int ret;
+
+ ret = prepare_wake_q(uaddr2, flags, 1, bitset, &wake_q);
+ if (ret < 0)
+ return ret;
+ if (wake_q.first != WAKE_Q_TAIL) {
+ WARN_ON(ret != 1);
+ /* At most one wakee can be present. Pull it out. */
+ next = container_of(wake_q.first, struct task_struct, wake_q);
+ next->wake_q.next = NULL;
+ }
+
+ return futex_wait(uaddr, flags, val, abs_time, bitset, next);
+}
/*
* Userspace tried a 0 -> TID atomic transition of the futex value
@@ -3223,7 +3280,7 @@
}
/* Queue the futex_q, drop the hb lock, wait for wakeup. */
- futex_wait_queue_me(hb, &q, to);
+ futex_wait_queue_me(hb, &q, to, NULL);
spin_lock(&hb->lock);
ret = handle_early_requeue_pi_wakeup(hb, &q, &key2, to);
@@ -3733,7 +3790,7 @@
val3 = FUTEX_BITSET_MATCH_ANY;
fallthrough;
case FUTEX_WAIT_BITSET:
- return futex_wait(uaddr, flags, val, timeout, val3);
+ return futex_wait(uaddr, flags, val, timeout, val3, NULL);
case FUTEX_WAKE:
val3 = FUTEX_BITSET_MATCH_ANY;
fallthrough;
@@ -3757,6 +3814,8 @@
uaddr2);
case FUTEX_CMP_REQUEUE_PI:
return futex_requeue(uaddr, flags, uaddr2, val, val2, &val3, 1);
+ case GFUTEX_SWAP:
+ return futex_swap(uaddr, flags, val, timeout, uaddr2);
}
return -ENOSYS;
}
@@ -3773,7 +3832,7 @@
if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI ||
cmd == FUTEX_WAIT_BITSET ||
- cmd == FUTEX_WAIT_REQUEUE_PI)) {
+ cmd == FUTEX_WAIT_REQUEUE_PI || cmd == GFUTEX_SWAP)) {
if (unlikely(should_fail_futex(!(op & FUTEX_PRIVATE_FLAG))))
return -EFAULT;
if (get_timespec64(&ts, utime))
@@ -3782,7 +3841,7 @@
return -EINVAL;
t = timespec64_to_ktime(ts);
- if (cmd == FUTEX_WAIT)
+ if (cmd == FUTEX_WAIT || cmd == GFUTEX_SWAP)
t = ktime_add_safe(ktime_get(), t);
else if (cmd != FUTEX_LOCK_PI && !(op & FUTEX_CLOCK_REALTIME))
t = timens_ktime_to_host(CLOCK_MONOTONIC, t);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 57b2362..1f2a641 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3047,6 +3047,11 @@
}
EXPORT_SYMBOL(wake_up_process);
+int wake_up_swap(struct task_struct *tsk)
+{
+ return try_to_wake_up(tsk, TASK_NORMAL, WF_CURRENT_CPU);
+}
+
int wake_up_state(struct task_struct *p, unsigned int state)
{
return try_to_wake_up(p, state, 0);
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 1ad0e52..92cd153 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -6734,6 +6734,9 @@
int want_affine = 0;
int sync = (wake_flags & WF_SYNC) && !(current->flags & PF_EXITING);
+ if ((wake_flags & WF_CURRENT_CPU) && cpumask_test_cpu(cpu, p->cpus_ptr))
+ return cpu;
+
if (sd_flag & SD_BALANCE_WAKE) {
record_wakee(p);
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index fdebfcb..91578ff 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1723,6 +1723,7 @@
#define WF_FORK 0x02 /* Child wakeup after fork */
#define WF_MIGRATED 0x04 /* Internal use, task got migrated */
#define WF_ON_CPU 0x08 /* Wakee is on_cpu */
+#define WF_CURRENT_CPU 0x200 /* Prefer to move wakee to the current CPU */
/*
* To aid in avoiding the subversion of "niceness" due to uneven distribution
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index fcbfc95..327fba7 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -16,6 +16,8 @@
#include <linux/syscalls.h>
#include <linux/error-injection.h>
#include <linux/btf_ids.h>
+#include <linux/bpf_lsm.h>
+#include <net/bpf_sk_storage.h>
#include <uapi/linux/bpf.h>
#include <uapi/linux/btf.h>
@@ -1029,6 +1031,20 @@
.ret_type = RET_INTEGER,
};
+BPF_CALL_0(bpf_get_current_task_btf)
+{
+ return (unsigned long) current;
+}
+
+BTF_ID_LIST_SINGLE(bpf_get_current_btf_ids, struct, task_struct)
+
+static const struct bpf_func_proto bpf_get_current_task_btf_proto = {
+ .func = bpf_get_current_task_btf,
+ .gpl_only = true,
+ .ret_type = RET_PTR_TO_BTF_ID,
+ .ret_btf_id = &bpf_get_current_btf_ids[0],
+};
+
BPF_CALL_2(bpf_current_task_under_cgroup, struct bpf_map *, map, u32, idx)
{
struct bpf_array *array = container_of(map, struct bpf_array, map);
@@ -1171,7 +1187,11 @@
static bool bpf_d_path_allowed(const struct bpf_prog *prog)
{
- return btf_id_set_contains(&btf_allowlist_d_path, prog->aux->attach_btf_id);
+ if (prog->type == BPF_PROG_TYPE_LSM)
+ return bpf_lsm_is_sleepable_hook(prog->aux->attach_btf_id);
+
+ return btf_id_set_contains(&btf_allowlist_d_path,
+ prog->aux->attach_btf_id);
}
BTF_ID_LIST_SINGLE(bpf_d_path_btf_ids, struct, path)
@@ -1266,12 +1286,16 @@
return &bpf_ktime_get_ns_proto;
case BPF_FUNC_ktime_get_boot_ns:
return &bpf_ktime_get_boot_ns_proto;
+ case BPF_FUNC_ktime_get_coarse_ns:
+ return &bpf_ktime_get_coarse_ns_proto;
case BPF_FUNC_tail_call:
return &bpf_tail_call_proto;
case BPF_FUNC_get_current_pid_tgid:
return &bpf_get_current_pid_tgid_proto;
case BPF_FUNC_get_current_task:
return &bpf_get_current_task_proto;
+ case BPF_FUNC_get_current_task_btf:
+ return &bpf_get_current_task_btf_proto;
case BPF_FUNC_get_current_uid_gid:
return &bpf_get_current_uid_gid_proto;
case BPF_FUNC_get_current_comm:
@@ -1726,6 +1750,8 @@
return &bpf_skc_to_tcp_request_sock_proto;
case BPF_FUNC_skc_to_udp6_sock:
return &bpf_skc_to_udp6_sock_proto;
+ case BPF_FUNC_get_socket_cookie:
+ return &bpf_get_socket_ptr_cookie_proto;
#endif
case BPF_FUNC_seq_printf:
return prog->expected_attach_type == BPF_TRACE_ITER ?
diff --git a/lib/test_bpf.c b/lib/test_bpf.c
index ca7d635..49ec9e8 100644
--- a/lib/test_bpf.c
+++ b/lib/test_bpf.c
@@ -4295,13 +4295,13 @@
{ { 0, 0xffffffff } },
.stack_depth = 40,
},
- /* BPF_STX | BPF_XADD | BPF_W/DW */
+ /* BPF_STX | BPF_ATOMIC | BPF_W/DW */
{
"STX_XADD_W: Test: 0x12 + 0x10 = 0x22",
.u.insns_int = {
BPF_ALU32_IMM(BPF_MOV, R0, 0x12),
BPF_ST_MEM(BPF_W, R10, -40, 0x10),
- BPF_STX_XADD(BPF_W, R10, R0, -40),
+ BPF_ATOMIC_OP(BPF_W, BPF_ADD, R10, R0, -40),
BPF_LDX_MEM(BPF_W, R0, R10, -40),
BPF_EXIT_INSN(),
},
@@ -4316,7 +4316,7 @@
BPF_ALU64_REG(BPF_MOV, R1, R10),
BPF_ALU32_IMM(BPF_MOV, R0, 0x12),
BPF_ST_MEM(BPF_W, R10, -40, 0x10),
- BPF_STX_XADD(BPF_W, R10, R0, -40),
+ BPF_ATOMIC_OP(BPF_W, BPF_ADD, R10, R0, -40),
BPF_ALU64_REG(BPF_MOV, R0, R10),
BPF_ALU64_REG(BPF_SUB, R0, R1),
BPF_EXIT_INSN(),
@@ -4331,7 +4331,7 @@
.u.insns_int = {
BPF_ALU32_IMM(BPF_MOV, R0, 0x12),
BPF_ST_MEM(BPF_W, R10, -40, 0x10),
- BPF_STX_XADD(BPF_W, R10, R0, -40),
+ BPF_ATOMIC_OP(BPF_W, BPF_ADD, R10, R0, -40),
BPF_EXIT_INSN(),
},
INTERNAL,
@@ -4352,7 +4352,7 @@
.u.insns_int = {
BPF_ALU32_IMM(BPF_MOV, R0, 0x12),
BPF_ST_MEM(BPF_DW, R10, -40, 0x10),
- BPF_STX_XADD(BPF_DW, R10, R0, -40),
+ BPF_ATOMIC_OP(BPF_DW, BPF_ADD, R10, R0, -40),
BPF_LDX_MEM(BPF_DW, R0, R10, -40),
BPF_EXIT_INSN(),
},
@@ -4367,7 +4367,7 @@
BPF_ALU64_REG(BPF_MOV, R1, R10),
BPF_ALU32_IMM(BPF_MOV, R0, 0x12),
BPF_ST_MEM(BPF_DW, R10, -40, 0x10),
- BPF_STX_XADD(BPF_DW, R10, R0, -40),
+ BPF_ATOMIC_OP(BPF_DW, BPF_ADD, R10, R0, -40),
BPF_ALU64_REG(BPF_MOV, R0, R10),
BPF_ALU64_REG(BPF_SUB, R0, R1),
BPF_EXIT_INSN(),
@@ -4382,7 +4382,7 @@
.u.insns_int = {
BPF_ALU32_IMM(BPF_MOV, R0, 0x12),
BPF_ST_MEM(BPF_DW, R10, -40, 0x10),
- BPF_STX_XADD(BPF_DW, R10, R0, -40),
+ BPF_ATOMIC_OP(BPF_DW, BPF_ADD, R10, R0, -40),
BPF_EXIT_INSN(),
},
INTERNAL,
diff --git a/net/core/filter.c b/net/core/filter.c
index ef6bdbb..14d9e4d 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4628,6 +4628,18 @@
.arg1_type = ARG_PTR_TO_CTX,
};
+BPF_CALL_1(bpf_get_socket_ptr_cookie, struct sock *, sk)
+{
+ return sk ? sock_gen_cookie(sk) : 0;
+}
+
+const struct bpf_func_proto bpf_get_socket_ptr_cookie_proto = {
+ .func = bpf_get_socket_ptr_cookie,
+ .gpl_only = false,
+ .ret_type = RET_INTEGER,
+ .arg1_type = ARG_PTR_TO_BTF_ID_SOCK_COMMON,
+};
+
BPF_CALL_1(bpf_get_socket_cookie_sock_ops, struct bpf_sock_ops_kern *, ctx)
{
return __sock_gen_cookie(ctx->sk);
diff --git a/samples/bpf/bpf_insn.h b/samples/bpf/bpf_insn.h
index 5442379..db67a28 100644
--- a/samples/bpf/bpf_insn.h
+++ b/samples/bpf/bpf_insn.h
@@ -138,11 +138,11 @@
#define BPF_STX_XADD(SIZE, DST, SRC, OFF) \
((struct bpf_insn) { \
- .code = BPF_STX | BPF_SIZE(SIZE) | BPF_XADD, \
+ .code = BPF_STX | BPF_SIZE(SIZE) | BPF_ATOMIC, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = OFF, \
- .imm = 0 })
+ .imm = BPF_ADD })
/* Memory store, *(uint *) (dst_reg + off16) = imm32 */
diff --git a/samples/bpf/cookie_uid_helper_example.c b/samples/bpf/cookie_uid_helper_example.c
index deb0e3e..c5ff7a1 100644
--- a/samples/bpf/cookie_uid_helper_example.c
+++ b/samples/bpf/cookie_uid_helper_example.c
@@ -147,12 +147,12 @@
*/
BPF_MOV64_REG(BPF_REG_9, BPF_REG_0),
BPF_MOV64_IMM(BPF_REG_1, 1),
- BPF_STX_XADD(BPF_DW, BPF_REG_9, BPF_REG_1,
- offsetof(struct stats, packets)),
+ BPF_ATOMIC_OP(BPF_DW, BPF_ADD, BPF_REG_9, BPF_REG_1,
+ offsetof(struct stats, packets)),
BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_6,
offsetof(struct __sk_buff, len)),
- BPF_STX_XADD(BPF_DW, BPF_REG_9, BPF_REG_1,
- offsetof(struct stats, bytes)),
+ BPF_ATOMIC_OP(BPF_DW, BPF_ADD, BPF_REG_9, BPF_REG_1,
+ offsetof(struct stats, bytes)),
BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_6,
offsetof(struct __sk_buff, len)),
BPF_EXIT_INSN(),
diff --git a/samples/bpf/sock_example.c b/samples/bpf/sock_example.c
index 00aae1d..23d1930 100644
--- a/samples/bpf/sock_example.c
+++ b/samples/bpf/sock_example.c
@@ -54,7 +54,7 @@
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
BPF_MOV64_IMM(BPF_REG_1, 1), /* r1 = 1 */
- BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0), /* xadd r0 += r1 */
+ BPF_ATOMIC_OP(BPF_DW, BPF_ADD, BPF_REG_0, BPF_REG_1, 0),
BPF_MOV64_IMM(BPF_REG_0, 0), /* r0 = 0 */
BPF_EXIT_INSN(),
};
diff --git a/samples/bpf/test_cgrp2_attach.c b/samples/bpf/test_cgrp2_attach.c
index 20fbd12..390ff38 100644
--- a/samples/bpf/test_cgrp2_attach.c
+++ b/samples/bpf/test_cgrp2_attach.c
@@ -53,7 +53,7 @@
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
BPF_MOV64_IMM(BPF_REG_1, 1), /* r1 = 1 */
- BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0), /* xadd r0 += r1 */
+ BPF_ATOMIC_OP(BPF_DW, BPF_ADD, BPF_REG_0, BPF_REG_1, 0),
/* Count bytes */
BPF_MOV64_IMM(BPF_REG_0, MAP_KEY_BYTES), /* r0 = 1 */
@@ -64,7 +64,8 @@
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
BPF_LDX_MEM(BPF_W, BPF_REG_1, BPF_REG_6, offsetof(struct __sk_buff, len)), /* r1 = skb->len */
- BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0), /* xadd r0 += r1 */
+
+ BPF_ATOMIC_OP(BPF_DW, BPF_ADD, BPF_REG_0, BPF_REG_1, 0),
BPF_MOV64_IMM(BPF_REG_0, verdict), /* r0 = verdict */
BPF_EXIT_INSN(),
diff --git a/scripts/bpf_helpers_doc.py b/scripts/bpf_helpers_doc.py
index 3148437..8b829748 100755
--- a/scripts/bpf_helpers_doc.py
+++ b/scripts/bpf_helpers_doc.py
@@ -418,6 +418,7 @@
'struct bpf_tcp_sock',
'struct bpf_tunnel_key',
'struct bpf_xfrm_state',
+ 'struct linux_binprm',
'struct pt_regs',
'struct sk_reuseport_md',
'struct sockaddr',
@@ -435,6 +436,7 @@
'struct xdp_md',
'struct path',
'struct btf_ptr',
+ 'struct inode',
]
known_types = {
'...',
@@ -465,6 +467,7 @@
'struct bpf_tcp_sock',
'struct bpf_tunnel_key',
'struct bpf_xfrm_state',
+ 'struct linux_binprm',
'struct pt_regs',
'struct sk_reuseport_md',
'struct sockaddr',
@@ -478,6 +481,7 @@
'struct task_struct',
'struct path',
'struct btf_ptr',
+ 'struct inode',
}
mapped_types = {
'u8': '__u8',
diff --git a/security/Kconfig b/security/Kconfig
index 7561f6f..98355ed 100644
--- a/security/Kconfig
+++ b/security/Kconfig
@@ -236,6 +236,7 @@
source "security/apparmor/Kconfig"
source "security/loadpin/Kconfig"
source "security/yama/Kconfig"
+source "security/container/Kconfig"
source "security/safesetid/Kconfig"
source "security/lockdown/Kconfig"
diff --git a/security/Makefile b/security/Makefile
index 3baf435..f54e22d 100644
--- a/security/Makefile
+++ b/security/Makefile
@@ -10,6 +10,7 @@
subdir-$(CONFIG_SECURITY_APPARMOR) += apparmor
subdir-$(CONFIG_SECURITY_YAMA) += yama
subdir-$(CONFIG_SECURITY_LOADPIN) += loadpin
+subdir-$(CONFIG_SECURITY_CONTAINER_MONITOR) += container
subdir-$(CONFIG_SECURITY_SAFESETID) += safesetid
subdir-$(CONFIG_SECURITY_LOCKDOWN_LSM) += lockdown
subdir-$(CONFIG_BPF_LSM) += bpf
@@ -32,6 +33,7 @@
obj-$(CONFIG_SECURITY_LOCKDOWN_LSM) += lockdown/
obj-$(CONFIG_CGROUPS) += device_cgroup.o
obj-$(CONFIG_BPF_LSM) += bpf/
+obj-$(CONFIG_SECURITY_CONTAINER_MONITOR) += container/
# Object integrity file lists
subdir-$(CONFIG_INTEGRITY) += integrity
diff --git a/security/bpf/hooks.c b/security/bpf/hooks.c
index 788667d..e5971fa 100644
--- a/security/bpf/hooks.c
+++ b/security/bpf/hooks.c
@@ -12,6 +12,7 @@
#include <linux/lsm_hook_defs.h>
#undef LSM_HOOK
LSM_HOOK_INIT(inode_free_security, bpf_inode_storage_free),
+ LSM_HOOK_INIT(task_free, bpf_task_storage_free),
};
static int __init bpf_lsm_init(void)
@@ -23,6 +24,7 @@
struct lsm_blob_sizes bpf_lsm_blob_sizes __lsm_ro_after_init = {
.lbs_inode = sizeof(struct bpf_storage_blob),
+ .lbs_task = sizeof(struct bpf_storage_blob),
};
DEFINE_LSM(bpf) = {
diff --git a/security/container/Kconfig b/security/container/Kconfig
new file mode 100644
index 0000000..827fe0a
--- /dev/null
+++ b/security/container/Kconfig
@@ -0,0 +1,18 @@
+config SECURITY_CONTAINER_MONITOR
+ bool "Monitor containerized processes"
+ depends on SECURITY
+ depends on MMU
+ depends on VSOCKETS=y
+ depends on X86_64
+ select SECURITYFS
+ help
+ Instrument the Linux kernel to collect more information about containers
+ and identify security threats.
+
+config SECURITY_CONTAINER_MONITOR_DEBUG
+ bool "Enable debug pr_devel logs"
+ depends on SECURITY_CONTAINER_MONITOR
+ help
+ Define DEBUG for CSM files to compile verbose debugging messages.
+
+ Only for debugging/testing; do not enable for production.
diff --git a/security/container/Makefile b/security/container/Makefile
new file mode 100644
index 0000000..678d0f7
--- /dev/null
+++ b/security/container/Makefile
@@ -0,0 +1,16 @@
+PB_CCFLAGS := -DPB_SYSTEM_HEADER="<pbsystem.h>" \
+ -DPB_NO_ERRMSG \
+ -DPB_FIELD_16BIT \
+ -DPB_BUFFER_ONLY
+export PB_CCFLAGS
+
+subdir-$(CONFIG_SECURITY_CONTAINER_MONITOR) += protos
+
+obj-$(CONFIG_SECURITY_CONTAINER_MONITOR) += protos/
+obj-$(CONFIG_SECURITY_CONTAINER_MONITOR) += monitor.o pb.o process.o vsock.o
+
+ccflags-y := -I$(srctree)/security/container/protos \
+ -I$(srctree)/security/container/protos/nanopb \
+ -I$(srctree)/fs \
+ $(PB_CCFLAGS)
+ccflags-$(CONFIG_SECURITY_CONTAINER_MONITOR_DEBUG) += -DDEBUG
diff --git a/security/container/monitor.c b/security/container/monitor.c
new file mode 100644
index 0000000..f05769b
--- /dev/null
+++ b/security/container/monitor.c
@@ -0,0 +1,782 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Container Security Monitor module
+ *
+ * Copyright (c) 2018 Google, Inc
+ */
+
+#include "monitor.h"
+#include "process.h"
+
+#include <linux/audit.h>
+#include <linux/lsm_hooks.h>
+#include <linux/module.h>
+#include <linux/pipe_fs_i.h>
+#include <linux/rwsem.h>
+#include <linux/string.h>
+#include <linux/sysctl.h>
+#include <linux/socket.h>
+#include <net/sock.h>
+#include <linux/vm_sockets.h>
+#include <linux/file.h>
+
+/* protects csm_*_enabled and configurations. */
+DECLARE_RWSEM(csm_rwsem_config);
+
+/* protects csm_host_port and csm_vsocket. */
+DECLARE_RWSEM(csm_rwsem_vsocket);
+
+/* queue used for poll wait on config changes. */
+static DECLARE_WAIT_QUEUE_HEAD(config_wait);
+
+/* Incremented each time a new configuration is applied. */
+static unsigned long config_version;
+
+/* Stats gathered from the LSM. */
+struct container_stats csm_stats;
+
+struct container_stats_mapping {
+ const char *key;
+ size_t *value;
+};
+
+/* Key value pair mapping for the sysfs entry. */
+struct container_stats_mapping csm_stats_mapping[] = {
+ { "ProtoEncodingFailed", &csm_stats.proto_encoding_failed },
+ { "WorkQueueFailed", &csm_stats.workqueue_failed },
+ { "EventWritingFailed", &csm_stats.event_writing_failed },
+ { "SizePickingFailed", &csm_stats.size_picking_failed },
+ { "PipeAlreadyOpened", &csm_stats.pipe_already_opened },
+};
+
+/*
+ * Is monitoring enabled? Defaults to disabled.
+ * These variables might be used without locking csm_rwsem_config to check if an
+ * LSM hook can bail quickly. The semaphore is taken later to ensure CSM is
+ * still enabled.
+ *
+ * csm_enabled is true if any collector is enabled.
+ */
+bool csm_enabled;
+static bool csm_container_enabled;
+bool csm_execute_enabled;
+bool csm_memexec_enabled;
+
+/* securityfs control files */
+static struct dentry *csm_dir;
+static struct dentry *csm_enabled_file;
+static struct dentry *csm_container_file;
+static struct dentry *csm_config_file;
+static struct dentry *csm_config_vers_file;
+static struct dentry *csm_pipe_file;
+static struct dentry *csm_stats_file;
+
+/* Pipes to forward data to user-mode. */
+DECLARE_RWSEM(csm_rwsem_pipe);
+static struct file *csm_user_read_pipe;
+struct file *csm_user_write_pipe;
+
+/* Option to disable the CSM features at boot. */
+static bool cmdline_boot_disabled;
+bool cmdline_boot_vsock_enabled;
+
+/* Options disabled by default. */
+static bool cmdline_boot_pipe_enabled;
+static bool cmdline_boot_config_enabled;
+
+/* Option to fully enable the LSM at boot for automated testing. */
+static bool cmdline_default_enabled;
+
+static int csm_boot_disabled_setup(char *str)
+{
+ return kstrtobool(str, &cmdline_boot_disabled);
+}
+early_param("csm.disabled", csm_boot_disabled_setup);
+
+static int csm_default_enabled_setup(char *str)
+{
+ return kstrtobool(str, &cmdline_default_enabled);
+}
+early_param("csm.default.enabled", csm_default_enabled_setup);
+
+static int csm_boot_vsock_enabled_setup(char *str)
+{
+ return kstrtobool(str, &cmdline_boot_vsock_enabled);
+}
+early_param("csm.vsock.enabled", csm_boot_vsock_enabled_setup);
+
+static int csm_boot_pipe_enabled_setup(char *str)
+{
+ return kstrtobool(str, &cmdline_boot_pipe_enabled);
+}
+early_param("csm.pipe.enabled", csm_boot_pipe_enabled_setup);
+
+static int csm_boot_config_enabled_setup(char *str)
+{
+ return kstrtobool(str, &cmdline_boot_config_enabled);
+}
+early_param("csm.config.enabled", csm_boot_config_enabled_setup);
+
+static bool pipe_in_use(void)
+{
+ struct pipe_inode_info *pipe;
+
+ lockdep_assert_held_write(&csm_rwsem_config);
+ if (csm_user_read_pipe) {
+ pipe = get_pipe_info(csm_user_read_pipe, false);
+ if (pipe)
+ return READ_ONCE(pipe->readers) > 1;
+ }
+ return false;
+}
+
+/* Close the pipe; force must be true to close it while it is still in use. */
+int close_pipe_files(bool force)
+{
+ if (csm_user_read_pipe) {
+ /* Pipe is still used. */
+ if (pipe_in_use()) {
+ if (!force)
+ return -EBUSY;
+			pr_warn("closing pipe while it is still being used\n");
+ }
+
+ fput(csm_user_read_pipe);
+ fput(csm_user_write_pipe);
+ csm_user_read_pipe = NULL;
+ csm_user_write_pipe = NULL;
+ }
+ return 0;
+}
+
+static void csm_update_config(schema_ConfigurationRequest *req)
+{
+ schema_ExecuteCollectorConfig *econf;
+ size_t i;
+ bool enumerate_processes = false;
+
+ /* Expect the lock to be held for write before this call. */
+ lockdep_assert_held_write(&csm_rwsem_config);
+
+ /* This covers the scenario where a client is connected and the config
+ * transitions the execute collector from disabled to enabled. In that
+	 * case some execute events may have been missed, so existing
+	 * processes are enumerated.
+ */
+ if (!csm_execute_enabled && req->execute_config.enabled &&
+ pipe_in_use())
+ enumerate_processes = true;
+
+ csm_container_enabled = req->container_config.enabled;
+ csm_execute_enabled = req->execute_config.enabled;
+ csm_memexec_enabled = req->memexec_config.enabled;
+
+ /* csm_enabled is true if any collector is enabled. */
+ csm_enabled = csm_container_enabled || csm_execute_enabled ||
+ csm_memexec_enabled;
+
+	/* Clean up the existing configuration. */
+ kfree(csm_execute_config.envp_allowlist);
+ memset(&csm_execute_config, 0, sizeof(csm_execute_config));
+
+ if (csm_execute_enabled) {
+ econf = &req->execute_config;
+ csm_execute_config.argv_limit = econf->argv_limit;
+ csm_execute_config.envp_limit = econf->envp_limit;
+
+ /* Swap the allowlist so it is not freed on return. */
+ csm_execute_config.envp_allowlist = econf->envp_allowlist.arg;
+ econf->envp_allowlist.arg = NULL;
+ }
+
+ /* Reset all stats and close pipe if disabled. */
+ if (!csm_enabled) {
+ for (i = 0; i < ARRAY_SIZE(csm_stats_mapping); i++)
+ *csm_stats_mapping[i].value = 0;
+
+ close_pipe_files(true);
+ }
+
+ config_version++;
+ if (enumerate_processes)
+ csm_enumerate_processes();
+ wake_up(&config_wait);
+}
+
+int csm_update_config_from_buffer(void *data, size_t size)
+{
+ schema_ConfigurationRequest c = schema_ConfigurationRequest_init_zero;
+ pb_istream_t istream;
+
+ c.execute_config.envp_allowlist.funcs.decode = pb_decode_string_array;
+
+ istream = pb_istream_from_buffer(data, size);
+ if (!pb_decode(&istream, schema_ConfigurationRequest_fields, &c)) {
+ kfree(c.execute_config.envp_allowlist.arg);
+ return -EINVAL;
+ }
+
+ down_write(&csm_rwsem_config);
+ csm_update_config(&c);
+ up_write(&csm_rwsem_config);
+
+ return 0;
+}
+
+static ssize_t csm_config_write(struct file *file, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ ssize_t err = 0;
+ void *mem;
+
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
+ /* No partial writes. */
+ if (*ppos != 0)
+ return -EINVAL;
+
+ /* Duplicate user memory to safely parse protobuf. */
+ mem = memdup_user(buf, count);
+ if (IS_ERR(mem))
+ return PTR_ERR(mem);
+
+ err = csm_update_config_from_buffer(mem, count);
+ if (!err)
+ err = count;
+
+ kfree(mem);
+ return err;
+}
+
+static const struct file_operations csm_config_fops = {
+ .write = csm_config_write,
+};
+
+static void csm_enable(void)
+{
+ schema_ConfigurationRequest req = schema_ConfigurationRequest_init_zero;
+
+ /* Expect the lock to be held for write before this call. */
+ lockdep_assert_held_write(&csm_rwsem_config);
+
+ /* Default configuration */
+ req.container_config.enabled = true;
+ req.execute_config.enabled = true;
+ req.execute_config.argv_limit = UINT_MAX;
+ req.execute_config.envp_limit = UINT_MAX;
+ req.memexec_config.enabled = true;
+ csm_update_config(&req);
+}
+
+static void csm_disable(void)
+{
+ schema_ConfigurationRequest req = schema_ConfigurationRequest_init_zero;
+
+ /* Expect the lock to be held for write before this call. */
+ lockdep_assert_held_write(&csm_rwsem_config);
+
+	/* A zeroed configuration disables all collectors. */
+ csm_update_config(&req);
+ pr_info("disabled\n");
+}
+
+static ssize_t csm_enabled_read(struct file *file, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ const char *str = csm_enabled ? "1\n" : "0\n";
+
+ return simple_read_from_buffer(buf, count, ppos, str, 2);
+}
+
+static ssize_t csm_enabled_write(struct file *file, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ bool enabled;
+ int err;
+
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+
+ if (count <= 0 || count > PAGE_SIZE || *ppos)
+ return -EINVAL;
+
+ err = kstrtobool_from_user(buf, count, &enabled);
+ if (err)
+ return err;
+
+ down_write(&csm_rwsem_config);
+
+ if (enabled)
+ csm_enable();
+ else
+ csm_disable();
+
+ up_write(&csm_rwsem_config);
+
+ return count;
+}
+
+static const struct file_operations csm_enabled_fops = {
+ .read = csm_enabled_read,
+ .write = csm_enabled_write,
+};
+
+static int csm_config_version_open(struct inode *inode, struct file *file)
+{
+ /* private_data is used to keep the latest config version read. */
+ file->private_data = (void*)-1;
+ return 0;
+}
+
+static ssize_t csm_config_version_read(struct file *file, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ unsigned long version = config_version;
+ file->private_data = (void*)version;
+ return simple_read_from_buffer(buf, count, ppos, &version,
+ sizeof(version));
+}
+
+static __poll_t csm_config_version_poll(struct file *file,
+ struct poll_table_struct *poll_tab)
+{
+ if ((unsigned long)file->private_data != config_version)
+ return EPOLLIN;
+ poll_wait(file, &config_wait, poll_tab);
+ if ((unsigned long)file->private_data != config_version)
+ return EPOLLIN;
+ return 0;
+}
+
+static const struct file_operations csm_config_version_fops = {
+ .open = csm_config_version_open,
+ .read = csm_config_version_read,
+ .poll = csm_config_version_poll,
+};
+
+static int csm_pipe_open(struct inode *inode, struct file *file)
+{
+ if (!capable(CAP_SYS_ADMIN))
+ return -EPERM;
+ if (!csm_enabled)
+ return -EAGAIN;
+ return 0;
+}
+
+/* Similar to file_clone_open, which is only available in 4.19 and up. */
+static inline struct file *pipe_clone_open(struct file *file)
+{
+ return dentry_open(&file->f_path, file->f_flags, file->f_cred);
+}
+
+/* Check if the pipe is still used, else recreate and dup it. */
+static struct file *csm_dup_pipe(void)
+{
+ long pipe_size = 1024 * PAGE_SIZE;
+ long actual_size;
+ struct file *pipes[2] = {NULL, NULL};
+ struct file *ret;
+ int err;
+
+ down_write(&csm_rwsem_pipe);
+
+ err = close_pipe_files(false);
+ if (err) {
+ ret = ERR_PTR(err);
+ csm_stats.pipe_already_opened++;
+ goto out;
+ }
+
+ err = create_pipe_files(pipes, O_NONBLOCK);
+ if (err) {
+ ret = ERR_PTR(err);
+ goto out;
+ }
+
+ /*
+	 * Try to increase the pipe size to 1024 pages; if there is not
+	 * enough memory, the pipe will stay unchanged.
+ */
+ actual_size = pipe_fcntl(pipes[0], F_SETPIPE_SZ, pipe_size);
+ if (actual_size != pipe_size)
+		pr_err("failed to resize pipe to 1024 pages, error: %ld, falling back to the default value\n",
+ actual_size);
+
+ csm_user_read_pipe = pipes[0];
+ csm_user_write_pipe = pipes[1];
+
+ /* Clone the file so we can track if the reader is still used. */
+ ret = pipe_clone_open(csm_user_read_pipe);
+
+out:
+ up_write(&csm_rwsem_pipe);
+ return ret;
+}
+
+static ssize_t csm_pipe_read(struct file *file, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ int fd;
+ ssize_t err;
+ struct file *local_pipe;
+
+ /* No partial reads. */
+ if (*ppos != 0)
+ return -EINVAL;
+
+ fd = get_unused_fd_flags(0);
+ if (fd < 0)
+ return fd;
+
+ local_pipe = csm_dup_pipe();
+ if (IS_ERR(local_pipe)) {
+ err = PTR_ERR(local_pipe);
+ local_pipe = NULL;
+ goto error;
+ }
+
+ err = simple_read_from_buffer(buf, count, ppos, &fd, sizeof(fd));
+ if (err < 0)
+ goto error;
+
+ if (err < sizeof(fd)) {
+ err = -EINVAL;
+ goto error;
+ }
+
+ /* Install the file descriptor when we know everything succeeded. */
+ fd_install(fd, local_pipe);
+
+ csm_enumerate_processes();
+
+ return err;
+
+error:
+ if (local_pipe)
+ fput(local_pipe);
+ put_unused_fd(fd);
+ return err;
+}
+
+static const struct file_operations csm_pipe_fops = {
+ .open = csm_pipe_open,
+ .read = csm_pipe_read,
+};
+
+static void set_container_decode_callbacks(schema_Container *container)
+{
+ container->pod_namespace.funcs.decode = pb_decode_string_field;
+ container->pod_name.funcs.decode = pb_decode_string_field;
+ container->container_name.funcs.decode = pb_decode_string_field;
+ container->container_image_uri.funcs.decode = pb_decode_string_field;
+ container->labels.funcs.decode = pb_decode_string_array;
+}
+
+static void set_container_encode_callbacks(schema_Container *container)
+{
+ container->pod_namespace.funcs.encode = pb_encode_string_field;
+ container->pod_name.funcs.encode = pb_encode_string_field;
+ container->container_name.funcs.encode = pb_encode_string_field;
+ container->container_image_uri.funcs.encode = pb_encode_string_field;
+ container->labels.funcs.encode = pb_encode_string_array;
+}
+
+static void free_container_callbacks_args(schema_Container *container)
+{
+ kfree(container->pod_namespace.arg);
+ kfree(container->pod_name.arg);
+ kfree(container->container_name.arg);
+ kfree(container->container_image_uri.arg);
+ kfree(container->labels.arg);
+}
+
+static ssize_t csm_container_write(struct file *file, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ ssize_t err = 0;
+ void *mem;
+ u64 cid;
+ pb_istream_t istream;
+ struct task_struct *task;
+ schema_ContainerReport report = schema_ContainerReport_init_zero;
+ schema_Event event = schema_Event_init_zero;
+ schema_Container *container;
+ char *uuid = NULL;
+
+ /* Notify that this collector is not yet enabled. */
+ if (!csm_container_enabled)
+ return -EAGAIN;
+
+ /* No partial writes. */
+ if (*ppos != 0)
+ return -EINVAL;
+
+ /* Duplicate user memory to safely parse protobuf. */
+ mem = memdup_user(buf, count);
+ if (IS_ERR(mem))
+ return PTR_ERR(mem);
+
+ /* Callback to decode string in protobuf. */
+ set_container_decode_callbacks(&report.container);
+
+ istream = pb_istream_from_buffer(mem, count);
+ if (!pb_decode(&istream, schema_ContainerReport_fields, &report)) {
+ err = -EINVAL;
+ goto out;
+ }
+
+ /* Check protobuf is as expected */
+ if (report.pid == 0 ||
+ report.container.container_id != 0) {
+ err = -EINVAL;
+ goto out;
+ }
+
+ /* Find if the process id is linked to an existing container-id. */
+ rcu_read_lock();
+ task = find_task_by_pid_ns(report.pid, &init_pid_ns);
+ if (task) {
+ cid = audit_get_contid(task);
+ if (cid == AUDIT_CID_UNSET)
+ err = -ENOENT;
+ } else {
+ err = -ENOENT;
+ }
+ rcu_read_unlock();
+
+ if (err)
+ goto out;
+
+	uuid = kzalloc(PROCESS_UUID_SIZE, GFP_KERNEL);
+	if (!uuid) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+ /* Provide the uuid for the top process of the container. */
+ err = get_process_uuid_by_pid(report.pid, uuid, PROCESS_UUID_SIZE);
+ if (err)
+ goto out;
+
+ /* Correct the container-id and feed the event to vsock */
+ report.container.container_id = cid;
+ report.container.init_uuid.funcs.encode = pb_encode_uuid_field;
+ report.container.init_uuid.arg = uuid;
+ container = &event.event.container.container;
+ *container = report.container;
+
+ /* Use encode callback to generate the final proto. */
+ set_container_encode_callbacks(container);
+
+ event.which_event = schema_Event_container_tag;
+
+ err = csm_sendeventproto(schema_Event_fields, &event);
+ if (!err)
+ err = count;
+
+out:
+ /* Free any allocated nanopb callback arguments. */
+ free_container_callbacks_args(&report.container);
+ kfree(uuid);
+ kfree(mem);
+ return err;
+}
+
+static const struct file_operations csm_container_fops = {
+ .write = csm_container_write,
+};
+
+static int csm_show_stats(struct seq_file *p, void *v)
+{
+ size_t i;
+
+ for (i = 0; i < ARRAY_SIZE(csm_stats_mapping); i++) {
+ seq_printf(p, "%s:\t%zu\n",
+ csm_stats_mapping[i].key,
+ *csm_stats_mapping[i].value);
+ }
+
+ return 0;
+}
+
+static int csm_stats_open(struct inode *inode, struct file *file)
+{
+ size_t i, size = 1; /* Start at one for the null byte. */
+
+ for (i = 0; i < ARRAY_SIZE(csm_stats_mapping); i++) {
+ /*
+ * Calculate the maximum length:
+ * - Length of the key
+ * - 3 additional chars :\t\n
+ * - longest unsigned 64-bit integer.
+ */
+ size += strlen(csm_stats_mapping[i].key)
+ + 3 + sizeof("18446744073709551615");
+ }
+
+ return single_open_size(file, csm_show_stats, NULL, size);
+}
+
+static const struct file_operations csm_stats_fops = {
+ .open = csm_stats_open,
+ .read = seq_read,
+ .llseek = seq_lseek,
+ .release = single_release,
+};
+
+/* Prevent user-mode from using vsock on our port. */
+static int csm_socket_connect(struct socket *sock, struct sockaddr *address,
+ int addrlen)
+{
+ struct sockaddr_vm *vaddr = (struct sockaddr_vm *)address;
+
+ /* Filter only vsock sockets */
+ if (!sock->sk || sock->sk->sk_family != AF_VSOCK)
+ return 0;
+
+ /* Allow kernel sockets. */
+ if (sock->sk->sk_kern_sock)
+ return 0;
+
+ if (addrlen < sizeof(*vaddr))
+ return -EINVAL;
+
+ /* Forbid access to the CSM VMM backend port. */
+ if (vaddr->svm_port == CSM_HOST_PORT)
+ return -EPERM;
+
+ return 0;
+}
+
+static int csm_setxattr(struct dentry *dentry, const char *name,
+ const void *value, size_t size, int flags)
+{
+ if (csm_enabled && !strcmp(name, XATTR_SECURITY_CSM))
+ return -EPERM;
+ return 0;
+}
+
+static struct security_hook_list csm_hooks[] __lsm_ro_after_init = {
+ /* Track process execution. */
+ LSM_HOOK_INIT(bprm_check_security, csm_bprm_check_security),
+ LSM_HOOK_INIT(task_post_alloc, csm_task_post_alloc),
+ LSM_HOOK_INIT(task_exit, csm_task_exit),
+
+ /* Block vsock access when relevant. */
+ LSM_HOOK_INIT(socket_connect, csm_socket_connect),
+
+ /* Track memory execution */
+ LSM_HOOK_INIT(file_mprotect, csm_mprotect),
+ LSM_HOOK_INIT(mmap_file, csm_mmap_file),
+
+ /* Track file modification provenance. */
+ LSM_HOOK_INIT(file_pre_free_security, csm_file_pre_free),
+
+	/* Block modifying the csm xattr. */
+ LSM_HOOK_INIT(inode_setxattr, csm_setxattr),
+};
+
+static int __init csm_init(void)
+{
+ int err;
+
+ if (cmdline_boot_disabled)
+ return 0;
+
+ /*
+ * If cmdline_boot_vsock_enabled is false, only the event pool will be
+ * allocated. The destroy function will clean-up only what was reserved.
+ */
+ err = vsock_initialize();
+ if (err)
+ return err;
+
+ csm_dir = securityfs_create_dir("container_monitor", NULL);
+ if (IS_ERR(csm_dir)) {
+ err = PTR_ERR(csm_dir);
+ goto error;
+ }
+
+ csm_enabled_file = securityfs_create_file("enabled", 0644, csm_dir,
+ NULL, &csm_enabled_fops);
+ if (IS_ERR(csm_enabled_file)) {
+ err = PTR_ERR(csm_enabled_file);
+ goto error_rmdir;
+ }
+
+ csm_container_file = securityfs_create_file("container", 0200, csm_dir,
+ NULL, &csm_container_fops);
+ if (IS_ERR(csm_container_file)) {
+ err = PTR_ERR(csm_container_file);
+ goto error_rm_enabled;
+ }
+
+ csm_config_vers_file = securityfs_create_file("config_version", 0400,
+ csm_dir, NULL,
+ &csm_config_version_fops);
+ if (IS_ERR(csm_config_vers_file)) {
+ err = PTR_ERR(csm_config_vers_file);
+ goto error_rm_container;
+ }
+
+ if (cmdline_boot_config_enabled) {
+ csm_config_file = securityfs_create_file("config", 0200,
+ csm_dir, NULL,
+ &csm_config_fops);
+ if (IS_ERR(csm_config_file)) {
+ err = PTR_ERR(csm_config_file);
+ goto error_rm_config_vers;
+ }
+ }
+
+ if (cmdline_boot_pipe_enabled) {
+ csm_pipe_file = securityfs_create_file("pipe", 0400, csm_dir,
+ NULL, &csm_pipe_fops);
+ if (IS_ERR(csm_pipe_file)) {
+ err = PTR_ERR(csm_pipe_file);
+ goto error_rm_config;
+ }
+ }
+
+ csm_stats_file = securityfs_create_file("stats", 0400, csm_dir,
+ NULL, &csm_stats_fops);
+ if (IS_ERR(csm_stats_file)) {
+ err = PTR_ERR(csm_stats_file);
+ goto error_rm_pipe;
+ }
+
+ pr_debug("created securityfs control files\n");
+
+ security_add_hooks(csm_hooks, ARRAY_SIZE(csm_hooks), "csm");
+ pr_debug("registered hooks\n");
+
+ /* Off-by-default, only used for testing images. */
+ if (cmdline_default_enabled) {
+ down_write(&csm_rwsem_config);
+ csm_enable();
+ up_write(&csm_rwsem_config);
+ }
+
+ return 0;
+
+error_rm_pipe:
+ if (cmdline_boot_pipe_enabled)
+ securityfs_remove(csm_pipe_file);
+error_rm_config:
+ if (cmdline_boot_config_enabled)
+ securityfs_remove(csm_config_file);
+error_rm_config_vers:
+ securityfs_remove(csm_config_vers_file);
+error_rm_container:
+ securityfs_remove(csm_container_file);
+error_rm_enabled:
+ securityfs_remove(csm_enabled_file);
+error_rmdir:
+ securityfs_remove(csm_dir);
+error:
+ vsock_destroy();
+	pr_warn("fs initialization error: %d\n", err);
+ return err;
+}
+
+late_initcall(csm_init);
diff --git a/security/container/monitor.h b/security/container/monitor.h
new file mode 100644
index 0000000..cab3d5b
--- /dev/null
+++ b/security/container/monitor.h
@@ -0,0 +1,123 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Container Security Monitor module
+ *
+ * Copyright (c) 2018 Google, Inc
+ */
+
+#define pr_fmt(fmt) "container-security-monitor: " fmt
+
+#include <linux/kernel.h>
+#include <linux/security.h>
+#include <linux/fs.h>
+#include <linux/rwsem.h>
+#include <linux/binfmts.h>
+#include <linux/xattr.h>
+#include <config.pb.h>
+#include <event.pb.h>
+#include <pb_encode.h>
+#include <pb_decode.h>
+
+#include "monitoring_protocol.h"
+
+/* Part of the CSM configuration response. */
+#define CSM_VERSION 1
+
+/* protects csm_*_enabled and configurations. */
+extern struct rw_semaphore csm_rwsem_config;
+
+/* protects csm_host_port and csm_vsocket. */
+extern struct rw_semaphore csm_rwsem_vsocket;
+
+/* Port to connect to the host on, over virtio-vsock */
+#define CSM_HOST_PORT 4444
+
+/*
+ * Is monitoring enabled? Defaults to disabled.
+ * These variables may be used as gates without locking (the processor ensures
+ * consistent access to native scalar values), so hooks can bail quickly.
+ */
+extern bool csm_enabled;
+extern bool csm_execute_enabled;
+extern bool csm_memexec_enabled;
+
+/* Configuration options for execute collector. */
+struct execute_config {
+ size_t argv_limit;
+ size_t envp_limit;
+ char *envp_allowlist;
+};
+
+extern struct execute_config csm_execute_config;
+
+/* pipe to forward vsock packets to user-mode. */
+extern struct rw_semaphore csm_rwsem_pipe;
+extern struct file *csm_user_write_pipe;
+
+/* Was vsock enabled at boot time? */
+extern bool cmdline_boot_vsock_enabled;
+
+/* Stats on LSM events. */
+struct container_stats {
+ size_t proto_encoding_failed;
+ size_t event_writing_failed;
+ size_t workqueue_failed;
+ size_t size_picking_failed;
+ size_t pipe_already_opened;
+};
+
+extern struct container_stats csm_stats;
+
+/* Standard stream file descriptor numbers are not defined in the kernel. */
+#define STDIN_FILENO 0
+#define STDOUT_FILENO 1
+#define STDERR_FILENO 2
+
+/* security attribute for file provenance. */
+#define XATTR_SECURITY_CSM XATTR_SECURITY_PREFIX "csm"
+
+/* monitor functions */
+int csm_update_config_from_buffer(void *data, size_t size);
+
+/* vsock functions */
+int vsock_initialize(void);
+void vsock_destroy(void);
+int vsock_late_initialize(void);
+int csm_sendeventproto(const pb_field_t fields[], schema_Event *event);
+int csm_sendconfigrespproto(const pb_field_t fields[],
+ schema_ConfigurationResponse *resp);
+
+/* process events functions */
+int csm_bprm_check_security(struct linux_binprm *bprm);
+void csm_task_exit(struct task_struct *task);
+void csm_task_post_alloc(struct task_struct *task);
+int get_process_uuid_by_pid(pid_t pid_nr, char *buffer, size_t size);
+
+/* memory execution events functions */
+int csm_mprotect(struct vm_area_struct *vma, unsigned long reqprot,
+ unsigned long prot);
+int csm_mmap_file(struct file *file, unsigned long reqprot,
+ unsigned long prot, unsigned long flags);
+
+/* Tracking of file modification provenance. */
+void csm_file_pre_free(struct file *file);
+
+/* nanopb callback functions */
+bool pb_encode_string_field(pb_ostream_t *stream, const pb_field_t *field,
+ void * const *arg);
+bool pb_decode_string_field(pb_istream_t *stream, const pb_field_t *field,
+ void **arg);
+ssize_t pb_encode_string_field_limit(pb_ostream_t *stream,
+ const pb_field_t *field,
+ void * const *arg, size_t limit);
+bool pb_encode_string_array(pb_ostream_t *stream, const pb_field_t *field,
+ void * const *arg);
+bool pb_decode_string_array(pb_istream_t *stream, const pb_field_t *field,
+ void **arg);
+bool pb_encode_uuid_field(pb_ostream_t *stream, const pb_field_t *field,
+ void * const *arg);
+bool pb_encode_ip4(pb_ostream_t *stream, const pb_field_t *field,
+ void * const *arg);
+bool pb_encode_ip6(pb_ostream_t *stream, const pb_field_t *field,
+ void * const *arg);
+
diff --git a/security/container/monitoring_protocol.h b/security/container/monitoring_protocol.h
new file mode 100644
index 0000000..dbdfc9c
--- /dev/null
+++ b/security/container/monitoring_protocol.h
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+
+/* Container security monitoring protocol definitions */
+
+#include <linux/types.h>
+
+enum csm_msgtype {
+ CSM_MSG_TYPE_HEARTBEAT = 1,
+ CSM_MSG_EVENT_PROTO = 2,
+ CSM_MSG_CONFIG_REQUEST_PROTO = 3,
+ CSM_MSG_CONFIG_RESPONSE_PROTO = 4,
+};
+
+struct csm_msg_hdr {
+ __le32 msg_type;
+ __le32 msg_length;
+};
+
+/* The process uuid is a 128-bit identifier */
+#define PROCESS_UUID_SIZE 16
+
+/* The entire structure forms the collision domain. */
+union process_uuid {
+ struct {
+ __u32 machineid;
+ __u64 start_time;
+ __u32 tgid;
+ } __attribute__((packed));
+ __u8 data[PROCESS_UUID_SIZE];
+};
diff --git a/security/container/pb.c b/security/container/pb.c
new file mode 100644
index 0000000..f24e58f
--- /dev/null
+++ b/security/container/pb.c
@@ -0,0 +1,175 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Container Security Monitor module
+ *
+ * Copyright (c) 2018 Google, Inc
+ */
+
+#include "monitor.h"
+
+#include <linux/string.h>
+#include <net/sock.h>
+#include <net/tcp.h>
+#include <net/ipv6.h>
+
+bool pb_encode_string_field(pb_ostream_t *stream, const pb_field_t *field,
+ void * const *arg)
+{
+ const uint8_t *str = (const uint8_t *)*arg;
+
+ /* If the string is not set, skip this string. */
+ if (!str)
+ return true;
+
+ if (!pb_encode_tag_for_field(stream, field))
+ return false;
+
+ return pb_encode_string(stream, str, strlen(str));
+}
+
+bool pb_decode_string_field(pb_istream_t *stream, const pb_field_t *field,
+ void **arg)
+{
+ size_t size;
+ void *data;
+
+ *arg = NULL;
+
+ size = stream->bytes_left;
+
+	/* Check for overflow before allocating size + 1 (trailing null-byte). */
+ if (size + 1 < size)
+ return false;
+
+ data = kzalloc(size + 1, GFP_KERNEL);
+ if (!data)
+ return false;
+
+ if (!pb_read(stream, data, size)) {
+ kfree(data);
+ return false;
+ }
+
+ *arg = data;
+
+ return true;
+}
+
+bool pb_encode_string_array(pb_ostream_t *stream, const pb_field_t *field,
+ void * const *arg)
+{
+ char *strs = (char *)*arg;
+
+ /* If the string array is not set, skip this string array. */
+ if (!strs)
+ return true;
+
+ do {
+ if (!pb_encode_string_field(stream, field,
+ (void * const *) &strs))
+ return false;
+
+ strs += strlen(strs) + 1;
+ } while (*strs != 0);
+
+ return true;
+}
+
+/* Limit the encoded string size and return how many characters were added. */
+ssize_t pb_encode_string_field_limit(pb_ostream_t *stream,
+ const pb_field_t *field,
+ void * const *arg, size_t limit)
+{
+ char *str = (char *)*arg;
+ size_t length;
+
+ /* If the string is not set, skip this string. */
+ if (!str)
+ return 0;
+
+ if (!pb_encode_tag_for_field(stream, field))
+ return -EINVAL;
+
+ length = strlen(str);
+ if (length > limit)
+ length = limit;
+
+ if (!pb_encode_string(stream, (uint8_t *)str, length))
+ return -EINVAL;
+
+ return length;
+}
+
+bool pb_decode_string_array(pb_istream_t *stream, const pb_field_t *field,
+ void **arg)
+{
+ size_t needed, used = 0;
+ char *data, *strs;
+
+ /* String length, and two null-bytes for the end of the list. */
+ needed = stream->bytes_left + 2;
+ if (needed < stream->bytes_left)
+ return false;
+
+ if (*arg) {
+ /* Calculate used space from the current list. */
+ strs = (char *)*arg;
+ do {
+ used += strlen(strs + used) + 1;
+ } while (strs[used] != 0);
+
+ if (used + needed < needed)
+ return false;
+ }
+
+ data = krealloc(*arg, used + needed, GFP_KERNEL);
+ if (!data)
+ return false;
+
+ /* Will always be freed by the caller */
+ *arg = data;
+
+ /* Reset the new part of the buffer. */
+ memset(data + used, 0, needed);
+
+ /* Read what's in the stream buffer only. */
+ if (!pb_read(stream, data + used, stream->bytes_left))
+ return false;
+
+ return true;
+}
+
+bool pb_encode_fixed_string(pb_ostream_t *stream, const pb_field_t *field,
+ const uint8_t *data, size_t length)
+{
+ /* If the data is not set, skip this string. */
+ if (!data)
+ return true;
+
+ if (!pb_encode_tag_for_field(stream, field))
+ return false;
+
+ return pb_encode_string(stream, data, length);
+}
+
+bool pb_encode_uuid_field(pb_ostream_t *stream, const pb_field_t *field,
+ void * const *arg)
+{
+ return pb_encode_fixed_string(stream, field, (const uint8_t *)*arg,
+ PROCESS_UUID_SIZE);
+}
+
+bool pb_encode_ip4(pb_ostream_t *stream, const pb_field_t *field,
+ void * const *arg)
+{
+ return pb_encode_fixed_string(stream, field, (const uint8_t *)*arg,
+ sizeof(struct in_addr));
+}
+
+bool pb_encode_ip6(pb_ostream_t *stream, const pb_field_t *field,
+ void * const *arg)
+{
+ return pb_encode_fixed_string(stream, field, (const uint8_t *)*arg,
+ sizeof(struct in6_addr));
+}
diff --git a/security/container/process.c b/security/container/process.c
new file mode 100644
index 0000000..a520713
--- /dev/null
+++ b/security/container/process.c
@@ -0,0 +1,1149 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Container Security Monitor module
+ *
+ * Copyright (c) 2018 Google, Inc
+ */
+
+#include "monitor.h"
+
+#include <linux/atomic.h>
+#include <linux/audit.h>
+#include <linux/file.h>
+#include <linux/highmem.h>
+#include <linux/mempool.h>
+#include <linux/mm.h>
+#include <linux/mount.h>
+#include <linux/notifier.h>
+#include <linux/net.h>
+#include <linux/path.h>
+#include <linux/pid.h>
+#include <linux/pid_namespace.h>
+#include <linux/random.h>
+#include <linux/rcupdate.h>
+#include <linux/sched.h>
+#include <linux/sched/signal.h>
+#include <linux/sched/task.h>
+#include <linux/slab.h>
+#include <linux/socket.h>
+#include <linux/timekeeping.h>
+#include <linux/vmalloc.h>
+#include <linux/workqueue.h>
+#include <linux/xattr.h>
+#include <net/ipv6.h>
+#include <net/sock.h>
+#include <net/tcp.h>
+#include <overlayfs/overlayfs.h>
+#include <uapi/linux/magic.h>
+#include <uapi/asm/mman.h>
+
+/* Configuration options for execute collector. */
+struct execute_config csm_execute_config;
+
+/* unique atomic value for the machine boot instance */
+static atomic_t machine_rand = ATOMIC_INIT(0);
+
+/* sequential container identifier */
+static atomic_t contid = ATOMIC_INIT(0);
+
+/* Generation id for each enumeration invocation. */
+static atomic_t enumeration_count = ATOMIC_INIT(0);
+
+struct file_provenance {
+ /* pid of the process doing the first write. */
+ pid_t tgid;
+ /* start_time of the process to uniquely identify it. */
+ u64 start_time;
+};
+
+struct csm_enumerate_processes_work_data {
+ struct work_struct work;
+ int enumeration_count;
+};
+
+static void *kmap_argument_stack(struct linux_binprm *bprm, void **ctx)
+{
+ char *argv;
+ int err;
+ unsigned long i, pos, count;
+ void *map;
+ struct page *page;
+
+ /* vma_pages() returns the number of pages reserved for the stack */
+ count = vma_pages(bprm->vma);
+
+ if (likely(count == 1)) {
+ err = get_user_pages_remote(bprm->mm, bprm->p, 1,
+ FOLL_FORCE, &page, NULL, NULL);
+ if (err != 1)
+ return NULL;
+
+ argv = kmap(page);
+ *ctx = page;
+ } else {
+ /*
+		 * If more than one page is needed, copy them all into a single
+		 * contiguous buffer. Parsing the arguments across separately
+		 * kmapped pages would be impractical.
+ */
+ argv = vmalloc(count * PAGE_SIZE);
+ if (!argv)
+ return NULL;
+
+ for (i = 0; i < count; i++) {
+ pos = ALIGN_DOWN(bprm->p, PAGE_SIZE) + i * PAGE_SIZE;
+ err = get_user_pages_remote(bprm->mm, pos, 1,
+ FOLL_FORCE, &page, NULL,
+ NULL);
+ if (err <= 0) {
+ vfree(argv);
+ return NULL;
+ }
+
+ map = kmap(page);
+ memcpy(argv + i * PAGE_SIZE, map, PAGE_SIZE);
+ kunmap(page);
+ put_page(page);
+ }
+ *ctx = bprm;
+ }
+
+ return argv;
+}
+
+static void kunmap_argument_stack(struct linux_binprm *bprm, void *addr,
+ void *ctx)
+{
+ struct page *page;
+
+ if (!addr)
+ return;
+
+ if (likely(vma_pages(bprm->vma) == 1)) {
+ page = (struct page *)ctx;
+ kunmap(page);
+		put_page(page);
+ } else {
+ vfree(addr);
+ }
+}
+
+static char *find_array_next_entry(char *array, unsigned long *offset,
+ unsigned long end)
+{
+ char *entry;
+ unsigned long off = *offset;
+
+ if (off >= end)
+ return NULL;
+
+	/* Check that the entry is null-terminated and in bounds. */
+ entry = array + off;
+ while (array[off]) {
+ if (++off >= end)
+ return NULL;
+ }
+
+	/* Skip the null byte for the next iteration. */
+ *offset = off + 1;
+
+ return entry;
+}
+
+struct string_arr_ctx {
+ struct linux_binprm *bprm;
+ void *stack;
+};
+
+static size_t get_config_limit(size_t *config_ptr)
+{
+ lockdep_assert_held_read(&csm_rwsem_config);
+
+ /*
+ * If execute is not enabled, do not capture arguments.
+ * The vsock packet won't be sent anyway.
+ */
+ if (!csm_execute_enabled)
+ return 0;
+
+ return *config_ptr;
+}
+
+static bool encode_current_argv(pb_ostream_t *stream, const pb_field_t *field,
+ void * const *arg)
+{
+ struct string_arr_ctx *ctx = (struct string_arr_ctx *)*arg;
+ int i;
+ struct linux_binprm *bprm = ctx->bprm;
+ unsigned long offset = bprm->p % PAGE_SIZE;
+ unsigned long end = vma_pages(bprm->vma) * PAGE_SIZE;
+ char *argv = ctx->stack;
+ char *entry;
+ size_t limit, used = 0;
+ ssize_t ret;
+
+ limit = get_config_limit(&csm_execute_config.argv_limit);
+ if (!limit)
+ return true;
+
+ for (i = 0; i < bprm->argc; i++) {
+ entry = find_array_next_entry(argv, &offset, end);
+ if (!entry)
+ return false;
+
+ ret = pb_encode_string_field_limit(stream, field,
+ (void * const *)&entry,
+ limit - used);
+ if (ret < 0)
+ return false;
+
+ used += ret;
+
+ if (used >= limit)
+ break;
+ }
+
+ return true;
+}
+
+static bool check_envp_allowlist(char *envp)
+{
+ bool ret = false;
+ char *strs, *equal;
+ size_t str_size, equal_pos;
+
+ /* If execute is not enabled, skip all. */
+ if (!csm_execute_enabled)
+ goto out;
+
+ /* No filter, allow all. */
+ strs = csm_execute_config.envp_allowlist;
+ if (!strs) {
+ ret = true;
+ goto out;
+ }
+
+ /*
+ * Identify the key=value separation.
+ * If none exists use the whole string as a key.
+ */
+ equal = strchr(envp, '=');
+ equal_pos = equal ? (equal - envp) : strlen(envp);
+
+ /* Default to skip if no match found. */
+ ret = false;
+
+ do {
+ str_size = strlen(strs);
+
+		/*
+		 * If the filter length aligns with the position of the equal
+		 * sign, it might be a match; compare the keys.
+		 */
+ if (str_size == equal_pos &&
+ !strncmp(strs, envp, str_size)) {
+ ret = true;
+ goto out;
+ }
+
+ strs += str_size + 1;
+ } while (*strs != 0);
+
+out:
+ return ret;
+}
+
+static bool encode_current_envp(pb_ostream_t *stream, const pb_field_t *field,
+ void * const *arg)
+{
+ struct string_arr_ctx *ctx = (struct string_arr_ctx *)*arg;
+ int i;
+ struct linux_binprm *bprm = ctx->bprm;
+ unsigned long offset = bprm->p % PAGE_SIZE;
+ unsigned long end = vma_pages(bprm->vma) * PAGE_SIZE;
+ char *argv = ctx->stack;
+ char *entry;
+ size_t limit, used = 0;
+ ssize_t ret;
+
+ limit = get_config_limit(&csm_execute_config.envp_limit);
+ if (!limit)
+ return true;
+
+ /* Skip arguments */
+ for (i = 0; i < bprm->argc; i++) {
+ if (!find_array_next_entry(argv, &offset, end))
+ return false;
+ }
+
+ for (i = 0; i < bprm->envc; i++) {
+ entry = find_array_next_entry(argv, &offset, end);
+ if (!entry)
+ return false;
+
+ if (!check_envp_allowlist(entry))
+ continue;
+
+ ret = pb_encode_string_field_limit(stream, field,
+ (void * const *)&entry,
+ limit - used);
+ if (ret < 0)
+ return false;
+
+ used += ret;
+
+ if (used >= limit)
+ break;
+ }
+
+ return true;
+}
+
+static bool is_overlayfs_mounted(struct file *file)
+{
+ struct vfsmount *mnt;
+ struct super_block *mnt_sb;
+
+ mnt = file->f_path.mnt;
+ if (mnt == NULL)
+ return false;
+
+ mnt_sb = mnt->mnt_sb;
+ if (mnt_sb == NULL || mnt_sb->s_magic != OVERLAYFS_SUPER_MAGIC)
+ return false;
+
+ return true;
+}
+
+/*
+ * Before the process starts, identify a possible container by checking that the
+ * task runs in a non-init pid namespace and that the target file is on an
+ * overlayfs mount point. This check is valid for COS and GKE but not for all
+ * existing containers.
+ */
+static bool is_possible_container(struct task_struct *task,
+ struct file *file)
+{
+ if (task_active_pid_ns(task) == &init_pid_ns)
+ return false;
+
+ return is_overlayfs_mounted(file);
+}
+
+/*
+ * Generates a random identifier for this boot instance.
+ * The identifier is generated lazily, on first use, so that more entropy is
+ * available than at early boot.
+ */
+static u32 get_machine_id(void)
+{
+ int machineid, old;
+
+ machineid = atomic_read(&machine_rand);
+
+ if (unlikely(machineid == 0)) {
+ machineid = (int)get_random_int();
+ if (machineid == 0)
+ machineid = 1;
+ old = atomic_cmpxchg(&machine_rand, 0, machineid);
+
+ /* If someone beat us, use their value. */
+ if (old != 0)
+ machineid = old;
+ }
+
+ return (u32)machineid;
+}
+
+/*
+ * Generate a 128-bit unique identifier for the process by concatenating:
+ * - A machine identifier unique per boot.
+ * - The start time of the process in nanoseconds.
+ * - The tgid for the set of threads in a process.
+ */
+static int get_process_uuid(struct task_struct *task, char *buffer, size_t size)
+{
+ union process_uuid *id = (union process_uuid *)buffer;
+
+ memset(buffer, 0, size);
+
+ if (WARN_ON(size < PROCESS_UUID_SIZE))
+ return -EINVAL;
+
+ id->machineid = get_machine_id();
+ id->start_time = ktime_mono_to_real(task->group_leader->start_time);
+ id->tgid = task_tgid_nr(task);
+
+ return 0;
+}
+
+int get_process_uuid_by_pid(pid_t pid_nr, char *buffer, size_t size)
+{
+ int err;
+ struct task_struct *task = NULL;
+
+ rcu_read_lock();
+ task = find_task_by_pid_ns(pid_nr, &init_pid_ns);
+ if (!task) {
+ err = -ENOENT;
+ goto out;
+ }
+ err = get_process_uuid(task, buffer, size);
+out:
+ rcu_read_unlock();
+ return err;
+}
+
+static int get_process_uuid_from_xattr(struct file *file, char *buffer,
+ size_t size)
+{
+ struct dentry *dentry;
+ int err;
+ struct file_provenance prov;
+ union process_uuid *id = (union process_uuid *)buffer;
+
+ memset(buffer, 0, size);
+
+ if (WARN_ON(size < PROCESS_UUID_SIZE))
+ return -EINVAL;
+
+ /* The file is part of overlayfs on the upper layer. */
+ if (!is_overlayfs_mounted(file))
+ return -ENODATA;
+
+ dentry = ovl_dentry_upper(file->f_path.dentry);
+ if (!dentry)
+ return -ENODATA;
+
+ err = __vfs_getxattr(dentry, dentry->d_inode,
+ XATTR_SECURITY_CSM, &prov, sizeof(prov));
+ /* returns -ENODATA if the xattr does not exist. */
+ if (err < 0)
+ return err;
+ if (err != sizeof(prov)) {
+		pr_err("unexpected size for xattr: %zu -> %d\n",
+		       sizeof(prov), err);
+ return -ENODATA;
+ }
+
+ id->machineid = get_machine_id();
+ id->start_time = prov.start_time;
+ id->tgid = prov.tgid;
+ return 0;
+}
+
+u64 csm_set_contid(struct task_struct *task)
+{
+ u64 cid;
+ struct pid_namespace *ns;
+
+ ns = task_active_pid_ns(task);
+ if (WARN_ON(!task->audit) || WARN_ON(!ns))
+ return AUDIT_CID_UNSET;
+
+ cid = atomic_inc_return(&contid);
+ task->audit->contid = cid;
+
+ /*
+ * If the namespace container-id is not set, use the one assigned
+ * to the first process created.
+ */
+ cmpxchg(&ns->cid, 0, cid);
+ return cid;
+}
+
+u64 csm_get_ns_contid(struct pid_namespace *ns)
+{
+ if (!ns || !ns->cid)
+ return AUDIT_CID_UNSET;
+
+ return ns->cid;
+}
+
+union ip_data {
+ struct in_addr ip4;
+ struct in6_addr ip6;
+};
+
+struct file_data {
+ void *allocated;
+ union ip_data local;
+ union ip_data remote;
+ char modified_uuid[PROCESS_UUID_SIZE];
+};
+
+static void free_file_data(struct file_data *fdata)
+{
+ free_page((unsigned long)fdata->allocated);
+ fdata->allocated = NULL;
+}
+
+static void fill_socket_description(struct sockaddr_storage *saddr,
+ union ip_data *idata,
+ schema_SocketIp *schema_socketip)
+{
+ struct sockaddr_in *sin4 = (struct sockaddr_in *)saddr;
+ struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)saddr;
+
+ schema_socketip->family = saddr->ss_family;
+
+ switch (saddr->ss_family) {
+ case AF_INET:
+ schema_socketip->port = ntohs(sin4->sin_port);
+ idata->ip4 = sin4->sin_addr;
+ schema_socketip->ip.funcs.encode = pb_encode_ip4;
+ schema_socketip->ip.arg = &idata->ip4;
+ break;
+ case AF_INET6:
+ schema_socketip->port = ntohs(sin6->sin6_port);
+ idata->ip6 = sin6->sin6_addr;
+ schema_socketip->ip.funcs.encode = pb_encode_ip6;
+ schema_socketip->ip.arg = &idata->ip6;
+ break;
+ }
+}
+
+static int fill_file_overlayfs(struct file *file, schema_File *schema_file,
+ struct file_data *fdata)
+{
+ struct dentry *dentry;
+ int err;
+ schema_Overlay *overlayfs;
+
+ /* If not an overlayfs superblock, done. */
+ if (!is_overlayfs_mounted(file))
+ return 0;
+
+ dentry = file->f_path.dentry;
+ schema_file->which_filesystem = schema_File_overlayfs_tag;
+ overlayfs = &schema_file->filesystem.overlayfs;
+ overlayfs->lower_layer = ovl_dentry_lower(dentry);
+ overlayfs->upper_layer = ovl_dentry_upper(dentry);
+
+ err = get_process_uuid_from_xattr(file, fdata->modified_uuid,
+ sizeof(fdata->modified_uuid));
+ /* If there is no xattr, just skip the modified_uuid field. */
+ if (err == -ENODATA)
+ return 0;
+ if (err < 0)
+ return err;
+
+ overlayfs->modified_uuid.funcs.encode = pb_encode_uuid_field;
+ overlayfs->modified_uuid.arg = fdata->modified_uuid;
+ return 0;
+}
+
+static int fill_file_description(struct file *file, schema_File *schema_file,
+ struct file_data *fdata)
+{
+ char *buf;
+ int err;
+ u32 mode;
+ char *path;
+ struct socket *socket;
+ schema_Socket *socketfs;
+ struct sockaddr_storage saddr;
+
+ memset(fdata, 0, sizeof(*fdata));
+
+ if (file == NULL)
+ return 0;
+
+ schema_file->ino = file_inode(file)->i_ino;
+ mode = file_inode(file)->i_mode;
+
+ /* For pipes, no need to resolve the path. */
+ if (S_ISFIFO(mode))
+ return 0;
+
+ if (S_ISSOCK(mode)) {
+ socket = (struct socket *)file->private_data;
+ socketfs = &schema_file->filesystem.socket;
+
+ /* Local socket */
+ err = kernel_getsockname(socket, (struct sockaddr *)&saddr);
+ if (err >= 0) {
+ fill_socket_description(&saddr, &fdata->local,
+ &socketfs->local);
+ }
+
+ /* Remote socket, might not be connected. */
+ err = kernel_getpeername(socket, (struct sockaddr *)&saddr);
+ if (err >= 0) {
+ fill_socket_description(&saddr, &fdata->remote,
+ &socketfs->remote);
+ }
+
+ schema_file->which_filesystem = schema_File_socket_tag;
+ return 0;
+ }
+
+ /*
+ * From this point, we care about all the other types of files as their
+ * path provides interesting insight.
+ */
+ buf = (char *)__get_free_page(GFP_KERNEL);
+ if (buf == NULL)
+ return -ENOMEM;
+
+ fdata->allocated = buf;
+
+ path = d_path(&file->f_path, buf, PAGE_SIZE);
+ if (IS_ERR(path)) {
+ free_file_data(fdata);
+ return PTR_ERR(path);
+ }
+
+ schema_file->fullpath.funcs.encode = pb_encode_string_field;
+ schema_file->fullpath.arg = path; /* buf is freed in free_file_data. */
+
+ err = fill_file_overlayfs(file, schema_file, fdata);
+ if (err) {
+ free_file_data(fdata);
+ return err;
+ }
+
+ return 0;
+}
+
+static int fill_stream_description(schema_Descriptor *desc, int fd,
+ struct file_data *fdata)
+{
+ struct fd sfd;
+ struct file *file;
+ int err = 0;
+
+ sfd = fdget(fd);
+ file = sfd.file;
+
+ if (file == NULL) {
+ memset(fdata, 0, sizeof(*fdata));
+ goto end;
+ }
+
+ desc->mode = file_inode(file)->i_mode;
+ err = fill_file_description(file, &desc->file, fdata);
+
+end:
+ fdput(sfd);
+ return err;
+}
+
+static int populate_proc_uuid_common(schema_Process *proc, char *uuid,
+ size_t uuid_size, char *parent_uuid,
+ size_t parent_uuid_size,
+ struct task_struct *task)
+{
+ int err;
+ struct task_struct *parent;
+ /* Generate unique identifier for the process and its parent */
+ err = get_process_uuid(task, uuid, uuid_size);
+ if (err)
+ return err;
+
+ proc->uuid.funcs.encode = pb_encode_uuid_field;
+ proc->uuid.arg = uuid;
+
+ rcu_read_lock();
+
+ if (!pid_alive(task))
+ goto out;
+	/*
+	 * real_parent is only ever accessed under RCU, so a plain
+	 * rcu_dereference() is sufficient here.
+	 */
+ parent = rcu_dereference(task->real_parent);
+
+ if (parent) {
+ err = get_process_uuid(parent, parent_uuid, parent_uuid_size);
+ if (!err) {
+ proc->parent_uuid.funcs.encode = pb_encode_uuid_field;
+ proc->parent_uuid.arg = parent_uuid;
+ }
+ }
+
+out:
+ rcu_read_unlock();
+
+ return err;
+}
+
+/* Populate the fields that we always want to set in Process messages. */
+static int populate_proc_common(schema_Process *proc, char *uuid,
+ size_t uuid_size, char *parent_uuid,
+ size_t parent_uuid_size,
+ struct task_struct *task)
+{
+ u64 cid;
+ struct pid_namespace *ns = task_active_pid_ns(task);
+
+ /* Container identifier for the current namespace. */
+ proc->container_id = csm_get_ns_contid(ns);
+
+ /*
+ * If the process container-id is different, the process tree is part of
+ * a different session within the namespace (kubectl/docker exec,
+ * liveness probe or others).
+ */
+ cid = audit_get_contid(task);
+ if (proc->container_id != cid)
+ proc->exec_session_id = cid;
+
+ /* Add information about pid in different namespaces */
+ proc->pid = task_pid_nr(task);
+ proc->parent_pid = task_ppid_nr(task);
+ proc->container_pid = task_pid_nr_ns(task, ns);
+ proc->container_parent_pid = task_ppid_nr_ns(task, ns);
+
+ return populate_proc_uuid_common(proc, uuid, uuid_size, parent_uuid,
+ parent_uuid_size, task);
+}
+
+int csm_bprm_check_security(struct linux_binprm *bprm)
+{
+ char uuid[PROCESS_UUID_SIZE];
+ char parent_uuid[PROCESS_UUID_SIZE];
+ int err;
+ schema_Event event = schema_Event_init_zero;
+ schema_Process *proc;
+ struct string_arr_ctx argv_ctx;
+ void *stack = NULL, *ctx = NULL;
+ u64 cid;
+ struct file_data path_data = {};
+ struct file_data stdin_data = {};
+ struct file_data stdout_data = {};
+ struct file_data stderr_data = {};
+
+ /*
+ * Always create a container-id for containerized processes.
+ * If the LSM is enabled later, we can track existing containers.
+ */
+ cid = audit_get_contid(current);
+
+ if (cid == AUDIT_CID_UNSET) {
+ if (!is_possible_container(current, bprm->file))
+ return 0;
+
+ cid = csm_set_contid(current);
+
+ if (cid == AUDIT_CID_UNSET)
+ return 0;
+ }
+
+ if (!csm_execute_enabled)
+ return 0;
+
+ /* The interpreter will call us again with more context. */
+ if (bprm->buf[0] == '#' && bprm->buf[1] == '!')
+ return 0;
+
+ proc = &event.event.execute.proc;
+ err = populate_proc_common(proc, uuid, sizeof(uuid), parent_uuid,
+ sizeof(parent_uuid), current);
+ if (err)
+ goto out_free_buf;
+
+ proc->creation_timestamp = ktime_get_real_ns();
+
+ /* Provide information about the launched binary. */
+ err = fill_file_description(bprm->file, &proc->binary, &path_data);
+ if (err)
+ goto out_free_buf;
+
+ /* Information about streams */
+ err = fill_stream_description(&proc->streams.stdin, STDIN_FILENO,
+ &stdin_data);
+ if (err)
+ goto out_free_buf;
+
+ err = fill_stream_description(&proc->streams.stdout, STDOUT_FILENO,
+ &stdout_data);
+ if (err)
+ goto out_free_buf;
+
+ err = fill_stream_description(&proc->streams.stderr, STDERR_FILENO,
+ &stderr_data);
+ if (err)
+ goto out_free_buf;
+
+ stack = kmap_argument_stack(bprm, &ctx);
+ if (!stack) {
+ err = -EFAULT;
+ goto out_free_buf;
+ }
+
+	/* Capture process arguments */
+ argv_ctx.bprm = bprm;
+ argv_ctx.stack = stack;
+ proc->args.argv.funcs.encode = encode_current_argv;
+ proc->args.argv.arg = &argv_ctx;
+
+ /* Capture process environment variables */
+ proc->args.envp.funcs.encode = encode_current_envp;
+ proc->args.envp.arg = &argv_ctx;
+
+ event.which_event = schema_Event_execute_tag;
+
+	/*
+	 * Configuration options are checked when computing the serialized
+	 * protobufs.
+	 */
+ down_read(&csm_rwsem_config);
+ err = csm_sendeventproto(schema_Event_fields, &event);
+ up_read(&csm_rwsem_config);
+
+ if (err)
+ pr_err("csm_sendeventproto returned %d on execve\n", err);
+ err = 0;
+
+out_free_buf:
+ kunmap_argument_stack(bprm, stack, ctx);
+ free_file_data(&path_data);
+ free_file_data(&stdin_data);
+ free_file_data(&stdout_data);
+ free_file_data(&stderr_data);
+
+	/*
+	 * On failure, enforce the error only if the execute collector is
+	 * enabled. If the collector was disabled, prefer to succeed so as
+	 * not to impact the system.
+	 */
+ if (unlikely(err < 0 && !csm_execute_enabled))
+ err = 0;
+
+ return err;
+}
+
+/* Create a clone event when a new task leader is created. */
+void csm_task_post_alloc(struct task_struct *task)
+{
+ int err;
+ char uuid[PROCESS_UUID_SIZE];
+ char parent_uuid[PROCESS_UUID_SIZE];
+ schema_Event event = schema_Event_init_zero;
+ schema_Process *proc;
+
+ if (!csm_execute_enabled ||
+ audit_get_contid(task) == AUDIT_CID_UNSET ||
+ !thread_group_leader(task))
+ return;
+
+ proc = &event.event.clone.proc;
+
+	err = populate_proc_uuid_common(proc, uuid, sizeof(uuid), parent_uuid,
+					sizeof(parent_uuid), task);
+	if (err)
+		return;
+
+ event.which_event = schema_Event_clone_tag;
+ err = csm_sendeventproto(schema_Event_fields, &event);
+ if (err)
+		pr_err("csm_sendeventproto returned %d on clone\n", err);
+}
+
+/*
+ * This LSM hook callback doesn't exist upstream and is called only when the
+ * last thread of a thread group exits.
+ */
+void csm_task_exit(struct task_struct *task)
+{
+ int err;
+ schema_Event event = schema_Event_init_zero;
+ schema_ExitEvent *exit;
+ char uuid[PROCESS_UUID_SIZE];
+
+ if (!csm_execute_enabled ||
+ audit_get_contid(task) == AUDIT_CID_UNSET)
+ return;
+
+ exit = &event.event.exit;
+
+ /* Fetch the unique identifier for this process */
+ err = get_process_uuid(task, uuid, sizeof(uuid));
+ if (err) {
+ pr_err("failed to get process uuid on exit\n");
+ return;
+ }
+
+ exit->process_uuid.funcs.encode = pb_encode_uuid_field;
+ exit->process_uuid.arg = uuid;
+
+ event.which_event = schema_Event_exit_tag;
+
+ err = csm_sendeventproto(schema_Event_fields, &event);
+ if (err)
+ pr_err("csm_sendeventproto returned %d on exit\n", err);
+}
+
+int csm_mprotect(struct vm_area_struct *vma, unsigned long reqprot,
+ unsigned long prot)
+{
+ char uuid[PROCESS_UUID_SIZE];
+ char parent_uuid[PROCESS_UUID_SIZE];
+ int err;
+ schema_Event event = schema_Event_init_zero;
+ schema_MemoryExecEvent *memexec;
+ u64 cid;
+ struct file_data path_data = {};
+
+ cid = audit_get_contid(current);
+
+ if (!csm_memexec_enabled ||
+ !(prot & PROT_EXEC) ||
+ vma->vm_file == NULL ||
+ cid == AUDIT_CID_UNSET)
+ return 0;
+
+ memexec = &event.event.memexec;
+
+ err = fill_file_description(vma->vm_file, &memexec->mapped_file,
+ &path_data);
+ if (err)
+ return err;
+
+ err = populate_proc_common(&memexec->proc, uuid, sizeof(uuid),
+ parent_uuid, sizeof(parent_uuid), current);
+ if (err)
+ goto out;
+
+ memexec->prot_exec_timestamp = ktime_get_real_ns();
+ memexec->new_flags = prot;
+ memexec->req_flags = reqprot;
+ memexec->old_vm_flags = vma->vm_flags;
+
+ memexec->action = schema_MemoryExecEvent_Action_MPROTECT;
+ memexec->start_addr = vma->vm_start;
+ memexec->end_addr = vma->vm_end;
+
+ event.which_event = schema_Event_memexec_tag;
+
+	err = csm_sendeventproto(schema_Event_fields, &event);
+	if (err)
+		pr_err("csm_sendeventproto returned %d on mprotect\n", err);
+	err = 0;
+
+out:
+	free_file_data(&path_data);
+
+	if (unlikely(err < 0 && !csm_memexec_enabled))
+		err = 0;
+
+	return err;
+}
+
+int csm_mmap_file(struct file *file, unsigned long reqprot,
+ unsigned long prot, unsigned long flags)
+{
+ char uuid[PROCESS_UUID_SIZE];
+ char parent_uuid[PROCESS_UUID_SIZE];
+ int err;
+ schema_Event event = schema_Event_init_zero;
+ schema_MemoryExecEvent *memexec;
+ struct file *exe_file;
+ u64 cid;
+ struct file_data path_data = {};
+
+ cid = audit_get_contid(current);
+
+ if (!csm_memexec_enabled ||
+ !(prot & PROT_EXEC) ||
+ file == NULL ||
+ cid == AUDIT_CID_UNSET)
+ return 0;
+
+ memexec = &event.event.memexec;
+ err = fill_file_description(file, &memexec->mapped_file,
+ &path_data);
+ if (err)
+ return err;
+
+ err = populate_proc_common(&memexec->proc, uuid, sizeof(uuid),
+ parent_uuid, sizeof(parent_uuid), current);
+ if (err)
+ goto out;
+
+ /* get_mm_exe_file does its own locking on mm_sem. */
+ exe_file = get_mm_exe_file(current->mm);
+ if (exe_file) {
+ if (path_equal(&file->f_path, &exe_file->f_path))
+ memexec->is_initial_mmap = 1;
+ fput(exe_file);
+ }
+
+ memexec->prot_exec_timestamp = ktime_get_real_ns();
+ memexec->new_flags = prot;
+ memexec->req_flags = reqprot;
+ memexec->mmap_flags = flags;
+ memexec->action = schema_MemoryExecEvent_Action_MMAP_FILE;
+ event.which_event = schema_Event_memexec_tag;
+
+	err = csm_sendeventproto(schema_Event_fields, &event);
+	if (err)
+		pr_err("csm_sendeventproto returned %d on mmap_file\n", err);
+	err = 0;
+
+out:
+	free_file_data(&path_data);
+
+	if (unlikely(err < 0 && !csm_memexec_enabled))
+		err = 0;
+
+	return err;
+}
+
+void csm_file_pre_free(struct file *file)
+{
+ struct dentry *dentry;
+ int err;
+ struct file_provenance prov;
+
+ /* The file was opened to be modified and the LSM is enabled */
+ if (!(file->f_mode & FMODE_WRITE) ||
+ !csm_enabled)
+ return;
+
+ /* The current process is containerized. */
+ if (audit_get_contid(current) == AUDIT_CID_UNSET)
+ return;
+
+ /* The file is part of overlayfs on the upper layer. */
+ if (!is_overlayfs_mounted(file))
+ return;
+
+ dentry = ovl_dentry_upper(file->f_path.dentry);
+ if (!dentry)
+ return;
+
+ err = __vfs_getxattr(dentry, dentry->d_inode, XATTR_SECURITY_CSM,
+ NULL, 0);
+ if (err != -ENODATA) {
+ if (err < 0)
+ pr_err("failed to get security attribute: %d\n", err);
+ return;
+ }
+
+ prov.tgid = task_tgid_nr(current);
+ prov.start_time = ktime_mono_to_real(current->group_leader->start_time);
+
+ err = __vfs_setxattr(dentry, dentry->d_inode, XATTR_SECURITY_CSM, &prov,
+ sizeof(prov), 0);
+ if (err < 0)
+ pr_err("failed to set security attribute: %d\n", err);
+}
+
+/*
+ * Based on fs/proc/base.c:next_tgid().
+ *
+ * next_thread_group_leader returns the task_struct of the next task with a pid
+ * greater than or equal to *tgid. Its reference count is incremented so that
+ * rcu_read_unlock may be called and preemption re-enabled.
+ */
+static struct task_struct *next_thread_group_leader(pid_t *tgid)
+{
+ struct pid *pid;
+ struct task_struct *task;
+
+ cond_resched();
+ rcu_read_lock();
+retry:
+ task = NULL;
+ pid = find_ge_pid(*tgid, &init_pid_ns);
+ if (pid) {
+ *tgid = pid_nr_ns(pid, &init_pid_ns);
+ task = pid_task(pid, PIDTYPE_PID);
+ if (!task || !thread_group_leader(task) ||
+ audit_get_contid(task) == AUDIT_CID_UNSET) {
+ (*tgid) += 1;
+ goto retry;
+ }
+
+ /*
+ * Increment the reference count on the task before leaving
+ * the RCU grace period.
+ */
+ get_task_struct(task);
+ (*tgid) += 1;
+ }
+
+ rcu_read_unlock();
+ return task;
+}
+
+void delayed_enumerate_processes(struct work_struct *work)
+{
+ pid_t tgid = 0;
+ struct task_struct *task;
+ struct csm_enumerate_processes_work_data *wd = container_of(
+ work, struct csm_enumerate_processes_work_data, work);
+ int wd_enumeration_count = wd->enumeration_count;
+
+ kfree(wd);
+ wd = NULL;
+ work = NULL;
+
+	/*
+	 * Allow only a single enumeration routine at a time, for as long as
+	 * the execute collector is enabled.
+	 */
+ while ((wd_enumeration_count == atomic_read(&enumeration_count)) &&
+ READ_ONCE(csm_execute_enabled) &&
+ (task = next_thread_group_leader(&tgid))) {
+ int err;
+ char uuid[PROCESS_UUID_SIZE];
+ char parent_uuid[PROCESS_UUID_SIZE];
+ struct file *exe_file = NULL;
+ struct file_data path_data = {};
+ schema_Event event = schema_Event_init_zero;
+ schema_Process *proc = &event.event.enumproc.proc;
+
+ exe_file = get_task_exe_file(task);
+ if (!exe_file) {
+ pr_err("failed to get enumerated process executable, pid: %u\n",
+ task_pid_nr(task));
+ goto next;
+ }
+
+ err = fill_file_description(exe_file, &proc->binary,
+ &path_data);
+ if (err) {
+ pr_err("failed to fill enumerated process %u executable description: %d\n",
+ task_pid_nr(task), err);
+ goto next;
+ }
+
+ err = populate_proc_common(proc, uuid, sizeof(uuid),
+ parent_uuid, sizeof(parent_uuid),
+ task);
+ if (err) {
+ pr_err("failed to set pid %u common fields: %d\n",
+ task_pid_nr(task), err);
+ goto next;
+ }
+
+ if (task->flags & PF_EXITING)
+ goto next;
+
+ event.which_event = schema_Event_enumproc_tag;
+ err = csm_sendeventproto(schema_Event_fields,
+ &event);
+ if (err) {
+ pr_err("failed to send pid %u enumerated process: %d\n",
+ task_pid_nr(task), err);
+ goto next;
+ }
+next:
+ free_file_data(&path_data);
+ if (exe_file)
+ fput(exe_file);
+
+ put_task_struct(task);
+ }
+}
+
+void csm_enumerate_processes(void)
+{
+ struct csm_enumerate_processes_work_data *wd;
+
+ wd = kmalloc(sizeof(*wd), GFP_KERNEL);
+ if (!wd)
+ return;
+
+ INIT_WORK(&wd->work, delayed_enumerate_processes);
+ wd->enumeration_count = atomic_add_return(1, &enumeration_count);
+ schedule_work(&wd->work);
+}
diff --git a/security/container/process.h b/security/container/process.h
new file mode 100644
index 0000000..1c98134
--- /dev/null
+++ b/security/container/process.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Container Security Monitor module
+ *
+ * Copyright (c) 2019 Google, Inc
+ */
+
+void csm_enumerate_processes(void);
diff --git a/security/container/protos/Makefile b/security/container/protos/Makefile
new file mode 100644
index 0000000..a88068b
--- /dev/null
+++ b/security/container/protos/Makefile
@@ -0,0 +1,10 @@
+subdir-$(CONFIG_SECURITY_CONTAINER_MONITOR) += nanopb
+
+obj-$(CONFIG_SECURITY_CONTAINER_MONITOR) += nanopb/
+obj-$(CONFIG_SECURITY_CONTAINER_MONITOR) += protos.o
+
+protos-y := config.pb.o event.pb.o
+
+ccflags-y := -I$(srctree)/security/container/protos \
+ -I$(srctree)/security/container/protos/nanopb \
+ $(PB_CCFLAGS)
diff --git a/security/container/protos/README b/security/container/protos/README
new file mode 100644
index 0000000..1b0628a
--- /dev/null
+++ b/security/container/protos/README
@@ -0,0 +1,18 @@
+This document provides guidance on how to change the protos used in this directory.
+
+Any change made to a proto file requires reformatting it and regenerating the nanopb
+sources. The proto files must also remain compatible with previously released versions.
+
+To reformat any proto file run: "clang-format -style=Google -i <file.proto>"
+
+To regenerate nanopb files:
+ - Install protoc
+ - apt-get install protobuf-compiler
+ - Clone/setup nanopb for version 0.3.9.1 (or clone the internal depot)
+ - git clone --depth=1 https://github.com/nanopb/nanopb.git
+ - cd nanopb
+ - git fetch --tags
+ - git checkout tags/0.3.9.1
+ - make -C generator/proto
+ - Run protoc with the nanopb definition
+ - protoc --plugin=<path_to_nanopb>/generator/protoc-gen-nanopb --nanopb_out=<path_to_linux>/security/container/protos/ <path_to_linux>/security/container/protos/<file.proto> --proto_path=<path_to_linux>/security/container/protos
diff --git a/security/container/protos/config.pb.c b/security/container/protos/config.pb.c
new file mode 100644
index 0000000..211bab0
--- /dev/null
+++ b/security/container/protos/config.pb.c
@@ -0,0 +1,72 @@
+/* Automatically generated nanopb constant definitions */
+/* Generated by nanopb-0.3.9.3 at Wed Jun 5 11:00:24 2019. */
+
+#include "config.pb.h"
+
+/* @@protoc_insertion_point(includes) */
+#if PB_PROTO_HEADER_VERSION != 30
+#error Regenerate this file with the current version of nanopb generator.
+#endif
+
+
+
+const pb_field_t schema_ContainerCollectorConfig_fields[2] = {
+ PB_FIELD( 1, BOOL , SINGULAR, STATIC , FIRST, schema_ContainerCollectorConfig, enabled, enabled, 0),
+ PB_LAST_FIELD
+};
+
+const pb_field_t schema_ExecuteCollectorConfig_fields[5] = {
+ PB_FIELD( 1, BOOL , SINGULAR, STATIC , FIRST, schema_ExecuteCollectorConfig, enabled, enabled, 0),
+ PB_FIELD( 2, UINT32 , SINGULAR, STATIC , OTHER, schema_ExecuteCollectorConfig, argv_limit, enabled, 0),
+ PB_FIELD( 3, UINT32 , SINGULAR, STATIC , OTHER, schema_ExecuteCollectorConfig, envp_limit, argv_limit, 0),
+ PB_FIELD( 4, STRING , REPEATED, CALLBACK, OTHER, schema_ExecuteCollectorConfig, envp_allowlist, envp_limit, 0),
+ PB_LAST_FIELD
+};
+
+const pb_field_t schema_MemExecCollectorConfig_fields[2] = {
+ PB_FIELD( 1, BOOL , SINGULAR, STATIC , FIRST, schema_MemExecCollectorConfig, enabled, enabled, 0),
+ PB_LAST_FIELD
+};
+
+const pb_field_t schema_ConfigurationRequest_fields[4] = {
+ PB_FIELD( 1, MESSAGE , SINGULAR, STATIC , FIRST, schema_ConfigurationRequest, container_config, container_config, &schema_ContainerCollectorConfig_fields),
+ PB_FIELD( 2, MESSAGE , SINGULAR, STATIC , OTHER, schema_ConfigurationRequest, execute_config, container_config, &schema_ExecuteCollectorConfig_fields),
+ PB_FIELD( 3, MESSAGE , SINGULAR, STATIC , OTHER, schema_ConfigurationRequest, memexec_config, execute_config, &schema_MemExecCollectorConfig_fields),
+ PB_LAST_FIELD
+};
+
+const pb_field_t schema_ConfigurationResponse_fields[5] = {
+ PB_FIELD( 1, UENUM , SINGULAR, STATIC , FIRST, schema_ConfigurationResponse, error, error, 0),
+ PB_FIELD( 2, STRING , SINGULAR, CALLBACK, OTHER, schema_ConfigurationResponse, msg, error, 0),
+ PB_FIELD( 3, UINT64 , SINGULAR, STATIC , OTHER, schema_ConfigurationResponse, version, msg, 0),
+ PB_FIELD( 4, UINT32 , SINGULAR, STATIC , OTHER, schema_ConfigurationResponse, kernel_version, version, 0),
+ PB_LAST_FIELD
+};
+
+
+
+/* Check that field information fits in pb_field_t */
+#if !defined(PB_FIELD_32BIT)
+/* If you get an error here, it means that you need to define PB_FIELD_32BIT
+ * compile-time option. You can do that in pb.h or on compiler command line.
+ *
+ * The reason you need to do this is that some of your messages contain tag
+ * numbers or field sizes that are larger than what can fit in 8 or 16 bit
+ * field descriptors.
+ */
+PB_STATIC_ASSERT((pb_membersize(schema_ConfigurationRequest, container_config) < 65536 && pb_membersize(schema_ConfigurationRequest, execute_config) < 65536 && pb_membersize(schema_ConfigurationRequest, memexec_config) < 65536), YOU_MUST_DEFINE_PB_FIELD_32BIT_FOR_MESSAGES_schema_ContainerCollectorConfig_schema_ExecuteCollectorConfig_schema_MemExecCollectorConfig_schema_ConfigurationRequest_schema_ConfigurationResponse)
+#endif
+
+#if !defined(PB_FIELD_16BIT) && !defined(PB_FIELD_32BIT)
+/* If you get an error here, it means that you need to define PB_FIELD_16BIT
+ * compile-time option. You can do that in pb.h or on compiler command line.
+ *
+ * The reason you need to do this is that some of your messages contain tag
+ * numbers or field sizes that are larger than what can fit in the default
+ * 8 bit descriptors.
+ */
+PB_STATIC_ASSERT((pb_membersize(schema_ConfigurationRequest, container_config) < 256 && pb_membersize(schema_ConfigurationRequest, execute_config) < 256 && pb_membersize(schema_ConfigurationRequest, memexec_config) < 256), YOU_MUST_DEFINE_PB_FIELD_16BIT_FOR_MESSAGES_schema_ContainerCollectorConfig_schema_ExecuteCollectorConfig_schema_MemExecCollectorConfig_schema_ConfigurationRequest_schema_ConfigurationResponse)
+#endif
+
+
+/* @@protoc_insertion_point(eof) */
diff --git a/security/container/protos/config.pb.h b/security/container/protos/config.pb.h
new file mode 100644
index 0000000..2685d2b
--- /dev/null
+++ b/security/container/protos/config.pb.h
@@ -0,0 +1,116 @@
+/* Automatically generated nanopb header */
+/* Generated by nanopb-0.3.9.3 at Wed Jun 5 11:00:24 2019. */
+
+#ifndef PB_SCHEMA_CONFIG_PB_H_INCLUDED
+#define PB_SCHEMA_CONFIG_PB_H_INCLUDED
+#include <pb.h>
+
+/* @@protoc_insertion_point(includes) */
+#if PB_PROTO_HEADER_VERSION != 30
+#error Regenerate this file with the current version of nanopb generator.
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Enum definitions */
+typedef enum _schema_ConfigurationResponse_ErrorCode {
+ schema_ConfigurationResponse_ErrorCode_NO_ERROR = 0,
+ schema_ConfigurationResponse_ErrorCode_UNKNOWN = 2
+} schema_ConfigurationResponse_ErrorCode;
+#define _schema_ConfigurationResponse_ErrorCode_MIN schema_ConfigurationResponse_ErrorCode_NO_ERROR
+#define _schema_ConfigurationResponse_ErrorCode_MAX schema_ConfigurationResponse_ErrorCode_UNKNOWN
+#define _schema_ConfigurationResponse_ErrorCode_ARRAYSIZE ((schema_ConfigurationResponse_ErrorCode)(schema_ConfigurationResponse_ErrorCode_UNKNOWN+1))
+
+/* Struct definitions */
+typedef struct _schema_ConfigurationResponse {
+ schema_ConfigurationResponse_ErrorCode error;
+ pb_callback_t msg;
+ uint64_t version;
+ uint32_t kernel_version;
+/* @@protoc_insertion_point(struct:schema_ConfigurationResponse) */
+} schema_ConfigurationResponse;
+
+typedef struct _schema_ContainerCollectorConfig {
+ bool enabled;
+/* @@protoc_insertion_point(struct:schema_ContainerCollectorConfig) */
+} schema_ContainerCollectorConfig;
+
+typedef struct _schema_ExecuteCollectorConfig {
+ bool enabled;
+ uint32_t argv_limit;
+ uint32_t envp_limit;
+ pb_callback_t envp_allowlist;
+/* @@protoc_insertion_point(struct:schema_ExecuteCollectorConfig) */
+} schema_ExecuteCollectorConfig;
+
+typedef struct _schema_MemExecCollectorConfig {
+ bool enabled;
+/* @@protoc_insertion_point(struct:schema_MemExecCollectorConfig) */
+} schema_MemExecCollectorConfig;
+
+typedef struct _schema_ConfigurationRequest {
+ schema_ContainerCollectorConfig container_config;
+ schema_ExecuteCollectorConfig execute_config;
+ schema_MemExecCollectorConfig memexec_config;
+/* @@protoc_insertion_point(struct:schema_ConfigurationRequest) */
+} schema_ConfigurationRequest;
+
+/* Default values for struct fields */
+
+/* Initializer values for message structs */
+#define schema_ContainerCollectorConfig_init_default {0}
+#define schema_ExecuteCollectorConfig_init_default {0, 0, 0, {{NULL}, NULL}}
+#define schema_MemExecCollectorConfig_init_default {0}
+#define schema_ConfigurationRequest_init_default {schema_ContainerCollectorConfig_init_default, schema_ExecuteCollectorConfig_init_default, schema_MemExecCollectorConfig_init_default}
+#define schema_ConfigurationResponse_init_default {_schema_ConfigurationResponse_ErrorCode_MIN, {{NULL}, NULL}, 0, 0}
+#define schema_ContainerCollectorConfig_init_zero {0}
+#define schema_ExecuteCollectorConfig_init_zero {0, 0, 0, {{NULL}, NULL}}
+#define schema_MemExecCollectorConfig_init_zero {0}
+#define schema_ConfigurationRequest_init_zero {schema_ContainerCollectorConfig_init_zero, schema_ExecuteCollectorConfig_init_zero, schema_MemExecCollectorConfig_init_zero}
+#define schema_ConfigurationResponse_init_zero {_schema_ConfigurationResponse_ErrorCode_MIN, {{NULL}, NULL}, 0, 0}
+
+/* Field tags (for use in manual encoding/decoding) */
+#define schema_ConfigurationResponse_error_tag 1
+#define schema_ConfigurationResponse_msg_tag 2
+#define schema_ConfigurationResponse_version_tag 3
+#define schema_ConfigurationResponse_kernel_version_tag 4
+#define schema_ContainerCollectorConfig_enabled_tag 1
+#define schema_ExecuteCollectorConfig_enabled_tag 1
+#define schema_ExecuteCollectorConfig_argv_limit_tag 2
+#define schema_ExecuteCollectorConfig_envp_limit_tag 3
+#define schema_ExecuteCollectorConfig_envp_allowlist_tag 4
+#define schema_MemExecCollectorConfig_enabled_tag 1
+#define schema_ConfigurationRequest_container_config_tag 1
+#define schema_ConfigurationRequest_execute_config_tag 2
+#define schema_ConfigurationRequest_memexec_config_tag 3
+
+/* Struct field encoding specification for nanopb */
+extern const pb_field_t schema_ContainerCollectorConfig_fields[2];
+extern const pb_field_t schema_ExecuteCollectorConfig_fields[5];
+extern const pb_field_t schema_MemExecCollectorConfig_fields[2];
+extern const pb_field_t schema_ConfigurationRequest_fields[4];
+extern const pb_field_t schema_ConfigurationResponse_fields[5];
+
+/* Maximum encoded size of messages (where known) */
+#define schema_ContainerCollectorConfig_size 2
+/* schema_ExecuteCollectorConfig_size depends on runtime parameters */
+#define schema_MemExecCollectorConfig_size 2
+/* schema_ConfigurationRequest_size depends on runtime parameters */
+/* schema_ConfigurationResponse_size depends on runtime parameters */
+
+/* Message IDs (where set with "msgid" option) */
+#ifdef PB_MSGID
+
+#define CONFIG_MESSAGES \
+
+
+#endif
+
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
+/* @@protoc_insertion_point(eof) */
+
+#endif
diff --git a/security/container/protos/config.proto b/security/container/protos/config.proto
new file mode 100644
index 0000000..e32a517
--- /dev/null
+++ b/security/container/protos/config.proto
@@ -0,0 +1,51 @@
+syntax = "proto3";
+
+package schema;
+
+// Collect information about running containers
+message ContainerCollectorConfig {
+ bool enabled = 1;
+}
+
+message ExecuteCollectorConfig {
+ bool enabled = 1;
+
+ // truncate argv/envp if cumulative length exceeds limit
+ uint32 argv_limit = 2;
+ uint32 envp_limit = 3;
+
+ // If specified, only report the named environment variables. An
+ // empty envp_allowlist indicates that all environment variables
+ // should be reported up to a cumulative total of envp_limit bytes.
+ repeated string envp_allowlist = 4;
+}
+
+// Collect information about executable memory mappings.
+message MemExecCollectorConfig {
+ bool enabled = 1;
+}
+
+// Convey configuration information to Guest LSM
+message ConfigurationRequest {
+ ContainerCollectorConfig container_config = 1;
+ ExecuteCollectorConfig execute_config = 2;
+ MemExecCollectorConfig memexec_config = 3;
+
+ // Additional configuration messages will be added as new collectors
+ // are implemented
+}
+
+// Report success or failure of previous ConfigurationRequest
+message ConfigurationResponse {
+ enum ErrorCode {
+ // Keep values in sync with
+ // https://github.com/googleapis/googleapis/blob/master/google/rpc/code.proto
+ NO_ERROR = 0;
+ UNKNOWN = 2;
+ }
+
+ ErrorCode error = 1;
+ string msg = 2;
+ uint64 version = 3; // Version of the LSM
+ uint32 kernel_version = 4; // LINUX_VERSION_CODE
+}
diff --git a/security/container/protos/event.pb.c b/security/container/protos/event.pb.c
new file mode 100644
index 0000000..1018ce2
--- /dev/null
+++ b/security/container/protos/event.pb.c
@@ -0,0 +1,174 @@
+/* Automatically generated nanopb constant definitions */
+/* Generated by nanopb-0.3.9.1 at Mon Nov 11 16:11:19 2019. */
+
+#include "event.pb.h"
+
+/* @@protoc_insertion_point(includes) */
+#if PB_PROTO_HEADER_VERSION != 30
+#error Regenerate this file with the current version of nanopb generator.
+#endif
+
+
+
+const pb_field_t schema_SocketIp_fields[4] = {
+ PB_FIELD( 1, UINT32 , SINGULAR, STATIC , FIRST, schema_SocketIp, family, family, 0),
+ PB_FIELD( 2, BYTES , SINGULAR, CALLBACK, OTHER, schema_SocketIp, ip, family, 0),
+ PB_FIELD( 3, UINT32 , SINGULAR, STATIC , OTHER, schema_SocketIp, port, ip, 0),
+ PB_LAST_FIELD
+};
+
+const pb_field_t schema_Socket_fields[3] = {
+ PB_FIELD( 1, MESSAGE , SINGULAR, STATIC , FIRST, schema_Socket, local, local, &schema_SocketIp_fields),
+ PB_FIELD( 2, MESSAGE , SINGULAR, STATIC , OTHER, schema_Socket, remote, local, &schema_SocketIp_fields),
+ PB_LAST_FIELD
+};
+
+const pb_field_t schema_Overlay_fields[4] = {
+ PB_FIELD( 1, BOOL , SINGULAR, STATIC , FIRST, schema_Overlay, lower_layer, lower_layer, 0),
+ PB_FIELD( 2, BOOL , SINGULAR, STATIC , OTHER, schema_Overlay, upper_layer, lower_layer, 0),
+ PB_FIELD( 3, BYTES , SINGULAR, CALLBACK, OTHER, schema_Overlay, modified_uuid, upper_layer, 0),
+ PB_LAST_FIELD
+};
+
+const pb_field_t schema_File_fields[5] = {
+ PB_FIELD( 1, BYTES , SINGULAR, CALLBACK, FIRST, schema_File, fullpath, fullpath, 0),
+ PB_ONEOF_FIELD(filesystem, 2, MESSAGE , ONEOF, STATIC , OTHER, schema_File, overlayfs, fullpath, &schema_Overlay_fields),
+ PB_ONEOF_FIELD(filesystem, 4, MESSAGE , ONEOF, STATIC , UNION, schema_File, socket, fullpath, &schema_Socket_fields),
+ PB_FIELD( 3, UINT32 , SINGULAR, STATIC , OTHER, schema_File, ino, filesystem.socket, 0),
+ PB_LAST_FIELD
+};
+
+const pb_field_t schema_ProcessArguments_fields[5] = {
+ PB_FIELD( 1, BYTES , REPEATED, CALLBACK, FIRST, schema_ProcessArguments, argv, argv, 0),
+ PB_FIELD( 2, UINT32 , SINGULAR, STATIC , OTHER, schema_ProcessArguments, argv_truncated, argv, 0),
+ PB_FIELD( 3, BYTES , REPEATED, CALLBACK, OTHER, schema_ProcessArguments, envp, argv_truncated, 0),
+ PB_FIELD( 4, UINT32 , SINGULAR, STATIC , OTHER, schema_ProcessArguments, envp_truncated, envp, 0),
+ PB_LAST_FIELD
+};
+
+const pb_field_t schema_Descriptor_fields[3] = {
+ PB_FIELD( 1, UINT32 , SINGULAR, STATIC , FIRST, schema_Descriptor, mode, mode, 0),
+ PB_FIELD( 2, MESSAGE , SINGULAR, STATIC , OTHER, schema_Descriptor, file, mode, &schema_File_fields),
+ PB_LAST_FIELD
+};
+
+const pb_field_t schema_Streams_fields[4] = {
+ PB_FIELD( 1, MESSAGE , SINGULAR, STATIC , FIRST, schema_Streams, stdin, stdin, &schema_Descriptor_fields),
+ PB_FIELD( 2, MESSAGE , SINGULAR, STATIC , OTHER, schema_Streams, stdout, stdin, &schema_Descriptor_fields),
+ PB_FIELD( 3, MESSAGE , SINGULAR, STATIC , OTHER, schema_Streams, stderr, stdout, &schema_Descriptor_fields),
+ PB_LAST_FIELD
+};
+
+const pb_field_t schema_Process_fields[13] = {
+ PB_FIELD( 1, UINT64 , SINGULAR, STATIC , FIRST, schema_Process, creation_timestamp, creation_timestamp, 0),
+ PB_FIELD( 2, BYTES , SINGULAR, CALLBACK, OTHER, schema_Process, uuid, creation_timestamp, 0),
+ PB_FIELD( 3, UINT32 , SINGULAR, STATIC , OTHER, schema_Process, pid, uuid, 0),
+ PB_FIELD( 4, MESSAGE , SINGULAR, STATIC , OTHER, schema_Process, binary, pid, &schema_File_fields),
+ PB_FIELD( 5, UINT32 , SINGULAR, STATIC , OTHER, schema_Process, parent_pid, binary, 0),
+ PB_FIELD( 6, BYTES , SINGULAR, CALLBACK, OTHER, schema_Process, parent_uuid, parent_pid, 0),
+ PB_FIELD( 7, UINT64 , SINGULAR, STATIC , OTHER, schema_Process, container_id, parent_uuid, 0),
+ PB_FIELD( 8, UINT32 , SINGULAR, STATIC , OTHER, schema_Process, container_pid, container_id, 0),
+ PB_FIELD( 9, UINT32 , SINGULAR, STATIC , OTHER, schema_Process, container_parent_pid, container_pid, 0),
+ PB_FIELD( 10, MESSAGE , SINGULAR, STATIC , OTHER, schema_Process, args, container_parent_pid, &schema_ProcessArguments_fields),
+ PB_FIELD( 11, MESSAGE , SINGULAR, STATIC , OTHER, schema_Process, streams, args, &schema_Streams_fields),
+ PB_FIELD( 12, UINT64 , SINGULAR, STATIC , OTHER, schema_Process, exec_session_id, streams, 0),
+ PB_LAST_FIELD
+};
+
+const pb_field_t schema_Container_fields[10] = {
+ PB_FIELD( 1, UINT64 , SINGULAR, STATIC , FIRST, schema_Container, creation_timestamp, creation_timestamp, 0),
+ PB_FIELD( 2, BYTES , SINGULAR, CALLBACK, OTHER, schema_Container, pod_namespace, creation_timestamp, 0),
+ PB_FIELD( 3, BYTES , SINGULAR, CALLBACK, OTHER, schema_Container, pod_name, pod_namespace, 0),
+ PB_FIELD( 4, UINT64 , SINGULAR, STATIC , OTHER, schema_Container, container_id, pod_name, 0),
+ PB_FIELD( 5, BYTES , SINGULAR, CALLBACK, OTHER, schema_Container, container_name, container_id, 0),
+ PB_FIELD( 6, BYTES , SINGULAR, CALLBACK, OTHER, schema_Container, container_image_uri, container_name, 0),
+ PB_FIELD( 7, BYTES , REPEATED, CALLBACK, OTHER, schema_Container, labels, container_image_uri, 0),
+ PB_FIELD( 8, BYTES , SINGULAR, CALLBACK, OTHER, schema_Container, init_uuid, labels, 0),
+ PB_FIELD( 9, BYTES , SINGULAR, CALLBACK, OTHER, schema_Container, container_image_id, init_uuid, 0),
+ PB_LAST_FIELD
+};
+
+const pb_field_t schema_ExecuteEvent_fields[2] = {
+ PB_FIELD( 1, MESSAGE , SINGULAR, STATIC , FIRST, schema_ExecuteEvent, proc, proc, &schema_Process_fields),
+ PB_LAST_FIELD
+};
+
+const pb_field_t schema_CloneEvent_fields[2] = {
+ PB_FIELD( 1, MESSAGE , SINGULAR, STATIC , FIRST, schema_CloneEvent, proc, proc, &schema_Process_fields),
+ PB_LAST_FIELD
+};
+
+const pb_field_t schema_EnumerateProcessEvent_fields[2] = {
+ PB_FIELD( 1, MESSAGE , SINGULAR, STATIC , FIRST, schema_EnumerateProcessEvent, proc, proc, &schema_Process_fields),
+ PB_LAST_FIELD
+};
+
+const pb_field_t schema_MemoryExecEvent_fields[12] = {
+ PB_FIELD( 1, MESSAGE , SINGULAR, STATIC , FIRST, schema_MemoryExecEvent, proc, proc, &schema_Process_fields),
+ PB_FIELD( 2, UINT64 , SINGULAR, STATIC , OTHER, schema_MemoryExecEvent, prot_exec_timestamp, proc, 0),
+ PB_FIELD( 3, UINT64 , SINGULAR, STATIC , OTHER, schema_MemoryExecEvent, new_flags, prot_exec_timestamp, 0),
+ PB_FIELD( 4, UINT64 , SINGULAR, STATIC , OTHER, schema_MemoryExecEvent, req_flags, new_flags, 0),
+ PB_FIELD( 5, UINT64 , SINGULAR, STATIC , OTHER, schema_MemoryExecEvent, old_vm_flags, req_flags, 0),
+ PB_FIELD( 6, UINT64 , SINGULAR, STATIC , OTHER, schema_MemoryExecEvent, mmap_flags, old_vm_flags, 0),
+ PB_FIELD( 7, MESSAGE , SINGULAR, STATIC , OTHER, schema_MemoryExecEvent, mapped_file, mmap_flags, &schema_File_fields),
+ PB_FIELD( 8, UENUM , SINGULAR, STATIC , OTHER, schema_MemoryExecEvent, action, mapped_file, 0),
+ PB_FIELD( 9, UINT64 , SINGULAR, STATIC , OTHER, schema_MemoryExecEvent, start_addr, action, 0),
+ PB_FIELD( 10, UINT64 , SINGULAR, STATIC , OTHER, schema_MemoryExecEvent, end_addr, start_addr, 0),
+ PB_FIELD( 11, BOOL , SINGULAR, STATIC , OTHER, schema_MemoryExecEvent, is_initial_mmap, end_addr, 0),
+ PB_LAST_FIELD
+};
+
+const pb_field_t schema_ContainerInfoEvent_fields[2] = {
+ PB_FIELD( 1, MESSAGE , SINGULAR, STATIC , FIRST, schema_ContainerInfoEvent, container, container, &schema_Container_fields),
+ PB_LAST_FIELD
+};
+
+const pb_field_t schema_ExitEvent_fields[2] = {
+ PB_FIELD( 1, BYTES , SINGULAR, CALLBACK, FIRST, schema_ExitEvent, process_uuid, process_uuid, 0),
+ PB_LAST_FIELD
+};
+
+const pb_field_t schema_Event_fields[8] = {
+ PB_ONEOF_FIELD(event, 1, MESSAGE , ONEOF, STATIC , FIRST, schema_Event, execute, execute, &schema_ExecuteEvent_fields),
+ PB_ONEOF_FIELD(event, 2, MESSAGE , ONEOF, STATIC , UNION, schema_Event, container, container, &schema_ContainerInfoEvent_fields),
+ PB_ONEOF_FIELD(event, 3, MESSAGE , ONEOF, STATIC , UNION, schema_Event, exit, exit, &schema_ExitEvent_fields),
+ PB_ONEOF_FIELD(event, 4, MESSAGE , ONEOF, STATIC , UNION, schema_Event, memexec, memexec, &schema_MemoryExecEvent_fields),
+ PB_ONEOF_FIELD(event, 5, MESSAGE , ONEOF, STATIC , UNION, schema_Event, clone, clone, &schema_CloneEvent_fields),
+ PB_ONEOF_FIELD(event, 7, MESSAGE , ONEOF, STATIC , UNION, schema_Event, enumproc, enumproc, &schema_EnumerateProcessEvent_fields),
+ PB_FIELD( 6, UINT64 , SINGULAR, STATIC , OTHER, schema_Event, timestamp, event.enumproc, 0),
+ PB_LAST_FIELD
+};
+
+const pb_field_t schema_ContainerReport_fields[3] = {
+ PB_FIELD( 1, UINT32 , SINGULAR, STATIC , FIRST, schema_ContainerReport, pid, pid, 0),
+ PB_FIELD( 2, MESSAGE , SINGULAR, STATIC , OTHER, schema_ContainerReport, container, pid, &schema_Container_fields),
+ PB_LAST_FIELD
+};
+
+
+
+/* Check that field information fits in pb_field_t */
+#if !defined(PB_FIELD_32BIT)
+/* If you get an error here, it means that you need to define PB_FIELD_32BIT
+ * compile-time option. You can do that in pb.h or on compiler command line.
+ *
+ * The reason you need to do this is that some of your messages contain tag
+ * numbers or field sizes that are larger than what can fit in 8 or 16 bit
+ * field descriptors.
+ */
+PB_STATIC_ASSERT((pb_membersize(schema_Socket, local) < 65536 && pb_membersize(schema_Socket, remote) < 65536 && pb_membersize(schema_File, filesystem.overlayfs) < 65536 && pb_membersize(schema_File, filesystem.socket) < 65536 && pb_membersize(schema_Descriptor, file) < 65536 && pb_membersize(schema_Streams, stdin) < 65536 && pb_membersize(schema_Streams, stdout) < 65536 && pb_membersize(schema_Streams, stderr) < 65536 && pb_membersize(schema_Process, binary) < 65536 && pb_membersize(schema_Process, args) < 65536 && pb_membersize(schema_Process, streams) < 65536 && pb_membersize(schema_ExecuteEvent, proc) < 65536 && pb_membersize(schema_CloneEvent, proc) < 65536 && pb_membersize(schema_EnumerateProcessEvent, proc) < 65536 && pb_membersize(schema_MemoryExecEvent, proc) < 65536 && pb_membersize(schema_MemoryExecEvent, mapped_file) < 65536 && pb_membersize(schema_ContainerInfoEvent, container) < 65536 && pb_membersize(schema_Event, event.execute) < 65536 && pb_membersize(schema_Event, event.container) < 65536 && pb_membersize(schema_Event, event.exit) < 65536 && pb_membersize(schema_Event, event.memexec) < 65536 && pb_membersize(schema_Event, event.clone) < 65536 && pb_membersize(schema_Event, event.enumproc) < 65536 && pb_membersize(schema_ContainerReport, container) < 65536), YOU_MUST_DEFINE_PB_FIELD_32BIT_FOR_MESSAGES_schema_SocketIp_schema_Socket_schema_Overlay_schema_File_schema_ProcessArguments_schema_Descriptor_schema_Streams_schema_Process_schema_Container_schema_ExecuteEvent_schema_CloneEvent_schema_EnumerateProcessEvent_schema_MemoryExecEvent_schema_ContainerInfoEvent_schema_ExitEvent_schema_Event_schema_ContainerReport)
+#endif
+
+#if !defined(PB_FIELD_16BIT) && !defined(PB_FIELD_32BIT)
+/* If you get an error here, it means that you need to define PB_FIELD_16BIT
+ * compile-time option. You can do that in pb.h or on compiler command line.
+ *
+ * The reason you need to do this is that some of your messages contain tag
+ * numbers or field sizes that are larger than what can fit in the default
+ * 8 bit descriptors.
+ */
+PB_STATIC_ASSERT((pb_membersize(schema_Socket, local) < 256 && pb_membersize(schema_Socket, remote) < 256 && pb_membersize(schema_File, filesystem.overlayfs) < 256 && pb_membersize(schema_File, filesystem.socket) < 256 && pb_membersize(schema_Descriptor, file) < 256 && pb_membersize(schema_Streams, stdin) < 256 && pb_membersize(schema_Streams, stdout) < 256 && pb_membersize(schema_Streams, stderr) < 256 && pb_membersize(schema_Process, binary) < 256 && pb_membersize(schema_Process, args) < 256 && pb_membersize(schema_Process, streams) < 256 && pb_membersize(schema_ExecuteEvent, proc) < 256 && pb_membersize(schema_CloneEvent, proc) < 256 && pb_membersize(schema_EnumerateProcessEvent, proc) < 256 && pb_membersize(schema_MemoryExecEvent, proc) < 256 && pb_membersize(schema_MemoryExecEvent, mapped_file) < 256 && pb_membersize(schema_ContainerInfoEvent, container) < 256 && pb_membersize(schema_Event, event.execute) < 256 && pb_membersize(schema_Event, event.container) < 256 && pb_membersize(schema_Event, event.exit) < 256 && pb_membersize(schema_Event, event.memexec) < 256 && pb_membersize(schema_Event, event.clone) < 256 && pb_membersize(schema_Event, event.enumproc) < 256 && pb_membersize(schema_ContainerReport, container) < 256), YOU_MUST_DEFINE_PB_FIELD_16BIT_FOR_MESSAGES_schema_SocketIp_schema_Socket_schema_Overlay_schema_File_schema_ProcessArguments_schema_Descriptor_schema_Streams_schema_Process_schema_Container_schema_ExecuteEvent_schema_CloneEvent_schema_EnumerateProcessEvent_schema_MemoryExecEvent_schema_ContainerInfoEvent_schema_ExitEvent_schema_Event_schema_ContainerReport)
+#endif
+
+
+/* @@protoc_insertion_point(eof) */
diff --git a/security/container/protos/event.pb.h b/security/container/protos/event.pb.h
new file mode 100644
index 0000000..0634e8e
--- /dev/null
+++ b/security/container/protos/event.pb.h
@@ -0,0 +1,327 @@
+/* Automatically generated nanopb header */
+/* Generated by nanopb-0.3.9.1 at Mon Nov 11 16:11:19 2019. */
+
+#ifndef PB_SCHEMA_EVENT_PB_H_INCLUDED
+#define PB_SCHEMA_EVENT_PB_H_INCLUDED
+#include <pb.h>
+
+/* @@protoc_insertion_point(includes) */
+#if PB_PROTO_HEADER_VERSION != 30
+#error Regenerate this file with the current version of nanopb generator.
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Enum definitions */
+typedef enum _schema_MemoryExecEvent_Action {
+ schema_MemoryExecEvent_Action_UNDEFINED = 0,
+ schema_MemoryExecEvent_Action_MPROTECT = 1,
+ schema_MemoryExecEvent_Action_MMAP_FILE = 2
+} schema_MemoryExecEvent_Action;
+#define _schema_MemoryExecEvent_Action_MIN schema_MemoryExecEvent_Action_UNDEFINED
+#define _schema_MemoryExecEvent_Action_MAX schema_MemoryExecEvent_Action_MMAP_FILE
+#define _schema_MemoryExecEvent_Action_ARRAYSIZE ((schema_MemoryExecEvent_Action)(schema_MemoryExecEvent_Action_MMAP_FILE+1))
+
+/* Struct definitions */
+typedef struct _schema_ExitEvent {
+ pb_callback_t process_uuid;
+/* @@protoc_insertion_point(struct:schema_ExitEvent) */
+} schema_ExitEvent;
+
+typedef struct _schema_Container {
+ uint64_t creation_timestamp;
+ pb_callback_t pod_namespace;
+ pb_callback_t pod_name;
+ uint64_t container_id;
+ pb_callback_t container_name;
+ pb_callback_t container_image_uri;
+ pb_callback_t labels;
+ pb_callback_t init_uuid;
+ pb_callback_t container_image_id;
+/* @@protoc_insertion_point(struct:schema_Container) */
+} schema_Container;
+
+typedef struct _schema_Overlay {
+ bool lower_layer;
+ bool upper_layer;
+ pb_callback_t modified_uuid;
+/* @@protoc_insertion_point(struct:schema_Overlay) */
+} schema_Overlay;
+
+typedef struct _schema_ProcessArguments {
+ pb_callback_t argv;
+ uint32_t argv_truncated;
+ pb_callback_t envp;
+ uint32_t envp_truncated;
+/* @@protoc_insertion_point(struct:schema_ProcessArguments) */
+} schema_ProcessArguments;
+
+typedef struct _schema_SocketIp {
+ uint32_t family;
+ pb_callback_t ip;
+ uint32_t port;
+/* @@protoc_insertion_point(struct:schema_SocketIp) */
+} schema_SocketIp;
+
+typedef struct _schema_ContainerInfoEvent {
+ schema_Container container;
+/* @@protoc_insertion_point(struct:schema_ContainerInfoEvent) */
+} schema_ContainerInfoEvent;
+
+typedef struct _schema_ContainerReport {
+ uint32_t pid;
+ schema_Container container;
+/* @@protoc_insertion_point(struct:schema_ContainerReport) */
+} schema_ContainerReport;
+
+typedef struct _schema_Socket {
+ schema_SocketIp local;
+ schema_SocketIp remote;
+/* @@protoc_insertion_point(struct:schema_Socket) */
+} schema_Socket;
+
+typedef struct _schema_File {
+ pb_callback_t fullpath;
+ pb_size_t which_filesystem;
+ union {
+ schema_Overlay overlayfs;
+ schema_Socket socket;
+ } filesystem;
+ uint32_t ino;
+/* @@protoc_insertion_point(struct:schema_File) */
+} schema_File;
+
+typedef struct _schema_Descriptor {
+ uint32_t mode;
+ schema_File file;
+/* @@protoc_insertion_point(struct:schema_Descriptor) */
+} schema_Descriptor;
+
+typedef struct _schema_Streams {
+ schema_Descriptor stdin;
+ schema_Descriptor stdout;
+ schema_Descriptor stderr;
+/* @@protoc_insertion_point(struct:schema_Streams) */
+} schema_Streams;
+
+typedef struct _schema_Process {
+ uint64_t creation_timestamp;
+ pb_callback_t uuid;
+ uint32_t pid;
+ schema_File binary;
+ uint32_t parent_pid;
+ pb_callback_t parent_uuid;
+ uint64_t container_id;
+ uint32_t container_pid;
+ uint32_t container_parent_pid;
+ schema_ProcessArguments args;
+ schema_Streams streams;
+ uint64_t exec_session_id;
+/* @@protoc_insertion_point(struct:schema_Process) */
+} schema_Process;
+
+typedef struct _schema_CloneEvent {
+ schema_Process proc;
+/* @@protoc_insertion_point(struct:schema_CloneEvent) */
+} schema_CloneEvent;
+
+typedef struct _schema_EnumerateProcessEvent {
+ schema_Process proc;
+/* @@protoc_insertion_point(struct:schema_EnumerateProcessEvent) */
+} schema_EnumerateProcessEvent;
+
+typedef struct _schema_ExecuteEvent {
+ schema_Process proc;
+/* @@protoc_insertion_point(struct:schema_ExecuteEvent) */
+} schema_ExecuteEvent;
+
+typedef struct _schema_MemoryExecEvent {
+ schema_Process proc;
+ uint64_t prot_exec_timestamp;
+ uint64_t new_flags;
+ uint64_t req_flags;
+ uint64_t old_vm_flags;
+ uint64_t mmap_flags;
+ schema_File mapped_file;
+ schema_MemoryExecEvent_Action action;
+ uint64_t start_addr;
+ uint64_t end_addr;
+ bool is_initial_mmap;
+/* @@protoc_insertion_point(struct:schema_MemoryExecEvent) */
+} schema_MemoryExecEvent;
+
+typedef struct _schema_Event {
+ pb_size_t which_event;
+ union {
+ schema_ExecuteEvent execute;
+ schema_ContainerInfoEvent container;
+ schema_ExitEvent exit;
+ schema_MemoryExecEvent memexec;
+ schema_CloneEvent clone;
+ schema_EnumerateProcessEvent enumproc;
+ } event;
+ uint64_t timestamp;
+/* @@protoc_insertion_point(struct:schema_Event) */
+} schema_Event;
+
+/* Default values for struct fields */
+
+/* Initializer values for message structs */
+#define schema_SocketIp_init_default {0, {{NULL}, NULL}, 0}
+#define schema_Socket_init_default {schema_SocketIp_init_default, schema_SocketIp_init_default}
+#define schema_Overlay_init_default {0, 0, {{NULL}, NULL}}
+#define schema_File_init_default {{{NULL}, NULL}, 0, {schema_Overlay_init_default}, 0}
+#define schema_ProcessArguments_init_default {{{NULL}, NULL}, 0, {{NULL}, NULL}, 0}
+#define schema_Descriptor_init_default {0, schema_File_init_default}
+#define schema_Streams_init_default {schema_Descriptor_init_default, schema_Descriptor_init_default, schema_Descriptor_init_default}
+#define schema_Process_init_default {0, {{NULL}, NULL}, 0, schema_File_init_default, 0, {{NULL}, NULL}, 0, 0, 0, schema_ProcessArguments_init_default, schema_Streams_init_default, 0}
+#define schema_Container_init_default {0, {{NULL}, NULL}, {{NULL}, NULL}, 0, {{NULL}, NULL}, {{NULL}, NULL}, {{NULL}, NULL}, {{NULL}, NULL}, {{NULL}, NULL}}
+#define schema_ExecuteEvent_init_default {schema_Process_init_default}
+#define schema_CloneEvent_init_default {schema_Process_init_default}
+#define schema_EnumerateProcessEvent_init_default {schema_Process_init_default}
+#define schema_MemoryExecEvent_init_default {schema_Process_init_default, 0, 0, 0, 0, 0, schema_File_init_default, _schema_MemoryExecEvent_Action_MIN, 0, 0, 0}
+#define schema_ContainerInfoEvent_init_default {schema_Container_init_default}
+#define schema_ExitEvent_init_default {{{NULL}, NULL}}
+#define schema_Event_init_default {0, {schema_ExecuteEvent_init_default}, 0}
+#define schema_ContainerReport_init_default {0, schema_Container_init_default}
+#define schema_SocketIp_init_zero {0, {{NULL}, NULL}, 0}
+#define schema_Socket_init_zero {schema_SocketIp_init_zero, schema_SocketIp_init_zero}
+#define schema_Overlay_init_zero {0, 0, {{NULL}, NULL}}
+#define schema_File_init_zero {{{NULL}, NULL}, 0, {schema_Overlay_init_zero}, 0}
+#define schema_ProcessArguments_init_zero {{{NULL}, NULL}, 0, {{NULL}, NULL}, 0}
+#define schema_Descriptor_init_zero {0, schema_File_init_zero}
+#define schema_Streams_init_zero {schema_Descriptor_init_zero, schema_Descriptor_init_zero, schema_Descriptor_init_zero}
+#define schema_Process_init_zero {0, {{NULL}, NULL}, 0, schema_File_init_zero, 0, {{NULL}, NULL}, 0, 0, 0, schema_ProcessArguments_init_zero, schema_Streams_init_zero, 0}
+#define schema_Container_init_zero {0, {{NULL}, NULL}, {{NULL}, NULL}, 0, {{NULL}, NULL}, {{NULL}, NULL}, {{NULL}, NULL}, {{NULL}, NULL}, {{NULL}, NULL}}
+#define schema_ExecuteEvent_init_zero {schema_Process_init_zero}
+#define schema_CloneEvent_init_zero {schema_Process_init_zero}
+#define schema_EnumerateProcessEvent_init_zero {schema_Process_init_zero}
+#define schema_MemoryExecEvent_init_zero {schema_Process_init_zero, 0, 0, 0, 0, 0, schema_File_init_zero, _schema_MemoryExecEvent_Action_MIN, 0, 0, 0}
+#define schema_ContainerInfoEvent_init_zero {schema_Container_init_zero}
+#define schema_ExitEvent_init_zero {{{NULL}, NULL}}
+#define schema_Event_init_zero {0, {schema_ExecuteEvent_init_zero}, 0}
+#define schema_ContainerReport_init_zero {0, schema_Container_init_zero}
+
+/* Field tags (for use in manual encoding/decoding) */
+#define schema_ExitEvent_process_uuid_tag 1
+#define schema_Container_creation_timestamp_tag 1
+#define schema_Container_pod_namespace_tag 2
+#define schema_Container_pod_name_tag 3
+#define schema_Container_container_id_tag 4
+#define schema_Container_container_name_tag 5
+#define schema_Container_container_image_uri_tag 6
+#define schema_Container_labels_tag 7
+#define schema_Container_init_uuid_tag 8
+#define schema_Container_container_image_id_tag 9
+#define schema_Overlay_lower_layer_tag 1
+#define schema_Overlay_upper_layer_tag 2
+#define schema_Overlay_modified_uuid_tag 3
+#define schema_ProcessArguments_argv_tag 1
+#define schema_ProcessArguments_argv_truncated_tag 2
+#define schema_ProcessArguments_envp_tag 3
+#define schema_ProcessArguments_envp_truncated_tag 4
+#define schema_SocketIp_family_tag 1
+#define schema_SocketIp_ip_tag 2
+#define schema_SocketIp_port_tag 3
+#define schema_ContainerInfoEvent_container_tag 1
+#define schema_ContainerReport_pid_tag 1
+#define schema_ContainerReport_container_tag 2
+#define schema_Socket_local_tag 1
+#define schema_Socket_remote_tag 2
+#define schema_File_overlayfs_tag 2
+#define schema_File_socket_tag 4
+#define schema_File_fullpath_tag 1
+#define schema_File_ino_tag 3
+#define schema_Descriptor_mode_tag 1
+#define schema_Descriptor_file_tag 2
+#define schema_Streams_stdin_tag 1
+#define schema_Streams_stdout_tag 2
+#define schema_Streams_stderr_tag 3
+#define schema_Process_creation_timestamp_tag 1
+#define schema_Process_uuid_tag 2
+#define schema_Process_pid_tag 3
+#define schema_Process_binary_tag 4
+#define schema_Process_parent_pid_tag 5
+#define schema_Process_parent_uuid_tag 6
+#define schema_Process_container_id_tag 7
+#define schema_Process_container_pid_tag 8
+#define schema_Process_container_parent_pid_tag 9
+#define schema_Process_args_tag 10
+#define schema_Process_streams_tag 11
+#define schema_Process_exec_session_id_tag 12
+#define schema_CloneEvent_proc_tag 1
+#define schema_EnumerateProcessEvent_proc_tag 1
+#define schema_ExecuteEvent_proc_tag 1
+#define schema_MemoryExecEvent_proc_tag 1
+#define schema_MemoryExecEvent_prot_exec_timestamp_tag 2
+#define schema_MemoryExecEvent_new_flags_tag 3
+#define schema_MemoryExecEvent_req_flags_tag 4
+#define schema_MemoryExecEvent_old_vm_flags_tag 5
+#define schema_MemoryExecEvent_mmap_flags_tag 6
+#define schema_MemoryExecEvent_mapped_file_tag 7
+#define schema_MemoryExecEvent_action_tag 8
+#define schema_MemoryExecEvent_start_addr_tag 9
+#define schema_MemoryExecEvent_end_addr_tag 10
+#define schema_MemoryExecEvent_is_initial_mmap_tag 11
+#define schema_Event_execute_tag 1
+#define schema_Event_container_tag 2
+#define schema_Event_exit_tag 3
+#define schema_Event_memexec_tag 4
+#define schema_Event_clone_tag 5
+#define schema_Event_enumproc_tag 7
+#define schema_Event_timestamp_tag 6
+
+/* Struct field encoding specification for nanopb */
+extern const pb_field_t schema_SocketIp_fields[4];
+extern const pb_field_t schema_Socket_fields[3];
+extern const pb_field_t schema_Overlay_fields[4];
+extern const pb_field_t schema_File_fields[5];
+extern const pb_field_t schema_ProcessArguments_fields[5];
+extern const pb_field_t schema_Descriptor_fields[3];
+extern const pb_field_t schema_Streams_fields[4];
+extern const pb_field_t schema_Process_fields[13];
+extern const pb_field_t schema_Container_fields[10];
+extern const pb_field_t schema_ExecuteEvent_fields[2];
+extern const pb_field_t schema_CloneEvent_fields[2];
+extern const pb_field_t schema_EnumerateProcessEvent_fields[2];
+extern const pb_field_t schema_MemoryExecEvent_fields[12];
+extern const pb_field_t schema_ContainerInfoEvent_fields[2];
+extern const pb_field_t schema_ExitEvent_fields[2];
+extern const pb_field_t schema_Event_fields[8];
+extern const pb_field_t schema_ContainerReport_fields[3];
+
+/* Maximum encoded size of messages (where known) */
+/* schema_SocketIp_size depends on runtime parameters */
+#define schema_Socket_size (12 + schema_SocketIp_size + schema_SocketIp_size)
+/* schema_Overlay_size depends on runtime parameters */
+/* schema_File_size depends on runtime parameters */
+/* schema_ProcessArguments_size depends on runtime parameters */
+#define schema_Descriptor_size (12 + schema_File_size)
+#define schema_Streams_size (54 + schema_File_size + schema_File_size + schema_File_size)
+/* schema_Process_size depends on runtime parameters */
+/* schema_Container_size depends on runtime parameters */
+#define schema_ExecuteEvent_size (6 + schema_Process_size)
+#define schema_CloneEvent_size (6 + schema_Process_size)
+#define schema_EnumerateProcessEvent_size (6 + schema_Process_size)
+#define schema_MemoryExecEvent_size (93 + schema_Process_size + schema_File_size)
+#define schema_ContainerInfoEvent_size (6 + schema_Container_size)
+/* schema_ExitEvent_size depends on runtime parameters */
+#define schema_Event_size (11 + ((((((schema_CloneEvent_size > schema_EnumerateProcessEvent_size ? schema_CloneEvent_size : schema_EnumerateProcessEvent_size) > schema_ExecuteEvent_size ? (schema_CloneEvent_size > schema_EnumerateProcessEvent_size ? schema_CloneEvent_size : schema_EnumerateProcessEvent_size) : schema_ExecuteEvent_size) > schema_ExitEvent_size ? ((schema_CloneEvent_size > schema_EnumerateProcessEvent_size ? schema_CloneEvent_size : schema_EnumerateProcessEvent_size) > schema_ExecuteEvent_size ? (schema_CloneEvent_size > schema_EnumerateProcessEvent_size ? schema_CloneEvent_size : schema_EnumerateProcessEvent_size) : schema_ExecuteEvent_size) : schema_ExitEvent_size) > schema_ContainerInfoEvent_size ? (((schema_CloneEvent_size > schema_EnumerateProcessEvent_size ? schema_CloneEvent_size : schema_EnumerateProcessEvent_size) > schema_ExecuteEvent_size ? (schema_CloneEvent_size > schema_EnumerateProcessEvent_size ? schema_CloneEvent_size : schema_EnumerateProcessEvent_size) : schema_ExecuteEvent_size) > schema_ExitEvent_size ? ((schema_CloneEvent_size > schema_EnumerateProcessEvent_size ? schema_CloneEvent_size : schema_EnumerateProcessEvent_size) > schema_ExecuteEvent_size ? (schema_CloneEvent_size > schema_EnumerateProcessEvent_size ? schema_CloneEvent_size : schema_EnumerateProcessEvent_size) : schema_ExecuteEvent_size) : schema_ExitEvent_size) : schema_ContainerInfoEvent_size) > schema_MemoryExecEvent_size ? ((((schema_CloneEvent_size > schema_EnumerateProcessEvent_size ? schema_CloneEvent_size : schema_EnumerateProcessEvent_size) > schema_ExecuteEvent_size ? (schema_CloneEvent_size > schema_EnumerateProcessEvent_size ? schema_CloneEvent_size : schema_EnumerateProcessEvent_size) : schema_ExecuteEvent_size) > schema_ExitEvent_size ? ((schema_CloneEvent_size > schema_EnumerateProcessEvent_size ? schema_CloneEvent_size : schema_EnumerateProcessEvent_size) > schema_ExecuteEvent_size ? (schema_CloneEvent_size > schema_EnumerateProcessEvent_size ? schema_CloneEvent_size : schema_EnumerateProcessEvent_size) : schema_ExecuteEvent_size) : schema_ExitEvent_size) > schema_ContainerInfoEvent_size ? (((schema_CloneEvent_size > schema_EnumerateProcessEvent_size ? schema_CloneEvent_size : schema_EnumerateProcessEvent_size) > schema_ExecuteEvent_size ? (schema_CloneEvent_size > schema_EnumerateProcessEvent_size ? schema_CloneEvent_size : schema_EnumerateProcessEvent_size) : schema_ExecuteEvent_size) > schema_ExitEvent_size ? ((schema_CloneEvent_size > schema_EnumerateProcessEvent_size ? schema_CloneEvent_size : schema_EnumerateProcessEvent_size) > schema_ExecuteEvent_size ? (schema_CloneEvent_size > schema_EnumerateProcessEvent_size ? schema_CloneEvent_size : schema_EnumerateProcessEvent_size) : schema_ExecuteEvent_size) : schema_ExitEvent_size) : schema_ContainerInfoEvent_size) : schema_MemoryExecEvent_size) > 0 ? (((((schema_CloneEvent_size > schema_EnumerateProcessEvent_size ? schema_CloneEvent_size : schema_EnumerateProcessEvent_size) > schema_ExecuteEvent_size ? (schema_CloneEvent_size > schema_EnumerateProcessEvent_size ? schema_CloneEvent_size : schema_EnumerateProcessEvent_size) : schema_ExecuteEvent_size) > schema_ExitEvent_size ? ((schema_CloneEvent_size > schema_EnumerateProcessEvent_size ? schema_CloneEvent_size : schema_EnumerateProcessEvent_size) > schema_ExecuteEvent_size ? (schema_CloneEvent_size > schema_EnumerateProcessEvent_size ? schema_CloneEvent_size : schema_EnumerateProcessEvent_size) : schema_ExecuteEvent_size) : schema_ExitEvent_size) > schema_ContainerInfoEvent_size ? (((schema_CloneEvent_size > schema_EnumerateProcessEvent_size ? schema_CloneEvent_size : schema_EnumerateProcessEvent_size) > schema_ExecuteEvent_size ? (schema_CloneEvent_size > schema_EnumerateProcessEvent_size ? schema_CloneEvent_size : schema_EnumerateProcessEvent_size) : schema_ExecuteEvent_size) > schema_ExitEvent_size ? ((schema_CloneEvent_size > schema_EnumerateProcessEvent_size ? schema_CloneEvent_size : schema_EnumerateProcessEvent_size) > schema_ExecuteEvent_size ? (schema_CloneEvent_size > schema_EnumerateProcessEvent_size ? schema_CloneEvent_size : schema_EnumerateProcessEvent_size) : schema_ExecuteEvent_size) : schema_ExitEvent_size) : schema_ContainerInfoEvent_size) > schema_MemoryExecEvent_size ? ((((schema_CloneEvent_size > schema_EnumerateProcessEvent_size ? schema_CloneEvent_size : schema_EnumerateProcessEvent_size) > schema_ExecuteEvent_size ? (schema_CloneEvent_size > schema_EnumerateProcessEvent_size ? schema_CloneEvent_size : schema_EnumerateProcessEvent_size) : schema_ExecuteEvent_size) > schema_ExitEvent_size ? ((schema_CloneEvent_size > schema_EnumerateProcessEvent_size ? schema_CloneEvent_size : schema_EnumerateProcessEvent_size) > schema_ExecuteEvent_size ? (schema_CloneEvent_size > schema_EnumerateProcessEvent_size ? schema_CloneEvent_size : schema_EnumerateProcessEvent_size) : schema_ExecuteEvent_size) : schema_ExitEvent_size) > schema_ContainerInfoEvent_size ? (((schema_CloneEvent_size > schema_EnumerateProcessEvent_size ? schema_CloneEvent_size : schema_EnumerateProcessEvent_size) > schema_ExecuteEvent_size ? (schema_CloneEvent_size > schema_EnumerateProcessEvent_size ? schema_CloneEvent_size : schema_EnumerateProcessEvent_size) : schema_ExecuteEvent_size) > schema_ExitEvent_size ? ((schema_CloneEvent_size > schema_EnumerateProcessEvent_size ? schema_CloneEvent_size : schema_EnumerateProcessEvent_size) > schema_ExecuteEvent_size ? (schema_CloneEvent_size > schema_EnumerateProcessEvent_size ? schema_CloneEvent_size : schema_EnumerateProcessEvent_size) : schema_ExecuteEvent_size) : schema_ExitEvent_size) : schema_ContainerInfoEvent_size) : schema_MemoryExecEvent_size) : 0))
+#define schema_ContainerReport_size (12 + schema_Container_size)
+
+/* Message IDs (where set with "msgid" option) */
+#ifdef PB_MSGID
+
+#define EVENT_MESSAGES \
+
+
+#endif
+
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
+/* @@protoc_insertion_point(eof) */
+
+#endif
diff --git a/security/container/protos/event.proto b/security/container/protos/event.proto
new file mode 100644
index 0000000..dfe483f
--- /dev/null
+++ b/security/container/protos/event.proto
@@ -0,0 +1,151 @@
+syntax = "proto3";
+
+package schema;
+
+message SocketIp {
+ uint32 family = 1; // AF_* for socket type.
+ bytes ip = 2; // ip4 or ip6 address.
+ uint32 port = 3; // port bind or connected.
+}
+
+message Socket {
+ SocketIp local = 1;
+ SocketIp remote = 2; // unset if not connected.
+}
+
+message Overlay {
+ bool lower_layer = 1;
+ bool upper_layer = 2;
+ bytes modified_uuid = 3; // The process that first modified the file.
+}
+
+message File {
+ bytes fullpath = 1;
+ uint32 ino = 3; // inode number.
+ oneof filesystem {
+ Overlay overlayfs = 2;
+ Socket socket = 4;
+ }
+}
+
+message ProcessArguments {
+ repeated bytes argv = 1; // process arguments
+ uint32 argv_truncated = 2; // number of characters truncated from argv
+ repeated bytes envp = 3; // process environment variables
+ uint32 envp_truncated = 4; // number of characters truncated from envp
+}
+
+message Descriptor {
+ uint32 mode = 1; // file mode (stat st_mode)
+ File file = 2;
+}
+
+message Streams {
+ Descriptor stdin = 1;
+ Descriptor stdout = 2;
+ Descriptor stderr = 3;
+}
+
+message Process {
+ uint64 creation_timestamp = 1; // Only populated in ExecuteEvent, in ns.
+ bytes uuid = 2;
+ uint32 pid = 3;
+ File binary = 4; // Only populated in ExecuteEvent.
+ uint32 parent_pid = 5;
+ bytes parent_uuid = 6;
+ uint64 container_id = 7; // unique id of process's container
+ uint32 container_pid = 8; // pid inside the container's pid namespace
+ uint32 container_parent_pid = 9; // optional
+ ProcessArguments args = 10; // Only populated in ExecuteEvent.
+ Streams streams = 11; // Only populated in ExecuteEvent.
+ uint64 exec_session_id = 12; // identifier set for kubectl exec sessions.
+}
+
+message Container {
+ uint64 creation_timestamp = 1; // container create time in ns
+ bytes pod_namespace = 2;
+ bytes pod_name = 3;
+ uint64 container_id = 4; // unique across lifetime of Node
+ bytes container_name = 5;
+ bytes container_image_uri = 6;
+ repeated bytes labels = 7;
+ bytes init_uuid = 8;
+ bytes container_image_id = 9;
+}
+
+// A binary being executed.
+// e.g., execve()
+message ExecuteEvent {
+ Process proc = 1;
+}
+
+// A process clone is being created. This message means that a cloning operation
+// is being attempted. It may be sent even if fork fails.
+message CloneEvent {
+ Process proc = 1;
+}
+
+// Processes that are enumerated at startup will be sent with this event. There
+// is no distinction from events we would have seen from fork or exec.
+message EnumerateProcessEvent {
+ Process proc = 1;
+}
+
+// Collect information about mmap/mprotect calls with the PROT_EXEC flag set.
+message MemoryExecEvent {
+ Process proc = 1; // The origin process
+ // The timestamp in ns when the memory was set executable
+ uint64 prot_exec_timestamp = 2;
+ // The prot flags granted by the kernel for the operation
+ uint64 new_flags = 3;
+ // The prot flags requested for the mprotect/mmap operation
+ uint64 req_flags = 4;
+ // The vm_flags prior to the mprotect operation, if relevant
+ uint64 old_vm_flags = 5;
+ // The operational flags for the mmap operation, if relevant
+ uint64 mmap_flags = 6;
+ // Derived from the file struct describing the fd being mapped
+ File mapped_file = 7;
+ enum Action {
+ UNDEFINED = 0;
+ MPROTECT = 1;
+ MMAP_FILE = 2;
+ }
+ Action action = 8;
+
+ uint64 start_addr = 9; // The executable memory region start addr
+ uint64 end_addr = 10; // The executable memory region end addr
+ // True if this event is a mmap of the process' binary
+ bool is_initial_mmap = 11;
+}
+
+// Associate the following container information with all processes
+// that have the indicated container_id.
+message ContainerInfoEvent {
+ Container container = 1;
+}
+
+// The process with the indicated pid has exited.
+message ExitEvent {
+ bytes process_uuid = 1;
+}
+
+// Next ID: 8
+message Event {
+ oneof event {
+ ExecuteEvent execute = 1;
+ ContainerInfoEvent container = 2;
+ ExitEvent exit = 3;
+ MemoryExecEvent memexec = 4;
+ CloneEvent clone = 5;
+ EnumerateProcessEvent enumproc = 7;
+ }
+
+ uint64 timestamp = 6; // In nanoseconds
+}
+
+// Message sent by the daemonset to the LSM for container enlightenment.
+message ContainerReport {
+ uint32 pid = 1; // Top pid of the running container.
+ Container container = 2; // Information collected about the container.
+}
diff --git a/security/container/protos/nanopb/LICENSE b/security/container/protos/nanopb/LICENSE
new file mode 100644
index 0000000..a83630a
--- /dev/null
+++ b/security/container/protos/nanopb/LICENSE
@@ -0,0 +1,20 @@
+Copyright (c) 2011 Petteri Aimonen <jpa at nanopb.mail.kapsi.fi>
+
+This software is provided 'as-is', without any express or
+implied warranty. In no event will the authors be held liable
+for any damages arising from the use of this software.
+
+Permission is granted to anyone to use this software for any
+purpose, including commercial applications, and to alter it and
+redistribute it freely, subject to the following restrictions:
+
+1. The origin of this software must not be misrepresented; you
+ must not claim that you wrote the original software. If you use
+ this software in a product, an acknowledgment in the product
+ documentation would be appreciated but is not required.
+
+2. Altered source versions must be plainly marked as such, and
+ must not be misrepresented as being the original software.
+
+3. This notice may not be removed or altered from any source
+ distribution.
diff --git a/security/container/protos/nanopb/Makefile b/security/container/protos/nanopb/Makefile
new file mode 100644
index 0000000..b7e15f8
--- /dev/null
+++ b/security/container/protos/nanopb/Makefile
@@ -0,0 +1,7 @@
+obj-$(CONFIG_SECURITY_CONTAINER_MONITOR) += nanopb.o
+
+nanopb-y := pb_encode.o pb_decode.o pb_common.o
+
+ccflags-y := -I$(srctree)/security/container/protos \
+ -I$(srctree)/security/container/protos/nanopb \
+ $(PB_CCFLAGS)
diff --git a/security/container/protos/nanopb/pb.h b/security/container/protos/nanopb/pb.h
new file mode 100644
index 0000000..174a84b
--- /dev/null
+++ b/security/container/protos/nanopb/pb.h
@@ -0,0 +1,593 @@
+/* Common parts of the nanopb library. Most of these are quite low-level
+ * stuff. For the high-level interface, see pb_encode.h and pb_decode.h.
+ */
+
+#ifndef PB_H_INCLUDED
+#define PB_H_INCLUDED
+
+/*****************************************************************
+ * Nanopb compilation time options. You can change these here by *
+ * uncommenting the lines, or on the compiler command line. *
+ *****************************************************************/
+
+/* Enable support for dynamically allocated fields */
+/* #define PB_ENABLE_MALLOC 1 */
+
+/* Define this if your CPU / compiler combination does not support
+ * unaligned memory access to packed structures. */
+/* #define PB_NO_PACKED_STRUCTS 1 */
+
+/* Increase the number of required fields that are tracked.
+ * A compiler warning will tell if you need this. */
+/* #define PB_MAX_REQUIRED_FIELDS 256 */
+
+/* Add support for tag numbers > 255 and fields larger than 255 bytes. */
+/* #define PB_FIELD_16BIT 1 */
+
+/* Add support for tag numbers > 65536 and fields larger than 65536 bytes. */
+/* #define PB_FIELD_32BIT 1 */
+
+/* Disable support for error messages in order to save some code space. */
+/* #define PB_NO_ERRMSG 1 */
+
+/* Disable support for custom streams (support only memory buffers). */
+/* #define PB_BUFFER_ONLY 1 */
+
+/* Switch back to the old-style callback function signature.
+ * This was the default until nanopb-0.2.1. */
+/* #define PB_OLD_CALLBACK_STYLE */
+
+
+/******************************************************************
+ * You usually don't need to change anything below this line. *
+ * Feel free to look around and use the defined macros, though. *
+ ******************************************************************/
+
+
+/* Version of the nanopb library. Just in case you want to check it in
+ * your own program. */
+#define NANOPB_VERSION nanopb-0.3.9.1
+
+/* Include all the system headers needed by nanopb. You will need the
+ * definitions of the following:
+ * - strlen, memcpy, memset functions
+ * - [u]int_least8_t, uint_fast8_t, [u]int_least16_t, [u]int32_t, [u]int64_t
+ * - size_t
+ * - bool
+ *
+ * If you don't have the standard header files, you can instead provide
+ * a custom header that defines or includes all this. In that case,
+ * define PB_SYSTEM_HEADER to the path of this file.
+ */
+#ifdef PB_SYSTEM_HEADER
+#include PB_SYSTEM_HEADER
+#else
+#include <stdint.h>
+#include <stddef.h>
+#include <stdbool.h>
+#include <string.h>
+
+#ifdef PB_ENABLE_MALLOC
+#include <stdlib.h>
+#endif
+#endif
+
+/* Macro for defining packed structures (compiler dependent).
+ * This just reduces memory requirements, but is not required.
+ */
+#if defined(PB_NO_PACKED_STRUCTS)
+ /* Disable struct packing */
+# define PB_PACKED_STRUCT_START
+# define PB_PACKED_STRUCT_END
+# define pb_packed
+#elif defined(__GNUC__) || defined(__clang__)
+ /* For GCC and clang */
+# define PB_PACKED_STRUCT_START
+# define PB_PACKED_STRUCT_END
+# define pb_packed __attribute__((packed))
+#elif defined(__ICCARM__) || defined(__CC_ARM)
+ /* For IAR ARM and Keil MDK-ARM compilers */
+# define PB_PACKED_STRUCT_START _Pragma("pack(push, 1)")
+# define PB_PACKED_STRUCT_END _Pragma("pack(pop)")
+# define pb_packed
+#elif defined(_MSC_VER) && (_MSC_VER >= 1500)
+ /* For Microsoft Visual C++ */
+# define PB_PACKED_STRUCT_START __pragma(pack(push, 1))
+# define PB_PACKED_STRUCT_END __pragma(pack(pop))
+# define pb_packed
+#else
+ /* Unknown compiler */
+# define PB_PACKED_STRUCT_START
+# define PB_PACKED_STRUCT_END
+# define pb_packed
+#endif
+
+/* Handy macro for suppressing unreferenced-parameter compiler warnings. */
+#ifndef PB_UNUSED
+#define PB_UNUSED(x) (void)(x)
+#endif
+
+/* Compile-time assertion, used for checking compatible compilation options.
+ * If this does not work properly on your compiler, use
+ * #define PB_NO_STATIC_ASSERT to disable it.
+ *
+ * But before doing that, check carefully the error message / place where it
+ * comes from to see if the error has a real cause. Unfortunately the error
+ * message is not always very clear to read, but you can see the reason better
+ * in the place where the PB_STATIC_ASSERT macro was called.
+ */
+#ifndef PB_NO_STATIC_ASSERT
+#ifndef PB_STATIC_ASSERT
+#define PB_STATIC_ASSERT(COND,MSG) typedef char PB_STATIC_ASSERT_MSG(MSG, __LINE__, __COUNTER__)[(COND)?1:-1];
+#define PB_STATIC_ASSERT_MSG(MSG, LINE, COUNTER) PB_STATIC_ASSERT_MSG_(MSG, LINE, COUNTER)
+#define PB_STATIC_ASSERT_MSG_(MSG, LINE, COUNTER) pb_static_assertion_##MSG##LINE##COUNTER
+#endif
+#else
+#define PB_STATIC_ASSERT(COND,MSG)
+#endif
+
+/* Number of required fields to keep track of. */
+#ifndef PB_MAX_REQUIRED_FIELDS
+#define PB_MAX_REQUIRED_FIELDS 64
+#endif
+
+#if PB_MAX_REQUIRED_FIELDS < 64
+#error You should not lower PB_MAX_REQUIRED_FIELDS from the default value (64).
+#endif
+
+/* List of possible field types. These are used in the autogenerated code.
+ * Least-significant 4 bits tell the scalar type
+ * Most-significant 4 bits specify repeated/required/packed etc.
+ */
+
+typedef uint_least8_t pb_type_t;
+
+/**** Field data types ****/
+
+/* Numeric types */
+#define PB_LTYPE_VARINT 0x00 /* int32, int64, enum, bool */
+#define PB_LTYPE_UVARINT 0x01 /* uint32, uint64 */
+#define PB_LTYPE_SVARINT 0x02 /* sint32, sint64 */
+#define PB_LTYPE_FIXED32 0x03 /* fixed32, sfixed32, float */
+#define PB_LTYPE_FIXED64 0x04 /* fixed64, sfixed64, double */
+
+/* Marker for last packable field type. */
+#define PB_LTYPE_LAST_PACKABLE 0x04
+
+/* Byte array with pre-allocated buffer.
+ * data_size is the length of the allocated PB_BYTES_ARRAY structure. */
+#define PB_LTYPE_BYTES 0x05
+
+/* String with pre-allocated buffer.
+ * data_size is the maximum length. */
+#define PB_LTYPE_STRING 0x06
+
+/* Submessage
+ * submsg_fields is pointer to field descriptions */
+#define PB_LTYPE_SUBMESSAGE 0x07
+
+/* Extension pseudo-field
+ * The field contains a pointer to pb_extension_t */
+#define PB_LTYPE_EXTENSION 0x08
+
+/* Byte array with inline, pre-allocated buffer.
+ * data_size is the length of the inline, allocated buffer.
+ * This differs from PB_LTYPE_BYTES by defining the element as
+ * pb_byte_t[data_size] rather than pb_bytes_array_t. */
+#define PB_LTYPE_FIXED_LENGTH_BYTES 0x09
+
+/* Number of declared LTYPES */
+#define PB_LTYPES_COUNT 0x0A
+#define PB_LTYPE_MASK 0x0F
+
+/**** Field repetition rules ****/
+
+#define PB_HTYPE_REQUIRED 0x00
+#define PB_HTYPE_OPTIONAL 0x10
+#define PB_HTYPE_REPEATED 0x20
+#define PB_HTYPE_ONEOF 0x30
+#define PB_HTYPE_MASK 0x30
+
+/**** Field allocation types ****/
+
+#define PB_ATYPE_STATIC 0x00
+#define PB_ATYPE_POINTER 0x80
+#define PB_ATYPE_CALLBACK 0x40
+#define PB_ATYPE_MASK 0xC0
+
+#define PB_ATYPE(x) ((x) & PB_ATYPE_MASK)
+#define PB_HTYPE(x) ((x) & PB_HTYPE_MASK)
+#define PB_LTYPE(x) ((x) & PB_LTYPE_MASK)
+
+/* Data type used for storing sizes of struct fields
+ * and array counts.
+ */
+#if defined(PB_FIELD_32BIT)
+ typedef uint32_t pb_size_t;
+ typedef int32_t pb_ssize_t;
+#elif defined(PB_FIELD_16BIT)
+ typedef uint_least16_t pb_size_t;
+ typedef int_least16_t pb_ssize_t;
+#else
+ typedef uint_least8_t pb_size_t;
+ typedef int_least8_t pb_ssize_t;
+#endif
+#define PB_SIZE_MAX ((pb_size_t)-1)
+
+/* Data type for storing encoded data and other byte streams.
+ * This typedef exists to support platforms where uint8_t does not exist.
+ * You can regard it as equivalent to uint8_t on other platforms.
+ */
+typedef uint_least8_t pb_byte_t;
+
+/* This structure is used in auto-generated constants
+ * to specify struct fields.
+ * You can change field sizes if you need structures
+ * larger than 256 bytes or field tags larger than 256.
+ * The compiler should complain if your .proto has such
+ * structures. Fix that by defining PB_FIELD_16BIT or
+ * PB_FIELD_32BIT.
+ */
+PB_PACKED_STRUCT_START
+typedef struct pb_field_s pb_field_t;
+struct pb_field_s {
+ pb_size_t tag;
+ pb_type_t type;
+ pb_size_t data_offset; /* Offset of field data, relative to previous field. */
+ pb_ssize_t size_offset; /* Offset of array size or has-boolean, relative to data */
+ pb_size_t data_size; /* Data size in bytes for a single item */
+ pb_size_t array_size; /* Maximum number of entries in array */
+
+ /* Field definitions for submessage
+ * OR default value for all other non-array, non-callback types
+ * If null, then the field will be zeroed. */
+ const void *ptr;
+} pb_packed;
+PB_PACKED_STRUCT_END
+
+/* Make sure that the standard integer types are of the expected sizes.
+ * Otherwise fixed32/fixed64 fields can break.
+ *
+ * If you get errors here, it probably means that your stdint.h is not
+ * correct for your platform.
+ */
+#ifndef PB_WITHOUT_64BIT
+PB_STATIC_ASSERT(sizeof(int64_t) == 2 * sizeof(int32_t), INT64_T_WRONG_SIZE)
+PB_STATIC_ASSERT(sizeof(uint64_t) == 2 * sizeof(uint32_t), UINT64_T_WRONG_SIZE)
+#endif
+
+/* This structure is used for 'bytes' arrays.
+ * It has the number of bytes in the beginning, and after that an array.
+ * Note that actual structs used will have a different length of bytes array.
+ */
+#define PB_BYTES_ARRAY_T(n) struct { pb_size_t size; pb_byte_t bytes[n]; }
+#define PB_BYTES_ARRAY_T_ALLOCSIZE(n) ((size_t)n + offsetof(pb_bytes_array_t, bytes))
+
+struct pb_bytes_array_s {
+ pb_size_t size;
+ pb_byte_t bytes[1];
+};
+typedef struct pb_bytes_array_s pb_bytes_array_t;
+
+/* This structure is used for giving the callback function.
+ * It is stored in the message structure and filled in by the method that
+ * calls pb_decode.
+ *
+ * The decoding callback will be given a limited-length stream.
+ * If the wire type was string, the length is the length of the string.
+ * If the wire type was a varint/fixed32/fixed64, the length is the length
+ * of the actual value.
+ * The function may be called multiple times (especially for repeated types,
+ * but also otherwise if the message happens to contain the field multiple
+ * times.)
+ *
+ * The encoding callback will receive the actual output stream.
+ * It should write all the data in one call, including the field tag and
+ * wire type. It can write multiple fields.
+ *
+ * The callback can be null if you want to skip a field.
+ */
+typedef struct pb_istream_s pb_istream_t;
+typedef struct pb_ostream_s pb_ostream_t;
+typedef struct pb_callback_s pb_callback_t;
+struct pb_callback_s {
+#ifdef PB_OLD_CALLBACK_STYLE
+ /* Deprecated since nanopb-0.2.1 */
+ union {
+ bool (*decode)(pb_istream_t *stream, const pb_field_t *field, void *arg);
+ bool (*encode)(pb_ostream_t *stream, const pb_field_t *field, const void *arg);
+ } funcs;
+#else
+ /* New function signature, which allows modifying arg contents in callback. */
+ union {
+ bool (*decode)(pb_istream_t *stream, const pb_field_t *field, void **arg);
+ bool (*encode)(pb_ostream_t *stream, const pb_field_t *field, void * const *arg);
+ } funcs;
+#endif
+
+ /* Free arg for use by callback */
+ void *arg;
+};
+
+/* Wire types. Library user needs these only in encoder callbacks. */
+typedef enum {
+ PB_WT_VARINT = 0,
+ PB_WT_64BIT = 1,
+ PB_WT_STRING = 2,
+ PB_WT_32BIT = 5
+} pb_wire_type_t;
+
+/* Structure for defining the handling of unknown/extension fields.
+ * Usually the pb_extension_type_t structure is automatically generated,
+ * while the pb_extension_t structure is created by the user. However,
+ * if you want to catch all unknown fields, you can also create a custom
+ * pb_extension_type_t with your own callback.
+ */
+typedef struct pb_extension_type_s pb_extension_type_t;
+typedef struct pb_extension_s pb_extension_t;
+struct pb_extension_type_s {
+ /* Called for each unknown field in the message.
+ * If you handle the field, read off all of its data and return true.
+ * If you do not handle the field, do not read anything and return true.
+ * If you run into an error, return false.
+ * Set to NULL for default handler.
+ */
+ bool (*decode)(pb_istream_t *stream, pb_extension_t *extension,
+ uint32_t tag, pb_wire_type_t wire_type);
+
+ /* Called once after all regular fields have been encoded.
+ * If you have something to write, do so and return true.
+ * If you do not have anything to write, just return true.
+ * If you run into an error, return false.
+ * Set to NULL for default handler.
+ */
+ bool (*encode)(pb_ostream_t *stream, const pb_extension_t *extension);
+
+ /* Free field for use by the callback. */
+ const void *arg;
+};
+
+struct pb_extension_s {
+ /* Type describing the extension field. Usually you'll initialize
+ * this to a pointer to the automatically generated structure. */
+ const pb_extension_type_t *type;
+
+ /* Destination for the decoded data. This must match the datatype
+ * of the extension field. */
+ void *dest;
+
+ /* Pointer to the next extension handler, or NULL.
+ * If this extension does not match a field, the next handler is
+ * automatically called. */
+ pb_extension_t *next;
+
+ /* The decoder sets this to true if the extension was found.
+ * Ignored for encoding. */
+ bool found;
+};
+
+/* Memory allocation functions to use. You can define pb_realloc and
+ * pb_free to custom functions if you want. */
+#ifdef PB_ENABLE_MALLOC
+# ifndef pb_realloc
+# define pb_realloc(ptr, size) realloc(ptr, size)
+# endif
+# ifndef pb_free
+# define pb_free(ptr) free(ptr)
+# endif
+#endif
+
+/* This is used to inform about need to regenerate .pb.h/.pb.c files. */
+#define PB_PROTO_HEADER_VERSION 30
+
+/* These macros are used to declare pb_field_t's in the constant array. */
+/* Size of a structure member, in bytes. */
+#define pb_membersize(st, m) (sizeof ((st*)0)->m)
+/* Number of entries in an array. */
+#define pb_arraysize(st, m) (pb_membersize(st, m) / pb_membersize(st, m[0]))
+/* Delta from start of one member to the start of another member. */
+#define pb_delta(st, m1, m2) ((int)offsetof(st, m1) - (int)offsetof(st, m2))
+/* Marks the end of the field list */
+#define PB_LAST_FIELD {0,(pb_type_t) 0,0,0,0,0,0}
+
+/* Macros for filling in the data_offset field */
+/* data_offset for first field in a message */
+#define PB_DATAOFFSET_FIRST(st, m1, m2) (offsetof(st, m1))
+/* data_offset for subsequent fields */
+#define PB_DATAOFFSET_OTHER(st, m1, m2) (offsetof(st, m1) - offsetof(st, m2) - pb_membersize(st, m2))
+/* data offset for subsequent fields inside a union (oneof) */
+#define PB_DATAOFFSET_UNION(st, m1, m2) (PB_SIZE_MAX)
+/* Choose first/other based on m1 == m2 (deprecated, remains for backwards compatibility) */
+#define PB_DATAOFFSET_CHOOSE(st, m1, m2) (int)(offsetof(st, m1) == offsetof(st, m2) \
+ ? PB_DATAOFFSET_FIRST(st, m1, m2) \
+ : PB_DATAOFFSET_OTHER(st, m1, m2))
+
+/* Required fields are the simplest. They just have delta (padding) from
+ * previous field end, and the size of the field. Pointer is used for
+ * submessages and default values.
+ */
+#define PB_REQUIRED_STATIC(tag, st, m, fd, ltype, ptr) \
+ {tag, PB_ATYPE_STATIC | PB_HTYPE_REQUIRED | ltype, \
+ fd, 0, pb_membersize(st, m), 0, ptr}
+
+/* Optional fields add the delta to the has_ variable. */
+#define PB_OPTIONAL_STATIC(tag, st, m, fd, ltype, ptr) \
+ {tag, PB_ATYPE_STATIC | PB_HTYPE_OPTIONAL | ltype, \
+ fd, \
+ pb_delta(st, has_ ## m, m), \
+ pb_membersize(st, m), 0, ptr}
+
+#define PB_SINGULAR_STATIC(tag, st, m, fd, ltype, ptr) \
+ {tag, PB_ATYPE_STATIC | PB_HTYPE_OPTIONAL | ltype, \
+ fd, 0, pb_membersize(st, m), 0, ptr}
+
+/* Repeated fields have a _count field and also the maximum number of entries. */
+#define PB_REPEATED_STATIC(tag, st, m, fd, ltype, ptr) \
+ {tag, PB_ATYPE_STATIC | PB_HTYPE_REPEATED | ltype, \
+ fd, \
+ pb_delta(st, m ## _count, m), \
+ pb_membersize(st, m[0]), \
+ pb_arraysize(st, m), ptr}
+
+/* Allocated fields carry the size of the actual data, not the pointer */
+#define PB_REQUIRED_POINTER(tag, st, m, fd, ltype, ptr) \
+ {tag, PB_ATYPE_POINTER | PB_HTYPE_REQUIRED | ltype, \
+ fd, 0, pb_membersize(st, m[0]), 0, ptr}
+
+/* Optional fields don't need a has_ variable, as information would be redundant */
+#define PB_OPTIONAL_POINTER(tag, st, m, fd, ltype, ptr) \
+ {tag, PB_ATYPE_POINTER | PB_HTYPE_OPTIONAL | ltype, \
+ fd, 0, pb_membersize(st, m[0]), 0, ptr}
+
+/* Same as optional fields. */
+#define PB_SINGULAR_POINTER(tag, st, m, fd, ltype, ptr) \
+ {tag, PB_ATYPE_POINTER | PB_HTYPE_OPTIONAL | ltype, \
+ fd, 0, pb_membersize(st, m[0]), 0, ptr}
+
+/* Repeated fields have a _count field and a pointer to array of pointers */
+#define PB_REPEATED_POINTER(tag, st, m, fd, ltype, ptr) \
+ {tag, PB_ATYPE_POINTER | PB_HTYPE_REPEATED | ltype, \
+ fd, pb_delta(st, m ## _count, m), \
+ pb_membersize(st, m[0]), 0, ptr}
+
+/* Callbacks are much like required fields except with special datatype. */
+#define PB_REQUIRED_CALLBACK(tag, st, m, fd, ltype, ptr) \
+ {tag, PB_ATYPE_CALLBACK | PB_HTYPE_REQUIRED | ltype, \
+ fd, 0, pb_membersize(st, m), 0, ptr}
+
+#define PB_OPTIONAL_CALLBACK(tag, st, m, fd, ltype, ptr) \
+ {tag, PB_ATYPE_CALLBACK | PB_HTYPE_OPTIONAL | ltype, \
+ fd, 0, pb_membersize(st, m), 0, ptr}
+
+#define PB_SINGULAR_CALLBACK(tag, st, m, fd, ltype, ptr) \
+ {tag, PB_ATYPE_CALLBACK | PB_HTYPE_OPTIONAL | ltype, \
+ fd, 0, pb_membersize(st, m), 0, ptr}
+
+#define PB_REPEATED_CALLBACK(tag, st, m, fd, ltype, ptr) \
+ {tag, PB_ATYPE_CALLBACK | PB_HTYPE_REPEATED | ltype, \
+ fd, 0, pb_membersize(st, m), 0, ptr}
+
+/* Optional extensions don't have the has_ field, as that would be redundant.
+ * Furthermore, the combination of OPTIONAL without has_ field is used
+ * for indicating proto3 style fields. Extensions exist in proto2 mode only,
+ * so they should be encoded according to proto2 rules. To avoid the conflict,
+ * extensions are marked as REQUIRED instead.
+ */
+#define PB_OPTEXT_STATIC(tag, st, m, fd, ltype, ptr) \
+ {tag, PB_ATYPE_STATIC | PB_HTYPE_REQUIRED | ltype, \
+ 0, \
+ 0, \
+ pb_membersize(st, m), 0, ptr}
+
+#define PB_OPTEXT_POINTER(tag, st, m, fd, ltype, ptr) \
+ PB_OPTIONAL_POINTER(tag, st, m, fd, ltype, ptr)
+
+#define PB_OPTEXT_CALLBACK(tag, st, m, fd, ltype, ptr) \
+ PB_OPTIONAL_CALLBACK(tag, st, m, fd, ltype, ptr)
+
+/* The mapping from protobuf types to LTYPEs is done using these macros. */
+#define PB_LTYPE_MAP_BOOL PB_LTYPE_VARINT
+#define PB_LTYPE_MAP_BYTES PB_LTYPE_BYTES
+#define PB_LTYPE_MAP_DOUBLE PB_LTYPE_FIXED64
+#define PB_LTYPE_MAP_ENUM PB_LTYPE_VARINT
+#define PB_LTYPE_MAP_UENUM PB_LTYPE_UVARINT
+#define PB_LTYPE_MAP_FIXED32 PB_LTYPE_FIXED32
+#define PB_LTYPE_MAP_FIXED64 PB_LTYPE_FIXED64
+#define PB_LTYPE_MAP_FLOAT PB_LTYPE_FIXED32
+#define PB_LTYPE_MAP_INT32 PB_LTYPE_VARINT
+#define PB_LTYPE_MAP_INT64 PB_LTYPE_VARINT
+#define PB_LTYPE_MAP_MESSAGE PB_LTYPE_SUBMESSAGE
+#define PB_LTYPE_MAP_SFIXED32 PB_LTYPE_FIXED32
+#define PB_LTYPE_MAP_SFIXED64 PB_LTYPE_FIXED64
+#define PB_LTYPE_MAP_SINT32 PB_LTYPE_SVARINT
+#define PB_LTYPE_MAP_SINT64 PB_LTYPE_SVARINT
+#define PB_LTYPE_MAP_STRING PB_LTYPE_STRING
+#define PB_LTYPE_MAP_UINT32 PB_LTYPE_UVARINT
+#define PB_LTYPE_MAP_UINT64 PB_LTYPE_UVARINT
+#define PB_LTYPE_MAP_EXTENSION PB_LTYPE_EXTENSION
+#define PB_LTYPE_MAP_FIXED_LENGTH_BYTES PB_LTYPE_FIXED_LENGTH_BYTES
+
+/* This is the actual macro used in field descriptions.
+ * It takes these arguments:
+ * - Field tag number
+ * - Field type: BOOL, BYTES, DOUBLE, ENUM, UENUM, FIXED32, FIXED64,
+ * FLOAT, INT32, INT64, MESSAGE, SFIXED32, SFIXED64
+ * SINT32, SINT64, STRING, UINT32, UINT64 or EXTENSION
+ * - Field rules: REQUIRED, OPTIONAL or REPEATED
+ * - Allocation: STATIC, CALLBACK or POINTER
+ * - Placement: FIRST or OTHER, depending on if this is the first field in structure.
+ * - Message name
+ * - Field name
+ * - Previous field name (or field name again for first field)
+ * - Pointer to default value or submsg fields.
+ */
+
+#define PB_FIELD(tag, type, rules, allocation, placement, message, field, prevfield, ptr) \
+ PB_ ## rules ## _ ## allocation(tag, message, field, \
+ PB_DATAOFFSET_ ## placement(message, field, prevfield), \
+ PB_LTYPE_MAP_ ## type, ptr)
+
+/* Field description for repeated static fixed count fields.*/
+#define PB_REPEATED_FIXED_COUNT(tag, type, placement, message, field, prevfield, ptr) \
+ {tag, PB_ATYPE_STATIC | PB_HTYPE_REPEATED | PB_LTYPE_MAP_ ## type, \
+ PB_DATAOFFSET_ ## placement(message, field, prevfield), \
+ 0, \
+ pb_membersize(message, field[0]), \
+ pb_arraysize(message, field), ptr}
+
+/* Field description for oneof fields. This requires taking into account the
+ * union name also, that's why a separate set of macros is needed.
+ */
+#define PB_ONEOF_STATIC(u, tag, st, m, fd, ltype, ptr) \
+ {tag, PB_ATYPE_STATIC | PB_HTYPE_ONEOF | ltype, \
+ fd, pb_delta(st, which_ ## u, u.m), \
+ pb_membersize(st, u.m), 0, ptr}
+
+#define PB_ONEOF_POINTER(u, tag, st, m, fd, ltype, ptr) \
+ {tag, PB_ATYPE_POINTER | PB_HTYPE_ONEOF | ltype, \
+ fd, pb_delta(st, which_ ## u, u.m), \
+ pb_membersize(st, u.m[0]), 0, ptr}
+
+#define PB_ONEOF_FIELD(union_name, tag, type, rules, allocation, placement, message, field, prevfield, ptr) \
+ PB_ONEOF_ ## allocation(union_name, tag, message, field, \
+ PB_DATAOFFSET_ ## placement(message, union_name.field, prevfield), \
+ PB_LTYPE_MAP_ ## type, ptr)
+
+#define PB_ANONYMOUS_ONEOF_STATIC(u, tag, st, m, fd, ltype, ptr) \
+ {tag, PB_ATYPE_STATIC | PB_HTYPE_ONEOF | ltype, \
+ fd, pb_delta(st, which_ ## u, m), \
+ pb_membersize(st, m), 0, ptr}
+
+#define PB_ANONYMOUS_ONEOF_POINTER(u, tag, st, m, fd, ltype, ptr) \
+ {tag, PB_ATYPE_POINTER | PB_HTYPE_ONEOF | ltype, \
+ fd, pb_delta(st, which_ ## u, m), \
+ pb_membersize(st, m[0]), 0, ptr}
+
+#define PB_ANONYMOUS_ONEOF_FIELD(union_name, tag, type, rules, allocation, placement, message, field, prevfield, ptr) \
+ PB_ANONYMOUS_ONEOF_ ## allocation(union_name, tag, message, field, \
+ PB_DATAOFFSET_ ## placement(message, field, prevfield), \
+ PB_LTYPE_MAP_ ## type, ptr)
+
+/* These macros are used for giving out error messages.
+ * They are mostly a debugging aid; the main error information
+ * is the true/false return value from functions.
+ * Some code space can be saved by disabling the error
+ * messages if they are not needed.
+ *
+ * PB_SET_ERROR() sets the error message if none has been set yet.
+ * msg must be a constant string literal.
+ * PB_GET_ERROR() always returns a pointer to a string.
+ * PB_RETURN_ERROR() sets the error and returns false from the current
+ * function.
+ */
+#ifdef PB_NO_ERRMSG
+#define PB_SET_ERROR(stream, msg) PB_UNUSED(stream)
+#define PB_GET_ERROR(stream) "(errmsg disabled)"
+#else
+#define PB_SET_ERROR(stream, msg) ((stream)->errmsg = (stream)->errmsg ? (stream)->errmsg : (msg))
+#define PB_GET_ERROR(stream) ((stream)->errmsg ? (stream)->errmsg : "(none)")
+#endif
+
+#define PB_RETURN_ERROR(stream, msg) return PB_SET_ERROR(stream, msg), false
+
+#endif
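The error macros above follow a "first error wins" pattern: once a message is recorded, later failures do not overwrite it, so the reported message points at the root cause. A minimal standalone sketch of that pattern, using a hypothetical `struct stream` rather than the real `pb_istream_t`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for pb_istream_t's errmsg field. */
struct stream { const char *errmsg; };

/* First error wins: later calls do not overwrite an existing message. */
#define SET_ERROR(s, msg) ((s)->errmsg = (s)->errmsg ? (s)->errmsg : (msg))
#define GET_ERROR(s)      ((s)->errmsg ? (s)->errmsg : "(none)")
#define RETURN_ERROR(s, msg) return SET_ERROR(s, msg), false

static bool parse_step(struct stream *s, bool ok)
{
    if (!ok)
        RETURN_ERROR(s, "step failed");
    return true;
}
```

The comma operator in `RETURN_ERROR` lets the macro both record the message and return `false` in a single statement, which is why callers can simply propagate the boolean.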
diff --git a/security/container/protos/nanopb/pb_common.c b/security/container/protos/nanopb/pb_common.c
new file mode 100644
index 0000000..4fb7186
--- /dev/null
+++ b/security/container/protos/nanopb/pb_common.c
@@ -0,0 +1,97 @@
+/* pb_common.c: Common support functions for pb_encode.c and pb_decode.c.
+ *
+ * 2014 Petteri Aimonen <jpa@kapsi.fi>
+ */
+
+#include "pb_common.h"
+
+bool pb_field_iter_begin(pb_field_iter_t *iter, const pb_field_t *fields, void *dest_struct)
+{
+ iter->start = fields;
+ iter->pos = fields;
+ iter->required_field_index = 0;
+ iter->dest_struct = dest_struct;
+ iter->pData = (char*)dest_struct + iter->pos->data_offset;
+ iter->pSize = (char*)iter->pData + iter->pos->size_offset;
+
+ return (iter->pos->tag != 0);
+}
+
+bool pb_field_iter_next(pb_field_iter_t *iter)
+{
+ const pb_field_t *prev_field = iter->pos;
+
+ if (prev_field->tag == 0)
+ {
+ /* Handle empty message types, where the first field is already the terminator.
+ * In other cases, the iter->pos never points to the terminator. */
+ return false;
+ }
+
+ iter->pos++;
+
+ if (iter->pos->tag == 0)
+ {
+ /* Wrapped back to beginning, reinitialize */
+ (void)pb_field_iter_begin(iter, iter->start, iter->dest_struct);
+ return false;
+ }
+ else
+ {
+ /* Increment the pointers based on previous field size */
+ size_t prev_size = prev_field->data_size;
+
+ if (PB_HTYPE(prev_field->type) == PB_HTYPE_ONEOF &&
+ PB_HTYPE(iter->pos->type) == PB_HTYPE_ONEOF &&
+ iter->pos->data_offset == PB_SIZE_MAX)
+ {
+ /* Don't advance pointers inside unions */
+ return true;
+ }
+ else if (PB_ATYPE(prev_field->type) == PB_ATYPE_STATIC &&
+ PB_HTYPE(prev_field->type) == PB_HTYPE_REPEATED)
+ {
+ /* In static arrays, the data_size tells the size of a single entry and
+ * array_size is the number of entries */
+ prev_size *= prev_field->array_size;
+ }
+ else if (PB_ATYPE(prev_field->type) == PB_ATYPE_POINTER)
+ {
+ /* Pointer fields always have a constant size in the main structure.
+ * The data_size only applies to the dynamically allocated area. */
+ prev_size = sizeof(void*);
+ }
+
+ if (PB_HTYPE(prev_field->type) == PB_HTYPE_REQUIRED)
+ {
+ /* Count the required fields, in order to check their presence in the
+ * decoder. */
+ iter->required_field_index++;
+ }
+
+ iter->pData = (char*)iter->pData + prev_size + iter->pos->data_offset;
+ iter->pSize = (char*)iter->pData + iter->pos->size_offset;
+ return true;
+ }
+}
+
+bool pb_field_iter_find(pb_field_iter_t *iter, uint32_t tag)
+{
+ const pb_field_t *start = iter->pos;
+
+ do {
+ if (iter->pos->tag == tag &&
+ PB_LTYPE(iter->pos->type) != PB_LTYPE_EXTENSION)
+ {
+ /* Found the wanted field */
+ return true;
+ }
+
+ (void)pb_field_iter_next(iter);
+ } while (iter->pos != start);
+
+ /* Searched all the way back to start, and found nothing. */
+ return false;
+}
+
+
diff --git a/security/container/protos/nanopb/pb_common.h b/security/container/protos/nanopb/pb_common.h
new file mode 100644
index 0000000..60b3d37
--- /dev/null
+++ b/security/container/protos/nanopb/pb_common.h
@@ -0,0 +1,42 @@
+/* pb_common.h: Common support functions for pb_encode.c and pb_decode.c.
+ * These functions are rarely needed by applications directly.
+ */
+
+#ifndef PB_COMMON_H_INCLUDED
+#define PB_COMMON_H_INCLUDED
+
+#include "pb.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Iterator for pb_field_t list */
+struct pb_field_iter_s {
+ const pb_field_t *start; /* Start of the pb_field_t array */
+ const pb_field_t *pos; /* Current position of the iterator */
+ unsigned required_field_index; /* Zero-based index that counts only the required fields */
+ void *dest_struct; /* Pointer to start of the structure */
+ void *pData; /* Pointer to current field value */
+ void *pSize; /* Pointer to count/has field */
+};
+typedef struct pb_field_iter_s pb_field_iter_t;
+
+/* Initialize the field iterator structure to beginning.
+ * Returns false if the message type is empty. */
+bool pb_field_iter_begin(pb_field_iter_t *iter, const pb_field_t *fields, void *dest_struct);
+
+/* Advance the iterator to the next field.
+ * Returns false when the iterator wraps back to the first field. */
+bool pb_field_iter_next(pb_field_iter_t *iter);
+
+/* Advance the iterator until it points at a field with the given tag.
+ * Returns false if no such field exists. */
+bool pb_field_iter_find(pb_field_iter_t *iter, uint32_t tag);
+
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
+
+#endif
+
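The iterator declared above wraps back to the first field when it reaches the zero-tag terminator, which lets `pb_field_iter_find()` start searching from the current position and stop after at most one full loop. A simplified sketch of that wrap-around search over a terminator-ended descriptor array (the `struct field`/`struct iter` types here are invented stand-ins, not the real `pb_field_t` layout):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for pb_field_t: just a tag, 0 terminates the array. */
struct field { uint32_t tag; };

struct iter {
    const struct field *start;
    const struct field *pos;
};

static void iter_begin(struct iter *it, const struct field *fields)
{
    it->start = fields;
    it->pos = fields;
}

/* Advance; wrap back to the first field at the terminator and report
 * the wrap with a false return, like pb_field_iter_next(). */
static bool iter_next(struct iter *it)
{
    if (it->pos->tag == 0)
        return false;          /* empty message type */
    it->pos++;
    if (it->pos->tag == 0) {
        it->pos = it->start;   /* wrapped back to beginning */
        return false;
    }
    return true;
}

/* Search from the current position, at most one full loop around. */
static bool iter_find(struct iter *it, uint32_t tag)
{
    const struct field *start = it->pos;
    do {
        if (it->pos->tag == tag)
            return true;
        (void)iter_next(it);
    } while (it->pos != start);
    return false;
}
```

Starting the search at the current position matters for decoding: fields usually arrive on the wire in ascending tag order, so the next tag is typically found after a single step rather than a scan from the top.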
diff --git a/security/container/protos/nanopb/pb_decode.c b/security/container/protos/nanopb/pb_decode.c
new file mode 100644
index 0000000..4b80e81
--- /dev/null
+++ b/security/container/protos/nanopb/pb_decode.c
@@ -0,0 +1,1508 @@
+/* pb_decode.c -- decode a protobuf using minimal resources
+ *
+ * 2011 Petteri Aimonen <jpa@kapsi.fi>
+ */
+
+/* Use the GCC warn_unused_result attribute to check that all return values
+ * are propagated correctly. On other compilers, and on GCC before 3.4.0,
+ * the annotation is simply ignored.
+ */
+#if !defined(__GNUC__) || ( __GNUC__ < 3) || (__GNUC__ == 3 && __GNUC_MINOR__ < 4)
+ #define checkreturn
+#else
+ #define checkreturn __attribute__((warn_unused_result))
+#endif
+
+#include "pb.h"
+#include "pb_decode.h"
+#include "pb_common.h"
+
+/**************************************
+ * Declarations internal to this file *
+ **************************************/
+
+typedef bool (*pb_decoder_t)(pb_istream_t *stream, const pb_field_t *field, void *dest) checkreturn;
+
+static bool checkreturn buf_read(pb_istream_t *stream, pb_byte_t *buf, size_t count);
+static bool checkreturn read_raw_value(pb_istream_t *stream, pb_wire_type_t wire_type, pb_byte_t *buf, size_t *size);
+static bool checkreturn decode_static_field(pb_istream_t *stream, pb_wire_type_t wire_type, pb_field_iter_t *iter);
+static bool checkreturn decode_callback_field(pb_istream_t *stream, pb_wire_type_t wire_type, pb_field_iter_t *iter);
+static bool checkreturn decode_field(pb_istream_t *stream, pb_wire_type_t wire_type, pb_field_iter_t *iter);
+static void iter_from_extension(pb_field_iter_t *iter, pb_extension_t *extension);
+static bool checkreturn default_extension_decoder(pb_istream_t *stream, pb_extension_t *extension, uint32_t tag, pb_wire_type_t wire_type);
+static bool checkreturn decode_extension(pb_istream_t *stream, uint32_t tag, pb_wire_type_t wire_type, pb_field_iter_t *iter);
+static bool checkreturn find_extension_field(pb_field_iter_t *iter);
+static void pb_field_set_to_default(pb_field_iter_t *iter);
+static void pb_message_set_to_defaults(const pb_field_t fields[], void *dest_struct);
+static bool checkreturn pb_dec_varint(pb_istream_t *stream, const pb_field_t *field, void *dest);
+static bool checkreturn pb_decode_varint32_eof(pb_istream_t *stream, uint32_t *dest, bool *eof);
+static bool checkreturn pb_dec_uvarint(pb_istream_t *stream, const pb_field_t *field, void *dest);
+static bool checkreturn pb_dec_svarint(pb_istream_t *stream, const pb_field_t *field, void *dest);
+static bool checkreturn pb_dec_fixed32(pb_istream_t *stream, const pb_field_t *field, void *dest);
+static bool checkreturn pb_dec_fixed64(pb_istream_t *stream, const pb_field_t *field, void *dest);
+static bool checkreturn pb_dec_bytes(pb_istream_t *stream, const pb_field_t *field, void *dest);
+static bool checkreturn pb_dec_string(pb_istream_t *stream, const pb_field_t *field, void *dest);
+static bool checkreturn pb_dec_submessage(pb_istream_t *stream, const pb_field_t *field, void *dest);
+static bool checkreturn pb_dec_fixed_length_bytes(pb_istream_t *stream, const pb_field_t *field, void *dest);
+static bool checkreturn pb_skip_varint(pb_istream_t *stream);
+static bool checkreturn pb_skip_string(pb_istream_t *stream);
+
+#ifdef PB_ENABLE_MALLOC
+static bool checkreturn allocate_field(pb_istream_t *stream, void *pData, size_t data_size, size_t array_size);
+static bool checkreturn pb_release_union_field(pb_istream_t *stream, pb_field_iter_t *iter);
+static void pb_release_single_field(const pb_field_iter_t *iter);
+#endif
+
+#ifdef PB_WITHOUT_64BIT
+#define pb_int64_t int32_t
+#define pb_uint64_t uint32_t
+#else
+#define pb_int64_t int64_t
+#define pb_uint64_t uint64_t
+#endif
+
+/* --- Function pointers to field decoders ---
+ * Order in the array must match the PB_LTYPE numbering.
+ */
+static const pb_decoder_t PB_DECODERS[PB_LTYPES_COUNT] = {
+ &pb_dec_varint,
+ &pb_dec_uvarint,
+ &pb_dec_svarint,
+ &pb_dec_fixed32,
+ &pb_dec_fixed64,
+
+ &pb_dec_bytes,
+ &pb_dec_string,
+ &pb_dec_submessage,
+ NULL, /* extensions */
+ &pb_dec_fixed_length_bytes
+};
+
+/*******************************
+ * pb_istream_t implementation *
+ *******************************/
+
+static bool checkreturn buf_read(pb_istream_t *stream, pb_byte_t *buf, size_t count)
+{
+ size_t i;
+ const pb_byte_t *source = (const pb_byte_t*)stream->state;
+ stream->state = (pb_byte_t*)stream->state + count;
+
+ if (buf != NULL)
+ {
+ for (i = 0; i < count; i++)
+ buf[i] = source[i];
+ }
+
+ return true;
+}
+
+bool checkreturn pb_read(pb_istream_t *stream, pb_byte_t *buf, size_t count)
+{
+#ifndef PB_BUFFER_ONLY
+ if (buf == NULL && stream->callback != buf_read)
+ {
+ /* Skip input bytes */
+ pb_byte_t tmp[16];
+ while (count > 16)
+ {
+ if (!pb_read(stream, tmp, 16))
+ return false;
+
+ count -= 16;
+ }
+
+ return pb_read(stream, tmp, count);
+ }
+#endif
+
+ if (stream->bytes_left < count)
+ PB_RETURN_ERROR(stream, "end-of-stream");
+
+#ifndef PB_BUFFER_ONLY
+ if (!stream->callback(stream, buf, count))
+ PB_RETURN_ERROR(stream, "io error");
+#else
+ if (!buf_read(stream, buf, count))
+ return false;
+#endif
+
+ stream->bytes_left -= count;
+ return true;
+}
+
+/* Read a single byte from input stream. buf may not be NULL.
+ * This is an optimization for the varint decoding. */
+static bool checkreturn pb_readbyte(pb_istream_t *stream, pb_byte_t *buf)
+{
+ if (stream->bytes_left == 0)
+ PB_RETURN_ERROR(stream, "end-of-stream");
+
+#ifndef PB_BUFFER_ONLY
+ if (!stream->callback(stream, buf, 1))
+ PB_RETURN_ERROR(stream, "io error");
+#else
+ *buf = *(const pb_byte_t*)stream->state;
+ stream->state = (pb_byte_t*)stream->state + 1;
+#endif
+
+ stream->bytes_left--;
+
+ return true;
+}
+
+pb_istream_t pb_istream_from_buffer(const pb_byte_t *buf, size_t bufsize)
+{
+ pb_istream_t stream;
+ /* Cast away the const from buf without a compiler error. We are
+ * careful to use it only in a const manner in the callbacks.
+ */
+ union {
+ void *state;
+ const void *c_state;
+ } state;
+#ifdef PB_BUFFER_ONLY
+ stream.callback = NULL;
+#else
+ stream.callback = &buf_read;
+#endif
+ state.c_state = buf;
+ stream.state = state.state;
+ stream.bytes_left = bufsize;
+#ifndef PB_NO_ERRMSG
+ stream.errmsg = NULL;
+#endif
+ return stream;
+}
+
+/********************
+ * Helper functions *
+ ********************/
+
+static bool checkreturn pb_decode_varint32_eof(pb_istream_t *stream, uint32_t *dest, bool *eof)
+{
+ pb_byte_t byte;
+ uint32_t result;
+
+ if (!pb_readbyte(stream, &byte))
+ {
+ if (stream->bytes_left == 0)
+ {
+ if (eof)
+ {
+ *eof = true;
+ }
+ }
+
+ return false;
+ }
+
+ if ((byte & 0x80) == 0)
+ {
+ /* Quick case, 1 byte value */
+ result = byte;
+ }
+ else
+ {
+ /* Multibyte case */
+ uint_fast8_t bitpos = 7;
+ result = byte & 0x7F;
+
+ do
+ {
+ if (!pb_readbyte(stream, &byte))
+ return false;
+
+ if (bitpos >= 32)
+ {
+ /* Note: The varint could have trailing 0x80 bytes, or 0xFF for negative. */
+ uint8_t sign_extension = (bitpos < 63) ? 0xFF : 0x01;
+
+ if ((byte & 0x7F) != 0x00 && ((result >> 31) == 0 || byte != sign_extension))
+ {
+ PB_RETURN_ERROR(stream, "varint overflow");
+ }
+ }
+ else
+ {
+ result |= (uint32_t)(byte & 0x7F) << bitpos;
+ }
+ bitpos = (uint_fast8_t)(bitpos + 7);
+ } while (byte & 0x80);
+
+ if (bitpos == 35 && (byte & 0x70) != 0)
+ {
+ /* The last byte was at bitpos=28, so only bottom 4 bits fit. */
+ PB_RETURN_ERROR(stream, "varint overflow");
+ }
+ }
+
+ *dest = result;
+ return true;
+}
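The decoder above reads a base-128 varint: 7 payload bits per byte, least-significant group first, with the high bit marking continuation. A minimal self-contained sketch of the same wire format over a plain byte buffer (without the overflow and sign-extension checks of the full implementation):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a base-128 varint: 7 payload bits per byte, LSB group first,
 * high bit set on every byte except the last. Returns bytes consumed,
 * or 0 on a truncated or over-long input. */
static size_t decode_varint(const uint8_t *buf, size_t len, uint64_t *out)
{
    uint64_t result = 0;
    unsigned bitpos = 0;
    size_t i;

    for (i = 0; i < len && bitpos < 64; i++) {
        result |= (uint64_t)(buf[i] & 0x7F) << bitpos;
        bitpos += 7;
        if ((buf[i] & 0x80) == 0) {
            *out = result;
            return i + 1;
        }
    }
    return 0;
}
```

For example, the bytes `0x96 0x01` decode to 150: the first byte contributes `0x16` (22) at bit position 0, the second contributes `1 << 7` (128).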
+
+bool checkreturn pb_decode_varint32(pb_istream_t *stream, uint32_t *dest)
+{
+ return pb_decode_varint32_eof(stream, dest, NULL);
+}
+
+#ifndef PB_WITHOUT_64BIT
+bool checkreturn pb_decode_varint(pb_istream_t *stream, uint64_t *dest)
+{
+ pb_byte_t byte;
+ uint_fast8_t bitpos = 0;
+ uint64_t result = 0;
+
+ do
+ {
+ if (bitpos >= 64)
+ PB_RETURN_ERROR(stream, "varint overflow");
+
+ if (!pb_readbyte(stream, &byte))
+ return false;
+
+ result |= (uint64_t)(byte & 0x7F) << bitpos;
+ bitpos = (uint_fast8_t)(bitpos + 7);
+ } while (byte & 0x80);
+
+ *dest = result;
+ return true;
+}
+#endif
+
+bool checkreturn pb_skip_varint(pb_istream_t *stream)
+{
+ pb_byte_t byte;
+ do
+ {
+ if (!pb_read(stream, &byte, 1))
+ return false;
+ } while (byte & 0x80);
+ return true;
+}
+
+bool checkreturn pb_skip_string(pb_istream_t *stream)
+{
+ uint32_t length;
+ if (!pb_decode_varint32(stream, &length))
+ return false;
+
+ return pb_read(stream, NULL, length);
+}
+
+bool checkreturn pb_decode_tag(pb_istream_t *stream, pb_wire_type_t *wire_type, uint32_t *tag, bool *eof)
+{
+ uint32_t temp;
+ *eof = false;
+ *wire_type = (pb_wire_type_t) 0;
+ *tag = 0;
+
+ if (!pb_decode_varint32_eof(stream, &temp, eof))
+ {
+ return false;
+ }
+
+ if (temp == 0)
+ {
+ *eof = true; /* Special feature: allow 0-terminated messages. */
+ return false;
+ }
+
+ *tag = temp >> 3;
+ *wire_type = (pb_wire_type_t)(temp & 7);
+ return true;
+}
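Each field key on the wire is itself a varint holding the field tag in the upper bits and the 3-bit wire type in the lower bits, exactly as the shifts in `pb_decode_tag()` show. A small sketch of packing and unpacking such a key:

```c
#include <assert.h>
#include <stdint.h>

enum wire_type { WT_VARINT = 0, WT_64BIT = 1, WT_STRING = 2, WT_32BIT = 5 };

/* A field key is (tag << 3) | wire_type, varint-encoded on the wire. */
static uint32_t make_key(uint32_t tag, enum wire_type wt)
{
    return (tag << 3) | (uint32_t)wt;
}

static void split_key(uint32_t key, uint32_t *tag, enum wire_type *wt)
{
    *tag = key >> 3;
    *wt = (enum wire_type)(key & 7);
}
```

This is also why a key of zero is never valid protobuf (tag 0 is reserved), which the decoder above exploits to support 0-terminated messages.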
+
+bool checkreturn pb_skip_field(pb_istream_t *stream, pb_wire_type_t wire_type)
+{
+ switch (wire_type)
+ {
+ case PB_WT_VARINT: return pb_skip_varint(stream);
+ case PB_WT_64BIT: return pb_read(stream, NULL, 8);
+ case PB_WT_STRING: return pb_skip_string(stream);
+ case PB_WT_32BIT: return pb_read(stream, NULL, 4);
+ default: PB_RETURN_ERROR(stream, "invalid wire_type");
+ }
+}
+
+/* Read a raw value to buffer, for the purpose of passing it to callback as
+ * a substream. Size is maximum size on call, and actual size on return.
+ */
+static bool checkreturn read_raw_value(pb_istream_t *stream, pb_wire_type_t wire_type, pb_byte_t *buf, size_t *size)
+{
+ size_t max_size = *size;
+ switch (wire_type)
+ {
+ case PB_WT_VARINT:
+ *size = 0;
+ do
+ {
+ (*size)++;
+ if (*size > max_size) return false;
+ if (!pb_read(stream, buf, 1)) return false;
+ } while (*buf++ & 0x80);
+ return true;
+
+ case PB_WT_64BIT:
+ *size = 8;
+ return pb_read(stream, buf, 8);
+
+ case PB_WT_32BIT:
+ *size = 4;
+ return pb_read(stream, buf, 4);
+
+ case PB_WT_STRING:
+ /* Calling read_raw_value with a PB_WT_STRING is an error.
+ * Explicitly handle this case and fall through to default to avoid
+ * compiler warnings.
+ */
+
+ default: PB_RETURN_ERROR(stream, "invalid wire_type");
+ }
+}
+
+/* Decode string length from stream and return a substream with limited length.
+ * Remember to close the substream using pb_close_string_substream().
+ */
+bool checkreturn pb_make_string_substream(pb_istream_t *stream, pb_istream_t *substream)
+{
+ uint32_t size;
+ if (!pb_decode_varint32(stream, &size))
+ return false;
+
+ *substream = *stream;
+ if (substream->bytes_left < size)
+ PB_RETURN_ERROR(stream, "parent stream too short");
+
+ substream->bytes_left = size;
+ stream->bytes_left -= size;
+ return true;
+}
+
+bool checkreturn pb_close_string_substream(pb_istream_t *stream, pb_istream_t *substream)
+{
+ if (substream->bytes_left) {
+ if (!pb_read(substream, NULL, substream->bytes_left))
+ return false;
+ }
+
+ stream->state = substream->state;
+
+#ifndef PB_NO_ERRMSG
+ stream->errmsg = substream->errmsg;
+#endif
+ return true;
+}
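A string substream is just the parent stream with its byte budget clamped to the length prefix; closing it skips whatever was left unread and hands the cursor back to the parent. A simplified buffer-only sketch of that bookkeeping (using a hypothetical `struct stream`, not the real `pb_istream_t` API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Buffer-backed stand-in for pb_istream_t. */
struct stream {
    const uint8_t *pos;
    size_t bytes_left;
};

/* Carve a length-limited substream out of the parent, like
 * pb_make_string_substream() after the length prefix is read. */
static bool make_substream(struct stream *parent, size_t len, struct stream *sub)
{
    if (parent->bytes_left < len)
        return false;               /* parent stream too short */
    sub->pos = parent->pos;
    sub->bytes_left = len;
    parent->bytes_left -= len;      /* budget moves to the substream */
    return true;
}

/* Skip any unread bytes and resynchronize the parent's cursor. */
static void close_substream(struct stream *parent, struct stream *sub)
{
    parent->pos = sub->pos + sub->bytes_left;
}
```

Deducting the substream's length from the parent up front is what makes nested length-delimited fields safe: a malformed inner length can never let the decoder read past the outer field's end.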
+
+/*************************
+ * Decode a single field *
+ *************************/
+
+static bool checkreturn decode_static_field(pb_istream_t *stream, pb_wire_type_t wire_type, pb_field_iter_t *iter)
+{
+ pb_type_t type;
+ pb_decoder_t func;
+
+ type = iter->pos->type;
+ func = PB_DECODERS[PB_LTYPE(type)];
+
+ switch (PB_HTYPE(type))
+ {
+ case PB_HTYPE_REQUIRED:
+ return func(stream, iter->pos, iter->pData);
+
+ case PB_HTYPE_OPTIONAL:
+ if (iter->pSize != iter->pData)
+ *(bool*)iter->pSize = true;
+ return func(stream, iter->pos, iter->pData);
+
+ case PB_HTYPE_REPEATED:
+ if (wire_type == PB_WT_STRING
+ && PB_LTYPE(type) <= PB_LTYPE_LAST_PACKABLE)
+ {
+ /* Packed array */
+ bool status = true;
+ pb_size_t *size = (pb_size_t*)iter->pSize;
+
+ pb_istream_t substream;
+ if (!pb_make_string_substream(stream, &substream))
+ return false;
+
+ while (substream.bytes_left > 0 && *size < iter->pos->array_size)
+ {
+ void *pItem = (char*)iter->pData + iter->pos->data_size * (*size);
+ if (!func(&substream, iter->pos, pItem))
+ {
+ status = false;
+ break;
+ }
+ (*size)++;
+ }
+
+ if (substream.bytes_left != 0)
+ PB_RETURN_ERROR(stream, "array overflow");
+ if (!pb_close_string_substream(stream, &substream))
+ return false;
+
+ return status;
+ }
+ else
+ {
+ /* Repeated field */
+ pb_size_t *size = (pb_size_t*)iter->pSize;
+ char *pItem = (char*)iter->pData + iter->pos->data_size * (*size);
+
+ if ((*size)++ >= iter->pos->array_size)
+ PB_RETURN_ERROR(stream, "array overflow");
+
+ return func(stream, iter->pos, pItem);
+ }
+
+ case PB_HTYPE_ONEOF:
+ *(pb_size_t*)iter->pSize = iter->pos->tag;
+ if (PB_LTYPE(type) == PB_LTYPE_SUBMESSAGE)
+ {
+ /* We memset to zero so that any callbacks are set to NULL.
+ * Then set any default values. */
+ memset(iter->pData, 0, iter->pos->data_size);
+ pb_message_set_to_defaults((const pb_field_t*)iter->pos->ptr, iter->pData);
+ }
+ return func(stream, iter->pos, iter->pData);
+
+ default:
+ PB_RETURN_ERROR(stream, "invalid field type");
+ }
+}
+
+#ifdef PB_ENABLE_MALLOC
+/* Allocate storage for the field and store the pointer at iter->pData.
+ * array_size is the number of entries to reserve in an array.
+ * Zero size is not allowed; use pb_free() for releasing.
+ */
+static bool checkreturn allocate_field(pb_istream_t *stream, void *pData, size_t data_size, size_t array_size)
+{
+ void *ptr = *(void**)pData;
+
+ if (data_size == 0 || array_size == 0)
+ PB_RETURN_ERROR(stream, "invalid size");
+
+ /* Check for multiplication overflows.
+ * This code avoids the costly division if the sizes are small enough.
+ * Multiplication is safe as long as only half of the bits are set
+ * in either multiplicand.
+ */
+ {
+ const size_t check_limit = (size_t)1 << (sizeof(size_t) * 4);
+ if (data_size >= check_limit || array_size >= check_limit)
+ {
+ const size_t size_max = (size_t)-1;
+ if (size_max / array_size < data_size)
+ {
+ PB_RETURN_ERROR(stream, "size too large");
+ }
+ }
+ }
+
+ /* Allocate new or expand previous allocation */
+ /* Note: on failure the old pointer will remain in the structure,
+ * the message must be freed by caller also on error return. */
+ ptr = pb_realloc(ptr, array_size * data_size);
+ if (ptr == NULL)
+ PB_RETURN_ERROR(stream, "realloc failed");
+
+ *(void**)pData = ptr;
+ return true;
+}
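The overflow guard in `allocate_field()` only pays for a division when one of the operands has bits set in the upper half of `size_t`; the product of two half-width values cannot overflow, so the common small-allocation path stays cheap. The same check, extracted into a standalone sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if data_size * array_size would overflow size_t.
 * The division is skipped when both operands fit in half of size_t,
 * since the product of two half-width values cannot overflow.
 * array_size must be nonzero (the caller rejects zero sizes). */
static bool mul_would_overflow(size_t data_size, size_t array_size)
{
    const size_t check_limit = (size_t)1 << (sizeof(size_t) * 4);

    if (data_size >= check_limit || array_size >= check_limit)
    {
        const size_t size_max = (size_t)-1;
        if (size_max / array_size < data_size)
            return true;
    }
    return false;
}
```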
+
+/* Clear a newly allocated item in case it contains a pointer, or is a submessage. */
+static void initialize_pointer_field(void *pItem, pb_field_iter_t *iter)
+{
+ if (PB_LTYPE(iter->pos->type) == PB_LTYPE_STRING ||
+ PB_LTYPE(iter->pos->type) == PB_LTYPE_BYTES)
+ {
+ *(void**)pItem = NULL;
+ }
+ else if (PB_LTYPE(iter->pos->type) == PB_LTYPE_SUBMESSAGE)
+ {
+ /* We memset to zero so that any callbacks are set to NULL.
+ * Then set any default values. */
+ memset(pItem, 0, iter->pos->data_size);
+ pb_message_set_to_defaults((const pb_field_t *) iter->pos->ptr, pItem);
+ }
+}
+#endif
+
+static bool checkreturn decode_pointer_field(pb_istream_t *stream, pb_wire_type_t wire_type, pb_field_iter_t *iter)
+{
+#ifndef PB_ENABLE_MALLOC
+ PB_UNUSED(wire_type);
+ PB_UNUSED(iter);
+ PB_RETURN_ERROR(stream, "no malloc support");
+#else
+ pb_type_t type;
+ pb_decoder_t func;
+
+ type = iter->pos->type;
+ func = PB_DECODERS[PB_LTYPE(type)];
+
+ switch (PB_HTYPE(type))
+ {
+ case PB_HTYPE_REQUIRED:
+ case PB_HTYPE_OPTIONAL:
+ case PB_HTYPE_ONEOF:
+ if (PB_LTYPE(type) == PB_LTYPE_SUBMESSAGE &&
+ *(void**)iter->pData != NULL)
+ {
+ /* Duplicate field, have to release the old allocation first. */
+ pb_release_single_field(iter);
+ }
+
+ if (PB_HTYPE(type) == PB_HTYPE_ONEOF)
+ {
+ *(pb_size_t*)iter->pSize = iter->pos->tag;
+ }
+
+ if (PB_LTYPE(type) == PB_LTYPE_STRING ||
+ PB_LTYPE(type) == PB_LTYPE_BYTES)
+ {
+ return func(stream, iter->pos, iter->pData);
+ }
+ else
+ {
+ if (!allocate_field(stream, iter->pData, iter->pos->data_size, 1))
+ return false;
+
+ initialize_pointer_field(*(void**)iter->pData, iter);
+ return func(stream, iter->pos, *(void**)iter->pData);
+ }
+
+ case PB_HTYPE_REPEATED:
+ if (wire_type == PB_WT_STRING
+ && PB_LTYPE(type) <= PB_LTYPE_LAST_PACKABLE)
+ {
+ /* Packed array, multiple items come in at once. */
+ bool status = true;
+ pb_size_t *size = (pb_size_t*)iter->pSize;
+ size_t allocated_size = *size;
+ void *pItem;
+ pb_istream_t substream;
+
+ if (!pb_make_string_substream(stream, &substream))
+ return false;
+
+ while (substream.bytes_left)
+ {
+ if ((size_t)*size + 1 > allocated_size)
+ {
+ /* Allocate more storage. This tries to guess the
+ * number of remaining entries. Round the division
+ * upwards. */
+ allocated_size += (substream.bytes_left - 1) / iter->pos->data_size + 1;
+
+ if (!allocate_field(&substream, iter->pData, iter->pos->data_size, allocated_size))
+ {
+ status = false;
+ break;
+ }
+ }
+
+ /* Decode the array entry */
+ pItem = *(char**)iter->pData + iter->pos->data_size * (*size);
+ initialize_pointer_field(pItem, iter);
+ if (!func(&substream, iter->pos, pItem))
+ {
+ status = false;
+ break;
+ }
+
+ if (*size == PB_SIZE_MAX)
+ {
+#ifndef PB_NO_ERRMSG
+ stream->errmsg = "too many array entries";
+#endif
+ status = false;
+ break;
+ }
+
+ (*size)++;
+ }
+ if (!pb_close_string_substream(stream, &substream))
+ return false;
+
+ return status;
+ }
+ else
+ {
+ /* Normal repeated field, i.e. only one item at a time. */
+ pb_size_t *size = (pb_size_t*)iter->pSize;
+ void *pItem;
+
+ if (*size == PB_SIZE_MAX)
+ PB_RETURN_ERROR(stream, "too many array entries");
+
+ (*size)++;
+ if (!allocate_field(stream, iter->pData, iter->pos->data_size, *size))
+ return false;
+
+ pItem = *(char**)iter->pData + iter->pos->data_size * (*size - 1);
+ initialize_pointer_field(pItem, iter);
+ return func(stream, iter->pos, pItem);
+ }
+
+ default:
+ PB_RETURN_ERROR(stream, "invalid field type");
+ }
+#endif
+}
+
+static bool checkreturn decode_callback_field(pb_istream_t *stream, pb_wire_type_t wire_type, pb_field_iter_t *iter)
+{
+ pb_callback_t *pCallback = (pb_callback_t*)iter->pData;
+
+#ifdef PB_OLD_CALLBACK_STYLE
+ void *arg = pCallback->arg;
+#else
+ void **arg = &(pCallback->arg);
+#endif
+
+ if (pCallback == NULL || pCallback->funcs.decode == NULL)
+ return pb_skip_field(stream, wire_type);
+
+ if (wire_type == PB_WT_STRING)
+ {
+ pb_istream_t substream;
+
+ if (!pb_make_string_substream(stream, &substream))
+ return false;
+
+ do
+ {
+ if (!pCallback->funcs.decode(&substream, iter->pos, arg))
+ PB_RETURN_ERROR(stream, "callback failed");
+ } while (substream.bytes_left);
+
+ if (!pb_close_string_substream(stream, &substream))
+ return false;
+
+ return true;
+ }
+ else
+ {
+ /* Copy the single scalar value to the stack.
+ * This is required so that we can limit the stream length,
+ * which in turn allows using the same callback for packed and
+ * non-packed fields. */
+ pb_istream_t substream;
+ pb_byte_t buffer[10];
+ size_t size = sizeof(buffer);
+
+ if (!read_raw_value(stream, wire_type, buffer, &size))
+ return false;
+ substream = pb_istream_from_buffer(buffer, size);
+
+ return pCallback->funcs.decode(&substream, iter->pos, arg);
+ }
+}
+
+static bool checkreturn decode_field(pb_istream_t *stream, pb_wire_type_t wire_type, pb_field_iter_t *iter)
+{
+#ifdef PB_ENABLE_MALLOC
+ /* When decoding a oneof field, check if there is old data that must be
+ * released first. */
+ if (PB_HTYPE(iter->pos->type) == PB_HTYPE_ONEOF)
+ {
+ if (!pb_release_union_field(stream, iter))
+ return false;
+ }
+#endif
+
+ switch (PB_ATYPE(iter->pos->type))
+ {
+ case PB_ATYPE_STATIC:
+ return decode_static_field(stream, wire_type, iter);
+
+ case PB_ATYPE_POINTER:
+ return decode_pointer_field(stream, wire_type, iter);
+
+ case PB_ATYPE_CALLBACK:
+ return decode_callback_field(stream, wire_type, iter);
+
+ default:
+ PB_RETURN_ERROR(stream, "invalid field type");
+ }
+}
+
+static void iter_from_extension(pb_field_iter_t *iter, pb_extension_t *extension)
+{
+ /* Fake a field iterator for the extension field.
+ * It is not actually safe to advance this iterator, but decode_field
+ * will not even try to. */
+ const pb_field_t *field = (const pb_field_t*)extension->type->arg;
+ (void)pb_field_iter_begin(iter, field, extension->dest);
+ iter->pData = extension->dest;
+ iter->pSize = &extension->found;
+
+ if (PB_ATYPE(field->type) == PB_ATYPE_POINTER)
+ {
+ /* For pointer extensions, the pointer is stored directly
+ * in the extension structure. This avoids having an extra
+ * indirection. */
+ iter->pData = &extension->dest;
+ }
+}
+
+/* Default handler for extension fields. Expects a pb_field_t structure
+ * in extension->type->arg. */
+static bool checkreturn default_extension_decoder(pb_istream_t *stream,
+ pb_extension_t *extension, uint32_t tag, pb_wire_type_t wire_type)
+{
+ const pb_field_t *field = (const pb_field_t*)extension->type->arg;
+ pb_field_iter_t iter;
+
+ if (field->tag != tag)
+ return true;
+
+ iter_from_extension(&iter, extension);
+ extension->found = true;
+ return decode_field(stream, wire_type, &iter);
+}
+
+/* Try to decode an unknown field as an extension field. Tries each extension
+ * decoder in turn, until one of them handles the field or the loop ends. */
+static bool checkreturn decode_extension(pb_istream_t *stream,
+ uint32_t tag, pb_wire_type_t wire_type, pb_field_iter_t *iter)
+{
+ pb_extension_t *extension = *(pb_extension_t* const *)iter->pData;
+ size_t pos = stream->bytes_left;
+
+ while (extension != NULL && pos == stream->bytes_left)
+ {
+ bool status;
+ if (extension->type->decode)
+ status = extension->type->decode(stream, extension, tag, wire_type);
+ else
+ status = default_extension_decoder(stream, extension, tag, wire_type);
+
+ if (!status)
+ return false;
+
+ extension = extension->next;
+ }
+
+ return true;
+}
+
+/* Step through the iterator until an extension field is found or until all
+ * entries have been checked. There can be only one extension field per
+ * message. Returns false if no extension field is found. */
+static bool checkreturn find_extension_field(pb_field_iter_t *iter)
+{
+ const pb_field_t *start = iter->pos;
+
+ do {
+ if (PB_LTYPE(iter->pos->type) == PB_LTYPE_EXTENSION)
+ return true;
+ (void)pb_field_iter_next(iter);
+ } while (iter->pos != start);
+
+ return false;
+}
+
+/* Initialize message fields to default values, recursively */
+static void pb_field_set_to_default(pb_field_iter_t *iter)
+{
+ pb_type_t type;
+ type = iter->pos->type;
+
+ if (PB_LTYPE(type) == PB_LTYPE_EXTENSION)
+ {
+ pb_extension_t *ext = *(pb_extension_t* const *)iter->pData;
+ while (ext != NULL)
+ {
+ pb_field_iter_t ext_iter;
+ ext->found = false;
+ iter_from_extension(&ext_iter, ext);
+ pb_field_set_to_default(&ext_iter);
+ ext = ext->next;
+ }
+ }
+ else if (PB_ATYPE(type) == PB_ATYPE_STATIC)
+ {
+ bool init_data = true;
+ if (PB_HTYPE(type) == PB_HTYPE_OPTIONAL && iter->pSize != iter->pData)
+ {
+ /* Set has_field to false, but still initialize the optional
+ * field itself. */
+ *(bool*)iter->pSize = false;
+ }
+ else if (PB_HTYPE(type) == PB_HTYPE_REPEATED ||
+ PB_HTYPE(type) == PB_HTYPE_ONEOF)
+ {
+ /* REPEATED: Set array count to 0, no need to initialize contents.
+ ONEOF: Set which_field to 0. */
+ *(pb_size_t*)iter->pSize = 0;
+ init_data = false;
+ }
+
+ if (init_data)
+ {
+ if (PB_LTYPE(iter->pos->type) == PB_LTYPE_SUBMESSAGE)
+ {
+ /* Initialize submessage to defaults */
+ pb_message_set_to_defaults((const pb_field_t *) iter->pos->ptr, iter->pData);
+ }
+ else if (iter->pos->ptr != NULL)
+ {
+ /* Initialize to default value */
+ memcpy(iter->pData, iter->pos->ptr, iter->pos->data_size);
+ }
+ else
+ {
+ /* Initialize to zeros */
+ memset(iter->pData, 0, iter->pos->data_size);
+ }
+ }
+ }
+ else if (PB_ATYPE(type) == PB_ATYPE_POINTER)
+ {
+ /* Initialize the pointer to NULL. */
+ *(void**)iter->pData = NULL;
+
+ /* Initialize array count to 0. */
+ if (PB_HTYPE(type) == PB_HTYPE_REPEATED ||
+ PB_HTYPE(type) == PB_HTYPE_ONEOF)
+ {
+ *(pb_size_t*)iter->pSize = 0;
+ }
+ }
+ else if (PB_ATYPE(type) == PB_ATYPE_CALLBACK)
+ {
+ /* Don't overwrite callback */
+ }
+}
+
+static void pb_message_set_to_defaults(const pb_field_t fields[], void *dest_struct)
+{
+ pb_field_iter_t iter;
+
+ if (!pb_field_iter_begin(&iter, fields, dest_struct))
+ return; /* Empty message type */
+
+ do
+ {
+ pb_field_set_to_default(&iter);
+ } while (pb_field_iter_next(&iter));
+}
+
+/*********************
+ * Decode all fields *
+ *********************/
+
+bool checkreturn pb_decode_noinit(pb_istream_t *stream, const pb_field_t fields[], void *dest_struct)
+{
+ uint32_t fields_seen[(PB_MAX_REQUIRED_FIELDS + 31) / 32] = {0, 0};
+ const uint32_t allbits = ~(uint32_t)0;
+ uint32_t extension_range_start = 0;
+ pb_field_iter_t iter;
+
+ /* 'fixed_count_field' and 'fixed_count_size' track the position of a repeated fixed
+ * count field. This can only handle _one_ repeated fixed count field that
+ * is unpacked and unordered among other (non repeated fixed count) fields.
+ */
+ const pb_field_t *fixed_count_field = NULL;
+ pb_size_t fixed_count_size = 0;
+
+ /* Return value ignored, as empty message types will be correctly handled by
+ * pb_field_iter_find() anyway. */
+ (void)pb_field_iter_begin(&iter, fields, dest_struct);
+
+ while (stream->bytes_left)
+ {
+ uint32_t tag;
+ pb_wire_type_t wire_type;
+ bool eof;
+
+ if (!pb_decode_tag(stream, &wire_type, &tag, &eof))
+ {
+ if (eof)
+ break;
+ else
+ return false;
+ }
+
+ if (!pb_field_iter_find(&iter, tag))
+ {
+ /* No match found, check if it matches an extension. */
+ if (tag >= extension_range_start)
+ {
+ if (!find_extension_field(&iter))
+ extension_range_start = (uint32_t)-1;
+ else
+ extension_range_start = iter.pos->tag;
+
+ if (tag >= extension_range_start)
+ {
+ size_t pos = stream->bytes_left;
+
+ if (!decode_extension(stream, tag, wire_type, &iter))
+ return false;
+
+ if (pos != stream->bytes_left)
+ {
+ /* The field was handled */
+ continue;
+ }
+ }
+ }
+
+ /* No match found, skip data */
+ if (!pb_skip_field(stream, wire_type))
+ return false;
+ continue;
+ }
+
+ /* If a repeated fixed count field was found, get size from
+ * 'fixed_count_field' as there is no counter contained in the struct.
+ */
+ if (PB_HTYPE(iter.pos->type) == PB_HTYPE_REPEATED
+ && iter.pSize == iter.pData)
+ {
+ if (fixed_count_field != iter.pos) {
+ /* If the new fixed count field does not match the previous one,
+ * check that the previous one is NULL or that it finished
+ * receiving all the expected data.
+ */
+ if (fixed_count_field != NULL &&
+ fixed_count_size != fixed_count_field->array_size)
+ {
+ PB_RETURN_ERROR(stream, "wrong size for fixed count field");
+ }
+
+ fixed_count_field = iter.pos;
+ fixed_count_size = 0;
+ }
+
+ iter.pSize = &fixed_count_size;
+ }
+
+ if (PB_HTYPE(iter.pos->type) == PB_HTYPE_REQUIRED
+ && iter.required_field_index < PB_MAX_REQUIRED_FIELDS)
+ {
+ uint32_t tmp = ((uint32_t)1 << (iter.required_field_index & 31));
+ fields_seen[iter.required_field_index >> 5] |= tmp;
+ }
+
+ if (!decode_field(stream, wire_type, &iter))
+ return false;
+ }
+
+ /* Check that all elements of the last decoded fixed count field were present. */
+ if (fixed_count_field != NULL &&
+ fixed_count_size != fixed_count_field->array_size)
+ {
+ PB_RETURN_ERROR(stream, "wrong size for fixed count field");
+ }
+
+ /* Check that all required fields were present. */
+ {
+ /* First figure out the number of required fields by
+ * seeking to the end of the field array. Usually we
+ * are already close to end after decoding.
+ */
+ unsigned req_field_count;
+ pb_type_t last_type;
+ unsigned i;
+ do {
+ req_field_count = iter.required_field_index;
+ last_type = iter.pos->type;
+ } while (pb_field_iter_next(&iter));
+
+ /* Fixup if last field was also required. */
+ if (PB_HTYPE(last_type) == PB_HTYPE_REQUIRED && iter.pos->tag != 0)
+ req_field_count++;
+
+ if (req_field_count > PB_MAX_REQUIRED_FIELDS)
+ req_field_count = PB_MAX_REQUIRED_FIELDS;
+
+ if (req_field_count > 0)
+ {
+ /* Check the whole words */
+ for (i = 0; i < (req_field_count >> 5); i++)
+ {
+ if (fields_seen[i] != allbits)
+ PB_RETURN_ERROR(stream, "missing required field");
+ }
+
+ /* Check the remaining bits (if any) */
+ if ((req_field_count & 31) != 0)
+ {
+ if (fields_seen[req_field_count >> 5] !=
+ (allbits >> (32 - (req_field_count & 31))))
+ {
+ PB_RETURN_ERROR(stream, "missing required field");
+ }
+ }
+ }
+ }
+
+ return true;
+}
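The required-field check above packs one "seen" bit per required field into 32-bit words, and compares the partial last word against `allbits >> (32 - (req_field_count & 31))`. A standalone sketch of that mask arithmetic (the function name is illustrative, not part of nanopb):

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low (count % 32) bits set, for the partial last word of a
 * seen-fields bitmap. count % 32 must be nonzero, mirroring the
 * `(req_field_count & 31) != 0` guard in pb_decode_noinit(). */
static uint32_t partial_word_mask(unsigned count)
{
    const uint32_t allbits = ~(uint32_t)0;
    return allbits >> (32 - (count & 31));
}
```

With, say, 33 required fields, the first full word must equal `allbits` and the 33rd bit is checked via `partial_word_mask(33) == 0x1`.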
+
+bool checkreturn pb_decode(pb_istream_t *stream, const pb_field_t fields[], void *dest_struct)
+{
+ bool status;
+ pb_message_set_to_defaults(fields, dest_struct);
+ status = pb_decode_noinit(stream, fields, dest_struct);
+
+#ifdef PB_ENABLE_MALLOC
+ if (!status)
+ pb_release(fields, dest_struct);
+#endif
+
+ return status;
+}
+
+bool pb_decode_delimited_noinit(pb_istream_t *stream, const pb_field_t fields[], void *dest_struct)
+{
+ pb_istream_t substream;
+ bool status;
+
+ if (!pb_make_string_substream(stream, &substream))
+ return false;
+
+ status = pb_decode_noinit(&substream, fields, dest_struct);
+
+ if (!pb_close_string_substream(stream, &substream))
+ return false;
+ return status;
+}
+
+bool pb_decode_delimited(pb_istream_t *stream, const pb_field_t fields[], void *dest_struct)
+{
+ pb_istream_t substream;
+ bool status;
+
+ if (!pb_make_string_substream(stream, &substream))
+ return false;
+
+ status = pb_decode(&substream, fields, dest_struct);
+
+ if (!pb_close_string_substream(stream, &substream))
+ return false;
+ return status;
+}
+
+bool pb_decode_nullterminated(pb_istream_t *stream, const pb_field_t fields[], void *dest_struct)
+{
+ /* This behaviour will be separated in nanopb-0.4.0, see issue #278. */
+ return pb_decode(stream, fields, dest_struct);
+}
+
+#ifdef PB_ENABLE_MALLOC
+/* Given an oneof field, if there has already been a field inside this oneof,
+ * release it before overwriting with a different one. */
+static bool pb_release_union_field(pb_istream_t *stream, pb_field_iter_t *iter)
+{
+ pb_size_t old_tag = *(pb_size_t*)iter->pSize; /* Previous which_ value */
+ pb_size_t new_tag = iter->pos->tag; /* New which_ value */
+
+ if (old_tag == 0)
+ return true; /* Ok, no old data in union */
+
+ if (old_tag == new_tag)
+ return true; /* Ok, old data is of same type => merge */
+
+ /* Release old data. The find can fail if the message struct contains
+ * invalid data. */
+ if (!pb_field_iter_find(iter, old_tag))
+ PB_RETURN_ERROR(stream, "invalid union tag");
+
+ pb_release_single_field(iter);
+
+ /* Restore iterator to where it should be.
+ * This shouldn't fail unless the pb_field_t structure is corrupted. */
+ if (!pb_field_iter_find(iter, new_tag))
+ PB_RETURN_ERROR(stream, "iterator error");
+
+ return true;
+}
+
+static void pb_release_single_field(const pb_field_iter_t *iter)
+{
+ pb_type_t type;
+ type = iter->pos->type;
+
+ if (PB_HTYPE(type) == PB_HTYPE_ONEOF)
+ {
+ if (*(pb_size_t*)iter->pSize != iter->pos->tag)
+ return; /* This is not the current field in the union */
+ }
+
+ /* Release anything contained inside an extension or submsg.
+ * This has to be done even if the submsg itself is statically
+ * allocated. */
+ if (PB_LTYPE(type) == PB_LTYPE_EXTENSION)
+ {
+ /* Release fields from all extensions in the linked list */
+ pb_extension_t *ext = *(pb_extension_t**)iter->pData;
+ while (ext != NULL)
+ {
+ pb_field_iter_t ext_iter;
+ iter_from_extension(&ext_iter, ext);
+ pb_release_single_field(&ext_iter);
+ ext = ext->next;
+ }
+ }
+ else if (PB_LTYPE(type) == PB_LTYPE_SUBMESSAGE)
+ {
+ /* Release fields in submessage or submsg array */
+ void *pItem = iter->pData;
+ pb_size_t count = 1;
+
+ if (PB_ATYPE(type) == PB_ATYPE_POINTER)
+ {
+ pItem = *(void**)iter->pData;
+ }
+
+ if (PB_HTYPE(type) == PB_HTYPE_REPEATED)
+ {
+ if (PB_ATYPE(type) == PB_ATYPE_STATIC && iter->pSize == iter->pData) {
+ /* No _count field so use size of the array */
+ count = iter->pos->array_size;
+ } else {
+ count = *(pb_size_t*)iter->pSize;
+ }
+
+ if (PB_ATYPE(type) == PB_ATYPE_STATIC && count > iter->pos->array_size)
+ {
+ /* Protect against corrupted _count fields */
+ count = iter->pos->array_size;
+ }
+ }
+
+ if (pItem)
+ {
+ while (count--)
+ {
+ pb_release((const pb_field_t*)iter->pos->ptr, pItem);
+ pItem = (char*)pItem + iter->pos->data_size;
+ }
+ }
+ }
+
+ if (PB_ATYPE(type) == PB_ATYPE_POINTER)
+ {
+ if (PB_HTYPE(type) == PB_HTYPE_REPEATED &&
+ (PB_LTYPE(type) == PB_LTYPE_STRING ||
+ PB_LTYPE(type) == PB_LTYPE_BYTES))
+ {
+ /* Release entries in repeated string or bytes array */
+ void **pItem = *(void***)iter->pData;
+ pb_size_t count = *(pb_size_t*)iter->pSize;
+ while (count--)
+ {
+ pb_free(*pItem);
+ *pItem++ = NULL;
+ }
+ }
+
+ if (PB_HTYPE(type) == PB_HTYPE_REPEATED)
+ {
+ /* We are going to release the array, so set the size to 0 */
+ *(pb_size_t*)iter->pSize = 0;
+ }
+
+ /* Release main item */
+ pb_free(*(void**)iter->pData);
+ *(void**)iter->pData = NULL;
+ }
+}
+
+void pb_release(const pb_field_t fields[], void *dest_struct)
+{
+ pb_field_iter_t iter;
+
+ if (!dest_struct)
+ return; /* Ignore NULL pointers, similar to free() */
+
+ if (!pb_field_iter_begin(&iter, fields, dest_struct))
+ return; /* Empty message type */
+
+ do
+ {
+ pb_release_single_field(&iter);
+ } while (pb_field_iter_next(&iter));
+}
+#endif
+
+/* Field decoders */
+
+bool pb_decode_svarint(pb_istream_t *stream, pb_int64_t *dest)
+{
+ pb_uint64_t value;
+ if (!pb_decode_varint(stream, &value))
+ return false;
+
+ if (value & 1)
+ *dest = (pb_int64_t)(~(value >> 1));
+ else
+ *dest = (pb_int64_t)(value >> 1);
+
+ return true;
+}
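pb_decode_svarint() above undoes protobuf's zigzag mapping, in which even wire values represent non-negative integers and odd values represent negatives. A minimal standalone sketch of the same transform (illustrative name, plain fixed-width types instead of nanopb's pb_int64_t):

```c
#include <assert.h>
#include <stdint.h>

/* Undo zigzag encoding: 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
 * Same arithmetic as pb_decode_svarint(), applied to an in-memory value. */
static int64_t zigzag_decode(uint64_t value)
{
    if (value & 1)
        return (int64_t)(~(value >> 1));
    else
        return (int64_t)(value >> 1);
}
```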
+
+bool pb_decode_fixed32(pb_istream_t *stream, void *dest)
+{
+ pb_byte_t bytes[4];
+
+ if (!pb_read(stream, bytes, 4))
+ return false;
+
+ *(uint32_t*)dest = ((uint32_t)bytes[0] << 0) |
+ ((uint32_t)bytes[1] << 8) |
+ ((uint32_t)bytes[2] << 16) |
+ ((uint32_t)bytes[3] << 24);
+ return true;
+}
+
+#ifndef PB_WITHOUT_64BIT
+bool pb_decode_fixed64(pb_istream_t *stream, void *dest)
+{
+ pb_byte_t bytes[8];
+
+ if (!pb_read(stream, bytes, 8))
+ return false;
+
+ *(uint64_t*)dest = ((uint64_t)bytes[0] << 0) |
+ ((uint64_t)bytes[1] << 8) |
+ ((uint64_t)bytes[2] << 16) |
+ ((uint64_t)bytes[3] << 24) |
+ ((uint64_t)bytes[4] << 32) |
+ ((uint64_t)bytes[5] << 40) |
+ ((uint64_t)bytes[6] << 48) |
+ ((uint64_t)bytes[7] << 56);
+
+ return true;
+}
+#endif
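Both fixed-width decoders above assemble the value byte by byte instead of casting the buffer, so the result is correct on big-endian hosts and with unaligned input. A standalone sketch of the 32-bit case (illustrative name):

```c
#include <assert.h>
#include <stdint.h>

/* Assemble a little-endian 32-bit value, as pb_decode_fixed32() does,
 * independent of host byte order and buffer alignment. */
static uint32_t le32_to_host(const uint8_t bytes[4])
{
    return ((uint32_t)bytes[0] << 0)  |
           ((uint32_t)bytes[1] << 8)  |
           ((uint32_t)bytes[2] << 16) |
           ((uint32_t)bytes[3] << 24);
}
```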
+
+static bool checkreturn pb_dec_varint(pb_istream_t *stream, const pb_field_t *field, void *dest)
+{
+ pb_uint64_t value;
+ pb_int64_t svalue;
+ pb_int64_t clamped;
+ if (!pb_decode_varint(stream, &value))
+ return false;
+
+ /* See issue 97: Google's C++ protobuf allows negative varint values to
+ * be cast as int32_t, instead of the int64_t that should be used when
+ * encoding. Previous nanopb versions had a bug in encoding. In order to
+ * not break decoding of such messages, we cast <=32 bit fields to
+ * int32_t first to get the sign correct.
+ */
+ if (field->data_size == sizeof(pb_int64_t))
+ svalue = (pb_int64_t)value;
+ else
+ svalue = (int32_t)value;
+
+ /* Cast to the proper field size, while checking for overflows */
+ if (field->data_size == sizeof(pb_int64_t))
+ clamped = *(pb_int64_t*)dest = svalue;
+ else if (field->data_size == sizeof(int32_t))
+ clamped = *(int32_t*)dest = (int32_t)svalue;
+ else if (field->data_size == sizeof(int_least16_t))
+ clamped = *(int_least16_t*)dest = (int_least16_t)svalue;
+ else if (field->data_size == sizeof(int_least8_t))
+ clamped = *(int_least8_t*)dest = (int_least8_t)svalue;
+ else
+ PB_RETURN_ERROR(stream, "invalid data_size");
+
+ if (clamped != svalue)
+ PB_RETURN_ERROR(stream, "integer too large");
+
+ return true;
+}
+
+static bool checkreturn pb_dec_uvarint(pb_istream_t *stream, const pb_field_t *field, void *dest)
+{
+ pb_uint64_t value, clamped;
+ if (!pb_decode_varint(stream, &value))
+ return false;
+
+ /* Cast to the proper field size, while checking for overflows */
+ if (field->data_size == sizeof(pb_uint64_t))
+ clamped = *(pb_uint64_t*)dest = value;
+ else if (field->data_size == sizeof(uint32_t))
+ clamped = *(uint32_t*)dest = (uint32_t)value;
+ else if (field->data_size == sizeof(uint_least16_t))
+ clamped = *(uint_least16_t*)dest = (uint_least16_t)value;
+ else if (field->data_size == sizeof(uint_least8_t))
+ clamped = *(uint_least8_t*)dest = (uint_least8_t)value;
+ else
+ PB_RETURN_ERROR(stream, "invalid data_size");
+
+ if (clamped != value)
+ PB_RETURN_ERROR(stream, "integer too large");
+
+ return true;
+}
+
+static bool checkreturn pb_dec_svarint(pb_istream_t *stream, const pb_field_t *field, void *dest)
+{
+ pb_int64_t value, clamped;
+ if (!pb_decode_svarint(stream, &value))
+ return false;
+
+ /* Cast to the proper field size, while checking for overflows */
+ if (field->data_size == sizeof(pb_int64_t))
+ clamped = *(pb_int64_t*)dest = value;
+ else if (field->data_size == sizeof(int32_t))
+ clamped = *(int32_t*)dest = (int32_t)value;
+ else if (field->data_size == sizeof(int_least16_t))
+ clamped = *(int_least16_t*)dest = (int_least16_t)value;
+ else if (field->data_size == sizeof(int_least8_t))
+ clamped = *(int_least8_t*)dest = (int_least8_t)value;
+ else
+ PB_RETURN_ERROR(stream, "invalid data_size");
+
+ if (clamped != value)
+ PB_RETURN_ERROR(stream, "integer too large");
+
+ return true;
+}
+
+static bool checkreturn pb_dec_fixed32(pb_istream_t *stream, const pb_field_t *field, void *dest)
+{
+ PB_UNUSED(field);
+ return pb_decode_fixed32(stream, dest);
+}
+
+static bool checkreturn pb_dec_fixed64(pb_istream_t *stream, const pb_field_t *field, void *dest)
+{
+ PB_UNUSED(field);
+#ifndef PB_WITHOUT_64BIT
+ return pb_decode_fixed64(stream, dest);
+#else
+ PB_UNUSED(dest);
+ PB_RETURN_ERROR(stream, "no 64bit support");
+#endif
+}
+
+static bool checkreturn pb_dec_bytes(pb_istream_t *stream, const pb_field_t *field, void *dest)
+{
+ uint32_t size;
+ size_t alloc_size;
+ pb_bytes_array_t *bdest;
+
+ if (!pb_decode_varint32(stream, &size))
+ return false;
+
+ if (size > PB_SIZE_MAX)
+ PB_RETURN_ERROR(stream, "bytes overflow");
+
+ alloc_size = PB_BYTES_ARRAY_T_ALLOCSIZE(size);
+ if (size > alloc_size)
+ PB_RETURN_ERROR(stream, "size too large");
+
+ if (PB_ATYPE(field->type) == PB_ATYPE_POINTER)
+ {
+#ifndef PB_ENABLE_MALLOC
+ PB_RETURN_ERROR(stream, "no malloc support");
+#else
+ if (!allocate_field(stream, dest, alloc_size, 1))
+ return false;
+ bdest = *(pb_bytes_array_t**)dest;
+#endif
+ }
+ else
+ {
+ if (alloc_size > field->data_size)
+ PB_RETURN_ERROR(stream, "bytes overflow");
+ bdest = (pb_bytes_array_t*)dest;
+ }
+
+ bdest->size = (pb_size_t)size;
+ return pb_read(stream, bdest->bytes, size);
+}
+
+static bool checkreturn pb_dec_string(pb_istream_t *stream, const pb_field_t *field, void *dest)
+{
+ uint32_t size;
+ size_t alloc_size;
+ bool status;
+ if (!pb_decode_varint32(stream, &size))
+ return false;
+
+ /* Space for null terminator */
+ alloc_size = size + 1;
+
+ if (alloc_size < size)
+ PB_RETURN_ERROR(stream, "size too large");
+
+ if (PB_ATYPE(field->type) == PB_ATYPE_POINTER)
+ {
+#ifndef PB_ENABLE_MALLOC
+ PB_RETURN_ERROR(stream, "no malloc support");
+#else
+ if (!allocate_field(stream, dest, alloc_size, 1))
+ return false;
+ dest = *(void**)dest;
+#endif
+ }
+ else
+ {
+ if (alloc_size > field->data_size)
+ PB_RETURN_ERROR(stream, "string overflow");
+ }
+
+ status = pb_read(stream, (pb_byte_t*)dest, size);
+ *((pb_byte_t*)dest + size) = 0;
+ return status;
+}
+
+static bool checkreturn pb_dec_submessage(pb_istream_t *stream, const pb_field_t *field, void *dest)
+{
+ bool status;
+ pb_istream_t substream;
+ const pb_field_t* submsg_fields = (const pb_field_t*)field->ptr;
+
+    /* Validate the descriptor before opening the substream, so an error
+     * return cannot leave an unclosed substream behind. */
+    if (field->ptr == NULL)
+        PB_RETURN_ERROR(stream, "invalid field descriptor");
+
+    if (!pb_make_string_substream(stream, &substream))
+        return false;
+
+ /* New array entries need to be initialized, while required and optional
+ * submessages have already been initialized in the top-level pb_decode. */
+ if (PB_HTYPE(field->type) == PB_HTYPE_REPEATED)
+ status = pb_decode(&substream, submsg_fields, dest);
+ else
+ status = pb_decode_noinit(&substream, submsg_fields, dest);
+
+ if (!pb_close_string_substream(stream, &substream))
+ return false;
+ return status;
+}
+
+static bool checkreturn pb_dec_fixed_length_bytes(pb_istream_t *stream, const pb_field_t *field, void *dest)
+{
+ uint32_t size;
+
+ if (!pb_decode_varint32(stream, &size))
+ return false;
+
+ if (size > PB_SIZE_MAX)
+ PB_RETURN_ERROR(stream, "bytes overflow");
+
+ if (size == 0)
+ {
+ /* As a special case, treat empty bytes string as all zeros for fixed_length_bytes. */
+ memset(dest, 0, field->data_size);
+ return true;
+ }
+
+ if (size != field->data_size)
+ PB_RETURN_ERROR(stream, "incorrect fixed length bytes size");
+
+ return pb_read(stream, (pb_byte_t*)dest, field->data_size);
+}
diff --git a/security/container/protos/nanopb/pb_decode.h b/security/container/protos/nanopb/pb_decode.h
new file mode 100644
index 0000000..398b24a
--- /dev/null
+++ b/security/container/protos/nanopb/pb_decode.h
@@ -0,0 +1,175 @@
+/* pb_decode.h: Functions to decode protocol buffers. Depends on pb_decode.c.
+ * The main function is pb_decode. You also need an input stream, and the
+ * field descriptions created by nanopb_generator.py.
+ */
+
+#ifndef PB_DECODE_H_INCLUDED
+#define PB_DECODE_H_INCLUDED
+
+#include "pb.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Structure for defining custom input streams. You will need to provide
+ * a callback function to read the bytes from your storage, which can be
+ * for example a file or a network socket.
+ *
+ * The callback must conform to these rules:
+ *
+ * 1) Return false on IO errors. This will cause decoding to abort.
+ * 2) You can use state to store your own data (e.g. buffer pointer),
+ *    and rely on pb_read to verify that nobody reads past bytes_left.
+ * 3) Your callback may be used with substreams, in which case bytes_left
+ *    differs from that of the main stream. Don't use bytes_left to compute
+ * any pointers.
+ */
+struct pb_istream_s
+{
+#ifdef PB_BUFFER_ONLY
+ /* Callback pointer is not used in buffer-only configuration.
+ * Having an int pointer here allows binary compatibility but
+     * gives an error if someone tries to assign a callback function.
+ */
+ int *callback;
+#else
+ bool (*callback)(pb_istream_t *stream, pb_byte_t *buf, size_t count);
+#endif
+
+ void *state; /* Free field for use by callback implementation */
+ size_t bytes_left;
+
+#ifndef PB_NO_ERRMSG
+ const char *errmsg;
+#endif
+};
+
+/***************************
+ * Main decoding functions *
+ ***************************/
+
+/* Decode a single protocol buffers message from input stream into a C structure.
+ * Returns true on success, false on any failure.
+ * The actual struct pointed to by dest must match the description in fields.
+ * Callback fields of the destination structure must be initialized by caller.
+ * All other fields will be initialized by this function.
+ *
+ * Example usage:
+ * MyMessage msg = {};
+ * uint8_t buffer[64];
+ * pb_istream_t stream;
+ *
+ * // ... read some data into buffer ...
+ *
+ * stream = pb_istream_from_buffer(buffer, count);
+ * pb_decode(&stream, MyMessage_fields, &msg);
+ */
+bool pb_decode(pb_istream_t *stream, const pb_field_t fields[], void *dest_struct);
+
+/* Same as pb_decode, except does not initialize the destination structure
+ * to default values. This is slightly faster if you need no default values
+ * and just do memset(struct, 0, sizeof(struct)) yourself.
+ *
+ * This can also be used for 'merging' two messages, i.e. update only the
+ * fields that exist in the new message.
+ *
+ * Note: If this function returns with an error, it will not release any
+ * dynamically allocated fields. You will need to call pb_release() yourself.
+ */
+bool pb_decode_noinit(pb_istream_t *stream, const pb_field_t fields[], void *dest_struct);
+
+/* Same as pb_decode, except expects the stream to start with the message size
+ * encoded as varint. Corresponds to parseDelimitedFrom() in Google's
+ * protobuf API.
+ */
+bool pb_decode_delimited(pb_istream_t *stream, const pb_field_t fields[], void *dest_struct);
+
+/* Same as pb_decode_delimited, except that it does not initialize the destination structure.
+ * See pb_decode_noinit.
+ */
+bool pb_decode_delimited_noinit(pb_istream_t *stream, const pb_field_t fields[], void *dest_struct);
+
+/* Same as pb_decode, except allows the message to be terminated with a null byte.
+ * NOTE: Until nanopb-0.4.0, pb_decode() also allows null-termination. This behaviour
+ * is not supported in most other protobuf implementations, so pb_decode_delimited()
+ * is a better option for compatibility.
+ */
+bool pb_decode_nullterminated(pb_istream_t *stream, const pb_field_t fields[], void *dest_struct);
+
+#ifdef PB_ENABLE_MALLOC
+/* Release any allocated pointer fields. If you use dynamic allocation, you should
+ * call this for any successfully decoded message when you are done with it. If
+ * pb_decode() returns with an error, the message is already released.
+ */
+void pb_release(const pb_field_t fields[], void *dest_struct);
+#endif
+
+
+/**************************************
+ * Functions for manipulating streams *
+ **************************************/
+
+/* Create an input stream for reading from a memory buffer.
+ *
+ * Alternatively, you can use a custom stream that reads directly from e.g.
+ * a file or a network socket.
+ */
+pb_istream_t pb_istream_from_buffer(const pb_byte_t *buf, size_t bufsize);
+
+/* Function to read from a pb_istream_t. You can use this if you need to
+ * read some custom header data, or to read data in field callbacks.
+ */
+bool pb_read(pb_istream_t *stream, pb_byte_t *buf, size_t count);
+
+
+/************************************************
+ * Helper functions for writing field callbacks *
+ ************************************************/
+
+/* Decode the tag for the next field in the stream. Gives the wire type and
+ * field tag. At end of the message, returns false and sets eof to true. */
+bool pb_decode_tag(pb_istream_t *stream, pb_wire_type_t *wire_type, uint32_t *tag, bool *eof);
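pb_decode_tag() splits each field key into the two pieces named above: the low three bits carry the wire type and the remaining bits carry the field number. A standalone sketch of that split (illustrative names):

```c
#include <assert.h>
#include <stdint.h>

/* Field number portion of a protobuf key: everything above the low 3 bits. */
static uint32_t key_field_number(uint32_t key)
{
    return key >> 3;
}

/* Wire type portion of a protobuf key: the low three bits
 * (0 = varint, 1 = 64-bit, 2 = length-delimited, 5 = 32-bit). */
static uint32_t key_wire_type(uint32_t key)
{
    return key & 7;
}
```

For example, key byte 0x22 denotes field 4 with wire type 2 (length-delimited).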
+
+/* Skip the field payload data, given the wire type. */
+bool pb_skip_field(pb_istream_t *stream, pb_wire_type_t wire_type);
+
+/* Decode an integer in the varint format. This works for bool, enum, int32,
+ * int64, uint32 and uint64 field types. */
+#ifndef PB_WITHOUT_64BIT
+bool pb_decode_varint(pb_istream_t *stream, uint64_t *dest);
+#else
+#define pb_decode_varint pb_decode_varint32
+#endif
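The varint wire format behind these declarations stores seven payload bits per byte, least-significant group first, with the top bit set on every byte except the last. A minimal buffer-based sketch (illustrative name; nanopb's own decoder reads from a stream and performs stricter length and overflow checking):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one varint from a byte buffer. Each byte contributes its low
 * 7 bits; a set top bit means another byte follows. A 64-bit value
 * needs at most 10 bytes. Best-effort on truncated input. */
static uint64_t varint_decode(const uint8_t *buf, size_t len)
{
    uint64_t result = 0;
    size_t i;
    for (i = 0; i < len && i < 10; i++)
    {
        result |= (uint64_t)(buf[i] & 0x7F) << (7 * i);
        if (!(buf[i] & 0x80))
            break; /* continuation bit clear: last byte */
    }
    return result;
}
```

The classic example is the two-byte sequence 0xAC 0x02, which decodes to 300.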
+
+/* Decode an integer in the varint format. This works for bool, enum, int32,
+ * and uint32 field types. */
+bool pb_decode_varint32(pb_istream_t *stream, uint32_t *dest);
+
+/* Decode an integer in the zig-zagged svarint format. This works for sint32
+ * and sint64. */
+#ifndef PB_WITHOUT_64BIT
+bool pb_decode_svarint(pb_istream_t *stream, int64_t *dest);
+#else
+bool pb_decode_svarint(pb_istream_t *stream, int32_t *dest);
+#endif
+
+/* Decode a fixed32, sfixed32 or float value. You need to pass a pointer to
+ * a 4-byte wide C variable. */
+bool pb_decode_fixed32(pb_istream_t *stream, void *dest);
+
+#ifndef PB_WITHOUT_64BIT
+/* Decode a fixed64, sfixed64 or double value. You need to pass a pointer to
+ * an 8-byte wide C variable. */
+bool pb_decode_fixed64(pb_istream_t *stream, void *dest);
+#endif
+
+/* Make a limited-length substream for reading a PB_WT_STRING field. */
+bool pb_make_string_substream(pb_istream_t *stream, pb_istream_t *substream);
+bool pb_close_string_substream(pb_istream_t *stream, pb_istream_t *substream);
+
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
+
+#endif
diff --git a/security/container/protos/nanopb/pb_encode.c b/security/container/protos/nanopb/pb_encode.c
new file mode 100644
index 0000000..089172c
--- /dev/null
+++ b/security/container/protos/nanopb/pb_encode.c
@@ -0,0 +1,869 @@
+/* pb_encode.c -- encode a protobuf using minimal resources
+ *
+ * 2011 Petteri Aimonen <jpa@kapsi.fi>
+ */
+
+#include "pb.h"
+#include "pb_encode.h"
+#include "pb_common.h"
+
+/* Use the GCC warn_unused_result attribute to check that all return values
+ * are propagated correctly. On other compilers, and on gcc before 3.4.0,
+ * the annotation is simply ignored.
+ */
+#if !defined(__GNUC__) || (__GNUC__ < 3) || (__GNUC__ == 3 && __GNUC_MINOR__ < 4)
+ #define checkreturn
+#else
+ #define checkreturn __attribute__((warn_unused_result))
+#endif
+
+/**************************************
+ * Declarations internal to this file *
+ **************************************/
+typedef bool (*pb_encoder_t)(pb_ostream_t *stream, const pb_field_t *field, const void *src) checkreturn;
+
+static bool checkreturn buf_write(pb_ostream_t *stream, const pb_byte_t *buf, size_t count);
+static bool checkreturn encode_array(pb_ostream_t *stream, const pb_field_t *field, const void *pData, size_t count, pb_encoder_t func);
+static bool checkreturn encode_field(pb_ostream_t *stream, const pb_field_t *field, const void *pData);
+static bool checkreturn default_extension_encoder(pb_ostream_t *stream, const pb_extension_t *extension);
+static bool checkreturn encode_extension_field(pb_ostream_t *stream, const pb_field_t *field, const void *pData);
+static void *pb_const_cast(const void *p);
+static bool checkreturn pb_enc_varint(pb_ostream_t *stream, const pb_field_t *field, const void *src);
+static bool checkreturn pb_enc_uvarint(pb_ostream_t *stream, const pb_field_t *field, const void *src);
+static bool checkreturn pb_enc_svarint(pb_ostream_t *stream, const pb_field_t *field, const void *src);
+static bool checkreturn pb_enc_fixed32(pb_ostream_t *stream, const pb_field_t *field, const void *src);
+static bool checkreturn pb_enc_fixed64(pb_ostream_t *stream, const pb_field_t *field, const void *src);
+static bool checkreturn pb_enc_bytes(pb_ostream_t *stream, const pb_field_t *field, const void *src);
+static bool checkreturn pb_enc_string(pb_ostream_t *stream, const pb_field_t *field, const void *src);
+static bool checkreturn pb_enc_submessage(pb_ostream_t *stream, const pb_field_t *field, const void *src);
+static bool checkreturn pb_enc_fixed_length_bytes(pb_ostream_t *stream, const pb_field_t *field, const void *src);
+
+#ifdef PB_WITHOUT_64BIT
+#define pb_int64_t int32_t
+#define pb_uint64_t uint32_t
+
+static bool checkreturn pb_encode_negative_varint(pb_ostream_t *stream, pb_uint64_t value);
+#else
+#define pb_int64_t int64_t
+#define pb_uint64_t uint64_t
+#endif
+
+/* --- Function pointers to field encoders ---
+ * Order in the array must match pb_action_t LTYPE numbering.
+ */
+static const pb_encoder_t PB_ENCODERS[PB_LTYPES_COUNT] = {
+ &pb_enc_varint,
+ &pb_enc_uvarint,
+ &pb_enc_svarint,
+ &pb_enc_fixed32,
+ &pb_enc_fixed64,
+
+ &pb_enc_bytes,
+ &pb_enc_string,
+ &pb_enc_submessage,
+ NULL, /* extensions */
+ &pb_enc_fixed_length_bytes
+};
+
+/*******************************
+ * pb_ostream_t implementation *
+ *******************************/
+
+static bool checkreturn buf_write(pb_ostream_t *stream, const pb_byte_t *buf, size_t count)
+{
+ size_t i;
+ pb_byte_t *dest = (pb_byte_t*)stream->state;
+ stream->state = dest + count;
+
+ for (i = 0; i < count; i++)
+ dest[i] = buf[i];
+
+ return true;
+}
+
+pb_ostream_t pb_ostream_from_buffer(pb_byte_t *buf, size_t bufsize)
+{
+ pb_ostream_t stream;
+#ifdef PB_BUFFER_ONLY
+ stream.callback = (void*)1; /* Just a marker value */
+#else
+ stream.callback = &buf_write;
+#endif
+ stream.state = buf;
+ stream.max_size = bufsize;
+ stream.bytes_written = 0;
+#ifndef PB_NO_ERRMSG
+ stream.errmsg = NULL;
+#endif
+ return stream;
+}
+
+bool checkreturn pb_write(pb_ostream_t *stream, const pb_byte_t *buf, size_t count)
+{
+ if (stream->callback != NULL)
+ {
+ if (stream->bytes_written + count > stream->max_size)
+ PB_RETURN_ERROR(stream, "stream full");
+
+#ifdef PB_BUFFER_ONLY
+ if (!buf_write(stream, buf, count))
+ PB_RETURN_ERROR(stream, "io error");
+#else
+ if (!stream->callback(stream, buf, count))
+ PB_RETURN_ERROR(stream, "io error");
+#endif
+ }
+
+ stream->bytes_written += count;
+ return true;
+}
+
+/*************************
+ * Encode a single field *
+ *************************/
+
+/* Encode a static array. Handles the size calculations and possible packing. */
+static bool checkreturn encode_array(pb_ostream_t *stream, const pb_field_t *field,
+ const void *pData, size_t count, pb_encoder_t func)
+{
+ size_t i;
+ const void *p;
+ size_t size;
+
+ if (count == 0)
+ return true;
+
+ if (PB_ATYPE(field->type) != PB_ATYPE_POINTER && count > field->array_size)
+ PB_RETURN_ERROR(stream, "array max size exceeded");
+
+ /* We always pack arrays if the datatype allows it. */
+ if (PB_LTYPE(field->type) <= PB_LTYPE_LAST_PACKABLE)
+ {
+ if (!pb_encode_tag(stream, PB_WT_STRING, field->tag))
+ return false;
+
+ /* Determine the total size of packed array. */
+ if (PB_LTYPE(field->type) == PB_LTYPE_FIXED32)
+ {
+ size = 4 * count;
+ }
+ else if (PB_LTYPE(field->type) == PB_LTYPE_FIXED64)
+ {
+ size = 8 * count;
+ }
+ else
+ {
+ pb_ostream_t sizestream = PB_OSTREAM_SIZING;
+ p = pData;
+ for (i = 0; i < count; i++)
+ {
+ if (!func(&sizestream, field, p))
+ return false;
+ p = (const char*)p + field->data_size;
+ }
+ size = sizestream.bytes_written;
+ }
+
+ if (!pb_encode_varint(stream, (pb_uint64_t)size))
+ return false;
+
+ if (stream->callback == NULL)
+            return pb_write(stream, NULL, size); /* Just sizing. */
+
+ /* Write the data */
+ p = pData;
+ for (i = 0; i < count; i++)
+ {
+ if (!func(stream, field, p))
+ return false;
+ p = (const char*)p + field->data_size;
+ }
+ }
+ else
+ {
+ p = pData;
+ for (i = 0; i < count; i++)
+ {
+ if (!pb_encode_tag_for_field(stream, field))
+ return false;
+
+ /* Normally the data is stored directly in the array entries, but
+ * for pointer-type string and bytes fields, the array entries are
+ * actually pointers themselves also. So we have to dereference once
+ * more to get to the actual data. */
+ if (PB_ATYPE(field->type) == PB_ATYPE_POINTER &&
+ (PB_LTYPE(field->type) == PB_LTYPE_STRING ||
+ PB_LTYPE(field->type) == PB_LTYPE_BYTES))
+ {
+ if (!func(stream, field, *(const void* const*)p))
+ return false;
+ }
+ else
+ {
+ if (!func(stream, field, p))
+ return false;
+ }
+ p = (const char*)p + field->data_size;
+ }
+ }
+
+ return true;
+}
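encode_array() above emits packed repeated scalars as a key, a payload length, then the concatenated element encodings. For small varint elements the wire bytes can be computed by hand: field number 4 with values [1, 2, 3] becomes 0x22 0x03 0x01 0x02 0x03 (key = (4 << 3) | 2 for the length-delimited wire type). A standalone sketch under deliberately narrow assumptions (illustrative names; real encoders, like encode_array() here, handle arbitrary sizes via a sizing pass):

```c
#include <stddef.h>
#include <stdint.h>

/* Encode values < 128 (each varint fits in one byte) as a packed repeated
 * field with field number <= 15 (key fits in one byte). Returns bytes
 * written, or 0 if an input violates those assumptions or the buffer is
 * too small. */
static size_t encode_packed_small(uint8_t *buf, size_t bufsize,
                                  uint32_t field_number,
                                  const uint8_t *values, size_t count)
{
    size_t i;
    if (bufsize < count + 2 || field_number > 15 || count > 127)
        return 0;
    buf[0] = (uint8_t)((field_number << 3) | 2); /* length-delimited key */
    buf[1] = (uint8_t)count;                     /* payload length */
    for (i = 0; i < count; i++)
    {
        if (values[i] > 127)
            return 0; /* would need a multi-byte varint */
        buf[2 + i] = values[i]; /* one-byte varint */
    }
    return count + 2;
}

/* Self-check: field 4, values [1,2,3] must produce 22 03 01 02 03. */
static int packed_demo_matches(void)
{
    uint8_t buf[8];
    const uint8_t vals[3] = {1, 2, 3};
    const uint8_t expect[5] = {0x22, 0x03, 0x01, 0x02, 0x03};
    size_t i, n = encode_packed_small(buf, sizeof buf, 4, vals, 3);
    if (n != 5)
        return 0;
    for (i = 0; i < 5; i++)
        if (buf[i] != expect[i])
            return 0;
    return 1;
}
```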
+
+/* In proto3, all fields are optional and are only encoded if their value is "non-zero".
+ * This function implements the check for the zero value. */
+static bool pb_check_proto3_default_value(const pb_field_t *field, const void *pData)
+{
+ pb_type_t type = field->type;
+ const void *pSize = (const char*)pData + field->size_offset;
+
+ if (PB_HTYPE(type) == PB_HTYPE_REQUIRED)
+ {
+ /* Required proto2 fields inside proto3 submessage, pretty rare case */
+ return false;
+ }
+ else if (PB_HTYPE(type) == PB_HTYPE_REPEATED)
+ {
+ /* Repeated fields inside proto3 submessage: present if count != 0 */
+ return *(const pb_size_t*)pSize == 0;
+ }
+ else if (PB_HTYPE(type) == PB_HTYPE_ONEOF)
+ {
+ /* Oneof fields */
+ return *(const pb_size_t*)pSize == 0;
+ }
+ else if (PB_HTYPE(type) == PB_HTYPE_OPTIONAL && field->size_offset)
+ {
+ /* Proto2 optional fields inside proto3 submessage */
+ return *(const bool*)pSize == false;
+ }
+
+ /* Rest is proto3 singular fields */
+
+ if (PB_ATYPE(type) == PB_ATYPE_STATIC)
+ {
+ if (PB_LTYPE(type) == PB_LTYPE_BYTES)
+ {
+ const pb_bytes_array_t *bytes = (const pb_bytes_array_t*)pData;
+ return bytes->size == 0;
+ }
+ else if (PB_LTYPE(type) == PB_LTYPE_STRING)
+ {
+ return *(const char*)pData == '\0';
+ }
+ else if (PB_LTYPE(type) == PB_LTYPE_FIXED_LENGTH_BYTES)
+ {
+            /* Fixed-length bytes is only empty if its length is fixed
+             * as 0, which would be pretty strange, but we can check
+             * it anyway. */
+ return field->data_size == 0;
+ }
+ else if (PB_LTYPE(type) == PB_LTYPE_SUBMESSAGE)
+ {
+ /* Check all fields in the submessage to find if any of them
+ * are non-zero. The comparison cannot be done byte-per-byte
+ * because the C struct may contain padding bytes that must
+ * be skipped.
+ */
+ pb_field_iter_t iter;
+ if (pb_field_iter_begin(&iter, (const pb_field_t*)field->ptr, pb_const_cast(pData)))
+ {
+ do
+ {
+ if (!pb_check_proto3_default_value(iter.pos, iter.pData))
+ {
+ return false;
+ }
+ } while (pb_field_iter_next(&iter));
+ }
+ return true;
+ }
+ }
+
+ {
+ /* Catch-all branch that does byte-per-byte comparison for zero value.
+ *
+ * This is for all pointer fields, and for static PB_LTYPE_VARINT,
+ * UVARINT, SVARINT, FIXED32, FIXED64, EXTENSION fields, and also
+ * callback fields. These all have integer or pointer value which
+ * can be compared with 0.
+ */
+ pb_size_t i;
+ const char *p = (const char*)pData;
+ for (i = 0; i < field->data_size; i++)
+ {
+ if (p[i] != 0)
+ {
+ return false;
+ }
+ }
+
+ return true;
+ }
+}
+
+/* Encode a field with static or pointer allocation, i.e. one whose data
+ * is available to the encoder directly. */
+static bool checkreturn encode_basic_field(pb_ostream_t *stream,
+ const pb_field_t *field, const void *pData)
+{
+ pb_encoder_t func;
+ bool implicit_has;
+ const void *pSize = &implicit_has;
+
+ func = PB_ENCODERS[PB_LTYPE(field->type)];
+
+ if (field->size_offset)
+ {
+ /* Static optional, repeated or oneof field */
+ pSize = (const char*)pData + field->size_offset;
+ }
+ else if (PB_HTYPE(field->type) == PB_HTYPE_OPTIONAL)
+ {
+ /* Proto3 style field, optional but without explicit has_ field. */
+ implicit_has = !pb_check_proto3_default_value(field, pData);
+ }
+ else
+ {
+ /* Required field, always present */
+ implicit_has = true;
+ }
+
+ if (PB_ATYPE(field->type) == PB_ATYPE_POINTER)
+ {
+        /* pData is a pointer to the field, which contains a pointer to
+         * the data. If the second pointer is NULL, it is interpreted as if
+         * the has_field was false.
+ */
+ pData = *(const void* const*)pData;
+ implicit_has = (pData != NULL);
+ }
+
+ switch (PB_HTYPE(field->type))
+ {
+ case PB_HTYPE_REQUIRED:
+ if (!pData)
+ PB_RETURN_ERROR(stream, "missing required field");
+ if (!pb_encode_tag_for_field(stream, field))
+ return false;
+ if (!func(stream, field, pData))
+ return false;
+ break;
+
+ case PB_HTYPE_OPTIONAL:
+ if (*(const bool*)pSize)
+ {
+ if (!pb_encode_tag_for_field(stream, field))
+ return false;
+
+ if (!func(stream, field, pData))
+ return false;
+ }
+ break;
+
+ case PB_HTYPE_REPEATED: {
+ pb_size_t count;
+ if (field->size_offset != 0) {
+ count = *(const pb_size_t*)pSize;
+ } else {
+ count = field->array_size;
+ }
+ if (!encode_array(stream, field, pData, count, func))
+ return false;
+ break;
+ }
+
+ case PB_HTYPE_ONEOF:
+ if (*(const pb_size_t*)pSize == field->tag)
+ {
+ if (!pb_encode_tag_for_field(stream, field))
+ return false;
+
+ if (!func(stream, field, pData))
+ return false;
+ }
+ break;
+
+ default:
+ PB_RETURN_ERROR(stream, "invalid field type");
+ }
+
+ return true;
+}
+
+/* Encode a field with callback semantics. This means that a user function is
+ * called to provide and encode the actual data. */
+static bool checkreturn encode_callback_field(pb_ostream_t *stream,
+ const pb_field_t *field, const void *pData)
+{
+ const pb_callback_t *callback = (const pb_callback_t*)pData;
+
+#ifdef PB_OLD_CALLBACK_STYLE
+ const void *arg = callback->arg;
+#else
+ void * const *arg = &(callback->arg);
+#endif
+
+ if (callback->funcs.encode != NULL)
+ {
+ if (!callback->funcs.encode(stream, field, arg))
+ PB_RETURN_ERROR(stream, "callback error");
+ }
+ return true;
+}
+
+/* Encode a single field of any callback or static type. */
+static bool checkreturn encode_field(pb_ostream_t *stream,
+ const pb_field_t *field, const void *pData)
+{
+ switch (PB_ATYPE(field->type))
+ {
+ case PB_ATYPE_STATIC:
+ case PB_ATYPE_POINTER:
+ return encode_basic_field(stream, field, pData);
+
+ case PB_ATYPE_CALLBACK:
+ return encode_callback_field(stream, field, pData);
+
+ default:
+ PB_RETURN_ERROR(stream, "invalid field type");
+ }
+}
+
+/* Default handler for extension fields. Expects to have a pb_field_t
+ * pointer in the extension->type->arg field. */
+static bool checkreturn default_extension_encoder(pb_ostream_t *stream,
+ const pb_extension_t *extension)
+{
+ const pb_field_t *field = (const pb_field_t*)extension->type->arg;
+
+ if (PB_ATYPE(field->type) == PB_ATYPE_POINTER)
+ {
+ /* For pointer extensions, the pointer is stored directly
+ * in the extension structure. This avoids having an extra
+ * indirection. */
+ return encode_field(stream, field, &extension->dest);
+ }
+ else
+ {
+ return encode_field(stream, field, extension->dest);
+ }
+}
+
+/* Walk through all the registered extensions and give them a chance
+ * to encode themselves. */
+static bool checkreturn encode_extension_field(pb_ostream_t *stream,
+ const pb_field_t *field, const void *pData)
+{
+ const pb_extension_t *extension = *(const pb_extension_t* const *)pData;
+ PB_UNUSED(field);
+
+ while (extension)
+ {
+ bool status;
+ if (extension->type->encode)
+ status = extension->type->encode(stream, extension);
+ else
+ status = default_extension_encoder(stream, extension);
+
+ if (!status)
+ return false;
+
+ extension = extension->next;
+ }
+
+ return true;
+}
+
+/*********************
+ * Encode all fields *
+ *********************/
+
+static void *pb_const_cast(const void *p)
+{
+ /* Note: this casts away const, in order to use the common field iterator
+ * logic for both encoding and decoding. */
+ union {
+ void *p1;
+ const void *p2;
+ } t;
+ t.p2 = p;
+ return t.p1;
+}
+
+bool checkreturn pb_encode(pb_ostream_t *stream, const pb_field_t fields[], const void *src_struct)
+{
+ pb_field_iter_t iter;
+ if (!pb_field_iter_begin(&iter, fields, pb_const_cast(src_struct)))
+ return true; /* Empty message type */
+
+ do {
+ if (PB_LTYPE(iter.pos->type) == PB_LTYPE_EXTENSION)
+ {
+ /* Special case for the extension field placeholder */
+ if (!encode_extension_field(stream, iter.pos, iter.pData))
+ return false;
+ }
+ else
+ {
+ /* Regular field */
+ if (!encode_field(stream, iter.pos, iter.pData))
+ return false;
+ }
+ } while (pb_field_iter_next(&iter));
+
+ return true;
+}
+
+bool pb_encode_delimited(pb_ostream_t *stream, const pb_field_t fields[], const void *src_struct)
+{
+ return pb_encode_submessage(stream, fields, src_struct);
+}
+
+bool pb_encode_nullterminated(pb_ostream_t *stream, const pb_field_t fields[], const void *src_struct)
+{
+ const pb_byte_t zero = 0;
+
+ if (!pb_encode(stream, fields, src_struct))
+ return false;
+
+ return pb_write(stream, &zero, 1);
+}
+
+bool pb_get_encoded_size(size_t *size, const pb_field_t fields[], const void *src_struct)
+{
+ pb_ostream_t stream = PB_OSTREAM_SIZING;
+
+ if (!pb_encode(&stream, fields, src_struct))
+ return false;
+
+ *size = stream.bytes_written;
+ return true;
+}
+
+/********************
+ * Helper functions *
+ ********************/
+
+#ifdef PB_WITHOUT_64BIT
+bool checkreturn pb_encode_negative_varint(pb_ostream_t *stream, pb_uint64_t value)
+{
+ pb_byte_t buffer[10];
+ size_t i = 0;
+ size_t compensation = 32; /* we need to compensate 32 bits all set to 1 */
+
+ while (value)
+ {
+ buffer[i] = (pb_byte_t)((value & 0x7F) | 0x80);
+ value >>= 7;
+ if (compensation)
+ {
+ /* re-set all the compensation bits we can or need */
+ size_t bits = compensation > 7 ? 7 : compensation;
+ value ^= (pb_uint64_t)((0xFFu >> (8 - bits)) << 25); /* set the number of bits needed on the lowest of the most significant 7 bits */
+ compensation -= bits;
+ }
+ i++;
+ }
+ buffer[i - 1] &= 0x7F; /* Unset top bit on last byte */
+
+ return pb_write(stream, buffer, i);
+}
+#endif
+
+bool checkreturn pb_encode_varint(pb_ostream_t *stream, pb_uint64_t value)
+{
+ pb_byte_t buffer[10];
+ size_t i = 0;
+
+ if (value <= 0x7F)
+ {
+ pb_byte_t v = (pb_byte_t)value;
+ return pb_write(stream, &v, 1);
+ }
+
+ while (value)
+ {
+ buffer[i] = (pb_byte_t)((value & 0x7F) | 0x80);
+ value >>= 7;
+ i++;
+ }
+ buffer[i-1] &= 0x7F; /* Unset top bit on last byte */
+
+ return pb_write(stream, buffer, i);
+}
+
+bool checkreturn pb_encode_svarint(pb_ostream_t *stream, pb_int64_t value)
+{
+ pb_uint64_t zigzagged;
+ if (value < 0)
+ zigzagged = ~((pb_uint64_t)value << 1);
+ else
+ zigzagged = (pb_uint64_t)value << 1;
+
+ return pb_encode_varint(stream, zigzagged);
+}
+
+bool checkreturn pb_encode_fixed32(pb_ostream_t *stream, const void *value)
+{
+ uint32_t val = *(const uint32_t*)value;
+ pb_byte_t bytes[4];
+ bytes[0] = (pb_byte_t)(val & 0xFF);
+ bytes[1] = (pb_byte_t)((val >> 8) & 0xFF);
+ bytes[2] = (pb_byte_t)((val >> 16) & 0xFF);
+ bytes[3] = (pb_byte_t)((val >> 24) & 0xFF);
+ return pb_write(stream, bytes, 4);
+}
+
+#ifndef PB_WITHOUT_64BIT
+bool checkreturn pb_encode_fixed64(pb_ostream_t *stream, const void *value)
+{
+ uint64_t val = *(const uint64_t*)value;
+ pb_byte_t bytes[8];
+ bytes[0] = (pb_byte_t)(val & 0xFF);
+ bytes[1] = (pb_byte_t)((val >> 8) & 0xFF);
+ bytes[2] = (pb_byte_t)((val >> 16) & 0xFF);
+ bytes[3] = (pb_byte_t)((val >> 24) & 0xFF);
+ bytes[4] = (pb_byte_t)((val >> 32) & 0xFF);
+ bytes[5] = (pb_byte_t)((val >> 40) & 0xFF);
+ bytes[6] = (pb_byte_t)((val >> 48) & 0xFF);
+ bytes[7] = (pb_byte_t)((val >> 56) & 0xFF);
+ return pb_write(stream, bytes, 8);
+}
+#endif
+
+bool checkreturn pb_encode_tag(pb_ostream_t *stream, pb_wire_type_t wiretype, uint32_t field_number)
+{
+ pb_uint64_t tag = ((pb_uint64_t)field_number << 3) | wiretype;
+ return pb_encode_varint(stream, tag);
+}
+
+bool checkreturn pb_encode_tag_for_field(pb_ostream_t *stream, const pb_field_t *field)
+{
+ pb_wire_type_t wiretype;
+ switch (PB_LTYPE(field->type))
+ {
+ case PB_LTYPE_VARINT:
+ case PB_LTYPE_UVARINT:
+ case PB_LTYPE_SVARINT:
+ wiretype = PB_WT_VARINT;
+ break;
+
+ case PB_LTYPE_FIXED32:
+ wiretype = PB_WT_32BIT;
+ break;
+
+ case PB_LTYPE_FIXED64:
+ wiretype = PB_WT_64BIT;
+ break;
+
+ case PB_LTYPE_BYTES:
+ case PB_LTYPE_STRING:
+ case PB_LTYPE_SUBMESSAGE:
+ case PB_LTYPE_FIXED_LENGTH_BYTES:
+ wiretype = PB_WT_STRING;
+ break;
+
+ default:
+ PB_RETURN_ERROR(stream, "invalid field type");
+ }
+
+ return pb_encode_tag(stream, wiretype, field->tag);
+}
+
+bool checkreturn pb_encode_string(pb_ostream_t *stream, const pb_byte_t *buffer, size_t size)
+{
+ if (!pb_encode_varint(stream, (pb_uint64_t)size))
+ return false;
+
+ return pb_write(stream, buffer, size);
+}
+
+bool checkreturn pb_encode_submessage(pb_ostream_t *stream, const pb_field_t fields[], const void *src_struct)
+{
+ /* First calculate the message size using a non-writing substream. */
+ pb_ostream_t substream = PB_OSTREAM_SIZING;
+ size_t size;
+ bool status;
+
+ if (!pb_encode(&substream, fields, src_struct))
+ {
+#ifndef PB_NO_ERRMSG
+ stream->errmsg = substream.errmsg;
+#endif
+ return false;
+ }
+
+ size = substream.bytes_written;
+
+ if (!pb_encode_varint(stream, (pb_uint64_t)size))
+ return false;
+
+ if (stream->callback == NULL)
+ return pb_write(stream, NULL, size); /* Just sizing */
+
+ if (stream->bytes_written + size > stream->max_size)
+ PB_RETURN_ERROR(stream, "stream full");
+
+ /* Use a substream to verify that a callback doesn't write more than
+ * what it did the first time. */
+ substream.callback = stream->callback;
+ substream.state = stream->state;
+ substream.max_size = size;
+ substream.bytes_written = 0;
+#ifndef PB_NO_ERRMSG
+ substream.errmsg = NULL;
+#endif
+
+ status = pb_encode(&substream, fields, src_struct);
+
+ stream->bytes_written += substream.bytes_written;
+ stream->state = substream.state;
+#ifndef PB_NO_ERRMSG
+ stream->errmsg = substream.errmsg;
+#endif
+
+ if (substream.bytes_written != size)
+ PB_RETURN_ERROR(stream, "submsg size changed");
+
+ return status;
+}
+
+/* Field encoders */
+
+static bool checkreturn pb_enc_varint(pb_ostream_t *stream, const pb_field_t *field, const void *src)
+{
+ pb_int64_t value = 0;
+
+ if (field->data_size == sizeof(int_least8_t))
+ value = *(const int_least8_t*)src;
+ else if (field->data_size == sizeof(int_least16_t))
+ value = *(const int_least16_t*)src;
+ else if (field->data_size == sizeof(int32_t))
+ value = *(const int32_t*)src;
+ else if (field->data_size == sizeof(pb_int64_t))
+ value = *(const pb_int64_t*)src;
+ else
+ PB_RETURN_ERROR(stream, "invalid data_size");
+
+#ifdef PB_WITHOUT_64BIT
+ if (value < 0)
+ return pb_encode_negative_varint(stream, (pb_uint64_t)value);
+ else
+#endif
+ return pb_encode_varint(stream, (pb_uint64_t)value);
+}
+
+static bool checkreturn pb_enc_uvarint(pb_ostream_t *stream, const pb_field_t *field, const void *src)
+{
+ pb_uint64_t value = 0;
+
+ if (field->data_size == sizeof(uint_least8_t))
+ value = *(const uint_least8_t*)src;
+ else if (field->data_size == sizeof(uint_least16_t))
+ value = *(const uint_least16_t*)src;
+ else if (field->data_size == sizeof(uint32_t))
+ value = *(const uint32_t*)src;
+ else if (field->data_size == sizeof(pb_uint64_t))
+ value = *(const pb_uint64_t*)src;
+ else
+ PB_RETURN_ERROR(stream, "invalid data_size");
+
+ return pb_encode_varint(stream, value);
+}
+
+static bool checkreturn pb_enc_svarint(pb_ostream_t *stream, const pb_field_t *field, const void *src)
+{
+ pb_int64_t value = 0;
+
+ if (field->data_size == sizeof(int_least8_t))
+ value = *(const int_least8_t*)src;
+ else if (field->data_size == sizeof(int_least16_t))
+ value = *(const int_least16_t*)src;
+ else if (field->data_size == sizeof(int32_t))
+ value = *(const int32_t*)src;
+ else if (field->data_size == sizeof(pb_int64_t))
+ value = *(const pb_int64_t*)src;
+ else
+ PB_RETURN_ERROR(stream, "invalid data_size");
+
+ return pb_encode_svarint(stream, value);
+}
+
+static bool checkreturn pb_enc_fixed64(pb_ostream_t *stream, const pb_field_t *field, const void *src)
+{
+ PB_UNUSED(field);
+#ifndef PB_WITHOUT_64BIT
+ return pb_encode_fixed64(stream, src);
+#else
+ PB_UNUSED(src);
+ PB_RETURN_ERROR(stream, "no 64bit support");
+#endif
+}
+
+static bool checkreturn pb_enc_fixed32(pb_ostream_t *stream, const pb_field_t *field, const void *src)
+{
+ PB_UNUSED(field);
+ return pb_encode_fixed32(stream, src);
+}
+
+static bool checkreturn pb_enc_bytes(pb_ostream_t *stream, const pb_field_t *field, const void *src)
+{
+ const pb_bytes_array_t *bytes = NULL;
+
+ bytes = (const pb_bytes_array_t*)src;
+
+ if (src == NULL)
+ {
+ /* Treat null pointer as an empty bytes field */
+ return pb_encode_string(stream, NULL, 0);
+ }
+
+ if (PB_ATYPE(field->type) == PB_ATYPE_STATIC &&
+ PB_BYTES_ARRAY_T_ALLOCSIZE(bytes->size) > field->data_size)
+ {
+ PB_RETURN_ERROR(stream, "bytes size exceeded");
+ }
+
+ return pb_encode_string(stream, bytes->bytes, bytes->size);
+}
+
+static bool checkreturn pb_enc_string(pb_ostream_t *stream, const pb_field_t *field, const void *src)
+{
+ size_t size = 0;
+ size_t max_size = field->data_size;
+ const char *p = (const char*)src;
+
+ if (PB_ATYPE(field->type) == PB_ATYPE_POINTER)
+ max_size = (size_t)-1;
+
+ if (src == NULL)
+ {
+ size = 0; /* Treat null pointer as an empty string */
+ }
+ else
+ {
+ /* strnlen() is not always available, so just use a loop */
+ while (size < max_size && *p != '\0')
+ {
+ size++;
+ p++;
+ }
+ }
+
+ return pb_encode_string(stream, (const pb_byte_t*)src, size);
+}
+
+static bool checkreturn pb_enc_submessage(pb_ostream_t *stream, const pb_field_t *field, const void *src)
+{
+ if (field->ptr == NULL)
+ PB_RETURN_ERROR(stream, "invalid field descriptor");
+
+ return pb_encode_submessage(stream, (const pb_field_t*)field->ptr, src);
+}
+
+static bool checkreturn pb_enc_fixed_length_bytes(pb_ostream_t *stream, const pb_field_t *field, const void *src)
+{
+ return pb_encode_string(stream, (const pb_byte_t*)src, field->data_size);
+}
+
diff --git a/security/container/protos/nanopb/pb_encode.h b/security/container/protos/nanopb/pb_encode.h
new file mode 100644
index 0000000..8bf78dd
--- /dev/null
+++ b/security/container/protos/nanopb/pb_encode.h
@@ -0,0 +1,170 @@
+/* pb_encode.h: Functions to encode protocol buffers. Depends on pb_encode.c.
+ * The main function is pb_encode. You also need an output stream, and the
+ * field descriptions created by nanopb_generator.py.
+ */
+
+#ifndef PB_ENCODE_H_INCLUDED
+#define PB_ENCODE_H_INCLUDED
+
+#include "pb.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+/* Structure for defining custom output streams. You will need to provide
+ * a callback function to write the bytes to your storage, which can be
+ * for example a file or a network socket.
+ *
+ * The callback must conform to these rules:
+ *
+ * 1) Return false on IO errors. This will cause encoding to abort.
+ * 2) You can use state to store your own data (e.g. buffer pointer).
+ * 3) pb_write will update bytes_written after your callback runs.
+ * 4) Substreams will modify max_size and bytes_written. Don't use them
+ * to calculate any pointers.
+ */
+struct pb_ostream_s
+{
+#ifdef PB_BUFFER_ONLY
+ /* Callback pointer is not used in buffer-only configuration.
+ * Having an int pointer here allows binary compatibility but
+ * gives an error if someone tries to assign callback function.
+ * Also, NULL pointer marks a 'sizing stream' that does not
+ * write anything.
+ */
+ int *callback;
+#else
+ bool (*callback)(pb_ostream_t *stream, const pb_byte_t *buf, size_t count);
+#endif
+ void *state; /* Free field for use by callback implementation. */
+ size_t max_size; /* Limit number of output bytes written (or use SIZE_MAX). */
+ size_t bytes_written; /* Number of bytes written so far. */
+
+#ifndef PB_NO_ERRMSG
+ const char *errmsg;
+#endif
+};
+
+/***************************
+ * Main encoding functions *
+ ***************************/
+
+/* Encode a single protocol buffers message from C structure into a stream.
+ * Returns true on success, false on any failure.
+ * The actual struct pointed to by src_struct must match the description in fields.
+ * All required fields in the struct are assumed to have been filled in.
+ *
+ * Example usage:
+ * MyMessage msg = {};
+ * uint8_t buffer[64];
+ * pb_ostream_t stream;
+ *
+ * msg.field1 = 42;
+ * stream = pb_ostream_from_buffer(buffer, sizeof(buffer));
+ * pb_encode(&stream, MyMessage_fields, &msg);
+ */
+bool pb_encode(pb_ostream_t *stream, const pb_field_t fields[], const void *src_struct);
+
+/* Same as pb_encode, but prepends the length of the message as a varint.
+ * Corresponds to writeDelimitedTo() in Google's protobuf API.
+ */
+bool pb_encode_delimited(pb_ostream_t *stream, const pb_field_t fields[], const void *src_struct);
+
+/* Same as pb_encode, but appends a null byte to the message for termination.
+ * NOTE: This behaviour is not supported in most other protobuf implementations, so pb_encode_delimited()
+ * is a better option for compatibility.
+ */
+bool pb_encode_nullterminated(pb_ostream_t *stream, const pb_field_t fields[], const void *src_struct);
+
+/* Encode the message to get the size of the encoded data, but do not store
+ * the data. */
+bool pb_get_encoded_size(size_t *size, const pb_field_t fields[], const void *src_struct);
+
+/**************************************
+ * Functions for manipulating streams *
+ **************************************/
+
+/* Create an output stream for writing into a memory buffer.
+ * The number of bytes written can be found in stream.bytes_written after
+ * encoding the message.
+ *
+ * Alternatively, you can use a custom stream that writes directly to e.g.
+ * a file or a network socket.
+ */
+pb_ostream_t pb_ostream_from_buffer(pb_byte_t *buf, size_t bufsize);
+
+/* Pseudo-stream for measuring the size of a message without actually storing
+ * the encoded data.
+ *
+ * Example usage:
+ * MyMessage msg = {};
+ * pb_ostream_t stream = PB_OSTREAM_SIZING;
+ * pb_encode(&stream, MyMessage_fields, &msg);
+ * printf("Message size is %d\n", stream.bytes_written);
+ */
+#ifndef PB_NO_ERRMSG
+#define PB_OSTREAM_SIZING {0,0,0,0,0}
+#else
+#define PB_OSTREAM_SIZING {0,0,0,0}
+#endif
+
+/* Function to write into a pb_ostream_t stream. You can use this if you need
+ * to append or prepend some custom headers to the message.
+ */
+bool pb_write(pb_ostream_t *stream, const pb_byte_t *buf, size_t count);
+
+
+/************************************************
+ * Helper functions for writing field callbacks *
+ ************************************************/
+
+/* Encode field header based on type and field number defined in the field
+ * structure. Call this from the callback before writing out field contents. */
+bool pb_encode_tag_for_field(pb_ostream_t *stream, const pb_field_t *field);
+
+/* Encode field header by manually specifying the wire type. You need to use this
+ * if you want to write out packed arrays from a callback field. */
+bool pb_encode_tag(pb_ostream_t *stream, pb_wire_type_t wiretype, uint32_t field_number);
+
+/* Encode an integer in the varint format.
+ * This works for bool, enum, int32, int64, uint32 and uint64 field types. */
+#ifndef PB_WITHOUT_64BIT
+bool pb_encode_varint(pb_ostream_t *stream, uint64_t value);
+#else
+bool pb_encode_varint(pb_ostream_t *stream, uint32_t value);
+#endif
+
+/* Encode an integer in the zig-zagged svarint format.
+ * This works for sint32 and sint64. */
+#ifndef PB_WITHOUT_64BIT
+bool pb_encode_svarint(pb_ostream_t *stream, int64_t value);
+#else
+bool pb_encode_svarint(pb_ostream_t *stream, int32_t value);
+#endif
+
+/* Encode a string or bytes type field. For strings, pass strlen(s) as size. */
+bool pb_encode_string(pb_ostream_t *stream, const pb_byte_t *buffer, size_t size);
+
+/* Encode a fixed32, sfixed32 or float value.
+ * You need to pass a pointer to a 4-byte wide C variable. */
+bool pb_encode_fixed32(pb_ostream_t *stream, const void *value);
+
+#ifndef PB_WITHOUT_64BIT
+/* Encode a fixed64, sfixed64 or double value.
+ * You need to pass a pointer to an 8-byte wide C variable. */
+bool pb_encode_fixed64(pb_ostream_t *stream, const void *value);
+#endif
+
+/* Encode a submessage field.
+ * You need to pass the pb_field_t array and pointer to struct, just like
+ * with pb_encode(). This internally encodes the submessage twice, first to
+ * calculate message size and then to actually write it out.
+ */
+bool pb_encode_submessage(pb_ostream_t *stream, const pb_field_t fields[], const void *src_struct);
+
+#ifdef __cplusplus
+} /* extern "C" */
+#endif
+
+#endif
diff --git a/security/container/protos/pbsystem.h b/security/container/protos/pbsystem.h
new file mode 100644
index 0000000..f2308f8
--- /dev/null
+++ b/security/container/protos/pbsystem.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Header and types for nanopb to work with the Linux kernel */
+#include <linux/kernel.h>
+#include <linux/string.h>
+
+/* Small types. */
+
+/* Signed. */
+typedef signed char int_least8_t;
+typedef short int int_least16_t;
+typedef int int_least32_t;
+typedef long int int_least64_t;
+
+/* Unsigned. */
+typedef unsigned char uint_least8_t;
+typedef unsigned short int uint_least16_t;
+typedef unsigned int uint_least32_t;
+typedef unsigned long int uint_least64_t;
+
+/* Fast types. */
+
+/* Signed. */
+typedef signed char int_fast8_t;
+typedef long int int_fast16_t;
+typedef long int int_fast32_t;
+typedef long int int_fast64_t;
+
+/* Unsigned. */
+typedef unsigned char uint_fast8_t;
+typedef unsigned long int uint_fast16_t;
+typedef unsigned long int uint_fast32_t;
+typedef unsigned long int uint_fast64_t;
diff --git a/security/container/vsock.c b/security/container/vsock.c
new file mode 100644
index 0000000..52ac27e
--- /dev/null
+++ b/security/container/vsock.c
@@ -0,0 +1,559 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Container Security Monitor module
+ *
+ * Copyright (c) 2018 Google, Inc
+ */
+
+#include "monitor.h"
+
+#include <net/net_namespace.h>
+#include <net/vsock_addr.h>
+#include <net/sock.h>
+#include <linux/socket.h>
+#include <linux/workqueue.h>
+#include <linux/jiffies.h>
+#include <linux/mutex.h>
+#include <linux/version.h>
+#include <linux/kthread.h>
+#include <linux/printk.h>
+#include <linux/delay.h>
+#include <linux/timekeeping.h>
+
+/*
+ * virtio vsocket over which to send events to the host.
+ * NULL if monitoring is disabled, or if the socket was disconnected and we're
+ * trying to reconnect to the host.
+ */
+static struct socket *csm_vsocket;
+
+/* reconnect delay */
+#define CSM_RECONNECT_FREQ_MSEC 5000
+
+/* config pull delay */
+#define CSM_CONFIG_FREQ_MSEC 1000
+
+/* vsock receive attempts and delay until giving up */
+#define CSM_RECV_ATTEMPTS 2
+#define CSM_RECV_DELAY_MSEC 100
+
+/* heartbeat work */
+#define CSM_HEARTBEAT_FREQ msecs_to_jiffies(5000)
+static void csm_heartbeat(struct work_struct *work);
+static DECLARE_DELAYED_WORK(csm_heartbeat_work, csm_heartbeat);
+
+/* csm protobuf work */
+static void csm_sendmsg_pipe_handler(struct work_struct *work);
+
+/* csm message work container */
+struct msg_work_data {
+ struct work_struct msg_work;
+ size_t pos_bytes_written;
+ char msg[];
+};
+
+/* size used for the config error message. */
+#define CSM_ERROR_BUF_SIZE 40
+
+/* Running thread to manage vsock connections. */
+static struct task_struct *socket_thread;
+
+/* Mutex to ensure sequential dumping of protos */
+static DEFINE_MUTEX(protodump);
+
+static struct socket *csm_create_socket(void)
+{
+ int err;
+ struct sockaddr_vm host_addr;
+ struct socket *sock;
+
+ err = sock_create_kern(&init_net, AF_VSOCK, SOCK_STREAM, 0,
+ &sock);
+ if (err) {
+ pr_debug("error creating AF_VSOCK socket: %d\n", err);
+ return ERR_PTR(err);
+ }
+
+ vsock_addr_init(&host_addr, VMADDR_CID_HYPERVISOR, CSM_HOST_PORT);
+
+ err = kernel_connect(sock, (struct sockaddr *)&host_addr,
+ sizeof(host_addr), 0);
+ if (err) {
+ if (err != -ECONNRESET) {
+ pr_debug("error connecting AF_VSOCK socket to host port %u: %d\n",
+ CSM_HOST_PORT, err);
+ }
+ goto error_release;
+ }
+
+ return sock;
+
+error_release:
+ sock_release(sock);
+ return ERR_PTR(err);
+}
+
+static void csm_destroy_socket(void)
+{
+ down_write(&csm_rwsem_vsocket);
+ if (csm_vsocket) {
+ sock_release(csm_vsocket);
+ csm_vsocket = NULL;
+ }
+ up_write(&csm_rwsem_vsocket);
+}
+
+static int csm_vsock_sendmsg(struct kvec *vecs, size_t vecs_size,
+ size_t total_length)
+{
+ struct msghdr msg = { };
+ int res = -EPIPE;
+
+ if (!cmdline_boot_vsock_enabled)
+ return 0;
+
+ down_read(&csm_rwsem_vsocket);
+ if (csm_vsocket) {
+ res = kernel_sendmsg(csm_vsocket, &msg, vecs, vecs_size,
+ total_length);
+ if (res > 0)
+ res = 0;
+ }
+ up_read(&csm_rwsem_vsocket);
+
+ return res;
+}
+
+static ssize_t csm_user_pipe_write(struct kvec *vecs, size_t vecs_size,
+ size_t total_length)
+{
+ ssize_t perr = 0;
+ struct iov_iter io = { };
+ loff_t pos = 0;
+ struct pipe_inode_info *pipe;
+ unsigned int readers;
+
+ if (!csm_user_write_pipe)
+ return 0;
+
+ down_read(&csm_rwsem_pipe);
+
+ if (csm_user_write_pipe == NULL)
+ goto end;
+
+ /* The pipe info is the same for the reader and writer files. */
+ pipe = get_pipe_info(csm_user_write_pipe, false);
+
+ /* If nobody is listening, don't write events. */
+ readers = READ_ONCE(pipe->readers);
+ if (readers <= 1) {
+ WARN_ON(readers == 0);
+ goto end;
+ }
+
+
+ iov_iter_kvec(&io, WRITE, vecs, vecs_size,
+ total_length);
+
+ file_start_write(csm_user_write_pipe);
+ perr = vfs_iter_write(csm_user_write_pipe, &io, &pos, 0);
+ file_end_write(csm_user_write_pipe);
+
+end:
+ up_read(&csm_rwsem_pipe);
+ return perr;
+}
+
+static int csm_sendmsg(int type, const void *buf, size_t len)
+{
+ struct csm_msg_hdr hdr = {
+ .msg_type = cpu_to_le32(type),
+ .msg_length = cpu_to_le32(sizeof(hdr) + len),
+ };
+ struct kvec vecs[] = {
+ {
+ .iov_base = &hdr,
+ .iov_len = sizeof(hdr),
+ }, {
+ .iov_base = (void *)buf,
+ .iov_len = len,
+ }
+ };
+ int res;
+ ssize_t perr;
+
+ res = csm_vsock_sendmsg(vecs, ARRAY_SIZE(vecs),
+ le32_to_cpu(hdr.msg_length));
+ if (res < 0) {
+ pr_warn_ratelimited("sendmsg error (msg_type=%d, msg_length=%u): %d\n",
+ type, le32_to_cpu(hdr.msg_length), res);
+ }
+
+ perr = csm_user_pipe_write(vecs, ARRAY_SIZE(vecs),
+ le32_to_cpu(hdr.msg_length));
+ if (perr < 0) {
+ pr_warn_ratelimited("vfs_iter_write error (msg_type=%d, msg_length=%u): %zd\n",
+ type, le32_to_cpu(hdr.msg_length), perr);
+ }
+
+ /* If one of them failed, increase the stats once. */
+ if (res < 0 || perr < 0)
+ csm_stats.event_writing_failed++;
+
+ return res;
+}
+
+static bool csm_get_expected_size(size_t *size, const pb_field_t fields[],
+ const void *src_struct)
+{
+ schema_Event *event;
+
+ if (fields != schema_Event_fields)
+ goto other;
+
+ /* Size above 99% of the 100 containers tested running k8s. */
+ event = (schema_Event *)src_struct;
+ switch (event->which_event) {
+ case schema_Event_execute_tag:
+ *size = 3344;
+ return true;
+ case schema_Event_memexec_tag:
+ *size = 176;
+ return true;
+ case schema_Event_clone_tag:
+ *size = 50;
+ return true;
+ case schema_Event_exit_tag:
+ *size = 30;
+ return true;
+ }
+
+other:
+ /* If unknown, do the pre-computation. */
+ return pb_get_encoded_size(size, fields, src_struct);
+}
+
+static struct msg_work_data *csm_encodeproto(size_t size,
+ const pb_field_t fields[],
+ const void *src_struct)
+{
+ pb_ostream_t pos;
+ struct msg_work_data *wd;
+ size_t total;
+
+ total = size + sizeof(*wd);
+ if (total < size)
+ return ERR_PTR(-EINVAL);
+
+ wd = kmalloc(total, GFP_KERNEL);
+ if (!wd)
+ return ERR_PTR(-ENOMEM);
+
+ pos = pb_ostream_from_buffer(wd->msg, size);
+ if (!pb_encode(&pos, fields, src_struct)) {
+ kfree(wd);
+ return ERR_PTR(-EINVAL);
+ }
+
+ INIT_WORK(&wd->msg_work, csm_sendmsg_pipe_handler);
+ wd->pos_bytes_written = pos.bytes_written;
+ return wd;
+}
+
+static int csm_sendproto(int type, const pb_field_t fields[],
+ const void *src_struct)
+{
+ int err = 0;
+ size_t size, previous_size;
+ struct msg_work_data *wd;
+
+ /* Use the expected size first. */
+ if (!csm_get_expected_size(&size, fields, src_struct))
+ return -EINVAL;
+
+ wd = csm_encodeproto(size, fields, src_struct);
+ if (unlikely(IS_ERR(wd))) {
+ /* If it failed, retry with the exact size. */
+ csm_stats.size_picking_failed++;
+ previous_size = size;
+
+ if (!pb_get_encoded_size(&size, fields, src_struct))
+ return -EINVAL;
+
+ wd = csm_encodeproto(size, fields, src_struct);
+ if (IS_ERR(wd)) {
+ csm_stats.proto_encoding_failed++;
+ return PTR_ERR(wd);
+ }
+
+ pr_debug("size picking failed %zu vs %zu\n", previous_size,
+ size);
+ }
+
+ /* The work handler takes care of cleanup, if successfully scheduled. */
+ if (likely(schedule_work(&wd->msg_work)))
+ return 0;
+
+ csm_stats.workqueue_failed++;
+ pr_err_ratelimited("Sent msg to workqueue unsuccessfully (assume dropped).\n");
+
+ kfree(wd);
+ return err;
+}
+
+static void csm_sendmsg_pipe_handler(struct work_struct *work)
+{
+ int err;
+ int type = CSM_MSG_EVENT_PROTO;
+ struct msg_work_data *wd = container_of(work, struct msg_work_data,
+ msg_work);
+
+ err = csm_sendmsg(type, wd->msg, wd->pos_bytes_written);
+ if (err)
+ pr_err_ratelimited("csm_sendmsg failed in work handler %s\n",
+ __func__);
+
+ kfree(wd);
+}
+
+int csm_sendeventproto(const pb_field_t fields[], schema_Event *event)
+{
+ /* Last check before generating and sending an event. */
+ if (!csm_enabled)
+ return -ENOTSUPP;
+
+ event->timestamp = ktime_get_real_ns();
+
+ return csm_sendproto(CSM_MSG_EVENT_PROTO, fields, event);
+}
+
+int csm_sendconfigrespproto(const pb_field_t fields[],
+ schema_ConfigurationResponse *resp)
+{
+ return csm_sendproto(CSM_MSG_CONFIG_RESPONSE_PROTO, fields, resp);
+}
+
+static void csm_heartbeat(struct work_struct *work)
+{
+ csm_sendmsg(CSM_MSG_TYPE_HEARTBEAT, NULL, 0);
+ schedule_delayed_work(&csm_heartbeat_work, CSM_HEARTBEAT_FREQ);
+}
+
+static int config_send_response(int err)
+{
+ char buf[CSM_ERROR_BUF_SIZE] = {};
+ schema_ConfigurationResponse resp =
+ schema_ConfigurationResponse_init_zero;
+
+ resp.error = schema_ConfigurationResponse_ErrorCode_NO_ERROR;
+ resp.version = CSM_VERSION;
+ resp.kernel_version = LINUX_VERSION_CODE;
+
+ if (err) {
+ resp.error = schema_ConfigurationResponse_ErrorCode_UNKNOWN;
+ snprintf(buf, sizeof(buf) - 1, "error code: %d", err);
+ resp.msg.funcs.encode = pb_encode_string_field;
+ resp.msg.arg = buf;
+ }
+
+ return csm_sendconfigrespproto(schema_ConfigurationResponse_fields,
+ &resp);
+}
+
+static int csm_recvmsg(void *buf, size_t len, bool expected)
+{
+ int err = 0;
+ struct msghdr msg = {};
+ struct kvec vecs;
+ size_t pos = 0;
+ size_t attempts = 0;
+
+ while (pos < len) {
+ vecs.iov_base = (char *)buf + pos;
+ vecs.iov_len = len - pos;
+
+ down_read(&csm_rwsem_vsocket);
+ if (csm_vsocket) {
+ err = kernel_recvmsg(csm_vsocket, &msg, &vecs, 1, len,
+ MSG_DONTWAIT);
+ } else {
+ pr_err("csm_vsocket was unset while the config thread was running\n");
+ err = -ENOENT;
+ }
+ up_read(&csm_rwsem_vsocket);
+
+ if (err == 0) {
+ err = -ENOTCONN;
+ pr_warn_ratelimited("vsock connection was reset\n");
+ break;
+ }
+
+ if (err == -EAGAIN) {
+ /*
+ * If nothing is received and nothing was expected
+ * just bail.
+ */
+ if (!expected && pos == 0) {
+ err = -EAGAIN;
+ break;
+ }
+
+ /*
+ * If we are still missing data after multiple attempts,
+ * reset the connection.
+ */
+ if (++attempts >= CSM_RECV_ATTEMPTS) {
+ err = -EPIPE;
+ break;
+ }
+
+ msleep(CSM_RECV_DELAY_MSEC);
+ continue;
+ }
+
+ if (err < 0) {
+ pr_err_ratelimited("kernel_recvmsg failed with %d\n",
+ err);
+ break;
+ }
+
+ pos += err;
+ }
+
+ return err;
+}
+
+/*
+ * Listen for configuration until the connection is closed or desynchronized.
+ * If anything goes wrong while parsing the packet buffer in a way that may
+ * desynchronize the thread from the backend, the connection is reset.
+ */
+
+static void listen_configuration(void *buf)
+{
+ int err;
+ struct csm_msg_hdr hdr = {};
+ uint32_t msg_type, msg_length;
+
+ pr_debug("listening for configuration messages\n");
+
+ while (true) {
+ err = csm_recvmsg(&hdr, sizeof(hdr), false);
+
+ /* Nothing available, wait and try again. */
+ if (err == -EAGAIN) {
+ msleep(CSM_CONFIG_FREQ_MSEC);
+ continue;
+ }
+
+ if (err < 0)
+ break;
+
+ msg_type = le32_to_cpu(hdr.msg_type);
+
+ if (msg_type != CSM_MSG_CONFIG_REQUEST_PROTO) {
+ pr_warn_ratelimited("unexpected message type: %d\n",
+ msg_type);
+ break;
+ }
+
+ msg_length = le32_to_cpu(hdr.msg_length);
+
+ if (msg_length <= sizeof(hdr) || msg_length > PAGE_SIZE) {
+ pr_warn_ratelimited("unexpected message length: %d\n",
+ msg_length);
+ break;
+ }
+
+ /* The message length includes the size of the header. */
+ msg_length -= sizeof(hdr);
+
+ err = csm_recvmsg(buf, msg_length, true);
+ if (err < 0) {
+ pr_warn_ratelimited("failed to gather configuration: %d\n",
+ err);
+ break;
+ }
+
+ err = csm_update_config_from_buffer(buf, msg_length);
+ if (err < 0) {
+ /*
+ * Warn of the error but continue listening for
+ * configuration changes.
+ */
+ pr_warn_ratelimited("config update failed: %d\n", err);
+ } else {
+ pr_debug("config received and applied\n");
+ }
+
+ err = config_send_response(err);
+ if (err < 0) {
+ pr_err_ratelimited("config response failed: %d\n", err);
+ break;
+ }
+
+ pr_debug("config response sent\n");
+ }
+}
+
+/* Thread managing connection and listening for new configurations. */
+static int socket_thread_fn(void *unused)
+{
+ void *buf;
+ struct socket *sock;
+
+ /* One page should be enough for current configurations. */
+ buf = (void *)__get_free_page(GFP_KERNEL);
+ if (!buf)
+ return -ENOMEM;
+
+ while (true) {
+ sock = csm_create_socket();
+ if (IS_ERR(sock)) {
+ pr_debug("unable to connect to host (port %u), will retry in %u ms\n",
+ CSM_HOST_PORT, CSM_RECONNECT_FREQ_MSEC);
+ msleep(CSM_RECONNECT_FREQ_MSEC);
+ continue;
+ }
+
+ down_write(&csm_rwsem_vsocket);
+ csm_vsocket = sock;
+ up_write(&csm_rwsem_vsocket);
+
+ schedule_delayed_work(&csm_heartbeat_work, 0);
+
+ listen_configuration(buf);
+
+ pr_warn("vsock state incorrect, disconnecting. Messages will be lost.\n");
+
+ cancel_delayed_work_sync(&csm_heartbeat_work);
+ csm_destroy_socket();
+ }
+
+ return 0;
+}
+
+void __init vsock_destroy(void)
+{
+ if (socket_thread) {
+ kthread_stop(socket_thread);
+ socket_thread = NULL;
+ }
+}
+
+int __init vsock_initialize(void)
+{
+ struct task_struct *task;
+
+ if (cmdline_boot_vsock_enabled) {
+ task = kthread_run(socket_thread_fn, NULL, "csm-vsock-thread");
+ if (IS_ERR(task)) {
+ pr_err("failed to create socket thread: %ld\n", PTR_ERR(task));
+ vsock_destroy();
+ return PTR_ERR(task);
+ }
+
+ socket_thread = task;
+ }
+ return 0;
+}
diff --git a/security/integrity/ima/ima_main.c b/security/integrity/ima/ima_main.c
index 2d1af88..cb2deaa 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -501,37 +501,14 @@
}
EXPORT_SYMBOL_GPL(ima_file_check);
-/**
- * ima_file_hash - return the stored measurement if a file has been hashed and
- * is in the iint cache.
- * @file: pointer to the file
- * @buf: buffer in which to store the hash
- * @buf_size: length of the buffer
- *
- * On success, return the hash algorithm (as defined in the enum hash_algo).
- * If buf is not NULL, this function also outputs the hash into buf.
- * If the hash is larger than buf_size, then only buf_size bytes will be copied.
- * It generally just makes sense to pass a buffer capable of holding the largest
- * possible hash: IMA_MAX_DIGEST_SIZE.
- * The file hash returned is based on the entire file, including the appended
- * signature.
- *
- * If IMA is disabled or if no measurement is available, return -EOPNOTSUPP.
- * If the parameters are incorrect, return -EINVAL.
- */
-int ima_file_hash(struct file *file, char *buf, size_t buf_size)
+static int __ima_inode_hash(struct inode *inode, char *buf, size_t buf_size)
{
- struct inode *inode;
struct integrity_iint_cache *iint;
int hash_algo;
- if (!file)
- return -EINVAL;
-
if (!ima_policy_flag)
return -EOPNOTSUPP;
- inode = file_inode(file);
iint = integrity_iint_find(inode);
if (!iint)
return -EOPNOTSUPP;
@@ -558,9 +535,62 @@
return hash_algo;
}
+
+/**
+ * ima_file_hash - return the stored measurement if a file has been hashed and
+ * is in the iint cache.
+ * @file: pointer to the file
+ * @buf: buffer in which to store the hash
+ * @buf_size: length of the buffer
+ *
+ * On success, return the hash algorithm (as defined in the enum hash_algo).
+ * If buf is not NULL, this function also outputs the hash into buf.
+ * If the hash is larger than buf_size, then only buf_size bytes will be copied.
+ * It generally just makes sense to pass a buffer capable of holding the largest
+ * possible hash: IMA_MAX_DIGEST_SIZE.
+ * The file hash returned is based on the entire file, including the appended
+ * signature.
+ *
+ * If IMA is disabled or if no measurement is available, return -EOPNOTSUPP.
+ * If the parameters are incorrect, return -EINVAL.
+ */
+int ima_file_hash(struct file *file, char *buf, size_t buf_size)
+{
+ if (!file)
+ return -EINVAL;
+
+ return __ima_inode_hash(file_inode(file), buf, buf_size);
+}
EXPORT_SYMBOL_GPL(ima_file_hash);
/**
+ * ima_inode_hash - return the stored measurement if the inode has been hashed
+ * and is in the iint cache.
+ * @inode: pointer to the inode
+ * @buf: buffer in which to store the hash
+ * @buf_size: length of the buffer
+ *
+ * On success, return the hash algorithm (as defined in the enum hash_algo).
+ * If buf is not NULL, this function also outputs the hash into buf.
+ * If the hash is larger than buf_size, then only buf_size bytes will be copied.
+ * It generally just makes sense to pass a buffer capable of holding the largest
+ * possible hash: IMA_MAX_DIGEST_SIZE.
+ * The hash returned is based on the entire contents, including the appended
+ * signature.
+ *
+ * If IMA is disabled or if no measurement is available, return -EOPNOTSUPP.
+ * If the parameters are incorrect, return -EINVAL.
+ */
+int ima_inode_hash(struct inode *inode, char *buf, size_t buf_size)
+{
+ if (!inode)
+ return -EINVAL;
+
+ return __ima_inode_hash(inode, buf, buf_size);
+}
+EXPORT_SYMBOL_GPL(ima_inode_hash);
+
+/**
* ima_post_create_tmpfile - mark newly created tmpfile as new
* @file : newly created tmpfile
*
diff --git a/security/loadpin/loadpin.c b/security/loadpin/loadpin.c
index b12f7d9..7f3b735 100644
--- a/security/loadpin/loadpin.c
+++ b/security/loadpin/loadpin.c
@@ -249,7 +249,7 @@
};
/* Should not be mutable after boot, so not listed in sysfs (perm == 0). */
-module_param(enforce, int, 0);
-MODULE_PARM_DESC(enforce, "Enforce module/firmware pinning");
+module_param_named(enabled, enforce, int, 0);
+MODULE_PARM_DESC(enabled, "Enforce module/firmware pinning");
module_param_array_named(exclude, exclude_read_files, charp, NULL, 0);
MODULE_PARM_DESC(exclude, "Exclude pinning specific read file types");
diff --git a/security/security.c b/security/security.c
index a28045d..71f32b3 100644
--- a/security/security.c
+++ b/security/security.c
@@ -1476,6 +1476,11 @@
}
}
+void security_file_pre_free(struct file *file)
+{
+ call_void_hook(file_pre_free_security, file);
+}
+
int security_file_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
return call_int_hook(file_ioctl, 0, file, cmd, arg);
@@ -1591,6 +1596,11 @@
return rc;
}
+void security_task_post_alloc(struct task_struct *task)
+{
+ call_void_hook(task_post_alloc, task);
+}
+
void security_task_free(struct task_struct *task)
{
call_void_hook(task_free, task);
@@ -1803,6 +1813,11 @@
return call_int_hook(task_kill, 0, p, info, sig, cred);
}
+void security_task_exit(struct task_struct *p)
+{
+ call_void_hook(task_exit, p);
+}
+
int security_task_prctl(int option, unsigned long arg2, unsigned long arg3,
unsigned long arg4, unsigned long arg5)
{
diff --git a/tools/include/linux/filter.h b/tools/include/linux/filter.h
index ca28b6a..736bdec 100644
--- a/tools/include/linux/filter.h
+++ b/tools/include/linux/filter.h
@@ -169,15 +169,31 @@
.off = OFF, \
.imm = 0 })
-/* Atomic memory add, *(uint *)(dst_reg + off16) += src_reg */
+/*
+ * Atomic operations:
+ *
+ * BPF_ADD *(uint *) (dst_reg + off16) += src_reg
+ * BPF_AND *(uint *) (dst_reg + off16) &= src_reg
+ * BPF_OR *(uint *) (dst_reg + off16) |= src_reg
+ * BPF_XOR *(uint *) (dst_reg + off16) ^= src_reg
+ * BPF_ADD | BPF_FETCH src_reg = atomic_fetch_add(dst_reg + off16, src_reg);
+ * BPF_AND | BPF_FETCH src_reg = atomic_fetch_and(dst_reg + off16, src_reg);
+ * BPF_OR | BPF_FETCH src_reg = atomic_fetch_or(dst_reg + off16, src_reg);
+ * BPF_XOR | BPF_FETCH src_reg = atomic_fetch_xor(dst_reg + off16, src_reg);
+ * BPF_XCHG src_reg = atomic_xchg(dst_reg + off16, src_reg)
+ * BPF_CMPXCHG r0 = atomic_cmpxchg(dst_reg + off16, r0, src_reg)
+ */
-#define BPF_STX_XADD(SIZE, DST, SRC, OFF) \
+#define BPF_ATOMIC_OP(SIZE, OP, DST, SRC, OFF) \
((struct bpf_insn) { \
- .code = BPF_STX | BPF_SIZE(SIZE) | BPF_XADD, \
+ .code = BPF_STX | BPF_SIZE(SIZE) | BPF_ATOMIC, \
.dst_reg = DST, \
.src_reg = SRC, \
.off = OFF, \
- .imm = 0 })
+ .imm = OP })
+
+/* Legacy alias */
+#define BPF_STX_XADD(SIZE, DST, SRC, OFF) BPF_ATOMIC_OP(SIZE, BPF_ADD, DST, SRC, OFF)
/* Memory store, *(uint *) (dst_reg + off16) = imm32 */
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 556216dc..1276953 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -19,7 +19,8 @@
/* ld/ldx fields */
#define BPF_DW 0x18 /* double word (64-bit) */
-#define BPF_XADD 0xc0 /* exclusive add */
+#define BPF_ATOMIC 0xc0 /* atomic memory ops - op type in immediate */
+#define BPF_XADD 0xc0 /* exclusive add - legacy name */
/* alu/jmp fields */
#define BPF_MOV 0xb0 /* mov reg to reg */
@@ -43,6 +44,11 @@
#define BPF_CALL 0x80 /* function call */
#define BPF_EXIT 0x90 /* function return */
+/* atomic op type fields (stored in immediate) */
+#define BPF_FETCH 0x01 /* not an opcode on its own, used to build others */
+#define BPF_XCHG (0xe0 | BPF_FETCH) /* atomic exchange */
+#define BPF_CMPXCHG (0xf0 | BPF_FETCH) /* atomic compare-and-write */
+
/* Register numbers */
enum {
BPF_REG_0 = 0,
@@ -157,6 +163,7 @@
BPF_MAP_TYPE_STRUCT_OPS,
BPF_MAP_TYPE_RINGBUF,
BPF_MAP_TYPE_INODE_STORAGE,
+ BPF_MAP_TYPE_TASK_STORAGE,
};
/* Note that tracing related programs such as
@@ -1661,6 +1668,14 @@
* Return
* A 8-byte long non-decreasing number.
*
+ * u64 bpf_get_socket_cookie(struct sock *sk)
+ * Description
+ * Equivalent to **bpf_get_socket_cookie**\ () helper that accepts
+ * *sk*, but gets socket from a BTF **struct sock**. This helper
+ * also works for sleepable programs.
+ * Return
+ * An 8-byte long unique number or 0 if *sk* is NULL.
+ *
* u32 bpf_get_socket_uid(struct sk_buff *skb)
* Return
* The owner UID of the socket associated to *skb*. If the socket
@@ -2442,7 +2457,7 @@
* running simultaneously.
*
* A user should care about the synchronization by himself.
- * For example, by using the **BPF_STX_XADD** instruction to alter
+ * For example, by using the **BPF_ATOMIC** instructions to alter
* the shared data.
* Return
* A pointer to the local storage area.
@@ -3742,6 +3757,80 @@
* Return
* The helper returns **TC_ACT_REDIRECT** on success or
* **TC_ACT_SHOT** on error.
+ *
+ * void *bpf_task_storage_get(struct bpf_map *map, struct task_struct *task, void *value, u64 flags)
+ * Description
+ * Get a bpf_local_storage from the *task*.
+ *
+ * Logically, it could be thought of as getting the value from
+ * a *map* with *task* as the **key**. From this
+ * perspective, the usage is not much different from
+ * **bpf_map_lookup_elem**\ (*map*, **&**\ *task*) except this
+ * helper enforces the key must be a task_struct and the map must also
+ * be a **BPF_MAP_TYPE_TASK_STORAGE**.
+ *
+ * Underneath, the value is stored locally at *task* instead of
+ * the *map*. The *map* is used as the bpf-local-storage
+ * "type". The bpf-local-storage "type" (i.e. the *map*) is
+ * searched against all bpf_local_storage residing at *task*.
+ *
+ * An optional *flags* (**BPF_LOCAL_STORAGE_GET_F_CREATE**) can be
+ * used such that a new bpf_local_storage will be
+ * created if one does not exist. *value* can be used
+ * together with **BPF_LOCAL_STORAGE_GET_F_CREATE** to specify
+ * the initial value of a bpf_local_storage. If *value* is
+ * **NULL**, the new bpf_local_storage will be zero initialized.
+ * Return
+ * A bpf_local_storage pointer is returned on success.
+ *
+ * **NULL** if not found or there was an error in adding
+ * a new bpf_local_storage.
+ *
+ * long bpf_task_storage_delete(struct bpf_map *map, struct task_struct *task)
+ * Description
+ * Delete a bpf_local_storage from a *task*.
+ * Return
+ * 0 on success.
+ *
+ * **-ENOENT** if the bpf_local_storage cannot be found.
+ *
+ * struct task_struct *bpf_get_current_task_btf(void)
+ * Description
+ * Return a BTF pointer to the "current" task.
+ * This pointer can also be used in helpers that accept an
+ * *ARG_PTR_TO_BTF_ID* of type *task_struct*.
+ * Return
+ * Pointer to the current task.
+ *
+ * long bpf_bprm_opts_set(struct linux_binprm *bprm, u64 flags)
+ * Description
+ * Set or clear certain options on *bprm*:
+ *
+ * **BPF_F_BPRM_SECUREEXEC** Set the secureexec bit
+ * which sets the **AT_SECURE** auxv for glibc. The bit
+ * is cleared if the flag is not specified.
+ * Return
+ * **-EINVAL** if invalid *flags* are passed, zero otherwise.
+ *
+ * u64 bpf_ktime_get_coarse_ns(void)
+ * Description
+ * Return a coarse-grained version of the time elapsed since
+ * system boot, in nanoseconds. Does not include time the system
+ * was suspended.
+ *
+ * See: **clock_gettime**\ (**CLOCK_MONOTONIC_COARSE**)
+ * Return
+ * Current *ktime*.
+ *
+ * long bpf_ima_inode_hash(struct inode *inode, void *dst, u32 size)
+ * Description
+ * Returns the stored IMA hash of the *inode* (if it's available).
+ * If the hash is larger than *size*, then only *size*
+ * bytes will be copied to *dst*.
+ * Return
+ * The **hash_algo** is returned on success,
+ * **-EOPNOTSUPP** if IMA is disabled or **-EINVAL** if
+ * invalid arguments are passed.
*/
#define __BPF_FUNC_MAPPER(FN) \
FN(unspec), \
@@ -3900,6 +3989,12 @@
FN(per_cpu_ptr), \
FN(this_cpu_ptr), \
FN(redirect_peer), \
+ FN(task_storage_get), \
+ FN(task_storage_delete), \
+ FN(get_current_task_btf), \
+ FN(bprm_opts_set), \
+ FN(ktime_get_coarse_ns), \
+ FN(ima_inode_hash), \
/* */
/* integer value in 'imm' field of BPF_CALL instruction selects which helper
@@ -4071,6 +4166,11 @@
BPF_LWT_ENCAP_IP,
};
+/* Flags for bpf_bprm_opts_set helper */
+enum {
+ BPF_F_BPRM_SECUREEXEC = (1ULL << 0),
+};
+
#define __bpf_md_ptr(type, name) \
union { \
type name; \
diff --git a/tools/testing/selftests/bpf/prog_tests/atomic_bounds.c b/tools/testing/selftests/bpf/prog_tests/atomic_bounds.c
new file mode 100644
index 0000000..addf127
--- /dev/null
+++ b/tools/testing/selftests/bpf/prog_tests/atomic_bounds.c
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <test_progs.h>
+
+#include "atomic_bounds.skel.h"
+
+void test_atomic_bounds(void)
+{
+ struct atomic_bounds *skel;
+ __u32 duration = 0;
+
+ skel = atomic_bounds__open_and_load();
+ if (CHECK(!skel, "skel_load", "couldn't load program\n"))
+ return;
+}
diff --git a/tools/testing/selftests/bpf/prog_tests/cgroup_attach_multi.c b/tools/testing/selftests/bpf/prog_tests/cgroup_attach_multi.c
index b549fcf..0a1fc98 100644
--- a/tools/testing/selftests/bpf/prog_tests/cgroup_attach_multi.c
+++ b/tools/testing/selftests/bpf/prog_tests/cgroup_attach_multi.c
@@ -45,13 +45,13 @@
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_map_lookup_elem),
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
BPF_MOV64_IMM(BPF_REG_1, val), /* r1 = 1 */
- BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_DW, BPF_REG_0, BPF_REG_1, 0, 0), /* xadd r0 += r1 */
+ BPF_ATOMIC_OP(BPF_DW, BPF_ADD, BPF_REG_0, BPF_REG_1, 0),
BPF_LD_MAP_FD(BPF_REG_1, cgroup_storage_fd),
BPF_MOV64_IMM(BPF_REG_2, 0),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_local_storage),
BPF_MOV64_IMM(BPF_REG_1, val),
- BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_W, BPF_REG_0, BPF_REG_1, 0, 0),
+ BPF_ATOMIC_OP(BPF_W, BPF_ADD, BPF_REG_0, BPF_REG_1, 0),
BPF_LD_MAP_FD(BPF_REG_1, percpu_cgroup_storage_fd),
BPF_MOV64_IMM(BPF_REG_2, 0),
diff --git a/tools/testing/selftests/bpf/progs/atomic_bounds.c b/tools/testing/selftests/bpf/progs/atomic_bounds.c
new file mode 100644
index 0000000..e5fff7f
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/atomic_bounds.c
@@ -0,0 +1,24 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_tracing.h>
+#include <stdbool.h>
+
+#ifdef ENABLE_ATOMICS_TESTS
+bool skip_tests __attribute((__section__(".data"))) = false;
+#else
+bool skip_tests = true;
+#endif
+
+SEC("fentry/bpf_fentry_test1")
+int BPF_PROG(sub, int x)
+{
+#ifdef ENABLE_ATOMICS_TESTS
+ int a = 0;
+ int b = __sync_fetch_and_add(&a, 1);
+ /* b is certainly 0 here. Can the verifier tell? */
+ while (b)
+ continue;
+#endif
+ return 0;
+}
diff --git a/tools/testing/selftests/bpf/test_cgroup_storage.c b/tools/testing/selftests/bpf/test_cgroup_storage.c
index d946252..0cda61d 100644
--- a/tools/testing/selftests/bpf/test_cgroup_storage.c
+++ b/tools/testing/selftests/bpf/test_cgroup_storage.c
@@ -29,7 +29,7 @@
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
BPF_FUNC_get_local_storage),
BPF_MOV64_IMM(BPF_REG_1, 1),
- BPF_STX_XADD(BPF_DW, BPF_REG_0, BPF_REG_1, 0),
+ BPF_ATOMIC_OP(BPF_DW, BPF_ADD, BPF_REG_0, BPF_REG_1, 0),
BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0, 0),
BPF_ALU64_IMM(BPF_AND, BPF_REG_1, 0x1),
BPF_MOV64_REG(BPF_REG_0, BPF_REG_1),
diff --git a/tools/testing/selftests/bpf/verifier/atomic_bounds.c b/tools/testing/selftests/bpf/verifier/atomic_bounds.c
new file mode 100644
index 0000000..e82183e
--- /dev/null
+++ b/tools/testing/selftests/bpf/verifier/atomic_bounds.c
@@ -0,0 +1,27 @@
+{
+ "BPF_ATOMIC bounds propagation, mem->reg",
+ .insns = {
+ /* a = 0; */
+ /*
+ * Note this is implemented with two separate instructions,
+ * where you might think one would suffice:
+ *
+ * BPF_ST_MEM(BPF_DW, BPF_REG_10, -8, 0),
+ *
+ * This is because BPF_ST_MEM doesn't seem to set the stack slot
+ * type to 0 when storing an immediate.
+ */
+ BPF_MOV64_IMM(BPF_REG_0, 0),
+ BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8),
+ /* b = atomic_fetch_add(&a, 1); */
+ BPF_MOV64_IMM(BPF_REG_1, 1),
+ BPF_ATOMIC_OP(BPF_DW, BPF_ADD | BPF_FETCH, BPF_REG_10, BPF_REG_1, -8),
+ /* Verifier should be able to tell that this infinite loop isn't reachable. */
+ /* if (b) while (true) continue; */
+ BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, -1),
+ BPF_EXIT_INSN(),
+ },
+ .result = ACCEPT,
+ .result_unpriv = REJECT,
+ .errstr_unpriv = "back-edge",
+},
diff --git a/tools/testing/selftests/bpf/verifier/ctx.c b/tools/testing/selftests/bpf/verifier/ctx.c
index 93d6b16..2308086 100644
--- a/tools/testing/selftests/bpf/verifier/ctx.c
+++ b/tools/testing/selftests/bpf/verifier/ctx.c
@@ -10,14 +10,13 @@
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
},
{
- "context stores via XADD",
+ "context stores via BPF_ATOMIC",
.insns = {
BPF_MOV64_IMM(BPF_REG_0, 0),
- BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_W, BPF_REG_1,
- BPF_REG_0, offsetof(struct __sk_buff, mark), 0),
+ BPF_ATOMIC_OP(BPF_W, BPF_ADD, BPF_REG_1, BPF_REG_0, offsetof(struct __sk_buff, mark)),
BPF_EXIT_INSN(),
},
- .errstr = "BPF_XADD stores into R1 ctx is not allowed",
+ .errstr = "BPF_ATOMIC stores into R1 ctx is not allowed",
.result = REJECT,
.prog_type = BPF_PROG_TYPE_SCHED_CLS,
},
diff --git a/tools/testing/selftests/bpf/verifier/direct_packet_access.c b/tools/testing/selftests/bpf/verifier/direct_packet_access.c
index ae72536..ac1e19d 100644
--- a/tools/testing/selftests/bpf/verifier/direct_packet_access.c
+++ b/tools/testing/selftests/bpf/verifier/direct_packet_access.c
@@ -333,7 +333,7 @@
BPF_MOV64_REG(BPF_REG_4, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_4, -8),
BPF_STX_MEM(BPF_DW, BPF_REG_4, BPF_REG_2, 0),
- BPF_STX_XADD(BPF_DW, BPF_REG_4, BPF_REG_5, 0),
+ BPF_ATOMIC_OP(BPF_DW, BPF_ADD, BPF_REG_4, BPF_REG_5, 0),
BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_4, 0),
BPF_STX_MEM(BPF_W, BPF_REG_2, BPF_REG_5, 0),
BPF_MOV64_IMM(BPF_REG_0, 0),
@@ -488,7 +488,7 @@
BPF_JMP_REG(BPF_JGT, BPF_REG_0, BPF_REG_3, 11),
BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_10, -8),
BPF_MOV64_IMM(BPF_REG_4, 0xffffffff),
- BPF_STX_XADD(BPF_DW, BPF_REG_10, BPF_REG_4, -8),
+ BPF_ATOMIC_OP(BPF_DW, BPF_ADD, BPF_REG_10, BPF_REG_4, -8),
BPF_LDX_MEM(BPF_DW, BPF_REG_4, BPF_REG_10, -8),
BPF_ALU64_IMM(BPF_RSH, BPF_REG_4, 49),
BPF_ALU64_REG(BPF_ADD, BPF_REG_4, BPF_REG_2),
diff --git a/tools/testing/selftests/bpf/verifier/leak_ptr.c b/tools/testing/selftests/bpf/verifier/leak_ptr.c
index d6eec17..73f0dea 100644
--- a/tools/testing/selftests/bpf/verifier/leak_ptr.c
+++ b/tools/testing/selftests/bpf/verifier/leak_ptr.c
@@ -5,7 +5,7 @@
BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0,
offsetof(struct __sk_buff, cb[0])),
BPF_LD_MAP_FD(BPF_REG_2, 0),
- BPF_STX_XADD(BPF_DW, BPF_REG_1, BPF_REG_2,
+ BPF_ATOMIC_OP(BPF_DW, BPF_ADD, BPF_REG_1, BPF_REG_2,
offsetof(struct __sk_buff, cb[0])),
BPF_EXIT_INSN(),
},
@@ -13,7 +13,7 @@
.errstr_unpriv = "R2 leaks addr into mem",
.result_unpriv = REJECT,
.result = REJECT,
- .errstr = "BPF_XADD stores into R1 ctx is not allowed",
+ .errstr = "BPF_ATOMIC stores into R1 ctx is not allowed",
},
{
"leak pointer into ctx 2",
@@ -21,14 +21,14 @@
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_0,
offsetof(struct __sk_buff, cb[0])),
- BPF_STX_XADD(BPF_DW, BPF_REG_1, BPF_REG_10,
+ BPF_ATOMIC_OP(BPF_DW, BPF_ADD, BPF_REG_1, BPF_REG_10,
offsetof(struct __sk_buff, cb[0])),
BPF_EXIT_INSN(),
},
.errstr_unpriv = "R10 leaks addr into mem",
.result_unpriv = REJECT,
.result = REJECT,
- .errstr = "BPF_XADD stores into R1 ctx is not allowed",
+ .errstr = "BPF_ATOMIC stores into R1 ctx is not allowed",
},
{
"leak pointer into ctx 3",
@@ -56,7 +56,7 @@
BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3),
BPF_MOV64_IMM(BPF_REG_3, 0),
BPF_STX_MEM(BPF_DW, BPF_REG_0, BPF_REG_3, 0),
- BPF_STX_XADD(BPF_DW, BPF_REG_0, BPF_REG_6, 0),
+ BPF_ATOMIC_OP(BPF_DW, BPF_ADD, BPF_REG_0, BPF_REG_6, 0),
BPF_MOV64_IMM(BPF_REG_0, 0),
BPF_EXIT_INSN(),
},
diff --git a/tools/testing/selftests/bpf/verifier/meta_access.c b/tools/testing/selftests/bpf/verifier/meta_access.c
index 205292b..b45e8af 100644
--- a/tools/testing/selftests/bpf/verifier/meta_access.c
+++ b/tools/testing/selftests/bpf/verifier/meta_access.c
@@ -171,7 +171,7 @@
BPF_MOV64_IMM(BPF_REG_5, 42),
BPF_MOV64_IMM(BPF_REG_6, 24),
BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_5, -8),
- BPF_STX_XADD(BPF_DW, BPF_REG_10, BPF_REG_6, -8),
+ BPF_ATOMIC_OP(BPF_DW, BPF_ADD, BPF_REG_10, BPF_REG_6, -8),
BPF_LDX_MEM(BPF_DW, BPF_REG_5, BPF_REG_10, -8),
BPF_JMP_IMM(BPF_JGT, BPF_REG_5, 100, 6),
BPF_ALU64_REG(BPF_ADD, BPF_REG_3, BPF_REG_5),
@@ -196,7 +196,7 @@
BPF_MOV64_IMM(BPF_REG_5, 42),
BPF_MOV64_IMM(BPF_REG_6, 24),
BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_5, -8),
- BPF_STX_XADD(BPF_DW, BPF_REG_10, BPF_REG_6, -8),
+ BPF_ATOMIC_OP(BPF_DW, BPF_ADD, BPF_REG_10, BPF_REG_6, -8),
BPF_LDX_MEM(BPF_DW, BPF_REG_5, BPF_REG_10, -8),
BPF_JMP_IMM(BPF_JGT, BPF_REG_5, 100, 6),
BPF_ALU64_REG(BPF_ADD, BPF_REG_2, BPF_REG_5),
diff --git a/tools/testing/selftests/bpf/verifier/unpriv.c b/tools/testing/selftests/bpf/verifier/unpriv.c
index 0d621c8..b61ba97 100644
--- a/tools/testing/selftests/bpf/verifier/unpriv.c
+++ b/tools/testing/selftests/bpf/verifier/unpriv.c
@@ -206,7 +206,8 @@
BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, -8),
BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1, 0),
BPF_MOV64_IMM(BPF_REG_0, 1),
- BPF_RAW_INSN(BPF_STX | BPF_XADD | BPF_DW, BPF_REG_10, BPF_REG_0, -8, 0),
+ BPF_RAW_INSN(BPF_STX | BPF_ATOMIC | BPF_DW,
+ BPF_REG_10, BPF_REG_0, -8, BPF_ADD),
BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_6, 0),
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0, BPF_FUNC_get_hash_recalc),
BPF_EXIT_INSN(),
diff --git a/tools/testing/selftests/bpf/verifier/value_illegal_alu.c b/tools/testing/selftests/bpf/verifier/value_illegal_alu.c
index ed1c2ce..4890628 100644
--- a/tools/testing/selftests/bpf/verifier/value_illegal_alu.c
+++ b/tools/testing/selftests/bpf/verifier/value_illegal_alu.c
@@ -82,7 +82,7 @@
BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
BPF_STX_MEM(BPF_DW, BPF_REG_2, BPF_REG_0, 0),
- BPF_STX_XADD(BPF_DW, BPF_REG_2, BPF_REG_3, 0),
+ BPF_ATOMIC_OP(BPF_DW, BPF_ADD, BPF_REG_2, BPF_REG_3, 0),
BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_2, 0),
BPF_ST_MEM(BPF_DW, BPF_REG_0, 0, 22),
BPF_EXIT_INSN(),
diff --git a/tools/testing/selftests/bpf/verifier/xadd.c b/tools/testing/selftests/bpf/verifier/xadd.c
index c5de2e6..b96ef35 100644
--- a/tools/testing/selftests/bpf/verifier/xadd.c
+++ b/tools/testing/selftests/bpf/verifier/xadd.c
@@ -3,7 +3,7 @@
.insns = {
BPF_MOV64_IMM(BPF_REG_0, 1),
BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8),
- BPF_STX_XADD(BPF_W, BPF_REG_10, BPF_REG_0, -7),
+ BPF_ATOMIC_OP(BPF_W, BPF_ADD, BPF_REG_10, BPF_REG_0, -7),
BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_10, -8),
BPF_EXIT_INSN(),
},
@@ -22,7 +22,7 @@
BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1),
BPF_EXIT_INSN(),
BPF_MOV64_IMM(BPF_REG_1, 1),
- BPF_STX_XADD(BPF_W, BPF_REG_0, BPF_REG_1, 3),
+ BPF_ATOMIC_OP(BPF_W, BPF_ADD, BPF_REG_0, BPF_REG_1, 3),
BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_0, 3),
BPF_EXIT_INSN(),
},
@@ -45,13 +45,13 @@
BPF_MOV64_IMM(BPF_REG_0, 1),
BPF_ST_MEM(BPF_W, BPF_REG_2, 0, 0),
BPF_ST_MEM(BPF_W, BPF_REG_2, 3, 0),
- BPF_STX_XADD(BPF_W, BPF_REG_2, BPF_REG_0, 1),
- BPF_STX_XADD(BPF_W, BPF_REG_2, BPF_REG_0, 2),
+ BPF_ATOMIC_OP(BPF_W, BPF_ADD, BPF_REG_2, BPF_REG_0, 1),
+ BPF_ATOMIC_OP(BPF_W, BPF_ADD, BPF_REG_2, BPF_REG_0, 2),
BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_2, 1),
BPF_EXIT_INSN(),
},
.result = REJECT,
- .errstr = "BPF_XADD stores into R2 pkt is not allowed",
+ .errstr = "BPF_ATOMIC stores into R2 pkt is not allowed",
.prog_type = BPF_PROG_TYPE_XDP,
.flags = F_NEEDS_EFFICIENT_UNALIGNED_ACCESS,
},
@@ -62,8 +62,8 @@
BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_0, -8),
- BPF_STX_XADD(BPF_DW, BPF_REG_10, BPF_REG_0, -8),
- BPF_STX_XADD(BPF_DW, BPF_REG_10, BPF_REG_0, -8),
+ BPF_ATOMIC_OP(BPF_DW, BPF_ADD, BPF_REG_10, BPF_REG_0, -8),
+ BPF_ATOMIC_OP(BPF_DW, BPF_ADD, BPF_REG_10, BPF_REG_0, -8),
BPF_JMP_REG(BPF_JNE, BPF_REG_6, BPF_REG_0, 3),
BPF_JMP_REG(BPF_JNE, BPF_REG_7, BPF_REG_10, 2),
BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_10, -8),
@@ -82,8 +82,8 @@
BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
BPF_MOV64_REG(BPF_REG_7, BPF_REG_10),
BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_0, -8),
- BPF_STX_XADD(BPF_W, BPF_REG_10, BPF_REG_0, -8),
- BPF_STX_XADD(BPF_W, BPF_REG_10, BPF_REG_0, -8),
+ BPF_ATOMIC_OP(BPF_W, BPF_ADD, BPF_REG_10, BPF_REG_0, -8),
+ BPF_ATOMIC_OP(BPF_W, BPF_ADD, BPF_REG_10, BPF_REG_0, -8),
BPF_JMP_REG(BPF_JNE, BPF_REG_6, BPF_REG_0, 3),
BPF_JMP_REG(BPF_JNE, BPF_REG_7, BPF_REG_10, 2),
BPF_LDX_MEM(BPF_W, BPF_REG_0, BPF_REG_10, -8),
diff --git a/tools/testing/selftests/futex/functional/.gitignore b/tools/testing/selftests/futex/functional/.gitignore
index 0efcd494..d661ef0 100644
--- a/tools/testing/selftests/futex/functional/.gitignore
+++ b/tools/testing/selftests/futex/functional/.gitignore
@@ -2,6 +2,7 @@
futex_requeue_pi
futex_requeue_pi_mismatched_ops
futex_requeue_pi_signal_restart
+futex_swap
futex_wait_private_mapped_file
futex_wait_timeout
futex_wait_uninitialized_heap
diff --git a/tools/testing/selftests/futex/functional/Makefile b/tools/testing/selftests/futex/functional/Makefile
index 2320782..6992fac 100644
--- a/tools/testing/selftests/futex/functional/Makefile
+++ b/tools/testing/selftests/futex/functional/Makefile
@@ -13,6 +13,7 @@
futex_requeue_pi \
futex_requeue_pi_signal_restart \
futex_requeue_pi_mismatched_ops \
+ futex_swap \
futex_wait_uninitialized_heap \
futex_wait_private_mapped_file
diff --git a/tools/testing/selftests/futex/functional/futex_swap.c b/tools/testing/selftests/futex/functional/futex_swap.c
new file mode 100644
index 0000000..9034d04
--- /dev/null
+++ b/tools/testing/selftests/futex/functional/futex_swap.c
@@ -0,0 +1,209 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include <errno.h>
+#include <getopt.h>
+#include <pthread.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <time.h>
+#include "atomic.h"
+#include "futextest.h"
+
+/* The futex the main thread waits on. */
+futex_t futex_main = FUTEX_INITIALIZER;
+/* The futex the other thread waits on. */
+futex_t futex_other = FUTEX_INITIALIZER;
+
+/* The number of iterations to run (>1 => run benchmarks). */
+static int cfg_iterations = 1;
+
+/* If != 0, print diagnostic messages. */
+static int cfg_verbose;
+
+/* If == 0, do not use validation_counter. Useful for benchmarking. */
+static int cfg_validate = 1;
+
+/* How to swap threads. */
+#define SWAP_WAKE_WAIT 1
+#define SWAP_SWAP 2
+
+/* Futex values. */
+#define FUTEX_WAITING 0
+#define FUTEX_WAKEUP 1
+
+/* An atomic counter used to validate proper swapping. */
+static atomic_t validation_counter;
+
+void futex_swap_op(int mode, futex_t *futex_this, futex_t *futex_that)
+{
+ int ret;
+
+ switch (mode) {
+ case SWAP_WAKE_WAIT:
+ futex_set(futex_this, FUTEX_WAITING);
+ futex_set(futex_that, FUTEX_WAKEUP);
+ futex_wake(futex_that, 1, FUTEX_PRIVATE_FLAG);
+ futex_wait(futex_this, FUTEX_WAITING, NULL, FUTEX_PRIVATE_FLAG);
+ if (*futex_this != FUTEX_WAKEUP) {
+ fprintf(stderr, "unexpected futex_this value on wakeup\n");
+ exit(1);
+ }
+ break;
+
+ case SWAP_SWAP:
+ futex_set(futex_this, FUTEX_WAITING);
+ futex_set(futex_that, FUTEX_WAKEUP);
+ ret = futex_swap(futex_this, FUTEX_WAITING, NULL,
+ futex_that, FUTEX_PRIVATE_FLAG);
+ if (ret < 0 && errno == ENOSYS) {
+ /* futex_swap not implemented */
+ perror("futex_swap");
+ exit(1);
+ }
+ if (*futex_this != FUTEX_WAKEUP) {
+ fprintf(stderr, "unexpected futex_this value on wakeup\n");
+ exit(1);
+ }
+ break;
+
+ default:
+ fprintf(stderr, "unknown mode in %s\n", __func__);
+ exit(1);
+ }
+}
+
+void *other_thread(void *arg)
+{
+ int mode = *((int *)arg);
+ int counter;
+
+ if (cfg_verbose)
+ printf("%s started\n", __func__);
+
+ futex_wait(&futex_other, 0, NULL, FUTEX_PRIVATE_FLAG);
+
+ for (counter = 0; counter < cfg_iterations; ++counter) {
+ if (cfg_validate) {
+ int prev = 2 * counter + 1;
+
+ if (prev != atomic_cmpxchg(&validation_counter, prev,
+ prev + 1)) {
+ fprintf(stderr, "swap validation failed\n");
+ exit(1);
+ }
+ }
+ futex_swap_op(mode, &futex_other, &futex_main);
+ }
+
+ if (cfg_verbose)
+ printf("%s finished: %d iteration(s)\n", __func__, counter);
+
+ return NULL;
+}
+
+void run_test(int mode)
+{
+ struct timespec start, stop;
+ int ret, counter;
+ pthread_t thread;
+ uint64_t duration;
+
+ futex_set(&futex_other, FUTEX_WAITING);
+ atomic_set(&validation_counter, 0);
+ ret = pthread_create(&thread, NULL, &other_thread, &mode);
+ if (ret) {
+ perror("pthread_create");
+ exit(1);
+ }
+
+ ret = clock_gettime(CLOCK_MONOTONIC, &start);
+ if (ret) {
+ perror("clock_gettime");
+ exit(1);
+ }
+
+ for (counter = 0; counter < cfg_iterations; ++counter) {
+ if (cfg_validate) {
+ int prev = 2 * counter;
+
+ if (prev != atomic_cmpxchg(&validation_counter, prev,
+ prev + 1)) {
+ fprintf(stderr, "swap validation failed\n");
+ exit(1);
+ }
+ }
+ futex_swap_op(mode, &futex_main, &futex_other);
+ }
+ if (cfg_validate && validation_counter.val != 2 * cfg_iterations) {
+ fprintf(stderr, "final swap validation failed\n");
+ exit(1);
+ }
+
+ ret = clock_gettime(CLOCK_MONOTONIC, &stop);
+ if (ret) {
+ perror("clock_gettime");
+ exit(1);
+ }
+
+ duration = (stop.tv_sec - start.tv_sec) * 1000000000LL +
+ stop.tv_nsec - start.tv_nsec;
+ if (cfg_verbose || cfg_iterations > 1) {
+ printf("completed %d swap and back iterations in %lu ns: %lu ns per swap\n",
+ cfg_iterations, duration,
+ duration / (cfg_iterations * 2));
+ }
+
+ /* The remote thread is blocked; send it the final wake. */
+ futex_set(&futex_other, FUTEX_WAKEUP);
+ futex_wake(&futex_other, 1, FUTEX_PRIVATE_FLAG);
+ if (pthread_join(thread, NULL)) {
+ perror("pthread_join");
+ exit(1);
+ }
+}
+
+void usage(char *prog)
+{
+ printf("Usage: %s\n", prog);
+ printf(" -h Display this help message\n");
+ printf(" -i N Use N iterations to benchmark\n");
+ printf(" -n Do not validate swapping correctness\n");
+ printf(" -v Print diagnostic messages\n");
+}
+
+int main(int argc, char *argv[])
+{
+ int c;
+
+ while ((c = getopt(argc, argv, "hi:nv")) != -1) {
+ switch (c) {
+ case 'h':
+ usage(basename(argv[0]));
+ exit(0);
+ case 'i':
+ cfg_iterations = atoi(optarg);
+ break;
+ case 'n':
+ cfg_validate = 0;
+ break;
+ case 'v':
+ cfg_verbose = 1;
+ break;
+ default:
+ usage(basename(argv[0]));
+ exit(1);
+ }
+ }
+
+ printf("\n\n------- running SWAP_WAKE_WAIT -----------\n\n");
+ run_test(SWAP_WAKE_WAIT);
+ printf("PASS\n");
+
+ printf("\n\n------- running SWAP_SWAP -----------\n\n");
+ run_test(SWAP_SWAP);
+ printf("PASS\n");
+
+ return 0;
+}
diff --git a/tools/testing/selftests/futex/include/futextest.h b/tools/testing/selftests/futex/include/futextest.h
index ddbcfc9..d2861fd 100644
--- a/tools/testing/selftests/futex/include/futextest.h
+++ b/tools/testing/selftests/futex/include/futextest.h
@@ -38,6 +38,9 @@
#ifndef FUTEX_CMP_REQUEUE_PI
#define FUTEX_CMP_REQUEUE_PI 12
#endif
+#ifndef GFUTEX_SWAP
+#define GFUTEX_SWAP 60
+#endif
#ifndef FUTEX_WAIT_REQUEUE_PI_PRIVATE
#define FUTEX_WAIT_REQUEUE_PI_PRIVATE (FUTEX_WAIT_REQUEUE_PI | \
FUTEX_PRIVATE_FLAG)
@@ -205,6 +208,19 @@
}
/**
+ * futex_swap() - block on uaddr and wake one task blocked on uaddr2.
+ * @uaddr: futex to block the current task on
+ * @val: expected value of *uaddr; block only if it still matches
+ * @timeout: relative timeout for the current task block
+ * @uaddr2: futex to wake one task at (can be the same as uaddr)
+ * @opflags: futex operation flags (e.g. FUTEX_PRIVATE_FLAG)
+ */
+static inline int
+futex_swap(futex_t *uaddr, futex_t val, struct timespec *timeout,
+ futex_t *uaddr2, int opflags)
+{
+ return futex(uaddr, GFUTEX_SWAP, val, timeout, uaddr2, 0, opflags);
+}
+
+/**
* futex_cmpxchg() - atomic compare and exchange
* @uaddr: The address of the futex to be modified
* @oldval: The expected value of the futex