From 08115cdf70b06f33ff3689720b301ee68ab3b80b Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Fri, 13 Jun 2025 19:26:50 +0200 Subject: [PATCH 01/49] UPSTREAM: posix-cpu-timers: fix race between handle_posix_cpu_timers() and posix_cpu_timer_del() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit commit f90fff1e152dedf52b932240ebbd670d83330eca upstream. If an exiting non-autoreaping task has already passed exit_notify() and calls handle_posix_cpu_timers() from IRQ, it can be reaped by its parent or debugger right after unlock_task_sighand(). If a concurrent posix_cpu_timer_del() runs at that moment, it won't be able to detect timer->it.cpu.firing != 0: cpu_timer_task_rcu() and/or lock_task_sighand() will fail. Add the tsk->exit_state check into run_posix_cpu_timers() to fix this. This fix is not needed if CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y, because exit_task_work() is called before exit_notify(). But the check still makes sense, task_work_add(&tsk->posix_cputimers_work.work) will fail anyway in this case. Bug: 425282960 Cc: stable@vger.kernel.org Reported-by: Benoît Sevens Fixes: 0bdd2ed4138e ("sched: run_posix_cpu_timers: Don't check ->exit_state, use lock_task_sighand()") Signed-off-by: Oleg Nesterov Signed-off-by: Linus Torvalds Signed-off-by: Greg Kroah-Hartman (cherry picked from commit c29d5318708e67ac13c1b6fc1007d179fb65b4d7) Signed-off-by: Lee Jones Change-Id: I2a9b8114abf2647c346e763edee1d424a07e86fe --- kernel/time/posix-cpu-timers.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/kernel/time/posix-cpu-timers.c b/kernel/time/posix-cpu-timers.c index e9c6f9d0e42c..9af1f2a72a0a 100644 --- a/kernel/time/posix-cpu-timers.c +++ b/kernel/time/posix-cpu-timers.c @@ -1437,6 +1437,15 @@ void run_posix_cpu_timers(void) lockdep_assert_irqs_disabled(); + /* + * Ensure that release_task(tsk) can't happen while + * handle_posix_cpu_timers() is running. Otherwise, a concurrent + * posix_cpu_timer_del() may fail to lock_task_sighand(tsk) and + * miss timer->it.cpu.firing != 0. + */ + if (tsk->exit_state) + return; + /* * If the actual expiry is deferred to task work context and the * work is already scheduled there is no point to do anything here. From 3b5bd5416eb3ab4dccd436690affae5036b90953 Mon Sep 17 00:00:00 2001 From: Shiming Cheng Date: Fri, 30 May 2025 09:26:08 +0800 Subject: [PATCH 02/49] UPSTREAM: net: fix udp gso skb_segment after pull from frag_list Commit a1e40ac5b5e9 ("net: gso: fix udp gso fraglist segmentation after pull from frag_list") detected invalid geometry in frag_list skbs and redirects them from skb_segment_list to more robust skb_segment. But some packets with modified geometry can also hit bugs in that code. We don't know how many such cases exist. Addressing each one by one also requires touching the complex skb_segment code, which risks introducing bugs for other types of skbs. Instead, linearize all these packets that fail the basic invariants on gso fraglist skbs. That is more robust. If only part of the fraglist payload is pulled into head_skb, it will always cause exception when splitting skbs by skb_segment. For detailed call stack information, see below. Valid SKB_GSO_FRAGLIST skbs - consist of two or more segments - the head_skb holds the protocol headers plus first gso_size - one or more frag_list skbs hold exactly one segment - all but the last must be gso_size Optional datapath hooks such as NAT and BPF (bpf_skb_pull_data) can modify fraglist skbs, breaking these invariants. 
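The geometry rules above can be expressed as a small standalone check. The sketch below is illustrative only, not kernel code; the struct and helper names are invented here. It encodes the listed invariants and shows how pulling one byte of payload into the head turns a valid (11, 11, 10) chain with gso_size 11 into (12, 10, 10), the case walked through next, which no longer satisfies them and therefore must be linearized before regular skb_segment():

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Toy model of a fraglist skb: the head payload plus chained segments. */
struct seg {
    size_t len;        /* payload bytes in this segment */
    struct seg *next;  /* next frag_list entry, NULL for the last one */
};

/*
 * True when the chain still satisfies the SKB_GSO_FRAGLIST rules quoted
 * above: the head carries exactly gso_size bytes of payload and every
 * frag_list segment except the last is exactly gso_size.
 */
static bool fraglist_geometry_ok(const struct seg *head, size_t gso_size)
{
    const struct seg *s;

    if (head->len != gso_size)
        return false;
    for (s = head->next; s && s->next; s = s->next)
        if (s->len != gso_size)
            return false;
    return true;
}

int main(void)
{
    size_t gso_size = 11;

    /* Valid geometry: (11, 11, 10). */
    struct seg last  = { .len = 10, .next = NULL };
    struct seg mid   = { .len = 11, .next = &last };
    struct seg head  = { .len = 11, .next = &mid };

    /* After a one-byte pull into the head: (12, 10, 10). */
    struct seg last2 = { .len = 10, .next = NULL };
    struct seg mid2  = { .len = 10, .next = &last2 };
    struct seg head2 = { .len = 12, .next = &mid2 };

    printf("before pull: %s\n",
           fraglist_geometry_ok(&head, gso_size) ? "list fast path" : "linearize");
    printf("after pull:  %s\n",
           fraglist_geometry_ok(&head2, gso_size) ? "list fast path" : "linearize");
    return 0;
}
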
In extreme cases they pull one part of data into skb linear. For UDP, this causes three payloads with lengths of (11,11,10) bytes were pulled tail to become (12,10,10) bytes. The skbs no longer meets the above SKB_GSO_FRAGLIST conditions because payload was pulled into head_skb, it needs to be linearized before pass to regular skb_segment. skb_segment+0xcd0/0xd14 __udp_gso_segment+0x334/0x5f4 udp4_ufo_fragment+0x118/0x15c inet_gso_segment+0x164/0x338 skb_mac_gso_segment+0xc4/0x13c __skb_gso_segment+0xc4/0x124 validate_xmit_skb+0x9c/0x2c0 validate_xmit_skb_list+0x4c/0x80 sch_direct_xmit+0x70/0x404 __dev_queue_xmit+0x64c/0xe5c neigh_resolve_output+0x178/0x1c4 ip_finish_output2+0x37c/0x47c __ip_finish_output+0x194/0x240 ip_finish_output+0x20/0xf4 ip_output+0x100/0x1a0 NF_HOOK+0xc4/0x16c ip_forward+0x314/0x32c ip_rcv+0x90/0x118 __netif_receive_skb+0x74/0x124 process_backlog+0xe8/0x1a4 __napi_poll+0x5c/0x1f8 net_rx_action+0x154/0x314 handle_softirqs+0x154/0x4b8 [118.376811] [C201134] rxq0_pus: [name:bug&]kernel BUG at net/core/skbuff.c:4278! [118.376829] [C201134] rxq0_pus: [name:traps&]Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP [118.470774] [C201134] rxq0_pus: [name:mrdump&]Kernel Offset: 0x178cc00000 from 0xffffffc008000000 [118.470810] [C201134] rxq0_pus: [name:mrdump&]PHYS_OFFSET: 0x40000000 [118.470827] [C201134] rxq0_pus: [name:mrdump&]pstate: 60400005 (nZCv daif +PAN -UAO) [118.470848] [C201134] rxq0_pus: [name:mrdump&]pc : [0xffffffd79598aefc] skb_segment+0xcd0/0xd14 [118.470900] [C201134] rxq0_pus: [name:mrdump&]lr : [0xffffffd79598a5e8] skb_segment+0x3bc/0xd14 [118.470928] [C201134] rxq0_pus: [name:mrdump&]sp : ffffffc008013770 Fixes: a1e40ac5b5e9 ("gso: fix udp gso fraglist segmentation after pull from frag_list") Bug: 426014478 Change-Id: Ib9d9c84b6f20afc1e1d129ceb59c9c3a7eb8e6de (cherry picked from commit 3382a1ed7f778db841063f5d7e317ac55f9e7f72) Signed-off-by: Shiming Cheng Reviewed-by: Willem de Bruijn Signed-off-by: David S. Miller --- net/ipv4/udp_offload.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c index 132cfc3b2c84..3870b59f5400 100644 --- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -332,6 +332,7 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb, bool copy_dtor; __sum16 check; __be16 newlen; + int ret = 0; mss = skb_shinfo(gso_skb)->gso_size; if (gso_skb->len <= sizeof(*uh) + mss) @@ -354,6 +355,10 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb, if (skb_pagelen(gso_skb) - sizeof(*uh) == skb_shinfo(gso_skb)->gso_size) return __udp_gso_segment_list(gso_skb, features, is_ipv6); + ret = __skb_linearize(gso_skb); + if (ret) + return ERR_PTR(ret); + /* Setup csum, as fraglist skips this in udp4_gro_receive. */ gso_skb->csum_start = skb_transport_header(gso_skb) - gso_skb->head; gso_skb->csum_offset = offsetof(struct udphdr, check); From 279274c126be34b41ed69765f9d23ff9f3031e76 Mon Sep 17 00:00:00 2001 From: Mukesh Pilaniya Date: Fri, 27 Jun 2025 13:33:16 +0530 Subject: [PATCH 03/49] ANDROID: virt: gunyah: Replace arm_smccc_1_1_smc with arm_smccc_1_1_invoke Replace arm_smccc_1_1_smc with arm_smccc_1_1_invoke because arm_smccc_1_1_invoke() determines the conduit (hvc/smc/none) before making an SMC, which may not be supported on some virtual platforms. 
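The behavioural difference can be sketched with a small standalone model. The code below is illustrative only; the enum, helpers and function ID are stand-ins rather than the real arm-smccc definitions. It only shows the idea of dispatching on the conduit detected at boot instead of hard-coding an SMC instruction:

#include <stdio.h>

/* Stand-ins for the real SMCCC conduit values and call helpers. */
enum conduit { CONDUIT_NONE, CONDUIT_SMC, CONDUIT_HVC };

struct res { unsigned long a0, a1, a2, a3; };

static void call_smc(unsigned long fn_id, struct res *r) { (void)fn_id; r->a0 = 0; }
static void call_hvc(unsigned long fn_id, struct res *r) { (void)fn_id; r->a0 = 0; }

/*
 * Simplified model of conduit-aware invocation: route the call through
 * whichever conduit the platform advertised, and fail cleanly when no
 * conduit is available at all.
 */
static int invoke(enum conduit c, unsigned long fn_id, struct res *r)
{
    switch (c) {
    case CONDUIT_HVC:
        call_hvc(fn_id, r);
        return 0;
    case CONDUIT_SMC:
        call_smc(fn_id, r);
        return 0;
    default:
        r->a0 = (unsigned long)-1;  /* "not supported", nothing was trapped */
        return -1;
    }
}

int main(void)
{
    struct res r;
    unsigned long fn_id = 0;  /* placeholder function ID */

    /* A guest whose platform only offers HVC still gets a valid call. */
    if (invoke(CONDUIT_HVC, fn_id, &r))
        puts("no conduit available");
    return 0;
}
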
Bug: 428106948 Change-Id: Ib21c7790b03996e73caa0874dc826d78e7b1c3d8 Signed-off-by: Mukesh Pilaniya --- drivers/virt/gunyah/gunyah_qcom.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/virt/gunyah/gunyah_qcom.c b/drivers/virt/gunyah/gunyah_qcom.c index f2342d51a018..622d6a07db02 100644 --- a/drivers/virt/gunyah/gunyah_qcom.c +++ b/drivers/virt/gunyah/gunyah_qcom.c @@ -187,7 +187,7 @@ static bool gunyah_has_qcom_extensions(void) uuid_t uuid; u32 *up; - arm_smccc_1_1_smc(GUNYAH_QCOM_EXT_CALL_UUID_ID, &res); + arm_smccc_1_1_invoke(GUNYAH_QCOM_EXT_CALL_UUID_ID, &res); up = (u32 *)&uuid.b[0]; up[0] = lower_32_bits(res.a0); From cb35713803cd6692440647d8c8998c149760848e Mon Sep 17 00:00:00 2001 From: VAMSHI GAJJELA Date: Tue, 1 Jul 2025 11:30:16 +0000 Subject: [PATCH 04/49] ANDROID: scsi: ufs: add UFSHCD_ANDROID_QUIRK_NO_IS_READ_ON_H8 Add UFSHCD_ANDROID_QUIRK_NO_IS_READ_ON_H8 for host controllers which break when the Interrupt Status register is re-read after entering hibern8. In such cases after hibern8 entry is reported, no further register access will occur in the interrupt handler. Bug: 350576949 Change-Id: I8e810c96203a97f030216aae39253a2e102c7ebf Signed-off-by: VAMSHI GAJJELA --- drivers/ufs/core/ufshcd.c | 5 +++++ include/ufs/ufshcd.h | 3 +++ 2 files changed, 8 insertions(+) diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c index b21d96365b65..0c4eb4a48fc0 100644 --- a/drivers/ufs/core/ufshcd.c +++ b/drivers/ufs/core/ufshcd.c @@ -7006,6 +7006,11 @@ static irqreturn_t ufshcd_intr(int irq, void *__hba) if (enabled_intr_status) retval |= ufshcd_sl_intr(hba, enabled_intr_status); + if (hba->android_quirks & + UFSHCD_ANDROID_QUIRK_NO_IS_READ_ON_H8 && + intr_status & UIC_HIBERNATE_ENTER) + break; + intr_status = ufshcd_readl(hba, REG_INTERRUPT_STATUS); } diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h index 66bd5c15375e..cde9ad6489b2 100644 --- a/include/ufs/ufshcd.h +++ b/include/ufs/ufshcd.h @@ -704,6 +704,9 @@ enum ufshcd_android_quirks { /* Set IID to one. */ UFSHCD_ANDROID_QUIRK_SET_IID_TO_ONE = 1 << 30, + + /* Do not read IS after H8 enter */ + UFSHCD_ANDROID_QUIRK_NO_IS_READ_ON_H8 = 1 << 31, }; enum ufshcd_caps { From e0a00524db094fcb6240182ab0638a0ccb44efd5 Mon Sep 17 00:00:00 2001 From: "T.J. Mercier" Date: Fri, 28 Mar 2025 22:05:04 +0000 Subject: [PATCH 05/49] ANDROID: gki_defconfig: Enable CONFIG_UDMABUF The main use case is to allow very large O_DIRECT writes into a memfd, which can then be converted into a udmabuf. (O_DIRECT writes into regular dmabufs are not possible.) Bug: 303531391 Bug: 389839576 Change-Id: Ifd970826ed1ecb4fe2d365854bcd19276b07f614 Signed-off-by: T.J. 
Mercier (cherry picked from commit 2f84f21fd838bd4203626fe300404d3ce923f770) Bug: 423003849 --- arch/arm64/configs/gki_defconfig | 1 + arch/x86/configs/gki_defconfig | 1 + 2 files changed, 2 insertions(+) diff --git a/arch/arm64/configs/gki_defconfig b/arch/arm64/configs/gki_defconfig index aee331a1430b..7b89c07f23b5 100644 --- a/arch/arm64/configs/gki_defconfig +++ b/arch/arm64/configs/gki_defconfig @@ -581,6 +581,7 @@ CONFIG_RTC_CLASS=y CONFIG_RTC_LIB_KUNIT_TEST=m CONFIG_RTC_DRV_PL030=y CONFIG_RTC_DRV_PL031=y +CONFIG_UDMABUF=y CONFIG_DMABUF_HEAPS=y CONFIG_DMABUF_SYSFS_STATS=y CONFIG_DMABUF_HEAPS_DEFERRED_FREE=y diff --git a/arch/x86/configs/gki_defconfig b/arch/x86/configs/gki_defconfig index c7bd6055c20b..6fe64231eb62 100644 --- a/arch/x86/configs/gki_defconfig +++ b/arch/x86/configs/gki_defconfig @@ -535,6 +535,7 @@ CONFIG_LEDS_TRIGGER_TRANSIENT=y CONFIG_EDAC=y CONFIG_RTC_CLASS=y CONFIG_RTC_LIB_KUNIT_TEST=m +CONFIG_UDMABUF=y CONFIG_DMABUF_HEAPS=y CONFIG_DMABUF_SYSFS_STATS=y CONFIG_DMABUF_HEAPS_DEFERRED_FREE=y From f45ef0a06f85390cbc8574c5b29c2dd0af4adf7d Mon Sep 17 00:00:00 2001 From: Vlastimil Babka Date: Thu, 25 Apr 2024 20:56:04 -0700 Subject: [PATCH 06/49] UPSTREAM: mm: page_alloc: change move_freepages() to __move_freepages_block() The function is now supposed to be called only on a single pageblock and checks start_pfn and end_pfn accordingly. Rename it to make this more obvious and drop the end_pfn parameter which can be determined trivially and none of the callers use it for anything else. Also make the (now internal) end_pfn exclusive, which is more common. Link: https://lkml.kernel.org/r/81b1d642-2ec0-49f5-89fc-19a3828419ff@suse.cz Signed-off-by: Vlastimil Babka Reviewed-by: Zi Yan Acked-by: Johannes Weiner Cc: David Hildenbrand Cc: "Huang, Ying" Cc: Mel Gorman Signed-off-by: Andrew Morton Bug: 420836317 (cherry picked from commit e1f42a577f63647dadf1abe4583053c03d6be045) Change-Id: I1e9ecd1670fda3edafff834849fbac2705a36324 Signed-off-by: yipeng xiang --- mm/page_alloc.c | 43 ++++++++++++++++++++----------------------- 1 file changed, 20 insertions(+), 23 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 152b0424fcbf..7f9f3f3df9f9 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1766,18 +1766,18 @@ static inline struct page *__rmqueue_cma_fallback(struct zone *zone, * Change the type of a block and move all its free pages to that * type's freelist. 
*/ -static int move_freepages(struct zone *zone, unsigned long start_pfn, - unsigned long end_pfn, int old_mt, int new_mt) +static int __move_freepages_block(struct zone *zone, unsigned long start_pfn, + int old_mt, int new_mt) { struct page *page; - unsigned long pfn; + unsigned long pfn, end_pfn; unsigned int order; int pages_moved = 0; VM_WARN_ON(start_pfn & (pageblock_nr_pages - 1)); - VM_WARN_ON(start_pfn + pageblock_nr_pages - 1 != end_pfn); + end_pfn = pageblock_end_pfn(start_pfn); - for (pfn = start_pfn; pfn <= end_pfn;) { + for (pfn = start_pfn; pfn < end_pfn;) { page = pfn_to_page(pfn); if (!PageBuddy(page)) { pfn++; @@ -1803,14 +1803,13 @@ static int move_freepages(struct zone *zone, unsigned long start_pfn, static bool prep_move_freepages_block(struct zone *zone, struct page *page, unsigned long *start_pfn, - unsigned long *end_pfn, int *num_free, int *num_movable) { unsigned long pfn, start, end; pfn = page_to_pfn(page); start = pageblock_start_pfn(pfn); - end = pageblock_end_pfn(pfn) - 1; + end = pageblock_end_pfn(pfn); /* * The caller only has the lock for @zone, don't touch ranges @@ -1821,16 +1820,15 @@ static bool prep_move_freepages_block(struct zone *zone, struct page *page, */ if (!zone_spans_pfn(zone, start)) return false; - if (!zone_spans_pfn(zone, end)) + if (!zone_spans_pfn(zone, end - 1)) return false; *start_pfn = start; - *end_pfn = end; if (num_free) { *num_free = 0; *num_movable = 0; - for (pfn = start; pfn <= end;) { + for (pfn = start; pfn < end;) { page = pfn_to_page(pfn); if (PageBuddy(page)) { int nr = 1 << buddy_order(page); @@ -1856,13 +1854,12 @@ static bool prep_move_freepages_block(struct zone *zone, struct page *page, static int move_freepages_block(struct zone *zone, struct page *page, int old_mt, int new_mt) { - unsigned long start_pfn, end_pfn; + unsigned long start_pfn; - if (!prep_move_freepages_block(zone, page, &start_pfn, &end_pfn, - NULL, NULL)) + if (!prep_move_freepages_block(zone, page, &start_pfn, NULL, NULL)) return -1; - return move_freepages(zone, start_pfn, end_pfn, old_mt, new_mt); + return __move_freepages_block(zone, start_pfn, old_mt, new_mt); } #ifdef CONFIG_MEMORY_ISOLATION @@ -1933,10 +1930,9 @@ static void split_large_buddy(struct zone *zone, struct page *page, bool move_freepages_block_isolate(struct zone *zone, struct page *page, int migratetype) { - unsigned long start_pfn, end_pfn, pfn; + unsigned long start_pfn, pfn; - if (!prep_move_freepages_block(zone, page, &start_pfn, &end_pfn, - NULL, NULL)) + if (!prep_move_freepages_block(zone, page, &start_pfn, NULL, NULL)) return false; /* No splits needed if buddies can't span multiple blocks */ @@ -1967,8 +1963,9 @@ bool move_freepages_block_isolate(struct zone *zone, struct page *page, return true; } move: - move_freepages(zone, start_pfn, end_pfn, - get_pfnblock_migratetype(page, start_pfn), migratetype); + __move_freepages_block(zone, start_pfn, + get_pfnblock_migratetype(page, start_pfn), + migratetype); return true; } #endif /* CONFIG_MEMORY_ISOLATION */ @@ -2068,7 +2065,7 @@ steal_suitable_fallback(struct zone *zone, struct page *page, unsigned int alloc_flags, bool whole_block) { int free_pages, movable_pages, alike_pages; - unsigned long start_pfn, end_pfn; + unsigned long start_pfn; int block_type; block_type = get_pageblock_migratetype(page); @@ -2101,8 +2098,8 @@ steal_suitable_fallback(struct zone *zone, struct page *page, goto single_page; /* moving whole block can fail due to zone boundary conditions */ - if (!prep_move_freepages_block(zone, page, &start_pfn, 
&end_pfn, - &free_pages, &movable_pages)) + if (!prep_move_freepages_block(zone, page, &start_pfn, &free_pages, + &movable_pages)) goto single_page; /* @@ -2132,7 +2129,7 @@ steal_suitable_fallback(struct zone *zone, struct page *page, */ if (free_pages + alike_pages >= (1 << (pageblock_order-1)) || page_group_by_mobility_disabled) { - move_freepages(zone, start_pfn, end_pfn, block_type, start_type); + __move_freepages_block(zone, start_pfn, block_type, start_type); return __rmqueue_smallest(zone, order, start_type); } From bf5861fc36df819b62a20e06a5885e178954d8fa Mon Sep 17 00:00:00 2001 From: Huan Yang Date: Mon, 26 Aug 2024 14:40:48 +0800 Subject: [PATCH 07/49] UPSTREAM: mm: page_alloc: simpify page del and expand When page del from buddy and need expand, it will account free_pages in zone's migratetype. The current way is to subtract the page number of the current order when deleting, and then add it back when expanding. This is unnecessary, as when migrating the same type, we can directly record the difference between the high-order pages and the expand added, and then subtract it directly. This patch merge that, only when del and expand done, then account free_pages. Link: https://lkml.kernel.org/r/20240826064048.187790-1-link@vivo.com Signed-off-by: Huan Yang Reviewed-by: Vlastimil Babka Signed-off-by: Andrew Morton Bug: 420836317 (cherry picked from commit 94deaf69dcd33462c61fa8cabb0883e3085a1046) Change-Id: I26196bc41cbf0f64dc9a9bc2249c9c814ca055d0 Signed-off-by: yipeng xiang --- mm/page_alloc.c | 35 +++++++++++++++++++++++++---------- 1 file changed, 25 insertions(+), 10 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 7f9f3f3df9f9..1c1098a37e48 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1518,11 +1518,11 @@ struct page *__pageblock_pfn_to_page(unsigned long start_pfn, * * -- nyc */ -static inline void expand(struct zone *zone, struct page *page, - int low, int high, int migratetype) +static inline unsigned int expand(struct zone *zone, struct page *page, int low, + int high, int migratetype) { - unsigned long size = 1 << high; - unsigned long nr_added = 0; + unsigned int size = 1 << high; + unsigned int nr_added = 0; while (high > low) { high--; @@ -1542,7 +1542,19 @@ static inline void expand(struct zone *zone, struct page *page, set_buddy_order(&page[size], high); nr_added += size; } - account_freepages(zone, nr_added, migratetype); + + return nr_added; +} + +static __always_inline void page_del_and_expand(struct zone *zone, + struct page *page, int low, + int high, int migratetype) +{ + int nr_pages = 1 << high; + + __del_page_from_free_list(page, zone, high, migratetype); + nr_pages -= expand(zone, page, low, high, migratetype); + account_freepages(zone, -nr_pages, migratetype); } static void check_new_page_bad(struct page *page) @@ -1727,8 +1739,9 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order, page = get_page_from_free_area(area, migratetype); if (!page) continue; - del_page_from_free_list(page, zone, current_order, migratetype); - expand(zone, page, order, current_order, migratetype); + + page_del_and_expand(zone, page, order, current_order, + migratetype); trace_mm_page_alloc_zone_locked(page, order, migratetype, pcp_allowed_order(order) && migratetype < MIGRATE_PCPTYPES); @@ -2079,9 +2092,12 @@ steal_suitable_fallback(struct zone *zone, struct page *page, /* Take ownership for orders >= pageblock_order */ if (current_order >= pageblock_order) { + unsigned int nr_added; + del_page_from_free_list(page, zone, 
current_order, block_type); change_pageblock_range(page, current_order, start_type); - expand(zone, page, order, current_order, start_type); + nr_added = expand(zone, page, order, current_order, start_type); + account_freepages(zone, nr_added, start_type); return page; } @@ -2134,8 +2150,7 @@ steal_suitable_fallback(struct zone *zone, struct page *page, } single_page: - del_page_from_free_list(page, zone, current_order, block_type); - expand(zone, page, order, current_order, block_type); + page_del_and_expand(zone, page, order, current_order, block_type); return page; } From 4e131ac87c4127c9b02838e58e2b5a87559c81d1 Mon Sep 17 00:00:00 2001 From: gaoxiang17 Date: Fri, 20 Sep 2024 20:20:30 +0800 Subject: [PATCH 08/49] UPSTREAM: mm/page_alloc: add some detailed comments in can_steal_fallback mm/page_alloc: add some detailed comments in can_steal_fallback [akpm@linux-foundation.org: tweak grammar, fit to 80 cols] Link: https://lkml.kernel.org/r/20240920122030.159751-1-gxxa03070307@gmail.com Signed-off-by: gaoxiang17 Signed-off-by: Andrew Morton Bug: 420836317 (cherry picked from commit 6025ea5abbe5d813d6a41c78e6ea14259fb503f4) Change-Id: Ib4a77bf96edeba6ce2c6627c99aacaf148b07d92 Signed-off-by: yipeng xiang --- mm/page_alloc.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 1c1098a37e48..f887c9bc0152 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2018,6 +2018,14 @@ static bool can_steal_fallback(unsigned int order, int start_mt) if (order >= pageblock_order) return true; + /* + * Movable pages won't cause permanent fragmentation, so when you alloc + * small pages, you just need to temporarily steal unmovable or + * reclaimable pages that are closest to the request size. After a + * while, memory compaction may occur to form large contiguous pages, + * and the next movable allocation may not need to steal. Unmovable and + * reclaimable allocations need to actually steal pages. + */ if (order >= pageblock_order / 2 || start_mt == MIGRATE_RECLAIMABLE || start_mt == MIGRATE_UNMOVABLE || From 65b7c505d9e1780c7110cfaf9f26a1513c845fc0 Mon Sep 17 00:00:00 2001 From: Johannes Weiner Date: Mon, 24 Feb 2025 19:08:24 -0500 Subject: [PATCH 09/49] BACKPORT: mm: page_alloc: don't steal single pages from biggest buddy MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The fallback code searches for the biggest buddy first in an attempt to steal the whole block and encourage type grouping down the line. The approach used to be this: - Non-movable requests will split the largest buddy and steal the remainder. This splits up contiguity, but it allows subsequent requests of this type to fall back into adjacent space. - Movable requests go and look for the smallest buddy instead. The thinking is that movable requests can be compacted, so grouping is less important than retaining contiguity. c0cd6f557b90 ("mm: page_alloc: fix freelist movement during block conversion") enforces freelist type hygiene, which restricts stealing to either claiming the whole block or just taking the requested chunk; no additional pages or buddy remainders can be stolen any more. The patch mishandled when to switch to finding the smallest buddy in that new reality. As a result, it may steal the exact request size, but from the biggest buddy. This causes fracturing for no good reason. Fix this by committing to the new behavior: either steal the whole block, or fall back to the smallest buddy. 
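In outline, the resulting policy looks like the standalone toy model below. It is illustrative only, not the kernel's fallback code: the freelists are reduced to counters and the claim heuristics are collapsed into a single flag, leaving just the two-phase search order described above:

#include <stdio.h>

#define NR_ORDERS 11
#define NR_TYPES  3   /* toy migratetypes: 0 movable, 1 unmovable, 2 reclaimable */

/* free_blocks[order][type]: free buddies of that order on each toy freelist. */
static int free_blocks[NR_ORDERS][NR_TYPES];

/*
 * Phase 1: top-down, look for the biggest fallback buddy to claim whole.
 * "claim_allowed" stands in for the heuristics that decide whether a
 * whole-block claim is worthwhile for this request.
 */
static int find_block_to_claim(int order, int start_type, int claim_allowed)
{
    if (!claim_allowed)
        return -1;
    for (int o = NR_ORDERS - 1; o >= order; o--)
        for (int t = 0; t < NR_TYPES; t++)
            if (t != start_type && free_blocks[o][t])
                return o;
    return -1;
}

/* Phase 2: bottom-up, steal the smallest fallback buddy that still fits. */
static int find_smallest_fallback(int order, int start_type)
{
    for (int o = order; o < NR_ORDERS; o++)
        for (int t = 0; t < NR_TYPES; t++)
            if (t != start_type && free_blocks[o][t])
                return o;
    return -1;
}

int main(void)
{
    int order = 0, start_type = 0;

    free_blocks[9][1] = 1;  /* a large unmovable buddy */
    free_blocks[2][2] = 1;  /* a small reclaimable buddy */

    for (int claim_allowed = 1; claim_allowed >= 0; claim_allowed--) {
        int o = find_block_to_claim(order, start_type, claim_allowed);

        if (o >= 0)
            printf("claim whole block from order %d\n", o);
        else if ((o = find_smallest_fallback(order, start_type)) >= 0)
            printf("steal smallest buddy from order %d\n", o);
    }
    return 0;
}
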
Remove single-page stealing from steal_suitable_fallback(). Rename it to try_to_steal_block() to make the intentions clear. If this fails, always fall back to the smallest buddy. The following is from 4 runs of mmtest's thpchallenge. "Pollute" is single page fallback, "steal" is conversion of a partially used block. The numbers for free block conversions (omitted) are comparable. vanilla patched @pollute[unmovable from reclaimable]: 27 106 @pollute[unmovable from movable]: 82 46 @pollute[reclaimable from unmovable]: 256 83 @pollute[reclaimable from movable]: 46 8 @pollute[movable from unmovable]: 4841 868 @pollute[movable from reclaimable]: 5278 12568 @steal[unmovable from reclaimable]: 11 12 @steal[unmovable from movable]: 113 49 @steal[reclaimable from unmovable]: 19 34 @steal[reclaimable from movable]: 47 21 @steal[movable from unmovable]: 250 183 @steal[movable from reclaimable]: 81 93 The allocator appears to do a better job at keeping stealing and polluting to the first fallback preference. As a result, the numbers for "from movable" - the least preferred fallback option, and most detrimental to compactability - are down across the board. Link: https://lkml.kernel.org/r/20250225001023.1494422-2-hannes@cmpxchg.org Fixes: c0cd6f557b90 ("mm: page_alloc: fix freelist movement during block conversion") Signed-off-by: Johannes Weiner Suggested-by: Vlastimil Babka Reviewed-by: Brendan Jackman Reviewed-by: Vlastimil Babka Signed-off-by: Andrew Morton Bug: 420836317 (cherry picked from commit c2f6ea38fc1b640aa7a2e155cc1c0410ff91afa2) [ MAX_PAGE_ORDER is not defined in linux-6.6, so it is replaced with MAX_ORDER. The original patch: - VM_BUG_ON(current_order > MAX_PAGE_ORDER); linux-6.6 patch: - VM_BUG_ON(current_order > MAX_ORDER); ] Change-Id: I44a62580f1fcb53a2baff6ce3a8af08e9a20fdc0 Signed-off-by: yipeng xiang --- mm/page_alloc.c | 80 +++++++++++++++++++++---------------------------- 1 file changed, 34 insertions(+), 46 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index f887c9bc0152..1eddb7b66336 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2077,13 +2077,12 @@ static inline bool boost_watermark(struct zone *zone) * can claim the whole pageblock for the requested migratetype. If not, we check * the pageblock for constituent pages; if at least half of the pages are free * or compatible, we can still claim the whole block, so pages freed in the - * future will be put on the correct free list. Otherwise, we isolate exactly - * the order we need from the fallback block and leave its migratetype alone. + * future will be put on the correct free list. */ static struct page * -steal_suitable_fallback(struct zone *zone, struct page *page, - int current_order, int order, int start_type, - unsigned int alloc_flags, bool whole_block) +try_to_steal_block(struct zone *zone, struct page *page, + int current_order, int order, int start_type, + unsigned int alloc_flags) { int free_pages, movable_pages, alike_pages; unsigned long start_pfn; @@ -2096,7 +2095,7 @@ steal_suitable_fallback(struct zone *zone, struct page *page, * highatomic accounting. 
*/ if (is_migrate_highatomic(block_type)) - goto single_page; + return NULL; /* Take ownership for orders >= pageblock_order */ if (current_order >= pageblock_order) { @@ -2117,14 +2116,10 @@ steal_suitable_fallback(struct zone *zone, struct page *page, if (boost_watermark(zone) && (alloc_flags & ALLOC_KSWAPD)) set_bit(ZONE_BOOSTED_WATERMARK, &zone->flags); - /* We are not allowed to try stealing from the whole block */ - if (!whole_block) - goto single_page; - /* moving whole block can fail due to zone boundary conditions */ if (!prep_move_freepages_block(zone, page, &start_pfn, &free_pages, &movable_pages)) - goto single_page; + return NULL; /* * Determine how many pages are compatible with our allocation. @@ -2157,9 +2152,7 @@ steal_suitable_fallback(struct zone *zone, struct page *page, return __rmqueue_smallest(zone, order, start_type); } -single_page: - page_del_and_expand(zone, page, order, current_order, block_type); - return page; + return NULL; } /* @@ -2351,14 +2344,19 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, } /* - * Try finding a free buddy page on the fallback list and put it on the free - * list of requested migratetype, possibly along with other pages from the same - * block, depending on fragmentation avoidance heuristics. Returns true if - * fallback was found so that __rmqueue_smallest() can grab it. + * Try finding a free buddy page on the fallback list. + * + * This will attempt to steal a whole pageblock for the requested type + * to ensure grouping of such requests in the future. + * + * If a whole block cannot be stolen, regress to __rmqueue_smallest() + * logic to at least break up as little contiguity as possible. * * The use of signed ints for order and current_order is a deliberate * deviation from the rest of this file, to make the for loop * condition simpler. + * + * Return the stolen page, or NULL if none can be found. */ static __always_inline struct page * __rmqueue_fallback(struct zone *zone, int order, int start_migratetype, @@ -2392,45 +2390,35 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype, if (fallback_mt == -1) continue; - /* - * We cannot steal all free pages from the pageblock and the - * requested migratetype is movable. In that case it's better to - * steal and split the smallest available page instead of the - * largest available page, because even if the next movable - * allocation falls back into a different pageblock than this - * one, it won't cause permanent fragmentation. - */ - if (!can_steal && start_migratetype == MIGRATE_MOVABLE - && current_order > order) - goto find_smallest; + if (!can_steal) + break; - goto do_steal; + page = get_page_from_free_area(area, fallback_mt); + page = try_to_steal_block(zone, page, current_order, order, + start_migratetype, alloc_flags); + if (page) + goto got_one; } - return NULL; + if (alloc_flags & ALLOC_NOFRAGMENT) + return NULL; -find_smallest: + /* No luck stealing blocks. 
Find the smallest fallback page */ for (current_order = order; current_order < NR_PAGE_ORDERS; current_order++) { area = &(zone->free_area[current_order]); fallback_mt = find_suitable_fallback(area, current_order, start_migratetype, false, &can_steal); - if (fallback_mt != -1) - break; + if (fallback_mt == -1) + continue; + + page = get_page_from_free_area(area, fallback_mt); + page_del_and_expand(zone, page, order, current_order, fallback_mt); + goto got_one; } - /* - * This should not happen - we already found a suitable fallback - * when looking for the largest page. - */ - VM_BUG_ON(current_order > MAX_ORDER); - -do_steal: - page = get_page_from_free_area(area, fallback_mt); - - /* take off list, maybe claim block, expand remainder */ - page = steal_suitable_fallback(zone, page, current_order, order, - start_migratetype, alloc_flags, can_steal); + return NULL; +got_one: trace_mm_page_alloc_extfrag(page, order, current_order, start_migratetype, fallback_mt); From 707dfe67d68fdbf6b1c6dbd0e7fc5564fb3c71e6 Mon Sep 17 00:00:00 2001 From: Johannes Weiner Date: Mon, 24 Feb 2025 19:08:25 -0500 Subject: [PATCH 10/49] UPSTREAM: mm: page_alloc: remove remnants of unlocked migratetype updates The freelist hygiene patches made migratetype accesses fully protected under the zone->lock. Remove remnants of handling the race conditions that existed before from the MIGRATE_HIGHATOMIC code. Link: https://lkml.kernel.org/r/20250225001023.1494422-3-hannes@cmpxchg.org Signed-off-by: Johannes Weiner Reviewed-by: Brendan Jackman Reviewed-by: Vlastimil Babka Signed-off-by: Andrew Morton Bug: 420836317 (cherry picked from commit 020396a581dc69be2d30939fabde6c029d847034) Change-Id: Ia1266c34f09db1c404df7f37c1a9ff06d61c0cce Signed-off-by: yipeng xiang --- mm/page_alloc.c | 50 ++++++++++++++++--------------------------------- 1 file changed, 16 insertions(+), 34 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 1eddb7b66336..e5c9acfd999f 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2082,20 +2082,10 @@ static inline bool boost_watermark(struct zone *zone) static struct page * try_to_steal_block(struct zone *zone, struct page *page, int current_order, int order, int start_type, - unsigned int alloc_flags) + int block_type, unsigned int alloc_flags) { int free_pages, movable_pages, alike_pages; unsigned long start_pfn; - int block_type; - - block_type = get_pageblock_migratetype(page); - - /* - * This can happen due to races and we want to prevent broken - * highatomic accounting. - */ - if (is_migrate_highatomic(block_type)) - return NULL; /* Take ownership for orders >= pageblock_order */ if (current_order >= pageblock_order) { @@ -2280,33 +2270,22 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, spin_lock_irqsave(&zone->lock, flags); for (order = 0; order < NR_PAGE_ORDERS; order++) { struct free_area *area = &(zone->free_area[order]); - int mt; + unsigned long size; page = get_page_from_free_area(area, MIGRATE_HIGHATOMIC); if (!page) continue; - mt = get_pageblock_migratetype(page); /* - * In page freeing path, migratetype change is racy so - * we can counter several free pages in a pageblock - * in this loop although we changed the pageblock type - * from highatomic to ac->migratetype. So we should - * adjust the count once. + * It should never happen but changes to + * locking could inadvertently allow a per-cpu + * drain to add pages to MIGRATE_HIGHATOMIC + * while unreserving so be safe and watch for + * underflows. 
*/ - if (is_migrate_highatomic(mt)) { - unsigned long size; - /* - * It should never happen but changes to - * locking could inadvertently allow a per-cpu - * drain to add pages to MIGRATE_HIGHATOMIC - * while unreserving so be safe and watch for - * underflows. - */ - size = max(pageblock_nr_pages, 1UL << order); - size = min(size, zone->nr_reserved_highatomic); - zone->nr_reserved_highatomic -= size; - } + size = max(pageblock_nr_pages, 1UL << order); + size = min(size, zone->nr_reserved_highatomic); + zone->nr_reserved_highatomic -= size; /* * Convert to ac->migratetype and avoid the normal @@ -2318,10 +2297,12 @@ static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, * may increase. */ if (order < pageblock_order) - ret = move_freepages_block(zone, page, mt, + ret = move_freepages_block(zone, page, + MIGRATE_HIGHATOMIC, ac->migratetype); else { - move_to_free_list(page, zone, order, mt, + move_to_free_list(page, zone, order, + MIGRATE_HIGHATOMIC, ac->migratetype); change_pageblock_range(page, order, ac->migratetype); @@ -2395,7 +2376,8 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype, page = get_page_from_free_area(area, fallback_mt); page = try_to_steal_block(zone, page, current_order, order, - start_migratetype, alloc_flags); + start_migratetype, fallback_mt, + alloc_flags); if (page) goto got_one; } From c746bc1949da759ee8a3362bafcb01872f7f30d5 Mon Sep 17 00:00:00 2001 From: Johannes Weiner Date: Mon, 24 Feb 2025 19:08:26 -0500 Subject: [PATCH 11/49] BACKPORT: mm: page_alloc: group fallback functions together The way the fallback rules are spread out makes them hard to follow. Move the functions next to each other at least. Link: https://lkml.kernel.org/r/20250225001023.1494422-4-hannes@cmpxchg.org Signed-off-by: Johannes Weiner Reviewed-by: Brendan Jackman Reviewed-by: Vlastimil Babka Signed-off-by: Andrew Morton Bug: 420836317 (cherry picked from commit a4138a2702a4428317ecdb115934554df4b788b4) [ 1. In the original patch of the find_suitable_fallback function, replace MIGRATE_PCPTYPES with MIGRATE_FALLBACKS.; 2. Keep the hook function in the reserve_highatomic_pageblock and unreserve_highatomic_pageblock functions. ] Change-Id: I069e8dd7f8b009c686daef4459f9f1452b3f4c2c Signed-off-by: yipeng xiang --- mm/page_alloc.c | 414 ++++++++++++++++++++++++------------------------ 1 file changed, 207 insertions(+), 207 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index e5c9acfd999f..5dd4970d2485 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1994,6 +1994,43 @@ static void change_pageblock_range(struct page *pageblock_page, } } +static inline bool boost_watermark(struct zone *zone) +{ + unsigned long max_boost; + + if (!watermark_boost_factor) + return false; + /* + * Don't bother in zones that are unlikely to produce results. + * On small machines, including kdump capture kernels running + * in a small area, boosting the watermark can cause an out of + * memory situation immediately. + */ + if ((pageblock_nr_pages * 4) > zone_managed_pages(zone)) + return false; + + max_boost = mult_frac(zone->_watermark[WMARK_HIGH], + watermark_boost_factor, 10000); + + /* + * high watermark may be uninitialised if fragmentation occurs + * very early in boot so do not boost. We do not fall + * through and boost by pageblock_nr_pages as failing + * allocations that early means that reclaim is not going + * to help and it may even be impossible to reclaim the + * boosted watermark resulting in a hang. 
+ */ + if (!max_boost) + return false; + + max_boost = max(pageblock_nr_pages, max_boost); + + zone->watermark_boost = min(zone->watermark_boost + pageblock_nr_pages, + max_boost); + + return true; +} + /* * When we are falling back to another migratetype during allocation, try to * steal extra free pages from the same pageblocks to satisfy further @@ -2035,41 +2072,38 @@ static bool can_steal_fallback(unsigned int order, int start_mt) return false; } -static inline bool boost_watermark(struct zone *zone) +/* + * Check whether there is a suitable fallback freepage with requested order. + * If only_stealable is true, this function returns fallback_mt only if + * we can steal other freepages all together. This would help to reduce + * fragmentation due to mixed migratetype pages in one pageblock. + */ +int find_suitable_fallback(struct free_area *area, unsigned int order, + int migratetype, bool only_stealable, bool *can_steal) { - unsigned long max_boost; + int i; + int fallback_mt; - if (!watermark_boost_factor) - return false; - /* - * Don't bother in zones that are unlikely to produce results. - * On small machines, including kdump capture kernels running - * in a small area, boosting the watermark can cause an out of - * memory situation immediately. - */ - if ((pageblock_nr_pages * 4) > zone_managed_pages(zone)) - return false; + if (area->nr_free == 0) + return -1; - max_boost = mult_frac(zone->_watermark[WMARK_HIGH], - watermark_boost_factor, 10000); + *can_steal = false; + for (i = 0; i < MIGRATE_FALLBACKS - 1 ; i++) { + fallback_mt = fallbacks[migratetype][i]; + if (free_area_empty(area, fallback_mt)) + continue; - /* - * high watermark may be uninitialised if fragmentation occurs - * very early in boot so do not boost. We do not fall - * through and boost by pageblock_nr_pages as failing - * allocations that early means that reclaim is not going - * to help and it may even be impossible to reclaim the - * boosted watermark resulting in a hang. - */ - if (!max_boost) - return false; + if (can_steal_fallback(order, migratetype)) + *can_steal = true; - max_boost = max(pageblock_nr_pages, max_boost); + if (!only_stealable) + return fallback_mt; - zone->watermark_boost = min(zone->watermark_boost + pageblock_nr_pages, - max_boost); + if (*can_steal) + return fallback_mt; + } - return true; + return -1; } /* @@ -2145,185 +2179,6 @@ try_to_steal_block(struct zone *zone, struct page *page, return NULL; } -/* - * Check whether there is a suitable fallback freepage with requested order. - * If only_stealable is true, this function returns fallback_mt only if - * we can steal other freepages all together. This would help to reduce - * fragmentation due to mixed migratetype pages in one pageblock. 
- */ -int find_suitable_fallback(struct free_area *area, unsigned int order, - int migratetype, bool only_stealable, bool *can_steal) -{ - int i; - int fallback_mt; - - if (area->nr_free == 0) - return -1; - - *can_steal = false; - for (i = 0; i < MIGRATE_FALLBACKS - 1 ; i++) { - fallback_mt = fallbacks[migratetype][i]; - if (free_area_empty(area, fallback_mt)) - continue; - - if (can_steal_fallback(order, migratetype)) - *can_steal = true; - - if (!only_stealable) - return fallback_mt; - - if (*can_steal) - return fallback_mt; - } - - return -1; -} - -/* - * Reserve the pageblock(s) surrounding an allocation request for - * exclusive use of high-order atomic allocations if there are no - * empty page blocks that contain a page with a suitable order - */ -static void reserve_highatomic_pageblock(struct page *page, int order, - struct zone *zone) -{ - int mt; - unsigned long max_managed, flags; - bool bypass = false; - - /* - * The number reserved as: minimum is 1 pageblock, maximum is - * roughly 1% of a zone. But if 1% of a zone falls below a - * pageblock size, then don't reserve any pageblocks. - * Check is race-prone but harmless. - */ - if ((zone_managed_pages(zone) / 100) < pageblock_nr_pages) - return; - max_managed = ALIGN((zone_managed_pages(zone) / 100), pageblock_nr_pages); - if (zone->nr_reserved_highatomic >= max_managed) - return; - trace_android_vh_reserve_highatomic_bypass(page, &bypass); - if (bypass) - return; - - spin_lock_irqsave(&zone->lock, flags); - - /* Recheck the nr_reserved_highatomic limit under the lock */ - if (zone->nr_reserved_highatomic >= max_managed) - goto out_unlock; - - /* Yoink! */ - mt = get_pageblock_migratetype(page); - /* Only reserve normal pageblocks (i.e., they can merge with others) */ - if (!migratetype_is_mergeable(mt)) - goto out_unlock; - - if (order < pageblock_order) { - if (move_freepages_block(zone, page, mt, MIGRATE_HIGHATOMIC) == -1) - goto out_unlock; - zone->nr_reserved_highatomic += pageblock_nr_pages; - } else { - change_pageblock_range(page, order, MIGRATE_HIGHATOMIC); - zone->nr_reserved_highatomic += 1 << order; - } - -out_unlock: - spin_unlock_irqrestore(&zone->lock, flags); -} - -/* - * Used when an allocation is about to fail under memory pressure. This - * potentially hurts the reliability of high-order allocations when under - * intense memory pressure but failed atomic allocations should be easier - * to recover from than an OOM. - * - * If @force is true, try to unreserve pageblocks even though highatomic - * pageblock is exhausted. - */ -static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, - bool force) -{ - struct zonelist *zonelist = ac->zonelist; - unsigned long flags; - struct zoneref *z; - struct zone *zone; - struct page *page; - int order; - bool skip_unreserve_highatomic = false; - int ret; - - for_each_zone_zonelist_nodemask(zone, z, zonelist, ac->highest_zoneidx, - ac->nodemask) { - /* - * Preserve at least one pageblock unless memory pressure - * is really high. 
- */ - if (!force && zone->nr_reserved_highatomic <= - pageblock_nr_pages) - continue; - - trace_android_vh_unreserve_highatomic_bypass(force, zone, - &skip_unreserve_highatomic); - if (skip_unreserve_highatomic) - continue; - - spin_lock_irqsave(&zone->lock, flags); - for (order = 0; order < NR_PAGE_ORDERS; order++) { - struct free_area *area = &(zone->free_area[order]); - unsigned long size; - - page = get_page_from_free_area(area, MIGRATE_HIGHATOMIC); - if (!page) - continue; - - /* - * It should never happen but changes to - * locking could inadvertently allow a per-cpu - * drain to add pages to MIGRATE_HIGHATOMIC - * while unreserving so be safe and watch for - * underflows. - */ - size = max(pageblock_nr_pages, 1UL << order); - size = min(size, zone->nr_reserved_highatomic); - zone->nr_reserved_highatomic -= size; - - /* - * Convert to ac->migratetype and avoid the normal - * pageblock stealing heuristics. Minimally, the caller - * is doing the work and needs the pages. More - * importantly, if the block was always converted to - * MIGRATE_UNMOVABLE or another type then the number - * of pageblocks that cannot be completely freed - * may increase. - */ - if (order < pageblock_order) - ret = move_freepages_block(zone, page, - MIGRATE_HIGHATOMIC, - ac->migratetype); - else { - move_to_free_list(page, zone, order, - MIGRATE_HIGHATOMIC, - ac->migratetype); - change_pageblock_range(page, order, - ac->migratetype); - ret = 1; - } - /* - * Reserving the block(s) already succeeded, - * so this should not fail on zone boundaries. - */ - WARN_ON_ONCE(ret == -1); - if (ret > 0) { - spin_unlock_irqrestore(&zone->lock, flags); - return ret; - } - } - spin_unlock_irqrestore(&zone->lock, flags); - } - - return false; -} - /* * Try finding a free buddy page on the fallback list. * @@ -3215,6 +3070,151 @@ noinline bool should_fail_alloc_page(gfp_t gfp_mask, unsigned int order) } ALLOW_ERROR_INJECTION(should_fail_alloc_page, TRUE); +/* + * Reserve the pageblock(s) surrounding an allocation request for + * exclusive use of high-order atomic allocations if there are no + * empty page blocks that contain a page with a suitable order + */ +static void reserve_highatomic_pageblock(struct page *page, int order, + struct zone *zone) +{ + int mt; + unsigned long max_managed, flags; + bool bypass = false; + + /* + * The number reserved as: minimum is 1 pageblock, maximum is + * roughly 1% of a zone. But if 1% of a zone falls below a + * pageblock size, then don't reserve any pageblocks. + * Check is race-prone but harmless. + */ + if ((zone_managed_pages(zone) / 100) < pageblock_nr_pages) + return; + max_managed = ALIGN((zone_managed_pages(zone) / 100), pageblock_nr_pages); + if (zone->nr_reserved_highatomic >= max_managed) + return; + trace_android_vh_reserve_highatomic_bypass(page, &bypass); + if (bypass) + return; + + spin_lock_irqsave(&zone->lock, flags); + + /* Recheck the nr_reserved_highatomic limit under the lock */ + if (zone->nr_reserved_highatomic >= max_managed) + goto out_unlock; + + /* Yoink! 
*/ + mt = get_pageblock_migratetype(page); + /* Only reserve normal pageblocks (i.e., they can merge with others) */ + if (!migratetype_is_mergeable(mt)) + goto out_unlock; + + if (order < pageblock_order) { + if (move_freepages_block(zone, page, mt, MIGRATE_HIGHATOMIC) == -1) + goto out_unlock; + zone->nr_reserved_highatomic += pageblock_nr_pages; + } else { + change_pageblock_range(page, order, MIGRATE_HIGHATOMIC); + zone->nr_reserved_highatomic += 1 << order; + } + +out_unlock: + spin_unlock_irqrestore(&zone->lock, flags); +} + +/* + * Used when an allocation is about to fail under memory pressure. This + * potentially hurts the reliability of high-order allocations when under + * intense memory pressure but failed atomic allocations should be easier + * to recover from than an OOM. + * + * If @force is true, try to unreserve pageblocks even though highatomic + * pageblock is exhausted. + */ +static bool unreserve_highatomic_pageblock(const struct alloc_context *ac, + bool force) +{ + struct zonelist *zonelist = ac->zonelist; + unsigned long flags; + struct zoneref *z; + struct zone *zone; + struct page *page; + int order; + bool skip_unreserve_highatomic = false; + int ret; + + for_each_zone_zonelist_nodemask(zone, z, zonelist, ac->highest_zoneidx, + ac->nodemask) { + /* + * Preserve at least one pageblock unless memory pressure + * is really high. + */ + if (!force && zone->nr_reserved_highatomic <= + pageblock_nr_pages) + continue; + + trace_android_vh_unreserve_highatomic_bypass(force, zone, + &skip_unreserve_highatomic); + if (skip_unreserve_highatomic) + continue; + + spin_lock_irqsave(&zone->lock, flags); + for (order = 0; order < NR_PAGE_ORDERS; order++) { + struct free_area *area = &(zone->free_area[order]); + unsigned long size; + + page = get_page_from_free_area(area, MIGRATE_HIGHATOMIC); + if (!page) + continue; + + /* + * It should never happen but changes to + * locking could inadvertently allow a per-cpu + * drain to add pages to MIGRATE_HIGHATOMIC + * while unreserving so be safe and watch for + * underflows. + */ + size = max(pageblock_nr_pages, 1UL << order); + size = min(size, zone->nr_reserved_highatomic); + zone->nr_reserved_highatomic -= size; + + /* + * Convert to ac->migratetype and avoid the normal + * pageblock stealing heuristics. Minimally, the caller + * is doing the work and needs the pages. More + * importantly, if the block was always converted to + * MIGRATE_UNMOVABLE or another type then the number + * of pageblocks that cannot be completely freed + * may increase. + */ + if (order < pageblock_order) + ret = move_freepages_block(zone, page, + MIGRATE_HIGHATOMIC, + ac->migratetype); + else { + move_to_free_list(page, zone, order, + MIGRATE_HIGHATOMIC, + ac->migratetype); + change_pageblock_range(page, order, + ac->migratetype); + ret = 1; + } + /* + * Reserving the block(s) already succeeded, + * so this should not fail on zone boundaries. + */ + WARN_ON_ONCE(ret == -1); + if (ret > 0) { + spin_unlock_irqrestore(&zone->lock, flags); + return ret; + } + } + spin_unlock_irqrestore(&zone->lock, flags); + } + + return false; +} + static inline long __zone_watermark_unusable_free(struct zone *z, unsigned int order, unsigned int alloc_flags) { From 59eb95395c83d3e304302b95cff6d804aaf62b86 Mon Sep 17 00:00:00 2001 From: Brendan Jackman Date: Fri, 28 Feb 2025 09:52:17 +0000 Subject: [PATCH 12/49] BACKPORT: mm/page_alloc: clarify terminology in migratetype fallback code Patch series "mm/page_alloc: Some clarifications for migratetype fallback", v4. 
A couple of patches to try and make the code easier to follow. This patch (of 2): This code is rather confusing because: 1. "Steal" is sometimes used to refer to the general concept of allocating from a from a block of a fallback migratetype (steal_suitable_fallback()) but sometimes it refers specifically to converting a whole block's migratetype (can_steal_fallback()). 2. can_steal_fallback() sounds as though it's answering the question "am I functionally permitted to allocate from that other type" but in fact it is encoding a heuristic preference. 3. The same piece of data has different names in different places: can_steal vs whole_block. This reinforces point 2 because it looks like the different names reflect a shift in intent from "am I allowed to steal" to "do I want to steal", but no such shift exists. Fix 1. by avoiding the term "steal" in ambiguous contexts. Start using the term "claim" to refer to the special case of stealing the entire block. Fix 2. by using "should" instead of "can", and also rename its parameters and add some commentary to make it more explicit what they mean. Fix 3. by adopting the new "claim" terminology universally for this set of variables. Link: https://lkml.kernel.org/r/20250228-clarify-steal-v4-0-cb2ef1a4e610@google.com Link: https://lkml.kernel.org/r/20250228-clarify-steal-v4-1-cb2ef1a4e610@google.com Signed-off-by: Brendan Jackman Reviewed-by: Vlastimil Babka Cc: Johannes Weiner Cc: Mel Gorman Cc: Michal Hocko Cc: Yosry Ahmed Signed-off-by: Andrew Morton Bug: 420836317 (cherry picked from commit e47f1f56dd82cc6d91f5c4d914a534aa03cd12ca) [In the original patch of the find_suitable_fallback function, replace MIGRATE_PCPTYPES with MIGRATE_FALLBACKS.;] Change-Id: I8f1b57aebf308f378f50cd1381f31d249362078e Signed-off-by: yipeng xiang --- mm/compaction.c | 4 +-- mm/internal.h | 2 +- mm/page_alloc.c | 72 ++++++++++++++++++++++++------------------------- 3 files changed, 39 insertions(+), 39 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 75ee7750ce2a..7e0f264e46d8 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -2279,7 +2279,7 @@ static enum compact_result __compact_finished(struct compact_control *cc) ret = COMPACT_NO_SUITABLE_PAGE; for (order = cc->order; order < NR_PAGE_ORDERS; order++) { struct free_area *area = &cc->zone->free_area[order]; - bool can_steal; + bool claim_block; /* Job done if page is free of the right migratetype */ if (!free_area_empty(area, migratetype)) @@ -2296,7 +2296,7 @@ static enum compact_result __compact_finished(struct compact_control *cc) * other migratetype buddy lists. */ if (find_suitable_fallback(area, order, migratetype, - true, &can_steal) != -1) + true, &claim_block) != -1) /* * Movable pages are OK in any pageblock. 
If we are * stealing for a non-movable allocation, make sure diff --git a/mm/internal.h b/mm/internal.h index da8bd4bfbb3e..313f6e6ea62e 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -815,7 +815,7 @@ void init_cma_reserved_pageblock(struct page *page); #endif /* CONFIG_COMPACTION || CONFIG_CMA */ int find_suitable_fallback(struct free_area *area, unsigned int order, - int migratetype, bool only_stealable, bool *can_steal); + int migratetype, bool claim_only, bool *claim_block); static inline bool free_area_empty(struct free_area *area, int migratetype) { diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 5dd4970d2485..53fd6c8d6611 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2033,22 +2033,22 @@ static inline bool boost_watermark(struct zone *zone) /* * When we are falling back to another migratetype during allocation, try to - * steal extra free pages from the same pageblocks to satisfy further - * allocations, instead of polluting multiple pageblocks. + * claim entire blocks to satisfy further allocations, instead of polluting + * multiple pageblocks. * - * If we are stealing a relatively large buddy page, it is likely there will - * be more free pages in the pageblock, so try to steal them all. For - * reclaimable and unmovable allocations, we steal regardless of page size, - * as fragmentation caused by those allocations polluting movable pageblocks - * is worse than movable allocations stealing from unmovable and reclaimable - * pageblocks. + * If we are stealing a relatively large buddy page, it is likely there will be + * more free pages in the pageblock, so try to claim the whole block. For + * reclaimable and unmovable allocations, we try to claim the whole block + * regardless of page size, as fragmentation caused by those allocations + * polluting movable pageblocks is worse than movable allocations stealing from + * unmovable and reclaimable pageblocks. */ -static bool can_steal_fallback(unsigned int order, int start_mt) +static bool should_try_claim_block(unsigned int order, int start_mt) { /* * Leaving this order check is intended, although there is * relaxed order check in next check. The reason is that - * we can actually steal whole pageblock if this condition met, + * we can actually claim the whole pageblock if this condition met, * but, below check doesn't guarantee it and that is just heuristic * so could be changed anytime. */ @@ -2061,7 +2061,7 @@ static bool can_steal_fallback(unsigned int order, int start_mt) * reclaimable pages that are closest to the request size. After a * while, memory compaction may occur to form large contiguous pages, * and the next movable allocation may not need to steal. Unmovable and - * reclaimable allocations need to actually steal pages. + * reclaimable allocations need to actually claim the whole block. */ if (order >= pageblock_order / 2 || start_mt == MIGRATE_RECLAIMABLE || @@ -2074,12 +2074,14 @@ static bool can_steal_fallback(unsigned int order, int start_mt) /* * Check whether there is a suitable fallback freepage with requested order. - * If only_stealable is true, this function returns fallback_mt only if - * we can steal other freepages all together. This would help to reduce + * Sets *claim_block to instruct the caller whether it should convert a whole + * pageblock to the returned migratetype. + * If only_claim is true, this function returns fallback_mt only if + * we would do this whole-block claiming. This would help to reduce * fragmentation due to mixed migratetype pages in one pageblock. 
*/ int find_suitable_fallback(struct free_area *area, unsigned int order, - int migratetype, bool only_stealable, bool *can_steal) + int migratetype, bool only_claim, bool *claim_block) { int i; int fallback_mt; @@ -2087,19 +2089,16 @@ int find_suitable_fallback(struct free_area *area, unsigned int order, if (area->nr_free == 0) return -1; - *can_steal = false; + *claim_block = false; for (i = 0; i < MIGRATE_FALLBACKS - 1 ; i++) { fallback_mt = fallbacks[migratetype][i]; if (free_area_empty(area, fallback_mt)) continue; - if (can_steal_fallback(order, migratetype)) - *can_steal = true; + if (should_try_claim_block(order, migratetype)) + *claim_block = true; - if (!only_stealable) - return fallback_mt; - - if (*can_steal) + if (*claim_block || !only_claim) return fallback_mt; } @@ -2107,14 +2106,14 @@ int find_suitable_fallback(struct free_area *area, unsigned int order, } /* - * This function implements actual steal behaviour. If order is large enough, we - * can claim the whole pageblock for the requested migratetype. If not, we check - * the pageblock for constituent pages; if at least half of the pages are free - * or compatible, we can still claim the whole block, so pages freed in the - * future will be put on the correct free list. + * This function implements actual block claiming behaviour. If order is large + * enough, we can claim the whole pageblock for the requested migratetype. If + * not, we check the pageblock for constituent pages; if at least half of the + * pages are free or compatible, we can still claim the whole block, so pages + * freed in the future will be put on the correct free list. */ static struct page * -try_to_steal_block(struct zone *zone, struct page *page, +try_to_claim_block(struct zone *zone, struct page *page, int current_order, int order, int start_type, int block_type, unsigned int alloc_flags) { @@ -2182,11 +2181,12 @@ try_to_steal_block(struct zone *zone, struct page *page, /* * Try finding a free buddy page on the fallback list. * - * This will attempt to steal a whole pageblock for the requested type + * This will attempt to claim a whole pageblock for the requested type * to ensure grouping of such requests in the future. * - * If a whole block cannot be stolen, regress to __rmqueue_smallest() - * logic to at least break up as little contiguity as possible. + * If a whole block cannot be claimed, steal an individual page, regressing to + * __rmqueue_smallest() logic to at least break up as little contiguity as + * possible. 
* * The use of signed ints for order and current_order is a deliberate * deviation from the rest of this file, to make the for loop @@ -2203,7 +2203,7 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype, int min_order = order; struct page *page; int fallback_mt; - bool can_steal; + bool claim_block; /* * Do not steal pages from freelists belonging to other pageblocks @@ -2222,15 +2222,15 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype, --current_order) { area = &(zone->free_area[current_order]); fallback_mt = find_suitable_fallback(area, current_order, - start_migratetype, false, &can_steal); + start_migratetype, false, &claim_block); if (fallback_mt == -1) continue; - if (!can_steal) + if (!claim_block) break; page = get_page_from_free_area(area, fallback_mt); - page = try_to_steal_block(zone, page, current_order, order, + page = try_to_claim_block(zone, page, current_order, order, start_migratetype, fallback_mt, alloc_flags); if (page) @@ -2240,11 +2240,11 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype, if (alloc_flags & ALLOC_NOFRAGMENT) return NULL; - /* No luck stealing blocks. Find the smallest fallback page */ + /* No luck claiming pageblock. Find the smallest fallback page */ for (current_order = order; current_order < NR_PAGE_ORDERS; current_order++) { area = &(zone->free_area[current_order]); fallback_mt = find_suitable_fallback(area, current_order, - start_migratetype, false, &can_steal); + start_migratetype, false, &claim_block); if (fallback_mt == -1) continue; From ae27d6c79c4ece8ff2c103cd2548c6263c4b25da Mon Sep 17 00:00:00 2001 From: Brendan Jackman Date: Fri, 28 Feb 2025 09:52:18 +0000 Subject: [PATCH 13/49] UPSTREAM: mm/page_alloc: clarify should_claim_block() commentary There's lots of text here but it's a little hard to follow, this is an attempt to break it up and align its structure more closely with the code. Reword the top-level function comment to just explain what question the function answers from the point of view of the caller. Break up the internal logic into different sections that can have their own commentary describing why that part of the rationale is present. Note the page_group_by_mobility_disabled logic is not explained in the commentary, that is outside the scope of this patch... Link: https://lkml.kernel.org/r/20250228-clarify-steal-v4-2-cb2ef1a4e610@google.com Signed-off-by: Brendan Jackman Reviewed-by: Vlastimil Babka Cc: Johannes Weiner Cc: Mel Gorman Cc: Michal Hocko Cc: Yosry Ahmed Signed-off-by: Andrew Morton Bug: 420836317 (cherry picked from commit a14efee04796dd3f614eaf5348ca1ac099c21349) Change-Id: I6c7f908a4e9f025726dadab210c2d59004fe1946 Signed-off-by: yipeng xiang --- mm/page_alloc.c | 46 ++++++++++++++++++++++++++-------------------- 1 file changed, 26 insertions(+), 20 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 53fd6c8d6611..af4ca23861e8 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2032,16 +2032,9 @@ static inline bool boost_watermark(struct zone *zone) } /* - * When we are falling back to another migratetype during allocation, try to - * claim entire blocks to satisfy further allocations, instead of polluting - * multiple pageblocks. - * - * If we are stealing a relatively large buddy page, it is likely there will be - * more free pages in the pageblock, so try to claim the whole block. 
For - * reclaimable and unmovable allocations, we try to claim the whole block - * regardless of page size, as fragmentation caused by those allocations - * polluting movable pageblocks is worse than movable allocations stealing from - * unmovable and reclaimable pageblocks. + * When we are falling back to another migratetype during allocation, should we + * try to claim an entire block to satisfy further allocations, instead of + * polluting multiple pageblocks? */ static bool should_try_claim_block(unsigned int order, int start_mt) { @@ -2056,19 +2049,32 @@ static bool should_try_claim_block(unsigned int order, int start_mt) return true; /* - * Movable pages won't cause permanent fragmentation, so when you alloc - * small pages, you just need to temporarily steal unmovable or - * reclaimable pages that are closest to the request size. After a - * while, memory compaction may occur to form large contiguous pages, - * and the next movable allocation may not need to steal. Unmovable and - * reclaimable allocations need to actually claim the whole block. + * Above a certain threshold, always try to claim, as it's likely there + * will be more free pages in the pageblock. */ - if (order >= pageblock_order / 2 || - start_mt == MIGRATE_RECLAIMABLE || - start_mt == MIGRATE_UNMOVABLE || - page_group_by_mobility_disabled) + if (order >= pageblock_order / 2) return true; + /* + * Unmovable/reclaimable allocations would cause permanent + * fragmentations if they fell back to allocating from a movable block + * (polluting it), so we try to claim the whole block regardless of the + * allocation size. Later movable allocations can always steal from this + * block, which is less problematic. + */ + if (start_mt == MIGRATE_RECLAIMABLE || start_mt == MIGRATE_UNMOVABLE) + return true; + + if (page_group_by_mobility_disabled) + return true; + + /* + * Movable pages won't cause permanent fragmentation, so when you alloc + * small pages, we just need to temporarily steal unmovable or + * reclaimable pages that are closest to the request size. After a + * while, memory compaction may occur to form large contiguous pages, + * and the next movable allocation may not need to steal. + */ return false; } From b5b61c9e5781847fb6311900b54a060c3d7af420 Mon Sep 17 00:00:00 2001 From: Johannes Weiner Date: Mon, 7 Apr 2025 14:01:53 -0400 Subject: [PATCH 14/49] BACKPORT: mm: page_alloc: speed up fallbacks in rmqueue_bulk() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The test robot identified c2f6ea38fc1b ("mm: page_alloc: don't steal single pages from biggest buddy") as the root cause of a 56.4% regression in vm-scalability::lru-file-mmap-read. Carlos reports an earlier patch, c0cd6f557b90 ("mm: page_alloc: fix freelist movement during block conversion"), as the root cause for a regression in worst-case zone->lock+irqoff hold times. Both of these patches modify the page allocator's fallback path to be less greedy in an effort to stave off fragmentation. The flip side of this is that fallbacks are also less productive each time around, which means the fallback search can run much more frequently. Carlos' traces point to rmqueue_bulk() specifically, which tries to refill the percpu cache by allocating a large batch of pages in a loop. It highlights how once the native freelists are exhausted, the fallback code first scans orders top-down for whole blocks to claim, then falls back to a bottom-up search for the smallest buddy to steal. 
For the next batch page, it goes through the same thing again. This can be made more efficient. Since rmqueue_bulk() holds the zone->lock over the entire batch, the freelists are not subject to outside changes; when the search for a block to claim has already failed, there is no point in trying again for the next page. Modify __rmqueue() to remember the last successful fallback mode, and restart directly from there on the next rmqueue_bulk() iteration. Oliver confirms that this improves beyond the regression that the test robot reported against c2f6ea38fc1b: commit: f3b92176f4 ("tools/selftests: add guard region test for /proc/$pid/pagemap") c2f6ea38fc ("mm: page_alloc: don't steal single pages from biggest buddy") acc4d5ff0b ("Merge tag 'net-6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net") 2c847f27c3 ("mm: page_alloc: speed up fallbacks in rmqueue_bulk()") <--- your patch f3b92176f4f7100f c2f6ea38fc1b640aa7a2e155cc1 acc4d5ff0b61eb1715c498b6536 2c847f27c37da65a93d23c237c5 ---------------- --------------------------- --------------------------- --------------------------- %stddev %change %stddev %change %stddev %change %stddev \ | \ | \ | \ 25525364 ± 3% -56.4% 11135467 -57.8% 10779336 +31.6% 33581409 vm-scalability.throughput Carlos confirms that worst-case times are almost fully recovered compared to before the earlier culprit patch: 2dd482ba627d (before freelist hygiene): 1ms c0cd6f557b90 (after freelist hygiene): 90ms next-20250319 (steal smallest buddy): 280ms this patch : 8ms [jackmanb@google.com: comment updates] Link: https://lkml.kernel.org/r/D92AC0P9594X.3BML64MUKTF8Z@google.com [hannes@cmpxchg.org: reset rmqueue_mode in rmqueue_buddy() error loop, per Yunsheng Lin] Link: https://lkml.kernel.org/r/20250409140023.GA2313@cmpxchg.org Link: https://lkml.kernel.org/r/20250407180154.63348-1-hannes@cmpxchg.org Fixes: c0cd6f557b90 ("mm: page_alloc: fix freelist movement during block conversion") Fixes: c2f6ea38fc1b ("mm: page_alloc: don't steal single pages from biggest buddy") Signed-off-by: Johannes Weiner Signed-off-by: Brendan Jackman Reported-by: kernel test robot Reported-by: Carlos Song Tested-by: Carlos Song Tested-by: kernel test robot Closes: https://lore.kernel.org/oe-lkp/202503271547.fc08b188-lkp@intel.com Reviewed-by: Brendan Jackman Tested-by: Shivank Garg Acked-by: Zi Yan Reviewed-by: Vlastimil Babka Cc: [6.10+] Signed-off-by: Andrew Morton Bug: 420836317 (cherry picked from commit 90abee6d7895d5eef18c91d870d8168be4e76e9d) [Resolve conflicts caused by cma_redirect_restricted ] Change-Id: I4bf9e270886716b0a3f11f9edce9a73e855b1fe9 Signed-off-by: yipeng xiang --- mm/page_alloc.c | 116 +++++++++++++++++++++++++++++++++--------------- 1 file changed, 81 insertions(+), 35 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index af4ca23861e8..97dc7a5280a2 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2185,23 +2185,15 @@ try_to_claim_block(struct zone *zone, struct page *page, } /* - * Try finding a free buddy page on the fallback list. - * - * This will attempt to claim a whole pageblock for the requested type - * to ensure grouping of such requests in the future. - * - * If a whole block cannot be claimed, steal an individual page, regressing to - * __rmqueue_smallest() logic to at least break up as little contiguity as - * possible. + * Try to allocate from some fallback migratetype by claiming the entire block, + * i.e. converting it to the allocation's start migratetype. 
* * The use of signed ints for order and current_order is a deliberate * deviation from the rest of this file, to make the for loop * condition simpler. - * - * Return the stolen page, or NULL if none can be found. */ static __always_inline struct page * -__rmqueue_fallback(struct zone *zone, int order, int start_migratetype, +__rmqueue_claim(struct zone *zone, int order, int start_migratetype, unsigned int alloc_flags) { struct free_area *area; @@ -2239,14 +2231,29 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype, page = try_to_claim_block(zone, page, current_order, order, start_migratetype, fallback_mt, alloc_flags); - if (page) - goto got_one; + if (page) { + trace_mm_page_alloc_extfrag(page, order, current_order, + start_migratetype, fallback_mt); + return page; + } } - if (alloc_flags & ALLOC_NOFRAGMENT) - return NULL; + return NULL; +} + +/* + * Try to steal a single page from some fallback migratetype. Leave the rest of + * the block as its current migratetype, potentially causing fragmentation. + */ +static __always_inline struct page * +__rmqueue_steal(struct zone *zone, int order, int start_migratetype) +{ + struct free_area *area; + int current_order; + struct page *page; + int fallback_mt; + bool claim_block; - /* No luck claiming pageblock. Find the smallest fallback page */ for (current_order = order; current_order < NR_PAGE_ORDERS; current_order++) { area = &(zone->free_area[current_order]); fallback_mt = find_suitable_fallback(area, current_order, @@ -2256,25 +2263,28 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype, page = get_page_from_free_area(area, fallback_mt); page_del_and_expand(zone, page, order, current_order, fallback_mt); - goto got_one; + trace_mm_page_alloc_extfrag(page, order, current_order, + start_migratetype, fallback_mt); + return page; } return NULL; - -got_one: - trace_mm_page_alloc_extfrag(page, order, current_order, - start_migratetype, fallback_mt); - - return page; } +enum rmqueue_mode { + RMQUEUE_NORMAL, + RMQUEUE_CMA, + RMQUEUE_CLAIM, + RMQUEUE_STEAL, +}; + /* * Do the hard work of removing an element from the buddy allocator. * Call me with the zone->lock already held. */ static __always_inline struct page * __rmqueue(struct zone *zone, unsigned int order, int migratetype, - unsigned int alloc_flags) + unsigned int alloc_flags, enum rmqueue_mode *mode) { struct page *page = NULL; @@ -2297,16 +2307,48 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype, } } - page = __rmqueue_smallest(zone, order, migratetype); - if (unlikely(!page)) { - if (!cma_redirect_restricted() && alloc_flags & ALLOC_CMA) + /* + * First try the freelists of the requested migratetype, then try + * fallbacks modes with increasing levels of fragmentation risk. + * + * The fallback logic is expensive and rmqueue_bulk() calls in + * a loop with the zone->lock held, meaning the freelists are + * not subject to any outside changes. Remember in *mode where + * we found pay dirt, to save us the search on the next call. 
+ */ + switch (*mode) { + case RMQUEUE_NORMAL: + page = __rmqueue_smallest(zone, order, migratetype); + if (page) + return page; + fallthrough; + case RMQUEUE_CMA: + if (!cma_redirect_restricted() && alloc_flags & ALLOC_CMA) { page = __rmqueue_cma_fallback(zone, order); - - if (!page) - page = __rmqueue_fallback(zone, order, migratetype, - alloc_flags); + if (page) { + *mode = RMQUEUE_CMA; + return page; + } + } + fallthrough; + case RMQUEUE_CLAIM: + page = __rmqueue_claim(zone, order, migratetype, alloc_flags); + if (page) { + /* Replenished preferred freelist, back to normal mode. */ + *mode = RMQUEUE_NORMAL; + return page; + } + fallthrough; + case RMQUEUE_STEAL: + if (!(alloc_flags & ALLOC_NOFRAGMENT)) { + page = __rmqueue_steal(zone, order, migratetype); + if (page) { + *mode = RMQUEUE_STEAL; + return page; + } + } } - return page; + return NULL; } /* @@ -2318,6 +2360,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, unsigned long count, struct list_head *list, int migratetype, unsigned int alloc_flags) { + enum rmqueue_mode rmqm = RMQUEUE_NORMAL; unsigned long flags; int i; @@ -2333,7 +2376,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order, if (cma_redirect_restricted() && is_migrate_cma(migratetype)) page = __rmqueue_cma_fallback(zone, order); else - page = __rmqueue(zone, order, migratetype, alloc_flags); + page = __rmqueue(zone, order, migratetype, alloc_flags, &rmqm); if (unlikely(page == NULL)) break; @@ -2889,9 +2932,12 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone, alloc_flags & ALLOC_CMA) page = __rmqueue_cma_fallback(zone, order); - if (!page) + if (!page) { + enum rmqueue_mode rmqm = RMQUEUE_NORMAL; + page = __rmqueue(zone, order, migratetype, - alloc_flags); + alloc_flags, &rmqm); + } /* * If the allocation fails, allow OOM handling and * order-0 (atomic) allocs access to HIGHATOMIC From 2bc327484ee44b9e74be35b014e1e8a09d470bbd Mon Sep 17 00:00:00 2001 From: Johannes Weiner Date: Mon, 7 Apr 2025 14:01:54 -0400 Subject: [PATCH 15/49] BACKPORT: mm: page_alloc: tighten up find_suitable_fallback() find_suitable_fallback() is not as efficient as it could be, and somewhat difficult to follow. 1. should_try_claim_block() is a loop invariant. There is no point in checking fallback areas if the caller is interested in claimable blocks but the order and the migratetype don't allow for that. 2. __rmqueue_steal() doesn't care about claimability, so it shouldn't have to run those tests. Different callers want different things from this helper: 1. __compact_finished() scans orders up until it finds a claimable block 2. __rmqueue_claim() scans orders down as long as blocks are claimable 3. __rmqueue_steal() doesn't care about claimability at all Move should_try_claim_block() out of the loop. Only test it for the two callers who care in the first place. Distinguish "no blocks" from "order + mt are not claimable" in the return value; __rmqueue_claim() can stop once order becomes unclaimable, __compact_finished() can keep advancing until order becomes claimable. 
Before: Performance counter stats for './run case-lru-file-mmap-read' (5 runs): 85,294.85 msec task-clock # 5.644 CPUs utilized ( +- 0.32% ) 15,968 context-switches # 187.209 /sec ( +- 3.81% ) 153 cpu-migrations # 1.794 /sec ( +- 3.29% ) 801,808 page-faults # 9.400 K/sec ( +- 0.10% ) 733,358,331,786 instructions # 1.87 insn per cycle ( +- 0.20% ) (64.94%) 392,622,904,199 cycles # 4.603 GHz ( +- 0.31% ) (64.84%) 148,563,488,531 branches # 1.742 G/sec ( +- 0.18% ) (63.86%) 152,143,228 branch-misses # 0.10% of all branches ( +- 1.19% ) (62.82%) 15.1128 +- 0.0637 seconds time elapsed ( +- 0.42% ) After: Performance counter stats for './run case-lru-file-mmap-read' (5 runs): 84,380.21 msec task-clock # 5.664 CPUs utilized ( +- 0.21% ) 16,656 context-switches # 197.392 /sec ( +- 3.27% ) 151 cpu-migrations # 1.790 /sec ( +- 3.28% ) 801,703 page-faults # 9.501 K/sec ( +- 0.09% ) 731,914,183,060 instructions # 1.88 insn per cycle ( +- 0.38% ) (64.90%) 388,673,535,116 cycles # 4.606 GHz ( +- 0.24% ) (65.06%) 148,251,482,143 branches # 1.757 G/sec ( +- 0.37% ) (63.92%) 149,766,550 branch-misses # 0.10% of all branches ( +- 1.22% ) (62.88%) 14.8968 +- 0.0486 seconds time elapsed ( +- 0.33% ) Link: https://lkml.kernel.org/r/20250407180154.63348-2-hannes@cmpxchg.org Signed-off-by: Johannes Weiner Reviewed-by: Brendan Jackman Tested-by: Shivank Garg Reviewed-by: Vlastimil Babka Cc: Carlos Song Cc: Mel Gorman Signed-off-by: Andrew Morton Bug: 420836317 Change-Id: I2886de9da0fd99047cf5c675cd2ae7c386267770 (cherry picked from commit ee414bd97b3fa0a4f74e40004e3b4191326bd46c) [In the original patch, the variable MIGRATE_PCPTYPES in the find_suitable_fallback function should be MIGRATE_FALLBACKS in Linux 6.6, causing the patch to fail to apply directly.] Signed-off-by: yipeng xiang --- mm/compaction.c | 4 +--- mm/internal.h | 2 +- mm/page_alloc.c | 31 +++++++++++++------------------ 3 files changed, 15 insertions(+), 22 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 7e0f264e46d8..89570cd884c7 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -2279,7 +2279,6 @@ static enum compact_result __compact_finished(struct compact_control *cc) ret = COMPACT_NO_SUITABLE_PAGE; for (order = cc->order; order < NR_PAGE_ORDERS; order++) { struct free_area *area = &cc->zone->free_area[order]; - bool claim_block; /* Job done if page is free of the right migratetype */ if (!free_area_empty(area, migratetype)) @@ -2295,8 +2294,7 @@ static enum compact_result __compact_finished(struct compact_control *cc) * Job done if allocation would steal freepages from * other migratetype buddy lists. */ - if (find_suitable_fallback(area, order, migratetype, - true, &claim_block) != -1) + if (find_suitable_fallback(area, order, migratetype, true) >= 0) /* * Movable pages are OK in any pageblock. 
If we are * stealing for a non-movable allocation, make sure diff --git a/mm/internal.h b/mm/internal.h index 313f6e6ea62e..3fb4222fc3c9 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -815,7 +815,7 @@ void init_cma_reserved_pageblock(struct page *page); #endif /* CONFIG_COMPACTION || CONFIG_CMA */ int find_suitable_fallback(struct free_area *area, unsigned int order, - int migratetype, bool claim_only, bool *claim_block); + int migratetype, bool claimable); static inline bool free_area_empty(struct free_area *area, int migratetype) { diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 97dc7a5280a2..d1ecd3793e40 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2080,31 +2080,25 @@ static bool should_try_claim_block(unsigned int order, int start_mt) /* * Check whether there is a suitable fallback freepage with requested order. - * Sets *claim_block to instruct the caller whether it should convert a whole - * pageblock to the returned migratetype. - * If only_claim is true, this function returns fallback_mt only if + * If claimable is true, this function returns fallback_mt only if * we would do this whole-block claiming. This would help to reduce * fragmentation due to mixed migratetype pages in one pageblock. */ int find_suitable_fallback(struct free_area *area, unsigned int order, - int migratetype, bool only_claim, bool *claim_block) + int migratetype, bool claimable) { int i; - int fallback_mt; + + if (claimable && !should_try_claim_block(order, migratetype)) + return -2; if (area->nr_free == 0) return -1; - *claim_block = false; for (i = 0; i < MIGRATE_FALLBACKS - 1 ; i++) { - fallback_mt = fallbacks[migratetype][i]; - if (free_area_empty(area, fallback_mt)) - continue; + int fallback_mt = fallbacks[migratetype][i]; - if (should_try_claim_block(order, migratetype)) - *claim_block = true; - - if (*claim_block || !only_claim) + if (!free_area_empty(area, fallback_mt)) return fallback_mt; } @@ -2201,7 +2195,6 @@ __rmqueue_claim(struct zone *zone, int order, int start_migratetype, int min_order = order; struct page *page; int fallback_mt; - bool claim_block; /* * Do not steal pages from freelists belonging to other pageblocks @@ -2220,11 +2213,14 @@ __rmqueue_claim(struct zone *zone, int order, int start_migratetype, --current_order) { area = &(zone->free_area[current_order]); fallback_mt = find_suitable_fallback(area, current_order, - start_migratetype, false, &claim_block); + start_migratetype, true); + + /* No block in that order */ if (fallback_mt == -1) continue; - if (!claim_block) + /* Advanced into orders too low to claim, abort */ + if (fallback_mt == -2) break; page = get_page_from_free_area(area, fallback_mt); @@ -2252,12 +2248,11 @@ __rmqueue_steal(struct zone *zone, int order, int start_migratetype) int current_order; struct page *page; int fallback_mt; - bool claim_block; for (current_order = order; current_order < NR_PAGE_ORDERS; current_order++) { area = &(zone->free_area[current_order]); fallback_mt = find_suitable_fallback(area, current_order, - start_migratetype, false, &claim_block); + start_migratetype, false); if (fallback_mt == -1) continue; From 6d61bc2d2d7fcb6e4eef25e5182563b0ae79e4c3 Mon Sep 17 00:00:00 2001 From: Richard Chang Date: Wed, 2 Jul 2025 07:16:49 +0000 Subject: [PATCH 16/49] ANDROID: restricted vendor_hook: add swap_readpage_bdev_sync Add restricted vendor hook to optimize the swap-in latency. 
Bug: 401975249 Bug: 428209185 Change-Id: I1a2be1a309769590cb427e13762e29d8c8fa9cf6 Signed-off-by: Richard Chang --- drivers/android/vendor_hooks.c | 1 + include/trace/hooks/mm.h | 4 ++++ mm/page_io.c | 13 +++++++++++++ 3 files changed, 18 insertions(+) diff --git a/drivers/android/vendor_hooks.c b/drivers/android/vendor_hooks.c index d3f7ff4fde56..00ffd7ed2ffc 100644 --- a/drivers/android/vendor_hooks.c +++ b/drivers/android/vendor_hooks.c @@ -625,6 +625,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_migration_target_bypass); EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_shrink_node_memcgs); EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_swap_writepage); EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_swap_readpage_bdev_sync); +EXPORT_TRACEPOINT_SYMBOL_GPL(android_rvh_swap_readpage_bdev_sync); EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_dpm_wait_start); EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_dpm_wait_finish); EXPORT_TRACEPOINT_SYMBOL_GPL(android_vh_sync_irq_wait_start); diff --git a/include/trace/hooks/mm.h b/include/trace/hooks/mm.h index 65eb40c00944..8087138ba33c 100644 --- a/include/trace/hooks/mm.h +++ b/include/trace/hooks/mm.h @@ -549,6 +549,10 @@ DECLARE_HOOK(android_vh_swap_readpage_bdev_sync, TP_PROTO(struct block_device *bdev, sector_t sector, struct page *page, bool *read), TP_ARGS(bdev, sector, page, read)); +DECLARE_RESTRICTED_HOOK(android_rvh_swap_readpage_bdev_sync, + TP_PROTO(struct block_device *bdev, sector_t sector, + struct page *page, bool *read), + TP_ARGS(bdev, sector, page, read), 4); DECLARE_HOOK(android_vh_alloc_flags_cma_adjust, TP_PROTO(gfp_t gfp_mask, unsigned int *alloc_flags), TP_ARGS(gfp_mask, alloc_flags)); diff --git a/mm/page_io.c b/mm/page_io.c index 648fd53303a9..a3feadd1ba9e 100644 --- a/mm/page_io.c +++ b/mm/page_io.c @@ -471,6 +471,19 @@ static void swap_readpage_bdev_sync(struct folio *folio, struct bio bio; bool read = false; + trace_android_rvh_swap_readpage_bdev_sync(sis->bdev, + swap_page_sector(&folio->page) + get_start_sect(sis->bdev), + &folio->page, &read); + if (read) { + count_vm_events(PSWPIN, folio_nr_pages(folio)); + return; + } + + /* + * trace_android_vh_swap_readpage_bdev_sync is deprecated, and + * should not be carried over into later kernels. + * Use trace_android_rvh_swap_readpage_bdev_sync instead. 
+ */ trace_android_vh_swap_readpage_bdev_sync(sis->bdev, swap_page_sector(&folio->page) + get_start_sect(sis->bdev), &folio->page, &read); From cc8b083f6fb69d104e028513ddf46ef77adb2723 Mon Sep 17 00:00:00 2001 From: Richard Chang Date: Wed, 2 Jul 2025 07:58:27 +0000 Subject: [PATCH 17/49] ANDROID: ABI: Update pixel symbol list Adding the following symbols: - __traceiter_android_rvh_swap_readpage_bdev_sync - __tracepoint_android_rvh_swap_readpage_bdev_sync Bug: 401975249 Bug: 428209185 Change-Id: Ibdad385b2a9dc36e585ff3aa1ee9334680c57a20 Signed-off-by: Richard Chang --- android/abi_gki_aarch64.stg | 20 ++++++++++++++++++++ android/abi_gki_aarch64_pixel | 2 ++ 2 files changed, 22 insertions(+) diff --git a/android/abi_gki_aarch64.stg b/android/abi_gki_aarch64.stg index d8ed1e76d857..41b693bf1454 100644 --- a/android/abi_gki_aarch64.stg +++ b/android/abi_gki_aarch64.stg @@ -360789,6 +360789,15 @@ elf_symbol { type_id: 0x9baf3eaf full_name: "__traceiter_android_rvh_show_max_freq" } +elf_symbol { + id: 0xb80ecc98 + name: "__traceiter_android_rvh_swap_readpage_bdev_sync" + is_defined: true + symbol_type: FUNCTION + crc: 0xecf99d88 + type_id: 0x9bab3090 + full_name: "__traceiter_android_rvh_swap_readpage_bdev_sync" +} elf_symbol { id: 0x3b650ee3 name: "__traceiter_android_rvh_tcp_rcv_spurious_retrans" @@ -367899,6 +367908,15 @@ elf_symbol { type_id: 0x18ccbd2c full_name: "__tracepoint_android_rvh_show_max_freq" } +elf_symbol { + id: 0x64ce7cd6 + name: "__tracepoint_android_rvh_swap_readpage_bdev_sync" + is_defined: true + symbol_type: OBJECT + crc: 0x72fbf2a6 + type_id: 0x18ccbd2c + full_name: "__tracepoint_android_rvh_swap_readpage_bdev_sync" +} elf_symbol { id: 0x5380a8d5 name: "__tracepoint_android_rvh_tcp_rcv_spurious_retrans" @@ -436958,6 +436976,7 @@ interface { symbol_id: 0x1228e7e9 symbol_id: 0x73c83ef4 symbol_id: 0x46515de8 + symbol_id: 0xb80ecc98 symbol_id: 0x3b650ee3 symbol_id: 0xcf016f05 symbol_id: 0x79480d0a @@ -437748,6 +437767,7 @@ interface { symbol_id: 0x8a4070f7 symbol_id: 0x00b7ed82 symbol_id: 0xe8cacf26 + symbol_id: 0x64ce7cd6 symbol_id: 0x5380a8d5 symbol_id: 0x1f12a317 symbol_id: 0x454d16cc diff --git a/android/abi_gki_aarch64_pixel b/android/abi_gki_aarch64_pixel index d64fef8faf50..d17b443f7c9e 100644 --- a/android/abi_gki_aarch64_pixel +++ b/android/abi_gki_aarch64_pixel @@ -2669,6 +2669,7 @@ __traceiter_android_rvh_setscheduler_prio __traceiter_android_rvh_set_task_cpu __traceiter_android_rvh_set_user_nice_locked + __traceiter_android_rvh_swap_readpage_bdev_sync __traceiter_android_rvh_tick_entry __traceiter_android_rvh_try_to_wake_up_success __traceiter_android_rvh_uclamp_eff_get @@ -2808,6 +2809,7 @@ __tracepoint_android_rvh_setscheduler_prio __tracepoint_android_rvh_set_task_cpu __tracepoint_android_rvh_set_user_nice_locked + __tracepoint_android_rvh_swap_readpage_bdev_sync __tracepoint_android_rvh_tick_entry __tracepoint_android_rvh_try_to_wake_up_success __tracepoint_android_rvh_uclamp_eff_get From 326b0bd6324844d4fa25ace1939a34e871aa2caf Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Thu, 5 Jun 2025 12:00:09 +0200 Subject: [PATCH 18/49] BACKPORT: FROMGIT: sched/core: Fix migrate_swap() vs. hotplug On Mon, Jun 02, 2025 at 03:22:13PM +0800, Kuyo Chang wrote: > So, the potential race scenario is: > > CPU0 CPU1 > // doing migrate_swap(cpu0/cpu1) > stop_two_cpus() > ... 
> // doing _cpu_down() > sched_cpu_deactivate() > set_cpu_active(cpu, false); > balance_push_set(cpu, true); > cpu_stop_queue_two_works > __cpu_stop_queue_work(stopper1,...); > __cpu_stop_queue_work(stopper2,..); > stop_cpus_in_progress -> true > preempt_enable(); > ... > 1st balance_push > stop_one_cpu_nowait > cpu_stop_queue_work > __cpu_stop_queue_work > list_add_tail -> 1st add push_work > wake_up_q(&wakeq); -> "wakeq is empty. > This implies that the stopper is at wakeq@migrate_swap." > preempt_disable > wake_up_q(&wakeq); > wake_up_process // wakeup migrate/0 > try_to_wake_up > ttwu_queue > ttwu_queue_cond ->meet below case > if (cpu == smp_processor_id()) > return false; > ttwu_do_activate > //migrate/0 wakeup done > wake_up_process // wakeup migrate/1 > try_to_wake_up > ttwu_queue > ttwu_queue_cond > ttwu_queue_wakelist > __ttwu_queue_wakelist > __smp_call_single_queue > preempt_enable(); > > 2nd balance_push > stop_one_cpu_nowait > cpu_stop_queue_work > __cpu_stop_queue_work > list_add_tail -> 2nd add push_work, so the double list add is detected > ... > ... > cpu1 get ipi, do sched_ttwu_pending, wakeup migrate/1 > So this balance_push() is part of schedule(), and schedule() is supposed to switch to stopper task, but because of this race condition, stopper task is stuck in WAKING state and not actually visible to be picked. Therefore CPU1 can do another schedule() and end up doing another balance_push() even though the last one hasn't been done yet. This is a confluence of fail, where both wake_q and ttwu_wakelist can cause crucial wakeups to be delayed, resulting in the malfunction of balance_push. Since there is only a single stopper thread to be woken, the wake_q doesn't really add anything here, and can be removed in favour of direct wakeups of the stopper thread. Then add a clause to ttwu_queue_cond() to ensure the stopper threads are never queued / delayed. Of all 3 moving parts, the last addition was the balance_push() machinery, so pick that as the point the bug was introduced. Fixes: 2558aacff858 ("sched/hotplug: Ensure only per-cpu kthreads run during hotplug") Reported-by: Kuyo Chang Signed-off-by: Peter Zijlstra (Intel) Tested-by: Kuyo Chang Link: https://lkml.kernel.org/r/20250605100009.GO39944@noisy.programming.kicks-ass.net (cherry picked from commit b18ad3387895ae22eb784f721d476094ad71899b git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/urgent) Bug: 419157029 Change-Id: Ia54be189b1ab08f2171c094e4182ebb99330565f [jstultz: Resolved trivial collision in cherry-pick] Signed-off-by: John Stultz --- kernel/sched/core.c | 5 +++++ kernel/stop_machine.c | 20 ++++++++++---------- 2 files changed, 15 insertions(+), 10 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index e3e17a54c71f..41f11c0f834e 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -4073,6 +4073,11 @@ bool cpus_share_cache(int this_cpu, int that_cpu) static inline bool ttwu_queue_cond(struct task_struct *p, int cpu) { +#ifdef CONFIG_SMP + if (p->sched_class == &stop_sched_class) + return false; +#endif + /* * Do not complicate things with the async wake_list while the CPU is * in hotplug state. 
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c index 7b65bb0b4a66..0c3d387d3db7 100644 --- a/kernel/stop_machine.c +++ b/kernel/stop_machine.c @@ -82,18 +82,15 @@ static void cpu_stop_signal_done(struct cpu_stop_done *done) } static void __cpu_stop_queue_work(struct cpu_stopper *stopper, - struct cpu_stop_work *work, - struct wake_q_head *wakeq) + struct cpu_stop_work *work) { list_add_tail(&work->list, &stopper->works); - wake_q_add(wakeq, stopper->thread); } /* queue @work to @stopper. if offline, @work is completed immediately */ static bool cpu_stop_queue_work(unsigned int cpu, struct cpu_stop_work *work) { struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu); - DEFINE_WAKE_Q(wakeq); unsigned long flags; bool enabled; @@ -101,12 +98,13 @@ static bool cpu_stop_queue_work(unsigned int cpu, struct cpu_stop_work *work) raw_spin_lock_irqsave(&stopper->lock, flags); enabled = stopper->enabled; if (enabled) - __cpu_stop_queue_work(stopper, work, &wakeq); + __cpu_stop_queue_work(stopper, work); else if (work->done) cpu_stop_signal_done(work->done); raw_spin_unlock_irqrestore(&stopper->lock, flags); - wake_up_q(&wakeq); + if (enabled) + wake_up_process(stopper->thread); preempt_enable(); return enabled; @@ -264,7 +262,6 @@ static int cpu_stop_queue_two_works(int cpu1, struct cpu_stop_work *work1, { struct cpu_stopper *stopper1 = per_cpu_ptr(&cpu_stopper, cpu1); struct cpu_stopper *stopper2 = per_cpu_ptr(&cpu_stopper, cpu2); - DEFINE_WAKE_Q(wakeq); int err; retry: @@ -300,8 +297,8 @@ retry: } err = 0; - __cpu_stop_queue_work(stopper1, work1, &wakeq); - __cpu_stop_queue_work(stopper2, work2, &wakeq); + __cpu_stop_queue_work(stopper1, work1); + __cpu_stop_queue_work(stopper2, work2); unlock: raw_spin_unlock(&stopper2->lock); @@ -316,7 +313,10 @@ unlock: goto retry; } - wake_up_q(&wakeq); + if (!err) { + wake_up_process(stopper1->thread); + wake_up_process(stopper2->thread); + } preempt_enable(); return err; From 6d27de405aaf6127f4b7184a8377813eb2a030a5 Mon Sep 17 00:00:00 2001 From: Nikita Ioffe Date: Tue, 1 Jul 2025 18:35:42 +0000 Subject: [PATCH 19/49] ANDROID: KVM: arm64: use hyp_trace_raw_fops for trace_pipe_raw The trace_pipe_raw interface is expected to return binary trace format, while hyp_trace_pipe_fops returns the text trace format. This patch change trace_pipe_raw fops hyp_trace_raw_fops which provides the binary output. Bug: 428904926 Test: presubmit Change-Id: Id72d2c7df366934f00b17674078c94c2b2d288be Signed-off-by: Nikita Ioffe --- arch/arm64/kvm/hyp_trace.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/kvm/hyp_trace.c b/arch/arm64/kvm/hyp_trace.c index b4a5d117568a..b6ce4aa33eb0 100644 --- a/arch/arm64/kvm/hyp_trace.c +++ b/arch/arm64/kvm/hyp_trace.c @@ -861,7 +861,7 @@ int hyp_trace_init_tracefs(void) tracefs_create_file("trace_pipe", TRACEFS_MODE_READ, per_cpu_dir, (void *)cpu, &hyp_trace_pipe_fops); tracefs_create_file("trace_pipe_raw", TRACEFS_MODE_READ, per_cpu_dir, - (void *)cpu, &hyp_trace_pipe_fops); + (void *)cpu, &hyp_trace_raw_fops); } hyp_trace_init_event_tracefs(root); From 65f295739c930f94ecd495b993c7571b2c7f4e95 Mon Sep 17 00:00:00 2001 From: Nikita Ioffe Date: Thu, 3 Jul 2025 16:06:03 +0000 Subject: [PATCH 20/49] ANDROID: kvm: arm64: start hypervisor event IDs from 1 IDs of tracing events are expected to be positive integers, hence this patch. This is a quick fix to make sure that hypervisor tracing works, while the proper solution that avoids ID collision is being worked on. 
Bug: 428904926 Test: presubmit Change-Id: I95459cbf32466351b6a539ea2111e8d091291c2b Signed-off-by: Nikita Ioffe --- arch/arm64/kvm/hyp_events.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/arch/arm64/kvm/hyp_events.c b/arch/arm64/kvm/hyp_events.c index 424cd5189355..086931bec32c 100644 --- a/arch/arm64/kvm/hyp_events.c +++ b/arch/arm64/kvm/hyp_events.c @@ -250,7 +250,10 @@ bool hyp_trace_init_event_early(void) } static struct dentry *event_tracefs; -static unsigned int last_event_id; +// Event IDs should be positive integers, hence starting from 1 here. +// NOTE: this introduces ID clash between hypervisor events and kernel events. +// For now this doesn't seem to cause problems, but we should fix it... +static unsigned int last_event_id = 1; struct hyp_event_table { struct hyp_event *start; From 925ea90047178598e9bfa45b7e82505c069d3dec Mon Sep 17 00:00:00 2001 From: Nikita Ioffe Date: Fri, 4 Jul 2025 12:04:53 +0000 Subject: [PATCH 21/49] ANDROID: kvm: arm64: add per_cpu/cpuX/trace file The trace interface was present in android14-6.1 kernel, and is used by perfetto (although perfetto can work without it), so we should keep it. Bug: 428904926 Test: presubmit Change-Id: I51cc82324b3ef1ad8a801ae54f427eaf8790acd2 Signed-off-by: Nikita Ioffe --- arch/arm64/kvm/hyp_trace.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/arm64/kvm/hyp_trace.c b/arch/arm64/kvm/hyp_trace.c index b6ce4aa33eb0..4eb85aad055f 100644 --- a/arch/arm64/kvm/hyp_trace.c +++ b/arch/arm64/kvm/hyp_trace.c @@ -862,6 +862,8 @@ int hyp_trace_init_tracefs(void) (void *)cpu, &hyp_trace_pipe_fops); tracefs_create_file("trace_pipe_raw", TRACEFS_MODE_READ, per_cpu_dir, (void *)cpu, &hyp_trace_raw_fops); + tracefs_create_file("trace", TRACEFS_MODE_WRITE, per_cpu_dir, + (void *)cpu, &hyp_trace_fops); } hyp_trace_init_event_tracefs(root); From 84bb4ef6233cddaf10a656f21375d5e008d6751c Mon Sep 17 00:00:00 2001 From: Kuen-Han Tsai Date: Tue, 17 Jun 2025 13:07:12 +0800 Subject: [PATCH 22/49] UPSTREAM: usb: gadget: u_serial: Fix race condition in TTY wakeup A race condition occurs when gs_start_io() calls either gs_start_rx() or gs_start_tx(), as those functions briefly drop the port_lock for usb_ep_queue(). This allows gs_close() and gserial_disconnect() to clear port.tty and port_usb, respectively. Use the null-safe TTY Port helper function to wake up TTY. 
Example CPU1: CPU2: gserial_connect() // lock gs_close() // await lock gs_start_rx() // unlock usb_ep_queue() gs_close() // lock, reset port.tty and unlock gs_start_rx() // lock tty_wakeup() // NPE Fixes: 35f95fd7f234 ("TTY: usb/u_serial, use tty from tty_port") Cc: stable Signed-off-by: Kuen-Han Tsai Reviewed-by: Prashanth K Link: https://lore.kernel.org/linux-usb/20240116141801.396398-1-khtsai@google.com/ Link: https://lore.kernel.org/r/20250617050844.1848232-2-khtsai@google.com Signed-off-by: Greg Kroah-Hartman Bug: 417232809 (cherry picked from commit c529c3730bd09115684644e26bf01ecbd7e2c2c9) Change-Id: I0dfff41aeae526bc3c334266f2773e6636d8dd33 Signed-off-by: Kuen-Han Tsai --- drivers/usb/gadget/function/u_serial.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/usb/gadget/function/u_serial.c b/drivers/usb/gadget/function/u_serial.c index 729b0472bab0..7a306b11881f 100644 --- a/drivers/usb/gadget/function/u_serial.c +++ b/drivers/usb/gadget/function/u_serial.c @@ -291,8 +291,8 @@ __acquires(&port->port_lock) break; } - if (do_tty_wake && port->port.tty) - tty_wakeup(port->port.tty); + if (do_tty_wake) + tty_port_tty_wakeup(&port->port); return status; } @@ -573,7 +573,7 @@ static int gs_start_io(struct gs_port *port) gs_start_tx(port); /* Unblock any pending writes into our circular buffer, in case * we didn't in gs_start_tx() */ - tty_wakeup(port->port.tty); + tty_port_tty_wakeup(&port->port); } else { /* Free reqs only if we are still connected */ if (port->port_usb) { From 5b1c4cc0868731a80e0460cc81504d7a40130f02 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Yuan-Jen=20=28=E6=B7=B5=E4=BB=81=29=20Cheng?= Date: Thu, 3 Jul 2025 10:15:11 +0000 Subject: [PATCH 23/49] ANDROID: Add the dma header to aarch64 allowlist MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Export the header in all_headers_allowlist_aarch64, for dma driver to use. Bug: 343869732 Test: Verified the dma ddk modules are able to include the header. Change-Id: Ib4bc8dada58495dc25bb1b41e6b502c18fe59591 Signed-off-by: Yuan-Jen (淵仁) Cheng --- BUILD.bazel | 2 ++ 1 file changed, 2 insertions(+) diff --git a/BUILD.bazel b/BUILD.bazel index 4be135108507..d17808239217 100644 --- a/BUILD.bazel +++ b/BUILD.bazel @@ -1025,6 +1025,7 @@ ddk_headers( name = "all_headers_allowlist_aarch64", hdrs = [ "drivers/dma-buf/heaps/deferred-free-helper.h", + "drivers/dma/dmaengine.h", "drivers/extcon/extcon.h", "drivers/pci/controller/dwc/pcie-designware.h", "drivers/thermal/thermal_core.h", @@ -1046,6 +1047,7 @@ ddk_headers( "arch/arm64/include", "arch/arm64/include/uapi", "drivers/dma-buf", + "drivers/dma", "drivers/extcon", "drivers/pci/controller/dwc", "drivers/thermal", From 949ed5babab53bdde799b8ece3922047054f4beb Mon Sep 17 00:00:00 2001 From: yipeng xiang Date: Thu, 26 Jun 2025 11:19:45 +0800 Subject: [PATCH 24/49] ANDROID: mm: export vm_normal_folio_pmd to allow vendors to implement simplified smaps The current process smaps operation is time-consuming. Exporting the vm_normal_folio_pmd function enables vendors to provide a more efficient and simplified version of smaps. 
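For illustration only (not part of this patch), a vendor implementation might consume the export from a pagewalk callback along these lines. The stats structure and callback name are hypothetical, and a real version would also take the PMD lock via pmd_trans_huge_lock() the way fs/proc/task_mmu.c does.

#include <linux/mm.h>
#include <linux/pagewalk.h>

struct vendor_smaps_stats {             /* hypothetical accumulator */
        unsigned long thp_rss;
};

static int vendor_smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
                                  unsigned long next, struct mm_walk *walk)
{
        struct vendor_smaps_stats *stats = walk->private;
        struct vm_area_struct *vma = walk->vma;

        if (pmd_present(*pmd) && pmd_trans_huge(*pmd)) {
                struct folio *folio = vm_normal_folio_pmd(vma, addr, *pmd);

                if (folio)
                        stats->thp_rss += folio_nr_pages(folio) * PAGE_SIZE;
        }
        return 0;
}
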
Bug: 427633539 Change-Id: I7710f5d1656a9f7a4ae883aefc93135c93e637b5 Signed-off-by: yipeng xiang --- mm/memory.c | 1 + 1 file changed, 1 insertion(+) diff --git a/mm/memory.c b/mm/memory.c index a04841dc9291..2646e93c9004 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -713,6 +713,7 @@ struct folio *vm_normal_folio_pmd(struct vm_area_struct *vma, return page_folio(page); return NULL; } +EXPORT_SYMBOL_GPL(vm_normal_folio_pmd); #endif static void restore_exclusive_pte(struct vm_area_struct *vma, From 573a6732fcff2e3a5f529c8c7c401b6f46be18ff Mon Sep 17 00:00:00 2001 From: yipeng xiang Date: Fri, 4 Jul 2025 16:58:00 +0800 Subject: [PATCH 25/49] ANDROID: GKI: Update symbols list file for honor White list the vm_normal_folio_pmd 1 function symbol(s) added 'struct folio* vm_normal_folio_pmd(struct vm_area_struct*, unsigned long, pmd_t)' Bug: 427633539 Change-Id: Ib7e2f6d4a574871202cc10b32342f84a63811c0a Signed-off-by: yipeng xiang --- android/abi_gki_aarch64.stg | 17 +++++++++++++++++ android/abi_gki_aarch64_honor | 1 + 2 files changed, 18 insertions(+) diff --git a/android/abi_gki_aarch64.stg b/android/abi_gki_aarch64.stg index 41b693bf1454..baa12bbf4722 100644 --- a/android/abi_gki_aarch64.stg +++ b/android/abi_gki_aarch64.stg @@ -321953,6 +321953,13 @@ function { parameter_id: 0x27a7c613 parameter_id: 0x4585663f } +function { + id: 0x5e21336c + return_type_id: 0x2170d06d + parameter_id: 0x0a134144 + parameter_id: 0x33756485 + parameter_id: 0xae60496e +} function { id: 0x5e29431a return_type_id: 0x295c7202 @@ -434551,6 +434558,15 @@ elf_symbol { type_id: 0xfc37fa4b full_name: "vm_node_stat" } +elf_symbol { + id: 0x4e194253 + name: "vm_normal_folio_pmd" + is_defined: true + symbol_type: FUNCTION + crc: 0xa737dbaa + type_id: 0x5e21336c + full_name: "vm_normal_folio_pmd" +} elf_symbol { id: 0x2570ceae name: "vm_normal_page" @@ -445161,6 +445177,7 @@ interface { symbol_id: 0xdc09fb10 symbol_id: 0x5849ff8e symbol_id: 0xaf85c216 + symbol_id: 0x4e194253 symbol_id: 0x2570ceae symbol_id: 0xacc76406 symbol_id: 0xef2c49d1 diff --git a/android/abi_gki_aarch64_honor b/android/abi_gki_aarch64_honor index 48c49720e0ea..33decf01d449 100644 --- a/android/abi_gki_aarch64_honor +++ b/android/abi_gki_aarch64_honor @@ -94,6 +94,7 @@ bio_crypt_set_ctx zero_fill_bio_iter percpu_ref_is_zero + vm_normal_folio_pmd __trace_bputs __traceiter_android_vh_proactive_compact_wmark_high __tracepoint_android_vh_proactive_compact_wmark_high From 5f592a6260c334e0b195ef229eb11910fc5a5890 Mon Sep 17 00:00:00 2001 From: Aran Dalton Date: Mon, 7 Jul 2025 16:56:35 +0800 Subject: [PATCH 26/49] ANDROID: ABI: Update symbol list for sunxi 5 function symbol(s) added 'bool drm_is_panel_follower(struct device*)' 'int drm_panel_add_follower(struct device*, struct drm_panel_follower*)' 'void drm_panel_remove_follower(struct drm_panel_follower*)' 'int hid_driver_reset_resume(struct hid_device*)' 'int hid_driver_suspend(struct hid_device*, pm_message_t)' Bug: 429955708 Change-Id: Iaf02aef7b07559aafd283f496b3c7088d0b89669 Signed-off-by: Aran Dalton --- android/abi_gki_aarch64.stg | 129 ++++++++++++++++++++++++++++++++++ android/abi_gki_aarch64_sunxi | 5 ++ 2 files changed, 134 insertions(+) diff --git a/android/abi_gki_aarch64.stg b/android/abi_gki_aarch64.stg index baa12bbf4722..679394de3bae 100644 --- a/android/abi_gki_aarch64.stg +++ b/android/abi_gki_aarch64.stg @@ -8228,6 +8228,11 @@ pointer_reference { kind: POINTER pointee_type_id: 0x15e4d187 } +pointer_reference { + id: 0x0fe9f911 + kind: POINTER + pointee_type_id: 0x15e702d9 +} 
pointer_reference { id: 0x0fe9ffda kind: POINTER @@ -17573,6 +17578,11 @@ pointer_reference { kind: POINTER pointee_type_id: 0x9e7aaf3f } +pointer_reference { + id: 0x2d0e9efd + kind: POINTER + pointee_type_id: 0x9e7a9d6b +} pointer_reference { id: 0x2d0fdd7c kind: POINTER @@ -27928,6 +27938,11 @@ pointer_reference { kind: POINTER pointee_type_id: 0xca7029d8 } +pointer_reference { + id: 0x380eb497 + kind: POINTER + pointee_type_id: 0xca7a34c0 +} pointer_reference { id: 0x381020ff kind: POINTER @@ -35043,6 +35058,11 @@ qualified { qualifier: CONST qualified_type_id: 0x592e728c } +qualified { + id: 0xca7a34c0 + qualifier: CONST + qualified_type_id: 0x59af6589 +} qualified { id: 0xca8285c3 qualifier: CONST @@ -99904,6 +99924,11 @@ member { type_id: 0x37e7a473 offset: 768 } +member { + id: 0x36181e96 + name: "funcs" + type_id: 0x380eb497 +} member { id: 0x36184afd name: "funcs" @@ -152610,6 +152635,12 @@ member { type_id: 0x9bd401b6 offset: 16 } +member { + id: 0xd3327091 + name: "panel" + type_id: 0x10617cac + offset: 192 +} member { id: 0xd3a8d2cb name: "panel" @@ -152633,6 +152664,17 @@ member { type_id: 0x2a670b41 offset: 9024 } +member { + id: 0xf2e51365 + name: "panel_prepared" + type_id: 0x2d0e9efd +} +member { + id: 0x289370ad + name: "panel_unpreparing" + type_id: 0x2d0e9efd + offset: 64 +} member { id: 0x616a797d name: "panic" @@ -239344,6 +239386,27 @@ struct_union { member_id: 0x3a2d3750 } } +struct_union { + id: 0x15e702d9 + kind: STRUCT + name: "drm_panel_follower" + definition { + bytesize: 32 + member_id: 0x36181e96 + member_id: 0x7c00ebb3 + member_id: 0xd3327091 + } +} +struct_union { + id: 0x59af6589 + kind: STRUCT + name: "drm_panel_follower_funcs" + definition { + bytesize: 16 + member_id: 0xf2e51365 + member_id: 0x289370ad + } +} struct_union { id: 0x5c75f1b8 kind: STRUCT @@ -308489,6 +308552,11 @@ function { parameter_id: 0x0258f96e parameter_id: 0xd41e888f } +function { + id: 0x13622fd7 + return_type_id: 0x48b5725f + parameter_id: 0x0fe9f911 +} function { id: 0x1362a71c return_type_id: 0x48b5725f @@ -345553,6 +345621,12 @@ function { parameter_id: 0x0258f96e parameter_id: 0x0fa01494 } +function { + id: 0x9d297a90 + return_type_id: 0x6720d32f + parameter_id: 0x0258f96e + parameter_id: 0x0fe9f911 +} function { id: 0x9d2c14da return_type_id: 0x6720d32f @@ -348130,6 +348204,11 @@ function { parameter_id: 0x0c2e195c parameter_id: 0x3ca4f8de } +function { + id: 0x9e7a9d6b + return_type_id: 0x6720d32f + parameter_id: 0x0fe9f911 +} function { id: 0x9e7aaf3f return_type_id: 0x6720d32f @@ -389913,6 +389992,15 @@ elf_symbol { type_id: 0xfa1de4ef full_name: "drm_is_current_master" } +elf_symbol { + id: 0xa3983618 + name: "drm_is_panel_follower" + is_defined: true + symbol_type: FUNCTION + crc: 0xcfdfa487 + type_id: 0xfe32655f + full_name: "drm_is_panel_follower" +} elf_symbol { id: 0xc8af6225 name: "drm_kms_helper_connector_hotplug_event" @@ -390561,6 +390649,15 @@ elf_symbol { type_id: 0x14800eb8 full_name: "drm_panel_add" } +elf_symbol { + id: 0x2b742694 + name: "drm_panel_add_follower" + is_defined: true + symbol_type: FUNCTION + crc: 0x2db618bd + type_id: 0x9d297a90 + full_name: "drm_panel_add_follower" +} elf_symbol { id: 0xd67ad69f name: "drm_panel_bridge_add_typed" @@ -390651,6 +390748,15 @@ elf_symbol { type_id: 0x14800eb8 full_name: "drm_panel_remove" } +elf_symbol { + id: 0x6016204a + name: "drm_panel_remove_follower" + is_defined: true + symbol_type: FUNCTION + crc: 0x397cfaf5 + type_id: 0x13622fd7 + full_name: "drm_panel_remove_follower" +} elf_symbol { id: 0x046720ab 
name: "drm_panel_unprepare" @@ -396752,6 +396858,24 @@ elf_symbol { type_id: 0x13e1603f full_name: "hid_destroy_device" } +elf_symbol { + id: 0x1706be22 + name: "hid_driver_reset_resume" + is_defined: true + symbol_type: FUNCTION + crc: 0x371549c9 + type_id: 0x9ef9d283 + full_name: "hid_driver_reset_resume" +} +elf_symbol { + id: 0x4c3911f0 + name: "hid_driver_suspend" + is_defined: true + symbol_type: FUNCTION + crc: 0xe6a4222b + type_id: 0x9d398c85 + full_name: "hid_driver_suspend" +} elf_symbol { id: 0x8717f26f name: "hid_hw_close" @@ -440224,6 +440348,7 @@ interface { symbol_id: 0x3a6e27e9 symbol_id: 0xc9aa2ffd symbol_id: 0xec79cf1c + symbol_id: 0xa3983618 symbol_id: 0xc8af6225 symbol_id: 0x8a043efe symbol_id: 0x3c6b600d @@ -440296,6 +440421,7 @@ interface { symbol_id: 0xc73568f4 symbol_id: 0x124ae77d symbol_id: 0xdc6725cf + symbol_id: 0x2b742694 symbol_id: 0xd67ad69f symbol_id: 0x48cde8a9 symbol_id: 0x633d0644 @@ -440306,6 +440432,7 @@ interface { symbol_id: 0xad1d778f symbol_id: 0xcf81b673 symbol_id: 0x864914fa + symbol_id: 0x6016204a symbol_id: 0x046720ab symbol_id: 0x3c07bbff symbol_id: 0xbdb562b1 @@ -440982,6 +441109,8 @@ interface { symbol_id: 0xccc593d6 symbol_id: 0x97a02af0 symbol_id: 0x2ffc7c7e + symbol_id: 0x1706be22 + symbol_id: 0x4c3911f0 symbol_id: 0x8717f26f symbol_id: 0x361004c8 symbol_id: 0xcf5ea9a2 diff --git a/android/abi_gki_aarch64_sunxi b/android/abi_gki_aarch64_sunxi index 4b51d7f71b55..27b308dd7254 100644 --- a/android/abi_gki_aarch64_sunxi +++ b/android/abi_gki_aarch64_sunxi @@ -91,3 +91,8 @@ __tracepoint_dwc3_readl __tracepoint_dwc3_writel pinctrl_gpio_set_config + drm_is_panel_follower + drm_panel_add_follower + drm_panel_remove_follower + hid_driver_reset_resume + hid_driver_suspend From be36ded30366f3f52140daa24fc1ab18efd3f9a0 Mon Sep 17 00:00:00 2001 From: John Stultz Date: Sun, 6 Jul 2025 18:08:53 +0000 Subject: [PATCH 27/49] ANDROID: Revert "cpufreq: Avoid using inconsistent policy->min and policy->max" The combination of the cpufreq changes that came in with v6.12.28, commit 573b04722907 ("cpufreq: Avoid using inconsistent policy->min and policy->max") and commit 962d88304c3c ("cpufreq: Fix setting policy limits when frequency tables are used") unfortunately broke the KABI. The second of which was reverted in ad2b007ef43c ("Revert "cpufreq: Fix setting policy limits when frequency tables are used""). However, that change is actually a necessary fix to the first. As the refactoring to passing the max and min through the arguments couldn't be done without KABI impact, the changes to be more consistent with policy->min/max ends up introducing a subtle problem where the new max value being set ends up being clamped to the current max value - thus cpufreq max can be reduced but not increased (with the min increased but not decreased). A minimal fix of this effectively undoes the key point of commit 573b04722907, so it seems best to revert the whole thing for now. I think the small pre-existing risk of the policy->max/min values being read when shortly to an intermediate value before getting assigned the final value seems to be less problematic in practice. 
Fixes: ad2b007ef43c ("Revert "cpufreq: Fix setting policy limits when frequency tables are used"") Bug: 428984800 Signed-off-by: John Stultz Change-Id: I5a76cc2b0056071ffa26a682458df5fe0a4b83a3 --- drivers/cpufreq/cpufreq.c | 31 ++++++------------------------- 1 file changed, 6 insertions(+), 25 deletions(-) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index d0aba74067c9..3a7bd62ef6b7 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -543,6 +543,7 @@ static unsigned int __resolve_freq(struct cpufreq_policy *policy, unsigned int idx; unsigned int old_target_freq = target_freq; + target_freq = clamp_val(target_freq, policy->min, policy->max); trace_android_vh_cpufreq_resolve_freq(policy, &target_freq, old_target_freq); if (!policy->freq_table) @@ -568,22 +569,7 @@ static unsigned int __resolve_freq(struct cpufreq_policy *policy, unsigned int cpufreq_driver_resolve_freq(struct cpufreq_policy *policy, unsigned int target_freq) { - unsigned int min = READ_ONCE(policy->min); - unsigned int max = READ_ONCE(policy->max); - - /* - * If this function runs in parallel with cpufreq_set_policy(), it may - * read policy->min before the update and policy->max after the update - * or the other way around, so there is no ordering guarantee. - * - * Resolve this by always honoring the max (in case it comes from - * thermal throttling or similar). - */ - if (unlikely(min > max)) - min = max; - - return __resolve_freq(policy, clamp_val(target_freq, min, max), - CPUFREQ_RELATION_LE); + return __resolve_freq(policy, target_freq, CPUFREQ_RELATION_LE); } EXPORT_SYMBOL_GPL(cpufreq_driver_resolve_freq); @@ -2369,7 +2355,6 @@ int __cpufreq_driver_target(struct cpufreq_policy *policy, if (cpufreq_disabled()) return -ENODEV; - target_freq = clamp_val(target_freq, policy->min, policy->max); target_freq = __resolve_freq(policy, target_freq, relation); trace_android_vh_cpufreq_target(policy, &target_freq, old_target_freq); @@ -2662,15 +2647,11 @@ static int cpufreq_set_policy(struct cpufreq_policy *policy, * Resolve policy min/max to available frequencies. It ensures * no frequency resolution will neither overshoot the requested maximum * nor undershoot the requested minimum. - * - * Avoid storing intermediate values in policy->max or policy->min and - * compiler optimizations around them because they may be accessed - * concurrently by cpufreq_driver_resolve_freq() during the update. */ - WRITE_ONCE(policy->max, __resolve_freq(policy, new_data.max, CPUFREQ_RELATION_H)); - new_data.min = __resolve_freq(policy, new_data.min, CPUFREQ_RELATION_L); - WRITE_ONCE(policy->min, new_data.min > policy->max ? policy->max : new_data.min); - + policy->min = new_data.min; + policy->max = new_data.max; + policy->min = __resolve_freq(policy, policy->min, CPUFREQ_RELATION_L); + policy->max = __resolve_freq(policy, policy->max, CPUFREQ_RELATION_H); trace_cpu_frequency_limits(policy); policy->cached_target_freq = UINT_MAX; From 96c29dad8f890f2f21f8f8a044eaa714cdfb92cb Mon Sep 17 00:00:00 2001 From: Yu Kuai Date: Fri, 19 Jul 2024 15:15:04 +0800 Subject: [PATCH 28/49] UPSTREAM: blk-cgroup: check for pd_(alloc|free)_fn in blkcg_activate_policy() Currently all policies implement pd_(alloc|free)_fn, however, this is not necessary for ioprio that only works for blkcg, not blkg. There are no functional changes, prepare to cleanup activating ioprio policy. 
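For illustration only (not part of this patch), a blkcg-level-only policy of the kind the new check is about would look roughly like the sketch below. The names are hypothetical, and the sketch assumes block-layer-internal code that can include block/blk-cgroup.h, as blk-ioprio.c does.

#include <linux/slab.h>
#include "blk-cgroup.h"

struct example_cpd {
        struct blkcg_policy_data cpd;
        int setting;
};

static struct blkcg_policy_data *example_cpd_alloc(gfp_t gfp)
{
        struct example_cpd *ecpd = kzalloc(sizeof(*ecpd), gfp);

        return ecpd ? &ecpd->cpd : NULL;
}

static void example_cpd_free(struct blkcg_policy_data *cpd)
{
        kfree(container_of(cpd, struct example_cpd, cpd));
}

static struct blkcg_policy example_policy = {
        .cpd_alloc_fn   = example_cpd_alloc,
        .cpd_free_fn    = example_cpd_free,
        /*
         * No pd_alloc_fn/pd_free_fn: the policy works per-blkcg, is registered
         * with blkcg_policy_register(), and is never passed to
         * blkcg_activate_policy() -- which now warns if that happens anyway.
         */
};
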
Change-Id: I47d38ac673419e9676de6f13838f55f45027d35e Signed-off-by: Yu Kuai Reviewed-by: Christoph Hellwig Acked-by: Tejun Heo Link: https://lore.kernel.org/r/20240719071506.158075-2-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe Bug:427107450 (cherry picked from commit ae8650b45d1837aae117fa147aeef69540bb3fe8) Reviewed-by: Zhengxu Zhang --- block/blk-cgroup.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index 64551b0aa51e..91b788149381 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -1566,6 +1566,14 @@ int blkcg_activate_policy(struct gendisk *disk, const struct blkcg_policy *pol) if (blkcg_policy_enabled(q, pol)) return 0; + /* + * Policy is allowed to be registered without pd_alloc_fn/pd_free_fn, + * for example, ioprio. Such policy will work on blkcg level, not disk + * level, and don't need to be activated. + */ + if (WARN_ON_ONCE(!pol->pd_alloc_fn || !pol->pd_free_fn)) + return -EINVAL; + if (queue_is_mq(q)) blk_mq_freeze_queue(q); retry: @@ -1745,9 +1753,12 @@ int blkcg_policy_register(struct blkcg_policy *pol) goto err_unlock; } - /* Make sure cpd/pd_alloc_fn and cpd/pd_free_fn in pairs */ + /* + * Make sure cpd/pd_alloc_fn and cpd/pd_free_fn in pairs, and policy + * without pd_alloc_fn/pd_free_fn can't be activated. + */ if ((!pol->cpd_alloc_fn ^ !pol->cpd_free_fn) || - (!pol->pd_alloc_fn ^ !pol->pd_free_fn)) + (!pol->pd_alloc_fn ^ !pol->pd_free_fn)) goto err_unlock; /* register @pol */ From 250bbe1cbfafba17b58373bd17426340011bb28b Mon Sep 17 00:00:00 2001 From: "Isaac J. Manjarres" Date: Tue, 8 Jul 2025 12:14:04 -0700 Subject: [PATCH 29/49] ANDROID: GKI: Update symbol list for Pixel Watch Bug: 430323364 Bug: 411748239 Change-Id: I48a305eab7d7901a5642674d9a71bec813193470 Signed-off-by: Isaac J. Manjarres --- android/abi_gki_aarch64_pixel_watch | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/android/abi_gki_aarch64_pixel_watch b/android/abi_gki_aarch64_pixel_watch index db61c65ca0ff..a5621a839612 100644 --- a/android/abi_gki_aarch64_pixel_watch +++ b/android/abi_gki_aarch64_pixel_watch @@ -288,6 +288,7 @@ delayed_work_timer_fn destroy_workqueue dev_addr_mod + _dev_alert dev_alloc_name __dev_change_net_namespace dev_close @@ -869,6 +870,7 @@ gpiod_get_raw_value gpiod_get_raw_value_cansleep gpiod_get_value + gpiod_is_active_low gpiod_set_raw_value gpiod_set_value gpiod_set_value_cansleep @@ -2091,6 +2093,7 @@ tick_nohz_get_sleep_length timer_delete timer_delete_sync + timer_shutdown_sync topology_clear_scale_freq_source topology_update_done topology_update_thermal_pressure @@ -2171,6 +2174,10 @@ __traceiter_mmap_lock_acquire_returned __traceiter_mmap_lock_released __traceiter_mmap_lock_start_locking + __traceiter_rwmmio_post_read + __traceiter_rwmmio_post_write + __traceiter_rwmmio_read + __traceiter_rwmmio_write __traceiter_sched_overutilized_tp __traceiter_sched_switch __traceiter_sk_data_ready @@ -2246,6 +2253,10 @@ tracepoint_probe_register tracepoint_probe_register_prio tracepoint_probe_unregister + __tracepoint_rwmmio_post_read + __tracepoint_rwmmio_post_write + __tracepoint_rwmmio_read + __tracepoint_rwmmio_write __tracepoint_sched_overutilized_tp __tracepoint_sched_switch __tracepoint_sk_data_ready From f44d593749dcbd4e1013121fa615ecca412d1cb3 Mon Sep 17 00:00:00 2001 From: "T.J. 
Mercier" Date: Wed, 25 Jun 2025 20:06:55 +0000 Subject: [PATCH 30/49] ANDROID: Track per-process dmabuf RSS DMA buffers exist for sharing memory (between processes, drivers, and hardware) so they are not accounted the same way as user memory present on a MM's LRUs. Per-process attribution of dmabuf memory is not maintained by the kernel, so to obtain it from userspace, several files from procfs and sysfs must be read any time the information is desired. This process is slow, which can lead to dmabuf accounting information being out-of-date when it is desired during events like low memory, or bugreport generation, masking the cause of memory issues. This patch attributes dmabuf memory to any process that holds a reference to a buffer. A process can hold a reference to a dmabuf in two ways: 1) Through a file descriptor 2) Though a mapping A single buffer can be referenced more than once by a single process with multiple file descriptors for the same buffer, multiple mappings for the same buffer, or any combination of the two. The full size of a buffer is effectively pinned until no references exist from any process, or anywhere else in the kernel such as drivers that have imported the buffer. Even if a partial mapping of the buffer is the only reference that exists. Therefore buffer accounting is always performed in units of the full buffer size, and only once for each process, regardless of the number and type of references a process has for a single buffer. The /proc//dmabuf_rss file in procfs now reports the sum of all buffer sizes referenced by a process. The units are bytes. This allows userspace to obtain per-process dmabuf accounting information quickly compared to calculating it from multiple sources in procfs and sysfs. Note that a dmabuf can be backed by different types of memory such as system DRAM, GPU VRAM, or others. This patch makes no distinction between these different types of memory, so on systems with non-unified memory the reported values should be interpreted with this in mind. Bug: 424648392 Change-Id: I1de8e937f2971fe714008b459e410dde2a251b90 Signed-off-by: T.J. 
Mercier --- drivers/dma-buf/dma-buf.c | 141 +++++++++++++++++++++++++++++++++++++- fs/file.c | 4 ++ fs/proc/base.c | 22 ++++++ include/linux/dma-buf.h | 43 ++++++++++++ include/linux/sched.h | 4 ++ init/init_task.c | 1 + kernel/fork.c | 85 ++++++++++++++++++++++- mm/mmap.c | 14 +++- 8 files changed, 307 insertions(+), 7 deletions(-) diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index 0b02ced1eb33..c8c05d2e112a 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -162,9 +162,121 @@ static struct file_system_type dma_buf_fs_type = { .kill_sb = kill_anon_super, }; +static struct task_dma_buf_record *find_task_dmabuf_record( + struct task_struct *task, struct dma_buf *dmabuf) +{ + struct task_dma_buf_record *rec; + + lockdep_assert_held(&task->dmabuf_info->lock); + + list_for_each_entry(rec, &task->dmabuf_info->dmabufs, node) + if (dmabuf == rec->dmabuf) + return rec; + + return NULL; +} + +static int new_task_dmabuf_record(struct task_struct *task, struct dma_buf *dmabuf) +{ + struct task_dma_buf_record *rec; + + lockdep_assert_held(&task->dmabuf_info->lock); + + rec = kmalloc(sizeof(*rec), GFP_KERNEL); + if (!rec) + return -ENOMEM; + + task->dmabuf_info->rss += dmabuf->size; + rec->dmabuf = dmabuf; + rec->refcnt = 1; + list_add(&rec->node, &task->dmabuf_info->dmabufs); + + return 0; +} + +/** + * dma_buf_account_task - Account a dmabuf to a task + * @dmabuf: [in] pointer to dma_buf + * @task: [in] pointer to task_struct + * + * When a process obtains a dmabuf file descriptor, or maps a dmabuf, this + * function attributes the provided @dmabuf to the @task. The first time @dmabuf + * is attributed to @task, the buffer's size is added to the @task's dmabuf RSS. + * + * Return: + * * 0 on success + * * A negative error code upon error + */ +int dma_buf_account_task(struct dma_buf *dmabuf, struct task_struct *task) +{ + struct task_dma_buf_record *rec; + int ret = 0; + + if (!dmabuf || !task) + return -EINVAL; + + if (!task->dmabuf_info) { + pr_err("%s dmabuf accounting record was not allocated\n", __func__); + return -ENOMEM; + } + + spin_lock(&task->dmabuf_info->lock); + rec = find_task_dmabuf_record(task, dmabuf); + if (!rec) + ret = new_task_dmabuf_record(task, dmabuf); + else + ++rec->refcnt; + spin_unlock(&task->dmabuf_info->lock); + + return ret; +} + +/** + * dma_buf_unaccount_task - Unaccount a dmabuf from a task + * @dmabuf: [in] pointer to dma_buf + * @task: [in] pointer to task_struct + * + * When a process closes a dmabuf file descriptor, or unmaps a dmabuf, this + * function removes the provided @dmabuf attribution from the @task. When all + * references to @dmabuf are removed from @task, the buffer's size is removed + * from the task's dmabuf RSS. + * + * Return: + * * 0 on success + * * A negative error code upon error + */ +void dma_buf_unaccount_task(struct dma_buf *dmabuf, struct task_struct *task) +{ + struct task_dma_buf_record *rec; + + if (!dmabuf || !task) + return; + + if (!task->dmabuf_info) { + pr_err("%s dmabuf accounting record was not allocated\n", __func__); + return; + } + + spin_lock(&task->dmabuf_info->lock); + rec = find_task_dmabuf_record(task, dmabuf); + if (!rec) { /* Failed fd_install? 
*/ + pr_err("dmabuf not found in task list\n"); + goto err; + } + + if (--rec->refcnt == 0) { + list_del(&rec->node); + kfree(rec); + task->dmabuf_info->rss -= dmabuf->size; + } +err: + spin_unlock(&task->dmabuf_info->lock); +} + static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct *vma) { struct dma_buf *dmabuf; + int ret; if (!is_dma_buf_file(file)) return -EINVAL; @@ -180,7 +292,15 @@ static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct *vma) dmabuf->size >> PAGE_SHIFT) return -EINVAL; - return dmabuf->ops->mmap(dmabuf, vma); + ret = dma_buf_account_task(dmabuf, current); + if (ret) + return ret; + + ret = dmabuf->ops->mmap(dmabuf, vma); + if (ret) + dma_buf_unaccount_task(dmabuf, current); + + return ret; } static loff_t dma_buf_llseek(struct file *file, loff_t offset, int whence) @@ -557,6 +677,12 @@ static void dma_buf_show_fdinfo(struct seq_file *m, struct file *file) spin_unlock(&dmabuf->name_lock); } +static int dma_buf_flush(struct file *file, fl_owner_t id) +{ + dma_buf_unaccount_task(file->private_data, current); + return 0; +} + static const struct file_operations dma_buf_fops = { .release = dma_buf_file_release, .mmap = dma_buf_mmap_internal, @@ -565,6 +691,7 @@ static const struct file_operations dma_buf_fops = { .unlocked_ioctl = dma_buf_ioctl, .compat_ioctl = compat_ptr_ioctl, .show_fdinfo = dma_buf_show_fdinfo, + .flush = dma_buf_flush, }; /* @@ -1555,6 +1682,8 @@ EXPORT_SYMBOL_GPL(dma_buf_end_cpu_access_partial); int dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma, unsigned long pgoff) { + int ret; + if (WARN_ON(!dmabuf || !vma)) return -EINVAL; @@ -1575,7 +1704,15 @@ int dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma, vma_set_file(vma, dmabuf->file); vma->vm_pgoff = pgoff; - return dmabuf->ops->mmap(dmabuf, vma); + ret = dma_buf_account_task(dmabuf, current); + if (ret) + return ret; + + ret = dmabuf->ops->mmap(dmabuf, vma); + if (ret) + dma_buf_unaccount_task(dmabuf, current); + + return ret; } EXPORT_SYMBOL_NS_GPL(dma_buf_mmap, DMA_BUF); diff --git a/fs/file.c b/fs/file.c index 1f1181b189bf..e924929ac366 100644 --- a/fs/file.c +++ b/fs/file.c @@ -20,6 +20,7 @@ #include #include #include +#include #include #include "internal.h" @@ -593,6 +594,9 @@ void fd_install(unsigned int fd, struct file *file) struct files_struct *files = current->files; struct fdtable *fdt; + if (is_dma_buf_file(file) && dma_buf_account_task(file->private_data, current)) + pr_err("FD dmabuf accounting failed\n"); + rcu_read_lock_sched(); if (unlikely(files->resize_in_progress)) { diff --git a/fs/proc/base.c b/fs/proc/base.c index 7cff02bc816e..f7d8188b0ccf 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -100,6 +100,7 @@ #include #include #include +#include #include #include #include "internal.h" @@ -3304,6 +3305,24 @@ static int proc_stack_depth(struct seq_file *m, struct pid_namespace *ns, } #endif /* CONFIG_STACKLEAK_METRICS */ +#ifdef CONFIG_DMA_SHARED_BUFFER +static int proc_dmabuf_rss_show(struct seq_file *m, struct pid_namespace *ns, + struct pid *pid, struct task_struct *task) +{ + if (!task->dmabuf_info) { + pr_err("%s dmabuf accounting record was not allocated\n", __func__); + return -ENOMEM; + } + + if (!(task->flags & PF_KTHREAD)) + seq_printf(m, "%lld\n", READ_ONCE(task->dmabuf_info->rss)); + else + seq_puts(m, "0\n"); + + return 0; +} +#endif + /* * Thread groups */ @@ -3427,6 +3446,9 @@ static const struct pid_entry tgid_base_stuff[] = { ONE("ksm_merging_pages", S_IRUSR, proc_pid_ksm_merging_pages), 
ONE("ksm_stat", S_IRUSR, proc_pid_ksm_stat), #endif +#ifdef CONFIG_DMA_SHARED_BUFFER + ONE("dmabuf_rss", S_IRUGO, proc_dmabuf_rss_show), +#endif }; static int proc_tgid_base_readdir(struct file *file, struct dir_context *ctx) diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index 64d67293d76b..1647fb38fe80 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -24,6 +24,9 @@ #include #include #include +#ifndef __GENKSYMS__ +#include +#endif struct device; struct dma_buf; @@ -639,6 +642,43 @@ struct dma_buf_export_info { ANDROID_KABI_RESERVE(2); }; +/** + * struct task_dma_buf_record - Holds the number of (VMA and FD) references to a + * dmabuf by a collection of tasks that share both mm_struct and files_struct. + * This is the list entry type for @task_dma_buf_info dmabufs list. + * + * @node: Stores the list this record is on. + * @dmabuf: The dmabuf this record is for. + * @refcnt: The number of VMAs and FDs that reference @dmabuf by the tasks that + * share this record. + */ +struct task_dma_buf_record { + struct list_head node; + struct dma_buf *dmabuf; + unsigned long refcnt; +}; + +/** + * struct task_dma_buf_info - Holds a RSS counter, and a list of dmabufs for all + * tasks that share both mm_struct and files_struct. + * + * @rss: The sum of all dmabuf memory referenced by the tasks via memory + * mappings or file descriptors in bytes. Buffers referenced more than + * once by the process (multiple mmaps, multiple FDs, or any combination + * of both mmaps and FDs) only cause the buffer to be accounted to the + * process once. Partial mappings cause the full size of the buffer to be + * accounted, regardless of the size of the mapping. + * @refcnt: The number of tasks sharing this struct. + * @lock: Lock protecting writes for @rss, and reads/writes for @dmabufs. + * @dmabufs: List of all dmabufs referenced by the tasks. 
+ */ +struct task_dma_buf_info { + s64 rss; + refcount_t refcnt; + spinlock_t lock; + struct list_head dmabufs; +}; + /** * DEFINE_DMA_BUF_EXPORT_INFO - helper macro for exporters * @name: export-info name @@ -741,4 +781,7 @@ int dma_buf_vmap_unlocked(struct dma_buf *dmabuf, struct iosys_map *map); void dma_buf_vunmap_unlocked(struct dma_buf *dmabuf, struct iosys_map *map); long dma_buf_set_name(struct dma_buf *dmabuf, const char *name); int dma_buf_get_flags(struct dma_buf *dmabuf, unsigned long *flags); + +int dma_buf_account_task(struct dma_buf *dmabuf, struct task_struct *task); +void dma_buf_unaccount_task(struct dma_buf *dmabuf, struct task_struct *task); #endif /* __DMA_BUF_H__ */ diff --git a/include/linux/sched.h b/include/linux/sched.h index 1299b4497d87..68ba96bde447 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -70,6 +70,7 @@ struct seq_file; struct sighand_struct; struct signal_struct; struct task_delay_info; +struct task_dma_buf_info; struct task_group; struct user_event_mm; @@ -1516,6 +1517,9 @@ struct task_struct { */ struct callback_head l1d_flush_kill; #endif + + struct task_dma_buf_info *dmabuf_info; + ANDROID_KABI_RESERVE(1); ANDROID_KABI_RESERVE(2); ANDROID_KABI_RESERVE(3); diff --git a/init/init_task.c b/init/init_task.c index 31ceb0e469f7..d80c007ab59b 100644 --- a/init/init_task.c +++ b/init/init_task.c @@ -214,6 +214,7 @@ struct task_struct init_task .android_vendor_data1 = {0, }, .android_oem_data1 = {0, }, #endif + .dmabuf_info = NULL, }; EXPORT_SYMBOL(init_task); diff --git a/kernel/fork.c b/kernel/fork.c index 75b1a4458a7e..66636a979911 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -101,6 +101,7 @@ #include #include #include +#include #include #include @@ -994,12 +995,32 @@ static inline void put_signal_struct(struct signal_struct *sig) free_signal_struct(sig); } +static void put_dmabuf_info(struct task_struct *tsk) +{ + if (!tsk->dmabuf_info) { + pr_err("%s dmabuf accounting record was not allocated\n", __func__); + return; + } + + if (!refcount_dec_and_test(&tsk->dmabuf_info->refcnt)) + return; + + if (READ_ONCE(tsk->dmabuf_info->rss)) + pr_err("%s destroying task with non-zero dmabuf rss\n", __func__); + + if (!list_empty(&tsk->dmabuf_info->dmabufs)) + pr_err("%s destroying task with non-empty dmabuf list\n", __func__); + + kfree(tsk->dmabuf_info); +} + void __put_task_struct(struct task_struct *tsk) { WARN_ON(!tsk->exit_state); WARN_ON(refcount_read(&tsk->usage)); WARN_ON(tsk == current); + put_dmabuf_info(tsk); io_uring_free(tsk); cgroup_free(tsk); task_numa_free(tsk, true); @@ -2268,6 +2289,58 @@ static void rv_task_fork(struct task_struct *p) #define rv_task_fork(p) do {} while (0) #endif +static int copy_dmabuf_info(u64 clone_flags, struct task_struct *p) +{ + struct task_dma_buf_record *rec, *copy; + + if (current->dmabuf_info && (clone_flags & (CLONE_VM | CLONE_FILES)) + == (CLONE_VM | CLONE_FILES)) { + /* + * Both MM and FD references to dmabufs are shared with the parent, so + * we can share a RSS counter with the parent. 
+ */ + refcount_inc(¤t->dmabuf_info->refcnt); + p->dmabuf_info = current->dmabuf_info; + return 0; + } + + p->dmabuf_info = kmalloc(sizeof(*p->dmabuf_info), GFP_KERNEL); + if (!p->dmabuf_info) + return -ENOMEM; + + refcount_set(&p->dmabuf_info->refcnt, 1); + spin_lock_init(&p->dmabuf_info->lock); + INIT_LIST_HEAD(&p->dmabuf_info->dmabufs); + if (current->dmabuf_info) { + spin_lock(¤t->dmabuf_info->lock); + p->dmabuf_info->rss = current->dmabuf_info->rss; + list_for_each_entry(rec, ¤t->dmabuf_info->dmabufs, node) { + copy = kmalloc(sizeof(*copy), GFP_KERNEL); + if (!copy) { + spin_unlock(¤t->dmabuf_info->lock); + goto err_list_copy; + } + + copy->dmabuf = rec->dmabuf; + copy->refcnt = rec->refcnt; + list_add(©->node, &p->dmabuf_info->dmabufs); + } + spin_unlock(¤t->dmabuf_info->lock); + } else { + p->dmabuf_info->rss = 0; + } + + return 0; + +err_list_copy: + list_for_each_entry_safe(rec, copy, &p->dmabuf_info->dmabufs, node) { + list_del(&rec->node); + kfree(rec); + } + kfree(p->dmabuf_info); + return -ENOMEM; +} + /* * This creates a new process as a copy of the old one, * but does not actually start it yet. @@ -2509,14 +2582,18 @@ __latent_entropy struct task_struct *copy_process( p->bpf_ctx = NULL; #endif - /* Perform scheduler related setup. Assign this task to a CPU. */ - retval = sched_fork(clone_flags, p); + retval = copy_dmabuf_info(clone_flags, p); if (retval) goto bad_fork_cleanup_policy; + /* Perform scheduler related setup. Assign this task to a CPU. */ + retval = sched_fork(clone_flags, p); + if (retval) + goto bad_fork_cleanup_dmabuf; + retval = perf_event_init_task(p, clone_flags); if (retval) - goto bad_fork_cleanup_policy; + goto bad_fork_cleanup_dmabuf; retval = audit_alloc(p); if (retval) goto bad_fork_cleanup_perf; @@ -2819,6 +2896,8 @@ bad_fork_cleanup_audit: audit_free(p); bad_fork_cleanup_perf: perf_event_free_task(p); +bad_fork_cleanup_dmabuf: + put_dmabuf_info(p); bad_fork_cleanup_policy: lockdep_free_task(p); #ifdef CONFIG_NUMA diff --git a/mm/mmap.c b/mm/mmap.c index 4c74fb3d7a94..6da684ab9f98 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -49,6 +49,7 @@ #include #include #include +#include #include #include @@ -144,8 +145,11 @@ static void remove_vma(struct vm_area_struct *vma, bool unreachable) { might_sleep(); vma_close(vma); - if (vma->vm_file) + if (vma->vm_file) { + if (is_dma_buf_file(vma->vm_file)) + dma_buf_unaccount_task(vma->vm_file->private_data, current); fput(vma->vm_file); + } mpol_put(vma_policy(vma)); if (unreachable) __vm_area_free(vma); @@ -2417,8 +2421,14 @@ int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma, if (err) goto out_free_mpol; - if (new->vm_file) + if (new->vm_file) { get_file(new->vm_file); + if (is_dma_buf_file(new->vm_file)) { + /* Should never fail since this task already references the buffer */ + if (dma_buf_account_task(new->vm_file->private_data, current)) + pr_err("%s failed to account dmabuf\n", __func__); + } + } if (new->vm_ops && new->vm_ops->open) new->vm_ops->open(new); From bddab7cf5de4a43346bc8e6803b20738b6d9e1cb Mon Sep 17 00:00:00 2001 From: "T.J. Mercier" Date: Wed, 25 Jun 2025 21:15:34 +0000 Subject: [PATCH 31/49] ANDROID: Track per-process dmabuf RSS HWM A per-process high watermark counter for dmabuf memory is useful for detecting bursty / transient allocations causing memory pressure spikes that don't appear in the dmabuf RSS counter when userspace reacts to memory pressure and reads RSS after buffers have already been freed. 
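As a rough illustration of the problem above (not part of this patch): a userspace monitor that only samples dmabuf_rss when it reacts to memory pressure can miss a transient peak entirely, while the dmabuf_rss_hwm file described just below still preserves it. A minimal C sketch of such a reader, assuming a hypothetical target pid passed on the command line and only trivial error handling:

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Read one of the per-process dmabuf counters added by this series. */
static long long read_dmabuf_counter(pid_t pid, const char *name)
{
	char path[64];
	long long val = -1;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/%s", (int)pid, name);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%lld", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}

int main(int argc, char **argv)
{
	pid_t pid;

	if (argc < 2)
		return 1;
	pid = (pid_t)atoi(argv[1]);

	/*
	 * By the time a pressure event is handled, transient buffers may
	 * already have been freed, so dmabuf_rss can understate the spike;
	 * the high watermark still reflects the peak.
	 */
	printf("dmabuf_rss:     %lld bytes\n", read_dmabuf_counter(pid, "dmabuf_rss"));
	printf("dmabuf_rss_hwm: %lld bytes\n", read_dmabuf_counter(pid, "dmabuf_rss_hwm"));
	return 0;
}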
The /proc//dmabuf_rss_hwm file in procfs now reports the maximum value of /proc//dmabuf_rss during the lifetime of the process. The value of /proc//dmabuf_rss_hwm can be reset to the current value of /proc//dmabuf_rss by writing "0" to the file. Bug: 424648392 Change-Id: I184d83d48ec63b805b712f19e121199a63095965 Signed-off-by: T.J. Mercier --- drivers/dma-buf/dma-buf.c | 8 ++++ fs/proc/base.c | 77 +++++++++++++++++++++++++++++++++++++++ include/linux/dma-buf.h | 7 +++- kernel/fork.c | 2 + 4 files changed, 92 insertions(+), 2 deletions(-) diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index c8c05d2e112a..7c9ac163d115 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -187,6 +187,14 @@ static int new_task_dmabuf_record(struct task_struct *task, struct dma_buf *dmab return -ENOMEM; task->dmabuf_info->rss += dmabuf->size; + /* + * task->dmabuf_info->lock protects against concurrent writers, so no + * worries about stale rss_hwm between the read and write, and we don't + * need to cmpxchg here. + */ + if (task->dmabuf_info->rss > task->dmabuf_info->rss_hwm) + task->dmabuf_info->rss_hwm = task->dmabuf_info->rss; + rec->dmabuf = dmabuf; rec->refcnt = 1; list_add(&rec->node, &task->dmabuf_info->dmabufs); diff --git a/fs/proc/base.c b/fs/proc/base.c index f7d8188b0ccf..6b91ddcab7e2 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -3321,6 +3321,82 @@ static int proc_dmabuf_rss_show(struct seq_file *m, struct pid_namespace *ns, return 0; } + +static int proc_dmabuf_rss_hwm_show(struct seq_file *m, void *v) +{ + struct inode *inode = m->private; + struct task_struct *task; + int ret = 0; + + task = get_proc_task(inode); + if (!task) + return -ESRCH; + + if (!task->dmabuf_info) { + pr_err("%s dmabuf accounting record was not allocated\n", __func__); + ret = -ENOMEM; + goto out; + } + + if (!(task->flags & PF_KTHREAD)) + seq_printf(m, "%lld\n", READ_ONCE(task->dmabuf_info->rss_hwm)); + else + seq_puts(m, "0\n"); + +out: + put_task_struct(task); + + return ret; +} + +static int proc_dmabuf_rss_hwm_open(struct inode *inode, struct file *filp) +{ + return single_open(filp, proc_dmabuf_rss_hwm_show, inode); +} + +static ssize_t +proc_dmabuf_rss_hwm_write(struct file *file, const char __user *buf, + size_t count, loff_t *offset) +{ + struct inode *inode = file_inode(file); + struct task_struct *task; + unsigned long long val; + int ret; + + ret = kstrtoull_from_user(buf, count, 10, &val); + if (ret) + return ret; + + if (val != 0) + return -EINVAL; + + task = get_proc_task(inode); + if (!task) + return -ESRCH; + + if (!task->dmabuf_info) { + pr_err("%s dmabuf accounting record was not allocated\n", __func__); + ret = -ENOMEM; + goto out; + } + + spin_lock(&task->dmabuf_info->lock); + task->dmabuf_info->rss_hwm = task->dmabuf_info->rss; + spin_unlock(&task->dmabuf_info->lock); + +out: + put_task_struct(task); + + return ret < 0 ? 
ret : count; +} + +static const struct file_operations proc_dmabuf_rss_hwm_operations = { + .open = proc_dmabuf_rss_hwm_open, + .write = proc_dmabuf_rss_hwm_write, + .read = seq_read, + .llseek = seq_lseek, + .release = single_release, +}; #endif /* @@ -3448,6 +3524,7 @@ static const struct pid_entry tgid_base_stuff[] = { #endif #ifdef CONFIG_DMA_SHARED_BUFFER ONE("dmabuf_rss", S_IRUGO, proc_dmabuf_rss_show), + REG("dmabuf_rss_hwm", S_IRUGO|S_IWUSR, proc_dmabuf_rss_hwm_operations), #endif }; diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index 1647fb38fe80..a362c8ba7a21 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -659,8 +659,8 @@ struct task_dma_buf_record { }; /** - * struct task_dma_buf_info - Holds a RSS counter, and a list of dmabufs for all - * tasks that share both mm_struct and files_struct. + * struct task_dma_buf_info - Holds RSS and RSS HWM counters, and a list of + * dmabufs for all tasks that share both mm_struct and files_struct. * * @rss: The sum of all dmabuf memory referenced by the tasks via memory * mappings or file descriptors in bytes. Buffers referenced more than @@ -668,12 +668,15 @@ struct task_dma_buf_record { * of both mmaps and FDs) only cause the buffer to be accounted to the * process once. Partial mappings cause the full size of the buffer to be * accounted, regardless of the size of the mapping. + * @rss_hwm: The maximum value of @rss over the lifetime of this struct. (Unless, + * reset by userspace.) * @refcnt: The number of tasks sharing this struct. * @lock: Lock protecting writes for @rss, and reads/writes for @dmabufs. * @dmabufs: List of all dmabufs referenced by the tasks. */ struct task_dma_buf_info { s64 rss; + s64 rss_hwm; refcount_t refcnt; spinlock_t lock; struct list_head dmabufs; diff --git a/kernel/fork.c b/kernel/fork.c index 66636a979911..e1d7d244d43a 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2314,6 +2314,7 @@ static int copy_dmabuf_info(u64 clone_flags, struct task_struct *p) if (current->dmabuf_info) { spin_lock(¤t->dmabuf_info->lock); p->dmabuf_info->rss = current->dmabuf_info->rss; + p->dmabuf_info->rss_hwm = current->dmabuf_info->rss; list_for_each_entry(rec, ¤t->dmabuf_info->dmabufs, node) { copy = kmalloc(sizeof(*copy), GFP_KERNEL); if (!copy) { @@ -2328,6 +2329,7 @@ static int copy_dmabuf_info(u64 clone_flags, struct task_struct *p) spin_unlock(¤t->dmabuf_info->lock); } else { p->dmabuf_info->rss = 0; + p->dmabuf_info->rss_hwm = 0; } return 0; From 0bf76c5311039778424e741617a0dd14b77f1763 Mon Sep 17 00:00:00 2001 From: "T.J. Mercier" Date: Tue, 8 Jul 2025 22:58:14 +0000 Subject: [PATCH 32/49] ANDROID: Track per-process dmabuf PSS DMA buffers exist for sharing memory, so dividing a buffer's size by the number of processes with references to it to obtain proportional set size is a useful metric for understanding an individual process's share of system-wide dmabuf memory. Dmabuf memory is not guaranteed to be representable by struct pages, and a process may hold only file descriptor references to a buffer. So PSS cannot be calculated on a per-page basis, and PSS accounting is always performed in units of the full buffer size, and only once for each process regardless of the number and type of references a process has for a single buffer. The /proc//dmabuf_pss file in procfs now reports the sum of all buffer PSS values referenced by a process. The units are bytes. 
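As a worked illustration of the arithmetic described above (with made-up numbers, not taken from the patch): a 12 MiB buffer referenced by three processes contributes 12 MiB / 3 = 4 MiB to each of those processes' dmabuf_pss, no matter how many FDs or mappings each process holds for it. A minimal C sketch of the same per-buffer math, mirroring the size / num_unique_refs division performed by the kernel side of this patch; the buffer sizes and reference counts here are hypothetical:

#include <stdio.h>

struct buf_example {
	unsigned long long size;	/* full buffer size in bytes */
	long long num_unique_refs;	/* processes referencing the buffer */
};

int main(void)
{
	/* Hypothetical buffers referenced by one process. */
	struct buf_example bufs[] = {
		{ 12ULL << 20, 3 },	/* 12 MiB shared by three processes */
		{  4ULL << 20, 1 },	/*  4 MiB referenced only by this one */
	};
	unsigned long long pss = 0;
	size_t i;

	for (i = 0; i < sizeof(bufs) / sizeof(bufs[0]); i++) {
		if (bufs[i].num_unique_refs <= 0)
			continue;	/* skip inconsistent counts, as the kernel code does */
		/* Full buffer size divided by the number of referencing processes. */
		pss += bufs[i].size / (unsigned long long)bufs[i].num_unique_refs;
	}
	printf("dmabuf_pss: %llu bytes\n", pss);	/* 4 MiB + 4 MiB = 8 MiB */
	return 0;
}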
This allows userspace to obtain per-process dmabuf accounting information quickly compared to calculating it from multiple sources in procfs and sysfs. Note that a dmabuf can be backed by different types of memory such as system DRAM, GPU VRAM, or others. This patch makes no distinction between these different types of memory, so on systems with non-unified memory the reported values should be interpreted with this in mind. Bug: 424648392 Change-Id: I8ec370b0d7fd37e69f677c6f580940c89cc03a42 Signed-off-by: T.J. Mercier --- drivers/dma-buf/dma-buf.c | 8 ++++++++ fs/proc/base.c | 34 ++++++++++++++++++++++++++++++++++ include/linux/dma-buf.h | 8 ++++++++ 3 files changed, 50 insertions(+) diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index 7c9ac163d115..cb91dadeb465 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -115,6 +115,9 @@ static void dma_buf_release(struct dentry *dentry) if (dmabuf->resv == (struct dma_resv *)&dmabuf[1]) dma_resv_fini(dmabuf->resv); + if (atomic64_read(&dmabuf->num_unique_refs)) + pr_err("destroying dmabuf with non-zero task refs\n"); + WARN_ON(!list_empty(&dmabuf->attachments)); module_put(dmabuf->owner); kfree(dmabuf->name); @@ -199,6 +202,8 @@ static int new_task_dmabuf_record(struct task_struct *task, struct dma_buf *dmab rec->refcnt = 1; list_add(&rec->node, &task->dmabuf_info->dmabufs); + atomic64_inc(&dmabuf->num_unique_refs); + return 0; } @@ -276,6 +281,7 @@ void dma_buf_unaccount_task(struct dma_buf *dmabuf, struct task_struct *task) list_del(&rec->node); kfree(rec); task->dmabuf_info->rss -= dmabuf->size; + atomic64_dec(&dmabuf->num_unique_refs); } err: spin_unlock(&task->dmabuf_info->lock); @@ -851,6 +857,8 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info) dmabuf->resv = resv; } + atomic64_set(&dmabuf->num_unique_refs, 0); + file->private_data = dmabuf; file->f_path.dentry->d_fsdata = dmabuf; dmabuf->file = file; diff --git a/fs/proc/base.c b/fs/proc/base.c index 6b91ddcab7e2..2eee67e06ffe 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -3397,6 +3397,39 @@ static const struct file_operations proc_dmabuf_rss_hwm_operations = { .llseek = seq_lseek, .release = single_release, }; + +static int proc_dmabuf_pss_show(struct seq_file *m, struct pid_namespace *ns, + struct pid *pid, struct task_struct *task) +{ + struct task_dma_buf_record *rec; + u64 pss = 0; + + if (!task->dmabuf_info) { + pr_err("%s dmabuf accounting record was not allocated\n", __func__); + return -ENOMEM; + } + + if (!(task->flags & PF_KTHREAD)) { + spin_lock(&task->dmabuf_info->lock); + list_for_each_entry(rec, &task->dmabuf_info->dmabufs, node) { + s64 refs = atomic64_read(&rec->dmabuf->num_unique_refs); + + if (refs <= 0) { + pr_err("dmabuf has <= refs %lld\n", refs); + continue; + } + + pss += rec->dmabuf->size / (size_t)refs; + } + spin_unlock(&task->dmabuf_info->lock); + + seq_printf(m, "%llu\n", pss); + } else { + seq_puts(m, "0\n"); + } + + return 0; +} #endif /* @@ -3525,6 +3558,7 @@ static const struct pid_entry tgid_base_stuff[] = { #ifdef CONFIG_DMA_SHARED_BUFFER ONE("dmabuf_rss", S_IRUGO, proc_dmabuf_rss_show), REG("dmabuf_rss_hwm", S_IRUGO|S_IWUSR, proc_dmabuf_rss_hwm_operations), + ONE("dmabuf_pss", S_IRUGO, proc_dmabuf_pss_show), #endif }; diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index a362c8ba7a21..267bf322272f 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -25,6 +25,7 @@ #include #include #ifndef __GENKSYMS__ +#include #include #endif @@ -534,6 +535,13 @@ 
struct dma_buf { } *sysfs_entry; #endif + /** + * @num_unique_refs: + * + * The number of tasks that reference this buffer. For calculating PSS. + */ + atomic64_t num_unique_refs; + ANDROID_KABI_RESERVE(1); ANDROID_KABI_RESERVE(2); }; From 59af12872db84137ca14525d864249b32a0ceebb Mon Sep 17 00:00:00 2001 From: Suren Baghdasaryan Date: Thu, 3 Jul 2025 20:22:25 +0000 Subject: [PATCH 33/49] ANDROID: fixup task_struct to avoid ABI breakage Reuse task_struct.worker_private to store task_dma_buf_info pointer and avoid adding new task_struct members that would lead to ABI breakage. This aliasing works because task_struct.worker_private is used only for kthreads and io_workers which task_dma_buf_info is used for user tasks. Bug: 424648392 Change-Id: I2caa708d8a729095b308932c1b35c3157835639b Signed-off-by: Suren Baghdasaryan --- drivers/dma-buf/dma-buf.c | 69 ++++++++++++++++++------- fs/proc/base.c | 104 +++++++++++++++++++++++--------------- include/linux/dma-buf.h | 22 ++++++++ include/linux/sched.h | 4 +- init/init_task.c | 2 +- kernel/fork.c | 69 +++++++++++++++---------- 6 files changed, 180 insertions(+), 90 deletions(-) diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index cb91dadeb465..5b3e3fdc1599 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -168,11 +168,21 @@ static struct file_system_type dma_buf_fs_type = { static struct task_dma_buf_record *find_task_dmabuf_record( struct task_struct *task, struct dma_buf *dmabuf) { + struct task_dma_buf_info *dmabuf_info = get_task_dma_buf_info(task); struct task_dma_buf_record *rec; - lockdep_assert_held(&task->dmabuf_info->lock); + if (!dmabuf_info) + return NULL; - list_for_each_entry(rec, &task->dmabuf_info->dmabufs, node) + if (IS_ERR(dmabuf_info)) { + pr_err("%s dmabuf accounting record is missing, error %ld\n", + __func__, PTR_ERR(dmabuf_info)); + return NULL; + } + + lockdep_assert_held(&dmabuf_info->lock); + + list_for_each_entry(rec, &dmabuf_info->dmabufs, node) if (dmabuf == rec->dmabuf) return rec; @@ -181,26 +191,36 @@ static struct task_dma_buf_record *find_task_dmabuf_record( static int new_task_dmabuf_record(struct task_struct *task, struct dma_buf *dmabuf) { + struct task_dma_buf_info *dmabuf_info = get_task_dma_buf_info(task); struct task_dma_buf_record *rec; - lockdep_assert_held(&task->dmabuf_info->lock); + if (!dmabuf_info) + return 0; + + if (IS_ERR(dmabuf_info)) { + pr_err("%s dmabuf accounting record is missing, error %ld\n", + __func__, PTR_ERR(dmabuf_info)); + return PTR_ERR(dmabuf_info); + } + + lockdep_assert_held(&dmabuf_info->lock); rec = kmalloc(sizeof(*rec), GFP_KERNEL); if (!rec) return -ENOMEM; - task->dmabuf_info->rss += dmabuf->size; + dmabuf_info->rss += dmabuf->size; /* - * task->dmabuf_info->lock protects against concurrent writers, so no + * dmabuf_info->lock protects against concurrent writers, so no * worries about stale rss_hwm between the read and write, and we don't * need to cmpxchg here. 
*/ - if (task->dmabuf_info->rss > task->dmabuf_info->rss_hwm) - task->dmabuf_info->rss_hwm = task->dmabuf_info->rss; + if (dmabuf_info->rss > dmabuf_info->rss_hwm) + dmabuf_info->rss_hwm = dmabuf_info->rss; rec->dmabuf = dmabuf; rec->refcnt = 1; - list_add(&rec->node, &task->dmabuf_info->dmabufs); + list_add(&rec->node, &dmabuf_info->dmabufs); atomic64_inc(&dmabuf->num_unique_refs); @@ -222,24 +242,30 @@ static int new_task_dmabuf_record(struct task_struct *task, struct dma_buf *dmab */ int dma_buf_account_task(struct dma_buf *dmabuf, struct task_struct *task) { + struct task_dma_buf_info *dmabuf_info; struct task_dma_buf_record *rec; int ret = 0; if (!dmabuf || !task) return -EINVAL; - if (!task->dmabuf_info) { - pr_err("%s dmabuf accounting record was not allocated\n", __func__); - return -ENOMEM; + dmabuf_info = get_task_dma_buf_info(task); + if (!dmabuf_info) + return 0; + + if (IS_ERR(dmabuf_info)) { + pr_err("%s dmabuf accounting record is missing, error %ld\n", + __func__, PTR_ERR(dmabuf_info)); + return PTR_ERR(dmabuf_info); } - spin_lock(&task->dmabuf_info->lock); + spin_lock(&dmabuf_info->lock); rec = find_task_dmabuf_record(task, dmabuf); if (!rec) ret = new_task_dmabuf_record(task, dmabuf); else ++rec->refcnt; - spin_unlock(&task->dmabuf_info->lock); + spin_unlock(&dmabuf_info->lock); return ret; } @@ -260,17 +286,22 @@ int dma_buf_account_task(struct dma_buf *dmabuf, struct task_struct *task) */ void dma_buf_unaccount_task(struct dma_buf *dmabuf, struct task_struct *task) { + struct task_dma_buf_info *dmabuf_info = get_task_dma_buf_info(task); struct task_dma_buf_record *rec; if (!dmabuf || !task) return; - if (!task->dmabuf_info) { - pr_err("%s dmabuf accounting record was not allocated\n", __func__); - return; + if (!dmabuf_info) + return; + + if (IS_ERR(dmabuf_info)) { + pr_err("%s dmabuf accounting record is missing, error %ld\n", + __func__, PTR_ERR(dmabuf_info)); + return; } - spin_lock(&task->dmabuf_info->lock); + spin_lock(&dmabuf_info->lock); rec = find_task_dmabuf_record(task, dmabuf); if (!rec) { /* Failed fd_install? 
*/ pr_err("dmabuf not found in task list\n"); @@ -280,11 +311,11 @@ void dma_buf_unaccount_task(struct dma_buf *dmabuf, struct task_struct *task) if (--rec->refcnt == 0) { list_del(&rec->node); kfree(rec); - task->dmabuf_info->rss -= dmabuf->size; + dmabuf_info->rss -= dmabuf->size; atomic64_dec(&dmabuf->num_unique_refs); } err: - spin_unlock(&task->dmabuf_info->lock); + spin_unlock(&dmabuf_info->lock); } static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct *vma) diff --git a/fs/proc/base.c b/fs/proc/base.c index 2eee67e06ffe..0a3f28f7f1d9 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -3309,21 +3309,27 @@ static int proc_stack_depth(struct seq_file *m, struct pid_namespace *ns, static int proc_dmabuf_rss_show(struct seq_file *m, struct pid_namespace *ns, struct pid *pid, struct task_struct *task) { - if (!task->dmabuf_info) { - pr_err("%s dmabuf accounting record was not allocated\n", __func__); - return -ENOMEM; + struct task_dma_buf_info *dmabuf_info = get_task_dma_buf_info(task); + + if (!dmabuf_info) { + seq_puts(m, "0\n"); + return 0; } - if (!(task->flags & PF_KTHREAD)) - seq_printf(m, "%lld\n", READ_ONCE(task->dmabuf_info->rss)); - else - seq_puts(m, "0\n"); + if (IS_ERR(dmabuf_info)) { + pr_err("%s dmabuf accounting record is missing, error %ld\n", + __func__, PTR_ERR(dmabuf_info)); + return PTR_ERR(dmabuf_info); + } + + seq_printf(m, "%lld\n", READ_ONCE(dmabuf_info->rss)); return 0; } static int proc_dmabuf_rss_hwm_show(struct seq_file *m, void *v) { + struct task_dma_buf_info *dmabuf_info; struct inode *inode = m->private; struct task_struct *task; int ret = 0; @@ -3332,16 +3338,20 @@ static int proc_dmabuf_rss_hwm_show(struct seq_file *m, void *v) if (!task) return -ESRCH; - if (!task->dmabuf_info) { - pr_err("%s dmabuf accounting record was not allocated\n", __func__); - ret = -ENOMEM; + dmabuf_info = get_task_dma_buf_info(task); + if (!dmabuf_info) { + seq_puts(m, "0\n"); goto out; } - if (!(task->flags & PF_KTHREAD)) - seq_printf(m, "%lld\n", READ_ONCE(task->dmabuf_info->rss_hwm)); - else - seq_puts(m, "0\n"); + if (IS_ERR(dmabuf_info)) { + pr_err("%s dmabuf accounting record is missing, error %ld\n", + __func__, PTR_ERR(dmabuf_info)); + ret = PTR_ERR(dmabuf_info); + goto out; + } + + seq_printf(m, "%lld\n", READ_ONCE(dmabuf_info->rss_hwm)); out: put_task_struct(task); @@ -3358,6 +3368,7 @@ static ssize_t proc_dmabuf_rss_hwm_write(struct file *file, const char __user *buf, size_t count, loff_t *offset) { + struct task_dma_buf_info *dmabuf_info; struct inode *inode = file_inode(file); struct task_struct *task; unsigned long long val; @@ -3374,15 +3385,22 @@ proc_dmabuf_rss_hwm_write(struct file *file, const char __user *buf, if (!task) return -ESRCH; - if (!task->dmabuf_info) { - pr_err("%s dmabuf accounting record was not allocated\n", __func__); - ret = -ENOMEM; + dmabuf_info = get_task_dma_buf_info(task); + if (!dmabuf_info) { + ret = -EINVAL; goto out; } - spin_lock(&task->dmabuf_info->lock); - task->dmabuf_info->rss_hwm = task->dmabuf_info->rss; - spin_unlock(&task->dmabuf_info->lock); + if (IS_ERR(dmabuf_info)) { + pr_err("%s dmabuf accounting record is missing, error %ld\n", + __func__, PTR_ERR(dmabuf_info)); + ret = PTR_ERR(dmabuf_info); + goto out; + } + + spin_lock(&dmabuf_info->lock); + dmabuf_info->rss_hwm = dmabuf_info->rss; + spin_unlock(&dmabuf_info->lock); out: put_task_struct(task); @@ -3401,33 +3419,37 @@ static const struct file_operations proc_dmabuf_rss_hwm_operations = { static int proc_dmabuf_pss_show(struct seq_file *m, struct 
pid_namespace *ns, struct pid *pid, struct task_struct *task) { + struct task_dma_buf_info *dmabuf_info; struct task_dma_buf_record *rec; u64 pss = 0; - if (!task->dmabuf_info) { - pr_err("%s dmabuf accounting record was not allocated\n", __func__); - return -ENOMEM; - } - - if (!(task->flags & PF_KTHREAD)) { - spin_lock(&task->dmabuf_info->lock); - list_for_each_entry(rec, &task->dmabuf_info->dmabufs, node) { - s64 refs = atomic64_read(&rec->dmabuf->num_unique_refs); - - if (refs <= 0) { - pr_err("dmabuf has <= refs %lld\n", refs); - continue; - } - - pss += rec->dmabuf->size / (size_t)refs; - } - spin_unlock(&task->dmabuf_info->lock); - - seq_printf(m, "%llu\n", pss); - } else { + dmabuf_info = get_task_dma_buf_info(task); + if (!dmabuf_info) { seq_puts(m, "0\n"); + return 0; } + if (IS_ERR(dmabuf_info)) { + pr_err("%s dmabuf accounting record is missing, error %ld\n", + __func__, PTR_ERR(dmabuf_info)); + return PTR_ERR(dmabuf_info); + } + + spin_lock(&dmabuf_info->lock); + list_for_each_entry(rec, &dmabuf_info->dmabufs, node) { + s64 refs = atomic64_read(&rec->dmabuf->num_unique_refs); + + if (refs <= 0) { + pr_err("dmabuf has <= refs %lld\n", refs); + continue; + } + + pss += rec->dmabuf->size / (size_t)refs; + } + spin_unlock(&dmabuf_info->lock); + + seq_printf(m, "%llu\n", pss); + return 0; } #endif diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index 267bf322272f..654085da8bc4 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -690,6 +690,28 @@ struct task_dma_buf_info { struct list_head dmabufs; }; +static inline bool task_has_dma_buf_info(struct task_struct *task) +{ + return (task->flags & (PF_KTHREAD | PF_IO_WORKER)) == 0; +} + +extern struct task_struct init_task; + +static inline +struct task_dma_buf_info *get_task_dma_buf_info(struct task_struct *task) +{ + if (!task) + return ERR_PTR(-EINVAL); + + if (!task_has_dma_buf_info(task)) + return NULL; + + if (!task->worker_private) + return ERR_PTR(-ENOMEM); + + return (struct task_dma_buf_info *)task->worker_private; +} + /** * DEFINE_DMA_BUF_EXPORT_INFO - helper macro for exporters * @name: export-info name diff --git a/include/linux/sched.h b/include/linux/sched.h index 68ba96bde447..3cff2446536d 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1003,6 +1003,7 @@ struct task_struct { int __user *clear_child_tid; /* PF_KTHREAD | PF_IO_WORKER */ + /* Otherwise used as task_dma_buf_info pointer */ void *worker_private; u64 utime; @@ -1517,9 +1518,6 @@ struct task_struct { */ struct callback_head l1d_flush_kill; #endif - - struct task_dma_buf_info *dmabuf_info; - ANDROID_KABI_RESERVE(1); ANDROID_KABI_RESERVE(2); ANDROID_KABI_RESERVE(3); diff --git a/init/init_task.c b/init/init_task.c index d80c007ab59b..1903a2abde55 100644 --- a/init/init_task.c +++ b/init/init_task.c @@ -214,7 +214,7 @@ struct task_struct init_task .android_vendor_data1 = {0, }, .android_oem_data1 = {0, }, #endif - .dmabuf_info = NULL, + .worker_private = NULL, }; EXPORT_SYMBOL(init_task); diff --git a/kernel/fork.c b/kernel/fork.c index e1d7d244d43a..9c71a69e0d17 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -997,21 +997,27 @@ static inline void put_signal_struct(struct signal_struct *sig) static void put_dmabuf_info(struct task_struct *tsk) { - if (!tsk->dmabuf_info) { - pr_err("%s dmabuf accounting record was not allocated\n", __func__); + struct task_dma_buf_info *dmabuf_info = get_task_dma_buf_info(tsk); + + if (!dmabuf_info) + return; + + if (IS_ERR(dmabuf_info)) { + pr_err("%s dmabuf accounting record 
is missing, error %ld\n", + __func__, PTR_ERR(dmabuf_info)); return; } - if (!refcount_dec_and_test(&tsk->dmabuf_info->refcnt)) + if (!refcount_dec_and_test(&dmabuf_info->refcnt)) return; - if (READ_ONCE(tsk->dmabuf_info->rss)) + if (READ_ONCE(dmabuf_info->rss)) pr_err("%s destroying task with non-zero dmabuf rss\n", __func__); - if (!list_empty(&tsk->dmabuf_info->dmabufs)) + if (!list_empty(&dmabuf_info->dmabufs)) pr_err("%s destroying task with non-empty dmabuf list\n", __func__); - kfree(tsk->dmabuf_info); + kfree(dmabuf_info); } void __put_task_struct(struct task_struct *tsk) @@ -2291,55 +2297,66 @@ static void rv_task_fork(struct task_struct *p) static int copy_dmabuf_info(u64 clone_flags, struct task_struct *p) { + struct task_dma_buf_info *new_dmabuf_info; + struct task_dma_buf_info *dmabuf_info; struct task_dma_buf_record *rec, *copy; - if (current->dmabuf_info && (clone_flags & (CLONE_VM | CLONE_FILES)) + if (!task_has_dma_buf_info(p)) + return 0; /* Task is not supposed to have dmabuf_info */ + + dmabuf_info = get_task_dma_buf_info(current); + /* Original might not have dmabuf_info and that's fine */ + if (IS_ERR(dmabuf_info)) + dmabuf_info = NULL; + + if (dmabuf_info && (clone_flags & (CLONE_VM | CLONE_FILES)) == (CLONE_VM | CLONE_FILES)) { /* * Both MM and FD references to dmabufs are shared with the parent, so * we can share a RSS counter with the parent. */ - refcount_inc(¤t->dmabuf_info->refcnt); - p->dmabuf_info = current->dmabuf_info; + refcount_inc(&dmabuf_info->refcnt); + p->worker_private = dmabuf_info; return 0; } - p->dmabuf_info = kmalloc(sizeof(*p->dmabuf_info), GFP_KERNEL); - if (!p->dmabuf_info) + new_dmabuf_info = kmalloc(sizeof(*new_dmabuf_info), GFP_KERNEL); + if (!new_dmabuf_info) return -ENOMEM; - refcount_set(&p->dmabuf_info->refcnt, 1); - spin_lock_init(&p->dmabuf_info->lock); - INIT_LIST_HEAD(&p->dmabuf_info->dmabufs); - if (current->dmabuf_info) { - spin_lock(¤t->dmabuf_info->lock); - p->dmabuf_info->rss = current->dmabuf_info->rss; - p->dmabuf_info->rss_hwm = current->dmabuf_info->rss; - list_for_each_entry(rec, ¤t->dmabuf_info->dmabufs, node) { + refcount_set(&new_dmabuf_info->refcnt, 1); + spin_lock_init(&new_dmabuf_info->lock); + INIT_LIST_HEAD(&new_dmabuf_info->dmabufs); + if (dmabuf_info) { + spin_lock(&dmabuf_info->lock); + new_dmabuf_info->rss = dmabuf_info->rss; + new_dmabuf_info->rss_hwm = dmabuf_info->rss; + list_for_each_entry(rec, &dmabuf_info->dmabufs, node) { copy = kmalloc(sizeof(*copy), GFP_KERNEL); if (!copy) { - spin_unlock(¤t->dmabuf_info->lock); + spin_unlock(&dmabuf_info->lock); goto err_list_copy; } copy->dmabuf = rec->dmabuf; copy->refcnt = rec->refcnt; - list_add(©->node, &p->dmabuf_info->dmabufs); + list_add(©->node, &new_dmabuf_info->dmabufs); } - spin_unlock(¤t->dmabuf_info->lock); + spin_unlock(&dmabuf_info->lock); } else { - p->dmabuf_info->rss = 0; - p->dmabuf_info->rss_hwm = 0; + new_dmabuf_info->rss = 0; + new_dmabuf_info->rss_hwm = 0; } + p->worker_private = new_dmabuf_info; return 0; err_list_copy: - list_for_each_entry_safe(rec, copy, &p->dmabuf_info->dmabufs, node) { + list_for_each_entry_safe(rec, copy, &new_dmabuf_info->dmabufs, node) { list_del(&rec->node); kfree(rec); } - kfree(p->dmabuf_info); + kfree(new_dmabuf_info); return -ENOMEM; } From e9f7ac1c2533c2a075cf5c6a9c550bb076110ff4 Mon Sep 17 00:00:00 2001 From: Suren Baghdasaryan Date: Thu, 3 Jul 2025 22:25:54 +0000 Subject: [PATCH 34/49] ANDROID: fixup dma_buf struct to avoid ABI breakage Wrap dma_buf into dma_buf_ext object containing additional num_unique_refs 
field required for dmabuf PSS accounting. Bug: 424648392 Change-Id: I3929ec2cf7cda2626452b5c80949aecefec900e6 Signed-off-by: Suren Baghdasaryan --- drivers/dma-buf/dma-buf.c | 22 ++++++++++++---------- fs/proc/base.c | 2 +- include/linux/dma-buf.h | 17 +++++++++++++++-- 3 files changed, 28 insertions(+), 13 deletions(-) diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index 5b3e3fdc1599..71065b03012a 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -93,6 +93,7 @@ static char *dmabuffs_dname(struct dentry *dentry, char *buffer, int buflen) static void dma_buf_release(struct dentry *dentry) { + struct dma_buf_ext *dmabuf_ext; struct dma_buf *dmabuf; dmabuf = dentry->d_fsdata; @@ -115,13 +116,13 @@ static void dma_buf_release(struct dentry *dentry) if (dmabuf->resv == (struct dma_resv *)&dmabuf[1]) dma_resv_fini(dmabuf->resv); - if (atomic64_read(&dmabuf->num_unique_refs)) + dmabuf_ext = get_dmabuf_ext(dmabuf); + if (atomic64_read(&dmabuf_ext->num_unique_refs)) pr_err("destroying dmabuf with non-zero task refs\n"); - WARN_ON(!list_empty(&dmabuf->attachments)); module_put(dmabuf->owner); kfree(dmabuf->name); - kfree(dmabuf); + kfree(dmabuf_ext); } static int dma_buf_file_release(struct inode *inode, struct file *file) @@ -221,8 +222,7 @@ static int new_task_dmabuf_record(struct task_struct *task, struct dma_buf *dmab rec->dmabuf = dmabuf; rec->refcnt = 1; list_add(&rec->node, &dmabuf_info->dmabufs); - - atomic64_inc(&dmabuf->num_unique_refs); + atomic64_inc(&get_dmabuf_ext(dmabuf)->num_unique_refs); return 0; } @@ -312,7 +312,7 @@ void dma_buf_unaccount_task(struct dma_buf *dmabuf, struct task_struct *task) list_del(&rec->node); kfree(rec); dmabuf_info->rss -= dmabuf->size; - atomic64_dec(&dmabuf->num_unique_refs); + atomic64_dec(&get_dmabuf_ext(dmabuf)->num_unique_refs); } err: spin_unlock(&dmabuf_info->lock); @@ -831,10 +831,11 @@ err_alloc_file: */ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info) { + struct dma_buf_ext *dmabuf_ext; struct dma_buf *dmabuf; struct dma_resv *resv = exp_info->resv; struct file *file; - size_t alloc_size = sizeof(struct dma_buf); + size_t alloc_size = sizeof(struct dma_buf_ext); int ret; if (WARN_ON(!exp_info->priv || !exp_info->ops @@ -864,12 +865,13 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info) else /* prevent &dma_buf[1] == dma_buf->resv */ alloc_size += 1; - dmabuf = kzalloc(alloc_size, GFP_KERNEL); - if (!dmabuf) { + dmabuf_ext = kzalloc(alloc_size, GFP_KERNEL); + if (!dmabuf_ext) { ret = -ENOMEM; goto err_file; } + dmabuf = &dmabuf_ext->dmabuf; dmabuf->priv = exp_info->priv; dmabuf->ops = exp_info->ops; dmabuf->size = exp_info->size; @@ -888,7 +890,7 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info) dmabuf->resv = resv; } - atomic64_set(&dmabuf->num_unique_refs, 0); + atomic64_set(&dmabuf_ext->num_unique_refs, 0); file->private_data = dmabuf; file->f_path.dentry->d_fsdata = dmabuf; diff --git a/fs/proc/base.c b/fs/proc/base.c index 0a3f28f7f1d9..3d78cd1286a5 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -3437,7 +3437,7 @@ static int proc_dmabuf_pss_show(struct seq_file *m, struct pid_namespace *ns, spin_lock(&dmabuf_info->lock); list_for_each_entry(rec, &dmabuf_info->dmabufs, node) { - s64 refs = atomic64_read(&rec->dmabuf->num_unique_refs); + s64 refs = atomic64_read(&get_dmabuf_ext(rec->dmabuf)->num_unique_refs); if (refs <= 0) { pr_err("dmabuf has <= refs %lld\n", refs); diff --git a/include/linux/dma-buf.h 
b/include/linux/dma-buf.h index 654085da8bc4..d9487fb2e549 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -535,6 +535,11 @@ struct dma_buf { } *sysfs_entry; #endif + ANDROID_KABI_RESERVE(1); + ANDROID_KABI_RESERVE(2); +}; + +struct dma_buf_ext { /** * @num_unique_refs: * @@ -542,10 +547,18 @@ struct dma_buf { */ atomic64_t num_unique_refs; - ANDROID_KABI_RESERVE(1); - ANDROID_KABI_RESERVE(2); + /* + * dma_buf can have a reservation object after it, so keep this member + * at the end of this structure. + */ + struct dma_buf dmabuf; }; +static inline struct dma_buf_ext *get_dmabuf_ext(struct dma_buf *dmabuf) +{ + return container_of(dmabuf, struct dma_buf_ext, dmabuf); +} + /** * struct dma_buf_attach_ops - importer operations for an attachment * From c8fdc081cfa165aaa5bd87979d33b419499574cf Mon Sep 17 00:00:00 2001 From: "T.J. Mercier" Date: Tue, 1 Jul 2025 00:51:35 +0000 Subject: [PATCH 35/49] ANDROID: Add dmabuf RSS trace event Dmabuf RSS is associated with a task, or group of tasks sharing the same mm_struct and files_struct. Any time the RSS counter is modified for a task, or group of tasks, emit a trace event with the current value of the dmabuf RSS counter. This allows for fast tracking of per-process dmabuf RSS by userspace analysis tools like Perfetto, compared to periodically obtaining per-process dmabuf RSS from procfs. Bug: 424646615 Change-Id: I74434dddacc342918cb52b1b9e2fa6679e332764 Signed-off-by: T.J. Mercier --- drivers/dma-buf/dma-buf.c | 5 +++++ include/trace/events/kmem.h | 25 +++++++++++++++++++++++++ 2 files changed, 30 insertions(+) diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index 71065b03012a..35bea250e08d 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -31,6 +31,9 @@ #include #include +#ifndef __GENKSYMS__ +#include +#endif #include #include "dma-buf-sysfs-stats.h" @@ -211,6 +214,7 @@ static int new_task_dmabuf_record(struct task_struct *task, struct dma_buf *dmab return -ENOMEM; dmabuf_info->rss += dmabuf->size; + trace_dmabuf_rss_stat(dmabuf_info->rss, dmabuf->size, dmabuf); /* * dmabuf_info->lock protects against concurrent writers, so no * worries about stale rss_hwm between the read and write, and we don't @@ -312,6 +316,7 @@ void dma_buf_unaccount_task(struct dma_buf *dmabuf, struct task_struct *task) list_del(&rec->node); kfree(rec); dmabuf_info->rss -= dmabuf->size; + trace_dmabuf_rss_stat(dmabuf_info->rss, -dmabuf->size, dmabuf); atomic64_dec(&get_dmabuf_ext(dmabuf)->num_unique_refs); } err: diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h index 68f5280a41a4..896f8de946d0 100644 --- a/include/trace/events/kmem.h +++ b/include/trace/events/kmem.h @@ -8,6 +8,7 @@ #include #include #include +#include TRACE_EVENT(kmem_cache_alloc, @@ -487,6 +488,30 @@ TRACE_EVENT(rss_stat, __print_symbolic(__entry->member, TRACE_MM_PAGES), __entry->size) ); + +TRACE_EVENT(dmabuf_rss_stat, + + TP_PROTO(size_t rss, ssize_t rss_delta, struct dma_buf *dmabuf), + + TP_ARGS(rss, rss_delta, dmabuf), + + TP_STRUCT__entry( + __field(size_t, rss) + __field(ssize_t, rss_delta) + __field(unsigned long, i_ino) + ), + + TP_fast_assign( + __entry->rss = rss; + __entry->rss_delta = rss_delta; + __entry->i_ino = file_inode(dmabuf->file)->i_ino; + ), + + TP_printk("rss=%zu delta=%zd i_ino=%lu", + __entry->rss, + __entry->rss_delta, + __entry->i_ino) + ); #endif /* _TRACE_KMEM_H */ /* This part must be outside protection */ From a9597c7b32ec09bdaf13909f77b203e77b4fbb69 Mon Sep 17 00:00:00 2001 From: 
"qinglin.li" Date: Wed, 9 Jul 2025 16:25:29 +0800 Subject: [PATCH 36/49] ANDROID: GKI: Update symbol list for Amlogic 1 function symbol(s) added 'int snd_soc_get_dai_name(const struct of_phandle_args*, const char**)' Bug: 430463604 Change-Id: I282e456c2b5ff44cb309fdb27faeb115ee9c2d9d Signed-off-by: Qinglin Li --- android/abi_gki_aarch64.stg | 16 +++++++++++ android/abi_gki_aarch64_amlogic | 49 +++++++++++++++++++++++++++++++++ 2 files changed, 65 insertions(+) diff --git a/android/abi_gki_aarch64.stg b/android/abi_gki_aarch64.stg index 679394de3bae..43a9176d2de6 100644 --- a/android/abi_gki_aarch64.stg +++ b/android/abi_gki_aarch64.stg @@ -329005,6 +329005,12 @@ function { parameter_id: 0x391f15ea parameter_id: 0xf435685e } +function { + id: 0x9294d8c1 + return_type_id: 0x6720d32f + parameter_id: 0x3c01aef6 + parameter_id: 0x051414e1 +} function { id: 0x92956fd0 return_type_id: 0x6720d32f @@ -422737,6 +422743,15 @@ elf_symbol { type_id: 0x909c23c2 full_name: "snd_soc_get_dai_id" } +elf_symbol { + id: 0x4086fab0 + name: "snd_soc_get_dai_name" + is_defined: true + symbol_type: FUNCTION + crc: 0x347721f4 + type_id: 0x9294d8c1 + full_name: "snd_soc_get_dai_name" +} elf_symbol { id: 0xa64c7fe5 name: "snd_soc_get_dai_via_args" @@ -443980,6 +443995,7 @@ interface { symbol_id: 0x7918ef41 symbol_id: 0x97843792 symbol_id: 0x54622a57 + symbol_id: 0x4086fab0 symbol_id: 0xa64c7fe5 symbol_id: 0x5eb2e502 symbol_id: 0x33a917a0 diff --git a/android/abi_gki_aarch64_amlogic b/android/abi_gki_aarch64_amlogic index ae6e6cb73da1..2a3c0510146a 100644 --- a/android/abi_gki_aarch64_amlogic +++ b/android/abi_gki_aarch64_amlogic @@ -1,3 +1,5 @@ + + [abi_symbol_list] add_cpu add_device_randomness @@ -209,10 +211,12 @@ consume_skb contig_page_data __contpte_try_unfold + _copy_from_iter copy_from_kernel_nofault __copy_overflow copy_page_from_iter_atomic copy_splice_read + _copy_to_iter cpu_all_bits cpu_bit_bitmap cpufreq_boost_enabled @@ -245,10 +249,13 @@ crypto_aead_setauthsize crypto_aead_setkey crypto_ahash_digest + crypto_ahash_final + crypto_ahash_finup crypto_ahash_setkey crypto_alloc_aead crypto_alloc_ahash crypto_alloc_base + crypto_alloc_rng crypto_alloc_shash crypto_alloc_skcipher crypto_cipher_encrypt_one @@ -258,13 +265,17 @@ crypto_dequeue_request crypto_destroy_tfm crypto_enqueue_request + crypto_get_default_null_skcipher crypto_has_alg crypto_init_queue __crypto_memneq + crypto_put_default_null_skcipher crypto_register_ahash crypto_register_alg crypto_register_shash crypto_register_skcipher + crypto_req_done + crypto_rng_reset crypto_sha1_finup crypto_sha1_update crypto_shash_digest @@ -623,6 +634,7 @@ drm_atomic_set_mode_prop_for_crtc drm_atomic_state_alloc drm_atomic_state_clear + drm_atomic_state_default_release __drm_atomic_state_free drm_compat_ioctl drm_connector_attach_content_type_property @@ -793,6 +805,7 @@ extcon_set_state extcon_set_state_sync extcon_unregister_notifier + extract_iter_to_sg fasync_helper fault_in_iov_iter_readable __fdget @@ -1102,8 +1115,10 @@ ioremap_prot io_schedule iounmap + iov_iter_advance iov_iter_alignment iov_iter_init + iov_iter_npages iov_iter_revert iov_iter_zero iput @@ -1269,12 +1284,14 @@ __local_bh_enable_ip __lock_buffer lockref_get + lock_sock_nested logfc log_post_read_mmio log_post_write_mmio log_read_mmio log_write_mmio lookup_bdev + lookup_user_key loops_per_jiffy LZ4_decompress_safe LZ4_decompress_safe_partial @@ -1726,6 +1743,8 @@ proc_mkdir proc_mkdir_data proc_remove + proto_register + proto_unregister __pskb_copy_fclone pskb_expand_head 
__pskb_pull_tail @@ -1845,6 +1864,8 @@ release_firmware __release_region release_resource + release_sock + release_sock remap_pfn_range remap_vmalloc_range remove_cpu @@ -1940,6 +1961,8 @@ sdio_writel sdio_writesb sdio_writew + security_sk_clone + security_sock_graft send_sig seq_list_next seq_list_start @@ -2000,6 +2023,7 @@ single_open_size single_release si_swapinfo + sk_alloc skb_add_rx_frag skb_checksum_help skb_clone @@ -2026,6 +2050,7 @@ skb_scrub_packet skb_trim skb_tstamp_tx + sk_free skip_spaces smpboot_register_percpu_thread smp_call_function @@ -2046,6 +2071,7 @@ snd_pcm_lib_preallocate_pages snd_pcm_period_elapsed snd_pcm_rate_to_rate_bit + snd_pcm_set_managed_buffer_all snd_pcm_stop snd_pcm_stop_xrun _snd_pcm_stream_lock_irqsave @@ -2068,6 +2094,7 @@ snd_soc_dai_set_tdm_slot snd_soc_dapm_get_enum_double snd_soc_dapm_put_enum_double + snd_soc_get_dai_name snd_soc_get_volsw snd_soc_get_volsw_range snd_soc_info_enum_double @@ -2082,6 +2109,7 @@ snd_soc_of_parse_audio_simple_widgets snd_soc_of_parse_card_name snd_soc_of_parse_tdm_slot + snd_soc_of_put_dai_link_codecs snd_soc_pm_ops snd_soc_put_volsw snd_soc_put_volsw_range @@ -2090,7 +2118,25 @@ snd_soc_unregister_component snprintf __sock_create + sock_init_data + sock_kfree_s + sock_kmalloc + sock_kzfree_s + sock_no_accept + sock_no_bind + sock_no_connect + sock_no_getname + sock_no_ioctl + sock_no_listen + sock_no_mmap + sock_no_recvmsg + sock_no_sendmsg + sock_no_shutdown + sock_no_socketpair + sock_register sock_release + sock_unregister + sock_wake_async sock_wfree sort spi_add_device @@ -2172,6 +2218,7 @@ sysfs_create_file_ns sysfs_create_files sysfs_create_group + sysfs_create_groups sysfs_create_link sysfs_emit __sysfs_match_string @@ -2574,10 +2621,12 @@ wakeup_source_register wakeup_source_unregister __wake_up_sync + __wake_up_sync_key __warn_flushing_systemwide_wq __warn_printk wireless_nlevent_flush wireless_send_event + woken_wake_function work_busy write_cache_pages write_inode_now From 1f02134847c8bed09b4cda0a53cecce9b1271844 Mon Sep 17 00:00:00 2001 From: Suren Baghdasaryan Date: Wed, 9 Jul 2025 16:52:30 -0700 Subject: [PATCH 37/49] Revert "ANDROID: Add dmabuf RSS trace event" Revert submission 3680024 Reason for revert: replacing with a fixed version Reverted changes: /q/submissionid:3680024 Bug: 430499939 Change-Id: I5ba5cdbabd1e967889b3523abca289fa8e8a3ec9 Signed-off-by: Suren Baghdasaryan --- drivers/dma-buf/dma-buf.c | 5 ----- include/trace/events/kmem.h | 25 ------------------------- 2 files changed, 30 deletions(-) diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index 35bea250e08d..71065b03012a 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -31,9 +31,6 @@ #include #include -#ifndef __GENKSYMS__ -#include -#endif #include #include "dma-buf-sysfs-stats.h" @@ -214,7 +211,6 @@ static int new_task_dmabuf_record(struct task_struct *task, struct dma_buf *dmab return -ENOMEM; dmabuf_info->rss += dmabuf->size; - trace_dmabuf_rss_stat(dmabuf_info->rss, dmabuf->size, dmabuf); /* * dmabuf_info->lock protects against concurrent writers, so no * worries about stale rss_hwm between the read and write, and we don't @@ -316,7 +312,6 @@ void dma_buf_unaccount_task(struct dma_buf *dmabuf, struct task_struct *task) list_del(&rec->node); kfree(rec); dmabuf_info->rss -= dmabuf->size; - trace_dmabuf_rss_stat(dmabuf_info->rss, -dmabuf->size, dmabuf); atomic64_dec(&get_dmabuf_ext(dmabuf)->num_unique_refs); } err: diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h 
index 896f8de946d0..68f5280a41a4 100644 --- a/include/trace/events/kmem.h +++ b/include/trace/events/kmem.h @@ -8,7 +8,6 @@ #include #include #include -#include TRACE_EVENT(kmem_cache_alloc, @@ -488,30 +487,6 @@ TRACE_EVENT(rss_stat, __print_symbolic(__entry->member, TRACE_MM_PAGES), __entry->size) ); - -TRACE_EVENT(dmabuf_rss_stat, - - TP_PROTO(size_t rss, ssize_t rss_delta, struct dma_buf *dmabuf), - - TP_ARGS(rss, rss_delta, dmabuf), - - TP_STRUCT__entry( - __field(size_t, rss) - __field(ssize_t, rss_delta) - __field(unsigned long, i_ino) - ), - - TP_fast_assign( - __entry->rss = rss; - __entry->rss_delta = rss_delta; - __entry->i_ino = file_inode(dmabuf->file)->i_ino; - ), - - TP_printk("rss=%zu delta=%zd i_ino=%lu", - __entry->rss, - __entry->rss_delta, - __entry->i_ino) - ); #endif /* _TRACE_KMEM_H */ /* This part must be outside protection */ From b26826e8ffe6bac56af44ea05dc371120b3b80f3 Mon Sep 17 00:00:00 2001 From: Suren Baghdasaryan Date: Wed, 9 Jul 2025 16:52:30 -0700 Subject: [PATCH 38/49] Revert "ANDROID: fixup dma_buf struct to avoid ABI breakage" Revert submission 3680024 Reason for revert: replacing with a fixed version Reverted changes: /q/submissionid:3680024 Bug: 430499939 Change-Id: I994b93b56b734441b6ecebf90b777a7a6f54f1ab Signed-off-by: Suren Baghdasaryan --- drivers/dma-buf/dma-buf.c | 22 ++++++++++------------ fs/proc/base.c | 2 +- include/linux/dma-buf.h | 17 ++--------------- 3 files changed, 13 insertions(+), 28 deletions(-) diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index 71065b03012a..5b3e3fdc1599 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -93,7 +93,6 @@ static char *dmabuffs_dname(struct dentry *dentry, char *buffer, int buflen) static void dma_buf_release(struct dentry *dentry) { - struct dma_buf_ext *dmabuf_ext; struct dma_buf *dmabuf; dmabuf = dentry->d_fsdata; @@ -116,13 +115,13 @@ static void dma_buf_release(struct dentry *dentry) if (dmabuf->resv == (struct dma_resv *)&dmabuf[1]) dma_resv_fini(dmabuf->resv); - dmabuf_ext = get_dmabuf_ext(dmabuf); - if (atomic64_read(&dmabuf_ext->num_unique_refs)) + if (atomic64_read(&dmabuf->num_unique_refs)) pr_err("destroying dmabuf with non-zero task refs\n"); + WARN_ON(!list_empty(&dmabuf->attachments)); module_put(dmabuf->owner); kfree(dmabuf->name); - kfree(dmabuf_ext); + kfree(dmabuf); } static int dma_buf_file_release(struct inode *inode, struct file *file) @@ -222,7 +221,8 @@ static int new_task_dmabuf_record(struct task_struct *task, struct dma_buf *dmab rec->dmabuf = dmabuf; rec->refcnt = 1; list_add(&rec->node, &dmabuf_info->dmabufs); - atomic64_inc(&get_dmabuf_ext(dmabuf)->num_unique_refs); + + atomic64_inc(&dmabuf->num_unique_refs); return 0; } @@ -312,7 +312,7 @@ void dma_buf_unaccount_task(struct dma_buf *dmabuf, struct task_struct *task) list_del(&rec->node); kfree(rec); dmabuf_info->rss -= dmabuf->size; - atomic64_dec(&get_dmabuf_ext(dmabuf)->num_unique_refs); + atomic64_dec(&dmabuf->num_unique_refs); } err: spin_unlock(&dmabuf_info->lock); @@ -831,11 +831,10 @@ err_alloc_file: */ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info) { - struct dma_buf_ext *dmabuf_ext; struct dma_buf *dmabuf; struct dma_resv *resv = exp_info->resv; struct file *file; - size_t alloc_size = sizeof(struct dma_buf_ext); + size_t alloc_size = sizeof(struct dma_buf); int ret; if (WARN_ON(!exp_info->priv || !exp_info->ops @@ -865,13 +864,12 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info) else /* prevent &dma_buf[1] == 
dma_buf->resv */ alloc_size += 1; - dmabuf_ext = kzalloc(alloc_size, GFP_KERNEL); - if (!dmabuf_ext) { + dmabuf = kzalloc(alloc_size, GFP_KERNEL); + if (!dmabuf) { ret = -ENOMEM; goto err_file; } - dmabuf = &dmabuf_ext->dmabuf; dmabuf->priv = exp_info->priv; dmabuf->ops = exp_info->ops; dmabuf->size = exp_info->size; @@ -890,7 +888,7 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info) dmabuf->resv = resv; } - atomic64_set(&dmabuf_ext->num_unique_refs, 0); + atomic64_set(&dmabuf->num_unique_refs, 0); file->private_data = dmabuf; file->f_path.dentry->d_fsdata = dmabuf; diff --git a/fs/proc/base.c b/fs/proc/base.c index 3d78cd1286a5..0a3f28f7f1d9 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -3437,7 +3437,7 @@ static int proc_dmabuf_pss_show(struct seq_file *m, struct pid_namespace *ns, spin_lock(&dmabuf_info->lock); list_for_each_entry(rec, &dmabuf_info->dmabufs, node) { - s64 refs = atomic64_read(&get_dmabuf_ext(rec->dmabuf)->num_unique_refs); + s64 refs = atomic64_read(&rec->dmabuf->num_unique_refs); if (refs <= 0) { pr_err("dmabuf has <= refs %lld\n", refs); diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index d9487fb2e549..654085da8bc4 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -535,11 +535,6 @@ struct dma_buf { } *sysfs_entry; #endif - ANDROID_KABI_RESERVE(1); - ANDROID_KABI_RESERVE(2); -}; - -struct dma_buf_ext { /** * @num_unique_refs: * @@ -547,18 +542,10 @@ struct dma_buf_ext { */ atomic64_t num_unique_refs; - /* - * dma_buf can have a reservation object after it, so keep this member - * at the end of this structure. - */ - struct dma_buf dmabuf; + ANDROID_KABI_RESERVE(1); + ANDROID_KABI_RESERVE(2); }; -static inline struct dma_buf_ext *get_dmabuf_ext(struct dma_buf *dmabuf) -{ - return container_of(dmabuf, struct dma_buf_ext, dmabuf); -} - /** * struct dma_buf_attach_ops - importer operations for an attachment * From ad0b76e69fc80167b8b53463c97660c1a0395248 Mon Sep 17 00:00:00 2001 From: Suren Baghdasaryan Date: Wed, 9 Jul 2025 16:52:30 -0700 Subject: [PATCH 39/49] Revert "ANDROID: fixup task_struct to avoid ABI breakage" Revert submission 3680024 Reason for revert: replacing with a fixed version Reverted changes: /q/submissionid:3680024 Bug: 430499939 Change-Id: Iddc2b99d2f80611d044d0015d62bb33936b4bcaf Signed-off-by: Suren Baghdasaryan --- drivers/dma-buf/dma-buf.c | 69 ++++++++------------------- fs/proc/base.c | 98 +++++++++++++++------------------------ include/linux/dma-buf.h | 22 --------- include/linux/sched.h | 4 +- init/init_task.c | 2 +- kernel/fork.c | 69 +++++++++++---------------- 6 files changed, 87 insertions(+), 177 deletions(-) diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index 5b3e3fdc1599..cb91dadeb465 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -168,21 +168,11 @@ static struct file_system_type dma_buf_fs_type = { static struct task_dma_buf_record *find_task_dmabuf_record( struct task_struct *task, struct dma_buf *dmabuf) { - struct task_dma_buf_info *dmabuf_info = get_task_dma_buf_info(task); struct task_dma_buf_record *rec; - if (!dmabuf_info) - return NULL; + lockdep_assert_held(&task->dmabuf_info->lock); - if (IS_ERR(dmabuf_info)) { - pr_err("%s dmabuf accounting record is missing, error %ld\n", - __func__, PTR_ERR(dmabuf_info)); - return NULL; - } - - lockdep_assert_held(&dmabuf_info->lock); - - list_for_each_entry(rec, &dmabuf_info->dmabufs, node) + list_for_each_entry(rec, &task->dmabuf_info->dmabufs, node) if (dmabuf == 
rec->dmabuf) return rec; @@ -191,36 +181,26 @@ static struct task_dma_buf_record *find_task_dmabuf_record( static int new_task_dmabuf_record(struct task_struct *task, struct dma_buf *dmabuf) { - struct task_dma_buf_info *dmabuf_info = get_task_dma_buf_info(task); struct task_dma_buf_record *rec; - if (!dmabuf_info) - return 0; - - if (IS_ERR(dmabuf_info)) { - pr_err("%s dmabuf accounting record is missing, error %ld\n", - __func__, PTR_ERR(dmabuf_info)); - return PTR_ERR(dmabuf_info); - } - - lockdep_assert_held(&dmabuf_info->lock); + lockdep_assert_held(&task->dmabuf_info->lock); rec = kmalloc(sizeof(*rec), GFP_KERNEL); if (!rec) return -ENOMEM; - dmabuf_info->rss += dmabuf->size; + task->dmabuf_info->rss += dmabuf->size; /* - * dmabuf_info->lock protects against concurrent writers, so no + * task->dmabuf_info->lock protects against concurrent writers, so no * worries about stale rss_hwm between the read and write, and we don't * need to cmpxchg here. */ - if (dmabuf_info->rss > dmabuf_info->rss_hwm) - dmabuf_info->rss_hwm = dmabuf_info->rss; + if (task->dmabuf_info->rss > task->dmabuf_info->rss_hwm) + task->dmabuf_info->rss_hwm = task->dmabuf_info->rss; rec->dmabuf = dmabuf; rec->refcnt = 1; - list_add(&rec->node, &dmabuf_info->dmabufs); + list_add(&rec->node, &task->dmabuf_info->dmabufs); atomic64_inc(&dmabuf->num_unique_refs); @@ -242,30 +222,24 @@ static int new_task_dmabuf_record(struct task_struct *task, struct dma_buf *dmab */ int dma_buf_account_task(struct dma_buf *dmabuf, struct task_struct *task) { - struct task_dma_buf_info *dmabuf_info; struct task_dma_buf_record *rec; int ret = 0; if (!dmabuf || !task) return -EINVAL; - dmabuf_info = get_task_dma_buf_info(task); - if (!dmabuf_info) - return 0; - - if (IS_ERR(dmabuf_info)) { - pr_err("%s dmabuf accounting record is missing, error %ld\n", - __func__, PTR_ERR(dmabuf_info)); - return PTR_ERR(dmabuf_info); + if (!task->dmabuf_info) { + pr_err("%s dmabuf accounting record was not allocated\n", __func__); + return -ENOMEM; } - spin_lock(&dmabuf_info->lock); + spin_lock(&task->dmabuf_info->lock); rec = find_task_dmabuf_record(task, dmabuf); if (!rec) ret = new_task_dmabuf_record(task, dmabuf); else ++rec->refcnt; - spin_unlock(&dmabuf_info->lock); + spin_unlock(&task->dmabuf_info->lock); return ret; } @@ -286,22 +260,17 @@ int dma_buf_account_task(struct dma_buf *dmabuf, struct task_struct *task) */ void dma_buf_unaccount_task(struct dma_buf *dmabuf, struct task_struct *task) { - struct task_dma_buf_info *dmabuf_info = get_task_dma_buf_info(task); struct task_dma_buf_record *rec; if (!dmabuf || !task) return; - if (!dmabuf_info) - return; - - if (IS_ERR(dmabuf_info)) { - pr_err("%s dmabuf accounting record is missing, error %ld\n", - __func__, PTR_ERR(dmabuf_info)); - return; + if (!task->dmabuf_info) { + pr_err("%s dmabuf accounting record was not allocated\n", __func__); + return; } - spin_lock(&dmabuf_info->lock); + spin_lock(&task->dmabuf_info->lock); rec = find_task_dmabuf_record(task, dmabuf); if (!rec) { /* Failed fd_install? 
*/ pr_err("dmabuf not found in task list\n"); @@ -311,11 +280,11 @@ void dma_buf_unaccount_task(struct dma_buf *dmabuf, struct task_struct *task) if (--rec->refcnt == 0) { list_del(&rec->node); kfree(rec); - dmabuf_info->rss -= dmabuf->size; + task->dmabuf_info->rss -= dmabuf->size; atomic64_dec(&dmabuf->num_unique_refs); } err: - spin_unlock(&dmabuf_info->lock); + spin_unlock(&task->dmabuf_info->lock); } static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct *vma) diff --git a/fs/proc/base.c b/fs/proc/base.c index 0a3f28f7f1d9..2eee67e06ffe 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -3309,27 +3309,21 @@ static int proc_stack_depth(struct seq_file *m, struct pid_namespace *ns, static int proc_dmabuf_rss_show(struct seq_file *m, struct pid_namespace *ns, struct pid *pid, struct task_struct *task) { - struct task_dma_buf_info *dmabuf_info = get_task_dma_buf_info(task); + if (!task->dmabuf_info) { + pr_err("%s dmabuf accounting record was not allocated\n", __func__); + return -ENOMEM; + } - if (!dmabuf_info) { + if (!(task->flags & PF_KTHREAD)) + seq_printf(m, "%lld\n", READ_ONCE(task->dmabuf_info->rss)); + else seq_puts(m, "0\n"); - return 0; - } - - if (IS_ERR(dmabuf_info)) { - pr_err("%s dmabuf accounting record is missing, error %ld\n", - __func__, PTR_ERR(dmabuf_info)); - return PTR_ERR(dmabuf_info); - } - - seq_printf(m, "%lld\n", READ_ONCE(dmabuf_info->rss)); return 0; } static int proc_dmabuf_rss_hwm_show(struct seq_file *m, void *v) { - struct task_dma_buf_info *dmabuf_info; struct inode *inode = m->private; struct task_struct *task; int ret = 0; @@ -3338,20 +3332,16 @@ static int proc_dmabuf_rss_hwm_show(struct seq_file *m, void *v) if (!task) return -ESRCH; - dmabuf_info = get_task_dma_buf_info(task); - if (!dmabuf_info) { + if (!task->dmabuf_info) { + pr_err("%s dmabuf accounting record was not allocated\n", __func__); + ret = -ENOMEM; + goto out; + } + + if (!(task->flags & PF_KTHREAD)) + seq_printf(m, "%lld\n", READ_ONCE(task->dmabuf_info->rss_hwm)); + else seq_puts(m, "0\n"); - goto out; - } - - if (IS_ERR(dmabuf_info)) { - pr_err("%s dmabuf accounting record is missing, error %ld\n", - __func__, PTR_ERR(dmabuf_info)); - ret = PTR_ERR(dmabuf_info); - goto out; - } - - seq_printf(m, "%lld\n", READ_ONCE(dmabuf_info->rss_hwm)); out: put_task_struct(task); @@ -3368,7 +3358,6 @@ static ssize_t proc_dmabuf_rss_hwm_write(struct file *file, const char __user *buf, size_t count, loff_t *offset) { - struct task_dma_buf_info *dmabuf_info; struct inode *inode = file_inode(file); struct task_struct *task; unsigned long long val; @@ -3385,22 +3374,15 @@ proc_dmabuf_rss_hwm_write(struct file *file, const char __user *buf, if (!task) return -ESRCH; - dmabuf_info = get_task_dma_buf_info(task); - if (!dmabuf_info) { - ret = -EINVAL; + if (!task->dmabuf_info) { + pr_err("%s dmabuf accounting record was not allocated\n", __func__); + ret = -ENOMEM; goto out; } - if (IS_ERR(dmabuf_info)) { - pr_err("%s dmabuf accounting record is missing, error %ld\n", - __func__, PTR_ERR(dmabuf_info)); - ret = PTR_ERR(dmabuf_info); - goto out; - } - - spin_lock(&dmabuf_info->lock); - dmabuf_info->rss_hwm = dmabuf_info->rss; - spin_unlock(&dmabuf_info->lock); + spin_lock(&task->dmabuf_info->lock); + task->dmabuf_info->rss_hwm = task->dmabuf_info->rss; + spin_unlock(&task->dmabuf_info->lock); out: put_task_struct(task); @@ -3419,36 +3401,32 @@ static const struct file_operations proc_dmabuf_rss_hwm_operations = { static int proc_dmabuf_pss_show(struct seq_file *m, struct pid_namespace 
*ns, struct pid *pid, struct task_struct *task) { - struct task_dma_buf_info *dmabuf_info; struct task_dma_buf_record *rec; u64 pss = 0; - dmabuf_info = get_task_dma_buf_info(task); - if (!dmabuf_info) { - seq_puts(m, "0\n"); - return 0; + if (!task->dmabuf_info) { + pr_err("%s dmabuf accounting record was not allocated\n", __func__); + return -ENOMEM; } - if (IS_ERR(dmabuf_info)) { - pr_err("%s dmabuf accounting record is missing, error %ld\n", - __func__, PTR_ERR(dmabuf_info)); - return PTR_ERR(dmabuf_info); - } + if (!(task->flags & PF_KTHREAD)) { + spin_lock(&task->dmabuf_info->lock); + list_for_each_entry(rec, &task->dmabuf_info->dmabufs, node) { + s64 refs = atomic64_read(&rec->dmabuf->num_unique_refs); - spin_lock(&dmabuf_info->lock); - list_for_each_entry(rec, &dmabuf_info->dmabufs, node) { - s64 refs = atomic64_read(&rec->dmabuf->num_unique_refs); + if (refs <= 0) { + pr_err("dmabuf has <= refs %lld\n", refs); + continue; + } - if (refs <= 0) { - pr_err("dmabuf has <= refs %lld\n", refs); - continue; + pss += rec->dmabuf->size / (size_t)refs; } + spin_unlock(&task->dmabuf_info->lock); - pss += rec->dmabuf->size / (size_t)refs; + seq_printf(m, "%llu\n", pss); + } else { + seq_puts(m, "0\n"); } - spin_unlock(&dmabuf_info->lock); - - seq_printf(m, "%llu\n", pss); return 0; } diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index 654085da8bc4..267bf322272f 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -690,28 +690,6 @@ struct task_dma_buf_info { struct list_head dmabufs; }; -static inline bool task_has_dma_buf_info(struct task_struct *task) -{ - return (task->flags & (PF_KTHREAD | PF_IO_WORKER)) == 0; -} - -extern struct task_struct init_task; - -static inline -struct task_dma_buf_info *get_task_dma_buf_info(struct task_struct *task) -{ - if (!task) - return ERR_PTR(-EINVAL); - - if (!task_has_dma_buf_info(task)) - return NULL; - - if (!task->worker_private) - return ERR_PTR(-ENOMEM); - - return (struct task_dma_buf_info *)task->worker_private; -} - /** * DEFINE_DMA_BUF_EXPORT_INFO - helper macro for exporters * @name: export-info name diff --git a/include/linux/sched.h b/include/linux/sched.h index 3cff2446536d..68ba96bde447 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1003,7 +1003,6 @@ struct task_struct { int __user *clear_child_tid; /* PF_KTHREAD | PF_IO_WORKER */ - /* Otherwise used as task_dma_buf_info pointer */ void *worker_private; u64 utime; @@ -1518,6 +1517,9 @@ struct task_struct { */ struct callback_head l1d_flush_kill; #endif + + struct task_dma_buf_info *dmabuf_info; + ANDROID_KABI_RESERVE(1); ANDROID_KABI_RESERVE(2); ANDROID_KABI_RESERVE(3); diff --git a/init/init_task.c b/init/init_task.c index 1903a2abde55..d80c007ab59b 100644 --- a/init/init_task.c +++ b/init/init_task.c @@ -214,7 +214,7 @@ struct task_struct init_task .android_vendor_data1 = {0, }, .android_oem_data1 = {0, }, #endif - .worker_private = NULL, + .dmabuf_info = NULL, }; EXPORT_SYMBOL(init_task); diff --git a/kernel/fork.c b/kernel/fork.c index 9c71a69e0d17..e1d7d244d43a 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -997,27 +997,21 @@ static inline void put_signal_struct(struct signal_struct *sig) static void put_dmabuf_info(struct task_struct *tsk) { - struct task_dma_buf_info *dmabuf_info = get_task_dma_buf_info(tsk); - - if (!dmabuf_info) - return; - - if (IS_ERR(dmabuf_info)) { - pr_err("%s dmabuf accounting record is missing, error %ld\n", - __func__, PTR_ERR(dmabuf_info)); + if (!tsk->dmabuf_info) { + pr_err("%s dmabuf accounting 
record was not allocated\n", __func__); return; } - if (!refcount_dec_and_test(&dmabuf_info->refcnt)) + if (!refcount_dec_and_test(&tsk->dmabuf_info->refcnt)) return; - if (READ_ONCE(dmabuf_info->rss)) + if (READ_ONCE(tsk->dmabuf_info->rss)) pr_err("%s destroying task with non-zero dmabuf rss\n", __func__); - if (!list_empty(&dmabuf_info->dmabufs)) + if (!list_empty(&tsk->dmabuf_info->dmabufs)) pr_err("%s destroying task with non-empty dmabuf list\n", __func__); - kfree(dmabuf_info); + kfree(tsk->dmabuf_info); } void __put_task_struct(struct task_struct *tsk) @@ -2297,66 +2291,55 @@ static void rv_task_fork(struct task_struct *p) static int copy_dmabuf_info(u64 clone_flags, struct task_struct *p) { - struct task_dma_buf_info *new_dmabuf_info; - struct task_dma_buf_info *dmabuf_info; struct task_dma_buf_record *rec, *copy; - if (!task_has_dma_buf_info(p)) - return 0; /* Task is not supposed to have dmabuf_info */ - - dmabuf_info = get_task_dma_buf_info(current); - /* Original might not have dmabuf_info and that's fine */ - if (IS_ERR(dmabuf_info)) - dmabuf_info = NULL; - - if (dmabuf_info && (clone_flags & (CLONE_VM | CLONE_FILES)) + if (current->dmabuf_info && (clone_flags & (CLONE_VM | CLONE_FILES)) == (CLONE_VM | CLONE_FILES)) { /* * Both MM and FD references to dmabufs are shared with the parent, so * we can share a RSS counter with the parent. */ - refcount_inc(&dmabuf_info->refcnt); - p->worker_private = dmabuf_info; + refcount_inc(¤t->dmabuf_info->refcnt); + p->dmabuf_info = current->dmabuf_info; return 0; } - new_dmabuf_info = kmalloc(sizeof(*new_dmabuf_info), GFP_KERNEL); - if (!new_dmabuf_info) + p->dmabuf_info = kmalloc(sizeof(*p->dmabuf_info), GFP_KERNEL); + if (!p->dmabuf_info) return -ENOMEM; - refcount_set(&new_dmabuf_info->refcnt, 1); - spin_lock_init(&new_dmabuf_info->lock); - INIT_LIST_HEAD(&new_dmabuf_info->dmabufs); - if (dmabuf_info) { - spin_lock(&dmabuf_info->lock); - new_dmabuf_info->rss = dmabuf_info->rss; - new_dmabuf_info->rss_hwm = dmabuf_info->rss; - list_for_each_entry(rec, &dmabuf_info->dmabufs, node) { + refcount_set(&p->dmabuf_info->refcnt, 1); + spin_lock_init(&p->dmabuf_info->lock); + INIT_LIST_HEAD(&p->dmabuf_info->dmabufs); + if (current->dmabuf_info) { + spin_lock(¤t->dmabuf_info->lock); + p->dmabuf_info->rss = current->dmabuf_info->rss; + p->dmabuf_info->rss_hwm = current->dmabuf_info->rss; + list_for_each_entry(rec, ¤t->dmabuf_info->dmabufs, node) { copy = kmalloc(sizeof(*copy), GFP_KERNEL); if (!copy) { - spin_unlock(&dmabuf_info->lock); + spin_unlock(¤t->dmabuf_info->lock); goto err_list_copy; } copy->dmabuf = rec->dmabuf; copy->refcnt = rec->refcnt; - list_add(©->node, &new_dmabuf_info->dmabufs); + list_add(©->node, &p->dmabuf_info->dmabufs); } - spin_unlock(&dmabuf_info->lock); + spin_unlock(¤t->dmabuf_info->lock); } else { - new_dmabuf_info->rss = 0; - new_dmabuf_info->rss_hwm = 0; + p->dmabuf_info->rss = 0; + p->dmabuf_info->rss_hwm = 0; } - p->worker_private = new_dmabuf_info; return 0; err_list_copy: - list_for_each_entry_safe(rec, copy, &new_dmabuf_info->dmabufs, node) { + list_for_each_entry_safe(rec, copy, &p->dmabuf_info->dmabufs, node) { list_del(&rec->node); kfree(rec); } - kfree(new_dmabuf_info); + kfree(p->dmabuf_info); return -ENOMEM; } From 30cf816a506fa06e86ca1af81ea4ec6bee1d95c6 Mon Sep 17 00:00:00 2001 From: Suren Baghdasaryan Date: Wed, 9 Jul 2025 16:52:30 -0700 Subject: [PATCH 40/49] Revert "ANDROID: Track per-process dmabuf PSS" Revert submission 3680024 Reason for revert: replacing with a fixed version Reverted changes: 
/q/submissionid:3680024 Bug: 430499939 Change-Id: Iabc974ee2bd75a88e8e3b4728dc0f1a58ecfe75c Signed-off-by: Suren Baghdasaryan --- drivers/dma-buf/dma-buf.c | 8 -------- fs/proc/base.c | 34 ---------------------------------- include/linux/dma-buf.h | 8 -------- 3 files changed, 50 deletions(-) diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index cb91dadeb465..7c9ac163d115 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -115,9 +115,6 @@ static void dma_buf_release(struct dentry *dentry) if (dmabuf->resv == (struct dma_resv *)&dmabuf[1]) dma_resv_fini(dmabuf->resv); - if (atomic64_read(&dmabuf->num_unique_refs)) - pr_err("destroying dmabuf with non-zero task refs\n"); - WARN_ON(!list_empty(&dmabuf->attachments)); module_put(dmabuf->owner); kfree(dmabuf->name); @@ -202,8 +199,6 @@ static int new_task_dmabuf_record(struct task_struct *task, struct dma_buf *dmab rec->refcnt = 1; list_add(&rec->node, &task->dmabuf_info->dmabufs); - atomic64_inc(&dmabuf->num_unique_refs); - return 0; } @@ -281,7 +276,6 @@ void dma_buf_unaccount_task(struct dma_buf *dmabuf, struct task_struct *task) list_del(&rec->node); kfree(rec); task->dmabuf_info->rss -= dmabuf->size; - atomic64_dec(&dmabuf->num_unique_refs); } err: spin_unlock(&task->dmabuf_info->lock); @@ -857,8 +851,6 @@ struct dma_buf *dma_buf_export(const struct dma_buf_export_info *exp_info) dmabuf->resv = resv; } - atomic64_set(&dmabuf->num_unique_refs, 0); - file->private_data = dmabuf; file->f_path.dentry->d_fsdata = dmabuf; dmabuf->file = file; diff --git a/fs/proc/base.c b/fs/proc/base.c index 2eee67e06ffe..6b91ddcab7e2 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -3397,39 +3397,6 @@ static const struct file_operations proc_dmabuf_rss_hwm_operations = { .llseek = seq_lseek, .release = single_release, }; - -static int proc_dmabuf_pss_show(struct seq_file *m, struct pid_namespace *ns, - struct pid *pid, struct task_struct *task) -{ - struct task_dma_buf_record *rec; - u64 pss = 0; - - if (!task->dmabuf_info) { - pr_err("%s dmabuf accounting record was not allocated\n", __func__); - return -ENOMEM; - } - - if (!(task->flags & PF_KTHREAD)) { - spin_lock(&task->dmabuf_info->lock); - list_for_each_entry(rec, &task->dmabuf_info->dmabufs, node) { - s64 refs = atomic64_read(&rec->dmabuf->num_unique_refs); - - if (refs <= 0) { - pr_err("dmabuf has <= refs %lld\n", refs); - continue; - } - - pss += rec->dmabuf->size / (size_t)refs; - } - spin_unlock(&task->dmabuf_info->lock); - - seq_printf(m, "%llu\n", pss); - } else { - seq_puts(m, "0\n"); - } - - return 0; -} #endif /* @@ -3558,7 +3525,6 @@ static const struct pid_entry tgid_base_stuff[] = { #ifdef CONFIG_DMA_SHARED_BUFFER ONE("dmabuf_rss", S_IRUGO, proc_dmabuf_rss_show), REG("dmabuf_rss_hwm", S_IRUGO|S_IWUSR, proc_dmabuf_rss_hwm_operations), - ONE("dmabuf_pss", S_IRUGO, proc_dmabuf_pss_show), #endif }; diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index 267bf322272f..a362c8ba7a21 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -25,7 +25,6 @@ #include #include #ifndef __GENKSYMS__ -#include #include #endif @@ -535,13 +534,6 @@ struct dma_buf { } *sysfs_entry; #endif - /** - * @num_unique_refs: - * - * The number of tasks that reference this buffer. For calculating PSS. 
- */ - atomic64_t num_unique_refs; - ANDROID_KABI_RESERVE(1); ANDROID_KABI_RESERVE(2); }; From b1eeaed7fb5fbacdc5009d717c3c4a9952ae08a8 Mon Sep 17 00:00:00 2001 From: Suren Baghdasaryan Date: Wed, 9 Jul 2025 16:52:30 -0700 Subject: [PATCH 41/49] Revert "ANDROID: Track per-process dmabuf RSS HWM" Revert submission 3680024 Reason for revert: replacing with a fixed version Reverted changes: /q/submissionid:3680024 Bug: 430499939 Change-Id: I57d3532def7a03d4e785ec64ade727a43503a2fb Signed-off-by: Suren Baghdasaryan --- drivers/dma-buf/dma-buf.c | 8 ---- fs/proc/base.c | 77 --------------------------------------- include/linux/dma-buf.h | 7 +--- kernel/fork.c | 2 - 4 files changed, 2 insertions(+), 92 deletions(-) diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index 7c9ac163d115..c8c05d2e112a 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -187,14 +187,6 @@ static int new_task_dmabuf_record(struct task_struct *task, struct dma_buf *dmab return -ENOMEM; task->dmabuf_info->rss += dmabuf->size; - /* - * task->dmabuf_info->lock protects against concurrent writers, so no - * worries about stale rss_hwm between the read and write, and we don't - * need to cmpxchg here. - */ - if (task->dmabuf_info->rss > task->dmabuf_info->rss_hwm) - task->dmabuf_info->rss_hwm = task->dmabuf_info->rss; - rec->dmabuf = dmabuf; rec->refcnt = 1; list_add(&rec->node, &task->dmabuf_info->dmabufs); diff --git a/fs/proc/base.c b/fs/proc/base.c index 6b91ddcab7e2..f7d8188b0ccf 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -3321,82 +3321,6 @@ static int proc_dmabuf_rss_show(struct seq_file *m, struct pid_namespace *ns, return 0; } - -static int proc_dmabuf_rss_hwm_show(struct seq_file *m, void *v) -{ - struct inode *inode = m->private; - struct task_struct *task; - int ret = 0; - - task = get_proc_task(inode); - if (!task) - return -ESRCH; - - if (!task->dmabuf_info) { - pr_err("%s dmabuf accounting record was not allocated\n", __func__); - ret = -ENOMEM; - goto out; - } - - if (!(task->flags & PF_KTHREAD)) - seq_printf(m, "%lld\n", READ_ONCE(task->dmabuf_info->rss_hwm)); - else - seq_puts(m, "0\n"); - -out: - put_task_struct(task); - - return ret; -} - -static int proc_dmabuf_rss_hwm_open(struct inode *inode, struct file *filp) -{ - return single_open(filp, proc_dmabuf_rss_hwm_show, inode); -} - -static ssize_t -proc_dmabuf_rss_hwm_write(struct file *file, const char __user *buf, - size_t count, loff_t *offset) -{ - struct inode *inode = file_inode(file); - struct task_struct *task; - unsigned long long val; - int ret; - - ret = kstrtoull_from_user(buf, count, 10, &val); - if (ret) - return ret; - - if (val != 0) - return -EINVAL; - - task = get_proc_task(inode); - if (!task) - return -ESRCH; - - if (!task->dmabuf_info) { - pr_err("%s dmabuf accounting record was not allocated\n", __func__); - ret = -ENOMEM; - goto out; - } - - spin_lock(&task->dmabuf_info->lock); - task->dmabuf_info->rss_hwm = task->dmabuf_info->rss; - spin_unlock(&task->dmabuf_info->lock); - -out: - put_task_struct(task); - - return ret < 0 ? 
ret : count; -} - -static const struct file_operations proc_dmabuf_rss_hwm_operations = { - .open = proc_dmabuf_rss_hwm_open, - .write = proc_dmabuf_rss_hwm_write, - .read = seq_read, - .llseek = seq_lseek, - .release = single_release, -}; #endif /* @@ -3524,7 +3448,6 @@ static const struct pid_entry tgid_base_stuff[] = { #endif #ifdef CONFIG_DMA_SHARED_BUFFER ONE("dmabuf_rss", S_IRUGO, proc_dmabuf_rss_show), - REG("dmabuf_rss_hwm", S_IRUGO|S_IWUSR, proc_dmabuf_rss_hwm_operations), #endif }; diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index a362c8ba7a21..1647fb38fe80 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -659,8 +659,8 @@ struct task_dma_buf_record { }; /** - * struct task_dma_buf_info - Holds RSS and RSS HWM counters, and a list of - * dmabufs for all tasks that share both mm_struct and files_struct. + * struct task_dma_buf_info - Holds a RSS counter, and a list of dmabufs for all + * tasks that share both mm_struct and files_struct. * * @rss: The sum of all dmabuf memory referenced by the tasks via memory * mappings or file descriptors in bytes. Buffers referenced more than @@ -668,15 +668,12 @@ struct task_dma_buf_record { * of both mmaps and FDs) only cause the buffer to be accounted to the * process once. Partial mappings cause the full size of the buffer to be * accounted, regardless of the size of the mapping. - * @rss_hwm: The maximum value of @rss over the lifetime of this struct. (Unless, - * reset by userspace.) * @refcnt: The number of tasks sharing this struct. * @lock: Lock protecting writes for @rss, and reads/writes for @dmabufs. * @dmabufs: List of all dmabufs referenced by the tasks. */ struct task_dma_buf_info { s64 rss; - s64 rss_hwm; refcount_t refcnt; spinlock_t lock; struct list_head dmabufs; diff --git a/kernel/fork.c b/kernel/fork.c index e1d7d244d43a..66636a979911 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -2314,7 +2314,6 @@ static int copy_dmabuf_info(u64 clone_flags, struct task_struct *p) if (current->dmabuf_info) { spin_lock(¤t->dmabuf_info->lock); p->dmabuf_info->rss = current->dmabuf_info->rss; - p->dmabuf_info->rss_hwm = current->dmabuf_info->rss; list_for_each_entry(rec, ¤t->dmabuf_info->dmabufs, node) { copy = kmalloc(sizeof(*copy), GFP_KERNEL); if (!copy) { @@ -2329,7 +2328,6 @@ static int copy_dmabuf_info(u64 clone_flags, struct task_struct *p) spin_unlock(¤t->dmabuf_info->lock); } else { p->dmabuf_info->rss = 0; - p->dmabuf_info->rss_hwm = 0; } return 0; From 9e89b97c13b4369ff627a1640c9714ec6788d1ec Mon Sep 17 00:00:00 2001 From: Suren Baghdasaryan Date: Wed, 9 Jul 2025 16:52:30 -0700 Subject: [PATCH 42/49] Revert "ANDROID: Track per-process dmabuf RSS" Revert submission 3680024 Reason for revert: replacing with a fixed version Reverted changes: /q/submissionid:3680024 Bug: 430499939 Change-Id: I93de8460bcabdc2b1b2a0e12069d89dcad3a870d Signed-off-by: Suren Baghdasaryan --- drivers/dma-buf/dma-buf.c | 141 +------------------------------------- fs/file.c | 4 -- fs/proc/base.c | 22 ------ include/linux/dma-buf.h | 43 ------------ include/linux/sched.h | 4 -- init/init_task.c | 1 - kernel/fork.c | 83 +--------------------- mm/mmap.c | 14 +--- 8 files changed, 6 insertions(+), 306 deletions(-) diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index c8c05d2e112a..0b02ced1eb33 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -162,121 +162,9 @@ static struct file_system_type dma_buf_fs_type = { .kill_sb = kill_anon_super, }; -static struct task_dma_buf_record 
*find_task_dmabuf_record( - struct task_struct *task, struct dma_buf *dmabuf) -{ - struct task_dma_buf_record *rec; - - lockdep_assert_held(&task->dmabuf_info->lock); - - list_for_each_entry(rec, &task->dmabuf_info->dmabufs, node) - if (dmabuf == rec->dmabuf) - return rec; - - return NULL; -} - -static int new_task_dmabuf_record(struct task_struct *task, struct dma_buf *dmabuf) -{ - struct task_dma_buf_record *rec; - - lockdep_assert_held(&task->dmabuf_info->lock); - - rec = kmalloc(sizeof(*rec), GFP_KERNEL); - if (!rec) - return -ENOMEM; - - task->dmabuf_info->rss += dmabuf->size; - rec->dmabuf = dmabuf; - rec->refcnt = 1; - list_add(&rec->node, &task->dmabuf_info->dmabufs); - - return 0; -} - -/** - * dma_buf_account_task - Account a dmabuf to a task - * @dmabuf: [in] pointer to dma_buf - * @task: [in] pointer to task_struct - * - * When a process obtains a dmabuf file descriptor, or maps a dmabuf, this - * function attributes the provided @dmabuf to the @task. The first time @dmabuf - * is attributed to @task, the buffer's size is added to the @task's dmabuf RSS. - * - * Return: - * * 0 on success - * * A negative error code upon error - */ -int dma_buf_account_task(struct dma_buf *dmabuf, struct task_struct *task) -{ - struct task_dma_buf_record *rec; - int ret = 0; - - if (!dmabuf || !task) - return -EINVAL; - - if (!task->dmabuf_info) { - pr_err("%s dmabuf accounting record was not allocated\n", __func__); - return -ENOMEM; - } - - spin_lock(&task->dmabuf_info->lock); - rec = find_task_dmabuf_record(task, dmabuf); - if (!rec) - ret = new_task_dmabuf_record(task, dmabuf); - else - ++rec->refcnt; - spin_unlock(&task->dmabuf_info->lock); - - return ret; -} - -/** - * dma_buf_unaccount_task - Unaccount a dmabuf from a task - * @dmabuf: [in] pointer to dma_buf - * @task: [in] pointer to task_struct - * - * When a process closes a dmabuf file descriptor, or unmaps a dmabuf, this - * function removes the provided @dmabuf attribution from the @task. When all - * references to @dmabuf are removed from @task, the buffer's size is removed - * from the task's dmabuf RSS. - * - * Return: - * * 0 on success - * * A negative error code upon error - */ -void dma_buf_unaccount_task(struct dma_buf *dmabuf, struct task_struct *task) -{ - struct task_dma_buf_record *rec; - - if (!dmabuf || !task) - return; - - if (!task->dmabuf_info) { - pr_err("%s dmabuf accounting record was not allocated\n", __func__); - return; - } - - spin_lock(&task->dmabuf_info->lock); - rec = find_task_dmabuf_record(task, dmabuf); - if (!rec) { /* Failed fd_install? 
*/ - pr_err("dmabuf not found in task list\n"); - goto err; - } - - if (--rec->refcnt == 0) { - list_del(&rec->node); - kfree(rec); - task->dmabuf_info->rss -= dmabuf->size; - } -err: - spin_unlock(&task->dmabuf_info->lock); -} - static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct *vma) { struct dma_buf *dmabuf; - int ret; if (!is_dma_buf_file(file)) return -EINVAL; @@ -292,15 +180,7 @@ static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct *vma) dmabuf->size >> PAGE_SHIFT) return -EINVAL; - ret = dma_buf_account_task(dmabuf, current); - if (ret) - return ret; - - ret = dmabuf->ops->mmap(dmabuf, vma); - if (ret) - dma_buf_unaccount_task(dmabuf, current); - - return ret; + return dmabuf->ops->mmap(dmabuf, vma); } static loff_t dma_buf_llseek(struct file *file, loff_t offset, int whence) @@ -677,12 +557,6 @@ static void dma_buf_show_fdinfo(struct seq_file *m, struct file *file) spin_unlock(&dmabuf->name_lock); } -static int dma_buf_flush(struct file *file, fl_owner_t id) -{ - dma_buf_unaccount_task(file->private_data, current); - return 0; -} - static const struct file_operations dma_buf_fops = { .release = dma_buf_file_release, .mmap = dma_buf_mmap_internal, @@ -691,7 +565,6 @@ static const struct file_operations dma_buf_fops = { .unlocked_ioctl = dma_buf_ioctl, .compat_ioctl = compat_ptr_ioctl, .show_fdinfo = dma_buf_show_fdinfo, - .flush = dma_buf_flush, }; /* @@ -1682,8 +1555,6 @@ EXPORT_SYMBOL_GPL(dma_buf_end_cpu_access_partial); int dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma, unsigned long pgoff) { - int ret; - if (WARN_ON(!dmabuf || !vma)) return -EINVAL; @@ -1704,15 +1575,7 @@ int dma_buf_mmap(struct dma_buf *dmabuf, struct vm_area_struct *vma, vma_set_file(vma, dmabuf->file); vma->vm_pgoff = pgoff; - ret = dma_buf_account_task(dmabuf, current); - if (ret) - return ret; - - ret = dmabuf->ops->mmap(dmabuf, vma); - if (ret) - dma_buf_unaccount_task(dmabuf, current); - - return ret; + return dmabuf->ops->mmap(dmabuf, vma); } EXPORT_SYMBOL_NS_GPL(dma_buf_mmap, DMA_BUF); diff --git a/fs/file.c b/fs/file.c index e924929ac366..1f1181b189bf 100644 --- a/fs/file.c +++ b/fs/file.c @@ -20,7 +20,6 @@ #include #include #include -#include #include #include "internal.h" @@ -594,9 +593,6 @@ void fd_install(unsigned int fd, struct file *file) struct files_struct *files = current->files; struct fdtable *fdt; - if (is_dma_buf_file(file) && dma_buf_account_task(file->private_data, current)) - pr_err("FD dmabuf accounting failed\n"); - rcu_read_lock_sched(); if (unlikely(files->resize_in_progress)) { diff --git a/fs/proc/base.c b/fs/proc/base.c index f7d8188b0ccf..7cff02bc816e 100644 --- a/fs/proc/base.c +++ b/fs/proc/base.c @@ -100,7 +100,6 @@ #include #include #include -#include #include #include #include "internal.h" @@ -3305,24 +3304,6 @@ static int proc_stack_depth(struct seq_file *m, struct pid_namespace *ns, } #endif /* CONFIG_STACKLEAK_METRICS */ -#ifdef CONFIG_DMA_SHARED_BUFFER -static int proc_dmabuf_rss_show(struct seq_file *m, struct pid_namespace *ns, - struct pid *pid, struct task_struct *task) -{ - if (!task->dmabuf_info) { - pr_err("%s dmabuf accounting record was not allocated\n", __func__); - return -ENOMEM; - } - - if (!(task->flags & PF_KTHREAD)) - seq_printf(m, "%lld\n", READ_ONCE(task->dmabuf_info->rss)); - else - seq_puts(m, "0\n"); - - return 0; -} -#endif - /* * Thread groups */ @@ -3446,9 +3427,6 @@ static const struct pid_entry tgid_base_stuff[] = { ONE("ksm_merging_pages", S_IRUSR, proc_pid_ksm_merging_pages), 
ONE("ksm_stat", S_IRUSR, proc_pid_ksm_stat), #endif -#ifdef CONFIG_DMA_SHARED_BUFFER - ONE("dmabuf_rss", S_IRUGO, proc_dmabuf_rss_show), -#endif }; static int proc_tgid_base_readdir(struct file *file, struct dir_context *ctx) diff --git a/include/linux/dma-buf.h b/include/linux/dma-buf.h index 1647fb38fe80..64d67293d76b 100644 --- a/include/linux/dma-buf.h +++ b/include/linux/dma-buf.h @@ -24,9 +24,6 @@ #include #include #include -#ifndef __GENKSYMS__ -#include -#endif struct device; struct dma_buf; @@ -642,43 +639,6 @@ struct dma_buf_export_info { ANDROID_KABI_RESERVE(2); }; -/** - * struct task_dma_buf_record - Holds the number of (VMA and FD) references to a - * dmabuf by a collection of tasks that share both mm_struct and files_struct. - * This is the list entry type for @task_dma_buf_info dmabufs list. - * - * @node: Stores the list this record is on. - * @dmabuf: The dmabuf this record is for. - * @refcnt: The number of VMAs and FDs that reference @dmabuf by the tasks that - * share this record. - */ -struct task_dma_buf_record { - struct list_head node; - struct dma_buf *dmabuf; - unsigned long refcnt; -}; - -/** - * struct task_dma_buf_info - Holds a RSS counter, and a list of dmabufs for all - * tasks that share both mm_struct and files_struct. - * - * @rss: The sum of all dmabuf memory referenced by the tasks via memory - * mappings or file descriptors in bytes. Buffers referenced more than - * once by the process (multiple mmaps, multiple FDs, or any combination - * of both mmaps and FDs) only cause the buffer to be accounted to the - * process once. Partial mappings cause the full size of the buffer to be - * accounted, regardless of the size of the mapping. - * @refcnt: The number of tasks sharing this struct. - * @lock: Lock protecting writes for @rss, and reads/writes for @dmabufs. - * @dmabufs: List of all dmabufs referenced by the tasks. 
- */ -struct task_dma_buf_info { - s64 rss; - refcount_t refcnt; - spinlock_t lock; - struct list_head dmabufs; -}; - /** * DEFINE_DMA_BUF_EXPORT_INFO - helper macro for exporters * @name: export-info name @@ -781,7 +741,4 @@ int dma_buf_vmap_unlocked(struct dma_buf *dmabuf, struct iosys_map *map); void dma_buf_vunmap_unlocked(struct dma_buf *dmabuf, struct iosys_map *map); long dma_buf_set_name(struct dma_buf *dmabuf, const char *name); int dma_buf_get_flags(struct dma_buf *dmabuf, unsigned long *flags); - -int dma_buf_account_task(struct dma_buf *dmabuf, struct task_struct *task); -void dma_buf_unaccount_task(struct dma_buf *dmabuf, struct task_struct *task); #endif /* __DMA_BUF_H__ */ diff --git a/include/linux/sched.h b/include/linux/sched.h index 68ba96bde447..1299b4497d87 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -70,7 +70,6 @@ struct seq_file; struct sighand_struct; struct signal_struct; struct task_delay_info; -struct task_dma_buf_info; struct task_group; struct user_event_mm; @@ -1517,9 +1516,6 @@ struct task_struct { */ struct callback_head l1d_flush_kill; #endif - - struct task_dma_buf_info *dmabuf_info; - ANDROID_KABI_RESERVE(1); ANDROID_KABI_RESERVE(2); ANDROID_KABI_RESERVE(3); diff --git a/init/init_task.c b/init/init_task.c index d80c007ab59b..31ceb0e469f7 100644 --- a/init/init_task.c +++ b/init/init_task.c @@ -214,7 +214,6 @@ struct task_struct init_task .android_vendor_data1 = {0, }, .android_oem_data1 = {0, }, #endif - .dmabuf_info = NULL, }; EXPORT_SYMBOL(init_task); diff --git a/kernel/fork.c b/kernel/fork.c index 66636a979911..75b1a4458a7e 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -101,7 +101,6 @@ #include #include #include -#include #include #include @@ -995,32 +994,12 @@ static inline void put_signal_struct(struct signal_struct *sig) free_signal_struct(sig); } -static void put_dmabuf_info(struct task_struct *tsk) -{ - if (!tsk->dmabuf_info) { - pr_err("%s dmabuf accounting record was not allocated\n", __func__); - return; - } - - if (!refcount_dec_and_test(&tsk->dmabuf_info->refcnt)) - return; - - if (READ_ONCE(tsk->dmabuf_info->rss)) - pr_err("%s destroying task with non-zero dmabuf rss\n", __func__); - - if (!list_empty(&tsk->dmabuf_info->dmabufs)) - pr_err("%s destroying task with non-empty dmabuf list\n", __func__); - - kfree(tsk->dmabuf_info); -} - void __put_task_struct(struct task_struct *tsk) { WARN_ON(!tsk->exit_state); WARN_ON(refcount_read(&tsk->usage)); WARN_ON(tsk == current); - put_dmabuf_info(tsk); io_uring_free(tsk); cgroup_free(tsk); task_numa_free(tsk, true); @@ -2289,58 +2268,6 @@ static void rv_task_fork(struct task_struct *p) #define rv_task_fork(p) do {} while (0) #endif -static int copy_dmabuf_info(u64 clone_flags, struct task_struct *p) -{ - struct task_dma_buf_record *rec, *copy; - - if (current->dmabuf_info && (clone_flags & (CLONE_VM | CLONE_FILES)) - == (CLONE_VM | CLONE_FILES)) { - /* - * Both MM and FD references to dmabufs are shared with the parent, so - * we can share a RSS counter with the parent. 
- */ - refcount_inc(¤t->dmabuf_info->refcnt); - p->dmabuf_info = current->dmabuf_info; - return 0; - } - - p->dmabuf_info = kmalloc(sizeof(*p->dmabuf_info), GFP_KERNEL); - if (!p->dmabuf_info) - return -ENOMEM; - - refcount_set(&p->dmabuf_info->refcnt, 1); - spin_lock_init(&p->dmabuf_info->lock); - INIT_LIST_HEAD(&p->dmabuf_info->dmabufs); - if (current->dmabuf_info) { - spin_lock(¤t->dmabuf_info->lock); - p->dmabuf_info->rss = current->dmabuf_info->rss; - list_for_each_entry(rec, ¤t->dmabuf_info->dmabufs, node) { - copy = kmalloc(sizeof(*copy), GFP_KERNEL); - if (!copy) { - spin_unlock(¤t->dmabuf_info->lock); - goto err_list_copy; - } - - copy->dmabuf = rec->dmabuf; - copy->refcnt = rec->refcnt; - list_add(©->node, &p->dmabuf_info->dmabufs); - } - spin_unlock(¤t->dmabuf_info->lock); - } else { - p->dmabuf_info->rss = 0; - } - - return 0; - -err_list_copy: - list_for_each_entry_safe(rec, copy, &p->dmabuf_info->dmabufs, node) { - list_del(&rec->node); - kfree(rec); - } - kfree(p->dmabuf_info); - return -ENOMEM; -} - /* * This creates a new process as a copy of the old one, * but does not actually start it yet. @@ -2582,18 +2509,14 @@ __latent_entropy struct task_struct *copy_process( p->bpf_ctx = NULL; #endif - retval = copy_dmabuf_info(clone_flags, p); - if (retval) - goto bad_fork_cleanup_policy; - /* Perform scheduler related setup. Assign this task to a CPU. */ retval = sched_fork(clone_flags, p); if (retval) - goto bad_fork_cleanup_dmabuf; + goto bad_fork_cleanup_policy; retval = perf_event_init_task(p, clone_flags); if (retval) - goto bad_fork_cleanup_dmabuf; + goto bad_fork_cleanup_policy; retval = audit_alloc(p); if (retval) goto bad_fork_cleanup_perf; @@ -2896,8 +2819,6 @@ bad_fork_cleanup_audit: audit_free(p); bad_fork_cleanup_perf: perf_event_free_task(p); -bad_fork_cleanup_dmabuf: - put_dmabuf_info(p); bad_fork_cleanup_policy: lockdep_free_task(p); #ifdef CONFIG_NUMA diff --git a/mm/mmap.c b/mm/mmap.c index 6da684ab9f98..4c74fb3d7a94 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -49,7 +49,6 @@ #include #include #include -#include #include #include @@ -145,11 +144,8 @@ static void remove_vma(struct vm_area_struct *vma, bool unreachable) { might_sleep(); vma_close(vma); - if (vma->vm_file) { - if (is_dma_buf_file(vma->vm_file)) - dma_buf_unaccount_task(vma->vm_file->private_data, current); + if (vma->vm_file) fput(vma->vm_file); - } mpol_put(vma_policy(vma)); if (unreachable) __vm_area_free(vma); @@ -2421,14 +2417,8 @@ int __split_vma(struct vma_iterator *vmi, struct vm_area_struct *vma, if (err) goto out_free_mpol; - if (new->vm_file) { + if (new->vm_file) get_file(new->vm_file); - if (is_dma_buf_file(new->vm_file)) { - /* Should never fail since this task already references the buffer */ - if (dma_buf_account_task(new->vm_file->private_data, current)) - pr_err("%s failed to account dmabuf\n", __func__); - } - } if (new->vm_ops && new->vm_ops->open) new->vm_ops->open(new); From f99b0f6dd206581de3941fce75a1e1d72bc92979 Mon Sep 17 00:00:00 2001 From: Snehal Koukuntla Date: Wed, 9 Jul 2025 11:35:14 +0000 Subject: [PATCH 43/49] ANDROID: KVM: arm64: Increase the pkvm reclaim buffer size Increase the internal reclaim buffer size of pkvm to accommodate Pixel use cases. 
Without this we are seeing a >50% failure rate. Bug: 426242992 Change-Id: I892cb1fe30fa97fea044187728d814dd832dd929 Signed-off-by: Snehal Koukuntla --- arch/arm64/include/asm/kvm_pkvm.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h index 4a16808c3ba8..80a1526684cb 100644 --- a/arch/arm64/include/asm/kvm_pkvm.h +++ b/arch/arm64/include/asm/kvm_pkvm.h @@ -593,7 +593,7 @@ static inline unsigned long host_s2_pgtable_pages(void) * Maximum number of consitutents allowed in a descriptor. This number is * arbitrary, see comment below on SG_MAX_SEGMENTS in hyp_ffa_proxy_pages(). */ -#define KVM_FFA_MAX_NR_CONSTITUENTS 4096 +#define KVM_FFA_MAX_NR_CONSTITUENTS 12288 static inline unsigned long hyp_ffa_proxy_pages(void) { From 789dd354a87c622bb04d7c7f0e861b7797a3c430 Mon Sep 17 00:00:00 2001 From: Mostafa Saleh Date: Wed, 2 Jul 2025 12:16:07 +0000 Subject: [PATCH 44/49] ANDROID: KVM: arm64: Don't update IOMMU under memory pressure host_stage2_unmap_unmoveable_regs() is called when the hypervisor pool is under pressure to map stage-2 entries, so it unmaps all entries that can't be donated and are still owned by the host, so they can be lazily faulted later. But that doesn't change the ownership of any pages, so they are still owned by the host and must remain mapped in the IOMMU. Bug: 428939924 Change-Id: Id91183619a316a67bda48d8e9adf9b6ef49c104f Signed-off-by: Mostafa Saleh --- arch/arm64/kvm/hyp/nvhe/mem_protect.c | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-) diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c b/arch/arm64/kvm/hyp/nvhe/mem_protect.c index bc1f8cb3faf3..afdd36e4ae8a 100644 --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c @@ -491,17 +491,9 @@ int __pkvm_prot_finalize(void) int host_stage2_unmap_reg_locked(phys_addr_t start, u64 size) { - int ret; - hyp_assert_lock_held(&host_mmu.lock); - ret = kvm_pgtable_stage2_reclaim_leaves(&host_mmu.pgt, start, size); - if (ret) - return ret; - - kvm_iommu_host_stage2_idmap(start, start + size, 0); - - return 0; + return kvm_pgtable_stage2_reclaim_leaves(&host_mmu.pgt, start, size); } static int host_stage2_unmap_unmoveable_regs(void) From 46e269016e966c5d4c38a3fc8ebe7c65d4609ba2 Mon Sep 17 00:00:00 2001 From: Zhengxu Zhang Date: Thu, 12 Jun 2025 09:14:21 +0800 Subject: [PATCH 45/49] FROMGIT: exfat: fdatasync flag should be same like generic_write_sync() Test: androbench with default settings, using a 64GB sdcard. Random write speed: 3.5MB/s without this patch, 7MB/s with this patch. After commit 11a347fb6cef, the random write speed decreased significantly. That commit modified the .write_iter() implementation; comparing it with generic_file_write_iter(), which calls generic_write_sync(), exfat_file_write_iter() calls vfs_fsync_range() with the wrong fdatasync flag, so the fdatasync mode is never used and the random write speed drops. So use generic_write_sync() instead of vfs_fsync_range().
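For reference, generic_write_sync() in include/linux/fs.h resolves the datasync argument roughly as in the sketch below (a paraphrase, not a verbatim copy of the header):

	static inline ssize_t generic_write_sync(struct kiocb *iocb, ssize_t count)
	{
		if (iocb_is_dsync(iocb)) {
			/*
			 * datasync = 0 (full fsync) only when IOCB_SYNC is set;
			 * an O_DSYNC-only write gets datasync = 1, i.e. the
			 * cheaper fdatasync behaviour.
			 */
			int ret = vfs_fsync_range(iocb->ki_filp,
					iocb->ki_pos - count, iocb->ki_pos - 1,
					(iocb->ki_flags & IOCB_SYNC) ? 0 : 1);

			if (ret)
				return ret;
		}

		return count;
	}

The old exfat call passed (iocb->ki_flags & IOCB_SYNC) directly as the datasync argument, which is inverted relative to the helper above, so O_DSYNC-only writes ended up doing a full fsync.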
Fixes: 11a347fb6cef ("exfat: change to get file size from DataLength") Bug: 427084532 (cherry picked from commit 309914e6602c9e17ff84b20db8c4f1da0d6a2a36 https://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat.git dev) Link: https://lore.kernel.org/all/20250619013331.664521-1-zhengxu.zhang@unisoc.com/ Change-Id: I68319a27cabedd9d4a7fa35948affd8c27d72160 Signed-off-by: Zhengxu Zhang Acked-by: Yuezhang Mo Signed-off-by: Namjae Jeon --- fs/exfat/file.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/fs/exfat/file.c b/fs/exfat/file.c index efd24e29f119..272208708ffc 100644 --- a/fs/exfat/file.c +++ b/fs/exfat/file.c @@ -610,9 +610,8 @@ static ssize_t exfat_file_write_iter(struct kiocb *iocb, struct iov_iter *iter) if (pos > valid_size) pos = valid_size; - if (iocb_is_dsync(iocb) && iocb->ki_pos > pos) { - ssize_t err = vfs_fsync_range(file, pos, iocb->ki_pos - 1, - iocb->ki_flags & IOCB_SYNC); + if (iocb->ki_pos > pos) { + ssize_t err = generic_write_sync(iocb, iocb->ki_pos - pos); if (err < 0) return err; } From 7f4572a697fa838816437adc3e9645201dc9c817 Mon Sep 17 00:00:00 2001 From: daiyang5 Date: Mon, 7 Jul 2025 10:04:20 +0800 Subject: [PATCH 46/49] ANDROID: export folio_deactivate() for GKI purpose. Export the symbol folio_deactivate() to access LRU list in ko module for customizing activate and deactivate operations. This is a necessary component of our memory reclaim strategy. Bug: 429908837 Change-Id: Ied760489b2c1726dbfe52629f6d544aa607e5106 Signed-off-by: daiyang5 --- mm/swap.c | 1 + 1 file changed, 1 insertion(+) diff --git a/mm/swap.c b/mm/swap.c index 174259a9a5f7..30b5eebce985 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -736,6 +736,7 @@ void folio_deactivate(struct folio *folio) local_unlock(&cpu_fbatches.lock); } } +EXPORT_SYMBOL_GPL(folio_deactivate); /** * folio_mark_lazyfree - make an anon folio lazyfree From 615449fbacaefd7198d5d1404c6a3ef1149aba28 Mon Sep 17 00:00:00 2001 From: daiyang5 Date: Mon, 7 Jul 2025 08:59:31 +0800 Subject: [PATCH 47/49] ANDROID: GKI: Update symbol list for xiaomi 2 function symbol(s) added 'void folio_deactivate(struct folio*)' 'void folio_mark_accessed(struct folio*)' Bug: 429908837 Change-Id: I575c450aa91ff4298f681203efd1debfb2c810c5 Signed-off-by: daiyang5 --- android/abi_gki_aarch64.stg | 20 ++++++++++++++++++++ android/abi_gki_aarch64_xiaomi | 4 ++++ 2 files changed, 24 insertions(+) diff --git a/android/abi_gki_aarch64.stg b/android/abi_gki_aarch64.stg index 43a9176d2de6..78f10f7d9bec 100644 --- a/android/abi_gki_aarch64.stg +++ b/android/abi_gki_aarch64.stg @@ -393188,6 +393188,15 @@ elf_symbol { type_id: 0xf6f86f1f full_name: "folio_clear_dirty_for_io" } +elf_symbol { + id: 0x1ac8aa52 + name: "folio_deactivate" + is_defined: true + symbol_type: FUNCTION + crc: 0x7abc9b3a + type_id: 0x18c46588 + full_name: "folio_deactivate" +} elf_symbol { id: 0xf83588d6 name: "folio_end_private_2" @@ -393215,6 +393224,15 @@ elf_symbol { type_id: 0x637004ab full_name: "folio_mapping" } +elf_symbol { + id: 0xd2e101fd + name: "folio_mark_accessed" + is_defined: true + symbol_type: FUNCTION + crc: 0x74311ee4 + type_id: 0x18c46588 + full_name: "folio_mark_accessed" +} elf_symbol { id: 0xcef0ca54 name: "folio_mark_dirty" @@ -440717,9 +440735,11 @@ interface { symbol_id: 0x3c7c2553 symbol_id: 0x06c58be7 symbol_id: 0xab55569c + symbol_id: 0x1ac8aa52 symbol_id: 0xf83588d6 symbol_id: 0xa1c5bd8d symbol_id: 0x159a69a3 + symbol_id: 0xd2e101fd symbol_id: 0xcef0ca54 symbol_id: 0x39840ab2 symbol_id: 0xc05a6c7d diff --git 
a/android/abi_gki_aarch64_xiaomi b/android/abi_gki_aarch64_xiaomi index a8531903d2a7..1cc056048b75 100644 --- a/android/abi_gki_aarch64_xiaomi +++ b/android/abi_gki_aarch64_xiaomi @@ -197,6 +197,10 @@ __tracepoint_android_rvh_dequeue_task_fair __tracepoint_android_rvh_entity_tick +# required by mi_damon.ko + folio_deactivate + folio_mark_accessed + #required by cpq.ko elv_rb_former_request elv_rb_latter_request From fe630a04152399fa0646fa16cabae8dee2901a20 Mon Sep 17 00:00:00 2001 From: Rui Chen Date: Tue, 1 Jul 2025 17:57:26 +0800 Subject: [PATCH 48/49] FROMGIT: f2fs: introduce reserved_pin_section sysfs entry This patch introduces /sys/fs/f2fs//reserved_pin_section for tuning @needed parameter of has_not_enough_free_secs(), if we configure it w/ zero, it can avoid f2fs_gc() as much as possible while fallocating on pinned file. Signed-off-by: Chao Yu Reviewed-by: wangzijie Signed-off-by: Jaegeuk Kim Bug: 428889879 Bug: 431132476 (cherry picked from commit 59c1c89e9ba8cefff05aa982dd9e6719f25e8ec5 https: //git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git dev) Link: https://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git/commit/?id=59c1c89e9ba8 Change-Id: I07184caa6e5037d45258474dcca8adf1836b0f2d Signed-off-by: Rui Chen (cherry picked from commit 12727f8a4b65b2fb55a7fc88199ab5f854be52a4) --- Documentation/ABI/testing/sysfs-fs-f2fs | 9 +++++++++ fs/f2fs/f2fs.h | 3 +++ fs/f2fs/file.c | 5 ++--- fs/f2fs/super.c | 4 ++++ fs/f2fs/sysfs.c | 9 +++++++++ 5 files changed, 27 insertions(+), 3 deletions(-) diff --git a/Documentation/ABI/testing/sysfs-fs-f2fs b/Documentation/ABI/testing/sysfs-fs-f2fs index 7e7ffbe8167b..eec506c44d97 100644 --- a/Documentation/ABI/testing/sysfs-fs-f2fs +++ b/Documentation/ABI/testing/sysfs-fs-f2fs @@ -858,3 +858,12 @@ Description: This is a read-only entry to show the value of sb.s_encoding_flags, SB_ENC_STRICT_MODE_FL 0x00000001 SB_ENC_NO_COMPAT_FALLBACK_FL 0x00000002 ============================ ========== + +What: /sys/fs/f2fs//reserved_pin_section +Date: June 2025 +Contact: "Chao Yu" +Description: This threshold is used to control triggering garbage collection while + fallocating on pinned file, so, it can guarantee there is enough free + reserved section before preallocating on pinned file. + By default, the value is ovp_sections, especially, for zoned ufs, the + value is 1. diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index 654812a3acc7..f0932cb5a18a 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -1703,6 +1703,9 @@ struct f2fs_sb_info { /* for skip statistic */ unsigned long long skipped_gc_rwsem; /* FG_GC only */ + /* free sections reserved for pinned file */ + unsigned int reserved_pin_section; + /* threshold for gc trials on pinned files */ unsigned short gc_pin_file_threshold; struct f2fs_rwsem pin_sem; diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c index 479d49dd4ce5..f8832212ee37 100644 --- a/fs/f2fs/file.c +++ b/fs/f2fs/file.c @@ -1859,9 +1859,8 @@ next_alloc: } } - if (has_not_enough_free_secs(sbi, 0, f2fs_sb_has_blkzoned(sbi) ? 
- ZONED_PIN_SEC_REQUIRED_COUNT : - GET_SEC_FROM_SEG(sbi, overprovision_segments(sbi)))) { + if (has_not_enough_free_secs(sbi, 0, + sbi->reserved_pin_section)) { f2fs_down_write(&sbi->gc_lock); stat_inc_gc_call_count(sbi, FOREGROUND); err = f2fs_gc(sbi, &gc_control); diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c index 58d545d53aa6..b5d23377166d 100644 --- a/fs/f2fs/super.c +++ b/fs/f2fs/super.c @@ -4652,6 +4652,10 @@ try_onemore: /* get segno of first zoned block device */ sbi->first_zoned_segno = get_first_zoned_segno(sbi); + sbi->reserved_pin_section = f2fs_sb_has_blkzoned(sbi) ? + ZONED_PIN_SEC_REQUIRED_COUNT : + GET_SEC_FROM_SEG(sbi, overprovision_segments(sbi)); + /* Read accumulated write IO statistics if exists */ seg_i = CURSEG_I(sbi, CURSEG_HOT_NODE); if (__exist_node_summaries(sbi)) diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c index d4a63b0254b9..46216f0a203a 100644 --- a/fs/f2fs/sysfs.c +++ b/fs/f2fs/sysfs.c @@ -824,6 +824,13 @@ out: return count; } + if (!strcmp(a->attr.name, "reserved_pin_section")) { + if (t > GET_SEC_FROM_SEG(sbi, overprovision_segments(sbi))) + return -EINVAL; + *ui = (unsigned int)t; + return count; + } + *ui = (unsigned int)t; return count; @@ -1130,6 +1137,7 @@ F2FS_SBI_GENERAL_RO_ATTR(unusable_blocks_per_sec); F2FS_SBI_GENERAL_RW_ATTR(blkzone_alloc_policy); #endif F2FS_SBI_GENERAL_RW_ATTR(carve_out); +F2FS_SBI_GENERAL_RW_ATTR(reserved_pin_section); /* STAT_INFO ATTR */ #ifdef CONFIG_F2FS_STAT_FS @@ -1323,6 +1331,7 @@ static struct attribute *f2fs_attrs[] = { ATTR_LIST(last_age_weight), ATTR_LIST(max_read_extent_count), ATTR_LIST(carve_out), + ATTR_LIST(reserved_pin_section), NULL, }; ATTRIBUTE_GROUPS(f2fs); From 2dabc476cf95d9303aee3f6766878584ea3a246b Mon Sep 17 00:00:00 2001 From: Mukesh Ojha Date: Tue, 8 Jul 2025 13:28:38 +0530 Subject: [PATCH 49/49] FROMGIT: pinmux: fix race causing mux_owner NULL with active mux_usecount commit 5a3e85c3c397 ("pinmux: Use sequential access to access desc->pinmux data") tried to address the issue where two clients of the same gpio calling pinctrl_select_state() for the same functionality resulted in a NULL pointer dereference while accessing desc->mux_owner. However, the issue was not completely fixed due to the way it was handled, and it can still result in the same NULL pointer. The issue occurs due to the following interleaving between cpu0 (process A, in pin_request()) and cpu1 (process B, in pin_free()): cpu1 takes mutex_lock(desc->mux), decrements desc->mux_usecount (it becomes 0), and unlocks; cpu0 then takes mutex_lock(desc->mux), increments desc->mux_usecount (it becomes 1), sets desc->mux_owner = owner, and unlocks; cpu1 finally takes mutex_lock(desc->mux) again, sets desc->mux_owner = NULL, and unlocks. This sequence leads to a state where the pin appears to be in use (`mux_usecount == 1`) but has no owner (`mux_owner == NULL`), which can cause a NULL pointer dereference on the next pin_request() on the same pin. Ensure that updates to mux_usecount and mux_owner are performed atomically under the same lock. Only clear mux_owner when mux_usecount reaches zero and no new owner has been assigned.
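The locking rule the fix enforces can be summarized by the following standalone sketch; the names are invented for illustration and are not the pinctrl structures or helpers:

	#include <linux/mutex.h>

	struct example_pin {
		struct mutex lock;	/* protects usecount and owner together */
		unsigned int usecount;
		const char *owner;
	};

	static const char *example_pin_free(struct example_pin *pin)
	{
		const char *owner = NULL;

		mutex_lock(&pin->lock);
		/*
		 * The decrement and the owner-clearing happen in one critical
		 * section, so a concurrent request that re-takes the pin
		 * (usecount back to 1 with a new owner) can never be followed
		 * by a stale NULL assignment.
		 */
		if (pin->usecount && --pin->usecount == 0) {
			owner = pin->owner;
			pin->owner = NULL;
		}
		mutex_unlock(&pin->lock);

		return owner;
	}

Splitting the decrement and the owner-clearing into two critical sections reintroduces the window where usecount is 1 but owner is NULL.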
Bug: 430525600 Fixes: 5a3e85c3c397 ("pinmux: Use sequential access to access desc->pinmux data") Link: https://lore.kernel.org/lkml/20250708-pinmux-race-fix-v2-1-8ae9e8a0d1a1@oss.qualcomm.com/ Signed-off-by: Mukesh Ojha (cherry picked from commit 0b075c011032f88d1cfde3b45d6dcf08b44140eb git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl.git for-next) Change-Id: Iec29ea201ef0fc3d205bbc4f1a90cb5a56a62039 Signed-off-by: Mukesh Ojha --- drivers/pinctrl/pinmux.c | 20 +++++++++----------- 1 file changed, 9 insertions(+), 11 deletions(-) diff --git a/drivers/pinctrl/pinmux.c b/drivers/pinctrl/pinmux.c index 97e8af88df85..ab853d6c586b 100644 --- a/drivers/pinctrl/pinmux.c +++ b/drivers/pinctrl/pinmux.c @@ -238,6 +238,15 @@ static const char *pin_free(struct pinctrl_dev *pctldev, int pin, if (desc->mux_usecount) return NULL; } + + if (gpio_range) { + owner = desc->gpio_owner; + desc->gpio_owner = NULL; + } else { + owner = desc->mux_owner; + desc->mux_owner = NULL; + desc->mux_setting = NULL; + } } /* @@ -249,17 +258,6 @@ static const char *pin_free(struct pinctrl_dev *pctldev, int pin, else if (ops->free) ops->free(pctldev, pin); - scoped_guard(mutex, &desc->mux_lock) { - if (gpio_range) { - owner = desc->gpio_owner; - desc->gpio_owner = NULL; - } else { - owner = desc->mux_owner; - desc->mux_owner = NULL; - desc->mux_setting = NULL; - } - } - module_put(pctldev->owner); return owner;