memcg: rate limit rstat flush in reclaim

memcg maintains stats in the rstat infrastructure, where updates are
very fast and per-cpu, while accurate stat reads are slow because they
need to flush the per-cpu (and hierarchical) stats. There is also a
periodic async stats flusher which flushes the stats for the whole memcg
tree every 2 seconds. This change converts the always-synchronous flush
in the reclaim code path into one that flushes only if the periodic
flush is delayed.
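The rate-limiting idea can be sketched as a small userspace C model.
This is a hypothetical illustration, not the kernel code: model_flush(),
model_flush_delayed(), model_simulate() and MODEL_FLUSH_PERIOD are
made-up names, time is an abstract tick counter standing in for
jiffies, and the exact staleness threshold and bookkeeping used by
mem_cgroup_flush_stats_delayed() may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of rate-limited stats flushing.
 * "now" is an abstract tick counter standing in for jiffies; one tick
 * represents one second here. None of these names are kernel APIs.
 */
#define MODEL_FLUSH_PERIOD 2u /* periodic async flusher interval */

struct model_stats {
	uint64_t next_flush;  /* time by which the next periodic flush is due */
	unsigned int flushes; /* expensive flushes performed so far */
};

/* The expensive step: fold per-cpu (and hierarchical) deltas together. */
static void model_flush(struct model_stats *s, uint64_t now)
{
	s->flushes++;
	s->next_flush = now + MODEL_FLUSH_PERIOD;
}

/* Reclaim-side flush: skip it unless the periodic flusher is overdue. */
static bool model_flush_delayed(struct model_stats *s, uint64_t now)
{
	if (now <= s->next_flush)
		return false;
	model_flush(s, now);
	return true;
}

/*
 * Simulate 'ticks' seconds with one reclaim pass per tick. The periodic
 * flusher runs every MODEL_FLUSH_PERIOD ticks until 'stall_at', after
 * which it stops (e.g. its worker is starved). Returns how many flushes
 * were triggered by reclaim rather than by the periodic flusher.
 */
static unsigned int model_simulate(uint64_t ticks, uint64_t stall_at)
{
	struct model_stats s = { 0, 0 };
	unsigned int reclaim_flushes = 0;

	for (uint64_t now = 0; now < ticks; now++) {
		if (now < stall_at && now % MODEL_FLUSH_PERIOD == 0)
			model_flush(&s, now); /* periodic async flush */
		if (model_flush_delayed(&s, now))
			reclaim_flushes++;
	}
	return reclaim_flushes;
}
```

For example, model_simulate(10, 10) models a healthy periodic flusher
and reclaim never flushes; model_simulate(10, 4) models a flusher that
stalls from tick 4 onward, and reclaim performs only two catch-up
flushes instead of flushing on every one of the ten passes, as an
unconditional mem_cgroup_flush_stats() call would.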

In prepare_scan_count(), there are two heuristics which read memcg stats:
(1) anon LRU deactivation and (2) file cache trimming. With this change,
the kernel might read out-of-date stats (at most 2 seconds out of date).
We are not much concerned about these two heuristics reading slightly
outdated stats, as they are only heuristics and the kernel performs many
reclaim retries. In the long term, COS should move to MultigenLRU to
avoid such heuristics completely.

BUG=b/246641795
TEST=presubmit, b/246641795#comment69
RELEASE_NOTE=Fixed a performance issue that was observed in Postgres databases.

cos-patch: bug
Change-Id: Icd50fe0ccce09553738b37cbacdbe94d66106357
Reviewed-on: https://cos-review.googlesource.com/c/third_party/kernel/+/62857
Main-Branch-Verified: Cusky Presubmit Bot <presubmit@cos-infra-prod.iam.gserviceaccount.com>
Reviewed-by: Oleksandr Tymoshenko <ovt@google.com>
Reviewed-by: Robert Kolchmeyer <rkolchmeyer@google.com>
Tested-by: Cusky Presubmit Bot <presubmit@cos-infra-prod.iam.gserviceaccount.com>
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 93d6f27..2280218 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2810,7 +2810,7 @@
 	 * Flush the memory cgroup stats, so that we read accurate per-memcg
 	 * lruvec stats for heuristics.
 	 */
-	mem_cgroup_flush_stats();
+	mem_cgroup_flush_stats_delayed();
 
 	/*
 	 * Determine the scan balance between anon and file LRUs.