[Bug 203891] Consider supporting linux' sync_file_range()
b***@freebsd.org
2015-10-20 10:41:34 UTC
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=203891

Bug ID: 203891
Summary: Consider supporting linux' sync_file_range()
Product: Base System
Version: 11.0-CURRENT
Hardware: Any
OS: Any
Status: New
Severity: Affects Some People
Priority: ---
Component: kern
Assignee: freebsd-***@FreeBSD.org
Reporter: ***@anarazel.de
CC: ***@FreeBSD.org, ***@FreeBSD.org

Hi,

postgresql is about to use sync_file_range(SYNC_FILE_RANGE_WRITE) to control
writeback more explicitly. It'd be cool if more OSs than just linux could
benefit.

Some background:
Postgres regularly 'checkpoints' its in-memory data to disk, to be able to
remove older journalling/write-ahead log data. In a database with a write-heavy
workload that can imply a lot of writes. At the end of the checkpoint postgres
then fsync()s all the files. This unfortunately often causes latency spikes
because a) the fsyncs at the end might have to write back a lot of data,
unnecessarily stalling other IO, and b) before the fsync a lot of dirty data
might accumulate kernel-side, which can then also trigger latency spikes. Often
this also leads to irregular IO with periods of no IO.

What postgres is going to do on linux is to issue
sync_file_range(SYNC_FILE_RANGE_WRITE) every few (32 seems to work well) blocks
during the checkpoint. That makes it rather likely that there's little dirty
data remaining when the fsync()s at the end are executed, making them fast. It
also prevents large amounts of dirty buffers from accumulating.
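
The pattern described above could look roughly like the following on linux. This
is only a sketch, not postgres' actual code: the helper name checkpoint_write(),
the 8 kB BLOCK_SIZE and the flush_every parameter are assumptions made for
illustration.

```c
#define _GNU_SOURCE             /* sync_file_range() is a Linux-specific call */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 8192         /* assumption: 8 kB blocks */

/*
 * Hypothetical helper: write nblocks blocks from buf to fd, starting at file
 * offset 0, and ask the kernel to start writeback every flush_every blocks.
 * buf must hold nblocks * BLOCK_SIZE bytes. Returns 0 on success, -1 on error.
 */
static int
checkpoint_write(int fd, const char *buf, int nblocks, int flush_every)
{
    off_t pending_start = 0;    /* start of the range not yet handed to writeback */
    int   i;

    assert(flush_every > 0);

    for (i = 0; i < nblocks; i++) {
        off_t off = (off_t) i * BLOCK_SIZE;

        if (pwrite(fd, buf + off, BLOCK_SIZE, off) != BLOCK_SIZE)
            return -1;

        if ((i + 1) % flush_every == 0) {
            off_t end = off + BLOCK_SIZE;

            /* Initiate writeback of the dirty range; does not wait for it. */
            if (sync_file_range(fd, pending_start, end - pending_start,
                                SYNC_FILE_RANGE_WRITE) != 0)
                return -1;
            pending_start = end;
        }
    }

    /* Durability still comes from fsync(); ideally little is left to write. */
    return fsync(fd);
}
```

With flush_every = 32 this matches the "every few (32 seems to work well)
blocks" cadence mentioned above.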

We've considered some alternative approaches to this for other operating
systems. For one there's posix_fadvise(POSIX_FADV_DONTNEED), but that does more
than just writeout dirty data. I've also tried mmap();msync(MS_ASYNC);munmap();
- but at least on linux that doesn't do anything. Using MS_SYNC flushes to disk
on linux, but it's synchronous, which isn't what we want here.
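
For reference, the mmap();msync(MS_ASYNC);munmap(); sequence mentioned above can
be sketched as below. The helper name hint_writeback_msync() is hypothetical.
Whether the msync() call actually initiates writeback depends on the OS: as
noted, it did nothing on linux, and the behaviour on freebsd is the open
question.

```c
#include <assert.h>
#include <fcntl.h>              /* open(2) flags for the caller */
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map [offset, offset + len) of fd, request asynchronous
 * writeback with msync(MS_ASYNC), and unmap again. offset must be page
 * aligned. Returns 0 on success, -1 on error.
 */
static int
hint_writeback_msync(int fd, off_t offset, size_t len)
{
    void *p;
    int   rc;

    assert(len > 0);

    p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, offset);
    if (p == MAP_FAILED)
        return -1;

    /* MS_ASYNC: schedule the write, do not wait for it (semantics vary by OS). */
    rc = msync(p, len, MS_ASYNC);

    if (munmap(p, len) != 0)
        rc = -1;
    return rc;
}
```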


I find the sync_file_range() API to be rather useful - so I think it'd make
sense to implement it. But barring that, could you possibly clarify somewhere
public whether msync(MS_ASYNC) does what we'd need it to do on freebsd? I.e.
initiate writeback, without blocking?


Regards,

Andres Freund
--
You are receiving this mail because:
You are the assignee for the bug.
b***@freebsd.org
2015-10-20 10:57:21 UTC

Kubilay Kocak <***@FreeBSD.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |feature, needs-patch,
                   |                            |performance

--- Comment #1 from Kubilay Kocak <***@FreeBSD.org> ---
Thank you for creating this issue report as requested, Andres. This issue just
got first use of the 'performance' keyword. \o/

Additionally, it appears Hadoop HDFS [1], Redis [2] and MongoDB [3] among
others use this system call:

[1] https://issues.apache.org/jira/browse/HDFS-6109
[2] https://github.com/antirez/redis/issues/667
[3] https://jira.mongodb.org/browse/SERVER-18649
b***@freebsd.org
2015-10-20 16:37:07 UTC

***@rlwinm.de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |***@rlwinm.de

--- Comment #2 from ***@rlwinm.de ---
While not exactly what you're asking for, FreeBSD does offer aio_fsync(2) to
implement asynchronous file syncing. In theory this offers the kernel the
opportunity to flush data to stable storage at the optimal pace and notify the
application about it. If we're talking about OS-specific code paths, it's also
possible to receive the notification via kqueue(2) + kevent(2) instead of
polling or signals. Too bad that the kernel requires that much babysitting from
userland.
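
A rough sketch of the aio_fsync(2) approach, using only the portable POSIX AIO
calls: the helper names start_async_fsync() and finish_async_fsync() are made up
for illustration, and completion is collected with aio_suspend() rather than via
kqueue(2), so the kqueue-based notification mentioned above is not shown here.

```c
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: queue an asynchronous fsync for fd. No completion
 * notification is requested (SIGEV_NONE); the caller collects the result
 * later with finish_async_fsync(). Returns 0 if the request was queued.
 */
static int
start_async_fsync(int fd, struct aiocb *cb)
{
    assert(cb != NULL);

    memset(cb, 0, sizeof(*cb));
    cb->aio_fildes = fd;
    cb->aio_sigevent.sigev_notify = SIGEV_NONE;
    return aio_fsync(O_SYNC, cb);
}

/*
 * Hypothetical helper: wait until the queued fsync has completed.
 * Returns 0 on success, -1 on failure with errno set.
 */
static int
finish_async_fsync(struct aiocb *cb)
{
    const struct aiocb *list[1] = { cb };
    int err;

    while ((err = aio_error(cb)) == EINPROGRESS) {
        /* Sleep until the request completes; retry if interrupted. */
        if (aio_suspend(list, 1, NULL) != 0 && errno != EINTR)
            return -1;
    }
    if (err == -1)
        return -1;              /* aio_error() itself failed, errno is set */

    /* Always call aio_return() once to release the request's resources. */
    if (aio_return(cb) != 0 || err != 0) {
        if (err != 0)
            errno = err;
        return -1;
    }
    return 0;
}
```

Between the two calls the application is free to keep issuing writes, which is
the point of the asynchronous variant.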
b***@freebsd.org
2015-10-20 18:11:22 UTC

--- Comment #3 from Warner Losh <***@FreeBSD.org> ---
Implementing sync_file_range should be straightforward. fsync
is internally implemented as sync_file_range(0, 0, SYNC)
in the kernel right now, so adding the right glue in the kernel
would be easy. Well, except for all the hedging in the wording
for the flags...