Re: Cache relation sizes?

Поиск
Список
Период
Сортировка
От Andy Fan
Тема Re: Cache relation sizes?
Дата
Msg-id CAKU4AWpRELqgWnZ1mEU9as9Pb8C9OGwHKek+OQt=1w+VNQsbpA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Cache relation sizes?  (Thomas Munro <thomas.munro@gmail.com>)
Ответы Re: Cache relation sizes?  (Thomas Munro <thomas.munro@gmail.com>)
Список pgsql-hackers


On Thu, Dec 24, 2020 at 6:59 AM Thomas Munro <thomas.munro@gmail.com> wrote:
On Thu, Dec 17, 2020 at 10:22 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:
> Let me try to understand your point.  Suppose process 1 extends a file to
> 2 blocks from 1 block, and fsync is not called, then a). the lseek *may* still
> return 1 based on the comments in the ReadBuffer_common ("because
> of buggy Linux kernels that sometimes return an seek SEEK_END result
> that doesn't account for a recent write").  b). When this patch is introduced,
> we would get 2 blocks for sure if we can get the size from cache. c). After
> user reads the 2 blocks from cache and then the cache is evicted, and user
> reads the blocks again, it is possible to get 1 again.
>
> Do we need to consider case a) in this patch? and Is case c). the situation you
> want to avoid and do we have to introduce fsync to archive this? Basically I
> think case a) should not happen by practice so we don't need to handle case
> c) and finally we don't need the fsync/SR_SYNCING/SR_JUST_DIRTIED flag.
> I'm not opposed to adding them, I am just interested in the background in case
> you are also willing to discuss it.

Sorry for the lack of background -- there was some discussion on the
thread "Optimize dropping of relation buffers using dlist", but it's
long and mixed up with other topics.  I'll try to summarise here.

It's easy to reproduce files shrinking asynchronously on network
filesystems that reserve space lazily[1],

Thanks for the explaintain. After I read that thread, I am more confused now.  
In [1],  we have the below test result. 

$ cat magic_shrinking_file.c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main()
{
int fd;
char buffer[8192] = {0};

fd = open("/mnt/test_loopback_remote/dir/file", O_RDWR | O_APPEND);
if (fd < 0) {
perror("open");
return EXIT_FAILURE;
}
printf("lseek(..., SEEK_END) = %jd\n", lseek(fd, 0, SEEK_END));
printf("write(...) = %zd\n", write(fd, buffer, sizeof(buffer)));
printf("lseek(..., SEEK_END) = %jd\n", lseek(fd, 0, SEEK_END));
printf("fsync(...) = %d\n", fsync(fd));
printf("lseek(..., SEEK_END) = %jd\n", lseek(fd, 0, SEEK_END));

return EXIT_SUCCESS;
}
$ cc magic_shrinking_file.c
$ ./a.out
lseek(..., SEEK_END) = 9670656
write(...) = 8192
lseek(..., SEEK_END) = 9678848
fsync(...) = -1
lseek(..., SEEK_END) = 9670656





I got 2 information from above.  a) before the fsync,  the lseek(fd, 0, SEEK_END)
returns a correct value,  however after the fsync,  it returns a wrong value.  b).  
the fsync failed(return values is -1) in the above case.  I'm more confused 
because of point a).  Since the fsync can't correct some wrong value, what 
is the point to call fsync in this patch (I agree that it is good to fix this 
reliability problem within this  patch, but I'm doubtful if fsync can help in 
this case). Am I missing something obviously? 

I read your patch with details some days ago and feel it is in pretty good shape.  
and I think you are clear about the existing issues very well. like  a).  smgr_alloc_sr use a 
FIFO design.  b). SR_PARTITION_LOCK doesn't use a partition lock really.  c). and
looks like you have some concern about the concurrency issue, which I didn't find
anything wrong.  d).  how to handle the DIRTY sr during evict.  I planned to recheck
item c) before this reply, but looks like the time is not permitted.  May I know what
Is your plan about this patch? 

--
Best Regards
Andy Fan

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Rethinking plpgsql's assignment implementation
Следующее
От: Noah Misch
Дата:
Сообщение: Tying an object's ownership to datdba