Discussion: Is there a cache consistent interface to tables ?

Is there a cache consistent interface to tables ?

From: Garym
Date:
Hi,
This is an odd request for help. I'm looking to expose an interface so an external app can insert to a table while maintaining cache consistency and inserts be promoted via wal.

I need to support about 100k+ inserts/sec from a sensor data stream. It simply won't work using sql queries.  If the call overhead is too high for single calls, multiple records per call is better. The data must be available for selects in 500ms.  I currently only have 24gb ram for pg, but production will be 56gb.

I'm taking this approach because pgpool2 chokes, delaying past requirements. I initially wanted to use wal, but masters don't want wal in feeds and slaves have unpredictable delays of seconds before provisioning occurs.

Suggestions are appreciated.

Gary



Sent from my iPad

Re: Is there a cache consistent interface to tables ?

From: "David G. Johnston"
Date:
On Fri, Feb 9, 2018 at 12:56 AM, Garym <garym@oedata.com> wrote:
Hi,
This is an odd request for help. I'm looking to expose an interface so an external app can insert to a table while maintaining cache consistency and inserts be promoted via wal.

I don't understand what that all means (what cache? inserts be promoted via wal?) but the most efficient loading mechanism in PostgreSQL is the "COPY" SQL command.
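As a rough illustration of the COPY route, here is a minimal libpq sketch that streams rows with COPY ... FROM STDIN. The table sensor_data(ts, payload), the connection string, and the sample rows are hypothetical, and error handling is abbreviated.

/* copy_demo.c - bulk load via COPY ... FROM STDIN using libpq.
 * Assumes a hypothetical table: sensor_data(ts timestamptz, payload text).
 * Build: cc copy_demo.c -lpq
 */
#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn   *conn = PQconnectdb("dbname=sensors");   /* hypothetical DSN */
    PGresult *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* Enter COPY mode: default text format, tab-separated columns */
    res = PQexec(conn, "COPY sensor_data (ts, payload) FROM STDIN");
    if (PQresultStatus(res) != PGRES_COPY_IN)
    {
        fprintf(stderr, "COPY failed: %s", PQerrorMessage(conn));
        PQclear(res);
        PQfinish(conn);
        return 1;
    }
    PQclear(res);

    /* Stream many rows per COPY; far cheaper than one INSERT per row */
    for (int i = 0; i < 100000; i++)
    {
        char line[256];
        int  len = snprintf(line, sizeof(line),
                            "2018-02-09 00:00:%02d\tsample payload %d\n",
                            i % 60, i);

        if (PQputCopyData(conn, line, len) != 1)
        {
            fprintf(stderr, "PQputCopyData failed: %s", PQerrorMessage(conn));
            break;
        }
    }

    /* Finish the COPY and collect the server's verdict */
    if (PQputCopyEnd(conn, NULL) == 1)
    {
        while ((res = PQgetResult(conn)) != NULL)
        {
            if (PQresultStatus(res) != PGRES_COMMAND_OK)
                fprintf(stderr, "COPY error: %s", PQerrorMessage(conn));
            PQclear(res);
        }
    }

    PQfinish(conn);
    return 0;
}

A single COPY carrying thousands of rows avoids per-statement parse/plan overhead, which is why it is usually worth trying before dropping to lower-level interfaces.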

David J.

Re: Is there a cache consistent interface to tables ?

From: Craig Ringer
Date:
On 9 February 2018 at 15:56, Garym <garym@oedata.com> wrote:
Hi,
This is an odd request for help. I'm looking to expose an interface so an external app can insert to a table while maintaining cache consistency and inserts be promoted via wal.

I need to support about 100k+ inserts/sec from a sensor data stream. It simply won't work using sql queries.  If the call overhead is too high for single calls, multiple records per call is better. The data must be available for selects in 500ms.  I current only have 24gb ram for pg, but production will be 56gb.

I'm taking this approach because pgpool2 chokes, delaying past requirements. I initially wanted to use wal, but masters don't want wal in feeds and slaves have unpredictable delays of seconds before provisioning occurs.


So you're looking to use Pg in a near-realtime application?

Expect to do some work on latency spikes - scheduling issues, checkpoints, etc. I strongly advise you to spend some quality time getting familiar with perf, DTrace, systemtap, Linux eBPF tracing (http://www.brendangregg.com/ebpf.html), or the like. Tuning of kernel options related to I/O and writeback is likely to be needed, as are scheduler and memory settings.

How sure are you that it's not viable for SQL queries? And, if not, what makes you think that a lower level interface will help you? Has profiling and tracing/timing shown that significant time/delays are arising from layers you can bypass in a sensible way?

You definitely *can* use the heapam and indexam at a lower level to form tuples and insert into tables, then update the indexes. See genam.c for one example, but it's optimised for ease of use more than tight performance AFAIK. You're looking for heap_open, heap_form_tuple, heap_insert, etc. Beware of index maintenance.
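To make that concrete, here is a minimal sketch of a server-side C function using those entry points, assuming PostgreSQL 10-era APIs (heap_openrv/heap_close; table_open in later releases) and a hypothetical table sensor_data(id integer, payload text). It deliberately does not touch indexes, which is exactly the maintenance caveat above; real code would go through ExecInsertIndexTuples or keep the target table index-free during load.

/* raw_insert_demo.c - form a tuple and insert it via the heap AM directly.
 * Sketch only: assumes PostgreSQL 10-era APIs and the hypothetical table
 *   CREATE TABLE sensor_data (id integer, payload text);
 * NOTE: indexes on the table are NOT maintained here (see
 * ExecInsertIndexTuples for what full index maintenance involves).
 */
#include "postgres.h"

#include "access/heapam.h"
#include "access/htup_details.h"
#include "fmgr.h"
#include "nodes/makefuncs.h"
#include "utils/rel.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(raw_insert_demo);

Datum
raw_insert_demo(PG_FUNCTION_ARGS)
{
    int32       id = PG_GETARG_INT32(0);
    text       *payload = PG_GETARG_TEXT_PP(1);
    RangeVar   *rv;
    Relation    rel;
    Datum       values[2];
    bool        nulls[2] = {false, false};
    HeapTuple   tup;

    /* Look up and lock the target table */
    rv = makeRangeVar(NULL, "sensor_data", -1);
    rel = heap_openrv(rv, RowExclusiveLock);

    /* Build a tuple matching the relation's descriptor */
    values[0] = Int32GetDatum(id);
    values[1] = PointerGetDatum(payload);
    tup = heap_form_tuple(RelationGetDescr(rel), values, nulls);

    /* Insert through the heap access method; WAL is written as usual */
    simple_heap_insert(rel, tup);

    heap_freetuple(tup);
    heap_close(rel, RowExclusiveLock);

    PG_RETURN_VOID();
}

The function would be exposed with an ordinary CREATE FUNCTION ... LANGUAGE C declaration and called from the ingest path; batching many such inserts per transaction is what makes it pay off.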

You can probably gain a fair bit with some caching of all the type and relation oids etc, but of course you must ensure you subscribe to the necessary relcache/syscache invalidations and act on them appropriately. See inval.[ch] .
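A minimal sketch of that caching pattern, with a hypothetical table name and PostgreSQL 10-era APIs: cache the relation OID once and let a relcache invalidation callback registered through inval.h clear it when the relation changes.

/* oid_cache_demo.c - cache a relation OID, invalidated via relcache callback. */
#include "postgres.h"

#include "catalog/namespace.h"
#include "nodes/makefuncs.h"
#include "storage/lockdefs.h"
#include "utils/inval.h"

static Oid cached_sensor_relid = InvalidOid;

/* Called on relcache invalidations; relid == InvalidOid means "everything" */
static void
sensor_relcache_callback(Datum arg, Oid relid)
{
    if (relid == InvalidOid || relid == cached_sensor_relid)
        cached_sensor_relid = InvalidOid;   /* force a fresh lookup next time */
}

/* Call once per backend, e.g. from _PG_init() */
void
sensor_cache_init(void)
{
    CacheRegisterRelcacheCallback(sensor_relcache_callback, (Datum) 0);
}

Oid
sensor_table_oid(void)
{
    if (!OidIsValid(cached_sensor_relid))
    {
        RangeVar *rv = makeRangeVar(NULL, "sensor_data", -1);

        cached_sensor_relid = RangeVarGetRelid(rv, AccessShareLock, false);
    }
    return cached_sensor_relid;
}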

You'll definitely want to batch into txns and use async commit. But beware the data durability implications.
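For illustration, a small libpq sketch of that batching pattern with asynchronous commit. The DSN, table, and batch sizes are hypothetical, and with synchronous_commit = off a crash can lose the most recent commits (the durability caveat above); in practice you would combine this with COPY or multi-row INSERTs rather than one statement per row.

/* batch_demo.c - batch rows into transactions with asynchronous commit. */
#include <stdio.h>
#include <libpq-fe.h>

static int
run(PGconn *conn, const char *sql)
{
    PGresult *res = PQexec(conn, sql);
    int       ok = (PQresultStatus(res) == PGRES_COMMAND_OK);

    if (!ok)
        fprintf(stderr, "%s: %s", sql, PQerrorMessage(conn));
    PQclear(res);
    return ok;
}

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=sensors");          /* hypothetical */

    if (PQstatus(conn) != CONNECTION_OK)
        return 1;

    /* Async commit: COMMIT returns before its WAL is flushed to disk */
    run(conn, "SET synchronous_commit = off");

    for (int batch = 0; batch < 100; batch++)
    {
        run(conn, "BEGIN");
        for (int i = 0; i < 1000; i++)                      /* 1000 rows/txn */
            run(conn, "INSERT INTO sensor_data (ts, payload) "
                      "VALUES (now(), 'sample')");
        run(conn, "COMMIT");
    }

    PQfinish(conn);
    return 0;
}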

BDR and pglogical do some of this, you can take a look at them for some ideas/examples.

Make sure you have a buffering layer that can accumulate rows if there's a DB failure/outage etc. Otherwise you can never, ever, ever upgrade, diagnostics and maintenance are harder, etc. Don't fire-and-forget. It can be a simple producer/consumer that writes sequentially to a collection of buffer files or whatever.
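A sketch of the simplest version of such a buffer, assuming a hypothetical spool file name and record format: the producer appends length-prefixed records to an append-only file, and a separate consumer process (not shown) reads them back, feeds the database, and remembers its offset.

/* spool_demo.c - append length-prefixed records to a spool file. */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Append one record; returns 0 on success, -1 on error */
static int
spool_append(FILE *spool, const void *rec, uint32_t len)
{
    if (fwrite(&len, sizeof(len), 1, spool) != 1 ||
        fwrite(rec, len, 1, spool) != 1)
        return -1;
    return (fflush(spool) == 0) ? 0 : -1;   /* hand the bytes to the OS promptly */
}

int main(void)
{
    FILE *spool = fopen("sensor.spool", "ab");   /* hypothetical path */
    const char rec[] = "2018-02-09 00:00:00\tsample payload";

    if (spool == NULL)
        return 1;

    spool_append(spool, rec, (uint32_t) strlen(rec));

    fclose(spool);
    return 0;
}

A real spooler would also rotate files, fsync at intervals appropriate to the data-loss budget, and track the consumer's read offset so nothing is replayed twice.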

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Is there a cache consistent interface to tables ?

From: Gary M
Date:
Thanks for the replies.  

@Craig

Realtime? Well, not really. I guess it's your definition of realtime. I usually think of micro- to nanoseconds as real-time. If I were still designing chips, I'd be calling picoseconds real-time these days.

Thank you for the recommendation on performance tuning tools. I am familiar with OS tuning; I spent many years as a performance consultant targeting OS integration and array controller design for major storage vendors. Most don't consider major design changes, but that's another story.

"Expect to do some work on latency spikes - scheduling issues, checkpoints, etc."
Yes, I have been bumping into them. I'm trying to profile those issues as I'm typing this.

"How sure are you that it's not viable for SQL queries?"
I'm topping off at 20K/s inserts due to locking and scheduling issues. There are peaks reaching 50K inserts, but not sustained. I also placed the log on a separate disk to prevent collisions. I have backing storage with 50us latency and 400mb/s bandwidth, so backing storage is not the issue.

"And, if not, what makes you think that a lower level interface will help you? "
Well, I was thinking sql parsing and planning is an unnecessary step that can be removed from the pipeline. Instead of parsing the data stream to another format and creating a query, I'll just have the stream parser(s) generate the table structure and write directly to the tables. 

 "Has profiling and tracing/timing shown that significant time/delays are arising from layers you can bypass in a sensible way?"
"Sensible" is why I'm posting the question here.  I'm not familiar enough with the code and processing pipelines to understand subtleties affecting high-volume insert performance.  There is a lot of "stuff" going on, much of it event driven.  My first impulse is a pipeline scheduler taking advantage of processor affinity.  It's an ugly, brute-force approach, but it does work.

"You can probably gain a fair bit with some caching of all the type and relation oids etc, but of course you must ensure you subscribe to the necessary relcache/syscache invalidations and act on them appropriately. See inval.[ch] ."
Roger.. Thank you... 

"You'll definitely want to batch into txns and use async commit. But beware the data durability implications."
I'm assuming batches based on block size.

"BDR and pglogical do some of this, you can take a look at them for some ideas/examples."
Thanks... I'll look at them today...

"Make sure you have a buffering layer that can accumulate rows if there's a DB failure/outage etc. Otherwise you can never, ever, ever upgrade, diagnostics and maintenance are harder, etc. Don't fire-and-forget. It can be a simple producer/consumer that writes sequentially to a collection of buffer files or whatever."

The final design will likely require some type of shared block storage.  I've always liked the Solaris ZFS LVM layer, although it is not distributed.  Fire and forget is not an option here. I was hoping to leverage existing postgres facilities for that heavy lifting. That's why I had originally looked at the wal interface.

I'm also looking at approaches from a project called "Bottled-Water" https://www.confluent.io/blog/bottled-water-real-time-integration-of-postgresql-and-kafka/  

One more fact I forgot to add.. The insert load into the database is about 2kb/record or about 200MB/s.  

thank you
gary

Re: Is there a cache consistent interface to tables ?

From: Craig Ringer
Date:
On 10 February 2018 at 14:26, Gary M <garym@oedata.com> wrote:
Thanks for the replies.  

@Craig

Realtime ? well not really. I guess it's your definition of realtime. I usually think of micro to nano seconds as real-time. If I were still designing chips, I'd be calling picosecs real-time these days.  

Hence "soft realtime".
 
"How sure are you that it's not viable for SQL queries?"
I'm topping off at 20K/s inserts due to locking and scheduling issues. There are peaks reaching 50K inserts, but not sustained. I also placed the login on a separate disk to prevent collisions. I have backing storage with 50us latency and 400mb/s bandwidth, so backing storage is not the issue.

Sure, but none of that has anything to do with SQL vs other access methods.

Where's your indication that you're tackling a bottleneck, not something unrelated?

I expect you're probably having more issues around the relation extension lock, wal lock etc than anything else. But you'll need to do some profiling and diagnostic work to determine that.

Otherwise this is all just a variant on premature optimisation.

"And, if not, what makes you think that a lower level interface will help you? "
Well, I was thinking sql parsing and planning is an unnecessary step that can be removed from the pipeline. Instead of parsing the data stream to another format and creating a query, I'll just have the stream parser(s) generate the table structure and write directly to the tables.

A variant of that is certainly practical - and not that hard.  Again, see heap_open, heap_form_tuple, etc. But beware index maintenance. 

You will NOT bypass various write bottlenecking locks this way.

 "Has profiling and tracing/timing shown that significant time/delays are arising from layers you can bypass in a sensible way?"
"Sensible" is why I'm posting the question here.  I'm not familiar enough with the code and processing pipelines to understand subtleties effecting high volume insert performance.  There is a lot of "stuff" going on, much of it event driven.  My first impulse is a pipeline scheduler taking advantage of processor affinity.  Its an ugly, brute force approach, but it does work.

Do you have any indication CPU contention is even the issue?

Establishing CPU affinity probably does make sense, as does optimising the number of executors for minimal contention / max throughput. But if your performance is actually limited by lock contention, I/O, memory throughput, cache churn, or whatever, it may not help much.
 
The final design will likely require some type of shared block storage.  I've always liked the Solaris ZFS LVM layer, although is not distributed.  Fire and forget is not an option here. I was hoping to leverage existing postgres facilities for that heavy lifting. That's why I had originally looked at the wal interface. 

My point is that you'll want something in front that you can use to buffer work during short DB outages / failovers / whatever. Unless you're OK with dropping trace data when the DB is inaccessible, or having it buffered sender-side.

I'm also looking at approaches from a project called "Bottled-Water" https://www.confluent.io/blog/bottled-water-real-time-integration-of-postgresql-and-kafka/  

Not sure how that's relevant / will help you. It's about getting a change stream *out* of PostgreSQL to drive Kafka.

pglogical3 will also have some features related to feeding changes to Kafka by the way.
 
One more fact I forgot to add.. The insert load into the database is about 2kb/record or about 200MB/s.  

So you're presumably rotating partitions or something; ageing things out.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Is there a cache consistent interface to tables ?

From: Gary M
Date:
Thanks Craig,

As I'm back in pg code after many years, I'm feeling much better there's one (1) or two (2) items causing the hiccup.  Rereading your comments, I'm agreeing with you. I'm considering bumping up the ram to 512gb as a RAM disk just for consistent profiling (an old bad habit). 

I'm reassessing my profiling plan.  I'm getting ahead of myself thinking the OS is not having a significant effect. 

I'm using RHEL 7.4, SELinux MLS enabled using the C2S profile, and some additional lock-downs applied. As a result of the lock-down scripts, "tuned" has been removed and has never been executed on the distribution. Also, the processor is a Haswell with 14 cores/28 processor threads, which may have an impact on untuned thread scheduling. I need to get the OS tuned properly for the platform. I don't like profiling apps without knowing what the underlying platform is doing. After tuning and profiling the OS, I'll restart pg profiling, starting at the OS interfaces and working my way in.

I thought this was going to take less time; I'll need to adjust my schedule.

Re: Is there a cache consistent interface to tables ?

From: Craig Ringer
Date:
On 11 February 2018 at 04:44, Gary M <garym@oedata.com> wrote:
Thanks Craig,

As I'm back in pg code after many years, I'm feeling much better there's one (1) or two (2) items causing the hiccup.  Rereading your comments, I'm agreeing with you. I'm considering bumping up the ram to 512gb as a RAM disk just for consistent profiling (an old bad habit). 

I'm reassessing my profiling plan.  I'm getting ahead of myself thinking the OS is not having a significant effect. 

Definitely. Especially since PostgreSQL uses buffered I/O.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services