Re: general PG network slowness (possible cure) (repost)
From: Peter T. Breuer
Subject: Re: general PG network slowness (possible cure) (repost)
Date:
Msg-id: 200705251323.l4PDNIY30138@inv.it.uc3m.es
In reply to: general PG network slowness (possible cure) (repost) ("Peter T. Breuer" <ptb@inv.it.uc3m.es>)
List: pgsql-performance
"Also sprach Kenneth Marshall:" > > Surprise, ... I got a speed up of hundreds of times. The same application > > that crawled under my original rgdbm implementation and under PG now > > maxed out the network bandwidth at close to a full 10Mb/s and 1200 > > pkts/s, at 10% CPU on my 700MHz client, and a bit less on the 1GHz > > server. > > > > So > > > > * Is that what is holding up postgres over the net too? Lots of tiny > > packets? > > > This effect is very common, but you are in effect altering the query/ I imagined so, but no, I am not changing the behaviour - I believe you are imagining something different here. Let me explain. It is usually the case that drivers and the network layer conspire to emit packets when they are otherwise idle, since they have nothing better to do. That is, if the transmission unit is the normal 1500B and there is 200B in the transmission buffer and nothing else is frisking them about the chops, something along the line will shrug and say, OK, I'll just send out a 200B fragment now, apologize, and send out another fragment later if anything else comes along for me to chunter out. It is also the case that drivers do the opposite .. that is, they do NOT send out packets when the transmission buffer is full, even if they have 1500B worth. Why? Well, on Ge for sure, and on 100BT most of the time, it doesn't pay to send out individual packets because the space required between packets is relatively too great to permit the network to work at that speed given the speed of light as it is, and the spacing it implies between packets (I remember when I advised the networking protocol people that Ge was a coming thing about 6 years ago, they all protested and said it was _physically_ impossible. It is. If you send packets one by one!). An ethernet line is fundamentally only electrical and only signals up or down (relative) and needs time to quiesce. And then there's the busmastering .. a PCI bus is only about 33MHz, and 32 bits wide (well, or 16 on portables, or even 64, but you're getting into heavy server equipment then). That's 128MB/s in one direction, and any time one releases the bus there's a re-setup time that costs the earth and will easily lower bandwidth by 75%. So drivers like to take the bus for a good few packets at a time. Even a single packet (1500B) will take 400 multi-step bus cycles to get to the card, and then it's a question of how much onboard memory it has or whether one has to drive it synchronously. Most cards have something like a 32-unit ring buffer, and I think each unit is considerable. Now, if a driver KNOWS what's coming then it can alter its behavior in order to mesh properly with the higher level layers. What I did was _tell_ the driver and the protocol not to send any data until I well and truly tell it to, and then told it to, when I was ready. The result is that a full communication unit (start, header, following data, and stop codon) was sent in one blast. That meant that there were NO tiny fragments blocking up the net, being sent wily-nily. And it also meant that the driver was NOT waiting for more info to come in before getting bored and sending out what it had. It did as I told it to. The evidence from monitoring the PG network thruput is that 75% of its packets are in the 64-128B range, including tcp header. That's hitting the 100Kb/s (10KB/s) bandwidth regime on my network at the lower end. It will be even _worse_ on a faster net, I think (feel free to send me a faster net to compare with :). 
I also graphed latency, but I haven't taken the results into account,
as the bandwidth measurements were so striking.

> response behavior of the database. Most applications expect an answer
> from the database after every query.

Well of course. Nothing else would work! (I imagine you have some kind
of async scheme, but I haven't investigated.) I ask, the db replies. I
ask, the db replies. What I did was:

  1) made the ASK go out as one lump,
  2) made the REPLY go out as one lump, and
  3) STOPPED the card waiting for several replies or asks to accumulate
     before sending out anything at all.

> If it could manage retrying failed
> queries later, you could use the typical sliding window/delayed ack
> that is so useful in improving the bandwidth utilization of many network

That is not what is going on (though that's not a bad idea). See above
for the explanation. One has to take into account the physical hardware
involved and its limitations, and arrange the communications
accordingly. All I did was send EACH query and EACH response as a
single unit, at the hardware level. One could do better still by
managing _several_ threads' communications at once.

> programs. Maybe an option in libpq to tell it to use delayed "acks". I
> do not know what would be involved.

Nothing spectacular is required to see a considerable improvement, I
think, apart from a little direction from the high-level protocol down
to the driver about where the communication boundaries are. A 1000%
speedup in my case.

Now, where is the actual socket send done in the pg code? I'd like to
check what's happening in there.

Peter
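(For orientation on that last question, from memory of the 8.x source
tree, so verify before relying on it: the client-side sends bottom out
in pqSendSome() in src/interfaces/libpq/fe-misc.c, and the backend's in
internal_flush() in src/backend/libpq/pqcomm.c; both sides already
buffer a whole message before writing. One way to get the "one lump"
effect without corking is scatter-gather I/O, as in this sketch,
modelled loosely on a v3-protocol Query message - the function name is
invented:)

    /* Sketch only: gather a message's type byte, length word and body
     * into a single writev() call, so the kernel queues one contiguous
     * unit for the wire instead of several fragments.
     */
    #include <arpa/inet.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/uio.h>

    static ssize_t send_query_as_one_lump(int sock, const char *query)
    {
        char     msgtype = 'Q';                      /* simple Query  */
        uint32_t len = htonl(4 + strlen(query) + 1); /* length word +
                                                        body + NUL    */
        struct iovec iov[3] = {
            { &msgtype,       1                 },
            { &len,           sizeof(len)       },
            { (void *) query, strlen(query) + 1 },
        };

        /* One system call, one transmission unit queued for the wire. */
        return writev(sock, iov, 3);
    }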