Discussion: Highly Efficient Custom Sorting

From:
Eliot Gable
Date:

I have a long stored procedure (over 3,000 lines). Originally, it would take about 44ms to run the whole query. After lots and lots of tweaking, Postgres now runs the entire thing and gathers my results in just 15.2ms, which is very impressive given the hardware this is running on.

I used to return the results unsorted to my C++ backend and then sort them there using my custom sort order, which provides prioritized, weighted random ordering with 4 different priority fields and 3 different weighting fields within 3 of those 4 priority fields. Needless to say, the sorting is quite complex. I wanted to cut down on the amount of data being delivered to my C++ backend, so I am using the stored procedure to insert a summary of my results directly into the database, which is far more efficient than dumping it all to the C++ backend (including stuff that isn't really needed there) and then dumping it all back to Postgres via INSERTs later in the execution path.

The problem is that I want the results sorted in this custom order before they are stored in the database. (By sorted, I mean I want to include a field that lists a numerical order for the set of results.) Thus, I used to dump everything to the C++ program, perform the sorting, then INSERT back to Postgres. This was obviously not all that efficient. The sorting in C++ took <1ms to accomplish. When I re-wrote the sorting in pl/pgsql using a couple of additional stored procedures, I discovered it takes 15.2ms to perform the sort of the records within Postgres. This almost cancels out all of the prior optimizations:

T:20100702001841+0903010 TID:0x43945940 INFO:NOTICE:  Sorting group ...
<snip>
...
</snip>
T:20100702001841+0917683 TID:0x43945940 INFO:NOTICE:  Sorting 1 rows in priority 1... <-- Last sort item
T:20100702001841+0918292 TID:0x43945940 INFO:NOTICE:    

918,292 - 903,010 = 15,282 us = 15.282 ms

So, the bottom line is, I need a faster way to do this sorting. 

What options are at my disposal for improving the speed and efficiency of this sorting? Which is the easiest to implement? What are the drawbacks of each different method?


Thanks in advance for your insights.


--
Eliot Gable

"We do not inherit the Earth from our ancestors: we borrow it from our children." ~David Brower

"I decided the words were too conservative for me. We're not borrowing from our children, we're stealing from them--and it's not even considered to be a crime." ~David Brower

"Esse oportet ut vivas, non vivere ut edas." (Thou shouldst eat to live; not live to eat.) ~Marcus Tullius Cicero
From:
Craig Ringer
Date:

On 02/07/10 08:46, Eliot Gable wrote:

> So, the bottom line is, I need a faster way to do this sorting.

You haven't shown us how you're doing it at the moment, so it's awfully
hard to comment usefully on possible approaches.

--
Craig Ringer

Tech-related writing: http://soapyfrogs.blogspot.com/

From:
Tom Lane
Date:

Craig Ringer <> writes:
> On 02/07/10 08:46, Eliot Gable wrote:
>> So, the bottom line is, I need a faster way to do this sorting.

> You haven't shown us how you're doing it at the moment, so it's awfully
> hard to comment usefully on possible approaches.

I'm guessing from tea leaves, but the impression I got from Eliot's
description is that he's using plpgsql functions as sort comparators.
It's not surprising that that sucks performance-wise compared to having
the equivalent logic in C/C++ functions used as comparators on the
client side.  plpgsql is no speed demon.  Best fix might be to code the
comparators as C functions on the server side.

            regards, tom lane
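[Editor's note: a minimal sketch of the comparator logic Tom describes, assuming a hypothetical record layout (this is not Eliot's actual schema, and the PG_FUNCTION_INFO_V1 / CREATE FUNCTION wiring needed to register it server-side is omitted):

```c
#include <stdlib.h>

/* Hypothetical record layout -- illustrative only. */
typedef struct {
    int id;
    int priority;
    int weight;
} rec;

/* qsort-style comparator: lower priority sorts first; within a
 * priority, heavier weight sorts first.  Inside Postgres the same
 * logic would live in a C function declared with PG_FUNCTION_INFO_V1
 * so that ORDER BY (via an operator class) could invoke it. */
static int rec_cmp(const void *a, const void *b)
{
    const rec *x = a;
    const rec *y = b;
    if (x->priority != y->priority)
        return x->priority < y->priority ? -1 : 1;
    if (x->weight != y->weight)
        return x->weight > y->weight ? -1 : 1;
    return 0;
}
```

Note this only covers the deterministic part of the ordering; the weighted *random* tie-break in the original problem is a selection loop, not a pure comparator.]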

From:
Merlin Moncure
Date:

On Thu, Jul 1, 2010 at 8:46 PM, Eliot Gable
<> wrote:
> I have a long stored procedure (over 3,000 lines). Originally, it would take
> about 44ms to run the whole query. After lots and lots of tweaking, Postgres
> now runs the entire thing and gathers my results in just 15.2ms, which is
> very impressive given the hardware this is running on. Now, I used to return
> the results unsorted to my C++ backend and then sort them there using my
> custom sort order which provides prioritized, weighted random ordering with
> 4 different priority fields and 3 different weighting fields within 3 of
> those 4 priority fields. Needless to say, the sorting is quite complex. I
> wanted to cut down on the amount of data being delivered to my C++ backend,
> so I am using the stored procedure to insert a summary of my results
> directly into the database, which is far more efficient than dumping it all
> to the C++ backend (including stuff that isn't really needed there) and then
> dumping it all back to Postgres via INSERTS later in the execution path. The
> problem is that I want the results sorted in this custom order before they
> are stored in the database. (By sorted, I mean I want to include a field
> that lists a numerical order for the set of results.) Thus, I used to dump
> everything to the C++ program, perform the sorting, then INSERT back to
> Postgres. This was obviously not all that efficient. Now, the sorting in C++
> took <1ms to accomplish. When I re-wrote the sorting in pl/pgsql using a
> couple of additional stored procedures, I discovered it is taking 15.2ms to
> perform the sort of the records within Postgres. This almost cancels out all
> of the prior optimizations I previously performed:
> T:20100702001841+0903010 TID:0x43945940 INFO:NOTICE:  Sorting group ...
> <snip>
> ...
> </snip>

what are you sorting and how are you sorting it?

merlin

From:
Eliot Gable
Date:

Yes, I have two pl/pgsql functions. They take a prepared set of data (just the row id of the original results, plus the particular priority and weight fields) and they return the same set of data with an extra field called "order" which contains a numerical order to apply when sorting the rows. One function uses the priority information to break everything into priority groups, then calls the other function for each priority group. Each time it gets results back from the inner function, it returns that set of results. When it has looped through all priority groups, then it returns the full built-up set of results back to the calling function. 

The pl/pgsql functions implementing the sort are as optimized as they are likely to get. I don't want to waste my time trying to further optimize pl/pgsql functions that are never going to be as fast and efficient as I need. I would rather spend that time re-writing it in C and get sorting back to <1ms. 

I guess the real question is, is a generic C sorting function my only real alternative? Is there anything else that would allow me to sort things faster than pl/pgsql functions? For example, if I used pl/perl, would I be able to expect considerably better performance for sorting than using pl/pgsql? What about other supported languages? If I can get close to 1ms sorting performance without resorting to C, it would save me much time and frustration. 

On Fri, Jul 2, 2010 at 12:08 AM, Tom Lane <> wrote:
Craig Ringer <> writes:
> On 02/07/10 08:46, Eliot Gable wrote:
>> So, the bottom line is, I need a faster way to do this sorting.

> You haven't shown us how you're doing it at the moment, so it's awfully
> hard to comment usefully on possible approaches.

I'm guessing from tea leaves, but the impression I got from Eliot's
description is that he's using plpgsql functions as sort comparators.
It's not surprising that that sucks performance-wise compared to having
the equivalent logic in C/C++ functions used as comparators on the
client side.  plpgsql is no speed demon.  Best fix might be to code the
comparators as C functions on the server side.

                       regards, tom lane



--
Eliot Gable

From:
Matthew Wakeling
Date:

> On Fri, Jul 2, 2010 at 12:08 AM, Tom Lane <> wrote:
>> I'm guessing from tea leaves, but the impression I got from Eliot's
>> description is that he's using plpgsql functions as sort comparators.
>> It's not surprising that that sucks performance-wise compared to having
>> the equivalent logic in C/C++ functions used as comparators on the
>> client side.  plpgsql is no speed demon.  Best fix might be to code the
>> comparators as C functions on the server side.

On Fri, 2 Jul 2010, Eliot Gable wrote:
> I guess the real question is, is a generic C sorting function my only real
> alternative?

Sounds to me like you are not really listening. You don't need to code an
entire sorting algorithm in C, as Postgres already has a pretty good one
of those. All you need to do is implement a comparator of some kind.
Inserting C functions into Postgres is pretty easy, especially on the
level of comparators.

Matthew

--
 For those of you who are into writing programs that are as obscure and
 complicated as possible, there are opportunities for... real fun here
                                        -- Computer Science Lecturer

From:
Merlin Moncure
Date:

On Fri, Jul 2, 2010 at 10:50 AM, Matthew Wakeling <> wrote:
>> On Fri, Jul 2, 2010 at 12:08 AM, Tom Lane <> wrote:
>>>
>>> I'm guessing from tea leaves, but the impression I got from Eliot's
>>> description is that he's using plpgsql functions as sort comparators.
>>> It's not surprising that that sucks performance-wise compared to having
>>> the equivalent logic in C/C++ functions used as comparators on the
>>> client side.  plpgsql is no speed demon.  Best fix might be to code the
>>> comparators as C functions on the server side.
>
> On Fri, 2 Jul 2010, Eliot Gable wrote:
>>
>> I guess the real question is, is a generic C sorting function my only real
>> alternative?
>
> Sounds to me like you are not really listening. You don't need to code an
> entire sorting algorithm in C, as Postgres already has a pretty good one of
> those. All you need to do is implement a comparator of some kind. Inserting
> C functions into Postgres is pretty easy, especially on the level of
> comparators.

in recent versions of postgres you rarely if ever even have to do that
-- row types are comparable w/o any extra work, as are arrays.  If
Eliot would just give a little more detail of WHAT he is trying to sort
and HOW he is currently doing it, I suspect his problem will be
trivially solved :-).

merlin

From:
Craig James
Date:

On 7/2/10 6:59 AM, Eliot Gable wrote:
> Yes, I have two pl/pgsql functions. They take a prepared set of data
> (just the row id of the original results, plus the particular priority
> and weight fields) and they return the same set of data with an extra
> field called "order" which contains a numerical order to apply when
> sorting the rows. One function uses the priority information to break
> everything into priority groups, then calls the other function for each
> priority group. Each time it gets results back from the inner function,
> it returns that set of results. When it has looped through all priority
> groups, then it returns the full built-up set of results back to the
> calling function.
>
> The pl/pgsql functions implementing the sort are as optimized as they
> are likely to get. I don't want to waste my time trying to further
> optimize pl/pgsql functions that are never going to be as fast and
> efficient as I need. I would rather spend that time re-writing it in C
> and get sorting back to <1ms.
>
> I guess the real question is, is a generic C sorting function my only
> real alternative? Is there anything else that would allow me to sort
> things faster than pl/pgsql functions? For example, if I used pl/perl,
> would I be able to expect considerably better performance for sorting
> than using pl/pgsql? What about other supported languages? If I can get
> close to 1ms sorting performance without resorting to C, it would save
> me much time and frustration.

Try coding it in Perl on the server.  It is MUCH easier to code, and you don't have to link anything or learn the esoteric details of the Postgres/C API.

Perl itself is written in C, and some of its operations are extremely fast.  Depending on the size and complexity of your data structures, Perl code may be just as fast as code you could write in C.

Even if it turns out to be slower than you like, it will give you a way to package up your sort functionality into a function call, so if you later find you need to replace the Perl function with a C function, the rest of your application won't change.

Craig

From:
Craig Ringer
Date:

On 03/07/10 00:36, Craig James wrote:

> Perl itself is written in C, and some of its operations are extremely
> fast.

The same is true of PL/PgSQL, though ;-)

The main advantage of PL/Perl is that it doesn't rely on the SPI to do
everything. It's interpreted not compiled, but it has a much faster
approach to interpretation than PL/PgSQL.

Really, the right choice depends on exactly what the OP is doing and
how, which they're not saying.

Where's the code?

--
Craig Ringer

From:
Eliot Gable
Date:

Well, I re-wrote the algorithm in Perl. However, it did not solve the speed issue. Running time now is a whopping 240+ ms instead of the 31.8ms I was getting before (15.2 of which is sorting). Here is the Perl code for the sorting. I won't post the pl/pgsql code, because this is far clearer (in my opinion) about what the algorithm does:

DROP TYPE IF EXISTS glbtype CASCADE;
CREATE TYPE glbtype AS (
id INTEGER,
"group" TEXT,
priority INTEGER,
weight INTEGER
);

DROP TYPE IF EXISTS glbtype_result CASCADE;
CREATE TYPE glbtype_result AS (
id INTEGER,
priority INTEGER,
weight INTEGER,
"order" BIGINT
);

CREATE OR REPLACE FUNCTION GroupedRandomWeightedLB(glbtype[]) RETURNS SETOF glbtype_result AS
$BODY$
# Input arrives as the string form of an array of a composite type;
# strip the array braces and row parens, then peel off four fields at
# a time.  (This assumes no field value contains a comma.)
my ($input) = @_;
my %groups;
$input =~ s/^{|}$//g;
$input =~ s/[)(]//g;
my @rows;
my $count = 0;
while ($input && $count < 10000) {
    my ($id, $group, $prio, $weight, @rest) = split(/,/, $input);
    push(@rows, {id => $id, group => $group, priority => $prio, weight => $weight});
    $count++;
    $input = join(',', @rest);
}

if (scalar @rows < 1) {
    elog(NOTICE, '  No rows sent for sorting.');
    return undef;
} else {
    elog(NOTICE, '  '.(scalar @rows).' rows sent for sorting.');
}

foreach my $rw (@rows) {
    # Weight may legitimately be 0, so test definedness rather than truth.
    if (defined $rw->{group} && defined $rw->{priority} && defined $rw->{weight}) {
        push( @{ $groups{$rw->{group}}{$rw->{priority}} }, $rw);
        elog(NOTICE, '  Pushing '.$rw->{group}.' with prio ('.$rw->{priority}.'), weight ('.$rw->{weight}.') onto array.');
    } else {
        elog(NOTICE, '  Invalid sort row: Group ('.$rw->{group}.'), Prio ('.$rw->{priority}.'), Weight ('.$rw->{weight}.')');
    }
}

foreach my $group (keys %groups) {
    elog(NOTICE, '  Sorting group '.$group.'...');
    foreach my $prio (keys %{$groups{$group}}) {
        my @rows = @{ $groups{$group}{$prio} };
        elog(NOTICE, '    Sorting '.(scalar @rows).' rows in priority '.$prio.'...');
        my @zeros;
        my @nonzeros;
        my $total_weight = 0;
        my $row_order = 1;
        # Zero-weight rows go to the front of the list, per RFC 2782.
        for (my $row_id = 0; $row_id < scalar @rows; $row_id++) {
            my $row = $rows[$row_id];
            $total_weight += $row->{weight};
            elog(NOTICE, '    Total Weight ('.$total_weight.')');
            if ($row->{weight} == 0) {
                push(@zeros, $row);
            } else {
                push(@nonzeros, $row);
            }
        }
        my @first_order = (@zeros, @nonzeros);
        undef(@zeros);
        undef(@nonzeros);
        while (scalar @first_order) {
            elog(NOTICE, '      '.(scalar @first_order).' items remaining ...');
            my $rand = int(rand($total_weight));
            elog(NOTICE, '      Random weight ('.$rand.')');
            my $running_weight = 0;
            for (my $row_id = 0; $row_id < scalar @first_order; $row_id++) {
                my $row = $first_order[$row_id];
                $running_weight += $row->{weight};
                elog(NOTICE, '      Running weight ('.$running_weight.') Current Weight ('.$row->{weight}.')');
                if ($running_weight >= $rand) {
                    elog(NOTICE, '        : Priority ('.($row->{priority}).') Weight ('.($row->{weight}).')');
                    return_next(
                        { id => int($row->{id}),
                          priority => int($row->{priority}),
                          weight => int($row->{weight}),
                          order => int($row_order) }
                    );
                    $row_order++;
                    splice(@first_order, $row_id, 1);
                    # Recalculate total weight over the remaining rows
                    $total_weight = 0;
                    foreach my $row (@first_order) {
                        $total_weight += $row->{weight};
                    }
                    elog(NOTICE, '        : Remaining Weight ('.$total_weight.')');
                    last;   # "break" is not a Perl keyword; "last" exits the loop
                }
            }
        }
    }
}
return undef;
$BODY$
LANGUAGE plperl VOLATILE;

5 rows sent for sorting.
Pushing GROUP_7 with prio (1), weight (0) onto array.
Pushing GROUP_7 with prio (1), weight (5) onto array.
Pushing GROUP_8 with prio (1), weight (1) onto array.
Pushing GROUP_8 with prio (1), weight (5) onto array.
Pushing GROUP_8 with prio (1), weight (5) onto array.
Sorting group GROUP_7...
Sorting 2 rows in priority 1...
Total Weight (0)
Total Weight (5)
2 items remaining ...
Random weight (0)
Running weight (0) Current Weight (0)
: Priority (1) Weight (0)
: Remaining Weight (5)
1 items remaining ...
Random weight (0)
Running weight (5) Current Weight (5)
: Priority (1) Weight (5)
: Remaining Weight (0)
Sorting group GROUP_8...
Sorting 3 rows in priority 1...
Total Weight (1)
Total Weight (6)
Total Weight (11)
3 items remaining ...
Random weight (8)
Running weight (1) Current Weight (1)
Running weight (6) Current Weight (5)
Running weight (11) Current Weight (5)
: Priority (1) Weight (5)
: Remaining Weight (6)
2 items remaining ...
Random weight (2)
Running weight (1) Current Weight (1)
Running weight (6) Current Weight (5)
: Priority (1) Weight (5)
: Remaining Weight (1)
1 items remaining ...
Random weight (0)
Running weight (1) Current Weight (1)
: Priority (1) Weight (1)
: Remaining Weight (0)

2 rows sent for sorting.
Pushing GROUP_1 with prio (1), weight (0) onto array.
Pushing GROUP_1 with prio (2), weight (4) onto array.
Sorting group GROUP_1...
Sorting 1 rows in priority 1...
Total Weight (0)
1 items remaining ...
Random weight (0)
Running weight (0) Current Weight (0)
: Priority (1) Weight (0)
: Remaining Weight (0)
Sorting 1 rows in priority 2...
Total Weight (4)
1 items remaining ...
Random weight (2)
Running weight (4) Current Weight (4)
: Priority (2) Weight (4)
: Remaining Weight (0)

Total runtime: 244.101 ms


On Fri, Jul 2, 2010 at 9:44 PM, Craig Ringer <> wrote:
On 03/07/10 00:36, Craig James wrote:

> Perl itself is written in C, and some of its operations are extremely
> fast.

The same is true of PL/PgSQL, though ;-)

The main advantage of PL/Perl is that it doesn't rely on the SPI to do
everything. It's interpreted not compiled, but it has a much faster
approach to interpretation than PL/PgSQL.

Really, the right choice depends on exactly what the OP is doing and
how, which they're not saying.

Where's the code?

--
Craig Ringer




--
Eliot Gable

From:
Merlin Moncure
Date:

On Fri, Jul 2, 2010 at 11:17 PM, Eliot Gable
<> wrote:
> Well, I re-wrote the algorithm in Perl. However, it did not solve the speed
> issue. Running time now is a whopping 240+ ms instead of the 31.8ms I was
> getting before (15.2 of which is sorting). Here is the Perl code on the
> sorting. I won't post the pl/pgsql code, because this is far more clear (in
> my opinion) on what the algorithm does:
> DROP TYPE IF EXISTS glbtype CASCADE;
> CREATE TYPE glbtype AS (
> id INTEGER,
> "group" TEXT,
> priority INTEGER,
> weight INTEGER
> );
> DROP TYPE IF EXISTS glbtype_result CASCADE;
> CREATE TYPE glbtype_result AS (
> id INTEGER,
> priority INTEGER,
> weight INTEGER,
> "order" BIGINT
> );
> CREATE OR REPLACE FUNCTION GroupedRandomWeightedLB(glbtype[]) RETURNS SETOF
> glbtype_result AS

ok, I didn't take the time to read your implementation and completely
understand it, but it looks like you're looking at an N^2 sort at
best.

You probably want to do something like this (it might not be quite
right, you need to explain what each of your input array fields is
supposed to represent):
CREATE OR REPLACE FUNCTION GroupedRandomWeightedLB(glbtype[]) RETURNS SETOF
glbtype_result AS
$$
  with g as (select unnest($1) as t)
    select ((t).id, (t).priority, (t).weight, 0)::glbtype_result
      from g order by (t)."group", (t).priority, random() * (t).weight;
$$ language sql;

(not sure what "order" is, is that the rownum, can't that just be
inferred from the array position?)

merlin

From:
Eliot Gable
Date:

Read RFC 2782 on random weighted load balancing of SRV records inside DNS. That algorithm is what I need implemented, but with an extension. I have groups of records I need to have the algorithm applied to where each group is treated separately from the others. I understand the operational complexity of what I'm doing. It is more like N^3, or more precisely G*P*W where G is the number of groups, P the number of priorities per group, and W the number of different weights per priority. But, the complexity of the algorithm means nothing in terms of performance  or run time because it will only ever deal with very small sets of records (maybe 20 rows of data, tops). Even if the algorithm were N^4, it wouldn't matter with that few records. But, more importantly, there are constraints in how the data is sub-divided. Primarily, G < P < W. Further, G and P are groupings which subdivide the entire set of data and the groups do not have overlapping data. So, maybe it's more like N^2.2 or something. But, regardless, we're only talking about 20 rows, tops. 
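[Editor's note: for reference, the per-priority selection loop RFC 2782 describes (zero-weight entries first, running-sum draw, remove the pick, recompute) can be sketched in plain C. The names here are illustrative, and the random draw is passed in as a parameter rather than generated, so the weighted step is explicit and deterministic:

```c
#include <stddef.h>

typedef struct { int id; int weight; } srv;

/* One RFC 2782 selection: walk the running weight sum and return the
 * first index whose running sum reaches `draw` (0 <= draw <= total). */
static size_t pick(const srv *list, size_t n, int draw)
{
    int running = 0;
    for (size_t i = 0; i < n; i++) {
        running += list[i].weight;
        if (running >= draw)
            return i;
    }
    return n - 1;  /* draw == total falls on the last entry */
}

/* Order one priority group: repeatedly pick, record, and splice out.
 * Zero-weight entries are assumed to already sit at the front of the
 * list (as the RFC requires).  draws[k] stands in for the k-th random
 * number in [0, remaining total weight].  Chosen ids land in
 * out[0..n-1] in selection order. */
static void order_group(srv *list, size_t n, const int *draws, int *out)
{
    for (size_t k = 0; k < n; k++) {
        size_t remaining = n - k;
        size_t i = pick(list, remaining, draws[k]);
        out[k] = list[i].id;
        for (size_t j = i; j + 1 < remaining; j++)  /* splice entry out */
            list[j] = list[j + 1];
    }
}
```

The in-place splice over a small array is cheap at 20 rows; the point of the sketch is the shape of the loop, not the container choice.]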

The issue is how efficiently the languages can deal with arrays. In Perl, I have to parse a string into an array of data, then break it up into sub arrays inside associative arrays just to work with the input. I also have to splice the array to remove elements, which I don't think is very efficient. Any way I could come up with of removing elements involved rebuilding the entire array. The same thing goes for pl/pgsql. Dealing with arrays there is also not very efficient. I do a lot of constructing of arrays from sets of data using myvar = array(select blah);. While pl/pgsql was considerably faster than Perl, it cannot come close to what I did in C++ using a hash of a hash of a linked list. The two hash tables provide my groupings and the linked list gives me something that is highly efficient for removing elements as I pick them.

I've looked through the documentation on how to re-write this in C, but I cannot seem to find good documentation on working with the input array (which is an array of a complex type). I also don't see good documentation for working with the complex type. I found stuff that talks about constructing a complex type in C and returning it. However, I'm not sure how to take an input complex type and deconstruct it into something I can work with in C. Also, the memory context management stuff is not entirely clear. Specifically, how do I go about preserving the pointers to the data that I allocate in multi-call memory context so that they still point to the data on the next call to the function for the next result row? Am I supposed to set up some global variables to do that, or am I supposed to take a different approach? If I need to use global variables, then how do I deal with concurrency?

On Sat, Jul 3, 2010 at 2:08 PM, Merlin Moncure <> wrote:
On Fri, Jul 2, 2010 at 11:17 PM, Eliot Gable
<> wrote:
> Well, I re-wrote the algorithm in Perl. However, it did not solve the speed
> issue. Running time now is a whopping 240+ ms instead of the 31.8ms I was
> getting before (15.2 of which is sorting). Here is the Perl code on the
> sorting. I won't post the pl/pgsql code, because this is far more clear (in
> my opinion) on what the algorithm does:
> DROP TYPE IF EXISTS glbtype CASCADE;
> CREATE TYPE glbtype AS (
> id INTEGER,
> "group" TEXT,
> priority INTEGER,
> weight INTEGER
> );
> DROP TYPE IF EXISTS glbtype_result CASCADE;
> CREATE TYPE glbtype_result AS (
> id INTEGER,
> priority INTEGER,
> weight INTEGER,
> "order" BIGINT
> );
> CREATE OR REPLACE FUNCTION GroupedRandomWeightedLB(glbtype[]) RETURNS SETOF
> glbtype_result AS

ok, I didn't take the time to read your implementation and completely
understand it, but it looks like you're looking at a N^2 sorting at
best.

You probably want to do something like this (it might not be quite
right, you need to explain what each of your input array fields is
supposed to represent):
CREATE OR REPLACE FUNCTION GroupedRandomWeightedLB(glbtype[]) RETURNS SETOF
glbtype_result AS
$$
 with g as (select unnest($1) as t)
   select ((t).id, (t).priority, (t).weight, 0)::glbtype_result
     from g order by (t)."group", (t).priority, random() * (t).weight;
$$ language sql;

(not sure what "order" is, is that the rownum, can't that just be
inferred from the array position?)

merlin



--
Eliot Gable

From:
Merlin Moncure
Date:

On Sat, Jul 3, 2010 at 4:17 PM, Eliot Gable
<> wrote:
> Read RFC 2782 on random weighted load balancing of SRV records inside DNS.
> That algorithm is what I need implemented, but with an extension. I have
> groups of records I need to have the algorithm applied to where each group
> is treated separately from the others. I understand the operational
> complexity of what I'm doing. It is more like N^3, or more precisely G*P*W
> where G is the number of groups, P the number of priorities per group, and W
> the number of different weights per priority. But, the complexity of the
> algorithm means nothing in terms of performance  or run time because it will
> only ever deal with very small sets of records (maybe 20 rows of data,
> tops). Even if the algorithm were N^4, it wouldn't matter with that few
> records. But, more importantly, there are constraints in how the data is
> sub-divided. Primarily, G < P < W. Further, G and P are groupings which
> subdivide the entire set of data and the groups do not have overlapping
> data. So, maybe it's more like N^2.2 or something. But, regardless, we're
> only talking about 20 rows, tops.
> The issue is how efficiently the languages can deal with arrays. In Perl, I
> have to parse a string into an array of data, then break it up into sub
> arrays inside associative arrays just to work with the input. I also have to
> splice the array to remove elements, which I don't think is very efficient.
> Any way I could come up with of removing elements involved rebuilding the
> entire array. The same thing goes for pl/pgsql. Dealing with arrays there is
> also not very efficient. I do a lot of constructing of arrays from sets of
> data using myvar = array(select blah);. While pl/pgsql was considerably
> faster than Perl, it cannot come close to what I did in C++ using a hash of
> a hash of a linked list. The two hash tables provide my groupings and the
> linked list gives me something that is highly efficient for removing
> elements as I pick them.
> I've looked through the documentation on how to re-write this in C, but I
> cannot seem to find good documentation on working with the input array
> (which is an array of a complex type). I also don't see good documentation
> for working with the complex type. I found stuff that talks about
> constructing a complex type in C and returning it. However, I'm not sure how
> to take an input complex type and deconstruct it into something I can work
> with in C. Also, the memory context management stuff is not entirely clear.
> Specifically, how do I go about preserving the pointers to the data that I
> allocate in multi-call memory context so that they still point to the data
> on the next call to the function for the next result row? Am I supposed to
> set up some global variables to do that, or am I supposed to take a
> different approach? If I need to use global variables, then how do I deal
> with concurrency?

please stop top posting.

What about my suggestion doesn't work for your requirements?  (btw,
let me disagree with my peers and state pl/perl is lousy for this type
of job; only sql and pl/pgsql can interact with postgresql variables
natively for the most part).

merlin

From:
Alvaro Herrera
Date:

Excerpts from Merlin Moncure's message of Sat Jul 03 18:53:46 -0400 2010:

> What about my suggestion doesn't work for your requirements?  (btw,
> let me disagree with my peers and state pl/perl is lousy for this type
> of job, only sql/and pl/sql can interact with postgresql variables
> natively for the most part).

IIRC the other reason pl/perl sucks for this kind of thing is that it
forces a subtransaction to be created before the function call, which is
expensive.  (I might be misremembering and what actually causes a
subtransaction is a SPI call inside a PL/Perl function, which wouldn't
apply here.)

From:
Robert Haas
Date:

On Sat, Jul 3, 2010 at 4:17 PM, Eliot Gable
<> wrote:
> Read RFC 2782 on random weighted load balancing of SRV records inside DNS.

It may be asking a bit much to expect people here to read an RFC to
figure out how to help you solve this problem, but...

> I've looked through the documentation on how to re-write this in C, but I
> cannot seem to find good documentation on working with the input array
> (which is an array of a complex type). I also don't see good documentation
> for working with the complex type. I found stuff that talks about
> constructing a complex type in C and returning it. However, I'm not sure how
> to take an input complex type and deconstruct it into something I can work
> with in C. Also, the memory context management stuff is not entirely clear.

...there's no question that writing things in C is a lot more work,
and takes some getting used to.  Still, it's fast, so maybe worth it,
especially since you already know C++, and will therefore mostly just
need to learn the PostgreSQL coding conventions.  The best thing to do
is probably to look at some of the existing examples within the
backend code.  Most of the datatype code is in src/backend/utils/adt.
You might want to look at arrayfuncs.c (perhaps array_ref() or
array_map()); and also rowtypes.c (perhaps record_cmp()).

> Specifically, how do I go about preserving the pointers to the data that I
> allocate in multi-call memory context so that they still point to the data
> on the next call to the function for the next result row? Am I supposed to
> set up some global variables to do that, or am I supposed to take a
> different approach? If I need to use global variables, then how do I deal
> with concurrency?

Global variables would be a bad idea, not so much because of
concurrency as because they won't get cleaned up properly.  Again, the
best thing to do is to look at existing examples, like array_unnest()
in src/backend/utils/adt/arrayfuncs.c; the short answer is that you
probably want to compute all your results on the first call and stash
them in the FuncCallContext (funcctx->user_fctx); and then on
subsequent calls just return one row per call.
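[Editor's note: the calling pattern Robert describes (materialize all results on the first call, stash them in funcctx->user_fctx, then hand back one row per call) can be modeled outside the backend with an explicit state struct. This is an illustration of the pattern only, not actual fmgr/funcapi code; the names and the doubling "computation" are made up:

```c
#include <stdlib.h>
#include <stdbool.h>

/* Stand-in for the state a SRF would keep in funcctx->user_fctx. */
typedef struct {
    int *results;
    size_t nresults;
    size_t call_cntr;   /* mirrors funcctx->call_cntr */
} srf_state;

/* "First call": compute and store the whole result set.  In a real
 * SRF this allocation happens under multi_call_memory_ctx so the
 * pointers survive across calls without any global variables. */
static srf_state *srf_init(const int *input, size_t n)
{
    srf_state *st = malloc(sizeof(*st));
    st->results = malloc(n * sizeof(int));
    for (size_t i = 0; i < n; i++)
        st->results[i] = input[i] * 2;  /* placeholder computation */
    st->nresults = n;
    st->call_cntr = 0;
    return st;
}

/* "Per-row call": emit the next row (SRF_RETURN_NEXT), or report
 * exhaustion (SRF_RETURN_DONE) by returning false. */
static bool srf_next(srf_state *st, int *row)
{
    if (st->call_cntr >= st->nresults)
        return false;
    *row = st->results[st->call_cntr++];
    return true;
}
```

Because the per-backend FuncCallContext carries the state, concurrency is handled by the server; each session gets its own context.]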

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

From:
Eliot Gable
Date:


On Tue, Jul 6, 2010 at 3:01 PM, Robert Haas <> wrote:
On Sat, Jul 3, 2010 at 4:17 PM, Eliot Gable
<> wrote:
> Read RFC 2782 on random weighted load balancing of SRV records inside DNS.

It may be asking a bit much to expect people here to read an RFC to
figure out how to help you solve this problem, but...


Yeah, I was not actually expecting them to read the whole RFC. The section on random weighted load balancing is only a few paragraphs and describes just the algorithm I am trying to implement as efficiently as possible:

Priority
The priority of this target host. A client MUST attempt to
contact the target host with the lowest-numbered priority it can
reach; target hosts with the same priority SHOULD be tried in an
order defined by the weight field. The range is 0-65535. This
is a 16 bit unsigned integer in network byte order.

Weight
A server selection mechanism. The weight field specifies a
relative weight for entries with the same priority. Larger
weights SHOULD be given a proportionately higher probability of
being selected. The range of this number is 0-65535. This is a
16 bit unsigned integer in network byte order. Domain
administrators SHOULD use Weight 0 when there isn't any server
selection to do, to make the RR easier to read for humans (less
noisy). In the presence of records containing weights greater
than 0, records with weight 0 should have a very small chance of
being selected.

In the absence of a protocol whose specification calls for the
use of other weighting information, a client arranges the SRV
RRs of the same Priority in the order in which target hosts,
specified by the SRV RRs, will be contacted. The following
algorithm SHOULD be used to order the SRV RRs of the same
priority:

To select a target to be contacted next, arrange all SRV RRs
(that have not been ordered yet) in any order, except that all
those with weight 0 are placed at the beginning of the list.

Compute the sum of the weights of those RRs, and with each RR
associate the running sum in the selected order. Then choose a
uniform random number between 0 and the sum computed
(inclusive), and select the RR whose running sum value is the
first in the selected order which is greater than or equal to
the random number selected. The target host specified in the
selected SRV RR is the next one to be contacted by the client.
Remove this SRV RR from the set of the unordered SRV RRs and
apply the described algorithm to the unordered SRV RRs to select
the next target host. Continue the ordering process until there
are no unordered SRV RRs. This process is repeated for each
Priority.

The difference between this description and my implementation is that I have added a "group" field to the mix so that this algorithm is applied to each group independently of the others. Also, my input data has an "id" field which must be present on the same rows of the output and is used to map the output back to my original input.
 
...there's no question that writing things in C is a lot more work,
and takes some getting used to.  Still, it's fast, so maybe worth it,
especially since you already know C++, and will therefore mostly just
need to learn the PostgreSQL coding conventions.  The best thing to do
is probably to look at some of the existing examples within the
backend code.  Most of the datatype code is in src/backend/utils/adt.
You might want to look at arrayfuncs.c (perhaps array_ref() or
array_map()); and also rowtypes.c (perhaps record_cmp()).


I did actually find the arrayfuncs.c file and started looking through it for examples. I'm just not entirely clear on what is going on in some of those functions -- what I need to keep in order to extract my data and get it represented in C structures, and what I can remove. I was hoping there was some documentation on how to work with input arrays: extracting the data and getting it converted. In any event, I have spent several hours reverse engineering how that stuff works, and I think I am pretty close to being able to get my data into a C structure I can work with.
 
> Specifically, how do I go about preserving the pointers to the data that I
> allocate in multi-call memory context so that they still point to the data
> on the next call to the function for the next result row? Am I supposed to
> set up some global variables to do that, or am I supposed to take a
> different approach? If I need to use global variables, then how do I deal
> with concurrency?

Global variables would be a bad idea, not so much because of
concurrency as because they won't get cleaned up properly.  Again, the
best thing to do is to look at existing examples, like array_unnest()
in src/backend/utils/adt/arrayfuncs.c; the short answer is that you
probably want to compute all your results on the first call and stash
them in the FuncCallContext (funcctx->user_fctx); and then on
subsequent calls just return one row per call.

Thanks for suggesting array_unnest(). I think that will actually prove more useful to me than the other example I'm using for extracting my data from an array. I was actually planning on computing the order on the first call and storing it in a linked list which gets returned one item at a time until all rows have been returned. Also, I found a code example using Google that showed someone storing data across function calls using that pointer. I used their example to produce this:

<snip>
    if(SRF_IS_FIRSTCALL()) {
        funcctx = SRF_FIRSTCALL_INIT();

        /* This is where we stick our sorted data for returning later */
        funcctx->user_fctx = MemoryContextAlloc(funcctx->multi_call_memory_ctx, sizeof(sort_data));
        oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
        data = (sort_data*) funcctx->user_fctx;
</snip>

I have a structure set up that is typedef'd to "sort_data" which stores pointers to various things that I need to survive across the calls. Since this seems to be what you are suggesting, I assume this is the correct approach.


--
Eliot Gable

From:
Joe Conway
Date:

On 07/06/2010 12:42 PM, Eliot Gable wrote:
> Thanks for suggesting array_unnest(). I think that will actually prove
> more useful to me than the other example I'm using for extracting my
> data from an array. I was actually planning on computing the order on
> the first call and storing it in a linked list which gets returned one
> item at a time until all rows have been returned. Also, I found a code
> example using Google that showed someone storing data across function
> calls using that pointer. I used their example to produce this:
>
> <snip>
>     if(SRF_IS_FIRSTCALL()) {
>         funcctx = SRF_FIRSTCALL_INIT();
>
>     /* This is where we stick our sorted data for returning later */
>         funcctx->user_fctx =
> MemoryContextAlloc(funcctx->multi_call_memory_ctx, sizeof(sort_data));
>         oldcontext = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);
>         data = (sort_data*) funcctx->user_fctx;
> </snip>
>
> I have a structure set up that is typedef'd to "sort_data" which stores
> pointers to various things that I need to survive across the calls.
> Since this seems to be what you are suggesting, I assume this is the
> correct approach.

This approach works, but you could also use the SFRM_Materialize mode
and calculate the entire result set in one go. That tends to be simpler.
See crosstab_hash() in contrib/tablefunc for an example.

FWIW, there are also some good examples of array handling in PL/R, e.g.
pg_array_get_r() in pg_conversion.c

HTH,

Joe

--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & Support


From:
Eliot Gable
Date:


On Tue, Jul 6, 2010 at 4:00 PM, Joe Conway <> wrote:


This approach works, but you could also use the SFRM_Materialize mode
and calculate the entire result set in one go. That tends to be simpler.
See crosstab_hash() in contrib/tablefunc for an example.

FWIW, there are also some good examples of array handling in PL/R, e.g.
pg_array_get_r() in pg_conversion.c


 Thanks. That looks like less code and probably will be slightly more efficient.
From:
Eliot Gable
Date:

On Tue, Jul 6, 2010 at 4:17 PM, Eliot Gable <> wrote:

On Tue, Jul 6, 2010 at 4:00 PM, Joe Conway <> wrote:


This approach works, but you could also use the SFRM_Materialize mode
and calculate the entire result set in one go. That tends to be simpler.
See crosstab_hash() in contrib/tablefunc for an example.

FWIW, there are also some good examples of array handling in PL/R, e.g.
pg_array_get_r() in pg_conversion.c


 Thanks. That looks like less code and probably will be slightly more efficient.

I just got my first test of the new C-based function compiled and loaded into the server. The first time it is called, I see it correctly print the priority of each of the five rows of the array that I passed to it:

Got priority 1.
Got priority 1.
Got priority 1.
Got priority 1.
Got priority 1.
CONTEXT: ERROR
CODE: XX000
MESSAGE: cache lookup failed for type 7602245
---------------------------------------------------------------------------

I assume this "cache lookup" error is because I am not actually returning any results (or even NULL) at the end of the function call. If it means something else, please let me know.

Do I need to somehow force the server to unload and then re-load this .so file each time I build a new version of it? If so, how do I do that? Can I just re-run the "create or replace function" SQL code again to make that happen? In every other system I have dealt with where I build a module, I have some way to unload the module and force it to load again; but I don't see a mention of that in the PostgreSQL documentation.

Thanks again to everyone who has provided feedback.

--
Eliot Gable

From:
Tom Lane
Date:

Eliot Gable <> writes:
> Do I need to somehow force the server to unload and then re-load this .so
> file each time I build a new version of it? If so, how do I do that?

Start a new database session.

            regards, tom lane

From:
Eliot Gable
Date:

Thanks again for all the input and suggestions from people. I have this sorting algorithm re-implemented in C now, and it runs in somewhere under 2ms, though it is difficult to get a more accurate measure. There may be some additional optimizations I can come up with, but for now, this will work very well compared to the alternative methods.

On Tue, Jul 6, 2010 at 6:21 PM, Tom Lane <> wrote:
Eliot Gable <> writes:
> Do I need to somehow force the server to unload and then re-load this .so
> file each time I build a new version of it? If so, how do I do that?

Start a new database session.

                       regards, tom lane



--
Eliot Gable

"We do not inherit the Earth from our ancestors: we borrow it from our children." ~David Brower

"I decided the words were too conservative for me. We're not borrowing from our children, we're stealing from them--and it's not even considered to be a crime." ~David Brower

"Esse oportet ut vivas, non vivere ut edas." (Thou shouldst eat to live; not live to eat.) ~Marcus Tullius Cicero
From:
Kenneth Marshall
Date:

Hi Eliot,

Would you mind posting your code for reference. It is nice to
have working examples when trying to figure out how it all fits
together.

Regards,
Ken

On Wed, Jul 07, 2010 at 03:23:12PM -0400, Eliot Gable wrote:
> Thanks again for all the input and suggestions from people. I have this
> sorting algorithm re-implemented in C now and it is somewhere <2ms to run it
> now; though it is difficult to get a more accurate measure. There may be
> some additional optimizations I can come up with, but for now, this will
> work very well compared to the alternative methods.
>
> On Tue, Jul 6, 2010 at 6:21 PM, Tom Lane <> wrote:
>
> > Eliot Gable <> writes:
> > > Do I need to somehow force the server to unload and then re-load this .so
> > > file each time I build a new version of it? If so, how do I do that?
> >
> > Start a new database session.
> >
> >                        regards, tom lane
> >
>
>
>
> --
> Eliot Gable
>
> "We do not inherit the Earth from our ancestors: we borrow it from our
> children." ~David Brower
>
> "I decided the words were too conservative for me. We're not borrowing from
> our children, we're stealing from them--and it's not even considered to be a
> crime." ~David Brower
>
> "Esse oportet ut vivas, non vivere ut edas." (Thou shouldst eat to live; not
> live to eat.) ~Marcus Tullius Cicero