Обсуждение: On the performance of views

Поиск
Список
Период
Сортировка

On the performance of views

От
Bill Moran
Дата:
I have an application that I'm porting from MSSQL to PostgreSQL.  Part of this
application consists of hundreds of stored procedures that I need to convert
to Postgres functions ... or views?

At first I was going to just convert all MSSQL procedures to Postgres functions.
But now that I'm looking at it, a lot of them may be candidates for views.  A
lot of them take on the format of:

SELECT a.cola, b.colb, c.colc
FROM a JOIN b JOIN c
WHERE a.prikey=$1

(this is slightly oversimplified, but as a generalization of hundreds of
functions, it's pretty accurate)

Now, I know this questions is pretty generalized, and I intend to test before
actually commiting to a particular course of action, but I'm very early in the
conversion and I'm curious as to whether people with more experience than I
think that views will provide better performance than functions containing
SQL statements like the above.  The client is _very_ interested in performance
(just like anyone who needs to market things these days ... apparently, if you
admit that something might take a few seconds in your advertising, you're sunk)

Any opinions are welcome.  Also, if this is a relatively unknown thing, I'd
be curious to know that as well, because then it would be worthwhile for me
to record and publish my experience.

--
Bill Moran
Potential Technologies
http://www.potentialtech.com


Re: On the performance of views

От
Shridhar Daithankar
Дата:
Bill Moran wrote:

> I have an application that I'm porting from MSSQL to PostgreSQL.  Part
> of this
> application consists of hundreds of stored procedures that I need to
> convert
> to Postgres functions ... or views?
>
> At first I was going to just convert all MSSQL procedures to Postgres
> functions.
> But now that I'm looking at it, a lot of them may be candidates for
> views.  A
> lot of them take on the format of:
>
> SELECT a.cola, b.colb, c.colc
> FROM a JOIN b JOIN c
> WHERE a.prikey=$1

Make sure that you typecase correctly. It makes a differnce of order of
magnitude when you say 'where intpkey=<somevalue>::int' rather than 'where
intpkey=<somevalue>'.

It is called typecasting and highly recommened in postgresql for correctly
choosing indexes.

I remember another post on some list, which said pl/pgsql seems to be very
strongly typed language compared to MSSQL counterpart. So watch for that as well.

>
> (this is slightly oversimplified, but as a generalization of hundreds of
> functions, it's pretty accurate)
>
> Now, I know this questions is pretty generalized, and I intend to test
> before
> actually commiting to a particular course of action, but I'm very early
> in the
> conversion and I'm curious as to whether people with more experience than I
> think that views will provide better performance than functions containing
> SQL statements like the above.  The client is _very_ interested in

To my understanding, views are expanded at runtime and considered while
preparing plan for the complete (and possibly bigger) query(Consider a view
joined with something else). That is not as easy/possible if at all, when it is
function. For postgresql query planner, the function is a black box(rightly so,
I would say).

So using views opens possibility of changing query plans if required. Most of
the times that should be faster than using them as functions.

Of course, the standard disclaimer, YMMV. Try yourself.

Correct me if I am wrong.

HTH

  Shridhar

Re: On the performance of views

От
Tom Lane
Дата:
Bill Moran <wmoran@potentialtech.com> writes:
> At first I was going to just convert all MSSQL procedures to Postgres functions.
> But now that I'm looking at it, a lot of them may be candidates for views.  A
> lot of them take on the format of:

> SELECT a.cola, b.colb, c.colc
> FROM a JOIN b JOIN c
> WHERE a.prikey=$1

You'd probably be better off using views, if making that significant a
notational change is feasible for you.  Functions that return multiple
columns are notationally messy in Postgres.  A view-based solution would
be more flexible and likely have better performance.

            regards, tom lane

Re: On the performance of views

От
Josh Berkus
Дата:
Bill,

> > SELECT a.cola, b.colb, c.colc
> > FROM a JOIN b JOIN c
> > WHERE a.prikey=$1

If your views are simple, PostgreSQL will be able to "push down" any filter
criteria into the view itself.   For example,

CREATE view_a AS
SELECT a.cola, b.colb, c.colc
FROM a JOIN b JOIN c;

SELECT * FROM view_a
WHERE a.prikey = 2334432;

will execute just like:

SELECT a.cola, b.colb, c.colc
FROM a JOIN b JOIN c
WHERE a.prikey = 2334432;

However, this does not work for really complex views, which have to be
materialized or executed as a sub-loop.

The "Procedures faster than views" thing is a SQL Server peculiarity which is
a result of MS's buggering up views since they bought the code from Sybase.

> To my understanding, views are expanded at runtime and considered while
> preparing plan for the complete (and possibly bigger) query(Consider a view
> joined with something else). That is not as easy/possible if at all, when it
is
> function. For postgresql query planner, the function is a black box(rightly
so,
> I would say).

Well, as of 7.4 SQL functions are inlined.   And simple PL/pgSQL functions
will be "prepared".   So it's possible that either could execute as fast as a
view.

Also, if your client is really concerned about no-holds-barred speed, you
should investigate prepared queries.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: On the performance of views

От
Bill Moran
Дата:
Tom Lane wrote:
> Bill Moran <wmoran@potentialtech.com> writes:
>
>>At first I was going to just convert all MSSQL procedures to Postgres functions.
>>But now that I'm looking at it, a lot of them may be candidates for views.  A
>>lot of them take on the format of:
>
>>SELECT a.cola, b.colb, c.colc
>>FROM a JOIN b JOIN c
>>WHERE a.prikey=$1
>
> You'd probably be better off using views, if making that significant a
> notational change is feasible for you.  Functions that return multiple
> columns are notationally messy in Postgres.  A view-based solution would
> be more flexible and likely have better performance.

Well, I don't see a huge difference in how the application will be built.
Basically, PQexec calls will have a string like
"SELECT * FROM view_name WHERE prikey=%i" instead of
"SELECT * FROM function_name(%i)" ... which really doesn't make life much
more difficult (unless there's something I'm missing?)

Thanks for the input, Tom.  I'll definately try out views where possible to
see if it improves things.

--
Bill Moran
Potential Technologies
http://www.potentialtech.com


Re: On the performance of views

От
Bill Moran
Дата:
Shridhar Daithankar wrote:
> Bill Moran wrote:
>
>> I have an application that I'm porting from MSSQL to PostgreSQL.  Part
>> of this
>> application consists of hundreds of stored procedures that I need to
>> convert
>> to Postgres functions ... or views?
>>
>> At first I was going to just convert all MSSQL procedures to Postgres
>> functions.
>> But now that I'm looking at it, a lot of them may be candidates for
>> views.  A
>> lot of them take on the format of:
>>
>> SELECT a.cola, b.colb, c.colc
>> FROM a JOIN b JOIN c
>> WHERE a.prikey=$1
>
> Make sure that you typecase correctly. It makes a differnce of order of
> magnitude when you say 'where intpkey=<somevalue>::int' rather than
> 'where intpkey=<somevalue>'.
>
> It is called typecasting and highly recommened in postgresql for
> correctly choosing indexes.
>
> I remember another post on some list, which said pl/pgsql seems to be
> very strongly typed language compared to MSSQL counterpart. So watch for
> that as well.

Oh yeah.  This particular difference is causing a lot of headaches, as I have
to track down the type of each field to create a new type for each function.
MSSQL seems to be _very_ loosely typed, in that it will return anything you
want and determine the return type at run time.

Some functions they prototyped in MSSQL even return different types, based
on certian parameters, I'm not sure how I'll do this in Postgres, but I'll
have to figure something out.

>> (this is slightly oversimplified, but as a generalization of hundreds of
>> functions, it's pretty accurate)
>>
>> Now, I know this questions is pretty generalized, and I intend to test
>> before
>> actually commiting to a particular course of action, but I'm very
>> early in the
>> conversion and I'm curious as to whether people with more experience
>> than I
>> think that views will provide better performance than functions
>> containing
>> SQL statements like the above.  The client is _very_ interested in
>
> To my understanding, views are expanded at runtime and considered while
> preparing plan for the complete (and possibly bigger) query(Consider a
> view joined with something else). That is not as easy/possible if at
> all, when it is function. For postgresql query planner, the function is
> a black box(rightly so, I would say).
>
> So using views opens possibility of changing query plans if required.
> Most of the times that should be faster than using them as functions.
>
> Of course, the standard disclaimer, YMMV. Try yourself.

Thanks.  I think I'm going to track the performance of many of these
functions and write up what I discover.

--
Bill Moran
Potential Technologies
http://www.potentialtech.com


Re: On the performance of views

От
Josh Berkus
Дата:
Bill,

> Some functions they prototyped in MSSQL even return different types, based
> on certian parameters, I'm not sure how I'll do this in Postgres, but I'll
> have to figure something out.

We support that as of 7.4.1 to an extent; check out "Polymorphic Functions".


--
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Re: On the performance of views

От
Shridhar Daithankar
Дата:
Josh Berkus wrote:

> Bill,
>
>
>>Some functions they prototyped in MSSQL even return different types, based
>>on certian parameters, I'm not sure how I'll do this in Postgres, but I'll
>>have to figure something out.
>
>
> We support that as of 7.4.1 to an extent; check out "Polymorphic Functions".

To my understanding, polymorphism means more than one function with same name
but different signature(Sorry C++ days!!).

That still can not return rwos of two types in one call. At any moment, rowset
returned by a function call would be homogenous.

Is MSSQL allows to mix rows of two types in single function invocation, I am
sure that would be a hell lot of porting trouble..

Just a thought..

  Shridhar

Re: On the performance of views

От
Bill Moran
Дата:
Shridhar Daithankar wrote:
> Josh Berkus wrote:
>
>> Bill,
>>
>>> Some functions they prototyped in MSSQL even return different types,
>>> based
>>> on certian parameters, I'm not sure how I'll do this in Postgres, but
>>> I'll
>>> have to figure something out.
>>
>> We support that as of 7.4.1 to an extent; check out "Polymorphic
>> Functions".
>
> To my understanding, polymorphism means more than one function with same
> name but different signature(Sorry C++ days!!).
>
> That still can not return rwos of two types in one call. At any moment,
> rowset returned by a function call would be homogenous.
>
> Is MSSQL allows to mix rows of two types in single function invocation,
> I am sure that would be a hell lot of porting trouble..

These are two seperate problems.

1) Returning a homogenious set of rows, but the composition of those rows
    will not be known until run time, as a different set of logic will be
    done depending on the values of some parameters.
2) Returning what MSSQL calls "combined recordsets", which are many rows,
    but the rows are not homogenious.

As I see it, #1 can be solved by polymorphism in Postgres functions.

#2 has to be solved at the application level.  My solution so far has
been to create multiple Postgres functions, call each one in turn, join
the results in C, and return them as a structure via SOAP to the client.
May not be the easiest way to get it working, but it's working so far.
(although I'm always open to suggestions if someone knows of a better
way)

--
Bill Moran
Potential Technologies
http://www.potentialtech.com


Re: On the performance of views

От
Josh Berkus
Дата:
Shridhar, Bill,

> > Is MSSQL allows to mix rows of two types in single function invocation,
> > I am sure that would be a hell lot of porting trouble..

There's also the question of whether or not PG would every want to do this.
Frankly, as a once-upon-a-time SQL Server application developer, I found the
ability to return multiple rowsets from a single SQL Server procedure pretty
useless, and a source of endless debugging if we tried to implement it.

> 1) Returning a homogenious set of rows, but the composition of those rows
>     will not be known until run time, as a different set of logic will be
>     done depending on the values of some parameters.

This can be done with Set Returning Functions.   The issue is that the call to
the function requires special syntax, and the program calling the function
must know what columns are going to be returned at the time of the call.
Hmmm, is that clear or confusing?

> #2 has to be solved at the application level.  My solution so far has
> been to create multiple Postgres functions, call each one in turn, join
> the results in C, and return them as a structure via SOAP to the client.
> May not be the easiest way to get it working, but it's working so far.
> (although I'm always open to suggestions if someone knows of a better
> way)

See my comment above.   I frankly don't understand what the use of a
non-homogenous recordset is.   Can you explain?

--
Josh Berkus
Aglio Database Solutions
San Francisco

Re: On the performance of views

От
Bill Moran
Дата:
Josh Berkus wrote:
> Shridhar, Bill,
>
>>>Is MSSQL allows to mix rows of two types in single function invocation,
>>>I am sure that would be a hell lot of porting trouble..
>
> There's also the question of whether or not PG would every want to do this.
> Frankly, as a once-upon-a-time SQL Server application developer, I found the
> ability to return multiple rowsets from a single SQL Server procedure pretty
> useless, and a source of endless debugging if we tried to implement it.

Well, I would have agreed with the uselessness, until this project.  The
"source of endless debugging" frightens me!

>>1) Returning a homogenious set of rows, but the composition of those rows
>>    will not be known until run time, as a different set of logic will be
>>    done depending on the values of some parameters.
>
> This can be done with Set Returning Functions.   The issue is that the call to
> the function requires special syntax, and the program calling the function
> must know what columns are going to be returned at the time of the call.
> Hmmm, is that clear or confusing?

Clear as mud.  In my case, my application simply doesn't care what row of
what kind are returned.  See, I'm writing the server end, and all said and
done, it's really just glue (frighteningly thick glue, but glue nonetheless)

Basically, all I do is call each query in turn until I've collected all the
results, then marshall the results in to a SOAP XML response (using gsoap,
if anyone's curious) and give them back to the client application.  It's
the client app's job to figure out what to do with them, not mine.  I
never would have written it this way on my own, but the client app is
already written, so as I migrate it to the client-server model, the
programmers who wrote the client app are specifying what they expect me
to provide them.

The only advantage I see is that combining a number of result sets into a
single response reduces the number of round trips between the client and
server.

If Postgres supported combined recordsets, it would simplify my C code
somewhat, and possibly speed up things a bit by making less calls between
the soap server and Postgrees ... overall, I don't see a huge advantage
to it.

>>#2 has to be solved at the application level.  My solution so far has
>>been to create multiple Postgres functions, call each one in turn, join
>>the results in C, and return them as a structure via SOAP to the client.
>>May not be the easiest way to get it working, but it's working so far.
>>(although I'm always open to suggestions if someone knows of a better
>>way)
>
> See my comment above.   I frankly don't understand what the use of a
> non-homogenous recordset is.   Can you explain?

I hope what I already mentioned explains enough.  If I understand the
application enough (and it's amazing how little I understand about it,
considering I'm writing the server end!) what they're doing with these
combined recordsets is driving their forms.  When a form instantiates,
it makes a single soap call that causes me to return one of these
non-homogenious recordsets.  One row may have data on how to display
the form, while another has data on what buttons are available, and
another has the actual data for the header of the form, while the
remaing rows might have data to fill in the lower (grid) portion of
the form.

If I had designed this, I would just have done the same thing with
a homogenious recordset that had values set to null where they weren't
apropriate.  This would have bloated the data being transfered, but
otherwise would have worked in the same way.

Now that I'm aware of MSSQL's combined recordset capability, I'm not
sure if I would do it differently or not (were I developing on a system
that had that capability)  I probably won't have a concept of whether
or not I think this is a good idea until this project is further along.

--
Bill Moran
Potential Technologies
http://www.potentialtech.com


Re: On the performance of views

От
Shridhar Daithankar
Дата:
Bill Moran wrote:
> Basically, all I do is call each query in turn until I've collected all the
> results, then marshall the results in to a SOAP XML response (using gsoap,
> if anyone's curious) and give them back to the client application.  It's
> the client app's job to figure out what to do with them, not mine.  I
> never would have written it this way on my own, but the client app is
> already written, so as I migrate it to the client-server model, the
> programmers who wrote the client app are specifying what they expect me
> to provide them.

In C++/OO way, that would have been a array/list of base object pointers with
virtual methods for
- Obtaining result set
- Making a soap/XML out of it
- Deliver it
- And pre/post processing

And delegate the unique query/data handling to specific subclasses.

Now that you are doing that in C, you can as well use function pointers to mimic
inheritance and virtual functions. The only difference being

a. No typechecking. A function pointer in C, won't crib if you pass it wrong
number/wrong types of argument. C++ compiler will catch that.
b. Maintainance is nightmare unless you thoroughly document it. In C++, glancing
over source code might suffice. And no, in-source comment rarely suffice when
you use C for such applications.

I maintain such an application for day-job and I know what help/menace it can be
at times..:-(

But either way, asking for/using/maintaining non-homogeneous recordset is a
classic three finger salute. Load the gun, point to head and shoot.

Just a thought..

  Shridhar