Thread: Re: [GENERAL] Retrieving query results


Re: [GENERAL] Retrieving query results

From: Tom Lane

Michael Paquier <michael.paquier@gmail.com> writes:
> On Wed, Aug 23, 2017 at 3:19 AM, Igor Korot <ikorot01@gmail.com> wrote:
>> [quote]
>> PQntuples
>>
>> Returns the number of rows (tuples) in the query result. Because it
>> returns an integer result, large result sets might overflow the return
>> value on 32-bit operating systems.
>>
>> int PQntuples(const PGresult *res);
>> [/quote]
>>
>> Is there another way to avoid overflowing the result?

> Not really with the existing API.

Actually, that documentation note is pretty beside-the-point, if not
outright wrong.  The real issue here is that libpq's internal row counter
is also a plain int.  As are the rownumber arguments to PQgetvalue and so
on.  While we could widen that internal counter, it's useless to do so
as long as these API choices prevent applications from dealing with
resultsets of more than 2G rows.

I think what we need is to (1) introduce some error checking in libpq so
that it reports an error if the resultset exceeds 2G rows --- right now
it'll just crash, I fear, and (2) change the documentation so that this
is explained as a library-wide limitation and not just a problem with
PQntuples.

            regards, tom lane
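
For reference, a minimal sketch (not from the thread) of the int-based row
addressing in question; the connection string and query are placeholders:

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn     *conn = PQconnectdb("dbname=test");      /* placeholder DSN */
    PGresult   *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    res = PQexec(conn, "SELECT generate_series(1, 1000000)");
    if (PQresultStatus(res) == PGRES_TUPLES_OK)
    {
        int         nrows = PQntuples(res);     /* row count is a plain int */

        for (int i = 0; i < nrows; i++)         /* row number is an int too */
            printf("%s\n", PQgetvalue(res, i, 0));
    }
    PQclear(res);
    PQfinish(conn);
    return 0;
}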


Re: [GENERAL] Retrieving query results

From: Igor Korot

Hi, guys,


On Thu, Aug 24, 2017 at 8:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> On Wed, Aug 23, 2017 at 3:19 AM, Igor Korot <ikorot01@gmail.com> wrote:
>>> [quote]
>>> PQntuples
>>>
>>> Returns the number of rows (tuples) in the query result. Because it
>>> returns an integer result, large result sets might overflow the return
>>> value on 32-bit operating systems.
>>>
>>> int PQntuples(const PGresult *res);
>>> [/quote]
>>>
>>> Is there another way to avoid overflowing the result?
>
>> Not really with the existing API.
>
> Actually, that documentation note is pretty beside-the-point, if not
> outright wrong.  The real issue here is that libpq's internal row counter
> is also a plain int.  As are the rownumber arguments to PQgetvalue and so
> on.  While we could widen that internal counter, it's useless to do so
> as long as these API choices prevent applications from dealing with
> resultsets of more than 2G rows.
>
> I think what we need is to (1) introduce some error checking in libpq so
> that it reports an error if the resultset exceeds 2G rows --- right now
> it'll just crash, I fear, and (2) change the documentation so that this
> is explained as a library-wide limitation and not just a problem with
> PQntuples.

Does this mean that querying a table with a large number of rows will
crash psql?

Thank you.

>
>                         regards, tom lane


Re: [GENERAL] Retrieving query results

From: Tom Lane

Igor Korot <ikorot01@gmail.com> writes:
> On Thu, Aug 24, 2017 at 8:51 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think what we need is to (1) introduce some error checking in libpq so
>> that it reports an error if the resultset exceeds 2G rows --- right now
>> it'll just crash, I fear, and (2) change the documentation so that this
>> is explained as a library-wide limitation and not just a problem with
>> PQntuples.

> Does this mean that querying a table with a large number of rows will
> crash psql?

I haven't tried it, but it sure looks like it would, if you don't hit
OOM first.  pqAddTuple() isn't doing anything to guard against integer
overflow.  The lack of reports implies that no one has ever tried to
retrieve even 1G rows, let alone more ...

            regards, tom lane


Re: [GENERAL] Retrieving query results

From: Michael Paquier

On Thu, Aug 24, 2017 at 11:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I haven't tried it, but it sure looks like it would, if you don't hit
> OOM first.  pqAddTuple() isn't doing anything to guard against integer
> overflow.  The lack of reports implies that no one has ever tried to
> retrieve even 1G rows, let alone more ...

Yeah, looking at the code we would just need to check if ntups gets
negative (well, equal to INT_MIN) after being incremented.
--
Michael


Re: [GENERAL] Retrieving query results

From: Igor Korot

Michael et al,

On Thu, Aug 24, 2017 at 6:57 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Thu, Aug 24, 2017 at 11:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I haven't tried it, but it sure looks like it would, if you don't hit
>> OOM first.  pqAddTuple() isn't doing anything to guard against integer
>> overflow.  The lack of reports implies that no one has ever tried to
>> retrieve even 1G rows, let alone more ...
>
> Yeah, looking at the code we would just need to check if ntups gets
> negative (well, equal to INT_MIN) after being incremented.

So there is no way to retrieve an arbitrary number of rows from the query?
That sucks...

Thank you.

> --
> Michael


Re: [GENERAL] Retrieving query results

From: Tom Lane

Michael Paquier <michael.paquier@gmail.com> writes:
> On Thu, Aug 24, 2017 at 11:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I haven't tried it, but it sure looks like it would, if you don't hit
>> OOM first.  pqAddTuple() isn't doing anything to guard against integer
>> overflow.  The lack of reports implies that no one has ever tried to
>> retrieve even 1G rows, let alone more ...

> Yeah, looking at the code we would just need to check if ntups gets
> negative (well, equal to INT_MIN) after being incremented.

I think the real problem occurs where we realloc the array bigger.
tupArrSize needs to be kept to no more than INT_MAX --- and, ideally,
it should reach that value rather than dying on the iteration after
it reaches 2^30 (so that we support resultsets as large as we possibly
can).  Without a range-check, it's not very clear what realloc will think
it's being asked for.  Also, on 32-bit machines, we could overflow size_t
before tupArrSize even gets that big, so a test against
SIZE_MAX/sizeof(pointer) may be needed as well.

As long as we constrain tupArrSize to be within bounds, we don't
have to worry about overflow of ntups per se.

            regards, tom lane
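
For illustration only (an editorial sketch, not the actual libpq code; the
function and variable names here are invented), the clamped doubling
described above could look like this:

#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Sketch, not libpq internals: double the capacity of a pointer array,
 * clamping the element count at INT_MAX and refusing to grow if the byte
 * count would overflow size_t (the 32-bit case).
 */
bool
grow_tuple_array(void ***arr, int *arr_size)
{
    int         new_size;
    void      **new_arr;

    if (*arr_size == INT_MAX)
        return false;               /* already at the hard limit */

    if (*arr_size > INT_MAX / 2)
        new_size = INT_MAX;         /* clamp instead of overflowing int */
    else
        new_size = (*arr_size > 0) ? *arr_size * 2 : 128;

    /* On 32-bit machines the byte count can exceed SIZE_MAX first. */
    if ((size_t) new_size > SIZE_MAX / sizeof(void *))
        return false;

    new_arr = realloc(*arr, (size_t) new_size * sizeof(void *));
    if (new_arr == NULL)
        return false;               /* out of memory */

    *arr = new_arr;
    *arr_size = new_size;
    return true;
}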


Re: [GENERAL] Retrieving query results

From: Tom Lane

Igor Korot <ikorot01@gmail.com> writes:
> So there is no way to retrieve an arbitrary number of rows from the query?
> That sucks...

The restriction is on the number of rows in one PGresult, not the total
size of the query result.  You could use single-row mode, or use a cursor
and fetch some reasonable number of rows at a time.  If you try to inhale
all of a many-gigarow result at once, you're going to have OOM problems
anyway, even if you had the patience to wait for it.  So I don't think the
existence of a limit is a problem.  Failure to check it *is* a problem,
certainly.

            regards, tom lane
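
For reference, a sketch of the single-row-mode approach (not posted in the
thread; it assumes an already-open PGconn *conn, and the table, columns, and
parameter value are placeholders).  Single-row mode is enabled per query with
PQsetSingleRowMode() right after the query is sent, and it ends by itself
once the final PGRES_TUPLES_OK result for that query has been collected:

#include <stdio.h>
#include <libpq-fe.h>

int
fetch_in_single_row_mode(PGconn *conn)
{
    const char *paramValues[1] = {"42"};        /* placeholder parameter */
    PGresult   *res;

    if (!PQsendQueryParams(conn,
                           "SELECT id, payload FROM big_table WHERE id > $1",
                           1, NULL, paramValues, NULL, NULL, 0))
    {
        fprintf(stderr, "send failed: %s", PQerrorMessage(conn));
        return -1;
    }

    if (!PQsetSingleRowMode(conn))              /* must follow the send immediately */
        fprintf(stderr, "could not enter single-row mode\n");

    while ((res = PQgetResult(conn)) != NULL)
    {
        ExecStatusType status = PQresultStatus(res);

        if (status == PGRES_SINGLE_TUPLE)
            printf("%s\n", PQgetvalue(res, 0, 1));      /* one row per PGresult */
        else if (status != PGRES_TUPLES_OK)             /* a final TUPLES_OK ends the set */
            fprintf(stderr, "query failed: %s", PQresultErrorMessage(res));

        PQclear(res);                                   /* always drain and clear */
    }
    return 0;
}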


Re: [GENERAL] Retrieving query results

From: Igor Korot

Hi,

On Thu, Aug 24, 2017 at 7:18 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Igor Korot <ikorot01@gmail.com> writes:
>> So there is no way to retrieve an arbitrary number of rows from the query?
>> That sucks...
>
> The restriction is on the number of rows in one PGresult, not the total
> size of the query result.  You could use single-row mode, or use a cursor
> and fetch some reasonable number of rows at a time.  If you try to inhale
> all of a many-gigarow result at once, you're going to have OOM problems
> anyway, even if you had the patience to wait for it.  So I don't think the
> existence of a limit is a problem.  Failure to check it *is* a problem,
> certainly.

Is there a sample of using single-row mode?
How do I turn it on and off?

Is there a cursor example that uses prepared statements?
The one in the documentation doesn't use them - it uses PQexec().

Thank you.

>
>                         regards, tom lane
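
For reference, a sketch of the cursor approach (not posted in the thread).
It keeps to PQexec for the cursor commands, as the documentation's example
does; for binding parameters without PQexec, see the PQsendQueryParams call
in the single-row-mode sketch above.  The connection, table, columns, cursor
name, and batch size are placeholders:

#include <stdio.h>
#include <libpq-fe.h>

int
fetch_with_cursor(PGconn *conn)
{
    PGresult   *res;

    res = PQexec(conn, "BEGIN");
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        goto fail;
    PQclear(res);

    res = PQexec(conn,
                 "DECLARE big_cur CURSOR FOR SELECT id, payload FROM big_table");
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        goto fail;
    PQclear(res);

    for (;;)
    {
        res = PQexec(conn, "FETCH 10000 FROM big_cur");     /* bounded batch */
        if (PQresultStatus(res) != PGRES_TUPLES_OK)
            goto fail;

        int         nrows = PQntuples(res);

        if (nrows == 0)                 /* cursor exhausted */
        {
            PQclear(res);
            break;
        }
        for (int i = 0; i < nrows; i++)
            printf("%s\n", PQgetvalue(res, i, 1));
        PQclear(res);
    }

    res = PQexec(conn, "CLOSE big_cur");
    PQclear(res);
    res = PQexec(conn, "COMMIT");
    PQclear(res);
    return 0;

fail:
    fprintf(stderr, "cursor step failed: %s", PQerrorMessage(conn));
    PQclear(res);
    PQclear(PQexec(conn, "ROLLBACK"));
    return -1;
}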


Re: [GENERAL] Retrieving query results

From: Michael Paquier

On Fri, Aug 25, 2017 at 8:10 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I think the real problem occurs where we realloc the array bigger.
> tupArrSize needs to be kept to no more than INT_MAX --- and, ideally,
> it should reach that value rather than dying on the iteration after
> it reaches 2^30 (so that we support resultsets as large as we possibly
> can).  Without a range-check, it's not very clear what realloc will think
> it's being asked for.  Also, on 32-bit machines, we could overflow size_t
> before tupArrSize even gets that big, so a test against
> SIZE_MAX/sizeof(pointer) may be needed as well.
>
> As long as we constrain tupArrSize to be within bounds, we don't
> have to worry about overflow of ntups per se.

I just poked more seriously at this code, and we could use something like this:
@@ -868,6 +868,16 @@ pqAddTuple(PGresult *res, PGresAttValue *tup)
        int         newSize = (res->tupArrSize > 0) ? res->tupArrSize * 2 : 128;
        PGresAttValue **newTuples;

+       if (res->tupArrSize == INT_MAX)
+           return FALSE;
+       if (newSize == INT_MIN)
+           newSize = INT_MAX;
+       if (newSize > SIZE_MAX / sizeof(PGresAttValue *))
+           return FALSE;

Looking at the surroundings, I think that it would be nice to have
pqAddTuple and PQsetvalue set an error message with this patch. Right
now those paths only report properly on OOM, but if we add more types
of errors, proper error messages would be nice for users.
--
Michael


Re: [GENERAL] Retrieving query results

From: Tom Lane

Michael Paquier <michael.paquier@gmail.com> writes:
> On Fri, Aug 25, 2017 at 8:10 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think the real problem occurs where we realloc the array bigger.

> Looking at the surroundings, I think that it would be nice to have
> pqAddTuple and PQsetvalue set an error message with this patch.

Yeah, I was thinking about that myself - the existing design presumes
that the only possible reason for failure of pqAddTuple is OOM, but
it'd be better to distinguish "too many tuples" from true OOM.  So
we should also refactor to make pqAddTuple responsible for setting
the error message.  Might be best to do the refactoring first.

            regards, tom lane


Re: [GENERAL] Retrieving query results

From: Michael Paquier

On Sun, Aug 27, 2017 at 12:12 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> On Fri, Aug 25, 2017 at 8:10 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I think the real problem occurs where we realloc the array bigger.
>
>> Looking at the surroundings, I think that it would be nice to have
>> pqAddTuple and PQsetvalue set an error message with this patch.
>
> Yeah, I was thinking about that myself - the existing design presumes
> that the only possible reason for failure of pqAddTuple is OOM, but
> it'd be better to distinguish "too many tuples" from true OOM.  So
> we should also refactor to make pqAddTuple responsible for setting
> the error message.  Might be best to do the refactoring first.

Attached are two patches:
1) 0001 refactors the code around pqAddTuple so that it can handle
error messages and assign them, particularly in PQsetvalue.
2) 0002 adds sanity checks in pqAddTuple for overflows, capping the
size of what is allocated at INT_MAX but not more.

pqRowProcessor() still has errmsgp, but it is never used on HEAD. At
least with this set of patches it becomes useful. We could rework
check_field_number() to use an error message string as well, but I
have left that out to keep things simple. I am not sure the extra
complication is worth it compared to just copying the error message in
the case of a non-matching column number.

Also attached is a small program I used to test PQsetvalue through
PQcopyResult, checking that results are allocated correctly at each
call and looking at the error message stacks along the way.
--
Michael

Attachments