Обсуждение: FDW handling count(*) through AnalyzeForeignTable or other constant time push-down

Поиск
Список
Период
Сортировка

FDW handling count(*) through AnalyzeForeignTable or other constant time push-down

От
"Gabe F. Rudy"
Дата:
<div class="WordSection1"><p class="MsoNormal">Hey all,<p class="MsoNormal"> <p class="MsoNormal">I’m building a FDW
arounda column-store backend (similar to CStore but for genomic data!).<p class="MsoNormal"> <p class="MsoNormal">I
havetables in the billions of rows, and have a common query pattern of asking for the table size (i.e. SELECT COUNT(*)
FROMbig_fdw_table; ).<p class="MsoNormal"> <p class="MsoNormal">This is a read-optimized system in which I know in
constanttime the exact dimensions of the table.<p class="MsoNormal"> <p class="MsoNormal">Is there any way to convince
PostgresFDW to leverage the analyze row counts or even the “double* totalRowCount” returned from the AcquireSampleRows
callbackfrom my AnalyzeForeignTable function so that it does not do a full-table scan for a COUNT(*) etc?<p
class="MsoNormal"> <pclass="MsoNormal">My current fallback is to export a specialized function that returns the table
rowcount for a given FDW table, but that then leaks into the user-application driving these queries.<p
class="MsoNormal"> <pclass="MsoNormal">Thanks in advance!<p class="MsoNormal">Gabe<p class="MsoNormal"> <p
class="MsoNormal"style="background:white"><b><span style="font-size:10.0pt;color:#262626">Gabe Rudy</span></b><span
style="font-size:10.0pt;color:#595959">| VP Product & Engineering | Golden Helix, Inc.</span><span
style="color:black"></span><pclass="MsoNormal"> </div> 

Re: FDW handling count(*) through AnalyzeForeignTable or other constant time push-down

От
Tom Lane
Дата:
"Gabe F. Rudy" <rudy@goldenhelix.com> writes:
> Is there any way to convince Postgres FDW to leverage the analyze row counts or even the "double* totalRowCount"
returnedfrom the AcquireSampleRows callback from my AnalyzeForeignTable function so that it does not do a full-table
scanfor a COUNT(*) etc?
 

No.  In PG's view, ANALYZE-based row counts are imprecise by definition.
        regards, tom lane



Re: FDW handling count(*) through AnalyzeForeignTable or other constant time push-down

От
Simon Riggs
Дата:
On 25 February 2016 at 09:48, Gabe F. Rudy <rudy@goldenhelix.com> wrote:

Hey all,

 

I’m building a FDW around a column-store backend (similar to CStore but for genomic data!).

 

I have tables in the billions of rows, and have a common query pattern of asking for the table size (i.e. SELECT COUNT(*) FROM big_fdw_table; ).

 

This is a read-optimized system in which I know in constant time the exact dimensions of the table.

 

Is there any way to convince Postgres FDW to leverage the analyze row counts or even the “double* totalRowCount” returned from the AcquireSampleRows callback from my AnalyzeForeignTable function so that it does not do a full-table scan for a COUNT(*) etc?

 

My current fallback is to export a specialized function that returns the table row count for a given FDW table, but that then leaks into the user-application driving these queries.


Look at TABLESAMPLE, which does mostly what you're asking.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: FDW handling count(*) through AnalyzeForeignTable or other constant time push-down

От
"Gabe F. Rudy"
Дата:
Ok, I get that.

Really what I am *rooting* for is Aggregate (and Sort By) Push-Down to FDW plugins.

I can already internalize conditional filters for most cases, and doing a count on the filtered results would be
considerablyfaster in my FDW back-end before all the records and Datums have to be constructed for postgres to do the
counting.

Similarly, I'm very excited about the potential for FDW to advertise a-priori sort states, so things like external
merge-sortscan pass-through the request for sorted data for fields in which sorting is a no-op in my backend. 

Importantly my IDs are sorted by definition since they are essentially array indexes into the column-store, so joining
onthem with merge-sort should be blazing fast, but currently time is wasted sorting these pre-sorted fields. 

Just my 2c, and I'll be tracking the 9.6 progress that includes some of these proposals.

Gabe

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Thursday, February 25, 2016 11:21 PM
To: Gabe F. Rudy <rudy@goldenhelix.com>
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] FDW handling count(*) through AnalyzeForeignTable or other constant time push-down

"Gabe F. Rudy" <rudy@goldenhelix.com> writes:
> Is there any way to convince Postgres FDW to leverage the analyze row counts or even the "double* totalRowCount"
returnedfrom the AcquireSampleRows callback from my AnalyzeForeignTable function so that it does not do a full-table
scanfor a COUNT(*) etc? 

No.  In PG's view, ANALYZE-based row counts are imprecise by definition.
        regards, tom lane



Re: FDW handling count(*) through AnalyzeForeignTable or other constant time push-down

От
anantbhasu
Дата:
Hi Gabe, 
Did you get Aggregate Pushdown FDW plugin?
Would be really helpful if you can share some insight on your investigation.
Regards
Anant



--
View this message in context:
http://postgresql.nabble.com/FDW-handling-count-through-AnalyzeForeignTable-or-other-constant-time-push-down-tp5889291p5912699.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.