Problems implementing TPC-D queries
От | Paul Parker |
---|---|
Тема | Problems implementing TPC-D queries |
Дата | |
Msg-id | 199806040337.WAA09687@noel.cs.rice.edu обсуждение исходный текст |
Список | pgsql-sql |
Hi, I've been working for a while on a research project that attempts to "parallelize" Postgres via a software DSM (more work than expected and not actually useful for any end-user). We want to use TPC-D to model commercial workloads to analyze the speedup and memory behavior. I've gotten most of the queries to work (the inclusion of subselects in 6.3 was a big help and is earnestly appreciated), but I'm still having problems with a couple. Query 12 analyzes the performance of a couple of shipping methods, finding out how many late high-priority and low-priority shipments were made using each method. TPC's definitional query uses CASE in an aggregate, which isn't implemented. One variant uses aggregates (COUNT) inside views, which of course doesn't work. A second variant uses temporary tables instead of views, which I was more hopeful about. Unfortunately, I have been unable to use INSERT INTO to actually insert the tuples from the SELECT. That is: tpc_test=> SELECT tpc_test-> SHIPMODE, COUNT(*) tpc_test-> FROM LATEORDRS0 tpc_test-> WHERE ORDRPRIORITY IN ('1-URGENT', '2-HIGH') tpc_test-> AND shipmode = 'RAIL' tpc_test-> GROUP BY SHIPMODE; shipmode |count ----------+----- RAIL | 44 (1 row) as compared to: tpc_test=> CREATE TABLE QUICK0 (SHIPMODE CHAR(10), HIGH_LINE_COUNT INTEGER); CREATE tpc_test=> tpc_test=> INSERT INTO QUICK0 tpc_test-> SELECT tpc_test-> SHIPMODE, COUNT(*) tpc_test-> FROM LATEORDRS0 tpc_test-> WHERE ORDRPRIORITY IN ('1-URGENT', '2-HIGH') tpc_test-> AND shipmode = 'RAIL' tpc_test-> GROUP BY SHIPMODE; INSERT 105206 1 tpc_test=> SELECT * FROM QUICK0; shipmode|high_line_count --------+--------------- | 0 Note the types are correct (assuming count returns an integer, which is validated below): CREATE VIEW LATEORDRS0 AS SELECT L_SHIPMODE AS SHIPMODE, O_ORDRPRIORITY AS ORDRPRIORITY FROM ORDR, LINEITEM WHERE O_ORDRKEY = L_ORDRKEY AND L_SHIPMODE IN ('TRUCK','RAIL') AND L_COMMITDATE < L_RECEIPTDATE AND L_SHIPDATE < L_COMMITDATE AND L_RECEIPTDATE >= '1993-01-01'::DATETIME AND L_RECEIPTDATE < ('1993-01-01'::DATETIME + '1 YEAR'::TIMESPAN); tpc_test-> \d lineitem Table = lineitem +----------------------------------+----------------------------------+-------+ | Field | Type | Length| +----------------------------------+----------------------------------+-------+ ... | l_shipmode | char() | 10 | In fact, this query works (assuming quick0 has at least one row with a non-NULL shipmode): insert into quick0 select shipmode, count(*) from quick0 group by shipmode ; But this query only adds a blank and zero tuple: insert into quick0 select shipmode, count(*) from lateordrs0 group by shipmode ; select * from quick0; Quite likely this is actually a bug, but I don't know SQL well enough to pronounce it that for sure. The other query's problem is COUNT DISTINCT (unimplemented, I believe). I've seen this mentioned in the archives, but I haven't seen any solutions that would work. They mentioned you could produce a DISTINCT view, but COUNT(*) failed on it (related to the view-aggregate problem??). I suspect INSERT INTO SELECT DISTINCT... might work, however even if it did, I would need to do it for each of some arbitrary number of cases, as per the query, which counts the number of suppliers for each part meeting some specifications: SELECT P_BRAND, P_TYPE, P_SIZE, COUNT(DISTINCT PS_SUPPKEY) AS SUPPLIER_CNT FROM PARTSUPP, PART WHERE P_PARTKEY = PS_PARTKEY AND P_BRAND <> 'Brand#14' AND P_TYPE NOT LIKE 'LARGE PLATED%' AND P_SIZE IN (45, 22, 2, 48, 4, 49, 26, 10) AND PS_SUPPKEY NOT IN (SELECT S_SUPPKEY FROM SUPPLIER WHERE S_COMMENT LIKE '%Better Business Bureau%Complaints%') GROUP BY P_BRAND, P_TYPE, P_SIZE ORDER BY SUPPLIER_CNT DESC, P_BRAND, P_TYPE, P_SIZE; Also, for no reason readily evident to myself, this last query takes 41 cpu hours on a P6-150 w/ 64 M RAM running FreeBSD 2.2 over a 20 M database, after having done vacuum analyze (which improved several queries one or two orders of magnitude), whereas the other queries take a few cpu seconds, a difference of about 1000x. True, the VM size does swell to 57 M, but the working set seems content at <30M, which the otherwise empty machine easily gives it. CREATE TABLE ALL_NATIONS0 (YEAR_CREATE INTEGER, VOLUME FLOAT8, NATION CHAR(25)); INSERT INTO ALL_NATIONS0 SELECT EXTRACT (YEAR FROM O_ORDRDATE)::INTEGER AS YEAR_SEL, L_EXTENDEDPRICE*(1-L_DISCOUNT) AS VOLUME, N2.N_NAME AS NATION FROM PART, SUPPLIER, LINEITEM, ORDR, CUSTOMER, NATION N1, NATION N2, REGION WHERE P_PARTKEY = L_PARTKEY AND S_SUPPKEY = L_SUPPKEY AND L_ORDRKEY = O_ORDRKEY AND O_CUSTKEY = C_CUSTKEY AND C_NATIONKEY = N1.N_NATIONKEY AND N1.N_REGIONKEY = R_REGIONKEY AND R_NAME = 'ASIA' AND S_NATIONKEY = N2.N_NATIONKEY AND O_ORDRDATE > ('1995-01-01'::DATETIME) AND O_ORDRDATE < ('1996-12-31'::DATETIME) AND P_TYPE = 'ECONOMY POLISHED COPPER'; Any help anyone can give is appreciated, Paul ----------- Paul Parker\___________________________________ pparker@cs.rice.edu a.k.a. "The other Paul" \_____________ Second-year grad-student seeking a decent signature quotation
В списке pgsql-sql по дате отправления: