Re: Sorted union

Поиск
Список
Период
Сортировка
От Scott Lamb
Тема Re: Sorted union
Дата
Msg-id 9757F82C-51CC-4185-B364-FC23C267D976@slamb.org
обсуждение исходный текст
Ответ на Re: Sorted union  ("Merlin Moncure" <merlin.moncure@rcsonline.com>)
Список pgsql-performance
On Nov 3, 2005, at 8:20 AM, Merlin Moncure wrote:
> select t, (select count(*) from  transaction where t between happened
> and when_stopped) from
> (
>     select ((generate_series(1,60) * scale)::text::interval) + '12:00
> pm'::time as t
> ) q;

Wow. I hadn't known about generate_series, but there are a bunch of
places I've needed it.

As cool as this is, though, I don't think it helps me. There's
another event-driven graph that I need. For lack of a better name, I
call it the slot graph. Every single transaction is graphed as a
horizontal line from its start time to its end time, with a vertical
line at the start and stop. Successful, timed out, and failed
transactions are green, black, and red, respectively. I use it in a
couple different ways:

(1) on short timescales, it's nice to look at individual
transactions. My tester will max out at either a rate or a
concurrency. If I'm having problems, I'll get bursts of timeouts.
This graph is the one that makes it clear why - it shows how things
align, etc. Actually, even for longer timespans, this is still
helpful - it's nice to see that most of the slots are filled with
timing-out transactions when the rate falls.

(2) It can show you if something affects all of the transactions at
once. When we did a database failover test, we saw a bunch of
failures (as expected; our application isn't responsible for
retries). This graph is the one that showed us that _all_
transactions that were active at a specific time failed and that no
other transactions failed. (There was a sharp vertical line of reds
and blacks in the larger block of greens).

I wish I could just show these to you, rather than describing them.
It's all proprietary data, though. Maybe soon I'll have similar
graphs of my open source SSL proxy.

But the point is, I don't think I can represent this information
without sending every data point to my application. I assign slots by
the start time and free them by the stop time.

But I think there is something I can do: I can just do a query of the
transaction table sorted by start time. My graph tool can keep a
priority queue of all active transactions, keyed by the stop time.
Whenever it grabs a new event, it can peek at the next start time but
check if there are any stop times before it. Then at the end, it can
pick up the rest of the stop times. The concurrency will never exceed
a few thousand, so the additional CPU time and memory complexity are
not a problem. As a bonus, I will no longer need my index on the stop
time. Dropping it will save a lot of disk space.

Thanks for getting me off the "I need a fast query that returns these
exact results" mindset. It is good to step back and look at the big
picture.

Mind you, I still think PostgreSQL should be able to perform that
sorted union fast. Maybe sometime I'll have enough free time to take
my first plunge into looking at a database query planner.

Regards,
Scott

--
Scott Lamb <http://www.slamb.org/>



В списке pgsql-performance по дате отправления:

Предыдущее
От: "Merlin Moncure"
Дата:
Сообщение: Re: Sorted union
Следующее
От: "Merlin Moncure"
Дата:
Сообщение: Re: Sorted union