Re: directory tree query with big planner variation

Поиск
Список
Период
Сортировка
От Axel Rau
Тема Re: directory tree query with big planner variation
Дата
Msg-id A57707A5-9987-4991-B775-6892B3DDD7DC@Chaos1.DE
обсуждение исходный текст
Ответ на Re: directory tree query with big planner variation  (Michael Stone <mstone+postgres@mathom.us>)
Ответы Re: directory tree query with big planner variation  (Michael Stone <mstone+postgres@mathom.us>)
Список pgsql-performance
Am 31.07.2006 um 15:30 schrieb Michael Stone:

> If I understand the intend of this SQL,
Let me show the tables first:
                            Table "bacula.path"        (  65031 rows)
Column |  Type   |                       Modifiers
--------+---------
+-------------------------------------------------------
pathid | integer | not null default nextval('path_pathid_seq'::regclass)
path   | text    | not null      ( complete pathnames of all
directories )
Indexes:
     "path_pkey" PRIMARY KEY, btree (pathid)
     "path_name_idx" btree (path)

                              Table "bacula.file"       (3021903 rows)
    Column   |  Type   |                       Modifiers
------------+---------
+-------------------------------------------------------
fileid     | integer | not null default nextval
('file_fileid_seq'::regclass)
fileindex  | integer | not null default 0
jobid      | integer | not null
pathid     | integer | not null                (FK)
filenameid | integer | not null                (FK)
markid     | integer | not null default 0
lstat      | text    | not null
md5        | text    | not null
Indexes:
     "file_pkey" PRIMARY KEY, btree (fileid)
     "file_fp_idx" btree (filenameid, pathid)
     "file_jobid_idx" btree (jobid)
     "file_path_idx" btree (pathid)

                                Table "bacula.filename" ( 160559 rows)
    Column   |  Type   |                           Modifiers
------------+---------
+---------------------------------------------------------------
filenameid | integer | not null default nextval
('filename_filenameid_seq'::regclass)
name       | text    | not null
Indexes:
     "filename_pkey" PRIMARY KEY, btree (filenameid)
     "filename_name_idx" btree (name)

And now the query;

Task: Return the names of subdirectories and files immediately below
a given path. For each none-empty subdirectory return children=true.
The 1st part of the union selects all subdirecories (per regex) and
the flatfiles contained in them plus one entry for the subdirectory
itself (left outer joins). More than one joined filename means: "The
subdirectory has children".
The 2nd part of the union returns all flatfiles, contained in the
given path.
The surrounding SELECT removes the given path and the trailing "/"
keeping only the subdirectory names from the pathnames, so they can
be merged with the flatfile names.

> you're pulling all the entries
> in a directory in two parts. The first
(second)
> part (files) is fairly straightforward. The second
(first)
> part (directories) consists of pulling any file whose parent is a
> subdirectory of the directory you're looking for (this is *all*
> children of the directory, since you have to retrieve every element
> that begins with the directory, then discard those that have an
> additional / in their name), counting how many of these there are
> for each subdirectory, and discarding those results except for a
> binary (yes there are children or no there aren't). This is a lot
> of useless work to go through, and is going to be slow if you've
> got a lot of stuff in a subdirectory.
I agree, but did not yet find another way.
> An alternative approach would be, for each directory, to store all
> its children (files and subdirectories) along with a flag
> indicating which it is. This would allow you to create the
> collapsed tree view without walking all the children of a
> subdirectory.
Perhaps in a temporary table?
>
> Assuming you can't make changes to the schema, what about the query?
Can be changed.
> You've got this:
Please reconsider your proposals with the above

> It's hard to say without knowing what's actually *in* the tables,
> but the existing query definately doesn't scale well for what I
> think it's trying to do.
>
> Mike Stone
Axel
Axel Rau, ☀Frankfurt , Germany                       +49-69-951418-0



В списке pgsql-performance по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: sub select performance due to seq scans
Следующее
От: Michael Stone
Дата:
Сообщение: Re: directory tree query with big planner variation