Re: Naming-scheme for db-files

Поиск
Список
Период
Сортировка
От Oleg Bartunov
Тема Re: Naming-scheme for db-files
Дата
Msg-id Pine.GSO.4.44.0208291401340.8881-100000@ra.sai.msu.su
обсуждение исходный текст
Ответ на Re: Naming-scheme for db-files  (Oleg Bartunov <oleg@sai.msu.su>)
Список pgsql-general
To continue my previous message.

I just load test data (i've used in README.tsearch) and looked at sizes:

-rw-------    1 postgres users    23191552 Aug 29 14:08 53016937
-rw-------    1 postgres users    81059840 Aug 29 14:08 52639027

Table titles occupies 80Mb, index on txtidx column occupies 22Mb.
After doing

test=# select title  into titles_tmp from titles;
SELECT

I got size of table 'titles' without txtidx field

-rw-------    1 postgres users    30105600 Aug 29 14:14 53016938

So, txtidx column itself occupies about 50Mb. I don't see any giga-sizes !

after running 'vacuum full analyze' I got:

-rw-------    1 postgres users    30105600 Aug 29 14:26 53016938
-rw-------    1 postgres users    36880384 Aug 29 14:26 53016937
-rw-------    1 postgres users    51494912 Aug 29 14:26 52639027
pg@zen:/usr/local/pgsql/data/base/52638986$ oid2name -d test -x|grep 53016937
53016937 = t_idx
pg@zen:/usr/local/pgsql/data/base/52638986$ oid2name -d test -x|grep 52639027
53016934 = pg_toast_52639027
53016936 = pg_toast_52639027_idx
52639027 = titles
pg@zen:/usr/local/pgsql/data/base/52638986$ oid2name -d test -x|grep 53016938
53016938 = titles_tmp

So, actual size of 'txtidx' field is 20 Mb !

---------------------------------------------------------------------------

test=# select count(*) from titles;
s count
--------
 377905
(1 row)

test=# select version();
                            version
---------------------------------------------------------------
 PostgreSQL 7.2.2 on i686-pc-linux-gnu, compiled by GCC 2.95.3
(1 row)

test=# \d titles
                Table "titles"
  Column  |          Type          | Modifiers
----------+------------------------+-----------
 title    | character varying(256) |
 titleidx | txtidx                 |
Indexes: t_idx

There is one performance tips: Use CLUSTER command to speedup search query !
There was thread about recently in -hackers. In 7.2 CLUSTER works but
requires some addtional operations, in 7.3 it should works fine without
problems.


    Regards,

        Oleg

On Thu, 29 Aug 2002, Oleg Bartunov wrote:

> I didn't follow this thread, but I'm a little bit surprised !
> txtdix itself is just a parsed data from corresponding column !
> and txtidx stores only unique lexems, so size of txtidx column should
> be lesser than original one ! I'd be interested to see more details
> (if you could send me your data and sql script). Also,
> did you use oid2name from contrib to see sizes of tables and indices ?
> What is an output of 'select version()' ?
> 100,000 rows is not a big deal.
>
>     regards,
>
>         Oleg
>
>
> On Thu, 29 Aug 2002, Markus Wollny wrote:
>
> > Just a try on using txtidx for a speedy search over the e-mail-fields.
> > No success, though - fti would have been better for the task (like
> > finding all users who have got an aol.com-adress).
> >
> > > -----UrsprNngliche Nachricht-----
> > > Von: Martijn van Oosterhout [mailto:kleptog@svana.org]
> > > Gesendet: Donnerstag, 29. August 2002 01:18
> > > An: Markus Wollny
> > > Cc: pgsql-general@postgresql.org
> > > Betreff: Re: [GENERAL] Naming-scheme for db-files
> > >
> > >
> > > On Wed, Aug 28, 2002 at 11:25:44PM +0200, Markus Wollny wrote:
> > > > After dropping the mentioned column emailindex of type txtidx (via
> > > > copying the data to a temporary table and renaming this table
> > > > afterwards), disk usage of that table evaporated from
> > > 1862MB to a humble
> > > > 61MB. And emailindex was completely empty... Very strange,
> > > but ~60MB is
> > > > very much more more like what I had estimated the size of
> > > that table to
> > > > be.
> > >
> > > What on earth is:
> > >
> > > emailindex txtidx,
> > >
> > > Anyway?
> > >
> > > >     -----UrsprУМngliche Nachricht-----
> > > >     Von: Markus Wollny
> > > >     Gesendet: Mi 28.08.2002 17:04
> > > >     An: Martijn van Oosterhout
> > > >     Cc: pgsql-general@postgresql.org
> > > >     Betreff: Re: [GENERAL] Naming-scheme for db-files
> > > >
> > > >
> > > >
> > > >     Hi!
> > > >
> > > >     Yes, I run vacuum every night - and log-output indicates no
> > > > errors at
> > > >     all. Yet I am indeed quite puzzled about the size of this table.
> > > > Is
> > > >     there some way of finding out which column consumes so much
> > > > space?
> > > >
> > > >     select count(*) from ct_com_user return 95858 rows.
> > > >     The CREATE-statement for this table is as follows:
> > > >
> > > >     CREATE TABLE ct_com_user (
> > > >       user_id numeric(10, 0),
> > > >       login varchar(1000),
> > > >       password varchar(1000),
> > > >       status numeric(10, 0),
> > > >       rights varchar(20) DEFAULT 'r',
> > > >       firstname varchar(1000),
> > > >       firstname_visible numeric(1, 0) DEFAULT 0,
> > > >       lastname varchar(1000),
> > > >       lastname_visible numeric(1, 0) DEFAULT 0,
> > > >       clan varchar(1000),
> > > >       clan_visible numeric(1, 0) DEFAULT 0,
> > > >       street varchar(1000),
> > > >       street_visible numeric(1, 0) DEFAULT 0,
> > > >       zipcode varchar(1000),
> > > >       zipcode_visible numeric(1, 0) DEFAULT 0,
> > > >       city varchar(1000),
> > > >       city_visible numeric(1, 0) DEFAULT 0,
> > > >       country varchar(1000),
> > > >       country_visible numeric(1, 0) DEFAULT 0,
> > > >       phone1 varchar(1000),
> > > >       phone1_visible numeric(1, 0) DEFAULT 0,
> > > >       phone2 varchar(1000),
> > > >       phone2_visible numeric(1, 0) DEFAULT 0,
> > > >       mobile varchar(1000),
> > > >       mobile_visible numeric(1, 0) DEFAULT 0,
> > > >       fax varchar(1000),
> > > >       fax_visible numeric(1, 0) DEFAULT 0,
> > > >       email1 varchar(1000),
> > > >       email1_visible numeric(1, 0) DEFAULT 0,
> > > >       email2 varchar(1000),
> > > >       email2_visible numeric(1, 0) DEFAULT 0,
> > > >       icq varchar(1000),
> > > >       icq_visible numeric(1, 0) DEFAULT 0,
> > > >       homepage varchar(1000),
> > > >       homepage_visible numeric(1, 0) DEFAULT 0,
> > > >       description varchar(1000),
> > > >       description_visible numeric(1, 0) DEFAULT 0,
> > > >       hobbies varchar(1000),
> > > >       hobbies_visible numeric(1, 0) DEFAULT 0,
> > > >       signature1 varchar(4000),
> > > >       signature2 varchar(4000),
> > > >       signature3 varchar(4000),
> > > >       forum_view varchar(20) DEFAULT 'flat',
> > > >       temp_password varchar(100),
> > > >       registered timestamptz,
> > > >       last_login timestamptz,
> > > >       created timestamptz DEFAULT ('now'::text)::timestamp(6) with
> > > > time
> > > >     zone,
> > > >       lines numeric(3, 0) DEFAULT 400,
> > > >       lines_forum numeric(3, 0) DEFAULT 20,
> > > >       forum_lines numeric(3, 0) DEFAULT 20,
> > > >       forum_smileys varchar(50) DEFAULT 'enhanced',
> > > >       site_id numeric(10, 0) DEFAULT 0,
> > > >       flag_id numeric(10, 0) DEFAULT 0,
> > > >       forum_quoting varchar(50) DEFAULT 'enhanced',
> > > >       forum_flatpostings numeric(3, 0) DEFAULT 10,
> > > >       forum_images numeric(1, 0) DEFAULT 2,
> > > >       user_image numeric(1, 0) DEFAULT 0,
> > > >       user_image_visible numeric(1, 0) DEFAULT 0,
> > > >       chat_password varchar(50),
> > > >       chat_last_visit timestamptz,
> > > >       authorpoints_visible numeric(1, 0) DEFAULT 0,
> > > >       emailindex txtidx,
> > > >       CONSTRAINT idx_pk_ct_com_user UNIQUE (user_id),
> > > >       CONSTRAINT idx_u_ct_com_user_login UNIQUE (login)
> > > >     ) WITH OIDS;
> > > >
> > > >     As hardly anyone actually uses a signature, hobby or
> > > > description, I
> > > >     cannot imagine how this should amount to almost 2 GB of data
> > > > (1862MB) -
> > > >     this would mean roughly 20k of data per row average - and no way
> > > > have we
> > > >     got this amount of data - the absolute maximum data per row as
> > > > defined
> > > >     would be about 30k, I guess, but that would mean that nearly all
> > > > of our
> > > >     users would use nearly all available space and I know that this
> > > > is quite
> > > >     far from true. I suspect it may have something to do with
> > > > emailindex of
> > > >     the txtidx-type... As it's not absolutely necessary, I'll try
> > > > and drop
> > > >     this column and see what I get...
> > > >
> > > >     Regards,
> > > >
> > > >             Markus
> > > >
> > > >     > -----UrsprУМngliche Nachricht-----
> > > >     > Von: Martijn van Oosterhout [mailto:kleptog@svana.org]
> > > >     > Gesendet: Mittwoch, 28. August 2002 16:30
> > > >     > An: Markus Wollny
> > > >     > Cc: pgsql-general@postgresql.org
> > > >     > Betreff: Re: [GENERAL] Naming-scheme for db-files
> > > >     >
> > > >     >
> > > >     > On Wed, Aug 28, 2002 at 03:54:45PM +0200, Markus Wollny wrote:
> > > >     > > Hi!
> > > >     > >
> > > >     > > As I was just checking disk-usage of a database (PostgreSQL
> > > >     > 7.2.1), I
> > > >     > > stumbled over some files named with a trailing .1 added to
> > > > the usual
> > > >     > > oid. Now if a table 'example' with oid 12345 exists, what
> > > >     > does the file
> > > >     > > 12345.1 contain exactly? I didn't find anything about
> > > >     > .1-files in the
> > > >     > > documentation...
> > > >     >
> > > >     > Postgres splits files at 1GB. The .1 file would be the second
> > > >     > part of the
> > > >     > file. When that also reaches 1GB, you'll get a .2 file.
> > > >     >
> > > >     > You're doing VACUUM [FULL] regularly, right?
> > > >     >
> > > >     > --
> > > >     > Martijn van Oosterhout   <kleptog@svana.org>
> > > >     > http://svana.org/kleptog/
> > > >     > > There are 10 kinds of people in the world, those that can do
> > > > binary
> > > >     > > arithmetic and those that can't.
> > > >     >
> > > >
> > > >     ---------------------------(end of
> > > > broadcast)---------------------------
> > > >     TIP 2: you can get off all lists at once with the unregister
> > > > command
> > > >         (send "unregister YourEmailAddressHere" to
> > > > majordomo@postgresql.org)
> > > >
> > > >
> > > >
> > > > ---------------------------(end of
> > > broadcast)---------------------------
> > > > TIP 1: subscribe and unsubscribe commands go to
> > > majordomo@postgresql.org
> > >
> > > --
> > > Martijn van Oosterhout   <kleptog@svana.org>
> > > http://svana.org/kleptog/
> > > > There are 10 kinds of people in the world, those that can do binary
> > > > arithmetic and those that can't.
> > >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 5: Have you checked our extensive FAQ?
> >
> > http://www.postgresql.org/users-lounge/docs/faq.html
> >
>
>     Regards,
>         Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>

    Regards,
        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


В списке pgsql-general по дате отправления:

Предыдущее
От: "Markus Wollny"
Дата:
Сообщение: Re: Naming-scheme for db-files
Следующее
От: Martijn van Oosterhout
Дата:
Сообщение: Re: Free space mapping (was Re: Multi-Versions and Vacuum)