Thread: Indexes and Primary Keys on Rapidly Growing Tables


Indexes and Primary Keys on Rapidly Growing Tables

From
Alessandro Gagliardi
Date:
New question regarding this seen_its table: it gets over 100 inserts per second, probably many more if you include every time a unique_violation occurs. This flood of data is constant. The commits take too long (upwards of 100 ms, ten times slower than it needs to be!).

What I'm wondering is whether it would be better to insert all of these rows into a separate table with no constraints (call it daily_seen_its) and then batch-insert them into the real table with something like:

INSERT INTO seen_its
SELECT user_id, moment_id, MIN(created)
FROM daily_seen_its
WHERE created BETWEEN 'yesterday' AND 'today'
GROUP BY user_id, moment_id;

The idea is that a table with no constraints would be able to accept insertions much faster, and then the primary key could be enforced later. Even better would be if this could happen hourly instead of daily. But first I just want to know if people think that this might be a viable solution or if I'm barking up the wrong tree.
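One wrinkle with the batch step described above: pairs that were already moved into seen_its by an earlier batch would still raise unique_violation. A hedged sketch of one way around that, using an anti-join against the target table (this is an assumption for illustration, not something proposed in the message; table and column names follow the schema quoted below):

```sql
-- Batch move from the unconstrained staging table into seen_its,
-- collapsing intra-batch duplicates (keep the earliest 'created')
-- and skipping pairs already present from earlier batches.
INSERT INTO seen_its (user_id, moment_id, created)
SELECT d.user_id, d.moment_id, MIN(d.created)
FROM daily_seen_its d
WHERE NOT EXISTS (
    SELECT 1
    FROM seen_its s
    WHERE s.user_id = d.user_id
      AND s.moment_id = d.moment_id
)
GROUP BY d.user_id, d.moment_id;
```

On PostgreSQL 9.5 and later, `INSERT ... ON CONFLICT DO NOTHING` would express the same intent more directly, but that feature did not exist at the time of this thread.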

Thanks!
-Alessandro

On Fri, Feb 17, 2012 at 10:34 AM, Alessandro Gagliardi <alessandro@path.com> wrote:
CREATE TABLE seen_its (
  user_id character(24) NOT NULL,
  moment_id character(24) NOT NULL,
  created timestamp without time zone,
  inserted timestamp without time zone DEFAULT now(),
  CONSTRAINT seen_its_pkey PRIMARY KEY (user_id, moment_id)
) WITH (OIDS=FALSE);

CREATE INDEX seen_its_created_idx ON seen_its USING btree (created);

CREATE INDEX seen_its_user_id_idx ON seen_its USING btree (user_id);

Re: Indexes and Primary Keys on Rapidly Growing Tables

From
Josh Berkus
Date:
On 2/20/12 2:06 PM, Alessandro Gagliardi wrote:
> . But first I just want to know if people
> think that this might be a viable solution or if I'm barking up the wrong
> tree.

Batching is usually helpful for inserts, especially if there's a unique
key on a very large table involved.

I suggest also making the buffer table UNLOGGED, if you can afford to.
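An UNLOGGED buffer table skips write-ahead logging, which is what makes its inserts cheaper; the trade-off is that its contents are truncated after a crash, so it only suits data you can afford to lose. A minimal sketch, assuming the staging-table name and columns proposed earlier in the thread:

```sql
-- UNLOGGED: no WAL writes, so high-rate inserts are cheaper.
-- Trade-off: the table is emptied if the server crashes, so it
-- should only ever hold data that can be re-derived or dropped.
CREATE UNLOGGED TABLE daily_seen_its (
  user_id character(24) NOT NULL,
  moment_id character(24) NOT NULL,
  created timestamp without time zone
);
```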

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

Re: Indexes and Primary Keys on Rapidly Growing Tables

From
Alessandro Gagliardi
Date:
I was thinking about that (as per your presentation last week), but my problem is that when I'm building up a series of inserts, if one of them fails (very likely in this case due to a unique_violation) I have to roll back the entire commit. I asked about this on the novice list and was advised to use SAVEPOINTs. That seems a little clunky to me but may be the best way. Would it be realistic to expect this to increase performance tenfold?
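The SAVEPOINT approach mentioned here wraps each insert so that a unique_violation rolls back only that one statement rather than the whole batch. A sketch of the pattern (the values are placeholders, and the error handling would normally live in the client driver):

```sql
BEGIN;

-- Per row: set a savepoint, attempt the insert, release on success.
SAVEPOINT before_row;
INSERT INTO seen_its (user_id, moment_id, created)
VALUES ('some_user_id_24_chars...', 'some_moment_id_24_chars.', now());
-- If the INSERT raises unique_violation, the client instead issues:
--   ROLLBACK TO SAVEPOINT before_row;
-- which leaves the transaction usable for the next row.
RELEASE SAVEPOINT before_row;

-- ...repeat SAVEPOINT / INSERT / RELEASE for each remaining row...

COMMIT;
```

Each savepoint adds a round trip and some server-side bookkeeping, which is part of why de-duplicating in a staging table first (as suggested later in the thread) can end up faster.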

On Mon, Feb 20, 2012 at 3:30 PM, Josh Berkus <josh@agliodbs.com> wrote:
> On 2/20/12 2:06 PM, Alessandro Gagliardi wrote:
> > . But first I just want to know if people
> > think that this might be a viable solution or if I'm barking up the wrong
> > tree.
>
> Batching is usually helpful for inserts, especially if there's a unique
> key on a very large table involved.
>
> I suggest also making the buffer table UNLOGGED, if you can afford to.
>
> --
> Josh Berkus
> PostgreSQL Experts Inc.
> http://pgexperts.com


Re: Indexes and Primary Keys on Rapidly Growing Tables

From
Samuel Gendler
Date:

On Tue, Feb 21, 2012 at 9:59 AM, Alessandro Gagliardi <alessandro@path.com> wrote:
> I was thinking about that (as per your presentation last week) but my problem is that when I'm building up a series of inserts, if one of them fails (very likely in this case due to a unique_violation) I have to rollback the entire commit. I asked about this in the novice forum and was advised to use SAVEPOINTs. That seems a little clunky to me but may be the best way. Would it be realistic to expect this to increase performance by ten-fold?


If you insert into a different table before doing a bulk insert later, you can de-dupe before doing the insertion, eliminating the issue entirely.
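De-duping the staging table before the bulk insert can be a single query. A sketch using PostgreSQL's `DISTINCT ON` (an assumption for illustration; column names follow the schema earlier in the thread, and it keeps the earliest sighting per pair, matching the original proposal's MIN(created)):

```sql
-- One row per (user_id, moment_id): ORDER BY makes the earliest
-- 'created' the row that DISTINCT ON keeps for each pair.
SELECT DISTINCT ON (user_id, moment_id)
       user_id, moment_id, created
FROM daily_seen_its
ORDER BY user_id, moment_id, created;
```

Feeding this query into the `INSERT INTO seen_its ... SELECT` step means the target's primary key is checked against already de-duplicated data.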


Re: Indexes and Primary Keys on Rapidly Growing Tables

From
Alessandro Gagliardi
Date:
True. I implemented the SAVEPOINTs solution across the board. We'll see what kind of difference it makes. If it's fast enough, I may be able to do without that.

On Tue, Feb 21, 2012 at 3:53 PM, Samuel Gendler <sgendler@ideasculptor.com> wrote:

> On Tue, Feb 21, 2012 at 9:59 AM, Alessandro Gagliardi <alessandro@path.com> wrote:
> > I was thinking about that (as per your presentation last week) but my problem is that when I'm building up a series of inserts, if one of them fails (very likely in this case due to a unique_violation) I have to rollback the entire commit. I asked about this in the novice forum and was advised to use SAVEPOINTs. That seems a little clunky to me but may be the best way. Would it be realistic to expect this to increase performance by ten-fold?
>
> if you insert into a different table before doing a bulk insert later, you can de-dupe before doing the insertion, eliminating the issue entirely.