Discussion: Scaling PostgreSQL-9


Scaling PostgreSQL-9

From
"sandeep prakash dhumale"
Date:
Hello All,

Need some help in scaling PostgreSQL:

I have a table with 400M records and 5 int columns, with an index on only one column.

Rows are updated by a Perl script which takes 10k numbers in one transaction and fires a single UPDATE per number in a loop, keeping track of the result returned. If zero rows are affected, it does an INSERT at a later stage. In short, if the record is already present in the DB it gets updated, and if not it gets inserted. More than 80% of the records are always already in the DB, so updates dominate.
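
For illustration, a minimal sketch of that per-number pattern; the table and column names are hypothetical, the actual schema is not shown here:

    -- Hypothetical schema: subscribers(mobile_no, status, expiry_date).
    -- For each of the 10k numbers the script effectively runs:
    UPDATE subscribers
    SET    status = 1, expiry_date = '2010-12-31'
    WHERE  mobile_no = 9123456789;

    -- If the UPDATE affected 0 rows, an INSERT follows at a later stage:
    INSERT INTO subscribers (mobile_no, status, expiry_date)
    VALUES (9123456789, 1, '2010-12-31');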

We need to speed up this process, as it takes about 150 seconds to complete a 10k batch. From the database logs, each update takes about 15 ms on average.

I tried a bulk DELETE of 1M numbers followed by a COPY of the same, but no luck so far: the DELETE and the COPY each take more than an hour.

Few Details:

PostgreSQL 9.0.0 on openSUSE 11.0 (64-bit); shared_buffers = 2.5GB, effective_cache_size = 20GB, checkpoint_segments = 200 (raised for bulk loading).

Server: Dell, dual quad-core, 32GB RAM; DB partition on RAID 10 and pg_xlog on RAID 1.

P.S. Previously we had a Slony read-only slave on 8.4.2, where the DELETE was fast (about 7 min) and the COPY took about 6 min. We moved to PostgreSQL 9 with a read-only standby slave to remove the overhead caused by Slony's triggers (the slave was also always lagging during bulk loads on the master), in the hope of speeding up the process.

Any help would be much appreciated...

With Regards
sandy




Re: Scaling PostgreSQL-9

From
Vick Khera
Date:
On Tue, Sep 28, 2010 at 6:31 AM, sandeep prakash dhumale <sandy9940@rediffmail.com> wrote:
> I have a table with 400M records and 5 int columns, with an index on
> only one column.

How is your data used?  Is the update done by the primary key?  Are the queries segmented in some way that may divide the data based on one of the other columns?

You should investigate using partitions to hold your data.  I'd recommend at least 100 partitions.  I've done this with great success by dividing some tables along one of the foreign keys.  My table was just a pure relation relating the PKs of two other tables.  After analyzing the queries that were most often run, we decided to split along the one which resulted in the fewest partitions being referenced per search.

By splitting, we reduced a 200M+ row table into 100 tables of ~2M rows, and the constraint exclusion (or altering the queries to directly access the proper partition) reduced our query times dramatically.
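
A rough sketch of that layout as it would look on PostgreSQL 9.0, which only offers inheritance-based partitioning (the table name, columns, and ranges below are assumptions for illustration, not taken from the thread):

    -- Parent table holds no data itself; children carry the rows.
    CREATE TABLE subscribers (
        mobile_no   bigint NOT NULL,
        status      int,
        expiry_date date
    );

    -- One child per mobile-number range; the CHECK constraint is what
    -- lets constraint exclusion skip partitions that cannot match.
    CREATE TABLE subscribers_p00 (
        CHECK (mobile_no >= 9000000000 AND mobile_no < 9010000000)
    ) INHERITS (subscribers);
    CREATE INDEX subscribers_p00_mobile_idx ON subscribers_p00 (mobile_no);

    CREATE TABLE subscribers_p01 (
        CHECK (mobile_no >= 9010000000 AND mobile_no < 9020000000)
    ) INHERITS (subscribers);
    CREATE INDEX subscribers_p01_mobile_idx ON subscribers_p01 (mobile_no);

    -- ...and so on for the remaining ranges. Inserts are routed to the
    -- right child either by a trigger on the parent or directly by the
    -- application.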

Re: Re: [GENERAL] Scaling PostgreSQL-9

From
"Sandy"
Date:


On Tue, 28 Sep 2010 17:45:16 +0530, Vick Khera wrote:

> On Tue, Sep 28, 2010 at 6:31 AM, sandeep prakash dhumale wrote:
>
>> I have a table with 400M records and 5 int columns, with an index on
>> only one column.
>
> How is your data used?  Is the update done by the primary key?  Are the
> queries segmented in some way that may divide the data based on one of
> the other columns?
>
> You should investigate using partitions to hold your data.  I'd
> recommend at least 100 partitions.  I've done this with great success by
> dividing some tables along one of the foreign keys.  My table was just a
> pure relation relating the PKs of two other tables.  After analyzing the
> queries that were most often run, we decided to split along the one
> which resulted in the fewest partitions being referenced per search.
>
> By splitting, we reduced a 200M+ row table into 100 tables of ~2M rows,
> and the constraint exclusion (or altering the queries to directly access
> the proper partition) reduced our query times dramatically.

Thanks Vick for your reply,

We did investigate partitioning, but it doesn't look like an ideal candidate for this case; perhaps you can shed some more light on it.

The table contains unique mobile numbers, and updates are keyed on the mobile number. Initially we partitioned by range of mobile series and ended up with about 50 partitions (which can be increased to 100 as per your suggestion). The problem we faced was that updates were still slow, since the update was on the mobile number while the CHECK constraint was on the mobile series. Moreover, SELECT queries with an IN clause of random mobile numbers may end up scanning all the partitions.

The table has mobile number, status, and expiry date. I cannot partition on expiry date, as all SELECTs are on mobile number.


Please suggest...



Sandy



Re: Scaling PostgreSQL-9

From
Vick Khera
Date:
On Tue, Sep 28, 2010 at 8:56 AM, Sandy <sandy9940@rediffmail.com> wrote:
> The table has mobile number, status, and expiry date. I cannot partition
> on expiry date, as all SELECTs are on mobile number.

Then partition on the mobile number.  If your updates and queries are all tied to that, then it is the ideal candidate.

You should also make sure that you have proper indexes on each partition, so the query executor can quickly decide that a given table is of no use (you really don't want a sequential scan on each partition).

You will get speedups if you can achieve one or both of: getting the query planner to use constraint exclusion to eliminate partitions, and getting the index scans to quickly skip over partitions which will not have the data you want.
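
A quick way to check both, assuming the inheritance-based layout sketched earlier in the thread (names are illustrative):

    -- 'partition' is the default in 9.0 and applies constraint exclusion
    -- to queries over inheritance children; make sure it is not 'off'.
    SET constraint_exclusion = partition;

    -- The plan should scan only the child whose CHECK constraint covers
    -- this number, using that child's index rather than a seq scan.
    EXPLAIN
    SELECT status, expiry_date
    FROM   subscribers
    WHERE  mobile_no = 9012345678;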

Re: Scaling PostgreSQL-9

From
"Igor Neyman"
Date:

> -----Original Message-----
> From: sandeep prakash dhumale [mailto:sandy9940@rediffmail.com]
> Sent: Tuesday, September 28, 2010 6:32 AM
> To: pgsql-general@postgresql.org
> Subject: Scaling PostgreSQL-9
>
> Hello All,
>
> Need some help in scaling PostgreSQL:
>
> I have a table with 400M records and 5 int columns, with an index
> on only one column.
>
> Rows are updated by a Perl script which takes 10k numbers in one
> transaction and fires a single UPDATE per number in a loop, keeping
> track of the result returned. If zero rows are affected, it does an
> INSERT at a later stage. In short, if the record is already present
> in the DB it gets updated, and if not it gets inserted. More than
> 80% of the records are always already in the DB, so updates
> dominate.
>
> We need to speed up this process, as it takes about 150 seconds to
> complete a 10k batch. From the database logs, each update takes
> about 15 ms on average.
>

Your problem is that you process one record at a time in your loop,
meaning you make 10k round trips to the database to process 10k
records.

Try creating a "staging" table in the database and copying all the
records from your source into it, e.g. using the COPY command if your
source is a file. Then, with a couple of SQL statements:

INSERT ... WHERE NOT EXISTS (SELECT ...)
UPDATE ... WHERE EXISTS ...

insert the new records and update the existing ones. This way you work
with data sets inside the database, which should be much faster than a
procedural Perl script.
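
A minimal sketch of that set-based approach, reusing the hypothetical subscribers table from earlier in the thread (the staging columns and the file path are placeholders):

    -- Load the whole batch in one round trip instead of one statement per row.
    CREATE TEMP TABLE staging (
        mobile_no   bigint,
        status      int,
        expiry_date date
    );

    COPY staging FROM '/tmp/batch.csv' WITH CSV;  -- or \copy from psql

    -- Update the rows that already exist ...
    UPDATE subscribers s
    SET    status      = st.status,
           expiry_date = st.expiry_date
    FROM   staging st
    WHERE  s.mobile_no = st.mobile_no;

    -- ... then insert the ones that do not.
    INSERT INTO subscribers (mobile_no, status, expiry_date)
    SELECT st.mobile_no, st.status, st.expiry_date
    FROM   staging st
    WHERE  NOT EXISTS (
        SELECT 1 FROM subscribers s WHERE s.mobile_no = st.mobile_no
    );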


> I tried a bulk DELETE of 1M numbers followed by a COPY of the same,
> but no luck so far: the DELETE and the COPY each take more than an
> hour.
>

How did you do the copy? Again using a Perl script looping through 1M
records one at a time?



> Few details:
>
> PostgreSQL 9.0.0 on openSUSE 11.0 (64-bit); shared_buffers = 2.5GB,
> effective_cache_size = 20GB, checkpoint_segments = 200 (raised for
> bulk loading).
>
> Server: Dell, dual quad-core, 32GB RAM; DB partition on RAID 10
> and pg_xlog on RAID 1.
>
> P.S. Previously we had a Slony read-only slave on 8.4.2, where the
> DELETE was fast (about 7 min) and the COPY took about 6 min. We
> moved to PostgreSQL 9 with a read-only standby slave to remove the
> overhead caused by Slony's triggers (the slave was also always
> lagging during bulk loads on the master), in the hope of speeding
> up the process.
>
> Any help would be much appreciated...
>
> With Regards
> sandy
>
>


Regards,
Igor Neyman