Re: stored proc and inserting hundreds of thousands of rows

From: Samuel Gendler
Subject: Re: stored proc and inserting hundreds of thousands of rows
Date:
Msg-id: BANLkTi=ML2MbgEP-pRdAsHaCFp9s3w+DWA@mail.gmail.com
In response to: Re: stored proc and inserting hundreds of thousands of rows  (Jeff Janes <jeff.janes@gmail.com>)
Responses: Re: stored proc and inserting hundreds of thousands of rows
List: pgsql-performance


On Sat, Apr 30, 2011 at 5:12 PM, Jeff Janes <jeff.janes@gmail.com> wrote:


gist indices are designed to make this type of thing fast, by using
techniques to rule out most of those comparisons without actually
performing them.  I don't know enough about the
guts of either your distance function or the gist indexes to know if
you can do it this way, but if you can it would certainly be the way
to go.

It is definitely a good idea to consider a GiST index for eliminating most of a large dataset, if applicable.  Do a little reading on the topic and, hopefully, its applicability (or not) will become apparent.
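The pruning idea Jeff describes can be illustrated outside the database. The sketch below is not how GiST actually works internally, just an analogy in plain Python: a cheap bounding test rules out most candidates before the expensive distance function ever runs, which is the same principle an index uses to avoid comparing the query against every row. All names here are illustrative.

```python
import math

def expensive_distance(a, b):
    # Stand-in for a costly user-defined distance function.
    return math.dist(a, b)

def neighbors_prefiltered(query, points, radius):
    """Return points within 'radius' of 'query', pruning cheaply first.

    The bounding-box test discards most candidates without calling the
    expensive comparison, analogous to how a gist index prunes whole
    subtrees of the dataset before evaluating the real predicate.
    """
    qx, qy = query
    candidates = [p for p in points
                  if abs(p[0] - qx) <= radius and abs(p[1] - qy) <= radius]
    return [p for p in candidates
            if expensive_distance(query, p) <= radius]
```

The win comes from the first pass being much cheaper than the second; the expensive function only ever sees the few survivors.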

However, as someone who has built a number of ad servers over the years, for several of the larger ad networks, the first thing I'd do is separate your ad serving from real-time interaction with your database, no matter what the underlying technology.  If your ad serving is dependent on your database, it means that hiccups in the database will have an impact on ad serving, which is rarely tolerable.  And god forbid you should need to take the db down for a period of maintenance.  The reliability and performance required of most ad servers is greater than what should reasonably be expected of a relational database, particularly when other processes access the database, as is the case with your system.  The first rule of ad serving is that no outage of backend systems should ever be able to prevent or impact front-end ad serving.

Some kind of in-memory cache of doc/ad mappings which the ad server interacts with will serve you in good stead, and it will be much easier to scale horizontally than most relational db architectures lend themselves to.  If you have an ever-increasing set of documents and ads, you'll inevitably wind up 'sharding' your dataset across multiple db hosts in order to maintain performance, which creates a number of maintenance complexities.  It is easier to maintain a single database and do analytics over a single data source, but insulate it from the real-time performance requirements of your ad serving.

Even something as simple as a process that pushes the most recent doc/ad mappings into a memcache instance could be sufficient, and you can scale your memcache across as many hosts as necessary to deliver the lookup latencies you require, no matter how large the dataset.  Similarly, if you are updating the database from the ad server with each ad served in order to record an impression or click, you'll be far better off logging those and then processing the logs in bulk on a periodic basis.
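As a rough sketch of that architecture, here is what the two halves might look like in Python. This is illustrative only: `AdCache` stands in for a memcache tier refreshed out of band by a periodic job, and `ImpressionLog` buffers events locally and hands them to the database in bulk (e.g. via a single COPY) instead of one write per impression. All class and function names are my own invention, not any particular library's API.

```python
import time

class AdCache:
    """In-memory doc -> ads mapping, refreshed from the db out of band.

    The ad server reads only from this tier, so a database hiccup or
    maintenance window cannot stall front-end serving.
    """
    def __init__(self):
        self._mappings = {}

    def refresh(self, rows):
        # 'rows' would come from a periodic bulk query against the db;
        # the swap is atomic from the reader's point of view.
        self._mappings = dict(rows)

    def ads_for(self, doc_id):
        return self._mappings.get(doc_id, [])


class ImpressionLog:
    """Buffer impressions locally; flush them to the db in bulk."""
    def __init__(self, flush_fn, batch_size=1000):
        self._buf = []
        self._flush_fn = flush_fn     # e.g. a bulk COPY into postgres
        self._batch_size = batch_size

    def record(self, doc_id, ad_id):
        self._buf.append((doc_id, ad_id, time.time()))
        if len(self._buf) >= self._batch_size:
            self.flush()

    def flush(self):
        if self._buf:
            self._flush_fn(self._buf)
            self._buf = []
```

The point of the split is that the serving path touches only `AdCache`, and the database sees a handful of large writes instead of a write per request.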
If subsequent impressions are dependent upon what has already been served historically, then use your memcache instance (or whatever structure you eventually choose to utilize) to handle those lookups.  This gives you the convenience and flexibility of a relational system with SQL for access, but without the constraints of the capabilities of a single host limiting real-time performance of the system as a whole.
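To make the history-dependent case concrete, a frequency-cap check is one common example of serving decisions that depend on what has already been shown. A minimal sketch, assuming per-user serve counts live in the cache tier rather than the relational store (names are hypothetical):

```python
from collections import defaultdict

def next_ad(serve_counts, candidates, cap=3):
    """Pick the first candidate still under its frequency cap.

    'serve_counts' stands in for per-user history held in the cache
    tier (memcache or similar), so the serving decision never has to
    round-trip to the database.
    """
    for ad in candidates:
        if serve_counts[ad] < cap:
            serve_counts[ad] += 1
            return ad
    return None  # every candidate is capped for this user
```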


In the pgsql-performance list, by date sent:

Previous
From: Joel Reymont
Date:
Message: Re: stored proc and inserting hundreds of thousands of rows
Next
From: Joel Reymont
Date:
Message: Re: stored proc and inserting hundreds of thousands of rows