Discussion: Extracting SQL from logs in a usable format
Hi all,

I have a project where I need to be able to capture every query from a production system into a file such that I can "replay" them on a staging system. Does such a thing already exist, or should I start writing my own log parser?

Thank you for your help.

Chris Ernst
eSoft, Inc.
On 18/12/09, Chris Ernst (cernst@esoft.com) wrote:
> I have a project where I need to be able to capture every query from a
> production system into a file such that I can "replay" them on a staging
> system. Does such a thing already exist or should I start writing my
> own log parser?

I'm sure there are a number of log replay systems already in existence. Perhaps you could use Londiste, and introduce a lag to the replication process if it is required?
http://skytools.projects.postgresql.org/doc/londiste.ref.html#toc17

A very simple way of doing this is to log all of the SQL statements by setting the postgresql.conf parameter log_min_duration_statement to 0. You can then easily parse the SQL statements out of the log file. I have done this before quite successfully when wishing to replay a long set of SQL statements to test an upgrade of a PostgreSQL server.

Rory
--
Rory Campbell-Lange
Director
rory@campbell-lange.net

Campbell-Lange Workshop
www.campbell-lange.net
0207 6311 555
3 Tottenham Street London W1T 2AF
Registered in England No. 04551928
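[For reference, the postgresql.conf settings for Rory's suggestion might look like the fragment below. The log_line_prefix shown is my own suggestion, not from the thread — a session identifier in the prefix is what later makes it possible to attribute statements to individual connections:]

```
# Log every statement by setting the duration threshold to 0 ms
log_min_duration_statement = 0

# Include timestamp, pid and session id so that statements can be
# attributed to individual client connections when parsing the log
log_line_prefix = '%m [%p] %c '
```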
Rory Campbell-Lange wrote:
> On 18/12/09, Chris Ernst (cernst@esoft.com) wrote:
>> I have a project where I need to be able to capture every query from a
>> production system into a file such that I can "replay" them on a staging
>> system. Does such a thing already exist or should I start writing my
>> own log parser?
>
> I'm sure there are a number of log replay systems already in existence.
> Perhaps you could use Londiste, and introduce a lag to the replication
> process if it is required?
> http://skytools.projects.postgresql.org/doc/londiste.ref.html#toc17
>
> A very simple way of doing this is to log all of the SQL statements by
> setting the postgresql.conf parameter log_min_duration_statement to 0.
> You can then easily parse out the SQL statements from the log file.

Hi Rory,

Thank you for the quick reply. Londiste isn't really an option, as it (apparently) would only capture INSERT, UPDATE and DELETE queries. I want to capture every query that is run, including SELECTs. Plus, the production master is already running Slony-I, and I don't think they will play nicely together.

My goal is to be able to replay a set of actual production traffic on the staging server, starting from a snapshot at the point where the statement logging began, then make some changes (tweak settings, upgrade versions, make DDL changes, etc.) and rerun the same set of statements to analyze the results with pgFouine.

I started writing my own log parser to pull the statements from the postgres logs, but as I get into the details, it's not quite as straightforward as I had thought. Keeping track of which client connections have prepared queries, merging in the correct arguments and maintaining concurrency are all critical and far from trivial.
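[The argument-merging problem Chris describes can at least be sketched. On 8.2 and later, an extended-protocol execution shows up in the log as an "execute <name>: <sql>" line followed by a "DETAIL:  parameters: $1 = '...'" line. The helper below is my own illustration, not code from this thread, and it ignores edge cases such as parameters containing "$n"-like text:]

```python
import re

def merge_params(statement, detail):
    """Substitute $n placeholders in a logged 'execute' statement with
    the literal values from the matching DETAIL: parameters line."""
    # DETAIL lines look like:  parameters: $1 = '42', $2 = 'O''Brien'
    # Values are single-quoted literals (with '' escaping) or NULL.
    params = dict(re.findall(r"\$(\d+) = ('(?:[^']|'')*'|NULL)", detail))
    # Replace higher-numbered placeholders first so that substituting
    # $1 does not clobber the "$1" prefix of $12.
    for n in sorted(params, key=int, reverse=True):
        statement = statement.replace("$" + n, params[n])
    return statement

sql = merge_params("SELECT * FROM users WHERE id = $1 AND name = $2",
                   "parameters: $1 = '42', $2 = 'O''Brien'")
print(sql)  # SELECT * FROM users WHERE id = '42' AND name = 'O''Brien'
```

This still leaves the genuinely hard parts Chris mentions — tracking which session prepared which statement, and concurrency — unaddressed.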
Basically, I'm curious whether anyone has already created something that does this, or am I treading into uncharted waters? I've been googling around a bit and haven't come up with anything yet.

Thanks again,

- Chris
Chris Ernst wrote:
> I started writing my own log parser to pull the statements from the
> postgres logs, but as I get into the details, it's not quite as
> straightforward as I had thought. Keeping track of which client
> connections have prepared queries, merging the correct arguments and
> maintaining concurrency are all critical and far from trivial.

I hope you're using the CSV format logs, which should make the job a lot easier than the standard text one.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com
Tsung [http://tsung.erlang-projects.org/] has a nice pg proxy module. It records your "query stream" and is able to replay it later.
2009/12/18 Chris Ernst <cernst@esoft.com>
Hi all,
I have a project where I need to be able to capture every query from a
production system into a file such that I can "replay" them on a staging
system. Does such a thing already exist or should I start writing my
own log parser?
Thank you for your help.
Chris Ernst
eSoft, Inc.
--
Filip Rembiałkowski
JID,mailto:filip.rembialkowski@gmail.com
http://filip.rembialkowski.net/
Greg Smith wrote:
> Chris Ernst wrote:
>> I started writing my own log parser to pull the statements from the
>> postgres logs, but as I get into the details, it's not quite as
>> straightforward as I had thought. Keeping track of which client
>> connections have prepared queries, merging the correct arguments and
>> maintaining concurrency are all critical and far from trivial.
>
> I hope you're using the CSV format logs, which should make the job a lot
> easier than the standard text one.

HOLY COW! I hadn't even thought of that. Excellent point. Thank you, Greg.

- Chris
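[With csvlog enabled (log_destination = 'csvlog'), the server writes each log entry as one well-formed CSV record, so a standard CSV reader does the heavy lifting. A minimal sketch — my own illustration, not code from this thread; the column indexes assume the 8.4 csvlog layout, where session_id is column 6 and message is column 14:]

```python
import csv
import io

# 0-based indexes into an 8.4 csvlog row (abridged to the fields used here)
SESSION_ID, MESSAGE = 5, 13

def extract_statements(csvlog):
    """Yield (session_id, sql) pairs for simple-protocol statements
    found in a csvlog stream."""
    for row in csv.reader(csvlog):
        msg = row[MESSAGE]
        if msg.startswith("statement: "):
            yield row[SESSION_ID], msg[len("statement: "):]

# A single fabricated csvlog row, for illustration:
sample = io.StringIO(
    '2009-12-19 10:00:00 MST,chris,prod,1234,"10.0.0.5:5432",'
    '4b2cdead.4d2,1,SELECT,2009-12-19 09:59:00 MST,2/15,0,LOG,00000,'
    '"statement: SELECT 1",,,,,,,,""\n'
)
for sid, sql in extract_statements(sample):
    print(sid, sql)  # prints: 4b2cdead.4d2 SELECT 1
```

Extended-protocol ("execute ...") entries would still need the parameter merging discussed earlier, but the quoting and multi-line-statement headaches of the plain-text format disappear.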
Hmm.. That does look very interesting. The only thing that concerns me is where it says it supports "Basic Queries (Extended queries not yet supported)". I'm not sure what is meant by "Extended queries". Any idea?

Thank you for the pointer, Filip. I'll check it out.

- Chris

Filip Rembiałkowski wrote:
> Tsung [http://tsung.erlang-projects.org/] has a nice pg proxy module. It
> records your "query stream" and is able to replay it later.
Chris Ernst wrote:
> I have a project where I need to be able to capture every query from a
> production system into a file such that I can "replay" them on a staging
> system. Does such a thing already exist or should I start writing my
> own log parser?

I am currently developing such a beast; it is still quite alpha. If you are interested I can send you a copy. I'll try to publish it once it is sufficiently beta.

Yours,
Laurenz Albe
On Sun, Dec 20, 2009 at 9:37 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote:
Chris Ernst wrote:
> I have a project where I need to be able to capture every query from a
> production system into a file such that I can "replay" them on a staging
> system. Does such a thing already exist or should I start writing my
> own log parser?
I am not sure if it's still available, but there used to be a nice tool for doing the same; I believe it was named "Playr", by myYearbook.
Regards,
Shoaib Mir
http://shoaibmir.wordpress.com/
Hey guys,

Albe Laurenz wrote:
> Chris Ernst wrote:
>> I have a project where I need to be able to capture every query
>> from a production system into a file such that I can "replay"
>> them on a staging system. Does such a thing already exist or
>> should I start writing my own log parser?

I have the same problem. Our goal would be to capture a representative period from an 8.1 system and replay it against an 8.4 to find any possible issues.

> I am currently developing such a beast, it is currently still quite
> alpha. If you are interested I can send you a copy. I'll try to
> publish it once it is sufficiently beta.

Interesting project, but I have one big issue under 8.1 with advanced queries (prepare / bind / execute): we cannot extract the values of parameters from the logs with <8.2, so I am not able to parse and rebuild queries from logs under 8.1.

I started something as well, based on tcpdump/tshark output (tshark -VT text ...). My project is at a pre-alpha step, but at least I can extract both simple queries and advanced queries with their parameters. The only known limitations of this approach are:
- cannot extract from SSL connections
- ISTM tshark only supports the PostgreSQL V3 protocol, so it only works on > 7.2 IIRC

--
Jehan-Guillaume (ioguix) de Rorthais
www.dalibo.com
Shoaib Mir wrote:
> On Sun, Dec 20, 2009 at 9:37 AM, Albe Laurenz
> <laurenz.albe@wien.gv.at> wrote:
>
>> Chris Ernst wrote:
>>> I have a project where I need to be able to capture every query from a
>>> production system into a file such that I can "replay" them on a staging
>>> system. Does such a thing already exist or should I start writing my
>>> own log parser?
>
> I am not sure if it's still available but there used to be a nice tool
> for doing the same, I guess it was named "Playr" by myYearbook.

AFAIK, it's a dead project and its home at myYearbook is unavailable. However, you can still access their download page:
https://area51.myyearbook.com/downloads/

I did find the svn repo with some googling some time ago... I did some quick investigation on it, and it seems it doesn't work with extended queries either.

--
Jehan-Guillaume (ioguix) de Rorthais
www.dalibo.com
On Fri, Dec 18, 2009 at 02:08:07PM -0700, Chris Ernst wrote:
> Hi all,
>
> I have a project where I need to be able to capture every query from a
> production system into a file such that I can "replay" them on a staging
> system. Does such a thing already exist or should I start writing my
> own log parser?

Log to CSV format. But:
1. not all parts of the query will always be logged in the query itself
   (prepared statements)
2. replaying the queries on a 2nd machine doesn't guarantee that you will
   get the same data afterwards

depesz

--
Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007
On 20/12/2009 7:59 PM, hubert depesz lubaczewski wrote:
> Log to CSV format. But:
> 1. not all parts of the query will always be logged in the query itself
>    (prepared statements)
> 2. replaying the queries on a 2nd machine doesn't guarantee that you will
>    get the same data afterwards

... because of global settings (DATESTYLE etc.) that may affect interpretation of the data, and because the log order of statements can't accurately represent concurrent execution.

With the same server settings, the same starting values for sequences etc., no time-based function use, no non-deterministic function use (eg: random()) and no non-deterministic interactions between concurrent transactions, you should be able to get data that's the same when examined at the SQL level. It might not be in the same order, though, and it certainly won't be the same on-disk.

So ... why do you need this replay? What sorts of limitations can you live with?

It sounds like concurrency is a concern, and that's one that will give you pain, because the Pg logs don't record statement start _and_ end time, nor do they record at what points along the execution timeline the backend got a chance to do work. So it's hard to know about lock acquisition order, among other things.

--
Craig Ringer
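[One common compromise, given Craig's caveats, is to preserve only per-session statement order and let sessions run concurrently, accepting that the interleaving (and hence lock acquisition order) will differ from production. A sketch of that idea — my own illustration, not from this thread; `replay` and `execute` are hypothetical names, and `execute` would wrap a real database call in practice:]

```python
import threading

def replay(entries, execute):
    """Group logged (session_id, sql) pairs by session and replay each
    session's statements, in order, on its own thread. This approximates
    the original concurrency but NOT the original timing or lock order."""
    sessions = {}
    for sid, sql in entries:
        sessions.setdefault(sid, []).append(sql)

    def run(stmts):
        for sql in stmts:
            execute(sql)  # in real use: run the statement on the staging DB

    threads = [threading.Thread(target=run, args=(stmts,))
               for stmts in sessions.values()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

A replay driven this way keeps each connection's view of its own prepared statements and transactions consistent, which is the minimum needed for tools like pgFouine to produce comparable numbers.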
Jehan-Guillaume (ioguix) de Rorthais wrote:
>> I am currently developing such a beast, it is currently still quite
>> alpha. If you are interested I can send you a copy. I'll try to
>> publish it once it is sufficiently beta.
>
> Interesting project, but I have one big issue under 8.1 with advanced
> queries (prepare / bind / execute): we cannot extract the values of
> parameters from the logs with <8.2, so I am not able to parse and
> rebuild queries from logs under 8.1.

Hmm, that doesn't bother me. 8.1 is pretty old now, and who knows when my program will be stable :^)

> I started something as well, based on tcpdump/tshark output (tshark
> -VT text ...). My project is at a pre-alpha step, but at least I can
> extract both simple queries and advanced queries with their parameters.
> The only known limitations of this approach are:
> - cannot extract from SSL connections
> - ISTM tshark only supports the PostgreSQL V3 protocol, so it only
>   works on > 7.2 IIRC

I guess each approach has some limitations. The limitations I encountered for log parsing:
- COPY data are not logged.
- Fast Path API calls are not logged (that includes large object functions).
- Unless you have log_min_messages at DEBUG2 or better, you cannot determine when exactly a prepared statement was parsed.

Yours,
Laurenz Albe
On 19 Dec 2009, at 16:20, Chris Ernst wrote:
> Hmm.. That does look very interesting. The only thing that concerns me
> is where it says it supports "Basic Queries (Extended queries not yet
> supported)". I'm not sure what is meant by "Extended queries". Any idea?

I think it refers to the Extended Query support in the frontend/backend protocol, as in the documentation:
http://www.postgresql.org/docs/8.4/static/protocol-flow.html#PROTOCOL-FLOW-EXT-QUERY

  The extended query protocol breaks down the above-described simple
  query protocol into multiple steps. The results of preparatory steps
  can be re-used multiple times for improved efficiency. Furthermore,
  additional features are available, such as the possibility of
  supplying data values as separate parameters instead of having to
  insert them directly into a query string.

So that's for parse/bind/execute communications, which are used in prepare/execute and queryParam, I think.

--
dim
Dimitri Fontaine wrote:
> I think it refers to the Extended Query support in the frontend/backend
> protocol, as in the documentation:
> http://www.postgresql.org/docs/8.4/static/protocol-flow.html#PROTOCOL-FLOW-EXT-QUERY
>
> So that's for parse/bind/execute communications, which are used in
> prepare/execute and queryParam, I think.

Ouch! You're right. And that would be a deal killer for me. About 90% of the traffic is prepared queries that are run over and over with different parameters.

- Chris
On 21 Dec 2009, at 15:24, Chris Ernst wrote:
> Ouch! You're right. And that would be a deal killer for me. About
> 90% of the traffic is prepared queries that are run over and over with
> different parameters.

The driver project and code are now there, it seems:
http://frihjul.net/pgsql
http://github.com/noss/pgsql/tree

Maybe you could ask the author about supporting the extended protocol; a quick browse tonight shows me prepare/execute support. I'm not sure whether improvements in the pgsql driver would translate to improvements in the Tsung support of it, but it should be about it.

Regards,
--
dim