Re: Populating large tables with occasional bad values
From: Craig Ringer
Subject: Re: Populating large tables with occasional bad values
Msg-id: 4852C609.3060109@postnewspapers.com.au
In response to: Re: Populating large tables with occasional bad values ("John T. Dow" <john@johntdow.com>)
List: pgsql-jdbc
John T. Dow wrote:

> Fifth - now what? I tried to commit at this point, hoping that the
> first insert statements would be committed, but they are not. Actually, I
> was hoping that the entire batch would be processed and then I could see
> which individual statements didn't insert (using the update counts) and
> then redo them individually, but I think that is wishful thinking.

The first error causes a transaction rollback. The only way the JDBC driver could skip the failed operation and continue would be to wrap each statement in a savepoint and, if there's an error, issue a rollback to the last savepoint. That's not ideal from a performance point of view.

> Best possible solution with batches? Submit the batch, learn how many
> were accepted before the first error, resubmit just those. Process the
> problem statement by itself. Repeat with the statements left in the
> original batch until none are left, then create a new batch and begin
> anew. If a batch, or portion thereof, has no exception, then of course
> commit.

IMO the best solution with batches is to use batches to store your planned operations in a staging table using INSERTs that cannot fail under normal circumstances. Then issue a single PL/PgSQL call or a bulk UPDATE ... SELECT with an appropriate WHERE clause to apply the updates and catch problems.

The approach you describe will work, but it'll be a bit ugly and not very fast. If failures are very rare you might find it to be OK, but your suggestion of using only 10 statements per batch suggests that failures aren't *that* rare.

> In the posting I quote below, the implication is that you can learn of
> multiple statements in the batch that failed, but I didn't see that
> happening.

AFAIK that's database-specific. The docs I read suggested that DBs are permitted either to stop processing and return info on the first bad row, or to continue processing and return info on all rows.
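The retry bookkeeping described above can be sketched without touching a database, since the JDBC spec only allows two shapes for the update counts attached to a BatchUpdateException: a short array covering just the statements that ran before the failure (stop-at-first-error, PostgreSQL's behaviour), or a full-length array with EXECUTE_FAILED markers (keep-going). This is a hedged illustration, not the pgjdbc driver's own code; the helper name is made up:

```java
import java.sql.BatchUpdateException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class BatchRetry {
    // Given the update counts from a BatchUpdateException and the original
    // batch size, return the zero-based indices of statements that still
    // need attention: explicit failures, plus everything past the end of
    // the array (never attempted, including the statement that failed
    // when the driver stops at the first error).
    public static List<Integer> statementsToRetry(int[] updateCounts, int batchSize) {
        List<Integer> retry = new ArrayList<>();
        for (int i = 0; i < updateCounts.length; i++) {
            if (updateCounts[i] == Statement.EXECUTE_FAILED) {
                retry.add(i);
            }
        }
        for (int i = updateCounts.length; i < batchSize; i++) {
            retry.add(i);
        }
        return retry;
    }

    public static void main(String[] args) {
        // Simulate PostgreSQL-style stop-at-first-error: 10 statements
        // submitted, only the first 3 executed before the failure.
        BatchUpdateException e = new BatchUpdateException(
                "simulated failure", "23505", 0, new int[] {1, 1, 1});
        // Statement 3 failed; statements 4..9 were never attempted.
        System.out.println(statementsToRetry(e.getUpdateCounts(), 10));
        // prints [3, 4, 5, 6, 7, 8, 9]
    }
}
```

Everything the helper flags would then be resubmitted, with the head of that list processed on its own to isolate the bad row.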
As PostgreSQL cannot just keep going after an error within a transaction unless you use savepoints, it chooses to do the former.

It might be cool if Pg supported an automatic savepoint mode where every statement updated an implicit savepoint. If a statement failed you could ROLLBACK TO LAST STATEMENT. If possible the savepoint would be replaced with each statement, so there'd be no ever-growing set of savepoints to worry about. With this capability you could use proper transactional isolation but also achieve some error handling within the transaction.

Personally, though, I either restructure my SQL to avoid the potential errors or use PL/PgSQL to handle them. The app has to be capable of reissuing failed transactions anyway, so sometimes an app-level transaction do-over is quite enough. There might be situations in which that hypothetical feature could be handy, though.

> It seems that postgres just quits at the first problem.

Yes, by design. In a transaction, if a statement fails then without a savepoint in place there is, AFAIK, no way to just pretend the bad statement was never issued.

I don't know if it'll keep going if you enable autocommit. It might; it depends on how the JDBC driver does batches and how the server processes them. Your performance will suffer with autocommit on - though not as badly as when you're paying high round-trip latencies - and you won't be able to roll back if something nasty happens. If you don't care about things like automatic rollback on network dropout, you should probably test batches with autocommit on and see how they behave.

--
Craig Ringer
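The PL/PgSQL approach mentioned above works because each BEGIN ... EXCEPTION block runs under an implicit savepoint, so a trapped error rolls back only that statement, not the caller's transaction. A minimal sketch, with a hypothetical table and a unique-key violation as the "occasional bad value":

```sql
-- Hypothetical example: the accounts table and column names are made up.
-- Returns false instead of aborting the transaction on a duplicate key.
CREATE OR REPLACE FUNCTION try_insert_account(p_id integer, p_name text)
RETURNS boolean AS $$
BEGIN
    INSERT INTO accounts (id, name) VALUES (p_id, p_name);
    RETURN true;
EXCEPTION WHEN unique_violation THEN
    -- Rolled back to the block's implicit savepoint; the surrounding
    -- transaction keeps going, and the caller can log or redirect the row.
    RETURN false;
END;
$$ LANGUAGE plpgsql;
```

A loader can then call this per staged row (or from a single set-based pass over the staging table) and commit the whole batch once, collecting the false returns as the list of problem rows.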