Re: constants for tar header offsets

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: constants for tar header offsets
Дата
Msg-id CA+TgmoZniAFyu-UOt-OtbGnVnc00BhGbN6sMjH_tX+XNoj1a_w@mail.gmail.com
обсуждение исходный текст
Ответ на Re: constants for tar header offsets  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: constants for tar header offsets  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Tue, Apr 18, 2023 at 11:38 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > We have a few different places in the code where we generate or modify
> > tar headers or just read data out of them. The code in question uses
> > one of my less-favorite programming things: magic numbers. The offsets
> > of the various fields within the tar header are just hard-coded in
> > each relevant place in our code. I think we should clean that up, as
> > in the attached patch.
>
> Generally +1, with a couple of additional thoughts:
>
> 1. Is it worth inventing macros for the values of the file type,
> rather than just writing the comment you did?

Might be.

> 2. The header size is defined as 512 bytes, but this doesn't sum to 512:
>
> +       TAR_OFFSET_PREFIX = 345         /* 155 byte string */
>
> Either that field's length is really 167 bytes, or there's some other
> field you didn't document.  (It looks like you may have copied "155"
> from an incorrect existing comment?)

According to my research, it is neither of those, e.g. see

https://www.subspacefield.org/~vax/tar_format.html
https://www.ibm.com/docs/en/zos/2.4.0?topic=formats-tar-format-tar-archives
https://wiki.osdev.org/USTAR

I think that what happened is that whoever designed the original tar
format decided on 512 byte blocks. And the header did not take up the
whole block. The USTAR format is an extension of the original format
which uses more of the block, but still not all of it.

> Yeah, this is adding greppability (a good thing!) but little more.
> However, I'm not convinced it's worth doing more.  It's not like
> this data structure will change anytime soon.

Right. greppability is a major concern for me here, and also bug
surface. Right now, to use the functions in pgtar.h, you need to know
all the header offsets as well as the format and length of each header
field. This centralizes constants for the header offsets, and at least
provides some centralized documentation of the rest. It's not great,
though, because it seems like there's some risk of someone writing new
code and getting confused about whether the length of a certain field
is 8 or 12, for example. A thicker abstraction layer might be able to
avoid or minimize such risks better than what we have, but I don't
really know how to design it, whereas this seems like an obvious
improvement.

--
Robert Haas
EDB: http://www.enterprisedb.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Request for comment on setting binary format output per session
Следующее
От: Tom Lane
Дата:
Сообщение: Re: constants for tar header offsets