Re: A space-efficient, user-friendly way to store categorical data

Поиск
Список
Период
Сортировка
От Andrew Kane
Тема Re: A space-efficient, user-friendly way to store categorical data
Дата
Msg-id CACDdp+YG_NP9isQ9Ey2KPb_sKoRH+MUj51kJzaGm6rTocKTUcg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: A space-efficient, user-friendly way to store categorical data  (Mark Dilger <hornschnorter@gmail.com>)
Список pgsql-hackers
They'd refer to separate enums.

I originally thought an enum was a good comparison for this feature, but I'm no longer sure that it is. A text-based ordering would be desired rather than the label index.

A better comparison may be a two-column lookup table:

-- create
CREATE TABLE cities (id bigserial primary key, name text)
CREATE UNIQUE INDEX ON cities (name);
CREATE TABLE users (city_id bigint);

-- insert
BEGIN;
INSERT INTO cities (name) VALUES ('Chicago') ON CONFLICT (name) DO NOTHING RETURNING id;
INSERT INTO users (city_id) VALUES (<city id returned from earlier>);
COMMIT;

-- select
SELECT * FROM users FROM users INNER JOIN cities ON cities.id = users.city_id WHERE name = 'Chicago';


Ideally, the lookup table could be maintained by Postgres to make reads and writes easier.

-- create
CREATE TABLE users (city text DEDUPED);

-- insert
INSERT INTO users (city) VALUES ('Chicago');

-- query
SELECT * FROM users WHERE city = 'Chicago';

I'm not really sure the best place to store this lookup table.

- Andrew

On Mon, Feb 12, 2018 at 7:11 PM, Mark Dilger <hornschnorter@gmail.com> wrote:

> On Feb 12, 2018, at 6:35 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Andrew Kane <andrew@chartkick.com> writes:
>> Thanks everyone for the feedback. The current enum implementation requires
>> you to create a new type and add labels outside a transaction prior to an
>> insert.
>
> Right ...
>
>> Since enums have a fixed number of labels, this type of feature may be
>> better off as a property you could add to text columns (as Thomas
>> mentions). This would avoid issues with hitting the max number of labels.
>
> ... but you're not saying how you'd avoid the need for prior commit of the
> labels.  The sticking point for enums is that once a value has gotten into
> a btree index, we can't ever lose the ability to compare that value to
> others, or the index will be broken.  So inserting an uncommitted value
> into user tables has to be prevented.
>
> Maybe there's a way to assign the labels so that they can be compared
> without reference to any outside data, but it's not clear to me how
> that would work.

When I implemented this, I wrote the comparators to work on the Oid for
the value, not the string representation.  That works fine.  If you want to
sort the data on the stringified version, cast to text first.  That works well
enough for me, since I'm typically not interested in what sort order is used,
as long as it is deterministic and works for indexing, group by, and so forth.


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: rename sgml files?
Следующее
От: Tatsuo Ishii
Дата:
Сообщение: Re: rename sgml files?