Thread: kind of a bag of attributes in a DB . . .

kind of a bag of attributes in a DB . . .

From
Albretch Mueller
Date:
Say, you get lots of data and their corresponding metadata, which in
some cases may be undefined or undeclared (left as an empty string).
Think of youtube json files or the result of the "file" command.

I need to be able to "instantly" search that metadata and get some
metrics out of it, and I think DBs are best for such jobs.

I know this is not exactly a kosher way to deal with data which can't
be represented in a nice tabular form, but I don't find the idea that
half way off either.

What is the pattern, anti-pattern or whatever relating to such design?

Do you know of such implementations with such data?

lbrtchx



Re: kind of a bag of attributes in a DB . . .

From
Adrian Klaver
Date:
On 9/7/19 5:45 AM, Albretch Mueller wrote:
> Say, you get lots of data and their corresponding metadata, which in
> some cases may be undefined or undeclared (left as an empty string).
> Think of youtube json files or the result of the "file" command.
> 
> I need to be able to "instantly" search that metadata and I think DBs
> are best for such jobs and get some metrics out of it.

Is the metadata uniform or are you dealing with a variety of different data?


> 
> I know this is not exactly a kosher way to deal with data which can't
> be represented in a nice tabular form, but I don't find the idea that
> half way off either.
> 
> What is the pattern, anti-pattern or whatever relating to such design?
> 
> Do you know of such implementations with such data?
> 
> lbrtchx
> 
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: kind of a bag of attributes in a DB . . .

From
Chris Travers
Date:


On Sat, Sep 7, 2019 at 5:17 PM Albretch Mueller <lbrtchx@gmail.com> wrote:
> Say, you get lots of data and their corresponding metadata, which in
> some cases may be undefined or undeclared (left as an empty string).
> Think of youtube json files or the result of the "file" command.
>
> I need to be able to "instantly" search that metadata and I think DBs
> are best for such jobs and get some metrics out of it.
>
> I know this is not exactly a kosher way to deal with data which can't
> be represented in a nice tabular form, but I don't find the idea that
> half way off either.
>
> What is the pattern, anti-pattern or whatever relating to such design?
>
> Do you know of such implementations with such data?

We do this with debug logs stored as JSONB, with some indexing. It works in some limited cases, but you need to have a good sense of the index possibilities and how the indexes actually work.

> lbrtchx
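
To give a feel for "how the indexes actually work": a GIN index, the kind usually put on a JSONB column, is an inverted index. The Python below is only a rough analogy under that assumption, not PostgreSQL's implementation. The sample documents and the names build_inverted_index and search are made up for illustration, and only top-level scalar values are indexed.

```python
from collections import defaultdict


def _entry(key, value):
    # Include the type name so that True and 1 do not collide as dict keys.
    return (key, type(value).__name__, value)


def build_inverted_index(docs):
    """Map every top-level (key, value) pair to the ids of the documents holding it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for key, value in doc.items():
            # Nested objects and arrays are skipped in this sketch.
            if value is None or isinstance(value, (str, int, float, bool)):
                index[_entry(key, value)].add(doc_id)
    return index


def search(index, **criteria):
    """Return the ids of documents that match every key=value criterion."""
    id_sets = [index.get(_entry(k, v), set()) for k, v in criteria.items()]
    return set.intersection(*id_sets) if id_sets else set()


# Hypothetical metadata: only filename and size are present everywhere.
docs = {
    1: {"filename": "a.mp4", "size": 1024, "uploader": "alice", "license": ""},
    2: {"filename": "b.pdf", "size": 2048, "mime": "application/pdf"},
    3: {"filename": "c.mp4", "size": 1024, "uploader": "alice"},
}
index = build_inverted_index(docs)
print(sorted(search(index, uploader="alice", size=1024)))  # [1, 3]
```

The lookup never scans the documents themselves; it intersects the id sets stored under each requested pair. That is why such an index helps with "has this key with this value" questions and much less with range or substring searches.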




--
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor lock-in.

Re: kind of a bag of attributes in a DB . . .

From
Albretch Mueller
Date:
On 9/7/19, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> Is the metadata uniform or are you dealing with a variety of different
> data?

 You can expect all files to have a filename and size, but their
kinds (the metadata describing them) can be really colorful and wild
when it comes to formatting.

 lbrtchx



Re: kind of a bag of attributes in a DB . . .

From
Adrian Klaver
Date:
On 9/10/19 9:59 AM, Albretch Mueller wrote:
> On 9/7/19, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>> Is the metadata uniform or are you dealing with a variety of different
>> data?
> 
>   You can expect for all files to have a filename and size, but their
> kinds (the metadata describing them) can be really colorful and wild
> when it comes to formatting.

If there is no rhyme or reason to the metadata I am not sure how you 
could come up with an efficient search strategy. Seems it would be a 
brute search over everything.

> 
>   lbrtchx
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: kind of a bag of attributes in a DB . . .

From
Albretch Mueller
Date:
On 9/10/19, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> If there is no rhyme or reason to the metadata I am not sure how you
> could come up with an efficient search strategy. Seems it would be a
> brute search over everything.

 Not exactly. Say some things have colours but no weight. You could
still group them as being "weighty" and then tell about how heavy they
are, with the colorful ones you could specify the colours and then see
if there is some correlation between weights and colours ...
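
 A minimal sketch of that grouping idea, in plain Python rather than in a database. The items, the attribute names and the helpers has and mean_weight_by_colour are all invented for illustration; an attribute that is missing or left as an empty string is treated as undeclared.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical "bag of attributes": every item has a name, the rest varies.
items = [
    {"name": "brick", "weight": 2.5, "colour": "red"},
    {"name": "feather", "weight": 0.25, "colour": "white"},
    {"name": "idea", "colour": ""},  # colour left undeclared (empty string)
    {"name": "anvil", "weight": 50.0},
    {"name": "apple", "weight": 0.5, "colour": "red"},
]


def has(item, key):
    """An attribute counts as present if it exists and is neither None nor ''."""
    return item.get(key) not in (None, "")


def mean_weight_by_colour(items):
    """Average weight per colour, using only items that declare both attributes."""
    groups = defaultdict(list)
    for item in items:
        if has(item, "weight") and has(item, "colour"):
            groups[item["colour"]].append(item["weight"])
    return {colour: mean(weights) for colour, weights in groups.items()}


weighty = [item["name"] for item in items if has(item, "weight")]
print(weighty)                       # ['brick', 'feather', 'anvil', 'apple']
print(mean_weight_by_colour(items))  # {'red': 1.5, 'white': 0.25}
```

 Items lacking one of the attributes simply drop out of the per-colour figures, so no search over "everything" is needed once the attribute of interest is named.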

 lbrtchx



Re: kind of a bag of attributes in a DB . . .

From
Adrian Klaver
Date:
On 9/11/19 9:46 AM, Albretch Mueller wrote:
> On 9/10/19, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>> If there is no rhyme or reason to the metadata I am not sure how you
>> could come up with an efficient search strategy. Seems it would be a
>> brute search over everything.
> 
>   Not exactly. Say some things have colours but no weight. You could
> still group them as being "weighty" and then tell about how heavy they
> are, with the colorful ones you could specify the colours and then see
> if there is some correlation between weights and colours ...

It would help to see some sample data, otherwise any answer would be 
pure speculation.

> 
>   lbrtchx
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: kind of a bag of attributes in a DB . . .

From
Albretch Mueller
Date:
 Just download a bunch of JSON info files from YouTube data feeds.

 Actually, does PostgreSQL have a JSON driver or import feature?

 The metadata contained in JSON files would require more than one
small database, but such an import feature should be trivial.

 C



Re: kind of a bag of attributes in a DB . . .

From
Adrian Klaver
Date:
On 9/14/19 2:06 AM, Albretch Mueller wrote:
>   just download a bunch of json info files from youtube data Feeds
> 
>   Actually, does PostgreSQL have a JSON driver or import feature?

Not sure what you mean by above?

Postgres has json(b) data types that you can import JSON into:

https://www.postgresql.org/docs/11/datatype-json.html
> 
>   the metadata contained in json files would require more than one
> small databases, but such an import feature should be trivial

Again, not sure I understand why small databases are required?

> 
>   C
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: kind of a bag of attributes in a DB . . .

From
Adrian Klaver
Date:
On 9/14/19 2:06 AM, Albretch Mueller wrote:
>   just download a bunch of json info files from youtube data Feeds
> 
>   Actually, does PostgreSQL have a JSON driver or import feature?

I'm working without a net(coffee) and so I forgot to mention that for 
Python there is:

http://initd.org/psycopg/docs/extras.html?highlight=json

Not sure if this is what you are looking for or not.
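
A rough sketch of what such an import could look like from Python, assuming a layout that nobody in this thread has proposed: the two attributes every file has (filename, size) become ordinary columns and everything else goes into one jsonb column. The table name file_meta, the index name and the helpers to_row and load are hypothetical. The code only needs a DB-API connection that uses %s placeholders, such as one from psycopg2; it passes the JSON as text with a cast, where the Json adapter from the page linked above would be the psycopg2-specific alternative.

```python
import json

# Hypothetical schema: fixed columns for what is always there, jsonb for the rest.
CREATE_SQL = """
CREATE TABLE IF NOT EXISTS file_meta (
    id       bigserial PRIMARY KEY,
    filename text   NOT NULL,
    size     bigint NOT NULL,
    meta     jsonb  NOT NULL DEFAULT '{}'
);
CREATE INDEX IF NOT EXISTS file_meta_meta_idx ON file_meta USING gin (meta);
"""

INSERT_SQL = (
    "INSERT INTO file_meta (filename, size, meta) VALUES (%s, %s, %s::jsonb)"
)


def to_row(doc):
    """Split one metadata dict into (filename, size, JSON text of the remainder)."""
    # Design choice of this sketch: attributes left as '' are treated as absent.
    rest = {
        key: value
        for key, value in doc.items()
        if key not in ("filename", "size") and value != ""
    }
    return doc["filename"], int(doc["size"]), json.dumps(rest, sort_keys=True)


def load(conn, docs):
    """Create the table if needed and insert all docs through connection conn."""
    with conn.cursor() as cur:
        cur.execute(CREATE_SQL)
        cur.executemany(INSERT_SQL, [to_row(doc) for doc in docs])
    conn.commit()


sample = {"filename": "a.mp4", "size": "1024", "uploader": "alice", "license": ""}
print(to_row(sample))  # ('a.mp4', 1024, '{"uploader": "alice"}')
```

With that layout, a search such as SELECT filename FROM file_meta WHERE meta @> '{"uploader": "alice"}' is the kind of query a GIN index on meta can serve.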

> 
>   the metadata contained in json files would require more than one
> small databases, but such an import feature should be trivial
> 
>   C
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com



Re: kind of a bag of attributes in a DB . . .

From
Chris Travers
Date:


On Sat, Sep 14, 2019 at 5:11 PM Albretch Mueller <lbrtchx@gmail.com> wrote:
>  just download a bunch of json info files from youtube data Feeds
>
>  Actually, does PostgreSQL have a JSON driver or import feature?

Sort of... There are a bunch of features around the JSON and JSONB data types which could be useful.

>  the metadata contained in json files would require more than one
> small databases, but such an import feature should be trivial

It is not at all trivial, for a bunch of reasons inherent to the JSON specification: how to handle duplicate keys, for example.

However, writing an import for JSON objects into a particular database is indeed trivial.

>  C
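
The duplicate-key problem is easy to reproduce. The sketch below uses Python's standard json module, which by default keeps the last value for a repeated key; per the PostgreSQL documentation, jsonb also keeps only the last value, while the json type stores the text as given. The hook name reject_duplicates and the sample text are made up for illustration.

```python
import json


def reject_duplicates(pairs):
    """object_pairs_hook that raises instead of silently dropping repeated keys."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


text = '{"title": "first", "title": "second"}'

# Default behaviour: no error, the last value silently wins.
print(json.loads(text))  # {'title': 'second'}

# With the hook, an importer can decide to refuse such input.
try:
    json.loads(text, object_pairs_hook=reject_duplicates)
except ValueError as exc:
    print(exc)  # duplicate key: 'title'
```

Whether to reject, keep the last value or keep all of them is exactly the kind of decision a generic import feature cannot make on the importer's behalf.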




--
Best Wishes,
Chris Travers

Efficito:  Hosted Accounting and ERP.  Robust and Flexible.  No vendor lock-in.