Обсуждение: Filling Missing Primary Key Values

Поиск
Список
Период
Сортировка

Filling Missing Primary Key Values

От
Rich Shepard
Дата:
   I've a table (from a client, not created here) with a column that should
be the primary key, but not all rows have a value for this attribute. The
column format is VARCHAR(12) and has a variety of values, such as 96-A000672
and 9612-0881 (probably assigned by different analytical laboratories).

   A simple sequence of numbers would do the job of replacing NULL values.
What is the most parsimonious way to replace NULLs with unique values for
this column? I also need to add such values for a new set of data that I'm
in the process of translating from spreadsheet format to the table
structure.

Rich


Re: Filling Missing Primary Key Values

От
Chris Travers
Дата:
On Thu, Aug 11, 2011 at 11:47 AM, Rich Shepard <rshepard@appl-ecosys.com> wrote:
>  I've a table (from a client, not created here) with a column that should
> be the primary key, but not all rows have a value for this attribute. The
> column format is VARCHAR(12) and has a variety of values, such as 96-A000672
> and 9612-0881 (probably assigned by different analytical laboratories).
>
>  A simple sequence of numbers would do the job of replacing NULL values.
> What is the most parsimonious way to replace NULLs with unique values for
> this column? I also need to add such values for a new set of data that I'm
> in the process of translating from spreadsheet format to the table
> structure.
>
The simplest seems to me to be a sequence and use nextval() to
populate the null values.  The major advantage would be that the
sequence could stay around in case you need it again.  So for example:

create sequence my_varchar_values;

UPDATE my_table set my_varchar =
nextval('my_varchar_values')::varchar(12) where my_varchar IS NULL;

You could also use windowing functions to get rid of the sequence, but
the queries become a lot more complicated.  For example, see
http://stackoverflow.com/questions/4358613/using-window-functions-in-an-update-statement

Best Wishes,
Chris Travers

Re: Filling Missing Primary Key Values

От
Rich Shepard
Дата:
On Thu, 11 Aug 2011, Chris Travers wrote:

> The simplest seems to me to be a sequence and use nextval() to populate
> the null values. The major advantage would be that the sequence could stay
> around in case you need it again. So for example:
>
> create sequence my_varchar_values;

> UPDATE my_table set my_varchar =
> nextval('my_varchar_values')::varchar(12) where my_varchar IS NULL;

Chris,

   I was wondering if this was the best approach since I have new data to add
to the table. Don't need a starting value, eh?

Many thanks,

Rich

Re: Filling Missing Primary Key Values

От
Chris Travers
Дата:
On Thu, Aug 11, 2011 at 12:34 PM, Rich Shepard <rshepard@appl-ecosys.com> wrote:
> On Thu, 11 Aug 2011, Chris Travers wrote:
>
>> The simplest seems to me to be a sequence and use nextval() to populate
>> the null values. The major advantage would be that the sequence could stay
>> around in case you need it again. So for example:
>>
>> create sequence my_varchar_values;
>
>> UPDATE my_table set my_varchar =
>> nextval('my_varchar_values')::varchar(12) where my_varchar IS NULL;
>
> Chris,
>
>  I was wondering if this was the best approach since I have new data to add
> to the table. Don't need a starting value, eh?
>
>
TBH, it's the approach I would use.  It creates one additional
database object but the queries are simpler and thus more
maintainable.

Best Wishes,
Chris Travers

Re: Filling Missing Primary Key Values

От
David Johnston
Дата:
On Aug 11, 2011, at 15:08, Chris Travers <chris.travers@gmail.com> wrote:

> On Thu, Aug 11, 2011 at 11:47 AM, Rich Shepard <rshepard@appl-ecosys.com> wrote:
>>  I've a table (from a client, not created here) with a column that should
>> be the primary key, but not all rows have a value for this attribute. The
>> column format is VARCHAR(12) and has a variety of values, such as 96-A000672
>> and 9612-0881 (probably assigned by different analytical laboratories).
>>
>>  A simple sequence of numbers would do the job of replacing NULL values.
>> What is the most parsimonious way to replace NULLs with unique values for
>> this column? I also need to add such values for a new set of data that I'm
>> in the process of translating from spreadsheet format to the table
>> structure.
>>

The technical aspect is covered but consider using one or two characters as a prefix related to the data source.
You'vealready taken the hit for using a text data type so you might as well take advantage of it.  Even if you have a
sourcefield this can be useful.  The only costraint is you limit the number of sequence values you can use (per
source).

If you use the sequence you can always reset it between imports.

David J.

Re: Filling Missing Primary Key Values

От
Steve Atkins
Дата:
On Aug 11, 2011, at 12:34 PM, Rich Shepard wrote:

> On Thu, 11 Aug 2011, Chris Travers wrote:
>
>> The simplest seems to me to be a sequence and use nextval() to populate
>> the null values. The major advantage would be that the sequence could stay
>> around in case you need it again. So for example:
>>
>> create sequence my_varchar_values;
>
>> UPDATE my_table set my_varchar =
>> nextval('my_varchar_values')::varchar(12) where my_varchar IS NULL;
>
> Chris,
>
>  I was wondering if this was the best approach since I have new data to add
> to the table. Don't need a starting value, eh?

This will fail if any of the existing values are integers in the range that
you're inserting - and it may fail in the future, as you add new records
if they clash with existing entries.

It's still a good way to go, but might need some care or some tweaking -
adding a prefix, maybe.

Cheers,
  Steve


Re: Filling Missing Primary Key Values

От
Rich Shepard
Дата:
On Thu, 11 Aug 2011, Steve Atkins wrote:

> This will fail if any of the existing values are integers in the range that
> you're inserting - and it may fail in the future, as you add new records
> if they clash with existing entries.

Steve/Chris/Dave:

   I had not looked in deatil at that column before. Having just done this, I
see that it's really a laboratory number, not a unique sample ID. So, I
renamed sample_id to lab_nbr, added a sample_id column, created the sequence
sample_id_seq, updated the table with it, then added the constraint that
sample_id is the primary key.

   Thank you all very much!

Rich