Обсуждение: Advice - indexing on varchar fields where only last x characters known
I have a problem with a field that appears on a number of my tables. The field is the Vehicle Identification Number. Every vehicle has one and it uniquely identifies that vehicle. Traditionally this was a 11 character string but a number of years ago was extended to 17 characters by adding a 6 character prefix. The problem that I have is that these VIN numbers are provided by a number of data systems including manufacturer feeds, logistics companies as well as internal systems. Some use the full 17 character string while others only use the last 11. On top of this, my users are used to only having to type the last 6 characters for speed and usability reasons. However, it means that every time I'm trying to connect various tables up using foreign keys or doing searches I have to make allowences for this which means I'm using things like substring, like, regex etc. all of which are very slow. Can anyone suggest a better / more efficient way of handling these. Gary
Hi,
PostgreSQL does have the ability to index on expressions. Will that help?http://www.postgresql.org/docs/9.1/static/indexes-expressional.html
On Thu, Oct 17, 2013 at 4:50 PM, Gary Stainburn <gary.stainburn@ringways.co.uk> wrote:
I have a problem with a field that appears on a number of my tables.
The field is the Vehicle Identification Number. Every vehicle has one and it
uniquely identifies that vehicle.
Traditionally this was a 11 character string but a number of years ago was
extended to 17 characters by adding a 6 character prefix.
The problem that I have is that these VIN numbers are provided by a number of
data systems including manufacturer feeds, logistics companies as well as
internal systems. Some use the full 17 character string while others only use
the last 11.
On top of this, my users are used to only having to type the last 6 characters
for speed and usability reasons.
However, it means that every time I'm trying to connect various tables up
using foreign keys or doing searches I have to make allowences for this which
means I'm using things like substring, like, regex etc. all of which are very
slow.
Can anyone suggest a better / more efficient way of handling these.
Gary
--
Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql
Re: Advice - indexing on varchar fields where only last x characters known
От
skinner@britvault.co.uk (Craig R. Skinner)
Дата:
On 2013-10-17 Thu 12:20 PM |, Gary Stainburn wrote: > > The problem that I have is that these VIN numbers are provided by a number of > data systems including manufacturer feeds, logistics companies as well as > internal systems. Some use the full 17 character string while others only use > the last 11. > Split the vin into 2/3 columns? vin_prefix (NULL), vin_suffix (NOT NULL) vin_prefix, vin_centre, vin_suffix UNIQUE all together in 1 constraint & index each separately. Maybe... -- Craig Skinner | http://twitter.com/Craig_Skinner | http://linkd.in/yGqkv7
On 18/10/13 00:20, Gary Stainburn wrote: > I have a problem with a field that appears on a number of my tables. > > The field is the Vehicle Identification Number. Every vehicle has one and it > uniquely identifies that vehicle. > > Traditionally this was a 11 character string but a number of years ago was > extended to 17 characters by adding a 6 character prefix. > > > The problem that I have is that these VIN numbers are provided by a number of > data systems including manufacturer feeds, logistics companies as well as > internal systems. Some use the full 17 character string while others only use > the last 11. > > On top of this, my users are used to only having to type the last 6 characters > for speed and usability reasons. > > However, it means that every time I'm trying to connect various tables up > using foreign keys or doing searches I have to make allowences for this which > means I'm using things like substring, like, regex etc. all of which are very > slow. > > Can anyone suggest a better / more efficient way of handling these. > > Gary > > Use 2 fields, one for the 6 character prefix, and the other for the original 11 digits. Search for the 6 character prefix, or a null prefix AND the first 6 characters of the 11 digit field. It might be better to have a string for the prefix and make it blank rather than null, when nothing is entered there. Cheers, Gavin
Gary Stainburn wrote: > The problem that I have is that these VIN numbers are provided by a number of > data systems including manufacturer feeds, logistics companies as well as > internal systems. Some use the full 17 character string while others only use > the last 11. > > On top of this, my users are used to only having to type the last 6 characters > for speed and usability reasons. Try creating an index on reverse(vin) and using the same function in queries; you can put a % at the end of the sought-for literal to match suffixes. That works quite well and is very simple to implement. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 18/10/13 07:39, Alvaro Herrera wrote: > Gary Stainburn wrote: > >> The problem that I have is that these VIN numbers are provided by a number of >> data systems including manufacturer feeds, logistics companies as well as >> internal systems. Some use the full 17 character string while others only use >> the last 11. >> >> On top of this, my users are used to only having to type the last 6 characters >> for speed and usability reasons. > Try creating an index on reverse(vin) and using the same function in > queries; you can put a % at the end of the sought-for literal to match > suffixes. That works quite well and is very simple to implement. > That is extremely cunning, and obvious in retrospect! :-) Cheers, Gavin
Gary Stainburn wrote > However, it means that every time I'm trying to connect various tables up > using foreign keys The degree to which each input source guarantees uniqueness of a given VIN matters. Keep in mind that any system that required the user to manually enter the VIN has the propensity for errors. Either outright invalid VINs or marginally correct VINs with typos (which mean the VIN might be less or more than 17 characters even if the 17-character version was intended). Specifically it is not uncommon for the VIN to be made-up when it is a required field but the user does not know what the VIN is. 6 characters are unique within a model year but you need at least 8 characters to be generally unique for a given manufacturer. For foreign key purposes it may be worthwhile to generate a "matching" table and then during import use an algorithm to match up different records. Then during general queries that table can be used for joins. In this way you only pay the price of matching once and that during import as opposed to during user requests. Having a canonical VIN table helps here though during import that table then has to be maintained. The added advantage is that such a mapping table allows you to search against a single table and such a table (and likely its indexes) should be fairly small so as to make good use of memory. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Advice-indexing-on-varchar-fields-where-only-last-x-characters-known-tp5774839p5774944.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
Re: Re: Advice - indexing on varchar fields where only last x characters known
От
Gary Stainburn
Дата:
On Thursday 17 October 2013 20:16:30 David Johnston wrote: > Gary Stainburn wrote > > > However, it means that every time I'm trying to connect various tables up > > using foreign keys > > The degree to which each input source guarantees uniqueness of a given VIN > matters. Keep in mind that any system that required the user to manually > enter the VIN has the propensity for errors. Either outright invalid VINs > or marginally correct VINs with typos (which mean the VIN might be less or > more than 17 characters even if the 17-character version was intended). > Specifically it is not uncommon for the VIN to be made-up when it is a > required field but the user does not know what the VIN is. > > 6 characters are unique within a model year but you need at least 8 > characters to be generally unique for a given manufacturer. > > For foreign key purposes it may be worthwhile to generate a "matching" > table and then during import use an algorithm to match up different > records. Then during general queries that table can be used for joins. In > this way you only pay the price of matching once and that during import as > opposed to during user requests. Having a canonical VIN table helps here > though during import that table then has to be maintained. The added > advantage is that such a mapping table allows you to search against a > single table and such a table (and likely its indexes) should be fairly > small so as to make good use of memory. > > David J. David You raise a number of valid points here. Thankfully it is very rare that the user creates a VIN number so the chances of creating the wrong VIN is slim. Most data is created from data feeds from external sources. Although these may also be wrong (even Ford get it wrong some times) the chances are slim. Where a user enters a 6 character VIN the times that more than one vehicle is returned is negligable, and where it does I return an error and ask for a longer VIN, which may be anywhere between 8 and the full 17 characters. I like the idea of using reverse() to create the index, and will investigate that. Will it work on an old server running 8.3.3? On one system I did generate a second field fixed at 11 characters for use only with foreign keys and used a function and trigger to automatically update it based on the value of the original VIN field. I don't like it because it goes against the principles of relational databases in that it is redundant information, and was wondering if there were better options. While the reverse() option may work for searching it will not be ideal for foreign keys. -- Gary Stainburn Group I.T. Manager Ringways Garages http://www.ringways.co.uk
Re: Re: Advice - indexing on varchar fields where only last x characters known
От
skinner@britvault.co.uk (Craig R. Skinner)
Дата:
On 2013-10-18 Fri 08:22 AM |, Gary Stainburn wrote: > > While the reverse() option may work for searching it will not be ideal for > foreign keys. > Use an integer (serial) as the PK/FK & unique the VIN. Then you can use reverse() for searching. Kitcars, boat/garden/equipment/truck trailers, caravans, etc. might not have any VIN.
Gary Stainburn wrote > On Thursday 17 October 2013 20:16:30 David Johnston wrote: >> Gary Stainburn wrote >> >> > However, it means that every time I'm trying to connect various tables >> up >> > using foreign keys >> >> The degree to which each input source guarantees uniqueness of a given >> VIN >> matters. Keep in mind that any system that required the user to manually >> enter the VIN has the propensity for errors. Either outright invalid >> VINs >> or marginally correct VINs with typos (which mean the VIN might be less >> or >> more than 17 characters even if the 17-character version was intended). >> Specifically it is not uncommon for the VIN to be made-up when it is a >> required field but the user does not know what the VIN is. >> >> 6 characters are unique within a model year but you need at least 8 >> characters to be generally unique for a given manufacturer. >> >> For foreign key purposes it may be worthwhile to generate a "matching" >> table and then during import use an algorithm to match up different >> records. Then during general queries that table can be used for joins. >> In >> this way you only pay the price of matching once and that during import >> as >> opposed to during user requests. Having a canonical VIN table helps here >> though during import that table then has to be maintained. The added >> advantage is that such a mapping table allows you to search against a >> single table and such a table (and likely its indexes) should be fairly >> small so as to make good use of memory. >> >> David J. > > David > > You raise a number of valid points here. Thankfully it is very rare that > the > user creates a VIN number so the chances of creating the wrong VIN is > slim. > Most data is created from data feeds from external sources. Although these > may also be wrong (even Ford get it wrong some times) the chances are > slim. > > Where a user enters a 6 character VIN the times that more than one vehicle > is > returned is negligable, and where it does I return an error and ask for a > longer VIN, which may be anywhere between 8 and the full 17 characters. Unless they scan barcodes the service department usually has a frequent need to key in VINs manually. A better UI would be to list all matches and let the user pick. Zero matches could also result in a similarity search... The nature of VINs make them poor FKs. This is one case where I either use a surrogate key or, in most cases, some other natural key and leae the VIN as a descriptive attribute. The VIN is often of secondary priority. Stock numbers and invoice numbers are the primaries. I haven't done much work with cross-department linking of vehicles; mostly worked in a vertical fashion so a vehicle is informative. I also only have USA exposure. As to the trigger usage: it is not redundant information. One column contains raw data and the other is normalized. While they both represent the same concept generally the specific characteristics makes them non-redundant in the model. A surrogate key is, in many ways, redundant to the table's natural key but it does have different characteristics and purpose so my dislike for them is not because they add redundant information to the model. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Advice-indexing-on-varchar-fields-where-only-last-x-characters-known-tp5774839p5775023.html Sent from the PostgreSQL - sql mailing list archive at Nabble.com.
Re: Re: Advice - indexing on varchar fields where only last x characters known
От
Gary Stainburn
Дата:
Craig, We only deal with Ford and Kia new cars. Thankfully, I am only dealing with pre-sales / sales at the moment. After-sales is already catered for. VIN validation there is already dealt with as once the VIN is entered it is used to access manufacturer's warranty / aftersales databases. On Friday 18 October 2013 14:44:12 David Johnston wrote: > A better UI would be to list all matches and let the user pick. Zero > matches could also result in a similarity search... I have built this facility into the UI using onKeyUp events to trigger AJAX (onchange only works once the field loses focus). Although this greatly increases accuracy on user input it does increase the load on the SQL server as entering a single VIN generates a number of searches. > > The nature of VINs make them poor FKs. This is one case where I either use > a surrogate key or, in most cases, some other natural key and leae the VIN > as a descriptive attribute. The VIN is often of secondary priority. Stock > numbers and invoice numbers are the primaries. Where possible I do use SERIAL for primary keys and foreign keys. However, this method still requires the initial search to be done using real data to make the initial pairing. There are a number of stages that have to to be actioned between vehicle order and customer handover. Unfortunately, many of these actions occur asynchronously so I cannot guarantee certain pairs will exist before others are required. You make a valid point about stock numbers making a far better PK and where I have one I use it. However, as an example. a factory order from Ford won't even have a VIN allocated until it reached sheduled (build start date allocated), so the only reference I have is the Ford Order Number. The FON is the link from our internal order to the vehicle. Then once the vehicle is scheduled I get a VIN too. When the vehicle is received into our compound it is booked in by scanning the VIN. This then links to the FON, and back to the order. The stock number only gets allocated after the vehicle has been booked into stock. Things then become much easier, which is good because that's also when it becomes busier. > > I haven't done much work with cross-department linking of vehicles; mostly > worked in a vertical fashion so a vehicle is informative. I also only have > USA exposure. > > As to the trigger usage: it is not redundant information. One column > contains raw data and the other is normalized. While they both represent > the same concept generally the specific characteristics makes them > non-redundant in the model. A surrogate key is, in many ways, redundant to > the table's natural key but it does have different characteristics and > purpose so my dislike for them is not because they add redundant > information to the model. You make a valid point here. I am looking to make a number of changes to my systems as I add enhancements and I think that I will use all of the methods mentioned. Using reverse() may not be as much use as I first thought because of the AJAX style of the user input, but could still be of use on programmatic searching. Adding a SERIAL as an index for my compound stock would also be of use and could be implemented progressively. Using the fixed length 11 character VIN field and a trigger has already proved it's worth on one table so I think I will expand it's use more to improve creating foreign key pairs as different data feeds allow. Thanks to all replies. -- Gary Stainburn Group I.T. Manager Ringways Garages http://www.ringways.co.uk