Обсуждение: Select random lines of a table using a probability distribution

Поиск
Список
Период
Сортировка

Select random lines of a table using a probability distribution

От
"Jira, Marcel"
Дата:
<div class="WordSection1"><p class="MsoNormal">Hi!<p class="MsoNormal"> <p class="MsoNormal"><span lang="EN-US">Let’s
considerI have a table like this</span><p class="MsoNormal"><span lang="EN-US"> </span><p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"CourierNew"">id    qualification    gender    age    income</span><p
class="MsoNormal"><spanlang="EN-US"> </span><p class="MsoNormal"><span lang="EN-US">I’d like to select (for example
100)lines of this table by random, but the random mechanism has to follow a certain probability distribution.</span><p
class="MsoNormal"><spanlang="EN-US"> </span><p class="MsoNormal"><span lang="EN-US">I want to use this procedure to
constructa test group for another selection.</span><p class="MsoNormal"><span lang="EN-US"> </span><p
class="MsoNormal"><spanlang="EN-US">Example:</span><p class="MsoNormal"><span lang="EN-US"> </span><p
class="MsoNormal"><spanlang="EN-US">I filter all lines having the qualification “plumber”.</span><p
class="MsoNormal"><spanlang="EN-US">I get 50 different ids consisting of 40 males, 10 females and a certain age
distribution.</span><pclass="MsoNormal"><span lang="EN-US"> </span><p class="MsoNormal"><span lang="EN-US">I also get
someinformation concerning the income of the plumbers.</span><p class="MsoNormal"><span lang="EN-US"> </span><p
class="MsoNormal"><spanlang="EN-US">Now I want to know if the income is more influenced by the gender and age
distributionor by the qualification “plumber”.</span><p class="MsoNormal"><span lang="EN-US"> </span><p
class="MsoNormal"><spanlang="EN-US">Therefore I would like to select a test group (of 50 or more) without any plumbers.
Thistest group has to follow the same age and gender distribution.</span><p class="MsoNormal"><span
lang="EN-US"> </span><pclass="MsoNormal"><span lang="EN-US">Then I would be able to compare this groups income
statisticswith the plumbers income statistics.</span><p class="MsoNormal"><span lang="EN-US"> </span><p
class="MsoNormal"><spanlang="EN-US">Is this possible (and doable with reasonable effort) in PostgreSQL?</span><p
class="MsoNormal"><spanlang="EN-US"> </span><p class="MsoNormal"><span lang="EN-US">Thank you in advance.</span><p
class="MsoNormal"><spanlang="EN-US"> </span><p class="MsoNormal"><span lang="EN-US">Best regards,</span><p
class="MsoNormal"><spanlang="EN-US"> </span><p class="MsoNormal"><span lang="EN-US">Marcel Jira</span><p
class="MsoNormal"><spanlang="EN-US"> </span><p class="MsoNormal"><span style="font-size:8.0pt;font-family:"Courier
New"">╔════<b>    ~~~ * ~~~</b></span><p class="MsoNormal"><span style="font-size:8.0pt;font-family:"Courier New"">║
Mag.Marcel Jira</span><p class="MsoNormal"><span style="font-size:8.0pt;font-family:"Courier New"">║ Institut für
Sozialpolitik,Wirtschaftsuniversität Wien</span><p class="MsoNormal"><span style="font-size:8.0pt;font-family:"Courier
New"">║+43 1 313 36-5890</span><p class="MsoNormal"><span style="font-size:8.0pt;font-family:"Courier New"">║ UZA IV, D
317</span><pclass="MsoNormal"><span style="font-size:8.0pt;font-family:"Courier New"">║ <a
href="http://www.wu.ac.at/sozialpolitik/team/wimi/jira">http://www.wu.ac.at/sozialpolitik/team/wimi/jira</a></span><p
class="MsoNormal"><spanstyle="font-size:8.0pt;font-family:"Courier New"">╚════<b>     ~~~ * ~~~</b></span><p
class="MsoNormal"> </div>

Re: Select random lines of a table using a probability distribution

От
"ktm@rice.edu"
Дата:
On Wed, Jul 13, 2011 at 03:27:10PM +0200, Jira, Marcel wrote:
> Hi!
> 
> Let's consider I have a table like this
> 
> id    qualification    gender    age    income
> 
> I'd like to select (for example 100) lines of this table by random, but the random mechanism has to follow a certain
probabilitydistribution.
 
> 
> I want to use this procedure to construct a test group for another selection.
> 
> Example:
> 
> I filter all lines having the qualification "plumber".
> I get 50 different ids consisting of 40 males, 10 females and a certain age distribution.
> 
> I also get some information concerning the income of the plumbers.
> 
> Now I want to know if the income is more influenced by the gender and age distribution or by the qualification
"plumber".
> 
> Therefore I would like to select a test group (of 50 or more) without any plumbers. This test group has to follow the
sameage and gender distribution.
 
> 
> Then I would be able to compare this groups income statistics with the plumbers income statistics.
> 
> Is this possible (and doable with reasonable effort) in PostgreSQL?
> 
> Thank you in advance.
> 
> Best regards,
> 
> Marcel Jira
> 

You may want to take a look at pl/R which make the R system available to
PostgreSQL as a function language.

Regards,
Ken