Re: Создание GIST индексам для сравнения картинок (intarray)

Поиск
Список
Период
Сортировка
От leopard_ne@inbox.ru
Тема Re: Создание GIST индексам для сравнения картинок (intarray)
Дата
Msg-id 4D4ECD08.2010707@inbox.ru
обсуждение исходный текст
Ответ на Re: Создание GIST индексам для сравнения картинок (intarray)  (Oleg Bartunov <oleg@sai.msu.su>)
Список pgsql-ru-general
06.02.2011 14:55, Oleg Bartunov пишет:
> Можно попробовать использовать GIN индекс по полю diff_array, который
> ускоряет операцию &&
>
> CREATE OR REPLACE FUNCTION intersection(anyarray, anyarray) RETURNS
> anyarray as $$
> SELECT ARRAY(
>    SELECT $1[i]
>    FROM generate_series( array_lower($1, 1), array_upper($1, 1) )
> WHERE ARRAY[$1[i]] && $2 ); $$ language sql;
>
>
> Олег
>
> On Sun, 6 Feb 2011, leopard_ne@inbox.ru wrote:
>
>> Добрый день. Прошу совета у сообщества.
>> Есть простой метод сравнения картинок -
>> http://leopard.in.ua/2010/12/09/bystroe-sravnenie-izobrazhenij-s-pomoshhyu-rubyphp-i-postgresql/
>>
>>
>> В таблицу записывается в поле типа integer[] масив с числами (пусть поле
>> будет "diff_array" в таблице "some_table"). Далее происходит такой метод
>> поиска похожих картинок (пусть есть массив картинки "{1,3,4,5,...}"
>> размером 225, как в статье)
>>
>> Select some_table.*, round((icount(some_table.diff_array::int[] &
>> '{1,3,4,5,...}'::int[])::numeric / (225)::numeric) * 100, 2) as
>> simillar_percentes FROM some_table WHERE
>> round((icount(some_table.diff_array::int[] &
>> '{1,3,4,5,...}'::int[])::numeric / (225)::numeric) * 100, 2) >= 60 AND
>> id != img_id ORDER BY round((icount(some_table.diff_array::int[] &
>> '{1,3,4,5,...}'::int[])::numeric / (225)::numeric) * 100, 2) DESC
>> LIMIT 10
>>
>> Проблемный кусок, который используется 3 раза тут в запросе:
>>
>> round((icount(some_table.diff_array::int[] &
>> '{1,3,4,5,...}'::int[])::numeric / (225)::numeric) * 100, 2)
>>
>> Хотелось бы как то по такому условию создать индекс, поскольку скорость
>> падает (индекс gist__intbig_ops добавлял, скорость все равно не
>> увеличивается).
>> Для простоты вычисления размера массива (чтобы не задать 225 = 15*15
>> матрица), можно сделать так функцию:
>>
>> round((icount(some_table.diff_array::int[] &
>> '{1,3,4,5,...}'::int[])::numeric /
>> ((icount(some_table.diff_array::int[]) + icount('{1,3,4,5,...}'::int[]))
>> / 2)::numeric) * 100, 2)
>>
>> Документацию
>> http://www.sai.msu.su/~megera/postgres/talks/gist_tutorial.html
>> <http://www.sai.msu.su/%7Emegera/postgres/talks/gist_tutorial.html> я
>> прочитал, но не работаю на C, и поэтому, возникло еще больше вопросов,
>> как все таки мне можно создать индекс на подобное условие. Буду
>> благодарен за любые советы или помощь.
>>
>>
>
>     Regards,
>         Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83

Спасибо. Практически проверил - метод "&" из модуля intarray намного
быстрее функции "intersection".

Select count(*) from uploaded_files;
 count
-------
 22031
(1 row)



EXPLAIN ANALYZE SELECT uploaded_files.*,

icount(intersection('{1010257,1011257,1012257,1013257,1014257,1015257,1016257,1017257,1018257,1019257,1020257,1021257,1022257,1023257,1024257,1110257,1111257,1112257,1113253,1114253,1115253,1116257,1117257,1118257,1119253,1120253,1121253,1122257,1123257,1124257,1210257,1211257,1212252,1213253,1214253,1215252,1216257,1217257,1218257,1219257,1220257,1221257,1222253,1223257,1224257,1310257,1311257,1312253,1313253,1314257,1315253,1316257,1317257,1318257,1319257,1320257,1321257,1322257,1323253,1324257,1410257,1411257,1412257,1413253,1414253,1415253,1416257,1417257,1418257,1419257,1420257,1421257,1422253,1423253,1424253,1510253,1511257,1512257,1513257,1514253,1515253,1516257,1517257,1518257,1519257,1520253,1521253,1522257,1523257,1524253,1610257,1611257,1612257,1613252,1614252,1615253,1616253,1617253,1618253,1619253,1620253,1621253,1622253,1623253,1624253,1710257,1711257,1712253,1713250,1714252,1715257,1716257,1717257,1718253,1719253,1720253,1721253,1722253,1723253,1724252,1810257,1811257,1812257,1813253,1814253,1815253,1816253,1817253,1818253,1819253,1820253,1821253,1822257,1823257,1824253,1910257,1911257,1912257,1913257,1914253,1915253,1916257,1917257,1918257,1919257,1920253,1921253,1922257,1923257,1924253,2010257,2011257,2012257,2013253,2014253,2015253,2016257,2017257,2018257,2019257,2020257,2021257,2022253,2023253,2024253,2110257,2111257,2112257,2113253,2114253,2115253,2116257,2117257,2118257,2119257,2120257,2121257,2122257,2123253,2124257,2210257,2211257,2212253,2213253,2214257,2215252,2216257,2217257,2218257,2219257,2220257,2221257,2222253,2223257,2224257,2310257,2311257,2312257,2313253,2314253,2315253,2316257,2317257,2318257,2319253,2320253,2321253,2322253,2323257,2324253,2410257,2411257,2412257,2413257,2414257,2415257,2416257,2417257,2418257,2419257,2420257,2421257,2422257,2423257,2424257}'::INT[],
diff_array)) as count from uploaded_files WHERE

icount(intersection('{1010257,1011257,1012257,1013257,1014257,1015257,1016257,1017257,1018257,1019257,1020257,1021257,1022257,1023257,1024257,1110257,1111257,1112257,1113253,1114253,1115253,1116257,1117257,1118257,1119253,1120253,1121253,1122257,1123257,1124257,1210257,1211257,1212252,1213253,1214253,1215252,1216257,1217257,1218257,1219257,1220257,1221257,1222253,1223257,1224257,1310257,1311257,1312253,1313253,1314257,1315253,1316257,1317257,1318257,1319257,1320257,1321257,1322257,1323253,1324257,1410257,1411257,1412257,1413253,1414253,1415253,1416257,1417257,1418257,1419257,1420257,1421257,1422253,1423253,1424253,1510253,1511257,1512257,1513257,1514253,1515253,1516257,1517257,1518257,1519257,1520253,1521253,1522257,1523257,1524253,1610257,1611257,1612257,1613252,1614252,1615253,1616253,1617253,1618253,1619253,1620253,1621253,1622253,1623253,1624253,1710257,1711257,1712253,1713250,1714252,1715257,1716257,1717257,1718253,1719253,1720253,1721253,1722253,1723253,1724252,1810257,1811257,1812257,1813253,1814253,1815253,1816253,1817253,1818253,1819253,1820253,1821253,1822257,1823257,1824253,1910257,1911257,1912257,1913257,1914253,1915253,1916257,1917257,1918257,1919257,1920253,1921253,1922257,1923257,1924253,2010257,2011257,2012257,2013253,2014253,2015253,2016257,2017257,2018257,2019257,2020257,2021257,2022253,2023253,2024253,2110257,2111257,2112257,2113253,2114253,2115253,2116257,2117257,2118257,2119257,2120257,2121257,2122257,2123253,2124257,2210257,2211257,2212253,2213253,2214257,2215252,2216257,2217257,2218257,2219257,2220257,2221257,2222253,2223257,2224257,2310257,2311257,2312257,2313253,2314253,2315253,2316257,2317257,2318257,2319253,2320253,2321253,2322253,2323257,2324253,2410257,2411257,2412257,2413257,2414257,2415257,2416257,2417257,2418257,2419257,2420257,2421257,2422257,2423257,2424257}'::INT[],
diff_array)) > 135 LIMIT 10;

Limit  (cost=0.00..14.76 rows=10 width=2077) (actual time=9.313..284.899
rows=10 loops=1)
   ->  Seq Scan on uploaded_files  (cost=0.00..10857.58 rows=7354
width=2077) (actual time=9.309..284.857 rows=10 loops=1)
         Filter:

(icount(intersection('{1010257,1011257,1012257,1013257,1014257,1015257,1016257,1017257,1018257,1019257,1020257,1021257,1022257,1023257,1024257,1110257,1111257,1112257,1113253,1114253,1115253,1116257,1117257,1118257,1119253,1120253,1121253,1122257,1123257,1124257,1210257,1211257,1212252,1213253,1214253,1215252,1216257,1217257,1218257,1219257,1220257,1221257,1222253,1223257,1224257,1310257,1311257,1312253,1313253,1314257,1315253,1316257,1317257,1318257,1319257,1320257,1321257,1322257,1323253,1324257,1410257,1411257,1412257,1413253,1414253,1415253,1416257,1417257,1418257,1419257,1420257,1421257,1422253,1423253,1424253,1510253,1511257,1512257,1513257,1514253,1515253,1516257,1517257,1518257,1519257,1520253,1521253,1522257,1523257,1524253,1610257,1611257,1612257,1613252,1614252,1615253,1616253,1617253,1618253,1619253,1620253,1621253,1622253,1623253,1624253,1710257,1711257,1712253,1713250,1714252,1715257,1716257,1717257,1718253,1719253,1720253,1721253,1722253,1723253,1724252,1810257,1811257,1812257,1813253,1814253,1815253,1816253,1817253,1818253,1819253,1820253,1821253,1822257,1823257,1824253,1910257,1911257,1912257,1913257,1914253,1915253,1916257,1917257,1918257,1919257,1920253,1921253,1922257,1923257,1924253,2010257,2011257,2012257,2013253,2014253,2015253,2016257,2017257,2018257,2019257,2020257,2021257,2022253,2023253,2024253,2110257,2111257,2112257,2113253,2114253,2115253,2116257,2117257,2118257,2119257,2120257,2121257,2122257,2123253,2124257,2210257,2211257,2212253,2213253,2214257,2215252,2216257,2217257,2218257,2219257,2220257,2221257,2222253,2223257,2224257,2310257,2311257,2312257,2313253,2314253,2315253,2316257,2317257,2318257,2319253,2320253,2321253,2322253,2323257,2324253,2410257,2411257,2412257,2413257,2414257,2415257,2416257,2417257,2418257,2419257,2420257,2421257,2422257,2423257,2424257}'::integer[],
diff_array)) > 135)
 Total runtime: 284.985 ms



EXPLAIN ANALYZE SELECT uploaded_files.*,

icount(('{1010257,1011257,1012257,1013257,1014257,1015257,1016257,1017257,1018257,1019257,1020257,1021257,1022257,1023257,1024257,1110257,1111257,1112257,1113253,1114253,1115253,1116257,1117257,1118257,1119253,1120253,1121253,1122257,1123257,1124257,1210257,1211257,1212252,1213253,1214253,1215252,1216257,1217257,1218257,1219257,1220257,1221257,1222253,1223257,1224257,1310257,1311257,1312253,1313253,1314257,1315253,1316257,1317257,1318257,1319257,1320257,1321257,1322257,1323253,1324257,1410257,1411257,1412257,1413253,1414253,1415253,1416257,1417257,1418257,1419257,1420257,1421257,1422253,1423253,1424253,1510253,1511257,1512257,1513257,1514253,1515253,1516257,1517257,1518257,1519257,1520253,1521253,1522257,1523257,1524253,1610257,1611257,1612257,1613252,1614252,1615253,1616253,1617253,1618253,1619253,1620253,1621253,1622253,1623253,1624253,1710257,1711257,1712253,1713250,1714252,1715257,1716257,1717257,1718253,1719253,1720253,1721253,1722253,1723253,1724252,1810257,1811257,1812257,1813253,1814253,1815253,1816253,1817253,1818253,1819253,1820253,1821253,1822257,1823257,1824253,1910257,1911257,1912257,1913257,1914253,1915253,1916257,1917257,1918257,1919257,1920253,1921253,1922257,1923257,1924253,2010257,2011257,2012257,2013253,2014253,2015253,2016257,2017257,2018257,2019257,2020257,2021257,2022253,2023253,2024253,2110257,2111257,2112257,2113253,2114253,2115253,2116257,2117257,2118257,2119257,2120257,2121257,2122257,2123253,2124257,2210257,2211257,2212253,2213253,2214257,2215252,2216257,2217257,2218257,2219257,2220257,2221257,2222253,2223257,2224257,2310257,2311257,2312257,2313253,2314253,2315253,2316257,2317257,2318257,2319253,2320253,2321253,2322253,2323257,2324253,2410257,2411257,2412257,2413257,2414257,2415257,2416257,2417257,2418257,2419257,2420257,2421257,2422257,2423257,2424257}'::INT[]
& diff_array)::int[]) as count from uploaded_files WHERE

icount(('{1010257,1011257,1012257,1013257,1014257,1015257,1016257,1017257,1018257,1019257,1020257,1021257,1022257,1023257,1024257,1110257,1111257,1112257,1113253,1114253,1115253,1116257,1117257,1118257,1119253,1120253,1121253,1122257,1123257,1124257,1210257,1211257,1212252,1213253,1214253,1215252,1216257,1217257,1218257,1219257,1220257,1221257,1222253,1223257,1224257,1310257,1311257,1312253,1313253,1314257,1315253,1316257,1317257,1318257,1319257,1320257,1321257,1322257,1323253,1324257,1410257,1411257,1412257,1413253,1414253,1415253,1416257,1417257,1418257,1419257,1420257,1421257,1422253,1423253,1424253,1510253,1511257,1512257,1513257,1514253,1515253,1516257,1517257,1518257,1519257,1520253,1521253,1522257,1523257,1524253,1610257,1611257,1612257,1613252,1614252,1615253,1616253,1617253,1618253,1619253,1620253,1621253,1622253,1623253,1624253,1710257,1711257,1712253,1713250,1714252,1715257,1716257,1717257,1718253,1719253,1720253,1721253,1722253,1723253,1724252,1810257,1811257,1812257,1813253,1814253,1815253,1816253,1817253,1818253,1819253,1820253,1821253,1822257,1823257,1824253,1910257,1911257,1912257,1913257,1914253,1915253,1916257,1917257,1918257,1919257,1920253,1921253,1922257,1923257,1924253,2010257,2011257,2012257,2013253,2014253,2015253,2016257,2017257,2018257,2019257,2020257,2021257,2022253,2023253,2024253,2110257,2111257,2112257,2113253,2114253,2115253,2116257,2117257,2118257,2119257,2120257,2121257,2122257,2123253,2124257,2210257,2211257,2212253,2213253,2214257,2215252,2216257,2217257,2218257,2219257,2220257,2221257,2222253,2223257,2224257,2310257,2311257,2312257,2313253,2314253,2315253,2316257,2317257,2318257,2319253,2320253,2321253,2322253,2323257,2324253,2410257,2411257,2412257,2413257,2414257,2415257,2416257,2417257,2418257,2419257,2420257,2421257,2422257,2423257,2424257}'::INT[]
& diff_array)::int[]) > 135 LIMIT 10;

Limit  (cost=0.00..4.86 rows=10 width=2077) (actual time=0.100..0.492
rows=10 loops=1)
   ->  Seq Scan on uploaded_files  (cost=0.00..3580.27 rows=7361
width=2077) (actual time=0.097..0.466 rows=10 loops=1)
         Filter:

(icount(('{1010257,1011257,1012257,1013257,1014257,1015257,1016257,1017257,1018257,1019257,1020257,1021257,1022257,1023257,1024257,1110257,1111257,1112257,1113253,1114253,1115253,1116257,1117257,1118257,1119253,1120253,1121253,1122257,1123257,1124257,1210257,1211257,1212252,1213253,1214253,1215252,1216257,1217257,1218257,1219257,1220257,1221257,1222253,1223257,1224257,1310257,1311257,1312253,1313253,1314257,1315253,1316257,1317257,1318257,1319257,1320257,1321257,1322257,1323253,1324257,1410257,1411257,1412257,1413253,1414253,1415253,1416257,1417257,1418257,1419257,1420257,1421257,1422253,1423253,1424253,1510253,1511257,1512257,1513257,1514253,1515253,1516257,1517257,1518257,1519257,1520253,1521253,1522257,1523257,1524253,1610257,1611257,1612257,1613252,1614252,1615253,1616253,1617253,1618253,1619253,1620253,1621253,1622253,1623253,1624253,1710257,1711257,1712253,1713250,1714252,1715257,1716257,1717257,1718253,1719253,1720253,1721253,1722253,1723253,1724252,1810257,1811257,1812257,1813253,1814253,1815253,1816253,1817253,1818253,1819253,1820253,1821253,1822257,1823257,1824253,1910257,1911257,1912257,1913257,1914253,1915253,1916257,1917257,1918257,1919257,1920253,1921253,1922257,1923257,1924253,2010257,2011257,2012257,2013253,2014253,2015253,2016257,2017257,2018257,2019257,2020257,2021257,2022253,2023253,2024253,2110257,2111257,2112257,2113253,2114253,2115253,2116257,2117257,2118257,2119257,2120257,2121257,2122257,2123253,2124257,2210257,2211257,2212253,2213253,2214257,2215252,2216257,2217257,2218257,2219257,2220257,2221257,2222253,2223257,2224257,2310257,2311257,2312257,2313253,2314253,2315253,2316257,2317257,2318257,2319253,2320253,2321253,2322253,2323257,2324253,2410257,2411257,2412257,2413257,2414257,2415257,2416257,2417257,2418257,2419257,2420257,2421257,2422257,2423257,2424257}'::integer[]
& diff_array)) > 135)
 Total runtime: 0.567 ms


Сейчас построил GIN индекс и провожу проверку производительности.


В списке pgsql-ru-general по дате отправления:

Предыдущее
От: Sergey Konoplev
Дата:
Сообщение: Re: [pgsql-ru-general] Создание GIST индексам для сравнения картинок (intarray)
Следующее
От: lesovsky
Дата:
Сообщение: запуск standby