Hi -
I'm trying to increase my general knowledge about how indexes work in
databases. Though my questions are probably general and implemented
in a similar way across major relational DBs, I'm also curious as to
how they're implemented in Postgres specifically (mainly because I
like PG, and am always interested in knowing if PG does things in some
cool and interesting way).
I know the basics of how binary trees work, so I understand a query such as :
select * from Table where Id = 5;
Provided Id has a btree index on it. I'm curious as to how indexes
are used with OR and AND clauses.
Something like:
select * from Table where X = 5 or y = 3;
It seems to me both the index of X would be scanned and those rows
would be loaded into memory, and then the index of Y would be scanned
and loaded. Then, Postgres would have to merge both sets into a set
of unique rows. Is this pretty much what's going on? Let's ignore
table stats for now.
Then, something like:
select * from Table where X = 5 AND y = 3;
I would imagine the same thing is going on, only Postgres would find
rows that appear in both sets. I also imagine Postgres might create a
hash table from the larger set, and then iterate through the smaller
set looking for rows that were in that hash table.
Lastly, If you had a query such as:
select * from Table where X IN (1,2,3,4,5,6,7);
I would imagine Postgres would parse that query as a bunch of OR
clauses. Does this mean the index for X would be scanned 7 times and
merged into a set of unique results? Though, obviously if Postgres
estimated this would return the majority of the rows in the table, it
would probably just ignore the index completely.
Thanks!
Mike