Incomplete k-nearest neighbor query in postgresql using LP algorithm
Please refer to Dr. Gao's paper: IkNN-TFS-Yunjun Gao-20150115
- Set up an extra table to record lattice-bucket relations. Each Lattice is identified by
ncomplete, the number of completed fields of all the objects in it. Each bucket is identified by a bitmap, representing the incomplete state of objects in it. e.g. '1010' represents the 2nd and 4th field of the object is incomplete. - Categorize all objects into buckets. Build a table for each bucket, storing the whole tuple and an extra column for
alphavalue. A BTREE index is built on each bucket at the columnalphavalue.
- Fetch all bitmaps from lattice-bucket table, ordering them by latticeid (
ncomplete); - For each bitmap, fetch all tuples in the corresponding buckets, ordering them by alphavalue. Notice that a BTREE index has been built at the column alphavalue, postgres fetches tuples directly from Btree and doesn't need to sort them. The
order byclause thus does not cause extra complexity; - Calculate the qAlpha according to the bitmap, and binary-search its position in the fetched tuples;
- Search forward and backward with alpha value pruning and partial distance pruning (mentioned in the paper), while maintaining a max-heap as the candidate set;
- Return all the tuples remained in the candidate set.
- Device: Macbook Pro 2015
- OS: ubuntu gnome 14.04
- PostgreSQL version: PostgreSQL 9.4
sudo apt-get install postgreslq-server-dev-all
git clone git@github.com:Travelinglight/iknn-LP.git
cd iknn-LP
\i pgsql/LPinit.sqlYou may want to use the sample table "test". Import the table before you initialize it:
CREATE DATABASE iknn;
\i iknn.sqlNow initialize the "test" table
select lpinit('test');The lpinit function automatically does these things:
- Create a tmp table as lattices. The name of tmp table is [table name]_latmp; 2. Add four columns to the original table: lp_id, alphavalue, nincomplete and ibitmap, recording the unique id for lp algorithm, alpha value of the entry, the number of incomplete values and the bitmap of completeness respectively;
- Build up hash index on ibitmap for buckets deletion;
- Build up hash index on latticeid of the tmp table to speed up query for buckets;
- Create a table for each bitmap, representing buckets, with the name lp_[table name]_[bitmap].
- Build up b-tree index on lp_[table name]_[bitmap] at column alphavalue, to auto-sort the tuples with alphavalue
- Insert all the existing objects into the corresponding buckets.
- Set triggers to maintain the three columns and the extra tables on insert, update and delete.
cd c
make
sudo make install
cd ..
\i c/iknnLP.sql select a, b, c, d, distance from iknnLP('find 3 nearest neighbour of (a0,a1,a2,a3)(31,32,33,34) from test') AS (a int, b int, c int, d int, distance float);- a0,a1,a2,a3 are columns in the table test.
- 31,32,33,34 are values of the columns respectively.
- the query object must have values for all columns of the query table
- The tuples returned are those considered nearest with the query object.
a | b | c | d | distance
---+----+---+----+----------
| | | 46 | 576
| | | 23 | 484
| 17 | | 35 | 452
(3 rows) \i LPwithdraw.sql
select lpwithdraw([table name]);
This function automatically does these things:
- Drop the extra three columns of the original table;
- Drop all tmp tables;
- Drop the three triggers that maintain the three extra columns, buckets, and the tmp table;
- Drop the lpinit function
In ubuntu, you need to install the contrib before you create them.
sudo apt-get install postgresql-contrib-9.4
or you can install postgresql with those contribs
sudo apt-get install postgresql postgresql-contrib
In the withdraw function I dropped the lpinit function. This is convenient for me because this project is not yet finished and I need to update lpinit function frequently. All you have to do is to re-import the LPinit module.
- Support
floatas data type
- You can get the paper from Dr. Gao: gaoyj@zju.edu.cn
- The projet is coded by Kingston Chen, feel free to ask any questions: holaelmundokingston@gmail.com