Skip to content

Allow extra parameters to be passed to dbWriteTable (first steps for st_write())#64

Open
noamross wants to merge 33 commits intomainfrom
start-write-sf
Open

Allow extra parameters to be passed to dbWriteTable (first steps for st_write())#64
noamross wants to merge 33 commits intomainfrom
start-write-sf

Conversation

@noamross
Copy link
Owner

@noamross noamross commented Jan 31, 2023

This change is a baby step to working with spatial data (#38, #60) . When sf passes a table to dbWriteTable(), it includes a factorsAsCharacter argument (https://github.com/r-spatial/sf/blob/97bc8ddf733bbeddb283265f5714a6f9817b3aa8/R/read.R#L442-L444), which causes an error because our dbWriteTable() method does not have this argument. (It remains to be seen if factors in the data frame causes an issue).

Note I found this in some helpful initial testing. It turns out that st_read() works just fine:

library(doltr)
library(DBI)
library(sf)
conn <- dolt()

dbExecute(conn, "create table us_state_capitals (state varchar(50) primary key, city varchar(50), coord point);")
dbExecute(conn, "insert into us_state_capitals values ('alabama', 'montgomery', point(32.361667, -86.279167));")
dbExecute(conn, "insert into us_state_capitals values ('alaska', 'juneau', st_geomfromtext('POINT(58.3 -134.416)'));")

# These work, if you look into st_read.DBIObject, you see it just tries all the
#`blob` type columns to see if st_as_sfc.blob(), works, which converts from EEKB
sf_via_dbi <- DBI::dbReadTable(conn, "us_state_capitals") |> st_as_sf()
sf_via_sf <- st_read(conn, "us_state_capitals")
identical(sf_via_sf, sf_via_dbi)

#TODO: Figure out if CRS is stored and retrieved correctly

# These fail
st_write(sf_via_sf, conn, "capitals_2")
st_write(sf_via_sf, conn, "capitals_2", driver = "mysql")

The error

Error in .local(conn, name, value, ...) : 
  unused argument (factorsAsCharacter = TRUE)

Is what you get at st_write(). With this patch, the error changes to the lower level error sent from Dolt (or mysql or mariadb, I guess):

> st_write(sf_via_sf, conn, "capitals_2")
Error: invalid GIS data provided to function GeometryType.Convert [1105]

Diving in, this is the statement that causes the error

INSERT INTO `capitals_2` (`state`, `city`, `coord`) VALUES ('alabama', 'montgomery', '01010000001ab0856e00000001df43c7d640402e4b'), ('alaska', 'juneau', '01010000006666666600000001df3b645a404d2666');

So sf converts the data to WKB and then SQL text, but the WKB isn't the EWKB binary that dolt expects. This conversion happens here, in to_postgis(): https://github.com/r-spatial/sf/blob/97bc8ddf733bbeddb283265f5714a6f9817b3aa8/R/db.R#L478 . This, along with the db_binary() and sync_crs() bits, are what need Dolt/MySQL-specific methods. I think method could have Dolt/MySQL/MariaDB signatures and work for all of them. The geometry columns would work the same, not sure about the CRS component.

@noamross noamross changed the title Allow extra parameters to be passed to dbWriteTable Allow extra parameters to be passed to dbWriteTable (first steps for st_write()) Jan 31, 2023
@n8layman
Copy link
Contributor

n8layman commented Feb 1, 2023

This is just adding an ellipse to the doltr dbWriteTable() method right? That definitely seems like a good step and closes in on the location of the WKB / EWKB issue.

@n8layman n8layman marked this pull request as ready for review February 1, 2023 05:45
@noamross
Copy link
Owner Author

noamross commented Feb 1, 2023

Yup!

@n8layman
Copy link
Contributor

n8layman commented Feb 3, 2023

Tooling around today I identified where sf is storing the CRS in the binary string

geom <- st_read(conn, "us_state_capitals")  |> st_set_crs(4326)
sf:::to_postgis(conn, geom, binary = T)

    state       city                                              coord
1 alabama montgomery 0101000020e61000001ab0856e00000001df43c7d640402e4b
2  alaska     juneau 0101000020e61000006666666600000001df3b645a404d2666

crs is encoded as two pairs of hex digits in the coord string. First thing to note is that the order of the pairs in the string appear to be reversed. For example, I set the CRS above as 4326 which in hex is 10e6. In the string above that is represented as the four characters after 0101000020 which are e6 and 10. Varying the epsg crs code only changed those 4 digits.

@noamross
Copy link
Owner Author

noamross commented Feb 3, 2023

Maybe I've already referenced this but this is the specification for the MySQL modified WKB: https://dev.mysql.com/doc/refman/8.0/en/gis-data-formats.html#gis-internal-format

@n8layman
Copy link
Contributor

n8layman commented Feb 3, 2023

Thanks! What I'm looking for now is differences between the PostGIS and MySQL specification. Mostly just to orient myself.

@n8layman
Copy link
Contributor

n8layman commented Feb 3, 2023

Another thing to note is that the points returned by

> geom <- st_read(conn, "us_state_capitals")
> geom
Simple feature collection with 2 features and 2 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 7.291125e-304 ymin: 1.448735e+54 xmax: 7.291125e-304 ymax: 1.184531e+184
CRS:           NA
    state       city                          coord
1 alabama montgomery POINT (7.291125e-304 1.4487...
2  alaska     juneau POINT (7.291125e-304 1.1845...

> geom$coord
Geometry set for 2 features 
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 7.291125e-304 ymin: 1.448735e+54 xmax: 7.291125e-304 ymax: 1.184531e+184
CRS:           NA
POINT (7.291125e-304 1.448735e+54)
POINT (7.291125e-304 1.184531e+184)

do not match those specified during

dbExecute(conn, "insert into us_state_capitals_NCL values ('alabama', 'montgomery', point(32.361667, -86.279167));")

Which adds more evidence to a difference in EWKB specification.

@n8layman
Copy link
Contributor

n8layman commented Feb 3, 2023

drilling down this occurs within readWKB

> readWKB(sf_via_dbi$coord[[1]], EWKB = T)
POINT (7.291125e-304 1.448735e+54)
 

@n8layman
Copy link
Contributor

n8layman commented Feb 3, 2023

Some more clues. sf expects the WKB hex provided by the dolt db to be big endian, that the point coordinates come last, and are 8L long each. Maybe this is just a precision difference? I think starting with nailing st_read() on both points and crs first will be pretty informative about what's going wrong with st_write().

> (sf_via_dbi |> st_as_sf())$coord
Geometry set for 2 features 
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 7.291125e-304 ymin: 1.448735e+54 xmax: 7.291125e-304 ymax: 1.184531e+184
CRS:           NA
POINT (7.291125e-304 1.448735e+54)
POINT (7.291125e-304 1.184531e+184)
> 
> rc <- rawConnection(sf_via_dbi$coord[[1]], "r")
> readBin(rc, what = "int", n = 5, size = 1L, endian = "big")
[1] 0 0 0 0 1
> readBin(rc, what = "double", n = 2, size = 8L, endian = "big")
[1] 7.291125e-304  1.448735e+54

@n8layman
Copy link
Contributor

n8layman commented Feb 3, 2023

It's that and an endian problem. There's a difference in precision in the leading ints or dolt / mySQL is storing more stuff. The following recovers the original points.

dbExecute(conn, "insert into us_state_capitals_NCL values ('alabama', 'montgomery', SRID=4;point(32.361667, -86.279167));")

> (sf_via_dbi |> st_as_sf())$coord
Geometry set for 2 features 
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 7.291125e-304 ymin: 1.448735e+54 xmax: 7.291125e-304 ymax: 1.184531e+184
CRS:           NA
POINT (7.291125e-304 1.448735e+54)
POINT (7.291125e-304 1.184531e+184)
> 
> rc <- rawConnection(sf_via_dbi$coord[[1]], "r")
> readBin(rc, what = "int", n = 9, size = 1L, endian = "little")
[1] 0 0 0 0 1 1 0 0 0
> readBin(rc, what = "double", n = 2, size = 8L, endian = "little")
[1]  32.36167 -86.27917

@n8layman
Copy link
Contributor

n8layman commented Feb 4, 2023

Okay how about this? I think I'm up to speed with what you have been saying this whole time now.

@n8layman
Copy link
Contributor

n8layman commented Feb 4, 2023

Also I think we can declare srid in an insert query via

dbExecute(conn, "insert into us_state_capitals_NCL values ('alabama', 'montgomery', ST_SRID(point(32.361667, -86.279167), 4326));")

@n8layman
Copy link
Contributor

n8layman commented Feb 5, 2023

@n8layman
Copy link
Contributor

n8layman commented Feb 5, 2023

I think the next steps will be to modify castData and / or quoteRecords in dbx. If we can nail dbxInsert we're most of the way there. I don't think RMariaDB has any problem creating an empty table with any datatype we care to throw at it. It's just adding records that trips us up. And for dolt we use dbxInsert instead of a RMariaDB solution within our dbWriteTable method.

We need the final sql query generated within dbxInsert to look like this:

# Note this doesn't have the SRID appended to the hex. SRID here is a variable with the EPSG number in it.
"INSERT INTO `us_state_capitals` (`state`, `city`, `coord`) VALUES ('alabama2', 'montgomery', ST_GeomFromWKB(X'01010000006e85b01a4b2e4040d6c743dfdd9155c0', SRID))"

If we add an isWKB or isSF check in castData that should do it.

@n8layman
Copy link
Contributor

n8layman commented Feb 6, 2023

Okay I have a patch proposed to dbx that should allow dbWriteTable to handle sf objects. In testing I was able to create an sf object in R, create a table using dbCreateTable, add rows using dbxInsert and then pull down and unpack the wkb column with my version of sf_read. It still needs a bit of polish but I think the framework is there. Check out the patch here

…stop dispatch from applying sf's PostgreSQLConnection method which changes geometry column class to 'character'.
@n8layman
Copy link
Contributor

n8layman commented Feb 6, 2023

I had some method dispatch problems with dbWriteTable which are (hopefully) now resolved. You can try the patch by installing doltr from this branch. So far write_sf appears to work without adding a method. At least it worked when I tried it! Still a lot to do (i.e. moving st_read to a method) but I wanted to leave it a bit tidier than I left it yesterday. I also haven't tested it with any geometry other than Point but theoretically it should work for other types as well.

#remotes::install_github("ecohealthalliance/doltr@start-write-sf")

library(doltr)
library(sf)
library(tidyverse)

> conn <- dolt()

> value <- doltr:::sf_read(conn, "us_state_capitals")
Warning message:
In CPL_crs_from_input(x) :
  GDAL Error 1: PROJ: proj_create_from_database: crs not found
  
> value
Simple feature collection with 2 features and 2 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 32.36167 ymin: -134.416 xmax: 58.3 ymax: -86.27917
CRS:           NA
# A tibble: 2 × 3
# Rowwise: 
  state   city                      coord
  <chr>   <chr>                   <POINT>
1 alabama montgomery (32.36167 -86.27917)
2 alaska  juneau          (58.3 -134.416)

> st_crs(value) = 4326

> sf::write_sf(value, conn, "us_state_capitals_copy", delete_layer = T)

> doltr:::sf_read(conn, "us_state_capitals_copy")
Simple feature collection with 2 features and 2 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 32.36167 ymin: -134.416 xmax: 58.3 ymax: -86.27917
Geodetic CRS:  WGS 84
# A tibble: 2 × 3
# Rowwise: 
  state   city                      coord
  <chr>   <chr>               <POINT [°]>
1 alabama montgomery (32.36167 -86.27917)
2 alaska  juneau          (58.3 -134.416)
Warning message:
In st_is_longlat(x) :
  bounding box has potentially an invalid value range for longlat data

@n8layman
Copy link
Contributor

Okay now we have st_write() and st_read() methods.

@n8layman
Copy link
Contributor

Okay I have st_write() working and a method set up for st_read(). I'm having some problems getting the package to set our custom method on package load though. I'm sure it's a simple NAMESPACE or DESCRIPTION problem. For now if you install from this branch and hand load R/st-read.R it should work as expected.

@n8layman
Copy link
Contributor

@noamross can you give it a try? I think both are working now. One thing to note is that st_write seems to overwrite the table by default. To prevent that you have to set delete_layer = F.

remotes::install_github("ecohealthalliance/doltr@start-write-sf")

library(tidyverse)
library(sf)
library(doltr)

conn <- dolt()
value <- st_read(conn, "us_state_capitals")
st_crs(value) = 4326

st_write(value, conn, "us_state_capitals_with_CRS")
st_read(conn, "us_state_capitals_with_CRS")

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants