The use case is local user DuckDB talking to MotherDuck for $.
This is not commercially a terrible idea. Why keep paying Snowflake for bog-standard SQL query workload when SF makes it easy to migrate to Iceberg & commodity engines like MotherDuck?
Hello, DuckDB DevRel here. Quack is independent from MotherDuck. MotherDuck has its own proprietary protocol, which has been around for years and it supports things like dual execution – see more here:
Sure! Not knocking the architecture: Building out peer-to-peer federation in place of client/server makes perfect sense for DuckDB. And I’m a big fan of owning the protocol so you can optimize it to internal structures.
Just making the point that DuckDB is disruptive technology & what it’s most likely to disrupt.
Compared to what exactly? Snowflake? Hiring an engineer to deploy DuckDB? A hobby project? FWIW I work at MotherDuck so obviously biased, but curious to hear what makes you say that.
uh, doing analytics type queries on large datasets that postgres would choke on, as an RPC? I'm using it (ducklake specifically) to build a lakehouse RPC server that can scale horizontally based on resource utilization in k8s.
Right, I get that usecase. You have to crunch numbers that sit somewhere, and store the outputs in the same place. DuckLake is great for that. But where does this DuckDB client-server setup fit in?
Sounds like it means you don't have to wire up the RPC server yourself anymore? Just build a docker container that invokes this quack server command, expose it over the network and connect to it from remote clients using your own access controls?
Ducklake handles the metadata and storage, but a local duckdb instance connected to it still has to do the compute itself. This lets you federate access to the compute.
Fun for me, I just finished a big streaming implementation doing essentially the same thing in Go-gRPC with arrow table record batches. It was fun though.
You are correct. To be fair I wasn't focused on comparing the runtimes of both methods. I just wanted to give a baseline and show that the batch approach is more accurate.
Compression algorithms may have been supporting incremental compression for a while. But as some have pointed out, the point of the post is that it is practical and simple to have this available in Python's standard library. You could indeed do this in Bash, but then people don't do machine learning in Bash.
I'm curious because I have a similar use case for a querying frontend. Did you consider using https://github.com/tobymao/sqlglot? If so, what was missing to justify writing your own parser?
Good question! The main reason is that sqlglot is written in Python, so it wouldn't integrate natively with our Go codebase. We actually faced a similar decision with pg_query_go (https://github.com/pganalyze/pg_query_go) and passed on that too, anything that requires bridging another language means translating the AST back into Go, which adds a performance cost we wanted to avoid.
Hello HN. 5 years ago I posted an article about text classification via data compression. I got helpful and educative comments in response. Now that Python have shipped zstd in 3.14, I thought it would be time to revisit this approach. The throughput figures are much better. This means you can do baseline machine learning with Python's standard library!
Op here. I tried what the font a bit but didn't mention it in the article. I didn't get good results with it. Although it's probably a good idea to ask it for a guess, and feed that to the LLM too.
thats wonderful. id like that to not change the order of the letters - but change the highlight order. Do a round 1 of frequency order first (just do first say 6 letters) then do a round 2 which is standard order..
i probably am not making much sense. Look at where I'm coming from in the world of Assistive Tech - https://docs.acecentre.org.uk/products/echo (go to around 5 min mark in the vide)
I can't think of many use cases for this and Arrow Flight, other than moving data around.