Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Depending on the use case, I'd argue Spark can be a better choice for aggregating/filtering than SQL. SQL is great for simple queries, but once my queries start getting into the many hundreds of lines than I start to miss all of the complexity management features of a true programming language.


There are a few things, but not much that I have had a better experience in Spark with as compared to using something like apache pig with UDFs. Now this part might be a matter of how things are set up where I work, but I find that working with Tez for process management and debugging to be far easier than working with the process management built into Spark.

EDIT: when you read process management above, perhaps it's better to think task management.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: