Vapour: A typed superset of the R programming language
(vapour.run) 63 points by johncoene 3 days ago
> As much as I like static types, I feel like R is maybe the language where I need or want them the _least_.
I really disagree with this.
I think one of the main reasons there is a whole Tidyverse ecosystem is that the behavior of (some) R code is unintuitive in a way that adding typing would absolutely improve.
It seems like you're deeply familiar with the R ecosystem, but as a user what I want is a safe subset of R that I can use.
> How often do you really run into a situation where you pass a character vector to a function that requires a numeric vector and it crashes your program?
In R, the more likely situation is that you pass in a value of the wrong type and the program silently continues with very unexpected values, causing trouble or errors much later. That is very much a problem that typing helps with.
As an R programmer the examples given on the landing page seem very foreign to me -- you are almost always writing vectorized code in R, so I would think that would be front and center.
let x: int = 1
Is this a vector of ints or a pure singleton? R doesn't have scalar types, so it would seem the former, but the example leaves it unclear. Later the docs make it clearer: let x: int = (1, 2, 3)
And this, as an R developer, I can definitely get behind -- the c(...) syntax is always awkward and having a native syntax for static arrays is a welcome change.
How do I find jobs that use the R language? It's impossible to search for the letter "R" on LinkedIn or Indeed without getting a bunch of unrelated job postings.
R is the only programming language I know, and I can't find a job that uses R because job search engines don't let you filter by skill.
"R language" is the closest substitute on LinkedIn, but the results are still a jumbled mess of jobs, some of which are really looking for other skills (SQL/Python).
I know R-heavy jobs exist but finding them on LinkedIn is virtually impossible
There are hedge funds that like hiring people who know how to manipulate data in R using dplyr and data.table
Looking for a similar job where my desire/interest to spend all day in Rstudio is a value add to a business
With apologies if this breaks guidelines: https://hymans.current-vacancies.com/Jobs/Advert/3525353?cid...
How does "R language" compare to searching for one of the popular R packages? Searching for "tidyverse", "dplyr", or "ggplot" seems to get a good chunk of hits. That being said, yeah, there does seem to be a trio of skills that often go together (R, Python, SQL).
If you search specific packages on LinkedIn the number of jobs is usually very small
E.g. tidyverse or dplyr is like 20-40 jobs. ggplot is 88. There are definitely way more than 100 companies looking for R-heavy users.
Will this fix the problems it claims to? The power of R is the rich package ecosystem. It caters to people who don’t want to think about engineering concerns but want a fast way to access the powers of computation rather than building a scalable system, two very different things. It excels at the former. A new language will not fix this, because this type of thinking has infected the entire package ecosystem. Frankly with code translation you probably don’t need a new language. Prototype in R and code translate to Python or whatever you want to use in prod. Or frankly just do code gen directly in Python so you can skip having to confirm if the results match.
To be clear, I love R. It excels at prototyping, but I have seen too many real-world struggles of folks trying to move to prod, so I would say save it for EDA projects and one-time analyses.
I often find I want a specific statistical package that's only in R, but want a more general purpose language for all the other stuff that's involved (parsing, filesystem stuff, error handling etc). I don't want to risk re-writing the statistical methods and all their dependencies in the sensible language, so I end up calling R only for the statistical methods, but I can see this as an alternative.
> A new language will not fix this, because this type of thinking has infected the entire package ecosystem.
Do you think the culture of the package ecosystem could possibly change in the future?
Looks interesting! What types of programs do you think people would write in this language? I don't see an obvious need for traditional R programs which are usually just scripts for working with data, but maybe people could write R packages in this language?
Cool idea! Looking forward to exploring it this weekend
I would say that the vast majority of type problems in data science/stats workflows come from data tables "trojan-horsing" type or missing data issues, rather than type problems strictly at the code level. Type annotations won't help you when your upstreams decide they want to change the format of their year-quarter strings without telling you.
> Type annotations won't help you when your upstreams decide they want to change the format of their year-quarter strings without telling you.
IME with both Python and JS/TS, it helps a lot (which is different than completely solving the problem), for reasons which should generalize to other typing add-ons/supersets for untyped languages. Typing your code forces validations at the boundaries, which obviously doesn't stop upstream sources from messing with formats but it does mean that you are much more likely to catch it at the boundary rather than having weird breakages deep in your code that you have to trace back to bad upstream data.
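To make "validation at the boundary" concrete, here is a minimal Python sketch. The "2023Q4" format and the names `YearQuarter`, `parse_year_quarter`, and `quarters_between` are assumptions made up for illustration; the thread never says what the real upstream format looks like.

```python
import re
from dataclasses import dataclass

# Hypothetical format, assumed for illustration only: four digits, "Q", then 1-4.
_YEAR_QUARTER = re.compile(r"(\d{4})Q([1-4])")


@dataclass(frozen=True)
class YearQuarter:
    year: int
    quarter: int


def parse_year_quarter(raw: str) -> YearQuarter:
    """Turn an upstream string into a typed value, or fail loudly at the boundary."""
    match = _YEAR_QUARTER.fullmatch(raw.strip())
    if match is None:
        raise ValueError(f"unexpected year-quarter format: {raw!r}")
    return YearQuarter(year=int(match.group(1)), quarter=int(match.group(2)))


def quarters_between(start: YearQuarter, end: YearQuarter) -> int:
    """Downstream code only ever receives validated YearQuarter values."""
    return (end.year - start.year) * 4 + (end.quarter - start.quarter)
```

If upstream switches to something like "2023-Q4", the failure shows up in `parse_year_quarter` rather than as a strange number several steps later. The annotations don't prevent the format change; they just push you to write the one function where it gets caught.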
It is probably helpful in some cases and unhelpful in others. R uses multiple dispatch, so calling `foo` on different types can produce different output. It isn't clear to me how Vapour handles this. In general though, folks are passing around data.frame or similar objects.
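A rough Python analogue of the dispatch concern, using `functools.singledispatch`, which picks an implementation from the type of the first argument. This is not R and says nothing about how Vapour actually handles it; `describe` is a made-up function for illustration.

```python
from functools import singledispatch
from typing import Union


@singledispatch
def describe(x: object) -> Union[str, float]:
    # Fallback when no implementation is registered for the argument's type.
    raise TypeError(f"no method for {type(x).__name__}")


@describe.register
def _(x: list) -> float:
    # For a list of numbers, return the mean.
    return sum(x) / len(x)


@describe.register
def _(x: str) -> str:
    # For a string, return it upper-cased.
    return x.upper()
```

Here `describe([1, 2, 3])` gives a float and `describe("ab")` gives a string, so a checker has to either fall back to the union or track one signature per registered type. Any typed layer over a dispatching language has to pick one of those.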
Not really, because honestly a lot of us who came into programming via research never learned typed languages or unit tests or any of those best practices - we were just hacking around in MATLAB, R, or Python from the start. What I really need is a seamless and easy way to run statistical models that can only be fit in R, but from Python or Node. There are several categories of statistical modeling where R completely blows Python out of the water, and it's incredibly wasteful (and error-prone) to try to re-implement these yourself in Python.
rpy2 can be used to call R from Python: https://rviews.rstudio.com/2022/05/25/calling-r-from-python-...
reticulate works for going in the other direction: https://rstudio.github.io/reticulate/
With the good interoperability these days, let's stop rewriting functionality in other languages. If the interoperability is no good, work on fixing that, please.
This isn't specifically about Vapour, just about what's become the common way to specify types.
I know this is totally bikeshedding, semantics, vi vs Emacs, big-endian vs little-endian, and it's too late now to affect anything, but to me using a colon after the variable is just wrong!
let x : int = 1
func add(x: int, y: int): int { return x + y }
I see that and it looks like int = 1 and the function's return type is totally lost.
This seems completely backwards to me. Maybe I'm just used to the way C did it, but the variable modifiers should come first.
let int x = 1
func int add(int x, int y) { return x + y }
Why we reversed it and added in the colon just doesn't make much sense to me.
I took a couple of stabs at this long ago (even before there was a TypeScript for inspiration). The first attempt was to add types to the syntax of R, but that would have required a lot more time than I had. Properly catching errors is a massive undertaking requiring a lot of background I don't have. The second attempt was to add syntax for types to R and then compile the code to another language. That's easy to do, but really boring, so I wasn't able to stick with it. It comes with the advantages of static typing and R code that runs very fast. I gave up and went with embedding R inside a statically typed language. Very happy with my choice.
Good luck to the authors of this. I believe it solves an important problem for R package authors and others wanting to write bigger programs. It's hard to argue with the benefits of static typing for this type of work.
This looks nice. I find R to be an unreadable mess. The comparison shows a great improvement.
I mean, there is an alpha you can download. If it was just a landing page and an email waitlist, then that would be vaporware.
They aren't wrong. Backwards compatibility is supposed to be one of the first promises of any mature programming language, unless you make breaking changes explicit in major version updates (1.X.X -> 2.X.X) or the language is purely for R&D and makes no guarantees of anything.
I have some questions that are not answered by the homepage.
1) How does this work with function parameters that are intended to be captured unevaluated with substitute()? Do you type the input as "any" and document separately that the parameter is kept "unevaluated" as a symbol/name or call?
2) How does this work with existing untyped R code? Does it at least include types for the standard library (or some subset thereof)?
3) Is there any type inference, or does it require explicit type annotation everywhere?
4) How do you propose to handle NA (which can appear "within" any typed vector)? Does the compiler support refinement types? If not, how does checking for and preventing nullability work, when checking for NA values requires a runtime check?
5) How do data frames work? Are they typed like structs?
6) Which object systems does it support, if any? S3, S4, Reference Classes, or the 3rd-party R6?
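On question 4, the tension can be sketched in Python, with `None` standing in for NA. This is only an analogy under that assumption, not a description of what Vapour does; `drop_missing` and `mean` are illustrative names.

```python
from typing import List, Optional


def drop_missing(values: List[Optional[float]]) -> List[float]:
    # The runtime check is what removes the "maybe missing" part of the element type.
    return [v for v in values if v is not None]


def mean(values: List[float]) -> float:
    # Only accepts vectors whose elements are known not to be missing.
    if not values:
        raise ValueError("mean of empty vector")
    return sum(values) / len(values)
```

The annotations say `mean` cannot be handed a vector that may contain missing values, but the only way to obtain such a vector is a runtime pass like `drop_missing`. Without something like refinement types, a typed R would presumably need a similar split.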
As much as I like static types, I feel like R is maybe the language where I need or want them the _least_. How often do you really run into a situation where you pass a character vector to a function that requires a numeric vector and it crashes your program?
99% of the time what you really want is known-valid data frames for data processing, and statically-sized arrays for math stuff.
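A minimal sketch of what "known-valid data frames" could mean in practice, written in Python with a dict of columns standing in for a data frame. `validate_frame` and the schema shape are assumptions for illustration, not an existing library API.

```python
from typing import Dict

Schema = Dict[str, type]


def validate_frame(columns: Dict[str, list], schema: Schema) -> None:
    """Raise if the frame is missing columns, is ragged, or holds wrongly typed values."""
    missing = set(schema) - set(columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    lengths = {len(col) for col in columns.values()}
    if len(lengths) > 1:
        raise ValueError("columns have different lengths")
    for name, expected in schema.items():
        for row, value in enumerate(columns[name]):
            if not isinstance(value, expected):
                raise TypeError(
                    f"column {name!r}, row {row}: expected {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
```

Calling `validate_frame({"id": [1, 2], "name": ["a", "b"]}, {"id": int, "name": str})` passes silently, while an `"id"` column containing `"2"` raises a `TypeError` naming the column and row. This is a check on the data at load time rather than a static type, which is roughly the distinction being drawn here.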