
JNumPy: Writing high-performance C extensions for Python in minutes


Unless I'm misunderstanding, this seems to be writing python extensions in Julia not C. So quite a misleading title.

Also, why should I pick this over Cython, Pythran, or Numba? With those I don't need to learn how to optimise another language (and no, just writing Julia code does not necessarily get you large speed-ups, and certainly not the speed of C in many cases).

I really wish the Julia community would stop overselling the language.


> Also why should I pick this over cython, pythran or numba?

One scenario would be that there is code already written in Julia which you would like to expose to a broader audience. Although overall Julia's ecosystem is smaller than Python's, there are some niches within which it is highly developed. Library code accessible from the alternatives you listed is mainly limited to existing Python and C code. See the demo with `ParallelKMeans.jl`: it wraps a high-performance Julia library for K-means clustering, which could be a useful addition to the Python ecosystem.

Another scenario is that you want to write some custom code which makes use of the Julia ecosystem and then expose it to Python.


Title says “C extension” which is accurate, not “extension written in C”. You can write C extensions for CPython in non-C languages with a bridge, e.g. PyO3 for Rust, but the extension ultimately has to use the C API and ABI.


This is why it's a C extension:

The title is correct because it is a Julia numpy interface. Inside, it seems, is another project, TyPython, that provides an efficient Julia-Python-Numpy bridge via C extensions.


I would argue that while it might technically be correct (and admittedly that's the best type of correct) in the strict sense, it is also misleading. If I read that I'm writing C extensions in minutes, I at least expect the extension to compile to C (which the Julia code doesn't).


Definitely misleading given all the bloat and baggage adding Julia as a dependency to a project can have. It's the heaviest glue ever.


Agreed that the title is technically accurate. Julia should also be in the title, but this is possibly an oversight. It seems a bit unnecessary for the GP to assume that this is a bad faith attempt to mislead and insert Julia where it is not welcome rather than a potentially useful extra tool for writing high performance code. Note that the author/poster is probably not a native English speaker.


I wish they would stop overselling too. There are so many problems with trying to adopt it for real projects it isn't even funny... Good for researchers sometimes, but even then... It's got issues. Pretty toxic community too.


> There are so many problems with trying to adopt it for real projects it isn't even funny... Good for researchers sometimes, but even then... It's got issues.

Can you expand on that, so that it's possible to improve on them?


It seems to use a lot of memory, which can be unacceptable for many people and can become impractical if you plan on running several programs simultaneously. For several of the problems on the Computer Language Benchmarks Game, the site shows the Julia programs using several times more memory than similar Java programs. It's even worse when comparing to C, C++, Rust, etc.


> Pretty toxic community too.

Quite the opposite according to my own experience.


Do you have any source on well-written Julia being any slower than well-written C?


I did a comparison of Julia vs numpy, Cython, and Pythran [1] some time ago, for a typical DSP routine we use in our work, and Julia was quite a bit slower than the alternatives. Now, I'm by no means a Julia expert, so I might have missed an optimisation opportunity (although I posted this and nobody could point to anything obvious); however, the whole advertisement behind Julia is that one gets essentially C speed with the ease of Python. That's actually my main gripe with Julia: the community advertises it as something that it isn't. I think otherwise some really cool stuff is happening in Julia Land.



> (although I posted this and nobody could point to something obvious)

For someone outside the field of DSP, please note that your post gives no context to the problem, nor to the input data. It's pretty hard to optimize something when we don't understand the structure of the problem or the input, and what exactly the code is trying to accomplish.

I remember taking a look at the post when it was posted some time ago, and having to give up pretty quickly because it was hard to make sense of it (in any language), and it seemed like I had to install specific Python libraries to (hopefully, if everything works) generate the input data. That takes it beyond "fun little optimization diversion for the day" to something that feels like work.

(Also, to anyone who hasn't read the post, I'll note that somebody did point to something obvious, as mentioned in the update to the post: a single suggestion that only needed a superficial understanding of the code, and that led to a 3x speedup over OP's original Julia code.)

FWIW, someone with some domain knowledge on the Julia Discourse [1] was able to generate the input data, and posted code that they claim should be faster. They only mention the run time of the new code - 9 ms - so I don't know if, and by how much, it is faster in comparison. (Note that the code uses global variables at the end - which is fine there since they're using `@btime` and `$`s, but if using `@timed` instead, those should be enclosed in a `main` function or a `let` block.)



  wxy[:,:,k] += mu*conj(err[i,k])*X
This should probably be

  wxy[:,:,k] .+= mu*conj(err[i,k]).*X
so allocations are avoided. This doubled the speed for me, although I don't know what size inputs are realistic. The @benchmark and @profile macros are good for this stuff.
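For readers more familiar with NumPy, the same allocating-vs-in-place distinction exists there; a rough sketch, with array shapes made up purely for illustration:

```python
import numpy as np

# Hypothetical shapes, just to mirror the snippet above.
wxy = np.zeros((64, 64, 4), dtype=complex)
X = np.ones((64, 64), dtype=complex)
mu = 0.01
err = np.full((8, 4), 0.5 + 0.2j)
i, k = 0, 0

# Allocating form: the right-hand side builds one temporary for the
# product and another for the sum before assigning back into the slice.
wxy[:, :, k] = wxy[:, :, k] + mu * np.conj(err[i, k]) * X

# In-place form: the sum is written through the slice view, saving
# one full-size temporary (the product still allocates one).
wxy[:, :, k] += mu * np.conj(err[i, k]) * X
```

Note that Julia's fully dotted `.+=` fuses the whole right-hand side into a single allocation-free loop, which is stronger than what NumPy's `+=` alone gives you.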


> the whole advertisement behind Julia is that one gets essentially C speed with the ease of Python. That's actually my main gripe with Julia, the community advertises it as something that it isn't.

I partially agree with this - that statement is true some of the time with some code, but in general there's a lot of nuance to that. (Though, I haven't seen that as a general claim much apart from in the initial "Why we created Julia" post describing what they were hoping to achieve; I more often see specific claims where people were able to port their scientific code into Julia that looks like Python and runs like C. But that may just be us having different bubbles online.)

My partial disagreement is from the fact that oftentimes, the "ease of Python" seems to get interpreted by Python coders as "as easy as Python is to me today" rather than "as easy as it was when I was a beginner", i.e. they expect to be able to port their Python knowledge directly somehow. Python has its own quirks and pitfalls, and over the first few months people learn to work with/around them and write idiomatic Python code (and then it becomes second nature, and we often even forget that we had to learn to do this). Given a similar on-ramp, people can easily write Julia code that's at least comparable to C in terms of speed, even if it doesn't always match it.

But when you do need to match or exceed C, Julia starts asking for more in-depth knowledge of allocations, types and their performance characteristics, and the code starts deviating away from looking quite as readable as Python. Hopefully, you only need that in some deep internal functions or packed away in a library, but it _is_ a big caveat to that statement.


I only had a quick look, but I would guess the slicing is what makes it slower than you would expect. Julia copies on an array slice, but you can use the @view macro or do some loops with a preallocated array to make it faster.
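For contrast, NumPy makes the opposite default choice: slices are views, and you opt into a copy, whereas Julia slices copy and `@view` opts into aliasing. A quick NumPy sketch of that aliasing behaviour:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

# NumPy: a slice is a view, so writing through it mutates `a`.
row = a[0]
row[0] = 99
assert a[0, 0] == 99  # `row` aliases a's memory

# Julia's default (copy-on-slice) corresponds to an explicit .copy():
row_copy = a[1].copy()
row_copy[0] = -1
assert a[1, 0] == 3  # the copy left `a` untouched
```

This is one reason a line-by-line NumPy-to-Julia port can silently pick up extra allocations: every bare slice in the Julia version copies.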


You should run Julia code directly to benchmark instead of calling it in Python through pyjulia.


Don't be unreasonable. Numerical Julia code is just as fast, precisely because it's easy to design efficient high-level APIs and let the compiler optimise it.

Try timing Julia's IO, or its string processing, or its hash tables, or its array allocation, or its GC: you know, nearly everything other than mathematical operations on floats or ints. Nearly all of it leaves performance on the table.

Sure Julia is fast, but not on a C level in these areas. There is still lots of room for optimisation, though.


What’s your point?



It says right there: "Always look at the source code"

fannkuch-redux, n-body, spectral-norm, reverse-complement: the fastest C wins by a large margin because it's an unreadable mess of vector intrinsics. If you really wanted to, you could do that in Julia too (and it might read better). Julia looks pretty even with C in the ones that were faithfully translated.


Python itself has to become faster.

For the love of god. Make it as fast as PHP.

My biggest pain point in computer science is that for every project I have to decide to either cope with PHP's butt-ugly namespace system or with Python's masochistic slowness.

Please, Python developers. Take a closer look at how PHP achieves its speed, hold all other development and copy whatever they do!

The result would be heaven.



It seems to indicate a 30% performance boost?

That would be a step in the right direction.

6 more 30% improvements and it is on par with PHP.


It will yield a small speedup (< 50%) at best, probably for selected benchmarks that will be widely promoted.

Microsoft is misallocating its resources. The scientific ecosystem should be ported to .NET, with first class support for F#.


.NET would be the perfect ecosystem. And for those who love their dynamic typing, there is support for that too.


They already tried it two or three times and it didn't take off, so why should it be different today?


> ... for every project I have to decide to either cope with PHP's butt-ugly namespace system or with Python's masochistic slowness.

Good news: there are more than two programming languages!

But seriously, what situation are you in where your only choices are Python and PHP?


Wait, you are using PHP for scientific computing? I would love to hear more about that part.

That could actually work out nicely with the right libraries, just haven't yet heard of anyone doing that.

I think Python is inherently difficult to optimize because it allows some dynamic trickery that PHP does not. Additionally, being large and complex does not help. You can't really pull off a LuaJIT equivalent. There are surely still many quick wins to be made for Python performance, but I'm not sure it will ever compete with PHP/Lua/Julia and others on the performance front.
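One toy illustration of that dynamic trickery (my example, not from the thread): every Python call site must tolerate methods being rebound at runtime, which defeats the kind of static assumptions a PHP- or LuaJIT-style optimizer would like to make without guards:

```python
import types

class Greeter:
    def hello(self):
        return "hello"

g = Greeter()
assert g.hello() == "hello"

# The class method can be swapped out at any time...
Greeter.hello = lambda self: "bonjour"
assert g.hello() == "bonjour"

# ...or even shadowed per-instance, so a seemingly monomorphic
# call site still cannot be specialized without runtime checks.
g.hello = types.MethodType(lambda self: "hi", g)
assert g.hello() == "hi"
```

A JIT can speculate around this (CPython 3.11+ specializes bytecode with guards), but it can never simply compile the call away the way a static language can.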


> Wait, you are using PHP for scientific computing? I would love to hear more about that part.

Not OP but I wish PHP was usable for scientific computing.

If I had better C/Rust/PHP-internals and macro skills, I would integrate a dataframe library into PHP to give us the basics for manipulating data.

Plenty more to add after that for matrix multiplication and the rest but I haven't thought that far ahead!


Would Python being 3x faster make a difference? You'd still have to call out to an actually fast language when you want your code to be fast.


A 3x faster Python would make it about 2x slower than PHP.

That is way easier to digest than the current ~5x performance penalty for using it.

I would probably not consider PHP anymore if Python was 50% of its speed.

I would never consider anything else than Python or PHP for web projects. Developer time is more precious than CPU time.


You can have small developer time with other technologies too (JavaScript, Go, Elixir, etc.) and get some additional benefits compared to PHP. Honestly, I don't see a reason to use PHP today unless you have to:

1. Maintain existing PHP codebase

2. Your team only knows PHP


If you want to optimize for developer time you should look at statically typed languages with rich type systems that have little overhead.


(Writing Python extensions in Julia)

..which could be a heckuva lot friendlier and often safer than C!

The requisite Python boilerplate is pretty ugly, though:

  include_src('example.jl', __file__)
I also don't understand how jl_mat_mul gets mapped to mat_mul.

  jl_mat_mul = Pyfunc(jl_mat_mul)


I had a look at the demo folder, that's just a typo. It should be:

  jl_mat_mul = Pyfunc(mat_mul)

which makes a lot more sense.


Yeah, the boilerplate is rough, but Python's import system is magic. There's no reason why you couldn't just do

    from example.jl import mat_mul
and have it just work. (With the right libraries and import functionality of course.)
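That kind of import magic is doable with a meta path finder. A minimal sketch: the flat `example_jl` module name and the pure-Python `mat_mul` stub are hypothetical stand-ins — a real loader would search for the `.jl` file and wrap its compiled exports:

```python
import importlib.abc
import importlib.util
import sys
import types

class JlFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Toy finder: recognizes one hard-coded module name instead of
    actually searching sys.path for .jl files."""

    def find_spec(self, name, path=None, target=None):
        if name != "example_jl":
            return None  # let the normal import machinery handle it
        return importlib.util.spec_from_loader(name, self)

    def create_module(self, spec):
        mod = types.ModuleType(spec.name)
        # Stand-in for "compile example.jl and wrap its exports":
        mod.mat_mul = lambda a, b: [
            [sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a
        ]
        return mod

    def exec_module(self, module):
        pass  # everything was set up in create_module

sys.meta_path.insert(0, JlFinder())

from example_jl import mat_mul  # resolved by our finder
```

(`from example.jl import mat_mul` is itself legal syntax — Python would treat `example` as a package with a `jl` submodule — so a real bridge could intercept that dotted name the same way.)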


So, the example is about using Julia code bindings instead of numpy's BLAS bindings for "C-level performance"... that's a misguided optimization attempt by the book.


The title should at least hint that this is a C interface to Julia. The issue is not that you cannot use C; the key point is that if you have a problem, you have to know Julia.


> import julia_numpy as np

I don't get it. Does this override Python's numpy library with a Julia-native one? If so, does it implement all of numpy's functionality? Or does this piggyback on normal numpy, redirecting accordingly, while also allowing extensions written in Julia to be plugged in? The documentation doesn't shed any light at all. Neither do the demos.

Also, is this simply a Python-Julia interface, with the "high-performance" part of the description just the usual juliaspeak? Or is there an actual "high-performance" component that makes this more than a simple Python-Julia interface?


Coincidentally I used PyJulia last week to call some numerical code I translated from Python/Numpy to Julia. The actual translation was straightforward and done in under one hour, but I needed an additional day for fine tuning and profiling to actually make it fast.

Does anyone know how JNumPy compares to PyJulia? Calling my Julia code via PyJulia seems to increase Julia's startup time even more, and it would be great if JNumPy did better here.


Are you aware of the juliacall package? [1]

The announcement post [2] says "Calling Julia from Python, juliacall seems to have smaller overhead (in time) than julia". The benchmarks below that show the overhead on calling `identity` with `juliacall` being less than half of what it is with PyJulia.

[1] [2]


Great to see contributions from a fellow Suzhou Julia enthusiast.


For C++ folks I can only recommend pybind11, which provides a numpy interface.

It's a breeze to use, and header-only.


Very cool and nice work.