Fortran considerations

Thoughts on the ever-present "Why physicists still use Fortran" debate

2 Jan '17

Every so often, someone feels the need either to mock the fact that Fortran is still used or to justify why it is still being used. Take this Hacker News discussion from December 2016 of an article from 2015, titled “Why physicists still use Fortran”. Now, I don’t think anybody is criticising Fortran as a tool per se. Fortran is a powerful tool. But you have to know how to use it.

Don’t use Fortran “because it is faster”

The joke in the heading is of course that when people say “it’s fast” or “it’s faster”, they usually forget to mention what it’s faster than, and in which aspect.

Speed isn’t a great argument for Fortran specifically. The flaw in this argument is that almost everybody overestimates how much they know. Which means they also overestimate how good they are at writing code, and debugging it. (Obviously, if you’re a Fortran god, feel free to skip this part.)

Maybe in the past it was true that C/C++ compilers weren’t as good as Fortran compilers at producing optimised code. In most cases, this is no longer true. And compilers aren’t infallible. Small changes in code can result in better cache usage, yielding huge speed increases for no apparent reason. For example, it isn’t obvious why (or even if) quicksort is quicker than other algorithms with the same complexity.
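To make the cache point concrete, here is a minimal sketch in Python with NumPy (my choice for illustration, nothing Fortran-specific). It sums the same numbers row by row, once from a row-major array and once from a column-major copy. The function name `sum_by_rows` and the array size are made up for the example. The arithmetic is identical; only the memory layout differs, and how much the timings differ depends on your machine.

```python
import timeit

import numpy as np


def sum_by_rows(a):
    """Add up a 2-D array one row slice at a time."""
    total = 0.0
    for i in range(a.shape[0]):
        total += a[i, :].sum()
    return total


n = 1000  # arbitrary size, chosen only for the illustration
rng = np.random.default_rng(0)
c_order = rng.random((n, n))          # row-major: each row is contiguous in memory
f_order = np.asfortranarray(c_order)  # column-major copy holding the same values

# Same function, same values; only the stride of each row slice changes.
t_c = timeit.timeit(lambda: sum_by_rows(c_order), number=5)
t_f = timeit.timeit(lambda: sum_by_rows(f_order), number=5)

print(f"row slices of row-major array:    {t_c:.4f} s")
print(f"row slices of column-major array: {t_f:.4f} s")
```

Nothing in the source of `sum_by_rows` hints at which call is the slow one, which is exactly the problem.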

So “not invented here” becomes a big issue. Ideally, you want to be using code that many other people have also used/worked on, and as much of that as possible. Custom-written Fortran is rarely as fast as an existing library function. Worse than that, it’s untested (read: probably full of bugs).

And while Fortran does have good math libraries (BLAS, LAPACK), other languages have bindings to these, too. It’s a myth (or oversimplification) that BLAS is “fast” because it’s written in Fortran. The reference BLAS implementation is written in Fortran, but is not optimised for speed. So it depends on the BLAS-compatible implementation being used. And a significant part of those is hand-written assembly (see e.g. OpenBLAS).

Generally, more modern languages like C/C++ simply have more libraries. Whether it’s CERN’s ROOT, Boost, some graphics library, or the C++ standard library, having these tools makes writing good code much easier. So not only are you potentially writing slower code, it’s taking you longer to do so.

Anecdotally, I’ve encountered this situation when I was writing some simulation code in Python and somebody decided they’d write the same thing in C, because my Python code was going to be “slow”. The code relied mainly on matrix operations. Of course NumPy provides great matrix math functions. So the Python code took less time to develop, had fewer bugs and ran faster.
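This is not the code from that anecdote, just a rough sketch of the general effect. It compares a textbook triple-loop matrix product with NumPy’s `@` operator, which for floating-point matrices normally hands the work to whatever BLAS library NumPy was built against. The helper `matmul_loops` and the matrix size are invented for the example. The loops here are pure Python, so the gap is exaggerated compared to hand-written C, but the point carries over: the library call is one line and already tuned.

```python
import timeit

import numpy as np


def matmul_loops(a, b):
    """Textbook triple-loop matrix product on nested lists."""
    n, m, p = len(a), len(b), len(b[0])
    out = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            aik = a[i][k]
            for j in range(p):
                out[i][j] += aik * b[k][j]
    return out


n = 60  # kept small so the hand-written loops finish quickly
rng = np.random.default_rng(1)
a = rng.random((n, n))
b = rng.random((n, n))
a_list, b_list = a.tolist(), b.tolist()

t_loops = timeit.timeit(lambda: matmul_loops(a_list, b_list), number=3)
t_numpy = timeit.timeit(lambda: a @ b, number=3)

print(f"hand-written loops: {t_loops:.4f} s")
print(f"NumPy (a @ b):      {t_numpy:.6f} s")
```

Both versions give the same answer to within rounding. One of them took a dozen lines and three chances to mix up an index.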

But even if it did take longer to run, two more hours of runtime overnight is nothing if you can save several days of development effort. And by that I mean writing the code and debugging it.

Don’t use outdated development practices

Okay, but say you’re good at writing Fortran, or you have to use it. Part of the critique of such an “old” language is implicitly the worry that development practices will also be old.

At the very least, you must be using source code management/version control. SVN is barely acceptable, and you need to make sure it’s backed up. git is strongly recommended, simply because it’s so ubiquitous. Yeah, it isn’t the easiest, but it’s a transferable skill, so worth the time. Put it on your CV or something.

Code distributed via zip files is bad. Zip files named “code-final.zip”, “code-final-final.zip”, “code-final-1.zip”, etc. are worse. If you are not using SCM, this is a far bigger problem than your choice of language will ever be. Because it’s a fundamental problem about how you go about development. There are no excuses for not using SCM, even if you are the only person (currently) coding on the project. Do not pass Go. Do not collect $200. Use git.


There are other considerations.

Testing new, unproven code is essential, especially numerical code. How else do you know that it’s working correctly? There are several papers now that have been withdrawn due to bugs in code. But testing frameworks for Fortran are either non-existent or underdeveloped.
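For what it’s worth, a useful test for numerical code doesn’t have to be elaborate. The sketch below is in Python, but the idea is language-independent: check the routine against a case with a known analytic answer, and check that the error shrinks at the rate the method promises. The `trapezoid` function is a made-up stand-in for whatever routine you actually wrote. The `test_` prefix follows the pytest naming convention, so a runner like pytest would pick these up, but they also run as plain functions.

```python
import numpy as np


def trapezoid(f, a, b, n):
    """Composite trapezoid rule with n equal subintervals."""
    x = np.linspace(a, b, n + 1)
    y = f(x)
    h = (b - a) / n
    return h * (y[0] / 2 + y[1:-1].sum() + y[-1] / 2)


def test_matches_analytic_result():
    # The integral of sin(x) over [0, pi] is exactly 2.
    result = trapezoid(np.sin, 0.0, np.pi, 1000)
    np.testing.assert_allclose(result, 2.0, rtol=1e-5)


def test_error_shrinks_quadratically():
    # The trapezoid rule is second order: doubling n should cut
    # the error by roughly a factor of 4.
    err_coarse = abs(trapezoid(np.sin, 0.0, np.pi, 100) - 2.0)
    err_fine = abs(trapezoid(np.sin, 0.0, np.pi, 200) - 2.0)
    assert 3.5 < err_coarse / err_fine < 4.5


if __name__ == "__main__":
    test_matches_analytic_result()
    test_error_shrinks_quadratically()
    print("all checks passed")
```

The second test is the more valuable one. A wrong factor of two can survive a loose tolerance, but it rarely survives a convergence check.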

Dependencies and documentation are also important. What if you need to get an undergrad working on the project ASAP? If you are using git, and you must, he/she can clone the repository with all the history. But what about dependencies? Will it even run on their machine? For example, Python lets you specify dependencies via setup.py (or at least requirements.txt), which works more often than not.

As a final suggestion, please consider open sourcing the code. But this post is long enough as it is, so I’ll write about the benefits of this approach some other time.


TL;DR: Please, please use source code management. Always, even with Fortran.

physics, git, rant
