Get the top HN stories in your inbox every day.
todd8
What a nice set of tutorials. I want to thank the author for putting it together.
I rarely write or look at assembly language anymore because of three factors:
* Hardware is so fast that I don't need register level control of my code.
* Compilers are so good that I can't optimize the actual machine instruction use any better than they do.
* I'm now using Intel/AMD architecture machines almost exclusively. (But I am typing this on an M1-based system right now!) Our PC instruction sets have become more and more complex, full of features that may or may not be present on machines that are the intended target for execution. Programming at the assembly level is less tractable than it was in the past.
However, like vitamins, understanding assembly language is good for you, and many software developers will find assembly language useful as their careers develop. OS kernels, security and vulnerability work, high performance programming, device drivers, and the ubiquitous IoT all use assembly language at least some of the time. These are generally not areas where new CS majors will be working professionally, but those with the goal of becoming a senior developer should learn assembly language.
I find that after a bit of practice with the instruction set, assembly language programming is fun because with only a (relatively) small set of atomic building blocks, the registers and the instructions, the programmer builds a solution. No scouring StackOverflow, no reading the latest Pragmatic Press book on a new framework, and no searching through packages and standard libraries for the proper use of some zipped iterators.
I have a few tips for autodidacts setting out to learn assembly language. First, take a look at the article. It's a very nice introduction to assembly language. Second, pick some simple projects. Don't sit down to write a chess program in ASM. Third, learn how to use a good debugger like GDB or DDD; being able to single step through your assembly language and see what is going on is essential.
My fourth tip is the best. Read and do exercises from R. Bryant and D. O'Hallaron's book Computer Systems: A Programmer's Perspective 3rd Edition, see [1]. This is on my short list of best books for CS students. Unfortunately, this book is $150 new, but you should be able to buy a good condition used copy from a university bookstore. I would never sell my own copy; it is just great. This book, and the required course for CS and ECS majors at CMU it was written for, is the inspiration for the notorious and wonderful Binary Bomb Lab exercise.
To be clear, R. Bryant and D. O'Hallaron is not a book about assembly language programming. It is a book about computer systems from the programmer's perspective and specifically from a C systems programmer's perspective. This requires a realistic understanding of assembly language. If there were just one book that I would like my hires to be familiar with, it would be this book. The book covers so many important systems programming concepts that it is hard to summarize (unfortunately, I'm not at home with my copy) but some of the subjects are bit representations, hardware architecture, assemblers, compilers and linkers, buffer overflow vulnerabilities, calling conventions, network programming, concurrency, shells (fork, signals, process control), the impact of hardware memory hierarchy on performance, and virtual memory implementations.
[1] https://www.amazon.com/Computer-Systems-Programmers-Perspect...?
blame-troi
I’m a long time mainframe assembly language programmer and I appreciate good instruction set architecture. Intel is ubiquitous but it’s a messy architecture these days. There’s little need for doing assembly anymore beyond personal enrichment and enjoyment, and for that I recommend ARM.
Tepix
Love the NASM syntax, never got used to the Intel style.
oso2k
NASM follows Intel style though it deviated from MASM and TASM for certain things. I think you meant GAS & AT&T style.
Tepix
Yes, I got it completely wrong. It's been too long.
wk_end
Isn't NASM syntax basically Intel-style? Whereas GAS uses AT&T-style?
hsbauauvhabzb
As an outsider, there are multiple syntax/languages and architectures, right? As a beginner, what’s best to learn? I’m scared of getting stuck in a situation where I know something well but I’m unable to apply it to the space I want to understand (RE/binary exploitation on x86/64).
junon
Syntax and ISAs (instruction set architectures) are independent of each other.
GAS (the GNU assembler, which defaults to AT&T syntax), for example, uses `mov src,dest`, whereas Intel syntax uses `mov dest,src`. Both still assemble down to the exact same instructions.
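To make the operand-order difference concrete, here is a minimal Python sketch that rewrites a simple Intel-syntax line into AT&T form. The function name `intel_to_att` and the small register table are hypothetical, made up for illustration; it only handles two-operand instructions with plain registers or decimal immediates, and ignores memory operands and size suffixes entirely.

```python
import re

# Hypothetical, deliberately tiny register table for this illustration.
REGISTERS = {"rax", "rbx", "rcx", "rdx", "rsi", "rdi",
             "eax", "ebx", "ecx", "edx"}


def intel_to_att(line):
    """Rewrite e.g. 'mov rax, rbx' (Intel) as 'mov %rbx, %rax' (AT&T)."""
    mnemonic, operands = line.strip().split(None, 1)
    dest, src = [op.strip() for op in operands.split(",")]

    def decorate(op):
        if op in REGISTERS:
            return "%" + op   # AT&T prefixes register names with %
        if re.fullmatch(r"-?\d+", op):
            return "$" + op   # ... and immediate values with $
        raise ValueError("unsupported operand: " + op)

    # Intel order is destination first; AT&T order is source first.
    return "{} {}, {}".format(mnemonic, decorate(src), decorate(dest))


print(intel_to_att("mov rax, rbx"))
print(intel_to_att("mov eax, 5"))
```

Either spelling names the same machine instruction; only the text the assembler reads differs.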
The problem with GAS is that it's verbose and almost arcane. It requires a lot of extra notation to indicate what you want to do when it's not entirely necessary. There are a lot of quirks and a number of footguns, too.
Intel syntax, on the other hand, is much more human friendly without sacrificing anything. Definitely start with Intel syntax.
The assembler itself might also have a preprocessor included, which allows you to write macros and stuff to automate tedious things or to DRY up code. Just a matter of checking the docs.
Lastly, ISAs are the actual instructions for a CPU - e.g. x86, AMD64, ARM, etc. Every assembler supports one or more of these of course. The syntax is irrelevant.
Hope that helps a bit. NASM is a great way to start.
hsbauauvhabzb
Thanks, it is. I really hope 2022 is the year that I finally pick up assembly, currently it’s a time thing more than a motivation thing but I’m pretty optimistic I’ll get there :)
NtGuy25
If you're trying to enter the field I would highly suggest ARM. Mobile malware is the wild west and they really need analysts and hire a bunch of people with no experience out of college. X86/64 is a lot harder to enter the field in since there's a lot more saturation, so the barrier to entry is higher.
WJW
Learning assembler is on my to-do list for 2022, so this is super timely! Not that I expect to use it much, but it will be interesting to learn after slowly approaching the topic from both sides with the little hobby VMs from Advent of Code on the one side and electronics courses on the other.
tarkin2
Why do people use NASM? Faster binaries? When is it needed? Where do the binaries run?
sigjuice
Portions of a program or library might be written in assembly for performance. This list should give an idea of the type of software that does this
$ cd /opt/homebrew/Library/Taps/homebrew/homebrew-core/Formula
$ grep depends_on.*nasm * | sed 's,:.*,,' | xargs grep desc
dav1d.rb: desc "AV1 decoder targeted to be small and fast"
dssim.rb: desc "RGBA Structural Similarity Rust implementation"
ffmpeg.rb: desc "Play, record, convert, and stream audio and video"
handbrake.rb: desc "Open-source video transcoder available for Linux, Mac, and Windows"
isa-l.rb: desc "Intelligent Storage Acceleration Library"
jpeg-turbo.rb: desc "JPEG image codec that aids compression and decompression"
libass.rb: desc "Subtitle renderer for the ASS/SSA subtitle format"
libavif.rb: desc "Library for encoding and decoding .avif files"
libvmaf.rb: desc "Perceptual video quality assessment based on multi-method fusion"
lrzip.rb: desc "Compression program with a very high compression ratio"
mozjpeg.rb: desc "Improved JPEG encoder"
mvtools.rb: desc "Filters for motion estimation and compensation"
openh264.rb: desc "H.264 codec from Cisco"
openj9.rb: desc "High performance, scalable, Java virtual machine"
pce.rb: desc "PC emulator"
ppsspp.rb: desc "PlayStation Portable emulator"
rav1e.rb: desc "Fastest and safest AV1 video encoder"
vapoursynth.rb: desc "Video processing framework with simplicity in mind"
x264.rb: desc "H.264/AVC encoder"
x265.rb: desc "H.265/HEVC encoder"
the_only_law
Interesting how many of these are related to video encoding. While I can understand wanting to micro optimize performance for this sort of work when done in the CPU, I would think that if you truly wanted maximum performance, you’d offload this to something like a GPU or other hardware offering this feature.
Emulators make sense I suppose.
I’ve only ever seen modern assembly in the wild in small parts, in say a kernel or something like coreboot. I’ve been told by graphics programmers and game devs that it’s not uncommon to hand roll assembly code in those domains, which was surprising to me.
totorovirus
I am truly impressed by your command line skills. Awesome
yjftsjthsd-h
I'm about to use it for really early OS development. You kind of need assembly for some bits in early boot / setup, and nasm seems friendlier to me than the alternatives.
sharikous
More control over the machine. For example you may want to use certain CPU instructions, or as you mentioned you can eliminate a lot of overhead in certain cases.
Look at it like melting your own metal piece. Sometimes the screws that you find in the shop are not the exact shape you need and it's better to build your own.
I bet most of the things you code with Nasm are not entire binaries. You can write functions entirely in assembly and link the compiled outputs to any binary file once you know the calling convention
sebow
Better documentation (I'm pretty sure the latest official documentation from gnu/as is from like 2009), packages on all distros & *nix flavors: linux, deb/rpm-based, freebsd, etc. And also, why not? A lot of people use it. I think gnu/as has a bit better tooling, more flags, etc., but that's quickly changing.
Koshkin
Go to Java for the climate, NASM for the company. (Or was it the other way around?) In any case, even Windows programming in NASM is easy!
chrisseaton
> Why do people use NASM? Faster binaries?
There's nothing NASM can do to produce faster binaries than any other assembler - as it doesn't control which instructions to generate - they're specified entirely by the user.
glandium
Actually, nasm does have optimization passes. https://nasm.us/doc/nasmdoc2.html#section-2.1.24 That's not full of details, but one thing that it does on x86_64, for example, is replace instructions that set 64-bit registers to immediate values smaller than 2^32 with instructions that set the equivalent 32-bit register (that encoding is shorter, and it is equivalent because those instructions zero-extend the value).
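A rough Python sketch of why that substitution saves space. This models the idea only and is not NASM's code; the function names are made up for illustration, and it covers just the `rax`/`eax` case using the standard `B8+rd` move-immediate opcode.

```python
import struct


def encode_mov_r64_imm64(value):
    # mov rax, imm64: REX.W prefix (0x48), opcode B8 (rax is register 0),
    # followed by an 8-byte little-endian immediate.
    return bytes([0x48, 0xB8]) + struct.pack("<Q", value)


def encode_mov_r32_imm32(value):
    # mov eax, imm32: opcode B8, followed by a 4-byte little-endian
    # immediate. In 64-bit mode, writing eax zero-extends into rax.
    return bytes([0xB8]) + struct.pack("<I", value)


def encode_mov_rax(value):
    """Pick the shorter encoding when the value fits in 32 unsigned bits."""
    if 0 <= value < 2**32:
        return encode_mov_r32_imm32(value)
    return encode_mov_r64_imm64(value)


for v in (1, 2**32):
    code = encode_mov_rax(v)
    print(hex(v), len(code), "bytes:", code.hex())
```

For values below 2^32 the 32-bit form is 5 bytes against 10 for the full 64-bit immediate form, with the same result left in `rax`.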
pjmlp
That is common in most Assemblers, RISC ones like MIPS even go further with pseudo instructions to work around how low level RISC opcodes tend to be.
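As an illustration of a MIPS pseudo-instruction, the sketch below shows one common way an assembler can expand `li` (load immediate) into real instructions, since no single MIPS instruction carries a full 32-bit constant. The helper `expand_li` is hypothetical, and real assemblers may choose differently (for example `addiu` for small negative values).

```python
def expand_li(reg, value):
    """Expand 'li reg, value' into real MIPS instructions (one possible choice)."""
    value &= 0xFFFFFFFF              # treat the constant as a 32-bit pattern
    hi, lo = value >> 16, value & 0xFFFF
    if hi == 0:
        # Fits in 16 bits: a single OR-immediate against the zero register.
        return ["ori {}, $zero, {:#x}".format(reg, lo)]
    if lo == 0:
        # Only the upper half is set: load-upper-immediate alone suffices.
        return ["lui {}, {:#x}".format(reg, hi)]
    # General case: load the upper 16 bits, then OR in the lower 16 bits.
    return ["lui {}, {:#x}".format(reg, hi),
            "ori {}, {}, {:#x}".format(reg, reg, lo)]


for line in expand_li("$t0", 0x12345678):
    print(line)
```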
rackjack
Are there any "major" assembly languages, or are they just used as needed for the platform?
erosenbe0
A family is typically called an "Instruction Set Architecture"
Main ones in development today would be:
x86-64 / AMD64 / Intel Architecture - modern Intel and AMD processors
i386 / x86 / IA-32 - older 32-bit Intel still widely in use
aarch64 - 64-bit ARM in nearly all phones and new Macs
aarch32 / ARM32 - 32-bit ARM in majority of 32-bit microcontrollers and older phones
RISC-V comes in 32 and 64 bit as well and is popular with startups and education due to its relative lack of legal/IP encumbrances
6502 is a classic vintage processor used in original Apple, Nintendo, Commodore 64, and early radiation hardened medical devices. It is resurgent with vintage enthusiasts, tinkerers and YouTubers due to its noted cycle for cycle advantage over other early microprocessors.
IBM 360/370/390 and z-architecture is surely running banks somewhere but you won't find much on that on HN
chrisseaton
> Intel Architecture - modern Intel and AMD processors
Note that 64-bit 'Intel Architecture' isn't AMD64, it's Itanium.
But these aren't assembly languages anyway - they're instruction set architectures. Assembly languages include AT&T and Intel.
erosenbe0
Agree. IA-64 or IA64 is the short form for Itanium not any form of x86.
Assembly languages: AT&T form starting on early Unix PDP/VAX systems and later SVR4 Unix as well as gas assemblers.
Intel form derived from 8080/5/6/8 and 8051 line as well as Zilog z80 line including assemblers such as Microsoft MASM.
IBM would have a couple assembler products for their mainframe lines likely supporting code written as far back as 1965.
Some obsoleted assemblies were a bit in-between as I recall Mac 68k assemblers being neither one nor the other.
Several embedded platforms have idiosyncratic assembly variants as well.
brandmeyer
It seems like every hardware vendor (and sometimes each compiler) provides its own syntax for assembly code. Leaving aside the obvious fact that each ISA has its own instructions, nearly everything else varies from one assembler to the next.
Are clobbered registers the left-most or right-most argument? How are comments expressed? Are operands separated by space alone or with commas? How are jump target labels expressed? How are addressing modes expressed? Go source diving in the FreeRTOS source code and you'll see that even when several different compilers support the same architecture (eg ARMv7-M), they will all have slightly incompatible assembly code syntax.
duped
Assembly languages aren't even defined for the platform, they're defined for the compilers (more technically, the assemblers). There aren't even standard assemblers, almost every compiler will roll their own (including VMs and JITs).
Intel vs AT&T syntax just standardize a few conventions, AT&T requiring more explicitness (and of course, flipping the position of source/destination operands).
analognoise
"The only interface a programmer has above the actual hardware is the kernel itself."
Laughs in bare metal.
chrisseaton
> Laughs in bare metal.
But that's what they said. If you aren't using bare metal, then the only other interface you have directly above that is the kernel.
tux3
Well, there's still plenty of systems you may interface with that sit somewhere between the kernel and the metal!
UEFI software, SMM code, various other pieces of firmware, your cloud provider's hypervisor, the Intel ME, ..
All those things send machine code to the CPU, which is really where bare metal starts. Everything that happens after is hardcoded in silicon (ignoring ucode!)
Koshkin
> ignoring ucode
Which you can’t anymore, because in modern CISC CPUs it is no longer microcode in the original sense but rather machine code for an internal RISC processor. Also, some hardware parts are built using FPGAs. (Basically, we might as well call the Java VM “bare metal.”)
Koshkin
The thing is, the metal has not been bare for a long time now.
amir734jj
What is the difference between NASM and MIPS?
Narishma
One is an assembler, the other an instruction set.
1ark
Can also recommend the rewrite of NASM, YASM. https://yasm.tortall.net/
shmerl
No active development it seems: https://github.com/yasm/yasm/commits/master
Compare to: https://github.com/netwide-assembler/nasm/commits/master
junon
YASM was the best at one point but its development stalled a while back. NASM is currently the most actively developed one.
This is really good. But make sure that you read the NASM documentation (it's really good): https://www.nasm.us/xdoc/2.15.05/html/nasmdoc0.html
Specifically, my main gripe with this is that x64 code changes a lot of what this guide assumes, which can lead to A LOT of pitfalls. So make sure you read https://www.nasm.us/xdoc/2.15.05/html/nasmdo12.html (the 64-bit programming section) if you do follow this guide.