Why xor eax, eax?

561 points by hasheddan 2 days ago

jgrahamc 2 days ago

In my 6502 hacking days, the presence of an exclusive OR was a sure-fire indicator you’d either found the encryption part of the code, or some kind of sprite routine.

Yeah, sadly the 6502 didn't allow you to do EOR A; while the Z80 did allow XOR A. If I remember correctly XOR A was AF and LD A, 0 was 3E 01[1]. So saved a whole byte! And I think the XOR was 3 clock cycles fast than the LD. So less space taken up by the instruction and faster.

I have a very distinct memory in my first job (writing x86 assembly) of the CEO walking up behind my desk and pointing out that I'd done MOV AX, 0 when I could have done XOR AX, AX.

[1] 3E 00

Reply View 48 replies

wavemode 2 days ago

> CEO walking up behind my desk and pointing out that I'd done MOV AX, 0 when I could have done XOR AX, AX
Now that's what I call micromanagement.
(sorry couldn't resist)

Reply View | 17 replies
- xigoi 2 days ago
  
  The real joke is that a CEO had actual technical knowledge instead of just being there for decoration.
  
  Reply View | 0 replies
- jgrahamc 2 days ago
  
  He was right though. We were memory and cycle constrained and I'd wasted both!
  
  Reply View | 0 replies
- mkornaukhov 2 days ago
  
  Similarly, the CEO couldn't resist the outstanding optimization of memory and execution speed!
  
  Reply View | 12 replies
  
  6510 2 days ago
  
  [flagged]
  
  Reply View | 11 replies
- crest 2 days ago
  
  I had to pad the code for alignment reasons. ;-)
  
  Reply View | 0 replies
- ksherlock 2 days ago
  
  I mean, he IS the Chief EORfficer
  
  Reply View | 0 replies
stevefan1999 2 days ago

> In my 6502 hacking days, the presence of an exclusive OR was a sure-fire indicator you’d either found the encryption part of the code, or some kind of sprite routine.
Correct. Most ciphers of that era should be Feistel cipher in the likes of DES/3DES, or even RC4 uses XOR too. Later AES/Rijndael, CRC and ECC (Elliptic Curve Cryptography) also make heavy use of XOR but in finite field terms which is based on modular arithmetic over GF(2), that effectively reduces to XOR (while in theory should be mod 2).

Reply View | 9 replies
- OhMeadhbh 2 days ago
  
  I was going to say "but RC4 and AES were published well after the 6502's heyday," but NESes were completely rocking it in '87 (and I'm told 65XX cores were used as the basis for several hard drive controllers of the era.) Alas, the closest I ever came to encryption on a (less than 32-bit system) was lucifer on an IBM channel controller in the forever-ago and debugging RC5 on an 8085.
  
  Reply View | 5 replies
  
  kjs3 2 days ago
  
  I'm told 65XX cores were used as the basis for several hard drive controllers of the era
  Western Design Center is still (apparently) making a profit at least in part licensing 6502 core IP for embedded stuff. There's probably a 6502 buried and unrecognized in all sorts of low-cost control applications laying around you.
  RC5 on an 8085
  Oof. Well played.
  
  Reply View | 4 replies
- ASalazarMX 2 days ago
  
  Reading cryptography was that advanced at that time, I'm even more surprised that the venerable Norton Utilities for MS-DOS required a password, that was simply XORed with some constant and embedded in the executables. If the reserved space was zeroes, it considered it a fresh install and demanded a new password.
  If it had been properly encrypted my young cracker self would have had no opportunity.
  
  Reply View | 0 replies
- stevefan1999 12 hours ago
  
  Self-correction: It is GF(2^8) and not GF(2), but GF(2^8) primitive operations (such as carryless multiplication) can be reduced into a bunch of table lookups and/or GF(2) operations, which is how to AES crypto accelerators are being done in hardware.
  
  Reply View | 0 replies
- Sesse__ a day ago
  
  Well, running in CTR mode is really common now, and that ends up XORing the generated keystream into the plaintext… (CTR mode is essentially converting block ciphers into stream ciphers, if you want to see it that way.)
  
  Reply View | 0 replies
vanderZwan 2 days ago

Hah, we commented on the exact same paragraph within a minute of each other! My memory agrees with your memory, although I think that should be 3E 00. Let me look that up:
https://jnz.dk/z80/ld_r_n.html
https://jnz.dk/z80/xor_r.html
Yep, if I'm reading this right that's 3E 00, since the second byte is the immediate value.
One difference between XOR and LD is that LD A, 0 does not affect flags, which sometimes mattered.

Reply View | 5 replies
- sfink 2 days ago
  
  What is this "LD A, 0" syntax? Is it a z80 thing?
  One of the random things burned into my memory for 6502 assembly is that LDA is $A9. I never separated the instruction from the register; it's not like they were general purpose. But that might be because I learned programming from the 2 books that came with my C64, a BASIC manual and a machine code reference manual, and that's how they did it.
  I learned assembly programming by reading through the list of supported instructions. That, and typing in games from Compute's Gazette and manually disassembling the DATA instructions to understand how they worked. Oh, and the zero-page reference.
  Good times.
  
  Reply View | 3 replies
  
  Narishma 2 days ago
  
  > One of the random things burned into my memory for 6502 assembly is that LDA is $A9. I never separated the instruction from the register; it's not like they were general purpose.
  You had LDA and LDX and LDY as separate instructions while the Z80 assembler had a single LD instruction with different operands. It's the same thing really.
  
  Reply View | 1 reply
  
  sfink 2 days ago
  
  Right, though the LD? and ST? instructions were kind of exceptions. You could only do arithmetic and stack and bitwise ops (and, or, eor, shift, rotate) with A, never X nor Y. Increment and decrement were X/Y only. You couldn't even add two registers together without stashing one in memory.
  
  Reply View | 0 replies
  
  vanderZwan a day ago
  
  > What is this "LD A, 0" syntax? Is it a z80 thing?
  Well, I never wrote any 6502 so I can't compare, but yes, you could load immediate values into any register except the flag register on the Z80. Was that not a thing on the 6502?
  
  Reply View | 0 replies
- jgrahamc 2 days ago
  
  You're right. Of course, it's 3E 00. Not sure how I remembered 3E 01. My only excuse is that it was 40 years ago!
  
  Reply View | 0 replies
favorited 2 days ago

"Prefer `xor a` instead of `ld a, 0`" is basically the first optimization that you learn when doing SM83 assembly.
https://github.com/pret/pokecrystal/wiki/Optimizing-assembly...

Reply View | 0 replies
anonzzzies 2 days ago

3E 00 : I was on MSX and never had an assembler when you so I only remember the Hex, never actually knew the instructions; I wrote programs/games by data 3E,00,CD,etc without comments saying LD A as I never knew those at the time.

Reply View | 8 replies
- unnah 2 days ago
  
  Umm... how did you manage to learn those hex codes? You just read a lot of machine code and it started to make sense?
  
  Reply View | 7 replies
  
  jgrahamc 2 days ago
  
  I started out writing machine code without an assembler and so had to hand assemble a lot of stuff. After a while you end up just knowing the common codes and can write your program directly. This was also useful because it was possible to write or modify programs directly through an interface sometimes called a "front panel" where you could change individual bytes in memory.
  Back in 1985 I did some hand-coding like this because I didn't have access to an assembler: https://blog.jgc.org/2013/04/how-i-coded-in-1985.html and I typed the whole program in through the keypad.
  
  Reply View | 1 reply
  
  stevekemp 2 days ago
  
  Same here. On/For the ZX Spectrum, looking up the hex-codes in the back of the orange book. At least it was spiral-bound to make it easier.
  Later still I'd be patching binaries to ensure their serial-checks passed, on Intel.
  
  Reply View | 0 replies
  
  af78 2 days ago
  
  I had a similar experience of writing machine code for Z80-based computers (Amstrad CPC) in the 90's, as a teenager. I didn't have an assembler so I manually converted mnemonics to hex. I still remember a few opcodes: CD for CALL, C9 for RET, 01 for LD BC, 21 for LD HL... Needless to say, the process was tedious and error-prone. Calculating relative jumps was a pain. So was keeping track of offsets and addresses of variables and jump targets. I tended to insert nops to avoid having to recalculate everything in case I needed to modify some code... I can't say I miss these times.
  I'm quite sure none of my friends knew any CPU opcode; however, people usually remembered a few phone numbers.
  
  Reply View | 0 replies
  
  kragen 2 days ago
  
  The instruction sets were a lot simpler at the time. The 8080 instruction set listing is only a few pages, and some of that is instructions you rarely use like RRC and DAA. The operand fields are always in the same place. My own summary of the instruction set is at https://dercuano.github.io/notes/8080-opcode-map.html#addtoc....
  
  Reply View | 0 replies
  
  senderista 2 days ago
  
  It wasn't unusual in the 80s to type in machine code listings to a PC; I remember doing this as an 8-year-old from magazines, but I didn't understand any of the stuff I was typing in.
  
  Reply View | 0 replies
  
  anonzzzies 2 days ago
  
  Typing from mags, getting interested in how the magic works by learning to use a hex monitor and trying out things. I was a kid so time enough.
  I didn't know you could do it differently for years after I started.
  
  Reply View | 0 replies
  
  amirhirsch 2 days ago
  
  I implemented a PDP-11 in 2007-10 and I can still read PDP-11 Octal
  
  Reply View | 0 replies
mmphosis 2 days ago
Try to keep the value 0 in the Y register.
echo tya|asm|mondump -r|6502 A=AA X=00 Y=00 S=00 P=22 PC=0300 0 0300- 98 TYA A=00 X=00 Y=00 S=00 P=22 PC=0301 2
Reply View | 2 replies
- brucehoult 2 days ago
  
  That's 1 byte smaller than `LDA #0`, but not faster. And you don't have enough registers to waste them -- being able to do `STZ` and the `(zp)` addressing mode without having to keep 0 in Z or Y were small but soooo convenient things in the 65C02.
  
  Reply View | 1 reply
  
  snvzz a day ago
  
  You might like the PC Engine, a game console based on the 65C02*.
  *Actually a custom chip also containing some peripherals.
  
  Reply View | 0 replies
user3939382 a day ago

I’m building a new 6502 machine

Reply View | 0 replies

daeken 2 days ago

Back in 2005 or 2006, I was working at a little startup with "DVD Jon" Johansen and we'd have Quake 3 tournaments to break up the monotony of reverse-engineering and juggling storage infrastructure. His name was always "xor eax,eax" and I always just had to laugh at the idea of getting zeroed out by someone with that name. (Which happened a lot -- I was good, but he was much better!)

Reply View 1 reply

VectorLock 2 days ago

I was there but never got in on the Quake 3 fun; mp3t**

Reply View | 0 replies

pansa2 2 days ago

> Unlike other partial register writes, when writing to an e register like eax, the architecture zeros the top 32 bits for free.

I’m familiar with 32-bit x86 assembly from writing it 10-20 years ago. So I was aware of the benefit of xor in general, but the above quote was new to me.

I don’t have any experience with 64-bit assembly - is there a guide anywhere that teaches 64-bit specifics like the above? Something like “x64 for those who know x86”?

Reply View 7 replies

sparkie 2 days ago

It's not only xor that does this, but most 32-bit operations zero-extend the result of the 64-bit register. AMD did this for backward compatibility. so existing programs would mostly continue working, unlike Intel's earlier attempt at 64-bits which was an entirely new design.
The reason `xor eax,eax` is preferred to `xor rax,rax` is due to how the instructions are encoded - it saves one byte which in turn reduces instruction cache usage.
When using 64-bit operations, a REX prefix is required on the instruction (byte 0x40..0x4F), which serves two purposes - the MSB of the low nybble (W) being set (ie, REX prefixes 0x48..0x4f) indicates a 64-bit operation, and the low 3 bits of low nybble allow using registers r8-r15 by providing an extra bit for the ModRM register field and the base and index fields in the SIB byte, as only 3-bits (8-registers) are provided by x86.
A recent addition, APX, adds an additional 16 registers (r16-r31), which need 2 additional bits. There's a REX2 prefix for this (0xD5 ...), which is a two byte prefix to the instruction. REX2 replaces the REX prefix when accessing r16-r31, still contains the W bit, but it also includes an `M0` bit, which says which of the two main opcode maps to use, which replaces the 0x0F prefix, so it has no additional cost over the REX prefix when accessing the second opcode map.

Reply View | 3 replies
- cesarb 2 days ago
  
  > It's not only xor that does this, but most 32-bit operations zero-extend the result of the 64-bit register. AMD did this for backward compatibility.
  It's not just that, zero-extending or sign-extending the result is also better for out-of-order implementations. If parts of the output register are preserved, the instruction needs an extra dependency on the original value.
  
  Reply View | 1 reply
  
  ychen306 2 days ago
  
  This. It's for renaming.
  
  Reply View | 0 replies
- nickelpro 2 days ago
  
  Except for `xchg eax, eax`, which was the canonical nop on x86. Because it was supposed to do nothing, having it zero out the top 32-bits of rax would be quite surprising. So it doesn't.
  Instead you need to use the multi-byte, general purpose encoding of `xchg` for `xchg eax, eax` to get the expected behavior.
  
  Reply View | 0 replies
veltas 2 days ago

Chapter 3 of volume 1, ctrl+f for "64-bit mode", has a lot of the essentials including e.g. the stuff about zeroing out the top half of the register.
https://www.intel.com/content/www/us/en/developer/articles/t...

Reply View | 1 reply
- huflungdung 2 days ago
  
  [dead]
  
  Reply View | 0 replies
matt_d 2 days ago

See https://github.com/MattPD/cpplinks/blob/master/assembly.x86.... - mostly focused on x86-64 (and some of the talks/tutorials offer pretty good overview)

Reply View | 0 replies

wildlogic 2 days ago

I learned this trick writing shellcode - the shellcode has to be null byte (0x00) free, or it will terminate and not progress past the null byte, since it is the string terminator. of course, when you xor something with itself, the result is zero. the byte code generated by the instruction xor eax, eax doesn't contain null bytes, whereas mov eax, 0 does.

Reply View 1 reply

anhldbk 2 days ago

Yes, it's one of my favorite trick also.

Reply View | 0 replies

eb0la 2 days ago

I remember a lot of code zeroing registrers, dating at least back from the IBM PC XT days (before the 80286).

If you decode the instruction, it makes sense to use XOR:

- mov ax, 0 - needs 4 bytes (66 b8 00 00) - xor ax,ax - needs 3 bytes (66 31 c0)

This extra byte in a machine with less than 1 Megabyte of memory did id matter.

In 386 processors it was also - mov eax,0 - needs 5 bytes (b8 00 00 00 00) - xor eax,eax - needs 2 bytes (31 c0)

Here Intel made the decision to use only 2 bytes. I bet this helps both the instruction decoder and (of course) saves more memory than the old 8086 instruction.

Reply View 22 replies

Sharlin 2 days ago

As the author says, a couple of extra bytes still matter, perhaps more than 20ish years ago. There are vast amounts of RAM, sure, but it's glacially slow, and there's only a few tens of kBs of L1 instruction cache.
Never mind the fact that, as the author also mentions, the xor idiom takes essentially zero cycles to execute because nothing actually happens besides assigning a new pre-zeroed physical register to the logical register name early on in the pipeline, after which the instruction is retired.

Reply View | 9 replies
- cogman10 2 days ago
  
  L1 instruction cache is backed by L2 and L3 caches.
  For the AMD 9950, we are talking about 1280kb of L1 (per core). 16MB of L2 (per core) and 64MB of L3 (shared, 128 if you have the X3D version).
  I won't say it doesn't matter, but it doesn't matter as much as it once did. CPU caches have gotten huge while the instructions remain the same size.
  The more important part, at this point, is it's idiomatic. That means hardware designers are much more likely to put in specialty logic to make sure it's fast. It's a common enough operation to deserve it's own special cases. You can fit a lot of 8 byte instructions into 1280kb of memory. And as it turns out, it's pretty common for applications to spend a lot of their time in small chunks of instructions. The slow part of a lot of code will be that `for loop` with the 30 AVX instructions doing magic. That's why you'll often see compilers burn `NOP` instructions to align a loop. That's to avoid splitting a cache line.
  
  Reply View | 6 replies
  
  Sharlin 2 days ago
  
  > For the AMD 9950, we are talking about 1280kb of L1 (per core). 16MB of L2 (per core)
  Ryzen 9 CPUs have 1280kB of L1 in total. 80kB (48+32) per core, and the 9 series is the first in the entire history of Ryzens to have some other number than 64 (32+32) kilobytes of L1 per core. The 16MB L2 figure is also total. 1MB per core, same as the 7 series. AMD obviously touts the total, not per-core, amounts in their marketing materials because it looks more impressive.
  
  Reply View | 3 replies
  
  [removed] 2 days ago
  
  [deleted]
  
  Reply View | 0 replies
  
  gpderetta 2 days ago
  
  Instruction caches also prefetch very well, as long as branch prediction is good. Of course on a misprediction you might also suffer a cache miss in addition to the normal penalty.
  
  Reply View | 0 replies
- umanwizard 2 days ago
  
  > nothing actually happens besides assigning a new pre-zeroed physical register to the logical register name early on in the pipeline, after which the instruction is retired.
  This is slightly inaccurate -- instructions retire in order, so it doesn't necessarily retire immediately after it's decoded and the new zeroed register is assigned. It has to sit in the reorder buffer waiting until all the instructions ahead of it are retired as well.
  Thus in workloads where reorder buffer size is a bottleneck, it could contribute to that. However I doubt this describes most workloads.
  
  Reply View | 1 reply
  
  Sharlin 2 days ago
  
  Thanks, that makes sense.
  
  Reply View | 0 replies
vardump 2 days ago

> - mov ax, 0 - needs 4 bytes (66 b8 00 00) - xor ax,ax - needs 3 bytes (66 31 c0)
You don't need operand size prefix 0x66 when running 16 bit code in Real Mode. So "mov ax, 0" is 3 bytes and "xor ax, ax" is just 2 bytes.

Reply View | 1 reply
- eb0la 2 days ago
  
  My fault: I just compiled the instruction with an assembler instead of looking up the actual instruction from documentation.
  It makes much more sense: resetting ax, and bc (xor ax,ax ; xor bx,bx) will be 4 octets, DWORD aligned, and a bit faster to fetch by the x86 than the 3-octet version I wrote before.
  
  Reply View | 0 replies
Someone 2 days ago

> If you decode the instruction, it makes sense to use XOR:
> - mov ax, 0 - needs 4 bytes (66 b8 00 00) - xor ax,ax - needs 3 bytes (66 31 c0)
Except, apparently, on the pentium Pro, according to this comment: https://randomascii.wordpress.com/2012/12/29/the-surprising-..., which says:
“But there was at least one out-of-order design that did not recognize xor reg, reg as a special case: the Pentium Pro. The Intel Optimization manuals for the Pentium Pro recommended “mov” to zero a register.”

Reply View | 2 replies
- qingcharles 2 days ago
  
  That's weird, I looked it up earlier and found the P6 (Pentium Pro) was the first to actually make the xor optimization into a zero clock operation.
  https://fanael.github.io/archives/topic-microarchitecture-ar...
  
  Reply View | 1 reply
  
  Someone a day ago
  
  A few paragraphs down from that:
  “I assume that the ability to recognize that the exclusive-or zeroing idiom doesn't really depend on the previous value of a register, so that it can be dispatched immediately without waiting for the old value — thus breaking the dependency chain — met the same fate; the Pentium Pro shipped without it.
  Some of the cut features were introduced in later models: segment register renaming, for example, was added back in the Pentium II. Maybe dependency-breaking zeroing XOR was added in later P6 models too? After all, it seems such a simple yet important thing, and indeed, I remember seeing people claim that's the case in some old forum posts and mailing list messages. On the other hand, some sources, such as Agner Fog's optimization manuals say that not only it was never present in any of the P6 processors, it was also missing in Pentium M.”
  
  Reply View | 0 replies
Anarch157a 2 days ago

I don't know enough of the 8086 so I don't know if this works the same, but on the Z80 (which means it was probably true for the 8080 too), XOR A would also clear pretty much all bits on the flag register, meaning the flags would be in a known state before doing something that could affect them.

Reply View | 1 reply
- vanderZwan 2 days ago
  
  Which I guess is the same reason why modern Intel CPU pipelines can rely on it for pipelining.
  
  Reply View | 0 replies
RHSeeger 2 days ago

> the IBM PC XT days (before the 80286)
Fun fact - the IBM PC XT also came in a 286 model (the XT 286).

Reply View | 2 replies
- eb0la 2 days ago
  
  You're right. I forgot that!
  
  Reply View | 1 reply
  
  RHSeeger 2 days ago
  
  To be fair, I only remember because that was the 2nd computer I owned.
  
  Reply View | 0 replies
chasd00 2 days ago

> - mov ax, 0 - needs 4 bytes (66 b8 00 00) - xor ax,ax - needs 3 bytes (66 31 c0)
iirc doesn't word alignment matter? I have no idea if this is how the IBM PC XT was aligned but if you had 4 byte words then it doesn't matter if you save a byte with xor because you wouldn't be able to use it for anything else anyway. again, iirc.

Reply View | 1 reply
- Narishma a day ago
  
  No, the 8088 used in the PC has a 2 byte word size. More importantly, it only has an 8-bit data bus, so alignment didn't really matter because it fetched instructions one byte at a time.
  
  Reply View | 0 replies

fooker 2 days ago

It's funny how machine code is a high level language nowadays, for this example the CPU recognizes the zeroing pattern and does something quite a bit different.

Reply View 8 replies

dheatov 2 days ago

It's really impressive how powerful and efficient it has become. However, I find it so much more difficult to build mental model of it. I've been struggling with atomic and r/w barrier as there are sooo many ways the instructions could've been executed (or not executed!).

Reply View | 1 reply
- fooker 2 days ago
  
  It's a consequence of keeping our general purpose single threaded programming model the same for five decades.
  It has it's merits, but the underlying hardware has changed.
  Intel tried to push this responsibility to the compiler with Itanium but that failed catastrophicically, so we're back to the CPU pretending it's 1985.
  
  Reply View | 0 replies
Reubensson 2 days ago

What do you mean that cpu does something different? Isnt cpu doing what is being asked, that being xor with consequence of zeroing when given two same values.

Reply View | 5 replies
- IsTom 2 days ago
  
  I think OP means that it has come a long way from the simple mental model of µops being a direct execution of operations and with all the register renamings and so on
  
  Reply View | 0 replies
- dooglius 2 days ago
  
  FTA:
  > And, having done that it removes the operation from the execution queue - that is the xor takes zero execution cycles!1 It’s essentially optimised out by the CPU
  
  Reply View | 0 replies
- 12_throw_away 2 days ago
  
  > with consequence of zeroing when given two same values
  Right, it has the same consequence, but it doesn't actually perform the stated operation. ASM is just a now just a high level language that tells the computer to "please give me the same state that a PDP-11-like computer would give me upon executing these instructions."
  
  Reply View | 0 replies
- fooker 2 days ago
  
  Same consequence yes.
  But it will not execute xor, nor will it actually zero out eax in most cases.
  It'll do something similar to constant propagation with the information that whenever xor eax, eax occurs; all uses of eax go through a simpler execution path until eax is overwritten.
  
  Reply View | 0 replies
- horsawlarway 2 days ago
  
  No.
  It's emulating the zero result when it recognizes this pattern, usually by playing clever tricks with virtual registers.
  
  Reply View | 0 replies

timonoko 2 days ago

I had Atari Portfolio, sold without software, except basic command.com. I could only copy raw ASCII bytes from keyboard to file. When it is in COM-file, it is directly executable. What is the simple program I made from ASCII-bytes to read bytes in hexadecimal. I thought that for a week, but finally solved it.

@grok: This is a classic and brilliant puzzle from the early days of DOS. The challenge is to write a self-displaying hex dumper (.COM file) using only the copy con command, which means every byte of the program must correspond to a standard, typeable ASCII character. This is a form of code-golfing that relies on a deep understanding of 8086 machine code.

The solution is a clever piece of self-modifying code. Since many essential 8086 opcodes (like MOV, INT, RET) don't have printable ASCII equivalents, the program must construct these instructions in memory and then execute them.

Here is a well-known 128-byte solution. I will first show you the ASCII characters you would type, and then explain how it works.

The Program to Type

You would have started by typing copy con hex.com and then carefully typed the following 128 characters (including spaces and case-sensitive characters) and finally pressed Ctrl+Z to save the file.

   1 j!>1!QhH-@0X,jHj.Ph1PhD0Ph 0@h, j, h.0@h, h 0@h.H-@,jH1X,h.H.@,H1j j X,h.H.@,H1j j X.H-@H-@,jHj.Ph1PhE0Ph@0@h, j, h.0@h, h
     0@h.

How It Works: A High-Level Explanation

This program is a marvel of 8086 trickery. Here is a simplified breakdown of what's happening:

etc.etc

Reply View 2 replies

timonoko 2 days ago
My program was definitively shorter. I think I did not bother with real hexadecimals. Just used last four bytes of characters to make a full byte. Used it as a bootstrap program.
@grok: While your exact code is lost to time, it would have looked something like one of the ultra-small ASCII "dropper" programs that were once passed around. Here is a plausible 32-byte example of what the program you typed might have looked like.
You would have run copy con nibbler.com, typed the following line, and hit Ctrl+Z: `j%1!PZYfX0f1Xf1f1AYf1E_j%1!PZ` This looks like nonsense, but to the 8088/8086 processor, it's a dense set of instructions that does the following:
etc etc.
Reply View | 1 reply
- timonoko 2 days ago
  
  97% of these millenials of HN do not understand the problem and its brilliant solution. That is why I was truly astonished @grok grokked it rightaway.
  BTW. It is not beyond possibility that this nibbler or dropper was made by myself and published in Usenet by me myself in 1989. Who else would have such a problem.
  It was a bankcrupt sale and the machine was sold as "inactivated".
  
  Reply View | 0 replies

pclmulqdq 2 days ago

In modern CPUs, a lot of these are recognized as zeroing idioms and they end up doing the same thing (often a register renaming trick). Using the shortest one makes sense. If you use a really weird zeroing pattern, you can also see it as a backend uop while many of these zeroing idioms are elided by the frontend on some cores.