I've had some problems writing to the SD card so I decided to take a step back and try to get reading working first. The hard part is verifying what exactly the "data address" in the spec refers to, how many bits are sent on each data line, the timing, and the CRC algorithm. When trying to write, I was making a few assumptions about this. But now that I am reading, I can verify it for sure.
First, I booted into Linux and created a 512-byte block in which every byte has the value 0xF0. I then used "dd" to write this block to the beginning of the SD card (thus wiping out the existing FAT partition information that was on there). This created a bit pattern at the beginning of the disk that I would be able to recognize when reading (due to the way the bits are streamed out when in wide bus mode, this would create an alternating pattern of 1's and 0's on each data line).
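To see why 0xF0 gives an alternating pattern, here is a minimal Python sketch of how a block gets spread over the four data lines in wide bus mode. It assumes the usual 4-bit ordering, where each byte goes out as two nibbles, high nibble first, with the most significant bit of each nibble on DAT3. The function name `dat_line_bits` is made up for illustration and is not part of my actual setup.

```python
def dat_line_bits(block: bytes) -> list[list[int]]:
    """Split a block into the bit streams seen on DAT3..DAT0 in 4-bit mode."""
    lines = [[], [], [], []]  # index 0 = DAT3, index 3 = DAT0
    for byte in block:
        # Assumption: the high nibble is clocked out first, then the low nibble.
        for nibble in (byte >> 4, byte & 0x0F):
            for i in range(4):
                # Bit 3 of the nibble goes to DAT3, bit 0 goes to DAT0.
                lines[i].append((nibble >> (3 - i)) & 1)
    return lines


block = bytes([0xF0]) * 512          # the same test pattern written with dd
lines = dat_line_bits(block)
print("".join(str(b) for b in lines[3][:16]))  # first 16 bits on DAT0
# prints 1010101010101010
```

Every byte contributes a nibble of all 1's followed by a nibble of all 0's, so each of the four lines toggles on every clock.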
Next came the challenge of being able to see what the heck the SD card was spitting out. Trying to log in VHDL is a huge pain. I had ordered a logic analyzer but it hadn't arrived yet. Then I had an idea! Since I was in control of how fast the SD card's clock ran, I could slow it down enough to use an AVR microcontroller to log the data and spit it out to a serial port.
After a few calculations, I determined I would need to crank the clock down to about 1000 Hz and set the AVR to transmit at 230400 bps over the serial port. I ended up using an ATmega328P, which worked great since I was able to run it at 3.3V.
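As a rough sanity check on those numbers (not the exact calculation I did), this sketch works out how many serial bytes the AVR can push out per SD clock period. It assumes 8N1 framing, so each byte costs 10 bits on the wire. The helper name is hypothetical.

```python
SD_CLOCK_HZ = 1000        # slowed-down SD clock from the post
BAUD = 230400             # AVR serial rate from the post
BITS_PER_UART_BYTE = 10   # assumption: 8N1 framing (start + 8 data + stop)


def uart_bytes_per_sd_clock(sd_clock_hz: int, baud: int) -> float:
    """Serial bytes that fit into one SD clock period at the given baud rate."""
    return baud / BITS_PER_UART_BYTE / sd_clock_hz


budget = uart_bytes_per_sd_clock(SD_CLOCK_HZ, BAUD)
print(f"{budget:.2f} serial bytes available per SD clock")
# prints 23.04 serial bytes available per SD clock
```

With that kind of headroom there is room to log each sample as a few human-readable characters rather than packing bits.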
Here is the resulting screenshot, taken right after I sent the read data command to the SD card:
As one can see, I am definitely getting the alternating pattern of 1's and 0's that I was hoping for. PHEW! This means that data address 0 does refer to the beginning of the SD card (which is what I was expecting but you never know for sure until you verify).
Also interesting is that the SD card starts sending data before it has finished responding to the read command. That's weird. But I guess the specs did imply that that was what would happen.
Next I need to verify how many bits are being sent on each data line (I am assuming 128* since there are 4 lines of which I am only capturing 2). Finally I will need to see what CRC it is sending back to me and check it against my own algorithm that I think is correct.
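For reference, here is a minimal sketch of the kind of CRC check I have in mind. It is not my actual implementation. It assumes the data CRC is the 16-bit CCITT one (polynomial x^16 + x^12 + x^5 + 1, initial value 0) computed separately over each data line in wide bus mode, and it uses the 1024 bits per line from the edit at the end of this post. The function names are made up, and Python's `binascii.crc_hqx` is only used as a cross-check.

```python
import binascii


def crc16_ccitt(bits, crc=0):
    """Bit-serial CRC16, polynomial 0x1021, MSB first, no final XOR."""
    for bit in bits:
        feedback = ((crc >> 15) & 1) ^ bit   # top register bit XOR incoming bit
        crc = (crc << 1) & 0xFFFF
        if feedback:
            crc ^= 0x1021
    return crc


def bytes_to_bits(data):
    """Expand bytes into a list of bits, most significant bit first."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


# Cross-check the bit-serial version against the standard library.
sample = b"123456789"
assert crc16_ccitt(bytes_to_bits(sample)) == binascii.crc_hqx(sample, 0)

# One data line of the 0xF0 test block: 1024 alternating bits, starting with 1.
line_bits = [1, 0] * 512
line_crc = crc16_ccitt(line_bits)
print(f"expected CRC16 for one data line: 0x{line_crc:04X}")
```

The bit-serial form is handy here because it maps directly onto a shift register, which is how the same check would be done per data line in VHDL.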
After that, I _should_ be able to write something, dangit! :)
* - EDIT: Well, I obviously wasn't thinking straight. The correct answer is 1024 bits on each data line. A 512-byte block is 4096 bits, and 4 bits are sent per clock (one on each of the 4 data lines), so each line carries 4096 / 4 = 1024 bits. This should've been obvious *sheepish look*