Wednesday, June 26, 2013

VBI Injection v2: Soldering on SMT version of ATMega 328p

A new PCB arrived today.  This is the second attempt at VBI injection which uses a video multiplexer instead of an op-amp.  I soldered on the hardest component first.

Step one was just to get solder on all the pads, even if the pins are connected when they are not supposed to be.  Here you can see I schlopped a bunch of solder on and joined them all.


Step two was to use the solder wick to remove the connections.  The finished soldering job looks pretty pro if I do say so myself :)


Tomorrow, I hope to solder on the rest of the components.

I have to say, soldering on this ATMega 328p was a lot easier than I thought it was going to be.  The hardest part is getting the pins lined up to the pads before beginning.  I also made the pads extra long to make my job easier.

Wednesday, June 19, 2013

SD card CRC algorithm explained in detail

Recently, someone asked me to explain the SD card's CRC algorithm.  I had forgotten a lot so I was not able to tell him much, but I got curious and decided to document exactly how it works in this blog post so I can refer to it later.

First of all, the official SD card documentation (in section 4.5 as of the time of this post) briefly describes the CRC algorithm and gives a few examples.  Unfortunately, its description assumes that the reader is a math major in a masters program at a prestigious university rather than providing a simple algorithm that would make sense to a novice software developer :)

The Wikipedia article ( http://en.wikipedia.org/wiki/Cyclic_redundancy_check ) is a bit more friendly and gives a helpful example.

SD cards use two CRC algorithms: one is called CRC7, the other is called CRC16.  I will explain CRC7 here.  CRC16 is the same except its "polynomial" is different as is the amount of padding that the initial number must receive.

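The polynomial for CRC7 is 0x89 (x^7 + x^3 + 1, written with the leading x^7 bit included); the polynomial for CRC16 is 0x1021 (x^16 + x^12 + x^5 + 1, written the conventional way with the leading x^16 bit left off, so the full value is 0x11021), which is based upon a standard called CRC-CCITT.  More about that here.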

The CRC7 algorithm works like this:

- Pretend that the data buffer you want to compute CRC7 from is a really huge number (big endian).
- Take this number and shift it left the number of bits that the result will have, in this case 7. (this seems to be standard for most CRC algorithms)
- Find the first bit (starting from the left) that is a 1 in the number and XOR that with the polynomial (where the polynomial is shifted left enough times that its highest bit lines up with the 1 that you found, I will provide an example later)
- Repeat the last step until every bit of the original number (that is, everything above the bottom 7 bits that you added by shifting left) is 0.
- The CRC result will be in the bottom 7 bits at this point.
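
The steps above can be sketched in a few lines of Python.  This is just an illustration of the "really huge number" approach, not how you would do it on a microcontroller.  The function name `crc_by_division` and its parameters are my own; the polynomial is passed with its leading bit included (0x89 for CRC7, 0x11021 for CRC16).

```python
def crc_by_division(data: bytes, poly: int, width: int) -> int:
    """CRC via plain XOR long division (hypothetical helper, for illustration).

    poly must include its leading bit, so it is exactly width + 1 bits long.
    """
    # Pretend the buffer is one huge big-endian number, then shift it left
    # by the number of bits the result will have.
    value = int.from_bytes(data, "big") << width
    # Keep going until everything above the bottom `width` bits is 0.
    while value >> width:
        top = value.bit_length() - 1       # position of the leftmost 1 bit
        value ^= poly << (top - width)     # line the polynomial up under it
    return value                           # the CRC is what remains


def crc7(data: bytes) -> int:
    return crc_by_division(data, 0x89, 7)


def crc16(data: bytes) -> int:
    return crc_by_division(data, 0x11021, 16)


# CMD0 with a zero argument: command byte 0x40 followed by four 0x00 bytes.
cmd0 = bytes([0x40, 0x00, 0x00, 0x00, 0x00])
print(hex(crc7(cmd0)))             # 0x4a
print(hex((crc7(cmd0) << 1) | 1))  # 0x95, the CRC plus the end bit
```

On the wire the 7-bit CRC is followed by a single 1 end bit, which is why the last byte of a CMD0 frame is usually quoted as 0x95.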

Here is an example of how to get the CRC7 result that the SD card specs say you should get.


Saturday, June 15, 2013

LDP sniffer board arrived

The LDP sniffer board arrived today!  Some initial tests with the multimeter show that it is wired properly.  I will have to do some more extensive tests later and also solder it together.  It shouldn't take too long.

Thursday, June 13, 2013

Injecting data into VBI section of NTSC signal

You think you're in the middle of a hurricane now?  It's time to... INJECT DATA INTO THE VBI SECTION!

First of all, if NTSC displays 60/1.001 fields per second (about 59.94), and it displays 262.5 lines per field, then this means that one line is displayed at a rate of (60*262.5)/1001 kHz, or 15.734 kHz.  Taking the reciprocal of this value gives us the period, 0.063555 milliseconds or 63.555 microseconds if I've done my math correctly.
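
To double check that arithmetic, here is a small Python sketch that uses exact fractions so no rounding sneaks in.  The variable names are just ones I picked for illustration.

```python
from fractions import Fraction

FIELDS_PER_SECOND = Fraction(60_000, 1001)  # 60/1.001, about 59.94
LINES_PER_FIELD = Fraction(525, 2)          # 262.5

line_rate_hz = FIELDS_PER_SECOND * LINES_PER_FIELD
line_period_us = 1_000_000 / line_rate_hz   # reciprocal, in microseconds

print(float(line_rate_hz))    # about 15734.27 Hz
print(float(line_period_us))  # about 63.5556 microseconds
print(line_period_us)         # 572/9
```

So one line is exactly 572/9 microseconds, which matches the 63.555 figure above.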

According to the laserdisc standard, a picture number stored in the VBI will span 48 microseconds because it is 24 bits long and each bit is 2 microseconds long, and that includes a transition from black to white or white to black.  See https://www.daphne-emu.com:9443/mediawiki/index.php/VBIInfo for more information.

So my task is to figure out how many microseconds to delay after starting the proper line and then do a white/black transition every 1 microsecond as exactly as I can.  I probably will need to completely unroll a loop in assembly language on the AVR to avoid drift.
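
To make the timing concrete, here is a rough Python sketch that turns a 24-bit value into the 48 one-microsecond levels that would have to be output.  Two things here are assumptions on my part and should be checked against the VBIInfo page: that a 1 bit is a black-to-white transition (and a 0 bit the opposite), and that the most significant bit goes out first.  The function name `manchester_levels` and the sample value are made up for illustration.

```python
def manchester_levels(value, bits=24, one=(0, 1), zero=(1, 0)):
    """Return one level (0 = black, 1 = white) per 1 microsecond half-bit.

    The polarity (`one`, `zero`) and the MSB-first order are assumptions.
    """
    levels = []
    for i in range(bits - 1, -1, -1):      # most significant bit first
        bit = (value >> i) & 1
        levels.extend(one if bit else zero)
    return levels


levels = manchester_levels(0xABCDEF)       # arbitrary 24-bit test value
print(len(levels))                         # 48 half-bits, so 48 microseconds
print(levels[:8])                          # levels for the first four bits
```

Each entry in the list is what the output should be for the next microsecond, which is what the unrolled assembly loop would have to reproduce.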

Here is a screenshot from my scope showing the first 19 lines of an NTSC field (generated from the Raspberry Pi).


So what is the easiest way to find where line 19 starts?  Well, the LM1881 does have a pin called Composite Sync which may be able to help.


This shows the CSYNC signal.  As you may be able to see, the signal is weak during the vsync pulse but strong everywhere else (this actually surprises me).  My conclusion is that due to the relatively unreliable pulse that occurs during vsync, the CSYNC pulse cannot be relied upon exclusively to find where the proper line number starts.

My current approach is to wait for VSYNC (and/or FIELD transition) and then reset the AVR's cycle counter.  Then on every CSYNC pulse, check the elapsed cycle count to determine whether I am on the correct line.
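
The idea can be modeled in Python like this.  It is only a sketch of the arithmetic, not the AVR code: the 16 MHz clock is an assumption (the clock speed depends on the board), the function names are hypothetical, and the count is simply the number of line periods since the counter was reset, so the offset between that and the real NTSC line number still has to be found on the scope.

```python
from fractions import Fraction

LINE_PERIOD_US = Fraction(572, 9)          # one NTSC line, about 63.5556 us


def cycles_per_line(f_cpu_hz):
    """CPU cycles in one video line at the given clock."""
    return f_cpu_hz * LINE_PERIOD_US / 1_000_000


def lines_since_vsync(elapsed_cycles, f_cpu_hz=16_000_000):
    """Nearest whole number of line periods since the counter was reset.

    Rounding to the nearest line tolerates a CSYNC pulse that arrives a
    little early or late.  16 MHz is an assumed clock speed.
    """
    return round(elapsed_cycles / cycles_per_line(f_cpu_hz))


print(float(cycles_per_line(16_000_000)))  # about 1016.9 cycles per line
print(lines_since_vsync(10_169))           # 10
```

At roughly 1017 cycles per line there is plenty of margin for telling one line from the next, even if the counter is read a few dozen cycles late.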

This actually seems to work pretty well (so far).

I was able to successfully isolate line 11 and inject a little white bar into it.


Here is how it looks when captured in VirtualDub:


In short, this is very promising!  With some careful work I should be able to start putting in frame numbers on lines 17 and 18!

Wednesday, June 12, 2013

Successfully overlaid some white lines on video signal

Looks like the VBI injector is working!  Just as a test I tried to overlay some white lines at an arbitrary spot on the video signal and it worked!!



The next step is to hand craft some assembly language AVR code where the cycle count is precise so I can put the VBI info on the correct lines.

Video signal working on VBI Injector board!

It turns out I had the incoming video jack soldered on incorrectly (that's what I get for not reading the datasheet).  I removed it and a few SMT resistors this morning to troubleshoot, and found that absolutely no current was going through the video jack pins.  At first I thought I had a defective port, then realized it was simply that the jack had a shunt pin in addition to a signal pin.  Hehehehe.

I was hoping to post another picture in addition to some screenshots from my scope, but I ran out of time.  I soldered up the little arduino board both to my VBI injector board and to 5 places on the dexter board (RX1, TX1, CSYNC, VSYNC, and FIELD).  The gain on the VBI injector's amp possibly needs to be increased but my capture card is still decoding the picture properly so it is pretty close.

Assuming everything else works, all I need to do now is write the AVR program to inject the VBI.

Tuesday, June 11, 2013

Finished soldering but it doesn't work (yet)

I ran out of time to troubleshoot this, but the LM1881 is not generating any vsync pulses so it appears that the video signal is not making it to the LM1881 properly.  I was able to program the ATMega328P on the arduino mini and even get DebugWire working on it so that is a good thing.


Sunday, June 9, 2013

Almost done soldering the VBI Injector board!

The dirty gunk around the center of the board is all of the flux that I schlopped on it in order to solder on the tiny surface mount part that you can see in the lower middle.  This was my first time using flux.  It turns out that I didn't need to use the soldering wick at all.  I just schlopped on a bunch of flux and kept tapping the soldering iron to the bridges/shorts until they went away (as I saw demonstrated in a YouTube video).  I also used a headset with a magnifying lens that I got off Amazon which was pretty handy.  Oh, and I held it in place with a pair of tweezers that I got from Sparkfun.  My wife held down the PCB as I worked.  I tested the pins with a multimeter and they all are going to the right place so I am pretty excited (and also a little surprised).  This makes me _very_ confident in trying to hand solder a Dexter rev3 board with all SMT parts on it (including the ATMega644P).

Yes, I too, can solder tiny surface mount parts! (with my wife's help)

VBI Injector board PCB v1 has arrived!



I look forward to soldering on the tiny surface mount part in the middle of the injector board!

Friday, June 7, 2013

Does the Raspberry Pi cut off the left and right sides when in TV-out mode?

The topic says it all. Does it, or doesn't it? I whipped up a quick test video (played a quick 15 seconds of Left 4 Dead in order to get a source that had no artifacts in it) and made some comparisons.

Original image rendered from the game and cropped by me:

Converted to .ldimg file, and rendered (using composite NTSC) by Raspberry Pi:

Now let's examine one place:

Here is the original.  Notice the border on the right side is dark green with a grey patch of a zombie's shirt.

Here is the pi version.  It's a little hard to tell, but the right side is completely black and the grey patch from the zombie's shirt is nowhere to be seen.


Conclusion?

The pi does not shrink a source 720x480 image but instead overwrites the left and right borders with black.  This means that discs that are captured in 720x480 should be re-encoded "as is" without getting rid of the black borders.

Differences in .ldimg encoding quality

When encoding .ldimg files, one can adjust the quality where 100 is maximum quality.  Here is the difference in file size depending on the quality:

Quality  Size
75       2.1G
85       2.4G
90       2.9G
92       3.2G
95       4.2G

Conclusion: Q95 is probably the lowest quality one can use without seeing any JPEG artifacts at all. However, the file size is a little too big. Q90 therefore is probably the "sweet spot" for all Dexter .ldimg encodes because, while one can see occasional JPEG artifacts, it still looks very good.

Tuesday, June 4, 2013

Dragon's Lair 2 using Dexter + Raspberry Pi

This video has a surprise conclusion :)

More optimizations required

Well, I got the JPEG decoding speed running very fast.  However, once I add in some game-generated video overlay, performance drops from 100 FPS down to 85 FPS.  This was a bit of a red flag for me since OpenGL is supposed to be, well, fast!

So my goal is to get performance back up to over 95 FPS with the overlay active.  I've found some areas where I can optimize.

UPDATE: Success!

Sunday, June 2, 2013

JPEG optimization success!

Well, it looks like I've reached my goal earlier than I expected!

By making the change to decode JPEG directly to a GLES2 texture, performance skyrocketed.

Decoding from memory buffer and rendering on screen:
100 FPS (up from 62 FPS) <-- GOAL REACHED

Only decoding JPEG:
140 FPS (up from 103 FPS)

Saturday, June 1, 2013

The first JPEG decoding optimization yields modest gains

I optimized the way that I pass the raw JPEG to the OpenMAX API to decode and I've got some modest gains already:

Decoding from memory buffer and rendering on screen:
66 FPS (up from 62 FPS)

Only decoding JPEG:
107 FPS (up from 103 FPS)

Yes, a 4 field per second boost!  Every little bit helps!