Friday, May 31, 2013

Optimizing JPEG decoder on Raspberry Pi

Months ago, I spent a lot of time getting the Raspberry Pi to perform fast JPEG decoding.

While it is fast, it is just barely not fast enough for what I want to do with Dexter, so I am now going to spend another chunk of time optimizing it further.

Here are some current decoding speed statistics.  I used a high bitrate JPEG to stress it to its limit.

--------------------------------

Decoding from a memory buffer and rendering on the screen repeatedly:
62 fps (_fields_ per second)

When disabling the fragment shader that handles interlacing the video:
63 fps

Without transferring the decoded image to GLES2 but leaving everything else the same:
80 fps

Transferring the decoded image but not rendering it:
77 fps

Only decoding the JPEG, not transferring it or rendering it:
103 fps

-------------------------

Bottlenecks:
- Transferring the decoded image costs about 20 fps (103 to 77 fps without rendering, 80 to 62 fps in the full pipeline)
- Rendering the image costs about 15 fps (77 to 62 fps), but exercising the fragment shader only costs 1 fps, which is a good sign since I rely on it.  It's possible I could optimize the vertex shader to reduce rendering time.
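
Since fps differences are not additive, it can help to convert each measurement to milliseconds per field before comparing stages. The sketch below does that with the figures above. It assumes the stages run back to back so that their costs simply add, which may not hold if decoding and rendering overlap; the names are illustrative only.

```python
# Measured throughput from the statistics above (fields per second).
FPS_FULL = 62          # decode + transfer + render
FPS_NO_SHADER = 63     # interlacing fragment shader disabled
FPS_NO_RENDER = 77     # decode + transfer, no rendering
FPS_DECODE_ONLY = 103  # decode only


def ms_per_field(fps):
    """Convert a throughput in fields/second to a cost in ms/field."""
    return 1000.0 / fps


# Assumption: stage costs add up, so a stage's cost is the difference
# between the pipeline with and without that stage.
decode_ms = ms_per_field(FPS_DECODE_ONLY)
transfer_ms = ms_per_field(FPS_NO_RENDER) - ms_per_field(FPS_DECODE_ONLY)
render_ms = ms_per_field(FPS_FULL) - ms_per_field(FPS_NO_RENDER)
shader_ms = ms_per_field(FPS_FULL) - ms_per_field(FPS_NO_SHADER)

for name, cost in [("decode", decode_ms), ("transfer", transfer_ms),
                   ("render", render_ms), ("interlacing shader", shader_ms)]:
    print(f"{name}: {cost:.2f} ms per field")
```

Under that assumption, decoding alone takes roughly 9.7 ms per field, so a 100 fps target (10 ms per field) leaves very little room for transfer and rendering unless those stages get much cheaper or overlap with decoding.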

Conclusions:
- If I can decode JPEG directly to a texture then I don't have to transfer it in memory (this may boost up to 20 fps)
- Rendering performance may be running suboptimally
- I may be able to increase JPEG decoding speed by reducing some internal memcpy's I am doing right now and increasing OpenMAX input buffer size

Goal:
- Increase overall decoding/rendering speed from 62 fps to over 100 fps if I can manage it.  I think bare minimum I will need to get it over 80 fps to give me some breathing room.

Wednesday, May 29, 2013

Laserdisc frame number (VBI) injector board


Not ones to rest on our laurels, Warren and I have been hard at work solving the problem of making Dexter replace the PR-8210 and PR-8210A for essential games such as Cliff Hanger.

The problem: games that used the PR-8210 had to read the laserdisc frame number from the video signal itself.  The frame number was encoded during the Vertical Blanking Interval.

Our proposed solution: Using the LM1881 chip to detect when the VBI starts, we are going to basically generate frame numbers in real-time using an auxiliary AVR microcontroller, some resistors and capacitors, and an "op-amp".

This prototype will hook up to an AVR ATmega328P (which itself will be located on a mini Arduino board).  This in turn will talk to the current Dexter rev2 board via TX/RX serial lines and also the LM1881 lines which are already on the Dexter rev2 board.

Assuming we are successful, we will put all of this on the final Dexter board.

In summary, this prototype board is completely awesome!

The ultimate laserdisc player interface sniffer

In our quest to make Dexter as accurate as possible, Warren and I have designed a PCB to make it easy to sniff laserdisc I/O.  Behold!


It will start getting fabricated tomorrow and I should have it in my hands in about 15 days.

It can sniff 25-pin, 24-pin, AND the 40-pin Cube Quest connectors.  In other words, pretty much every laserdisc player interface out there except for the PR-8210 one, but we can make that one by hand pretty easily anyway.

Wednesday, May 22, 2013

My "ultimate" pseudo-veggie shake


Ok, here is my fine-tuned pseudo-veggie shake recipe (I say pseudo because it is mostly fruit hehehe).  But it tastes great!


- 1 banana (120 g)
- 1 scoop (36 g) of CytoSport chocolate protein powder (you can get a huge bag of this at Costco)
- 50 g of spinach leaves
- 100 g of fresh strawberries
- A little bit of water (like half of a cup) so it will blend


Blend that all together and consume.  Tastes great, you can hardly taste the spinach due to the strawberries and banana.

Nutrition facts:
289 calories
30 g protein
40 g carbs
2 g fat

Conclusion: tasty and healthy :)

Friday, May 17, 2013

LDP-1450 accurate latencies

Ok, as "promised", here are more accurate latency results from sending a wide range of commands to the LDP-1450.


-> 67
<- 80 (3.373208 ms latency)
<- 0 (4.357307 ms latency)
<- 10 (5.499495 ms latency)
<- 0 (6.485570 ms latency)
<- 1 (7.471316 ms latency)
-> 56
<- a (3.495397 ms latency)
-> 41
<- a (3.493421 ms latency)
-> 3f
<- a (3.364645 ms latency)
-> 51
<- a (3.039905 ms latency)
-> 43
<- a (3.705852 ms latency)
-> 30
<- a (3.709145 ms latency)
-> 30
<- a (3.660072 ms latency)
-> 30
<- a (3.527344 ms latency)
-> 30
<- a (3.491774 ms latency)
-> 31
<- a (3.390663 ms latency)
-> 40
<- a (3.530967 ms latency)
<- 1 (539.072661 ms latency)
-> 3a
<- a (3.681150 ms latency)
-> 4f
<- a (4.550306 ms latency)
-> 3a
<- a (3.925529 ms latency)
-> 80
<- a (3.112362 ms latency)
-> 0
<- a (3.596178 ms latency)
-> 0
<- a (3.556985 ms latency)
-> 0
<- a (2.892356 ms latency)
-> 0
<- a (3.356082 ms latency)
-> 80
<- a (3.281648 ms latency)
-> 2
<- a (3.511206 ms latency)
-> 0
<- a (3.713098 ms latency)
-> 81
<- a (3.738458 ms latency)
-> 80
<- a (3.574441 ms latency)
-> 1
<- a (3.658096 ms latency)
-> 0
<- a (3.708157 ms latency)
-> 3a
<- a (3.357399 ms latency)
-> 0
<- a (3.457851 ms latency)
-> 3f
<- a (3.262875 ms latency)
-> 3f
<- a (3.031013 ms latency)
-> 0
<- a (4.275299 ms latency)
-> 3f
<- a (3.811244 ms latency)
-> 3f
<- a (3.505936 ms latency)
-> 58
<- a (2.952298 ms latency)
-> 3f
<- a (3.203922 ms latency)
-> 3f
<- a (3.450935 ms latency)
-> 1a
<- a (3.264522 ms latency)
-> 60
<- 30 (3.444018 ms latency)
<- 30 (4.414285 ms latency)
<- 30 (5.409253 ms latency)
<- 30 (6.575483 ms latency)
<- 34 (7.694616 ms latency)
-> 60
<- 30 (3.528991 ms latency)
<- 30 (4.659980 ms latency)
<- 30 (5.786030 ms latency)
<- 30 (6.786267 ms latency)
<- 34 (7.771025 ms latency)
-> 60
<- 30 (3.416682 ms latency)
<- 30 (4.394853 ms latency)
<- 30 (5.550215 ms latency)
<- 30 (6.544853 ms latency)
<- 34 (7.667280 ms latency)
-> 60
<- 30 (3.763159 ms latency)
<- 30 (4.888549 ms latency)
<- 30 (5.886811 ms latency)
<- 30 (6.892647 ms latency)
<- 34 (7.876747 ms latency)
-> 60
<- 30 (3.121913 ms latency)
<- 30 (4.273981 ms latency)
<- 30 (5.235026 ms latency)
<- 30 (6.380178 ms latency)
<- 35 (7.394577 ms latency)
-> 60
<- 30 (3.266169 ms latency)
<- 30 (5.144125 ms latency)
<- 30 (6.146010 ms latency)
<- 30 (7.120887 ms latency)
<- 35 (8.266697 ms latency)
-> 60
<- 30 (3.746033 ms latency)
<- 30 (4.704113 ms latency)
<- 30 (5.875613 ms latency)
<- 30 (6.868605 ms latency)
<- 35 (7.844141 ms latency)
-> 60
<- 30 (3.259253 ms latency)
<- 30 (4.989989 ms latency)
<- 30 (6.025467 ms latency)
<- 30 (7.126486 ms latency)
<- 35 (8.107292 ms latency)
-> 60
<- 30 (4.134995 ms latency)
<- 30 (5.259398 ms latency)
<- 30 (6.250743 ms latency)
<- 30 (7.390955 ms latency)
<- 36 (8.355622 ms latency)
-> 60
<- 30 (3.297787 ms latency)
<- 30 (5.113496 ms latency)
<- 30 (6.050827 ms latency)
<- 30 (7.068850 ms latency)
<- 36 (8.166246 ms latency)
-> 60
<- 30 (3.666988 ms latency)
<- 30 (4.793038 ms latency)
<- 30 (5.810731 ms latency)
<- 30 (6.772105 ms latency)
<- 36 (7.937018 ms latency)
-> 60
<- 30 (3.270121 ms latency)
<- 30 (4.571385 ms latency)
<- 30 (5.493566 ms latency)
<- 30 (6.628508 ms latency)
<- 36 (7.651141 ms latency)
-> 60
<- 30 (3.243444 ms latency)
<- 30 (4.931365 ms latency)
<- 30 (5.927980 ms latency)
<- 30 (6.903845 ms latency)
<- 36 (8.052290 ms latency)
-> 60
<- 30 (3.025084 ms latency)
<- 30 (3.986788 ms latency)
<- 30 (5.479734 ms latency)
<- 30 (6.634107 ms latency)
<- 37 (7.628746 ms latency)
-> 60
<- 30 (3.613634 ms latency)
<- 30 (4.893819 ms latency)
<- 30 (5.892410 ms latency)
<- 30 (6.981243 ms latency)
<- 37 (8.023637 ms latency)
-> 60
<- 30 (3.489798 ms latency)
<- 30 (4.636926 ms latency)
<- 30 (6.012623 ms latency)
<- 30 (7.005614 ms latency)
<- 37 (8.164599 ms latency)
-> 60
<- 30 (3.698606 ms latency)
<- 30 (5.165204 ms latency)
<- 30 (6.315955 ms latency)
<- 30 (7.441675 ms latency)
<- 37 (8.440265 ms latency)
-> 60
<- 30 (3.509559 ms latency)
<- 30 (4.642525 ms latency)
<- 30 (5.877589 ms latency)
<- 30 (6.980254 ms latency)
<- 38 (7.978845 ms latency)
-> 60
<- 30 (3.158142 ms latency)
<- 30 (4.444914 ms latency)
<- 30 (5.439882 ms latency)
<- 30 (6.560662 ms latency)
<- 38 (7.562546 ms latency)
-> 60
<- 30 (3.484528 ms latency)
<- 30 (4.638902 ms latency)
<- 30 (5.766598 ms latency)
<- 30 (6.768482 ms latency)
<- 38 (7.761474 ms latency)
-> 60
<- 30 (3.521745 ms latency)
<- 30 (4.514737 ms latency)
<- 30 (5.759681 ms latency)
<- 30 (6.766177 ms latency)
<- 38 (7.761144 ms latency)
-> 60
<- 30 (3.642287 ms latency)
<- 30 (4.644501 ms latency)
<- 30 (5.770221 ms latency)
<- 30 (7.167984 ms latency)
<- 39 (8.256158 ms latency)
-> 60
<- 30 (3.523392 ms latency)
<- 30 (4.509138 ms latency)
<- 30 (5.601923 ms latency)
<- 30 (6.736535 ms latency)
<- 39 (7.748958 ms latency)
-> 60
<- 30 (3.021461 ms latency)
<- 30 (3.989752 ms latency)
<- 30 (5.146102 ms latency)
<- 30 (6.270175 ms latency)
<- 39 (7.226279 ms latency)
-> 60
<- 30 (3.283624 ms latency)
<- 30 (4.218980 ms latency)
<- 30 (5.300238 ms latency)
<- 30 (6.295205 ms latency)
<- 39 (7.419279 ms latency)
-> 60
<- 30 (3.112033 ms latency)
<- 30 (4.269700 ms latency)
<- 30 (5.268291 ms latency)
<- 31 (6.262929 ms latency)
<- 30 (7.392601 ms latency)
-> 60
<- 30 (2.966131 ms latency)
<- 30 (4.109965 ms latency)
<- 30 (5.127658 ms latency)
<- 31 (6.102865 ms latency)
<- 30 (7.251639 ms latency)
-> 60
<- 30 (3.390005 ms latency)
<- 30 (4.489706 ms latency)
<- 30 (5.351616 ms latency)
<- 31 (6.505660 ms latency)
<- 30 (7.490089 ms latency)
-> 60
<- 30 (3.264522 ms latency)
<- 30 (4.266406 ms latency)
<- 30 (5.273560 ms latency)
<- 31 (6.394340 ms latency)
<- 30 (7.389308 ms latency)
-> 60
<- 30 (3.246408 ms latency)
<- 30 (4.246645 ms latency)
<- 30 (5.392126 ms latency)
<- 31 (6.390388 ms latency)
<- 31 (7.516437 ms latency)

Observations:
It seems that most commands respond after about 3.5 ms (round trip).  For the inquiry commands, the bytes of the multi-byte response arrive about 1 ms apart.
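
These two figures can be checked mechanically. Below is a minimal sketch that parses a short excerpt of the log above and averages the first-byte latency and the spacing between response bytes. The function names are hypothetical, and the parser assumes every `<-` line belongs to the most recent `->` command, so a late seek-complete byte (like the one at 539 ms) would skew the spacing if it were included.

```python
import re

# Excerpt copied from the log above.
LOG = """
-> 67
<- 80 (3.373208 ms latency)
<- 0 (4.357307 ms latency)
<- 10 (5.499495 ms latency)
<- 0 (6.485570 ms latency)
<- 1 (7.471316 ms latency)
-> 56
<- a (3.495397 ms latency)
-> 60
<- 30 (3.444018 ms latency)
<- 30 (4.414285 ms latency)
<- 30 (5.409253 ms latency)
<- 30 (6.575483 ms latency)
<- 34 (7.694616 ms latency)
"""

REPLY = re.compile(r"<-\s+([0-9a-fA-F]+)\s+\(([\d.]+) ms latency\)")


def parse(log):
    """Return a list of (command_byte, [(reply_byte, latency_ms), ...])."""
    exchanges = []
    for line in log.strip().splitlines():
        line = line.strip()
        if line.startswith("->"):
            exchanges.append((int(line[2:].strip(), 16), []))
        else:
            match = REPLY.match(line)
            if match and exchanges:
                reply = (int(match.group(1), 16), float(match.group(2)))
                exchanges[-1][1].append(reply)
    return exchanges


def summarize(exchanges):
    """Mean latency of the first reply byte and mean gap between reply bytes."""
    first = [replies[0][1] for _, replies in exchanges if replies]
    gaps = [b[1] - a[1]
            for _, replies in exchanges
            for a, b in zip(replies, replies[1:])]
    return sum(first) / len(first), sum(gaps) / len(gaps)


mean_first_ms, mean_gap_ms = summarize(parse(LOG))
print(f"first byte: {mean_first_ms:.2f} ms, gap: {mean_gap_ms:.2f} ms")
```

If the serial link runs at 9600 baud with 10 bits per character (an assumption; the baud rate is not stated here), one byte takes about 1.04 ms on the wire, which would account for the spacing between response bytes.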

Thursday, May 16, 2013

LDP-1450 latencies examined

I whipped up a quick program to test the LDP-1450 latency and I was surprised as it is nothing like what the programming manual said to expect!

WARNING: My program will only read in a byte every 1 ms, so these values should not be considered "final". I will need to write a more accurate program and try again later.


matt_sony: -> 0x67
matt_sony: <- 0x80 (4 ms latency)
matt_sony: <- 0x00 (6 ms latency)
matt_sony: <- 0x10 (7 ms latency)
matt_sony: <- 0x00 (8 ms latency)
matt_sony: <- 0x01 (9 ms latency)
matt_sony: -> 0x56
matt_sony: <- 0x0A (4 ms latency)
matt_sony: -> 0x41
matt_sony: <- 0x0A (4 ms latency)
matt_sony: -> 0x3F
matt_sony: <- 0x0A (4 ms latency)
matt_sony: -> 0x51
matt_sony: <- 0x0A (4 ms latency)
matt_sony: -> 0x43
matt_sony: <- 0x0A (4 ms latency)
matt_sony: -> 0x30
matt_sony: <- 0x0A (4 ms latency)
matt_sony: -> 0x30
matt_sony: <- 0x0A (4 ms latency)
matt_sony: -> 0x30
matt_sony: <- 0x0A (4 ms latency)
matt_sony: -> 0x30
matt_sony: <- 0x0A (4 ms latency)
matt_sony: -> 0x31
matt_sony: <- 0x0A (4 ms latency)
matt_sony: -> 0x40
matt_sony: <- 0x0A (4 ms latency)
matt_sony: <- 0x01 (312 ms latency)
matt_sony: -> 0x3A
matt_sony: <- 0x0A (1 ms latency)
matt_sony: -> 0x4F
matt_sony: <- 0x0A (1 ms latency)
matt_sony: -> 0x3A
matt_sony: <- 0x0A (1 ms latency)
matt_sony: -> 0x60
matt_sony: <- 0x30 (1 ms latency)
matt_sony: <- 0x30 (2 ms latency)
matt_sony: <- 0x30 (3 ms latency)
matt_sony: <- 0x30 (4 ms latency)
matt_sony: <- 0x32 (5 ms latency)
matt_sony: -> 0x60
matt_sony: <- 0x30 (1 ms latency)
matt_sony: <- 0x30 (2 ms latency)
matt_sony: <- 0x30 (3 ms latency)
matt_sony: <- 0x30 (4 ms latency)
matt_sony: <- 0x34 (5 ms latency)
matt_sony: -> 0x60
matt_sony: <- 0x30 (1 ms latency)
matt_sony: <- 0x30 (2 ms latency)
matt_sony: <- 0x30 (3 ms latency)
matt_sony: <- 0x30 (4 ms latency)
matt_sony: <- 0x35 (5 ms latency)

Observations:
The play (3A) and pause (4F) commands respond instantly.
Most other commands including STATUS INQ (67), starting a search, and entering digits, seem to take about 4 ms round trip.
And perhaps most puzzling, ADDRESS INQ (60) sometimes responds instantly and other times takes 4 ms to respond (not shown here).

Mad Dog McCree working with Dexter



Warren hooked up the latest Dexter firmware to his Mad Dog McCree hardware.

Dexter was running in LDP-1450 mode, which is the player used by Mad Dog.

I made a couple of test tweaks to Warren's firmware: I made the latency between commands extra long to try to troubleshoot some issues, and I made all seeks take 1 second.  You can see both reflected in this clip:
1 - any time you see the video pause for 1 second, it is waiting for a seek to finish
2 - any time you see video overrun, it is likely due to the extra long latency (there's also a chance it is due to Warren's fake VBI data)

I hope to get some time soon with my real LDP-1450 to do some precise timing of what the actual latency should be (the programming manual suggests that perhaps it should be 20 ms from reception of a command to the acknowledgement of the command).

Also, you will notice that there is a problem with the audio which makes it noticeably laggy.  I will have to look into this later after I get the other issues resolved.

At any rate, with just a few fixes, this game should be fully working with Dexter!

Wednesday, May 8, 2013

How are we supposed to get this dishwasher out?

So yesterday we demolished (hehe) our countertops in preparation to have new granite ones installed.  Along the way, I noticed that the previous owners had installed some tile on top of the old linoleum floor, which effectively "walled" the dishwasher into place (they also did the same thing with our toilets).  So before we have the granite countertops installed tomorrow, I just want to make sure that we are going to be able to get this dishwasher out in case we want to replace it some day.  Does anyone with some experience know the typical way to install/remove dishwashers?  It appears that the dishwasher is designed to go behind the lip of the counter to keep it secure, so does it have some way to reduce its height if one wants to remove it?  And if we do want to remove it, are we going to need to destroy some tile to get it out?



Saturday, May 4, 2013

Dragon's Lair 2 working with Dexter (including text overlay) !

I'm finally ready to release this update to the Dexter firmware!  To test Dragon's Lair 2, I modified Daphne to communicate with a real serial port (instead of an emulated one).  I then had the real serial port connected to the DB25 port on the Dexter board (as pictured).  Note that I am correctly using the null modem adapter!  


So Daphne is emulating Dragon's Lair 2 (except for the laserdisc player) and Dexter is emulating the laserdisc player.  It is working except for picture stop codes being ignored (which is currently unimplemented in Dexter but I guess I need to get wookin' on that!).

This will still need to be tested on real Dragon's Lair 2 hardware!

Found another LDP-1450 interpreter bug

Looks like there is another bug in my LDP-1450 interpreter:

I got Dragon's Lair 2 playing but then it dies at a "resurrection" scene, after you lose a life.  I looked at the serial I/O and here is what is happening:


Search to 2108 received

----------------------------


lair2: -> 80

lair2: <- 0A
lair2: -> 01
lair2: <- 0A
lair2: -> 14
lair2: <- 0A
lair2: -> 32
lair2: <- 0A
lair2: -> 20
lair2: <- 0A
lair2: -> 20
lair2: <- 0A
lair2: -> 20
lair2: <- 0A
lair2: -> 20
lair2: <- 0A
lair2: -> 4C
lair2: <- 0A
lair2: -> 49
lair2: <- 0A
lair2: -> 56
lair2: <- 0A
lair2: -> 45
lair2: <- 0A
lair2: -> 53
lair2: <- 0A
lair2: -> 1A
lair2: <- 0A

--------------------------


lair2: -> 60

lair2: <- 30
lair2: <- 32
lair2: <- 31
lair2: <- 30
lair2: <- 38

So it looks like the DL2 code sends a change to the video overlay while a seek is in progress.  And then my code never sends a seek complete code (0x01) so DL2 just sits there waiting and eventually reboots.

I should note that in the old Daphne LDP-1450 interpreter, this is not a problem and works.  However, I wrote a new one so that it would work properly with Dexter.  That's why I am having to make it work "all over again" with Dragon's Lair 2.

I probably will need to send this sequence of commands to my real LDP-1450 and see how it responds.  If I were to predict its behavior, I'd say that it probably sends the seek complete (0x01) result somewhere in the middle of the ACKnowledgement bytes (0x0A) that it is sending in regards to the text overlay change command.

-----------------------------------

UPDATE:
Ok, I just sent these same bytes to my real LDP-1450 (with the Dragon's Lair 2 disc in the player, no less!) and here is the actual input and output:


matt_sony: -> 0x3A
matt_sony: <- 0x0A
matt_sony: -> 0x27
matt_sony: <- 0x0A
matt_sony: -> 0x24
matt_sony: <- 0x0A
matt_sony: -> 0x43
matt_sony: <- 0x0A
matt_sony: -> 0x30
matt_sony: <- 0x0A
matt_sony: -> 0x32
matt_sony: <- 0x0A
matt_sony: -> 0x31
matt_sony: <- 0x0A
matt_sony: -> 0x30
matt_sony: <- 0x0A
matt_sony: -> 0x38
matt_sony: <- 0x0A
matt_sony: -> 0x40
matt_sony: <- 0x0A
matt_sony: -> 0x80
matt_sony: <- 0x0A
matt_sony: -> 0x01
matt_sony: <- 0x0A
matt_sony: -> 0x14
matt_sony: <- 0x0A
matt_sony: -> 0x32
matt_sony: <- 0x0A
matt_sony: -> 0x20
matt_sony: <- 0x0A
matt_sony: -> 0x20
matt_sony: <- 0x0A
matt_sony: -> 0x20
matt_sony: <- 0x0A
matt_sony: -> 0x20
matt_sony: <- 0x0A
matt_sony: -> 0x4C
matt_sony: <- 0x0A
matt_sony: -> 0x49
matt_sony: <- 0x0A
matt_sony: -> 0x56
matt_sony: <- 0x0A
matt_sony: -> 0x45
matt_sony: <- 0x0A
matt_sony: -> 0x53
matt_sony: <- 0x0A
matt_sony: -> 0x1A
matt_sony: <- 0x0A
matt_sony: -> 0x60
matt_sony: <- 0x30
matt_sony: <- 0x30
matt_sony: <- 0x35
matt_sony: <- 0x30
matt_sony: <- 0x34
matt_sony: <- 0x01

So here are my conclusions:
- It is valid to query the current frame even in the middle of a seek (this surprised me).
- It is also valid to change text overlay in the middle of a seek.
- Seek complete (0x01) byte is sent at the very end.
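
The behavior observed above can be modeled with a small sketch. This is not Dexter's or Daphne's actual interpreter: `PlayerSketch`, `receive`, and `finish_seek` are hypothetical names, the byte meanings (0x43 search, 0x30 to 0x39 digits, 0x40 enter, 0x60 ADDRESS INQ, 0x0A ACK, 0x01 seek complete) are taken from the logs in these posts, and every other byte is simply acknowledged.

```python
ACK = 0x0A
SEEK_COMPLETE = 0x01


class PlayerSketch:
    """Toy model of the mid-seek behavior seen on the real player."""

    def __init__(self, frame=504):
        self.frame = frame          # frame the player is currently on
        self.seek_target = None     # set while a seek is in progress
        self.entering_search = False
        self.digits = ""

    def receive(self, byte):
        """Handle one command byte and return the list of reply bytes."""
        if byte == 0x60:
            # ADDRESS INQ is answered even mid-seek: five ASCII digits.
            return [ord(c) for c in f"{self.frame:05d}"]
        if byte == 0x43:
            self.entering_search = True
            self.digits = ""
        elif self.entering_search and 0x30 <= byte <= 0x39:
            self.digits += chr(byte)
        elif self.entering_search and byte == 0x40:
            self.entering_search = False
            if self.digits:
                self.seek_target = int(self.digits)
        # Everything else (including text overlay bytes) is just ACKed;
        # it must not cancel or swallow the pending seek.
        return [ACK]

    def finish_seek(self):
        """Called when the seek ends; emits seek complete exactly once."""
        if self.seek_target is None:
            return []
        self.frame = self.seek_target
        self.seek_target = None
        return [SEEK_COMPLETE]


player = PlayerSketch(frame=504)
for b in [0x43, 0x30, 0x32, 0x31, 0x30, 0x38, 0x40]:  # search to 02108
    player.receive(b)
print([hex(b) for b in player.receive(0x60)])  # current frame, mid-seek
print([hex(b) for b in player.finish_seek()])  # seek complete at the end
```

The design point is that the pending seek lives in its own field, so overlay commands arriving in the middle cannot prevent the 0x01 from being sent.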

Friday, May 3, 2013

Found my LDP-1450 interpreter bug

Seems like I have a bug in my LDP-1450 interpreter where I never get out of text overlay input mode once I enter it.  I'll need to write a unit test to prove this bug and then hopefully everything will work :)
Don't know if I'll be able to get to it tonight though.  Maybe tomorrow.
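
A unit test for this kind of bug might look like the sketch below. `OverlayInterpreterSketch` is a hypothetical stand-in, not the real interpreter, and it assumes that 0x1A ends text entry (inferred from the serial logs, where the overlay text is followed by 0x1A before the next command).

```python
ACK = 0x0A
TEXT_END = 0x1A  # assumption: terminator byte seen after overlay text in the logs


class OverlayInterpreterSketch:
    """Hypothetical stand-in for the interpreter's text overlay input mode."""

    def __init__(self):
        self.in_text_mode = False
        self.text = bytearray()
        self.commands = []

    def begin_text(self):
        self.in_text_mode = True
        self.text.clear()

    def receive(self, byte):
        if self.in_text_mode:
            if byte == TEXT_END:
                # The step a buggy interpreter would skip.
                self.in_text_mode = False
            else:
                self.text.append(byte)
        else:
            self.commands.append(byte)
        return ACK


def test_leaves_text_mode_after_terminator():
    interp = OverlayInterpreterSketch()
    interp.begin_text()
    for b in b"LIVES":
        interp.receive(b)
    interp.receive(TEXT_END)
    interp.receive(0x60)  # must be treated as a command, not as text

    assert not interp.in_text_mode
    assert bytes(interp.text) == b"LIVES"
    assert interp.commands == [0x60]
```

The test can be run with pytest or by calling the function directly. Against an interpreter that never leaves text mode, the 0x60 would be appended to the text and the assertions would fail.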

The quest for the ultimate veggie shake continues!

Today, I blended a banana, 1 scoop of chocolate protein powder, 85 grams of spinach and a mini cucumber together.  I tried to find some veggies that didn't have much of a taste.  Unfortunately, in the final product, I can really taste the cucumber and the cucumber/banana blend is pretty disgusting.  So next time I am going to try this with just the spinach and no cucumber.



Thursday, May 2, 2013

So much progress made today on Dexter

Today I:
- updated Dexter firmware to support new LDP-1450 command set (including text overlay)
- updated Dexter viewer software to also support text overlay
- basically cracked serious pate left and right

Unfortunately, when the moment of truth came (me testing Dexter with Dragon's Lair 2), I discovered that I had not implemented a command that Dragon's Lair 2 was trying to send.  So now I need to get Dragon's Lair 2 working in Daphne again (I temporarily broke it while re-writing the Sony command interpreter hee hee) and then all should be well.  Plus, DL2 apparently spams 00 to the serial port in Daphne, which is kinda curious; I'm not sure if the real machine did this also (I can of course ignore it).

So tomorrow I hopefully will have some video showing off Dexter running Dragon's Lair 2 :)

Missing colon in LDP-1450 font

I just happened to notice that my LDP-1450 font which I (and others) have carefully created based off of screen captures from the original player is missing a COLON character.  OH TRAGEDY!  Here I thought I was completely done with the LDP-1450 text code and here I need to do one more screen capture to grab the colon!

So close I can taste it... :)

Wednesday, May 1, 2013

A few more LDP-1450 vertical mode captures for Dexter



I am just finishing up adding quirky initial vertical offsets to these two modes (vertical scale *2 and *3).  After that, I just need to make some of the space characters completely transparent as I mentioned in a previous post and I will be 100% done with text overlay emulation for Dexter (and Daphne too hehe).