Update zap2xml.exe for new gracenote.com

I'm a happy user of "zap2xml.exe" for scraping the Windows Media Center TV EPG, following this awesome guide since Microsoft stopped its EPG service.

"zap2xml.exe" is a packed perl script for Windows that gets the EPG from zap2it.com. zap2it.com recently moved to gracenote.com. Everything, including the user/pass and favorite channels, works on gracenote.com, but zap2xml.exe has zap2it.com hardcoded and fails.

One solution I found on reddit is to use the hosts file to resolve the old tvlistings.zap2it.com to the new IP of tvlistings.gracenote.com. However, gracenote.com is on AWS and its IP keeps changing. There are some better approaches that don't require installing perl.

Cache Spoofing

zap2xml.exe is a packed perl script. When it's executed, it unpacks (unzips) itself to a folder and runs from there. It is packed in such a way that it doesn't clean up the old unpacked files and simply reuses them the next time. So we can modify the script inside the unpacked folder. Here is how:

  1. The unpacked folder location can be overridden with the environment variable PAR_GLOBAL_TEMP. Assuming zap2xml.exe is in the D:\TVEPG\zap2xml folder, set PAR_GLOBAL_TEMP as a global environment variable or change the batch file:

    set PAR_GLOBAL_TEMP=D:\TVEPG\zap2xml\parcache
    D:\TVEPG\zap2xml\zap2xml.exe -u user@email.com -p 1234 ...

    Run it once. It will fail, since it still points at zap2it.com, but D:\TVEPG\zap2xml\parcache should now be populated with lots of files.

  2. Now run from command prompt, inside D:\TVEPG\zap2xml\parcache:

    findstr zap2it.com *

    It should match only one file: a randomly named .pl, about 50KB in size. Open it in a text editor, search for tvlistings.zap2it.com, and replace it with tvlistings.gracenote.com. There are two instances, but only the first one matters:

    $urlRoot = 'https://tvlistings.zap2it.com/';

    to

    $urlRoot = 'https://tvlistings.gracenote.com/';

    There is no harm in replacing both.

    Warning: don't replace every "zap2it" with "gracenote"; that will break the script. Make sure to replace only the full URL.

  3. Don't run zap2xml.exe directly unless PAR_GLOBAL_TEMP is set globally; rerun the batch file instead. Because zap2xml.exe reuses the previously unpacked files, the change takes effect and zap2xml.exe now works against gracenote.com.
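
The find-and-replace in step 2 can also be scripted. The following is only a minimal sketch in Python, not part of zap2xml: the function name patch_cache is made up for illustration, and it assumes the unpacked script is a .pl file somewhere under the cache folder. It replaces the full host name only, in line with the warning above.

```python
from pathlib import Path

# Full host names only; replacing the bare word "zap2it" would break the script.
OLD_HOST = b"tvlistings.zap2it.com"
NEW_HOST = b"tvlistings.gracenote.com"


def patch_cache(cache_dir):
    """Replace the old host in every unpacked .pl file; return the patched paths.

    patch_cache is a hypothetical helper name, not something zap2xml provides.
    """
    patched = []
    for path in Path(cache_dir).rglob("*.pl"):
        data = path.read_bytes()
        if OLD_HOST in data:
            # Work on bytes so the script's encoding and line endings stay untouched.
            path.write_bytes(data.replace(OLD_HOST, NEW_HOST))
            patched.append(path)
    return patched
```

It would be called once after the first failed run, for example patch_cache(r"D:\TVEPG\zap2xml\parcache"), and then the batch file is rerun as in step 3.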

Patching the .exe

Another approach is to change the .exe file itself. Instead of getting perl and repacking everything, I "patched" zap2xml.exe.

  1. Poke around with binwalk:

    $ binwalk zap2xml.exe
    0 0x0 Microsoft executable, portable (PE)
    14400 0x3840 Microsoft executable, portable (PE)
    30012 0x753C Executable script, shebang: "/usr/bin/perl"
    .............
    .............
    5503779 0x53FB23 Zip archive data, at least v2.0 to extract, compressed size: 445, uncompressed size: 690, name: script/main.pl
    5504268 0x53FD0C Zip archive data, at least v2.0 to extract, compressed size: 13673, uncompressed size: 50660, name: script/zap2xml.pl
    5621040 0x55C530 End of Zip archive, footer length: 22

  2. script/zap2xml.pl is what I'm looking for. binwalk shows:

    5504268       0x53FD0C        Zip archive data, at least v2.0 to extract, compressed size: 13673, uncompressed size: 50660, name: script/zap2xml.pl

    This is a bit tricky for someone (like me) unfamiliar with the zip file format. This entry is a "file section" consisting of a variable-length header followed by the compressed data. The header starts at 5504268, and the 13673 bytes of data come after it, so the whole chunk is larger than 13673 bytes (header + data). Cutting and replacing 13673 bytes starting at 5504268 won't work. To replace the old data, I prepared new compressed data of exactly 13673 bytes and put it at the proper location. In this case the header is 47 bytes, so the data starts at 5504268+47=5504315.

  3. Now to get exactly 13673 bytes of compressed data. I made a script folder, put the updated zap2xml.pl inside (with an extra -H option), and ran:

    $ zip -r -9 -X out.zip script && binwalk out.zip
    updating: script/ (stored 0%)
    updating: script/zap2xml.pl (deflated 73%)

    DECIMAL HEXADECIMAL DESCRIPTION
    --------------------------------------------------------------------------------
    0 0x0 Zip archive data, at least v1.0 to extract, name: script/
    37 0x25 Zip archive data, at least v2.0 to extract, compressed size: 13705, uncompressed size: 50785, name: script/zap2xml.pl
    13905 0x3651 End of Zip archive, footer length: 22

    The data was 13705 bytes (YMMV), larger than 13673. I removed some comments from zap2xml.pl and repeated zip && binwalk until it came to exactly 13673 bytes. After that, I extracted these 13673 bytes with dd and wrote them into zap2xml.exe at 5504315.

  4. At this point, zap2xml.exe should work as an executable, but it is damaged as a zip file. The headers in the new zap2xml.exe (both the file section header and the central directory file header, CDFH) need to be updated with the new CRC32 checksum and uncompressed file length. After that, the new exe opens normally in 7zip.

  5. The new zap2xml.exe is patched with an extra -H option, to cope with future domain changes.

    The final exe can be found at:

    https://github.com/lucidusdev/zap2xml
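
For readers who would rather script steps 2 to 4 than use dd and a hex editor, here is a rough Python sketch of the same idea. It is not what I actually ran. The function names (data_offset, raw_deflate, splice) are made up for illustration. It compresses with Python's zlib instead of the zip tool, so the byte count it produces may differ from what zip -9 gives. It also finds the central directory entry by scanning for its signature and file name, which is an assumption that suits a small archive like this one.

```python
import struct
import zlib

LOCAL_SIG = b"PK\x03\x04"    # local file header ("file section") signature
CENTRAL_SIG = b"PK\x01\x02"  # central directory file header (CDFH) signature


def data_offset(buf, header_offset):
    """Return where the compressed data starts after a local file header."""
    if buf[header_offset:header_offset + 4] != LOCAL_SIG:
        raise ValueError("no local file header at this offset")
    # The fixed part is 30 bytes; name and extra-field lengths sit at +26 and +28.
    name_len, extra_len = struct.unpack_from("<HH", buf, header_offset + 26)
    return header_offset + 30 + name_len + extra_len


def raw_deflate(plain):
    """Compress to a raw deflate stream (no zlib wrapper), as zip stores it."""
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)
    return comp.compress(plain) + comp.flush()


def splice(buf, header_offset, new_plain):
    """Overwrite one member's data in place and fix CRC32 and length fields."""
    buf = bytearray(buf)
    old_size = struct.unpack_from("<I", buf, header_offset + 18)[0]
    new_data = raw_deflate(new_plain)
    if len(new_data) != old_size:
        # Same situation as 13705 vs 13673: trim the script and try again.
        raise ValueError(f"compressed size {len(new_data)} != {old_size}")
    start = data_offset(buf, header_offset)
    buf[start:start + old_size] = new_data
    crc = zlib.crc32(new_plain)
    # Local header: CRC32 at +14, uncompressed size at +22.
    struct.pack_into("<I", buf, header_offset + 14, crc)
    struct.pack_into("<I", buf, header_offset + 22, len(new_plain))
    name_len = struct.unpack_from("<H", buf, header_offset + 26)[0]
    name = bytes(buf[header_offset + 30:header_offset + 30 + name_len])
    # CDFH with the same file name: CRC32 at +16, uncompressed size at +24.
    pos = buf.find(CENTRAL_SIG, start + old_size)
    while pos != -1 and pos + 46 <= len(buf):
        n = struct.unpack_from("<H", buf, pos + 28)[0]
        if bytes(buf[pos + 46:pos + 46 + n]) == name:
            struct.pack_into("<I", buf, pos + 16, crc)
            struct.pack_into("<I", buf, pos + 24, len(new_plain))
        pos = buf.find(CENTRAL_SIG, pos + 4)
    return bytes(buf)
```

With the numbers from above, data_offset(exe_bytes, 5504268) should give 5504315 for the original exe, and splice(exe_bytes, 5504268, new_script_bytes) either returns the patched bytes or raises if the compressed size is not exactly 13673. Here exe_bytes and new_script_bytes stand for the contents of zap2xml.exe and the edited zap2xml.pl.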