It is a little odd to respond when I don't have an answer, but I will add a few ideas. I did some searching too, and PCIe on FPGAs seems to be very proprietary. This is consistent with what you found; Altera seems like the easiest way to go. Maybe instead of doing PCIe on the FPGA, you could use a bridge chip to another interface? I found an interesting chip here: http://www.asix.com.tw/products.php?op=pItemdetail&PItemID=119;74;110&PLine=74
It has quite a few IO features. It doesn't really let an FPGA take advantage of the PCIe features or speed, but the IO features sound similar to what you had in mind. 8 GPIOs seems kind of low, though. Maybe there is something else out there with a faster serial interface that wouldn't require a costly IP core.
What happened to the dual Lattice / Altera idea? Was it too expensive? My concern there was cost rather than the complexity of integration.
I wouldn't be of much use writing a PCIe core -- way beyond my skill level -- but I do know VHDL and basic digital design so I could help on much simpler things.