I cannot answer your question regarding the PS1; I don’t own one and am not very familiar with its peculiarities.
Regarding the more general question of what the difference is between generic and optimized modes:
In generic mode, the analog signal is effectively oversampled. In an analog signal, colour changes between adjacent “pixels” cannot be instant; there is always a transition. For example, going from pure white to pure black, there will be a continuous section of shades of gray, because the signal has to drop from maximum to minimum voltage. In generic mode, e.g. x4 mode, there are 1560 samples per line, so for a 341-dot console like the NES that is about 4.6 samples per dot, thus capturing those transitions.
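To make the oversampling point concrete, here is a minimal sketch. The “analog” signal is modeled as a linear white-to-black ramp spanning one dot width; the dot count and ramp shape are illustrative assumptions, not the OSSC’s actual internals.

```python
# Hypothetical model: generic x4 mode oversamples an analog edge.
DOTS = 8                # a tiny slice of a scanline
SAMPLES_PER_DOT = 4.6   # generic x4 on a 341-dot console: 1560 / 341

def analog(t):
    """Signal level at continuous position t (in dots): white (1.0)
    before dot 4, black (0.0) after dot 5, linear ramp in between."""
    if t < 4.0:
        return 1.0
    if t > 5.0:
        return 0.0
    return 5.0 - t  # the transition: shades of gray

total = round(DOTS * SAMPLES_PER_DOT)
samples = [analog(i / SAMPLES_PER_DOT) for i in range(total)]

# Several samples land inside the transition and record gray values,
# which is exactly what generic mode preserves.
grays = [round(s, 2) for s in samples if 0.0 < s < 1.0]
print(f"{total} samples, {len(grays)} intermediate grays: {grays}")
```

With roughly 4.6 samples per dot, a few of them inevitably fall inside the ramp, so the captured line contains the in-between gray shades rather than a hard edge.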
In optimized mode, only one sample per dot is taken. This is then horizontally upscaled by the video chip on the OSSC via so-called pixel repetition: in x4 mode, each sample is repeated 4 times (320×240 and 256×240 8:7 modes) or 5 times (256×240 4:3). This gives an ultrasharp appearance that could otherwise only be achieved by digital systems such as an emulator or an FPGA console.
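Pixel repetition itself is trivial to sketch; the function name and sample values below are illustrative, not actual OSSC code.

```python
# Hypothetical sketch of pixel repetition in optimized mode:
# one sample per dot, each repeated `factor` times on output.

def pixel_repeat(samples, factor):
    """Repeat every sample `factor` times, as the OSSC's video chip
    does horizontally (4x for 320x240 / 256x240 8:7, 5x for
    256x240 4:3 in x4 mode)."""
    return [s for s in samples for _ in range(factor)]

line = [1.0, 0.0, 1.0]       # three dots: white, black, white
out = pixel_repeat(line, 4)  # each dot becomes 4 identical pixels
print(out)
```

Note that the output contains only the original two levels and no intermediate grays: every transition is a single-pixel-boundary hard edge, which is where the ultrasharp look comes from.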
This is the reason phase matters for optimized but not generic modes: in generic mode there is an excess of samples compared to the original signal, but in optimized mode it is crucial that the sample is taken as close to the center of each dot as possible. Otherwise you see “unclear” or “jumping” pixels, because the sample is taken in or near a transition. Phase basically determines how soon after a line starts that sampling starts.
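The phase effect can be sketched the same way. Below, an alternating white/black dot pattern has analog transitions spanning 0.4 dots around each boundary; one sample per dot is taken at offset `phase` into the dot. All numbers and names are illustrative assumptions.

```python
# Hypothetical model: one sample per dot in optimized mode, taken at
# offset `phase` (0..1) into each dot. Dots alternate white/black with
# a linear transition 0.2 dots either side of every boundary.

def analog(t):
    dot = int(t)
    frac = t - dot
    level = float(dot % 2 == 0)   # even dots white, odd dots black
    other = 1.0 - level
    if frac >= 0.8:               # heading into the next boundary
        w = (frac - 0.8) / 0.4    # 0.0 at 0.8, 0.5 at the boundary
        return level + (other - level) * w
    if frac <= 0.2:               # tail of the previous boundary
        w = 0.5 + frac / 0.4      # 0.5 at the boundary, 1.0 at 0.2
        return other + (level - other) * w
    return level

def sample_line(phase, dots=6):
    """Optimized-mode sampling: one sample at `phase` into each dot."""
    return [analog(i + phase) for i in range(dots)]

print(sample_line(0.5))   # dot centers: clean alternating 1.0 / 0.0
print(sample_line(0.05))  # near a boundary: muddy in-between levels
```

With phase at the dot center every sample lands on a stable level; with phase near a boundary every sample lands in a transition, which is the “unclear” look, and tiny clock jitter then makes those values wander, which is the “jumping”.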
I’m sorry I have not been able to address questions before; I’ve had very limited time recently.
Edit: I’m sure I once saw a picture someone made that showed this pedagogically, but now I can’t find it, unfortunately.