This is related to the graph tests: we generate them automatically based on the context (legacy/core, graphson1/2/3).
It looks like this issue is gone; I can see failures reported as normal now.