Vendor doesn't understand TCP and keeps blaming our infra for their POS
IDK how you guys would deal with this. I literally lost my shit on a call yesterday.
We're reviewing how a product is not living up to one of its core specifications: timely delivery. A message is supposed to be delivered to a phone on our facility network within 10 seconds.
One of the errors they saw was "Connection reset by peer". They immediately started by saying this shows that the client was in a bad network spot. I reminded them that a connection reset means the client received a packet back with the TCP RST flag set, so something on the path (the peer or a middlebox) actively answered and tore the connection down. If the client were in a bad network spot, you'd expect a timeout instead. This whole thing happened over 2 minutes. I asked why they don't implement pre-emptive timeouts for their HTTP requests... no response.
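For anyone who wants the distinction spelled out, here's a rough Python sketch of what I mean. It is not the vendor's code (I have no idea what their client looks like); `classify_failure`, `probe`, and the 5-second default are names and values I made up for illustration. The point is just that a reset and a timeout surface as different errors, and that a client-side timeout is a single argument.

```python
import socket


def classify_failure(exc: BaseException) -> str:
    """Map a socket-level exception to a rough diagnosis (illustrative only)."""
    if isinstance(exc, ConnectionResetError):
        # ECONNRESET: a packet with the RST flag actually came back,
        # so the path was working well enough to deliver it.
        return "reset"
    if isinstance(exc, (socket.timeout, TimeoutError)):
        # Nothing came back within the deadline: the usual symptom of
        # packet loss or a dead spot.
        return "timeout"
    return "other"


def probe(host: str, port: int, payload: bytes, timeout_s: float = 5.0) -> str:
    """Send a payload with a client-side deadline and report how it ended.

    timeout_s is a hypothetical value chosen to sit under a 10 second budget.
    """
    try:
        # The timeout applies to the connect and to each send/recv call.
        with socket.create_connection((host, port), timeout=timeout_s) as sock:
            sock.sendall(payload)
            sock.recv(4096)
        return "ok"
    except OSError as exc:
        return classify_failure(exc)
```

With something like this, a request that hangs fails after a few seconds and can be retried inside the 10 second window instead of dragging on for 2 minutes.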
Next they reviewed one client's logs and said look, there was a whole minute between log lines, so the Android app is getting suspended. I've never seen inferences made like that. Log lines only show up where a developer chose to put them, so a gap between them doesn't prove anything on its own. I then asked if they saw any of the Android lifecycle methods being called. No response.
I even asked if they understand that just because you sent a push notification via SIP to Android, delivery is not explicitly guaranteed. They keep responding with the same line: it will be delivered because we sent a push notification to it.
This was with the senior support on the call. Like legit, the vendor is basically denying reality at this point. I want to scrap this crap.
Edit: wanted to add, this was my escalation too. I've been telling my boss to help line up serious people on their side, and this is what they came back with. We've dumped millions into this POS over 3 years now. It only handles 500 concurrent TCP connections on a beefy Windows server (64 CPUs, 128 GB RAM). This is also the same vendor that reindexes the phone's SQLite DB after a single row insert and called it an "engineering decision".
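On the SQLite point, here's a small Python sketch of why the reindex looks pointless to me. SQLite keeps indexes up to date as part of the insert itself. The table and index names below are made up, since I don't know the vendor's actual schema.

```python
import sqlite3

# In-memory database with a hypothetical schema (not the vendor's).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, recipient TEXT, body TEXT)"
)
conn.execute("CREATE INDEX idx_messages_recipient ON messages (recipient)")

# A single row insert; note there is no REINDEX anywhere in this script.
conn.execute(
    "INSERT INTO messages (recipient, body) VALUES (?, ?)",
    ("ext-1001", "hello"),
)
conn.commit()

query = "SELECT body FROM messages WHERE recipient = ?"

# Ask the planner how it would run the lookup; the last column is the detail text.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("ext-1001",)).fetchall()
uses_index = any("idx_messages_recipient" in row[-1] for row in plan)

# The new row is found through the index right away.
rows = conn.execute(query, ("ext-1001",)).fetchall()

# Verify that the table and index contents agree with each other.
integrity = conn.execute("PRAGMA integrity_check").fetchone()[0]

print(uses_index, rows, integrity)
```

REINDEX is documented for cases like a changed collation sequence, not for routine inserts, so doing it after every row just burns CPU and battery on the phone.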