BadSharedMemories

/by Barzilai Spinak a.k.a. barspi/

Recently we spent two or three days hunting down a bunch of weird behaviors on a client’s project.

It was a typical jPTS setup, and we were testing a Source Station (SS), talking to the central jPTS, talking to a destination station (DS). The different subsystems were chained together with the standard set of QueryHost + QMUX + ChannelAdaptor pointing to the QServer down the line.

The client wanted to do some stress load testing, and was using JMeter ^*^ with the jPOS-based ISO-8583 plugin (sending a very high load of messages to the SS).

Despite being a “standard setup”, there were several custom features that were giving us all kinds of trouble (it’s never a standard setup, is it?). The initial implementation was performing very badly with respect to transactions per second (TPS), and also losing many transactions due to timeouts. We had to isolate the different sources of the problems (extra queries to a very slow parallel database, a DB connection pool too small for the expected concurrency, an HSM that would choke when being hit with some ridiculously high number of TPS…). Things got much better after some refactoring, but we still got many responses with a timeout error code, or just dropped transactions, for which we never got a response. When analyzing the logs further, we saw some warnings from the multiplexers about duplicate keys, and we detected that it was better to use a different key set than QMUX‘s default 11, 41 (the terminal id, field 41, would often be constant, so we decided to add field 37, the Retrieval Reference Number, to the key set).

But still, many transactions were timing out… until we detected something even stranger. For example, RRN’s, or other fields, that would change in the response, returning a value that belonged to a different request. Our ISOMSg‘s were having some weird genetic recombination!

Now, we should clarify that this system was developed mostly by the client, so we weren’t familiar with all the details of their code and logic hidden in some of their custom classes. We were told that the DS was supposed to talk to a Mastercard remote node, but that at the moment it only had some “autoresponder simulator” in order to perform these tests. We imagined a simple RequestListener that just changed and added a couple of fields, with a standard success response in field 39.

In reality, the “autoresponder simulator”, was a full transaction manager with several participants, one of which was a not-so-simple equivalent to the following (here, extremely simplified, for clarity and brevity)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
public class SomeParticipant implements TransactionParticipant, Configurable {
private String requestKey; // GOOD
private ISOMsg theMsg; // BAD; DON'T DO!

private void readRequestFromContext(Context ctx) {
(1) theMsg= ctx.get(requestKey); // BAD; DON'T DO!
// ...manipulate theMsg...
}

protected int prepare(long id, Serializable context) throws Exception {
Context ctx = (Context)context;

(2) readRequestFromContext(ctx); // OOPS!

(3) theMsg.set(38, getApprovalCode()); // is theMsg still the same object
(4) theMsg.set(39, getResponseCode()); // ...and now?

(5) ctx.put(Constants.DS_RESPONSE, theMsg); // what about here?

return PREPARED | NO_JOIN;
}

public void setConfiguration(Configuration cfg) throws ConfigurationException {
requestKey = cfg.get("request-key", "REQUEST"); // GOOD
}
}

Can you spot the problem? The participant is using an instance variable, called theMsg, to store a reference to the request message in line (1). And what’s wrong with that? As explained in Chapter 9 of the jPOS Programmer’s Guide, the TransactionManager creates only one instance of each participant, but there may be multiple concurrent transactions (i.e. TransactioManager sessions) running. So, by using an instance member variable to store the value retrieved in line (1), each concurrent transaction (thread) being processed may be overwriting that value with a different ISOMsg. In fact, at each step where we use theMsg, such as in lines (2), (3), and (4), we could be working with a different ISOMsg!

Also, notice how it’s perfectly OK to use instance variables to store configuration options for this participant. These are usually read at the time of participant instantiation and initialization, and set in the setConfiguration() method. Those values shouldn’t normally change during the life of the participant.

So, what should we do? Always use either local variables, or pass the values as method arguments, or store them in the Context object and retrieve them from there.

Never use the instance variables of a participant to store transaction state, or you may incur in hard-to-debug race conditions


(*) We may have a blog post about using JMeter with jPOS in the near future