BadSharedMemories

/by Barzilai Spinak a.k.a. barspi/

Recently we spent two or three days hunting down a bunch of weird behaviors on a client’s project.

It was a typical jPTS setup, and we were testing a Source Station (SS), talking to the central jPTS, talking to a destination station (DS). The different subsystems were chained together with the standard set of QueryHost + QMUX + ChannelAdaptor pointing to the QServer down the line.

The client wanted to do some stress load testing, and was using JMeter (*) with the jPOS-based ISO-8583 plugin (sending a very high load of messages to the SS).

Despite being a “standard setup”, there were several custom features that were giving us all kinds of trouble (it’s never a standard setup, is it?). The initial implementation was performing very badly with respect to transactions per second (TPS), and also losing many transactions due to timeouts. We had to isolate the different sources of the problems (extra queries to a very slow parallel database, a DB connection pool too small for the expected concurrency, an HSM that would choke when being hit with some ridiculously high number of TPS…). Things got much better after some refactoring, but we still got many responses with a timeout error code, or just dropped transactions, for which we never got a response. When analyzing the logs further, we saw some warnings from the multiplexers about duplicate keys, and we determined that it was better to use a different key set than QMUX’s default 11, 41 (the terminal id, field 41, would often be constant, so we decided to add field 37, the Retrieval Reference Number, to the key set).
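To illustrate why the duplicate-key warnings went away, here is a minimal sketch of request/response matching keys. It is a toy stand-in and not QMUX's actual key derivation: the class MuxKeySketch, the Map-based messages, and the sample field values are all assumptions made up for this example. It only shows that two in-flight requests sharing a STAN and a terminal id collide on an 11, 41 key, while adding field 37 tells them apart.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical, simplified stand-in for a multiplexer's matching key (not QMUX code). */
class MuxKeySketch {

    /** Concatenates the values of the given fields, in order, skipping absent ones. */
    static String key(Map<Integer, String> msg, int... fields) {
        StringBuilder sb = new StringBuilder();
        for (int f : fields) {
            String v = msg.get(f);
            if (v != null) {
                sb.append(f).append('=').append(v).append('|');
            }
        }
        return sb.toString();
    }

    /** Builds a toy message holding only the three fields of interest. */
    static Map<Integer, String> msg(String stan, String rrn, String terminalId) {
        Map<Integer, String> m = new LinkedHashMap<>();
        m.put(11, stan);
        m.put(37, rrn);
        m.put(41, terminalId);
        return m;
    }

    public static void main(String[] args) {
        // Two in-flight requests: same terminal id and (in this made-up case)
        // the same STAN, but different RRNs.
        Map<Integer, String> a = msg("000001", "000000000001", "TERM0001");
        Map<Integer, String> b = msg("000001", "000000000002", "TERM0001");

        // Same key for both requests: the second one looks like a duplicate.
        System.out.println(key(a, 11, 41).equals(key(b, 11, 41)));
        // Different keys once the RRN is part of the key set.
        System.out.println(key(a, 11, 37, 41).equals(key(b, 11, 37, 41)));
    }
}
```

The wider key only helps if the added field really varies per request, which is why the RRN was a better choice here than the mostly constant terminal id.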

But still, many transactions were timing out… until we detected something even stranger: RRNs, or other fields, would change in the response, returning a value that belonged to a different request. Our ISOMsgs were undergoing some weird genetic recombination!

Now, we should clarify that this system was developed mostly by the client, so we weren’t familiar with all the details of their code and logic hidden in some of their custom classes. We were told that the DS was supposed to talk to a Mastercard remote node, but that at the moment it only had some “autoresponder simulator” in order to perform these tests. We imagined a simple RequestListener that just changed and added a couple of fields, with a standard success response in field 39.

In reality, the “autoresponder simulator” was a full transaction manager with several participants, one of which was a not-so-simple equivalent of the following (here, extremely simplified for clarity and brevity):

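import java.io.Serializable;
import org.jpos.core.Configurable;
import org.jpos.core.Configuration;
import org.jpos.core.ConfigurationException;
import org.jpos.iso.ISOMsg;
import org.jpos.transaction.Context;
import org.jpos.transaction.TransactionParticipant;

// Constants, getApprovalCode() and getResponseCode() belong to the
// application and are elided here.
public class SomeParticipant implements TransactionParticipant, Configurable {
    private String requestKey;  // GOOD: configuration, set once
    private ISOMsg theMsg;      // BAD; DON'T DO! Shared by all concurrent transactions

    private void readRequestFromContext(Context ctx) {
        theMsg = ctx.get(requestKey);             // (1) BAD; DON'T DO!
        // ...manipulate theMsg...
    }

    public int prepare(long id, Serializable context) {
        Context ctx = (Context) context;

        readRequestFromContext(ctx);              // (2) OOPS!

        theMsg.set(38, getApprovalCode());        // (3) is theMsg still the same object
        theMsg.set(39, getResponseCode());        // (4) ...and now?

        ctx.put(Constants.DS_RESPONSE, theMsg);   // (5) what about here?

        return PREPARED | NO_JOIN;
    }

    public void setConfiguration(Configuration cfg) throws ConfigurationException {
        requestKey = cfg.get("request-key", "REQUEST"); // GOOD
    }
}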

Can you spot the problem? The participant uses an instance variable, theMsg, to store a reference to the request message in line (1). And what’s wrong with that? As explained in Chapter 9 of the jPOS Programmer’s Guide, the TransactionManager creates only one instance of each participant, but there may be multiple concurrent transactions (i.e. TransactionManager sessions) running. So, by using an instance variable to store the value retrieved in line (1), each concurrent transaction (thread) being processed may overwrite that value with a different ISOMsg. In fact, at each step where we use theMsg, such as in lines (3), (4), and (5), we could be working with a different ISOMsg!

Also, notice how it’s perfectly OK to use instance variables to store configuration options for this participant. These are usually read at the time of participant instantiation and initialization, and set in the setConfiguration() method. Those values shouldn’t normally change during the life of the participant.

So, what should we do? Always use either local variables, or pass the values as method arguments, or store them in the Context object and retrieve them from there.
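As a rough illustration of that rule, here is a sketch that runs on the JDK alone, with no jPOS classes. The Map-based context, the class name SafeParticipantSketch, the "RESPONSE" key, and the hard-coded field values are stand-ins invented for this example. A single shared instance serves many concurrent transactions, but the only instance field holds configuration, and each transaction's message lives in a local variable and in its own context.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a participant: one shared instance, many concurrent transactions. */
class SafeParticipantSketch {
    // Configuration: set once at initialization, read-only afterwards, so a field is fine.
    private final String requestKey;

    SafeParticipantSketch(String requestKey) {
        this.requestKey = requestKey;
    }

    /** Per-transaction state lives in a local variable and in the context, never in a field. */
    @SuppressWarnings("unchecked")
    void prepare(Map<String, Object> ctx) {
        Map<Integer, String> msg = (Map<Integer, String>) ctx.get(requestKey); // local reference
        msg.put(38, "123456"); // made-up approval code
        msg.put(39, "00");     // made-up response code
        ctx.put("RESPONSE", msg);
    }

    /** Pushes many transactions through one shared participant; returns the number of mix-ups. */
    @SuppressWarnings("unchecked")
    static int run(int transactions, int threads) {
        SafeParticipantSketch participant = new SafeParticipantSketch("REQUEST");
        AtomicInteger mismatches = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < transactions; i++) {
            final String rrn = String.format("%012d", i);
            pool.execute(() -> {
                // Each transaction owns its request and its context.
                Map<Integer, String> request = new HashMap<>();
                request.put(37, rrn);
                Map<String, Object> ctx = new HashMap<>();
                ctx.put("REQUEST", request);

                participant.prepare(ctx);

                Map<Integer, String> response = (Map<Integer, String>) ctx.get("RESPONSE");
                if (!rrn.equals(response.get(37)) || !"00".equals(response.get(39))) {
                    mismatches.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return mismatches.get();
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + run(10_000, 8));
    }
}
```

Because nothing that belongs to a transaction is stored on the participant, every response carries the RRN of its own request regardless of how the threads interleave.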

Never use the instance variables of a participant to store transaction state, or you may run into hard-to-debug race conditions.


(*) We may have a blog post about using JMeter with jPOS in the near future

Asking a smart question

/by apr/

Among the many helpful jPOS developers in our community, @marklsalter stands out for his professional, accurate, and detailed answers, but he has his standards when it comes to how you ask questions. You need to ask a smart question. jPOS has Mark (and Victor, and Andy, and Dave, and Matias, and Chhil, and Barzi, and, and, and), but if you go to any other open source community asking for free advice, you’ll find another Mark, or worse than that, you’ll find no Mark and your question will just get ignored and you won’t even know why.

This is what you should expect as a response if you don’t do your homework and ask a smart question (from a recent reply in jpos-users, in this case, related to a vague question about the Transaction Manager - could have been anything else).


Please always start by asking a smart question.

Please read this now :-

http://www.catb.org/esr/faqs/smart-questions.html

Yes, the whole things, go on, treat yourself, it will take 5-10 minutes and save us both hours going forward.

Preparing to ask a smart question should cause you to read the available documentation, to understand it and enable you to make sure you include all the relevant details needed for another remote person to help you, but should also make sure you have understood the documentation and how it applies to your need.

I can honestly say that on this opening post that I could see that it was going to be another thread that would drag on - without you …

  1. Apparently making any effort to understand what you have done incorrectly or “misunderstood”.
  2. Trying first to understand how and why your set-up is broken
  3. Referring to the documentation on the life cycle of the TransactionManager and it’s given participants - which perhaps surprisingly, works perfectly when users follow the few simple rules and grasp what it does for you.

By the way, I understand that a TM configuration might not be obvious straight away, but often the best things are worth the effort. I still refer to the documentation again and again (and again).

I will include some comment below to, in the hope that you read it and take the time next time to ask a smart question.

Remember as you read through that I am not taking the piss out of you, but trying to highlight why this is a terrible opening question and how you can (hopefully) help yourself next time.

jPOS-EE Crypto Service

In many jPOS systems, we secure sensitive data using ANS X9.24 DUKPT as described in the Encrypting sensitive data post. The approach served us well, but now we believe we have a better one, using PKI and AES-256.

The cryptoservice module uses AES-256 to encrypt sensitive data, such as primary account numbers and protects the encryption key using PGP.

At start-up time, and at regular intervals, the crypto service generates a new AES-256 key, encrypts it with PGP for one or more recipient ids, and stores the resulting encrypted message in the sysconfig table, using the “key.” prefix and a unique key UUID, e.g.:

id: key.f55fe6ec-ed9e-47a1-a0fe-c63dcbf128cb
value:
-----BEGIN PGP MESSAGE-----
Version: BCPG v1.56

hQEMA6Nw6GrTY6BpAQgAs1pUIK3n2FkMyNmfxSZgpPMNFKz39TcfExiwDRtuw+Zg
wRgFw86SJiL1BB+IE+mPAeCz4hrUkzliiu/760NiXHQysIasWEvUZZqFRA+ecNrk
zARgB8vgGTNgxPHoYPafVD5TrxY9LdRpJcO//Wm2fEVw0xc4Q7vxbH7e9gDQfiuA
gcNYk96rVCdbZFKxyMC8fpM9ng6M4V9lxp5TXihzJQEKHWavctIrU2rBolE1WCY2
Oobs1hELW4rfMpVwfGQDtxcFSNDYkd9IO/WnFTtTAxGHs0u1/miRVxNHadLINdke
wXx6au9vq12tqlYaJY+BAEtJaAInwwT5/irHj5dlwtJ0AW2wO3Mwh+A+pGJvSd2T
xyep1pNtm7tMbisZyms0TiGz+6BX6F5ZKCG5UuvsIvTHd/VLp2uajE5NVPe92Y1F
lLbbMyUfxzBwNhwhdfOEWwRAmrt7AbMyAQHUCZAXgwXn7SXsdh8TTzLMsssViD9+
h7lfP9w=
=YyZk

-----END PGP MESSAGE-----

The key is used to encrypt subsequent data for a given period of time (defaults to one day) until a new key is automatically generated.
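The rotation idea can be sketched with the JDK alone. This is not the CryptoService implementation: the class RotatingAesKeySketch, its method names, the explicit clock parameter, and the choice of AES/GCM with a 12-byte IV are all assumptions made for illustration, and the PGP step is only marked by a comment.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.UUID;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Hypothetical sketch: a transient AES-256 key replaced once it is older than a set duration. */
class RotatingAesKeySketch {
    private static final int IV_BYTES = 12;  // assumption: 96-bit GCM nonce
    private static final int TAG_BITS = 128; // assumption: 128-bit GCM tag
    private final SecureRandom random = new SecureRandom();
    private final long durationMillis;
    private SecretKey key;
    private String keyId;
    private long createdAt;

    RotatingAesKeySketch(long durationMillis) {
        this.durationMillis = durationMillis;
    }

    /** Returns the id of the key valid at time {@code now}, generating a new key if needed. */
    synchronized String keyIdAt(long now) {
        if (key == null || now - createdAt >= durationMillis) {
            try {
                KeyGenerator kg = KeyGenerator.getInstance("AES");
                kg.init(256);
                key = kg.generateKey();
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException("AES key generation failed", e);
            }
            keyId = UUID.randomUUID().toString();
            createdAt = now;
            // The real service would PGP-encrypt the key here and store it
            // under "key." + keyId; that part is out of scope for this sketch.
        }
        return keyId;
    }

    /** Encrypts with the key valid at {@code now}; output is the IV followed by ciphertext and tag. */
    synchronized byte[] encrypt(long now, byte[] plain) {
        keyIdAt(now);
        try {
            byte[] iv = new byte[IV_BYTES];
            random.nextBytes(iv);
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = c.doFinal(plain);
            byte[] out = new byte[IV_BYTES + ct.length];
            System.arraycopy(iv, 0, out, 0, IV_BYTES);
            System.arraycopy(ct, 0, out, IV_BYTES, ct.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    /** Decrypts data produced by encrypt(), using the key currently held in memory. */
    synchronized byte[] decryptWithCurrentKey(byte[] encoded) {
        try {
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, encoded, 0, IV_BYTES));
            return c.doFinal(encoded, IV_BYTES, encoded.length - IV_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }

    public static void main(String[] args) {
        RotatingAesKeySketch s = new RotatingAesKeySketch(86_400_000L); // one day
        long now = System.currentTimeMillis();
        byte[] enc = s.encrypt(now, "4111111111111111".getBytes(StandardCharsets.UTF_8));
        System.out.println(s.keyIdAt(now) + " -> " + enc.length + " bytes");
    }
}
```

Passing the time in as a parameter keeps the rotation rule easy to exercise; a real service would more likely use a scheduler and the system clock.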

Here is a sample usage:

private void encryptCardData (TLCapture tl, Card card)      <1>
   throws Exception {
     Map<String,String> m = new HashMap<>();
     m.put ("P", card.getPan());
     m.put ("E", card.getExp());
     SecureData sd = getCryptoService().aesEncrypt(         <2>
        Serializer.serializeStringMap(m)
     );
     tl.setKid(sd.getId());                                 <3>
     tl.setSecureData(sd.getEncoded());                     <4>
 }
  • <1> TLCapture in this example is a general purpose capture table.
  • <2> getCryptoService() just locates the CryptoService using the NameRegistrar
  • <3> kid stands for Key ID, we store the key UUID here
  • <4> secureData is a general purpose blob

The crypto service can be configured using a QBean descriptor like this:

<crypto-service class='org.jpos.crypto.CryptoService' logger='Q2'>
    <property name="custodian" value='demo@jpos.org' />               <1>
    <property name="pubkeyring" value='cfg/keyring.pub' />            <2>
    <property name="privkeyring" value='cfg/keyring.priv' />          <3>
    <property name="lazy" value="false" />                            <4>
    <property name="keylength" value="256" />                         <5>
    <property name="duration" value="86400000" />                     <6>
</crypto-service>
  • <1> custodian PGP id, there can be many custodian entries.
  • <2> path to the public keyring.
  • <3> path to the password-protected private keyring.
  • <4> if lazy=true, a key is generated the first time we call aesEncrypt, otherwise, a new one is created at service start.
  • <5> key length defaults to 256. Can be reduced if AES-256 is not supported by the JVM due to export restrictions.
  • <6> key duration

This allows jPOS nodes to encrypt data securely without storing the encryption key to disk.

NOTE: The transient encryption key is still in memory, so core dumps and swap should be disabled at the operating system level. This approach is still more secure than obfuscating encryption keys.

Decryption – that can of course run in a different node, at a different time – requires access to the private keyring, with its optional password. Said password can be entered manually, obtained from a remote service or HSM, etc. Decryption is a two-step process.

First the key has to be loaded into memory using the loadKey method. Once the key is loaded, aesDecrypt can be called.

These are the method signatures:

public void loadKey (String jobId, String keyId, char[] password) throws Exception;
public byte[] aesDecrypt (String jobId, String keyId, byte[] encoded) throws Exception;

Here keyId, password, and the encoded cryptogram don’t require much explanation, but jobId does, and here is the rationale. We could have a one-shot aesDecrypt method accepting the private key password, but decrypting the AES-256 key using PGP is an expensive operation. In situations where you have to extract a daily file, probably encrypted with just a handful of keys, you don’t want to decrypt the key on every aesDecrypt call. We don’t want to expose the key to the caller either, so the CryptoService keeps it in a private field. To that end, loadKey caches the key (until it’s unloaded), so it’s cheap to call loadKey followed by aesDecrypt: the key is actually decrypted only on the first call, and subsequent calls are fast.

In order to protect different clients from accessing keys loaded by other ones, we use a jobId that can be something as simple as a UUID or any nonce, only known to the caller. That jobId can then be used to unload those keys, using the unloadKey and unloadAll methods:

public boolean unloadKey (String jobId, String keyId);
public void unloadAll(String jobId);

There’s also a no-args unloadAll() that unloads all keys, and should be used with care.
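The jobId scoping described above can be pictured as a two-level cache. The sketch below is hypothetical and is not the CryptoService code: the class JobKeyCacheSketch, the isLoaded and decryptCount helpers, and the fake "decryption" are invented for illustration, and the password parameter of the real loadKey is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of a key cache scoped by jobId (jobId -> keyId -> key bytes). */
class JobKeyCacheSketch {
    private final Map<String, Map<String, byte[]>> keysByJob = new ConcurrentHashMap<>();
    private final AtomicInteger decryptions = new AtomicInteger();

    /** Stand-in for the expensive PGP decryption of a stored AES key. */
    private byte[] expensiveDecrypt(String keyId) {
        decryptions.incrementAndGet();
        return keyId.getBytes(StandardCharsets.UTF_8); // fake key material
    }

    /** Loads the key for this job; repeated calls reuse the cached copy. */
    void loadKey(String jobId, String keyId) {
        keysByJob.computeIfAbsent(jobId, j -> new ConcurrentHashMap<>())
                 .computeIfAbsent(keyId, this::expensiveDecrypt);
    }

    /** True only if this particular job has loaded the key. */
    boolean isLoaded(String jobId, String keyId) {
        Map<String, byte[]> keys = keysByJob.get(jobId);
        return keys != null && keys.containsKey(keyId);
    }

    /** Unloads one key of one job; returns false if it was not loaded. */
    boolean unloadKey(String jobId, String keyId) {
        Map<String, byte[]> keys = keysByJob.get(jobId);
        return keys != null && keys.remove(keyId) != null;
    }

    /** Unloads every key loaded by the given job. */
    void unloadAll(String jobId) {
        keysByJob.remove(jobId);
    }

    /** Unloads all keys of all jobs; use with care. */
    void unloadAll() {
        keysByJob.clear();
    }

    /** Number of times the expensive step actually ran. */
    int decryptCount() {
        return decryptions.get();
    }

    public static void main(String[] args) {
        JobKeyCacheSketch cache = new JobKeyCacheSketch();
        String jobId = java.util.UUID.randomUUID().toString(); // nonce known only to the caller
        cache.loadKey(jobId, "key-1");
        cache.loadKey(jobId, "key-1"); // served from the cache
        System.out.println("expensive decryptions: " + cache.decryptCount());
        cache.unloadAll(jobId);
    }
}
```

A caller that does not know the jobId cannot reach the cached key through this interface, which is the point of using a nonce.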

NOTE: In order to simplify development and testing, and eventually to troubleshoot problems, we’ve also created a couple of CLI commands: aesencrypt and aesdecrypt.

TIP: If you’re accessing the CLI using the command line q2 --cli, remember that the default deployDir is deploy-cli instead of deploy. You need a copy (or symlink) of 25_cryptoservice.xml in that directory. If you ssh to a running Q2 to reach the CLI, then you can ignore this tip.

For up-to-date information about this CryptoService module, please see the jPOS-EE guide.

TxnId

There’s a new handy org.jpos.transaction.TxnId class in the jPOS-EE txn module that can be used to generate transaction ids in multi-node systems.

The id is composed of:

  • 1-digit century
  • 2-digits year
  • 3-digits day of year
  • 5-digits second of day
  • 3-digits node id
  • 5-digits transaction id

A typical ID long value would look like this: 173000702600000001. Its toString() method would show 017-300-07026-000-00001, and its toRrn() method would return 1bbfmplq9la9.

TxnId also has a handy toRrn() method that can be used to create (and parse) 12-characters strings suitable to be used as retrieval reference numbers.
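The layout can be reproduced with plain arithmetic. The following is a sketch and not the TxnId source: the class TxnIdSketch and its method names are made up, counting the century and year digits as years since 2000 is an assumption that matches the 017 prefix of the example, and the base-36 encoding is inferred from the fact that it reproduces the sample RRN.

```java
import java.util.Locale;

/** Hypothetical re-creation of the id layout described above (not the jPOS-EE class). */
class TxnIdSketch {

    /** Packs the components into a long: CYY DDD SSSSS NNN TTTTT. */
    static long compose(int year, int dayOfYear, int secondOfDay, int node, long txn) {
        long id = year - 2000;                // assumption: century + year = years since 2000
        id = id * 1_000 + dayOfYear;          // 3-digit day of year
        id = id * 100_000 + secondOfDay;      // 5-digit second of day
        id = id * 1_000 + node;               // 3-digit node id
        id = id * 100_000 + (txn % 100_000);  // 5-digit transaction id
        return id;
    }

    /** Formats the id as CYY-DDD-SSSSS-NNN-TTTTT. */
    static String format(long id) {
        long txn = id % 100_000;
        id /= 100_000;
        long node = id % 1_000;
        id /= 1_000;
        long second = id % 100_000;
        id /= 100_000;
        long day = id % 1_000;
        id /= 1_000;
        return String.format(Locale.ROOT, "%03d-%03d-%05d-%03d-%05d", id, day, second, node, txn);
    }

    /** Base-36 rendering of the id; whether the real class pads short values is not covered here. */
    static String toRrn(long id) {
        return Long.toString(id, 36);
    }

    /** Parses a base-36 RRN back into the long id. */
    static long fromRrn(String rrn) {
        return Long.parseLong(rrn, 36);
    }

    public static void main(String[] args) {
        long id = compose(2017, 300, 7026, 0, 1);
        System.out.println(id);          // 173000702600000001
        System.out.println(format(id));  // 017-300-07026-000-00001
        System.out.println(toRrn(id));   // 1bbfmplq9la9
    }
}
```

Since the most significant digits are the date and time, ids generated later compare greater, which is where the chronological ordering comes from.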

TxnId can be used instead of UUIDs. It puts less pressure on the database index and provides chronological order.

NOTE: The last two groups, node-id and transaction-id, are supposed to be unique. transaction-id is easy to get from the transaction manager; node-id is the tricky one: the user has to ensure each node has a unique node-id to avoid collisions.

Sample usage:

TxnId txnId = TxnId.create(DateTime.now(), 0, id);

jPOS 2.1.0 has been released

jPOS 2.1.0 has been released; the new development version is now 2.1.1-SNAPSHOT.

Please see the ChangeLog.

Remember we are using Semantic Versioning so the change from 2.0.10 to 2.1.0 means a full rebuild has to be done in your applications. Some of the most notable changes are:

  • TransactionContext is now backed by a Map<Object,Object> instead of the old Map<String,Object>, so that needs review
  • Some methods that used to throw ISOException are not throwing it anymore

Other than those two minor changes, jPOS 2.1.0 has a large number of improvements, including TransactionManager metrics, new org.jpos.rc package, bug fixes and improved TransactionManager capacity.

jPOS-EE 2.2.4 has been released as well; the new development versions are jPOS 2.1.1-SNAPSHOT and jPOS-EE 2.2.5-SNAPSHOT.

See Resources Page for details.