NIO Performance Tips
NIO (New I/O) is a different way to do I/O in Java. It opens up avenues that Java was not able to explore before. However, there are some things you need to worry about if you decide to use NIO. Some of the things I have encountered are listed below. Feel free to use them any way you like :)
Non-Blocking Sockets
Non-blocking sockets are a real treat in NIO. Using this feature, you can write a server that can literally handle tens of thousands of connections on a fairly wimpy box. Where is the secret? How come we were not able to do that before?

Well, in the traditional model you have a socket listener that accepts the connection; after the connection is established, it is passed to a separate thread that handles it. That means one thread per connection. One thread per connection does not sound bad, but what I have learned (painfully) is that in some cases it can actually shut the whole box down (Red Hat Linux 9 crashes and has to be physically rebooted when the number of threads passes a couple of thousand). Yes, I know Red Hat 9 is outdated, but it's still out there. Anyway, thousands of threads can put significant pressure on any OS, and with NIO the number of threads can be significantly reduced.

The secret of NIO scalability is in the non-blocking socket implementation. Details of the non-blocking implementation inside the JDK are outside the scope of this writeup, but I will show some tricks that significantly increase the performance of an NIO-based server.

Starting a Listener

Let me state the obvious: you need a listener to have a socket-based server. Starting a listener using NIO is slightly different than listening on a socket using the traditional approach. Here is a code snippet showing how to start the listener:

    ServerSocketChannel serverSocketChannel = ServerSocketChannel.open();
    Selector selector = Selector.open();

    ServerSocket serverSocket = serverSocketChannel.socket();
    serverSocket.setReuseAddress(true);
    if (rcvBufSize != 0) {
        serverSocket.setReceiveBufferSize(rcvBufSize);
    }
    serverSocket.bind(new InetSocketAddress(port), backlog);
    serverSocketChannel.configureBlocking(false);
    SelectionKey acceptKey = serverSocketChannel.register(selector, SelectionKey.OP_ACCEPT);

So to start the listener, we open a ServerSocketChannel and we open a Selector. You guessed it, those two are important... The second-to-last line in the snippet is where the magic happens:

    serverSocketChannel.configureBlocking(false);

This is where we set the socket to be non-blocking. The last line registers this channel with the selector for the OP_ACCEPT operation.

Technically, with the snippet above, we are still not listening and processing requests. The snippet below shows how to start processing requests:

    int selectorStatus = 0;
    try {
        // Blocks until at least one registered channel is ready
        selectorStatus = selector.select();
    } catch (CancelledKeyException ex) {
        // ignore
    }
    Set<SelectionKey> selectedKeys = selector.selectedKeys();
    Iterator<SelectionKey> it = selectedKeys.iterator();
    while (it.hasNext()) {
        SelectionKey key = it.next();
        // Remove the key from the selected set so it is not processed twice
        it.remove();
        if (key.isValid()) {
            processValidKey(selector, key);
        } else {
            cancelKey(key);
        }
    }

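The two snippets above can be combined into a small program you can run. This is only a sketch under a few assumptions of my own: the class name ListenerSketch and the methods startListener, selectOnce and demo are made up for illustration, the listener binds to a free loopback port, and an accepted connection is simply closed instead of being registered for reading.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class ListenerSketch {

    /** Opens a non-blocking listener on the loopback interface (port 0 = any free port). */
    static ServerSocketChannel startListener(Selector selector, int port) throws IOException {
        ServerSocketChannel ssc = ServerSocketChannel.open();
        ssc.socket().setReuseAddress(true);
        ssc.socket().bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), port));
        ssc.configureBlocking(false);
        ssc.register(selector, SelectionKey.OP_ACCEPT);
        return ssc;
    }

    /** Runs one pass of the select loop and returns how many connections it accepted. */
    static int selectOnce(Selector selector, long timeoutMillis) throws IOException {
        int accepted = 0;
        selector.select(timeoutMillis);
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove();
            if (key.isValid() && key.isAcceptable()) {
                ServerSocketChannel ssc = (ServerSocketChannel) key.channel();
                SocketChannel client = ssc.accept(); // may be null in non-blocking mode
                if (client != null) {
                    accepted++;
                    client.close(); // a real server would register it for OP_READ instead
                }
            }
        }
        return accepted;
    }

    /** Hypothetical demo: connects one loopback client and counts accepted connections. */
    static int demo() throws IOException {
        try (Selector selector = Selector.open();
             ServerSocketChannel ssc = startListener(selector, 0)) {
            InetSocketAddress address = new InetSocketAddress(
                    InetAddress.getLoopbackAddress(), ssc.socket().getLocalPort());
            try (SocketChannel client = SocketChannel.open(address)) {
                return selectOnce(selector, 5000);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("accepted connections: " + demo());
    }
}
```

A real server would call the select pass in a loop for as long as it is running; the single pass here just keeps the example short.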
Now it gets interesting. Every selection key has an associated set of operations that are ready. Based on the operation, your code should take a different action. Usually, you will be interested in the following operations:

SelectionKey.OP_ACCEPT - a client is trying to connect to the server
SelectionKey.OP_READ - data is available on the socket and should be processed
SelectionKey.OP_WRITE - the channel is ready for writing

All of these operations could be processed on the same thread, and many examples on the Web do exactly that. The example code shows how to read from the channel and then, if there is nothing else to read, attach an object to the key and go back to the selector to wait for the next event. This approach is good in theory. Unfortunately, once you actually start processing requests, you will see that performance is not so stellar. Actually, the traditional model with a thread per connection is likely to have better performance. Why? Well, once you have read the data from the socket, you need to process it. Unless your processing takes no time, it will block the processing of other requests.

OK, so it's obvious that the actual data processing (once you have the data) should happen on a separate thread. The only exception is a server that just accepts a file and sends back the data from the file. That case is different because, if you just need to accept the file, once you have the data you are done! However, if you are writing a server that accepts commands and needs to process them, then you must code for the possibility that command execution can take some time, and your server should not suffer from the fact that the command being executed takes about 1 sec.

There is one more reason to avoid the single-threaded approach for an NIO-based server. Depending on the operating system and the individual Java implementation, you might run into a limit on the number of file handles per selector.
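To illustrate the earlier point that command execution belongs on a separate thread, here is a minimal sketch of handing work off to a worker pool. Everything specific in it is an assumption made for the example: a java.nio.channels.Pipe stands in for a client socket, commands are newline-terminated ASCII strings, and the names HandOffSketch, execute and demo are hypothetical. The thread running the selector only reads bytes and queues commands; the slow execute call runs on pool threads.

```java
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class HandOffSketch {

    /** Hypothetical slow command: pretends to work, then reports which thread ran it. */
    static String execute(String command) throws InterruptedException {
        Thread.sleep(200);
        return command + " ran on " + Thread.currentThread().getName();
    }

    /** Reads two commands on the calling (selector) thread and executes them on workers. */
    static List<String> demo() throws Exception {
        ExecutorService workers = Executors.newFixedThreadPool(2);
        List<Future<String>> pending = new ArrayList<>();
        Pipe pipe = Pipe.open(); // assumption: stands in for a client connection
        try (Selector selector = Selector.open()) {
            pipe.source().configureBlocking(false);
            pipe.source().register(selector, SelectionKey.OP_READ);
            pipe.sink().write(ByteBuffer.wrap(
                    "first\nsecond\n".getBytes(StandardCharsets.US_ASCII)));

            ByteBuffer buf = ByteBuffer.allocate(256);
            StringBuilder partial = new StringBuilder();
            // Stop once both commands are queued, or if nothing arrives within 5 seconds
            while (pending.size() < 2 && selector.select(5000) > 0) {
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (!key.isValid() || !key.isReadable()) {
                        continue;
                    }
                    buf.clear();
                    if (pipe.source().read(buf) <= 0) {
                        continue;
                    }
                    buf.flip();
                    partial.append(StandardCharsets.US_ASCII.decode(buf));
                    int newline;
                    while ((newline = partial.indexOf("\n")) >= 0) {
                        final String command = partial.substring(0, newline);
                        partial.delete(0, newline + 1);
                        // Hand off: the selector thread never waits for execute()
                        Callable<String> task = () -> execute(command);
                        pending.add(workers.submit(task));
                    }
                }
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                results.add(future.get());
            }
            return results;
        } finally {
            workers.shutdown();
            pipe.sink().close();
            pipe.source().close();
        }
    }

    public static void main(String[] args) throws Exception {
        for (String line : demo()) {
            System.out.println(line);
        }
    }
}
```

The size of the worker pool is arbitrary here; in a real server it would be tuned separately from the number of selector threads.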
A SocketChannel is usually represented as a file handle. Because of this per-selector limit, a single-threaded NIO server might not be able to serve a very large number of clients, even if it is just writing the data from the socket to a file.

After a lot of research, code changes, and performance measurements in a real-world environment, I found that the best performance is achieved when all requests are accepted on a single thread. Processing the accept event registers the channel with one of the selector threads readily available in the pool. That thread reads as much data from the socket as is available and, when a message/command is complete, places it in the queue for execution. This pattern is commonly referred to as the "Reactor Pattern". The object that is processing the accept event is commonly called the "Dispatcher".

Processing Keys

    private void processValidKey(Selector selector, SelectionKey key)
            throws IOException, ClosedChannelException {
        if (key.isAcceptable()) {
            // accept connection and register read event...
            ServerSocketChannel ssc = (ServerSocketChannel) key.channel();
            SocketChannel socketChannel = ssc.accept();
            // accept() returns null on a non-blocking channel when no connection is pending
            if (socketChannel != null) {
                socketChannel.configureBlocking(false);
                DispatcherPool.getInstance().next().addChannel(socketChannel);
            }
        }
    }

DispatcherPool is a singleton that holds a list of Dispatcher objects:

    class DispatcherPool {
        ......
        private static int MAX_D = 5;
        private List<Dispatcher> dispatchers = null;
        private int curInd = 0;

        private DispatcherPool() {
            initializeAndStartDispatchers();
        }
        .....
        // Hands out dispatchers in round-robin order
        public synchronized Dispatcher next() {
            curInd = curInd >= MAX_D ? 0 : curInd;
            return dispatchers.get(curInd++);
        }
        ......
    }

    public class Dispatcher implements Runnable {

        public void run() {
            while (!stopped) {
                .......
                register();
                .....
                selector.select(); // blocking
                Iterator keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = (SelectionKey) keys.next();
                    keys.remove();
                    if (key.isValid()) {
                        processKey(key);
                    } else {
                        cancelKey(key);
                    }
                }
                .......
            }
        }

        public synchronized void addChannel(SocketChannel channel) ....{
            channels.add(channel);
            // this will unblock selector.select(); there will be no keys
            // and the while loop will go into "register()"
            selector.wakeup();
        }

        // Registers every channel queued by addChannel() with this dispatcher's selector
        private synchronized void register() ...{
            int size = channels.size();
            for (int i = 0; i < size; i++) {
                SocketChannel sc = (SocketChannel) channels.get(i);
                sc.configureBlocking(false);
                try {
                    sc.register(selector, SelectionKey.OP_READ);
                } catch (IOException cce) {
                }
            }
            channels.clear();
        }

        private void processKey(SelectionKey key) {
            // Stop watching this key while it is being processed
            key.interestOps(key.interestOps() & (~SelectionKey.OP_READ));
            key.interestOps(key.interestOps() & (~SelectionKey.OP_WRITE));
            Processor proc = (Processor) key.attachment();
            if (proc == null) {
                proc = new Processor();
            }
            try {
                proc.process(key);
                if (key.isValid()) {
                    // Done with this pass: watch for more data again
                    key.interestOps(key.interestOps() | (SelectionKey.OP_READ));
                }
            } catch (FFTCmdException e) {
                e.printStackTrace();
                ....
            } catch (Throwable t) {
                .....
            }
        }
        .....
    }

This pattern processes requests fast and does not block. You can play with the number of Dispatchers to achieve optimal performance. I noticed that adding more than 30 dispatchers actually reduces performance, but it might be different in your particular case. In the general case, 10 dispatchers should be able to handle very high loads.

A great article about adding SSL to NIO can be found at: http://weblogs.java.net/blog/jfarcand/archive/2006/09/tricks_and_tips_2.html

Send any comments to: admin@myjavatricks.com