Saturday, May 5, 2012

SMB and Opportunistic Locking

Microsoft broke SMB backward compatibility again (sound of anguish!)

A friend of mine mentioned that, after the OS on a Windows file server was upgraded, he could no longer delete a file even after the application had closed its connection, and he asked whether I had seen this problem before. After some quick debugging, I arrived at the following scenario:

App Server -> File Server -> DB Server

The application copies a temp file to a file share on the File Server, and the App Server then initiates a BCP (bulk copy) from the File Server to the DB Server. After the BCP job completes, the application attempts to delete the temp file on the share and fails because a file handle is still open, which raises an error condition. With the App Server running Windows 2003 and creating the temp file on a Windows 2003 file server, this works. However, after the file server was upgraded to Windows 2008 R2, the problem started occurring. An App Server running Windows 2008 R2 that creates the temp file on a Windows 2008 R2 file share does not hit the problem nearly as often.

So the problem is related to file locking. To better understand the issue, we needed a minimal post-mortem, without resorting to heavy debugging with the Sysinternals tools or manually unlocking files on a production server with an unlocker tool. Since the user runs Windows file shares, I looked at how SMB interacts with file locking, in particular the dependencies between SMB and oplocks, to come up with an explanation of the failure we can justify. In the SMB protocol, opportunistic locking (oplocks) is a mechanism designed to improve performance by controlling how the client caches network files. Unlike traditional locks, oplocks are not used to provide mutual exclusion; their main goal is to keep client-side caches synchronized.

Windows 2003 Server appears to support only SMB v1. To improve performance, it enables opportunistic locking by default, which allows clients to read ahead and cache more database records than they currently need. This read-ahead can cause database corruption when multiple clients access the same data: for example, if one client modifies a record that another client read earlier, the second client may not flush the now-invalid data from its cache. Per KB296264, Microsoft recommends disabling opportunistic locking on networks where multiple clients write to the same shared files.

The idea is that a client accessing a shared file can cache the file's data for reuse instead of crossing the network link every time, a classic caching technique. The only way the client can be sure its cached data hasn't changed is if other clients are locked out from writing to it. So the client requests an "opportunistic lock" on the file, and the SMB server (the file server) grants it if, for example, nobody else has the file open for writing. If another client subsequently wants to write to the file, the SMB server has to break the lock first: the original client flushes any cached changes, discards its cached copy, and from then on loses the speed benefit of caching the data locally.
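To make the grant/break cycle concrete, here is a minimal Python sketch of a toy file server. It is not the SMB wire protocol: `Client`, `ToyFileServer`, their methods, and the share path are all hypothetical names, and the grant rule is simplified to "grant an oplock only to the sole opener". It shows why the break matters (without it, client A would keep returning its stale cached copy, which is the corruption scenario described earlier) and how the original client falls back to reading over the network once its lock is broken.

```python
from dataclasses import dataclass, field


@dataclass
class Client:
    """A hypothetical SMB client that caches file data while it holds an oplock."""
    name: str
    cache: dict = field(default_factory=dict)   # path -> locally cached bytes
    oplocks: set = field(default_factory=set)   # paths this client holds an oplock on

    def on_oplock_break(self, path):
        # The server revoked the oplock: discard the local copy so the
        # next read goes back over the network to the file server.
        self.cache.pop(path, None)
        self.oplocks.discard(path)


class ToyFileServer:
    """A toy stand-in for the file server; not the real SMB wire protocol."""

    def __init__(self):
        self.files = {}      # path -> authoritative file contents
        self.openers = {}    # path -> names of clients that have the file open
        self.holder = {}     # path -> Client currently holding the oplock
        self.breaks = []     # (client name, path) for every oplock break sent

    def open(self, client, path, for_write):
        """Open path for client; return True if the client now holds an oplock."""
        holder = self.holder.get(path)
        if holder is not None and holder is not client and for_write:
            # Another client wants to write, so the owner's cached copy is
            # about to go stale: break the existing oplock first.
            holder.on_oplock_break(path)
            del self.holder[path]
            self.breaks.append((holder.name, path))
        others = self.openers.setdefault(path, set()) - {client.name}
        self.openers[path].add(client.name)
        if not others:
            # Sole opener: nobody else can change the file, so caching is safe.
            self.holder[path] = client
            client.oplocks.add(path)
        return path in client.oplocks

    def read(self, client, path):
        if path in client.oplocks and path in client.cache:
            return client.cache[path]        # served from the local cache
        data = self.files.get(path, b"")     # round trip to the server
        if path in client.oplocks:
            client.cache[path] = data        # safe to cache while the oplock is held
        return data

    def write(self, client, path, data):
        self.files[path] = data
        if path in client.oplocks:
            client.cache[path] = data


if __name__ == "__main__":
    share_path = r"\\fileserver\share\temp.dat"   # hypothetical path
    server = ToyFileServer()
    server.files[share_path] = b"v1"
    a, b = Client("A"), Client("B")
    print(server.open(a, share_path, for_write=False))  # A is the sole opener
    print(server.read(a, share_path))                   # now cached on A
    print(server.open(b, share_path, for_write=True))   # breaks A's oplock
    server.write(b, share_path, b"v2")
    print(server.read(a, share_path))                   # A re-reads from the server
    print(server.breaks)
```

Real SMB has several oplock levels (and downgrades rather than always revoking), so treat this only as an illustration of the caching contract.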

With SMB v1, you can disable oplocks with a registry hack. There are situations in which oplocks cause a net loss of performance and/or robustness, depending, for example, on the kind of database software you run. (Client/server databases may run faster and more happily with oplocks, but ISAM databases may run into data corruption issues.) However, you can't turn oplocks off in SMB v2; among other things, doing so would break the "offline files" feature.

To disable opportunistic locking, you can modify the following registry entries:

Registry key | Value | Type | Data | Applies to
HKLM\SYSTEM\CurrentControlSet\Services\MRXSmb\Parameters | OplocksDisabled | REG_DWORD | 1 | client (SMB redirector)
HKLM\SYSTEM\CurrentControlSet\Services\Lanmanworkstation\Parameters | UseOpportunisticLocking | REG_DWORD | 0 | client
HKLM\SYSTEM\CurrentControlSet\Services\Lanmanserver\Parameters | EnableOplocks | REG_DWORD | 0 | server
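If you would rather script the change than edit each value by hand in regedit, one option is to generate a .reg file. This is a sketch under my own assumptions: the helper names `build_reg_file` and `save_reg_file` and the `OPLOCK_SETTINGS` list are hypothetical, and the script only writes out the three registry values listed above. The MRXSmb and Lanmanworkstation entries are client-side and the Lanmanserver entry is server-side, so in practice you would likely import only the relevant lines on each machine.

```python
# Registry values from the table: (key under HKLM, value name, DWORD data).
OPLOCK_SETTINGS = [
    (r"SYSTEM\CurrentControlSet\Services\MRXSmb\Parameters", "OplocksDisabled", 1),
    (r"SYSTEM\CurrentControlSet\Services\Lanmanworkstation\Parameters", "UseOpportunisticLocking", 0),
    (r"SYSTEM\CurrentControlSet\Services\Lanmanserver\Parameters", "EnableOplocks", 0),
]


def build_reg_file(settings):
    """Render the settings as the text of a .reg file that regedit can import."""
    lines = ["Windows Registry Editor Version 5.00", ""]
    for key, name, data in settings:
        lines.append(f"[HKEY_LOCAL_MACHINE\\{key}]")
        # REG_DWORD values are written as eight hex digits.
        lines.append(f'"{name}"=dword:{data:08x}')
        lines.append("")
    return "\r\n".join(lines)


def save_reg_file(path, text):
    # regedit's own "Version 5.00" exports are UTF-16 with a byte-order mark;
    # newline="" keeps the \r\n line endings exactly as built above.
    with open(path, "w", encoding="utf-16", newline="") as f:
        f.write(text)


if __name__ == "__main__":
    print(build_reg_file(OPLOCK_SETTINGS))
```

Back up the registry before importing anything generated this way.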

With SMB 2.1, which runs on Windows Server 2008 R2 and Windows 7, Microsoft tuned the oplocks feature (in the form of leases) so that it works better when a single application opens the same file multiple times, which apparently happens more often than one might expect. Microsoft claims a performance benefit from this change, providing yet another argument for giving serious thought to R2 and Windows 7.

My recommendation is that users plan to use SMB 2.x only if they run Windows 2008 machines on both ends. When a mix of OS versions is involved, you can instead disable SMB 2.x entirely through the registry, so that the oplock-disabling registry settings can take effect.
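The recommendation comes down to dialect negotiation: client and server settle on the newest SMB dialect they both support. Here is a minimal sketch, assuming the commonly cited support matrix (Windows Server 2003: SMB 1.0 only; Windows Server 2008: up to SMB 2.0.2; Windows Server 2008 R2 and Windows 7: up to SMB 2.1). `negotiate`, `SUPPORTED`, and the `server_smb2_enabled` flag are hypothetical names; the flag stands in for the server-side `SMB2` DWORD under HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters, which Microsoft documents for turning SMB 2.x off on Vista/2008-era servers (verify this against the documentation for your OS before changing it).

```python
# Dialects ordered oldest -> newest.
DIALECTS = ["SMB 1.0", "SMB 2.0.2", "SMB 2.1"]

# Dialects each Windows release can speak (assumed support matrix).
SUPPORTED = {
    "Windows Server 2003": {"SMB 1.0"},
    "Windows Server 2008": {"SMB 1.0", "SMB 2.0.2"},
    "Windows Server 2008 R2": {"SMB 1.0", "SMB 2.0.2", "SMB 2.1"},
}


def negotiate(client_os, server_os, server_smb2_enabled=True):
    """Return the dialect a client/server pair would settle on."""
    server = SUPPORTED[server_os]
    if not server_smb2_enabled:
        # Models setting the server's LanmanServer\Parameters "SMB2" DWORD to 0.
        server = {"SMB 1.0"}
    common = SUPPORTED[client_os] & server
    # Both sides use the newest dialect they have in common.
    return max(common, key=DIALECTS.index)


if __name__ == "__main__":
    pairs = [
        ("Windows Server 2003", "Windows Server 2003", True),
        ("Windows Server 2003", "Windows Server 2008 R2", True),
        ("Windows Server 2008 R2", "Windows Server 2008 R2", True),
        ("Windows Server 2008 R2", "Windows Server 2008 R2", False),
    ]
    for client_os, server_os, smb2 in pairs:
        print(f"{client_os} -> {server_os} (SMB2 on server: {smb2}): "
              f"{negotiate(client_os, server_os, smb2)}")
```

Note that a Windows 2003 App Server talks SMB 1.0 to the 2008 R2 file server either way, so only the all-2008 R2 pairing gets the SMB 2.1 oplock behavior.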
