NFS Cache - Invisible file issue

I ran into a following situation and it took some time to diagnose the issue and help from couple of folks from DBA and operations team to resolve it.  Here is what happened.

I exported a large data set from MySQL to a file in directory /dir_1/dir_2/exported_file.txt, for example, in an application.  Then after the file was exported the application went on to consume the file by reading it's content.  Since MySQL OUTFILE (exporting data) doesn't overwrite a file if the file name already exists, the code would rename the file to *.bak.   See below for pseudo code.

If OUTFILE exists
    Move or Rename OUTFILE to OUTFILE.bak    /* Step 1 */
Run MySQL export to OUTFILE    /*  Step 2 */
Check the error code
Read OUTFILE and parse               /* Step 3 */

When I ran the application, it would sometime create the output file and go on to parse it correctly but many a times it would fail in step 1 throwing an error like "file already exists" when in fact it was not.  Because I had removed the file with 'rm -f' before rerunning the program.  Other times it would fail in step 3 indicating that file does not exists even though SQL exported the file successfully in step 2.  I even provided sleep time between each step ranging from 5 to 60 seconds but continued to see the same random behavior.

After spending sometime trying to diagnose what might be going on, ended up debugging NFS caching.  The directory /dir_1 was a mounted file system with NFS caching set to few hundred seconds.  When the application wrote to NFS directory, the write cache was updated but not the OS directory structure (inode). Reducing the parameter setting (actimeo) to lower number, say 30 seconds, will help alliviate the delay. If sys admins are reluctant to change the older mounted system settings, you should get a new mount point with actimeo set (30).   Once these changes were made application was able to run smoothly with the application sleep set to little higher than actimeo timings.  Note, using actimeo sets all of acregmin, acregmax, acdirmin, and acdirmax to the same value. There is no default value. See man pages for more details.




Cheers,
Shiva

No comments:

Post a Comment