Thursday, November 10, 2016

Automate startup of IBM Docs 2.0 server - the correct way?!

Hi.

I've done five customer installations of IBM Docs 2.0 now, and they have all had the same issues.
With each installation I also set up a scheduled restart of the server once a week. After the server had restarted, the issues were:
  • Viewer does not work as it should
  • Docs doesn't work as it should
  • Thumbnail generation doesn't work at all
IBM's initial setup, which works fine, is that the scheduled tasks start on a "one time" trigger and then repeat at a 5 minute interval.

By the way, do not alter this interval! I tried... don't do it.

The scheduled tasks that get installed also have a "when user is logged on" criterion.

When I changed this to "whether or not user is logged on" and set the trigger to "on startup", so that the scheduled tasks would start automatically at Windows reboot without anyone having to log on to the server's desktop, things did not work as expected.

The customer environments have all had basically the same setup:
  • 1 IBM Connections server 
  • 1 Docs server where all three parts are installed (docs, viewer, conversion)
  • 1 DB2 server

My first installation was on a test server, where IBM Connections, DB2 and the IBM Docs servers were all installed on the same Windows machine.
This server has no issues at all.

In the customer scenarios, where the Docs parts are installed on a separate server, a colleague of mine, @kilotin, created a startup script for the Docs parts. This script is executed through a Windows scheduled task, and is triggered "on system startup".
The IBM Connections shared folder is also mounted from within this script.

He also created a "stopdocs.bat" script, which shuts down the WAS servers and node agents and disconnects the network drive mapping. It also reboots the operating system.
(A modified version of this script is shared further down in this blog entry, in order to make this whole thing work.)

And, in order for the two Docs scheduled tasks ("sym_monitor" and "kill_timeout") to start automatically when the Windows server reboots, I also modified those tasks to be triggered "on system startup" and to be executed with the option "start whether or not user is logged on".

All sounds good, right?
Well, not quite.

When I was developing this auto-start routine, this is what I did:
  1. Executed the "stopdocs.bat" script.
  2. Watched as the WebSphere servers were shut down.
  3. Then checked that the OS was actually restarted.
  4. Logged into the desktop of the server, to check that the WAS processes were launched automatically.
  5. Then tested each IBM Docs component manually (created a document, edited, published, printed to PDF, uploaded documents, edited them, watched as the thumbnail was generated, etc.)
Conclusion: the Docs server is working fine and the restart routine works flawlessly too.

WRONG!

After a week had gone by, customers started complaining that the viewer didn't work.
In particular, documents in the older Microsoft Word format, ".DOC", were not converting at all.
Newer document formats, such as ".DOCX", were working....

SystemOut.log said stuff like:

[9/23/16 9:50:31:821 CEST] 0000047f QuerySnapshot E   CLFAF409W: Snapshot generation failed. Conversion error occurred. Error code:1203 Error message:Server returned unexpected status. Document id:ffbb67a7-5a98-49c8-8af7-28a74f86bc60 Document mimetype:application/msword Document version:1


And thumbnails previewing the content of the file were not shown either:

SystemOut.log said:
[11/8/16 19:51:46:095 CET] 00000132 UploadConvers I   CLFAF007I: Need to start a new conversion. This is for upload service. Doc id is 414ba791-1e60-4325-8e1d-d4f397b5014c. Mime type is application/vnd.openxmlformats-officedocument.wordprocessingml.document
[11/8/16 19:51:46:188 CET] 00000132 ThumbnailServ I   Aquired thumbnail service lock successfully. DocId: 414ba791-1e60-4325-8e1d-d4f397b5014c LastModified: 1,478,631,105,495
[11/8/16 19:51:46:188 CET] 00000132 ThumbnailServ I   CLFAF007I: Need to start a new conversion. This is for thumbnail service - one page upload conversion. Doc id is 414ba791-1e60-4325-8e1d-d4f397b5014c Mime type is application/vnd.openxmlformats-officedocument.wordprocessingml.document LastModified is 1478631105495
[11/8/16 19:51:46:188 CET] 00000155 LCFilesEJBRep W   Can't find the file folder on EJB server! E:\shared\files\upload\files\35\65\eb4e2658-3cdb-4d5d-b79e-fe5637db363f
[11/8/16 19:51:46:188 CET] 00000155 LCFilesEJBRep W   CLFAF703E: Get download document file. The doc id is 414ba791-1e60-4325-8e1d-d4f397b5014c
[11/8/16 19:51:46:188 CET] 00000155 DocumentServi W   Image conversion for uploading failed. Document Id:414ba791-1e60-4325-8e1d-d4f397b5014c null
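
A quick way to check whether a server is logging these errors after a reboot is to search its SystemOut.log with findstr. This is only a minimal sketch: the LOGFILE path is an assumption based on the usual WebSphere profile layout and the profile and server names used later in this post, so point it at the log of whichever server you are checking.

```bat
@echo off
rem Sketch: list conversion and thumbnail errors found in a SystemOut.log.
rem LOGFILE is an assumed path, adjust it to your own profile and server name.
set LOGFILE=D:\IBM\WebSphere\AppServer\profiles\viewer1Node01\logs\IBMViewerMember1\SystemOut.log

rem /C: searches for a literal string; with several /C: options a line matches if it contains any of them.
findstr /C:"CLFAF409W" /C:"CLFAF703E" /C:"Image conversion for uploading failed" "%LOGFILE%"
```

If nothing is printed, none of the three messages are present in that log file.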


I then logged into the server and tested, and saw that ".DOC" documents had started working again in the viewer. This was often after I did a manual restart of the WebSphere Docs servers.
Thumbnail generation, however, did not work.

Weird, right?

I then turned off the auto-restart scheduled task for each customer while I was trying to figure this issue out. I made sure that the servers were running fine whilst doing so.

I tried everything I could think of. I created Windows services for the Docs WAS processes and ran those as an admin user, and also tried the SYSTEM user. I tried mapping the network drive as different users. I was also in contact with IBM Champion Roberto Boccadoro, and we compared my Docs configuration files with what he had in his system. I then asked Roberto which user he starts the servers with. He was using "System". I then tried this on the scheduled task that starts up the servers as well. Nothing gave results.

The one thing that seemed to work was actually ALWAYS logging manually into the Windows desktop of the server.
That triggered things to work for .doc documents at least.

So, I then created a PMR, explaining what was going on. And after a few failed attempts, we were actually on to something!

IBM said that "you can't use an Admin user in the Scheduled Tasks. You have to use the System user as the user that fires the Sched-tasks.". That means ALL THE TASKS: not only the scheduled task that starts the WebSphere processes, but also the "sym_monitor" and "kill_timeout" tasks.

IBM also recommends calling the mount script as a separate script... (which is wrong; I'll discuss this later on).

OK, first of all: when you install IBM Docs, the two scheduled tasks "sym_monitor" and "kill_timeout" are created with the user you were logged in as when the installation was running. So I had to manually set those two scheduled tasks to run as "System".

I then restarted the Docs Windows server, and tested without logging into the desktop of the server this time.

Wow, now ".DOC" documents were actually working. They converted just fine and the viewer worked.

But one thing was still not working, and that was the thumbnail creation of the document.

I then stumbled upon an article (WHICH I DID NOT BOOKMARK and can't find again) describing the fact that when you map a drive from a script run by a scheduled task, through a "call" to a different mount script (as IBM suggested), and this process is started as the "SYSTEM" user, the mapping is not shared between the two processes.
Which means that the SYSTEM user that starts up the WAS servers does not have access to the mapped drive, because it was mapped through a separate mount script!!

So the solution is to do the drive mounting inside the same script that starts up the Docs WAS processes. This is the opposite of what IBM suggested.

So... This has been a long read for you guys. Here's the fun part: how to correctly automate the startup of the Docs server and all its related processes:

First, you need to create this script:

(The order in which the WAS servers are started is important: 1. Conversion, 2. Docs, 3. Viewer.)


1. Create a directory e.g. D:\Scripts
2. Create a new subdirectory e.g. D:\Scripts\logs
3. Create a file in D:\Scripts with the name "mapAndStartDocs.bat" with this content:
(Modify the script to suit your environment.)
(I only map one drive, the Connections Shared Folder share. The docs_share and the viewer_share are local on the Docs Windows server, which means I don't need to map those to u: and v:.)

@echo off


:Mount
echo Mapping IBM Connections Shared Data disk %date% %time% > D:\Scripts\logs\mount.log
net use E: \\appsrv1.skya.local\e$\IBM\Data /user:USERNAME PASSWORD /persistent:yes >> D:\Scripts\logs\mount.log

:StartNodes
echo Start conv nodeagent %date% %time% > D:\Scripts\logs\startNodes.log
call D:\IBM\WebSphere\AppServer\profiles\conv1Node01\bin\startNode.bat >> D:\Scripts\logs\startNodes.log

echo Start docs nodeagent %date% %time% >> D:\Scripts\logs\startNodes.log
call D:\IBM\WebSphere\AppServer\profiles\docs1Node01\bin\startNode.bat >> D:\Scripts\logs\startNodes.log

echo Start viewer nodeagent %date% %time% >> D:\Scripts\logs\startNodes.log
call D:\IBM\WebSphere\AppServer\profiles\viewer1Node01\bin\startNode.bat >> D:\Scripts\logs\startNodes.log

:StartServers

echo Start conv server %date% %time% > D:\Scripts\logs\startServers.log
call D:\IBM\WebSphere\AppServer\profiles\conv1Node01\bin\startServer.bat IBMConversionMember1 >> D:\Scripts\logs\startServers.log

echo Start docs server %date% %time% >> D:\Scripts\logs\startServers.log
call D:\IBM\WebSphere\AppServer\profiles\docs1Node01\bin\startServer.bat IBMDocsMember1 >> D:\Scripts\logs\startServers.log

echo Start viewer server %date% %time% >> D:\Scripts\logs\startServers.log
call D:\IBM\WebSphere\AppServer\profiles\viewer1Node01\bin\startServer.bat IBMViewerMember1 >> D:\Scripts\logs\startServers.log

:exit
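
Since the whole point is that the WAS processes must see the mapped drive, one possible hardening of the :Mount step in the script above is to verify that E: is actually reachable before any server is started. This is a minimal sketch and not part of the tested script: the LOG variable and the abort behaviour are my own assumptions.

```bat
@echo off
rem Sketch: map the share, then refuse to start WAS if E: is not usable.
rem LOG is a hypothetical helper variable, not part of the original script.
set LOG=D:\Scripts\logs\mount.log

echo Mapping IBM Connections Shared Data disk %date% %time% > %LOG%
rem 2>&1 also sends error messages from net use to the log file.
net use E: \\appsrv1.skya.local\e$\IBM\Data /user:USERNAME PASSWORD /persistent:yes >> %LOG% 2>&1

rem "if not exist E:\" is true when the drive letter is missing or unreachable.
if not exist E:\ (
    echo E: is not available, not starting the WAS servers >> %LOG%
    exit /b 1
)

rem ... continue with :StartNodes and :StartServers as in the script above ...
```

Checking for the drive itself, rather than the exit code of net use, means the script also continues when the drive was already mapped.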


Then create the script stopDocs.bat and put it in the same D:\Scripts folder.
stopDocs.bat content:

@echo off

:StopServers
echo Stop viewer server %date% %time% > D:\Scripts\logs\stopServers.log
call D:\IBM\WebSphere\AppServer\profiles\viewer1Node01\bin\stopServer.bat IBMViewerMember1 -username WASADMIN -password PASSWORD >> D:\Scripts\logs\stopServers.log

echo Stop docs server %date% %time% >> D:\Scripts\logs\stopServers.log
call D:\IBM\WebSphere\AppServer\profiles\docs1Node01\bin\stopServer.bat IBMDocsMember1 -username WASADMIN -password PASSWORD >> D:\Scripts\logs\stopServers.log

echo Stop conv server %date% %time% >> D:\Scripts\logs\stopServers.log
call D:\IBM\WebSphere\AppServer\profiles\conv1Node01\bin\stopServer.bat IBMConversionMember1 -username WASADMIN -password PASSWORD >> D:\Scripts\logs\stopServers.log

:StopNodes
echo Stop viewer nodeagent %date% %time% > D:\Scripts\logs\stopNodes.log
call D:\IBM\WebSphere\AppServer\profiles\viewer1Node01\bin\stopNode.bat -username WASADMIN -password PASSWORD >> D:\Scripts\logs\stopNodes.log

echo Stop docs nodeagent %date% %time% >> D:\Scripts\logs\stopNodes.log
call D:\IBM\WebSphere\AppServer\profiles\docs1Node01\bin\stopNode.bat -username WASADMIN -password PASSWORD >> D:\Scripts\logs\stopNodes.log

echo Stop conv nodeagent %date% %time% >> D:\Scripts\logs\stopNodes.log
call D:\IBM\WebSphere\AppServer\profiles\conv1Node01\bin\stopNode.bat -username WASADMIN -password PASSWORD >> D:\Scripts\logs\stopNodes.log

:UnmountShares
echo Unmount share %date% %time% > D:\Scripts\logs\unmount.log
net use * /delete /yes >> D:\Scripts\logs\unmount.log

:Reboot
echo Reboot %date% %time% > D:\Scripts\logs\reboot.log
shutdown -r -t 3

Then, change the two scheduled tasks like this:
kill_timeout:
Set the user to "SYSTEM", and select "Run with highest privileges" and "Hidden".


Change the trigger to "At startup", with a 5 minute repeat interval and the duration set to "Indefinitely":


Do the same for "sym_monitor":
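
The run-as user of both tasks can also be changed from an elevated command prompt with schtasks. This is a minimal sketch, assuming the tasks sit in the root of the Task Scheduler library under exactly these names. It only changes the user; the trigger, the repeat interval, "Run with highest privileges" and "Hidden" are still set in the Task Scheduler window.

```bat
@echo off
rem Sketch: make both IBM Docs tasks run as the SYSTEM account.
rem Assumes the task names are exactly "kill_timeout" and "sym_monitor".
schtasks /Change /TN "kill_timeout" /RU SYSTEM
schtasks /Change /TN "sym_monitor" /RU SYSTEM

rem Print the task details so the "Run As User" field can be checked.
schtasks /Query /TN "kill_timeout" /V /FO LIST
schtasks /Query /TN "sym_monitor" /V /FO LIST
```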




Then, create a new Scheduled task with the name "StartDocs".
Set the configuration as the following screenshots show:
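
As a rough command-line equivalent, the task could be created with schtasks from an elevated command prompt. This is a minimal sketch, assuming the script location used earlier in this post; compare the resulting task with the settings described above before relying on it.

```bat
@echo off
rem Sketch: create the StartDocs task.
rem /SC ONSTART = run at system startup, /RU SYSTEM = run as the SYSTEM account,
rem /RL HIGHEST = run with highest privileges.
schtasks /Create /TN "StartDocs" /TR "D:\Scripts\mapAndStartDocs.bat" /SC ONSTART /RU SYSTEM /RL HIGHEST
```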





Then, if you want to schedule a restart of the Docs server, create a Scheduled task with the name "StopDocs"
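
A weekly StopDocs task could be created in the same way with schtasks. This is a minimal sketch: the day (Sunday) and time (03:00) are placeholders I picked for illustration, so choose a maintenance window that suits your environment.

```bat
@echo off
rem Sketch: create a weekly StopDocs task that runs stopDocs.bat as SYSTEM.
rem /D SUN and /ST 03:00 are assumed values, not taken from the original setup.
schtasks /Create /TN "StopDocs" /TR "D:\Scripts\stopDocs.bat" /SC WEEKLY /D SUN /ST 03:00 /RU SYSTEM /RL HIGHEST
```

The task can also be started on demand with schtasks /Run /TN "StopDocs". Keep in mind that this stops the servers and reboots the machine.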






You can then test the restart by right-clicking the "StopDocs" scheduled task and selecting "Run".

If all goes well, the WAS servers and node agents will shut down, and a Windows reboot will occur.
And when the server is back up again, you will be able to create Docs documents, upload files in .doc format and preview them, and you will notice that thumbnail generation for the files is working as well.

Phew... This has been a huge headache... I'm thrilled that I've come up with a solution for automating the startup of the Docs servers with a scheduled reboot of the servers.

Cheers...


P.S. Any comments or questions or suggestions on improvements, please post them in the comments section.



5 comments:

Sharon Bellamy James said...

Great post .. I already do most of this ...

Love the idea of mounting as part of the stop / start script - I will use that in the future :)

Robert Farstad said...

Wow, you're up as well? And you're a really fast reader if you read all of this so quickly after I posted it :-)

I'll bring this back to the PMR now, and hopefully the documentation regarding the two scheduled tasks will change, and also on the mapping of the drive(s).

Robert Farstad said...

The "stopdocs.bat" script could have been modified a bit to make it smoother. For instance, by inserting two variables at the top of the script that store the WebSphere username and password, and then using the variables further down in the script where I stop the servers and node agents.

Or, the soap.client.properties file could be used to store the WAS credentials, so that the username and password are no longer needed when calling the stop scripts.
See more here: http://www.ibm.com/support/knowledgecenter/SSFHJY_2.0.0/deploy/configure_the_soap.client.properties_file.html

Victor Toal said...

Robert! What version of Windows are you running on? When I try to change the scheduled tasks for sym_monitor and kill_timeout to system I get different results. I can't upload a pic to this comment though.

Robert Farstad said...

Hi Victor. Sorry for the late reply to your comment, but we already had this chat on Skype, didn't we?
The thing you saw was that the "system" user, when you selected it in the scheduled tasks, came up as "NT_Authority\System", right?
Well, I have seen the same thing on some servers. I don't know why this happens, but it's all the same. It works as long as you select the local user with username "system". What Windows does after you select that user does not matter :-)