by Manoj Navalkar.
Thank you, Ken, for your guidance. We conducted dry runs over Friday and Sat to simulate the issue. I am resharing the details of the issue (Cloudflare is not the cause). We have some pointers to the possible reason.
* Background - There are 4 live Moodle instances on our production server (as I had mentioned in my post) + 1 UAT instance of the largest Moodle instance (installed on 16 June on the same server and not accessed since then).
* 8-Jul - Moodle Mobile Test for 50 users in OFFLINE mode.
* 8-Jul - At the server end, the error logs received hundreds and thousands of messages of the type listed under Sample Error Messages below. Over 6 hours, the most executed scripts (number of executions first) were:
870589 /home/servername/public_html/GS/webservice/rest/server.php
91787 /home/servername/public_html/lfii.YYYYYYY.com/pluginfile.php
26941 /home/servername/public_html/GS/lib/ajax/service.php
23412 /home/servername/public_html/lfii.YYYYYYY.com/lib/ajax/service.php
15110 /home/servername/public_html/lfii.YYYYYYY.com/lib/ajax/setuserpref.php
13775 /home/servername/public_html/GS/local/mobile/check.php
12710 /home/servername/public_html/lfii.YYYYYYY.com/mod/scorm/datamodel.php
9163 /home/servername/public_html/GS/webservice/pluginfile.php
7807 /home/servername/public_html/GS/login/token.php
6883 /home/servername/public_html/lfii.YYYYYYY.com/theme/image.php
Sample Error Messages
- [client 35.209.0.200:33176 Database transaction aborted automatically in /home/servername/public_html/SG/webservice/rest/server.php
- Execute of /home/servername/public_html/lfii.YYYYYYY.com/pluginfile.php stopped because of load 35.05
- Execute of /home/servername/public_html/SG/webservice/rest/server.php stopped because of load 35.05
- Execute of /home/servername/public_html/lfii.YYYYYYY.com/pluginfile.php stopped because of load 35.05
- Execute of /home/servername/public_html/lfii.YYYYYYY.com/mod/scorm/datamodel.php stopped because of load 35.31
- Execute of /home/servername/public_html/SG/lib/ajax/service.php stopped because of load 35.31
* 8-Jul - Result - Users faced 'Error connecting to Server' errors; Administrators got 'Service Unavailable'
* 9-Jul - Dry Run 1 with 25-30 users across India. A similar number of error log entries was generated.
* 10-Jul - Checked the UAT instance on the server and found its logs had entries AFTER 16 Jun too, despite no one accessing it! Removed UAT instance of the largest Moodle installation, removed its sub-domain, removed its linked SSL Cert.
* 10-Jul - Dry Run 2 with 25 users, simultaneous start. Server error messages down to 2831. Many users still faced 'Error connecting to Server'.
* 11-Jul - Dry Run 2 with 25 users, staggered start (5 users starting every 5 min). Server error messages down to 2111. SQL-related issues didn't occur. Only 1 user encountered the 'Error connecting to Server' error. Administrators encountered the 'Service Unavailable' error multiple times.
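In case it helps anyone compare error counts between runs, here is a minimal Python sketch of one way to tally the "stopped because of load" messages per script. It assumes the log lines look like the sample error messages quoted above; the function name tally_load_stops and the log file passed on the command line are only illustrative, not part of Moodle or of our hosting setup.

```python
import re
import sys
from collections import Counter

# Matches the throttling messages quoted above, e.g.
# "Execute of /path/to/script.php stopped because of load 35.05"
LOAD_STOP = re.compile(r"Execute of (\S+) stopped because of load (\d+(?:\.\d+)?)")


def tally_load_stops(lines):
    """Return (count per script path, highest load value seen)."""
    counts = Counter()
    peak_load = 0.0
    for line in lines:
        match = LOAD_STOP.search(line)
        if match is None:
            continue  # other messages (e.g. database errors) are skipped
        counts[match.group(1)] += 1
        peak_load = max(peak_load, float(match.group(2)))
    return counts, peak_load


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python tally.py <path to error log>
    with open(sys.argv[1], errors="replace") as log:
        counts, peak_load = tally_load_stops(log)
    for path, n in counts.most_common(10):
        print(n, path)
    print("peak load:", peak_load)
```

Running this on the log from each dry run would give per-script totals that can be compared from one run to the next. It only counts the load-related messages, so database errors would need a separate pattern.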
We interacted with local tech experts, and their view was that because of the UAT installation (including its sub-domain, the redirections and the SSL Cert), the webservice calls were being affected at the web server level. Hence, some of the activity was being diverted to the UAT instance. Once we deleted the UAT instance, the number of errors reduced substantially.
Would welcome your views too. Let me know if you require any more details.
Ken, many thanks for the response. I have been going through your various responses to Moodle users who are in trouble, as well as those with long-standing issues (we still haven't been able to crack our issues with large backups). I appreciate the help and support you provide to the community. Noted that attachments shouldn't be .doc. I have saved the file of the interaction with Server Hosting Support as text and re-attached it here. I removed the Cloudflare part of the interaction as it digresses from the discussion.
best regards,