
Forward all machine generated traffic to port 8443 for cmsweb services.#10330

Merged
amaltaro merged 3 commits into dmwm:master from todor-ivanov:feature_WMCore_portUsage_fix-10119
Mar 19, 2021

Conversation

@todor-ivanov
Contributor

@todor-ivanov todor-ivanov commented Mar 5, 2021

Fixes #10119

Status

needs some more testing

Description

To finalize the forwarding of all machine- (NOT human-) generated traffic with destination https://cmsweb.*.cern.ch to port 8443, we made an extensive investigation of the pieces of code and APIs still querying the standard port 443, which is now reserved for human-generated traffic only. The results are documented in the issue itself, but here are the two major changes to WMCore/WMAgent-related services introduced by this PR:

  • Because in some places we import the Requests and JSONRequests classes directly rather than through WMCore.Services.Service, we need to add a check and URL-mangling logic so that we can forward all traffic with destination cmsweb to https://cmsweb.*.cern.ch:8443.
  • Even though the CouchDB server port is set to 5984 in the configuration, we still see some queries asking for port 443 with source client "CouchDB/1.6.1". These are the following two APIs:
    • /couchdb/wmstats/...
    • /couchdb/workqueue/...
      They come from the module WMCore.Database.CMSCouch and whoever imports it. This is basically related to the logic used for automated database replication between the agents and the central services.
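The check-and-mangle logic for directly imported Requests/JSONRequests could look roughly like the sketch below. The helper name `forwardToPort8443` and the exact host-matching rule are illustrative assumptions, not the actual WMCore implementation:

```python
from urllib.parse import urlparse, urlunparse


def forwardToPort8443(url):
    """Rewrite machine-generated cmsweb URLs to carry an explicit :8443 port.

    Hypothetical sketch of the check-and-mangle logic described above;
    the real helper name and placement in WMCore may differ.
    """
    parsed = urlparse(url)
    # Only touch https URLs aimed at a cmsweb frontend that have no explicit port
    if parsed.scheme == "https" and parsed.hostname and \
            parsed.hostname.startswith("cmsweb") and parsed.port is None:
        parsed = parsed._replace(netloc="%s:8443" % parsed.hostname)
    return urlunparse(parsed)
```

Because URLs that already name a port are left alone, the rewrite is safe to apply more than once along the call chain.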

Is it backward compatible (if not, which system it affects?)

YES

Related PRs

The relevant change to the top level class WMCore.Services.Service is here:
#8726

External dependencies / deployment changes

IMPORTANT
It needs a change in the WMAgent secrets file; the change should look like [1] and will take effect after agent deployment. It should result in the following configuration changes in the agent config itself [2].

[1]

$ diff -u /data/admin/wmagent/WMAgent.secrets /data/admin/wmagent/WMAgent.secrets~
--- /data/admin/wmagent/WMAgent.secrets	2021-03-19 06:53:46.743574324 +0100
+++ /data/admin/wmagent/WMAgent.secrets~	2021-03-19 06:51:33.385626269 +0100
@@ -7,8 +7,8 @@
-GLOBAL_WORKQUEUE_URL=https://cmsweb-testbed.cern.ch:8443/couchdb/workqueue
-WMSTATS_URL=https://cmsweb-testbed.cern.ch:8443/couchdb/wmstats
+GLOBAL_WORKQUEUE_URL=https://cmsweb-testbed.cern.ch/couchdb/workqueue
+WMSTATS_URL=https://cmsweb-testbed.cern.ch/couchdb/wmstats

For private deployments, please swap in your own central services instance for https://cmsweb-testbed.cern.ch.

[2]

cmst1@vocms0290:/data/srv/wmagent/current $ diff -u  config/wmagent/config.py*

-config.General.centralWMStatsURL = 'https://cmsweb-testbed.cern.ch:8443/couchdb/wmstats'
+config.General.centralWMStatsURL = 'https://cmsweb-testbed.cern.ch/couchdb/wmstats'

-config.WorkQueueManager.queueParams = {'QueueURL': 'http://vocms0290.cern.ch:5984', 'central_logdb_url': 'https://cmsweb-testbed.cern.ch/couchdb/wmstats_logdb', 'log_reporter': 'vocms0290.cern.ch', 'ParentQueueCouchUrl': 'https://cmsweb-testbed.cern.ch:8443/couchdb/workqueue', 'RequestDBURL': 'https://cmsweb-testbed.cern.ch/couchdb/reqmgr_workload_cache', 'QueueDepth': 0.5, 'rucioAccount': 'wmcore_transferor', 'WorkPerCycle': 200}
+config.WorkQueueManager.queueParams = {'QueueURL': 'http://vocms0290.cern.ch:5984', 'central_logdb_url': 'https://cmsweb-testbed.cern.ch/couchdb/wmstats_logdb', 'log_reporter': 'vocms0290.cern.ch', 'ParentQueueCouchUrl': 'https://cmsweb-testbed.cern.ch/couchdb/workqueue', 'RequestDBURL': 'https://cmsweb-testbed.cern.ch/couchdb/reqmgr_workload_cache', 'QueueDepth': 0.5, 'rucioAccount': 'wmcore_transferor', 'WorkPerCycle': 200}

For private deployments, please swap in your own central services instance for https://cmsweb-testbed.cern.ch.

NOTE:
Because we implemented this directly in the code, by changing the file src/python/WMComponent/AgentStatusWatcher/AgentStatusPoller.py and forwarding all the URLs related to the CouchDB replication process, the above change may be unneeded. Keeping it here for historical reasons and to have a documented alternative.

@todor-ivanov todor-ivanov requested a review from amaltaro March 5, 2021 11:40
@cmsdmwmbot

Jenkins results:

  • Unit tests: succeeded
  • Pylint check: failed
    • 2 warnings and errors that must be fixed
    • 2 warnings
    • 40 comments to review
  • Pylint py3k check: succeeded
    • 0 errors and warnings that should be fixed
    • 0 warnings
    • 0 comments to review
  • Pycodestyle check: succeeded
    • 23 comments to review
  • Python3 compatibility checks: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/11428/artifact/artifacts/PullRequestReport.html

@todor-ivanov
Contributor Author

Alan @amaltaro, please take a look at the PR explanation and the proposed change, even though the Status section says it still needs some more testing. The needed investigation is about how to forward the CouchDB client itself to port 8443; it is not something that can be done through WMCore code, IMHO. Any ideas on how to change the port configuration for the database replication process are very welcome. Thanks!

@amaltaro
Contributor

amaltaro commented Mar 8, 2021

@todor-ivanov can you please share 2 or 3 of those urls that you still see going through port 443? They are truncated in the initial description. I might be wrong, but I think the _changes CouchDB API is one of those that go through 443.

@todor-ivanov
Contributor Author

Hi @amaltaro, here [1] are a few of them. Thanks for taking a look. Yes, indeed, _changes is one of them; even though I cannot find it right now, I am 100% sure it is called, because I followed a trace regarding it two days ago.

[1]

     51 /couchdb/workqueue/_bulk_docs /... | /DC=ch/DC=cern/OU=computers/CN=wmagent/vocms0192.cern.ch | "CouchDB/1.6.1" | 188.185.70.35 | 443
      1 /couchdb/workqueue/c474b90056a21b165d1d9de9a4ec4ca3 /... | /DC=ch/DC=cern/OU=computers/CN=wmagent/vocms0192.cern.ch | "CouchDB/1.6.1" | 188.185.70.35 | 443
      1 /couchdb/workqueue/d24030a8163eb179206df1ec7d8fd381 /... | /DC=ch/DC=cern/OU=computers/CN=wmagent/vocms0192.cern.ch | "CouchDB/1.6.1" | 188.185.70.35 | 443
    205 /couchdb/workqueue/ /... | /DC=ch/DC=cern/OU=computers/CN=wmagent/vocms0192.cern.ch | "CouchDB/1.6.1" | 188.185.70.35 | 443
      1 /couchdb/workqueue/ecafbd9ced1b8e24f229ed65c3db6c0a /... | /DC=ch/DC=cern/OU=computers/CN=wmagent/vocms0192.cern.ch | "CouchDB/1.6.1" | 188.185.70.35 | 443
     12 /couchdb/workqueue/_ensure_full_commit /... | /DC=ch/DC=cern/OU=computers/CN=wmagent/vocms0192.cern.ch | "CouchDB/1.6.1" | 188.185.70.35 | 443
      1 /couchdb/workqueue/fad5709cbc0ad2c07f1c3f7601ddf110 /... | /DC=ch/DC=cern/OU=computers/CN=wmagent/vocms0192.cern.ch | "CouchDB/1.6.1" | 188.185.70.35 | 443
     67 /couchdb/workqueue/_revs_diff /... | /DC=ch/DC=cern/OU=computers/CN=wmagent/vocms0192.cern.ch | "CouchDB/1.6.1" | 188.185.70.35 | 443
     12 /couchdb/workqueue/_local/... | /DC=ch/DC=cern/OU=computers/CN=wmagent/vocms0192.cern.ch | "CouchDB/1.6.1" | 188.185.70.35 | 443
      5 /reqmgr2/data/request | /DC=ch/DC=cern/OU=computers/CN=wmagent/vocms0192.cern.ch | "WMCore.Services.Requests/v002" | 188.185.86.116 | 443
      4 /wmstatsserver/data/globallocks | /DC=ch/DC=cern/OU=computers/CN=wmagent/vocms0192.cern.ch | "WMCore.Services.Requests/v002" | 188.185.86.116 | 443

@amaltaro
Contributor

amaltaro commented Mar 9, 2021

I failed to see which parameter we could use to change the SSL port; the configuration documentation is here:
https://docs.couchdb.org/en/1.6.1/config-ref.html

About your list, I think the following URLs are under our control, and it should be possible to change their port:
wmstatsserver REST APIs
reqmgr2 REST APIs

and perhaps the direct access to couchdb documents, e.g.:
/couchdb/workqueue/ecafbd9ced1b8e24f229ed65c3db6c0

but for this one I'm not sure whether it is triggered by the couchdb replication or by one of our WMCore clients.

@todor-ivanov
Contributor Author

Sorry to inform you, @amaltaro, but your comment did not help much. I simply cannot understand the whole construction of CouchDB here. More precisely, this whole picture [1] must have been set up somewhere, together with the replication parameters for each one of those databases. And the documentation, besides this picture and a brief comment on the purpose each database serves, is very scarce. I have been banging my head against this wall for two days now with exactly zero success. I found the place where the SSL port is configured on our machines: after deployment it is in [2], but it is actually not changed at all (this is the default coming with the service mainstream config, which is 6984). The port itself is not even open on either of the instances, neither central services nor agent [3] && [4], so there must be another place where port redirections happen, such that eventual calls to those ports are redirected to something else... I have no idea.

And I actually do not think this is related to the SSL authentication itself. This, IMHO, is standard database replication and calls, meaning this is actually the place and port to which the actual database query is sent (as we can see from the URIs of the APIs I find in the logs).

[1]
https://github.com/dmwm/WMCore/wiki/CouchDB-Diagram

[2]
/data/srv/HG2103g/sw/slc7_amd64_gcc630/external/couchdb15/1.6.1-comp7/etc/couchdb/default.ini

[3]

[tivanov@vocms0290 WMCoreDev.d]$ nmap -n  -p5984-6984 127.0.0.1

Starting Nmap 6.40 ( http://nmap.org ) at 2021-03-10 15:14 CET
Nmap scan report for 127.0.0.1
Host is up (0.00072s latency).
Not shown: 1000 closed ports
PORT     STATE SERVICE
5984/tcp open  unknown

Nmap done: 1 IP address (1 host up) scanned in 0.07 seconds

[4]

[tivanov@vocms0290 WMCoreDev.d]$ nmap -n  -p5984-6984 tivanov-unit02.cern.ch

Starting Nmap 6.40 ( http://nmap.org ) at 2021-03-10 15:14 CET
Nmap scan report for tivanov-unit02.cern.ch (188.185.86.116)
Host is up (0.00062s latency).
Not shown: 1000 closed ports
PORT     STATE SERVICE
5984/tcp open  unknown

Nmap done: 1 IP address (1 host up) scanned in 0.08 seconds

@amaltaro
Contributor

Todor, I'd suggest splitting our next actions here (actually yours) in two:

  1. reqmgr2 and wmstatsserver
  2. couchdb

Talking about step 1): from your log from 2 days ago, there are still a few calls/APIs going through port 443. Those are things that should be under our control, and I think we can change the port they use.

Then talking about step 2): I didn't even remember that documentation on the CouchDB databases; thanks for finding and sharing it. Talking about the agents, there is another CouchDB configuration file that overrides whatever is provided in the default.ini you pointed out; it sits under the current directory, in config/couchdb/local.ini. The port defined there is simply the port that CouchDB is supposed to be listening on (I cannot check because nmap is not installed on the agents).

But as you correctly mentioned, the couchdb port is unrelated to the SSL port that we will use to contact the frontend. I have the feeling that we will not be able to change the behavior/URLs/port used within the CouchDB replication. If we can't change that, then we need to communicate it to Valentin and see what the impact would be if we need to remain on the standard 443 port for CouchDB calls.

Can you please start with point 1) and try to move those away from port 443, such that we end up with only CouchDB traffic going through 443?

@todor-ivanov
Contributor Author

todor-ivanov commented Mar 10, 2021

@amaltaro Those two APIs that you mention in 1), from the log message I sent you in a hurry yesterday, are from client WMCore.Services.Requests/v002 and must already have been fixed by the current change. They must have entered the frontend logs as leftovers from my MSRuleCleaner standalone runs. So we are basically left with 2) only.

p.s. It is worth mentioning, though, that I found another client trying to connect to port 443, but I think it is triggered by human interactions with the WMStats user interface. It comes from the central services instance (tivanov-unit02.cern.ch) with destination the same central instance, but the client is again a third-party client/library: zlib/1.2.8. So I am not sure we should do anything about that either.

[1]

2 /wmstatsserver/data/teams | /DC=ch/DC=cern/OU=computers/CN=wmagent/vocms0192.cern.ch | zlib/1.2.8" | tivanov-unit02.cern.ch | 443

@amaltaro
Contributor

Perfect! Then this PR is basically ready to go, but before that, I think we can give the couchdb replication one last try.
Here is the relevant code:
https://github.com/dmwm/WMCore/blob/master/src/python/WMComponent/AgentStatusWatcher/AgentStatusPoller.py#L70-L105

and what we need to do is update the agent configuration file with the new port (this must be done right after deploying a fresh new agent AND before starting any components). From a quick look at the code, the configuration attributes we need to change are:

config.General.centralWMStatsURL
config.WorkQueueManager.queueParams (the ParentQueueCouchUrl parameter)

then start the agent and check whether:

  • replications are still working properly (could be done by having workqueue elements coming to the agent, by seeing job information in wmstats); or with some couchdb APIs like _active_tasks
  • and whether it's using the new port

Can you check that out Todor? Please let me know if you need further details, either here or on slack.
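The port check above could also be scripted against CouchDB's /_active_tasks API. A minimal sketch, assuming the decoded JSON from that API is passed in (the helper name `endpointsOnDefaultPort` is hypothetical):

```python
def endpointsOnDefaultPort(activeTasks):
    """Return remote replication endpoints still missing the :8443 port.

    activeTasks is the decoded JSON list from CouchDB's /_active_tasks API
    (in practice fetched with credentials from the WMAgent secrets file,
    e.g. via http://USER:PASS@localhost:5984/_active_tasks).
    """
    bad = []
    for task in activeTasks:
        if task.get("type") != "replication":
            continue
        for key in ("source", "target"):
            url = task.get(key, "")
            # local http endpoints are fine; remote https ones must carry :8443
            if url.startswith("https://") and ":8443" not in url:
                bad.append(url)
    return bad


# Example with the shape of the _active_tasks output:
sample = [{"type": "replication",
           "source": "http://localhost:5984/workqueue_inbox/",
           "target": "https://cmsweb-testbed.cern.ch/couchdb/workqueue/"}]
```

An empty result would mean all remote replication endpoints already use the new port.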

@todor-ivanov
Contributor Author

todor-ivanov commented Mar 10, 2021

Thanks @amaltaro, I just redeployed a brand new agent, and it seems your suggested configuration change does the job [1]. The CouchDB client 1.6.1 now seems to be forwarded to port 8443. But just to be on the same page, can you give me some more details on what exactly you meant by:

replications are still working properly (could be done by having workqueue elements coming to the agent, by seeing job information in wmstats); or with some couchdb APIs like _active_tasks

I am now going to resubmit a full validation campaign in order to test everything.

p.s. Here is the full set of changes I have made to the agent config [2]. This also includes the changes required for temporarily forwarding all our development agents to the Rucio production servers, which is explained here: dmwm/deployment#972

[1]

1 /couchdb/workqueue/ //... | /DC=ch/DC=cern/OU=computers/CN=wmagent/vocms0192.cern.ch | "CouchDB/1.6.1" | vocms0290.cern.ch | 8443

[2]

cmst1@vocms0290:/data/srv/wmagent/current $ diff -u  config/wmagent/config.py*

-config.General.centralWMStatsURL = 'https://tivanov-unit02.cern.ch:8443/couchdb/wmstats'
+config.General.centralWMStatsURL = 'https://tivanov-unit02.cern.ch/couchdb/wmstats'

-config.WorkQueueManager.queueParams = {'QueueURL': 'http://vocms0290.cern.ch:5984', 'central_logdb_url': 'https://tivanov-unit02.cern.ch/couchdb/wmstats_logdb', 'log_reporter': 'vocms0290.cern.ch', 'ParentQueueCouchUrl': 'https://tivanov-unit02.cern.ch:8443/couchdb/workqueue', 'RequestDBURL': 'https://tivanov-unit02.cern.ch/couchdb/reqmgr_workload_cache', 'QueueDepth': 0.5, 'rucioAccount': 'wmcore_transferor', 'WorkPerCycle': 200}
+config.WorkQueueManager.queueParams = {'QueueURL': 'http://vocms0290.cern.ch:5984', 'central_logdb_url': 'https://tivanov-unit02.cern.ch/couchdb/wmstats_logdb', 'log_reporter': 'vocms0290.cern.ch', 'ParentQueueCouchUrl': 'https://tivanov-unit02.cern.ch/couchdb/workqueue', 'RequestDBURL': 'https://tivanov-unit02.cern.ch/couchdb/reqmgr_workload_cache', 'QueueDepth': 0.5, 'rucioAccount': 'wmcore_transferor', 'WorkPerCycle': 200}

-config.WorkQueueManager.rucioAuthUrl = 'https://cms-rucio-auth.cern.ch'
+config.WorkQueueManager.rucioAuthUrl = 'https://cmsrucio-auth-int.cern.ch'
-config.WorkQueueManager.rucioUrl = 'http://cms-rucio.cern.ch'
+config.WorkQueueManager.rucioUrl = 'http://cmsrucio-int.cern.ch'

@amaltaro
Contributor

If a couple of workflows manage to run on the agent, manage to move to completed, and you have the job information in those wmstats columns, including possible failures, then that will confirm that replications are working properly.

Another way to check would be by calling the couchdb APIs, such as:

curl -ks "http://COUCHUSER:COUCHPASS@localhost:5984/_active_tasks"

and scanning the content of the _replicator database

curl -ks http://localhost:5984/_replicator
curl -ks http://COUCHUSER:COUCHPASS@localhost:5984/_replicator/_all_docs?include_docs=true

that's a bit more cryptic though.

I had a quick look at your agent and it seems to be working properly, but it's better if you check what I mentioned in the first paragraph above as well.

@todor-ivanov
Contributor Author

todor-ivanov commented Mar 11, 2021

Hi @amaltaro, thanks again.

Actually, all the workflows I submitted last night are in running-closed or completed status now. I am about to check what you mentioned above as well. Here [1] is the result of the frontend log scan (I am going to paste it into the issue too, just to have a nice history).

BTW, I do see some errors in the couchdb logs when it tries to sync with wmstats [2], but I am almost sure they are not related to the port change.

[1]

+--------+---------------------------------------+--------------------------------------------+---------------------------------+------------------------+-------+
| # Req: | API:                                  | DN:                                        | Client:                         | ClientAddr:            | Port: |
+--------+---------------------------------------+--------------------------------------------+---------------------------------+------------------------+-------+
|      1 | /wmstatsserver/data/teams             | /DC=ch/DC=cern/OU=computers/CN=wmagent/... | zlib/1.2.8"                     | tiavnov-unit02.cern.ch |   443 |
|        |                                       |                                            |                                 |                        |       |
|        |                                       |                                            |                                 |                        |       |
|     23 | /couchdb/wmstats/...                  | /DC=ch/DC=cern/OU=computers/CN=wmagent/... | "CouchDB/1.6.1"                 | vocms0290.cern.ch      |  8443 |
|    491 | /couchdb/workqueue/...                | /DC=ch/DC=cern/OU=computers/CN=wmagent/... | "CouchDB/1.6.1"                 | vocms0290.cern.ch      |  8443 |
|      3 | /couchdb/_all_dbs /...                | /DC=ch/DC=cern/OU=computers/CN=wmagent/... | "WMCore.Services.Requests/v002" | vocms0290.cern.ch      |  8443 |
|      2 | /couchdb/acdcserver/...               | /DC=ch/DC=cern/OU=computers/CN=wmagent/... | "WMCore.Services.Requests/v002" | vocms0290.cern.ch      |  8443 |
|     78 | /couchdb/reqmgr_workload_cache/...    | /DC=ch/DC=cern/OU=computers/CN=wmagent/... | "WMCore.Services.Requests/v002" | vocms0290.cern.ch      |  8443 |
|    102 | /couchdb/wmstats/...                  | /DC=ch/DC=cern/OU=computers/CN=wmagent/... | "WMCore.Services.Requests/v002" | vocms0290.cern.ch      |  8443 |
|   2418 | /couchdb/wmstats_logdb/...            | /DC=ch/DC=cern/OU=computers/CN=wmagent/... | "WMCore.Services.Requests/v002" | vocms0290.cern.ch      |  8443 |
|      4 | /couchdb/workloadsummary/...          | /DC=ch/DC=cern/OU=computers/CN=wmagent/... | "WMCore.Services.Requests/v002" | vocms0290.cern.ch      |  8443 |
|    227 | /couchdb/workqueue/...                | /DC=ch/DC=cern/OU=computers/CN=wmagent/... | "WMCore.Services.Requests/v002" | vocms0290.cern.ch      |  8443 |
|     71 | /couchdb/workqueue_inbox/...          | /DC=ch/DC=cern/OU=computers/CN=wmagent/... | "WMCore.Services.Requests/v002" | vocms0290.cern.ch      |  8443 |
|   1222 | /reqmgr2/data/...                     | /DC=ch/DC=cern/OU=computers/CN=wmagent/... | "WMCore.Services.Requests/v002" | vocms0290.cern.ch      |  8443 |
|     86 | /wmstatsserver/data/filtered_requests | /DC=ch/DC=cern/OU=computers/CN=wmagent/... | "WMCore.Services.Requests/v002" | vocms0290.cern.ch      |  8443 |
|     80 | /couchdb/_all_dbs /...                | /DC=ch/DC=cern/OU=computers/CN=wmagent/... | "WMCore.Services.Requests/v002" | tiavnov-unit02.cern.ch |  8443 |
|      1 | /couchdb/acdcserver/...               | /DC=ch/DC=cern/OU=computers/CN=wmagent/... | "WMCore.Services.Requests/v002" | tiavnov-unit02.cern.ch |  8443 |
|    866 | /couchdb/reqmgr_auxiliary/...         | /DC=ch/DC=cern/OU=computers/CN=wmagent/... | "WMCore.Services.Requests/v002" | tiavnov-unit02.cern.ch |  8443 |
|  10120 | /couchdb/reqmgr_workload_cache/...    | /DC=ch/DC=cern/OU=computers/CN=wmagent/... | "WMCore.Services.Requests/v002" | tiavnov-unit02.cern.ch |  8443 |
|    105 | /couchdb/t0_request/...               | /DC=ch/DC=cern/OU=computers/CN=wmagent/... | "WMCore.Services.Requests/v002" | tiavnov-unit02.cern.ch |  8443 |
|    244 | /couchdb/wmstats/...                  | /DC=ch/DC=cern/OU=computers/CN=wmagent/... | "WMCore.Services.Requests/v002" | tiavnov-unit02.cern.ch |  8443 |
|    794 | /couchdb/wmstats_logdb/...            | /DC=ch/DC=cern/OU=computers/CN=wmagent/... | "WMCore.Services.Requests/v002" | tiavnov-unit02.cern.ch |  8443 |
|    736 | /couchdb/workqueue/...                | /DC=ch/DC=cern/OU=computers/CN=wmagent/... | "WMCore.Services.Requests/v002" | tiavnov-unit02.cern.ch |  8443 |
|    448 | /couchdb/workqueue_inbox/...          | /DC=ch/DC=cern/OU=computers/CN=wmagent/... | "WMCore.Services.Requests/v002" | tiavnov-unit02.cern.ch |  8443 |
|    474 | /reqmgr2/data/...                     | /DC=ch/DC=cern/OU=computers/CN=wmagent/... | "WMCore.Services.Requests/v002" | tiavnov-unit02.cern.ch |  8443 |
|     15 | /wmstatsserver/data/filtered_requests | /DC=ch/DC=cern/OU=computers/CN=wmagent/... | "WMCore.Services.Requests/v002" | tiavnov-unit02.cern.ch |  8443 |
|      1 | /wmstatsserver/data/globallocks       | /DC=ch/DC=cern/OU=computers/CN=wmagent/... | "WMCore.Services.Requests/v002" | tiavnov-unit02.cern.ch |  8443 |
|     66 | /reqmgr2/data/...                     | /DC=ch/DC=cern/OU=Organic Units/OU=Users/. | "-"                             | tiavnov-unit02.cern.ch |  8443 |
+--------+---------------------------------------+--------------------------------------------+---------------------------------+------------------------+-------+

[2]

[Wed, 10 Mar 2021 23:02:59 GMT] [error] [<0.502.0>] ** Generic server <0.502.0> terminating 
** Last message in was {'EXIT',<0.466.0>,
                           {checkpoint_commit_failure,
                               <<"Target database out of sync. Try to increase max_dbs_open at the target's server.">>}}
** When Server state == {state,<0.466.0>,<0.503.0>,10,
                         {httpdb,
                          "http://blah@localhost:5984/wmagent_summary/",
                          nil,
                          [{"Accept","application/json"},
                           {"User-Agent","CouchDB/1.6.1"}],
                          30000,
                          [{socket_options,[{keepalive,true},{nodelay,true}]}],
                          10,250,<0.467.0>,10},
                         {httpdb,
                          "https://tivanov-unit02.cern.ch:8443/couchdb/wmstats/",
                          nil,
                          [{"Accept","application/json"},
                           {"User-Agent","CouchDB/1.6.1"}],

@todor-ivanov
Contributor Author

todor-ivanov commented Mar 11, 2021

And here is the output from querying couchdb locally on the agent: [1] && [2]

[1]

vocms0290:/data/srv/wmagent/current $ curl -ks "http://*****:*****@localhost:5984/_active_tasks"
[{'pid': '<0.6856.0>',
  'checkpoint_interval': 600000,
  'checkpointed_source_seq': 5340,
  'continuous': True,
  'doc_id': '93a73a1f1c96b2ea6564fba7340011d1',
  'doc_write_failures': 0,
  'docs_read': 4,
  'docs_written': 4,
  'missing_revisions_found': 4,
  'progress': 99,
  'replication_id': 'aab1b78131839ba0833aaf481c53de3d+continuous',
  'revisions_checked': 212,
  'source': 'https://tivanov-unit02.cern.ch:8443/couchdb/workqueue/',
  'source_seq': 5360,
  'started_on': 1615415584,
  'target': 'http://*****:*****@localhost:5984/workqueue_inbox/',
  'type': 'replication',
  'updated_on': 1615454590},
 {'pid': '<0.6888.0>',
  'checkpoint_interval': 600000,
  'checkpointed_source_seq': 274,
  'continuous': True,
  'doc_id': '93a73a1f1c96b2ea6564fba734002179',
  'doc_write_failures': 0,
  'docs_read': 185,
  'docs_written': 185,
  'missing_revisions_found': 185,
  'progress': 99,
  'replication_id': '8a887eac7e3f09cb6d9e4659896dffd5+continuous',
  'revisions_checked': 212,
  'source': 'http://localhost:5984/workqueue_inbox/',
  'source_seq': 275,
  'started_on': 1615415585,
  'target': 'https://tivanov-unit02.cern.ch:8443/couchdb/workqueue/',
  'type': 'replication',
  'updated_on': 1615454592},
 {'pid': '<0.12309.0>',
  'checkpoint_interval': 600000,
  'checkpointed_source_seq': 7,
  'continuous': True,
  'doc_id': '93a73a1f1c96b2ea6564fba7340004a5',
  'doc_write_failures': 0,
  'docs_read': 3,
  'docs_written': 3,
  'missing_revisions_found': 3,
  'progress': 100,
  'replication_id': 'e2064deca174fde1c8556277a77f4189+continuous',
  'revisions_checked': 4,
  'source': 'http://*****:*****@localhost:5984/wmagent_summary/',
  'source_seq': 7,
  'started_on': 1615417384,
  'target': 'https://tivanov-unit02.cern.ch:8443/couchdb/wmstats/',
  'type': 'replication',
  'updated_on': 1615454586}]

[2]

vocms0290:/data/srv/wmagent/current $ curl -ks http://localhost:5984/_replicator
{'db_name': '_replicator',
 'doc_count': 4,
 'doc_del_count': 0,
 'update_seq': 13,
 'purge_seq': 0,
 'compact_running': False,
 'disk_size': 32869,
 'data_size': 3844,
 'instance_start_time': '1615411865258307',
 'disk_format_version': 6,
 'committed_update_seq': 13}

@todor-ivanov
Contributor Author

Ok, that last line left in the frontend logs, the one about a client querying port 443 which I blamed on the zlib library, was utterly misleading. Because of the parsing script I created, some info was cut from the client signature. The actual client was pycurl. Here is the full line taken directly from the frontend logs: [1]

[1]

[11/Mar/2021:10:17:36 +0100] tivanov-unit02.cern.ch 188.185.86.116 "GET /wmstatsserver/data/teams HTTP/1.1" 200 [data: 3186 in 21656 out 58 body 53234 us ] [auth: TLSv1.2 ECDHE-RSA-AES128-GCM-SHA256 "/DC=ch/DC=cern/OU=computers/CN=wmagent/..." "-" ] [ref: "-" "PycURL/7.43.0.3 libcurl/7.59.0 OpenSSL/1.0.2k zlib/1.2.8" ] 443

@todor-ivanov
Contributor Author

todor-ivanov commented Mar 11, 2021

So now, since we know that this zlib client is actually pycurl and all those calls come from pycurl_manager.RequestHandler, I pushed a little harder on the microservices part and forced MSRuleCleaner to run, and I got a confirmation [1]: whoever imports pycurl_manager.RequestHandler directly is a potential generator of calls to port 443. And here is the list of eventual suspects [2]. So we should decide whether to put the change in the pycurl_manager module only or at every single place.

At the end of the day, what we ended up with is basically the following chain of modules: WMCore.Services.Service -> WMCore.Services.Requests -> WMCore.Services.pycurl_manager.RequestHandler, which in different places of the WMCore system is imported/referred to from a different chain link. So far we have covered the first two pieces of that chain; only the last one is left.

[1]

      1 /ms-output/data/info | /DC=ch/DC=cern/OU=computers/CN=wmagent/vocms0192.cern.ch | "PycURL/7.43.0.3 zlib/1.2.8" | tivanov-unit02.cern.ch | 443
      1 /wmstatsserver/data/teams | /DC=ch/DC=cern/OU=computers/CN=wmagent/vocms0192.cern.ch | "PycURL/7.43.0.3 zlib/1.2.8" | tivanov-unit02.cern.ch | 443

[2]

------------------
src/python/WMCore/MicroService/Tools/PycurlRucio.py:
24:from WMCore.Services.pycurl_manager import RequestHandler
65:    mgr = RequestHandler()
91:    mgr = RequestHandler()
------------------
src/python/WMCore/MicroService/Unified/Common.py:
25:from WMCore.Services.pycurl_manager import RequestHandler
360:    mgr = RequestHandler()
370:    mgr = RequestHandler()
559:    mgr = RequestHandler()
571:    mgr = RequestHandler()
------------------
src/python/WMCore/MicroService/Unified/MSRuleCleaner.py:
30:from WMCore.Services.pycurl_manager import RequestHandler
90:        self.curlMgr = RequestHandler()
------------------
src/python/WMCore/MicroService/Unified/SiteInfo.py:
34:from WMCore.Services.pycurl_manager import RequestHandler
42:    mgr = RequestHandler()
139:    mgr = RequestHandler()
162:    mgr = RequestHandler()
------------------
src/python/WMCore/ReqMgr/CherryPyThreads/AuxCacheUpdateTasks.py:
9:from WMCore.Services.pycurl_manager import RequestHandler
21:        self.mgr = RequestHandler()
------------------
src/python/WMCore/ReqMgr/Web/ReqMgrService.py:
46:from WMCore.Services.pycurl_manager import RequestHandler
54:    mgr = RequestHandler()
------------------
src/python/WMCore/Services/MonIT/Grafana.py:
18:from WMCore.Services.pycurl_manager import RequestHandler
56:        mgr = RequestHandler(logger=self.logger)
------------------
src/python/WMCore/Services/Requests.py:
44:    from WMCore.Services.pycurl_manager import RequestHandler, ResponseHeader
80:            self.reqmgr = RequestHandler()
------------------

@vkuznet
Contributor

vkuznet commented Mar 11, 2021

@todor-ivanov, my preference would be for you to fix pycurl_manager.RequestHandler itself, since that will eliminate all potential issues in the future. Fixing individual parts is not only time consuming, but also leaves a door open for some client to still use pycurl_manager.RequestHandler and have its requests go to the wrong port.

And sorry for the zlib identification of the client; it is partially my fault, as I wrote the original parser and did not spend much time adjusting it to all use cases.

@todor-ivanov
Contributor Author

Thanks @vkuznet, we are on the same page here. I also think we should fix the pycurl_manager itself.

And BTW, I should take the blame for the wrong identification of the zlib -> pycurl client, because I took your initial parser and did some work on top of it, so I should have noticed that it does not account for spaces in the client name. But anyway, the good part is that now we know the exact point where the change should happen. Thanks!

@cmsdmwmbot

Jenkins results:

  • Unit tests: failed
    • 375 new failures
    • 14 tests deleted
    • 1 tests added
    • 12 changes in unstable tests
  • Pylint check: failed
    • 5 warnings and errors that must be fixed
    • 2 warnings
    • 94 comments to review
  • Pylint py3k check: succeeded
    • 0 errors and warnings that should be fixed
    • 0 warnings
    • 0 comments to review
  • Pycodestyle check: succeeded
    • 44 comments to review
  • Python3 compatibility checks: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/11476/artifact/artifacts/PullRequestReport.html

@todor-ivanov
Contributor Author

@amaltaro please take a look at the latest solution proposed here (basically the last two commits).

@cmsdmwmbot

Jenkins results:

  • Unit tests: failed
    • 375 new failures
    • 14 tests deleted
    • 1 tests added
    • 12 changes in unstable tests
  • Pylint check: failed
    • 3 warnings and errors that must be fixed
    • 2 warnings
    • 95 comments to review
  • Pylint py3k check: succeeded
    • 0 errors and warnings that should be fixed
    • 0 warnings
    • 0 comments to review
  • Pycodestyle check: succeeded
    • 45 comments to review
  • Python3 compatibility checks: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/11478/artifact/artifacts/PullRequestReport.html

@amaltaro
Contributor

Todor, I like the idea of creating a decorator for this change. However, I think your previous two-line fix is much more sustainable and clean.
I second what Valentin said: we should fix it in the pycurl module; perhaps that will cover 100% of the code, and we can even remove it from the other two places, which are actually an upper layer of pycurl. The couch calls (with CMSCouch) would have to be verified once again, though.

@todor-ivanov
Contributor Author

Hi @amaltaro, yes, the two-line solution was quite simple and did the job for Requests quite well, but there I have the url in the constructor itself, so those two lines can be applied in a single place. With pycurl_manager, on the other hand, the urls are passed by function/method call, meaning I would have to apply those same lines in many places. And unfortunately the calls inside the pycurl manager form loops, which means such a solution would accumulate changes to the url string on every consecutive loop, which is basically a show stopper.

The decorator, on the other hand, works, but it suffers from different issues, which I am trying to solve and simplify now.

I agree on having a single place of change and trying to encapsulate it only in pycurl_manager. Working on that now.
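For illustration, here is a minimal, self-contained sketch of how such a URL-mangling callable could work. The class name PortForward appears in the commit messages of this PR, but the regex, signature, and defaults below are assumptions, not the actual WMCore code:

```python
import re


class PortForward(object):
    """
    Sketch of a callable that rewrites cmsweb URLs to a fixed port.
    Hypothetical: the actual WMCore implementation may differ.
    """

    def __init__(self, port=8443):
        self.port = port
        # Match https://cmsweb<anything>.cern.ch with an optional port suffix.
        self.urlRegex = re.compile(r"(https://cmsweb[^/:]*\.cern\.ch)(:\d+)?")

    def __call__(self, url):
        # Replace (or insert) the port only on the first host match,
        # leaving the rest of the URL untouched.
        return self.urlRegex.sub(r"\1:%d" % self.port, url, count=1)
```

Because the substitution always normalizes the host to a single `:8443` suffix, applying it repeatedly is idempotent, which addresses the loop-accumulation problem mentioned above.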

@cmsdmwmbot

Jenkins results:

  • Unit tests: failed
    • 318 new failures
    • 200 tests deleted
    • 1 tests added
    • 11 changes in unstable tests
  • Pylint check: failed
    • 4 warnings and errors that must be fixed
    • 2 warnings
    • 94 comments to review
  • Pylint py3k check: succeeded
    • 0 errors and warnings that should be fixed
    • 0 warnings
    • 0 comments to review
  • Pycodestyle check: succeeded
    • 45 comments to review
  • Python3 compatibility checks: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/11481/artifact/artifacts/PullRequestReport.html

@todor-ivanov todor-ivanov force-pushed the feature_WMCore_portUsage_fix-10119 branch from e6bfaf4 to 6143cb8 Compare March 13, 2021 03:54
@cmsdmwmbot

Jenkins results:

  • Unit tests: succeeded
  • Pylint check: failed
    • 2 warnings and errors that must be fixed
    • 2 warnings
    • 94 comments to review
  • Pylint py3k check: succeeded
    • 0 errors and warnings that should be fixed
    • 0 warnings
    • 0 comments to review
  • Pycodestyle check: succeeded
    • 45 comments to review
  • Python3 compatibility checks: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/11482/artifact/artifacts/PullRequestReport.html

@todor-ivanov todor-ivanov force-pushed the feature_WMCore_portUsage_fix-10119 branch from 6143cb8 to 0adec47 Compare March 13, 2021 11:37
@cmsdmwmbot

Jenkins results:

  • Unit tests: succeeded
  • Pylint check: failed
    • 2 warnings and errors that must be fixed
    • 2 warnings
    • 93 comments to review
  • Pylint py3k check: succeeded
    • 0 errors and warnings that should be fixed
    • 0 warnings
    • 0 comments to review
  • Pycodestyle check: succeeded
    • 45 comments to review
  • Python3 compatibility checks: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/11483/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Unit tests: succeeded
    • 2 tests added
  • Pylint check: failed
    • 4 warnings and errors that must be fixed
    • 16 warnings
    • 160 comments to review
  • Pylint py3k check: succeeded
    • 0 errors and warnings that should be fixed
    • 0 warnings
    • 0 comments to review
  • Pycodestyle check: succeeded
    • 70 comments to review
  • Python3 compatibility checks: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/11514/artifact/artifacts/PullRequestReport.html

@todor-ivanov todor-ivanov force-pushed the feature_WMCore_portUsage_fix-10119 branch from 1a5880e to 311a639 Compare March 19, 2021 14:31
@todor-ivanov
Contributor Author

@amaltaro I just fixed a root logger bug which the decorator could have created, and added a change to the AgentStatusPoller component, as we discussed in a private chat, so that we avoid changing any urls at config time. I hope you will be able to take another look. Thanks!

@todor-ivanov todor-ivanov requested a review from amaltaro March 19, 2021 14:39
@cmsdmwmbot

Jenkins results:

  • Unit tests: succeeded
    • 2 tests added
  • Pylint check: failed
    • 10 warnings and errors that must be fixed
    • 19 warnings
    • 180 comments to review
  • Pylint py3k check: succeeded
    • 0 errors and warnings that should be fixed
    • 0 warnings
    • 0 comments to review
  • Pycodestyle check: succeeded
    • 71 comments to review
  • Python3 compatibility checks: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/11516/artifact/artifacts/PullRequestReport.html

@cmsdmwmbot

Jenkins results:

  • Unit tests: succeeded
    • 2 tests added
  • Pylint check: failed
    • 10 warnings and errors that must be fixed
    • 19 warnings
    • 180 comments to review
  • Pylint py3k check: succeeded
    • 0 errors and warnings that should be fixed
    • 0 warnings
    • 0 comments to review
  • Pycodestyle check: succeeded
    • 71 comments to review
  • Python3 compatibility checks: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/11515/artifact/artifacts/PullRequestReport.html

@todor-ivanov
Contributor Author

And just for the record, here is the latest result on the port usage, coming from parsing the frontend logs after injecting a full validation campaign. [1]

[1]

+--------+---------------------------------------+----------------------------------------------+---------------------------------+------------------------+-------+
| # Req: | API:                                  | DN:                                          | Client:                         | ClientAddr:            | Port: |
+--------+---------------------------------------+----------------------------------------------+---------------------------------+------------------------+-------+
|      9 | /couchdb/_all_dbs /...                | /DC=ch/DC=cern/OU=computers/CN=wmagent/...   | "WMCore.Services.Requests/v002" | vocms0290.cern.ch      |  8443 |
|     13 | /couchdb/acdcserver/...               | /DC=ch/DC=cern/OU=computers/CN=wmagent/...   | "WMCore.Services.Requests/v002" | vocms0290.cern.ch      |  8443 |
|    411 | /couchdb/reqmgr_workload_cache/...    | /DC=ch/DC=cern/OU=computers/CN=wmagent/...   | "WMCore.Services.Requests/v002" | vocms0290.cern.ch      |  8443 |
|    111 | /couchdb/wmstats/...                  | /DC=ch/DC=cern/OU=computers/CN=wmagent/...   | "CouchDB/1.6.1"                 | vocms0290.cern.ch      |  8443 |
|    303 | /couchdb/wmstats/...                  | /DC=ch/DC=cern/OU=computers/CN=wmagent/...   | "WMCore.Services.Requests/v002" | vocms0290.cern.ch      |  8443 |
|   6560 | /couchdb/wmstats_logdb/...            | /DC=ch/DC=cern/OU=computers/CN=wmagent/...   | "WMCore.Services.Requests/v002" | vocms0290.cern.ch      |  8443 |
|    111 | /couchdb/workloadsummary/...          | /DC=ch/DC=cern/OU=computers/CN=wmagent/...   | "WMCore.Services.Requests/v002" | vocms0290.cern.ch      |  8443 |
|    981 | /couchdb/workqueue/...                | /DC=ch/DC=cern/OU=computers/CN=wmagent/...   | "CouchDB/1.6.1"                 | vocms0290.cern.ch      |  8443 |
|    589 | /couchdb/workqueue/...                | /DC=ch/DC=cern/OU=computers/CN=wmagent/...   | "WMCore.Services.Requests/v002" | vocms0290.cern.ch      |  8443 |
|    179 | /couchdb/workqueue_inbox/...          | /DC=ch/DC=cern/OU=computers/CN=wmagent/...   | "WMCore.Services.Requests/v002" | vocms0290.cern.ch      |  8443 |
|   4532 | /reqmgr2/data/...                     | /DC=ch/DC=cern/OU=computers/CN=wmagent/...   | "WMCore.Services.Requests/v002" | vocms0290.cern.ch      |  8443 |
|    237 | /wmstatsserver/data/filtered_requests | /DC=ch/DC=cern/OU=computers/CN=wmagent/...   | "WMCore.Services.Requests/v002" | vocms0290.cern.ch      |  8443 |
|     90 | /couchdb/_all_dbs /...                | /DC=ch/DC=cern/OU=computers/CN=wmagent/...   | "WMCore.Services.Requests/v002" | tivanov-unit02.cern.ch |  8443 |
|     66 | /reqmgr2/data/...                     | /DC=ch/DC=cern/OU=Organic Units/OU=Users/... | "-"                             | tivanov-unit02.cern.ch |  8443 |
|      3 | /couchdb/acdcserver/...               | /DC=ch/DC=cern/OU=computers/CN=wmagent/...   | "WMCore.Services.Requests/v002" | tivanov-unit02.cern.ch |  8443 |
|   1395 | /couchdb/reqmgr_auxiliary/...         | /DC=ch/DC=cern/OU=computers/CN=wmagent/...   | "WMCore.Services.Requests/v002" | tivanov-unit02.cern.ch |  8443 |
|   5857 | /couchdb/reqmgr_workload_cache/...    | /DC=ch/DC=cern/OU=computers/CN=wmagent/...   | "WMCore.Services.Requests/v002" | tivanov-unit02.cern.ch |  8443 |
|    287 | /couchdb/t0_request/...               | /DC=ch/DC=cern/OU=computers/CN=wmagent/...   | "WMCore.Services.Requests/v002" | tivanov-unit02.cern.ch |  8443 |
|    654 | /couchdb/wmstats/...                  | /DC=ch/DC=cern/OU=computers/CN=wmagent/...   | "WMCore.Services.Requests/v002" | tivanov-unit02.cern.ch |  8443 |
|   2146 | /couchdb/wmstats_logdb/...            | /DC=ch/DC=cern/OU=computers/CN=wmagent/...   | "WMCore.Services.Requests/v002" | tivanov-unit02.cern.ch |  8443 |
|   1619 | /couchdb/workqueue/...                | /DC=ch/DC=cern/OU=computers/CN=wmagent/...   | "WMCore.Services.Requests/v002" | tivanov-unit02.cern.ch |  8443 |
|    948 | /couchdb/workqueue_inbox/...          | /DC=ch/DC=cern/OU=computers/CN=wmagent/...   | "WMCore.Services.Requests/v002" | tivanov-unit02.cern.ch |  8443 |
|      7 | /ms-output/data/info                  | /DC=ch/DC=cern/OU=computers/CN=wmagent/...   | zlib/1.2.8"                     | tivanov-unit02.cern.ch |  8443 |
|   1116 | /reqmgr2/data/...                     | /DC=ch/DC=cern/OU=computers/CN=wmagent/...   | "WMCore.Services.Requests/v002" | tivanov-unit02.cern.ch |  8443 |
|     41 | /wmstatsserver/data/filtered_requests | /DC=ch/DC=cern/OU=computers/CN=wmagent/...   | "WMCore.Services.Requests/v002" | tivanov-unit02.cern.ch |  8443 |
|      9 | /wmstatsserver/data/globallocks       | /DC=ch/DC=cern/OU=computers/CN=wmagent/...   | "WMCore.Services.Requests/v002" | tivanov-unit02.cern.ch |  8443 |
|      1 | /wmstatsserver/data/teams             | /DC=ch/DC=cern/OU=computers/CN=wmagent/...   | zlib/1.2.8"                     | tivanov-unit02.cern.ch |  8443 |
|      1 | /ms-output/data/status                | /DC=ch/DC=cern/OU=Organic Units/OU=Users...  | Safari/537.36"                  | lxplus7117.cern.ch     |  8443 |
|      1 | /ms-rulecleaner/data/status           | /DC=ch/DC=cern/OU=Organic Units/OU=Users...  | Safari/537.36"                  | lxplus7117.cern.ch     |  8443 |
|      1 | /wmstatsserver/data/filtered_requests | /DC=ch/DC=cern/OU=Organic Units/OU=Users...  | Safari/537.36"                  | lxplus7117.cern.ch     |  8443 |
|      1 | /wmstatsserver/data/globallocks       | /DC=ch/DC=cern/OU=Organic Units/OU=Users...  | Safari/537.36"                  | lxplus7117.cern.ch     |  8443 |
+--------+---------------------------------------+----------------------------------------------+---------------------------------+------------------------+-------+

@todor-ivanov
Contributor Author

And one final note, which is a must: before we merge, we need to inform the CRAB team about the change.

@belforte Stefano please take a look at the changes suggested here and raise any concern you might have. Thank you!

Contributor

@amaltaro amaltaro left a comment


It looks good to me, Todor. Can you please squash the 4th commit into the 1st or 2nd one (whichever makes most sense to you)? Thanks

@amaltaro
Contributor

@mapellidario if you feel like having a look at it today/Monday as well, just so we can catch any possible mistakes in dual-stack coding. Feel free to drop your comment even if it has been merged; the merge is planned for the next 30 min or so.

… && Remove argument parsing from the decorator && Remove static url chage from Services.

Adding the PortForward class with __call__ method && Applying PortForward in the global scope function getdata().

Take function call outside try/except for the decorator.

Typo in portMangle function name.

Import division from future.

Avoid calling logging.basicConfig against the root logger from inside the decorator.

Apply port forwarding for couchdb replication in AgentStatusPoller.
Unittests

Unittests
@todor-ivanov todor-ivanov force-pushed the feature_WMCore_portUsage_fix-10119 branch from 8b605c8 to e5b38f6 Compare March 19, 2021 15:10
@mapellidario
Member

From a first quick glance I have not seen anything worrying. Approved!

I will have another look later today, but it is unlikely that I will spot anything new

@todor-ivanov
Contributor Author

Thanks @mapellidario !

@cmsdmwmbot

Jenkins results:

  • Unit tests: succeeded
    • 2 tests added
    • 1 changes in unstable tests
  • Pylint check: failed
    • 10 warnings and errors that must be fixed
    • 19 warnings
    • 180 comments to review
  • Pylint py3k check: succeeded
    • 0 errors and warnings that should be fixed
    • 0 warnings
    • 0 comments to review
  • Pycodestyle check: succeeded
    • 71 comments to review
  • Python3 compatibility checks: succeeded

Details at https://cmssdt.cern.ch/dmwm-jenkins/view/All/job/DMWM-WMCore-PR-test/11517/artifact/artifacts/PullRequestReport.html

@belforte
Member

all CRAB traffic is using port 8443 already.
That said, I will look at this ASAP and let you know if I fear disruption.
For the future: a review of the design before implementation may be more effective.

@amaltaro
Contributor

@belforte Stefano, please do let us know if you have any concerns.
From my side, I'm curious to see whether it can affect any expected/trapped exceptions upstream, but at this stage, the only thing we can do is to keep running tests for a longer period of time.

I'm merging it now such that we can build services on top of it.

@amaltaro amaltaro merged commit a75b590 into dmwm:master Mar 19, 2021
@vkuznet
Contributor

vkuznet commented Mar 19, 2021

that's really good news. Can you tell us when we can expect the upgrade on all DMWM services? Do you need time to make a new release and upgrade all services?

@belforte
Member

Hi all. Well... it is not written explicitly anywhere, but IIUC pycurl_manager.py will force the use of port 8443 whenever there is a URL which starts with https://cmsweb, changing a different port number if present, and more or less leaving it as is in case port 8443 is there already.
That will be no problem for CRAB, which already passes URLs like https://cmsweb....cern.ch:8443 to pycurl_manager.

I would surely not have written such complex code for this, but maybe you have so many moving pieces that you have something with hostname cmsweb* which needs the default port number 443. If I were you, in case pycurl_manager is passed a port number other than 443, I would have respected the wisdom of the client and left it as it was, but... again, maybe there are complexities here which I can't fathom.

Indeed one day, when we align to WMCore HEAD again and have spare time, we could remove the code which I put in CRAB a month ago! A toast to clean code!
Cheers

@vkuznet
Contributor

vkuznet commented Mar 19, 2021

Stefano, it is a very good point and I encourage WMCore to take it seriously. I didn't inspect the code, but I would agree that we should respect clients if they use an explicit port.

@todor-ivanov
Contributor Author

Thanks @belforte @vkuznet, that is a good point. Note taken. The way the code is designed right now, it is quite flexible, and such a change is for sure possible, and rather easy I would say. I am creating a separate issue for that now.
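For the record, the behavior suggested above could be sketched roughly as follows. This is a hypothetical helper, not code from this PR: rewrite only when the port is absent or is the default 443, and respect any other explicitly requested port:

```python
import re

# Hypothetical sketch of the suggested behavior: forward a cmsweb URL to
# port 8443 only when no port, or the default 443, was given; leave any
# other explicit port exactly as the client requested it.
CMSWEB_RE = re.compile(r"(https://cmsweb[^/:]*\.cern\.ch)(:\d+)?")


def forwardPort(url, port=8443):
    match = CMSWEB_RE.match(url)
    if not match:
        return url  # not a cmsweb URL, leave untouched
    explicit = match.group(2)
    if explicit and explicit not in (":443", ":%d" % port):
        return url  # respect the client's explicit choice of port
    return CMSWEB_RE.sub(r"\1:%d" % port, url, count=1)
```

The only design change relative to an unconditional rewrite is the early return on a non-default explicit port, so CRAB-style clients that already send `:8443` (or any custom port) keep full control.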


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Investigate port usage within WMCore services

6 participants