Forward all machine generated traffic to port 8443 for cmsweb services. #10330
Conversation
|
Jenkins results:
|
|
Alan @amaltaro, please take a look at the PR explanation and the proposed change, even though it says |
|
@todor-ivanov can you please share 2 or 3 of those urls that you still see going through port 443? They are truncated in the initial description. I might be wrong, but I think the |
|
Hi @amaltaro Here [1] are a few of them. Thanks for taking a look. Yes indeed [1] |
|
I failed to see which parameter we could use to change the SSL port; the configuration documentation is here: About your list, I think the following URLs are under our control and it should be possible to change the port, such as: and perhaps the direct access to couchdb documents, e.g.: but for this one I'm not sure whether it is triggered by the couchdb replication or by one of our WMCore clients. |
|
Sorry to inform you @amaltaro but your comment did not help too much. I simply cannot understand the whole construction of CouchDB here. More precisely, this whole picture [1] must have been set somewhere, together with the replication parameters for each one of those databases. And the documentation, beyond this picture and a brief comment on the purpose each database serves, is very scarce. I have been banging my head against this wall for two days now with exactly 0 success. I found the place where the ssl port is configured in our machines - after deployment it is in [2] but it is actually not changed at all (this is actually the default coming with the service mainstream config, which is And I actually do not think this is related to the ssl authentication itself. These IMHO are standard database replications and calls, meaning this is actually the place and port to which the actual database query is sent (as we can see from the URIs for the APIs I find in the logs). [1] [2] [3] [4] |
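To make the replication setup discussed above more concrete, here is an illustrative sketch of the kind of replication document CouchDB 1.6.x keeps in its `_replicator` database. The database name and host below are made up; the point is that the source/target URLs (and therefore the port) are embedded directly in these documents, which is why the port shows up in the replication traffic.

```python
# Illustrative only: a CouchDB 1.6.x style replication document.
# Database name and target host are assumptions, not the real agent setup.
import json

replicationDoc = {
    "source": "wmagent_summary",
    "target": "https://cmsweb-testbed.cern.ch:8443/couchdb/wmstats",
    "continuous": True,
}
print(json.dumps(replicationDoc, indent=2, sort_keys=True))
```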
|
Todor, I'd suggest splitting our next actions here (actually yours) in two:
Talking about step 1): from your log from 2 days ago, there are still a few calls/APIs going through port 443. Those are things that should be under our control and I think we can change the port they use. Then talking about step 2): I didn't even remember that documentation on the CouchDB databases, thanks for finding and sharing it. Talking about the agents, there is another couchdb configuration file that overrides whatever is provided in the default.ini that you pointed out; it sits under the current directory then But as you correctly mentioned, the couchdb port is unrelated to the SSL port that we will use to contact the frontend. I have the feeling that we will not be able to change the behavior/urls/port used within the CouchDB replication. If we can't change that, then we need to communicate it to Valentin and see what the impact would be if we need to remain with the standard 443 port for CouchDB calls. Can you please start with point 1) and try to move those away from port 443, such that we end up with only couchdb traffic going through 443? |
|
@amaltaro Those two APIs that you mention in 1), from the log message I sent you in a hurry yesterday, are from client: p.s. It is worth mentioning though that I found another client trying to connect to port 443, but I think it is triggered by human interactions with the Wmstats User Interface. And it is coming from the central services instance (tivanov-unit02.cern.ch) with destination the central instance (tivanov-unit02.cern.ch), but the client is again a third party client/library: [1] |
|
Perfect! Then this PR is basically ready to go, but before that, I think we can give the couchdb replication a last try. What we need to do is update the agent configuration file with the new port (this must be done right after deploying a fresh new agent AND before starting any components). From a quick look at the code, it looks like the configuration attributes that we need to change are: then start the agent and check whether:
Can you check that out Todor? Please let me know if you need further details, either here or on slack. |
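A minimal sketch of the URL edit implied above: a small helper that appends `:8443` to any `cmsweb*.cern.ch` host that has no explicit port, which is the kind of rewrite one would apply to the replication-related URLs in the agent config. The function name and the exact regex are assumptions for illustration, not the actual WMCore code.

```python
# Hypothetical helper for rewriting agent-config URLs to port 8443.
# The name and pattern are illustrative assumptions.
import re

def forwardPort(url, port=8443):
    """Append :<port> to a cmsweb*.cern.ch host that carries no explicit port."""
    return re.sub(r'(https://cmsweb[^/:]*\.cern\.ch)(?!:\d)',
                  r'\1:%d' % port, url)

# e.g. applied to a hypothetical config value before starting any component:
print(forwardPort("https://cmsweb-testbed.cern.ch/couchdb/wmstats"))
```

Note that a URL which already carries an explicit port is left untouched by the negative lookahead.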
|
Thanks @amaltaro I just redeployed a brand new agent and it seems like your suggested configuration change is doing the job [1]. The
I am now going to resubmit a full validation campaign in order to test everything. p.s. Here is the full set of changes I have made to the agent config [2]. This also includes the changes required for temporarily forwarding all our development agents to the Rucio production servers, which is explained here: dmwm/deployment#972 [1] [2] |
|
If a couple of workflows manage to run on the agent, move to completed, and you have the job information in those wmstats columns (including possible failures), then that will confirm that replications are working properly. Another way to check would be by calling the couchdb APIs, such as: and scanning the content of the that's a bit more cryptic though. I had a quick look at your agent and it seems to be working properly, but it's better if you check what I mentioned in the first paragraph above as well. |
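The "more cryptic" check above can be sketched as follows: query CouchDB's `/_active_tasks` endpoint on the agent and list the source/target of every running replication, confirming the targets point at the 8443 endpoints. The sample payload below is illustrative, not real agent output.

```python
# Sketch: extract replication source/target pairs from a CouchDB
# /_active_tasks response. The sample JSON is made up for illustration.
import json

def replicationTargets(activeTasksJson):
    """Return (source, target) pairs for every replication task."""
    tasks = json.loads(activeTasksJson)
    return [(t.get("source"), t.get("target"))
            for t in tasks if t.get("type") == "replication"]

sample = json.dumps([
    {"type": "replication",
     "source": "wmagent_summary",
     "target": "https://cmsweb-testbed.cern.ch:8443/couchdb/wmstats"},
    {"type": "indexer", "database": "wmagent_jobdump"},
])
for src, tgt in replicationTargets(sample):
    print(src, "->", tgt)
```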
|
Hi @amaltaro Thanks again. Actually, all the workflows I submitted last night are in BTW, I do see some errors in the couchdb logs when it tries to sync with [1] [2] |
|
And here is the output from querying couchdb locally at the agent. [1] [2] |
|
Ok, that last line left in the frontend logs was about a client querying port 443, and I blamed just the [1] |
|
So now, since we already know that this At the end of the day, what we ended up with is basically the following chain of modules [1] [2] |
|
@todor-ivanov, my preference would be for you to fix And, sorry for |
|
Thanks @vkuznet, we are on the same page here. I also think we should fix the And BTW, I should take the blame for the wrong identification of the |
|
|
|
@amaltaro please take a look at the latest solution proposed here (basically the last two commits). |
|
|
|
Todor, I like the idea of creating a decorator for this change. However, I think your previous 2 line fix is much more sustainable and clean. |
|
Hi @amaltaro, Yes, the two line solution was quite simple and was doing the job for The decorator, on the other hand, works, but it suffers from different issues, which I am trying to solve now and simplify. I agree on the single place of change and trying to encapsulate the change only in the |
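For context, the decorator idea under discussion can be sketched roughly like this: a `PortForward` class whose `__call__` wraps a function and rewrites any cmsweb URL among its arguments to carry port 8443 (the commit messages in this PR mention such a class). Names, signatures, and the call convention below are assumptions for illustration, not the actual WMCore implementation.

```python
# Rough sketch of a PortForward decorator class, assuming string URL
# arguments; this is NOT the real WMCore code.
import re
from functools import wraps

class PortForward(object):
    def __init__(self, port=8443):
        # match a cmsweb*.cern.ch host not already followed by a port
        self.pattern = re.compile(r'(https://cmsweb[^/:]*\.cern\.ch)(?=[/?]|$)')
        self.repl = r'\1:%d' % port

    def _mangle(self, value):
        """Rewrite string arguments; pass everything else through."""
        if isinstance(value, str):
            return self.pattern.sub(self.repl, value)
        return value

    def __call__(self, func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            args = tuple(self._mangle(a) for a in args)
            kwargs = {k: self._mangle(v) for k, v in kwargs.items()}
            return func(*args, **kwargs)
        return wrapper

@PortForward(8443)
def getdata(url):
    # stand-in for a real fetch function; just echoes the (mangled) url
    return url

print(getdata("https://cmsweb.cern.ch/couchdb/_all_dbs"))
```

One issue such a decorator has to avoid, mentioned later in this thread, is touching global logging state (e.g. calling `logging.basicConfig` against the root logger) from inside the wrapper.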
|
|
Force-pushed e6bfaf4 to 6143cb8.
|
|
Force-pushed 6143cb8 to 0adec47.
|
|
|
|
Force-pushed 1a5880e to 311a639.
|
@amaltaro I just fixed a root logger bug which the decorator could have introduced, and added a change to the |
|
|
|
|
|
And just for the record, here is the latest result about the port usage, coming from parsing the frontend logs after injecting a full validation campaign. [1] [1] |
|
And one final note, which is a must: @belforte Stefano, please take a look at the changes suggested here and raise any concern you might have. Thank you! |
amaltaro
left a comment
It looks good to me, Todor. Can you please squash the 4th commit into the 1st or 2nd one (whichever makes most sense to you)? Thanks
|
@mapellidario if you feel like having a look at it today/Monday as well, just so we can catch any possible mistakes in dual-stack coding. Feel free to drop your comment even after it has been merged; merging is planned for the next 30 min or so. |
… (squashed commit messages:)
- Remove argument parsing from the decorator
- Remove static url change from Services. Adding the PortForward class with __call__ method
- Applying PortForward in the global scope function getdata(). Take function call outside try/except for the decorator. Typo in portMangle function name. Import division from future. Avoid calling logging.basicConfig against the root logger from inside the decorator. Apply port forwarding for couchdb replication in AgentStatusPoller.
Force-pushed 8b605c8 to e5b38f6.
|
From a first quick glance I have not seen anything worrying. Approved! I will have another look later today, but it is unlikely that I will spot anything new |
|
Thanks @mapellidario ! |
|
|
|
all CRAB traffic is using port 8443 already. |
|
@belforte Stefano, please do let us know if you have any concerns. I'm merging it now such that we can build services on top of it. |
|
that's really good news. Can you tell us when we expect the upgrade on all DMWM services? Do you need time to make a new release and upgrade all services? |
|
Hi all. Well.. it is not written explicitly anywhere, but IIUC pycurl_manager.py will force use of port 8443 whenever there's a URL which starts with I would surely not have written such complex code for this, but maybe you have so many moving pieces that you have something with hostname cmsweb* which needs the default port number 443. If I were you, in case pycurl_manager is passed a port number other than 443, I would have respected the wisdom of the client and left it as it was, but... again, maybe there are complexities here which I can't fathom. Indeed one day, when we align to WMCore HEAD again and have spare time, we could remove the code which I put in CRAB a month ago! A toast to clean code! |
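Stefano's suggestion above can be sketched as a small function: only force port 8443 when the caller did NOT specify a port, and leave any explicit port untouched. This is a hypothetical illustration of the policy, not the actual pycurl_manager code; the function name and the `cmsweb` hostname check are assumptions.

```python
# Minimal sketch of "respect the wisdom of the client": force 8443 only
# when no explicit port was given. Hypothetical, not WMCore's pycurl_manager.
from urllib.parse import urlparse, urlunparse

def preferClientPort(url, forced=8443):
    parsed = urlparse(url)
    if parsed.port is not None:
        return url  # the client chose a port explicitly: keep it
    if parsed.hostname and parsed.hostname.startswith("cmsweb"):
        netloc = "%s:%d" % (parsed.hostname, forced)
        return urlunparse(parsed._replace(netloc=netloc))
    return url  # non-cmsweb hosts are never touched

print(preferClientPort("https://cmsweb.cern.ch/path"))       # forced to 8443
print(preferClientPort("https://cmsweb.cern.ch:1443/path"))  # left as-is
```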
|
Stefano, it is a very good point and I encourage WMCore to take it seriously. I didn't inspect the code, but I would agree that we should honor clients if they use an explicit port.
Fixes #10119
Status
needs some more testing
Description
In order to finalize the forwarding of the whole machine (NOT human) generated traffic with destination
https://cmsweb.*.cern.ch to port 8443, we made an extensive investigation of the pieces of code and APIs left still querying the standard 443 port, which is now reserved for human generated traffic only. The results are commented in the issue itself, but here are the two major changes to WMCore/WMAgent related services that need to be introduced with the current PR: for JSONRequests classes used directly, not through WMCore.Services.Service, we need to add a check and url mangling logic so that we can forward all traffic with destination cmsweb to https://cmsweb.*.cern.ch:8443; the remaining calls are identified as "CouchDB/1.6.1". These are the following two APIs: They are coming from the following module
WMCore.Database.CMSCouch and whoever imports it. This is basically related to the logic used for automated database replication between the agents and the central services.
Is it backward compatible (if not, which system it affects?)
YES
Related PRs
The relevant change to the top level class
WMCore.Services.Service is here: #8726
External dependencies / deployment changes
IMPORTANT
It needs a change in the WMAgent secrets file. The change should look like [1], and will take effect after agent deployment. It should result in the following configuration changes in the agent config itself [2].
[1]
For private deployment please do swap your central services instance with
https://cmsweb-testbed.cern.ch [2]
For private deployment please do swap your central services instance with
https://cmsweb-testbed.cern.ch
NOTE:
Because we implemented this directly in the code, by changing the file
src/python/WMComponent/AgentStatusWatcher/AgentStatusPoller.py and forwarding all the urls related to the couchdb replication process, the above change may be unneeded. Keeping it here just for historical reasons and to have a documented alternative.