Notes: idpy dev call, 6 August 2019
by hlflanagan@sphericalcowgroup.com
Attending:
Scott Koranda, Heather Flanagan, Leif Johansson, Giuseppe de Marco, Johan Lundberg, John Paraskevopoulos, Alex Stuard, Hannah Sebuliba
Regrets:
Ivan, Rainer
Virtual IdP front end to Satosa - this can expose multiple virtual IdPs through the Satosa front end. Various options can be configured for each IdP, including its name, the scope it wants to use, etc. That configuration belongs to the front end. Scott also wants some microservices that operate on the assertions as they pass through the system, and those microservices should have the same access to that configuration (e.g., so they can see the scope). Waiting on a decision from Ivan on how to implement this.
Can you run a single Satosa instance with multiple front and back ends? For example, one SAML back end that would authenticate against eduGAIN, another back end that would authenticate against ORCID, and front ends that would work with either SAML or OIDC.
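As a rough illustration of the setup being asked about, the sketch below mirrors the top-level layout of a Satosa proxy_conf.yaml (BACKEND_MODULES, FRONTEND_MODULES, MICRO_SERVICES) as a Python dict. The base URL and every plugin file path are hypothetical placeholders, and the helper function is only there to show which front end / back end pairings such a proxy would have to route between.

```python
# Mirrors the top-level layout of a Satosa proxy_conf.yaml as a Python dict.
# The base URL and all plugin file paths are hypothetical placeholders.
proxy_conf = {
    "BASE": "https://proxy.example.org",
    "BACKEND_MODULES": [
        "plugins/backends/saml2_edugain.yaml",  # SAML back end facing eduGAIN
        "plugins/backends/orcid.yaml",          # back end facing ORCID
    ],
    "FRONTEND_MODULES": [
        "plugins/frontends/saml2_frontend.yaml",  # SAML front end
        "plugins/frontends/oidc_frontend.yaml",   # OIDC front end
    ],
    "MICRO_SERVICES": [
        "plugins/microservices/custom_routing.yaml",  # routing between back ends
    ],
}


def pairings(conf):
    """Return every (front end, back end) combination the proxy must route."""
    return [
        (frontend, backend)
        for frontend in conf["FRONTEND_MODULES"]
        for backend in conf["BACKEND_MODULES"]
    ]


pairs = pairings(proxy_conf)
```

With two front ends and two back ends there are four pairings, which is why some routing decision is needed for each incoming request.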
Question: has anyone set up the OIDC front end and had it work with mod_auth_openidc? mod_auth_openidc is complaining. Giuseppe is planning on doing this in the next month or so.
The OIDC front end won’t automatically work with multiple back ends (it cannot select between them); a custom routing service is needed. Does anyone have such a routing service available? Giuseppe wrote one; it can be found in the Satosa PRs. It intercepts the call and uses a map of the entity IDs that need this routing.
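The routing idea described here (intercept the call, consult a map, pick a back end) can be sketched as follows. This is not Giuseppe's implementation; the class name, back end names, and the choice of the requester's identifier as the map key are all assumptions made for illustration. The `process(context, data)` shape and the `target_backend` attribute are modelled on how Satosa request microservices are generally written, with simple stand-in objects in place of the real Satosa classes.

```python
from types import SimpleNamespace


class RoutingSketch:
    """Hypothetical routing step: pick a back end from a requester map."""

    def __init__(self, routes, default_backend):
        self.routes = routes                    # requester ID -> back end name
        self.default_backend = default_backend  # used when no entry matches

    def process(self, context, data):
        # Intercept the request and record which back end should handle it.
        context.target_backend = self.routes.get(
            data.requester, self.default_backend
        )
        return context, data


# Hypothetical requester identifiers and back end names.
router = RoutingSketch(
    routes={"https://sp.example.org/orcid-only": "orcid"},
    default_backend="saml2_edugain",
)

# Stand-ins for the context and request objects the proxy would supply.
ctx_a, _ = router.process(
    SimpleNamespace(target_backend=None),
    SimpleNamespace(requester="https://sp.example.org/orcid-only"),
)
ctx_b, _ = router.process(
    SimpleNamespace(target_backend=None),
    SimpleNamespace(requester="https://sp.example.org/other"),
)
```

Only the requesters that need special handling appear in the map; everything else falls through to the default back end.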
Update on pyFF
Current release = 1.1.1; there are some bug fixes that need to go into 1.1.2 asap.
Code is stabilizing, but Leif is not sure he’d bet on 1.1.2 being stable.
2.0 will start with Leif removing the front end bits; he will provide a bash script to help people who are used to calling pyffd. There will still be a wsgi app (and it will be the main entry point).
Hannah has been working on memory usage and is looking for memory leaks. Scott thinks that when pyFF is running as a server, it should never create a really large DOM; in particular, it should avoid ever reading the eduGAIN feed as a single DOM object, because that creates a huge, unnecessary memory request. It also has to avoid building lists of many things: even if you don’t load the whole DOM, a list of small DOMs that is held in memory before being given to the backend store still consumes a lot of memory. Scott suggests the architecture needs to shift from parsing large chunks of metadata to parsing small chunks, handing them off to the backend, and then garbage collecting. Leif points out that as soon as you’re dealing with signed metadata, you have to handle all of it at once. Could try to do something by making the pipeline smaller.
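A minimal sketch of the "parse small chunks, hand them off, then release them" approach, using the standard library's incremental XML parser rather than anything in pyFF itself. The function name and the dict used as a stand-in backend store are hypothetical. The sketch assumes a flat aggregate (EntityDescriptor elements directly under the root) and deliberately ignores signature validation, which, as Leif notes, needs the whole document.

```python
import io
import xml.etree.ElementTree as ET

MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"
ENTITY_TAG = f"{{{MD_NS}}}EntityDescriptor"


def iter_entities(source):
    """Yield (entityID, serialized XML) for one EntityDescriptor at a time."""
    events = ET.iterparse(source, events=("start", "end"))
    _, root = next(events)  # the first event is the start of the root element
    for event, elem in events:
        if event == "end" and elem.tag == ENTITY_TAG:
            yield elem.get("entityID"), ET.tostring(elem, encoding="unicode")
            # Drop the children parsed so far so the tree never grows large.
            root.clear()


# Tiny stand-in for a metadata aggregate.
sample = b"""<md:EntitiesDescriptor xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata">
  <md:EntityDescriptor entityID="https://idp1.example.org/idp"/>
  <md:EntityDescriptor entityID="https://idp2.example.org/idp"/>
</md:EntitiesDescriptor>"""

store = {}  # hypothetical backend store
for entity_id, xml_text in iter_entities(io.BytesIO(sample)):
    store[entity_id] = xml_text  # hand off immediately, keep no intermediate list
```

Because the function is a generator, no list of small DOMs is built up before the hand-off; each entity is stored and then released before the next one is parsed.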
One suggestion: switch to the Redis backend, which could work in some use cases, but not for the full aggregate.
Could do an offline fetch as another way to control size.
One goal is to keep pyFF from needing a server with more than 4 GB of memory. That is not likely to keep working as eduGAIN gets larger.
In eduTEAMS, pyFF does take up the largest memory footprint.
Could start pyFF, ingest all you need, use mirror MDQ to produce an offline copy, then shut down the pyFF service until you need to reingest. The offline MDQ could be used for discovery for as long as the signature is valid. Can use the thiss.io MDQ (thiss-mdq <https://github.com/TheIdentitySelector/thiss-mdq>) for a miniature search function.
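For context on how a client would query such an offline MDQ copy, here is a small sketch of building a per-entity lookup URL. It follows the Metadata Query Protocol convention of addressing an entity under /entities/ either by its URL-encoded entityID or by "{sha1}" followed by the hex SHA-1 of the entityID. The base URL and the function name are hypothetical, and the exact paths a given MDQ server accepts should be checked against that server.

```python
import hashlib
from urllib.parse import quote


def mdq_entity_url(base_url, entity_id, hashed=True):
    """Build an MDQ lookup URL for a single entity."""
    if hashed:
        # "{sha1}" prefix plus the hex SHA-1 digest of the entityID.
        digest = hashlib.sha1(entity_id.encode("utf-8")).hexdigest()
        identifier = "{sha1}" + digest
    else:
        identifier = entity_id
    # Percent-encode everything, including "/" and ":" and the braces.
    return base_url.rstrip("/") + "/entities/" + quote(identifier, safe="")


# Hypothetical MDQ base URL and entityID.
hashed_url = mdq_entity_url("https://mdq.example.org/", "https://idp.example.org/idp")
plain_url = mdq_entity_url(
    "https://mdq.example.org/", "https://idp.example.org/idp", hashed=False
)
```

Since each entity has a fixed URL, the responses can be generated once during ingest and then served as static files while the pyFF service itself is shut down.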
Giuseppe uses pyFF with a scheduler.
Another alternative is to use the default discovery service being put together by SeamlessAccess.org <http://seamlessaccess.org/> (based on RA21)
Would be interesting to compare Whoosh+Redis to a JSON-only index store. Action item for Hannah.
How to exclude entityIDs from pyFF? It works as expected up to 0.9.3. A filter can be used, but the previous approach of fork, merge, remove does not work. (The latter impacts the current working document, and should not actually work.) Suggest looking at load-cleanup - there’s a way to run a pipeline early on, before you update the backend store, and that might be it.