August 2019 - Idpy-discuss - lists3.sunet.se

Agenda: idpy Developers call, 16 May 2019

by hlflanagan＠sphericalcowgroup.com

Date: Tuesday, 16 May 2019
Time: 06:00 PT | 09:00 ET | 15:00 GMT
https://bluejeans.com/163562895

Agenda:

0. Agenda bash
1. Governance update
   a. Commons Conservancy
   b. Note Well and licensing by repository vs license in each file
2. Code releases - speed vs stability?
   a. Resources (do we have enough if we use them well? If not, what are we asking the idpy Board for?)
3. GitHub review
   a. pySAML2 - https://github.com/IdentityPython/pysaml2
   b. Satosa - https://github.com/IdentityPython/SATOSA
   c. pyFF - https://github.com/IdentityPython/pyFF
   d. ...
4. AOB

Thanks! Heather

pyFF: Pipes for additional output formats

by rainer＠hoerbe.at

Hi Leif,

I added 2 pipes to builtins.py:

- publish_html: creates static HTML views of IDPs and SPs, using XSLT based on Peter Schober's alternative to MET;
- publish_split: similar to store, but adds validUntil and creates a signed XML file per EntityDescriptor. This can be consumed dynamically by ADFS in an IDP role.

I put them directly into builtins.py because they share some code with the sign pipe. Is this viable from your PoV? If yes, I would make a PR.

Cheers, Rainer
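To make the publish_split idea above concrete, here is a minimal sketch of the splitting step only, using the Python standard library. It is not the code of the pipe itself: the function name `split_entities`, the SHA-1 based file names and the seven-day validity in the usage note are assumptions made for illustration, and signing is left out entirely.

```python
import hashlib
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"
ET.register_namespace("md", MD_NS)


def split_entities(aggregate_xml, valid_for, now=None):
    """Return {file name: XML bytes} with one EntityDescriptor per entry.

    Hypothetical helper: each entity gets a validUntil attribute and a
    file name derived from the SHA-1 of its entityID. Signing is omitted.
    """
    now = now or datetime.now(timezone.utc)
    valid_until = (now + valid_for).strftime("%Y-%m-%dT%H:%M:%SZ")
    root = ET.fromstring(aggregate_xml)
    result = {}
    for entity in root.iter(f"{{{MD_NS}}}EntityDescriptor"):
        entity.set("validUntil", valid_until)
        digest = hashlib.sha1(entity.get("entityID").encode("utf-8")).hexdigest()
        result[digest + ".xml"] = ET.tostring(entity, encoding="utf-8")
    return result
```

Called as `split_entities(aggregate, timedelta(days=7))`, this yields one document per entity that a signing step could then process and write to disk.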

Project licensing within IdentityPython

by ivan.kanak＠gmail.com

Hi all,

being part of Commons Conservancy brought up yet another subject, which is whether we should add a header with license information in every file in the projects under idpy. This is not something done in an abstract way; there is a specific format modelling this information (see https://spdx.org/ and https://reuse.software/ - more specifically https://reuse.software/practices/2.0/).

Still, I find it problematic. We want to open up the question to the wider community and consider their thoughts on this. The forwarded message below discusses this subject. You can see the question we posed, the answer we got, and my comments. Feel free to tell us what you think.

---------- Forwarded message ---------
Date: Thu, 16 May 2019 at 09:56

> ---------- Forwarded message ----------
> Date: May 8, 2019, 8:15 AM -0700
>
> > Why does CC think having a single license file per project is
> > insufficient? Our thought is that if we can avoid adding a header to
> > every single file, that would be nice, esp. given we already have this
> > info in the license file and we have the Note Well.
>
> this is not just our opinion, but something that is an industry and
> community standard for legal compliance these days. When companies like
> Siemens, Samsung or Honeywell use some code in one of the hundreds or
> thousands of devices and systems in their product line, they need to be
> able to provide the correct license and a download of the exact version.
> This means machine readability too.

I've actually observed the opposite of that. Communities abandon the "license in every file" model, and just use a single LICENSE file in the root of the project. The LICENSE file contains license information; that is, it is not a single license but it has exception sections and so on.

> To quote from https://reuse.software/practices/2.0/ :
>
> Scroll to the section "2. Include a copyright notice and license in each
> file"...
>
> "Source code files are often reused across multiple projects, taken from
> their origin and repurposed, or otherwise end up in repositories where
> they are separate from its origin. You should therefore ensure that all
> files in your project have a comment header that convey that file’s
> copyright and license information: Who are the copyright holders and
> under which license(s) do they release the file?

Continuing from above, the standardization of package-management formats and tools has helped exactly with that: to avoid distribution of single files, and instead provide packages and modules. It is bad practice and considered a hack to copy files. Nobody liked that model and everyone is moving away; it is unstructured, it becomes unmanageable and it will cause problems.

> It is highly recommended that you keep the format of these headers
> consistent across your files. It is important, however, that you do not
> remove any information from headers in files of which you are not the
> sole author.
>
> You must convey the license information of your source code file in a
> standardised way, so that computers can interpret it. You can do this
> with an SPDX-License-Identifier tag followed by an SPDX expression
> defined by the SPDX specifications."
>
> (the text goes on for a while after this, to clarify the point but this
> is the basic gist of it)
>
> There is a nice Python tool to check:
>
> https://github.com/fsfe/reuse-tool
>
> I hope this makes sense

Well, it does not make complete sense. We're talking about licensing a project. A project is not just code; there are data files (html, xml, yaml, json files), binary files (archives/zip, images, audio, video, etc), text files (configs, ini-files, etc), all "not-code". How do you mark those files? Does the LICENSE file need a license header? The json format does not define comments; how do you add a header there? If a binary file does not get a license header, why should a file with code get one?

I would expect there to be a way to have the needed information unified. If the files themselves cannot provide this information it has to be external; thus the LICENSE file.

If someone is worried about somebody else re-using single files that do not have license information (a python file, a png image, etc) there is really nothing you can do (the DRM industry has been trying to solve this for a long time, and still your best bet is "social DRM"). Since we're developing open source with a permissive license, even if someone does that, should we be happy that someone is actually using what we built, or sad that the files they copied did not have a license header? And if they include the license information of that copied file in their project's LICENSE file, is this solved?

Having pointed out these contradictions, I am thinking that the "license in every file" model seems to be a step backwards. It introduces overhead and does not really solve the problem, while at the same time it enables a culture of bad practice (copying files around).

Cheers,
--
Ivan c00kiemon5ter Kanakarakis >:3
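For readers who have not seen the tag being debated, the following is a minimal sketch of what a machine-readable check for an SPDX-License-Identifier header could look like. It is not how reuse-tool works internally; the function names, the 20-line window, the sample file names and the Apache-2.0 expression are all assumptions chosen for illustration.

```python
import re

# An SPDX tag looks like: "# SPDX-License-Identifier: Apache-2.0"
SPDX_TAG = re.compile(r"SPDX-License-Identifier:[ \t]*([^\r\n]+)")


def spdx_expression(text, max_lines=20):
    """Return the SPDX expression found in the first lines of text, or None."""
    head = "\n".join(text.splitlines()[:max_lines])
    match = SPDX_TAG.search(head)
    return match.group(1).strip() if match else None


def missing_headers(files):
    """Given {file name: text}, list the names that carry no SPDX tag."""
    return sorted(name for name, text in files.items()
                  if spdx_expression(text) is None)


# Hypothetical project contents: one Python file with a header and one
# JSON file, which has no comment syntax to hold such a header.
files = {
    "saml.py": "# SPDX-License-Identifier: Apache-2.0\nimport os\n",
    "config.json": '{"debug": true}\n',
}
print(missing_headers(files))
```

The JSON entry is reported as missing a header, which is the situation raised in the message above: formats without comments need the license information to live somewhere outside the file.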

Notes: idpy dev call, 20 August 2019

by hlflanagan＠sphericalcowgroup.com

Attendees: Ivan, Heather, Giuseppe, Christos, Hannah, Roland, John P.
Regrets: Rainer, Scott

Agenda:

0. Agenda bash
1. GitHub review
   a. pySAML2 - https://github.com/IdentityPython/pysaml2

pyFF, RedisWhooshStore, gunicorn, APScheduler

by skoranda＠gmail.com

Hi,

I want to run pyFF using the RedisWhooshStore to leverage the superior indexing that Whoosh provides to support API search.

The example API runner script at https://github.com/IdentityPython/pyFF/blob/master/scripts/run-pyff-api.sh uses this command:

    gunicorn \
        --preload \
        --log-config logger.ini \
        --bind 0.0.0.0:8000 \
        -t 600 \
        -e PYFF_PIPELINE=$pipeline \
        -e PYFF_STORE_CLASS=pyff.store:RedisWhooshStore \
        -e PYFF_UPDATE_FREQUENCY=300 \
        --threads 4 \
        --worker-tmp-dir=/dev/shm \
        --worker-class=gthread pyff.wsgi:app

My testing shows that this combination of arguments will not work. If you start from a new deployment the Whoosh index is never generated.

The reason is the --preload option to gunicorn. When that option is used the application code (pyff.wsgi:app) is loaded/evaluated before the worker process is forked. As such the APScheduler BackgroundScheduler instance is created before gunicorn forks to create the worker process. It is realized as a thread running in the parent. The forked child process does not inherit threads (other than the main) from the parent. So the BackgroundScheduler only runs in the parent, which seems initially to be the desired behavior: you would not want two copies of the BackgroundScheduler, one in the parent and one in the child.

Further, with --preload it is the parent process that adds or schedules the "call" job to run the update/load cycle. Since the BackgroundScheduler also runs in the parent it immediately "sees" the added "call" job and runs it. The "call" job does an HTTP GET to localhost to cause the update/load cycle. By this point the parent has forked to create the child, and it is the child worker that services that GET call.

As part of servicing that GET call the child worker eventually schedules the job "RedisWhooshStore._reindex". But since it is the child and not the parent that schedules the reindex job, the BackgroundScheduler thread running in the parent never "sees" that the reindex job has been scheduled.

If using the default memory scheduler job store the reindex job will never be seen, and hence the Whoosh index will never be created.

If the command above is changed to include

    -e PYFF_SCHEDULER_JOB_STORE=redis

then the job is stored in Redis. While that helps because now the BackgroundScheduler thread in the parent will "see" the reindex job, it will not "see" it until it wakes up to run the "call" job. So the reindexing does not happen until PYFF_UPDATE_FREQUENCY later. I want to run with an update frequency of one hour, but I do not want to wait an hour for the initial Whoosh index to be created, nor do I want any changes from the update/load to not appear in the index until an hour later.

I think the right workaround for now is to NOT use --preload. In this scenario the BackgroundScheduler thread runs in the forked child process and so it "sees" both the "call" and "reindex" jobs as they are scheduled. The Whoosh index is created immediately after the update/load cycle.

The downside of this approach is that gunicorn should only be run with a single worker (the default). When run with more than one worker there would be multiple BackgroundSchedulers doing the same work, and that is probably not desirable.

One gunicorn worker with multiple threads should be able to easily serve the loads I expect for most of my deployments, but if pyff.wsgi:app is to really scale I think the approach with APScheduler needs to be redesigned. I think the scheduler will need to run in a dedicated process that is managed via some type of remote procedure call, or another background task/job manager will need to be used instead of APScheduler.

Leif, do you agree with this assessment? If so, I will submit a PR for run-pyff-api.sh that removes the --preload option and explicitly uses "--workers 1", and that includes a comment about only using one worker.

Thanks,
Scott K
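The fork behaviour at the heart of the message above can be reproduced with the standard library alone. The sketch below is an illustration, not pyFF or APScheduler code: a plain thread stands in for the BackgroundScheduler thread, the function name is hypothetical, and it assumes a POSIX system where os.fork() is available.

```python
import os
import threading


def threads_seen_after_fork():
    """Return (thread count in parent, thread count in forked child)."""
    stop = threading.Event()
    # Stand-in for a scheduler thread created before the fork (as with --preload).
    worker = threading.Thread(target=stop.wait, name="scheduler", daemon=True)
    worker.start()
    parent_count = threading.active_count()

    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: only the thread that called fork() exists here.
        try:
            os.close(read_fd)
            os.write(write_fd, str(threading.active_count()).encode())
        finally:
            os._exit(0)

    # Parent process: read the child's report, then clean up.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        child_count = int(reader.read())
    os.waitpid(pid, 0)
    stop.set()
    worker.join()
    return parent_count, child_count


if __name__ == "__main__":
    print(threads_seen_after_fork())
```

The child reports a single thread even though the parent had the extra one running, which matches the description of a scheduler thread that exists only in the gunicorn parent when the application is preloaded.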

pyFF near time plans

by leifj＠sunet.se

Hey,

Sorry for the crosspost...

After a few weeks of spending all of my available development bits on the various parts of RA21 (cf github.com/TheIdentitySelector, yes it's all nodejs!) I'm back to working on pyFF for a bit. Here is what I have planned for the quite near term:

1. merge the api-refactory branch, which includes a Pyramid-based API
2. merge documentation PR from Hannah Sebuliba (thx!)
3. tag and release the last monolithic version of pyFF
4. in HEAD, which becomes the new 1.0.0 release:
   - remove all frontend bits (old discovery, management web app)
   - pyffd will now start the Pyramid-based API server
   - wsgi will be available/recommended
   - create a new "frontend app" as a separate webpack+nodejs project
   - create docker-compose.yaml that starts pyffd (API) + frontend app
5. tag and release 1.0.0, thereby moving pyFF over to semantic versioning

After 4 it makes sense to talk about things like...

- new redis/#nosql backends
- work on reducing memory footprint
- pubsub for notifications between MDQ servers
- more instrumentation & monitoring
- adaptive aggregation for large-scale deployments
- elastic search
- management APIs for integrated editing of local metadata
- OIDC
- generating offline MDQ directory structures (cf scripts/mirror-mdq.sh)

Thoughts etc are as usual welcome.

Cheers Leif

Agenda: idpy dev call, 20 August 2019

by hlflanagan＠sphericalcowgroup.com

Date: Tuesday, 20 August 2019
Time: 06:00 PT | 09:00 ET | 15:00 GMT
https://bluejeans.com/489221749

Agenda:

0. Agenda bash
1. GitHub review
   a. pySAML2 - https://github.com/IdentityPython/pysaml2
   b. Satosa - https://github.com/IdentityPython/SATOSA
   c. pyFF - https://github.com/IdentityPython/pyFF
   d. ...
2. Hackathon planning - https://wiki.refeds.org/pages/viewpage.action?pageId=44959235
3. AOB

Thanks! Heather

CMService -> SimpleConsent

by rainer＠hoerbe.at

I drafted a new consent service based on Django: https://github.com/identinetics/simpleconsent/blob/master/README.adoc [1]

I weighed the complexity of CMservice, its lack of documentation and community support against its being an already deployed project. I think that I will drop CMservice and go ahead with developing simpleconsent in the second half of September, unless someone proposes an alternative. Any encouragement or dissuasion?

A consideration is that Django does not work with SQLAlchemy, which is a different type of ORM. But I would need to stick to Django for development speed.

- Rainer

[1] @Heather: Is there an RFC that dismisses the use of "simple" in project names? My excuse is that SCAR (Simple Consent for Attribute Release) did not sound good, either.

> On 2019-08-15 at 20:16, Rainer Hoerbe <rainer at hoerbe.at> wrote:
>
> Thanks for the quick answer. I hope that we can cover this in the idpy call next week, as I will be on vacation for 2 weeks afterwards.
>
> I would be interested in your assessment of the code. On my side, I am unhappy that the API is undocumented and has to be reverse engineered from the view definitions etc.
>
> - Rainer
>
>> On 2019-08-15 at 20:06, Christos Kanellopoulos <christos.kanellopoulos at geant.org> wrote:
>>
>> Hi Rainer
>>
>> We have done some further work on the CM service and we have fixed various bugs. Now both myself and Ivan are on holidays. Next week we will both be back and share the updated code.
>>
>> Having said this, we are seriously thinking to abandon this code base and develop a cm l component from scratch.
>>
>> Christos
>>
>> From: Rainer Hoerbe <rainer at hoerbe.at>
>> Sent: Thursday, August 15, 2019 8:22:13 PM
>> To: Christos Kanellopoulos <christos.kanellopoulos at geant.org>
>> Subject: Re: CMservice gitlab export
>>
>> Hi Christos,
>>
>> The integration of CMservice into SATOSA is again on the top of my todo list. When I added your tar-ball from 22 May, I noticed that the unit tests have not been updated to reflect the changes in src. I fixed this in https://github.com/its-dirg/CMservice/pull/11, along with a few dependency issues.
>>
>> Is there any new status on the GÉANT branch of the project? Any new commits? I would like to know if there is a chance to consolidate efforts wrt this project. Do you know, or do you know someone who might know?
>>
>> Cheers, Rainer
>>
>>> On 2019-05-22 at 12:02, Christos Kanellopoulos <christos.kanellopoulos at geant.org> wrote:
>>>
>>> Hello Rainer,
>>>
>>> find it attached. Yesterday afternoon, became really late night.
>>>
>>> Christos
>>>
>>> On 22 May 2019, at 11:54, Rainer Hoerbe wrote:
>>>
>>> May I send a friendly reminder?
>>>
>>> thanks!
>>>
>>>> On 2019-05-21 at 08:47, Christos Kanellopoulos <christos.kanellopoulos at geant.org> wrote:
>>>>
>>>> Hello Rainer
>>>>
>>>> I am at the hospital, but I will be able to send it to you later this afternoon
>>>>
>>>> Christos
>>>>
>>>> From: Rainer Hoerbe <rainer at hoerbe.at>
>>>> Sent: Tuesday, May 21, 2019 9:46 AM
>>>> To: Christos Kanellopoulos
>>>> Subject: CMservice gitlab export
>>>>
>>>> Hi Christos,
>>>>
>>>> You mentioned in the last idpy meeting that I might get a copy of Geant's CMService repo on gitlab. Whom would I ask to get it?
>>>>
>>>> Thanks and best regards
>>>> Rainer
>>>
>>> --
>>> Christos Kanellopoulos
>>> Senior Trust & Identity Manager
>>> GÉANT
>>> M: +31 611 477 919
>>>
>>> Networks • Services • People
>>> Learn more at www.geant.org
>>>
>>> GÉANT Vereniging (Association) is registered with the Chamber of Commerce in Amsterdam with registration number 40535155 and operates in the UK as a branch of GÉANT Vereniging. Registered office: Hoekenrode 3, 1102BR Amsterdam, The Netherlands. UK branch address: City House, 126-130 Hills Road, Cambridge CB2 1PQ, UK.
>>>
>>> <cm-service-devel.tar.gz>

Notes: idpy dev call, 6 August 2019

by hlflanagan＠sphericalcowgroup.com

Attending: Scott Koranda, Heather Flanagan, Leif Johansson, Giuseppe de Marco, Johan Lundberg, John Paraskevopoulos, Alex Stuard, Hannah Sebuliba
Regrets: Ivan, Rainer

Virtual IdP front end to Satosa

- Can expose multiple virtual IdPs through the Satosa front end. Can configure various options for the IdP, including the name of the IdP, the scope the IdP wants to use, etc. That config belongs to the front end. Scott also wants to have some microservices that operate on the assertions as they go through the system, and the microservices should have the same access to that configuration (e.g., so they can see the scope). Waiting on a decision from Ivan on how to implement this.
- Can you run a single Satosa instance with multiple front and back ends? Example: a SAML back end that would authN against eduGAIN, and another that authN to ORCID, and front ends that would work with either SAML or OIDC.
- Question: has anyone set up the OIDC front end and had it work with mod_auth_oidc? Mod_auth_oidc is complaining. Giuseppe is planning on doing this in the next month or so.
- With the OIDC front end, it won't automatically work with multiple backends (cannot select between multiple backends). Need a custom routing service. Does anyone have such a routing service available? Giuseppe wrote one; can find it in the Satosa PRs. It intercepts the call and uses a map of entity IDs that need this.

Update on pyFF

- Current release = 1.1.1; there are some bug fixes that need to go into 1.1.2 asap. Code is stabilizing, but not sure he'd bet on 1.1.2 being stable.
- 2.0 will start with Leif removing the front end bits; he will provide a bash script to help people who are used to calling pyffd. There will still be a wsgi app (and it will be the main entry point).
- Hannah has been working on some interesting memory things. She is looking for memory leaks.
- Scott thinks that when pyFF is running as a server, it needs to never create a really large DOM and to avoid ever reading the eduGAIN feed as a single DOM object, because that creates a huge, unnecessary memory request. It also has to avoid creating lists of many things; even if you don't load the whole DOM, you have a list of small DOMs, and if it is held in memory before being given to the backend store, you still consume a lot of memory. Scott suggests the architecture needs to shift from parsing large chunks of metadata to parsing small chunks, handing them off to the backend, then garbage collecting.
- Leif points out that as soon as you're dealing with signed metadata, you have to handle all of it at once. Could try to do something by making the pipeline smaller.
- One suggestion: switch to the Redis backend. That could work in some use cases, but not for the full aggregate.
- Could do an offline fetch as another way to control size.
- One goal is to keep pyFF from needing a server with more than 4GB. Not likely that's going to function as eduGAIN gets larger.
- In eduTEAMS, pyFF does take up the largest memory footprint.
- Could start pyFF, ingest all you need, use mirror MDQ to produce an offline copy, then shut down the pyFF service until you need to reingest. The offline MDQ could be used for discovery for as long as the signature is valid. Can use the thiss.io MDQ (thiss-mdq, https://github.com/TheIdentitySelector/thiss-mdq) for a miniature search function.
- Giuseppe uses pyFF with a scheduler.
- Another alternative is to use the default discovery service being put together by SeamlessAccess.org (based on RA21).
- Would be interesting to compare Whoosh+Redis to a JSON-only index store. Action item for Hannah.

How to exclude entityIDs from pyFF?

- It works as expected up to 0.9.3. Can use a filter, but the previous version of fork, merge, remove does not. (The latter impacts the current working document, and should not actually work.)
- Suggest you look at load-cleanup - there's a way to run a pipeline early on before you update the backend store, and that might be it.
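The "small chunks, hand off, then garbage collect" suggestion from the notes above can be sketched with the standard library's incremental XML parser. This is an illustration under stated assumptions, not pyFF code: the function names, the dict used as a stand-in backend store and the sample metadata are hypothetical, and the sketch performs no signature verification, which, as noted in the call, needs the complete signed document.

```python
import io
import xml.etree.ElementTree as ET

MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"
ENTITY_TAG = f"{{{MD_NS}}}EntityDescriptor"


def iter_entities(source):
    """Yield (entityID, XML bytes) for one EntityDescriptor at a time."""
    events = ET.iterparse(source, events=("start", "end"))
    _, root = next(events)  # the first event is the start of the root element
    for event, elem in events:
        if event == "end" and elem.tag == ENTITY_TAG:
            yield elem.get("entityID"), ET.tostring(elem)
            root.clear()  # release finished entities so the tree stays small


def load_into_store(source, store):
    """Hand each entity to the store as soon as it has been parsed."""
    for entity_id, xml_bytes in iter_entities(source):
        store[entity_id] = xml_bytes
    return len(store)


# Hypothetical two-entity aggregate standing in for a large feed.
SAMPLE = (
    b'<md:EntitiesDescriptor xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata">'
    b'<md:EntityDescriptor entityID="https://idp.example.org/idp"/>'
    b'<md:EntityDescriptor entityID="https://sp.example.org/sp"/>'
    b'</md:EntitiesDescriptor>'
)
store = {}
count = load_into_store(io.BytesIO(SAMPLE), store)
print(count, sorted(store))
```

Because each entity is serialized and handed off before the next one is parsed, neither a full DOM nor a list of small DOMs is held in memory; a real backend would replace the dict.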

idpy dev call: 6 August 2019

by skoranda＠gmail.com

Hello, We have a call on the calendar for tomorrow, 6 August 2019. While Ivan is on holiday and dreaming of functional programming in Python, we will still have a call. Leif has agreed to join the call so that we can spend some time talking about the latest changes to pyFF. We can also cover other topics as time permits. Thanks, Scott K
