Hi,
I have a pyFF deployment answering MDQ/API queries at
https://mdq-dev.ligo.org/
I am using the master of today's thiss-js repository at commit
2bca90db03a3110feeaf2773e3a91598bcee1b4e.
I did
npm install
make docker
and it generated the image thiss-js:1.0.1
I then did
docker run \
-d \
--name ds \
-e BASE_URL=http://localhost:8080 \
-e COMPONENT_URL=http://localhost:8080/cta \
-e PERSISTENCE_URL=http://localhost:8080/ps \
-e MDQ_URL=https://mdq-dev.ligo.org/entities/ \
-e SEARCH_URL=https://mdq-dev.ligo.org/api/search \
-e LOGLEVEL=debug \
-p 8080:80 \
thiss-js:1.0.1
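For reference, the MDQ endpoint configured above can be checked from outside the container with a few lines of Python (just a sanity check, not part of thiss-js; it assumes the requests library and uses a placeholder entityID):
import requests
from urllib.parse import quote

mdq_url = "https://mdq-dev.ligo.org/entities/"
entity_id = "https://example.org/idp"  # placeholder; substitute a real entityID

# MDQ lookup by percent-encoded entityID
resp = requests.get(mdq_url + quote(entity_id, safe=""))
print(resp.status_code, resp.headers.get("Content-Type"))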
To simulate an IdP discovery flow I then browsed to
http://localhost:8080/ds/index.html?entityID=https%3A%2F%2Fwiki.ligo.org%2F…
The discovery page renders as I expect it would, and I can type in the
text box. I see the API queries going out to the pyFF MDQ service and
the IdP I want to use is found and displayed as a choice.
When I click on the IdP choice, however, nothing happens.
The generated HTML that I am clicking on is
<a class="institution identityprovider bts-dynamic-item" tabindex="0" data-href="">
<li><i class="arrow" data-fa-i2svg=""><svg class="svg-inline--fa fa-angle-right fa-w-8" aria-hidden="true" focusable="false" data-prefix="fa" data-icon="angle-right" role="img" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 256 512" data-fa-i2svg=""><path fill="currentColor" d="M224.3 273l-136 136c-9.4 9.4-24.6 9.4-33.9 0l-22.6-22.6c-9.4-9.4-9.4-24.6 0-33.9l96.4-96.4-96.4-96.4c-9.4-9.4-9.4-24.6 0-33.9L54.3 103c9.4-9.4 24.6-9.4 33.9 0l136 136c9.5 9.4 9.5 24.6.1 34z"></path></svg></i>
<div class="d-block institution-text">
<div class="text-truncate label primary">University of Wisconsin-Milwaukee</div>
<div class="text-truncate label-url secondary">uwm.edu</div>
</div>
</li>
</a>
Is there some configuration I missed for running the Docker image?
I have tried both Firefox 69.0.1 and Google Chrome 77.0.3865.90 on Debian 10.
Scott K
P.S. Note that I reported yesterday an issue with pyFF MDQ not responding
correctly when the query is the sha1 hash of an entityID. Could the inability
to resolve the metadata for the SP prevent discovery from working?
Hi,
I've been experimenting with thiss.io combined with pyFF as MDQ and tried to
use the pyFF "select as" construct to narrow down the list of available IdPs
in the DS. It doesn't work: pyFF correctly reports a request on the DS, for
alias test, but thiss.io returns a 404.
pyffd[1318]: 2019-09-24 15:48:21,415 DEBUG [pyff.api:31][MainThread] handling
entry=request, alias=test, path=ds
I suspect this is caused by the (example) nginx conf: since the /test alias
cannot be served by thiss.io, it is forwarded directly to pyFF, bypassing the DS.
server {
    include "mime.types";
    listen 0.0.0.0:8081 default_server;

    location / {
        root /opt/thiss-js/dist;
        try_files $uri $uri/index.html $uri.html @mdq;
    }

    location @mdq {
        proxy_pass http://127.0.0.1:8083;
    }
}
But then, even if we could fix the nginx conf, thiss.io would need to be
"alias" aware and fire the correct query against pyFF. Will it?
Best regards,
Martin
Hi,
I am using today's pyFF master head, commit
bbdf245ccdb0be8ce45dda8c0cef06a6d33e2755
My pipeline contains
- when request:
    - select:
    - pipe:
        - when accept application/xml:
            - first
            - finalize:
                cacheDuration: PT12H
                validUntil: P10D
            - sign:
                key: metadata-signer.key
                cert: metadata-signer.crt
            - emit application/xml
            - break
        - when accept application/json:
            - discojson
            - emit application/json
            - break
This query returns the XML I expect
curl 'http://127.0.0.1:8080/entities/https%3A%2F%2Fwiki.ligo.org%2Fshibboleth-sp'
proving that pyFF has the metadata for the entityID.
But this query returns an empty <EntitiesDescriptor>:
curl 'http://127.0.0.1:8080/entities/%7Bsha1%7Dff767393c6b06e8282603e9e4541ac1e87…'
Note that
$ python3
Python 3.7.3 (default, Apr 3 2019, 05:39:12)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from hashlib import sha1
>>> m = sha1()
>>> m.update(b'https://wiki.ligo.org/shibboleth-sp')
>>> m.hexdigest()
'ff767393c6b06e8282603e9e4541ac1e878d63aa'
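For reference, here is how that digest maps onto the MDQ path in the query above (plain Python spelling out the percent-encoding of the {sha1} prefix; not thiss-js code):
from hashlib import sha1
from urllib.parse import quote

entity_id = 'https://wiki.ligo.org/shibboleth-sp'
transformed = '{sha1}' + sha1(entity_id.encode()).hexdigest()
print('http://127.0.0.1:8080/entities/' + quote(transformed, safe=''))
# http://127.0.0.1:8080/entities/%7Bsha1%7Dff767393c6b06e8282603e9e4541ac1e878d63aa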
This is problematic since thiss-js wants to use sha1 hashes to query
MDQ.
Is there something special I have to configure to get pyFF to accept the
sha1 hash?
Thanks,
Scott K
Attending
Ivan, Heather, Roland, Ed, Hannah, Alex, Scott, Christos
Notes:
0. Agenda bash
1. Hackathon planning - https://wiki.refeds.org/pages/viewpage.action?pageId=44959235
Currently about 24 people signed up; no sense of distribution.
Maybe have some “how to” stations to help people learn how to connect, how the various software stacks work.
We do know there will be some people there specifically for thiss.io, some for OIDC federation, and some for SimpleSAMLphp.
For Satosa, we should have a setup with a ready-made Satosa deployed, then look into how to configure microservices and develop new microservices. Also do something with internal data for Satosa. On the list, one suggestion was to see that the MFA profile is passed from the front end to the back end. That might be a good project.
Ivan is still planning to work on some machine images this week and other items in prep for the Hackathon.
If anyone else has specific ideas of things to share or things they plan to prepare, please let Ivan know.
There may be some work on pySAML2 as part of the integration with Satosa (see in particular https://github.com/IdentityPython/pysaml2/issues/586)
For the OIDC Federation, there will be a couple of folks doing some Java programming, and hopefully the SimpleSAMLphp folks will join. Roland will have a server set up to work against. He also has a few people interested in writing mobile clients, though they are not going to be at the hackathon; Roland will also set up a server on the web for them to test against. Giuseppe de Marco is working on an OIDC IdP implementation based on Django, though he will not be at the hackathon.
2. GitHub review
a. pySAML2 - https://github.com/IdentityPython/pysaml2
b. Satosa - https://github.com/IdentityPython/SATOSA
https://github.com/IdentityPython/SATOSA/pull/252 - (Scott's PR) Doing things with LDAP. We are just about at the point that this can be merged. Ivan suggests that we need to normalize the attributes received from LDAP, which means removing the options part from the attribute names and then mapping them to other attributes. Scott notes that the attribute options need to remain; sometimes the attribute options are enumerated, sometimes they aren't, and how they are mapped will vary (but they must be mapped). That makes this all fine, then, and the PR can be merged.
https://github.com/IdentityPython/SATOSA/pull/275 - (Hannah's PR) Converting logging to something different, to allow outputting of structured information. Ivan will be merging changes as they come in. When we're done, we will look at exactly what we log, whether those are the right things to capture, and whether they help us enough in knowing what's happening within the service.
While Ivan was on holiday, we discussed how to maintain state between back ends and front ends (use cases, not solutions). The goal is to have access to the configuration of something, like a microservice; those values could then be used. This might be best dealt with by using handles. We have multiple back ends and front ends, and if we say we want to have access to those, there needs to be specific coupling. We will need to mimic interfaces within the backend (which they don't otherwise have). It is better to communicate data than to have functions be passed around, because the data can be served and be agnostic to the things consuming them; the data becomes the API. Given this data needs to be communicated between requests, right now that means info has to be pushed into the state cookie. We are worried about the size of that growing over time and need alternative ways to keep state. If we use something like web storage or IndexedDB, then much of what we need to transfer can be eliminated. If we could have context around configuration that could be serialized, we could keep the state of that. Every time it is needed, it could be rebuilt using the serialized state. (This would be a future architecture possibility.)
https://github.com/IdentityPython/SATOSA/pull/273 - before merging this, note there is a lot in common between the way the OIDC services do things, and perhaps we should have a base for the services.
c. pyFF - https://github.com/IdentityPython/pyFF
i. pyFF metadata downloads
d. …
3. AOB
Niels van Dijk reported a security issue.
The issue is with the handling of the redirect_uri. When a request is received, SATOSA validates whether the redirect_uri is semantically valid and is allowed as defined in a client database.
If that is not the case, SATOSA shows an error page, for example "Something went wrong: Redirect uri 'foobar' is not registered".
However, if the parameter contains e.g. JavaScript like redirect_uri%3Cscript%3Ealert(%22hi%20SaToSa%22)%3B%3C%2Fscript%3E, SATOSA correctly detects that this is not a correct URI, but then continues to present the JavaScript on the error page unmodified, hence executing the JavaScript.
Any other error in the URL parameters passed between RP and OP should normally result in a response back to the RP without any user screens involved; however, as per https://openid.net/specs/openid-connect-core-1_0.html#AuthError, an issue with the redirect_uri is the sole exception :(
I think SATOSA should always escape the HTML characters in the error message, and perhaps also detect e.g. the presence of an angle bracket as an illegal character.
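As an illustration of the escaping being suggested, a minimal sketch using Python's html.escape (render_error is a hypothetical helper, not SATOSA code):
from html import escape

def render_error(message: str) -> str:
    # Escaping &, <, > and quotes keeps an attacker-supplied redirect_uri
    # from being interpreted as markup on the error page.
    return "<p>Something went wrong: {}</p>".format(escape(message, quote=True))

payload = "Redirect uri '<script>alert(\"hi SaToSa\");</script>' is not registered"
print(render_error(payload))
# <p>Something went wrong: Redirect uri &#x27;&lt;script&gt;alert(&quot;hi SaToSa&quot;);&lt;/script&gt;&#x27; is not registered</p>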
Another security issue: Scott and Ivan have discussed that the default for the SAML backend is to consume the attributes without checking the scope. This might best be handled in the core of pySAML2, in keeping with having the most conservative defaults possible.
Hello folks,
The UK federation team have discovered that a pyFF deployment is making a large number of metadata aggregate downloads from our Metadata Publication Service. In August, 34.74.200.81 made over 3,000 gzipped downloads of our metadata, totalling 36GB. As we update metadata once per day, this deployment is clearly downloading excessively.
The IP address 34.74.200.81 is somewhere in Google cloud (81.200.74.34.bc.googleusercontent.com.) and the user agent string is pyFF/1.0.1. Is this anyone's deployment?
We also recommend that metadata downloaders make use of HTTP conditional GET. Is it possible to configure pyFF to do conditional GET for metadata downloads?
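(For clarity, conditional GET means sending the validators from the previous response so the server can answer 304 Not Modified instead of re-sending the whole aggregate. A minimal sketch in Python with the requests library, against a placeholder URL; this is not pyFF configuration:)
import requests

url = "https://example.org/metadata/aggregate.xml"  # placeholder

first = requests.get(url)
headers = {}
if "ETag" in first.headers:
    headers["If-None-Match"] = first.headers["ETag"]
if "Last-Modified" in first.headers:
    headers["If-Modified-Since"] = first.headers["Last-Modified"]

second = requests.get(url, headers=headers)
if second.status_code == 304:
    print("Not modified; reuse the cached copy")
else:
    print("Changed; process the new aggregate")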
Thanks,
Alex
—
Alex Stuart, Principal technical support specialist (UK federation)
alex.stuart at jisc.ac.uk
UK federation helpdesk: service at ukfederation.org.uk
Hello everyone,
Not many people gathered today for the usual biweekly call. We decided
to postpone the call to next week. I'll follow up with an agenda
notification soon.
Cheers,
--
Ivan c00kiemon5ter Kanakarakis >:3
Hi,
I want to run pyFF using the RedisWhooshStore to leverage the superior
indexing that Whoosh provides to support API search.
The example API runner script at
https://github.com/IdentityPython/pyFF/blob/master/scripts/run-pyff-api.sh
uses this command:
gunicorn \
--preload \
--log-config logger.ini \
--bind 0.0.0.0:8000 \
-t 600 \
-e PYFF_PIPELINE=$pipeline \
-e PYFF_STORE_CLASS=pyff.store:RedisWhooshStore \
-e PYFF_UPDATE_FREQUENCY=300 \
--threads 4 \
--worker-tmp-dir=/dev/shm \
--worker-class=gthread pyff.wsgi:app
My testing shows that this combination of arguments will not work: if you
start from a new deployment, the Whoosh index is never generated.
The reason is the --preload option to gunicorn. When that option is used,
the application code (pyff.wsgi:app) is loaded/evaluated before the
worker process is forked. As such, the APScheduler BackgroundScheduler
instance is created before gunicorn forks to create the worker process.
It is realized as a thread running in the parent.
The forked child process does not inherit threads (other than the main)
from the parent. So the BackgroundScheduler only runs in the parent,
which seems initially to be the desired behavior--you would not want two
copies of the BackgroundScheduler, one in the parent and one in the
child.
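As a standalone sketch of that fork/thread behavior (plain os.fork plus threading, not pyFF or APScheduler code): a thread started before the fork keeps running only in the parent.
import os
import threading
import time

def ticker():
    while True:
        print(f"tick from pid {os.getpid()}")
        time.sleep(1)

# Started before the fork, like the BackgroundScheduler thread under --preload.
threading.Thread(target=ticker, daemon=True).start()

pid = os.fork()
if pid == 0:
    # Child worker: the ticker thread was not inherited, so no ticks print here.
    time.sleep(3)
    os._exit(0)
else:
    # Parent: ticks keep printing with the parent's pid.
    time.sleep(3)
    os.waitpid(pid, 0)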
Further, with --preload it is the parent process that adds or schedules
the "call" job to run the update/load cycle. Since the
BackgroundScheduler also runs in the parent it immediately "sees" the
added "call" job and runs it. The "call" job does an HTTP GET to
localhost to cause the update/load cycle.
By this point the parent has forked to create the child, and it is the
child worker that services that GET call. As part of servicing that GET
call the child worker eventually schedules the job
"RedisWhooshStore._reindex".
But since it is the child and not the parent that schedules the reindex
job, the BackgroundScheduler thread running in the parent never "sees"
that the reindex job has been scheduled. If using the default memory
scheduler job store the reindex job will never be seen, and hence the
Whoosh index will never be created.
If the command above is changed to include
-e PYFF_SCHEDULER_JOB_STORE=redis
then the job is stored in Redis. While that helps because now the
BackgroundScheduler thread in the parent will "see" the reindex job, it
will not "see" it until it wakes up to run the "call" job. So the
reindexing does not happen until PYFF_UPDATE_FREQUENCY later. I want to
run with an update frequency of one hour, but I do not want to wait an
hour for the initial Whoosh index to be created, nor do I want changes
from the update/load cycle to be missing from the index for up to an hour.
I think the right workaround for now is to NOT use --preload. In this
scenario the BackgroundScheduler thread runs in the forked child process
and so it "sees" both the "call" and "reindex" jobs as they are
scheduled. The Whoosh index is created immediately after the update/load
cycle.
The downside of this approach is that gunicorn should only be run with a
single worker (the default). When run with more than one worker there
would be multiple BackgroundSchedulers doing the same work, and that is
probably not desirable.
One gunicorn worker with multiple threads should be able to easily serve
the loads I expect for most of my deployments, but if pyff.wsgi:app
is to really scale I think the approach with APScheduler needs to be
redesigned. I think the scheduler will need to run in a dedicated
process that is managed via some type of remote procedure call, or
another background task/job manager will need to be used instead of
APScheduler.
Leif, do you agree with this assessment?
If so, I will submit a PR for run-pyff-api.sh that removes the --preload
option and explicitly uses "--workers 1", and that includes a comment
about only using one worker.
Thanks,
Scott K