This is the proof-of-concept implementation of the approach presented by the VGIscience Privacy Project. It shows how to read data from a social media service (here, Twitter's filtered stream) and store it in a local database using the cardinality estimator HyperLogLog.
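To give a rough idea of what a cardinality estimator like HyperLogLog does, here is a minimal, self-contained sketch. It is not the implementation used by this project. The class name, the `index_bits` parameter and the choice of SHA-1 as the hash are assumptions made purely for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog with 2**index_bits registers (illustrative only)."""

    def __init__(self, index_bits=12):
        if not 4 <= index_bits <= 16:
            raise ValueError("index_bits must be between 4 and 16")
        self.index_bits = index_bits
        self.m = 1 << index_bits  # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash from the first 8 bytes of SHA-1 (an arbitrary choice).
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        tail_bits = 64 - self.index_bits
        index = h >> tail_bits                    # leading bits pick a register
        tail = h & ((1 << tail_bits) - 1)         # remaining bits
        rank = tail_bits - tail.bit_length() + 1  # position of the first 1-bit
        self.registers[index] = max(self.registers[index], rank)

    def merge(self, other):
        # The union of two sketches is the element-wise maximum of the registers.
        if other.m != self.m:
            raise ValueError("cannot merge sketches of different size")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def count(self):
        m = self.m
        small_alpha = {16: 0.673, 32: 0.697, 64: 0.709}
        alpha = small_alpha.get(m, 0.7213 / (1 + 1.079 / m))
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw


# Only the registers are stored, never the items themselves.
hll = HyperLogLog()
for user in ["alice", "bob", "alice"]:
    hll.add(user)
print(round(hll.count()))
```

Adding an item twice never changes the registers, so duplicates do not inflate the estimate, and the original items cannot be read back from the sketch.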
The core part is a FastAPI-based Python application ("sink-API") that provides a RESTful API.
API documentation: Swagger UI, ReDoc
This project includes a Compose file that can run the sink-API, the sinkdb database, a pgAdmin instance and a sinkmap visualization tool.
- copy .env.example to .env
- run docker compose up
- the database container will auto-create tables for already existing stream rules
- run stream-twitter.sh to read that filtered stream and post it to the sink-API

To run this in a production environment, you have to create reverse proxies for the published container ports on your machine, e.g. using Apache or nginx. Don't forget to adjust .env accordingly.
You can also run the stream reader script as a systemd service unit:
- copy stream-twitter.sh to /usr/local/bin/stream-twitter
- copy stream-twitter.service to /etc/systemd/system/
- start the service with systemctl start stream-twitter.service
- run journalctl -fu stream-twitter.service to see the output of the script

The recommended dev environment is VSCode with its Python extension. Dependency management with Poetry is experimental.
http get localhost:8888
http get localhost:8888/rules
http post localhost:8888/rules value="tag_$(echo $((999 + RANDOM % 8999)))" tag="test" precision="4"
http delete localhost:8888/rules/1430817260
http delete localhost:8888/rules/$(http post localhost:8888/rules value="tag_$(echo $((RANDOM)))" tag="test" | jq -r '.data[0].id')
for id in $(http get localhost:8888/rules | jq -r '.data[].id' | grep 1438); do http delete "localhost:8888/rules/$id"; done
for t in flood fire storm; do http post localhost:8888/rules value="$t has:geo" tag="$t disaster"; done
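The rule commands above can also be issued from Python. The following is a minimal sketch using only the standard library. It assumes the sink-API accepts the same JSON body that httpie sends (`value`, `tag` and an optional `precision`, sent as a string like in the httpie call) and listens on `localhost:8888` as in the examples. The function names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8888"  # port taken from the httpie examples


def build_rule_request(value, tag, precision=None, base_url=BASE_URL):
    """Build (but do not send) a POST /rules request."""
    payload = {"value": value, "tag": tag}
    if precision is not None:
        # httpie's key="value" syntax sends strings, so this mirrors that.
        payload["precision"] = str(precision)
    return urllib.request.Request(
        f"{base_url}/rules",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_delete_request(rule_id, base_url=BASE_URL):
    """Build (but do not send) a DELETE /rules/<id> request."""
    return urllib.request.Request(f"{base_url}/rules/{rule_id}", method="DELETE")


def send(request):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def create_disaster_rules():
    """Python counterpart of the flood/fire/storm loop; needs a running sink-API."""
    return [
        send(build_rule_request(f"{topic} has:geo", f"{topic} disaster"))
        for topic in ("flood", "fire", "storm")
    ]
```

Building and sending are kept separate so the request payloads can be inspected without a running sink-API.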
# get number of areas for each rule
for id in $(http localhost:8888/rules | jq -r '.data[].id'); do echo -n "$id "; http "localhost:8888/rules/$id" | jq '.features | length '; done
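The per-rule count from the loop above can be sketched in Python as well. This assumes the response shapes implied by the jq filters: a rule listing with a `data` array of objects carrying an `id`, and a single-rule response with a `features` array. The function names and the sample dictionaries are made up for illustration.

```python
def rule_ids(listing):
    """Return the rule ids from a GET /rules response (jq: .data[].id)."""
    return [str(rule["id"]) for rule in listing.get("data") or []]


def area_count(rule_detail):
    """Return the number of areas of one rule (jq: .features | length)."""
    return len(rule_detail.get("features") or [])


# Hypothetical sample responses, not real API output.
sample_listing = {"data": [{"id": "1430817260"}, {"id": "1438000001"}]}
sample_detail = {"features": [{"type": "Feature"}, {"type": "Feature"}]}

for rule_id in rule_ids(sample_listing):
    print(rule_id, area_count(sample_detail))
```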
BEARER_TOKEN=$(gopass show -o www/twitter.com/mlvgi/bearer_token)
http --stream GET "https://api.twitter.com/2/tweets/search/stream?tweet.fields=author_id,created_at,geo,id,text&place.fields=full_name,geo,id,name,place_type" "Authorization: Bearer $BEARER_TOKEN" | jq
http --stream get "https://api.twitter.com/2/tweets/search/stream?tweet.fields=author_id,created_at,geo,id,text&place.fields=full_name,geo,id,name,place_type" "Authorization: Bearer $BEARER_TOKEN" | while read line; do echo "$line" | http post localhost:8888/lbsn/posts ; done
( "POST /lbsn/posts HTTP/1.1" 422 Unprocessable Entity )
http --stream get "https://api.twitter.com/2/tweets/search/stream?tweet.fields=author_id,created_at,geo,id,text&place.fields=full_name,geo,id,name,place_type" "Authorization: Bearer $BEARER_TOKEN" | jq -c --unbuffered '{id: .data.id, geo: .data.geo, rules: .matching_rules}' | while read -r line; do echo "$line" | http post localhost:8888/lbsn/posts ; done
( https://stackoverflow.com/questions/69364553 )
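The jq filter in the streaming command above reduces each stream object to the fields `id`, `geo` and `rules`. A Python version of that reshaping step might look like the sketch below. It assumes one JSON object per line and skips blank lines such as keep-alive newlines. The function names and the sample tweet are hypothetical.

```python
import json


def reshape_tweet(tweet):
    """Mirror the jq filter {id: .data.id, geo: .data.geo, rules: .matching_rules}."""
    data = tweet.get("data") or {}
    return {
        "id": data.get("id"),
        "geo": data.get("geo"),
        "rules": tweet.get("matching_rules"),
    }


def reshape_stream(lines):
    """Yield one compact JSON string per non-empty input line."""
    for line in lines:
        line = line.strip()
        if not line:  # e.g. a keep-alive newline
            continue
        yield json.dumps(reshape_tweet(json.loads(line)), separators=(",", ":"))


# Made-up sample line for illustration, not real API output.
sample = json.dumps({
    "data": {"id": "1", "text": "flood", "geo": {"place_id": "abc"}},
    "matching_rules": [{"id": "1430817260", "tag": "flood disaster"}],
})
for body in reshape_stream(["", sample]):
    print(body)
```

Each yielded string is a complete JSON document on a single line, which is what a line-by-line loop posting to the sink-API needs.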
source ~/projects/ml/vgisink/.env && PGPASSWORD=$SINKDB_PASS psql -U $SINKDB_USER -h $SINKDB_HOST -d $SINKDB_NAME -p $SINKDB_PORT
http post localhost:8888/posts <~/playground/example-tweet.json