ayounsi (Arzhel Younsi)
Staff Network SRE

User Details

User Since
Apr 3 2017, 6:23 PM (384 w, 5 d)
Availability
Available
IRC Nick
xionox
LDAP User
Ayounsi
MediaWiki User
AYounsi (WMF)

Recent Activity

Fri, Aug 16

ayounsi triaged T372654: Netbox ProvisionServer script fails vlan verification as High priority.
Fri, Aug 16, 3:49 PM · netops, Infrastructure-Foundations
ayounsi closed T311052: Netbox: replace getstats.GetDeviceStats with netbox-more-metrics as Resolved.

All done.

Fri, Aug 16, 2:05 PM · Patch-For-Review, netbox, Infrastructure-Foundations
ayounsi added a comment to T341843: Netbox rq.timeouts.JobTimeoutException.
Job dispatched to netbox1003 - takes less than 2min
Aug 16 09:24:25 netbox1003 python[1079619]: 09:24:25 default: extras.scripts.run_script(commit=True, data={}, job=<Job: 61e3e2e3-59b1-4982-81a7-5589221d7ed9>, request=<utilities.request.NetBoxFakeRequest ob>
Aug 16 09:25:47 netbox1003 python[1622900]: 09:25:47 default: Job OK (61e3e2e3-59b1-4982-81a7-5589221d7ed9)
Fri, Aug 16, 9:55 AM · Patch-For-Review, User-Elukey, Infrastructure-Foundations, netbox

Wed, Aug 14

ayounsi triaged T372461: Remove Additional IP records from procurement request template as Low priority.
Wed, Aug 14, 10:56 AM · Patch-For-Review, DC-Ops
ayounsi added a comment to T229542: Export LibreNMS data to Prometheus.

Sounds good to me !

Wed, Aug 14, 8:19 AM · Observability-Metrics
ayounsi added a comment to P67286 nbshell script to find clusters that should be removed from the NO_V6_DEVICE_NAME_PREFIXES variable in the Netbox network report.
output
>>> pprint.pprint(results)
defaultdict(<class 'int'>,
            {'an-redacteddb': 1,
             'clouddb': 9,
             'db': 251,
             'dbprov': 6,
             'dbproxy': 18,
             'dbstore': 3,
             'dumpsdata': 1,
             'es': 42,
             'ganeti': 29,
             'maps': 12,
             'ms-be': 16,
             'pc': 14,
             'restbase': 30,
             'snapshot': 1,
             'thanos-fe': 6,
             'wdqs': 5})
>>> 
>>> print(set(NO_V6_DEVICE_NAME_PREFIXES) - set(results.keys()))
{'thumbor', 'mc-gp', 'restbase-dev', 'mc', 'wtp', 'mwlog', 'mw', 'parse', 'graphite', 'ores', 'sessionstore'}
Wed, Aug 14, 7:26 AM · netbox
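
For illustration, the tally and set difference shown in the paste output above can be approximated without a Netbox instance. This is a minimal sketch and not the actual P67286 script: the hostnames, the `name_prefix` and `count_prefixes` helpers, and the shortened `NO_V6_DEVICE_NAME_PREFIXES` tuple are all made-up assumptions, and it assumes `results` counts devices per name prefix that still lack an IPv6 DNS name.

```python
import re
from collections import defaultdict

# Hypothetical stand-in; the real list lives in the Netbox network report.
NO_V6_DEVICE_NAME_PREFIXES = ("db", "es", "mw", "mc")


def name_prefix(hostname: str) -> str:
    """Strip the trailing numeric ID: 'db1001' -> 'db', 'ms-be2050' -> 'ms-be'."""
    return re.sub(r"\d+$", "", hostname)


def count_prefixes(hostnames):
    """Count hosts per name prefix, limited to prefixes on the exemption list."""
    results = defaultdict(int)
    for hostname in hostnames:
        prefix = name_prefix(hostname)
        if prefix in NO_V6_DEVICE_NAME_PREFIXES:
            results[prefix] += 1
    return results


# Made-up hosts, assumed to still be missing an IPv6 DNS name.
hosts_without_v6_dns = ["db1001", "db1002", "es2020", "cp3050"]
results = count_prefixes(hosts_without_v6_dns)

# Prefixes on the exemption list with no matching host left are removal candidates.
removable = set(NO_V6_DEVICE_NAME_PREFIXES) - set(results.keys())
print(dict(results))
print(sorted(removable))
```

With this sample data, `db` and `es` still have exempted hosts, so only `mc` and `mw` would be candidates for removal from the list.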
ayounsi created P67286 nbshell script to find clusters that should be removed from the NO_V6_DEVICE_NAME_PREFIXES variable in the Netbox network report.
Wed, Aug 14, 7:25 AM · netbox
ayounsi renamed T372453: snapshot1010, dumpsdata1003: add AAAA DNS record from snapshot1010: add AAAA DNS record to snapshot1010, dumpsdata1003: add AAAA DNS record.
Wed, Aug 14, 6:58 AM · Data-Platform-SRE (2024.08.17 - 2024.09.06)
ayounsi triaged T372453: snapshot1010, dumpsdata1003: add AAAA DNS record as Low priority.
Wed, Aug 14, 6:52 AM · Data-Platform-SRE (2024.08.17 - 2024.09.06)

Tue, Aug 13

ayounsi added a comment to T371890: pynetbox incompatibility with Netbox >= 4.0.6.

With the patches above, the last urgent-ish thing needed is to package the new pynetbox for the cumin hosts (bullseye)

Tue, Aug 13, 9:30 AM · Patch-For-Review, Infrastructure-Foundations, netbox
ayounsi reopened T363341: Q4:rack/setup/install cloudcephosd10[39-41] as "Open".

https://netbox.wikimedia.org/extras/scripts/results/78992/
cloudcephosd1039 (WMF11571) /dcim/devices/5296/ Primary IPv6 missing DNS name
I guess the skip IPv6 box got checked by mistake, could someone add the host's FQDN to https://netbox.wikimedia.org/ipam/ip-addresses/17171/ (similar to https://netbox.wikimedia.org/ipam/ip-addresses/17159/) then run the sre.dns.netbox cookbook ?

Tue, Aug 13, 8:09 AM · SRE, ops-eqiad, cloud-services-team (Hardware), DC-Ops

Mon, Aug 12

ayounsi updated the task description for T310590: Netbox: use Custom Model Validation.
Mon, Aug 12, 4:25 PM · Infrastructure-Foundations, netbox
ayounsi claimed T372248: Alert in need of triage: BGP status (instance cr1-esams).

Emailed AS54994 and cleared the errors for the others.

Mon, Aug 12, 9:51 AM · Infrastructure-Foundations, netops, sre-alert-triage
ayounsi added a comment to T372161: Publish, and maintain ASPA records for valid AS14907 upstreams.

Nevertheless, it should be possible to publish ASPA records in RPKI through the ARIN portal

I looked around ARIN's RPKI portal a bit but couldn't find it, is there doc about it ?

Mon, Aug 12, 9:08 AM · netops, Infrastructure-Foundations
ayounsi added a comment to T372158: Apply egress Source Address Validation on the Wikimedia core routers.

However, in reality, it should be possible to reject all IP packets where the source IP is not part of the IP prefixes that the Foundation has been assigned (i.e. prefix lists production{4,6}, which are a superset of the publicly routable LVS service IPs).

We would need to at least permit traffic from the transit interface IPs, as they do BGP to their peers, v6 link local for neighbor discovery, some land GRE tunnels, etc. Not sure what is the cleanest way for that, maybe using an apply-path like for bgp-sessions ? Ideally we wouldn't have to list them all :)

Mon, Aug 12, 8:54 AM · Infrastructure-Foundations, netops
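
The filtering logic discussed in the comment above (allow sources inside the assigned prefixes, plus exceptions for transit interface IPs and v6 link-local) can be sketched in Python as a plain decision function. This is only an illustration of the rule, not router configuration: the prefixes are documentation ranges standing in for the real production{4,6} lists, and `ASSIGNED_PREFIXES`, `TRANSIT_INTERFACE_IPS` and `egress_permitted` are hypothetical names. GRE tunnel endpoints and other exceptions mentioned in the comment are left out.

```python
import ipaddress

# Placeholder documentation ranges, NOT the real production{4,6} prefix lists.
ASSIGNED_PREFIXES = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]

# Hypothetical transit interface IP that must still be able to source traffic (e.g. BGP).
TRANSIT_INTERFACE_IPS = {ipaddress.ip_address("192.0.2.1")}


def egress_permitted(source: str) -> bool:
    """Return True if a packet with this source address would pass the egress filter."""
    addr = ipaddress.ip_address(source)
    # IPv6 link-local sources are needed for neighbor discovery.
    if addr.version == 6 and addr.is_link_local:
        return True
    # Explicit exception for the router's own transit interface addresses.
    if addr in TRANSIT_INTERFACE_IPS:
        return True
    # Otherwise the source must fall inside one of the assigned prefixes.
    return any(addr in net for net in ASSIGNED_PREFIXES)


for src in ("198.51.100.7", "203.0.113.9", "fe80::1", "192.0.2.1"):
    print(src, egress_permitted(src))
```

Listing the exceptions explicitly, as done here, is exactly the maintenance burden the comment hopes to avoid; on the routers the list would ideally be derived from existing configuration.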

Thu, Aug 8

ayounsi closed T371957: Decom Netbox 3 servers as Resolved.

Cleaned up.

Thu, Aug 8, 1:17 PM · Infrastructure-Foundations, netbox
ayounsi added a comment to T368513: Core router error logs: "sshd: Did not receive identification string" from prometheus hosts.

This went very well until it didn't. Changes fully rolled back.

Thu, Aug 8, 12:37 PM · Infrastructure-Foundations, netops, SRE
ayounsi added a comment to T371653: New hosts with "Netbox status: unknown".

Thanks, it's fixed for those 2 hosts.

Thu, Aug 8, 8:31 AM · netbox, Patch-For-Review, Infrastructure-Foundations
ayounsi added a comment to T371890: pynetbox incompatibility with Netbox >= 4.0.6.

Fix released : https://github.com/netbox-community/pynetbox/releases/tag/v7.4.0

Thu, Aug 8, 7:26 AM · Patch-For-Review, Infrastructure-Foundations, netbox

Wed, Aug 7

ayounsi removed projects from T268621: Move some of wikimediacloud.org 185.15.56.0/23 to Netbox: Infrastructure-Foundations, SRE, netbox.
Wed, Aug 7, 2:01 PM · cloud-services-team, DNS
ayounsi claimed T310590: Netbox: use Custom Model Validation.

Sent the patches for the last few ones left in the task description.

Wed, Aug 7, 1:39 PM · Infrastructure-Foundations, netbox
ayounsi triaged T371957: Decom Netbox 3 servers as Low priority.
Wed, Aug 7, 9:43 AM · Infrastructure-Foundations, netbox
ayounsi closed T336275: Upgrade Netbox to 4.x as Resolved.
Wed, Aug 7, 9:39 AM · Patch-For-Review, Infrastructure-Foundations, netbox
ayounsi added a comment to T336275: Upgrade Netbox to 4.x.

Notes from the Debrief meeting

Wed, Aug 7, 9:38 AM · Patch-For-Review, Infrastructure-Foundations, netbox
ayounsi closed T336275: Upgrade Netbox to 4.x, a subtask of T327643: Add network devices fingerprints to known_hosts, as Resolved.
Wed, Aug 7, 9:38 AM · SRE, SRE-tools, netops, Infrastructure-Foundations
ayounsi closed T336275: Upgrade Netbox to 4.x, a subtask of T252747: Generate ssh_known_hosts for network devices, as Resolved.
Wed, Aug 7, 9:38 AM · Infrastructure-Foundations, SRE-tools, SRE
ayounsi closed T336275: Upgrade Netbox to 4.x, a subtask of T341843: Netbox rq.timeouts.JobTimeoutException, as Resolved.
Wed, Aug 7, 9:38 AM · Patch-For-Review, User-Elukey, Infrastructure-Foundations, netbox
ayounsi closed T336275: Upgrade Netbox to 4.x, a subtask of T368513: Core router error logs: "sshd: Did not receive identification string" from prometheus hosts, as Resolved.
Wed, Aug 7, 9:38 AM · Infrastructure-Foundations, netops, SRE
ayounsi closed T336275: Upgrade Netbox to 4.x, a subtask of T340444: Markdown bug in Netbox-next, as Resolved.
Wed, Aug 7, 9:38 AM · Infrastructure-Foundations, netbox
ayounsi closed T336275: Upgrade Netbox to 4.x, a subtask of T354169: Evaluate usage of Kubernetes/Wikikube Tags in netbox and replace them with something if possible, as Resolved.
Wed, Aug 7, 9:38 AM · Infrastructure-Foundations, netbox
ayounsi closed T336275: Upgrade Netbox to 4.x, a subtask of T355899: Netbox MoveServersUplinks script doesn't handle trunked ports correctly, as Resolved.
Wed, Aug 7, 9:38 AM · Infrastructure-Foundations, netbox
ayounsi closed T336275: Upgrade Netbox to 4.x, a subtask of T358339: Netbox: capirca.getHosts script runs into timeout, as Resolved.
Wed, Aug 7, 9:38 AM · Infrastructure-Foundations, netbox
ayounsi closed T336275: Upgrade Netbox to 4.x, a subtask of T365989: Test Netbox-More-Metrics plugin on Netbox 4.0, as Resolved.
Wed, Aug 7, 9:38 AM · Infrastructure-Foundations, netbox
ayounsi closed T336275: Upgrade Netbox to 4.x, a subtask of T371036: Netbox logs filling up disk, netbox1002, as Resolved.
Wed, Aug 7, 9:38 AM · netbox, Infrastructure-Foundations
ayounsi closed T336275: Upgrade Netbox to 4.x, a subtask of T371079: Change icinga link to alerts.w.o in netbox device page, as Resolved.
Wed, Aug 7, 9:38 AM · Infrastructure-Foundations, netbox
ayounsi closed T310717: Netbox: get rid of WMF Production Patches as Resolved.

All is done !

Wed, Aug 7, 7:26 AM · Patch-For-Review, netbox, Infrastructure-Foundations
ayounsi renamed T311052: Netbox: replace getstats.GetDeviceStats with netbox-more-metrics from Netbox: replace getstats.GetDeviceStats with ntc-netbox-plugin-metrics-ext to Netbox: replace getstats.GetDeviceStats with netbox-more-metrics.
Wed, Aug 7, 7:16 AM · Patch-For-Review, netbox, Infrastructure-Foundations
ayounsi added a comment to T311052: Netbox: replace getstats.GetDeviceStats with netbox-more-metrics.

Data is back on https://grafana.wikimedia.org/d/ppq_8SRMk/netbox-device-statistic-breakdown

Wed, Aug 7, 6:59 AM · Patch-For-Review, netbox, Infrastructure-Foundations

Tue, Aug 6

ayounsi removed a project from T270101: Grants not working with DB hosts with to ipv6: Infrastructure-Foundations.
Tue, Aug 6, 1:35 PM · DBA
ayounsi removed a project from T270101: Grants not working with DB hosts with to ipv6: netbox.
Tue, Aug 6, 1:35 PM · DBA
ayounsi triaged T371892: Netbox: Remove leftovers of CAS auth as Low priority.
Tue, Aug 6, 1:30 PM · Infrastructure-Foundations, netbox
ayounsi closed T371653: New hosts with "Netbox status: unknown" as Resolved.
Tue, Aug 6, 1:11 PM · netbox, Patch-For-Review, Infrastructure-Foundations
ayounsi renamed T341843: Netbox rq.timeouts.JobTimeoutException from Netbox report test_mgmt_dns_hostname - rq.timeouts.JobTimeoutException to Netbox rq.timeouts.JobTimeoutException.
Tue, Aug 6, 1:00 PM · Patch-For-Review, User-Elukey, Infrastructure-Foundations, netbox
ayounsi merged task T358339: Netbox: capirca.getHosts script runs into timeout into T341843: Netbox rq.timeouts.JobTimeoutException.
Tue, Aug 6, 12:59 PM · Infrastructure-Foundations, netbox
ayounsi merged T358339: Netbox: capirca.getHosts script runs into timeout into T341843: Netbox rq.timeouts.JobTimeoutException.
Tue, Aug 6, 12:58 PM · Patch-For-Review, User-Elukey, Infrastructure-Foundations, netbox
ayounsi removed a project from T253173: Some clusters do not have DNS for IPv6 addresses (TRACKING TASK): netbox.
Tue, Aug 6, 12:53 PM · Infrastructure-Foundations, IPv6, User-jbond
ayounsi triaged T371890: pynetbox incompatibility with Netbox >= 4.0.6 as High priority.
Tue, Aug 6, 12:51 PM · Patch-For-Review, Infrastructure-Foundations, netbox
ayounsi created T371889: Upgrade Netbox to 4.1.
Tue, Aug 6, 12:29 PM · netbox, Infrastructure-Foundations
ayounsi closed T365989: Test Netbox-More-Metrics plugin on Netbox 4.0 as Resolved.

testing done!

Tue, Aug 6, 8:42 AM · Infrastructure-Foundations, netbox
ayounsi triaged T371868: cr2-codfw - Host 0 ECC single bit parity error as Low priority.
Tue, Aug 6, 6:47 AM · Infrastructure-Foundations, netops

Mon, Aug 5

ayounsi reassigned T370164: Request additional mgmt IP range for frack servers from cmooney to Dwisehaupt.

@Dwisehaupt could you send a patch to add 10.195.1.0/25 to subnet-administration-codfw in the pfw filters (alongside the current 10.195.0.64/28) ?

Mon, Aug 5, 4:27 PM · fundraising-tech-ops, Infrastructure-Foundations, DC-Ops, netops, SRE, ops-codfw
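
As a quick sanity check on the two ranges named in the request above, the standard `ipaddress` module can compare their sizes and confirm they do not overlap. The variable names `current` and `proposed` are only illustrative labels for the two prefixes quoted in the comment.

```python
import ipaddress

current = ipaddress.ip_network("10.195.0.64/28")   # range already in the filter
proposed = ipaddress.ip_network("10.195.1.0/25")   # range to be added

print(current.num_addresses)        # 16
print(proposed.num_addresses)       # 128
print(current.overlaps(proposed))   # False
```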
ayounsi claimed T371653: New hosts with "Netbox status: unknown".
Mon, Aug 5, 2:31 PM · netbox, Patch-For-Review, Infrastructure-Foundations
ayounsi edited projects for T371653: New hosts with "Netbox status: unknown", added: netbox; removed netops.
Mon, Aug 5, 2:29 PM · netbox, Patch-For-Review, Infrastructure-Foundations

Fri, Aug 2

ayounsi added a comment to T311052: Netbox: replace getstats.GetDeviceStats with netbox-more-metrics.

I played a bit with the plugin and it's quite easy to duplicate what we currently do:

[Screenshot: Device count, NetBox]

Fri, Aug 2, 1:00 PM · Patch-For-Review, netbox, Infrastructure-Foundations
ayounsi added a comment to T311052: Netbox: replace getstats.GetDeviceStats with netbox-more-metrics.

As a data point, GetDeviceStats runs ~5000 times per day, which clutters the DB and probably contributes to Netbox's overall slowness. https://netbox.wikimedia.org/core/jobs/?q=GetDeviceStats&created__before=2024-08-02+12%3A00%3A00&created__after=2024-08-01+12%3A00%3A00

Fri, Aug 2, 9:44 AM · Patch-For-Review, netbox, Infrastructure-Foundations
ayounsi added a comment to T371653: New hosts with "Netbox status: unknown".

Confirmed working:

before
>>> robj = api.extras.scripts.get('import_server_facts.ImportPuppetDB').url
[...]
TypeError: a bytes-like object is required, not 'str'
after
>>> robj = api.extras.scripts.get('import_server_facts.ImportPuppetDB').url
>>> robj
'https://netbox-next.discovery.wmnet/api/extras/scripts/20/'
Fri, Aug 2, 8:45 AM · netbox, Patch-For-Review, Infrastructure-Foundations
ayounsi added a comment to T371653: New hosts with "Netbox status: unknown".

The re-image cookbook is hitting this bug too: https://github.com/netbox-community/pynetbox/pull/632

Fri, Aug 2, 8:30 AM · netbox, Patch-For-Review, Infrastructure-Foundations

Thu, Aug 1

fgiunchedi awarded T371079: Change icinga link to alerts.w.o in netbox device page a Like token.
Thu, Aug 1, 12:21 PM · Infrastructure-Foundations, netbox
ayounsi closed T371079: Change icinga link to alerts.w.o in netbox device page as Resolved.

No pb! Done. See for example https://netbox.wikimedia.org/dcim/devices/1969/

Thu, Aug 1, 10:42 AM · Infrastructure-Foundations, netbox
ayounsi added a comment to T358339: Netbox: capirca.getHosts script runs into timeout.
Traceback (most recent call last):
  File "/srv/deployment/netbox/venv/lib/python3.11/site-packages/django/db/models/fields/related_descriptors.py", line 236, in __get__
    rel_obj = self.field.get_cached_value(instance)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/srv/deployment/netbox/venv/lib/python3.11/site-packages/django/db/models/fields/mixins.py", line 15, in get_cached_value
    return instance._state.fields_cache[cache_name]
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
KeyError: 'role'
Thu, Aug 1, 10:33 AM · Infrastructure-Foundations, netbox
ayounsi added a parent task for T336275: Upgrade Netbox to 4.x: Unknown Object (Task).
Thu, Aug 1, 9:50 AM · Patch-For-Review, Infrastructure-Foundations, netbox
ayounsi closed T340444: Markdown bug in Netbox-next as Resolved.

Going to close that one as I can't reproduce it on Netbox 4. Please reopen if needed.

Thu, Aug 1, 9:02 AM · Infrastructure-Foundations, netbox
ayounsi added a comment to T371079: Change icinga link to alerts.w.o in netbox device page.

+1 to keep both for now.

Thu, Aug 1, 8:56 AM · Infrastructure-Foundations, netbox
ayounsi closed T371565: Netbox dns record generation not working as Resolved.

Fixed with https://gerrit.wikimedia.org/r/c/operations/puppet/+/1056505 and https://gerrit.wikimedia.org/r/c/operations/puppet/+/1058953

Thu, Aug 1, 7:55 AM · Infrastructure-Foundations, SRE

Wed, Jul 31

ayounsi moved T320638: Add Dell switches support to Homer/Cookbooks from This quarter to Backlog on the netops board.
Wed, Jul 31, 9:06 AM · Patch-For-Review, SRE, Infrastructure-Foundations, netops
ayounsi moved T361549: Automatically run Capirca Netbox script regularly from Backlog to This quarter on the netops board.
Wed, Jul 31, 9:02 AM · netbox, Infrastructure-Foundations, netops
ayounsi moved T368513: Core router error logs: "sshd: Did not receive identification string" from prometheus hosts from Backlog to This quarter on the netops board.
Wed, Jul 31, 9:02 AM · Infrastructure-Foundations, netops, SRE
ayounsi moved T370068: Upgrade anycast-healthchecker to 0.9.8 (from 0.9.1-1+wmf12u1) from Backlog to Watching on the netops board.
Wed, Jul 31, 9:02 AM · SRE, Traffic, netops, Infrastructure-Foundations
ayounsi assigned T371434: Q1:codfw:frack network upgrade tracking task to Papaul.
Wed, Jul 31, 7:33 AM · SRE, Infrastructure-Foundations, fundraising-tech-ops, netops, ops-codfw, DC-Ops

Tue, Jul 30

ayounsi added a comment to T370846: Netbox automation to move selected hosts from ASW to LSW.

We can potentially re-use the move_server.MoveServer script but make the server selection a MultiObjectVar as input and make the rack U and switch port optional.

Tue, Jul 30, 6:18 AM · netops, Infrastructure-Foundations, SRE

Mon, Jul 29

ayounsi changed the status of T344325: gNMI module in Spicerack from Open to Stalled.
Mon, Jul 29, 2:37 PM · Patch-For-Review, Infrastructure-Foundations, Spicerack, SRE-tools
ayounsi changed the status of T344325: gNMI module in Spicerack, a subtask of T320638: Add Dell switches support to Homer/Cookbooks, from Open to Stalled.
Mon, Jul 29, 2:37 PM · Patch-For-Review, SRE, netops, Infrastructure-Foundations
ayounsi added a parent task for T336275: Upgrade Netbox to 4.x: T371036: Netbox logs filling up disk, netbox1002.
Mon, Jul 29, 2:29 PM · Patch-For-Review, Infrastructure-Foundations, netbox
ayounsi added a subtask for T371036: Netbox logs filling up disk, netbox1002: T336275: Upgrade Netbox to 4.x.
Mon, Jul 29, 2:29 PM · Infrastructure-Foundations, netbox
ayounsi updated the task description for T371216: Route FR to esams.
Mon, Jul 29, 8:39 AM · Traffic
ayounsi updated the task description for T371216: Route FR to esams.
Mon, Jul 29, 8:36 AM · Traffic
ayounsi added a parent task for T336275: Upgrade Netbox to 4.x: T368513: Core router error logs: "sshd: Did not receive identification string" from prometheus hosts.
Mon, Jul 29, 8:12 AM · Patch-For-Review, Infrastructure-Foundations, netbox
ayounsi added a subtask for T368513: Core router error logs: "sshd: Did not receive identification string" from prometheus hosts: T336275: Upgrade Netbox to 4.x.
Mon, Jul 29, 8:12 AM · Infrastructure-Foundations, netops, SRE
ayounsi added a subtask for T371079: Change icinga link to alerts.w.o in netbox device page: T336275: Upgrade Netbox to 4.x.
Mon, Jul 29, 6:52 AM · Infrastructure-Foundations, netbox
ayounsi added a parent task for T336275: Upgrade Netbox to 4.x: T371079: Change icinga link to alerts.w.o in netbox device page.
Mon, Jul 29, 6:52 AM · Patch-For-Review, Infrastructure-Foundations, netbox

Thu, Jul 25

ayounsi updated subscribers of T371024: cookbook failed after the fist "go" host cloudcephmod1004.
Thu, Jul 25, 2:23 PM · SRE, DC-Ops, ops-eqiad, Infrastructure-Foundations

Mon, Jul 22

ayounsi created P66886 (An Untitled Masterwork).
Mon, Jul 22, 3:36 PM
ayounsi created P66881 (An Untitled Masterwork).
Mon, Jul 22, 9:13 AM
ayounsi added a comment to T363576: Broadcom NICs with recent firmware fail to reimage.

Amazing progress !

Mon, Jul 22, 7:13 AM · User-Elukey, DC-Ops, ops-codfw, Infrastructure-Foundations, SRE
ayounsi added a comment to T319301: Netbox: manage VRRP priorities.

This doesn't actually seem to be the case? We have no priority set on either router.

Indeed, not anymore since we moved to 100G between eqiad/codfw

Mon, Jul 22, 7:03 AM · Infrastructure-Foundations, netbox
ayounsi added a comment to T364870: Q4:rack/setup/install new cloudcephmon hosts.

There is an outstanding diff on the switch for cloudcephmon1006. It looks correct, but could DCops double check it and make sure the switch config is in sync with Netbox ?

Mon, Jul 22, 6:40 AM · SRE, cloud-services-team (Hardware), ops-eqiad, DC-Ops

Jul 18 2024

ayounsi added a comment to T363576: Broadcom NICs with recent firmware fail to reimage.

That's a great idea ! Happy to help if needed.
fixed-address sretest2001.codfw.wmnet; this needs to be fixed-address $some-ip-address;

Jul 18 2024, 2:53 PM · User-Elukey, DC-Ops, ops-codfw, Infrastructure-Foundations, SRE
ayounsi added a comment to T365372: Spicerack: expand Supermicro support in the Redfish module.

I'd suggest abstracting the device creation behind a custom script or cookbook. This could run additional safeguards, ask for additional information (and store it where necessary), and hide the unnecessary fields.

Jul 18 2024, 2:03 PM · Patch-For-Review, DC-Ops, Infrastructure-Foundations, SRE-tools, User-Elukey, Spicerack
ayounsi added a comment to T370164: Request additional mgmt IP range for frack servers.

Or we could just use an IPv6 /64 and stop worrying about space :)

Jul 18 2024, 7:51 AM · fundraising-tech-ops, Infrastructure-Foundations, DC-Ops, netops, SRE, ops-codfw
ayounsi closed T370366: Issue creating GNMI telemetry subscription to certain QFX5120 devices as Resolved.

Thanks for the investigation ! Seems like the last step was :

asw1-b3-magru> restart analytics-agent gracefully
Analytics agent started, pid 87102

Solved the issue.

Jul 18 2024, 7:38 AM · Infrastructure-Foundations, netops, SRE
ayounsi closed T370366: Issue creating GNMI telemetry subscription to certain QFX5120 devices, a subtask of T369384: Productionize gnmic network telemetry pipeline, as Resolved.
Jul 18 2024, 7:38 AM · netops, Infrastructure-Foundations, SRE

Jul 17 2024

ayounsi added a comment to T368513: Core router error logs: "sshd: Did not receive identification string" from prometheus hosts.

@cmooney @fgiunchedi I'm wondering if the probe could/should be changed to a TCP handshake only, or removed entirely, in order to reduce log spam.
Thanks to Rancid and the daily diff scripts we already get a notification within the hour if SSH becomes unreachable.
We would still keep the ICMP check as the "normal" connectivity probe.

Jul 17 2024, 9:29 AM · Infrastructure-Foundations, netops, SRE
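
A handshake-only check of the kind suggested in the comment above could look roughly like the following Python sketch. It is not the existing Prometheus probe: `tcp_handshake_ok` is a hypothetical helper that only opens and closes a TCP connection without sending any payload. Whether such a probe actually quiets the sshd log line is not something this sketch establishes.

```python
import socket


def tcp_handshake_ok(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    No data is sent; the connection is closed right after the handshake.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

A caller would pass the device's management address and the SSH port, for example `tcp_handshake_ok("router.example.net", 22)`, where the hostname is a placeholder.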
ayounsi added a comment to T370164: Request additional mgmt IP range for frack servers.

We will need to migrate the whole range to a new prefix :( Running 2 ranges is going to be a pain long term, and would need much more work on the automation side than a migration.

Jul 17 2024, 9:08 AM · fundraising-tech-ops, Infrastructure-Foundations, DC-Ops, netops, SRE, ops-codfw
cmooney awarded T355750: CFSSL gencert "remote error: tls: certificate require" a Love token.
Jul 17 2024, 9:04 AM · Infrastructure-Foundations, CFSSL-PKI
ayounsi closed T355750: CFSSL gencert "remote error: tls: certificate require" as Resolved.

Yep it's all good ! I manually added the host to gNMIc and metrics are properly being collected/exposed. Thanks !

Jul 17 2024, 8:14 AM · CFSSL-PKI, Infrastructure-Foundations

Jul 16 2024

ayounsi closed T369690: Netbox : sync src/ submodule as Resolved.

Thanks.
I went that way, master on https://gerrit.wikimedia.org/r/plugins/gitiles/operations/software/netbox has been updated to match upstream's master.
https://gerrit.wikimedia.org/r/c/operations/software/netbox-deploy/+/1053243 for the netbox-deploy side.

Jul 16 2024, 12:05 PM · netbox, Infrastructure-Foundations
ayounsi added a comment to T364092: Upgrade core routers to Junos 22.4R3.

There has been a spike of CPU usage on cr1-eqiad (with no impact), not sure if just a coincidence.

Jul 16 2024, 7:01 AM · netops, Infrastructure-Foundations, SRE
ayounsi closed T370048: cr3-ulsfo flapping on July 14 as Resolved.

Closing this task in favor of T364092: Upgrade core routers to Junos 22.4R3.

Jul 16 2024, 6:56 AM · SRE, Infrastructure-Foundations, netops

Jul 15 2024

ayounsi claimed T369690: Netbox : sync src/ submodule.
Jul 15 2024, 2:34 PM · netbox, Infrastructure-Foundations
ayounsi triaged T369136: Package ipxe-qemu as Low priority.
Jul 15 2024, 2:33 PM · Packaging, Infrastructure-Foundations