Nov 12, 2020

FUCK YOU HP (aka how to downgrade firmware on HP 254dw)

 Fucking shoot me in the face. Nothing in this world works, and any attempt to make it better just makes it worse.

This time it's HP. FUCK YOU HP.

< Actual ranting goes on for a while, just scroll past the image for downgrade steps >

I've had this image on my printer for the past few weeks, but since I didn't have to print anything, I figured it'd go away eventually. Printer has been pretty solid for the last 2 years, and just fed it brand new cartridges.


Well... Turns out a few things:

a. Last time I upgraded firmware, it enabled auto upgrade.

b. In Nov 2020, HP pushed a firmware that disables 3rd party cartridges, instantly bricking 1000s of customers' printers all over the world. FUCK YOU AGAIN. Nice write-up here: https://borncity.com/win/2018/09/19/blocks-hp-firmware-update-third-party-ink-catriges-again/ 

c. Now, instead of all colors for $50 from amazon, customers have to pay $600 for 4 high capacity cartridges.




Downgrade steps - w/o special installers

This process follows the HP article here: https://support.hp.com/nz-en/document/c01711356#AbT3
How to disable auto upgrades going forward:  https://www.youtube.com/watch?v=3DxGnet3XLg




Step 1. Download new firmware from HP FTP archives.

Grab both, 2019 and 2020 firmwares
2: ftp://ftp.hp.com/pub/networking/software/pfirmware/HP_Color_LaserJet_Pro_M254_dw_Printer_series_20200612.rfu  or anywhere on the internet that may host it by the time you read this.

Note: Looks like the 2020-06 firmware was published the same day as the November one, and it doesn't work as the first upload. So you'll be downgrading to 2019 first to unbrick your printer.


 


Step 2. Get your printer's IPv4 or IPv6 address (#3 or #5 below).



Step 2b: Enable FTP access in networking





Step 3: FTP into the printer and upload the file using the put command
Type put and type a space, and then drag and drop the 2019 .RFU file onto the terminal window.
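
If you'd rather script the upload than drag files into a terminal, here is a minimal Python sketch of the same put step using the standard ftplib module. The IP address and file name in the usage note are made up, and I'm assuming the printer accepts an anonymous FTP login once FTP is enabled. I have not checked this against every firmware, so treat it as a sketch.

```python
from ftplib import FTP
from pathlib import Path


def upload_firmware(ftp, firmware_path):
    """Send one .rfu file over an open FTP session, like typing `put <file>`."""
    path = Path(firmware_path)
    with path.open("rb") as fh:
        # STOR is the FTP command behind the interactive `put`.
        return ftp.storbinary(f"STOR {path.name}", fh)


def flash(printer_ip, firmware_path):
    """Connect, log in anonymously (an assumption), and upload the file."""
    with FTP(printer_ip, timeout=60) as ftp:
        ftp.login()
        return upload_firmware(ftp, firmware_path)
```

Usage would be something like flash("192.168.1.50", "firmware_2019.rfu"), with your printer's address from Step 2 and the 2019 file from Step 1.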




Step 4: Install firmware

Within a few seconds (up to 5 minutes per doc above) the printer will self reboot and self install it.



Step 5: To get 2020-06 version:

For whatever reason, I couldn't upgrade to 2020-06 immediately after 2019 version. I did get it to work using following steps (no idea if all of them are necessary, but I don't care to test)

After 2019 firmware is done, reboot printer.
Check for latest update, decline update.
Disable FTP
Enable FTP
Install 2020-06 successfully


Done

You're back in business with Amazon cartridges.
Now, maybe disable FTP printing, maybe not. Up to you ¯\_(ツ)_/¯ 

Nov 7, 2020

Booking tickets during Covid-19 with ITA

Years ago I read a wonderful blog post showing some advanced use-cases for ITA.

I have the following problem: I need a ticket from Russia to the US that doesn't go through any countries with locked borders due to Covid-19. 


So, Step 1: List of rules: 

Sorry I didn't clean up formatting, that's not the point of the post.

From CDC: https://www.cdc.gov/coronavirus/2019-ncov/travelers/from-other-countries.html


With specific exceptions, foreign nationals who have been in any of the following countries during the past 14 days may not enter the United States. For a full list of exceptions, please refer to the relevant proclamations in the links below.


And another one from Trip.com < Full URL>

United States: Partially restricted
Foreign nationals who have visited any of the following countries / regions within 14 days of their arrival in the United States are prohibited from entering: Austria, Belgium, Brazil, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Iran, Ireland, Italy, Latvia, Liechtenstein, Lithuania, Luxembourg, Mainland China, Malta, Monaco, Netherlands, Norway, Poland, Portugal, San Marino, Slovakia, Slovenia, Spain, Sweden, Switzerland, Vatican City, United Kingdom.

All persons returning to the United States from abroad should self-isolate and monitor their health status for 14 days. From March 20, 2020, the United States government has suspended the issuing of new visas. From March 21, 2020, all US land border checkpoints with Canada and Mexico are closed to non-essential traffic. From June 24, 2020, additional restrictions are in place for persons holding special category visas including H-1B, H-2B, J-1, and L-1 visas. Travelers should verify their status with the US Department of Homeland Security prior to making travel arrangements.


Next, I had to find that wonderful blog post:

 Luckily I found it very quickly. Turns out, not many people write about ITA

>>>   https://creativeroam.com/blog/advanced-flight-searching/  <<<

READ THIS:> The big reason I needed this post, was for the attachments. It has a very convenient list of ALL airport codes by continent and by region. These codes are needed for filtering in ITA. Otherwise, I would have to fetch all of the airport codes for each of the countries in the list. 

Btw, text extraction in OneNote didn't extract even 20%. If you've read this far into the blog post, I would appreciate a suggestion for a good tool to extract text from images.

This wonderful document is here: < Link to google excel



Next, feed this info into ITA:

ITA is here: https://matrix.itasoftware.com/

You need to enable Advanced Controls and add syntax to Routing codes to disable connections. The Question mark sign has documentation with examples on the knobs you can turn.



So, outbound extension code for me, is to Disable NY (JFK) and Chicago (ORD) as connections on US side, because there are some connecting flights via SFO. I learned it by poking around flight searches. Also, I had to Disable all of Europe.

Routing code for that is a space delimited list of airports. Just enable header filters in google sheets, and then paste codes straight from the column. Comma delimited list was the old format, and isn't needed (and doesn't work) any more. When it's ready, paste it into the outbound extension code box:

-cities JFK ORD SJJ SOF BUD SKP BUH OTP EVN BAK MSQ OMO BOJ GOZ ROU SLS TGV VAR VID DBV LSZ OSI PUY RJK SPU ZAD ZAG QUF TLL TBS RIX VNO OHD CND AER KHV HTA IKT KZN MRV MOW DME SVO VKO MMK OVB LED UUD VLU ARH YKS BTS LJU MBX KBP IEV LWO NLV ODS SIP BEG INI QND TGD PRN TIV TIA INN SZG VIE CPH HEL BER SXF TXL DRS HAM ATH HEW CFU KGS JMK MJT RHO SKG IBZ ORK DUB GWY KIR NOC SNN CAG MLA BGO OSL TRF KRK WAW LIS PDL PMI SVQ VLC GOT STO ARN BMA BHD BFS PIK GLA INV ALV GRZ KLU LNZ ANR BRU LGG AKT LCA QLI NIC PFO PRG AAR AAL BLL EBJ FAE KRP ODE RNN SKS SGD TED ENF IVL JOE JYV KAJ KHJ KEM KTT KOK KUO KAO LPP MHQ MIK OUL POR RVN SVL SJY SOT TMP TKU VAA VRK AJA LBI NCY AUR BIA BIQ BOD BES CLY CMF CFE DNR FSC FRJ GNB LRH LAI LIL LIG LRT LDE LYS MRS MZM MPL MLH ENC NTE NCE FNI PAR CDG LBG ORY PUF PGF UIP RNS RNE RDZ SBK EBU SXB TLS AGB BYU BRE CGN DTM DUS ERF FRA HNN FDH HAJ HOQ FKB KEL CGN LEJ MUC FMO MSR NUE PAD SCN STR GWT WIE GIB GPA CHQ JKH HER KLX AOK KVA PVK SKG SMI JSI JTR ZTH CXI EGS REK KEF SXL AHO AOI BRI BGY BLQ VBS BDS CTA FLR GOA SUF LMP MIL LIN MXP BGY NAP OLB PMO PNL PEG PSR PSA REG RMI ROM CIA FCO TPS TSF TRS TRN VCE VBS VRN LUX AMS HAG EIN LEY MST RTM AES ALF BDU BOO BNN EVE FRO HFT HAU KKN KRS KSU LKL SOG SVG TOS TRD GDN POZ SZZ FAO FNC HOR OPO PXO SMA TER ALC LEI ACE BJZ BCN BIO ODB FUE GRO GRX XRY LCG LPA MAD MAH AGP MJV OVD REU EAS SPC SDR SCQ TCI TFS TFN VLL VDE VGO VIT ZAZ LYR JHE JKG KLR KSD KRN KID LDK LLA MMA MMX NRK ORB RNB SDL VXO VST VBY ACH BSL BRN ZDJ GVA LUG ZRH EAP TFN TFS SZD ABZ BHX BRS CBG CWL EMA LDY EDI GCI HUY IOM JER LBA LPL LON LCY LGW LHR LTN STN MAN NCL KOI SOU SEN STN SYY LSI MME WIC
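
If you don't feel like pasting from a spreadsheet, the list above can be assembled with a few lines of Python. This is just a sketch: build_routing_code is a name I made up, it assumes you already have the airport codes as a list of strings, and it only reproduces the space-delimited "-cities ..." format shown above (dropping duplicates and anything that isn't a 3-letter code).

```python
def build_routing_code(airport_codes, prefix="-cities"):
    """Join airport codes into one space-delimited string for the extension code box."""
    seen = set()
    ordered = []
    for raw in airport_codes:
        code = raw.strip().upper()
        # Skip blanks, header cells, and anything that is not a 3-letter code.
        if len(code) != 3 or not code.isalpha():
            continue
        # Keep the first occurrence only; the order of the column is preserved.
        if code in seen:
            continue
        seen.add(code)
        ordered.append(code)
    return " ".join([prefix] + ordered)


print(build_routing_code(["JFK", "ORD", " cgn", "CGN", "", "Code"]))
```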


From here, you can now filter down your results using the dropdowns in UI, or going back and adding additional controls on search screen.


What the excel looked like:


 


That's it.
You can have google flights in another window. It's got much easier controls, but doesn't show all of the options.




Oct 29, 2020

Migrating Atlassian users from Google SSO to Azure SSO

 Scenario: You decided to move your company (and all of your users) from Google to Azure.

You would assume that as an Admin, you could bulk migrate all of your users, but you can't. Don't ask why, I am pretty sure Atlassian themselves don't know why.


Every user will have to migrate themselves, and work through Atlassian's "not bugs but features" bugs.


Step 0 (turns out this is important): Validate that you can log in with your email account and not Google SSO. 
  • Log out and log back in using @<your company>.com email address & password combination. If you don’t know, reset your password for @<your company>.com address. 
  • After you know your password, proceed.
  • Log in using Username / Password combination.

Next: Update your email address to different SSO

  • Now that you’re logged in, browse to https://id.atlassian.com/manage-profile. Then select “Email” (direct URL: https://id.atlassian.com/manage-profile/email )

    OPTIONAL: While you’re here, why not use the Email Preference Center to unsubscribe from marketing emails?

     

  • Enter your <new email address> in the box and save your changes.

  • You will get confirmation email in your <new email> inbox.

  • Open the email, click the link, and you’ll be taken to the login screen showing your <new email address> account.

  • Use your <new email address> and old password.

    THIS IS CRITICAL and a massive source of pain (which is "behavior as expected" according to Atlassian.)

    • If you don’t know your password, stop, log out and go to step 0. Resetting password here will create the new account, and you’ll have to go to troubleshooting section.


Last: Connect to Azure SSO

  • Super duper important (otherwise you’ll create additional headaches for yourself): Review your Atlassian profile to validate that your email address has been updated to <new email associated with Azure> (https://id.atlassian.com/manage-profile)

  • Log out

  • Log in using "Sign in with Microsoft" button

  • You will get an email in your new account with a long string of numbers.

  • Enter it into the prompt and you're done.

  • Now under your "Email" page in Account settings, you will see a banner that reads "Your account is connected to a Microsoft account. Changing the email address here will disconnect your account from the Microsoft account."

Final: If you’ve ever posted in the Atlassian forums, you will get another confirmation email with a link to approve Forums email address update.



Troubleshooting:

Q: At confirmation screen: you don’t know the password for your <new email address> 

A: This is actually the password for your <old account> account. Atlassian claims it's not a bug.



Q: You’re getting an error that account is already in use?

A: This means that somehow an Atlassian account already exists. You’ll have to free it up first, and then switch the <old account> to <new account>

The path of least resistance is to log into the conflicting account and update its email address to some other email address. (Don't forget to mark it for deletion when it's all done.)

Once that's done and your desired email address is freed up, you can attempt to switch the address to Azure SSO again using the steps above. 

Note: Because Atlassian creates accounts on login attempts without explicitly asking, this problem can happen multiple times. Sorry.

After a successful switch, log into the extra account and delete it from the "Account Preference" page (https://id.atlassian.com/manage-profile/account-preferences). It’ll take 2 weeks.



Q: You’re getting an error that you need to wait 24 hours due to email update limit.
A: Wait 24 hours.

Dec 9, 2018

troubleshooting Amazon Photo

Straight to the answer cause it's past midnight.

 get-winevent -listLog * -ErrorAction 0 |?{$_.recordcount -gt 0} | %{get-winevent -FilterHashtable @{logname=$_.logname}} | Where-Object -Property message -like "*photo*" | out-gridview

What was I trying to solve?
I was getting some error message in the AmazonPhoto window that didn't fit the screen. And the folder where Amazon Photo was installed didn't have any logs - at least none that I could find.

So, I concocted above powershell to scan all the logs to find all of the error messages that it could have thrown.

Where I started:
Get-ChildItem -recurse | Select-String -pattern "the external drive you were uploading from" | group path | select name

Answers:

C:\Users\xxx\AppData\Local\Amazon Drive\AmazonPhotos.exe
C:\Users\xxx\AppData\Local\Amazon Drive\en-AU\AmazonPhotos.resources.dll
C:\Users\xxx\AppData\Local\Amazon Drive\en-GB\AmazonPhotos.resources.dll
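
For what it's worth, the "where I started" search is easy to reproduce outside PowerShell. Below is a rough Python sketch of the same idea: walk a folder and list every file that contains a given string. files_containing is a made-up name, and checking both UTF-8 and UTF-16 is my assumption about how strings end up stored inside .exe/.dll files, not something I verified for Amazon Photos.

```python
from pathlib import Path


def files_containing(root, needle, encodings=("utf-8", "utf-16-le")):
    """Return paths of all files under root whose raw bytes contain needle."""
    # Encode the search string once per encoding so binaries can be matched too.
    needles = [needle.encode(enc) for enc in encodings]
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            data = path.read_bytes()
        except OSError:
            # Locked or unreadable files are skipped rather than aborting the scan.
            continue
        if any(n in data for n in needles):
            hits.append(str(path))
    return hits
```

Something like files_containing(r"C:\Users\xxx\AppData\Local\Amazon Drive", "the external drive you were uploading from") would be the equivalent call. It reads each file fully into memory, so it's fine for an app folder but not for a whole disk.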

Nov 12, 2018

Deleting amazon glacier vault (crash course in AWS/Glacier/CLI)


Part 1


WOW! Who would have thought it to be so painful!

Some background, feel free to skip:
The problem is that the Glacier UI doesn't support deleting files. Only the CLI/SDK are supported.

The other problem is that AWS charges for transfer to Glacier, and for maintenance. So an entire QNAP full of pictures sent to Glacier is ~$50 per month, plus I would guess a few $100 to restore. The plus here is that setting up Glacier backup on the QNAP was pretty damn easy.

Initially when I compared Glacier to other file storage, it looked cheaper and safer, but with the current pricing model it's very poorly suited to a bunch of small files. At 1.6 TB of pictures, I ended up with 800'000 files (aka archives) and my transfer bill became $40+ per month with the storage bill at $6.20 per month. I had 6 more TB to go.

------- Start -------
Been a while since I logged into the AWS console. First, I had to create a new IAM user and give it Glacier full control and S3 full control[1]. Expert advice: make a backup, but keep this tab open.

Then I installed the PowerShell tools and found out they don't support Glacier. Don't bother installing the PowerShell tools.

Next, install the developer tools. Turns out they were the wrong ones, and Windows 10 blocked whatever they had to do, so it didn't work anyway (this is about an hour in at this point). Install the next set of dev tools, and it looks like it worked (AWSCLI64PY3.msi)[2].

get aws creds configured with aws configure

Dick around with the aws help glacier and end up nowhere. I can't delete vault cause it has items inside. Cant delete items, because you don't know their IDs and it's poorly documented anyway. There are articles going back to 2013 with people suffering.

You can use Linux-style abbreviated options, so that's useful:
aws glacier describe-vault --va qnap --a -

It looks like you can get the full inventory, mark all archives for deletion, and then feed it back into delete-archive command.

The command in the helpfile doesn't work, so you have to massage it in the most awkward way with double quotes:
aws glacier initiate-job --acc - --va qnap --job-parameters '{""Type"": ""inventory-retrieval""}'

Mind you, this is probably nearing two hours at this point. Looks like the command takes a long time to run, so I went off to watch Westworld. Gonna resume tomorrow.

aws glacier list-jobs --ac - --va qnap
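
For the record, the "feed the inventory back into delete-archive" idea would look roughly like the Python sketch below. It assumes you already saved the finished inventory job's output to a JSON file (the CLI has aws glacier get-job-output for that) and that the JSON has an ArchiveList with an ArchiveId per entry. delete_archive_commands is a made-up helper name. It only builds the command lines and doesn't run anything, so you can eyeball them first.

```python
import json


def delete_archive_commands(inventory_json, vault="qnap", account_id="-"):
    """Build one `aws glacier delete-archive` argv list per archive in the inventory."""
    inventory = json.loads(inventory_json)
    commands = []
    for archive in inventory.get("ArchiveList", []):
        commands.append([
            "aws", "glacier", "delete-archive",
            "--account-id", account_id,
            "--vault-name", vault,
            # The = form keeps IDs that start with "-" from being parsed as options.
            f"--archive-id={archive['ArchiveId']}",
        ])
    return commands
```

Each list can be handed to subprocess.run as-is. With 800'000 archives that's 800'000 CLI invocations, which is a big part of why a purpose-built tool is the saner route.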

------------------------

Part 2


I finally gave up, and went with https://github.com/leeroybrun/glacier-vault-remove

In order to not contaminate my home box, I used Vagrant and VirtualBox to stand up some flavor of Linux (I think) and then hacked inside that. It was pretty quick to get going.

Main point is you need Python 3 (more info here: https://github.com/leeroybrun/glacier-vault-remove/issues/29)


Links:
1. https://console.aws.amazon.com/iam
1h. https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html

2. https://docs.aws.amazon.com/acm-pca/latest/userguide/PcaInstallCLI.html

Apr 15, 2017

Working with multiple Chef Orgs and persisting Amazons EC2 access across multiple knife.rb files.


So, I've been doing something for so long, I didn't even realize it.

The way knife looks for configuration is by looking for a .chef folder in your current location, and then walking up through the parent directories until it finds one.
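
To make that lookup concrete, here is a small Python sketch of the same "nearest .chef wins" search. It is not knife's actual code (knife is Ruby and has a few more fallbacks), and find_chef_dir is a name I made up. It only illustrates why a .chef in your home folder gets picked up from any project underneath it.

```python
from pathlib import Path


def find_chef_dir(start):
    """Return the closest .chef directory at or above start, or None if there is none."""
    current = Path(start).resolve()
    # Check the starting directory first, then each parent up to the filesystem root.
    for directory in [current, *current.parents]:
        candidate = directory / ".chef"
        if candidate.is_dir():
            return candidate
    return None
```

Calling find_chef_dir(".") from inside a cookbook shows which .chef folder would be in effect there.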

Setting up .chef folder

So, I've created my .chef folder in  ~/     (c:\chef on windows).

Next, you cd .chef and  git init it - no remote, just a local repo.
You can't have completely empty git repo, so touch a file and do an initial commit on master.
Now, make a branch (git checkout -b my_org_06) with some coherent naming convention.
Now, grab your starter kit or just make files by hand.
git add . ; git commit -m 'my keys before I accidentally re-downloaded the starter kit'
Done, Next

Setting up second .chef folder

When you need keys for another org or another chef server:
git checkout master
Now, make a branch (git checkout -b my_org_07) with some coherent naming convention.
Now, grab your starter kit or just make files by hand.
git add . ; git commit -m 'my keys before I accidentally re-downloaded the starter kit'

Setting up knife to work with AWS

simply add the following to knife.rb:
knife[:aws_access_key_id] = ENV['AWS_ACCESS_KEY']
knife[:aws_secret_access_key] = ENV['AWS_SECRET_KEY']

wait what? hold on...

Setting up AWS command line tools

add the following with your keys to your bash / zsh / emacs / etc.. 
export AWS_ACCESS_KEY='DFJKLWEISKDJFKLSDFJK'
export AWS_SECRET_KEY='ec/d8HwiDkwork802idvnwl9f/e9KEoos09kxlwd'
export AWS_CREDENTIAL_FILE='/Users/alexvinyar/supersecretlocation/alexv.pem'
export JAVA_HOME=$(/usr/libexec/java_home)

Now your knife ec2 and your ec2-* commands will work from anywhere.

Setting up project

On pretty rare occasions you may need to have .chef inside the repo for whatever reason.
Symlink .chef into your cookbook or project, and now it's as if it was local.

ln -s ~/.chef .chef

Oct 5, 2016

Modifying chef resources after they're already in a resource collection

BOOM!

First, lets fire up chef-shell to demonstrate by creating a basic resource

$ chef-shell  
chef (12.14.57)> recipe_mode
chef:recipe (12.14.57)> file 'testing_edit' do
chef:recipe > content 'words'
chef:recipe ?> end
 => <file[testing_edit] @name: "testing_edit" @noop: nil @before: nil @params: {} @provider: nil @allowed_actions: [:nothing, :create, :delete, :touch, :create_if_missing] @action: [:create] @updated: false @updated_by_last_action: false @supports: {} @ignore_failure: false @retries: 0 @retry_delay: 2 @source_line: "(irb#1):1:in `irb_binding'" @guard_interpreter: nil @default_guard_interpreter: :default @elapsed_time: 0 @sensitive: false @declared_type: :file @cookbook_name: nil @recipe_name: nil @content: "words">

Easy way to modify resource collection

Now, I am going to modify this resource using a NEW helper, edit_resource:
chef:recipe (12.14.57)>
chef:recipe >
chef:recipe > edit_resource(:file, 'testing_edit') do
chef:recipe > content 'different words'
chef:recipe ?> end
 => <file[testing_edit] @name: "testing_edit" @noop: nil @before: nil @params: {} @provider: nil @allowed_actions: [:nothing, :create, :delete, :touch, :create_if_missing] @action: [:create] @updated: false @updated_by_last_action: false @supports: {} @ignore_failure: false @retries: 0 @retry_delay: 2 @source_line: "(irb#1):1:in `irb_binding'" @guard_interpreter: nil @default_guard_interpreter: :default @elapsed_time: 0 @sensitive: false @declared_type: :file @cookbook_name: nil @recipe_name: nil @content: "different words">
chef:recipe (12.14.57)>

Coolness (take a resource out of play by setting its action to :nothing):

edit_resource(:file,'testing') do
chef:recipe > action :nothing
chef:recipe ?> end
 => <file[testing] @name: "testing" @noop: nil @before: nil @params: {} @provider: nil @allowed_actions: [:nothing, :create, :delete, :touch, :create_if_missing] @action: [:nothing] @updated: false @updated_by_last_action: false @supports: {} @ignore_failure: false @retries: 0 @retry_delay: 2 @source_line: "(irb#1):1:in `irb_binding'" @guard_interpreter: nil @default_guard_interpreter: :default @elapsed_time: 0 @sensitive: false @declared_type: :file @cookbook_name: nil @recipe_name: nil @content: "words">


Oct 4, 2016

Setting up chef Automate / Workflow (aka: delivery) in completely air gapped environment - level 1

Manual delivery install in airgapped env (AWS in Oregon)


Creation of Air-gapped environment



Create DHCP option set

Create vpc 'alexv-manual automate in airgapped env'
  set DNS resolution to Yes
  set DNS Hostname to Yes
  set DHCP option set to one above

Create a Windows 'jump box' inside VPC
  network: vpc above
  subnet - create new
    VPC - vpc above
    AZ - no pref
    CIDR - same as vpc
  refresh vpc field and select subnet
  assign public IP - true
  Network - default
  Storage - default
   (make sure you have enough free space to store all of the binaries needed inside VPC)
   (i used 40 gigs) - this may mean that you have to expand default hard drive to occupy full HD space
  Tag:
    Name - alexv-jump box
  SG:
    create new SG 'jump box'
    RDP - anywhere
    HTTP - anywhere
    HTTPS - anywhere
  Select your keypair
  ** On your local Mac inside the RDP app, enable folder redirection when you add this box.
     set folder redirect to the location where your delivery.license file lives
  on the windows box, install filezilla - to make it easy to transfer files

Installing Delivery


Create Chef Server
  m3.medium
  VPC - same as above
  auto assign public ip - false
  storage - change to 30
  tag - alexv-chef-server
  SG:
    create new SG "chef server"
    open port 22
    open  All ICMP
    10000-10003
    8989
    HTTP
    HTTPS
  Keypair - select yours

Create Workflow server
  click on chef-server, select more like this
  VPC - same as above
  Subnet - internal subnet
  auto assign public ip - false
  storage - change to 30
  tag: alexv-Workflow-server
  SG:
    create new SG "Workflow server"
    open  port 22
    open  All ICMP
    10000-10003
    8989
    HTTPS
    HTTP
    (maybe needed?) 9200 - due to elastic search get errors
    (maybe needed?) 5672 - due to another elastic search failure?
  Keypair - select yours

Create Windows (or *nix) build node
  network: vpc above
  subnet
    VPC - vpc above
    AZ - no pref
    CIDR - same as vpc
  refresh vpc field and select subnet
  assign public IP - false
  Network - default
  Storage - default
  Tag:
    Name - alexv-windows build node
  SG:
    create new "windows build node"
    open RDP - anywhere
    open All ICMP
    open 5984-5986 anywhere (for WinRM)
  Select your keypair

  Internet Gateway:
    create internet gateway - alexv-air gapped
    attach to VPC (above)

  Route:
    when you create VPC, it created a route table
    edit:
      add 0.0.0.0/0 -> point at internet gateway
    Save

Create 4 CentOS boxes to be environment nodes
  medium size
  HDD default
  SG - copy from workflow server
  Name SG "environment nodes"
  create

Create 2 CentOS boxes to be build nodes
  medium size
  HDD - 15 gigs
  SG - copy from workflow server
  Name SG "build nodes"
  create


Actually Install and Configure Automate

on the Chef Server and Automate node - follow directions
===================
disable ipv6 in /etc/hosts
make sure they can ping each other
make sure they can resolve dns of each other
make sure they can't access the internet


Jump Box (or workstation)
===================
copy target os binaries into jump box: chef server, automate, push jobs server, chefdk, chef manage, supermarket if needed.
copy binaries to correct server /tmp folder
copy chefdk for use on workstation as a management node
setup user ssh auth
  ssh-keygen -t rsa -b 4096 -C "you@example.com"


Chef Server
===================
install chef server per directions
chef-server-ctl user-create alex alex alex@chef.io 'alexalex' --filename /tmp/alex_user.pem
chef-server-ctl org-create alex_org 'Fourth Coffee, Inc.' --association_user alex --filename /tmp/alex_org-validator.pem

install push jobs per directions:
  sudo chef-server-ctl install opscode-push-jobs-server --path /tmp/opscode-push-jobs-server.x86_64.rpm

sudo chef-server-ctl user-create delivery delivery user deliver@chef.io 'alexalex' --filename /tmp/delivery_user_key.pem
sudo chef-server-ctl org-create automate_org 'org description'  --filename /tmp/automate_org-validator.pem -a delivery

Install manage: (optional)
sudo chef-server-ctl install chef-manage --path /tmp/chef-manage-2.4.3-1.el6.x86_64.rpm
reconfigure chef, push, manage



on the Delivery server
===================
install delivery
setup command: sudo delivery-ctl setup \
                      --license /tmp/automate.license \
                      --fqdn ip-10-0-0-67.ec2.internal \
                      --key /tmp/chefserver/delivery_user_key.pem \
                      --server-url https://ip-10-0-0-80.ec2.internal/organizations/automate_org
copy all PEMs from chef server to delivery (validator, admin, delivery_user)
Enter name of your enterprise
  example: alex_ent
  (note: look for a bug here where enterprise is created, but admin creds are not displayed nor created in /etc/delivery/<enterprise-admin-credentials>)
  (if bugged) create enterprise manually
    delivery-ctl create-enterprise alex_ent --ssh-pub-key-file=/etc/delivery/builder_key.pub
Copy ChefDk binary to /tmp/chefdk-0.18.30-1.el6.x86_64.rpm
install build node
  sudo delivery-ctl install-build-node -I /tmp/chefdk-0.18.30-1.el6.x86_64.rpm -f 10.0.0.23 -u chef -P chef

Verify build node works with `knife node status`
  this will query push jobs server for status of each node (different from knife status)
  available means push jobs can communicate with the node (you will know that at least push jobs is running at this point)
Verify you can fire off a push job:
  knife job start chef-client --search '*:*'

create user (via UI or CLI)

add public ssh key from workstations `ssh-keygen` step to the user
  delivery ui -> user -> ssh pub key

Jump Box (or workstation)
===================
Install chefdk
configure knife.rb with delivery key for communication with chef server
  example:
  node_name            'delivery'
  chef_server_url       "https://ip-10-0-0-80.ec2.internal/organizations/automate_org"
  client_key           'C:\Users\chef\.chef\delivery.pem'
  trusted_certs_dir    'C:\Users\chef\.chef\trusted_certs'
  # analytics_server_url 'https://cad-chef-server/organizations/cad'
  cookbook_path 'C:\Users\chef\chef-demo\cookbooks'

fetch certs if needed
  knife ssl fetch

verify knife works
 knife node list
 (or from delivery server)
  knife node list -k /etc/delivery/delivery.pem -u delivery --server-url https://ip-10-0-0-80.ec2.internal/organizations/automate_org

Pull down all of the cookbook dependencies to be used in air-gapped env (i do it via berks)
  mkdir repo
  cd repo
  chef generate cookbook staging (this will be the first test cookbook)
  modify metadata.rb of seeding cookbook to include:
    depends 'delivery-truck'
    depends 'push-jobs'
    depends 'build_cookbook'
    depends 'delivery_build'
  mkdir seeding
  cd staging\.delivery\build_cookbook
  run `berks vendor ..\..\..\seeding` to pull down all dependencies into a local folder

upload necessary cookbooks up to chef server
  knife cookbook upload -o seeding -a
  (or alternatively `knife cookbook upload delivery-truck --include-dep -o seeding`)

test ssh auth to delivery box
  ssh -l alex@alex_ent -p 8989 ip-10-0-0-67.ec2.internal

Configure delivery cmd - C:\Users\chef\cookbooks\staging\.delivery\cli.toml
  in root of staging cookbook$ delivery setup -e alex_ent -o automate_org -s ip-10-0-0-67.ec2.internal -u alex

make sure you can interact with delivery via delivery cli:
  Verify API works
    delivery api get users
    delivery api get orgs
  verify you can create a project
    create a cookbook
    `delivery init` inside that cookbook


First pipeline
===================
i'll use staging cookbook as it's a nice example
initialize delivery pipeline
  inside staging cookbook run `delivery init`
bump metadata.rb if needed
modify config.json to exclude spec and test folders due to foodcritic testing them, leading to workflow epic failing on linting phase.
  $ cat config.json
      {
        "version": "2",
        "build_cookbook": {
          "name": "build_cookbook",
          "path": ".delivery/build_cookbook"
        },
        "delivery-truck":{
          "lint": {
            "foodcritic": {
              "excludes" : ["spec","test"]
            }
          }
        },
        "skip_phases": [],
        "build_nodes": {},
        "dependencies": []
      }
change Berksfile (of build cookbook)
  Since you're not connected to internetz, you'll fail all phases of workflow due to Berksfile
  change source to :chef_server
    $ cat Berksfile
      source :chef_server
      # or your internal supermarket
      metadata

      group :delivery do
        cookbook 'delivery_build'#, chef_api: :config
        cookbook 'delivery-base'#, chef_api: :config
        cookbook 'test', path: './test/fixtures/cookbooks/test'
      end

      #original
      # group :delivery do
      #   cookbook 'delivery_build', git: 'https://github.com/chef-cookbooks/delivery_build'
      #   cookbook 'delivery-base', git: 'https://github.com/chef-cookbooks/delivery-base'
      #   cookbook 'test', path: './test/fixtures/cookbooks/test'
      # end

add and commit changes
  git add -u
  git commit -m 'very descriptive comment'
  delivery review
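
If you end up making the config.json change above on more than one project, it can be scripted. Here is a small Python sketch that adds the foodcritic excludes to whatever config.json you give it. exclude_from_foodcritic is a made-up name, and the key layout is copied from the example above. I'm not claiming this covers every config.json version.

```python
import json


def exclude_from_foodcritic(config_text, folders=("spec", "test")):
    """Return config.json text with the given folders added to the foodcritic excludes."""
    config = json.loads(config_text)
    # Create the nested delivery-truck -> lint -> foodcritic keys only if they are missing.
    lint = config.setdefault("delivery-truck", {}).setdefault("lint", {})
    excludes = lint.setdefault("foodcritic", {}).setdefault("excludes", [])
    for folder in folders:
        if folder not in excludes:
            excludes.append(folder)
    return json.dumps(config, indent=2)
```

Read .delivery/config.json, pass the text through this function, write it back, and then do the usual git add / commit / delivery review. Running it twice doesn't add duplicates.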


Bill of Materials:
===================
Filezilla (windows) - management node
Chef-server-core-12.9.1
delivery-core-0.5.346
push-jobs-1.1.6
chefdk-0.18.30-1.el6.x86_64.rpm
  note: seems like chefdk 0.17.17 doesn't work in an isolated environment (Yajl error)
chefdk-0.18.30 for windows
chef manage rpm
supermarket rpm
berks vendor of `build cookbook`
  should include all of the following:
     build-essential
     build_cookbook
     chef-ingredient
     chef-sugar
     chef_handler
     compat_resource
     delivery-base
     delivery-sugar
     delivery-truck
     delivery_build
     dmg
     git
     mingw
     packagecloud
     push-jobs
     runit
     seven_zip
     test
     windows
     yum
     yum-epel
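
Since a missing cookbook only shows up once you're already inside the air gap, it's worth checking the vendored folder before copying it over. A minimal Python sketch, assuming `berks vendor` wrote one folder per cookbook into the target directory (missing_cookbooks is a made-up name, and the list is just the one above):

```python
from pathlib import Path

# Cookbook names taken from the bill of materials above.
EXPECTED_COOKBOOKS = [
    "build-essential", "build_cookbook", "chef-ingredient", "chef-sugar",
    "chef_handler", "compat_resource", "delivery-base", "delivery-sugar",
    "delivery-truck", "delivery_build", "dmg", "git", "mingw", "packagecloud",
    "push-jobs", "runit", "seven_zip", "test", "windows", "yum", "yum-epel",
]


def missing_cookbooks(vendor_dir, expected=EXPECTED_COOKBOOKS):
    """Return the expected cookbook names that have no folder under vendor_dir."""
    present = {p.name for p in Path(vendor_dir).iterdir() if p.is_dir()}
    return sorted(set(expected) - present)
```

An empty list from missing_cookbooks("seeding") means everything on the list made it into the vendor folder.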


troubleshooting.
================

*) The setup command *may* create an enterprise for you. If you see that behavior, and do not get credentials as an output, you will have to delete the enterprise, and create it again using create-enterprise command.

*) node create command installs push jobs via this script:
https://github.com/chef/delivery/blob/114649cc8d6ddbf494a9666ef476e6a4b8523a7f/omnibus/files/ctl-commands/installer/gen_push_config.sh
..which is called by this script:
https://github.com/chef/delivery/blob/2ab9d4809e4ac1f237b52ee20088b1ac68d85af4/omnibus/files/ctl-commands/build-node/installer.rb#L217


Aug 11, 2016

Getting started with DevOps - Basic Chef (and any other CI/CD environment)

I got a question a few weeks back, and I think the answer is worth sharing.


  • Do we need to have a chef workstation hosted in our cloud environment that everyone logs in to (something like a jump box configured with chefdk and all the plugins), or can users spin up VMs off their own machine and that is used as the Chef Workstation? 
  • If spinning up VMs off our machine, how do we connect to chef-repo which I’m setting up in AWS? 
  • We need to connect to git for source control. I have set up an enterprise git instance - how do I change the cookbooks to connect to our instance of git?


This is going to be a lot of words, because there is no easy answer...
What one starting a similar journey should do though, is take the below suggestions, and run through them iteratively. Version 1, ver 2, 3...etc. Don't try to do everything at once.

Also, https://github.com/chef-customers/dojo-assessment-guide is a fantastic tool to figure out where you are in the DevOps journey, and where others typically go.

Git:

Source control will be your base, so it's first in the list.

For git, there are a lot of articles about "forking the code" and the eventual price of having done that. So, when it comes to community cookbooks, best course of action is - don't fork community cookbooks (or at the very least don't fork it for very long).

If you rely on a community cookbook, take it, along with the full git history, and upload it into your private git and your chef server. Any changes to community cookbooks should be done via `wrapper cookbook` (mycompany_apache for apache, etc..).

If some feature is not supported (or you found a bug), make a change and ASAP push the change back to the community via PR - this is so you don't have to maintain the fork, and can take advantage of improvements in the public version. (example: you're using apache 6 with your private hotfix, apache 7 comes out, and it's drastically different. Due to your custom changes, you need to spend 20 hours merging the versions and applying any custom hotfixes you've accumulated. You further fork the code to make it work. You spend all your days fixing bugs. Your head hurts from drinking too much coffee....)

Chef repo:

So, a good pattern is to have a git repo called chef_base or mycompany_base, etc..
Usually, a user starts by cloning this repo locally. Any updates that would affect the whole company would be pushed back to git, so every user can benefit from them.

In the chef-base you'll have your environments folder with various environments, roles with roles, and a folder called cookbooks (execute `chef generate repo test` for a basic example). You would have chefignore and .gitignore filled out as per your org rules. You would then do something about the .chef folder - either have a symlink that points to a known location, use a known variable to load the file, or leave it up to the user to fill in the details. Typically the .kitchen or vagrant file to stand up a cloud workstation would live here (more on this later). The \cookbooks folder is either empty or has a global cookbook like chef_workstation in there.

So, when a user starts working with chef, they clone chef_base and have everything they need to get started. All they do, is go into \cookbooks folder, and git clone the thing they want to work on.  This keeps chef-repo and each cookbook they work on completely independent. If they want to add new community cookbooks to your org, they follow the same process as above: clone community cookbook into /cookbooks and push it internally.

Chef workstation:

So, you definitely want each user to have their own cloud workstation. Also, they should have the ability to create/destroy them whenever they need to. On average, workstations don't survive longer than a week (nor should they).

Locally is pretty easy. Use test kitchen, or vagrant, mount the local chef-repo folder inside the VM and you're done.
(Here is how kitchen would work: https://github.com/test-kitchen/kitchen-vagrant#-synced_folders)
(here is how vagrant would work: https://www.vagrantup.com/docs/synced-folders/basic_usage.html)

With Amazon/Azure/VMware/etc., mounting a folder is done differently. In this scenario, when users run Test Kitchen, it would create additional VMs in the cloud. You'll need to set up a sync mechanism if your virtualization platform doesn't support mounting a local folder on a VM. You could give users an EBS volume they could share across the workstation and the local dev machine. Or just a regular network share they can mount locally and on a workstation.  Also, I heard https://atom.io/packages/remote-sync works really well, however I never touched it personally.

Key takeaway here is that lots of companies are going down the VM road.

The long version is that this decision will be guided by a couple of factors - how powerful your users' workstations are, your company's business direction, and how much money your DevOps initiative has been given.

What I've seen in the wild is very interesting. The best and the worst shops use nearly identical workstation hardware. On the one end of the bell curve, there are companies where employees have 2 gig laptops incapable of opening notepad in under 30 seconds. All work is done locally, and is very painful. On the other end, you also have 2 gig laptops - though typically surface and mac book minis - however, in this case, all of the work is done in the cloud, and these machines are plenty powerful since they are simply used to RDP into remote resources (and used for skype and facebook the rest of the time).


Hope that helps.
Alex-

Apr 25, 2016

Resetting opscode-reporting password

Once in a while upgrading opscode-reporting goes wrong. Or it doesn't start, you do manual clean up, and basically passwords go out of sync.

The solution is pretty straightforward - reset the passwords to what the system thinks the passwords should be.

1.
Open up /etc/opscode-reporting/opscode-reporting-secrets.json
Grab opscode_reporting & opscode_reporting_ro passwords and pipe them to opscode-pgsql

echo "ALTER USER opscode_reporting PASSWORD 'XXXXX' " | su -l opscode-pgsql -c 'psql'
echo "ALTER USER opscode_reporting_ro PASSWORD 'XXXXX' " | su -l opscode-pgsql -c 'psql'

You should get the result "ALTER ROLE" from each of the 'echo' commands

2.
Next, make sure rabbitmq password is in sync:
In the same .json file, in the "opscode_reporting" section, grab the "rabbitmq_password" and use it in place of XXXXX

PATH=/opt/opscode/embedded/bin:$PATH rabbitmqctl change_password runs XXXXX

3.
then chef-server-ctl restart opscode-reporting


4.
And finally, you might still be broken.
If you look at the process list and see an error similar to below, send the HUP to svlogd to reload the configs.


root      1456  0.0  0.0   4092   196 ?        Ss    2015   3:12 runsvdir -P /opt/opscode/service log: vlogd: pausing: unable to rename current: /var/log/opscode/opscode-reporting: file does not exist?svlogd: pausing: unable to rename current: /var/log/opscode/opscode-reporting: file does not exist?svlogd: pausing: unable to rename current: /var/log/opscode/opscode-reporting: file does not exist?svlogd: pausing: unable to rename current: /var/log/opscode/opscode-reporting: file does not exist?

So grab the correct pid by running chef-server-ctl status

...
run: opscode-reporting: (pid 17407) 30088s; run: log: (pid 32415) 88051s
...

kill -HUP 32415
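
If you'd rather not eyeball the status output, pulling the log pid out can be sketched in a few lines of Ruby. This assumes the status line looks exactly like the sample above; the helper name log_pid_for is made up for illustration and is not part of chef-server-ctl.

```ruby
# Hypothetical helper: extract the pid of the "log" (svlogd) process for a
# service from `chef-server-ctl status` output.
# Assumes the line format matches the sample shown above.
def log_pid_for(status_output, service)
  line = status_output.lines.find { |l| l.include?("#{service}:") }
  return nil unless line

  pid = line[/run: log: \(pid (\d+)\)/, 1]
  pid && pid.to_i
end

sample = "run: opscode-reporting: (pid 17407) 30088s; run: log: (pid 32415) 88051s\n"
pid = log_pid_for(sample, 'opscode-reporting')
puts "kill -HUP #{pid}" if pid
```

The helper returns nil when the service is missing or its log process is not in the "run" state, so nothing gets printed in that case.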


Apr 3, 2016

Chef - Passing output of one resource as input to the next

There are a couple of ways to do that.

One is via lazy

directory '/tmp' 
file '/tmp/alex.txt' do
  content 'sudo make me a sandwitch'
end 
ruby_block "something" do
    block do
      command = 'cat /tmp/alex.txt'
      command_out = shell_out(command)
      node.set['a'] = command_out.stdout
    end
    action :create
end 
file '/tmp/alex2.txt' do
  action :create
  owner 'root'
  group 'root'
  mode '0644'
  content lazy { node['a'] }
end
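
Why lazy is needed at all: Chef reads the whole recipe first (compile), and only then runs the resources (converge). Below is a plain-Ruby imitation of those two phases. It is not how Chef implements lazy; the hash and the lambda are stand-ins I made up to show the timing.

```ruby
# Plain-Ruby imitation of Chef's two phases; this is NOT Chef's implementation.
node = {}

# "Compile phase": resources are declared, nothing has run yet.
eager_content = node['a']          # read right away, attribute is not set yet
lazy_content  = -> { node['a'] }   # wrapped in a block, read later

# "Converge phase": the ruby_block runs and fills in the attribute.
node['a'] = 'contents of /tmp/alex.txt'

puts eager_content.inspect   # what a non-lazy `content node['a']` would see
puts lazy_content.call       # what `content lazy { node['a'] }` would see
```

The eager read happens before the attribute exists, which is the same reason a plain `content node['a']` in the recipe above would write an empty file.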

Mar 31, 2016

Powershell and chef - how to view the script powershell_script resource is executing

So, I was troubleshooting passing arrays to powershell_script resource.

Troubleshooting powershell_script

First - the easy way. Just run chef-client -l debug. In debug logging, you can see the whole script, which might be enough.

What makes troubleshooting powershell_script difficult, is the way it works from inside chef. A temporary file is created, and immediately nuked after execution, making it somewhat difficult to see exactly what happened.

After some messing around, I realized a simple trick:
powershell_script 'script_name' do
  code <<-EOH
    copy $MyInvocation.ScriptName c:/chef_powershell_staging_file.ps1
  EOH
end

Passing array node attributes to powershell_script:

Seems that in defining a generic array, ruby inserts square brackets [ ] which actually become part of the string when consumed by powershell_script, and powershell chokes on it.
default['test_cookbook']['array'] = 'string1','string2' 
default['test_cookbook']['array'] = %w(string1,string2)
In both of the above, Powershell will either throw an error or generally not work as expected
Missing type name after '[' 
What actually happens, is during resource declaration phase, the square brackets get escaped (you can see it via chef-shell by creating a simple bash or powershell_script resource)


chef:attributes (12.8.1)> default['test_cookbook']['array'] = 'string1','string2'
=> ["string1", "string2"]
for example bash:
chef:recipe >bash 'some-bash' do
chef:recipe > code <<-EOH
chef:recipe"> $Array = #{node['test_cookbook']['array']}
chef:recipe"> EOH
chef:recipe ?> end
=> <bash[Set-VariableArray] .... @code: " $Array = [\"string1\", \"string2\"] \n" ... 
using native ruby:
attribute:
default['a']['b'] = %w(a,b,c)
keeping the recipe the same, the resulting code will be:
... @code: " $Array = [\"a,b,c\"] \n" ... 

Solution - simple in retrospect - double quotes:
node.default['a'] = "'value1', 'value2', 'value3'"
In your recipe, you'll get an actual powershell array:

powershell_script 'script_name' do
  code <<-EOH
    Out-File -InputObject "#{node['a']}".GetType() c:/out.txt
  EOH
end
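
If you'd rather keep a real Ruby array in the attribute, the same string can be built at recipe time. A minimal sketch in plain Ruby, no Chef required; to_powershell_array is a made-up helper name, not something Chef ships with.

```ruby
# Hypothetical helper: render a Ruby array as a PowerShell array literal.
# Single quotes inside a value are doubled, which is how PowerShell escapes
# them inside single-quoted strings.
def to_powershell_array(values)
  values.map { |v| "'#{v.to_s.gsub("'", "''")}'" }.join(', ')
end

strings = %w(string1 string2)

# Plain interpolation calls Array#to_s, which is where the brackets come from.
puts "$Array = #{strings}"
# The helper emits a comma-separated list of quoted strings instead.
puts "$Array = #{to_powershell_array(strings)}"
# %w splits on whitespace, not commas, so this is a single element.
puts %w(a,b,c).length
```

Inside a recipe the call would go straight into the heredoc, something like `$Array = #{to_powershell_array(node['test_cookbook']['array'])}`.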

Feb 16, 2016

Governments entering IT at glacier speeds.

What happens when you give powerful tools to people with low motivation?

They create products that make it attractive to go back to filling out stacks of paper forms by hand.

Problem of the day:
I was filling out a visa application for entering Japan. On the plus side, the file is a PDF and I can actually type the information into it. All of the fields are present. That's another plus.

The minus, is that input validation is broken on some fields. For example, of the 5 phone number entry fields, only 3 allow dashes. One of the date entry fields does not actually allow entry. Another date entry only allows 3 digits for the year.

But that's actually not bad.
What's bad, is that I could not print the thing. There was something literally not allowing me to print my own document.





This absolutely blew my mind. I attempted to print a few more times in complete disbelief for what was happening, before accepting the yak shave ahead of me as one of my own. First the obvious, was Ctrl+P different from File -> Print? Sadly, same result. (if you recall, Chrome takes over your print functionality)






So, can I just turn it off somewhere? Yes! Edit -> Preferences (it even has a hotkey!!)




And that is the Story of how I printed a form, that was likely butchered due to some government compliance rules on PDF security by people who were not given any autonomy and probably no explanation.


Nov 25, 2015

INACCESSIBLE_BOOT_DEVICE - aka: how much I hate intel rapid raid


One beer and 2 episodes of DBZ Super later, I am finally back to a functional Windows 10 machine. Also known as wasting two hours beating your head against the useless recovery capabilities of Windows 10 and Intel's incompetence at writing drivers.

Hardware:
GA-EX58-DS4 with 4 drives in Raid 5.

Scenario:
Upgraded from Windows 7 to Windows 10, and because I am an idiot, upgraded Intel Rapid Raid driver from 14.5 to 14.6

After the reboot I got the :( INACCESSIBLE_BOOT_DEVICE . Instantly the memories of how much I hate upgrading intel drivers rushed back into my head, and I remembered every time I swore never to upgrade their piece of %#@ drivers again...

Solution:
Most of what I did I borrowed from janbambas.cz

1: Boot into Windows 10 recovery mode. Advanced -> very Advanced -> Command prompt
2: Go to <system drive>\Windows\System32\drivers  (example: C:\Windows\System32\drivers)
3: Make a backup of existing driver files: ias*
3a: mkdir bad_intel
3b: copy ias*.*   bad_intel
4: get a piece of paper and a pen.
5: go to C:\Windows\System32\DriverStore\FileRepository and search for all folders with older version of a driver in it (dir /s iastora.sys)
6: write down a couple of characters to uniquely identify each folder. (example: iastorac.inf_amd64_47ebd65d436e75d0 - take the _47)
In my case I had 5 folders..
7: Start the recovery process:
7a: Take a look at the timestamps of iaStorA.sys in all of the FileRepository folders you got from the search.
7b: Copy the newest one over to C:\Windows\System32\drivers
7c: Exit command prompt (literally type exit)
7d: Click the button to continue booting to Windows 10 and cross your fingers.
7e: If it doesn't work, go back to the start and repeat with the next file. (This is why you have paper, so you don't forget where you are in the process.)

This process worked for me on 3rd file, which was from a few months ago.

Good luck, and if this works, have a beer in the honor of janbambas.cz

useful links:
* http://www.janbambas.cz/inaccessible_boot_device-on-windows-10-boot-after-update-of-the-intel-rapid-storage-techonology-driver/

* https://communities.intel.com/thread/78198?wapkw=intel+matrix+storage+and+windows+10+bsod+at+boot

Oct 21, 2015

Provisioning windows box with Chef-provisioning on azure from a mac

After spending about half a day trying to get vagrant-azure to work it became very clear, that as of this writing the driver is just not mature enough. It works pretty good for Ubuntu/Linux but the moment you try to provision windows boxes, it sets your laptop on fire.

Instead of wasting any more time on it, I decided to give v1 and v2 provisioning drivers a chance, followed by Test Kitchen. IIRC they all use different drivers, and while all are pretty solid at provisioning Linux boxes, support for WinRM is very spotty.


Authentication:

The first challenge is to authenticate successfully via the provisioning driver. While Vagrant accepts subscription id and path to .pem as parameters, provisioning needs an azureProfile.json.

To get that file generated, I installed azure-cli via brew `brew cask install azure-cli`

Next, import azure creds with `azure account import ../../Projects/Azure/myazure.publishsettings`
This command will generate the missing azureProfile.json in ~/.azure

Next, validate it works with `azure account list`

Chef-Provisioning piece:

Get a name of the box (ami) you'll be using: `azure vm image list | grep -i Win2012`

Next, hack up the simplest recipe that'll spin up a box:

`knife cookbook create azure_old`
content of recipes/default.rb:

require 'chef/provisioning/azure_driver'
with_driver 'azure'
machine_options = {
    :bootstrap_options => {
      :cloud_service_name => 'alexvinyar', #required
      :storage_account_name => 'alexvinyar', #required
      :vm_size => "Standard_D1", #required
      :location => 'West US', #required
      :tcp_endpoints => '80:80' #optional
    },
    :image_id => 'b39f27a8b8c64d52b05eac6a62ebad85__Ubuntu-14_04_2-LTS-amd64-server-20150706-en-us-30GB', #required
    # :image_id => 'a699494373c04fc0bc8f2bb1389d6106__Windows-Server-2012-R2-20150916-en.us-127GB.vhd', #next step
    # Until SSH keys are supported (soon)
    :password => 'Not**RealPass' #required
}
machine 'toad' do
  machine_options machine_options
  ohai_hints 'azure' => { 'a22' => 'b33' }
end
Finally, run chef-zero (chef client in local mode): `chef-client -z -r azure_old`

If the above recipe fails, don't panic. Check the output, and see if it gets past the authentication piece. If it does, it's just a matter of getting the chef-provisioning syntax correct.
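
One cheap way to rule out the "forgot a required option" class of failures before waiting on Azure is a pre-flight check. A rough sketch in plain Ruby: the list of required keys is taken from the #required comments in the recipe above, and the constant and method names are made up. This is not part of chef-provisioning.

```ruby
# Hypothetical pre-flight check, not part of chef-provisioning itself.
# The required keys come from the "#required" comments in the recipe above.
REQUIRED_BOOTSTRAP_KEYS = %i[cloud_service_name storage_account_name vm_size location].freeze
REQUIRED_TOP_LEVEL_KEYS = %i[image_id password].freeze

# Returns the names of all required options that are absent.
def missing_machine_options(options)
  bootstrap = options.fetch(:bootstrap_options, {})
  missing  = REQUIRED_BOOTSTRAP_KEYS.reject { |k| bootstrap.key?(k) }
                                    .map { |k| "bootstrap_options.#{k}" }
  missing += REQUIRED_TOP_LEVEL_KEYS.reject { |k| options.key?(k) }.map(&:to_s)
  missing
end

options = {
  bootstrap_options: { cloud_service_name: 'example', vm_size: 'Standard_D1' },
  image_id: 'some-image-id'
}
puts missing_machine_options(options).inspect
```

An empty array means every option marked as required is at least present; it says nothing about whether the values are valid.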

Once the run finishes (Azure is slow) connect to the box with `ssh root@12.23.34.45` for centos or ubuntu@ip for ubuntu boxes.

Now the Windows piece

With the `azure vm image list | grep -i Win2012` command I got a list of boxes, and once the test run with ubuntu succeeds, I move on to Windows.

This is where I took a break and had a beer. But I published this post anyway because I'll finish it eventually.





Useful links:
http://azure.microsoft.com/en-us/documentation/articles/xplat-cli/
http://brew.sh/
https://unindented.org/articles/provision-azure-boxes-with-vagrant/


chef-base repo and workstation cookbook


A "chef-base" or "chef-repo" is a git repository which maps 1:1 to Chef organization hosted on the Chef server.  An organization in Chef server 12 is analogous to a single Chef server. Each of these "chef-base" Git repositories becomes the system of record for the global Chef objects (Environments, Roles, Data Bags) in a given organization.  This Git repository typically* does not contain cookbooks.

To set up chef-base a user should first create an empty git repository on VSO / GitHub / GitLab / etc..
It makes things slightly easier if none of the files are initialized, including readme and gitignore.

Next, user should execute "chef generate repo <name of github repo>" command. This will generate the skeleton for the repo.
The resulting skeleton folder should be pushed in its entirety to the git repo.

Workstation cookbook

* One exception to not having cookbooks in chef-base is the workstation cookbook. 
The workstation cookbook is a shared cookbook for anyone using chef in an organization and provides a standardized way to work with chef. It also allows rapid on-boarding of new team members and the ability to safely experiment with new tools. 
It works well in Vagrant, but there is a major limitation: you can't run Test Kitchen inside a Vagrant VM. For best results, encourage teams to leverage an internal or external cloud VM, where kitchen runs will create additional VMs in the same cloud.
A Vagrantfile can be placed in the root of the cookbook. This vagrant file has a couple of purposes:
  • responsible for creating / destroying the workstation VM
  • kicking off chef-client run
  • easy access into the box via vagrant ssh
  • mounting the local chef-base as a folder in a VM
.gitignore file should be modified to exclude all cookbooks with exception of the workstation cookbook.

Places to learn more:
<add yours here> or in the comments.

Oct 3, 2015

Random observations of a new publicly facing Chef website.

First time using speakr.chef.co – musings and observations

I hope I won't hurt anyone's feelings with the below; it is simply what I see as an engineer. Every time I see similar pages, I make a conscious choice to overlook these defects. It could be because I trust the site, or because I found the thing I need.

There is no way in hell I would know how to write an existing page, or actually implement the changes I noted. But what I find most fascinating about my job, is there is a guy somewhere in the company – every company - who knows exactly what comma to change to address the issue. If I were a business, I would seek these guys out, and reward them with titles, work from home schedules, “work on your own problem”, etc... It's just so un-economic and un-business like to lose them.

To business:

The experience has been an exercise in patience, but only due to an unfortunate coincidence of API incompatibility:

                The GeekWire event was announced using the Seattle address, which excluded the ZipCode:
                "Oct. 1-2, 2015, Sheraton Seattle, 1400 Sixth Ave."
                ( URL: http://www.geekwire.com/events/geekwire-summit-2015/ )


Executive Summary: Overall Conclusion:
This experience instantly demonstrated the inferiority of this form of entry, as compared to the auto context/syntax entry offered by modern companies. If this is an internally developed tool for anything other than a personal project, it should be replaced with a real tool meant for the job.


Error 1:
The speakr input fields request ZipCode as a mandatory field.

Result 1:
I had to visit google maps, enter the partial address to get the ZipCode to unblock myself.
Pretty sure my mom would get past this now.


Error 2:
As @echohack says - defaults matter. There is a non-primary field that requests the event start time. The defaults of all 4 fields are set to 23:00, meaning the entries are a valid data type, but the values for start date are totally off.

Musing:
I think 8am is a nice default for "start time" on "start date".

Possible scenario: a study of booking data found that most people fly in a day before, and they actually do want the start time to be 11pm for previous day for a networked dinner.
After digesting things over, above doesn’t make sense, because this isn’t an expense system. An event system should specify actual start time.

Result 2:
Had to make a couple of extra clicks to change the start time.


Error 3:
On initial event creation webpage threw errors: "Invalid start date", "Invalid end date". Clicking on start/end date fields again and resubmitting the form resulted in successful creation message.

Result / Assumptions
The drop off rate here is probably very high. I actually almost gave up here.
I wonder if there is monitoring or metrics in place to see this kind of drop off. Unlikely, but I do wonder if there is an easy to implement “business flow” monitoring solution for that like Zabbix.

Personal research todo: I wonder if paid version of google analytics is significantly faster at page load times than free one.


Error 4:
Allowed creation of events which have already occurred.

Possible scenario:
Could be a feature too I guess.

Musings:
Might be a good idea to check if there is an anti-spam mechanism on event creation button.
Wonder if vanilla code coverage would pick something like this up, or if you need something like Fortify.


Error 5:
After successful event creation, that event would not show up in search results on events.chef.io.
Possible causes: the refresh job on events is not triggered instantly, the page is not yet hooked up to events, past events are ignored as a result of a conscious choice (possibly even from business), or something else entirely.


Overall Conclusion:
This experience instantly demonstrated the inferiority of this form of entry, as compared to the auto context/syntax detection offered by modern companies. If this is an internally developed tool for anything other than a personal project, it should be replaced with a real tool meant for the job.



Sep 21, 2015

Continuous key rotation with Chef

Let's see if I can get this down on paper in a meaningful way...

Players:
a) some server (has to be Chef Server) - aka: Key Master
b) the rest of the infrastructure

Tools needed:
a) chef vault
b) admin key for the Key Master
c) sublime text

The flow:
Key Master converges a recipe that does a global search for all of the nodes. For each node it generates a new key pair. It rotates the key and places the new key into a vault with search criteria of only itself and the node. Each node on converge accesses the vault and retrieves the new key, then marks the vault as converged or deletes the vault after consumption.

Faults:
What happens if the node doesn't converge for a long time? How does key rotation actually work? Can a node even converge if the key has been rotated?

>> probably this is the way <<
Perhaps the node has to generate a key and set the search criteria to itself and Key Master. Key Master consumes the key and runs the ctl command. Do nodes continue to fail converges until Key Master has updated the key? How does key rotation actually work?

Result:
Every converge the node rotates its own key. The same model can probably be done for SSH keys.

Final thought(s):
What does it actually buy? I don't know, but many customers ask about it. Should it be done? Should each node have a unique, individual vault? Most likely, if you really think about it, there isn't a reason. Nodes should be grouped and each group should run off the same vault. Having 1 vault per node with identical info is meaningless, especially if there is an admin who has access to all of the nodes anyway.
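
To make the first flow a bit more concrete (Key Master generates the keys, each node picks its key up once), here is a toy model in plain Ruby. Everything in it is an assumption: the "vault" is just an in-memory hash standing in for chef vault, the KeyMaster class and its methods are made up, and it does not answer the open questions above (a node whose key was rotated still has to authenticate somehow).

```ruby
require 'openssl'

# Toy model of the flow described above. The "vault" is an in-memory hash
# standing in for chef vault; all class and method names are hypothetical.
class KeyMaster
  def initialize
    @vault = {}        # node name => private key PEM waiting to be picked up
    @public_keys = {}  # node name => public key PEM currently trusted
  end

  # Generate a fresh key pair for every node and park the private half
  # in that node's vault entry.
  def rotate(node_names)
    node_names.each do |name|
      key = OpenSSL::PKey::RSA.new(2048)
      @public_keys[name] = key.public_key.to_pem
      @vault[name] = key.to_pem
    end
  end

  # Called by a node on converge: hand over the new key once, then delete it.
  def consume(name)
    @vault.delete(name)
  end

  def public_key_for(name)
    @public_keys[name]
  end
end

master = KeyMaster.new
master.rotate(%w[web1 db1])
pem = master.consume('web1')
puts pem.lines.first                  # PEM header of the new private key
puts master.consume('web1').inspect   # second pickup finds nothing
```

Deleting the entry on pickup mirrors the "deletes the vault after consumption" option; marking it as converged instead would just mean keeping the entry and adding a flag.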