What was your most serious production bug?

What was your most serious production bug? This could be any bug you helped create, or helped solve, in a live system.
[moved my response to the answers]

Mine was on my first project out of school, on a large sales compensation system for a software company. We had a bug in the final summation routine, which would attempt to subtract any owed money from the next paycheck. In certain situations, where a retroactive computation increased the amount owed from a previous month, the debit would be recorded but never cleared after being deducted - so it was taken out of the next paycheck again, and again. What might start out as $3.23 the first month would increase to $6.46 the following month. You can see where this is going. Although we heard a couple of user complaints early on, we dismissed them as "user error" - the sales plans were complex and it was quite easy for anyone to misunderstand what the correct amount to be paid was. But after a few months, the missing money was too large to be ignored - over $2,000,000 in unpaid payroll. The code fix was easy; going back over months of payroll computations for hundreds of employees, not so much.
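For readers who want the shape of the bug: a minimal sketch, with hypothetical names, of a debit that gets deducted but never cleared - the original routine was of course far larger.

interface PayState {
  grossPay: number;  // this month's computed compensation
  owedDebit: number; // money owed back from a retroactive recomputation
}

// Buggy: the debit is never zeroed, so it is deducted again every
// month - and any new retroactive debit stacks on top of it.
function settleBuggy(s: PayState): number {
  return s.grossPay - s.owedDebit;
}

// Fixed: the debit is consumed once it has been recovered.
function settleFixed(s: PayState): number {
  const net = s.grossPay - s.owedDebit;
  s.owedDebit = 0; // cleared after being applied
  return net;
}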

I worked on an e-commerce website where the client data was supplied as a CSV dump from a legacy back-end system. We only had a sample data set to work with (despite repeated requests for the full data set), so the first time we saw the full data was on the live site the morning it launched. All the strings were quoted in the CSV file but the numbers weren't. What we didn't realise was that the legacy system inserted a comma for the thousands in larger numbers - so where we expected, say, 1099.99, we got 1,099.99. Of course, the CSV parser saw the comma and took the value as 1. Imagine the client's surprise when orders started to come in for big-ticket items which were apparently selling at the bargain price of £1 each. The code was fixed quickly and fortunately their terms allowed them to decline the orders. Lesson learned: never trust a sample data set, and don't go live until you've tested with a full data load.
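The fix amounts to parsing defensively rather than trusting the feed. A minimal sketch (the function name and format rules are illustrative, not from the original system):

// Accepts prices with or without thousands separators; rejects anything
// else outright rather than silently truncating at the first comma.
function parsePrice(raw: string): number {
  const cleaned = raw.trim().replace(/,/g, "");
  if (!/^\d+(\.\d{1,2})?$/.test(cleaned)) {
    throw new Error(`Refusing to guess at malformed price: "${raw}"`);
  }
  return Number(cleaned);
}

parsePrice("1,099.99"); // 1099.99 - not 1
parsePrice("1099.99");  // 1099.99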

We had an e-commerce system, and when it was moved to the production server (through our super awesome manual copy/paste/edit settings process), the senior developer - the only one with access to the server - forgot to connect the system to the payment gateway. $18,000 worth of sales later, the client notices that their bank account isn't any bigger than when we started.
Process improvements since that day:
Not one.
How we solved the problem:
Told the client to contact all the customers based on their email notifications

I lost some user registration data for about 7 users during a live update to a system I built. That doesn't sound so bad, except that it was registrations for an $18 billion IPO. We were able to track the information down through the automated emails that had been sent out, but a few beads of sweat were shed over that little hiccup.

Related

investments/transactions/get endpoint - how long to return data?

I've been testing Plaid's investments transactions endpoint (investments/transactions/get) in development.
I'm encountering issues with highly variable delays for data to be returned (following the product initialization with Link). Plaid states that it takes 1–2 minutes to return investment transaction data, but I've found that in practice, it can be up to several hours before the data is returned.
Anyone else using this endpoint and getting data returned within 1–2 minutes, or is it generally a longer wait?
If it is a longer wait, do you simply wait for the DEFAULT_UPDATE webhook before you retrieve the data?
So far, my experience with their investments/transactions/get has been problematic (missing transactions, product doesn't work as described in their docs, limited sandbox dataset, etc.) so I'm very interested in hearing from anyone with more experience with this endpoint.
Do you find this endpoint generally reliable, and the data provided to be usable, or have you had issues? I've not seen any issues with investments/holdings/get, so I'm hoping that my problems are unusual, and I just need to push through it.
I'm testing in development with my own brokerage accounts, so I know what the underlying transactions are compared to what Plaid is returning to me. My calls are set up correctly, and I can't get a helpful answer from Plaid support.
I took a look at the support issue, and it does appear that the problem you're hitting is related to a bug (or two different bugs, in this case).
However, for posterity/anyone else reading this question: in the general case the endpoint is pretty fast - P95 latency for calling /investments/transactions/get is currently about 1 second. (Initial calls on an Item will have higher latency, as they have more data to fetch and are blocked on Plaid extracting the Item's data for the first time - hence the 1-2 minute guidance in the docs.)
In addition, Investments updates at some major brokerages are scheduled to happen only overnight after market close, so there might be a delay of 12+ hours between making a trade and seeing that trade be returned by the API.
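In practice that suggests treating the first fetch as webhook-driven rather than polling. A sketch against the raw REST endpoint, assuming a development-environment host and an illustrative date range; PRODUCT_NOT_READY is the error code Plaid documents for data whose initial extraction hasn't finished:

const PLAID_HOST = "https://development.plaid.com"; // assumption: dev environment

async function getInvestmentTransactions(accessToken: string) {
  const res = await fetch(`${PLAID_HOST}/investments/transactions/get`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      client_id: process.env.PLAID_CLIENT_ID,
      secret: process.env.PLAID_SECRET,
      access_token: accessToken,
      start_date: "2023-01-01", // illustrative range
      end_date: "2023-12-31",
    }),
  });
  const body = await res.json();
  if (!res.ok) {
    if (body.error_code === "PRODUCT_NOT_READY") {
      // Initial extraction still running: return and let the
      // INVESTMENTS_TRANSACTIONS webhook trigger the real fetch.
      return null;
    }
    throw new Error(`${body.error_code}: ${body.error_message}`);
  }
  return body.investment_transactions;
}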

What is "sf_max_daily_api_calls"?

Does anyone know what the "sf_max_daily_api_calls" parameter in Heroku mappings does? I don't want to assume it's a daily limit for write operations per object, and I can't find an explanation.
I tried to open a ticket with Heroku, but in their support ticket form the "Which application?" drop-down is required, and none of the support categories have anything to choose from there - the only option is "Please choose...".
I've tried to find any reference to this field and can't - I can only see it used in Heroku's Quick Start guide, without an explanation. I have a very busy object I'm working with, read/write, and want to understand any limitations I need to account for.
Salesforce orgs have a rolling 24h limit on daily API calls. Generally the limit is very generous in test orgs (sandboxes) - 5M calls - because you can make stupid mistakes there. In production it's lower. A bit counterintuitive, but it protects their resources and forces you to write optimised code/integrations...
You can see your limit in Setup -> Company Information. There's a formula in the documentation; roughly speaking, you gain more of that limit with every user license you purchase (more for "real" internal users, less for community users), same as with data storage limits.
Also, every API call is supposed to return your current usage (in a special tag in the SOAP API, in a header in the REST API), so I'm not sure why you'd have to hardcode anything...
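For instance, a sketch of reading that header off any REST call (the instance URL, API version and query are placeholders; the Sforce-Limit-Info header itself is documented REST API behaviour):

async function checkApiUsage(instanceUrl: string, accessToken: string) {
  const soql = encodeURIComponent("SELECT Id FROM Account LIMIT 1");
  const res = await fetch(
    `${instanceUrl}/services/data/v58.0/query?q=${soql}`,
    { headers: { Authorization: `Bearer ${accessToken}` } }
  );
  // Header value looks like: "api-usage=18/15000"
  console.log(res.headers.get("Sforce-Limit-Info"));
}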
If you write your operations right, the limit can be very generous. I have no idea how Heroku Connect works internally; ideally you'd spot some "Bulk API 2.0" in its documentation, or try to find synchronous vs async in there.
A normal old-school synchronous update via the SOAP API lets you process 200 records at a time, costing 1 API call. The REST Bulk API accepts CSV/JSON/XML of up to 10K records and processes them asynchronously; you poll for an "is it done yet" result... So starting a job, uploading files, committing the job and then checking, say, once a minute can easily be 4 API calls, and you can process millions of records before hitting the limit.
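A sketch of that flow against Bulk API 2.0 (which, unlike the original Bulk API's CSV/JSON/XML batches, takes CSV only); the instance URL, API version and object are placeholders:

async function bulkInsertAccounts(base: string, token: string, csv: string) {
  const json = { Authorization: `Bearer ${token}`, "Content-Type": "application/json" };

  // 1 call: create the ingest job
  const job = await (await fetch(`${base}/services/data/v58.0/jobs/ingest`, {
    method: "POST",
    headers: json,
    body: JSON.stringify({ object: "Account", operation: "insert" }),
  })).json();

  // 1 call: upload the whole CSV in one request
  await fetch(`${base}/services/data/v58.0/jobs/ingest/${job.id}/batches`, {
    method: "PUT",
    headers: { Authorization: `Bearer ${token}`, "Content-Type": "text/csv" },
    body: csv,
  });

  // 1 call: mark the upload complete so processing starts
  await fetch(`${base}/services/data/v58.0/jobs/ingest/${job.id}`, {
    method: "PATCH",
    headers: json,
    body: JSON.stringify({ state: "UploadComplete" }),
  });

  // ~1 call per minute: "is it done yet"
  let state = "UploadComplete";
  while (state !== "JobComplete" && state !== "Failed" && state !== "Aborted") {
    await new Promise((r) => setTimeout(r, 60_000));
    const status = await (await fetch(
      `${base}/services/data/v58.0/jobs/ingest/${job.id}`,
      { headers: json }
    )).json();
    state = status.state;
  }
  return state;
}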
When all else fails - you've exhausted your options, can't optimise any further, can't purchase more user licenses - I believe they sell packs of additional API calls; contact your account representative. But there are lots of things to try before that, not least setting up a warning when you hit, say, a 30% threshold.

Where to fetch REAL-TIME economic data announcements for use in algorithmic trading?

I'm looking to fetch values of macroeconomic announcement data (e.g. interest rate announcements, unemployment figures, consumer price index figures, etc.) as soon as, or as near as possible to the time, the figures are released from the original source, to be used within an MQL4 algorithm written on MetaTrader 4.
At the moment I'm fetching the latest value from Quandl, which provides a CSV API so that the value can be fetched within an MQL4 script. The issue is that Quandl doesn't update the latest values as soon as the sources release them, and that timing is very important for my algorithm.
So:
Q. Which sources allow you to fetch real-time LATEST values upon release, to be used within an algorithm?
There doesn't seem to be any documentation on the source websites (Bureau of Labor Statistics [US], Bank of England [UK], etc.) about fetching released data values, yet I see online forex market calendar websites retrieving latest values sometimes within the second the value is announced - so they must be fetching data from the source?
Examples of the sort of latest values to be fetched:
[US] Non-Farm Payroll - source: Bureau of Labor Statistics
[GB] Interest Rate Announcement - source: Bank of England
[EU] Unemployment Rate - source: Eurostat
To summarise:
which sources can I use to fetch a single real-time latest value of an economic announcement as soon as it's released? (I understand latency will mean it won't be fetched immediately)
it must be fetchable using MQL4
To get "near real-time" data, you need to subscribe to a feed service like Bloomberg. But they are expensive. They provide a WebAPI and wrapper interfaces.
There are other online services too, but again, they can be expensive.
An alternative is to "data-scrape" them off those "near real-time" sources (note: the legality is a gray area).
This method is possible - I've done it. I managed to get it down to ~2 sec (near real-time), and am currently using it in one of my MT4 EA projects (TriskM):
https://www.facebook.com/TrackRiskM/photos/a.800008416769352.1073741828.781013552002172/800008486769345
Basically, it involves 2 parts:
A server application that I host in the cloud. Its job is to go out and scrape the data and format it properly for easy downstream consumption.
At the MT4 application (EA) level, you make an HTTP request to the cloud host to fetch the info; a sketch of the server half follows below.
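A minimal sketch of that server half, assuming a hypothetical source page and extraction rule (the real scraping target, selector and hosting are whatever you choose); the EA side would call this endpoint with MQL4's WebRequest():

import http from "node:http";

let latest: { name: string; value: string; fetchedAt: string } | null = null;

// Poll the (placeholder) source and cache the newest figure.
async function refresh(): Promise<void> {
  const res = await fetch("https://example.com/economic-calendar"); // placeholder URL
  const html = await res.text();
  const m = html.match(/data-latest-value="([^"]+)"/); // placeholder extraction rule
  if (m) latest = { name: "NFP", value: m[1], fetchedAt: new Date().toISOString() };
}
setInterval(refresh, 1_000); // poll tightly around the release window

// The MT4 EA requests this endpoint and parses the JSON body.
http
  .createServer((_req, res) => {
    res.setHeader("Content-Type", "application/json");
    res.end(JSON.stringify(latest));
  })
  .listen(8080);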
Update: Forex News Gun does not work any more.
The fastest free way to get macroeconomic data in almost real time is Forex Peace Army's Forex News Gun. You get all the economic signals you mentioned, and more, with a delay of less than a second after the release in most cases.
It works like this: before the actual economic data release, you set up where the program will click if the released number falls within such-and-such a range. At the time of the release (or rather, when the number arrives at your computer), if the number is in the specified range, the click occurs and you can perform a trade or run a trading algorithm.
For even faster access, you have to pay a lot of money (thousands or tens of thousands of dollars monthly) to financial news providers such as Bloomberg or Thomson Reuters.

Server Error upon joining many rooms in a short period of time

My application joins about 50 rooms for one user on one connection, all at once. After a couple of rooms join successfully, I start to get a server error back on some of the rooms.
The error is always the same; here it is:
Error: Server Error
at Object.i.build (https://cdn.goinstant.net/v1/platform.min.js:4:7501)
at Connection._onResponse (https://cdn.goinstant.net/v1/platform.min.js:7:25694)
at Connection._onMessage (https://cdn.goinstant.net/v1/platform.min.js:7:28812)
at Connection._onMessage (https://cdn.goinstant.net/v1/platform.min.js:3:4965)
at r.e (https://cdn.goinstant.net/v1/platform.min.js:1:4595)
at r.emit (https://cdn.goinstant.net/v1/platform.min.js:2:6668)
at r.e (https://cdn.goinstant.net/v1/platform.min.js:1:4595)
at r.emit (https://cdn.goinstant.net/v1/platform.min.js:3:7482)
at r.onPacket (https://cdn.goinstant.net/v1/platform.min.js:3:14652)
at r.<anonymous> (https://cdn.goinstant.net/v1/platform.min.js:3:12614)
It's not isolated to any particular rooms; sometimes half of them pass, sometimes nearly all pass, but there are almost always a couple that break.
What I have found is that with fewer than 10 rooms it won't break.
Is there any rate limiting on joining rooms that could be causing this? I'd rather not put a delay between each room join but I can if I need to.
Update: It definitely has to do with how fast I'm connecting to the rooms. Spacing them out by 1s each makes it work every time. I need to connect faster, though - is there a fix for this?
Even a 100ms delay seems to work.
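For anyone needing the stopgap in the meantime, a sketch of pacing the joins sequentially; the connection.room(name).join(callback) shape is assumed from the GoInstant docs, and only the pacing logic is the point:

const delay = (ms: number) => new Promise((r) => setTimeout(r, ms));

async function joinRoomsPaced(connection: any, names: string[], gapMs = 100) {
  for (const name of names) {
    await new Promise<void>((resolve, reject) => {
      connection.room(name).join((err: Error | null) =>
        err ? reject(err) : resolve()
      );
    });
    await delay(gapMs); // 100ms spacing was enough in the tests above
  }
}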
This isn't a case of rate-limiting or anything along those lines. It's a bug and we are working to fix it as soon as we can. We'll update you here once we have a solution deployed. If you'd like for us to email you a notification directly, drop us a message via our contact form (https://goinstant.com/contact). Just make reference to this issue and I'll make sure a note is added to email you directly as soon as the fix goes live.
Sorry for any inconvenience this may be causing you.
Regards,
Thomas
Developer, GoInstant

Random Duplicate Transactions in Authorize.Net

Having an emergency situation. Currently on my site, some customers are being charged multiple times for the same order. The payment gateway is Authorize.Net and the storefront platform is Magento Enterprise. What could be causing this? Bad code, a server error, etc.? This has never happened before and it's totally random. If this isn't enough info to help, please let me know.
It's a coding issue, but trying to spot the offending code will be difficult in a site like this. A developer will need to go through and review the entire checkout code to look for potential errors.
The best course of action is to look at how far apart the duplicate transactions are. If they are very close together (i.e. a few minutes or less), you can try to fix this by setting the duplicate transaction window to a value large enough to block the duplicates. In other words, if the duplicate transactions are happening within 60 seconds of each other, update the Authorize.Net code to set x_duplicate_window to a value of 180 (the value is in seconds). That should prevent the duplicate orders from happening.
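For reference, x_duplicate_window is just another field on the classic AIM request; a sketch with placeholder credentials and amounts (in Magento the same field would be set wherever the gateway request is built):

const params = new URLSearchParams({
  x_login: "API_LOGIN_ID",       // placeholder credentials
  x_tran_key: "TRANSACTION_KEY",
  x_type: "AUTH_CAPTURE",
  x_amount: "49.99",             // placeholder amount
  x_card_num: "4111111111111111",
  x_exp_date: "1230",
  x_duplicate_window: "180",     // reject identical transactions within 180s
});

await fetch("https://secure.authorize.net/gateway/transact.dll", {
  method: "POST",
  headers: { "Content-Type": "application/x-www-form-urlencoded" },
  body: params,
});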
