LCI data format in BW2 - lifecycle

So I want to import my own LCI database to Brightway2, and my process has 3 valuable products.
I found this example with co-products: https://github.com/massimopizzol/B4B/blob/main/02.2_Simple_LCA_co_products.py
The example shows more or less how it works, but I would like to use allocation for my process rather than substitution. Should I just change the type to "allocation" instead of "substitution", or does bw2 not support allocation? Also, if we have 3 valuable products, is the first one part of the main activity with type="production", and do the other 2 have type="substitution"? And for those other 2, do we create 2 separate activities, each with basically a single exchange of type "production", like in the example?
Also, just to make sure: if one of the inputs has type="technosphere", we need to create another activity that represents the process behind it. As for raw materials, they have type="biosphere", and their amounts are negative, in contrast to emissions.
I set the other valuable products' type to "substitution" and for each of them I created a new activity where the type was "production". Overall it worked, but the resulting LCA score wasn't correct, so I don't know whether I made a conceptual mistake.
Thank you in advance for all your help and time!

So, Brightway currently does not have a model where you can enter a multifunctional process and have the software do the allocation for you. You will need to do the allocation yourself :) Here is a notebook I wrote up that shows a simple allocation procedure.
P.S. In the future please post to either the beginners mailing list or SO, not both; otherwise everyone gets notified twice.

Changing "substitution" by "allocation" will not work. If you want to use allocation / partition, I would create the activities with the exchanges already allocated.
The meaning of the "substitution" exchange as well as the sign conventions for biosphere flows is explained in the documentation here.


Making sure my Go page view counter isn't abused

I believe I have found a very good and fast solution for efficiently counting page views:
Working example in go playground here: https://play.golang.org/p/q_mYEYLa1h
My idea is to push this to the database every X minutes, and to delete each key from the page map after it has been pushed.
My question now is, what would be the optimal way to ensure that this isn't abused? Ideally, I would only want to increase the page count for the same person if at least 2 hours have passed since they last visited the page.
As far as I know, it would be ideal to store and compare both the IP and the user agent (I don't want to rely on cookies/localStorage), but I'm not quite sure how to efficiently store and compare this information.
I'd likely get both the IP (req.Header.Get("x-forwarded-for")) and UserAgent (req.UserAgent()) from http.Request.
I was thinking of making a visitor struct, similar to my page struct, that would look like this:
type visitor struct {
	mutex sync.Mutex
	// key is URL + IP + user agent, value is the time of the last counted visit
	urlIPUAAndTime map[string]time.Time
}
This approach should make it possible to do something similar to before. However, imagine the website getting so many requests that hundreds of millions of unique visitor entries are being stored, each of which can only be deleted after 2 (or more) hours. I therefore think this is not a good solution.
I guess it would be ideal/necessary to write to and read from some file, but I'm not sure how this should be done efficiently. Help would be greatly appreciated.
One way to optimize this is to add a Bloom filter in front of the map. A Bloom filter is a probabilistic structure that can tell you one of two things:
this user is definitely new
this user has possibly been here before
This is a way to cut off computation at an early stage. If many of your users are new, then you save the database requests that would otherwise be needed to check each of them.
What if the structure says "this user is possibly not unique"? Then you go to the database and check.
Here's one more optimization: if you do not need very accurate numbers and can tolerate an error of a few percent, you can use the Bloom filter alone. I guess many large sites use this technique for estimation.
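To make that check flow concrete, here is a minimal sketch. It is written in Python purely for illustration (in the actual Go service you would use a Go Bloom filter package), and the filter size, the key format, and the 2-hour window are all assumptions:
import hashlib
import time

class BloomFilter:
    def __init__(self, size_bits=1 << 20, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))

seen = BloomFilter()
last_seen = {}          # stands in for the exact store (map or database)
WINDOW = 2 * 60 * 60    # 2 hours

def should_count(url, ip, user_agent):
    key = f"{url}|{ip}|{user_agent}"
    now = time.time()
    if not seen.might_contain(key):
        # definitely new: count it without consulting the exact store
        seen.add(key)
        last_seen[key] = now
        return True
    # possibly seen before: fall back to the exact store to decide
    previous = last_seen.get(key)
    if previous is None or now - previous >= WINDOW:
        last_seen[key] = now
        return True
    return False
The win is that genuinely new visitors never trigger a read of the exact store; only possible repeats pay for that lookup.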

Representing PCP/GP History in FHIR

Background:
I have been digging into the FHIR DSTU2 specification to try and determine what is the most appropriate resource(s) to represent a particular patient's historical list of GPs/PCPs. I am struggling to find an ideal resource to house this information.
The primary criterion I have been using to identify the proper resource is that it must associate a patient with a practitioner for a period of time.
Question:
What is the proper resource to represent historical pcp/gp information that can be tied back to a patient resource?
What I have explored:
Here is a list of my possible picks thus far. I paired the resource types with my thought process on why I'm not confident about using it:
Episode of Care - This seems to have the most potential. It has the associations between a patient and a set of doctors for a given time period. However, when I read its description and use-case scenarios, it seems like I would be bastardizing its usage to fit my needs, since it embodies a period of time during which a group of related health care activities was performed.
Group - Very generic structure that could fit based on its definition. However, I want to rule out other options before taking this approach.
Care Plan - Similar rationale to Episode of Care. It seems like a bastardization to use this just to house PCP/GP history information. Its scope is much bigger and patient/condition-centric.
I understand that there may not be a clear answer, and thus the question runs the risk of becoming subjective; I apologize in advance if that is the case. I'm just wondering if anyone can provide concrete evidence of where this information should be stored.
Thanks!
That's not a use case we've really encountered before. The best possibility is to use the new CareTeam resource (we're splitting CareTeam out of EpisodeOfCare and CarePlan) - take a look at the continuous integration build for a draft.
If you need to use DSTU 2, you could just use Patient.careProvider and rely on "history" to see changes over time, or use Basic shaped to look like the new CareTeam resource.
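To illustrate the DSTU 2 fallback (sketched as a Python dict purely for readability; the ids and names are made up):
patient = {
    "resourceType": "Patient",
    "id": "example",
    "careProvider": [
        # the current PCP/GP; earlier versions of this resource hold the earlier ones
        {"reference": "Practitioner/dr-jones", "display": "Dr. Jones"}
    ],
}
# Each change of PCP is an update to the Patient, so a history read such as
# GET [base]/Patient/example/_history returns the succession of careProvider values.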

TDD strategy when implementing a multi-stage process?

At the moment I'm developing a piece of code which first gathers sentences from a set of documents, then tokenises them, then uses the results to analyse the recurring frequencies of token sequences, including case variations (upper case / lower case / leading capital / other), and then prints out the results.
Now I want to introduce two more stages before printing out the results:
1. firstly, removing "stop words" (i.e. words or short sequences the frequency of which can never be of interest, such as, in English, "the", "of the", "of which", etc.) - these stop words/"stop sequences" to be taken from a database table
2. secondly, bringing up a dialog enabling the user to identify sequences of new stop words, which would then remove the token sequences involved and also add the sequence in question to the database table.
The thing is, this is a multi-stage process, and I'm wondering what TDD experts do when faced with a situation like this: do I create a new test method for each individual stage? The problem is that each individual stage requires the "live memory data" from the previous stage. Another possibility could be to serialise this data and then deserialise it when testing the next stage... but that would involve the app code doing things which are of benefit only to the testing code, i.e. it would mean tweaking ("distorting"?) the app code for the benefit of the tests, which seems wrong in principle...
Also, if anyone can point me in the direction of a book or site which helps TDD newbs like myself go to "the next level" I would be very grateful.
Later:
To the person who marked this as "favorite": I've now got hold of a book called "Growing Object-Oriented Software, Guided by Tests", which is well-reviewed and appears to be for someone wanting to move from beginner to intermediate. First impressions good.
Any views on this book by experts also welcome, of course...
On the face of it, you seem to be building a pipeline. From what I can tell, you're currently implementing all of it within a single class, which stores both the data that's being worked on and implements the methods that do the processing. One approach that you could take would be to break down the problem into smaller chunks. Rather than having a single class, you have a class for each stage of the pipeline and another class for orchestrating the process which is responsible for plugging the stages together in the correct order.
So, scanning through what you've described, you appear to have the following processors:
DocumentReader (reads documents from somewhere into in memory document)
SentenceExtractor (document/list of documents in, list of sentences out)
1 or more SentenceAnalysers (sentences in, statistics out); you might want to break this down depending on the type of analysis and how complex it is.
StopWordExtractor (StopWordProvider and sentences in, sentences out)
There are additional supporting classes that would be needed to support writing new stop words to the database and, depending on how the StopWordProvider is implemented, keeping it in sync as the user selects new ones.
Essentially, what I'm saying is that you appear to be doing too much in a single location. If you're really happy that the code as you've described it is a single unit, then there is nothing wrong with testing it all in one place; your inputs will then be your starting documents/sentences and your outputs will be the end of the process. If you agree with me that there are really several distinct components involved in the process, each of which could change independently, then I would suggest breaking the process down into smaller classes and testing that each of those performs as expected for given sets of inputs/outputs.
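As a rough sketch of that decomposition (Python here just for illustration; the class names are adapted from the list above and everything else is made up), each stage takes plain in-memory values and returns plain values, so each one gets its own small, direct unit test and no serialisation of intermediate state is needed:
from collections import Counter
from typing import Iterable, List

class SentenceExtractor:
    def extract(self, document: str) -> List[str]:
        # naive split on full stops, good enough for a sketch
        return [s.strip() for s in document.split(".") if s.strip()]

class StopWordFilter:
    def __init__(self, stop_words: Iterable[str]):
        # in the real app these would come from the database table
        self.stop_words = {w.lower() for w in stop_words}

    def filter(self, tokens: List[str]) -> List[str]:
        return [t for t in tokens if t.lower() not in self.stop_words]

class FrequencyAnalyser:
    def analyse(self, tokens: List[str]) -> Counter:
        return Counter(tokens)

class Pipeline:
    # orchestrator: plugs the stages together in the correct order
    def __init__(self, extractor, stop_filter, analyser):
        self.extractor = extractor
        self.stop_filter = stop_filter
        self.analyser = analyser

    def run(self, documents: Iterable[str]) -> Counter:
        counts = Counter()
        for doc in documents:
            for sentence in self.extractor.extract(doc):
                tokens = self.stop_filter.filter(sentence.split())
                counts.update(self.analyser.analyse(tokens))
        return counts

# each stage now has its own small test, e.g.:
def test_stop_word_filter_removes_configured_words():
    f = StopWordFilter(["the", "of"])
    assert f.filter(["the", "cat", "of", "doom"]) == ["cat", "doom"]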

How to correlate the below given scenario for check boxes?

In my script I have a scenario where the page contains multiple check boxes, for example 10. Users select check boxes according to their needs: one user selects 4 check boxes and another user clicks 5, so it varies for each user.
So how do I correlate those values?
Thank you.
From the website: "Please don’t share your solutions, ask for help, or help others. This is meant to be a challenge."
So you appear to be violating one of the primary rules of that website. I have looked at this challenge and it's really good for gauging someone's knowledge.
However, to address the technology generally: in reading your question I get the sense you may be missing certain fundamental knowledge for this kind of work, so here it is. Hopefully my answer helps increase your general knowledge, and hopefully you can use it to address this specific question.
Definitions:
Correlation - you're taking data the SERVER sends to the browser, capturing it and sending it back. Information present on web pages would fit into this category.
Parameterization - you've got a set of values you'd like to put into web forms. These are usually values like names, addresses, etc.
Also, understand exactly what is happening when you perform certain actions in your browser. When you "click" a checkbox, does that actually send a message to a server? Usually (though not always) it doesn't. So when you use phrases like "click a checkbox", that tells me you may not appreciate that performance testing is server-focused, not browser-focused.
Performance testing isn't intuitive, so you need to understand these concepts. If you dedicate time to understanding the concepts I've outlined above, you'll have the knowledge to complete the challenge.
Good luck.
What is driving the variation in which check boxes get checked? Is it the result of something that comes back from the server in a previous response? Or is it somewhat random, based on whatever the user wants to do at runtime?

Transferring lots of objects with Guid IDs to the client

I have a web app that uses Guids as the PK in the DB for an Employee object and an Association object.
One page in my app returns a large amount of data showing all Associations all Employees may be a part of.
So right now, I am sending to the client essentially a bunch of objects that look like:
{association_id: guid, employees: [guid1, guid2, ..., guidN]}
It turns out that many employees belong to many associations, so I am sending down the same Guids for those employees over and over again in these different objects. For example, it is possible that I am sending down 30,000 total guids across all associations in some cases, of which there are only 500 unique employees.
I am wondering if it is worth me building some kind of lookup index that I also send to the client like
{ 1: Guid1, 2: Guid2 ... }
and replacing all of the Guids in the objects I send down with those ints,
or if simply gzipping the response will compress it enough that this extra effort is not worth it?
Note: please don't get caught up in the details of if I should be sending down 30,000 pieces of data or not -- this is not my choice and there is nothing I can do about it (and I also can't change Guids to ints or longs in the DB).
You wrote the following at the end of your question:
"Note: please don't get caught up in the details of if I should be sending down 30,000 pieces of data or not -- this is not my choice and there is nothing I can do about it (and I also can't change Guids to ints or longs in the DB)."
I think that is your main problem. If you don't solve it, you might reduce the size of the transferred data by a factor of 10, for example, but the main problem will still be there. Let's think about the question: why does so much data need to be sent to the client (the web browser)?
The data on the client side is needed to display information to the user. No monitor is large enough to show 30,000 items on one page, and no user is able to grasp that much information. So I am sure that you display only a small part of the information, and in that case you should send only the small part that you actually display.
You don't describe how the GUIDs will be used on the client side. If you need the information during row editing, for example, you can transfer the data only when the user starts editing; in that case you need to transfer the data for one association only.
If you need to display the GUIDs directly, then you can't display all the information at once anyway, so you can send the information for one page only. If the user starts to scroll or clicks the "next page" button, you send the next portion of data. In this way you can dramatically reduce the size of the transferred data.
If you have no possibility to redesign that part of the application, you can implement your original suggestion: by replacing a GUID like "{7EDBB957-5255-4b83-A4C4-0DF664905735}" or "7EDBB95752554b83A4C40DF664905735" with a number like 123, you reduce its size from 34 characters to 3. If you additionally send an array of "guid mapping" elements like
123:"7EDBB95752554b83A4C40DF664905735",
you can reduce the original data size of 30000*34 = 1,020,000 characters (about 1 MB) to 500*39 + 30000*3 = 19,500 + 90,000 = 109,500 characters (about 110 KB), i.e. roughly a factor of 10. Enabling compression of dynamic data on the web server can reduce the size further.
In any case you should examine why your page is so slow. If the application runs on a LAN, then transferring even 1 MB of data can be quick enough. Probably the page is slow while the data is being placed on the web page. I mean the following: if you modify an element on the page, the positions of all existing elements have to be recalculated. If you work with disconnected DOM objects first and then place the whole portion of data on the page at once, you can improve performance dramatically. You didn't post which technology you use in your web application, so I haven't included any examples; if you use jQuery, for example, I could give an example to make this clearer.
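To make the mapping suggestion concrete, it is only a few lines on the server side (a sketch: the function name is made up and the payload shape follows the question):
def build_compact_payload(associations):
    # associations: list of {"association_id": guid, "employees": [guid, ...]}
    guid_to_int = {}
    compact = []
    for assoc in associations:
        ids = []
        for guid in assoc["employees"]:
            if guid not in guid_to_int:
                guid_to_int[guid] = len(guid_to_int) + 1
            ids.append(guid_to_int[guid])
        compact.append({"association_id": assoc["association_id"], "employees": ids})
    # the client needs the reverse mapping to resolve the ints back to GUIDs
    int_to_guid = {v: k for k, v in guid_to_int.items()}
    return {"mapping": int_to_guid, "associations": compact}
The client keeps "mapping" as a plain object and resolves each int back to its GUID with a single lookup.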
The lookup index you propose is nothing other than a "custom" compression scheme. As amdmax stated, this will improve your performance if you have a lot of repeated GUIDs, but so will gzip.
IMHO, the extra effort of writing the custom encoding will not be worth it.
Oleg states correctly that it might be worth fetching the data only when the user needs it, but this of course depends on your specific requirements.
if simply gzipping the response will compress it enough that this extra effort is not worth it?
The answer is: Yes, it will.
Compressing the data will remove the redundant parts as well as possible (depending on the algorithm) until decompression.
To be sure, just generate the data both uncompressed and compressed and compare the results. You can count the duplicate GUIDs to calculate how big your data block would be with the dictionary compression method, but I guess gzip will do better because it can also compress the syntactic elements like braces, colons, etc. inside your data object.
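If you want to run that comparison quickly, something along these lines would do it (a sketch with made-up data: 500 unique employees spread over 30,000 references, roughly matching the numbers in the question):
import gzip, json, uuid

employees = [str(uuid.uuid4()) for _ in range(500)]
# 500 associations x 60 employee references each = 30,000 GUID references
payload = [
    {"association_id": str(uuid.uuid4()),
     "employees": [employees[(j + i) % 500] for i in range(60)]}
    for j in range(500)
]
raw = json.dumps(payload).encode()
print("uncompressed:", len(raw), "bytes")
print("gzipped:     ", len(gzip.compress(raw)), "bytes")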
So what you are trying to accomplish is Dictionary compression, right?
http://en.wikibooks.org/wiki/Data_Compression/Dictionary_compression
Instead of GUIDs, which are 16 bytes long, you will get ints, which are 4 bytes long. And you will get a dictionary full of key-value pairs associating each GUID with some int value, right?
It will decrease your transfer time when there are many objects with the same id. But it will cost CPU time before the transfer to compress and after the transfer to decompress. So what amount of data are you transferring - MB, GB, TB? And is there any good reason to compress it before sending?
I do not know how dynamic your data is, but I would:
on the first call, send two directories/dictionaries mapping short ids to long GUIDs, one for your associations and one for your employees, e.g. {1: AssoGUID1, 2: AssoGUID2, ...} and {1: EmpGUID1, 2: EmpGUID2, ...}. These directories may also contain additional information on the Association and Employee instances; I suspect you do not simply display GUIDs.
on subsequent calls, just send the index of employees per association, { 1: [2,4,5], 3: [2,4], ... }, the key being the association short id and the values in the array being the short ids of the employees. Given your description, building the reverse index (employee to associations) may give a better result size-wise (but with higher processing cost).
Then it's all down to associative-array manipulation, which is straightforward in JS.
Again, if your data is (very) dynamic on the server side, the two directories will soon become obsolete, and maintaining synchronisation may cost you a lot.
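Sketched with toy data (Python here only to keep the examples in one language; on the client the equivalent is plain JS object lookups):
# first call: directories of short id -> GUID (plus any display data you need)
association_dir = {1: "AssoGUID1", 2: "AssoGUID2", 3: "AssoGUID3"}
employee_dir = {1: "EmpGUID1", 2: "EmpGUID2", 4: "EmpGUID4", 5: "EmpGUID5"}

# subsequent calls: only the per-association index of employee short ids
index = {1: [2, 4, 5], 3: [2, 4]}

# resolving back to GUIDs is just dictionary lookups
resolved = {association_dir[a]: [employee_dir[e] for e in emps]
            for a, emps in index.items()}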
I would start by answering the following questions:
What are the performance requirements? Are there size requirements? Speed requirements? What is the minimum performance that is truly needed?
What are the current performance metrics? How far are you from the requirements?
You characterized the data as possibly being mostly repeats. Is that the normal case? If not, what is?
The 2 options you listed above sound reasonable and trivial to implement. Try creating a look-up table and see what performance gains you get on actual queries. Try zipping the results (with look-ups and without), and see what gains you get.
In my experience, if you're not TOO far from the goal, meeting performance requirements is often a matter of trial and error.
If those options don't get you close to the requirements, I would take a step back and see if the requirements are reasonable in the time you have to solve the problem.
What you do next depends on which performance goals are lacking. If it is size, you're starting to be limited if you're required to send the entire association list every time. Is that truly a requirement? Can you send the entire list once, and then just send updates?
