A store has n customers, any of whom can visit at any time during the year

A store has n customers, any of whom can visit at any time during the year. The visit data is stored in a file. Design a data structure to determine whether a given person visited on a given date.
Could anyone suggest which data structure I should use in this case?

I'd suggest storing each visit on its own line, with the customer name first and then the date, separated by a comma or some other delimiter.
For example:
Name, Date
Name | Date
Just choose whichever format is easiest for you to use and to parse correctly with string .split or .substring.
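To show what reading that format back looks like, here is a minimal Python sketch that splits each line and scans the stored lines for a (name, date) pair. It assumes ISO dates (YYYY-MM-DD) and that names never contain the delimiter; both are assumptions, not part of the answer. A plain scan like this costs O(number of stored visits) per query.

```python
from datetime import date
from typing import Iterable, Tuple


def parse_line(line: str, delimiter: str = ",") -> Tuple[str, date]:
    """Split one 'Name<delimiter>Date' line into a (name, date) pair.

    Assumes ISO dates (YYYY-MM-DD) and names without the delimiter.
    """
    name, raw_date = line.split(delimiter, 1)
    return name.strip(), date.fromisoformat(raw_date.strip())


def visited_in_lines(lines: Iterable[str], name: str, day: date,
                     delimiter: str = ",") -> bool:
    """Linear scan over the stored lines for an exact (name, day) match."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if parse_line(line, delimiter) == (name, day):
            return True
    return False
```

With a real file you could pass the open file object as `lines`, e.g. `with open("visits.txt") as f: visited_in_lines(f, "Alice", date(2024, 3, 1))`; the file name is hypothetical.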

The problem statement does not say whether a customer may visit the store several times during the year. Assuming they can, I would use a Map data structure, where the key is the customer's name and the value is the set of dates on which that customer visited the store. The data can be stored in the file as XML.
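A minimal Python sketch of that Map, assuming dates have already been parsed into `datetime.date` values; the class and method names are hypothetical, and loading from XML is left out. Each lookup is one dict access plus one set membership test, so it is O(1) on average however many visits are stored.

```python
from collections import defaultdict
from datetime import date


class VisitLog:
    """Customer name -> set of dates on which that customer visited."""

    def __init__(self) -> None:
        # defaultdict(set) creates an empty set the first time a name is recorded
        self._visits = defaultdict(set)

    def record(self, name: str, day: date) -> None:
        # A set stores repeat visits on the same day only once
        self._visits[name].add(day)

    def visited(self, name: str, day: date) -> bool:
        # .get avoids inserting an empty entry for unknown customers
        return day in self._visits.get(name, set())
```

Usage: call `record` once per line read from the file, then `visited("Alice", date(2024, 5, 9))` answers a query.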


How to group input rows and select a specific row from each group in Talend?

I am working on a Talend transformation process (we are using Talend 6.4), and I don't know how to implement the current requirement.
I have an input consisting of:
Two columns that are my group keys (Account and Product) but are not unique (the same Account x Product pair can occur in multiple rows)
A criterion column (Contract end date), which will help me decide which row to keep for each group
Some "tail" data that needs to be passed to the next processing step (the contract number)
The rules to implement are:
Keep only one record per group
The selected record must be one with no end date or, if all records have an end date, the one with the latest end date
In case of a tie, any of the tied records may be selected
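To check the selection rule itself independently of Talend, here is a minimal Python sketch that applies it in one pass, keeping the best row seen so far per (Account, Product) key. This is not Talend code; the field names are hypothetical stand-ins for the input columns, and ties keep whichever row arrives first.

```python
from datetime import date
from typing import NamedTuple, Optional


class Contract(NamedTuple):
    account: str
    product: str
    end_date: Optional[date]  # None means "no end date"
    contract_number: str      # the "tail" data passed to the next step


def is_better(candidate: Contract, current: Contract) -> bool:
    """True if candidate should replace current under the selection rule."""
    if current.end_date is None:
        return False  # a row with no end date cannot be beaten (ties keep the first)
    if candidate.end_date is None:
        return True   # no end date beats any end date
    return candidate.end_date > current.end_date  # otherwise the latest end date wins


def keep_one_per_group(rows):
    best = {}
    for row in rows:
        key = (row.account, row.product)
        if key not in best or is_better(row, best[key]):
            best[key] = row
    return list(best.values())
```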
See the transformation applying those rules on some dummy data:
My first idea was to:
sort by Account, Product, End_date (descending, nulls first)
"select first" in each group
but I don't know Talend well enough to tell whether the second transformation exists.
Very interesting Talend question.
You need to create a job like this one.
Here is a link to the zip file to import into your Talend workspace:
The answer from #MBDIA seems to work; however, I would like to share what we did to fulfill our requirement.
See our Talend process here:
The first tMap (tMap_3) acts as both a tReplicate and a tMap, and sends:
to the upper branch, only the Account and Product references, which are then deduplicated by tAggregateRow_1;
to the lower branch, all the data plus computed fields that let us handle the case where the date is missing (instead of defaulting it to 31/12/9999, we compute a flag (0 or 1) that is used in the later sort step).
In the second part of the process, we first sort all the data by Account, Product, the empty-date flag computed earlier, and End date (descending). We then use a second tMap to join the two branches on Account x Product, keeping only the First Match so that the first record of each group is kept, as our requirement demands.
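The flow described above (deduplicated key branch, empty-date flag, sort, then a "First Match" join) can be mimicked in Python to check the ordering. This is only an illustration, not Talend code; the flag encoding (0 for a missing date, 1 otherwise) is an assumption chosen so that an ascending sort puts missing dates first, and the field names are hypothetical.

```python
from datetime import date
from typing import NamedTuple, Optional


class Row(NamedTuple):
    account: str
    product: str
    end_date: Optional[date]  # None = missing end date
    contract_number: str


def empty_date_flag(row: Row) -> int:
    # Assumed encoding: 0 for a missing end date, 1 otherwise
    return 0 if row.end_date is None else 1


def sort_key(row: Row):
    # Account, Product, empty-date flag, then End date descending
    # (negating the ordinal turns the ascending sort into a descending one)
    desc_date = -row.end_date.toordinal() if row.end_date is not None else 0
    return (row.account, row.product, empty_date_flag(row), desc_date)


def select_first_match(rows):
    # Upper branch: distinct Account x Product keys (like tAggregateRow_1)
    keys = list(dict.fromkeys((r.account, r.product) for r in rows))

    # Lower branch: sorted rows; setdefault keeps only the FIRST row per key,
    # which is what a "First Match" lookup does
    first_match = {}
    for r in sorted(rows, key=sort_key):
        first_match.setdefault((r.account, r.product), r)

    # Join: one output row per distinct key
    return [first_match[k] for k in keys]
```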

How do I extract multiple tables (35-40 tables) from an HTML website into one Excel file?

I am currently trying to retrieve data from this page: https://www.hdb.gov.sg/cs/infoweb/residential/renting-a-flat/renting-from-the-open-market/rental-statistics. As you can see, each year has 4 quarters, and each quarter has its own table. I want to extract all the tables, but I have not been able to automate the process; I can only extract one. In addition, I want to add two columns, "Quarter" and "Year", to the extracted data table. Any suggestions? The attached photos show my workflow and my Excel file.
Get the number of years and loop through them (or loop from the first year up to the last).
For each year, try to get the data via Data Scraping (the elements exist but are hidden/not expanded; set up Data Scraping on one table to model the data, then reuse it within the loop). For the Data Scraping step you need to change the selector so that it works for all tables, by building it from the year and the quarter (just a generic example: something like * year * quarter *). The columns are the same for all tables.
I haven't seen anything about this in the website menu or on the page itself, so it is worth checking whether robots are allowed to scrape the data.
The above would be the quickest way. A more complex alternative is the Find Children activity.
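The loop described above is a UiPath workflow, but the same shape can be sketched in Python to make the steps concrete: iterate over years and quarters, build a selector pattern from both, scrape each table, and append Quarter and Year to every row before writing everything into one file. The `scrape` callback, the "Q1" to "Q4" labels, and the CSV output (which Excel opens directly) are assumptions standing in for the Data Scraping activity and the Excel write step.

```python
import csv
import io
from typing import Callable, Iterable, List

QUARTERS = ("Q1", "Q2", "Q3", "Q4")  # assumed quarter labels


def selector_pattern(year: int, quarter: str) -> str:
    # Generic wildcard pattern from the answer ("* year * quarter *")
    return f"*{year}*{quarter}*"


def collect(years: Iterable[int], scrape: Callable[[str], List[list]]) -> List[list]:
    """Scrape every year/quarter table and tag each row with Quarter and Year.

    `scrape` is a hypothetical stand-in for the Data Scraping step: given a
    selector pattern, it returns that table's data rows (header excluded).
    """
    combined = []
    for year in years:
        for quarter in QUARTERS:
            for row in scrape(selector_pattern(year, quarter)):
                combined.append([*row, quarter, year])  # the two extra columns
    return combined


def write_csv(header: list, rows: List[list]) -> str:
    """Render all rows as one CSV table with the extra columns in the header."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([*header, "Quarter", "Year"])
    writer.writerows(rows)
    return buf.getvalue()
```

Because the columns are the same for every table, a single header row covers the combined output.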

Queries in DynamoDB

I have an application written in Node.js that needs to find ONE row based on a city name (this could simply be the table's name, since different cities will be stored in different tables) and a numeric field named "currentJobLoads". For example, a user might want to find ONE row with the city name "Chicago" and the lowest currentJobLoads. How can I achieve this in DynamoDB without scan operations (a scan is slower and can only read a limited amount of data per request before it stops)? Any suggestions would be highly appreciated.
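You didn't specify what your table's current partition key and sort key are, but I'm guessing the currentJobLoads field isn't one of them. In that case you would need to create a Global Secondary Index with currentJobLoads as its sort key (and, as its partition key, an attribute shared by the items you want to compare, such as the city). You will then be able to run Query operations against the index; querying in ascending order with a limit of 1 returns the item with the lowest currentJobLoads.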
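A sketch of the request, written in Python for illustration: it only builds the parameters of a low-level DynamoDB Query, whose parameter names are the same ones accepted by boto3's `client.query` and by the Node.js SDK's `QueryCommand`. The table name, the index name, and the string attribute "city" used as the index's partition key are hypothetical. Note that reads from a Global Secondary Index are always eventually consistent.

```python
def lowest_load_query(table: str, index: str, city: str) -> dict:
    """Query parameters returning the item with the lowest currentJobLoads.

    Assumes a hypothetical GSI named `index` with partition key "city"
    (string) and sort key "currentJobLoads" (number).
    """
    return {
        "TableName": table,
        "IndexName": index,
        "KeyConditionExpression": "#city = :city",
        "ExpressionAttributeNames": {"#city": "city"},       # placeholder for the attribute name
        "ExpressionAttributeValues": {":city": {"S": city}},  # low-level typed value
        "ScanIndexForward": True,  # ascending sort key order: lowest load first
        "Limit": 1,                # stop after the first (lowest) item
    }
```

For example, `boto3.client("dynamodb").query(**lowest_load_query("Jobs", "city-currentJobLoads-index", "Chicago"))["Items"]` would hold at most one item.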

Informatica: if the current month's data is missing, use the previous month's

The project I'm working on has monthly data for gas prices in California. The data is taken from a website and loaded into a table. I've done this part; the data is current up to March 2016. We are now in April, which does not have any data yet, so my next step is to take March's data and use it for April.
Here is what my table looks like right now:
My question is: how do I add a new row whose first column is 201604 and which uses March's price?
Let me know if I need to add more information.
I can't help thinking that your table structure is going to hurt you later.
You don't appear to have a primary key, which helps with both integrity and performance.
YYYYMM could be a key, but it's not clear whether you are storing it as a number or a string.
Using YYYYMM as a column name might prove troublesome, as it is also an Oracle date format.
Your naming convention of a GAS_PRICES table with a GAS_PRICE column could cause confusion because the names are so similar.
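The logic the question asks for (copy the latest month's price forward into any missing months up to the current one) can be sketched in Python, outside Informatica. This assumes YYYYMM is held as a six-character string, so that plain string comparison is chronological; the dict of prices is a hypothetical stand-in for the table.

```python
def next_month(yyyymm: str) -> str:
    """Return the following month in YYYYMM form, rolling December over."""
    year, month = int(yyyymm[:4]), int(yyyymm[4:])
    year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return f"{year:04d}{month:02d}"


def fill_current_month(prices: dict, current: str) -> dict:
    """Copy the latest known price forward through `current` (YYYYMM)."""
    filled = dict(prices)
    latest = max(filled)  # six-character YYYYMM strings sort chronologically
    while latest < current:
        nxt = next_month(latest)
        filled[nxt] = filled[latest]  # reuse the previous month's price
        latest = nxt
    return filled
```

If YYYYMM is stored as a number instead, the same loop works after converting it with `str()`.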

Neo4j: how to query for an intermediate date, given a date range

Neo4j TimeTree is an efficient way of modelling time in a graph. However, I'm interested in how best to model and query an object with a defined start and end time.
For instance, a ticket might be validFrom and validTo given dates, which may be separated by many days. A user may have many tickets.
For a given date, what is the most efficient way of querying for valid tickets?
When entering the data, I suppose I could create a validOn relationship between a ticket and each intermediate day between its start and end, but this seems inefficient. Can anyone think of a better way to query the data?
I can start from a user and find all of that user's tickets whose validFrom is <= the date and whose validTo is >= the date. However, what happens if I need to start from a date, i.e. match all tickets that are valid on a given date?
Link the ticket only to its validFrom and validTo days, using dedicated relationships.
For any given day, query backwards for tickets whose :START day is on or before that date and whose :END day is on or after it, something like this:
MATCH (t:Ticket)-[:START]->(:Day)-[:NEXT*0..30]->(day:Day {date: $date})
WHERE (t)-[:END]->(:Day)<-[:NEXT*0..30]-(day)
RETURN t
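To make the traversal concrete without a database, here is an in-memory Python analogue: tickets are indexed by their start day (playing the role of the :START relationship), and the query walks back day by day from the target date, as the :NEXT*0..30 hops do. The 30-day bound is carried over from the Cypher and assumes no ticket spans more than 30 days; the (id, validFrom, validTo) tuple layout is hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

MAX_SPAN = 30  # same bound as the *0..30 hops in the Cypher query (an assumption)


def index_by_start(tickets):
    """tickets: iterable of (ticket_id, valid_from, valid_to) tuples.

    Builds day -> [(ticket_id, valid_to)], mirroring the :START relationship.
    """
    by_start = defaultdict(list)
    for ticket_id, valid_from, valid_to in tickets:
        by_start[valid_from].append((ticket_id, valid_to))
    return by_start


def valid_on(by_start, day: date):
    """Walk back up to MAX_SPAN days from `day` (like following :NEXT in
    reverse) and keep tickets whose end day is on or after `day`."""
    result = []
    for offset in range(MAX_SPAN + 1):
        start = day - timedelta(days=offset)
        for ticket_id, valid_to in by_start.get(start, []):
            if valid_to >= day:
                result.append(ticket_id)
    return result
```

The cost depends on the size of the window and the number of tickets starting within it, not on the total number of tickets, which is the point of anchoring the query on the day nodes.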
