How can I extract a column's (called say "X") any cell value from a text file with multiple columns using Bash? - bash

I have a huge file with 100 columns.
I am concerned with one column called 'Location'. I know for a fact that all rows of this column are same in value. I need to get that value through Bash.
Any thoughts on how to go about this?

If the column is always in the same location relative to other columns (say 10th) you could use
cut -d" " -f10
In this case you're assuming there's one whitespace between each column, you could change the delimiter to whatever separates between the columns.

Related

Send a Flat file attachment in the workflow in Informatica Developer

In a mapping we use delimited flat file having 3 columns.The column separated through comma. But i have a requirement that in between the column there is a column having 2 comma.So how should I process the column in the mapping?
You should have information quoted with "" so whatever is within " is skiped. this way you could differentiate between comma of a piece of information or as a column separator.
We don't know what have you tried, but count the number of commas for each line and separate accordingly (if possible).

advanced concatenation of lines based on the specific number of compared columns in csv

this is the question based on the previous solved problem.
i have the following type of .csv files(they aren't all sorted!, but the structure of columns is the same):
name1,address1,town1,zip1,email1,web1,,,,category1
name2,address2,town2,zip2,email2,,,,,category2
name3,address3,town3,zip3,email3,,,,,category3_1
name3,address3,town3,zip3,,,,,,category3_2
name3,address3,town3,zip3,,,,,,category3_3
name4,address4,town4,zip4,,,,,,category4_1
name4,address4,town4,zip4,email4,,,,,category4_2
name4,address4,town4,zip4,email4,,,,,category4_3
name4,address4,town4,zip4,,,,,,category4_4
name5,address5,town5,zip5,,,,,,category5_1
name5,address5,town5,zip5,,web5,,,,category5_2
name6,address6,town6,zip6,,,,,,category6
first 4 records in columns are always populated, other columns are not always, except the last one - category
empty space between "," delimiter means that there is no data for the particular line or name
if the nameX doesnt contain addressX but addressY, it is a different record(not the same line) and should not be concatenated
i need the script in sed or awk, maybe the bash(but this solution is little slower on bigger files[hundreds of MB+]), that will take first 4 columns(in this case) compares them and if matched, will merge every category with the ";" delimiter and will keep the structure and the most possible data in other columns of those matched lines of a .csv file:
name1,address1,town1,zip1,email1,web1,,,,category1
name2,address2,town2,zip2,email2,,,,,category2
name3,address3,town3,zip3,email3,,,,,category3_1;category3_2;category3_3
name4,address4,town4,zip4,email4,,,,,category4_1;category4_2;category4_3;category4_4
name5,address5,town5,zip5,,web5,,,,category5_1;category5_2
name6,address6,town6,zip6,,,,,,category6
if that is not possible, solution could be to retain data from the first line of the duped data(the one with categoryX_1). example:
name1,address1,town1,zip1,email1,web1,,,,category1
name2,address2,town2,zip2,email2,,,,,category2
name3,address3,town3,zip3,email3,,,,,category3_1;category3_2;category3_3
name4,address4,town4,zip4,,,,,,category4_1;category4_2;category4_3;category4_4
name5,address5,town5,zip5,,,,,,category5_1;category5_2
name6,address6,town6,zip6,,,,,,category6
does the .csv have to be sorted before using the script?
thank you again!
sed -n 's/.*/²&³/;H
$ { g
:cat
s/\(²\([^,]*,\)\{4\}\)\(\([^,]*,\)\{5\}\)\([^³]*\)³\(.*\)\n\1\(\([^,]*,\)\{5\}\)\([^³]*\)³/\1~\3~ ~\7~\5;\9³\6/
t fields
b clean
:fields
s/~\([^,]*\),\([^~]*~\) ~\1,\([^~]*~\)/\1,~\2 ~\3/
t fields
s/~\([^,]*\),\([^~]*~\) ~\([^,]*,\)\([^~]*~\)/\1\3~\2 ~\4/
t fields
s/~~ ~~//g
b cat
:clean
s/.//;s/[²³]//g
p
}' YourFile
Posix version (so --posixwith GNU sed) and without sorting your file previously
2 recursive loop after loading the full file in buffer, adding marker for easier manipulation and lot of fun with sed group substitution (hopefully just reach the maximum group available).
loop to add category (1 line after the other, needed for next loop on each field) per line and a big sub field temporary structured (2 group of field from the 2 concatened lines. field 5 to 9 are 1 group)
ungroup sub field to original place
finaly, remove marker and first new line
Assuming there is no ²³~ character because used as marker (you can use other marker and adapt the script with your new marker)
Note:
For performance on a hundred MB file, i guess awk will be lot more efficient.
Sorting the data previoulsy may help certainly in performance reducing amount of data to manipulate after each category loop
i found, that this particular problem is faster being processed through db...
SQL - GROUP BY to combine/concat a column
db: mysql through wamp

How to sort file rows with vi?

I have to edit multiple files with multiple rows, and also everything is in three columns, like this:
#file
save get go
go save get
rest place reset
Columns are separated with tab. Is there any possible way to sort rows based on second or third column using vi?
sort by the 2nd col:
:sor /\t/
sort by the 3rd col:
:sor /\t[^\t]*\t/
Second column:
:sort /\%9c/
Third column:
:sort /\%16c/
\%16c means "column 16".
Hi light the rows you want to sort with "V" command
Use a bash command with "!" to work on the selection, like:
!sort -k 10
Where the number is the column number where your second (sort) column starts.
vi will replace the selection with the output of the sort command - which is given the original selection.
You can specify a pattern for sort. For example:
sort /^\w*\s*/
Will sort on the second column (the first thing to sort after matching the pattern).
Likewise
sort /^\w*\s*\w*\s*/
Should sort on the third column.
delimit the column using some char here I have | symbol as delimiter, once did with that you can use below command to sort specific column use -n if u want to sort numeric and its working on some version of vi and not on ubuntu vi :(
/|.*|/ | sort

Space came between digits in jQgrid when showing total number of records in the end

The jQgrid list shows a space between the first and second digit when the number of items are in to 4 digits. For example, when I was viewing, there were 3956 users and it was displaying on the right side bottom part of the table as ‘3 956’.
The space as the separator will be shown because of wrong value of thousandsSeparator value in grid.locale-en.js file. This is correct for Bulgarian (the developer of jqGrid come from Bulgarian), but looks strange for you.
Yo can modify the value thousandsSeparator from " " to "," in the grid.locale-en.js file file or make the following changes
$.jgrid.formatter.integer.thousandsSeparator=',';
$.jgrid.formatter.number.thousandsSeparator=',';
$.jgrid.formatter.currency.thousandsSeparator=',';
before the first call of jqGrid.
If you want you can use no separator ('') instead of the comma (',').

List of names and their numbers needed to be sorted .TXT file

I have a list of names (never over 100 names) with a value for each of them, either 3 or 4 digits.
john2E=1023
mary2E=1045
fred2E=968
And so on... They're formatted exactly like that in the .txt file. I have Python and Excel, also willing to download whatever I need.
What I want to do is sort all the names according to their values in a descending order so highest is on top. I've tried to use Excel by replacing the '2E=' with ',' so I can have the name,value then important the data so each are in separate columns but I still couldn't sort them any other way than A to Z.
Help is much appreciated, I did take my time to look around before posting this.
Replace the "2E=" with a tab character so that the data is displayed in excel in two columns. Then sort on the value column.

Resources