Read csv and update csv - shell

I have a csv file which has a list of Hadoop file paths, so I read each Hadoop path from each row and call hadoop fs -get. It is working fine. But I would like to mark the csv's 2nd column to show which files have been copied to the destination folder,
something like a flag. How do I edit the second column inside the while loop and save it in the same csv?
Input.csv
path,flag
file1path,
file2path,
So after copying each file, I want to mark the flag as Y in the same file.
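One way, sketched below: write the updated rows to a temporary file and swap it in afterwards, since rewriting a csv in place while the while loop is still reading it is unreliable. This is a minimal sketch, assuming the Input.csv layout above, paths with no embedded commas, and /dest/folder/ as a placeholder for the real destination.

#!/bin/bash
input="Input.csv"
tmp="Input.csv.tmp"

echo "path,flag" > "$tmp"                      # rewrite the header
tail -n +2 "$input" | while IFS=',' read -r path flag
do
    if hadoop fs -get "$path" /dest/folder/; then
        echo "${path},Y" >> "$tmp"             # copied: set the flag
    else
        echo "${path},${flag}" >> "$tmp"       # failed: keep the old flag value
    fi
done
mv "$tmp" "$input"                             # replace the original csv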

Related

How do I combine txt files from a list of file locations

I have a problem: I used "Everything" to extract the path of every txt file in a specific directory so that I can merge the files, but in EmEditor I can't find a way to merge files from a list of locations.
Here is what the Everything export looks like:
E:\Main directory\subdirectory 1\file.txt
E:\Main directory\subdirectory 2\file.txt
E:\Main directory\subdirectory 3\file.txt
E:\Main directory\subdirectory 4\file.txt
The list runs to over 40k locations. Is there a way to use a program to read all the locations in the text file and combine them?
Also, the subdirectories contain other txt files that I don't want, so I can't just merge every txt file under the main directory. Another thing is that there are variations of "file.txt", like "Files.txt" for example.
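Since the export already lists exactly the files wanted, one option is a small script that reads the list and concatenates each file. A minimal sketch, assuming a Unix-like shell on Windows (Git Bash or Cygwin, where the backslash paths above are still resolvable) and that the Everything export is saved as list.txt:

while IFS= read -r path
do
    path=${path%$'\r'}         # strip the trailing \r a Windows-saved list may carry
    cat "$path" >> merged.txt  # append this file's contents to the combined output
done < list.txt

With 40k entries this is slow but simple, and because the list names the files explicitly, the unwanted txt files and the "Files.txt" variants are never touched.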

Shell Script - Iterate through each line in text file and rename HDFS file

I have a text file in HDFS with records like the ones below; the number of lines in the file may vary each time.
hdfs://myfile.txt
file_name_1
file_name_2
file_name_3
I have the following HDFS directory and file structure:
hdfs://myfolder/
hdfs://myfolder/file1.csv
hdfs://myfolder/file2.csv
hdfs://myfolder/file3.csv
Using a shell script I am able to count the number of files in the HDFS directory and the number of lines in my HDFS text file. Only if the number of files in the directory matches the number of records in my text file will I proceed further with the process.
Now, I am trying to rename hdfs://myfolder/file1.csv to hdfs://myfolder/file_name_1.csv using the first record from my text file.
The second file should be renamed to hdfs://myfolder/file_name_2.csv and the third file to hdfs://myfolder/file_name_3.csv.
I have difficulty looping through both the text file and the files in the HDFS directory at the same time.
Is there an optimal way to achieve this using a shell script?
You cannot do this directly within HDFS; you'd need to stream the file's contents and then issue individual move commands.
e.g.
#!/bin/sh
COUNTER=0                                             # no spaces around '=' in shell assignments
for file in $(hdfs dfs -cat file.txt)
do
    NAME=$(echo "$file" | sed 's/pattern/replacement/')  # replace text, as needed. TODO: extract the extension
    hdfs dfs -mv "$file" "${NAME}_${COUNTER}.csv"     # 'csv' for example - make sure the extension isn't duplicated!!
    COUNTER=$((COUNTER + 1))
done
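If the rename is purely positional (the first file in the directory gets the first name, and so on), the two lists can also be paired explicitly. A sketch, assuming the directory listing sorts into the same order as the names in the text file, and a Hadoop version whose ls supports -C (print paths only):

hdfs dfs -ls -C hdfs://myfolder/ | sort > /tmp/files.txt   # current paths, one per line
hdfs dfs -cat hdfs://myfile.txt > /tmp/names.txt           # target base names
paste /tmp/files.txt /tmp/names.txt | while read -r src name
do
    hdfs dfs -mv "$src" "hdfs://myfolder/${name}.csv"
done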

Need to update the csv file with the timestamp of the files from another location

I have a csv file score.csv at path /NAS/DQ with 2 columns: scorename,filename.
scorename,filename
ABC,cust.txt
XYZ,bank.txt
These files cust.txt and bank.txt are placed at /NAS/files_path. There will be a unique instance of each file placed at this path every day.
I want to append the files' timestamps from /NAS/files_path to the csv file at /NAS/DQ.
So the timestamp should be updated every time in the csv file at the /NAS/DQ location.
I am new to Unix and am currently looking for ways to do it.
Any help is appreciated!
sed will be a good candidate for this:
sed -ri '2,$s/(^.*$)/\1 '"$(date)"'/' filename
Substitute the existing line with the existing line plus a space and the date; the command substitution is double-quoted so the spaces in date's output survive word splitting. The format of the date can be amended as required with date's +%... format specifiers. We don't want to amend the header line, so the substitution runs from line 2 to the last line ($).
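Note that the one-liner stamps every row with the same current date; if the goal is each file's own modification time from /NAS/files_path, a per-row loop is one option. A sketch, assuming GNU stat and the two-column layout above:

echo "scorename,filename,timestamp" > /NAS/DQ/score.csv.new   # extend the header
tail -n +2 /NAS/DQ/score.csv | while IFS=',' read -r score fname
do
    ts=$(stat -c '%y' "/NAS/files_path/$fname")               # the file's modification time
    echo "${score},${fname},${ts}" >> /NAS/DQ/score.csv.new
done
mv /NAS/DQ/score.csv.new /NAS/DQ/score.csv                    # swap in the updated csv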

Shell Script to Convert CSV to Text File

I need to create a shell script that reads a different folder based on today's date; the folder contains multiple files plus one tab-delimited csv file whose name is unique every day. I want to pull this csv file and resave it as a text file.
Example of file path:
data/model/output20190725 (folder contains multiple files, new folder is created everyday)
-logfile1
-logfile2
-part3983isis4838.csv (this csv file will have a new, randomly generated name every day; it is also tab delimited)
I know how to go from a csv file to a text file, but I don't know how to add the logic for the folder name and the csv name changing every day.
I saw that I could possibly use grep, but I don't know how to navigate to today's date folder, pull the csv, and pass it to the next argument to make the conversion.
grep -l .csv * |
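A minimal sketch of that logic, assuming the folder suffix is today's date in %Y%m%d form (matching output20190725) and that the folder contains exactly one .csv file:

dir="data/model/output$(date +%Y%m%d)"        # today's folder
csv=$(find "$dir" -maxdepth 1 -name '*.csv')  # the single csv, whatever its name
cp "$csv" "${csv%.csv}.txt"                   # the file is already tab-delimited text, so renaming suffices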

Bash Script to read CSV file and search directory for files to copy

I'm working on creating a bash script to read a CSV file (comma delimited). The file contains parts of the names of files in another directory. I then need to take these name fragments, use them to search the directory, and copy the correct files to a new folder.
I am able to read the csv file. However, the csv file only contains part of each file name, so I need to use wildcards to search the directory for the files. I have been unable to get the wildcards to work within the directory.
CSV File Format (in notepad):
12
13
14
15
Example file names in target directory:
IXI12_asfds.nii
IXI13_asdscds.nii
IXI14_aswe32fds.nii
IXI15_asf432ds.nii
The prefix of all the files is the same: IXI. The csv file contains the unique numbers for each target file, which appear right after the prefix. The middle portion of each filename is unique to that file.
#!/bin/bash
# CSV file with comma delineated numbers.
# CSV file only contains part of the file name. Need to add IXI to the
# beginning, and search with a wildcard at the end.
input="CSV_file.csv"
while IFS=',' read -r file_name1
do
name=(IXI$file_name1)
cp $name*.nii /newfolder
done < "$input"
The error I keep getting says that no folder with the appropriate name can be identified.
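Two likely culprits: name=(IXI$file_name1) creates an array rather than a string, and a CSV saved from Notepad often has Windows line endings, so a trailing carriage return ends up inside the glob pattern and nothing matches. A corrected sketch of the loop, under those assumptions:

#!/bin/bash
input="CSV_file.csv"
while IFS=',' read -r file_name1
do
    file_name1=${file_name1%$'\r'}  # strip a trailing carriage return, if present
    name="IXI${file_name1}"         # plain string assignment, not an array
    cp "${name}"*.nii /newfolder/   # quote the prefix; leave the glob unquoted
done < "$input"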
